You're an ML engineer. You've perfected deploying scikit-learn models and wrangling PyTorch jobs. Your MLOps stack is dialed in. But now you're being asked to build and ship AI agents, and suddenly your trusted toolkit is starting to crack.
The Adaptation Struggle: Your MLOps habits (rigorous testing, versioning, CI/CD) don't map cleanly onto agent development. How do you version a prompt? How do you regression test a non-deterministic system? The tools that gave you confidence for models now create friction for agents.
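One possible way to approach both questions, shown here as a minimal sketch that does not use any ZenML API: treat the prompt text as content and derive its version from a hash, and regression test a non-deterministic agent by sampling it many times and comparing the pass rate against a threshold. The names `prompt_version`, `pass_rate`, `flaky_agent`, and `THRESHOLD` are hypothetical, and `flaky_agent` is a random stand-in for a real LLM call.

```python
import hashlib
import random
from typing import Callable

THRESHOLD = 0.8  # assumed minimum acceptable pass rate; tune per use case


def prompt_version(prompt: str) -> str:
    """Return a short, stable fingerprint of the prompt text."""
    normalized = prompt.strip().encode("utf-8")
    return hashlib.sha256(normalized).hexdigest()[:12]


def pass_rate(
    agent: Callable[[str], str],
    query: str,
    check: Callable[[str], bool],
    trials: int = 50,
) -> float:
    """Call the agent repeatedly and return the fraction of acceptable answers."""
    passed = sum(1 for _ in range(trials) if check(agent(query)))
    return passed / trials


_rng = random.Random(0)  # seeded so the stand-in behaves the same on every run


def flaky_agent(query: str) -> str:
    """Hypothetical stand-in for an LLM call that sometimes gives a poor answer."""
    return "refund approved" if _rng.random() < 0.9 else "I am not sure"


PROMPT = "You are a helpful support agent. Answer refund questions politely."
version = prompt_version(PROMPT)
rate = pass_rate(flaky_agent, "Can I get a refund?", lambda a: "refund" in a)
print(f"prompt {version}: pass rate {rate:.0%} (threshold {THRESHOLD:.0%})")
```

Any edit to the prompt text changes the fingerprint, so a stored pass rate can be tied to the exact prompt that produced it. A CI job could then fail when `rate` drops below `THRESHOLD` instead of asserting on a single sampled answer.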
The Divided Stack: To cope, teams are building a second, parallel stack just for LLM-based systems. Now you're maintaining two sets of tools, two deployment pipelines, and two mental models. Your classical models live in one world, your agents in another. It's expensive, complex, and slows everyone down.
The Broken Feedback Loop: Getting an agent from your local environment to production is a slow, painful journey. By the time you get feedback on performance, cost, or quality, the requirements have already changed. Iteration is a guessing game, not a data-driven process.
Stop maintaining two separate worlds. ZenML is a unified MLOps framework that extends the battle-tested principles you rely on for classical ML to the new world of AI agents. It's one platform to develop, evaluate, and deploy your entire AI portfolio.
# Morning: Your sklearn pipeline is still versioned and reproducible.
train_and_deploy_classifier()

# Afternoon: Your new agent evaluation pipeline uses the same logic.
evaluate_and_deploy_agent()

# Same platform. Same principles. New possibilities.
With ZenML, you're not replacing your knowledge; you're extending it. Use the pipelines and practices you already know to version, test, deploy, and monitor everything from classic models to the most advanced agents.
See It In Action: Multi-Agent Architecture Comparison

The Challenge: Your team built three different customer service agents. Which one should go to production? With ZenML, you can build a reproducible pipeline to test them on real data and make a data-driven decision, with full observability via LangGraph, LiteLLM & Langfuse.
from typing import Tuple

import pandas as pd
from typing_extensions import Annotated
from zenml import pipeline, step
from zenml.types import HTMLString

# Helpers such as load_customer_queries() and the agent classes are not
# defined in this snippet; see the complete working example.


@step
def load_real_conversations() -> pd.DataFrame:
    """Load customer service queries for testing."""
    return load_customer_queries()


@step
def train_intent_classifier(queries: pd.DataFrame):
    """Train a scikit-learn classifier alongside your agents."""
    return train_sklearn_pipeline(queries)


@step
def load_prompts() -> dict:
    """Load prompts as versioned ZenML artifacts."""
    return load_agent_prompts_from_files()


@step
def run_architecture_comparison(
    queries: pd.DataFrame, classifier, prompts: dict
) -> Tuple[
    Annotated[dict, "results"],
    Annotated[str, "mermaid_diagram"],  # assumed to be Mermaid source text
]:
    """Test three different agent architectures on the same data."""
    architectures = {
        "single_agent": SingleAgentRAG(prompts),
        "multi_specialist": MultiSpecialistAgents(prompts),
        "langgraph_workflow": LangGraphAgent(prompts),  # Real LangGraph implementation!
    }

    # ZenML automatically versions agent code, prompts, and configurations
    # LiteLLM provides unified access to 100+ LLM providers
    # LangGraph orchestrates a multi-agent graph
    # Langfuse tracks costs, performance, and traces for full observability
    results = test_all_architectures(queries, architectures)
    mermaid_diagram = generate_langgraph_visualization()

    # Two named outputs, so the pipeline below can unpack them
    return results, mermaid_diagram


@step
def evaluate_and_decide(queries: pd.DataFrame, results: dict) -> HTMLString:
    """Generate beautiful HTML report with winner selection."""
    return create_styled_comparison_report(results)


@pipeline
def compare_agent_architectures():
    """Data-driven agent architecture decisions with full MLOps tracking."""
    queries = load_real_conversations()
    prompts = load_prompts()  # Prompts as versioned artifacts
    classifier = train_intent_classifier(queries)
    results, viz = run_architecture_comparison(queries, classifier, prompts)
    report = evaluate_and_decide(queries, results)


if __name__ == "__main__":
    compare_agent_architectures()
    # Rich visualizations automatically appear in ZenML dashboard
See the complete working example →
The Result: A clear winner is selected based on data, not opinions. You have full lineage from the test data and agent versions to the final report and deployment decision.
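How the winner is chosen is not spelled out here, so the following is only a sketch of one plausible selection rule, not the logic used in the complete example. It assumes each architecture yields an accuracy, an average latency, and a total cost, and ranks them by accuracy minus weighted penalties. `ArchitectureResult`, `score`, `pick_winner`, the penalty weights, and all numbers below are hypothetical placeholders rather than measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArchitectureResult:
    name: str
    accuracy: float       # fraction of test queries answered acceptably (0..1)
    avg_latency_s: float  # mean seconds per query
    cost_usd: float       # total spend on the test set


def score(
    r: ArchitectureResult,
    max_latency_s: float,
    max_cost_usd: float,
    latency_weight: float = 0.2,
    cost_weight: float = 0.2,
) -> float:
    """Higher is better: reward accuracy, penalize relative latency and cost."""
    latency_penalty = latency_weight * (r.avg_latency_s / max_latency_s)
    cost_penalty = cost_weight * (r.cost_usd / max_cost_usd)
    return r.accuracy - latency_penalty - cost_penalty


def pick_winner(results: List[ArchitectureResult]) -> ArchitectureResult:
    """Return the result with the best score across all candidates."""
    # Fall back to 1.0 so an all-zero column does not divide by zero
    max_latency = max(r.avg_latency_s for r in results) or 1.0
    max_cost = max(r.cost_usd for r in results) or 1.0
    return max(results, key=lambda r: score(r, max_latency, max_cost))


# Placeholder numbers for illustration only
candidates = [
    ArchitectureResult("single_agent", accuracy=0.80, avg_latency_s=1.0, cost_usd=1.0),
    ArchitectureResult("multi_specialist", accuracy=0.90, avg_latency_s=2.0, cost_usd=2.0),
    ArchitectureResult("langgraph_workflow", accuracy=0.88, avg_latency_s=1.5, cost_usd=1.6),
]
winner = pick_winner(candidates)
print(f"Selected architecture: {winner.name}")
```

Making the weights explicit keeps the trade-off reviewable: a team that cares more about quality than spend can lower `cost_weight` and rerun the comparison on the same data.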
Get Started (5 minutes)

Architecture Overview

ZenML uses a client-server architecture with an integrated web dashboard (zenml-io/zenml-dashboard) for pipeline visualization and management:
pip install "zenml[server]" - runs both client and server locally
pip install zenml + zenml login <server-url> - connects a client-only install to an existing server
# Install ZenML with server capabilities
pip install "zenml[server]"

# Install required dependencies
pip install scikit-learn openai numpy

# Initialize your ZenML repository
zenml init

# Start local server or connect to a remote one
zenml login

# Set OpenAI API key (optional)
export OPENAI_API_KEY=sk-svv....

Your First Pipeline (2 minutes)
# simple_pipeline.py
from typing import Tuple

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from typing_extensions import Annotated
from zenml import pipeline, step


@step
def create_dataset() -> Tuple[
    Annotated[np.ndarray, "X_train"],
    Annotated[np.ndarray, "X_test"],
    Annotated[np.ndarray, "y_train"],
    Annotated[np.ndarray, "y_test"],
]:
    """Generate a simple classification dataset."""
    X, y = make_classification(n_samples=100, n_features=4, n_classes=2, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    return X_train, X_test, y_train, y_test


@step
def train_model(X_train: np.ndarray, y_train: np.ndarray) -> RandomForestClassifier:
    """Train a simple sklearn model."""
    model = RandomForestClassifier(n_estimators=10, random_state=42)
    model.fit(X_train, y_train)
    return model


@step
def evaluate_model(model: RandomForestClassifier, X_test: np.ndarray, y_test: np.ndarray) -> float:
    """Evaluate the model accuracy."""
    predictions = model.predict(X_test)
    return accuracy_score(y_test, predictions)


@step
def generate_summary(accuracy: float) -> str:
    """Use OpenAI to generate a model summary."""
    import openai

    client = openai.OpenAI()  # Set OPENAI_API_KEY environment variable
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Write a brief summary of a ML model with {accuracy:.2%} accuracy.",
        }],
        max_tokens=50,
    )
    return response.choices[0].message.content


@pipeline
def simple_ml_pipeline():
    """A simple pipeline combining sklearn and OpenAI."""
    X_train, X_test, y_train, y_test = create_dataset()
    model = train_model(X_train, y_train)
    accuracy = evaluate_model(model, X_test, y_test)
    try:
        import openai  # noqa: F401

        generate_summary(accuracy)
    except ImportError:
        print("OpenAI is not installed. Skipping summary generation.")


if __name__ == "__main__":
    result = simple_ml_pipeline()
Run it:
export OPENAI_API_KEY="your-api-key-here"
python simple_pipeline.py

Chat With Your Pipelines: ZenML MCP Server
Stop clicking through dashboards to understand your ML workflows. The ZenML MCP Server lets you query your pipelines, analyze runs, and trigger deployments using natural language through Claude Desktop, Cursor, or any MCP-compatible client.
"Which pipeline runs failed this week and why?"
"Show me accuracy metrics for all my customer churn models"
"Trigger the latest fraud detection pipeline with production data"
Quick Setup: .dxt file from zenml-io/mcp-zenml

The MCP (Model Context Protocol) integration transforms your ZenML metadata into conversational insights, making pipeline debugging and analysis as easy as asking a question. Perfect for teams who want to democratize access to ML operations without requiring dashboard expertise.
Getting Started Resources

The best way to learn about ZenML is through our comprehensive documentation and tutorials:
For visual learners, start with this 11-minute introduction:
For Teams:
Infrastructure Requirements:
ZenML is featured in these comprehensive guides to production AI systems.
Join ML Engineers Building the Future of AI

Contribute:
good-first-issue
Stay Updated:
Q: "Do I need to rewrite my agents or models to use ZenML?"
A: No. Wrap your existing code in a @step. Keep using scikit-learn, PyTorch, LangGraph, LlamaIndex, or raw API calls. ZenML orchestrates your tools; it doesn't replace them.
Q: "How is this different from LangSmith/Langfuse?"
A: They provide excellent observability for LLM applications. We orchestrate the full MLOps lifecycle for your entire AI stack. With ZenML, you manage both your classical ML models and your AI agents in one unified framework, from development and evaluation all the way to production deployment.
Q: "Can I use my existing MLflow/W&B setup?"
A: Yes! ZenML integrates with both MLflow and Weights & Biases. Your experiments, our pipelines.
Q: "Is this just MLflow with extra steps?"
A: No. MLflow tracks experiments. We orchestrate the entire development process, from training and evaluation to deployment and monitoring, for both models and agents.
Q: "How do I configure ZenML with Kubernetes?"
A: ZenML integrates with Kubernetes through the native Kubernetes orchestrator, Kubeflow, and other K8s-based orchestrators. See our Kubernetes orchestrator guide and Kubeflow guide, plus deployment documentation.
Q: "What about cost? I can't afford another platform."
A: ZenML's open-source version is free forever. You likely already have the required infrastructure (like a Kubernetes cluster and object storage). We just help you make better use of it for MLOps.
Manage pipelines directly from your editor:
VS Code Extension in Action!

Install from VS Code Marketplace.
ZenML is distributed under the terms of the Apache License Version 2.0. See LICENSE for details.