A hands-on guide to evaluating LLM agents with LangSmith. This cookbook walks through four evaluation patterns — final response, single step, trajectory, and multi-turn — using real agent examples built with LangGraph.
| Notebook | Evaluation Patterns | Description |
|---|---|---|
email_basic.ipynb |
Final Response, Single Step, Trajectory | Evaluate an email triage-and-response agent using LangChain tools |
email_mcp.ipynb |
Final Response, Single Step, Trajectory | Same evaluation patterns applied to an agent that leverages MCP-based tools |
multi_thread.ipynb |
Multi-Turn Simulation | Simulate multi-turn conversations and evaluate a customer service multi-agent system |
Evaluate the complete agent output against success criteria using an LLM-as-judge.
Evaluate individual agent steps (e.g., triage classification) using exact-match metrics.
Evaluate the sequence of tool calls made by the agent against expected trajectories.
Simulate multi-turn conversations with synthetic users and evaluate across dimensions like resolution, satisfaction, and professionalism.
git clone https://github.com/xuro-langchain/eval-concepts.git
cd eval-conceptsCopy the example .env file and fill in your API keys:
cp .env.example .envUsing uv (recommended):
uv syncUsing pip:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txtLaunch Jupyter and open any notebook in the notebooks/ directory:
jupyter notebook notebooks/Recommended order:
email_basic.ipynb— Core evaluation patterns (final response, single step, trajectory)email_mcp.ipynb— Same patterns with MCP tool integrationmulti_thread.ipynb— Multi-turn simulation evaluations
eval-concepts/
├── notebooks/ # Evaluation tutorial notebooks
│ ├── email_basic.ipynb # Core eval patterns
│ ├── email_mcp.ipynb # MCP variant of email evaluations
│ └── multi_thread.ipynb # Multi-turn simulation evaluations
├── agents/ # Agent implementations
│ ├── email_basic.py # Email agent with LangChain tools
│ ├── email_mcp.py # Email agent with MCP tools
│ └── multi_basic.py # Multi-agent customer service system
├── tools/ # Tool definitions
├── utils/ # Helper utilities and prompts
├── images/ # Diagrams used in notebooks
├── config.py # LangSmith client configuration
└── .env.example # Required environment variables




