LangSmith Agent Evaluation Cookbook

A hands-on guide to evaluating LLM agents with LangSmith. This cookbook walks through four evaluation patterns — final response, single step, trajectory, and multi-turn — using real agent examples built with LangGraph.

*Figure: Evaluation Concepts overview*

What You'll Learn

| Notebook | Evaluation Patterns | Description |
| --- | --- | --- |
| `email_basic.ipynb` | Final Response, Single Step, Trajectory | Evaluate an email triage-and-response agent built with LangChain tools |
| `email_mcp.ipynb` | Final Response, Single Step, Trajectory | The same evaluation patterns applied to an agent that uses MCP-based tools |
| `multi_thread.ipynb` | Multi-Turn Simulation | Simulate multi-turn conversations and evaluate a customer service multi-agent system |

Evaluation Patterns

Final Response Evaluations

Evaluate the complete agent output against success criteria using an LLM-as-judge.

*Figure: Final Response Evaluation*
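As a minimal sketch of the idea (not the notebook's exact code), an LLM-as-judge evaluator takes the run's inputs, outputs, and reference outputs and returns a score dict. The prompt text, field names, and the `judge` callable below are illustrative assumptions; `judge` stands in for a real chat-model call so the sketch stays self-contained:

```python
# Hypothetical LLM-as-judge final-response evaluator.
JUDGE_PROMPT = (
    "You are grading an email agent's final response.\n"
    "User request: {request}\n"
    "Agent response: {response}\n"
    "Reference response: {reference}\n"
    "Reply with exactly one word: CORRECT or INCORRECT."
)

def final_response_evaluator(inputs: dict, outputs: dict,
                             reference_outputs: dict, judge) -> dict:
    """Grade the complete agent output with an LLM judge.

    `judge` is any callable mapping a prompt string to the judge model's
    reply, e.g. a thin wrapper around a chat-completions API call.
    """
    prompt = JUDGE_PROMPT.format(
        request=inputs["request"],
        response=outputs["response"],
        reference=reference_outputs["response"],
    )
    verdict = judge(prompt).strip().upper()
    return {"key": "correctness", "score": int(verdict == "CORRECT")}

# Usage with a stub judge standing in for a real model:
result = final_response_evaluator(
    {"request": "Schedule a meeting for Tuesday"},
    {"response": "I have scheduled the meeting for Tuesday at 10am."},
    {"response": "Meeting scheduled for Tuesday."},
    judge=lambda prompt: "CORRECT",
)
# result == {"key": "correctness", "score": 1}
```

In practice the judge call would hit a real model, and the evaluator would be registered with a LangSmith experiment rather than called directly.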

Single Step Evaluations

Evaluate individual agent steps (e.g., triage classification) using exact-match metrics.

*Figure: Single Step Evaluation*
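Because a single step like triage has a small label set, it can be scored deterministically. A sketch of such an exact-match evaluator (the `classification` field name is an assumption, not necessarily what the notebooks use):

```python
def triage_exact_match(outputs: dict, reference_outputs: dict) -> dict:
    """Exact-match check for one agent step, e.g. the triage label."""
    match = outputs["classification"] == reference_outputs["classification"]
    return {"key": "triage_correct", "score": int(match)}

print(triage_exact_match({"classification": "respond"},
                         {"classification": "respond"}))
# → {'key': 'triage_correct', 'score': 1}
```

Exact match keeps this metric cheap and fully reproducible, which is why single-step checks usually skip the LLM judge entirely.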

Trajectory Evaluations

Evaluate the sequence of tool calls made by the agent against expected trajectories.

*Figure: Trajectory Evaluation*
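Trajectory checks compare the list of tool names the agent actually called against an expected list. Two common variants, sketched here with made-up tool names (strict ordering vs. an in-order subsequence that tolerates extra calls):

```python
def exact_trajectory_match(actual: list, expected: list) -> dict:
    """Score 1 only if the agent called exactly the expected tools, in order."""
    return {"key": "trajectory_exact", "score": int(actual == expected)}

def trajectory_subsequence(actual: list, expected: list) -> dict:
    """Softer variant: expected tools appear in order; extra calls allowed."""
    it = iter(actual)
    ok = all(step in it for step in expected)  # `in` consumes the iterator
    return {"key": "trajectory_subsequence", "score": int(ok)}

calls = ["read_email", "search_calendar", "write_email", "send_email"]
print(exact_trajectory_match(calls, ["read_email", "write_email"]))  # score 0
print(trajectory_subsequence(calls, ["read_email", "write_email"]))  # score 1
```

The subsequence variant is often the more useful default, since agents frequently interleave harmless extra lookups between the steps you actually care about.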

Multi-Turn Evaluations

Simulate multi-turn conversations with synthetic users and evaluate across dimensions like resolution, satisfaction, and professionalism.

*Figure: Multi-Turn Evaluation*
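The core of a multi-turn simulation is a loop that alternates the agent under test with a synthetic user until the user is satisfied or a turn budget runs out; the resulting transcript is then graded per dimension (resolution, satisfaction, professionalism) by LLM judges. A minimal sketch, where `agent` and `synthetic_user` are hypothetical callables (in the notebook both would be model-backed):

```python
def simulate_conversation(agent, synthetic_user, opening: str,
                          max_turns: int = 5) -> list:
    """Alternate agent and synthetic-user turns; return the transcript.

    `agent` and `synthetic_user` each map the transcript so far to their
    next message; the synthetic user returns None once it considers the
    issue resolved.
    """
    transcript = [{"role": "user", "content": opening}]
    for _ in range(max_turns):
        reply = agent(transcript)
        transcript.append({"role": "assistant", "content": reply})
        follow_up = synthetic_user(transcript)
        if follow_up is None:  # simulated user is satisfied
            break
        transcript.append({"role": "user", "content": follow_up})
    return transcript

# Usage with stubs: the user is satisfied after one agent reply.
transcript = simulate_conversation(
    agent=lambda t: "I've issued the refund to your original card.",
    synthetic_user=lambda t: None,
    opening="I want a refund for my last order.",
)
# transcript has 2 messages: one user turn, one assistant turn
```

Each judged dimension then scores the whole transcript rather than any single reply, which is what distinguishes this pattern from final-response evaluation.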

Getting Started

Prerequisites

  • Python 3.11+
  • A LangSmith account (LANGCHAIN_API_KEY)
  • An OpenAI account (OPENAI_API_KEY)

1. Clone the repository

git clone https://github.com/xuro-langchain/eval-concepts.git
cd eval-concepts

2. Set up environment variables

Copy the example .env file and fill in your API keys:

cp .env.example .env
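After copying, the `.env` file should define at least the two keys listed in the prerequisites (the values below are placeholders; check `.env.example` for the authoritative list):

```shell
# .env — replace the placeholders with your own keys
LANGCHAIN_API_KEY=your-langsmith-api-key
OPENAI_API_KEY=your-openai-api-key
```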

3. Install dependencies

Using uv (recommended):

uv sync

Using pip:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

4. Run the notebooks

Launch Jupyter and open any notebook in the notebooks/ directory:

jupyter notebook notebooks/

Recommended order:

  1. email_basic.ipynb — Core evaluation patterns (final response, single step, trajectory)
  2. email_mcp.ipynb — Same patterns with MCP tool integration
  3. multi_thread.ipynb — Multi-turn simulation evaluations

Project Structure

eval-concepts/
├── notebooks/               # Evaluation tutorial notebooks
│   ├── email_basic.ipynb        # Core eval patterns
│   ├── email_mcp.ipynb          # MCP variant of email evaluations
│   └── multi_thread.ipynb       # Multi-turn simulation evaluations
├── agents/                  # Agent implementations
│   ├── email_basic.py           # Email agent with LangChain tools
│   ├── email_mcp.py             # Email agent with MCP tools
│   └── multi_basic.py           # Multi-agent customer service system
├── tools/                   # Tool definitions
├── utils/                   # Helper utilities and prompts
├── images/                  # Diagrams used in notebooks
├── config.py                # LangSmith client configuration
└── .env.example             # Required environment variables
