LangSmith Agent Evaluation Cookbook

A hands-on guide to evaluating LLM agents with LangSmith. This cookbook walks through four evaluation patterns — final response, single step, trajectory, and multi-turn — using real agent examples built with LangGraph.

*Figure: Evaluation Concepts overview*

What You'll Learn

| Notebook | Evaluation Patterns | Description |
| --- | --- | --- |
| `email_basic.ipynb` | Final Response, Single Step, Trajectory | Evaluate an email triage-and-response agent built with LangChain tools |
| `email_mcp.ipynb` | Final Response, Single Step, Trajectory | The same evaluation patterns applied to an agent that uses MCP-based tools |
| `multi_thread.ipynb` | Multi-Turn Simulation | Simulate multi-turn conversations and evaluate a customer service multi-agent system |

Evaluation Patterns

Final Response Evaluations

Evaluate the complete agent output against success criteria using an LLM-as-judge.

*Figure: Final Response Evaluation*
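As a minimal sketch of the idea (not the notebook's exact code), an LLM-as-judge evaluator takes the run's inputs, outputs, and reference outputs and returns a score dict. The prompt text, field names, and the `judge` callable below are illustrative assumptions; `judge` stands in for a real chat-model call so the sketch stays self-contained:

```python
# Hypothetical LLM-as-judge final-response evaluator.
JUDGE_PROMPT = (
    "You are grading an email agent's final response.\n"
    "User request: {request}\n"
    "Agent response: {response}\n"
    "Reference response: {reference}\n"
    "Reply with exactly one word: CORRECT or INCORRECT."
)

def final_response_evaluator(inputs: dict, outputs: dict,
                             reference_outputs: dict, judge) -> dict:
    """Grade the complete agent output with an LLM judge.

    `judge` is any callable mapping a prompt string to the judge model's
    reply, e.g. a thin wrapper around a chat-completions API call.
    """
    prompt = JUDGE_PROMPT.format(
        request=inputs["request"],
        response=outputs["response"],
        reference=reference_outputs["response"],
    )
    verdict = judge(prompt).strip().upper()
    return {"key": "correctness", "score": int(verdict == "CORRECT")}

# Usage with a stub judge standing in for a real model:
result = final_response_evaluator(
    {"request": "Schedule a meeting for Tuesday"},
    {"response": "I have scheduled the meeting for Tuesday at 10am."},
    {"response": "Meeting scheduled for Tuesday."},
    judge=lambda prompt: "CORRECT",
)
# result == {"key": "correctness", "score": 1}
```

In practice the judge call would hit a real model, and the evaluator would be registered with a LangSmith experiment rather than called directly.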

Single Step Evaluations

Evaluate individual agent steps (e.g., triage classification) using exact-match metrics.

*Figure: Single Step Evaluation*
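Because a single step like triage has a small label set, it can be scored deterministically. A sketch of such an exact-match evaluator (the `classification` field name is an assumption, not necessarily what the notebooks use):

```python
def triage_exact_match(outputs: dict, reference_outputs: dict) -> dict:
    """Exact-match check for one agent step, e.g. the triage label."""
    match = outputs["classification"] == reference_outputs["classification"]
    return {"key": "triage_correct", "score": int(match)}

print(triage_exact_match({"classification": "respond"},
                         {"classification": "respond"}))
# → {'key': 'triage_correct', 'score': 1}
```

Exact match keeps this metric cheap and fully reproducible, which is why single-step checks usually skip the LLM judge entirely.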

Trajectory Evaluations

Evaluate the sequence of tool calls made by the agent against expected trajectories.

*Figure: Trajectory Evaluation*
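Trajectory checks compare the list of tool names the agent actually called against an expected list. Two common variants, sketched here with made-up tool names (strict ordering vs. an in-order subsequence that tolerates extra calls):

```python
def exact_trajectory_match(actual: list, expected: list) -> dict:
    """Score 1 only if the agent called exactly the expected tools, in order."""
    return {"key": "trajectory_exact", "score": int(actual == expected)}

def trajectory_subsequence(actual: list, expected: list) -> dict:
    """Softer variant: expected tools appear in order; extra calls allowed."""
    it = iter(actual)
    ok = all(step in it for step in expected)  # `in` consumes the iterator
    return {"key": "trajectory_subsequence", "score": int(ok)}

calls = ["read_email", "search_calendar", "write_email", "send_email"]
print(exact_trajectory_match(calls, ["read_email", "write_email"]))  # score 0
print(trajectory_subsequence(calls, ["read_email", "write_email"]))  # score 1
```

The subsequence variant is often the more useful default, since agents frequently interleave harmless extra lookups between the steps you actually care about.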

Multi-Turn Evaluations

Simulate multi-turn conversations with synthetic users and evaluate across dimensions like resolution, satisfaction, and professionalism.

*Figure: Multi-Turn Evaluation*
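The core of a multi-turn simulation is a loop that alternates the agent under test with a synthetic user until the user is satisfied or a turn budget runs out; the resulting transcript is then graded per dimension (resolution, satisfaction, professionalism) by LLM judges. A minimal sketch, where `agent` and `synthetic_user` are hypothetical callables (in the notebook both would be model-backed):

```python
def simulate_conversation(agent, synthetic_user, opening: str,
                          max_turns: int = 5) -> list:
    """Alternate agent and synthetic-user turns; return the transcript.

    `agent` and `synthetic_user` each map the transcript so far to their
    next message; the synthetic user returns None once it considers the
    issue resolved.
    """
    transcript = [{"role": "user", "content": opening}]
    for _ in range(max_turns):
        reply = agent(transcript)
        transcript.append({"role": "assistant", "content": reply})
        follow_up = synthetic_user(transcript)
        if follow_up is None:  # simulated user is satisfied
            break
        transcript.append({"role": "user", "content": follow_up})
    return transcript

# Usage with stubs: the user is satisfied after one agent reply.
transcript = simulate_conversation(
    agent=lambda t: "I've issued the refund to your original card.",
    synthetic_user=lambda t: None,
    opening="I want a refund for my last order.",
)
# transcript has 2 messages: one user turn, one assistant turn
```

Each judged dimension then scores the whole transcript rather than any single reply, which is what distinguishes this pattern from final-response evaluation.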

Getting Started

Prerequisites

  • Python 3.11+
  • A LangSmith account (LANGCHAIN_API_KEY)
  • An OpenAI account (OPENAI_API_KEY)

1. Clone the repository

git clone https://github.com/xuro-langchain/eval-concepts.git
cd eval-concepts

2. Set up environment variables

Copy the example .env file and fill in your API keys:

cp .env.example .env
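After copying, the `.env` file should define at least the two keys listed in the prerequisites (the values below are placeholders; check `.env.example` for the authoritative list):

```shell
# .env — replace the placeholders with your own keys
LANGCHAIN_API_KEY=your-langsmith-api-key
OPENAI_API_KEY=your-openai-api-key
```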

3. Install dependencies

Using uv (recommended):

uv sync

Using pip:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

4. Run the notebooks

Launch Jupyter and open any notebook in the notebooks/ directory:

jupyter notebook notebooks/

Recommended order:

  1. email_basic.ipynb — Core evaluation patterns (final response, single step, trajectory)
  2. email_mcp.ipynb — Same patterns with MCP tool integration
  3. multi_thread.ipynb — Multi-turn simulation evaluations

Project Structure

eval-concepts/
├── notebooks/               # Evaluation tutorial notebooks
│   ├── email_basic.ipynb        # Core eval patterns
│   ├── email_mcp.ipynb          # MCP variant of email evaluations
│   └── multi_thread.ipynb       # Multi-turn simulation evaluations
├── agents/                  # Agent implementations
│   ├── email_basic.py           # Email agent with LangChain tools
│   ├── email_mcp.py             # Email agent with MCP tools
│   └── multi_basic.py           # Multi-agent customer service system
├── tools/                   # Tool definitions
├── utils/                   # Helper utilities and prompts
├── images/                  # Diagrams used in notebooks
├── config.py                # LangSmith client configuration
└── .env.example             # Required environment variables
