A demonstration application showcasing LLM observability with OpenTelemetry and Grafana Cloud.
This project is an SE "art of the possible" demo that shows how to instrument an AI agent application with full observability - tracking costs, performance, and execution flow. Not intended for production use. The demo implements a customer support chatbot with:
- Multi-agent architecture (Supervisor/Router pattern)
- Human-in-the-Loop approval for sensitive operations
- Full OpenTelemetry instrumentation for LLM observability
- Grafana Cloud integration with pre-built dashboard
- Multi-provider LLM support (OpenAI, Anthropic, Google, DeepSeek)
```mermaid
flowchart TD
    Start((Start)) --> supervisor

    subgraph Routing["🎯 Supervisor Router"]
        supervisor{{"Supervisor<br/>(GPT-4o-mini)"}}
    end

    supervisor -->|"music query"| music_expert
    supervisor -->|"support query"| support_rep

    subgraph Music["🎵 Music Expert"]
        music_expert["Music Expert<br/>(GPT-4o-mini)"]
        music_tools[["🔧 Music Tools<br/>• get_albums_by_artist<br/>• get_tracks_by_artist<br/>• check_for_songs<br/>• get_artists_by_genre<br/>• list_genres"]]
        music_expert -->|"needs data"| music_tools
        music_tools --> music_expert
    end

    subgraph Support["💼 Support Rep"]
        support_rep["Support Rep<br/>(GPT-4o-mini)"]
        support_tools[["🔧 Safe Tools<br/>• get_invoice<br/>• get_customer_profile"]]
        refund_tools[["⚠️ HITL Tools<br/>• process_refund"]]
        support_rep -->|"safe operation"| support_tools
        support_rep -->|"refund request"| hitl
        support_tools --> support_rep

        subgraph HITL["🛑 Human-in-the-Loop"]
            hitl{{"Interrupt<br/>for Approval"}}
            hitl -->|"approved"| refund_tools
        end

        refund_tools --> support_rep
    end

    music_expert -->|"done"| End((End))
    support_rep -->|"done"| End

    style supervisor fill:#4a90d9,stroke:#2d5a87,color:#fff
    style music_expert fill:#50c878,stroke:#2d7a4a,color:#fff
    style support_rep fill:#f5a623,stroke:#c77d0a,color:#fff
    style hitl fill:#e74c3c,stroke:#a93226,color:#fff
    style music_tools fill:#e8f5e9,stroke:#81c784,color:#1a3d1a
    style support_tools fill:#fff3e0,stroke:#ffb74d,color:#5d4e37
    style refund_tools fill:#ffebee,stroke:#ef5350,color:#7a1f1f
```
| Component | Model | Purpose |
|---|---|---|
| Supervisor | GPT-4o-mini | Routes requests to Music Expert or Support Rep |
| Music Expert | GPT-4o-mini | Catalog queries - albums, tracks, artists, genres |
| Support Rep | GPT-4o-mini | Account info, invoices, refunds |
| HITL Gate | — | Requires human approval for refunds |
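The routing in the table above can be caricatured as a pure function. This is only an illustrative keyword sketch — the real supervisor is an LLM (GPT-4o-mini) doing intent classification, and the names below are invented:

```python
# Illustrative keyword router; the actual supervisor uses GPT-4o-mini
# for intent classification, not keyword matching.
MUSIC_KEYWORDS = ("album", "track", "song", "artist", "genre")

def route(query: str) -> str:
    """Send catalog questions to the Music Expert, everything else to Support."""
    q = query.lower()
    return "music_expert" if any(k in q for k in MUSIC_KEYWORDS) else "support_rep"

print(route("What albums does AC/DC have?"))  # music_expert
```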
- Python 3.12+ (or install it with `uv python install 3.12`)
- uv package manager
- OpenAI API key (or Anthropic/Google)
- Grafana Cloud account (free tier works great)
Clone and install:
```shell
git clone https://github.com/scarolan/music_store_assistant
cd music_store_assistant
uv sync
```

Download the Chinook database:

```shell
curl -o Chinook.db https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite
```

Create a `.env` file with your configuration (see `.env.example` for full options):
```shell
# Required: LLM Provider
OPENAI_API_KEY=your-openai-key-here

# Required: OTEL Tracing to Grafana Cloud
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-us-central-0.grafana.net/otlp
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic%20<your-base64-credentials>
OTEL_SERVICE_NAME=music-store-assistant
```

Getting OTEL credentials:
- Go to Grafana Cloud → Connections → OpenTelemetry
- Copy Instance ID and generate API token
- Base64 encode: `echo -n "instance_id:api_token" | base64`
- Use the format: `Authorization=Basic%20<result>`
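The same header value can be built in Python, which sidesteps shell quoting mistakes. A convenience sketch — the helper name is made up, and the arguments shown are placeholders, not real credentials:

```python
import base64

def otlp_basic_auth_header(instance_id: str, api_token: str) -> str:
    """Build the OTEL_EXPORTER_OTLP_HEADERS value, URL-encoding the space as %20."""
    creds = base64.b64encode(f"{instance_id}:{api_token}".encode()).decode()
    return f"Authorization=Basic%20{creds}"

print(otlp_basic_auth_header("123456", "glc_example_token"))
```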
```shell
uv run uvicorn src.api:app --host 0.0.0.0 --port 8080
```

Then open:
- Customer chat: http://localhost:8080
- Admin dashboard (HITL approvals): http://localhost:8080/admin
Import the pre-built dashboard to visualize your LLM application metrics:
- In Grafana Cloud, go to Dashboards → New → Import
- Click Upload dashboard JSON file
- Select `llm_o11y_dashboard.json` from this repository
- Choose your Prometheus and Tempo data sources
- Click Import
The dashboard provides:
- Token usage and costs by agent, model, and conversation
- Performance metrics (latency P50/P95/P99, request rates)
- Error tracking with failure rates and types
- Model distribution showing which models handle requests
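The cost side of these panels boils down to tokens times price. A back-of-the-envelope sketch, using illustrative per-million-token prices (assumptions for the example, not the dashboard's actual rates):

```python
# Illustrative per-1M-token USD prices (assumptions, not the repo's rates).
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single LLM call from its token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

print(round(estimate_cost("gpt-4o-mini", 1200, 300), 6))
```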
Explore individual conversation traces:
- Go to your Grafana Cloud instance → Explore → Tempo
- Query: `{service.name="music-store-assistant"}`
- Click on any trace to see the full execution flow:
- Supervisor routing decisions
- Agent selection (Music Expert vs Support Rep)
- Tool executions with inputs/outputs
- LLM calls with token counts
- Complete conversation hierarchy
Models are configured via environment variables with provider auto-detection:
| Agent | Default Model | Why |
|---|---|---|
| Supervisor | gpt-4o-mini | Fast routing decisions |
| Music Expert | gpt-4o-mini | Consistent, reliable responses |
| Support Rep | gpt-4o-mini | Reliable for account operations |
```shell
# Override any model:
export MUSIC_EXPERT_MODEL=claude-3-5-haiku-20241022  # Anthropic
export MUSIC_EXPERT_MODEL=gpt-4o                     # OpenAI
export MUSIC_EXPERT_MODEL=deepseek-chat              # Budget option
```

Available env vars: `SUPERVISOR_MODEL`, `MUSIC_EXPERT_MODEL`, `SUPPORT_REP_MODEL`
Provider is auto-detected from model name prefix (gpt-*, claude-*, gemini-*, deepseek-*).
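The auto-detection amounts to a prefix lookup. A hypothetical reconstruction — the real factory lives in `src/graph.py` and may differ in names and details:

```python
# Hypothetical sketch of prefix-based provider detection;
# the actual model factory in src/graph.py may differ.
PREFIXES = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
    "deepseek-": "deepseek",
}

def detect_provider(model_name: str) -> str:
    for prefix, provider in PREFIXES.items():
        if model_name.startswith(prefix):
            return provider
    raise ValueError(f"Unknown model prefix: {model_name}")

print(detect_provider("claude-3-5-haiku-20241022"))  # anthropic
```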
Quick Start (launches server + continuous traffic generation):
```shell
demo/start_demo.sh  # Starts server + generates traffic for 30 minutes
demo/stop_demo.sh   # Stops everything cleanly
```

Manual Start:

```shell
uv run uvicorn src.api:app --reload --host 0.0.0.0 --port 8000
```

Then open http://localhost:8000 for the customer chat interface, or http://localhost:8000/admin for the HITL approval dashboard.
Validate Setup:
```shell
demo/preflight_check.sh  # Checks all prerequisites (doesn't start anything)
```

Monitor Traffic:

```shell
tail -f /tmp/continuous-traffic.log  # Watch continuous traffic generation
```

Use the graph directly from Python:

```python
from src.graph import create_graph

graph = create_graph()

# customer_id is passed via context= (secure, not in state)
result = graph.invoke(
    {"messages": [("user", "What albums does AC/DC have?")]},
    config={},
    context={"customer_id": 16},
)
```

Or use the interactive CLI:

```shell
uv run python -m src.cli
```

Run the full test suite:
```shell
uv run pytest
```

Run specific tests:

```shell
uv run pytest -v -k test_refund  # HITL flow tests
uv run pytest -v -k test_music   # Music expert tests
```

This is a demonstration project for Grafana Labs. Issues and pull requests are welcome!
MIT License - see LICENSE file for details.
- Issues: GitHub Issues
- Grafana Community: Community Slack
- Contact: Sean Carolan (@scarolan)
Once running, you'll have full visibility into your LLM application:
- Token usage and costs - Track spend per conversation, agent, and model
- Performance metrics - P50/P95/P99 latency for each operation
- Trace hierarchy - See every LLM call, tool execution, and state transition
- Error tracking - Identify and debug failures with full context
- Custom dashboard - Pre-built panels for key metrics
Try these in the chat interface:
- "What albums does AC/DC have?" (routes to Music Expert)
- "Show me my recent orders" (routes to Support Rep)
- "I want a refund for invoice 98" (triggers HITL approval)
Watch the traces appear in Grafana in real-time!
```
├── src/
│   ├── graph.py                # LangGraph definition + model factory
│   ├── state.py                # State schemas
│   ├── api.py                  # FastAPI backend + HITL management
│   ├── cli.py                  # Interactive CLI
│   ├── otel.py                 # 🔭 OpenTelemetry configuration + filtering
│   ├── utils.py                # Database utilities
│   └── tools/
│       ├── music.py            # Read-only catalog tools
│       └── support.py          # Sensitive write tools (HITL)
├── static/
│   ├── index.html              # Customer chat interface
│   └── admin.html              # HITL approval dashboard
├── tests/                      # Pytest suite (80+ tests)
├── llm_o11y_dashboard.json     # 📈 Grafana dashboard (import me!)
├── CLAUDE.md                   # AI assistant context guide
├── Chinook.db                  # SQLite music catalog
└── .env.example                # Configuration template
```
- CLAUDE.md - Comprehensive codebase guide for AI assistants
- ARCHITECTURE.md - Detailed system architecture and patterns
- Supervisor/Router: Intent classification and routing
- Specialized agents: Music Expert (read-only) and Support Rep (with HITL)
- Tool calling: Database queries and business logic
- Human-in-the-Loop: Approval workflow for sensitive operations
- OpenTelemetry instrumentation: Auto-instrumentation with OpenInference
- Attribute filtering: Keeps traces lean (5-10KB vs 50-100KB raw)
- Grafana Cloud export: OTLP/HTTP to Tempo
- Pre-built dashboard: Token costs, latency, errors, model distribution
- Context schema: Customer ID passed securely (not in LLM-accessible state)
- Scoped queries: Tools automatically filter by authenticated customer
- HITL gate: Only sensitive operations require approval
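The scoped-query idea can be sketched as a tool that takes the customer ID from runtime context rather than from model output, so the LLM cannot ask for another customer's data. Illustrative only — the real tools live in `src/tools/support.py` — though the `Invoice` columns below do match the Chinook schema:

```python
import sqlite3

def get_invoices(conn: sqlite3.Connection, context: dict) -> list[tuple]:
    """Return invoices only for the customer authenticated in context."""
    return conn.execute(
        "SELECT InvoiceId, Total FROM Invoice WHERE CustomerId = ?",
        (context["customer_id"],),  # injected by the app, never by the LLM
    ).fetchall()

# Demo against an in-memory table shaped like Chinook's Invoice table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (InvoiceId INTEGER, CustomerId INTEGER, Total REAL)")
conn.executemany(
    "INSERT INTO Invoice VALUES (?, ?, ?)",
    [(98, 16, 13.86), (99, 17, 8.91)],
)
print(get_invoices(conn, {"customer_id": 16}))  # only customer 16's rows
```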