Skip to content

Latest commit

 

History

History
91 lines (63 loc) · 2.84 KB

File metadata and controls

91 lines (63 loc) · 2.84 KB

Chat Mode — Interactive AI-Powered CLI for Agent Testing

Problem: CLI commands are hard to remember, especially when you're iterating quickly on agent tests.

Solution: EvalView's chat mode lets you interact with your test suite using natural language. Ask "did my refactor break anything?" and get inline answers with regression comparisons.

Don't remember commands? Just ask.

EvalView Chat Demo

evalview chat

Chat mode understands natural language AND knows all EvalView commands:

  • "Run my stock analysis test" → Suggests /run stock-test.yaml
  • "Compare yesterday's run with today" → Runs /compare for you
  • "What adapters do I have?" → Lists available adapters
  • "Show me the trace from my last test" → Displays execution trace

Slash Commands

Command Description
/run <file> Run a test case against its adapter
/test <adapter> <query> Quick ad-hoc test against an adapter
/compare <old> <new> Compare two test runs, detect regressions
/adapters List available adapters
/trace View execution trace from last run
/help Show all commands

Natural Language Execution

When the LLM suggests a command, it asks if you want to run it:

You: How do I test my LangGraph agent?

Claude: To test your LangGraph agent, you can run:
  `/test langgraph "What's the weather?"`

Would you like me to run this command? [y/n]

Provider Options

Free & local — powered by Ollama. No API key needed.

evalview chat                     # Auto-detects Ollama
evalview chat --provider openai   # Or use cloud models
evalview chat --provider anthropic

Quick Comparison

Use the /compare command for side-by-side regression detection:

evalview chat
> /compare .evalview/results/old.json .evalview/results/new.json

Output:

┌─────────────────┬───────────┬───────────┬────────┬──────────┐
│ Test            │ Old Score │ New Score │ Δ      │ Status   │
├─────────────────┼───────────┼───────────┼────────┼──────────┤
│ stock-analysis  │ 92.5      │ 94.0      │ +1.5   │ ✅ OK    │
│ customer-support│ 88.0      │ 71.0      │ -17.0  │ REGR  │
└─────────────────┴───────────┴───────────┴────────┴──────────┘

Related Documentation