---
marp: true
theme: vibeminds
paginate: true
style: |
  /* Mermaid diagram styling */
  .mermaid-container { display: flex; justify-content: center; align-items: center; width: 100%; margin: 0.5em 0; }
  .mermaid { text-align: center; }
  .mermaid svg { max-height: 280px; width: auto; }
  .mermaid .node rect, .mermaid .node polygon { rx: 5px; ry: 5px; }
  .mermaid .nodeLabel { padding: 0 10px; }
  /* Two-column layout */
  .columns { display: flex; gap: 40px; align-items: flex-start; }
  .column-left { flex: 1; }
  .column-right { flex: 1; }
  .column-left .mermaid svg { min-height: 400px; height: auto; max-height: 500px; }
  /* Section divider slides */
  section.section-divider { display: flex; flex-direction: column; justify-content: center; align-items: center; text-align: center; background: linear-gradient(135deg, #1a1a3e 0%, #4a3f8a 50%, #2d2d5a 100%); }
  section.section-divider h1 { font-size: 3.5em; margin-bottom: 0.2em; }
  section.section-divider h2 { font-size: 1.5em; color: #b39ddb; font-weight: 400; }
  section.section-divider p { font-size: 1.1em; color: #9575cd; margin-top: 1em; }
---

<script type="module">
import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs';
mermaid.initialize({
  startOnLoad: true,
  theme: 'dark',
  themeVariables: {
    background: 'transparent',
    primaryColor: '#7c4dff',
    primaryTextColor: '#e8eaf6',
    primaryBorderColor: '#667eea',
    lineColor: '#b39ddb',
    secondaryColor: '#302b63',
    tertiaryColor: '#24243e'
  }
});
</script>

Statistics Agent Team

Building a Multi-Agent System for Verified Statistics

A Production-Ready Implementation

Built with Google ADK, Eino, and Multi-LLM Support


Section 1

Introduction & Problem Statement

Understanding the challenge of verified statistics


The Problem 🎯

Challenge: Finding verified, numerical statistics from reputable web sources

Pain Points

  • ❌ LLMs hallucinate statistics and sources
  • ❌ URLs from LLM memory are often outdated or wrong
  • ❌ No verification that excerpts actually exist
  • ❌ Hard to distinguish reputable vs unreliable sources

Goal: Build a system that provides provably accurate statistics


Requirements 📋

Functional Requirements

  • ✅ Search web for statistics on any topic
  • ✅ Extract numerical values with context
  • ✅ Verify excerpts exist in source documents
  • ✅ Validate numerical accuracy
  • ✅ Prioritize reputable sources (.gov, .edu, research orgs)

Non-Functional Requirements

  • ✅ 60-90% verification rate (vs 0% for direct LLM)
  • ✅ Response time: under 60 seconds
  • ✅ Support multiple LLM providers
  • ✅ Containerized deployment

Section 2

Architecture & Agent Design

Four specialized agents working together


Architecture Overview 🏗️

```mermaid
flowchart LR
    A["User/CLI"] --> B["Orchestrator
⚙️ Graph
:8000 | :9000"]
    M["AI Assistant"] -.->|optional| MCP["MCP Server
🔌 Protocol"]
    MCP -.-> B
    B -->|HTTP or A2A| C["Research
⚡ Tool
:8001 | :9001"]
    B -->|HTTP or A2A| D["Synthesis
🧠 LLM
:8004 | :9004"]
    B -->|HTTP or A2A| E["Verification
🧠 LLM
:8002 | :9002"]
    C --> F["URLs"]
    D --> G["Statistics"]
    E --> H["Verified"]
    classDef agent fill:#00bfa5,stroke:#00897b,color:#fff
    class B,C,D,E agent
```

4 Specialized Agents with dual protocol support:

  1. Research - Tool-based (Search API) - HTTP :8001 | A2A :9001
  2. Synthesis - LLM-based extraction - HTTP :8004 | A2A :9004
  3. Verification - LLM-based validation - HTTP :8002 | A2A :9002
  4. Orchestration - Graph-based workflow - HTTP :8000 | A2A :9000

Orchestration Pattern 🔀

Graph-Based Orchestration (not inter-agent communication)

```mermaid
flowchart LR
    O["Orchestrator"] -->|"1. search"| R["Research"]
    R -->|"URLs"| O
    O -->|"2. extract"| S["Synthesis"]
    S -->|"candidates"| O
    O -->|"3. validate"| V["Verification"]
    V -->|"verified"| O
```

What This Means

  • ✅ Hub-and-spoke: Orchestrator coordinates all communication
  • ✅ Sequential pipeline: Predictable execution order
  • ✅ Easy to debug: Clear data flow, reproducible behavior
  • ❌ No peer-to-peer: Agents don't message each other directly
  • ❌ No negotiation: No agent-to-agent collaboration protocols

Trade-off: Predictability over flexibility (right choice for production)


Dual Protocol: HTTP + A2A 🔗

Every agent exposes both protocols simultaneously

| Protocol | Ports | Purpose |
|----------|-------|---------|
| HTTP | 800x | Custom security (SPIFFE, KYA, XAA), observability |
| A2A | 900x | Standard agent interoperability (Google protocol) |

A2A Endpoints per Agent

  • GET /.well-known/agent-card.json - Agent discovery
  • POST /invoke - JSON-RPC execution

Why Both?

  • ✅ A2A: Standard protocol, agent discovery, interoperability
  • ✅ HTTP: Flexibility for security layers, LLM observability
  • ✅ Compare: Evaluate implementation complexity side-by-side

Configuration: A2A_ENABLED=true activates A2A servers
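To make A2A discovery concrete, here is a minimal sketch of decoding an agent card fetched from `/.well-known/agent-card.json`. The struct covers only a few fields of the card schema, and the sample payload (the Verification agent on its 900x port) is illustrative, not copied from the project:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal view of an A2A agent card; treat this shape as
// illustrative rather than the full card schema.
type AgentCard struct {
	Name        string  `json:"name"`
	Description string  `json:"description"`
	URL         string  `json:"url"`
	Skills      []Skill `json:"skills"`
}

type Skill struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// parseAgentCard decodes the JSON body served at
// /.well-known/agent-card.json.
func parseAgentCard(data []byte) (*AgentCard, error) {
	var card AgentCard
	if err := json.Unmarshal(data, &card); err != nil {
		return nil, err
	}
	return &card, nil
}

func main() {
	// Hypothetical card for the Verification agent on its A2A port.
	raw := []byte(`{
		"name": "verification",
		"description": "Validates statistics against sources",
		"url": "http://localhost:9002",
		"skills": [{"id": "verify", "name": "Verify statistic"}]
	}`)
	card, err := parseAgentCard(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(card.Name, len(card.Skills))
}
```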


Agent Types Summary 🧩

| Agent | Type | Technology | Why? |
|-------|------|------------|------|
| Orchestrator | ⚙️ Graph | Eino workflow | Deterministic, predictable |
| Research | ⚡ Tool | Serper/SerpAPI | No reasoning needed |
| Synthesis | 🧠 LLM | Gemini/Claude/etc | Language understanding |
| Verification | 🧠 LLM | Gemini/Claude/etc | Fuzzy text matching |

Key Insight: Use the right tool for each job

  • ❌ Don't force everything through an LLM
  • ✅ Graph for coordination (fast, predictable)
  • ✅ Tool for API calls (simple, reliable)
  • ✅ LLM for language tasks (intelligent, flexible)

Frameworks: ADK Now Justified ✅

Original Question: ADK is for inter-agent communication. Do we need it?

Answer: Yes! A2A protocol support requires ADK.

| Agent | ADK Role | A2A Benefit |
|-------|----------|-------------|
| Synthesis | LLM + A2A server | Standard invocation |
| Verification | LLM + A2A server | Agent discovery |
| Research | Tool wrapper + A2A | Interoperability |
| Orchestrator | Eino wrapped in ADK | A2A compatibility |

What ADK Provides for A2A

  • adka2a.NewExecutor() - Bridges ADK to A2A protocol
  • adka2a.BuildAgentSkills() - Generates agent card skills
  • remoteagent.NewA2A() - A2A client for calling remote agents

Verdict: ADK is the right choice. A2A support justifies the framework.


Agent 1: Research Agent 🔍

Responsibility: Find relevant web sources

Implementation (Google ADK)

  • No LLM required (pure search)
  • Integrates with Serper/SerpAPI via omniserp library
  • Filters for reputable domains
  • Returns 30 URLs by default

Key Decision: Separate search from extraction

  • Allows caching of search results
  • Different providers don't need LLM changes
  • Faster iteration on search queries

Port: 8001


Agent 2: Synthesis Agent 📊

Responsibility: Extract statistics from web pages

Implementation (Google ADK)

  • Fetches webpage content (30K chars per page)
  • LLM analyzes text for numerical statistics
  • Extracts verbatim excerpts
  • Processes 15+ pages for comprehensive coverage
  • Returns candidates with metadata

Key Challenge: Getting complete extraction

  • ❌ Initial: Only returned 5-8 statistics
  • ✅ Solution: Increased pages (5→15), content (15K→30K), multiplier (2x→5x)

Port: 8004


Synthesis Agent: Key Learnings 💡

Problem: Low statistical yield (5-8 stats vs ChatGPT's 20+)

Root Cause Analysis

  • Too few pages processed (only 5)
  • Too little content per page (15K chars)
  • Too conservative multiplier (2x)

Solution - Aggressive extraction:

```go
minPagesToProcess := 15  // increased from 5
maxContentLen := 30000   // increased from 15K
multiplier := 5          // increased from 2x
```

Result: Now matches ChatGPT.com performance!


Agent 3: Verification Agent ✅

Responsibility: Validate statistics against sources

Implementation (Google ADK)

  • Re-fetches source URLs
  • Checks excerpts exist verbatim
  • Validates numerical values match exactly
  • Uses light LLM assistance for fuzzy matching
  • Returns pass/fail with detailed reasons

Key Decision: Always fetch original source

  • No trusting LLM claims
  • Catches hallucinations
  • Verifies pages haven't changed

Port: 8002
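The deterministic part of the verification step can be sketched as a pure function: the excerpt must appear verbatim in the fetched page, and the claimed numeric value must appear inside the excerpt. This is an illustrative sketch, not the agent's actual code; the fuzzy matching for reflowed whitespace or unit variants is what the light LLM pass handles:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// verifyExcerpt performs the strict checks: verbatim excerpt match
// against the page, then the numeric value inside the excerpt.
func verifyExcerpt(pageText, excerpt string, value float64) (bool, string) {
	if !strings.Contains(pageText, excerpt) {
		return false, "excerpt not found verbatim in source"
	}
	// Format the value without trailing zeros (1.1, not 1.10).
	v := strconv.FormatFloat(value, 'f', -1, 64)
	if !strings.Contains(excerpt, v) {
		return false, fmt.Sprintf("value %s not present in excerpt", v)
	}
	return true, "ok"
}

func main() {
	page := "Global surface temperature has increased by approximately 1.1C since pre-industrial times."
	ok, reason := verifyExcerpt(page, "increased by approximately 1.1C", 1.1)
	fmt.Println(ok, reason)
}
```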


Agent 4: Orchestration Agent 🎭

Two Implementations Available

Option A: Google ADK (LLM-driven)

  • Uses LLM to decide workflow steps
  • Adaptive retry logic
  • More flexible but slower

Option B: Eino (Deterministic) ⭐ RECOMMENDED

  • Type-safe graph-based workflow
  • Predictable, reproducible behavior
  • Faster and lower cost
  • No LLM for orchestration decisions

Both run on Port 8000 (choose one)


Eino Orchestration Flow 🔄

```mermaid
flowchart TB
    A["ValidateInput"] --> B["Research
30 URLs"]
    B --> C["Synthesis
15+ pages -> candidates"]
    C --> D["Verification
validate each"]
    D --> E["QualityCheck
>= min verified?"]
    E --> F["FormatOutput"] --> G["User"]
```

Why Eino?

  • Type-safe operations
  • No non-deterministic LLM decisions
  • Easier to debug and test
  • Production-ready reliability

Benefits:

  • Predictable execution path
  • No hidden LLM costs for orchestration
  • Easy to trace and monitor
  • Reproducible results

Section 3

Technical Challenges & Solutions

From LLM hallucinations to multi-provider support


Challenge 1: Direct Mode Failure ⚠️

Initial Idea: Let LLM answer from memory (like ChatGPT)

Implementation

```bash
./stats-agent search "AI trends" --direct
```

The Problem

  • LLM returns statistics from training data (up to Jan 2025)
  • URLs are guessed - not from real search
  • Pages have moved, changed, or are paywalled
  • 0% verification rate when validated

The Lesson: Real-time web search is essential for statistics


Direct Mode vs ChatGPT.com 📊

Same Query: "AI trends"

| System | Statistics Found | Verification Rate | Why? |
|--------|------------------|-------------------|------|
| ChatGPT.com | 20+ | ✅ 90%+ | Real-time Bing search |
| Direct Mode | 10 | ❌ 0% | LLM memory (outdated URLs) |
| Pipeline Mode | 15-25 | ✅ 60-90% | Real-time Google search |

Key Insight: ChatGPT.com's success comes from web search, not just LLM quality!

Our Solution: Pipeline mode with Serper/SerpAPI


Solution: Pipeline Mode ✅

What We Changed

  • Made Pipeline mode the default
  • Added warnings to Direct mode docs
  • Implemented hybrid mode (Direct + Verification)

README Warning

⚠️ Direct Mode - Not Recommended for Statistics
- ❌ Uses LLM memory (training data)
- ❌ Outdated URLs
- ❌ 0% verification rate

✅ For statistics, use Pipeline mode instead

Result: Clear expectations, better user experience


Challenge 2: Multi-LLM Support 🔧

Requirement: Support multiple LLM providers

Supported Providers

  • Google Gemini (default) - gemini-2.5-flash / gemini-2.5-pro
  • Anthropic Claude - claude-sonnet-4-20250514 / claude-opus-4-1-20250805
  • OpenAI - gpt-4o / gpt-5
  • xAI Grok - grok-4-1-fast-reasoning / grok-4-1-fast-non-reasoning
  • Ollama - llama3:8b / mistral:7b (local)

Challenge: Each provider has different APIs, models, rate limits

Solution: Abstraction via omnillm library


Multi-LLM Implementation 🔧

Factory Pattern in pkg/llm/factory.go:

```go
func CreateLLM(cfg *config.Config) (*genai.Client, string, error) {
    switch cfg.LLMProvider {
    case "gemini":
        return createGeminiClient(cfg)
    case "claude":
        return createClaudeClient(cfg)
    case "openai":
        return createOpenAIClient(cfg)
    case "xai":
        return createXAIClient(cfg)
    case "ollama":
        return createOllamaClient(cfg)
    default:
        return nil, "", fmt.Errorf("unsupported provider: %s", cfg.LLMProvider)
    }
}
```

Benefit: Agents are provider-agnostic


LLM Configuration Example 💻

Simple Environment Variables

```bash
# Use Gemini (default)
export GOOGLE_API_KEY="your-key"

# Switch to Claude
export LLM_PROVIDER="claude"
export ANTHROPIC_API_KEY="your-key"

# Switch to local Ollama
export LLM_PROVIDER="ollama"
export OLLAMA_URL="http://localhost:11434"
export LLM_MODEL="llama3:8b"
```

No code changes required!
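The provider-selection side of this can be sketched as a small resolver: apply the `gemini` default, then require the matching key variable. The map below mirrors the env var names shown above; `OPENAI_API_KEY` and `XAI_API_KEY` are assumed by analogy, not taken from the project. `getenv` is injected so the logic is testable without touching the real environment:

```go
package main

import (
	"fmt"
	"os"
)

// providerKeyVars maps each supported provider to the environment
// variable assumed to hold its API key (Ollama is local, no key).
var providerKeyVars = map[string]string{
	"gemini": "GOOGLE_API_KEY",
	"claude": "ANTHROPIC_API_KEY",
	"openai": "OPENAI_API_KEY", // assumed name
	"xai":    "XAI_API_KEY",    // assumed name
	"ollama": "",
}

// resolveProvider applies the default provider and validates that
// the matching API key is present.
func resolveProvider(getenv func(string) string) (string, error) {
	provider := getenv("LLM_PROVIDER")
	if provider == "" {
		provider = "gemini" // documented default
	}
	keyVar, ok := providerKeyVars[provider]
	if !ok {
		return "", fmt.Errorf("unsupported provider: %s", provider)
	}
	if keyVar != "" && getenv(keyVar) == "" {
		return "", fmt.Errorf("%s requires %s to be set", provider, keyVar)
	}
	return provider, nil
}

func main() {
	p, err := resolveProvider(os.Getenv)
	fmt.Println(p, err)
}
```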


Challenge 3: Search Providers 🔍

Requirement: Support multiple search providers

Options

  • Serper API - $50/month, 5K queries (recommended)
  • SerpAPI - Alternative with different pricing
  • Mock - For development without API keys

Challenge: Different APIs, different response formats

Solution: omniserp library abstraction

```go
// Unified interface - works with any provider
result, err := searchClient.SearchNormalized(ctx, params)
```

Challenge 4: Security 🔒

Initial Design: Client-side LLM (❌ Bad)

```bash
# Client needs API key!
export GOOGLE_API_KEY="key"
./stats-agent search "topic" --direct
```

Problem

  • Clients need API keys (security risk)
  • Hard to update prompts
  • No centralized rate limiting

Solution: Server-side Direct Agent (✅ Good)

  • Direct Agent server on port 8005
  • Client makes HTTP requests
  • Server holds API keys
  • Centralized control

Direct Agent Implementation 🌐

Built with Huma v2 + Chi router

  • OpenAPI 3.1 automatic generation
  • Interactive Swagger UI at /docs
  • Type-safe request/response handling
  • Proper HTTP timeouts

Example

```go
type DirectSearchInput struct {
    Body struct {
        Topic    string `json:"topic" minLength:"1"`
        MinStats int    `json:"min_stats" minimum:"1"`
    }
}

huma.Register(api, operation, handler)
```

Port 8005 - Production-ready with docs!


Challenge 5: JSON Numbers 🔢

The Bug

```
{
  "value": 2,537  // ❌ Invalid JSON!
}
```

Root Cause: LLM formats numbers like humans (2,537)

The Fix - Explicit prompt instructions:

```
CRITICAL: The "value" field must be a plain number
with NO commas (e.g., 2537 not 2,537)

REMEMBER: Numbers like 75,000 should be written
as 75000 (no comma).
```

Result: Valid JSON every time! ✅
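Prompt instructions are the primary fix; as a belt-and-suspenders measure, a crude post-processor can strip digit-group commas before retrying `json.Unmarshal`. This is an illustrative fallback sketch, not the project's code, and it is deliberately run only after a parse failure because it would also touch comma-grouped digits inside string values:

```go
package main

import (
	"fmt"
	"regexp"
)

var thousands = regexp.MustCompile(`(\d),(\d{3})`)

// stripThousandsSeparators rewrites 2,537 as 2537 and 1,234,567 as
// 1234567. Looping handles multiple groups in one number.
func stripThousandsSeparators(s string) string {
	for {
		out := thousands.ReplaceAllString(s, "$1$2")
		if out == s {
			return out
		}
		s = out
	}
}

func main() {
	fmt.Println(stripThousandsSeparators(`{"value": 2,537}`))     // {"value": 2537}
	fmt.Println(stripThousandsSeparators(`{"value": 1,234,567}`)) // {"value": 1234567}
}
```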


Prompt Engineering Lessons 📝

Problem: LLM returns 1-2 statistics, stops

Bad Prompt

```
Find statistics about climate change.
```

Good Prompt

```
Extract EVERY statistic you find, not just one or two.
Be thorough and comprehensive.

If the page contains 10 statistics, return 10 items in the array.

Return empty array [] ONLY if absolutely no statistics are found.
```

Impact: 2-3x more statistics extracted per page


Section 4

Deployment & Integration

From local development to production


Deployment Architecture 🐳

Two Deployment Methods

Local Development

```bash
make run-all-eino  # Start all 4 agents (HTTP + A2A)
./bin/stats-agent search "topic"
```

Docker Production

```bash
docker-compose up -d  # All agents containerized
curl -X POST http://localhost:8000/orchestrate  # HTTP
# or via A2A: POST http://localhost:9000/invoke
```

Same code, same config - seamless transition!

| Agent | HTTP | A2A |
|-------|------|-----|
| Orchestrator | :8000 | :9000 |
| Research | :8001 | :9001 |
| Verification | :8002 | :9002 |
| Synthesis | :8004 | :9004 |

MCP Server Integration 🔌

Model Context Protocol support for AI tool integration

Use Case: Claude Code can search for verified statistics

```json
{
  "mcpServers": {
    "stats-agent": {
      "command": "go",
      "args": ["run", "mcp/server/main.go"]
    }
  }
}
```

Tools Available

  • search_statistics - Full pipeline search
  • verify_statistic - Single verification

Integration: Works with Claude Code, other MCP clients


Performance Metrics 📈

| Metric | Direct Mode | Pipeline Mode |
|--------|-------------|---------------|
| Verification Rate | ❌ 0-30% | ✅ 60-90% |
| Response Time | ⚡ 5-10s | ⚡ 30-60s |
| URLs Searched | 0 (LLM memory) | 30 (real search) |
| Pages Processed | 0 | 15+ |
| Cost per Query | Low | Medium |
| Accuracy | ❌ Low | ✅ High |

Sweet Spot: Pipeline mode for statistics, Direct for general Q&A


Real-World Example 🌍

Query: "climate change statistics"

Result

```json
{
  "name": "Global temperature increase",
  "value": 1.1,
  "unit": "C",
  "source": "IPCC Sixth Assessment Report",
  "source_url": "https://www.ipcc.ch/...",
  "excerpt": "Global surface temperature has increased by approximately 1.1C since pre-industrial times...",
  "verified": true
}
```

Verification: Excerpt found verbatim in source! ✅


Technology Stack ⚙️

Language & Runtime

  • Go 1.21+ - Concurrency, performance, simple deployment

Agent Frameworks

  • Google ADK - LLM agents + A2A protocol support
  • Eino - Deterministic graph orchestration
  • A2A Protocol - Agent-to-agent interoperability (Google)

API & Protocols

  • HTTP - Custom security, observability (ports 800x)
  • A2A/JSON-RPC - Standard agent invocation (ports 900x)
  • Huma v2 - OpenAPI 3.1 generation

Integrations

  • omnillm - Multi-provider LLM abstraction
  • omniobserve - Unified LLM observability (Opik, Langfuse, Phoenix)
  • omniserp - Unified search API

Key Learnings 💡

  1. Real-time search > LLM memory for current data
    • 0% vs 60-90% verification rate
  2. Verification is non-negotiable for accuracy
    • Always fetch and validate sources
  3. Separation of concerns enables optimization
    • Search, extract, verify are independent
  4. Prompt engineering matters at scale
    • Explicit completeness instructions needed
  5. Flexibility enables adoption
    • Multi-LLM, multi-search provider support

Challenges & Future Work 🚀

Current Limitations

  • ❌ Paywalled content inaccessible
  • ❌ Non-English sources need translation
  • ⚠️ Range statistics (e.g., "79-96%") need schema updates

Future Enhancements

  • ✨ Add value_max field for ranges
  • ✨ Perplexity API integration (built-in search)
  • ✨ Caching layer for search results
  • ✨ Streaming responses for faster perceived performance
  • ✨ Multi-language support

Section 5

Operations & Best Practices

Running the system in production


Complete Workflow Example 🔄

```bash
./stats-agent search "renewable energy" --min-stats 10
```

What Happens

  1. Orchestrator validates input
  2. Research searches 30 URLs via Serper
  3. Synthesis processes 15+ pages (450K+ chars total)
  4. Synthesis extracts 50+ candidate statistics
  5. Verification validates each candidate
  6. Verification returns 12 verified (60% rate)
  7. Orchestrator checks: 12 ≥ 10 ✅
  8. User receives JSON output

Total time: ~45 seconds
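The steps above can be sketched as a plain sequential pipeline. In the real system these are Eino graph nodes behind HTTP/A2A calls; the step functions here are illustrative stubs, and only the hub-and-spoke sequencing plus the quality check are the point:

```go
package main

import "fmt"

type Statistic struct {
	Name     string
	Verified bool
}

// Stubbed agent steps; the real agents do search, extraction,
// and source re-fetching respectively.
func research(topic string) []string {
	return []string{"https://example.org/a", "https://example.org/b"}
}

func synthesize(urls []string) []Statistic {
	return []Statistic{{Name: "solar capacity growth"}, {Name: "wind power share"}}
}

func verify(cands []Statistic) []Statistic {
	out := make([]Statistic, 0, len(cands))
	for _, c := range cands {
		c.Verified = true // real agent re-fetches the source here
		out = append(out, c)
	}
	return out
}

// run wires the steps in order and applies the quality check.
func run(topic string, minStats int) ([]Statistic, error) {
	urls := research(topic)          // 1-2: validate, search
	cands := synthesize(urls)        // 3-4: fetch pages, extract
	verified := verify(cands)        // 5-6: validate candidates
	if len(verified) < minStats {    // 7: quality check
		return verified, fmt.Errorf("only %d of %d verified", len(verified), minStats)
	}
	return verified, nil             // 8: output
}

func main() {
	stats, err := run("renewable energy", 2)
	fmt.Println(len(stats), err)
}
```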


Monitoring & Observability 👁️

Structured Logging at each stage:

```
Research Agent: Found 30 search results
Synthesis Agent: Extracted 8 statistics from nature.com
Synthesis Agent: Total candidates: 52 from 15 pages
Verification Agent: Verified 10/15 candidates (67%)
Orchestration: Target met (10 verified)
```

Health Checks

  • /health endpoint on each agent
  • Docker health checks in production
  • Timeout monitoring (60s max)

LLM Observability (via OmniObserve)

  • Automatic tracing of all LLM calls
  • Token usage and cost tracking
  • Supports: Comet Opik, Langfuse, Arize Phoenix

Metrics to Track

  • Verification rate per query
  • Average response time
  • Cost per query (API calls)

Developer Experience 👩‍💻

Simple Commands

```bash
# Install dependencies
make install

# Build all agents
make build

# Run everything (Eino orchestrator)
make run-all-eino

# Run direct + verification only
make run-direct-verify

# Run tests
make test
```

Clean Abstractions: Agents don't know about each other's internals

Easy Debugging: Run individual agents in separate terminals


Configuration Management ⚙️

Environment-Based

```bash
# .env file
LLM_PROVIDER=gemini
GOOGLE_API_KEY=your-key
SEARCH_PROVIDER=serper
SERPER_API_KEY=your-key
```

Override per Agent

```bash
# Use different LLM for synthesis
export SYNTHESIS_LLM_PROVIDER=claude
export SYNTHESIS_LLM_MODEL=claude-sonnet-4-20250514
```

Docker-Friendly: All config via environment variables


Mode Comparison Summary 📊

| Feature | Direct | Hybrid | Pipeline |
|---------|--------|--------|----------|
| Speed | ⚡⚡⚡ 5s | ⚡⚡ 15s | ⚡ 45s |
| Accuracy | ❌ Low | ⚠️ Medium | ✅ High |
| Verification | ❌ No | ⚠️ LLM URLs | ✅ Real URLs |
| Cost | $ | $$ | $$$ |
| Use Case | Brainstorm | Quick check | Production |
| Agents Needed | 1 | 2 | 4 |

Recommendation: Pipeline mode for statistics that matter


Testing Strategy 🧪

  1. Unit Tests
    • Individual function validation
    • LLM provider factory
    • JSON parsing edge cases
  2. Integration Tests
    • Agent-to-agent communication
    • HTTP endpoint validation
    • Error handling flows
  3. End-to-End Tests
    • Complete pipeline execution
    • Verification rate validation
    • Performance benchmarks
  4. Manual Testing
    • Known statistics verification
    • Multi-provider compatibility
    • Edge case exploration

Error Handling & Resilience 🛡️

Graceful Degradation

```go
// If source unreachable, mark failed
if err := fetchURL(url); err != nil {
    return VerificationResult{
        Verified: false,
        Reason:   "Source unreachable",
    }
}
```

Retry Logic

  • HTTP retries with exponential backoff
  • Automatic quality check retries
  • Human-in-the-loop for partial results

User-Friendly Messages

  • "Found 8 of 10 requested, continue? (y/n)"
  • Clear error messages with remediation steps

Security Considerations 🔐

API Key Management

  • Environment variables only (never in code)
  • Server-side storage (clients don't need keys)
  • Per-agent key rotation possible

Input Validation

  • Topic length limits (500 chars)
  • Min/max stats bounds (1-100)
  • URL validation before fetching
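A sketch of those checks as one validation function, using the bounds listed above (500-char topic, 1-100 stats, http/https URLs only); the function name and signature are illustrative:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// validateRequest enforces the input bounds before any agent runs:
// topic length, min_stats range, and URL scheme.
func validateRequest(topic string, minStats int, sourceURL string) error {
	topic = strings.TrimSpace(topic)
	if topic == "" || len(topic) > 500 {
		return fmt.Errorf("topic must be 1-500 characters")
	}
	if minStats < 1 || minStats > 100 {
		return fmt.Errorf("min_stats must be between 1 and 100")
	}
	if sourceURL != "" {
		u, err := url.Parse(sourceURL)
		if err != nil || (u.Scheme != "http" && u.Scheme != "https") {
			return fmt.Errorf("source URL must be http or https")
		}
	}
	return nil
}

func main() {
	fmt.Println(validateRequest("renewable energy", 10, "https://www.ipcc.ch/"))
	fmt.Println(validateRequest("", 10, ""))
}
```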

Timeouts

  • HTTP request timeouts (30-60s)
  • LLM generation timeouts
  • Overall query timeout (120s)

Future: Add rate limiting, authentication


Performance Optimization ⚡

  1. Research Agent
    • Parallel URL searches where supported
    • Connection pooling for HTTP clients
  2. Synthesis Agent
    • Parallel page fetching (up to 5 concurrent)
    • Content truncation (30K chars max)
    • Efficient JSON parsing
  3. Verification Agent
    • Batch verification where possible
    • Early exit on clear failures
    • LLM only for fuzzy matching
  4. Overall
    • 45-second average for 10 verified statistics
    • Scales linearly with min_stats target
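The Synthesis agent's bounded parallelism (up to 5 concurrent fetches) is the classic counting-semaphore pattern. A minimal sketch, with the fetch function injected so the pattern stands apart from any particular HTTP client:

```go
package main

import (
	"fmt"
	"sync"
)

// fetchAll fetches all URLs with at most maxConcurrent in flight,
// preserving result order by index.
func fetchAll(urls []string, maxConcurrent int, fetch func(string) string) []string {
	results := make([]string, len(urls))
	sem := make(chan struct{}, maxConcurrent) // counting semaphore
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			results[i] = fetch(u)
		}(i, u)
	}
	wg.Wait()
	return results
}

func main() {
	urls := []string{"a", "b", "c", "d", "e", "f"}
	out := fetchAll(urls, 5, func(u string) string { return "content:" + u })
	fmt.Println(len(out), out[0])
}
```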

Code Organization 📁

```
agents/          # Each agent is independent
  ├── research/
  ├── synthesis/
  ├── verification/
  ├── direct/
  └── orchestration-eino/

pkg/             # Shared libraries
  ├── config/    # Centralized configuration
  ├── llm/       # Multi-provider factory
  ├── models/    # Shared data structures
  ├── search/    # Search abstraction
  └── direct/    # Direct search service

main.go          # CLI entry point
```

Principle: High cohesion, low coupling


Documentation Strategy 📚

README.md

  • Comprehensive setup instructions
  • Clear mode comparisons
  • Warning callouts for limitations

Code Documentation

  • Inline comments for complex logic
  • Function documentation (godoc format)
  • Architecture decision records (ADRs)

API Documentation

  • OpenAPI 3.1 specification (Huma)
  • Interactive Swagger UI at /docs
  • Example requests/responses

Presentation: Architecture overview (this!)


Extensibility & Contributions 🤝

Easy to Extend

  • Add new LLM provider: Implement omnillm interface
  • Add new search provider: Implement omniserp interface
  • Add new agent: Follow existing patterns
  • Add new verification rules: Extend verification agent

Contribution Areas

  • 🔧 New LLM providers (e.g., Perplexity)
  • 🌍 Multi-language support
  • 📊 Range statistics (value_max)
  • ⚡ Performance optimizations
  • 📚 Documentation improvements

License: MIT (permissive)


Section 6

Production & Scale

Costs, scaling, and enterprise considerations


Real-World Usage Patterns 🌐

  1. Use Case 1: Research Reports
    • Pipeline mode with --reputable-only
    • Export to JSON for analysis
    • Cite sources with URLs
  2. Use Case 2: Data Analysis
    • Bulk queries via API
    • Process results in pandas/R
    • Visualization of trends
  3. Use Case 3: AI Assistant Integration
    • MCP server with Claude Code
    • LLM asks stats-agent for verified data
    • Compose into reports
  4. Use Case 4: Quick Fact-Checking
    • Direct mode for fast lookup
    • Accept unverified for speed

Cost Analysis 💰

Per Query Costs (estimates):

| Component | Direct | Hybrid | Pipeline |
|-----------|--------|--------|----------|
| Search API | $0.00 | $0.00 | $0.02 |
| LLM Calls | $0.01 | $0.03 | $0.08 |
| **Total** | $0.01 | $0.03 | $0.10 |

Cost Drivers

  • Number of pages processed (15+)
  • LLM provider choice (Gemini < Claude < GPT-4o/GPT-5)
  • Verification attempts

Optimization: Use Gemini 2.5 Flash (fast + cheap)


Scaling Considerations 📈

Horizontal Scaling

  • Each agent scales independently
  • Load balancer per agent type
  • Stateless design enables easy scaling

Vertical Scaling

  • Increase concurrency limits
  • Larger content chunks (current: 30K)
  • More parallel page fetching

Optimizations for Scale

  • Cache search results (1 hour TTL)
  • Queue-based processing for bulk queries
  • Database for results persistence

Example: 10 orchestrators + 20 synthesis agents


Production Monitoring 📡

  1. Metrics to Collect
    • Verification rate by source domain
    • Response time percentiles (p50, p95, p99)
    • Error rate by agent
    • API cost per query
    • Throughput (queries/minute)
  2. Alerting
    • Verification rate < 50% (alert)
    • Response time > 120s (alert)
    • Agent health check failures
    • API quota exhaustion
  3. Tools
    • OmniObserve for LLM tracing (Opik, Langfuse, Phoenix) ✅
    • Prometheus for metrics (future)
    • Grafana for dashboards (future)
    • Jaeger for distributed tracing (future)

Compliance & Ethics 🌟

Responsible Web Scraping

  • Respect robots.txt
  • Rate limiting on URL fetches
  • User-Agent identification
  • No aggressive crawling

Data Privacy

  • No PII collection
  • No user query logging (optional)
  • API keys stored securely
  • GDPR compliance considerations

Source Attribution

  • Always cite original sources
  • Provide full URLs
  • Verbatim excerpts (fair use)

Ethics: Promote verified information, combat misinformation


Competitive Analysis 🏆

| System | Search | Verify | Multi-LLM | Open Source |
|--------|--------|--------|-----------|-------------|
| ChatGPT.com | ✅ Bing | ⚠️ Light | ❌ GPT only | ❌ Closed |
| Perplexity | ✅ Multiple | ⚠️ Light | ❌ Limited | ❌ Closed |
| Our System | ✅ Google | ✅ Strong | ✅ 5+ | ✅ MIT |
| Direct LLM | ❌ Memory | ❌ None | ✅ Any | N/A |

Key Differentiator: Rigorous verification + flexibility

Open Source: Community can audit, extend, trust


Migration Path 🚀

From Direct LLM Usage

```go
// Before: client-side LLM (key lives on the client)
resp, err := llmClient.Generate(ctx, "Find climate statistics")

// After: Stats Agent Direct mode (key stays server-side)
payload, _ := json.Marshal(DirectSearchRequest{Topic: "climate change", MinStats: 10})
resp, err := http.Post("http://localhost:8005/search", "application/json", bytes.NewReader(payload))
```

From Other APIs

```go
// Before: direct LLM call (no verification)
stats, err := getLLMStats(ctx, "climate statistics")

// After: Stats Agent Pipeline (verified)
payload, _ := json.Marshal(OrchestrationRequest{Topic: "climate change", MinVerifiedStats: 10})
resp, err := http.Post("http://localhost:8000/orchestrate", "application/json", bytes.NewReader(payload))
```

Roadmap 🗺️

Q1 2026

  • ✨ Perplexity API integration (built-in search)
  • ✨ Range statistics (value_max field)
  • ✨ Response streaming for faster UX

Q2 2026

  • ✨ Multi-language support (ES, FR, DE, ZH)
  • ✨ Caching layer for search results
  • ✨ GraphQL API option

Q3 2026

  • ✨ Browser extension for fact-checking
  • ✨ Notion/Confluence integrations
  • ✨ Advanced citation formats (APA, MLA)

Community Driven: Submit feature requests on GitHub!


Section 7

Conclusion

Summary, resources, and next steps


Team & Collaboration 👥

Development Approach

  • Agent-based architecture enables parallel work
  • Clear interfaces between components
  • Code reviews for quality
  • Continuous integration (GitHub Actions)

Best Practices

  • Branch protection on main
  • Required passing tests for merge
  • Semantic versioning
  • Changelog maintenance

Communication

  • Architecture decisions documented
  • Weekly sync meetings
  • GitHub issues for tracking

Lessons Learned (Summary) 💭

  1. Technical
    1. Real-time data > LLM memory for facts
    2. Verification is essential, not optional
    3. Modular architecture enables optimization
    4. Prompt engineering is critical at scale
  2. Process
    1. Clear requirements prevent scope creep
    2. Early testing reveals issues sooner
    3. Documentation enables adoption
    4. User feedback drives priorities
  3. Product
    1. Be honest about limitations (builds trust)
    2. Provide flexibility (multi-LLM, multi-search)
    3. Developer experience matters

Conclusion 🎯

What We Built

  • Production-ready statistics verification system
  • 60-90% verification rate (vs 0% for LLM alone)
  • Multi-agent architecture with clear separation
  • Flexible (multi-LLM, multi-search)
  • Open source (MIT license)

Key Success Factors

  • Real-time web search for current data
  • Rigorous verification against sources
  • Modular, extensible design
  • Comprehensive testing & documentation

Impact: Enables verified statistics for research, reporting, analysis


Get Involved! 🚀

Repository: github.com/agentplexus/stats-agent-team

Quick Start

```bash
git clone https://github.com/agentplexus/stats-agent-team
cd stats-agent-team
make install
make build
make run-all-eino
```

Contribute

  • 🐛 Report bugs
  • 💡 Suggest features
  • 📝 Improve docs
  • 🔧 Submit PRs

License: MIT (permissive, commercial-friendly)


Questions? 🤔

Contact & Resources

  • 📧 GitHub Issues for questions
  • 📚 Full documentation in README.md
  • 🔗 OpenAPI docs at localhost:8005/docs
  • 💬 Discussions tab for community chat

Thank You! 🙏

Special Thanks

  • Google ADK team
  • Eino framework contributors
  • Open source LLM providers
  • The Go community

Additional Resources 📖

Documentation

  • README.md - Setup & usage guide
  • ROADMAP.md - Planned features & enhancements
  • 4_AGENT_ARCHITECTURE.md - Architecture deep dive
  • LLM_CONFIGURATION.md - Multi-LLM setup
  • SEARCH_INTEGRATION.md - Search provider setup
  • MCP_SERVER.md - MCP integration guide
  • DOCKER.md - Container deployment

Example Queries

  • Climate change statistics
  • AI industry trends
  • Healthcare outcomes
  • Economic indicators
  • Educational metrics

Try it yourself!