---
marp: true
theme: vibeminds
paginate: true
style: |
  /* Mermaid diagram styling */
  .mermaid-container { display: flex; justify-content: center; align-items: center; width: 100%; margin: 0.5em 0; }
  .mermaid { text-align: center; }
  .mermaid svg { max-height: 280px; width: auto; }
  .mermaid .node rect, .mermaid .node polygon { rx: 5px; ry: 5px; }
  .mermaid .nodeLabel { padding: 0 10px; }
  /* Two-column layout */
  .columns { display: flex; gap: 40px; align-items: flex-start; }
  .column-left { flex: 1; }
  .column-right { flex: 1; }
  .column-left .mermaid svg { min-height: 400px; height: auto; max-height: 500px; }
  /* Section divider slides */
  section.section-divider { display: flex; flex-direction: column; justify-content: center; align-items: center; text-align: center; background: linear-gradient(135deg, #1a1a3e 0%, #4a3f8a 50%, #2d2d5a 100%); }
  section.section-divider h1 { font-size: 3.5em; margin-bottom: 0.2em; }
  section.section-divider h2 { font-size: 1.5em; color: #b39ddb; font-weight: 400; }
  section.section-divider p { font-size: 1.1em; color: #9575cd; margin-top: 1em; }
---

<script type="module">
import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs';
mermaid.initialize({
  startOnLoad: true,
  theme: 'dark',
  themeVariables: {
    background: 'transparent',
    primaryColor: '#7c4dff',
    primaryTextColor: '#e8eaf6',
    primaryBorderColor: '#667eea',
    lineColor: '#b39ddb',
    secondaryColor: '#302b63',
    tertiaryColor: '#24243e'
  }
});
</script>

Statistics Agent Team

Building a Multi-Agent System for Verified Statistics

A Production-Ready Implementation

Built with Google ADK, Eino, and Multi-LLM Support


Section 1

Introduction & Problem Statement

Understanding the challenge of verified statistics


The Problem 🎯

Challenge: Finding verified, numerical statistics from reputable web sources

Pain Points

  • ❌ LLMs hallucinate statistics and sources
  • ❌ URLs from LLM memory are often outdated or wrong
  • ❌ No verification that excerpts actually exist
  • ❌ Hard to distinguish reputable vs unreliable sources

Goal: Build a system that provides provably accurate statistics


Requirements 📋

Functional Requirements

  • ✅ Search web for statistics on any topic
  • ✅ Extract numerical values with context
  • ✅ Verify excerpts exist in source documents
  • ✅ Validate numerical accuracy
  • ✅ Prioritize reputable sources (.gov, .edu, research orgs)

Non-Functional Requirements

  • ✅ 60-90% verification rate (vs 0% for direct LLM)
  • ✅ Response time: under 60 seconds
  • ✅ Support multiple LLM providers
  • ✅ Containerized deployment

Section 2

Architecture & Agent Design

Four specialized agents working together


Architecture Overview 🏗️

```mermaid
flowchart LR
    A["User/CLI"] --> B["Orchestrator
⚙️ Graph
:8000 | :9000"]
    M["AI Assistant"] -.->|optional| MCP["MCP Server
🔌 Protocol"]
    MCP -.-> B
    B -->|HTTP or A2A| C["Research
⚡ Tool
:8001 | :9001"]
    B -->|HTTP or A2A| D["Synthesis
🧠 LLM
:8004 | :9004"]
    B -->|HTTP or A2A| E["Verification
🧠 LLM
:8002 | :9002"]
    C --> F["URLs"]
    D --> G["Statistics"]
    E --> H["Verified"]
    classDef agent fill:#00bfa5,stroke:#00897b,color:#fff
    class B,C,D,E agent
```

4 Specialized Agents with dual protocol support:

  1. Research - Tool-based (Search API) - HTTP :8001 | A2A :9001
  2. Synthesis - LLM-based extraction - HTTP :8004 | A2A :9004
  3. Verification - LLM-based validation - HTTP :8002 | A2A :9002
  4. Orchestration - Graph-based workflow - HTTP :8000 | A2A :9000

Orchestration Pattern 🔀

Graph-Based Orchestration (not inter-agent communication)

```mermaid
flowchart LR
    O["Orchestrator"] -->|"1. search"| R["Research"]
    R -->|"URLs"| O
    O -->|"2. extract"| S["Synthesis"]
    S -->|"candidates"| O
    O -->|"3. validate"| V["Verification"]
    V -->|"verified"| O
```

What This Means

  • ✅ Hub-and-spoke: Orchestrator coordinates all communication
  • ✅ Sequential pipeline: Predictable execution order
  • ✅ Easy to debug: Clear data flow, reproducible behavior
  • ❌ No peer-to-peer: Agents don't message each other directly
  • ❌ No negotiation: No agent-to-agent collaboration protocols

Trade-off: Predictability over flexibility (right choice for production)


Dual Protocol: HTTP + A2A 🔗

Every agent exposes both protocols simultaneously

| Protocol | Ports | Purpose |
|----------|-------|---------|
| HTTP | 800x | Custom security (SPIFFE, KYA, XAA), observability |
| A2A | 900x | Standard agent interoperability (Google protocol) |

A2A Endpoints per Agent

  • GET /.well-known/agent-card.json - Agent discovery
  • POST /invoke - JSON-RPC execution

Why Both?

  • ✅ A2A: Standard protocol, agent discovery, interoperability
  • ✅ HTTP: Flexibility for security layers, LLM observability
  • ✅ Compare: Evaluate implementation complexity side-by-side

Configuration: A2A_ENABLED=true activates A2A servers
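To make A2A discovery concrete, here is a minimal sketch of decoding an agent card fetched from `/.well-known/agent-card.json`. The struct covers only a few fields of the card schema, and the sample payload (the Verification agent on its 900x port) is illustrative, not copied from the project:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal view of an A2A agent card; treat this shape as
// illustrative rather than the full card schema.
type AgentCard struct {
	Name        string  `json:"name"`
	Description string  `json:"description"`
	URL         string  `json:"url"`
	Skills      []Skill `json:"skills"`
}

type Skill struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// parseAgentCard decodes the JSON body served at
// /.well-known/agent-card.json.
func parseAgentCard(data []byte) (*AgentCard, error) {
	var card AgentCard
	if err := json.Unmarshal(data, &card); err != nil {
		return nil, err
	}
	return &card, nil
}

func main() {
	// Hypothetical card for the Verification agent on its A2A port.
	raw := []byte(`{
		"name": "verification",
		"description": "Validates statistics against sources",
		"url": "http://localhost:9002",
		"skills": [{"id": "verify", "name": "Verify statistic"}]
	}`)
	card, err := parseAgentCard(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(card.Name, len(card.Skills))
}
```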


Agent Types Summary 🧩

| Agent | Type | Technology | Why? |
|-------|------|------------|------|
| Orchestrator | ⚙️ Graph | Eino workflow | Deterministic, predictable |
| Research | ⚡ Tool | Serper/SerpAPI | No reasoning needed |
| Synthesis | 🧠 LLM | Gemini/Claude/etc | Language understanding |
| Verification | 🧠 LLM | Gemini/Claude/etc | Fuzzy text matching |

Key Insight: Use the right tool for each job

  • ❌ Don't force everything through an LLM
  • ✅ Graph for coordination (fast, predictable)
  • ✅ Tool for API calls (simple, reliable)
  • ✅ LLM for language tasks (intelligent, flexible)

Frameworks: ADK Now Justified ✅

Original Question: ADK is for inter-agent communication. Do we need it?

Answer: Yes! A2A protocol support requires ADK.

| Agent | ADK Role | A2A Benefit |
|-------|----------|-------------|
| Synthesis | LLM + A2A server | Standard invocation |
| Verification | LLM + A2A server | Agent discovery |
| Research | Tool wrapper + A2A | Interoperability |
| Orchestrator | Eino wrapped in ADK | A2A compatibility |

What ADK Provides for A2A

  • adka2a.NewExecutor() - Bridges ADK to A2A protocol
  • adka2a.BuildAgentSkills() - Generates agent card skills
  • remoteagent.NewA2A() - A2A client for calling remote agents

Verdict: ADK is the right choice. A2A support justifies the framework.


Agent 1: Research Agent 🔍

Responsibility: Find relevant web sources

Implementation (Google ADK)

  • No LLM required (pure search)
  • Integrates with Serper/SerpAPI via omniserp library
  • Filters for reputable domains
  • Returns 30 URLs by default

Key Decision: Separate search from extraction

  • Allows caching of search results
  • Different providers don't need LLM changes
  • Faster iteration on search queries

Port: 8001


Agent 2: Synthesis Agent 📊

Responsibility: Extract statistics from web pages

Implementation (Google ADK)

  • Fetches webpage content (30K chars per page)
  • LLM analyzes text for numerical statistics
  • Extracts verbatim excerpts
  • Processes 15+ pages for comprehensive coverage
  • Returns candidates with metadata

Key Challenge: Getting complete extraction

  • ❌ Initial: Only returned 5-8 statistics
  • ✅ Solution: Increased pages (5→15), content (15K→30K), multiplier (2x→5x)

Port: 8004


Synthesis Agent: Key Learnings 💡

Problem: Low statistical yield (5-8 stats vs ChatGPT's 20+)

Root Cause Analysis

  • Too few pages processed (only 5)
  • Too little content per page (15K chars)
  • Too conservative multiplier (2x)

Solution - Aggressive extraction:

```go
minPagesToProcess := 15  // increased from 5
maxContentLen := 30000   // increased from 15K
multiplier := 5          // increased from 2x
```

Result: Now matches ChatGPT.com performance!


Agent 3: Verification Agent ✅

Responsibility: Validate statistics against sources

Implementation (Google ADK)

  • Re-fetches source URLs
  • Checks excerpts exist verbatim
  • Validates numerical values match exactly
  • Uses light LLM assistance for fuzzy matching
  • Returns pass/fail with detailed reasons

Key Decision: Always fetch original source

  • No trusting LLM claims
  • Catches hallucinations
  • Verifies pages haven't changed

Port: 8002
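The deterministic part of the verification step can be sketched as a pure function: the excerpt must appear verbatim in the fetched page, and the claimed numeric value must appear inside the excerpt. This is an illustrative sketch, not the agent's actual code; the fuzzy matching for reflowed whitespace or unit variants is what the light LLM pass handles:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// verifyExcerpt performs the strict checks: verbatim excerpt match
// against the page, then the numeric value inside the excerpt.
func verifyExcerpt(pageText, excerpt string, value float64) (bool, string) {
	if !strings.Contains(pageText, excerpt) {
		return false, "excerpt not found verbatim in source"
	}
	// Format the value without trailing zeros (1.1, not 1.10).
	v := strconv.FormatFloat(value, 'f', -1, 64)
	if !strings.Contains(excerpt, v) {
		return false, fmt.Sprintf("value %s not present in excerpt", v)
	}
	return true, "ok"
}

func main() {
	page := "Global surface temperature has increased by approximately 1.1C since pre-industrial times."
	ok, reason := verifyExcerpt(page, "increased by approximately 1.1C", 1.1)
	fmt.Println(ok, reason)
}
```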


Agent 4: Orchestration Agent 🎭

Two Implementations Available

Option A: Google ADK (LLM-driven)

  • Uses LLM to decide workflow steps
  • Adaptive retry logic
  • More flexible but slower

Option B: Eino (Deterministic) ⭐ RECOMMENDED

  • Type-safe graph-based workflow
  • Predictable, reproducible behavior
  • Faster and lower cost
  • No LLM for orchestration decisions

Both run on Port 8000 (choose one)


Eino Orchestration Flow 🔄

```mermaid
flowchart TB
    A["ValidateInput"] --> B["Research
30 URLs"]
    B --> C["Synthesis
15+ pages -> candidates"]
    C --> D["Verification
validate each"]
    D --> E["QualityCheck
>= min verified?"]
    E --> F["FormatOutput"] --> G["User"]
```

Why Eino?

  • Type-safe operations
  • No non-deterministic LLM decisions
  • Easier to debug and test
  • Production-ready reliability

Benefits:

  • Predictable execution path
  • No hidden LLM costs for orchestration
  • Easy to trace and monitor
  • Reproducible results

Section 3

Technical Challenges & Solutions

From LLM hallucinations to multi-provider support


Challenge 1: Direct Mode Failure ⚠️

Initial Idea: Let LLM answer from memory (like ChatGPT)

Implementation

```bash
./stats-agent search "AI trends" --direct
```

The Problem

  • LLM returns statistics from training data (up to Jan 2025)
  • URLs are guessed - not from real search
  • Pages have moved, changed, or are paywalled
  • 0% verification rate when validated

The Lesson: Real-time web search is essential for statistics


Direct Mode vs ChatGPT.com 📊

Same Query: "AI trends"

| System | Statistics Found | Verification Rate | Why? |
|--------|------------------|-------------------|------|
| ChatGPT.com | 20+ | ✅ 90%+ | Real-time Bing search |
| Direct Mode | 10 | ❌ 0% | LLM memory (outdated URLs) |
| Pipeline Mode | 15-25 | ✅ 60-90% | Real-time Google search |

Key Insight: ChatGPT.com's success comes from web search, not just LLM quality!

Our Solution: Pipeline mode with Serper/SerpAPI


Solution: Pipeline Mode ✅

What We Changed

  • Made Pipeline mode the default
  • Added warnings to Direct mode docs
  • Implemented hybrid mode (Direct + Verification)

README Warning

⚠️ Direct Mode - Not Recommended for Statistics
- ❌ Uses LLM memory (training data)
- ❌ Outdated URLs
- ❌ 0% verification rate

✅ For statistics, use Pipeline mode instead

Result: Clear expectations, better user experience


Challenge 2: Multi-LLM Support 🔧

Requirement: Support multiple LLM providers

Supported Providers

  • Google Gemini (default) - gemini-2.5-flash / gemini-2.5-pro
  • Anthropic Claude - claude-sonnet-4-20250514 / claude-opus-4-1-20250805
  • OpenAI - gpt-4o / gpt-5
  • xAI Grok - grok-4-1-fast-reasoning / grok-4-1-fast-non-reasoning
  • Ollama - llama3:8b / mistral:7b (local)

Challenge: Each provider has different APIs, models, rate limits

Solution: Abstraction via omnillm library


Multi-LLM Implementation 🔧

Factory Pattern in pkg/llm/factory.go:

```go
func CreateLLM(cfg *config.Config) (*genai.Client, string, error) {
    switch cfg.LLMProvider {
    case "gemini":
        return createGeminiClient(cfg)
    case "claude":
        return createClaudeClient(cfg)
    case "openai":
        return createOpenAIClient(cfg)
    case "xai":
        return createXAIClient(cfg)
    case "ollama":
        return createOllamaClient(cfg)
    default:
        return nil, "", fmt.Errorf("unsupported provider: %s", cfg.LLMProvider)
    }
}
```

Benefit: Agents are provider-agnostic


LLM Configuration Example 💻

Simple Environment Variables

```bash
# Use Gemini (default)
export GOOGLE_API_KEY="your-key"

# Switch to Claude
export LLM_PROVIDER="claude"
export ANTHROPIC_API_KEY="your-key"

# Switch to local Ollama
export LLM_PROVIDER="ollama"
export OLLAMA_URL="http://localhost:11434"
export LLM_MODEL="llama3:8b"
```

No code changes required!
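The provider-selection side of this can be sketched as a small resolver: apply the `gemini` default, then require the matching key variable. The map below mirrors the env var names shown above; `OPENAI_API_KEY` and `XAI_API_KEY` are assumed by analogy, not taken from the project. `getenv` is injected so the logic is testable without touching the real environment:

```go
package main

import (
	"fmt"
	"os"
)

// providerKeyVars maps each supported provider to the environment
// variable assumed to hold its API key (Ollama is local, no key).
var providerKeyVars = map[string]string{
	"gemini": "GOOGLE_API_KEY",
	"claude": "ANTHROPIC_API_KEY",
	"openai": "OPENAI_API_KEY", // assumed name
	"xai":    "XAI_API_KEY",    // assumed name
	"ollama": "",
}

// resolveProvider applies the default provider and validates that
// the matching API key is present.
func resolveProvider(getenv func(string) string) (string, error) {
	provider := getenv("LLM_PROVIDER")
	if provider == "" {
		provider = "gemini" // documented default
	}
	keyVar, ok := providerKeyVars[provider]
	if !ok {
		return "", fmt.Errorf("unsupported provider: %s", provider)
	}
	if keyVar != "" && getenv(keyVar) == "" {
		return "", fmt.Errorf("%s requires %s to be set", provider, keyVar)
	}
	return provider, nil
}

func main() {
	p, err := resolveProvider(os.Getenv)
	fmt.Println(p, err)
}
```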


Challenge 3: Search Providers 🔍

Requirement: Support multiple search providers

Options

  • Serper API - $50/month, 5K queries (recommended)
  • SerpAPI - Alternative with different pricing
  • Mock - For development without API keys

Challenge: Different APIs, different response formats

Solution: omniserp library abstraction

```go
// Unified interface - works with any provider
result, err := searchClient.SearchNormalized(ctx, params)
```

Challenge 4: Security 🔒

Initial Design: Client-side LLM (❌ Bad)

```bash
# Client needs API key!
export GOOGLE_API_KEY="key"
./stats-agent search "topic" --direct
```

Problem

  • Clients need API keys (security risk)
  • Hard to update prompts
  • No centralized rate limiting

Solution: Server-side Direct Agent (✅ Good)

  • Direct Agent server on port 8005
  • Client makes HTTP requests
  • Server holds API keys
  • Centralized control

Direct Agent Implementation 🌐

Built with Huma v2 + Chi router

  • OpenAPI 3.1 automatic generation
  • Interactive Swagger UI at /docs
  • Type-safe request/response handling
  • Proper HTTP timeouts

Example

```go
type DirectSearchInput struct {
    Body struct {
        Topic    string `json:"topic" minLength:"1"`
        MinStats int    `json:"min_stats" minimum:"1"`
    }
}

huma.Register(api, operation, handler)
```

Port 8005 - Production-ready with docs!


Challenge 5: JSON Numbers 🔢

The Bug

```
{
  "value": 2,537  // ❌ Invalid JSON!
}
```

Root Cause: LLM formats numbers like humans (2,537)

The Fix - Explicit prompt instructions:

```
CRITICAL: The "value" field must be a plain number
with NO commas (e.g., 2537 not 2,537)

REMEMBER: Numbers like 75,000 should be written
as 75000 (no comma).
```

Result: Valid JSON every time! ✅
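Prompt instructions are the primary fix; as a belt-and-suspenders measure, a crude post-processor can strip digit-group commas before retrying `json.Unmarshal`. This is an illustrative fallback sketch, not the project's code, and it is deliberately run only after a parse failure because it would also touch comma-grouped digits inside string values:

```go
package main

import (
	"fmt"
	"regexp"
)

var thousands = regexp.MustCompile(`(\d),(\d{3})`)

// stripThousandsSeparators rewrites 2,537 as 2537 and 1,234,567 as
// 1234567. Looping handles multiple groups in one number.
func stripThousandsSeparators(s string) string {
	for {
		out := thousands.ReplaceAllString(s, "$1$2")
		if out == s {
			return out
		}
		s = out
	}
}

func main() {
	fmt.Println(stripThousandsSeparators(`{"value": 2,537}`))     // {"value": 2537}
	fmt.Println(stripThousandsSeparators(`{"value": 1,234,567}`)) // {"value": 1234567}
}
```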


Prompt Engineering Lessons 📝

Problem: LLM returns 1-2 statistics, stops

Bad Prompt

```
Find statistics about climate change.
```

Good Prompt

```
Extract EVERY statistic you find, not just one or two.
Be thorough and comprehensive.

If the page contains 10 statistics, return 10 items in the array.

Return empty array [] ONLY if absolutely no statistics are found.
```

Impact: 2-3x more statistics extracted per page


Section 4

Deployment & Integration

From local development to production


Deployment Architecture 🐳

Two Deployment Methods

Local Development

```bash
make run-all-eino  # Start all 4 agents (HTTP + A2A)
./bin/stats-agent search "topic"
```

Docker Production

```bash
docker-compose up -d  # All agents containerized
curl -X POST http://localhost:8000/orchestrate  # HTTP
# or via A2A: POST http://localhost:9000/invoke
```

Same code, same config - seamless transition!

| Agent | HTTP | A2A |
|-------|------|-----|
| Orchestrator | :8000 | :9000 |
| Research | :8001 | :9001 |
| Verification | :8002 | :9002 |
| Synthesis | :8004 | :9004 |

MCP Server Integration 🔌

Model Context Protocol support for AI tool integration

Use Case: Claude Code can search for verified statistics

```json
{
  "mcpServers": {
    "stats-agent": {
      "command": "go",
      "args": ["run", "mcp/server/main.go"]
    }
  }
}
```

Tools Available

  • search_statistics - Full pipeline search
  • verify_statistic - Single verification

Integration: Works with Claude Code, other MCP clients


Performance Metrics 📈

| Metric | Direct Mode | Pipeline Mode |
|--------|-------------|---------------|
| Verification Rate | ❌ 0-30% | ✅ 60-90% |
| Response Time | ⚡ 5-10s | ⚡ 30-60s |
| URLs Searched | 0 (LLM memory) | 30 (real search) |
| Pages Processed | 0 | 15+ |
| Cost per Query | Low | Medium |
| Accuracy | ❌ Low | ✅ High |

Sweet Spot: Pipeline mode for statistics, Direct for general Q&A


Real-World Example 🌍

Query: "climate change statistics"

Result

```json
{
  "name": "Global temperature increase",
  "value": 1.1,
  "unit": "C",
  "source": "IPCC Sixth Assessment Report",
  "source_url": "https://www.ipcc.ch/...",
  "excerpt": "Global surface temperature has increased by approximately 1.1C since pre-industrial times...",
  "verified": true
}
```

Verification: Excerpt found verbatim in source! ✅


Technology Stack ⚙️

Language & Runtime

  • Go 1.21+ - Concurrency, performance, simple deployment

Agent Frameworks

  • Google ADK - LLM agents + A2A protocol support
  • Eino - Deterministic graph orchestration
  • A2A Protocol - Agent-to-agent interoperability (Google)

API & Protocols

  • HTTP - Custom security, observability (ports 800x)
  • A2A/JSON-RPC - Standard agent invocation (ports 900x)
  • Huma v2 - OpenAPI 3.1 generation

Integrations

  • omnillm - Multi-provider LLM abstraction
  • omniobserve - Unified LLM observability (Opik, Langfuse, Phoenix)
  • omniserp - Unified search API

Key Learnings 💡

  1. Real-time search > LLM memory for current data
    • 0% vs 60-90% verification rate
  2. Verification is non-negotiable for accuracy
    • Always fetch and validate sources
  3. Separation of concerns enables optimization
    • Search, extract, verify are independent
  4. Prompt engineering matters at scale
    • Explicit completeness instructions needed
  5. Flexibility enables adoption
    • Multi-LLM, multi-search provider support

Challenges & Future Work 🚀

Current Limitations

  • ❌ Paywalled content inaccessible
  • ❌ Non-English sources need translation
  • ⚠️ Range statistics (e.g., "79-96%") need schema updates

Future Enhancements

  • ✨ Add value_max field for ranges
  • ✨ Perplexity API integration (built-in search)
  • ✨ Caching layer for search results
  • ✨ Streaming responses for faster perceived performance
  • ✨ Multi-language support

Section 5

Operations & Best Practices

Running the system in production


Complete Workflow Example 🔄

```bash
./stats-agent search "renewable energy" --min-stats 10
```

What Happens

  1. Orchestrator validates input
  2. Research searches 30 URLs via Serper
  3. Synthesis processes 15+ pages (450K+ chars total)
  4. Synthesis extracts 50+ candidate statistics
  5. Verification validates each candidate
  6. Verification returns 12 verified (60% rate)
  7. Orchestrator checks: 12 ≥ 10 ✅
  8. User receives JSON output

Total time: ~45 seconds
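The steps above can be sketched as a plain sequential pipeline. In the real system these are Eino graph nodes behind HTTP/A2A calls; the step functions here are illustrative stubs, and only the hub-and-spoke sequencing plus the quality check are the point:

```go
package main

import "fmt"

type Statistic struct {
	Name     string
	Verified bool
}

// Stubbed agent steps; the real agents do search, extraction,
// and source re-fetching respectively.
func research(topic string) []string {
	return []string{"https://example.org/a", "https://example.org/b"}
}

func synthesize(urls []string) []Statistic {
	return []Statistic{{Name: "solar capacity growth"}, {Name: "wind power share"}}
}

func verify(cands []Statistic) []Statistic {
	out := make([]Statistic, 0, len(cands))
	for _, c := range cands {
		c.Verified = true // real agent re-fetches the source here
		out = append(out, c)
	}
	return out
}

// run wires the steps in order and applies the quality check.
func run(topic string, minStats int) ([]Statistic, error) {
	urls := research(topic)          // 1-2: validate, search
	cands := synthesize(urls)        // 3-4: fetch pages, extract
	verified := verify(cands)        // 5-6: validate candidates
	if len(verified) < minStats {    // 7: quality check
		return verified, fmt.Errorf("only %d of %d verified", len(verified), minStats)
	}
	return verified, nil             // 8: output
}

func main() {
	stats, err := run("renewable energy", 2)
	fmt.Println(len(stats), err)
}
```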


Monitoring & Observability 👁️

Structured Logging at each stage:

```
Research Agent: Found 30 search results
Synthesis Agent: Extracted 8 statistics from nature.com
Synthesis Agent: Total candidates: 52 from 15 pages
Verification Agent: Verified 10/15 candidates (67%)
Orchestration: Target met (10 verified)
```

Health Checks

  • /health endpoint on each agent
  • Docker health checks in production
  • Timeout monitoring (60s max)

LLM Observability (via OmniObserve)

  • Automatic tracing of all LLM calls
  • Token usage and cost tracking
  • Supports: Comet Opik, Langfuse, Arize Phoenix

Metrics to Track

  • Verification rate per query
  • Average response time
  • Cost per query (API calls)

Developer Experience 👩‍💻

Simple Commands

```bash
# Install dependencies
make install

# Build all agents
make build

# Run everything (Eino orchestrator)
make run-all-eino

# Run direct + verification only
make run-direct-verify

# Run tests
make test
```

Clean Abstractions: Agents don't know about each other's internals

Easy Debugging: Run individual agents in separate terminals


Configuration Management ⚙️

Environment-Based

```bash
# .env file
LLM_PROVIDER=gemini
GOOGLE_API_KEY=your-key
SEARCH_PROVIDER=serper
SERPER_API_KEY=your-key
```

Override per Agent

```bash
# Use different LLM for synthesis
export SYNTHESIS_LLM_PROVIDER=claude
export SYNTHESIS_LLM_MODEL=claude-sonnet-4-20250514
```

Docker-Friendly: All config via environment variables


Mode Comparison Summary 📊

| Feature | Direct | Hybrid | Pipeline |
|---------|--------|--------|----------|
| Speed | ⚡⚡⚡ 5s | ⚡⚡ 15s | ⚡ 45s |
| Accuracy | ❌ Low | ⚠️ Medium | ✅ High |
| Verification | ❌ No | ⚠️ LLM URLs | ✅ Real URLs |
| Cost | $ | $$ | $$$ |
| Use Case | Brainstorm | Quick check | Production |
| Agents Needed | 1 | 2 | 4 |

Recommendation: Pipeline mode for statistics that matter


Testing Strategy 🧪

  1. Unit Tests
    • Individual function validation
    • LLM provider factory
    • JSON parsing edge cases
  2. Integration Tests
    • Agent-to-agent communication
    • HTTP endpoint validation
    • Error handling flows
  3. End-to-End Tests
    • Complete pipeline execution
    • Verification rate validation
    • Performance benchmarks
  4. Manual Testing
    • Known statistics verification
    • Multi-provider compatibility
    • Edge case exploration

Error Handling & Resilience 🛡️

Graceful Degradation

```go
// If source unreachable, mark failed
if err := fetchURL(url); err != nil {
    return VerificationResult{
        Verified: false,
        Reason:   "Source unreachable",
    }
}
```

Retry Logic

  • HTTP retries with exponential backoff
  • Automatic quality check retries
  • Human-in-the-loop for partial results

User-Friendly Messages

  • "Found 8 of 10 requested, continue? (y/n)"
  • Clear error messages with remediation steps

Security Considerations 🔐

API Key Management

  • Environment variables only (never in code)
  • Server-side storage (clients don't need keys)
  • Per-agent key rotation possible

Input Validation

  • Topic length limits (500 chars)
  • Min/max stats bounds (1-100)
  • URL validation before fetching
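A sketch of those checks as one validation function, using the bounds listed above (500-char topic, 1-100 stats, http/https URLs only); the function name and signature are illustrative:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// validateRequest enforces the input bounds before any agent runs:
// topic length, min_stats range, and URL scheme.
func validateRequest(topic string, minStats int, sourceURL string) error {
	topic = strings.TrimSpace(topic)
	if topic == "" || len(topic) > 500 {
		return fmt.Errorf("topic must be 1-500 characters")
	}
	if minStats < 1 || minStats > 100 {
		return fmt.Errorf("min_stats must be between 1 and 100")
	}
	if sourceURL != "" {
		u, err := url.Parse(sourceURL)
		if err != nil || (u.Scheme != "http" && u.Scheme != "https") {
			return fmt.Errorf("source URL must be http or https")
		}
	}
	return nil
}

func main() {
	fmt.Println(validateRequest("renewable energy", 10, "https://www.ipcc.ch/"))
	fmt.Println(validateRequest("", 10, ""))
}
```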

Timeouts

  • HTTP request timeouts (30-60s)
  • LLM generation timeouts
  • Overall query timeout (120s)

Future: Add rate limiting, authentication


Performance Optimization ⚡

  1. Research Agent
    • Parallel URL searches where supported
    • Connection pooling for HTTP clients
  2. Synthesis Agent
    • Parallel page fetching (up to 5 concurrent)
    • Content truncation (30K chars max)
    • Efficient JSON parsing
  3. Verification Agent
    • Batch verification where possible
    • Early exit on clear failures
    • LLM only for fuzzy matching
  4. Overall
    • 45-second average for 10 verified statistics
    • Scales linearly with min_stats target
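The Synthesis agent's bounded parallelism (up to 5 concurrent fetches) is the classic counting-semaphore pattern. A minimal sketch, with the fetch function injected so the pattern stands apart from any particular HTTP client:

```go
package main

import (
	"fmt"
	"sync"
)

// fetchAll fetches all URLs with at most maxConcurrent in flight,
// preserving result order by index.
func fetchAll(urls []string, maxConcurrent int, fetch func(string) string) []string {
	results := make([]string, len(urls))
	sem := make(chan struct{}, maxConcurrent) // counting semaphore
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			results[i] = fetch(u)
		}(i, u)
	}
	wg.Wait()
	return results
}

func main() {
	urls := []string{"a", "b", "c", "d", "e", "f"}
	out := fetchAll(urls, 5, func(u string) string { return "content:" + u })
	fmt.Println(len(out), out[0])
}
```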

Code Organization 📁

```
agents/          # Each agent is independent
  ├── research/
  ├── synthesis/
  ├── verification/
  ├── direct/
  └── orchestration-eino/

pkg/             # Shared libraries
  ├── config/    # Centralized configuration
  ├── llm/       # Multi-provider factory
  ├── models/    # Shared data structures
  ├── search/    # Search abstraction
  └── direct/    # Direct search service

main.go          # CLI entry point
```

Principle: High cohesion, low coupling


Documentation Strategy 📚

README.md

  • Comprehensive setup instructions
  • Clear mode comparisons
  • Warning callouts for limitations

Code Documentation

  • Inline comments for complex logic
  • Function documentation (godoc format)
  • Architecture decision records (ADRs)

API Documentation

  • OpenAPI 3.1 specification (Huma)
  • Interactive Swagger UI at /docs
  • Example requests/responses

Presentation: Architecture overview (this!)


Extensibility & Contributions 🤝

Easy to Extend

  • Add new LLM provider: Implement omnillm interface
  • Add new search provider: Implement omniserp interface
  • Add new agent: Follow existing patterns
  • Add new verification rules: Extend verification agent

Contribution Areas

  • 🔧 New LLM providers (e.g., Perplexity)
  • 🌍 Multi-language support
  • 📊 Range statistics (value_max)
  • ⚡ Performance optimizations
  • 📚 Documentation improvements

License: MIT (permissive)


Section 6

Production & Scale

Costs, scaling, and enterprise considerations


Real-World Usage Patterns 🌐

  1. Use Case 1: Research Reports
    • Pipeline mode with --reputable-only
    • Export to JSON for analysis
    • Cite sources with URLs
  2. Use Case 2: Data Analysis
    • Bulk queries via API
    • Process results in pandas/R
    • Visualization of trends
  3. Use Case 3: AI Assistant Integration
    • MCP server with Claude Code
    • LLM asks stats-agent for verified data
    • Compose into reports
  4. Use Case 4: Quick Fact-Checking
    • Direct mode for fast lookup
    • Accept unverified for speed

Cost Analysis 💰

Per Query Costs (estimates):

| Component | Direct | Hybrid | Pipeline |
|-----------|--------|--------|----------|
| Search API | $0.00 | $0.00 | $0.02 |
| LLM Calls | $0.01 | $0.03 | $0.08 |
| **Total** | $0.01 | $0.03 | $0.10 |

Cost Drivers

  • Number of pages processed (15+)
  • LLM provider choice (Gemini < Claude < GPT-4o/GPT-5)
  • Verification attempts

Optimization: Use Gemini 2.5 Flash (fast + cheap)


Scaling Considerations 📈

Horizontal Scaling

  • Each agent scales independently
  • Load balancer per agent type
  • Stateless design enables easy scaling

Vertical Scaling

  • Increase concurrency limits
  • Larger content chunks (current: 30K)
  • More parallel page fetching

Optimizations for Scale

  • Cache search results (1 hour TTL)
  • Queue-based processing for bulk queries
  • Database for results persistence

Example: 10 orchestrators + 20 synthesis agents


Production Monitoring 📡

  1. Metrics to Collect
    • Verification rate by source domain
    • Response time percentiles (p50, p95, p99)
    • Error rate by agent
    • API cost per query
    • Throughput (queries/minute)
  2. Alerting
    • Verification rate < 50% (alert)
    • Response time > 120s (alert)
    • Agent health check failures
    • API quota exhaustion
  3. Tools
    • OmniObserve for LLM tracing (Opik, Langfuse, Phoenix) ✅
    • Prometheus for metrics (future)
    • Grafana for dashboards (future)
    • Jaeger for distributed tracing (future)

Compliance & Ethics 🌟

Responsible Web Scraping

  • Respect robots.txt
  • Rate limiting on URL fetches
  • User-Agent identification
  • No aggressive crawling

Data Privacy

  • No PII collection
  • No user query logging (optional)
  • API keys stored securely
  • GDPR compliance considerations

Source Attribution

  • Always cite original sources
  • Provide full URLs
  • Verbatim excerpts (fair use)

Ethics: Promote verified information, combat misinformation


Competitive Analysis 🏆

| System | Search | Verify | Multi-LLM | Open Source |
|--------|--------|--------|-----------|-------------|
| ChatGPT.com | ✅ Bing | ⚠️ Light | ❌ GPT only | ❌ Closed |
| Perplexity | ✅ Multiple | ⚠️ Light | ❌ Limited | ❌ Closed |
| Our System | ✅ Google | ✅ Strong | ✅ 5+ | ✅ MIT |
| Direct LLM | ❌ Memory | ❌ None | ✅ Any | N/A |

Key Differentiator: Rigorous verification + flexibility

Open Source: Community can audit, extend, trust


Migration Path 🚀

From Direct LLM Usage

```go
// Before: client-side LLM (key lives on the client)
resp, err := llmClient.Generate(ctx, "Find climate statistics")

// After: Stats Agent Direct mode (key stays server-side)
payload, _ := json.Marshal(DirectSearchRequest{Topic: "climate change", MinStats: 10})
resp, err := http.Post("http://localhost:8005/search", "application/json", bytes.NewReader(payload))
```

From Other APIs

```go
// Before: direct LLM call (no verification)
stats, err := getLLMStats(ctx, "climate statistics")

// After: Stats Agent Pipeline (verified)
payload, _ := json.Marshal(OrchestrationRequest{Topic: "climate change", MinVerifiedStats: 10})
resp, err := http.Post("http://localhost:8000/orchestrate", "application/json", bytes.NewReader(payload))
```

Roadmap 🗺️

Q1 2026

  • ✨ Perplexity API integration (built-in search)
  • ✨ Range statistics (value_max field)
  • ✨ Response streaming for faster UX

Q2 2026

  • ✨ Multi-language support (ES, FR, DE, ZH)
  • ✨ Caching layer for search results
  • ✨ GraphQL API option

Q3 2026

  • ✨ Browser extension for fact-checking
  • ✨ Notion/Confluence integrations
  • ✨ Advanced citation formats (APA, MLA)

Community Driven: Submit feature requests on GitHub!


Section 7

Conclusion

Summary, resources, and next steps


Team & Collaboration 👥

Development Approach

  • Agent-based architecture enables parallel work
  • Clear interfaces between components
  • Code reviews for quality
  • Continuous integration (GitHub Actions)

Best Practices

  • Branch protection on main
  • Required passing tests for merge
  • Semantic versioning
  • Changelog maintenance

Communication

  • Architecture decisions documented
  • Weekly sync meetings
  • GitHub issues for tracking

Lessons Learned (Summary) 💭

  1. Technical
    1. Real-time data > LLM memory for facts
    2. Verification is essential, not optional
    3. Modular architecture enables optimization
    4. Prompt engineering is critical at scale
  2. Process
    1. Clear requirements prevent scope creep
    2. Early testing reveals issues sooner
    3. Documentation enables adoption
    4. User feedback drives priorities
  3. Product
    1. Be honest about limitations (builds trust)
    2. Provide flexibility (multi-LLM, multi-search)
    3. Developer experience matters

Conclusion 🎯

What We Built

  • Production-ready statistics verification system
  • 60-90% verification rate (vs 0% for LLM alone)
  • Multi-agent architecture with clear separation
  • Flexible (multi-LLM, multi-search)
  • Open source (MIT license)

Key Success Factors

  • Real-time web search for current data
  • Rigorous verification against sources
  • Modular, extensible design
  • Comprehensive testing & documentation

Impact: Enables verified statistics for research, reporting, analysis


Get Involved! 🚀

Repository: github.com/agentplexus/stats-agent-team

Quick Start

```bash
git clone https://github.com/agentplexus/stats-agent-team
cd stats-agent-team
make install
make build
make run-all-eino
```

Contribute

  • 🐛 Report bugs
  • 💡 Suggest features
  • 📝 Improve docs
  • 🔧 Submit PRs

License: MIT (permissive, commercial-friendly)


Questions? 🤔

Contact & Resources

  • 📧 GitHub Issues for questions
  • 📚 Full documentation in README.md
  • 🔗 OpenAPI docs at localhost:8005/docs
  • 💬 Discussions tab for community chat

Thank You! 🙏

Special Thanks

  • Google ADK team
  • Eino framework contributors
  • Open source LLM providers
  • The Go community

Additional Resources 📖

Documentation

  • README.md - Setup & usage guide
  • ROADMAP.md - Planned features & enhancements
  • 4_AGENT_ARCHITECTURE.md - Architecture deep dive
  • LLM_CONFIGURATION.md - Multi-LLM setup
  • SEARCH_INTEGRATION.md - Search provider setup
  • MCP_SERVER.md - MCP integration guide
  • DOCKER.md - Container deployment

Example Queries

  • Climate change statistics
  • AI industry trends
  • Healthcare outcomes
  • Economic indicators
  • Educational metrics

Try it yourself!