
feat: AI Support Triage Agent — BM25 RAG + Deterministic Safety Gate + Multi-Provider Cascade#35

Open
HarshavardhanVemali wants to merge 11 commits into interviewstreet:main from HarshavardhanVemali:main

Conversation

@HarshavardhanVemali

Overview

This PR delivers a terminal-based AI support triage agent for the HackerRank Orchestrate 2026 challenge. The system processes support tickets across three domains — HackerRank, Claude AI, and Visa — using a local BM25 corpus, a deterministic safety engine, and a multi-provider LLM cascade.

Final evaluation results:

  • Total tickets processed: 29 / 29 (100%)
  • Replied (automated): 24 (82.8%)
  • Escalated (safe routing): 5 (17.2%)
  • Unhandled errors: 0
  • Throughput: ~14.5 tickets/minute

All responses are grounded exclusively in the pre-built local corpus (data/hackerrank, data/claude, data/visa). No live web requests are made during evaluation.


Architecture

    ┌──────────────────────────────┐
    │      support_tickets.csv     │
    └───────────────┬──────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│            main.py                     │
│     ThreadPoolExecutor                 │
│                                        │
│  1. corpus/loader.py  — BM25 index     │
│  2. classifier.py     — domain + type  │
│  3. corpus/loader.py  — BM25 search    │
│  4. safety.py         — rule engine    │
│  5. responder.py      — grounded reply │
│  6. logger.py         — log.txt write  │
└────────────────────────────────────────┘
                    │
                    ▼
            output.csv + log.txt

Key Components

1. Corpus Loader & BM25 Retrieval (corpus/loader.py)

The corpus is pre-built by scraping the three support sites offline. During evaluation, only the local data/ directory is used; no network calls are made.

  • Parses .md, .json, and .txt corpus files recursively.
  • Builds a BM25Okapi index in RAM at startup (~770 documents total).
  • search() supports domain filtering and Intent Boosting (multiplying priority terms like "refund" or "mock" by 100x).
  • Domain Inference Fallback: If the classifier returns unknown, infers the domain by majority vote over the top-5 retrieved documents.

Design decision: BM25 was chosen over vector embeddings because the support corpus is keyword-heavy (technical terms, product names, URLs), retrieval is lightning fast (sub-5ms), and there is zero additional API cost.
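To make the retrieval step concrete, here is a minimal, stdlib-only sketch of BM25 scoring with the "intent boosting" described above (priority terms contribute 100x weight). The document set, `PRIORITY_TERMS`, and the `search()` signature are illustrative stand-ins, not the actual corpus/loader.py API:

```python
import math
from collections import Counter

# Toy corpus: (domain, text) pairs standing in for the ~770 real documents.
DOCS = [
    ("hackerrank", "refunds are processed within seven business days"),
    ("claude", "claude rate limits depend on your usage tier"),
    ("visa", "report a lost visa card through the issuer hotline"),
]
PRIORITY_TERMS = {"refund", "refunds", "mock"}  # boosted intent terms

K1, B = 1.5, 0.75  # standard Okapi BM25 parameters
tokenized = [text.split() for _, text in DOCS]
N = len(tokenized)
avgdl = sum(len(d) for d in tokenized) / N
df = Counter(t for doc in tokenized for t in set(doc))  # document frequency

def bm25_score(query_tokens, doc_tokens):
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query_tokens:
        if term not in tf:
            continue
        idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
        num = tf[term] * (K1 + 1)
        den = tf[term] + K1 * (1 - B + B * len(doc_tokens) / avgdl)
        boost = 100.0 if term in PRIORITY_TERMS else 1.0  # intent boosting
        score += boost * idf * num / den
    return score

def search(query, domain=None, top_k=2):
    q = query.lower().split()
    ranked = sorted(
        (i for i in range(N) if domain is None or DOCS[i][0] == domain),
        key=lambda i: bm25_score(q, tokenized[i]),
        reverse=True,
    )
    return [DOCS[i] for i in ranked[:top_k]]

print(search("refunds please", domain="hackerrank")[0][0])  # hackerrank
```

The boost multiplies a priority term's score contribution rather than its raw frequency, which keeps length normalization intact while still dominating the ranking.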

2. Classifier (agent/classifier.py)

Zero-shot classification using the Gemini API with JSON mode enforced.

  • Returns: domain, request_type, product_area, confidence.
  • Structured JSON output enforced via response_mime_type: application/json.
  • Falls back to Classification(domain="unknown", confidence=0.0) on any API error, safely triggering escalation.
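The fail-safe contract can be sketched as follows; the Gemini call is stubbed out as an injected callable, and the field names mirror the bullets above rather than the real agent/classifier.py code:

```python
import json
from dataclasses import dataclass

@dataclass
class Classification:
    domain: str
    request_type: str = "unknown"
    confidence: float = 0.0

def classify(ticket_text: str, call_llm) -> Classification:
    """call_llm is expected to return a JSON string (JSON mode enforced)."""
    try:
        data = json.loads(call_llm(ticket_text))
        return Classification(
            domain=data["domain"],
            request_type=data.get("request_type", "unknown"),
            confidence=float(data.get("confidence", 0.0)),
        )
    except Exception:
        # Any failure (network, quota, malformed JSON) collapses to the
        # unknown/0.0 sentinel, which the safety gate then escalates.
        return Classification(domain="unknown", confidence=0.0)

ok = classify("refund please", lambda t: '{"domain": "hackerrank", "confidence": 0.9}')
bad = classify("refund please", lambda t: "not json")
print(ok.domain, bad.domain)  # hackerrank unknown
```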

3. Safety Gate (agent/safety.py)

Deterministic rule engine that runs before any LLM response generation. No API calls. First-match-wins ordered rules:

Priority  Rule                   Trigger
0         Prompt injection       System commands, known injection phrases
1         Visa fraud             domain=visa AND request_type=fraud
2         Billing dispute        Dispute-specific keywords in ticket text
3         Account compromise     Compromise keywords (domain-scoped)
4         Legal / compliance     Legal trigger words in ticket text
5         No corpus docs found   Retrieved docs list is empty
6         Low confidence         Classifier confidence < 0.35

Prompt injections are detected in both English and French using strict regex patterns, a deliberate design choice to prevent LLM manipulation.
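The first-match-wins ordering can be sketched like this; the patterns are abbreviated illustrations of the rule categories above, not the real rule set from agent/safety.py:

```python
import re

# Ordered rules: each is (name, trigger). The first trigger that fires wins.
RULES = [
    ("prompt_injection",   lambda t, c: re.search(r"ignore (all|previous) instructions", t, re.I)),
    ("visa_fraud",         lambda t, c: c["domain"] == "visa" and c["request_type"] == "fraud"),
    ("billing_dispute",    lambda t, c: re.search(r"\b(chargeback|dispute)\b", t, re.I)),
    ("account_compromise", lambda t, c: re.search(r"\b(hacked|compromised)\b", t, re.I)),
    ("legal",              lambda t, c: re.search(r"\b(lawsuit|gdpr|subpoena)\b", t, re.I)),
    ("no_docs",            lambda t, c: not c["docs"]),
    ("low_confidence",     lambda t, c: c["confidence"] < 0.35),
]

def escalation_reason(ticket: str, ctx: dict):
    for name, trigger in RULES:
        if trigger(ticket, ctx):
            return name   # first match wins; later rules never run
    return None           # no rule fired: safe to auto-reply

ctx = {"domain": "visa", "request_type": "fraud", "docs": ["d1"], "confidence": 0.9}
print(escalation_reason("my card was charged twice", ctx))  # visa_fraud
```

Because the engine is a plain ordered scan with no LLM involvement, the same ticket always produces the same escalation decision.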

4. Responder (agent/responder.py)

Generates corpus-grounded replies using the multi-provider cascade.

System prompt enforces strict grounding:

  • Answer ONLY from the provided support documents.
  • If answer not found in context, decline gracefully without hallucinating.
  • Cite document title or URL when providing specific instructions.

Post-generation PII & Hallucination check: Flags responses containing emails or phone numbers that do not lexically overlap with the retrieved corpus. Blocks unverified PII leaks to enforce strict data privacy.
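A minimal sketch of that check, assuming simplified email/phone regexes (the real patterns in agent/responder.py may differ): any contact detail in the draft reply must literally appear in the retrieved corpus text, otherwise the reply is flagged.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def leaks_unverified_pii(reply: str, corpus_text: str) -> bool:
    for pattern in (EMAIL, PHONE):
        for match in pattern.findall(reply):
            if match not in corpus_text:
                return True   # PII not grounded in the corpus: block it
    return False

corpus = "Contact support@hackerrank.com for billing questions."
print(leaks_unverified_pii("Email support@hackerrank.com", corpus))  # False
print(leaks_unverified_pii("Call us at +1 555 123 4567", corpus))    # True
```

A literal substring check is strict by design: a hallucinated phone number that merely resembles a corpus entry still fails the overlap test.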

5. Multi-Provider LLM Cascade & API Rotator (utils/model_provider.py & utils/api_rotator.py)

Three-tier cascade with automatic failover, so the pipeline survives rate limits and provider outages without crashing:

             Azure OpenAI
                    │ fail / quota
                    ▼
      Gemini 2.0 Flash (rotating keys)
                    │ RESOURCE_EXHAUSTED 429
                    ▼
         Groq Llama-3 (final fallback)
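The failover logic amounts to trying each provider in order and falling through on any exception. A sketch with stubbed provider callables (the real code lives in utils/model_provider.py):

```python
def cascade(prompt: str, providers) -> str:
    last_error = None
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:   # e.g. quota errors, RESOURCE_EXHAUSTED 429
            last_error = exc       # remember the failure and fall through
    raise RuntimeError(f"all providers failed: {last_error}")

# Stub providers: the first two fail, the final fallback answers.
def azure(_):  raise RuntimeError("quota exceeded")
def gemini(_): raise RuntimeError("RESOURCE_EXHAUSTED 429")
def groq(p):   return f"groq answered: {p}"

providers = [("azure", azure), ("gemini", gemini), ("groq", groq)]
print(cascade("hello", providers))  # groq answered: hello
```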

GeminiRotator is a thread-safe singleton that round-robins across multiple API keys loaded from the environment. Uses threading.Lock for safe concurrent access.
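The rotator itself reduces to a lock-guarded round-robin cycle. A sketch mirroring the description above (singleton wiring and environment loading omitted for brevity):

```python
import itertools
import threading

class KeyRotator:
    def __init__(self, keys):
        self._cycle = itertools.cycle(keys)
        self._lock = threading.Lock()

    def next_key(self) -> str:
        with self._lock:   # serialize access across worker threads
            return next(self._cycle)

rotator = KeyRotator(["key1", "key2", "key3"])
print([rotator.next_key() for _ in range(4)])  # ['key1', 'key2', 'key3', 'key1']
```

The lock matters because `next()` on a shared iterator is not guaranteed atomic under concurrent callers; with eight worker threads, unguarded rotation could hand two threads the same key or corrupt the cycle position.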

6. Parallel Orchestration (main.py)

  • ThreadPoolExecutor(max_workers=8) for concurrent ticket processing.
  • Results written iteratively to CSV as each ticket completes, preventing data loss on mid-run failures.
  • Final output sorted by ticket_id for deterministic, evaluable ordering.
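The three bullets above can be sketched as one loop: fan tickets out across the pool, append each result to the CSV as it completes, and sort at the end. Column names and the in-memory buffer standing in for output.csv are illustrative:

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor, as_completed

def process(ticket):
    # Stand-in for the classify -> retrieve -> safety -> respond pipeline.
    return {"ticket_id": ticket["ticket_id"], "status": "replied"}

tickets = [{"ticket_id": i} for i in (3, 1, 2)]
buf = io.StringIO()  # stands in for output.csv
writer = csv.DictWriter(buf, fieldnames=["ticket_id", "status"])
writer.writeheader()

rows = []
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(process, t) for t in tickets]
    for fut in as_completed(futures):   # results arrive in completion order
        row = fut.result()
        writer.writerow(row)            # incremental write: crash-safe
        rows.append(row)

rows.sort(key=lambda r: r["ticket_id"])  # deterministic final ordering
print([r["ticket_id"] for r in rows])    # [1, 2, 3]
```

Because `as_completed` yields in completion order, the incremental CSV is unordered; the final sort restores the deterministic `ticket_id` ordering the evaluator expects.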

Design Decisions & Honest Tradeoffs

Decision                Rationale                                              Tradeoff
BM25 over embeddings    Zero cost, sub-5ms latency on a keyword-heavy corpus   Weaker on abstract, paraphrased queries
Rule-based safety gate  Zero LLM cost, zero probabilistic variance             Keyword matching can miss novel phrasings
Multi-provider cascade  Guarantees the pipeline survives rate limits           Increased code complexity
Iterative CSV writing   Data isn't lost if a thread crashes mid-run            Requires a final sort step

Files Changed

code/
├── main.py                    # CLI entry, ThreadPoolExecutor pipeline
├── agent/
│   ├── classifier.py          # Gemini JSON-mode zero-shot classifier
│   ├── safety.py              # Deterministic escalation engine
│   └── responder.py           # Grounded reply generator
├── corpus/
│   ├── loader.py              # BM25 index builder + Intent search
│   └── scraper.py             # Offline corpus refresh utility
├── utils/
│   ├── model_provider.py      # 3-tier LLM cascade
│   ├── api_rotator.py         # Thread-safe Gemini key rotator
│   ├── logger.py              # Structured log writer
│   ├── live_scraper.py        # Real-time external link scraper
│   └── analyze_results.py     # Post-run stats generator
└── tests/
    └── test_agent.py          # Unit tests

Environment Variables Required

See code/.env.example for the required configuration template:

# Comma-separated list of Gemini API keys for the thread-safe rotator
GEMINI_API_KEYS=key1,key2,key3

# Optional: Azure OpenAI credentials for primary cascade
AZURE_OPENAI_API_KEY=your_azure_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_DEPLOYMENT_NAME=your_deployment

# Optional: Groq fallback
GROQ_API_KEY=your_groq_key
