
feat: AI Support Triage Agent — BM25 RAG + Deterministic Safety Gate + Multi-Provider Cascade#35

Open
HarshavardhanVemali wants to merge 11 commits into interviewstreet:main from HarshavardhanVemali:main

Conversation

@HarshavardhanVemali

Overview

This PR delivers a terminal-based AI support triage agent for the HackerRank Orchestrate 2026 challenge. The system processes support tickets across three domains — HackerRank, Claude AI, and Visa — using a local BM25 corpus, a deterministic safety engine, and a multi-provider LLM cascade.

Final evaluation results:

  • Total tickets processed: 29 / 29 (100%)
  • Replied (automated): 24 (82.8%)
  • Escalated (safe routing): 5 (17.2%)
  • Unhandled errors: 0
  • Throughput: ~14.5 tickets/minute

All responses are grounded exclusively in the pre-built local corpus (data/hackerrank, data/claude, data/visa). No live web requests are made during evaluation.


Architecture

    ┌──────────────────────────────┐
    │      support_tickets.csv     │
    └───────────────┬──────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│            main.py                     │
│     ThreadPoolExecutor                 │
│                                        │
│  1. corpus/loader.py  — BM25 index     │
│  2. classifier.py     — domain + type  │
│  3. corpus/loader.py  — BM25 search    │
│  4. safety.py         — rule engine    │
│  5. responder.py      — grounded reply │
│  6. logger.py         — log.txt write  │
└────────────────────────────────────────┘
                    │
                    ▼
            output.csv + log.txt

Key Components

1. Corpus Loader & BM25 Retrieval (corpus/loader.py)

The corpus is pre-built by scraping the three support sites offline. During evaluation, only the local data/ directory is used; no network calls are made.

  • Parses .md, .json, and .txt corpus files recursively.
  • Builds a BM25Okapi index in RAM at startup (~770 documents total).
  • search() supports domain filtering and Intent Boosting (multiplying priority terms like "refund" or "mock" by 100x).
  • Domain Inference Fallback: If the classifier returns unknown, infers the domain by majority vote over the top-5 retrieved documents.

Design decision: BM25 was chosen over vector embeddings because the support corpus is keyword-heavy (technical terms, product names, URLs), retrieval is lightning fast (sub-5ms), and there is zero additional API cost.
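To make the retrieval step concrete, here is a minimal, stdlib-only sketch of BM25 scoring with the "intent boosting" described above (priority terms contribute 100x weight). The document set, `PRIORITY_TERMS`, and the `search()` signature are illustrative stand-ins, not the actual corpus/loader.py API:

```python
import math
from collections import Counter

# Toy corpus: (domain, text) pairs standing in for the ~770 real documents.
DOCS = [
    ("hackerrank", "refunds are processed within seven business days"),
    ("claude", "claude rate limits depend on your usage tier"),
    ("visa", "report a lost visa card through the issuer hotline"),
]
PRIORITY_TERMS = {"refund", "refunds", "mock"}  # boosted intent terms

K1, B = 1.5, 0.75  # standard Okapi BM25 parameters
tokenized = [text.split() for _, text in DOCS]
N = len(tokenized)
avgdl = sum(len(d) for d in tokenized) / N
df = Counter(t for doc in tokenized for t in set(doc))  # document frequency

def bm25_score(query_tokens, doc_tokens):
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query_tokens:
        if term not in tf:
            continue
        idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
        num = tf[term] * (K1 + 1)
        den = tf[term] + K1 * (1 - B + B * len(doc_tokens) / avgdl)
        boost = 100.0 if term in PRIORITY_TERMS else 1.0  # intent boosting
        score += boost * idf * num / den
    return score

def search(query, domain=None, top_k=2):
    q = query.lower().split()
    ranked = sorted(
        (i for i in range(N) if domain is None or DOCS[i][0] == domain),
        key=lambda i: bm25_score(q, tokenized[i]),
        reverse=True,
    )
    return [DOCS[i] for i in ranked[:top_k]]

print(search("refunds please", domain="hackerrank")[0][0])  # hackerrank
```

The boost multiplies a priority term's score contribution rather than its raw frequency, which keeps length normalization intact while still dominating the ranking.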

2. Classifier (agent/classifier.py)

Zero-shot classification using the Gemini API with JSON mode enforced.

  • Returns: domain, request_type, product_area, confidence.
  • Structured JSON output enforced via response_mime_type: application/json.
  • Falls back to Classification(domain="unknown", confidence=0.0) on any API error, safely triggering escalation.
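The fail-safe contract can be sketched as follows; the Gemini call is stubbed out as an injected callable, and the field names mirror the bullets above rather than the real agent/classifier.py code:

```python
import json
from dataclasses import dataclass

@dataclass
class Classification:
    domain: str
    request_type: str = "unknown"
    confidence: float = 0.0

def classify(ticket_text: str, call_llm) -> Classification:
    """call_llm is expected to return a JSON string (JSON mode enforced)."""
    try:
        data = json.loads(call_llm(ticket_text))
        return Classification(
            domain=data["domain"],
            request_type=data.get("request_type", "unknown"),
            confidence=float(data.get("confidence", 0.0)),
        )
    except Exception:
        # Any failure (network, quota, malformed JSON) collapses to the
        # unknown/0.0 sentinel, which the safety gate then escalates.
        return Classification(domain="unknown", confidence=0.0)

ok = classify("refund please", lambda t: '{"domain": "hackerrank", "confidence": 0.9}')
bad = classify("refund please", lambda t: "not json")
print(ok.domain, bad.domain)  # hackerrank unknown
```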

3. Safety Gate (agent/safety.py)

Deterministic rule engine that runs before any LLM response generation. No API calls. First-match-wins ordered rules:

Priority  Rule                   Trigger
0         Prompt injection       System commands, known injection phrases
1         Visa fraud             domain=visa AND request_type=fraud
2         Billing dispute        Dispute-specific keywords in ticket text
3         Account compromise     Compromise keywords (domain-scoped)
4         Legal / compliance     Legal trigger words in ticket text
5         No corpus docs found   Retrieved docs list is empty
6         Low confidence         Classifier confidence < 0.35

Prompt injections are detected in both English and French using strict regex patterns, a deliberate design choice to prevent LLM manipulation.
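The first-match-wins ordering can be sketched like this; the patterns are abbreviated illustrations of the rule categories above, not the real rule set from agent/safety.py:

```python
import re

# Ordered rules: each is (name, trigger). The first trigger that fires wins.
RULES = [
    ("prompt_injection",   lambda t, c: re.search(r"ignore (all|previous) instructions", t, re.I)),
    ("visa_fraud",         lambda t, c: c["domain"] == "visa" and c["request_type"] == "fraud"),
    ("billing_dispute",    lambda t, c: re.search(r"\b(chargeback|dispute)\b", t, re.I)),
    ("account_compromise", lambda t, c: re.search(r"\b(hacked|compromised)\b", t, re.I)),
    ("legal",              lambda t, c: re.search(r"\b(lawsuit|gdpr|subpoena)\b", t, re.I)),
    ("no_docs",            lambda t, c: not c["docs"]),
    ("low_confidence",     lambda t, c: c["confidence"] < 0.35),
]

def escalation_reason(ticket: str, ctx: dict):
    for name, trigger in RULES:
        if trigger(ticket, ctx):
            return name   # first match wins; later rules never run
    return None           # no rule fired: safe to auto-reply

ctx = {"domain": "visa", "request_type": "fraud", "docs": ["d1"], "confidence": 0.9}
print(escalation_reason("my card was charged twice", ctx))  # visa_fraud
```

Because the engine is a plain ordered scan with no LLM involvement, the same ticket always produces the same escalation decision.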

4. Responder (agent/responder.py)

Generates corpus-grounded replies using the multi-provider cascade.

System prompt enforces strict grounding:

  • Answer ONLY from the provided support documents.
  • If answer not found in context, decline gracefully without hallucinating.
  • Cite document title or URL when providing specific instructions.

Post-generation PII & Hallucination check: Flags responses containing emails or phone numbers that do not lexically overlap with the retrieved corpus. Blocks unverified PII leaks to enforce strict data privacy.
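A minimal sketch of that check, assuming simplified email/phone regexes (the real patterns in agent/responder.py may differ): any contact detail in the draft reply must literally appear in the retrieved corpus text, otherwise the reply is flagged.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def leaks_unverified_pii(reply: str, corpus_text: str) -> bool:
    for pattern in (EMAIL, PHONE):
        for match in pattern.findall(reply):
            if match not in corpus_text:
                return True   # PII not grounded in the corpus: block it
    return False

corpus = "Contact support@hackerrank.com for billing questions."
print(leaks_unverified_pii("Email support@hackerrank.com", corpus))  # False
print(leaks_unverified_pii("Call us at +1 555 123 4567", corpus))    # True
```

A literal substring check is strict by design: a hallucinated phone number that merely resembles a corpus entry still fails the overlap test.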

5. Multi-Provider LLM Cascade & API Rotator (utils/model_provider.py & utils/api_rotator.py)

Three-tier cascade with automatic failover, so the pipeline survives rate limits and provider outages without crashing:

             Azure OpenAI
                    │ fail / quota
                    ▼
      Gemini 2.0 Flash (rotating keys)
                    │ RESOURCE_EXHAUSTED 429
                    ▼
         Groq Llama-3 (final fallback)
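The failover logic amounts to trying each provider in order and falling through on any exception. A sketch with stubbed provider callables (the real code lives in utils/model_provider.py):

```python
def cascade(prompt: str, providers) -> str:
    last_error = None
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:   # e.g. quota errors, RESOURCE_EXHAUSTED 429
            last_error = exc       # remember the failure and fall through
    raise RuntimeError(f"all providers failed: {last_error}")

# Stub providers: the first two fail, the final fallback answers.
def azure(_):  raise RuntimeError("quota exceeded")
def gemini(_): raise RuntimeError("RESOURCE_EXHAUSTED 429")
def groq(p):   return f"groq answered: {p}"

providers = [("azure", azure), ("gemini", gemini), ("groq", groq)]
print(cascade("hello", providers))  # groq answered: hello
```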

GeminiRotator is a thread-safe singleton that round-robins across multiple API keys loaded from the environment. Uses threading.Lock for safe concurrent access.
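The rotator itself reduces to a lock-guarded round-robin cycle. A sketch mirroring the description above (singleton wiring and environment loading omitted for brevity):

```python
import itertools
import threading

class KeyRotator:
    def __init__(self, keys):
        self._cycle = itertools.cycle(keys)
        self._lock = threading.Lock()

    def next_key(self) -> str:
        with self._lock:   # serialize access across worker threads
            return next(self._cycle)

rotator = KeyRotator(["key1", "key2", "key3"])
print([rotator.next_key() for _ in range(4)])  # ['key1', 'key2', 'key3', 'key1']
```

The lock matters because `next()` on a shared iterator is not guaranteed atomic under concurrent callers; with eight worker threads, unguarded rotation could hand two threads the same key or corrupt the cycle position.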

6. Parallel Orchestration (main.py)

  • ThreadPoolExecutor(max_workers=8) for concurrent ticket processing.
  • Results written iteratively to CSV as each ticket completes, preventing data loss on mid-run failures.
  • Final output sorted by ticket_id for deterministic, evaluable ordering.
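The three bullets above can be sketched as one loop: fan tickets out across the pool, append each result to the CSV as it completes, and sort at the end. Column names and the in-memory buffer standing in for output.csv are illustrative:

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor, as_completed

def process(ticket):
    # Stand-in for the classify -> retrieve -> safety -> respond pipeline.
    return {"ticket_id": ticket["ticket_id"], "status": "replied"}

tickets = [{"ticket_id": i} for i in (3, 1, 2)]
buf = io.StringIO()  # stands in for output.csv
writer = csv.DictWriter(buf, fieldnames=["ticket_id", "status"])
writer.writeheader()

rows = []
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(process, t) for t in tickets]
    for fut in as_completed(futures):   # results arrive in completion order
        row = fut.result()
        writer.writerow(row)            # incremental write: crash-safe
        rows.append(row)

rows.sort(key=lambda r: r["ticket_id"])  # deterministic final ordering
print([r["ticket_id"] for r in rows])    # [1, 2, 3]
```

Because `as_completed` yields in completion order, the incremental CSV is unordered; the final sort restores the deterministic `ticket_id` ordering the evaluator expects.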

Design Decisions & Honest Tradeoffs

Decision                Rationale                                              Tradeoff
BM25 over embeddings    Zero cost, sub-5ms latency on a keyword-heavy corpus   Weaker on abstract, paraphrased queries
Rule-based safety gate  Zero LLM cost, zero probabilistic variance             Keyword matching can miss novel phrasings
Multi-provider cascade  Guarantees the pipeline survives rate limits           Increased code complexity
Iterative CSV writing   Data isn't lost if a thread crashes mid-run            Requires a final sort step

Files Changed

code/
├── main.py                    # CLI entry, ThreadPoolExecutor pipeline
├── agent/
│   ├── classifier.py          # Gemini JSON-mode zero-shot classifier
│   ├── safety.py              # Deterministic escalation engine
│   └── responder.py           # Grounded reply generator
├── corpus/
│   ├── loader.py              # BM25 index builder + Intent search
│   └── scraper.py             # Offline corpus refresh utility
├── utils/
│   ├── model_provider.py      # 3-tier LLM cascade
│   ├── api_rotator.py         # Thread-safe Gemini key rotator
│   ├── logger.py              # Structured log writer
│   ├── live_scraper.py        # Real-time external link scraper
│   └── analyze_results.py     # Post-run stats generator
└── tests/
    └── test_agent.py          # Unit tests

Environment Variables Required

See code/.env.example for the required configuration template:

# Comma-separated list of Gemini API keys for the thread-safe rotator
GEMINI_API_KEYS=key1,key2,key3

# Optional: Azure OpenAI credentials for primary cascade
AZURE_OPENAI_API_KEY=your_azure_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_DEPLOYMENT_NAME=your_deployment

# Optional: Groq fallback
GROQ_API_KEY=your_groq_key
