From 7752efd0b6d99eca32a29b8ae0f9f04b7a6f8c63 Mon Sep 17 00:00:00 2001 From: Deepak Chander <172876867+DeepakChander@users.noreply.github.com> Date: Sun, 17 May 2026 13:51:33 +0530 Subject: [PATCH] docs(ai): add AI Co-pilot system design (agentic RAG for CA) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adapts the Agentic RAG architecture (Coordinator → Retrieval → Rerank → Context → Generation → Validation → Export) to a CA firm's needs: - Per-firm Qdrant collection (structural tenant isolation) - Two-collection retrieval: firm corpus + shared regulatory corpus - New Compliance Critic agent (validates statute, math, UDIN status) - Citation-must-resolve-to-primary-source rule - Stakes-tiered validation (Low / Medium / High / Blocked) - UDIN + DSC gates (AI never marks output "final") - India-resident inference only (Qwen 2.5 via vLLM on Indian GPU) - Mock providers kept for zero-GPU local dev - 6 use cases mapped: Doc AI, Ask-your-books, Notice intelligence, Certificate draft, WP draft, Article training Q&A - Phase 3A–3F sub-phasing plan with exit criteria Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/README.md | 1 + docs/architecture/ai-copilot-system-design.md | 539 ++++++++++++++++++ 2 files changed, 540 insertions(+) create mode 100644 docs/architecture/ai-copilot-system-design.md diff --git a/docs/README.md b/docs/README.md index ce30733..cceae65 100644 --- a/docs/README.md +++ b/docs/README.md @@ -49,6 +49,7 @@ If you have 10 minutes, read these in order: - [Data model](./architecture/data-model.md) - [Security and data residency](./architecture/security-and-data-residency.md) - [Deployment](./architecture/deployment.md) +- [AI Co-pilot — system design](./architecture/ai-copilot-system-design.md) ### Integrations - [Tally integration](./integrations/tally-integration.md) diff --git a/docs/architecture/ai-copilot-system-design.md b/docs/architecture/ai-copilot-system-design.md new file mode 100644 index 
0000000..05f5908 --- /dev/null +++ b/docs/architecture/ai-copilot-system-design.md @@ -0,0 +1,539 @@ +# AI Co-pilot — System Design + +This document is the concrete architectural blueprint for the **Phase 3 AI Co-pilot** module. It adapts the proven **Agentic RAG** pattern (Coordinator → Retrieval → Rerank → Context → Generation → Validation → Export) to the realities of an Indian CA firm: regulatory citations, ICAI rules, UDIN/DSC, DPDP Act, India-only inference, and strict per-firm + per-client tenant isolation. + +Read alongside: +- [`../features/ai-copilot.md`](../features/ai-copilot.md) — product-facing capabilities +- [`tech-stack.md`](./tech-stack.md) — overall stack +- [`security-and-data-residency.md`](./security-and-data-residency.md) — encryption, residency +- [`../compliance/dpdp-act.md`](../compliance/dpdp-act.md) — privacy obligations +- [`../compliance/icai-regulatory.md`](../compliance/icai-regulatory.md) — UDIN / certificate rules + +--- + +## 1. Why an agentic system, not a single LLM call + +A CA firm's questions decompose into multiple steps that benefit from **explicit planning, retrieval, validation, and repair**: + +> "Show me all RCM transactions in Q1 FY26 that haven't been offset against ITC, and check if any are at risk of being denied under Sec 17(5)." + +This is: +- A retrieval task (find vouchers + ledger rows) +- A calculation task (offset math) +- A legal classification task (Sec 17(5) check) +- A grounding task (cite each claim to a primary source) +- A validation task (numbers must match the books exactly) + +A single LLM call would hallucinate at any of these steps. An agent graph lets us **isolate the failure mode, validate each step, and repair before showing the user.** + +--- + +## 2. 
High-level architecture + +``` +┌──────────────────────────────────────────────────────────────────────────────┐ +│ ONLINE PIPELINE │ +│ │ +│ User query ──► Coordinator ──► Retrieval ──► Rerank ──► Context Builder │ +│ │ │ │ +│ │ ▼ │ +│ │ ┌──────────────┐ │ +│ │ │ Context Critic│ │ +│ │ ┌─── retry ◄──────────┤ sufficient? │ │ +│ │ ▼ └──────┬───────┘ │ +│ │ Retrieval (refined) │ │ +│ │ ▼ │ +│ │ ┌──────────────┐ │ +│ ▼ │ Generation │ │ +│ ┌──────────────┐ │ (Qwen / vLLM)│ │ +│ │ Cost & Audit │ └──────┬───────┘ │ +│ │ Tracker │ │ │ +│ └──────┬───────┘ ▼ │ +│ │ ┌──────────────┐ │ +│ │ │ Validation │ │ +│ │ ┌── repair ◄──────────┤ + Grounding │ │ +│ │ ▼ └──────┬───────┘ │ +│ │ Generation (refined) │ │ +│ │ ▼ │ +│ │ ┌──────────────┐ │ +│ │ │ Compliance │ │ +│ │ │ Critic (CA) │ │ +│ │ └──────┬───────┘ │ +│ │ ▼ │ +│ │ ┌──────────────┐ │ +│ └──────────────────────────────────► Export Agent│ │ +│ └──────┬───────┘ │ +│ │ │ +└───────────────────────────────────────────────────────────────┼──────────────┘ + ▼ + Structured output + + citations + + confidence score + + audit log entry +``` + +``` +┌──────────────────────────────────────────────────────────────────────────────┐ +│ OFFLINE INDEXING PIPELINE │ +│ │ +│ Source events │ +│ ├─ Tally voucher created/modified (via Bridge) │ +│ ├─ Document uploaded to vault (WhatsApp/email/web/mobile) │ +│ ├─ Return filed (GSTR/ITR/MCA) │ +│ ├─ Notice received + OCR'd │ +│ ├─ Regulatory corpus update (CBIC/CBDT/MCA/ICAI feed) │ +│ │ │ +│ ▼ │ +│ Ingestion Service │ +│ │ │ +│ ▼ │ +│ ┌─────────────┐ ┌──────────────┐ ┌────────────┐ ┌───────────────┐ │ +│ │ Clean & │ ─► │ Chunker │ ─► │ Embedder │ ─► │ Per-firm │ │ +│ │ Normalize │ │ (semantic + │ │ (BGE-M3, │ │ Qdrant │ │ +│ │ (redact) │ │ structural) │ │ India) │ │ collection │ │ +│ └─────────────┘ └──────────────┘ └────────────┘ └───────────────┘ │ +│ ▲ │ +│ │ │ +│ Postgres metadata │ +│ (jobs, hashes, lineage) │ 
+└──────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## 3. Tenant isolation — the most important design rule + +A leaked answer from Firm A's books to Firm B = end of business. Isolation must be **structural**, not just a metadata filter. + +| Boundary | Mechanism | +|----------|-----------| +| **Per-firm Qdrant collection** | One collection per firm (e.g., `firm_42_chunks`). Query agent literally cannot query another firm's collection — different connection scope. | +| **Per-client filter** | Inside a firm's collection, every chunk's payload has `client_id`. Retrieval API hard-rejects queries that don't specify `client_id` for client-scoped questions. | +| **Per-firm Postgres schema (or strict RLS)** | Job table, metadata, lineage — same row-level security model as the rest of the app. | +| **Per-firm KMS data key** | Stored chunk text is encrypted with the firm's DEK. Even an ops engineer with DB access cannot read it. | +| **Per-firm AI quota + cost ledger** | Every generation call attributed to a firm; quota enforced at the gateway. | +| **No cross-firm training, ever** | All foundation models used in inference-only mode; provider contracts explicitly forbid training on our traffic. | +| **No cross-firm embeddings** | Even though embedding space is mathematically shared, no chunk is co-located with another firm's chunk in any index. | + +**Regulatory corpus is the exception**: ICAI Standards, GST law, IT Act, MCA forms, CBIC/CBDT circulars are public and shared. These live in a single global collection `regulatory_corpus` that every firm can query. Per-firm collections never contain a copy of public law. + +--- + +## 4. 
Agent inventory + +| Agent | Responsibility | Implementation | +|-------|---------------|----------------| +| **Coordinator** | Parses the user request, classifies the use case, plans the agent graph, routes execution, tracks cost + audit | Python class with deterministic dispatch + LLM fallback for ambiguous queries | +| **Retrieval** | Query rewriting, filter construction, Qdrant search; returns top-K candidates from per-firm collection + (optionally) regulatory corpus | Async service; talks to Qdrant HTTP API | +| **Reranker** | Re-orders candidates by semantic match to the rewritten query; trims to top-N | BGE Reranker; CPU-fine to start, GPU when volume demands | +| **Context Builder** | Merges chunks, deduplicates, preserves source IDs, formats for the model's context window | Pure Python; no LLM | +| **Context Critic** | "Is this context sufficient to answer?" If not, requests a refined retrieval (different filter / different query) | Small LLM call (Qwen-1.5B or rules + heuristics) | +| **Generation** | Produces the structured output (JSON) with inline citations | Qwen 7B/14B via vLLM on India GPU host | +| **Validation** | Schema validation, duplicate removal, citation resolution, hallucination check (every cited source must exist) | Deterministic + small LLM for nuanced checks | +| **Compliance Critic** *(CA-specific)* | Verifies cited sections are current law, cross-checks numbers against books, flags UDIN-bearing outputs, enforces "draft for partner review" labels | Rules engine + curated knowledge base + LLM for narrative | +| **Export** | Writes the result to the right destination (DB, audit log, queue for partner review, draft document), emits webhook | Pure Python; no LLM | +| **Cost & Audit Tracker** | Logs every step's tokens, latency, model, and outcome to the firm's audit log; enforces per-firm quota | Cross-cutting middleware | + +Each agent is **independently testable** with mock providers — see [§ 12](#12-local-development). + +--- + +## 5. 
The six CA use cases and the agents they activate + +| Use case | Coordinator plan | Critical agent | +|----------|------------------|----------------| +| **Doc AI extraction** (invoice, bank statement, Form 16, contract) | Skip retrieval; structured extraction directly from uploaded doc | Generation + Validation (schema-strict) | +| **Ask your books** | Retrieval (firm + client filter) → Generation | Context Critic (often retries) | +| **Notice intelligence** | OCR notice → Retrieval (regulatory corpus + client's filing history) → Generation → draft reply | Compliance Critic (must classify correctly) | +| **Auto-draft certificate** | Retrieval (client's books + template + relevant sections) → Generation → UDIN gate | Compliance Critic (UDIN-bearing flag) | +| **Audit working paper draft** | Retrieval (client's books + SAs + prior year WPs) → Generation → cross-link to assertions | Validation (must cite SA + WP ref) | +| **Article training Q&A** | Retrieval (firm's prior WPs + ICAI material) → Generation (MCQs) → difficulty calibration | Validation (no duplicates, balanced topics) | + +Same chassis. Six different specialised Coordinator plans + prompt templates. + +--- + +## 6. Indexing pipeline — what gets embedded + +### 6.1 Sources, by priority + +| Source | Frequency | Chunk strategy | +|--------|-----------|----------------| +| Tally voucher (sales, purchase, journal, etc.) 
| Streamed via Bridge, near-real-time | One chunk per voucher with structured payload (date, party, amount, GST split, narration) | +| Bank statement entry | Real-time on upload + recon match | One chunk per row, joined with matched voucher | +| Document vault entry (invoice, contract, board resolution) | On upload | Chunk per page OR semantic chunk; OCR if PDF/image | +| Filed return (GSTR, ITR, MCA) | On filing | Whole-return chunk + per-table chunks | +| Notice received | On upload, after OCR | Full notice + extracted entities (section, demand, due date) | +| Communication thread (email, WhatsApp) | On ingest | Per-message; PII-redacted at chunk time | +| Working paper templates + firm's prior WPs | On upload / on save | Semantic chunks per heading | +| **Regulatory corpus** (global, shared) | Updated weekly via curated feed | Per-section chunks; one collection serves all firms | + +### 6.2 Chunk payload + +Every chunk has a structured payload: + +```json +{ + "chunk_id": "v_42_abc_v0234_p0", + "firm_id": "firm_42", + "client_id": "client_ABC", + "source_type": "tally_voucher", + "source_id": "V/0234", + "doc_hash": "sha256:...", + "fy": "2026-27", + "period": "2026-Q1", + "date": "2026-04-15", + "tags": ["RCM", "purchase", "GST"], + "version": 3, + "indexed_at": "2026-05-17T08:00:00Z", + "ciphertext_ref": "s3://...firm_42/v_42_abc_v0234.enc" +} +``` + +`client_id` is `null` for regulatory chunks. Qdrant stores only the embedding and the payload, never the plaintext. Plaintext lives encrypted in S3 and is fetched on demand by the Context Builder using the firm's DEK. + +### 6.3 Reindexing triggers + +- Voucher edited → reindex that voucher's chunk + dependent recons +- Return filed → new chunk for the return + invalidate stale "draft return" chunks +- Document re-uploaded with a new `doc_hash` → reindex + supersede the prior version +- Regulatory update (new circular, rate change) → reindex affected sections of corpus + flag dependent firm-side answers as "may be stale, re-verify" + +--- + +## 7. 
Online generation pipeline — detail + +### 7.1 Coordinator decision tree (simplified) + +``` +incoming query + ├─ classify intent + │ ├─ "extract X from this doc" → Doc AI plan + │ ├─ "tell me about my books" → Ask-your-books plan + │ ├─ "respond to this notice" → Notice plan + │ ├─ "draft a certificate" → Certificate plan + │ ├─ "draft working paper" → WP plan + │ └─ "create training questions" → Training plan + ├─ check firm AI quota → reject / queue / proceed + ├─ check feature flag for this firm / client → may be disabled + ├─ stamp request with audit context (user, IP, time, model intended) + └─ dispatch plan +``` + +### 7.2 Retrieval with two-collection query + +For most CA queries, retrieve from **two** Qdrant collections in parallel: +- `firm_42_chunks` — the firm's own data, filtered by client_id and period +- `regulatory_corpus` — global, filtered by relevant tags (GST, IT, MCA, ICAI) + +Results are merged before reranking. This gives the LLM both **what the books say** and **what the law says** in the same context. + +### 7.3 Rerank + +BGE Reranker scores each chunk against the rewritten query. Keep top-N where N is dictated by model context budget (e.g., 20 chunks ≈ 8K tokens with margin for 32K context). + +### 7.4 Context Builder + +- Sort by source type then date +- Inline source IDs as bracketed citations: `[V/0234]`, `[Sec 17(5)(g)]`, `[GSTR-3B 2026-04]` +- Strip / mask PII not relevant to the query (Aadhaar, full bank a/c numbers) +- Hard-cap at the model's safe context window + +### 7.5 Context Critic + +Asks: *given this query and this assembled context, can a reasonable answer be produced with cited grounding?* + +If no: +- Identify what's missing (e.g., "no Q1 GSTR-3B chunk in context") +- Request refined retrieval with explicit hints (e.g., add `period:2026-Q1 AND source_type:gstr_3b`) +- Retry up to N=3 times; then surface a "context insufficient" response to the user + +This loop is what makes the system *agentic*, not just RAG. 
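The Context Critic retry loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the service's actual code: `run_context_loop`, `CriticVerdict`, `LoopResult`, and the toy retriever and critic are hypothetical names introduced for illustration, and the sketch assumes "retry up to N=3 times" means one initial retrieval plus at most three refined ones.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_RETRIES = 3  # "Retry up to N=3 times" in the text above


@dataclass
class CriticVerdict:
    sufficient: bool
    missing: str = ""  # e.g. "no Q1 GSTR-3B chunk in context"
    hints: dict = field(default_factory=dict)  # extra filters for the next retrieval


@dataclass
class LoopResult:
    status: str  # "ok" or "context_insufficient"
    context: list
    attempts: int  # total retrieval passes, including the first
    missing: str = ""


def run_context_loop(
    query: str,
    retrieve: Callable[[str, dict], list],
    critique: Callable[[str, list], CriticVerdict],
    max_retries: int = MAX_RETRIES,
) -> LoopResult:
    """Retrieve, ask the critic, and refine the filters until sufficient or out of retries."""
    filters: dict = {}
    context = retrieve(query, filters)
    verdict = critique(query, context)
    retries = 0
    while not verdict.sufficient and retries < max_retries:
        filters = {**filters, **verdict.hints}  # apply the critic's explicit hints
        context = retrieve(query, filters)
        verdict = critique(query, context)
        retries += 1
    if verdict.sufficient:
        return LoopResult("ok", context, retries + 1)
    return LoopResult("context_insufficient", context, retries + 1, verdict.missing)


# Toy stand-ins for the Retrieval agent and the Context Critic (illustrative only).
CORPUS = [
    {"source_id": "V/0234", "source_type": "tally_voucher", "period": "2026-Q1"},
    {"source_id": "GSTR-3B 2026-04", "source_type": "gstr_3b", "period": "2026-Q1"},
]


def toy_retrieve(query: str, filters: dict) -> list:
    # Returns vouchers by default; other source types only when hinted.
    allowed = {"tally_voucher"} | set(filters.get("source_types", []))
    return [c for c in CORPUS if c["source_type"] in allowed]


def toy_critic(query: str, context: list) -> CriticVerdict:
    # Declares the context sufficient once a GSTR-3B chunk is present.
    if any(c["source_type"] == "gstr_3b" for c in context):
        return CriticVerdict(True)
    return CriticVerdict(
        False,
        missing="no Q1 GSTR-3B chunk in context",
        hints={"period": "2026-Q1", "source_types": ["gstr_3b"]},
    )


result = run_context_loop("Was RCM offset against ITC in Q1?", toy_retrieve, toy_critic)
print(result.status, result.attempts)  # prints: ok 2
```

Keeping the retriever and critic as plain callables means the loop can be exercised with mock providers and no GPU. When the critic never reports sufficiency, the function returns `context_insufficient` together with the critic's last `missing` note, which is what would be surfaced to the user.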
+ +### 7.6 Generation + +- Qwen 7B (latency-sensitive) or Qwen 14B (quality-sensitive) via vLLM +- Output: strict JSON schema per use case +- Temperature: 0.0–0.2 (we want near-deterministic output, not creativity) +- Token budget: capped per call; overruns abort with a clear error +- Streaming for interactive flows ("Ask your books"); batched for bulk (training Q&A generation) + +### 7.7 Validation + +Three layers: + +| Layer | Checks | +|-------|--------| +| **Schema** | JSON parses; required fields present; types correct | +| **Grounding** | Every cited source_id resolves to a real chunk; cited fact appears in that chunk; numbers in the output match a primary source within rounding tolerance | +| **Quality** | No duplicates; difficulty matches request (training Q&A); language matches firm preference; no PII leaked beyond scope | + +Failures trigger a **repair loop**: regenerate with the validator's error appended to the prompt. Default repair budget: 2 attempts; the stakes tier (§ 10) sets the actual budget per request. Once the budget is spent, hand the partially-validated output to the user with explicit flags, unless the tier requires an abort. + +### 7.8 Compliance Critic (CA-specific) + +Distinct from generic validation. Asks: + +- Are cited statutory sections still in force as of the query date? (Cross-check against a curated "effective period" index of sections.) +- Are numerical thresholds current? (₹2 Cr turnover threshold for tax audit, ₹5 Cr for e-invoicing, etc.) +- Is this output a **certificate** (UDIN-bearing)? If yes, force the "draft for partner review" label; never mark "final". +- Does this output give a **legal opinion** that requires partner sign-off? +- Does the output contradict the firm's prior position on a similar matter? (Surface the inconsistency for partner review.) + +### 7.9 Export + +- Persist output + full chain (query → retrieved chunks → generated → validated → critic notes) +- Write structured audit log entry +- Update firm's AI usage ledger (cost, tokens, latency) +- Emit webhook to UI / mobile; queue partner notification + +--- + +## 8. 
Citation model — "must resolve to a primary source" + +Every claim in the output must cite a `source_id` from a **primary source category**: + +| Category | Example source_id format | +|----------|--------------------------| +| Tally voucher | `voucher://firm_42/client_ABC/V0234` | +| Bank statement row | `bank://firm_42/client_ABC/HDFC-2026-04/row-127` | +| Filed return | `gstr3b://firm_42/client_ABC/2026-04` | +| Form-16 | `form16://firm_42/client_ABC/employee_E10/AY-2026-27` | +| Notice | `notice://firm_42/client_ABC/DRC01-2026-05` | +| Statutory section | `statute://CGST/sec-17/5/g` | +| ICAI standard | `icai://SA-230/para-8` | +| MCA form schema | `mca://AOC-4/v3/clause-12` | +| CBIC circular | `cbic://circular-204-2023` | + +A citation that fails to resolve in our authoritative index = hallucination. Validation rejects the output. + +In the UI, every citation is **clickable** — partner sees the original voucher / section / notice in a side panel. + +--- + +## 9. UDIN / DSC integration + +| Output type | AI's role | Final step | +|-------------|-----------|------------| +| Internal note, anomaly flag, ask-your-books answer | Final-ish (user can act on it) | None | +| Notice reply draft | Draft | Partner edits + signs (DSC) | +| Working paper draft | Draft | Preparer edits → reviewer → partner | +| Certificate (15CB, turnover, net worth) | Draft | Partner reviews → generates **UDIN** → signs (DSC) → files | + +The AI never: +- Generates a UDIN +- Invokes a DSC +- Marks an output as "final" or "issued" + +These transitions are gated by an explicit partner action in the UI, recorded in audit log, often with a step-up MFA challenge. + +--- + +## 10. 
Validation tier by "stakes" (replaces EdTech's "difficulty") + +| Stakes | Examples | Min context recall | Max repair attempts | Compliance Critic strictness | +|--------|----------|--------------------|---------------------|------------------------------| +| Low | Internal Q&A, search summary, "explain this voucher" | 0.6 | 1 | Lenient | +| Medium | Working paper draft, notice classification | 0.8 | 2 | Strict | +| High | Certificate draft, return draft, opinion memo | 0.95 | 3 (then abort) | Maximum + cross-check | +| Blocked | Final UDIN-bearing issuance, DSC-bound output | n/a | n/a | AI cannot produce final; partner gate enforced | + +The Coordinator picks the tier at plan time based on intent classification. + +--- + +## 11. Model & infrastructure choices + +| Layer | Choice | Why | +|-------|--------|-----| +| **Generation LLM** | Qwen 2.5 7B (hot path) + Qwen 2.5 14B (high-stakes) | Open weights, strong on structured outputs, can self-host in India | +| **Inference server** | vLLM | High throughput, batching, KV cache reuse | +| **GPU host** | NVIDIA H100 / GH200 in **Tata Comm / Yotta / E2E Cloud / NxtGen** (Indian) | India residency; tier-1 datacentre | +| **Embeddings** | BGE-M3 | Multilingual (English + Hindi + others), strong on technical text | +| **Embedding host** | CPU initially → small GPU later | Embedding load is sporadic; CPU is adequate up to ~1000 chunks/min | +| **Reranker** | BGE Reranker v2 | Industry-standard, good quality/cost | +| **Vector DB** | Qdrant | Open source, strong filters, in-memory + on-disk modes | +| **Vector DB host** | Self-hosted on EKS in ap-south-1 | Avoid cross-region calls; data residency | +| **Metadata DB** | Postgres (same as main app) | Co-locate with existing infra; transactional joins | +| **Queue** | Celery + Redis (existing infra) | Reuse | +| **API** | FastAPI for AI service | Async, typed, fits Python ML ecosystem | +| **Gateway** | LiteLLM in front of vLLM (and optional hosted fallback) | Provider-agnostic routing, cost 
tracking, retries | +| **Provider fallback** | Hosted Indian providers (Sarvam, Krutrim, hosted Qwen) | Capacity overflow; never out of region | + +We **explicitly avoid**: +- US-region OpenAI / Anthropic / Gemini for any client data +- Any provider that trains on inputs +- Any provider that doesn't sign a DPA with India residency clauses + +For the *firm's own training corpus* (article Q&A use case), we have more flexibility — but defaults stay India-resident for consistency. + +--- + +## 12. Local development + +The system **must run with zero GPU** on a developer laptop, or velocity dies. + +Mock providers (provided in the AI service): +- `MockGenerationProvider` — returns deterministic JSON shaped to the expected schema +- `MockEmbeddingProvider` — returns hashed-to-vector embeddings (consistent across runs) +- `MockReranker` — returns input order, slightly shuffled +- `MockQdrant` — in-memory dict keyed by collection + +Switched via env var `AI_PROVIDER=mock|local-vllm|prod-vllm`. + +Docker Compose spec for local dev: +``` +services: + api # FastAPI + postgres # metadata + redis # queue + qdrant # vector store + ai-mock # python service implementing the mock providers +``` + +This setup is the single biggest reason we keep the EdTech architecture as-is — it already nailed dev ergonomics. + +--- + +## 13. Cost model + +Per-call cost = embedding cost + retrieval cost (~free at our scale) + LLM tokens × price. + +| Use case | Avg tokens in | Avg tokens out | Cost / call (₹) at hosted-India pricing | +|----------|---------------|----------------|-----------------------------------------| +| Doc AI extraction | 4K | 0.5K | ~₹1–3 | +| Ask your books | 6K | 0.5K | ~₹2–4 | +| Notice intelligence | 10K | 2K | ~₹6–12 | +| Certificate draft | 8K | 1.5K | ~₹5–10 | +| WP draft | 12K | 3K | ~₹10–20 | +| Training Q&A bulk (20 Qs) | 8K | 4K | ~₹15–25 | + +Self-hosted GPU amortises differently — fixed ~₹1L–1.5L/month for the GPU, marginal cost approaches zero. 
Crossover happens at ~50K calls/month → switch from hosted to self-hosted. + +Per-firm quota enforcement in Coordinator → never silently blow the budget. + +--- + +## 14. Data residency in summary + +| Asset | Location | +|-------|----------| +| Qdrant collections | AWS Mumbai (ap-south-1), encrypted volumes | +| Postgres metadata | AWS Mumbai | +| Plaintext chunks | S3 Mumbai, per-firm KMS DEK | +| Audit log of AI calls | AWS Mumbai, 8-year retention | +| Inference GPUs | Indian cloud (Tata Comm / Yotta / E2E / NxtGen) | +| Hosted-provider fallback | India-region endpoints only | +| Embedding model weights | Cached in Mumbai region | + +No bytes of any firm's client data ever leave India. + +--- + +## 15. Failure modes & their handling + +| Failure | Detection | Handling | +|---------|-----------|----------| +| Hallucinated citation | Citation doesn't resolve in index | Repair loop; if fails, return partial output with flag | +| Wrong number (doesn't match books) | Compliance Critic cross-check | Repair; if fails, abort and surface to partner | +| Stale statutory citation | Compliance Critic effective-period check | Replace with current; flag old | +| Context window exceeded | Pre-flight token count | Aggressive rerank → reduce N; if still over, return "scope too broad" | +| Model unavailable / vLLM down | Health check fails | Fallback to hosted Indian provider; if both down, queue request, notify firm | +| Embedding model down | Indexer health check | Pause new indexing, alert; existing queries unaffected | +| Qdrant down | Retrieval timeout | Open circuit; fall back to keyword search (Postgres / OpenSearch) with a "degraded mode" badge | +| Per-firm quota exceeded | Pre-call check | Hard stop with clear message + upgrade link | +| Compliance Critic disagrees with Generation | Different outputs | Always trust Critic; force repair or partner review | +| PII leaked into output | Validation scanner | Block output, raise incident, audit | +| Adversarial prompt from 
user | Prompt injection detector | Strip / refuse; log; rate-limit user | +| Tenant boundary error (chunk from wrong firm) | Pre-output check that all chunk firm_ids match request | Hard abort, paged alert, freeze for incident review | + +--- + +## 16. Phasing plan — sub-phases of Phase 3 + +This is what actually ships, in order: + +### Phase 3A — Doc AI (extraction only) +- Generation + Validation agents +- No retrieval (single doc in context) +- Use case: structured extraction from invoices, bank statements, Form 16, contracts +- Lowest hallucination risk; builds trust +- **Exit criteria**: extraction accuracy >92% on a curated test set + +### Phase 3B — Ask your books (retrieval-only) +- Full agent chain except output is read-only summarisation with citations +- No drafting / no certificate generation +- **Exit criteria**: citation resolution rate ≥98%; partner NPS for the feature ≥40 + +### Phase 3C — Notice intelligence +- Adds OCR-on-notice + classification + draft reply +- Compliance Critic must be solid before this ships +- **Exit criteria**: classification accuracy ≥95% on top-10 notice types + +### Phase 3D — Auto-draft certificates +- UDIN gate enforced +- Stakes-tier "High" validation +- **Exit criteria**: zero "incorrect certificate filed" incidents in 90-day beta + +### Phase 3E — Audit working paper drafting +- WP chunking + SA cross-linking +- Reviewer/partner workflow integration +- **Exit criteria**: partner-acceptance rate of generated WP sections ≥70% + +### Phase 3F — Article training Q&A +- The original EdTech use case +- Lower stakes; can ship earlier if firm-side demand exists +- **Exit criteria**: educator (CA-mentor) approval rating ≥75% + +--- + +## 17. 
Mapping to the Agentic RAG reference architecture + +For the record, this design **inherits**: + +| From EdTech Agentic RAG | Kept | Changed for CA | +|--------------------------|------|----------------| +| Coordinator agent | ✅ | Plus per-firm quota, audit context, stakes tiering | +| Retrieval + filter + Qdrant search | ✅ | Plus two-collection (firm + regulatory) parallel retrieval | +| BGE-M3 embeddings | ✅ | India-hosted; multilingual including Hindi | +| BGE Reranker | ✅ | Same | +| Context Builder | ✅ | Plus PII redaction at chunk-time | +| Context Critic feedback loop | ✅ | Same — this is the agentic heart | +| Generation via Qwen / vLLM | ✅ | India-hosted GPU only | +| Validation (grounding + dup-removal) | ✅ | Plus citation must resolve to primary source | +| Repair loop | ✅ | Stakes-tier-aware budget | +| Export to DB + API | ✅ | Plus audit log + UDIN/DSC gate | +| Mock providers for dev | ✅ | Same — kept exactly | +| FastAPI + Postgres + Qdrant + Celery | ✅ | Same — exact stack alignment | + +**Newly added agents and rules** specifically for CA: +- Compliance Critic agent +- Stakes tiering system +- UDIN / DSC enforcement gate +- Two-collection retrieval (firm + regulatory) +- Citation-must-resolve-to-primary-source rule +- Per-firm Qdrant collection (structural isolation) +- Per-firm KMS-encrypted chunk plaintext +- 8-year audit retention per ICAI +- Regulatory corpus versioning + "may be stale" propagation + +--- + +## 18. Open questions to settle before Phase 3 build + +1. **Self-hosted vLLM vs hosted Qwen** — when do we flip? Decide at ~50K calls/month milestone. +2. **Qwen 7B vs 14B for high-stakes outputs** — benchmark on Notice intelligence + Certificate drafting before locking in. +3. **Whether to fine-tune** Qwen on a firm's prior outputs (premium tier feature?) — defer until 100+ firms on AI plan. +4. **Document AI: rules + ML hybrid vs pure ML** — likely hybrid (forms have known layouts), but quantify. +5. 
**Indian GPU provider selection** — RFP at start of Phase 3. +6. **Reranker on GPU vs CPU** — start CPU, move to GPU at ~100K reranks/day. +7. **Multi-language outputs** (Hindi, Tamil, Gujarati for client portal) — Phase 4 by default; revisit if customer demand earlier. + +--- + +## 19. What this doc does not cover + +- Detailed prompt templates (lives in code, not docs) +- Specific JSON schemas per use case (also in code; auto-generated docs from schemas) +- Test harness design for AI quality regression (separate doc: `ai-quality-testing.md`, to be written when build begins) +- Pricing of AI as a premium add-on (in [`../business/pricing.md`](../business/pricing.md))