Production Deployment 20-05-2026 by KDwevedi · Pull Request #67 · OpenAgriNet/amul-oan-api

KDwevedi · 2026-05-20T03:29:21Z

Prod sync from main. 14 commits flowing in, no amul-prod-only work to preserve (verified origin/main..origin/amul-prod is empty content-wise).

Included

- #66 (feat: sticky Redis %-split between OSS and legacy chat pipelines): OSS sticky %-split. Redis-backed per-session split between OSS (vLLM gemma + translategemma) and legacy pipelines. Defaults to OSS_PIPELINE_PCT=0, so prod behaviour is unchanged. Validated end-to-end on dev (100% OSS clean, 100% legacy clean).
- #65 (fix(prompt): constrain Output Style to UI-renderable Markdown subset): prompt: constrain Output Style to UI-renderable Markdown subset
- #64 (first question now only runs suggestions agent once): fix: suggestions agent double-run race (read-only GET + sticky :pending)
- #63 (added FE telemetry to Langfuse): FE telemetry → Langfuse (with auth + bounded inputs)
- #61 (milk detals structure update+prompt): milk-collection deterministic markdown tables + prompt contract
- #60 (feat(prompt): strengthen species-default rule (v2)) / #59 (feat(prompt): default to cattle/buffalo when species unspecified): prompt: species-default → cattle/buffalo + strengthen rule
- #58 (feat(glossary): add વાવા (calf), રેલી (heifer); extend કાચી gap-match): glossary: વાવા (calf), રેલી (heifer), કાચી gap-match
- #57 (feat(pretranslation): filter type=ask + refine glossary (v5)) / #56 (feat: refine pretranslation glossary v4 (close last issues)) / #55 (feat: refine pretranslation glossary v3 (fix v2 regressions)) / #54 (feat: expand pretranslation glossary +15 dairy terms): pretranslation glossary v2–v5 + filter type=ask
- #45 (perf(fcm-auth): verify Firebase tokens against multiple projects concurrently): perf: verify Firebase tokens against multiple projects concurrently

Rollout

Will be deployed via ~/amul-infra/scripts/amul-oan-api-deploy.sh on prod VM3 after this merges.

Post-deploy verification

Health probe responds 200
Anonymous traffic in Langfuse chat-production (no errors)
OSS_PIPELINE_PCT env unset (defaults 0) → all traces show variant:legacy

Adds 15 entries identified by Gemma 4 LLM-as-judge on a 400-pair pretranslation eval (7.5% error rate); also expands aliases on 2 existing entries. Includes the production-observed bichdan -> "seeding" mistranslation (correct: artificial insemination), plus other common Gujarati dairy idioms (vetar=in heat, tharvu=conceive, maati khasvi=prolapse, kaachi=fresh cow, kaandh aavvi=yoke sores, kapaasiyo=cottonseed cake, etc).

Total: 14 -> 29 entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ossary-v2 feat: expand pretranslation glossary +15 dairy terms

The v2 glossary expansion (PR #54) fixed 22 issues but caused 10 regressions. This v3 patch addresses the regression patterns:

- Drop the सanwhile_kar entry: 'સંકર' (crossbred) fuzzy-matched 'શંકર' (Shankar, a personal name) and over-rewrote 'Shankar cow' as 'crossbred'.
- Tighten ઉથલા: remove રેલી/રેલ aliases; those are used colloquially to mean 'buffalo' more often than 'repeat breeder'.
- Tighten વેતર: keep only the full phrases ('વેતરે આવેલ', 'ડુટો પાક્યો', 'હાંહ આવી'); the bare 'વેતર' was fuzzy-matching 'વેતરી' (first-parity calver), which has the opposite meaning.
- Broaden પાથરી: context-dependent; urinary context = stones, GI context = straining/tenesmus.
- Broaden કરમોડી: context-dependent; hoof context = foot rot, skin context = ringworm, otherwise = lameness.

New entries (2):

- હડકવા → rabies/hydrophobia (was being mis-translated as FMD or TB)
- દામ્યા → dehorning/disbudding (was being mis-translated as 'branding')

Total: 29 -> 30 entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ossary-v3 feat: refine pretranslation glossary v3 (fix v2 regressions)

Iterative pass after PR #55. Re-eval showed a remaining error rate of 2.0 percent (8/400). This patch closes the last unique problem patterns:

- Add p-aaho entry: 'paho nahi mukti' = milk let-down failure (not nursing the calf), often when the calf has died; it was being mistranslated as bottle-nipple or social rejection.
- Add Shankar cow entry: preserve 'Shankar' as a regional breed proper noun; it was being normalized to 'Sahiwal' or 'crossbred'.
- Tighten karmodi: now defaults to 'lameness' unless explicit hoof / skin anatomical cues appear in the same sentence (was over-applying 'foot rot').

Total: 30 -> 32 entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ossary-v4 feat: refine pretranslation glossary v4 (close last issues)

Two fixes:

1. Code: get_ambiguity_hints_for_query now takes include_ask=False, and the pretranslation call site passes that flag. Without it, the pretranslator was injecting the molhotu (udder) ask rule into its own system prompt and dutifully appending the clarifying question to the English translation (judge flagged this as a hallucinated follow-up). The agent call site keeps include_ask=True (it actually needs those rules to ask the question for real).
2. Glossary: refine 2 entries to fix v4 regressions seen in row 142 (paho) and row 166 (Shankar).
   - paho: clarify that the literal meaning is 'not letting the calf suckle / approach the udder', not just 'milk let-down', so the translator can pick whichever phrasing fits the sentence.
   - Shankar cow: narrow the trigger to require the noun phrase (s-shankar gay or s-shankar gayo); drop the v-vachhardi alias. Rule now spells out that bare s-shankar / sankar paired with vachhardi or in breed-standards questions = generic 'crossbred', while s-shankar+ goay (e.g. buying/selling a specific animal) = the proper-noun Shankar cow.

Total: 32 entries (unchanged from v4, just refined).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
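
The include_ask change in fix 1 can be pictured with a minimal sketch. Only the function name and the include_ask flag come from the commit message; the Hint dataclass, the two sample entries, the substring matching, and the default value of the flag are assumptions made for illustration and do not reflect the repository's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hint:
    term: str  # romanized trigger term (illustrative)
    type: str  # "gloss" = translation note, "ask" = clarifying-question rule
    text: str


# Hypothetical two-entry glossary; the real entries live in the repo.
HINTS = [
    Hint("bichdan", "gloss", "bichdan = artificial insemination, not 'seeding'"),
    Hint("molhotu", "ask", "molhotu (udder): ask a clarifying question"),
]


def get_ambiguity_hints_for_query(query: str, include_ask: bool = True) -> list[Hint]:
    """Return hints whose term occurs in the query, optionally dropping type=ask."""
    matched = [h for h in HINTS if h.term in query.lower()]
    if not include_ask:
        # The pretranslator only needs translation notes; ask rules would
        # leak a clarifying question into the translated text.
        matched = [h for h in matched if h.type != "ask"]
    return matched


query = "molhotu bichdan"
pretranslation_hints = get_ambiguity_hints_for_query(query, include_ask=False)
agent_hints = get_ambiguity_hints_for_query(query, include_ask=True)
```

Here the pretranslation call site sees only the gloss entry, while the agent call site still receives the ask rule it needs.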

…ak-v5 feat(pretranslation): filter type=ask + refine glossary (v5)

…#58)

Row-by-row pretranslation eval flagged 3 rows where TranslateGemma misread Gujarati dairy colloquialisms:

- વાવા / વાવાની → was rendered as proper name 'Vava'; should be 'calf'
- રેલી → was rendered as proper name 'Relli'; should be 'heifer'
- કાચી → an entry already exists, but the match failed on the 'ગાય X દિવસ કાચી' pattern (partial_ratio=66 < threshold=80). Added 7 gap-tolerant multi-word gu_terms so the existing rule fires.

Closes the last 3 wrong-verdict rows from pretranslation_per_row_grading_400.csv (voice 136, 154, 187).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Goldenset row 4 ("What is the right age for castration?") was contradictory between dev (answered for goats: 3-4 months) and prod (answered for cattle: 6-9 months). Since Amul AI's primary audience is dairy farmers and cattle/buffalo are the default operational context, dev should match prod's behavior here. Adds a "Species Defaulting Rule" section to the system prompt that instructs the model to assume cow/buffalo when no animal is named, while explicitly preserving correct behavior when the user names a non-cattle species (e.g. row 37 "diseases in goats?"). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…60)

v1 (PR #59) placed the rule near the end of the prompt (char 29221 of 36450). Verification showed Gemma 4 31B IT still answered for goats on the castration question: RAG returned goat-dominant docs and the model anchored on them.

Iteration:

- Move the rule directly under ## Mission (top of prompt) so it gets more attention.
- Mark as (HIGH PRIORITY).
- Add explicit example ("castration → bull calves 6-9 months, NOT kids").
- Add rule for retrieval-dominated-by-other-species case: prefer cattle guidance even when docs lean toward goats/sheep.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* first question now only runs suggestions agent once
* increased SUGGESTIONS_WAIT_TIMEOUT_SECONDS

* added FE telemetry to Langfuse
* added auth and bounded inputs to telemetry endpoint
* telemetry: document ingest input-bound env keys in example.env

The 6 TELEMETRY_INGEST_MAX_* tunables added with the input-bounding fix were in config.py but missing from example.env. Document them (commented, with defaults) for parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: KDwevedi <kanav11dwevedi@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…urrently (#45)

* perf(fcm-auth): verify Firebase tokens against multiple projects concurrently

The README explicitly flags this hot-path latency issue: "/auth/webview-url validates FCM tokens by trying configured Firebase service accounts sequentially with dry_run=True, which adds avoidable per-request latency when multiple projects are configured."

verify_fcm_token() previously walked _firebase_apps in order, paying a full Firebase round-trip for every project that didn't own the token before reaching the one that did. With N projects this is O(N · T) worst-case latency on the auth path.

This change adds verify_fcm_token_async(), which schedules the per-app dry_run sends as concurrent threads (asyncio.to_thread + as_completed) and returns on first success, cancelling the rest. With N projects the worst case becomes O(T) when the user's token belongs to any configured project.

- The Firebase Admin SDK only ships a synchronous messaging.send, so each per-app check still goes through a worker thread; the async wrapper is just the coordination layer that lets them race.
- The original sync verify_fcm_token() is kept for back-compat (now delegates to a small _verify_against_app_sync helper) so any callers outside the FastAPI dependency chain keep working unchanged.
- require_fcm_token() drops its outer asyncio.to_thread wrap and calls verify_fcm_token_async() directly: one fewer thread hop per request.

tests/test_fcm_auth.py: 7 hermetic tests (Firebase mocked at the per-app primitive). Covers sync back-compat, async first-success, all-reject, no-apps-configured, and explicit timing assertions for parallelism (<0.30s for two 0.20s checks) and short-circuit (<0.20s when a fast acceptor races a 0.50s rejector).

No public API or response shape changes.

* fcm-auth: harden as_completed race against a task raising

If one per-app verification task raises an unexpected exception, the as_completed loop would propagate it and abort the race before other projects (which might accept the token) are observed. Wrap each await in try/except so 'any success wins' holds regardless; a real CancelledError of this coroutine still propagates (BaseException).

Adds a regression test: one task raises while another accepts -> True.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: KDwevedi <kanav11dwevedi@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
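
The race described in these two commits (asyncio.to_thread + as_completed, first success wins, a raising task does not abort the race) can be sketched roughly as follows. This is not the repository's verify_fcm_token_async(): the names verify_any, make_check, and timed are hypothetical, and the per-project Firebase dry_run send is replaced by a blocking stand-in that sleeps.

```python
import asyncio
import time
from typing import Callable, Sequence


def make_check(delay: float, outcome) -> Callable[[], bool]:
    """Build a blocking stand-in for one per-project check (hypothetical helper)."""
    def check() -> bool:
        time.sleep(delay)  # simulates the synchronous network round-trip
        if isinstance(outcome, Exception):
            raise outcome
        return outcome
    return check


async def verify_any(checks: Sequence[Callable[[], bool]]) -> bool:
    """Run all blocking checks in worker threads; return True on first success."""
    if not checks:
        return False
    tasks = [asyncio.create_task(asyncio.to_thread(c)) for c in checks]
    try:
        for next_done in asyncio.as_completed(tasks):
            try:
                if await next_done:
                    return True
            except Exception:
                # One failing check must not abort the race; CancelledError is a
                # BaseException and therefore still propagates.
                continue
        return False
    finally:
        # Stop waiting on the losers. The worker threads themselves cannot be
        # interrupted and simply finish in the background.
        for task in tasks:
            task.cancel()


async def timed(checks):
    """Return (result, elapsed seconds) for one verify_any call."""
    start = time.perf_counter()
    result = await verify_any(checks)
    return result, time.perf_counter() - start
```

Calling `asyncio.run(timed([make_check(0.5, False), make_check(0.05, True)]))` should return True well before the slow rejector finishes, which is the short-circuit property the timing tests in the commit assert.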

…65)

The chat UI (OAN-UI card-bubble) renders responses with react-markdown + remark-gfm but overrides only p/ol/ul/li/strong. Headings (#/##/###) flatten to body text, GFM tables render as unstyled smashed columns, and LaTeX ($\times$) / *** HR leak as raw text to the farmer.

Replaces the vague "No unnecessary headings" bullet with explicit constraints: bold/bullets/numbered lists/paragraphs only; no headings, tables, HR, or math. Use **bold:** labels instead of headings, bullets instead of tables, × instead of $\times$.

Eval evidence (sme_review_400 / Shridhar OSS eval): 40 chat rows emit ### headings, 5 emit GFM tables, chat51 leaks literal $\times$.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
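
A checker for the constrained Markdown subset could look something like the sketch below, for instance to count violations in eval output. It is not part of the PR: find_violations and the four regular expressions are hypothetical, and they are deliberately rough heuristics (the math pattern only catches inline `$...$` spans that contain a backslash command such as \times).

```python
import re

# Heuristic patterns for the constructs the prompt now forbids (assumed, not from the repo).
DISALLOWED = {
    # ATX headings: up to 3 spaces, 1-6 '#', then whitespace.
    "heading": re.compile(r"^ {0,3}#{1,6}[ \t]", re.MULTILINE),
    # GFM table delimiter row, e.g. "| --- | :-: |" or "---|---".
    "table": re.compile(
        r"^[ \t]*\|?[ \t]*:?-+:?[ \t]*(?:\|[ \t]*:?-+:?[ \t]*)+\|?[ \t]*$",
        re.MULTILINE,
    ),
    # Thematic break: three or more of the same -, * or _ on a line by themselves.
    "hr": re.compile(r"^[ \t]{0,3}([-*_])(?:[ \t]*\1){2,}[ \t]*$", re.MULTILINE),
    # Inline LaTeX containing a backslash command, e.g. $\times$.
    "math": re.compile(r"\$[^$\n]*\\[A-Za-z]+[^$\n]*\$"),
}


def find_violations(text: str) -> list[str]:
    """Return the sorted names of disallowed constructs found in text."""
    return sorted(name for name, pattern in DISALLOWED.items() if pattern.search(text))


bad = "### Dose\n\n| a | b |\n| --- | --- |\n\n***\n\n2 $\\times$ 3"
good = "**Dose:** 2 × 3 ml\n\n- first\n- second\n\n1. step"
bad_violations = find_violations(bad)
good_violations = find_violations(good)
```

The `good` sample uses only bold labels, bullets, numbered lists and paragraphs, so it should produce no violations, while `bad` trips all four patterns.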

…lines (#66)

Adds a per-session sticky router that sends OSS_PIPELINE_PCT% of sessions to the OSS pipeline (vLLM gemma agent + translategemma pre/post-translation, matching dev) while the rest stay on the legacy pipeline. Variant is deterministic by session_id hash, persisted in shared Redis, fail-safe to the deterministic hash on Redis error. With OSS_PIPELINE_PCT=0 every session is 'legacy' and behaviour is byte-identical to today.

- agents/models.py: additive OSS model factory (get_model_for_variant / provider_for_variant); never raises at import if OSS env absent.
- app/services/pipeline_router.py: sticky variant resolver.
- app/services/chat.py: per-request model + provider branch + variant langfuse tags; OSS implies translation pipeline.
- app/services/translation.py: per-request OSS vLLM pretranslation override (legacy path untouched when provider=None).
- app/config.py: OSS_PIPELINE_PCT (default 0) + OSS endpoint settings.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
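
As a rough illustration of the sticky split described above, the sketch below hashes the session_id into a bucket from 0 to 99, persists the first decision, and falls back to the hash if the store fails. Everything here is an assumption made for illustration: the function names, the SHA-256 bucket scheme, the key format, and the in-memory stand-ins for Redis are not taken from app/services/pipeline_router.py.

```python
import hashlib


def hash_variant(session_id: str, pct: int) -> str:
    """Deterministically map a session to 'oss' for roughly pct% of sessions."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return "oss" if bucket < pct else "legacy"


class InMemoryStore:
    """Stand-in for a shared key-value store such as Redis (hypothetical interface)."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str):
        return self._data.get(key)

    def set_if_absent(self, key: str, value: str) -> None:
        self._data.setdefault(key, value)


class BrokenStore:
    """Simulates an unreachable store."""

    def get(self, key):
        raise ConnectionError("store unavailable")

    def set_if_absent(self, key, value):
        raise ConnectionError("store unavailable")


def resolve_variant(session_id: str, pct: int, store) -> str:
    """Return the persisted variant, writing the hash decision on first sight."""
    fallback = hash_variant(session_id, pct)
    key = f"pipeline_variant:{session_id}"  # assumed key format
    try:
        store.set_if_absent(key, fallback)
        return store.get(key) or fallback
    except Exception:
        # Fail safe: without the store, fall back to the deterministic hash.
        return fallback
```

In this sketch a session that was assigned while the percentage was higher keeps its variant after the percentage is lowered, because the stored value wins; whether the real router behaves this way across percentage changes is not stated in the commit message.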

KDwevedi and others added 17 commits May 13, 2026 16:58

Merge pull request #54 from OpenAgriNet/feat/expand-pretranslation-gl…

e3edfd3

…ossary-v2 feat: expand pretranslation glossary +15 dairy terms

Merge pull request #55 from OpenAgriNet/feat/refine-pretranslation-gl…

eeebd6b

…ossary-v3 feat: refine pretranslation glossary v3 (fix v2 regressions)

Merge pull request #56 from OpenAgriNet/feat/refine-pretranslation-gl…

0b2a110

…ossary-v4 feat: refine pretranslation glossary v4 (close last issues)

Merge pull request #57 from OpenAgriNet/feat/pretranslation-no-ask-le…

4459ac6

…ak-v5 feat(pretranslation): filter type=ask + refine glossary (v5)

milk detals structure update+prompt (#61)

aeea9b4

first question now only runs suggestions agent once (#64)

baa43ed

* first question now only runs suggestions agent once
* increased SUGGESTIONS_WAIT_TIMEOUT_SECONDS

KDwevedi merged commit b76159d into amul-prod May 20, 2026

KDwevedi mentioned this pull request May 20, 2026

promote: moderation carve-out for ભાવફેર / PD / dividend explainers #73

Merged

KDwevedi merged 17 commits into amul-prod from main
