Production Deployment 20-05-2026#67

Merged
KDwevedi merged 17 commits into amul-prod from main on May 20, 2026

Conversation

@KDwevedi
Collaborator

Prod sync from main. 14 commits flowing in, no amul-prod-only work to preserve (verified origin/main..origin/amul-prod is empty content-wise).

Included

Rollout

Will be deployed via ~/amul-infra/scripts/amul-oan-api-deploy.sh on prod VM3 after this merges.

Post-deploy verification

  • Health probe responds 200
  • Anonymous traffic in Langfuse chat-production (no errors)
  • OSS_PIPELINE_PCT env unset (defaults 0) → all traces show variant:legacy

KDwevedi and others added 17 commits May 13, 2026 16:58
Adds 15 entries identified by Gemma 4 LLM-as-judge on a 400-pair
pretranslation eval (7.5% error rate); also expands aliases on
2 existing entries.

Includes the production-observed bichdan -> "seeding" mistranslation
(correct: artificial insemination), plus other common Gujarati dairy
idioms (vetar=in heat, tharvu=conceive, maati khasvi=prolapse,
kaachi=fresh cow, kaandh aavvi=yoke sores, kapaasiyo=cottonseed
cake, etc).

Total: 14 -> 29 entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ossary-v2

feat: expand pretranslation glossary +15 dairy terms
The v2 glossary expansion (PR #54) fixed 22 issues but caused 10 regressions.
This v3 patch addresses the regression patterns:

- Drop the 'સંકર' (sankar) entry: 'સંકર' (crossbred) fuzzy-matched 'શંકર'
  (Shankar, a personal name) and over-rewrote 'Shankar cow' as 'crossbred'.
- Tighten ઉથલા: remove રેલી/રેલ aliases — those are used colloquially
  to mean 'buffalo' more often than 'repeat breeder'.
- Tighten વેતર: keep only the full phrases ('વેતરે આવેલ',
  'ડુટો પાક્યો', 'હાંહ આવી'); the bare 'વેતર' was fuzzy-matching
  'વેતરી' (first-parity calver), which has the opposite meaning.
- Broaden પાથરી: context-dependent — urinary context = stones,
  GI context = straining/tenesmus.
- Broaden કરમોડી: context-dependent — hoof context = foot rot,
  skin context = ringworm, otherwise = lameness.

New entries (2):
- હડકવા → rabies/hydrophobia (was being mis-translated as FMD or TB)
- દામ્યા → dehorning/disbudding (was being mis-translated as 'branding')

Total: 29 -> 30 entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ossary-v3

feat: refine pretranslation glossary v3 (fix v2 regressions)
Iterative pass after PR #55. Re-eval showed error rate 2.0 percent
remaining (8/400). This patch closes the last unique problem patterns:

- Add p-aaho entry: 'paho nahi mukti' = milk let-down failure
  (not nursing the calf), often when calf has died — was being
  mistranslated as bottle-nipple or social rejection.
- Add Shankar cow entry: preserve 'Shankar' as a regional breed
  proper noun — was being normalized to 'Sahiwal' or 'crossbred'.
- Tighten karmodi: now defaults to 'lameness' unless explicit
  hoof / skin anatomical cues appear in the same sentence
  (was over-applying 'foot rot').

Total: 30 -> 32 entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ossary-v4

feat: refine pretranslation glossary v4 (close last issues)
Two fixes:

1. Code: get_ambiguity_hints_for_query now takes include_ask=False, and
   the pretranslation call site passes that flag. Without it, the
   pretranslator was injecting the molhotu (udder) ask rule into its
   own system prompt and dutifully appending the clarifying question to
   the English translation (judge flagged this as a hallucinated
   follow-up). The agent call site keeps include_ask=True (it actually
   needs those rules to ask the question for real).

2. Glossary: refine 2 entries to fix v4 regressions seen in row 142
   (paho) and row 166 (Shankar).
   - paho: clarify that the literal meaning is 'not letting the calf
     suckle / approach the udder', not just 'milk let-down', so the
     translator can pick whichever phrasing fits the sentence.
   - Shankar cow: narrow the trigger to require the noun phrase
     (s-shankar gay or s-shankar gayo); drop the v-vachhardi alias.
     Rule now spells out that bare s-shankar / sankar paired with
     vachhardi or in breed-standards questions = generic 'crossbred',
     while s-shankar + gay (e.g. buying/selling a specific animal) =
     the proper-noun Shankar cow.

Total: 32 entries (unchanged from v4, just refined).
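
A minimal sketch of the `include_ask` filter described in fix 1. Only the function name and the `include_ask` flag come from the commit message. The `Hint` dataclass, the `HINTS` table, the substring matching, the hint texts, and the `True` default are illustrative assumptions, not the repository's actual code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hint:
    term: str   # trigger term (transliterated here for readability)
    kind: str   # "translate" = rewrite rule, "ask" = clarifying-question rule
    text: str   # instruction injected into the system prompt


# Hypothetical glossary rows; the real glossary format is not shown in this PR.
HINTS = [
    Hint("bichdan", "translate", "'bichdan' means artificial insemination."),
    Hint("molhotu", "ask", "If 'molhotu' is ambiguous, ask which part of the udder."),
]


def get_ambiguity_hints_for_query(query: str, include_ask: bool = True) -> List[str]:
    """Return hint texts whose term occurs in the query.

    With include_ask=False, rules of kind "ask" are dropped so a translator
    prompt never receives an instruction to ask a follow-up question.
    """
    return [
        hint.text
        for hint in HINTS
        if hint.term in query and (include_ask or hint.kind != "ask")
    ]


# Pretranslation call site: translation rules only.
pretranslation_hints = get_ambiguity_hints_for_query("molhotu bichdan", include_ask=False)
# Agent call site: keeps the ask rules so it can pose the question itself.
agent_hints = get_ambiguity_hints_for_query("molhotu bichdan", include_ask=True)
```

Filtering at the call site keeps a single glossary as the source of truth while letting each consumer choose which rule types it can act on.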

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ak-v5

feat(pretranslation): filter type=ask + refine glossary (v5)
…#58)

Row-by-row pretranslation eval flagged 3 rows where TranslateGemma
misread Gujarati dairy colloquialisms:

- વાવા / વાવાની → was rendered as proper name 'Vava'; should be 'calf'
- રેલી → was rendered as proper name 'Relli'; should be 'heifer'
- કાચી → existing entry exists but match failed on 'ગાય X દિવસ કાચી'
  pattern (partial_ratio=66 < threshold=80). Added 7 gap-tolerant
  multi-word gu_terms so the existing rule fires.

Closes the last 3 wrong-verdict rows from
pretranslation_per_row_grading_400.csv (voice 136, 154, 187).
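
To illustrate why a gapped phrase can fall below a fuzzy-match threshold, here is a rough sketch of a sliding-window partial ratio built on the standard library's `difflib`. It is a simplified stand-in, not the scorer the project uses, so its scores will not reproduce the 66 quoted above. The Latin-script strings are hypothetical placeholders for the Gujarati terms.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(term: str, text: str) -> float:
    """Best similarity (0-100) between the shorter string and any
    same-length window of the longer string."""
    short, long_ = (term, text) if len(term) <= len(text) else (text, term)
    if not short:
        return 0.0
    best = 0.0
    for start in range(len(long_) - len(short) + 1):
        window = long_[start:start + len(short)]
        best = max(best, SequenceMatcher(None, short, window).ratio())
    return 100.0 * best


THRESHOLD = 80  # threshold value quoted in the commit message

# Contiguous phrase: some window equals the term exactly.
contiguous = partial_ratio_sketch("gay kachi", "mari gay kachi chhe")

# Words inserted between the two tokens: no window contains the whole term,
# so the score drops and may fall under THRESHOLD.
gapped = partial_ratio_sketch("gay kachi", "gay 10 divas kachi")
```

Because the window is as long as the term, any words inserted in the middle push part of the term out of every window. Registering additional multi-word terms that already contain the gap pattern is one way to get the rule to fire again.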

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Goldenset row 4 ("What is the right age for castration?") was contradictory
between dev (answered for goats: 3-4 months) and prod (answered for
cattle: 6-9 months). Since Amul AI's primary audience is dairy farmers
and cattle/buffalo are the default operational context, dev should match
prod's behavior here.

Adds a "Species Defaulting Rule" section to the system prompt that
instructs the model to assume cow/buffalo when no animal is named,
while explicitly preserving correct behavior when the user names a
non-cattle species (e.g. row 37 "diseases in goats?").

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…60)

v1 (PR #59) placed the rule near the end of the prompt (char 29221 of
36450). Verification showed Gemma 4 31B IT still answered for goats on
the castration question — RAG returned goat-dominant docs and the model
anchored on them.

Iteration:
- Move the rule directly under ## Mission (top of prompt) so it gets
  more attention.
- Mark as (HIGH PRIORITY).
- Add explicit example ("castration → bull calves 6-9 months, NOT kids").
- Add rule for retrieval-dominated-by-other-species case: prefer cattle
  guidance even when docs lean toward goats/sheep.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* first question now only runs suggestions agent once

* increased SUGGESTIONS_WAIT_TIMEOUT_SECONDS
* added FE telemetry to Langfuse

* added auth and bounded inputs to telemetry endpoint

* telemetry: document ingest input-bound env keys in example.env

The 6 TELEMETRY_INGEST_MAX_* tunables added with the input-bounding
fix were in config.py but missing from example.env. Document them
(commented, with defaults) for parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: KDwevedi <kanav11dwevedi@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…urrently (#45)

* perf(fcm-auth): verify Firebase tokens against multiple projects concurrently

The README explicitly flags this hot-path latency issue:
  "/auth/webview-url validates FCM tokens by trying configured Firebase
   service accounts sequentially with dry_run=True, which adds avoidable
   per-request latency when multiple projects are configured."

verify_fcm_token() previously walked _firebase_apps in order, paying a
full Firebase round-trip for every project that didn't own the token
before reaching the one that did. With N projects this is O(N · T)
worst-case latency on the auth path.

This change adds verify_fcm_token_async() which schedules the per-app
dry_run sends as concurrent threads (asyncio.to_thread + as_completed)
and returns on first success, cancelling the rest. With N projects the
worst case becomes O(T) when the user's token belongs to any configured
project.

- The Firebase Admin SDK only ships a synchronous messaging.send, so
  each per-app check still goes through a worker thread; the async
  wrapper is just the coordination layer that lets them race.
- The original sync verify_fcm_token() is kept for back-compat (now
  delegates to a small _verify_against_app_sync helper) so any callers
  outside the FastAPI dependency chain keep working unchanged.
- require_fcm_token() drops its outer asyncio.to_thread wrap and calls
  verify_fcm_token_async() directly — one fewer thread hop per request.

tests/test_fcm_auth.py: 7 hermetic tests (Firebase mocked at the
per-app primitive). Covers sync back-compat, async first-success,
all-reject, no-apps-configured, and explicit timing assertions for
parallelism (<0.30s for two 0.20s checks) and short-circuit
(<0.20s when a fast acceptor races a 0.50s rejector).

No public API or response shape changes.

* fcm-auth: harden as_completed race against a task raising

If one per-app verification task raises an unexpected exception, the
as_completed loop would propagate it and abort the race before other
projects (which might accept the token) are observed. Wrap each
await in try/except so 'any success wins' holds regardless; a real
CancelledError of this coroutine still propagates (BaseException).

Adds a regression test: one task raises while another accepts -> True.
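
The race described in these two commits can be sketched as follows. This is an illustrative reduction under stated assumptions: `verify_any` and the zero-argument `checks` callables are hypothetical names standing in for the per-app `dry_run` sends, and no Firebase code is involved.

```python
import asyncio
import time
from typing import Callable, Sequence


async def verify_any(checks: Sequence[Callable[[], bool]]) -> bool:
    """Run blocking checks in worker threads; True on the first success."""
    if not checks:
        return False
    tasks = [asyncio.create_task(asyncio.to_thread(check)) for check in checks]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                if await finished:
                    return True  # first acceptor wins
            except Exception:
                # A check that raises must not abort the race. CancelledError
                # is a BaseException, so cancellation of this coroutine still
                # propagates.
                continue
        return False
    finally:
        # Stop waiting on the remaining tasks. Worker threads that are
        # already running cannot be interrupted and finish in the background.
        for task in tasks:
            task.cancel()
```

Calling `await verify_any([check_a, check_b])` returns as soon as either check accepts, so the wait is bounded by the fastest acceptor rather than by the sum of all round-trips.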

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: KDwevedi <kanav11dwevedi@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…65)

The chat UI (OAN-UI card-bubble) renders responses with
react-markdown + remark-gfm but overrides only p/ol/ul/li/strong.
Headings (#/##/###) flatten to body text, GFM tables render as
unstyled smashed columns, and LaTeX ($\times$) / *** HR leak as
raw text to the farmer.

Replaces the vague "No unnecessary headings" bullet with explicit
constraints: bold/bullets/numbered lists/paragraphs only; no
headings, tables, HR, or math. Use **bold:** labels instead of
headings, bullets instead of tables, × instead of $\times$.

Eval evidence (sme_review_400 / Shridhar OSS eval): 40 chat rows
emit ### headings, 5 emit GFM tables, chat51 leaks literal $\times$.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…lines (#66)

Adds a per-session sticky router that sends OSS_PIPELINE_PCT% of sessions to
the OSS pipeline (vLLM gemma agent + translategemma pre/post-translation,
matching dev) while the rest stay on the legacy pipeline. Variant is
deterministic by session_id hash, persisted in shared Redis, fail-safe to the
deterministic hash on Redis error. With OSS_PIPELINE_PCT=0 every session is
'legacy' and behaviour is byte-identical to today.

- agents/models.py: additive OSS model factory (get_model_for_variant /
  provider_for_variant); never raises at import if OSS env absent.
- app/services/pipeline_router.py: sticky variant resolver.
- app/services/chat.py: per-request model + provider branch + variant
  langfuse tags; OSS implies translation pipeline.
- app/services/translation.py: per-request OSS vLLM pretranslation override
  (legacy path untouched when provider=None).
- app/config.py: OSS_PIPELINE_PCT (default 0) + OSS endpoint settings.
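
A minimal sketch of the sticky-variant idea, assuming a SHA-256 bucket in the range 0-99 and a dict-like `store` standing in for shared Redis. The names `hash_variant`, `resolve_variant`, the key format, and the `"oss"` label are assumptions for illustration; only `OSS_PIPELINE_PCT`, the `legacy` variant, and the fall-back-to-hash behaviour come from the description above.

```python
import hashlib
from typing import MutableMapping, Optional

VARIANTS = ("legacy", "oss")


def hash_variant(session_id: str, oss_pct: int) -> str:
    """Deterministically map a session to a variant by hashing its id."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return "oss" if bucket < oss_pct else "legacy"


def resolve_variant(
    session_id: str,
    oss_pct: int,
    store: Optional[MutableMapping[str, str]] = None,
) -> str:
    """Return the sticky variant for a session.

    A stored value wins so a session keeps its variant if the percentage
    changes later; any store error falls back to the deterministic hash.
    """
    if oss_pct <= 0:
        return "legacy"  # with 0%, every session stays on the legacy pipeline
    fallback = hash_variant(session_id, oss_pct)
    if store is None:
        return fallback
    key = f"pipeline_variant:{session_id}"  # hypothetical key format
    try:
        cached = store.get(key)
        if cached in VARIANTS:
            return cached
        store[key] = fallback
    except Exception:
        pass  # fail safe: behave as if the store were absent
    return fallback
```

Because the fallback is a pure function of `session_id` and the percentage, a session resolved during a store outage gets the same answer the store would have recorded for it, provided the percentage has not changed since.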

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

3 participants