Adds 15 entries identified by Gemma 4 LLM-as-judge on a 400-pair pretranslation eval (7.5% error rate); also expands aliases on 2 existing entries. Includes the production-observed bichdan -> "seeding" mistranslation (correct: artificial insemination), plus other common Gujarati dairy idioms (vetar=in heat, tharvu=conceive, maati khasvi=prolapse, kaachi=fresh cow, kaandh aavvi=yoke sores, kapaasiyo=cottonseed cake, etc). Total: 14 -> 29 entries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ossary-v2 feat: expand pretranslation glossary +15 dairy terms
The v2 glossary expansion (PR #54) fixed 22 issues but caused 10 regressions. This v3 patch addresses the regression patterns:
- Drop the સંકર entry: 'સંકર' (crossbred) fuzzy-matched 'શંકર' (Shankar, a personal name) and over-rewrote 'Shankar cow' as 'crossbred'.
- Tighten ઉથલા: remove રેલી/રેલ aliases; those are used colloquially to mean 'buffalo' more often than 'repeat breeder'.
- Tighten વેતર: keep only the full phrases ('વેતરે આવેલ', 'ડુટો પાક્યો', 'હાંહ આવી'); the bare 'વેતર' was fuzzy-matching 'વેતરી' (first-parity calver), which has the opposite meaning.
- Broaden પાથરી: context-dependent; urinary context = stones, GI context = straining/tenesmus.
- Broaden કરમોડી: context-dependent; hoof context = foot rot, skin context = ringworm, otherwise = lameness.
New entries (2):
- હડકવા → rabies/hydrophobia (was being mis-translated as FMD or TB)
- દામ્યા → dehorning/disbudding (was being mis-translated as 'branding')
Total: 29 -> 30 entries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ossary-v3 feat: refine pretranslation glossary v3 (fix v2 regressions)
Iterative pass after PR #55. Re-eval showed a remaining error rate of 2.0% (8/400). This patch closes the last unique problem patterns:
- Add p-aaho entry: 'paho nahi mukti' = milk let-down failure (not nursing the calf), often when the calf has died; it was being mistranslated as bottle-nipple or social rejection.
- Add Shankar cow entry: preserve 'Shankar' as a regional breed proper noun; it was being normalized to 'Sahiwal' or 'crossbred'.
- Tighten karmodi: now defaults to 'lameness' unless explicit hoof / skin anatomical cues appear in the same sentence (was over-applying 'foot rot').
Total: 30 -> 32 entries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ossary-v4 feat: refine pretranslation glossary v4 (close last issues)
Two fixes:
1. Code: get_ambiguity_hints_for_query now takes include_ask=False, and
the pretranslation call site passes that flag. Without it, the
pretranslator was injecting the molhotu (udder) ask rule into its
own system prompt and dutifully appending the clarifying question to
the English translation (judge flagged this as a hallucinated
follow-up). The agent call site keeps include_ask=True (it actually
needs those rules to ask the question for real).
2. Glossary: refine 2 entries to fix v4 regressions seen in row 142
(paho) and row 166 (Shankar).
- paho: clarify that the literal meaning is 'not letting the calf
suckle / approach the udder', not just 'milk let-down', so the
translator can pick whichever phrasing fits the sentence.
- Shankar cow: narrow the trigger to require the noun phrase
(s-shankar gay or s-shankar gayo); drop the v-vachhardi alias.
Rule now spells out that bare s-shankar / sankar paired with
vachhardi or in breed-standards questions = generic 'crossbred',
while s-shankar+ goay (e.g. buying/selling a specific animal) =
the proper-noun Shankar cow.
Total: 32 entries (unchanged from v4, just refined).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
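A minimal sketch of the type=ask filter described in fix 1. Only the function name get_ambiguity_hints_for_query and the include_ask flag come from the commit message; the rule structure, the default value of the flag, the substring matching, and the sample rule texts are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmbiguityRule:
    term: str
    type: str  # "ask" = agent should ask a clarifying question; "hint" = translation note
    text: str


# Hypothetical rules; the real glossary entries are not shown in the commit.
RULES = [
    AmbiguityRule("molhotu", "ask", "Ask the farmer which part of the udder is affected."),
    AmbiguityRule("vetar", "hint", "Usually means 'in heat'."),
]


def get_ambiguity_hints_for_query(query, include_ask=True, rules=RULES):
    """Return hint texts for glossary terms found in the query.

    With include_ask=False, rules of type "ask" are dropped so that a
    translator prompt never receives an instruction to ask a question.
    """
    hints = []
    for rule in rules:
        if rule.term not in query:
            continue
        if rule.type == "ask" and not include_ask:
            continue
        hints.append(rule.text)
    return hints


# Agent call site: keeps the ask rules so it can ask the question for real.
agent_hints = get_ambiguity_hints_for_query("molhotu vetar")
# Pretranslation call site: filters them out.
translator_hints = get_ambiguity_hints_for_query("molhotu vetar", include_ask=False)
```

The point of putting the filter in the lookup function rather than in the prompt is that the translator never sees the clarifying-question instruction at all, so it has nothing to append to the English output.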
…ak-v5 feat(pretranslation): filter type=ask + refine glossary (v5)
…#58) Row-by-row pretranslation eval flagged 3 rows where TranslateGemma misread Gujarati dairy colloquialisms:
- વાવા / વાવાની → was rendered as proper name 'Vava'; should be 'calf'
- રેલી → was rendered as proper name 'Relli'; should be 'heifer'
- કાચી → an entry already exists, but the match failed on the 'ગાય X દિવસ કાચી' pattern (partial_ratio=66 < threshold=80). Added 7 gap-tolerant multi-word gu_terms so the existing rule fires.
Closes the last 3 wrong-verdict rows from pretranslation_per_row_grading_400.csv (voice 136, 154, 187).
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
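To illustrate why a pattern such as 'ગાય X દિવસ કાચી' can defeat a fixed similarity threshold, here is a small stdlib-only sketch of gap-tolerant matching. It is not the PR's implementation (the PR added extra multi-word gu_terms to the fuzzy matcher); the function name gap_tolerant_pattern and the max_gap_tokens parameter are hypothetical.

```python
import re


def gap_tolerant_pattern(term: str, max_gap_tokens: int = 3) -> re.Pattern:
    """Compile a multi-word term into a regex that allows a few
    intervening tokens between consecutive words of the term."""
    words = [re.escape(word) for word in term.split()]
    # Between two term words: whitespace, then up to max_gap_tokens
    # arbitrary tokens, each followed by whitespace.
    gap = r"\s+(?:\S+\s+){0,%d}" % max_gap_tokens
    return re.compile(gap.join(words))


pattern = gap_tolerant_pattern("ગાય કાચી")

# The two words of the term are separated by "10 દિવસ" in the utterance.
matched_with_gap = pattern.search("ગાય 10 દિવસ કાચી") is not None
matched_adjacent = pattern.search("ગાય કાચી") is not None
```

A regex like this trades recall for precision differently from a fuzzy ratio: it tolerates inserted words but not spelling variation, so in practice the two approaches would likely be combined.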
Goldenset row 4 ("What is the right age for castration?") was contradictory
between dev (answered for goats: 3-4 months) and prod (answered for
cattle: 6-9 months). Since Amul AI's primary audience is dairy farmers
and cattle/buffalo are the default operational context, dev should match
prod's behavior here.
Adds a "Species Defaulting Rule" section to the system prompt that
instructs the model to assume cow/buffalo when no animal is named,
while explicitly preserving correct behavior when the user names a
non-cattle species (e.g. row 37 "diseases in goats?").
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…60) v1 (PR #59) placed the rule near the end of the prompt (char 29221 of 36450). Verification showed Gemma 4 31B IT still answered for goats on the castration question: RAG returned goat-dominant docs and the model anchored on them.
Iteration:
- Move the rule directly under ## Mission (top of prompt) so it gets more attention.
- Mark as (HIGH PRIORITY).
- Add explicit example ("castration → bull calves 6-9 months, NOT kids").
- Add rule for retrieval-dominated-by-other-species case: prefer cattle guidance even when docs lean toward goats/sheep.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
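The repositioning step can be sketched as a small string operation. This assumes the system prompt is a single markdown string with "## " section headings; the helper insert_after_section, the sample prompt, and the exact rule wording are hypothetical and only paraphrase the commit message.

```python
SPECIES_RULE = (
    "## Species Defaulting Rule (HIGH PRIORITY)\n"
    "When no animal is named, assume cow/buffalo.\n"
    "Example: castration -> bull calves 6-9 months, NOT kids.\n"
    "If retrieved docs lean toward goats/sheep, still prefer cattle guidance."
)


def insert_after_section(prompt: str, heading: str, section: str) -> str:
    """Insert `section` at the end of the block opened by `heading`,
    i.e. just before the next '## ' heading (or at the end of the prompt)."""
    lines = prompt.split("\n")
    if heading not in lines:
        raise ValueError(f"heading {heading!r} not found")
    start = lines.index(heading)
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].startswith("## "):
            end = i
            break
    return "\n".join(lines[:end] + [section, ""] + lines[end:])


# Hypothetical, heavily shortened prompt.
base_prompt = "## Mission\nHelp dairy farmers.\n\n## Tools\nUse search."
new_prompt = insert_after_section(base_prompt, "## Mission", SPECIES_RULE)
```

Whether placement near the top actually changes model behaviour is an empirical question; the commit reports that the end-of-prompt placement did not work for this case.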
* first question now only runs suggestions agent once
* increased SUGGESTIONS_WAIT_TIMEOUT_SECONDS
* added FE telemetry to Langfuse
* added auth and bounded inputs to telemetry endpoint
* telemetry: document ingest input-bound env keys in example.env
The 6 TELEMETRY_INGEST_MAX_* tunables added with the input-bounding fix were in config.py but missing from example.env. Document them (commented, with defaults) for parity.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: KDwevedi <kanav11dwevedi@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…urrently (#45)
* perf(fcm-auth): verify Firebase tokens against multiple projects concurrently
The README explicitly flags this hot-path latency issue: "/auth/webview-url validates FCM tokens by trying configured Firebase service accounts sequentially with dry_run=True, which adds avoidable per-request latency when multiple projects are configured."
verify_fcm_token() previously walked _firebase_apps in order, paying a full Firebase round-trip for every project that didn't own the token before reaching the one that did. With N projects this is O(N · T) worst-case latency on the auth path.
This change adds verify_fcm_token_async(), which schedules the per-app dry_run sends as concurrent threads (asyncio.to_thread + as_completed) and returns on first success, cancelling the rest. With N projects the worst case becomes O(T) when the user's token belongs to any configured project.
- The Firebase Admin SDK only ships a synchronous messaging.send, so each per-app check still goes through a worker thread; the async wrapper is just the coordination layer that lets them race.
- The original sync verify_fcm_token() is kept for back-compat (now delegates to a small _verify_against_app_sync helper) so any callers outside the FastAPI dependency chain keep working unchanged.
- require_fcm_token() drops its outer asyncio.to_thread wrap and calls verify_fcm_token_async() directly: one fewer thread hop per request.
tests/test_fcm_auth.py: 7 hermetic tests (Firebase mocked at the per-app primitive). Covers sync back-compat, async first-success, all-reject, no-apps-configured, and explicit timing assertions for parallelism (<0.30s for two 0.20s checks) and short-circuit (<0.20s when a fast acceptor races a 0.50s rejector).
No public API or response shape changes.
* fcm-auth: harden as_completed race against a task raising
If one per-app verification task raises an unexpected exception, the as_completed loop would propagate it and abort the race before other projects (which might accept the token) are observed. Wrap each await in try/except so 'any success wins' holds regardless; a real CancelledError of this coroutine still propagates (BaseException).
Adds a regression test: one task raises while another accepts -> True.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: KDwevedi <kanav11dwevedi@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
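The race described in these two commits can be sketched as follows. This is a simplified stand-in under stated assumptions, not the repository's verify_fcm_token_async(): the names verify_token_any, fake_check, and FAKE_APPS are hypothetical, and the blocking Firebase dry_run send is replaced by a time.sleep so the example runs without any external service.

```python
import asyncio
import time

# Hypothetical per-app behaviour: (delay in seconds, outcome).
# An exception instance as outcome means the check raises.
FAKE_APPS = {
    "fast-accept": (0.05, True),
    "slow-reject": (0.30, False),
    "broken": (0.01, RuntimeError("unexpected SDK failure")),
}


def fake_check(app: str, token: str) -> bool:
    """Stand-in for the blocking per-app verification call."""
    delay, outcome = FAKE_APPS[app]
    time.sleep(delay)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


async def verify_token_any(token, apps, check) -> bool:
    """Run one blocking check per app in worker threads and return True
    as soon as any of them accepts the token."""
    if not apps:
        return False
    tasks = [asyncio.create_task(asyncio.to_thread(check, app, token)) for app in apps]
    try:
        for next_done in asyncio.as_completed(tasks):
            try:
                if await next_done:
                    return True
            except Exception:
                # One app failing unexpectedly must not abort the race.
                # CancelledError is a BaseException, so it still propagates.
                continue
        return False
    finally:
        # Cancel whatever is still pending; already-finished tasks ignore this.
        for task in tasks:
            task.cancel()


if __name__ == "__main__":
    print(asyncio.run(verify_token_any("tok", ["slow-reject", "fast-accept"], fake_check)))
```

Note that cancelling a task created from asyncio.to_thread only stops the coroutine from waiting; the worker thread itself runs to completion in the background, which is acceptable here because the checks are side-effect-free dry runs.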
…65) The chat UI (OAN-UI card-bubble) renders responses with react-markdown + remark-gfm but overrides only p/ol/ul/li/strong. Headings (#/##/###) flatten to body text, GFM tables render as unstyled smashed columns, and LaTeX ($\times$) / *** HR leak as raw text to the farmer. Replaces the vague "No unnecessary headings" bullet with explicit constraints: bold/bullets/numbered lists/paragraphs only; no headings, tables, HR, or math. Use **bold:** labels instead of headings, bullets instead of tables, × instead of $\times$. Eval evidence (sme_review_400 / Shridhar OSS eval): 40 chat rows emit ### headings, 5 emit GFM tables, chat51 leaks literal $\times$. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
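For eval runs, the constraints above could be checked mechanically. The following is a rough sketch only: the function forbidden_markdown and its regexes are hypothetical, are not part of the PR, and approximate rather than fully implement GFM parsing.

```python
import re

# Heuristic detectors for the constructs the chat UI cannot render.
CHECKS = {
    # ATX headings such as "### Feeding"
    "heading": re.compile(r"^ {0,3}#{1,6} ", re.MULTILINE),
    # GFM table delimiter row such as "| --- | --- |"
    "table": re.compile(r"^ *\|? *:?-{3,}:? *(\| *:?-{3,}:? *)+\|? *$", re.MULTILINE),
    # A line made only of ***, --- or ___
    "horizontal_rule": re.compile(r"^ {0,3}(?:\*{3,}|-{3,}|_{3,}) *$", re.MULTILINE),
    # Inline math containing a LaTeX command, e.g. $\times$
    "latex_math": re.compile(r"\$[^$\n]*\\[A-Za-z]+[^$\n]*\$"),
}


def forbidden_markdown(text: str) -> list:
    """Return the names of forbidden constructs found in a response."""
    return [name for name, pattern in CHECKS.items() if pattern.search(text)]


ok = forbidden_markdown("**Dose:** 2 × 5 ml\n- mix with feed\n1. repeat daily")
bad = forbidden_markdown("### Dose\n2 $\\times$ 5 ml")
```

Bold labels, bullets, and numbered lists pass, while headings, table delimiter rows, horizontal rules, and LaTeX math are reported by name.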
…lines (#66) Adds a per-session sticky router that sends OSS_PIPELINE_PCT% of sessions to the OSS pipeline (vLLM gemma agent + translategemma pre/post-translation, matching dev) while the rest stay on the legacy pipeline. Variant is deterministic by session_id hash, persisted in shared Redis, fail-safe to the deterministic hash on Redis error. With OSS_PIPELINE_PCT=0 every session is 'legacy' and behaviour is byte-identical to today.
- agents/models.py: additive OSS model factory (get_model_for_variant / provider_for_variant); never raises at import if OSS env absent.
- app/services/pipeline_router.py: sticky variant resolver.
- app/services/chat.py: per-request model + provider branch + variant langfuse tags; OSS implies translation pipeline.
- app/services/translation.py: per-request OSS vLLM pretranslation override (legacy path untouched when provider=None).
- app/config.py: OSS_PIPELINE_PCT (default 0) + OSS endpoint settings.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
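A minimal sketch of a sticky percentage router along the lines described above. It is not the code in app/services/pipeline_router.py: the function names, the 'oss' variant label, the 100-bucket scheme, and the get/set store interface are assumptions, and an in-memory class stands in for Redis.

```python
import hashlib


def hash_variant(session_id: str, oss_pct: int) -> str:
    """Deterministically map a session to 'oss' or 'legacy' by hash bucket."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return "oss" if bucket < oss_pct else "legacy"


def resolve_variant(session_id: str, oss_pct: int, store) -> str:
    """Return the persisted variant if there is one, else compute and persist it.
    Any store error falls back to the deterministic hash."""
    if oss_pct <= 0:
        # Assumption: a zero percentage switches routing off entirely.
        return "legacy"
    fallback = hash_variant(session_id, oss_pct)
    try:
        saved = store.get(session_id)
        if saved in ("oss", "legacy"):
            return saved
        store.set(session_id, fallback)
    except Exception:
        pass  # fail-safe: the hash gives the same answer on every replica
    return fallback


class DictStore:
    """In-memory stand-in for the shared store."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class BrokenStore:
    """Simulates the store being unreachable."""

    def get(self, key):
        raise ConnectionError("store unavailable")

    def set(self, key, value):
        raise ConnectionError("store unavailable")
```

Persisting the variant keeps a session on the same pipeline even if the percentage changes mid-conversation, while the hash fallback means an outage of the store degrades to consistent, rather than random, routing.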
Prod sync from main. 14 commits flowing in, no amul-prod-only work to preserve (verified origin/main..origin/amul-prod is empty content-wise).
Included
OSS_PIPELINE_PCT=0, prod behaviour unchanged. Validated end-to-end on dev (100% OSS clean, 100% legacy clean).
:pending)
Rollout
Will be deployed via ~/amul-infra/scripts/amul-oan-api-deploy.sh on prod VM3 after this merges.
Post-deploy verification
chat-production (no errors)
OSS_PIPELINE_PCT env unset (defaults 0) → all traces show variant:legacy