Asr fixes by jdye64 · Pull Request #2147 · NVIDIA/NeMo-Retriever

jdye64 · 2026-05-28T14:44:10Z

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

When source-branch is empty, use the version input as the git checkout ref so Helm and other artifacts match the tagged release instead of the workflow UI branch. Only overlay ci/scripts when workflow and source refs differ.
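
The ref selection described above can be sketched as a small decision function. This is an illustration only: the real logic lives in the release workflow, and the names `resolve_checkout`, `CheckoutPlan`, `source_branch`, `version`, and `workflow_ref` are assumptions rather than identifiers taken from the PR.

```python
from typing import NamedTuple


class CheckoutPlan(NamedTuple):
    ref: str                  # git ref to check out when building release artifacts
    overlay_ci_scripts: bool  # whether ci/scripts from the workflow ref is copied on top


def resolve_checkout(source_branch: str, version: str, workflow_ref: str) -> CheckoutPlan:
    """Hypothetical model of the release-ref rule described in the PR text."""
    # An empty source-branch falls back to the version input (the release tag).
    ref = source_branch.strip() or version.strip()
    # Overlay ci/scripts only when the workflow ran from a different ref.
    return CheckoutPlan(ref=ref, overlay_ci_scripts=(ref != workflow_ref))
```

With an empty `source_branch`, the plan checks out the version tag, so Helm and other artifacts come from the tagged release rather than from whichever branch was selected in the workflow UI.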

copy-pr-bot · 2026-05-28T14:44:14Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-05-28T14:49:24Z

Greptile Summary

This PR introduces configurable ASR inference mode (auto/online/offline) to the Parakeet client, aligns the Helm chart and docker-compose Parakeet NIM_TAGS_SELECTOR to the streaming profile (mode=str), parameterises the OCR NIM service name in the Helm chart, cleans up a file containing unresolved git merge conflict markers (nemotron-ocr-v2.yaml), and refactors _build_ingestor to unify service- and graph-mode extraction wiring behind a shared _attach_extract_stage helper.

- ASR infer-mode: AudioInferMode + resolve_audio_infer_mode() added to parakeet.py; ASRParams.audio_infer_mode wired end-to-end through asr_actor.py → ParakeetClient, with three new targeted test cases.
- Helm/docker-compose parity: NIM_TAGS_SELECTOR updated consistently in both surfaces; OCR nimServiceName externalised via values.yaml so configmap.yaml and NOTES.txt resolve dynamically.
- Service-mode ingestor refactor: _params_to_dict now uses exclude_unset=True to prevent client-side model defaults from tripping the server policy allowlist; _wire_client_stage_params and _filter_policy_allowed centralise that logic across dedup, embed, extract, and store stages.
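
To illustrate why forwarding only caller-set fields matters, here is a stdlib-only sketch. It imitates the effect of `exclude_unset=True` without using the project's actual models; `StageParams`, its default values, and `check_server_allowlist` are all hypothetical stand-ins, and the real allowlist behaviour on the server may differ.

```python
from typing import Any, Dict, Set


class StageParams:
    """Hypothetical params object with client-side defaults."""

    DEFAULTS = {"batch_size": 32, "model_name": "client-default"}

    def __init__(self, **explicit: Any) -> None:
        unknown = set(explicit) - set(self.DEFAULTS)
        if unknown:
            raise TypeError(f"unknown fields: {sorted(unknown)}")
        # Remember only what the caller actually passed.
        self._explicit = dict(explicit)

    def to_dict(self, exclude_unset: bool = False) -> Dict[str, Any]:
        if exclude_unset:
            return dict(self._explicit)
        # Without exclude_unset, defaults are merged in and forwarded too.
        return {**self.DEFAULTS, **self._explicit}


def check_server_allowlist(params: Dict[str, Any], allowlist: Set[str]) -> Dict[str, Any]:
    """Assumed server-side policy: reject any key that is not allowlisted."""
    rejected = sorted(set(params) - allowlist)
    if rejected:
        raise ValueError(f"fields not allowed by server policy: {rejected}")
    return params
```

In this sketch, `StageParams(batch_size=8).to_dict()` also carries the `model_name` default, which a server that only allows `batch_size` would reject, whereas `to_dict(exclude_unset=True)` forwards just the field the caller set.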

Confidence Score: 5/5

Safe to merge; all changes are additive or corrective with corresponding tests, and docker-compose/Helm parity is maintained.

The ASR infer-mode feature is well-tested with three targeted parametric cases and a clear fallback to streaming for the auto default. The Helm refactoring is conservative (externalising a hardcoded name with the same default). The exclude_unset=True change in _params_to_dict is intentional and validated by new regression tests. The only gap is that _split_config_for_input_type doesn't produce entries for txt/html, so text-chunk params would be silently dropped in service mode for those input types — but this path had no prior wiring at all, making it a missing feature rather than a regression.

nemo_retriever/src/nemo_retriever/pipeline/__main__.py — the _split_config_for_input_type function should be revisited to clarify whether the service API supports chunking for txt/html extraction modes.

Important Files Changed

| Filename | Overview |
| --- | --- |
| nemo_retriever/src/nemo_retriever/api/internal/primitives/nim/model_interface/parakeet.py | Adds `AudioInferMode` type alias, `resolve_audio_infer_mode()`, and `infer_mode` parameter to `ParakeetClient` and `create_audio_inference_client`, enabling offline Riva RPC alongside the existing streaming path. Logic is clean; the `_StreamingResponseShim` return shape is compatible with the offline `RecognizeResponse`. |
| nemo_retriever/src/nemo_retriever/pipeline/__main__.py | Refactors `_build_ingestor` to unify service and graph extraction paths via `_attach_extract_stage`; adds service-mode caption wiring with a helpful `--caption-invoke-url` warning. Gap: `_split_config_for_input_type` doesn't handle `txt`/`html`, silently dropping chunk params in service mode for those input types. |
| nemo_retriever/src/nemo_retriever/service_ingestor.py | Introduces `_wire_client_stage_params` and `_filter_policy_allowed` helpers; changes `_params_to_dict` to `exclude_unset=True` so only caller-set fields are forwarded to the service, preventing model defaults from tripping the server allowlist. Clean refactoring with well-documented intent. |
| nemo_retriever/helm/templates/nims/nemotron-ocr-v2.yaml | File deleted; it contained unresolved git merge conflict markers and was not usable; correct cleanup. |
| nemo_retriever/helm/values.yaml | Adds `nimServiceName: nemotron-ocr-v1` under `nimOperator.ocr` and updates the Parakeet `NIM_TAGS_SELECTOR` from `mode=ofl` to `mode=str,vad=default,diarizer=disabled`, matching the docker-compose.yaml change. |
| nemo_retriever/tests/test_parakeet_infer_mode.py | New test file covering `resolve_audio_infer_mode` resolution, rejection of unknown modes, and all three dispatch paths (streaming self-hosted, streaming NVCF, offline explicit). Good coverage for the new infer-mode feature. |

Sequence Diagram

sequenceDiagram
    participant CLI as CLI / __main__.py
    participant AI as _attach_extract_stage
    participant SI as ServiceIngestor
    participant WCS as _wire_client_stage_params
    participant PC as ParakeetClient
    participant Riva as Riva gRPC

    CLI->>AI: run_mode, input_type, extract_params, infer_mode, ...
    AI->>SI: extract(extract_params, split_config, extraction_mode)
    SI->>WCS: "merge params + filter allowlist (exclude_unset=True)"
    WCS-->>SI: policy-filtered params dict
    SI-->>CLI: ServiceIngestor (fluent)

    Note over CLI,PC: Audio path (local graph mode)
    CLI->>PC: transcribe(audio_content)
    alt "infer_mode == offline"
        PC->>Riva: offline_recognize(mono_audio_bytes, config)
        Riva-->>PC: RecognizeResponse
    else "infer_mode == online / auto"
        PC->>Riva: StreamingRecognize(chunked PCM)
        Riva-->>PC: streaming results
        PC-->>PC: _StreamingResponseShim(.results)
    end
    PC-->>CLI: response
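
The audio branch of the diagram can be approximated with stubs. The sketch below does not call the real Riva client; `FakeRivaService`, `OfflineResponse`, `StreamingResponseShim`, `chunked`, and `transcribe` are hypothetical names used only to show how a streaming result can be wrapped so that callers read `.results` the same way in both modes.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class OfflineResponse:
    """Stand-in for the offline response: exposes a `.results` list."""
    results: List[str]


@dataclass
class StreamingResponseShim:
    """Collects streamed results behind the same `.results` attribute."""
    results: List[str] = field(default_factory=list)


class FakeRivaService:
    """Stub service for illustration; not the real Riva gRPC API."""

    def offline_recognize(self, audio: bytes) -> OfflineResponse:
        return OfflineResponse(results=[f"offline:{len(audio)}"])

    def streaming_recognize(self, chunks: Iterable[bytes]) -> Iterator[str]:
        for chunk in chunks:
            yield f"stream:{len(chunk)}"


def chunked(audio: bytes, size: int) -> Iterator[bytes]:
    """Split audio bytes into fixed-size chunks (the last one may be shorter)."""
    for start in range(0, len(audio), size):
        yield audio[start:start + size]


def transcribe(service: FakeRivaService, audio: bytes,
               infer_mode: str = "auto", chunk_size: int = 4):
    if infer_mode == "offline":
        return service.offline_recognize(audio)
    # "online" and "auto" both take the streaming path in this sketch.
    shim = StreamingResponseShim()
    for result in service.streaming_recognize(chunked(audio, chunk_size)):
        shim.results.append(result)
    return shim
```

Because both return types expose `.results`, downstream code in this sketch does not need to know which RPC was used.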

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
nemo_retriever/src/nemo_retriever/pipeline/__main__.py:531-543
**`txt`/`html` split_config silently dropped in service mode**

`_split_config_for_input_type` handles `pdf`/`doc`, `audio`, and `video`, but returns `None` for `txt` and `html`. When a user runs `--input-type txt --enable-text-chunk` in service mode, `chunk_dict` is computed correctly from `_service_text_chunk_dict`, but `_split_config_for_input_type("txt", chunk_dict)` returns `None`, so `ServiceIngestor.extract()` is called with `split_config=None` and the chunking params are silently discarded without any error or warning. If the service API supports `split_config` for text/html extraction (with a key like `"text"` or `"html"`), these cases should be added here; otherwise a warning or `NotImplementedError` should be raised so the caller knows the option has no effect in this mode.

Reviews (2): Last reviewed commit: "Revert back to nemotron-ocr-v1 since nem..."
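
One way to address the issue above, if the service does not support chunking for `txt`/`html`, is to warn instead of returning `None` silently. The following is a sketch under assumptions, not the function from the PR: the return shape (a dict keyed by input type), the set of supported types, and the `max_tokens` key used in the usage note are all hypothetical.

```python
import warnings
from typing import Any, Dict, Optional

# Input types the review says are handled; the exact split_config keys are assumed.
_SUPPORTED_SPLIT_TYPES = {"pdf", "doc", "audio", "video"}


def split_config_for_input_type(
    input_type: str, chunk_dict: Optional[Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """Hypothetical variant that surfaces dropped chunking options."""
    if not chunk_dict:
        return None  # nothing to forward, so nothing can be lost
    if input_type in _SUPPORTED_SPLIT_TYPES:
        return {input_type: chunk_dict}
    # Make the no-op visible instead of discarding the options silently.
    warnings.warn(
        f"chunking options are ignored for input type {input_type!r} in service mode",
        UserWarning,
        stacklevel=2,
    )
    return None
```

Calling `split_config_for_input_type("txt", {"max_tokens": 512})` still returns `None` here, but the caller now sees a `UserWarning`; raising `NotImplementedError` instead would be the stricter alternative mentioned in the review.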

greptile-apps · 2026-05-28T14:49:29Z

+def resolve_audio_infer_mode(mode: str, endpoint: str) -> ResolvedAudioInferMode:
+    """Pick offline vs streaming Riva RPC for a Parakeet endpoint.
+
+    NVCF (``grpc.nvcf.nvidia.com``) and the Helm chart Parakeet NIM (``mode=str``)
+    register streaming (online) models. Use ``audio_infer_mode='offline'`` only when
+    the NIM was deployed with an offline profile (``mode=ofl``).
+    """
+    normalized = (mode or "auto").lower()
+    if normalized == "online":
+        return "online"
+    if normalized == "offline":
+        return "offline"
+    if normalized != "auto":
+        raise ValueError(f"audio_infer_mode must be 'auto', 'online', or 'offline', got {mode!r}")
+    return "online"


Dead endpoint parameter in public function

resolve_audio_infer_mode accepts endpoint: str but never uses it — the auto branch unconditionally returns "online" regardless of the endpoint value. This misleads callers (and the docstring, which explicitly mentions NVCF endpoint detection) into believing the endpoint influences the decision. Any code passing a different endpoint expecting different behavior will be silently surprised. If endpoint-based auto-detection is planned for the future, a # TODO comment would clarify intent; if it was intentionally dropped, the parameter should be removed from the signature to keep the public contract honest.

Rule Used: Every public class and function in nemo_retriever ... (source)

Prompt To Fix With AI

This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/api/internal/primitives/nim/model_interface/parakeet.py
Line: 56-70

Comment:
**Dead `endpoint` parameter in public function**

`resolve_audio_infer_mode` accepts `endpoint: str` but never uses it: the `auto` branch unconditionally returns `"online"` regardless of the endpoint value. This misleads callers (and the docstring, which explicitly mentions NVCF endpoint detection) into believing the endpoint influences the decision. Any code passing a different endpoint expecting different behavior will be silently surprised. If endpoint-based auto-detection is planned for the future, a `# TODO` comment would clarify intent; if it was intentionally dropped, the parameter should be removed from the signature to keep the public contract honest.

**Rule Used:** Every public class and function in nemo_retriever ... ([source](https://app.greptile.com/review/custom-context?memory=public-api-contract))

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
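
If the parameter is dropped rather than implemented, the function could look roughly like the sketch below. This is not the change merged in the PR; the definition of `ResolvedAudioInferMode` as a `Literal` alias is an assumption, since the diff only shows it used as a return annotation.

```python
from typing import Literal, Optional

# Assumed definition; the diff only shows the name in the return annotation.
ResolvedAudioInferMode = Literal["online", "offline"]


def resolve_audio_infer_mode(mode: Optional[str]) -> ResolvedAudioInferMode:
    """Pick offline vs streaming Riva RPC, without an unused endpoint argument."""
    normalized = (mode or "auto").lower()
    if normalized in ("online", "auto"):
        # "auto" resolves to streaming, matching the behaviour in the diff.
        return "online"
    if normalized == "offline":
        return "offline"
    raise ValueError(
        f"audio_infer_mode must be 'auto', 'online', or 'offline', got {mode!r}"
    )
```

Removing `endpoint` changes the public signature, so every call site would need updating; keeping the parameter with a `# TODO` comment is the less invasive option the review also mentions.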

greptile-apps · 2026-05-28T14:49:41Z

@@ -1,25 +1,10 @@
 {{- if and (and (.Capabilities.APIVersions.Has "apps.nvidia.com/v1alpha1") .Values.nims.enabled) (eq .Values.nimOperator.ocr.enabled true) -}}
-<<<<<<< HEAD
-{{- $name := .Values.nimOperator.ocr.nimServiceName | default "nemotron-ocr-v2" -}}
+{{- $name := .Values.nimOperator.ocr.nimServiceName | default "nemotron-ocr-v1" -}}


The file is named nemotron-ocr-v2.yaml, yet the default fallback for $name is "nemotron-ocr-v1". With values.yaml now explicitly setting nimServiceName: nemotron-ocr-v1, the default is rarely hit, but anyone deploying the Helm chart without setting nimServiceName will get a resource named nemotron-ocr-v1 from a template file named v2, which is confusing. The default should match what the file's name implies, or the file should be renamed.

Suggested change

-{{- $name := .Values.nimOperator.ocr.nimServiceName | default "nemotron-ocr-v1" -}}
+{{- $name := .Values.nimOperator.ocr.nimServiceName | default "nemotron-ocr-v2" -}}

Prompt To Fix With AI

This is a comment left during a code review.
Path: nemo_retriever/helm/templates/nims/nemotron-ocr-v2.yaml
Line: 2

Comment:
The file is named `nemotron-ocr-v2.yaml`, yet the default fallback for `$name` is `"nemotron-ocr-v1"`. With `values.yaml` now explicitly setting `nimServiceName: nemotron-ocr-v1`, the default is rarely hit, but anyone deploying the Helm chart without setting `nimServiceName` will get a resource named `nemotron-ocr-v1` from a template file named `v2`, which is confusing. The default should match what the file's name implies, or the file should be renamed.

```suggestion
{{- $name := .Values.nimOperator.ocr.nimServiceName | default "nemotron-ocr-v2" -}}
```

How can I resolve this? If you propose a fix, please make it concise.

jdye64 added 2 commits May 28, 2026 14:41

fix(ci): checkout release ref from version in Perform Release

69b2477

When source-branch is empty, use the version input as the git checkout ref so Helm and other artifacts match the tagged release instead of the workflow UI branch. Only overlay ci/scripts when workflow and source refs differ.

Helm fixes

3d9ef90

jdye64 requested review from a team as code owners May 28, 2026 14:44

jdye64 requested review from charlesbluca and removed request for a team May 28, 2026 14:44

greptile-apps Bot reviewed May 28, 2026

Revert back to nemotron-ocr-v1 since nemotron-ocr-v2 is having issues downloading the model profile from NGC

559579d

jperez999 approved these changes May 28, 2026

jdye64 merged commit 2df85c5 into NVIDIA:26.05 May 28, 2026
8 checks passed

jdye64 merged 3 commits into NVIDIA:26.05 from jdye64:asr-fixes
