Skip to content

Asr fixes#2147

Merged
jdye64 merged 3 commits into
NVIDIA:26.05from
jdye64:asr-fixes
May 28, 2026
Merged

Asr fixes#2147
jdye64 merged 3 commits into
NVIDIA:26.05from
jdye64:asr-fixes

Conversation

@jdye64
Copy link
Copy Markdown
Collaborator

@jdye64 jdye64 commented May 28, 2026

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

jdye64 added 2 commits May 28, 2026 14:41
When source-branch is empty, use the version input as the git checkout
ref so Helm and other artifacts match the tagged release instead of the
workflow UI branch. Only overlay ci/scripts when workflow and source refs differ.
@jdye64 jdye64 requested review from a team as code owners May 28, 2026 14:44
@jdye64 jdye64 requested review from charlesbluca and removed request for a team May 28, 2026 14:44
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 28, 2026

Greptile Summary

This PR introduces configurable ASR inference mode (auto/online/offline) to the Parakeet client, aligns the Helm chart and docker-compose Parakeet NIM_TAGS_SELECTOR to the streaming profile (mode=str), parameterises the OCR NIM service name in the Helm chart, cleans up a file containing unresolved git merge conflict markers (nemotron-ocr-v2.yaml), and refactors _build_ingestor to unify service- and graph-mode extraction wiring behind a shared _attach_extract_stage helper.

  • ASR infer-mode: AudioInferMode + resolve_audio_infer_mode() added to parakeet.py; ASRParams.audio_infer_mode wired end-to-end through asr_actor.pyParakeetClient, with three new targeted test cases.
  • Helm/docker-compose parity: NIM_TAGS_SELECTOR updated consistently in both surfaces; OCR nimServiceName externalised via values.yaml so configmap.yaml and NOTES.txt resolve dynamically.
  • Service-mode ingestor refactor: _params_to_dict now uses exclude_unset=True to prevent client-side model defaults from tripping the server policy allowlist; _wire_client_stage_params and _filter_policy_allowed centralise that logic across dedup, embed, extract, and store stages.

Confidence Score: 5/5

Safe to merge; all changes are additive or corrective with corresponding tests, and docker-compose/Helm parity is maintained.

The ASR infer-mode feature is well-tested with three targeted parametric cases and a clear fallback to streaming for the auto default. The Helm refactoring is conservative (externalising a hardcoded name with the same default). The exclude_unset=True change in _params_to_dict is intentional and validated by new regression tests. The only gap is that _split_config_for_input_type doesn't produce entries for txt/html, so text-chunk params would be silently dropped in service mode for those input types — but this path had no prior wiring at all, making it a missing feature rather than a regression.

nemo_retriever/src/nemo_retriever/pipeline/__main__.py — the _split_config_for_input_type function should be revisited to clarify whether the service API supports chunking for txt/html extraction modes.

Important Files Changed

Filename Overview
nemo_retriever/src/nemo_retriever/api/internal/primitives/nim/model_interface/parakeet.py Adds AudioInferMode type alias, resolve_audio_infer_mode(), and infer_mode parameter to ParakeetClient and create_audio_inference_client, enabling offline Riva RPC alongside the existing streaming path. Logic is clean; the _StreamingResponseShim return shape is compatible with the offline RecognizeResponse.
nemo_retriever/src/nemo_retriever/pipeline/main.py Refactors _build_ingestor to unify service and graph extraction paths via _attach_extract_stage; adds service-mode caption wiring with a helpful --caption-invoke-url warning. Gap: _split_config_for_input_type doesn't handle txt/html, silently dropping chunk params in service mode for those input types.
nemo_retriever/src/nemo_retriever/service_ingestor.py Introduces _wire_client_stage_params and _filter_policy_allowed helpers; changes _params_to_dict to exclude_unset=True so only caller-set fields are forwarded to the service, preventing model defaults from tripping the server allowlist. Clean refactoring with well-documented intent.
nemo_retriever/helm/templates/nims/nemotron-ocr-v2.yaml File deleted; it contained unresolved git merge conflict markers and was not usable — correct cleanup.
nemo_retriever/helm/values.yaml Adds nimServiceName: nemotron-ocr-v1 under nimOperator.ocr and updates the Parakeet NIM_TAGS_SELECTOR from mode=ofl to mode=str,vad=default,diarizer=disabled, matching the docker-compose.yaml change.
nemo_retriever/tests/test_parakeet_infer_mode.py New test file covering resolve_audio_infer_mode resolution, rejection of unknown modes, and all three dispatch paths (streaming self-hosted, streaming NVCF, offline explicit). Good coverage for the new infer-mode feature.

Sequence Diagram

sequenceDiagram
    participant CLI as CLI / __main__.py
    participant AI as _attach_extract_stage
    participant SI as ServiceIngestor
    participant WCS as _wire_client_stage_params
    participant PC as ParakeetClient
    participant Riva as Riva gRPC

    CLI->>AI: run_mode, input_type, extract_params, infer_mode, ...
    AI->>SI: extract(extract_params, split_config, extraction_mode)
    SI->>WCS: "merge params + filter allowlist (exclude_unset=True)"
    WCS-->>SI: policy-filtered params dict
    SI-->>CLI: ServiceIngestor (fluent)

    Note over CLI,PC: Audio path (local graph mode)
    CLI->>PC: transcribe(audio_content)
    alt "infer_mode == offline"
        PC->>Riva: offline_recognize(mono_audio_bytes, config)
        Riva-->>PC: RecognizeResponse
    else "infer_mode == online / auto"
        PC->>Riva: StreamingRecognize(chunked PCM)
        Riva-->>PC: streaming results
        PC-->>PC: _StreamingResponseShim(.results)
    end
    PC-->>CLI: response
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
nemo_retriever/src/nemo_retriever/pipeline/__main__.py:531-543
**`txt`/`html` split_config silently dropped in service mode**

`_split_config_for_input_type` handles `pdf`/`doc`, `audio`, and `video`, but returns `None` for `txt` and `html`. When a user runs `--input-type txt --enable-text-chunk` in service mode, `chunk_dict` is computed correctly from `_service_text_chunk_dict`, but `_split_config_for_input_type("txt", chunk_dict)` returns `None`, so `ServiceIngestor.extract()` is called with `split_config=None` and the chunking params are silently discarded without any error or warning. If the service API supports `split_config` for text/html extraction (with a key like `"text"` or `"html"`), these cases should be added here; otherwise a warning or `NotImplementedError` should be raised so the caller knows the option has no effect in this mode.

Reviews (2): Last reviewed commit: "Revert back to nemotron-ocr-v1 since nem..." | Re-trigger Greptile

Comment on lines +56 to +70
def resolve_audio_infer_mode(mode: str, endpoint: str) -> ResolvedAudioInferMode:
"""Pick offline vs streaming Riva RPC for a Parakeet endpoint.
NVCF (``grpc.nvcf.nvidia.com``) and the Helm chart Parakeet NIM (``mode=str``)
register streaming (online) models. Use ``audio_infer_mode='offline'`` only when
the NIM was deployed with an offline profile (``mode=ofl``).
"""
normalized = (mode or "auto").lower()
if normalized == "online":
return "online"
if normalized == "offline":
return "offline"
if normalized != "auto":
raise ValueError(f"audio_infer_mode must be 'auto', 'online', or 'offline', got {mode!r}")
return "online"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Dead endpoint parameter in public function

resolve_audio_infer_mode accepts endpoint: str but never uses it — the auto branch unconditionally returns "online" regardless of the endpoint value. This misleads callers (and the docstring, which explicitly mentions NVCF endpoint detection) into believing the endpoint influences the decision. Any code passing a different endpoint expecting different behavior will be silently surprised. If endpoint-based auto-detection is planned for the future, a # TODO comment would clarify intent; if it was intentionally dropped, the parameter should be removed from the signature to keep the public contract honest.

Rule Used: Every public class and function in nemo_retriever ... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/api/internal/primitives/nim/model_interface/parakeet.py
Line: 56-70

Comment:
**Dead `endpoint` parameter in public function**

`resolve_audio_infer_mode` accepts `endpoint: str` but never uses it — the `auto` branch unconditionally returns `"online"` regardless of the endpoint value. This misleads callers (and the docstring, which explicitly mentions NVCF endpoint detection) into believing the endpoint influences the decision. Any code passing a different endpoint expecting different behavior will be silently surprised. If endpoint-based auto-detection is planned for the future, a `# TODO` comment would clarify intent; if it was intentionally dropped, the parameter should be removed from the signature to keep the public contract honest.

**Rule Used:** Every public class and function in nemo_retriever ... ([source](https://app.greptile.com/review/custom-context?memory=public-api-contract))

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@@ -1,25 +1,10 @@
{{- if and (and (.Capabilities.APIVersions.Has "apps.nvidia.com/v1alpha1") .Values.nims.enabled) (eq .Values.nimOperator.ocr.enabled true) -}}
<<<<<<< HEAD
{{- $name := .Values.nimOperator.ocr.nimServiceName | default "nemotron-ocr-v2" -}}
{{- $name := .Values.nimOperator.ocr.nimServiceName | default "nemotron-ocr-v1" -}}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The file is named nemotron-ocr-v2.yaml, yet the default fallback for $name is "nemotron-ocr-v1". With values.yaml now explicitly setting nimServiceName: nemotron-ocr-v1, the default is rarely hit, but anyone deploying the Helm chart without setting nimServiceName will get a resource named nemotron-ocr-v1 from a template file named v2, which is confusing. The default should match what the file's name implies, or the file should be renamed.

Suggested change
{{- $name := .Values.nimOperator.ocr.nimServiceName | default "nemotron-ocr-v1" -}}
{{- $name := .Values.nimOperator.ocr.nimServiceName | default "nemotron-ocr-v2" -}}
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/helm/templates/nims/nemotron-ocr-v2.yaml
Line: 2

Comment:
The file is named `nemotron-ocr-v2.yaml`, yet the default fallback for `$name` is `"nemotron-ocr-v1"`. With `values.yaml` now explicitly setting `nimServiceName: nemotron-ocr-v1`, the default is rarely hit, but anyone deploying the Helm chart without setting `nimServiceName` will get a resource named `nemotron-ocr-v1` from a template file named `v2`, which is confusing. The default should match what the file's name implies, or the file should be renamed.

```suggestion
{{- $name := .Values.nimOperator.ocr.nimServiceName | default "nemotron-ocr-v2" -}}
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@jdye64 jdye64 merged commit 2df85c5 into NVIDIA:26.05 May 28, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants