26.05 into main folding pr by jdye64 · Pull Request #2175 · NVIDIA/NeMo-Retriever

jdye64 · 2026-05-29T19:44:53Z

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
If adjusting docker-compose.yaml environment variables, have you ensured those are mimicked in the Helm values.yaml file?

Bump docs, Helm chart metadata, and install snippets from 26.03/26.3.0 to the 26.05 line and RC1 tag so published artifacts align with user-facing version references.

upload-artifact flattens nemo_retriever/dist into ./dist on download; use find instead of ./dist/nemo_retriever/dist/*. Re-run failed jobs does not pick up workflow changes — dispatch a new run after merging.

Stage wheels under nemo_retriever/dist in the CI artifact so legacy and new publish paths both work; resolve wheels from multiple download layouts. Handle missing NGC chart (create) and duplicate version (skip) in helm script.
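
As a rough illustration of resolving wheels from more than one download layout, the sketch below searches recursively instead of relying on a fixed glob. The function name `find_wheels` and the directory names in the usage note are hypothetical; the actual workflow uses `find` in a shell step, and this Python version only mirrors the idea.

```python
from pathlib import Path
from typing import List


def find_wheels(download_root: Path) -> List[Path]:
    """Return every wheel under download_root, regardless of nesting depth.

    Hypothetical helper: a recursive search matches both a flattened layout
    (dist/*.whl) and a staged layout (dist/nemo_retriever/dist/*.whl).
    """
    return sorted(p for p in download_root.rglob("*.whl") if p.is_file())
```

Calling `find_wheels(Path("dist"))` would return the wheels from either layout, and an empty list signals that the artifact was missing or mis-staged rather than merely located somewhere unexpected.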

Log pwd, run metadata, directory trees, and glob probe results so CI failures show whether wheels are missing, mis-staged, or the workflow YAML is frozen from an older re-run.

Bring 26.5.0 GA release prep, inprocess default pipeline, service retain_results, and pinned dependencies into main while preserving main's OpenShift docs, GPU CLI tuning, and QA-aligned extraction docs.

copy-pr-bot · 2026-05-29T19:44:57Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-05-29T19:48:57Z

Greptile Summary

This is the 26.5.0 GA release fold from the 26.05 branch into main. It promotes the Helm chart and service image to the GA NGC registry, moves nemotron model packages from test-PyPI to production PyPI, and introduces a server-side retain_results flag that lets clients opt-in to keeping raw row payloads in the job tracker instead of discarding them after pipeline completion.

- retain_results feature: JobCreateRequest, JobAggregate, WorkItem, and the full streaming chain (ServiceIngestor → RetrieverServiceClient → pipeline pool → job tracker) are updated to propagate a per-job retention flag, with defense-in-depth checks at both the pool layer and the tracker's mark_completed.
- Default run_mode flipped to "inprocess": All entry points (GraphIngestor, ingest_documents, resolve_ingest_plan, CLI, harness config) now default to single-process pandas instead of Ray Data; tests are updated accordingly.
- Dependency cleanup: nemotron packages move to stable PyPI indexes; langchain-nvidia-ai-endpoints minimum jumps to >=1.4.0; litellm is bounded to <2; langgraph minimum advances to >=1.2.0.

Confidence Score: 3/5

Safe to merge if the team accepts the breaking default-mode change; the retain_results feature is well-tested and structurally sound.

The retain_results feature is implemented consistently across all layers (router → pool → tracker → client) with defense-in-depth guards and good test coverage. The main concern is the intentional but undocumented behavioral breaking change: every existing caller that omits run_mode silently switches from Ray Data to single-process pandas. This affects GraphIngestor(), ingest_documents(), the CLI default, and the harness — all without a deprecation note. Additionally, exact-pinning non-critical packages (urllib3==2.7.0, nltk==3.9.4) can create dependency conflicts for users of the library.

Files needing attention: nemo_retriever/src/nemo_retriever/graph_ingestor.py (default run_mode flip) and nemo_retriever/pyproject.toml (exact version pins for non-critical deps).

Important Files Changed

| Filename | Overview |
| --- | --- |
| nemo_retriever/src/nemo_retriever/service_ingestor.py | Refactors sync/async streaming to thread `retain_results` through the entire call chain; introduces `_ingest_stream_with_retain` as the sync implementation helper. Logic is sound and backward-compatible (default `False`). |
| nemo_retriever/src/nemo_retriever/service/services/job_tracker.py | Adds `retain_results` flag to `JobAggregate` and `register_job`; `mark_completed` now conditionally discards `result_data` based on the flag. Defense-in-depth approach is correct; `should_retain_results` helper is clean. |
| nemo_retriever/src/nemo_retriever/service/services/pipeline_pool.py | Wires `retain_results` from `WorkItem` to control `store_result_data` and `mark_completed` calls; contains one redundant `get_job_tracker()` call that should reuse the already-held `tracker` variable. |
| nemo_retriever/src/nemo_retriever/service/routers/ingest.py | Adds `_GATEWAY_RETAIN_RESULTS_HEADER` and helper functions to propagate the per-job retain flag from gateway to workers; three submission endpoints updated consistently. |
| nemo_retriever/pyproject.toml | Moves nemotron model packages from test-pypi to PyPI, bumps langchain-nvidia-ai-endpoints to >=1.4.0, tightens litellm to <2, and advances langgraph minimum. Also introduces exact pins for urllib3 and nltk that are overly restrictive for non-critical dependencies. |
| nemo_retriever/src/nemo_retriever/graph_ingestor.py | Flips default `run_mode` from `"batch"` to `"inprocess"`: an intentional GA release change, but a behavioral breaking change for any existing caller that omitted `run_mode`. |
| nemo_retriever/src/nemo_retriever/service/client.py | Threads `retain_results` from `aingest_documents_stream` through `_create_job`; the new parameter is keyword-only with a `False` default, so it is fully backward-compatible. |
| nemo_retriever/src/nemo_retriever/service/models/requests.py | Adds `retain_results: bool = Field(default=False)` to `JobCreateRequest`; additive change with sensible default and clear description. |
| nemo_retriever/helm/values.yaml | Updates service image to the GA NGC registry path with tag `26.5.0` and `IfNotPresent` pull policy, which is correct for a GA release. |
| nemo_retriever/helm/Chart.yaml | Version bumped from `26.05-RC1` to `26.5.0` (SemVer-compliant), matching the GA release. |
| .github/workflows/release-helm.yml | Minor: updates workflow input description example from `26.05-RC1` to `26.5.0` to match the new version format. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant RetrieverServiceClient
    participant IngestRouter
    participant JobTracker
    participant PipelinePool
    participant WorkerResultStore

    Client->>RetrieverServiceClient: "aingest_documents_stream(files, retain_results=True)"
    RetrieverServiceClient->>IngestRouter: "POST /v1/ingest/job {expected_documents, retain_results: true}"
    IngestRouter->>JobTracker: "register_job(retain_results=True)"
    JobTracker-->>IngestRouter: job_id
    IngestRouter-->>RetrieverServiceClient: "201 {job_id}"

    RetrieverServiceClient->>IngestRouter: "POST /v1/ingest/job/{job_id}/document"
    IngestRouter->>IngestRouter: _gateway_retain_results_headers(job_id)
    IngestRouter->>PipelinePool: "WorkItem(retain_results=True)"
    PipelinePool->>PipelinePool: run pipeline
    alt "retain_results=True"
        PipelinePool->>WorkerResultStore: store_result_data(item.id, result_data)
    end
    PipelinePool->>JobTracker: mark_completed(result_data if retain_results else None)
    JobTracker->>JobTracker: "rec.result_data = result_data"

    Client->>IngestRouter: "GET /v1/ingest/status/{doc_id}"
    IngestRouter->>JobTracker: get_document(doc_id)
    JobTracker-->>IngestRouter: "DocumentRecord(result_data=[...])"
    IngestRouter-->>Client: result_data present
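
The retention behavior in the diagram can be approximated with a small in-memory sketch. The method names `register_job`, `should_retain_results`, and `mark_completed` come from the review summary, but the class names, signatures, and storage layout here are assumptions made for illustration and are not the project's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DocumentRecord:
    """Per-document state; result_data stays None unless retention is on."""
    status: str = "pending"
    result_data: Optional[List[Any]] = None


@dataclass
class JobTrackerSketch:
    """Hypothetical in-memory tracker, not the project's real class."""
    retain_by_job: Dict[str, bool] = field(default_factory=dict)
    documents: Dict[str, DocumentRecord] = field(default_factory=dict)

    def register_job(self, job_id: str, *, retain_results: bool = False) -> None:
        # The client chooses retention once, when the job is created.
        self.retain_by_job[job_id] = retain_results

    def should_retain_results(self, job_id: str) -> bool:
        # Unknown jobs fall back to the safe default: discard payloads.
        return self.retain_by_job.get(job_id, False)

    def mark_completed(self, job_id: str, doc_id: str,
                       result_data: Optional[List[Any]]) -> DocumentRecord:
        # Guard at the tracker: even if a caller passes rows, keep them
        # only when the owning job opted in.
        keep = self.should_retain_results(job_id)
        record = DocumentRecord(status="completed",
                                result_data=result_data if keep else None)
        self.documents[doc_id] = record
        return record
```

With this shape, a job registered with `retain_results=True` keeps its rows on the completed record, while a job registered with the default drops them even if a worker passes them in, which is the defense-in-depth idea the summary describes.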

Prompt To Fix All With AI

Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 4
nemo_retriever/src/nemo_retriever/service/services/pipeline_pool.py:271-275
Redundant `get_job_tracker()` call — the pool worker already retrieved `tracker` at the start of the try block (line ~257). `tracker_lookup` is always the same singleton object, so this second lookup is unnecessary. Using the existing `tracker` variable is cleaner and avoids a second function call inside the hot worker loop.

```suggestion
                retain_results = item.retain_results
                if not retain_results and item.job_id:
                    if tracker is not None:
                        retain_results = tracker.should_retain_results(item.job_id)
```

### Issue 2 of 4
nemo_retriever/pyproject.toml:55-58
Exact version pins for `urllib3` and `nltk` are overly restrictive for non-critical dependencies. The project's `dependency-management-uv` rule specifies that only critical packages (PyTorch, CUDA-related) should be pinned exactly; others should use `>=x.y.z` range specifiers. Exact pins can create unsolvable resolution conflicts when these packages are installed alongside other libraries that also depend on them.

```suggestion
  "urllib3>=2.7.0",
  # Utilities
  "pydantic>=2.8.0",
  "rich>=13.7.0",
```

### Issue 3 of 4
nemo_retriever/pyproject.toml:68
Same concern as `urllib3`: `nltk` is not a critical (CUDA/PyTorch) dependency, so an exact pin may create resolution conflicts in environments where another package requires a different `nltk` patch version.

```suggestion
  "nltk>=3.9.4",
```

### Issue 4 of 4
nemo_retriever/src/nemo_retriever/graph_ingestor.py:415-418
**Default `run_mode` changed from `"batch"` to `"inprocess"` without a deprecation cycle.** Any caller that relied on `GraphIngestor()` (or `ingest_documents()` / `resolve_ingest_plan()`) defaulting to Ray Data scale-out will now silently execute single-process pandas instead. This includes the `retriever pipeline run` CLI, the harness, and every programmatic caller that omitted `run_mode`. The `api-backward-compatibility` rule requires a deprecation warning or migration note for changes that alter observable default behavior. A comment in the docstring noting the version where the default changed would satisfy the intent of that rule.

Reviews (1): Last reviewed commit: "Merge branch '26.05' into main"

greptile-apps · 2026-05-29T19:49:01Z

+                retain_results = item.retain_results
+                if not retain_results and item.job_id:
+                    tracker_lookup = get_job_tracker()
+                    if tracker_lookup is not None:
+                        retain_results = tracker_lookup.should_retain_results(item.job_id)


Redundant get_job_tracker() call — the pool worker already retrieved tracker at the start of the try block (line ~257). tracker_lookup is always the same singleton object, so this second lookup is unnecessary. Using the existing tracker variable is cleaner and avoids a second function call inside the hot worker loop.

Suggested change

-                retain_results = item.retain_results
-                if not retain_results and item.job_id:
-                    tracker_lookup = get_job_tracker()
-                    if tracker_lookup is not None:
-                        retain_results = tracker_lookup.should_retain_results(item.job_id)
+                retain_results = item.retain_results
+                if not retain_results and item.job_id:
+                    if tracker is not None:
+                        retain_results = tracker.should_retain_results(item.job_id)

greptile-apps · 2026-05-29T19:49:02Z

+  "urllib3==2.7.0",
  # Utilities
  "pydantic>=2.8.0",
  "rich>=13.7.0",


Exact version pins for urllib3 and nltk are overly restrictive for non-critical dependencies. The project's dependency-management-uv rule specifies that only critical packages (PyTorch, CUDA-related) should be pinned exactly; others should use >=x.y.z range specifiers. Exact pins can create unsolvable resolution conflicts when these packages are installed alongside other libraries that also depend on them.

Suggested change

-  "urllib3==2.7.0",
-  # Utilities
-  "pydantic>=2.8.0",
-  "rich>=13.7.0",
+  "urllib3>=2.7.0",
+  # Utilities
+  "pydantic>=2.8.0",
+  "rich>=13.7.0",

Rule Used: Dependencies must be declared in the appropriate p... (source)
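
To make the resolution-conflict concern concrete, here is a deliberately simplified sketch of how an exact pin can leave no version that satisfies every requirement. It handles only `==`, `>=`, and `<` on dotted integer versions and is not a PEP 440 implementation; the candidate versions `2.7.1` and `2.8.0` and the second package's `>=2.7.1` requirement are hypothetical.

```python
import operator
from typing import Iterable, List, Tuple

Version = Tuple[int, ...]
_OPS = {"==": operator.eq, ">=": operator.ge, "<": operator.lt}


def parse(version: str) -> Version:
    """Turn '2.7.0' into (2, 7, 0); pre-release tags are not supported."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check one simplified specifier such as '>=2.7.0' or '==2.7.0'."""
    for symbol, op in _OPS.items():
        if spec.startswith(symbol):
            return op(parse(version), parse(spec[len(symbol):]))
    raise ValueError(f"unsupported specifier: {spec}")


def resolvable(candidates: Iterable[str], specs: Iterable[str]) -> List[str]:
    """Return the candidate versions that satisfy every specifier."""
    spec_list = list(specs)
    return [v for v in candidates if all(satisfies(v, s) for s in spec_list)]


# Hypothetical candidates and a hypothetical second requirement (>=2.7.1).
candidates = ["2.7.0", "2.7.1", "2.8.0"]
with_exact_pin = resolvable(candidates, ["==2.7.0", ">=2.7.1"])
with_range = resolvable(candidates, [">=2.7.0", ">=2.7.1"])
```

Under these assumptions `with_exact_pin` is empty, so no install is possible, while `with_range` still contains the two newer candidates.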

greptile-apps · 2026-05-29T19:49:03Z

  "pypdfium2==4.30.0",
  "pillow==12.2.0",
-  "nltk>=3.9.4",
+  "nltk==3.9.4",


Same concern as urllib3: nltk is not a critical (CUDA/PyTorch) dependency, so an exact pin may create resolution conflicts in environments where another package requires a different nltk patch version.

Suggested change

-  "nltk==3.9.4",
+  "nltk>=3.9.4",

Rule Used: Dependencies must be declared in the appropriate p... (source)

greptile-apps · 2026-05-29T19:49:04Z

    def __init__(
        self,
        *,
-        run_mode: str = "batch",
+        run_mode: str = "inprocess",


Default run_mode changed from "batch" to "inprocess" without a deprecation cycle. Any caller that relied on GraphIngestor() (or ingest_documents() / resolve_ingest_plan()) defaulting to Ray Data scale-out will now silently execute single-process pandas instead. This includes the retriever pipeline run CLI, the harness, and every programmatic caller that omitted run_mode. The api-backward-compatibility rule requires a deprecation warning or migration note for changes that alter observable default behavior. A comment in the docstring noting the version where the default changed would satisfy the intent of that rule.

Rule Used: Changes to public API surfaces (FastAPI endpoints,... (source)
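
One way a migration signal could look, beyond the docstring note the comment proposes, is a warning that fires only when the caller omits `run_mode`. This is a sketch under assumptions: `GraphIngestorSketch` is a hypothetical stand-in, the `None` sentinel and `FutureWarning` are illustrative choices, and the real constructor takes other parameters not shown here.

```python
import warnings
from typing import Optional

_DEFAULT_RUN_MODE = "inprocess"


class GraphIngestorSketch:
    """Hypothetical stand-in for a constructor whose default changed.

    run_mode: defaults to "inprocess"; earlier releases defaulted to "batch".
    """

    def __init__(self, *, run_mode: Optional[str] = None) -> None:
        if run_mode is None:
            # Only callers who relied on the old implicit default see this.
            warnings.warn(
                "run_mode now defaults to 'inprocess' (previously 'batch'); "
                "pass run_mode='batch' explicitly to keep Ray Data execution.",
                FutureWarning,
                stacklevel=2,
            )
            run_mode = _DEFAULT_RUN_MODE
        self.run_mode = run_mode
```

Callers that already pass `run_mode` explicitly see no warning, so the signal reaches only the code paths whose behavior actually changed.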

jdye64 and others added 30 commits May 19, 2026 21:16

Prepare 26.05 release line for 26.05-RC1

a452eb6

Bump docs, Helm chart metadata, and install snippets from 26.03/26.3.0 to the 26.05 line and RC1 tag so published artifacts align with user-facing version references.

Pypi and helm publish fixes

6c2fd4e

Fix PyPI publish wheel path and add artifact listing step

542e995

upload-artifact flattens nemo_retriever/dist into ./dist on download; use find instead of ./dist/nemo_retriever/dist/*. Re-run failed jobs does not pick up workflow changes — dispatch a new run after merging.

Add PyPI build/publish debug logging for artifact layout

96c11fc

Log pwd, run metadata, directory trees, and glob probe results so CI failures show whether wheels are missing, mis-staged, or the workflow YAML is frozen from an older re-run.

describe Nemotron Parse as alternate PDF extraction method (NVIDIA#2070)

103c271

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Merge branch 'main' into 26.05

0dea097

align captioning and chart extraction with Helm NIM topology Fixes: 6…

9190fe5

…195023, 6195296 (NVIDIA#2074) Co-authored-by: Randy Gelhausen <rgelhau@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Service mode pdf only fix (NVIDIA#2077)

722d553

Gate service ffmpeg install at runtime (NVIDIA#2052)

4d698ad

(cherry picked from commit 13ee35b)

Add stable HF PyPI release dispatch (NVIDIA#2075)

881af18

(cherry picked from commit 050331f)

Add PR install smoke for Windows and macOS (NVIDIA#2078)

6c5bb78

(cherry picked from commit 73f3d5f)

Bump OCR nightly train and relax retriever pin (NVIDIA#2080)

a14b5b3

(cherry picked from commit 2524cfb)

air-gapped deployment for 26.05 (NVBugs 6195103, PR NVIDIA#2052) (NV…

19af4bc

…IDIA#2082)

Fix video ASR audio demuxing (NVIDIA#2086)

4bfcb40

(cherry picked from commit 9330be8)

mark non-ingest/query/pipeline retriever subcommands as experimental …

dfd6b38

…(NVBugs 6199005, 6198526) (NVIDIA#2088)

Add input-aware retriever ingest routing (NVIDIA#2068)

b3029da

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

silence retriever ingest cli command (NVIDIA#2083)

2b9880d

Helm nemotron parse (NVIDIA#2092)

63dfa90

docs(helm): clarify four core NIMs vs optional Helm NIMs for 26.05 (N…

d25eb24

…VIDIA#2097)

fix asr and ocr on default cpu remote (NVIDIA#2085)

40e9992

Signed-off-by: Julio Perez <jperez@nvidia.com> Co-authored-by: Charles Blackmon-Luca <20627856+charlesbluca@users.noreply.github.com> (cherry picked from commit 7130a72)

Fix ASR and media pipeline parameter handling (NVIDIA#2101)

a99479e

(cherry picked from commit 744f0bc)

docs(extraction): OCR v2 defaults, captioning link, B200 nemotron-par…

794fe97

…se (26.05, NVBug 6204537) (NVIDIA#2103)

Update nemotron parse http interface for nemotron-parse 1.2 (cherry p…

fc53bed

…ick of NVIDIA#2113) (NVIDIA#2117)

Helm fixes latest (NVIDIA#2121)

cafad31

Helm rerank vl version (NVIDIA#2122)

6309bdf

Codex/26.05 runmode typing cleanup (NVIDIA#2124)

ca3d676

Nim operator GPU resources fix (NVIDIA#2123)

2005314

Fix detection mode to ensure HTML and Text are honored (NVIDIA#2128)

e63cf74

Fix .extract() silently dropping unknown kwargs and docs (NVIDIA#2130)

891f1f6

jioffe502 and others added 21 commits May 27, 2026 13:27

docs: update Retriever constructor examples (NVIDIA#2134)

cb75050

Default to service mode returning the results for the ingestion job (N…

ac41c62

…VIDIA#2133)

backport: (main --> 26.05)PDF pre-split docs + service-only pdf_split…

52281d8

…_config note (NVBugs 6218013) (NVIDIA#2126)

docs: fix graph_pipeline LanceDB examples (NVIDIA#2136)

a4ab4e3

Add OTEL basic support and bump to nemotron-ocr-v2 (NVIDIA#2142)

2edab62

Otel introduction (NVIDIA#2145)

951e0b5

fix versions of cve packages (NVIDIA#2129)

f4e50c4

Asr fixes (NVIDIA#2147)

2df85c5

Fix release source ref issue (NVIDIA#2149)

2271755

add checks against service mode params (NVIDIA#2148)

db2c87d

Replace ingest input-type routing with manifest branches (NVIDIA#2095)

984140b

Co-authored-by: Julio Perez <37191411+jperez999@users.noreply.github.com>

Update ASR model to use batch mode and auto-select using batch/stream…

8856a87

…ing (NVIDIA#2153)

quiet mode default (NVIDIA#2154)

e254e08

update the restricted params for service mode (NVIDIA#2157)

08de78e

ingest profiles + captioning (NVIDIA#2158)

9da3fb2

Codex/fail empty root ingest (NVIDIA#2160)

47715ea

fix stale VDB docs (NVBugs 6205401) (NVIDIA#2151)

72a6b61

Adjust service side retention of results based on client desires (NVI…

2853421

…DIA#2165)

Make inprocess the default implementation for the GraphIngestor, batc… (

2be38bd

NVIDIA#2170)

Prepare for 26.5.0 release (NVIDIA#2171)

0a6cb70

Merge branch '26.05' into main

7fe6593

Bring 26.5.0 GA release prep, inprocess default pipeline, service retain_results, and pinned dependencies into main while preserving main's OpenShift docs, GPU CLI tuning, and QA-aligned extraction docs.

jdye64 requested review from a team as code owners May 29, 2026 19:44

jdye64 requested a review from charlesbluca May 29, 2026 19:44

greptile-apps Bot reviewed May 29, 2026

jperez999 approved these changes May 29, 2026

jdye64 wants to merge 51 commits into NVIDIA:main from jdye64:26.05-into-main-folding-pr

8 participants
