Skip to content

26.05 into main folding pr#2175

Open
jdye64 wants to merge 51 commits into
NVIDIA:mainfrom
jdye64:26.05-into-main-folding-pr
Open

26.05 into main folding pr#2175
jdye64 wants to merge 51 commits into
NVIDIA:mainfrom
jdye64:26.05-into-main-folding-pr

Conversation

@jdye64
Copy link
Copy Markdown
Collaborator

@jdye64 jdye64 commented May 29, 2026

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

jdye64 and others added 30 commits May 19, 2026 21:16
Bump docs, Helm chart metadata, and install snippets from 26.03/26.3.0 to
the 26.05 line and RC1 tag so published artifacts align with user-facing
version references.
upload-artifact flattens nemo_retriever/dist into ./dist on download;
use find instead of ./dist/nemo_retriever/dist/*. Re-run failed jobs
does not pick up workflow changes — dispatch a new run after merging.
Stage wheels under nemo_retriever/dist in the CI artifact so legacy and new
publish paths both work; resolve wheels from multiple download layouts.
Handle missing NGC chart (create) and duplicate version (skip) in helm script.
Log pwd, run metadata, directory trees, and glob probe results so CI
failures show whether wheels are missing, mis-staged, or the workflow
YAML is frozen from an older re-run.
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
…195023, 6195296 (NVIDIA#2074)

Co-authored-by: Randy Gelhausen <rgelhau@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Julio Perez <jperez@nvidia.com>
Co-authored-by: Charles Blackmon-Luca <20627856+charlesbluca@users.noreply.github.com>
(cherry picked from commit 7130a72)
jioffe502 and others added 21 commits May 27, 2026 13:27
Co-authored-by: Julio Perez <37191411+jperez999@users.noreply.github.com>
Bring 26.5.0 GA release prep, inprocess default pipeline, service
retain_results, and pinned dependencies into main while preserving
main's OpenShift docs, GPU CLI tuning, and QA-aligned extraction docs.
@jdye64 jdye64 requested review from a team as code owners May 29, 2026 19:44
@jdye64 jdye64 requested a review from charlesbluca May 29, 2026 19:44
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 29, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 29, 2026

Greptile Summary

This is the 26.5.0 GA release fold from the 26.05 branch into main. It promotes the Helm chart and service image to the GA NGC registry, moves nemotron model packages from test-PyPI to production PyPI, and introduces a server-side retain_results flag that lets clients opt-in to keeping raw row payloads in the job tracker instead of discarding them after pipeline completion.

  • retain_results feature: JobCreateRequest, JobAggregate, WorkItem, and the full streaming chain (ServiceIngestor → RetrieverServiceClient → pipeline pool → job tracker) are updated to propagate a per-job retention flag, with defense-in-depth checks at both the pool layer and the tracker's mark_completed.
  • Default run_mode flipped to "inprocess": All entry points (GraphIngestor, ingest_documents, resolve_ingest_plan, CLI, harness config) now default to single-process pandas instead of Ray Data; tests are updated accordingly.
  • Dependency cleanup: nemotron packages move to stable PyPI indexes; langchain-nvidia-ai-endpoints minimum jumps to >=1.4.0; litellm is bounded to <2; langgraph minimum advances to >=1.2.0.

Confidence Score: 3/5

Safe to merge if the team accepts the breaking default-mode change; the retain_results feature is well-tested and structurally sound.

The retain_results feature is implemented consistently across all layers (router → pool → tracker → client) with defense-in-depth guards and good test coverage. The main concern is the intentional but undocumented behavioral breaking change: every existing caller that omits run_mode silently switches from Ray Data to single-process pandas. This affects GraphIngestor(), ingest_documents(), the CLI default, and the harness — all without a deprecation note. Additionally, exact-pinning non-critical packages (urllib3==2.7.0, nltk==3.9.4) can create dependency conflicts for users of the library.

nemo_retriever/src/nemo_retriever/graph_ingestor.py (default run_mode flip) and nemo_retriever/pyproject.toml (exact version pins for non-critical deps).

Important Files Changed

Filename Overview
nemo_retriever/src/nemo_retriever/service_ingestor.py Refactors sync/async streaming to thread retain_results through the entire call chain; introduces _ingest_stream_with_retain as the sync implementation helper. Logic is sound and backward-compatible (default False).
nemo_retriever/src/nemo_retriever/service/services/job_tracker.py Adds retain_results flag to JobAggregate and register_job; mark_completed now conditionally discards result_data based on the flag. Defense-in-depth approach is correct; should_retain_results helper is clean.
nemo_retriever/src/nemo_retriever/service/services/pipeline_pool.py Wires retain_results from WorkItem to control store_result_data and mark_completed calls; contains one redundant get_job_tracker() call that should reuse the already-held tracker variable.
nemo_retriever/src/nemo_retriever/service/routers/ingest.py Adds _GATEWAY_RETAIN_RESULTS_HEADER and helper functions to propagate the per-job retain flag from gateway to workers; three submission endpoints updated consistently.
nemo_retriever/pyproject.toml Moves nemotron model packages from test-pypi to PyPI, bumps langchain-nvidia-ai-endpoints to >=1.4.0, tightens litellm to <2, and advances langgraph minimum. Also introduces exact pins for urllib3 and nltk that are overly restrictive for non-critical dependencies.
nemo_retriever/src/nemo_retriever/graph_ingestor.py Flips default run_mode from "batch" to "inprocess" — an intentional GA release change, but a behavioral breaking change for any existing caller that omitted run_mode.
nemo_retriever/src/nemo_retriever/service/client.py Threads retain_results from aingest_documents_stream through _create_job; new parameter is keyword-only with a False default — fully backward-compatible.
nemo_retriever/src/nemo_retriever/service/models/requests.py Adds retain_results: bool = Field(default=False) to JobCreateRequest; additive change with sensible default and clear description.
nemo_retriever/helm/values.yaml Updates service image to the GA NGC registry path with tag 26.5.0 and IfNotPresent pull policy — correct for a GA release.
nemo_retriever/helm/Chart.yaml Version bumped from 26.05-RC1 to 26.5.0 (SemVer-compliant), matching the GA release.
.github/workflows/release-helm.yml Minor: updates workflow input description example from 26.05-RC1 to 26.5.0 to match the new version format.

Sequence Diagram

sequenceDiagram
    participant Client
    participant RetrieverServiceClient
    participant IngestRouter
    participant JobTracker
    participant PipelinePool
    participant WorkerResultStore

    Client->>RetrieverServiceClient: "aingest_documents_stream(files, retain_results=True)"
    RetrieverServiceClient->>IngestRouter: "POST /v1/ingest/job {expected_documents, retain_results: true}"
    IngestRouter->>JobTracker: "register_job(retain_results=True)"
    JobTracker-->>IngestRouter: job_id
    IngestRouter-->>RetrieverServiceClient: "201 {job_id}"

    RetrieverServiceClient->>IngestRouter: "POST /v1/ingest/job/{job_id}/document"
    IngestRouter->>IngestRouter: _gateway_retain_results_headers(job_id)
    IngestRouter->>PipelinePool: "WorkItem(retain_results=True)"
    PipelinePool->>PipelinePool: run pipeline
    alt "retain_results=True"
        PipelinePool->>WorkerResultStore: store_result_data(item.id, result_data)
    end
    PipelinePool->>JobTracker: mark_completed(result_data if retain_results else None)
    JobTracker->>JobTracker: "rec.result_data = result_data"

    Client->>IngestRouter: "GET /v1/ingest/status/{doc_id}"
    IngestRouter->>JobTracker: get_document(doc_id)
    JobTracker-->>IngestRouter: "DocumentRecord(result_data=[...])"
    IngestRouter-->>Client: result_data present
Loading
Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 4
nemo_retriever/src/nemo_retriever/service/services/pipeline_pool.py:271-275
Redundant `get_job_tracker()` call — the pool worker already retrieved `tracker` at the start of the try block (line ~257). `tracker_lookup` is always the same singleton object, so this second lookup is unnecessary. Using the existing `tracker` variable is cleaner and avoids a second function call inside the hot worker loop.

```suggestion
                retain_results = item.retain_results
                if not retain_results and item.job_id:
                    if tracker is not None:
                        retain_results = tracker.should_retain_results(item.job_id)
```

### Issue 2 of 4
nemo_retriever/pyproject.toml:55-58
Exact version pins for `urllib3` and `nltk` are overly restrictive for non-critical dependencies. The project's `dependency-management-uv` rule specifies that only critical packages (PyTorch, CUDA-related) should be pinned exactly; others should use `>=x.y.z` range specifiers. Exact pins can create unsolvable resolution conflicts when these packages are installed alongside other libraries that also depend on them.

```suggestion
  "urllib3>=2.7.0",
  # Utilities
  "pydantic>=2.8.0",
  "rich>=13.7.0",
```

### Issue 3 of 4
nemo_retriever/pyproject.toml:68
Same concern as `urllib3`: `nltk` is not a critical (CUDA/PyTorch) dependency, so an exact pin may create resolution conflicts in environments where another package requires a different `nltk` patch version.

```suggestion
  "nltk>=3.9.4",
```

### Issue 4 of 4
nemo_retriever/src/nemo_retriever/graph_ingestor.py:415-418
**Default `run_mode` changed from `"batch"` to `"inprocess"` without a deprecation cycle.** Any caller that relied on `GraphIngestor()` (or `ingest_documents()` / `resolve_ingest_plan()`) defaulting to Ray Data scale-out will now silently execute single-process pandas instead. This includes the `retriever pipeline run` CLI, the harness, and every programmatic caller that omitted `run_mode`. The `api-backward-compatibility` rule requires a deprecation warning or migration note for changes that alter observable default behavior. A comment in the docstring noting the version where the default changed would satisfy the intent of that rule.

Reviews (1): Last reviewed commit: "Merge branch '26.05' into main" | Re-trigger Greptile

Comment on lines +271 to +275
retain_results = item.retain_results
if not retain_results and item.job_id:
tracker_lookup = get_job_tracker()
if tracker_lookup is not None:
retain_results = tracker_lookup.should_retain_results(item.job_id)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Redundant get_job_tracker() call — the pool worker already retrieved tracker at the start of the try block (line ~257). tracker_lookup is always the same singleton object, so this second lookup is unnecessary. Using the existing tracker variable is cleaner and avoids a second function call inside the hot worker loop.

Suggested change
retain_results = item.retain_results
if not retain_results and item.job_id:
tracker_lookup = get_job_tracker()
if tracker_lookup is not None:
retain_results = tracker_lookup.should_retain_results(item.job_id)
retain_results = item.retain_results
if not retain_results and item.job_id:
if tracker is not None:
retain_results = tracker.should_retain_results(item.job_id)
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/service/services/pipeline_pool.py
Line: 271-275

Comment:
Redundant `get_job_tracker()` call — the pool worker already retrieved `tracker` at the start of the try block (line ~257). `tracker_lookup` is always the same singleton object, so this second lookup is unnecessary. Using the existing `tracker` variable is cleaner and avoids a second function call inside the hot worker loop.

```suggestion
                retain_results = item.retain_results
                if not retain_results and item.job_id:
                    if tracker is not None:
                        retain_results = tracker.should_retain_results(item.job_id)
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +55 to 58
"urllib3==2.7.0",
# Utilities
"pydantic>=2.8.0",
"rich>=13.7.0",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Exact version pins for urllib3 and nltk are overly restrictive for non-critical dependencies. The project's dependency-management-uv rule specifies that only critical packages (PyTorch, CUDA-related) should be pinned exactly; others should use >=x.y.z range specifiers. Exact pins can create unsolvable resolution conflicts when these packages are installed alongside other libraries that also depend on them.

Suggested change
"urllib3==2.7.0",
# Utilities
"pydantic>=2.8.0",
"rich>=13.7.0",
"urllib3>=2.7.0",
# Utilities
"pydantic>=2.8.0",
"rich>=13.7.0",

Rule Used: Dependencies must be declared in the appropriate p... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/pyproject.toml
Line: 55-58

Comment:
Exact version pins for `urllib3` and `nltk` are overly restrictive for non-critical dependencies. The project's `dependency-management-uv` rule specifies that only critical packages (PyTorch, CUDA-related) should be pinned exactly; others should use `>=x.y.z` range specifiers. Exact pins can create unsolvable resolution conflicts when these packages are installed alongside other libraries that also depend on them.

```suggestion
  "urllib3>=2.7.0",
  # Utilities
  "pydantic>=2.8.0",
  "rich>=13.7.0",
```

**Rule Used:** Dependencies must be declared in the appropriate p... ([source](https://app.greptile.com/review/custom-context?memory=dependency-management-uv))

How can I resolve this? If you propose a fix, please make it concise.

"pypdfium2==4.30.0",
"pillow==12.2.0",
"nltk>=3.9.4",
"nltk==3.9.4",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Same concern as urllib3: nltk is not a critical (CUDA/PyTorch) dependency, so an exact pin may create resolution conflicts in environments where another package requires a different nltk patch version.

Suggested change
"nltk==3.9.4",
"nltk>=3.9.4",

Rule Used: Dependencies must be declared in the appropriate p... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/pyproject.toml
Line: 68

Comment:
Same concern as `urllib3`: `nltk` is not a critical (CUDA/PyTorch) dependency, so an exact pin may create resolution conflicts in environments where another package requires a different `nltk` patch version.

```suggestion
  "nltk>=3.9.4",
```

**Rule Used:** Dependencies must be declared in the appropriate p... ([source](https://app.greptile.com/review/custom-context?memory=dependency-management-uv))

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 415 to +418
def __init__(
self,
*,
run_mode: str = "batch",
run_mode: str = "inprocess",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Default run_mode changed from "batch" to "inprocess" without a deprecation cycle. Any caller that relied on GraphIngestor() (or ingest_documents() / resolve_ingest_plan()) defaulting to Ray Data scale-out will now silently execute single-process pandas instead. This includes the retriever pipeline run CLI, the harness, and every programmatic caller that omitted run_mode. The api-backward-compatibility rule requires a deprecation warning or migration note for changes that alter observable default behavior. A comment in the docstring noting the version where the default changed would satisfy the intent of that rule.

Rule Used: Changes to public API surfaces (FastAPI endpoints,... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/graph_ingestor.py
Line: 415-418

Comment:
**Default `run_mode` changed from `"batch"` to `"inprocess"` without a deprecation cycle.** Any caller that relied on `GraphIngestor()` (or `ingest_documents()` / `resolve_ingest_plan()`) defaulting to Ray Data scale-out will now silently execute single-process pandas instead. This includes the `retriever pipeline run` CLI, the harness, and every programmatic caller that omitted `run_mode`. The `api-backward-compatibility` rule requires a deprecation warning or migration note for changes that alter observable default behavior. A comment in the docstring noting the version where the default changed would satisfy the intent of that rule.

**Rule Used:** Changes to public API surfaces (FastAPI endpoints,... ([source](https://app.greptile.com/review/custom-context?memory=api-backward-compatibility))

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants