51 commits
a452eb6
Prepare 26.05 release line for 26.05-RC1
jdye64 May 19, 2026
6c2fd4e
Pypi and helm publish fixes
jdye64 May 19, 2026
542e995
Fix PyPI publish wheel path and add artifact listing step
jdye64 May 19, 2026
6e47222
Fix PyPI artifact layout and Helm publish idempotency on 26.05
jdye64 May 19, 2026
96c11fc
Add PyPI build/publish debug logging for artifact layout
jdye64 May 19, 2026
103c271
describe Nemotron Parse as alternate PDF extraction method (#2070)
kheiss-uwzoo May 20, 2026
0dea097
Merge branch 'main' into 26.05
kheiss-uwzoo May 20, 2026
9190fe5
align captioning and chart extraction with Helm NIM topology Fixes: 6…
kheiss-uwzoo May 20, 2026
722d553
Service mode pdf only fix (#2077)
jdye64 May 20, 2026
4d698ad
Gate service ffmpeg install at runtime (#2052)
charlesbluca May 21, 2026
881af18
Add stable HF PyPI release dispatch (#2075)
charlesbluca May 20, 2026
6c5bb78
Add PR install smoke for Windows and macOS (#2078)
charlesbluca May 20, 2026
a14b5b3
Bump OCR nightly train and relax retriever pin (#2080)
charlesbluca May 21, 2026
19af4bc
air-gapped deployment for 26.05 (NVBugs 6195103, PR #2052) (#2082)
kheiss-uwzoo May 21, 2026
4bfcb40
Fix video ASR audio demuxing (#2086)
ChrisJar May 21, 2026
dfd6b38
mark non-ingest/query/pipeline retriever subcommands as experimental …
kheiss-uwzoo May 21, 2026
b3029da
Add input-aware retriever ingest routing (#2068)
jioffe502 May 20, 2026
2b9880d
silence retriever ingest cli command (#2083)
edknv May 21, 2026
63dfa90
Helm nemotron parse (#2092)
jdye64 May 22, 2026
d25eb24
docs(helm): clarify four core NIMs vs optional Helm NIMs for 26.05 (#…
kheiss-uwzoo May 22, 2026
40e9992
fix asr and ocr on default cpu remote (#2085)
jperez999 May 22, 2026
a99479e
Fix ASR and media pipeline parameter handling (#2101)
charlesbluca May 22, 2026
794fe97
docs(extraction): OCR v2 defaults, captioning link, B200 nemotron-par…
kheiss-uwzoo May 26, 2026
fc53bed
Update nemotron parse http interface for nemotron-parse 1.2 (cherry p…
ChrisJar May 26, 2026
cafad31
Helm fixes latest (#2121)
jdye64 May 27, 2026
6309bdf
Helm rerank vl version (#2122)
jdye64 May 27, 2026
ca3d676
Codex/26.05 runmode typing cleanup (#2124)
jioffe502 May 27, 2026
2005314
Nim operator GPU resources fix (#2123)
jdye64 May 27, 2026
e63cf74
Fix detection mode to ensure HTML and Text are honored (#2128)
jdye64 May 27, 2026
891f1f6
Fix .extract() silently dropping unknown kwargs and docs (#2130)
mahikaw May 27, 2026
cb75050
docs: update Retriever constructor examples (#2134)
jioffe502 May 27, 2026
ac41c62
Default to service mode returning the results for the ingestion job (…
jdye64 May 27, 2026
52281d8
backport: (main --> 26.05)PDF pre-split docs + service-only pdf_split…
kheiss-uwzoo May 27, 2026
a4ab4e3
docs: fix graph_pipeline LanceDB examples (#2136)
jioffe502 May 27, 2026
2edab62
Add OTEL basic support and bump to nemotron-ocr-v2 (#2142)
jdye64 May 27, 2026
951e0b5
Otel introduction (#2145)
jdye64 May 28, 2026
f4e50c4
fix versions of cve packages (#2129)
jperez999 May 28, 2026
2df85c5
Asr fixes (#2147)
jdye64 May 28, 2026
2271755
Fix release source ref issue (#2149)
jdye64 May 28, 2026
db2c87d
add checks against service mode params (#2148)
jperez999 May 28, 2026
984140b
Replace ingest input-type routing with manifest branches (#2095)
jioffe502 May 28, 2026
8856a87
Update ASR model to use batch mode and auto-select using batch/stream…
jdye64 May 28, 2026
e254e08
quiet mode default (#2154)
jperez999 May 28, 2026
08de78e
update the restricted params for service mode (#2157)
jperez999 May 28, 2026
9da3fb2
ingest profiles + captioning (#2158)
jioffe502 May 28, 2026
47715ea
Codex/fail empty root ingest (#2160)
jioffe502 May 28, 2026
72a6b61
fix stale VDB docs (NVBugs 6205401) (#2151)
kheiss-uwzoo May 29, 2026
2853421
Adjust service side retention of results based on client desires (#2165)
jdye64 May 29, 2026
2be38bd
Make inprocess the default implementation for the GraphIngestor, batc…
jdye64 May 29, 2026
0a6cb70
Prepare for 26.5.0 release (#2171)
jdye64 May 29, 2026
7fe6593
Merge branch '26.05' into main
jdye64 May 29, 2026
2 changes: 1 addition & 1 deletion .github/workflows/release-helm.yml
@@ -4,7 +4,7 @@ on:
workflow_dispatch:
inputs:
version:
- description: 'Chart version (e.g. 26.05-RC1)'
+ description: 'Chart version (e.g. 26.5.0)'
required: true
type: string
source-ref:

4 changes: 2 additions & 2 deletions nemo_retriever/helm/Chart.yaml
@@ -18,8 +18,8 @@ description: |
shared PostgreSQL backend so the service can scale horizontally.
type: application
- version: 26.05-RC1
- appVersion: "26.05-RC1"
+ version: "26.5.0"
+ appVersion: "26.5.0"
kubeVersion: ">=1.25.0-0"
home: https://github.com/NVIDIA/NeMo-Retriever
sources:
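The chart `version` moves from `26.05-RC1` to `26.5.0`. Helm's documentation asks for SemVer 2 chart versions, and under the strict SemVer 2.0.0 grammar `26.05-RC1` does not qualify (the minor part has a leading zero and there is no patch number), while `26.5.0` does. A minimal lint sketch using the regular expression published on semver.org; `is_strict_semver` is a hypothetical helper, not part of the release workflow, and Helm's own parser may be more lenient than this check.

```python
import re

# SemVer 2.0.0 grammar as published on semver.org (groups: major, minor,
# patch, pre-release, build metadata).
SEMVER_RE = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)


def is_strict_semver(version: str) -> bool:
    """Hypothetical lint: True only if `version` is a strict SemVer 2.0.0 string."""
    return SEMVER_RE.fullmatch(version) is not None


for candidate in ("26.05-RC1", "26.5.0", "26.5.0-rc.1"):
    # Prints False, True, True: "05" has a leading zero and there is no patch part.
    print(candidate, is_strict_semver(candidate))
```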

10 changes: 5 additions & 5 deletions nemo_retriever/helm/values.yaml
@@ -67,13 +67,13 @@ imagePullSecrets: []
# =============================================================================
service:
image:
- # Default points at the staging image published to NGC. Override
+ # Default points at the GA image published to NGC. Override
# `repository` / `tag` to pin a different build, e.g. one produced by:
- # docker build -f nemo_retriever/Dockerfile --target service \
+ # docker build -f Dockerfile --target service \
# -t <your-registry>/nemo-retriever-service:<tag> .
- repository: localhost:32000/nemo-retriever-service
- tag: "latest"
- pullPolicy: Always
+ repository: nvcr.io/nvidia/nemo-microservices/nrl-service
+ tag: "26.5.0"
+ pullPolicy: IfNotPresent

# Number of pod replicas. Must stay at 1 while persistence is SQLite-backed
# (RWO PVC + single writer). Bumping this requires switching to a shared

23 changes: 9 additions & 14 deletions nemo_retriever/pyproject.toml
@@ -52,7 +52,7 @@ dependencies = [
# HTTP clients
"httpx>=0.27.0",
"requests>=2.32.5",
- "urllib3>=2.7.0",
+ "urllib3==2.7.0",
# Utilities
"pydantic>=2.8.0",
"rich>=13.7.0",
Comment on lines +55 to 58

P2 Exact version pins for urllib3 and nltk are overly restrictive for non-critical dependencies. The project's dependency-management-uv rule specifies that only critical packages (PyTorch, CUDA-related) should be pinned exactly; others should use >=x.y.z range specifiers. Exact pins can create unsolvable resolution conflicts when these packages are installed alongside other libraries that also depend on them.

Suggested change
- "urllib3==2.7.0",
- # Utilities
- "pydantic>=2.8.0",
- "rich>=13.7.0",
+ "urllib3>=2.7.0",
+ # Utilities
+ "pydantic>=2.8.0",
+ "rich>=13.7.0",

Rule Used: Dependencies must be declared in the appropriate p... (source)
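To make the resolution-conflict point concrete, here is a toy sketch that intersects version constraints over a list of candidate releases. It is not how uv or pip resolve dependencies; the release list and the co-installed library's requirement are hypothetical, and versions are compared as plain integer tuples, a simplification of PEP 440 (no pre-releases, no zero-padding equivalence).

```python
import operator

# Comparison operators for a small subset of PEP 440 specifiers.
OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse(version: str) -> tuple[int, ...]:
    """Turn '2.7.0' into (2, 7, 0); plain release numbers only."""
    return tuple(int(part) for part in version.split("."))


def parse_spec(spec: str) -> tuple[str, str]:
    """Split '>=2.7.0' into ('>=', '2.7.0'); two-character operators are tried first."""
    for op in sorted(OPS, key=len, reverse=True):
        if spec.startswith(op):
            return op, spec[len(op):]
    raise ValueError(f"unsupported specifier: {spec!r}")


def satisfying(specs: list[str], releases: list[str]) -> list[str]:
    """Return the releases that satisfy every specifier at once."""
    parsed = [parse_spec(s) for s in specs]
    return [
        r for r in releases
        if all(OPS[op](parse(r), parse(bound)) for op, bound in parsed)
    ]


# Hypothetical releases and a hypothetical co-installed library that
# needs a newer patch release than our pin allows.
releases = ["2.6.3", "2.7.0", "2.7.1"]
other_lib = ">=2.7.1"

print(satisfying(["==2.7.0", other_lib], releases))  # -> []
print(satisfying([">=2.7.0", other_lib], releases))  # -> ['2.7.1']
```

With the exact pin the intersection is empty, so no environment satisfies both packages; the `>=2.7.0` range from the suggestion leaves the resolver a release that both accept.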

@@ -65,9 +65,9 @@ dependencies = [
# Document parsing and NIM client libs
"pypdfium2==4.30.0",
"pillow==12.2.0",
- "nltk>=3.9.4",
+ "nltk==3.9.4",

P2 Same concern as urllib3: nltk is not a critical (CUDA/PyTorch) dependency, so an exact pin may create resolution conflicts in environments where another package requires a different nltk patch version.

Suggested change
- "nltk==3.9.4",
+ "nltk>=3.9.4",

Rule Used: Dependencies must be declared in the appropriate p... (source)

"markitdown",
- "langchain-nvidia-ai-endpoints>=0.3.0",
+ "langchain-nvidia-ai-endpoints>=1.4.0",
# Default VDB solution
"lancedb",
# gRPC client for Parakeet/Riva ASR. Required for ASRCPUActor when it
@@ -123,11 +123,10 @@ local = [
"scikit-learn>=1.6.0",
"timm==1.0.22",
"albumentations==2.0.8",
- "nemotron-page-elements-v3>=0.dev0",
- "nemotron-graphic-elements-v1>=0.dev0",
- "nemotron-table-structure-v1>=0.dev0",
- # Accept the 2.0.0 stable release and newer OCR dev/final trains.
- "nemotron-ocr>=2.0.0.dev0; sys_platform == 'linux' and (platform_machine == 'x86_64' or platform_machine == 'aarch64')",
+ "nemotron-page-elements-v3==3.0.1",
+ "nemotron-graphic-elements-v1==1.0.0",
+ "nemotron-table-structure-v1==1.0.0",
+ "nemotron-ocr>=2.0.0,<3; sys_platform == 'linux' and (platform_machine == 'x86_64' or platform_machine == 'aarch64')",
"nvidia-ml-py",
"apscheduler>=3.10",
"psutil>=5.9.0",
@@ -165,7 +164,7 @@ tabular = [
"duckdb>=1.2.0",
"duckdb-engine>=0.13.0",
"neo4j>=5.0",
- "langgraph>=1.1.0a2",
+ "langgraph>=1.2.0",
]

# BEIR benchmarking and evaluation tools (not needed for production use).
@@ -181,7 +180,7 @@ benchmarks = [
# or construct an ``LLMJudge`` / ``LiteLLMClient`` directly. Powers both the
# live-RAG SDK and the batch evaluation framework.
llm = [
- "litellm>=1.86.0rc1",
+ "litellm>=1.86.0,<2",
]

dev = [
@@ -202,10 +201,6 @@ retriever-harness = "nemo_retriever.harness:main"
version = {attr = "nemo_retriever.version.get_build_version"}

[tool.uv.sources]
- nemotron-page-elements-v3 = { index = "test-pypi" }
- nemotron-graphic-elements-v1 = { index = "test-pypi" }
- nemotron-table-structure-v1 = { index = "test-pypi" }
- nemotron-ocr = { index = "test-pypi" }
# On Linux, resolve torch/torchvision from the CUDA wheel index.
# On Mac, fall through to PyPI to get CPU wheels.
torch = [

8 changes: 4 additions & 4 deletions nemo_retriever/src/nemo_retriever/adapters/cli/main.py
@@ -167,9 +167,9 @@ def ingest_command(
lancedb_uri: str = typer.Option(DEFAULT_LANCEDB_URI, "--lancedb-uri", help="LanceDB database URI."),
table_name: str = typer.Option(DEFAULT_TABLE_NAME, "--table-name", help="LanceDB table name."),
run_mode: IngestRunModeValue = typer.Option(
- "batch",
+ "inprocess",
"--run-mode",
- help="Execution mode for the SDK ingestor. Defaults to batch; use inprocess to skip Ray for local debug/CI.",
+ help="Execution mode for the SDK ingestor. Defaults to inprocess; use batch for Ray Data scale-out.",
),
dry_run: bool = typer.Option(
False,
@@ -557,8 +557,8 @@ def ingest_command(
# Report input-file count alongside the actual landed-row count from the
# LanceDB table — they diverge whenever one document explodes into multiple
# chunks (PDFs → page elements, video → audio_visual segments) or
- # shrinks to zero rows when every NIM call failed. The previous message
- # only reported inputs and hid both cases. ``n_rows`` is None when the
+ # shrinks to zero rows when every NIM call failed. The SDK rejects empty
+ # or unverifiable ingests before we get here; ``n_rows`` is None when the
# table read itself failed (caller can still see file count + URI).
n_files = len(summary["documents"])
table_path = f"{summary['lancedb_uri']}/{summary['table_name']}"
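The comment above explains why the CLI reports input files and landed rows separately. A minimal sketch of that reporting logic, assuming only the `summary` keys visible in this hunk (`documents`, `lancedb_uri`, `table_name`); `format_ingest_summary` and its message wording are hypothetical, not the CLI's actual output.

```python
from typing import Any, Optional


def format_ingest_summary(summary: dict[str, Any], n_rows: Optional[int]) -> str:
    """Hypothetical formatter that keeps input files and landed rows separate."""
    n_files = len(summary["documents"])
    table_path = f"{summary['lancedb_uri']}/{summary['table_name']}"
    if n_rows is None:
        # The table read failed: still report the file count and the table URI.
        return f"Ingested {n_files} file(s); row count unavailable for {table_path}"
    # Rows usually exceed files, since one PDF or video becomes many chunks.
    return f"Ingested {n_files} file(s) -> {n_rows} row(s) in {table_path}"


# Hypothetical summary values for illustration.
summary = {"documents": ["a.pdf", "b.mp4"], "lancedb_uri": "/tmp/lancedb", "table_name": "docs"}
print(format_ingest_summary(summary, 37))
print(format_ingest_summary(summary, None))
```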

14 changes: 6 additions & 8 deletions nemo_retriever/src/nemo_retriever/adapters/cli/sdk_workflow.py
@@ -505,7 +505,7 @@ def resolve_ingest_plan(
*,
profile: IngestProfileValue = "auto",
input_type: IngestInputTypeValue = "auto",
- run_mode: IngestRunModeValue = "batch",
+ run_mode: IngestRunModeValue = "inprocess",
method: str | None = None,
dpi: int | None = None,
extract_text: bool | None = None,
@@ -567,9 +567,8 @@ def resolve_ingest_plan(
) -> ResolvedIngestPlan:
"""Resolve root ingest options into ordinary params for one extract call.

- Root ``retriever ingest`` intentionally defaults to ``run_mode="batch"``.
- Programmatic callers that need Ray-free local execution should pass
- ``run_mode="inprocess"`` explicitly. ``input_type`` remains a private
+ Root ``retriever ingest`` defaults to ``run_mode="inprocess"`` (no Ray).
+ Pass ``run_mode="batch"`` for Ray Data scale-out. ``input_type`` remains a private
expansion/validation constraint; extraction still routes from the manifest.
"""

@@ -706,7 +705,7 @@ def ingest_documents(
*,
profile: IngestProfileValue = "auto",
input_type: IngestInputTypeValue = "auto",
- run_mode: IngestRunModeValue = "batch",
+ run_mode: IngestRunModeValue = "inprocess",
dry_run: bool = False,
method: str | None = None,
dpi: int | None = None,
@@ -778,9 +777,8 @@ def ingest_documents(
Batch tuning arguments are opt-in and are translated into
``BatchTuningParams`` for extraction or embedding; they are meaningful for
``run_mode="batch"`` and ignored by callers that leave them unset.
- Root ``retriever ingest`` intentionally defaults to ``run_mode="batch"``;
- pass ``run_mode="inprocess"`` explicitly for local debug or CI callers
- that need to skip Ray startup.
+ Root ``retriever ingest`` defaults to ``run_mode="inprocess"``; pass
+ ``run_mode="batch"`` for Ray Data scale-out.
The legacy ``input_type`` argument constrains directory expansion and file
validation only; extraction routing remains manifest-planned.
"""

5 changes: 2 additions & 3 deletions nemo_retriever/src/nemo_retriever/graph/executor.py
@@ -228,9 +228,8 @@ def build_dataset(self, data: Any, **kwargs: Any) -> Any:

Returns
-------
- pandas.DataFrame
- The materialized result after executing the Ray Data pipeline
- (``ds.to_pandas()``).
+ ray.data.Dataset
+ The lazy Ray dataset with all graph stages appended.
"""
import ray
import ray.data as rd

8 changes: 4 additions & 4 deletions nemo_retriever/src/nemo_retriever/graph_ingestor.py
@@ -16,7 +16,7 @@
from nemo_retriever.params import ExtractParams, EmbedParams

result_ds = (
- GraphIngestor(run_mode="batch")
+ GraphIngestor(run_mode="inprocess")
.files(["/data/*.pdf"])
.extract(ExtractParams(method="pdfium"))
.embed(EmbedParams(model_name="nvidia/llama-nemotron-embed-1b-v2"))
@@ -387,8 +387,8 @@ class GraphIngestor(ingestor):
Parameters
----------
run_mode
- ``"batch"`` (Ray Data, default) or ``"inprocess"`` (single-process
- pandas).
+ ``"inprocess"`` (single-process pandas, default) or ``"batch"`` (Ray
+ Data).
ray_address
Ray cluster address. ``None`` starts a local cluster.
batch_size
@@ -415,7 +415,7 @@ class GraphIngestor(ingestor):
def __init__(
self,
*,
- run_mode: str = "batch",
+ run_mode: str = "inprocess",
Comment on lines 415 to +418

P1 Default run_mode changed from "batch" to "inprocess" without a deprecation cycle. Any caller that relied on GraphIngestor() (or ingest_documents() / resolve_ingest_plan()) defaulting to Ray Data scale-out will now silently execute single-process pandas instead. This includes the retriever pipeline run CLI, the harness, and every programmatic caller that omitted run_mode. The api-backward-compatibility rule requires a deprecation warning or migration note for changes that alter observable default behavior. A comment in the docstring noting the version where the default changed would satisfy the intent of that rule.

Rule Used: Changes to public API surfaces (FastAPI endpoints,... (source)

documents: Optional[List[str]] = None,
ray_address: Optional[str] = None,
ray_log_to_driver: bool = True,
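One way to address the P1 comment, sketched under assumptions: keep `inprocess` as the new default but emit a `FutureWarning` when the caller omits `run_mode`, so code that relied on the old `batch` default gets a visible signal. `_UNSET`, `resolve_run_mode`, and the warning text are hypothetical and are not part of this PR; the accepted values mirror the ones the `GraphIngestor` docstring documents.

```python
import warnings
from typing import Any

# Sentinel distinguishing "caller omitted run_mode" from an explicit value.
_UNSET: Any = object()

# Values documented for GraphIngestor in this diff.
_VALID_RUN_MODES = ("inprocess", "batch")


def resolve_run_mode(run_mode: Any = _UNSET) -> str:
    """Hypothetical helper: return the effective run mode, warning on the implicit default."""
    if run_mode is _UNSET:
        warnings.warn(
            "run_mode now defaults to 'inprocess' (previously 'batch'); "
            "pass run_mode='batch' explicitly to keep Ray Data scale-out.",
            FutureWarning,
            stacklevel=2,
        )
        return "inprocess"
    if run_mode not in _VALID_RUN_MODES:
        raise ValueError(f"run_mode must be one of {_VALID_RUN_MODES}, got {run_mode!r}")
    return run_mode


mode = resolve_run_mode("batch")  # explicit choice: no warning is emitted
```

`FutureWarning` is used rather than `DeprecationWarning` because Python shows it to end users by default, which suits a change in observable behavior. The trade-off is that the public signature would show a sentinel instead of the literal `"inprocess"` default.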

2 changes: 1 addition & 1 deletion nemo_retriever/src/nemo_retriever/harness/config.py
@@ -73,7 +73,7 @@ class HarnessConfig:
dataset_dir: str
dataset_label: str
preset: str
- run_mode: str = "batch"
+ run_mode: str = "inprocess"

query_csv: str | None = None
input_type: str = "pdf"

15 changes: 7 additions & 8 deletions nemo_retriever/src/nemo_retriever/pipeline/__main__.py
@@ -8,15 +8,14 @@

Examples::

- # Batch mode (Ray) with PDF extraction + embedding
+ # In-process mode (default; no Ray) for local extraction + embedding
retriever pipeline run /data/pdfs \\
- --run-mode batch \\
- --embed-invoke-url http://localhost:8000/v1
+ --ocr-invoke-url http://localhost:9000/v1

- # In-process mode (no Ray) for quick local testing
+ # Batch mode (Ray) for large-scale throughput
retriever pipeline run /data/pdfs \\
- --run-mode inprocess \\
- --ocr-invoke-url http://localhost:9000/v1
+ --run-mode batch \\
+ --embed-invoke-url http://localhost:8000/v1

# Service mode (delegate to a running retriever service)
retriever pipeline run /data/pdfs \\
@@ -979,10 +978,10 @@ def run(
),
# --- I/O and execution ------------------------------------------------
run_mode: str = typer.Option(
- "batch",
+ "inprocess",
"--run-mode",
help=(
- "Execution mode: 'batch' (Ray Data), 'inprocess' (pandas, no Ray), "
+ "Execution mode: 'inprocess' (pandas, no Ray, default), 'batch' (Ray Data), "
"or 'service' (remote retriever service)."
),
rich_help_panel=_PANEL_IO,

2 changes: 1 addition & 1 deletion nemo_retriever/src/nemo_retriever/service/app.py
@@ -247,7 +247,7 @@ def create_app(config: ServiceConfig) -> FastAPI:
app = FastAPI(
title="Retriever Service",
description="Low-latency document ingestion service powered by nemo-retriever",
- version="1.0.0",
+ version="26.5.0",
docs_url="/docs",
lifespan=_lifespan,
)

13 changes: 11 additions & 2 deletions nemo_retriever/src/nemo_retriever/service/client.py
@@ -216,6 +216,7 @@ async def _create_job(
*,
expected_documents: int,
label: str | None = None,
+ retain_results: bool = False,
) -> str:
"""Open a server-side job aggregate and return the assigned ``job_id``.

@@ -224,7 +225,10 @@ async def _create_job(
call sized to the number of files supplied.
"""
url = f"{self._base_url}/v1/ingest/job"
- payload: dict[str, Any] = {"expected_documents": expected_documents}
+ payload: dict[str, Any] = {
+ "expected_documents": expected_documents,
+ "retain_results": retain_results,
+ }
if label is not None:
payload["label"] = label
resp = await client.post(url, json=payload)
@@ -639,6 +643,7 @@ async def aingest_documents_stream(
files: list[Path],
*,
pipeline_spec: dict[str, Any] | None = None,
+ retain_results: bool = False,
) -> AsyncIterator[dict[str, Any]]:
"""Async generator: upload files, yield events as documents complete.

@@ -665,7 +670,11 @@ async def aingest_documents_stream(
limits=pool_limits,
headers=self._auth_headers,
) as client:
- job_id = await self._create_job(client, expected_documents=len(files))
+ job_id = await self._create_job(
+ client,
+ expected_documents=len(files),
+ retain_results=retain_results,
+ )
yield {
"event": "job_created",
"job_id": job_id,
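The client change threads `retain_results` into the `POST /v1/ingest/job` body. A stdlib-only sketch of the resulting JSON payload; `build_job_payload` is a hypothetical standalone mirror of the dictionary built inside `_create_job`, not a function exported by the client.

```python
import json
from typing import Any, Optional


def build_job_payload(
    expected_documents: int,
    label: Optional[str] = None,
    retain_results: bool = False,
) -> dict[str, Any]:
    """Hypothetical mirror of the body that _create_job posts to /v1/ingest/job."""
    payload: dict[str, Any] = {
        "expected_documents": expected_documents,
        # Always sent, so the server sees an explicit choice (default False).
        "retain_results": retain_results,
    }
    # ``label`` is optional and only included when the caller supplies one.
    if label is not None:
        payload["label"] = label
    return payload


print(json.dumps(build_job_payload(3, retain_results=True)))
# {"expected_documents": 3, "retain_results": true}
```

A caller that intends to poll `GET /v1/ingest/status/{id}` for `result_data` would pass `retain_results=True` to `aingest_documents_stream`; leaving the default `False` lets the server discard row payloads once documents complete.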

@@ -46,3 +46,12 @@ class JobCreateRequest(RichModel):
expected_documents: int = Field(ge=1, description="Number of documents this job will receive")
label: str | None = Field(default=None, description="Optional human-readable tag for the dashboard")
metadata: dict[str, Any] = Field(default_factory=dict)
+ retain_results: bool = Field(
+ default=False,
+ description=(
+ "When false (default), completed documents keep only ``result_rows`` in the "
+ "job tracker; row payloads are discarded after the pipeline finishes. Set true "
+ "when the client will poll ``GET /v1/ingest/status/{id}`` to fetch "
+ "``result_data``."
+ ),
+ )