From c0cc98c5b780e7a2f4c38f715544a1f57594eb83 Mon Sep 17 00:00:00 2001
From: Kurt Heiss
Date: Fri, 22 May 2026 12:59:55 -0700
Subject: [PATCH 1/3] docs(extraction): note OCR v2 multilingual defaults

Document local HuggingFace and Helm OCR NIM language defaults for Nemotron OCR v2.
---
 docs/docs/extraction/multimodal-extraction.md        | 2 ++
 docs/docs/extraction/prerequisites-support-matrix.md | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/docs/docs/extraction/multimodal-extraction.md b/docs/docs/extraction/multimodal-extraction.md
index 1b4e984e5..aaf454c9f 100644
--- a/docs/docs/extraction/multimodal-extraction.md
+++ b/docs/docs/extraction/multimodal-extraction.md
@@ -62,6 +62,8 @@ For natural-language infographic descriptions, optionally enable [image captioni
 
 Scanned PDFs and image-only pages rely on OCR and hybrid paths that combine native text extraction with OCR when needed. For extract methods such as `ocr` and `pdfium_hybrid`, refer to the [Python API reference](nemo-retriever-api-reference.md).
 
+The default OCR engine is **Nemotron OCR v2**. When you run extraction **locally with HuggingFace models**, v2 operates in **multilingual** mode by default (`multi`). Pass `--ocr-lang english` on the CLI (or the equivalent API parameter) for English-only v2, or `--ocr-version v1` for the legacy engine. For Kubernetes installs, the chart's OCR NIM defaults and image are documented under [Nemotron OCR v2 — language mode](prerequisites-support-matrix.md#nemotron-ocr-v2-language-mode) in the support matrix.
+
 **Related**
 
 - [Text and layout extraction](#text-and-layout-extraction)
diff --git a/docs/docs/extraction/prerequisites-support-matrix.md b/docs/docs/extraction/prerequisites-support-matrix.md
index 0363b8c85..1cdfbfe8e 100644
--- a/docs/docs/extraction/prerequisites-support-matrix.md
+++ b/docs/docs/extraction/prerequisites-support-matrix.md
@@ -70,6 +70,14 @@ The production Helm chart enables these NIM microservices **by default** (for ex
 | `ocr` | [nemotron-ocr-v2](https://huggingface.co/nvidia/nemotron-ocr-v2) | Image OCR |
 | `vlm_embed` | [llama-nemotron-embed-vl-1b-v2](https://huggingface.co/nvidia/llama-nemotron-embed-vl-1b-v2) | Multimodal (VL) embedding |
 
+### Nemotron OCR v2 language mode { #nemotron-ocr-v2-language-mode }
+
+!!! note
+
+    **Local Hugging Face inference:** When you deploy locally with HuggingFace model weights (for example `pip install "nemo-retriever[local]"` and GPU inference without remote OCR NIM URLs), the default OCR engine is **Nemotron OCR v2**, which runs in **multilingual** mode by default (`multi`). For English-only v2, pass `--ocr-lang english` on the [CLI](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/docs/cli) or set the equivalent `ocr_lang` parameter in the Python API. Use `--ocr-version v1` for the legacy English-only engine. Remote OCR NIM endpoints use their own model and language behavior; local OCR language selectors are not sent on remote requests.
+
+    **Helm / NIM (26.05):** The [NeMo Retriever Helm chart](https://github.com/NVIDIA/NeMo-Retriever/blob/26.05/nemo_retriever/helm/README.md) deploys the core OCR NIM under [`nimOperator.ocr`](https://github.com/NVIDIA/NeMo-Retriever/blob/26.05/nemo_retriever/helm/values.yaml#L817-L852). When that block targets **nemotron-ocr-v2** for your release, the deployed NIM also runs in multilingual mode by default. Confirm the `repository` and `tag` in `values.yaml` before you upgrade.
+
 Default VL embedder container and model for release deployments:
 
 - **Image:** `nvcr.io/nim/nvidia/llama-nemotron-embed-vl-1b-v2:1.12.0`

From cb3a44103ed94e583b891b75daa49383c5669150 Mon Sep 17 00:00:00 2001
From: Kurt Heiss
Date: Fri, 22 May 2026 11:16:04 -0700
Subject: [PATCH 2/3] docs(extraction): drop NIM hardware prose from captioning Related link

---
 docs/docs/extraction/multimodal-extraction.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/docs/extraction/multimodal-extraction.md b/docs/docs/extraction/multimodal-extraction.md
index aaf454c9f..30b57ddf6 100644
--- a/docs/docs/extraction/multimodal-extraction.md
+++ b/docs/docs/extraction/multimodal-extraction.md
@@ -80,7 +80,7 @@ Image captioning generates natural-language descriptions for unstructured image
 
 - [Multimodal embeddings (VLM)](embedding.md)
 - [Metadata reference](content-metadata.md)
-- [Image captioning (26.05)](prerequisites-support-matrix.md#image-captioning-2605) — optional NIM and hardware on the support matrix
+- [Image captioning (26.05)](prerequisites-support-matrix.md#image-captioning-2605)
 
 ## Metadata and content schema { #metadata-and-content-schema }
 

From f357a56534a2ddf7fa47dd2c4ef1e9fc884346ca Mon Sep 17 00:00:00 2001
From: Kurt Heiss
Date: Fri, 22 May 2026 14:55:50 -0700
Subject: [PATCH 3/3] docs(extraction): mark B200 supported for nemotron-parse (NVBug 6204537)

Correct support matrix GPU and disk columns for B200; Helm can deploy nemotron-parse-v1.2 on B200.
Doc-only; separate from SDK workflow tracking in NVBug 6198661.
---
 docs/docs/extraction/prerequisites-support-matrix.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/docs/extraction/prerequisites-support-matrix.md b/docs/docs/extraction/prerequisites-support-matrix.md
index 1cdfbfe8e..e547a097f 100644
--- a/docs/docs/extraction/prerequisites-support-matrix.md
+++ b/docs/docs/extraction/prerequisites-support-matrix.md
@@ -118,8 +118,8 @@ Model repositories and NIM references are linked in [Core and Advanced Pipeline
 | Core Features | — | Total Disk Space | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB |
 | Audio (parakeet-1-1b-ctc-en-us) | ~4.0 GiB (`model.safetensors`; the repo also ships `parakeet-ctc-1.1b.nemo` of similar size—use one format to avoid roughly doubling disk use) | Additional Dedicated GPUs | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1¹ |
 | Audio (parakeet-1-1b-ctc-en-us) | — | Additional Disk Space | ~37GB | ~37GB | ~37GB | ~37GB | ~37GB | ~37GB | ~37GB | ~37GB | ~37GB¹ |
-| nemotron-parse | ~3.5 GiB | Additional Dedicated GPUs | Not supported | Not supported | Not supported | 1 | 1 | 1 | 1 | 1 | Not supported² |
-| nemotron-parse | — | Additional Disk Space | Not supported | Not supported | Not supported | ~16GB | ~16GB | ~16GB | ~16GB | ~16GB | Not supported² |
+| nemotron-parse | ~3.5 GiB | Additional Dedicated GPUs | Not supported | 1 | Not supported | 1 | 1 | 1 | 1 | 1 | Not supported² |
+| nemotron-parse | — | Additional Disk Space | Not supported | ~16GB | Not supported | ~16GB | ~16GB | ~16GB | ~16GB | ~16GB | Not supported² |
 | Omni caption (nemotron-3-nano-omni-30b-a3b-reasoning) | ~62 GiB (BF16); ~33 GiB (FP8); ~21 GiB (NVFP4) | Additional Dedicated GPUs | 1 | 1 | 1 | 1 | 1 | Not supported | Not supported | 2 | Not supported³ |
 | Omni caption (nemotron-3-nano-omni-30b-a3b-reasoning) | — | Additional Disk Space (HF) | ~21–62GB | ~21–62GB | ~21–62GB | ~21–62GB | ~21–62GB | Not supported | Not supported | ~21–62GB | Not supported³ |
 | Omni caption (nemotron-3-nano-omni-30b-a3b-reasoning) | — | Additional Disk Space (NIM) | ~80GB | ~80GB | ~80GB | ~80GB | ~80GB | Not supported | Not supported | ~80GB | Not supported³ |