4 changes: 3 additions & 1 deletion docs/docs/extraction/multimodal-extraction.md
@@ -62,6 +62,8 @@ For natural-language infographic descriptions, optionally enable [image captioni

Scanned PDFs and image-only pages rely on OCR and hybrid paths that combine native text extraction with OCR when needed. For extract methods such as `ocr` and `pdfium_hybrid`, refer to the [Python API reference](nemo-retriever-api-reference.md).

The default OCR engine is **Nemotron OCR v2**. When you run extraction **locally with HuggingFace models**, v2 operates in **multilingual** mode by default (`multi`). Pass `--ocr-lang english` on the CLI (or the equivalent API parameter) for English-only v2, or `--ocr-version v1` for the legacy engine. For Kubernetes installs, the chart's OCR NIM defaults and image are documented under [Nemotron OCR v2 — language mode](prerequisites-support-matrix.md#nemotron-ocr-v2-language-mode) in the support matrix.

**Related**

- [Text and layout extraction](#text-and-layout-extraction)
@@ -78,7 +80,7 @@ Image captioning generates natural-language descriptions for unstructured image

- [Multimodal embeddings (VLM)](embedding.md)
- [Metadata reference](content-metadata.md)
- [Image captioning (26.05)](prerequisites-support-matrix.md#image-captioning-2605) — optional NIM and hardware on the support matrix
- [Image captioning (26.05)](prerequisites-support-matrix.md#image-captioning-2605)

## Metadata and content schema { #metadata-and-content-schema }

12 changes: 10 additions & 2 deletions docs/docs/extraction/prerequisites-support-matrix.md
@@ -70,6 +70,14 @@ The production Helm chart enables these NIM microservices **by default** (for ex
| `ocr` | [nemotron-ocr-v2](https://huggingface.co/nvidia/nemotron-ocr-v2) | Image OCR |
| `vlm_embed` | [llama-nemotron-embed-vl-1b-v2](https://huggingface.co/nvidia/llama-nemotron-embed-vl-1b-v2) | Multimodal (VL) embedding |

### Nemotron OCR v2 language mode { #nemotron-ocr-v2-language-mode }

!!! note

**Local Hugging Face inference:** When you deploy locally with HuggingFace model weights (for example `pip install "nemo-retriever[local]"` and GPU inference without remote OCR NIM URLs), the default OCR engine is **Nemotron OCR v2**, which runs in **multilingual** mode by default (`multi`). For English-only v2, pass `--ocr-lang english` on the [CLI](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/docs/cli) or set the equivalent `ocr_lang` parameter in the Python API. Use `--ocr-version v1` for the legacy English-only engine. Remote OCR NIM endpoints use their own model and language behavior; local OCR language selectors are not sent on remote requests.
Contributor

P2 CLI link targets main instead of the 26.05 branch

The anchor text says this is 26.05-specific guidance, but the CLI link resolves to https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/docs/cli. If the CLI interface diverges between main and 26.05, readers following this link from the versioned docs will see instructions that may not match their installed release. Consider pinning to 26.05 (or the appropriate release tag) for consistency with the rest of this section.
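
One way to apply the pinning suggested above across a docs tree is a small rewrite helper. The following is a minimal sketch, not part of this PR: the function name `pin_github_links` is hypothetical, and it assumes `26.05` is the ref the versioned docs should target (the Helm links in this same section already use `blob/26.05`).

```python
import re

# Matches tree/blob URLs in the NeMo-Retriever repository whose ref is exactly
# "main". The lookahead keeps refs that merely start with "main" untouched.
_MAIN_REF = re.compile(
    r"(https://github\.com/NVIDIA/NeMo-Retriever/(?:tree|blob)/)main(?=/|$|[)\s#])"
)


def pin_github_links(markdown: str, ref: str = "26.05") -> str:
    """Return `markdown` with links to `main` rewritten to `ref` (assumed ref name)."""
    # A callable replacement avoids backreference ambiguity when `ref` starts with a digit.
    return _MAIN_REF.sub(lambda m: m.group(1) + ref, markdown)


if __name__ == "__main__":
    line = "[CLI](https://github.com/NVIDIA/NeMo-Retriever/tree/main/nemo_retriever/docs/cli)"
    print(pin_github_links(line))
```

Links that already name a release ref pass through unchanged, so the helper can be run repeatedly over the same file.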



**Helm / NIM (26.05):** The [NeMo Retriever Helm chart](https://github.com/NVIDIA/NeMo-Retriever/blob/26.05/nemo_retriever/helm/README.md) deploys the core OCR NIM under [`nimOperator.ocr`](https://github.com/NVIDIA/NeMo-Retriever/blob/26.05/nemo_retriever/helm/values.yaml#L817-L852). When that block targets **nemotron-ocr-v2** for your release, the deployed NIM also runs in multilingual mode by default. Confirm the `repository` and `tag` in `values.yaml` before you upgrade.
Contributor

P2 Hardcoded line-range anchor in values.yaml link will go stale

The URL values.yaml#L817-L852 pins specific line numbers that will drift the moment anyone adds or removes lines above that block in values.yaml. When the anchor breaks, readers land at the wrong section with no error. Consider linking to the file root (values.yaml) without the fragment, or to a named heading/comment in values.yaml that is stable across edits.
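
The first option in the comment above (link to the file without the fragment) could be automated along these lines. This is a minimal sketch under stated assumptions: `strip_line_anchors` is a hypothetical helper, and it only handles GitHub's `#L<n>` and `#L<n>-L<m>` fragment forms.

```python
import re

# A GitHub line anchor is "#L817" or "#L817-L852" at the end of the URL.
# Named anchors such as "#readme" do not match and are left alone.
_LINE_ANCHOR = re.compile(r"(https://github\.com/[^\s)#]+)#L\d+(?:-L\d+)?\b")


def strip_line_anchors(markdown: str) -> str:
    """Return `markdown` with line-number fragments removed from GitHub URLs."""
    return _LINE_ANCHOR.sub(lambda m: m.group(1), markdown)


if __name__ == "__main__":
    link = (
        "[`nimOperator.ocr`](https://github.com/NVIDIA/NeMo-Retriever/"
        "blob/26.05/nemo_retriever/helm/values.yaml#L817-L852)"
    )
    print(strip_line_anchors(link))
```

Dropping the fragment trades precision for stability: readers land at the top of `values.yaml` and have to search for `nimOperator.ocr`, but the link cannot silently point at the wrong block.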



Default VL embedder container and model for release deployments:

- **Image:** `nvcr.io/nim/nvidia/llama-nemotron-embed-vl-1b-v2:1.12.0`
@@ -110,8 +118,8 @@ Model repositories and NIM references are linked in [Core and Advanced Pipeline
| Core Features | — | Total Disk Space | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB |
| Audio (parakeet-1-1b-ctc-en-us) | ~4.0 GiB (`model.safetensors`; the repo also ships `parakeet-ctc-1.1b.nemo` of similar size—use one format to avoid roughly doubling disk use) | Additional Dedicated GPUs | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1¹ |
| Audio (parakeet-1-1b-ctc-en-us) | — | Additional Disk Space | ~37GB | ~37GB | ~37GB | ~37GB | ~37GB | ~37GB | ~37GB | ~37GB | ~37GB¹ |
| nemotron-parse | ~3.5 GiB | Additional Dedicated GPUs | Not supported | Not supported | Not supported | 1 | 1 | 1 | 1 | 1 | Not supported² |
| nemotron-parse | — | Additional Disk Space | Not supported | Not supported | Not supported | ~16GB | ~16GB | ~16GB | ~16GB | ~16GB | Not supported² |
| nemotron-parse | ~3.5 GiB | Additional Dedicated GPUs | Not supported | 1 | Not supported | 1 | 1 | 1 | 1 | 1 | Not supported² |
| nemotron-parse | — | Additional Disk Space | Not supported | ~16GB | Not supported | ~16GB | ~16GB | ~16GB | ~16GB | ~16GB | Not supported² |
| Omni caption (nemotron-3-nano-omni-30b-a3b-reasoning) | ~62 GiB (BF16); ~33 GiB (FP8); ~21 GiB (NVFP4) | Additional Dedicated GPUs | 1 | 1 | 1 | 1 | 1 | Not supported | Not supported | 2 | Not supported³ |
| Omni caption (nemotron-3-nano-omni-30b-a3b-reasoning) | — | Additional Disk Space (HF) | ~21–62GB | ~21–62GB | ~21–62GB | ~21–62GB | ~21–62GB | Not supported | Not supported | ~21–62GB | Not supported³ |
| Omni caption (nemotron-3-nano-omni-30b-a3b-reasoning) | — | Additional Disk Space (NIM) | ~80GB | ~80GB | ~80GB | ~80GB | ~80GB | Not supported | Not supported | ~80GB | Not supported³ |