
docs(extraction): document nemo-retriever[nemotron-parse] extra (NVBugs 6170950)#2099

Open
kheiss-uwzoo wants to merge 2 commits into NVIDIA:main from kheiss-uwzoo:kheiss/nemotron_parse_extraction

kheiss-uwzoo (Collaborator) commented May 22, 2026

Summary

Fixes the documentation gap tracked in NVBugs 6170950: customers who install nemo-retriever (or nemo-retriever[local]) and run extract_method="nemotron_parse" need open_clip from open-clip-torch, which is provided only by the [nemotron-parse] PyPI extra, not by the default or local install.

This PR is documentation only. The [nemotron-parse] optional dependency is already declared on main in nemo_retriever/pyproject.toml; these edits tell users to install it before using Nemotron Parse PDF extraction.
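
As an illustration of the failure mode described above, the following minimal sketch checks whether open_clip can be located before Nemotron Parse extraction is attempted. It is not part of this PR (runtime dependency checks are out of scope here); the helper names `module_available` and `check_nemotron_parse_ready` are hypothetical, and only the module name and install command come from the PR text.

```python
import importlib.util

# Install command as documented in this PR.
INSTALL_HINT = 'pip install "nemo-retriever[nemotron-parse]"'


def module_available(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def check_nemotron_parse_ready() -> str:
    """Hypothetical helper: report whether open_clip is installed."""
    if module_available("open_clip"):
        return "open_clip found; extract_method='nemotron_parse' can import it."
    return f"open_clip missing; install the extra first: {INSTALL_HINT}"


if __name__ == "__main__":
    print(check_nemotron_parse_ready())
```

Using `find_spec` rather than a bare `import` keeps the check cheap, since open_clip itself is never loaded.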

Changes

  • overview.md — Note the nemotron-parse extra and pdfium vs Nemotron Parse extraction methods.
  • prerequisites-support-matrix.md — Software requirement for pip install "nemo-retriever[nemotron-parse]".
  • faq.md — Install guidance when using advanced visual parsing / nemotron_parse.
  • troubleshoot.md — Section for ModuleNotFoundError: No module named 'open_clip'.

Out of scope

  • Packaging changes (extra already exists on main).
  • Runtime dependency checks or new tests (not included in this docs-only PR).
  • releasenotes.md (deferred).

Test plan

  • Review rendered docs on overview, prerequisites, FAQ, and troubleshoot pages.
  • Confirm install commands match PyPI extra name nemotron-parse (hyphen) in pyproject.toml.
  • Optional: in a fresh venv, run pip install "nemo-retriever[nemotron-parse]", then verify that python -c "import open_clip" succeeds.

copy-pr-bot (Bot) commented May 22, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

kheiss-uwzoo force-pushed the kheiss/nemotron_parse_extraction branch from d3b8481 to 4c3fcb5 on May 22, 2026 16:29
…gs 6170950)

Tell users to install the nemotron-parse PyPI extra before extract_method=nemotron_parse so open_clip is available. Packaging already declares the extra on main; this is documentation only.
kheiss-uwzoo force-pushed the kheiss/nemotron_parse_extraction branch from 4c3fcb5 to 6a8fbc6 on May 22, 2026 16:31
kheiss-uwzoo changed the title from "fix(nemotron-parse): document and validate open_clip extra (NVBugs 6170950)" to "docs(extraction): document nemo-retriever[nemotron-parse] extra (NVBugs 6170950)" on May 22, 2026
kheiss-uwzoo marked this pull request as ready for review on May 22, 2026 16:33
kheiss-uwzoo requested review from a team as code owners on May 22, 2026 16:34
kheiss-uwzoo requested a review from jdye64 on May 22, 2026 16:34
kheiss-uwzoo added the doc (Improvements or additions to documentation) label on May 22, 2026
greptile-apps (Bot, Contributor) commented May 22, 2026

Greptile Summary

This is a documentation-only PR that closes a gap for users who install nemo-retriever or nemo-retriever[local] and then call extract_method="nemotron_parse" without the [nemotron-parse] extra, causing an open_clip import error at runtime. No code, packaging, or runtime behaviour changes are made.

  • faq.md — adds an inline install hint with fully-quoted pip commands inside backticks; consistent with the existing prose style of that page.
  • prerequisites-support-matrix.md — adds an explicit software-requirement bullet for the [nemotron-parse] extra, noting the open-clip-torch dependency and that neither the base nor [local] installs include it.
  • overview.md — adds a capability bullet and a MkDocs !!! note block pointing users to pip install "nemo-retriever[nemotron-parse]" before using Nemotron Parse.
  • troubleshoot.md — adds a complete troubleshooting section with properly fenced bash code blocks and correct cross-reference anchors.

Confidence Score: 5/5

Documentation-only change with no code, packaging, or runtime impact; safe to merge.

All four files make targeted, factually accurate additions: package names, module names, and MkDocs anchor references all check out against the existing codebase. No logic, configuration, or dependency files are touched.

No files require special attention.

Important Files Changed

Filename | Overview
docs/docs/extraction/faq.md | Adds one inline sentence instructing users to install nemo-retriever[nemotron-parse] before calling extract_method="nemotron_parse"; both pip commands are properly quoted inside backticks.
docs/docs/extraction/overview.md | Adds one bullet noting pdfium vs. Nemotron Parse extraction methods and a correctly formatted MkDocs admonition pointing users to the [nemotron-parse] extra.
docs/docs/extraction/prerequisites-support-matrix.md | Appends a software-requirements bullet explaining that open-clip-torch is needed for Nemotron Parse extraction and is not included by the default or [local] installs; factually accurate.
docs/docs/extraction/troubleshoot.md | Adds a dedicated troubleshooting section for ModuleNotFoundError: No module named 'open_clip' with properly fenced bash code blocks; cross-reference anchor #software-requirements resolves correctly.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[pip install nemo-retriever] --> B{Which extra?}
    B -->|default or local| C[pdfium extraction only\nopen_clip not installed]
    B -->|nemotron-parse extra| E[open-clip-torch included\nopen_clip available]
    B -->|local plus nemotron-parse| F[Local GPU mode\nopen-clip-torch included]
    C --> G{Use nemotron_parse?}
    G -->|Yes| H[ModuleNotFoundError\nNo module named open_clip]
    G -->|No| I[Extraction succeeds]
    H --> K[Fix: reinstall with nemotron-parse extra]
    E --> J[Nemotron Parse extraction succeeds]
    F --> J
    K --> J
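
The decision logic in the flowchart above can be sketched as a small function. This is only an illustrative model, not code from the repository: the function name `extraction_outcome`, the use of a set of extra names, and the "pdfium" method string are assumptions made for the example.

```python
def extraction_outcome(installed_extras: set[str], extract_method: str) -> str:
    """Model the flowchart: which installs can run which extraction method."""
    # open-clip-torch (and thus open_clip) ships only with the nemotron-parse extra.
    has_open_clip = "nemotron-parse" in installed_extras
    if extract_method == "nemotron_parse" and not has_open_clip:
        return "ModuleNotFoundError: No module named 'open_clip'"
    return "extraction succeeds"


if __name__ == "__main__":
    # Default install (no extras) with the assumed "pdfium" method.
    print(extraction_outcome(set(), "pdfium"))
    # The local extra alone does not include open-clip-torch.
    print(extraction_outcome({"local"}, "nemotron_parse"))
    # Installing both extras covers local GPU mode and Nemotron Parse.
    print(extraction_outcome({"local", "nemotron-parse"}, "nemotron_parse"))
```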

Reviews (2): Last reviewed commit: "docs(extraction): address PR review for ..."

Comment thread docs/docs/extraction/faq.md Outdated
Comment thread docs/docs/extraction/troubleshoot.md Outdated
Quote the combined local+nemotron-parse pip command in the FAQ so shell copy-paste works, and drop the hard-pinned open-clip-torch version from troubleshoot prose.