docs(extraction): document nemo-retriever[nemotron-parse] extra (NVBugs 6170950) #2099
Open
kheiss-uwzoo wants to merge 2 commits into
d3b8481 to 4c3fcb5
…gs 6170950) Tell users to install the nemotron-parse PyPI extra before extract_method=nemotron_parse so open_clip is available. Packaging already declares the extra on main; this is documentation only.
4c3fcb5 to 6a8fbc6
Contributor
Greptile Summary
This is a documentation-only PR that closes a gap for users who install
| Filename | Overview |
|---|---|
| docs/docs/extraction/faq.md | Adds one inline sentence instructing users to install nemo-retriever[nemotron-parse] before calling extract_method="nemotron_parse"; both pip commands are properly quoted inside backticks. |
| docs/docs/extraction/overview.md | Adds one bullet noting pdfium vs. Nemotron Parse extraction methods and a correctly formatted MkDocs admonition pointing users to the [nemotron-parse] extra. |
| docs/docs/extraction/prerequisites-support-matrix.md | Appends a software-requirements bullet explaining that open-clip-torch is needed for Nemotron Parse extraction and is not included by the default or [local] installs; factually accurate. |
| docs/docs/extraction/troubleshoot.md | Adds a dedicated troubleshooting section for ModuleNotFoundError: No module named 'open_clip' with properly fenced bash code blocks; cross-reference anchor #software-requirements resolves correctly. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[pip install nemo-retriever] --> B{Which extra?}
B -->|default or local| C[pdfium extraction only\nopen_clip not installed]
B -->|nemotron-parse extra| E[open-clip-torch included\nopen_clip available]
B -->|local plus nemotron-parse| F[Local GPU mode\nopen-clip-torch included]
C --> G{Use nemotron_parse?}
G -->|Yes| H[ModuleNotFoundError\nNo module named open_clip]
G -->|No| I[Extraction succeeds]
H --> K[Fix: reinstall with nemotron-parse extra]
E --> J[Nemotron Parse extraction succeeds]
F --> J
K --> J
Reviews (2): Last reviewed commit: "docs(extraction): address PR review for ..."
Quote the combined local+nemotron-parse pip command in the FAQ so shell copy-paste works, and drop the hard-pinned open-clip-torch version from troubleshoot prose.
Summary
Fixes NVBugs 6170950 documentation gap: customers who install nemo-retriever (or nemo-retriever[local]) and run extract_method="nemotron_parse" need open_clip from open-clip-torch, which is provided only by the [nemotron-parse] PyPI extra, not by the default or local install.
This PR is documentation only. The [nemotron-parse] optional dependency is already declared on main in nemo_retriever/pyproject.toml; these edits tell users to install it before using Nemotron Parse PDF extraction.
Changes
- nemotron-parse extra and pdfium vs Nemotron Parse extraction methods.
- pip install "nemo-retriever[nemotron-parse]".
- nemotron_parse.
- ModuleNotFoundError: No module named 'open_clip'.
Out of scope
- main).
- releasenotes.md (deferred).
Test plan
- nemotron-parse (hyphen) in pyproject.toml.
- pip install "nemo-retriever[nemotron-parse]" then verify python -c "import open_clip" succeeds.
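The import check in the test plan can also be run as a small preflight script before selecting extract_method="nemotron_parse". This is a minimal sketch, not part of the PR: the helper names has_module and preflight_message are hypothetical, and the only facts taken from the PR are the module name open_clip and the install command.

```python
import importlib.util

# Install command quoted from the PR description.
INSTALL_HINT = 'pip install "nemo-retriever[nemotron-parse]"'


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


def preflight_message(module: str = "open_clip") -> str:
    """Hypothetical helper: report whether the module is present, with an install hint if not."""
    if has_module(module):
        return f"{module} is importable."
    return f"{module} is missing; install it with: {INSTALL_HINT}"


if __name__ == "__main__":
    print(preflight_message())
```

Using importlib.util.find_spec avoids actually importing open_clip, so the check stays cheap and does not raise ModuleNotFoundError when the extra is absent.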