NVIDIA · kheiss-uwzoo · May 22, 2026 · May 22, 2026
@@ -29,6 +29,7 @@ For more information, refer to [Extract Captions from Images](nemo-retriever-api
 
 For scanned documents, or documents with complex layouts, 
 you can use [nemotron-parse](https://build.nvidia.com/nvidia/nemotron-parse) as an alternate PDF extraction method by setting `extract_method="nemotron_parse"`. 
+Install the Nemotron Parse client dependencies first with `pip install "nemo-retriever[nemotron-parse]"`. If you also run models locally on your GPU, combine the extras: `pip install "nemo-retriever[local,nemotron-parse]"`.
 For more information, refer to [Nemotron Parse](https://build.nvidia.com/nvidia/nemotron-parse).
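
As a minimal sketch, the choice above can be written as a small helper that selects `nemotron_parse` only for scanned or complex-layout PDFs and fails fast when the client dependencies are missing. The helper names `nemotron_parse_deps_installed` and `choose_pdf_extract_method` are hypothetical. Treating `"pdfium"` as the `extract_method` value for ordinary PDFs is also an assumption.

```python
import importlib.util


def nemotron_parse_deps_installed() -> bool:
    """Return True if the open_clip module (from open-clip-torch) is importable."""
    return importlib.util.find_spec("open_clip") is not None


def choose_pdf_extract_method(scanned_or_complex_layout: bool) -> str:
    """Hypothetical helper: pick an extract_method value for one PDF."""
    if not scanned_or_complex_layout:
        # Assumption: "pdfium" is the extract_method value for ordinary PDFs.
        return "pdfium"
    if not nemotron_parse_deps_installed():
        # Fail before extraction starts instead of midway through a job.
        raise ImportError(
            "extract_method='nemotron_parse' needs the open_clip module; "
            'install it with: pip install "nemo-retriever[nemotron-parse]"'
        )
    return "nemotron_parse"


if __name__ == "__main__":
    print(choose_pdf_extract_method(scanned_or_complex_layout=False))
```

Raising an error instead of silently falling back to `pdfium` is a design choice. It surfaces the missing dependency early and avoids quietly changing the extraction method for documents you chose to send through Nemotron Parse.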
 
 ## Why are the environment variables different between library mode and self-hosted mode?

@@ -15,6 +15,7 @@ NeMo Retriever Library does the following:
 
 - Accept directories of input files and a series of configurable ingestion tasks to perform on that input
 - Allow the extracted content to be retrieved from a VDB containing discrete metadata elements
+- Support multiple extraction methods per document type to balance throughput and accuracy; for example, PDFs can use **pdfium** or [Nemotron Parse](https://build.nvidia.com/nvidia/nemotron-parse) (`extract_method="nemotron_parse"`)
 - Support various types of pre- and post-processing operations, including text splitting and chunking, transformation and filtering, embedding generation, and image offloading to storage.
 
 !!! note

@@ -13,6 +13,11 @@ Before you begin using [NeMo Retriever Library](overview.md), confirm your softw
   `ffmpeg-python` and `nemo-retriever[multimedia]` do not install these binaries.
   On Helm with package-repo access, set `service.installFfmpeg=true`. For
   air-gapped clusters, see [Air-gapped and disconnected deployment](deployment-options.md#air-gapped-deployment).
+- For PDF extraction with `extract_method="nemotron_parse"`, install the Nemotron Parse
+  client dependencies with `pip install "nemo-retriever[nemotron-parse]"`. This extra
+  installs `open-clip-torch`, which provides the `open_clip` module that the Nemotron Parse
+  NIM client requires. Neither the base `nemo-retriever` install nor the `[local]` extra
+  includes this package.
 
 !!! note
 

@@ -100,6 +100,30 @@ You can set the variable in your .env file or directly in your environment.
 
 
 
+## ModuleNotFoundError: No module named open_clip when using nemotron_parse { #modulenotfounderror-no-module-named-open-clip-when-using-nemotron-parse }
+
+When you run PDF extraction with `extract_method="nemotron_parse"`, you might see an error similar to the following:
+
+```text
+ModuleNotFoundError: No module named 'open_clip'
+```
+
+The Nemotron Parse NIM client requires the `open_clip` Python module, provided by `open-clip-torch`. That package is not part of the default `nemo-retriever` install or the `[local]` extra.
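
To confirm that this is the cause, you can run a short check like the following. It is a sketch that uses only the Python standard library. It reports whether the `open_clip` module is importable and which `open-clip-torch` version, if any, is installed in the active environment. Run it with the same interpreter that runs your extraction job. The function name `diagnose_open_clip` is hypothetical.

```python
import importlib.metadata
import importlib.util
import sys


def diagnose_open_clip() -> dict:
    """Hypothetical check: is open_clip importable in this interpreter?"""
    try:
        # Distribution name on PyPI; provides the open_clip module.
        dist_version = importlib.metadata.version("open-clip-torch")
    except importlib.metadata.PackageNotFoundError:
        dist_version = None
    return {
        "python": sys.executable,
        "open_clip_importable": importlib.util.find_spec("open_clip") is not None,
        "open_clip_torch_version": dist_version,
    }


if __name__ == "__main__":
    report = diagnose_open_clip()
    for key, value in report.items():
        print(f"{key}: {value}")
    if not report["open_clip_importable"]:
        print('Fix: pip install "nemo-retriever[nemotron-parse]"')
```

If the module is reported as missing even though you already ran `pip install`, compare the reported `python` path with the environment you installed into. Installing into a different virtual environment is one possible cause.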
+
+Install the dedicated PyPI extra before running Nemotron Parse extraction:
+
+```bash
+pip install "nemo-retriever[nemotron-parse]"
+```
+
+For local GPU inference with Nemotron Parse, combine extras:
+
+```bash
+pip install "nemo-retriever[local,nemotron-parse]"
+```
+
+See also [What is NeMo Retriever Library?](overview.md) and [Pre-Requisites & Support Matrix](prerequisites-support-matrix.md#software-requirements).
+
 ## Extract method nemotron-parse doesn't support image files
 
 Currently, extraction with Nemotron Parse doesn't support image files, only scanned PDFs.
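
Until image files are supported, one way to avoid this failure is to validate inputs before you submit them with `extract_method="nemotron_parse"`. The following minimal sketch accepts only `.pdf` paths for Nemotron Parse. The function name and the suffix-based check are assumptions. Checking the file extension does not verify that a file is actually a scanned PDF.

```python
from __future__ import annotations

from pathlib import Path


def validate_nemotron_parse_inputs(paths: list[str]) -> list[Path]:
    """Hypothetical pre-flight check: Nemotron Parse extraction accepts PDFs only."""
    pdfs: list[Path] = []
    rejected: list[str] = []
    for raw in paths:
        path = Path(raw)
        # Compare case-insensitively so "SCAN.PDF" is accepted.
        if path.suffix.lower() == ".pdf":
            pdfs.append(path)
        else:
            rejected.append(raw)
    if rejected:
        raise ValueError(
            "extract_method='nemotron_parse' supports PDFs only; "
            f"route these files to another extraction method: {rejected}"
        )
    return pdfs
```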