docs: sync main extraction docs with 26.05 #2178

Closed
kheiss-uwzoo wants to merge 2 commits into NVIDIA:main from kheiss-uwzoo:docs/sync-main-docs-with-26.05

Conversation

@kheiss-uwzoo
Collaborator

Summary

  • Confirmed 14 documentation files under docs/ differed between main and 26.05.
  • Synced main to match the 26.05 documentation content exactly (verified with git diff upstream/26.05 -- docs/ showing no remaining differences).
  • Changes include release notes updates, support matrix cleanup, custom metadata doc refresh, notebooks page path consolidation (notebooks/index.md → notebooks.md), and mkdocs.yml nav updates.

Files changed

  • docs/docs/extraction/audio-video.md
  • docs/docs/extraction/concepts.md
  • docs/docs/extraction/custom-metadata.md
  • docs/docs/extraction/deployment-options.md
  • docs/docs/extraction/faq.md
  • docs/docs/extraction/getting-started-about.md
  • docs/docs/extraction/integrations-langchain-llamaindex-haystack.md
  • docs/docs/extraction/multimodal-extraction.md
  • docs/docs/extraction/notebooks.md (renamed from notebooks/index.md)
  • docs/docs/extraction/overview.md
  • docs/docs/extraction/prerequisites-support-matrix.md
  • docs/docs/extraction/releasenotes.md
  • docs/docs/extraction/troubleshoot.md
  • docs/mkdocs.yml

Test plan

  • Verify git diff upstream/26.05 -- docs/ is empty on this branch
  • Confirm MkDocs site builds without nav/link errors
  • Spot-check release notes and support matrix pages render correctly
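
The first test-plan item can be automated. The following is a minimal sketch, assuming it runs from the repository root and that a remote named `upstream` has a `26.05` branch fetched locally; the helper name `docs_match_ref` and its injectable `run` parameter are hypothetical and exist only to make the check testable.

```python
import subprocess
from typing import Callable


def docs_match_ref(
    ref: str = "upstream/26.05",
    path: str = "docs/",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True when `git diff <ref> -- <path>` reports no differences.

    `git diff --quiet` implies `--exit-code`: exit status 0 means no
    differences, 1 means differences exist. Any other status is treated
    as a git error (for example, an unknown ref).
    """
    cmd = ["git", "diff", "--quiet", ref, "--", path]
    completed = run(cmd, check=False)
    if completed.returncode == 0:
        return True
    if completed.returncode == 1:
        return False
    raise RuntimeError(f"git diff failed with exit code {completed.returncode}")
```

Calling `docs_match_ref()` on the PR branch should return `True` if the sync described above is complete; a `False` result means at least one file under `docs/` still differs from the reference branch.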

@kheiss-uwzoo kheiss-uwzoo requested review from a team as code owners May 30, 2026 13:30
@kheiss-uwzoo kheiss-uwzoo requested a review from jioffe502 May 30, 2026 13:30
@kheiss-uwzoo kheiss-uwzoo added the doc Improvements or additions to documentation label May 30, 2026
@greptile-apps
Contributor

greptile-apps Bot commented May 30, 2026

Greptile Summary

This PR syncs 14 documentation files under docs/ from the 26.05 release branch to main, covering release notes, the support matrix, the notebooks page rename, and mkdocs.yml nav/redirect updates.

  • Notebooks page rename: notebooks/index.md → notebooks.md; all internal cross-links and the mkdocs.yml nav entry updated consistently. The stale reverse-redirect is correctly removed.
  • Support matrix & release notes: CUDA/Driver requirements and the default OCR NIM updated to 26.05 values; Nemotron Parse and Omni chart-captioning callouts removed across prerequisites-support-matrix.md, multimodal-extraction.md, faq.md, and troubleshoot.md.
  • custom-metadata.md: Page rewritten with a simpler intro, but the new ingestion code example omits the three metadata sidecar parameters from .vdb_upload(), so users who copy it will ingest without metadata attached.

Confidence Score: 3/5

Safe to merge after addressing the incomplete metadata ingestion example and verifying the 26.03 docs URL.

The custom-metadata.md rewrite creates a code example that appears to demonstrate metadata ingestion but silently omits the required sidecar parameters, leaving users with a working-looking snippet that does nothing. The 26.03 release notes URL was changed to a format that does not match the versioning convention used by every other link on the same page and may resolve to a 404.

docs/docs/extraction/custom-metadata.md (incomplete vdb_upload example) and docs/docs/extraction/releasenotes.md (26.03 URL format mismatch).

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/docs/extraction/custom-metadata.md | Page restructured and significantly shortened; the new vdb_upload code example is missing the three metadata sidecar parameters, so it does not actually attach metadata. |
| docs/mkdocs.yml | Nav updated for notebooks.md rename, stale redirect removed; exclude_docs guard weakened from /index.md to index.md with the explanatory comment deleted. |
| docs/docs/extraction/releasenotes.md | 26.05 section condensed to a release-line summary; 26.03 section added inline; the 26.03 versioned docs URL format changed from 26.3.0 to 26.03, inconsistent with all other version links. |
| docs/docs/extraction/prerequisites-support-matrix.md | CUDA/Driver requirements updated to 12.2/535; OCR NIM updated to nemotron-ocr-v2; B200 limitation note and Nemotron Parse install prerequisites removed in line with 26.05. |
| docs/docs/extraction/notebooks.md | Renamed from notebooks/index.md; relative link to overview.md corrected; Related Topics section added. |
| docs/docs/extraction/audio-video.md | GPU pinning note converted to a proper !!! important admonition block; code example formatting corrected. |
| docs/docs/extraction/troubleshoot.md | Removed the open_clip / nemotron-parse troubleshooting entry, consistent with removing the [nemotron-parse] extra from prerequisites. |
| docs/docs/extraction/multimodal-extraction.md | Removed the Omni caption-scope callout for chart regions and simplified the OCR engine description to default to nemotron-ocr-v2. |
| docs/docs/extraction/faq.md | Removed the FAQ entry about PDF chart captioning with Omni, consistent with the multimodal-extraction.md cleanup. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["notebooks/index.md (old)"] -->|renamed to| B["notebooks.md"]
    B --> C["mkdocs.yml nav updated"]
    A -->|reverse redirect removed| D["redirect deleted"]
    C --> E["exclude_docs: /index.md changed to index.md"]

    F["prerequisites-support-matrix.md"] -->|CUDA 12.2 / Driver 535| G["Updated requirements"]
    F -->|nemotron-ocr-v1 to v2| H["OCR NIM updated"]
    F -->|Nemotron Parse removed| I["Affects troubleshoot.md, faq.md, multimodal-extraction.md"]

    J["releasenotes.md"] -->|26.05 condensed| K["26.03 section added inline"]
    K -->|URL 26.3.0 changed to 26.03| L["Possible broken link"]

    M["custom-metadata.md"] -->|Rewritten| N["vdb_upload example"]
    N -->|Missing meta_dataframe, meta_source_field, meta_fields| O["Metadata not attached"]

Comments Outside Diff (1)

  1. docs/docs/extraction/custom-metadata.md, line 62-92 (link)

    P1 Metadata parameters missing from the vdb_upload call

    The example creates meta_df and writes it to meta_file.csv, but the .vdb_upload(...) call on line 83–88 does not pass any of the three required metadata parameters (meta_dataframe, meta_source_field, meta_fields). As written, a user who copies this example will ingest documents without any custom metadata attached — the opposite of what the page advertises. The prose on line 92 says "merge values from meta_df (or file_path) into each document's content_metadata before vdb_upload" but provides no code showing how to do so.
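
The merge step that the page's prose describes but does not show can be sketched in plain Python. This is an illustration only, not the library's API: the document record shape (a dict with `source` and `content_metadata` keys), the CSV column names, and the helpers `load_sidecar` and `attach_metadata` are all assumptions made for this example, and the sketch deliberately does not call `vdb_upload`.

```python
import csv
import io
from typing import Dict, Iterable, List


def load_sidecar(csv_text: str, source_field: str) -> Dict[str, Dict[str, str]]:
    """Index sidecar rows by the column that identifies the source document."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {row[source_field]: row for row in rows}


def attach_metadata(
    documents: Iterable[dict],
    sidecar: Dict[str, Dict[str, str]],
    fields: List[str],
) -> List[dict]:
    """Return copies of the documents with selected sidecar fields merged in.

    Documents without a matching sidecar row are returned unchanged.
    """
    merged = []
    for doc in documents:
        row = sidecar.get(doc["source"], {})
        extra = {name: row[name] for name in fields if name in row}
        content_metadata = {**doc.get("content_metadata", {}), **extra}
        merged.append({**doc, "content_metadata": content_metadata})
    return merged


# Hypothetical sidecar file contents and document records.
SIDECAR_CSV = "source,category,department\nreport.pdf,finance,accounting\n"
docs = [
    {"source": "report.pdf", "content_metadata": {"type": "text"}},
    {"source": "other.pdf", "content_metadata": {"type": "text"}},
]

sidecar = load_sidecar(SIDECAR_CSV, source_field="source")
enriched = attach_metadata(docs, sidecar, fields=["category", "department"])
```

In this sketch `source_field` and `fields` play roughly the role that the review attributes to `meta_source_field` and `meta_fields`; whether the real upload call accepts pre-merged records or expects the sidecar parameters directly should be confirmed against the library documentation.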

Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
docs/docs/extraction/custom-metadata.md:62-92
**Metadata parameters missing from the `vdb_upload` call**

The example creates `meta_df` and writes it to `meta_file.csv`, but the `.vdb_upload(...)` call on line 83–88 does not pass any of the three required metadata parameters (`meta_dataframe`, `meta_source_field`, `meta_fields`). As written, a user who copies this example will ingest documents **without** any custom metadata attached — the opposite of what the page advertises. The prose on line 92 says "merge values from `meta_df` (or `file_path`) into each document's `content_metadata` before `vdb_upload`" but provides no code showing how to do so.

### Issue 2 of 3
docs/mkdocs.yml:209-212
The `exclude_docs` pattern was changed from `/index.md` to `index.md`, and the comment explaining the distinction was removed. In gitignore syntax, `index.md` (no leading slash) matches every `index.md` at any directory depth, while `/index.md` restricts the match to the docs root only. With `notebooks/index.md` now renamed there is currently only one `index.md` in the tree so the behavior is equivalent — but silently dropping the guard means any future `index.md` added in a subdirectory will be excluded without warning.

```suggestion
# MkDocs 1.6+: exclude suite landing and legacy duplicate pages (still in repo for parity).
# extraction/chunking.md — removed from nav; content is under concepts.md (redirect_maps keeps old URLs).
# Use /index.md (docs root only); bare index.md would exclude every index.md in any subdirectory.
exclude_docs: |
  /index.md
```

### Issue 3 of 3
docs/docs/extraction/releasenotes.md:57
The 26.03 versioned docs URL was changed to `26.03` format, but every other link in this table uses the `YY.M.P` semver-style path (e.g., `25.9.0`, `26.1.2`, `26.1.1`). The old value `26.3.0` was consistent with that pattern; `26.03` is not and may resolve to a 404 on the NVIDIA docs CDN.

```suggestion
| [26.03](https://docs.nvidia.com/nemo/retriever/26.3.0/extraction/releasenotes/)
```

Reviews (1): Last reviewed commit: "Update faq.md"

Comment thread docs/mkdocs.yml
Comment on lines 209 to +212
 # MkDocs 1.6+: exclude suite landing and legacy duplicate pages (still in repo for parity).
 # extraction/chunking.md — removed from nav; content is under concepts.md (redirect_maps keeps old URLs).
-# Use /index.md (docs root only); bare index.md would exclude every index.md (e.g. extraction/notebooks/index.md).
 exclude_docs: |
-  /index.md
+  index.md

P2 The exclude_docs pattern was changed from /index.md to index.md, and the comment explaining the distinction was removed. In gitignore syntax, index.md (no leading slash) matches every index.md at any directory depth, while /index.md restricts the match to the docs root only. With notebooks/index.md now renamed there is currently only one index.md in the tree so the behavior is equivalent — but silently dropping the guard means any future index.md added in a subdirectory will be excluded without warning.
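
The anchoring difference described in this comment can be illustrated with a small matcher. This is a deliberately simplified sketch, not the matcher MkDocs uses: it handles only literal file patterns (no wildcards, negation, or trailing-slash directory patterns), and the function name `is_excluded` is hypothetical.

```python
from pathlib import PurePosixPath


def is_excluded(pattern: str, rel_path: str) -> bool:
    """Apply one literal gitignore-style pattern to a docs-relative path.

    A pattern containing a slash is anchored to the docs root, so it must
    equal the whole relative path. A bare file name matches that name in
    any directory.
    """
    path = PurePosixPath(rel_path)
    if "/" in pattern:
        return path == PurePosixPath(pattern.lstrip("/"))
    return path.name == pattern


# The anchored form only excludes the root landing page.
root_only = is_excluded("/index.md", "extraction/notebooks/index.md")

# The bare form also excludes any nested index.md.
any_depth = is_excluded("index.md", "extraction/notebooks/index.md")
```

Here `root_only` is `False` and `any_depth` is `True`, which is the behavior gap the review warns about for any `index.md` added to a subdirectory later.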

Suggested change
-# MkDocs 1.6+: exclude suite landing and legacy duplicate pages (still in repo for parity).
-# extraction/chunking.md — removed from nav; content is under concepts.md (redirect_maps keeps old URLs).
-exclude_docs: |
-  index.md
+# MkDocs 1.6+: exclude suite landing and legacy duplicate pages (still in repo for parity).
+# extraction/chunking.md — removed from nav; content is under concepts.md (redirect_maps keeps old URLs).
+# Use /index.md (docs root only); bare index.md would exclude every index.md in any subdirectory.
+exclude_docs: |
+  /index.md

## Release Notes for Previous Versions

-| [26.03](https://docs.nvidia.com/nemo/retriever/26.3.0/extraction/releasenotes/)
+| [26.03](https://docs.nvidia.com/nemo/retriever/26.03/extraction/releasenotes/)

P1 The 26.03 versioned docs URL was changed to 26.03 format, but every other link in this table uses the YY.M.P semver-style path (e.g., 25.9.0, 26.1.2, 26.1.1). The old value 26.3.0 was consistent with that pattern; 26.03 is not and may resolve to a 404 on the NVIDIA docs CDN.
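
A consistency check like the one this comment performs by eye can be sketched with a regular expression. The pattern below encodes the `YY.M.P` convention as the review describes it (two-digit year, unpadded month, patch number), which is an assumption about the site's URL scheme; the helper `nonconforming_versions` is hypothetical, and the check says nothing about whether a URL actually resolves.

```python
import re
from typing import Iterable, List

# Captures the path segment that follows /nemo/retriever/ in a docs URL.
VERSION_SEGMENT = re.compile(r"/nemo/retriever/(?P<version>[^/]+)/")

# YY.M.P: two-digit year, month without zero padding, numeric patch.
YY_M_P = re.compile(r"\d{2}\.[1-9]\d?\.\d+")


def nonconforming_versions(urls: Iterable[str]) -> List[str]:
    """Return the URLs whose version segment does not follow YY.M.P."""
    flagged = []
    for url in urls:
        match = VERSION_SEGMENT.search(url)
        if match is None or not YY_M_P.fullmatch(match.group("version")):
            flagged.append(url)
    return flagged


# The two candidate URLs for the 26.03 row, taken from the diff.
OLD_URL = "https://docs.nvidia.com/nemo/retriever/26.3.0/extraction/releasenotes/"
NEW_URL = "https://docs.nvidia.com/nemo/retriever/26.03/extraction/releasenotes/"

flagged = nonconforming_versions([OLD_URL, NEW_URL])
```

With these inputs only `NEW_URL` is flagged, matching the review's observation that `26.03` departs from the pattern used by the other rows.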

Suggested change
-| [26.03](https://docs.nvidia.com/nemo/retriever/26.03/extraction/releasenotes/)
+| [26.03](https://docs.nvidia.com/nemo/retriever/26.3.0/extraction/releasenotes/)

@kheiss-uwzoo
Collaborator Author

Closing: sync direction was incorrect. main/docs/docs already has the authoritative GA content (especially releasenotes.md). Copying from 26.05 regressed main. Follow-up PR will sync 26.05 to match main instead.
