LanceDB as Default Vector Database Backend #1454
Open
kheiss-uwzoo wants to merge 80 commits into NVIDIA:release/26.1.2 from
updated MIG slice topic per recommended edits from Nicole
Co-authored-by: Julio Perez <37191411+jperez999@users.noreply.github.com> Co-authored-by: Edward Kim <109497216+edknv@users.noreply.github.com> Co-authored-by: Jeremy Dyer <jdye64@gmail.com> Co-authored-by: Kurt Heiss <kheiss@nvidia.com> Co-authored-by: nkmcalli <nmcallister@nvidia.com>
…ss (NVIDIA#1268) Co-authored-by: Jeremy Dyer <jdye64@nvidia.com>
Co-authored-by: nkmcalli <nkmcalli@yahoo.com>
…#1346) Co-authored-by: Jeremy Dyer <jdye64@gmail.com>
Co-authored-by: Jeremy Dyer <jdye64@gmail.com>
Co-authored-by: nkmcalli <nkmcalli@yahoo.com>
Co-authored-by: Edward Kim <109497216+edknv@users.noreply.github.com> Co-authored-by: Charles Blackmon-Luca <20627856+charlesbluca@users.noreply.github.com> Co-authored-by: Julio Perez <jperez@nvidia.com> Co-authored-by: edknv <edwardk@nvidia.com>
…1438) Signed-off-by: Jacob Ioffe <jioffe@nvidia.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Jeremy Dyer <jdye64@gmail.com>
Co-authored-by: Edward Kim <109497216+edknv@users.noreply.github.com>
…ool (NVIDIA#1379) Signed-off-by: Jacob Ioffe <jioffe@nvidia.com>
…`pypdfium2<5.0.0` (NVIDIA#1440)
PR Summary: LanceDB as Default Vector Database Backend
Overview
Documentation updates establish LanceDB as the default vector database backend for NeMo Retriever extraction, with Milvus documented as a fully supported alternative. Content from the LanceDB product brief is integrated into the existing extraction docs in logical places.
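The default-with-override behavior can be illustrated with a minimal sketch. It assumes the VDB_BACKEND and HYBRID environment variables named in this summary are read as plain strings; the accepted spellings ("lancedb", "milvus"), the truthy values for HYBRID, and the helper name resolve_vdb_settings are hypothetical and are not taken from the NeMo Retriever code.

```python
import os

# Assumed backend spellings; only "milvus" appears verbatim in this PR summary.
SUPPORTED_BACKENDS = ("lancedb", "milvus")


def resolve_vdb_settings(env=None):
    """Return the backend and hybrid flag, defaulting to LanceDB.

    Hypothetical helper for illustration only: `env` is any mapping of
    environment variable names to strings (defaults to os.environ).
    """
    env = os.environ if env is None else env

    # LanceDB is used when VDB_BACKEND is unset.
    backend = env.get("VDB_BACKEND", "lancedb").strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported VDB_BACKEND: {backend!r}")

    # Assumed convention: HYBRID is a boolean-like string, off by default.
    hybrid = env.get("HYBRID", "false").strip().lower() in ("1", "true", "yes")

    return {"vdb_backend": backend, "hybrid": hybrid}


if __name__ == "__main__":
    print(resolve_vdb_settings())
```

Passing a mapping instead of reading os.environ directly keeps the sketch easy to test; with no variables set it resolves to LanceDB, and setting VDB_BACKEND to milvus selects the alternative.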
Summary of Changes
1. Data Store Documentation (extraction/data-store.md)
- vdb_upload targets the configured backend.
- LanceDB operator example (uri, table_name, index_type, hybrid).
- Test harness configuration (tools/harness/test_configs.yaml: vdb_backend, hybrid, sparse).
- Environment variables: VDB_BACKEND, HYBRID.
- Milvus remains available by setting vdb_backend: milvus or using the existing Milvus API.
2. Product-Level and Overview Docs
- extraction/overview.md: Vector DB sentence updated to “LanceDB by default, or Milvus”.
- index.md: “Embedding + Indexing” bullet updated to “LanceDB (default) or Milvus”.
- extraction/support-matrix.md: Retrieval bullet updated to “LanceDB (default) or Milvus”.
3. Python API and Environment
- extraction/nv-ingest-python-api.md: vdb_upload table and example caption updated to mention LanceDB as the default and Milvus via milvus_uri.
- extraction/environment-config.md: New “Vector Database (Retrieval) Environment Variables” table covering VDB_BACKEND and HYBRID.
4. Quickstarts
- extraction/quickstart-guide.md: Step 1f and the profile table updated so the default is LanceDB (no --profile retrieval); the retrieval profile is for Milvus. Added a tip that docker compose up without --profile retrieval uses LanceDB. Container/URI note updated for when Milvus is used.
- extraction/quickstart-library-mode.md: Install step now states that the default is LanceDB (no milvus-lite required) and describes the optional Milvus path. The ingestion example uses LanceDB by default (no milvus_uri). The Step 3 retrieval example is described as “when using Milvus”; LanceDB retrieval points to data-store.
5. FAQ and Release Notes
- extraction/faq.md: “Where does it ingest to?” updated for the LanceDB default, with Milvus supported.
- extraction/releasenotes-nv-ingest.md: LanceDB bullet updated to “LanceDB is now the default vector database backend” and linked to Data Upload.
Files Modified
- docs/docs/extraction/data-store.md
- docs/docs/extraction/overview.md
- docs/docs/index.md
- docs/docs/extraction/support-matrix.md
- docs/docs/extraction/nv-ingest-python-api.md: vdb_upload description and example (LanceDB default, Milvus option)
- docs/docs/extraction/environment-config.md: VDB_BACKEND, HYBRID
- docs/docs/extraction/quickstart-guide.md
- docs/docs/extraction/quickstart-library-mode.md
- docs/docs/extraction/faq.md
- docs/docs/extraction/releasenotes-nv-ingest.md
Source
Content drawn from the LanceDB product brief (LanceDB Replaces Milvus as the Default Vector Database Backend), including rationale (format, index, embedded runtime), hybrid search (BM25 + vector, RRF) and recall table, test harness and env configuration, programmatic LanceDB usage, and infrastructure comparison (LanceDB vs Milvus).
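For readers unfamiliar with the RRF step mentioned above, here is a minimal, library-free sketch of Reciprocal Rank Fusion over a BM25 ranking and a vector ranking. It is not the LanceDB implementation; the function name, the document IDs, and the constant k=60 (a commonly used value) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. `k` is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]    # hypothetical keyword (BM25) ranking
    vector_hits = ["b", "d", "a"]  # hypothetical vector-similarity ranking
    print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
    # → ['b', 'a', 'd', 'c']
```

Documents that appear near the top of both lists ("b" and "a" here) outrank documents found by only one retriever, which is the intuition behind combining BM25 and vector search.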
Branch
LanceDB documentation branch.