LanceDB as Default Vector Database Backend #1454

Open
kheiss-uwzoo wants to merge 80 commits into NVIDIA:release/26.1.2 from kheiss-uwzoo:kheiss/lancedb

Conversation

@kheiss-uwzoo
Collaborator

PR Summary: LanceDB as Default Vector Database Backend

Overview

Documentation updates establish LanceDB as the default vector database backend for NeMo Retriever extraction, with Milvus documented as a fully supported alternative. Content from the LanceDB product brief is integrated into the existing extraction docs in logical places.

Summary of Changes

1. Data Store Documentation (extraction/data-store.md)

  • Overview — States LanceDB as default vector DB and Milvus as alternative; clarifies that vdb_upload targets the configured backend.
  • Why LanceDB? — New section: Lance columnar format, IVF_HNSW_SQ index, embedded runtime, and why they improve latency.
  • Upload to LanceDB (default) — New section with:
    • Python LanceDB operator example (uri, table_name, index_type, hybrid).
    • Test harness configuration (tools/harness/test_configs.yaml: vdb_backend, hybrid, sparse).
    • Environment variables: VDB_BACKEND, HYBRID.
  • Hybrid search (LanceDB) — New section: BM25 FTS + vector, RRF, recall table (+0.5%–3.5%), latency note, one-time FTS index build.
  • Infrastructure: LanceDB vs Milvus — New comparison table (runtime, external services, Docker profile, index type, hybrid approach, persistence).
  • Upload to Milvus — Kept as alternative; clarified that you can continue using Milvus by setting vdb_backend: milvus or using the existing Milvus API.
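The hybrid search bullet above names Reciprocal Rank Fusion (RRF) as the way BM25 full-text results and vector results are combined. As a rough illustration of the general technique only, not the NeMo Retriever or LanceDB implementation, a minimal sketch might look like the following. The function name, the document ids, and the constant k=60 (a commonly used default for RRF) are assumptions made for this example and do not come from the PR.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Hypothetical helper for illustration. Each id scores
    sum(1 / (k + rank)) over every list it appears in, with rank
    starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up result lists standing in for a BM25 query and a vector query.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists (here doc_b and doc_a) rise above documents that appear in only one, which is the intuition behind the recall gain the hybrid section reports.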

2. Product-Level and Overview Docs

  • extraction/overview.md — Vector DB sentence updated to “LanceDB by default, or Milvus”.
  • index.md — “Embedding + Indexing” bullet updated to “LanceDB (default) or Milvus”.
  • extraction/support-matrix.md — Retrieval bullet updated to “LanceDB (default) or Milvus”.

3. Python API and Environment

  • extraction/nv-ingest-python-api.md — vdb_upload table and example caption updated to mention LanceDB as default and Milvus via milvus_uri.
  • extraction/environment-config.md — New “Vector Database (Retrieval) Environment Variables” table: VDB_BACKEND, HYBRID.
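To make the two environment variables concrete, here is a hedged sketch of how a script could read VDB_BACKEND and HYBRID. Only the variable names, the LanceDB default, and the value milvus appear in this PR. The helper name, the literal value "lancedb", the accepted boolean spellings, and the assumption that hybrid search is off by default are all illustrative guesses, so check the environment-config page for the actual behavior.

```python
import os

# Assumed set of valid backends: "milvus" appears in the PR text,
# "lancedb" is a guess at the spelling of the default value.
ASSUMED_BACKENDS = {"lancedb", "milvus"}
# Assumed truthy spellings for HYBRID; the real parser may differ.
ASSUMED_TRUTHY = {"1", "true", "yes", "on"}


def resolve_vdb_settings(env=None):
    """Hypothetical helper: read VDB_BACKEND and HYBRID from a mapping."""
    env = os.environ if env is None else env
    backend = env.get("VDB_BACKEND", "lancedb").strip().lower()
    if backend not in ASSUMED_BACKENDS:
        raise ValueError(f"Unsupported VDB_BACKEND: {backend!r}")
    # Assumption: hybrid search is disabled unless explicitly enabled.
    hybrid = env.get("HYBRID", "false").strip().lower() in ASSUMED_TRUTHY
    return {"vdb_backend": backend, "hybrid": hybrid}


# Passing a plain dict keeps the example independent of the real environment.
print(resolve_vdb_settings({}))
print(resolve_vdb_settings({"VDB_BACKEND": "milvus", "HYBRID": "true"}))
```

Validating the backend name up front is a design choice in this sketch: a mistyped value fails immediately instead of silently falling back to the default.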

4. Quickstarts

  • extraction/quickstart-guide.md — Step 1f and profile table updated so default is LanceDB (no --profile retrieval); retrieval profile is for Milvus. Added tip that docker compose up without --profile retrieval uses LanceDB. Container/URI note updated for when Milvus is used.
  • extraction/quickstart-library-mode.md — Install step: default is LanceDB (no milvus-lite required); optional Milvus path described. Ingestion example uses LanceDB by default (no milvus_uri). Step 3 retrieval example described as “when using Milvus”; LanceDB retrieval pointed to data-store.

5. FAQ and Release Notes

  • extraction/faq.md — “Where does it ingest to?” updated for LanceDB default and Milvus supported.
  • extraction/releasenotes-nv-ingest.md — LanceDB bullet updated to “LanceDB is now the default vector database backend” and linked to Data Upload.

Files Modified

| File | Changes |
| --- | --- |
| docs/docs/extraction/data-store.md | LanceDB as default, “Why LanceDB?”, LanceDB upload + API + harness, hybrid search, infrastructure table, Milvus as alternative |
| docs/docs/extraction/overview.md | Vector DB sentence (LanceDB default, Milvus) |
| docs/docs/index.md | Embedding + indexing bullet (LanceDB default or Milvus) |
| docs/docs/extraction/support-matrix.md | Retrieval bullet (LanceDB default or Milvus) |
| docs/docs/extraction/nv-ingest-python-api.md | vdb_upload description and example (LanceDB default, Milvus option) |
| docs/docs/extraction/environment-config.md | VDB env vars: VDB_BACKEND, HYBRID |
| docs/docs/extraction/quickstart-guide.md | Default = LanceDB; retrieval profile for Milvus; container/URI note |
| docs/docs/extraction/quickstart-library-mode.md | Default = LanceDB; optional Milvus; ingestion/retrieval examples |
| docs/docs/extraction/faq.md | Ingest destination (LanceDB default, Milvus) |
| docs/docs/extraction/releasenotes-nv-ingest.md | LanceDB as default backend, link to Data Upload |

Source

Content drawn from the LanceDB product brief (LanceDB Replaces Milvus as the Default Vector Database Backend), including rationale (format, index, embedded runtime), hybrid search (BM25 + vector, RRF) and recall table, test harness and env configuration, programmatic LanceDB usage, and infrastructure comparison (LanceDB vs Milvus).

Branch

LanceDB documentation branch.

jdye64 and others added 30 commits January 20, 2026 15:07
updated MIG slice topic per recommended edits from Nicole
Co-authored-by: Julio Perez <37191411+jperez999@users.noreply.github.com>
Co-authored-by: Edward Kim <109497216+edknv@users.noreply.github.com>
Co-authored-by: Jeremy Dyer <jdye64@gmail.com>
Co-authored-by: Kurt Heiss <kheiss@nvidia.com>
Co-authored-by: nkmcalli <nmcallister@nvidia.com>
…ss (NVIDIA#1268)

Co-authored-by: Jeremy Dyer <jdye64@nvidia.com>
Co-authored-by: nkmcalli <nkmcalli@yahoo.com>
Co-authored-by: Jeremy Dyer <jdye64@gmail.com>
Co-authored-by: nkmcalli <nkmcalli@yahoo.com>
jdye64 and others added 26 commits February 25, 2026 13:57
Co-authored-by: Edward Kim <109497216+edknv@users.noreply.github.com>
Co-authored-by: Charles Blackmon-Luca <20627856+charlesbluca@users.noreply.github.com>
Co-authored-by: Julio Perez <jperez@nvidia.com>
Co-authored-by: edknv <edwardk@nvidia.com>
…1438)

Signed-off-by: Jacob Ioffe <jioffe@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Edward Kim <109497216+edknv@users.noreply.github.com>
…ool (NVIDIA#1379)

Signed-off-by: Jacob Ioffe <jioffe@nvidia.com>
@kheiss-uwzoo kheiss-uwzoo requested a review from a team as a code owner February 27, 2026 23:20
@jperez999 jperez999 changed the base branch from release/26.1.2 to main March 2, 2026 17:46
@jperez999 jperez999 changed the base branch from main to release/26.1.2 March 2, 2026 17:46
8 participants