
feat: implement distributed VECTOR_SEARCH with parallel fragment/index scanning #608

Open
summaryzb wants to merge 1 commit into lance-format:main from summaryzb:dis_vec_search

Conversation

@summaryzb summaryzb commented Jun 10, 2026

Contributor

Summary

Implements distributed execution for the VECTOR_SEARCH table function, enabling Spark-parallel vector similarity search across Lance datasets. When enabled via spark.sql.lance.search.distributed.enabled=true, the driver plans one Spark task per execution unit (an indexed segment or a fallback fragment). Each worker runs a local ANN scan on its indexed segment, or falls back to a flat KNN scan on a fragment that has no index. Results are merged with a global sort on _distance.

This provides horizontal scalability for vector search workloads on large datasets without requiring a centralized vector index server.
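
The plan, scan, and merge flow described in this summary can be illustrated with a small, Spark-free sketch. It is not the code in this PR: the helper names `local_top_k` and `global_top_k`, the `(distance, row_id)` tuple layout, the explicit top-k limit, and the use of Euclidean distance are all assumptions made for illustration. Each "execution unit" is modeled as a plain list of `(row_id, vector)` pairs, and the per-unit scan is shown as a flat KNN scan only.

```python
import heapq
import math


def local_top_k(query, rows, k):
    """Flat KNN over one execution unit (hypothetical helper).

    Returns the k nearest rows as (distance, row_id) tuples, nearest first.
    Euclidean distance is an assumption; the real metric depends on the index.
    """
    scored = ((math.dist(query, vec), row_id) for row_id, vec in rows)
    return heapq.nsmallest(k, scored)


def global_top_k(per_task_results, k):
    """Merge per-task candidates with a single global sort on distance."""
    merged = [hit for hits in per_task_results for hit in hits]
    return sorted(merged)[:k]


# Two hypothetical execution units, each a list of (row_id, vector) pairs.
unit_a = [(0, (0.0, 0.0)), (1, (3.0, 4.0))]
unit_b = [(2, (1.0, 0.0)), (3, (10.0, 10.0))]

query = (0.0, 0.0)
k = 2

# In the distributed path each unit would be scanned by its own task;
# here the "tasks" simply run one after another.
partials = [local_top_k(query, unit, k) for unit in (unit_a, unit_b)]
result = global_top_k(partials, k)
```

Because every task returns its own k best candidates, the global top k is always contained in the union of the partial results, so the final sort only has to look at (number of tasks × k) rows at most.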

Behavior

| Condition | Execution Path |
| --- | --- |
| distributed.enabled=false | Single-partition namespace.queryTable() |
| distributed.enabled=true, has vector index | One task per index segment, plus fallback tasks for unindexed fragments (unless fastSearch=true) |
| distributed.enabled=true, no index | One task per fragment (flat KNN) |
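
The three rows above amount to a small planning decision, sketched below. This is a hedged illustration rather than the planner added by this PR: `plan_tasks`, its parameters, and the task tuples are hypothetical names, and fragments and index segments are represented by plain identifiers.

```python
def plan_tasks(distributed_enabled, index_segments, fragments,
               indexed_fragments, fast_search=False):
    """Return a list of (task_kind, target) tuples (hypothetical planner).

    index_segments:    identifiers of vector index segments (empty if no index)
    fragments:         identifiers of all fragments in the dataset
    indexed_fragments: fragments already covered by the vector index
    """
    if not distributed_enabled:
        # Row 1: a single-partition query, no per-unit tasks.
        return [("single_partition_query", None)]

    if not index_segments:
        # Row 3: no index, so one flat KNN task per fragment.
        return [("flat_knn", f) for f in fragments]

    # Row 2: one ANN task per index segment ...
    tasks = [("ann", s) for s in index_segments]
    if not fast_search:
        # ... plus fallback tasks for fragments the index does not cover.
        tasks += [("flat_knn", f) for f in fragments
                  if f not in indexed_fragments]
    return tasks


example = plan_tasks(
    distributed_enabled=True,
    index_segments=["seg-0", "seg-1"],
    fragments=[0, 1, 2],
    indexed_fragments={0, 1},
)
```

In this sketch, setting `fast_search=True` drops the fallback tasks, so only the indexed segments are searched and rows in unindexed fragments are not considered.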

Notice

Testing

  • BaseSparkDistributedVectorSearchTest exercises fallback-only scenarios
  • Spark 3.4 and 3.5 modules have thin test subclasses

Change-Id: I94c3cd431bcf5ba4bee7906838fa2d7cd4f6769e
@github-actions github-actions Bot added the enhancement New feature or request label Jun 10, 2026
@summaryzb

Contributor Author

A red CI is expected, since this PR relies on lance-format/lance#7169.

@summaryzb

Contributor Author

@jackye1995 @Xuanwo @LuciferYang PTAL

