feat: add fts support by egolearner · Pull Request #408 · alibaba/zvec

egolearner · 2026-05-15T09:17:45Z

address #397

Move the per-query filter check from the column-reader loop into the Disjunction/Conjunction/Phrase iterators so filtered docs no longer pay for block-max binary search, do_next alignment, or phase-2 position verification ($POS CF reads). TermDocIterator inherits the base-class default and stays unchanged.
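A minimal Python sketch of the pushdown idea follows. It assumes hypothetical `ListIterator` / `FilteredConjunction` classes and a `NO_MORE_DOCS` sentinel; none of these are zvec's actual C++ types. The AND iterator first aligns its children on a common doc. It then consults the cheap pushed-down filter before running the expensive verification step, which here stands in for phrase phase-2 position checks. As a result, filtered docs never reach that step.

```python
from typing import Callable, List

NO_MORE_DOCS = 2**31 - 1  # hypothetical sentinel for an exhausted iterator


class ListIterator:
    """Hypothetical posting iterator over ascending doc ids."""

    def __init__(self, docs: List[int]):
        self.docs = docs
        self.pos = -1
        self.doc = -1

    def advance(self, target: int) -> int:
        # Linear scan keeps the sketch short; real postings skip by block.
        while self.doc < target:
            self.pos += 1
            self.doc = self.docs[self.pos] if self.pos < len(self.docs) else NO_MORE_DOCS
        return self.doc


class FilteredConjunction:
    """AND of sub-iterators that checks the filter before phase-2 verification."""

    def __init__(self, subs, is_filtered: Callable[[int], bool], verify: Callable[[int], bool]):
        self.subs = subs
        self.is_filtered = is_filtered  # cheap per-doc check (pushed-down filter)
        self.verify = verify            # expensive check, e.g. phrase positions
        self.verify_calls = 0
        self.doc = -1

    def next(self) -> int:
        target = self.doc + 1
        while True:
            # Advance every child to at least the running maximum doc id.
            doc = target
            for it in self.subs:
                doc = max(doc, it.advance(doc))
            if doc == NO_MORE_DOCS:
                self.doc = NO_MORE_DOCS
                return self.doc
            if all(it.doc == doc for it in self.subs):
                # Filtered docs are rejected here, before any costly verification.
                if not self.is_filtered(doc):
                    self.verify_calls += 1
                    if self.verify(doc):
                        self.doc = doc
                        return doc
                target = doc + 1
            else:
                target = doc  # children not yet aligned; realign on the max
```

With doc 5 filtered out, the common docs {3, 5, 9} yield hits 3 and 9. Only those two docs pay for verification.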

block_max_info_for() now returns {score, last_doc} in one binary search (with a small cache), so the standalone current_block_max_score(), skip_to_next_block(), block_max_score_for() and block_max_last_doc_for() methods have no live callers. Remove them from BitPackedPostingIterator, the DocIterator base, and TermDocIterator, along with the now-dead current_block_max_score_ member and its decode_block assignment. Tests adjusted to query via block_max_info_for().
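The following Python sketch shows why one combined lookup makes the separate score and last-doc accessors redundant. It assumes block metadata stored as parallel arrays of per-block last doc ids and max scores. That layout, and the `BlockMaxTable` / `BlockMaxInfo` names, are hypothetical and not necessarily what BitPackedPostingIterator uses. A single `bisect` resolves both fields. A one-entry cache skips the search when consecutive lookups land in the same block.

```python
import bisect
from typing import List, NamedTuple

NO_MORE_DOCS = 2**31 - 1  # hypothetical "past the last block" marker


class BlockMaxInfo(NamedTuple):
    score: float   # upper bound on the score of any doc in the block
    last_doc: int  # last doc id covered by the block


class BlockMaxTable:
    """Hypothetical per-term block-max metadata with a one-entry lookup cache."""

    def __init__(self, last_docs: List[int], max_scores: List[float]):
        assert len(last_docs) == len(max_scores)
        self.last_docs = last_docs  # ascending last doc id of each block
        self.max_scores = max_scores
        self._cached_block = -1     # index of the most recently resolved block
        self.searches = 0           # binary searches actually performed

    def block_max_info_for(self, doc: int) -> BlockMaxInfo:
        i = self._cached_block
        # Cache hit: doc falls inside the previously resolved block.
        if i >= 0 and doc <= self.last_docs[i] and (i == 0 or doc > self.last_docs[i - 1]):
            return BlockMaxInfo(self.max_scores[i], self.last_docs[i])
        # One binary search yields both the score bound and the block boundary.
        self.searches += 1
        i = bisect.bisect_left(self.last_docs, doc)
        if i == len(self.last_docs):
            return BlockMaxInfo(0.0, NO_MORE_DOCS)
        self._cached_block = i
        return BlockMaxInfo(self.max_scores[i], self.last_docs[i])
```

Repeated lookups for docs 10 and 100 both resolve to the first block (ending at 127) with one search. A lookup for doc 200 triggers a second search.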

When an invert filter is highly selective relative to the FTS posting size, posting-driven evaluation walks far more docs than necessary. Mirror the existing vector_recall pattern: when the invert match_count is below fts_brute_force_by_keys_ratio * doc_count, extract the small id set and AND it into the FTS root via a new CandidateDocIterator. The candidate iterator becomes the lead by cost. This turns the posting walk into per-candidate advance() + matches() + score() and fully reuses the existing AND / filter-pushdown / BM25 machinery.

- new CandidateDocIterator: ascending segment-local ids, lower_bound advance, zero score contribution
- FtsColumnIndexer::search wraps root_iter in a Conjunction when FtsQueryParams.candidate_ids is non-empty
- new GlobalConfig::fts_brute_force_by_keys_ratio, wired through the C API + Python binding. The default is 0.05, independent of the vector knob because per-candidate FTS cost is higher due to phrase phase-2 IO.
- DocFilter::get_bf_by_keys_and_update now takes an explicit ratio so the two callers (vector vs FTS) pick the right knob; on the brute-force branch, invert_filter_ is cleared so DocFilter never re-checks the same ids
- 9 iterator unit tests + 7 reader equivalence tests (Term / OR / AND / Phrase / Nested, coexistence with IndexFilter, empty-candidate fallback) + config default / validation asserts
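Below is a Python sketch of the brute-force branch. `CandidateDocIterator` here mirrors only the behavior described above: ascending ids, lower_bound advance and zero score. `candidate_ids_for_fts` and `lead_driven_and` are hypothetical helpers. The first stands in for the ratio check in DocFilter; the second stands in for the Conjunction's lead-driven loop. This is an illustration under those assumptions, not zvec's actual code.

```python
import bisect
from typing import List, Optional

NO_MORE_DOCS = 2**31 - 1  # hypothetical end-of-iteration sentinel


class CandidateDocIterator:
    """Sketch: iterate ascending segment-local ids with lower_bound advance."""

    def __init__(self, ids: List[int]):
        self.ids = sorted(set(ids))
        self.pos = -1
        self.doc = -1

    def cost(self) -> int:
        return len(self.ids)  # small cost lets it lead an AND

    def advance(self, target: int) -> int:
        # lower_bound: first candidate id >= target, starting at the current position.
        self.pos = bisect.bisect_left(self.ids, target, max(self.pos, 0))
        self.doc = self.ids[self.pos] if self.pos < len(self.ids) else NO_MORE_DOCS
        return self.doc

    def score(self) -> float:
        return 0.0  # contributes nothing to the BM25 sum


def candidate_ids_for_fts(invert_match_ids: List[int], doc_count: int,
                          fts_brute_force_by_keys_ratio: float = 0.05) -> Optional[List[int]]:
    """Return ids to AND into the FTS root, or None to keep posting-driven evaluation."""
    if len(invert_match_ids) < fts_brute_force_by_keys_ratio * doc_count:
        return invert_match_ids
    return None


def lead_driven_and(lead: CandidateDocIterator, postings: List[int]) -> List[int]:
    """Walk candidates from the lead and probe the (much longer) posting list."""
    hits = []
    doc = lead.advance(0)
    while doc != NO_MORE_DOCS:
        i = bisect.bisect_left(postings, doc)  # advance() on the posting side
        if i < len(postings) and postings[i] == doc:
            hits.append(doc)                   # both sides match; score() would run here
        doc = lead.advance(doc + 1)
    return hits
```

The candidate list is tiny relative to the posting list, so the loop does one lower_bound probe per candidate instead of decoding the whole posting. For example, with candidates {7, 42, 900} against a posting of multiples of 7, only 7 and 42 survive.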

JalinWang · 2026-05-22T10:29:26Z

+
+import pytest
+
+from zvec.model.param.query import Fts, Query


The name "Fts" is a bit too generic. Would it be more precise to name it after its underlying dependency, e.g. _FtsQuery (binding) or FtsQueryParam (C++)?

egolearner requested review from JalinWang, chinaux, feihongxu0824 and zhourrr as code owners May 15, 2026 09:17

github-actions[bot] assigned egolearner May 15, 2026

egolearner force-pushed the feat/fts branch 5 times, most recently from 74cfce2 to a2882f1 May 21, 2026 09:02

egolearner added 20 commits May 22, 2026 15:43

feat: add fts support

764532e

fix mac compile & ci

288ea80

refactor parse fts & add fts debug text

0704718

fix some problems

784e5fd

refactor(fts_column): reorganize into tokenizer/, posting/, iterator/ subdirs

7ae49af

perf: or use multi_get

84dd52a

perf: optimize disjunction iterator

22a612f

perf: fts use hashskiplist

3a05a15

refactor batch_get_postings

a7da75e

perf: optimize iterator virtual function

12a8d56

bench limit max_queries

34740d1

perf: use PinnableSlice

ab52311

perf: bitpacked avx2

14882eb

chore: rm unnecessary checkpoint

3badf49

perf: cache block_max_info_for result to skip repeated binary searches on same block

ca5808e

perf: precompute BM25 IDF weight per term to eliminate log() from scoring hot path

bbb74ae

perf: cache SIMD dispatch function pointers in iterator to eliminate PLT indirect calls

5ea99ff

rename

d001175

egolearner added 4 commits May 22, 2026 15:43

PartialMerge no optimize

04cb8f6

fix fts score

2739620

python binding support fts

122fb51

egolearner force-pushed the feat/fts branch from a2882f1 to 122fb51 May 22, 2026 07:43

egolearner requested a review from Cuiyus as a code owner May 22, 2026 07:43

egolearner changed the title from "feat: add fts support in db layer" to "feat: add fts support" May 22, 2026

JalinWang reviewed May 22, 2026

egolearner wants to merge 24 commits into alibaba:main from egolearner:feat/fts