feat: add fts support #408
Open
egolearner wants to merge 24 commits into
74cfce2 to a2882f1
…PLT indirect calls
Move the per-query filter check from the column-reader loop into the Disjunction/Conjunction/Phrase iterators so filtered docs no longer pay for block-max binary search, do_next alignment, or phase-2 position verification ($POS CF reads). TermDocIterator inherits the base-class default and stays unchanged.
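To make the effect of moving the filter check concrete, here is a minimal Python sketch of a disjunction iterator that consults the filter inside its own `next_doc()` loop, so a filtered doc is skipped before any scoring is requested. It is an illustration only, not the PR's C++ code: `TermIterator`, `Disjunction`, `is_filtered`, `NO_MORE_DOCS`, and the `scored` counter are hypothetical names, and block-max search and phrase position verification are stood in for by the `score()` call.

```python
NO_MORE_DOCS = 2**31 - 1  # hypothetical end-of-postings sentinel


class TermIterator:
    """Walks an ascending list of doc ids for one term (illustrative only)."""

    def __init__(self, doc_ids, weight):
        self._ids = sorted(doc_ids)
        self._pos = 0
        self.weight = weight

    def doc_id(self):
        return self._ids[self._pos] if self._pos < len(self._ids) else NO_MORE_DOCS

    def next_doc(self):
        self._pos += 1
        return self.doc_id()


class Disjunction:
    """OR over children; the filter is checked here, not by the caller."""

    def __init__(self, children, is_filtered=None):
        self._children = list(children)  # assumed non-empty
        self._is_filtered = is_filtered or (lambda doc: False)
        self._doc = -1
        self.scored = 0  # counts how often the expensive path runs

    def next_doc(self):
        while True:
            # Move every child that sits on the current doc.
            for child in self._children:
                if child.doc_id() == self._doc:
                    child.next_doc()
            self._doc = min(child.doc_id() for child in self._children)
            # Filtered docs are skipped before any scoring work is requested.
            if self._doc == NO_MORE_DOCS or not self._is_filtered(self._doc):
                return self._doc

    def score(self):
        self.scored += 1
        return sum(c.weight for c in self._children if c.doc_id() == self._doc)


def search(root):
    hits = []
    while (doc := root.next_doc()) != NO_MORE_DOCS:
        hits.append((doc, root.score()))
    return hits
```

With the filter inside the iterator, the caller's loop never sees a filtered doc, so `scored` only grows for docs that can actually be returned.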
block_max_info_for() now returns {score, last_doc} in one binary search
(with a small cache), so the standalone current_block_max_score(),
skip_to_next_block(), block_max_score_for() and block_max_last_doc_for()
methods have no live callers. Remove them from BitPackedPostingIterator,
the DocIterator base, and TermDocIterator, along with the now-dead
current_block_max_score_ member and its decode_block assignment. Tests
adjusted to query via block_max_info_for().
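The combined lookup described above can be sketched in Python as follows. This is a hedged illustration rather than the BitPackedPostingIterator code: the `BlockMaxIndex` class, its per-block arrays, the single-entry cache layout, and the `searches` counter are assumptions made for the example; the commit message only states that one binary search returns both values and that a small cache exists.

```python
from bisect import bisect_left
from typing import NamedTuple, Optional


class BlockMaxInfo(NamedTuple):
    score: float   # maximum score of any doc in the block
    last_doc: int  # largest doc id stored in the block


class BlockMaxIndex:
    """Per-block metadata for one posting list (hypothetical layout)."""

    def __init__(self, last_docs, max_scores):
        self._last_docs = list(last_docs)    # ascending, one entry per block
        self._max_scores = list(max_scores)  # parallel to _last_docs
        self._cached = None                  # (previous block's last doc, info)
        self.searches = 0                    # binary searches performed

    def block_max_info_for(self, doc_id: int) -> Optional[BlockMaxInfo]:
        # Cache hit: doc_id falls inside the block returned last time.
        if self._cached is not None:
            prev_last, info = self._cached
            if prev_last < doc_id <= info.last_doc:
                return info
        # One binary search yields both the score and the block boundary.
        self.searches += 1
        i = bisect_left(self._last_docs, doc_id)
        if i == len(self._last_docs):
            return None  # doc_id lies past the final block
        prev_last = self._last_docs[i - 1] if i > 0 else -1
        info = BlockMaxInfo(self._max_scores[i], self._last_docs[i])
        self._cached = (prev_last, info)
        return info
```

Callers that previously asked for the block-max score and the block's last doc separately would make a single call and unpack the pair, and repeated queries within the same block are served from the cache.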
When an invert filter is highly selective compared to the FTS posting size, posting-driven evaluation walks far more docs than necessary. Mirror the existing vector_recall pattern: when invert match_count is below fts_brute_force_by_keys_ratio * doc_count, extract the small id set and AND it into the FTS root via a new CandidateDocIterator. The candidate iterator becomes the lead by cost, turning the posting walk into per-candidate advance() + matches() + score() and fully reusing the existing AND / filter-pushdown / BM25 machinery.

- new CandidateDocIterator: ascending segment-local ids, lower_bound advance, zero score contribution
- FtsColumnIndexer::search wraps root_iter in Conjunction when FtsQueryParams.candidate_ids is non-empty
- new GlobalConfig::fts_brute_force_by_keys_ratio (default 0.05, independent from the vector knob because per-candidate FTS cost is higher due to phrase phase-2 IO), wired through C API + Python binding
- DocFilter::get_bf_by_keys_and_update now takes an explicit ratio so the two callers (vector vs FTS) pick the right knob; on the brute-force branch invert_filter_ is cleared so DocFilter never re-checks the same ids
- 9 iterator unit tests + 7 reader equivalence tests (Term / OR / AND / Phrase / Nested, coexistence with IndexFilter, empty-candidate fallback) + config default / validation asserts
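The candidate-driven evaluation described in this commit can be sketched in Python roughly as below. It is a simplified illustration, not the C++ implementation: `SortedIdIterator`, `conjunction`, `should_use_candidates`, and `NO_MORE_DOCS` are hypothetical names, a constant per-hit score stands in for BM25, and `matches()` is omitted. Only the ratio test, the lower_bound-style advance, the zero score contribution, and the cheapest-iterator-leads rule come from the description above.

```python
from bisect import bisect_left

NO_MORE_DOCS = 2**31 - 1  # hypothetical end-of-postings sentinel


class SortedIdIterator:
    """Ascending doc ids with a lower_bound-style advance()."""

    def __init__(self, doc_ids, score_per_hit):
        self._ids = sorted(set(doc_ids))
        self._pos = -1
        self._score = score_per_hit

    def cost(self):
        return len(self._ids)

    def doc_id(self):
        if self._pos < 0:
            return -1  # not positioned yet
        return self._ids[self._pos] if self._pos < len(self._ids) else NO_MORE_DOCS

    def advance(self, target):
        # First id >= target, never moving backwards.
        self._pos = bisect_left(self._ids, target, lo=max(self._pos, 0))
        return self.doc_id()

    def score(self):
        return self._score


class CandidateDocIterator(SortedIdIterator):
    """Restricts the result set to the filter's ids; adds nothing to the score."""

    def __init__(self, candidate_ids):
        super().__init__(candidate_ids, score_per_hit=0.0)


def should_use_candidates(match_count, doc_count, ratio=0.05):
    # Mirrors the rule: match_count below ratio * doc_count.
    return match_count < ratio * doc_count


def conjunction(iterators):
    # The cheapest iterator leads; the others are only probed per lead doc.
    lead, *rest = sorted(iterators, key=lambda it: it.cost())
    hits = []
    doc = lead.advance(0)
    while doc != NO_MORE_DOCS:
        if all(it.advance(doc) == doc for it in rest):
            hits.append((doc, sum(it.score() for it in iterators)))
        doc = lead.advance(doc + 1)
    return hits
```

For a posting list of 500 docs and 4 candidate ids, the loop runs 4 times instead of walking the postings, and the candidate iterator's zero score leaves the ranking of surviving docs unchanged.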
JalinWang
reviewed
May 22, 2026
import pytest

from zvec.model.param.query import Fts, Query
Collaborator
The name "Fts" is a bit too generic. Would it be more precise to name it after its underlying dependency, like _FtsQuery (binding) or FtsQueryParam (C++)?
address #397