Surfaced by the khive-audit play (UNIFIED_AUDIT_REPORT) and confirmed in code.
Problem
SparseStore (ADR-031) ships the correct contract — SparseVector{indices,values} (CSR), insert_sparse/search_sparse, RRF-fused with dense + BM25 in khive-score. But the SQLite impl search_sparse_vectors (khive-db/src/stores/sparse.rs:~328) is a brute-force scan: it loads EVERY candidate row for the namespace (+ optional kind), deserializes indices_json + values_blob, scores all of them, then truncates to top_k. That's O(corpus) per query — the opposite of what sparse retrieval is for.
Fix
Replace the row scan with an inverted postings table: term(dimension index) → posting list of (subject_id, weight). At query time, touch only the postings for the query vector's non-zero indices, accumulate scores via a merge/accumulator, top-k. This is the standard SPLADE/sparse-retrieval structure and turns query cost from O(corpus) into O(sum of posting-list lengths for query terms).
Alternative (if learned-sparse isn't a near-term priority): retire this path and rely on BM25 + dense ANN fusion, and mark ADR-031 deferred. Decide explicitly rather than leave an O(N) scan wired into hybrid search.
Acceptance
Surfaced by the khive-audit play (UNIFIED_AUDIT_REPORT) and confirmed in code.
Problem
SparseStore(ADR-031) ships the correct contract —SparseVector{indices,values}(CSR),insert_sparse/search_sparse, RRF-fused with dense + BM25 inkhive-score. But the SQLite implsearch_sparse_vectors(khive-db/src/stores/sparse.rs:~328) is a brute-force scan: it loads EVERY candidate row for the namespace (+ optional kind), deserializesindices_json+values_blob, scores all of them, then truncates to top_k. That's O(corpus) per query — the opposite of what sparse retrieval is for.Fix
Replace the row scan with an inverted postings table: term(dimension index) → posting list of (subject_id, weight). At query time, touch only the postings for the query vector's non-zero indices, accumulate scores via a merge/accumulator, top-k. This is the standard SPLADE/sparse-retrieval structure and turns query cost from O(corpus) into O(sum of posting-list lengths for query terms).
Alternative (if learned-sparse isn't a near-term priority): retire this path and rely on BM25 + dense ANN fusion, and mark ADR-031 deferred. Decide explicitly rather than leave an O(N) scan wired into hybrid search.
Acceptance