Merged
22 changes: 12 additions & 10 deletions .github/workflows/ci.yml
Expand Up @@ -357,10 +357,11 @@ jobs:
#
# Pattern adapted from microsoft/DiskANN's CI (also a vector-search crate).
# The local setup-intel-sde action owns the fixed Intel downloadmirror build,
# SHA256 verification, and x86_64 runner guard. The SHA gate still fails
# closed for any archive we extract. While Intel's CloudFront/WAF challenge
# blocks GitHub-hosted runners, this PR temporarily lets the job report an
# explicit unavailable state and skip SDE-dependent steps.
# SHA256 verification, and x86_64 runner guard. The SHA gate fails closed for
# any archive we extract. Pull requests may soft-skip during Intel mirror
# outages, but push/workflow_dispatch runs fail closed; the release gate only
# accepts the post-merge push workflow result, so a release cannot proceed
# without the SDE probe and AVX-512 tests actually executing on main.
avx512:
name: avx512 (Intel SDE / Sapphire Rapids)
runs-on: ubuntu-24.04
Expand Down Expand Up @@ -393,12 +394,13 @@ jobs:
with:
version: ${{ env.SDE_VERSION }}
sha256: ${{ env.SDE_SHA256 }}
allow-unavailable: true
- name: note Intel SDE unavailable
if: steps.sde.outputs.sde-available != 'true'
run: echo "::notice::Intel SDE archive unavailable; temporarily skipping AVX-512 SDE coverage."
allow-unavailable: ${{ github.event_name == 'pull_request' }}
- name: note Intel SDE unavailable on PR
if: ${{ github.event_name == 'pull_request' && steps.sde.outputs.sde-available != 'true' }}
run: |
echo "::warning::Intel SDE archive unavailable on this pull request; push and release-gated runs fail closed."
- name: sanity-check AVX-512 detection under SDE
if: steps.sde.outputs.sde-available == 'true'
if: ${{ steps.sde.outputs.sde-available == 'true' }}
env:
SDE_PATH: ${{ steps.sde.outputs.sde-path }}
run: |
Expand Down Expand Up @@ -432,7 +434,7 @@ jobs:
"${SDE_PATH}" -spr -- \
"${RUNNER_TEMP}/sde-probe/target/release/sde-probe"
- name: cargo test under SDE (AVX-512 kernels)
if: steps.sde.outputs.sde-available == 'true'
if: ${{ steps.sde.outputs.sde-available == 'true' }}
env:
CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUNNER: ${{ steps.sde.outputs.sde-path }} -spr --
# Cause any AVX-512 test that would silently skip on a non-AVX-512 host
Expand Down
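
The gating that this workflow change describes (soft-skip on pull requests, fail closed on push and `workflow_dispatch`) can be modelled as a small decision table. This is only a sketch: the `Event`, `Outcome`, `allow_unavailable`, and `sde_gate` names are hypothetical, and it assumes the local setup action fails the job when the archive is unavailable and `allow-unavailable` is false, as the workflow comment states.

```rust
/// Hypothetical model of the events that trigger the workflow.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Event {
    PullRequest,
    Push,
    WorkflowDispatch,
}

/// Hypothetical model of what the avx512 job does for one run.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// SDE was set up: the CPUID probe and AVX-512 tests execute.
    RunSdeSteps,
    /// SDE archive unavailable on a pull request: warn and skip SDE steps.
    SoftSkipWithWarning,
    /// SDE archive unavailable elsewhere: the job fails.
    FailClosed,
}

/// Mirrors `allow-unavailable: ${{ github.event_name == 'pull_request' }}`.
fn allow_unavailable(event: Event) -> bool {
    event == Event::PullRequest
}

/// Assumed behaviour: an unavailable archive is tolerated only when
/// `allow_unavailable` is true; otherwise setup fails the job.
fn sde_gate(event: Event, sde_available: bool) -> Outcome {
    match (sde_available, allow_unavailable(event)) {
        (true, _) => Outcome::RunSdeSteps,
        (false, true) => Outcome::SoftSkipWithWarning,
        (false, false) => Outcome::FailClosed,
    }
}
```

Under this model, a release that only accepts the post-merge push result can never observe `SoftSkipWithWarning`, which is the property the workflow comment relies on.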
17 changes: 9 additions & 8 deletions .github/workflows/coverage.yml
Expand Up @@ -54,15 +54,16 @@ jobs:
with:
version: ${{ env.SDE_VERSION }}
sha256: ${{ env.SDE_SHA256 }}
allow-unavailable: true
- name: Note Intel SDE unavailable
if: steps.sde.outputs.sde-available != 'true'
run: echo "::notice::Intel SDE archive unavailable; temporarily skipping SDE-backed coverage."
allow-unavailable: ${{ github.event_name == 'pull_request' }}
- name: Note Intel SDE unavailable on PR
if: ${{ github.event_name == 'pull_request' && steps.sde.outputs.sde-available != 'true' }}
run: |
echo "::warning::Intel SDE archive unavailable on this pull request; push and release-gated runs fail closed."
- name: Install cargo-llvm-cov (pinned)
if: steps.sde.outputs.sde-available == 'true'
if: ${{ steps.sde.outputs.sde-available == 'true' }}
run: cargo install cargo-llvm-cov --version 0.8.7 --locked
- name: Sanity-check AVX-512 detection under SDE
if: steps.sde.outputs.sde-available == 'true'
if: ${{ steps.sde.outputs.sde-available == 'true' }}
env:
SDE_PATH: ${{ steps.sde.outputs.sde-path }}
run: |
Expand Down Expand Up @@ -99,12 +100,12 @@ jobs:
# feature detection reaches the AVX-512 kernels. That makes the coverage
# floor reflect the same exercised code as the dedicated ci.yml avx512 job.
- name: Generate coverage (lcov) + enforce floor
if: steps.sde.outputs.sde-available == 'true'
if: ${{ steps.sde.outputs.sde-available == 'true' }}
env:
CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUNNER: ${{ steps.sde.outputs.sde-path }} -spr --
run: cargo llvm-cov --all-features --target x86_64-unknown-linux-gnu --fail-under-lines 85 --lcov --output-path lcov.info
- name: Upload coverage to Codecov
if: steps.sde.outputs.sde-available == 'true'
if: ${{ steps.sde.outputs.sde-available == 'true' }}
uses: codecov/codecov-action@fb8b3582c8e4def4969c97caa2f19720cb33a72f # v7.0.0
with:
files: lcov.info
Expand Down
83 changes: 47 additions & 36 deletions CHANGELOG.md
Expand Up @@ -5,10 +5,21 @@ All notable changes to this project are documented here.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Unreleased
## 0.5.0 - 2026-06-19

### Security

- **Hardened `.ovfs` FastScan loading before the format's first stable
release.** `RankQuantFastscan` now rejects invalid FastScan payload bytes
(`byte & 0xf0 != 0`), rows that violate b=2 constant composition, and
nonzero block-tail padding across the path, reader, and byte-slice load APIs.
Loader fuzzing now runs a safe `search()` after every successful `.ovfs` load,
and persisted-input tests compare the dispatch path against the scalar
FastScan reference (AVX-512 under SDE, scalar otherwise).
- **Bounded calibration-profile hashing in `ordvec-manifest`.** Verification now
applies `max_calibration_profile_bytes` (64 MiB by default, CLI-overridable)
before hashing calibration profile artifacts, matching the existing bounded
resource model for encoder-distortion profiles and auxiliary artifacts.
- **Cleared OSV / OpenSSF-Scorecard advisories on the dev-only BEIR benchmark
tooling** (introduced with the benchmark harness; none reach the published
`ordvec` crate or the `ordvec` PyPI wheel). The `benchmarks/beir/requirements.txt`
Expand Down Expand Up @@ -41,6 +52,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
`ordvec-manifest` v1 support for `.ovfs` are deferred to 0.8.0 (#233, #232);
bind `.ovfs` artifacts with caller-owned checksums or attestations when they
cross a trust boundary.
- **Caller-owned serial batched/buffered two-stage primitives** (additive):
`SignBitmap::top_m_candidates_batched_serial_csr`, `CandidateBatch`,
`SubsetScratch`, `RankQuant::search_asymmetric_subset_batched_serial`, and
`RankQuant::search_asymmetric_subset_batched_serial_into`. These primitives
never enter rayon; callers partition query batches and drive the serial
`_into` primitive from their own scheduler. The serial CSR candidate generator
is correctness-first in this release; future releases can optimize internals
behind the same signature.
- `avx512vpop_supported()` (`#[doc(hidden)]`) — reports whether the AVX-512
VPOPCNTDQ scan kernels are active on the current CPU. The scan dispatch reads
only this predicate (no per-dimension gate).

### Performance

Expand All @@ -58,12 +80,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
residues 0..7 plus 384/512/768/1024/1536 for all six SignBitmap/Bitmap scan
kernels. This is stage-1 scan-kernel throughput, not a whole-pipeline figure.

### Added

- `avx512vpop_supported()` (`#[doc(hidden)]`) — reports whether the AVX-512
VPOPCNTDQ scan kernels are active on the current CPU. The scan dispatch reads
only this predicate (no per-dimension gate).

### Changed

- **Clarified BEIR benchmark release claims.** The committed README figures use
Expand All @@ -76,6 +92,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
previously-written `.tvr` / `.tvrq` / `.tvbm` / `.tvsb` file continues to load
unchanged; only the file extensions and magic bytes written by `write()`
change (#230).
- **Documented the v0.5 `b=8` support boundary.** `b=8` is a stable Rust
in-memory evidence/refinement width: asymmetric scoring and code/projection
generation work at any valid dimension, while symmetric `RankQuant::search`
requires `dim % 256 == 0`. It is not exposed through the Python `RankQuant`
constructor in v0.5.0, cannot be persisted to `.ovrq`, and each prepared
asymmetric query/worker owns a `dim * 256` `f32` LUT (about 64 MiB at the
maximum dimension).
- **Release-hardened the caller-owned serial two-stage primitives** (no API
change; added in 0.5.0). The trust model is now explicit and tested:
- Rejection-path regression tests for the full CSR/query/buffer validation set
Expand All @@ -85,9 +108,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
input can never reach the SIMD scan.
- A counting-allocator test proving `search_asymmetric_subset_batched_serial_into`
performs **zero heap allocations** in steady state (warmed `SubsetScratch`,
reused caller buffers) **on the AVX-512/AVX2 rerank path** — the strong form of
the prior capacity-stability proxy. (The scalar fallback, e.g. aarch64,
allocates a per-query scoring LUT; the test skips the strict check there.)
reused caller buffers, including the scalar LUT scratch) across the rerank
paths — the strong form of the prior capacity-stability proxy.
- A focused `two_stage_bench` example decomposing stage-1 candidate-gen /
single-query rerank loop / batched `_into` / full two-stage at the
Harrier-1024 shape, with a committed reference capture
Expand All @@ -97,6 +119,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Fixed

- **Made Intel SDE AVX-512 coverage fail closed for release gating.** Pull
requests may emit a visible warning and skip SDE-dependent steps during an
Intel mirror outage, but the push/workflow-dispatch runs used by the release
gate still fail closed; setup must succeed, the AVX-512 CPUID probe must run,
and the SDE-backed test/coverage commands must execute before release.
- **Closed manifest verifier path-reopen drift.** Verification and SQLite
cache-key construction now hash, probe, and validate the canonical path that
was checked and recorded, rather than reopening the pre-canonical joined path.
- **Marked persisted-format metadata enums non-exhaustive before v0.5 ships.**
`IndexKind`, `IndexParams`, `ManifestIndexKind`, and `ManifestIndexParams`
are now future-extensible for later stable formats such as `.ovfs` manifest
support without forcing downstream exhaustive matches.
- **Corrected FastScan dispatch documentation.** `RankQuantFastscan` dispatches
AVX-512 when available and otherwise uses its scalar kernel; the AVX2 path is
part of the exact `RankQuant` asymmetric scorer, not FastScan.
- **`ordvec-manifest` crate and wheel now ship license text.** Both declared
`MIT OR Apache-2.0` but packaged no `LICENSE-*` files (a pre-0.5.0 defect);
added `LICENSE-MIT` + `LICENSE-APACHE-2.0` (copied from the workspace root) to
Expand All @@ -108,32 +145,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
the sdist) — closing the regression class at the published-bytes layer, not
only at `cargo package`.

## 0.5.0 - 2026-06-13

### Added

- **Caller-owned serial batched/buffered two-stage primitives** (additive):
- `SignBitmap::top_m_candidates_batched_serial_csr(&self, queries, m) -> CandidateBatch`
— serial (no rayon) CSR candidate generation; pair with the rerank below to
run a fully caller-scheduled two-stage search.
- `RankQuant::search_asymmetric_subset_batched_serial(..) -> SearchResults` and
`..._serial_into(.., &mut SubsetScratch, &mut out_scores, &mut out_indices)`
— serial batched subset rerank; the `_into` form is allocation-free after
scratch warmup on the AVX-512/AVX2 rerank path (the integration contract for
runtimes that own their own thread pool / GIL release).
- New public types `CandidateBatch` (CSR candidate carrier) and `SubsetScratch`
(reusable rerank scratch).
- These primitives never enter rayon; the caller owns parallelism. No bundled
rayon convenience wrapper ships in this release — partition the query batch and
drive the serial `_into` primitive from your own pool. The existing
internally-parallel `top_m_candidates_batched` and `search_asymmetric*` are
unchanged.

### Notes

- The serial CSR candidate-gen is a correctness-first implementation; a future
release optimizes its internals behind the same signature.

## 0.4.0 - 2026-06-04

### Added
Expand Down
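
The changelog entry on caller-owned serial primitives says callers partition query batches and drive the serial `_into` primitive from their own scheduler. A minimal sketch of that pattern with `std::thread::scope` follows. `score_serial_into` is a hypothetical stand-in for a serial, buffer-writing primitive and is not the `ordvec` API; the real signatures are not shown in this diff.

```rust
use std::thread;

/// Hypothetical stand-in for a serial `_into` primitive (NOT the ordvec API):
/// it writes one result per query into a caller-owned output slice and
/// reuses a caller-owned scratch buffer.
fn score_serial_into(queries: &[f32], scratch: &mut Vec<f32>, out: &mut [f32]) {
    scratch.clear();
    scratch.extend(queries.iter().map(|q| q * 2.0));
    out.copy_from_slice(scratch);
}

/// The caller owns parallelism: split the batch into contiguous chunks and
/// run the serial primitive on each chunk from a caller-managed thread.
fn score_partitioned(queries: &[f32], workers: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; queries.len()];
    if queries.is_empty() {
        return out;
    }
    let workers = workers.max(1);
    // Ceiling division so every query lands in exactly one chunk.
    let chunk = (queries.len() + workers - 1) / workers;
    thread::scope(|s| {
        for (q_chunk, out_chunk) in queries.chunks(chunk).zip(out.chunks_mut(chunk)) {
            s.spawn(move || {
                // One scratch buffer per worker, reused across its chunk.
                let mut scratch = Vec::with_capacity(q_chunk.len());
                score_serial_into(q_chunk, &mut scratch, out_chunk);
            });
        }
    });
    out
}
```

Each worker writes into a disjoint slice of the output, so no locking is needed. A long-lived pool would keep the scratch buffer alive across batches rather than allocating it per call as this sketch does.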
13 changes: 9 additions & 4 deletions README.md
Expand Up @@ -98,7 +98,10 @@ structure of each vector on its own:
known before you see any data (256 B at dim = 1024, 2-bit), with
`bits ∈ {1, 2, 4}` the size/recall knob. (`b = 8` is an opt-in
evidence/refinement width — asymmetric scoring at any dim, symmetric only
when `dim % 256 == 0` — not a broad retrieval mode.)
when `dim % 256 == 0` — not a broad retrieval mode. In v0.5.0 it is
Rust-only, in-memory, not accepted by the Python `RankQuant` constructor, and
not persistable to `.ovrq`; each prepared asymmetric query owns a
`dim * 256` `f32` LUT, about 64 MiB at the maximum dimension.)
- **Two-stage retrieval, built in.** A cheap bitmap / sign-popcount
prefilter feeds an exact rerank — the coarse→fine pipeline ships as
library primitives. The coarse-scan→exact-rerank pattern, and the
Expand All @@ -118,7 +121,9 @@ large-scale serving rather than competing with one.
- **`Rank`** — full-precision rank vectors (`u16` per coordinate).
- **`RankQuant`** — ranks bucketed into `1 << bits` equal-width
bins, `bits` bits per coordinate (`dim * bits / 8` bytes/doc). Both a
symmetric (Spearman) and asymmetric (float-query LUT) scorer.
symmetric (Spearman) and asymmetric (float-query LUT) scorer. `bits ∈
{1, 2, 4}` are the cross-language persisted retrieval widths in v0.5.0;
`b = 8` is Rust-only and in-memory for evidence/refinement.
- **`Bitmap`** — a top-bucket bitmap per document (one bit per
coordinate); scoring is `popcount(Q AND D)`, a coarsened rank overlap.
- **`SignBitmap`** — a sign bitmap per document for sign-cosine
Expand All @@ -127,8 +132,8 @@ large-scale serving rather than competing with one.
Two further paths, for callers who need them:

- **`RankQuantFastscan`** — a stable, documented *but specialized* public
type: an optional b=2 FastScan kernel (block-32 nibble/PQ-LUT, AVX-512 → AVX2
scalar dispatch) for absolute-minimum stage-1 scan latency, at 2× the
type: an optional b=2 FastScan kernel (block-32 nibble/PQ-LUT, AVX-512 →
scalar dispatch) for absolute-minimum stage-1 scan latency, at 2× the
RankQuant b=2 footprint (`dim/2` bytes/doc) and 8-bit LUT scoring noise. It
persists to `.ovfs` (magic `OVFS`) through direct
`RankQuantFastscan::{write,load}` calls. In v0.5.0, `.ovfs` is not yet part
Expand Down
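
The README's `b = 8` note gives the LUT size as `dim * 256` `f32` values and the symmetric constraint as `dim % 256 == 0`. The arithmetic can be checked with a short sketch. The function names are hypothetical, and the 65,536 figure is an assumption: it is the dimension implied by "about 64 MiB", not a limit stated in this diff.

```rust
/// Number of LUT entries per coordinate at b = 8 (2^8 buckets).
const LUT_ENTRIES_PER_DIM: usize = 256;

/// Bytes owned by one prepared asymmetric b = 8 query:
/// `dim * 256` entries of `f32` (4 bytes each).
fn b8_lut_bytes(dim: usize) -> usize {
    dim * LUT_ENTRIES_PER_DIM * std::mem::size_of::<f32>()
}

/// Convert a byte count to MiB for reporting.
fn bytes_to_mib(bytes: usize) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}

/// Symmetric b = 8 search is documented as requiring `dim % 256 == 0`.
fn b8_symmetric_supported(dim: usize) -> bool {
    dim % 256 == 0
}
```

For example, `b8_lut_bytes(1024)` is 1,048,576 bytes (1 MiB), and the assumed dimension of 65,536 gives 67,108,864 bytes (64 MiB). Because each prepared query or worker owns its own LUT, the total scales with the number of concurrent workers.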
13 changes: 9 additions & 4 deletions THREAT_MODEL.md
Expand Up @@ -119,7 +119,9 @@ candidate generation followed by RankQuant subset reranking).
- Per-row **structural** invariants: `Rank` rows must be a true permutation of
`[0, dim)` (verified by bound + duplicate checks ⇒ pigeonhole);
`RankQuant` rows must satisfy constant composition (uniform per-bucket
histogram); `Bitmap` rows must have exactly `n_top` bits set.
histogram); `Bitmap` rows must have exactly `n_top` bits set;
`RankQuantFastscan` `.ovfs` rows must use valid FastScan nibbles, satisfy
b=2 constant composition, and zero block-tail padding.
- No `panic!` on malformed data — all validation returns
`io::Error(InvalidData)`.
- The raw `rank_io` read/write functions are `pub(crate)`; the only public
Expand Down Expand Up @@ -205,8 +207,9 @@ introduces `O(span/255)` per-pair approximation error — an intentional
trade-off matching FAISS FastScan semantics, documented in the code. The
scalar and AVX-512 paths agree on the same quantized inputs (equivalence test),
and `TopK` uses `total_cmp` for deterministic tie-breaking across all paths.
This is approximate *scoring*, not a CPU oracle. FastScan is a `#[doc(hidden)]`
pre-ranker; callers needing exact scores use `RankQuant::search_asymmetric`.
This is approximate *scoring*, not a CPU oracle. FastScan is a stable
specialized pre-ranker; callers needing exact scores use
`RankQuant::search_asymmetric`.

**THREAT-SIMD-004 (mitigated this cycle): Native sanitizer coverage for
unsafe kernels.** `.github/workflows/sanitizers.yml` runs nightly
Expand Down Expand Up @@ -444,7 +447,9 @@ SignBitmap→RankQuant retrieval path.
`search_asymmetric_fastscan_b2` + the scalar/AVX-512 kernel), crossing the
32-doc block boundary so tail-padding blocks are exercised. On
non-AVX-512 CI runners it exercises the scalar reference kernel; under Intel SDE
it exercises the AVX-512 kernel.
it exercises the AVX-512 kernel. The `load_fastscan` target also follows every
successful `.ovfs` load with a safe `search()` call so loader-accepted bytes
must survive the public scan path.

**THREAT-FUZZ-002 (mitigated this cycle): CI-bound fuzzing for continuous
regression.** A `fuzz.yml` workflow now runs a bounded smoke on every pull
Expand Down
13 changes: 9 additions & 4 deletions docs/INDEX_PROVENANCE.md
Expand Up @@ -96,15 +96,19 @@ The manifest verifier checks:
path/hash integrity for side artifacts, and optional calibration-profile
linkage;
- optional `calibration` profile references, checking profile identity,
path/hash integrity, encoder identity, and ordinalization compatibility;
path/hash integrity, configured byte ceiling, encoder identity, and
ordinalization compatibility;
- attestation **shape** only: predicate type, builder id when present, and at
least one subject SHA-256 matching the artifact when attestations are
supplied.

The v1 verifier intentionally does not create or verify `.ovfs` FastScan
artifacts yet. If a `RankQuantFastscan` artifact crosses a trust boundary in
v0.5.0, bind the bytes with a caller-owned checksum, artifact-store control, or
attestation and load it directly only after that policy check succeeds.
attestation and load it directly only after that policy check succeeds. The
direct `.ovfs` loader still rejects invalid nibbles, non-canonical block-tail
padding, and rows that violate b=2 constant composition; manifest v1 simply
does not bind or probe those bytes yet.

Auxiliary artifacts are for application-owned sidecars such as metadata,
secondary indexes, or stores that a caller intends to load together with the
Expand Down Expand Up @@ -135,8 +139,9 @@ manifest's `calibration.profile_id`.

When present, `calibration` binds an index artifact to a hashed ordinal profile
used to interpret overlap, bucket, sign, or rank evidence under a calibrated
null. The verifier checks profile identity, path/hash integrity, encoder
identity, and ordinalization compatibility; it does not judge whether the null
null. The verifier checks profile identity, path/hash integrity, configured
byte ceiling, encoder identity, and ordinalization compatibility; it does not
judge whether the null
model is scientifically adequate and does not compute likelihood ratios or tail
probabilities. Calibration profiles must match the encoder identity declared by
`embedding`; cross-encoder calibration is rejected by default. The
Expand Down
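
The "configured byte ceiling" check added to the verifier description can be sketched as a bounded read that runs before any hashing. This is an illustration under stated assumptions: `read_bounded` is a hypothetical helper, not the `ordvec-manifest` implementation, and the hash step is left to the caller because SHA-256 is not in the Rust standard library.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve the path once, enforce a byte ceiling, and
/// return the canonical path together with the bytes that would be hashed.
fn read_bounded(path: &Path, max_bytes: u64) -> io::Result<(PathBuf, Vec<u8>)> {
    // Canonicalize once and use the same path for every later step.
    let canonical = path.canonicalize()?;
    let file = File::open(&canonical)?;

    // Cheap early rejection based on the reported file size.
    if file.metadata()?.len() > max_bytes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "artifact exceeds configured byte ceiling",
        ));
    }

    // Read at most `max_bytes + 1` so a file that grows after the metadata
    // check is still detected instead of being read without bound.
    let mut bytes = Vec::new();
    file.take(max_bytes.saturating_add(1)).read_to_end(&mut bytes)?;
    if bytes.len() as u64 > max_bytes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "artifact exceeds configured byte ceiling",
        ));
    }
    Ok((canonical, bytes))
}
```

A caller would pass the returned bytes to its SHA-256 implementation and record the returned canonical path, so the path that was checked is the path that was hashed.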
8 changes: 6 additions & 2 deletions docs/PERSISTED_FORMAT.md
Expand Up @@ -10,7 +10,10 @@ API, but `.ovfs` is intentionally outside this v1 primitive-format,
`probe_index_metadata()`, and `ordvec-manifest` contract. Until metadata-probe
and manifest support are promoted, callers should treat `.ovfs` as a
specialized direct-load artifact and bind it with application-owned checksums or
attestations when it crosses a trust boundary.
attestations when it crosses a trust boundary. The direct `.ovfs` loader still
validates the payload before search: real document bytes must be 4-bit FastScan
codes, every row must satisfy b=2 constant composition, and block-tail padding
must be zero.
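
The three payload checks named above can be sketched on a single row. This is a simplified illustration, not the loader's code: it ignores the block-32 layout, and it assumes that each byte's low nibble packs two 2-bit bucket codes (low two bits first), which this document does not specify. `RowError`, `validate_fastscan_row`, and `tail_padding_is_zero` are hypothetical names; the real loader reports `io::Error(InvalidData)`.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum RowError {
    BadLength,
    HighNibbleSet { index: usize },
    NonUniformBuckets,
}

/// Validate one row of `dim / 2` bytes.
/// ASSUMPTION: low nibble = two 2-bit bucket codes; high nibble must be zero.
fn validate_fastscan_row(row: &[u8], dim: usize) -> Result<(), RowError> {
    if dim % 4 != 0 || row.len() != dim / 2 {
        return Err(RowError::BadLength);
    }
    let mut hist = [0usize; 4];
    for (index, &byte) in row.iter().enumerate() {
        // Reject bytes that are not 4-bit codes (`byte & 0xf0 != 0`).
        if byte & 0xf0 != 0 {
            return Err(RowError::HighNibbleSet { index });
        }
        hist[(byte & 0b11) as usize] += 1;
        hist[((byte >> 2) & 0b11) as usize] += 1;
    }
    // b=2 constant composition: each of the 4 buckets holds dim / 4 coordinates.
    if hist.iter().all(|&count| count == dim / 4) {
        Ok(())
    } else {
        Err(RowError::NonUniformBuckets)
    }
}

/// Block-tail padding must be all zero bytes.
fn tail_padding_is_zero(padding: &[u8]) -> bool {
    padding.iter().all(|&b| b == 0)
}
```

With `dim = 8`, the row `[0b0100, 0b1110, 0b0100, 0b1110]` passes because each bucket appears exactly twice, while an all-zero row fails the composition check even though every byte is a valid nibble.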

All integer fields are little-endian. Each format has one fixed header followed
by one contiguous payload. The payload must consume the rest of the file
Expand Down Expand Up @@ -65,7 +68,8 @@ cache in their own manifests:
In v0.5.0, `probe_index_metadata(path)` rejects `OVFS` with an unsupported
metadata-probe error rather than returning a partial descriptor. Load `.ovfs`
only through `RankQuantFastscan::load` unless and until the FastScan metadata
contract is promoted in a later minor release.
contract is promoted in a later minor release; the direct loader rejects
invalid nibbles, non-canonical tail padding, and b=2 composition violations.

Example external segment entry:

Expand Down