
CNDB-15527: Upgrade jvector to 4.0.0-rc.8; make FusedPQ configurable#2042

Merged
michaeljmarshall merged 66 commits into main from revive-fused-adc
Feb 20, 2026
Conversation

@michaeljmarshall

@michaeljmarshall michaeljmarshall commented Oct 8, 2025

What is the issue

Fixes: https://github.com/riptano/cndb/issues/15527
CNDB test PR: https://github.com/riptano/cndb/pull/16797

What does this PR fix and why was it fixed

This PR upgrades jvector, which brings several improvements. Here are the git commits brought in:

8b3e93cf (tag: 4.0.0-rc.8) chore: update changelog for 4.0.0-rc.8 (#627)
9d0488e5 release 4.0.0-rc.8 (#626)
570bd118 Refactor parallel writer (#608)
20c348ec Move buffer position in ByteBufferIndexWriter#writeFloats (#607)
d9ddce51 Ensure extractTrainingVectors return a list of at most MAX_PQ_TRAINING_SET_SIZE (#610)
d663b4f7 add config options for regression testing (#609)
7e493eee On-disk index cache for the Grid benchmark harness (#612)
e263cc80 Improved dataset loading; fixes, safeties, diagnostics, and better feedback (#613)
6b235ce7 bump to next SNAPSHOT (#605)
84bf5708 (tag: 4.0.0-rc.7) chore: update changelog for 4.0.0-rc.7 (#604)
fceeb885 release 4.0.0-rc.7 (#603)
51807cba add protection against bad ordinal mappings (#602)
6ca3b5e2 adding memory and disk usage stats to bench tests (#591)
a66fd914 Fix OnDiskGraphIndex#ramBytesUsed NPE (#588)
0ca5a392 Move float bulk-write into IndexWriter to enforce endianness (#577)
a6c6c09b Add diversityScoreFunctionFor to avoid creation of wrapper object (#592)
977c21d4 Relax the threshold of a flaky test related to an experimental feature (#598)
fa808d69 adding average nodes visited to benchmark tests (#552)
3bd15e70 Virtualize and Modularize DataSetLoader logic (#593)
42259e9f Speed up ivec reads by buffering (#584)
f967f1c9 virtualize DataSet (#589)
55f902f4 turn off parallel writes in grid (#582)
019a241d Parallelize graph writes (#542)
02fea879 Save allocation of a large array in PQVectors.encodeAndBuild (#574)
32a51821 javadoc for base [graph] (#548)
4eb607f8 javadoc for base [disk,exceptions] (#547)
30e8932c Enable the fused graph index  (#561)
d8848fc6 Start development on 4.0.0-rc.7-SNAPSHOT (#573)
c57f3a62 (tag: 4.0.0-rc.6) chore: update changelog for 4.0.0-rc.6 (#572)
214b7c20 release 4.0.0-rc.6 (#571)
e3686999 fix javadoc error (#570)
88669887 Ignoring testIncrementalInsertionFromOnDiskIndex_withNonIdentityOrdinalMapping and adding a TODO in buildAndMergeNewNodes (#569)
29a943e1 Computation of reconstruction errors for vector compressors (#567)
d8e9cb16 Add NVQ paper in README (#560)
d5cbe658 Add ImmutableGraphIndex.isHierarchical (#563)
b484dae2 Harden tests for heap graph reconstruction (#543)
9471c57d Make the thresholds in TestLowCardinalityFiltering tighter (#559)
21e4a226 Begin development on 4.0.0-rc.6 (#558)
4f661d99 Revert "Start development on 4.0.0-rc.6-SNAPSHOT"
fdee5779 Start development on 4.0.0-rc.6-SNAPSHOT

SAI Version Bump

Adds a new SAI on-disk version: fa

Fused PQ

With this version, we are adding a new, experimental feature that writes PQ vectors fused into the graph. This lets us skip writing the PQ vectors to the PQ file, which results in significant memory savings, since the PQ vectors in the CassandraDiskAnn graph searcher consume O(n) memory based on the number of vectors and their quantized size. The fused PQ vectors mostly fit within the page cache as we read each node and its neighbors from disk, so we see minimal latency reduction due to this change, though further testing is required to see the real impact.

In order to enable fused PQ, the runtime needs cassandra.sai.latest.version=fa or greater and cassandra.sai.vector.enable_fused=true. Note that because this feature is still experimental, cassandra.sai.vector.enable_fused defaults to false.
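As a concrete sketch, the two system properties named above could be passed to the JVM like this (how JVM options are actually supplied depends on your deployment and startup scripts):

```shell
# Hypothetical startup snippet: append both flags to the JVM options.
JVM_OPTS="$JVM_OPTS -Dcassandra.sai.latest.version=fa"
JVM_OPTS="$JVM_OPTS -Dcassandra.sai.vector.enable_fused=true"
```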

Another experimental feature introduced in this commit via the jvector upgrade is parallel graph encoding and writing to disk. Writing the fused graph requires increased CPU time to encode the graph node and we write more bytes to disk, so this parallelism is likely necessary to keep vector index creation/compaction times down. The key configurations available with their associated defaults:

    // When building a compaction graph, encode layer 0 nodes in parallel and subsequently use async io for writes.
    // This feature is experimental, so defaults to false.
    SAI_ENCODE_AND_WRITE_VECTOR_GRAPH_IN_PARALLEL_ENABLED("cassandra.sai.vector.encode_and_write_graph_in_parallel.enabled", "false"),
    // When parallel graph encoding is enabled, the number of threads to use for encoding. Defaults to 0, meaning
    // use all available processors as reported by the JVM.
    SAI_ENCODE_AND_WRITE_VECTOR_GRAPH_IN_PARALLEL_NUM_THREADS("cassandra.sai.vector.encode_and_write_graph_in_parallel.num_threads", "0"),
    // When parallel graph encoding is enabled, whether to use direct buffers. Defaults to false, meaning heap
    // buffers are used. A buffer will be allocated per encoding thread. The size of each buffer is the size
    // of the encoded graph node at layer 0, which varies based on graph feature settings.
    SAI_ENCODE_AND_WRITE_VECTOR_GRAPH_IN_PARALLEL_USE_DIRECT_BUFFERS("cassandra.sai.vector.encode_and_write_graph_in_parallel.use_direct_buffers", "false"),
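A minimal sketch of the num_threads=0 convention described above (the helper name is illustrative, not the actual implementation):

```java
public class ParallelEncodeThreads {
    // Sketch of the documented convention: 0 means "use all available
    // processors as reported by the JVM"; a positive value is taken as-is.
    static int effectiveThreads(int configured) {
        return configured > 0 ? configured : Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println(effectiveThreads(4)); // prints 4
        System.out.println(effectiveThreads(0)); // prints the JVM's processor count
    }
}
```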

OnDiskVectorValues and OnDiskVectorValuesWriter

OnDiskVectorValues now lives in its own file and is thread safe, to account for some necessary implementation details within jvector. Added OnDiskVectorValuesWriter to improve test coverage and to abstract away the flush issues associated with BufferedRandomAccessWriter, as described in datastax/jvector#562.

Verification

This PR also introduces new benchmarks as well as improved unit testing. The new benchmarks measure the performance of OnDiskVectorValues and OnDiskVectorValuesWriter to confirm (at least directionally) the time associated with read and write operations.

New tests have been added to verify that, when we iterate over an sstable's rows, each row's vector has a similarity of ~1 to the one stored in the vector graph. This testing is valuable in that it confirms the row id to ordinal mapping is correct at every node. Previously, we relied on recall results to verify this for us. The new pattern lets us confirm every node, which is more thorough and removes most edge cases that might have led to partially correct graphs that still achieved acceptable recall.
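The verification pattern reduces to checking, for every row, that the sstable's vector and the graph's vector at the mapped ordinal are (nearly) identical. A self-contained sketch with illustrative names (the real tests go through SAI internals):

```java
import java.util.Map;
import java.util.function.IntUnaryOperator;

public class MappingVerificationSketch {
    // Cosine similarity; ~1.0 means the two vectors point in (nearly) the same direction.
    static float cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (float) (dot / (Math.sqrt(na) * Math.sqrt(nb)));
    }

    // For each row, the vector stored in the sstable should match the vector the
    // graph holds at the ordinal the mapping under test assigns to that row.
    static void verifyMapping(Map<Integer, float[]> rowVectors,      // rowId -> sstable vector
                              IntUnaryOperator rowIdToOrdinal,       // mapping under test
                              Map<Integer, float[]> graphVectors)    // ordinal -> graph vector
    {
        for (Map.Entry<Integer, float[]> e : rowVectors.entrySet()) {
            int ordinal = rowIdToOrdinal.applyAsInt(e.getKey());
            float sim = cosine(e.getValue(), graphVectors.get(ordinal));
            if (sim < 0.999f)
                throw new AssertionError("bad row id -> ordinal mapping for row " + e.getKey());
        }
    }
}
```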

@github-actions

github-actions Bot commented Oct 8, 2025

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one


VectorCompression.CompressionType compressionType = VectorCompression.CompressionType.values()[reader.readByte()];
if (features.contains(FeatureId.FUSED_ADC))
if (features.contains(FeatureId.FUSED_PQ))

@marianotepper - I just noticed that the features map already has logic that loads the ProductQuantization, meaning this branch currently keeps two identical maps in memory. I think it'd make sense to possibly expose the features map in the OnDiskGraphIndex so we can remove the duplicate cost. Any reason we can't do that?


Looks like we actually already use the one from the header, I just didn't catch it. I think we'll be able to get rid of it with a little extra work in CC.

@michaeljmarshall michaeljmarshall marked this pull request as ready for review November 25, 2025 22:23
michaeljmarshall and others added 4 commits December 1, 2025 14:01
built from the branch in this
PR: datastax/jvector#588
This is a hack because it increases memory
utilization and may impact other threads
in the CC host. However, this will help
unblock certain test scenarios.
Improve hashCode of the Key to prevent collisions
Implement Comparable<Key> in case the ConcurrentHashMap falls back into tree mode

(cherry picked from commit 091b636)
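The Key improvements described in that commit message could look roughly like this (field names are illustrative, not the actual class):

```java
// Illustrative sketch: a map key with a collision-resistant hashCode and a
// total order, so ConcurrentHashMap bins can treeify under heavy collision.
final class Key implements Comparable<Key> {
    final long segmentId;
    final int ordinal;

    Key(long segmentId, int ordinal) {
        this.segmentId = segmentId;
        this.ordinal = ordinal;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Key)) return false;
        Key k = (Key) o;
        return segmentId == k.segmentId && ordinal == k.ordinal;
    }

    @Override public int hashCode() {
        // Mix both fields rather than xor-ing them directly, to avoid
        // systematic collisions between related (segmentId, ordinal) pairs.
        return 31 * Long.hashCode(segmentId) + ordinal;
    }

    @Override public int compareTo(Key k) {
        int c = Long.compare(segmentId, k.segmentId);
        return c != 0 ? c : Integer.compare(ordinal, k.ordinal);
    }
}
```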
@eolivelli

I have cherry-picked this into this feature branch: #2156

@sonarqubecloud

sonarqubecloud Bot commented Dec 7, 2025

     [java] Benchmark                                         (dimension)  Mode  Cnt    Score    Error  Units
     [java] OnDiskVectorValuesBench.readAllVectorsSequential          384  avgt    5   55.764 ±  1.522  ms/op
     [java] OnDiskVectorValuesBench.readAllVectorsSequential         1536  avgt    5  198.640 ±  3.520  ms/op
     [java] OnDiskVectorValuesBench.writeVectorsToDisk                384  avgt    5  113.735 ±  2.768  ms/op
     [java] OnDiskVectorValuesBench.writeVectorsToDisk               1536  avgt    5  500.743 ± 18.353  ms/op
…allow it

Note that the main motivation for updating this
logic comes from the implementation for the
RandomAccessVectorValues method named
threadLocalSupplier, which makes an inference
based on the result of isValueShared about
whether the class instance can be shared across
threads. Since this one cannot, it is safest
to just make the value shared. Given
how we use the resulting vector at the moment,
it is not a problem to make it shared.
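The inference described above can be sketched as follows (a simplified stand-in interface, not the actual jvector API surface):

```java
import java.util.function.Supplier;

// Simplified sketch: if values are "shared", one instance can safely serve all
// threads; otherwise the supplier hands each thread its own private copy.
interface ShareAwareValues {
    boolean isValueShared();
    ShareAwareValues copy();

    static Supplier<ShareAwareValues> threadLocalSupplier(ShareAwareValues v) {
        if (v.isValueShared())
            return () -> v;  // thread safe: reuse the single instance everywhere
        ThreadLocal<ShareAwareValues> perThread = ThreadLocal.withInitial(v::copy);
        return perThread::get;  // not thread safe: one copy per calling thread
    }
}
```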

     [java] Benchmark                                         (dimension)  Mode  Cnt    Score    Error  Units
     [java] OnDiskVectorValuesBench.readAllVectorsSequential          384  avgt    5   52.610 ±  2.211  ms/op
     [java] OnDiskVectorValuesBench.readAllVectorsSequential         1536  avgt    5  184.552 ±  4.006  ms/op
     [java] OnDiskVectorValuesBench.writeVectorsToDisk                384  avgt    5  113.262 ±  3.906  ms/op
     [java] OnDiskVectorValuesBench.writeVectorsToDisk               1536  avgt    5  498.166 ± 14.529  ms/op
Note that the score is slightly worse, but the error ratio is high. There is certainly some cost to fetching the thread's value from the hash map.

The main motivation here is to prevent leaked
resources. See datastax/jvector#635

     [java] Benchmark                                         (dimension)  Mode  Cnt    Score    Error  Units
     [java] OnDiskVectorValuesBench.readAllVectorsSequential          384  avgt    5   59.925 ±  4.414  ms/op
     [java] OnDiskVectorValuesBench.readAllVectorsSequential         1536  avgt    5  212.512 ± 28.455  ms/op
     [java] OnDiskVectorValuesBench.writeVectorsToDisk                384  avgt    5  117.685 ±  4.208  ms/op
     [java] OnDiskVectorValuesBench.writeVectorsToDisk               1536  avgt    5  526.014 ± 14.489  ms/op
@michaeljmarshall

@pkolaczk - I needed to make a few tweaks to make the RandomAccessVectorValues implementation correct for a PR that I am planning on adding later. Since I wrote this class now, I wanted to make it correct and add a benchmark to demonstrate its performance. Technically, there is a penalty for concurrent access in this implementation, but it seems minimal and worth it for the sake of correctness.

I ran it again, and the results seem
in the same ballpark, but with higher
error bounds.

     [java] Benchmark                                         (dimension)  Mode  Cnt    Score     Error  Units
     [java] OnDiskVectorValuesBench.readAllVectorsSequential          384  avgt    5   61.719 ±  17.843  ms/op
     [java] OnDiskVectorValuesBench.readAllVectorsSequential         1536  avgt    5  217.288 ±  37.908  ms/op
     [java] OnDiskVectorValuesBench.writeVectorsToDisk                384  avgt    5  117.555 ±   5.090  ms/op
     [java] OnDiskVectorValuesBench.writeVectorsToDisk               1536  avgt    5  560.162 ± 251.997  ms/op
@michaeljmarshall michaeljmarshall changed the title CNDB-15527: Add config to use FusedPQ CNDB-15527: Upgrade jvector to 4.0.0-rc.8; make FusedPQ configurable Feb 20, 2026
@michaeljmarshall

Test failures are all timeouts. Merging.

@michaeljmarshall michaeljmarshall merged commit 522e8b6 into main Feb 20, 2026
2 of 4 checks passed
@michaeljmarshall michaeljmarshall deleted the revive-fused-adc branch February 20, 2026 15:24
driftx pushed a commit that referenced this pull request Apr 24, 2026
### What is the issue
Fixes: riptano/cndb#16693
Test: riptano/cndb#16694

### What does this PR fix and why was it fixed
Binary quantization was never used in production. We have a different
quantization coming, but it is not ready yet. It simplifies some coming
PRs to remove this logic eagerly, namely #2042.
driftx pushed a commit that referenced this pull request Apr 24, 2026
…skAnn (#2222)

Fixes: riptano/cndb#15926
CNDB Test PR: riptano/cndb#16685

We do not count the compressed vectors bytes in the
`CassandraDiskAnn#ramBytesUsed()` method, which leads to significantly
under-reported utilization in the metrics. This change fixes the logic
and adds a test that fails without the new logic. Notably, the test
fails with `Expected at least 196608 bytes but got 76` without the
`compressedVectorBytes()` method in place, which just goes to show how
significant of a miss this is.

When we merge #2042, the added test will need to be modified to account
for the reduced memory consumption, but the test is valid for now.
driftx pushed a commit that referenced this pull request Apr 27, 2026
driftx pushed a commit that referenced this pull request Apr 28, 2026

@MarkWolters MarkWolters left a comment


Overall looks good, excited to see these features being put into use.

.with(nvq != null ? new NVQ(nvq) : new InlineVectors(vectorValues.dimension()));

if (compressor instanceof ProductQuantization && JVectorVersionUtil.shouldWriteFused(context.version()))
indexWriterBuilder.with(new FusedPQ(context.getIndexWriterConfig().getAnnMaxDegree(), (ProductQuantization) compressor));

As far as I can tell, compressor is always going to be either null or an instance of ProductQuantization here: it is instantiated on line 473 by calling writePQ, which returns either a ProductQuantization or null. Here there is no null check, and it is also being checked and cast to ProductQuantization, which seems dangerous.


Will take a look, thanks!


Regarding line 531: null instanceof ProductQuantization returns false, so I think we are good. To make it clearer, maybe we can add some assertions?
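For reference, instanceof is null-safe in Java, so the branch is simply skipped when compressor is null (Number stands in for ProductQuantization in this sketch):

```java
public class NullInstanceofDemo {
    // `x instanceof T` evaluates to false when x is null, so the
    // instanceof check doubles as a null check before the cast.
    static boolean isPq(Object compressor) {
        return compressor instanceof Number;
    }

    public static void main(String[] args) {
        System.out.println(isPq(null)); // prints false
        System.out.println(isPq(42));   // prints true
    }
}
```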

}
}

private OnDiskGraphIndexWriter createIndexWriter(File indexFile, long termsOffset, IndexContext context, OrdinalMapper ordinalMapper, VectorCompressor<?> compressor, NVQuantization nvq) throws IOException

CassandraRelevantProperties defines configuration to allow using parallel graph writes, but those properties are not referenced here and the serial writer is instantiated.


Good point. I was reading in the PR:

Another experimental feature introduced in this commit via the jvector upgrade is parallel graph encoding and writing to disk. Writing the fused graph requires increased CPU time to encode the graph node and we write more bytes to disk, so this parallelism is likely necessary to keep vector index creation/compaction times down. The key configurations available with their associated defaults:

From "Writing the fused graph requires increased CPU time" I understand we would want to enable this config. I also made a note, while looking into the testing, to check where this is tested. Thanks for flagging!


@MarkWolters, are the parallel graph writes still experimental, as the comments say, or are they in the same boat as FusedPQ: your test framework was fixed, you got good results for both, and it is OK to potentially enable them by default after we finalize testing on our end too?


@ekaterinadimitrova2 I would categorize them as still experimental. The feature has been tested internally, but not to the extent of fused PQ. I believe it is ready to use; I'd just feel more comfortable after it goes through some testing once integrated, to validate the correctness of the written graphs.


@ekaterinadimitrova2 ekaterinadimitrova2 May 1, 2026


I see,

I'd just feel more comfortable once it goes through some testing once integrated

You mean in Cassandra/Astra, I guess, and not the additional JVector testing you plan? Did I get that right?

}

@TearDown(Level.Trial)
public void tearDown()

Why 2 teardowns with different capitalization?

{
// we need a long timeout because we are adding many vectors
String index = createIndexAsync("CREATE CUSTOM INDEX ON %s(val) USING 'StorageAttachedIndex' WITH OPTIONS = {'similarity_function' : 'euclidean'}");
String index = createIndexAsync("CREATE CUSTOM INDEX ON %s(val) USING 'StorageAttachedIndex' WITH OPTIONS = {'similarity_function' : 'euclidean', 'enable_hierarchy': 'true'}");

For the future I think it would be more thorough coverage to test both with hierarchy and without


@Override
public void processNeighbors(int i, int i1, ScoreFunction scoreFunction, ImmutableGraphIndex.IntMarker intMarker, ImmutableGraphIndex.NeighborProcessor neighborProcessor)
{

Is this intentionally empty? If so should there be a comment to that effect?
