
feat: [NCS] Integrate NCS support into DiskANN #1418

Open
ronmarcus wants to merge 5 commits into zilliztech:main from ronmarcus:ncs


@ronmarcus

This commit updates DiskANN to support Near-Compute Storage (NCS), enabling index data to be served from a shared KV storage tier.

Key changes:

  • Refactored PQFlashIndex to use an abstract IndexReader interface, replacing direct file I/O.
  • Implemented NCSReader to fetch index nodes via the milvus-common NCS connector.
  • Implemented FileIndexReader to maintain backward compatibility with local file storage.
  • Added NcsUpload to the Index class, enabling the upload of local index files to the NCS tier.
  • Updated DiskANN loading and searching logic to support key-based data retrieval (ReadReq).
  • Added Python bindings and unit tests for NCS-enabled DiskANN.

This allows DiskANN to scale independently of local storage constraints by leveraging the NCS tier.
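
As a rough illustration of the reader abstraction listed above, the sketch below defines an abstract reader with one file-backed and one key-value-backed implementation. All names here (`IndexReaderSketch`, `FileReaderSketch`, `KvReaderSketch`, and the fields of `ReadRequest`) are hypothetical and are not the PR's actual `IndexReader`, `FileIndexReader`, `NCSReader`, or `ReadReq` definitions. An in-memory map stands in for the NCS tier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical request: one read, addressable by byte offset or by key.
struct ReadRequest {
    uint64_t key;     // logical key for KV-backed reads
    uint64_t offset;  // byte offset for file-backed reads
    size_t len;       // number of bytes to read
    char* buf;        // destination buffer, at least `len` bytes
};

// Hypothetical abstract reader; the search path would depend only on this.
class IndexReaderSketch {
 public:
    virtual ~IndexReaderSketch() = default;
    // Returns true if every request was fully satisfied.
    virtual bool Read(std::vector<ReadRequest>& reqs) = 0;
};

// File-backed implementation: uses `offset` and ignores `key`.
class FileReaderSketch : public IndexReaderSketch {
 public:
    explicit FileReaderSketch(std::string path) : path_(std::move(path)) {}

    bool Read(std::vector<ReadRequest>& reqs) override {
        std::ifstream in(path_, std::ios::binary);
        if (!in) return false;
        for (auto& r : reqs) {
            in.seekg(static_cast<std::streamoff>(r.offset));
            in.read(r.buf, static_cast<std::streamsize>(r.len));
            if (static_cast<size_t>(in.gcount()) != r.len) return false;
        }
        return true;
    }

 private:
    std::string path_;
};

// KV-backed implementation: uses `key` and ignores `offset`.
// The in-memory map is a stand-in for a remote storage tier.
class KvReaderSketch : public IndexReaderSketch {
 public:
    void Put(uint64_t key, std::string value) { store_[key] = std::move(value); }

    bool Read(std::vector<ReadRequest>& reqs) override {
        for (auto& r : reqs) {
            auto it = store_.find(r.key);
            if (it == store_.end() || it->second.size() < r.len) return false;
            std::memcpy(r.buf, it->second.data(), r.len);
        }
        return true;
    }

 private:
    std::unordered_map<uint64_t, std::string> store_;
};
```

With this shape, calling code holds an `IndexReaderSketch&` and does not need to know which backend serves the bytes, which is the property that lets a local-file path and a KV path coexist.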

issue: milvus-io/milvus#45178

@sre-ci-robot
Collaborator

Welcome @ronmarcus! It looks like this is your first PR to zilliztech/knowhere 🎉

@mergify

mergify bot commented Jan 1, 2026

@ronmarcus 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

  1. If you're fixing a bug, label it as kind/bug.
  2. For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
  3. Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
  4. Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!

@alexanderguzhva
Collaborator

@ronmarcus it seems that unit tests are failing. Could you please take a look whenever you have time? :)

- Fix buffer allocation in DiskANNIndexNode::NcsUpload()
@sre-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ronmarcus
To complete the pull request process, please assign alexanderguzhva after the PR has been reviewed.
You can assign the PR to them by writing /assign @alexanderguzhva in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

  - Fix NCSReader thread safety by using a thread_local NcsConnector
  - Add concurrency unit test for NCSReader
  - Update milvus-common hash
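
The thread-safety fix named in the commits above can be pictured with a small sketch of the `thread_local` pattern: each thread lazily gets its own connector, so unsynchronized connector state is never shared. `ConnectorSketch`, `ThreadConnector`, and `RunConcurrentReads` are hypothetical names for illustration and do not reflect the real `NcsConnector` API in milvus-common.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a connector that is not safe to share across threads.
class ConnectorSketch {
 public:
    ConnectorSketch() { ++created_count(); }

    std::string Get(const std::string& key) {
        ++calls_;  // unsynchronized state: safe only because each thread owns its instance
        return "value-for-" + key;
    }

    int calls() const { return calls_; }

    // Counts how many connectors have been constructed process-wide.
    static std::atomic<int>& created_count() {
        static std::atomic<int> n{0};
        return n;
    }

 private:
    int calls_ = 0;
};

// One connector per thread, created on first use in that thread.
ConnectorSketch& ThreadConnector() {
    thread_local ConnectorSketch conn;
    return conn;
}

// Runs `num_threads` workers that each issue `reads_per_thread` reads and
// returns the total number of reads that completed.
int RunConcurrentReads(int num_threads, int reads_per_thread) {
    std::atomic<int> total{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&total, reads_per_thread] {
            for (int i = 0; i < reads_per_thread; ++i) {
                ThreadConnector().Get("node-" + std::to_string(i));
            }
            total += ThreadConnector().calls();
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}
```

The trade-off of this pattern is one connection per reader thread instead of one shared connection, in exchange for not needing a lock on the read path.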
@ronmarcus
Author

> @ronmarcus it seems that unit tests are failing. Could you please take a look whenever you have time? :)

@alexanderguzhva thanks for pointing this out.
Fixed in commit 8347129.

conanfile.py Outdated
self.requires("glog/0.6.0")
self.requires("nlohmann_json/3.11.2")
self.requires("openssl/1.1.1t")
self.requires("hiredis/1.2.0")
Collaborator

where is this used exactly?

Author

This was a duplicated requirement, so I removed it.
I answered below where Hiredis is used.

- set( MILVUS-COMMON-VERSION b6629f7 )
- set( GIT_REPOSITORY "https://github.com/zilliztech/milvus-common.git" )
+ set( MILVUS-COMMON-VERSION b0a5e9b )
+ set( GIT_REPOSITORY "https://github.com/ronmarcus/milvus-common.git" )
Collaborator

please fix this one appropriately to https://github.com/zilliztech/milvus-common.git or let us know about the proposed change there

Author

I have three related PRs across three repositories: milvus-common, knowhere, and milvus. The high-level idea behind them is described in the linked GitHub issue.

From the point of view of DiskANN, NCS (Near Compute Storage) is an alternative to a locally attached SSD for storing index data. Specifically, NCS can replace local storage when querying the index; the build process remains unchanged.

The proposed change in milvus-common provides the NCS infrastructure, including:

  • Ncs class for NCS bucket management
  • NcsConnector class for writing and reading to/from the NCS backend

NCS is used by both knowhere (to access the data) and milvus (the coordinator manages NCS). Thus, we added the NCS infrastructure to milvus-common.

self.requires("libcurl/8.2.1")
self.requires("simde/0.8.2")
self.requires("xxhash/0.8.3")
self.requires("hiredis/1.2.0")
Collaborator

where is this used exactly?

Author

Hiredis is a client library for the Redis database. It is used by diskann when the Redis implementation of NCS is enabled. Specifically, Hiredis is used in the RedisNcsConnector and RedisNcs classes in milvus-common, which are used by diskann:

  • diskann uses the IndexReader abstract class, which has two implementations, one of which is NCSReader. This uses the NcsConnector abstract class, with RedisNcsConnector as one implementation.
  • In the unit tests under test_diskann.cc, the Ncs abstract class is used for NCS bucket management, with RedisNcs as one of its implementations.

  - Remove duplicate requirement in conanfile.py
  - Add a redis service to the 'ut' github workflow, to support redis NCS unit test
  - Fix memory leaks in unit tests
build_config["search_cache_budget_gb"] = 6.000000212225132e-06;
build_config["search_cache_budget_gb_ratio"] = 0.10000000149011612;
build_config["build_dram_budget_gb"] = 503.04913330078125;
build_config["num_build_thread"] = 80;
Collaborator

this triggers problems on our CI, because we use a github machine with 4 CPUs only and a proportional amount of RAM (16 GB, I think) :(

Author

I changed the test to use a typical config
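
One way to keep a test configuration portable across machines of different sizes, sketched here under assumptions, is to cap the requested thread count at what the host reports. `ClampBuildThreads` is a hypothetical helper and not part of knowhere; the PR itself only states that the test now uses a typical config.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: caps a requested thread count at the number of
// hardware threads available. std::thread::hardware_concurrency() may
// return 0 when the value cannot be determined, so fall back to 1.
unsigned ClampBuildThreads(unsigned requested,
                           unsigned available = std::thread::hardware_concurrency()) {
    if (available == 0) available = 1;
    // Never return 0, even if the caller requested 0 threads.
    return std::max(1u, std::min(requested, available));
}
```

A test could then set something like `build_config["num_build_thread"] = ClampBuildThreads(80);`, which would resolve to 4 on a 4-CPU runner. Memory budgets would need a similar host-aware cap, which this sketch does not cover.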

  - Refactor NCS unit tests to use typical config
  - Refactor diskann to use the updated NcsConnector API using boost::span
std::vector<std::string> filenames;
auto pq_pivots_filename = diskann::get_pq_pivots_filename(prefix);
auto disk_index_filename = diskann::get_disk_index_filename(prefix);
auto disk_index_metadata_filename = diskann::get_disk_index_metadata_filename(prefix);
Collaborator

It looks like this changes the file-naming scheme, which could cause serious compatibility issues. Please consider how the new code will load existing indexes, and how compatibility during upgrades will be ensured. Knowhere provides a version mechanism to guarantee compatibility—please make sure the PR has taken this into account.
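
To make the compatibility concern concrete, here is a minimal sketch of version-gated file naming, where indexes built before a cutoff keep resolving to their legacy names. The constant `kFirstVersionWithNewNames`, the function `CentroidsFilenameSketch`, and the suffixes are all hypothetical; they are not Knowhere's actual version API or DiskANN's real file names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical version at which the naming scheme changes; not a real Knowhere constant.
constexpr int32_t kFirstVersionWithNewNames = 100;

// Illustrative suffixes only; they are not claimed to match DiskANN's real file names.
std::string CentroidsFilenameSketch(const std::string& prefix, int32_t index_version) {
    if (index_version < kFirstVersionWithNewNames) {
        // Legacy layout: name derived from the disk index file name.
        return prefix + "_disk.index" + "_centroids.bin";
    }
    // New layout: name derived directly from the prefix.
    return prefix + "_centroids.bin";
}
```

Gating on the version recorded with the index means an upgrade does not require rewriting or renaming files that already exist.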

auto disk_index_filename = diskann::get_disk_index_filename(prefix);
filenames.push_back(diskann::get_disk_index_centroids_filename(disk_index_filename));
filenames.push_back(diskann::get_disk_index_medoids_filename(disk_index_filename));
filenames.push_back(diskann::get_disk_index_centroids_filename(prefix));
Collaborator

same compatibility issue


// load diskann pq code and meta info
std::shared_ptr<AlignedFileReader> reader = nullptr;
const milvus::NcsDescriptor* descriptor = nullptr;
Collaborator

How do we guarantee consistency between upload and load? It looks like this relies on configuration, but if the configuration is wrong or missing, it could cause a serious load failure. We need to validate this consistency before loading.
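
A pre-load check along these lines could look like the sketch below, which assumes the upload step stores a small manifest next to the data and the loader compares it with what it expects before reading anything else. `IndexManifestSketch` and `CheckLoadConsistency` are hypothetical and do not correspond to `milvus::NcsDescriptor` or any existing knowhere type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical manifest recorded at upload time and stored alongside the index data.
struct IndexManifestSketch {
    std::string index_id;
    int32_t index_version;
    uint64_t num_nodes;
};

// Compares the manifest found in storage (nullptr if none was found) with the
// loader's expectation. Returns an empty string when they agree, otherwise a
// short description of the first mismatch.
std::string CheckLoadConsistency(const IndexManifestSketch* uploaded,
                                 const IndexManifestSketch& expected) {
    if (uploaded == nullptr) return "no upload manifest found";
    if (uploaded->index_id != expected.index_id) return "index id mismatch";
    if (uploaded->index_version != expected.index_version) return "index version mismatch";
    if (uploaded->num_nodes != expected.num_nodes) return "node count mismatch";
    return "";
}
```

Failing fast with a specific message at this point turns a wrong or missing configuration into a clear load error instead of a failure partway through the search path.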


4 participants