feat: [NCS] Integrate NCS support into DiskANN#1418
feat: [NCS] Integrate NCS support into DiskANN#1418ronmarcus wants to merge 5 commits intozilliztech:mainfrom
Conversation
This commit updates DiskANN to support Near-Compute Storage (NCS), enabling index data to be served from a shared KV storage tier. Key changes: - Refactored `PQFlashIndex` to use an abstract `IndexReader` interface, replacing direct file I/O. - Implemented `NCSReader` to fetch index nodes via the `milvus-common` NCS connector. - Implemented `FileIndexReader` to maintain backward compatibility with local file storage. - Added `NcsUpload` to the Index class, enabling the upload of local index files to the NCS tier. - Updated `DiskANN` loading and searching logic to support key-based data retrieval (`ReadReq`). - Added Python bindings and unit tests for NCS-enabled DiskANN. This allows DiskANN to scale independently of local storage constraints by leveraging the NCS tier. issue: milvus-io/milvus#45178
|
Welcome @ronmarcus! It looks like this is your first PR to zilliztech/knowhere 🎉 |
|
@ronmarcus 🔍 Important: PR Classification Needed! For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:
For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”. Thanks for your efforts and contribution to the community!. |
|
@ronmarcus it seems that unit tests are failing. Could you please take a look whenever you have time? :) |
- Fix buffers allocation in DiskANNIndexNode::NcsUpload()
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: ronmarcus The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
- Fix NCSReader thread-safty by using thread_local NcsConnector - Add concurency unit test for NCSReader - Update milvus-common hash
@alexanderguzhva thanks for pointing this out. |
conanfile.py
Outdated
| self.requires("glog/0.6.0") | ||
| self.requires("nlohmann_json/3.11.2") | ||
| self.requires("openssl/1.1.1t") | ||
| self.requires("hiredis/1.2.0") |
There was a problem hiding this comment.
where is this used exactly?
There was a problem hiding this comment.
This was a duplicated requirment, I removed it.
I answered below where Hiredis is used.
| set( MILVUS-COMMON-VERSION b6629f7 ) | ||
| set( GIT_REPOSITORY "https://github.com/zilliztech/milvus-common.git" ) | ||
| set( MILVUS-COMMON-VERSION b0a5e9b ) | ||
| set( GIT_REPOSITORY "https://github.com/ronmarcus/milvus-common.git" ) |
There was a problem hiding this comment.
please fix this one appropriately to https://github.com/zilliztech/milvus-common.git or let us know about the proposed change there
There was a problem hiding this comment.
I have three related PRs across three repositories: milvus-common, knowhere, and milvus. The high-level idea behind them is described in the linked GitHub issue.
From the point of view of DiskANN, NCS (Near Compute Storage) is an alternative to a locally attached SSD for storing index data. Specifically, NCS can replace local storage when querying the index; the build process remains unchanged.
The proposed change in milvus-common provides the NCS infrastructure, including:
Ncsclass for NCS bucket managementNcsConnectorclass for writing and reading to/from the NCS backend
NCS is used by both knowhere (to access the data) and milvus (the coordinator manages NCS). Thus, we added the NCS infrastructure to milvus-common.
| self.requires("libcurl/8.2.1") | ||
| self.requires("simde/0.8.2") | ||
| self.requires("xxhash/0.8.3") | ||
| self.requires("hiredis/1.2.0") |
There was a problem hiding this comment.
where is this used exactly?
There was a problem hiding this comment.
Hiredis is a client library for the Redis database. It is used by diskann when the Redis implementation of NCS is enabled. Specifically, Hiredis is used in the RedisNcsConnector and RedisNcs classes in milvus-common, which are used by diskann:
- diskann uses the
IndexReaderabstract class, which has two implementations, one of which isNCSReader. This uses theNcsConnectorabstract class, withRedisNcsConnectoras one implementation. - In the unit tests under
test_diskann.cc, theNcsabstract class is used for NCS bucket management, withRedisNcsas one of its implementations.
- Remove duplicate requirment in conanfile.py - Add a redis service to the 'ut' github worklow, to support redis NCS unit test - Fix memory leaks in unit tests
tests/ut/test_diskann.cc
Outdated
| build_config["search_cache_budget_gb"] = 6.000000212225132e-06; | ||
| build_config["search_cache_budget_gb_ratio"] = 0.10000000149011612; | ||
| build_config["build_dram_budget_gb"] = 503.04913330078125; | ||
| build_config["num_build_thread"] = 80; |
There was a problem hiding this comment.
this triggers problems on our CI, bcz we use a github machine with 4 CPUs only and a proportional amount of RAM (16 GB, I think) :(
There was a problem hiding this comment.
I changed the test to use a typical config
- Refactor NCS unit tests to use typical config - Refactor diskann to use the updated NcsConnector API using boost:span
| std::vector<std::string> filenames; | ||
| auto pq_pivots_filename = diskann::get_pq_pivots_filename(prefix); | ||
| auto disk_index_filename = diskann::get_disk_index_filename(prefix); | ||
| auto disk_index_metadata_filename = diskann::get_disk_index_metadata_filename(prefix); |
There was a problem hiding this comment.
It looks like this changes the file-naming scheme, which could cause serious compatibility issues. Please consider how the new code will load existing indexes, and how compatibility during upgrades will be ensured. Knowhere provides a version mechanism to guarantee compatibility—please make sure the PR has taken this into account.
| auto disk_index_filename = diskann::get_disk_index_filename(prefix); | ||
| filenames.push_back(diskann::get_disk_index_centroids_filename(disk_index_filename)); | ||
| filenames.push_back(diskann::get_disk_index_medoids_filename(disk_index_filename)); | ||
| filenames.push_back(diskann::get_disk_index_centroids_filename(prefix)); |
|
|
||
| // load diskann pq code and meta info | ||
| std::shared_ptr<AlignedFileReader> reader = nullptr; | ||
| const milvus::NcsDescriptor* descriptor = nullptr; |
There was a problem hiding this comment.
How do we guarantee consistency between upload and load? It looks like this relies on configuration, but if the configuration is wrong or missing, it could cause a serious load failure. We need to validate this consistency before loading.
This commit updates DiskANN to support Near-Compute Storage (NCS), enabling index data to be served from a shared KV storage tier.
Key changes:
PQFlashIndexto use an abstractIndexReaderinterface, replacing direct file I/O.NCSReaderto fetch index nodes via themilvus-commonNCS connector.FileIndexReaderto maintain backward compatibility with local file storage.NcsUploadto the Index class, enabling the upload of local index files to the NCS tier.DiskANNloading and searching logic to support key-based data retrieval (ReadReq).This allows DiskANN to scale independently of local storage constraints by leveraging the NCS tier.
issue: milvus-io/milvus#45178