Add HNSW Layered Index Support by julianmi · Pull Request #2148 · rapidsai/cuvs

julianmi · 2026-06-01T08:36:25Z

The disk-backed ACE algorithm partitions the dataset when it builds the CAGRA graph, so the resulting graph uses the reordered index space. Building an HNSW index with hnsw::from_cagra therefore requires both the reordered dataset and the CAGRA graph, which means downstream consumers would need the reordered dataset as well. That dataset is typically large whenever the disk-backed ACE algorithm is needed. Building only the layers of the HNSW index, without the dataset, and shipping that artifact to the search node minimizes network transfers for downstream consumers that already have the original dataset locally. The hnsw::deserialize step then combines the layered index with the local dataset to form an hnswlib-compatible search index.

Artifact Layout

hnsw_index.cuvs
  fixed_header
    magic = CUVS_HNSW_LAYERED
    version
    metadata_offset
    metadata_size

  metadata_json
    dataset shape, dtype, metric
    hnsw parameters
    section sizes
    upper-layer descriptors

  levels
    uint8[n_rows]
    indexed by original dataset row ID

  base_nodes
    uint32[n_rows]
    maps each base topology row to original row ID

  base_links
    n_rows fixed-size hnswlib-ready rows
    [count:uint32][neighbors:uint32[maxM0]]
    neighbors are original row IDs

  upper_nodes
    concatenated uint32 original row IDs for layers 1..maxlevel

  upper_links
    fixed-size hnswlib-ready rows
    [count:uint32][neighbors:uint32[maxM]]
    neighbors are original row IDs
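
To make the layout concrete, here is a minimal Python sketch of the fixed header and of the section sizes implied by the row formats above. The layout lists the header fields but not their widths, so the 32-byte NUL-padded magic, the uint32 version, the uint64 offset and size, and the little-endian byte order are all assumptions for illustration. The helper names are hypothetical, and the sketch also assumes that upper_links holds exactly one row per upper_nodes entry.

```python
import struct

# Assumed encoding: the layout names these fields but does not give their widths.
MAGIC = b"CUVS_HNSW_LAYERED"
# magic (32 bytes, NUL-padded), version (uint32), metadata_offset, metadata_size (uint64)
HEADER = struct.Struct("<32sIQQ")


def pack_header(version, metadata_offset, metadata_size):
    """Serialize the fixed header (hypothetical field widths)."""
    return HEADER.pack(MAGIC, version, metadata_offset, metadata_size)


def parse_header(buf):
    """Parse the fixed header and reject files with a different magic."""
    magic, version, offset, size = HEADER.unpack_from(buf)
    if magic.rstrip(b"\0") != MAGIC:
        raise ValueError("not a layered HNSW artifact")
    return {"version": version, "metadata_offset": offset, "metadata_size": size}


def link_row_bytes(max_degree):
    """Bytes in one [count:uint32][neighbors:uint32[max_degree]] row."""
    return 4 * (1 + max_degree)


def expected_section_sizes(n_rows, max_m0, max_m, n_upper_entries):
    """Section sizes in bytes implied by the row formats in the layout."""
    return {
        "levels": n_rows,                                   # uint8[n_rows]
        "base_nodes": 4 * n_rows,                           # uint32[n_rows]
        "base_links": n_rows * link_row_bytes(max_m0),
        "upper_nodes": 4 * n_upper_entries,
        # Assumes one link row per upper_nodes entry.
        "upper_links": n_upper_entries * link_row_bytes(max_m),
    }
```

Because every link row has a fixed size, a loader can compare these expected sizes against the section sizes recorded in the metadata before it allocates anything.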

Layered HNSW Serialization

The layered serializer creates hnsw_index.cuvs from the disk-backed ACE graph.

Create the .cuvs file and write the fixed header and metadata JSON.
Generate HNSW levels in original ID space.
Write levels sequentially.
Read dataset_mapping.npy sequentially into reordered_to_original.
Read cagra_graph.npy source-sequentially in ACE reordered row order.
For each ACE graph row:
- write base_nodes[row] = reordered_to_original[ace_reordered_row]
- convert each neighbor from ACE reordered ID to original ID
- write a padded hnswlib-ready row to base_links[row]
Gather promoted vectors from the original dataset.
Build upper-layer graphs using temporary HNSW promoted order.
Write upper_nodes and upper_links with node IDs and neighbor IDs converted back to original IDs.

This keeps remapping, link padding, and upper-layer KNN work on the build node.
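
The base-layer part of these steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code in this PR: the function names are hypothetical, the graph and mapping are plain in-memory lists instead of cagra_graph.npy and dataset_mapping.npy, and unused neighbor slots are assumed to be zero-padded.

```python
import struct


def pad_link_row(neighbors, max_degree):
    """Encode one [count:uint32][neighbors:uint32[max_degree]] row.

    Unused neighbor slots are filled with zeros (an assumption of this sketch).
    """
    if len(neighbors) > max_degree:
        raise ValueError("too many neighbors for this layer")
    padded = list(neighbors) + [0] * (max_degree - len(neighbors))
    return struct.pack(f"<I{max_degree}I", len(neighbors), *padded)


def build_base_sections(ace_graph, reordered_to_original, max_m0):
    """Walk the ACE graph in reordered row order and emit base_nodes/base_links.

    ace_graph[r] holds the neighbors of reordered row r, as reordered IDs.
    reordered_to_original[r] is the original row ID of reordered row r.
    """
    base_nodes = bytearray()
    base_links = bytearray()
    for reordered_row, reordered_neighbors in enumerate(ace_graph):
        # The row's own ID, translated to original ID space.
        base_nodes += struct.pack("<I", reordered_to_original[reordered_row])
        # Each neighbor ID, translated to original ID space.
        original_neighbors = [reordered_to_original[n] for n in reordered_neighbors]
        base_links += pad_link_row(original_neighbors, max_m0)
    return bytes(base_nodes), bytes(base_links)
```

Both outputs are appended in source order, so the writes to the artifact stay sequential even though the original IDs in base_nodes are not sorted.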

Deserialization

The search node reads:

hnsw_index.cuvs
the external original-order dataset from index_params.dataset_path

The loader:

Reads the fixed header and metadata.
Validates artifact shape, section sizes, and dataset shape.
Reads levels sequentially.
Allocates hnswlib storage.
Reads the external dataset sequentially in original row order.
Initializes hnswlib with:
- internal ID = original ID
- label = original ID
- level = levels[original_id]
Reads base_nodes and base_links sequentially.
Copies each base link row into get_linklist0(base_node_id).
Reads upper_nodes and upper_links sequentially by layer.
Copies each upper link row into get_linklist(node_id, level).

The search node does no graph remapping, no level generation, no link padding, and no KNN work.
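
For illustration, the base-layer part of the loader could look like the Python sketch below. It does not call hnswlib: a plain list stands in for the level-0 link storage that get_linklist0 addresses, the function names are hypothetical, and the little-endian row encoding is an assumption. Section sizes are checked before any storage is allocated.

```python
import struct


def load_base_links(base_nodes, base_links, n_rows, max_m0):
    """Copy prepared link rows into per-node storage indexed by original ID."""
    row_bytes = 4 * (1 + max_m0)  # [count:uint32][neighbors:uint32[max_m0]]
    # Validate section sizes before allocating anything.
    if len(base_nodes) != 4 * n_rows:
        raise ValueError("base_nodes size does not match n_rows")
    if len(base_links) != n_rows * row_bytes:
        raise ValueError("base_links size does not match n_rows and maxM0")

    linklist0 = [None] * n_rows  # stand-in for hnswlib's level-0 link storage
    for row in range(n_rows):
        # Sequential read of the node ID and its link row.
        (node_id,) = struct.unpack_from("<I", base_nodes, 4 * row)
        if node_id >= n_rows:
            raise ValueError("base node ID out of range")
        # Scatter write in memory; the row is copied verbatim, with no remapping.
        linklist0[node_id] = base_links[row * row_bytes:(row + 1) * row_bytes]
    return linklist0


def neighbors_of(link_row):
    """Decode the valid neighbors of one stored link row."""
    (count,) = struct.unpack_from("<I", link_row)
    return list(struct.unpack_from(f"<{count}I", link_row, 4))
```

The only non-sequential access is the in-memory write to linklist0[node_id]; both input sections are consumed front to back.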

Disk Access Patterns

Build node:

Sequential scan of the original dataset for ACE partitioning.
Buffered partition writes for reordered and augmented datasets.
Contiguous per-partition reads from reordered_dataset.npy and augmented_dataset.npy.
Source-sequential reads from cagra_graph.npy when creating the final layered artifact.
Sequential writes to hnsw_index.cuvs.

Search node:

Sequential reads from hnsw_index.cuvs.
Sequential reads from the external original-order dataset.
Scatter writes only into in-memory hnswlib link storage by original ID.

Runtime Requirements

Only hnsw_index.cuvs is copied to the search node. ACE temporary files stay on the build node.

The search node must have the original dataset in original row order and must provide that path through index_params.dataset_path.

Misc

Unifies the logging format of the ACE algorithm.

- Add base node IDs for sequential access.
- Scattered writes happen only in the deserialization step, using host memory.

copy-pr-bot · 2026-06-01T08:36:28Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


julianmi added 6 commits June 1, 2026 10:15

Add HNSW layered hierarchy

52997da

Improve deserialization logging

a11d9ee

Use ace prefix in benchmarking consistently

a95b0e0

Validate metadata before allocating

47e2d85

Store layered base topology by original node ID

21ce339


Unify the ACE logging format

4514bc8

github-project-automation Bot added this to Unstructured Data Processing Jun 1, 2026

tfeher requested a review from mfoerste4 June 1, 2026 09:49

Merge branch 'main' into hnsw-layered-index

0e9accd

julianmi wants to merge 7 commits into rapidsai:main from julianmi:hnsw-layered-index
