Unified index format V1 #1221

Open
Sanhaoji2 wants to merge 1 commit into jegao/LabelHotFix from jegao/UnifyIndexFormat

@Sanhaoji2

Contributor

Add a single-file unified index container (docs/unified_index_format.md) that can be built once and loaded as either an in-memory or SSD-served index:

  • New class hierarchy: unified_index (interface) + unified_index_base<T> with unified_index_memory<T> / unified_index_ssd<T>, backed by unified_node_store and unified_label_data (bitmask + integer encodings).
  • UnifiedIndexWriter/Reader (unified_index_io) and unified_index_builder for end-to-end build with optional PQ.
  • Index::save_unified emits the container; get_table_stats() exposed on the unified index (mirrors Index/PQFlashIndex).
  • Build the unit tests against a new static diskann_s lib (DISKANN_STATIC_LIB) so internal symbols need not be exported from the DLL.
  • Tests: node-store/label/factory/memory/ssd/builder suites plus legacy<->unified parity (memory, SSD, filtered) and get_table_stats coverage.
  • Fix: populate _label_map during filtered build so save_unified after build emits a valid label dictionary.
  • Does this PR have a descriptive title that could go in our release notes?
  • Does this PR add any new dependencies?
  • Does this PR modify any existing APIs?
  • Is the change to the API backwards compatible?
  • Should this result in any changes to our documentation, either updating existing docs or adding new ones?
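
For readers unfamiliar with the two label encodings named in the description ("bitmask + integer encodings"), a minimal sketch follows. It is illustrative only: `encode_bitmask`, `encode_integers`, and the two membership helpers are hypothetical names, and the actual layout of unified_label_data is whatever docs/unified_index_format.md specifies, not what is shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the PR's unified_label_data): encode a node's
// label IDs as a bitmask, one bit per label. Bit i is set iff label i is present.
inline std::vector<uint64_t> encode_bitmask(const std::vector<uint32_t> &labels, uint32_t num_labels)
{
    std::vector<uint64_t> words((num_labels + 63) / 64, 0);
    for (uint32_t id : labels)
    {
        assert(id < num_labels);
        words[id / 64] |= (uint64_t{1} << (id % 64));
    }
    return words;
}

// Membership test for the bitmask encoding: a shift and a mask.
inline bool bitmask_has(const std::vector<uint64_t> &words, uint32_t id)
{
    const size_t w = id / 64;
    return w < words.size() && ((words[w] >> (id % 64)) & 1u) != 0;
}

// Hypothetical integer encoding: a sorted, de-duplicated list of label IDs.
inline std::vector<uint32_t> encode_integers(std::vector<uint32_t> labels)
{
    std::sort(labels.begin(), labels.end());
    labels.erase(std::unique(labels.begin(), labels.end()), labels.end());
    return labels;
}

// Membership test for the integer encoding: binary search over the sorted IDs.
inline bool integers_has(const std::vector<uint32_t> &ids, uint32_t id)
{
    return std::binary_search(ids.begin(), ids.end(), id);
}
```

The usual trade-off, under these assumptions: the bitmask costs a fixed `ceil(num_labels / 64)` words per node regardless of how many labels the node has, so it suits small label universes; the integer list costs space proportional to the node's own label count, so it suits large universes with few labels per node.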

Reference Issues/PRs

What does this implement/fix? Briefly explain your changes.

Any other comments?

Add a single-file unified index container (docs/unified_index_format.md) that
can be built once and loaded as either an in-memory or SSD-served index:

- New class hierarchy: unified_index (interface) + unified_index_base<T> with
  unified_index_memory<T> / unified_index_ssd<T>, backed by unified_node_store
  and unified_label_data (bitmask + integer encodings).
- UnifiedIndexWriter/Reader (unified_index_io) and unified_index_builder for
  end-to-end build with optional PQ.
- Index::save_unified emits the container; get_table_stats() exposed on the
  unified index (mirrors Index/PQFlashIndex).
- Build the unit tests against a new static diskann_s lib (DISKANN_STATIC_LIB)
  so internal symbols need not be exported from the DLL.
- Tests: node-store/label/factory/memory/ssd/builder suites plus legacy<->unified
  parity (memory, SSD, filtered) and get_table_stats coverage.
- Fix: populate _label_map during filtered build so save_unified after build
  emits a valid label dictionary.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@hildebrandmw added the C++ label (Pull Request targeting C++) on Jul 2, 2026
throw ANNException("unified_node_store_base: offset table size mismatch", -1, __FUNCSIG__, __FILE__,
__LINE__);
}
_coord_bytes = _header.aligned_dim * sizeof(T);

Keep this consistent with how coord_bytes is computed on the write side.
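
One way to act on this, sketched under assumptions: route both the writer and the reader through a single helper so the per-vector byte count cannot drift between the two paths. `coord_bytes` and `round_up` below are hypothetical names, and rounding the dimension up to a multiple of 8 is an assumption for illustration, not something this diff confirms.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round x up to the next multiple of `multiple`.
constexpr uint64_t round_up(uint64_t x, uint64_t multiple)
{
    return ((x + multiple - 1) / multiple) * multiple;
}

// Hypothetical shared helper: bytes occupied by one stored vector.
// If the writer and the reader both call this with the header's aligned_dim,
// the stride is defined in exactly one place.
template <typename T> constexpr size_t coord_bytes(uint64_t aligned_dim)
{
    return static_cast<size_t>(aligned_dim) * sizeof(T);
}
```

With this shape, the read-side line would become something like `_coord_bytes = coord_bytes<T>(_header.aligned_dim);`, and the writer would use the same call when sizing each record.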

Comment thread src/index.cpp
static_cast<uint64_t>(_start));

writer.begin_graph_region();
std::vector<T> vec(dim);

Should this be dim or aligned_dim? Keep the write and read paths consistent.
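
To make the concern concrete, here is a sketch of a write/read pair that agree on the stride. It assumes the container stores each vector at an aligned_dim stride, which the reader's `_coord_bytes = _header.aligned_dim * sizeof(T)` suggests but this diff does not confirm. `write_coords` and `read_coords` are hypothetical names, and a byte vector stands in for the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical writer: append one vector to `out`, zero-padded from dim to
// aligned_dim so every record occupies aligned_dim * sizeof(T) bytes.
template <typename T>
void write_coords(std::vector<uint8_t> &out, const T *src, size_t dim, size_t aligned_dim)
{
    assert(dim <= aligned_dim);
    std::vector<T> padded(aligned_dim, T{}); // sized by aligned_dim, not dim
    std::memcpy(padded.data(), src, dim * sizeof(T));
    const auto *bytes = reinterpret_cast<const uint8_t *>(padded.data());
    out.insert(out.end(), bytes, bytes + aligned_dim * sizeof(T));
}

// Hypothetical reader: return the first `dim` coordinates of record i,
// locating it with the same aligned_dim stride the writer used.
template <typename T>
std::vector<T> read_coords(const std::vector<uint8_t> &in, size_t i, size_t dim, size_t aligned_dim)
{
    const size_t stride = aligned_dim * sizeof(T);
    assert((i + 1) * stride <= in.size());
    std::vector<T> v(dim);
    std::memcpy(v.data(), in.data() + i * stride, dim * sizeof(T));
    return v;
}
```

If the writer instead emitted only dim elements per record while the reader stepped by aligned_dim, every record after the first would be read from the wrong offset whenever dim is not already aligned, which is the mismatch the comment is guarding against.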

Labels

C++ Pull Request targeting C++

3 participants