Fix: hash set iterator returns deleted tombstones in equal_range#727
Merged
ashvardanian merged 2 commits intounum-cloud:main-devfrom Apr 5, 2026
Merged
Conversation
The equal_range iterator in flat_hash_multi_set_gt did not check the deleted flag when advancing. After add/remove cycles that create tombstones with the same key, the iterator would yield deleted entries as live matches. In index_dense_gt::remove() this caused already-freed slots to be pushed to free_keys_ again, inflating free_keys_.size() and producing unsigned underflow in size(). Reproduces in both multi and non-multi mode (single-threaded) whenever two keys hash-collide and one tombstone is reused by a different key. Closes unum-cloud#697
Contributor
Author
|
I was able to reproduce corruption with next test: |
Contributor
Author
|
I believe that failing checks aren't related with proposed changes. |
DavIvek
added a commit
to memgraph/memgraph
that referenced
this pull request
Mar 12, 2026
The equal_range iterator in flat_hash_multi_set_gt did not check the
deleted flag when advancing, causing it to yield tombstone entries as
live matches. In index_dense_gt::remove() this caused already-freed
slots to be pushed to free_keys_ again, corrupting size() accounting
and leading to crashes ("cannot create std::vector larger than max_size").
Upstream PR: unum-cloud/USearch#727
github-merge-queue bot
pushed a commit
to memgraph/memgraph
that referenced
this pull request
Mar 12, 2026
The `equal_range` iterator in `flat_hash_multi_set_gt` did not check the
deleted flag when advancing, causing it to yield tombstone entries as
live matches. In `index_dense_gt::remove()` this caused already-freed
slots to be pushed to `free_keys_` again, corrupting `size()` accounting
and leading to crashes ("cannot create `std::vector` larger than
`max_size`").
Usearch related PR: unum-cloud/USearch#727
Contributor
Author
|
Hi @ashvardanian. I know things can get busy, so no rush on the review/merge. In the meantime, I’m preparing a patch on our side (https://github.com/memgraph/memgraph) since we have a release coming up in a couple of days. I wanted to check if the proposed fix in the PR generally makes sense from your perspective before we proceed with that approach. Thanks! |
github-merge-queue bot
pushed a commit
to memgraph/memgraph
that referenced
this pull request
Mar 12, 2026
The `equal_range` iterator in `flat_hash_multi_set_gt` did not check the
deleted flag when advancing, causing it to yield tombstone entries as
live matches. In `index_dense_gt::remove()` this caused already-freed
slots to be pushed to `free_keys_` again, corrupting `size()` accounting
and leading to crashes ("cannot create `std::vector` larger than
`max_size`").
Usearch related PR: unum-cloud/USearch#727
Contributor
|
Hi @DavIvek! Sorry it takes some time to review, I'm fully consumed by ashvardanian/NumKong#220. I need to push it through the line before validating the PRs here 🤗 |
Replace lambda with inline loop body — reads slot once, no redundant parameter, same tombstone-skipping logic.
ashvardanian
pushed a commit
that referenced
this pull request
Apr 15, 2026
### Minor - Add: Native UInt8 `u8` API and Float6 `e2m3` & `e3m2` quantization (e4db172) - Add: Float8 support with E5M2 & E4M3 (bf80af8) - Add: Uniform hardware caps queries across SDKs (11f4d5c) - Add: expose `memory_stats` through Rust FFI bindings (d5a04f8) - Add: `memory_stats_t` struct with per-tape breakdowns (8f6f8f6) ### Patch - Fix: Harden `min` & `max` symbols for MSVC (a270651) - Fix: Over-allocate & in-align striped locks (7aee20f) - Improve: Print stack-traces in test singal-handlers (94c7ec9) - Make: Bump NumKong to v7.5 (c666f16) - Docs: Benchmarks table formatting (95963ff) - Improve: Extended `uint8` interfaces for Java, Swift, ObjC, JS (82e1196) - Improve: Smaller & faster `striped_locks_gt` (4a47d18) - Docs: SIFT & BigANN subsets (1c60c6c) - Fix: ObjC compilation issue for `b1x8` (d54383d) - Improve: Reuse `eval.random_vectors` in scripts (2b7f469) - Improve: New benchmarking suite (4f6d2d7) - Improve: Type annotations for Python SDK & scripts (c39b15b) - Improve: Extended `bench.cpp` & datasets for 100M scale (dac569c) - Make: Split flaky CIBW jobs & add retries (5e0139f) - Make: Drop deprecated SPDX license expression (#744) (145272f) - Make: Drop EOL language versions and standardize CI runners (0a803a9) - Fix: Don't compile NumKong C sources into macOS wheel (2c44fb0) - Make: Bump deps for MSVC compatibility (36981b0) - Fix: Re-generate JNI header for `hardware_acceleration` checks (1a3a86e) - Make: Newer `/Zc:preprocessor` for MSVC in `binding.gyp` (79cd39f) - Fix: NumKong dispatch for Sorensen and binary vector dimensions (5546dd6) - Fix: Use `RAND_MAX` instead of `INT_MAX` for random test vectors (d45a616) - Make: Use "Trusted Publishing" for NPM (8a02676) - Fix: PyTest module collision & PeachPy on Python 3.11+ (8ed2548) - Fix: MSVC type-conversion warnings (#709) (98486cd) - Improve: Test remove/rename/multi in Rust (17dfac3) - Make: Arm32-friendly NumKong version (6f65872) - Fix: Move-safe init for Rust SDK (#704) (05b5eb0) - Fix: Check for `config.expansion` early in `add` (fb2e2f7) - Fix: Stale vector references in `refine_` within `index_get::update` (#731) (451e2d5) - Fix: Deduplication logic in `form_reverse_links_` (#729) (9ea8233) - Make: Switch to Ruff linter (#737) (bff98ed) - Fix: OpenMP flags forwarding in `build.rs` (#724) (46256a2) - Fix: Heap-buffer-overflow in HNSW search via sorted_buffer_gt (5697c60) - Fix: Serial Float16 conversion fallbacks (49b8b2c) - Make: Avoid `npm ci` in non-dependency-locked state (42e9203) - Make: Bump Clang due to LLVM optimizer bug (5bd15bb) - Make: Don't version removed dependency lock files (62724d4) - Fix: Hash set iterator returns deleted tombstones in `equal_range` (#727) (7d95f6f) - Fix: `MAP_FAILED` on Windows (#742) (786a7d9) - Fix: Escalate `MAP_FAILED` on POSIX (#722) (ba47347) - Fix: Strip nested sub-directories from package distributions (#733) (3832c02) - Fix: Guard Float16 APIs with `#if arch(arm64)` in Swift SDK (#739) (5bcf787) - Docs: Rust `Cos` metric docstring (#734) (c9ff5b8) - Fix: Avoid `IntPtr` global out-of-scope alias in C# (5413341) - Improve: Expose & test Float8 in Java, ObjC, Go, JS, Rust (bebb539) - Fix: Stale symbol name in `join` demo (ef0dc4f) - Improve: Allow non-sequential vector IDs in `bench` (11bcde1) - Make: Pull NumKong from PyPi (1b46d1b) - Make: Pull NumKong via CMake (b0f8708) - Make: Pull SwiftPM NumKong dependency (eede286) - Make: Pull NumKong crate (46347ec) - Make: Switch from SimSIMD v6 to NumKong v7 submodule (b2361c3) - Docs: Link capitalization (98eeb9e) - Fix: Set equality comparison in Numba JIT PyTest (30381b7) - Fix: Avoid duplicate neighbor slots in HNSW reverse links (#699) (6995246) - Fix: Postfix operators for member/candidate iterators in `index_gt` (#718) (7fbf086) - Fix: Potential UB in `index_dense::add` (#721) (6b69e95) - Fix: Accurately track total allocated memory in `memory_mapping_allocator_gt` (#608) (3c25f20) - Fix: Python `serialized_length` missing argument (#714) (5cde3fc)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The equal_range iterator in
flat_hash_multi_set_gtdid not check the deleted flag when advancing. After add/remove cycles that create tombstones with the same key, the iterator would yield deleted entries as live matches. Inindex_dense_gt::remove()this caused already-freed slots to be pushed tofree_keys_again, inflatingfree_keys_.size()and producing unsigned underflow insize().Reproduces in both multi and non-multi mode (single-threaded) whenever two keys hash-collide and one tombstone is reused by a different key.
Closes #697