Fix: hash set iterator returns deleted tombstones in equal_range by DavIvek · Pull Request #727 · unum-cloud/USearch

DavIvek · 2026-03-10T15:29:49Z

The equal_range iterator in flat_hash_multi_set_gt did not check the deleted flag when advancing. After add/remove cycles that create tombstones with the same key, the iterator would yield deleted entries as live matches. In index_dense_gt::remove() this caused already-freed slots to be pushed to free_keys_ again, inflating free_keys_.size() and producing unsigned underflow in size().

Reproduces in both multi and non-multi mode (single-threaded) whenever two keys hash-collide and one tombstone is reused by a different key.

Closes #697

The equal_range iterator in flat_hash_multi_set_gt did not check the deleted flag when advancing. After add/remove cycles that create tombstones with the same key, the iterator would yield deleted entries as live matches. In index_dense_gt::remove() this caused already-freed slots to be pushed to free_keys_ again, inflating free_keys_.size() and producing unsigned underflow in size(). Reproduces in both multi and non-multi mode (single-threaded) whenever two keys hash-collide and one tombstone is reused by a different key. Closes unum-cloud#697

DavIvek · 2026-03-10T15:32:13Z

I was able to reproduce corruption with next test:

#include <usearch/index_plugins.hpp>

namespace {

struct KeySlot {
    std::uint64_t key;
    std::uint64_t slot;
};

// Guaranteed collisions.
struct CollidingHash {
    using is_transparent = void;
    std::size_t operator()(KeySlot const &) const noexcept { return 0; }
    std::size_t operator()(std::uint64_t) const noexcept { return 0; }
};

struct KeySlotEqual {
    using is_transparent = void;
    bool operator()(KeySlot const &a, std::uint64_t b) const noexcept { return a.key == b; }
    bool operator()(std::uint64_t a, KeySlot const &b) const noexcept { return a == b.key; }
    bool operator()(KeySlot const &a, KeySlot const &b) const noexcept { return a.key == b.key; }
};

using HashSet = unum::usearch::flat_hash_multi_set_gt<KeySlot, CollidingHash, KeySlotEqual>;

}  // namespace

// Simulates index_dense_gt's add/remove cycle and shows size() corruption.
TEST(UsearchTombstoneBug, SizeCorruption) {
    HashSet slot_lookup;

    // Simulated counters (mirrors index_dense_gt internals)
    std::size_t typed_size = 0;
    std::vector<std::uint64_t> free_keys;  // simulated ring buffer

    auto simulated_size = [&]() -> std::int64_t {
        return static_cast<std::int64_t>(typed_size) - static_cast<std::int64_t>(free_keys.size());
    };

    auto simulate_add = [&](std::uint64_t key) {
        std::uint64_t slot;
        if (!free_keys.empty()) {
            slot = free_keys.back();
            free_keys.pop_back();
        } else {
            slot = typed_size++;
        }
        slot_lookup.try_emplace({key, slot});
    };

    auto simulate_remove = [&](std::uint64_t key) {
        auto [begin, end] = slot_lookup.equal_range(key);
        for (auto it = begin; it != end; ++it) {
            free_keys.push_back((*it).slot);
        }
        slot_lookup.erase(key);
    };

    // Build the toxic pattern: [live-K, tombstone-K] in the same cluster.
    //
    // 1. add(J)    → [pos 0: {J, slot0}]
    // 2. add(K)    → [pos 0: {J, slot0}, pos 1: {K, slot1}]  (collision)
    // 3. remove(J) → [pos 0: {J, slot0}☠, pos 1: {K, slot1}]
    // 4. remove(K) → [pos 0: {J, slot0}☠, pos 1: {K, slot1}☠]
    // 5. add(K)    → reuses tombstone at pos 0: [pos 0: {K, slot0}, pos 1: {K, slot1}☠]
    //                                            ^^^^^^^^ live       ^^^^^^^^ tombstone
    //
    // Now remove(K) calls equal_range(K). The buggy iterator sees BOTH the
    // live entry at pos 0 AND the tombstone at pos 1 as matches.
    // It pushes slot0 AND slot1 to free_keys_, but only slot0 was actually live.
    // slot1 was already in free_keys_ from step 4. → phantom duplicate → size() drift.

    constexpr std::uint64_t K = 1;
    constexpr std::uint64_t J = 2;

    simulate_add(J);                            // step 1
    simulate_add(K);                            // step 2
    simulate_remove(J);                         // step 3
    simulate_remove(K);                         // step 4
    simulate_add(K);                            // step 5

    ASSERT_EQ(simulated_size(), 1) << "One live entry before the critical remove";

    simulate_remove(K);                         // step 6: THE BUGGY REMOVE

    // Bug: free_keys has a phantom duplicate
    EXPECT_EQ(simulated_size(), 0)
        << "size() should be 0 after removing the only entry, but free_keys_ has "
        << free_keys.size() << " entries vs typed_size " << typed_size
        << " → size() = " << simulated_size() << " (phantom duplicates in free_keys)";
}

DavIvek · 2026-03-10T15:39:47Z

I believe that failing checks aren't related with proposed changes.

The equal_range iterator in flat_hash_multi_set_gt did not check the deleted flag when advancing, causing it to yield tombstone entries as live matches. In index_dense_gt::remove() this caused already-freed slots to be pushed to free_keys_ again, corrupting size() accounting and leading to crashes ("cannot create std::vector larger than max_size"). Upstream PR: unum-cloud/USearch#727

The `equal_range` iterator in `flat_hash_multi_set_gt` did not check the deleted flag when advancing, causing it to yield tombstone entries as live matches. In `index_dense_gt::remove()` this caused already-freed slots to be pushed to `free_keys_` again, corrupting `size()` accounting and leading to crashes ("cannot create `std::vector` larger than `max_size`"). Usearch related PR: unum-cloud/USearch#727

DavIvek · 2026-03-12T13:41:28Z

Hi @ashvardanian. I know things can get busy, so no rush on the review/merge.

In the meantime, I’m preparing a patch on our side (https://github.com/memgraph/memgraph) since we have a release coming up in a couple of days. I wanted to check if the proposed fix in the PR generally makes sense from your perspective before we proceed with that approach.

Thanks!

The `equal_range` iterator in `flat_hash_multi_set_gt` did not check the deleted flag when advancing, causing it to yield tombstone entries as live matches. In `index_dense_gt::remove()` this caused already-freed slots to be pushed to `free_keys_` again, corrupting `size()` accounting and leading to crashes ("cannot create `std::vector` larger than `max_size`"). Usearch related PR: unum-cloud/USearch#727

ashvardanian · 2026-03-12T19:13:12Z

Hi @DavIvek! Sorry it takes some time to review, I'm fully consumed by ashvardanian/NumKong#220. I need to push it through the line before validating the PRs here 🤗

Replace lambda with inline loop body — reads slot once, no redundant parameter, same tombstone-skipping logic.

### Minor - Add: Native UInt8 `u8` API and Float6 `e2m3` & `e3m2` quantization (e4db172) - Add: Float8 support with E5M2 & E4M3 (bf80af8) - Add: Uniform hardware caps queries across SDKs (11f4d5c) - Add: expose `memory_stats` through Rust FFI bindings (d5a04f8) - Add: `memory_stats_t` struct with per-tape breakdowns (8f6f8f6) ### Patch - Fix: Harden `min` & `max` symbols for MSVC (a270651) - Fix: Over-allocate & in-align striped locks (7aee20f) - Improve: Print stack-traces in test singal-handlers (94c7ec9) - Make: Bump NumKong to v7.5 (c666f16) - Docs: Benchmarks table formatting (95963ff) - Improve: Extended `uint8` interfaces for Java, Swift, ObjC, JS (82e1196) - Improve: Smaller & faster `striped_locks_gt` (4a47d18) - Docs: SIFT & BigANN subsets (1c60c6c) - Fix: ObjC compilation issue for `b1x8` (d54383d) - Improve: Reuse `eval.random_vectors` in scripts (2b7f469) - Improve: New benchmarking suite (4f6d2d7) - Improve: Type annotations for Python SDK & scripts (c39b15b) - Improve: Extended `bench.cpp` & datasets for 100M scale (dac569c) - Make: Split flaky CIBW jobs & add retries (5e0139f) - Make: Drop deprecated SPDX license expression (#744) (145272f) - Make: Drop EOL language versions and standardize CI runners (0a803a9) - Fix: Don't compile NumKong C sources into macOS wheel (2c44fb0) - Make: Bump deps for MSVC compatibility (36981b0) - Fix: Re-generate JNI header for `hardware_acceleration` checks (1a3a86e) - Make: Newer `/Zc:preprocessor` for MSVC in `binding.gyp` (79cd39f) - Fix: NumKong dispatch for Sorensen and binary vector dimensions (5546dd6) - Fix: Use `RAND_MAX` instead of `INT_MAX` for random test vectors (d45a616) - Make: Use "Trusted Publishing" for NPM (8a02676) - Fix: PyTest module collision & PeachPy on Python 3.11+ (8ed2548) - Fix: MSVC type-conversion warnings (#709) (98486cd) - Improve: Test remove/rename/multi in Rust (17dfac3) - Make: Arm32-friendly NumKong version (6f65872) - Fix: Move-safe init for Rust SDK (#704) (05b5eb0) - Fix: Check for `config.expansion` early in `add` (fb2e2f7) - Fix: Stale vector references in `refine_` within `index_get::update` (#731) (451e2d5) - Fix: Deduplication logic in `form_reverse_links_` (#729) (9ea8233) - Make: Switch to Ruff linter (#737) (bff98ed) - Fix: OpenMP flags forwarding in `build.rs` (#724) (46256a2) - Fix: Heap-buffer-overflow in HNSW search via sorted_buffer_gt (5697c60) - Fix: Serial Float16 conversion fallbacks (49b8b2c) - Make: Avoid `npm ci` in non-dependency-locked state (42e9203) - Make: Bump Clang due to LLVM optimizer bug (5bd15bb) - Make: Don't version removed dependency lock files (62724d4) - Fix: Hash set iterator returns deleted tombstones in `equal_range` (#727) (7d95f6f) - Fix: `MAP_FAILED` on Windows (#742) (786a7d9) - Fix: Escalate `MAP_FAILED` on POSIX (#722) (ba47347) - Fix: Strip nested sub-directories from package distributions (#733) (3832c02) - Fix: Guard Float16 APIs with `#if arch(arm64)` in Swift SDK (#739) (5bcf787) - Docs: Rust `Cos` metric docstring (#734) (c9ff5b8) - Fix: Avoid `IntPtr` global out-of-scope alias in C# (5413341) - Improve: Expose & test Float8 in Java, ObjC, Go, JS, Rust (bebb539) - Fix: Stale symbol name in `join` demo (ef0dc4f) - Improve: Allow non-sequential vector IDs in `bench` (11bcde1) - Make: Pull NumKong from PyPi (1b46d1b) - Make: Pull NumKong via CMake (b0f8708) - Make: Pull SwiftPM NumKong dependency (eede286) - Make: Pull NumKong crate (46347ec) - Make: Switch from SimSIMD v6 to NumKong v7 submodule (b2361c3) - Docs: Link capitalization (98eeb9e) - Fix: Set equality comparison in Numba JIT PyTest (30381b7) - Fix: Avoid duplicate neighbor slots in HNSW reverse links (#699) (6995246) - Fix: Postfix operators for member/candidate iterators in `index_gt` (#718) (7fbf086) - Fix: Potential UB in `index_dense::add` (#721) (6b69e95) - Fix: Accurately track total allocated memory in `memory_mapping_allocator_gt` (#608) (3c25f20) - Fix: Python `serialized_length` missing argument (#714) (5cde3fc)

DavIvek marked this pull request as draft March 10, 2026 15:32

DavIvek marked this pull request as ready for review March 10, 2026 15:38

DavIvek mentioned this pull request Mar 12, 2026

Fix: Patch usearch tombstone bug in equal_range iterator memgraph/memgraph#3899

Merged

Simplify equal_range iterator increment

9d88bfa

Replace lambda with inline loop body — reads slot once, no redundant parameter, same tombstone-skipping logic.

ashvardanian merged commit 7d95f6f into unum-cloud:main-dev Apr 5, 2026
2 of 11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: hash set iterator returns deleted tombstones in equal_range#727

Fix: hash set iterator returns deleted tombstones in equal_range#727
ashvardanian merged 2 commits intounum-cloud:main-devfrom
DavIvek:fix/hash-set-tombstone-iterator

DavIvek commented Mar 10, 2026

Uh oh!

DavIvek commented Mar 10, 2026

Uh oh!

DavIvek commented Mar 10, 2026 •

edited

Loading

Uh oh!

DavIvek commented Mar 12, 2026 •

edited

Loading

Uh oh!

ashvardanian commented Mar 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

DavIvek commented Mar 10, 2026

Uh oh!

DavIvek commented Mar 10, 2026

Uh oh!

DavIvek commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DavIvek commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ashvardanian commented Mar 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

DavIvek commented Mar 10, 2026 •

edited

Loading

DavIvek commented Mar 12, 2026 •

edited

Loading