deadlock: VectorDb::delete double-acquires stats RwLock #437

@proffesor-for-testing

Summary

VectorDb::delete in crates/ruvector-router-core/src/vector_db.rs:153 calls self.stats.write() twice in the same statement. The assigned value on the RHS is evaluated first, and its temporary write guard lives until the end of the statement, so the LHS .write() blocks waiting on a lock the same thread already holds. Any caller of delete() hangs forever once the storage delete and the index removal succeed.

Reproducer

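// VectorEntry is assumed to be exported from the crate root like the other two types.
use ruvector_router_core::{VectorDb, VectorDbConfig, VectorEntry};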

let db = VectorDb::new(VectorDbConfig { dimensions: 3, ..Default::default() }).unwrap();
db.insert(VectorEntry { id: "1".into(), vector: vec![1.0, 2.0, 3.0], ..Default::default() }).unwrap();
db.delete("1").unwrap();   // never returns

This was observed downstream in cognitum-one/seed: cargo test --features foundation hangs on router::tests::test_delete (a 6-line wrapper test). A 50-minute soak showed no rustc child active, just the test binary spinning. Tracked at cognitum-one/seed#142.

Root cause

// crates/ruvector-router-core/src/vector_db.rs:147-157
pub fn delete(&self, id: &str) -> Result<bool> {
    let deleted = self.storage.delete(id)?;

    if deleted {
        self.index.remove(id)?;
        self.stats.write().total_vectors = self.stats.write().total_vectors.saturating_sub(1);
        // ^^^^^^^^^^^^^^^^^^                ^^^^^^^^^^^^^^^^^^
        // evaluated second; blocks on       evaluated first; its guard
        // the lock the RHS still holds      lives until end of statement
        // → deadlock under parking_lot::RwLock
    }
    Ok(deleted)
}

In Rust, an assignment evaluates the assigned value (RHS) first and the place expression on the LHS second. The RwLockWriteGuard returned by the RHS .write() is a temporary, and temporaries live until the end of the enclosing statement, so that guard is still held when the LHS .write() runs. parking_lot::RwLock is not reentrant, so the LHS call blocks forever on a lock its own thread already holds.
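Both rules can be checked without hanging a process. The sketch below is a stand-in, not the crate's code: it uses std::sync::RwLock with try_write (which reports a held lock instead of blocking), and the Stats struct and all function names are made up for illustration.

```rust
use std::cell::RefCell;
use std::sync::RwLock;

// Hypothetical stand-in for the crate's stats struct.
struct Stats {
    total_vectors: usize,
}

// Logs that the assignee (LHS) side ran, then hands back the place to write to.
fn assignee<'a>(log: &RefCell<Vec<&'static str>>, target: &'a mut usize) -> &'a mut usize {
    log.borrow_mut().push("assignee");
    target
}

// Logs that the assigned-value (RHS) side ran, then returns the value.
fn value(log: &RefCell<Vec<&'static str>>, v: usize) -> usize {
    log.borrow_mut().push("value");
    v
}

// Records the order in which the two sides of `place = value` are evaluated.
fn evaluation_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    let mut slot = 0usize;
    *assignee(&log, &mut slot) = value(&log, 7);
    assert_eq!(slot, 7);
    log.into_inner()
}

// Returns true if a second write attempt in the same statement finds the lock held.
fn second_write_blocked() -> bool {
    let stats = RwLock::new(Stats { total_vectors: 1 });
    // The guard from write() is a temporary that lives until the end of this
    // `let` statement, so try_write() later in the same statement cannot succeed.
    let (count, blocked) = (
        stats.write().unwrap().total_vectors,
        stats.try_write().is_err(),
    );
    assert_eq!(count, 1);
    blocked
}

fn main() {
    println!("evaluation order: {:?}", evaluation_order());
    println!("second write blocked: {}", second_write_blocked());
}
```

With a blocking write() in place of try_write(), the second acquisition would wait instead of returning an error, which is the hang described above.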

Proposed fix

 pub fn delete(&self, id: &str) -> Result<bool> {
     let deleted = self.storage.delete(id)?;

     if deleted {
         self.index.remove(id)?;
-        self.stats.write().total_vectors = self.stats.write().total_vectors.saturating_sub(1);
+        let mut stats = self.stats.write();
+        stats.total_vectors = stats.total_vectors.saturating_sub(1);
     }

     Ok(deleted)
 }

One write lock acquired, mutated in place, released at end of if block. Same semantics, no deadlock.
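The same pattern can be tried in isolation. This is a minimal sketch under stated assumptions: Db and Stats are hypothetical stand-ins for the real types, and std::sync::RwLock replaces parking_lot::RwLock, so write() returns a Result that is unwrapped here.

```rust
use std::sync::RwLock;

// Hypothetical stand-ins; the real VectorDb also owns storage and an index.
struct Stats {
    total_vectors: usize,
}

struct Db {
    stats: RwLock<Stats>,
}

impl Db {
    fn with_count(total_vectors: usize) -> Self {
        Db { stats: RwLock::new(Stats { total_vectors }) }
    }

    // Mirrors the fixed statement: one guard, bound to a name, reused for
    // both the read and the write, and dropped at the end of the scope.
    fn decrement(&self) {
        let mut stats = self.stats.write().unwrap();
        stats.total_vectors = stats.total_vectors.saturating_sub(1);
    }

    fn count(&self) -> usize {
        self.stats.read().unwrap().total_vectors
    }
}

fn main() {
    let db = Db::with_count(1);
    db.decrement();
    db.decrement(); // saturating_sub keeps the count at zero instead of underflowing
    println!("count after two decrements: {}", db.count());
    // The guard was released when decrement() returned, so the lock is free again.
    assert!(db.stats.try_write().is_ok());
}
```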

Suggested regression test

#[test]
fn delete_does_not_deadlock() {
    let db = VectorDb::new(VectorDbConfig { dimensions: 3, ..Default::default() }).unwrap();
    db.insert(VectorEntry { id: "x".into(), vector: vec![1.0, 2.0, 3.0], metadata: Default::default(), timestamp: 0 }).unwrap();
    assert!(db.delete("x").unwrap());
    assert_eq!(db.count().unwrap(), 0);
}

Without the fix, delete_does_not_deadlock hangs the cargo test runner forever; with the fix it completes in milliseconds.
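A hang is a poor failure signal in CI, so one option is to run the call on a worker thread and fail after a timeout. The helper below is a sketch using only the standard library; run_with_timeout is a hypothetical name, not something the crate provides.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Runs `op` on a worker thread and returns its result, or None if it does not
// finish (or panics) within `limit`. A stuck worker is leaked, not killed.
fn run_with_timeout<T, F>(limit: Duration, op: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore a send error: the receiver is gone if the caller already timed out.
        let _ = tx.send(op());
    });
    rx.recv_timeout(limit).ok()
}

fn main() {
    let fast = run_with_timeout(Duration::from_secs(5), || 1 + 1);
    // A long sleep stands in for a call that never returns.
    let stuck = run_with_timeout(Duration::from_millis(50), || {
        thread::sleep(Duration::from_millis(500));
        0
    });
    println!("fast: {:?}, stuck: {:?}", fast, stuck);
}
```

Assuming VectorDb is Send and can be moved into the closure, the regression test could wrap db.delete("x") in such a helper and assert that the result is not None, so a reintroduced deadlock fails the test instead of stalling the runner.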

Scope

I grepped the rest of crates/ruvector-router-core/ for the same .write() ... = self...write() shape, and this is the only occurrence in that crate. Worth a quick sweep across the other ruvector crates for the same pattern while you're at it.
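For the wider sweep, a line-based heuristic is probably enough as a first pass. The sketch below is an assumption about how such a check might look, not the grep actually used: it flags single lines with .write() on both sides of a plain assignment, so it misses statements split across lines and double acquisitions that sit entirely on one side.

```rust
// Heuristic: true if a line takes a write guard on both sides of a plain ` = `.
fn double_write_in_assignment(line: &str) -> bool {
    match line.split_once(" = ") {
        Some((lhs, rhs)) => lhs.contains(".write()") && rhs.contains(".write()"),
        None => false,
    }
}

// Returns (1-based line number, trimmed line) for every flagged line in `source`.
fn flag_lines(source: &str) -> Vec<(usize, &str)> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| double_write_in_assignment(line))
        .map(|(i, line)| (i + 1, line.trim()))
        .collect()
}

fn main() {
    let sample = "\
let mut stats = self.stats.write();
self.stats.write().total_vectors = self.stats.write().total_vectors.saturating_sub(1);
stats.total_vectors = stats.total_vectors.saturating_sub(1);
";
    for (number, line) in flag_lines(sample) {
        println!("line {}: {}", number, line);
    }
}
```

Every hit still needs a manual look, since two .write() calls on different locks are harmless.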

Why filing instead of PRing

I have read-only access on ruvnet/ruvector. Happy to open the PR off my fork (proffesor-for-testing/ruvector) if useful — let me know. In the meantime cognitum-one/seed#143 lands the test infrastructure with this specific test commented out and a TODO referencing this issue.

cc @ruvnet
