Skip to content

feat: implement core/aggsigdb#430

Draft
emlautarom1 wants to merge 11 commits into
mainfrom
emlautarom1/core-aggsigdb
Draft

feat: implement core/aggsigdb#430
emlautarom1 wants to merge 11 commits into
mainfrom
emlautarom1/core-aggsigdb

Conversation

@emlautarom1
Copy link
Copy Markdown
Collaborator

Closes #169


There are two implementations in Charon: V1 uses an Actor model with commands and queries, and V2 uses a Mutex alongside Go runtime trickery. The implementation in use in Charon is V1 due to concurrency issues in V2.

This implementation is closer to V2, and we use tokio's tools (ex. Notify) to replicate the Go runtime features.

Tests have been ported when appropriate, and some new ones were added to increase coverage.

- Use Actor model
- Implement `store`
- Test `store`
- Expose top-level `new`
- Remove impossible errors
- Add inline docs
- Use Mutex/Notify pattern
- Remove need for handles
- Adjust tests and docs
- Use existing Deadliner API (Arc)
- Add tests
- Rust specific
- Use `try_for_each`
- Proper naming convention
@emlautarom1
Copy link
Copy Markdown
Collaborator Author

@claude

@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

Claude finished @emlautarom1's task in 3m 32s —— View job


Claude PR Review — core/aggsigdb

  • Read PR diff and changed files
  • Compare with Charon Go implementation (memory.go, memory_v2.go)
  • Review correctness, concurrency, and Rust style
  • Post review findings

Overall: the implementation is clean, well-tested, and faithfully follows Charon's V2 mutex+notify design with idiomatic Tokio primitives. The Notify::notified() → enable() → check map → await pattern is correctly applied to avoid lost wakeups, and the eviction task lifetime via Weak<MemDBInner> is elegant. A few items below are worth addressing before merge — most are design questions rather than bugs.


🟠 Cancellation: wait_for has no way to be aborted

crates/core/src/aggsigdb/memory.rs:95-118MemDB::wait_for returns Box<dyn SignedData> and has no timeout/cancellation parameter. Charon's Await(ctx, ...) takes a context.Context and returns an error on cancel or DB stop. In Rust this can be papered over by callers with tokio::select!/tokio::time::timeout, but the lost flexibility is worth a conscious choice: e.g. accepting a CancellationToken or returning a Result so the caller can distinguish "I gave up" from "value not yet stored." Worth confirming this is the intended Pluto API surface and not just a temporary simplification, since the consumers of AggSigDB (sigagg / fetcher‐glue) need a cancel path.

🟠 Deadliner::c() returning None is silently a no-op

crates/core/src/aggsigdb/memory.rs:39-46:

match deadliner.c() {
    Some(evictions) => {
        tokio::spawn(Self::evict(Arc::downgrade(&this.0), evictions));
    }
    None => {
        // TODO: In Charon, `deadliner.c()` always returns `Some`
    }
}

A None here disables eviction entirely with no log/warn, which makes misconfiguration (e.g. somebody else already took the receiver via c()) extremely hard to diagnose — a long‑running node will quietly leak memory. Three options worth considering:

  1. expect("aggsigdb requires exclusive access to the deadliner") — matches Charon's invariant.
  2. tracing::warn! at minimum so it's visible.
  3. Reshape the Deadliner API so the receiver is passed in directly (then misuse is a type error).

🟡 tokio::sync::Mutex for purely-synchronous critical sections

crates/core/src/aggsigdb/memory.rs:25: data: Mutex<HashMap<...>> uses tokio::sync::Mutex, but the lock is never held across .await (the only post‑lock work is notify.notify_waiters() which is sync). std::sync::Mutex (or parking_lot::Mutex) would be cheaper and more idiomatic per Tokio's own guidance. The notify_waiters call comes after the lock drops, so swapping is safe.

🟡 Eviction is O(n) over the full map instead of O(k) per duty

crates/core/src/aggsigdb/memory.rs:51-61:

inner.data.lock().await.retain(|(d, _), _| d != &duty);

Charon V2 maintains a keysByDuty map[Duty][]memDBKey precisely to avoid scanning the entire data map on every eviction (memory_v2.go:18, 54, 128-138). At steady state with many active duties per slot, this matters. Not a correctness issue, but it drops a deliberate optimization from the original. Consider adding the secondary index — it's a few lines and removes a perf regression vs. Charon. Fix this →

🟡 notify_waiters is fired even on no-op stores

crates/core/src/aggsigdb/memory.rs:86-89: the TODO already calls this out. Two waste modes:

  1. An idempotent re-store (existing == new) still wakes every waiter.
  2. An error path (MismatchingData) also wakes them.

In a steady-state DV node with many parallel waiters this is fine but not free. The fix is a bool inserted_any flag built in the try_for_each, and only calling notify_waiters when true. Cheap to do now and the TODO is right there. The fancier per‑key wakeup mentioned in the second half of the TODO is overkill — the boolean is enough.

🟡 store calls deadliner.add(duty) before taking the data lock

crates/core/src/aggsigdb/memory.rs:64-68 — Charon V2 calls m.deadliner.Add(duty) inside the write lock (memory_v2.go:41, called from store while Store holds m.Lock()). The Rust port calls it before locking. The window matters only if add could schedule an immediate eviction for an already‑expired duty before we even take the lock: in that case evict removes nothing (key not yet inserted), then we insert, and the entry survives past its deadline. Charon has the same race in V2 actually (since Add posts to an mpsc, the eviction is scheduled async), so this isn't a regression — but the TODO(charon) at line 65 about "no deadline supported vs already expired" is exactly the same hole. Worth a comment acknowledging this, or filing a follow-up.

🟢 SignedDataSet type-erased rewrite (crates/core/src/types.rs)

The change from SignedDataSet<T: SignedData> to SignedDataSet(HashMap<PubKey, Box<dyn SignedData>>) makes sense for aggsigdb's heterogeneous storage. Verified there are no current callers of the old generic API outside this PR, so no migration risk. The new IntoIterator for both owned and & flavors is a nice touch. Minor: the &'a SignedDataSet IntoIterator yields &'a Box<dyn SignedData> — Clippy may eventually nudge you to &'a dyn SignedData via .map(|(k, v)| (k, v.as_ref())), but that's an ergonomic call.

🟢 Tests are good — one fragility note

The tokio::task::yield_now().await; assert!(!reader.is_finished()) pattern (memory.rs:225-226, 320-321, 350-356, 388-389) is racy on loaded CI: a single yield doesn't guarantee the spawned reader has reached notified.await. In practice this rarely flakes because the executor schedules eagerly, but on contended runners it can. Consider replacing with a small tokio::time::sleep(Duration::from_millis(50)) or — better — a Notify-based "reader is parked" signal in the test. Same critique applies to Charon's runtime.Gosched(), so this isn't unique to Pluto.

The new test coverage (write/read, write_unblocks, cannot_overwrite, write_idempotent, eviction, write_unblocks_many, unrelated_write_does_not_unblock) is more thorough than Charon's V2 internal tests — nice work.

🔵 Nits

  • crates/core/src/aggsigdb/mod.rs has only pub mod memory; — consider a one‑line module doc describing the AggSigDB role (aggregated signed duty data store; key invariants).
  • MemDB::new calls tokio::spawn, so it requires a Tokio runtime to be active at construction. Worth a # Panics note on the rustdoc, since a non-test caller could trip this from a builder run outside an async context.
  • crates/core/src/aggsigdb/memory.rs:99wait_for could take &Duty instead of Duty by value; you only need an owned Duty to build the lookup key, which Clone covers. Minor ergonomics.
  • The 692/-222 diff in Cargo.lock looks like an unrelated cargo update (alloy 1.5→1.6, winnow 0.7→1.0, secp256k1 versioning, etc.). No new direct deps were introduced by the aggsigdb crate itself, so the churn isn't load-bearing, but you may want to land it as a separate commit/PR for cleaner bisection.

Summary

Severity Item
🟠 wait_for has no cancellation/timeout primitive — confirm intent
🟠 Deadliner::c() == None silently disables eviction — log or expect
🟡 tokio::sync::Mutex unnecessary; std::sync::Mutex is sufficient
🟡 Drop keysByDuty index makes eviction O(n) vs Charon's O(k)
🟡 notify_waiters fires even on no-op stores
🟡 deadliner.add before lock — re-confirm the eviction race comment
🔵 Doc/ergonomic nits, Cargo.lock churn separability

Nothing here is a blocker; the implementation is correct and matches Charon V2 semantics. Most actionable is the keysByDuty index (Charon left it in for a reason) and the Deadliner::c() == None warn/expect. The wait_for cancellation question is worth nailing down before downstream consumers pin against this API.
· Branch: emlautarom1/core-aggsigdb

@emlautarom1 emlautarom1 marked this pull request as draft May 21, 2026 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement core/aggsigdb

1 participant