feat: dutydb by mskrzypkows · Pull Request #415 · NethermindEth/pluto

mskrzypkows · 2026-05-14T13:02:19Z

No description provided.

mskrzypkows · 2026-05-14T13:02:33Z

claude · 2026-05-14T13:03:00Z

Claude finished @mskrzypkows's task in 4m 38s —— View job

Review — Pluto DutyDB (PR #415)

Overall: a solid, readable port of Charon's core/dutydb/memory.go. The high‑level structure, error variants and field layout match the Go source well, and the Notify+RwLock swap for Go's Mutex+query‑queue model is a sensible Rust idiom that I'd lean into. A few correctness / parity points are worth addressing before merging, and test coverage has some real gaps versus the Go suite. Comments grouped by severity.

1. Correctness / parity with Charon

(a) deadliner.add is awaited while holding the state write lock
memory.rs:219-224

pub async fn store(&self, duty: Duty, unsigned_set: UnsignedDataSet) -> Result<()> {
    let mut state = self.state.write().await;
    if !self.deadliner.add(duty.clone()).await {
        return Err(Error::ExpiredDuty);
    }

Go's MemDB.Store calls db.deadliner.Add(duty) synchronously under a sync.Mutex, so the lock is held for a non‑blocking instant. In Rust, Deadliner::add is async fn and the concrete Deadliner (crates/core/src/deadline.rs:258) does its own locking and channel send (tx.send(...).await). Holding the dutydb write lock across that .await blocks all readers/writers of the DutyDB on the deadliner's progress, which is a deadlock/contention foot‑gun (and worse than Go).

Recommend either:

call deadliner.add(duty.clone()).await before acquiring the write lock, then acquire and proceed; or
make Deadliner::add non‑async (Go's signature is sync).

(b) Dead clash check in store_agg_attestation
memory.rs:510-527

The key includes the tree‑hash root of the att data, and the "clash" check compares existing_data.tree_hash_root().0 != root. Because we just looked the entry up by that root, the two sides are always equal, so ClashingDataRoot is unreachable. This faithfully mirrors a bug already in memory.go:458-460, but worth flagging: this branch never fires and should either (i) be removed, or (ii) the key changed (e.g. slot only) so a real clash is detectable. Either is a behavior change vs. Go and warrants a separate decision.

(c) Aggregated attestation duty is always re‑inserted even when already present
memory.rs:521-524

} else {
    self.agg_keys_by_slot.entry(slot).or_default().push(key);
}
self.agg_duties.insert(key, agg.clone());

Matches Go (memory.go:462), but the .clone() on every duplicate store is wasted work — consider moving the insert into the else branch if Go's overwrite is not actually load‑bearing.

(d) await_* have no per‑call cancellation
memory.rs:289-307

Go's AwaitProposal(ctx, slot) returns ctx.Err() if the caller cancels. The Rust versions only honor the DB‑global CancellationToken. Callers can tokio::time::timeout(...) or abort the task to drop the future, but the API surface is narrower than Go's. I'd add a CancellationToken (or a tokio::time::Duration timeout) parameter to bring parity — otherwise it's easy for higher layers to leak tasks waiting forever.

(e) Expired duties are not drained when store returns early with an error
memory.rs:226-283

Same as Go (early return err in the switch skips the drain loop), so this is parity, but worth noting because a transient clash error will defer expiry processing until the next successful store. Not blocking.

(f) BuilderProposer and other unsupported duty types are still registered with the deadliner

deadliner.add(duty) runs before the switch, so a duty whose type later returns DeprecatedDutyBuilderProposer / UnsupportedDutyType ends up tracked by the deadliner and silently expires (delete_duty rejects unknown types again). Again, this is parity with Go, but it's a minor wart that has been carried over.

2. Concurrency model

(a) await_data thundering herd. memory.rs:341-367

notify_waiters() wakes every waiter on the channel, regardless of whether their specific key was the one stored. Each woken task then takes the read lock and does a hashmap lookup. For a cluster with many concurrent await_attestation calls in flight (one per committee), every Store will wake them all. Go's *Queries slice resolves only the entry whose key matches, with no spurious wakeups. Probably fine in practice given the scale, but worth being aware of — if it shows up as latency, a per‑key channel or a tokio::sync::watch keyed map is a clean fix.

(b) Notify::notified() registration pattern is correct. The pin! + enable() + lookup + select! ordering avoids the classic missed‑wakeup race. Nice.

(c) State.deadliner_rx: Option<...> is held inside State so that try_recv happens under the write lock. The Option/take() workaround stems from Deadliner::c() returning the receiver only once. It would read more naturally as a dedicated Mutex<Option<Receiver<Duty>>> field on MemDB (or simply on State) — keeping it in State works but the optional is awkward.

(d) MemDB::shutdown(&self) is idempotent via CancellationToken, which is an improvement over Go's "may only be called once" requirement. Worth mentioning in the doc comment.

3. API / style

Naming collision. pluto_core::dutydb::MemDB and pluto_core::parsigdb::memory::MemDB are both reachable. Not a compile error, but consider DutyMemDB / ParSigMemDB to make imports unambiguous.
State methods are sync but mutate &mut self — they implicitly assume the caller holds the write lock. Go names them *Unsafe for exactly this reason; a one‑line // caller holds the write lock doc on each method (or splitting State into its own private module) would help future readers.
Doc comment on the commIdx=0 compatibility block memory.rs:466-468 is shorter than Go's. The Go version explains why both index variants are stored (post‑Electra default + buggy VCs); the Rust comment is a one‑liner. Worth porting the full rationale — this code looks pointless without the context.
UnsignedDutyData::Proposal(Box<VersionedProposal>) — good call to box the variant; the other variants are much smaller.
store_agg_attestation / store_sync_contribution differ in shape: agg writes to agg_keys_by_slot inside the else and to agg_duties outside; contrib writes both inside the else. Pick one style — the contrib version is the safer one (no overwrite when content matches by key).

4. Tests

The new suite covers shutdown, attester happy path, unsupported duty types, proposer, sync contribution, clashes for sync/blocks/proposer, and expiry. Compared to charon/core/dutydb/memory_test.go, the following are missing — I'd add at least the first three before merge:

Aggregator end‑to‑end test (Go's TestMemDBAggregator) — await_agg_attestation and store_agg_attestation are completely uncovered.
ClashingPublicKey — two different pubkeys for the same (slot, commIdx, valIdx).
ClashingAttestationData (and the commIdx=0 source / target variants) — different phase0::AttestationData for the same (slot, commIdx).
Await blocks then resolves: the mem_db_proposer test does this for proposals; consider replicating for attester / aggregator / sync.
Shutdown wakes an already‑waiting await_* — current tests call shutdown before the await; the more interesting case is a waiter parked in select!.
The clashing_sync_contributions test mutates signature to force a different tree‑hash root — it would be more explicit to construct two semantically distinct contributions, since "signature affects root" is only true today and depends on the SSZ schema.

The TestDeadliner helper uses std::sync::Mutex inside async fn add (memory.rs:651-655). The critical section is trivially short so this is fine, but parking_lot::Mutex or restructuring would make it impossible to misuse later.

5. Nits

memory.rs:155-170: missing doc comment on State.
memory.rs:172-184: the MemDB doc could explicitly say "functionally equivalent to charon/core/dutydb.MemDB" — there's a comment in the file header but not on the type.
memory.rs:104-105: consider re‑exporting Result privately; many downstream callers will conflict with std::result::Result if they use pluto_core::dutydb::*.
memory.rs:530-552: the temporary binding let inner = &contrib.0; could go away if SyncContribution had a Deref or a data() accessor — minor.

Summary

One real concern: deadliner.add is awaited under the dutydb write lock (1a) — please move it outside or make it sync.
Behavior matches Go closely; the small parity surprises (unreachable agg clash check, missing per‑call cancellation, deadliner add on rejected duties) are inherited from Charon — flag them in the PR description so we don't pretend they were intentional Rust choices.
Add aggregator + clash tests to bring the suite in line with Go.

Nothing here blocks merge other than (1a) and the missing tests.

Read PR diff and changed files
Cross-reference Charon Go source (dutydb)
Functional correctness review (parity with Go)
Concurrency/locking review
Rust style / API surface review
Test quality review
Post final review
• Branch: dutydb

varex83agent

test

varex83agent

t

varex83agent

t

varex83agent

t

varex83agent

t

varex83agent · 2026-05-15T10:44:03Z

+impl MemDB {
+    /// Creates a new in-memory DutyDB.
+    pub fn new(deadliner: Arc<dyn Deadliner>, cancel: CancellationToken) -> Self {
+        let deadliner_rx = deadliner.c();


Major — silent memory leak if the deadliner channel has already been consumed.

let deadliner_rx = deadliner.c();

Deadliner::c() documents itself as "may only be called once and returns None on subsequent calls" (deadline.rs:114-118). If a caller wires the deadliner into two components (or some intermediate already took the receiver), deadliner_rx is silently None and the drain loop at memory.rs:276-285 short-circuits via None => break on every store(). Expired duties never get cleaned up; the in-memory maps grow without bound for the lifetime of the process — no error, no log.

Recommend one of:

Make MemDB::new fallible: pub fn new(...) -> Result<Self> and return Err when c() returns None.

Take the receiver directly: pub fn new(deadliner, deadliner_rx, cancel) so the wiring is statically obvious.

At minimum, panic with a descriptive message.

Note: the test fixture NoopDeadliner (line 614-621) intentionally returns None, but it also never Adds anything to a queue, so the no-cleanup is safe for tests. Real wiring must yield Some(rx) exactly once.

varex83agent

t

varex83agent · 2026-05-15T10:44:04Z

+    root: phase0::Root,
+}
+
+struct State {


Major — no upper bound on stored entries; resource-exhaustion vector.

State has eight maps (att_duties, att_pub_keys, att_keys_by_slot, pro_duties, agg_duties, agg_keys_by_slot, contrib_duties, contrib_keys_by_slot) whose only cleanup path is the deadliner drain at memory.rs:276-285. There is no cap on (a) how many slots can be tracked simultaneously, (b) how many committees/validators per slot, or (c) how far in the future a slot can be.

With u64 slot space, a peer who can inject duties into the consensus pipeline that feeds store() can submit duties for far-future slots whose deadlines (genesis_time + slot*slot_duration + ...) are far in the future, pinning entries for arbitrary durations. Combined with the deadliner mpsc full-channel-drops behaviour (see other finding), even near-term entries can leak.

Recommend:

A configurable cap on entries per map, with Err at store() time when exceeded.

A maximum permitted (slot - current_head_slot) delta enforced at store().

Sanity bounds on AttesterDuty.committee_length and committees_at_slot (signeddata.rs:1124-1133) which are currently stored verbatim with no validation.

varex83agent

t

varex83agent · 2026-05-15T10:44:05Z

+        }
+
+        // Drain all expired duties that the deadliner has sent.
+        loop {


Major — deadliner full-channel drops cause permanent map leaks.

Cleanup of *_duties / *_keys_by_slot only runs inside this drain loop, consuming from deadliner_rx. The deadliner's output channel is a bounded mpsc of capacity 256 (deadline.rs:421-426). When full, deadline.rs:368-380 does a try_send whose Full arm logs a tracing::warn! and drops the expiry event. The deadliner also removes the duty from its internal HashSet (deadline.rs:383), so the expiry is permanently lost — never retried, never delivered.

Result: the matching entries in att_duties, att_pub_keys, att_keys_by_slot, pro_duties, agg_duties, agg_keys_by_slot, contrib_duties, contrib_keys_by_slot are never cleaned up. They persist until process restart.

Triggers: many duties expiring near-simultaneously across types (a peer who submits duties for hundreds of distinct slots × 4 duty types can produce > 256 expiries in a burst), or any store() call blocking long enough on the write lock for the channel to back up.

Fixes to consider in this PR:

Have MemDB run a dedicated background task that consumes from deadliner_rx independently of store(), so drain rate doesn't depend on call rate.

Replace try_send in the deadliner with a bounded blocking send (with a watchdog for slow consumers).

This affects the upstream deadliner too, so it may warrant a separate ticket; flag it here because the consequence shows up in DutyDB.

varex83agent

t

varex83agent

t

varex83agent · 2026-05-15T10:44:07Z

+                },
+                None => break,
+            };
+            state.delete_duty(expired)?;


store() propagates delete_duty errors that are unrelated to the current store.

loop { let expired = match state.deadliner_rx { Some(ref mut rx) => match rx.try_recv() { Ok(d) => d, Err(_) => break, }, None => break, }; state.delete_duty(expired)?; }

try_recv() removes the duty from the channel before delete_duty is called. If delete_duty returns Err(Error::DeprecatedDutyBuilderProposer) (line 563) or Err(Error::UnknownDutyType) (line 589), the loop terminates with that error propagated to the caller of store(). From the caller's perspective the current store failed — but storage already succeeded — and the expired duty has been silently dropped from the channel without its maps being cleaned up.

Go has the same shape (memory.go:130-147), so this is inherited; but the Rust port can harden it cheaply:

After the successful storage above, do not bubble drain-loop errors to the store() caller — tracing::error! them instead, since the current store has nothing to do with the failure.

The two error variants reachable here are both bugs (BuilderProposer is rejected at line 220 before any deadliner add; UnknownDutyType is unconstructable from Duty::new callers) — error!-and-continue is the right move.

Also minor: Err(_) collapses TryRecvError::Empty (normal break) and TryRecvError::Disconnected (deadliner task died, abnormal). Consider distinguishing them and logging Disconnected at error!.

varex83agent

t

varex83agent · 2026-05-15T10:44:08Z

+            return Err(Error::DeprecatedDutyBuilderProposer);
+        }
+
+        if !self.deadliner.add(duty.clone()).await {


deadliner.add() runs outside the state lock — race past deadline before insert.

if !self.deadliner.add(duty.clone()).await { return Err(Error::ExpiredDuty); } let mut state = self.state.write().await;

Go (memory.go:73-77) holds the mutex across deadliner.Add(duty) and the subsequent storage. Rust calls deadliner.add().await before acquiring the state write lock.

For the in-tree DeadlinerImpl and the test fakes this is benign (add is idempotent and concurrency-safe). The real risk: if add() succeeds and the .await between these two lines exceeds the duty's remaining lifetime (mainnet margin is slot_duration/12 ≈ 1s), the deadliner can schedule and fire an expiry before the data has been inserted. The drain loop will then no-op (HashMap::remove on absent keys is fine), then the subsequent insert under the write lock will store data that is already past its deadline, with no further cleanup until another store() covers that slot.

Not directly exploitable to cause memory growth (one duty leaked until the next call covering that slot), but it does widen the leak window from the deadliner-full finding. Either acquire the write lock first and call add() inside the critical section (matches Go), or document the intentional deviation in a source comment.

varex83agent

t

varex83agent · 2026-05-15T10:44:10Z

+                    Some(UnsignedDutyData::Proposal(p)) => state.store_proposal(p)?,
+                    Some(_) => return Err(Error::InvalidVersionedProposal),
+                }
+                self.pro_notify.notify_waiters();


notify_waiters thundering herd — O(N) wakeups per store vs Go's O(M) per matching query.

store() calls self.X_notify.notify_waiters() on each successful arm (lines 240, 250, 260, 270). All readers blocked on the same notify type wake up, race for the read lock, re-run their lookup closure, and only the matching one returns; the rest re-register on a fresh notified() and re-sleep.

Go (memory.go:551-635 resolve*QueriesUnsafe) keeps one query record per pending await and only resolves queries whose key actually matches the newly stored item — O(M) per store where M is the number of pending queries.

For a validator running many concurrent attesters/aggregators this is a real amplification on every store(). Not a correctness bug, but a measurable efficiency regression vs the Go design. Long-term fix: keyed notify (HashMap<Key, Vec<oneshot::Sender<V>>>) or match Go's per-query record + resolver pattern. At minimum, note the design choice in a source comment so a future optimisation pass knows what to revisit.

mskrzypkows · 2026-05-15T15:55:51Z

@varex83agent review again

iamquang95 · 2026-05-18T10:01:05Z

+        if let Some(existing) = self.att_duties.get(&att_key) {
+            if existing.source != data.source
+                || existing.target != data.target
+                || existing.beacon_block_root != data.beacon_block_root


missing data.index and data.slot

It's missing because we create att_key from slot and index, so there is no reason to compare it.

But the key is from committee_index, that's no guarantee on att.data.index == att.duty.committee_index right?

varex83agent

Solid first pass at porting core/dutydb/memory.go. The functional behaviour matches Charon for the happy paths and the test coverage hits most of the Go scenarios. Three findings push this to request-changes:

store() holds the write lock across deadliner.add().await — a multi-await section under tokio RwLock that blocks all readers/writers behind a channel round-trip to the deadliner background task.
Drop for MemDB cancels a caller-supplied CancellationToken — diverges from Go's Shutdown semantics and will propagate cancellation to every sibling subsystem that shares the token.
store_agg_attestation unconditionally overwrites the stored VersionedAggregatedAttestation even though only the AttestationData root is in the key — fields like validator_index and the signed payload variant can differ between callers with the same root, and last-write-wins. Combined with the dead Error::ClashingDataRoot variant this is a behaviour change relative to Charon.

Minor parity gaps: await_agg_attestation drops the slot parameter from the Go API, await_* methods have no per-call cancellation (Go takes ctx.Context), AggKey drops the slot field. Several test assertions use fragile err.to_string().contains(...) instead of matches!.

Nothing here is a slashing-risk regression (the commIdx=0 compat path's source/target-only check is the slashing-safe choice). Verdict: REQUEST_CHANGES on the three majors; the rest are minor/nit comments left for author discretion.

varex83agent · 2026-05-18T12:49:16Z

+
+        let mut state = self.state.write().await;
+
+        if !self.deadliner.add(duty.clone()).await {


Write lock held across deadliner.add().await — readers and writers are starved on every store.

Line 258 takes self.state.write().await. Line 260 then awaits self.deadliner.add(duty.clone()).await. DeadlinerImpl::add (crates/core/src/deadline.rs:258-274) does an mpsc::Sender::send().await plus a oneshot::Receiver::await against a single background task — that's two awaits while the write lock is held. Under tokio's writer-preferring RwLock this blocks ALL concurrent await_proposal / await_attestation / pub_key_by_attestation / other store calls until the deadliner task drains its mpsc input and replies.

Charon's deadliner.Add (charon/core/dutydb/memory.go:76) is synchronous and non-blocking, so holding the equivalent db.mu.Lock() across it is cheap. The Rust port turns a fast critical section into a multi-hop async operation under the lock.

Fix: call the deadliner before acquiring the write lock —

if !self.deadliner.add(duty.clone()).await { return Err(Error::ExpiredDuty); } let mut state = self.state.write().await;

This preserves parity with Go (Add then mutate) and keeps the in-memory mutation in a fast non-awaiting critical section.

varex83agent · 2026-05-18T12:49:16Z

+
+impl Drop for MemDB {
+    fn drop(&mut self) {
+        self.cancel.cancel();


Drop cancels a caller-supplied CancellationToken — propagates beyond the dutydb.

MemDB::new accepts cancel: CancellationToken from the caller (line 219). The whole point of CancellationToken is shareability — that token is almost always a clone of an app-wide shutdown token wired to the deadliner background task and potentially other subsystems (p2p, relay, …). Drop unconditionally calls self.cancel.cancel() on line 424.

If a MemDB is dropped before the explicit shutdown — construction error in a sibling, panic during init, replaced in a test fixture, error path that unwinds the stack — this Drop kills every component that shares this token, including the deadliner the MemDB references.

Charon's Shutdown only closes an internal chan struct{} (charon/core/dutydb/memory.go:67-69); dropping the DB cannot affect external components.

Pick one:

Take the parent CancellationToken and store cancel.child_token() internally — Drop cancels only the child.

Remove Drop and rely on the explicit shutdown() (whose doc already implies it is the lifecycle hook).

Document the contract on new if option 2 is undesirable.

Option 1 is the most defensive and matches the pattern used elsewhere in the codebase.

varex83agent · 2026-05-18T12:49:16Z

+        }
+        // we don't check existingDataRoot != providedDataRoot because these values
+        // comes from the same source and the error was unreachable
+        self.agg_duties.insert(key, agg.clone()); // unconditional overwrite


Unconditional overwrite + dead Error::ClashingDataRoot — Go-divergent for fields that aren't in the tree-hash key.

store_agg_attestation keys by att_data.tree_hash_root() (line 580/584) but the stored value is the full VersionedAggregatedAttestation, which carries validator_index and an AttestationPayload variant (Phase0/Altair/.../Fulu). Neither field is part of the AttestationData tree-hash root, so two callers with the same root but a different validator_index (or a different payload variant — e.g. one producer emits an Electra payload, another a Fulu) will silently overwrite each other; the last store_agg_attestation wins for every awaiter on that root.

The comment at lines 588-589 ("these values come from the same source and the error was unreachable") justifies removing the clash check, but the justification only holds for the root-only equality — not for the payload fields stored alongside.

Charon (charon/core/dutydb/memory.go:435-466) computes providedData.HashTreeRoot() and existingData.HashTreeRoot() and returns errors.New("clashing data root") if they differ. The Rust port has the analogous variant Error::ClashingDataRoot (memory.rs:91) but it is never constructed anywhere in the workspace — dead code.

Please either (a) restore an equality check against the full stored payload (signatures included) so two callers that disagree see an error, or (b) document why payload differences are guaranteed never to occur in Pluto's producer wiring AND remove Error::ClashingDataRoot to avoid a dead public variant.

varex83agent · 2026-05-18T12:49:17Z

+    };
+
+    /// Deadliner that always accepts duties and never expires them.
+    pub(crate) struct NoopDeadliner;


nit: pub(crate) on NoopDeadliner (line 681), TestDeadliner (line 697), and its methods (lines 704, 714) is redundant — they live inside a private mod tests, so nothing outside this file's test module can reach them anyway. Drop the modifiers.

varex83agent · 2026-05-18T12:49:17Z

+    /// Stores unsigned duty data for the given duty, waking any pending
+    /// waiters.
+    pub async fn store(&self, duty: Duty, unsigned_set: UnsignedDataSet) -> Result<()> {
+        if duty.duty_type == DutyType::BuilderProposer {


nit: BuilderProposer short-circuits BEFORE deadliner.add (line 255-256), but Charon's Store hits case core.DutyBuilderProposer: return errDeprecated (charon/core/dutydb/memory.go:95-96) AFTER deadliner.Add(duty) on line 76. The duty type is deprecated so this rarely matters, but it is a documented order-of-operations divergence. Either move the check inside the match arm after the deadliner call (to match Go) or leave a comment that the pre-empt is intentional.

varex83agent · 2026-05-18T12:49:17Z

+
+/// Versioned aggregated attestation (unsigned).
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct VersionedAggregatedAttestation(pub versioned::VersionedAttestation);


nit: VersionedAggregatedAttestation(pub versioned::VersionedAttestation) and SyncContribution(pub altair::SyncCommitteeContribution) (line 1160) both expose their inner field as pub, so callers reach in via .0 (e.g. contrib2.0.aggregation_bits = ... in the dutydb tests). Public fields on newtype wrappers defeat the wrapper's purpose. Either drop the wrappers and use the inner type with a pub type alias, or keep them and expose pub fn inner(&self) -> &T / into_inner(self) -> T with the field private.

iamquang95

other part lgtm

iamquang95 · 2026-05-20T11:30:46Z

+
+impl MemDB {
+    /// Creates a new in-memory DutyDB.
+    /// cancel: cancellation token that shuts down the DB, it should be a child


Should we store cancel.child_token instead?

Hmm, yes seems good idea, it will guarantee that we won't cancel global one.

iamquang95 · 2026-05-20T11:37:11Z

+        if let Some(existing) = self.att_duties.get(&att_key) {
+            if existing.source != data.source
+                || existing.target != data.target
+                || existing.beacon_block_root != data.beacon_block_root


But the key is from committee_index, that's no guarantee on att.data.index == att.duty.committee_index right?

varex83agent

Third-pass multi-agent review of #415. The PR already has ~60 comments from two prior /review-pr runs and three human reviewers, so this pass is filtered to novel findings only — no duplicates of issues already raised. All findings here are minor- or nit-level; the prior CHANGES_REQUESTED verdicts already cover the bug-level items.

Themes in this pass:

Cancellation gap on the reader side — state.read().await parks readers with no shutdown escape if a writer hangs (compounds with the already-flagged writer-side lock-across-await issue).
Spurious notify_waiters cascades — empty unsigned_sets still fire wakeups and consume deadliner slots; tighten with an early-return.
Error variant overloading — Error::InvalidAggregatedAttestation and SignedDataError::Custom each conflate distinct failure modes, hurting operator triage and source-chain visibility.
Silent overwrite of wrapper fields in store_agg_attestation (aggregation_bits, signature, validator_index, payload variant) — only the AttestationData tree-hash root is checked.
Compat namespace asymmetry — the relaxed clash check in store_att_compat_commidx0 lets a compat-mirror entry poison the real committee_index=0 slot.
Test fixture monoculture — every aggregator/proposer test uses Phase0 payloads, leaving post-merge match arms (Bellatrix/Deneb/Electra/Fulu) dead in CI.

Verdict: COMMENT (no new bug/major items above what's already been raised — leaving the prior CHANGES_REQUESTED standing).

varex83agent · 2026-05-20T13:40:21Z

+            tokio::pin!(notified);
+
+            {
+                let state = self.state.read().await;


Reader-side shutdown DoS: state.read().await ignores self.cancel.

Both await_data (here) and pub_key_by_attestation (line 426) park unconditionally on self.state.read().await before entering any select! on self.cancel. If a writer holds the write lock for an extended period — most plausibly via the already-flagged deadliner.add().await pattern at lines 273-275 — every reader is stuck on the lock with no shutdown escape.

When shutdown() is called, cancel.cancel() wakes futures inside the tokio::select! on line 410, but not futures still parked on RwLock::read().await. tokio's RwLock is not cancellation-driven; it only wakes on lock release.

The shutdown contract documented at lines 259-260 ("signalling all current and future await_* calls to return Error::Shutdown") is violated whenever a writer hangs. The underlying writer-side fix (release the lock before any .await) is the right long-term solution, but defending the reader independently is cheap:

let state = tokio::select! { biased; _ = self.cancel.cancelled() => return Err(Error::Shutdown), s = self.state.read() => s, };

Apply uniformly to await_data and pub_key_by_attestation. Defense-in-depth that survives any future writer-side regression.

varex83agent · 2026-05-20T13:40:21Z

+                    Some(UnsignedDutyData::Proposal(p)) => state.store_proposal(p)?,
+                    Some(_) => return Err(Error::InvalidVersionedProposal),
+                }
+                self.proposer_notify.notify_waiters();


notify_waiters fires even when nothing was inserted — wakes everyone for a no-op store.

When unsigned_set is empty for DutyType::Proposer, the match at line 284 takes the None arm and skips storage, but self.proposer_notify.notify_waiters() still runs here. All parked await_proposal futures wake, re-acquire the read lock, find nothing, re-park. The same shape applies to Attester/Aggregator/SyncContribution arms (lines 299, 309, 319) — empty unsigned_set means the for loop body never runs but the notify still fires.

Also: the empty-set branch silently consumes a deadliner slot at line 275 for a duty that contributes no data, which compounds the already-flagged deadliner-full leak.

Cheapest fix: early-return at the top of store() with if unsigned_set.is_empty() { return Ok(()); } before deadliner.add(). Or, less invasive, gate each notify on a bool changed tracked inside the arm. Either avoids cascading wake/re-park cycles on no-op stores (which happen in retry paths).

varex83agent · 2026-05-20T13:40:21Z

+    }
+
+    fn store_agg_attestation(&mut self, agg: &VersionedAggregatedAttestation) -> Result<()> {
+        let att_data = agg.data().ok_or(Error::InvalidAggregatedAttestation)?;


Error::InvalidAggregatedAttestation overloads two unrelated failure modes.

Line 615 returns Error::InvalidAggregatedAttestation when agg.data() returns None — the inner versioned::VersionedAttestation.attestation field is absent (per signeddata.rs:174-176). But the same variant is returned at line 305 in store() for a different reason: the outer UnsignedDutyData variant was wrong.

Go's analogous failure mode (memory.go:413-415) covers only the wrapper-type mismatch — the eth2 client lib's VersionedAttestation has a recognised Version always, so missing-payload isn't an issue there. The Pluto wrapper allows attestation: None, which is a new failure mode without a Go analogue.

Operators reading logs or pattern-matching on Error can't tell the two cases apart. Split: Error::AggregatedAttestationMissingPayload for the line-615 case, or — if missing-payload is a valid "nothing to store" sentinel — early-return Ok(()) instead of erroring.

varex83agent · 2026-05-20T13:40:21Z

+        }
+        // we don't check existingDataRoot != providedDataRoot because these values
+        // come from the same source and the error was unreachable
+        self.aggregation_duties.insert(key, agg.clone()); // unconditional overwrite


store_agg_attestation silently overwrites wrapper fields that aren't in the tree-hash root.

The comment at lines 627-628 ("values come from the same source") is correct only for AttestationData. The unconditional insert here replaces fields not covered by AttestationData::tree_hash_root():

aggregation_bits (different aggregators may include different bits)

signature (each aggregator produces its own)

validator_index in the outer versioned::VersionedAttestation

The AttestationPayload enum variant (Phase0 vs Altair vs Deneb vs Electra vs Fulu — same AttestationData root could be wrapped in different variants at a fork boundary)

Go (memory.go:462) also overwrites unconditionally, but Go's stored type is the eth2 client lib's wrapper which is largely stateless. A waiter blocked on a root could be resolved by an aggregator whose attestation has different aggregation_bits than what the caller expected.

Low-effort fix: add a doc comment on store_agg_attestation documenting that ONLY the AttestationData tree-hash root is checked. Better: tracing::warn! when overwrite changes wrapper fields (especially signature or aggregation_bits).

varex83agent · 2026-05-20T13:40:21Z

+
+        match duty.duty_type {
+            DutyType::Proposer => {
+                if unsigned_set.len() > 1 {


Pre-flight validation runs after acquiring the write lock and after deadliner.add() — cheap latency win to hoist it.

Line 273 grabs self.state.write().await, line 275 awaits self.deadliner.add(...) (multi-hop async against a background task), and only then line 281 validates unsigned_set.len() > 1. The length check is a pure caller-side invariant — no state required. Same applies to:

the wildcard _ => return Err(Error::UnsupportedDutyType) at line 321 (rejects 9 reachable variants per the mem_db_store_unsupported test)

the BuilderProposer arm already pre-empted at lines 269-271

Go (memory.go:82-84) also performs the check inside the switch, but Go's deadliner.Add is synchronous; the cost is trivial there. In Rust the deadliner round-trip can block the write-lock holder for tokio-scheduler-noticeable time, and unsupported duty types pollute the deadliner queue (their later expiry hits delete_duty's _ => return Err(Error::UnknownDutyType) arm and contaminates the next successful store via the drain loop on line 326 — distinct from but compounded by the already-flagged delete_duty propagation finding).

Hoist all pre-flight validations (unsupported types, len > 1, BuilderProposer) above line 273 so invalid requests never grab the write lock or the deadliner.

Test gap: mem_db_store_unsupported (line 911-942) uses NoopDeadliner, which discards everything. A TestDeadliner + expire() test would reproduce the cross-store contamination.

varex83agent · 2026-05-20T13:40:21Z

+            slot,
+            committee_index: 0,
+        };
+        if let Some(existing) = self.attestation_duties.get(&att_key0) {


store_att_compat_commidx0 shares attestation_duties's namespace with the relaxed clash check, letting a compat-mirror entry poison the real committee_index=0 slot.

store_att_data (line 521-535) rejects source/target/beacon_block_root mismatches. store_att_compat_commidx0 only compares source and target (by design, per the comment at lines 544-547). The asymmetry changes the meaning of clashes on the (slot, comm_idx=0) namespace because both paths write to the same map.

Concrete failure mode:

Validator V at slot S attests with duty.committee_index=5, beacon_block_root=A. The compat path inserts AttKey{S, 0} → data_A with no clash.

A second VC (or the same VC retrying with an updated head) then stores genuinely at committee_index=0 with beacon_block_root=B (same source/target — head shifted by one block during a late-slot store, common in practice).

store_att_data(S, 0, data_B) now fails with Error::ClashingAttestationData because existing.beacon_block_root != data.beacon_block_root.

The compat entry has poisoned the legitimate comm_idx=0 slot. Go has the same shape (memory.go:347-403), but the Rust port can harden it cheaply: store compat entries in a separate map (e.g. att_compat_commidx0: HashMap<(slot, val_idx), AttestationData>) instead of sharing attestation_duties. await_attestation(slot, 0) then falls back to the compat map only on a miss in the real map. The shared-namespace design conflates 'real comm=0 attestation' with 'compat mirror' — they should be logically distinct.

varex83agent · 2026-05-20T13:40:22Z

+
+        // Unlike Go implementation, we key by root only, slot field is redundant.
+        let key = AggKey { root };
+        if !self.aggregation_duties.contains_key(&key) {


nit: contains_key + unconditional insert triggers clippy::map_entry and double-hashes key.

The intent — gate only the secondary aggregation_keys_by_slot.push on absence while overwriting the primary map unconditionally — doesn't fit the standard Entry API. Rewrite to hash once and document the intent locally:

let first_seen = !self.aggregation_duties.contains_key(&key); self.aggregation_duties.insert(key, agg.clone()); if first_seen { self.aggregation_keys_by_slot.entry(slot).or_default().push(key); }

Same two map ops, but the variable name (first_seen) documents why the secondary push is conditional.

varex83agent · 2026-05-20T13:40:22Z

+            true
+        }
+
+        fn c(&self) -> Option<Receiver<Duty>> {


nit: NoopDeadliner::c() violates the documented Deadliner::c() contract ("may only be called once and returns None on subsequent calls", deadline.rs:114-118).

fn c(&self) -> Option<Receiver<Duty>> { let (_, rx) = channel(1); Some(rx) }

This returns a fresh Some(rx) on every call (with the sender dropped immediately, so the receiver yields Disconnected straight away). A fake that lies about the contract masks real misuse — if MemDB::new is ever rewritten to assume c() returns None on the second call, the tests will keep passing while production breaks.

Fix: track the taken-state like TestDeadliner does (rx: Mutex<Option<Receiver<Duty>>> + .lock().unwrap().take()), or have a single shared (tx, rx) pair owned by the fake.

varex83agent · 2026-05-20T13:40:22Z

+        slot: u64,
+        committee_index: u64,
+        validator_index: u64,
+    ) -> Result<PubKey> {


nit: PkKey field list appears twice in pub_key_by_attestation.

state .attestation_pub_keys .get(&PkKey { slot, committee_index, validator_index }) .copied() .ok_or(Error::PubKeyNotFound { slot, committee_index, validator_index })

Bind once at the top — matches the let key = AttKey { ... } / let key = ContribKey { ... } shape already used in the other await_* methods (lines 346, 360, 377):

let key = PkKey { slot, committee_index, validator_index }; let state = self.state.read().await; state .attestation_pub_keys .get(&key) .copied() .ok_or(Error::PubKeyNotFound { slot, committee_index, validator_index })

varex83agent · 2026-05-20T13:40:22Z

+        const COMM_IDX: u64 = 3;
+        const V_IDX: u64 = 7;
+
+        let agg = agg_attestation_fixture(SLOT, COMM_IDX, V_IDX);


nit: Test fixtures are Phase0-only — Altair/Bellatrix/Capella/Deneb/Electra/Fulu payload arms are dead in CI.

agg_attestation_fixture (lines 1020-1042) hardcodes version: DataVersion::Phase0 and AttestationPayload::Phase0. The match arms in AttestationPayload::data() for the other six fork variants are never exercised, despite being on the hot path through store_agg_attestation (line 615) and ProposalBlock::root()/slot() (signeddata.rs:260-293).

Go's TestMemDBAggregator (memory_test.go:227-255) explicitly uses RandomDenebCoreVersionedAggregateAttestation, and TestMemDBProposer uses RandomBellatrixBeaconBlock. Add at least one agg_attestation_fixture_electra(...) and one bellatrix_proposal(...) fixture, and re-run mem_db_aggregator/mem_db_proposer/mem_db_clash_proposer against them. Particularly relevant because Electra/Fulu wrap a different concrete electra::Attestation type than the Phase0 family.

ai

mskrzypkows added 7 commits May 13, 2026 15:42

Dutydb first attempt

9e54547

style improvements

d484a3b

await_data method

33c0a21

rwlock instead of mutex

cfc77a6

renamed variables

d4e9e4a

State struct methods

5303946

improved test

0dce2ad

mskrzypkows changed the title ~~Dutydb~~ feat: dutydb May 14, 2026

mskrzypkows added 7 commits May 14, 2026 15:44

Locking state after adding duty to the deadliner in store function

fd6b97d

fixed Dead clash check in store_agg_attestation

d62c106

store_agg_attestation inserting only when not slot present

84e8fe3

fmt

90bcb60

return early on DeprecatedDutyBuilderProposer

cb8a5a6

redundant enable

5719029

removed Option from the State, deadliner field.

923d9cb

varex83 linked an issue May 15, 2026 that may be closed by this pull request

Implement core/dutydb #171

Open

comment ported from charon

aab5845

varex83agent reviewed May 15, 2026

View reviewed changes

Comment thread crates/core/src/dutydb/memory.rs Outdated

varex83agent reviewed May 15, 2026

View reviewed changes

Comment thread crates/core/src/dutydb/memory.rs Outdated

varex83agent reviewed May 15, 2026

View reviewed changes

better implementation of the #1

a7d8a16

mskrzypkows marked this pull request as ready for review May 15, 2026 15:13

mskrzypkows requested review from iamquang95, varex83 and varex83agent May 15, 2026 15:13

mskrzypkows added 7 commits May 15, 2026 17:16

Impl Drop for MemDB

6836a4c

fixed comment 2

9933d7e

More detailed errors

fbd66ec

fix comment 11

63dc968

fix unneeded pub(crate)

9f5284f

comment

5f62b23

additional tests

b7946d5

iamquang95 reviewed May 18, 2026

View reviewed changes

varex83agent previously requested changes May 18, 2026

View reviewed changes

therustmonk reviewed May 18, 2026

View reviewed changes

Comment thread crates/core/src/dutydb/memory.rs Outdated

therustmonk reviewed May 18, 2026

View reviewed changes

Comment thread crates/core/src/dutydb/memory.rs Outdated

mskrzypkows added 8 commits May 19, 2026 08:58

Merge branch 'main' of github.com:NethermindEth/pluto into dutydb

249bd82

removed unused, renamed variables

a17b5e9

comment fixed

0628163

more comments fixed

2db5faf

comment updated

eefe1c9

more uses, removed export o Result

dd3a513

Checkpoint::default()

f113de9

renamed variables

2b2fe1f

mskrzypkows requested review from iamquang95 and therustmonk May 20, 2026 10:33

iamquang95 approved these changes May 20, 2026

View reviewed changes

varex83agent reviewed May 20, 2026

View reviewed changes


		let mut state = self.state.write().await;

		if !self.deadliner.add(duty.clone()).await {

Conversation

mskrzypkows commented May 14, 2026

Uh oh!

mskrzypkows commented May 14, 2026

Uh oh!

claude Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review — Pluto DutyDB (PR #415)

1. Correctness / parity with Charon

2. Concurrency model

3. API / style

4. Tests

5. Nits

Summary

Uh oh!

varex83agent left a comment

Choose a reason for hiding this comment

Uh oh!

varex83agent left a comment

Choose a reason for hiding this comment

Uh oh!

varex83agent left a comment

Choose a reason for hiding this comment

Uh oh!

varex83agent left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

varex83agent left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

varex83agent left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

varex83agent left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

varex83agent left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

varex83agent left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

varex83agent left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

varex83agent left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mskrzypkows commented May 15, 2026

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

varex83agent left a comment

Choose a reason for hiding this comment

Uh oh!

claude Bot commented May 14, 2026 •

edited

Loading