fix(consensus): centralize reward bookkeeping in apply path (fork-gated) #782
Root cause of the 2026-06-03 testnet 4-way state drift (only PROTOCOL_TREASURY diverged; every other account was identical): reward + liveness + epoch-record bookkeeping was done in 5 separate network/finalize receive paths (libp2p gossip-apply / peer-apply / catch-up-sync + main.rs validator-finalize ×2), not in the deterministic block-apply path. Path coverage is uneven, so a block's reward was applied a per-node-variable number of times → pending_rewards / delegator_rewards (and thus PROTOCOL_TREASURY, their sum) drifted → state_root divergence. Block-hash agreement masked it, because state_root is excluded from the block hash.

Fix: run the bundle (record_block_signatures + distribute_reward + epoch_manager.record_block) exactly once inside apply_block_pass2, keyed off the committed block's justification, before update_trie_for_block so it lands in this block's state_root. Every node applies it identically regardless of how the block arrived.

Fork-gated by the NATIVE-style gate REWARD_APPLY_PATH_HEIGHT (default u64::MAX on both nets):
- pre-fork: the 5 external sites run it (bit-identical to today)
- post-fork: apply_block_pass2 runs it once; the 5 external sites skip it

The boundary is clean: block H-1 is handled via the external sites and block H via the apply path, each exactly once. run_epoch_bookkeeping (epoch-boundary rotation) is intentionally left on the receive paths; it is out of scope for this fork.

Consensus-changing → stays off until an activation height is pinned (halt-all + state-root-aligned + simul-start). Does NOT itself re-converge a drifted chain; that is a separate recovery (cp canonical chain.db).

Tests: fork gate default-disabled on both nets and activates at the pinned height; apply-path helper credits pending_rewards (each call = one distribution, so the caller must run it once); no-op without justification.
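The masking effect described above (identical block hashes, divergent state_root) can be reproduced with a toy model. This is a minimal sketch under stated assumptions: `Header`, `Block`, `block_hash`, and `state_root_after` are hypothetical stand-ins rather than the project's types, and `DefaultHasher` over a `BTreeMap` stands in for the real block hash and trie commitment.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

// Toy model (hypothetical types, not the real block or trie structures):
// the block hash covers header fields only, so nodes can agree on every
// block hash while their state roots differ.

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

#[derive(Hash, Clone)]
pub struct Header {
    pub height: u64,
    pub parent_hash: u64,
    pub tx_root: u64,
}

pub struct Block {
    pub header: Header,
    /// Carried with the block but NOT part of the hashed header.
    pub state_root: u64,
}

/// Hashes the header only; state_root never enters the block hash.
pub fn block_hash(block: &Block) -> u64 {
    hash_of(&block.header)
}

/// State root of a node that applied one block's reward `applications`
/// times. The escrowed reward sits in PROTOCOL_TREASURY, so each extra
/// application changes that one balance and nothing else.
pub fn state_root_after(applications: u64, reward: u64) -> u64 {
    let mut balances: BTreeMap<&str, u64> = BTreeMap::new();
    balances.insert("alice", 1_000);
    balances.insert("PROTOCOL_TREASURY", applications * reward);
    hash_of(&balances)
}
```

Two nodes that applied the same block's reward once and twice agree on `block_hash` but not on `state_root`, and in this model the only differing balance is `PROTOCOL_TREASURY`, mirroring the single-account divergence described above.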
Codecov Report: ✅ All modified and coverable lines are covered by tests.
…gate

Extends the apply-path bookkeeping fix to the other half of the bundle: run_epoch_bookkeeping (epoch-boundary active-set rotation, unbonding release, liveness slashing) was also called from all 5 network/finalize receive paths. It is NOT idempotent (it advances epoch_number, pushes history, and slashes), so a per-node-variable application count corrupts epoch_state (trie-committed) and double-slashes. This is the same multipath drift class as the reward bundle.

Post REWARD_APPLY_PATH_HEIGHT, it runs once in apply_block_pass2, right after the reward bundle (so it sees fresh liveness counts and the pre-rotation active_set, matching the external ordering) and before update_trie_for_block. The 5 external run_epoch_bookkeeping calls are gated to pre-fork only. One gate covers the full per-block bookkeeping (reward + liveness + epoch + slashing), so it all activates atomically and coherently.
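A minimal sketch of why one gate makes a non-idempotent step deterministic, assuming a simplified model: `EpochState`, `process_boundary_block`, the flat slash amount, and the `height >= fork_height` activation rule are hypothetical, and every processed block is treated as an epoch boundary.

```rust
// Toy model (hypothetical names/types): a fork-height gate turns a
// non-idempotent step into an exactly-once-per-block step.

/// Default gate value on both nets: u64::MAX, i.e. effectively never active.
pub const REWARD_APPLY_PATH_HEIGHT: u64 = u64::MAX;

/// Post-fork iff the block height has reached the activation height.
pub fn apply_path_active(height: u64, fork_height: u64) -> bool {
    height >= fork_height
}

#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct EpochState {
    pub epoch_number: u64,
    pub history: Vec<u64>,
    pub slashed_stake: u64,
}

/// NOT idempotent: each call pushes history, advances the epoch, and slashes.
pub fn run_epoch_bookkeeping(state: &mut EpochState, slash_amount: u64) {
    state.history.push(state.epoch_number);
    state.epoch_number += 1;
    state.slashed_stake += slash_amount;
}

/// Processes one epoch-boundary block on a node whose network/finalize
/// paths handled it `receive_path_hits` times (this count varies per node).
pub fn process_boundary_block(
    state: &mut EpochState,
    height: u64,
    fork_height: u64,
    receive_path_hits: u32,
) {
    let post_fork = apply_path_active(height, fork_height);
    // External receive/finalize sites: pre-fork only, once per hit.
    if !post_fork {
        for _ in 0..receive_path_hits {
            run_epoch_bookkeeping(state, 10);
        }
    }
    // apply_block_pass2: post-fork only, exactly once per block.
    if post_fork {
        run_epoch_bookkeeping(state, 10);
    }
}
```

Pre-fork, two nodes whose receive paths handled the boundary block once and twice end with different `epoch_number` values and double the slashing; post-fork, both run the step exactly once. With activation at height H, block H-1 takes the external path and block H the apply path, so no block is handled by both or by neither.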
Extended (commit 61d4a2e): the same gate now also centralizes
CodeRabbit: Review failed. The pull request is closed.
Configuration: .coderabbit.yaml (review profile: ASSERTIVE). Files ignored due to path filters: 2. Files selected for processing: 6.
Walkthrough: This PR introduces a fork-height gate
Estimated code review effort: 4 (Complex), ~60 minutes.
Draft — do not merge/deploy/activate. Consensus-changing; ships off-by-default.
The real fix for the reward-escrow determinism bug that caused the 2026-06-03 testnet 4-way state drift (root cause in the `audits/reward-distribution-flow-audit-2026-04-27.md` lineage).

Root cause
Reward + liveness + epoch-record bookkeeping ran in 5 separate network/finalize receive paths (`libp2p_node.rs` gossip-apply / peer-apply / catch-up-sync + `main.rs` validator-finalize ×2), not in the deterministic block-apply path. Uneven path coverage → a block's reward applied a per-node-variable number of times → `pending_rewards` / `delegator_rewards` (and `PROTOCOL_TREASURY` = their sum) drift → `state_root` divergence. Confirmed on the stalled testnet: only `PROTOCOL_TREASURY` diverged across all 4 validators; every other account was byte-identical.

Fix
Run the bundle (`record_block_signatures` + `distribute_reward` + `epoch_manager.record_block`) exactly once inside `apply_block_pass2`, keyed off the committed block's justification, before `update_trie_for_block` (so it lands in this block's `state_root`). Every node applies it identically regardless of receive path. The new `apply_reward_bookkeeping_for_latest_block()` mirrors the external sites exactly (proposer, justification precommit `stake_weight`, `get_block_reward()`, `fee_share = 0`). Bundle order is benign (3 independent structures).

Fork gate: `REWARD_APPLY_PATH_HEIGHT` (default `u64::MAX`, both nets)
- pre-fork: the 5 external sites run it (bit-identical to today)
- post-fork: `apply_block_pass2` runs it once; the 5 external sites skip it

`run_epoch_bookkeeping` (epoch-boundary rotation) intentionally stays on the receive paths; it is out of scope for this fork.

Scope / safety
Does not itself re-converge an already-drifted chain; that is a separate recovery (cp canonical `chain.db`). It prevents future drift.

Files
- `fork_heights.rs`: gate + tests
- `blockchain.rs`: wrapper
- `block_executor.rs`: apply-path hook + `apply_reward_bookkeeping_for_latest_block` + tests
- `libp2p_node.rs` ×3, `main.rs` ×2: gate external sites to pre-fork only

Tests
- Gate default-disabled on both nets; activates at the pinned height.
- Apply-path helper credits `pending_rewards` (each call = one distribution → caller must run it once per block).
- No-op without justification.
- Existing SRC-20/NFT/staking/fingerprint suites unchanged.

Commands
- `cargo test --workspace` ✅ (67 suites, 0 fail)
- `cargo test -p sentrix-nft` ✅
- `cargo fmt --check` ✅
- `cargo clippy --workspace --all-targets -- -D warnings` ✅

No deploy, no activation, no fork enabled.
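The apply-path helper described under Fix, and the behaviour its tests check (one distribution per call, no-op without justification), can be sketched as follows. This is hypothetical illustration only: `Precommit`, `Justification`, `CommittedBlock`, `Bookkeeping`, `apply_reward_bookkeeping`, and the "credit the whole reward to the proposer" rule are assumptions, not the real `apply_reward_bookkeeping_for_latest_block`.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch of a once-per-block reward bundle on the apply path.
// Names, fields, and the proposer-credit rule are illustrative assumptions.

pub struct Precommit {
    pub validator: String,
    pub stake_weight: u64,
}

pub struct Justification {
    pub precommits: Vec<Precommit>,
}

pub struct CommittedBlock {
    pub proposer: String,
    pub justification: Option<Justification>,
}

#[derive(Default)]
pub struct Bookkeeping {
    pub signed_blocks: BTreeMap<String, u64>,   // liveness (record_block_signatures)
    pub signed_stake: u64,                      // total precommit stake recorded
    pub pending_rewards: BTreeMap<String, u64>, // escrow (distribute_reward)
    pub epoch_blocks: u64,                      // epoch record (record_block)
}

/// Runs the three independent bookkeeping steps for one committed block.
/// Every call is one distribution, so the caller must invoke it exactly
/// once per block. Returns false and changes nothing without a justification.
pub fn apply_reward_bookkeeping(
    bk: &mut Bookkeeping,
    block: &CommittedBlock,
    block_reward: u64,
    fee_share: u64,
) -> bool {
    let justification = match &block.justification {
        Some(j) => j,
        None => return false,
    };
    // 1. Liveness: one signed block per precommitting validator.
    for pc in &justification.precommits {
        *bk.signed_blocks.entry(pc.validator.clone()).or_insert(0) += 1;
        bk.signed_stake += pc.stake_weight;
    }
    // 2. Reward escrow: credited to the proposer (illustrative rule only).
    *bk.pending_rewards.entry(block.proposer.clone()).or_insert(0) += block_reward + fee_share;
    // 3. Epoch record: count the block toward the current epoch.
    bk.epoch_blocks += 1;
    true
}
```

The three steps touch disjoint fields, which is why their order inside the bundle does not matter in this sketch. Calling the function twice for the same block double-credits `pending_rewards`, which is the failure a single apply-path call site avoids.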