Skip to content

fix: stabilize BFT block application and sync#769

Merged
github-actions[bot] merged 1 commit into
mainfrom
fix/bft-block-application-sync
Jun 3, 2026
Merged

fix: stabilize BFT block application and sync#769
github-actions[bot] merged 1 commit into
mainfrom
fix/bft-block-application-sync

Conversation

@satyakwok
Copy link
Copy Markdown
Collaborator

@satyakwok satyakwok commented Jun 3, 2026

Summary

Risk tier

Check ONE:

  • 🟢 Low — docs, tools/, tests/, CI configs, dependency patch bumps in dev-only crates, comments
  • 🟡 Medium — non-consensus production code (RPC handlers, network plumbing, observability, ops scripts)
  • 🟠 High — consensus-critical crates (sentrix-core, sentrix-trie, sentrix-staking, sentrix-bft), block_executor, apply_block_*, state_root path
  • 🔴 Critical — Voyager activation, fork-height changes, hard-fork rollouts, anything that flips env vars on mainnet

Required by tier

🟢 Low — minimum bar

  • CI green (tests + clippy + audit + gitleaks)

🟡 Medium — adds

  • New public function or behavioural change has at least one corresponding #[test] in same PR
  • Brief description of how this was tested (manual run, integration test, etc.)

🟠 High — adds

  • Regression test that fails on main and passes with this change — paste test name in PR body
  • Designed against documented invariant (link the audit/runbook/design doc)
  • Fresh-brain review by someone other than the author (per the consensus-change review checklist)
  • Single conceptual unit per PR (no bundling — bundling consensus changes burned us on v2.1.12 → 2026-04-25 livelock)

🔴 Critical — adds

  • Testnet rehearsal completed with success criteria + log evidence linked here
  • Bake window observed: minimum 2h on testnet at the same configuration before mainnet
  • Coordinated rollback plan documented in PR body — exact commands operator runs if it fails
  • Operator sign-off at activation moment (not just PR approval — separate moment for the actual flip)

Test plan

Rollback plan

Related

Summary by CodeRabbit

  • Bug Fixes
    • Improved block validation in the consensus loop to prevent applying stale or redundant blocks.
    • Fixed missing epoch bookkeeping in certain block processing paths to ensure consistency across the network.
    • Enhanced synchronization trigger behavior for better peer communication and block recovery.

@satyakwok satyakwok self-assigned this Jun 3, 2026
@github-actions github-actions Bot enabled auto-merge (squash) June 3, 2026 05:33
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 3, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

The validator consensus loop now pre-applies blocks on a scratch blockchain before embedding them into signed proposals. This pre-apply pattern applies across normal proposal construction, self-propose speculative N+1 paths, and peer-propose speculative N+1 paths; pre-apply failures abort proposal building. Peer-propose additionally validates that speculative blocks have matching validator addresses. SyncNeeded handlers are clarified with explicit "triggering block sync" logging and async trigger calls. In the block receive layer, gossipsub and direct NewBlock requests add stale-block guards to skip already-applied heights. NewBlock apply gains epoch boundary bookkeeping to match gossipsub apply behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • sentrix-labs/sentrix#764: Introduces Blockchain::run_epoch_bookkeeping method that this PR's NewBlock apply path now calls in crates/sentrix-network/src/libp2p_node.rs.
  • sentrix-labs/sentrix#763: Refactors epoch-boundary handling across sync and gossip block-apply paths; this PR extends NewBlock direct apply with matching epoch bookkeeping behavior.
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The pull request description is incomplete; the Summary section is empty, no risk tier is selected despite changes being consensus-critical (High tier), and required sections like Test plan, Rollback plan, and Related remain unfilled. Complete the Summary section with 1-3 sentences explaining the changes. Select and check the appropriate risk tier (High for consensus changes). Fill in the Test plan and, if High-tier, include test names and design doc references.
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: stabilizing BFT block application and sync behavior, which aligns with the core modifications across both main.rs and libp2p_node.rs.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/bft-block-application-sync

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bin/sentrix/src/main.rs`:
- Around line 2888-2902: The peer-finalization branch is incorrectly gating
processing on blk.validator != wallet.address, causing peer proposals to be
skipped and forcing followers to wait for NewBlock/sync; remove that
local-validator gate so peer proposals proceed to the normal finalization path
(allow bc.add_block to run for proposed_block), and instead move any local-only
speculative validation into the speculative build/stash code (e.g., where
create_block_voyager stamps proposer and where speculative builds are handled).
Ensure you also remove or avoid calling lp2p_clone.trigger_sync() and break out
of the finalization flow for legitimate peer proposals so the block is applied
immediately.
- Around line 2415-2418: The speculative pre-build miss handler currently sets
speculative_proposal = None and then break; which aborts the enclosing BFT
action loop and prevents the already-committed block (added via
bc.add_block(blk)) from reaching the persistence/broadcast path; change the
behavior at both occurrences so that after detecting let Some(block) = block
else { ... } you clear speculative_proposal but do not break the outer loop —
use continue (or otherwise return to the top of the enclosing loop) so
processing proceeds to the finalized-block persistence/broadcast logic; update
both places referenced (the block unwrap paths around speculative_proposal) to
only clear speculative_proposal and continue rather than break.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 462ad508-56c1-4c6a-984b-3c8061485c16

📥 Commits

Reviewing files that changed from the base of the PR and between d8ac0b1 and 08f9b06.

📒 Files selected for processing (2)
  • bin/sentrix/src/main.rs
  • crates/sentrix-network/src/libp2p_node.rs

Comment thread bin/sentrix/src/main.rs
Comment on lines +2415 to +2418
let Some(block) = block else {
speculative_proposal = None;
break;
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't let speculative N+1 failure abort finalized-block dissemination.

At both sites, height N has already been committed via bc.add_block(blk), but this break exits the enclosing BFT action loop before the finalized block reaches the persistence/broadcast path starting at Line 2471 and Line 3048. A speculative pre-build miss should only clear speculative_proposal; otherwise an optional optimization failure turns into a local-only commit of the finalized block.

As per coding guidelines, "bin/sentrix/src/main.rs: CONSENSUS-CRITICAL — suggestions only, no destructive rewrites."

Also applies to: 3001-3004

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bin/sentrix/src/main.rs` around lines 2415 - 2418, The speculative pre-build
miss handler currently sets speculative_proposal = None and then break; which
aborts the enclosing BFT action loop and prevents the already-committed block
(added via bc.add_block(blk)) from reaching the persistence/broadcast path;
change the behavior at both occurrences so that after detecting let Some(block)
= block else { ... } you clear speculative_proposal but do not break the outer
loop — use continue (or otherwise return to the top of the enclosing loop) so
processing proceeds to the finalized-block persistence/broadcast logic; update
both places referenced (the block unwrap paths around speculative_proposal) to
only clear speculative_proposal and continue rather than break.

Comment thread bin/sentrix/src/main.rs
Comment on lines +2888 to +2902
if blk.validator != wallet.address {
tracing::info!(
target: "finalize_trace",
"BFT finalize peer-propose: h={} round={} block={:.16}… \
proposer={} is not local validator {}; waiting for \
libp2p NewBlock/sync instead of executing peer block \
in the BFT loop",
height,
round,
block_hash,
blk.validator,
wallet.address,
);
lp2p_clone.trigger_sync().await;
break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Remove the local-validator gate from peer finalization.

proposed_block here is the block carried by a peer's proposal, so on every non-proposer validator blk.validator is expected to be the peer proposer's address, not wallet.address. Cross-file, create_block_voyager stamps the passed proposer address into Block.validator, so this branch will fire on the normal peer-finalize path, skip bc.add_block, and make followers depend on a later NewBlock/sync to advance. If the intent was to validate speculative locally-built N+1 blocks, that check needs to live on the speculative build/stash path instead of the current finalized block.

As per coding guidelines, "bin/sentrix/src/main.rs: CONSENSUS-CRITICAL — suggestions only, no destructive rewrites."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bin/sentrix/src/main.rs` around lines 2888 - 2902, The peer-finalization
branch is incorrectly gating processing on blk.validator != wallet.address,
causing peer proposals to be skipped and forcing followers to wait for
NewBlock/sync; remove that local-validator gate so peer proposals proceed to the
normal finalization path (allow bc.add_block to run for proposed_block), and
instead move any local-only speculative validation into the speculative
build/stash code (e.g., where create_block_voyager stamps proposer and where
speculative builds are handled). Ensure you also remove or avoid calling
lp2p_clone.trigger_sync() and break out of the finalization flow for legitimate
peer proposals so the block is applied immediately.

@github-actions github-actions Bot merged commit 6c9fdce into main Jun 3, 2026
17 checks passed
satyakwok added a commit that referenced this pull request Jun 3, 2026
v2.2.26 covers BFT block-hash consistency (Bug B fix via scratch-clone
pre-apply, #769), fork detection + /health 503 (#768), SyncNeeded
actually triggers sync, stale-block guards in gossip/direct-apply,
and missing run_epoch_bookkeeping in direct-apply path.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant