
feat: surge actual real time proving #891

Draft
AnshuJalan wants to merge 14 commits into surge-real-time-poc from surge-real-time-proving

Conversation

@AnshuJalan
Collaborator

No description provided.

@smartprogrammer93
Contributor

@claude please review


@claude

claude bot commented Mar 10, 2026

Claude finished @smartprogrammer93's task in 2m 31s


PR Review: feat: surge actual real time proving

  • Gather context and read changed files
  • Review core architecture and design
  • Review L1 execution layer
  • Review L2 execution layer and Taiko integration
  • Review node and proposal manager
  • Review Raiko integration
  • Review documentation
  • Compile final review

Overview

This PR introduces a new realtime crate implementing a "Real Time Proving" mode for the Surge fork of Taiko. The architecture is sound overall — it reuses much of the existing Catalyst infrastructure (common L1, slot clock, operator, taiko driver) and adds a RealTimeInbox contract integration with synchronous ZK proof fetching from Raiko before on-chain proposal submission.


Critical Issues

🔴 Hardcoded Private Key — Must Fix Before Merge

realtime/src/node/proposal_manager/bridge_handler.rs:183-188

// Surge: Hard coding the private key for the POC
l1_call_proof_signer: alloy::signers::local::PrivateKeySigner::from_bytes(
    &"0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80"
        .parse::<alloy::primitives::FixedBytes<32>>()?,
)?,

This is the well-known Hardhat/Anvil account #0 private key (address 0xf39Fd6e51...). The l1_call_proof_signer is used to sign signal slot hashes as "proofs" for L1 bridge calls, so anyone who knows this key — effectively everyone — can forge those proofs. Even though this is marked as a POC, the key must be replaced with a configurable one loaded from an environment variable before this lands on any shared or potentially public branch.
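A minimal sketch of the fix, loading and validating the key from an environment variable instead of hard-coding it. The variable name `REALTIME_L1_PROOF_SIGNER_KEY` is hypothetical, and in the real code the parsed bytes would feed `alloy::signers::local::PrivateKeySigner::from_bytes`:

```rust
use std::env;

/// Parse a 32-byte hex private key (with or without a "0x" prefix).
fn parse_hex_key(hex: &str) -> Result<[u8; 32], String> {
    let hex = hex.strip_prefix("0x").unwrap_or(hex);
    if hex.len() != 64 {
        return Err("expected exactly 32 bytes of hex".to_string());
    }
    let mut key = [0u8; 32];
    for (i, byte) in key.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16)
            .map_err(|e| e.to_string())?;
    }
    Ok(key)
}

/// Load the signer key from the environment, failing loudly when it is unset
/// or malformed instead of falling back to a known test key.
fn load_signer_key(var: &str) -> Result<[u8; 32], String> {
    let raw = env::var(var).map_err(|_| format!("{var} is not set"))?;
    parse_hex_key(&raw)
}
```

Failing at startup when `load_signer_key("REALTIME_L1_PROOF_SIGNER_KEY")` returns an error is deliberate: a node that cannot sign proofs should not come up at all.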



High Priority Issues

🟠 Bridging Guard vs. Bridge Handler Contradiction

realtime/src/lib.rs:35-39

if !config.disable_bridging {
    return Err(anyhow::anyhow!(
        "Bridging is not implemented. Exiting RealTime node creation."
    ));
}

lib.rs hard-exits if bridging is enabled, but then the node unconditionally initializes a BridgeHandler (including a jsonrpsee RPC server on 127.0.0.1:4545) and exposes surge_sendUserOp and surge_userOpStatus endpoints. If disable_bridging is always required to be true, the bridge handler and its RPC server should be conditionally created, or the guard should be removed if bridging is in fact supported. As written, bridging-related infrastructure is always initialized but never reachable.
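One way to resolve the contradiction is to make construction conditional, so the RPC server and endpoints only exist when bridging is enabled. This is a sketch with simplified stand-in types; the real `BridgeHandler` and `RealtimeConfig` carry more state:

```rust
/// Simplified stand-ins for the real types.
struct BridgeHandler {
    rpc_addr: String,
}

struct RealtimeConfig {
    disable_bridging: bool,
    bridge_rpc_addr: String,
}

/// Only build bridging infrastructure (RPC server, surge_sendUserOp /
/// surge_userOpStatus endpoints) when bridging is actually enabled.
fn maybe_bridge_handler(config: &RealtimeConfig) -> Option<BridgeHandler> {
    if config.disable_bridging {
        None // nothing is started, so nothing is unreachable
    } else {
        Some(BridgeHandler {
            rpc_addr: config.bridge_rpc_addr.clone(),
        })
    }
}
```

With this shape, the hard-exit guard in `lib.rs` can be dropped entirely: an `Option<BridgeHandler>` makes the "bridging disabled" state explicit in the type rather than enforced by an early return.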

🟠 Duplicate RaikoClient Construction

realtime/src/l1/execution_layer.rs:99-102 and realtime/src/lib.rs:136

RaikoClient is constructed twice — once in ELTrait::new() within ExecutionLayer and once in create_realtime_node(). The ExecutionLayer exposes a get_raiko_client() method but the RaikoClient in the actual submission pipeline (AsyncSubmitter) comes from the separate instance created in lib.rs. The one inside ExecutionLayer appears unused in the submission path. This creates two independent HTTP clients with potentially different configs, and is a maintenance burden. RealtimeConfig is also read from env vars twice as a result.
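A sketch of one possible cleanup: construct the client once and share it via `Arc`, so the `ExecutionLayer` and the submission pipeline observe the same configuration. Types here are simplified stand-ins for the real ones:

```rust
use std::sync::Arc;

/// Simplified stand-ins for the real types.
struct RaikoClient {
    base_url: String,
}
struct ExecutionLayer {
    raiko: Arc<RaikoClient>,
}
struct AsyncSubmitter {
    raiko: Arc<RaikoClient>,
}

/// Build the RaikoClient exactly once and hand the same instance to every
/// component that needs it, instead of reading the env config twice.
fn wire_up(base_url: &str) -> (ExecutionLayer, AsyncSubmitter) {
    let raiko = Arc::new(RaikoClient {
        base_url: base_url.to_string(),
    });
    (
        ExecutionLayer { raiko: Arc::clone(&raiko) },
        AsyncSubmitter { raiko },
    )
}
```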


🟠 Only First User Op and L1 Call Processed

realtime/src/l1/proposal_tx_builder.rs:97-122

if !batch.user_ops.is_empty()
    && let Some(user_op) = batch.user_ops.first()
{
    // ...only this one is added to multicall
}

if !batch.l1_calls.is_empty()
    && let Some(l1_call) = batch.l1_calls.first()
{
    // ...only this one is added to multicall
}

If a batch somehow accumulates more than one user op or L1 call (the rest of the code currently seems to add at most one per batch, but nothing enforces this structurally), only the first would be included in the multicall. The remaining would be silently ignored. Either explicitly enforce a max-of-one invariant with an assertion, or loop over all entries.
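If the looping route is taken, the fix is mechanical: iterate over every entry rather than taking `first()`. A sketch with illustrative types (the real code builds multicall calldata, not strings):

```rust
/// Illustrative stand-ins for the batch entry types.
struct UserOp {
    id: u64,
}
struct Call {
    data: Vec<u8>,
}

/// Include every user op and every L1 call in the multicall, so nothing is
/// silently dropped if a batch ever holds more than one of either.
fn build_multicall(user_ops: &[UserOp], l1_calls: &[Call]) -> Vec<String> {
    let mut calls = Vec::new();
    for op in user_ops {
        calls.push(format!("user_op:{}", op.id));
    }
    for call in l1_calls {
        calls.push(format!("l1_call:{} bytes", call.data.len()));
    }
    calls
}
```

Alternatively, if the max-of-one invariant is intended, encoding the fields as `Option<UserOp>` and `Option<Call>` would make it impossible to accumulate extras in the first place.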

🟠 assert! Panic in AsyncSubmitter::submit()

realtime/src/node/proposal_manager/async_submitter.rs:68

assert!(
    !self.is_busy(),
    "Cannot submit while another submission is in flight"
);

Using assert! (which panics and can crash the whole node process) is inappropriate here. The callsite in try_start_submission checks is_busy() before calling submit(), but relying on this coordination is fragile. Return a Result<(), Error> instead to make the error recoverable.


Medium Priority Issues

🟡 Hardcoded Bridge Handler Addresses / Paths

  • realtime/src/node/proposal_manager/mod.rs:80: Bridge handler RPC address is hardcoded to "127.0.0.1:4545". Should be configurable via RealtimeConfig.
  • realtime/src/node/proposal_manager/bridge_handler.rs:106: sled DB path is hardcoded to "data/user_op_status". Should be configurable.

🟡 Unbounded Warmup Loop

realtime/src/node/mod.rs:447-459

The warmup() loop polls get_last_proposal_hash() indefinitely until the RealTimeInbox is activated (non-zero hash), sleeping 12 seconds between retries. There is no maximum timeout or retry count. A stuck or misconfigured node would spin here forever without surfacing a clear error after a reasonable period.
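A bounded version is a small change: give the loop a retry budget and fail with a descriptive error when it runs out. In this sketch `fetch` stands in for `get_last_proposal_hash()`, and the budget and interval are illustrative (the existing 12s interval with ~50 retries would bound warmup at roughly 10 minutes):

```rust
use std::time::Duration;

/// Poll until the inbox is activated (non-zero hash) or the retry budget is
/// exhausted, instead of spinning forever.
fn warmup(
    mut fetch: impl FnMut() -> [u8; 32],
    max_retries: u32,
    _interval: Duration, // std::thread::sleep(_interval) between polls in the real loop
) -> Result<[u8; 32], String> {
    for _ in 0..max_retries {
        let hash = fetch();
        if hash != [0u8; 32] {
            return Ok(hash);
        }
    }
    Err(format!(
        "RealTimeInbox not activated after {max_retries} polls; check deployment/config"
    ))
}
```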

🟡 Raiko Polling: No Cancellation Token Respected

realtime/src/raiko/mod.rs:81-128

The get_proof() polling loop runs up to 60 × 10s = 600 seconds in the worst case, with no way to cancel it early via the CancellationToken. If the node is shutting down while a proof is being awaited, the background submission task (AsyncSubmitter::submit) will block for up to 10 minutes before exiting. The abort() method on AsyncSubmitter does call handle.abort(), which would cancel the task at any await point, so this is partially mitigated, but explicit cancellation token checking in the polling loop would be cleaner.
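The cancellation check amounts to testing the token at the top of every iteration. In the real async loop this would be a `tokio::select!` over `token.cancelled()` and the sleep; this synchronous sketch uses a shared `AtomicBool` as a stand-in for the `CancellationToken`:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Poll for a proof, bailing out promptly once cancellation is requested
/// rather than blocking for the full polling budget.
fn poll_for_proof(
    mut fetch: impl FnMut() -> Option<String>,
    cancelled: &AtomicBool,
    max_attempts: u32,
) -> Result<String, String> {
    for _ in 0..max_attempts {
        if cancelled.load(Ordering::SeqCst) {
            return Err("cancelled while waiting for proof".to_string());
        }
        if let Some(proof) = fetch() {
            return Ok(proof);
        }
        // sleep between attempts in the real loop
    }
    Err("proof not ready after max_attempts polls".to_string())
}
```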

🟡 anchor_state_root: B256::ZERO

realtime/src/node/proposal_manager/proposal.rs:98

anchor_state_root: B256::ZERO, // Not used in RealTime anchor

This zero value is passed through to advance_head_to_new_l2_block and then into the Checkpoint struct used in the anchor transaction. If the RealTimeInbox or Anchor contract ever validates stateRoot, this would silently produce invalid proofs. If any contract actually consumes the value, it should be a real state root; otherwise the comment should spell out why zero is safe.

🟡 Missing bridge Address in RealtimeConfig::Display

realtime/src/utils/config.rs:56-63

The bridge address is stored in RealtimeConfig but not printed in the Display implementation, making startup logging incomplete for debugging.
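The fix is a one-line addition to the `Display` impl. A sketch with illustrative field names (the real struct has more fields than shown here):

```rust
use std::fmt;

/// Simplified stand-in for the real config struct.
struct RealtimeConfig {
    inbox: String,
    bridge: String,
}

/// Print every configured address, including `bridge`, so startup logs give
/// a complete picture for debugging.
impl fmt::Display for RealtimeConfig {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "inbox: {}", self.inbox)?;
        write!(f, "bridge: {}", self.bridge)
    }
}
```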


Low Priority / Minor Issues

🔵 #[allow(dead_code)] Module-Level Suppression

realtime/src/l2/taiko.rs:1

A module-level #[allow(dead_code)] suppresses warnings for potentially unused methods. The dead code should either be removed or the #[allow] should be scoped to specific items.

🔵 batches_ready_to_send Hardcoded to 0

realtime/src/node/mod.rs:333

let batches_ready_to_send = 0;

This is always 0 without explanation. If this parameter affects mempool filtering or tx selection in get_pending_l2_tx_list, the intent should be documented or derived from the actual state.

🔵 transfer_eth_from_l2_to_l1 is a Stub

realtime/src/l2/execution_layer.rs:141-149

pub async fn transfer_eth_from_l2_to_l1(...) -> Result<(), Error> {
    warn!("Implement bridge transfer logic here");
    Ok(())
}

Logs a warning but still returns Ok(()), so callers see success. Since bridging is gated behind disable_bridging, this may never be called, but it should at minimum return an error rather than report success for a transfer that never happened.
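A minimal sketch of the safer stub, with a simplified signature (the real function takes transfer parameters and returns the crate's own error type):

```rust
/// Fail loudly until the bridge transfer is actually implemented, so a
/// caller can never mistake the stub for a completed transfer.
fn transfer_eth_from_l2_to_l1(_amount_wei: u64) -> Result<(), String> {
    Err("transfer_eth_from_l2_to_l1 is not implemented".to_string())
}
```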

🔵 get_l2_height_from_taiko_inbox Always Returns 0

realtime/src/l1/execution_layer.rs:164-166

async fn get_l2_height_from_taiko_inbox(&self) -> Result<u64, Error> {
    Ok(0)
}

Returns a hardcoded 0. Callers should be verified to handle this sentinel value correctly.


Summary

The overall structure of the RealTime proving module is clean and the integration with the existing Catalyst node architecture (slot clock, operator, taiko driver, common EL) is thoughtfully done. The async submission model decoupling preconfirmation from proof fetching is a solid design. The main concerns are:

  1. The hardcoded Hardhat private key must be removed before this can be deployed anywhere — even for a fork/testnet.
  2. The bridging guard in lib.rs conflicts with always-initialized bridging infrastructure.
  3. Duplicate RaikoClient construction should be cleaned up.
  4. A few hardcoded values (RPC address, DB path) should be made configurable.
