-
Notifications
You must be signed in to change notification settings - Fork 47
feat: F3 e2e lifecycle #1469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: F3 e2e lifecycle #1469
Conversation
fbaa095 to
b34142d
Compare
b34142d to
31feb85
Compare
91db005 to
cbce51c
Compare
39e59d6 to
bfdc6f7
Compare
93a1066 to
1747f78
Compare
…t and execute logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| /// | ||
| /// This method should only be called from consensus code path which | ||
| /// contains the lightclient verifier. No additional validation is | ||
| /// performed here as it's expected to be done by the verifier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing validation in F3 Light Client update_state
High Severity
The update_state function in state.rs has no validation logic—it unconditionally replaces light_client_state and returns Ok(()). However, multiple tests expect it to reject invalid updates with USR_ILLEGAL_ARGUMENT: test_update_state_non_advancing_height expects rejection when height doesn't advance, test_instance_id_skip_rejected expects rejection when instance_id skips values, and test_empty_epochs_rejected also expects an error. These tests will fail because the validation they expect does not exist in the implementation.
Additional Locations (2)
| assert!(result.is_err()); | ||
| let err = result.unwrap_err(); | ||
| assert_eq!(err.exit_code(), ExitCode::USR_ILLEGAL_ARGUMENT); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test name doesn't match test behavior
Low Severity
The test test_empty_epochs_rejected claims to test rejection of "empty finalized_epochs" per its comment on line 399, but it creates a state with Some(10) for latest_finalized_height rather than None. This makes it identical to test_update_state_non_advancing_height instead of testing the distinct case of a missing/empty finalized height. If the intent was to test rejection of None, the test should use create_test_state(1, None, ...).
| if !proof_config.enabled { | ||
| tracing::info!("F3 proof service disabled in configuration"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly that this case is for full subnet nodes that want to follow the subnet with F3-based top-down, but don't follow the parent chain because they are not active validators?
| let cached = self | ||
| .proof_cache | ||
| .get_epoch_proof_with_certificate(msg.height) | ||
| .ok_or_else(|| { | ||
| anyhow::anyhow!( | ||
| "proof bundle not found in local cache for height {}", | ||
| msg.height | ||
| ) | ||
| })?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd better comply with CometBFT requirements and ensure that ProcessProposal is fully deterministic. In order to do that, we have to verify the certificate deterministically against the F3 power table stored in the F3 light client actor's current state. If the certificate is valid then we should not reject the proposal.
In order to avoid voting for the proposal for which we locally don't yet have the corresponding data, I think we could be waiting (e.g. by polling) for the corresponding entry to appear in the proof cache before accepting the proposal. This should be pretty safe, but we absolutely have to deterministically verify the validity of the proposed certificate first and immediately reject the proposal if the certificate happens to be invalid, otherwise we might end up waiting for a non-existent certificate made up by a Byzantine proposer.
| // Epoch must advance by exactly 1 relative to the latest finalized epoch in state. | ||
| // | ||
| // At genesis this is `None` (no finality yet). In that case we skip the check here; the | ||
| // cache lookup (and later execution) will still enforce that we only process epochs we | ||
| // have proofs for. | ||
| if let Some(prev_finalized) = f3_state.latest_finalized_height { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid this would allow the proposer of the first parent chain update since the genesis skip arbitrarily many epochs at the beginning of the certified chain extension. I think we have to set latest_finalized_height in genesis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this what genesis_epoch in the legacy top-down for?
| tracing::debug!(instance = instance_id, "updated F3LightClientActor state"); | ||
|
|
||
| // Mark epoch as committed in cache. | ||
| if let Err(e) = self.mark_committed(epoch, instance_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If epoch is not the last one finalized by this certificate then we probably should not mark instance_id yet.
| // Convert BigInt -> u64 (saturating if too large). | ||
| // Power should be non-negative; we ignore the sign here and keep the magnitude. | ||
| let (_sign, digits) = pe.power.to_u64_digits(); | ||
| let power = if digits.is_empty() { | ||
| 0 | ||
| } else if digits.len() == 1 { | ||
| digits[0] | ||
| } else { | ||
| u64::MAX | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
64 bits may not be enough. I haven't check this thoroughly myself, but here's what Claude thinks:
Storage power values on Filecoin can be extremely large (representing Quality-Adjusted Power across the entire network), so they need arbitrary precision integers
Evidence Supporting the Statement:
- Historical Network Size Exceeded uint64:
Filecoin network peaked at 17 EiB in Q3 2022
17 EiB = 19,599,665,578,516,398,592 bytes
uint64 max = 18,446,744,073,709,551,615 bytes
The peak exceeded uint64 by ~1 EiB!
- Quality-Adjusted Power Multipliers:
Verified deals in Filecoin receive a 10x power multiplier
Even current capacity (3 EiB) → 30 EiB with 10x QAP
30 EiB vastly exceeds uint64 range
- Current Capacity Still Large:
Current network: ~3.0 EiB (Q3 2025)
While this fits in uint64, it's close enough that:Calculations involving sums and products could overflow
Future growth requires headroom
Safety margins are essential
So, at very least, we should raise an error if there's an overflow rather than silently saturate to u64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, we could store just the power table's CID in the actor, though we'd need to fetch the whole table from the endpoint at start and keep it updated somewhere off-chain. (Assuming state sync may only happen on bootstrap, CometBFT wouldn't need to know about this.)
| // Execute F3-specific logic (certificate validation, proof extraction, state updates) | ||
| let (msgs, validator_changes, instance_id) = | ||
| f3.extract_messages_and_validator_changes(state, &msg)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here or inside extract_messages_and_validator_changes, we have to keep trying until the corresponding entry appears in the proof cache. We cannot return error if it's not yet there because this creates a non-deterministic behavior in the consensus execution path.
| f3.extract_messages_and_validator_changes(state, &msg)?; | ||
|
|
||
| // Commit parent finality to gateway | ||
| let finality = IPCParentFinality::new(msg.height as i64, vec![]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably need to get the tipset key (the 2nd parameter) from the F3 cert.
| /// Generalized top-down finality structure | ||
| #[derive(Clone, Debug, Serialize, Deserialize, PartialEq, Eq)] | ||
| pub struct GeneralisedTopDown { | ||
| /// The chain epoch this finality is for (height) | ||
| pub height: ChainEpoch, | ||
| /// The certificate that certifies finality (type-specific, proof is fetched from local cache) | ||
| pub certificate: Certificate, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could omit the height field and commit all the epochs finalized by the cert at once. Though, this won't be so easy when we include the data with integrity proofs, due to potential issues with the block size limits.
| // The last tipset in the certificate has no child tipset inside this certificate, so it | ||
| // cannot be proven yet. We only treat the epochs we generated proofs for as "finalized | ||
| // tipsets" for verification purposes. | ||
| let finalized_tipsets = { | ||
| let parents: Vec<FinalizedTipset> = | ||
| tipset_pairs.iter().map(|(p, _)| p.clone()).collect(); | ||
| FinalizedTipsets::from(parents.as_slice()) | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, verify_proof_bundle_with_tipsets wants to verify the child tipsets as well, so we should use the whole cert.ec_chain as finalized_tipsets.
| self.verifier | ||
| .verify_proof_bundle_with_tipsets(&proof_bundle, &finalized_tipsets) | ||
| .with_context(|| format!("Failed to verify proof for epoch {}", parent_epoch))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently, there's no verification of continuity of top-down event nonces, yet.
Closes #1441 and #1442
Note
Cursor Bugbot is generating a summary for commit 8857277. Configure here.