108 changes: 95 additions & 13 deletions packages/rs-sdk/src/sdk.rs
@@ -114,6 +114,12 @@ pub struct Sdk {
/// Set to `false` when the user explicitly calls [`SdkBuilder::with_version()`].
auto_detect_protocol_version: bool,

/// One-shot latch used by [`Self::ensure_protocol_version_bootstrapped`]
/// to make sure the auto-detect bootstrap RPC runs at most once even
/// under concurrent first calls. Shared between clones so siblings all
/// observe the same bootstrap state.
protocol_version_bootstrapped: Arc<tokio::sync::OnceCell<()>>,

/// Last seen height; used to determine if the remote node is stale.
///
/// This is clone-able and can be shared between threads.
@@ -149,6 +155,7 @@ impl Clone for Sdk {
cancel_token: self.cancel_token.clone(),
protocol_version: Arc::clone(&self.protocol_version),
auto_detect_protocol_version: self.auto_detect_protocol_version,
protocol_version_bootstrapped: Arc::clone(&self.protocol_version_bootstrapped),
metadata_last_seen_height: Arc::clone(&self.metadata_last_seen_height),
metadata_height_tolerance: self.metadata_height_tolerance,
metadata_time_tolerance_ms: self.metadata_time_tolerance_ms,
@@ -301,6 +308,77 @@ impl Sdk {
}
}

/// Make sure the SDK has learned the network's protocol version before
/// doing any proof-backed work.
///
/// On a fresh auto-detect SDK the protocol version starts at 0 and
/// [`Self::version`] falls back to [`PlatformVersion::latest()`]. That
/// used to mean the very first proof parse happened at `latest()`, and
/// on an older network whose proof interpretation differs from
/// `latest()` the first request would fail before the SDK could learn
/// the correct version from response metadata.
///
/// This helper closes that hole by eagerly running a single unproved
/// request (the cheap [`CurrentQuorumsInfo`] endpoint) on first use,
/// reading `metadata.protocol_version` off the response, and updating
/// the SDK's cached version *before* the first proof parse runs.
///
/// A [`tokio::sync::OnceCell`] guarantees the bootstrap RPC runs at
/// most once per SDK (and its clones) even under concurrent first
/// calls — subsequent callers simply wait for the in-flight bootstrap
/// to finish. If the bootstrap RPC itself fails we log a warning and
/// fall back to the old `latest()` behaviour; this preserves
/// best-effort semantics for partially-reachable networks.
///
/// Skipped entirely for SDKs built with an explicit version
/// ([`SdkBuilder::with_version()`]), for mock SDKs, and any time this
/// helper is entered from within the unproved request path itself
/// (to avoid re-entry).
Comment on lines +333 to +336
Collaborator

💬 Nitpick: Doc comment claims a re-entry guard that doesn't exist

The doc says the helper is "Skipped entirely ... any time this helper is entered from within the unproved request path itself (to avoid re-entry)". There is no such guard in the implementation; re-entry is prevented only by the structural fact that `FetchUnproved::fetch_unproved_with_settings` does not call `parse_proof_with_metadata_and_proof`. Either add an explicit guard or fix the doc to describe the actual invariant (so a future refactor doesn't silently introduce a deadlock on the `OnceCell`).

source: ['claude-general', 'claude-security-auditor', 'claude-rust-quality']

async fn ensure_protocol_version_bootstrapped(&self) {
if !self.auto_detect_protocol_version {
return;
}
// If we've already seen a response (protocol_version != 0), the
// version is already cached — skip the bootstrap entirely.
if self.protocol_version.load(Ordering::Relaxed) != 0 {
return;
}
// Mock SDKs have no real network to bootstrap against.
if !matches!(self.inner, SdkInstance::Dapi { .. }) {
return;
}

let bootstrapped = Arc::clone(&self.protocol_version_bootstrapped);
bootstrapped
Comment on lines +351 to +352
Collaborator

💬 Nitpick: Redundant Arc::clone before get_or_init

`let bootstrapped = Arc::clone(&self.protocol_version_bootstrapped); bootstrapped.get_or_init(...)` bumps the `Arc` count only to call a method that takes `&self`. `self.protocol_version_bootstrapped.get_or_init(...).await` works just as well.

source: ['claude-rust-quality']

.get_or_init(|| async {
use crate::platform::FetchUnproved;
use drive_proof_verifier::types::{CurrentQuorumsInfo, NoParamQuery};

match CurrentQuorumsInfo::fetch_unproved_with_settings(
self,
NoParamQuery {},
RequestSettings::default(),
Comment on lines +357 to +360
Contributor

⚠️ Potential issue | 🟠 Major

Use the SDK’s configured request settings for the bootstrap RPC.

Line 360 hardcodes `RequestSettings::default()`, so this new pre-parse RPC ignores both the SDK default retry policy and any caller overrides from `SdkBuilder::with_settings()`. That makes the first proof-backed request run under a different timeout/retry policy than every other SDK call.

Suggested fix
                 match CurrentQuorumsInfo::fetch_unproved_with_settings(
                     self,
                     NoParamQuery {},
-                    RequestSettings::default(),
+                    self.dapi_client_settings,
                 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/rs-sdk/src/sdk.rs` around lines 357 - 360, the bootstrap RPC call
uses `RequestSettings::default()`, which ignores the SDK-wide settings and caller
overrides. Replace that hardcoded default with the SDK instance's configured
settings (`self.dapi_client_settings`, as in the suggested fix above) and pass
that into `CurrentQuorumsInfo::fetch_unproved_with_settings` so the call honors
`SdkBuilder::with_settings()` and the SDK's retry/timeout policy.

)
.await
Comment on lines +357 to +362
Collaborator

🟡 Suggestion: Bootstrap RPC ignores the user-configured dapi_client_settings

`CurrentQuorumsInfo::fetch_unproved_with_settings(self, NoParamQuery {}, RequestSettings::default())` discards `self.dapi_client_settings`, so timeouts, retries, and ban policy set on the `SdkBuilder` are silently ignored for the bootstrap call. This runs on the critical path of the very first proof-backed request, so a user who configured longer timeouts for a slow network (or a stricter retry/ban policy) will instead get library defaults here. Pass `self.dapi_client_settings` (or an explicit, documented bootstrap-specific `RequestSettings`) instead.

source: ['claude-rust-quality']

{
Ok((_, metadata)) => {
self.maybe_update_protocol_version(metadata.protocol_version);
tracing::debug!(
version = metadata.protocol_version,
"SDK auto-detect bootstrap succeeded"
);
}
Err(err) => {
tracing::warn!(
%err,
"SDK auto-detect bootstrap RPC failed; falling back to PlatformVersion::latest() for the first request"
);
}
}
})
.await;
Comment on lines +351 to +379
Collaborator

🔴 Blocking: Failed bootstrap (and zero-version responses) permanently disable future protocol-version detection

`ensure_protocol_version_bootstrapped()` uses `OnceCell::get_or_init` with a closure that always returns `()`, even in the `Err` arm and even when `maybe_update_protocol_version` short-circuits on `received_version == 0`. After one transient failure (or a response whose metadata `protocol_version` is 0, which is possible for older nodes or protobuf defaults), `protocol_version_bootstrapped` is permanently initialized while `self.protocol_version` stays at 0. Every subsequent call to `parse_proof_with_metadata_and_proof` then skips bootstrap and falls back to `PlatformVersion::latest()` for the rest of the SDK's lifetime, which is the exact broken state this PR is meant to fix. Store `Result<(), Error>` in the `OnceCell` (or use a more granular latch) so that only a *successful* version discovery marks bootstrap as complete, and transient failures get retried.

source: ['claude-general', 'codex-general', 'claude-security-auditor', 'codex-security-auditor', 'claude-rust-quality', 'codex-rust-quality']
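
One way to read the "more granular latch" option, as a minimal std-only sketch rather than the SDK's implementation: the latch counts as set only once a non-zero version has been stored, so a failed or zero-version probe is retried on the next call. `VersionLatch`, `ensure_bootstrapped`, and the `probe` closure are hypothetical stand-ins; the real code is async, and a fallible initializer such as `tokio::sync::OnceCell::get_or_try_init` may be a closer fit there.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for the SDK's bootstrap state (not the real type).
pub struct VersionLatch {
    /// 0 means "not learned yet", mirroring the sentinel described above.
    version: AtomicU32,
    /// Serializes probes so concurrent first callers do not all run one.
    probe_lock: Mutex<()>,
}

impl VersionLatch {
    pub fn new() -> Self {
        Self {
            version: AtomicU32::new(0),
            probe_lock: Mutex::new(()),
        }
    }

    pub fn version(&self) -> u32 {
        self.version.load(Ordering::Relaxed)
    }

    /// Runs `probe` only while no version is known.
    /// Returns true once a non-zero version is cached.
    pub fn ensure_bootstrapped<F>(&self, probe: F) -> bool
    where
        F: FnOnce() -> Result<u32, String>,
    {
        if self.version() != 0 {
            return true; // Fast path: version already learned.
        }
        let _guard = self.probe_lock.lock().unwrap_or_else(|e| e.into_inner());
        // Re-check: another caller may have finished while we waited.
        if self.version() != 0 {
            return true;
        }
        match probe() {
            Ok(v) if v != 0 => {
                self.version.store(v, Ordering::Relaxed);
                true
            }
            // Error or zero version: leave the latch unset so the next
            // call retries instead of sticking with the fallback forever.
            _ => false,
        }
    }
}
```

With this shape, a transient failure costs one extra probe on the next call instead of disabling detection for the rest of the SDK's lifetime.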

}
Comment on lines +337 to +380
Collaborator

🟡 Suggestion: Unproved bootstrap metadata can pin the SDK's protocol_version

The bootstrap value comes from `ResponseMetadata.protocol_version` of an *unproved* `CurrentQuorumsInfo` response. Because `maybe_update_protocol_version` uses `fetch_max` and only accepts monotonically increasing known versions, a single DAPI node that wins the bootstrap RPC can return the highest-known version and pin the cached version there; any later honest proof-backed metadata carrying the real lower network version is silently discarded. This widens the SDK's attack surface compared to the previous `latest()` fallback: an attacker can cause proof-parse failures (DoS) against a client whose network is on a lower version. Consider cross-checking the bootstrap metadata against multiple peers, or only treating the bootstrap version as a hint that is confirmed by the first proof-backed response.

source: ['claude-general', 'claude-security-auditor', 'codex-security-auditor', 'codex-rust-quality']
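
The pinning effect follows directly from `fetch_max` semantics. A minimal sketch, assuming the cached version is an `AtomicU32` updated the way this comment describes; `update_version` is a hypothetical simplification of `maybe_update_protocol_version` (the "known versions" check is omitted), and any version numbers passed to it are made up for illustration.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical simplification: ignore 0, otherwise keep the maximum seen.
/// Returns true if the cached value changed.
pub fn update_version(cache: &AtomicU32, received: u32) -> bool {
    if received == 0 {
        return false;
    }
    // fetch_max stores max(current, received) and returns the previous value,
    // so a lower `received` can never replace a higher cached version.
    let previous = cache.fetch_max(received, Ordering::Relaxed);
    received > previous
}
```

If an unproved bootstrap response reports an inflated version first, every later call with the honest lower version returns `false` and leaves the cache untouched.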

Comment on lines +337 to +380
Collaborator

🟡 Suggestion: No tests exercise the bootstrap RPC path itself

The diff adds no new tests. The existing tests target `verify_response_metadata` / `maybe_update_protocol_version` via mock SDKs, but `ensure_protocol_version_bootstrapped` early-returns for mocks, so the bootstrap logic, which is the entire point of this PR, has no automated coverage. Important untested behaviours include: idempotence of the `OnceCell` under concurrent first parses, correct skip for pinned and mock SDKs, correct behaviour when the bootstrap RPC fails, and that the cached version is actually updated before the first proof parse.

source: ['claude-general', 'claude-rust-quality']
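
As a starting point for the concurrency case, a minimal std-only sketch: several callers race on first use and the caller checks that the initializer ran exactly once. `std::sync::OnceLock` and OS threads stand in for `tokio::sync::OnceCell` and async tasks here, and `race_first_use` is a hypothetical helper. A real test would also need a fake DAPI transport so the bootstrap path is not skipped for mocks.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};
use std::thread;

/// Spawns `callers` threads that all hit the latch and returns how many
/// times the simulated bootstrap RPC actually ran.
pub fn race_first_use(callers: usize) -> usize {
    let latch: Arc<OnceLock<()>> = Arc::new(OnceLock::new());
    let rpc_calls = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..callers)
        .map(|_| {
            let latch = Arc::clone(&latch);
            let rpc_calls = Arc::clone(&rpc_calls);
            thread::spawn(move || {
                latch.get_or_init(|| {
                    // Stand-in for the bootstrap RPC.
                    rpc_calls.fetch_add(1, Ordering::SeqCst);
                });
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("caller thread panicked");
    }
    rpc_calls.load(Ordering::SeqCst)
}
```

Asserting that `race_first_use(8)` returns 1 covers the "at most once under concurrent first calls" claim; the failure-path and pinned-SDK cases would need separate tests.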


// TODO: Changed to public for tests
/// Retrieve object `O` from proof contained in `request` (of type `R`) and `response`.
///
@@ -313,19 +391,17 @@ impl Sdk {
///
/// ## Protocol version bootstrapping
///
-/// On a fresh auto-detect SDK (i.e. one built without [`SdkBuilder::with_version()`]), the
-/// first call to this method uses [`PlatformVersion::latest()`] as a fallback because no
-/// network response has been received yet to teach the SDK the real network version.
-///
-/// The actual network version is learned only *after* proof parsing succeeds, when
-/// [`Self::verify_response_metadata()`] processes `metadata.protocol_version`. If the
-/// connected network runs an older protocol version **and** proof interpretation differs
-/// between that version and `latest()`, the very first request may fail before the SDK can
-/// correct itself. Subsequent requests will use the correct version.
-///
-/// This is a known bootstrap limitation. Callers that must guarantee correct version
-/// behaviour on the first request should pin the version explicitly via
-/// [`SdkBuilder::with_version()`].
+/// On a fresh auto-detect SDK (i.e. one built without
+/// [`SdkBuilder::with_version()`]), this method calls
+/// [`Self::ensure_protocol_version_bootstrapped`] before parsing the
+/// proof, which runs a one-shot unproved RPC to learn the network's
+/// protocol version. That guarantees the first proof parse happens
+/// at the correct version even on older networks.
+///
+/// If the bootstrap RPC itself fails (unreachable network, etc.) the
+/// SDK falls back to [`PlatformVersion::latest()`]. Callers that must
+/// absolutely guarantee a specific version without any network round
+/// trip should still pin via [`SdkBuilder::with_version()`].
pub(crate) async fn parse_proof_with_metadata_and_proof<R, O: FromProof<R> + MockResponse>(
&self,
request: O::Request,
@@ -334,6 +410,10 @@ impl Sdk {
where
O::Request: Mockable + TransportRequest,
{
// Learn the network protocol version before the first proof parse.
// No-op after the first successful call (and for pinned / mock SDKs).
self.ensure_protocol_version_bootstrapped().await;
Contributor

I think calling this function in `SdkBuilder::build()` will be simpler and (marginally) cheaper.


let provider = self
.context_provider()
.ok_or(drive_proof_verifier::Error::ContextProviderNotSet)?;
@@ -971,6 +1051,7 @@ impl SdkBuilder {
if self.version_explicit { self.version.protocol_version } else { 0 },
)),
auto_detect_protocol_version: !self.version_explicit,
protocol_version_bootstrapped: Arc::new(tokio::sync::OnceCell::new()),
// Note: in the future, we need to securely initialize initial height during Sdk bootstrap or first request.
metadata_last_seen_height: Arc::new(atomic::AtomicU64::new(0)),
metadata_height_tolerance: self.metadata_height_tolerance,
@@ -1041,6 +1122,7 @@ impl SdkBuilder {
if self.version_explicit { self.version.protocol_version } else { 0 },
)),
auto_detect_protocol_version: !self.version_explicit,
protocol_version_bootstrapped: Arc::new(tokio::sync::OnceCell::new()),
context_provider: ArcSwapOption::new(Some(Arc::new(context_provider))),
cancel_token: self.cancel_token,
metadata_last_seen_height: Arc::new(atomic::AtomicU64::new(0)),