Skip to content

fix(rs-dapi,sdk): decode base64 CBOR error messages from Tenderdash#3350

Open
lklimek wants to merge 25 commits intov3.1-devfrom
fix/sdk-cbor-security-hardening
Open

fix(rs-dapi,sdk): decode base64 CBOR error messages from Tenderdash#3350
lklimek wants to merge 25 commits intov3.1-devfrom
fix/sdk-cbor-security-hardening

Conversation

@lklimek
Copy link
Copy Markdown
Contributor

@lklimek lklimek commented Mar 16, 2026

Issue being fixed or feature implemented

When Tenderdash returns errors with message: "Internal error", rs-dapi falls through to the data field. Tenderdash wraps Drive's CBOR error payload in the JSON-RPC data field as a base64 string, but rs-dapi stored it raw — producing unintelligible gRPC status messages like oWdtZXNzYWdleMVzdG9yYWdl... on the wire.

Related: dashpay/dash-evo-tool#766 (DET workaround using message-text matching)

Evidence from DET production logs

144 gRPC errors across all DET logs. Four patterns:

Pattern Count gRPC Code drive-error-data-bin dash-serialized-consensus-error-bin consensus_error in CBOR
Duplicate key (non-unique set) 22 Internal (13) null
Duplicate key (unique set) 77 Internal (13) null
Consensus violation (DPP 10530) 2 InvalidArgument present
TCP timeout 43 Unavailable N/A

The majority of errors (99/144) are GroveDB duplicate-key errors during identity create/update where the actual human-readable message was buried in double-encoded CBOR. The 2 consensus errors (DPP 10530) already work correctly via the existing dash-serialized-consensus-error-bin path.

What was done?

rs-dapi (error_mapping.rs)

Added decode_data_message() helper that tries to decode the data field as base64 → CBOR → extract human-readable message text. If decoding fails (plain text data), the raw string is preserved — no regression for existing behavior.

rs-sdk (error.rs)

Added DriveInternalError(String) variant to the SDK Error enum. When gRPC status is Code::Internal and drive-error-data-bin metadata is present, the CBOR payload is decoded and the human-readable message is surfaced via DriveInternalError instead of being lost in a generic DapiClientError.

rs-sdk-ffi (error.rs)

Added DashSDKErrorCode::DriveInternalError = 10 and string match branch so FFI consumers (C/Swift) get a dedicated error code instead of falling through to NetworkError.

wasm-sdk (error.rs)

Added WasmSdkErrorKind::DriveInternalError mapping for the new variant.

How Has This Been Tested?

Automated

  • cargo test -p rs-dapi --lib error_mapping — all 36 tests pass (29 existing + 7 new)
  • cargo test -p dash-sdk --lib — 111 tests pass (4 new for drive-error-data-bin path)
  • cargo clippy -p rs-dapi -p dash-sdk -p dash-sdk-ffi — clean
  • cargo fmt --all — clean
  • Real-world test fixtures extracted from DET production logs

SDK test coverage added

Test What it verifies
test_drive_error_data_bin_maps_to_drive_internal_error Valid CBOR + Code::InternalDriveInternalError
test_internal_error_without_drive_metadata_falls_through Code::Internal without metadata → DapiClientError
test_non_internal_code_with_drive_metadata_not_intercepted Code::Unavailable + metadata → NOT intercepted
test_malformed_cbor_in_drive_error_data_bin_falls_through Garbage bytes → graceful fallback

Manual integration test

To reproduce with a real network:

  1. Start a local devnet or connect to testnet via DET
  2. Create an identity with IdentityCreate state transition
  3. Attempt to create another identity reusing the same public key hash
  4. Before fix: SDK receives gRPC Internal error with message oWdtZXNzYWdleMVzdG9yYWdl... (base64 CBOR gibberish)
  5. After fix: SDK receives gRPC Internal error with message storage: identity: a unique key with that hash already exists: the key already exists in the unique set [...]

Breaking Changes

None. Plain text data fields are preserved as-is. Only base64-encoded CBOR with a message key is decoded. The new DriveInternalError variant extends the Error enum (pre-1.0 SDK, follows platform release cycle).

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have added "!" to the title and described breaking changes in the corresponding section if my code contains any
  • I have made corresponding changes to the documentation if needed (N/A)

For repository code-owners and collaborators only

  • I have assigned this pull request to a milestone

🤖 Co-authored by Claudius the Magnificent AI Agent

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds permissive base64+CBOR decoding to extract embedded "message" strings from Tenderdash JSON data and gRPC drive-error-data-bin metadata, introduces private helpers for these decodings, updates error mapping to prefer decoded messages, and adds unit/integration tests.

Changes

Cohort / File(s) Summary
Platform service error mapping
packages/rs-dapi/src/services/platform_service/error_mapping.rs
Added private decode_data_message(data: &str) -> Option<String> to permissively base64-decode and CBOR-decode data, extracting a top-level "message". Updated impl From<serde_json::Value> for TenderdashStatus to consult decoded "data" when top-level "message" is missing/empty or equals "Internal error" (case-insensitive), falling back to raw data if decoding fails. Added unit and integration tests (plain text, base64→CBOR with/without "message", and real DET fixtures).
Dapi client gRPC error handling
packages/rs-sdk/src/error.rs
Added enum variant Error::DriveInternalError(String). Introduced private helper extract_drive_error_message(bytes: &[u8]) -> Option<String> to CBOR-decode drive-error-data-bin metadata and extract a non-empty "message". Extended From<DapiClientError> for Error to inspect Code::Internal and return Error::DriveInternalError(message) when present; preserved existing consensus-error handling and fallbacks when interception fails.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant TenderdashJSON as Tenderdash JSON handler
    participant DapiGRPC as DapiClient error converter
    participant Metadata as JSON "data" / gRPC metadata
    participant Decoder as Base64+CBOR Decoder
    participant ErrorMapper as Error Mapper

    Client->>TenderdashJSON: deliver JSON error (contains "message" and/or "data")
    TenderdashJSON->>Metadata: inspect top-level "message" and "data"
    alt top-level "message" missing/empty or "Internal error"
        TenderdashJSON->>Decoder: attempt permissive base64 -> CBOR decode of "data"
        Decoder-->>TenderdashJSON: optional "message" string
        TenderdashJSON->>ErrorMapper: map using decoded message or fallback raw "data"
    else top-level "message" present
        TenderdashJSON->>ErrorMapper: map using top-level "message"
    end

    Client->>DapiGRPC: deliver gRPC Status + metadata
    DapiGRPC->>Metadata: inspect `drive-error-data-bin` when Code::Internal
    alt `drive-error-data-bin` present
        DapiGRPC->>Decoder: CBOR-decode metadata bytes
        Decoder-->>DapiGRPC: optional "message" string
        DapiGRPC->>ErrorMapper: return Error::DriveInternalError(message) if found
    else
        DapiGRPC->>ErrorMapper: continue existing mapping / fallback
    end

    ErrorMapper-->>Client: mapped Error
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibbled at Base64 bits late at night,

coaxed CBOR whispers into simple light.
Tiny messages hopped out to say "hello",
now errors wear signs so I always know. 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: decoding base64 CBOR error messages from Tenderdash in both rs-dapi and rs-sdk packages.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/sdk-cbor-security-hardening

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added this to the v3.1.0 milestone Mar 16, 2026
@lklimek lklimek requested a review from Copilot March 16, 2026 15:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves dash-sdk’s error conversion so callers can recover structured ConsensusError information even when DAPI does not provide the primary dash-serialized-consensus-error-bin metadata, by falling back to decoding drive-error-data-bin.

Changes:

  • Add a fallback path in From<DapiClientError> for Error to parse CBOR drive-error-data-bin and extract embedded consensus_error bytes.
  • Add explicit defensive limits (8 KiB max metadata size, CBOR recursion limit 16) before deserializing untrusted metadata.
  • Add unit tests covering the fallback behavior and precedence.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/rs-sdk/src/error.rs Outdated
Comment thread packages/rs-sdk/src/error.rs Outdated
Comment thread packages/rs-sdk/src/error.rs Outdated
@lklimek lklimek force-pushed the fix/sdk-cbor-security-hardening branch from b173ca5 to 4aa4910 Compare March 16, 2026 18:09
When Tenderdash returns an error with `message: "Internal error"`,
rs-dapi falls through to the `data` field. Previously, the raw base64
string was stored as the message, producing unintelligible gRPC status
messages like "oWdtZXNzYWdleMVzdG9yYWdl...".

Now `decode_data_message()` tries to decode the `data` field as
base64 → CBOR and extract the human-readable `message` text. If
decoding fails (plain text data), the raw string is preserved as before.

This fixes the 99% case from production DET logs: GroveDB duplicate-key
errors (code 13) where the actual message "storage: identity: a unique
key with that hash already exists..." was buried in double-encoded CBOR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lklimek lklimek changed the title fix(rs-sdk): decode consensus errors from drive-error-data-bin metadata fallback fix(rs-dapi): decode base64 CBOR message in Tenderdash error data field Mar 16, 2026
@lklimek lklimek force-pushed the fix/sdk-cbor-security-hardening branch from 0fc9aee to bfba2ef Compare March 16, 2026 19:56
@lklimek lklimek requested a review from Copilot March 18, 2026 13:32
@lklimek lklimek marked this pull request as ready for review March 18, 2026 13:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves rs-dapi’s Tenderdash JSON-RPC error mapping by decoding base64-encoded CBOR stored in the Tenderdash data field (when the top-level message is unhelpful like "Internal error"), so gRPC clients receive human-readable error messages instead of base64 gibberish.

Changes:

  • Added decode_data_message() helper to decode data as base64 → CBOR and extract the "message" field.
  • Updated TenderdashStatus::from(serde_json::Value) to prefer decoded data.message when appropriate, while preserving prior fallback behavior.
  • Added unit tests for decoding behavior plus real-world DET log fixtures.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/rs-dapi/src/services/platform_service/error_mapping.rs Outdated
Comment thread packages/rs-dapi/src/services/platform_service/error_mapping.rs Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.83%. Comparing base (85a7117) to head (86fbeea).

Additional details and impacted files
@@            Coverage Diff            @@
##           v3.1-dev    #3350   +/-   ##
=========================================
  Coverage     84.83%   84.83%           
=========================================
  Files          2476     2476           
  Lines        267733   267733           
=========================================
  Hits         227123   227123           
  Misses        40610    40610           
Components Coverage Δ
dpp 81.99% <ø> (ø)
drive 84.21% <ø> (ø)
drive-abci 87.46% <ø> (ø)
sdk ∅ <ø> (∅)
dapi-client ∅ <ø> (∅)
platform-version ∅ <ø> (∅)
platform-value 92.10% <ø> (ø)
platform-wallet ∅ <ø> (∅)
drive-proof-verifier 55.66% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Clean, well-scoped PR that adds base64→CBOR decoding for the Tenderdash error.data field, with good test coverage including real-world DET log fixtures. The new decode_data_message() function follows existing patterns in the module, reuses base64_decode() and walk_cbor_for_key(), and gracefully falls back on any decode failure. No blocking issues. Two suggestions worth considering: the pre-existing is_ascii() filter silently drops non-ASCII UTF-8 data strings (contradicting the 'raw string preserved' claim), and the other Tenderdash error ingestion paths (wait_for_state_transition_result) still pass data through without attempting CBOR decoding.

Reviewed commit: 7ae8846

🟡 3 suggestion(s)

1 additional finding

🟡 suggestion: Other Tenderdash error paths still pass `data` through without CBOR decoding

packages/rs-dapi/src/services/platform_service/wait_for_state_transition_result.rs (lines 151-161)

The new decode_data_message() is only invoked inside impl From<serde_json::Value>, which handles JSON-RPC error responses. The wait_for_state_transition_result path constructs TenderdashStatus::new() directly from ABCI tx_result.data (lines 151–161, 243) without attempting CBOR decoding.

This may be intentional — the ABCI responses could use a different encoding format than JSON-RPC errors. But if they can also carry the same base64-CBOR payloads, clients would still receive unreadable strings on those paths. Worth investigating whether TenderdashStatus::new() should optionally run the same decoding on its message parameter.

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `packages/rs-dapi/src/services/platform_service/error_mapping.rs`:
- [SUGGESTION] lines 313-314: Pre-existing `is_ascii()` filter silently discards non-ASCII UTF-8 data
  The `.filter(|s| s.is_ascii())` on line 313 (pre-existing, not introduced by this PR) means that if Tenderdash ever returns a valid UTF-8 `data` string containing non-ASCII characters (e.g. localized or user-provided error text), it gets silently dropped — `message` becomes `None` and later code falls back to a generic message instead of preserving the server detail.

`serde_json::Value::as_str()` already guarantees well-formed UTF-8, and `base64_decode()` will simply return `None` for non-base64 input, so the `is_ascii()` guard provides no safety benefit here. Removing it would make the "raw string preserved on decode failure" claim fully accurate.

Since this PR is already touching this exact code path, it would be a good opportunity to fix this.
- [SUGGESTION] lines 830-841: No integration test for valid-base64-but-not-CBOR data field
  The existing `from_json_value_preserves_plain_text_data` test uses text that fails at the base64 step. Adding a test for a `data` field that base64-decodes successfully but is NOT valid CBOR would confirm the end-to-end fallback through the full `From<serde_json::Value>` conversion, not just the unit function.

In `packages/rs-dapi/src/services/platform_service/wait_for_state_transition_result.rs`:
- [SUGGESTION] lines 151-161: Other Tenderdash error paths still pass `data` through without CBOR decoding
  The new `decode_data_message()` is only invoked inside `impl From<serde_json::Value>`, which handles JSON-RPC error responses. The `wait_for_state_transition_result` path constructs `TenderdashStatus::new()` directly from ABCI `tx_result.data` (lines 151–161, 243) without attempting CBOR decoding.

This may be intentional — the ABCI responses could use a different encoding format than JSON-RPC errors. But if they can also carry the same base64-CBOR payloads, clients would still receive unreadable strings on those paths. Worth investigating whether `TenderdashStatus::new()` should optionally run the same decoding on its `message` parameter.

Comment thread packages/rs-dapi/src/services/platform_service/error_mapping.rs Outdated
Comment thread packages/rs-dapi/src/services/platform_service/error_mapping.rs
- T4: Use non-logging base64 decode in decode_data_message() to avoid
  noisy debug logs when data field is plain text (expected failure)
- T5: Fix misleading test comment — decode_data_message returns None,
  caller falls back to raw string
- T6: Remove pre-existing .filter(|s| s.is_ascii()) that silently
  discarded valid non-ASCII UTF-8 data strings
- T7: Add test for valid-base64-but-not-CBOR data field fallback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lklimek lklimek added the ready for final review Ready for the final review. If AI was involved in producing this PR, it has already had a reviewer. label Mar 18, 2026
Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental Review

Single-commit push (25de8d2) addressing 2 of 3 prior suggestions. Changes are minimal, correct, and well-tested:

  1. is_ascii() filter removed ✅ — resolves the pre-existing filter that silently dropped non-ASCII UTF-8 data strings. The decode_data_message() fallback now correctly handles all valid UTF-8.

  2. base64-non-CBOR integration test added ✅ — new from_json_value_preserves_base64_non_cbor_data confirms end-to-end fallback for data that is valid base64 but not CBOR.

  3. Inline base64 engine — good call avoiding base64_decode() debug logging for expected failures on plain-text data fields. The inline GeneralPurpose engine keeps the same decode config.

All 36 tests pass (cargo test -p rs-dapi --lib error_mapping). No new findings.

Reviewed commit: 25de8d2

lklimek and others added 3 commits March 20, 2026 07:31
…-bin metadata

When Drive returns a raw GroveDB storage error for a duplicate identity
key (without a serialized ConsensusError), the SDK now checks the
drive-error-data-bin gRPC metadata for CBOR-encoded error details.
If the decoded message matches the identity key uniqueness pattern,
the error is surfaced as Error::AlreadyExists instead of falling
through to the opaque Error::DapiClientError.

Also adds test fixtures for both the DAPI-level CBOR data field
decoding and the SDK-level drive-error-data-bin handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@lklimek lklimek requested a review from shumkov as a code owner March 20, 2026 09:29
@lklimek lklimek changed the title fix(rs-dapi): decode base64 CBOR message in Tenderdash error data field fix(rs-dapi,sdk): decode base64 CBOR error messages and detect duplicate identity keys Mar 20, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/rs-sdk/src/error.rs (1)

193-194: Make duplicate-key matching resilient to casing changes.

Line 193 and Line 194 use case-sensitive contains(...). If upstream wording changes capitalization, this mapping silently falls back to Error::DapiClientError(_).

♻️ Suggested patch
-                        if message.contains("unique key") && message.contains("already exists") {
+                        let normalized = message.to_ascii_lowercase();
+                        if normalized.contains("unique key")
+                            && normalized.contains("already exists")
+                        {
                             return Self::AlreadyExists(message);
                         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/rs-sdk/src/error.rs` around lines 193 - 194, The duplicate-key
detection is using case-sensitive contains on the error message, so
capitalization changes cause falls back to DapiClientError; update the matching
in the function that returns Self::AlreadyExists (the branch that currently does
if message.contains("unique key") && message.contains("already exists")) to
perform case-insensitive matching — e.g. normalize the message with
message.to_lowercase() (or use a case-insensitive regex) and then check for
"unique key" and "already exists", ensuring the same Self::AlreadyExists variant
is returned when those substrings appear in any casing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/rs-sdk/src/error.rs`:
- Around line 193-194: The duplicate-key detection is using case-sensitive
contains on the error message, so capitalization changes cause falls back to
DapiClientError; update the matching in the function that returns
Self::AlreadyExists (the branch that currently does if message.contains("unique
key") && message.contains("already exists")) to perform case-insensitive
matching — e.g. normalize the message with message.to_lowercase() (or use a
case-insensitive regex) and then check for "unique key" and "already exists",
ensuring the same Self::AlreadyExists variant is returned when those substrings
appear in any casing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 557e7cca-577e-488e-8ca0-2ba378c1e050

📥 Commits

Reviewing files that changed from the base of the PR and between 25de8d2 and b18c173.

📒 Files selected for processing (2)
  • packages/rs-dapi/src/services/platform_service/error_mapping.rs
  • packages/rs-sdk/src/error.rs

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 20, 2026

✅ DashSDKFFI.xcframework built for this PR.

SwiftPM (host the zip at a stable URL, then use):

.binaryTarget(
  name: "DashSDKFFI",
  url: "https://your.cdn.example/DashSDKFFI.xcframework.zip",
  checksum: "8096689fc50b9520370c382dfee3d22a4da8933da0eaf2e8311808e6f0b247f4"
)

Xcode manual integration:

  • Download 'DashSDKFFI.xcframework' artifact from the run link above.
  • Drag it into your app target (Frameworks, Libraries & Embedded Content) and set Embed & Sign.
  • If using the Swift wrapper package, point its binaryTarget to the xcframework location or add the package and place the xcframework at the expected path.

Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The new drive-error-data-bin detection code has a semantic conflict with existing recovery logic. When a duplicate identity key error (key belongs to a different identity) is mapped to Error::AlreadyExists, the Identity::wait_for_response() handler misinterprets it as 'this identity was already created', tries to fetch an identity that doesn't exist, and produces a misleading 'identity was proved to not exist but was said to exist' error. The helper function and tests are well-structured but this classification issue needs to be resolved before merging.

Reviewed commit: b18c173

🔴 1 blocking | 🟡 3 suggestion(s) | 💬 1 nitpick(s)

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `packages/rs-sdk/src/error.rs`:
- [BLOCKING] lines 190-198: Mapping duplicate-key errors to AlreadyExists triggers wrong recovery path in Identity::wait_for_response
  This new branch converts any `drive-error-data-bin` message containing 'unique key' and 'already exists' into `Error::AlreadyExists`. However, `Identity::wait_for_response()` in `waitable.rs:98-115` catches *every* `Error::AlreadyExists` and assumes the identity was previously created — it extracts the identity ID from the state transition and calls `Identity::fetch(sdk, identity_id)`. For a duplicate-key conflict the *key* belongs to a different identity, but the identity being created does not yet exist. The fetch returns `None` and the user receives the misleading error 'identity was proved to not exist but was said to exist', losing the actual duplicate-key cause entirely.

Either introduce a dedicated variant like `Error::DuplicateIdentityKey(String)` that is not caught by the existing `AlreadyExists` recovery path, or keep the error as `Error::DapiClientError` and let the caller inspect it. If `AlreadyExists` must be reused, update `waitable.rs` to distinguish between 'identity already exists' and 'key already exists' before attempting the fetch.
- [SUGGESTION] lines 193-195: Brittle string matching creates invisible coupling to Drive error format
  The pattern `message.contains("unique key") && message.contains("already exists")` matches against Drive/GroveDB's human-readable error text (`UniqueKeyAlreadyExists` display string). If GroveDB or Drive ever changes the wording — even a minor phrasing tweak — this detection silently breaks with no compile-time or test-time signal. A more durable approach would be a machine-readable error code in the CBOR payload (the `code` field is already present but not used for this classification).
- [SUGGESTION] lines 190-192: Silent error swallowing when to_bytes() fails — inconsistent with adjacent code
  When `drive_error_value.to_bytes()` returns `Err`, the failure is silently discarded via `if let Ok(bytes)`. The adjacent `consensus_error_value.to_bytes()` path at lines 181-185 logs at debug level and returns a `Generic` error. Adding a `tracing::debug!` log here would maintain consistency and aid debugging.
- [SUGGESTION] lines 190-198: Detection only covers UniqueKeyAlreadyExists, not IdentityKeyAlreadyExists
  Drive defines two duplicate-key error variants: `UniqueKeyAlreadyExists` (display: 'a unique key with that hash already exists: ...') and `IdentityKeyAlreadyExists` (display: 'identity key already exists for user error: ...'). The string heuristic only matches the first variant. The second variant has different semantics (same-identity duplicate vs. cross-identity conflict) so this may be intentional, but it's worth documenting the distinction.

Comment thread packages/rs-sdk/src/error.rs Outdated
Comment thread packages/rs-sdk/src/error.rs Outdated
Comment thread packages/rs-sdk/src/error.rs Outdated
Comment thread packages/rs-sdk/src/error.rs Outdated
Comment thread packages/rs-sdk/src/error.rs
lklimek and others added 2 commits March 23, 2026 10:41
…ently discarding

Addresses PR review feedback: the adjacent consensus error path logs
at debug level on to_bytes() failure, but the drive-error-data-bin
path silently swallowed the error. Now consistent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves error propagation across rs-dapi → gRPC metadata → rs-sdk by decoding Tenderdash data payloads that are base64-wrapped CBOR and surfacing Drive internal error messages as a dedicated SDK error variant.

Changes:

  • Decode Tenderdash JSON-RPC data field as base64 → CBOR and extract a human-readable message when the top-level message is empty/“Internal error”.
  • Add DriveInternalError to the SDK error enum and parse drive-error-data-bin metadata on Internal gRPC responses to populate it.
  • Add unit tests in rs-dapi covering plain-text, base64-non-CBOR, and real-world fixtures.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
packages/rs-sdk/src/error.rs Adds DriveInternalError and extracts message from CBOR drive-error-data-bin metadata for Internal gRPC statuses.
packages/rs-dapi/src/services/platform_service/error_mapping.rs Decodes Tenderdash data base64 CBOR to recover readable messages and adds tests for this behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/rs-sdk/src/error.rs
Comment thread packages/rs-sdk/src/error.rs
Comment thread packages/rs-dapi/src/services/platform_service/error_mapping.rs
@thepastaclaw
Copy link
Copy Markdown
Collaborator

thepastaclaw commented Apr 7, 2026

Review Gate

Commit: 3a4fc873

  • Debounce: 964m ago (need 30m)

  • CI checks: build failure: Rust workspace tests / Tests (macOS)

  • CodeRabbit review: comment found

  • Off-peak hours: off-peak (10:42 PM PT Tuesday)

  • Run review now (check to override)

lklimek and others added 2 commits April 13, 2026 09:23
@lklimek lklimek changed the title fix(rs-dapi,sdk): decode base64 CBOR error messages and detect duplicate identity keys fix(rs-dapi,sdk): decode base64 CBOR error messages from Tenderdash Apr 13, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/rs-sdk/src/error.rs`:
- Around line 94-103: The new public variant DriveInternalError added to the
Error enum is a semver-breaking change for downstream crates that match
exhaustively; to fix, mark the public enum Error with the #[non_exhaustive]
attribute (above the enum declaration) so future variants (like
DriveInternalError) won’t break consumers, update any internal exhaustive
matches in this crate if necessary, and run cargo check to ensure compilation;
reference the enum named Error and the variant DriveInternalError when making
this change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 94ce607d-3d24-43e9-b268-e4719b011abe

📥 Commits

Reviewing files that changed from the base of the PR and between e32635f and 5581447.

📒 Files selected for processing (1)
  • packages/rs-sdk/src/error.rs

Comment thread packages/rs-sdk/src/error.rs
Add DriveInternalError to WasmSdkErrorKind enum and match arms to
fix compilation after the new variant was added to dash_sdk::Error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The core rs-dapi/rs-sdk fix looks right: the new decoding path matches the existing drive-error-data-bin metadata flow, and the targeted rs-dapi + dash-sdk tests pass locally. One SDK surface still misses the new error kind, though: DriveInternalError is not wired through rs-sdk-ffi, so C/Swift consumers still fall through to the generic NetworkError path instead of seeing the clearer internal/storage failure this PR is trying to surface.

Reviewed commit: f7833a2

🟡 1 suggestion(s)

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `packages/rs-sdk/src/error.rs`:
- [SUGGESTION] lines 94-102: Adding `DriveInternalError` here leaves the FFI bridge returning `NetworkError`
  `rs-sdk-ffi/src/error.rs` still classifies `FFIError::SDKError` by substring matching on `sdk_err.to_string()`. The new `DriveInternalError` variant formats as `"Drive internal error: ..."`, which matches none of the existing branches and therefore falls through to the default `DashSDKErrorCode::NetworkError` / `"Failed to fetch balances: ..."` path. That means the exact duplicate-key / storage errors this PR is trying to decode cleanly are still mislabeled once they cross the C/Swift boundary. Please update the FFI mapper for the new variant (or match the enum directly instead of stringifying it) so all SDK surfaces report the same error class.

Comment thread packages/rs-sdk/src/error.rs
lklimek and others added 4 commits April 13, 2026 10:50
Store protocol version as Arc<AtomicU32> on Sdk (shared between clones).
On each gRPC response, compare metadata.protocol_version against stored
value and atomically update if newer and known to this binary. Unknown
versions are logged at WARN and ignored. Downgrades and zero values are
silently skipped.

Closes #3410

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nc tests

Cover the two remaining test cases for protocol version auto-detection:
concurrent thread races converge to highest version, and
PlatformVersionCurrentVersion::get_current() is synced after update.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Relax test_global_dpp_version_synced_after_update assertion to account
for concurrent test execution — global PlatformVersion is process-wide,
so parallel tests may push it higher. Assert >= instead of ==.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lklimek lklimek changed the title fix(rs-dapi,sdk): decode base64 CBOR error messages from Tenderdash fix(rs-dapi,sdk): decode CBOR error messages and auto-detect protocol version Apr 13, 2026
@lklimek lklimek changed the title fix(rs-dapi,sdk): decode CBOR error messages and auto-detect protocol version fix(rs-dapi,sdk): decode base64 CBOR error messages from Tenderdash Apr 13, 2026
Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

I validated the completed reviewer outputs directly against the checked-out PR diff because the Claude verifier lane stalled during ACP startup. The previous cross-language gap from the earlier #3350 review is fixed on this SHA: DriveInternalError now reaches both the FFI and WASM surfaces. One edge case remains in the changed FFI classifier, where some DriveInternalError values are still downgraded to NotFound based on substring order.

Reviewed commit: 60c4d32

🟡 1 suggestion(s)

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `packages/rs-sdk-ffi/src/error.rs`:
- [SUGGESTION] lines 133-138: DriveInternalError can still be reclassified as NotFound in the new FFI mapping
  The new `DriveInternalError` case is checked after the generic `"not found"` branch. Because `FFIError::SDKError` classifies errors from the rendered string, a real `dash_sdk::Error::DriveInternalError("...not found...")` will hit `DashSDKErrorCode::NotFound` first and never reach the new `DriveInternalError` code path. That means C/Swift consumers still lose the stronger error type for some internal storage failures, which undermines the purpose of adding code 10 in this PR. The fix is to detect the enum variant before any substring-based fallbacks, or at minimum move the DriveInternalError check ahead of the generic `not found` branch.

Comment thread packages/rs-sdk-ffi/src/error.rs Outdated
Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The base64/CBOR decoding path in rs-dapi and the new rs-sdk DriveInternalError variant look directionally correct, but one of the new public error surfaces still fails to preserve the intended classification. In particular, the FFI layer can still mislabel real Drive internal failures, and the Swift wrapper never maps the new error code at all, so downstream consumers will not reliably see the new error type this PR is meant to expose.

Reviewed commit: 60c4d32

🔴 1 blocking | 🟡 1 suggestion(s)

2 additional findings

🔴 blocking: `DriveInternalError` is still misclassified by the FFI error mapper

packages/rs-sdk-ffi/src/error.rs (lines 106-148)

From<FFIError> for DashSDKError decides the exported DashSDKErrorCode by substring-matching sdk_err.to_string(), and the new error_str.contains("Drive internal error") branch sits after the protocol and not found branches. Because Error::DriveInternalError(msg) renders as "Drive internal error: {msg}" and msg comes from arbitrary Drive/GroveDB text, a message like ... not found ... or protocol ... will hit one of those earlier arms first and still be exported as NotFound/ProtocolError instead of DriveInternalError = 10. That defeats the main purpose of adding the dedicated FFI code. Match on the typed dash_sdk::Error variant directly instead of re-parsing its formatted string.

🟡 suggestion: Swift still drops `DriveInternalError = 10` into `.unknown`

packages/swift-sdk/Sources/SwiftDashSDK/SDK.swift (lines 340-410)

rs-sdk-ffi now exports DashSDKErrorCode::DriveInternalError = 10, but SDKError.fromDashSDKError only recognizes raw values 1–9 and 99. Any real Drive internal failure therefore falls through the default arm and becomes .unknown(message) in Swift, so Swift callers still cannot distinguish the new error class from an unrelated unknown FFI failure. Add a dedicated Swift enum case plus a case DashSDKErrorCode(rawValue: 10) mapping and matching LocalizedError text so the new FFI contract is actually consumable from Swift.

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `packages/rs-sdk-ffi/src/error.rs`:
- [BLOCKING] lines 106-148: `DriveInternalError` is still misclassified by the FFI error mapper
  `From<FFIError> for DashSDKError` decides the exported `DashSDKErrorCode` by substring-matching `sdk_err.to_string()`, and the new `error_str.contains("Drive internal error")` branch sits after the `protocol` and `not found` branches. Because `Error::DriveInternalError(msg)` renders as `"Drive internal error: {msg}"` and `msg` comes from arbitrary Drive/GroveDB text, a message like `... not found ...` or `protocol ...` will hit one of those earlier arms first and still be exported as `NotFound`/`ProtocolError` instead of `DriveInternalError = 10`. That defeats the main purpose of adding the dedicated FFI code. Match on the typed `dash_sdk::Error` variant directly instead of re-parsing its formatted string.

In `packages/swift-sdk/Sources/SwiftDashSDK/SDK.swift`:
- [SUGGESTION] lines 340-410: Swift still drops `DriveInternalError = 10` into `.unknown`
  rs-sdk-ffi now exports `DashSDKErrorCode::DriveInternalError = 10`, but `SDKError.fromDashSDKError` only recognizes raw values 1–9 and 99. Any real Drive internal failure therefore falls through the `default` arm and becomes `.unknown(message)` in Swift, so Swift callers still cannot distinguish the new error class from an unrelated unknown FFI failure. Add a dedicated Swift enum case plus a `case DashSDKErrorCode(rawValue: 10)` mapping and matching `LocalizedError` text so the new FFI contract is actually consumable from Swift.

Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Two blocking issues survive verification: the new DashSDKErrorCode::DriveInternalError = 10 is not propagated through the Swift wrapper (collapses to .unknown), and the FFI classifier's ordering causes Drive internal errors whose message contains 'not found' (e.g. data contract not found, element not found) to be misclassified as NotFound, defeating the purpose of this PR for a realistic class of failures. Several lower-severity suggestions around substring matching, CBOR decode limits, base64 engine reuse, and test tightness are also valid. Overall the Rust decoding logic is sound; the client-surface plumbing is the weak spot.

Reviewed commit: 86fbeea

🔴 2 blocking | 🟡 2 suggestion(s) | 💬 2 nitpick(s)

1 additional finding

🔴 blocking: Swift `SDKError` drops the new `DriveInternalError` code on the floor

packages/swift-sdk/Sources/SwiftDashSDK/SDK.swift (lines 340-380)

This PR adds DashSDKErrorCode::DriveInternalError = 10 (see packages/rs-sdk-ffi/src/error.rs:32) specifically so FFI/Swift consumers can distinguish Drive storage-level internal errors. However, SDKError in SDK.swift has no corresponding case, and fromDashSDKError() at lines 353–380 switches only on raw values 1–9 and 99, falling through to .unknown(message) for raw value 10. Every Swift caller that throws via SDKError.fromDashSDKError will therefore receive .unknown(...) rather than a dedicated .driveInternalError(...) case, so the user-facing Swift behavior advertised by the PR does not actually work. Add a case driveInternalError(String) to SDKError, handle DashSDKErrorCode(rawValue: 10) in fromDashSDKError, and extend the LocalizedError switch accordingly.

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `packages/swift-sdk/Sources/SwiftDashSDK/SDK.swift`:
- [BLOCKING] lines 340-380: Swift `SDKError` drops the new `DriveInternalError` code on the floor
  This PR adds `DashSDKErrorCode::DriveInternalError = 10` (see `packages/rs-sdk-ffi/src/error.rs:32`) specifically so FFI/Swift consumers can distinguish Drive storage-level internal errors. However, `SDKError` in `SDK.swift` has no corresponding case, and `fromDashSDKError()` at lines 353–380 switches only on raw values 1–9 and 99, falling through to `.unknown(message)` for raw value 10. Every Swift caller that throws via `SDKError.fromDashSDKError` will therefore receive `.unknown(...)` rather than a dedicated `.driveInternalError(...)` case, so the user-facing Swift behavior advertised by the PR does not actually work. Add a `case driveInternalError(String)` to `SDKError`, handle `DashSDKErrorCode(rawValue: 10)` in `fromDashSDKError`, and extend the `LocalizedError` switch accordingly.

In `packages/rs-sdk-ffi/src/error.rs`:
- [BLOCKING] lines 133-138: `DriveInternalError` is misclassified as `NotFound` when the message contains 'not found'; classification should match the typed variant
  The branch for `DashSDKErrorCode::DriveInternalError` is ordered after the generic `error_str.contains("not found") || error_str.contains("Not found")` check. `dash_sdk::Error::DriveInternalError` displays as `"Drive internal error: {message}"` (see `packages/rs-sdk/src/error.rs:101-102`), and many Drive internal messages already contain the phrase 'not found' — e.g. `data contract not found` (`packages/rs-drive/src/error/drive.rs:190`) and `element not found` (`packages/rs-drive/src/error/drive.rs:202`). Those errors will continue to be reported to FFI/Swift consumers as `NotFound = 7` rather than the new `DriveInternalError = 10`, defeating the purpose of this PR for the most common class of Drive internal failures. The underlying root cause is that the classifier matches on the Display string of `dash_sdk::Error`; since `FFIError::SDKError(sdk_err)` holds the typed error, prefer structural matching (`matches!(sdk_err, dash_sdk::Error::DriveInternalError(_))`) before any string-based heuristics. This also removes the risk that a remote-supplied message could spoof any other error category via the substring match.

In `packages/rs-sdk/src/error.rs`:
- [SUGGESTION] lines 231-244: CBOR decoded into `ciborium::Value` without explicit size/depth limits
  `extract_drive_error_message` (and the parallel `decode_data_message`/`decode_consensus_error` helpers in rs-dapi) pass the full input slice to `ciborium::from_reader` and materialize it into a `ciborium::Value` tree before walking it. The input is attacker-influenceable — in the SDK path, a malicious or MITM'd DAPI peer can place arbitrary bytes into `drive-error-data-bin` gRPC metadata; in the rs-dapi path, a compromised or buggy local Tenderdash can place arbitrary base64 in the `data` field. Although gRPC metadata is conventionally capped (~8 KB) and Tenderdash responses are bounded by the HTTP client, there is no explicit size or nesting cap before decode, so a pathological CBOR payload (e.g. thousands of 1-byte nested map openings `a1 a1 a1 …`) can force deep recursion and amplify memory/CPU during parsing. Impact is limited to client-side availability (not consensus), but defense-in-depth — a max input length and/or max nesting depth — would be cheap. `walk_cbor_for_key` itself is safe because its recursion depth is bounded by `keys.len()` (≤ 2); the risk is entirely in the `ciborium::Value` construction step.

In `packages/rs-dapi/src/services/platform_service/error_mapping.rs`:
- [SUGGESTION] lines 280-299: `decode_data_message` rebuilds a base64 engine on every call
  `decode_data_message` constructs a new `engine::GeneralPurpose` per invocation even though the sibling `base64_decode` already keeps a `static BASE64` engine with identical config. The only justification in the comment is to suppress the debug log `base64_decode` emits on failure. For errors on a hot path this is a minor per-call cost and, more importantly, it duplicates engine configuration that must be kept in sync across two call sites. Consider either (a) adding a `log_on_failure: bool` parameter to `base64_decode` and reusing it, or (b) hoisting a second `static` engine with identical configuration so the config lives in one place.

Comment thread packages/rs-sdk-ffi/src/error.rs Outdated
Comment thread packages/rs-sdk/src/error.rs
Comment thread packages/rs-dapi/src/services/platform_service/error_mapping.rs
Comment thread packages/rs-dapi/src/services/platform_service/error_mapping.rs Outdated
Comment thread packages/rs-sdk/src/error.rs
Copy link
Copy Markdown
Collaborator

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Two blocking FFI boundary regressions persist at HEAD 86fbeea: the Rust→C error classifier in rs-sdk-ffi orders the generic not found substring check above the new Drive internal error arm (so common Drive messages like data contract not found still cross the C ABI as NotFound = 7), and the Swift wrapper has no case for raw value 10 so DriveInternalError still falls through to .unknown("Unknown Error: …"). The PR's stated end-to-end goal for Swift consumers is therefore not delivered. Lower-severity items: unbounded ciborium::Value materialization on attacker-influenced bytes in both rs-sdk and rs-dapi, a duplicated base64 engine config in decode_data_message, and a non-discriminating test assertion. (One reviewer, codex-security-auditor, failed to initialize.)

Reviewed commit: 86fbeea

Fresh rerun of the review pipeline for this dispatcher item. An older same-SHA review already exists, so I am posting this current run as a top-level review to avoid duplicating inline threads while still recording the fresh verification.

🔴 2 blocking | 🟡 3 suggestion(s) | 💬 2 nitpick(s)

🔴 blocking: DriveInternalError misclassified as NotFound/ProtocolError due to substring branch ordering

packages/rs-sdk-ffi/src/error.rs:133-138

From<FFIError> for DashSDKError decides the exported DashSDKErrorCode by substring-matching sdk_err.to_string(). dash_sdk::Error::DriveInternalError(msg) renders via thiserror as "Drive internal error: {msg}" (packages/rs-sdk/src/error.rs:101), where msg is the human-readable text decoded from the remote-supplied drive-error-data-bin gRPC trailer. The "protocol" branch (line 133) and "not found" branch (line 135) are both checked before the new "Drive internal error" branch (line 137). Real Drive internal messages frequently contain those tokens (e.g. data contract not found, element not found, protocol error: … — see packages/rs-drive/src/error/drive.rs), so they still surface across the C ABI as NotFound = 7 or ProtocolError = 5 rather than the new DriveInternalError = 10, defeating the PR's purpose for the most common GroveDB internal failures. Additionally, classifying by Display string is attacker-steerable: FFIError::SDKError(sdk_err) already holds the typed dash_sdk::Error, and a malicious or MITM'd DAPI peer can choose message content to redirect the FFI code (e.g. forcing Timeout, NetworkError, NotFound) regardless of the typed Rust variant. Match structurally on the typed variant (match sdk_err { Error::DriveInternalError(_) => …, Error::TimeoutReached(..) => …, … }) before any string heuristics; at minimum, hoist the Drive internal error branch above the not found and protocol branches.

🔴 blocking: Swift SDKError drops DriveInternalError = 10 across the C→Swift boundary

packages/swift-sdk/Sources/SwiftDashSDK/SDK.swift:340-410

The C ABI now exports DashSDKErrorCode::DriveInternalError = 10 (packages/rs-sdk-ffi/src/error.rs:32) precisely so Swift consumers can distinguish Drive storage-level internal errors. However, the Swift wrapper does not translate this code: SDKError (lines 340–351) has no driveInternalError(String) case, fromDashSDKError(_:) (lines 353–380) only switches over raw values 1–9 and 99, and the LocalizedError extension (lines 383–409) has no matching arm. Raw value 10 therefore falls through default: at line 377 to .unknown(message) and renders as "Unknown Error: …". The dedicated FFI code exists in Rust and at the C ABI but is silently re-erased one layer up on the Swift surface, so the user-visible behavior the PR description advertises is not delivered to Swift callers. Add case driveInternalError(String) to SDKError, a case DashSDKErrorCode(rawValue: 10) arm in fromDashSDKError, and a corresponding LocalizedError description.

🟡 suggestion: extract_drive_error_message materializes attacker-influenced CBOR into a full ciborium::Value tree without size or depth caps

packages/rs-sdk/src/error.rs:231-244

extract_drive_error_message is invoked on the raw bytes of the drive-error-data-bin gRPC trailer (status.metadata().get_bin("drive-error-data-bin")), supplied by the remote DAPI peer. It calls ciborium::from_reader::<ciborium::Value, _>(bytes) and walks the resulting Value tree just to read one top-level string field. There is no explicit byte-length cap (relies on tonic's default ~8 KB metadata cap) and no nesting-depth limit. A pathological CBOR payload such as thousands of nested 1-byte map openings (a1 a1 a1 …) can drive deep recursion and amplify CPU/stack work before reaching the lookup. Impact is bounded to client-side availability — no consensus or proof-verification path is touched — but a hard input-length cap and/or decoding into a small typed struct (#[derive(Deserialize)] struct DriveErrorDataBin { message: Option<String>, … }) instead of a generic Value tree would be cheap defense-in-depth on a hot error path that processes adversary-influenced bytes for the first time.

🟡 suggestion: decode_data_message performs unbounded base64+CBOR decode on Tenderdash JSON-RPC data field

packages/rs-dapi/src/services/platform_service/error_mapping.rs:280-299

decode_data_message is reached from TenderdashStatus::from(serde_json::Value) for every malformed/unknown error response. The input string from the JSON-RPC data field is base64-decoded then handed to ciborium::de::from_reader::<ciborium::Value, _>, materializing a full Value tree with no explicit byte cap and no nesting-depth bound. Tenderdash is the trusted local node in nominal operation, but DAPI is a network-facing process and this path runs on every malformed response, including from a buggy or compromised Tenderdash. A pathological payload (e.g. tens of KB of nested 1-byte map openings) can drive disproportionate CPU/recursion. Apply an explicit length cap before decode and/or use a depth-limited or typed CBOR deserialize, matching the defensive style elsewhere in this file (base64_decode log gating, ConsensusError::deserialize_from_bytes sanity check).

🟡 suggestion: decode_data_message rebuilds a base64 engine per call and duplicates engine config

packages/rs-dapi/src/services/platform_service/error_mapping.rs:280-299

decode_data_message constructs a fresh engine::GeneralPurpose per invocation (lines 283–289) even though the sibling base64_decode (lines 179–194) already keeps a static BASE64 engine with effectively the same configuration. The only justification given in the comment is suppressing the debug log emitted by base64_decode on failure. Beyond the per-call allocation, this duplicates engine configuration across two call sites that must be kept in sync — drift would produce divergent decoding behaviour at the wire boundary (the new instance also drops with_encode_padding(false), harmless here but still configuration drift). Either add a log_on_failure: bool parameter to base64_decode and reuse it, or hoist a second static engine with identical config so the configuration lives in one place.

💬 nitpick: from_json_value_decodes_cbor_data_field_unique_key assertion also matches the non-unique fixture

packages/rs-dapi/src/services/platform_service/error_mapping.rs:836

The unique-key fixture asserts status.message.as_deref().unwrap().contains("unique set"), which is true for both the unique-set fixture ("… in the unique set …") and the non-unique-set fixture ("… in the non unique set …"). The paired non-unique test at line 801 uses the more specific contains("non unique set"), and both tests share the same starts_with("storage: identity: a unique key with that hash already exists") prefix, so swapping the two CBOR base64 fixtures would still leave both tests green. Tighten to contains("in the unique set") (with the leading in ) so the assertion can actually distinguish the two payloads.

💬 nitpick: extract_drive_error_message duplicates rs-dapi's walk_cbor_for_key over the same wire format

packages/rs-sdk/src/error.rs:231-244

extract_drive_error_message reimplements the top-level message lookup that rs-dapi's walk_cbor_for_key (packages/rs-dapi/src/services/platform_service/error_mapping.rs:197) already provides over the same wire format produced by rs-dapi's TenderdashStatus/drive-error-data-bin serializer. The implementations diverge slightly (rs-sdk rejects empty strings, rs-dapi accepts them), which is benign today but means any future schema change (e.g. nesting message under a sub-map, or surfacing additional fields like consensus_error on this path) requires two synchronized edits across producer and consumer crates. Extract a shared #[derive(Deserialize)] struct DriveErrorDataBin { message: Option<String>, … } or a tiny shared CBOR-walk helper so the wire format is represented in one place.

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `packages/rs-sdk-ffi/src/error.rs`:
- [BLOCKING] lines 133-138: DriveInternalError misclassified as NotFound/ProtocolError due to substring branch ordering
  `From<FFIError> for DashSDKError` decides the exported `DashSDKErrorCode` by substring-matching `sdk_err.to_string()`. `dash_sdk::Error::DriveInternalError(msg)` renders via thiserror as `"Drive internal error: {msg}"` (packages/rs-sdk/src/error.rs:101), where `msg` is the human-readable text decoded from the remote-supplied `drive-error-data-bin` gRPC trailer. The `"protocol"` branch (line 133) and `"not found"` branch (line 135) are both checked before the new `"Drive internal error"` branch (line 137). Real Drive internal messages frequently contain those tokens (e.g. `data contract not found`, `element not found`, `protocol error: …` — see packages/rs-drive/src/error/drive.rs), so they still surface across the C ABI as `NotFound = 7` or `ProtocolError = 5` rather than the new `DriveInternalError = 10`, defeating the PR's purpose for the most common GroveDB internal failures. Additionally, classifying by Display string is attacker-steerable: `FFIError::SDKError(sdk_err)` already holds the typed `dash_sdk::Error`, and a malicious or MITM'd DAPI peer can choose `message` content to redirect the FFI code (e.g. forcing Timeout, NetworkError, NotFound) regardless of the typed Rust variant. Match structurally on the typed variant (`match sdk_err { Error::DriveInternalError(_) => …, Error::TimeoutReached(..) => …, … }`) before any string heuristics; at minimum, hoist the `Drive internal error` branch above the `not found` and `protocol` branches.

In `packages/swift-sdk/Sources/SwiftDashSDK/SDK.swift`:
- [BLOCKING] lines 340-410: Swift SDKError drops DriveInternalError = 10 across the C→Swift boundary
  The C ABI now exports `DashSDKErrorCode::DriveInternalError = 10` (packages/rs-sdk-ffi/src/error.rs:32) precisely so Swift consumers can distinguish Drive storage-level internal errors. However, the Swift wrapper does not translate this code: `SDKError` (lines 340–351) has no `driveInternalError(String)` case, `fromDashSDKError(_:)` (lines 353–380) only switches over raw values 1–9 and 99, and the `LocalizedError` extension (lines 383–409) has no matching arm. Raw value 10 therefore falls through `default:` at line 377 to `.unknown(message)` and renders as `"Unknown Error: …"`. The dedicated FFI code exists in Rust and at the C ABI but is silently re-erased one layer up on the Swift surface, so the user-visible behavior the PR description advertises is not delivered to Swift callers. Add `case driveInternalError(String)` to `SDKError`, a `case DashSDKErrorCode(rawValue: 10)` arm in `fromDashSDKError`, and a corresponding `LocalizedError` description.

In `packages/rs-sdk/src/error.rs`:
- [SUGGESTION] lines 231-244: extract_drive_error_message materializes attacker-influenced CBOR into a full ciborium::Value tree without size or depth caps
  `extract_drive_error_message` is invoked on the raw bytes of the `drive-error-data-bin` gRPC trailer (status.metadata().get_bin("drive-error-data-bin")), supplied by the remote DAPI peer. It calls `ciborium::from_reader::<ciborium::Value, _>(bytes)` and walks the resulting Value tree just to read one top-level string field. There is no explicit byte-length cap (relies on tonic's default ~8 KB metadata cap) and no nesting-depth limit. A pathological CBOR payload such as thousands of nested 1-byte map openings (`a1 a1 a1 …`) can drive deep recursion and amplify CPU/stack work before reaching the lookup. Impact is bounded to client-side availability — no consensus or proof-verification path is touched — but a hard input-length cap and/or decoding into a small typed struct (`#[derive(Deserialize)] struct DriveErrorDataBin { message: Option<String>, … }`) instead of a generic Value tree would be cheap defense-in-depth on a hot error path that processes adversary-influenced bytes for the first time.

In `packages/rs-dapi/src/services/platform_service/error_mapping.rs`:
- [SUGGESTION] lines 280-299: decode_data_message performs unbounded base64+CBOR decode on Tenderdash JSON-RPC data field
  `decode_data_message` is reached from `TenderdashStatus::from(serde_json::Value)` for every malformed/unknown error response. The input string from the JSON-RPC `data` field is base64-decoded then handed to `ciborium::de::from_reader::<ciborium::Value, _>`, materializing a full Value tree with no explicit byte cap and no nesting-depth bound. Tenderdash is the trusted local node in nominal operation, but DAPI is a network-facing process and this path runs on every malformed response, including from a buggy or compromised Tenderdash. A pathological payload (e.g. tens of KB of nested 1-byte map openings) can drive disproportionate CPU/recursion. Apply an explicit length cap before decode and/or use a depth-limited or typed CBOR deserialize, matching the defensive style elsewhere in this file (`base64_decode` log gating, `ConsensusError::deserialize_from_bytes` sanity check).
- [SUGGESTION] lines 280-299: decode_data_message rebuilds a base64 engine per call and duplicates engine config
  `decode_data_message` constructs a fresh `engine::GeneralPurpose` per invocation (lines 283–289) even though the sibling `base64_decode` (lines 179–194) already keeps a `static BASE64` engine with effectively the same configuration. The only justification given in the comment is suppressing the debug log emitted by `base64_decode` on failure. Beyond the per-call allocation, this duplicates engine configuration across two call sites that must be kept in sync — drift would produce divergent decoding behaviour at the wire boundary (the new instance also drops `with_encode_padding(false)`, harmless here but still configuration drift). Either add a `log_on_failure: bool` parameter to `base64_decode` and reuse it, or hoist a second `static` engine with identical config so the configuration lives in one place.

lklimek and others added 4 commits May 5, 2026 14:38
…uristics

The FFI mapper classified `dash_sdk::Error` by its `Display` string with the
generic `not found`/`Not found` substring branch ahead of the dedicated
`Drive internal error` branch.  Drive emits messages like
`data contract not found` and `element not found`, so any
`DriveInternalError("data contract not found …")` was mis-mapped to
`DashSDKErrorCode::NotFound` (7) instead of the new
`DashSDKErrorCode::DriveInternalError` (10) — defeating the purpose of the
typed variant for the most common Drive failure shape.

Match the typed enum variant first; substring heuristics now run only as a
fallback when the variant doesn't tell us anything.  Added unit tests that
lock in the ordering: a `DriveInternalError` whose message contains
`not found` maps to `DriveInternalError`, while a `Generic("identity not
found")` still maps to `NotFound`.

Addresses PR #3350 review threads CMT-001, CMT-002, CMT-003.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`extract_drive_error_message` and the rs-dapi `decode_consensus_error` /
`decode_data_message` helpers all fed attacker-influenceable bytes (gRPC
metadata, Tenderdash `data` field) directly to `ciborium::from_reader` with
no size or depth bound.  Pathological CBOR (e.g., long sequences of `a1 a1
a1 …`) could force deep recursion on the decoder side, amplifying memory
and CPU on the client and on rs-dapi.

Reject inputs larger than 64 KiB before handing them to ciborium.  gRPC
metadata is conventionally bounded around 8 KiB, so 64 KiB is comfortably
above any legitimate payload.  Documented as a defence-in-depth measure
with a TODO pointing at a future depth-limited deserializer.

Also takes the opportunity to mark the duplicated CBOR-walk on the rs-sdk
side with the INTENTIONAL(CMT-007) note that mirrors the rs-dapi side.

Addresses PR #3350 review thread CMT-004 (and partially CMT-007 on the
rs-sdk side; the rs-dapi mirror lands in a separate commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ue-key test

Bundle of small fixes against `services::platform_service::error_mapping`:

1. CMT-004 — defence in depth.  Add `MAX_CBOR_INPUT_SIZE = 65_536` and
   reject Tenderdash `data` / `info` payloads larger than 64 KiB before
   handing them to `ciborium::from_reader`, so a malicious peer can't force
   unbounded recursion via `a1 a1 a1 …`-style nested map openers.  Mirrors
   the rs-sdk-side cap landed earlier.

2. CMT-005 — engine config dedup.  `decode_data_message` previously rebuilt
   a `base64::engine::GeneralPurpose` per call.  Move it to use the shared
   `base64_decode` helper, with `base64_decode` now a pure decoder; the
   single previous caller (`decode_consensus_error`) keeps the decode-
   failure log at the call site (`decode_data_message` deliberately stays
   silent because base64 failure there is the expected fall-through).

3. CMT-006 — discriminating assertion.  The unique-key fixture test asserted
   `contains("unique set")`, which also matches the non-unique fixture's
   `non unique set`.  Tighten to `contains("in the unique set")` (with
   leading space + `in`) — the production message includes that phrase
   verbatim.

4. CMT-007 — accepted duplication marker.  Add an INTENTIONAL note on
   `walk_cbor_for_key` pointing at the matching extract on the rs-sdk side,
   so anyone editing one knows to mirror the other.

Addresses PR #3350 review threads CMT-004, CMT-005, CMT-006, CMT-007.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready for final review Ready for the final review. If AI was involved in producing this PR, it has already had a reviewer.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants