@AndrewWestberg AndrewWestberg commented Dec 2, 2025

dApps that use witness datum values have to keep those values off-chain: when you build a transaction that spends a UTXO containing only a datum hash, you must provide the actual datum value to witness it.

Since dolos is indexing the blockchain anyway, have it index these actual datum values that are represented in the ledger as only a hash.

This makes it easier when building new transactions to just use dolos as the off-chain store for the datum values.

Summary by CodeRabbit

  • New Features
    • Capture and track Plutus datum bytes added/removed during UTXO updates and recovery.
    • Persist datums with reference counting to manage lifecycle and avoid dangling data.
    • Expose datum lookup in the state API so services can retrieve datum bytes.
    • UTXO query responses now enrich outputs with associated datum payloads and emit clearer, operation-specific warnings.

coderabbitai bot commented Dec 2, 2025

📥 Commits

Reviewing files that changed from the base of the PR and between 8334b9b and a9290fb.

📒 Files selected for processing (6)
  • crates/cardano/src/utxoset.rs (4 hunks)
  • crates/core/src/lib.rs (1 hunk)
  • crates/core/src/state.rs (2 hunks)
  • crates/redb3/src/state/mod.rs (3 hunks)
  • crates/redb3/src/state/utxoset.rs (5 hunks)
  • src/serve/grpc/query.rs (11 hunks)

Walkthrough

Adds datum-witness tracking end-to-end: UtxoSetDelta now records datum additions/removals; cardano delta computation records per-tx datums; redb3 stores datums with refcounts and exposes lookup; StateStore gains get_datum; gRPC UTXO responses are enriched with decoded Plutus datums from storage.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core data structures: crates/core/src/lib.rs | Adds witness_datums_add: HashMap<Hash<32>, Vec<u8>> and witness_datums_remove: HashSet<Hash<32>> to pub struct UtxoSetDelta. |
| StateStore trait: crates/core/src/state.rs | Adds fn get_datum(&self, datum_hash: &Hash<32>) -> Result<Option<Vec<u8>>, StateError> and imports Hash. |
| Delta computation (Cardano): crates/cardano/src/utxoset.rs | Builds per-tx witness_datums_map from tx.plutus_data(); populates delta.witness_datums_add for produced outputs and delta.witness_datums_remove for consumed outputs; uses Arc-backed stxi bodies for consumed/recovered stxi while preserving invariants. |
| Storage layer (redb3): crates/redb3/src/state/mod.rs, crates/redb3/src/state/utxoset.rs | Adds DatumsTable and DatumRefCountTable (initialize/get/insert/remove/stats); wires datum lifecycle into UTXO apply (increment on add, decrement & remove on zero); exposes get_datum; extends utxoset_stats and table iteration to include datums and refcounts. |
| Query / gRPC enrichment: src/serve/grpc/query.rs | Changes into_u5c_utxo signature to accept state and return Box<dyn std::error::Error>; when an output has a datum hash, fetches datum via StateStore::get_datum, decodes PlutusData, enriches UTXO response, and logs warnings on missing/invalid datum; updates call sites to pass state. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant gRPC as gRPC handler
    participant Mapper as interop::Mapper
    participant State as StateStore
    participant DB as redb3 (DatumsTable)

    Client->>gRPC: request UTXO(s)
    gRPC->>Mapper: map txo + era body -> parsed output (may include datum_hash)
    alt output has datum_hash
        gRPC->>State: get_datum(datum_hash)
        State->>DB: read DatumsTable
        DB-->>State: Option<Vec<u8>>
        alt datum present
            gRPC->>gRPC: decode PlutusData bytes
            gRPC-->>Client: enriched UTXO with decoded datum
        else missing / decode error
            gRPC-->>Client: UTXO without datum (warn logged)
        end
    else no datum_hash
        gRPC-->>Client: UTXO as-is
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay attention to:
    • Datum reference-counting and atomicity in crates/redb3/src/state/utxoset.rs.
    • Correct mapping of per-tx plutus_data() entries to produced/consumed outputs in crates/cardano/src/utxoset.rs.
    • Error-type broadening and propagated call-site changes in src/serve/grpc/query.rs.

Suggested reviewers

  • scarmuega

Poem

🐰 I hopped through deltas small and grand,
Collected datums with a careful hand.
Counts rise, then fall—no datum lost, hooray!
Queries fetch payloads to brighten the day.
A rabbit's patchwork ledger—snack time, okay! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 51.72% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: Track and ref-count witness-set datum values' accurately describes the main change: adding reference-counted tracking of witness datum values across the codebase. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/cardano/src/utxoset.rs (1)

134-141: Duplicate import inside loop and same panic risk.

The OriginalHash import at line 136 is redundant since it's already imported at the function scope in compute_apply_delta (line 57). Additionally, the same unwrap() concern from the apply path applies here.

     for (_, tx) in txs.iter() {
+        use pallas::ledger::traverse::OriginalHash;
         let mut witness_datums_map: HashMap<_, _> = HashMap::new();
         for datum in tx.plutus_data() {
-            use pallas::ledger::traverse::OriginalHash;
             let datum_hash = datum.original_hash();
             let datum_bytes =
                 pallas::codec::minicbor::to_vec(datum.clone().unwrap()).unwrap_or_default();
             witness_datums_map.insert(datum_hash, datum_bytes);
         }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17795f8 and a616896.

📒 Files selected for processing (6)
  • crates/cardano/src/utxoset.rs (3 hunks)
  • crates/core/src/lib.rs (1 hunk)
  • crates/core/src/state.rs (2 hunks)
  • crates/redb3/src/state/mod.rs (3 hunks)
  • crates/redb3/src/state/utxoset.rs (5 hunks)
  • src/serve/grpc/query.rs (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
crates/core/src/state.rs (2)
crates/redb3/src/state/mod.rs (1)
  • get_datum (362-367)
crates/redb3/src/state/utxoset.rs (1)
  • get_datum (545-548)
src/serve/grpc/query.rs (4)
crates/core/src/lib.rs (1)
  • state (607-607)
crates/core/src/state.rs (1)
  • get_datum (412-412)
crates/redb3/src/state/mod.rs (1)
  • get_datum (362-367)
crates/redb3/src/state/utxoset.rs (1)
  • get_datum (545-548)
🪛 GitHub Actions: CI
src/serve/grpc/query.rs

[error] 231-231: cargo clippy failed. Clippy error: the borrowed expression implements the required traits (src/serve/grpc/query.rs:231:54).

🪛 GitHub Check: Check Build
src/serve/grpc/query.rs

[failure] 247-247:
the borrowed expression implements the required traits


[failure] 241-241:
the borrowed expression implements the required traits


[failure] 240-240:
the borrowed expression implements the required traits


[failure] 231-231:
the borrowed expression implements the required traits

🔇 Additional comments (10)
crates/core/src/lib.rs (1)

213-214: LGTM!

The new witness_datums_add and witness_datums_remove fields are appropriately typed for tracking datum witnesses during delta computation. The use of HashMap<Hash<32>, Vec<u8>> for add operations and HashSet<Hash<32>> for removal tracking aligns well with the reference-counting storage pattern implemented in the redb3 layer.

crates/redb3/src/state/mod.rs (2)

134-135: LGTM!

Table initialization for DatumsTable and DatumRefCountTable follows the existing pattern used for UtxosTable and FilterIndexes.


361-367: LGTM!

The get_datum implementation follows the established pattern for read operations in this store, correctly opening a read transaction and delegating to the table-level method.

crates/core/src/state.rs (1)

411-412: LGTM!

The get_datum method signature is consistent with other data retrieval methods in the StateStore trait, returning Result<Option<Vec<u8>>, StateError> to handle both missing data and errors appropriately.
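
To make the three outcomes of that return type concrete, here is a minimal sketch. It assumes a hypothetical in-memory `MemStore`, a stand-in `StateError` with a single made-up variant, and a plain `[u8; 32]` in place of `Hash<32>`; none of these are the actual Dolos types, only the shape of the `get_datum` signature is taken from the PR.

```rust
use std::collections::HashMap;

/// Stand-in for the 32-byte `Hash<32>` key (assumption: a plain array).
type DatumHash = [u8; 32];

/// Hypothetical error type; the real `StateError` has its own variants.
#[derive(Debug, PartialEq)]
enum StateError {
    StoreUnavailable,
}

/// Minimal trait mirroring only the `get_datum` signature from the PR.
trait DatumLookup {
    fn get_datum(&self, datum_hash: &DatumHash) -> Result<Option<Vec<u8>>, StateError>;
}

/// Hypothetical in-memory store used only for illustration.
struct MemStore {
    online: bool,
    datums: HashMap<DatumHash, Vec<u8>>,
}

impl DatumLookup for MemStore {
    fn get_datum(&self, datum_hash: &DatumHash) -> Result<Option<Vec<u8>>, StateError> {
        if !self.online {
            // Storage failure: surfaced as Err, distinct from "not found".
            return Err(StateError::StoreUnavailable);
        }
        // Ok(None) means the lookup worked but no datum is indexed.
        Ok(self.datums.get(datum_hash).cloned())
    }
}
```

A caller can then treat `Ok(None)` as "datum not indexed" and reserve `Err` for genuine storage problems.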

src/serve/grpc/query.rs (1)

204-213: LGTM on the signature and datum handling approach.

The approach of enriching UTXO responses with datum payloads when a datum hash is present is well-designed. The graceful degradation (warning log but continuing) when datum lookup fails is appropriate for non-critical enrichment.

crates/redb3/src/state/utxoset.rs (4)

107-121: LGTM on reference-counted datum storage integration.

The logic correctly implements reference counting: insert datum only on first reference (refcount == 1), remove datum when last reference is gone (refcount == 0). This ensures datums are retained as long as any UTXO references them.
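
The rule described here (store on the first reference, delete on the last) can be sketched with two in-memory maps. This is an illustration under stated assumptions, not the redb3 code: `DatumStore` is a hypothetical stand-in for DatumsTable plus DatumRefCountTable, and `[u8; 32]` stands in for `Hash<32>`.

```rust
use std::collections::HashMap;

type DatumHash = [u8; 32];

/// Hypothetical in-memory stand-in for DatumsTable + DatumRefCountTable.
#[derive(Default)]
struct DatumStore {
    datums: HashMap<DatumHash, Vec<u8>>,
    refcounts: HashMap<DatumHash, u64>,
}

impl DatumStore {
    /// Record one more reference; store the bytes only on the first one.
    fn add(&mut self, hash: DatumHash, bytes: &[u8]) {
        let count = self.refcounts.entry(hash).or_insert(0);
        *count += 1;
        if *count == 1 {
            self.datums.insert(hash, bytes.to_vec());
        }
    }

    /// Drop one reference; delete the bytes once no reference remains.
    fn remove(&mut self, hash: &DatumHash) {
        let remaining = match self.refcounts.get_mut(hash) {
            Some(count) => {
                *count -= 1;
                *count
            }
            // Unknown hash: nothing to release.
            None => return,
        };
        if remaining == 0 {
            self.refcounts.remove(hash);
            self.datums.remove(hash);
        }
    }

    /// Look up the stored bytes, if any reference is still alive.
    fn get(&self, hash: &DatumHash) -> Option<&Vec<u8>> {
        self.datums.get(hash)
    }
}
```

With two `add` calls for the same hash, the datum survives the first `remove` and disappears only after the second.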


207-224: Defensive handling for underflow is appropriate.

The early return at line 211-213 when current == 0 prevents underflow and handles edge cases gracefully. This could occur if a datum was never indexed (e.g., from pre-upgrade data) and is being consumed.


146-180: LGTM on DatumsTable implementation.

Clean implementation following the existing table patterns. The use of &**datum_hash to dereference Hash<32> to &[u8; 32] is correct for the key type.
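
For readers unfamiliar with the `&**` idiom, the following sketch shows how each step changes the type. `Hash32` is a hypothetical newtype standing in for `Hash<32>`; the only assumption carried over from the comment above is that the hash type dereferences to `[u8; 32]`.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for a 32-byte hash newtype.
struct Hash32([u8; 32]);

impl Deref for Hash32 {
    type Target = [u8; 32];

    fn deref(&self) -> &[u8; 32] {
        &self.0
    }
}

/// Turn a borrowed hash into the borrowed array a table key would need.
fn as_key(datum_hash: &Hash32) -> &[u8; 32] {
    // *datum_hash   : Hash32     (undo the reference)
    // **datum_hash  : [u8; 32]   (go through Deref)
    // &**datum_hash : &[u8; 32]  (re-borrow the array)
    &**datum_hash
}
```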


544-548: LGTM!

The get_datum method on StateStore provides a convenient public API for datum retrieval, correctly delegating to DatumsTable::get.

crates/cardano/src/utxoset.rs (1)

64-70: The concern about .unwrap() panicking on datum.clone() is based on a misunderstanding of KeepRaw::unwrap(). This method has the signature pub fn unwrap(self) -> T and simply returns the decoded inner value—it does not panic. CBOR decoding errors occur at the time KeepRaw is created (during minicbor::decode), not when calling unwrap(). The existing code is safe; the .unwrap_or_default() on the minicbor::to_vec() already handles encoding failures appropriately.
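
The distinction drawn here, between an `unwrap` that can panic and one that merely hands back an already-decoded value, can be illustrated with a small wrapper. `RawKept` below is a hypothetical type written for this example; it is not the pallas `KeepRaw` implementation, and its `new` constructor stands in for the decoding step where real failures would occur.

```rust
/// Hypothetical wrapper that keeps a decoded value next to its raw bytes.
#[derive(Clone)]
struct RawKept<T> {
    raw: Vec<u8>,
    inner: T,
}

impl<T> RawKept<T> {
    /// Any failure would happen before this point, while decoding `inner`.
    fn new(raw: Vec<u8>, inner: T) -> Self {
        RawKept { raw, inner }
    }

    /// The original bytes, untouched.
    fn raw_bytes(&self) -> &[u8] {
        &self.raw
    }

    /// Infallible: simply returns the decoded value, unlike Option::unwrap.
    fn unwrap(self) -> T {
        self.inner
    }
}
```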

Likely an incorrect or invalid review comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/serve/grpc/query.rs (1)

229-252: Previous clippy issues appear to be resolved.

The hex::encode calls now use the values directly (e.g., hex::encode(datum_hash)) rather than unnecessary borrows, which should satisfy the clippy lint mentioned in past review comments.
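
The lint message "the borrowed expression implements the required traits" arises when a reference is passed to a generic parameter that the value itself already satisfies. A minimal sketch, assuming a hypothetical `encode_hex` helper with an `AsRef<[u8]>` bound in place of the real `hex::encode`:

```rust
/// Hypothetical hex encoder with the kind of generic bound
/// (`AsRef<[u8]>`) that makes an extra borrow unnecessary.
fn encode_hex<T: AsRef<[u8]>>(data: T) -> String {
    data.as_ref().iter().map(|b| format!("{:02x}", b)).collect()
}

fn demo() -> (String, String) {
    let datum_hash = [0xabu8; 4];
    // Compiles, but clippy may flag the borrow as unnecessary.
    let borrowed = encode_hex(&datum_hash);
    // Preferred: the value itself already implements AsRef<[u8]>.
    let direct = encode_hex(datum_hash);
    (borrowed, direct)
}
```

Both calls produce the same string; only the second avoids the warning.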

🧹 Nitpick comments (2)
crates/cardano/src/utxoset.rs (1)

134-141: Duplicate datum extraction logic - consider refactoring to a helper function.

This block duplicates the datum extraction logic from compute_apply_delta (lines 64-70). Consider extracting to a helper function for maintainability.

fn extract_witness_datums(tx: &MultiEraTx) -> HashMap<Hash<32>, Vec<u8>> {
    use pallas::ledger::traverse::OriginalHash;
    tx.plutus_data()
        .map(|datum| {
            let datum_hash = datum.original_hash();
            let datum_bytes = pallas::codec::minicbor::to_vec(datum.clone().unwrap())
                .unwrap_or_default();
            (datum_hash, datum_bytes)
        })
        .collect()
}
crates/redb3/src/state/utxoset.rs (1)

207-224: Consider logging when decrementing a non-existent datum.

The decrement function silently returns 0 when the datum doesn't exist in the refcount table (line 211-213). While this defensive behavior prevents panics, it could mask bugs where datums are being removed without having been properly added.

Consider adding a debug/trace log when this edge case occurs:

         if current == 0 {
+            // This shouldn't happen in normal operation - could indicate a bug
+            tracing::debug!(datum_hash = %hex::encode(&**datum_hash), "Attempted to decrement non-existent datum refcount");
             return Ok(0);
         }
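
An alternative to logging inside the table, sketched here as an assumption and not taken from the PR, is to let the return type tell the caller that the hash was never tracked. The function name and the in-memory `HashMap` are hypothetical stand-ins for the refcount table.

```rust
use std::collections::HashMap;

type DatumHash = [u8; 32];

/// Decrement the refcount for `hash`.
/// Returns `None` if the hash was never tracked, otherwise the new count.
fn decrement(refcounts: &mut HashMap<DatumHash, u64>, hash: &DatumHash) -> Option<u64> {
    let current = refcounts.get(hash).copied().unwrap_or(0);
    if current == 0 {
        // The caller can log here instead of silently treating this as zero.
        return None;
    }
    let next = current - 1;
    if next == 0 {
        // Last reference gone: drop the entry entirely.
        refcounts.remove(hash);
    } else {
        refcounts.insert(*hash, next);
    }
    Some(next)
}
```

The trade-off is a slightly noisier call site in exchange for making the "never indexed" case impossible to overlook.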
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a616896 and 8334b9b.

📒 Files selected for processing (6)
  • crates/cardano/src/utxoset.rs (3 hunks)
  • crates/core/src/lib.rs (1 hunk)
  • crates/core/src/state.rs (2 hunks)
  • crates/redb3/src/state/mod.rs (3 hunks)
  • crates/redb3/src/state/utxoset.rs (5 hunks)
  • src/serve/grpc/query.rs (11 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/redb3/src/state/mod.rs
🧰 Additional context used
🧬 Code graph analysis (3)
crates/core/src/state.rs (2)
crates/redb3/src/state/mod.rs (1)
  • get_datum (362-367)
crates/redb3/src/state/utxoset.rs (1)
  • get_datum (545-548)
src/serve/grpc/query.rs (2)
crates/core/src/state.rs (1)
  • get_datum (412-412)
crates/redb3/src/state/utxoset.rs (1)
  • get_datum (545-548)
crates/redb3/src/state/utxoset.rs (1)
crates/redb3/src/state/mod.rs (1)
  • get_datum (362-367)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Test (ubuntu-latest)
  • GitHub Check: Test (macos-14)
  • GitHub Check: Test (macos-13)
  • GitHub Check: Test (windows-latest)
🔇 Additional comments (13)
crates/core/src/lib.rs (1)

213-214: LGTM! Well-designed datum tracking fields.

The new witness_datums_add and witness_datums_remove fields are appropriately typed for tracking datum lifecycle. Using HashMap for additions (to store both hash and value) and HashSet for removals (only hash needed) is the right design choice.

crates/core/src/state.rs (1)

411-412: LGTM! Consistent API extension for datum retrieval.

The new get_datum method follows the established pattern of other lookup methods in the StateStore trait, with appropriate return type for optional datum bytes.

crates/cardano/src/utxoset.rs (3)

75-81: Datum is only added if found in witness set - verify this is intentional.

When a produced output has a DatumOption::Hash, the code only adds the datum to witness_datums_add if it exists in the transaction's witness set. This means inline datums or outputs referencing datums from other transactions won't be tracked.

This appears intentional based on the PR description (tracking "witness-set datum values"), but worth confirming this covers all required use cases.


97-104: Consumed outputs mark datums for removal regardless of witness availability.

When consuming a UTXO with a datum hash, the datum is marked for removal in witness_datums_remove. This is correct as the ref-count logic in the storage layer (redb3) will handle the actual removal only when the count reaches zero.
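
The asymmetry between the two paths (add only when the witness set carries the datum, always request removal on consumption) can be sketched as follows. `DatumDelta` keeps only the two datum fields named in the PR, while `TxView` and `apply_datums` are hypothetical simplifications; the real code works on pallas transaction types.

```rust
use std::collections::{HashMap, HashSet};

type DatumHash = [u8; 32];

/// Trimmed-down stand-in for the two datum fields on `UtxoSetDelta`.
#[derive(Default)]
struct DatumDelta {
    witness_datums_add: HashMap<DatumHash, Vec<u8>>,
    witness_datums_remove: HashSet<DatumHash>,
}

/// Hypothetical, simplified view of one transaction.
struct TxView {
    /// Datum bytes carried in the witness set, keyed by hash.
    witness_datums: HashMap<DatumHash, Vec<u8>>,
    /// Datum hashes on outputs this transaction produces.
    produced: Vec<DatumHash>,
    /// Datum hashes on outputs this transaction consumes.
    consumed: Vec<DatumHash>,
}

fn apply_datums(tx: &TxView, delta: &mut DatumDelta) {
    for hash in &tx.produced {
        // Only track the datum if the witness set actually carries it.
        if let Some(bytes) = tx.witness_datums.get(hash) {
            delta.witness_datums_add.insert(*hash, bytes.clone());
        }
    }
    for hash in &tx.consumed {
        // Always request a release; the storage layer decides whether to delete.
        delta.witness_datums_remove.insert(*hash);
    }
}
```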


152-162: Undo logic correctly restores datums from witness set when recovering consumed outputs.

The logic properly attempts to restore datum values when recovering previously consumed outputs during a rollback. The asymmetry with the apply path (where removal happens unconditionally) is correct because the ref-counting in storage handles the actual deletion.

src/serve/grpc/query.rs (2)

204-209: Function signature changes look good.

The addition of state: &S::State parameter enables datum lookup from storage. The broadened return type Box<dyn std::error::Error> accommodates multiple error types from the new datum handling logic.
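
Why a boxed error type "accommodates multiple error types" is easiest to see with the `?` operator. In this sketch `LookupError`, `lookup`, and `lookup_and_parse` are hypothetical names invented for illustration; only the `Box<dyn std::error::Error>` return type mirrors the PR.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical storage error, standing in for `StateError`.
#[derive(Debug)]
struct LookupError;

impl fmt::Display for LookupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "datum lookup failed")
    }
}

impl Error for LookupError {}

/// Hypothetical lookup that fails for an empty key.
fn lookup(key: &str) -> Result<String, LookupError> {
    if key.is_empty() {
        Err(LookupError)
    } else {
        Ok("2a".to_string())
    }
}

/// Two unrelated error types flow through one boxed return type via `?`.
fn lookup_and_parse(key: &str) -> Result<u8, Box<dyn Error>> {
    // LookupError is converted into Box<dyn Error>.
    let hex = lookup(key)?;
    // ParseIntError is converted into Box<dyn Error>.
    let value = u8::from_str_radix(&hex, 16)?;
    Ok(value)
}
```

The cost of this convenience is that callers can no longer match on a concrete error type without downcasting.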


216-253: Datum enrichment logic is well-implemented with appropriate error handling.

The code properly:

  1. Only attempts datum fetch for DatumOption::Hash variants (not inline datums)
  2. Gracefully handles missing datums and decode errors with warnings
  3. Enriches the response with both hash and decoded payload when available

The warning logs provide good observability for debugging datum availability issues without failing the request.
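
The graceful-degradation pattern in the three points above can be sketched like this. Everything here is a simplified assumption: `DatumView` is a hypothetical response shape, a `HashMap` stands in for the state store, and `decode` uses UTF-8 in place of real PlutusData CBOR decoding.

```rust
use std::collections::HashMap;

type DatumHash = [u8; 32];

/// Hypothetical response shape: the hash is always kept, the payload is optional.
#[derive(Debug, PartialEq)]
struct DatumView {
    hash: DatumHash,
    payload: Option<String>,
}

/// Stand-in decoder (assumption: UTF-8 instead of real PlutusData CBOR).
fn decode(bytes: &[u8]) -> Result<String, std::str::Utf8Error> {
    std::str::from_utf8(bytes).map(|s| s.to_string())
}

/// Enrich a datum hash with its payload, never failing the request.
fn enrich(store: &HashMap<DatumHash, Vec<u8>>, hash: DatumHash) -> DatumView {
    let payload = match store.get(&hash) {
        Some(bytes) => match decode(bytes) {
            Ok(decoded) => Some(decoded),
            Err(err) => {
                // Decode failure: warn and fall back to hash only.
                eprintln!("warning: could not decode datum: {}", err);
                None
            }
        },
        None => {
            // Missing datum: warn and fall back to hash only.
            eprintln!("warning: datum not found in store");
            None
        }
    };
    DatumView { hash, payload }
}
```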

crates/redb3/src/state/utxoset.rs (6)

21-22: LGTM! Clear type aliases for datum storage.

The DatumKey and DatumValue type aliases provide clarity and consistency with the existing UtxosKey/UtxosValue pattern.


107-121: Ref-counting logic is correctly integrated into the apply flow.

The implementation properly:

  1. Increments refcount on datum add, only inserting to DatumsTable on first reference
  2. Decrements refcount on datum remove, only deleting from DatumsTable when count reaches zero

This correctly handles the case where multiple UTXOs may reference the same datum hash.


146-181: DatumsTable implementation is clean and follows existing patterns.

The table provides standard CRUD operations with appropriate use of redb transactions. The implementation mirrors the patterns used by other tables in this file.


183-205: DatumRefCountTable correctly implements reference counting.

The increment function properly:

  1. Retrieves current count (defaulting to 0)
  2. Increments and persists the new value
  3. Returns the new count for caller decision-making

530-540: Stats reporting correctly includes new datum tables.

The utxoset_stats method now includes datums and datum_refcount table statistics, providing visibility into datum storage.


544-548: Public datum retrieval API is correctly implemented.

The get_datum method properly delegates to DatumsTable::get and follows the pattern of other public accessors in StateStore.
