Skip to content

refactor(l1): key TRANSACTION_LOCATIONS by tx hash#6718

Open
ElFantasma wants to merge 3 commits into
mainfrom
refactor/6708-tx-locations-by-hash
Open

refactor(l1): key TRANSACTION_LOCATIONS by tx hash#6718
ElFantasma wants to merge 3 commits into
mainfrom
refactor/6708-tx-locations-by-hash

Conversation

@ElFantasma
Copy link
Copy Markdown
Contributor

@ElFantasma ElFantasma commented May 22, 2026

Motivation

Follow-up to #6688. The minimal break-on-prefix-mismatch fix in #6702 made eth_getTransactionByHash sub-millisecond again, but the underlying schema (composite key tx_hash || block_hash over a table that holds a one-to-many relationship) is the reason a prefix iterator was needed in the first place. The original author left a // FIXME: Use dupsort table on the write helper. This PR follows through on that.

Reads now do a single get_cf on a fixed 32-byte key — O(1), no iteration, no dependency on bounded-iterator hygiene. Future call sites can't accidentally reintroduce the unbounded-scan footgun from #6688 on this table.

Description

Schema change for the TRANSACTION_LOCATIONS column family (v2 → v3, on top of the receipts migration from #6598):

Before (v2) After (v3)
Key tx_hash ‖ block_hash (64 bytes) tx_hash (32 bytes)
Value (BlockNumber, BlockHash, Index) Vec<(BlockNumber, BlockHash, Index)>
Read path prefix_iterator + per-entry filter single get + decode

The value is a Vec because, in the rare reorg case, the same tx hash may appear in multiple blocks. The canonical-chain filter (the second half of get_transaction_location) is unchanged.

Write paths. The four write sites (add_blocks, store_block_updates, add_transaction_location, add_transaction_locations) now go through a small stage_tx_location helper that does read-modify-write into a HashMap<H256, Vec<location>>. Within a batch, re-touching the same tx hash replaces any prior entry with the same block_hash — preserving the dedupe semantics that the old composite key gave us for free. The codebase already implicitly serializes writers to this table (the blockchain pipeline is sequential), and the new RMW preserves that assumption; it doesn't introduce a new concurrency requirement.

Migration. A new migrate_2_to_3 is registered as the second entry in the migration framework's MIGRATIONS array (the receipts migration from #6598 stays as entry 0, v1→v2). It streams the v2 table in lex order (composite keys with the same 32-byte prefix are adjacent), groups them by tx hash, and flushes each group as an atomic put (32-byte key) + delete (64-byte keys) batch. Commits are chunked (50k groups per commit) to bound memory. Per-tx atomicity means crash-resume sees either a fully-migrated tx hash (32-byte key present, 64-byte keys absent) or a still-pending one (32-byte key absent, 64-byte keys present) — never a mix — so resuming after a crash just re-runs cleanly.

TRANSACTION_LOCATIONS is ~150–200 GB on a fully-synced mainnet node; first restart after upgrade will be unavailable for the duration of the migration. This should be flagged in release notes.

Secondary benefits.

  • 32-byte fixed keys work cleanly with RocksDB's standard block-based bloom filter — negative lookups (tx not present) get O(1) bloom-filter rejection. Adding the filter is out of scope for this PR but is now an easy follow-up.
  • ~24% smaller per-entry storage in the common single-block case.

Tests. Four new migration unit tests (empty table, single-entry-per-hash, multi-block-per-hash for reorg, idempotent partial-restart). All existing storage and RPC tests pass.

Closes #6708

Checklist

  • Updated STORE_SCHEMA_VERSION (crates/storage/lib.rs) if the PR includes breaking changes to the Store requiring a re-sync.

@ElFantasma ElFantasma requested a review from a team as a code owner May 22, 2026 13:25
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 22, 2026

Greptile Summary

This PR refactors the TRANSACTION_LOCATIONS column family from a composite 64-byte key schema (tx_hash || block_hash) to a 32-byte tx_hash key with a Vec<(BlockNumber, BlockHash, Index)> value, turning eth_getTransactionByHash lookups from a prefix-iterator scan into a single get call.

  • Read path simplified to O(1) get_cf — no iterator, no bounded-scan footgun.
  • Write paths (add_blocks, store_block_updates, add_transaction_location, add_transaction_locations) now go through a stage_tx_location read-modify-write helper that merges locations per tx_hash within each batch.
  • Migration (migrate_1_to_2) streams the old table in lex order, groups adjacent composite keys by prefix, and rewrites each group atomically with chunked 50k-group commits for memory-bounded, crash-resumable operation; four unit tests cover empty, single-entry, multi-block, and idempotent-partial-state scenarios.

Confidence Score: 4/5

Safe to merge with awareness that all four write paths now depend on callers being sequentialized — the new RMW helper is correct under that assumption but will silently lose location entries if two concurrent writers share a tx_hash.

The schema change, migration, and O(1) read path are well-implemented. The migration's crash-safety and idempotency are carefully designed and tested. The main concern is that the new read-modify-write pattern on stage_tx_location promotes the sequential-pipeline assumption from a performance hint to a correctness invariant with no code-level guard — the old composite-key schema tolerated concurrent writes for different blocks of the same tx naturally.

crates/storage/store.rs — specifically the stage_tx_location helper and its three call sites in add_blocks, store_block_updates, and add_transaction_locations, where the read snapshot and write transaction are separate operations.

Important Files Changed

Filename Overview
crates/storage/store.rs Four write paths updated to use the new RMW helper stage_tx_location; read path simplified to a single get. The RMW pattern now requires sequential callers for correctness — previously the composite-key schema tolerated concurrent writers.
crates/storage/migrations.rs New migrate_1_to_2 streams the v1 table, groups consecutive 64-byte composite keys by tx_hash prefix (lex adjacency), and rewrites each group atomically as a 32-byte key. Chunked 50k-group commits bound memory; idempotency is handled by skipping already-migrated 32-byte keys. Logic and tests are sound.
crates/storage/api/tables.rs Documentation updated to reflect v2 schema: 32-byte key, Vec-valued entry, canonical-chain filter note.
crates/storage/lib.rs STORE_SCHEMA_VERSION bumped from 1 to 2, correctly paired with the new migration entry.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph WriteV2["Write Path (v2)"]
        W1["begin_read() → snapshot"] --> W2["stage_tx_location per tx\n(lazy read-modify-write\ninto HashMap)"]
        W2 --> W3["begin_write()"]
        W3 --> W4["put(tx_hash 32B → Vec locations)\nfor each unique tx_hash"]
        W4 --> W5["commit()"]
    end

    subgraph ReadV2["Read Path (v2)"]
        R1["get(tx_hash 32B)"] --> R2{"found?"}
        R2 -- No --> R3["return None"]
        R2 -- Yes --> R4["decode Vec&lt;location&gt;"]
        R4 --> R5["filter by canonical chain"]
        R5 --> R6["return match"]
    end

    subgraph MigrateV1toV2["Migration: v1 → v2"]
        M1["prefix_iterator(empty prefix)\nover full CF"] --> M2{"key.len()?"}
        M2 -- 32B --> M3["skip (already migrated)"]
        M2 -- 64B --> M4["group by key[0..32] prefix\n(lex-adjacent)"]
        M4 --> M5["flush group:\nput 32B key + delete 64B keys\n(atomic per group)"]
        M5 --> M6{"groups_in_batch\n≥ 50k?"}
        M6 -- Yes --> M7["commit chunk\nreset batch"]
        M6 -- No --> M4
        M7 --> M4
        M5 --> M8["final commit"]
    end
Loading

Comments Outside Diff (1)

  1. crates/storage/store.rs, line 288-310 (link)

    P2 RMW is now a correctness requirement, not just an optimization

    stage_tx_location reads a snapshot via begin_read(), stages locations in-memory, then commits via a separate begin_write(). If two concurrent callers — say, add_blocks and add_transaction_locations — race on the same tx_hash, both see the same pre-write snapshot, compute independent location lists, and whichever commits last silently overwrites the other's entry. Under the old composite-key schema, each (tx_hash, block_hash) pair was a distinct key, so concurrent writers for different blocks were always safe. The new schema makes concurrent writes for the same tx_hash (even to different blocks) a lost-update hazard. The PR documents the sequential-pipeline assumption, but it is now a correctness prerequisite with no code-level enforcement. A comment on stage_tx_location (and its call sites in add_blocks, store_block_updates, and add_transaction_locations) explaining this invariant and why concurrent calls would produce silent data loss would significantly reduce the risk of a future regression.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: crates/storage/store.rs
    Line: 288-310
    
    Comment:
    **RMW is now a correctness requirement, not just an optimization**
    
    `stage_tx_location` reads a snapshot via `begin_read()`, stages locations in-memory, then commits via a separate `begin_write()`. If two concurrent callers — say, `add_blocks` and `add_transaction_locations` — race on the same `tx_hash`, both see the same pre-write snapshot, compute independent location lists, and whichever commits last silently overwrites the other's entry. Under the old composite-key schema, each `(tx_hash, block_hash)` pair was a distinct key, so concurrent writers for different blocks were always safe. The new schema makes concurrent writes for the same `tx_hash` (even to different blocks) a lost-update hazard. The PR documents the sequential-pipeline assumption, but it is now a correctness prerequisite with no code-level enforcement. A comment on `stage_tx_location` (and its call sites in `add_blocks`, `store_block_updates`, and `add_transaction_locations`) explaining this invariant and why concurrent calls would produce silent data loss would significantly reduce the risk of a future regression.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
crates/storage/store.rs:288-310
**RMW is now a correctness requirement, not just an optimization**

`stage_tx_location` reads a snapshot via `begin_read()`, stages locations in-memory, then commits via a separate `begin_write()`. If two concurrent callers — say, `add_blocks` and `add_transaction_locations` — race on the same `tx_hash`, both see the same pre-write snapshot, compute independent location lists, and whichever commits last silently overwrites the other's entry. Under the old composite-key schema, each `(tx_hash, block_hash)` pair was a distinct key, so concurrent writers for different blocks were always safe. The new schema makes concurrent writes for the same `tx_hash` (even to different blocks) a lost-update hazard. The PR documents the sequential-pipeline assumption, but it is now a correctness prerequisite with no code-level enforcement. A comment on `stage_tx_location` (and its call sites in `add_blocks`, `store_block_updates`, and `add_transaction_locations`) explaining this invariant and why concurrent calls would produce silent data loss would significantly reduce the risk of a future regression.

### Issue 2 of 2
crates/storage/migrations.rs:107-109
**32-byte skip assumes no pre-existing non-v2 short keys**

Any 32-byte key that ends up in `TRANSACTION_LOCATIONS` via a path other than the v2 writer (e.g. a future bug, a manual repair, or a different future schema version) would be silently skipped, leaving it un-migrated and invisible to the migration's metrics. The current code is correct for its stated invariants (the only 32-byte keys present are valid v2 entries from a prior partial run), but the `continue` branch has no assertion or trace-log to help diagnose if an unexpected short key is encountered. Adding even a `tracing::warn!` for unexpected key lengths would make anomalies visible without changing behavior.

Reviews (1): Last reviewed commit: "refactor(l1,storage): key TRANSACTION_LO..." | Re-trigger Greptile

Comment on lines +107 to +109
if key.len() == 32 {
continue;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 32-byte skip assumes no pre-existing non-v2 short keys

Any 32-byte key that ends up in TRANSACTION_LOCATIONS via a path other than the v2 writer (e.g. a future bug, a manual repair, or a different future schema version) would be silently skipped, leaving it un-migrated and invisible to the migration's metrics. The current code is correct for its stated invariants (the only 32-byte keys present are valid v2 entries from a prior partial run), but the continue branch has no assertion or trace-log to help diagnose if an unexpected short key is encountered. Adding even a tracing::warn! for unexpected key lengths would make anomalies visible without changing behavior.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/storage/migrations.rs
Line: 107-109

Comment:
**32-byte skip assumes no pre-existing non-v2 short keys**

Any 32-byte key that ends up in `TRANSACTION_LOCATIONS` via a path other than the v2 writer (e.g. a future bug, a manual repair, or a different future schema version) would be silently skipped, leaving it un-migrated and invisible to the migration's metrics. The current code is correct for its stated invariants (the only 32-byte keys present are valid v2 entries from a prior partial run), but the `continue` branch has no assertion or trace-log to help diagnose if an unexpected short key is encountered. Adding even a `tracing::warn!` for unexpected key lengths would make anomalies visible without changing behavior.

How can I resolve this? If you propose a fix, please make it concise.

@ElFantasma ElFantasma force-pushed the refactor/6708-tx-locations-by-hash branch from 7393512 to 688125f Compare May 22, 2026 17:42
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 22, 2026

⚠️ Known Issues — intentionally skipped tests

Source: docs/known_issues.md

Known Issues

Tests intentionally excluded from CI. Source of truth for the Known
Issues
section the L1 workflow appends to each ef-tests job summary
and posts as a sticky PR comment.

EF Tests — Stateless coverage narrowed to EIP-8025 optional-proofs

make -C tooling/ef_tests/blockchain test calls test-stateless-zkevm
instead of test-stateless. The zkevm@v0.3.3 fixtures are filled against
bal@v5.6.1, out of sync with current bal spec; the broad target trips ~549
fixtures. Re-broaden once the zkevm bundle is regenerated.

Why and resolution path

PR #6527 broadened
test-stateless to extract the entire for_amsterdam/ tree from the
zkevm bundle and run all of it under --features stateless; combined with
this branch's bal-devnet-7 semantics that scope produces ~549
GasUsedMismatch / ReceiptsRootMismatch /
BlockAccessListHashMismatch failures.

test-stateless-zkevm filters cargo to the eip8025_optional_proofs
suite, which still validates the stateless harness without the bal-version
mismatch.

Re-broaden by switching test: back to test-stateless in
tooling/ef_tests/blockchain/Makefile once the zkevm bundle is regenerated
against the current bal spec.

@github-actions github-actions Bot added the L1 Ethereum client label May 22, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 22, 2026

Lines of code report

Total lines added: 235
Total lines removed: 0
Total lines changed: 235

Detailed view
+-------------------------------------+-------+------+
| File                                | Lines | Diff |
+-------------------------------------+-------+------+
| ethrex/crates/storage/migrations.rs | 356   | +188 |
+-------------------------------------+-------+------+
| ethrex/crates/storage/store.rs      | 2770  | +47  |
+-------------------------------------+-------+------+

@ElFantasma ElFantasma changed the title refactor(l1,storage): key TRANSACTION_LOCATIONS by tx hash refactor(l1): key TRANSACTION_LOCATIONS by tx hash May 22, 2026
@ElFantasma
Copy link
Copy Markdown
Contributor Author

Re Greptile's "RMW is now a correctness requirement" P2:

Correct call — under the old composite-key schema, two concurrent writers on the same tx_hash (e.g. competing reorg blocks) wrote to different keys (tx_hash||hash_a vs tx_hash||hash_b) so both survived. The new schema can't do that; the read snapshot and write batch are separate operations.

The serialized-writers assumption holds in ethrex today because the blockchain pipeline (fork-choice + block import) runs one block at a time, and the four affected write paths (add_blocks, store_block_updates, add_transaction_location[s]) all live inside that pipeline. I've added an explicit "Concurrency invariant" section to the stage_tx_location doc comment in commit f361dc064 so this isn't a surprise to future contributors and so the right place to add a lock is obvious if the assumption ever breaks.

A defensive lock could be added now, but it'd be a no-op under current call patterns and would cost a cross-thread sync on the hottest tx-import path. Leaving the doc-comment-only guard for now — happy to revisit if you'd prefer the explicit lock.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

L1 Ethereum client

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

refactor(l1,storage): simplify TRANSACTION_LOCATIONS schema to key by tx_hash only (follow-up to #6688)

1 participant