feat(validate): new validate command for direct client binary testing #2622

Draft
spencer-tb wants to merge 1 commit into ethereum:forks/amsterdam from spencer-tb:feat/consume-direct-engine
@spencer-tb
Contributor

@spencer-tb spencer-tb commented Apr 5, 2026

🗒️ Description

New validate (props to @raxhvl for the naming convention) CLI command for running EEST fixtures directly against client EVM binaries. Replaces consume direct with a cleaner UX — type is the subcommand, --client is required, no --bin or --type flags.

validate health                    # health check all clients
validate engine --client geth      # engine tests
validate state --client besu       # state tests
validate block --client nethermind # block tests

Features

  • 7 clients: geth, besu, nethermind, erigon, reth, ethrex, nimbus
  • Per-type Pydantic models: StateTestResult (stateRoot), BlockTestResult (lastBlockHash), EngineTestResult (lastPayloadStatus)
  • Exception matching: maps client error strings to EEST exception types, verifies correct exception for every invalid test (--no-exception-check to disable)
  • Cross-validation: lastBlockHash against fixture, lastPayloadStatus (VALID/INVALID) for engine tests
  • validate.toml config for client binary paths with per-type overrides (state-bin, block-bin, engine-bin)
  • Auto-tuning: bin-workers and xdist settings per client
  • Health checks: version detection + sanity fixture per client per type
  • Aliases: go-ethereum resolves to geth
  • Fully standalone: no dependency on consume plugin
  • Removes consume direct: replaced entirely by validate
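
As a rough illustration of how the per-type overrides and the alias lookup might combine, the sketch below resolves a binary path from an already-parsed config. Only the `state-bin`, `block-bin`, and `engine-bin` key names and the `go-ethereum` alias come from the description above; the one-table-per-client layout, the default `bin` key, the example paths, and the `resolve_binary` helper are assumptions made for illustration, not the PR's actual schema or API.

```python
# Hypothetical shape of validate.toml after parsing into a dict.
# Only the "*-bin" override key names are taken from the PR description.
CONFIG = {
    "geth": {"bin": "/usr/local/bin/evm", "engine-bin": "/opt/geth/enginetest"},
    "besu": {"bin": "/opt/besu/bin/evmtool"},
}

# The PR states that go-ethereum resolves to geth.
ALIASES = {"go-ethereum": "geth"}


def resolve_binary(config: dict, client: str, test_type: str) -> str:
    """Return the binary for a client, preferring the per-type override."""
    name = ALIASES.get(client, client)
    section = config[name]
    # Fall back to the assumed default "bin" key when no override is set.
    return section.get(f"{test_type}-bin", section["bin"])


print(resolve_binary(CONFIG, "go-ethereum", "engine"))
print(resolve_binary(CONFIG, "besu", "state"))
```

Keeping the override lookup in one helper means each subcommand (`state`, `block`, `engine`) can share the same resolution rule.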

Pydantic Result Models

All client adapters return structured results via a shared model hierarchy in cli_types.py:

FixtureTestResult          # base — name, pass, fork, error
├── StateTestResult        # + stateRoot
├── BlockTestResult        # + lastBlockHash
│   └── EngineTestResult   # + lastPayloadStatus

| Model | Extra Fields | Used By |
|---|---|---|
| FixtureTestResult | name, pass, fork, error | base class |
| StateTestResult | stateRoot | validate state |
| BlockTestResult | lastBlockHash | validate block |
| EngineTestResult | lastBlockHash, lastPayloadStatus | validate engine |
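
The hierarchy can be sketched roughly as follows. This is a stand-in, not the code in cli_types.py: it uses standard-library dataclasses instead of Pydantic so that it runs without dependencies. The snake_case attribute names, the `JSON_KEYS` table, the `parse_result` helper, and the sample payload are all hypothetical; only the class names and JSON field names come from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixtureTestResult:
    """Base result: fields shared by every test type."""
    name: str
    passed: bool  # JSON key is "pass", which is a reserved word in Python
    fork: str
    error: Optional[str] = None


@dataclass
class StateTestResult(FixtureTestResult):
    state_root: Optional[str] = None  # JSON key "stateRoot"


@dataclass
class BlockTestResult(FixtureTestResult):
    last_block_hash: Optional[str] = None  # JSON key "lastBlockHash"


@dataclass
class EngineTestResult(BlockTestResult):
    last_payload_status: Optional[str] = None  # JSON key "lastPayloadStatus"


# Hypothetical mapping from JSON keys to attribute names.
JSON_KEYS = {
    "pass": "passed",
    "stateRoot": "state_root",
    "lastBlockHash": "last_block_hash",
    "lastPayloadStatus": "last_payload_status",
}


def parse_result(cls, raw: dict):
    """Build a result object from one JSON object emitted by a client."""
    return cls(**{JSON_KEYS.get(key, key): value for key, value in raw.items()})


# Made-up sample payload for illustration.
raw = {"name": "example_test", "pass": True, "fork": "Prague",
       "lastBlockHash": "0xabc", "lastPayloadStatus": "VALID"}
result = parse_result(EngineTestResult, raw)
print(result.passed, result.last_block_hash, result.last_payload_status)
```

Because `EngineTestResult` inherits from `BlockTestResult`, an engine result carries `lastBlockHash` as well, matching the last row of the table.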

Each client binary outputs JSON matching these schemas. The shared validate_helpers.py module validates results against fixture expectations:

  • stateRoot compared to fixture's postState hash
  • lastBlockHash compared to fixture's lastblockhash
  • lastPayloadStatus checked as VALID (positive) or INVALID (negative test)
  • error matched through ExceptionMapper against fixture's expectException / validationError
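
A minimal sketch of the engine-test checks listed above, assuming the client result is a plain dict with the JSON field names from the models. The `check_engine_result` function, its parameters, and the case-insensitive hash comparison are illustrative assumptions and are not taken from validate_helpers.py.

```python
def check_engine_result(result: dict, fixture_hash: str, expect_invalid: bool) -> list:
    """Return mismatch descriptions; an empty list means the result agrees."""
    problems = []

    # Negative tests should end with an INVALID payload, positive ones with VALID.
    want_status = "INVALID" if expect_invalid else "VALID"
    got_status = result.get("lastPayloadStatus")
    if got_status != want_status:
        problems.append(f"lastPayloadStatus: expected {want_status}, got {got_status}")

    # Assumption: hashes are hex strings and are compared case-insensitively.
    got_hash = str(result.get("lastBlockHash", ""))
    if got_hash.lower() != fixture_hash.lower():
        problems.append(f"lastBlockHash: expected {fixture_hash}, got {got_hash}")

    return problems


ok = check_engine_result(
    {"lastBlockHash": "0xAB12", "lastPayloadStatus": "VALID"}, "0xab12", expect_invalid=False
)
bad = check_engine_result(
    {"lastBlockHash": "0xab12", "lastPayloadStatus": "VALID"}, "0xab12", expect_invalid=True
)
print(ok)
print(bad)
```

Collecting every mismatch, rather than stopping at the first, makes a failing test report all disagreeing fields at once.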

Results (v5.3.0 fixtures, Apple M-series)

| Client | Engine Time | Engine Pass % | Block Time | Block Pass % | State Time | State Pass % | Exception Check (state/block/engine) | Default Flags |
|---|---|---|---|---|---|---|---|---|
| geth | 58s | 99.98% | 64s | 99.96% | pending | pending | ✓ / ✓ / ✓ | --bin-workers 8 (auto) |
| ethrex | 66s | 100% | 66s | 100% | pending | pending | ✗ / ✗ / ✗ | --bin-workers 8 (auto) |
| nimbus | 71s | 100% | 67s | 100% | pending | pending | ✗ / ✗ / ✗ | --fast mode, sequential |
| reth | 111s | 99.10% | broken | | pending | pending | ✗ / — / ✗ | -n 2 (auto) |
| besu | 174s | 99.99% | 65s | 100% | pending | pending | ✓ / ✗ / ✓ | --bin-workers 8 (auto) |
| nethermind | 176s | 99.98% | 107s | 99.97% | pending | pending | ✓ / ✓ / ✓ | --bin-workers 4 (auto) |
| erigon | 26m | 100% | pending | pending | pending | pending | ✗ / ✓ / ✗ | --bin-workers 8 (auto) |

Exception check = the client reports the validation error even when a negative test passes, which enables EELS-side exception verification.
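
To make the exception check concrete, here is a small sketch of mapping a client error string to an exception name and comparing it with what a fixture expects. It does not reproduce the real ExceptionMapper: the substring table, the client messages, and the `map_error` and `exception_matches` helpers are hypothetical, and the exception names are used for illustration only.

```python
from typing import Optional

# Hypothetical substring -> exception-name table for one client.
CLIENT_ERROR_MAP = {
    "insufficient funds": "TransactionException.INSUFFICIENT_ACCOUNT_FUNDS",
    "intrinsic gas too low": "TransactionException.INTRINSIC_GAS_TOO_LOW",
}


def map_error(message: str) -> Optional[str]:
    """Return the mapped exception name, or None if nothing matches."""
    lowered = message.lower()
    for needle, exception_name in CLIENT_ERROR_MAP.items():
        if needle in lowered:
            return exception_name
    return None


def exception_matches(message: Optional[str], expected: str) -> bool:
    """True if the client reported an error that maps to the expected exception."""
    if message is None:
        # The client gave no error text, so the exception cannot be verified.
        return False
    return map_error(message) == expected


print(exception_matches("tx rejected: Insufficient funds for gas * price + value",
                        "TransactionException.INSUFFICIENT_ACCOUNT_FUNDS"))
print(exception_matches(None, "TransactionException.INTRINSIC_GAS_TOO_LOW"))
```

A client marked ✗ in the table would correspond to the `None` case here: the negative test can still pass, but the specific exception cannot be checked.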

Client PRs (adding statetest/blocktest/enginetest runners)

🔗 Related Issues or PRs

Fixes #2319

✅ Checklist

  • All: Ran fast static checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    just static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).

@spencer-tb spencer-tb added C-feat Category: an improvement or new feature A-test-consume Area: execution_testing.cli.pytest_commands.plugins.consume A-test-client-clis Area: execution_testing.client_clis labels Apr 5, 2026
@spencer-tb spencer-tb changed the title feat(consume): consume direct with per-type result models and exception matching feat(test-consume): direct with per type result models and exception matching Apr 5, 2026
@spencer-tb spencer-tb force-pushed the feat/consume-direct-engine branch from d67f961 to 8e48ce6 Compare April 8, 2026 17:02
@spencer-tb spencer-tb changed the title feat(test-consume): direct with per type result models and exception matching feat(validate): new validate command for direct client binary testing Apr 8, 2026
@spencer-tb spencer-tb force-pushed the feat/consume-direct-engine branch 7 times, most recently from 2c76c12 to 2658023 Compare April 8, 2026 19:40
New `validate` CLI command for running EEST fixtures directly against
client EVM binaries, replacing Hive for execution correctness testing.

Usage:
  validate health                    # health check all clients
  validate engine --client geth      # engine tests
  validate state --client besu       # state tests
  validate block --client nethermind # block tests

Features:
- 7 clients: geth, besu, nethermind, erigon, reth, ethrex, nimbus
- Per-type Pydantic result models: StateTestResult, BlockTestResult,
  EngineTestResult with type-specific fields
- Exception matching: maps client error strings to EEST exception
  types via ExceptionMapper, verifies correct exception for every
  invalid test (--no-exception-check to disable)
- Cross-validation: lastBlockHash against fixture, lastPayloadStatus
  (VALID/INVALID) for engine tests
- validate.toml config for client binary paths with per-type overrides
  (state-bin, block-bin, engine-bin)
- Auto bin-workers and xdist tuning per client
- Bundled Frontier sanity fixtures for health checks
- Shared validate_helpers.py for validation logic

Client binary PRs:
- geth: ethereum/go-ethereum#34650
- erigon: erigontech/erigon#20315
- besu: besu-eth/besu#10184
- nethermind: NethermindEth/nethermind#11035
- reth: paradigmxyz/reth#23361
- ethrex: lambdaclass/ethrex#6445
- nimbus: status-im/nimbus-eth1#4101
- revm: bluealloy/revm#3544

Tracking issue: ethereum#2319
Member

@danceratopz danceratopz left a comment

Very excited about this!!

I don't mind if we make this a mono-PR, but as we discussed, it'll be easier to catch things, and perhaps speed up getting this in, if we split it up into more bite-size PRs.

The rename should also be addressed in the docs, which are coming back to life :)

About the rename: I think renaming away from consume direct is fair to get a shorter command (without an extra subcommand redirection) to run each test type (state, block, engine), because consume direct state / consume direct --type=state is clunky.

But, after thinking on it, I'm not that fond of validate. It was suggested by Rahul in a different context (that of validating payloads), but I'm not sure it generalizes to the different test types well. Imo, it feels:

  • Too generic, especially if we use this name in a github action (we should). What does it do?
  • Respectfully, it's too specific to validating payloads, which this command isn't always doing.

I'd propose either:

  1. test-el
  2. el-test

Wdyt?

Then we'd have, e.g.,

uv run test-el state

Tip: try with Spanish accent.

And the action could be (in ethsteel org) ethsteel/eels-test-el-action?

@shrirajpawar4

gm @danceratopz @spencer-tb

I opened a small stacked follow-up for #2631 here: #2776

Reason for stacking it on this PR: #2631 becomes most useful with the direct client-testing flow introduced here. Once clients run validate state/block/engine in their own CI, they need to own/update their exception string mappings without opening EEST PRs for every client error-message change.

The follow-up keeps the existing built-in mappers as defaults and only adds external YAML mappings on top. I also kept Hive consume wired to the same loader so both exception-checking paths stay consistent while this lands.

Happy to retarget/fold it into this branch if that is preferable.

Labels

A-test-client-clis Area: execution_testing.client_clis A-test-consume Area: execution_testing.cli.pytest_commands.plugins.consume C-feat Category: an improvement or new feature

3 participants