🚧 wip: System derived negative tests for BAL by raxhvl · Pull Request #2755 · ethereum/execution-specs

raxhvl · 2026-04-24T16:34:27Z

🗒️ Description

Negative tests ensure that clients prevent abuse by rejecting a corrupted BAL. Existing negative tests are handwritten and provide limited coverage.

This PR helps scale negative test coverage organically by systematically corrupting a valid BAL, which removes the need to hand-pick "interesting" corruptions.

How a valid BAL is corrupted

A valid BAL is corrupted to ensure clients verify Correctness and Completeness:

| Property | Invariant | Corruption |
| --- | --- | --- |
| Correctness | Each entry has the right value. | XOR-flip a value. |
| Completeness | No expected entry is missing: every access and account is present. | Drop an entry or a whole account. |

That is, we either tamper with a value or omit it from the BAL. Additionally, we omit each whole account.
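The three corruptions can be sketched on a toy model. This is only an illustration: the `Bal` alias, the nested-dict layout, and the function names are hypothetical and are not the BAL types or the pure function from this PR.

```python
from copy import deepcopy

# Hypothetical toy model: {account: {field: value}}. The real BAL types are
# richer; this only illustrates the three corruption kinds.
Bal = dict[str, dict[str, int]]


def flip_value(bal: Bal, account: str, field: str, mask: int = 1) -> Bal:
    """Correctness corruption: XOR the recorded value with a non-zero mask."""
    if mask == 0:
        raise ValueError("mask must be non-zero, otherwise nothing changes")
    corrupted = deepcopy(bal)
    corrupted[account][field] ^= mask
    return corrupted


def drop_entry(bal: Bal, account: str, field: str) -> Bal:
    """Completeness corruption: remove one change from one account."""
    corrupted = deepcopy(bal)
    del corrupted[account][field]
    return corrupted


def drop_account(bal: Bal, account: str) -> Bal:
    """Completeness corruption: remove an account and all of its changes."""
    corrupted = deepcopy(bal)
    del corrupted[account]
    return corrupted


valid = {"alice": {"nonce": 1, "balance": 99}, "bob": {"balance": 1}}
wrong_nonce = flip_value(valid, "alice", "nonce")
missing_balance = drop_entry(valid, "bob", "balance")
missing_bob = drop_account(valid, "bob")
```

Each helper returns a corrupted copy and leaves the valid BAL untouched, so one valid input can be reused for every derived negative case.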

The total number of negative tests $N$ is:

$$N = 2C + A$$

Where:

| Symbol | Meaning |
| --- | --- |
| $A$ | number of accounts in the BAL |
| $C$ | total changes (accesses) across all accounts |

Example: Alice transfers 1 ETH to Bob. The BAL has two accounts and three changes (Alice's nonce and balance; Bob's balance):

$$A = 2, \quad C = 3$$

$$N = 2(3) + 2 = 8$$

# wrong (3)
<alice>__wrong__nonce__1
<alice>__wrong__balance__1
<bob>__wrong__balance__1
# missing per change (3)
<alice>__missing__nonce__1
<alice>__missing__balance__1
<bob>__missing__balance__1
# missing per account (2)
<alice>__missing
<bob>__missing
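As a quick check of the count, the listing above can be reproduced with a short enumeration. This is a sketch under assumptions: the function name and the input shape are hypothetical, and the trailing `1` is copied from the listing without assuming what it denotes.

```python
def negative_test_ids(bal: dict[str, list[str]], suffix: str = "1") -> list[str]:
    """Enumerate one 'wrong' and one 'missing' ID per change, plus one per account."""
    ids = []
    for kind in ("wrong", "missing"):
        for account, changes in bal.items():
            ids += [f"<{account}>__{kind}__{change}__{suffix}" for change in changes]
    # Whole-account omissions carry no change name.
    ids += [f"<{account}>__missing" for account in bal]
    return ids


# Alice transfers 1 ETH to Bob: two accounts, three changes.
transfer = {"alice": ["nonce", "balance"], "bob": ["balance"]}
ids = negative_test_ids(transfer)

A = len(transfer)
C = sum(len(changes) for changes in transfer.values())
assert len(ids) == 2 * C + A == 8
```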

🔗 Related Issues or PRs

closes #2705

✅ Checklist

All: Ran fast static checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
just static
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with `type(scope):`.
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).

codecov · 2026-05-06T16:10:04Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.62%. Comparing base (efb5636) to head (5516209).
⚠️ Report is 19 commits behind head on forks/amsterdam.

Additional details and impacted files

@@                 Coverage Diff                 @@
##           forks/amsterdam    #2755      +/-   ##
===================================================
+ Coverage            88.17%   88.62%   +0.45%     
===================================================
  Files                  577      577              
  Lines                35659    35659              
  Branches              3490     3490              
===================================================
+ Hits                 31442    31604     +162     
+ Misses                3654     3492     -162     
  Partials               563      563

| Flag | Coverage Δ |
| --- | --- |
| unittests | `88.62% <ø> (+0.45%)` ⬆️ |


raxhvl · 2026-05-06T16:33:27Z

Hey @fselmo, I see two kinds of invalid tests:

(1) dynamic invalid tests that depend on an input BAL,

(2) static invalid tests that check static properties and can be covered by standalone tests, such as checking for duplicate entries or checking ordering by swapping entries.
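For illustration only, the two static corruptions mentioned in (2) could look like the following on a plain list of entries. The function names and the list-of-strings representation are hypothetical and are not part of this PR.

```python
def duplicate_entry(entries: list, index: int) -> list:
    """Return a copy in which the entry at `index` appears twice in a row."""
    if not 0 <= index < len(entries):
        raise IndexError("index out of range")
    return entries[: index + 1] + entries[index:]


def swap_adjacent(entries: list, index: int) -> list:
    """Return a copy with the entries at `index` and `index + 1` swapped."""
    if not 0 <= index < len(entries) - 1:
        raise IndexError("index must have a right-hand neighbour")
    swapped = list(entries)
    swapped[index], swapped[index + 1] = swapped[index + 1], swapped[index]
    return swapped


ordered = ["0x01", "0x02", "0x03"]
with_duplicate = duplicate_entry(ordered, 1)
out_of_order = swap_adjacent(ordered, 0)
```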

The pure function I wrote generates type (1) invalid tests. I now need to wire this into the fill logic.

I'm thinking the framework expands a valid test into 1 valid + N invalid fixtures. Perhaps via a marker. What do you think?
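A rough sketch of that expansion, independent of the actual fill plugin: `Fixture`, `expand`, and the corruption mapping are hypothetical names used only to show the 1 valid + N invalid shape.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass(frozen=True)
class Fixture:
    """Hypothetical fixture record: an ID, a BAL, and the expected verdict."""

    test_id: str
    bal: dict
    expect_valid: bool


def expand(
    test_id: str,
    bal: dict,
    corruptions: dict[str, Callable[[dict], dict]],
) -> Iterator[Fixture]:
    """Yield the valid fixture first, then one invalid fixture per corruption."""
    yield Fixture(test_id, bal, expect_valid=True)
    for name, corrupt in corruptions.items():
        yield Fixture(f"{test_id}[{name}]", corrupt(bal), expect_valid=False)


valid_bal = {"alice": {"nonce": 1}, "bob": {"balance": 1}}
corruptions = {
    # Drop Bob's account without mutating the valid BAL.
    "<bob>__missing": lambda b: {k: v for k, v in b.items() if k != "bob"},
}
fixtures = list(expand("test_transfer", valid_bal, corruptions))
```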

fselmo · 2026-05-08T22:49:27Z

> I'm thinking the framework expands a valid test to 1 valid + N invalid fixtures. Perhaps a marker. What do you think?

I love this. We will be brainstorming some ways that we can fuzz test using EELS so I think this might play really nicely with that paradigm as well. One thing I can think of is either a marker or a flag when you fill. Let's say I want to test, locally at least, some tests with these corruptions - I could run with this flag and execute against hive or even the engine test @spencer-tb has been working on which would actually consume these very rapidly.

In this sense I think it'd also be valuable for filling and consuming. I just worry a bit that we have ~50k tests for Amsterdam or so (maybe more, I forget, but something like this), and if we corrupt each of them N ways on releases we'd have an insane number of fixtures. This is of course a good problem to have, and we can decide when and how to control the amount of fuzzing / corruption we do here... but it's something to think about given the sheer volume this might produce.

On the other hand, if we fill these particularly for consumption via engine test and also somehow strap this up to some future fuzzer for execution on a live network, this is a really nice feature to have.

I'd love to hear your thoughts on the above and on any volume / control knob ideas.
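One possible control knob, sketched here purely as an illustration, is a per-test cap that picks a stable subset of corruption IDs. `select_corruptions`, `limit`, and `seed` are hypothetical names and do not correspond to existing fill flags.

```python
import hashlib


def select_corruptions(ids: list[str], limit: int, seed: str = "") -> list[str]:
    """Keep at most `limit` IDs, chosen by a stable hash so every run agrees.

    Assumes the IDs are unique. The original order of the kept IDs is preserved.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    ranked = sorted(
        ids, key=lambda i: hashlib.sha256((seed + i).encode()).hexdigest()
    )
    chosen = set(ranked[:limit])
    return [i for i in ids if i in chosen]


all_ids = [
    "<alice>__wrong__nonce__1",
    "<alice>__wrong__balance__1",
    "<bob>__wrong__balance__1",
    "<alice>__missing",
    "<bob>__missing",
]
capped = select_corruptions(all_ids, limit=2, seed="release")
```

Hashing rather than random sampling keeps the selection reproducible across machines, and changing the seed rotates which corruptions are filled.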

raxhvl · 2026-05-11T08:23:33Z

These test cases are intentional, yet generated. So it's somewhere between the usual tests and fuzzing.

I worry about the bloat this creates. But we need to make space for these for EIPs that cover a large protocol surface area. Maybe an adjacent "guardrail" workflow that's meant for important milestones (before a devnet or mainnet launch, etc.). Fuzzing can be part of it too.

Let me think through to find a home for this bloat.

raxhvl self-assigned this Apr 24, 2026

raxhvl added the C-test (Category: test) and A-test-fill (Area: execution_testing.cli.pytest_commands.plugins.filler) labels Apr 24, 2026

🚧 wip: BAL corruptions

3a75ec3

raxhvl force-pushed the feat/bal-negative-test-cases branch from cd76fb8 to 3a75ec3 April 24, 2026 16:40

✨ feat: Corruption impl

206dfde

🧪 test: Complex BAL interaction

5516209

raxhvl wants to merge 3 commits into ethereum:forks/amsterdam from raxhvl:feat/bal-negative-test-cases
