🚧 wip: System derived negative tests for BAL #2755

Draft
raxhvl wants to merge 3 commits into ethereum:forks/amsterdam from raxhvl:feat/bal-negative-test-cases

Conversation

@raxhvl
Member

@raxhvl raxhvl commented Apr 24, 2026

🗒️ Description

Negative tests ensure that clients prevent abuse by rejecting a corrupted BAL. Existing negative tests are handwritten and provide limited coverage.

This PR helps scale negative test coverage organically by systematically corrupting a valid BAL, thus eliminating the need to hand-pick "interesting" corruptions.

How a valid BAL is corrupted

A valid BAL is corrupted to ensure clients verify Correctness and Completeness:

| Property | Invariant | Corruption |
| --- | --- | --- |
| Correctness | Each entry has the right value. | XOR-flip a value. |
| Completeness | No expected entry is missing: every access and account is present. | Drop an entry or whole account. |

That is, we either tamper with a value or omit it from the BAL. Additionally, we omit each whole account.
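As a rough illustration of the three corruptions, here is a minimal sketch over a simplified BAL. The nested-dict shape (`account -> field -> {index: value}`), the function names, and the default XOR mask are all assumptions made for this example; they are not the framework's actual types or the function added in this PR.

```python
import copy

# Hypothetical, simplified BAL: account -> field -> {index: value}.
# The real BAL structure is richer; this shape is only for illustration.
Bal = dict[str, dict[str, dict[int, int]]]


def flip_value(bal: Bal, account: str, field: str, index: int, mask: int = 1) -> Bal:
    """Correctness corruption: XOR the recorded value with a non-zero mask."""
    if mask == 0:
        raise ValueError("mask must be non-zero, otherwise the BAL stays valid")
    corrupted = copy.deepcopy(bal)  # never mutate the valid BAL
    corrupted[account][field][index] ^= mask
    return corrupted


def drop_entry(bal: Bal, account: str, field: str, index: int) -> Bal:
    """Completeness corruption: omit a single change from an account."""
    corrupted = copy.deepcopy(bal)
    del corrupted[account][field][index]
    return corrupted


def drop_account(bal: Bal, account: str) -> Bal:
    """Completeness corruption: omit a whole account."""
    corrupted = copy.deepcopy(bal)
    del corrupted[account]
    return corrupted


# Placeholder values, not taken from a real block.
valid = {"alice": {"nonce": {1: 1}, "balance": {1: 99}}, "bob": {"balance": {1: 1}}}
wrong_nonce = flip_value(valid, "alice", "nonce", 1)
without_bob = drop_account(valid, "bob")
```

Because XOR with a non-zero mask always changes the value, a flipped entry can never accidentally equal the original, which is what makes it a convenient way to produce a guaranteed-wrong value.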

The total number of negative tests $N$ is:

$$N = 2C + A$$

Where:

| Symbol | Meaning |
| --- | --- |
| $A$ | number of accounts in the BAL |
| $C$ | total changes (accesses) across all accounts |

Example: Alice transfers 1 ETH to Bob. The BAL has two accounts and three changes (Alice's nonce and balance; Bob's balance):

$$A = 2, \quad C = 3$$

$$N = 2(3) + 2 = 8$$

# wrong (3)
<alice>__wrong__nonce__1
<alice>__wrong__balance__1
<bob>__wrong__balance__1
# missing per change (3)
<alice>__missing__nonce__1
<alice>__missing__balance__1
<bob>__missing__balance__1
# missing per account (2)
<alice>__missing
<bob>__missing
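The listing above can be reproduced with a short enumeration, which also checks the count against $N = 2C + A$. This is a sketch under assumptions: the dict-based BAL shape and the function name `negative_test_ids` are hypothetical, and the trailing `1` in each ID is treated here as the index of the change, which the description does not spell out.

```python
def negative_test_ids(bal: dict[str, dict[str, dict[int, int]]]) -> list[str]:
    """Enumerate one ID per corruption: wrong value, missing change, missing account."""
    wrong, missing_change, missing_account = [], [], []
    for account, fields in bal.items():
        for field, changes in fields.items():
            for index in changes:
                wrong.append(f"<{account}>__wrong__{field}__{index}")
                missing_change.append(f"<{account}>__missing__{field}__{index}")
        missing_account.append(f"<{account}>__missing")
    return wrong + missing_change + missing_account


# Alice transfers to Bob; the values are placeholders, only the shape matters.
transfer = {"alice": {"nonce": {1: 1}, "balance": {1: 99}}, "bob": {"balance": {1: 1}}}

ids = negative_test_ids(transfer)
num_accounts = len(transfer)  # A = 2
num_changes = sum(len(c) for f in transfer.values() for c in f.values())  # C = 3

# N = 2C + A = 8
assert len(ids) == 2 * num_changes + num_accounts
```

Each change contributes two IDs (one wrong, one missing) and each account contributes one, which is where the $2C + A$ terms come from.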

🔗 Related Issues or PRs

closes #2705

✅ Checklist

  • All: Ran fast static checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    just static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).

@raxhvl raxhvl self-assigned this Apr 24, 2026
@raxhvl raxhvl added C-test Category: test A-test-fill Area: execution_testing.cli.pytest_commands.plugins.filler labels Apr 24, 2026
@raxhvl raxhvl force-pushed the feat/bal-negative-test-cases branch from cd76fb8 to 3a75ec3 Compare April 24, 2026 16:40
@codecov

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.62%. Comparing base (efb5636) to head (5516209).
⚠️ Report is 19 commits behind head on forks/amsterdam.

Additional details and impacted files
@@                 Coverage Diff                 @@
##           forks/amsterdam    #2755      +/-   ##
===================================================
+ Coverage            88.17%   88.62%   +0.45%     
===================================================
  Files                  577      577              
  Lines                35659    35659              
  Branches              3490     3490              
===================================================
+ Hits                 31442    31604     +162     
+ Misses                3654     3492     -162     
  Partials               563      563              

| Flag | Coverage Δ | |
| --- | --- | --- |
| unittests | 88.62% <ø> (+0.45%) | ⬆️ |

@raxhvl
Member Author

raxhvl commented May 6, 2026

Hey @fselmo I see two kinds of invalid tests:

(1) dynamic invalid tests that depend on an input BAL,

(2) static invalid tests that check static properties and can be written as standalone tests, such as checking for duplicate entries or checking ordering by swapping entries.

The pure function I wrote generates type (1) invalid tests. I now need to wire this into the fill logic.

I'm thinking the framework expands a valid test into 1 valid + N invalid fixtures. Perhaps a marker. What do you think?

@fselmo
Contributor

fselmo commented May 8, 2026

> I'm thinking the framework expands a valid test into 1 valid + N invalid fixtures. Perhaps a marker. What do you think?

I love this. We will be brainstorming some ways that we can fuzz test using EELS so I think this might play really nicely with that paradigm as well. One thing I can think of is either a marker or a flag when you fill. Let's say I want to test, locally at least, some tests with these corruptions - I could run with this flag and execute against hive or even the engine test @spencer-tb has been working on which would actually consume these very rapidly.

In this sense I think it'd also be valuable for filling and consuming. I just worry a bit that we have ~50k tests for Amsterdam or so (maybe more I forget but something like this) and if we corrupt all of them by N on releases we'd have some insane amount. This is of course a good problem to have and we can decide when and how to control the amount of fuzzing / corruption we do here... but something to think about with the sheer volume that this might produce.

On the other hand, if we fill these particularly for consumption via engine test and also somehow strap this up to some future fuzzer for execution on a live network, this is a really nice feature to have.

I'd love to hear your thoughts on the above and on any volume / control knob ideas.

@raxhvl
Member Author

raxhvl commented May 11, 2026

These test cases are intentional, yet generated. So it's somewhere between the usual tests and fuzzing.

I worry about the bloat this creates. But we need to make space for these for EIPs that cover a large protocol surface area. Maybe an adjacent "guardrail" workflow that's meant for important milestones (pre-devnet/mainnet launch, etc.). Fuzzing can be part of it too.

Let me think through to find a home for this bloat.

Labels

A-test-fill Area: execution_testing.cli.pytest_commands.plugins.filler C-test Category: test

Development

Successfully merging this pull request may close these issues.

suggestion: Increase invalid test coverage using the framework

2 participants