
Improve FlakyStrategyDefinition error messages with specific details #4676

Draft
ianhi wants to merge 7 commits into HypothesisWorks:master from ianhi:flaky-feedback

Conversation


@ianhi ianhi commented Mar 10, 2026

Moved into a draft; see #4676 (comment).


🤖 I used Claude extensively for this PR, but have personally reviewed every line to the best of my ability.

Fix for #4673

This PR implements four tightly related changes to the output of Hypothesis when there is a flaky failure:

  1. Ensure the seed is printed.

  2. Print the actual differing choices in the strategy.

The error now says what was different (different constraints, a different type, fewer/more draws) instead of just "data generation was inconsistent".

  3. For stateful tests, print the replay steps and info on how to enable observability.

  4. Fix duplicate FlakyStrategyDefinition errors.

When a mismatch was detected during a draw, a second FlakyStrategyDefinition could be raised from conclude_test if the mismatch also resulted in fewer draws. Now the observer has a flaky flag to prevent this redundant second raise.
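The flaky-flag guard can be sketched roughly like this. This is an illustrative standalone sketch, not Hypothesis's actual internals; the class and method names here are made up:

```python
# Hypothetical sketch of the "flaky flag" guard described above.
class FlakyStrategyDefinition(Exception):
    pass


class DrawObserver:
    def __init__(self):
        # Set once a mismatch has already been reported, so later checks
        # (like the fewer-draws check at test conclusion) stay quiet.
        self.flaky = False

    def mismatch_during_draw(self, detail):
        self.flaky = True
        raise FlakyStrategyDefinition(f"inconsistent draw: {detail}")

    def conclude_test(self, expected_draws, actual_draws):
        if self.flaky:
            # A mismatch was already raised mid-draw; don't raise again.
            return
        if actual_draws < expected_draws:
            raise FlakyStrategyDefinition(
                "the second run drew less data than the first run"
            )
```

Without the `self.flaky` check in `conclude_test`, a mid-draw mismatch that also shortened the run would trip both raises and produce the duplicate error this PR removes.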

One of the tricky things is that a FlakyStrategyDefinition error can be thrown with or without a real test failure. When it accompanies a real failure, you would get messy output with nested errors ("During handling of the above exception, another exception occurred...") which made it hard to notice the actual failure with so much text on the screen. Now the flaky error is temporarily suppressed and reported in the Hypothesis output, keeping the test failure more visible. (see final example below)
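The suppress-and-report behavior could be sketched like this. This is a hypothetical standalone helper (the function name, and the use of a notes list, are assumptions for illustration), not the actual change in Hypothesis's engine:

```python
# Illustrative sketch: when both a real test failure and a flaky strategy
# error exist, suppress the flaky error into the notes output instead of
# chaining it into the traceback.
def choose_error(test_failure, flaky_error, notes):
    """Return the single exception to raise; record suppressed errors in notes."""
    if test_failure is not None and flaky_error is not None:
        notes.append(
            "WARNING: a flaky strategy definition error was detected and "
            f"suppressed in favor of the real failure above: {flaky_error}"
        )
        return test_failure
    # Only one of the two (or neither) is present; raise whichever exists.
    return test_failure if test_failure is not None else flaky_error
```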

I cooked up this demo script with Claude to test out the various combinations of failure modes and output (e.g. observability), which I found quite helpful. Here is the script:

demo_flaky.py
import hypothesis.strategies as st
from hypothesis import given, settings
from hypothesis.stateful import Bundle, RuleBasedStateMachine, initialize, rule

_catalog_size = [5]

class FlakyConstraints(RuleBasedStateMachine):
    items = Bundle("items")

    @initialize()
    def create_cart(self):
        self.cart = []

    @rule(target=items, name=st.text(min_size=1, max_size=10))
    def add_item(self, name):
        self.cart.append(name)
        return name

    @rule(
        item=items,
        price=st.integers(1, 100).flatmap(
            lambda p: st.integers(p, p + _catalog_size[0])
        ),
    )
    def set_price(self, item, price):
        _catalog_size[0] += 3

TestConstraint = FlakyConstraints.TestCase
TestConstraint.settings = settings(max_examples=200, database=None, stateful_step_count=10)

_call_count = [0]

@settings(max_examples=200, database=None)
@given(data=st.data())
def test_type_mismatch(data):
    _call_count[0] += 1
    if _call_count[0] % 2 == 0:
        data.draw(st.booleans())
    else:
        data.draw(st.integers(0, 10))

_upper = [10]

@settings(max_examples=200, database=None)
@given(data=st.data())
def test_plain_mismatch(data):
    data.draw(st.integers(0, _upper[0]))
    _upper[0] += 10

_more_count = [0]

@settings(max_examples=200, database=None)
@given(data=st.data())
def test_more_draws(data):
    data.draw(st.integers(0, 10))
    _more_count[0] += 1
    if _more_count[0] > 1:
        data.draw(st.integers(0, 10))
    assert False

With these outputs (formatted by Claude to elide portions of the error and focus on what's relevant for this PR):

1. Stateful test — constraint mismatch (observability off)

python -m pytest demo_flaky.py -s -k constraint

Before (v6.151.9):
FlakyStrategyDefinition: Inconsistent data
generation! Data generation behaved differently
between different runs. Is your data generation
depending on external state?
while generating 'price' from integers(...)
  .flatmap(...) for rule set_price

During handling of the above exception,
another exception occurred:

  ...long chained traceback...

FlakyStrategyDefinition: Inconsistent data
generation! ...
while selecting a rule to run. This is usually
caused by a flaky precondition, or a bundle
that was unexpectedly empty.

After:

FlakyStrategyDefinition: Inconsistent data
generation! Data generation behaved differently
between different runs. Is your data generation
depending on external state?

The second run drew integer with different
constraints than the first run.
  first run:  {'min_value': 99, 'max_value': 278,
               'weights': ...}
  second run: {'min_value': 99, 'max_value': 290,
               'weights': ...}

while generating 'price' from integers(...)
  .flatmap(...) for rule set_price
This error occurred while selecting a rule to
run. This is usually caused by a flaky
precondition, a bundle that was unexpectedly
empty, or a rule that depends on external state
such as time or a global variable.
---------- Hypothesis ----------
Tip: to see which steps led to this error,
  re-run with
  HYPOTHESIS_EXPERIMENTAL_OBSERVABILITY=1
You can add @seed(...) to this test or run
  pytest with --hypothesis-seed=... to reproduce
  this failure.

2. Stateful test — constraint mismatch (observability on)

HYPOTHESIS_EXPERIMENTAL_OBSERVABILITY=1 python -m pytest demo_flaky.py -s -k constraint

Before (v6.151.9):
  ...same duplicate chained traceback as above...

FlakyStrategyDefinition: Inconsistent data
generation! ...
while selecting a rule to run. This is usually
caused by a flaky precondition, or a bundle
that was unexpectedly empty.

After:

FlakyStrategyDefinition: ...

The second run drew integer with different
constraints than the first run.
  first run:  {'min_value': 29, 'max_value': 220,
               'weights': ...}
  second run: {'min_value': 29, 'max_value': 238,
               'weights': ...}

while generating 'price' from integers(...)
  .flatmap(...) for rule set_price
This error occurred while selecting a rule ...
---------- Hypothesis ----------
Steps leading up to this error:
  state = FlakyConstraints()
  state.create_cart()
  items_0 = state.add_item(name='...')
  state.teardown()
You can add @seed(...) to this test or run
  pytest with --hypothesis-seed=... to reproduce
  this failure.

3. Non-stateful — type mismatch

python -m pytest demo_flaky.py -s -k type_mismatch

Before (v6.151.9):
FlakyFailure: Inconsistent results from
replaying a test case!
  last: VALID from None
  this: INTERESTING from
    FlakyStrategyDefinition at datatree.py:1053
  (1 sub-exception)
+-+---------------- 1 ----------------
  |   ...long traceback...
  | FlakyStrategyDefinition: Inconsistent data
  |   generation! ...
  | while generating 'Draw 1' from booleans()
  +------------------------------------

After:

FlakyStrategyDefinition: Inconsistent data
generation! Data generation behaved differently
between different runs. Is your data generation
depending on external state?

The second run drew a different type of value
than the first run.
  first run:  integer
  second run: boolean

while generating 'Draw 1' from booleans()
---------- Hypothesis ----------
You can add @seed(...) to this test or run
  pytest with --hypothesis-seed=... to reproduce
  this failure.

4. Non-stateful — constraint mismatch

python -m pytest demo_flaky.py -s -k plain

Before (v6.151.9):
FlakyFailure: Inconsistent results from
replaying a test case!
  last: VALID from None
  this: INTERESTING from
    FlakyStrategyDefinition at datatree.py:1053
  (1 sub-exception)
+-+---------------- 1 ----------------
  |   ...long traceback...
  | FlakyStrategyDefinition: Inconsistent data
  |   generation! ...
  | while generating 'Draw 1' from
  |   integers(min_value=0, max_value=20)
  +------------------------------------

After:

FlakyStrategyDefinition: Inconsistent data
generation! Data generation behaved differently
between different runs. Is your data generation
depending on external state?

The second run drew integer with different
constraints than the first run.
  first run:  {'min_value': 0, 'max_value': 10,
               'weights': None,
               'shrink_towards': 0}
  second run: {'min_value': 0, 'max_value': 20,
               'weights': None,
               'shrink_towards': 0}

while generating 'Draw 1' from
  integers(min_value=0, max_value=20)
---------- Hypothesis ----------
You can add @seed(...) to this test or run
  pytest with --hypothesis-seed=... to reproduce
  this failure.

5. Real bug + suppressed flaky error

python -m pytest demo_flaky.py -s -k more

Before (v6.151.9):
FlakyFailure: Inconsistent results from
replaying a test case!
  last: INTERESTING from AssertionError ...
  this: INTERESTING from
    FlakyStrategyDefinition at datatree.py:1106
  (2 sub-exceptions)
+-+---------------- 1 ----------------
  |   ...
  |     assert False
  | AssertionError: assert False
  +---------------- 2 ----------------
  |   ...long traceback...
  | FlakyStrategyDefinition: Inconsistent data
  |   generation! ...
  | while generating 'Draw 2' from
  |   integers(min_value=0, max_value=10)
  +------------------------------------

After:

FlakyFailure: ...An example failed on the
  first run but now succeeds ...
Falsifying example: test_more_draws(
    data=data(...),
)
Draw 1: 0
+-+---------------- 1 ----------------
  |   ...
  |     assert False
  | AssertionError: assert False
  +------------------------------------
---------- Hypothesis ----------
WARNING: a flaky strategy definition error was
  detected during shrinking and suppressed in
  favor of the real failure above.
  Inconsistent data generation! ...

The second run drew more data than the first run.

You can add @seed(...) to this test or run
  pytest with --hypothesis-seed=... to reproduce
  this failure.

FlakyStrategyDefinition errors now describe what changed between runs
(type mismatch, constraint mismatch, forced value difference, more/fewer
draws) instead of a generic "inconsistent data generation" message.
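A hypothetical rendering of the constraint comparison, in the spirit of the output above (the helper name and signature are made up for illustration, not the actual Hypothesis implementation):

```python
# Illustrative sketch: format a message describing how the constraints of a
# draw differed between the first and second run.
def describe_constraint_mismatch(choice_type, first, second):
    """Return a mismatch message, or None if the constraints agree."""
    if first == second:
        return None
    return (
        f"The second run drew {choice_type} with different constraints "
        "than the first run.\n"
        f"  first run:  {first}\n"
        f"  second run: {second}"
    )
```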
@ianhi ianhi requested review from Liam-DeVoe and Zac-HD as code owners March 10, 2026 23:02
@ianhi ianhi marked this pull request as draft March 12, 2026 05:19

ianhi commented Mar 12, 2026

I moved this back to draft - looking back I was pushing through some hunger when I submitted this - riding the thrill of trying to get it to the end. And in retrospect I need to spend some more time looking over the tests as carefully as I did the code/behavior. Which I suppose is ironic given this is a testing library - but alas.
