
test(ffi): de-flake pump_yields_when_queue_full via threshold recalibration #621

Draft
G4614 wants to merge 1 commit into boxlite-ai:main from G4614:fix/pump-yields-combined


@G4614 G4614 commented May 28, 2026

De-flake pump_yields_when_queue_full by lowering its progress >= 10 assertion to >= 5. The old threshold sat right on the boundary of the ~8–10 canary ticks the bulk drain produces. This is a threshold recalibration with no production change: yield_now doesn't actually starve the canary, and a worker-monopolizing producer collapses the canary to ~2 ticks, which the new threshold still catches.

Test plan

  • make test:unit:rust FILTER=pump_yields → 15/15 runs pass, non-flaky.
  • Two-side verified: inject std::hint::spin_loop() in place of the yield (a worker-monopolizing producer) → canary collapses → assertion fails with the named error; restore → passes.

| observed | pre (>= 10) | post (>= 5) |
| --- | --- | --- |
| cooperative yield_now (current prod code) | ~8–10 ticks → flakes on the 9/10 edge (~40% first-try fail) | ~8–10 ticks → passes with margin |
| worker-monopolizing producer (the real regression) | caught (~2 < 10) | caught (~2 < 5) |
| canary max_gap | ~21 ms (no starvation) | ~21 ms (unchanged) |

@G4614 G4614 marked this pull request as ready for review May 28, 2026 13:56
@G4614 G4614 marked this pull request as draft May 29, 2026 03:56

G4614 commented May 29, 2026

The recalibration to >= 5 cuts the flake rate but doesn't fully close it on a CPU-contended host. Running the suite on an AWS box — where nextest runs the boxlite-c crate's tests in parallel, contending for the single tokio worker the canary shares with the producer — pump_yields_when_queue_full failed both nextest tries, with the canary dipping to 4 ticks:

TRY 1 FAIL: canary advanced only 6 ticks; producer likely busy-spinning instead of yielding
TRY 2 FAIL: canary advanced only 4 ticks; producer likely busy-spinning instead of yielding

With >= 5, the 4-tick run still trips the assert (the 6-tick one would have passed). Suggest lowering to >= 3: it stays clear of the ~2 busy-spin floor this PR measured, while leaving margin below the cooperative range observed here (4-6; ~8-10 on a quiet host). The discriminator is preserved (cooperative → canary advances; busy-spin → stalls at ~2) — >= 3 just sits in the gap instead of on the noisy 4-6 boundary.
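To make the threshold comparison concrete, here is a minimal sketch that checks a candidate `progress >= threshold` assertion against sets of observed tick counts. `Verdict` and `check_threshold` are hypothetical names introduced for illustration and are not part of the boxlite test suite; the sample values in the usage note are picked from the ranges quoted in this thread, not from new measurements.

```rust
/// Outcome of checking one candidate threshold against observed tick counts.
/// Hypothetical helper type, for illustration only.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Every cooperative run passes and every busy-spin run fails.
    Separates,
    /// At least one cooperative run would fail the assertion (a flake).
    Flaky,
    /// At least one busy-spin run would pass (the regression slips through).
    MissesRegression,
}

/// Models the assertion `progress >= threshold` against two sample sets:
/// tick counts from cooperative runs and tick counts from busy-spin runs.
fn check_threshold(threshold: u32, cooperative: &[u32], busy_spin: &[u32]) -> Verdict {
    // A busy-spin run that reaches the threshold would pass the test,
    // so the regression the test guards against would go unnoticed.
    if busy_spin.iter().any(|&ticks| ticks >= threshold) {
        return Verdict::MissesRegression;
    }
    // A cooperative run below the threshold fails the test spuriously.
    if cooperative.iter().any(|&ticks| ticks < threshold) {
        return Verdict::Flaky;
    }
    Verdict::Separates
}
```

With cooperative samples `[4, 6, 8, 9, 10]` and a busy-spin sample `[2]`, thresholds 10 and 5 both come out `Flaky` (the 4-tick run fails them), 3 comes out `Separates`, and 2 comes out `MissesRegression`.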

(Longer term the absolute-tick assertion is inherently timing-fragile under scheduler contention; a relative baseline — e.g. assert cooperative ticks ≫ an in-run busy-spin sample — would be load-independent. But >= 3 is the minimal change to stop the flake.)
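A rough sketch of what such a relative check could look like, assuming the test can collect a busy-spin tick sample in the same run as the cooperative one. The function name `canary_outpaces_baseline` and the `min_ratio` parameter are hypothetical, and a suitable ratio would have to be measured rather than guessed.

```rust
/// Hypothetical relative check: passes only if the cooperative run advanced
/// the canary at least `min_ratio` times as far as the busy-spin sample
/// taken in the same test run.
fn canary_outpaces_baseline(cooperative_ticks: u32, busy_spin_ticks: u32, min_ratio: u32) -> bool {
    // Clamp the baseline to 1 so a fully starved sample (0 ticks) does not
    // let a cooperative run that also stalled at 0 pass trivially.
    let baseline = busy_spin_ticks.max(1);
    // saturating_mul avoids overflow for extreme inputs.
    cooperative_ticks >= baseline.saturating_mul(min_ratio)
}
```

For example, with an assumed `min_ratio` of 2, a contended run of 4 cooperative ticks against a 2-tick busy-spin sample still passes, while a run where both sides sit at 2 ticks fails.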

…ration

`progress >= 10` flaked (~40% first-try failure) because the bulk
`boxlite_runtime_drain(rt, 0, …)` empties the queue in ~150-250ms, so a
cooperative producer only lets the canary tick ~8-10 times — the threshold
sat right on that boundary. Measurement shows `yield_now` does not starve the
canary (max_gap ~21ms with or without a timer park), so there is no production
bug: the flake was pure threshold miscalibration.

Lower to `>= 5`, which keeps margin over the cooperative ~8-10 while still
catching the regression it guards — a producer that busy-spins / blocks the
single worker collapses the canary to ~2 ticks (verified by injecting
std::hint::spin_loop() in place of the yield). No production change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 force-pushed the fix/pump-yields-combined branch from 5ab6be4 to 3657eef Compare June 1, 2026 04:22