a798fc3 to 9905e6e
When `Mempools::execute()` runs mempools in parallel, errors from mempools
whose results were discarded after another mempool succeeded were still
recorded against `driver_mempool_submission`, biasing the per-mempool
success ratio with timing-dependent shadowed failures.
Replace `select_ok` with `FuturesUnordered` + manual loop so observation
runs in the consuming context. Errors that occur before another mempool
succeeds are now recorded under a new `Superseded` label via
`observe::mempool_superseded`, which also records the winning mempool in
the trace fields. Errors in the all-failed case keep their existing
labels (Revert / Expired / Other / Disabled).
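The labeling rule described above can be sketched without any async machinery. This is a minimal model, not the driver's actual code: it assumes outcomes are handed over in completion order as a slice (standing in for polling `FuturesUnordered`), and the `Label`, `Failure`, and `label_outcomes` names are hypothetical.

```rust
/// Metric labels; `Superseded` is the new one, the rest mirror the existing labels.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Label {
    Success,
    Superseded,
    Revert,
    Expired,
    Other,
    Disabled,
}

/// Hypothetical stand-in for the error a mempool submission can return.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Failure {
    Revert,
    Expired,
    Other,
    Disabled,
}

impl Failure {
    /// The label an error keeps when no mempool succeeds.
    fn label(self) -> Label {
        match self {
            Failure::Revert => Label::Revert,
            Failure::Expired => Label::Expired,
            Failure::Other => Label::Other,
            Failure::Disabled => Label::Disabled,
        }
    }
}

/// Walks outcomes in completion order and decides which label each one gets.
/// Errors are buffered until we know whether some mempool wins.
fn label_outcomes<'a>(
    completed: &[(&'a str, Result<(), Failure>)],
) -> Vec<(&'a str, Label)> {
    let mut pending_errors = Vec::new();
    for &(mempool, outcome) in completed {
        match outcome {
            Ok(()) => {
                // A winner exists: every earlier error is superseded.
                let mut labels: Vec<_> = pending_errors
                    .iter()
                    .map(|&(name, _)| (name, Label::Superseded))
                    .collect();
                labels.push((mempool, Label::Success));
                // Anything completing later is never observed in this model.
                return labels;
            }
            Err(failure) => pending_errors.push((mempool, failure)),
        }
    }
    // All-failed case: each error keeps its own label.
    pending_errors
        .into_iter()
        .map(|(name, failure)| (name, failure.label()))
        .collect()
}

fn main() {
    let raced = [
        ("a", Err(Failure::Revert)),
        ("b", Ok(())),
        ("c", Err(Failure::Other)),
    ];
    println!("{:?}", label_outcomes(&raced));
}
```

In this model, mempool `c` produces no event at all because the loop stops at the first success, which is the case the old `select_ok` path could not distinguish.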
Alert query update needed when deploying:
sum by (network) (increase(driver_mempool_submission{cow_fi_environment="prod",result="Success"}[2h]))
/
sum by (network) (increase(driver_mempool_submission{cow_fi_environment="prod",result!~"Disabled|Superseded"}[2h])) < 0.6
`mempool_executed` took a `Result<&SubmissionSuccess, &mempools::Error>` and re-matched the same discriminant several times to pick the log level, metric label, and block-passed labels. Replace it with two functions, `mempool_succeeded(&SubmissionSuccess)` and `mempool_failed(&mempools::Error)`, so each branch is straight-line and call sites pick the correct observer directly. Behavior and emitted metrics are unchanged.
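The shape of this refactor can be illustrated with a small before/after sketch. The types below are simplified stand-ins, and the `Observation` struct and the `"info"` / `"warn"` levels are assumptions made for illustration; the real observers emit logs and metrics rather than returning a value.

```rust
/// Hypothetical stand-in for the success payload.
struct SubmissionSuccess;

/// Hypothetical stand-in for `mempools::Error`.
enum Error {
    Revert,
    Expired,
    Other,
    Disabled,
}

/// What an observer would emit, returned here so it can be compared.
#[derive(Debug, PartialEq)]
struct Observation {
    level: &'static str,
    label: &'static str,
}

/// Before: one function that re-matches the same `Result` for each output.
fn mempool_executed(result: Result<&SubmissionSuccess, &Error>) -> Observation {
    let level = match result {
        Ok(_) => "info",
        Err(_) => "warn",
    };
    let label = match result {
        Ok(_) => "Success",
        Err(Error::Revert) => "Revert",
        Err(Error::Expired) => "Expired",
        Err(Error::Other) => "Other",
        Err(Error::Disabled) => "Disabled",
    };
    Observation { level, label }
}

/// After: the success branch is straight-line code.
fn mempool_succeeded(_success: &SubmissionSuccess) -> Observation {
    Observation { level: "info", label: "Success" }
}

/// After: the failure branch matches only on the error variant.
fn mempool_failed(error: &Error) -> Observation {
    let label = match error {
        Error::Revert => "Revert",
        Error::Expired => "Expired",
        Error::Other => "Other",
        Error::Disabled => "Disabled",
    };
    Observation { level: "warn", label }
}

fn main() {
    // The split must not change what is emitted for any input.
    assert_eq!(
        mempool_executed(Ok(&SubmissionSuccess)),
        mempool_succeeded(&SubmissionSuccess)
    );
    for error in [Error::Revert, Error::Expired, Error::Other, Error::Disabled] {
        assert_eq!(mempool_executed(Err(&error)), mempool_failed(&error));
    }
    println!("split observers match the combined one");
}
```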
9905e6e to d9fb0cb
fleupold (Contributor) reviewed on May 7, 2026 and left a comment:
Is there a reason you are not using the PR template for the description?
I agree with the change, however I'd like to suggest that we interpret "superseded" events as successes with respect to how you envision changing the metric. A superseded submission should be considered a successful one.
This way we receive N (the number of mempools) events in the happy case and N events in the failure case, allowing us to keep our alert metric as a ratio of successful to failed ones (otherwise failed events would be weighted N times more than successful ones).
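The weighting argument above can be checked with a small calculation. This is a sketch under simplifying assumptions that are not stated in the PR: every mempool emits exactly one event per round, a happy round yields one `Success` plus N-1 `Superseded` events, a failed round yields N non-`Disabled` failures, and at least one round occurs. The `success_ratio` function is hypothetical.

```rust
/// Success ratio over a window, under the assumptions in the lead-in.
/// `superseded_counts_as_success` selects between the two conventions:
/// excluding `Superseded` from the ratio, or treating it as a success.
fn success_ratio(
    happy_rounds: u32,
    failed_rounds: u32,
    mempools: u32,
    superseded_counts_as_success: bool,
) -> f64 {
    // Events counted as successes in a single happy round.
    let successes_per_happy_round = if superseded_counts_as_success {
        mempools
    } else {
        1
    };
    let successes = happy_rounds * successes_per_happy_round;
    // Every mempool reports a failure in a failed round.
    let failures = failed_rounds * mempools;
    f64::from(successes) / f64::from(successes + failures)
}

fn main() {
    // One happy round and one failed round with three mempools.
    let excluded = success_ratio(1, 1, 3, false);
    let counted = success_ratio(1, 1, 3, true);
    println!("Superseded excluded:   {excluded}");
    println!("Superseded as success: {counted}");
}
```

With three mempools and an even split of happy and failed rounds, excluding `Superseded` gives 1 / (1 + 3) = 0.25, while counting it as a success gives 3 / (3 + 3) = 0.5, which matches the share of rounds that actually succeeded.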
Problem definition
When `Mempools::execute()` runs mempools concurrently, errors from mempools whose results were discarded after another mempool succeeded were still recorded against `driver_mempool_submission`, biasing the per-mempool success ratio with timing-dependent shadowed failures.
Major change
Replace `select_ok` with `FuturesUnordered` + a manual loop so observation runs in the consuming context. Errors from mempools overtaken by a later success are recorded under a new `Superseded` label via `observe::mempool_superseded`, which also records the winning mempool in the trace fields. Errors in the all-failed case keep their existing labels (Revert / Expired / Other / Disabled).
Minor change
`observe::mempool_executed` is also split into `mempool_succeeded(&SubmissionSuccess)` and `mempool_failed(&mempools::Error)`, dropping the `Result<&S, &E>` indirection now that each call site already knows which branch it is on. Behavior and emitted metrics are unchanged by the split.
Alert query update needed when deploying