test: Remove badNonce retries and increase nonce maxConnectionAge #8661

Merged
jsha merged 2 commits into main from address-badnonce-flake
Mar 10, 2026

@beautifulentropy (Member) commented Mar 6, 2026

The nonce service's maxConnectionAge (30s) periodically results in a GOAWAY being sent to the WFE's gRPC connections, causing affected SubConns to briefly leave READY state while reconnecting. Due to jitter on maxConnectionAge, the getNonceService and redeemNonceService connections to the same backend can GOAWAY at slightly different times, creating a window where the WFE can still issue nonces from a backend it can no longer redeem against. The chisel2.py retry logic was added to paper over this, but retries mask real failures.
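
To illustrate why the two connections rarely cycle at the same instant, here is a minimal simulation sketch. It assumes the +/-10% jitter that grpc-go documents for MaxConnectionAge and assumes both connections are opened at the same moment. The function names are hypothetical and nothing here is taken from Boulder's code. It models only the offset between the two GOAWAYs, not the length of the reconnect window itself.

```python
import random

# Assumption: grpc-go documents a random jitter of +/-10% on MaxConnectionAge.
JITTER_FRACTION = 0.10


def jittered_age(max_age_s: float, rng: random.Random) -> float:
    """Draw one connection lifetime from max_age_s +/- JITTER_FRACTION."""
    return max_age_s * (1 + rng.uniform(-JITTER_FRACTION, JITTER_FRACTION))


def goaway_offset(max_age_s: float, rng: random.Random) -> float:
    """Seconds between the GOAWAYs of two connections opened at the same time.

    The two draws stand in for the getNonceService and redeemNonceService
    connections to the same backend.
    """
    get_conn_age = jittered_age(max_age_s, rng)
    redeem_conn_age = jittered_age(max_age_s, rng)
    return abs(get_conn_age - redeem_conn_age)


def worst_case_offset(max_age_s: float) -> float:
    """Largest possible offset: one connection at -10%, the other at +10%."""
    return 2 * JITTER_FRACTION * max_age_s


rng = random.Random(0)  # fixed seed so the sketch is repeatable
offsets = [goaway_offset(30.0, rng) for _ in range(1000)]
print("largest simulated offset (s):", max(offsets))
print("worst-case offset (s):", worst_case_offset(30.0))
```

Under these assumptions, a 30s maxConnectionAge lets the two GOAWAYs land up to roughly 6 seconds apart, and they essentially never land at exactly the same time.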

Note: no corresponding change is made (or possible) in the Go integration tests, because badNonce retries are handled internally by github.com/eggsampler/acme.

Since integration test runs complete well within 30 minutes, increasing maxConnectionAge to 30m ensures nonce connections are never cycled during a CI run, which should eliminate the flake.
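
The arithmetic behind this can be sketched as follows. The sketch assumes grpc-go's documented +/-10% jitter on MaxConnectionAge and assumes connections are established when the run starts. The 10-minute run length is an illustrative value, not a measured one, and the helper names are hypothetical.

```python
# Assumption: +/-10% jitter, per grpc-go's keepalive documentation.
JITTER_FRACTION = 0.10


def earliest_goaway_s(max_age_s: float) -> float:
    """Earliest time, in seconds, at which a jittered connection can get a GOAWAY."""
    return max_age_s * (1 - JITTER_FRACTION)


def may_cycle_during_run(run_duration_s: float, max_age_s: float) -> bool:
    """True if a connection opened at run start could be cycled before the run ends."""
    return run_duration_s >= earliest_goaway_s(max_age_s)


HYPOTHETICAL_RUN_S = 10 * 60  # illustrative 10-minute CI run, not a measured value

for label, max_age_s in (("30s", 30), ("30m", 30 * 60)):
    print(
        label,
        "earliest GOAWAY (s):", earliest_goaway_s(max_age_s),
        "may cycle during run:", may_cycle_during_run(HYPOTHETICAL_RUN_S, max_age_s),
    )
```

With jitter taken into account, the earliest GOAWAY under a 30m setting arrives at about 27 minutes, so the reasoning holds as long as a run finishes comfortably before that point.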

A follow-up PR will address the underlying issue.

Part of #8662

@beautifulentropy beautifulentropy marked this pull request as ready for review March 6, 2026 22:56
@beautifulentropy beautifulentropy requested a review from a team as a code owner March 6, 2026 22:56
@beautifulentropy beautifulentropy requested a review from jsha March 6, 2026 22:56
@beautifulentropy beautifulentropy marked this pull request as draft March 6, 2026 23:04
@beautifulentropy beautifulentropy removed the request for review from jsha March 6, 2026 23:04
@beautifulentropy beautifulentropy changed the title from "test: Add badNonce retry to make_client() in integration tests" to "test: Remove badNonce retries and increase nonce maxConnectionAge" Mar 10, 2026
@beautifulentropy beautifulentropy marked this pull request as ready for review March 10, 2026 19:55
@beautifulentropy beautifulentropy force-pushed the address-badnonce-flake branch 2 times, most recently from aa55a21 to be7546b March 10, 2026 19:58
@jsha (Contributor) left a comment

Looks good in general but there is some weirdness with the diff showing changes from other PRs that were already merged.

@beautifulentropy beautifulentropy requested a review from jsha March 10, 2026 20:03
@github-actions (Contributor) commented

@beautifulentropy, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values.

@beautifulentropy (Member, Author) commented

> Looks good in general but there is some weirdness with the diff showing changes from other PRs that were already merged.

All fixed, apologies for the force-push.

@beautifulentropy (Member, Author) commented

> @beautifulentropy, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values.

No need, we don't plan to make a corresponding change to this value in staging/production.

@jsha jsha merged commit a5929e6 into main Mar 10, 2026
29 checks passed
@jsha jsha deleted the address-badnonce-flake branch March 10, 2026 22:56