Cap minion AsyncAuth retry loop with auth_tries (#69442) #69443
Open
dwoz wants to merge 1 commit into
The minion's AsyncAuth._authenticate() outer loop on 3006.x and 3007.x
keeps calling sign_in() forever whenever the master answers with the
"retry" sentinel (key not yet accepted, master AES rotation in flight,
multi-master probe). The minion sleeps acceptance_wait_time between
attempts, doubling up to acceptance_wait_time_max, and never surfaces
an error: no log, no traceback, just a stuck minion.
3008.x already caps this loop using the existing auth_tries option
(default 7); backport the same guard so the minion bails out of
_authenticate() with SaltClientError("Failed to authenticate with the
master after N attempts") once auth_tries iterations have been spent
returning "retry". auth_tries=0 keeps the old "loop forever" behavior
for operators who actually want it.
The synchronous SAuth.authenticate() path is intentionally left
unchanged: that is a separate code path used by salt-call and other
single-shot CLI flows, and its existing semantics are out of scope for
this fix.
Fixes saltstack#69442
What does this PR do?
Backports the auth_tries outer-loop cap on the minion's AsyncAuth._authenticate() from 3008.x. When sign_in() keeps returning the "retry" sentinel, the minion will now bail out of the authentication loop after auth_tries attempts (default 7) with SaltClientError("Failed to authenticate with the master after N attempts"), instead of looping silently forever with exponential backoff up to acceptance_wait_time_max. auth_tries=0 preserves the legacy "loop forever" behaviour for operators who explicitly want it. The SAuth.authenticate() synchronous path is intentionally left unchanged: it is the salt-call / single-shot CLI codepath and is out of scope for this fix.
What issues does this PR fix or reference?
Fixes #69442
Previous Behavior
On 3006.x and 3007.x, a minion whose sign_in() consistently returns "retry" (master key not yet accepted, master AES rotation in flight, multi-master probe against an unreachable peer, etc.) sleeps acceptance_wait_time between attempts, doubles up to acceptance_wait_time_max, and never logs an error. The minion appears stuck with no operator-visible signal.
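To make the backoff concrete, here is a minimal standalone sketch of the sleep schedule described above. It does not import Salt; `backoff_schedule` and its parameter names are hypothetical, the values 10 and 60 are illustrative rather than Salt defaults, and clamping the doubled wait at the ceiling is an assumption about the exact arithmetic.

```python
def backoff_schedule(wait, wait_max, attempts):
    """Return the sleep (in seconds) taken before each of `attempts` retries.

    `wait` plays the role of acceptance_wait_time and `wait_max` the role of
    acceptance_wait_time_max. Hypothetical helper, not part of Salt.
    """
    delays = []
    for _ in range(attempts):
        delays.append(wait)
        if wait < wait_max:
            # Double the wait, but never exceed the ceiling (assumed clamp).
            wait = min(wait * 2, wait_max)
    return delays


schedule = backoff_schedule(wait=10, wait_max=60, attempts=6)
print(schedule)       # [10, 20, 40, 60, 60, 60]
print(sum(schedule))  # 250
```

Without a cap on the number of attempts, this schedule simply continues at the ceiling for as long as the master keeps answering "retry".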
New Behavior
After auth_tries consecutive "retry" responses, the loop terminates with a SaltClientError("Failed to authenticate with the master after N attempts"), which is then wrapped by salt.channel.client.AsyncPubChannel.connect() into the user-visible "Unable to sign_in to master: ..." log line. This matches the behaviour 3008.x has had since the auth_tries cap was introduced.
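The capped loop can be sketched roughly as follows. This is a synchronous, standalone illustration and not the code in this PR: the real method is a coroutine on AsyncAuth, the `SaltClientError` class below is a local stand-in for `salt.exceptions.SaltClientError`, and the `authenticate` function, its parameters, and the injected `sleep` callable are all hypothetical.

```python
class SaltClientError(Exception):
    """Local stand-in for salt.exceptions.SaltClientError (assumption)."""


def authenticate(sign_in, auth_tries, wait=0, wait_max=0, sleep=lambda s: None):
    """Call sign_in() until it stops returning "retry" or the cap is hit.

    auth_tries == 0 disables the cap, i.e. the legacy loop-forever behaviour.
    """
    attempts = 0
    while True:
        creds = sign_in()
        if creds != "retry":
            return creds
        attempts += 1
        if auth_tries and attempts >= auth_tries:
            raise SaltClientError(
                f"Failed to authenticate with the master after {attempts} attempts"
            )
        sleep(wait)
        if wait < wait_max:
            # Same assumed clamp-at-ceiling doubling as the backoff described above.
            wait = min(wait * 2, wait_max)


# The master accepts on the third call: the loop returns the credentials.
responses = iter(["retry", "retry", "creds-placeholder"])
print(authenticate(lambda: next(responses), auth_tries=7))  # creds-placeholder

# The master never accepts: the loop gives up after auth_tries attempts.
try:
    authenticate(lambda: "retry", auth_tries=7)
except SaltClientError as exc:
    print(exc)  # Failed to authenticate with the master after 7 attempts
```

Treating 0 as "no cap" keeps a single option for both modes, so operators who rely on the old behaviour only need to set auth_tries=0.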
Merge requirements satisfied?
- Docs: auth_tries is already documented and its default of 7 carries over
- Changelog: changelog/69442.fixed.md
- Tests: tests/pytests/unit/test_crypt.py::test_authenticate_caps_retry_loop_with_auth_tries_69442
Commits signed with GPG?
No (matching the rest of 3006.x history; let me know if you want this
re-signed).