[Bug]: Minion sign_in auth loop fails on 3008.x; 3006.x/3007.x hang silently in same condition #69442

@dwoz

Description

A 3008.x minion configured for multi-master fails to authenticate and exits the retry loop with SaltClientError: Failed to authenticate with the master after 7 attempts. The error reaches the log wrapped in the outer "Unable to sign_in to master" SaltClientError raised by salt.channel.client.AsyncPubChannel.connect().

Affected versions

  • 3008.x: emits the error and breaks out of the auth loop after auth_tries (default 7) attempts. Visible failure.
  • 3006.x / 3007.x: the same underlying condition (sign_in() returning "retry" repeatedly) hits no outer-loop cap. The minion silently loops forever with exponential backoff up to acceptance_wait_time_max: no error log, no traceback, just a stuck minion.

The auth_tries outer cap was added on 3008.x in bcde0577d7c (originally 68c16baeb73, "Improve salt-ssh relenv/thin parity and fix various regressions"). On 3006.x/3007.x, auth_tries is still defined (default 7) but only consumed inside sign_in() as the per-network-send retry count passed to channel.send(...). It is not used to terminate the outer creds == "retry" loop in _authenticate().
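
To make the difference between the branches concrete, here is a minimal sketch of the two loop shapes. It is not Salt's code: authenticate(), AuthError, and the wait/wait_max parameters are hypothetical stand-ins, and the doubling backoff is only an illustration of "backoff up to a maximum". Passing auth_tries=None models the uncapped 3006.x/3007.x behavior; passing an integer models the 3008.x cap.

```python
import asyncio


class AuthError(Exception):
    """Hypothetical stand-in for salt.exceptions.SaltClientError."""


async def authenticate(sign_in, auth_tries=None, wait=0.0, wait_max=0.0):
    """Call sign_in() until it stops returning "retry".

    auth_tries=None loops forever on "retry" (the uncapped outer loop);
    an integer raises once that many attempts have returned "retry".
    """
    attempts = 0
    delay = wait
    while True:
        creds = await sign_in()
        attempts += 1
        if creds != "retry":
            return creds, attempts
        if auth_tries is not None and attempts >= auth_tries:
            raise AuthError(
                f"Failed to authenticate with the master after {attempts} attempts"
            )
        await asyncio.sleep(delay)
        # Illustrative backoff only: double the delay, clamped at wait_max.
        if wait_max:
            delay = min(delay * 2, wait_max)
```

With a sign_in() that always returns "retry", the capped call raises after auth_tries attempts, while the uncapped call never returns, which matches the "stuck minion" symptom described for 3006.x/3007.x.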

Symptom on 3008.x

2026-06-12T19:52:58.914Z ERROR salt-minion 3987900 [salt@4413] salt.minion: Error while bringing up minion for multi-master. Is master at vsp-instance.vcf.nimbus.internal responding? The error message was Unable to sign_in to master: Failed to authenticate with the master after 7 attempts
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.14/site-packages/salt/channel/client.py", line 463, in connect
    await self.auth.authenticate()
salt.exceptions.SaltClientError: Failed to authenticate with the master after 7 attempts

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.14/site-packages/salt/minion.py", line 1341, in _connect_minion
    await minion.connect_master(failed=failed)
  File "/opt/saltstack/salt/lib/python3.14/site-packages/salt/minion.py", line 1680, in connect_master
    master, self.pub_channel = await self.eval_master(
                               ^^^^^^^^^^^^^^^^^^^^^^^
        self.opts, self.timeout, self.safe, failed
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/opt/saltstack/salt/lib/python3.14/site-packages/salt/minion.py", line 1000, in eval_master
    await pub_channel.connect()
  File "/opt/saltstack/salt/lib/python3.14/site-packages/salt/channel/client.py", line 483, in connect
    raise salt.exceptions.SaltClientError(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        f"Unable to sign_in to master: {exc}"
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )  # TODO: better error message
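
The "During handling of the above exception, another exception occurred" line in the traceback above is what Python prints when a new exception is raised inside an except block without "from". The sketch below reproduces that wrapping shape with local stand-ins (the SaltClientError class, authenticate(), connect(), and exception_chain() here are all defined for illustration and are not Salt's implementations) and shows one way to walk the chain back to the innermost message.

```python
class SaltClientError(Exception):
    """Local stand-in for salt.exceptions.SaltClientError."""


def authenticate():
    # Models the inner failure once the retry cap is reached.
    raise SaltClientError("Failed to authenticate with the master after 7 attempts")


def connect():
    try:
        authenticate()
    except SaltClientError as exc:
        # Raising inside the except block without "from" sets __context__
        # on the new exception, which produces the two-part traceback.
        raise SaltClientError(f"Unable to sign_in to master: {exc}")


def exception_chain(exc):
    """Return exception messages from outermost to innermost."""
    messages = []
    while exc is not None:
        messages.append(str(exc))
        exc = exc.__cause__ or exc.__context__
    return messages
```

Calling exception_chain() on the exception raised by connect() yields the wrapped message first and the original authentication failure second.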

Two distinct issues

  1. Diagnose why sign_in() keeps returning "retry" for this minion against vsp-instance.vcf.nimbus.internal (master keys, master AES rotation, key acceptance state, etc.). The traceback above does not identify the cause on its own; master logs from around the same timestamps are needed.
  2. Backport the outer-loop cap to 3006.x and 3007.x. Today those branches silently spin on the same condition with no operator-visible error. Whatever the root cause turns out to be, having a clear bail-out is the right behavior for all maintained branches.

Salt install type / version

Official package; 3008.x (Python 3.14 onedir, judging by /opt/saltstack/salt/lib/python3.14/...).

Reference
