
AIN-289 · dual-key internal-signup-key rotation (zero-downtime) #97

Merged
hizrianraz merged 4 commits into main from hizrianraz/ain-289-charter-security-tooling
May 29, 2026


@hizrianraz hizrianraz commented May 29, 2026

Why

The leaked INTERNAL_SIGNUP_KEY (the master X-Ainfera-Internal-Key, which, combined with X-Ainfera-On-Behalf-Of, can impersonate any tenant) is still live in prod as of 2026-05-29. This was verified by sha256-matching the quarantined REVOKED-AIN-289-* leak file against prod. (The leaked bearer keys were already rotated out; this master key was not.)

Auth has a single verifier (settings.internal_signup_key) with no dual-key support, so a naive rotation would 401 the customer dashboard (Vercel AINFERA_INTERNAL_KEY) and DO-fleet heartbeats during cutover.

What

  • Settings.verify_internal_key(presented) — constant-time (hmac.compare_digest) check against the active key plus an optional internal_signup_key_previous.
  • Route all 7 comparison sites through it: deps.py, auth/ownership.py (×2), routers/signup.py, routers/heartbeat.py, routers/capture_metrics.py, routers/install.py.
  • Unit tests for the rotation window (tests/unit/test_internal_key_rotation.py).
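The accept/reject behaviour described above can be sketched as follows. This is a minimal illustration, not the code in ainfera_api/config.py: the `Settings` dataclass is a hypothetical stand-in for the real settings object, `require_internal_key` is a hypothetical gate that a comparison site might call instead of `==`, and the key values are placeholders. It shows which keys pass, and makes no claim about end-to-end timing properties.

```python
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for the real settings object."""

    internal_signup_key: str
    internal_signup_key_previous: Optional[str] = None

    def verify_internal_key(self, presented: Optional[str]) -> bool:
        # A missing or empty header never authenticates.
        if not presented:
            return False
        candidates = [self.internal_signup_key]
        if self.internal_signup_key_previous:
            candidates.append(self.internal_signup_key_previous)
        # Compare as bytes so non-ASCII input cannot raise TypeError.
        presented_bytes = presented.encode("utf-8")
        return any(
            hmac.compare_digest(presented_bytes, candidate.encode("utf-8"))
            for candidate in candidates
        )


def require_internal_key(settings: Settings, header_value: Optional[str]) -> None:
    """Hypothetical gate a comparison site could call instead of `==`."""
    if not settings.verify_internal_key(header_value):
        raise PermissionError("invalid internal key")


# Rotation window open: the active and the previous key both pass.
window = Settings("new-key", "old-key")
# Rotation window closed: only the active key passes.
closed = Settings("new-key")
```

With these placeholder values, `window` accepts both "new-key" and "old-key", while `closed` rejects "old-key"; this is the property the rotation-window unit tests would be expected to pin down.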

Cutover (founder-gated deploy)

  1. Merge → Railway deploys dual-key code.
  2. Set Railway INTERNAL_SIGNUP_KEY=<new>, INTERNAL_SIGNUP_KEY_PREVIOUS=<old> → redeploy (accepts both).
  3. Rotate Doppler ainfera-os/prd + Vercel dashboard AINFERA_INTERNAL_KEY to <new>; redeploy dashboard; restart DO fleet.
  4. Verify internal endpoints 200 on <new>.
  5. Unset INTERNAL_SIGNUP_KEY_PREVIOUS → redeploy → leaked key retired.
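To see why the ordering of these steps avoids a 401 window, the server-side environment after each env-changing step can be modelled as below. This is a sketch under assumptions: `accepts` is a hypothetical model of the check driven by the two env var names from the steps, and `OLD`/`NEW` are placeholder strings, not real keys.

```python
import hmac
from typing import Dict

OLD, NEW = "old-leaked-key", "new-key"  # placeholder values, not real keys


def accepts(env: Dict[str, str], presented: str) -> bool:
    """Hypothetical model of the server check, driven by env vars."""
    candidates = [env["INTERNAL_SIGNUP_KEY"]]
    previous = env.get("INTERNAL_SIGNUP_KEY_PREVIOUS")
    if previous:
        candidates.append(previous)
    results = [
        hmac.compare_digest(presented.encode("utf-8"), c.encode("utf-8"))
        for c in candidates
    ]
    return any(results)


# Server env after each cutover step that changes it.
phases = {
    "step 1 (dual-key code, old env)": {"INTERNAL_SIGNUP_KEY": OLD},
    "step 2 (both keys set)": {
        "INTERNAL_SIGNUP_KEY": NEW,
        "INTERNAL_SIGNUP_KEY_PREVIOUS": OLD,
    },
    "step 5 (previous unset)": {"INTERNAL_SIGNUP_KEY": NEW},
}

for name, env in phases.items():
    print(f"{name}: old={accepts(env, OLD)} new={accepts(env, NEW)}")
```

Clients still hold the old key until step 3 and the new key afterwards. In this model the old key is accepted through step 2 and the new key from step 2 onwards, so the client switch in step 3 happens while both are valid.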

Draft until founder approves the prod cutover (Railway/Vercel are founder-signer gates).


Note

High Risk
Changes master internal-key authentication used for trusted-proxy impersonation across multiple routes; incorrect env cutover could 401 dashboard/fleet until fixed.

Overview
Adds zero-downtime rotation for the leaked INTERNAL_SIGNUP_KEY: Settings.verify_internal_key() does constant-time checks against the active key and optional internal_signup_key_previous, and every X-Ainfera-Internal-Key gate (tenant resolution, ownership, signup reserved namespaces, heartbeat, capture metrics, install GitHub bypass) now uses it instead of a single string equality.

Bundles AIN-289 secret hygiene: gitleaks in CI (full-history detect), a pre-commit hook running protect --staged, custom rules for Ainfera bearer and Doppler tokens, tight test allowlists, and historical .gitleaksignore fingerprints. .gitignore blocks .launch-snapshots/ paths. history_purge_ain289.sh supports a dry-run mode and an execute mode that rewrites history and then runs gitleaks. rotation_verify_ain289.py probes /v1/usage/daily once per agent key read from the environment and emits JSONL records with prefix-only key logging. Also adds vault_hygiene_ain279.py (unrelated vault markdown repair; not part of the auth cutover).

Reviewed by Cursor Bugbot for commit f0082a4. Bugbot is set up for automated code reviews on this repo.

hizrianraz and others added 4 commits May 28, 2026 17:18
…ygiene

Charter A3 (mechanism only) + A4 sweep tool. Code-only — secret minting,
force-push, branch-protection toggle, and worker deploys are explicit
founder-batch steps (see CHARTER-REPORT-2026-05-28.md).

scripts/rotation_verify_ain289.py
  Env-only probe harness. Reads AINFERA_<AGENT>_KEY env vars and probes
  GET /v1/usage/daily, emitting {agent, http_status, key_id_prefix}
  only. Never prints raw secrets. --expect {200,401,both} so the
  same harness verifies new-keys-live and old-keys-revoked.

scripts/history_purge_ain289.sh
  Per-repo dry-run-by-default tool. Adds tainted paths to .gitignore,
  reports HEAD + history hits, then in --execute mode runs
  git-filter-repo followed by a gitleaks gate. Force-push is NOT
  done by the script.

scripts/vault_hygiene_ain279.py
  Deterministic UTF-8 + frontmatter sweep. Byte-level mojibake repair;
  refuses to run against ~/code/hizrianraz/manwe. The L2 routing formula
  line is HARD-GATED; the script splits the file around it.

.gitleaks.toml + workflows/gitleaks.yml + pre-commit hook
  Defaults + Ainfera-specific bearer patterns (ai_(infera|prd|stg|dev)_*
  and dp.*.*.*). CI + local gates.
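
As a rough illustration of what the two Ainfera-specific patterns are meant to catch, the glob-style shapes above can be approximated with regular expressions. The expressions below are assumed translations, not the rules in .gitleaks.toml, and the rule names and `find_candidates` helper are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed translations of the glob-style patterns named in the commit message;
# the real rules live in .gitleaks.toml and may differ.
AINFERA_BEARER = re.compile(r"\bai_(?:infera|prd|stg|dev)_[A-Za-z0-9_-]+")
DOPPLER_TOKEN = re.compile(
    r"\bdp\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"
)

RULES = (("ainfera-bearer", AINFERA_BEARER), ("doppler-token", DOPPLER_TOKEN))


def find_candidates(text: str, prefix_len: int = 12) -> List[Tuple[str, str]]:
    """Return (rule, truncated match) pairs; the full match is never returned."""
    hits = []
    for rule, pattern in RULES:
        for match in pattern.finditer(text):
            hits.append((rule, match.group(0)[:prefix_len]))
    return hits
```

Truncating each match keeps the helper itself from echoing a secret into logs, in the same spirit as the prefix-only logging used by the probe harness.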

.gitignore
  Adds .launch-snapshots/ and tests/fixtures/launch-snapshots/.

§0 finding: key model is single-column on tenants. Rotation harness
assumes single-key cutover; the additive api_key_hash_pending column
is staged for a follow-up migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ense)

gitleaks/gitleaks-action@v2 requires a paid GITLEAKS_LICENSE secret on
org-owned repos (the action hard-fails with 'License key is required'
on ainfera-ai/api in CI). Swap to direct binary install — the upstream
OSS gitleaks binary is Apache-2.0 and unencumbered, runs the same
detect against the same .gitleaks.toml config.

No other change. Pinned gitleaks 8.21.2 for reproducibility.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ixtures)

The gitleaks workflow on api#95 was failing on 7 findings that are
ALL pre-existing test fixtures or .env.example placeholders, never
live secrets:

  .env.example                                    FERNET_KEY placeholder
  tests/integration/test_phase6_jws_sender_claim  Ed25519 test fixture
  tests/unit/test_crypto.py                       Ed25519 test fixture
  tests/unit/test_structured_log.py (x2 commits)  sk-* log-redactor input

Accept them via .gitleaksignore (one line per fingerprint = exact
commit-sha:file:rule:line tuple). The fingerprint changes if the file
is moved or the line shifts, so this list cannot mask future drift.
NEW findings in current code still fail CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…time)

The leaked INTERNAL_SIGNUP_KEY (master X-Ainfera-Internal-Key) is still
live in prod. Auth has a single verifier (settings.internal_signup_key)
with no dual-key support, so rotating it would 401 the customer dashboard
(Vercel) and DO-fleet heartbeats during cutover.

Add Settings.verify_internal_key(presented) — constant-time check against
the active key plus an optional internal_signup_key_previous. Set previous
to the OLD value during the rotation window so web + fleet roll onto the
new key with zero downtime, then unset to retire the leaked key.

Route all 7 comparison sites (deps, ownership x2, signup, heartbeat,
capture_metrics, install) through the helper — also upgrades them from
`==`/`!=` to hmac.compare_digest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

linear-code Bot commented May 29, 2026

AIN-289 🔴 [SECURITY] Rotate leaked ai_infera_* keys committed in .launch-snapshots/e2e-env.sh

🔴 Live production bearer keys committed to git. ~/code/ainfera-ai/.launch-snapshots/e2e-env.sh (line ~5) contains working ai_infera_* keys for 3 active agents — verified in prod (dftfpwzqxoebwzepygzl):

| Agent | Status | Daily cap | Per-call cap | Last call |
| --- | --- | --- | --- | --- |
| yavanna | active | $15 | $1.00 | 2026-05-26 |
| aule | active | $10 | $0.75 | 2026-05-26 |
| namo | active | $5 | $0.50 | 2026-05-26 |

Anyone with repo (or git-history) read access can spend against Ainfera's provider accounts up to these caps daily. Caps bound the blast radius — contain-and-rotate, not catastrophe — but rotate today.

Sequence (founder/terminal — credentialed actions)

  1. REVOKE the 3 exposed keys now (status=revoked) — bleed-stop, before the cleanup PR.
  2. RE-ISSUE via issuance flow → update Doppler.
  3. SCRUB: git rm the file → add .launch-snapshots/ to .gitignore → history-scrub (git filter-repo) → force-push. Keys persist in history; deletion alone is insufficient.
  4. Audit: check provider dashboards for anomalous spend on the 3 keys since 2026-05-15 (file creation).

Branch

security/rotate-launch-snapshot-keys (Tulkas-class finding; Aulë executes the scrub PR; founder runs revoke/re-issue).

Done when

  • 3 keys revoked + re-issued; Doppler updated; fleet probes green on new keys
  • file removed + .gitignore'd + history scrubbed + force-pushed
  • provider spend audited for anomalies
  • grep the org for any other committed ai_infera_* / secrets (sweep)

Found during AIN-285 trace (probe agent 5298a483 = aule per e2e-env.sh:5).


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.


Comment thread ainfera_api/config.py
candidates = [self.internal_signup_key]
if self.internal_signup_key_previous:
    candidates.append(self.internal_signup_key_previous)
return any(hmac.compare_digest(presented, candidate) for candidate in candidates)

any() short-circuits, breaking claimed constant-time guarantee

Low Severity

The docstring documents verify_internal_key as a "Constant-time check," but any() short-circuits on the first True result. When internal_signup_key_previous is set, a match against the active (first) key returns after one hmac.compare_digest call, while a match against the previous key or a rejection requires two. This timing difference leaks which candidate matched and whether the rotation window is open. Replacing the generator-based any() with a list comprehension (evaluating all comparisons unconditionally) and then reducing with any() or | would preserve the constant-time property end-to-end.
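
One way to apply the suggestion is sketched below. The function name and signature are hypothetical and this is not the fix that Bugbot Autofix produced. Inputs are encoded to bytes so that `hmac.compare_digest` does not raise on non-ASCII strings.

```python
import hmac
from typing import Sequence


def verify_any_without_short_circuit(presented: str, candidates: Sequence[str]) -> bool:
    """Compare against every candidate, regardless of earlier matches."""
    presented_bytes = presented.encode("utf-8")
    matched = False
    for candidate in candidates:
        # `|=` always evaluates compare_digest, unlike `or` or any() over a
        # generator, which stop at the first True.
        matched |= hmac.compare_digest(presented_bytes, candidate.encode("utf-8"))
    return matched
```

This removes the difference between matching the first and the second candidate. The number of comparisons still depends on how many candidates are configured, so hiding whether the rotation window is open would need an additional measure, such as always comparing against a fixed number of values.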


@hizrianraz hizrianraz marked this pull request as ready for review May 29, 2026 04:54
@hizrianraz hizrianraz merged commit 616d76d into main May 29, 2026
5 checks passed
@hizrianraz hizrianraz deleted the hizrianraz/ain-289-charter-security-tooling branch May 29, 2026 04:54