feat: zero-config auth - interactive PAT setup, auto-rotation, all CLIs (#81, #83) #82

Merged: datasciencemonkey merged 40 commits into main from feat/pat-auto-rotation on Mar 28, 2026

@datasciencemonkey (Owner) commented Mar 27, 2026

Summary

The app deploys with zero secrets and zero pre-configuration. On first terminal session, the user pastes a short-lived token. The app validates it, configures all CLIs (Claude, Codex, OpenCode, Gemini, Databricks), runs setup, and starts auto-rotation. Old tokens are revoked every 10 minutes. No secret scopes, no SP credential leakage, no pre-deployment setup.

Fixes #81, fixes #83

What Changed

| Area | Before | After |
| --- | --- | --- |
| Deploy requirements | Must mint PAT + set app secret | Nothing: deploy and go |
| Owner resolution | PAT → current_user.me() | SP → app.creator (stripped after) |
| Auth check | PAT-derived owner vs request | X-Forwarded-Email vs app.creator |
| SP credentials | Stripped at startup (hardcoded) | Used for owner resolution, then stripped |
| Token lifecycle | Manual, long-lived | Auto-rotated every 10 min, 15 min lifetime |
| PAT validation | SDK WorkspaceClient (SP fallback bug) | Direct HTTP (no fallback) |
| CLI install | Crashes without token | Installs after PAT, during setup |
| Setup timing | Runs at boot (before PAT) | Runs after PAT is provided |
| Secret scope | Required for persistence | Removed entirely |
| Loading page | Snake game while setup runs at boot | Removed: setup waits inline in terminal |

User Flow

1. Deploy app               → zero config
2. App boots                 → SP resolves owner → SP creds stripped
3. User opens terminal       → "Paste a token to let the agent act on your behalf"
4. User pastes PAT           → validates (direct HTTP) → rotation starts
5. Setup runs                → installs + configures all CLIs (inline in terminal)
6. Terminal session starts   → Claude, Codex, OpenCode, Gemini, Databricks all ready
7. Every 10 min              → mint new 15-min PAT → delete old PAT
8. App restarts              → back to step 3

Key Fixes in This PR

Fix: SP credential fallback bug

The Databricks SDK silently falls back to SP credentials (DATABRICKS_CLIENT_ID/SECRET) when PAT validation fails. This caused /api/pat-status to return valid: true for dead PATs. Fixed by:

  1. Stripping SP credentials from env after owner resolution
  2. Using direct HTTP requests for PAT validation instead of SDK
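A minimal sketch of the two steps above, not the PR's actual code. The function names, the injectable `opener` parameter, and the choice of the SCIM `Me` endpoint as the validation call are assumptions made for illustration. The point it shows is that a direct request carries only the pasted token, so there is no credential chain to fall back on.

```python
import json
import urllib.error
import urllib.request

# SP credential variables named in the PR description.
SP_ENV_VARS = ("DATABRICKS_CLIENT_ID", "DATABRICKS_CLIENT_SECRET")


def strip_sp_credentials(env):
    """Remove SP credentials from an environment mapping; return the names removed."""
    removed = [name for name in SP_ENV_VARS if name in env]
    for name in removed:
        del env[name]
    return removed


def validate_pat(host, token, opener=urllib.request.urlopen, timeout=10):
    """Return the user name for a valid PAT, or None if the workspace rejects it.

    Assumption: the SCIM "Me" endpoint is used as the identity check.
    Only the supplied token is sent, so a dead PAT cannot succeed via SP fallback.
    """
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/preview/scim/v2/Me",
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with opener(request, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        if exc.code in (401, 403):
            return None  # token is invalid, expired, or revoked
        raise  # other failures are not evidence about the token
    return body.get("userName")
```

In an app, `strip_sp_credentials(os.environ)` would run once owner resolution has finished, and `validate_pat` would back the PAT endpoints.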

Fix: Setup crashes without token

setup_claude.py crashed with KeyError on os.environ["DATABRICKS_TOKEN"] when no token was set. Fixed by: splitting CLI install (no token needed) from auth config (needs token), and deferring setup to after PAT is provided.
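The split described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of setup_claude.py: `install_cli`, `configure_auth`, and `run_setup` are hypothetical names, and the install and auth steps are placeholders that only return a description of what they would do.

```python
import os


def install_cli(name):
    """Placeholder for the install step, which needs no token."""
    return f"installed {name}"


def configure_auth(name, token):
    """Placeholder for the auth step, which needs a token."""
    return f"configured {name}"


def run_setup(name, env=None):
    """Always install; configure auth only when a token is present."""
    env = os.environ if env is None else env
    steps = [install_cli(name)]
    # .get() returns None for a missing key, where env["DATABRICKS_TOKEN"]
    # would raise KeyError.
    token = env.get("DATABRICKS_TOKEN")
    if token:
        steps.append(configure_auth(name, token))
    else:
        steps.append(f"skipped auth for {name}: no token yet")
    return steps
```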

Fix: Session-aware rotation

Rotation only runs while active sessions exist. No wasted API calls minting tokens overnight when nobody's using the app.
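A rough sketch of one session-aware rotation cycle. `RotatorSketch`, `Token`, and the `mint`/`revoke` callables are hypothetical stand-ins for whatever pat_rotator.py does internally. The interval and lifetime come from this PR, and `session_count_fn` follows the parameter name mentioned in the commit messages.

```python
import threading
from dataclasses import dataclass

ROTATION_INTERVAL_S = 10 * 60  # rotate every 10 minutes
TOKEN_LIFETIME_S = 15 * 60     # each token lives 15 minutes (5-minute overlap)


@dataclass
class Token:
    token_id: str
    value: str


class RotatorSketch:
    """Hypothetical stand-in for the PR's PATRotator."""

    def __init__(self, mint, revoke, session_count_fn, current=None):
        self._mint = mint                          # callable: lifetime_s -> Token
        self._revoke = revoke                      # callable: token_id -> None
        self._session_count_fn = session_count_fn  # callable: () -> int
        self.current = current

    def rotate_once(self):
        """Run one cycle; return "skipped" or "rotated"."""
        if self._session_count_fn() == 0:
            return "skipped"  # nobody is connected, so make no API calls
        old = self.current
        # Mint first, revoke second, so a usable token always exists.
        self.current = self._mint(TOKEN_LIFETIME_S)
        if old is not None:
            self._revoke(old.token_id)
        return "rotated"

    def run(self, stop):
        """Rotate every interval until the stop event is set."""
        while not stop.wait(ROTATION_INTERVAL_S):
            self.rotate_once()
```

Passing the session count in as a callable keeps the rotator decoupled from the session store, which also makes the skip path easy to unit test.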

Fix: Snake game loading page removed

The loading page with snake game was dead code after deferring setup to after PAT. Setup progress now displays inline in the terminal.

Files Changed

| File | Change |
| --- | --- |
| pat_rotator.py | New: session-aware PAT rotation (10 min interval, 15 min lifetime) |
| app.py | PAT endpoints (direct HTTP validation), owner via Apps API, SP stripping, setup deferred to after PAT, simplified index route |
| app.yaml | Removed DATABRICKS_TOKEN, removed secret scope env vars, removed resources section |
| setup_claude.py | Install CLI without token, skip auth config until PAT provided |
| setup_codex.py | Install CLI without token, skip auth config until PAT provided |
| setup_opencode.py | Install CLI without token, skip auth config until PAT provided |
| setup_gemini.py | Install CLI without token, skip auth config until PAT provided |
| setup_databricks.py | Skip config until PAT provided (CLI pre-installed) |
| static/index.html | Interactive PAT prompt, setup polling after PAT config |
| static/loading.html | Deleted: snake game no longer needed |
| tests/test_pat_rotator.py | 19 unit tests |
| tests/test_pat_rotation_integration.py | 6 integration tests |
| tests/test_heartbeat.py | Fixed pre-existing missing lock in test fixture |
| README.md | Updated for zero-config deploy |
| docs/deployment.md | Updated for interactive PAT flow |

Test Plan

Automated (102 tests, all passing)

uv run pytest tests/ -v --ignore=tests/test_npm_version_pinning.py

Manual Testing on Databricks Apps (test-coda on 9cefok)

1. Verify zero-config deploy

  • App starts without DATABRICKS_TOKEN secret
  • Logs: Owner resolved from app.creator: sathish.gangichetty@databricks.com
  • Logs: PAT rotation: no token configured - rotation disabled
  • Logs: SP credentials stripped - PAT-only auth from this point (confirmed 2026-03-27T23:59:25Z and 2026-03-28T00:05:19Z)
  • No setup thread at boot (setup deferred to after PAT)

2. Verify interactive PAT setup

  • Open app in browser
  • Terminal shows yellow message: "Databricks CLI is not configured"
  • Terminal shows instructions with workspace URL link
  • Terminal shows Token: prompt with cursor
  • Paste token (characters masked with *), press Enter
  • See: Validating token...
  • See: Token configured for sathish.gangichetty@databricks.com
  • See: Auto-rotation started. This token will be rotated out in 10 minutes.
  • Invalid token correctly rejected with 400
  • Setup triggers after PAT (logs: Setup triggered after PAT configuration at 00:07:00Z)
  • Setup status polling shown (14x /api/setup-status 200 from 00:07:00-00:07:15)
  • Setup completes → session starts (POST /api/session at 00:07:15)

3. Verify all CLIs configured

  • Logs: Claude CLI auth configured (00:06:22)
  • Logs: Databricks CLI auth configured (00:06:22)
  • Logs: CLI config updated: setup_codex.py (00:06:26)
  • Logs: CLI config updated: setup_opencode.py (00:06:35)
  • Logs: CLI config updated: setup_gemini.py (00:07:00)
  • claude --version → prints version
  • claude → starts Claude Code
  • databricks current-user me → returns your identity
  • databricks clusters list → works

4. Verify PAT rotation in logs

  • PAT rotation starting - minting new short-lived token... (confirmed multiple cycles)
  • PAT persisted: env var + ~/.databrickscfg updated
  • PAT rotation complete - new token... Old token ELIMINATED
  • Repeats every 10 minutes (confirmed 15+ consecutive rotations)

5. Verify session awareness

  • Close all terminal tabs
  • Logs: PAT rotation: no active sessions - skipping rotation (23:21:45Z)
  • Open new tab → rotation resumes on next cycle (session 23:22:37, rotation 23:31:45)

6. Verify restart recovery

  • Restart the app (redeployed)
  • Logs: PAT rotation: no token configured - rotation disabled
  • Open terminal: shows PAT setup prompt
  • Paste new PAT → rotation started, all CLIs configured

7. Verify stale token handling (SP fallback fix)

  • Stale PAT causes 403 on rotation: create failed (403): Invalid access token (23:31-23:51)
  • Revoke ALL PATs → open new tab → PAT prompt appears (no SP fallback)
  • Paste new PAT → works normally

8. Second tab (setup already done)

  • After initial setup, second session created immediately (00:13:41, no setup polling)
  • Verify "Setting up CLI tools..." message does NOT appear

This pull request was AI-assisted by Claude Code.

Mints a new 2-hour PAT every 90 minutes, persists to env var,
~/.databrickscfg, and optionally to a Databricks app secret,
then revokes the old PAT. Includes 19 tests covering rotation
logic, persistence, lifecycle, and logging.
…entials (#81)

Owner resolution no longer depends on PAT. Uses the auto-provisioned SP
to call w.apps.get().creator and matches against X-Forwarded-Email.
Falls back to PAT-based resolution for backward compat.
… create (#81)

- Rotation interval: 10 min (less API churn overnight)
- Token lifetime: 15 min (5-min overlap buffer)
- ensure_fresh() called on session creation β€” if token age > 8 min,
  rotate immediately so user never starts with a stale token
- Remove ensure_fresh() (unnecessary with overlap buffer)
- Rotation only fires when active sessions exist
- No sessions = no API churn (no pointless token minting overnight)
- 10-min rotation interval, 15-min token lifetime (5-min overlap)
- Pass session_count_fn to PATRotator for decoupled session awareness
@datasciencemonkey datasciencemonkey changed the title feat: auto-rotate short-lived PATs for zero-friction auth (#81) feat: short-lived PAT auto-rotation with interactive setup (#81, #83) Mar 27, 2026
@datasciencemonkey datasciencemonkey self-assigned this Mar 27, 2026
@datasciencemonkey datasciencemonkey changed the title feat: short-lived PAT auto-rotation with interactive setup (#81, #83) feat: zero-config auth - interactive PAT setup, auto-rotation, all CLIs (#81, #83) Mar 27, 2026
The index route was treating "pending" (setup not started yet, waiting
for PAT) the same as "running" (setup actively in progress), causing
the loading/snake page to appear immediately instead of index.html
with the PAT prompt. Now only shows loading.html when setup is actively
running.
With deferred setup (runs after PAT, not at boot), the wait happens
inside the terminal via polling. The loading.html snake game page
was unreachable dead code.
When the user pastes a PAT, immediately rotate it into a short-lived
token we own (with a known token ID). This ensures the first background
rotation can revoke the old token instead of logging "no old token to
revoke." The user-pasted PAT becomes unused after the initial mint.
Add _last_rotation_time to PATRotator, set on every successful mint.
pat-status now checks is_token_expired first β€” if the token lifetime
has elapsed (no rotation while sessions were idle), immediately returns
valid: false to show the PAT prompt. Avoids a wasted API call to
validate a known-dead token.
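The expiry-first check described in this commit message can be sketched as below. `is_token_expired` takes its name from the message, while its signature, the `pat_status` helper, and the `validate` callable are assumptions. Treating a missing rotation time as "not provably expired" is also an assumption, chosen so that the caller falls through to live validation.

```python
import time

TOKEN_LIFETIME_S = 15 * 60  # token lifetime stated in this PR


def is_token_expired(last_rotation_time, now=None, lifetime_s=TOKEN_LIFETIME_S):
    """True when the last minted token must already be dead."""
    if last_rotation_time is None:
        return False  # assumption: no record means expiry is unknown
    now = time.time() if now is None else now
    return now - last_rotation_time >= lifetime_s


def pat_status(last_rotation_time, validate, now=None):
    """Skip the network round trip when the token is known to be expired."""
    if is_token_expired(last_rotation_time, now=now):
        return {"valid": False}
    # Only tokens that might still be alive get a live validation call.
    return {"valid": bool(validate())}
```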
Adds app_state.py β€” a shared JSON file at ~/.coda/app_state.json that
holds app_owner (set at boot) and last_rotation_time/last_token_id
(set on every rotation). Admins can inspect this file for diagnostics.
The rotator loads last_rotation_time on init so is_token_expired works
across app restarts.
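A minimal sketch of a shared JSON state file along these lines, not the actual app_state.py. The function names, the merge-on-write behaviour, the 0o600 file mode, and the treatment of a corrupt file as empty state are assumptions. The field names `app_owner`, `last_rotation_time`, and `last_token_id` come from the commit message.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path):
    """Return the stored state, or {} if the file is missing or unreadable as JSON."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, ValueError):  # ValueError covers bad JSON and bad UTF-8
        return {}
    return data if isinstance(data, dict) else {}


def update_state(path, **fields):
    """Merge fields into the state file and write it back atomically."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    state = load_state(path)
    state.update(fields)
    # Write to a temp file in the same directory, then swap it into place,
    # so a reader never sees a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".app_state-")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.chmod(tmp_name, 0o600)  # owner-only; the file holds token IDs
    os.replace(tmp_name, path)
    return state
```

With this shape, boot code would call `update_state(path, app_owner=...)` once and the rotator would call it with `last_rotation_time` and `last_token_id` after each mint, without either writer clobbering the other's fields.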
Clean up app_state wiring:
- Move import app_state to top-level in app.py
- pat_rotator writes to app_state.json on every rotation (not just
  initial mint), keeping on-disk state current for admin inspection
- Add /api/app-state endpoint for admin diagnostics
- Remove duplicate write from configure_pat (rotator handles it)
- Add 8 tests for app_state round-trip, merge, permissions, corruption
Pin mlflow to 3.10.1 and all other packages to their current resolved
versions for reproducible deploys.
- pyasn1 0.6.3 fixes GHSA-jr27-m4p2-rc6r (DoS via recursive decoding)
- pyjwt 2.12.1 fixes GHSA-752w-5fwx-jx9f (crit header bypass)
- cryptography 46.0.6, requests 2.33.0, pygments fix: not released
  yet, ignoring in audit until available
requests 2.33.0 fixes the predictable temp file extraction vuln but
hasn't landed on PyPI yet. Pin to the v2.33.0 tag commit SHA from
GitHub. Remove the audit ignore for this CVE.
We only use mlflow.claude_code.hooks (in mlflow-tracing). The [genai]
extra pulled in litellm → tokenizers → huggingface-hub → typer → rich
→ pygments (GHSA-5239-wwwm-4pmq, no fix). Switching to mlflow-tracing
drops ~40 transitive deps including pygments, scikit-learn, scipy.
Constrained cryptography>=46.0.6 which resolved by downgrading
google-auth to 2.47.0 (no cryptography dependency). Removes the last
--ignore-vuln from the audit workflow. All 5 CVEs now resolved.
mcp and cryptography conflict: mcp>=1.23 needs google-auth which needs
cryptography, but 46.0.6 isn't on PyPI yet. Prioritize mcp fix (DNS
rebinding) over cryptography (low-impact X.509 name constraints).
Weekly audit will catch when 46.0.6 lands.
Install cryptography 46.0.6 from GitHub tag (not on PyPI proxy yet).
Zero --ignore-vuln flags remaining: all 6 CVEs resolved.
@datasciencemonkey datasciencemonkey merged commit cd545a2 into main Mar 28, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

feat: interactive PAT setup on first session - eliminate pre-configured secrets
feat: auto-rotate short-lived PATs for zero-friction auth