feat(cli): improve agent discoverability and add headless auth login by rafa-thayto · Pull Request #291 · clerk/cli

rafa-thayto · 2026-05-15T19:50:20Z

Summary

Closes the five agentcli-bench gaps (D3, A4, P3, P2, T7) and adds a clerk auth login --token <key> flow for CI / agents.

- Top-level help discoverability: clerk --help now renders an Examples: block and an Environment: section listing the five CLERK_* env vars the binary actually reads (CLERK_SECRET_KEY, CLERK_MODE, CLERK_CONFIG_DIR, CLERK_UPDATE_CHANNEL, CLERK_NO_UPDATE_CHECK). Implemented via a new setEnvVars() declaration-merging helper in lib/help.ts, mirroring the existing setExamples() pattern.
- --json field documentation: apps list|create, users list|create, and doctor --json now describe the JSON shape in the option description so agents/consumers know what to expect.
- Headless authentication (clerk auth login --token <key>): accepts a Clerk PLAPI access token inline or via - (stdin). Validates JWT shape, size cap (8 KB), and azp audience claim locally before the userinfo network call, then persists with no refresh token. A --token - invocation on a TTY refuses up-front instead of hanging on EOF. Sibling awaitConcurrentRefresh skips the race-detection loop for token-only sessions so two parallel logins don't collide on the empty-refresh sentinel.
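The local checks described in the headless-authentication item could look roughly like the sketch below. It is not the PR's assertValidAccessToken or getJwtAuthorizedParty implementation: the function names checkTokenShape and readAuthorizedParty, the error messages, and the reading of "8 KB" as 8 * 1024 characters are assumptions made for illustration. No signature verification is attempted here.

```typescript
// Hypothetical sketch of local pre-flight checks; names and messages are assumptions.
const MAX_TOKEN_CHARS = 8 * 1024; // the PR states an 8 KB cap; JWTs are ASCII, so chars ~ bytes
const BASE64URL_SEGMENT = /^[A-Za-z0-9_-]+$/;

// Decode one base64url segment into a JSON object, or return undefined on any failure.
// atob yields a binary string, which is enough for the ASCII claims inspected here.
function decodeSegment(segment: string | undefined): Record<string, unknown> | undefined {
  if (!segment) return undefined;
  try {
    const base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
    const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
    const parsed: unknown = JSON.parse(atob(padded));
    return typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)
      ? (parsed as Record<string, unknown>)
      : undefined;
  } catch {
    return undefined;
  }
}

// Throws if the value cannot be a JWT. Runs before any network call.
function checkTokenShape(token: string): void {
  if (token.length > MAX_TOKEN_CHARS) {
    throw new Error("token exceeds the 8 KB size cap");
  }
  const parts = token.split(".");
  if (parts.length !== 3 || !parts.every((part) => BASE64URL_SEGMENT.test(part))) {
    throw new Error("expected a JWT (three base64url segments)");
  }
  if (!decodeSegment(parts[0]) || !decodeSegment(parts[1])) {
    throw new Error("expected a JWT with a JSON header and payload");
  }
}

// Returns the azp claim if present so a caller can compare it to an expected value.
function readAuthorizedParty(token: string): string | undefined {
  const payload = decodeSegment(token.split(".")[1]);
  const azp = payload?.["azp"];
  return typeof azp === "string" ? azp : undefined;
}
```

With checks of this kind, a secret key such as sk_test_xxx fails on the segment count alone, which matches the "expected a JWT" rejection listed in the test plan.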

Background: vault handoff doc — diffs the Clerk CLI's agentcli-bench score (44.5) against resend (55.7) and prescribes the five gap closures landed here. Expected score after this PR is ~52–55 overall.

A property test guards the Environment: list against drift — every documented CLERK_* name must actually be read in cli-core/src/.
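A drift guard of this kind can be reduced to a pure function over the documented names and the source text. The sketch below is an assumption about the general shape, not the test in the PR: findUnreadEnvVars is a hypothetical name, and how the real test collects the files under cli-core/src/ is not shown in this thread.

```typescript
// Hypothetical drift check: every documented CLERK_* name must appear in some source text.
function findUnreadEnvVars(
  documented: readonly string[],
  sources: readonly string[],
): string[] {
  return documented.filter((name) => {
    // Match the name as a whole identifier so CLERK_MODE does not match CLERK_MODE_X.
    // Names are assumed to contain only A-Z, 0-9 and underscores, so no regex escaping is needed.
    const pattern = new RegExp(`(^|[^A-Z0-9_])${name}([^A-Z0-9_]|$)`);
    return !sources.some((text) => pattern.test(text));
  });
}
```

A test would then assert that the returned list is empty for the names rendered in the Environment: section.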

Test plan

- bun run format / bun run lint / bun run typecheck / bun run test pass locally (CI will re-run)
- clerk --help renders the new Examples: and Environment: sections
- clerk auth login --help shows --token <key> and references CLERK_SECRET_KEY for per-instance API access
- clerk auth login --token <jwt> with a fresh valid token logs in without OAuth and persists the session
- clerk auth login --token sk_test_xxx rejects with a clear "expected a JWT" message before hitting the network
- clerk auth login --token - refuses on a TTY and reads from stdin when piped
- cat token.txt | clerk auth login --token - works end-to-end
- clerk users list --json and clerk apps list --json still emit pipeable JSON
- Re-run agentcli-bench against the freshly-compiled binary and confirm D3, A4, P3, P2, T7 improve as expected

changeset-bot · 2026-05-15T19:50:24Z

🦋 Changeset detected

Latest commit: f13d4c5

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
clerk	Minor

Closes the five agentcli-bench gaps (D3, A4, P3, P2, T7) and adds a `clerk auth login --token <key>` flow for CI / agents:

- Top-level `Examples:` block on `clerk --help` (D3)
- New `Environment:` help section via `setEnvVars()`, documenting the five `CLERK_*` env vars the binary actually reads (A4)
- `--json` field descriptions on `apps list|create`, `users list|create`, and `doctor --json` so consumers know the shape (P3)
- Verified `--json` + `isAgent()` coverage across data-returning subcommands (P2)
- `clerk auth login --token <key>` for headless auth: accepts a Clerk PLAPI access token (or `-` for stdin), validates JWT shape and audience (`azp` claim, soft check with back-compat) locally before the userinfo call, persists with no refresh token. Sibling `awaitConcurrentRefresh` skips the race-detection loop for token-only sessions so two parallel logins don't collide on the empty-refresh sentinel (T7)

A property test guards the `Environment:` list against drift: every documented `CLERK_*` name must be one the CLI actually reads.

login.ts now imports storeAccessToken, assertValidAccessToken, and getJwtAuthorizedParty from credential-store.ts. The shared test stubs were missing these exports, causing login.test.ts to fail with "Export named 'storeAccessToken' not found" when Bun resolved the mocked module.

P9: agentcli-bench rubric. Pair --quiet with existing --verbose so agents can pin log verbosity in either direction. Sets log level to 'error' which keeps fatal output but silences info/warn/success.

P8: agentcli-bench rubric. Color was emitted unconditionally; now gated on stdout TTY detection, the NO_COLOR env var, and the new --no-color global flag. Inline highlight() and tag-prefix codes in log.ts honor the same gate. log.test.ts explicitly forces color on since its assertions inspect ANSI sequences.
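The three-way gate can be expressed as a small pure function. This is a sketch under assumptions: shouldUseColor and the ColorInputs shape are hypothetical, the precedence order is a guess, and the NO_COLOR handling follows the common convention that any non-empty value disables color. The real log.ts may differ.

```typescript
// Hypothetical inputs; a real CLI would read these from process.stdout, process.env and the parsed flags.
interface ColorInputs {
  isTTY: boolean; // whether stdout is a terminal
  env: Record<string, string | undefined>;
  noColorFlag: boolean; // true when --no-color was passed
}

function shouldUseColor({ isTTY, env, noColorFlag }: ColorInputs): boolean {
  if (noColorFlag) return false;
  // NO_COLOR convention: present and non-empty disables color, whatever the value.
  const noColor = env["NO_COLOR"];
  if (noColor !== undefined && noColor !== "") return false;
  return isTTY;
}

// Inline helpers honor the same gate instead of emitting ANSI codes unconditionally.
function bold(text: string, colorEnabled: boolean): string {
  return colorEnabled ? `\u001b[1m${text}\u001b[22m` : text;
}
```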

P5: agentcli-bench rubric. Bumps EXIT_CODE.USAGE from 2 to 64 and adds DATAERR(65), UNAVAILABLE(69), SOFTWARE(70), TEMPFAIL(75), NOPERM(77) for use by retryable/transient error classification. Wires program.exitOverride() so Commander's unknownOption / unknownCommand / missingArgument errors funnel through runProgram and exit with EX_USAGE instead of Commander's default 1.

Agents can now branch on exit code alone:

- 64: bad invocation, fix the command
- 75: transient/network, retry
- 77: auth, re-authenticate

Tests that use EXIT_CODE.USAGE symbolically are unaffected by the numeric bump.
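From the agent's side, branching on these codes might look like the sketch below. The numeric values are the ones named in the commit message; the object lists only those members (the real EXIT_CODE may have more), and AgentAction and actionForExitCode are hypothetical names.

```typescript
// Values taken from the commit message; other members of the real EXIT_CODE are omitted.
const EXIT_CODE = {
  USAGE: 64,
  DATAERR: 65,
  UNAVAILABLE: 69,
  SOFTWARE: 70,
  TEMPFAIL: 75,
  NOPERM: 77,
} as const;

type AgentAction = "fix-command" | "retry" | "re-authenticate" | "give-up";

// Maps the three codes the commit message calls out; everything else is treated as fatal here.
function actionForExitCode(code: number): AgentAction {
  switch (code) {
    case EXIT_CODE.USAGE:
      return "fix-command";
    case EXIT_CODE.TEMPFAIL:
      return "retry";
    case EXIT_CODE.NOPERM:
      return "re-authenticate";
    default:
      return "give-up";
  }
}
```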

R3 + R7: agentcli-bench rubric. Every outputJsonError() now emits {code, message, retryable, nextStep, docsUrl?, errors?}.

- retryable: HTTP 408/425/429/5xx, plus network ECONNREFUSED/RESET/ETIMEDOUT/EAI_AGAIN/'fetch failed', are flagged true so agents can implement a single retry loop.
- nextStep: per-class remedy ('retry with backoff', 'check connectivity with clerk doctor', 'run clerk --help').
- exitCode: 4xx auth → EX_NOPERM (77); 5xx → EX_UNAVAILABLE (69); 429/408/425 → EX_TEMPFAIL (75); other → 1/SOFTWARE.

Combined R3+R7 because both extend the same JSON shape; splitting would have made the second commit a single-field add.
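A rough sketch of this classification follows. Only the envelope field names, the status lists, and the exit-code pairs come from the commit message. The helper names, the reading of "RESET" as ECONNRESET, the reading of "4xx auth" as 401/403, and the pairing of each nextStep string with an error class are assumptions.

```typescript
// Hypothetical sketch; not the PR's outputJsonError() implementation.
interface ErrorEnvelope {
  code: string;
  message: string;
  retryable: boolean;
  nextStep: string;
  docsUrl?: string;
  errors?: unknown[];
}

const TEMPFAIL_STATUSES = [408, 425, 429];
// "RESET" in the commit message is read here as ECONNRESET (assumption).
const NETWORK_MARKERS = ["ECONNREFUSED", "ECONNRESET", "ETIMEDOUT", "EAI_AGAIN", "fetch failed"];

function isServerError(status: number): boolean {
  return status >= 500 && status <= 599;
}

function isNetworkFailure(message: string): boolean {
  return NETWORK_MARKERS.some((marker) => message.includes(marker));
}

function isRetryable(status: number | undefined, message: string): boolean {
  if (status === undefined) return isNetworkFailure(message);
  return TEMPFAIL_STATUSES.includes(status) || isServerError(status);
}

function exitCodeFor(status: number | undefined): number {
  if (status === undefined) return 1;
  if (TEMPFAIL_STATUSES.includes(status)) return 75; // EX_TEMPFAIL
  if (status === 401 || status === 403) return 77; // EX_NOPERM; "4xx auth" assumed to mean 401/403
  if (isServerError(status)) return 69; // EX_UNAVAILABLE
  return 1; // the message says "1/SOFTWARE"; this sketch picks 1
}

function toEnvelope(code: string, message: string, status?: number): ErrorEnvelope {
  const retryable = isRetryable(status, message);
  // Which remedy goes with which class is an assumption.
  const nextStep =
    status === undefined && isNetworkFailure(message)
      ? "check connectivity with clerk doctor"
      : retryable
        ? "retry with backoff"
        : "run clerk --help";
  return { code, message, retryable, nextStep };
}
```

An agent consuming this envelope needs only one loop: retry while retryable is true, otherwise surface nextStep.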

D4: agentcli-bench rubric. Agents that don't want to parse --help can walk the JSON shape produced by 'clerk schema' to discover every subcommand, argument, and option (with choices, defaults, flags). Returns {cli, version, schemaVersion, command} where command is a recursive SchemaCommand node. schemaVersion=1 is the stable contract; breaking shape changes bump it.
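An agent could walk that output along these lines. The top-level keys {cli, version, schemaVersion, command} are from the commit message. The fields assumed on each SchemaCommand node (name and a nested commands array) and the helper listCommandPaths are hypothetical, since the full node shape is not spelled out here.

```typescript
// Assumed minimal node shape; the real SchemaCommand also carries arguments and options.
interface SchemaCommand {
  name: string;
  commands: SchemaCommand[];
}

interface SchemaDocument {
  cli: string;
  version: string;
  schemaVersion: number;
  command: SchemaCommand;
}

// Depth-first walk that returns every command path, e.g. "clerk users list".
function listCommandPaths(node: SchemaCommand, prefix: string[] = []): string[] {
  const path = [...prefix, node.name];
  const paths = [path.join(" ")];
  for (const child of node.commands) {
    paths.push(...listCommandPaths(child, path));
  }
  return paths;
}

// Refuse to interpret a shape this code was not written against.
function discover(doc: SchemaDocument): string[] {
  if (doc.schemaVersion !== 1) {
    throw new Error(`unsupported schemaVersion ${doc.schemaVersion}`);
  }
  return listCommandPaths(doc.command);
}
```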

P10: agentcli-bench rubric. JSON shape becomes {data, hasMore, nextCursor, pagination: {offset, limit}}. nextCursor encodes the next offset so agents can paginate forward without knowing the scheme — pass it back as --offset. Existing hasMore is retained as the canonical 'done?' signal.
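The envelope and the forward-pagination loop can be sketched as follows. The field names come from the commit message. Representing nextCursor as a stringified offset (null when there is no further page), and the helpers buildPage and collectAll, are assumptions. The fetch callback is synchronous only to keep the sketch short; a real agent would shell out to the CLI with --offset.

```typescript
interface Page<T> {
  data: T[];
  hasMore: boolean;
  nextCursor: string | null; // assumed encoding: the next offset as a string
  pagination: { offset: number; limit: number };
}

// Builds one page from an in-memory list (stand-in for the real API call).
function buildPage<T>(all: readonly T[], offset: number, limit: number): Page<T> {
  if (limit <= 0) throw new Error("limit must be positive");
  const data = all.slice(offset, offset + limit);
  const nextOffset = offset + data.length;
  const hasMore = nextOffset < all.length;
  return {
    data,
    hasMore,
    nextCursor: hasMore ? String(nextOffset) : null,
    pagination: { offset, limit },
  };
}

// Agent-side loop: hasMore is the 'done?' signal, nextCursor is passed back as the offset.
function collectAll<T>(fetchPage: (offset: number) => Page<T>): T[] {
  const out: T[] = [];
  let offset = 0;
  for (;;) {
    const page = fetchPage(offset);
    out.push(...page.data);
    if (!page.hasMore || page.nextCursor === null) return out;
    offset = Number(page.nextCursor);
  }
}
```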

R8: agentcli-bench rubric. apps create is non-idempotent by default — re-running creates duplicates. --if-not-exists looks up an app by name first and returns it (with reused:true in --json output) instead of creating a duplicate. The default behavior is preserved; agents that need idempotency opt in explicitly.
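The opt-in lookup-then-create behavior can be illustrated with an in-memory stand-in. Everything below is hypothetical apart from the reused: true marker: the AppStore interface, createApp, and memoryStore are illustrative names, and the real command talks to the API asynchronously.

```typescript
interface App {
  id: string;
  name: string;
}

// Hypothetical storage boundary; the real CLI calls the platform API instead.
interface AppStore {
  findByName(name: string): App | undefined;
  create(name: string): App;
}

interface CreateResult extends App {
  reused?: true; // present only when an existing app was returned
}

function createApp(store: AppStore, name: string, ifNotExists: boolean): CreateResult {
  if (ifNotExists) {
    const existing = store.findByName(name);
    if (existing) return { ...existing, reused: true };
  }
  // Default path is unchanged: always create, even if the name already exists.
  return store.create(name);
}

// In-memory store used only to demonstrate the behavior.
function memoryStore(): AppStore & { all: App[] } {
  const all: App[] = [];
  return {
    all,
    findByName: (name) => all.find((app) => app.name === name),
    create: (name) => {
      const app: App = { id: `app_${all.length + 1}`, name };
      all.push(app);
      return app;
    },
  };
}
```

A lookup followed by a create is not atomic, so two concurrent callers could still both create; the sketch does not attempt to address that.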

Commander's recursive parent chain has concrete generic parameters that don't unify across heterogeneous subcommands, so importing Command<Args, Opts, GlobalOpts> for typing the walker fails strict typecheck. Replace with a CommandLike interface that captures only the introspection surface we need (name/aliases/description/registeredArguments/options/commands).
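A structural interface of that kind might look like the sketch below. The six member names come from the commit message; their exact types, the ArgumentLike and OptionLike helpers, and the SchemaNode output shape are assumptions, so this is not the interface in schema/index.ts. The intent is that a Commander Command satisfies it structurally without its generic parameters appearing anywhere.

```typescript
// Narrow views of Commander's Argument and Option (assumed subset of their members).
interface ArgumentLike {
  name(): string;
  required: boolean;
}

interface OptionLike {
  flags: string;
  description: string;
}

// Only the introspection surface the walker needs; no generic parameters to unify.
interface CommandLike {
  name(): string;
  aliases(): string[];
  description(): string;
  readonly registeredArguments: readonly ArgumentLike[];
  readonly options: readonly OptionLike[];
  readonly commands: readonly CommandLike[];
}

// Hypothetical output node for illustration.
interface SchemaNode {
  name: string;
  aliases: string[];
  description: string;
  arguments: { name: string; required: boolean }[];
  options: { flags: string; description: string }[];
  commands: SchemaNode[];
}

function toSchemaNode(cmd: CommandLike): SchemaNode {
  return {
    name: cmd.name(),
    aliases: cmd.aliases(),
    description: cmd.description(),
    arguments: cmd.registeredArguments.map((arg) => ({ name: arg.name(), required: arg.required })),
    options: cmd.options.map((opt) => ({ flags: opt.flags, description: opt.description })),
    commands: cmd.commands.map((child) => toSchemaNode(child)),
  };
}
```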

…probes

…nvelope

Update tests to match the sysexits exit code changes (EXIT_CODE.USAGE is now 64, not 2; HTTP 500 errors now exit with 69/EX_UNAVAILABLE). Update users list JSON assertions to include the new nextCursor and pagination fields added to the agent-mode envelope. Remove the stderr assertion from the input-json test for Commander's "unknown option" message, which is written via process.stderr.write and not captured by the test harness's log capture.

`OAUTH_SECTION_INTRO` was a module-level constant that baked in bold() at import time, before tests could call setColorEnabled(true). Convert it to a lazy function so color formatting is evaluated at call time; add setColorEnabled(true/restore) to the deploy test beforeEach/afterEach following the pattern in log.test.ts. Also refactor cli-program.test.ts to use test.each instead of a for loop inside a test, and clean up schema/index.ts structural type access.
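The import-time problem can be reproduced in a few lines. The sketch reuses the names setColorEnabled and bold from the commit message, but their bodies, the initial false state, the intro text, and the names EAGER_INTRO and oauthSectionIntro are assumptions.

```typescript
// Minimal stand-ins for the color helpers (assumed behavior).
let colorEnabled = false;

function setColorEnabled(enabled: boolean): void {
  colorEnabled = enabled;
}

function bold(text: string): string {
  return colorEnabled ? `\u001b[1m${text}\u001b[22m` : text;
}

// Eager: bold() runs once at module load, so later setColorEnabled() calls have no effect.
const EAGER_INTRO = bold("OAuth");

// Lazy: bold() runs on every call and sees the current color state.
function oauthSectionIntro(): string {
  return bold("OAuth");
}

// A test flips the switch after import, as beforeEach would.
setColorEnabled(true);
```

After the final call, EAGER_INTRO is still the plain string while oauthSectionIntro() returns the ANSI-wrapped one, which is why assertions on color output only pass against the lazy form.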

rafa-thayto closed this May 19, 2026

rafa-thayto reopened this May 19, 2026

rafa-thayto force-pushed the empty-legume branch 2 times, most recently from 72b3544 to 741a8b9 on May 21, 2026 12:20

rafa-thayto force-pushed the empty-legume branch 2 times, most recently from fe4bb11 to 473a410 on June 2, 2026 12:34

rafa-thayto added 14 commits June 9, 2026 09:39

feat(cli): add global --quiet flag

bd131f9

chore(release): regen README help + add changeset for agentcli-bench …

5b3e431

…probes

style: fix formatting after rebase onto main

f13d4c5

rafa-thayto force-pushed the empty-legume branch from b0bf382 to f13d4c5 on June 9, 2026 12:54

rafa-thayto wants to merge 14 commits into main from empty-legume