feat(cli): improve agent discoverability and add headless auth login#291

Draft
rafa-thayto wants to merge 14 commits into main from empty-legume

@rafa-thayto

Contributor

Summary

Closes the five agentcli-bench gaps (D3, A4, P3, P2, T7) and adds a clerk auth login --token <key> flow for CI / agents.

  • Top-level help discoverability: clerk --help now renders an Examples: block and an Environment: section listing the five CLERK_* env vars the binary actually reads (CLERK_SECRET_KEY, CLERK_MODE, CLERK_CONFIG_DIR, CLERK_UPDATE_CHANNEL, CLERK_NO_UPDATE_CHECK). Implemented via a new setEnvVars() declaration-merging helper in lib/help.ts, mirroring the existing setExamples() pattern.
  • --json field documentation: apps list|create, users list|create, and doctor --json now describe the JSON shape in the option description so agents/consumers know what to expect.
  • Headless authentication (clerk auth login --token <key>): accepts a Clerk PLAPI access token inline or via - (stdin). The JWT shape, size cap (8 KB), and azp audience claim are validated locally before the userinfo network call, and the token is then persisted without a refresh token. Running --token - on a TTY fails up front instead of hanging while waiting for EOF. The related awaitConcurrentRefresh function skips the race-detection loop for token-only sessions so that two parallel logins don't collide on the empty-refresh sentinel.
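The local pre-flight checks in the last bullet could look roughly like the sketch below. It is illustrative only: `checkTokenShape`, `MAX_TOKEN_BYTES`, and the `expectedAzp` parameter are hypothetical names, and the PR's real helpers (`assertValidAccessToken`, `getJwtAuthorizedParty`) are not shown on this page. Treating a missing azp claim as acceptable is an assumed reading of the "soft check with back-compat" wording.

```typescript
// Hypothetical sketch; names and exact rules are assumptions, not the PR's code.
const MAX_TOKEN_BYTES = 8 * 1024; // the 8 KB size cap from the summary

type TokenCheck = { ok: true; azp?: string } | { ok: false; reason: string };

// Decode one base64url JWT segment into a JSON object, or return null on failure.
function decodeSegment(segment: string): Record<string, unknown> | null {
  try {
    const base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
    const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
    const parsed: unknown = JSON.parse(atob(padded));
    return typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)
      ? (parsed as Record<string, unknown>)
      : null;
  } catch {
    return null;
  }
}

// Runs entirely offline: size cap first, then JWT shape, then the azp claim.
function checkTokenShape(token: string, expectedAzp: string): TokenCheck {
  if (new TextEncoder().encode(token).length > MAX_TOKEN_BYTES) {
    return { ok: false, reason: "token exceeds 8 KB" };
  }
  const parts = token.split(".");
  if (parts.length !== 3 || parts.some((part) => part.length === 0)) {
    return { ok: false, reason: "expected a JWT" };
  }
  const header = decodeSegment(parts[0]);
  const payload = decodeSegment(parts[1]);
  if (header === null || payload === null) {
    return { ok: false, reason: "expected a JWT" };
  }
  const azp = payload["azp"];
  // Assumed soft check: a missing azp is accepted, a mismatching one is rejected.
  if (typeof azp === "string" && azp !== expectedAzp) {
    return { ok: false, reason: "token was issued for a different client" };
  }
  return { ok: true, azp: typeof azp === "string" ? azp : undefined };
}
```

Under these assumptions a secret key such as sk_test_xxx fails the three-segment check and is rejected before any network call is attempted.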

Background: the vault handoff doc compares the Clerk CLI's agentcli-bench score (44.5) with resend's (55.7) and prescribes the five gap closures landed here. The expected overall score after this PR is ~52–55.

A property test guards the Environment: list against drift — every documented CLERK_* name must actually be read in cli-core/src/.
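The drift guard described above can be sketched as a pure check. This is a minimal illustration, not the PR's test: `findUnreadEnvVars` is a hypothetical helper, and the source snippets in the usage note are made up for the example.

```typescript
// Hypothetical helper: returns the documented names that no source text references.
// Names are assumed to contain only A-Z, 0-9 and underscores, so they need no regex escaping.
function findUnreadEnvVars(
  documented: readonly string[],
  sources: readonly string[],
): string[] {
  return documented.filter((name) => {
    // \b keeps CLERK_MODE from matching inside a longer name such as CLERK_MODE_X.
    const pattern = new RegExp(`\\b${name}\\b`);
    return !sources.some((source) => pattern.test(source));
  });
}
```

A test would feed it the names rendered in the Environment: section and the contents of the files under cli-core/src/, then assert that the result is empty.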

Test plan

  • bun run format / bun run lint / bun run typecheck / bun run test pass locally (CI will re-run)
  • clerk --help renders the new Examples: and Environment: sections
  • clerk auth login --help shows --token <key> and references CLERK_SECRET_KEY for per-instance API access
  • clerk auth login --token <jwt> with a fresh valid token logs in without OAuth and persists the session
  • clerk auth login --token sk_test_xxx rejects with a clear "expected a JWT" message before hitting the network
  • clerk auth login --token - refuses on a TTY and reads from stdin when piped
  • cat token.txt | clerk auth login --token - works end-to-end
  • clerk users list --json and clerk apps list --json still emit pipeable JSON
  • Re-run agentcli-bench against the freshly-compiled binary and confirm D3, A4, P3, P2, T7 improve as expected

@changeset-bot

changeset-bot Bot commented May 15, 2026

🦋 Changeset detected

Latest commit: f13d4c5

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
clerk Minor

@rafa-thayto rafa-thayto reopened this May 19, 2026
@rafa-thayto rafa-thayto force-pushed the empty-legume branch 2 times, most recently from 72b3544 to 741a8b9 on May 21, 2026 12:20
@rafa-thayto rafa-thayto force-pushed the empty-legume branch 2 times, most recently from fe4bb11 to 473a410 on June 2, 2026 12:34
Closes the five agentcli-bench gaps (D3, A4, P3, P2, T7) and adds a
`clerk auth login --token <key>` flow for CI / agents:

- Top-level `Examples:` block on `clerk --help` (D3)
- New `Environment:` help section via `setEnvVars()`, documenting the
  five `CLERK_*` env vars the binary actually reads (A4)
- `--json` field descriptions on `apps list|create`, `users list|create`,
  and `doctor --json` so consumers know the shape (P3)
- Verified `--json` + `isAgent()` coverage across data-returning
  subcommands (P2)
- `clerk auth login --token <key>` for headless auth: accepts a Clerk
  PLAPI access token (or `-` for stdin), validates JWT shape and
  audience (`azp` claim, soft check with back-compat) locally before
  the userinfo call, persists with no refresh token. Sibling
  `awaitConcurrentRefresh` skips the race-detection loop for token-only
  sessions so two parallel logins don't collide on the empty-refresh
  sentinel (T7)
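The `-` (stdin) handling and the up-front TTY refusal can be sketched as one decision function. The names here (`resolveTokenSource`, `TokenSource`) and the refusal message are hypothetical; actually reading stdin is left to the caller.

```typescript
// Hypothetical sketch; names and the message text are illustrative, not the PR's code.
type TokenSource =
  | { kind: "inline"; token: string }
  | { kind: "stdin" }
  | { kind: "refuse"; message: string };

// Decide where the token comes from, given the --token argument and whether
// stdin is attached to a terminal.
function resolveTokenSource(tokenArg: string, stdinIsTTY: boolean): TokenSource {
  if (tokenArg !== "-") {
    return { kind: "inline", token: tokenArg };
  }
  if (stdinIsTTY) {
    // Nothing is piped in, so reading would block until the user sends EOF.
    return {
      kind: "refuse",
      message: "--token - expects piped input, but stdin is a terminal",
    };
  }
  return { kind: "stdin" };
}
```

Keeping the decision separate from the read makes the TTY branch testable without a real terminal.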

A property test guards the `Environment:` list against drift — every
documented `CLERK_*` name must be one the CLI actually reads.

login.ts now imports storeAccessToken, assertValidAccessToken, and
getJwtAuthorizedParty from credential-store.ts. The shared test stubs
were missing these exports, causing login.test.ts to fail with
"Export named 'storeAccessToken' not found" when Bun resolved the
mocked module.

P9: agentcli-bench rubric. Pair --quiet with the existing --verbose so
agents can pin log verbosity in either direction. --quiet sets the log
level to 'error', which keeps fatal output but silences info/warn/success.
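A rough sketch of how the two flags could map onto a log threshold follows. The level names other than 'error', the 'debug' level for --verbose, and the rule that --quiet wins when both flags are passed are all assumptions made for illustration.

```typescript
// Hypothetical sketch; only the 'error' level for --quiet comes from the commit message.
type Threshold = "debug" | "info" | "error";
type MessageKind = "debug" | "info" | "success" | "warn" | "error";

const SEVERITY: Record<MessageKind, number> = {
  debug: 0,
  info: 1,
  success: 1,
  warn: 2,
  error: 3,
};
const MINIMUM: Record<Threshold, number> = { debug: 0, info: 1, error: 3 };

function resolveThreshold(flags: { quiet?: boolean; verbose?: boolean }): Threshold {
  if (flags.quiet) return "error"; // assumed precedence: --quiet wins over --verbose
  if (flags.verbose) return "debug"; // assumed level for --verbose
  return "info";
}

// A message is emitted only when its severity reaches the active threshold.
function shouldEmit(kind: MessageKind, threshold: Threshold): boolean {
  return SEVERITY[kind] >= MINIMUM[threshold];
}
```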

P8: agentcli-bench rubric. Color was emitted unconditionally; now
gated on stdout TTY detection, the NO_COLOR env var, and the new
--no-color global flag. Inline highlight() and tag-prefix codes in
log.ts honor the same gate. log.test.ts explicitly forces color on
since its assertions inspect ANSI sequences.
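The three-way gate can be sketched as a pure function. `shouldUseColor` and its input shape are hypothetical names; the NO_COLOR handling follows the common convention that any non-empty value disables color, which may differ from what log.ts does.

```typescript
// Hypothetical sketch of the color gate; not the PR's implementation.
interface ColorInputs {
  stdoutIsTTY: boolean;
  env: Record<string, string | undefined>;
  noColorFlag: boolean; // true when --no-color was passed
}

function shouldUseColor({ stdoutIsTTY, env, noColorFlag }: ColorInputs): boolean {
  if (noColorFlag) return false;
  // NO_COLOR convention: present and non-empty means "no color".
  const noColor = env["NO_COLOR"];
  if (noColor !== undefined && noColor !== "") return false;
  // Piped or redirected output gets plain text.
  return stdoutIsTTY;
}

// Inline helpers consult the same gate instead of emitting ANSI codes unconditionally.
function bold(text: string, colorEnabled: boolean): string {
  return colorEnabled ? `\u001b[1m${text}\u001b[22m` : text;
}
```

In a real CLI the inputs would come from process.stdout.isTTY, process.env, and the parsed global options.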

P5: agentcli-bench rubric. Bumps EXIT_CODE.USAGE from 2 to 64 and
adds DATAERR(65), UNAVAILABLE(69), SOFTWARE(70), TEMPFAIL(75),
NOPERM(77) for use by retryable/transient error classification.

Wires program.exitOverride() so Commander's unknownOption /
unknownCommand / missingArgument errors funnel through runProgram
and exit with EX_USAGE instead of Commander's default 1. Agents can
now branch on exit code alone:
  64  bad invocation        — fix the command
  75  transient/network     — retry
  77  auth                  — re-authenticate

Tests that use EXIT_CODE.USAGE symbolically are unaffected by the
numeric bump.
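From the agent's side, branching on the exit code alone might look like the sketch below. The numeric values are the ones listed in this commit message; `nextAction`, the `AgentAction` labels, and the choice to treat every other code as a failure to report are illustrative assumptions.

```typescript
// Values from the commit message; the object here is a local copy for illustration.
const EXIT_CODE = {
  USAGE: 64,
  DATAERR: 65,
  UNAVAILABLE: 69,
  SOFTWARE: 70,
  TEMPFAIL: 75,
  NOPERM: 77,
} as const;

// Hypothetical agent-side labels for what to do next.
type AgentAction = "done" | "fix-command" | "retry" | "re-authenticate" | "report-failure";

function nextAction(exitCode: number): AgentAction {
  switch (exitCode) {
    case 0:
      return "done";
    case EXIT_CODE.USAGE:
      return "fix-command"; // bad invocation
    case EXIT_CODE.TEMPFAIL:
      return "retry"; // transient/network
    case EXIT_CODE.NOPERM:
      return "re-authenticate"; // auth
    default:
      return "report-failure"; // assumption: anything else is surfaced as-is
  }
}
```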

R3 + R7: agentcli-bench rubric. Every outputJsonError() now emits
{code, message, retryable, nextStep, docsUrl?, errors?}.

  retryable: HTTP 408/425/429/5xx, plus network ECONNREFUSED/RESET/
             ETIMEDOUT/EAI_AGAIN/'fetch failed', are flagged true so
             agents can implement a single retry loop.
  nextStep:  per-class remedy ('retry with backoff', 'check
             connectivity with clerk doctor', 'run clerk --help').
  exitCode:  4xx auth → EX_NOPERM (77); 5xx → EX_UNAVAILABLE (69);
             429/408/425 → EX_TEMPFAIL (75); other → 1/SOFTWARE.

Combined R3+R7 because both extend the same JSON shape — splitting
would have made the second commit a single-field add.
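The classification rules above can be sketched as three small functions. The function names are hypothetical, reading "4xx auth" as HTTP 401/403 is an assumption, and the fallback returns 1 because the commit message leaves "1/SOFTWARE" ambiguous.

```typescript
// Hypothetical sketch of the retryable / exit-code classification.
const TRANSIENT_STATUSES = new Set([408, 425, 429]);
const RETRYABLE_NETWORK_CODES = new Set([
  "ECONNREFUSED",
  "ECONNRESET",
  "ETIMEDOUT",
  "EAI_AGAIN",
]);

function isServerError(status: number): boolean {
  return status >= 500 && status <= 599;
}

// retryable: HTTP 408/425/429/5xx.
function isRetryableStatus(status: number): boolean {
  return TRANSIENT_STATUSES.has(status) || isServerError(status);
}

// retryable: listed network error codes, or a 'fetch failed' message.
function isRetryableNetworkError(code: string | undefined, message: string): boolean {
  return (code !== undefined && RETRYABLE_NETWORK_CODES.has(code)) || message.includes("fetch failed");
}

function exitCodeForStatus(status: number): number {
  if (TRANSIENT_STATUSES.has(status)) return 75; // EX_TEMPFAIL
  if (isServerError(status)) return 69; // EX_UNAVAILABLE
  if (status === 401 || status === 403) return 77; // EX_NOPERM (assumed meaning of "4xx auth")
  return 1; // assumed fallback
}
```

An agent can then run a single retry loop keyed on the retryable field and fall back to the exit code when JSON output is unavailable.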

D4: agentcli-bench rubric. Agents that don't want to parse --help can
walk the JSON shape produced by 'clerk schema' to discover every
subcommand, argument, and option (with choices, defaults, flags).

Returns {cli, version, schemaVersion, command} where command is a
recursive SchemaCommand node. schemaVersion=1 is the stable contract;
breaking shape changes bump it.
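A consumer of the schema output might walk it as sketched below. Only the top-level fields {cli, version, schemaVersion, command} come from this commit message; the per-node fields (`name`, `commands`) and the helper names are assumptions about the SchemaCommand shape.

```typescript
// Assumed minimal node shape: only the fields this walk needs.
interface SchemaCommand {
  name: string;
  commands?: SchemaCommand[];
}

interface CliSchema {
  cli: string;
  version: string;
  schemaVersion: number;
  command: SchemaCommand;
}

// Parse the JSON and refuse shapes this consumer was not written for.
function readSchema(raw: string): CliSchema {
  const parsed = JSON.parse(raw) as CliSchema;
  if (parsed.schemaVersion !== 1) {
    throw new Error(`unsupported schemaVersion: ${parsed.schemaVersion}`);
  }
  return parsed;
}

// Depth-first walk returning every command path, e.g. "clerk auth login".
function listCommandPaths(node: SchemaCommand, parents: string[] = []): string[] {
  const path = [...parents, node.name];
  const children = node.commands ?? [];
  return [path.join(" "), ...children.flatMap((child) => listCommandPaths(child, path))];
}
```

Checking schemaVersion before walking lets an agent fail fast if a later release bumps the contract.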

P10: agentcli-bench rubric. JSON shape becomes
{data, hasMore, nextCursor, pagination: {offset, limit}}. nextCursor
encodes the next offset so agents can paginate forward without
knowing the scheme — pass it back as --offset. Existing hasMore is
retained as the canonical 'done?' signal.
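Forward pagination over this envelope can be sketched as a simple loop. The `Page` type mirrors the fields named above, but typing nextCursor as `string | null`, the `collectAll` helper, and the `maxPages` safety cap are assumptions. `fetchPage` stands in for running the list command with --json and --offset.

```typescript
// Envelope fields from the commit message; the nextCursor type is an assumption.
interface Page<T> {
  data: T[];
  hasMore: boolean;
  nextCursor: string | null;
  pagination: { offset: number; limit: number };
}

// Hypothetical helper: follows nextCursor until hasMore is false.
function collectAll<T>(
  fetchPage: (offset?: string) => Page<T>,
  maxPages = 100, // safety cap so a misbehaving server cannot loop forever
): T[] {
  const all: T[] = [];
  let offset: string | undefined;
  for (let i = 0; i < maxPages; i++) {
    const page = fetchPage(offset);
    all.push(...page.data);
    // hasMore is the canonical "done?" signal; nextCursor is passed back as --offset.
    if (!page.hasMore || page.nextCursor === null) break;
    offset = page.nextCursor;
  }
  return all;
}
```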

R8: agentcli-bench rubric. apps create is non-idempotent by default
— re-running creates duplicates. --if-not-exists looks up an app by
name first and returns it (with reused:true in --json output)
instead of creating a duplicate. The default behavior is preserved;
agents that need idempotency opt in explicitly.
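The opt-in lookup-then-create behavior can be sketched against an in-memory store. `AppStore`, `createApp`, and the `App` fields are hypothetical; the `reused: true` marker is the only detail taken from the commit message, and omitting the field on a fresh create is an assumption.

```typescript
// Hypothetical types; the real command talks to the Clerk API instead of a store.
interface App {
  id: string;
  name: string;
}

interface AppStore {
  list(): App[];
  create(name: string): App;
}

type CreateResult = App & { reused?: true };

function createApp(store: AppStore, name: string, ifNotExists: boolean): CreateResult {
  if (ifNotExists) {
    // Look up by name first and hand back the existing app instead of duplicating it.
    const existing = store.list().find((app) => app.name === name);
    if (existing !== undefined) return { ...existing, reused: true };
  }
  // Default path is unchanged: always create.
  return store.create(name);
}
```

A lookup followed by a create is not atomic, so two concurrent callers could still both create the app; the sketch does not try to address that.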

Commander's recursive parent chain has concrete generic parameters
that don't unify across heterogeneous subcommands, so importing
Command<Args, Opts, GlobalOpts> for typing the walker fails strict
typecheck. Replace with a CommandLike interface that captures only
the introspection surface we need (name/aliases/description/
registeredArguments/options/commands).
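A structural interface along these lines might look like the sketch below. The member names match the introspection surface listed above, but the reduced `ArgumentLike` and `OptionLike` shapes, the `SchemaCommand` fields, and `toSchema` are assumptions, not the PR's definitions.

```typescript
// Hypothetical structural types: just enough surface for the walker.
interface ArgumentLike {
  name(): string;
  required: boolean;
}

interface OptionLike {
  flags: string;
  description: string;
}

interface CommandLike {
  name(): string;
  aliases(): string[];
  description(): string;
  registeredArguments: readonly ArgumentLike[];
  options: readonly OptionLike[];
  commands: readonly CommandLike[];
}

interface SchemaCommand {
  name: string;
  aliases: string[];
  description: string;
  arguments: { name: string; required: boolean }[];
  options: { flags: string; description: string }[];
  commands: SchemaCommand[];
}

// Recursive walk; any object with this shape is accepted, whatever its generic parameters.
function toSchema(cmd: CommandLike): SchemaCommand {
  return {
    name: cmd.name(),
    aliases: cmd.aliases(),
    description: cmd.description(),
    arguments: cmd.registeredArguments.map((arg) => ({
      name: arg.name(),
      required: arg.required,
    })),
    options: cmd.options.map((opt) => ({ flags: opt.flags, description: opt.description })),
    commands: cmd.commands.map((child) => toSchema(child)),
  };
}
```

Because the interface is structural, differently parameterized subcommands can satisfy it as long as they expose these members.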

…nvelope

Update tests to match the sysexits exit code changes (EXIT_CODE.USAGE
is now 64, not 2; HTTP 500 errors now exit with 69/EX_UNAVAILABLE).

Update users list JSON assertions to include the new nextCursor and
pagination fields added to the agent-mode envelope.

Remove the stderr assertion from the input-json test for Commander's
"unknown option" message, which is written via process.stderr.write
and not captured by the test harness's log capture.

`OAUTH_SECTION_INTRO` was a module-level constant that baked in bold()
at import time, before tests could call setColorEnabled(true). Convert
it to a lazy function so color formatting is evaluated at call time;
add setColorEnabled(true/restore) to the deploy test beforeEach/afterEach
following the pattern in log.test.ts.
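The import-time problem can be reproduced in a few lines. This is an illustrative sketch: the `setColorEnabled` here returns the previous value so a test can restore it, which is an assumed signature, and the "OAuth setup" text is a placeholder for the real intro string.

```typescript
// Hypothetical stand-ins for the color helpers in log.ts.
let colorEnabled = false;

function setColorEnabled(enabled: boolean): boolean {
  const previous = colorEnabled;
  colorEnabled = enabled;
  return previous; // returned so callers can restore the old state
}

function bold(text: string): string {
  return colorEnabled ? `\u001b[1m${text}\u001b[22m` : text;
}

// Evaluated once at module load, while color is still off: the plain string is frozen in.
const EAGER_INTRO = bold("OAuth setup");

// Evaluated on every call, so it sees whatever the test configured.
function lazyIntro(): string {
  return bold("OAuth setup");
}
```

After a test calls setColorEnabled(true), EAGER_INTRO is still plain text while lazyIntro() carries the ANSI codes, which is why the constant was converted to a function.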

Also refactor cli-program.test.ts to use test.each instead of a for
loop inside a test, and clean up schema/index.ts structural type access.