feat(auth): add a Claude Platform on AWS credential tier #61

Open
bluedoors wants to merge 1 commit into
anthropics:main from
bluedoors:add-aws-auth-tier

@bluedoors

Summary

Adds a new, highest-priority, opt-in credential tier that routes requests
through the Claude Platform on AWS gateway. Claude Platform on AWS does not
support OAuth, and today the CLI authenticates only via OAuth login, a
first-party ANTHROPIC_API_KEY, or first-party WIF — none of which reach an
AWS-provisioned org, and the CLI can neither send the required
anthropic-workspace-id header nor SigV4-sign requests. That blocks the
version-controlled Managed Agents workflow (ant beta:agents create < agent.yaml
from CI) on AWS, which is the CLI's canonical use.

The tier is selected only by explicit opt-in — the --aws flag or the
persistent ANTHROPIC_USE_AWS toggle (the CLAUDE_CODE_USE_BEDROCK idiom), both
surfaced through one cmd.Bool("aws"). It is not inferred from ambient AWS
env vars (AWS_REGION / AWS_PROFILE etc.), which legitimately leak into any
shell on an EC2 host or AWS-configured laptop. When active,
getDefaultRequestOptions short-circuits before the existing first-party
precedence switch and builds request options from the SDK's aws/ backend:

  • API-key mode: --aws-api-key / ANTHROPIC_AWS_API_KEY set → authenticate
    with x-api-key + the workspace header. No AWS IAM creds needed.
  • SigV4 mode: no API key → sign with the default AWS credential chain
    (IAM role / AWS_PROFILE / env keys), using the region + workspace header.
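
The selection order described above can be pictured with a minimal Go sketch. It is not the PR's code: authMode, its constants, and selectAuthMode are hypothetical names used only to illustrate that the explicit opt-in is checked first and that the API key merely picks a mode inside the AWS tier.

```go
package main

import "fmt"

// authMode is a hypothetical label for the credential path a request takes.
type authMode string

const (
	modeFirstParty authMode = "first-party"
	modeAWSAPIKey  authMode = "aws-api-key"
	modeAWSSigV4   authMode = "aws-sigv4"
)

// selectAuthMode mirrors the described ordering: the explicit AWS opt-in is
// checked first, and only then does the presence of an AWS API key choose
// between API-key mode and SigV4 mode. Ambient AWS env vars such as
// AWS_REGION or AWS_PROFILE are deliberately not consulted here.
func selectAuthMode(useAWS bool, awsAPIKey string) authMode {
	if !useAWS {
		// Fall through to the existing first-party precedence switch.
		return modeFirstParty
	}
	if awsAPIKey != "" {
		return modeAWSAPIKey
	}
	return modeAWSSigV4
}

func main() {
	fmt.Println(selectAuthMode(false, "key")) // first-party
	fmt.Println(selectAuthMode(true, "key"))  // aws-api-key
	fmt.Println(selectAuthMode(true, ""))     // aws-sigv4
}
```

Note that an AWS API key without the opt-in does not activate the AWS tier in this sketch, which matches the "explicit opt-in only" rule stated above.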

All ~40 Stainless-generated commands work unchanged over the AWS transport —
full ant parity including beta:agents, --transform, @file, and
YAML/JSON stdin.

What changed

The diff is confined to hand-written files — no Stainless-generated command
file is touched, and there is no SDK version bump (the pinned
anthropic-sdk-go v1.50.1 already ships the aws/ subpackage).

  • pkg/cmd/extras.go — register four global flags in the existing
    hand-written init() block, each paired with its env Sources: (matching the
    --profile / --federation-rule convention):

    • --aws (ANTHROPIC_USE_AWS) — opt into the AWS backend.
    • --aws-region (AWS_REGION, AWS_DEFAULT_REGION) — load-bearing for the
      gateway URL and the SigV4 signing scope.
    • --aws-workspace-id (ANTHROPIC_AWS_WORKSPACE_ID).
    • --aws-api-key (ANTHROPIC_AWS_API_KEY) — optional; its presence selects
      API-key mode.

    Deliberately not added: --aws-profile / --aws-access-key /
    --aws-secret-access-key / --aws-session-token. The principled line is that
    the CLI exposes flags only for the Anthropic-gateway config and lets the
    AWS default credential chain resolve every AWS credential itself. Region is
    the one exception that earns a flag because it is load-bearing for the URL
    and signing scope, not merely a credential input.

  • pkg/cmd/cmdutil.go

    • buildAWSConfig(cmd) aws.ClientConfig — a pure, unit-tested helper that maps
      the four flags onto the SDK config. Empty values are intentional: the SDK's
      awsauth.ResolveConfig treats an empty string as "fall back to env /
      regional derivation" for every field, so no IsSet guards are needed.
    • A tier-0 short-circuit keyed on cmd.Bool("aws"), placed right after the
      base opts slice and before the credential-resolution block, so an AWS
      call never runs loadProfileIfUsable (disk I/O plus a possible client_id
      shadow-warning) or warnIfMultipleAuthSources, both wasted or misleading
      for an AWS request. A misconfiguration surfaces as aws.NewClient's clean
      "no region / no workspace ID / no base URL" error rather than a
      downstream 401. That error is reported in the existing federation-path
      os.Exit(1) style instead of re-threading an error return through ~30
      generated handlers.
    • warnIfAWSConflict — warns when --aws overrides a first-party credential
      (--api-key, --auth-token, an explicit profile, or federation), naming the
      ignored source. Critically excludes --aws-api-key, which only selects
      API-key mode within the AWS tier — otherwise every API-key-mode CI run would
      spuriously warn.
  • pkg/cmd/cmd_auth.go: authStatus short-circuits to a focused
    awsAuthStatus row when Bool("aws") is set, showing AWS as the tier-0
    winner with the active mode label (API key … redacted, vs SigV4 via AWS
    credential chain), the region/workspace, and the backend-resolved base URL.
    This matters in the default CI case: with ANTHROPIC_USE_AWS persistent,
    Bool("aws") is true without typing --aws, so without this row auth status
    would print "(no credential configured…)" while requests in the same env
    succeed.

  • go.mod / go.sum — record the AWS SDK transitive // indirect deps that
    the aws/ subpackage newly makes reachable. The anthropic-sdk-go pin is
    unchanged.
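
As a rough illustration of the flag-to-config mapping listed above, the sketch below resolves each value from an explicit flag first, then from its paired env vars, and otherwise leaves it empty. It is a standalone approximation under stated assumptions: awsConfig is a hypothetical stand-in for the SDK's aws.ClientConfig (whose real field names are not shown here), and in the actual CLI the env fallback is handled by the flag library's Sources rather than by a helper like firstNonEmpty.

```go
package main

import (
	"fmt"
	"os"
)

// awsConfig is a hypothetical stand-in for the SDK's aws.ClientConfig.
type awsConfig struct {
	Region      string
	WorkspaceID string
	APIKey      string
}

// firstNonEmpty returns the explicit flag value if set, otherwise the first
// non-empty environment variable among envNames, otherwise "".
func firstNonEmpty(flagValue string, lookup func(string) string, envNames ...string) string {
	if flagValue != "" {
		return flagValue
	}
	for _, name := range envNames {
		if v := lookup(name); v != "" {
			return v
		}
	}
	return ""
}

// buildConfig passes values straight through. Empty strings stay empty on
// purpose so that a downstream resolver can apply its own fallback.
func buildConfig(region, workspaceID, apiKey string, lookup func(string) string) awsConfig {
	return awsConfig{
		Region:      firstNonEmpty(region, lookup, "AWS_REGION", "AWS_DEFAULT_REGION"),
		WorkspaceID: firstNonEmpty(workspaceID, lookup, "ANTHROPIC_AWS_WORKSPACE_ID"),
		APIKey:      firstNonEmpty(apiKey, lookup, "ANTHROPIC_AWS_API_KEY"),
	}
}

func main() {
	// With no flags given, every field comes from the environment (or is empty).
	cfg := buildConfig("", "", "", os.Getenv)
	fmt.Printf("region=%q workspace=%q apiKeySet=%t\n", cfg.Region, cfg.WorkspaceID, cfg.APIKey != "")
}
```

Injecting the lookup function keeps the helper pure, which is what makes a no-network unit test of the pass-through practical.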

Because the AWS tier short-circuits before the first-party precedence switch,
it does not slot into that switch's ordering. Instead it owns the two surfaces a
short-circuiting tier-0 needs: a dedicated warnIfAWSConflict (parallel to the
first-party warnIfMultipleAuthSources, which the AWS path never reaches), and a
dedicated awsAuthStatus branch in authStatus (ahead of the credWinner
logic). So it reads as a first-class credential tier with its own conflict
diagnostic and status row, not a bolt-on — without entangling the existing
first-party precedence code.
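
The conflict diagnostic described above could look roughly like the following sketch. It is illustrative only: awsConflictWarning, the source labels, and the warning text are assumptions, not the PR's warnIfAWSConflict. The point it shows is that --aws-api-key is absent from the list of first-party sources, so API-key mode never triggers the warning.

```go
package main

import (
	"fmt"
	"strings"
)

// firstPartySources lists the first-party credential inputs that --aws
// overrides. The labels are illustrative. --aws-api-key is intentionally
// not listed: it only selects API-key mode within the AWS tier.
var firstPartySources = []string{"--api-key", "--auth-token", "--profile", "federation"}

// awsConflictWarning returns a warning naming each ignored first-party
// source, or "" when AWS is not active or nothing is being overridden.
func awsConflictWarning(useAWS bool, set map[string]bool) string {
	if !useAWS {
		return ""
	}
	var ignored []string
	for _, src := range firstPartySources {
		if set[src] {
			ignored = append(ignored, src)
		}
	}
	if len(ignored) == 0 {
		return ""
	}
	return "warning: --aws is set; ignoring " + strings.Join(ignored, ", ")
}

func main() {
	fmt.Println(awsConflictWarning(true, map[string]bool{"--api-key": true}))
	// API-key mode alone must stay silent.
	fmt.Printf("%q\n", awsConflictWarning(true, map[string]bool{"--aws-api-key": true}))
}
```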

Tests

New unit tests in pkg/cmd/cmd_aws_test.go (no network, no live AWS — they test
logic, following the existing cmd_auth_test.go / cmdutil_test.go patterns):

  • TestBuildAWSConfig — flag→config pass-through, including the empty-flag case.
  • TestAWSConflictWarning — fires for each first-party cred; silent for
    --aws-api-key alone (the must-not-warn case); emits once.
  • TestAWSPrecedenceInvariant — AWS wins at both observable precedence sites
    (request options + auth status) when paired with the top first-party tier.
  • TestAuthStatusAWSRow — API-key vs SigV4 mode labels, secret redaction, base-
    URL override vs regional derivation, and that first-party rows aren't shown.
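
For readers picturing what the auth status test checks, here is a small sketch of mode labelling with secret redaction. The function names and the redaction format (mask plus last four characters) are assumptions made for illustration; the PR only states that the key is redacted and that the two mode labels differ.

```go
package main

import "fmt"

// redact masks a secret, keeping only its last four characters. Secrets of
// four characters or fewer are masked entirely. The format is an assumption.
func redact(secret string) string {
	if len(secret) <= 4 {
		return "****"
	}
	return "****" + secret[len(secret)-4:]
}

// modeLabel returns the label for the active AWS mode: a redacted API key
// when one is set, otherwise the SigV4 credential-chain label.
func modeLabel(apiKey string) string {
	if apiKey != "" {
		return "API key " + redact(apiKey)
	}
	return "SigV4 via AWS credential chain"
}

func main() {
	fmt.Println(modeLabel("sk-example-1234")) // API key ****1234
	fmt.Println(modeLabel(""))                // SigV4 via AWS credential chain
}
```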

Test plan

  • go build ./... and go vet ./pkg/cmd/ pass; gofmt clean.
  • go test ./pkg/cmd/ -run 'AWS|MultiAuth' passes. (The repo's full suite
    also requires a mock Steady server on localhost:4010 for the generated
    command tests — unrelated to this change.)
  • Live (manual, AWS-subscribed workspace): auth smoke test confirms host
    aws-external-anthropic.{region}.api.aws, the anthropic-workspace-id
    header, the mode-appropriate auth header, and a real HTTP 200 (not the
    prior 401 "x-api-key header is required").
  • Live: Managed Agents YAML round-trip —
    beta:agents create < agent.yaml --transform id -r, list, retrieve,
    then update --version N (optimistic lock; version advanced 1 → 2 and a
    stale --version was rejected) over the AWS transport, then archive.
  • Live: negative checks — unresolvable region → no AWS region found;
    missing workspace → no workspace ID found; --aws + first-party
    --api-key → conflict warning.
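
The host pattern and the unresolvable-region check from the test plan can be sketched as below. This is a hedged approximation: gatewayBaseURL is a hypothetical helper, the https scheme is assumed, and the real derivation lives in the SDK's aws/ backend rather than in the CLI.

```go
package main

import (
	"errors"
	"fmt"
)

// gatewayBaseURL returns an explicit base-URL override when one is given,
// otherwise derives the regional gateway host from the region. An empty
// region with no override is an error, matching the negative check above.
func gatewayBaseURL(override, region string) (string, error) {
	if override != "" {
		return override, nil
	}
	if region == "" {
		return "", errors.New("no AWS region found")
	}
	// Host pattern taken from the smoke test; the scheme is an assumption.
	return fmt.Sprintf("https://aws-external-anthropic.%s.api.aws", region), nil
}

func main() {
	u, err := gatewayBaseURL("", "us-east-1")
	fmt.Println(u, err) // https://aws-external-anthropic.us-east-1.api.aws <nil>

	_, err = gatewayBaseURL("", "")
	fmt.Println(err) // no AWS region found
}
```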

Live verification summary

Verified end-to-end against a real Claude-Platform-on-AWS workspace
(us-east-1) in all three credential configurations, all green:

| Auth configuration | Result |
| --- | --- |
| API-key mode (ANTHROPIC_AWS_API_KEY, x-api-key) | ✅ 200; x-api-key + workspace header on the wire; key redacted in auth status |
| SigV4 mode, static keys (AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY) | ✅ 200; SigV4 Authorization signature + workspace header on the wire |
| SigV4 mode, named profile (AWS_PROFILE, default credential chain) | ✅ 200; same SigV4 signing path |

Each configuration ran the full sequence: auth status tier-0/mode label →
messages create (asserting the actual HTTP/… 200 status line) → the
beta:agents create/list/retrieve/update/archive round-trip with optimistic
locking → the negative checks above. The harness used is intentionally not
part of this PR (it reads live credentials); it lives locally only.

Notes for reviewers

  • The aws.NewClient error uses the existing federation path's os.Exit(1)
    fatal (which itself carries a TODO about bypassing urfave's error pipeline).
    Matches current style; happy to switch to proper error propagation if
    preferred.
  • context.Background() is used at the short-circuit because
    getDefaultRequestOptions takes only *cli.Command (urfave/cli v3 exposes no
    ctx on Command) and is called at ~30 generated sites without ctx threading;
    aws.NewClient uses ctx only for eager SigV4 credential resolution, with no
    long-lived request I/O.

Add a new highest-priority, opt-in credential tier that routes requests
through the Claude Platform on AWS gateway (which does not support OAuth),
unblocking the version-controlled Managed Agents workflow on AWS.

Selected only by explicit opt-in — the --aws flag or the persistent
ANTHROPIC_USE_AWS toggle (not ambient AWS env vars). When active,
getDefaultRequestOptions short-circuits before the first-party credential
switch and builds request options from the SDK's aws/ backend (SigV4 via
the default AWS credential chain, or x-api-key when --aws-api-key is set).

All ~40 Stainless-generated commands work unchanged over the AWS transport.
Changes are confined to hand-written files:

- extras.go: register --aws, --aws-region, --aws-workspace-id, --aws-api-key
  global flags (each paired with its env Source).
- cmdutil.go: buildAWSConfig pure helper, the AWS tier-0 short-circuit, and
  warnIfAWSConflict (warns when --aws overrides a first-party cred; excludes
  --aws-api-key, which only selects API-key mode).
- cmd_auth.go: awsAuthStatus renders AWS as the tier-0 winner with the active
  mode label and backend-resolved base URL, redacting the API key.
- cmd_aws_test.go: buildAWSConfig pass-through, conflict warning fires/silent,
  auth status row, and the cross-site precedence invariant.

The anthropic-sdk-go pin (v1.50.1) is unchanged — its aws/ subpackage was
already present; go.mod only records the AWS SDK transitive indirect deps it
newly makes reachable.
@bluedoors bluedoors marked this pull request as ready for review June 15, 2026 05:31
@bluedoors bluedoors requested a review from a team as a code owner June 15, 2026 05:31
@Gehan-Panapitiya

Gehan-Panapitiya commented Jun 17, 2026


Great. Upvoted +100

@bluedoors

bluedoors commented Jun 18, 2026

Author

Known gap: beta:worker (self-hosted sandboxes) is not covered by this AWS tier

This PR gives full control-plane parity on AWS via the getDefaultRequestOptions short-circuit (pkg/cmd/cmdutil.go:77) — beta:agents, beta:environments (incl. the raw :work poll/ack/stop/heartbeat/stats commands), beta:sessions, beta:skills, beta:memory-stores, beta:vaults, beta:files, messages, models.

It does not cover beta:worker (poll / run), the self-hosted sandbox worker loop. beta:worker is the only command that builds its client outside getDefaultRequestOptions — via newWorkerClient (pkg/cmd/worker.go:232) — and, more fundamentally, the SDK worker helper it wraps can't express the AWS auth model.

This is a supported product configuration — per the self-hosted sandboxes docs, on Claude Platform on AWS the worker authenticates with AWS IAM (SigV4) or an AWS-Console API key, not an environment key (attach the AnthropicSelfHostedEnvironmentAccess managed policy; on EKS that's the pod's IRSA role). So there's no auth conflict — the IAM principal is simply the worker credential.

The blocker is upstream in anthropic-sdk-go, not in the CLI. Its lib/environments worker helper hard-requires a non-empty environment key (poller.go:74-75, :249-250; worker.go:138-140, :225-234) and unconditionally injects WithHeaderDel("X-Api-Key") + WithAuthToken(envKey) on every request (poller.go:73-81), which strips/overrides exactly the AWS credential the worker needs. This is identical in the pinned v1.50.1 and the latest v1.50.2, with no escape hatch (no SkipAuth / auth-mode field on any worker option struct). The env-key is used only as the Authorization bearer (never a routing/body field), so the fix is clean but must happen in the SDK — there's no RequestOptions workaround for the poll/heartbeat/skills calls.

Ask / next step: filed an upstream anthropic-sdk-go feature request (anthropics/anthropic-sdk-go#366) for an AWS / client-auth mode in lib/environments — an explicit opt-in (e.g. UseClientAuth bool / AuthMode on EnvironmentWorkerOptions + WorkPollerOptions) that skips the env-key requirement and makes the bearer injection a no-op so the base client's SigV4/x-api-key flows through. Backward-compatible; SessionToolRunner needs no change. Once that lands, the CLI side is a small thread-through (route newWorkerClient through aws.Client.Options, relax the required --environment-key under --aws, and forward AWS creds rather than the env-key in the --on-work/run container path).

Tracking worker-on-AWS as a separate follow-up gated on that SDK change, to keep this control-plane PR clean and upstreamable.
