
fix(ci): GHCR lowercase image owner + correct rollout env var (CI deploy has been failing on every push) #20

Merged

Germey merged 1 commit into main from fix/deploy-pipeline on May 10, 2026

Conversation

@acedatacloud-dev (Member)

Why

Every push to main since this workflow was first written has produced a failed deploy workflow run. Two separate bugs stacked on each other:

Bug 1 — GHCR rejects mixed-case repository names

github.repository_owner on this repo evaluates to AceDataCloud (mixed case). The workflow's env block baked that into the image name:

env:
  IMAGE_API: ghcr.io/${{ github.repository_owner }}/x402guard-api

GHCR refuses, error verbatim from CI:

ERROR: failed to build:
  invalid tag "ghcr.io/AceDataCloud/x402guard-api:25625033998":
  repository name must be lowercase

Net effect: zero images have ever been pushed to ghcr.io/acedatacloud/x402guard-{api,web} by CI. The live x402guard.acedata.cloud is running whatever was hand-deployed at hackathon time. PRs #18 / #19 merged but their changes are not running in production.

Bug 2 — rollout step env var name mismatch

Even if bug 1 hadn't blocked us, the rollout step would have applied manifests with image tag local:

- name: Deploy
  env:
    GITHUB_RUN_ID: ${{ github.run_id }}     # ← passed here
  run: bash deploy/run.sh

…but deploy/run.sh reads:

TAG="${BUILD_NUMBER:-local}"                # ← reads here

So BUILD_NUMBER is unset → TAG=local → kubectl apply of image: ghcr.io/acedatacloud/x402guard-api:local → ImagePullBackOff. We were only spared this because rollout is gated on vars.DEPLOY_TO_K8S == 'true' (currently not set, so this leg never actually ran).

Fix

.github/workflows/deploy.yaml:

  1. Drop the env-level IMAGE_API / IMAGE_WEB. Add a step that lower-cases the owner once and exposes the full image names via $GITHUB_OUTPUT. Build steps reference ${{ steps.names.outputs.image_{api,web} }}.
  2. Rename GITHUB_RUN_ID → BUILD_NUMBER in the rollout step's env block to match deploy/run.sh. (Sketch of both changes below.)
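
A minimal sketch of both changes, reusing the step id `names` from point 1 (trigger, permissions, and the actual build steps are omitted; treat the surrounding structure as illustrative, not the merged diff):

jobs:
  images:
    runs-on: ubuntu-latest
    steps:
      # Lower-case the owner once; every later step reads the outputs.
      - name: Compute image names
        id: names
        run: |
          owner_lc=$(echo "${{ github.repository_owner }}" | tr 'A-Z' 'a-z')
          echo "image_api=ghcr.io/${owner_lc}/x402guard-api" >> "$GITHUB_OUTPUT"
          echo "image_web=ghcr.io/${owner_lc}/x402guard-web" >> "$GITHUB_OUTPUT"
      # Build steps then tag e.g. ${{ steps.names.outputs.image_api }}:${{ github.run_id }}

  rollout:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy
        env:
          BUILD_NUMBER: ${{ github.run_id }}   # was GITHUB_RUN_ID
        run: bash deploy/run.sh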

Verified deploy/production/{api,web}.yaml already use lowercase ghcr.io/acedatacloud/x402guard-{api,web}:${TAG}, so the manifests resolve correctly once the image actually exists.
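
For context, that image reference presumably sits in the pod spec roughly like this (excerpt shape assumed; only the image string is confirmed above, with ${TAG} presumably substituted by deploy/run.sh before kubectl apply):

spec:
  containers:
    - name: api
      image: ghcr.io/acedatacloud/x402guard-api:${TAG}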

After this lands

| Push trigger | Today | After this PR |
| --- | --- | --- |
| images job builds + pushes | ❌ fails on tag-case error | ✅ pushes ghcr.io/acedatacloud/x402guard-{api,web}:<run_id> + :latest |
| rollout job (when vars.DEPLOY_TO_K8S=true is set) | would have applied :local | ✅ applies :<run_id> matching what was just pushed |
| rollout skipped (default) | n/a | n/a — comment about gating preserved |

To enable the rollout leg you still need to set:

  • repo variable DEPLOY_TO_K8S=true
  • repo secret KUBECONFIG (base64-encoded kubeconfig with acedatacloud namespace access)
  • bootstrap secret x402guard-secrets in the cluster (deploy/run.sh checks for this and exits 1 if missing — see comment block at top)
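
Roughly how those pieces would hang together once set (a sketch, not the merged workflow: the DEPLOY_TO_K8S gate and the KUBECONFIG secret name come from this PR; the kubeconfig-writing step, and whether the gate sits on the job or the step, are assumptions):

rollout:
  if: ${{ vars.DEPLOY_TO_K8S == 'true' }}
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    # Assumed step: materialize the base64-encoded secret where kubectl looks.
    - name: Write kubeconfig
      run: |
        mkdir -p "$HOME/.kube"
        echo "${{ secrets.KUBECONFIG }}" | base64 -d > "$HOME/.kube/config"
    - name: Deploy
      env:
        BUILD_NUMBER: ${{ github.run_id }}
      run: bash deploy/run.sh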

This PR doesn't flip those — only fixes the build so the next person who flips them gets a working pipeline.

Why now

PRs #18 (envelope scheme:"exact") and #19 (CLI demo + README) only matter on main if production actually picks up main. Right now production has never picked up any main commit via CI. This PR is the missing link before anyone can verify the fix end-to-end against x402guard.acedata.cloud.

Verification

Workflow YAML still parses (no structural changes; only the interpolation was refactored).

Cannot test the rollout leg from here — needs the platform team's KUBECONFIG secret. The build step's success on first run after merge will be the real verification.

Commit message (e650001):

Two bugs that have made every push to main fail since this workflow
was first added:

1. ghcr.io requires lowercase repository names. github.repository_owner
   on this repo is "AceDataCloud" (mixed case), which baked through to
   `ghcr.io/AceDataCloud/x402guard-api:...` and got rejected by GHCR
   with `ERROR: failed to build: invalid tag ... repository name must
   be lowercase`. Net effect: zero images have ever been pushed by CI;
   every deploy run on every push to main has failed at build time.

2. The rollout job passed GITHUB_RUN_ID into the deploy step, but
   deploy/run.sh reads BUILD_NUMBER (with a `local` fallback). So even
   if rollout ever ran (it never did — see #1), it would have applied
   manifests with image tag `local`, which does not exist in any
   registry. Renamed the workflow env to BUILD_NUMBER to match.

Fixes:

- Compute `owner_lc=$(echo "$owner" | tr A-Z a-z)` at the top of the
  `images` job and surface it via job-step outputs. No more env-level
  ${{ github.repository_owner }} interpolation that bakes mixed case
  into the image name.
- Rename the rollout step env from GITHUB_RUN_ID to BUILD_NUMBER so it
  matches what deploy/run.sh expects.

Verified deploy/production/*.yaml already use lowercase
`ghcr.io/acedatacloud/x402guard-{api,web}:${TAG}`, so the manifests
will resolve once images actually land on GHCR.

After merge:
- Push triggers `images` job → lowercase tag → GHCR push succeeds
- If `vars.DEPLOY_TO_K8S=true` and secret KUBECONFIG present, rollout
  step picks up the new tag via BUILD_NUMBER, applies manifests,
  pods roll to the new image.
- Without those repo settings, rollout silently no-ops (intentional —
  comment in the workflow already explained this gating).
Germey merged commit e650001 into main on May 10, 2026
acedatacloud-dev added a commit that referenced this pull request on May 10, 2026:
…pay_for_api caveat (#22)

Adds a "Live on devnet" badge + a quoted callout near the top with the
real 2026-05-10 verification result (3 spends, vault 4.00 -> 3.97 USDC,
finalized tx 249u8Pion...3y3D on Solscan). The customer who reported
"MCP could not be loaded" can now skim the top of the README, click the
Solscan link to confirm the on-chain side is live, and run the curl /
demo recipe to confirm their own MCP URL is healthy without any Claude
/ Cursor / SDK plumbing.

Concrete changes:
- "60-second verification" section near the top: 3 steps, all `curl` +
  `python scripts/demo.py`. End-state explicitly: "If steps 1-2 work,
  any `MCP could not be loaded` you see in Claude Desktop is a
  client-side problem".
- Spelled out the `aceguard_spend` request/response shape with a real
  finalized tx as the canonical example. Added the
  `recipient ATA must exist on devnet` pre-req inline (Anchor 3012),
  with the one-line `spl-token create-account` command to satisfy it.
- Pivoted Step 5 of the walkthrough from `pay_for_api` to
  `aceguard_spend`. Reason: api.acedata.cloud issues mainnet x402
  quotes (`EPjFWdd5...` mint, `5iVXFr...` payTo); the production
  x402guard deploy is on devnet, so the recipient ATA the on-chain
  program expects does not exist on this cluster. This is *expected*
  per .plans/X402GUARD.md and called out clearly so customers do not
  burn an afternoon trying to make that path work pre-mainnet flip.
- Updated Step 6 (boundary-in-action prompts) to use `aceguard_spend`
  invocations that map to actual Anchor errors today, instead of the
  pre-existing `pay_for_api` examples that no longer fire.

Pairs with #18 / #19 / #20 / #21. The mainnet flip stays the V2 step
.plans/X402GUARD.md already calls out (#11 / "Why devnet, not mainnet").

Co-authored-by: acedata-bot <bot@acedata.cloud>