
fix(ci): GHCR lowercase image owner + correct rollout env var (CI deploy has been failing on every push) #20

Merged

Germey merged 1 commit into main from fix/deploy-pipeline on May 10, 2026

Conversation

@acedatacloud-dev (Member)

Why

Every push to main since this workflow was first written has produced a failed deploy workflow run. Two separate bugs stacked on each other:

Bug 1 — GHCR rejects mixed-case repository names

github.repository_owner on this repo evaluates to AceDataCloud (mixed case). The workflow's env block baked that into the image name:

env:
  IMAGE_API: ghcr.io/${{ github.repository_owner }}/x402guard-api

GHCR refuses, error verbatim from CI:

ERROR: failed to build:
  invalid tag "ghcr.io/AceDataCloud/x402guard-api:25625033998":
  repository name must be lowercase

Net effect: zero images have ever been pushed to ghcr.io/acedatacloud/x402guard-{api,web} by CI. The live x402guard.acedata.cloud is running whatever was hand-deployed at hackathon time. PRs #18 / #19 merged but their changes are not running in production.

Bug 2 — rollout step env var name mismatch

Even if bug 1 hadn't blocked us, the rollout step would have applied manifests with image tag local:

- name: Deploy
  env:
    GITHUB_RUN_ID: ${{ github.run_id }}     # ← passed here
  run: bash deploy/run.sh

…but deploy/run.sh reads:

TAG="${BUILD_NUMBER:-local}"                # ← reads here

So BUILD_NUMBER is unset → TAG=local → kubectl apply of image: ghcr.io/acedatacloud/x402guard-api:local → ImagePullBackOff. We were only spared this because rollout is gated on vars.DEPLOY_TO_K8S == 'true' (currently not set, so this leg never actually ran).

Fix

.github/workflows/deploy.yaml:

  1. Drop the env-level IMAGE_API / IMAGE_WEB. Add a step that lower-cases the owner once and exposes the full image names via $GITHUB_OUTPUT. Build steps reference ${{ steps.names.outputs.image_{api,web} }}.
  2. Rename GITHUB_RUN_ID → BUILD_NUMBER in the rollout step's env block to match deploy/run.sh. (Sketch of both changes below.)
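
A minimal sketch of both changes, reusing the step id `names` from point 1 (trigger, permissions, and the actual build steps are omitted; treat the surrounding structure as illustrative, not the merged diff):

jobs:
  images:
    runs-on: ubuntu-latest
    steps:
      # Lower-case the owner once; every later step reads the outputs.
      - name: Compute image names
        id: names
        run: |
          owner_lc=$(echo "${{ github.repository_owner }}" | tr 'A-Z' 'a-z')
          echo "image_api=ghcr.io/${owner_lc}/x402guard-api" >> "$GITHUB_OUTPUT"
          echo "image_web=ghcr.io/${owner_lc}/x402guard-web" >> "$GITHUB_OUTPUT"
      # Build steps then tag e.g. ${{ steps.names.outputs.image_api }}:${{ github.run_id }}

  rollout:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy
        env:
          BUILD_NUMBER: ${{ github.run_id }}   # was GITHUB_RUN_ID
        run: bash deploy/run.sh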

Verified deploy/production/{api,web}.yaml already use lowercase ghcr.io/acedatacloud/x402guard-{api,web}:${TAG}, so the manifests resolve correctly once the image actually exists.
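
For context, that image reference presumably sits in the pod spec roughly like this (excerpt shape assumed; only the image string is confirmed above, with ${TAG} presumably substituted by deploy/run.sh before kubectl apply):

spec:
  containers:
    - name: api
      image: ghcr.io/acedatacloud/x402guard-api:${TAG}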

After this lands

| Push trigger | Today | After this PR |
| --- | --- | --- |
| images job builds + pushes | ❌ fails on tag-case error | ✅ pushes ghcr.io/acedatacloud/x402guard-{api,web}:<run_id> + :latest |
| rollout job (when vars.DEPLOY_TO_K8S=true is set) | would have applied :local | ✅ applies :<run_id> matching what was just pushed |
| rollout skipped (default) | n/a | n/a — comment about gating preserved |

To enable the rollout leg you still need to set:

  • repo variable DEPLOY_TO_K8S=true
  • repo secret KUBECONFIG (base64-encoded kubeconfig with acedatacloud namespace access)
  • bootstrap secret x402guard-secrets in the cluster (deploy/run.sh checks for this and exits 1 if missing — see comment block at top)
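
Roughly how those pieces would hang together once set (a sketch, not the merged workflow: the DEPLOY_TO_K8S gate and the KUBECONFIG secret name come from this PR; the kubeconfig-writing step, and whether the gate sits on the job or the step, are assumptions):

rollout:
  if: ${{ vars.DEPLOY_TO_K8S == 'true' }}
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    # Assumed step: materialize the base64-encoded secret where kubectl looks.
    - name: Write kubeconfig
      run: |
        mkdir -p "$HOME/.kube"
        echo "${{ secrets.KUBECONFIG }}" | base64 -d > "$HOME/.kube/config"
    - name: Deploy
      env:
        BUILD_NUMBER: ${{ github.run_id }}
      run: bash deploy/run.sh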

This PR doesn't flip those — only fixes the build so the next person who flips them gets a working pipeline.

Why now

PRs #18 (envelope scheme:"exact") and #19 (CLI demo + README) only matter on main if production actually picks up main. Right now production has never picked up any main commit via CI. This PR is the missing link before anyone can verify the fix end-to-end against x402guard.acedata.cloud.

Verification

Workflow YAML still parses (no structural changes; only the interpolation was refactored).

Cannot test the rollout leg from here — needs the platform team's KUBECONFIG secret. The build step's success on first run after merge will be the real verification.

Commit message (e650001):

Two bugs that have made every push to main fail since this workflow
was first added:

1. ghcr.io requires lowercase repository names. github.repository_owner
   on this repo is "AceDataCloud" (mixed case), which baked through to
   `ghcr.io/AceDataCloud/x402guard-api:...` and got rejected by GHCR
   with `ERROR: failed to build: invalid tag ... repository name must
   be lowercase`. Net effect: zero images have ever been pushed by CI;
   every deploy run on every push to main has failed at build time.

2. The rollout job passed GITHUB_RUN_ID into the deploy step, but
   deploy/run.sh reads BUILD_NUMBER (with a `local` fallback). So even
   if rollout ever ran (it never did — see #1), it would have applied
   manifests with image tag `local`, which does not exist in any
   registry. Renamed the workflow env to BUILD_NUMBER to match.

Fixes:

- Compute `owner_lc=$(echo "$owner" | tr A-Z a-z)` at the top of the
  `images` job and surface it via job-step outputs. No more env-level
  ${{ github.repository_owner }} interpolation that bakes mixed case
  into the image name.
- Rename the rollout step env from GITHUB_RUN_ID to BUILD_NUMBER so it
  matches what deploy/run.sh expects.

Verified deploy/production/*.yaml already use lowercase
`ghcr.io/acedatacloud/x402guard-{api,web}:${TAG}`, so the manifests
will resolve once images actually land on GHCR.

After merge:
- Push triggers `images` job → lowercase tag → GHCR push succeeds
- If `vars.DEPLOY_TO_K8S=true` and secret KUBECONFIG present, rollout
  step picks up the new tag via BUILD_NUMBER, applies manifests,
  pods roll to the new image.
- Without those repo settings, rollout silently no-ops (intentional —
  comment in the workflow already explained this gating).
Germey merged commit e650001 into main on May 10, 2026
acedatacloud-dev added a commit that referenced this pull request on May 10, 2026:
…pay_for_api caveat (#22)

Adds a "Live on devnet" badge + a quoted callout near the top with the
real 2026-05-10 verification result (3 spends, vault 4.00 -> 3.97 USDC,
finalized tx 249u8Pion...3y3D on Solscan). The customer who reported
"MCP could not be loaded" can now skim the top of the README, click the
Solscan link to confirm the on-chain side is live, and run the curl /
demo recipe to confirm their own MCP URL is healthy without any Claude
/ Cursor / SDK plumbing.

Concrete changes:
- "60-second verification" section near the top: 3 steps, all `curl` +
  `python scripts/demo.py`. End-state explicitly: "If steps 1-2 work,
  any `MCP could not be loaded` you see in Claude Desktop is a
  client-side problem".
- Spelled out the `aceguard_spend` request/response shape with a real
  finalized tx as the canonical example. Added the
  `recipient ATA must exist on devnet` pre-req inline (Anchor 3012),
  with the one-line `spl-token create-account` command to satisfy it.
- Pivoted Step 5 of the walkthrough from `pay_for_api` to
  `aceguard_spend`. Reason: api.acedata.cloud issues mainnet x402
  quotes (`EPjFWdd5...` mint, `5iVXFr...` payTo); the production
  x402guard deploy is on devnet, so the recipient ATA the on-chain
  program expects does not exist on this cluster. This is *expected*
  per .plans/X402GUARD.md and called out clearly so customers do not
  burn an afternoon trying to make that path work pre-mainnet flip.
- Updated Step 6 (boundary-in-action prompts) to use `aceguard_spend`
  invocations that map to actual Anchor errors today, instead of the
  pre-existing `pay_for_api` examples that no longer fire.

Pairs with #18 / #19 / #20 / #21. The mainnet flip stays the V2 step
.plans/X402GUARD.md already calls out (#11 / "Why devnet, not mainnet").

Co-authored-by: acedata-bot <bot@acedata.cloud>