fix(deploy): rewrite manifests to AceDataCloud platform conventions; site is live #14

Merged
acedatacloud-dev merged 1 commit into main from fix/k8s-manifests-and-deploy
May 4, 2026
Conversation

@acedatacloud-dev (Member)

PRs #11–13 shipped manifests modeled on a generic K8s setup. None of those actually fit the AceDataCloud TKE cluster + nginx-router ingress + wildcard-cert convention, so when the user opened https://x402guard.acedata.cloud/ they got a "Kubernetes Ingress Controller Fake Certificate" and a 404 (the LB had no rule for the host).

This PR aligns everything with the platform's conventions and the site is now live at https://x402guard.acedata.cloud/ with a real Let's Encrypt cert.

Conventions adopted (matching Wisdom + Nexior + MCPs/* in this org)

|                   | before (PR #11) | after |
| ----------------- | --------------- | ----- |
| namespace         | x402guard | acedatacloud |
| ingress class     | ingressClassName: nginx | kubernetes.io/ingress.class: nginx-router annotation |
| TLS secret        | x402guard-tls + cert-manager annotation | tls-wildcard-acedata-cloud (already in cluster) |
| image-pull secret | missing | docker-registry |
| build tag         | __BUILD__ | ${TAG} sed-substituted by run.sh |
| service names     | api / web | x402guard-api / x402guard-web (avoid collision in shared ns) |
| storage class     | cbs (default) | cbs-ssd (cbs is Immediate-binding zone-pinned, fails) |
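Taken together, the Ingress under these conventions looks roughly like the sketch below. Names, the TLS secret, and the api port (8000, from the nginx.conf proxy_pass) follow this PR; the web service port and the exact path set shown here are illustrative, not the verbatim manifest.

```yaml
# Illustrative Ingress under the platform conventions above.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: x402guard
  namespace: acedatacloud            # shared platform namespace
  annotations:
    kubernetes.io/ingress.class: nginx-router   # annotation, NOT spec.ingressClassName
spec:
  tls:
    - hosts:
        - x402guard.acedata.cloud
      secretName: tls-wildcard-acedata-cloud    # pre-existing *.acedata.cloud cert
  rules:
    - host: x402guard.acedata.cloud
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: x402guard-api
                port:
                  number: 8000
          - path: /
            pathType: Prefix
            backend:
              service:
                name: x402guard-web
                port:
                  number: 80        # web port assumed
```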

What lands

  • Deleted: namespace.yaml, configmap.yaml (env values inlined into Deployment)
  • Updated: api.yaml, web.yaml, ingress.yaml to platform conventions
  • Updated: deploy/run.sh (sed ${TAG}, applies 4 yaml in order)
  • New: postgres.yaml — single-replica StatefulSet on cbs-ssd / 10Gi PVC. Cluster has no shared Postgres; x402guard hosts its own.
  • Updated: docker-compose.yaml service names → x402guard-api / x402guard-web so nginx upstream matches both environments
  • Updated: web/deploy/nginx.conf proxy_pass to x402guard-api:8000
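The ${TAG} substitution that deploy/run.sh performs can be sketched as follows (the registry path and file name here are made up for illustration):

```shell
# Hedged sketch of the run.sh tag substitution: manifests carry a
# literal ${TAG} placeholder, and sed rewrites it with the build
# number before the result is applied to the cluster.
TAG=42
printf 'image: registry.example.com/x402guard-api:${TAG}\n' > /tmp/api-snippet.yaml
sed "s/\${TAG}/${TAG}/g" /tmp/api-snippet.yaml
# prints: image: registry.example.com/x402guard-api:42
```

In the real script the output would be piped to `kubectl apply -f -` rather than printed.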

Live verification

```
$ curl -sS https://x402guard.acedata.cloud/health
  {"status":"ok","version":"0.1.0"}
$ curl -sS https://x402guard.acedata.cloud/.well-known/x402guard
  {"service":"x402guard","version":"0.1.0","cluster":"mainnet",
   "agent_vault_program_id":"5s9rscxc...","usdc_mint":"EPjFWdd5..."}
$ openssl s_client ... -showcerts | openssl x509 -noout -subject -issuer
  subject=CN=acedata.cloud
  issuer=Let's Encrypt E8
```

Pods (kubectl -n acedatacloud get pods -l app=x402guard):

```
x402guard-api-...      1/1 Running  (×2)
x402guard-postgres-0   1/1 Running
x402guard-web-...      1/1 Running  (×2)
```

Bugs caught while bringing the cluster live

Worth recording so the next deploy doesn't hit them again:

  • exec format error: docker compose build on macOS produces darwin/arm64 images, but the cluster is amd64. Fix: docker buildx --platform linux/amd64 for the local-deploy fallback path. The CI workflow already does this implicitly via docker/build-push-action.
  • PVC Pending on cbs — cbs is Immediate-binding zone-pinned and the cluster's picked zone had no spare capacity. cbs-ssd uses WaitForFirstConsumer.
  • cbs-ssd minimum is 10 Gi — 5 Gi requests fail with "disk size is invalid. Must in [10, 32000]".
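The first bug suggests an arch guard on the local-deploy fallback path. This is a sketch of that idea, not code from this PR; the image tag is a placeholder, and the build command is echoed rather than executed:

```shell
# Hedged sketch: always pass an explicit --platform on the local-deploy
# fallback, regardless of host arch. On macOS arm64 a plain
# `docker compose build` produces images the amd64 cluster can't exec.
HOST_ARCH=$(uname -m)
BUILD_CMD="docker buildx build --platform linux/amd64"
if [ "$HOST_ARCH" = "arm64" ] || [ "$HOST_ARCH" = "aarch64" ]; then
  echo "host is ${HOST_ARCH}; forcing linux/amd64 cross-build"
fi
echo "$BUILD_CMD -t x402guard-api:dev ."
```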

Out of scope

  • The CI workflow .github/workflows/deploy.yaml doesn't run yet (DEPLOY_TO_K8S repo var unset). This first deploy was driven from a workstation using the kubeconfig pulled via .claude/scripts/tke.py. Subsequent deploys go through CI once cluster credentials are loaded into the GHCR-secrets vault.

Full commit message: fix(deploy): rewrite manifests to AceDataCloud platform conventions; site is live

PR #11/#12/#13 shipped manifests modeled on a generic K8s setup. None of
those actually fit the AceDataCloud TKE cluster + nginx-router ingress
+ wildcard-cert convention, so when the user opened
https://x402guard.acedata.cloud/ they got a "Kubernetes Ingress
Controller Fake Certificate" + 404 (the LB had no rule for the host).

This PR aligns everything with the platform's conventions and the site
is now live at https://x402guard.acedata.cloud/ with a real Let's
Encrypt cert from the existing tls-wildcard-acedata-cloud secret.

Conventions adopted (matching Wisdom + Nexior + MCPs/* in this org):

  namespace                 acedatacloud (was: x402guard)
  ingress class             annotation kubernetes.io/ingress.class:
                            nginx-router (was: ingressClassName: nginx)
  TLS secret                tls-wildcard-acedata-cloud, already in the
                            cluster, signed *.acedata.cloud (was:
                            x402guard-tls + cert-manager annotation)
  image-pull secret         docker-registry, already in the namespace
                            (was: missing imagePullSecrets entirely)
  build tag                 ${TAG} substituted by sed in deploy/run.sh
                            (was: __BUILD__)
  service names             x402guard-api / x402guard-web — qualified
                            with project prefix to avoid colliding with
                            other tenants in acedatacloud namespace
                            (was: api / web)
  storage class             cbs-ssd (WaitForFirstConsumer, 10Gi minimum)
                            (was: cbs default — fails to bind because
                            cbs is Immediate-binding zone-pinned)
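Applied to a workload, the conventions above yield Deployment fragments shaped like this sketch. Replica count (2) and the app=x402guard labels come from the pod listing below; the image path is a placeholder, not the repo's actual registry:

```yaml
# Illustrative api Deployment fragment under the conventions above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: x402guard-api
  namespace: acedatacloud
spec:
  replicas: 2
  selector:
    matchLabels: { app: x402guard, component: api }
  template:
    metadata:
      labels: { app: x402guard, component: api }
    spec:
      imagePullSecrets:
        - name: docker-registry          # pre-existing in the namespace
      containers:
        - name: api
          image: registry.example.com/x402guard-api:${TAG}  # placeholder path; ${TAG} sed-substituted by run.sh
          ports:
            - containerPort: 8000
```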

What changes:

  deploy/production/
    namespace.yaml             DELETED (use existing acedatacloud ns)
    configmap.yaml             DELETED (env values inlined into Deployment)
    api.yaml                   namespace + names + imagePullSecrets +
                               annotation; ${TAG} placeholder
    web.yaml                   same
    ingress.yaml               nginx-router annotation;
                               tls-wildcard-acedata-cloud;
                               5 path rules (/api, /mcp, /.well-known,
                               /health, /) all on a single Ingress
    postgres.yaml              NEW — single-replica StatefulSet on cbs-ssd
                               with a 10Gi PVC. POSTGRES_PASSWORD reads
                               from the same x402guard-secrets the api
                               consumes. Cluster has no shared Postgres
                               so x402guard hosts its own.

  deploy/run.sh                Sed ${TAG} -> $BUILD_NUMBER + apply 4 yaml
                               in order; rollout wait + /health probe.
                               Bails clearly if the secret is missing.

  docker-compose.yaml          Service names renamed
                               api -> x402guard-api / web -> x402guard-web
                               so the nginx upstream `x402guard-api`
                               works in both docker-compose and K8s
                               without separate configs.

  web/deploy/nginx.conf        proxy_pass updated to http://x402guard-api:8000
                               in all 4 locations.
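A minimal sketch of what postgres.yaml plausibly contains, per the description above. Storage class, size, replica count, and the x402guard-secrets name come from this PR; the Postgres image tag, secret key name, and label scheme are assumptions:

```yaml
# Sketch of the single-replica Postgres StatefulSet described above.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: x402guard-postgres
  namespace: acedatacloud
spec:
  serviceName: x402guard-postgres
  replicas: 1
  selector:
    matchLabels: { app: x402guard, component: postgres }
  template:
    metadata:
      labels: { app: x402guard, component: postgres }
    spec:
      containers:
        - name: postgres
          image: postgres:16             # version assumed
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: x402guard-secrets
                  key: POSTGRES_PASSWORD # key name assumed
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: cbs-ssd        # WaitForFirstConsumer binding
        resources:
          requests:
            storage: 10Gi                # cbs-ssd minimum
```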

Live verification (against https://x402guard.acedata.cloud/):

  $ curl -sS https://x402guard.acedata.cloud/health
    {"status":"ok","version":"0.1.0"}
  $ curl -sS https://x402guard.acedata.cloud/.well-known/x402guard
    {"service":"x402guard","version":"0.1.0","cluster":"mainnet",
     "agent_vault_program_id":"5s9rscxc...","usdc_mint":"EPjFWdd5..."}
  $ curl -sS https://x402guard.acedata.cloud/ | grep '<title>'
    <title>x402guard - Solana-native AI agent wallets</title>
  $ openssl s_client ... | openssl x509 -noout -subject -issuer
    subject=CN=acedata.cloud
    issuer=Let's Encrypt E8

  Pods (kubectl -n acedatacloud get pods -l app=x402guard):
    x402guard-api-79c7d796b7-cdlpd   1/1 Running
    x402guard-api-79c7d796b7-f9mpc   1/1 Running
    x402guard-postgres-0             1/1 Running
    x402guard-web-5869d7cd49-29772   1/1 Running
    x402guard-web-5869d7cd49-zvgcb   1/1 Running

Bugs caught while bringing the cluster live (not in this PR but worth
recording so the next deploy doesn't hit them again):

  - Initial image push was darwin/arm64 because docker compose build
    uses host arch on macOS. Cluster is amd64 -> CrashLoopBackOff with
    "exec format error". Fix: use docker buildx --platform linux/amd64.
    The CI workflow .github/workflows/deploy.yaml already does this
    via docker/build-push-action which defaults to linux/amd64, but
    the local-deploy fallback path needs the explicit platform flag.

  - cbs storage class is Immediate-binding zone-pinned and our cluster
    happened to have no spare capacity in the picked zone, so PVCs
    stayed Pending. cbs-ssd uses WaitForFirstConsumer and binds in
    the same zone the pod actually scheduled into.

  - cbs-ssd minimum disk size is 10Gi (Tencent Cloud limit). 5Gi
    requests fail with "disk size is invalid. Must in [10, 32000]".
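Put together, a claim that satisfies both storage constraints looks roughly like this (the claim name is illustrative; in practice it is generated from the StatefulSet's volumeClaimTemplates):

```yaml
# Illustrative PVC meeting both cbs-ssd constraints noted above:
# WaitForFirstConsumer binding and the 10Gi Tencent Cloud minimum.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-x402guard-postgres-0   # name pattern assumed
  namespace: acedatacloud
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: cbs-ssd
  resources:
    requests:
      storage: 10Gi   # below 10Gi fails: "disk size is invalid. Must in [10, 32000]"
```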

Out of scope:
  - The CI workflow .github/workflows/deploy.yaml doesn't run yet
    (DEPLOY_TO_K8S repo var unset). This first deploy was driven from
    a workstation using the kubeconfig pulled via .claude/scripts/tke.py.
    Subsequent deploys will go through CI once the cluster credentials
    are loaded into the GHCR-secrets vault.
acedatacloud-dev merged commit 0c415ea into main May 4, 2026
1 check passed
acedatacloud-dev deleted the fix/k8s-manifests-and-deploy branch May 4, 2026 18:43