feat(ami): single-instance AMI / appliance form factor #21

Merged: Nexus-ABC merged 1 commit into main from feat/ami-marketplace-appliance on May 29, 2026

Nexus-ABC commented May 29, 2026

Summary

  • Adds nexus-ami/ — Packer template + install / first-boot / harden scripts + prod-shape config yamls + systemd units that bake a Nexus Gateway single-instance appliance image (AWS Marketplace target; on-prem VMware / KVM / bare-metal reuses the same install logic later).
  • Adds the deployment architecture doc (docs/developers/architecture/cross-cutting/deployment/ami-appliance-architecture.md), the operator-facing build guide (docs/operators/ops/ami-build.md), the trigger-table row mapping nexus-ami/** → the arch doc, a Deployment section in the top-level README.md, and make ami-build / make ami-stage targets.
  • Captures 21 bug fixes discovered across eleven Packer iterations — full list in the commit message; the resulting AMI boots clean on a fresh t3.medium with no manual intervention and serves the admin UI end-to-end (OAuth/PKCE login → bearer → /api/admin/me → 151 admin actions).

What's in the image

| Layer | Component |
| --- | --- |
| OS | Amazon Linux 2023 (HVM, EBS, gp3) |
| Cache deps | PostgreSQL 16, Valkey 9.0.4 + valkey-search 1.2.0, NATS 2 JetStream, Node 20 |
| Services | nexus-hub (3060), nexus-control-plane (3001), nexus-gateway (3050), nexus-proxy (3128), nginx (443) |
| Per-instance state | secrets, MITM CA, self-signed TLS cert with IP SAN, randomised admin password (all generated by nexus-first-boot.service) |
| Marketplace invariant | per-launch unique admin password at /var/log/nexus/admin-credentials.txt |
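
The per-launch password invariant in the table above can be illustrated with a minimal sketch. The function names are hypothetical and the generation method is an assumption; this PR does not show how nexus-first-boot.service actually produces the password. The sketch reads 16 bytes from /dev/urandom and skips regeneration when the credentials file already exists.

```shell
# Hypothetical helper: emits 32 hex characters (128 bits) from the kernel CSPRNG.
generate_admin_password() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# Write the password to a file readable only by its owner. If the file is
# already populated, do nothing, so a re-run does not rotate the credential.
write_admin_credentials() {
  cred_file=$1
  [ -s "$cred_file" ] && return 0
  ( umask 077 && printf 'admin password: %s\n' "$(generate_admin_password)" > "$cred_file" )
}
```

Because the value is generated on the instance at first boot rather than at bake time, two launches of the same AMI end up with different passwords.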

Verification

| Run | What |
| --- | --- |
| Eleven Packer builds | Iteratively uncovered the 21 fixes. Final build is clean. |
| Fresh t3.medium launch | First boot reaches active in ~60s. All nine systemd units active. |
| OAuth/PKCE login on fresh boot | /oauth/authorize 302 → /login → POST /authserver/password 200 → /oauth/token 200 → bearer verifies via JWKS → /api/admin/me 200 → 151 admin actions |
| HTTPS without -k | `curl https://<public-ip>/.well-known/jwks.json` returns 200 (cert SAN includes IP + CA anchor merged via update-ca-trust) |
| Per-instance secret uniqueness | Two launches produced two different admin passwords |
| AWS Marketplace scanner share | `aws ec2 modify-image-attribute --launch-permission UserId=679593333241` succeeded |

Test plan

  • cd nexus-ami && ./build.sh produces a working AMI (no manual intervention)
  • Launch a t3.medium from the new AMI; all 9 nexus systemd units active
  • sudo cat /var/log/nexus/admin-credentials.txt shows a populated random password
  • Open https://<public-ip>/ in browser, log in, land on dashboard
  • Launch a SECOND instance from the same AMI and confirm DIFFERENT admin password (Marketplace invariant)
  • Share AMI + snapshot with 679593333241 and trigger Self-Service AMI Scan
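
The "all 9 nexus systemd units active" step in the test plan could be scripted along these lines. This is a sketch only: `all_active` is a hypothetical helper, and the unit names passed to `systemctl` in the usage line are assumptions based on the service names in the image table.

```shell
# Reads one state per line on stdin (the format `systemctl is-active u1 u2 ...`
# prints) and returns non-zero as soon as a line is not exactly "active".
# Note: empty input counts as success, so callers should pass at least one unit.
all_active() {
  while IFS= read -r state; do
    [ "$state" = "active" ] || return 1
  done
  return 0
}
```

Example use on a launched instance: `systemctl is-active nexus-hub nexus-control-plane nexus-gateway nexus-proxy nginx | all_active && echo ok`.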

🤖 Generated with Claude Code

Adds the Packer template, install / first-boot / harden scripts, prod-shape
config yamls, and systemd units that bake a Nexus Gateway single-instance
appliance image (AWS Marketplace target, with on-prem VMware / KVM / bare-
metal reuse planned via the same install logic).

What's in the image:
  - PostgreSQL 16, Valkey 9 with valkey-search, NATS JetStream
  - The four Nexus Go services (nexus-hub, control-plane, ai-gateway,
    compliance-proxy) + control-plane-ui Vite dist
  - nginx HTTPS reverse proxy with /api/, /oauth/, /authserver/, /healthz
    and SPA fallback
  - Per-instance secrets, MITM CA, self-signed TLS cert (with IP SANs) and
    randomised admin password all generated at first boot
  - Marketplace per-instance-uniqueness invariant honoured (admin password
    is random per launch, NOT shared across launches)

Fixes discovered iterating Packer builds in us-east-1:

  AWS / Packer:
    1.  ami_description must be pure ASCII (em dash rejected at end of
        build, AMI auto-deregistered)
  systemd ordering:
    2.  nexus-first-boot.service: drop Before=postgresql.service (deadlocks
        systemctl start postgresql called from inside the unit)
    3.  valkey.service: use RuntimeDirectory= instead of
        ReadWritePaths=/var/run/valkey (the dir is on tmpfs, wiped at boot)
   15.  first-boot.sh kicks nexus-* + nginx at the tail of its run, to
        clear sticky "Dependency failed" cascades from the boot race
   21.  same kick now includes nginx (which fails its ExecStartPre nginx -t
        before first-boot-ca writes the cert)
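
  A sketch of fix 3, expressed as a systemd drop-in emitted from shell. The
  drop-in form and the file path in the usage note are assumptions; the PR
  edits valkey.service directly. RuntimeDirectory= is a standard systemd
  directive that creates the directory under /run at each service start.

```shell
# Print a drop-in for valkey.service. RuntimeDirectory=valkey makes systemd
# create /run/valkey on every start (owned by the unit's User) and remove it
# on stop, so nothing relies on a directory surviving the tmpfs wipe at boot.
valkey_dropin() {
  cat <<'EOF'
[Service]
RuntimeDirectory=valkey
RuntimeDirectoryMode=0755
EOF
}
```

  Hypothetical usage: `valkey_dropin > /etc/systemd/system/valkey.service.d/10-runtime-dir.conf`
  followed by `systemctl daemon-reload`.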
  Valkey + valkey-search:
    4.  install -m 0755 libsearch.so (Valkey rejects 0644 — no exec bit)
    5.  Valkey bumped to 9.0.4 (valkey-search 1.2.0 requires >= 9.0.1)
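
  The version constraint in fix 5 could be guarded at build time with a check
  like the one below. `version_ge` is a hypothetical helper and relies on
  `sort -V` (GNU coreutils version sort); the PR does not say whether the
  install script performs such a check.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# sort -V orders the two strings as versions; if the smaller one is $2,
# then $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

  Example: `version_ge 9.0.4 9.0.1 || { echo "valkey too old for valkey-search 1.2.0" >&2; exit 1; }`.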
  Runtime quota / iteration:
    6.  vCPU limit guidance in operator doc (Standard family bucket 16)
  first-boot idempotency:
    7.  first-boot-{secrets,ca,db}.sh skip cleanly if state already exists
    8.  first-boot-db.sh: `EXISTING_URL=$(grep ... || true)` so pipefail
        does not abort the script when no DATABASE_URL line is present yet
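
  Fixes 7 and 8 can be illustrated with a small sketch. The function name and
  the env-file layout (one `DATABASE_URL=...` line) are assumptions; the real
  first-boot-db.sh is not shown in this PR. The point is that grep exits 1
  when nothing matches, which under `set -e` with pipefail would abort the
  script unless the status is neutralised with `|| true`.

```shell
# Enable pipefail only where the running shell supports it.
(set -o pipefail) 2>/dev/null && set -o pipefail

# Print the DATABASE_URL value from an env file, or nothing if the file or
# the line is absent. `|| true` turns grep's "no match" status into success.
existing_database_url() {
  env_file=$1
  [ -f "$env_file" ] || return 0
  grep '^DATABASE_URL=' "$env_file" | head -n 1 | cut -d= -f2- || true
}
```

  A caller can then skip provisioning when the result is non-empty, which is
  one way to get the "skip cleanly if state already exists" behaviour.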
  Node / Prisma:
    9.  first-boot-db.sh: prepend /opt/nexus/node/bin to PATH so npx
        shebangs resolve `node`
   10.  drop removed-in-Prisma-7 `--skip-generate` flag from `prisma db push`
   11.  CREATE ROLE nexus WITH SUPERUSER so the seed can DISABLE TRIGGER
        ALL on system RI triggers (PG is 127.0.0.1 + SCRAM only)
  Per-instance config:
   12.  first-boot.sh stamps publicURL into the four yamls from IMDSv2
        (public-ipv4 -> local-ipv4 -> hostname fallback)
   13.  control-plane.config.yaml has authServer.issuer = env override,
        first-boot writes AUTH_SERVER_ISSUER to match publicURL
   14.  compliance-proxy.config.yaml now has the mq / registry / auth
        blocks that the new validators require
   17.  first-boot appends https://<ip>/auth/callback to cp-ui
        OAuthClient.redirectUris so the SPA's PKCE flow lands cleanly
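
  The fallback chain in fix 12 might look like the sketch below. Function
  names are hypothetical; the IMDSv2 token and metadata endpoints are the
  standard EC2 ones, but how first-boot.sh actually queries them is not shown
  in this PR.

```shell
# Print the first non-empty argument; fail if all are empty.
first_nonempty() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Fetch one metadata key via IMDSv2. Prints nothing when the token request
# fails or the key is absent (public-ipv4 is a 404 without a public address).
imds_get() {
  token=$(curl -sf -m 2 -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 0
  curl -sf -m 2 -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/$1" || true
}

# public-ipv4 -> local-ipv4 -> hostname
resolve_public_host() {
  first_nonempty "$(imds_get public-ipv4)" "$(imds_get local-ipv4)" "$(hostname)"
}
```

  The resolved host would then be stamped into publicURL and
  AUTH_SERVER_ISSUER so that the issuer and the redirect URI agree.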
  Auth surface end-to-end:
   16.  nginx /oauth/ -> control-plane:3001 (PKCE authorize/token endpoints)
   18.  nginx /authserver/ -> control-plane:3001 (IDP list + password POST)
   19.  cert is regenerated with subjectAltName IPs and added to the
        system CA trust bundle so the JWT verifier can fetch JWKS over
        HTTPS at the public IP without skipping verification
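
  Fix 19 can be sketched as follows, assuming OpenSSL 1.1.1 or newer (for
  `-addext`). The function names, key type, validity period, and output file
  names are illustrative assumptions, not the values used by the real
  first-boot-ca script.

```shell
# Create a self-signed certificate whose subjectAltName carries the given IP.
make_ip_cert() {
  ip=$1 out_dir=$2
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=$ip" \
    -addext "subjectAltName=IP:$ip" \
    -keyout "$out_dir/server.key" -out "$out_dir/server.crt" 2>/dev/null
}

# Succeeds if the certificate text lists the IP as a SAN entry.
# (Substring match, so it is a smoke check rather than a strict comparison.)
cert_has_ip_san() {
  openssl x509 -in "$1" -noout -text | grep -F "IP Address:$2" >/dev/null
}
```

  On Amazon Linux 2023, copying the certificate into
  /etc/pki/ca-trust/source/anchors/ and running `update-ca-trust` adds it to
  the system bundle, which is what lets a JWKS fetch over HTTPS succeed
  without disabling verification.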

Docs:
  - docs/developers/architecture/cross-cutting/deployment/
    ami-appliance-architecture.md - full design rationale
  - docs/operators/ops/ami-build.md - operator-facing build steps + common
    failure modes
  - docs/developers/architecture/README.md - trigger-table row for
    nexus-ami/** -> the new arch doc
  - top-level README - Deployment section pointing at the AMI form factor
  - nexus-ami/README.md - quick build / test commands

Verified by:
  - Eleven Packer builds — final build is clean
  - Two fresh t3.medium launches reach all-9-units-active with no manual
    intervention on first boot
  - End-to-end OAuth flow: /oauth/authorize 302 -> /login -> POST
    /authserver/password 200 -> /oauth/token 200 -> bearer token verifies
    via JWKS -> /api/admin/me 200 -> /api/admin/me/permissions returns
    151 admin actions

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Nexus-ABC force-pushed the feat/ami-marketplace-appliance branch from 41c1055 to 0695776 on May 29, 2026 05:31
Nexus-ABC merged commit 5da0876 into main on May 29, 2026
5 checks passed
Nexus-ABC deleted the feat/ami-marketplace-appliance branch on June 3, 2026 14:59