feat(ami): single-instance AMI / appliance form factor #21
Merged
Adds the Packer template, install / first-boot / harden scripts, prod-shape
config yamls, and systemd units that bake a Nexus Gateway single-instance
appliance image (AWS Marketplace target, with on-prem VMware / KVM / bare-
metal reuse planned via the same install logic).
What's in the image:
- PostgreSQL 16, Valkey 9 with valkey-search, NATS JetStream
- The four Nexus Go services (nexus-hub, control-plane, ai-gateway,
compliance-proxy) + control-plane-ui Vite dist
- nginx HTTPS reverse proxy with /api/, /oauth/, /authserver/, /healthz
and SPA fallback
- Per-instance secrets, MITM CA, self-signed TLS cert (with IP SANs) and
randomised admin password all generated at first boot
- Marketplace per-instance-uniqueness invariant honoured (admin password
is random per launch, NOT shared across launches)
Fixes discovered iterating Packer builds in us-east-1:
AWS / Packer:
1. ami_description must be pure ASCII (em dash rejected at end of
build, AMI auto-deregistered)
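
Because the non-ASCII failure in fix 1 only surfaces at the end of a build, a pre-flight check can catch it earlier. The following is a minimal sketch, not the check used in this PR; `is_ascii` is a hypothetical helper, and it is stricter than necessary in that it also rejects control characters such as tabs.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper: succeed only if the argument consists
# solely of printable ASCII (space through tilde).
is_ascii() {
  # In the C locale grep works byte-wise, so every byte of a multi-byte
  # UTF-8 character (for example an em dash) falls outside the range.
  if printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'; then
    return 1
  fi
  return 0
}
```

A build wrapper could call `is_ascii "$description" || exit 1` before invoking Packer.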
systemd ordering:
2. nexus-first-boot.service: drop Before=postgresql.service (deadlocks
systemctl start postgresql called from inside the unit)
3. valkey.service: use RuntimeDirectory= instead of
ReadWritePaths=/var/run/valkey (the dir is on tmpfs, wiped at boot)
15. first-boot.sh kicks nexus-* + nginx at the tail of its run, to
clear sticky "Dependency failed" cascades from the boot race
21. same kick now includes nginx (which fails its ExecStartPre nginx -t
before first-boot-ca writes the cert)
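
The tail-of-run kick in fixes 15 and 21 could look roughly like the sketch below. This is an assumption about its shape, not the code in first-boot.sh: `kick_units` and the overridable `SYSTEMCTL` variable are hypothetical, and the real unit names are not listed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clear any failed state left by the boot race, then
# start each unit again. SYSTEMCTL can be overridden for a dry run.
kick_units() {
  local unit
  for unit in "$@"; do
    # reset-failed is harmless when the unit is not in a failed state.
    "${SYSTEMCTL:-systemctl}" reset-failed "$unit" || true
    "${SYSTEMCTL:-systemctl}" start "$unit"
  done
}
```

Running it with `SYSTEMCTL=echo kick_units nginx.service` prints the commands instead of executing them.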
Valkey + valkey-search:
4. install -m 0755 libsearch.so (Valkey rejects 0644 — no exec bit)
5. Valkey bumped to 9.0.4 (valkey-search 1.2.0 requires >= 9.0.1)
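
Both Valkey fixes can be guarded at install time. The sketch below is illustrative only: `version_ge` and `install_module` are hypothetical helper names, the paths passed to them are up to the caller, and `sort -V` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# True when version $1 is greater than or equal to version $2
# (relies on GNU sort's version ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Copy a module with mode 0755 and confirm the exec bit is set, since
# Valkey refuses to load a module installed as 0644.
install_module() {
  local src="$1" dest="$2"
  install -m 0755 "$src" "$dest"
  [ -x "$dest" ]
}
```

For example, `version_ge 9.0.4 9.0.1` succeeds while `version_ge 9.0.0 9.0.1` fails.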
Runtime quota / iteration:
6. vCPU limit guidance in operator doc (Standard family bucket 16)
first-boot idempotency:
7. first-boot-{secrets,ca,db}.sh skip cleanly if state already exists
8. first-boot-db.sh: `EXISTING_URL=$(grep ... || true)` so pipefail
does not abort the script when no DATABASE_URL line is present yet
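
Fixes 7 and 8 combine into a skip-if-present pattern. The sketch below shows one way to write it under `set -euo pipefail`; the grep pattern, the function names, and the env-file layout are assumptions, since the actual grep arguments are elided above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the current DATABASE_URL value, or nothing if the line is absent.
# Without "|| true", grep's exit status 1 (no match) would fail the
# pipeline under pipefail and abort the script under set -e.
read_database_url() {
  local env_file="$1" line
  line=$(grep -E '^DATABASE_URL=' "$env_file" | head -n1 || true)
  printf '%s' "${line#DATABASE_URL=}"
}

# Idempotent write: append the URL only when none exists yet.
ensure_database_url() {
  local env_file="$1" new_url="$2" existing
  existing=$(read_database_url "$env_file")
  if [ -n "$existing" ]; then
    echo "DATABASE_URL already present, skipping" >&2
    return 0
  fi
  printf 'DATABASE_URL=%s\n' "$new_url" >> "$env_file"
}
```

A second call with a different URL leaves the file unchanged, which is the behaviour a re-run of first boot needs.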
Node / Prisma:
9. first-boot-db.sh: prepend /opt/nexus/node/bin to PATH so npx
shebangs resolve `node`
10. drop removed-in-Prisma-7 `--skip-generate` flag from `prisma db push`
11. CREATE ROLE nexus WITH SUPERUSER so the seed can DISABLE TRIGGER
ALL on system RI triggers (PG is 127.0.0.1 + SCRAM only)
Per-instance config:
12. first-boot.sh stamps publicURL into the four yamls from IMDSv2
(public-ipv4 -> local-ipv4 -> hostname fallback)
13. control-plane.config.yaml has authServer.issuer = env override,
first-boot writes AUTH_SERVER_ISSUER to match publicURL
14. compliance-proxy.config.yaml now has the mq / registry / auth
blocks that the new validators require
17. first-boot appends https://<ip>/auth/callback to cp-ui
OAuthClient.redirectUris so the SPA's PKCE flow lands cleanly
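
The publicURL fallback chain in fix 12 might be sketched as follows. The IMDSv2 token and metadata endpoints are the standard AWS ones; the helper names, the 2-second timeouts, and the use of the local `hostname` command for the last fallback are assumptions rather than the PR's actual code.

```shell
#!/usr/bin/env bash
IMDS_BASE="http://169.254.169.254/latest"

# Fetch one metadata path via IMDSv2 (session token first, then the GET).
imds_get() {
  local path="$1" token
  token=$(curl -fsS -m 2 -X PUT "$IMDS_BASE/api/token" \
    -H 'X-aws-ec2-metadata-token-ttl-seconds: 60') || return 1
  curl -fsS -m 2 -H "X-aws-ec2-metadata-token: $token" \
    "$IMDS_BASE/meta-data/$path"
}

# Print the first non-empty argument; fail if all are empty.
first_nonempty() {
  local candidate
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s' "$candidate"
      return 0
    fi
  done
  return 1
}

# public-ipv4 -> local-ipv4 -> hostname, in that order.
resolve_public_host() {
  first_nonempty \
    "$(imds_get public-ipv4 2>/dev/null || true)" \
    "$(imds_get local-ipv4 2>/dev/null || true)" \
    "$(hostname 2>/dev/null || true)"
}
```

Keeping the selection logic in `first_nonempty` means it can be exercised off-instance, for example `first_nonempty "" "10.0.0.5"` prints `10.0.0.5`.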
Auth surface end-to-end:
16. nginx /oauth/ -> control-plane:3001 (PKCE authorize/token endpoints)
18. nginx /authserver/ -> control-plane:3001 (IDP list + password POST)
19. cert is regenerated with subjectAltName IPs and added to the
system CA trust bundle so the JWT verifier can fetch JWKS over
HTTPS at the public IP without skipping verification
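
For fix 19, a self-signed certificate with IP SANs can be produced along these lines. This is a sketch under assumptions: the helper names, the CN, the key size, the validity period, and the output file names are all hypothetical, and `-addext` requires OpenSSL 1.1.1 or newer.

```shell
#!/usr/bin/env bash
# Build a subjectAltName extension string from one or more IP addresses.
build_san() {
  local san="" ip
  for ip in "$@"; do
    san="${san:+$san,}IP:$ip"
  done
  printf 'subjectAltName=%s' "$san"
}

# Write tls.key and tls.crt into $1, with the remaining arguments as IP SANs.
make_self_signed() {
  local out_dir="$1"
  shift
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=nexus-appliance" \
    -keyout "$out_dir/tls.key" -out "$out_dir/tls.crt" \
    -addext "$(build_san "$@")"
}
```

On a distribution that ships `update-ca-trust`, the resulting certificate would then be copied into the trust anchors directory and `update-ca-trust` re-run so that HTTPS clients on the instance accept it.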
Docs:
- docs/developers/architecture/cross-cutting/deployment/
ami-appliance-architecture.md - full design rationale
- docs/operators/ops/ami-build.md - operator-facing build steps + common
failure modes
- docs/developers/architecture/README.md - trigger-table row for
nexus-ami/** -> the new arch doc
- top-level README - Deployment section pointing at the AMI form factor
- nexus-ami/README.md - quick build / test commands
Verified by:
- Eleven Packer builds — final build is clean
- Two fresh t3.medium launches reach all-9-units-active with no manual
intervention on first boot
- End-to-end OAuth flow: /oauth/authorize 302 -> /login -> POST
/authserver/password 200 -> /oauth/token 200 -> bearer token verifies
via JWKS -> /api/admin/me 200 -> /api/admin/me/permissions returns
151 admin actions
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- nexus-ami/ - Packer template + install / first-boot / harden scripts + prod-shape config yamls + systemd units that bake a Nexus Gateway single-instance appliance image (AWS Marketplace target; on-prem VMware / KVM / bare-metal reuses the same install logic later).
- Docs and build targets: the architecture doc (docs/developers/architecture/cross-cutting/deployment/ami-appliance-architecture.md), the operator-facing build guide (docs/operators/ops/ami-build.md), the trigger-table row mapping nexus-ami/** → the arch doc, a Deployment section in the top-level README.md, and make ami-build / make ami-stage targets.
- The image boots on a fresh t3.medium with no manual intervention and serves the admin UI end-to-end (OAuth/PKCE login → bearer → /api/admin/me → 151 admin actions).

What's in the image
- nexus-first-boot.service
- /var/log/nexus/admin-credentials.txt

Verification
- t3.medium launch: active in ~60s. All nine systemd units active.
- OAuth flow: /oauth/authorize 302 → /login → POST /authserver/password 200 → /oauth/token 200 → bearer verifies via JWKS → /api/admin/me 200 → 151 admin actions
- curl https://<public-ip>/.well-known/jwks.json without -k: 200 (cert SAN includes IP + CA anchor merged via update-ca-trust)
- aws ec2 modify-image-attribute --launch-permission UserId=679593333241: succeeded

Test plan
- cd nexus-ami && ./build.sh produces a working AMI (no manual intervention)
- Launch a t3.medium from the new AMI; all 9 nexus systemd units active
- sudo cat /var/log/nexus/admin-credentials.txt shows a populated random password
- Open https://<public-ip>/ in browser, log in, land on dashboard
- Share the AMI with 679593333241 and trigger Self-Service AMI Scan

🤖 Generated with Claude Code