Stack: MIE Create-a-Container (frontend + backend) + Neon (Postgres) + OpenAI API.

Status: Current deployment reference for the merged WorkWell Measure Studio stack.

Cost target: keep the live stack under about $25/month.

The primary demo deployment runs on MIE's internal container platform (os.mieweb.org).
One instance only: TWH — Total Worker Health. Encompasses all OSHA safety + eCQM wellness measures.
| Service | Hostname | Image |
|---|---|---|
| Frontend | twh.os.mieweb.org | `ghcr.io/taleef7/workwell-twh-frontend` |
| Backend API | twh-api.os.mieweb.org | `ghcr.io/taleef7/workwell-api` |
Push to main triggers .github/workflows/deploy-twh-mieweb.yml which:
- Builds the backend image tagged with `latest` + `sha-<SHA>`
- Builds the frontend image with TWH branding baked in via build-args
- Deploys both containers to MIE via `deploy-mieweb-container.sh`
| Secret | Purpose |
|---|---|
| `LAUNCHPAD_API_URL` | MIE Create-a-Container API base URL |
| `LAUNCHPAD_API_KEY` | MIE API authentication key |
| `DATABASE_URL_TWH` | Neon pooled connection string for TWH instance |
| `OPENAI_API_KEY` | AI services (Draft Spec, Explain Why Flagged) |
| `WORKWELL_AUTH_JWT_SECRET_TWH` | JWT signing secret for TWH instance |
The backend detects WORKWELL_INSTANCE=twh (set in the workflow) and seeds:
- All 4 OSHA surveillance measures with full CQL (Audiogram, HAZWOPER, TB, Flu)
- 4 HEDIS wellness catalog measures (Cholesterol, BMI, Diabetes HbA1c, Hypertension)
- All 49 CMS eCQM catalog entries (Draft, awaiting CQL authoring)
Use workflow_dispatch with replace_existing: true from the GitHub Actions UI.
| Layer | Service | Tier | Cost |
|---|---|---|---|
| Frontend | Vercel | Hobby | $0 |
| Backend | Fly.io | shared-cpu-1x, 512MB | ~$2/mo |
| Postgres | Neon | Free | $0 (3GB cap) |
| AI | OpenAI API | direct, budget-capped | variable |
| Domain | Vercel subdomain | n/a | $0 |
Fly's free 256MB machine runs out of memory (OOM) under Spring Boot, so do not use it.
Fallback if Fly cost is a problem: Render free tier (cold-start tradeoff, ~30s first hit per inactive period).
- GitHub account, repo `workwell-measure-studio`
- Fly CLI: `iwr https://fly.io/install.ps1 -useb | iex` (Windows) or `curl -L https://fly.io/install.sh | sh`
- Vercel CLI: `pnpm i -g vercel`
- Neon account + project created
- OpenAI API key with a hard monthly budget cap set in console billing
- Create project `workwell-measure-studio`, region us-east, Postgres 16
- Copy pooled connection string (for app)
- Copy direct connection string (for Flyway migrations)
- Save as repo secrets: `DATABASE_URL`, `DATABASE_URL_DIRECT`

Do not use neonctl projects create unless it supports pg_version=16; the current CLI defaults to Postgres 17 and is not compliant with the locked stack.
cd backend
fly launch --no-deploy
fly secrets set DATABASE_URL=<neon-pooled>
fly secrets set DATABASE_URL_DIRECT=<neon-direct>
fly secrets set OPENAI_API_KEY=<key>
fly secrets set SPRING_PROFILES_ACTIVE=prod
fly secrets set WORKWELL_AUTH_ENABLED=true
fly secrets set WORKWELL_AUTH_JWT_SECRET=<strong-random-secret>
fly secrets set WORKWELL_AUTH_COOKIE_SAME_SITE=None
fly secrets set WORKWELL_AUTH_COOKIE_SECURE=true

The frontend (Vercel) and backend (Fly) are different registrable domains, so every browser→API call is cross-site. The refresh-token cookie must be `SameSite=None; Secure` or the browser never sends it on the cross-site `POST /api/auth/refresh` fetch; silent token refresh then fails and users are logged out on every page reload. Production startup now fails fast if `WORKWELL_AUTH_COOKIE_SAME_SITE` is not `None` or `WORKWELL_AUTH_COOKIE_SECURE` is not `true`.
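
The fail-fast rule above can be mirrored locally before running `fly secrets set`. The following is a minimal sketch, not the backend's actual startup check: `check_cookie_settings` is a hypothetical helper name, and the exact, case-sensitive string comparison is an assumption about how the values are interpreted.

```shell
# Hypothetical pre-deploy helper (not part of the repo).
# check_cookie_settings SAME_SITE SECURE
# Succeeds only when the pair is valid for a cross-site production deploy.
# Assumption: values are compared as exact, case-sensitive strings.
check_cookie_settings() {
  same_site="$1"
  secure="$2"
  if [ "$same_site" != "None" ]; then
    echo "WORKWELL_AUTH_COOKIE_SAME_SITE must be None (got: $same_site)"
    return 1
  fi
  if [ "$secure" != "true" ]; then
    echo "WORKWELL_AUTH_COOKIE_SECURE must be true (got: $secure)"
    return 1
  fi
  return 0
}
```

For example, `check_cookie_settings None true` succeeds, while `check_cookie_settings Lax true` prints the reason and returns non-zero.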
Edit fly.toml: memory = "512mb", region = closest to you (e.g., ord, iad), and keep min_machines_running = 1 if you need a stable remote MCP connection.
Stop after wiring the secrets and project settings. Deploy only after the stack is provisioned and verified.
First deploy verification:
fly deploy
curl https://<app>.fly.dev/actuator/health # expect {"status":"UP"}

- Import GitHub repo, root directory `frontend/`
- Framework: Next.js (auto-detected)
- Env vars:
  - `NEXT_PUBLIC_API_BASE_URL` = Fly app URL (e.g., `https://workwell-measure-studio-api.fly.dev`)
  - `NEXT_PUBLIC_APP_NAME` = `WorkWell Measure Studio`
  - `NEXT_PUBLIC_DEMO_MODE` = `true` only for local/demo builds that should prefill the login form
- Stop after project connection and env configuration. First deploy from `main` happens after the stack is provisioned and verified.

- Get API key from platform.openai.com
- Set $20/mo hard usage limit in billing
- Save as Fly secret only (never expose to frontend)
| Var | Where | Purpose |
|---|---|---|
| `DATABASE_URL` | Fly | Pooled Neon connection for app runtime |
| `DATABASE_URL_DIRECT` | Fly | Direct Neon connection for Flyway migrations |
| `OPENAI_API_KEY` | Fly | AI calls (drafting and explanation surfaces) |
| `SPRING_PROFILES_ACTIVE` | Fly | Always `prod` in deployed env |
| `WORKWELL_AUTH_ENABLED` | Fly | Enable stub auth; set `true` in deployed env |
| `WORKWELL_AUTH_JWT_SECRET` | Fly | Required when auth is enabled; use a strong secret |
| `WORKWELL_AUTH_COOKIE_SAME_SITE` | Fly | Refresh-cookie SameSite. Must be `None` in production (cross-site Vercel↔Fly). Default `Lax` for local same-origin dev. |
| `WORKWELL_AUTH_COOKIE_SECURE` | Fly | Refresh-cookie Secure flag. Must be `true` in production (required for `SameSite=None`). Default `false` for local HTTP dev. |
| `NEXT_PUBLIC_API_BASE_URL` | Vercel | Backend URL for fetch calls |
| `NEXT_PUBLIC_APP_NAME` | Vercel | App display name |
| `NEXT_PUBLIC_DEMO_MODE` | Vercel | Prefill login form for local/demo builds only |
| `WORKWELL_EMAIL_PROVIDER` | Fly | Outreach email provider. Stays `simulated` on the demo stack (default + CLAUDE.md hard rule). |
| `WORKWELL_EMAIL_SENDGRID_API_KEY` | Fly | SendGrid API key. Wiring exists in code but must remain unset on the demo stack; only set in an explicit non-demo deployment alongside `WORKWELL_EMAIL_PROVIDER=sendgrid`. |
| `WORKWELL_EMAIL_FROM_ADDRESS` | Fly | From address for outreach (default `noreply@workwell-demo.dev`). |
| `WORKWELL_EMAIL_FROM_NAME` | Fly | From display name (default `WorkWell Measure Studio`). |

.env.example at repo root mirrors this list (without values). At present, env vars must be verified manually before deploy; the existing CI workflow does not validate deployment secrets or Vercel env configuration.
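
Because verification is manual, a small local check can catch unset variables before a deploy. The sketch below is an illustration only: `check_required_env` and `REQUIRED_VARS` are hypothetical names, the list covers just the Fly-side variables from the table that have no documented default, and it inspects the current shell environment (for example after sourcing a local env file), not the secrets stored in Fly or Vercel.

```shell
# Hypothetical pre-deploy check (not part of the repo).
# Assumption: this list mirrors the required Fly-side rows of the table; adjust to match .env.example.
REQUIRED_VARS="DATABASE_URL DATABASE_URL_DIRECT OPENAI_API_KEY SPRING_PROFILES_ACTIVE WORKWELL_AUTH_ENABLED WORKWELL_AUTH_JWT_SECRET WORKWELL_AUTH_COOKIE_SAME_SITE WORKWELL_AUTH_COOKIE_SECURE"

# check_required_env NAME...
# Prints "missing: NAME" for each unset or empty variable; returns non-zero if any is missing.
check_required_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup: read the value of the variable whose name is stored in $name.
    # Only pass trusted variable names here, since eval executes its argument.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be `check_required_env $REQUIRED_VARS || exit 1`, with `$REQUIRED_VARS` left unquoted so the shell splits it into separate names.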
The demo stack runs WORKWELL_EMAIL_PROVIDER=simulated. Outreach actions never send a real
email — each attempt is logged and written to outreach_delivery_log with status=SIMULATED,
visible in the Admin → Outreach Delivery Log panel. SendGrid wiring (com.workwell.notification.EmailService)
exists for post-demo / non-demo use only and is exercised solely when both
WORKWELL_EMAIL_PROVIDER=sendgrid and WORKWELL_EMAIL_SENDGRID_API_KEY are set; if the
provider is sendgrid but no key is configured it degrades safely back to a simulated send.
Do not set WORKWELL_EMAIL_SENDGRID_API_KEY on the demo stack.
The non-prod POST /api/admin/demo-reset endpoint (admin-only, @Profile("!prod")) truncates
volatile demo tables including audit_events; it returns 403 under the prod profile.
Active deploy workflow: .github/workflows/deploy-twh-mieweb.yml
- Triggers on every push to `main` and via `workflow_dispatch`
- Builds backend + frontend Docker images, pushes to GHCR, deploys both containers to MIE

CI workflow: .github/workflows/ci.yml
- Runs backend build + tests
- Runs frontend lint
- Does not deploy (deploy is separate workflow above)
- Backend: `GET /actuator/health` → `{"status":"UP"}`
- Frontend: `GET /` → 200 OK
- DB: from Fly machine, `fly ssh console` → `psql $DATABASE_URL_DIRECT -c "SELECT 1"`

Post-deploy smoke checklist (MVP complete surface):
- `GET /actuator/health` -> `200`
- `GET /api/runs?limit=1` -> `200`
- `GET /api/cases?status=open` -> `200`
- `GET /api/exports/runs?format=csv` -> `200`
- `GET /api/exports/outcomes?format=csv&runId=<latest-run-id>` -> `200`
- `GET /api/exports/cases?format=csv&status=open` -> `200`
- `GET /api/audit-events/export?format=csv` -> `200`
- `GET /api/admin/integrations` -> `200`
- `POST /api/admin/integrations/mcp/sync` -> `200`
- `POST /api/cases/{id}/actions/outreach/delivery?deliveryStatus=SENT` -> `200`
- `GET /api/cases/{id}` confirms `latestOutreachDeliveryStatus=SENT`

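
The read-only GET checks in the checklist can be scripted. This is a sketch under assumptions: `smoke_get` and `run_smoke_gets` are hypothetical helper names, no credentials are sent (add an auth header to the `curl` call if the deployment requires one), and the checks that need a run ID, a case ID, or a POST are left out because they depend on live data.

```shell
# Hypothetical smoke-check helpers (not part of the repo).

# smoke_get BASE_URL PATH
# Prints "<status> <path>" and succeeds only when the HTTP status is 200.
smoke_get() {
  base_url="$1"
  path="$2"
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
  status="$(curl -s -o /dev/null -w '%{http_code}' "$base_url$path")"
  echo "$status $path"
  [ "$status" = "200" ]
}

# run_smoke_gets BASE_URL
# Runs every parameter-free GET check from the checklist and succeeds only if all return 200.
run_smoke_gets() {
  base_url="$1"
  failures=0
  for path in \
    "/actuator/health" \
    "/api/runs?limit=1" \
    "/api/cases?status=open" \
    "/api/exports/runs?format=csv" \
    "/api/exports/cases?format=csv&status=open" \
    "/api/audit-events/export?format=csv" \
    "/api/admin/integrations"
  do
    smoke_get "$base_url" "$path" || failures=$((failures + 1))
  done
  [ "$failures" -eq 0 ]
}
```

Calling `run_smoke_gets "https://<app>.fly.dev"` prints one line per endpoint and returns non-zero if any check fails, so it can gate a later step in a script.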
Add Fly HTTP check every 30s on /actuator/health. Free, alerts on 3 failures.
fly releases list
fly releases rollback <version>

Or redeploy a previous SHA:

git checkout <sha>
fly deploy

On Vercel: Dashboard → Deployments → previous → Promote to Production.

Each schema migration creates a branch. Promote previous branch to main from Neon dashboard.
Daily check while the stack is live:
- Fly dashboard: Usage tab, projected monthly
- Neon dashboard: storage + compute consumed
- OpenAI usage dashboard: today's spend
If any approaches limit, fix that day. Don't wait.
Fly deploy fails with OOM
- Verify `memory = "512mb"` in `fly.toml`
- Reduce JVM heap: `JAVA_OPTS=-Xmx384m -Xss256k`
- Check `fly logs` for OOMKilled

Neon connection limit hit
- Use pooled connection string (`DATABASE_URL`), not direct, in app
- HikariCP `maximum-pool-size: 10` in `application.yml`
- Direct only for Flyway

Vercel build fails
- Check Node version: 20+
- Verify `NEXT_PUBLIC_API_BASE_URL` is set in Vercel env
- Clear build cache if backend types changed: Vercel dashboard → Settings → Clear Cache

OpenAI 429
- One retry with exponential backoff (1s, 2s)
- Surface "AI temporarily unavailable" in UI
- Fall back to rule-based explanation text
- Audit log records the failure
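
The retry policy in the list above can be illustrated with a generic wrapper. This is a sketch of the backoff pattern only, not the backend's implementation: `retry_with_backoff` is a hypothetical helper, it retries on any non-zero exit rather than specifically on HTTP 429, and the retry count is a parameter because the source mentions both "one retry" and two delays (1s, 2s).

```shell
# Hypothetical helper illustrating retry with exponential backoff (not part of the repo).
# retry_with_backoff MAX_RETRIES CMD [ARGS...]
# Runs CMD; after each failure waits 1s, 2s, 4s, ... and retries, up to MAX_RETRIES retries.
retry_with_backoff() {
  max_retries="$1"
  shift
  attempt=0
  delay=1
  until "$@"; do
    if [ "$attempt" -ge "$max_retries" ]; then
      return 1   # retries exhausted; the caller falls back (e.g., rule-based text)
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
  return 0
}
```

With `retry_with_backoff 2 some_command`, the waits between attempts are 1s and then 2s. A non-zero return is the point where the caller would surface "AI temporarily unavailable" and record the failure.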
Audit log missing entries after deploy
- Check Spring profile is `prod`, not `dev`
- Verify migration ran: `fly ssh console`, then `psql $DATABASE_URL_DIRECT -c "\dt"`
- Should see `audit_event` table

Case detail or outreach delivery endpoint returns 500 after deploy
- Check for SQL operator compatibility in prepared statements.
- PostgreSQL JSON existence should use `jsonb_exists(payload_json, 'key')` in JDBC query text rather than the raw `?` operator when bind parameters are present.

MCP server can't be reached
- MCP runs as separate process or endpoint (`/mcp`)
- Check Fly machine has port exposed if using stdio over HTTP
- Verify Claude Desktop config points to the deployed URL and sends an `Authorization` header with a valid WorkWell JWT
- If the machine is scaling to zero, keep `min_machines_running = 1` so the SSE transport stays available for remote clients

Vercel subdomain workwell-measure-studio.vercel.app is fine for the demo. If buying a real domain later:
- Buy on any registrar
- Vercel: Settings → Domains → add, follow DNS instructions
- Fly: `fly certs add api.<your-domain>`, follow DNS instructions
- Update `NEXT_PUBLIC_API_BASE_URL` to new backend domain

- Confirm the active Vercel project is `workwell-measure-studio`.
- Confirm Vercel Root Directory is `frontend`.
- For the S0 `/runs` probe, validate preflight before debugging POST:
  - `OPTIONS https://workwell-measure-studio-api.fly.dev/api/eval`
  - Expect `200` plus `Access-Control-Allow-Origin`.
- If probe UI shows `404` while direct POST works, check CORS/security config and redeploy Fly backend.
- Keep `NEXT_PUBLIC_API_BASE_URL` as origin-only (for example `https://workwell-measure-studio-api.fly.dev`), with no `/api` suffix and no trailing whitespace.
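
The origin-only rule for `NEXT_PUBLIC_API_BASE_URL` is easy to check mechanically. The sketch below uses a hypothetical helper, `is_origin_only`, and is slightly stricter than the text: it rejects any path at all, including a bare trailing slash, which is an assumption rather than a documented requirement.

```shell
# Hypothetical validator (not part of the repo).
# is_origin_only URL
# Succeeds when URL is scheme://host[:port] with no path and no whitespace.
is_origin_only() {
  url="$1"
  case "$url" in
    *[[:space:]]*) return 1 ;;   # leading, trailing, or embedded whitespace
    http://*|https://*) ;;       # a scheme is required
    *) return 1 ;;
  esac
  rest="${url#*://}"             # strip the scheme, leaving host[:port][/path]
  case "$rest" in
    ""|*/*) return 1 ;;          # empty host, any path such as /api, or a trailing slash
  esac
  return 0
}
```

For example, `is_origin_only "https://workwell-measure-studio-api.fly.dev"` succeeds, while the same value with an `/api` suffix or a trailing space fails.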