Add NGINX reverse proxy with SSL, preview routing, and Grafana integration #1515

Merged
youngchingjui merged 11 commits into main from feature/migrate-nginx-config
Mar 18, 2026

Conversation

@youngchingjui
Owner

@youngchingjui youngchingjui commented Feb 8, 2026

Summary

  • Add NGINX reverse proxy configuration for production deployments with automatic SSL/TLS via Let's Encrypt (Certbot + Porkbun DNS-01)
  • Add wildcard subdomain routing (*.issuetopr.dev) for ephemeral preview deployments, with dynamic Docker container resolution
  • Add Grafana monitoring reverse proxy at grafana.issuetopr.dev
  • Add SSE (Server-Sent Events) support with proper proxy buffering/timeout settings
  • Add Certbot Dockerfile and preflight secret validation script
  • Add comprehensive deployment guide and NGINX documentation
  • Update CLAUDE.md with workflow guidelines and README index entries
  • Clean up deprecated files (MIGRATION_COMPLETE.md, environment.toml, CODE_REVIEW_FIXES.md, production-checklist.md)

Test plan

Local (no SSL certs required)

These can be verified on any dev machine:

  • Config syntax check (static): Copy the nginx configs into a temp container and validate syntax (will fail on missing cert paths — this is expected, see note below)
  • Certbot preflight script: Run without porkbun.ini and confirm clear error message:
    docker compose -f docker/docker-compose.yml --profile prod up certbot
    # Should exit with: [ERROR] Missing secret: /etc/letsencrypt/secrets/porkbun.ini
  • Docker compose structure: Verify compose file includes nginx.yml and services resolve:
    docker compose -f docker/docker-compose.yml --profile prod config --services
    # Should list: neo4j, redis, workflow-workers, nginx, certbot
  • Preview network: Verify external network requirement:
    docker network create preview  # if not exists
    docker compose -f docker/docker-compose.yml --profile prod config | grep -A2 "preview"
  • Documentation review: Confirm docker/nginx/README.md and docs/deployment/README.md are accurate and internally consistent
  • No secrets committed: Verify no API keys, passwords, or credentials in any config file

Note: NGINX cannot start locally because all server blocks require Let's Encrypt cert files (/etc/letsencrypt/live/..., options-ssl-nginx.conf, ssl-dhparams.pem). This is a known limitation — consider adding a fallback HTTP-only mode for local testing in a follow-up.

Production server (requires SSL certs + DNS)

These must be tested on the production server where Let's Encrypt certs exist:

  • NGINX starts cleanly: docker compose -f docker/docker-compose.yml --profile prod up -d nginx — container stays healthy
  • Config syntax passes: docker compose -f docker/docker-compose.yml exec nginx nginx -t
  • HTTP → HTTPS redirect: curl -I http://issuetopr.dev returns 301 to https
  • Main app proxies: https://issuetopr.dev serves the Next.js app
  • Grafana proxies: https://grafana.issuetopr.dev serves the Grafana dashboard
  • SSE works through proxy: https://issuetopr.dev/api/sse streams events without buffering issues
  • WebSocket connections work (check Connection header handling)
  • Preview routing: Create a test container on preview network with an alias, confirm https://<alias>.issuetopr.dev resolves to it
  • Cert renewal dry run: docker compose -f docker/docker-compose.yml --profile prod run --rm certbot renew --dry-run
  • Healthcheck passes: docker inspect issue-to-pr-nginx-1 --format='{{.State.Health.Status}}' shows healthy

Follow-up items (out of scope for this PR)

  • Add HTTP-only fallback so NGINX can start without certs (enables local testing)
  • Add security headers (HSTS, X-Frame-Options, X-Content-Type-Options)
  • Add rate limiting (limit_req_zone)
  • Add gzip compression to nginx.conf
  • Use $connection_upgrade map variable instead of hardcoded 'upgrade' for WebSocket headers
  • Add log rotation config to nginx.yml
  • Consolidate Grafana cert path to use wildcard cert instead of separate cert

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added NGINX reverse-proxy to the Docker stack with configurable port, health checks, production profile and preview routing; supports WebSockets and SSE and injects preview widget into HTML responses.
  • Documentation

    • Added comprehensive deployment and developer infrastructure guides, a user docs scaffold, NGINX usage and routing docs, and an expanded workflow for making and reviewing docs.

@coderabbitai
Contributor

coderabbitai Bot commented Feb 8, 2026

Walkthrough

Adds NGINX reverse-proxy support with Docker Compose integration, new NGINX configs (global plus site and preview rules including SSE/widget injection), and accompanying deployment and developer documentation updates.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation & Guides: CLAUDE.md, docker/README.md, docker/nginx/README.md, docs/deployment/README.md, docs/dev/README.md, docs/dev/infrastructure.md, docs/user/README.md | Added workflow guidance and documentation categories; introduced NGINX docs and deployment guide; added developer and infrastructure docs scaffolding and user docs scaffold. |
| Docker Compose: docker/compose/nginx.yml, docker/docker-compose.yml | Added compose/nginx.yml defining an nginx service (image, env-driven port, mounts, networks, healthcheck, prod profile) and included it in docker/docker-compose.yml. |
| NGINX Core Config: docker/nginx/nginx.conf | New main NGINX configuration with global settings, logging, mime types, and inclusion of conf.d/*.conf. |
| NGINX Site Configs: docker/nginx/conf.d/issuetopr.dev.conf, docker/nginx/conf.d/grafana.issuetopr.dev.conf, docker/nginx/conf.d/preview.issuetopr.dev.conf | Added server blocks: main domain proxy with SSE-specific /api/sse handling (no buffering, long timeouts); Grafana proxy; wildcard preview routing that maps subdomains to per-subdomain upstreams and injects an HTML widget via sub_filter. |
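
The wildcard preview routing summarized above maps the subdomain label of a request to an upstream of the same name. A minimal shell sketch of that extraction follows, assuming labels are restricted to lowercase letters, digits, and hyphens (the character class proposed later in review, not necessarily what the merged config uses). The function name `preview_label` and its optional second parameter are hypothetical and only illustrate the matching rule.

```shell
#!/bin/sh
# Hypothetical helper: print the preview label for a host such as
# "<label>.issuetopr.dev", or fail if the host does not fit the pattern.
preview_label() {
    host=$1
    base=${2:-issuetopr.dev}

    # The host must end in ".<base>"; strip that suffix to get the label.
    case $host in
        *."$base") label=${host%."$base"} ;;
        *) return 1 ;;
    esac

    # Reject empty labels and anything outside [a-z0-9-] (dots included).
    case $label in
        ''|*[!a-z0-9-]*) return 1 ;;
    esac

    printf '%s\n' "$label"
}
```

With this rule a nested name such as `a.b.issuetopr.dev` is rejected, whereas a permissive `.+` capture would accept it.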

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client (browser)
  participant Nginx as NGINX proxy
  participant App as App container (or preview service)
  participant Host as host.docker.internal (local backend)

  Client->>Nginx: HTTP request to <sub>.issuetopr.dev
  alt preview subdomain
    Nginx->>App: proxy_pass http://<subdomain>:3000 (Docker network)
    App-->>Nginx: HTTP response (HTML)
    Nginx->>Nginx: sub_filter inject widget (uncompressed)
    Nginx-->>Client: proxied response with widget
  else main domain / SSE
    Client->>Nginx: request to /api/sse
    Nginx->>Host: proxy_pass http://host.docker.internal:3000 (SSE, no buffering)
    Host-->>Nginx: streaming SSE events
    Nginx-->>Client: forwarded SSE stream
  end

Suggested labels

AI generated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 25b94c3744

Comment on lines +5 to +7
# Use the wildcard certificate that covers *.issuetopr.dev
ssl_certificate /etc/letsencrypt/live/issuetopr.dev-0001/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/issuetopr.dev-0001/privkey.pem;

P2: Use stable cert path for wildcard previews

This preview vhost hardcodes the wildcard cert at /etc/letsencrypt/live/issuetopr.dev-0001/.... On a fresh install, certbot certonly -d issuetopr.dev -d '*.issuetopr.dev' typically creates /etc/letsencrypt/live/issuetopr.dev/ (no -0001 suffix). In that common case NGINX fails to start because the referenced cert files don’t exist, so all preview TLS routes are down until you manually rename or reissue. Consider pointing at the canonical issuetopr.dev path or making the path configurable.
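
One way to make the path configurable is to resolve the lineage directory before NGINX starts. The sketch below is an assumption, not part of this PR: `resolve_cert_dir` is a hypothetical helper that prefers the canonical directory and otherwise falls back to the highest numbered suffix. Picking the highest suffix is itself a guess about which lineage is current.

```shell
#!/bin/sh
# Hypothetical helper: print the certificate lineage directory for a domain.
# Prefers <live_dir>/<domain>; otherwise falls back to a suffixed lineage
# such as <domain>-0001 (the highest suffix wins).
resolve_cert_dir() {
    live_dir=$1
    domain=$2

    if [ -f "$live_dir/$domain/fullchain.pem" ]; then
        printf '%s\n' "$live_dir/$domain"
        return 0
    fi

    found=""
    # The glob expands in sorted order, so the last match has the highest suffix.
    for candidate in "$live_dir/$domain"-[0-9][0-9][0-9][0-9]; do
        if [ -f "$candidate/fullchain.pem" ]; then
            found=$candidate
        fi
    done

    if [ -n "$found" ]; then
        printf '%s\n' "$found"
        return 0
    fi

    echo "no certificate lineage found for $domain under $live_dir" >&2
    return 1
}
```

The printed path would still have to be substituted into the vhost by some templating step, which this PR does not include.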

Comment thread docker/scripts/check-nginx-prereqs.sh Outdated
Comment on lines +85 to +89
if docker run --rm -v "$(pwd)/docker/nginx:/etc/nginx:ro" nginx:stable-alpine nginx -t >/dev/null 2>&1; then
echo -e "${GREEN}✅ NGINX configuration syntax is valid${NC}"
else
echo -e "${RED}❌ ERROR: NGINX configuration syntax is invalid${NC}"
echo " Run this to see details:"

P2: Preflight nginx -t ignores cert mounts

The preflight test runs nginx -t in a container that only mounts /etc/nginx. Because the config references /etc/letsencrypt/..., nginx -t will fail even when the config is valid and certs exist on the host, since those files aren’t mounted into the test container. In deploy-production.sh this means the preflight always fails and will block or prompt in non‑interactive runs. Mount /etc/letsencrypt (or skip cert‑dependent checks) to make this test meaningful.
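
A sketch of the "mount /etc/letsencrypt" option, assuming the same image and config path as the preflight script. `run_nginx_test` and the `DOCKER` override are hypothetical names introduced here so the command line can be inspected without Docker. When the certificate directory is absent the check will still fail on the missing cert files, so this only covers the case where certs exist on the host.

```shell
#!/bin/sh
# Hypothetical helper: run `nginx -t` in a throwaway container, adding the
# certificate mount only when the host directory exists.
# Set DOCKER=echo to print the arguments instead of running Docker.
run_nginx_test() {
    conf_dir=$1
    cert_dir=${2:-/etc/letsencrypt}

    # Build the argument list in the function's positional parameters.
    set -- run --rm -v "$conf_dir:/etc/nginx:ro"
    if [ -d "$cert_dir" ]; then
        set -- "$@" -v "$cert_dir:/etc/letsencrypt:ro"
    fi

    "${DOCKER:-docker}" "$@" nginx:stable-alpine nginx -t
}
```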

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 9

🤖 Fix all issues with AI agents
In `@docker/compose/nginx.yml`:
- Around line 12-14: Add a clear production prerequisite note to the README
explaining that the docker/compose/nginx.yml bind-mount (-
/etc/letsencrypt:/etc/letsencrypt:ro) used by the prod profile requires a
populated /etc/letsencrypt on the host (or Docker will create an empty
root-owned directory causing NGINX to fail to start because configs in
docker/nginx/conf.d/ like issuetopr.dev.conf, preview.issuetopr.dev.conf and
grafana.issuetopr.dev.conf reference /etc/letsencrypt/live/* ssl_certificate
paths); update the README’s deployment/production section to state this
requirement and include a suggested remediation (create/populate
/etc/letsencrypt or omit the prod profile until certificates are present).

In `@docker/nginx/conf.d/preview.issuetopr.dev.conf`:
- Line 3: The server_name regex uses the named capture (?<preview>.+) which is
too permissive; change it to a constrained character class (e.g. use
(?<preview>[a-z0-9-]+) or similar) so $preview only contains allowed subdomain
characters (lowercase letters, digits, hyphens) before the .issuetopr.dev, and
update any other occurrences that rely on the same capture (notably the
proxy_pass usage referencing $preview) to match this stricter pattern.

In `@docker/scripts/check-nginx-prereqs.sh`:
- Around line 85-93: The docker run check in check-nginx-prereqs.sh assumes
$(pwd) is repo root; derive the repository root from the script location instead
and use that path in the volume mount. Add logic near the top of
check-nginx-prereqs.sh to compute REPO_ROOT from the script’s directory (e.g.,
using "${BASH_SOURCE[0]}" resolution) and replace "$(pwd)/docker/nginx" in the
nginx test command with "$REPO_ROOT/docker/nginx" so the config check works
regardless of the current working directory.

In `@docker/scripts/check-secrets.sh`:
- Around line 38-40: The echo call that includes "\n" is not portable in /bin/sh
(BusyBox ash) and can print a literal backslash-n; update the certbot usage
output in the script to use printf with explicit newlines (e.g., printf "%s\n"
...) or break into multiple echo calls without escape sequences so the certbot
arguments render correctly; change the echo invocation found in
docker/scripts/check-secrets.sh that prints the multi-line certbot example to
use printf (or separate echo lines) to guarantee POSIX-compatible newlines.

In `@docker/scripts/renew-certs.sh`:
- Around line 10-39: The script uses set -e which causes the certbot renew
command to abort the script on failure so the subsequent if [ $? -eq 0 ] check
is never reached; change the flow to run the certbot renew command directly in
an if conditional (e.g., if docker compose -f docker/docker-compose.yml
--profile prod run --rm certbot renew; then ...) so failures are handled by the
if/else block and the NGINX reload logic (referencing the docker compose certbot
renew invocation and the nginx reload exec command) executes the appropriate
success or failure branch instead of being short-circuited by set -e.

In `@docs/deployment/production-checklist.md`:
- Around line 360-372: The "3. Clean up old images and logs" step currently runs
the dangerous command "docker system prune -a"; change this to recommend the
safer "docker system prune" (without -a) or add an explicit warning directly
after the "docker system prune -a" line explaining it will remove all unused
images and may force a full re-pull/rebuild for subsequent docker compose up,
and suggest alternatives (e.g., use "docker image prune" or targeted image
removal) so editors can either replace the command or annotate it with the
impact and safer options.

In `@docs/deployment/README.md`:
- Around line 60-70: Update the certbot invocation shown (the docker compose
certbot certonly command) to use the correct authenticator flag and explicitly
pass the credentials file: replace the incorrect --dns-porkbun flag with
--authenticator dns-porkbun and add --dns-porkbun-credentials
/etc/letsencrypt/secrets/porkbun.ini so certbot uses the Porkbun credentials
created earlier when obtaining certificates.

In `@scripts/deploy-production.sh`:
- Around line 121-132: The health-wait loop currently breaks on any "healthy"
match and always prints success; change it to explicitly verify that both neo4j
and redis are healthy before breaking and to fail if the timeout is reached.
Update the for-loop that runs `docker compose -f docker/docker-compose.yml ps |
grep -q "healthy"` to instead check the status lines for the specific service
names "neo4j" and "redis" (e.g., confirm both have "healthy" in their status),
and after the loop add a conditional: if either service is not healthy after 30
attempts print a descriptive error and exit non‑zero (instead of printing the
success message), otherwise print the existing success message.
- Around line 84-93: The read-based prompt after running check-nginx-prereqs.sh
uses `read -r response` which fails silently in non-interactive contexts; modify
the block to detect a TTY (e.g., test with `[ -t 0 ]`) and only prompt when
interactive, otherwise print a clear message and abort (exit 1) or respect an
explicit override (e.g., a --yes / FORCE_CONTINUE env var) so the script does
not silently cancel in CI; update the logic around the `read -r response` and
the `response` check to implement this TTY check and optional override.
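
The health-wait item above asks for two things: check neo4j and redis individually, and exit non-zero when the timeout is reached. A rough sketch follows, assuming `docker compose ps` prints a status containing `(healthy)` for healthy services. `wait_until` and `all_healthy` are hypothetical names, not functions from scripts/deploy-production.sh.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2

    n=0
    while [ "$n" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical check: succeeds only if every named service has a status
# line containing "(healthy)" in the given `docker compose ps` output.
all_healthy() {
    status=$1
    shift
    for svc in "$@"; do
        printf '%s\n' "$status" | grep -F -- "$svc" | grep -Fq '(healthy)' || return 1
    done
    return 0
}

# Possible use in a deploy script (left commented out here):
#   check() { all_healthy "$(docker compose -f docker/docker-compose.yml ps)" neo4j redis; }
#   wait_until 30 2 check || { echo "neo4j/redis not healthy in time" >&2; exit 1; }
```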
🧹 Nitpick comments (11)
docker/certbot-porkbun/Dockerfile (1)

1-3: Pin the base image to a specific version for reproducible builds.

Using certbot/certbot:latest means builds are non-deterministic and could break without warning when a new certbot version is published. Consider pinning to a specific version (e.g., certbot/certbot:v3.x.x) and updating it intentionally.

docs/deployment/CODE_REVIEW_FIXES.md (1)

1-233: This document appears to be AI-generated internal tracking — consider whether it belongs in the repo.

This file documents feedback from "Sam (DevOps persona via Task agent)" — an AI persona. Shipping this as permanent documentation could confuse future contributors who may assume "Sam" is a real reviewer. Consider either:

  1. Moving this to a PR comment or issue instead of a committed file.
  2. Clearly labeling it as AI-assisted analysis at the top if you want to keep it.

Also, the "Testing Performed" section (lines 215–218) has all items unchecked (⬜), which contradicts the "Fixed Immediately" claims above.

MIGRATION_COMPLETE.md (1)

1-232: This migration-tracking file will quickly become stale — consider removing it or moving to a PR description.

MIGRATION_COMPLETE.md at the repo root is a one-time migration artifact (references "Task #14", the old young-and-ai-ec2-nginx repo, etc.). The useful content is already covered by docs/deployment/README.md and docker/nginx/README.md. Leaving this in the repo root adds clutter and will confuse future contributors when it inevitably becomes outdated.

Consider capturing this in the PR description instead and removing the file.

docker/nginx/conf.d/issuetopr.dev.conf (2)

20-39: Inconsistent indentation (mixed tabs and spaces) in the SSE location block.

The main location / block uses 8-space indentation, while location /api/sse uses tabs. This makes the config harder to maintain and diff.

Also, lines 24–25 set Connection 'upgrade' and then line 35 overrides it to Connection ''. The override is correct for SSE, but having the dead 'upgrade' value above is misleading. Consider removing the redundant WebSocket headers from the SSE block.

Proposed cleanup for the SSE block
     # Location block for SSE connections
     location /api/sse {
-  	proxy_pass http://host.docker.internal:3000;
-	proxy_http_version 1.1;
-	proxy_set_header Upgrade $http_upgrade;
-	proxy_set_header Connection 'upgrade';
-	proxy_set_header Host $host;
-	proxy_set_header X-Real-IP $remote_addr;
-	proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
-	proxy_set_header X-Forwarded-Proto $scheme;
-	proxy_cache_bypass $http_upgrade;
-
-	# SSE specific settings
-	proxy_buffering off;
-	proxy_cache off;
-	proxy_set_header Connection '';
-	chunked_transfer_encoding off;
-	proxy_read_timeout 86400s;
-	proxy_send_timeout 86400s;
+        proxy_pass http://host.docker.internal:3000;
+        proxy_http_version 1.1;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+
+        # SSE specific settings
+        proxy_set_header Connection '';
+        proxy_buffering off;
+        proxy_cache off;
+        chunked_transfer_encoding off;
+        proxy_read_timeout 86400s;
+        proxy_send_timeout 86400s;
     }

1-49: No http2 and no security headers on the HTTPS server block.

The listen 443 ssl directive on line 42 does not enable HTTP/2. For a production proxy, consider adding http2 on; (NGINX ≥1.25.1) or listen 443 ssl http2; (older versions). Security headers (HSTS, X-Content-Type-Options, X-Frame-Options) are also absent — I see these are tracked as future items, so just flagging for awareness.

docker/nginx/conf.d/preview.issuetopr.dev.conf (1)

47-66: Same SSE block issue as in issuetopr.dev.conf — redundant Connection 'upgrade' overridden by Connection ''.

Lines 52 and 62 set contradictory Connection header values. Same cleanup as suggested for the other config applies here.

docker/nginx/nginx.conf (1)

36-39: Inconsistent indentation: tabs vs. spaces in the map block.

Lines 37–38 use tab indentation while the rest of the file uses spaces. This won't break NGINX but is a style inconsistency.

docs/deployment/production-checklist.md (1)

14-15: Markdown lint: fenced code blocks need surrounding blank lines.

Several code blocks are missing blank lines before or after them (MD031). This can cause rendering issues in some Markdown parsers. For example, add a blank line between the text and the opening ``` fence.

Also applies to: 23-24, 44-45, 55-56, 91-92, 117-118, 147-148, 152-153

docker/scripts/check-nginx-prereqs.sh (1)

95-110: Container name filter "name=nginx" is a substring match — may produce false positives.

docker ps --filter "name=nginx" matches any container whose name contains "nginx" (e.g., my-nginx-proxy, nginxtest). This could flag unrelated containers. Consider using an anchored regex or exact match if possible.
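
An exact match can also be done outside Docker's filter by comparing whole names. The sketch below assumes the names arrive one per line, for example from `docker ps --format '{{.Names}}'`; `has_container_named` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if stdin contains a line exactly equal to $1.
# -F treats the name as a fixed string, -x requires a whole-line match.
has_container_named() {
    grep -Fxq -- "$1"
}

# Possible use (left commented out here):
#   docker ps --format '{{.Names}}' | has_container_named nginx
```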

docker/compose/nginx.yml (1)

21-26: Healthcheck runs nginx -t every 10 seconds — unnecessarily expensive.

nginx -t parses and validates the entire configuration on each invocation. Running it every 10 seconds as part of the healthcheck is wasteful. The HTTP check (wget ... http://localhost:80/) alone is sufficient to verify NGINX is healthy.

Suggested fix
     healthcheck:
-      test: ["CMD", "sh", "-c", "wget --quiet --tries=1 --spider http://localhost:80/ && nginx -t"]
+      test: ["CMD", "sh", "-c", "wget --quiet --tries=1 --spider http://localhost:80/"]
       interval: 10s
       timeout: 5s
       retries: 3
       start_period: 10s
scripts/deploy-production.sh (1)

123-123: Unused loop variable i (SC2034).

Replace i with _ to signal it's intentionally unused.

Fix
-for i in {1..30}; do
+for _ in {1..30}; do

Comment thread docker/compose/nginx.yml Outdated
Comment thread docker/nginx/conf.d/preview.issuetopr.dev.conf Outdated
Comment thread docker/scripts/check-nginx-prereqs.sh Outdated
Comment on lines +85 to +93
if docker run --rm -v "$(pwd)/docker/nginx:/etc/nginx:ro" nginx:stable-alpine nginx -t >/dev/null 2>&1; then
    echo -e "${GREEN}✅ NGINX configuration syntax is valid${NC}"
else
    echo -e "${RED}❌ ERROR: NGINX configuration syntax is invalid${NC}"
    echo " Run this to see details:"
    echo " docker run --rm -v \$(pwd)/docker/nginx:/etc/nginx:ro nginx:stable-alpine nginx -t"
    echo ""
    exit 1
fi

⚠️ Potential issue | 🟡 Minor

NGINX config test assumes $(pwd) is the repository root.

If this script is invoked from a different working directory (e.g., via an absolute path), the volume mount $(pwd)/docker/nginx will point to the wrong location and the config test will fail with a misleading error. Consider deriving the repo root from the script's own location.

Suggested fix
+# Derive the repo root from this script's location
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
+
 # 4. Check NGINX config syntax (if docker is available)
 echo ""
 echo "4️⃣  Testing NGINX configuration syntax..."
-if docker run --rm -v "$(pwd)/docker/nginx:/etc/nginx:ro" nginx:stable-alpine nginx -t >/dev/null 2>&1; then
+if docker run --rm -v "$REPO_ROOT/docker/nginx:/etc/nginx:ro" nginx:stable-alpine nginx -t >/dev/null 2>&1; then

Comment thread docker/scripts/check-secrets.sh Outdated
Comment on lines +38 to +40
echo "Provide certbot args, e.g.:"
echo " docker compose run --rm certbot certonly --agree-tos -m support@youngandai.com --no-eff-email \\\n --dns-porkbun --dns-porkbun-credentials /etc/letsencrypt/secrets/porkbun.ini -d youngandai.com -d '*.youngandai.com'"
fi
Contributor

@coderabbitai coderabbitai Bot Feb 8, 2026

⚠️ Potential issue | 🟡 Minor

\n in echo won't produce a newline in POSIX sh.

The certbot base image uses Alpine, where /bin/sh is BusyBox ash. The \n escape in echo has implementation-defined behavior under POSIX and may render as a literal \n rather than a newline. Use printf for portable formatting.

Proposed fix
-  echo "  docker compose run --rm certbot certonly --agree-tos -m support@youngandai.com --no-eff-email \\\n  --dns-porkbun --dns-porkbun-credentials /etc/letsencrypt/secrets/porkbun.ini -d youngandai.com -d '*.youngandai.com'"
+  printf "  docker compose run --rm certbot certonly --agree-tos -m support@youngandai.com --no-eff-email \\\\\n  --dns-porkbun --dns-porkbun-credentials /etc/letsencrypt/secrets/porkbun.ini -d youngandai.com -d '*.youngandai.com'\n"

Owner Author

OK, I take your word for it that this is true. maybe just do a double check online if this is true, and if so, then make the fix as suggested.

@youngchingjui, confirmed — the concern is well-founded. Per the official POSIX specification, if any argument to echo contains a backslash character, the results are implementation-defined. In short, echo is only portable if you omit flags and escape sequences; printf should be used instead when you need more than plain text.

Interestingly, ash/dash/busybox/ksh have a feature where echo interprets \n as a newline — but other shells don't do that, so behaviour is unpredictable across environments. printf is fully standardized in POSIX and behaves consistently across all systems.

Here is the fix to apply:

-  echo "  docker compose run --rm certbot certonly --agree-tos -m support@youngandai.com --no-eff-email \\\n  --dns-porkbun --dns-porkbun-credentials /etc/letsencrypt/secrets/porkbun.ini -d youngandai.com -d '*.youngandai.com'"
+  printf "  docker compose run --rm certbot certonly --agree-tos -m support@youngandai.com --no-eff-email \\\\\n  --dns-porkbun --dns-porkbun-credentials /etc/letsencrypt/secrets/porkbun.ini -d youngandai.com -d '*.youngandai.com'\n"

Inside the double-quoted printf format string, the shell first reduces \\\\ to \\, which printf then prints as a single literal \ (the line-continuation backslash shown in the example command); the \n that follows produces the actual newline.
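
The quoting can be checked in isolation. This is a small sketch with a shortened, made-up command (`certbot certonly -d example.test` stands in for the full example); the two function names are hypothetical. Both forms print a first line ending in one backslash, then a real newline.

```shell
#!/bin/sh
# Double quotes: the shell turns \\\\ into \\ and leaves \n alone,
# so printf receives \\ (one literal backslash) followed by \n (newline).
hint_double_quoted() {
    printf "  certbot certonly \\\\\n    -d example.test\n"
}

# Single quotes: the format reaches printf untouched, so \\ and \n suffice.
hint_single_quoted() {
    printf '  certbot certonly \\\n    -d example.test\n'
}
```

The single-quoted form is arguably easier to read, since only printf's own escaping is involved.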


🧠 Learnings used
Learnt from: youngchingjui
Repo: youngchingjui/issue-to-pr PR: 1282
File: shared/src/ports/events/publisher.ts:16-19
Timestamp: 2025-09-22T09:24:26.840Z
Learning: youngchingjui prefers to manage PR scope tightly and defer technical debt improvements to future PRs rather than expanding the current PR's scope, even for related consistency issues.

Comment thread docker/scripts/renew-certs.sh Outdated
Comment on lines +10 to +39
set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"

echo "=== Certificate Renewal: $(date) ==="

# Change to project directory
cd "$PROJECT_ROOT"

# Attempt to renew certificates
echo "Attempting certificate renewal..."
docker compose -f docker/docker-compose.yml --profile prod run --rm certbot renew

# Check if renewal was successful by looking at exit code
if [ $? -eq 0 ]; then
    echo "Certificate renewal successful (or not needed yet)"

    # Reload NGINX to pick up any new certificates
    echo "Reloading NGINX configuration..."
    if docker compose -f docker/docker-compose.yml exec nginx nginx -s reload; then
        echo "✅ NGINX reloaded successfully"
    else
        echo "⚠️ Warning: Failed to reload NGINX"
        exit 1
    fi
else
    echo "❌ Certificate renewal failed"
    exit 1
fi
Contributor

@coderabbitai coderabbitai Bot Feb 8, 2026

⚠️ Potential issue | 🟠 Major

set -e makes the renewal failure branch unreachable.

With set -e on line 10, if docker compose ... certbot renew on line 22 exits non-zero, the script terminates immediately — the $? check on line 25 is never reached, and the else branch (lines 36-38) is dead code. Use the command directly in the if condition, which suppresses set -e for that command:

Proposed fix
 # Attempt to renew certificates
 echo "Attempting certificate renewal..."
-docker compose -f docker/docker-compose.yml --profile prod run --rm certbot renew
-
-# Check if renewal was successful by looking at exit code
-if [ $? -eq 0 ]; then
+if docker compose -f docker/docker-compose.yml --profile prod run --rm certbot renew; then
     echo "Certificate renewal successful (or not needed yet)"
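
The difference between the two patterns can be reproduced without certbot. In this sketch `false` stands in for a failing `certbot renew`, and each snippet runs in a child `sh` so that `set -e` cannot affect the caller. The function names are hypothetical.

```shell
#!/bin/sh
# Pattern 1: run the command, then test $?. Under set -e the child shell
# exits at `false`, so neither branch runs and nothing is printed.
check_after() {
    sh -c 'set -e; false; if [ $? -eq 0 ]; then echo ok; else echo failed; fi' || true
}

# Pattern 2: use the command directly as the if condition. set -e does not
# apply to a command tested by `if`, so the else branch runs.
check_inline() {
    sh -c 'set -e; if false; then echo ok; else echo failed; fi'
}
```

Only the second pattern gives the script a chance to log the failure before exiting.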

Owner Author

What does set -e do?

Contributor

@youngchingjui, set -e (also written as set -o errexit) is a Bash option that tells the shell to exit immediately if any command returns a non-zero (failure) exit code, rather than continuing to the next line.

In your script, this creates the bug flagged in the review:

set -e   # <-- exit on any failure

# If this command fails (non-zero exit), the script exits HERE...
docker compose -f docker/docker-compose.yml --profile prod run --rm certbot renew

# ...so this line is NEVER reached on failure
if [ $? -eq 0 ]; then
    ...
else
    echo "❌ Certificate renewal failed"   # <-- dead code, never executes
    exit 1
fi

Why is this a problem?

  • If certbot renew fails, set -e kills the script right away on line 22.
  • The $? check on line 25 only ever sees 0 (success), because the script would have already terminated if it were anything else.
  • Your else branch (the "❌ Certificate renewal failed" message) is therefore unreachable dead code.

The fix is to move the command directly into the if condition. When a command is used as an if condition, Bash suspends set -e for that command and lets the if/else logic handle the result normally:

if docker compose -f docker/docker-compose.yml --profile prod run --rm certbot renew; then
    echo "Certificate renewal successful (or not needed yet)"
    ...
else
    echo "❌ Certificate renewal failed"   # now reachable!
    exit 1
fi
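
The difference between the two patterns can be reproduced without Docker or certbot. The following is a minimal sketch, not code from the PR: `false` stands in for a failing `certbot renew`, and the function names `run_check_after` and `run_check_inline` are made up for illustration.

```shell
#!/bin/bash
# Illustration only: `false` stands in for a failing `certbot renew`.

# Pattern from the original script: command on its own line, then a $? check.
run_check_after() {
    bash -c '
        set -e
        false
        # Not reached: errexit already terminated the shell on the line above.
        if [ $? -eq 0 ]; then echo "success"; else echo "handled failure"; fi
    '
}

# Proposed pattern: the command is the `if` condition, so errexit is suspended for it.
run_check_inline() {
    bash -c '
        set -e
        if false; then echo "success"; else echo "handled failure"; fi
    '
}
```

Calling `run_check_after` prints nothing and returns a non-zero status, while `run_check_inline` prints `handled failure`. Each case runs in its own `bash -c` process so the early exit does not terminate the calling shell.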

🧠 Learnings used
Learnt from: youngchingjui
Repo: youngchingjui/issue-to-pr PR: 1282
File: shared/src/ports/events/publisher.ts:16-19
Timestamp: 2025-09-22T09:24:26.840Z
Learning: youngchingjui prefers to manage PR scope tightly and defer technical debt improvements to future PRs rather than expanding the current PR's scope, even for related consistency issues.

Owner Author

why don't we just remove set -e?

Contributor

@youngchingjui, you can — but there's a trade-off to be aware of.

If you remove set -e, the if [ $? -eq 0 ] check works as intended, and the else branch becomes reachable again. That's valid.

However, set -e is also silently guarding two other critical commands earlier in the script:

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
...
cd "$PROJECT_ROOT"   # if this silently fails, docker compose runs from the wrong directory

Without set -e, if cd "$PROJECT_ROOT" fails (e.g., bad path), the script continues running in the wrong directory. The docker compose -f docker/docker-compose.yml command would then fail to find the compose file — still an error, but with a less helpful message.

Summary of options:

| Approach | Pro | Con |
| --- | --- | --- |
| Remove set -e | Simple, $? check works | Loses safety net on cd and path resolution |
| Keep set -e, move certbot into if | Keeps implicit safety on other commands | Slightly more verbose |
| Remove set -e + add explicit cd check | Best of both | More verbose |

The cleanest minimal change is still moving the certbot command into the if condition (as suggested), but removing set -e is a perfectly acceptable simplification for a script this short — the failure modes are easy to reason about.



Owner Author

I'd prefer more articulate, clearer logic paths that a more junior developer can understand, rather than something more nifty. Code is writing, after all, and clearer writing wins. So I'd prefer to remove set -e, and be more explicit about checks and failure modes.

Contributor

@youngchingjui, that's a great principle. Here's the script rewritten with set -e removed and every failure mode handled explicitly:

#!/bin/bash
# Certificate Renewal Script
#
# This script should be run periodically (via cron) to renew SSL certificates
# and reload NGINX to pick up the new certificates.
#
# Add to crontab (runs daily at 2am):
#   0 2 * * * /path/to/issue-to-pr/docker/scripts/renew-certs.sh >> /var/log/certbot-renewal.log 2>&1

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"

echo "=== Certificate Renewal: $(date) ==="

# Change to project directory
cd "$PROJECT_ROOT"
if [ $? -ne 0 ]; then
    echo "❌ Failed to change to project root: $PROJECT_ROOT"
    exit 1
fi

# Attempt to renew certificates
echo "Attempting certificate renewal..."
docker compose -f docker/docker-compose.yml --profile prod run --rm certbot renew

if [ $? -ne 0 ]; then
    echo "❌ Certificate renewal failed"
    exit 1
fi

echo "Certificate renewal successful (or not needed yet)"

# Reload NGINX to pick up any new certificates
echo "Reloading NGINX configuration..."
docker compose -f docker/docker-compose.yml exec nginx nginx -s reload

if [ $? -ne 0 ]; then
    echo "⚠️  Warning: Failed to reload NGINX"
    exit 1
fi

echo "✅ NGINX reloaded successfully"

Each command is run on its own line, followed by an explicit $? check — so a junior developer can read straight down the file and understand exactly what's being tested and what happens when it fails.


✏️ Learnings added
Learnt from: youngchingjui
Repo: youngchingjui/issue-to-pr PR: 1515
File: docker/scripts/renew-certs.sh:10-39
Timestamp: 2026-03-13T09:14:43.223Z
Learning: youngchingjui prefers explicit, readable shell scripting over concise idioms (e.g., prefers running a command then checking $? on the next line over embedding the command in an if condition). Values clarity for junior developers over brevity. Avoids set -e in favour of explicit per-command failure checks.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.



Comment thread docs/deployment/production-checklist.md Outdated
Comment thread docs/deployment/README.md Outdated
Comment thread scripts/deploy-production.sh Outdated
Comment on lines +84 to +93
bash docker/scripts/check-nginx-prereqs.sh || {
    echo ""
    echo -e "${YELLOW}⚠️ Pre-flight checks found potential issues.${NC}"
    echo "Continue anyway? (y/N) "
    read -r response
    if [[ ! "$response" =~ ^[Yy]$ ]]; then
        echo "Deployment cancelled."
        exit 1
    fi
}
Contributor

@coderabbitai coderabbitai Bot Feb 8, 2026

⚠️ Potential issue | 🟡 Minor

read will fail silently in non-interactive contexts (e.g., CI/CD).

When stdin is not a terminal (piped input, CI runner), read -r response gets EOF, response is empty, and the deployment is silently cancelled. If this script is intended for CI/CD use, consider checking for a TTY or defaulting to abort with a clear message.

Suggested improvement
         bash docker/scripts/check-nginx-prereqs.sh || {
             echo ""
             echo -e "${YELLOW}⚠️  Pre-flight checks found potential issues.${NC}"
-            echo "Continue anyway? (y/N) "
-            read -r response
-            if [[ ! "$response" =~ ^[Yy]$ ]]; then
-                echo "Deployment cancelled."
-                exit 1
+            if [ -t 0 ]; then
+                echo "Continue anyway? (y/N) "
+                read -r response
+                if [[ ! "$response" =~ ^[Yy]$ ]]; then
+                    echo "Deployment cancelled."
+                    exit 1
+                fi
+            else
+                echo "Non-interactive mode detected. Use --skip-checks to bypass."
+                exit 1
             fi
         }
🤖 Prompt for AI Agents
In `@scripts/deploy-production.sh` around lines 84 - 93, The read-based prompt
after running check-nginx-prereqs.sh uses `read -r response` which fails
silently in non-interactive contexts; modify the block to detect a TTY (e.g.,
test with `[ -t 0 ]`) and only prompt when interactive, otherwise print a clear
message and abort (exit 1) or respect an explicit override (e.g., a --yes /
FORCE_CONTINUE env var) so the script does not silently cancel in CI; update the
logic around the `read -r response` and the `response` check to implement this
TTY check and optional override.

Owner Author

Thanks for the consideration. I don't think we intend to use this script for those purposes, but we should state that clearly at the top of the file, in a README, or both.

Contributor

@youngchingjui, makes sense! Here are concrete suggestions for both locations:

1. Add a note to the script header (scripts/deploy-production.sh, after line 13):

 # Options:
 #   --with-nginx      Deploy NGINX reverse proxy (requires ports 80/443)
 #   --skip-checks     Skip pre-flight validation checks
 #   --help            Show this help message
+#
+# NOTE: This script is intended for interactive, manual deployments only.
+# It is NOT designed for use in CI/CD pipelines or non-interactive environments.

2. Add a note to the deployment README (docs/deployment/README.md), in whichever section covers the production deployment script:

> **Note:** `scripts/deploy-production.sh` is designed for **interactive, manual use** only (e.g., deploying from an SSH session on the server). It is not intended for CI/CD pipelines or automated non-interactive environments.

Would you like me to plan these documentation changes, or would you prefer to add them manually?



Comment thread scripts/deploy-production.sh Outdated
Comment on lines +121 to +132
# Wait for services to be ready
echo "Waiting for services to initialize..."
for i in {1..30}; do
    if docker compose -f docker/docker-compose.yml ps | grep -q "healthy"; then
        break
    fi
    sleep 1
    echo -n "."
done
echo ""
echo -e "${GREEN}✅ Core services ready${NC}"
echo ""
Contributor

⚠️ Potential issue | 🟠 Major

Health-wait loop is unreliable: breaks on any "healthy" service and reports success on timeout.

Two issues:

  1. grep -q "healthy" matches if any service is healthy, not necessarily neo4j and redis specifically.
  2. If the loop exhausts 30 iterations without a healthy match, the script still prints "Core services ready" — there's no timeout/failure path.
Suggested fix
 # Wait for services to be ready
 echo "Waiting for services to initialize..."
-for i in {1..30}; do
-    if docker compose -f docker/docker-compose.yml ps | grep -q "healthy"; then
+READY=0
+for _ in {1..30}; do
+    # Check that both neo4j and redis are healthy
+    UNHEALTHY=$(docker compose -f docker/docker-compose.yml ps neo4j redis | grep -cv "healthy" || true)
+    if [ "$UNHEALTHY" -le 1 ]; then  # header line always present
+        READY=1
         break
     fi
     sleep 1
     echo -n "."
 done
 echo ""
-echo -e "${GREEN}✅ Core services ready${NC}"
+if [ "$READY" -eq 1 ]; then
+    echo -e "${GREEN}✅ Core services ready${NC}"
+else
+    echo -e "${RED}❌ Core services did not become healthy within 30 seconds${NC}"
+    docker compose -f docker/docker-compose.yml ps neo4j redis
+    exit 1
+fi
 echo ""
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 123-123: i appears unused. Verify use (or export if used externally).

(SC2034)

🤖 Prompt for AI Agents
In `@scripts/deploy-production.sh` around lines 121 - 132, The health-wait loop
currently breaks on any "healthy" match and always prints success; change it to
explicitly verify that both neo4j and redis are healthy before breaking and to
fail if the timeout is reached. Update the for-loop that runs `docker compose -f
docker/docker-compose.yml ps | grep -q "healthy"` to instead check the status
lines for the specific service names "neo4j" and "redis" (e.g., confirm both
have "healthy" in their status), and after the loop add a conditional: if either
service is not healthy after 30 attempts print a descriptive error and exit
non‑zero (instead of printing the success message), otherwise print the existing
success message.
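
One further hazard with matching on the text "healthy": the string "unhealthy" contains it, so a plain `grep "healthy"` also matches a service that Docker reports as unhealthy. The sketch below is one possible way to check specific services with an exact comparison. It is not code from the PR: the helper names `service_health` and `wait_for_healthy` and the `HEALTH_POLL_SECONDS` variable are hypothetical, and it assumes each service runs a single container with a healthcheck defined.

```shell
#!/bin/bash
COMPOSE_FILE="docker/docker-compose.yml"

# Print the health status of one compose service (healthy, unhealthy, starting),
# or "missing" when the service has no container.
service_health() {
    local service="$1" container_id
    container_id="$(docker compose -f "$COMPOSE_FILE" ps -q "$service")"
    if [ -z "$container_id" ]; then
        echo "missing"
        return
    fi
    docker inspect --format '{{.State.Health.Status}}' "$container_id"
}

# Wait until every named service reports exactly "healthy", or give up.
# Usage: wait_for_healthy <max_attempts> <service>...
wait_for_healthy() {
    local attempts="$1" attempt=1 service all_healthy
    shift
    while [ "$attempt" -le "$attempts" ]; do
        all_healthy=1
        for service in "$@"; do
            # Exact string comparison, so "unhealthy" is not mistaken for "healthy"
            if [ "$(service_health "$service")" != "healthy" ]; then
                all_healthy=0
                break
            fi
        done
        if [ "$all_healthy" -eq 1 ]; then
            return 0
        fi
        sleep "${HEALTH_POLL_SECONDS:-1}"
        attempt=$((attempt + 1))
    done
    return 1
}
```

A caller could then run `wait_for_healthy 30 neo4j redis` and treat a non-zero return as the timeout path, printing an error and exiting instead of reporting success.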

@youngchingjui
Owner Author

Testing Plan for Nginx Migration PR

This PR is a config migration (moving nginx from a standalone repo into the monorepo), not new application code. Testing verifies the configs work identically in the new setup, with zero downtime.

Key config change

The only meaningful behavioral difference: proxy_pass http://localhost:3000 → proxy_pass http://host.docker.internal:3000. This is correct: the old nginx runs on the host, while the new one runs in Docker, so it needs host.docker.internal to reach the host's port 3000.


Phase 1: Local/Offline Validation (safe, do anytime)

  1. Nginx config syntax check — run locally with Docker:

    git checkout feature/migrate-nginx-config
    docker run --rm -v $(pwd)/docker/nginx:/etc/nginx:ro nginx:stable-alpine nginx -t
  2. Review compose integration — verify docker/compose/nginx.yml is included in docker/docker-compose.yml and the prod profile prevents accidental startup in dev.

  3. Run the prereqs script (on EC2):

    bash docker/scripts/check-nginx-prereqs.sh

    Checks ports, networks, certs, and running containers — gives a go/no-go.


Phase 2: Parallel Test on EC2 (non-destructive)

Run the new nginx alongside the old one on different ports to verify it works before cutting over.

  1. Pull the branch on EC2:

    cd ~/issue-to-pr
    git fetch && git checkout feature/migrate-nginx-config
  2. Start new nginx on alternate ports (old one keeps 80/443):

    export NGINX_HTTP_PORT=8080
    export NGINX_HTTPS_PORT=8443
    docker compose -f docker/docker-compose.yml --profile prod up -d nginx
  3. Test main app routing:

    # HTTP redirect
    curl -I http://localhost:8080 -H "Host: issuetopr.dev"
    # Should return 301 → https
    
    # HTTPS proxies to Next.js
    curl -k https://localhost:8443 -H "Host: issuetopr.dev"
    # Should return the app's HTML
  4. Test SSE endpoint:

    curl -k -N https://localhost:8443/api/sse -H "Host: issuetopr.dev" --max-time 5
    # Should connect (or return appropriate response), not 502
  5. Test preview subdomain routing (pick any running agent container):

    curl -k https://localhost:8443 -H "Host: <agent-container-name>.issuetopr.dev"
    # Should attempt to proxy to that container on the preview network
  6. Test Grafana routing:

    curl -k https://localhost:8443 -H "Host: grafana.issuetopr.dev"
  7. Check logs:

    docker compose -f docker/docker-compose.yml logs nginx --tail 50

Phase 3: Cutover (~30 seconds of risk)

Only after Phase 2 passes:

  1. Stop old nginx:

    docker stop young-and-ai-ec2-nginx-nginx-1
  2. Restart new nginx on standard ports:

    unset NGINX_HTTP_PORT NGINX_HTTPS_PORT
    docker compose -f docker/docker-compose.yml --profile prod up -d nginx
  3. Verify live:

    curl -I https://issuetopr.dev
  4. Rollback plan — if anything breaks:

    docker compose -f docker/docker-compose.yml --profile prod down
    docker start young-and-ai-ec2-nginx-nginx-1

    Old nginx comes right back. Total downtime: ~5 seconds.


Phase 4: Cleanup (after cutover confirmed)

  • Remove old nginx container: docker rm young-and-ai-ec2-nginx-nginx-1
  • Archive the young-and-ai-ec2-nginx GitHub repo
  • Set up cert renewal cron from the new scripts
  • Remove the old GitHub Actions runner for the nginx repo

@youngchingjui
Owner Author

PR #1515 Code Review — NGINX Migration

Phase 1: Local Validation Results

1. NGINX Config Syntax Check (nginx -t) — FAILED ❌

docker run --rm -v $(pwd)/docker/nginx:/etc/nginx:ro nginx:stable-alpine nginx -t

Result:

nginx: [emerg] open() "/etc/nginx/mime.types" failed (2: No such file or directory) in /etc/nginx/nginx.conf:29

Root cause: The volume mount ../nginx:/etc/nginx:ro (in docker/compose/nginx.yml) replaces the entire /etc/nginx directory, overwriting the image's built-in mime.types, fastcgi_params, etc. The project's docker/nginx/ directory only contains nginx.conf, conf.d/, and README.md.

Fix options:

  1. Mount individual files/directories instead of the whole /etc/nginx:
    volumes:
      - ../nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ../nginx/conf.d:/etc/nginx/conf.d:ro
  2. Or copy mime.types from the nginx image into docker/nginx/.
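
For option 1, the same per-path mounts can be used when validating the config locally, so that the image's own mime.types stays in place. A minimal sketch follows; the function name `nginx_config_test` is made up for illustration, and the check may still fail on missing certificate paths unless /etc/letsencrypt is also available in the container.

```shell
#!/bin/bash
# Validate the nginx config by mounting only nginx.conf and conf.d,
# leaving the rest of the image's /etc/nginx (mime.types etc.) untouched.
# Usage: nginx_config_test <absolute path to docker/nginx>
nginx_config_test() {
    local nginx_dir="$1"
    docker run --rm \
        -v "$nginx_dir/nginx.conf:/etc/nginx/nginx.conf:ro" \
        -v "$nginx_dir/conf.d:/etc/nginx/conf.d:ro" \
        nginx:stable-alpine nginx -t
}
```

From the project root this would be called as `nginx_config_test "$(pwd)/docker/nginx"`.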

Additionally, the compose file mounts ../nginx/default.d:/etc/nginx/default.d:ro but no default.d/ directory exists in the project. While nginx may silently ignore this (the include uses a glob), the mount itself will fail.

Verdict: This config will not start in production as-is.


2. Docker Compose Integration — OK ✅ (with caveats)

  • docker/compose/nginx.yml is properly included in docker/docker-compose.yml via the include: directive.
  • The prod profile gating is correctly applied to both nginx and certbot services.
  • The preview network is correctly declared as external: true.
  • issue-to-pr-network is referenced by the nginx service but not declared in nginx.yml — this works because compose include merges configs and the parent declares it.

Issue: The preview network being external means docker compose up will fail if the network doesn't exist yet. The deploy script handles this by creating it first, but running docker compose --profile prod up -d nginx directly (as documented in several READMEs) will fail without that step.
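
A small guard that creates the network only when it is missing would cover the direct `docker compose` path as well. This is a sketch under assumptions: the function name `ensure_network` is hypothetical, and the deploy script's actual creation step is not shown in this thread.

```shell
#!/bin/bash
# Create a Docker network only if it does not already exist.
# Unlike "docker network create ... || true", a genuine creation failure
# still surfaces as a non-zero return status.
ensure_network() {
    local name="$1"
    if docker network inspect "$name" >/dev/null 2>&1; then
        echo "network '$name' already exists"
    else
        docker network create "$name"
    fi
}
```

Running `ensure_network preview` before `docker compose -f docker/docker-compose.yml --profile prod up -d nginx` would make that command safe on a fresh host.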


3. proxy_pass — Consistent ✅

All proxy_pass directives correctly use host.docker.internal:

  • issuetopr.dev.conf → http://host.docker.internal:3000
  • grafana.issuetopr.dev.conf → http://host.docker.internal:3001
  • preview.issuetopr.dev.conf → dynamic $upstream (container name on preview network)

The compose file correctly includes extra_hosts: ["host.docker.internal:host-gateway"] for Linux compatibility.

No references to localhost:3000 — good.


4. Script Audit

check-nginx-prereqs.sh — Minor issues ⚠️

  • Good: Checks ports, networks, SSL certs, config syntax, conflicting containers.
  • Issue: Line docker run --rm -v "$(pwd)/docker/nginx:/etc/nginx:ro" has the same full-directory mount bug — this test will fail due to missing mime.types.
  • Nitpick: Uses emojis in output which may not render on all terminals.

check-secrets.sh — OK ✅

  • Validates porkbun.ini existence, readability, and content.
  • Falls through to exec certbot "$@" which is clean.
  • Permissions guidance is appropriate (chmod 700 on dir, 600 on file).

renew-certs.sh — Bug ⚠️

  • Bug: Script has set -e at the top, but then does:
    docker compose ... run --rm certbot renew
    if [ $? -eq 0 ]; then  # <-- dead code
    With set -e, if certbot fails, the script exits immediately — the else branch is unreachable dead code. Fix: either remove set -e and handle errors manually, or use if docker compose ...; then pattern directly.

deploy-production.sh — Several issues ⚠️

  1. set -e + $? pattern (same bug as renew-certs.sh): The healthcheck wait loop at line ~120 uses if docker compose ... ps | grep -q "healthy" which is fine, but the certbot renew pattern should be checked for consistency.

  2. source docker/env/.env.nginx (line ~153): Sourcing an env file executes it as shell code. If the file contains anything beyond KEY=value, it could execute arbitrary commands. Use env $(cat file | xargs) or export $(grep -v '^#' file | xargs) instead.

  3. read -r response (line ~102): This blocks for interactive input but the script is likely to be run in CI/non-interactive environments. Consider adding a --yes/-y flag to skip prompts.

  4. No cd to project root: The script assumes it's run from the project root (docker/docker-compose.yml relative path) but doesn't validate this. A cd "$(dirname "$0")/.." at the top would be safer.
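
On item 2, the `xargs`-based alternatives avoid executing the file but may mis-handle values that contain spaces or quotes. One possible approach is to parse the file line by line and export only well-formed NAME=value pairs. The sketch below is illustrative: `load_env_file` is a hypothetical helper, not part of the PR, and it takes values literally (no quote stripping, no variable expansion).

```shell
#!/bin/bash
# Load NAME=value pairs from an env file without executing it as shell code.
# Blank lines, comments, and lines that are not NAME=value are skipped.
# Values are taken literally: no quote stripping, no variable expansion.
load_env_file() {
    local file="$1" line key value
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) continue ;;   # blank line or comment
            *=*) ;;                # candidate assignment
            *) continue ;;         # anything else is ignored
        esac
        key="${line%%=*}"
        value="${line#*=}"
        case "$key" in
            ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) continue ;;  # not a valid variable name
        esac
        export "$key=$value"
    done < "$file"
}
```

With this, `load_env_file docker/env/.env.nginx` would replace `source docker/env/.env.nginx`; a line such as `EVIL=$(rm -rf /)` is exported as a literal string rather than run.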


5. Full Code Review

Critical Issues

| # | Severity | File | Issue |
| --- | --- | --- | --- |
| 1 | 🔴 Critical | docker/compose/nginx.yml | Full /etc/nginx mount overwrites mime.types; NGINX won't start |
| 2 | 🔴 Critical | docker/compose/nginx.yml | Mount of ../nginx/default.d references non-existent directory |
| 3 | 🟡 Medium | docker/scripts/renew-certs.sh | set -e makes error handling branch unreachable |
| 4 | 🟡 Medium | scripts/deploy-production.sh | source .env.nginx is unsafe: it executes the file as shell code |
| 5 | 🟡 Medium | docker/nginx/conf.d/issuetopr.dev.conf | SSE block sets Connection 'upgrade' then overrides with Connection ''; the first set is misleading dead config |

Security Observations

| # | Severity | Issue |
| --- | --- | --- |
| 1 | 🟡 Medium | No rate limiting configured (documented in checklist but should be included in initial config) |
| 2 | 🟡 Medium | No security headers (X-Frame-Options, X-Content-Type-Options, etc.) |
| 3 | 🟢 Low | sub_filter in preview config injects a <script> tag; benign (console.log only) but worth noting |
| 4 | 🟢 Low | Certbot Dockerfile uses certbot/certbot:latest; should pin to a specific version for reproducibility |

Documentation Concerns

| # | Issue |
| --- | --- |
| 1 | MIGRATION_COMPLETE.md (232 lines) is a one-time status report, not ongoing documentation. It references files that don't exist in this PR (staging.youngandai.com.conf, financial-suitability.poc.youngandai.com.conf, .env.nginx.example). Should be removed or converted to a PR description. |
| 2 | docs/deployment/CODE_REVIEW_FIXES.md (233 lines) is internal process documentation ("Sam's review"). Should not be committed to the repo. |
| 3 | .codex/environments/environment.toml is unrelated to the nginx migration. |
| 4 | CLAUDE.md changes add a generic "Workflow for Making Changes" section unrelated to nginx. |
| 5 | Excessive documentation volume (5 new markdown files, ~1300+ lines) for what is essentially an nginx config migration. |

Positive Aspects ✅

  • Clean separation of nginx configs into conf.d/ per-domain files
  • Proper SSE support with proxy_buffering off and long timeouts
  • WebSocket upgrade map in nginx.conf
  • Preview URL routing with dynamic DNS resolution is well-designed
  • Healthcheck includes nginx -t (good catch from earlier review)
  • Deploy script has smoke tests
  • Profile-based gating (prod) keeps dev environment clean
  • extra_hosts for Linux host.docker.internal support

Summary

This PR should not be merged as-is. The critical volume mount issue means NGINX will fail to start. Minimum required fixes before merge:

  1. Fix the /etc/nginx volume mount to not overwrite mime.types
  2. Remove or create the default.d/ directory mount
  3. Fix set -e + $? dead code in renew-certs.sh
  4. Remove MIGRATION_COMPLETE.md and CODE_REVIEW_FIXES.md from the PR
  5. Remove unrelated changes (.codex/environments/environment.toml, generic CLAUDE.md additions)

After those fixes, re-run nginx -t to confirm the config is valid before merging.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docker/nginx/README.md`:
- Around line 15-23: The README.md contains a fenced code block without a
language tag (triggering markdownlint MD040); update the opening fence in the
docker/nginx README's directory-listing block to include a language (e.g., add
"text" after the triple backticks) so the block becomes a labeled code fence and
satisfies the linter.
- Around line 113-118: Add a one-line prerequisite that ensures the external
Docker network named "preview" exists before the "Start NGINX (Production
Profile)" docker compose command; update the section containing the docker
compose command (the block with "docker compose -f docker/docker-compose.yml
--profile prod up -d nginx") to include a preceding line that runs a safe create
for the "preview" network (e.g., docker network create preview 2>/dev/null ||
true) so the compose up won't fail on fresh environments.
- Line 22: Remove the stale "default.d/" entry from the README directory tree:
edit the docker/nginx/README.md to delete the "└── default.d/           #
Default server configurations" line (and any mention of default.d) so docs match
the repo; if you prefer to keep the include in nginx.conf (the "include
/etc/nginx/default.d/*.conf" at line 53), alternatively create the local
default.d directory and add it to docker-compose mounts, but the immediate fix
is to remove the nonexistent default.d reference from the README so the
documentation reflects the actual configuration.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8c25a85e-a032-437b-9968-2ded761e8bf5

📥 Commits

Reviewing files that changed from the base of the PR and between 25b94c3 and 670e574.

📒 Files selected for processing (2)
  • docker/compose/nginx.yml
  • docker/nginx/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • docker/compose/nginx.yml

Comment thread docker/nginx/README.md Outdated
Comment thread docker/nginx/README.md Outdated
Comment thread docker/nginx/README.md Outdated
@youngchingjui
Owner Author

Follow-up: Simplification Recommendations

After reviewing all the other reviewer comments (CodeRabbit, Codex), here's a consolidated take. The core nginx migration is solid — the problems are in the wrapping scripts and documentation. Less code = fewer bugs.


Challenge: Do the scripts need to exist?

check-nginx-prereqs.sh (124 lines) — Consider deleting

Docker Compose already gives clear errors for port conflicts, missing networks, and bad configs. This script duplicates that, and has its own bugs ($(pwd) assumption, same full /etc/nginx mount issue). If docker compose --profile prod up -d nginx fails, Docker tells you why.

deploy-production.sh (254 lines) — Consider replacing with documented commands

The script orchestrates what is essentially:

docker network create preview 2>/dev/null || true
docker compose -f docker/docker-compose.yml --profile prod up -d

It adds smoke tests, interactive prompts, and colored output — but also introduces bugs (set -e + read, unsafe source .env.nginx, no cd to project root). A 10-line script or just documented commands would be more maintainable.

renew-certs.sh (41 lines) — This is a 2-line crontab entry

# Crontab (daily at 2am):
0 2 * * * ( docker compose -f /path/to/docker/docker-compose.yml --profile prod run --rm certbot renew && docker compose -f /path/to/docker/docker-compose.yml exec nginx nginx -s reload ) >> /var/log/certbot-renewal.log 2>&1

The set -e + $? dead code bug wouldn't exist if this was just a crontab line.


Reviewer suggestions — simpler alternatives

| Reviewer suggestion | Their fix | Simpler alternative |
| --- | --- | --- |
| renew-certs.sh set -e bug | Run certbot in if conditional | Delete the script, use 2-line crontab |
| deploy-production.sh TTY check | Add --yes flag + TTY detection | Remove the interactive prompt (or the script) |
| check-nginx-prereqs.sh $(pwd) | Derive root from SCRIPT_DIR | Delete the script |
| Healthcheck nginx -t every 10s | HTTP-only check | Agree: wget --spider http://localhost:80/ alone suffices |
| source .env.nginx unsafe | env $(cat file) pattern | Use env_file: in compose YAML instead |

Agreed-upon fixes from all reviewers

These are straightforward and should be applied:

  1. Volume mount — Mount individual paths, not all of /etc/nginx:

    volumes:
      - ../nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ../nginx/conf.d:/etc/nginx/conf.d:ro
      - /etc/letsencrypt:/etc/letsencrypt:ro

    Remove the default.d mount entirely (directory doesn't exist).

  2. SSE location block — Remove the misleading WebSocket headers. Clean version:

    location /api/sse {
        proxy_pass http://host.docker.internal:3000;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Connection '';
        proxy_buffering off;
        proxy_cache off;
        chunked_transfer_encoding off;
        proxy_read_timeout 86400s;
        proxy_send_timeout 86400s;
    }
  3. Preview regex — Tighten from (?<preview>.+) to (?<preview>[a-z0-9-]+) for safety.

  4. Pin certbot Dockerfile — Use a specific version instead of :latest.

  5. Drop nginx -t from healthcheck — Running config parse every 10s is wasteful; HTTP check alone is sufficient.
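
The effect of the tighter preview regex in item 3 can be checked outside NGINX. The sketch below emulates both capture patterns with `grep -E`, assuming the `server_name` regex has the shape `^(<pattern>)\.issuetopr\.dev$`. The exact regex in the config may differ, and the hostnames are invented test inputs.

```shell
#!/bin/sh
# Return success if <host> matches ^(<pattern>)\.issuetopr\.dev$
matches() {  # usage: matches <pattern> <host>
  printf '%s\n' "$2" | grep -Eq "^($1)\.issuetopr\.dev\$"
}

loose='.+'            # current capture: accepts dots, underscores, anything
strict='[a-z0-9-]+'   # proposed capture: a single lowercase DNS-style label

for host in pr-123.issuetopr.dev a.b.issuetopr.dev evil_name.issuetopr.dev; do
  if matches "$loose" "$host"; then l=match; else l=reject; fi
  if matches "$strict" "$host"; then s=match; else s=reject; fi
  printf '%-26s loose=%s strict=%s\n' "$host" "$l" "$s"
done
```

Only the single-label host passes the strict pattern. That matters if the captured name is later used to resolve a container, because nested labels and unexpected characters never reach that step.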


Documentation: what to keep vs. delete

Keep:

  • docker/nginx/README.md — useful, close to the code
  • docker/README.md updates — good additions

Delete from PR:

  • MIGRATION_COMPLETE.md — one-time status report, references files not in this PR (staging.youngandai.com.conf, .env.nginx.example)
  • docs/deployment/CODE_REVIEW_FIXES.md — internal AI review session log, not repo documentation
  • docs/deployment/production-checklist.md (388 lines) — the useful bits (cert renewal cron, log rotation config) are ~10 lines that belong in the nginx README
  • docs/deployment/README.md (299 lines) — largely duplicates docker/README.md

Also remove unrelated changes:

  • .codex/environments/environment.toml
  • Generic "Workflow for Making Changes" addition to CLAUDE.md

Proposed minimal PR

The simplest version of this migration:

  1. docker/nginx/nginx.conf + conf.d/*.conf (with fixes above)
  2. docker/compose/nginx.yml (fixed volumes, simpler healthcheck)
  3. docker/certbot-porkbun/Dockerfile (pinned version)
  4. docker/docker-compose.yml (one include line)
  5. docker/README.md updates (nginx section + cert renewal crontab)
  6. docker/nginx/README.md
  7. docker/scripts/check-secrets.sh (certbot entrypoint — this one earns its keep)

That's ~200 lines of actual config, zero script bugs, zero doc duplication. Everything else can be added incrementally if/when it's actually needed.

- Deleted `MIGRATION_COMPLETE.md`, `environment.toml`, and `CODE_REVIEW_FIXES.md` as they are no longer needed.
- Removed the `production-checklist.md` to streamline deployment documentation.
- Updated `CLAUDE.md` to reflect the removal of the production checklist and adjusted the documentation index accordingly.
- Enhanced `docker/nginx/README.md` with clearer instructions for SSL certificate management and deployment processes.
- Improved NGINX configuration files for better readability and organization.

These changes aim to simplify the documentation and improve the clarity of deployment instructions.
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
docker/nginx/conf.d/issuetopr.dev.conf (2)

1-44: Consider adding security headers for the HTTPS server block.

The configuration lacks common security headers that harden the deployment. These are particularly important for a production site handling user authentication.

📋 Recommended headers
server {
    server_name issuetopr.dev www.issuetopr.dev;

    # Security headers
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header Referrer-Policy "strict-origin-when-cross-origin" always;

    # ... rest of config
}

This can be deferred to a follow-up PR if desired, given the scope of this migration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/nginx/conf.d/issuetopr.dev.conf` around lines 1 - 44, Add common
security headers to the HTTPS server block shown (the server { ... } that
contains listen 443 ssl and the location blocks) by inserting add_header
directives for Strict-Transport-Security "max-age=31536000; includeSubDomains"
always;, X-Content-Type-Options "nosniff" always;, X-Frame-Options "SAMEORIGIN"
always;, and Referrer-Policy "strict-origin-when-cross-origin" always; so they
are sent on all responses; place these lines near the top of the server block
(above the location blocks) so they apply to both / and /api/sse responses.

37-41: Consider enabling HTTP/2 for improved performance.

HTTP/2 provides multiplexing, header compression, and other benefits for modern browsers. Adding http2 to the listen directive is straightforward.

♻️ Proposed fix
-    listen 443 ssl; # managed by Certbot
+    listen 443 ssl http2; # managed by Certbot
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/nginx/conf.d/issuetopr.dev.conf` around lines 37 - 41, The nginx SSL
server block currently uses the directive "listen 443 ssl;" without HTTP/2;
update the listen directive to "listen 443 ssl http2;" (i.e., add http2 to the
existing listen 443 ssl line) so the server block that includes ssl_certificate,
ssl_certificate_key, include /etc/letsencrypt/options-ssl-nginx.conf, and
ssl_dhparam uses HTTP/2 for improved performance; after change, reload/restart
nginx to apply.
docs/deployment/README.md (1)

85-90: Consider using a renewal script instead of inline cron command.

The crontab entry is duplicated (here and line 276) and spans a single long line, making it error-prone. A wrapper script in docker/scripts/ would be more maintainable.

📝 Alternative approach

Create docker/scripts/renew-certs.sh (if not already present) and reference it:

# Add to crontab (runs daily at 2am):
0 2 * * * /path/to/issue-to-pr/docker/scripts/renew-certs.sh >> /var/log/certbot-renewal.log 2>&1

This centralizes the renewal logic and avoids duplication in documentation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/deployment/README.md` around lines 85 - 90, The crontab example in the
README is long, duplicated, and error-prone; create a wrapper script named
docker/scripts/renew-certs.sh that encapsulates the renew and reload commands
(the docker compose certbot renew invocation and the nginx reload), make the
script executable and handle logging/exit codes, then update the README examples
(the crontab line shown and the duplicate later) to reference the script as "0 2
* * * /path/to/issue-to-pr/docker/scripts/renew-certs.sh >>
/var/log/certbot-renewal.log 2>&1" so the renewal logic is centralized and the
docs no longer contain the long inline command.
docker/certbot-porkbun/Dockerfile (1)

1-3: Pin the certbot-dns-porkbun version for reproducibility.

The base image is pinned to v3.3.0, but certbot-dns-porkbun is unpinned. This could lead to unexpected behavior if a breaking change is released. The latest version available is 0.11.0.

Note: The Trivy warning about running as root is expected—certbot requires root privileges to write certificates to /etc/letsencrypt.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/certbot-porkbun/Dockerfile` around lines 1 - 3, Pin the
certbot-dns-porkbun dependency in the Dockerfile by changing the RUN pip install
step (the line that calls pip install certbot-dns-porkbun) to install a specific
version — e.g., certbot-dns-porkbun==0.11.0 — while keeping the existing flags
like --no-cache-dir to ensure reproducible builds.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: db1ed645-909c-4e13-9b88-a88b53c626ea

📥 Commits

Reviewing files that changed from the base of the PR and between 670e574 and 42a02e0.

📒 Files selected for processing (8)
  • CLAUDE.md
  • docker/certbot-porkbun/Dockerfile
  • docker/compose/nginx.yml
  • docker/nginx/README.md
  • docker/nginx/conf.d/issuetopr.dev.conf
  • docker/nginx/conf.d/preview.issuetopr.dev.conf
  • docker/nginx/nginx.conf
  • docs/deployment/README.md
✅ Files skipped from review due to trivial changes (1)
  • docker/nginx/README.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • docker/nginx/conf.d/preview.issuetopr.dev.conf
  • docker/compose/nginx.yml
  • CLAUDE.md
  • docker/nginx/nginx.conf

Comment thread docs/deployment/README.md Outdated
Comment on lines +295 to +301
### Monitoring

Set up uptime monitoring (e.g. UptimeRobot free tier) for your domain. Monitor:

- HTTPS response on your domain
- SSL certificate expiration
- Disk space on the server

⚠️ Potential issue | 🟡 Minor

Rename to avoid duplicate heading.

The static analysis tool flagged a duplicate heading—"Monitoring" appears at both line 244 (section heading) and line 295 (subsection). This can confuse document linking and navigation.

📝 Proposed fix
-### Monitoring
+### Uptime Monitoring
 
 Set up uptime monitoring (e.g. UptimeRobot free tier) for your domain. Monitor:
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 295-295: Multiple headings with the same content

(MD024, no-duplicate-heading)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/deployment/README.md` around lines 295 - 301, Rename the duplicate
"Monitoring" subsection heading at the second occurrence to a unique title (for
example "Uptime & Health Monitoring" or "Service Monitoring") to avoid conflict
with the earlier "Monitoring" heading; update the heading text in the docs where
the second "Monitoring" appears and ensure any internal links or TOC entries
that pointed to that subsection are updated to the new heading name so links
remain correct.

@youngchingjui
Owner Author

Code Review: NGINX Reverse Proxy Configuration

Overall this is a solid setup — well-documented, clean compose structure, and good use of profiles. Below are findings organized by severity.


Security Issues

1. Missing security headers on all server blocks (High)

None of the NGINX server blocks set security headers. For a production deployment, you should add at minimum:

add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "DENY" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;

Consider adding these in nginx.conf at the http block level so they apply globally, or per server block if different policies are needed (e.g., Grafana needs X-Frame-Options: SAMEORIGIN).

2. No rate limiting (Medium)

There's no limit_req_zone or limit_conn_zone configured. The preview subdomain routing is especially risky — an attacker could enumerate subdomains or flood preview containers. Consider adding:

limit_req_zone $binary_remote_addr zone=general:10m rate=10r/s;

3. www.issuetopr.dev redirects to itself, not the bare domain (Low)

In issuetopr.dev.conf, the HTTP→HTTPS redirect for www.issuetopr.dev sends users to https://www.issuetopr.dev instead of canonicalizing to https://issuetopr.dev. This means www still serves traffic from the main server block. Consider:

if ($host = www.issuetopr.dev) {
    return 301 https://issuetopr.dev$request_uri;
}

Correctness Issues

4. WebSocket Connection header uses hardcoded string instead of map variable (Medium)

nginx.conf defines a map $http_upgrade $connection_upgrade block, but none of the proxy configs use $connection_upgrade. Instead they all hardcode:

proxy_set_header Connection 'upgrade';

This means non-WebSocket requests also get Connection: upgrade, which can cause issues with HTTP/1.1 keepalive. Use the map variable instead:

proxy_set_header Connection $connection_upgrade;

5. Grafana uses a separate cert path, inconsistent with wildcard setup (Medium)

grafana.issuetopr.dev.conf points to /etc/letsencrypt/live/grafana.issuetopr.dev/ while the preview config uses the wildcard cert at /etc/letsencrypt/live/issuetopr.dev-0001/. Since grafana.issuetopr.dev is a subdomain of issuetopr.dev, the wildcard cert should cover it. Using two different cert paths means you need to manage separate certificates — was this intentional? If not, Grafana should use the wildcard cert too.

6. Preview sub_filter disables upstream compression (Low-Medium)

proxy_set_header Accept-Encoding "";

This forces upstream to send uncompressed responses so sub_filter can work, but it means all proxied content (JS, CSS, images) from preview containers is sent uncompressed to NGINX, which then also doesn't re-compress it since gzip isn't configured in nginx.conf. This could significantly increase bandwidth usage. Consider:

  • Adding gzip on; and related directives to nginx.conf
  • Or limiting sub_filter and Accept-Encoding "" to only HTML responses via a separate location block or conditional

Best Practice Issues

7. No gzip compression in nginx.conf (Medium)

nginx.conf has no gzip configuration. For a production reverse proxy, add:

gzip on;
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml text/javascript;

8. Log rotation not configured in compose (Low)

The deployment guide mentions adding log rotation as a TODO, but the actual nginx.yml doesn't include it. Since this is a production config, add it now:

logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"

9. Healthcheck may fail on HTTPS-only setups (Low)

The healthcheck hits http://localhost:80/, but the default server block only serves error pages. If NGINX is configured to redirect all HTTP to HTTPS, the healthcheck gets a 301, which wget --spider follows; the follow-up HTTPS request can then fail because the certificates aren't valid for localhost. Consider using curl -f http://localhost:80/ || exit 1 (curl does not follow redirects unless -L is given, and -f only fails on HTTP 4xx/5xx responses, so a 301 counts as healthy) or a dedicated /health endpoint.

10. check-secrets.sh example email is hardcoded (Nit)

The script contains support@youngandai.com as an example email while the deployment guide uses admin@yourdomain.com. Pick one placeholder pattern.

11. Trailing return 404 in issuetopr.dev.conf HTTP block is confusing (Nit)

if ($host = www.issuetopr.dev) {
    return 301 https://$host$request_uri;
}
if ($host = issuetopr.dev) {
    return 301 https://$host$request_uri;
}
listen 80;
server_name issuetopr.dev www.issuetopr.dev;
return 404;

The return 404 is dead code (both valid hosts already matched by if blocks). This looks like Certbot auto-generated config. Clean it up to a simple return 301.


Documentation

The docs are comprehensive and well-organized. The docker/nginx/README.md and docs/deployment/README.md are excellent additions. The CLAUDE.md updates with the workflow process and README index are good.

One suggestion: the deployment guide's "Production Checklist" section recommends things (log rotation, monitoring) that could/should be included in the actual configs rather than left as manual follow-ups.


Summary

The architecture is sound — compose profiles, external preview network, certbot with Porkbun DNS-01, and the wildcard subdomain routing pattern are all well-designed. The main gaps are security hardening (headers, rate limiting) and compression. The WebSocket Connection header issue (#4) should definitely be fixed before merge.

@youngchingjui youngchingjui changed the title from "this needs review" to "Add NGINX reverse proxy with SSL, preview routing, and Grafana integration" on Mar 16, 2026
- Removed the `certbot-porkbun` directory and its associated Dockerfile as SSL management is now handled differently.
- Updated `docker/nginx` configuration files to simplify SSL handling and improve readability, including changes to server blocks and proxy settings.
- Enhanced the `docker/nginx/README.md` to clarify the routing and SSL setup processes, emphasizing the use of HTTP for local development.
- Streamlined the deployment documentation in `docs/deployment/README.md`, focusing on the updated SSL setup and removing unnecessary steps.

These changes aim to improve the clarity and maintainability of the Docker setup and related documentation.
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
docker/nginx/conf.d/preview.issuetopr.dev.conf (1)

25-28: Consider using the default sub_filter_once on for single-replacement scenarios.

Valid HTML documents have exactly one </body> tag, so the default sub_filter_once on would suffice and avoids scanning the entire response body after the first replacement. If set to off intentionally to handle malformed HTML edge cases, this is fine as-is.

♻️ Optional simplification
         sub_filter '</body>' '<script>try{console.log("[IssueToPR Preview] This site is securely hosted and managed by Issue To PR.");}catch(e){}</script></body>';
-        sub_filter_once off;
+        sub_filter_once on;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/nginx/conf.d/preview.issuetopr.dev.conf` around lines 25 - 28, The
nginx config currently forces sub_filter_once off which causes NGINX to scan the
whole response for multiple '</body>' matches; change sub_filter_once to the
default "on" so the single replacement of '</body>' (handled by the existing
sub_filter) stops after the first match—update the sub_filter_once directive
(and keep the sub_filter line unchanged) unless you intentionally need off to
handle malformed HTML.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 93ee47dd-c84c-493d-bcde-f7a2b906c32d

📥 Commits

Reviewing files that changed from the base of the PR and between 42a02e0 and f657066.

📒 Files selected for processing (8)
  • docker/README.md
  • docker/compose/nginx.yml
  • docker/nginx/README.md
  • docker/nginx/conf.d/grafana.issuetopr.dev.conf
  • docker/nginx/conf.d/issuetopr.dev.conf
  • docker/nginx/conf.d/preview.issuetopr.dev.conf
  • docker/nginx/nginx.conf
  • docs/deployment/README.md
🚧 Files skipped from review as they are similar to previous changes (7)
  • docker/README.md
  • docker/nginx/README.md
  • docker/nginx/conf.d/issuetopr.dev.conf
  • docker/compose/nginx.yml
  • docs/deployment/README.md
  • docker/nginx/conf.d/grafana.issuetopr.dev.conf
  • docker/nginx/nginx.conf

@youngchingjui
Owner Author

I feel these changes all need to be grounded in some sort of user-experience or developer-experience outline or goal, so we can assess whether this is the correct implementation. From the user's perspective, they won't see much of a difference yet, except for avoiding non-renewed SSL certificates. So this is more of a developer/DevOps change, and we should describe those goals clearly in our documentation somewhere in /docs.

My thinking is that we need some level of single-command deployment on a server to get everything up and running for Issue To PR, or at least a single command for just the basic level of Issue To PR, with additional modules (HTTPS, preview domains, etc.) also set up with additional single-line commands. We'd then need to think about fencing off modules, and have a clear diagram and commands showing what does what.

I'm not sure we have clear docs about this yet, but that should be our guiding star. Any implementation detail in here should be tied to those goals (UX or DX). Do we have that mapping here? @codex

@chatgpt-codex-connector

Summary

  • Added a new deployment strategy doc at docs/deployment/ux-dx-goals.md that defines explicit UX/DX goals, a baseline-vs-optional module model, single-command examples for each module, an architecture diagram, and a PR checklist to enforce goal-mapping for future infra changes. (docs/deployment/ux-dx-goals.md, L1-L106)
  • Updated docs/deployment/README.md with a new Deployment UX/DX Goals section that links to the new mapping doc and sets expectations for goal-driven deployment changes in this area. (docs/deployment/README.md, L45-L53)
  • Committed the changes on the current branch (db8cf436) and created a follow-up PR via the make_pr tool.

Testing

  • rg -n "Deployment UX/DX Goals|Deployment UX/DX Goals and Implementation Mapping" docs/deployment/README.md docs/deployment/ux-dx-goals.md


Comment thread docs/dev/deployment.md

```bash
docker network create preview
docker compose -f docker/docker-compose.yml --profile prod up -d

Owner Author

I think this doesn't fully run the Issue To PR Next.js app. That's for another issue, so for now we just need a placeholder of some sort: either in Docker Compose, or a note in the documentation here that you have to launch the Next.js app manually or set up systemd.

Contributor

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
docs/deployment/README.md (1)

9-14: ⚠️ Potential issue | 🟠 Major

Local Development command is incomplete/misleading.

pnpm dev does not provision infra; the text on Line 13 says it starts Neo4j/Redis, which conflicts with the compose-based setup shown elsewhere. This can block first-time setup.

Suggested doc fix
 ### Local Development
 
```bash
-pnpm dev
+docker compose -f docker/docker-compose.yml up -d
+pnpm dev
```
 
-Starts Neo4j and Redis. Run Next.js and workers locally with hot reload.
+Starts Neo4j and Redis via Docker, then runs the app locally with hot reload.
+If workers are needed locally, start them separately (or document the exact command here).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/deployment/README.md` around lines 9 - 14, Replace the misleading single
command "pnpm dev" with an explicit two-step instruction: first run the Docker
compose command that provisions Neo4j and Redis (reference the compose file
shown elsewhere, e.g., docker compose -f docker/docker-compose.yml up -d) and
then run the local dev server (pnpm dev); update the surrounding prose that
currently says "Starts Neo4j and Redis. Run Next.js and workers locally with hot
reload." to something like "Starts Neo4j and Redis via Docker, then runs the app
locally with hot reload." and add a short note about starting workers separately
(or include the exact worker start command) so readers aren’t misled into
thinking pnpm dev provisions infra.

🧹 Nitpick comments (1)
docs/dev/README.md (1)

26-39: Consider moving the migration backlog out of this index page.

The “TODO: Migrate existing docs here” section turns this file into both a docs index and a migration tracker. Prefer a separate migration-tracking doc/issue and keep this page as the stable navigation entry point.

As per coding guidelines, “Each documentation file should cover a single topic or component” and docs should be “broken down into logical, focused files.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/dev/README.md` around lines 26 - 39, The "TODO: Migrate existing docs
here" migration backlog in docs/dev/README.md should be moved out of the index
page: remove the large migration bullet list under the "TODO: Migrate existing
docs here" header from docs/dev/README.md and create a separate
migration-tracking doc (e.g., docs/dev/migration-backlog.md) that contains the
listed mappings (docs/deployment/README.md → docs/dev/deployment.md, etc.);
leave docs/dev/README.md as a stable navigation entry that links to the new
migration-tracking file and to the finalized dev docs (architecture,
authentication, ai-integration, etc.) so the index only covers navigation and
not migration state.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2cbfe277-cd21-48d6-a928-87af38ecbb95

📥 Commits

Reviewing files that changed from the base of the PR and between f657066406f64d219d7d1daa9cba45e22dc92e54 and 98b538b3a255d9d4bcf6e48483714ad05eadc2e9.

📒 Files selected for processing (5)
  • CLAUDE.md
  • docs/deployment/README.md
  • docs/dev/README.md
  • docs/dev/infrastructure.md
  • docs/user/README.md
✅ Files skipped from review due to trivial changes (2)
  • docs/dev/infrastructure.md
  • docs/user/README.md

@vercel vercel Bot temporarily deployed to Preview – issue-to-pr-storybook March 18, 2026 03:47 Inactive
@youngchingjui youngchingjui merged commit 042c321 into main Mar 18, 2026
4 checks passed
@youngchingjui youngchingjui deleted the feature/migrate-nginx-config branch March 18, 2026 04:01