Add graceful degradation for external service dependencies #192

Merged
welshDog merged 1 commit into main from railway/code-change-HPWvwz
Apr 30, 2026

@railway-app railway-app Bot commented Apr 30, 2026

Problem

When Postgres is unavailable at startup the application crashes or enters a boot loop, and when Redis is down the health endpoint returns 500 instead of a meaningful degraded status. There is no structured startup logging to show which services are reachable, making it hard to diagnose transient dependency failures.

Solution

Added a lazy DB availability probe (probe_db / is_db_available) in db/session.py that never raises. The engine object is created unconditionally, since SQLAlchemy defers the TCP handshake to the first query, and a cheap SELECT 1 detects reachability.

The lifespan in main.py now probes Postgres, Redis, and Metrics at startup and emits structured [STARTUP] Service: STATE (note) log lines, followed by a summary line: Application ready (all services healthy) or Application ready (degraded mode).

The root /health endpoint now returns 503 with "postgres": "unavailable" when the DB probe fails. The deep /api/v1/health endpoint returns 503 when Postgres is down but keeps returning 200 for Redis/Discord failures, which are non-critical.

The _boot_guard middleware is unchanged: it blocks only on hard config errors, not on DB unavailability, so the app continues to serve all routes in degraded mode.
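
For readers without the diff open, here is a minimal sketch of the non-raising probe and the startup summary described above. It is not the code from this PR. The function signatures, the injected ping callable, the OK / UNAVAILABLE state names, and the log_startup helper are all assumptions made for illustration, and the optional (note) suffix is omitted. With SQLAlchemy, ping would typically open a connection and execute SELECT 1, as shown in the comment.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("startup")

# Cached result of the most recent probe (hypothetical module-level state).
_db_available = False


def probe_db(ping: Callable[[], None]) -> bool:
    """Run a cheap reachability check and cache the result. Never raises.

    With SQLAlchemy, `ping` could be something like:
        def ping():
            with engine.connect() as conn:
                conn.execute(text("SELECT 1"))
    """
    global _db_available
    try:
        ping()
        _db_available = True
    except Exception as exc:  # swallow everything: the probe must not crash startup
        logger.warning("DB probe failed: %s", exc)
        _db_available = False
    return _db_available


def is_db_available() -> bool:
    """Return the cached result of the last probe without touching the DB."""
    return _db_available


def log_startup(states: Dict[str, bool]) -> str:
    """Emit one [STARTUP] line per service, then a summary line."""
    for service, ok in states.items():
        # "OK" / "UNAVAILABLE" are assumed state names, not taken from the PR.
        logger.info("[STARTUP] %s: %s", service, "OK" if ok else "UNAVAILABLE")
    if all(states.values()):
        summary = "Application ready (all services healthy)"
    else:
        summary = "Application ready (degraded mode)"
    logger.info(summary)
    return summary
```

Injecting the check as a callable keeps the sketch free of a live database; in the application the lifespan hook would call the probe once per service and pass the results to the summary helper.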

Changes

  • Modified backend/app/db/session.py
  • Modified backend/app/api/v1/endpoints/health.py
  • Modified backend/app/main.py

Generated by Railway

@welshDog welshDog merged commit 6f456af into main Apr 30, 2026
1 of 9 checks passed