stellar · aristidesstaffieri · Apr 16, 2026 · Apr 7, 2026 · Apr 9, 2026 · Apr 9, 2026
diff --git a/.env-EXAMPLE b/.env-EXAMPLE
@@ -14,6 +14,8 @@ BLOCKAID_KEY=not-set
 COINBASE_API_KEY=not-set
 COINBASE_API_SECRET=not-set
 FREIGHTER_HORIZON_URL=not-set
+FREIGHTER_RPC_PUBNET_URL=not-set
+FREIGHTER_TRUST_PROXY_RANGE=
 DISABLE_TOKEN_PRICES=not-set
 PRICE_BATCH_UPDATE_DELAY_MS=not-set
 PRICE_CALCULATION_TIMEOUT_MS=not-set

diff --git a/README.md b/README.md
@@ -1,6 +1,18 @@
 # Freighter-Backend
 
-Freighter's indexer integration layer and general backend
+Freighter's indexer integration layer and general backend.
+
+## Documentation
+
+| Document                                       | What it covers                                                    |
+| ---------------------------------------------- | ----------------------------------------------------------------- |
+| [docs/README.md](./docs/README.md)             | Entry point for backend operational and architecture docs         |
+| [docs/architecture.md](./docs/architecture.md) | Runtime topology, dependencies, request flow, and build artifacts |
+| [docs/runbook.md](./docs/runbook.md)           | Startup, configuration, health checks, and incident response      |
+| [docs/workers.md](./docs/workers.md)           | Price worker and Mercury integrity-check worker behavior          |
+| [docs/metrics.md](./docs/metrics.md)           | Prometheus endpoint, metrics, labels, and scrape notes            |
+| [docs/debugging.md](./docs/debugging.md)       | Logs, source maps, Redis inspection, and common pitfalls          |
+| [docs/mercury.md](./docs/mercury.md)           | Mercury-specific integration notes                                |
 
 ## Prerequisites
 
@@ -16,18 +28,18 @@ This application relies on a Redis instance when `MODE=production`, you can eith
 To start the server in development mode, run:
 `yarn i && yarn start`
 
-For full runbook details, please reference [the runbook.](./docs/runbook.md)
+For full operational details, start with [the docs index](./docs/README.md) or jump directly to [the runbook](./docs/runbook.md).
 
 ## Production build
 
 `yarn build:prod`
 
 ## Mercury Details
 
-This project integrates with Mercury, an indexer for Stellar/Soroban. You can find general developer documentation (in their repo docs)[https://github.com/xycloo/merury-developers-documentation/blob/main/src/SUMMARY.md].
+This project integrates with Mercury, an indexer for Stellar/Soroban. You can find general developer documentation in [their repo docs](https://github.com/xycloo/merury-developers-documentation/blob/main/src/SUMMARY.md).
 
 For full integration details, see [the Mercury docs](./docs/mercury.md).
 
 ## Coinbase integrations
 
-This projects connects to Coinbase to generate a session token. In order to retrieve this locally, enter Coinbase API key and Coinbase API secret in `.env`. These values can be generated in the Coinbase Developer Platform in `API Keys`.
+This project connects to Coinbase to generate a session token. In order to retrieve this locally, enter Coinbase API key and Coinbase API secret in `.env`. These values can be generated in the Coinbase Developer Platform in `API Keys`.
diff --git a/docs/README.md b/docs/README.md
@@ -0,0 +1,20 @@
+# Backend Docs
+
+Use this directory as the operator and maintainer entry point for the backend.
+
+| Document                             | Purpose                                                                        |
+| ------------------------------------ | ------------------------------------------------------------------------------ |
+| [architecture.md](./architecture.md) | Runtime topology, major components, data flow, and build artifacts             |
+| [runbook.md](./runbook.md)           | Startup steps, required configuration, health checks, and incident playbooks   |
+| [workers.md](./workers.md)           | Background worker lifecycle, Redis keys, restart behavior, and manual recovery |
+| [metrics.md](./metrics.md)           | Prometheus endpoint details, emitted metrics, labels, and scraping notes       |
+| [debugging.md](./debugging.md)       | Logs, source maps, Redis inspection, and common failure patterns               |
+| [mercury.md](./mercury.md)           | Mercury-specific integration details, playground notes, and query guidance     |
+
+Suggested reading order for new contributors:
+
+1. [architecture.md](./architecture.md)
+2. [runbook.md](./runbook.md)
+3. [workers.md](./workers.md)
+4. [metrics.md](./metrics.md)
+5. [debugging.md](./debugging.md)
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -0,0 +1,98 @@
+# Architecture
+
+## Runtime topology
+
+```text
+Clients
+  |
+  v
+Fastify API server (:3002 by default, /api/v1/*)
+  |-- Mercury + Horizon + Soroban RPC clients
+  |-- Blockaid service
+  |-- Coinbase onramp token helper
+  |-- Redis-backed feature/runtime state (production)
+  |
+  +-- Prometheus metrics server (:9090, /metrics)
+  +-- Price worker (production unless DISABLE_TOKEN_PRICES=true)
+  +-- Mercury integrity-check worker (production when USE_MERCURY=true)
+```
+
+## Main process responsibilities
+
+The main process in `src/index.ts` is responsible for:
+
+1. Loading `.env`, building config, and applying CLI overrides for `--env` and `--port`.
+2. Creating the shared Prometheus registry and default process metrics.
+3. Connecting Redis in production mode, including a dedicated Redis time-series client for token prices.
+4. Initializing the public API server and a separate metrics server.
+5. Spawning worker-thread bundles for token prices and Mercury integrity checks.
+6. Handling shutdown signals and clearing metrics on exit.
+
+## Major components
+
+| Component         | Responsibility                           | Notes                                                                   |
+| ----------------- | ---------------------------------------- | ----------------------------------------------------------------------- |
+| API server        | Serves `/api/v1/*` routes                | Fastify with CORS, Helmet, AJV validation, and request-duration metrics |
+| Metrics server    | Serves Prometheus metrics                | Separate Fastify instance on port `9090`                                |
+| `MercuryClient`   | Mercury/Horizon/Soroban abstraction      | Used by public routes and the integrity-check worker                    |
+| `BlockAidService` | Dapp, transaction, and asset scans       | Emits scan-miss metrics on provider failures                            |
+| `PriceClient`     | Token price calculation and cache access | Uses Redis time series for historical price data                        |
+| Integrity checker | Verifies Mercury data against Horizon    | Can flip the runtime `USE_MERCURY` flag off on failure                  |
+| Price worker      | Maintains price cache freshness          | Initializes cache, updates prices, and records last-update timestamps   |
+
+## Data flow and runtime state
+
+### Request flow
+
+1. Clients call versioned routes under `/api/v1`.
+2. Each request is timed with the `http_request_duration_s` histogram.
+3. Route handlers delegate to Mercury, Horizon, Soroban RPC, Blockaid, Coinbase, or the price cache.
+4. In production, Mercury usage is gated by a Redis-backed runtime flag (`USE_MERCURY`), so the service can fall back without a process restart.
+
+### Redis usage
+
+The backend uses Redis for both runtime state and time-series storage in production:
+
+| Key or prefix              | Purpose                                                         |
+| -------------------------- | --------------------------------------------------------------- |
+| `USE_MERCURY`              | Runtime feature gate for Mercury-backed responses               |
+| `price_cache_initialized`  | Signals that the price cache has been bootstrapped successfully |
+| `price_worker_last_update` | Timestamp of the last successful price refresh                  |
+| `ts:price:*`               | Redis time-series entries for token prices                      |
+| `token_counter`            | Sorted set used to prioritize frequently requested tokens       |
+
+## External dependencies
+
+| Dependency  | Used for                                                                  |
+| ----------- | ------------------------------------------------------------------------- |
+| Redis       | Runtime feature flags, worker coordination, and price time-series storage |
+| Mercury     | Indexed account and subscription data                                     |
+| Horizon     | Fallback account data, health checks, and integrity comparisons           |
+| Soroban RPC | RPC health checks, simulation, and transaction preparation                |
+| Blockaid    | Dapp, transaction, and asset scanning                                     |
+| Coinbase    | Onramp session token generation                                           |
+| Sentry      | Integrity-check failure reporting                                         |
+
+## Environment modes
+
+| Mode          | Behavior                                                                                                       |
+| ------------- | -------------------------------------------------------------------------------------------------------------- |
+| `development` | API and metrics servers run, but Redis-backed workers do not start                                             |
+| `production`  | Redis is required, the price worker can start, and the Mercury integrity worker starts when `USE_MERCURY=true` |
+
+Two important nuances:
+
+- `yarn start` uses `ts-node`, which is fine for route work but is not the right way to validate worker behavior.
+- Production builds emit separate worker bundles, so worker issues need to be debugged from the built output rather than from `src/index.ts` alone.
+
+## Build artifacts
+
+`yarn build:prod` produces these Node bundles in `build/`:
+
+| Artifact                | Source                         |
+| ----------------------- | ------------------------------ |
+| `build/index.js`        | Main process                   |
+| `build/worker.js`       | Mercury integrity-check worker |
+| `build/price-worker.js` | Token price worker             |
+
+All three bundles are built with source maps enabled, which makes production-style debugging much easier with `NODE_OPTIONS=--enable-source-maps`.
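
The worker-start rules that `docs/architecture.md` describes (topology diagram, environment-modes table, and build artifacts) can be sketched as one small function. This is a minimal illustration, not code from the PR: `WorkerEnv`, `workerBundlesToStart`, and the choice to compare raw env strings against `"true"` are assumptions made for the example. Only the bundle paths and the start conditions come from the doc.

```typescript
// Hypothetical helper: names and types are illustrative, not taken from src/index.ts.
type Mode = "development" | "production";

interface WorkerEnv {
  mode: Mode;
  // Raw env-style strings such as "true" or "not-set"; undefined when unset.
  disableTokenPrices?: string;
  useMercury?: string;
}

// Bundle paths as listed in the build-artifacts table.
const PRICE_WORKER_BUNDLE = "build/price-worker.js";
const INTEGRITY_WORKER_BUNDLE = "build/worker.js";

function workerBundlesToStart(env: WorkerEnv): string[] {
  // Development mode never starts Redis-backed workers.
  if (env.mode !== "production") return [];

  const bundles: string[] = [];
  // Price worker: production unless DISABLE_TOKEN_PRICES=true.
  if (env.disableTokenPrices !== "true") bundles.push(PRICE_WORKER_BUNDLE);
  // Integrity-check worker: production when USE_MERCURY=true.
  if (env.useMercury === "true") bundles.push(INTEGRITY_WORKER_BUNDLE);
  return bundles;
}

console.log(workerBundlesToStart({ mode: "production", useMercury: "true" }));
```

Keeping the decision in a pure function like this would make the mode rules checkable without Redis, which matters because `yarn start` does not exercise the workers at all.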
diff --git a/docs/debugging.md b/docs/debugging.md
@@ -0,0 +1,78 @@
+# Debugging
+
+## Quick triage
+
+| Symptom                              | First checks                                                                   |
+| ------------------------------------ | ------------------------------------------------------------------------------ |
+| API appears down                     | `curl http://localhost:3002/api/v1/ping`                                       |
+| Metrics missing                      | `curl http://localhost:9090/metrics`                                           |
+| Price data looks stale               | `curl http://localhost:3002/api/v1/price-worker-health` and inspect Redis keys |
+| Mercury-backed routes look wrong     | Check `USE_MERCURY` in Redis, then review integrity-check logs and Sentry      |
+| RPC problems reported by clients     | `curl "http://localhost:3002/api/v1/rpc-health?network=PUBLIC"`                |
+| Horizon problems reported by clients | `curl "http://localhost:3002/api/v1/horizon-health?network=PUBLIC"`            |
+
+## Logs
+
+The service uses `pino` with `pino-pretty`.
+
+Important logging behavior:
+
+- request metadata is serialized with Pino standard serializers,
+- request IP/port, host, user agent, and key account identifiers are redacted,
+- `req.url` is also redacted and normalized for account-history and account-balances routes.
+
+That means logs are safe to share more broadly than raw access logs, but they will not preserve full request identity for debugging.
+
+## Production-style local debugging
+
+Use a build when the issue might involve workers, bundle output, or source maps:
+
+```bash
+yarn build:prod
+NODE_OPTIONS=--enable-source-maps node build/index.js --env production --port 3002
+```
+
+This gives you stack traces that map back to TypeScript source and exercises the same worker bundles that production uses.
+
+## Useful Redis checks
+
+```bash
+redis-cli GET USE_MERCURY
+redis-cli GET price_cache_initialized
+redis-cli GET price_worker_last_update
+redis-cli KEYS 'ts:price:*'
+```
+
+What they tell you:
+
+- `USE_MERCURY=false`: the service has fallen back from Mercury at runtime.
+- missing `price_cache_initialized`: the price worker never finished bootstrap.
+- stale `price_worker_last_update`: the worker is alive but not refreshing successfully.
+- missing `ts:price:*` keys: price cache bootstrap likely failed or token prices are disabled.
+
+## Known gotchas
+
+1. The repo requires Node `>=25.3.0`. Older Node versions can block `yarn` commands before the app even starts.
+2. `development` mode does not start Redis-backed workers, so worker-only issues must be reproduced from a production build.
+3. `/api/v1/rpc-health` returns HTTP `200` even when the RPC reports an unhealthy status in the response body.
+4. `/api/v1/price-worker-health` is not meaningful when Redis is unavailable or token prices are intentionally disabled.
+5. The metrics endpoint lives on port `9090`, not on the main API port.
+
+## When Mercury falls back
+
+The integrity worker disables Mercury by writing `USE_MERCURY=false` to Redis when an integrity check fails. Treat this as a protective circuit breaker:
+
+1. confirm the failure in logs and Sentry,
+2. compare Mercury and Horizon responses for the affected account or operation,
+3. fix the upstream issue or wait for the provider to recover,
+4. manually restore `USE_MERCURY=true`.
+
+## When the price worker is unhealthy
+
+Work through these checks in order:
+
+1. confirm Redis is reachable,
+2. confirm Horizon health at `${FREIGHTER_HORIZON_URL}/health`,
+3. inspect `price_cache_initialized`,
+4. inspect `price_worker_last_update`,
+5. review logs for repeated restart attempts or cache-initialization failures.
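
The Redis checks in `docs/debugging.md` map each key state to a likely cause, and that mapping can be sketched as a pure function. This is an illustration only, not code from the PR. The `RedisSnapshot` shape, the `interpretSnapshot` name, the five-minute staleness threshold, and the assumption that `price_worker_last_update` holds epoch milliseconds are all hypothetical; the doc says only that the key is a timestamp.

```typescript
// Illustrative only: snapshot shape, threshold, and timestamp format are assumptions.
interface RedisSnapshot {
  useMercury: string | null;            // result of GET USE_MERCURY
  priceCacheInitialized: string | null; // result of GET price_cache_initialized
  priceWorkerLastUpdate: string | null; // result of GET price_worker_last_update
  priceSeriesKeyCount: number;          // number of keys matching ts:price:*
}

const STALE_AFTER_MS = 5 * 60 * 1000; // hypothetical staleness threshold

function interpretSnapshot(s: RedisSnapshot, nowMs: number): string[] {
  const findings: string[] = [];

  if (s.useMercury === "false") {
    findings.push("service has fallen back from Mercury at runtime");
  }
  if (s.priceCacheInitialized === null) {
    findings.push("price worker never finished bootstrap");
  }
  // Assumes the stored value is epoch milliseconds; a missing key yields NaN and is skipped.
  const lastUpdate = s.priceWorkerLastUpdate === null ? NaN : Number(s.priceWorkerLastUpdate);
  if (Number.isFinite(lastUpdate) && nowMs - lastUpdate > STALE_AFTER_MS) {
    findings.push("price worker is alive but not refreshing successfully");
  }
  if (s.priceSeriesKeyCount === 0) {
    findings.push("price cache bootstrap likely failed or token prices are disabled");
  }
  return findings;
}

const now = Date.now();
console.log(
  interpretSnapshot(
    { useMercury: "false", priceCacheInitialized: "true", priceWorkerLastUpdate: String(now), priceSeriesKeyCount: 12 },
    now,
  ),
);
```

An empty result means none of the four documented symptoms is present. It does not prove the worker is healthy, since the checks say nothing about Redis reachability or Horizon health.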
diff --git a/docs/metrics.md b/docs/metrics.md
@@ -0,0 +1,71 @@
+# Metrics
+
+## Endpoint
+
+Prometheus metrics are exposed by a dedicated Fastify server on port `9090`:
+
+- URL: `GET /metrics`
+- Example: `curl http://localhost:9090/metrics`
+- Rate limit: `350` requests per minute
+
+This endpoint is not served from the application port and does not live under `/api/v1`.
+
+## What gets emitted
+
+The backend emits both default process metrics and a small set of application metrics.
+
+| Metric                                   | Type                       | Meaning                                                                        |
+| ---------------------------------------- | -------------------------- | ------------------------------------------------------------------------------ |
+| `process_*`, `nodejs_*`                  | Default Prometheus metrics | Standard process and Node.js telemetry collected via `collectDefaultMetrics()` |
+| `http_request_duration_s`                | Histogram                  | End-to-end request latency for API routes                                      |
+| `freighter_backend_mercury_error_count`  | Counter                    | Mercury-side request errors                                                    |
+| `freighter_backend_rpc_error_count`      | Counter                    | Horizon or Soroban RPC errors                                                  |
+| `freighter_backend_critical_error_count` | Counter                    | Errors that need manual operator investigation                                 |
+| `freighter_backend_integrity_check_pass` | Counter                    | Successful Mercury-vs-Horizon integrity checks                                 |
+| `freighter_backend_integrity_check_fail` | Counter                    | Failed Mercury-vs-Horizon integrity checks                                     |
+| `freighter_backend_scan_miss_count`      | Counter                    | Blockaid scan failures that fell back to a miss/default response               |
+
+## Request latency labels
+
+`http_request_duration_s` is labeled with:
+
+| Label     | Source                                    |
+| --------- | ----------------------------------------- |
+| `method`  | HTTP method                               |
+| `route`   | Normalized route name                     |
+| `status`  | Final response status code                |
+| `network` | `network` query-string value or `unknown` |
+
+Route labels are normalized on purpose:
+
+- Parameterized routes such as `/account-history/:pubKey` are grouped as `/account-history`.
+- Only a whitelist of public routes gets its own label.
+- Anything else is labeled as `other` to prevent metric-cardinality blowups.
+
+One consequence of the current implementation is that routes that receive `network` in the request body, rather than in the query string, will show `network="unknown"` in request-duration metrics.
+
+## Scrape example
+
+```yaml
+scrape_configs:
+  - job_name: freighter-backend
+    static_configs:
+      - targets:
+          - backend-hostname:9090
+```
+
+## Operational notes
+
+- The API server and the metrics server use different rate limits: `100/minute` on the API, `350/minute` on `/metrics`.
+- The two integrity-check counters are driven by worker messages from the Mercury integrity checker.
+- `freighter_backend_critical_error_count` is the clearest signal that a background process stopped recovering on its own, especially for the price worker restart loop.
+- If `/metrics` is empty or unavailable, debug the metrics server separately from the API server.
+
+## Good alert candidates
+
+These are the most useful signals to wire into alerting:
+
+1. Sustained `5xx` responses or a rising `http_request_duration_s` p95.
+2. Any increase in `freighter_backend_integrity_check_fail`.
+3. Any increase in `freighter_backend_critical_error_count`.
+4. A sharp increase in `freighter_backend_scan_miss_count`.
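
The label rules that `docs/metrics.md` gives for `http_request_duration_s` can be sketched as follows. This is a hedged illustration, not the PR's implementation. The `LABELED_ROUTES` list is a hypothetical stand-in for the real allowlist, which this diff does not show, and stripping the `/api/v1` prefix before matching is also an assumption.

```typescript
// Hypothetical allowlist: the real set of labeled routes is not shown in this diff.
const LABELED_ROUTES = ["/account-history", "/account-balances", "/ping"];
const API_PREFIX = "/api/v1";

// Split a raw request URL into its path and query-string parts.
function splitUrl(rawUrl: string): { path: string; query: string } {
  const q = rawUrl.indexOf("?");
  return q === -1
    ? { path: rawUrl, query: "" }
    : { path: rawUrl.slice(0, q), query: rawUrl.slice(q + 1) };
}

// Collapse parameterized routes onto their base name; anything not allowlisted is "other".
function routeLabel(rawUrl: string): string {
  const { path } = splitUrl(rawUrl);
  const route = path.startsWith(API_PREFIX) ? path.slice(API_PREFIX.length) : path;
  const match = LABELED_ROUTES.find((r) => route === r || route.startsWith(r + "/"));
  return match ?? "other";
}

// Only the query string is inspected, so a body-supplied network yields "unknown".
function networkLabel(rawUrl: string): string {
  const { query } = splitUrl(rawUrl);
  return new URLSearchParams(query).get("network") || "unknown";
}

console.log(routeLabel("/api/v1/account-history/GABC?network=PUBLIC"));
console.log(networkLabel("/api/v1/account-history/GABC?network=PUBLIC"));
```

Bounding the `route` label to a fixed set is what keeps the histogram's series count predictable: every public key in a path would otherwise create its own time series.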