docs: expand docs #301
Merged
6 commits
4127298
expands docs with more info in runbook, docs on all modules and sub s…
aristidesstaffieri 5eae6e2
Update README.md
aristidesstaffieri 5ac53d2
docs: document DISABLE_TOKEN_PRICES=true behavior for price-worker-he…
Copilot d59bfcc
docs: address PR #301 review feedback
aristidesstaffieri 9df0d3e
Merge branch 'main' into docs/expand-docs
aristidesstaffieri 50ca2d4
Merge branch 'main' into docs/expand-docs
aristidesstaffieri
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| # Backend Docs | ||
|
|
||
| Use this directory as the operator and maintainer entry point for the backend. | ||
|
|
||
| | Document | Purpose | | ||
| | ------------------------------------ | ------------------------------------------------------------------------------ | | ||
| | [architecture.md](./architecture.md) | Runtime topology, major components, data flow, and build artifacts | | ||
| | [runbook.md](./runbook.md) | Startup steps, required configuration, health checks, and incident playbooks | | ||
| | [workers.md](./workers.md) | Background worker lifecycle, Redis keys, restart behavior, and manual recovery | | ||
| | [metrics.md](./metrics.md) | Prometheus endpoint details, emitted metrics, labels, and scraping notes | | ||
| | [debugging.md](./debugging.md) | Logs, source maps, Redis inspection, and common failure patterns | | ||
| | [mercury.md](./mercury.md) | Mercury-specific integration details, playground notes, and query guidance | | ||
|
|
||
| Suggested reading order for new contributors: | ||
|
|
||
| 1. [architecture.md](./architecture.md) | ||
| 2. [runbook.md](./runbook.md) | ||
| 3. [workers.md](./workers.md) | ||
| 4. [metrics.md](./metrics.md) | ||
| 5. [debugging.md](./debugging.md) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,98 @@ | ||
| # Architecture | ||
|
|
||
| ## Runtime topology | ||
|
|
||
| ```text | ||
| Clients | ||
| | | ||
| v | ||
| Fastify API server (:3002 by default, /api/v1/*) | ||
| |-- Mercury + Horizon + Soroban RPC clients | ||
| |-- Blockaid service | ||
| |-- Coinbase onramp token helper | ||
| |-- Redis-backed feature/runtime state (production) | ||
| | | ||
| +-- Prometheus metrics server (:9090, /metrics) | ||
| +-- Price worker (production unless DISABLE_TOKEN_PRICES=true) | ||
| +-- Mercury integrity-check worker (production when USE_MERCURY=true) | ||
| ``` | ||
|
|
||
| ## Main process responsibilities | ||
|
|
||
| The main process in `src/index.ts` is responsible for: | ||
|
|
||
| 1. Loading `.env`, building config, and applying CLI overrides for `--env` and `--port`. | ||
| 2. Creating the shared Prometheus registry and default process metrics. | ||
| 3. Connecting Redis in production mode, including a dedicated Redis time-series client for token prices. | ||
| 4. Initializing the public API server and a separate metrics server. | ||
| 5. Spawning worker-thread bundles for token prices and Mercury integrity checks. | ||
| 6. Handling shutdown signals and clearing metrics on exit. | ||
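
Step 1 can be pictured with a minimal sketch. It assumes the flags are passed as `--env <value>` and `--port <value>`, matching the `node build/index.js --env production --port 3002` invocation in the debugging guide. `BaseConfig` and `applyCliOverrides` are hypothetical names for illustration, not the code in `src/index.ts`, and the real implementation may rely on an argument-parsing library instead.

```typescript
// Hypothetical config shape; the real config in src/index.ts has more fields.
interface BaseConfig {
  mode: string;
  port: number;
}

// Returns a copy of `base` with any --env / --port values from `argv` applied.
function applyCliOverrides(
  base: BaseConfig,
  argv: readonly string[],
): BaseConfig {
  const result: BaseConfig = { ...base };
  for (let i = 0; i < argv.length - 1; i += 1) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (value === undefined) continue;
    if (flag === "--env") {
      result.mode = value;
    } else if (flag === "--port") {
      const parsed = Number.parseInt(value, 10);
      // Assumption: an invalid port is ignored rather than treated as fatal.
      if (Number.isInteger(parsed) && parsed > 0 && parsed < 65536) {
        result.port = parsed;
      }
    }
  }
  return result;
}
```

Returning a copy keeps the `.env`-derived defaults untouched, which makes it easy to log both the base config and the effective config at startup.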
|
|
||
| ## Major components | ||
|
|
||
| | Component | Responsibility | Notes | | ||
| | ----------------- | ---------------------------------------- | ----------------------------------------------------------------------- | | ||
| | API server | Serves `/api/v1/*` routes | Fastify with CORS, Helmet, AJV validation, and request-duration metrics | | ||
| | Metrics server | Serves Prometheus metrics | Separate Fastify instance on port `9090` | | ||
| | `MercuryClient` | Mercury/Horizon/Soroban abstraction | Used by public routes and the integrity-check worker | | ||
| | `BlockAidService` | Dapp, transaction, and asset scans | Emits scan-miss metrics on provider failures | | ||
| | `PriceClient` | Token price calculation and cache access | Uses Redis time series for historical price data | | ||
| | Integrity checker | Verifies Mercury data against Horizon | Can flip the runtime `USE_MERCURY` flag off on failure | | ||
| | Price worker | Maintains price cache freshness | Initializes cache, updates prices, and records last-update timestamps | | ||
|
|
||
| ## Data flow and runtime state | ||
|
|
||
| ### Request flow | ||
|
|
||
| 1. Clients call versioned routes under `/api/v1`. | ||
| 2. Each request is timed with the `http_request_duration_s` histogram. | ||
| 3. Route handlers delegate to Mercury, Horizon, Soroban RPC, Blockaid, Coinbase, or the price cache. | ||
| 4. In production, Mercury usage is gated by a Redis-backed runtime flag (`USE_MERCURY`), so the service can fall back without a process restart. | ||
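
The runtime gate in step 4 could look roughly like the sketch below. It is a simplification under stated assumptions: the flag is read synchronously through a hypothetical `FlagReader` callback (a real Redis read would be asynchronous), the stored value is the string `true` or `false`, and a missing key is treated as disabled. None of these names come from the backend source.

```typescript
type AccountBackend = "mercury" | "horizon";

// Hypothetical reader: returns whatever a GET on USE_MERCURY would return,
// or null when the key is not set.
type FlagReader = () => string | null;

// The flag is consulted on every call, so flipping it in the store changes
// routing for the next request without restarting the process.
function chooseAccountBackend(readUseMercury: FlagReader): AccountBackend {
  // Assumption: anything other than the exact string "true" disables Mercury.
  return readUseMercury() === "true" ? "mercury" : "horizon";
}
```

Because the decision is made per request, the integrity checker only has to write a new value for the fallback to take effect.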
|
|
||
| ### Redis usage | ||
|
|
||
| The backend uses Redis for both runtime state and time-series storage in production: | ||
|
|
||
| | Key or prefix | Purpose | | ||
| | -------------------------- | --------------------------------------------------------------- | | ||
| | `USE_MERCURY` | Runtime feature gate for Mercury-backed responses | | ||
| | `price_cache_initialized` | Signals that the price cache has been bootstrapped successfully | | ||
| | `price_worker_last_update` | Timestamp of the last successful price refresh | | ||
| | `ts:price:*` | Redis time-series entries for token prices | | ||
| | `token_counter` | Sorted set used to prioritize frequently requested tokens | | ||
|
|
||
| ## External dependencies | ||
|
|
||
| | Dependency | Used for | | ||
| | ----------- | ------------------------------------------------------------------------- | | ||
| | Redis | Runtime feature flags, worker coordination, and price time-series storage | | ||
| | Mercury | Indexed account and subscription data | | ||
| | Horizon | Fallback account data, health checks, and integrity comparisons | | ||
| | Soroban RPC | RPC health checks, simulation, and transaction preparation | | ||
| | Blockaid | Dapp, transaction, and asset scanning | | ||
| | Coinbase | Onramp session token generation | | ||
| | Sentry | Integrity-check failure reporting | | ||
|
|
||
| ## Environment modes | ||
|
|
||
| | Mode | Behavior | | ||
| | ------------- | -------------------------------------------------------------------------------------------------------------- | | ||
| | `development` | API and metrics servers run, but Redis-backed workers do not start | | ||
| | `production` | Redis is required, the price worker can start, and the Mercury integrity worker starts when `USE_MERCURY=true` | | ||
|
|
||
| Two important nuances: | ||
|
|
||
| - `yarn start` uses `ts-node`, which is fine for route work but is not the right way to validate worker behavior. | ||
| - Production builds emit separate worker bundles, so worker issues need to be debugged from the built output rather than from `src/index.ts` alone. | ||
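
The mode rules above, combined with the topology notes about `DISABLE_TOKEN_PRICES` and `USE_MERCURY`, reduce to a small decision function. This is an illustrative sketch only: `WorkerPlanInput`, `WorkerPlan`, and `planWorkers` are hypothetical names, and the booleans stand in for however the real config represents those environment variables.

```typescript
// Hypothetical inputs mirroring the documented settings.
interface WorkerPlanInput {
  mode: "development" | "production";
  useMercury: boolean; // USE_MERCURY=true
  disableTokenPrices: boolean; // DISABLE_TOKEN_PRICES=true
}

interface WorkerPlan {
  priceWorker: boolean;
  integrityWorker: boolean;
}

// Decides which background workers should be spawned for a given config.
function planWorkers(input: WorkerPlanInput): WorkerPlan {
  // Development mode never starts Redis-backed workers.
  if (input.mode !== "production") {
    return { priceWorker: false, integrityWorker: false };
  }
  return {
    // The price worker runs in production unless prices are disabled.
    priceWorker: !input.disableTokenPrices,
    // The integrity worker runs in production only when Mercury is in use.
    integrityWorker: input.useMercury,
  };
}
```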
|
|
||
| ## Build artifacts | ||
|
|
||
| `yarn build:prod` produces these Node bundles in `build/`: | ||
|
|
||
| | Artifact | Source | | ||
| | ----------------------- | ------------------------------ | | ||
| | `build/index.js` | Main process | | ||
| | `build/worker.js` | Mercury integrity-check worker | | ||
| | `build/price-worker.js` | Token price worker | | ||
|
|
||
| All three bundles are built with source maps enabled, which makes production-style debugging much easier with `NODE_OPTIONS=--enable-source-maps`. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,78 @@ | ||
| # Debugging | ||
|
|
||
| ## Quick triage | ||
|
|
||
| | Symptom | First checks | | ||
| | ------------------------------------ | ------------------------------------------------------------------------------ | | ||
| | API appears down | `curl http://localhost:3002/api/v1/ping` | | ||
| | Metrics missing | `curl http://localhost:9090/metrics` | | ||
| | Price data looks stale | `curl http://localhost:3002/api/v1/price-worker-health` and inspect Redis keys | | ||
| | Mercury-backed routes look wrong | Check `USE_MERCURY` in Redis, then review integrity-check logs and Sentry | | ||
| | RPC problems reported by clients | `curl "http://localhost:3002/api/v1/rpc-health?network=PUBLIC"` | | ||
| | Horizon problems reported by clients | `curl "http://localhost:3002/api/v1/horizon-health?network=PUBLIC"` | | ||
|
|
||
| ## Logs | ||
|
|
||
| The service uses `pino` with `pino-pretty`. | ||
|
|
||
| Important logging behavior: | ||
|
|
||
| - request metadata is serialized with Pino standard serializers, | ||
| - request IP/port, host, user agent, and key account identifiers are redacted, | ||
| - `req.url` is also redacted and normalized for account-history and account-balances routes. | ||
|
|
||
| That means logs are safe to share more broadly than raw access logs, but they will not preserve full request identity for debugging. | ||
|
|
||
| ## Production-style local debugging | ||
|
|
||
| Use a build when the issue might involve workers, bundle output, or source maps: | ||
|
|
||
| ```bash | ||
| yarn build:prod | ||
| NODE_OPTIONS=--enable-source-maps node build/index.js --env production --port 3002 | ||
| ``` | ||
|
|
||
| This gives you stack traces that map back to TypeScript source and exercises the same worker bundles that production uses. | ||
|
|
||
| ## Useful Redis checks | ||
|
|
||
| ```bash | ||
| redis-cli GET USE_MERCURY | ||
| redis-cli GET price_cache_initialized | ||
| redis-cli GET price_worker_last_update | ||
| redis-cli --scan --pattern 'ts:price:*' | ||
| ``` | ||
|
|
||
| What they tell you: | ||
|
|
||
| - `USE_MERCURY=false`: the service has fallen back from Mercury at runtime. | ||
| - missing `price_cache_initialized`: the price worker never finished bootstrap. | ||
| - stale `price_worker_last_update`: the worker is alive but not refreshing successfully. | ||
| - missing `ts:price:*` keys: price cache bootstrap likely failed or token prices are disabled. | ||
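
The interpretation rules above can be captured in a small helper for scripted triage. This is a sketch under assumptions that the docs do not confirm: `price_worker_last_update` is taken to be an epoch-milliseconds string, and the staleness threshold is a caller-supplied value. `RedisSnapshot` and `diagnoseRedisSnapshot` are hypothetical names.

```typescript
// Values as returned by the Redis checks above; null means the key is missing.
interface RedisSnapshot {
  useMercury: string | null; // GET USE_MERCURY
  priceCacheInitialized: string | null; // GET price_cache_initialized
  priceWorkerLastUpdate: string | null; // GET price_worker_last_update
  priceSeriesKeyCount: number; // number of ts:price:* keys found
}

// Maps a snapshot to human-readable findings; an empty array means no flags.
function diagnoseRedisSnapshot(
  snapshot: RedisSnapshot,
  nowMs: number,
  staleAfterMs: number, // hypothetical threshold chosen by the operator
): string[] {
  const findings: string[] = [];
  if (snapshot.useMercury === "false") {
    findings.push("Mercury is disabled at runtime (fallback active)");
  }
  if (snapshot.priceCacheInitialized === null) {
    findings.push("price worker never finished bootstrap");
  }
  if (snapshot.priceWorkerLastUpdate !== null) {
    // Assumption: the timestamp is stored as epoch milliseconds.
    const lastUpdateMs = Number(snapshot.priceWorkerLastUpdate);
    if (Number.isFinite(lastUpdateMs) && nowMs - lastUpdateMs > staleAfterMs) {
      findings.push("price worker is alive but not refreshing successfully");
    }
  }
  if (snapshot.priceSeriesKeyCount === 0) {
    findings.push("price cache bootstrap likely failed or token prices are disabled");
  }
  return findings;
}
```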
|
|
||
| ## Known gotchas | ||
|
|
||
| 1. The repo requires Node `>=25.3.0`. Older Node versions can block `yarn` commands before the app even starts. | ||
| 2. `development` mode does not start Redis-backed workers, so worker-only issues must be reproduced from a production build. | ||
| 3. `/api/v1/rpc-health` returns HTTP `200` even when the RPC reports an unhealthy status in the response body. | ||
| 4. `/api/v1/price-worker-health` is not meaningful when Redis is unavailable or token prices are intentionally disabled. | ||
| 5. The metrics endpoint lives on port `9090`, not on the main API port. | ||
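
Gotcha 3 means a health probe cannot rely on the HTTP status alone. The sketch below shows the idea. The response shape is an assumption: it presumes the body carries a `status` field whose healthy value is the string `healthy`, so check a real `/api/v1/rpc-health` response before relying on it. `isRpcHealthy` is a hypothetical helper.

```typescript
// Decides health from both the HTTP status and the parsed JSON body.
function isRpcHealthy(httpStatus: number, body: unknown): boolean {
  // A non-200 response is unhealthy regardless of the body.
  if (httpStatus !== 200) return false;
  if (typeof body !== "object" || body === null) return false;
  // Assumption: the body looks like { status: "healthy" } when all is well.
  const reported = (body as { status?: unknown }).status;
  return reported === "healthy";
}
```

A monitor built this way still flags the RPC as down when the route answers `200` with an unhealthy body.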
|
|
||
| ## When Mercury falls back | ||
|
|
||
| When an integrity check fails, the integrity worker disables Mercury by writing `USE_MERCURY=false` to Redis. Treat this as a protective circuit breaker: | ||
|
|
||
| 1. confirm the failure in logs and Sentry, | ||
| 2. compare Mercury and Horizon responses for the affected account or operation, | ||
| 3. fix the upstream issue or wait for the provider to recover, | ||
| 4. manually restore `USE_MERCURY=true`. | ||
|
|
||
| ## When the price worker is unhealthy | ||
|
|
||
| Work through these checks in order: | ||
|
|
||
| 1. confirm Redis is reachable, | ||
| 2. confirm Horizon health at `${FREIGHTER_HORIZON_URL}/health`, | ||
| 3. inspect `price_cache_initialized`, | ||
| 4. inspect `price_worker_last_update`, | ||
| 5. review logs for repeated restart attempts or cache-initialization failures. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| # Metrics | ||
|
|
||
| ## Endpoint | ||
|
|
||
| Prometheus metrics are exposed by a dedicated Fastify server on port `9090`: | ||
|
|
||
| - URL: `GET /metrics` | ||
| - Example: `curl http://localhost:9090/metrics` | ||
| - Rate limit: `350` requests per minute | ||
|
|
||
| This endpoint is not served from the application port and does not live under `/api/v1`. | ||
|
|
||
| ## What gets emitted | ||
|
|
||
| The backend emits both default process metrics and a small set of application metrics. | ||
|
|
||
| | Metric | Type | Meaning | | ||
| | ---------------------------------------- | -------------------------- | ------------------------------------------------------------------------------ | | ||
| | `process_*`, `nodejs_*` | Default Prometheus metrics | Standard process and Node.js telemetry collected via `collectDefaultMetrics()` | | ||
| | `http_request_duration_s` | Histogram | End-to-end request latency for API routes | | ||
| | `freighter_backend_mercury_error_count` | Counter | Mercury-side request errors | | ||
| | `freighter_backend_rpc_error_count` | Counter | Horizon or Soroban RPC errors | | ||
| | `freighter_backend_critical_error_count` | Counter | Errors that need manual operator investigation | | ||
| | `freighter_backend_integrity_check_pass` | Counter | Successful Mercury-vs-Horizon integrity checks | | ||
| | `freighter_backend_integrity_check_fail` | Counter | Failed Mercury-vs-Horizon integrity checks | | ||
| | `freighter_backend_scan_miss_count` | Counter | Blockaid scan failures that fell back to a miss/default response | | ||
|
|
||
| ## Request latency labels | ||
|
|
||
| `http_request_duration_s` is labeled with: | ||
|
|
||
| | Label | Source | | ||
| | --------- | ----------------------------------------- | | ||
| | `method` | HTTP method | | ||
| | `route` | Normalized route name | | ||
| | `status` | Final response status code | | ||
| | `network` | `network` query-string value or `unknown` | | ||
|
|
||
| Route labels are normalized on purpose: | ||
|
|
||
| - Parameterized routes such as `/account-history/:pubKey` are grouped as `/account-history`. | ||
| - Only a whitelist of public routes gets its own label. | ||
| - Anything else is labeled as `other` to prevent metric-cardinality blowups. | ||
|
|
||
| One consequence of the current implementation is that routes that receive `network` in the request body, rather than in the query string, will show `network="unknown"` in request-duration metrics. | ||
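
The labeling rules above can be illustrated with a short sketch. The route list here is a hypothetical allow-list assembled from routes mentioned in these docs, and `normalizeRouteLabel` and `networkLabel` are illustrative names, not the backend's actual helpers.

```typescript
// Hypothetical allow-list; the real list lives in the backend source.
const KNOWN_ROUTES: readonly string[] = [
  "/account-history",
  "/account-balances",
  "/rpc-health",
  "/horizon-health",
  "/ping",
];

const API_PREFIX = "/api/v1";

// Collapses a raw request URL into a low-cardinality route label.
function normalizeRouteLabel(rawUrl: string): string {
  // Drop the query string so it never leaks into the label.
  const queryStart = rawUrl.indexOf("?");
  const path = queryStart === -1 ? rawUrl : rawUrl.slice(0, queryStart);
  const withoutPrefix = path.startsWith(API_PREFIX)
    ? path.slice(API_PREFIX.length)
    : path;
  // Parameterized paths such as /account-history/:pubKey match their base route.
  const match = KNOWN_ROUTES.find(
    (route) => withoutPrefix === route || withoutPrefix.startsWith(route + "/"),
  );
  // Anything outside the allow-list shares one label to cap cardinality.
  return match ?? "other";
}

// Reads the network label from the parsed query string only.
function networkLabel(query: Record<string, unknown>): string {
  const value = query["network"];
  return typeof value === "string" && value.length > 0 ? value : "unknown";
}
```

Since `networkLabel` only looks at the query string, a request that sends `network` in its body is labeled `unknown`, which matches the behavior described above.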
|
|
||
| ## Scrape example | ||
|
|
||
| ```yaml | ||
| scrape_configs: | ||
| - job_name: freighter-backend | ||
| static_configs: | ||
| - targets: | ||
| - backend-hostname:9090 | ||
| ``` | ||
|
|
||
| ## Operational notes | ||
|
|
||
| - The API server and the metrics server use different rate limits: `100/minute` on the API, `350/minute` on `/metrics`. | ||
| - The two integrity-check counters are driven by worker messages from the Mercury integrity checker. | ||
| - `freighter_backend_critical_error_count` is the clearest signal that a background process stopped recovering on its own, especially for the price worker restart loop. | ||
| - If `/metrics` is empty or unavailable, debug the metrics server separately from the API server. | ||
|
|
||
| ## Good alert candidates | ||
|
|
||
| These are the most useful signals to wire into alerting: | ||
|
|
||
| 1. Sustained `5xx` responses or a rising `http_request_duration_s` p95. | ||
| 2. Any increase in `freighter_backend_integrity_check_fail`. | ||
| 3. Any increase in `freighter_backend_critical_error_count`. | ||
| 4. A sharp increase in `freighter_backend_scan_miss_count`. | ||
Hm, I think this means I bumped the wrong rate limit when I was trying to keep users with lots of accounts from hitting 429s. Is that correct?
I don't recall which one you bumped, but this is the API rate limit.