2 changes: 2 additions & 0 deletions .env-EXAMPLE
@@ -14,6 +14,8 @@ BLOCKAID_KEY=not-set
COINBASE_API_KEY=not-set
COINBASE_API_SECRET=not-set
FREIGHTER_HORIZON_URL=not-set
FREIGHTER_RPC_PUBNET_URL=not-set
FREIGHTER_TRUST_PROXY_RANGE=
DISABLE_TOKEN_PRICES=not-set
PRICE_BATCH_UPDATE_DELAY_MS=not-set
PRICE_CALCULATION_TIMEOUT_MS=not-set
20 changes: 16 additions & 4 deletions README.md
@@ -1,6 +1,18 @@
# Freighter-Backend

Freighter's indexer integration layer and general backend
Freighter's indexer integration layer and general backend.

## Documentation

| Document | What it covers |
| ---------------------------------------------- | ----------------------------------------------------------------- |
| [docs/README.md](./docs/README.md) | Entry point for backend operational and architecture docs |
| [docs/architecture.md](./docs/architecture.md) | Runtime topology, dependencies, request flow, and build artifacts |
| [docs/runbook.md](./docs/runbook.md) | Startup, configuration, health checks, and incident response |
| [docs/workers.md](./docs/workers.md) | Price worker and Mercury integrity-check worker behavior |
| [docs/metrics.md](./docs/metrics.md) | Prometheus endpoint, metrics, labels, and scrape notes |
| [docs/debugging.md](./docs/debugging.md) | Logs, source maps, Redis inspection, and common pitfalls |
| [docs/mercury.md](./docs/mercury.md) | Mercury-specific integration notes |

## Prerequisites

@@ -16,18 +28,18 @@ This application relies on a Redis instance when `MODE=production`, you can eith
To start the server in development mode, run:
`yarn i && yarn start`

For full runbook details, please reference [the runbook.](./docs/runbook.md)
For full operational details, start with [the docs index](./docs/README.md) or jump directly to [the runbook](./docs/runbook.md).

## Production build

`yarn build:prod`

## Mercury Details

This project integrates with Mercury, an indexer for Stellar/Soroban. You can find general developer documentation (in their repo docs)[https://github.com/xycloo/merury-developers-documentation/blob/main/src/SUMMARY.md].
This project integrates with Mercury, an indexer for Stellar/Soroban. You can find general developer documentation in [their repo docs](https://github.com/xycloo/merury-developers-documentation/blob/main/src/SUMMARY.md).

For full integration details, see [the Mercury docs](./docs/mercury.md).

## Coinbase integrations

This projects connects to Coinbase to generate a session token. In order to retrieve this locally, enter Coinbase API key and Coinbase API secret in `.env`. These values can be generated in the Coinbase Developer Platform in `API Keys`.
This project connects to Coinbase to generate a session token. To generate one locally, set `COINBASE_API_KEY` and `COINBASE_API_SECRET` in `.env`. Both values can be created under `API Keys` in the Coinbase Developer Platform.
20 changes: 20 additions & 0 deletions docs/README.md
@@ -0,0 +1,20 @@
# Backend Docs

Use this directory as the operator and maintainer entry point for the backend.

| Document | Purpose |
| ------------------------------------ | ------------------------------------------------------------------------------ |
| [architecture.md](./architecture.md) | Runtime topology, major components, data flow, and build artifacts |
| [runbook.md](./runbook.md) | Startup steps, required configuration, health checks, and incident playbooks |
| [workers.md](./workers.md) | Background worker lifecycle, Redis keys, restart behavior, and manual recovery |
| [metrics.md](./metrics.md) | Prometheus endpoint details, emitted metrics, labels, and scraping notes |
| [debugging.md](./debugging.md) | Logs, source maps, Redis inspection, and common failure patterns |
| [mercury.md](./mercury.md) | Mercury-specific integration details, playground notes, and query guidance |

Suggested reading order for new contributors:

1. [architecture.md](./architecture.md)
2. [runbook.md](./runbook.md)
3. [workers.md](./workers.md)
4. [metrics.md](./metrics.md)
5. [debugging.md](./debugging.md)
98 changes: 98 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,98 @@
# Architecture

## Runtime topology

```text
Clients
   |
   v
Fastify API server (:3002 by default, /api/v1/*)
   |-- Mercury + Horizon + Soroban RPC clients
   |-- Blockaid service
   |-- Coinbase onramp token helper
   |-- Redis-backed feature/runtime state (production)
   |
   +-- Prometheus metrics server (:9090, /metrics)
   +-- Price worker (production unless DISABLE_TOKEN_PRICES=true)
   +-- Mercury integrity-check worker (production when USE_MERCURY=true)
```

## Main process responsibilities

The main process in `src/index.ts` is responsible for:

1. Loading `.env`, building config, and applying CLI overrides for `--env` and `--port`.
2. Creating the shared Prometheus registry and default process metrics.
3. Connecting Redis in production mode, including a dedicated Redis time-series client for token prices.
4. Initializing the public API server and a separate metrics server.
5. Spawning worker-thread bundles for token prices and Mercury integrity checks.
6. Handling shutdown signals and clearing metrics on exit.
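
As an illustration of step 1, the following is a minimal sketch of applying `--env` and `--port` overrides on top of a base config. The `RuntimeConfig` shape and the `applyCliOverrides` name are hypothetical and are not taken from `src/index.ts`; the real parsing logic may differ.

```typescript
// Hypothetical config shape; the real config object may carry more fields.
interface RuntimeConfig {
  mode: string;
  port: number;
}

// Apply `--env <mode>` and `--port <number>` overrides to a base config.
// `argv` is the argument list without the node binary and script path.
function applyCliOverrides(base: RuntimeConfig, argv: string[]): RuntimeConfig {
  const result: RuntimeConfig = { mode: base.mode, port: base.port };
  for (let i = 0; i < argv.length - 1; i++) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (flag === undefined || value === undefined) {
      continue;
    }
    if (flag === "--env") {
      result.mode = value;
      i++; // skip the consumed value
    } else if (flag === "--port") {
      const port = Number.parseInt(value, 10);
      // Ignore values that are not valid TCP ports and keep the base port.
      if (Number.isInteger(port) && port > 0 && port < 65536) {
        result.port = port;
      }
      i++; // skip the consumed value
    }
  }
  return result;
}

const exampleConfig = applyCliOverrides(
  { mode: "development", port: 3002 },
  ["--env", "production", "--port", "3002"],
);
console.log(exampleConfig);
```

The function takes `argv` as a parameter rather than reading it from the process, which keeps the override logic easy to test in isolation.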

## Major components

| Component | Responsibility | Notes |
| ----------------- | ---------------------------------------- | ----------------------------------------------------------------------- |
| API server | Serves `/api/v1/*` routes | Fastify with CORS, Helmet, AJV validation, and request-duration metrics |
| Metrics server | Serves Prometheus metrics | Separate Fastify instance on port `9090` |
| `MercuryClient` | Mercury/Horizon/Soroban abstraction | Used by public routes and the integrity-check worker |
| `BlockAidService` | Dapp, transaction, and asset scans | Emits scan-miss metrics on provider failures |
| `PriceClient` | Token price calculation and cache access | Uses Redis time series for historical price data |
| Integrity checker | Verifies Mercury data against Horizon | Can flip the runtime `USE_MERCURY` flag off on failure |
| Price worker | Maintains price cache freshness | Initializes cache, updates prices, and records last-update timestamps |

## Data flow and runtime state

### Request flow

1. Clients call versioned routes under `/api/v1`.
2. Each request is timed with the `http_request_duration_s` histogram.
3. Route handlers delegate to Mercury, Horizon, Soroban RPC, Blockaid, Coinbase, or the price cache.
4. In production, Mercury usage is gated by a Redis-backed runtime flag (`USE_MERCURY`), so the service can fall back without a process restart.
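
The runtime gate in step 4 can be sketched as follows. This is an illustration under stated assumptions, not the backend's actual code: `FlagStore`, `InMemoryFlagStore`, `chooseBackend`, and `fetchWithGate` are hypothetical names, the in-memory store stands in for Redis, and the sketch assumes the flag is stored as the string `"true"` or `"false"` with a missing key treated as disabled.

```typescript
// Minimal read interface; in production this role is played by Redis.
interface FlagStore {
  get(key: string): Promise<string | null>;
}

// In-memory stand-in so the sketch runs without a Redis server.
class InMemoryFlagStore implements FlagStore {
  private readonly values = new Map<string, string>();

  set(key: string, value: string): void {
    this.values.set(key, value);
  }

  async get(key: string): Promise<string | null> {
    const value = this.values.get(key);
    return value === undefined ? null : value;
  }
}

type Backend = "mercury" | "horizon";

// Assumption: only the exact string "true" enables Mercury.
function chooseBackend(rawFlag: string | null): Backend {
  return rawFlag === "true" ? "mercury" : "horizon";
}

// Read the flag on every call so a change takes effect without a restart.
async function fetchWithGate<T>(
  store: FlagStore,
  viaMercury: () => Promise<T>,
  viaHorizon: () => Promise<T>,
): Promise<T> {
  const backend = chooseBackend(await store.get("USE_MERCURY"));
  return backend === "mercury" ? viaMercury() : viaHorizon();
}
```

Because the flag is read per request rather than cached at startup, flipping `USE_MERCURY` in the store changes which backend serves the next request.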

### Redis usage

The backend uses Redis for both runtime state and time-series storage in production:

| Key or prefix | Purpose |
| -------------------------- | --------------------------------------------------------------- |
| `USE_MERCURY` | Runtime feature gate for Mercury-backed responses |
| `price_cache_initialized` | Signals that the price cache has been bootstrapped successfully |
| `price_worker_last_update` | Timestamp of the last successful price refresh |
| `ts:price:*` | Redis time-series entries for token prices |
| `token_counter` | Sorted set used to prioritize frequently requested tokens |

## External dependencies

| Dependency | Used for |
| ----------- | ------------------------------------------------------------------------- |
| Redis | Runtime feature flags, worker coordination, and price time-series storage |
| Mercury | Indexed account and subscription data |
| Horizon | Fallback account data, health checks, and integrity comparisons |
| Soroban RPC | RPC health checks, simulation, and transaction preparation |
| Blockaid | Dapp, transaction, and asset scanning |
| Coinbase | Onramp session token generation |
| Sentry | Integrity-check failure reporting |

## Environment modes

| Mode | Behavior |
| ------------- | -------------------------------------------------------------------------------------------------------------- |
| `development` | API and metrics servers run, but Redis-backed workers do not start |
| `production` | Redis is required, the price worker can start, and the Mercury integrity worker starts when `USE_MERCURY=true` |

Two important nuances:

- `yarn start` uses `ts-node`, which is fine for route work but is not the right way to validate worker behavior.
- Production builds emit separate worker bundles, so worker issues need to be debugged from the built output rather than from `src/index.ts` alone.
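
The mode table and the runtime topology together imply a small decision rule for which workers start. The sketch below restates that rule as a pure function; `WorkerEnv`, `WorkerName`, and `workersToStart` are hypothetical names chosen for illustration, and the real startup code may evaluate these conditions differently.

```typescript
// Hypothetical view of the settings that influence worker startup.
interface WorkerEnv {
  mode: string; // "development" or "production"
  useMercury: boolean; // USE_MERCURY
  disableTokenPrices: boolean; // DISABLE_TOKEN_PRICES
}

type WorkerName = "price-worker" | "integrity-check-worker";

// Development starts no Redis-backed workers. Production starts the price
// worker unless token prices are disabled, and the integrity-check worker
// only when Mercury is enabled.
function workersToStart(env: WorkerEnv): WorkerName[] {
  if (env.mode !== "production") {
    return [];
  }
  const workers: WorkerName[] = [];
  if (!env.disableTokenPrices) {
    workers.push("price-worker");
  }
  if (env.useMercury) {
    workers.push("integrity-check-worker");
  }
  return workers;
}

console.log(
  workersToStart({ mode: "production", useMercury: true, disableTokenPrices: false }),
);
```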

## Build artifacts

`yarn build:prod` produces these Node bundles in `build/`:

| Artifact | Source |
| ----------------------- | ------------------------------ |
| `build/index.js` | Main process |
| `build/worker.js` | Mercury integrity-check worker |
| `build/price-worker.js` | Token price worker |

All three bundles are built with source maps, so running them with `NODE_OPTIONS=--enable-source-maps` produces stack traces that map back to the TypeScript source.
78 changes: 78 additions & 0 deletions docs/debugging.md
@@ -0,0 +1,78 @@
# Debugging

## Quick triage

| Symptom | First checks |
| ------------------------------------ | ------------------------------------------------------------------------------ |
| API appears down | `curl http://localhost:3002/api/v1/ping` |
| Metrics missing | `curl http://localhost:9090/metrics` |
| Price data looks stale | `curl http://localhost:3002/api/v1/price-worker-health` and inspect Redis keys |
| Mercury-backed routes look wrong | Check `USE_MERCURY` in Redis, then review integrity-check logs and Sentry |
| RPC problems reported by clients | `curl "http://localhost:3002/api/v1/rpc-health?network=PUBLIC"` |
| Horizon problems reported by clients | `curl "http://localhost:3002/api/v1/horizon-health?network=PUBLIC"` |

## Logs

The service uses `pino` with `pino-pretty`.

Important logging behavior:

- request metadata is serialized with Pino standard serializers,
- request IP/port, host, user agent, and key account identifiers are redacted,
- `req.url` is also redacted and normalized for account-history and account-balances routes.

That means logs are safe to share more broadly than raw access logs, but they will not preserve full request identity for debugging.

## Production-style local debugging

Use a build when the issue might involve workers, bundle output, or source maps:

```bash
yarn build:prod
NODE_OPTIONS=--enable-source-maps node build/index.js --env production --port 3002
```

This gives you stack traces that map back to TypeScript source and exercises the same worker bundles that production uses.

## Useful Redis checks

```bash
redis-cli GET USE_MERCURY
redis-cli GET price_cache_initialized
redis-cli GET price_worker_last_update
redis-cli KEYS 'ts:price:*'
```

What they tell you:

- `USE_MERCURY=false`: the service has fallen back from Mercury at runtime.
- missing `price_cache_initialized`: the price worker never finished bootstrap.
- stale `price_worker_last_update`: the worker is alive but not refreshing successfully.
- missing `ts:price:*` keys: price cache bootstrap likely failed or token prices are disabled.
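
The interpretation rules above can be expressed as a small checker. This is a sketch under assumptions: the `RedisSnapshot` shape and `diagnoseRedisState` name are hypothetical, `price_worker_last_update` is assumed to hold an epoch-millisecond timestamp, and the staleness threshold is a caller-supplied value rather than anything defined by the backend.

```typescript
// Values as returned by `redis-cli GET ...`; null means the key is missing.
interface RedisSnapshot {
  useMercury: string | null;
  priceCacheInitialized: string | null;
  // Assumption: stored as epoch milliseconds. The real format may differ.
  priceWorkerLastUpdate: string | null;
  // Number of keys matching `ts:price:*`.
  priceSeriesKeyCount: number;
}

// Map a snapshot of the Redis keys to a list of findings.
function diagnoseRedisState(
  snapshot: RedisSnapshot,
  nowMs: number,
  staleAfterMs: number,
): string[] {
  const findings: string[] = [];
  if (snapshot.useMercury === "false") {
    findings.push("mercury-disabled");
  }
  if (snapshot.priceCacheInitialized === null) {
    findings.push("price-cache-not-initialized");
  }
  if (snapshot.priceWorkerLastUpdate !== null) {
    const lastUpdateMs = Number(snapshot.priceWorkerLastUpdate);
    if (Number.isFinite(lastUpdateMs) && nowMs - lastUpdateMs > staleAfterMs) {
      findings.push("price-worker-stale");
    }
  }
  if (snapshot.priceSeriesKeyCount === 0) {
    findings.push("no-price-series");
  }
  return findings;
}
```

An empty result means none of the four conditions matched; it does not by itself prove that the workers are healthy.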

## Known gotchas

1. The repo requires Node `>=25.3.0`. Older Node versions can block `yarn` commands before the app even starts.
2. `development` mode does not start Redis-backed workers, so worker-only issues must be reproduced from a production build.
3. `/api/v1/rpc-health` returns HTTP `200` even when the RPC reports an unhealthy status in the response body.
4. `/api/v1/price-worker-health` is not meaningful when Redis is unavailable or token prices are intentionally disabled.
5. The metrics endpoint lives on port `9090`, not on the main API port.

## When Mercury falls back

The integrity worker disables Mercury by writing `USE_MERCURY=false` to Redis on failures. Treat this as a protective circuit breaker:

1. confirm the failure in logs and Sentry,
2. compare Mercury and Horizon responses for the affected account or operation,
3. fix the upstream issue or wait for the provider to recover,
4. manually restore `USE_MERCURY=true`.

## When the price worker is unhealthy

Work through these checks in order:

1. confirm Redis is reachable,
2. confirm Horizon health at `${FREIGHTER_HORIZON_URL}/health`,
3. inspect `price_cache_initialized`,
4. inspect `price_worker_last_update`,
5. review logs for repeated restart attempts or cache-initialization failures.
71 changes: 71 additions & 0 deletions docs/metrics.md
@@ -0,0 +1,71 @@
# Metrics

## Endpoint

Prometheus metrics are exposed by a dedicated Fastify server on port `9090`:

- URL: `GET /metrics`
- Example: `curl http://localhost:9090/metrics`
- Rate limit: `350` requests per minute

This endpoint is not served from the application port and does not live under `/api/v1`.

## What gets emitted

The backend emits both default process metrics and a small set of application metrics.

| Metric | Type | Meaning |
| ---------------------------------------- | -------------------------- | ------------------------------------------------------------------------------ |
| `process_*`, `nodejs_*` | Default Prometheus metrics | Standard process and Node.js telemetry collected via `collectDefaultMetrics()` |
| `http_request_duration_s` | Histogram | End-to-end request latency for API routes |
| `freighter_backend_mercury_error_count` | Counter | Mercury-side request errors |
| `freighter_backend_rpc_error_count` | Counter | Horizon or Soroban RPC errors |
| `freighter_backend_critical_error_count` | Counter | Errors that need manual operator investigation |
| `freighter_backend_integrity_check_pass` | Counter | Successful Mercury-vs-Horizon integrity checks |
| `freighter_backend_integrity_check_fail` | Counter | Failed Mercury-vs-Horizon integrity checks |
| `freighter_backend_scan_miss_count` | Counter | Blockaid scan failures that fell back to a miss/default response |

## Request latency labels

`http_request_duration_s` is labeled with:

| Label | Source |
| --------- | ----------------------------------------- |
| `method` | HTTP method |
| `route` | Normalized route name |
| `status` | Final response status code |
| `network` | `network` query-string value or `unknown` |

Route labels are normalized on purpose:

- Parameterized routes such as `/account-history/:pubKey` are grouped as `/account-history`.
- Only a whitelist of public routes gets its own label.
- Anything else is labeled as `other` to prevent metric-cardinality blowups.

One consequence of the current implementation is that routes that receive `network` in the request body, rather than in the query string, will show `network="unknown"` in request-duration metrics.
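
A minimal sketch of this labeling scheme is shown below. The whitelist contents and the `routeLabel` and `networkLabel` helpers are hypothetical and chosen for illustration only; the real whitelist and normalization logic live in the backend source and may differ.

```typescript
// Hypothetical whitelist; the real list of labeled routes may differ.
const LABELLED_ROUTES: string[] = ["/account-history", "/account-balances", "/ping"];

const API_PREFIX = "/api/v1";

// Collapse a request URL to a low-cardinality route label.
function routeLabel(url: string): string {
  const path = url.split("?")[0] ?? "";
  const withoutPrefix = path.startsWith(API_PREFIX)
    ? path.slice(API_PREFIX.length)
    : path;
  // Match the route itself or the route followed by path parameters.
  const match = LABELLED_ROUTES.find(
    (route) => withoutPrefix === route || withoutPrefix.startsWith(route + "/"),
  );
  return match ?? "other";
}

// Read `network` from the query string only; body values are never seen here,
// which is why body-based routes end up labeled "unknown".
function networkLabel(url: string): string {
  const query = url.split("?")[1];
  if (query === undefined) {
    return "unknown";
  }
  for (const pair of query.split("&")) {
    const [key, value] = pair.split("=");
    if (key === "network" && value) {
      return value;
    }
  }
  return "unknown";
}

console.log(routeLabel("/api/v1/account-history/GABC?network=PUBLIC"));
console.log(networkLabel("/api/v1/account-history/GABC?network=PUBLIC"));
```

Mapping everything outside the whitelist to a single `other` label bounds the number of time series that the histogram can create, regardless of what paths clients request.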

## Scrape example

```yaml
scrape_configs:
  - job_name: freighter-backend
    static_configs:
      - targets:
          - backend-hostname:9090
```

## Operational notes

- The API server and the metrics server use different rate limits: `100/minute` on the API, `350/minute` on `/metrics`.
Contributor

Hm, I think this means I had bumped the wrong rate limit when I was trying to control for users with lots of accounts hitting 429's previously. Is that correct?

Contributor Author

I don't recall which one you bumped, but this is the API rate limit.

- The two integrity-check counters are driven by worker messages from the Mercury integrity checker.
- `freighter_backend_critical_error_count` is the clearest signal that a background process stopped recovering on its own, especially for the price worker restart loop.
- If `/metrics` is empty or unavailable, debug the metrics server separately from the API server.

## Good alert candidates

These are the most useful signals to wire into alerting:

1. Sustained `5xx` responses or a rising `http_request_duration_s` p95.
2. Any increase in `freighter_backend_integrity_check_fail`.
3. Any increase in `freighter_backend_critical_error_count`.
4. A sharp increase in `freighter_backend_scan_miss_count`.