fix(eth-indexer): make /eth/health O(1) #853
Merged
The previous `GetHealth` ran a `UNION`/`COUNT` across `users` + `associated_wallets` to populate `tracked_wallets`, plus a `COUNT(*)` on `eth_wallet_balances` for `cached_wallets`. On prod that's ~3.15M rows through a seq scan + dedup sort, and it consistently times out (or hangs the handler, since there was no statement timeout). Cheap locally, lethal in prod.

Drop both counts from the response. They were nice-to-have stats, not liveness signals: a health endpoint that takes 30s to tell you the indexer is alive is worse than no endpoint. If you need population stats, query `eth_wallet_balances` directly.

What's left is all O(1):
- `connected`, `rpc_configured`, `last_block_seen`, `last_event_at`: in-memory
- `checkpoint_block`: single-row PK lookup on `eth_indexer_checkpoints`

Also add a 2s context timeout to the handler. Even if a slow query is added later, the request fails fast instead of hanging the ingress.
Summary
`https://api.audius.co/eth/health` was hanging on prod because the handler ran:
```sql
SELECT COUNT(*) FROM (
  SELECT LOWER(wallet) FROM users WHERE wallet IS NOT NULL AND wallet <> ''
  UNION
  SELECT LOWER(wallet) FROM associated_wallets WHERE chain = 'eth' AND is_delete = FALSE
) t
```
On prod that's ~3.15M rows through a seq scan + dedup sort, and it runs long enough that the HTTP request effectively hangs, because there was no statement timeout and no handler timeout. Cheap locally with one seeded user, lethal in prod.
These counts (`tracked_wallets`, `cached_wallets`) were nice-to-have stats, not liveness signals. They don't belong on a health endpoint.

Changes
- `GetHealth`: drop the `COUNT` subqueries and the corresponding fields from the response. What remains is all O(1):
  - `connected`, `rpc_configured`, `last_block_seen`, `last_event_at`: in-memory atomics
  - `checkpoint_block`: single-row PK lookup on `eth_indexer_checkpoints`
- `/eth/health` handler: wrap `GetHealth` in a 2s context timeout. Even if a future query turns slow, the request fails fast instead of hanging the ingress.

After merge
Wait for the auto-upgrader to pick up the new image (every 3 min, see `autoUpgradeSchedule` in `Pulumi.prod-api.yaml`), or roll the deployment manually:
```bash
kubectl -n api rollout restart deployment/eth-indexer
```
Then:
```bash
time curl -s https://api.audius.co/eth/health | jq
# Expect: <1s, JSON returned
```
If you want population stats, query directly:
```bash
kubectl -n api exec -i deploy/bridge -- psql "$writeDbUrl" -c \
"SELECT COUNT(*) FROM eth_wallet_balances;"
```
Test plan
- `go build ./...` clean
- `go vet ./eth/...` clean
- `curl -m 5 https://api.audius.co/eth/health` returns JSON in well under 1s with `errors`, `connected`, `rpc_configured`, `last_block_seen`, `checkpoint_block`, `last_event_at`
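The `curl -m 5` check can also be scripted so it fails loudly. Below is a minimal Go sketch that mirrors `curl -m 5` with an `http.Client` timeout, requires the round trip to finish within 1s, and verifies the body is a JSON object containing each of the six response fields from the test plan. `checkHealth`, `validateHealth`, and `requiredFields` are hypothetical names; the sketch only checks that the keys are present, not their values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// requiredFields are the keys the test plan expects in the /eth/health response.
var requiredFields = []string{
	"errors", "connected", "rpc_configured",
	"last_block_seen", "checkpoint_block", "last_event_at",
}

// validateHealth checks that body is a JSON object containing every required field.
func validateHealth(body []byte) error {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(body, &obj); err != nil {
		return fmt.Errorf("response is not a JSON object: %w", err)
	}
	for _, f := range requiredFields {
		if _, ok := obj[f]; !ok {
			return fmt.Errorf("missing field %q", f)
		}
	}
	return nil
}

// checkHealth fetches url (the client's Timeout caps the whole exchange),
// enforces a latency budget, then validates the body.
func checkHealth(client *http.Client, url string, budget time.Duration) error {
	start := time.Now()
	resp, err := client.Get(url)
	if err != nil {
		return fmt.Errorf("request failed: %w", err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return fmt.Errorf("reading body: %w", err)
	}
	if elapsed := time.Since(start); elapsed > budget {
		return fmt.Errorf("took %v, budget %v", elapsed, budget)
	}
	return validateHealth(body)
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second} // same cap as curl -m 5
	if err := checkHealth(client, "https://api.audius.co/eth/health", time.Second); err != nil {
		fmt.Fprintln(os.Stderr, "FAIL:", err)
		os.Exit(1)
	}
	fmt.Println("OK")
}
```

Run it with `go run .`; it prints `OK` on success, or exits non-zero with the reason.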