feat(eth-indexer): periodic stale-refresh sweep by raymondjacobson · Pull Request #854 · AudiusProject/api

raymondjacobson · 2026-05-23T00:02:13Z

Summary

Adds a background sweep inside the existing eth-indexer pod that re-reads the K oldest rows in eth_wallet_balances by updated_at every N seconds and upserts the fresh values. Complements the live WS subscription, which only learns about wallets that move AUDIO — the sweep handles drift correction, missed events during WS disconnects, and any wallet that's been silent on-chain (e.g. multi-wallet backfill placeholders).

How it works

New goroutine ScheduleStaleRefresh launched from Start alongside runSubscriptionLoop.
Each tick: SELECT wallet FROM eth_wallet_balances ORDER BY updated_at ASC LIMIT batchSize (uses eth_wallet_balances_updated_at_idx from migration 0203), then fan-out totalAudioBalance via the existing 8-worker pool, then one INSERT … ON CONFLICT … DO UPDATE.
Reuses the event path's fan-out by extracting processLogs's tail into refreshAddresses(ctx, addrs, blockByAddr). blockByAddr is the event-block map for live events, nil for stale-refresh.
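A minimal sketch of how one block value per address could be derived for both callers of refreshAddresses. The helper name `blocksFor`, the string address keys, and the `uint64` block type are assumptions for illustration, not the PR's actual signatures. It relies on the fact that indexing a nil map in Go returns the zero value, so the stale-refresh path naturally yields 0 for every address.

```go
package main

import "fmt"

// blocksFor returns one block number per address, in the same order as addrs.
// blockByAddr is the event-block map on the live path and nil on the
// stale-refresh path. Reading from a nil map is safe in Go and yields 0,
// which the upsert can then turn into NULL.
func blocksFor(addrs []string, blockByAddr map[string]uint64) []int64 {
	blocks := make([]int64, len(addrs))
	for i, a := range addrs {
		blocks[i] = int64(blockByAddr[a])
	}
	return blocks
}

func main() {
	addrs := []string{"0xaaa", "0xbbb"}
	fmt.Println(blocksFor(addrs, map[string]uint64{"0xaaa": 12345})) // event path
	fmt.Println(blocksFor(addrs, nil))                               // stale-refresh path
}
```

An address missing from the event map also gets 0, which the SQL treats the same way as "no block known".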

blocknumber semantics fix

The original ON CONFLICT … GREATEST(eth_wallet_balances.blocknumber, EXCLUDED.blocknumber) would write 0 over an existing block when the stale-refresh path didn't have one. Fixed:

NULLIF(unnest(@blocks::bigint[]), 0)   -- pass NULL when stale-refresh
…
ON CONFLICT … DO UPDATE SET
  blocknumber = CASE
    WHEN EXCLUDED.blocknumber IS NULL THEN eth_wallet_balances.blocknumber
    ELSE GREATEST(COALESCE(eth_wallet_balances.blocknumber, 0), EXCLUDED.blocknumber)
  END,

Smoke-checked against the local DB: real block (12345) preserved across both a NULL-block stale refresh and a lower-block event upsert.
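The CASE / NULLIF / GREATEST rule above can be modeled in plain Go to make the intended outcomes easy to check. This is an illustrative model only: `mergeBlock` is a hypothetical name, a nil pointer stands in for SQL NULL, and an incoming value of 0 stands in for "no block" as produced by NULLIF.

```go
package main

import "fmt"

// mergeBlock models the blocknumber column after an upsert.
// existing is nil when the stored value is NULL (or the row is new);
// incoming == 0 means the caller had no block (stale refresh).
func mergeBlock(existing *int64, incoming int64) *int64 {
	if incoming == 0 {
		return existing // EXCLUDED.blocknumber IS NULL: keep what is stored
	}
	cur := int64(0) // COALESCE(existing, 0)
	if existing != nil {
		cur = *existing
	}
	if incoming > cur { // GREATEST(cur, incoming)
		cur = incoming
	}
	return &cur
}

func main() {
	var stored *int64
	// event at 12345, stale refresh, lower-block event, higher-block event
	for _, in := range []int64{12345, 0, 100, 20000} {
		stored = mergeBlock(stored, in)
		fmt.Println("incoming", in, "stored", *stored)
	}
}
```

In this model the stored block stays at 12345 through the stale refresh and the lower-block event, and only advances on the higher-block event, which mirrors the smoke check described above.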

Config

env var	default	effect
`ethStaleRefreshIntervalSecs`	30	Tick interval
`ethStaleRefreshBatchSize`	50	Wallets per tick
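
One way the two settings could be read with their defaults is sketched below. The variable names and defaults come from the table; the helper `intOr` and the choice to fall back on empty, malformed, or non-positive values are assumptions, since the PR does not show how the indexer actually loads its config.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// intOr parses raw as a positive integer and falls back to def when raw is
// empty, malformed, or not positive.
func intOr(raw string, def int) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return def
	}
	return n
}

func main() {
	interval := intOr(os.Getenv("ethStaleRefreshIntervalSecs"), 30)
	batch := intOr(os.Getenv("ethStaleRefreshBatchSize"), 50)
	fmt.Printf("interval=%ds batch=%d\n", interval, batch)
}
```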

At defaults: ~1.7 wallets/sec, ~5 RPC calls/sec (each wallet = balanceOf + totalStakedFor + getTotalDelegatorStake in parallel). For ~3.15M tracked wallets, a full sweep takes ~22 days. Tune via env vars if you want faster freshness.
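
The figures above follow from simple arithmetic, which can be reproduced for other settings with a small calculator. The function and type names here are hypothetical; the inputs (30 s interval, batch of 50, 3 calls per wallet, 3.15M wallets) are the ones quoted in this section.

```go
package main

import "fmt"

// sweepRate summarizes the sustained load and coverage of the sweep.
type sweepRate struct {
	WalletsPerSec float64
	RPCPerSec     float64
	FullSweepDays float64
}

// estimate derives the rates from the tick interval, batch size, RPC calls
// per wallet, and the number of tracked wallets.
func estimate(intervalSecs, batchSize, callsPerWallet, totalWallets int) sweepRate {
	w := float64(batchSize) / float64(intervalSecs)
	return sweepRate{
		WalletsPerSec: w,
		RPCPerSec:     w * float64(callsPerWallet),
		FullSweepDays: float64(totalWallets) / w / 86400,
	}
}

func main() {
	r := estimate(30, 50, 3, 3_150_000)
	fmt.Printf("%.2f wallets/s, %.1f RPC/s, %.1f days\n",
		r.WalletsPerSec, r.RPCPerSec, r.FullSweepDays)
}
```

Halving the interval or doubling the batch size halves the full-sweep time and doubles the RPC rate.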

The ticker drops a tick if the previous run is still in flight, so a slow upstream can't pile up work. A deferred recover() guards against an unexpected panic taking down the WS subscription with it.
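
A minimal sketch of this pattern, not the PR's actual ScheduleStaleRefresh: the names `scheduleSweep` and `safeTick` are hypothetical. It assumes the sweep runs synchronously inside the ticker loop, in which case Go's `time.Ticker` drops ticks for a slow receiver instead of queuing them, and it wraps each run in a deferred `recover()`.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// safeTick runs one sweep and converts a panic into an error so the caller's
// goroutine (and anything sharing the process) keeps running.
func safeTick(tick func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("stale refresh: recovered panic: %v", r)
		}
	}()
	return tick()
}

// scheduleSweep calls tick every interval until ctx is cancelled. Because
// tick runs inside the loop, ticks that fire while it is still in flight are
// dropped by the ticker rather than piling up.
func scheduleSweep(ctx context.Context, interval time.Duration, tick func() error) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			if err := safeTick(tick); err != nil {
				fmt.Println(err)
			}
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 120*time.Millisecond)
	defer cancel()
	n := 0
	scheduleSweep(ctx, 20*time.Millisecond, func() error {
		n++
		if n == 2 {
			panic("boom") // simulated bug; the loop keeps ticking afterwards
		}
		return nil
	})
	fmt.Println("ticks run:", n)
}
```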

Smoke test (local)

Brought up local Postgres, applied migration 0203
Seeded two rows in eth_wallet_balances with updated_at='1970-01-01' to force them to the top of the sweep
Ran ethStaleRefreshIntervalSecs=5 ethStaleRefreshBatchSize=10 go run main.go eth-indexer against the live Alchemy URL
First tick log: stale refresh: tick complete requested:2 updated:2
Both rows' updated_at advanced; balance values matched independent eth_call checks (one was on-chain 0, one had moved AUDIO out since an earlier test — indexer correctly reflected both)

Test plan

go build ./... clean
go vet ./eth/... clean
Local end-to-end: stale rows refreshed on first tick, no errors across multiple ticks
SQL block-preserve verified (real block survives NULL stale-refresh upsert)
After deploy: kubectl -n api logs deploy/eth-indexer | grep "stale refresh: tick complete" shows ticks at the configured interval
After ~24h: SELECT MIN(updated_at) FROM eth_wallet_balances is no more than a few days old (proves the sweep is making progress)

Coordinating with the one-shot backfill SQL

Before this lands you'll want to run the backfill SQL from my prior message so eth_wallet_balances has rows. Otherwise the sweep just runs on an empty table and finds nothing to refresh until the live WS path discovers wallets organically. Specifically the third query (multi-wallet placeholders with updated_at='1970-01-01') is what the sweep will chew through first.

🤖 Generated with Claude Code

The live WS subscription only learns about wallets that move AUDIO. To recover from drift, missed events during disconnects, and multi-wallet backfill placeholders (where user_balances.associated_wallets_balance couldn't be decomposed per-wallet), add a background sweep that re-reads the K oldest rows in eth_wallet_balances by updated_at every N seconds, calls totalAudioBalance, and upserts.

Reuses the existing fan-out: extracts processLogs's "fan-out totalAudioBalance + upsert" tail into refreshAddresses(ctx, addrs, blockByAddr) and calls it from both the event path and the new sweep.

blocknumber handling:
- Event path (block > 0): GREATEST(existing, new). Already worked.
- Stale-refresh path (no block): preserve existing. Pass NULL via NULLIF(0) so we don't write 0 over a real block.

Smoke-tested: initial insert with block 12345, then NULL update keeps 12345, then a lower block (100) also keeps 12345.

Config (defaults give a ~22-day full sweep over 3.15M wallets):
  ethStaleRefreshIntervalSecs  default 30
  ethStaleRefreshBatchSize     default 50

Sustained at defaults: ~1.7 wallets/sec, ~5 RPC/sec total (each wallet runs balanceOf + totalStakedFor + getTotalDelegatorStake in parallel via the existing errgroup path). Well under any Alchemy tier ceiling.

Bounded by design: the ticker drops a tick if the previous run is still in flight, so a slow upstream can't pile up work. Panic-safe via deferred recover so an unexpected error in the sweep won't crash the pod and take the WS subscription down with it.

Smoke-tested locally: pre-seeded two rows with updated_at='1970-01-01', ran with ethStaleRefreshIntervalSecs=5 against the live Alchemy endpoint. Both rows refreshed on the first tick (balance read and upserted, updated_at advanced), and the sweep continued ticking every 5s without errors.

Bundle balanceOf + totalStakedFor + getTotalDelegatorStake for every holder in a refresh batch into a single Multicall3 `aggregate3` call instead of issuing them as separate `eth_call`s.

Before, with the default stale-refresh tick (50 holders × 3 selectors):
  150 `eth_call` round-trips per tick → ~3,900 Alchemy CUs per tick
After:
  1 `eth_call` per tick → ~26 Alchemy CUs per tick

A ~150× reduction in CUs and round-trips at the default config, and removes any cost concern with running the sweep at a tighter cadence.

Multicall3 is deployed at the same address on every EVM chain (`0xcA11bde05977b3631167028862bE2a173976CA11`); held as a package constant since it's universal and we'd never need to change it.

Implementation:
- New file eth/indexer/multicall.go with the Multicall3 ABI encoding/decoding via go-ethereum's accounts/abi package, plus a totalAudioBalances(holders) entry point. Chunked at 200 holders per outer eth_call (= 600 sub-calls) to keep individual requests modest.
- refreshAddresses simplified to one Multicall3 round-trip plus the same upsert it always did; drops the per-holder errgroup, worker pool, jobs/results channels, and the balanceFetchWorkers constant.
- Same conservative posture on partial failures: holders whose three sub-calls didn't all succeed are skipped (omitted from the result map), so we never persist a partial sum. AllowFailure: true on each Call3 so one bad sub-call doesn't fail the whole multicall.

Smoke-tested locally: ran with ethStaleRefreshBatchSize=10 against the live Alchemy endpoint, three pre-seeded rows (rayjacobson primary, 0xb46a… DEX router, Audius staking contract self-balance). All three refreshed in one tick via one multicall:

  stale refresh: tick complete requested:3 updated:3

Cross-checked the staking contract row: 247,024,527,620,589,302,425,363,078 wei matched an independent eth_call against the AUDIO contract's balanceOf for that address. Pipeline correct end-to-end.
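
The chunking and the skip-on-partial-failure rule described in this commit can be sketched without any RPC or ABI code. Everything named here is hypothetical: `subResult` stands in for a decoded aggregate3 result, and the layout of three consecutive results per holder is an assumption about ordering, not something the commit message states.

```go
package main

import (
	"fmt"
	"math/big"
)

const maxHoldersPerCall = 200 // from the commit message: 200 holders = 600 sub-calls

// chunk splits holders into slices of at most size elements.
func chunk(holders []string, size int) [][]string {
	var out [][]string
	for size > 0 && len(holders) > 0 {
		n := size
		if len(holders) < n {
			n = len(holders)
		}
		out = append(out, holders[:n])
		holders = holders[n:]
	}
	return out
}

// subResult is a stand-in for one decoded sub-call result.
type subResult struct {
	Success bool
	Value   *big.Int
}

func okResult(v int64) subResult { return subResult{Success: true, Value: big.NewInt(v)} }
func failedResult() subResult    { return subResult{} }

// sumComplete totals the three sub-call values per holder and omits any
// holder with a failed or missing sub-call, so no partial sum is returned.
func sumComplete(holders []string, results []subResult) map[string]*big.Int {
	out := make(map[string]*big.Int)
	if len(results) != 3*len(holders) {
		return out // malformed response: report nothing rather than guess
	}
	for i, h := range holders {
		total := new(big.Int)
		complete := true
		for _, r := range results[3*i : 3*i+3] {
			if !r.Success || r.Value == nil {
				complete = false
				break
			}
			total.Add(total, r.Value)
		}
		if complete {
			out[h] = total
		}
	}
	return out
}

func main() {
	fmt.Println(len(chunk(make([]string, 450), maxHoldersPerCall))) // chunks of 200, 200, 50
	res := []subResult{okResult(1), okResult(2), okResult(3), okResult(1), failedResult(), okResult(1)}
	fmt.Println(sumComplete([]string{"a", "b"}, res)) // only "a" is present
}
```

Using `*big.Int` matters here because wei balances such as the staking contract's exceed the range of a 64-bit integer.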

Follow-up to the review on this PR: fills the "no test coverage" gap on the trickiest bits of the new code.

multicall_test.go (pure, no infra):
- TestDecodeUint: pins the four sub-call decode states (32-byte uint256, zero, failure, empty data).
- TestMulticallEncodingRoundtrip: packs Call3[], unpacks it back, then does the same for Result3[] including the abi.ConvertType coercion into our named structs. Catches drift between our `call3` / `result3` field names and the ABI tuple component names, which is exactly where the live multicall would silently break.
- TestAggregate3Selector: pins the 0x82ad56cb selector. If the keccak helper or signature string ever drift, this fails loudly instead of sending unroutable calls to Multicall3.

eth_indexer_test.go (needs the docker-compose db on :21300):
- TestUpsertBalanceUpdates_BlockSemantics walks the four orderings that the CASE/NULLIF/GREATEST clause has to get right:
  1. event with block N → stored
  2. stale-refresh with block 0 → balance updates, block preserved
  3. event with lower block → block does NOT regress
  4. event with higher block → block advances
- TestUpsertBalanceUpdates_InsertWithNullBlock: cold-start case where a wallet is first observed via the stale-refresh path (e.g. multi-wallet backfill placeholders): block=0 must insert as NULL, not 0.

migration0203SQL is inlined into the test file because sql/01_schema.sql hasn't been regenerated to include eth_wallet_balances yet (the test-schema regen path was the broken pg_migrate.sh chain). Keeps the test self-contained against the default test_jobs template.

All 5 tests pass locally; no production code changed in this commit.
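
For orientation, the four decode states named for TestDecodeUint could correspond to a helper shaped roughly like the one below. This is a guess at the behavior, not the code in multicall.go: in particular, treating any return data that is not exactly 32 bytes as "not decodable" is an assumption.

```go
package main

import (
	"fmt"
	"math/big"
)

// decodeUint interprets one sub-call result as a uint256. It reports false
// when the sub-call failed or the return data is not a single 32-byte word
// (for example, empty data from a call that returned nothing).
func decodeUint(success bool, returnData []byte) (*big.Int, bool) {
	if !success || len(returnData) != 32 {
		return nil, false
	}
	return new(big.Int).SetBytes(returnData), true // big-endian, as in the ABI
}

func main() {
	word := make([]byte, 32)
	word[31] = 0x2a
	cases := []struct {
		name    string
		success bool
		data    []byte
	}{
		{"uint256", true, word},
		{"zero", true, make([]byte, 32)},
		{"failure", false, word},
		{"empty", true, nil},
	}
	for _, c := range cases {
		v, ok := decodeUint(c.success, c.data)
		fmt.Println(c.name, v, ok)
	}
}
```

Distinguishing a genuine zero balance from a failed or empty response is what lets the caller skip a holder instead of persisting a misleading 0.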

raymondjacobson added 3 commits May 22, 2026 17:01

raymondjacobson merged commit 38a0da4 into main May 23, 2026
5 checks passed

raymondjacobson deleted the api/eth-stale-refresh branch May 23, 2026 00:26
