feat(eth-indexer): periodic stale-refresh sweep #854

Merged
raymondjacobson merged 3 commits into main from api/eth-stale-refresh
May 23, 2026

Conversation

@raymondjacobson
Member

Summary

Adds a background sweep inside the existing eth-indexer pod that re-reads the K oldest rows in eth_wallet_balances by updated_at every N seconds and upserts the fresh values. Complements the live WS subscription, which only learns about wallets that move AUDIO — the sweep handles drift correction, missed events during WS disconnects, and any wallet that's been silent on-chain (e.g. multi-wallet backfill placeholders).

How it works

  • New goroutine ScheduleStaleRefresh launched from Start alongside runSubscriptionLoop.
  • Each tick: SELECT wallet FROM eth_wallet_balances ORDER BY updated_at ASC LIMIT batchSize (uses eth_wallet_balances_updated_at_idx from migration 0203), then fan-out totalAudioBalance via the existing 8-worker pool, then one INSERT … ON CONFLICT … DO UPDATE.
  • Reuses the event path's fan-out by extracting processLogs's tail into refreshAddresses(ctx, addrs, blockByAddr). blockByAddr is the event-block map for live events, nil for stale-refresh.
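
The shared tail described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `balanceUpdate`, `buildUpdates`, and the `uint64` balance are hypothetical names and simplifications, and skipping wallets whose balance could not be fetched is an assumption. The point it shows is that a nil `blockByAddr` needs no special casing, because reading a nil map in Go yields the zero value.

```go
package main

import "fmt"

// balanceUpdate is a hypothetical row shape for the upsert.
// Block == 0 means "no event block known" (the stale-refresh case).
type balanceUpdate struct {
	Wallet  string
	Balance uint64 // simplification: real balances need a big integer
	Block   uint64
}

// buildUpdates pairs each fetched balance with its event block.
// blockByAddr is the event-block map on the live path and nil on the
// stale-refresh path; indexing a nil map is safe and returns 0.
func buildUpdates(addrs []string, balances map[string]uint64, blockByAddr map[string]uint64) []balanceUpdate {
	updates := make([]balanceUpdate, 0, len(addrs))
	for _, addr := range addrs {
		bal, ok := balances[addr]
		if !ok {
			continue // assumption: a failed fetch leaves the row untouched
		}
		updates = append(updates, balanceUpdate{Wallet: addr, Balance: bal, Block: blockByAddr[addr]})
	}
	return updates
}

func main() {
	balances := map[string]uint64{"0xaaa": 7, "0xbbb": 9}
	// Stale-refresh path: no block map.
	fmt.Println(buildUpdates([]string{"0xaaa"}, balances, nil))
	// Event path: block comes from the log that triggered the refresh.
	fmt.Println(buildUpdates([]string{"0xbbb"}, balances, map[string]uint64{"0xbbb": 12345}))
}
```

A zero `Block` produced this way is what the upsert later has to treat as "no block" rather than as block 0.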

blocknumber semantics fix

The original ON CONFLICT … GREATEST(eth_wallet_balances.blocknumber, EXCLUDED.blocknumber) would write 0 over an existing block when the stale-refresh path didn't have one. Fixed:

NULLIF(unnest(@blocks::bigint[]), 0)   -- pass NULL when stale-refresh

ON CONFLICT … DO UPDATE SET
  blocknumber = CASE
    WHEN EXCLUDED.blocknumber IS NULL THEN eth_wallet_balances.blocknumber
    ELSE GREATEST(COALESCE(eth_wallet_balances.blocknumber, 0), EXCLUDED.blocknumber)
  END,

Smoke-checked against the local DB: real block (12345) preserved across both a NULL-block stale refresh and a lower-block event upsert.
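
To make the intended semantics easy to check without a database, the CASE expression can be mirrored as a pure Go function. This is only a model of the SQL above, with `nil` standing in for SQL NULL; `nullIfZero` and `mergeBlock` are hypothetical names and do not appear in the PR.

```go
package main

import "fmt"

// nullIfZero mirrors NULLIF(x, 0): a zero block becomes nil (SQL NULL).
func nullIfZero(block int64) *int64 {
	if block == 0 {
		return nil
	}
	return &block
}

// mergeBlock mirrors the CASE in the DO UPDATE branch:
//   incoming IS NULL -> keep the existing block
//   otherwise        -> GREATEST(COALESCE(existing, 0), incoming)
func mergeBlock(existing, incoming *int64) *int64 {
	if incoming == nil {
		return existing
	}
	merged := int64(0) // COALESCE(existing, 0)
	if existing != nil {
		merged = *existing
	}
	if *incoming > merged {
		merged = *incoming
	}
	return &merged
}

func main() {
	stored := nullIfZero(12345)                  // initial event insert
	stored = mergeBlock(stored, nullIfZero(0))   // stale refresh: block kept
	stored = mergeBlock(stored, nullIfZero(100)) // lower event block: no regression
	fmt.Println(*stored)
}
```

Walking the same sequence as the smoke check (insert at 12345, NULL update, lower-block update) leaves the stored block unchanged in this model.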

Config

env var                       default   effect
ethStaleRefreshIntervalSecs   30        Tick interval
ethStaleRefreshBatchSize      50        Wallets per tick

At defaults: ~1.7 wallets/sec, ~5 RPC calls/sec (each wallet = balanceOf + totalStakedFor + getTotalDelegatorStake in parallel). For ~3.15M tracked wallets, a full sweep takes ~22 days. Tune via env vars if you want faster freshness.
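
The figures above follow directly from the two config values. A small sketch of the arithmetic, useful when picking other settings; `sweepStats` is a hypothetical helper, and the three calls per wallet and the 3.15M wallet count are taken from the paragraph above rather than read from the indexer.

```go
package main

import "fmt"

// sweepStats derives the steady-state rates from the config values.
// callsPerWallet and trackedWallets are caller-supplied inputs.
func sweepStats(intervalSecs, batchSize, callsPerWallet int, trackedWallets float64) (walletsPerSec, rpcPerSec, fullSweepDays float64) {
	walletsPerSec = float64(batchSize) / float64(intervalSecs)
	rpcPerSec = walletsPerSec * float64(callsPerWallet)
	fullSweepDays = trackedWallets / walletsPerSec / 86400 // 86400 seconds per day
	return
}

func main() {
	w, r, d := sweepStats(30, 50, 3, 3.15e6)
	fmt.Printf("%.2f wallets/sec, %.1f RPC calls/sec, %.1f days per full sweep\n", w, r, d)
}
```

At the defaults this gives 50/30 ≈ 1.67 wallets/sec and 3.15M / 1.67 ≈ 1.89M seconds, a little under 22 days.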

The ticker drops a tick if the previous run is still in flight, so a slow upstream can't pile up work. A deferred recover() guards against an unexpected panic taking down the WS subscription with it.
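
One way to get both properties (drop a tick while a run is in flight, and contain a panic) is sketched below. This is an assumption about the shape of the loop, not the PR's actual ScheduleStaleRefresh: the `sweeper` type and its `tryRun` and `schedule` methods are hypothetical, and real code would log the recovered value instead of discarding it.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// sweeper is a hypothetical wrapper around the sweep function.
type sweeper struct {
	running atomic.Bool
}

// tryRun runs fn unless a previous run is still in flight. It reports
// whether fn was started and turns a panic inside fn into a normal return.
func (s *sweeper) tryRun(fn func()) (started bool) {
	if !s.running.CompareAndSwap(false, true) {
		return false // previous run still in flight: drop this tick
	}
	defer s.running.Store(false)
	defer func() {
		_ = recover() // real code would log the recovered value
	}()
	started = true
	fn()
	return
}

// schedule fires tryRun on every tick until ctx is cancelled.
func (s *sweeper) schedule(ctx context.Context, interval time.Duration, fn func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			go s.tryRun(fn)
		}
	}
}

func main() {
	var runs atomic.Int64
	ctx, cancel := context.WithTimeout(context.Background(), 120*time.Millisecond)
	defer cancel()
	s := &sweeper{}
	// Each run takes longer than the interval, so some ticks are dropped.
	s.schedule(ctx, 20*time.Millisecond, func() {
		runs.Add(1)
		time.Sleep(50 * time.Millisecond)
	})
	fmt.Println("runs started:", runs.Load())
}
```

Because the in-flight flag is cleared in a deferred call, a panicking run does not leave the sweeper permanently stuck in the "running" state.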

Smoke test (local)

  1. Brought up local Postgres, applied migration 0203
  2. Seeded two rows in eth_wallet_balances with updated_at='1970-01-01' to force them to the top of the sweep
  3. Ran ethStaleRefreshIntervalSecs=5 ethStaleRefreshBatchSize=10 go run main.go eth-indexer against the live Alchemy URL
  4. First tick log: stale refresh: tick complete requested:2 updated:2
  5. Both rows' updated_at advanced; balance values matched independent eth_call checks (one was on-chain 0, one had moved AUDIO out since an earlier test — indexer correctly reflected both)

Test plan

  • go build ./... clean
  • go vet ./eth/... clean
  • Local end-to-end: stale rows refreshed on first tick, no errors across multiple ticks
  • SQL block-preserve verified (real block survives NULL stale-refresh upsert)
  • After deploy: kubectl -n api logs deploy/eth-indexer | grep "stale refresh: tick complete" shows ticks at the configured interval
  • After ~24h: SELECT MIN(updated_at) FROM eth_wallet_balances is no more than a few days old (proves the sweep is making progress)

Coordinating with the one-shot backfill SQL

Before this lands you'll want to run the backfill SQL from my prior message so eth_wallet_balances has rows. Otherwise the sweep just runs on an empty table and finds nothing to refresh until the live WS path discovers wallets organically. Specifically the third query (multi-wallet placeholders with updated_at='1970-01-01') is what the sweep will chew through first.

🤖 Generated with Claude Code

The live WS subscription only learns about wallets that move AUDIO. To
recover from drift, missed events during disconnects, and multi-wallet
backfill placeholders (where user_balances.associated_wallets_balance
couldn't be decomposed per-wallet), add a background sweep that
re-reads the K oldest rows in eth_wallet_balances by updated_at every
N seconds, calls totalAudioBalance, and upserts.

Reuses the existing fan-out: extracts processLogs's "fan-out
totalAudioBalance + upsert" tail into refreshAddresses(ctx, addrs,
blockByAddr) and calls it from both the event path and the new sweep.

blocknumber handling:
- Event path (block > 0): GREATEST(existing, new). Already worked.
- Stale-refresh path (no block): preserve existing. Pass NULL via
  NULLIF(…, 0) so we don't write 0 over a real block. Smoke-tested:
  initial insert with block 12345, then NULL update keeps 12345,
  then a lower block (100) also keeps 12345.

Config (defaults give a ~22-day full sweep over 3.15M wallets):
  ethStaleRefreshIntervalSecs  default 30
  ethStaleRefreshBatchSize     default 50

Sustained at defaults: ~1.7 wallets/sec, ~5 RPC/sec total (each wallet
runs balanceOf + totalStakedFor + getTotalDelegatorStake in parallel
via the existing errgroup path). Well under any Alchemy tier ceiling.

Bounded by design — the ticker drops a tick if the previous run is
still in flight, so a slow upstream can't pile up work. Panic-safe via
deferred recover so an unexpected error in the sweep won't crash the
pod and take the WS subscription down with it.

Smoke-tested locally: pre-seeded two rows with updated_at='1970-01-01',
ran with ethStaleRefreshIntervalSecs=5 against the live Alchemy
endpoint. Both rows refreshed on the first tick (balance read and
upserted, updated_at advanced), and the sweep continued ticking every
5s without errors.

Bundle balanceOf + totalStakedFor + getTotalDelegatorStake for every
holder in a refresh batch into a single Multicall3 `aggregate3` call
instead of issuing them as separate `eth_call`s.

Before, with the default stale-refresh tick (50 holders × 3 selectors):
  150 `eth_call` round-trips per tick  →  ~3,900 Alchemy CUs per tick

After:
  1 `eth_call` per tick  →  ~26 Alchemy CUs per tick

A ~150× reduction in CUs and round-trips at the default config, and
removes any cost concern with running the sweep at a tighter cadence.
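
The before/after numbers can be reproduced with a few lines of arithmetic. In this sketch the 26 CUs per `eth_call` is the rate implied by the figures above (3,900 CUs / 150 calls), and it assumes a multicall `eth_call` is billed like any other; the function names are hypothetical, and the 200-holder chunk size is the one named in the implementation notes of this commit.

```go
package main

import "fmt"

// cuPerEthCall is implied by the figures above: 3,900 CUs / 150 calls.
const cuPerEthCall = 26

// callsBefore: one eth_call per holder per selector.
func callsBefore(holders, selectors int) int {
	return holders * selectors
}

// callsAfter: one eth_call per chunk of holders (ceiling division).
func callsAfter(holders, chunkSize int) int {
	return (holders + chunkSize - 1) / chunkSize
}

// cuPerTick converts a number of eth_calls into compute units.
func cuPerTick(ethCalls int) int {
	return ethCalls * cuPerEthCall
}

func main() {
	before := cuPerTick(callsBefore(50, 3))
	after := cuPerTick(callsAfter(50, 200))
	fmt.Printf("before: %d CUs, after: %d CUs, ratio: %dx\n", before, after, before/after)
}
```

The ratio depends on batch size: it equals holders × 3 for any batch that fits in a single chunk, so it grows with larger batches up to the chunk limit.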

Multicall3 is deployed at the same address on every EVM chain
(`0xcA11bde05977b3631167028862bE2a173976CA11`); held as a package
constant since it's universal and we'd never need to change it.

Implementation:
- New file eth/indexer/multicall.go with the Multicall3 ABI
  encoding/decoding via go-ethereum's accounts/abi package, plus a
  totalAudioBalances(holders) entry point. Chunked at 200 holders per
  outer eth_call (= 600 sub-calls) to keep individual requests modest.
- refreshAddresses simplified to one Multicall3 round-trip plus the
  same upsert it always did — drops the per-holder errgroup, worker
  pool, jobs/results channels, and the balanceFetchWorkers constant.
- Same conservative posture on partial failures: holders whose three
  sub-calls didn't all succeed are skipped (omitted from the result
  map), so we never persist a partial sum. AllowFailure: true on each
  Call3 so one bad sub-call doesn't fail the whole multicall.
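
The skip-on-partial-failure rule and the chunking can be sketched with the standard library alone. Everything here is illustrative: `subResult`, `sumBalances`, `chunk`, and `word` are hypothetical names, the holder-major ordering of results (three consecutive entries per holder) is an assumption, and treating any return data that is not exactly 32 bytes as a failure is also an assumption about how the real decoder behaves.

```go
package main

import (
	"fmt"
	"math/big"
)

// subResult stands in for one decoded Multicall3 result entry.
type subResult struct {
	Success    bool
	ReturnData []byte
}

const callsPerHolder = 3 // balanceOf + totalStakedFor + getTotalDelegatorStake

// sumBalances adds up each holder's three sub-call results. A holder with
// any failed or malformed sub-call is omitted, so no partial sum is returned.
func sumBalances(holders []string, results []subResult) map[string]*big.Int {
	out := make(map[string]*big.Int, len(holders))
	if len(results) != len(holders)*callsPerHolder {
		return out // shape mismatch: trust nothing
	}
	for i, h := range holders {
		total := new(big.Int)
		ok := true
		for _, r := range results[i*callsPerHolder : (i+1)*callsPerHolder] {
			if !r.Success || len(r.ReturnData) != 32 {
				ok = false
				break
			}
			total.Add(total, new(big.Int).SetBytes(r.ReturnData))
		}
		if ok {
			out[h] = total
		}
	}
	return out
}

// chunk splits holders into slices of at most size entries (size must be > 0).
func chunk(holders []string, size int) [][]string {
	var out [][]string
	for start := 0; start < len(holders); start += size {
		end := start + size
		if end > len(holders) {
			end = len(holders)
		}
		out = append(out, holders[start:end])
	}
	return out
}

// word encodes n as a 32-byte big-endian uint256, as an ABI return value would be.
func word(n uint64) []byte {
	return new(big.Int).SetUint64(n).FillBytes(make([]byte, 32))
}

func main() {
	holders := []string{"0xaaa", "0xbbb"}
	results := []subResult{
		{true, word(1)}, {true, word(2)}, {true, word(3)}, // 0xaaa: all succeed
		{true, word(4)}, {false, nil}, {true, word(6)}, // 0xbbb: one failure
	}
	for h, total := range sumBalances(holders, results) {
		fmt.Println(h, total)
	}
}
```

Omitting a holder from the map, rather than storing a zero, lets the caller distinguish "balance is 0" from "could not be read this tick".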

Smoke-tested locally: ran with ethStaleRefreshBatchSize=10 against the
live Alchemy endpoint, three pre-seeded rows (rayjacobson primary,
0xb46a… DEX router, Audius staking contract self-balance). All three
refreshed in one tick via one multicall:

  stale refresh: tick complete  requested:3  updated:3

Cross-checked the staking contract row: its balance of
247,024,527,620,589,302,425,363,078 wei matched an independent eth_call against the AUDIO
contract's balanceOf for that address. Pipeline correct end-to-end.

Follow-up to the review on this PR: fills the "no test coverage" gap on
the trickiest bits of the new code.

multicall_test.go (pure, no infra):
  - TestDecodeUint — pins the four sub-call decode states (32-byte
    uint256, zero, failure, empty data).
  - TestMulticallEncodingRoundtrip — packs Call3[], unpacks it back,
    then does the same for Result3[] including the abi.ConvertType
    coercion into our named structs. Catches drift between our
    `call3` / `result3` field names and the ABI tuple component names,
    which is exactly where the live multicall would silently break.
  - TestAggregate3Selector — pins the 0x82ad56cb selector. If the keccak
    helper or signature string ever drift, this fails loudly instead of
    sending unroutable calls to Multicall3.

eth_indexer_test.go (needs the docker-compose db on :21300):
  - TestUpsertBalanceUpdates_BlockSemantics walks the four orderings
    that the CASE/NULLIF/GREATEST clause has to get right:
      1. event with block N         → stored
      2. stale-refresh with block 0 → balance updates, block preserved
      3. event with lower block     → block does NOT regress
      4. event with higher block    → block advances
  - TestUpsertBalanceUpdates_InsertWithNullBlock — cold-start case
    where a wallet is first observed via the stale-refresh path
    (e.g. multi-wallet backfill placeholders): block=0 must insert as
    NULL, not 0.

migration0203SQL is inlined into the test file because sql/01_schema.sql
hasn't been regenerated to include eth_wallet_balances yet (the
test-schema regen path was the broken pg_migrate.sh chain). Keeps the
test self-contained against the default test_jobs template.

All 5 tests pass locally; no production code changed in this commit.
@raymondjacobson raymondjacobson merged commit 38a0da4 into main May 23, 2026
5 checks passed
@raymondjacobson raymondjacobson deleted the api/eth-stale-refresh branch May 23, 2026 00:26