Archive node: unbounded anon memory growth (~3-4 GB/h) under sustained RPC load → hard cgroup throttle / hang #2724

@Tamis

Summary

A --state-pruning archive node serving sustained JSON-RPC traffic exhibits unbounded growth of anonymous (non-reclaimable) process memory at a steady ~3–4 GB/hour. The growth is real process memory (memory.stat anon), not page cache, and it is not bounded by --db-cache or --trie-cache-size. Left running, the node climbs to the host/cgroup memory limit and, with swap disabled, gets hard-throttled by memory.high (hundreds of millions of throttle events in memory.events), at which point RPC stops responding and the node effectively hangs until restarted.

Reproduced on two versions (a Nov-2025 build and the current v3.4.2-415), so this is not a recently introduced regression; it appears inherent to the archive node under RPC load.

Environment

  • subtensor: v3.4.2-415 (also reproduced on a Nov-2025 v3.2.9-era build). Note: system_version RPC returns 4.0.0-dev-unknown on both, so it's not a useful version discriminator.
  • OS: Ubuntu 24.04.4 LTS
  • Host: 16 vCPU, 62 GiB RAM, NVMe (RocksDB backend, db/full)
  • Chain: finney mainnet, archive node (~3.7 TB DB)
  • Run via: systemd unit with cgroup limits MemoryHigh=48G, MemoryMax=52G, and MemorySwapMax=0.

Launch flags

node-subtensor \
  --chain <finney-raw-spec> --base-path <data> \
  --state-pruning archive --blocks-pruning archive \
  --rpc-external --rpc-cors all --rpc-methods unsafe \
  --rpc-port 9944 --port 30333 \
  --rpc-max-connections 1000 --no-mdns \
  --rpc-max-response-size 256 --rpc-max-request-size 256 \
  --in-peers 75 --out-peers 25 \
  --prometheus-external --prometheus-port 9615 \
  --db-cache 8192 --trie-cache-size 4294967296 \
  --runtime-cache-size 4 --max-runtime-instances 8 \
  --wasm-execution compiled --wasmtime-instantiation-strategy pooling-copy-on-write

The node serves a steady stream of archive state RPCs from an external client (historical state_getStorage, state_call, state_getReadProof, state_queryStorageAt, etc. against old block hashes).

Observed behaviour

Anonymous memory grows roughly linearly under load and never plateaus:

# /sys/fs/cgroup/.../subtensor.service/memory.stat
anon ≈ 24–39 GB        ← real, non-reclaimable
file ≈ 0.1–4.5 GB      ← page cache (small)
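
For anyone wanting to collect the same numbers, here is a minimal sketch of reading anon and file from a cgroup v2 memory.stat file. It assumes the usual flat-keyed layout (one "key value" pair per line, values in bytes). The function names are illustrative, and the path is passed as an argument because the full cgroup path is elided above.

```python
import sys
from pathlib import Path

GIB = 1024 ** 3


def parse_flat_keyed(text: str) -> dict[str, int]:
    """Parse a flat-keyed cgroup file ("key value" per line) into a dict."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def anon_and_file_gib(stat_text: str) -> tuple[float, float]:
    """Return (anon, file) from memory.stat contents, converted to GiB."""
    stats = parse_flat_keyed(stat_text)
    return stats["anon"] / GIB, stats["file"] / GIB


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python memstat.py <path to the service's memory.stat>
    anon, file_ = anon_and_file_gib(Path(sys.argv[1]).read_text())
    print(f"anon={anon:.1f} GiB file={file_:.1f} GiB")
```

Sampling this once a minute and logging the anon column should be enough to reproduce the curve described in this report.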

Growth curve after a fresh restart (netdata, RSS):

t+0min   ~10 GB   (post-restart)
t+1h     ~18 GB
t+2h     ~23.5 GB   steady creep ≈ +3–4 GB/h, NOT decelerating to zero
...      climbs linearly
t+~11h   ~48 GB     hits MemoryHigh
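
A rough way to turn these samples into a restart deadline is a linear projection. This sketch uses the four RSS points listed above (the last timestamp is approximate) and assumes growth stays linear. The helper names are illustrative only.

```python
def average_rate(samples: list[tuple[float, float]]) -> float:
    """End-to-end growth rate in GB/h from (hours, GB) samples."""
    (t0, m0), (t1, m1) = samples[0], samples[-1]
    return (m1 - m0) / (t1 - t0)


def hours_to_limit(current_gb: float, limit_gb: float, rate_gb_per_h: float) -> float:
    """Hours until limit_gb is reached at a constant growth rate."""
    if rate_gb_per_h <= 0:
        raise ValueError("rate must be positive")
    return max(0.0, (limit_gb - current_gb) / rate_gb_per_h)


# RSS samples from the netdata curve above: (hours since restart, GB).
SAMPLES = [(0.0, 10.0), (1.0, 18.0), (2.0, 23.5), (11.0, 48.0)]

if __name__ == "__main__":
    rate = average_rate(SAMPLES)
    print(f"average growth: {rate:.2f} GB/h")
    print(f"from 23.5 GB, hours to 48 GB: {hours_to_limit(23.5, 48.0, rate):.1f}")
```

The end-to-end average lands inside the ~3–4 GB/h band, although the first two hours after restart grow faster than that, which probably reflects cache warm-up on top of the steady creep.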

At the ceiling, with MemorySwapMax=0, the kernel cannot reclaim the (anonymous) memory, so it throttles via memory.high:

# memory.events at the ceiling
high  254531610     ← ~254M throttle events
max   0
oom_kill 0
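
Because the high counter in memory.events is cumulative, a single reading does not say whether throttling is happening right now. A minimal sketch of diffing two snapshots, assuming the same flat-keyed format as memory.stat (the function names are illustrative):

```python
def parse_flat_keyed(text: str) -> dict[str, int]:
    """Parse a flat-keyed cgroup file ("key value" per line) into a dict."""
    events: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            events[parts[0]] = int(parts[1])
    return events


def high_events_per_second(before: str, after: str, interval_s: float) -> float:
    """Rate of new memory.high throttle events between two memory.events snapshots."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = parse_flat_keyed(after).get("high", 0) - parse_flat_keyed(before).get("high", 0)
    return delta / interval_s


if __name__ == "__main__":
    # Illustrative snapshots taken 60 seconds apart (made-up counter values).
    snap_a = "low 0\nhigh 100\nmax 0\noom 0\noom_kill 0\n"
    snap_b = "low 0\nhigh 700\nmax 0\noom 0\noom_kill 0\n"
    print(high_events_per_second(snap_a, snap_b, 60.0))
```

A rate that stays at zero means the cgroup is still below memory.high. A sustained non-zero rate is the state in which the node described here stops answering RPC.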

The node spins in reclaim, RPC latency explodes, and the system_health RPC eventually times out, so the node is effectively hung until restarted. (max and oom_kill are both 0 because throttling at memory.high holds usage below MemoryMax, so the OOM killer is never invoked.)

Key points:

  • The leaked memory is anon, not page cache, so it is genuinely held by the process and cannot be reclaimed without swap.
  • It vastly exceeds the configured caches (8 GiB db-cache + 4 GiB trie-cache = 12 GiB, but anon reaches 24–39 GiB and keeps climbing).
  • Reducing --db-cache 8192 → 4096 did NOT reduce the steady-state footprint and did not stop the climb, which indicates the growth is not the configured block cache.
  • Growth correlates with RPC query load; an idle/lite node does not exhibit it at the same rate.
  • A restart drops it back to ~10 GB and the cycle repeats.
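
The cache arithmetic in the second point can be checked directly. This sketch assumes --db-cache is given in MiB and --trie-cache-size in bytes, which is what the values in the launch flags imply. The function name is illustrative.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def configured_cache_gib(db_cache_mib: int, trie_cache_bytes: int) -> float:
    """Sum of the two configured caches in GiB (db cache in MiB, trie cache in bytes)."""
    return (db_cache_mib * MIB + trie_cache_bytes) / GIB


if __name__ == "__main__":
    # Values from the launch flags: --db-cache 8192 --trie-cache-size 4294967296
    print(configured_cache_gib(8192, 4294967296))
    # After the --db-cache 8192 -> 4096 experiment
    print(configured_cache_gib(4096, 4294967296))
```

Either budget is well below the 24–39 GB of anon observed, so the excess has to come from something other than these two caches.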

Steps to reproduce

  1. Run an archive node (--state-pruning archive) on finney with RocksDB.
  2. Subject it to sustained archive-state JSON-RPC queries against historical block hashes (e.g. state_getReadProof / state_call / state_getStorage at old blocks), as a high-traffic archive RPC provider would.
  3. Watch anon in the service's cgroup memory.stat (or RSS) over several hours.
  4. Observe a steady ~3–4 GB/h climb with no plateau, until the host/cgroup limit is reached.
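
As a rough illustration of step 2, this sketch builds JSON-RPC 2.0 request bodies for state_getStorage and state_getReadProof at a list of historical block hashes. It is not the actual client workload. The storage key and block hashes are placeholders the caller must supply, and sending the payloads to the node's RPC port is left to whatever HTTP or WebSocket client is at hand.

```python
import itertools
import json


def build_archive_requests(storage_key: str, block_hashes: list[str]) -> list[dict]:
    """Build state_getStorage and state_getReadProof requests for each block hash."""
    ids = itertools.count(1)
    requests = []
    for block_hash in block_hashes:
        requests.append({
            "jsonrpc": "2.0",
            "id": next(ids),
            "method": "state_getStorage",
            "params": [storage_key, block_hash],  # (key, at)
        })
        requests.append({
            "jsonrpc": "2.0",
            "id": next(ids),
            "method": "state_getReadProof",
            "params": [[storage_key], block_hash],  # (keys, at)
        })
    return requests


if __name__ == "__main__":
    # Placeholder values only; substitute a real storage key and old block hashes.
    for req in build_archive_requests("0x00", ["0xaa", "0xbb"]):
        print(json.dumps(req))
```

Looping over a large set of old block hashes for hours, rather than repeating a handful, is closer to the load pattern under which the growth was observed.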

Expected behaviour

Steady-state memory should plateau (bounded by the configured caches plus a stable working set) rather than growing without bound under continuous RPC load.

Impact

On a RAM-constrained host this forces a periodic restart treadmill (every ~6–8 h) to avoid the node hanging itself at the memory ceiling. For archive RPC providers this means recurring downtime and degraded tail latency as the node approaches the limit.

Current workaround

Scheduled restart every ~8 h (before the node reaches the throttle ceiling). This is a band-aid, not a fix.

Questions for maintainers

  • Is unbounded anon growth under archive RPC load a known issue?
  • Is it related to the trie/state cache not honouring --trie-cache-size under archive queries, the wasmtime pooling allocator, RPC subscription/connection buffers, or something else?
  • Is there a flag to bound the per-process memory under archive RPC load that we've missed?

Happy to provide netdata exports, memory.stat snapshots over time, heaptrack/massif profiles, or an RPC query sample if useful.
