Fix OOM, sync stalls, attestations, and checkpoint safety #84
ArtiomTr merged 2 commits into grandinetech:devnet-3 from
Conversation
```rust
let finalized_slot = self.status_provider.read().finalized.slot.0;
provider.retain(|_, b| b.message.block.slot.0 > finalized_slot);
// Hard cap: evict lowest-slot blocks if still over limit.
if provider.len() > MAX_BLOCK_CACHE_SIZE {
    let to_remove = provider.len() - MAX_BLOCK_CACHE_SIZE;
    let mut slots: Vec<(H256, u64)> = provider
        .iter()
        .map(|(root, b)| (*root, b.message.block.slot.0))
        .collect();
    slots.sort_by_key(|(_, slot)| *slot);
    for (root, _) in slots.into_iter().take(to_remove) {
        provider.remove(&root);
    }
}
```
I think you shouldn't really prune blocks. The blocks are mostly lightweight, and they make it easy to reconstruct historical states. What may be pruned is the old states: those aren't needed, and can be trivially reconstructed by taking an earlier state and applying blocks on top of it.
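The reconstruction idea above can be sketched roughly as follows. This is a hypothetical illustration, not the lean_client implementation: `State`, `Block`, `state_transition`, and `reconstruct_state` are toy stand-ins, and the "transition" here only advances the slot.

```rust
// Toy stand-ins for the real lean_client types (assumptions, not real APIs).
#[derive(Clone)]
struct State {
    slot: u64,
    // ... balances, justification data, etc.
}

struct Block {
    slot: u64,
}

// Apply one block's state transition (toy version: just advance the slot).
fn state_transition(mut state: State, block: &Block) -> State {
    state.slot = block.slot;
    state
}

// Rebuild the state at `target_slot` from the nearest retained ancestor
// state plus the (slot-ordered) blocks after it.
fn reconstruct_state(base: &State, blocks: &[Block], target_slot: u64) -> State {
    blocks
        .iter()
        .filter(|b| b.slot > base.slot && b.slot <= target_slot)
        .fold(base.clone(), |s, b| state_transition(s, b))
}
```

The point is that keeping one state plus the block chain is enough: any intermediate state is a fold of the transition function over the blocks, so pruned states cost only replay time, not data loss.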
The reason we added pruning here is that signed_block_provider is lean_client's only block store; there is no RocksDB backing it. SignedBlockWithAttestation carries XMSS signatures (3,112 bytes per block minimum), so the cache is not lightweight in this protocol. Across 5 runs in our test log, the node accumulated 142,735 blocks in memory and was OOM-killed by the kernel each time, with no log entry.
A better long-term fix is to persist blocks to storage, so the in-memory cache is just a working set that can be evicted freely. We can track that as a follow-up. In the meantime, should we drop the finalization pruning and keep only the hard cap, or would you prefer we revert entirely until persistence is in place?
I agree that many blocks may easily inflate memory usage, but I really doubt that this is the primary OOM source. If you estimate it: 142,735 blocks at 3,112 bytes each weigh approximately 423 MiB, or roughly 0.5 GiB. This is a lot, but still manageable.
On the other hand, one SSZ-encoded state weighs ~900 KiB (you can try downloading http://leanpoint.leanroadmap.org/lean/v0/states/finalized), so 142,735 of them would weigh ~122 GiB. And if you look closely into the fork-choice store processing code, you'll see that states are cloned for each new block processed:
lean/lean_client/fork_choice/src/handlers.rs
Line 468 in 4e08fe2
But they are never pruned. Of course, the real memory usage will be lower than 122 GiB because we use structural sharing, but I nevertheless think this is the main weak point causing the memory issues.
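A state-pruning pass at finalization could mirror the block pruning shown in the diff. This is a sketch under assumptions: `Root`, `State`, and `prune_states` are hypothetical stand-ins for whatever the fork-choice store actually uses, and the policy (keep the finalized state as the replay base, drop everything at or below the finalized slot) is one reasonable choice, not the project's decided design.

```rust
use std::collections::HashMap;

// Illustrative stand-ins for the real fork-choice store types (assumptions).
type Root = [u8; 32];

struct State {
    slot: u64,
}

// Drop states for slots at or below the finalized slot, except the
// finalized state itself, which is kept as the base for replaying blocks.
fn prune_states(states: &mut HashMap<Root, State>, finalized_root: Root, finalized_slot: u64) {
    states.retain(|root, s| *root == finalized_root || s.slot > finalized_slot);
}
```

Because any pruned state can be rebuilt from the finalized state plus retained blocks, this keeps the map bounded by the number of non-finalized forks rather than the full chain length.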
I also noticed one more thing: we currently save both blocks and signed blocks:
lean/lean_client/fork_choice/src/handlers.rs
Lines 467 to 470 in 4e08fe2
No need to do that, as a signed block contains strictly more information than the block alone, so block memory usage is doubled.
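The deduplication amounts to keeping only the signed-block map and exposing the unsigned block as a borrowed view into it. A minimal sketch, with `Root`, `Block`, `SignedBlock`, and `Store` as hypothetical stand-ins rather than the actual handlers.rs types:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real types (assumptions, not the real API).
type Root = [u8; 32];

struct Block {
    slot: u64,
}

struct SignedBlock {
    message: Block,
    signature: Vec<u8>, // XMSS signature bytes
}

struct Store {
    // Single source of truth: only signed blocks are stored.
    signed_blocks: HashMap<Root, SignedBlock>,
}

impl Store {
    // The unsigned block is a view into the signed one; no duplicate copy.
    fn block(&self, root: &Root) -> Option<&Block> {
        self.signed_blocks.get(root).map(|sb| &sb.message)
    }
}
```

Callers that only need the block borrow it through the accessor, so the ~3 KiB signature payload is stored once per block instead of the block data being held in two maps.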
Regarding RocksDB: in the beacon client we actually use libmdbx, but there is probably no need to add a persistence layer to the lean client right now; I think it's better to keep it simple.
For now, you can keep the block pruning process, but I wouldn't recommend making it too aggressive. It's better to focus on the two primary issues:
- Blocks are duplicated in memory cache.
- States are never pruned.