feat: Implement persistent storage by SoarinSkySagar · Pull Request #108 · grandinetech/lean

SoarinSkySagar · 2026-06-27T10:14:49Z

This PR adds a persistent libmdbx-backed database for client data.

ArtiomTr

looks like this is not finished? there is nothing implemented yet?

bomanaps · 2026-06-30T10:01:47Z

> looks like this is not finished? there is nothing implemented yet?

This will take some time, as it won't be a one-off PR.

ArtiomTr · 2026-06-30T10:05:16Z

> This will take some time, as it won't be a one-off PR.

Ok, but there is absolutely zero functionality yet? We can do this in iterations, i.e. first we can persist only blocks, but there is no point in merging dead code.

ArtiomTr · 2026-06-30T10:14:03Z

```diff
 bls = { git = "https://github.com/grandinetech/grandine", package = "bls", features = ["blst"], rev = "64afdee3c6be79fceffb66933dcb69a943f3f1ae" }
+bytesize = { version = '2', features = ['serde'] }
 clap = { version = "4", features = ["derive"] }
+database = { git = "https://github.com/grandinetech/grandine", package = "database", rev = "64afdee3c6be79fceffb66933dcb69a943f3f1ae" }
```

The `database` crate from grandine is a bit flawed; I would suggest implementing your own. Specifically, the current database implementation forces snappy compression for all values. This is wasteful in some cases (e.g., if we save slot -> state_root indexes, there is no point in compressing state_root, as it is pure entropy) and suboptimal in others (e.g., for blobs you probably want something with a higher compression ratio, like zstd). Thus, forcing a single compression algorithm for all values wasn't a good idea. Also, for cases where performance matters, it looks like there are now compression algorithms, like lz4, that are both faster than snappy and give better compression ratios.
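A minimal sketch of making compression a per-table choice instead of a global one, under stated assumptions: `ValueCodec`, `Raw` and `Table` are hypothetical names, and a `BTreeMap` stands in for a libmdbx table so the example runs with the standard library only. Only the pass-through codec is implemented here; LZ4 or zstd codecs would implement the same trait on top of external crates.

```rust
use std::collections::BTreeMap;

/// Hypothetical trait: each table decides how its values are compressed.
pub trait ValueCodec {
    fn encode(&self, value: &[u8]) -> Vec<u8>;
    fn decode(&self, stored: &[u8]) -> Result<Vec<u8>, String>;
}

/// Pass-through codec for high-entropy values (e.g. 32-byte roots),
/// where compression would only cost CPU time.
pub struct Raw;

impl ValueCodec for Raw {
    fn encode(&self, value: &[u8]) -> Vec<u8> {
        value.to_vec()
    }

    fn decode(&self, stored: &[u8]) -> Result<Vec<u8>, String> {
        Ok(stored.to_vec())
    }
}

// An LZ4 codec for hot-path values or a zstd codec for blobs would implement
// the same trait using an external compression crate (not shown here).

/// In-memory stand-in for a single libmdbx table, parameterised by its codec.
pub struct Table<C: ValueCodec> {
    codec: C,
    rows: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl<C: ValueCodec> Table<C> {
    pub fn new(codec: C) -> Self {
        Self { codec, rows: BTreeMap::new() }
    }

    /// Encodes the value with this table's codec before storing it.
    pub fn put(&mut self, key: &[u8], value: &[u8]) {
        self.rows.insert(key.to_vec(), self.codec.encode(value));
    }

    /// Looks up the key and decodes the stored bytes, if present.
    pub fn get(&self, key: &[u8]) -> Option<Result<Vec<u8>, String>> {
        self.rows.get(key).map(|stored| self.codec.decode(stored))
    }
}
```

With this shape, a slot -> state_root table can be created as `Table::new(Raw)`, while a blob table picks a stronger codec, and neither choice leaks into the other.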

okay, I will be implementing my own database crate on top of this next

SoarinSkySagar · 2026-06-30T22:30:36Z

> Ok, but there is absolutely zero functionality yet? We can do this in iterations, i.e. first we can persist only blocks, but there is no point in merging dead code.

Yeah, this is still a WIP. I was asking for a review of the initial architecture (KV pairs and the grandine database), to check whether this is the correct direction.

bomanaps · 2026-06-30T22:47:21Z

One more thing: can we remove the OOM framing, as that issue has been resolved?

SoarinSkySagar · 2026-07-01T08:02:30Z

@bomanaps but since all state data is held in memory right now, don't you think the node is bound to OOM at some point when it runs for a long time?

SoarinSkySagar · 2026-07-01T08:03:55Z

On a separate note, after implementing the persistent DB, I'm planning to research an LRU cache inside the crate itself, managed automatically, so that the DB functions can be used without worrying about caching. What do you think about this?

bomanaps · 2026-07-01T08:08:40Z

> On a separate note, after implementing the persistent DB, I'm planning to research an LRU cache inside the crate itself, managed automatically, so that the DB functions can be used without worrying about caching. What do you think about this?

The best person to answer this is @ArtiomTr. Also, on the side, have you tried running a node, maybe a 3-node setup or more depending on your laptop's capacity? This should give you a better feel for how lean Ethereum runs.

ArtiomTr · 2026-07-01T10:07:22Z

> Yeah, this is still a WIP. I was asking for a review of the initial architecture (KV pairs and the grandine database), to check whether this is the correct direction.

It is hard to tell what is going on without seeing an actual implementation :). Better to implement something first, see if it works and is performant enough, then proceed with review. To avoid wasting much time, start with a smaller scope, like just saving the blocks first. The database must keep blocks, because if you have blocks, you can reconstruct any historical state at the cost of CPU time. This way you can scaffold the database structure, get early feedback on it, and then proceed with implementing everything else.

> @bomanaps but since all state data is held in memory right now, don't you think the node is bound to OOM at some point when it runs for a long time?

This is true only in some cases, e.g. during long non-finality periods: roughly speaking, a validator has to track every "branch" to be able to properly converge on whichever branch eventually wins. However, even in those cases, I think there are clever algorithms to avoid keeping all unfinalized history in memory.

During normal operation, memory consumption usually won't grow indefinitely: the node has to keep only the last finalized state, maybe some older ones, but no more. The node also has to keep some historical blocks (I believe all blocks back to the weak subjectivity period, although I don't remember exactly and may be wrong on this), but blocks usually take only a fraction of the space compared to states, so this should be a non-issue. Grandine, the beacon chain client, existed for quite some time without a database at all, and worked really well.
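To illustrate why memory stays bounded during normal operation, here is a minimal sketch of dropping in-memory states once finality advances. It assumes a hypothetical `StateStore` keyed by `(slot, state_root)`, so that competing branches can hold states for the same slot; it only drops states older than the finalized slot and does not prune non-canonical branches above it.

```rust
use std::collections::BTreeMap;

type Root = [u8; 32];

/// Hypothetical in-memory store of states keyed by (slot, state_root), so that
/// several branches can each hold a state for the same slot.
pub struct StateStore<S> {
    states: BTreeMap<(u64, Root), S>,
}

impl<S> StateStore<S> {
    pub fn new() -> Self {
        Self { states: BTreeMap::new() }
    }

    pub fn insert(&mut self, slot: u64, state_root: Root, state: S) {
        self.states.insert((slot, state_root), state);
    }

    /// Called when finality advances: keeps every state at or after
    /// `finalized_slot` and drops all older ones. Returns the number dropped.
    pub fn prune_below(&mut self, finalized_slot: u64) -> usize {
        // (finalized_slot, [0; 32]) is the smallest possible key for that slot,
        // so `split_off` moves out exactly the states with slot >= finalized_slot.
        let kept = self.states.split_off(&(finalized_slot, [0u8; 32]));
        let dropped = self.states.len();
        self.states = kept;
        dropped
    }

    pub fn len(&self) -> usize {
        self.states.len()
    }
}
```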

> On a separate note, after implementing the persistent DB, I'm planning to research an LRU cache inside the crate itself, managed automatically, so that the DB functions can be used without worrying about caching. What do you think about this?

It is kind of a complicated topic. Ideally, the node should operate without depending on the database at all, so in this sense caching just wastes CPU/memory. However, sometimes you do want caches: for instance, there may be cases when you need to load a state that is a bit older than the finalized point on a hot path, so loading it quickly may be desirable. But caches won't magically make loading quicker; instead, you pay a small performance cost once in exchange for being able to query the same thing instantly next time. If you take the straightforward approach and cache every database query, such caches are pointless, because it is very rare for the same object to be queried from the database twice. But if you make them smart, by caching intermediate values that may be needed for querying both object A and object B, such caches become very useful. This is the approach I am taking when implementing the new database layout for the grandine beacon chain (https://github.com/ArtiomTr/grandine/blob/4ec3964cf42b04b8d1ac93791a6a14ff788b2d18/fork_choice_control/src/storage.rs#L907). This requires careful benchmarking and profiling first, though, so it is probably better to think about caches after you have a working database.
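One possible reading of "caching intermediate values" is sketched below, as an illustration only (it is not the linked grandine code): states are rebuilt by replaying stored blocks forward from the nearest known state, and every `interval`-th intermediate state is kept, so reconstructing slot A also shortens a later query for a nearby slot B. `State`, `Replayer`, the `value` field and the additive "state transition" are hypothetical stand-ins for the real types and state transition function.

```rust
use std::collections::BTreeMap;

/// Toy state; `value` stands in for real state contents (hypothetical).
#[derive(Clone, Debug, PartialEq)]
pub struct State {
    pub slot: u64,
    pub value: u64,
}

/// Hypothetical replayer that rebuilds states from stored blocks and caches
/// every `interval`-th intermediate state it passes through.
pub struct Replayer {
    blocks: BTreeMap<u64, u64>,        // slot -> toy block payload
    checkpoints: BTreeMap<u64, State>, // cached intermediate states
    interval: u64,
    pub transitions_applied: u64,      // replay work done so far
}

impl Replayer {
    pub fn new(anchor: State, blocks: BTreeMap<u64, u64>, interval: u64) -> Self {
        assert!(interval > 0, "checkpoint interval must be non-zero");
        let mut checkpoints = BTreeMap::new();
        checkpoints.insert(anchor.slot, anchor);
        Self { blocks, checkpoints, interval, transitions_applied: 0 }
    }

    /// Returns the state at `target`, or `None` if it precedes every known state.
    pub fn state_at(&mut self, target: u64) -> Option<State> {
        // Resume from the closest cached state at or below the target.
        let mut state = self.checkpoints.range(..=target).next_back()?.1.clone();
        while state.slot < target {
            let slot = state.slot + 1;
            if let Some(payload) = self.blocks.get(&slot) {
                state.value += payload; // toy state transition
                self.transitions_applied += 1;
            }
            state.slot = slot; // empty slots only advance the slot number
            if slot % self.interval == 0 {
                // Intermediate value that later queries can reuse.
                self.checkpoints.insert(slot, state.clone());
            }
        }
        Some(state)
    }
}
```

A naive cache keyed by the queried slot would only help if exactly the same slot were requested again. Here, with a block at every slot and `interval = 32`, a query for slot 75 after a query for slot 70 resumes from the checkpoint at slot 64 and replays 11 blocks instead of 75.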

Also, let me give you some advice on using libmdbx:

- Try to do sequential writes; they will be much quicker, as libmdbx is a B+ tree internally (there is a good blog post explaining why sequential writes perform better for B+ trees: https://planetscale.com/blog/btrees-and-database-indexes).
- libmdbx supports range queries, so you can look up values by key prefix alone. The database also permits large keys, up to 2022 bytes with the default 4 KB page size. This means you can put more information into the key, which lets you drop some indexes, for example. In the case of blocks, you can save slot + block_root + state_root in the key. This way you achieve three goals at the same time:
  - slot + block_root gives you a unique key per block, even for the unfinalized chain, where proposers may differ (this is currently irrelevant for the lean chain, as validators cannot enter/exit and the proposer is chosen via round-robin, though this will likely change when going to mainnet).
  - having the slot at the beginning makes all writes to the database sequential, except for backsync, although this shouldn't be a problem, as those writes will still probably land in the same B-tree branch.
  - having state_root in the key lets you find which state corresponds to this block without even reading, decompressing, or decoding the block itself.
- Keep in mind that keys in libmdbx are not necessarily UTF-8 encoded strings, so you can use any byte sequence you want.
- Use separate libmdbx databases for different types. This will allow you to parallelize writes into different databases, but remember that by keeping values in different databases you lose atomicity guarantees. So it is probably better to write the "value" first and only then write its indices, so you don't get dangling references. For deletion, the process should be reversed: delete the indices first, then the values. Alternatively, you can write/delete in any order you want, as long as you carefully handle all the edge cases where a write/delete succeeds in one database but fails in another, and where an index may point to a non-existent value.
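A minimal sketch of the key layout and write ordering described in the list above, with `BTreeMap`s standing in for two separate libmdbx databases (so this is not libmdbx API code). The slot is encoded big-endian, because lexicographic byte order only matches numeric slot order in that encoding. The `by_root` index, `BlockDb`, `block_key` and `state_root_from_key` are hypothetical names chosen for illustration.

```rust
use std::collections::BTreeMap;

type Root = [u8; 32];

/// Hypothetical block key: slot (big-endian) || block_root || state_root = 72 bytes.
pub fn block_key(slot: u64, block_root: &Root, state_root: &Root) -> Vec<u8> {
    let mut key = Vec::with_capacity(8 + 32 + 32);
    key.extend_from_slice(&slot.to_be_bytes());
    key.extend_from_slice(block_root);
    key.extend_from_slice(state_root);
    key
}

/// Reads the state root straight from the key, without reading or decoding the block.
pub fn state_root_from_key(key: &[u8]) -> Option<Root> {
    let mut root = [0u8; 32];
    root.copy_from_slice(key.get(40..72)?);
    Some(root)
}

/// Two maps standing in for two separate databases: no cross-database atomicity.
#[derive(Default)]
pub struct BlockDb {
    blocks: BTreeMap<Vec<u8>, Vec<u8>>, // block_key -> encoded block
    by_root: BTreeMap<Root, Vec<u8>>,   // block_root -> block_key (index)
}

impl BlockDb {
    /// Value first, index second: a crash in between leaves an unindexed block,
    /// never an index entry pointing at a missing block.
    pub fn insert_block(&mut self, slot: u64, block_root: Root, state_root: Root, block: Vec<u8>) {
        let key = block_key(slot, &block_root, &state_root);
        self.blocks.insert(key.clone(), block);
        self.by_root.insert(block_root, key);
    }

    /// Reverse order for deletion: remove the index entry, then the value.
    pub fn delete_block(&mut self, block_root: &Root) {
        if let Some(key) = self.by_root.remove(block_root) {
            self.blocks.remove(&key);
        }
    }

    /// Prefix scan over the 8-byte slot: (block_root, state_root) of every
    /// block stored at `slot`, from any branch, in key order.
    pub fn blocks_at_slot(&self, slot: u64) -> Vec<(Root, Root)> {
        let prefix = slot.to_be_bytes();
        self.blocks
            .range(prefix.to_vec()..)
            .take_while(|(key, _)| key.starts_with(&prefix))
            .map(|(key, _)| {
                let mut block_root = [0u8; 32];
                block_root.copy_from_slice(&key[8..40]);
                let state_root = state_root_from_key(key).expect("keys are 72 bytes");
                (block_root, state_root)
            })
            .collect()
    }
}
```

With little-endian slots, slot 256 would sort before slot 255, so appending blocks in slot order would no longer produce sequential writes.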

ArtiomTr · 2026-07-01T10:27:05Z

Also, don't forget to change the target branch to devnet-5 instead of main; the latest changes are on the devnet-5 branch.

SoarinSkySagar · 2026-07-01T23:03:04Z

Rebased and changed the target branch to devnet-5. Continuing work now: setting up lean's own database.

ArtiomTr reviewed Jun 30, 2026

SoarinSkySagar changed the title ~~feat: Implement libmdbx in lean client for persistent DB storage~~ feat: Implement persistent storage Jul 1, 2026

SoarinSkySagar added 6 commits July 2, 2026 04:19

- feat: init database crate (6acb243)
- chore: pull database as a grandine dep and create storage crate (19c6afd)
- feat: add leanspec consts for db (57d0438)
- feat: add databases for each KV pair defined in spec (e0233e0)
- feat: add Storage impl with builderplate functions from spec (4376cdd)
- feat: storage spec API functions (2a25e9e)

SoarinSkySagar force-pushed the feature/database branch from a2bca8a to 2a25e9e July 1, 2026 22:51

SoarinSkySagar changed the base branch from main to devnet-5 July 1, 2026 22:52

SoarinSkySagar added 5 commits July 2, 2026 23:00

- feat: implement libmdbx with custom compression per database instance (efe3e0a)
- chore: tests (all passing) (2da086b)
- chore: combine database and storage (25902c6)
- feat: fully spec-compliant storage API (eaffc22)
- chore: privatized Database and removed DatabaseMode (5f1fdb8)

3 participants
