Reconsider how to hold back underfull runs

Due to compaction, the output run of a merge can end up smaller than the sum of its inputs. In some cases, it can end up *underfull*, i.e. small enough that it could fit into the *previous* level. To avoid having underfull runs on a level, we currently *hold them back*: When the next runs are moved into the level to become a merging run, we add the underfull run to the merge, making it a `(T+1)`-way merge of similarly sized runs, where `T` is the table's size ratio. However, this means that the result can be *overfull*, i.e. too large for this level.

To accommodate these overfull runs, we weakened the size invariant to allow runs that are too large for their level. Initially, holding back a run can only result in a run that is too large by a factor of `(T+1)/T`, but as the table grows, that factor can grow indefinitely. If all `T` runs on the level are too large by factor `(T+1)/T` and get moved to the next level, holding back another run results in a run that is too large by factor `(T*(T+1)/T + 1)/T = (T+2)/T` and so on. This requires extremely specific patterns of key duplication between runs, but it would nevertheless be nice to guarantee an upper bound.

Ideally, we could prevent both underfull and overfull runs, but there doesn't seem to be a straightforward fix to the current approach. Potentially we could at least achieve a bound on how much smaller/larger runs can be or for how long these size anomalies can persist (e.g. they can't propagate to the next level).

The prototype has comments explaining the issue in relevant parts of the code and invariants. Some initial work on a potential alternative approach can be found in https://github.com/IntersectMBO/lsm-tree/pull/311.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reconsider how to hold back underfull runs #829

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Reconsider how to hold back underfull runs #829

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions