Skip to content

Reconsider how to hold back underfull runs #829

@mheinzel

Description

@mheinzel

Due to compaction, the output run of a merge can end up smaller than the sum of its inputs. In some cases, it can end up underfull, i.e. small enough that it could fit into the previous level. To avoid having underfull runs on a level, we currently hold them back: When the next runs are moved into the level to become a merging run, we add the underfull run to the merge, making it a (T+1)-way merge of similarly sized runs, where T is the table's size ratio. However, this means that the result can be overfull, i.e. too large for this level.

To accommodate these overfull runs, we weakened the size invariant to allow runs that are too large for their level. Initially, holding back a run can only result in a run that is too large by a factor of (T+1)/T, but as the table grows, that factor can grow indefinitely. If all T runs on the level are too large by factor (T+1)/T and get moved to the next level, holding back another run results in a run that is too large by factor (T*(T+1)/T + 1)/T = (T+2)/T and so on. This requires extremely specific patterns of key duplication between runs, but it would nevertheless be nice to guarantee an upper bound.

Ideally, we could prevent both underfull and overfull runs, but there doesn't seem to be a straightforward fix to the current approach. Potentially we could at least achieve a bound on how much smaller/larger runs can be or for how long these size anomalies can persist (e.g. they can't propagate to the next level).

The prototype has comments explaining the issue in relevant parts of the code and invariants. Some initial work on a potential alternative approach can be found in #311.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestlsm-treeRelated to the lsm-tree library

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions