Due to compaction, the output run of a merge can end up smaller than the sum of its inputs. In some cases, it can end up underfull, i.e. small enough that it could fit into the previous level. To avoid having underfull runs on a level, we currently hold them back: When the next runs are moved into the level to become a merging run, we add the underfull run to the merge, making it a (T+1)-way merge of similarly sized runs, where T is the table's size ratio. However, this means that the result can be overfull, i.e. too large for this level.
To accommodate these overfull runs, we weakened the size invariant to allow runs that are too large for their level. Initially, holding back a run can only result in a run that is too large by a factor of (T+1)/T, but as the table grows, that factor can grow indefinitely. If all T runs on the level are too large by factor (T+1)/T and get moved to the next level, holding back another run results in a run that is too large by factor (T*(T+1)/T + 1)/T = (T+2)/T and so on. This requires extremely specific patterns of key duplication between runs, but it would nevertheless be nice to guarantee an upper bound.
Ideally, we could prevent both underfull and overfull runs, but there doesn't seem to be a straightforward fix to the current approach. Potentially we could at least achieve a bound on how much smaller/larger runs can be or for how long these size anomalies can persist (e.g. they can't propagate to the next level).
The prototype has comments explaining the issue in relevant parts of the code and invariants. Some initial work on a potential alternative approach can be found in #311.
Due to compaction, the output run of a merge can end up smaller than the sum of its inputs. In some cases, it can end up underfull, i.e. small enough that it could fit into the previous level. To avoid having underfull runs on a level, we currently hold them back: When the next runs are moved into the level to become a merging run, we add the underfull run to the merge, making it a
(T+1)-way merge of similarly sized runs, whereTis the table's size ratio. However, this means that the result can be overfull, i.e. too large for this level.To accommodate these overfull runs, we weakened the size invariant to allow runs that are too large for their level. Initially, holding back a run can only result in a run that is too large by a factor of
(T+1)/T, but as the table grows, that factor can grow indefinitely. If allTruns on the level are too large by factor(T+1)/Tand get moved to the next level, holding back another run results in a run that is too large by factor(T*(T+1)/T + 1)/T = (T+2)/Tand so on. This requires extremely specific patterns of key duplication between runs, but it would nevertheless be nice to guarantee an upper bound.Ideally, we could prevent both underfull and overfull runs, but there doesn't seem to be a straightforward fix to the current approach. Potentially we could at least achieve a bound on how much smaller/larger runs can be or for how long these size anomalies can persist (e.g. they can't propagate to the next level).
The prototype has comments explaining the issue in relevant parts of the code and invariants. Some initial work on a potential alternative approach can be found in #311.