Make taskWindowSize tunable in OnDiskGraphIndexCompactor #9

@eolivelli

Description


Summary

OnDiskGraphIndexCompactor.taskWindowSize is currently hardcoded to
Runtime.getRuntime().availableProcessors() (line 93). It controls the
sliding window of in-flight compaction batches inside runBatchesWithBackpressure,
but it is not exposed as a constructor parameter, so callers cannot trade
parallelism for lower memory pressure.

Where in the code

File: jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/OnDiskGraphIndexCompactor.java

// Line 93 — constructor body
this.taskWindowSize = threads;          // always == availableProcessors()
// Lines 779-783 — runBatchesWithBackpressure
// initial window
while (inFlight < taskWindowSize && nextToSubmit < total) {
    submitOne.accept(batches.get(nextToSubmit++));
    inFlight++;
}

The window size determines how many batch Future<List<WriteResult>> objects
exist in memory simultaneously. Each in-flight batch keeps a List<WriteResult>
alive (containing the fully-computed neighbor lists and inline vectors for up to
TARGET_NODES_PER_BATCH = 128 nodes) until the write loop drains it.
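The sliding-window idea can be sketched as a standalone example (hypothetical names; the real runBatchesWithBackpressure interleaves draining and submission in its own way, but the backpressure mechanism — never more than taskWindowSize batches in flight — is the same):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SlidingWindowDemo {
    // Hypothetical stand-in for a compaction batch result.
    record WriteResult(int batchId) {}

    static List<WriteResult> runBatches(List<Integer> batches,
                                        int taskWindowSize,
                                        ExecutorService pool) throws Exception {
        Deque<Future<WriteResult>> inFlight = new ArrayDeque<>();
        List<WriteResult> results = new ArrayList<>();
        int nextToSubmit = 0;
        while (nextToSubmit < batches.size() || !inFlight.isEmpty()) {
            // Fill the window: at most taskWindowSize batches in flight at once.
            while (inFlight.size() < taskWindowSize && nextToSubmit < batches.size()) {
                int id = batches.get(nextToSubmit++);
                inFlight.add(pool.submit(() -> new WriteResult(id)));
            }
            // Drain the oldest batch before submitting more (backpressure):
            // blocking here bounds how many WriteResult lists exist at once.
            results.add(inFlight.removeFirst().get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<WriteResult> out = runBatches(List.of(0, 1, 2, 3, 4, 5, 6, 7), 2, pool);
        pool.shutdown();
        System.out.println(out.size()); // 8
    }
}
```

With taskWindowSize = 2 the executor may have many idle threads, but only two batches' worth of results are ever held, which is exactly the parallelism-for-memory trade this issue asks to expose.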

Impact observed in HerdDB (the caller)

HerdDB's VectorIndexCompactor.rebuildSegmentStreaming and
RemoteSegmentGraphMerger.mergeStreaming both construct an
OnDiskGraphIndexCompactor via the single existing constructor. On a k3s-local
cluster (8 vCPU, IS container limit 20 GiB), the IS pod runs with -Xmx7g and a
6 GiB Netty direct-memory budget.

When the IS memory estimate exceeds ~2.13 GiB (70 % of direct-memory), the IS
engages back-pressure and blocks the commit-log tailer, stalling indexing. The
compactor's per-thread Scratch objects (one GraphSearcher per source index,
holding vectors + neighbor arrays in direct memory) already consume several
hundred MiB; the taskWindowSize-wide window of in-flight WriteResult lists
adds another proportional chunk.

On a 10-core host, taskWindowSize = 10, so the window holds 10 batches × 128
nodes × (512 bytes inline vector + ~128 bytes neighbor list) ≈ 0.8 MiB of
WriteResult heap alone, on top of the Scratch per-thread cost. Reducing
taskWindowSize to e.g. 2–4 would cut that overhead by 60–80 %, with a modest
throughput penalty that is acceptable in a memory-constrained environment.
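Since the footprint scales linearly with the window size, the estimate is a one-line formula (sizes are the assumed per-node figures from above, not measured values):

```java
public class WindowFootprint {
    // Rough WriteResult heap held by the in-flight window:
    // window size × nodes per batch × assumed bytes per node.
    static long windowBytes(int taskWindowSize, int nodesPerBatch, int bytesPerNode) {
        return (long) taskWindowSize * nodesPerBatch * bytesPerNode;
    }

    public static void main(String[] args) {
        int bytesPerNode = 512 + 128; // inline vector + neighbor list (assumed)
        System.out.println(windowBytes(10, 128, bytesPerNode)); // 819200 B ≈ 0.8 MiB
        System.out.println(windowBytes(2, 128, bytesPerNode));  // 163840 B ≈ 160 KiB
    }
}
```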

Proposed change

Add an overloaded constructor (or a builder parameter) that accepts a caller-
supplied taskWindowSize, validated to be >= 1:

/**
 * @param taskWindowSize  max number of in-flight compaction batches.
 *                        Use {@code Runtime.getRuntime().availableProcessors()}
 *                        for the default (maximum throughput) behaviour.
 *                        Smaller values reduce peak RAM at the cost of
 *                        lower CPU utilisation.
 */
public OnDiskGraphIndexCompactor(
        List<OnDiskGraphIndex> sources,
        List<FixedBitSet> liveNodes,
        List<OrdinalMapper> remappers,
        VectorSimilarityFunction similarityFunction,
        ForkJoinPool executor,
        int taskWindowSize) {
    // ... existing validation ...
    if (taskWindowSize < 1) throw new IllegalArgumentException("taskWindowSize must be >= 1");
    this.taskWindowSize = taskWindowSize;
    // rest unchanged
}

The existing 5-argument constructor would remain unchanged (delegates to the new
one with threads as before), keeping the API backwards-compatible.
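The overload-plus-delegation shape can be shown in isolation (illustrative class name; the real compactor's other constructor parameters are omitted here):

```java
public class CompactorCtorDemo {
    final int taskWindowSize;

    // Proposed overload: caller-supplied window size, validated to >= 1.
    CompactorCtorDemo(int taskWindowSize) {
        if (taskWindowSize < 1) {
            throw new IllegalArgumentException("taskWindowSize must be >= 1");
        }
        this.taskWindowSize = taskWindowSize;
    }

    // Existing-style constructor: behaviour unchanged, delegates with the
    // availableProcessors() default, so current callers compile and run as before.
    CompactorCtorDemo() {
        this(Runtime.getRuntime().availableProcessors());
    }
}
```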

Why not just reduce the ForkJoinPool parallelism?

The ForkJoinPool is passed in by HerdDB (PhysicalCoreExecutor.pool()) and is
shared across many subsystems. Reducing its parallelism to limit compactor RAM
would throttle unrelated work. A dedicated taskWindowSize cap on the
compactor's sliding window is a cleaner, surgical control.

Suggested default behaviour (no change)

taskWindowSize = Runtime.getRuntime().availableProcessors() — identical to
today; no existing caller is affected.
