Make taskWindowSize tunable in OnDiskGraphIndexCompactor #9

@eolivelli

Description


Summary

OnDiskGraphIndexCompactor.taskWindowSize is currently hardcoded to
Runtime.getRuntime().availableProcessors() (line 93). It controls the
sliding window of in-flight compaction batches inside runBatchesWithBackpressure,
but it is not exposed as a constructor parameter, so callers cannot trade
parallelism for lower memory pressure.

Where in the code

File: jvector-base/src/main/java/io/github/jbellis/jvector/graph/disk/OnDiskGraphIndexCompactor.java

// Line 93 — constructor body
this.taskWindowSize = threads;          // always == availableProcessors()
// Lines 779-783 — runBatchesWithBackpressure
// initial window
while (inFlight < taskWindowSize && nextToSubmit < total) {
    submitOne.accept(batches.get(nextToSubmit++));
    inFlight++;
}

The window size determines how many batch Future<List<WriteResult>> objects
exist in memory simultaneously. Each in-flight batch keeps a List<WriteResult>
alive (containing the fully-computed neighbor lists and inline vectors for up to
TARGET_NODES_PER_BATCH = 128 nodes) until the write loop drains it.
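The sliding-window idea can be sketched as a standalone example (hypothetical names; the real runBatchesWithBackpressure interleaves draining and submission in its own way, but the backpressure mechanism — never more than taskWindowSize batches in flight — is the same):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SlidingWindowDemo {
    // Hypothetical stand-in for a compaction batch result.
    record WriteResult(int batchId) {}

    static List<WriteResult> runBatches(List<Integer> batches,
                                        int taskWindowSize,
                                        ExecutorService pool) throws Exception {
        Deque<Future<WriteResult>> inFlight = new ArrayDeque<>();
        List<WriteResult> results = new ArrayList<>();
        int nextToSubmit = 0;
        while (nextToSubmit < batches.size() || !inFlight.isEmpty()) {
            // Fill the window: at most taskWindowSize batches in flight at once.
            while (inFlight.size() < taskWindowSize && nextToSubmit < batches.size()) {
                int id = batches.get(nextToSubmit++);
                inFlight.add(pool.submit(() -> new WriteResult(id)));
            }
            // Drain the oldest batch before submitting more (backpressure):
            // blocking here bounds how many WriteResult lists exist at once.
            results.add(inFlight.removeFirst().get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<WriteResult> out = runBatches(List.of(0, 1, 2, 3, 4, 5, 6, 7), 2, pool);
        pool.shutdown();
        System.out.println(out.size()); // 8
    }
}
```

With taskWindowSize = 2 the executor may have many idle threads, but only two batches' worth of results are ever held, which is exactly the parallelism-for-memory trade this issue asks to expose.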

Impact observed in HerdDB (the caller)

HerdDB's VectorIndexCompactor.rebuildSegmentStreaming and
RemoteSegmentGraphMerger.mergeStreaming both construct an
OnDiskGraphIndexCompactor via the single existing constructor. On a k3s-local
cluster (8 vCPU, IS container limit 20 GiB), the IS pod runs with -Xmx7g and a
6 GiB Netty direct-memory budget.

When the IS memory estimate exceeds ~2.13 GiB (70 % of direct-memory), the IS
engages back-pressure and blocks the commit-log tailer, stalling indexing. The
compactor's per-thread Scratch objects (one GraphSearcher per source index,
holding vectors + neighbor arrays in direct memory) already consume several
hundred MiB; the taskWindowSize-wide window of in-flight WriteResult lists
adds another proportional chunk.

On a 10-core host, taskWindowSize = 10, so the window holds 10 batches × 128
nodes × (512 bytes inline vector + ~128 bytes neighbor list) ≈ 0.8 MiB of
WriteResult heap alone, on top of the Scratch per-thread cost. Reducing
taskWindowSize to e.g. 2–4 would cut that overhead by 60–80 %, with a modest
throughput penalty that is acceptable in a memory-constrained environment.
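Since the footprint scales linearly with the window size, the estimate is a one-line formula (sizes are the assumed per-node figures from above, not measured values):

```java
public class WindowFootprint {
    // Rough WriteResult heap held by the in-flight window:
    // window size × nodes per batch × assumed bytes per node.
    static long windowBytes(int taskWindowSize, int nodesPerBatch, int bytesPerNode) {
        return (long) taskWindowSize * nodesPerBatch * bytesPerNode;
    }

    public static void main(String[] args) {
        int bytesPerNode = 512 + 128; // inline vector + neighbor list (assumed)
        System.out.println(windowBytes(10, 128, bytesPerNode)); // 819200 B ≈ 0.8 MiB
        System.out.println(windowBytes(2, 128, bytesPerNode));  // 163840 B ≈ 160 KiB
    }
}
```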

Proposed change

Add an overloaded constructor (or a builder parameter) that accepts a caller-
supplied taskWindowSize, validated to be >= 1:

/**
 * @param taskWindowSize  max number of in-flight compaction batches.
 *                        Use {@code Runtime.getRuntime().availableProcessors()}
 *                        for the default (maximum throughput) behaviour.
 *                        Smaller values reduce peak RAM at the cost of
 *                        lower CPU utilisation.
 */
public OnDiskGraphIndexCompactor(
        List<OnDiskGraphIndex> sources,
        List<FixedBitSet> liveNodes,
        List<OrdinalMapper> remappers,
        VectorSimilarityFunction similarityFunction,
        ForkJoinPool executor,
        int taskWindowSize) {
    // ... existing validation ...
    if (taskWindowSize < 1) throw new IllegalArgumentException("taskWindowSize must be >= 1");
    this.taskWindowSize = taskWindowSize;
    // rest unchanged
}

The existing 5-argument constructor would remain unchanged (delegates to the new
one with threads as before), keeping the API backwards-compatible.
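The overload-plus-delegation shape can be shown in isolation (illustrative class name; the real compactor's other constructor parameters are omitted here):

```java
public class CompactorCtorDemo {
    final int taskWindowSize;

    // Proposed overload: caller-supplied window size, validated to >= 1.
    CompactorCtorDemo(int taskWindowSize) {
        if (taskWindowSize < 1) {
            throw new IllegalArgumentException("taskWindowSize must be >= 1");
        }
        this.taskWindowSize = taskWindowSize;
    }

    // Existing-style constructor: behaviour unchanged, delegates with the
    // availableProcessors() default, so current callers compile and run as before.
    CompactorCtorDemo() {
        this(Runtime.getRuntime().availableProcessors());
    }
}
```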

Why not just reduce the ForkJoinPool parallelism?

The ForkJoinPool is passed in by HerdDB (PhysicalCoreExecutor.pool()) and is
shared across many subsystems. Reducing its parallelism to limit compactor RAM
would throttle unrelated work. A dedicated taskWindowSize cap on the
compactor's sliding window is a cleaner, surgical control.

Suggested default behaviour (no change)

taskWindowSize = Runtime.getRuntime().availableProcessors() — identical to
today; no existing caller is affected.
