100 changes: 64 additions & 36 deletions docs/CONCURRENCY.md

drift uses Zig 0.16's `std.Io` interface throughout. Every file read, subprocess spawn, and path resolution flows through an `Io` instance threaded from `std.process.Init` in `main` down through `CommandContext` to every command.

This design makes it straightforward to parallelize CPU- and I/O-bound work without changing the control flow of individual functions. The work below is all implemented in `src/commands/lint.zig`.

## Layer 1: per-doc parallelism (`Io.Group`)

`drift check` wraps the per-doc loop in `src/commands/lint.zig` in an `Io.Group`. Each doc's binding checks (file read → tree-sitter parse → hash → optional `git log` for blame) run as independent tasks on the thread pool backing `Io.Threaded`.

Key design constraints:

- **Task-local `CommandContext`.** Each task builds its own `std.heap.ArenaAllocator` (child of `run_arena`) so `ctx.scratch()` / `ctx.resetScratch()` inside `checkBinding` continue to work unchanged.
- **Task-local `FileCache`.** `std.StringHashMap` is not thread-safe, so each task gets its own cache. Docs rarely reference the same source files, so the lost hit rate is small and not worth a mutex.
- **Pre-allocated result slots.** `results: []?DocCheckResult` is allocated once from `run_arena`. Tasks write their own slot; the main thread merges in doc-order (docs are already sorted by `discoverDocGroups`) so output stays deterministic.
- **Error handling.** `Io.Group.async` tasks cannot propagate errors back to the caller. A `checkOneDoc` wrapper catches errors and stores them as `error_message: ?[]const u8` on the result. The main thread prints and translates to `error.LintCheckFailed` during merge.
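Under these constraints, the fan-out can be sketched roughly as follows. Names like `checkOneDoc`, `results`, and `run_arena` mirror the doc above; the exact `Io.Group` call signatures and the `reportAndFail` helper are illustrative, not copied from the source:

```zig
// Sketch only: mirrors the design above, not the exact source.
var group: Io.Group = .init;
defer group.cancel(io); // tear down any tasks still running on error paths

// Pre-allocated result slots; each task writes only its own index.
const results = try run_arena.allocator().alloc(?DocCheckResult, docs.len);
@memset(results, null);

for (docs, 0..) |doc, i| {
    // checkOneDoc catches its own errors and records them on the result,
    // since Io.Group tasks cannot propagate errors back to the caller.
    group.async(io, checkOneDoc, .{ ctx, io, doc, &results[i] });
}
group.await(io);

// Merge in doc-order (docs are pre-sorted), so output stays deterministic.
for (results) |slot| {
    const r = slot orelse continue;
    if (r.error_message) |msg| return reportAndFail(msg);
}
```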

## Layer 2: per-binding parallel blame (`io.async`)

`checkBinding` is CPU-bound (tree-sitter parse + fingerprint) and no longer shells out to `git log` for blame; instead it reports `.result = .stale, .reason_code = .changed_after_baseline` with `blame = null`. A second phase inside `checkOneDocInner` then iterates over the anchor rows, firing `io.async(vcs.getLatestBlameInfo, …)` for each `changed_after_baseline` entry and collecting the futures; a follow-up loop awaits each future and fills in the blame info.

Two properties matter:

- **No wasted work on fresh anchors.** Blame queries are only fired after we know a binding is stale, so the common case (all anchors fresh) pays nothing extra. An earlier proposal to fire blame *speculatively* before the staleness check was rejected because `git log -1` is ~20 ms — more than the ~5 ms staleness check — so speculation would slow down the common case.
- **Scales with stale count.** A doc with 5 stale anchors previously ran 5 sequential `git log` calls (~100 ms). Now they execute concurrently on the thread pool (~25 ms on a 4-core machine).

Subprocess buffers for blame queries use `run_arena` (thread-safe, lives until the run ends) rather than the task-local scratch arena, because the task-local scratch is reset between bindings in phase 1 and the blame futures outlive that reset.
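A minimal sketch of the two-phase shape, assuming the anchor row carries an optional `blame_future` field (that field and the argument list are illustrative):

```zig
// Phase 2: fire one blame future per stale anchor. Buffers come from
// run_arena because these futures outlive the per-binding scratch resets.
for (rows) |*row| {
    if (row.reason_code != .changed_after_baseline) continue;
    row.blame_future = io.async(vcs.getLatestBlameInfo, .{
        run_arena.allocator(), repo_root, row.file_path,
    });
}

// Follow-up loop: await each future and fill in the blame info.
for (rows) |*row| {
    if (row.blame_future) |*future| {
        row.blame = future.await(io) catch null;
    }
}
```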

## Layer 3: per-link parallel existence checks (`Io.Group`)

`checkDocLinks` previously walked each markdown link serially, doing `realPathFileAlloc` + `accessAbsolute` per link. Now each link is a task in a per-doc `Io.Group`; results fill a pre-allocated `?JsonLinkRow` slot array that's stable across parallel writes. After `group.await`, slots are appended to `out` in doc-order so output is deterministic.

For docs with dense cross-links this collapses N sequential stats into a single pool round-trip.
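The per-link group follows the same slot pattern as layer 1. A sketch, with `checkOneLink` as an illustrative task name:

```zig
// Stable slot array: parallel tasks each write one index, never append.
const slots = try arena.alloc(?JsonLinkRow, parsed.links.items.len);
@memset(slots, null);

var group: Io.Group = .init;
defer group.cancel(io);
for (parsed.links.items, 0..) |link, i| {
    group.async(io, checkOneLink, .{ ctx, io, doc_dir, link, &slots[i] });
}
group.await(io);

// Append in link order so the output stays deterministic.
for (slots) |slot| if (slot) |row| try out.append(row);
```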

**Note on `Io.Batch`.** An earlier proposal used `Io.Batch`, which is the lower-level primitive below `Io.Group`. As of Zig 0.16, `Io.Batch.Operation` only supports `file_read_streaming`, `file_write_streaming`, `device_io_control`, and `net_receive` — no stat/access operation. If stdlib adds one, link checks could migrate to `Io.Batch` for better coalescing under `Io.Uring` specifically. On `Io.Threaded` (current backend) `Io.Group` is functionally equivalent.

## Layer 4: overlapped startup (`io.async`)

At the top of `lint.run`, `discoverDocGroups` (shells out to `git ls-files`) and `vcs.getRepoIdentity` (shells out to `git remote get-url origin`) are both independent subprocess calls. `getRepoIdentity` is now spawned as an `io.async` future before `discoverDocGroups` runs; the two `git` processes execute in parallel and we await the identity right after ls-files returns.

Saves a few ms per run. Small absolute win, but free — both calls are already `io`-aware.
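The overlap is only a few lines. This mirrors the shape of the actual call sites; argument lists are abbreviated:

```zig
// Kick off the identity query, then do the ls-files work while it runs.
var identity_future = io.async(vcs.getRepoIdentity, .{ ctx.run_arena, ctx.scratch(), cwd_path });
var doc_groups = try discoverDocGroups(ctx, io, cwd_path);
// Identity is optional (e.g. no `origin` remote), so failure degrades to null.
const repo_identity = identity_future.await(io) catch null;
```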

## Measurements

Benchmarks via `hyperfine --warmup 3 --runs 20+` on an M-series Mac, `-Doptimize=ReleaseSafe` builds at the three points: sequential (main before #24), layer 1 only (main after #24), all four layers (after #26).

### Small repo — drift itself (10 docs, 19 anchors, ~6 links)

Fresh steady-state (no stale anchors):

| Variant | Mean | vs sequential |
|---|---:|---:|
| sequential | 64.6 ± 2.4 ms | 1.00× |
| layer 1 only | 34.6 ± 3.4 ms | 1.87× |
| layers 1-4 | 26.4 ± 1.5 ms | **2.45×** |

All 19 anchors forced stale (worst-case blame load):

| Variant | Mean | vs sequential |
|---|---:|---:|
| sequential | ~280 ms | 1.00× |
| layer 1 only | 103.3 ± 3.2 ms | ~2.7× |
| layers 1-4 | 78.3 ± 12.7 ms | **~3.6×** |

### Large repo — nocturne monorepo (247 docs, 107 anchors)

Fresh steady-state:

| Variant | Mean | vs sequential |
|---|---:|---:|
| sequential | 1.426 ± 0.036 s | 1.00× |
| layer 1 only | 468.6 ± 65.0 ms | 3.04× |
| layers 1-4 | 424.3 ± 35.0 ms | **3.36×** |

All 107 anchors forced stale:

| Variant | Mean | vs sequential |
|---|---:|---:|
| sequential | 2.876 ± 0.053 s | 1.00× |
| layer 1 only | 914.7 ± 132.8 ms | 3.14× |
| layers 1-4 | 672.1 ± 26.9 ms | **4.28×** |

### Reading the numbers

- **Layer 1 is the largest win** and scales with doc count: 1.87× on a 10-doc repo, 3.04× on a 247-doc repo.
- **Layer 2 (parallel blame) is a 1.32–1.36× multiplier when many anchors are stale** (`git log` runs serialize inside a doc before layer 2, parallelize after). On fresh runs it costs nothing because blame is only fired once staleness is confirmed — not speculatively.
- **Layer 3 (parallel link checks)** fires independently of staleness; on drift's repo with 6 links it's in the noise, while on nocturne with 112 links it accounts for part of the extra 1.1× over layer 1 alone in the fresh case (468.6 ms → 424.3 ms).
- **Layer 4 (overlapped startup)** saves one `git remote` subprocess latency (~10–20 ms) and is free.

Combined, on the large-repo stale worst case, the pipeline went from **2.88 s to 672 ms (4.28×)**.

## What not to parallelize


## Backends

All of the above runs unchanged on any `Io` implementation. Today drift uses `Io.Threaded` (the only feature-complete backend in 0.16). `Io.Uring` (Linux), `Io.Kqueue` (BSD/macOS), and `Io.Dispatch` (macOS) are proof-of-concept in 0.16 and will become interesting for layer 3 specifically once `Io.Batch` grows a stat operation.

## Proposed but not implemented

**Rework `GitCatFile` with `Io.Group`.** `src/vcs.zig` keeps a persistent `git cat-file --batch` process alive to avoid spawn overhead for historical file reads. Running request submission and response parsing as two concurrent tasks inside an `Io.Group` would let the next request go out while the current response is still being read — useful only if we shift to a use case that issues many queries, which today we don't.
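If this ever gets picked up, the shape 0.16 permits might look roughly like the following; `sendRequests` and `readResponses` are hypothetical helpers, not existing code:

```zig
// Hypothetical: writer and reader run concurrently over the persistent pipe,
// so the next request goes out while the current response is still being read.
var group: Io.Group = .init;
defer group.cancel(io);
group.async(io, sendRequests, .{ io, cat_file.stdin, pending_queries });
group.async(io, readResponses, .{ io, cat_file.stdout, completed_results });
group.await(io);
```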