diff --git a/docs/CONCURRENCY.md b/docs/CONCURRENCY.md index 59aa7a4..7736165 100644 --- a/docs/CONCURRENCY.md +++ b/docs/CONCURRENCY.md @@ -2,68 +2,92 @@ drift uses Zig 0.16's `std.Io` interface throughout. Every file read, subprocess spawn, and path resolution flows through an `Io` instance threaded from `std.process.Init` in `main` down through `CommandContext` to every command. -This design makes it straightforward to parallelize CPU- and I/O-bound work without changing the control flow of individual functions. Below is what's done today and what remains as viable follow-up. +This design makes it straightforward to parallelize CPU- and I/O-bound work without changing the control flow of individual functions. The work below is all implemented in `src/commands/lint.zig`. -## Implemented: per-doc parallelism (`Io.Group`) +## Layer 1: per-doc parallelism (`Io.Group`) `drift check` wraps the per-doc loop in `src/commands/lint.zig` in an `Io.Group`. Each doc's binding checks (file read → tree-sitter parse → hash → optional `git log` for blame) run as independent tasks on the thread pool backing `Io.Threaded`. Key design constraints: -- **Task-local `CommandContext`.** Each task builds its own `std.heap.ArenaAllocator` (child of `run_arena`) so `ctx.scratch()` / `ctx.resetScratch()` inside `checkBinding`, `checkDocLinks`, and `classifyRelativeLink` continue to work unchanged. +- **Task-local `CommandContext`.** Each task builds its own `std.heap.ArenaAllocator` (child of `run_arena`) so `ctx.scratch()` / `ctx.resetScratch()` inside `checkBinding` continue to work unchanged. - **Task-local `FileCache`.** `std.StringHashMap` is not thread-safe. Each task gets its own cache. Docs rarely share files with one another, so the lost hit rate is small and not worth a mutex. - **Pre-allocated result slots.** `results: []?DocCheckResult` is allocated once from `run_arena`.
Tasks write their own slot; the main thread merges in doc-order (docs are already sorted by `discoverDocGroups`) so output stays deterministic. - **Error handling.** `Io.Group.async` tasks cannot propagate errors back to the caller. A `checkOneDoc` wrapper catches errors and stores them as `error_message: ?[]const u8` on the result. The main thread prints and translates to `error.LintCheckFailed` during merge. -Local measurement on the drift repo itself: ~0.40 s → ~0.14 s (≈3×). +## Layer 2: per-binding parallel blame (`io.async`) -## Proposed: speculative blame via `io.async` +`checkBinding` is CPU-bound (tree-sitter parse + fingerprint). It no longer shells out to `git log` for blame; instead it reports `.result = .stale, .reason_code = .changed_after_baseline` with `blame = null`. A second phase inside `checkOneDocInner` iterates over the anchor rows, and for each `changed_after_baseline` entry fires `io.async(vcs.getLatestBlameInfo, …)`, collecting the futures. A follow-up loop awaits each future and fills in the blame info. -When a binding turns stale, `checkBinding` shells out to `git log` for blame info (`vcs.getLatestBlameInfo`). Today this is serial inside each task. Starting the blame query speculatively *before* the fingerprint comparison finishes removes its latency from the stale path: +Two properties matter: -```zig -var blame_future = io.async(vcs.getLatestBlameInfo, .{ ... }); -defer if (blame_future.cancel(io)) |_| {} else |_| {}; +- **No wasted work on fresh anchors.** Blame queries are only fired after we know a binding is stale, so the common case (all anchors fresh) pays nothing extra. An earlier proposal to fire blame *speculatively* before the staleness check was rejected because `git log -1` is ~20 ms — more than the ~5 ms staleness check — so speculation would slow down the common case. +- **Scales with stale count.** A doc with 5 stale anchors previously ran 5 sequential `git log` calls (~100 ms). 
Now they execute concurrently on the thread pool (~25 ms on a 4-core machine). -// ... compute fingerprint ... -if (is_fresh) return .{ .result = .fresh, ... }; // cancel fires on defer -const blame = try blame_future.await(io); -return .{ .result = .stale, .blame = blame, ... }; -``` + +Subprocess buffers for blame queries use `run_arena` (thread-safe, lives until the run ends) rather than the task-local scratch arena, because the task-local scratch is reset between bindings in phase 1 and the blame futures outlive that reset. -`io.async` is infallible on `Io.Threaded` (it runs inline on `Io.failing`), so the code reads the same on both backends. Cost: wasted work on fresh anchors. A `git log -1` against a single path is cheap; the tradeoff favours the stale path since that's what blocks the user. +## Layer 3: per-link parallel existence checks (`Io.Group`) -## Proposed: `Io.Batch` for link-existence checks +`checkDocLinks` previously walked each markdown link serially, doing `realPathFileAlloc` + `accessAbsolute` per link. Now each link is a task in a per-doc `Io.Group`; results fill a pre-allocated `?JsonLinkRow` slot array that's stable across parallel writes. After `group.await`, slots are appended to `out` in the order the links appear in the doc, so output is deterministic. -`checkDocLinks` in `src/commands/lint.zig` resolves each markdown link in a doc one at a time and calls `pathExists` (`accessAbsolute`) on each target. For docs with dense cross-links this is many sequential stat syscalls. +For docs with dense cross-links this collapses N sequential stats into a single pool round-trip. -`Io.Batch` is the low-level batching primitive (one layer below `Io.Group`) and supports stat-style operations: +**Note on `Io.Batch`.** An earlier proposal used `Io.Batch`, which is the lower-level primitive below `Io.Group`. As of Zig 0.16, `Io.Batch.Operation` only supports `file_read_streaming`, `file_write_streaming`, `device_io_control`, and `net_receive` — no stat/access operation.
If stdlib adds one, link checks could migrate to `Io.Batch` for better coalescing under `Io.Uring` specifically. On `Io.Threaded` (current backend) `Io.Group` is functionally equivalent. -```zig -var batch: Io.Batch = .init; -for (parsed.links.items) |link| batch.stat(io, absolute); -try batch.await(io); -``` +## Layer 4: overlapped startup (`io.async`) -On Linux with `Io.Uring` this becomes a single submission; on `Io.Threaded` it falls back to parallel thread-pool stats. Modest win on dense docs, larger win under `Io.Uring` once that backend stabilizes. +At the top of `lint.run`, `discoverDocGroups` (shells out to `git ls-files`) and `vcs.getRepoIdentity` (shells out to `git remote get-url origin`) are both independent subprocess calls. `getRepoIdentity` is now spawned as an `io.async` future before `discoverDocGroups` runs; the two `git` processes execute in parallel and we await the identity right after ls-files returns. -## Proposed: overlap startup shell-outs +Saves a few ms per run. Small absolute win, but free — both calls are already `io`-aware. -At the top of `lint.run`, `discoverDocGroups` calls `git ls-files` and `getRepoIdentity` calls `git remote get-url origin` back-to-back. Both take separate allocators and are trivially independent: +## Measurements -```zig -var identity_future = io.async(vcs.getRepoIdentity, .{ ctx.run_arena, ctx.scratch(), cwd_path }); -var doc_groups = try discoverDocGroups(...); -const repo_identity = identity_future.await(io) catch null; -``` +Benchmarks via `hyperfine --warmup 3` (20+ runs per variant) on an M-series Mac, `-Doptimize=ReleaseSafe` builds at the three points: sequential (main before #24), layer 1 only (main after #24), all four layers (after #26).
+### Small repo — drift itself (10 docs, 19 anchors, ~6 links) + +Fresh steady-state (no stale anchors): + +| Variant | Mean | vs sequential | +|---|---:|---:| +| sequential | 64.6 ± 2.4 ms | 1.00× | +| layer 1 only | 34.6 ± 3.4 ms | 1.87× | +| layers 1-4 | 26.4 ± 1.5 ms | **2.45×** | + +All 19 anchors forced stale (worst-case blame load): + +| Variant | Mean | vs sequential | +|---|---:|---:| +| sequential | ~280 ms | 1.00× | +| layer 1 only | 103.3 ± 3.2 ms | ~2.7× | +| layers 1-4 | 78.3 ± 12.7 ms | **~3.6×** | + +### Large repo — nocturne monorepo (247 docs, 107 anchors) -## Proposed: rework `GitCatFile` with `Io.Group` +Fresh steady-state: -`GitCatFile` (`src/vcs.zig`) keeps a persistent `git cat-file --batch` process alive to avoid spawn overhead for historical file reads. Today request/response is synchronous. Running request submission and response parsing as two concurrent tasks inside an `Io.Group` would let the next request go out while the current response is still being read — useful only if we shift to a use case that issues many queries, which today we don't. +| Variant | Mean | vs sequential | +|---|---:|---:| +| sequential | 1.426 ± 0.036 s | 1.00× | +| layer 1 only | 468.6 ± 65.0 ms | 3.04× | +| layers 1-4 | 424.3 ± 35.0 ms | **3.36×** | -Noting it as a shape that 0.16 now permits; not a priority. +All 107 anchors forced stale: + +| Variant | Mean | vs sequential | +|---|---:|---:| +| sequential | 2.876 ± 0.053 s | 1.00× | +| layer 1 only | 914.7 ± 132.8 ms | 3.14× | +| layers 1-4 | 672.1 ± 26.9 ms | **4.28×** | + +### Reading the numbers + +- **Layer 1 is the largest win** and scales with doc count: 1.87× on a 10-doc repo, 3.04× on a 247-doc repo. +- **Layer 2 (parallel blame) is a 1.32–1.36× multiplier when many anchors are stale** (`git log` runs serialize inside a doc before layer 2, parallelize after). On fresh runs it costs nothing because blame is only fired once staleness is confirmed — not speculatively. 
+- **Layer 3 (parallel link checks)** fires independently of staleness; on drift's repo with 6 links it's in the noise, on nocturne with 112 links it accounts for part of the 1.1× on top of layer 1 in the fresh case. +- **Layer 4 (overlapped startup)** saves one `git remote` subprocess latency (~10–20 ms) and is free. + +Combined, on the large-repo stale worst case, the pipeline went from **2.88 s to 672 ms (4.28×)**. ## What not to parallelize @@ -73,4 +97,8 @@ Noting it as a shape that 0.16 now permits; not a priority. ## Backends -All of the above runs unchanged on any `Io` implementation. Today drift uses `Io.Threaded` (the only feature-complete backend in 0.16). `Io.Uring` (Linux), `Io.Kqueue` (BSD/macOS), and `Io.Dispatch` (macOS) are proof-of-concept in 0.16 and will become interesting for `Io.Batch` work once they stabilize. +All of the above runs unchanged on any `Io` implementation. Today drift uses `Io.Threaded` (the only feature-complete backend in 0.16). `Io.Uring` (Linux), `Io.Kqueue` (BSD/macOS), and `Io.Dispatch` (macOS) are proof-of-concept in 0.16 and will become interesting for layer 3 specifically once `Io.Batch` grows a stat operation. + +## Proposed but not implemented + +**Rework `GitCatFile` with `Io.Group`.** `src/vcs.zig` keeps a persistent `git cat-file --batch` process alive to avoid spawn overhead for historical file reads. Running request submission and response parsing as two concurrent tasks inside an `Io.Group` would let the next request go out while the current response is still being read — useful only if we shift to a use case that issues many queries, which today we don't. 
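+
+A sketch of that shape (`submitRequests` and `collectResponses` are placeholder names; `GitCatFile` has no such split today, it does a synchronous write-then-read per query):
+
+```zig
+// Sketch only, not implemented. One task streams requests into the
+// persistent `git cat-file --batch` stdin while the other parses
+// responses off stdout, so request N+1 is in flight while response N
+// is still being read. Uses the same Io.Group pattern as layer 1/3.
+var group: std.Io.Group = .init;
+defer group.cancel(io);
+group.async(io, submitRequests, .{ &cat_file, io, oids });
+group.async(io, collectResponses, .{ &cat_file, io, results });
+try group.await(io);
+```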
diff --git a/drift.lock b/drift.lock index fb7fbd4..4b896d9 100644 --- a/drift.lock +++ b/drift.lock @@ -3,7 +3,7 @@ CLAUDE.md -> build.zig sig:7194b38f39dbadba CLAUDE.md -> src/main.zig sig:f2735440986d2477 docs/CLI.md -> src/commands/link.zig sig:7bd7f824afc30e0b -docs/CLI.md -> src/commands/lint.zig sig:e29be45c4d84759e +docs/CLI.md -> src/commands/lint.zig sig:b5afc0405907cf64 docs/CLI.md -> src/commands/refs.zig sig:f623b7774086094e docs/CLI.md -> src/commands/status.zig sig:eade166d24a20b81 docs/CLI.md -> src/commands/unlink.zig sig:0dbe1ee3315211b5 diff --git a/src/commands/lint.zig b/src/commands/lint.zig index dbddd7d..43a06e1 100644 --- a/src/commands/lint.zig +++ b/src/commands/lint.zig @@ -173,6 +173,12 @@ pub fn run( const lf = try lockfile.discover(ctx.io, ctx.run_arena, ctx.scratch(), cwd_path); ctx.resetScratch(); + // Kick off the `git remote get-url origin` query concurrently with the + // `git ls-files` run inside `discoverDocGroups`. Both are independent and + // both shell out to git, so the two subprocesses overlap. + var identity_future = ctx.io.async(vcs.getRepoIdentity, .{ ctx.io, ctx.run_arena, ctx.run_arena, cwd_path }); + defer _ = identity_future.cancel(ctx.io); + var doc_groups = try discoverDocGroups(ctx.io, ctx.run_arena, lf.root_path, lf.bindings.items); defer { for (doc_groups.items) |*doc| doc.bindings.deinit(ctx.run_arena); @@ -180,7 +186,7 @@ pub fn run( } const detected_vcs = vcs.detectVcs(); - const repo_identity = vcs.getRepoIdentity(ctx.io, ctx.run_arena, ctx.scratch(), cwd_path); + const repo_identity = identity_future.await(ctx.io); const normalized_changed = if (changed_path) |raw| try normalizeChangedPrefix(ctx, lf.root_path, cwd_path, raw) @@ -314,6 +320,8 @@ fn checkOneDocInner( var stale_count: usize = 0; var skip_count: usize = 0; + // Phase 1: determine staleness per binding. CPU-bound (tree-sitter parse + + // fingerprint). `checkBinding` no longer fetches blame — that's phase 2. 
for (doc.bindings.items) |binding| { task_ctx.resetScratch(); const parsed = target.parse(binding.target); @@ -324,7 +332,7 @@ fn checkOneDocInner( const is_local = if (repo_identity) |ri| std.mem.eql(u8, o, ri) else false; if (!is_local) break :blk AnchorOutcome{ .result = .skip, .reason_code = .origin_mismatch }; } - break :blk checkBinding(task_ctx, root_path, binding, parsed, &file_cache, detected_vcs) catch |err| { + break :blk checkBinding(task_ctx, root_path, binding, parsed, &file_cache) catch |err| { const msg = try std.fmt.allocPrint(run_arena, "error checking {s}: {s}\n", .{ binding.target, @errorName(err) }); doc_result.error_message = msg; out.* = doc_result; @@ -340,6 +348,47 @@ fn checkOneDocInner( } } + // Phase 2: fetch blame in parallel for every `changed_after_baseline` stale + // anchor. Each call shells out to `git log`; running them concurrently cuts + // doc check time on docs with multiple stale anchors from N× to ~1×. + // Previously these were serialized inside `checkBinding`. + const BlameReturn = @typeInfo(@TypeOf(vcs.getLatestBlameInfo)).@"fn".return_type.?; + const BlameJob = struct { + row_index: usize, + future: std.Io.Future(BlameReturn), + }; + var blame_jobs: std.ArrayList(BlameJob) = .empty; + defer blame_jobs.deinit(run_arena); + + for (doc_result.anchors.items, 0..) |row, i| { + if (!std.mem.eql(u8, row.wire.result, "stale")) continue; + const reason = row.wire.reason orelse continue; + if (!std.mem.eql(u8, reason.code, @tagName(ReasonCode.changed_after_baseline))) continue; + + const future = io.async(vcs.getLatestBlameInfo, .{ + io, run_arena, run_arena, root_path, row.wire.path, detected_vcs, + }); + blame_jobs.append(run_arena, .{ .row_index = i, .future = future }) catch { + // If we can't record the future, at least drain it so it doesn't leak. 
+ var drop = future; + _ = drop.cancel(io) catch null; + continue; + }; + } + for (blame_jobs.items) |*job| { + const blame = (job.future.await(io)) catch null; + if (blame) |b| { + const row = &doc_result.anchors.items[job.row_index]; + row.blame_storage = b; + row.wire.blame = .{ + .author = b.author, + .commit = b.commit_hash, + .date = b.date, + .subject = b.subject, + }; + } + } + try checkDocLinks(task_ctx, root_path, doc.path, &file_cache, &doc_result.links); var broken_links: usize = 0; @@ -501,50 +550,65 @@ fn checkDocLinks( var parsed = (try markdown.parseDocument(ctx.run_arena, content)) orelse return; defer parsed.deinit(); - for (parsed.links.items) |link| { - const checked = try classifyRelativeLink(ctx, root_path, doc_path, link.target) orelse continue; - try out.append(ctx.run_arena, .{ - .display_target = checked.display_target, - .wire = .{ - .target = link.target, - .line = link.line, - .result = linkResultStr(if (checked.exists) .ok else .broken), - .reason = if (checked.exists) null else driftReason(.link_target_not_found), - }, + if (parsed.links.items.len == 0) return; + + // Resolve the doc's real path once; every link check needs the same dir. + const raw_absolute_doc = try std.Io.Dir.path.resolve(ctx.run_arena, &.{ root_path, doc_path }); + const real_doc_path = std.Io.Dir.cwd().realPathFileAlloc(ctx.io, raw_absolute_doc, ctx.run_arena) catch raw_absolute_doc; + const doc_dir = std.Io.Dir.path.dirname(real_doc_path) orelse root_path; + + // Parallelize link-existence checks. Each task is a `statFile` syscall + // wrapped in `pathExists`. On docs with many cross-doc links this + // collapses N sequential stats into a single pool round-trip. 
+ const slots = try ctx.run_arena.alloc(?JsonLinkRow, parsed.links.items.len); + @memset(slots, null); + + var group: std.Io.Group = .init; + defer group.cancel(ctx.io); + + for (parsed.links.items, slots) |link, *slot| { + group.async(ctx.io, classifyLinkTask, .{ + ctx.io, ctx.run_arena, root_path, doc_dir, link, slot, }); } -} + try group.await(ctx.io); -const CheckedLink = struct { - display_target: []const u8, - exists: bool, -}; + for (slots) |maybe_row| { + const row = maybe_row orelse continue; + try out.append(ctx.run_arena, row); + } +} -fn classifyRelativeLink( - ctx: CommandContext, +fn classifyLinkTask( + io: std.Io, + run_arena: std.mem.Allocator, root_path: []const u8, - doc_path: []const u8, - raw_target: []const u8, -) !?CheckedLink { - const trimmed = std.mem.trim(u8, raw_target, " \t\r\n"); - if (trimmed.len == 0) return null; - if (trimmed[0] == '#') return null; - if (std.Io.Dir.path.isAbsolute(trimmed)) return null; - if (hasUriScheme(trimmed)) return null; + doc_dir: []const u8, + link: markdown.Link, + slot: *?JsonLinkRow, +) void { + const trimmed = std.mem.trim(u8, link.target, " \t\r\n"); + if (trimmed.len == 0) return; + if (trimmed[0] == '#') return; + if (std.Io.Dir.path.isAbsolute(trimmed)) return; + if (hasUriScheme(trimmed)) return; const path_part = if (std.mem.findScalar(u8, trimmed, '#')) |idx| trimmed[0..idx] else trimmed; - if (path_part.len == 0) return null; + if (path_part.len == 0) return; - // Resolve symlinks on the doc path so relative links are computed from the real location. 
- const raw_absolute_doc = try std.Io.Dir.path.resolve(ctx.scratch(), &.{ root_path, doc_path }); - const real_doc_path = std.Io.Dir.cwd().realPathFileAlloc(ctx.io, raw_absolute_doc, ctx.scratch()) catch raw_absolute_doc; - const doc_dir = std.Io.Dir.path.dirname(real_doc_path) orelse root_path; - const absolute = try std.Io.Dir.path.resolve(ctx.scratch(), &.{ doc_dir, path_part }); - const relative = try std.Io.Dir.path.relative(ctx.run_arena, "", null, root_path, absolute); - const exists = pathExists(ctx.io, absolute); - ctx.resetScratch(); + const absolute = std.Io.Dir.path.resolve(run_arena, &.{ doc_dir, path_part }) catch return; + const relative = std.Io.Dir.path.relative(run_arena, "", null, root_path, absolute) catch return; + const exists = pathExists(io, absolute); - return .{ .display_target = relative, .exists = exists }; + slot.* = .{ + .display_target = relative, + .wire = .{ + .target = link.target, + .line = link.line, + .result = linkResultStr(if (exists) .ok else .broken), + .reason = if (exists) null else driftReason(.link_target_not_found), + }, + }; } fn hasUriScheme(target_text: []const u8) bool { @@ -595,7 +659,6 @@ fn checkBinding( binding: *const lockfile.Binding, parsed: target.ParsedTarget, file_cache: *FileCache, - detected_vcs: vcs.VcsKind, ) !AnchorOutcome { const sig_hex = binding.fieldValue("sig") orelse return .{ .result = .stale, .reason_code = .baseline_unavailable }; @@ -631,8 +694,10 @@ fn checkBinding( return .{ .result = .fresh, .reason_code = .none }; } - const blame = try vcs.getLatestBlameInfo(ctx.io, ctx.run_arena, ctx.scratch(), root_path, parsed.file_path, detected_vcs); - return .{ .result = .stale, .reason_code = .changed_after_baseline, .blame = blame }; + // Blame lookup happens in a second parallel phase in `checkOneDocInner`. + // Leaving it unset here so multiple stale anchors in the same doc don't + // serialize on one-at-a-time `git log` invocations. 
+ return .{ .result = .stale, .reason_code = .changed_after_baseline }; } fn driftProvenance(sig: ?[]const u8) ?drift_check_v1.Provenance {