diff --git a/docs/design/architecture-north-star-2026-06.md b/docs/design/architecture-north-star-2026-06.md new file mode 100644 index 0000000..7ae71e6 --- /dev/null +++ b/docs/design/architecture-north-star-2026-06.md @@ -0,0 +1,339 @@ +# Architecture north star — the unified vision + +Status: synthesis proposal (2026-06-20). Reads the four +companion proposals together — `distributed-ci-2026-06.md`, +`vx-cloud-2026-06.md`, `extension-protocol-2026-06.md`, +`predictive-execution-2026-06.md` — and answers: _what does vx look +like at the end of this arc, and what does each step buy us?_ + +## 1. The end-state vision (one screen) + +> **vx is the fastest, most open task runner in existence.** +> +> A single binary, OSS, that runs locally with zero infra and scales +> with one config flip into a self-hosted (or hosted) team execution +> and observability platform. Every internal event is on a typed +> serializable bus; every surface — terminal output, web UI, IDE +> plugin, MCP server, CI annotator, cloud uploader — is a subscriber +> on that bus. Tasks are content-addressed, executions are fungible, +> caches are layered (local → remote → speculative), and the +> scheduler learns from history to keep getting faster every week. +> +> One protocol from `vx run` to `vx serve` to `vx cloud`. One +> contract third parties can extend. One trajectory: ship the OSS +> reference impl of every layer first; let the hosted product fund +> the development; let the community own the moat. + +## 2. The architectural spine + +Six layers, each independently rewritable, each with a versioned +contract: + +``` +┌────────────────────────────────────────────────────────────────┐ +│ 6. Surfaces (subscribers) │ +│ Terminal • Web UI • TUI • MCP • Cloud uploader • Plugins │ +└────────────────────────────────────────────────────────────────┘ + ▲ + │ WireEvent + RunState (serializable) + │ +┌────────────────────────────────────────────────────────────────┐ +│ 5. 
Event substrate │ +│ bus + reducer + devframe surface (off-thread capable) │ +└────────────────────────────────────────────────────────────────┘ + ▲ + │ Logger calls + │ +┌────────────────────────────────────────────────────────────────┐ +│ 4. Orchestrator │ +│ run() = prepare → graph → schedule → execute │ +│ • predictive scheduling (history-aware priority) │ +│ • in-flight dedup (shared across submissions) │ +│ • watch + supersede (continuation across input changes) │ +└────────────────────────────────────────────────────────────────┘ + ▲ ▲ + │ │ + RunBackend CacheLayer + │ │ +┌────────────────────────────────────────────────────────────────┐ +│ 3. Execution backends │ +│ localBackend • serviceBackend • coordinator (distributed) │ +└────────────────────────────────────────────────────────────────┘ + │ │ +┌────────────────────────────────────────────────────────────────┐ +│ 2. Cache layers │ +│ local (SQLite + tar.zst) • remote (Turbo wire + HMAC) │ +└────────────────────────────────────────────────────────────────┘ + │ +┌────────────────────────────────────────────────────────────────┐ +│ 1. Exec primitives │ +│ Bun.spawn • sandbox (SRT / bwrap) • env / paths │ +└────────────────────────────────────────────────────────────────┘ +``` + +What ships today: 1, 2, 3 (mostly), 4 (sans predictive), 5 (partial), +6 (terminal + bridge UI + MCP planned). The proposals fill out the +gaps and extend each layer. + +## 3. The decisive design rules (carve them in stone) + +These are what make the whole stack composable. They are NOT +negotiable as the system grows. + +### 3.1 Content addressing is the only identity + +Every task has a hash (the v22 pure-input key). The hash is the +identity. Two tasks with the same hash are interchangeable. A worker +producing artifact `` satisfies every consumer of ``. +Once `` exists in any cache layer, no one re-executes it. 
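A minimal sketch of the identity rule above, assuming a hypothetical in-memory `ContentStore` (the name, the `resolve` method, and the `executions` counter are illustrative only and not part of vx). It is synchronous for brevity; the real orchestrator is asynchronous and also dedups in-flight work. The point is that the hash alone decides whether anything runs:

```typescript
// Hypothetical illustration: the task hash is the only identity.
type Hash = string;
type Artifact = { hash: Hash; bytes: Uint8Array };

class ContentStore {
  private readonly artifacts = new Map<Hash, Artifact>();
  // Counts how many times a producer actually ran.
  executions = 0;

  // Returns the artifact for `hash`, running `produce` only on a miss.
  resolve(hash: Hash, produce: () => Uint8Array): Artifact {
    const hit = this.artifacts.get(hash);
    if (hit !== undefined) return hit; // any prior producer satisfies this consumer
    this.executions += 1;
    const artifact: Artifact = { hash, bytes: produce() };
    this.artifacts.set(hash, artifact);
    return artifact;
  }
}

// Two consumers ask for the same hash; only the first one executes.
const demoStore = new ContentStore();
const first = demoStore.resolve("abc123", () => new Uint8Array([1, 2, 3]));
const second = demoStore.resolve("abc123", () => new Uint8Array([9, 9, 9]));
console.log(demoStore.executions, first === second);
```

The second producer is never invoked, which is the property that lets a remote worker or a remote cache stand in for local execution.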
+ +This is why distributed exec works, why in-flight dedup works, why +remote caches work, why the hosted layer can be appended without +correctness risk. + +### 3.2 The event stream is the protocol + +Every observation goes through the bus. The terminal renderer is a +subscriber. The cloud uploader is a subscriber. The web UI is a +subscriber. The MCP server is a subscriber. We do not add "side +channels" to the orchestrator; we add subscribers. + +This is why extensibility works without API breakage — third parties +hook into a serializable contract, not into orchestrator internals. + +### 3.3 Fail-safe to local, never block + +Every external dependency degrades to local: + +- Remote cache down → cache miss, run continues. +- Coordinator down → fall back to in-process scheduling. +- Cloud uploader down → events queue locally, flush later. +- Subscriber wedged → drop it, run continues. + +The local path is _the_ path. Everything else is overlay. + +### 3.4 Shell is the API for tasks + +We do not define a "task SDK" with magic functions. A task is a +shell command. This is the boundary that lets us run any tool in any +language, sandboxed or not, on any worker, with stable semantics. + +### 3.5 Validate at boundaries, trust the inside + +`valibot` validates wire-deserialized data. In-process code trusts +its inputs. No defensive checks on internal paths; no double +validation. + +## 4. How the proposals compose + +The four companion proposals are not independent — each one's value +multiplies when the others land. The bow-tie diagram: + +``` + vx-cloud (observability) ─┐ + │ + extension-protocol ─┤ + ├── all reuse: event substrate + predictive-execution ─┤ (+ HistoryTable from cloud) + │ + distributed-ci ─┘ +``` + +- **distributed-ci** depends on the event substrate (for streaming + outputs back) and a remote cache (already shipped). It produces + events that the cloud + extensions consume. 
+- **vx-cloud** is _the historical store_ of those events; it powers + the HistoryTable that **predictive-execution** uses. +- **extension-protocol** is how third-party tools consume the cloud + data and the live event stream. +- **predictive-execution** consumes history (from local or cloud) + and feeds it back into scheduling — closing the learning loop. + +Each individually delivers value: + +- **distributed-ci alone** → free Nx-Cloud DTE for OSS users. +- **vx-cloud alone** → Nx Cloud / Turbo dashboards for OSS users. +- **extension-protocol alone** → ecosystem of community tools. +- **predictive-execution alone** → "the only task runner that learns." + +Together: a closed-loop system where every run improves the next run. + +## 5. The execution sequence (what to ship, in what order) + +The ordering that maximizes value-per-week shipped: + +``` +┌─ Wave 1 (foundations): WHAT'S ALREADY SHIPPED +│ • RunBackend + serviceBackend ✓ +│ • Event bus + busLogger + RunState reducer ✓ +│ • Remote prefetch + in-flight dedup ✓ +│ • vx serve + vx dev hub ✓ +│ • Distributed cache (Turbo-wire-compatible + HMAC) ✓ +│ +├─ Wave 2 (THE NEXT 4 WEEKS): +│ • Predictive scheduling Phase A (HistoryTable revival) +│ └─ delivers immediately: vx info shows history +│ • Extension SDK Phase A (@vzn/vx-client TS) +│ └─ delivers immediately: subscribers work +│ • vx insights Phase A (local SPA over cache.db) +│ └─ delivers immediately: the deleted dashboard revived +│ on top of the substrate that makes it not crash +│ +├─ Wave 3 (THE NEXT 6 WEEKS): +│ • Predictive Phase B (critical-path-from-history scheduler) +│ • Distributed CI Phase A-B (coordinator + multi-worker) +│ • Plugin API Phase D (defineWorkspace.plugins) +│ • MCP server (the agent surface, already roadmapped) +│ +├─ Wave 4 (THE FOLLOWING QUARTER): +│ • Distributed CI Phase C-D (GitHub Actions composite, +│ capability labels, critical-path priority) +│ • vx cloud Phase B-C (data model + self-hosted backend) +│ • Predictive Phase 
D-E (bandit retry, regression detection) +│ • Extension Phase B-C-E (RPCs, drivers, ref plugins) +│ +└─ Wave 5 (THE LONG ARC): + • vx cloud Phase D-E (multi-tenant, hosted SaaS) + • Distributed CI Phase E (signed manifests, sparse-clone workers) + • Hosted execution (untrusted-worker model) + • Web SPA (full devtool, replaces bridge mode) +``` + +The sequencing principle: **each wave delivers user-visible value +and unblocks the next wave.** Wave 2 is the lightest lift with the +highest immediate payoff — HistoryTable revival, the TS SDK, and a +local-only insights UI all leverage existing primitives. + +## 6. The performance commitments + +What we promise users: + +1. **Cold runs**: ≤ Turbo on every reasonable workload (already + true on the 300-pkg benchmark). Maintain forever. +2. **Warm runs**: ≤ 200ms summary-printed for a 1000-pkg full-cache + run. Already true; maintain. +3. **CI scale-out**: 8-way matrix completes an N-task graph in + `T(serial) / min(8, P)` time where P is the critical path. + Requires Wave 4. +4. **Per-week speedup**: a project on vx with the predictive + scheduler enabled gets 5-15% faster over 4 weeks of usage, + without user changes. Requires Wave 2-3. +5. **Cache hit p50 latency**: ≤ 5ms local, ≤ 50ms remote. Already + the bar; maintain. + +These commitments go into `docs/comparison.md` as the **headline +table** at the top of the doc. + +## 7. The DX commitments + +What we promise users: + +1. **Zero-install onboarding**: `bunx vx migrate` in any Turbo or Nx + monorepo emits a working `vx.config.ts` + report. Already true. +2. **One-flag distributed**: `vx run --coordinator ` is the + _only_ knob needed to go distributed. Workers join with `vx run +--worker `. No YAML, no orchestration files. +3. **Live insights**: `vx insights` opens a browser to a UI of the + user's runs. No login, no upload, no cloud account. +4. **Optional everything**: cloud, hosted, distributed, predictive, + extensions — every one is opt-in. 
The local-only path stays + stable forever. +5. **Agent-native**: an LLM can run `vx mcp` and use typed tools + instead of parsing terminal output. + +## 8. The openness commitments + +What we promise _the ecosystem_: + +1. **All protocols are SemVer-published**: wire schemas live in + `protocol.ts`, validated by `valibot`, versioned. +2. **OSS reference impl for every layer**: cache server, cloud + backend, coordinator, worker — all shipped from this repo. +3. **No vendor lock-in**: Turbo-wire-compatible cache means a team + on Turbo can use our cache; a team on vx can use Turbo's cache. +4. **Plugin API is part of the public contract**: not "extensions" + that break in 6 weeks. +5. **No hosted-only features**: if it ships on the SaaS, it ships + in the self-hosted binary at the same version. + +## 9. The competitive picture + +| Capability | Turbo | Nx (OSS) | Nx Cloud | vx (today) | vx (north star) | +| ----------------------------- | -------- | -------- | -------- | ---------- | --------------- | +| Local task graph + cache | ✓ | ✓ | ✓ | ✓ | ✓ | +| Remote cache | ✓ | ✓ | ✓ | ✓ | ✓ | +| Distributed CI execution | ✗ | ✗ | ✓ (paid) | partial | ✓ (OSS) | +| Web analytics | ✓ (paid) | ✗ | ✓ (paid) | ✗ | ✓ (OSS) | +| Self-hostable analytics | ✗ | ✗ | ✗ | ✗ | ✓ | +| Predictive scheduling | ✗ | ✗ | ✗ | ✗ | ✓ | +| Public extension protocol | ✗ | partial | ✗ | ✗ | ✓ | +| Agent-native (MCP) | ✗ | ✗ | ✗ | planned | ✓ | +| Wire interop with competitors | ✗ | ✗ | ✗ | ✓ (Turbo) | ✓ | +| Bun runtime (fast) | ✗ | ✗ | ✗ | ✓ | ✓ | + +The north star is a strict superset. + +## 10. The risk picture (honest) + +Three risks materially above zero: + +### 10.1 Scope blowout + +The single biggest threat. The proposals collectively describe +~6 months of work for a team. They're _individually_ shippable; the +risk is dilution — work in flight on too many fronts. Mitigation: +**Wave 2 first**, validate the substrate with HistoryTable + SDK + +local insights before any cloud work begins. 
Don't open multiple +big fronts. + +### 10.2 The hosted business + +Hosted requires real engineering (auth, billing, multi-tenancy) AND +real ops (uptime, on-call, support). We can ship the self-hosted +binary as Wave 4-5 and _defer_ hosted indefinitely. The OSS story +stands alone. + +### 10.3 Plugin compat over time + +Every API we expose becomes an obligation. Mitigation: SemVer the +wire. Major breaks are allowed at major-version boundaries; we +ship a `vx upgrade` tool that helps migrate. Same posture as Turbo +between 1.x and 2.x. + +## 11. The non-goals (still) + +Things we explicitly do not do: + +- A general-purpose CI system. (We're a task runner; CI providers + drive us.) +- A package manager. (Bun + pnpm cover this.) +- A language-level build tool. (esbuild + bun build cover JS; + cargo/tsc/etc. cover the rest. We orchestrate them.) +- A code editor. (Our IDE story is plugins consuming the wire.) +- A scheduler for non-vx workloads. (Tasks are vx tasks.) + +The discipline of not building these is what makes the system +coherent. + +## 12. Recap — what we're really building + +We're not "another task runner." We're building **the substrate for a +fast, observable, learning, distributed build system, with the OSS +reference impl of every layer**. The path: + +1. The orchestrator is content-addressed and event-driven. +2. The event stream is the protocol. +3. Every surface is a subscriber on that protocol. +4. History closes the loop — the scheduler learns from itself. +5. Distributed execution is the same protocol on more workers. +6. Cloud observability is the same protocol persisted. +7. Extensions are first-class consumers, not afterthoughts. +8. Hosted is convenience; OSS is the product. + +The order matters. The substrate has to exist before the surfaces +can ship. We've shipped most of the substrate; now we ship the +surfaces. 
Each one — local insights, distributed CI, predictive +scheduling, the SDK, the cloud — is a flagship feature that lands on +top of the same plumbing. + +That's the next 6 months. Then the system funds itself. diff --git a/docs/design/architecture-review-2026-06.md b/docs/design/architecture-review-2026-06.md new file mode 100644 index 0000000..694ccaf --- /dev/null +++ b/docs/design/architecture-review-2026-06.md @@ -0,0 +1,533 @@ +# Architecture review — sharpening the five proposals + +Status: review pass (2026-06-20). Reads the five north-star proposals +(`architecture-north-star-2026-06.md`, `distributed-ci-2026-06.md`, +`vx-cloud-2026-06.md`, `extension-protocol-2026-06.md`, +`predictive-execution-2026-06.md`) against (a) two parallel research +passes on third-party tooling and (b) the cross-doc duplication + +contract-gap matrix. Answers: what to simplify, what to merge, what +to fix, what to import from outside instead of building, and what to +commit to first. + +## 0. Executive verdict + +The five proposals are individually coherent and compose well. The +review surfaced **two structural simplifications**, **four sharpening +moves**, **three bugs/contradictions**, and **two strategic +imports** (one wire-format consolidation, one storage-abstraction +factoring). One major pivot: **vx Cloud is now Cloudflare-native** +(Workers + R2 + D1 + Durable Objects + Queues), template-spawnable +from `apps/cloud/`. The PostgreSQL+S3 framing in the original doc is +out. 
+ +| # | Proposal | Verdict | Action | +| --- | ----------------------- | ---------------------------------- | --------------------------------------------------------------------- | +| 1 | event-stream (shipped) | ✓ foundation is sound | formalize WireEvent as JSON-RPC 2.0 + OTel-LogRecord-shaped | +| 2 | execution-service | ✓ shipped + dedup landed | rename roles to match distributed-ci | +| 3 | distributed-ci | ✓ feasible, contract needs work | fold `worker:*` into `protocol.ts`; add hybrid-exec racing from Buck2 | +| 4 | vx-cloud | **PIVOT**: now Cloudflare stack | rewrite §4/§8 (done in this pass); template-spawnable | +| 5 | extension-protocol | ✓ feasible; over-scoped at phase F | trim from 7 phases to 3; collapse plugin + subscriber into one model | +| 6 | predictive-execution | ✓ feasible, gated on data | downgrade "default-on" Phase F to "future" | +| 7 | architecture-north-star | ✓ synthesis stands | update with CF pivot + wire-format consolidation | + +The single biggest unlock is consolidating the wire format: **one +envelope (JSON-RPC 2.0), one event shape (OTel LogRecord-flavored), +one transport family (WS for bidir + SSE for read-only + NDJSON for +scripts)**. Everything else — MCP, A2A, devframe, birpc, custom +clients — interops with that. + +The single biggest external import is **OpenTelemetry CI/CD semantic +conventions** for the event stream. Lift it, and every observability +platform (Grafana, Honeycomb, Datadog, Tempo) speaks vx out of the +box. The single biggest internal refactor is **lifting `Digest` +into the cache** — make `(hash, sizeBytes)` the explicit CAS key +type, so storage backends (local FS, R2, S3, REAPI CAS) become +pluggable without orchestrator changes. + +--- + +## 1. Duplications across the five docs (consolidation list) + +Five themes get restated in multiple docs with slightly different +words. 
Each is a candidate for **one canonical paragraph + cross-link +from the others**: + +| Theme | Restated in | Canonical home | +| ----------------------------------------- | ------------------------------------------------------------------------------ | ------------------------------------------- | +| "Everything's a subscriber on the bus" | event-stream §1, vx-cloud §9, extension-protocol §14, north-star §3.2 | **event-stream §1**; others link | +| "Content addressing is the only identity" | distributed-ci §4, north-star §3.1, vx-cloud §5 (cross-org), event-stream §4.2 | **north-star §3.1**; others link | +| "Fail-safe to local, never block" | distributed-ci §6, vx-cloud §11, extension-protocol §10, north-star §3.3 | **north-star §3.3**; others link | +| "Shell is the API for tasks" | extension-protocol §12, north-star §3.4 | **north-star §3.4**; CLAUDE.md already pins | +| WireEvent + RunState description | event-stream §4, extension-protocol §3, vx-cloud §3 | **event-stream §4** is authoritative | + +Action: in a follow-up PR, replace the duplicated paragraphs with a +one-liner + link to canonical. Net negative LOC across the proposal +set; less drift risk. + +--- + +## 2. Contract sharpening (where boundaries leak today) + +### 2.1 distributed-ci `worker:*` messages vs `protocol.ts` + +`distributed-ci-2026-06.md §4` introduces `WorkerToCoord` / +`CoordToWorker` types, but does not say how they relate to the +existing `protocol.ts` `Server|ClientMessage` enum. Without +unification we'd end up with two parallel wire enums. + +**Resolution.** Make the coordinator wire one **extension** of +`protocol.ts`. Three roles share one envelope; the `t` field is +shared namespace. Worker messages prefix with `worker:`, coordinator +messages with `task:` / `coord:`, today's run delegation stays under +`run:` / `event:` / `result:`. One `valibot` schema, three +role-shaped subtypes selected by an initial `hello` message. 
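The shared-namespace idea can be sketched as follows. This is a hedged illustration only: it uses plain TypeScript instead of `valibot` to stay self-contained, and the role names, the `role` field on `hello`, the `Session` class, and message names such as `task:assign` are assumptions, not the contents of `protocol.ts`.

```typescript
// Hypothetical sketch: one envelope, `t` prefixes as namespaces,
// role chosen once by an initial `hello` message.
type Role = "run" | "worker" | "coordinator";
type Envelope = { t: string; [key: string]: unknown };

// Prefixes each role may use after `hello` (assumed mapping).
const NAMESPACES: Record<Role, readonly string[]> = {
  run: ["run:", "event:", "result:"],
  worker: ["worker:"],
  coordinator: ["task:", "coord:"],
};

function isRole(x: unknown): x is Role {
  return x === "run" || x === "worker" || x === "coordinator";
}

// Returns true when `msg.t` falls inside the namespace owned by `role`.
function accepts(role: Role, msg: Envelope): boolean {
  return NAMESPACES[role].some((prefix) => msg.t.startsWith(prefix));
}

class Session {
  private role: Role | null = null;

  // The first message must be a valid `hello`; later messages are checked
  // against the namespace of the role that `hello` selected.
  receive(msg: Envelope): "ok" | "rejected" {
    if (this.role === null) {
      if (msg.t === "hello" && isRole(msg.role)) {
        this.role = msg.role;
        return "ok";
      }
      return "rejected";
    }
    return accepts(this.role, msg) ? "ok" : "rejected";
  }
}

const workerSession = new Session();
console.log(workerSession.receive({ t: "hello", role: "worker" }));
console.log(workerSession.receive({ t: "worker:pull", available: 2 }));
console.log(workerSession.receive({ t: "task:assign" }));
```

In a real implementation the per-role check would be a schema variant rather than a prefix test, but the control flow (one envelope, one `hello`, then role-scoped validation) would look the same.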
+ +### 2.2 extension-protocol channels vs the bus + +extension-protocol §3 introduces `vx:events`, `vx:run`, `vx:rpc`, +`vx:submit` as "channels" but never says how they map to the +transport. Today devframe provides streaming channel + shared state. +After cutting devframe (see §5 below), the mapping needs to be +explicit. + +**Resolution.** Channels are **JSON-RPC 2.0 methods + notifications** +over the chosen transport (WS/SSE/NDJSON): + +| Channel | JSON-RPC method | Direction | +| ----------- | -------------------------------------------------------------- | --------------- | +| `vx:events` | notification `events.append` | server → client | +| `vx:run` | request/response `state.snapshot` + notification `state.patch` | both | +| `vx:rpc` | request/response `` | client → server | +| `vx:submit` | request `submit.run` + stream | client → server | + +This means: **one wire**, four logical surfaces. Today's +`WireEvent` becomes the params of `events.append`; today's +`RunState` becomes the result of `state.snapshot`. No new format, +no new framing — JSON-RPC 2.0 IS the framing. + +### 2.3 vx-cloud `run_events` blob vs WireEvent JSON + +vx-cloud §3 introduces `run_events.event_json TEXT` storing +serialized `WireEvent`s. But `TaskOutcome.wallclockStartNs` is a +bigint, and `JSON.stringify` THROWS on bigints (verified blocker +from event-stream-2026-06.md §4.2). The existing `toWireEvent` +projection already handles this (decimal-string ns). + +**Resolution.** The cloud only persists **`WireEvent` (the projected +form)** — never `RunEvent` raw. Document the rule: the boundary +between in-process and any storage/wire form is `toWireEvent`. The +inverse `fromWireEvent` (live in `wire-render.ts`) rebuilds for +consumers. Two-step pipeline: `RunEvent → toWireEvent → JSON → +storage → JSON → wire-render → terminal`. + +### 2.4 predictive-execution `HistoryTable` data source + +§4 introduces `HistoryTable` but is fuzzy about WHO loads it. For +local runs it's `cache.db`. 
For cloud-aware runs it's an RPC. The +contract should be **one loader interface with two implementations**: + +```ts +type HistoryProvider = { + loadFor(taskIds: string[]): Promise<HistoryTable> +} +``` + +- `LocalHistoryProvider`: runs the SQL CTE the deleted TUI + prototyped. +- `CloudHistoryProvider`: calls `vx:rpc / getTaskHistory` over the + extension protocol (i.e. it acts as an inspector). + +`prepareRun` picks the provider based on the same logic that selects +the cache backend (env var or workspace config). The HistoryTable +interface stays. + +--- + +## 3. Bugs / contradictions to fix in the docs + +### 3.1 distributed-ci `worker:pull` missing identity + +§4 shows `{ t: 'worker:pull'; available: number }`. The coordinator +needs to know which worker is asking. Either the workerId is implicit +from the WS connection (one connection per worker, which is the realistic +case and fine), or it should be on the message. Document the assumption. + +### 3.2 vx-cloud schema assumed PostgreSQL; now D1 + +The schema in §3 used `INTEGER PRIMARY KEY` and `ROW_NUMBER() OVER +(PARTITION BY ...)`. D1 is SQLite, and both are supported. The +`event_json` column, however, was implicitly JSONB-shaped, and D1 does NOT +have native JSONB. Resolution: store it as TEXT and query it with SQLite's +JSON1 extension functions. (These already ship in CF D1; no +schema change needed.) + +### 3.3 predictive-execution Phase F "default-on" promises a perf gain not yet measured + +predictive-execution §11 Phase F says "Default-on for `predictive`. +The improvements are universal enough to make non-opt-in." This is +unfalsifiable today: we have no data. Downgrade to: _"Phase F: +gated on six months of telemetry showing > 5% wall-time improvement +on representative workloads with zero observed regressions; until +then, opt-in via `defineWorkspace({ predictive: true })`."_ + +--- + +## 4. 
Simplifications (collapse parallel concepts) + +### 4.1 In-process plugins ≡ WS subscribers — one Plugin contract + +extension-protocol §5 introduces `Plugin` for in-process and §3 +introduces `subscriber` for over-the-wire. They have **identical +contracts**: receive events, optionally expose RPC methods. + +Collapse to **one** `Plugin` interface that ships in two flavors: + +```ts +type Plugin = { + name: string + setup(ctx: PluginContext): void | Promise +} +// In-process: registered via defineWorkspace({ plugins: [...] }) +// WS: a thin shell that proxies the same lifecycle hooks over JSON-RPC +``` + +The remote-WS Plugin runs as a sandboxed client of `vx serve`; the +in-process Plugin runs in the same Bun process. The lifecycle hooks +(`onRunStart`, `onTaskStart`, …) are identical. Net: one mental +model, less doc. + +### 4.2 Coordinator (distributed-ci) ≡ RunCoordinatorDO (vx-cloud) + +Same role: per-run state holder + ready-queue + WS fan-out. Different +deployment targets (Bun process for self-hosted; Durable Object for +CF cloud). Document the shared contract as a class interface; ship +two implementations. The Durable Object form gets WebSocket +Hibernation (sleeps between events; no cost); the Bun form gets +process-local state. + +### 4.3 Cache layer cluster: lift `Digest` as the explicit key type + +Today `CacheLayer` operates on string hashes. Three of the proposals +add storage backends (R2 for cloud, REAPI CAS for hypothetical Bazel +interop, FS for self-hosted). Cleaner shape (from the Bazel research +import): + +```ts +type Digest = { hash: string; sizeBytes: number } +type CASBackend = { + put(digest: Digest, bytes: ReadableStream): Promise + get(digest: Digest): Promise + has(digest: Digest): Promise +} +// CacheLayer composes a CASBackend + an entries index (D1 or SQLite) +``` + +`sizeBytes` IS the truncation check; we lose nothing by carrying +it. 
The `Cache` becomes `CASBackend(local FS) + entries(SQLite)`; +`RemoteCache` becomes `CASBackend(HTTP)` + remote entries metadata. +Pluggable from day one. + +### 4.4 The seven extension-protocol phases collapse to three + +The original §11 phasing was: A (SDK) → B (RPC) → C (driver) → D +(in-proc plugin) → E (RPC plugin) → F (Python SDK) → G (MCP). +Seven phases is process theatre. Real shape: + +- **Phase 1**: Bus surfaces over JSON-RPC 2.0 (subscriber + inspector + + driver, one wire). MCP adapter ships with it (free, the JSON-RPC + envelope IS the MCP envelope). +- **Phase 2**: in-process Plugin API (Vite-style lifecycle hooks) + via `defineWorkspace({ plugins: [...] })`. Reference impl: the + terminal renderer becomes the first built-in plugin. +- **Phase 3**: language SDKs (TS + Python). Bonus, not load-bearing. + +The original §11 padded out distinct surfaces (inspector/driver/ +subscriber/plugin) that all share one bus, one envelope, one +runtime. Three-phase plan is honest. + +--- + +## 5. 
Third-party adoption decisions (synthesis of both research passes) + +### 5.1 Adopt (3) + +| Tool | What for | Cost | +| -------------------------------------------- | -------------------------------------------- | --------------------------------------------------------------------------------- | +| **`@modelcontextprotocol/sdk`** | `vx mcp` — agents talk to vx as a typed tool | One dep; matches July-2026 stable | +| **Hono** | `vx serve` HTTP + the CF Workers app | One dep; ~14KB; Bun-native + Workers-native | +| **OpenTelemetry CI/CD semantic conventions** | Event stream / span shape | Vendoring the field-name spec; zero runtime dep until a `@vx/otel-bridge` package | + +### 5.2 Inspire (8) + +| Idea | From | Adopt how | +| ----------------------------- | ----------------------------- | ----------------------------------------------------------------------------- | +| **CAS digest model** | Bazel REAPI | Lift `Digest = (hash, sizeBytes)` into cache; see §4.3 | +| **Hybrid execution / racing** | Buck2 | Coordinator races local vs remote workers; first-to-respond wins | +| **JSON-RPC 2.0 envelope** | MCP + A2A + birpc convergence | Single wire envelope for all four extension channels | +| **OTel LogRecord shape** | OTel logs-replacing-events | `WireEvent` fields: `time, severityNumber, body, attributes, traceId, spanId` | +| **Vite plugin lifecycle** | Vite / Rollup | `onRunStart / onTaskStart / onTaskComplete / onCacheLookup / onRunEnd` | +| **BuildBuddy results UI** | BuildBuddy product | Reference for `vx insights` flamegraph + per-run page | +| **Nix narinfo sidecar** | Nix store | Optional signed metadata next to cache artifacts (multi-tenant trust) | +| **A2A protocol shape** | Linux Foundation A2A | Cloud-side run delegation envelope; future | + +### 5.3 Skip (12) + +REAPI on the wire (wrong granularity for our resolved-config key); +BuildBarn / NativeLink as backends (REAPI-only); NATS / JetStream +(adds a sidecar process; our scale doesn't need it); 
Temporal / +Restate (we're seconds-to-minutes, not days); Cap'n Proto / +FlatBuffers / msgpack (JSON+valibot is faster than our event rate +needs); Apache Arrow / Parquet / Postgres in core (SQLite + D1 +holds it); WebTransport (no Safari); WebContainers in core docs +(Bun doesn't run in WC); Effect-TS (system-wide commitment; +overkill for 19 deps); Phoenix LiveView / Electric (wrong runtime); +Sentry / PostHog as our backend (build telemetry isn't their +shape); Buildkite / GHA worker protocol (locked to their control +planes). + +### 5.4 Plan exit (1) + +**devframe.** Useful as a quick first surface; not load-bearing. +Risks identified in the research pass: single-author 0.x, three +rough edges hit in first integration, ~33 packages of closure with +a native TS parser inside. Plan: by 0.6, the bus must work with +**Hono + raw WS** as the default transport, devframe gated behind +an explicit `--devframe` opt-in. One-file removal when the time +comes; the in-process bus is already devframe-agnostic. + +--- + +## 6. The Cloudflare cloud pivot + +The original `vx-cloud-2026-06.md` framed the backend as +"PostgreSQL + S3, with Helm + docker-compose for deploy." That's +gone. The new framing (already applied in this review pass): **the +cloud is a Cloudflare Workers project at `apps/cloud/`, +template-spawnable into any user's CF account in ~5 minutes**. 
+ +Stack mapping: + +| Concern | CF primitive | Why | +| ------------------------ | ------------------- | -------------------------------------------------------------------------- | +| Stateless HTTP | **Workers** | Edge-distributed, scale-to-zero, free tier covers most teams | +| Cache artifacts | **R2** | S3-compatible API, **zero egress fees** — changes the read:write economics | +| Relational (orgs, runs) | **D1** | SQLite at edge; 10GB free per database | +| Per-run state + WS | **Durable Objects** | Stateful actors; WebSocket Hibernation = no $/idle | +| In-flight dedup | **Durable Objects** | One DO per task hash; content-addressed naturally | +| Event ingest buffer | **Queues** | Absorb CI run spikes; batch into D1 | +| Token + flag cache | **KV** | Sub-ms global reads of small hot data | +| External Postgres escape | **Hyperdrive** | When a team outgrows D1; same code, different binding | + +**Why this pivot is structurally important**: the friction floor +for "evaluate vx cloud" drops from "provision Postgres + S3 + +container orchestrator + on-call" to `git clone && bun wrangler +deploy`. The OSS-first promise that the original doc made (the +hosted runtime IS the OSS runtime) is even stronger here — there is +no proprietary glue; the SaaS is one CF account deployment of the +same code, no special configuration. + +**Implication for distributed-ci**: the coordinator gains a third +deployment target — running INSIDE a Durable Object on the user's +own CF account. For "trigger a distributed CI run from any +provider," that's a free coordinator with global reach and no +infrastructure. The existing in-process and `vx serve` coordinator +forms stay. + +**Risk surfaced**: Workers' 30s CPU-time per request cap. Irrelevant +for any single request we make (cache GET/PUT, event ingest, RPC +calls all complete in ms), but **a long-running coordinator must +live in a Durable Object**, not a plain Worker handler. The +architecture handles this; document it. 
+ +--- + +## 7. The wire-format consolidation (most leverage in the smallest PR) + +The current state has THREE event shapes in flight: + +1. `RunEvent` (in-process, has bigints) +2. `WireEvent` (post-projection, JSON-safe) +3. `ServerMessage|ClientMessage` (`protocol.ts` envelope for `vx serve`) + +Plus future surfaces want: + +4. MCP tool result framing (JSON-RPC 2.0) +5. A2A inter-agent envelopes (JSON-RPC 2.0) +6. OTel exporter output (OTLP) +7. devframe channels (currently in vx) + +That's 7 framings. Consolidate to **two** (in-process `RunEvent` for +type fidelity, wire JSON-RPC 2.0 with `WireEvent` as the param +payload). Map every transport to the JSON-RPC envelope: + +- **WS**: `{"jsonrpc":"2.0","method":"events.append","params":{event}}` per frame. +- **SSE**: `event: events.append\ndata: {...}` per message. +- **NDJSON**: one JSON-RPC envelope per line. +- **MCP**: already JSON-RPC 2.0, so direct passthrough. +- **A2A**: already JSON-RPC 2.0, so direct passthrough. +- **OTLP**: one-way adapter in `@vx/otel-bridge` that turns + `events.append` into LogRecord; never in core. + +The payload of `events.append` matches OTel LogRecord shape (per +the research pass: `time, severityNumber, body, attributes, +traceId, spanId`). We get OTel-shape and MCP/A2A interop in one +move. + +**Concrete next step (own design doc)**: write +`docs/design/wire-protocol-2026-06.md` codifying this as one short +doc that pins (a) the JSON-RPC envelope, (b) the LogRecord-shaped +payload, (c) the four channel methods, (d) the three transports. +Then everything else follows mechanically. + +--- + +## 8. Feasibility audit (per-proposal verdict) + +### 8.1 distributed-ci — feasible; CI integration is the risk + +The protocol extension is straightforward (1-2 weeks). The risk is +the **CI integration story** (§7.1 GHA composite). Cross-runner +networking on GHA-hosted runners requires either Tailscale (free +tier, well-known), ngrok/cloudflared (works), or self-hosted +runners (best). 
The composite action needs to handle the tunnel +setup elegantly. Plan: ship Phase A-B (in-process + multi-worker) +WITHOUT the GHA composite first; ship the composite once we know +which tunnel solution survives a year of GHA changes. + +### 8.2 vx-cloud — feasible AFTER the CF pivot; risk dropped substantially + +Pre-pivot the risk was real (we'd be building a distributed system +with Postgres + auth + multi-tenancy + Helm). Post-pivot, the +moving parts are: a few Workers, one D1 schema, one R2 bucket +prefix scheme, two Durable Object classes. Estimated impl size: +1500-2500 LOC over 3-4 weeks for Phases A-C. Risk shifted from +"can we operate this?" to "is CF the right pick?" — and the +**template-spawnable angle inverts the question**: users operate +their own; we operate the SaaS as one instance among many. + +### 8.3 extension-protocol — feasible; Phase 1 is the only commit + +Post-consolidation (§4.4), Phase 1 ships subscriber + inspector + +driver + MCP all at once on the unified JSON-RPC 2.0 envelope. +Estimated impl size: 600-1000 LOC (mostly transport adapters; the +bus already exists). Phase 2 (plugin API) is a `defineWorkspace` +extension. Phase 3 (SDKs) is a community contribution opportunity. + +### 8.4 predictive-execution — feasible; data dependency is the gate + +HistoryTable revival is ~200 LOC of SQL + a memoizing loader. The +scheduler integration is ~100 LOC of priority override. Bandit +retry is ~150 LOC. Regression detection is ~250 LOC. **Total ~800 +LOC across the whole proposal** — the smallest of the five. The +gate is real-world data: we need 4-8 weeks of observed runs before +we can show a wall-time improvement. + +### 8.5 architecture-north-star — feasible as synthesis + +The six-layer spine holds. 
With the CF pivot and the wire-format +consolidation applied, the layer boundaries get sharper: layer 5 +(event substrate) is now formally JSON-RPC 2.0; layer 6 (surfaces) +all consume the same envelope; the cache layer cluster gets a +clean `Digest` + `CASBackend` shape. + +--- + +## 9. Revised wave plan (replaces north-star §5) + +Reordered to reflect (a) the CF pivot collapses cloud-Phase-C-D +work and (b) the wire-format consolidation unblocks four downstream +surfaces in one PR. + +### Wave 1 — Already shipped + +- RunBackend + serviceBackend, event bus + busLogger + RunState + reducer, remote prefetch + in-flight dedup, `vx serve` + `vx dev` + hub, distributed cache (Turbo-wire-compatible + HMAC). + +### Wave 2 — Next 4 weeks (small PRs, high leverage) + +- **`docs/design/wire-protocol-2026-06.md`** — codify JSON-RPC 2.0 + envelope + OTel LogRecord payload shape (1-day doc). +- **`Digest` + `CASBackend` refactor** in `src/cache/` (~200 LOC, + byte-identical behaviour). Unblocks R2 backend later. +- **HistoryTable revival** behind a `vx info --history` flag for + early validation. SQL CTE is already prototyped. +- **Hono migration** of `vx serve` HTTP routes (replaces direct + `Bun.serve` handlers — better SSE/WS ergonomics + readiness for + the CF target). + +### Wave 3 — Weeks 5-8 + +- **`vx insights serve`** — local SPA over `cache.db` (Solid + + UnoCSS + DuckDB-WASM for client-side analytics). Replaces the + cloud-Phase-A surface; ships standalone. +- **`vx mcp`** — JSON-RPC 2.0 / MCP server exposing + `runTasks / getRunState / explainCacheKey / whyDidThisRerun`. +- **Predictive Phase B** — history-aware critical-path priority in + the scheduler, opt-in via `defineWorkspace({ predictive: true })`. + +### Wave 4 — Weeks 9-16 + +- **`apps/cloud/` scaffold** — Wrangler-managed Workers project + with D1 + R2 + DOs + Queue + KV bindings. Reference deploy via + README; documented as the template. 
+- **Distributed CI Phase A-B** — coordinator + multi-worker with + the JSON-RPC envelope (no GHA composite yet). +- **Plugin API** — Vite-style lifecycle hooks on + `defineWorkspace.plugins`; terminal renderer migrated as the + first built-in plugin. + +### Wave 5 — Weeks 17-26 + +- **vx cloud template promotion** — published to + `cloudflare/templates` registry; `npx create-cloudflare vx-cloud` works. +- **Distributed CI Phase C-D** — GHA composite, capability labels, + and critical-path priority. +- **`@vx/otel-bridge`** package — one-way exporter for the OTel CI/CD + conventions. + +### Wave 6 — Long arc (no commitments yet) + +- Hosted SaaS at `cloud.vx.dev` (one CF account deployment of the + template). Trial + paid tiers. +- Signed manifests + sparse-clone workers (distributed-ci §9 trust + model). +- A2A inter-agent envelopes for cloud-side delegation. + +--- + +## 10. Five carved-in-stone rules (revised) + +Replaces north-star §3 with rules that now reflect the imports + +consolidations: + +1. **Content addressing is the only identity.** Every task has a + `(hash, sizeBytes)` Digest. Storage backends are pluggable; the + key is constant. +2. **One envelope, many transports.** JSON-RPC 2.0 frames every + wire message. Transports (WS / SSE / NDJSON / MCP / A2A / + OTLP-bridge) are encoded outside the envelope. +3. **The event stream is the protocol.** OTel-LogRecord-shaped + payloads; the bus is fire-and-forget; consumers handle + backpressure on the consumer side. +4. **Fail-safe to local.** Every external dependency (remote + cache, coordinator, cloud uploader, plugin) degrades to a + local-only run. The local path is THE path. +5. **Shell is the API for tasks.** Plugins observe and submit; + they never redefine what executing a task means. + +(Rule 5 from the old §3 — "validate at boundaries, trust the +inside" — folds into rule 2: the JSON-RPC envelope IS the boundary, +validated once at deserialization.) + +--- + +## 11. 
What this review IS NOT + +- Not a green-light to start every Wave 2 item at once. Pick one + (the wire-protocol doc — it unblocks four downstream surfaces). +- Not a commitment to ship every proposal in the five docs. The + vision frame stands; the implementation order is now realistic. +- Not a rewrite of the proposals. Section-level edits to vx-cloud + (CF pivot) have already landed in this PR; the others get edits + in a follow-up keyed to (a) the wire-protocol doc, (b) the + HistoryTable provider interface, (c) the Plugin contract + collapse. + +The review's deliverable is **this doc + the CF pivot edit to +vx-cloud**. Everything else is incremental, owner-scheduled. diff --git a/docs/design/distributed-ci-2026-06.md b/docs/design/distributed-ci-2026-06.md new file mode 100644 index 0000000..ff5ccf2 --- /dev/null +++ b/docs/design/distributed-ci-2026-06.md @@ -0,0 +1,350 @@ +# Distributed execution on CI — the killer-feature roadmap + +Status: proposal (2026-06-20). Owner ask: "distributed tasks execution +on CI easily." Builds on `execution-service-2026-06.md` (the pluggable +`RunBackend` + `vx serve` foundation) and the cache layer cluster +(local + remote, Turbo-wire-compatible). + +## 1. The one-paragraph pitch + +A `vx run` on CI should saturate the available compute, regardless of +whether that compute is a single 64-core runner or fifty 2-core +runners spread across a matrix. Today every runner re-executes the +same graph in isolation, racing only against the remote cache for +hits. Tomorrow, a CI job posts its graph to a **coordinator**, every +matrix worker registers as an executor, and the coordinator dispatches +ready tasks to the least-loaded worker. Content addressing makes work +**fungible** — any worker producing artifact `` satisfies every +consumer of `` — so the system has no notion of "this task +belongs to this runner." 
The execution graph is one global queue, and +matrix parallelism becomes a perf knob the user dials without +restructuring their pipeline. + +This generalizes the foundation that already exists: `serviceBackend` +already submits a `RunRequest` to an arbitrary origin and receives a +streamed `WireEvent` log + final `RunResult`. We extend it from +"one client, one service" to "many clients submit, many workers +execute, one coordinator routes" — same protocol, same content +addressing, fundamentally more parallelism. + +## 2. Why this matters (vs. Turbo / Nx) + +**Turbo Remote Cache** ships hits but never executes work for you. +A 30-package CI build on Turbo is a 30-package serial-or-shard exercise +on your CI host; the remote cache eliminates redundant _recompilation_ +across runs, never _intra-run_ parallelism beyond `--concurrency`. + +**Nx Cloud DTE** (distributed task execution) does ship this, but it +is a hosted-only commercial product. The OSS Nx CLI does not include a +self-hostable agent protocol; if you want DTE on your own infra you +write it yourself. + +**vx's wedge**: ship a **self-hostable, OSS, Turbo-wire-compatible** +distributed-execution layer that runs on any CI provider (GitHub +Actions, GitLab CI, Buildkite, CircleCI, self-hosted Jenkins) with +zero new infrastructure beyond "a coordinator process that lives +during the build." Composes with the existing remote cache so warm +hits short-circuit dispatch entirely. Free DTE for everyone. + +## 3. 
Topology + +Three roles, all running the same `vx` binary in different modes: + +``` + ┌──────────────────────────────────────────┐ + │ coordinator (one per CI run/job) │ + │ • global ready-queue │ + │ • assignment policy │ + │ • run state + event fan-out │ + │ • cache-aware (asks "is hash present?") │ + └────┬─────────────────┬───────────────────┘ + │ submit / stream │ subscribe (CI log + dashboard) + ▼ ▲ +┌─────────────────┐ │ ┌────────────────┐ ┌────────────────┐ +│ submitter │──┤ │ worker[0] │ │ worker[N] │ +│ (`vx run`) │ │ │ vx run --serve │ │ vx run --serve │ +│ builds graph │ │ │ --worker │ │ --worker │ +│ attaches to │ │ │ │ │ │ +│ coordinator │ │ │ pulls ready │ │ pulls ready │ +└─────────────────┘ │ │ task; spawns; │ │ task; spawns; │ + │ │ uploads cache │ │ uploads cache │ + │ └────────────────┘ └────────────────┘ + │ │ │ + │ ▼ ▼ + │ ┌───────────────────────────────────┐ + │ │ remote cache (existing layer) │ + │ │ shared store of .tar.zst │ + │ └───────────────────────────────────┘ + │ + ▼ + GitHub Actions log (stream) +``` + +**Coordinator** owns the global state: graph + cache key per node, the +ready frontier, which nodes are assigned/in-flight on which worker, +and the streaming event log. It runs on the CI job's "primary" runner +(any container with a port; on GHA, the matrix index 0 runner is fine) +and exits at end-of-graph. + +**Workers** are stateless and fungible. Each runs `vx run --worker +`, opens a single websocket, registers with `{ capacity: +, capabilities: }`, then loops: pull next task, +spawn, stream output to coordinator, save to (local + remote) cache, +ack. No worker needs to know about any other worker. + +**Submitter** is the CI script that wants the build done. It runs +`vx run lint test build --coordinator `, which is the +existing `serviceBackend` with a different transport variant. The +submitter sees the same `WireEvent` stream + framed output the local +backend produces today. 
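The coordinator's ready frontier described above can be sketched as a pure function over the graph and per-node status. This is an illustration under assumed shapes: `TaskNode`, `Status`, and `readyFrontier` are hypothetical names, not the types in `orchestrator/protocol.ts`.

```typescript
// Hypothetical shapes for illustration only; not vx's actual types.
type TaskNode = { hash: string; deps: string[] } // deps: hashes of upstream tasks
type Status = 'pending' | 'assigned' | 'done'

// A node is ready when it is still pending and every upstream hash is done.
function readyFrontier(graph: TaskNode[], status: Map<string, Status>): string[] {
  return graph
    .filter((n) => (status.get(n.hash) ?? 'pending') === 'pending')
    .filter((n) => n.deps.every((d) => status.get(d) === 'done'))
    .map((n) => n.hash)
}

const graph: TaskNode[] = [
  { hash: 'hA', deps: [] },     // e.g. a lint task
  { hash: 'hB', deps: [] },     // e.g. a build task
  { hash: 'hC', deps: ['hB'] }, // e.g. a test task that consumes the build
]

const status = new Map<string, Status>()
const initial = readyFrontier(graph, status) // nothing has run: only dependency-free nodes

status.set('hA', 'assigned') // some worker pulled hA
status.set('hB', 'done')     // some worker finished hB
const afterBuild = readyFrontier(graph, status) // hC is unblocked by hB
```

Because readiness depends only on hashes, the function never needs to know which worker produced an upstream artifact, which is the fungibility property the topology relies on.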
+ +These roles **can collapse**: on a single-machine `vx serve`, one +process is coordinator + worker + submission target. The matrix +expansion is identical code paths, just more workers attached. + +## 4. The wire (extension of today's protocol) + +`orchestrator/protocol.ts` already defines `Server|ClientMessage` over +WS. We extend with two new message families — coordinator↔worker and +coordinator↔submitter — designed so a v0.5 client (today's `vx serve`) +keeps working unchanged: + +```ts +// Coordinator-side messages (NEW) +type WorkerToCoord = + | { t: 'worker:hello'; workerId: string; capacity: number; labels: string[] } + | { t: 'worker:pull'; available: number } // backpressure-aware pull + | { t: 'worker:start'; taskHash: string; pid?: number } + | { t: 'worker:stdout' | 'worker:stderr'; taskHash: string; chunk: string } + | { t: 'worker:done'; taskHash: string; outcome: WireOutcome } // outcome carries: exit, cpu, rss, cache provenance + | { t: 'worker:bye'; reason: 'idle-timeout' | 'shutdown' } + +type CoordToWorker = + | { t: 'task:assign'; node: WireTaskNode; hash: string } + | { t: 'cache:exists'; hash: string; present: boolean } // pre-spawn shortcut + | { t: 'drain' } // graceful shutdown +``` + +Submitter↔coordinator reuses today's `RunRequest` → streamed +`WireEvent` → `RunResult` exactly. The submitter doesn't know it's +distributed; only the coordinator side changes. + +**Content addressing is the invariant.** Every message keys off +`taskHash` (the existing pure-input v22 hash). The coordinator never +needs to "track" a task across runners — when `worker:done` arrives +with hash `H`, every downstream node that folds `H` becomes a +candidate for the ready queue, regardless of which worker produced it. +Output bytes live in the cache (local→remote), keyed by `H`, so the +next consumer that needs them pulls from there. The coordinator +forwards _log_ output back to the submitter; it does not move artifact +bytes. + +## 5. 
Assignment policy + +Naive first: **least-loaded** (worker with the most free capacity +gets the next ready task). Sufficient for homogeneous matrices. Three +extensions land as evidence demands: + +1. **Capability labels.** A worker registers + `labels: ['linux-x64', 'docker', 'gpu']`; a task can declare + `runOn: ['gpu']` in its config and the coordinator only assigns to + matching workers. Same shape as GitHub Actions `runs-on`; we adopt + the syntax so users can reuse a concept they already know. +2. **Cache-affinity hints.** A worker reports `recentHashes: [...]` + (top-K LRU of locally-cached hashes) on each pull. When two + workers can take a task and one already has the upstream's + artifact warm in its local cache, prefer that worker — saves a + remote download. Pure perf, no correctness implication. +3. **Critical-path priority.** The coordinator runs the topo-DP + critical-path computation on the graph and prioritizes nodes on + the longest remaining path. Today's scheduler does this in-process; + the coordinator does the same across workers. + +Priority + assignment is the _only_ moving piece in the coordinator. +Everything else is bookkeeping. + +## 6. Failure modes (cataloged, each with a deliberate behavior) + +| Failure | Behavior | +| --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| Worker disconnects mid-task | Task transitions back to `ready`; reassign. Output of a dead task is uncacheable (process gone), so this is safe — we just re-execute on another worker. | +| Worker disconnects after `done` | Outcome is already in the coordinator + the artifact is already saved to the remote cache (worker uploads before `worker:done`). No loss. | +| Coordinator dies | Whole run dies. 
The submitter receives `{ t: 'error' }` and falls back to local (the existing fail-safe). Acceptable — coordinator owns the run. | +| Submitter dies | Coordinator detects WS close; **continues** the run (artifacts still go to remote cache for future runs to hit). This is the in-flight-dedup pattern from `execution-service-2026-06.md` generalized to dropped clients. | +| Network partition between workers | Workers don't talk to each other; nothing to partition. | +| Coordinator OOM on huge graphs | Per-task state is small (hash + status + slot). 100k tasks ≈ a few MB. Not a near-term concern. | + +The cache layer's existing **never-fail** rule (remote errors → local +miss, run continues) extends naturally: a worker that can't reach the +remote cache queues the upload for retry; if it never recovers, the +artifact is local-only and the next consumer re-executes. Correctness +is preserved; only perf degrades. + +## 7. The CI integration story (what the user types) + +### 7.1 GitHub Actions (the dominant case) + +A reusable composite action `vx/distributed-action@v1` (we ship it) +encapsulates the dance: + +```yaml +jobs: + build: + strategy: + matrix: + worker: [0, 1, 2, 3, 4, 5, 6, 7] # 8-way parallelism + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - uses: vznjs/vx-distributed-action@v1 + with: + tasks: lint test build + # Coordinator elected automatically — matrix.worker == 0 hosts it. + # Others register as workers. +``` + +Behind the scenes: + +- All matrix runners share a step in which they read a + build-scoped token (GitHub's `${{ runner.os }}-${{ github.run_id }}-${{ +github.run_attempt }}`) and a coordinator address (the public + hostname of matrix `0`, exposed via tailscale/cloudflared/ngrok or + the action's own short-lived tunnel). +- Matrix `0` runs `vx coordinator --tasks ` which builds the + graph and listens for workers. +- Matrix `1..N` run `vx run --worker ` which registers + and starts pulling work. 
+- All N runners receive the same streamed output (every runner's job + log shows the same run), so any one of them is enough for + debugging. The coordinator's runner is the authoritative one — it + exits with the run's exit code. + +**No new infrastructure needed.** The action provides an ephemeral +tunnel between matrix workers via a short-lived `tailscale up +--authkey=` (free tier sufficient) OR via direct GH +runner IPs (when on self-hosted). Both are documented. + +### 7.2 Generic CI (GitLab, Buildkite, …) + +Same primitives, simpler shape — most CIs let you start a +"coordinator" service container that other jobs connect to. We +document the canonical patterns; the protocol is the contract. + +### 7.3 Self-hosted runner farms + +The compelling case. A company runs a fleet of `vx run --worker +$COORDINATOR` daemons on whatever beefy boxes they have (an old Mac +Studio in the corner, idle dev machines, dedicated farm hardware). +The CI submits the graph to a coordinator the daemons are already +attached to, work flows there. Coordinator is just `vx serve` +upgraded with the assignment policy. Self-hosted Nx Cloud, but free +and Turbo-cache-compatible. + +## 8. Local DX (this isn't only for CI) + +The same protocol powers a **local hivemind**: every developer's +laptop, when idle, can register as a worker against a team +coordinator. Your test run uses Alice's spare cores. This is the +**company-wide compute pool** the hosted-service entry hints at, +materialized without a hosted service. + +We don't have to ship this on day one, but the protocol must not +preclude it. The same gating rules apply: opt-in, network-bounded +(team VPN / tailscale), capability-labeled (don't run my untrusted +worker on Alice's GPU). + +## 9. Sandboxing (the trust story) + +A worker executes arbitrary shell strings from another machine. This +is a sharp tool. Mitigations, in order: + +1. 
**Default deny untrusted submissions.** A worker only accepts + `task:assign` for tasks whose `taskConfigHash` it can verify + against a _signed manifest_ the submitter pre-published. The + manifest pins every task's command + inputs + env capture, signed + with the same HMAC key vx already uses for the remote cache. A + worker that can't verify the manifest refuses the assignment. +2. **Existing sandbox layer.** Tasks with `sandbox: {}` already run + under SRT (macOS) / bwrap (Linux) with strict allow/deny lists. + Distributed exec turns this on by default for tasks coming from a + non-local coordinator — you can dial down per-task as needed. +3. **No sibling visibility.** A worker holds the bare minimum + workspace: a sparse clone keyed by the task's `inputs.files` + + `workspaceFiles` glob set. Materialized from the remote cache or + git directly. Workers never see code they don't need. + +The honest framing: distributed CI on trusted self-hosted infra works +today with item 2 alone. Items 1 + 3 are what unlock "rent a worker +from anywhere." + +## 10. Phasing (each phase ships independent value) + +| Phase | Ships | Validates | +| ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- | +| **A** | Coordinator role inside `vx serve`. One worker (the same process). All-in-one binary. The submitter pattern works end-to-end against a single in-proc worker. | The protocol is sound. No new transport surface. | +| **B** | Multi-worker. `vx run --worker `. Least-loaded assignment. Cache-affinity hints deferred. | Multi-process work flows; failures handled per §6. | +| **C** | GitHub Actions composite. Matrix orchestration as documented in §7.1. Real CI smoketest. | The user-facing story is real. Numbers from a real monorepo on real GHA. 
| +| **D** | Capability labels + critical-path priority + cache-affinity. Submitter retry on coord failure. | Production-grade. Heterogeneous fleets work. | +| **E** | Signed manifests + sparse-clone worker. Trust story for "rent a worker." | Hosted/3rd-party-worker viable. Foundation for `vx cloud` (see `vx-cloud-2026-06.md`). | + +Phase A is small (the in-process refactor: extract `WorkerLoop` from +the existing scheduler; the coordinator hosts a queue the worker +pulls from). Each subsequent phase is a horizon, not a feature flag — +no half-built coordinator behind a flag, ship it or don't. + +## 11. Performance North Star + +The promise the architecture makes: + +> A `vx run` of an N-task graph, parallelism-bound on K workers, +> completes in time _T(serial) / min(K, P)_ where P is the graph's +> available parallelism (`T(serial)` divided by the critical-path +> duration). Cache hits are subtracted from `T(serial)`. The +> distributed-coordination overhead is sub-second for graphs ≤ 10k +> tasks. + +This is **better than Turbo** (whose CI-side parallelism is bounded +by one runner's `--concurrency`) and **comparable to Nx DTE** (which +ships this behavior in their hosted product). The differentiator is +that vx ships it as OSS, self-hostable, free. + +## 12. What this means for the docs / DX surface + +- `vx coordinator [tasks...]` — new top-level subcommand. Builds the + graph + opens the WS; exits when the graph is done. +- `vx run --worker ` — flag on existing `vx run`. Stateless + worker loop. +- `vx run --coordinator ` — submission against an external + coordinator. Resolves to `serviceBackend(, sink)`. +- One new page `docs/distributed.md` (matched in `apps/docs/`) + walking the CI integration end-to-end with the GHA action. +- `docs/comparison.md` updated to flip "distributed CI" from gap to + shipped, with the SLO from §11 as the headline. + +## 13. Non-goals (deliberately scoped down) + +- **Cross-language workers.** Workers run the same Bun runtime; we + don't define a language-agnostic execution gRPC. 
(A `vx worker +--shell-only` mode might come later for non-Bun infra, but it + doesn't change the protocol.) +- **Cluster scheduling.** We're not building Kubernetes. A worker is + a long-lived process; orchestrating _its_ lifecycle is the user's + job. We provide health endpoints and graceful drain. +- **Persistent run history.** The `runs` table already records what + ran; that's enough. The coordinator is ephemeral by design. + +## 14. Open questions (tracked, not blocking) + +- **Backpressure on stdout fan-out.** A worker streams every byte to + the coordinator, which fans out to every connected submitter. If + 10 submitters watch the same run, that's 10× egress for each + stdout chunk. Solution path: only the _primary_ submitter (first + to attach) gets full output; secondary submitters subscribe to a + reduced channel (status + summary). Defer until measured. +- **Eviction of stale `recentHashes` hints.** A worker that's been + up for hours has a degraded LRU; the affinity hint becomes stale. + Worker periodically reposts. Bounded staleness OK — affinity is a + hint, not correctness. +- **Coordinator HA.** Single point of failure. For a CI build it's + fine (one job, one coordinator); for a long-lived hosted + deployment, a coordinator restart drops the run. The hosted-cloud + proposal addresses this; locally we accept it. diff --git a/docs/design/extension-protocol-2026-06.md b/docs/design/extension-protocol-2026-06.md new file mode 100644 index 0000000..58efd43 --- /dev/null +++ b/docs/design/extension-protocol-2026-06.md @@ -0,0 +1,368 @@ +# Extension protocol — third-party tooling on top of vx + +Status: proposal (2026-06-20). Builds on `event-stream-2026-06.md` +(`WireEvent` + devframe surface), `execution-service-2026-06.md` +(backend protocol), and the `vx serve` / `vx dev` plumbing already +shipped. + +## 1. The pitch + +vx becomes a **platform**. 
Third parties (developers, agents, vendors, +IDE plugins, dashboards) write small programs that consume the vx +event stream and/or talk to a vx service. We define a typed, versioned +**extension API** so that a 3rd-party tool written today still works +tomorrow. The bar: writing a useful vx extension should take 30 +minutes and 50 lines of code. + +This is what unlocks the "openness vs nx and turbo" story. Turbo +exposes nothing programmatic. Nx exposes a plugin API but it's tied +to their TS runtime + executor schema. vx exposes a **wire protocol** +plus a small SDK that targets the wire — language-agnostic, +transport-agnostic, no Bun lock-in for consumers. + +## 2. Three roles for extensions + +Every extension fits one of three roles, distinguished by _what data +flows in which direction_: + +### 2.1 Subscriber — read-only consumer of the event stream + +The simplest case. Connects to `vx serve` (or a `vx run` with +`--ui`), subscribes to the `vx:events` channel, observes everything. +Doesn't write back. + +Use cases: + +- **Custom CI annotators**: a tool that posts `run:end` summaries to + Slack with custom formatting. +- **Cost trackers**: tally CPU-minutes per task across the team. +- **Notification systems**: ping the dev when their long-running CI + job finishes. +- **Custom UIs**: a phone app that shows team build health. + +API surface: just the WireEvent stream + (optionally) the reduced +`RunState`. Today's `vx:run` shared state already exposes this. +Subscribers connect, read, exit. 
**Read-only — they cannot disrupt +the run.** + +### 2.2 Inspector — read-only RPC consumer + +A subscriber, but instead of subscribing to a live stream, makes +typed RPC queries against vx state: + +- `getRunState(runId)` → `RunState` +- `getRunHistory(filter)` → `Run[]` (uses the `vx-cloud` data model) +- `getTaskLogs(runId, taskId)` → `string` +- `getCacheStats(scope)` → `{ entries, sizeBytes, hitRate24h }` +- `explainCacheKey(taskId)` → `{ files, env, config, upstream }` +- `whyDidThisRerun(runId, taskId)` → `{ changedInputs: string[] }` + +These power agent-facing UIs (an LLM asking "why is this slow") +and IDE plugins ("show me cache hit rate for the file I'm editing"). + +### 2.3 Driver — write-capable submitter + +A driver _submits work_ to vx. Use cases: + +- **Custom dispatchers**: a tool that watches GitHub PR comments and + submits `vx run pr-validate` on `/test` commands. +- **AI agents**: an autonomous coding agent that submits builds + during exploration. +- **Build-on-save IDE features**: editor saves trigger a + task-specific vx run. + +Drivers go through the same `RunBackend` protocol the local CLI uses +— `RunRequest` in, `WireEvent` stream + `RunResult` out. Permissions +are checked at the service boundary (auth tokens). + +## 3. The wire (the only thing that matters) + +vx commits to **one wire protocol per service**, versioned: + +```ts +// 1. vx:events — the event stream (subscribers + inspectors + drivers) +type WireEvent = … // already shipped in events.ts + +// 2. vx:run — the reduced shared state +type RunState = … // already shipped in run-state.ts + +// 3. 
vx:rpc — typed inspector RPCs +type RpcRequest = + | { method: 'getRunState'; runId: string } + | { method: 'getRunHistory'; filter: HistoryFilter } + | { method: 'getCacheStats'; scope: 'all' | { project: string } } + | { method: 'explainCacheKey'; taskId: string } + | { method: 'whyDidThisRerun'; runId: string; taskId: string } + | { method: 'getTaskLogs'; runId: string; taskId: string } + +// 4. vx:submit — driver protocol +type SubmitRequest = RunRequest // already shipped in protocol.ts +type SubmitResponse = // streamed + | { kind: 'event'; event: WireEvent } + | { kind: 'result'; result: RunResult } +``` + +The transport is **WebSocket + birpc + valibot** (we already pulled +this in via devframe). The schemas live in `src/orchestrator/ +protocol.ts` and are _the_ version-controlled artifact — everything +else is implementation. + +### 3.1 Versioning + +A `vx serve` exposes its protocol version on `/version`. Clients +negotiate: a v1.2 client connecting to a v1.5 server uses v1.2; a +v1.5 client connecting to a v1.2 server uses v1.2 (downgrade). When +a major version bumps (v1 → v2), the old endpoint stays alive at +`/v1/*` for one minor release of the new version. SemVer for wire +protocols. + +### 3.2 Auth + +Local: no auth (loopback). Hosted/remote: bearer token in the WS +handshake. The vx-cloud proposal (`vx-cloud-2026-06.md`) defines the +token model; extension protocol _uses_ it. + +## 4. 
The SDK + +We publish three thin SDKs (more on demand): + +### 4.1 `@vzn/vx-client` (TypeScript / Bun + Node) + +```ts +import { connect } from '@vzn/vx-client' + +const vx = await connect('ws://localhost:5176') // local vx serve + +// Subscriber +for await (const event of vx.events()) { + if (event.kind === 'task:complete') { + console.log(`${event.taskId}: ${event.outcome.status}`) + } +} + +// Inspector +const stats = await vx.rpc('getCacheStats', { scope: 'all' }) + +// Driver +const run = vx.submit({ tasks: ['build'] }) +for await (const ev of run.events) { … } +const result = await run.result +``` + +### 4.2 `@vzn/vx-client-py` (Python) + +Same API shape, async. Targets data-science / ops use cases — a +Python notebook tracking build trends. + +### 4.3 `vx-client` (CLI helper) + +```bash +$ vx events --tail +{ "kind": "task:start", "taskId": "pkg-a#build", ... } +$ vx rpc getCacheStats --scope all +{ "entries": 1234, "sizeBytes": 5e8, "hitRate24h": 0.84 } +``` + +Shell scripts and `jq`-pipelined dashboards. The escape hatch. + +## 5. Plugin model (in-process extensions) + +A class of extension _runs in the same process as `vx run`_ — it +observes the run from inside the host, doesn't go over a wire. Use +case: a config-side hook ("run `npm audit` after every install +task," "annotate every task with cost data," "send failures to +Sentry"). Today these would require modifying vx core. 
+ +We expose plugins via the existing `defineWorkspace` config: + +```ts +// vx.workspace.ts +import { defineWorkspace } from '@vzn/vx' +import { sentryPlugin } from '@vzn/vx-plugin-sentry' +import { costTracker } from './plugins/cost-tracker' + +export default defineWorkspace({ + plugins: [sentryPlugin({ dsn: process.env.SENTRY_DSN }), costTracker({ ratePerCpuMin: 0.0001 })], +}) +``` + +A `Plugin` is just an in-process subscriber to the bus: + +```ts +type Plugin = { + name: string + setup(ctx: PluginContext): void | Promise<void> +} + +type PluginContext = { + on(event: 'run:start' | 'task:complete' | …, handler: (event) => void): void + rpc: RpcServer // a plugin can also expose RPCs (next §) + workspaceRoot: string + cacheDir: string +} +``` + +This is _exactly_ what `terminalSubscriber` is today, generalized to +allow N subscribers from config. The terminal renderer becomes the +first built-in plugin; user plugins layer on top. + +**Plugins cannot block the run.** Same backpressure rule as the +event stream — a plugin's queue is bounded; a wedged plugin loses +lossy events, never blocks producers. (Mechanism is already built; +we extend it to N subscribers.) + +## 6. RPC plugins — extending the inspector surface + +A plugin can also _register_ RPC methods, exposing them on +`vx:rpc`: + +```ts +const myPlugin: Plugin = { + name: 'cost-tracker', + setup(ctx) { + ctx.rpc.register('getCostReport', async ({ since, until }) => { + const events = await ctx.history.events({ since, until }) + return computeCost(events) + }) + }, +} +``` + +Now `vx rpc getCostReport --since 2026-06-01` works. The cost +tracker becomes a first-class introspection target — IDEs, agents, +shell scripts all see it the same way. + +This is the **open-platform** payoff: any team can extend vx's +introspection surface without touching core. The MCP server we ship +(for AI agents) is itself a plugin. + +## 7. The agent surface (special case) + +LLM-based coding agents are first-class users. 
A plugin exposing +`vx:mcp` (MCP server adapter, already on the roadmap from +`event-stream-2026-06.md`) gives agents typed tools: + +- `runTasks(tasks: string[]) → RunResult` +- `getRunStatus() → RunState` +- `tailTask(id: string) → ReadableStream` +- `whyDidThisRerun(taskId) → { changedInputs: string[] }` +- `explainCacheKey(taskId) → CacheKeyComponents` +- `runHistory(project, task, n=20) → Run[]` + +Agents use these instead of shelling out + parsing terminal output. +This is **agent-native ergonomics** — most build tools today force +agents to scrape ANSI; we give them a typed surface. + +## 8. Discovery — how a plugin gets loaded + +`defineWorkspace.plugins` is the canonical source. Plugins: + +1. Resolve as ordinary npm/Bun packages: `import { foo } from +'@vendor/vx-plugin-foo'`. +2. Get a chance to validate the workspace before the run starts + (return a `UserError` to abort with a clean message). +3. Get instantiated _once per `vx run`_ — same lifetime as the bus. + +For _cloud-deployed_ extensions (a subscriber running in a SaaS), we +provide an OPT-IN registration: `vx insights link --org acme` +registers the local insights uploader with the cloud, so every local +run uploads. This is the connection between in-process plugins and +the hosted face — they layer. + +## 9. Three example extensions we ship as reference + +To prove the API + force-test it: + +### 9.1 `@vzn/vx-plugin-sentry` + +Forwards `task:complete` events with `status === 'failed'` to Sentry +as exceptions. ~50 lines. + +### 9.2 `@vzn/vx-plugin-slack` + +Posts a `run:end` summary to a Slack channel. Configurable template, +threshold (only post if duration > X), and channels per branch. +~80 lines. + +### 9.3 `@vzn/vx-plugin-influx` + +Streams every `task:complete` to InfluxDB. Powers a Grafana +dashboard. ~30 lines (mostly tag-mapping). + +Each ships in this repo under `packages/plugin-*`. They double as +documentation and as smoke tests for the API surface. + +## 10. 
Performance bar + +- Plugin emit overhead per event: **< 10µs in-process**, dominated by + the bounded-queue enqueue. +- WS subscription: **< 1ms per event** at p99 from emit to + subscriber receive (localhost). +- RPC: **< 5ms p99** for trivial methods. +- A plugin that throws on every event loses its connection (we drop + it) within 100ms; the run is unaffected (the safe-observer pattern + from the deleted Observer subsystem revived). + +The performance contract: **subscribers cannot slow the run**. If a +subscriber falls behind, it loses events. The producer never waits. +This is the inverse of the philosophy that broke the deleted TUI +three times — and the lesson directly informs this design. + +## 11. Phasing + +| Phase | Ships | Validates | +| ----- | ----- | --------- | +| **A** | `@vzn/vx-client` (TS SDK). Connects to existing `vx serve` and consumes `vx:events`. Documented. The subscriber role is real. | One-direction streaming API works. | +| **B** | RPC server in `vx serve`. Inspector role with 4-5 built-in methods (`getCacheStats`, `getRunHistory`, `explainCacheKey`, …). | Read-only typed queries work. | +| **C** | Driver role — clients submit runs via SDK. Same code path as the existing serviceBackend, exposed publicly. | Hosted + agent use cases unlocked. | +| **D** | In-process Plugin API on `defineWorkspace`. The bus exposes itself to user code. | Custom in-process extensions work. | +| **E** | RPC plugins — third-party RPCs on top. `@vzn/vx-plugin-sentry\|slack\|influx` reference impls. | Open-platform story complete. | +| **F** | Python SDK + the `vx-client` CLI helper. Language coverage. | Cross-runtime is real. | +| **G** | MCP adapter for agents. (Already on the `event-stream` roadmap as Phase 4.) | Agent-native ergonomics shipped.
| + +Phase A is small — the SDK is a thin wrapper on existing primitives. +Each subsequent phase adds value without breaking the previous. + +## 12. Non-goals + +- **A general-purpose plugin sandbox.** In-process plugins run with + the same trust as `vx.workspace.ts` — same module-loading + semantics, same access. If you import an untrusted plugin, that's + on you. Same model as Webpack/Vite plugins. +- **A marketplace.** We have npm. Plugins live there. +- **Backwards compat into perpetuity.** SemVer the wire; major + versions break, minor versions don't. +- **Custom executors / pluggable task runtimes.** Shell is the API + (existing principle). Plugins observe and submit; they don't + redefine what executing a task means. + +## 13. Open questions + +- **Multi-version plugin loading.** What happens when two plugins + depend on different vx versions? Plugins are _compiled against_ a + vx API version; the host runtime negotiates. Today's npm peer-dep + model is sufficient. +- **Plugin order.** Plugins see events in the order they're declared + in config. For RPC methods, last-registration-wins (with a + warning logged). Deterministic. +- **Error model.** A plugin's `setup()` throwing is a `UserError` — + aborts the run with a clean message naming the plugin. A + per-event handler throwing is caught + logged + the plugin keeps + receiving events (same safe-observer pattern). After N throws, + the plugin is disabled for the run (and we report it). + +## 14. The architectural reason this works + +The thing that makes vx extensible — and makes it different from +Turbo/Nx — is that **every internal call has been refactored through +the event stream**. The terminal renderer is a subscriber. The +devframe surface is a subscriber. The insights uploader is a +subscriber. The cloud event ingester is a subscriber. + +Once the producer fires events to a bus, adding the N+1th subscriber +is free. The protocol IS the SDK. 
We don't have to choose between +"build a great CLI" and "build a great platform" — they're the same +thing, viewed through different subscribers. + +This is the payoff of the event-stream refactor (Phase 1a/1b +already shipped). Extensions are the harvest. diff --git a/docs/design/predictive-execution-2026-06.md b/docs/design/predictive-execution-2026-06.md new file mode 100644 index 0000000..f782aa6 --- /dev/null +++ b/docs/design/predictive-execution-2026-06.md @@ -0,0 +1,341 @@ +# Predictive execution — using history to win the perf war + +Status: proposal (2026-06-20). Pure performance proposal. Builds on +the `runs` / `run_tasks` data model (`vx-cloud-2026-06.md`) and the +existing scheduler (`graph/scheduler.ts`). + +## 1. The premise + +Today's vx scheduler is **stateless across runs**. Every `vx run` +starts from zero knowledge — no idea which tasks are typically slow, +which are cache hits, which are flaky, which dominate the critical +path. The scheduler walks the DAG topologically and dispatches in +graph-insertion order with a transitive-reverse-deps priority. That's +the same algorithm a build system in 1995 used. + +But we have data. The `runs` table has been recording cpu/rss/wall/ +status/cache-hit per task since 2026-05 (schema v11). On any +established repo, after a week of runs, we know the empirical +distribution of every task's duration, cache hit rate, and likelihood +of failure. We use **none of it** to make scheduling decisions. + +This proposal: turn vx into a **learning scheduler**. Use history to +predict, prioritize, and pre-warm. The result is faster runs without +asking the user for anything. + +## 2. Three concrete wins + +### 2.1 Critical-path priority from history (not from graph) + +Today: priority = transitive-reverse-dep count. A task that blocks +many descendants runs first. Reasonable as a heuristic. + +Tomorrow: priority = **expected remaining critical-path duration**. 
+Compute, per task, the expected wall time of itself + the longest +chain of descendants, using historical p50 durations. A 30-second +test that unblocks a 4-second lint is lower priority than a 4-second +build that unblocks a 30-second test. The graph-counter heuristic +gets this wrong; history gets it right. + +The math is a single topo-DP pass: + +```ts +function expectedCriticalPath(node: TaskNode, history: HistoryTable): number { + if (cache.has(node.hash)) return 0 // a hit costs nothing + const own = history.p50(node.id) ?? defaultDuration + // Math.max(0, ...) keeps a leaf (no dependents) at 0 rather than -Infinity + const downstream = Math.max(0, ...node.dependents.map((d) => expectedCriticalPath(d, history))) + return own + downstream +} +``` + +Memoized; O(N) per run. The scheduler picks the highest-expected +remaining critical path among the ready set. + +**Expected speedup**: 5-20% wall-clock reduction on graphs where the +slowest task is _not_ the most-blocking task. Common in real +workloads — a database integration test blocks nothing but takes 90s; +prioritizing it over a 2s lint that blocks 40 build tasks would +catastrophically slow a single-worker run. Today's heuristic +correctly prioritizes the lint. The HISTORY-AWARE version +re-prioritizes when worker count makes the cost of mis-scheduling +small. + +The mechanism gracefully handles missing history: a task with no +prior runs uses a default duration (workspace median). Once it has +runs, history dominates. + +### 2.2 Speculative pre-warming + +The remote-prefetch optimization (shipped 2026-06) starts remote-cache +GETs for every cacheable task at run start. It works because **we +know the hash in advance**. + +Generalize: for any task whose hash we can compute upfront (stable +key — no dependence on upstream outputs), pre-warm: + +- **Remote-cache prefetch** (shipped). +- **Local-cache stat probe** — touch the local entry to OS-cache the + inode (negligible on SSDs, measurable on cold spinning disks).
+- **Input file pre-read** — `posix_fadvise(WILLNEED)` on declared + input files (Linux only; no-op elsewhere). For cache-miss tasks + this overlaps a syscall the runner will make. +- **Module pre-load** — for tasks that exec a JS runtime (`bun +run`, `node`), pre-resolve the entry's import closure. Bun + supports `--preload`; we can leverage. + +Each is a small win, all overlap with already-running work. Total +expected speedup: 3-8% on cold runs. + +### 2.3 Bandit-driven retry decisions + +Some tasks are flaky. Today: a task fails → the run fails. The user +re-runs. Lost time. + +Future: per-task `failureRate` from history. A task with > 5% +historical flakiness gets _auto-retried once_ on transient failure +(non-zero exit with a structurally-detected "flake" pattern — +network errors, port collisions, timing-dependent assertions). A +task with < 1% flakiness fails fast. The threshold and detection +heuristic are owner-configurable per project. + +**Multi-armed bandit framing**: we balance the cost of retrying +(extra time) against the cost of false failures (broken CI, dev +re-runs). The expected-value calculation is straightforward: + +``` +expectedCost(retry) = (1 - p_succeed_on_retry) * 2 * task_duration +expectedCost(noretry) = p_flake * (cost_of_human_rerun + task_duration) +``` + +We retry when `expectedCost(retry) < expectedCost(noretry)`. With +real numbers from `cache.db`, this is a one-liner that materially +improves CI green rates. + +### 2.4 Bonus: shard-aware affinity + +For the distributed-execution proposal (`distributed-ci-2026-06.md`), +history tells the coordinator which tasks are **slow** and which are +**fast**. Assign slow tasks first to workers with the most capacity; +pack fast tasks together to minimize coordinator overhead. The +existing assignment policy gets a "bin-packing" upgrade for free. + +## 3. 
The data we need (already there) + +| Field | Source | +| ------------------ | ---------------------------------------- | +| Per-task wall time | `run_tasks.duration_ms` (extant) | +| Per-task CPU | `run_tasks.cpu_ms` (extant) | +| Per-task RSS | `run_tasks.peak_rss_bytes` (extant) | +| Per-task status | `run_tasks.status` (extant) | +| Cache hit/miss | `run_tasks.cache_source` (extant) | +| Branch / commit | `runs.branch`, `runs.commit_sha` (cloud) | +| Author | `runs.triggered_by` (cloud) | + +For _local_ predictions, the local `cache.db` is sufficient. For +_team-wide_ predictions ("most other people see this test as slow"), +the cloud `runs` table is the source. + +## 4. The HistoryTable abstraction + +A single in-memory snapshot loaded at `prepareRun`: + +```ts +type HistoryTable = { + // Per (project#task), last N runs (default 50) + recent(id: string): TaskRun[] + p50(id: string): number // ms + p99(id: string): number // ms + successRate(id: string): number // [0, 1] + hitRate(id: string): number // [0, 1] + failureMode(id: string): 'stable' | 'flaky-recoverable' | 'flaky-fatal' + bytesProduced(id: string): number // p50 artifact size +} +``` + +Already prototyped (and removed) as `Cache.getTaskHistory` for the +deleted TUI. We revive the SQL CTE that built it (one query, batched, +returns a Map). Cost: ~5ms on a 1000-project repo. Cached for the +run's lifetime. + +For the cloud path, the same shape is served by an RPC the +coordinator calls before dispatching. + +## 5. Continuous regression detection + +Beyond scheduling: with the HistoryTable, we can **detect +regressions** at run-end and surface them: + +``` +⚠ Slowdown detected: + @vzn/vx-docs#build: 4.2s (p50: 1.1s, +280%) + Suspected: workspaceFiles glob change in vx.workspace.ts +``` + +This requires: + +- Comparing this run's task durations against rolling p50. +- Identifying significant deviations (e.g., > 2.5σ above mean, + excluding cache-hit runs). 
+- Attributing them to changes: if the cache key changed since the + last run, the inputs differ; we can diff them. + +Output goes on the run summary footer (one warning line) AND to a +new RPC `getRegressions(since)` for tools/CI to consume. + +**This is the analytics killer feature**, materialized as a build- +time signal. Today you discover a regression days later when someone +notices CI is slow. Tomorrow vx tells you the moment the run +finishes. + +## 6. Architecture + +``` + ┌──── prepareRun() ────────────────────┐ + │ │ + │ load HistoryTable from cache.db │ + │ (or from vx cloud RPC) │ + │ │ + └────────────┬──────────────────────────┘ + │ + ▼ + ┌──────── computeCriticalPath ───────────┐ + │ topo-DP using HistoryTable.p50 │ + │ → priority per node │ + └────────────┬───────────────────────────┘ + │ + ▼ + ┌──────── scheduler ─────────────────────┐ + │ ready queue sorted by priority │ + │ + worker-affinity from history │ + └────────────┬───────────────────────────┘ + │ + ▼ + ┌──────── execute-task ──────────────────┐ + │ retry-or-not from HistoryTable │ + │ prefetch overlap from prefetch.ts │ + └────────────┬───────────────────────────┘ + │ + ▼ + ┌──────── recordRun() ───────────────────┐ + │ write back to cache.db.runs │ + │ compare to history → regression flag │ + └────────────────────────────────────────┘ +``` + +Each box reads or writes one shared table. The wire is the +`HistoryTable` interface. The mechanism is in `orchestrator/` +already; we extend it. + +## 7. Safety + correctness + +**Hard rule**: predictive scheduling must never break correctness. A +mis-prediction only changes _order_ and _priority_, not WHAT runs. +The cache key remains the source of truth; an incorrect prediction +is a perf regression, never a bug. + +**Hard rule**: history is hints, not contracts. A task with zero +history runs fine (default duration). 
A task whose history is +suddenly invalid (the code changed dramatically) re-converges within +a few runs as the rolling window updates. + +**Hard rule**: opt-out per project. `defineProject({ predictive: +false })` reverts to today's behavior for that project. Useful for +CI where you want deterministic ordering for debug repeatability. + +## 8. Implementation cost + +Surprisingly low. The data exists, the SQL CTE has been written and +reverted once, the scheduler already has a priority field. The +moving parts: + +- `orchestrator/history.ts` — the HistoryTable loader (one SQL pass). +- `orchestrator/predict.ts` — the critical-path-from-history calc. +- `graph/scheduler.ts` — accept a `priority` override per node from + the predict module instead of `reachOf` only. +- `orchestrator/execute-task.ts` — read `failureMode` for the retry + decision; consult `HistoryTable.bytesProduced` to budget local + storage. +- `cli/info.ts` — surface "predicted vs actual" deltas, regressions. + +~300 LOC of new code; the bulk is the SQL and tests. + +## 9. The performance bar (the promise) + +After this lands: + +| Workload | Today | After | Gain | +| ------------------------------ | ------ | --------- | ---- | +| Cold run, 1000-pkg deep graph | T₀ | T₀ × 0.85 | -15% | +| Warm run, 1000-pkg | T₁ | T₁ × 0.92 | -8% | +| Single-worker mixed-duration | T₂ | T₂ × 0.80 | -20% | +| CI with 8 matrix workers (DTE) | T₃ | T₃ × 0.88 | -12% | +| Flaky-failure recovery | manual | auto | ∞ | + +These numbers are bounded by physics: the critical-path duration +itself doesn't change. But running closer to the floor is the win. + +Compared to: + +- **Turbo**: no historical awareness. Static graph-counter priority. +- **Nx**: similar; Nx Cloud has analytics dashboards but the + scheduler doesn't consume them. + +We'd be the **first task runner that learns from its own runs**. + +## 10. 
The composable architecture + +This proposal stays clean because every piece is independently +useful: + +- **HistoryTable alone**: powers `vx insights` (the local UI from the + cloud proposal). +- **Critical-path-from-history alone**: improves single-worker + scheduling. +- **Prefetch generalization alone**: improves cold runs. +- **Bandit retry alone**: reduces CI re-runs. +- **Regression detection alone**: surfaces drifts. + +We can ship them in any order. Each delivers value standalone, and +together they compose. + +## 11. Phasing + +| Phase | Ships | Validates | +| ----- | ----- | --------- | +| **A** | HistoryTable revival (CTE). Exposed via RPC. Used by `vx info`. | Data plumbing works. | +| **B** | Critical-path-from-history priority on the scheduler. Opt-in via `defineWorkspace({ predictive: true })`. | Measurable gain on real workloads. | +| **C** | Speculative pre-warming generalized (input WILLNEED, module preload). | Cold-run gain. | +| **D** | Bandit retry. Per-project `flakyRetry: 'auto' \| \| 'off'`. | Flaky-test recovery without manual. | +| **E** | Regression detection at run-end. Surfaced in footer + RPC. | CI-time regression visibility. | +| **F** | Default-on for `predictive`. The improvements are universal enough to make non-opt-in. | Confidence the perf gain is positive. | + +## 12. Open questions + +- **Cold-start without history.** A fresh repo has no data; we use + defaults (workspace median, optimistic cache hit). Need to verify + the defaults don't bias the new-user experience badly. +- **History invalidation.** A task whose command changed yesterday + has irrelevant history. We weight by _task config hash similarity_ + — if the resolved-config hash today differs from a historical + run's, that run gets a lower weight. (Trivial to compute; we + store both.)
+- **Storage growth.** 50 runs × 1000 tasks = 50k rows in `run_tasks`. + Bounded. We GC old runs at `vx cache prune` time. +- **Predictability under change.** A regression flag triggered by a + legitimate code change is noise. Solution: include the regression + in the run record; if subsequent runs all show the new duration, + it's the new baseline. Self-converges. + +## 13. Why this matters + +The performance race against Turbo and Nx is bounded by **physics** +(the CPU you have) and **cleverness** (the optimizations you find). +We're winning the cleverness race — but every win is finite. The +_compounding_ win is **learning**. + +A task runner that gets faster every run, with no user intervention, +is a fundamentally different value proposition. It's the wedge that +turns "vx is competitive on perf" into "vx is the only system that +keeps improving as you use it." + +This is the architecture for that wedge. diff --git a/docs/design/vx-cloud-2026-06.md b/docs/design/vx-cloud-2026-06.md new file mode 100644 index 0000000..cf43b1c --- /dev/null +++ b/docs/design/vx-cloud-2026-06.md @@ -0,0 +1,466 @@ +# vx Cloud — hosted observability, cache, and execution + +Status: proposal (2026-06-20). Owner ask: "hosted service where people +could see local and company-wide things." Pairs with +`distributed-ci-2026-06.md` (the execution protocol) and +`remote-cache.md` (the existing cache transport). This is the +_observability + multi-tenancy_ layer on top. + +## 1. The pitch in one paragraph + +vx Cloud is a hosted (and **self-hostable**) service where every `vx +run` — local on a laptop, on CI, on a worker farm — streams its event +log into a long-lived database. A web UI rendered on top exposes: +**team-wide flamegraphs**, **regression detection** ("test X went from +2s to 30s last Tuesday across 40 PRs"), **cache hit-rate per project +per branch**, **slowest tasks rolled up by author**, **per-PR +comparison** ("this branch is 2.3× slower than main, here's why"). 
It +**also** hosts a shared remote cache + a shared coordinator (so a +team's CI runs without provisioning anything). One product, one wire +protocol, three faces: local insights, team analytics, hosted +execution. + +The economic story: the **OSS** core ships everything self-hostable. +Hosted is convenience. Companies who want it managed pay for that; +companies who want their data on their own boxes run the same code on +their own boxes. We do not gate features behind hosted-only. + +## 2. The three faces + +### 2.1 Local face — `vx insights` + +Today: every `vx run` writes a `runs` row to `cache.db` with task +spans, cache hits, durations. The `vx info` doctor reads aggregates; +that's the only consumer. + +Tomorrow: `vx insights serve` opens a localhost web UI that reads +`cache.db` directly (no daemon, no upload). The same SPA the hosted +product serves, pointed at a local SQLite. A developer can answer: + +- "Why was that last run slow? Show me the spans." +- "What's my hit rate on this branch?" +- "What's eaten the most CPU this week?" +- "Show me a flamegraph of run ``." + +This is what `apps/dashboard` aimed for and got removed. The reason +this revival works: **we already have the data** (runs table, spans, +cpu/rss, cache provenance — all v11+ columns). The UI is purely +historical-read; no live coupling to the orchestrator that killed the +previous attempt. And the `RunState` reducer + `WireEvent` stream from +`event-stream-2026-06.md` already give us a fully-typed live view to +_also_ render on the same UI for in-flight runs. + +### 2.2 Team face — `vx insights upload` + +A team running self-hosted: every CI run posts its event log + `runs` +row to a shared backend (`https://vx-insights.acme.corp` or +`https://cloud.vx.dev`). Same shape as the local UI, but the data is +aggregated across every contributor's machine, every PR, every CI +job. The data model is **append-only** — runs are immutable facts. +The UI does analytics on top. 
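Because runs are immutable facts, the analytics can be expressed as pure folds over the recorded rows. The sketch below computes a cache hit rate per branch. The flat `TaskRecord` shape (a task row already joined with its run's branch) and the choice to count only `local` and `remote` as hits are assumptions made for illustration; the `cache_source` values themselves come from the data model in this proposal.

```typescript
// Assumed record shape: one task row joined with the branch of its run.
type TaskRecord = {
  branch: string
  cacheSource: 'miss' | 'fresh' | 'local' | 'remote'
}

// Folds append-only records into a hit rate in [0, 1] per branch.
// Assumption: only `local` and `remote` count as cache hits.
function hitRateByBranch(records: readonly TaskRecord[]): Map<string, number> {
  const totals = new Map<string, { hits: number; total: number }>()
  for (const record of records) {
    const tally = totals.get(record.branch) ?? { hits: 0, total: 0 }
    tally.total += 1
    if (record.cacheSource === 'local' || record.cacheSource === 'remote') {
      tally.hits += 1
    }
    totals.set(record.branch, tally)
  }
  const rates = new Map<string, number>()
  totals.forEach((tally, branch) => rates.set(branch, tally.hits / tally.total))
  return rates
}
```

Since the input is never mutated, the same function could in principle run against a local `cache.db` export or a team-wide dataset without changes.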
+ +The web UI gains team-only views: + +- **Per-project trends** — was this project's `test` task always 12s, + or did it creep up? +- **Per-author breakdown** — who's writing tasks that miss the cache + most often? +- **PR diff view** — this PR's runs vs main's recent runs. +- **Bottleneck atlas** — the company's longest critical paths, + ranked. +- **Cache cliff** — sudden drops in hit rate (often signal an + unintended input drift; we can root-cause). + +### 2.3 Hosted face — `vx cloud` + +Optional, paid (for the hosted SaaS) or self-deploy (Helm chart, +docker-compose, single-binary). Combines insights + remote cache + +distributed-execution coordinator + signed-manifest authority + CI +integration. Drop-in replacement for Nx Cloud, with these structural +differences: + +- **Turbo-wire-compatible remote cache.** A team that hasn't migrated + off Turbo can still benefit — point Turbo at us, then migrate to + vx incrementally. +- **OSS reference implementation.** The hosted runtime IS the OSS + reference impl. No "community edition" with crippled features. +- **No proprietary protocol.** Wire is documented; you can write + your own coordinator and we'll route to it. + +## 3. Data model + +The atomic unit is a **Run**. Already exists in `cache.db.runs`. We +extend it with team-aware shape: + +```sql +-- Existing (extended) +runs ( + run_id TEXT PRIMARY KEY, -- UUIDv7 + org_id TEXT, -- NEW; null on local + repo TEXT, -- NEW; git remote URL + branch TEXT, -- NEW; HEAD ref + commit_sha TEXT, -- NEW; HEAD commit + pr_number INTEGER, -- NEW; if applicable + triggered_by TEXT, -- NEW; user/ci-bot + ci_provider TEXT, -- NEW; gh/gl/buildkite + started_at INTEGER, + ended_at INTEGER, + exit_code INTEGER, + cpu_ms INTEGER, + peak_rss_bytes INTEGER, + wallclock_start_ns INTEGER, + wallclock_end_ns INTEGER, + ... 
+) + +-- New +run_tasks ( + run_id TEXT, + task_id TEXT, -- project#task + task_hash TEXT, -- the v22 input key + status TEXT, -- success/failed/skipped/aborted + cache_source TEXT, -- miss/fresh/local/remote + duration_ms INTEGER, + cpu_ms INTEGER, + peak_rss_bytes INTEGER, + span_start_ns INTEGER, + span_end_ns INTEGER, + worker_id TEXT, -- for DTE + stdout_artifact TEXT, -- pointer to log blob (S3/local) + stderr_artifact TEXT, + PRIMARY KEY (run_id, task_id) +) + +run_events ( + run_id TEXT, + seq INTEGER, -- monotonic per run + ts_ns INTEGER, -- relative to run start + event_json TEXT, -- the serialized WireEvent + PRIMARY KEY (run_id, seq) +) +``` + +`run_events` is the **full event log** — exactly the WireEvents the +orchestrator emits today. Replaying them rebuilds the timeline +exactly: the same data that drives a live UI drives a historical one. +This is _the_ design unification: **one event stream, two consumers +(live + history)**. + +The hosted variant pages event blobs out to S3-compatible storage +when individual events grow large (long stdout dumps). The pointer +plus a content hash stays in SQLite (or PostgreSQL for the hosted +multi-tenant case). + +## 4. Architecture + +**The cloud is Cloudflare-native.** Edge compute (Workers) + edge +SQLite (D1) + S3-compatible object storage (R2) + stateful actor +runtimes (Durable Objects) + queues (Queues). Self-hostable by +**deploying the template into your own Cloudflare account** — `npx +wrangler deploy` from the cloned repo, done in five minutes. No +PostgreSQL, no S3 contract, no container orchestrator, no on-call. + +This choice is deliberate. The combination of (a) global edge +distribution, (b) generous free tier (10M Worker requests/month, 10GB +R2/month, 5GB D1/month free), (c) zero-egress R2, and (d) Wrangler's +one-command deploy collapses "spin up a vx cloud for your team" from +a Kubernetes adventure into a script. 
Anyone can fork the template +and have a private hosted backend running before lunch. + +``` +┌──────────────────── Cloudflare account (yours or hosted) ────────────────────┐ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ Workers (edge compute) │ │ +│ │ • /v8/artifacts/* — Turbo-wire cache (PUT/GET/HEAD) │ │ +│ │ • /v1/events/ingest — batched WireEvent uploader │ │ +│ │ • /v1/runs/* etc. — Insights API │ │ +│ │ • /v1/coord/* — distributed-execution submission │ │ +│ │ • /v1/ws — WS upgrade to the per-run Durable Object │ │ +│ │ • Static asset binding — serves the SPA built from /apps/insights │ │ +│ └─────────────┬─────────────────────────┬─────────────────────┬───────┘ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────┐ │ +│ │ R2 (objects) │ │ D1 (SQLite/edge)│ │ Durable Objects │ │ +│ │ • .tar.zst│ │ • runs │ │ • RunCoordinatorDO │ │ +│ │ • event blobs │ │ • run_tasks │ │ (1 per active run; │ │ +│ │ • per-org prefix│ │ • orgs/members │ │ holds graph state,│ │ +│ │ • presigned PUT │ │ • api_tokens │ │ fans WS to subs) │ │ +│ │ • zero egress │ │ • global indexes│ │ • InflightDedupDO │ │ +│ └──────────────────┘ └──────────────────┘ │ (per-hash; the │ │ +│ │ join-not-rerun │ │ +│ ┌──────────────────┐ ┌──────────────────┐ │ pattern from │ │ +│ │ Queues │ │ KV │ │ execution-service)│ │ +│ │ • event ingest │ │ • org-token │ └──────────────────────┘ │ +│ │ buffering │ │ lookup cache │ │ +│ │ • aggregation │ │ • feature flags │ │ +│ └──────────────────┘ └──────────────────┘ │ +└──────────────────────────────────────────────────────────────────────────────┘ + ▲ ▲ ▲ ▲ + │ PUT/GET tar │ POST events │ WS task RPCs │ OAuth + │ │ │ │ +┌───┴────┐ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ +│ vx run │ │ vx run │ │ vx run │ │ Browser │ +│ (local)│ │ (CI) │ │ --worker│ │ user │ +└────────┘ └─────────┘ └─────────┘ └─────────┘ +``` + +**Why each piece.** + +- **Workers** for stateless HTTP. 
They scale to zero, run at the + edge close to users, no provisioning. Free tier covers most teams + forever. The Workers runtime is V8-isolate-based — fast cold + starts (~5ms), but watch out for the 30s CPU-time cap per request + (irrelevant for our request shape; nothing we do takes that long). + +- **R2** for the cache artifacts and the larger event blobs. + S3-API-compatible, **zero egress fees** (unique to R2 — this + changes the cost model for a cache that's read 100× more than + it's written). Presigned URLs for the actual byte transfer so + Workers don't proxy bytes. + +- **D1** for the relational store. SQLite on the edge, read-replicated + globally, 10GB free per database. Our schema is small (rows per + task, not per file). Wrangler-managed migrations. For accounts + that outgrow D1's 10GB cap, **Hyperdrive** bridges to an external + Postgres — same Workers code, different binding. + +- **Durable Objects** for stateful per-run coordination. The single + most important piece: a `RunCoordinatorDO` is a per-run singleton + with strong consistency, holds the graph + ready queue + worker + registrations + WS connections. Solves the "where does the live + state live" problem that would otherwise require Redis. Pairs + natively with WebSocket Hibernation (DO sleeps between events; + no $/idle-connection). The `InflightDedupDO` is the + content-addressed dedup pattern from + `execution-service-2026-06.md` materialized as a global edge + service — one DO per task hash, holds the in-flight promise so a + second submitter joins instead of re-running. + +- **Queues** to buffer event-ingestion spikes. A noisy CI run sends + 500 events/second; the queue absorbs and the consumer Worker + batches into D1/R2. Backpressure-friendly, retry on failure. + +- **KV** for low-latency global reads of small hot data — token → + org lookup, public-key cache for HMAC verification, per-org + feature flags. Eventually consistent but cheap. 
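The join-not-rerun behavior described for `InflightDedupDO` can be sketched in-process as a map from task hash to the in-flight promise. This is only an illustration of the pattern, not the Durable Object itself: the `InflightDedup` class and its `run` method are hypothetical names, and a real deployment would hold this state in one object per hash rather than in a single map.

```typescript
// In-process sketch of content-addressed dedup (hypothetical names).
class InflightDedup<T> {
  private readonly inflight = new Map<string, Promise<T>>()

  // Returns the existing in-flight promise for `hash`, or starts `execute`
  // and remembers its promise until it settles.
  run(hash: string, execute: () => Promise<T>): Promise<T> {
    const existing = this.inflight.get(hash)
    if (existing) return existing // second submitter joins, no re-run
    const started = execute().finally(() => {
      // Forget the entry once settled so a later miss can execute again.
      this.inflight.delete(hash)
    })
    this.inflight.set(hash, started)
    return started
  }

  // Number of hashes currently executing.
  get size(): number {
    return this.inflight.size
  }
}
```

Two callers submitting the same hash while the first execution is pending receive the same promise object, so the work runs once and both observe the same outcome.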
+
+The whole stack is **one repo, one `wrangler.toml`, one
+`bun wrangler deploy` command** to bring up.
+
+## 5. Identity, authz, multi-tenancy
+
+The honest tradeoffs:
+
+- **Identity**: GitHub OAuth (via Workers OAuth helper, ~50 LOC) plus
+  a generic OIDC fallback. No proprietary user database; you bring
+  your own SSO. Sessions live in **D1**; per-request auth state is
+  validated against a **KV**-cached lookup (sub-ms p99).
+- **API tokens**: scoped to (org, role, expiry), stored hashed in
+  D1, lookup-cached in KV. Used by CI and by the worker
+  registration handshake. Token revocation deletes the KV entry
+  at once; because KV is eventually consistent, edge locations may
+  serve the stale entry briefly.
+- **Per-org isolation**: every D1 row carries `org_id`; every query
+  is filtered through a tiny middleware that injects the auth context.
+  R2 objects use a per-org key prefix (`/.tar.zst`)
+  so a misconfigured presigned URL cannot leak cross-org.
+- **No cross-org leakage**: a hash collision across orgs returns a
+  miss for the org that doesn't own it. Hashes are not assumed
+  globally unique; the `(org_id, hash)` tuple is the cache key,
+  enforced at the Worker layer.
+
+The cache wire stays Turbo-compatible: the team ID + token model
+maps straight onto our (org_id, api_token) tuple. Existing Turbo
+clients work when pointed at us.
+
+## 6. The DX flow we want users to feel
+
+```bash
+# Day 1: local insights
+$ vx run lint test
+✓ done in 2.3s
+
+$ vx insights
+→ http://localhost:5173 (your local runs, all-time)
+
+# Day 7: connect to a team
+$ vx insights link --org acme
+→ opens browser, OAuth, done.
+$ vx run lint test
+✓ done in 2.3s
+→ uploaded to acme/insights
+
+# Day 14: turn on the team cache
+$ cat .env
+VX_REMOTE_CACHE_URL=https://cloud.vx.dev/v8/artifacts
+VX_REMOTE_CACHE_TOKEN=team_xxx
+
+# Day 21: distributed CI
+$ cat .github/workflows/ci.yml
+- uses: vznjs/vx-distributed-action@v1
+  with:
+    tasks: lint test build
+    cloud: ${{ secrets.VX_CLOUD_TOKEN }}
+```
+
+Each step is opt-in. Nothing breaks if you stop using vx Cloud: you
+fall back to the local data path.
+
+## 7. Privacy & data minimization
+
+What we DO NOT collect:
+
+- **Stdout/stderr contents by default.** Logs stay local unless the
+  user opts in (per-org policy). The hosted UI can show "logs not
+  uploaded" rather than the bytes.
+- **Source code.** We never ship source. The cache stores _outputs_,
+  which the user has explicitly declared.
+- **Telemetry beyond the user's runs.** No hidden pings.
+
+What we DO collect:
+
+- The run metadata: durations, statuses, hashes, cache
+  provenance. This is what powers the analytics. It's the same data
+  Nx Cloud and Turbo collect for paying customers.
+- Optionally: stdout/stderr for failed tasks (defaulted off, can be
+  toggled per-org for debugging).
+
+Hosted is the only place this matters; self-hosted means it stays on
+your boxes regardless. The OSS surface treats local SQLite as the
+canonical store.
+
+## 8. The OSS reference implementation: `apps/cloud/`
+
+The cloud lives in this repo as `apps/cloud/`, a Cloudflare Workers
+project. The same code runs the hosted SaaS at `cloud.vx.dev`. The
+**README is the deploy guide**:
+
+```bash
+$ git clone https://github.com/vznjs/vx
+$ cd vx/apps/cloud
+$ bun install
+$ bun wrangler login # one-time auth to your CF account
+$ bun wrangler d1 create vx_cloud
+$ bun wrangler r2 bucket create vx-cloud-artifacts
+$ bun wrangler deploy
+→ Deployed to https://vx-cloud-.workers.dev
+```
+
+That's five minutes to a private hosted vx for your team. The
+`wrangler.toml` defines every binding (D1, R2, DOs, Queue, KV) so a
+fresh clone is provisionable verbatim. Migrations live as `.sql`
+files under `apps/cloud/migrations/` and apply via
+`wrangler d1 migrations apply`.
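
Because every binding (D1, R2, DOs, Queue, KV) is declared in `wrangler.toml`, a deployment with one of them missing would otherwise fail only at first use. A fail-fast check at the top of the Worker might look like the sketch below. The binding names (`DB`, `ARTIFACTS`, `RUN_COORDINATOR`, `EVENTS_QUEUE`, `HOT_KV`) and both helper functions are assumptions for illustration; the real names would be whatever `apps/cloud/wrangler.toml` declares.

```typescript
// Hypothetical binding names; the real ones come from wrangler.toml.
const REQUIRED_BINDINGS = [
  "DB",              // D1 database
  "ARTIFACTS",       // R2 bucket
  "RUN_COORDINATOR", // Durable Object namespace
  "EVENTS_QUEUE",    // Queue producer
  "HOT_KV",          // KV namespace
] as const;

type BindingName = (typeof REQUIRED_BINDINGS)[number];

// Returns the required bindings that are absent from the Worker's env object.
function missingBindings(env: Record<string, unknown>): BindingName[] {
  return REQUIRED_BINDINGS.filter(
    (name) => env[name] === undefined || env[name] === null,
  );
}

// Throws with a readable message instead of failing later on first use.
function assertBindings(env: Record<string, unknown>): void {
  const missing = missingBindings(env);
  if (missing.length > 0) {
    throw new Error(`missing bindings: ${missing.join(", ")}`);
  }
}
```

Calling `assertBindings(env)` at the start of the fetch handler would turn a misconfigured fresh clone into one clear error rather than scattered runtime failures.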
+
+**Template-spawnable.** We publish the same source as a
+`cloudflare/templates`-registered template so users can
+`npx create-cloudflare vx-cloud` and skip the clone-and-configure dance
+entirely: the template wizard prompts for the bucket/D1 names and writes
+`wrangler.toml` for you. The result is **a hosted vx that the user
+owns, in their CF account, with their billing**, deployed by typing
+~3 commands.
+
+This is the structural answer to "open vs. proprietary": there is no
+proprietary component. The hosted runtime is the OSS runtime; the
+hosted SaaS is just one deployed instance. If `cloud.vx.dev` goes
+away tomorrow, every customer can spin their own up in an afternoon.
+
+### 8.1 Why not a portable backend?
+
+We considered Postgres + S3 + a generic container deployment (Helm,
+docker-compose). **The user-experience math doesn't work.** A team
+trying to evaluate vx Cloud should not need to provision a database,
+an object store, and a container orchestrator. Cloudflare is the
+_only_ stack where the entire surface (compute + relational store
++ object store + actor runtime + queue) is one provider, one CLI, one
+account, with a free tier that covers small teams forever.
+
+For users who DO want to bring their own storage (Postgres, S3, a
+container farm), the **execution-service-2026-06.md** path stays
+open: `vx serve` runs anywhere a Bun process runs, and a future
+`vx serve --backend postgres` adapter would let it persist. We
+ship the CF target first because it removes friction; the
+generic-backend target is a follow-up driven by a real ask.
+
+### 8.2 Hyperdrive escape hatch
+
+D1's 10GB-per-database cap is generous but finite. When a team
+outgrows it, the **Hyperdrive** binding lets the same Workers code
+talk to an external Postgres (RDS, Neon, Supabase) with edge-cached
+connection pooling. Migration is one `wrangler.toml` change plus a SQL
+dump/restore; no code change. So the CF-native default is not a
+dead-end.
+
+## 9. The big architectural payoff: one event stream feeds everything
+
+```
+┌──── orchestrator (in vx run) ────┐
+│  emits WireEvents →              ├──┐
+└──────────────────────────────────┘  │
+                                      ├──→ terminal renderer (today)
+                                      ├──→ --ui local dev server
+                                      ├──→ --tui (future)
+                                      ├──→ MCP server for agents
+                                      ├──→ Insights uploader (this proposal)
+                                      └──→ In-process subscribers (plugins, devframe)
+```
+
+`devframe-surface.ts` already exposes the stream. The Insights
+uploader is _just another subscriber_: a batched HTTP-POST sink
+that flushes events to the cloud API. No new abstraction; one more
+adapter on the substrate `event-stream-2026-06.md` built.
+
+## 10. Phasing
+
+| Phase | Ships | Validates |
+| ----- | ----- | --------- |
+| **A** | `vx insights serve`: local SPA over local `cache.db`. Run history, flamegraph, per-task trends. No upload, no cloud. | The UI is real and useful before any infra exists. |
+| **B** | The data model extension (`org_id`, `run_tasks`, `run_events`). Migration of `cache.db` schema. Run-event sink interface. | Persisted state survives schema changes; can be replayed/exported. |
+| **C** | `vx cloud serve`: single-binary self-hosted backend. Postgres + S3, no auth (token-only). Reference impl ships first. | Self-hostable from day one. The OSS-first promise. |
+| **D** | OAuth + multi-tenant + RBAC. Production-grade self-hosted. | Real teams deploy it. |
+| **E** | Hosted SaaS at `cloud.vx.dev`. Trial tier + paid tiers. Same binary, managed. | The commercial path. Funds development. |
+
+## 11. Non-goals
+
+- **A proprietary "smart" feature set behind hosted.** Everything
+  works self-hosted.
+- **Replacing GitHub.** We don't host code/PRs/issues. We're a build
+  observability layer.
+- **A general-purpose analytics warehouse.** This is a build-runner
+  data model. It is not a substitute for Snowflake/BigQuery.
+
+## 12. Performance bar
+
+- Insights API: p99 query latency < 200ms for "last 100 runs of
+  project X" or "trend of task T over 30 days." Achieved via
+  pre-aggregation rollups (hourly/daily) in PostgreSQL.
+- Event ingestion: 10k events/sec/instance, single-Postgres backend.
+  Batched POSTs from the client (1-second flush window or 64KB
+  batches) keep request rates sane.
+- Cache hit latency: ≤ 50ms p99 for a remote GET (already the bar
+  the existing remote cache hits). Pre-signed S3 URLs for the actual
+  byte transfer keep the metadata service light.
+
+## 13. Open questions
+
+- **Real-time over the hosted backend.** A user might want to watch
+  a CI run live from the cloud UI. Solution: the coordinator already
+  streams `WireEvents`; the cloud UI can subscribe via SSE/WS the
+  same way the local UI does. Defer until we have customers asking.
+- **Log retention.** Bounded by org policy. Default: 30 days for
+  hosted, infinite for self-hosted. A compaction job ages out
+  `run_events` blobs to summary form.
+- **Cost model for hosted.** Per-seat? Per-task? Per-cache-GB? The
+  Nx Cloud pricing model is per-seat; the Turbo Vercel model is
+  per-cache-bandwidth. We'll iterate; the architecture supports
+  either.
+
+## 14. The big claim
+
+If we ship this, vx becomes the **only OSS task runner with an
+end-to-end story for: local DX → team observability → hosted
+execution → CI distribution → cache.** Turbo has cache + (proprietary)
+analytics. Nx has cache + (proprietary) DTE + (proprietary) analytics.
+vx has cache + DTE + analytics + execution-as-a-service, OSS,
+self-hostable, with optional hosted convenience. That's the moat.