diff --git a/.github/workflows/harness.yml b/.github/workflows/harness.yml new file mode 100644 index 0000000..bdc83a4 --- /dev/null +++ b/.github/workflows/harness.yml @@ -0,0 +1,24 @@ +name: Harness CI + +on: + push: + branches: [main] + pull_request: + +jobs: + harness: + runs-on: ubuntu-latest + steps: + - name: Checkout + uses: actions/checkout@v4 + + - name: Setup Go + uses: actions/setup-go@v5 + with: + go-version-file: go.mod + + - name: Setup Rust + uses: dtolnay/rust-toolchain@stable + + - name: Run harness pipeline + run: make ci diff --git a/.gitignore b/.gitignore index e458ed5..8e0efa6 100644 --- a/.gitignore +++ b/.gitignore @@ -1 +1,3 @@ .worktrees/ +.worktree/ +harness/target/ diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..352d192 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,53 @@ +# Repository Map + +Use this file as a table of contents only. Canonical guidance lives in the linked documents. + +``` +AGENTS.md +ARCHITECTURE.md +NON_NEGOTIABLE_RULES.md +docs/ +├── design-docs/ +│ ├── index.md +│ ├── core-beliefs.md +│ ├── local-operations.md +│ ├── observability-shim.md +│ └── worktree-isolation.md +├── exec-plans/ +│ ├── active/ +│ ├── completed/ +│ └── tech-debt-tracker.md +├── generated/ +├── product-specs/ +│ ├── index.md +│ └── harness-demo-app.md +├── references/ +│ └── codex-app-server-llm.txt +├── PLANS.md +``` + +## Start Here + +- Rules that block merge: `NON_NEGOTIABLE_RULES.md` +- System map and package boundaries: `ARCHITECTURE.md` +- Design doc index and ownership: `docs/design-docs/index.md` +- Product docs: `docs/product-specs/index.md` +- Execution plan policy: `docs/PLANS.md` + +## Runtime Surfaces + +- Ralph Loop CLI: `./ralph-loop` +- Harness CLI: `harness/target/release/harnesscli` +- Harness Make targets: `Makefile.harness` + +## Specs In Repo + +- Current product spec: `SPEC.md` +- Ralph Loop spec import: `specs/ralph-loop/SPEC.md` +- Harness spec import: `specs/harness-spec/SPEC.md` + +## Working Rules + 
+- Keep this file short and navigational. +- Put substantive guidance in the linked docs, not here. +- Update the relevant canonical doc when code or operating practice changes. diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md new file mode 100644 index 0000000..630eff5 --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1,70 @@ +# Architecture + +## Purpose + +This repository currently contains two related systems: + +1. A Go implementation of `ralph-loop`, an agent-first CLI that prepares a worktree, drives Codex through a setup and coding loop, and streams structured run data. +2. A Rust `harnesscli` bootstrap that turns the repository into a harnessed codebase with stable commands for smoke, lint, typecheck, test, audit, init, boot, observability, and cleanup. + +The long-term product direction in [`SPEC.md`](SPEC.md) is a Git impact analyzer. The code that exists today is still mostly harness and orchestration infrastructure. + +## Package Boundaries + +### Go runtime + +- `cmd/ralph-loop` + - Thin CLI entrypoint. + - Resolves the current working directory and hands control to `internal/ralphloop`. +- `internal/ralphloop` + - Parsing, schema generation, worktree management, session tracking, log handling, JSON-RPC transport, and orchestration. + - This package is the only place that should contain Ralph Loop behavior. + +Dependency direction: + +- `cmd/ralph-loop` -> `internal/ralphloop` +- `internal/ralphloop` -> Go standard library + +### Rust harness runtime + +- `harness/src/main.rs` + - `clap` entrypoint and shared process exit handling. +- `harness/src/cmd/*` + - One module per command group. +- `harness/src/util` + - Shared output, process, worktree, and filesystem helpers. + +Dependency direction: + +- `main` -> `cmd`, `util` +- `cmd/*` -> `util` +- `util` -> Rust stdlib plus small serialization helpers + +## Repository Zones + +- `specs/` + - Imported upstream specs and references. Treat these as vendored inputs, not primary implementation locations. 
+- `docs/design-docs/` + - Canonical operational and engineering rationale. +- `docs/product-specs/` + - Product-facing behavior descriptions, including the harness demo app contract. +- `docs/exec-plans/` + - Checked-in execution plans and technical debt tracking. +- `.worktree/` + - Runtime state generated per Git worktree by the harness system. + +## Entry Points + +- `./ralph-loop` + - Repo-root executable wrapper around `go run ./cmd/ralph-loop`. +- `cargo build --release --manifest-path harness/Cargo.toml` + - Builds `harnesscli`. +- `make ci` + - Stable top-level validation flow once the harness is built. + +## Boundary Rules + +- Do not add new Ralph Loop logic under `cmd/`. +- Do not place durable operating guidance in `AGENTS.md`. +- Do not edit imported spec files unless the change is explicitly a spec sync. +- All new automation-facing repository operations should prefer `harnesscli` subcommands over ad hoc shell scripts. diff --git a/Makefile b/Makefile new file mode 100644 index 0000000..6ff1305 --- /dev/null +++ b/Makefile @@ -0,0 +1 @@ +-include Makefile.harness diff --git a/Makefile.harness b/Makefile.harness new file mode 100644 index 0000000..6331965 --- /dev/null +++ b/Makefile.harness @@ -0,0 +1,22 @@ +HARNESS := harness/target/release/harnesscli + +.PHONY: smoke test lint typecheck check ci harness-build + +harness-build: + @cargo build --release --manifest-path harness/Cargo.toml + +smoke: harness-build + @$(HARNESS) smoke + +test: harness-build + @$(HARNESS) test + +lint: harness-build + @$(HARNESS) lint + +typecheck: harness-build + @$(HARNESS) typecheck + +check: lint typecheck + +ci: smoke check test diff --git a/NON_NEGOTIABLE_RULES.md b/NON_NEGOTIABLE_RULES.md new file mode 100644 index 0000000..35e10a9 --- /dev/null +++ b/NON_NEGOTIABLE_RULES.md @@ -0,0 +1,25 @@ +# Non-Negotiable Rules + +These rules are absolute. Violations block merge. 
+ +## Rule 1: New Behavior Must Be Tested + +- Every new command, module, and branch of behavior needs tests before merge. +- Changes to existing behavior must update tests in the same change. +- For this repository, that means both Go tests for `internal/ralphloop` and Rust tests for `harnesscli` when those surfaces change. + +## Rule 2: Machine-Readable Automation Is Required + +- Automation-facing commands must support structured output. +- JSON is the default for non-TTY execution. +- Errors in automation flows must be emitted as structured payloads, not prose-only failures. + +## Rule 3: No Blind Sleeps For Readiness + +- Any boot or lifecycle command must verify readiness with a real probe. +- If readiness cannot be proven, the command must fail non-zero and report the failing resource. + +## Rule 4: Repository Knowledge Is Canonical + +- Durable product, runtime, and architecture guidance belongs in versioned repo docs. +- `AGENTS.md` stays navigational; substantive guidance belongs in `ARCHITECTURE.md` or `docs/`. diff --git a/docs/PLANS.md b/docs/PLANS.md new file mode 100644 index 0000000..09015f4 --- /dev/null +++ b/docs/PLANS.md @@ -0,0 +1,32 @@ +# Plans + +Execution plans are first-class repository artifacts. + +## Locations + +- `docs/exec-plans/active/` + - In-flight work with current milestone status. +- `docs/exec-plans/completed/` + - Archived plans that reflect what actually shipped. +- `docs/exec-plans/tech-debt-tracker.md` + - Known debt, missing invariants, and follow-up cleanup items. + +## When To Create A Plan + +- Create a checked-in plan for multi-step work, cross-file refactors, harness changes, and anything likely to span more than one coding turn. +- Small one-file fixes can stay lightweight, but the decision and rationale should still be discoverable from code and commit history. 
+ +## Minimum Plan Shape + +- Goal +- Background +- Milestones +- Current progress +- Key decisions +- Remaining issues +- Links + +## Freshness + +- Update plan status as milestones move. +- Move completed plans into `docs/exec-plans/completed/` instead of deleting them. diff --git a/docs/design-docs/core-beliefs.md b/docs/design-docs/core-beliefs.md new file mode 100644 index 0000000..0929dee --- /dev/null +++ b/docs/design-docs/core-beliefs.md @@ -0,0 +1,31 @@ +# Core Beliefs + +## Product Beliefs + +- The repository should converge on a single operator surface for automation instead of accumulating fragile shell entrypoints. +- The long-term product is a Git impact analyzer, but the immediate engineering priority is reliable harnessing and agent operability. +- Worktree-local behavior is preferable to shared mutable state because it keeps agent runs isolated and reproducible. +- Structured command output matters more than pretty terminal output for automation paths. + +## Agent-First Operating Principles + +1. **Repository knowledge is the system of record.** + If a decision matters, it must be encoded in code, markdown, schema, or a checked-in plan. + +2. **What the agent cannot see does not exist.** + Product intent, architecture constraints, and operating conventions need to be discoverable in-repo. + +3. **Enforce boundaries centrally, allow autonomy locally.** + Rules belong in command contracts, tests, and CI rather than informal expectation. + +4. **Corrections are cheap, waiting is expensive.** + Prefer short feedback loops, fast validation commands, and follow-up fixes over long-lived ambiguity. + +5. **Prefer boring technology.** + The current Go and Rust choices are intentional because both are stable, well-known, and easy to automate. + +6. **Encode taste once, enforce continuously.** + Naming, output contracts, and lifecycle behavior should be captured in code and docs so every run sees the same standard. + +7. 
**Treat documentation as executable infrastructure.** + Docs are part of the harness. If runtime behavior changes, the corresponding canonical doc must change too. diff --git a/docs/design-docs/index.md b/docs/design-docs/index.md new file mode 100644 index 0000000..c2f012f --- /dev/null +++ b/docs/design-docs/index.md @@ -0,0 +1,13 @@ +# Design Docs Index + +| Document | Canonical Topic | Owner | Intended Audience | Update When | +| --- | --- | --- | --- | --- | +| [`core-beliefs.md`](./core-beliefs.md) | Product beliefs and agent-first operating principles | Repo maintainers | Humans and agents | Product direction or operating model changes | +| [`local-operations.md`](./local-operations.md) | Local command surface, env vars, troubleshooting | Repo maintainers | Humans and agents | Commands, env vars, or validation flows change | +| [`worktree-isolation.md`](./worktree-isolation.md) | Worktree ID derivation, runtime roots, and port allocation | Repo maintainers | Humans and agents | Boot/runtime behavior changes | +| [`observability-shim.md`](./observability-shim.md) | Current telemetry data flow and local query contract | Repo maintainers | Humans and agents | Observability data paths or query contract changes | + +## Status + +- These documents are the canonical operational layer for the repository. +- `AGENTS.md` should only point here, not duplicate this content. diff --git a/docs/design-docs/local-operations.md b/docs/design-docs/local-operations.md new file mode 100644 index 0000000..ed1704e --- /dev/null +++ b/docs/design-docs/local-operations.md @@ -0,0 +1,69 @@ +# Local Operations + +## Primary Commands + +### Go Ralph Loop + +- `./ralph-loop schema --output json` + - Show the live command contract. +- `./ralph-loop init --dry-run --output json` + - Preview worktree initialization details. +- `./ralph-loop "" --output ndjson --preserve-worktree` + - Run the loop and keep the generated worktree for inspection. 
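Since `--output ndjson` emits one JSON object per line, a run can be scripted against directly. A minimal consumer sketch follows; the `type` field shown is an illustrative assumption, and `./ralph-loop schema --output json` remains the authoritative contract:

```typescript
// Parse an ndjson stream: one JSON object per line, blank lines ignored.
function parseNdjson(stream: string): Array<Record<string, unknown>> {
  return stream
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as Record<string, unknown>);
}

// Example input resembling what a run might emit (field names are assumptions).
const sample = '{"type":"run.started"}\n{"type":"run.completed"}\n';
for (const event of parseNdjson(sample)) {
  console.log(event.type);
}
```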
+ +### Harness CLI + +- `cargo build --release --manifest-path harness/Cargo.toml` + - Build `harnesscli`. +- `harness/target/release/harnesscli smoke` + - Fast compile sanity check for the Go code. +- `harness/target/release/harnesscli lint` + - Formatting plus static analysis checks. +- `harness/target/release/harnesscli typecheck` + - Full repository build validation. +- `harness/target/release/harnesscli test` + - Go tests plus Rust harness tests. +- `harness/target/release/harnesscli audit . --output json` + - Verify required harness files and directories exist. +- `harness/target/release/harnesscli init` + - Create the current worktree runtime root and metadata. +- `harness/target/release/harnesscli boot start` + - Start the deterministic local demo app for this worktree. + +### Make Targets + +- `make smoke` +- `make lint` +- `make typecheck` +- `make check` +- `make test` +- `make ci` + +## Environment Variables + +### Ralph Loop + +- `RALPH_LOOP_CODEX_COMMAND` + - Overrides the command used to start Codex app-server. + +### Harness + +- `HARNESS_SMOKE_CMD` +- `HARNESS_LINT_CMD` +- `HARNESS_TYPECHECK_CMD` +- `HARNESS_TEST_CMD` + - Override the default command run by the matching `harnesscli` subcommand. + +- `DISCODE_WORKTREE_ID` + - Override the derived worktree ID. + +- `APP_PORT_BASE` +- `DISCODE_APP_PORT` +- `PORT` + - Override demo app port selection. + +## Troubleshooting + +- If `./ralph-loop` cannot talk to Codex, confirm `codex app-server` works in your shell or set `RALPH_LOOP_CODEX_COMMAND`. +- If `harnesscli boot start` reports a busy port, rerun with `DISCODE_APP_PORT` or stop the conflicting process. +- If automation output looks human-oriented in scripts, pass `--output json` explicitly even though non-TTY defaults should already select JSON. 
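The non-TTY default mentioned above follows a simple convention, sketched here for clarity. This illustrates the contract, not the actual `harnesscli` implementation:

```typescript
type OutputFormat = "text" | "json" | "ndjson";

// An explicit --output flag always wins; otherwise TTYs get human-readable
// text and everything else (pipes, CI) gets JSON.
function resolveOutputFormat(flag: OutputFormat | null, isTty: boolean): OutputFormat {
  if (flag !== null) {
    return flag;
  }
  return isTty ? "text" : "json";
}

console.log(resolveOutputFormat("ndjson", true)); // explicit flag wins even on a TTY
console.log(resolveOutputFormat(null, false)); // piped/CI execution gets JSON
```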
diff --git a/docs/design-docs/observability-shim.md b/docs/design-docs/observability-shim.md
new file mode 100644
index 0000000..abe993e
--- /dev/null
+++ b/docs/design-docs/observability-shim.md
@@ -0,0 +1,32 @@
+# Observability Shim
+
+## Current State
+
+This repository is at the start of the harness-spec observability work. The current implementation is a local observability shim, not the full Vector + Victoria stack described by later phases of `specs/harness-spec`.
+
+## Data Flow
+
+- `harnesscli boot start` writes process logs under `.worktree/<worktree-id>/logs/`.
+- `harnesscli observability start` creates a per-worktree observability metadata file and declares the local query endpoints that will be used by future phases.
+- `harnesscli observability query` currently supports local log-file queries and returns structured output.
+
+## Query Contract
+
+- Default output is JSON in non-TTY mode.
+- `--output ndjson` is available for line-oriented query results.
+- Query responses include:
+  - `worktree_id`
+  - `runtime_root`
+  - `kind`
+  - `items`
+
+## Upgrade Path
+
+Later harness phases should replace this shim with a real per-worktree telemetry stack:
+
+- Vector for collection and fan-out
+- VictoriaLogs for logs
+- VictoriaMetrics for metrics
+- VictoriaTraces for traces
+
+When that happens, this document must be updated before merge.
diff --git a/docs/design-docs/worktree-isolation.md b/docs/design-docs/worktree-isolation.md
new file mode 100644
index 0000000..c7d5b35
--- /dev/null
+++ b/docs/design-docs/worktree-isolation.md
@@ -0,0 +1,46 @@
+# Worktree Isolation
+
+## Goal
+
+Every worktree gets its own stable runtime identity and local resources.
+
+## Worktree ID
+
+- Resolve the canonical repo root for the current working tree.
+- Use the basename of that path as the human-readable prefix.
+- Append a stable short hash of the canonical path.
+- Allow `DISCODE_WORKTREE_ID` to override the derived value.
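The derivation steps above can be sketched as follows. The hash function, prefix length, and helper name are illustrative assumptions; the harness only promises a stable short hash:

```typescript
import { createHash } from "node:crypto";
import { basename } from "node:path";

// Hypothetical sketch of worktree ID derivation: <repo-basename>-<short-hash>,
// with DISCODE_WORKTREE_ID able to override the whole value.
// SHA-256 truncated to 8 hex chars is an assumption, not the documented algorithm.
function deriveWorktreeId(canonicalRoot: string, override?: string): string {
  if (override && override.length > 0) {
    return override;
  }
  const shortHash = createHash("sha256")
    .update(canonicalRoot)
    .digest("hex")
    .slice(0, 8);
  return `${basename(canonicalRoot)}-${shortHash}`;
}

console.log(deriveWorktreeId("/home/me/impactable"));
console.log(deriveWorktreeId("/home/me/impactable", "forced-id"));
```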
+
+Example:
+
+`impactable-a1b2c3d4`
+
+## Runtime Root
+
+The harness uses:
+
+`.worktree/<worktree-id>/`
+
+Subdirectories currently reserved:
+
+- `run/`
+- `logs/`
+- `tmp/`
+- `demo-app/`
+- `observability/`
+
+## Port Allocation
+
+- Default app port is derived from the worktree hash.
+- Explicit overrides win in this order:
+  - `DISCODE_APP_PORT`
+  - `APP_PORT`
+  - `PORT`
+- When a derived port is already in use, the harness probes the next deterministic candidate in a bounded range.
+
+## Lifecycle
+
+- `harnesscli init` creates the runtime root and metadata.
+- `harnesscli boot start` starts a deterministic local demo app under the current worktree runtime root.
+- `harnesscli boot status` reports the stored metadata and live health.
+- `harnesscli boot stop` terminates the managed process and removes stale lock metadata while preserving logs.
diff --git a/docs/exec-plans/tech-debt-tracker.md b/docs/exec-plans/tech-debt-tracker.md
new file mode 100644
index 0000000..fd0ea43
--- /dev/null
+++ b/docs/exec-plans/tech-debt-tracker.md
@@ -0,0 +1,13 @@
+# Technical Debt Tracker
+
+## Open Items
+
+- The repository has a bootstrap `harnesscli`, but later harness-spec phases still need full invariant enforcement and a richer observability stack.
+- `ralph-loop` exists as a Go implementation and should be brought closer to the imported upstream reference over time.
+- The long-term Git impact analyzer product surface in `SPEC.md` is mostly specified but not implemented yet.
+
+## Usage
+
+- Add debt items with clear remediation notes.
+- Link active execution plans or PRs when a cleanup effort starts.
+- Move resolved items into commit history rather than silently deleting context.
diff --git a/docs/product-specs/harness-demo-app.md b/docs/product-specs/harness-demo-app.md
new file mode 100644
index 0000000..9454ddc
--- /dev/null
+++ b/docs/product-specs/harness-demo-app.md
@@ -0,0 +1,24 @@
+# Harness Demo App
+
+## Purpose
+
+The repository does not yet have a user-facing application, so the harness boots a deterministic demo app to validate worktree isolation, port derivation, health checks, and browser automation.
+
+## Required Surface
+
+- Root page: `/`
+  - Title contains `Impactable Harness Demo`
+  - Body shows:
+    - repository name
+    - worktree ID
+    - runtime root
+    - selected port
+- Health endpoint: `/healthz`
+  - Returns HTTP 200 with body `ok`
+
+## Runtime Contract
+
+- The app is served from `.worktree/<worktree-id>/demo-app/`.
+- `harnesscli boot start` creates or refreshes the demo app assets before launch.
+- The boot command blocks until `/healthz` returns success.
+- `harnesscli boot status` returns the app URL and healthcheck URL in structured form.
diff --git a/docs/product-specs/index.md b/docs/product-specs/index.md
new file mode 100644
index 0000000..7e590f1
--- /dev/null
+++ b/docs/product-specs/index.md
@@ -0,0 +1,6 @@
+# Product Specs Index
+
+| Document | Topic | Audience | Update When |
+| --- | --- | --- | --- |
+| [`harness-demo-app.md`](./harness-demo-app.md) | Deterministic browser-visible app surface used by the harness | Humans and agents | Boot/status contract changes |
+| [`../../SPEC.md`](../../SPEC.md) | Long-term Git impact analyzer product direction | Humans and agents | Product requirements change |
diff --git a/docs/references/codex-app-server-llm.txt b/docs/references/codex-app-server-llm.txt
new file mode 100644
index 0000000..5c2100b
--- /dev/null
+++ b/docs/references/codex-app-server-llm.txt
@@ -0,0 +1,1438 @@
+# Codex App Server
+
+Codex app-server is the interface Codex uses to power rich clients (for example, the Codex VS Code extension).
Use it when you want a deep integration inside your own product: authentication, conversation history, approvals, and streamed agent events. The app-server implementation is open source in the Codex GitHub repository ([openai/codex/codex-rs/app-server](https://github.com/openai/codex/tree/main/codex-rs/app-server)). See the [Open Source](https://developers.openai.com/codex/open-source) page for the full list of open-source Codex components. + +If you are automating jobs or running Codex in CI, use the + Codex SDK instead. + +## Protocol + +Like [MCP](https://modelcontextprotocol.io/), `codex app-server` supports bidirectional communication using JSON-RPC 2.0 messages (with the `"jsonrpc":"2.0"` header omitted on the wire). + +Supported transports: + +- `stdio` (`--listen stdio://`, default): newline-delimited JSON (JSONL). +- `websocket` (`--listen ws://IP:PORT`, experimental): one JSON-RPC message per WebSocket text frame. + +In WebSocket mode, app-server uses bounded queues. When request ingress is full, the server rejects new requests with JSON-RPC error code `-32001` and message `"Server overloaded; retry later."` Clients should retry with an exponentially increasing delay and jitter. + +## Message schema + +Requests include `method`, `params`, and `id`: + +```json +{ "method": "thread/start", "id": 10, "params": { "model": "gpt-5.1-codex" } } +``` + +Responses echo the `id` with either `result` or `error`: + +```json +{ "id": 10, "result": { "thread": { "id": "thr_123" } } } +``` + +```json +{ "id": 10, "error": { "code": 123, "message": "Something went wrong" } } +``` + +Notifications omit `id` and use only `method` and `params`: + +```json +{ "method": "turn/started", "params": { "turn": { "id": "turn_456" } } } +``` + +You can generate a TypeScript schema or a JSON Schema bundle from the CLI. 
Each output is specific to the Codex version you ran, so the generated artifacts match that version exactly: + +```bash +codex app-server generate-ts --out ./schemas +codex app-server generate-json-schema --out ./schemas +``` + +## Getting started + +1. Start the server with `codex app-server` (default stdio transport) or `codex app-server --listen ws://127.0.0.1:4500` (experimental WebSocket transport). +2. Connect a client over the selected transport, then send `initialize` followed by the `initialized` notification. +3. Start a thread and a turn, then keep reading notifications from the active transport stream. + +Example (Node.js / TypeScript): + +```ts + + + +const proc = spawn("codex", ["app-server"], { + stdio: ["pipe", "pipe", "inherit"], +}); +const rl = readline.createInterface({ input: proc.stdout }); + +const send = (message: unknown) => { + proc.stdin.write(`${JSON.stringify(message)}\n`); +}; + +let threadId: string | null = null; + +rl.on("line", (line) => { + const msg = JSON.parse(line) as any; + console.log("server:", msg); + + if (msg.id === 1 && msg.result?.thread?.id && !threadId) { + threadId = msg.result.thread.id; + send({ + method: "turn/start", + id: 2, + params: { + threadId, + input: [{ type: "text", text: "Summarize this repo." }], + }, + }); + } +}); + +send({ + method: "initialize", + id: 0, + params: { + clientInfo: { + name: "my_product", + title: "My Product", + version: "0.1.0", + }, + }, +}); +send({ method: "initialized", params: {} }); +send({ method: "thread/start", id: 1, params: { model: "gpt-5.1-codex" } }); +``` + +## Core primitives + +- **Thread**: A conversation between a user and the Codex agent. Threads contain turns. +- **Turn**: A single user request and the agent work that follows. Turns contain items and stream incremental updates. +- **Item**: A unit of input or output (user message, agent message, command runs, file change, tool call, and more). + +Use the thread APIs to create, list, or archive conversations. 
Drive a conversation with turn APIs and stream progress via turn notifications. + +## Lifecycle overview + +- **Initialize once per connection**: Immediately after opening a transport connection, send an `initialize` request with your client metadata, then emit `initialized`. The server rejects any request on that connection before this handshake. +- **Start (or resume) a thread**: Call `thread/start` for a new conversation, `thread/resume` to continue an existing one, or `thread/fork` to branch history into a new thread id. +- **Begin a turn**: Call `turn/start` with the target `threadId` and user input. Optional fields override model, personality, `cwd`, sandbox policy, and more. +- **Steer an active turn**: Call `turn/steer` to append user input to the currently in-flight turn without creating a new turn. +- **Stream events**: After `turn/start`, keep reading notifications on stdout: `thread/archived`, `thread/unarchived`, `item/started`, `item/completed`, `item/agentMessage/delta`, tool progress, and other updates. +- **Finish the turn**: The server emits `turn/completed` with final status when the model finishes or after a `turn/interrupt` cancellation. + +## Initialization + +Clients must send a single `initialize` request per transport connection before invoking any other method on that connection, then acknowledge with an `initialized` notification. Requests sent before initialization receive a `Not initialized` error, and repeated `initialize` calls on the same connection return `Already initialized`. + +The server returns the user agent string it will present to upstream services. Set `clientInfo` to identify your integration. + +`initialize.params.capabilities` also supports per-connection notification opt-out via `optOutNotificationMethods`, which is a list of exact method names to suppress for that connection. Matching is exact (no wildcards/prefixes). Unknown method names are accepted and ignored. 
+ +**Important**: Use `clientInfo.name` to identify your client for the OpenAI Compliance Logs Platform. If you are developing a new Codex integration intended for enterprise use, please contact OpenAI to get it added to a known clients list. For more context, see the [Codex logs reference](https://chatgpt.com/admin/api-reference#tag/Logs:-Codex). + +Example (from the Codex VS Code extension): + +```json +{ + "method": "initialize", + "id": 0, + "params": { + "clientInfo": { + "name": "codex_vscode", + "title": "Codex VS Code Extension", + "version": "0.1.0" + } + } +} +``` + +Example with notification opt-out: + +```json +{ + "method": "initialize", + "id": 1, + "params": { + "clientInfo": { + "name": "my_client", + "title": "My Client", + "version": "0.1.0" + }, + "capabilities": { + "experimentalApi": true, + "optOutNotificationMethods": [ + "codex/event/session_configured", + "item/agentMessage/delta" + ] + } + } +} +``` + +## Experimental API opt-in + +Some app-server methods and fields are intentionally gated behind `experimentalApi` capability. + +- Omit `capabilities` (or set `experimentalApi` to `false`) to stay on the stable API surface, and the server rejects experimental methods/fields. +- Set `capabilities.experimentalApi` to `true` to enable experimental methods and fields. + +```json +{ + "method": "initialize", + "id": 1, + "params": { + "clientInfo": { + "name": "my_client", + "title": "My Client", + "version": "0.1.0" + }, + "capabilities": { + "experimentalApi": true + } + } +} +``` + +If a client sends an experimental method or field without opting in, app-server rejects it with: + +` requires experimentalApi capability` + +## API overview + +- `thread/start` - create a new thread; emits `thread/started` and automatically subscribes you to turn/item events for that thread. +- `thread/resume` - reopen an existing thread by id so later `turn/start` calls append to it. 
+- `thread/fork` - fork a thread into a new thread id by copying stored history; emits `thread/started` for the new thread. +- `thread/read` - read a stored thread by id without resuming it; set `includeTurns` to return full turn history. Returned `thread` objects include runtime `status`. +- `thread/list` - page through stored thread logs; supports cursor-based pagination plus `modelProviders`, `sourceKinds`, `archived`, and `cwd` filters. Returned `thread` objects include runtime `status`. +- `thread/loaded/list` - list the thread ids currently loaded in memory. +- `thread/archive` - move a thread's log file into the archived directory; returns `{}` on success and emits `thread/archived`. +- `thread/unsubscribe` - unsubscribe this connection from thread turn/item events. If this was the last subscriber, the server unloads the thread and emits `thread/closed`. +- `thread/unarchive` - restore an archived thread rollout back into the active sessions directory; returns the restored `thread` and emits `thread/unarchived`. +- `thread/status/changed` - notification emitted when a loaded thread's runtime `status` changes. +- `thread/compact/start` - trigger conversation history compaction for a thread; returns `{}` immediately while progress streams via `turn/*` and `item/*` notifications. +- `thread/rollback` - drop the last N turns from the in-memory context and persist a rollback marker; returns the updated `thread`. +- `turn/start` - add user input to a thread and begin Codex generation; responds with the initial `turn` and streams events. For `collaborationMode`, `settings.developer_instructions: null` means "use built-in instructions for the selected mode." +- `turn/steer` - append user input to the active in-flight turn for a thread; returns the accepted `turnId`. +- `turn/interrupt` - request cancellation of an in-flight turn; success is `{}` and the turn ends with `status: "interrupted"`. 
+- `review/start` - kick off the Codex reviewer for a thread; emits `enteredReviewMode` and `exitedReviewMode` items. +- `command/exec` - run a single command under the server sandbox without starting a thread/turn. +- `model/list` - list available models (set `includeHidden: true` to include entries with `hidden: true`) with effort options, optional `upgrade`, and `inputModalities`. +- `experimentalFeature/list` - list feature flags with lifecycle stage metadata and cursor pagination. +- `collaborationMode/list` - list collaboration mode presets (experimental, no pagination). +- `skills/list` - list skills for one or more `cwd` values (supports `forceReload` and optional `perCwdExtraUserRoots`). +- `app/list` - list available apps (connectors) with pagination plus accessibility/enabled metadata. +- `skills/config/write` - enable or disable skills by path. +- `mcpServer/oauth/login` - start an OAuth login for a configured MCP server; returns an authorization URL and emits `mcpServer/oauthLogin/completed` on completion. +- `tool/requestUserInput` - prompt the user with 1-3 short questions for a tool call (experimental); questions can set `isOther` for a free-form option. +- `config/mcpServer/reload` - reload MCP server configuration from disk and queue a refresh for loaded threads. +- `mcpServerStatus/list` - list MCP servers, tools, resources, and auth status (cursor + limit pagination). +- `windowsSandbox/setupStart` - start Windows sandbox setup for `elevated` or `unelevated` mode; returns quickly and later emits `windowsSandbox/setupCompleted`. +- `feedback/upload` - submit a feedback report (classification + optional reason/logs + conversation id, plus optional `extraLogFiles` attachments). +- `config/read` - fetch the effective configuration on disk after resolving configuration layering. +- `externalAgentConfig/detect` - detect migratable external-agent artifacts with `includeHome` and optional `cwds`; each detected item includes `cwd` (`null` for home). 
+- `externalAgentConfig/import` - apply selected external-agent migration items by passing explicit `migrationItems` with `cwd` (`null` for home). +- `config/value/write` - write a single configuration key/value to the user's `config.toml` on disk. +- `config/batchWrite` - apply configuration edits atomically to the user's `config.toml` on disk. +- `configRequirements/read` - fetch requirements from `requirements.toml` and/or MDM, including allow-lists, pinned `featureRequirements`, and residency/network requirements (or `null` if you haven't set any up). + +## Models + +### List models (`model/list`) + +Call `model/list` to discover available models and their capabilities before rendering model or personality selectors. + +```json +{ "method": "model/list", "id": 6, "params": { "limit": 20, "includeHidden": false } } +{ "id": 6, "result": { + "data": [{ + "id": "gpt-5.4", + "model": "gpt-5.4", + "displayName": "GPT-5.4", + "hidden": false, + "defaultReasoningEffort": "medium", + "supportedReasoningEfforts": [{ + "reasoningEffort": "low", + "description": "Lower latency" + }], + "inputModalities": ["text", "image"], + "supportsPersonality": true, + "isDefault": true + }], + "nextCursor": null +} } +``` + +Each model entry can include: + +- `supportedReasoningEfforts` - supported effort options for the model. +- `defaultReasoningEffort` - suggested default effort for clients. +- `upgrade` - optional recommended upgrade model id for migration prompts in clients. +- `upgradeInfo` - optional upgrade metadata for migration prompts in clients. +- `hidden` - whether the model is hidden from the default picker list. +- `inputModalities` - supported input types for the model (for example `text`, `image`). +- `supportsPersonality` - whether the model supports personality-specific instructions such as `/personality`. +- `isDefault` - whether the model is the recommended default. + +By default, `model/list` returns picker-visible models only. 
Set `includeHidden: true` if you need the full list and want to filter on the client side using `hidden`. + +When `inputModalities` is missing (older model catalogs), treat it as `["text", "image"]` for backward compatibility. + +### List experimental features (`experimentalFeature/list`) + +Use this endpoint to discover feature flags with metadata and lifecycle stage: + +```json +{ "method": "experimentalFeature/list", "id": 7, "params": { "limit": 20 } } +{ "id": 7, "result": { + "data": [{ + "name": "unified_exec", + "stage": "beta", + "displayName": "Unified exec", + "description": "Use the unified PTY-backed execution tool.", + "announcement": "Beta rollout for improved command execution reliability.", + "enabled": false, + "defaultEnabled": false + }], + "nextCursor": null +} } +``` + +`stage` can be `beta`, `underDevelopment`, `stable`, `deprecated`, or `removed`. For non-beta flags, `displayName`, `description`, and `announcement` may be `null`. + +## Threads + +- `thread/read` reads a stored thread without subscribing to it; set `includeTurns` to include turns. +- `thread/list` supports cursor pagination plus `modelProviders`, `sourceKinds`, `archived`, and `cwd` filtering. +- `thread/loaded/list` returns the thread IDs currently in memory. +- `thread/archive` moves the thread's persisted JSONL log into the archived directory. +- `thread/unsubscribe` unsubscribes the current connection from a loaded thread and can trigger `thread/closed`. +- `thread/unarchive` restores an archived thread rollout back into the active sessions directory. +- `thread/compact/start` triggers compaction and returns `{}` immediately. +- `thread/rollback` drops the last N turns from the in-memory context and records a rollback marker in the thread's persisted JSONL log. + +### Start or resume a thread + +Start a fresh thread when you need a new Codex conversation. 
+ +```json +{ "method": "thread/start", "id": 10, "params": { + "model": "gpt-5.1-codex", + "cwd": "/Users/me/project", + "approvalPolicy": "never", + "sandbox": "workspaceWrite", + "personality": "friendly", + "serviceName": "my_app_server_client" +} } +{ "id": 10, "result": { + "thread": { + "id": "thr_123", + "preview": "", + "ephemeral": false, + "modelProvider": "openai", + "createdAt": 1730910000 + } +} } +{ "method": "thread/started", "params": { "thread": { "id": "thr_123" } } } +``` + +`serviceName` is optional. Set it when you want app-server to tag thread-level metrics with your integration's service name. + +To continue a stored session, call `thread/resume` with the `thread.id` you recorded earlier. The response shape matches `thread/start`. You can also pass the same configuration overrides supported by `thread/start`, such as `personality`: + +```json +{ "method": "thread/resume", "id": 11, "params": { + "threadId": "thr_123", + "personality": "friendly" +} } +{ "id": 11, "result": { "thread": { "id": "thr_123", "name": "Bug bash notes", "ephemeral": false } } } +``` + +Resuming a thread doesn't update `thread.updatedAt` (or the rollout file's modified time) by itself. The timestamp updates when you start a turn. + +If you mark an enabled MCP server as `required` in config and that server fails to initialize, `thread/start` and `thread/resume` fail instead of continuing without it. + +`dynamicTools` on `thread/start` is an experimental field (requires `capabilities.experimentalApi = true`). Codex persists these dynamic tools in the thread rollout metadata and restores them on `thread/resume` when you don't supply new dynamic tools. + +If you resume with a different model than the one recorded in the rollout, Codex emits a warning and applies a one-time model-switch instruction on the next turn. + +To branch from a stored session, call `thread/fork` with the `thread.id`. 
This creates a new thread id and emits a `thread/started` notification for it: + +```json +{ "method": "thread/fork", "id": 12, "params": { "threadId": "thr_123" } } +{ "id": 12, "result": { "thread": { "id": "thr_456" } } } +{ "method": "thread/started", "params": { "thread": { "id": "thr_456" } } } +``` + +When a user-facing thread title has been set, app-server hydrates `thread.name` on `thread/list`, `thread/read`, `thread/resume`, `thread/unarchive`, and `thread/rollback` responses. `thread/start` and `thread/fork` may omit `name` (or return `null`) until a title is set later. + +### Read a stored thread (without resuming) + +Use `thread/read` when you want stored thread data but don't want to resume the thread or subscribe to its events. + +- `includeTurns` - when `true`, the response includes the thread's turns; when `false` or omitted, you get the thread summary only. +- Returned `thread` objects include runtime `status` (`notLoaded`, `idle`, `systemError`, or `active` with `activeFlags`). + +```json +{ "method": "thread/read", "id": 19, "params": { "threadId": "thr_123", "includeTurns": true } } +{ "id": 19, "result": { "thread": { "id": "thr_123", "name": "Bug bash notes", "ephemeral": false, "status": { "type": "notLoaded" }, "turns": [] } } } +``` + +Unlike `thread/resume`, `thread/read` doesn't load the thread into memory or emit `thread/started`. + +### List threads (with pagination & filters) + +`thread/list` lets you render a history UI. Results default to newest-first by `createdAt`. Filters apply before pagination. Pass any combination of: + +- `cursor` - opaque string from a prior response; omit for the first page. +- `limit` - server defaults to a reasonable page size if unset. +- `sortKey` - `created_at` (default) or `updated_at`. +- `modelProviders` - restrict results to specific providers; unset, null, or an empty array includes all providers. +- `sourceKinds` - restrict results to specific thread sources. 
When omitted or `[]`, the server defaults to interactive sources only: `cli` and `vscode`. +- `archived` - when `true`, list archived threads only. When `false` or omitted, list non-archived threads (default). +- `cwd` - restrict results to threads whose session current working directory exactly matches this path. + +`sourceKinds` accepts the following values: + +- `cli` +- `vscode` +- `exec` +- `appServer` +- `subAgent` +- `subAgentReview` +- `subAgentCompact` +- `subAgentThreadSpawn` +- `subAgentOther` +- `unknown` + +Example: + +```json +{ "method": "thread/list", "id": 20, "params": { + "cursor": null, + "limit": 25, + "sortKey": "created_at" +} } +{ "id": 20, "result": { + "data": [ + { "id": "thr_a", "preview": "Create a TUI", "ephemeral": false, "modelProvider": "openai", "createdAt": 1730831111, "updatedAt": 1730831111, "name": "TUI prototype", "status": { "type": "notLoaded" } }, + { "id": "thr_b", "preview": "Fix tests", "ephemeral": true, "modelProvider": "openai", "createdAt": 1730750000, "updatedAt": 1730750000, "status": { "type": "notLoaded" } } + ], + "nextCursor": "opaque-token-or-null" +} } +``` + +When `nextCursor` is `null`, you have reached the final page. + +### Track thread status changes + +`thread/status/changed` is emitted whenever a loaded thread's runtime status changes. The payload includes `threadId` and the new `status`. + +```json +{ + "method": "thread/status/changed", + "params": { + "threadId": "thr_123", + "status": { "type": "active", "activeFlags": ["waitingOnApproval"] } + } +} +``` + +### List loaded threads + +`thread/loaded/list` returns thread IDs currently loaded in memory. + +```json +{ "method": "thread/loaded/list", "id": 21 } +{ "id": 21, "result": { "data": ["thr_123", "thr_456"] } } +``` + +### Unsubscribe from a loaded thread + +`thread/unsubscribe` removes the current connection's subscription to a thread. The response status is one of: + +- `unsubscribed` when the connection was subscribed and is now removed. 
+- `notSubscribed` when the connection was not subscribed to that thread. +- `notLoaded` when the thread is not loaded. + +If this was the last subscriber, the server unloads the thread and emits a `thread/status/changed` transition to `notLoaded` plus `thread/closed`. + +```json +{ "method": "thread/unsubscribe", "id": 22, "params": { "threadId": "thr_123" } } +{ "id": 22, "result": { "status": "unsubscribed" } } +{ "method": "thread/status/changed", "params": { + "threadId": "thr_123", + "status": { "type": "notLoaded" } +} } +{ "method": "thread/closed", "params": { "threadId": "thr_123" } } +``` + +### Archive a thread + +Use `thread/archive` to move the persisted thread log (stored as a JSONL file on disk) into the archived sessions directory. + +```json +{ "method": "thread/archive", "id": 22, "params": { "threadId": "thr_b" } } +{ "id": 22, "result": {} } +{ "method": "thread/archived", "params": { "threadId": "thr_b" } } +``` + +Archived threads won't appear in future calls to `thread/list` unless you pass `archived: true`. + +### Unarchive a thread + +Use `thread/unarchive` to move an archived thread rollout back into the active sessions directory. + +```json +{ "method": "thread/unarchive", "id": 24, "params": { "threadId": "thr_b" } } +{ "id": 24, "result": { "thread": { "id": "thr_b", "name": "Bug bash notes" } } } +{ "method": "thread/unarchived", "params": { "threadId": "thr_b" } } +``` + +### Trigger thread compaction + +Use `thread/compact/start` to trigger manual history compaction for a thread. The request returns immediately with `{}`. + +App-server emits progress as standard `turn/*` and `item/*` notifications on the same `threadId`, including a `contextCompaction` item lifecycle (`item/started` then `item/completed`). 
+ +```json +{ "method": "thread/compact/start", "id": 25, "params": { "threadId": "thr_b" } } +{ "id": 25, "result": {} } +``` + +### Roll back recent turns + +Use `thread/rollback` to remove the last `numTurns` entries from the in-memory context and persist a rollback marker in the rollout log. The returned `thread` includes `turns` populated after the rollback. + +```json +{ "method": "thread/rollback", "id": 26, "params": { "threadId": "thr_b", "numTurns": 1 } } +{ "id": 26, "result": { "thread": { "id": "thr_b", "name": "Bug bash notes", "ephemeral": false } } } +``` + +## Turns + +The `input` field accepts a list of items: + +- `{ "type": "text", "text": "Explain this diff" }` +- `{ "type": "image", "url": "https://.../design.png" }` +- `{ "type": "localImage", "path": "/tmp/screenshot.png" }` + +You can override configuration settings per turn (model, effort, personality, `cwd`, sandbox policy, summary). When specified, these settings become the defaults for later turns on the same thread. `outputSchema` applies only to the current turn. For `sandboxPolicy.type = "externalSandbox"`, set `networkAccess` to `restricted` or `enabled`; for `workspaceWrite`, `networkAccess` remains a boolean. + +For `turn/start.collaborationMode`, `settings.developer_instructions: null` means "use built-in instructions for the selected mode" rather than clearing mode instructions. + +### Sandbox read access (`ReadOnlyAccess`) + +`sandboxPolicy` supports explicit read-access controls: + +- `readOnly`: optional `access` (`{ "type": "fullAccess" }` by default, or restricted roots). +- `workspaceWrite`: optional `readOnlyAccess` (`{ "type": "fullAccess" }` by default, or restricted roots). + +Restricted read access shape: + +```json +{ + "type": "restricted", + "includePlatformDefaults": true, + "readableRoots": ["/Users/me/shared-read-only"] +} +``` + +On macOS, `includePlatformDefaults: true` appends a curated platform-default Seatbelt policy for restricted-read sessions. 
This improves tool compatibility without broadly allowing all of `/System`. + +Examples: + +```json +{ "type": "readOnly", "access": { "type": "fullAccess" } } +``` + +```json +{ + "type": "workspaceWrite", + "writableRoots": ["/Users/me/project"], + "readOnlyAccess": { + "type": "restricted", + "includePlatformDefaults": true, + "readableRoots": ["/Users/me/shared-read-only"] + }, + "networkAccess": false +} +``` + +### Start a turn + +```json +{ "method": "turn/start", "id": 30, "params": { + "threadId": "thr_123", + "input": [ { "type": "text", "text": "Run tests" } ], + "cwd": "/Users/me/project", + "approvalPolicy": "unlessTrusted", + "sandboxPolicy": { + "type": "workspaceWrite", + "writableRoots": ["/Users/me/project"], + "networkAccess": true + }, + "model": "gpt-5.1-codex", + "effort": "medium", + "summary": "concise", + "personality": "friendly", + "outputSchema": { + "type": "object", + "properties": { "answer": { "type": "string" } }, + "required": ["answer"], + "additionalProperties": false + } +} } +{ "id": 30, "result": { "turn": { "id": "turn_456", "status": "inProgress", "items": [], "error": null } } } +``` + +### Steer an active turn + +Use `turn/steer` to append more user input to the active in-flight turn. + +- Include `expectedTurnId`; it must match the active turn id. +- The request fails if there is no active turn on the thread. +- `turn/steer` doesn't emit a new `turn/started` notification. +- `turn/steer` doesn't accept turn-level overrides (`model`, `cwd`, `sandboxPolicy`, or `outputSchema`). + +```json +{ "method": "turn/steer", "id": 32, "params": { + "threadId": "thr_123", + "input": [ { "type": "text", "text": "Actually focus on failing tests first." } ], + "expectedTurnId": "turn_456" +} } +{ "id": 32, "result": { "turnId": "turn_456" } } +``` + +### Start a turn (invoke a skill) + +Invoke a skill explicitly by including `$` in the text input and adding a `skill` input item alongside it. 
+ +```json +{ "method": "turn/start", "id": 33, "params": { + "threadId": "thr_123", + "input": [ + { "type": "text", "text": "$skill-creator Add a new skill for triaging flaky CI and include step-by-step usage." }, + { "type": "skill", "name": "skill-creator", "path": "/Users/me/.codex/skills/skill-creator/SKILL.md" } + ] +} } +{ "id": 33, "result": { "turn": { "id": "turn_457", "status": "inProgress", "items": [], "error": null } } } +``` + +### Interrupt a turn + +```json +{ "method": "turn/interrupt", "id": 31, "params": { "threadId": "thr_123", "turnId": "turn_456" } } +{ "id": 31, "result": {} } +``` + +On success, the turn finishes with `status: "interrupted"`. + +## Review + +`review/start` runs the Codex reviewer for a thread and streams review items. Targets include: + +- `uncommittedChanges` +- `baseBranch` (diff against a branch) +- `commit` (review a specific commit) +- `custom` (free-form instructions) + +Use `delivery: "inline"` (default) to run the review on the existing thread, or `delivery: "detached"` to fork a new review thread. + +Example request/response: + +```json +{ "method": "review/start", "id": 40, "params": { + "threadId": "thr_123", + "delivery": "inline", + "target": { "type": "commit", "sha": "1234567deadbeef", "title": "Polish tui colors" } +} } +{ "id": 40, "result": { + "turn": { + "id": "turn_900", + "status": "inProgress", + "items": [ + { "type": "userMessage", "id": "turn_900", "content": [ { "type": "text", "text": "Review commit 1234567: Polish tui colors" } ] } + ], + "error": null + }, + "reviewThreadId": "thr_123" +} } +``` + +For a detached review, use `"delivery": "detached"`. The response is the same shape, but `reviewThreadId` will be the id of the new review thread (different from the original `threadId`). The server also emits a `thread/started` notification for that new thread before streaming the review turn. 
+ +Codex streams the usual `turn/started` notification followed by an `item/started` with an `enteredReviewMode` item: + +```json +{ + "method": "item/started", + "params": { + "item": { + "type": "enteredReviewMode", + "id": "turn_900", + "review": "current changes" + } + } +} +``` + +When the reviewer finishes, the server emits `item/started` and `item/completed` containing an `exitedReviewMode` item with the final review text: + +```json +{ + "method": "item/completed", + "params": { + "item": { + "type": "exitedReviewMode", + "id": "turn_900", + "review": "Looks solid overall..." + } + } +} +``` + +Use this notification to render the reviewer output in your client. + +## Command execution + +`command/exec` runs a single command (`argv` array) under the server sandbox without creating a thread. + +```json +{ "method": "command/exec", "id": 50, "params": { + "command": ["ls", "-la"], + "cwd": "/Users/me/project", + "sandboxPolicy": { "type": "workspaceWrite" }, + "timeoutMs": 10000 +} } +{ "id": 50, "result": { "exitCode": 0, "stdout": "...", "stderr": "" } } +``` + +Use `sandboxPolicy.type = "externalSandbox"` if you already sandbox the server process and want Codex to skip its own sandbox enforcement. For external sandbox mode, set `networkAccess` to `restricted` (default) or `enabled`. For `readOnly` and `workspaceWrite`, use the same optional `access` / `readOnlyAccess` structure shown above. + +Notes: + +- The server rejects empty `command` arrays. +- `sandboxPolicy` accepts the same shape used by `turn/start` (for example, `dangerFullAccess`, `readOnly`, `workspaceWrite`, `externalSandbox`). +- When omitted, `timeoutMs` falls back to the server default. + +### Read admin requirements (`configRequirements/read`) + +Use `configRequirements/read` to inspect the effective admin requirements loaded from `requirements.toml` and/or MDM. 
+ +```json +{ "method": "configRequirements/read", "id": 52, "params": {} } +{ "id": 52, "result": { + "requirements": { + "allowedApprovalPolicies": ["onRequest", "unlessTrusted"], + "allowedSandboxModes": ["readOnly", "workspaceWrite"], + "featureRequirements": { + "personality": true, + "unified_exec": false + }, + "network": { + "enabled": true, + "allowedDomains": ["api.openai.com"], + "allowUnixSockets": ["/tmp/example.sock"], + "dangerouslyAllowAllUnixSockets": false + } + } +} } +``` + +`result.requirements` is `null` when no requirements are configured. See the docs on [`requirements.toml`](https://developers.openai.com/codex/config-reference#requirementstoml) for details on supported keys and values. + +### Windows sandbox setup (`windowsSandbox/setupStart`) + +Custom Windows clients can trigger sandbox setup asynchronously instead of blocking on startup checks. + +```json +{ "method": "windowsSandbox/setupStart", "id": 53, "params": { "mode": "elevated" } } +{ "id": 53, "result": { "started": true } } +``` + +App-server starts setup in the background and later emits a completion notification: + +```json +{ + "method": "windowsSandbox/setupCompleted", + "params": { "mode": "elevated", "success": true, "error": null } +} +``` + +Modes: + +- `elevated` - run the elevated Windows sandbox setup path. +- `unelevated` - run the legacy setup/preflight path. + +## Events + +Event notifications are the server-initiated stream for thread lifecycles, turn lifecycles, and the items within them. After you start or resume a thread, keep reading the active transport stream for `thread/started`, `thread/archived`, `thread/unarchived`, `thread/closed`, `thread/status/changed`, `turn/*`, `item/*`, and `serverRequest/resolved` notifications. + +### Notification opt-out + +Clients can suppress specific notifications per connection by sending exact method names in `initialize.params.capabilities.optOutNotificationMethods`. 
+ +- Exact-match only: `item/agentMessage/delta` suppresses only that method. +- Unknown method names are ignored. +- Applies to both legacy (`codex/event/*`) and v2 (`thread/*`, `turn/*`, `item/*`, etc.) notifications. +- Doesn't apply to requests, responses, or errors. + +### Fuzzy file search events (experimental) + +The fuzzy file search session API emits per-query notifications: + +- `fuzzyFileSearch/sessionUpdated` - `{ sessionId, query, files }` with the current matches for the active query. +- `fuzzyFileSearch/sessionCompleted` - `{ sessionId }` once indexing and matching for that query completes. + +### Windows sandbox setup events + +- `windowsSandbox/setupCompleted` - `{ mode, success, error }` emitted after a `windowsSandbox/setupStart` request finishes. + +### Turn events + +- `turn/started` - `{ turn }` with the turn id, empty `items`, and `status: "inProgress"`. +- `turn/completed` - `{ turn }` where `turn.status` is `completed`, `interrupted`, or `failed`; failures carry `{ error: { message, codexErrorInfo?, additionalDetails? } }`. +- `turn/diff/updated` - `{ threadId, turnId, diff }` with the latest aggregated unified diff across every file change in the turn. +- `turn/plan/updated` - `{ turnId, explanation?, plan }` whenever the agent shares or changes its plan; each `plan` entry is `{ step, status }` with `status` in `pending`, `inProgress`, or `completed`. +- `thread/tokenUsage/updated` - usage updates for the active thread. + +`turn/diff/updated` and `turn/plan/updated` currently include empty `items` arrays even when item events stream. Use `item/*` notifications as the source of truth for turn items. + +### Items + +`ThreadItem` is the tagged union carried in turn responses and `item/*` notifications. Common item types include: + +- `userMessage` - `{id, content}` where `content` is a list of user inputs (`text`, `image`, or `localImage`). +- `agentMessage` - `{id, text, phase?}` containing the accumulated agent reply. 
When present, `phase` uses Responses API wire values (`commentary`, `final_answer`).
+- `plan` - `{id, text}` containing proposed plan text in plan mode. Treat the final `plan` item from `item/completed` as authoritative.
+- `reasoning` - `{id, summary, content}` where `summary` holds streamed reasoning summaries and `content` holds raw reasoning blocks.
+- `commandExecution` - `{id, command, cwd, status, commandActions, aggregatedOutput?, exitCode?, durationMs?}`.
+- `fileChange` - `{id, changes, status}` describing proposed edits; each entry in `changes` is `{path, kind, diff}`.
+- `mcpToolCall` - `{id, server, tool, status, arguments, result?, error?}`.
+- `dynamicToolCall` - `{id, tool, arguments, status, contentItems?, success?, durationMs?}` for client-executed dynamic tool invocations.
+- `collabToolCall` - `{id, tool, status, senderThreadId, receiverThreadId?, newThreadId?, prompt?, agentStatus?}`.
+- `webSearch` - `{id, query, action?}` for web search requests issued by the agent.
+- `imageView` - `{id, path}` emitted when the agent invokes the image viewer tool.
+- `enteredReviewMode` - `{id, review}` sent when the reviewer starts.
+- `exitedReviewMode` - `{id, review}` emitted when the reviewer finishes.
+- `contextCompaction` - `{id}` emitted when Codex compacts the conversation history.
+
+For `webSearch.action`, the action `type` can be `search` (`query?`, `queries?`), `openPage` (`url?`), or `findInPage` (`url?`, `pattern?`).
+
+The legacy `thread/compacted` notification is deprecated; use the `contextCompaction` item instead.
+
+All items emit two shared lifecycle events:
+
+- `item/started` - emits the full `item` when a new unit of work begins; the `item.id` matches the `itemId` used by deltas.
+- `item/completed` - sends the final `item` once work finishes; treat this as the authoritative state.
+
+### Item deltas
+
+- `item/agentMessage/delta` - appends streamed text for the agent message.
+- `item/plan/delta` - streams proposed plan text. The final `plan` item may not exactly equal the concatenated deltas.
+- `item/reasoning/summaryTextDelta` - streams readable reasoning summaries; `summaryIndex` increments when a new summary section opens.
+- `item/reasoning/summaryPartAdded` - marks a boundary between reasoning summary sections.
+- `item/reasoning/textDelta` - streams raw reasoning text (when supported by the model).
+- `item/commandExecution/outputDelta` - streams stdout/stderr for a command; append deltas in order.
+- `item/fileChange/outputDelta` - contains the tool call response of the underlying `apply_patch` tool call.
+
+## Errors
+
+If a turn fails, the server emits an `error` event with `{ error: { message, codexErrorInfo?, additionalDetails? } }` and then finishes the turn with `status: "failed"`.
+
+Common `codexErrorInfo` values include:
+
+- `ContextWindowExceeded`
+- `UsageLimitExceeded`
+- `HttpConnectionFailed` (4xx/5xx upstream errors)
+- `ResponseStreamConnectionFailed`
+- `ResponseStreamDisconnected`
+- `ResponseTooManyFailedAttempts`
+- `BadRequest`, `Unauthorized`, `SandboxError`, `InternalServerError`, `Other`
+
+When an upstream HTTP status is available, the server forwards it as `httpStatusCode` on the relevant `codexErrorInfo` variant.
+
+## Approvals
+
+Depending on a user's Codex settings, command execution and file changes may require approval. The app-server sends a server-initiated JSON-RPC request to the client, and the client responds with a decision payload.
+
+- Command execution decisions: `accept`, `acceptForSession`, `decline`, `cancel`, or `{ "acceptWithExecpolicyAmendment": { "execpolicy_amendment": ["cmd", "..."] } }`.
+- File change decisions: `accept`, `acceptForSession`, `decline`, `cancel`.
+
+- Requests include `threadId` and `turnId` - use them to scope UI state to the active conversation.
+- The server resumes or declines the work and ends the item with `item/completed`. + +### Command execution approvals + +Order of messages: + +1. `item/started` shows the pending `commandExecution` item with `command`, `cwd`, and other fields. +2. `item/commandExecution/requestApproval` includes `itemId`, `threadId`, `turnId`, optional `reason`, optional `command`, optional `cwd`, optional `commandActions`, optional `proposedExecpolicyAmendment`, optional `networkApprovalContext`, and optional `availableDecisions`. When `initialize.params.capabilities.experimentalApi = true`, the payload can also include experimental `additionalPermissions` describing requested per-command sandbox access. Any filesystem paths inside `additionalPermissions` are absolute on the wire. +3. Client responds with one of the command execution approval decisions above. +4. `serverRequest/resolved` confirms that the pending request has been answered or cleared. +5. `item/completed` returns the final `commandExecution` item with `status: completed | failed | declined`. + +When `networkApprovalContext` is present, the prompt is for managed network access (not a general shell-command approval). The current v2 schema exposes the target `host` and `protocol`; clients should render a network-specific prompt and not rely on `command` being a user-meaningful shell command preview. + +Codex groups concurrent network approval prompts by destination (`host`, protocol, and port). The app-server may therefore send one prompt that unblocks multiple queued requests to the same destination, while different ports on the same host are treated separately. + +### File change approvals + +Order of messages: + +1. `item/started` emits a `fileChange` item with proposed `changes` and `status: "inProgress"`. +2. `item/fileChange/requestApproval` includes `itemId`, `threadId`, `turnId`, optional `reason`, and optional `grantRoot`. +3. Client responds with one of the file change approval decisions above. +4. 
`serverRequest/resolved` confirms that the pending request has been answered or cleared. +5. `item/completed` returns the final `fileChange` item with `status: completed | failed | declined`. + +### `tool/requestUserInput` + +When the client responds to `item/tool/requestUserInput`, app-server emits `serverRequest/resolved` with `{ threadId, requestId }`. If the pending request is cleared by turn start, turn completion, or turn interruption before the client answers, the server emits the same notification for that cleanup. + +### Dynamic tool calls (experimental) + +`dynamicTools` on `thread/start` and the corresponding `item/tool/call` request or response flow are experimental APIs. + +When a dynamic tool is invoked during a turn, app-server emits: + +1. `item/started` with `item.type = "dynamicToolCall"`, `status = "inProgress"`, plus `tool` and `arguments`. +2. `item/tool/call` as a server request to the client. +3. The client response payload with returned content items. +4. `item/completed` with `item.type = "dynamicToolCall"`, the final `status`, and any returned `contentItems` or `success` value. + +### MCP tool-call approvals (apps) + +App (connector) tool calls can also require approval. When an app tool call has side effects, the server may elicit approval with `tool/requestUserInput` and options such as **Accept**, **Decline**, and **Cancel**. Destructive tool annotations always trigger approval even when the tool also advertises less-privileged hints. If the user declines or cancels, the related `mcpToolCall` item completes with an error instead of running the tool. + +## Skills + +Invoke a skill by including `$` in the user text input. Add a `skill` input item (recommended) so the server injects full skill instructions instead of relying on the model to resolve the name. 
+ +```json +{ + "method": "turn/start", + "id": 101, + "params": { + "threadId": "thread-1", + "input": [ + { + "type": "text", + "text": "$skill-creator Add a new skill for triaging flaky CI." + }, + { + "type": "skill", + "name": "skill-creator", + "path": "/Users/me/.codex/skills/skill-creator/SKILL.md" + } + ] + } +} +``` + +If you omit the `skill` item, the model will still parse the `$` marker and try to locate the skill, which can add latency. + +Example: + +``` +$skill-creator Add a new skill for triaging flaky CI and include step-by-step usage. +``` + +Use `skills/list` to fetch available skills (optionally scoped by `cwds`, with `forceReload`). You can also include `perCwdExtraUserRoots` to scan extra absolute paths as `user` scope for specific `cwd` values. App-server ignores entries whose `cwd` isn't present in `cwds`. `skills/list` may reuse a cached result per `cwd`; set `forceReload: true` to refresh from disk. When present, the server reads `interface` and `dependencies` from `SKILL.json`. 
+ +```json +{ "method": "skills/list", "id": 25, "params": { + "cwds": ["/Users/me/project", "/Users/me/other-project"], + "forceReload": true, + "perCwdExtraUserRoots": [ + { + "cwd": "/Users/me/project", + "extraUserRoots": ["/Users/me/shared-skills"] + } + ] +} } +{ "id": 25, "result": { + "data": [{ + "cwd": "/Users/me/project", + "skills": [ + { + "name": "skill-creator", + "description": "Create or update a Codex skill", + "enabled": true, + "interface": { + "displayName": "Skill Creator", + "shortDescription": "Create or update a Codex skill" + }, + "dependencies": { + "tools": [ + { + "type": "env_var", + "value": "GITHUB_TOKEN", + "description": "GitHub API token" + }, + { + "type": "mcp", + "value": "github", + "transport": "streamable_http", + "url": "https://example.com/mcp" + } + ] + } + } + ], + "errors": [] + }] +} } +``` + +To enable or disable a skill by path: + +```json +{ + "method": "skills/config/write", + "id": 26, + "params": { + "path": "/Users/me/.codex/skills/skill-creator/SKILL.md", + "enabled": false + } +} +``` + +## Apps (connectors) + +Use `app/list` to fetch available apps. In the CLI/TUI, `/apps` is the user-facing picker; in custom clients, call `app/list` directly. Each entry includes both `isAccessible` (available to the user) and `isEnabled` (enabled in `config.toml`) so clients can distinguish install/access from local enabled state. App entries can also include optional `branding`, `appMetadata`, and `labels` fields. 
+ +```json +{ "method": "app/list", "id": 50, "params": { + "cursor": null, + "limit": 50, + "threadId": "thread-1", + "forceRefetch": false +} } +{ "id": 50, "result": { + "data": [ + { + "id": "demo-app", + "name": "Demo App", + "description": "Example connector for documentation.", + "logoUrl": "https://example.com/demo-app.png", + "logoUrlDark": null, + "distributionChannel": null, + "branding": null, + "appMetadata": null, + "labels": null, + "installUrl": "https://chatgpt.com/apps/demo-app/demo-app", + "isAccessible": true, + "isEnabled": true + } + ], + "nextCursor": null +} } +``` + +If you provide `threadId`, app feature gating (`features.apps`) uses that thread's config snapshot. When omitted, app-server uses the latest global config. + +`app/list` returns after both accessible apps and directory apps load. Set `forceRefetch: true` to bypass app caches and fetch fresh data. Cache entries are only replaced when refreshes succeed. + +The server also emits `app/list/updated` notifications whenever either source (accessible apps or directory apps) finishes loading. Each notification includes the latest merged app list. + +```json +{ + "method": "app/list/updated", + "params": { + "data": [ + { + "id": "demo-app", + "name": "Demo App", + "description": "Example connector for documentation.", + "logoUrl": "https://example.com/demo-app.png", + "logoUrlDark": null, + "distributionChannel": null, + "branding": null, + "appMetadata": null, + "labels": null, + "installUrl": "https://chatgpt.com/apps/demo-app/demo-app", + "isAccessible": true, + "isEnabled": true + } + ] + } +} +``` + +Invoke an app by inserting `$` in the text input and adding a `mention` input item with the `app://` path (recommended). + +```json +{ + "method": "turn/start", + "id": 51, + "params": { + "threadId": "thread-1", + "input": [ + { + "type": "text", + "text": "$demo-app Pull the latest updates from the team." 
+ }, + { + "type": "mention", + "name": "Demo App", + "path": "app://demo-app" + } + ] + } +} +``` + +### Config RPC examples for app settings + +Use `config/read`, `config/value/write`, and `config/batchWrite` to inspect or update app controls in `config.toml`. + +Read the effective app config shape (including `_default` and per-tool overrides): + +```json +{ "method": "config/read", "id": 60, "params": { "includeLayers": false } } +{ "id": 60, "result": { + "config": { + "apps": { + "_default": { + "enabled": true, + "destructive_enabled": true, + "open_world_enabled": true + }, + "google_drive": { + "enabled": true, + "destructive_enabled": false, + "default_tools_approval_mode": "prompt", + "tools": { + "files/delete": { "enabled": false, "approval_mode": "approve" } + } + } + } + } +} } +``` + +Update a single app setting: + +```json +{ + "method": "config/value/write", + "id": 61, + "params": { + "keyPath": "apps.google_drive.default_tools_approval_mode", + "value": "prompt", + "mergeStrategy": "replace" + } +} +``` + +Apply multiple app edits atomically: + +```json +{ + "method": "config/batchWrite", + "id": 62, + "params": { + "edits": [ + { + "keyPath": "apps._default.destructive_enabled", + "value": false, + "mergeStrategy": "upsert" + }, + { + "keyPath": "apps.google_drive.tools.files/delete.approval_mode", + "value": "approve", + "mergeStrategy": "upsert" + } + ] + } +} +``` + +### Detect and import external agent config + +Use `externalAgentConfig/detect` to discover migratable external-agent artifacts, then pass the selected entries to `externalAgentConfig/import`. 
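Client-side, the two calls chain directly: filter the detect result down to the items the user accepted, then pass those objects back verbatim as `migrationItems`. A minimal sketch, assuming the item shape shown in the wire examples that follow (`select_agents_md` is a hypothetical helper name):

```python
# Sketch: keep only AGENTS_MD migration items for one project and
# build the externalAgentConfig/import params from them unchanged.
def select_agents_md(detected: list[dict], cwd: str) -> list[dict]:
    return [
        item
        for item in detected
        if item.get("itemType") == "AGENTS_MD" and item.get("cwd") == cwd
    ]


detected = [
    {
        "itemType": "AGENTS_MD",
        "description": "Import /Users/me/project/CLAUDE.md to /Users/me/project/AGENTS.md.",
        "cwd": "/Users/me/project",
    },
    {
        "itemType": "SKILLS",
        "description": "Copy skill folders from ~/.claude/skills to ~/.agents/skills.",
        "cwd": None,
    },
]
migration_items = select_agents_md(detected, "/Users/me/project")
import_params = {"migrationItems": migration_items}
```

Passing the detected objects through unmodified matters: the import call expects the same `itemType`/`description`/`cwd` triples that detection returned.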
+ +Detection example: + +```json +{ "method": "externalAgentConfig/detect", "id": 63, "params": { + "includeHome": true, + "cwds": ["/Users/me/project"] +} } +{ "id": 63, "result": { + "items": [ + { + "itemType": "AGENTS_MD", + "description": "Import /Users/me/project/CLAUDE.md to /Users/me/project/AGENTS.md.", + "cwd": "/Users/me/project" + }, + { + "itemType": "SKILLS", + "description": "Copy skill folders from /Users/me/.claude/skills to /Users/me/.agents/skills.", + "cwd": null + } + ] +} } +``` + +Import example: + +```json +{ "method": "externalAgentConfig/import", "id": 64, "params": { + "migrationItems": [ + { + "itemType": "AGENTS_MD", + "description": "Import /Users/me/project/CLAUDE.md to /Users/me/project/AGENTS.md.", + "cwd": "/Users/me/project" + } + ] +} } +{ "id": 64, "result": {} } +``` + +Supported `itemType` values are `AGENTS_MD`, `CONFIG`, `SKILLS`, and `MCP_SERVER_CONFIG`. Detection returns only items that still have work to do. For example, AGENTS migration is skipped when `AGENTS.md` already exists and is non-empty, and skill imports do not overwrite existing skill directories. + +## Auth endpoints + +The JSON-RPC auth/account surface exposes request/response methods plus server-initiated notifications (no `id`). Use these to determine auth state, start or cancel logins, logout, and inspect ChatGPT rate limits. + +### Authentication modes + +Codex supports three authentication modes. `account/updated.authMode` shows the active mode, and `account/read` also reports it. + +- **API key (`apikey`)** - the caller supplies an OpenAI API key and Codex stores it for API requests. +- **ChatGPT managed (`chatgpt`)** - Codex owns the ChatGPT OAuth flow, persists tokens, and refreshes them automatically. +- **ChatGPT external tokens (`chatgptAuthTokens`)** - a host app supplies `idToken` and `accessToken` directly. Codex stores these tokens in memory, and the host app must refresh them when asked. 
+ +### API overview + +- `account/read` - fetch current account info; optionally refresh tokens. +- `account/login/start` - begin login (`apiKey`, `chatgpt`, or `chatgptAuthTokens`). +- `account/login/completed` (notify) - emitted when a login attempt finishes (success or error). +- `account/login/cancel` - cancel a pending ChatGPT login by `loginId`. +- `account/logout` - sign out; triggers `account/updated`. +- `account/updated` (notify) - emitted whenever auth mode changes (`authMode`: `apikey`, `chatgpt`, `chatgptAuthTokens`, or `null`). +- `account/chatgptAuthTokens/refresh` (server request) - request fresh externally managed ChatGPT tokens after an authorization error. +- `account/rateLimits/read` - fetch ChatGPT rate limits. +- `account/rateLimits/updated` (notify) - emitted whenever a user's ChatGPT rate limits change. +- `mcpServer/oauthLogin/completed` (notify) - emitted after a `mcpServer/oauth/login` flow finishes; payload includes `{ name, success, error? }`. + +### 1) Check auth state + +Request: + +```json +{ "method": "account/read", "id": 1, "params": { "refreshToken": false } } +``` + +Response examples: + +```json +{ "id": 1, "result": { "account": null, "requiresOpenaiAuth": false } } +``` + +```json +{ "id": 1, "result": { "account": null, "requiresOpenaiAuth": true } } +``` + +```json +{ + "id": 1, + "result": { "account": { "type": "apiKey" }, "requiresOpenaiAuth": true } +} +``` + +```json +{ + "id": 1, + "result": { + "account": { + "type": "chatgpt", + "email": "user@example.com", + "planType": "pro" + }, + "requiresOpenaiAuth": true + } +} +``` + +Field notes: + +- `refreshToken` (boolean): set `true` to force a token refresh in managed ChatGPT mode. In external token mode (`chatgptAuthTokens`), app-server ignores this flag. +- `requiresOpenaiAuth` reflects the active provider; when `false`, Codex can run without OpenAI credentials. + +### 2) Log in with an API key + +1. 
Send: + + ```json + { + "method": "account/login/start", + "id": 2, + "params": { "type": "apiKey", "apiKey": "sk-..." } + } + ``` + +2. Expect: + + ```json + { "id": 2, "result": { "type": "apiKey" } } + ``` + +3. Notifications: + + ```json + { + "method": "account/login/completed", + "params": { "loginId": null, "success": true, "error": null } + } + ``` + + ```json + { "method": "account/updated", "params": { "authMode": "apikey" } } + ``` + +### 3) Log in with ChatGPT (browser flow) + +1. Start: + + ```json + { "method": "account/login/start", "id": 3, "params": { "type": "chatgpt" } } + ``` + + ```json + { + "id": 3, + "result": { + "type": "chatgpt", + "loginId": "", + "authUrl": "https://chatgpt.com/...&redirect_uri=http%3A%2F%2Flocalhost%3A%2Fauth%2Fcallback" + } + } + ``` + +2. Open `authUrl` in a browser; the app-server hosts the local callback. +3. Wait for notifications: + + ```json + { + "method": "account/login/completed", + "params": { "loginId": "", "success": true, "error": null } + } + ``` + + ```json + { "method": "account/updated", "params": { "authMode": "chatgpt" } } + ``` + +### 3b) Log in with externally managed ChatGPT tokens (`chatgptAuthTokens`) + +Use this mode when a host application owns the user's ChatGPT auth lifecycle and supplies tokens directly. + +1. Send: + + ```json + { + "method": "account/login/start", + "id": 7, + "params": { + "type": "chatgptAuthTokens", + "idToken": "", + "accessToken": "" + } + } + ``` + +2. Expect: + + ```json + { "id": 7, "result": { "type": "chatgptAuthTokens" } } + ``` + +3. 
Notifications: + + ```json + { + "method": "account/login/completed", + "params": { "loginId": null, "success": true, "error": null } + } + ``` + + ```json + { + "method": "account/updated", + "params": { "authMode": "chatgptAuthTokens" } + } + ``` + +When the server receives a `401 Unauthorized`, it may request refreshed tokens from the host app: + +```json +{ + "method": "account/chatgptAuthTokens/refresh", + "id": 8, + "params": { "reason": "unauthorized", "previousAccountId": "org-123" } +} +{ "id": 8, "result": { "idToken": "", "accessToken": "" } } +``` + +The server retries the original request after a successful refresh response. Requests time out after about 10 seconds. + +### 4) Cancel a ChatGPT login + +```json +{ "method": "account/login/cancel", "id": 4, "params": { "loginId": "" } } +{ "method": "account/login/completed", "params": { "loginId": "", "success": false, "error": "..." } } +``` + +### 5) Logout + +```json +{ "method": "account/logout", "id": 5 } +{ "id": 5, "result": {} } +{ "method": "account/updated", "params": { "authMode": null } } +``` + +### 6) Rate limits (ChatGPT) + +```json +{ "method": "account/rateLimits/read", "id": 6 } +{ "id": 6, "result": { + "rateLimits": { + "limitId": "codex", + "limitName": null, + "primary": { "usedPercent": 25, "windowDurationMins": 15, "resetsAt": 1730947200 }, + "secondary": null + }, + "rateLimitsByLimitId": { + "codex": { + "limitId": "codex", + "limitName": null, + "primary": { "usedPercent": 25, "windowDurationMins": 15, "resetsAt": 1730947200 }, + "secondary": null + }, + "codex_other": { + "limitId": "codex_other", + "limitName": "codex_other", + "primary": { "usedPercent": 42, "windowDurationMins": 60, "resetsAt": 1730950800 }, + "secondary": null + } + } +} } +{ "method": "account/rateLimits/updated", "params": { + "rateLimits": { + "limitId": "codex", + "primary": { "usedPercent": 31, "windowDurationMins": 15, "resetsAt": 1730948100 } + } +} } +``` + +Field notes: + +- `rateLimits` is the 
backward-compatible single-bucket view. +- `rateLimitsByLimitId` (when present) is the multi-bucket view keyed by metered `limit_id` (for example `codex`). +- `limitId` is the metered bucket identifier. +- `limitName` is an optional user-facing label for the bucket. +- `usedPercent` is current usage within the quota window. +- `windowDurationMins` is the quota window length. +- `resetsAt` is a Unix timestamp (seconds) for the next reset. diff --git a/harness/Cargo.lock b/harness/Cargo.lock new file mode 100644 index 0000000..ebeb75d --- /dev/null +++ b/harness/Cargo.lock @@ -0,0 +1,256 @@ +# This file is automatically @generated by Cargo. +# It is not intended for manual editing. +version = 4 + +[[package]] +name = "anstream" +version = "1.0.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "824a212faf96e9acacdbd09febd34438f8f711fb84e09a8916013cd7815ca28d" +dependencies = [ + "anstyle", + "anstyle-parse", + "anstyle-query", + "anstyle-wincon", + "colorchoice", + "is_terminal_polyfill", + "utf8parse", +] + +[[package]] +name = "anstyle" +version = "1.0.14" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "940b3a0ca603d1eade50a4846a2afffd5ef57a9feac2c0e2ec2e14f9ead76000" + +[[package]] +name = "anstyle-parse" +version = "1.0.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "52ce7f38b242319f7cabaa6813055467063ecdc9d355bbb4ce0c68908cd8130e" +dependencies = [ + "utf8parse", +] + +[[package]] +name = "anstyle-query" +version = "1.1.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "40c48f72fd53cd289104fc64099abca73db4166ad86ea0b4341abe65af83dadc" +dependencies = [ + "windows-sys", +] + +[[package]] +name = "anstyle-wincon" +version = "3.0.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "291e6a250ff86cd4a820112fb8898808a366d8f9f58ce16d1f538353ad55747d" +dependencies = [ + "anstyle", + "once_cell_polyfill", + 
"windows-sys", +] + +[[package]] +name = "anyhow" +version = "1.0.102" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7f202df86484c868dbad7eaa557ef785d5c66295e41b460ef922eca0723b842c" + +[[package]] +name = "clap" +version = "4.6.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b193af5b67834b676abd72466a96c1024e6a6ad978a1f484bd90b85c94041351" +dependencies = [ + "clap_builder", + "clap_derive", +] + +[[package]] +name = "clap_builder" +version = "4.6.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "714a53001bf66416adb0e2ef5ac857140e7dc3a0c48fb28b2f10762fc4b5069f" +dependencies = [ + "anstream", + "anstyle", + "clap_lex", + "strsim", +] + +[[package]] +name = "clap_derive" +version = "4.6.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1110bd8a634a1ab8cb04345d8d878267d57c3cf1b38d91b71af6686408bbca6a" +dependencies = [ + "heck", + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "clap_lex" +version = "1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c8d4a3bb8b1e0c1050499d1815f5ab16d04f0959b233085fb31653fbfc9d98f9" + +[[package]] +name = "colorchoice" +version = "1.0.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1d07550c9036bf2ae0c684c4297d503f838287c83c53686d05370d0e139ae570" + +[[package]] +name = "harnesscli" +version = "0.1.0" +dependencies = [ + "anyhow", + "clap", + "serde", + "serde_json", +] + +[[package]] +name = "heck" +version = "0.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2304e00983f87ffb38b55b444b5e3b60a884b5d30c0fca7d82fe33449bbe55ea" + +[[package]] +name = "is_terminal_polyfill" +version = "1.70.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a6cb138bb79a146c1bd460005623e142ef0181e3d0219cb493e02f7d08a35695" + +[[package]] +name = "itoa" +version = "1.0.17" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "92ecc6618181def0457392ccd0ee51198e065e016d1d527a7ac1b6dc7c1f09d2" + +[[package]] +name = "memchr" +version = "2.8.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f8ca58f447f06ed17d5fc4043ce1b10dd205e060fb3ce5b979b8ed8e59ff3f79" + +[[package]] +name = "once_cell_polyfill" +version = "1.70.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "384b8ab6d37215f3c5301a95a4accb5d64aa607f1fcb26a11b5303878451b4fe" + +[[package]] +name = "proc-macro2" +version = "1.0.106" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8fd00f0bb2e90d81d1044c2b32617f68fcb9fa3bb7640c23e9c748e53fb30934" +dependencies = [ + "unicode-ident", +] + +[[package]] +name = "quote" +version = "1.0.45" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41f2619966050689382d2b44f664f4bc593e129785a36d6ee376ddf37259b924" +dependencies = [ + "proc-macro2", +] + +[[package]] +name = "serde" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9a8e94ea7f378bd32cbbd37198a4a91436180c5bb472411e48b5ec2e2124ae9e" +dependencies = [ + "serde_core", + "serde_derive", +] + +[[package]] +name = "serde_core" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41d385c7d4ca58e59fc732af25c3983b67ac852c1a25000afe1175de458b67ad" +dependencies = [ + "serde_derive", +] + +[[package]] +name = "serde_derive" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d540f220d3187173da220f885ab66608367b6574e925011a9353e4badda91d79" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "serde_json" +version = "1.0.149" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "83fc039473c5595ace860d8c4fafa220ff474b3fc6bfdb4293327f1a37e94d86" +dependencies = [ + 
"itoa", + "memchr", + "serde", + "serde_core", + "zmij", +] + +[[package]] +name = "strsim" +version = "0.11.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7da8b5736845d9f2fcb837ea5d9e2628564b3b043a70948a3f0b778838c5fb4f" + +[[package]] +name = "syn" +version = "2.0.117" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e665b8803e7b1d2a727f4023456bbbbe74da67099c585258af0ad9c5013b9b99" +dependencies = [ + "proc-macro2", + "quote", + "unicode-ident", +] + +[[package]] +name = "unicode-ident" +version = "1.0.24" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e6e4313cd5fcd3dad5cafa179702e2b244f760991f45397d14d4ebf38247da75" + +[[package]] +name = "utf8parse" +version = "0.2.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "06abde3611657adf66d383f00b093d7faecc7fa57071cce2578660c9f1010821" + +[[package]] +name = "windows-link" +version = "0.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f0805222e57f7521d6a62e36fa9163bc891acd422f971defe97d64e70d0a4fe5" + +[[package]] +name = "windows-sys" +version = "0.61.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ae137229bcbd6cdf0f7b80a31df61766145077ddf49416a728b02cb3921ff3fc" +dependencies = [ + "windows-link", +] + +[[package]] +name = "zmij" +version = "1.0.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b8848ee67ecc8aedbaf3e4122217aff892639231befc6a1b58d29fff4c2cabaa" diff --git a/harness/Cargo.toml b/harness/Cargo.toml new file mode 100644 index 0000000..5b88138 --- /dev/null +++ b/harness/Cargo.toml @@ -0,0 +1,10 @@ +[package] +name = "harnesscli" +version = "0.1.0" +edition = "2024" + +[dependencies] +anyhow = "1.0" +clap = { version = "4.5", features = ["derive"] } +serde = { version = "1.0", features = ["derive"] } +serde_json = "1.0" diff --git a/harness/src/cmd/audit.rs 
b/harness/src/cmd/audit.rs new file mode 100644 index 0000000..8f98716 --- /dev/null +++ b/harness/src/cmd/audit.rs @@ -0,0 +1,171 @@ +use crate::util::{CommandError, OutputBundle, repo_root, run_exec}; +use serde::Serialize; +use serde_json::json; +use std::path::{Path, PathBuf}; + +#[derive(Debug, Serialize)] +struct AuditCheck { + label: String, + path: String, + kind: String, + passed: bool, +} + +pub fn run(path: PathBuf) -> Result<OutputBundle, CommandError> { + let root = repo_root(&path) + .or_else(|_| path.canonicalize().map_err(anyhow::Error::from)) + .map_err(audit_error)?; + + let mut checks = required_file_checks(&root); + checks.extend(required_dir_checks(&root)); + checks.push(build_check(&root)); + + let failed = checks.iter().filter(|check| !check.passed).count(); + let passed = failed == 0; + let summary = json!({ + "passed": passed, + "failed": failed, + "total": checks.len(), + }); + + let text = render_text(&checks, passed); + let json_body = json!({ + "command": "audit", + "status": if passed { "ok" } else { "failed" }, + "passed": passed, + "summary": summary, + "checks": checks, + }); + + if !passed { + return Err( + CommandError::new("audit", "audit_failed", "Harness audit failed").with_details( + json!({ + "summary": summary, + "checks": checks, + }), + ), + ); + } + + let mut ndjson = checks + .iter() + .map(|check| { + json!({ + "command": "audit", + "kind": check.kind, + "path": check.path, + "label": check.label, + "passed": check.passed, + }) + }) + .collect::<Vec<_>>(); + ndjson.push(json!({ + "command": "audit", + "summary": summary, + "passed": true, + })); + + Ok(OutputBundle { + text, + json: json_body, + ndjson, + }) +} + +fn required_file_checks(root: &Path) -> Vec<AuditCheck> { + let files = vec![ + "AGENTS.md", + "ARCHITECTURE.md", + "NON_NEGOTIABLE_RULES.md", + "docs/PLANS.md", + "docs/design-docs/index.md", + "docs/design-docs/local-operations.md", + "docs/design-docs/worktree-isolation.md", + "docs/design-docs/observability-shim.md",
"docs/exec-plans/tech-debt-tracker.md", + "docs/product-specs/index.md", + "docs/product-specs/harness-demo-app.md", + "Makefile.harness", + "harness/Cargo.toml", + ".github/workflows/harness.yml", + ]; + files + .into_iter() + .map(|path| AuditCheck { + label: format!("{path} exists"), + path: path.to_string(), + kind: "file".to_string(), + passed: root.join(path).exists(), + }) + .collect() +} + +fn required_dir_checks(root: &Path) -> Vec<AuditCheck> { + let dirs = vec![ + "docs/design-docs", + "docs/exec-plans/active", + "docs/exec-plans/completed", + "docs/product-specs", + "docs/references", + "docs/generated", + ]; + dirs.into_iter() + .map(|path| AuditCheck { + label: format!("{path} exists"), + path: path.to_string(), + kind: "directory".to_string(), + passed: root.join(path).is_dir(), + }) + .collect() +} + +fn build_check(root: &Path) -> AuditCheck { + let passed = run_exec( + root, + "cargo", + &[ + "build", + "--release", + "--manifest-path", + "harness/Cargo.toml", + ], + ) + .map(|result| result.status == 0) + .unwrap_or(false); + AuditCheck { + label: "harnesscli builds successfully".to_string(), + path: "harness/Cargo.toml".to_string(), + kind: "build".to_string(), + passed, + } +} + +fn render_text(checks: &[AuditCheck], passed: bool) -> String { + let mut lines = Vec::new(); + for check in checks { + let status = if check.passed { "[ok]" } else { "[missing]" }; + lines.push(format!("{status} {}", check.label)); + } + if passed { + lines.push("Harness audit passed.".to_string()); + } else { + lines.push("Harness audit failed.".to_string()); + } + lines.join("\n") +} + +fn audit_error(err: anyhow::Error) -> CommandError { + CommandError::new("audit", "audit_failed", err.to_string()) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn required_files_include_agents() { + let checks = required_file_checks(Path::new("/tmp")); + assert!(checks.iter().any(|check| check.path == "AGENTS.md")); + } +} diff --git a/harness/src/cmd/boot.rs
b/harness/src/cmd/boot.rs new file mode 100644 index 0000000..0e96fa5 --- /dev/null +++ b/harness/src/cmd/boot.rs @@ -0,0 +1,190 @@ +use crate::util::{ + BootMetadata, CommandError, OutputBundle, boot_metadata_path, ensure_runtime_dirs, http_ok, + is_pid_alive, read_json_file, render_command_steps, spawn_background_http_server, stop_pid, + wait_for_http_ok, worktree_context, write_json_file, +}; +use serde_json::json; +use std::fs; +use std::path::Path; +use std::time::Duration; + +pub fn start(current_dir: &Path) -> Result<OutputBundle, CommandError> { + let ctx = worktree_context(current_dir).map_err(boot_error)?; + ensure_runtime_dirs(&ctx).map_err(boot_error)?; + + if let Ok(existing) = read_json_file::<BootMetadata>(&boot_metadata_path(&ctx)) { + if is_pid_alive(existing.pid) && http_ok(&existing.healthcheck_url) { + return Ok(start_bundle("reused", &existing)); + } + } + + render_demo_app( + &ctx.runtime_root.join("demo-app"), + &ctx.worktree_id, + &ctx.runtime_root, + ctx.selected_port, + ) + .map_err(boot_error)?; + let stdout_log = ctx.runtime_root.join("logs").join("demo-app.stdout.log"); + let stderr_log = ctx.runtime_root.join("logs").join("demo-app.stderr.log"); + let pid = spawn_background_http_server( + &ctx.runtime_root.join("demo-app"), + ctx.selected_port, + &stdout_log, + &stderr_log, + ) + .map_err(boot_error)?; + + let healthcheck_url = format!("http://127.0.0.1:{}/healthz", ctx.selected_port); + if !wait_for_http_ok(&healthcheck_url, Duration::from_secs(15)) { + let stderr_excerpt = fs::read_to_string(&stderr_log) + .ok() + .and_then(|body| body.lines().last().map(str::to_string)); + return Err(CommandError::new( + "boot start", + "boot_timeout", + format!("demo app failed readiness probe at {healthcheck_url}"), + ) + .with_details(json!({ + "healthcheck_url": healthcheck_url, + "stderr_log": stderr_log, + "stderr_excerpt": stderr_excerpt, + }))); + } + + let metadata = BootMetadata { + worktree_id: ctx.worktree_id.clone(), + runtime_root: ctx.runtime_root.clone(), + pid, + app_url:
format!("http://127.0.0.1:{}/", ctx.selected_port), + healthcheck_url, + selected_port: ctx.selected_port, + stdout_log, + stderr_log, + }; + write_json_file(&boot_metadata_path(&ctx), &metadata).map_err(boot_error)?; + Ok(start_bundle("started", &metadata)) +} + +pub fn status(current_dir: &Path) -> Result<OutputBundle, CommandError> { + let ctx = worktree_context(current_dir).map_err(boot_error)?; + let metadata: BootMetadata = read_json_file(&boot_metadata_path(&ctx)).map_err(|err| { + CommandError::new("boot status", "missing_boot_metadata", err.to_string()) + })?; + let healthy = is_pid_alive(metadata.pid) && http_ok(&metadata.healthcheck_url); + + let steps = vec![json!({ + "label": "healthcheck", + "status": if healthy { "ok" } else { "failed" }, + })]; + Ok(OutputBundle { + text: render_command_steps("boot status", &steps), + json: json!({ + "command": "boot status", + "status": if healthy { "ok" } else { "failed" }, + "app_url": metadata.app_url, + "healthcheck_url": metadata.healthcheck_url, + "healthcheck_status": if healthy { "ok" } else { "failed" }, + "selected_port": metadata.selected_port, + "worktree_id": metadata.worktree_id, + "runtime_root": metadata.runtime_root, + "pid": metadata.pid, + "steps": steps, + }), + ndjson: vec![json!({ + "command": "boot status", + "status": if healthy { "ok" } else { "failed" }, + "pid": metadata.pid, + })], + }) +} + +pub fn stop(current_dir: &Path) -> Result<OutputBundle, CommandError> { + let ctx = worktree_context(current_dir).map_err(boot_error)?; + let metadata: BootMetadata = read_json_file(&boot_metadata_path(&ctx)) + .map_err(|err| CommandError::new("boot stop", "missing_boot_metadata", err.to_string()))?; + if is_pid_alive(metadata.pid) { + stop_pid(metadata.pid).map_err(boot_error)?; + } + let _ = fs::remove_file(boot_metadata_path(&ctx)); + let steps = vec![json!({"label": "stop-demo-app", "status": "ok"})]; + Ok(OutputBundle { + text: render_command_steps("boot stop", &steps), + json: json!({ + "command": "boot stop", + "status": "ok", + "stopped_pid":
metadata.pid, + "steps": steps, + }), + ndjson: vec![json!({ + "command": "boot stop", + "status": "ok", + "stopped_pid": metadata.pid, + })], + }) +} + +fn start_bundle(state: &str, metadata: &BootMetadata) -> OutputBundle { + let steps = vec![json!({"label": "demo-app", "status": state})]; + OutputBundle { + text: render_command_steps("boot start", &steps), + json: json!({ + "command": "boot start", + "status": "ok", + "result": state, + "app_url": metadata.app_url, + "healthcheck_url": metadata.healthcheck_url, + "healthcheck_status": "ok", + "selected_port": metadata.selected_port, + "worktree_id": metadata.worktree_id, + "runtime_root": metadata.runtime_root, + "pid": metadata.pid, + "steps": steps, + }), + ndjson: vec![json!({ + "command": "boot start", + "status": "ok", + "result": state, + "pid": metadata.pid, + })], + } +} + +fn render_demo_app( + dir: &Path, + worktree_id: &str, + runtime_root: &Path, + selected_port: u16, +) -> Result<(), anyhow::Error> { + fs::create_dir_all(dir)?; + fs::write( + dir.join("index.html"), + format!( + "Impactable Harness Demo

Impactable Harness Demo

  • worktree_id: {worktree_id}
  • runtime_root: {}
  • selected_port: {selected_port}
", + runtime_root.display() ), )?; + fs::write(dir.join("healthz"), "ok\n")?; + Ok(()) +} + +fn boot_error(err: anyhow::Error) -> CommandError { + CommandError::new("boot", "boot_failed", err.to_string()) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn demo_app_contains_expected_title() { + let base = + std::env::temp_dir().join(format!("impactable-boot-test-{}", std::process::id())); + let _ = fs::remove_dir_all(&base); + fs::create_dir_all(&base).unwrap(); + render_demo_app(&base, "worktree-1", Path::new("/tmp/runtime"), 4100).unwrap(); + let body = fs::read_to_string(base.join("index.html")).unwrap(); + assert!(body.contains("Impactable Harness Demo")); + let _ = fs::remove_dir_all(&base); + } +} diff --git a/harness/src/cmd/cleanup.rs b/harness/src/cmd/cleanup.rs new file mode 100644 index 0000000..14cd6ca --- /dev/null +++ b/harness/src/cmd/cleanup.rs @@ -0,0 +1,75 @@ +use crate::util::{CommandError, OutputBundle, render_command_steps, worktree_context}; +use serde_json::json; +use std::path::Path; + +pub fn scan(current_dir: &Path) -> Result<OutputBundle, CommandError> { + let ctx = worktree_context(current_dir).map_err(cleanup_error)?; + let items = vec![ + json!({"label": "phase-4-invariants", "status": "pending"}), + json!({"label": "phase-5-recurring-cleanup", "status": "pending"}), + ]; + Ok(OutputBundle { + text: render_command_steps("cleanup scan", &items), + json: json!({ + "command": "cleanup scan", + "status": "ok", + "worktree_id": ctx.worktree_id, + "items": items, + }), + ndjson: items, + }) +} + +pub fn grade(current_dir: &Path) -> Result<OutputBundle, CommandError> { + let ctx = worktree_context(current_dir).map_err(cleanup_error)?; + let grade = "C"; + let findings = 2; + let steps = vec![json!({"label": "compute-grade", "status": "ok"})]; + Ok(OutputBundle { + text: render_command_steps("cleanup grade", &steps), + json: json!({ + "command": "cleanup grade", + "status": "ok", + "worktree_id": ctx.worktree_id, + "grade": grade, + "findings": findings, + "steps": steps, + }),
ndjson: vec![json!({ + "command": "cleanup grade", + "status": "ok", + "grade": grade, + })], + }) +} + +pub fn fix(_current_dir: &Path) -> Result<OutputBundle, CommandError> { + let steps = vec![json!({"label": "queue-follow-up", "status": "ok"})]; + Ok(OutputBundle { + text: render_command_steps("cleanup fix", &steps), + json: json!({ + "command": "cleanup fix", + "status": "ok", + "result": "manual-follow-up-required", + "steps": steps, + }), + ndjson: vec![json!({ + "command": "cleanup fix", + "status": "ok", + "result": "manual-follow-up-required", + })], + }) +} + +fn cleanup_error(err: anyhow::Error) -> CommandError { + CommandError::new("cleanup", "cleanup_failed", err.to_string()) +} + +#[cfg(test)] +mod tests { + #[test] + fn grade_is_single_letter() { + let grade = "C"; + assert_eq!(grade.len(), 1); + } +} diff --git a/harness/src/cmd/init.rs b/harness/src/cmd/init.rs new file mode 100644 index 0000000..5e9123f --- /dev/null +++ b/harness/src/cmd/init.rs @@ -0,0 +1,62 @@ +use crate::util::{ + CommandError, OutputBundle, ensure_runtime_dirs, render_command_steps, runtime_manifest_path, + worktree_context, write_json_file, +}; +use serde_json::json; +use std::path::Path; + +pub fn run(current_dir: &Path) -> Result<OutputBundle, CommandError> { + let ctx = worktree_context(current_dir).map_err(init_error)?; + ensure_runtime_dirs(&ctx).map_err(init_error)?; + + write_json_file( + &runtime_manifest_path(&ctx), + &json!({ + "repo_root": ctx.repo_root, + "worktree_id": ctx.worktree_id, + "runtime_root": ctx.runtime_root, + "selected_port": ctx.selected_port, + }), + ) + .map_err(init_error)?; + + let steps = vec![ + json!({"label": "resolve-worktree", "status": "ok"}), + json!({"label": "create-runtime-root", "status": "ok"}), + json!({"label": "write-runtime-manifest", "status": "ok"}), + ]; + Ok(OutputBundle { + text: render_command_steps("init", &steps), + json: json!({ + "command": "init", + "status": "ok", + "worktree_id": ctx.worktree_id, + "repo_root": ctx.repo_root, + "runtime_root": ctx.runtime_root,
"selected_port": ctx.selected_port, + "steps": steps, + }), + ndjson: vec![json!({ + "command": "init", + "status": "ok", + "worktree_id": ctx.worktree_id, + "runtime_root": ctx.runtime_root, + })], + }) +} + +fn init_error(err: anyhow::Error) -> CommandError { + CommandError::new("init", "init_failed", err.to_string()) +} + +#[cfg(test)] +mod tests { + use crate::util::derive_worktree_id; + use std::path::Path; + + #[test] + fn worktree_id_uses_path() { + let id = derive_worktree_id(Path::new("/tmp/impactable")); + assert!(id.starts_with("impactable-")); + } +} diff --git a/harness/src/cmd/lint.rs b/harness/src/cmd/lint.rs new file mode 100644 index 0000000..cfc90ad --- /dev/null +++ b/harness/src/cmd/lint.rs @@ -0,0 +1,86 @@ +use crate::util::{ + CommandError, OutputBundle, render_command_steps, require_success, run_exec, run_shell, +}; +use serde_json::json; +use std::path::Path; + +pub fn run(repo_root: &Path) -> Result<OutputBundle, CommandError> { + if let Ok(value) = std::env::var("HARNESS_LINT_CMD") { + if !value.trim().is_empty() { + let result = require_success( + "lint", + run_shell(repo_root, &value).map_err(shell_error("lint"))?, + )?; + return Ok(bundle(vec![ + json!({"label": "override-lint", "status": "ok", "command": result.command}), + ])); + } + } + + let gofmt = run_exec(repo_root, "gofmt", &["-l", "."]).map_err(shell_error("lint"))?; + if !gofmt.stdout.trim().is_empty() { + return Err(CommandError::new( + "lint", + "formatting_required", + "gofmt reported files that need formatting", + ) + .with_details(json!({ "files": gofmt.stdout.lines().collect::<Vec<_>>() }))); + } + + let go_vet = require_success( + "lint", + run_exec(repo_root, "go", &["vet", "./..."]).map_err(shell_error("lint"))?, + )?; + let cargo_fmt = require_success( + "lint", + run_exec( + repo_root, + "cargo", + &[ + "fmt", + "--manifest-path", + "harness/Cargo.toml", + "--all", + "--", + "--check", + ], + ) + .map_err(shell_error("lint"))?, + )?; + + Ok(bundle(vec![ + json!({"label": "gofmt", "status": "ok",
"command": "gofmt -l ."}),
+        json!({"label": "go-vet", "status": "ok", "command": go_vet.command}),
+        json!({"label": "cargo-fmt", "status": "ok", "command": cargo_fmt.command}),
+    ]))
+}
+
+fn bundle(steps: Vec<serde_json::Value>) -> OutputBundle {
+    OutputBundle {
+        text: render_command_steps("lint", &steps),
+        json: json!({
+            "command": "lint",
+            "status": "ok",
+            "steps": steps,
+        }),
+        ndjson: vec![json!({
+            "command": "lint",
+            "status": "ok",
+        })],
+    }
+}
+
+fn shell_error(command: &'static str) -> impl Fn(anyhow::Error) -> CommandError {
+    move |err| CommandError::new(command, "command_start_failed", err.to_string())
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn lint_bundle_reports_command_name() {
+        let bundle = bundle(vec![json!({"label": "cargo-fmt", "status": "ok"})]);
+        assert_eq!(bundle.json["command"], "lint");
+    }
+}
diff --git a/harness/src/cmd/mod.rs b/harness/src/cmd/mod.rs
new file mode 100644
index 0000000..b92e03b
--- /dev/null
+++ b/harness/src/cmd/mod.rs
@@ -0,0 +1,9 @@
+pub mod audit;
+pub mod boot;
+pub mod cleanup;
+pub mod init;
+pub mod lint;
+pub mod observability;
+pub mod smoke;
+pub mod test;
+pub mod typecheck;
diff --git a/harness/src/cmd/observability.rs b/harness/src/cmd/observability.rs
new file mode 100644
index 0000000..d7832ec
--- /dev/null
+++ b/harness/src/cmd/observability.rs
@@ -0,0 +1,124 @@
+use crate::util::{
+    CommandError, OutputBundle, ensure_runtime_dirs, observability_metadata_path, read_json_file,
+    render_command_steps, worktree_context, write_json_file,
+};
+use serde_json::json;
+use std::fs;
+use std::path::Path;
+
+pub fn start(current_dir: &Path) -> Result<OutputBundle, CommandError> {
+    let ctx = worktree_context(current_dir).map_err(obs_error)?;
+    ensure_runtime_dirs(&ctx).map_err(obs_error)?;
+    let metadata = json!({
+        "worktree_id": ctx.worktree_id,
+        "runtime_root": ctx.runtime_root,
+        "log_query_path": ctx.runtime_root.join("logs"),
+        "status": "shim-active",
+    });
+    write_json_file(&observability_metadata_path(&ctx),
&metadata).map_err(obs_error)?;
+    let steps = vec![json!({"label": "create-observability-shim", "status": "ok"})];
+    Ok(OutputBundle {
+        text: render_command_steps("observability start", &steps),
+        json: json!({
+            "command": "observability start",
+            "status": "ok",
+            "metadata": metadata,
+            "steps": steps,
+        }),
+        ndjson: vec![json!({
+            "command": "observability start",
+            "status": "ok",
+        })],
+    })
+}
+
+pub fn stop(current_dir: &Path) -> Result<OutputBundle, CommandError> {
+    let ctx = worktree_context(current_dir).map_err(obs_error)?;
+    let _ = fs::remove_file(observability_metadata_path(&ctx));
+    let steps = vec![json!({"label": "remove-observability-shim", "status": "ok"})];
+    Ok(OutputBundle {
+        text: render_command_steps("observability stop", &steps),
+        json: json!({
+            "command": "observability stop",
+            "status": "ok",
+            "steps": steps,
+        }),
+        ndjson: vec![json!({
+            "command": "observability stop",
+            "status": "ok",
+        })],
+    })
+}
+
+pub fn query(
+    current_dir: &Path,
+    kind: &str,
+    query: Option<&str>,
+) -> Result<OutputBundle, CommandError> {
+    let ctx = worktree_context(current_dir).map_err(obs_error)?;
+    let _metadata: serde_json::Value =
+        read_json_file(&observability_metadata_path(&ctx)).map_err(|err| {
+            CommandError::new(
+                "observability query",
+                "observability_not_started",
+                err.to_string(),
+            )
+        })?;
+    let mut items = Vec::new();
+    if kind == "logs" {
+        let log_dir = ctx.runtime_root.join("logs");
+        if let Ok(entries) = fs::read_dir(log_dir) {
+            for entry in entries.flatten() {
+                if let Ok(body) = fs::read_to_string(entry.path()) {
+                    for line in body.lines() {
+                        if query.map(|needle| line.contains(needle)).unwrap_or(true) {
+                            items.push(json!({
+                                "path": entry.path(),
+                                "line": line,
+                            }));
+                        }
+                    }
+                }
+            }
+        }
+    }
+    let text = if items.is_empty() {
+        "observability query: ok\n- items: 0".to_string()
+    } else {
+        format!("observability query: ok\n- items: {}", items.len())
+    };
+    Ok(OutputBundle {
+        text,
+        json: json!({
+            "command": "observability query",
+            "status": "ok",
+            "kind": kind,
"items": items,
+            "worktree_id": ctx.worktree_id,
+            "runtime_root": ctx.runtime_root,
+        }),
+        ndjson: if items.is_empty() {
+            vec![json!({
+                "command": "observability query",
+                "status": "ok",
+                "kind": kind,
+                "items": 0,
+            })]
+        } else {
+            items
+        },
+    })
+}
+
+fn obs_error(err: anyhow::Error) -> CommandError {
+    CommandError::new("observability", "observability_failed", err.to_string())
+}
+
+#[cfg(test)]
+mod tests {
+    #[test]
+    fn log_filter_matches_substring() {
+        let line = "demo app started";
+        assert!(line.contains("started"));
+    }
+}
diff --git a/harness/src/cmd/smoke.rs b/harness/src/cmd/smoke.rs
new file mode 100644
index 0000000..6f883be
--- /dev/null
+++ b/harness/src/cmd/smoke.rs
@@ -0,0 +1,70 @@
+use crate::util::{
+    CommandError, OutputBundle, command_with_override, render_command_steps, require_success,
+    run_exec, run_shell,
+};
+use serde_json::json;
+use std::path::Path;
+
+pub fn run(repo_root: &Path) -> Result<OutputBundle, CommandError> {
+    let tmp_binary = std::env::temp_dir().join("impactable-harness-smoke-bin");
+    let tmp_binary_str = tmp_binary.to_string_lossy().to_string();
+    let resolved = command_with_override(
+        "HARNESS_SMOKE_CMD",
+        "default",
+        "go",
+        &["build", "-o", &tmp_binary_str, "./cmd/ralph-loop"],
+    );
+    let result = if resolved[0].starts_with("override:") {
+        require_success(
+            "smoke",
+            run_shell(repo_root, &resolved[1]).map_err(shell_error("smoke"))?,
+        )?
+    } else {
+        let args: Vec<&str> = resolved[2..].iter().map(String::as_str).collect();
+        require_success(
+            "smoke",
+            run_exec(repo_root, &resolved[1], &args).map_err(shell_error("smoke"))?,
+        )?
+    };
+
+    let steps = vec![json!({
+        "label": "smoke",
+        "status": "ok",
+        "command": result.command,
+    })];
+    Ok(OutputBundle {
+        text: render_command_steps("smoke", &steps),
+        json: json!({
+            "command": "smoke",
+            "status": "ok",
+            "steps": steps,
+        }),
+        ndjson: vec![json!({
+            "command": "smoke",
+            "status": "ok",
+            "step": "smoke",
+            "executed": result.command,
+        })],
+    })
+}
+
+fn shell_error(command: &'static str) -> impl Fn(anyhow::Error) -> CommandError {
+    move |err| CommandError::new(command, "command_start_failed", err.to_string())
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn supports_override_resolution() {
+        unsafe {
+            std::env::set_var("HARNESS_SMOKE_CMD", "echo smoke");
+        }
+        let resolved = command_with_override("HARNESS_SMOKE_CMD", "default", "go", &["build"]);
+        assert_eq!(resolved[0], "override:HARNESS_SMOKE_CMD");
+        unsafe {
+            std::env::remove_var("HARNESS_SMOKE_CMD");
+        }
+    }
+}
diff --git a/harness/src/cmd/test.rs b/harness/src/cmd/test.rs
new file mode 100644
index 0000000..7060193
--- /dev/null
+++ b/harness/src/cmd/test.rs
@@ -0,0 +1,74 @@
+use crate::util::{
+    CommandError, OutputBundle, render_command_steps, require_success, run_exec, run_shell,
+};
+use serde_json::json;
+use std::path::Path;
+
+pub fn run(repo_root: &Path) -> Result<OutputBundle, CommandError> {
+    if let Ok(value) = std::env::var("HARNESS_TEST_CMD") {
+        if !value.trim().is_empty() {
+            let result = require_success(
+                "test",
+                run_shell(repo_root, &value).map_err(shell_error("test"))?,
+            )?;
+            return Ok(bundle(
+                result.command,
+                vec![json!({"label": "override-test", "status": "ok"})],
+            ));
+        }
+    }
+
+    let go = require_success(
+        "test",
+        run_exec(repo_root, "go", &["test", "./..."]).map_err(shell_error("test"))?,
+    )?;
+    let cargo = require_success(
+        "test",
+        run_exec(
+            repo_root,
+            "cargo",
+            &["test", "--manifest-path", "harness/Cargo.toml"],
+        )
+        .map_err(shell_error("test"))?,
+    )?;
+    Ok(bundle(
+        "go test ./... && cargo test --manifest-path harness/Cargo.toml".to_string(),
+        vec![
+            json!({"label": "go-tests", "status": "ok", "command": go.command}),
+            json!({"label": "cargo-tests", "status": "ok", "command": cargo.command}),
+        ],
+    ))
+}
+
+fn bundle(executed: String, steps: Vec<serde_json::Value>) -> OutputBundle {
+    OutputBundle {
+        text: render_command_steps("test", &steps),
+        json: json!({
+            "command": "test",
+            "status": "ok",
+            "executed": executed,
+            "steps": steps,
+        }),
+        ndjson: vec![json!({
+            "command": "test",
+            "status": "ok",
+            "executed": executed,
+        })],
+    }
+}
+
+fn shell_error(command: &'static str) -> impl Fn(anyhow::Error) -> CommandError {
+    move |err| CommandError::new(command, "command_start_failed", err.to_string())
+}
+
+#[cfg(test)]
+mod tests {
+    use crate::util::command_with_override;
+
+    #[test]
+    fn default_command_prefers_go_and_cargo() {
+        let resolved =
+            command_with_override("HARNESS_TEST_CMD", "default", "go", &["test", "./..."]);
+        assert_eq!(resolved[1], "go");
+    }
+}
diff --git a/harness/src/cmd/typecheck.rs b/harness/src/cmd/typecheck.rs
new file mode 100644
index 0000000..d444e82
--- /dev/null
+++ b/harness/src/cmd/typecheck.rs
@@ -0,0 +1,69 @@
+use crate::util::{
+    CommandError, OutputBundle, render_command_steps, require_success, run_exec, run_shell,
+};
+use serde_json::json;
+use std::path::Path;
+
+pub fn run(repo_root: &Path) -> Result<OutputBundle, CommandError> {
+    if let Ok(value) = std::env::var("HARNESS_TYPECHECK_CMD") {
+        if !value.trim().is_empty() {
+            let result = require_success(
+                "typecheck",
+                run_shell(repo_root, &value).map_err(shell_error("typecheck"))?,
+            )?;
+            return Ok(bundle(vec![
+                json!({"label": "override-typecheck", "status": "ok", "command": result.command}),
+            ]));
+        }
+    }
+
+    let go = require_success(
+        "typecheck",
+        run_exec(repo_root, "go", &["build", "./..."]).map_err(shell_error("typecheck"))?,
+    )?;
+    let cargo = require_success(
+        "typecheck",
+        run_exec(
+            repo_root,
+            "cargo",
+            &["check", "--manifest-path", "harness/Cargo.toml"],
+        )
.map_err(shell_error("typecheck"))?,
+    )?;
+
+    Ok(bundle(vec![
+        json!({"label": "go-build", "status": "ok", "command": go.command}),
+        json!({"label": "cargo-check", "status": "ok", "command": cargo.command}),
+    ]))
+}
+
+fn bundle(steps: Vec<serde_json::Value>) -> OutputBundle {
+    OutputBundle {
+        text: render_command_steps("typecheck", &steps),
+        json: json!({
+            "command": "typecheck",
+            "status": "ok",
+            "steps": steps,
+        }),
+        ndjson: vec![json!({
+            "command": "typecheck",
+            "status": "ok",
+        })],
+    }
+}
+
+fn shell_error(command: &'static str) -> impl Fn(anyhow::Error) -> CommandError {
+    move |err| CommandError::new(command, "command_start_failed", err.to_string())
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn bundle_contains_command_name() {
+        let steps = vec![json!({"label": "go-build", "status": "ok"})];
+        let bundle = bundle(steps);
+        assert_eq!(bundle.json["command"], "typecheck");
+    }
+}
diff --git a/harness/src/main.rs b/harness/src/main.rs
new file mode 100644
index 0000000..e2b1410
--- /dev/null
+++ b/harness/src/main.rs
@@ -0,0 +1,128 @@
+mod cmd;
+mod util;
+
+use clap::{Args, Parser, Subcommand};
+use std::path::PathBuf;
+use util::{CommandError, OutputFormat};
+
+#[derive(Parser)]
+#[command(name = "harnesscli")]
+#[command(about = "Harness engineering CLI for this repository")]
+struct Cli {
+    #[arg(long, global = true, value_enum)]
+    output: Option<OutputFormat>,
+    #[command(subcommand)]
+    command: Commands,
+}
+
+#[derive(Subcommand)]
+enum Commands {
+    Init,
+    Boot(BootArgs),
+    Smoke,
+    Test,
+    Lint,
+    Typecheck,
+    Audit { path: Option<PathBuf> },
+    Cleanup(CleanupArgs),
+    Observability(ObservabilityArgs),
+}
+
+#[derive(Args)]
+struct BootArgs {
+    #[command(subcommand)]
+    command: BootCommand,
+}
+
+#[derive(Subcommand)]
+enum BootCommand {
+    Start,
+    Status,
+    Stop,
+}
+
+#[derive(Args)]
+struct CleanupArgs {
+    #[command(subcommand)]
+    command: CleanupCommand,
+}
+
+#[derive(Subcommand)]
+enum CleanupCommand {
+    Scan,
+    Grade,
+    Fix,
+}
+
+#[derive(Args)]
+struct ObservabilityArgs {
+    #[command(subcommand)]
+    command: ObservabilityCommand,
+}
+
+#[derive(Subcommand)]
+enum ObservabilityCommand {
+    Start,
+    Stop,
+    Query {
+        #[arg(long, default_value = "logs")]
+        kind: String,
+        #[arg(long)]
+        query: Option<String>,
+    },
+}
+
+fn main() {
+    let cli = Cli::parse();
+    let format = util::resolve_output(cli.output);
+    let result = dispatch(cli, format);
+    std::process::exit(result);
+}
+
+fn dispatch(cli: Cli, format: OutputFormat) -> i32 {
+    let current_dir = match std::env::current_dir() {
+        Ok(path) => path,
+        Err(err) => {
+            let failure = CommandError::new("harnesscli", "cwd_failed", err.to_string());
+            let _ = util::emit_error(&failure, format);
+            return 1;
+        }
+    };
+
+    let outcome = match cli.command {
+        Commands::Init => cmd::init::run(&current_dir),
+        Commands::Boot(args) => match args.command {
+            BootCommand::Start => cmd::boot::start(&current_dir),
+            BootCommand::Status => cmd::boot::status(&current_dir),
+            BootCommand::Stop => cmd::boot::stop(&current_dir),
+        },
+        Commands::Smoke => cmd::smoke::run(&current_dir),
+        Commands::Test => cmd::test::run(&current_dir),
+        Commands::Lint => cmd::lint::run(&current_dir),
+        Commands::Typecheck => cmd::typecheck::run(&current_dir),
+        Commands::Audit { path } => cmd::audit::run(path.unwrap_or(current_dir)),
+        Commands::Cleanup(args) => match args.command {
+            CleanupCommand::Scan => cmd::cleanup::scan(&current_dir),
+            CleanupCommand::Grade => cmd::cleanup::grade(&current_dir),
+            CleanupCommand::Fix => cmd::cleanup::fix(&current_dir),
+        },
+        Commands::Observability(args) => match args.command {
+            ObservabilityCommand::Start => cmd::observability::start(&current_dir),
+            ObservabilityCommand::Stop => cmd::observability::stop(&current_dir),
+            ObservabilityCommand::Query { kind, query } => {
+                cmd::observability::query(&current_dir, &kind, query.as_deref())
+            }
+        },
+    };
+
+    match outcome {
+        Ok(bundle) => {
+            let _ = util::emit(bundle, format);
+            0
+        }
+        Err(err) => {
+            let _ = util::emit_error(&err, format);
+            1
+        }
+    }
+}
diff --git a/harness/src/util/mod.rs b/harness/src/util/mod.rs
new file mode 100644
index 0000000..a21b485
--- /dev/null
+++ b/harness/src/util/mod.rs
@@ -0,0 +1,448 @@
+use anyhow::{Context, Result, anyhow};
+use clap::ValueEnum;
+use serde::{Deserialize, Serialize};
+use serde_json::{Value, json};
+use std::fs::{self, File};
+use std::io::{self, IsTerminal, Read, Write};
+use std::net::TcpStream;
+use std::path::{Path, PathBuf};
+use std::process::{Command, Output, Stdio};
+use std::thread;
+use std::time::{Duration, Instant};
+
+const DEFAULT_APP_PORT_BASE: u16 = 4100;
+const PORT_RANGE: u16 = 20;
+const FALLBACK_PORT_RANGE: u16 = 2000;
+
+#[derive(Clone, Copy, Debug, Eq, PartialEq, ValueEnum)]
+pub enum OutputFormat {
+    Text,
+    Json,
+    Ndjson,
+}
+
+#[derive(Debug)]
+pub struct OutputBundle {
+    pub text: String,
+    pub json: Value,
+    pub ndjson: Vec<Value>,
+}
+
+#[derive(Debug, Serialize, Clone)]
+pub struct CommandError {
+    pub code: String,
+    pub message: String,
+    pub command: String,
+    #[serde(skip_serializing_if = "Value::is_null")]
+    pub details: Value,
+}
+
+#[derive(Debug, Serialize, Deserialize, Clone)]
+pub struct WorktreeContext {
+    pub repo_root: PathBuf,
+    pub worktree_id: String,
+    pub runtime_root: PathBuf,
+    pub selected_port: u16,
+}
+
+#[derive(Debug, Serialize, Deserialize, Clone)]
+pub struct BootMetadata {
+    pub worktree_id: String,
+    pub runtime_root: PathBuf,
+    pub pid: u32,
+    pub app_url: String,
+    pub healthcheck_url: String,
+    pub selected_port: u16,
+    pub stdout_log: PathBuf,
+    pub stderr_log: PathBuf,
+}
+
+#[derive(Debug)]
+pub struct CmdResult {
+    pub command: String,
+    pub status: i32,
+    pub stdout: String,
+    pub stderr: String,
+}
+
+impl CommandError {
+    pub fn new(
+        command: impl Into<String>,
+        code: impl Into<String>,
+        message: impl Into<String>,
+    ) -> Self {
+        Self {
+            code: code.into(),
+            message: message.into(),
+            command: command.into(),
+            details: Value::Null,
+        }
+    }
+
+    pub fn with_details(mut self, details: Value) -> Self {
+        self.details = details;
+        self
+    }
+}
+
+pub fn resolve_output(explicit: Option<OutputFormat>) -> OutputFormat {
+    explicit.unwrap_or_else(|| {
+        if io::stdout().is_terminal() {
+            OutputFormat::Text
+        } else {
+            OutputFormat::Json
+        }
+    })
+}
+
+pub fn emit(bundle: OutputBundle, format: OutputFormat) -> Result<()> {
+    match format {
+        OutputFormat::Text => {
+            println!("{}", bundle.text);
+        }
+        OutputFormat::Json => {
+            println!("{}", serde_json::to_string_pretty(&bundle.json)?);
+        }
+        OutputFormat::Ndjson => {
+            for item in bundle.ndjson {
+                println!("{}", serde_json::to_string(&item)?);
+            }
+        }
+    }
+    Ok(())
+}
+
+pub fn emit_error(err: &CommandError, format: OutputFormat) -> Result<()> {
+    match format {
+        OutputFormat::Text => {
+            if let Some(checks) = err.details.get("checks").and_then(Value::as_array) {
+                for check in checks {
+                    let passed = check
+                        .get("passed")
+                        .and_then(Value::as_bool)
+                        .unwrap_or(false);
+                    let label = check
+                        .get("label")
+                        .and_then(Value::as_str)
+                        .unwrap_or("check");
+                    let status = if passed { "[ok]" } else { "[missing]" };
+                    eprintln!("{status} {label}");
+                }
+            }
+            eprintln!("{}: {}", err.command, err.message);
+        }
+        OutputFormat::Json | OutputFormat::Ndjson => {
+            let payload = json!({ "error": err });
+            println!("{}", serde_json::to_string_pretty(&payload)?);
+        }
+    }
+    Ok(())
+}
+
+pub fn repo_root(path: &Path) -> Result<PathBuf> {
+    let output = Command::new("git")
+        .arg("rev-parse")
+        .arg("--show-toplevel")
+        .current_dir(path)
+        .output()
+        .context("failed to run git rev-parse")?;
+    if !output.status.success() {
+        return Err(anyhow!(
+            "git rev-parse failed: {}",
+            String::from_utf8_lossy(&output.stderr)
+        ));
+    }
+    canonicalize_utf8(Path::new(String::from_utf8_lossy(&output.stdout).trim()))
+}
+
+pub fn worktree_context(path: &Path) -> Result<WorktreeContext> {
+    let repo_root = repo_root(path)?;
+    let worktree_id = std::env::var("DISCODE_WORKTREE_ID")
+        .ok()
+        .filter(|value| !value.trim().is_empty())
+        .unwrap_or_else(|| derive_worktree_id(&repo_root));
+    let runtime_root = repo_root.join(".worktree").join(&worktree_id);
+    let selected_port = resolve_port(&worktree_id)?;
+    Ok(WorktreeContext {
+        repo_root,
+        worktree_id,
+        runtime_root,
+        selected_port,
+    })
+}
+
+pub fn derive_worktree_id(path: &Path) -> String {
+    let canonical = canonicalize_utf8(path).unwrap_or_else(|_| path.to_path_buf());
+    let base = canonical
+        .file_name()
+        .and_then(|name| name.to_str())
+        .unwrap_or("worktree");
+    let hash = fnv1a64(canonical.to_string_lossy().as_bytes());
+    format!("{base}-{hash:08x}")
+}
+
+pub fn ensure_runtime_dirs(ctx: &WorktreeContext) -> Result<()> {
+    for dir in [
+        ctx.runtime_root.join("run"),
+        ctx.runtime_root.join("logs"),
+        ctx.runtime_root.join("tmp"),
+        ctx.runtime_root.join("demo-app"),
+        ctx.runtime_root.join("observability"),
+    ] {
+        fs::create_dir_all(dir)?;
+    }
+    Ok(())
+}
+
+pub fn runtime_manifest_path(ctx: &WorktreeContext) -> PathBuf {
+    ctx.runtime_root.join("run").join("runtime.json")
+}
+
+pub fn boot_metadata_path(ctx: &WorktreeContext) -> PathBuf {
+    ctx.runtime_root.join("run").join("boot.json")
+}
+
+pub fn observability_metadata_path(ctx: &WorktreeContext) -> PathBuf {
+    ctx.runtime_root.join("run").join("observability.json")
+}
+
+pub fn write_json_file<T: Serialize>(path: &Path, value: &T) -> Result<()> {
+    if let Some(parent) = path.parent() {
+        fs::create_dir_all(parent)?;
+    }
+    let body = serde_json::to_vec_pretty(value)?;
+    fs::write(path, body)?;
+    Ok(())
+}
+
+pub fn read_json_file<T: for<'de> Deserialize<'de>>(path: &Path) -> Result<T> {
+    let body = fs::read(path)?;
+    Ok(serde_json::from_slice(&body)?)
+}
+
+pub fn run_exec(current_dir: &Path, program: &str, args: &[&str]) -> Result<CmdResult> {
+    let output = Command::new(program)
+        .args(args)
+        .current_dir(current_dir)
+        .output()
+        .with_context(|| format!("failed to run {program}"))?;
+    cmd_result(program, args, output)
+}
+
+pub fn run_shell(current_dir: &Path, command: &str) -> Result<CmdResult> {
+    let output = Command::new("sh")
+        .arg("-lc")
+        .arg(command)
+        .current_dir(current_dir)
+        .output()
+        .with_context(|| format!("failed to run shell command: {command}"))?;
+    cmd_result("sh", &["-lc", command], output)
+}
+
+pub fn require_success(command: &str, result: CmdResult) -> Result<CmdResult, CommandError> {
+    if result.status == 0 {
+        Ok(result)
+    } else {
+        Err(CommandError::new(
+            command,
+            "command_failed",
+            format!("{} failed with exit code {}", result.command, result.status),
+        )
+        .with_details(json!({
+            "stdout": result.stdout,
+            "stderr": result.stderr,
+            "executed": result.command,
+        })))
+    }
+}
+
+pub fn spawn_background_http_server(
+    serve_dir: &Path,
+    port: u16,
+    stdout_log: &Path,
+    stderr_log: &Path,
+) -> Result<u32> {
+    let stdout = File::create(stdout_log)?;
+    let stderr = File::create(stderr_log)?;
+    let child = Command::new("python3")
+        .arg("-m")
+        .arg("http.server")
+        .arg(port.to_string())
+        .arg("--bind")
+        .arg("127.0.0.1")
+        .current_dir(serve_dir)
+        .stdout(Stdio::from(stdout))
+        .stderr(Stdio::from(stderr))
+        .spawn()
+        .context("failed to start demo app server")?;
+    Ok(child.id())
+}
+
+pub fn is_pid_alive(pid: u32) -> bool {
+    Command::new("kill")
+        .arg("-0")
+        .arg(pid.to_string())
+        .status()
+        .map(|status| status.success())
+        .unwrap_or(false)
+}
+
+pub fn stop_pid(pid: u32) -> Result<()> {
+    let status = Command::new("kill")
+        .arg("-TERM")
+        .arg(pid.to_string())
+        .status()
+        .context("failed to stop process")?;
+    if status.success() {
+        Ok(())
+    } else {
+        Err(anyhow!("kill -TERM {} failed", pid))
+    }
+}
+
+pub fn wait_for_http_ok(url: &str, timeout: Duration) -> bool {
+    let start = Instant::now();
+    while start.elapsed() < timeout {
+        if http_ok(url) {
+            return true;
+        }
+        thread::sleep(Duration::from_millis(200));
+    }
+    false
+}
+
+pub fn http_ok(url: &str) -> bool {
+    let stripped = url.strip_prefix("http://").unwrap_or(url);
+    let mut parts = stripped.splitn(2, '/');
+    let host_port = parts.next().unwrap_or_default();
+    let path = format!("/{}", parts.next().unwrap_or_default());
+    let mut stream = match TcpStream::connect(host_port) {
+        Ok(stream) => stream,
+        Err(_) => return false,
+    };
+    let request = format!("GET {path} HTTP/1.1\r\nHost: {host_port}\r\nConnection: close\r\n\r\n");
+    if stream.write_all(request.as_bytes()).is_err() {
+        return false;
+    }
+    let mut response = String::new();
+    if stream.read_to_string(&mut response).is_err() {
+        return false;
+    }
+    response.starts_with("HTTP/1.0 200") || response.starts_with("HTTP/1.1 200")
+}
+
+pub fn resolve_port(worktree_id: &str) -> Result<u16> {
+    for key in ["DISCODE_APP_PORT", "APP_PORT", "PORT"] {
+        if let Ok(value) = std::env::var(key) {
+            if !value.trim().is_empty() {
+                return value
+                    .parse::<u16>()
+                    .with_context(|| format!("invalid port in {key}"));
+            }
+        }
+    }
+
+    let base = std::env::var("APP_PORT_BASE")
+        .ok()
+        .and_then(|raw| raw.parse::<u16>().ok())
+        .unwrap_or(DEFAULT_APP_PORT_BASE);
+    let hash_offset = (fnv1a64(worktree_id.as_bytes()) % 1000) as u16;
+    let start = base.saturating_add(hash_offset);
+    for offset in 0..PORT_RANGE {
+        let candidate = start.saturating_add(offset);
+        if can_bind_port(candidate) {
+            return Ok(candidate);
+        }
+    }
+    for offset in 0..FALLBACK_PORT_RANGE {
+        let candidate = base.saturating_add(offset);
+        if can_bind_port(candidate) {
+            return Ok(candidate);
+        }
+    }
+    Ok(base)
+}
+
+pub fn render_command_steps(command: &str, steps: &[Value]) -> String {
+    let mut lines = vec![format!("{command}: ok")];
+    for step in steps {
+        let label = step.get("label").and_then(Value::as_str).unwrap_or("step");
+        let status = step.get("status").and_then(Value::as_str).unwrap_or("ok");
lines.push(format!("- {label}: {status}"));
+    }
+    lines.join("\n")
+}
+
+pub fn command_with_override(
+    env_key: &str,
+    default_label: &str,
+    default_program: &str,
+    default_args: &[&str],
+) -> Vec<String> {
+    if let Ok(value) = std::env::var(env_key) {
+        if !value.trim().is_empty() {
+            return vec![format!("override:{env_key}"), value];
+        }
+    }
+    let mut result = vec![default_label.to_string(), default_program.to_string()];
+    result.extend(default_args.iter().map(|value| value.to_string()));
+    result
+}
+
+fn cmd_result(program: &str, args: &[&str], output: Output) -> Result<CmdResult> {
+    Ok(CmdResult {
+        command: std::iter::once(program.to_string())
+            .chain(args.iter().map(|value| value.to_string()))
+            .collect::<Vec<_>>()
+            .join(" "),
+        status: output.status.code().unwrap_or(-1),
+        stdout: String::from_utf8(output.stdout)?,
+        stderr: String::from_utf8(output.stderr)?,
+    })
+}
+
+fn canonicalize_utf8(path: &Path) -> Result<PathBuf> {
+    Ok(path.canonicalize()?)
+}
+
+fn fnv1a64(input: &[u8]) -> u64 {
+    let mut hash = 0xcbf29ce484222325u64;
+    for byte in input {
+        hash ^= u64::from(*byte);
+        hash = hash.wrapping_mul(0x100000001b3);
+    }
+    hash
+}
+
+fn can_bind_port(port: u16) -> bool {
+    std::net::TcpListener::bind(("127.0.0.1", port)).is_ok()
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn derive_worktree_id_is_stable() {
+        let path = Path::new("/tmp/example-repo");
+        assert_eq!(derive_worktree_id(path), derive_worktree_id(path));
+    }
+
+    #[test]
+    fn command_override_prefers_env() {
+        unsafe {
+            std::env::set_var("HARNESS_TEST_KEY", "echo custom");
+        }
+        let resolved = command_with_override("HARNESS_TEST_KEY", "default", "go", &["test"]);
+        assert_eq!(resolved[0], "override:HARNESS_TEST_KEY");
+        unsafe {
+            std::env::remove_var("HARNESS_TEST_KEY");
+        }
+    }
+
+    #[test]
+    fn resolve_port_returns_a_value() {
+        let port = resolve_port("impactable-test").unwrap();
+        assert!(port >= DEFAULT_APP_PORT_BASE);
+    }
+}
diff --git a/harness/tests/cli.rs b/harness/tests/cli.rs
new file mode 100644
index 0000000..b068451
--- /dev/null
+++ b/harness/tests/cli.rs
@@ -0,0 +1,71 @@
+use serde_json::Value;
+use std::path::{Path, PathBuf};
+use std::process::Command;
+use std::time::{SystemTime, UNIX_EPOCH};
+
+fn binary() -> &'static str {
+    env!("CARGO_BIN_EXE_harnesscli")
+}
+
+fn repo_root() -> PathBuf {
+    Path::new(env!("CARGO_MANIFEST_DIR"))
+        .parent()
+        .unwrap()
+        .to_path_buf()
+}
+
+#[test]
+fn smoke_defaults_to_json_when_stdout_is_captured() {
+    let output = Command::new(binary())
+        .arg("smoke")
+        .current_dir(repo_root())
+        .output()
+        .expect("run smoke");
+    assert!(output.status.success());
+    let body: Value = serde_json::from_slice(&output.stdout).expect("json output");
+    assert_eq!(body["command"], "smoke");
+    assert_eq!(body["status"], "ok");
+}
+
+#[test]
+fn audit_reports_success_in_json_mode() {
+    let output = Command::new(binary())
+        .args(["--output", "json", "audit", "."])
+        .current_dir(repo_root())
+        .output()
+        .expect("run audit");
+    assert!(output.status.success());
+    let body: Value = serde_json::from_slice(&output.stdout).expect("json output");
+    assert_eq!(body["command"], "audit");
+    assert_eq!(body["passed"], true);
+}
+
+#[test]
+fn audit_reports_structured_json_errors() {
+    let temp_dir = std::env::temp_dir().join(format!(
+        "impactable-harness-missing-{}",
+        SystemTime::now()
+            .duration_since(UNIX_EPOCH)
+            .unwrap()
+            .as_nanos()
+    ));
+    std::fs::create_dir_all(&temp_dir).unwrap();
+
+    let output = Command::new(binary())
+        .args([
+            "--output",
+            "json",
+            "audit",
+            temp_dir.to_str().expect("temp dir utf-8"),
+        ])
+        .current_dir(repo_root())
+        .output()
+        .expect("run audit");
+
+    assert!(!output.status.success());
+    let body: Value = serde_json::from_slice(&output.stdout).expect("error json output");
+    assert_eq!(body["error"]["code"], "audit_failed");
+    assert!(body["error"]["details"]["checks"].is_array());
+
+    let _ = std::fs::remove_dir_all(temp_dir);
+}
diff --git a/specs/harness-spec/1_harness_structure.md b/specs/harness-spec/1_harness_structure.md
new file mode 100644
index 0000000..c449d96
--- /dev/null
+++ b/specs/harness-spec/1_harness_structure.md
@@ -0,0 +1,149 @@
+Please apply the following strategy to our repository.
+
+The core idea is that AGENTS.md should not become a giant manual containing everything. Instead, I want it to remain a short, stable entrypoint, while the real source of truth lives in a structured, in-repository documentation system. The goal is to let agents start from a small map and progressively navigate to deeper context only when needed, rather than overwhelming them with too much guidance upfront.
+
+Please keep the following example structure exactly as-is in the prompt for reference:
+
+```
+AGENTS.md
+ARCHITECTURE.md
+NON_NEGOTIABLE_RULES.md
+docs/
+├── design-docs/
+│   ├── index.md
+│   ├── core-beliefs.md
+│   └── ...
+├── exec-plans/
+│   ├── active/
+│   ├── completed/
+│   └── tech-debt-tracker.md
+├── generated/
+│   └── db-schema.md
+├── product-specs/
+│   ├── index.md
+│   ├── new-user-onboarding.md
+│   └── ...
+├── references/
+│   ├── design-system-reference-llms.txt
+│   ├── nixpacks-llms.txt
+│   ├── uv-llms.txt
+│   └── ...
+├── DESIGN.md
+├── FRONTEND.md
+├── PLANS.md
+├── PRODUCT_SENSE.md
+├── QUALITY_SCORE.md
+├── RELIABILITY.md
+└── SECURITY.md
+```
+
+This is the direction I want:
+
+### AGENTS.md
+
+* AGENTS.md should contain only a table of contents, like the example above.
+* It should be a navigation document, not a knowledge document.
+* Around 100 lines if possible.
+* If there is already content in the current AGENTS.md that goes beyond table-of-contents style guidance, that content should be moved out into newly created or properly organized documents under docs/, and AGENTS.md should be reduced to pointers to those documents.
+* In other words, any existing substantive guidance currently living in AGENTS.md should be extracted into the appropriate documentation under docs/, rather than preserved inline.
+
+### Structured repository knowledge
+
+* The real source of truth should live in docs/ and related top-level documents.
+* Organize documentation into focused, discoverable sections with strong indexing and cross-linking.
+* Prefer many small, maintainable documents over one giant document.
+* Make it clear which document is canonical for each topic, who it is for, and when it should be updated.
+
+### Reference documents
+
+* When scaffolding `docs/references/`, copy the documentation-oriented contents of `create-harness/references/` into `docs/references/`.
+* These are pre-curated LLM-friendly reference files (e.g., `codex-app-server-llm.txt`) that give agents context about external tools, frameworks, and patterns used by the project.
+* Ralph Loop reference implementations now live in the public `ralph-loop.spec` repository:
+  `https://github.com/siisee11/ralph-loop.spec/tree/main/references`
+* Copy `https://github.com/siisee11/ralph-loop.spec/tree/main/references/cmd/ralph-loop`,
+  `https://github.com/siisee11/ralph-loop.spec/tree/main/references/internal/ralphloop`, and
+  `https://github.com/siisee11/ralph-loop.spec/blob/main/references/ralph-loop` into the matching repository paths instead of under `docs/references/`.
+* Add project-specific references over time as new dependencies or external integrations are introduced.
+
+### Non-negotiable rules
+
+* NON_NEGOTIABLE_RULES.md contains absolute rules that block merge unconditionally. No exceptions, no workarounds.
+* Use `create-harness/templates/NON_NEGOTIABLE_RULES.md` as the template. Copy it to the repository root and adapt as needed.
+* AGENTS.md must link to NON_NEGOTIABLE_RULES.md so agents discover it immediately.
+* Rules are enforced mechanically in CI — they are not advisory.
+
+### Architecture and product knowledge
+
+* ARCHITECTURE.md should serve as a top-level map of domains, package boundaries, dependency direction, and major entrypoints.
+* docs/product-specs/ should contain feature-level product specs and be accessible through an index.md.
+
+* docs/design-docs/ should contain design rationale, core beliefs, and major decision documents, with a way to track status and verification state.
+
+### Minimum runtime and validation docs
+
+Phase 1 should create the canonical docs that later harness phases depend on. At minimum:
+
+* `docs/design-docs/index.md` should be a real index, not a placeholder. Include columns for canonical topic ownership, intended audience, and when each doc must be updated.
+* `docs/design-docs/local-operations.md` should document the local command surface, environment variables, launch contracts, and troubleshooting.
+* `docs/design-docs/worktree-isolation.md` should explain how worktree IDs, ports, runtime roots, cleanup, and stale-process handling work.
+* `docs/design-docs/observability-shim.md` should document the telemetry data flow and the HTTP query contract used by the local observability stack.
+* `docs/product-specs/harness-demo-app.md` should define the deterministic browser-visible app surface the harness boots for validation.
+
+The built harness in this repository needed these documents to keep `AGENTS.md` short while still making the runtime contract discoverable to both humans and agents.
+
+### Core beliefs and agent-first operating principles
+
+* `docs/design-docs/core-beliefs.md` is a required document. It must be created during Phase 1 and linked from the design-docs index.
+* It should contain two sections:
+
+**Product Beliefs** — the product principles that shape design tradeoffs (e.g., local-first vs hosted, chat as control surface, persistent sessions). Adapt these to the specific product being built.
+
+**Agent-First Operating Principles** — the following principles must be encoded. They define how agents and humans interact with the repository:
+
+1. **Repository knowledge is the system of record.**
+   Anything that lives only in Slack, Google Docs, or someone's head is invisible to agents. If a decision matters, it must be encoded as a versioned artifact in this repository — code, markdown, schema, or executable plan.
+
+2. **What the agent cannot see does not exist.**
+   Context is bounded by what is discoverable in-repo at runtime. Push product intent, architectural rationale, and team conventions into docs/ so agents can reason about them directly.
+
+3. **Enforce boundaries centrally, allow autonomy locally.**
+   Architecture rules, dependency direction, and boundary validation are enforced mechanically via linters and CI. Within those guardrails, agents have freedom in how solutions are expressed.
+
+4. **Corrections are cheap, waiting is expensive.**
+   Agent throughput far exceeds human attention. Short-lived PRs with minimal blocking merge gates and fast follow-up fixes are preferred over long review queues.
+
+5. **Prefer boring technology.**
+   Composable, API-stable, well-represented-in-training-data dependencies are easier for agents to model. When an upstream library is opaque, it is often cheaper to reimplement the needed subset with full test coverage than to work around it.
+
+6. **Encode taste once, enforce continuously.**
+   Human judgment about quality, naming, structure, and reliability is captured in golden-principles.yaml, custom linters, and architectural rules — then applied mechanically to every line of code on every run.
+
+7. **Treat documentation as executable infrastructure.**
+   Docs are linted, cross-linked, freshness-checked, and graded. Stale or orphaned documentation is a defect, same as a failing test.
+
+### Treat plans as first-class artifacts
+
+* Small tasks can use lightweight plans, but complex work should be tracked with checked-in execution plans.
+
+* Use docs/exec-plans/active/, docs/exec-plans/completed/, and tech-debt-tracker.md to version active work, completed work, and known technical debt together in the repository.
+
+* An execution plan should ideally include:
+  - goal / scope
+  - background
+  - milestones
+  - current progress
+  - key decisions
+  - remaining issues / open questions
+  - links to related documents
+
+### Important principles
+
+* Do not create one massive instruction manual.
+
+* AGENTS.md must remain only a table of contents.
+
+* If existing AGENTS.md content contains real guidance, move that guidance into new or existing documents under docs/.
+
+* Optimize for discoverability, freshness, and maintainability.
+
+* Make it easy for both humans and agents to quickly identify the canonical source of truth.
+
+* Documentation should reflect real code and real operating practices, not idealized descriptions.
diff --git a/specs/harness-spec/2_execution-env-setup.md b/specs/harness-spec/2_execution-env-setup.md
new file mode 100644
index 0000000..73313f6
--- /dev/null
+++ b/specs/harness-spec/2_execution-env-setup.md
@@ -0,0 +1,202 @@
+Please implement the following platform improvements so our app can be reliably driven by Coding agent in isolated development environments and instrumented for browser-level validation.
+
+## Goal
+
+Make the app bootable per Git worktree so Coding agent can launch and operate one independent app instance per change/worktree. The agent uses the `agent-browser` skill for DOM snapshots, screenshots, and navigation. The end result should allow Coding agent to reproduce bugs, validate fixes, and reason about UI behavior directly from the running app.
+
+## Outcomes we want
+
+### 1. Per-worktree app booting
+
+- Each Git worktree should be able to boot its own isolated app instance without conflicting with other worktrees.
+- Coding agent should be able to launch the app for a given worktree automatically.
+- Each instance should have its own derived runtime config where needed, such as ports, temp directories, cache directories, local storage paths, log files, and any other stateful resources. +- The startup flow should be deterministic and scriptable. + +### 2. Skills for UI investigation + +Install the `agent-browser` skill for browser-level UI investigation: + +```sh +npx skills add vercel-labs/agent-browser --skill agent-browser +``` + +Use this skill for: + +- Page navigation +- DOM snapshot capture +- Screenshot capture +- Basic page readiness/waiting behavior + +These capabilities should be used for bug reproduction and validation workflows, not just generic browsing. + +### 3. Bug reproduction and fix validation + +Coding agent should be able to: + +- Launch a worktree-specific app instance +- Open the relevant page in a browser +- Navigate through the app +- Inspect the DOM +- Take screenshots +- Verify whether the bug exists +- Apply or evaluate a fix +- Re-run the same flow to confirm the fix + +## Requirements + +### A. Worktree-aware boot architecture + +Design and implement a worktree-aware app boot flow. + +Expectations: + +- Derive a stable worktree identifier from the current Git worktree. +- Use that identifier to isolate runtime resources. +- Avoid collisions across: + - Dev server port + - Websocket port + - Temp files + - Local databases or SQLite files if applicable + - Logs + - Browser profile / user data dir if applicable +- Provide a single command that boots the app for the current worktree. +- Prefer convention over manual configuration, but allow overrides through env vars. + +Please include: + +- The boot strategy +- How the worktree ID is computed +- How ports/resources are assigned +- How cleanup works +- Failure handling when a derived port/resource is already occupied + +### B. Coding agent launch contract + +Create a clear contract so Coding agent can launch one app instance per change/worktree. 
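The worktree identity and resource derivation that both sections rely on can be sketched in a few lines of shell. Everything here is illustrative — the hashing scheme, port ranges, and output names are assumptions, not the repository's actual convention:

```shell
# Sketch: stable worktree ID in, deterministic runtime resources out.
# (Hashing scheme, port ranges, and variable names are assumptions.)
derive_worktree_env() {
  wt_path=$1
  # Stable numeric ID: POSIX cksum CRC of the worktree path.
  wt_id=$(printf '%s' "$wt_path" | cksum | awk '{print $1}')
  # Fold the ID into bounded, predictable ranges so parallel worktrees
  # rarely collide; explicit env vars could still override these.
  app_port=$(( 3000 + wt_id % 500 ))
  ws_port=$(( 4000 + wt_id % 500 ))
  printf 'WORKTREE_ID=%s APP_PORT=%s WS_PORT=%s RUNTIME_ROOT=.worktree/%s\n' \
    "$wt_id" "$app_port" "$ws_port" "$wt_id"
}

derive_worktree_env "/tmp/example-worktree"
```

A real implementation would live behind `harnesscli` and honor overrides, but the shape — stable ID in, deterministic resources out — is the contract that matters.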
+
+Expectations:
+
+- Provide a command or script intended for automation use.
+- The command surface must support `--output json|ndjson|text`.
+- In non-TTY contexts, structured output must be the default. Human-oriented text is opt-in via `--output text`.
+- It should return enough metadata for downstream tooling, such as:
+  - App URL
+  - Selected port
+  - Healthcheck URL / status
+  - Worktree ID
+  - Runtime root
+  - Observability URL or query base when observability is started alongside boot
+- Startup should block until the app is actually ready, or fail clearly.
+- Add healthcheck logic rather than relying on blind sleeps.
+- Any failure must emit a structured JSON error object to stderr with a stable error code, message, and relevant command context.
+
+### C. Environment initialization entrypoint (`harnesscli init`)
+
+Create `harnesscli init` as the system-of-record implementation for environment preparation. Keep the full initialization flow inside the Rust CLI so the behavior is testable, versioned, and exposed through a single stable interface.
+
+**Usage:**
+
+```sh
+harnesscli init [--base-branch <branch>] [--work-branch <branch>]
+```
+
+**The command must perform the following steps in order:**
+
+1. **Create or reuse a git worktree**: If already inside a worktree, reuse it. Otherwise, create a new worktree using `git worktree add` from the specified base branch (default: `main`). Derive the worktree path using the same convention as `scripts/lib/worktree.sh`.
+
+2. **Clean git state**: Inside the worktree, ensure a clean working tree. Stash any uncommitted changes. Create and checkout the work branch if specified.
+
+3. **Install dependencies**: Run the project's package install or fetch commands (detect `package.json` → `npm install`/`bun install`, `Cargo.toml` → `cargo fetch`/`cargo build`, etc.). Fail clearly if install fails.
+
+4. **Verify build**: Run `make smoke` if `Makefile.harness` exists, otherwise attempt the project's default build command.
If the build fails, exit non-zero with a diagnostic message.
+
+5. **Set up environment config**: If `.env.example` exists and `.env` does not, copy it. Set `DISCODE_WORKTREE_ID` and any other worktree-derived env vars.
+
+6. **Create runtime directories**: Ensure `.worktree/<worktree-id>/logs/`, `.worktree/<worktree-id>/tmp/`, and other runtime dirs exist.
+
+**Output contract:**
+
+The command must print a JSON object to stdout on success:
+
+```json
+{
+  "worktree_id": "<worktree-id>",
+  "worktree_path": "<path>",
+  "work_branch": "<branch>",
+  "base_branch": "<branch>",
+  "deps_installed": true,
+  "build_verified": true,
+  "runtime_root": ".worktree/<worktree-id>/"
+}
+```
+
+**Requirements:**
+
+- Must be idempotent — running it twice on the same worktree is safe.
+- Must work from any directory (resolves repo root internally).
+- Must not require interactive input.
+- Exit code 0 on success, non-zero on any failure.
+- All output except the final JSON goes to stderr so the JSON can be parsed from stdout.
+- Support `--output json` explicitly even though JSON is already the default in non-TTY contexts.
+- If progress is streamed, emit NDJSON events to stderr or behind an explicit `--output ndjson` mode so an agent can consume incremental state without parsing prose.
+
+**This command is reused by the Ralph Loop** ([`https://github.com/siisee11/ralph-loop.spec/blob/main/SPEC.md`](https://github.com/siisee11/ralph-loop.spec/blob/main/SPEC.md)) as a deterministic replacement for the setup agent's environment preparation steps. The setup agent calls `harnesscli init` first, then only needs to create the execution plan.
+
+### D.
Reproducibility and validation flow + +Implement an example flow or harness showing how Coding agent can use the system to: + +- Boot a worktree-specific app +- Use the `agent-browser` skill to connect to the running app +- Navigate to a target page +- Collect DOM snapshot and screenshot +- Assert expected UI state +- Re-run after a code change to verify the fix + +This can be a smoke-test-style script, example agent workflow, or documented end-to-end example. + +## Deliverables + +Please produce all of the following: + +1. **Implementation** + - Code changes for worktree-aware booting + - `harnesscli init` — environment initialization command with JSON output contract + - `harnesscli boot {start,status,stop}` — machine-readable launch lifecycle commands with JSON/NDJSON output modes + - Install and configure the `agent-browser` skill + +2. **Design note** + - Concise architecture explanation + - Tradeoffs and assumptions + - How isolation works per worktree + +3. **Developer documentation** + - How to run locally + - How Coding agent should invoke it + - Required environment variables + - Troubleshooting notes + +4. **Example workflow** + - A concrete example showing bug reproduction and fix validation using the new system + +## Non-goals + +- Do not build a giant generic browser automation platform. +- Do not optimize for production browser telemetry yet. +- Focus on the minimum robust foundation needed for Coding agent-driven UI debugging and validation. + +## Quality bar + +- Deterministic and automation-friendly +- Minimal manual setup +- Clear failure modes +- Safe parallel usage across multiple worktrees +- Easy for an agent to reason about +- Well-structured enough to extend later + +## Design questions to think through before implementing + +- What is the best source of truth for worktree identity? +- How should derived ports be allocated to minimize collisions while staying predictable? 
+- What is the simplest reusable interface Coding agent can depend on? diff --git a/specs/harness-spec/3_observability-stack-setup.md b/specs/harness-spec/3_observability-stack-setup.md new file mode 100644 index 0000000..af25453 --- /dev/null +++ b/specs/harness-spec/3_observability-stack-setup.md @@ -0,0 +1,264 @@ +# Implement Local Observability Stack + +Set up an ephemeral, per-worktree observability stack so coding agents can query logs, metrics, and traces from a running app instance. The stack is fully isolated per worktree and torn down when the task completes. + +With this stack in place, prompts like "ensure service startup completes in under 800ms" or "no span in these four critical user journeys exceeds two seconds" become tractable — the agent can query real telemetry, reason about it, implement a fix, restart the app, and verify the improvement. + +## Architecture + +``` +APP + │ + ├── Logs (HTTP) + ├── OTLP Metrics + └── OTLP Traces + │ + ▼ + VECTOR (fan-out, local) + ├──────────► Victoria Logs ──► LogQL API + ├──────────► Victoria Metrics ──► PromQL API + └──────────► Victoria Traces ──► TraceQL API + │ + ▼ + Coding Agent + (query, correlate, reason) + │ + ▼ + Implement change + restart app + │ + ▼ + Re-run workload + verify +``` + +### Components + +| Component | Role | Ingest protocol | Query API | +|---|---|---|---| +| **Vector** | Telemetry collector and local fan-out | Receives logs (HTTP), OTLP metrics, OTLP traces from the app | N/A | +| **Victoria Logs** | Log storage and query engine | Receives logs from Vector | LogQL | +| **Victoria Metrics** | Metrics storage and query engine | Receives OTLP metrics from Vector | PromQL | +| **Victoria Traces** | Trace storage and query engine | Receives OTLP traces from Vector | TraceQL | + +## Step 1: Understand the repository + +Before implementing, explore the repository to determine: + +- **What the app already emits**: Does it have structured logging? Does it emit OTLP telemetry? 
What libraries or frameworks are in use? +- **Worktree setup**: Is there an existing worktree-aware boot flow (e.g., from `execution-env-setup.md`)? The observability stack must integrate with it. +- **Existing Docker/container usage**: Is Docker Compose or similar already in use? The stack services can run as containers or as standalone binaries. + +## Step 2: Set up Vector as the telemetry collector + +Vector is the single collection point. The app sends all telemetry to Vector, and Vector fans out to the three Victoria services. + +### What Vector must do + +- Accept **HTTP logs** from the app (e.g., on a local port) +- Accept **OTLP metrics** from the app (OTLP/gRPC or OTLP/HTTP) +- Accept **OTLP traces** from the app (OTLP/gRPC or OTLP/HTTP) +- Forward logs to Victoria Logs +- Forward metrics to Victoria Metrics +- Forward traces to Victoria Traces + +### Worktree isolation + +- All Vector ports must be derived from the worktree ID to avoid collisions across worktrees. +- Vector's data directory must be worktree-scoped. +- The Vector config file can be a shared template with ports injected at startup. + +### Configuration + +Create a Vector config template (`scripts/observability/vector.toml` or similar) that defines: + +- **Sources**: HTTP log receiver, OTLP receiver +- **Sinks**: Victoria Logs (HTTP), Victoria Metrics (remote write / OTLP), Victoria Traces (OTLP) + +All sink endpoints should use worktree-derived ports. + +## Step 3: Set up Victoria Logs + +Victoria Logs stores and queries logs via a LogQL-compatible API. 
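As an illustration of the "shared template with ports injected at startup" approach from Step 2, a launcher could render the Vector config roughly like this. The source/sink names, endpoint paths, and port scheme below are illustrative assumptions, not a verified Vector configuration:

```shell
# Sketch: render a shared Vector config with worktree-derived ports.
# (Source/sink names, endpoint paths, and port offsets are assumptions.)
render_vector_config() {
  wt_id=$1
  log_port=$(( 5000 + wt_id % 500 ))     # HTTP log receiver
  vlogs_port=$(( 9000 + wt_id % 500 ))   # Victoria Logs sink target
  cat <<EOF
[sources.app_logs]
type = "http_server"
address = "127.0.0.1:${log_port}"

[sinks.vlogs]
type = "http"
inputs = ["app_logs"]
uri = "http://127.0.0.1:${vlogs_port}/insert/jsonline"
EOF
}

# Rendered output would be written into the worktree's runtime dir.
render_vector_config 42
```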
+
+### What to configure
+
+- Listen port derived from worktree ID
+- Data storage directory scoped to the worktree (e.g., `.worktree/<worktree-id>/victoria-logs/`)
+- Retention policy: short-lived, no need for long retention — this is ephemeral
+
+### How the agent queries logs
+
+The agent uses the LogQL API to query logs:
+
+```
+GET http://localhost:<vlogs-port>/select/logsql/query?query=<logql-query>
+```
+
+Example queries the agent might run:
+
+- `{app="myservice"} |= "error"` — find error logs
+- `{app="myservice"} | json | duration > 800ms` — find slow operations
+- `{app="myservice", level="error"} | line_format "{{.msg}}"` — extract error messages
+
+## Step 4: Set up Victoria Metrics
+
+Victoria Metrics stores and queries metrics via a PromQL-compatible API.
+
+### What to configure
+
+- Listen port derived from worktree ID
+- Data storage directory scoped to the worktree (e.g., `.worktree/<worktree-id>/victoria-metrics/`)
+- Accept OTLP metrics ingestion (via `-openTelemetryListenAddr` flag or similar)
+
+### How the agent queries metrics
+
+The agent uses the PromQL API:
+
+```
+GET http://localhost:<vmetrics-port>/api/v1/query?query=<promql-query>
+GET http://localhost:<vmetrics-port>/api/v1/query_range?query=<promql-query>&start=<start>&end=<end>&step=<step>
+```
+
+Example queries:
+
+- `http_request_duration_seconds{quantile="0.99"}` — p99 latency
+- `rate(http_requests_total[1m])` — request rate
+- `histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))` — p95 from histogram
+
+## Step 5: Set up Victoria Traces
+
+Victoria Traces stores and queries distributed traces via a TraceQL-compatible API.
+
+### What to configure
+
+- Listen port derived from worktree ID
+- Data storage directory scoped to the worktree (e.g., `.worktree/<worktree-id>/victoria-traces/`)
+- Accept OTLP trace ingestion
+
+### How the agent queries traces
+
+The agent uses the TraceQL API:
+
+```
+GET http://localhost:<vtraces-port>/api/v3/search?query=<traceql-query>
+```
+
+Example queries:
+
+- `{resource.service.name="myservice" && duration > 2s}` — find slow traces
+- `{span.http.status_code >= 500}` — find error spans
+- `{name="user.checkout" && duration > 800ms}` — find slow checkout journeys
+
+## Step 6: Instrument the app
+
+Ensure the app emits telemetry that Vector can collect:
+
+- **Logs**: The app should send structured logs (JSON) to Vector's HTTP log source. If the app already writes to stdout, Vector can also tail a log file.
+- **Metrics**: The app should emit OTLP metrics to Vector's OTLP receiver. Use the appropriate OpenTelemetry SDK for the project's language.
+- **Traces**: The app should emit OTLP traces to Vector's OTLP receiver. Use the OpenTelemetry SDK with a trace exporter pointed at Vector.
+
+The app should read telemetry endpoints from environment variables so they can be set per worktree:
+
+- `OTEL_EXPORTER_OTLP_ENDPOINT` — Vector's OTLP receiver address
+- `LOG_ENDPOINT` — Vector's HTTP log receiver address (or equivalent)
+
+## Step 7: Create lifecycle commands
+
+All lifecycle tools are subcommands of the `harnesscli` CLI under `harnesscli observability`.
+
+### `harnesscli observability start`
+
+Starts the full observability stack for the current worktree:
+
+1. Derive worktree ID and compute ports for Vector, Victoria Logs, Victoria Metrics, Victoria Traces
+2. Create worktree-scoped data directories
+3. Start Victoria Logs, Victoria Metrics, Victoria Traces as background processes (via `std::process::Command`)
+4. Start Vector with the generated config
+5. Wait for all services to be healthy (HTTP readiness checks using `reqwest` or `ureq`, not sleeps)
+6.
Print a JSON metadata block with all endpoints (use `serde_json`):
+
+```json
+{
+  "worktree_id": "<worktree-id>",
+  "vector_log_port": 5140,
+  "vector_otlp_port": 4317,
+  "vlogs_port": 9428,
+  "vlogs_query": "http://localhost:9428/select/logsql/query",
+  "vmetrics_port": 8428,
+  "vmetrics_query": "http://localhost:8428/api/v1/query",
+  "vtraces_port": 10428,
+  "vtraces_query": "http://localhost:10428/api/v3/search"
+}
+```
+
+Port numbers above are examples — use worktree-derived values.
+
+Support env var overrides for all ports via `std::env::var`.
+Support `--output json|ndjson|text`, defaulting to JSON in non-TTY contexts. If readiness is streamed step-by-step, emit NDJSON events so agents can consume status incrementally.
+
+### `harnesscli observability stop`
+
+Stops the observability stack for the current worktree:
+
+1. Derive the same worktree ID
+2. Stop Vector, Victoria Logs, Victoria Metrics, Victoria Traces (use PID files or process matching)
+3. Optionally clean up data directories (use `--clean` flag)
+
+Return a structured JSON result describing which processes were stopped, which were already absent, and whether cleanup ran. Failures must use the shared structured error contract.
+
+### `harnesscli observability query`
+
+A convenience wrapper for the agent to query any of the three APIs:
+
+```sh
+harnesscli observability query logs '{app="myservice"} |= "error"'
+harnesscli observability query metrics 'rate(http_requests_total[1m])'
+harnesscli observability query traces '{duration > 2s}'
+```
+
+Should auto-detect the correct port for the current worktree and format the output as JSON. Use `reqwest` or `ureq` for HTTP requests and `serde_json` for output formatting.
+Support `--output json|ndjson|text`, defaulting to JSON in non-TTY contexts.
+For multi-row, paginated, or long-running queries, `--output ndjson` should emit one JSON object per result row or page so an agent can stream and truncate safely without reparsing a giant array.
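A sketch of how an agent might consume that NDJSON mode — each line is a complete JSON object, so results can be streamed and truncated safely. The sample rows below stand in for real `harnesscli observability query ... --output ndjson` output:

```shell
# Sketch: stream NDJSON rows and count slow spans without buffering the
# whole result set. (Row shape and threshold handling are assumptions.)
emit_ndjson() {
  printf '%s\n' \
    '{"trace_id":"t1","duration_ms":2400}' \
    '{"trace_id":"t2","duration_ms":310}' \
    '{"trace_id":"t3","duration_ms":2100}'
}

# Count rows whose duration_ms exceeds a threshold; each input line is a
# self-contained JSON object, so this is truncation-safe.
count_slow() {
  awk -F'"duration_ms":' -v ms="$1" '$2+0 > ms { n++ } END { print n+0 }'
}

emit_ndjson | count_slow 2000
```

In a real run the agent would replace `emit_ndjson` with the actual query command and keep the same line-at-a-time processing.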
+ +## Step 8: Integrate with app boot flow + +If the worktree-aware app boot flow from `execution-env-setup.md` exists: + +- `harnesscli observability start` should be called as part of the app startup sequence +- `harnesscli observability stop` should be called during teardown +- The app's environment variables for telemetry endpoints should be set automatically based on the observability stack's metadata output + +If no boot flow exists yet, the observability commands should work standalone. + +## Agent feedback loop + +Once the stack is running, the coding agent operates in a feedback loop: + +1. **Query** — run LogQL/PromQL/TraceQL queries to understand current behavior +2. **Correlate** — cross-reference logs, metrics, and traces to identify root causes +3. **Reason** — determine what change is needed +4. **Implement** — make the code change +5. **Restart** — restart the app (observability stack stays running) +6. **Re-run** — exercise the same workload or UI journey +7. **Verify** — query again to confirm the fix meets the requirement + +## Deliverables + +1. **Vector config template** — `scripts/observability/vector.toml` +2. **Lifecycle commands** — `harnesscli observability start`, `harnesscli observability stop`, `harnesscli observability query` +3. **App instrumentation** — OpenTelemetry SDK setup for the project's language, emitting logs/metrics/traces to Vector +4. **Integration with worktree boot** — observability stack starts/stops with the app + +## Non-goals + +- Do not deploy a production observability platform. +- Do not set up dashboards or alerting UIs. +- Do not persist telemetry beyond the worktree lifecycle. +- Focus on the minimum stack needed for the coding agent to query, reason, and verify. 
+ +## Quality bar + +- Fully ephemeral — no state leaks between worktrees +- Deterministic startup with health checks +- All ports derived from worktree ID — safe for parallel use +- Agent can query all three signal types with a single command +- Teardown is clean and complete diff --git a/specs/harness-spec/4_enforce-invariants.md b/specs/harness-spec/4_enforce-invariants.md new file mode 100644 index 0000000..66b4489 --- /dev/null +++ b/specs/harness-spec/4_enforce-invariants.md @@ -0,0 +1,217 @@ +# Enforce Invariants with Custom Linters and Structural Tests + +Documentation alone doesn't keep a fully agent-generated codebase coherent. Enforce invariants mechanically — not by micromanaging implementations — so coding agents can ship fast without undermining the foundation. + +The principle: enforce boundaries centrally, allow autonomy locally. Care deeply about boundaries, correctness, and reproducibility. Within those boundaries, allow significant freedom in how solutions are expressed. + +This phase covers the **always-on, pre-merge enforcement layer**. The checks defined here must run through `harnesscli` locally on every branch before a change can merge to `main`. Put merge-blocking invariants here: boundary validation, dependency direction, cross-cutting boundary checks, codebase modularity rules, and other structural tests that protect the repository's shape continuously. + +Because this phase is part of the local pre-merge workflow, the checks here must stay fast enough to run routinely on a developer machine. Prefer static analysis, bounded graph checks, and targeted structural validation over long-running end-to-end or full-runtime verification. + +--- + +## Step 1: Understand the codebase architecture + +Before writing any linters or tests, map the repository's actual structure: + +- **Business domains**: What are the distinct domains in this codebase? (e.g., auth, billing, settings, etc.) +- **Layers within domains**: What layers exist? 
Identify the dependency direction. A typical layered model looks like: `Types → Config → Repo → Service → Runtime → UI` +- **Modules within domains**: What are the stable modules or bounded areas inside each domain? Which files belong to which module, and what is the allowed public surface for each one? +- **Cross-cutting concerns**: What shared concerns exist? (e.g., auth, connectors, telemetry, feature flags, utils). These should enter through a single explicit interface (e.g., a Providers layer). +- **Existing conventions**: What naming conventions, logging patterns, file organization rules, and type conventions are already in use? + +Document the discovered architecture in `docs/ARCHITECTURE.md` if not already done. + +--- + +## Step 2: Define the dependency rules + +Define which dependency directions are allowed and which are disallowed. These rules become the source of truth for the pre-merge linters and structural tests in this phase. + +For each business domain, specify: + +- The ordered set of layers and the permitted dependency direction (forward only) +- The declared modules within that domain and which files/directories belong to each module +- The allowed public entrypoints for each module and which imports must stay internal +- Which cross-cutting modules can be imported and through what interface +- What is explicitly disallowed (e.g., UI importing directly from Repo, Service importing from Runtime) +- What structure is forbidden (e.g., uncategorized top-level feature code, catch-all `misc` modules, kitchen-sink files that mix multiple modules) + +Create a machine-readable rules file (e.g., `architecture.json`, `.architecture.yaml`, or similar) that encodes: + +``` +{ + "layers": ["types", "config", "repo", "providers", "service", "runtime", "ui"], + "modules": { + "billing": { + "roots": ["src/billing"], + "submodules": ["invoices", "plans", "usage"], + "publicEntrypoints": ["src/billing/index.ts"] + } + }, + "direction": "forward", + 
"crossCutting": { + "providers": ["auth", "connectors", "telemetry", "featureFlags"] + }, + "disallowed": [ + { "from": "ui", "to": "repo" }, + { "from": "service", "to": "runtime" }, + { "from": "types", "to": "*" } + ] +} +``` + +Adapt the schema to match this project's actual architecture. The format should be whatever is easiest to consume by the custom linters. + +--- + +## Step 3: Build custom linters + +Create custom lint rules that enforce the architectural invariants mechanically. Implement them as part of the `harnesscli lint` command, organized into separate modules under `harness/src/cmd/lint/` or `harness/src/linters/`. + +### Modularize linter implementation + +- Do not accumulate all linter logic in a single `shared` file. +- Split the implementation by concern: rules loading, file discovery, import resolution, scan passes, and reporting should live in separate modules. +- Keep any `shared` entrypoint as a thin compatibility barrel only when needed by existing callers. +- When adding a new linter or scan type, put the logic in a focused module instead of extending a monolithic helper. + +This matters because custom linters tend to grow quickly. If all scanning logic lives in one file, agents will keep appending unrelated behavior until the linter itself becomes hard to change safely. 
+ +### Dependency direction linter + +- Parse imports/requires in each file +- Determine which domain and layer each file belongs to (by file path convention) +- Verify that all imports respect the allowed dependency direction +- Flag any import that violates the rules + +### Module boundary linter + +- Verify that every production file belongs to a declared domain/layer/module from the architecture rules +- Verify that imports cross module boundaries only through declared public entrypoints +- Flag catch-all modules, uncategorized production directories, and files that mix responsibilities from multiple modules without an explicit boundary +- Fail when new code is added outside the declared modular structure unless the architecture rules are updated in the same change + +### Boundary parsing linter + +- Verify that external data is parsed and validated at boundaries (e.g., API handlers, external integrations) +- The linter should check that boundary modules use validation (the specific library doesn't matter — Zod, joi, typebox, pydantic, serde, etc. are all fine) +- Flag boundary modules that pass raw unvalidated data to internal layers + +### Taste invariants + +Implement linters for project-specific taste rules. Examples: + +- **Structured logging**: All log calls must use structured format (key-value pairs), not string interpolation +- **Naming conventions**: Schema types follow a consistent naming pattern (e.g., `*Schema`, `*Input`, `*Output`) +- **File size limits**: No single file exceeds a configurable line count threshold +- **No cross-layer shortcuts**: No file imports from a layer it shouldn't reach +- **No dumping-ground modules**: Avoid `misc`, `common`, or `helpers` buckets that accumulate unrelated business logic without a declared boundary + +### Error messages as remediation instructions + +This is critical: every lint error message should include **clear remediation instructions**. 
When a coding agent hits a lint failure, the error message becomes part of its context. Write error messages that tell the agent exactly what to do. + +Bad: +``` +Error: illegal import detected +``` + +Good: +``` +Error: ui/settings/SettingsPanel.ts imports from repo/settings/settingsRepo.ts + Rule: UI layer cannot import directly from Repo layer. + Fix: Move this data access through the Service layer. + Import from service/settings/settingsService.ts instead, + or create a service method that wraps this repo call. +``` + +--- + +## Step 4: Build structural tests + +Structural tests verify the codebase's shape at test time. Place them alongside other tests or in a dedicated `tests/structural/` directory. + +### Domain completeness test + +- For each business domain, verify that expected layers exist (e.g., every domain has a `types/` and a `service/` directory) +- Flag domains that are missing expected structure + +### Module ownership test + +- Verify that every production source file is owned by a declared domain/layer/module +- Flag source files that sit outside the declared modular structure +- Flag domains that accumulate multiple unrelated responsibilities in one module without an explicit architectural declaration + +### Dependency graph test + +- Build an import graph of the codebase +- Assert that no edges violate the dependency rules from Step 2 +- Output the violation as a clear diff: "file A imports file B, but layer X cannot depend on layer Y" + +### Cross-cutting boundary test + +- Verify that cross-cutting concerns (auth, telemetry, etc.) 
are only imported through the Providers interface +- Flag any direct import of a cross-cutting module from a domain layer + +### Convention conformance tests + +- Verify naming conventions are followed across all domains +- Verify file organization matches the expected structure +- Verify exported types match expected patterns +- Verify module entrypoints and internal-only files match the declared modular boundaries + +--- + +## Step 5: Integrate into the harness + +Wire the custom linters and structural tests into the existing harness so they run automatically on every branch before merge through local `harnesscli` commands. + +Keep Phase 4 checks lightweight enough for repeated local use. If a check routinely takes too long to run before merge, move it out of this phase and into Phase 5's scheduled recurring cleanup flow instead of slowing down the local enforcement loop. + +### Add to `harnesscli lint` + +The custom linters should run as part of `make lint` (which calls `harnesscli lint`) and must be required in the local pre-merge workflow. They should complete quickly enough to be run on every merge candidate. Either: +- Add them as an additional pass within the `harnesscli lint` command +- Or add a `harnesscli lint --architecture` flag and call it from the main `harnesscli lint` flow + +### Add to `harnesscli test` + +Structural tests should run as part of `make test` and must also be required before merge. They should be fast — they analyze file structure and imports, not runtime behavior. + +### Local enforcement + +The linters and structural tests must be runnable locally via `harnesscli` and the local make targets before merge. If any invariant is violated, the local verification fails and the change must not merge to `main`. No exceptions. 
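These structural checks stay fast because they only inspect the file tree. A sketch of the domain-completeness check, with the expected layers as an assumption:

```shell
# Sketch: every domain under src/ must contain the expected layers.
# (The expected-layer list is an assumption; read it from the real
# architecture rules file in practice.)
check_domains() {
  root=$1
  for domain in "$root"/src/*/; do
    name=$(basename "$domain")
    for layer in types service; do
      [ -d "$domain$layer" ] || \
        echo "violation: domain '$name' is missing expected layer '$layer/'"
    done
  done
}

# Demo against a throwaway tree: 'auth' lacks a service/ layer.
demo=$(mktemp -d)
mkdir -p "$demo/src/billing/types" "$demo/src/billing/service" "$demo/src/auth/types"
check_domains "$demo"
rm -rf "$demo"
```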
+ +--- + +## Step 6: Verify + +Run the local harness checks and confirm: + +```bash +make lint # Custom linters pass +make test # Structural tests pass +``` + +These commands should be practical to run before every merge. If they are too slow for routine local use, narrow their scope or move the expensive check into the recurring cleanup system. + +Intentionally introduce a violation (e.g., add a disallowed import) and confirm the linter catches it with a clear, actionable error message. + +--- + +## Deliverables + +- [ ] Machine-readable architecture rules file +- [ ] Dependency direction linter with remediation-quality error messages +- [ ] Module boundary linter that enforces declared domain/layer/module ownership +- [ ] Boundary parsing linter +- [ ] Taste invariant linters (structured logging, naming, file size, etc.) +- [ ] Linter implementation split into focused modules rather than one monolithic helper +- [ ] Structural tests for domain completeness, module ownership, dependency graph, cross-cutting boundaries +- [ ] Integration into `make lint` and `make test` as required local pre-merge checks +- [ ] Local harness checks pass before merge + +## Key principle + +Constraints are what allow speed without decay. Once encoded, they apply everywhere at once. Be prescriptive about boundaries, modular structure, and invariants, not about implementations. diff --git a/specs/harness-spec/5_recurring-cleanup.md b/specs/harness-spec/5_recurring-cleanup.md new file mode 100644 index 0000000..fdfca69 --- /dev/null +++ b/specs/harness-spec/5_recurring-cleanup.md @@ -0,0 +1,311 @@ +# Implement Recurring Cleanup Process + +Build a recurring, automated cleanup system that encodes golden principles into the repository and continuously enforces them. This functions like garbage collection for technical debt — human taste is captured once, then enforced continuously on every line of code. 
+ +The goal: on a regular cadence, background tasks scan for deviations from golden principles, update quality grades, and open targeted refactoring pull requests. Most of these PRs should be reviewable in under a minute and safe to automerge. + +This phase is **not** the per-commit merge gate. Unlike Phase 4, the principles here do not need to be fully satisfied on every commit before merge. Instead, define a broader set of golden principles that are checked exhaustively on a recurring schedule (daily by default) through `harnesscli cleanup ...` commands. Use this phase for repository-wide hygiene, drift detection, grading, and small cleanup PR generation. + +--- + +## Step 1: Define golden principles + +Before building automation, codify the golden principles for this repository. These are opinionated, mechanical rules that keep the codebase legible and consistent for future agent runs. + +Do not duplicate checks that are already owned by Phase 4. Phase 4 supplies the always-on merge-blocking enforcement for architectural and structural invariants through `harnesscli lint` and `harnesscli test`. Phase 5 should focus on recurring repository-wide hygiene, grading, and cleanup work that is valuable to run daily but is not required to block every commit before merge. + +Explore the codebase and define principles in a machine-readable file (`golden-principles.yaml` or similar). Each principle should have: + +- **id**: Short identifier (e.g., `prefer-shared-utils`, `no-inline-secrets`) +- **description**: What the principle enforces and why +- **detection**: How to find violations (grep pattern, AST rule, file structure check, etc.) +- **remediation**: What the fix looks like — specific enough for an agent to act on +- **severity**: `warn` or `error` — whether a violation blocks merge or just opens a cleanup PR +- **automerge**: Whether cleanup PRs for this principle are safe to automerge + +Start with the following baseline principles and adapt to this project. 
The list should grow over time as new patterns are identified.

### Repository-wide hygiene and safety principles

```yaml
principles:
  - id: no-inline-secrets
    description: >
      Source files and docs must not contain real credentials or token-shaped secret values.
    detection_kind: secret-scan
    remediation: >
      Move the value into environment or config and keep examples obviously fake.
    severity: error
    automerge: false

  - id: test-coverage-for-new-code
    description: >
      New or modified modules must have corresponding test files;
      untested production code must not be merged.
    detection_kind: test-coverage
    remediation: >
      Add a test file covering the new or changed behavior.
    severity: error
    automerge: false
```

### Code quality principles (severity: warn)

```yaml
  - id: prefer-shared-utils
    description: >
      Common operations (concurrency helpers, retry logic, date formatting,
      path manipulation) must use shared utilities rather than hand-rolled
      inline implementations. Keeps invariants centralized.
    detection_kind: duplicate-utility
    remediation: >
      Check the shared utility package for an existing helper. If none exists,
      add one there with tests rather than inlining a one-off implementation.
    severity: warn
    automerge: false

  - id: no-wildcard-re-exports
    description: >
      Modules must not use `export *`, which obscures the public API and makes
      dependency tracing harder for agents.
    detection_kind: naming-convention
    remediation: >
      Replace `export *` with explicit named exports so the module boundary is legible.
    severity: warn
    automerge: true

  - id: no-dead-code
    description: >
      Remove unused exports, unreachable branches, and stale feature flags.
      Dead code misleads agents into thinking it's still relevant.
    detection_kind: dead-code
    remediation: >
      Delete the dead code. If it was a public API, verify no external consumers exist first.
+ severity: warn + automerge: true + + - id: consistent-error-handling + description: > + All errors must be handled explicitly. No swallowed catches, no ignored rejections. + Error paths must log structured context. + detection_kind: error-handling + remediation: > + Add structured error logging with relevant context. If the error is intentionally + ignored, add an explicit comment explaining why. + severity: warn + automerge: true + + - id: no-todo-outside-tests + description: > + Production code and docs must not accumulate untracked TODO placeholders. + detection_kind: todo-scan + remediation: > + Remove the placeholder or move the follow-up into + docs/exec-plans/tech-debt-tracker.md with a concrete next step. + severity: warn + automerge: true + +``` + +Adapt the detection kinds to the project's language and tooling. Favor recurring checks that are repository-wide and non-blocking for day-to-day iteration. For example, `todo-scan`, `secret-scan`, `duplicate-utility`, `test-coverage`, `dead-code`, and `error-handling` can be implemented as static scans, AST analysis, or heuristic grep patterns. Do not re-register detection kinds that are already enforced as Phase 4 pre-merge invariants. + +--- + +## Step 2: Build the scanner + +Implement the `harnesscli cleanup scan` subcommand that: + +1. Reads the golden principles file +2. For each principle, runs the detection logic against the codebase +3. Outputs a structured report of all violations found + +Support `--output json|ndjson|text`, defaulting to JSON in non-TTY contexts. `--output ndjson` should emit one JSON object per violation plus a terminal summary object so large scans can be streamed safely. This command should be callable manually and from scheduled automation; it is not the default pre-merge gate. 
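The NDJSON streaming behavior described above can be sketched with plain stdlib formatting. This is a hedged sketch: the `Violation` struct, its reduced field set, and the hand-rolled serialization are illustrative stand-ins, not the real scanner types.

```rust
// Illustrative violation shape; the real scanner would carry more fields
// (description, remediation) per the report format.
struct Violation {
    principle_id: &'static str,
    file: &'static str,
    line: u32,
    severity: &'static str,
}

// Hand-rolled JSON keeps the sketch dependency-free; a real implementation
// would use a proper serializer.
fn to_json(v: &Violation) -> String {
    format!(
        r#"{{"principle_id":"{}","file":"{}","line":{},"severity":"{}"}}"#,
        v.principle_id, v.file, v.line, v.severity
    )
}

fn main() {
    let violations = vec![
        Violation { principle_id: "no-dead-code", file: "src/legacy.rs", line: 42, severity: "warn" },
        Violation { principle_id: "no-inline-secrets", file: "src/config.rs", line: 7, severity: "error" },
    ];
    // NDJSON mode: one JSON object per violation...
    for v in &violations {
        println!("{}", to_json(v));
    }
    // ...then a terminal summary object so consumers know the stream is complete.
    let errors = violations.iter().filter(|v| v.severity == "error").count();
    println!(r#"{{"summary":{{"total":{},"errors":{}}}}}"#, violations.len(), errors);
}
```

A real `cleanup scan` would walk the repository and populate these records from the detection logic; the one-object-per-line shape is what lets agents stream large scans safely.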
+ +The report format should be JSON: + +```json +{ + "timestamp": "2025-01-15T04:00:00Z", + "violations": [ + { + "principle_id": "prefer-shared-utils", + "file": "src/billing/utils/formatCurrency.ts", + "line": 12, + "description": "Local formatCurrency duplicates shared/utils/currency.ts", + "severity": "warn", + "remediation": "Replace with import from @shared/utils/currency" + } + ], + "summary": { + "total": 5, + "by_severity": { "warn": 4, "error": 1 }, + "by_principle": { "prefer-shared-utils": 2, "no-dead-code": 3 } + } +} +``` + +The scanner should be fast — it runs frequently. Prefer static analysis (grep, AST parsing, import graph analysis) over runtime checks. + +--- + +## Step 3: Build the quality grader + +Implement the `harnesscli cleanup grade` subcommand that: + +1. Runs the scanner +2. Computes a quality grade for the codebase based on violation counts and severities +3. Writes the grade to a trackable file (e.g., `docs/generated/quality-grade.json`) + +The command's stdout should also support the shared structured output contract so agents can consume the computed grade without reading files from disk. + +The grade file should include: + +```json +{ + "grade": "B+", + "score": 87, + "timestamp": "2025-01-15T04:00:00Z", + "trend": "improving", + "breakdown": { + "prefer-shared-utils": { "violations": 2, "max_score": 15, "score": 11 }, + "no-inline-secrets": { "violations": 0, "max_score": 25, "score": 25 }, + "no-dead-code": { "violations": 3, "max_score": 20, "score": 14 } + }, + "previous": { + "grade": "B", + "score": 83, + "timestamp": "2025-01-14T04:00:00Z" + } +} +``` + +The grading formula should be configurable. Principles with `severity: error` should weigh more heavily than `warn`. + +--- + +## Step 4: Build the cleanup PR generator + +Implement the `harnesscli cleanup fix` subcommand that: + +1. Runs the scanner to find violations +2. Groups violations by principle +3. For each group, creates a focused branch and applies the fix +4. 
Opens a pull request with:
   - Title: `cleanup(<principle-id>): <short summary>`
   - Body: which principle was violated, what files were changed, and why
   - Label: `cleanup`, `automerge` (if the principle allows it)

Each PR should be small and focused — one principle, one logical group of fixes. The goal is PRs reviewable in under a minute.

The fix logic per principle can be:
- **Automated**: The command applies the fix directly (e.g., deleting dead code, replacing a local helper with a shared import)
- **Agent-assisted**: The command creates the branch with a description of what needs to change, and a coding agent completes the fix

Start with automated fixes for simple principles (dead code removal, import replacement) and agent-assisted for complex ones (boundary validation refactoring).
Support `--output json|ndjson|text`, defaulting to JSON in non-TTY contexts. `--output ndjson` should stream one event per branch, PR, or fix attempt so agents can follow long-running cleanup work incrementally.

---

## Step 5: Set up the recurring schedule

### GitHub Actions workflow

Create `.github/workflows/recurring-cleanup.yml`:

```yaml
name: Recurring Cleanup

on:
  schedule:
    - cron: "0 4 * * *" # Daily at 4am UTC
  workflow_dispatch:

jobs:
  scan-and-grade:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      # Add language/runtime setup steps needed for this repository.
+ + - name: Build harness CLI + run: cargo build --release --manifest-path harness/Cargo.toml + + - name: Run scanner + run: harness/target/release/harnesscli cleanup scan > scan-report.json + + - name: Update quality grade + run: harness/target/release/harnesscli cleanup grade + + - name: Commit grade update + run: | + git config user.name "cleanup-bot" + git config user.email "cleanup-bot@noreply" + git add docs/generated/quality-grade.json + git diff --cached --quiet || git commit -m "chore: update quality grade" + git push + + open-cleanup-prs: + needs: scan-and-grade + runs-on: ubuntu-latest + steps: + - name: Checkout + uses: actions/checkout@v4 + + # Add language/runtime setup steps needed for this repository. + + - name: Build harness CLI + run: cargo build --release --manifest-path harness/Cargo.toml + + - name: Generate cleanup PRs + run: harness/target/release/harnesscli cleanup fix + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} +``` + +Customize the cron schedule for this project's cadence. Daily is a good default — frequent enough to catch drift early, not so frequent it creates noise. This scheduled workflow is the primary enforcement path for Phase 5. + +--- + +## Step 6: Integrate with existing harness + +- Expose the recurring cleanup flow through `harnesscli cleanup {scan,grade,fix}` so it can be run manually by humans and agents +- Keep the primary enforcement path in the scheduled workflow rather than the per-commit merge gate +- The quality grade should be checked in the recurring workflow — if the grade drops below a configurable threshold, the workflow should warn (or fail) and surface the regression +- Add `make scan` and `make grade` targets to `Makefile.harness` for on-demand local runs: + +```makefile +scan: harness-build + @$(HARNESS) cleanup scan + +grade: harness-build + @$(HARNESS) cleanup grade +``` + +--- + +## Step 7: Verify + +1. Run `harnesscli cleanup scan` and confirm it produces a valid violation report +2. 
Run `harnesscli cleanup grade` and confirm it produces a quality grade +3. Intentionally introduce a violation and confirm the scanner catches it +4. Run `harnesscli cleanup fix` on a test branch and confirm it opens a well-formed PR +5. Confirm the scheduled workflow (or an equivalent manual invocation) surfaces severe violations and generates the expected grade/cleanup output without being wired as a required per-commit merge gate + +--- + +## Deliverables + +- [ ] `golden-principles.yaml` — machine-readable principle definitions +- [ ] `harnesscli cleanup scan` — scans for violations, outputs JSON report +- [ ] `harnesscli cleanup grade` — computes and writes quality grade +- [ ] `harnesscli cleanup fix` — generates focused cleanup PRs +- [ ] `.github/workflows/recurring-cleanup.yml` — daily scheduled workflow +- [ ] `make scan` and `make grade` targets in `Makefile.harness` +- [ ] Daily scheduled workflow is the primary enforcement path; cleanup commands are available for on-demand runs via `harnesscli` +- [ ] Quality grade tracked in `docs/generated/quality-grade.json` + +## Key principle + +Technical debt is a high-interest loan. Pay it down continuously in small increments. Human taste is captured once in golden principles, then enforced continuously on every line of code. Catch bad patterns daily, not weekly. diff --git a/specs/harness-spec/6_ralph-loop.md b/specs/harness-spec/6_ralph-loop.md new file mode 100644 index 0000000..4b732d3 --- /dev/null +++ b/specs/harness-spec/6_ralph-loop.md @@ -0,0 +1,12 @@ +# Ralph Loop Prerequisite + +The Ralph Loop specification now lives at [`https://github.com/siisee11/ralph-loop.spec/blob/main/SPEC.md`](https://github.com/siisee11/ralph-loop.spec/blob/main/SPEC.md). + +Treat Ralph Loop as a prerequisite to the create-harness flow. 
Before considering this checkpoint satisfied, confirm the target repository already has: + +- A stable repo-root `./ralph-loop` entrypoint +- Setup, coding-loop, and PR-agent orchestration implemented from the standalone Ralph Loop spec +- Integration with `harnesscli init` and `docs/exec-plans/` +- Verification that prompt -> plan -> iterations -> commits -> PR works end to end + +Once that prerequisite is in place, continue using the create-harness documents to wire the remaining harness structure, observability, invariant enforcement, cleanup, and audit flows around it. diff --git a/specs/harness-spec/7_implement-harness-audit.md b/specs/harness-spec/7_implement-harness-audit.md new file mode 100644 index 0000000..d066f24 --- /dev/null +++ b/specs/harness-spec/7_implement-harness-audit.md @@ -0,0 +1,240 @@ +# Implement Harness Engineering Audit + +You are setting up a **harness engineering** system for this repository. Harness engineering ensures that AI coding agents (and humans) can reliably build, test, lint, and verify a codebase through stable, deterministic, single-command workflows. + +Your job is to create the harness artifacts, customize them for this repository, and verify everything passes the audit. + +**Important**: The repository documentation structure (`AGENTS.md`, `ARCHITECTURE.md`, `docs/` hierarchy) may already exist from a prior step. Do NOT recreate them. Instead, ensure they contain the required sections for the audit to pass (see audit checks below) and merge any missing sections into the existing files. + +--- + +## Step 1: Understand the repository + +Before creating anything, explore the repository to determine: + +- **Project type and runtime**: What language(s) and build tools does this project use? +- **Existing commands**: Are there existing build/test/lint/typecheck commands already defined? +- **Existing CI**: Is there a `.github/workflows/` directory with CI already configured? 
+ +Use this information to customize every artifact below for this specific project. + +--- + +## Step 2: Create the harness files + +Create the following files. If a file already exists, preserve its content and merge harness sections into it rather than overwriting. + +### `Makefile.harness` + +```makefile +HARNESS := harness/target/release/harnesscli + +.PHONY: smoke test lint typecheck check ci harness-build + +harness-build: + @cargo build --release --manifest-path harness/Cargo.toml + +smoke: harness-build + @$(HARNESS) smoke + +test: harness-build + @$(HARNESS) test + +lint: harness-build + @$(HARNESS) lint + +typecheck: harness-build + @$(HARNESS) typecheck + +check: lint typecheck + +ci: smoke check test +``` + +Also ensure the main `Makefile` includes it. If no `Makefile` exists, create one with `-include Makefile.harness`. If one exists, append `-include Makefile.harness` if not already present. + +### The `harnesscli` CLI + +All harness tooling lives in a single Rust binary called `harnesscli`, located in `harness/` at the repository root. Create `harness/Cargo.toml` as the crate manifest. + +Use `clap` (with derive macros) for subcommand routing and argument parsing, and `anyhow` for error handling. Organize the source into modules by command group: + +``` +harness/ +├── Cargo.toml +└── src/ + ├── main.rs # CLI entrypoint, clap App definition + ├── cmd/ + │ ├── mod.rs + │ ├── init.rs # harnesscli init + │ ├── boot.rs # harnesscli boot {start,stop,status} + │ ├── smoke.rs # harnesscli smoke + │ ├── test.rs # harnesscli test + │ ├── lint.rs # harnesscli lint + │ ├── typecheck.rs # harnesscli typecheck + │ ├── audit.rs # harnesscli audit + │ ├── cleanup.rs # harnesscli cleanup {scan,grade,fix} + │ └── observability.rs # harnesscli observability {start,stop,query} + └── util/ + └── mod.rs # shared helpers (worktree ID, process spawning, etc.) 
+``` + +Each command should support env var overrides via `std::env::var` and invoke external tools via `std::process::Command`. + +Harness operations should be exposed directly through `harnesscli` subcommands rather than separate shell wrapper entrypoints, so the CLI remains the single stable operator surface. + +### Shared output contract + +Every `harnesscli` command must implement a shared output contract: + +- Support `--output json|ndjson|text` +- Default to `json` when stdout is not a TTY +- Keep `text` only as an explicit human-oriented mode +- Emit structured JSON errors for every non-zero exit, with stable fields such as: + +```json +{ + "error": { + "code": "port_in_use", + "message": "Derived port 4317 is already occupied", + "command": "observability start", + "details": { + "worktree_id": "abc123", + "port": 4317 + } + } +} +``` + +- Use `ndjson` for any command that streams progress, emits paginated data, or can return large result sets +- Add tests covering JSON success output, JSON error output, and non-TTY default behavior + +### `harnesscli smoke` + +The fastest possible sanity check — "does this project compile/build at all?" Should complete in seconds, not minutes. Use it to catch obvious breakage before running expensive checks. + +Implement the appropriate smoke command for this project's language and build tooling. Support an optional `HARNESS_SMOKE_CMD` env var override — if set, run that command instead. + +### `harnesscli test` + +Runs the full test suite with no filters or exclusions. This is the comprehensive correctness check. + +Implement the appropriate test command for this project. Support an optional `HARNESS_TEST_CMD` env var override. + +### `harnesscli lint` + +Runs static analysis and style checks. Should catch code quality issues, formatting problems, and common mistakes without executing code. + +Implement the appropriate linter for this project. Support an optional `HARNESS_LINT_CMD` env var override. 
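The env var override pattern shared by `smoke`, `test`, and `lint` might look like the following sketch. The `resolve_cmd` and `run_shell` helpers and the `true` fallback are illustrative assumptions, not the actual harness code; a real command would fall back to the build tooling detected for this project.

```rust
use std::env;
use std::process::Command;

// An env var override wins; otherwise fall back to the project default
// detected at harness setup time.
fn resolve_cmd(override_var: &str, default: &str) -> String {
    env::var(override_var).unwrap_or_else(|_| default.to_string())
}

// Run through `sh -c` so an override can carry flags, env vars, or pipes.
fn run_shell(cmd: &str) -> i32 {
    Command::new("sh")
        .arg("-c")
        .arg(cmd)
        .status()
        .map(|s| s.code().unwrap_or(1))
        .unwrap_or(1)
}

fn main() {
    // `true` stands in for the real lint command so the sketch runs anywhere.
    let cmd = resolve_cmd("HARNESS_LINT_CMD", "true");
    let code = run_shell(&cmd);
    assert_eq!(code, 0);
    std::process::exit(code);
}
```

Propagating the child's exit code unchanged keeps the CLI honest to CI and to agents consuming structured output.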
+ +### `harnesscli typecheck` + +Runs type checking / compilation verification. Should catch type errors and interface mismatches. + +Implement the appropriate type checker for this project. Support an optional `HARNESS_TYPECHECK_CMD` env var override. + +### `harnesscli audit` + +Audits the repo for harness compliance. It accepts an optional repo path argument (defaults to `.`). Performs two kinds of checks: + +1. **File existence** — verify all required files exist (see audit checks reference table below) +2. **Directory existence** — verify all required directories exist (see audit checks reference table below) + +In `--output text`, print `[ok]` or `[missing]` with a descriptive label. In `--output json`, return a single JSON object containing all checks, a summary, and pass/fail status. In `--output ndjson`, emit one JSON object per check plus a final summary object. Default to JSON when stdout is not a TTY. If any checks failed, exit non-zero with a structured JSON error or summary payload; if all pass, include `"passed": true`. + +Use `std::path::Path::exists()` for file/directory checks. + +### `.github/workflows/harness.yml` + +```yaml +name: Harness CI + +on: + push: + branches: [main] + pull_request: + +jobs: + harness: + runs-on: ubuntu-latest + steps: + - name: Checkout + uses: actions/checkout@v4 + + # Add language/runtime setup steps needed for this repository. + + - name: Run harness pipeline + run: make ci +``` + +Customize the workflow by adding the correct setup action for the detected project type. + +--- + +## Step 3: Build the CLI + +```sh +cargo build --release --manifest-path harness/Cargo.toml +``` + +--- + +## Step 4: Run the audit + +Run `harnesscli audit . --output json` and verify all checks pass. Fix any `[missing]` items until the structured output reports `"passed": true`. Human-oriented verification in `--output text` should still end with: + +``` +Harness audit passed. 
+``` + +--- + +## Step 5: Verify harness commands work + +Run each command and confirm it succeeds (or fails gracefully with clear output): + +```bash +make smoke +make lint +make typecheck +make check +make test +make ci +``` + +Fix any commands that fail due to missing tools or incorrect detection. + +--- + +## Audit checks reference + +### File existence + +| # | Check | Type | +|---|---|---| +| 1 | `AGENTS.md` exists | file | +| 2 | `ARCHITECTURE.md` exists | file | +| 3 | `NON_NEGOTIABLE_RULES.md` exists | file | +| 4 | `docs/PLANS.md` exists | file | +| 5 | `docs/design-docs/index.md` exists | file | +| 6 | `docs/design-docs/local-operations.md` exists | file | +| 7 | `docs/design-docs/worktree-isolation.md` exists | file | +| 8 | `docs/design-docs/observability-shim.md` exists | file | +| 9 | `docs/exec-plans/tech-debt-tracker.md` exists | file | +| 10 | `docs/product-specs/index.md` exists | file | +| 11 | `docs/product-specs/harness-demo-app.md` exists | file | +| 12 | `Makefile.harness` exists | file | +| 13 | `harness/Cargo.toml` exists | file | +| 14 | `harnesscli` CLI builds successfully | build | +| 15 | `.github/workflows/harness.yml` exists | file | + +### Directory existence + +| # | Check | Type | +|---|---|---| +| 17 | `docs/design-docs/` exists | directory | +| 18 | `docs/exec-plans/active/` exists | directory | +| 19 | `docs/exec-plans/completed/` exists | directory | +| 20 | `docs/product-specs/` exists | directory | +| 21 | `docs/references/` exists | directory | +| 22 | `docs/generated/` exists | directory | diff --git a/specs/harness-spec/SPEC.md b/specs/harness-spec/SPEC.md new file mode 100644 index 0000000..52a0a21 --- /dev/null +++ b/specs/harness-spec/SPEC.md @@ -0,0 +1,65 @@ +# create-harness Spec + +`create-harness/` is a portable blueprint for bootstrapping a harness engineering system in any repository. 
It provides phase-by-phase instructions that an agent (or human) follows sequentially to install documentation structure, execution environments, observability, invariant enforcement, automated cleanup, and a final audit around a separately specified Ralph Loop. + +## Goal + +Turn any repository into a fully harnessed, agent-operable codebase where: + +- Documentation is structured and navigable, not monolithic. +- The app boots deterministically per Git worktree with isolated resources. +- Logs, metrics, and traces are queryable locally without external infrastructure. +- Architectural boundaries are enforced mechanically, not by convention alone. +- Technical debt is detected, graded, and cleaned up automatically. +- An autonomous agent loop (Ralph Loop) is already available and can drive a task from prompt to pull request. +- A single audit command verifies the entire harness is wired and passing. + +## Prerequisite + +Apply the standalone Ralph Loop spec at [`https://github.com/siisee11/ralph-loop.spec/blob/main/SPEC.md`](https://github.com/siisee11/ralph-loop.spec/blob/main/SPEC.md) before treating the create-harness flow as complete. The create-harness documents assume the repository already has a working `./ralph-loop` command and then build the surrounding harness system around that capability. 
+ +## Phases + +| Phase | File | Description | +|-------|------|-------------| +| 1 | [`1_harness_structure.md`](./1_harness_structure.md) | Repository documentation structure | +| 2 | [`2_execution-env-setup.md`](./2_execution-env-setup.md) | Worktree-aware execution environment | +| 3 | [`3_observability-stack-setup.md`](./3_observability-stack-setup.md) | Per-worktree observability stack | +| 4 | [`4_enforce-invariants.md`](./4_enforce-invariants.md) | Custom linters and structural tests | +| 5 | [`5_recurring-cleanup.md`](./5_recurring-cleanup.md) | Automated tech debt cleanup | +| 6 | [`6_ralph-loop.md`](./6_ralph-loop.md) | Ralph Loop prerequisite handoff | +| 7 | [`7_implement-harness-audit.md`](./7_implement-harness-audit.md) | End-to-end harness audit | + +Apply the Ralph Loop prerequisite before closing out the create-harness sequence. Each phase document contains self-contained instructions for the create-harness portion of the system. + +## Checklist + +The [`harness-scaffolding-checklist.md`](./harness-scaffolding-checklist.md) tracks completion status across all phases with per-item checkboxes. + +## Key Constraints + +- **Single Rust harness system of record.** Harness behavior should live in `harnesscli` subcommands. Do not require shell wrapper entrypoints for harness operations when the Rust CLI can serve as the stable interface directly. +- **Every command has a test.** Tests live as `#[cfg(test)]` modules or under `harness/tests/`. +- **Machine-readable output is mandatory.** Every `harnesscli` command must support structured output. Default to JSON in non-TTY contexts, use NDJSON for streaming or paginated responses, and return structured JSON errors on failure. +- **Codebase modularity is enforced mechanically.** Production code must live inside declared domains, layers, or modules with explicit boundaries; `harnesscli` lint/test checks should fail when code bypasses that structure. 
+- **Worktree isolation.** All runtime resources (ports, temp dirs, data dirs, logs) are derived from a deterministic worktree ID. +- **No blind sleeps.** Readiness is healthcheck-based. +- **Ralph Loop is externalized.** The autonomous coding loop is specified in [`https://github.com/siisee11/ralph-loop.spec/blob/main/SPEC.md`](https://github.com/siisee11/ralph-loop.spec/blob/main/SPEC.md); create-harness integrates around it rather than redefining it inline. +- **Portable.** This directory contains only instructions, templates, and reference artifacts. Any generated harness code still lives in the target repository. + +## Directory Structure + +``` +create-harness/ +├── SPEC.md # this file +├── harness-scaffolding-checklist.md # phase completion tracker +├── 1_harness_structure.md # Phase 1 instructions +├── 2_execution-env-setup.md # Phase 2 instructions +├── 3_observability-stack-setup.md # Phase 3 instructions +├── 4_enforce-invariants.md # Phase 4 instructions +├── 5_recurring-cleanup.md # Phase 5 instructions +├── 6_ralph-loop.md # Phase 6 prerequisite handoff +├── 7_implement-harness-audit.md # Phase 7 instructions +├── references/ # LLM-friendly docs and reference implementations +└── templates/ # Template files (e.g. NON_NEGOTIABLE_RULES.md) +``` diff --git a/specs/harness-spec/UPSTREAM.md b/specs/harness-spec/UPSTREAM.md new file mode 100644 index 0000000..7a76b2d --- /dev/null +++ b/specs/harness-spec/UPSTREAM.md @@ -0,0 +1,41 @@ +# harness.spec Upstream Metadata + +This directory vendors third-party specification content from `siisee11/harness.spec`. + +## Upstream repository + +- Repository: `https://github.com/siisee11/harness.spec` +- Source branch/reference fetched: `main` +- Resolved commit at fetch time (2026-03-17): `5e0c4d1802c7388b19b81590872a9c6dbf2a9f01` +- License in upstream repo: no top-level `LICENSE` file present at fetched commit. 
+ +## Canonical source URLs + +- Repository root: `https://github.com/siisee11/harness.spec/tree/5e0c4d1802c7388b19b81590872a9c6dbf2a9f01` +- Main spec: `https://github.com/siisee11/harness.spec/blob/5e0c4d1802c7388b19b81590872a9c6dbf2a9f01/SPEC.md` + +## Import method + +- Vendored snapshot copy under the corresponding `specs//` directory. +- Upstream git metadata (`.git/`) excluded. +- Imported payload mirrors upstream tracked files at the resolved commit. + +## Integrity hashes from imported files + +- `1_harness_structure.md`: `64c84a45c8a47d9d39a472ca70c26ce6ca97e3f5964ef4148e5c29f2b3d4fc65` +- `2_execution-env-setup.md`: `d11b55cd8ed6793104e83bf26ce86c5f44c70f48eb46d6da7f8f8f024d3c670d` +- `3_observability-stack-setup.md`: `47198753d947e55e3492537b3294195f3b791a489e386be3a65cebf3b80ddbce` +- `4_enforce-invariants.md`: `394c59e998ba94018aa81997c422420b5f950b82e101a7207791eabd6df4d4cd` +- `5_recurring-cleanup.md`: `ebbf6ea05bc2f44dc95416b0f70f82b1f9fa0e9272a54e7c2833a2eda803fe7e` +- `6_ralph-loop.md`: `d489806965884d92e1a2d1f620c6f8612f9896ac70b7645ec3b4c753ee3e9e96` +- `7_implement-harness-audit.md`: `3b0a8be6a11fce82c661af0c8e06f40fb0d283728f83efd2deaf82d22d116f28` +- `SPEC.md`: `e49de348f98131d50d1b20ca55eec65ccbe864e5f391ea3a728898618e1c44cd` +- `harness-scaffolding-checklist.md`: `85651c6c3673bce587297833c851bf0ec8b44db90ac8474c8fa646c9aa86d1f6` +- `references/codex-app-server-llm.txt`: `060db76ac1bd3a412c8dc3231afd37fceff8b58ac776c47323670be7fdddbb79` +- `templates/NON_NEGOTIABLE_RULES.md`: `f6bd0b202346c7c6029871746f9bbc480d15b02a160c133a6bb5432099e59589` + +## Modification status + +- All files except `metadata.json` and `UPSTREAM.md` are unmodified copies of upstream tracked files at commit `5e0c4d1802c7388b19b81590872a9c6dbf2a9f01`. +- `metadata.json` is repository-local discovery metadata. +- `UPSTREAM.md` is repository-local provenance metadata. 
diff --git a/specs/harness-spec/harness-scaffolding-checklist.md b/specs/harness-spec/harness-scaffolding-checklist.md
new file mode 100644
index 0000000..a1711ef
--- /dev/null
+++ b/specs/harness-spec/harness-scaffolding-checklist.md
@@ -0,0 +1,149 @@
# Harness Scaffolding Checklist

Apply the following phases in order to scaffold a complete harness engineering system for this repository.

---

## Phase 1: Repository Documentation Structure

Apply the instructions in [`1_harness_structure.md`](./1_harness_structure.md).

This phase sets up the documentation hierarchy:

- [ ] `AGENTS.md` — compact table-of-contents entrypoint (~100 lines, navigation only)
- [ ] `ARCHITECTURE.md` — top-level map of domains, boundaries, dependencies, entrypoints
- [ ] `NON_NEGOTIABLE_RULES.md` — absolute rules that block merge unconditionally (use `create-harness/templates/NON_NEGOTIABLE_RULES.md` as template)
- [ ] `docs/PLANS.md`
- [ ] `docs/design-docs/index.md`
- [ ] `docs/design-docs/core-beliefs.md` — product beliefs + agent-first operating principles (see `1_harness_structure.md`)
- [ ] `docs/design-docs/local-operations.md` — local commands, env vars, and troubleshooting for humans and agents
- [ ] `docs/design-docs/worktree-isolation.md` — worktree ID derivation, runtime roots, cleanup, and failure handling
- [ ] `docs/design-docs/observability-shim.md` — telemetry architecture and query contract
- [ ] `docs/exec-plans/active/`
- [ ] `docs/exec-plans/completed/`
- [ ] `docs/exec-plans/tech-debt-tracker.md`
- [ ] `docs/product-specs/index.md`
- [ ] `docs/product-specs/harness-demo-app.md` — deterministic demo surface used for browser validation
- [ ] `docs/references/` — copy contents from `create-harness/references/` as seed
- [ ] `docs/generated/`

Key rules:
- `AGENTS.md` is a navigation document, not a knowledge document. Move any substantive guidance into `docs/`.
+- Real source of truth lives in `docs/` and top-level documents, not in `AGENTS.md`. +- Prefer many small, maintainable documents over one giant document. +- Documentation must reflect real code and real operating practices. +- **All harness behavior lives in a single Rust CLI** called `harnesscli`. Thin shell wrappers are allowed only as stable entrypoints that immediately delegate to `harnesscli` or another versioned harness executable. +- **Every command must have a corresponding test.** Tests live alongside the Rust source as `#[cfg(test)]` modules or in integration test files under `harness/tests/`. + +--- + +## Phase 2: Execution Environment Setup + +Apply the instructions in [`2_execution-env-setup.md`](./2_execution-env-setup.md). + +This phase makes the app bootable per Git worktree for isolated development: + +- [ ] Worktree-aware boot flow with derived worktree ID +- [ ] Isolated runtime resources per worktree (ports, temp dirs, logs, etc.) +- [ ] Single command to boot the app for the current worktree +- [ ] Launch contract returning metadata (`app_url`, `port`, `healthcheck_url`, `worktree_id`, `runtime_root`, and observability metadata when available) +- [ ] Healthcheck-based readiness (no blind sleeps) +- [ ] `harnesscli init` — idempotent environment initialization with JSON output contract +- [ ] `harnesscli boot {start,status,stop}` — machine-readable lifecycle commands with JSON/NDJSON output +- [ ] `agent-browser` skill installed for UI investigation +- [ ] Example reproducibility and validation flow + +--- + +## Phase 3: Observability Stack + +Apply the instructions in [`3_observability-stack-setup.md`](./3_observability-stack-setup.md). 
+ +This phase sets up ephemeral, per-worktree telemetry so the agent can query logs, metrics, and traces: + +- [ ] Vector config template for telemetry collection and fan-out +- [ ] Victoria Logs — log storage with LogQL API +- [ ] Victoria Metrics — metrics storage with PromQL API +- [ ] Victoria Traces — trace storage with TraceQL API +- [ ] All ports and data dirs derived from worktree ID +- [ ] App instrumented with OpenTelemetry SDK (logs, metrics, traces to Vector) +- [ ] `harnesscli observability start` — starts the stack with health checks +- [ ] `harnesscli observability stop` — tears down the stack and cleans up +- [ ] `harnesscli observability query` — convenience wrapper for LogQL/PromQL/TraceQL queries +- [ ] Observability commands default to structured output in non-TTY contexts and support NDJSON for streaming queries +- [ ] Integrated with worktree app boot flow + +--- + +## Phase 4: Enforce Invariants + +Apply the instructions in [`4_enforce-invariants.md`](./4_enforce-invariants.md). + +This phase enforces architectural boundaries and taste mechanically via custom linters and structural tests. 
These are the required local pre-merge checks that must pass through `harnesscli` before a change merges to `main`: + +- [ ] Machine-readable architecture rules file (dependency directions, allowed edges) +- [ ] Declared domain/layer/module ownership rules for production code +- [ ] Dependency direction linter — verifies imports respect layer ordering +- [ ] Module boundary linter — verifies production code stays inside declared modules and crosses boundaries only through allowed entrypoints +- [ ] Boundary parsing linter — verifies external data is validated at boundaries +- [ ] Taste invariant linters (structured logging, naming conventions, file size limits) +- [ ] Linter implementation is modularized by concern; avoid one monolithic `shared` helper +- [ ] All lint error messages include clear remediation instructions for agents +- [ ] Structural tests for domain completeness, module ownership, and dependency graph validation +- [ ] Cross-cutting boundary tests (shared concerns only via Providers interface) +- [ ] Integrated into `make lint` and `make test` as required pre-merge checks + +--- + +## Phase 5: Recurring Cleanup Process + +Apply the instructions in [`5_recurring-cleanup.md`](./5_recurring-cleanup.md). + +This phase encodes golden principles and builds automated garbage collection for technical debt. 
These checks run as a recurring full sweep, not as a required per-commit merge gate: + +- [ ] `golden-principles.yaml` — machine-readable principle definitions with detection and remediation +- [ ] `harnesscli cleanup scan` — scans for violations, outputs JSON report +- [ ] `harnesscli cleanup grade` — computes and tracks quality grade +- [ ] `harnesscli cleanup fix` — generates focused, small cleanup PRs +- [ ] Cleanup commands support JSON/NDJSON output modes for large scans and long-running fix operations +- [ ] `.github/workflows/recurring-cleanup.yml` — daily scheduled scan, grade update, and PR generation +- [ ] `make scan` and `make grade` targets in `Makefile.harness` +- [ ] Daily scheduled workflow is the primary enforcement path for cleanup checks +- [ ] Quality grade tracked in `docs/generated/quality-grade.json` + +--- + +## Phase 6: Ralph Loop Prerequisite + +Apply the instructions in [`6_ralph-loop.md`](./6_ralph-loop.md). + +This checkpoint confirms the standalone Ralph Loop spec has already been applied before the create-harness flow is considered complete: + +- [ ] [`https://github.com/siisee11/ralph-loop.spec/blob/main/SPEC.md`](https://github.com/siisee11/ralph-loop.spec/blob/main/SPEC.md) has been reviewed and applied +- [ ] Repo-root `./ralph-loop` entrypoint is available in the target repository +- [ ] Ralph Loop setup, coding-loop, and PR-agent orchestration are implemented +- [ ] Ralph Loop integrates with `harnesscli init` and `docs/exec-plans/` +- [ ] End-to-end verification passes: prompt -> worktree -> plan -> iterations -> commits -> PR + +--- + +## Phase 7: Harness Engineering Audit + +Apply the instructions in [`7_implement-harness-audit.md`](./7_implement-harness-audit.md). 
+ +This is the final phase — it verifies everything from all prior phases is wired together and passing: + +- [ ] `Makefile.harness` with smoke/test/lint/typecheck/check/ci targets +- [ ] `Makefile` includes `Makefile.harness` +- [ ] `harnesscli smoke` — fast sanity check +- [ ] `harnesscli test` — full test suite +- [ ] `harnesscli lint` — static analysis +- [ ] `harnesscli typecheck` — type checking +- [ ] `harnesscli init`, `harnesscli boot`, and `harnesscli observability` command groups implemented +- [ ] `harnesscli audit` — audits all files and directories exist +- [ ] Every `harnesscli` command supports `--output json|ndjson|text`, defaults to JSON in non-TTY contexts, and returns structured JSON errors +- [ ] `harness/Cargo.toml` — Rust crate for the `harnesscli` CLI +- [ ] `.github/workflows/harness.yml` — CI workflow running `make ci` +- [ ] `harnesscli` CLI builds successfully (`cargo build --release -p harness`) +- [ ] `harness audit .` passes +- [ ] `make ci` succeeds diff --git a/specs/harness-spec/metadata.json b/specs/harness-spec/metadata.json new file mode 100644 index 0000000..dba5bd6 --- /dev/null +++ b/specs/harness-spec/metadata.json @@ -0,0 +1,4 @@ +{ + "source": "https://github.com/siisee11/harness.spec", + "synced_date": "2026-03-17T11:20:40.921Z" +} diff --git a/specs/harness-spec/references/codex-app-server-llm.txt b/specs/harness-spec/references/codex-app-server-llm.txt new file mode 100644 index 0000000..5c2100b --- /dev/null +++ b/specs/harness-spec/references/codex-app-server-llm.txt @@ -0,0 +1,1438 @@ +# Codex App Server + +Codex app-server is the interface Codex uses to power rich clients (for example, the Codex VS Code extension). Use it when you want a deep integration inside your own product: authentication, conversation history, approvals, and streamed agent events. 
The app-server implementation is open source in the Codex GitHub repository ([openai/codex/codex-rs/app-server](https://github.com/openai/codex/tree/main/codex-rs/app-server)). See the [Open Source](https://developers.openai.com/codex/open-source) page for the full list of open-source Codex components.
+
+If you are automating jobs or running Codex in CI, use the Codex SDK instead.
+
+## Protocol
+
+Like [MCP](https://modelcontextprotocol.io/), `codex app-server` supports bidirectional communication using JSON-RPC 2.0 messages (with the `"jsonrpc":"2.0"` header omitted on the wire).
+
+Supported transports:
+
+- `stdio` (`--listen stdio://`, default): newline-delimited JSON (JSONL).
+- `websocket` (`--listen ws://IP:PORT`, experimental): one JSON-RPC message per WebSocket text frame.
+
+In WebSocket mode, app-server uses bounded queues. When request ingress is full, the server rejects new requests with JSON-RPC error code `-32001` and message `"Server overloaded; retry later."` Clients should retry with an exponentially increasing delay and jitter.
+
+## Message schema
+
+Requests include `method`, `params`, and `id`:
+
+```json
+{ "method": "thread/start", "id": 10, "params": { "model": "gpt-5.1-codex" } }
+```
+
+Responses echo the `id` with either `result` or `error`:
+
+```json
+{ "id": 10, "result": { "thread": { "id": "thr_123" } } }
+```
+
+```json
+{ "id": 10, "error": { "code": 123, "message": "Something went wrong" } }
+```
+
+Notifications omit `id` and use only `method` and `params`:
+
+```json
+{ "method": "turn/started", "params": { "turn": { "id": "turn_456" } } }
+```
+
+You can generate a TypeScript schema or a JSON Schema bundle from the CLI. Each output is specific to the Codex version you ran, so the generated artifacts match that version exactly:
+
+```bash
+codex app-server generate-ts --out ./schemas
+codex app-server generate-json-schema --out ./schemas
+```
+
+## Getting started
+
+1. 
Start the server with `codex app-server` (default stdio transport) or `codex app-server --listen ws://127.0.0.1:4500` (experimental WebSocket transport).
+2. Connect a client over the selected transport, then send `initialize` followed by the `initialized` notification.
+3. Start a thread and a turn, then keep reading notifications from the active transport stream.
+
+Example (Node.js / TypeScript):
+
+```ts
+import { spawn } from "node:child_process";
+import readline from "node:readline";
+
+const proc = spawn("codex", ["app-server"], {
+  stdio: ["pipe", "pipe", "inherit"],
+});
+const rl = readline.createInterface({ input: proc.stdout });
+
+const send = (message: unknown) => {
+  proc.stdin.write(`${JSON.stringify(message)}\n`);
+};
+
+let threadId: string | null = null;
+
+rl.on("line", (line) => {
+  const msg = JSON.parse(line) as any;
+  console.log("server:", msg);
+
+  if (msg.id === 1 && msg.result?.thread?.id && !threadId) {
+    threadId = msg.result.thread.id;
+    send({
+      method: "turn/start",
+      id: 2,
+      params: {
+        threadId,
+        input: [{ type: "text", text: "Summarize this repo." }],
+      },
+    });
+  }
+});
+
+send({
+  method: "initialize",
+  id: 0,
+  params: {
+    clientInfo: {
+      name: "my_product",
+      title: "My Product",
+      version: "0.1.0",
+    },
+  },
+});
+send({ method: "initialized", params: {} });
+send({ method: "thread/start", id: 1, params: { model: "gpt-5.1-codex" } });
+```
+
+## Core primitives
+
+- **Thread**: A conversation between a user and the Codex agent. Threads contain turns.
+- **Turn**: A single user request and the agent work that follows. Turns contain items and stream incremental updates.
+- **Item**: A unit of input or output (user message, agent message, command runs, file change, tool call, and more).
+
+Use the thread APIs to create, list, or archive conversations. Drive a conversation with turn APIs and stream progress via turn notifications. 
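How the three primitives nest can be sketched in TypeScript. The field sets below are illustrative only, inferred from the JSON examples in this document; generate the authoritative types for your Codex version with `codex app-server generate-ts`:

```ts
// Hypothetical shapes, inferred from examples in this document.
// Input items are what turn/start accepts; output item kinds are broader
// (messages, command runs, file changes, tool calls, ...), so they are
// left as `unknown` here.
type InputItem =
  | { type: "text"; text: string }
  | { type: "image"; url: string }
  | { type: "localImage"; path: string };

interface Turn {
  id: string;
  status: "inProgress" | "completed" | "interrupted";
  items: unknown[];
}

interface Thread {
  id: string;
  preview?: string;
  turns?: Turn[];
}

// A thread contains turns; a turn contains items.
const input: InputItem[] = [{ type: "text", text: "Run tests" }];
const thread: Thread = {
  id: "thr_123",
  turns: [{ id: "turn_456", status: "inProgress", items: [] }],
};
console.log(thread.turns?.[0].id, input[0].type);
```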
+ +## Lifecycle overview + +- **Initialize once per connection**: Immediately after opening a transport connection, send an `initialize` request with your client metadata, then emit `initialized`. The server rejects any request on that connection before this handshake. +- **Start (or resume) a thread**: Call `thread/start` for a new conversation, `thread/resume` to continue an existing one, or `thread/fork` to branch history into a new thread id. +- **Begin a turn**: Call `turn/start` with the target `threadId` and user input. Optional fields override model, personality, `cwd`, sandbox policy, and more. +- **Steer an active turn**: Call `turn/steer` to append user input to the currently in-flight turn without creating a new turn. +- **Stream events**: After `turn/start`, keep reading notifications on stdout: `thread/archived`, `thread/unarchived`, `item/started`, `item/completed`, `item/agentMessage/delta`, tool progress, and other updates. +- **Finish the turn**: The server emits `turn/completed` with final status when the model finishes or after a `turn/interrupt` cancellation. + +## Initialization + +Clients must send a single `initialize` request per transport connection before invoking any other method on that connection, then acknowledge with an `initialized` notification. Requests sent before initialization receive a `Not initialized` error, and repeated `initialize` calls on the same connection return `Already initialized`. + +The server returns the user agent string it will present to upstream services. Set `clientInfo` to identify your integration. + +`initialize.params.capabilities` also supports per-connection notification opt-out via `optOutNotificationMethods`, which is a list of exact method names to suppress for that connection. Matching is exact (no wildcards/prefixes). Unknown method names are accepted and ignored. + +**Important**: Use `clientInfo.name` to identify your client for the OpenAI Compliance Logs Platform. 
If you are developing a new Codex integration intended for enterprise use, please contact OpenAI to get it added to a known clients list. For more context, see the [Codex logs reference](https://chatgpt.com/admin/api-reference#tag/Logs:-Codex).
+
+Example (from the Codex VS Code extension):
+
+```json
+{
+  "method": "initialize",
+  "id": 0,
+  "params": {
+    "clientInfo": {
+      "name": "codex_vscode",
+      "title": "Codex VS Code Extension",
+      "version": "0.1.0"
+    }
+  }
+}
+```
+
+Example with notification opt-out:
+
+```json
+{
+  "method": "initialize",
+  "id": 1,
+  "params": {
+    "clientInfo": {
+      "name": "my_client",
+      "title": "My Client",
+      "version": "0.1.0"
+    },
+    "capabilities": {
+      "experimentalApi": true,
+      "optOutNotificationMethods": [
+        "codex/event/session_configured",
+        "item/agentMessage/delta"
+      ]
+    }
+  }
+}
+```
+
+## Experimental API opt-in
+
+Some app-server methods and fields are intentionally gated behind the `experimentalApi` capability.
+
+- Omit `capabilities` (or set `experimentalApi` to `false`) to stay on the stable API surface; the server then rejects experimental methods/fields.
+- Set `capabilities.experimentalApi` to `true` to enable experimental methods and fields.
+
+```json
+{
+  "method": "initialize",
+  "id": 1,
+  "params": {
+    "clientInfo": {
+      "name": "my_client",
+      "title": "My Client",
+      "version": "0.1.0"
+    },
+    "capabilities": {
+      "experimentalApi": true
+    }
+  }
+}
+```
+
+If a client sends an experimental method or field without opting in, app-server rejects it with an error noting that it `requires experimentalApi capability`.
+
+## API overview
+
+- `thread/start` - create a new thread; emits `thread/started` and automatically subscribes you to turn/item events for that thread.
+- `thread/resume` - reopen an existing thread by id so later `turn/start` calls append to it.
+- `thread/fork` - fork a thread into a new thread id by copying stored history; emits `thread/started` for the new thread. 
+- `thread/read` - read a stored thread by id without resuming it; set `includeTurns` to return full turn history. Returned `thread` objects include runtime `status`. +- `thread/list` - page through stored thread logs; supports cursor-based pagination plus `modelProviders`, `sourceKinds`, `archived`, and `cwd` filters. Returned `thread` objects include runtime `status`. +- `thread/loaded/list` - list the thread ids currently loaded in memory. +- `thread/archive` - move a thread's log file into the archived directory; returns `{}` on success and emits `thread/archived`. +- `thread/unsubscribe` - unsubscribe this connection from thread turn/item events. If this was the last subscriber, the server unloads the thread and emits `thread/closed`. +- `thread/unarchive` - restore an archived thread rollout back into the active sessions directory; returns the restored `thread` and emits `thread/unarchived`. +- `thread/status/changed` - notification emitted when a loaded thread's runtime `status` changes. +- `thread/compact/start` - trigger conversation history compaction for a thread; returns `{}` immediately while progress streams via `turn/*` and `item/*` notifications. +- `thread/rollback` - drop the last N turns from the in-memory context and persist a rollback marker; returns the updated `thread`. +- `turn/start` - add user input to a thread and begin Codex generation; responds with the initial `turn` and streams events. For `collaborationMode`, `settings.developer_instructions: null` means "use built-in instructions for the selected mode." +- `turn/steer` - append user input to the active in-flight turn for a thread; returns the accepted `turnId`. +- `turn/interrupt` - request cancellation of an in-flight turn; success is `{}` and the turn ends with `status: "interrupted"`. +- `review/start` - kick off the Codex reviewer for a thread; emits `enteredReviewMode` and `exitedReviewMode` items. 
+- `command/exec` - run a single command under the server sandbox without starting a thread/turn. +- `model/list` - list available models (set `includeHidden: true` to include entries with `hidden: true`) with effort options, optional `upgrade`, and `inputModalities`. +- `experimentalFeature/list` - list feature flags with lifecycle stage metadata and cursor pagination. +- `collaborationMode/list` - list collaboration mode presets (experimental, no pagination). +- `skills/list` - list skills for one or more `cwd` values (supports `forceReload` and optional `perCwdExtraUserRoots`). +- `app/list` - list available apps (connectors) with pagination plus accessibility/enabled metadata. +- `skills/config/write` - enable or disable skills by path. +- `mcpServer/oauth/login` - start an OAuth login for a configured MCP server; returns an authorization URL and emits `mcpServer/oauthLogin/completed` on completion. +- `tool/requestUserInput` - prompt the user with 1-3 short questions for a tool call (experimental); questions can set `isOther` for a free-form option. +- `config/mcpServer/reload` - reload MCP server configuration from disk and queue a refresh for loaded threads. +- `mcpServerStatus/list` - list MCP servers, tools, resources, and auth status (cursor + limit pagination). +- `windowsSandbox/setupStart` - start Windows sandbox setup for `elevated` or `unelevated` mode; returns quickly and later emits `windowsSandbox/setupCompleted`. +- `feedback/upload` - submit a feedback report (classification + optional reason/logs + conversation id, plus optional `extraLogFiles` attachments). +- `config/read` - fetch the effective configuration on disk after resolving configuration layering. +- `externalAgentConfig/detect` - detect migratable external-agent artifacts with `includeHome` and optional `cwds`; each detected item includes `cwd` (`null` for home). 
+- `externalAgentConfig/import` - apply selected external-agent migration items by passing explicit `migrationItems` with `cwd` (`null` for home). +- `config/value/write` - write a single configuration key/value to the user's `config.toml` on disk. +- `config/batchWrite` - apply configuration edits atomically to the user's `config.toml` on disk. +- `configRequirements/read` - fetch requirements from `requirements.toml` and/or MDM, including allow-lists, pinned `featureRequirements`, and residency/network requirements (or `null` if you haven't set any up). + +## Models + +### List models (`model/list`) + +Call `model/list` to discover available models and their capabilities before rendering model or personality selectors. + +```json +{ "method": "model/list", "id": 6, "params": { "limit": 20, "includeHidden": false } } +{ "id": 6, "result": { + "data": [{ + "id": "gpt-5.4", + "model": "gpt-5.4", + "displayName": "GPT-5.4", + "hidden": false, + "defaultReasoningEffort": "medium", + "supportedReasoningEfforts": [{ + "reasoningEffort": "low", + "description": "Lower latency" + }], + "inputModalities": ["text", "image"], + "supportsPersonality": true, + "isDefault": true + }], + "nextCursor": null +} } +``` + +Each model entry can include: + +- `supportedReasoningEfforts` - supported effort options for the model. +- `defaultReasoningEffort` - suggested default effort for clients. +- `upgrade` - optional recommended upgrade model id for migration prompts in clients. +- `upgradeInfo` - optional upgrade metadata for migration prompts in clients. +- `hidden` - whether the model is hidden from the default picker list. +- `inputModalities` - supported input types for the model (for example `text`, `image`). +- `supportsPersonality` - whether the model supports personality-specific instructions such as `/personality`. +- `isDefault` - whether the model is the recommended default. + +By default, `model/list` returns picker-visible models only. 
Set `includeHidden: true` if you need the full list and want to filter on the client side using `hidden`. + +When `inputModalities` is missing (older model catalogs), treat it as `["text", "image"]` for backward compatibility. + +### List experimental features (`experimentalFeature/list`) + +Use this endpoint to discover feature flags with metadata and lifecycle stage: + +```json +{ "method": "experimentalFeature/list", "id": 7, "params": { "limit": 20 } } +{ "id": 7, "result": { + "data": [{ + "name": "unified_exec", + "stage": "beta", + "displayName": "Unified exec", + "description": "Use the unified PTY-backed execution tool.", + "announcement": "Beta rollout for improved command execution reliability.", + "enabled": false, + "defaultEnabled": false + }], + "nextCursor": null +} } +``` + +`stage` can be `beta`, `underDevelopment`, `stable`, `deprecated`, or `removed`. For non-beta flags, `displayName`, `description`, and `announcement` may be `null`. + +## Threads + +- `thread/read` reads a stored thread without subscribing to it; set `includeTurns` to include turns. +- `thread/list` supports cursor pagination plus `modelProviders`, `sourceKinds`, `archived`, and `cwd` filtering. +- `thread/loaded/list` returns the thread IDs currently in memory. +- `thread/archive` moves the thread's persisted JSONL log into the archived directory. +- `thread/unsubscribe` unsubscribes the current connection from a loaded thread and can trigger `thread/closed`. +- `thread/unarchive` restores an archived thread rollout back into the active sessions directory. +- `thread/compact/start` triggers compaction and returns `{}` immediately. +- `thread/rollback` drops the last N turns from the in-memory context and records a rollback marker in the thread's persisted JSONL log. + +### Start or resume a thread + +Start a fresh thread when you need a new Codex conversation. 
+ +```json +{ "method": "thread/start", "id": 10, "params": { + "model": "gpt-5.1-codex", + "cwd": "/Users/me/project", + "approvalPolicy": "never", + "sandbox": "workspaceWrite", + "personality": "friendly", + "serviceName": "my_app_server_client" +} } +{ "id": 10, "result": { + "thread": { + "id": "thr_123", + "preview": "", + "ephemeral": false, + "modelProvider": "openai", + "createdAt": 1730910000 + } +} } +{ "method": "thread/started", "params": { "thread": { "id": "thr_123" } } } +``` + +`serviceName` is optional. Set it when you want app-server to tag thread-level metrics with your integration's service name. + +To continue a stored session, call `thread/resume` with the `thread.id` you recorded earlier. The response shape matches `thread/start`. You can also pass the same configuration overrides supported by `thread/start`, such as `personality`: + +```json +{ "method": "thread/resume", "id": 11, "params": { + "threadId": "thr_123", + "personality": "friendly" +} } +{ "id": 11, "result": { "thread": { "id": "thr_123", "name": "Bug bash notes", "ephemeral": false } } } +``` + +Resuming a thread doesn't update `thread.updatedAt` (or the rollout file's modified time) by itself. The timestamp updates when you start a turn. + +If you mark an enabled MCP server as `required` in config and that server fails to initialize, `thread/start` and `thread/resume` fail instead of continuing without it. + +`dynamicTools` on `thread/start` is an experimental field (requires `capabilities.experimentalApi = true`). Codex persists these dynamic tools in the thread rollout metadata and restores them on `thread/resume` when you don't supply new dynamic tools. + +If you resume with a different model than the one recorded in the rollout, Codex emits a warning and applies a one-time model-switch instruction on the next turn. + +To branch from a stored session, call `thread/fork` with the `thread.id`. 
This creates a new thread id and emits a `thread/started` notification for it: + +```json +{ "method": "thread/fork", "id": 12, "params": { "threadId": "thr_123" } } +{ "id": 12, "result": { "thread": { "id": "thr_456" } } } +{ "method": "thread/started", "params": { "thread": { "id": "thr_456" } } } +``` + +When a user-facing thread title has been set, app-server hydrates `thread.name` on `thread/list`, `thread/read`, `thread/resume`, `thread/unarchive`, and `thread/rollback` responses. `thread/start` and `thread/fork` may omit `name` (or return `null`) until a title is set later. + +### Read a stored thread (without resuming) + +Use `thread/read` when you want stored thread data but don't want to resume the thread or subscribe to its events. + +- `includeTurns` - when `true`, the response includes the thread's turns; when `false` or omitted, you get the thread summary only. +- Returned `thread` objects include runtime `status` (`notLoaded`, `idle`, `systemError`, or `active` with `activeFlags`). + +```json +{ "method": "thread/read", "id": 19, "params": { "threadId": "thr_123", "includeTurns": true } } +{ "id": 19, "result": { "thread": { "id": "thr_123", "name": "Bug bash notes", "ephemeral": false, "status": { "type": "notLoaded" }, "turns": [] } } } +``` + +Unlike `thread/resume`, `thread/read` doesn't load the thread into memory or emit `thread/started`. + +### List threads (with pagination & filters) + +`thread/list` lets you render a history UI. Results default to newest-first by `createdAt`. Filters apply before pagination. Pass any combination of: + +- `cursor` - opaque string from a prior response; omit for the first page. +- `limit` - server defaults to a reasonable page size if unset. +- `sortKey` - `created_at` (default) or `updated_at`. +- `modelProviders` - restrict results to specific providers; unset, null, or an empty array includes all providers. +- `sourceKinds` - restrict results to specific thread sources. 
When omitted or `[]`, the server defaults to interactive sources only: `cli` and `vscode`. +- `archived` - when `true`, list archived threads only. When `false` or omitted, list non-archived threads (default). +- `cwd` - restrict results to threads whose session current working directory exactly matches this path. + +`sourceKinds` accepts the following values: + +- `cli` +- `vscode` +- `exec` +- `appServer` +- `subAgent` +- `subAgentReview` +- `subAgentCompact` +- `subAgentThreadSpawn` +- `subAgentOther` +- `unknown` + +Example: + +```json +{ "method": "thread/list", "id": 20, "params": { + "cursor": null, + "limit": 25, + "sortKey": "created_at" +} } +{ "id": 20, "result": { + "data": [ + { "id": "thr_a", "preview": "Create a TUI", "ephemeral": false, "modelProvider": "openai", "createdAt": 1730831111, "updatedAt": 1730831111, "name": "TUI prototype", "status": { "type": "notLoaded" } }, + { "id": "thr_b", "preview": "Fix tests", "ephemeral": true, "modelProvider": "openai", "createdAt": 1730750000, "updatedAt": 1730750000, "status": { "type": "notLoaded" } } + ], + "nextCursor": "opaque-token-or-null" +} } +``` + +When `nextCursor` is `null`, you have reached the final page. + +### Track thread status changes + +`thread/status/changed` is emitted whenever a loaded thread's runtime status changes. The payload includes `threadId` and the new `status`. + +```json +{ + "method": "thread/status/changed", + "params": { + "threadId": "thr_123", + "status": { "type": "active", "activeFlags": ["waitingOnApproval"] } + } +} +``` + +### List loaded threads + +`thread/loaded/list` returns thread IDs currently loaded in memory. + +```json +{ "method": "thread/loaded/list", "id": 21 } +{ "id": 21, "result": { "data": ["thr_123", "thr_456"] } } +``` + +### Unsubscribe from a loaded thread + +`thread/unsubscribe` removes the current connection's subscription to a thread. The response status is one of: + +- `unsubscribed` when the connection was subscribed and is now removed. 
+- `notSubscribed` when the connection was not subscribed to that thread. +- `notLoaded` when the thread is not loaded. + +If this was the last subscriber, the server unloads the thread and emits a `thread/status/changed` transition to `notLoaded` plus `thread/closed`. + +```json +{ "method": "thread/unsubscribe", "id": 22, "params": { "threadId": "thr_123" } } +{ "id": 22, "result": { "status": "unsubscribed" } } +{ "method": "thread/status/changed", "params": { + "threadId": "thr_123", + "status": { "type": "notLoaded" } +} } +{ "method": "thread/closed", "params": { "threadId": "thr_123" } } +``` + +### Archive a thread + +Use `thread/archive` to move the persisted thread log (stored as a JSONL file on disk) into the archived sessions directory. + +```json +{ "method": "thread/archive", "id": 22, "params": { "threadId": "thr_b" } } +{ "id": 22, "result": {} } +{ "method": "thread/archived", "params": { "threadId": "thr_b" } } +``` + +Archived threads won't appear in future calls to `thread/list` unless you pass `archived: true`. + +### Unarchive a thread + +Use `thread/unarchive` to move an archived thread rollout back into the active sessions directory. + +```json +{ "method": "thread/unarchive", "id": 24, "params": { "threadId": "thr_b" } } +{ "id": 24, "result": { "thread": { "id": "thr_b", "name": "Bug bash notes" } } } +{ "method": "thread/unarchived", "params": { "threadId": "thr_b" } } +``` + +### Trigger thread compaction + +Use `thread/compact/start` to trigger manual history compaction for a thread. The request returns immediately with `{}`. + +App-server emits progress as standard `turn/*` and `item/*` notifications on the same `threadId`, including a `contextCompaction` item lifecycle (`item/started` then `item/completed`). 
+ +```json +{ "method": "thread/compact/start", "id": 25, "params": { "threadId": "thr_b" } } +{ "id": 25, "result": {} } +``` + +### Roll back recent turns + +Use `thread/rollback` to remove the last `numTurns` entries from the in-memory context and persist a rollback marker in the rollout log. The returned `thread` includes `turns` populated after the rollback. + +```json +{ "method": "thread/rollback", "id": 26, "params": { "threadId": "thr_b", "numTurns": 1 } } +{ "id": 26, "result": { "thread": { "id": "thr_b", "name": "Bug bash notes", "ephemeral": false } } } +``` + +## Turns + +The `input` field accepts a list of items: + +- `{ "type": "text", "text": "Explain this diff" }` +- `{ "type": "image", "url": "https://.../design.png" }` +- `{ "type": "localImage", "path": "/tmp/screenshot.png" }` + +You can override configuration settings per turn (model, effort, personality, `cwd`, sandbox policy, summary). When specified, these settings become the defaults for later turns on the same thread. `outputSchema` applies only to the current turn. For `sandboxPolicy.type = "externalSandbox"`, set `networkAccess` to `restricted` or `enabled`; for `workspaceWrite`, `networkAccess` remains a boolean. + +For `turn/start.collaborationMode`, `settings.developer_instructions: null` means "use built-in instructions for the selected mode" rather than clearing mode instructions. + +### Sandbox read access (`ReadOnlyAccess`) + +`sandboxPolicy` supports explicit read-access controls: + +- `readOnly`: optional `access` (`{ "type": "fullAccess" }` by default, or restricted roots). +- `workspaceWrite`: optional `readOnlyAccess` (`{ "type": "fullAccess" }` by default, or restricted roots). + +Restricted read access shape: + +```json +{ + "type": "restricted", + "includePlatformDefaults": true, + "readableRoots": ["/Users/me/shared-read-only"] +} +``` + +On macOS, `includePlatformDefaults: true` appends a curated platform-default Seatbelt policy for restricted-read sessions. 
This improves tool compatibility without broadly allowing all of `/System`. + +Examples: + +```json +{ "type": "readOnly", "access": { "type": "fullAccess" } } +``` + +```json +{ + "type": "workspaceWrite", + "writableRoots": ["/Users/me/project"], + "readOnlyAccess": { + "type": "restricted", + "includePlatformDefaults": true, + "readableRoots": ["/Users/me/shared-read-only"] + }, + "networkAccess": false +} +``` + +### Start a turn + +```json +{ "method": "turn/start", "id": 30, "params": { + "threadId": "thr_123", + "input": [ { "type": "text", "text": "Run tests" } ], + "cwd": "/Users/me/project", + "approvalPolicy": "unlessTrusted", + "sandboxPolicy": { + "type": "workspaceWrite", + "writableRoots": ["/Users/me/project"], + "networkAccess": true + }, + "model": "gpt-5.1-codex", + "effort": "medium", + "summary": "concise", + "personality": "friendly", + "outputSchema": { + "type": "object", + "properties": { "answer": { "type": "string" } }, + "required": ["answer"], + "additionalProperties": false + } +} } +{ "id": 30, "result": { "turn": { "id": "turn_456", "status": "inProgress", "items": [], "error": null } } } +``` + +### Steer an active turn + +Use `turn/steer` to append more user input to the active in-flight turn. + +- Include `expectedTurnId`; it must match the active turn id. +- The request fails if there is no active turn on the thread. +- `turn/steer` doesn't emit a new `turn/started` notification. +- `turn/steer` doesn't accept turn-level overrides (`model`, `cwd`, `sandboxPolicy`, or `outputSchema`). + +```json +{ "method": "turn/steer", "id": 32, "params": { + "threadId": "thr_123", + "input": [ { "type": "text", "text": "Actually focus on failing tests first." } ], + "expectedTurnId": "turn_456" +} } +{ "id": 32, "result": { "turnId": "turn_456" } } +``` + +### Start a turn (invoke a skill) + +Invoke a skill explicitly by including `$` in the text input and adding a `skill` input item alongside it. 
+ +```json +{ "method": "turn/start", "id": 33, "params": { + "threadId": "thr_123", + "input": [ + { "type": "text", "text": "$skill-creator Add a new skill for triaging flaky CI and include step-by-step usage." }, + { "type": "skill", "name": "skill-creator", "path": "/Users/me/.codex/skills/skill-creator/SKILL.md" } + ] +} } +{ "id": 33, "result": { "turn": { "id": "turn_457", "status": "inProgress", "items": [], "error": null } } } +``` + +### Interrupt a turn + +```json +{ "method": "turn/interrupt", "id": 31, "params": { "threadId": "thr_123", "turnId": "turn_456" } } +{ "id": 31, "result": {} } +``` + +On success, the turn finishes with `status: "interrupted"`. + +## Review + +`review/start` runs the Codex reviewer for a thread and streams review items. Targets include: + +- `uncommittedChanges` +- `baseBranch` (diff against a branch) +- `commit` (review a specific commit) +- `custom` (free-form instructions) + +Use `delivery: "inline"` (default) to run the review on the existing thread, or `delivery: "detached"` to fork a new review thread. + +Example request/response: + +```json +{ "method": "review/start", "id": 40, "params": { + "threadId": "thr_123", + "delivery": "inline", + "target": { "type": "commit", "sha": "1234567deadbeef", "title": "Polish tui colors" } +} } +{ "id": 40, "result": { + "turn": { + "id": "turn_900", + "status": "inProgress", + "items": [ + { "type": "userMessage", "id": "turn_900", "content": [ { "type": "text", "text": "Review commit 1234567: Polish tui colors" } ] } + ], + "error": null + }, + "reviewThreadId": "thr_123" +} } +``` + +For a detached review, use `"delivery": "detached"`. The response is the same shape, but `reviewThreadId` will be the id of the new review thread (different from the original `threadId`). The server also emits a `thread/started` notification for that new thread before streaming the review turn. 
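+
+A detached review request differs only in `delivery`. The sketch below uses illustrative ids (`thr_review_1`, `turn_901`) and an `uncommittedChanges` target with no extra fields; actual server output may populate `turn.items` as in the inline example:
+
+```json
+{ "method": "review/start", "id": 41, "params": {
+  "threadId": "thr_123",
+  "delivery": "detached",
+  "target": { "type": "uncommittedChanges" }
+} }
+{ "id": 41, "result": {
+  "turn": { "id": "turn_901", "status": "inProgress", "items": [], "error": null },
+  "reviewThreadId": "thr_review_1"
+} }
+```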
+ +Codex streams the usual `turn/started` notification followed by an `item/started` with an `enteredReviewMode` item: + +```json +{ + "method": "item/started", + "params": { + "item": { + "type": "enteredReviewMode", + "id": "turn_900", + "review": "current changes" + } + } +} +``` + +When the reviewer finishes, the server emits `item/started` and `item/completed` containing an `exitedReviewMode` item with the final review text: + +```json +{ + "method": "item/completed", + "params": { + "item": { + "type": "exitedReviewMode", + "id": "turn_900", + "review": "Looks solid overall..." + } + } +} +``` + +Use this notification to render the reviewer output in your client. + +## Command execution + +`command/exec` runs a single command (`argv` array) under the server sandbox without creating a thread. + +```json +{ "method": "command/exec", "id": 50, "params": { + "command": ["ls", "-la"], + "cwd": "/Users/me/project", + "sandboxPolicy": { "type": "workspaceWrite" }, + "timeoutMs": 10000 +} } +{ "id": 50, "result": { "exitCode": 0, "stdout": "...", "stderr": "" } } +``` + +Use `sandboxPolicy.type = "externalSandbox"` if you already sandbox the server process and want Codex to skip its own sandbox enforcement. For external sandbox mode, set `networkAccess` to `restricted` (default) or `enabled`. For `readOnly` and `workspaceWrite`, use the same optional `access` / `readOnlyAccess` structure shown above. + +Notes: + +- The server rejects empty `command` arrays. +- `sandboxPolicy` accepts the same shape used by `turn/start` (for example, `dangerFullAccess`, `readOnly`, `workspaceWrite`, `externalSandbox`). +- When omitted, `timeoutMs` falls back to the server default. + +### Read admin requirements (`configRequirements/read`) + +Use `configRequirements/read` to inspect the effective admin requirements loaded from `requirements.toml` and/or MDM. 
+ +```json +{ "method": "configRequirements/read", "id": 52, "params": {} } +{ "id": 52, "result": { + "requirements": { + "allowedApprovalPolicies": ["onRequest", "unlessTrusted"], + "allowedSandboxModes": ["readOnly", "workspaceWrite"], + "featureRequirements": { + "personality": true, + "unified_exec": false + }, + "network": { + "enabled": true, + "allowedDomains": ["api.openai.com"], + "allowUnixSockets": ["/tmp/example.sock"], + "dangerouslyAllowAllUnixSockets": false + } + } +} } +``` + +`result.requirements` is `null` when no requirements are configured. See the docs on [`requirements.toml`](https://developers.openai.com/codex/config-reference#requirementstoml) for details on supported keys and values. + +### Windows sandbox setup (`windowsSandbox/setupStart`) + +Custom Windows clients can trigger sandbox setup asynchronously instead of blocking on startup checks. + +```json +{ "method": "windowsSandbox/setupStart", "id": 53, "params": { "mode": "elevated" } } +{ "id": 53, "result": { "started": true } } +``` + +App-server starts setup in the background and later emits a completion notification: + +```json +{ + "method": "windowsSandbox/setupCompleted", + "params": { "mode": "elevated", "success": true, "error": null } +} +``` + +Modes: + +- `elevated` - run the elevated Windows sandbox setup path. +- `unelevated` - run the legacy setup/preflight path. + +## Events + +Event notifications are the server-initiated stream for thread lifecycles, turn lifecycles, and the items within them. After you start or resume a thread, keep reading the active transport stream for `thread/started`, `thread/archived`, `thread/unarchived`, `thread/closed`, `thread/status/changed`, `turn/*`, `item/*`, and `serverRequest/resolved` notifications. + +### Notification opt-out + +Clients can suppress specific notifications per connection by sending exact method names in `initialize.params.capabilities.optOutNotificationMethods`. 
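+
+As a minimal sketch, a client that only renders completed items might suppress delta streams at connection setup. Only `capabilities.optOutNotificationMethods` is shown here; other `initialize` params are omitted, and the two method names are just examples:
+
+```json
+{ "method": "initialize", "id": 1, "params": {
+  "capabilities": {
+    "optOutNotificationMethods": [
+      "item/agentMessage/delta",
+      "item/reasoning/summaryTextDelta"
+    ]
+  }
+} }
+```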
+ +- Exact-match only: `item/agentMessage/delta` suppresses only that method. +- Unknown method names are ignored. +- Applies to both legacy (`codex/event/*`) and v2 (`thread/*`, `turn/*`, `item/*`, etc.) notifications. +- Doesn't apply to requests, responses, or errors. + +### Fuzzy file search events (experimental) + +The fuzzy file search session API emits per-query notifications: + +- `fuzzyFileSearch/sessionUpdated` - `{ sessionId, query, files }` with the current matches for the active query. +- `fuzzyFileSearch/sessionCompleted` - `{ sessionId }` once indexing and matching for that query completes. + +### Windows sandbox setup events + +- `windowsSandbox/setupCompleted` - `{ mode, success, error }` emitted after a `windowsSandbox/setupStart` request finishes. + +### Turn events + +- `turn/started` - `{ turn }` with the turn id, empty `items`, and `status: "inProgress"`. +- `turn/completed` - `{ turn }` where `turn.status` is `completed`, `interrupted`, or `failed`; failures carry `{ error: { message, codexErrorInfo?, additionalDetails? } }`. +- `turn/diff/updated` - `{ threadId, turnId, diff }` with the latest aggregated unified diff across every file change in the turn. +- `turn/plan/updated` - `{ turnId, explanation?, plan }` whenever the agent shares or changes its plan; each `plan` entry is `{ step, status }` with `status` in `pending`, `inProgress`, or `completed`. +- `thread/tokenUsage/updated` - usage updates for the active thread. + +`turn/diff/updated` and `turn/plan/updated` currently include empty `items` arrays even when item events stream. Use `item/*` notifications as the source of truth for turn items. + +### Items + +`ThreadItem` is the tagged union carried in turn responses and `item/*` notifications. Common item types include: + +- `userMessage` - `{id, content}` where `content` is a list of user inputs (`text`, `image`, or `localImage`). +- `agentMessage` - `{id, text, phase?}` containing the accumulated agent reply. 
When present, `phase` uses Responses API wire values (`commentary`, `final_answer`). +- `plan` - `{id, text}` containing proposed plan text in plan mode. Treat the final `plan` item from `item/completed` as authoritative. +- `reasoning` - `{id, summary, content}` where `summary` holds streamed reasoning summaries and `content` holds raw reasoning blocks. +- `commandExecution` - `{id, command, cwd, status, commandActions, aggregatedOutput?, exitCode?, durationMs?}`. +- `fileChange` - `{id, changes, status}` describing proposed edits; `changes` list `{path, kind, diff}`. +- `mcpToolCall` - `{id, server, tool, status, arguments, result?, error?}`. +- `dynamicToolCall` - `{id, tool, arguments, status, contentItems?, success?, durationMs?}` for client-executed dynamic tool invocations. +- `collabToolCall` - `{id, tool, status, senderThreadId, receiverThreadId?, newThreadId?, prompt?, agentStatus?}`. +- `webSearch` - `{id, query, action?}` for web search requests issued by the agent. +- `imageView` - `{id, path}` emitted when the agent invokes the image viewer tool. +- `enteredReviewMode` - `{id, review}` sent when the reviewer starts. +- `exitedReviewMode` - `{id, review}` emitted when the reviewer finishes. +- `contextCompaction` - `{id}` emitted when Codex compacts the conversation history. + +For `webSearch.action`, the action `type` can be `search` (`query?`, `queries?`), `openPage` (`url?`), or `findInPage` (`url?`, `pattern?`). + +The app server deprecates the legacy `thread/compacted` notification; use the `contextCompaction` item instead. + +All items emit two shared lifecycle events: + +- `item/started` - emits the full `item` when a new unit of work begins; the `item.id` matches the `itemId` used by deltas. +- `item/completed` - sends the final `item` once work finishes; treat this as the authoritative state. + +### Item deltas + +- `item/agentMessage/delta` - appends streamed text for the agent message. +- `item/plan/delta` - streams proposed plan text. 
The final `plan` item may not exactly equal the concatenated deltas.
+- `item/reasoning/summaryTextDelta` - streams readable reasoning summaries; `summaryIndex` increments when a new summary section opens.
+- `item/reasoning/summaryPartAdded` - marks a boundary between reasoning summary sections.
+- `item/reasoning/textDelta` - streams raw reasoning text (when supported by the model).
+- `item/commandExecution/outputDelta` - streams stdout/stderr for a command; append deltas in order.
+- `item/fileChange/outputDelta` - carries the response of the underlying `apply_patch` tool call.
+
+## Errors
+
+If a turn fails, the server emits an `error` event with `{ error: { message, codexErrorInfo?, additionalDetails? } }` and then finishes the turn with `status: "failed"`. When an upstream HTTP status is available, the server forwards it in `httpStatusCode` on the relevant `codexErrorInfo` variant.
+
+Common `codexErrorInfo` values include:
+
+- `ContextWindowExceeded`
+- `UsageLimitExceeded`
+- `HttpConnectionFailed` (4xx/5xx upstream errors)
+- `ResponseStreamConnectionFailed`
+- `ResponseStreamDisconnected`
+- `ResponseTooManyFailedAttempts`
+- `BadRequest`, `Unauthorized`, `SandboxError`, `InternalServerError`, `Other`
+
+## Approvals
+
+Depending on a user's Codex settings, command execution and file changes may require approval. The app-server sends a server-initiated JSON-RPC request to the client, and the client responds with a decision payload.
+
+- Command execution decisions: `accept`, `acceptForSession`, `decline`, `cancel`, or `{ "acceptWithExecpolicyAmendment": { "execpolicy_amendment": ["cmd", "..."] } }`.
+- File change decisions: `accept`, `acceptForSession`, `decline`, `cancel`.
+
+- Requests include `threadId` and `turnId` - use them to scope UI state to the active conversation.
+- The server resumes or declines the work and ends the item with `item/completed`. + +### Command execution approvals + +Order of messages: + +1. `item/started` shows the pending `commandExecution` item with `command`, `cwd`, and other fields. +2. `item/commandExecution/requestApproval` includes `itemId`, `threadId`, `turnId`, optional `reason`, optional `command`, optional `cwd`, optional `commandActions`, optional `proposedExecpolicyAmendment`, optional `networkApprovalContext`, and optional `availableDecisions`. When `initialize.params.capabilities.experimentalApi = true`, the payload can also include experimental `additionalPermissions` describing requested per-command sandbox access. Any filesystem paths inside `additionalPermissions` are absolute on the wire. +3. Client responds with one of the command execution approval decisions above. +4. `serverRequest/resolved` confirms that the pending request has been answered or cleared. +5. `item/completed` returns the final `commandExecution` item with `status: completed | failed | declined`. + +When `networkApprovalContext` is present, the prompt is for managed network access (not a general shell-command approval). The current v2 schema exposes the target `host` and `protocol`; clients should render a network-specific prompt and not rely on `command` being a user-meaningful shell command preview. + +Codex groups concurrent network approval prompts by destination (`host`, protocol, and port). The app-server may therefore send one prompt that unblocks multiple queued requests to the same destination, while different ports on the same host are treated separately. + +### File change approvals + +Order of messages: + +1. `item/started` emits a `fileChange` item with proposed `changes` and `status: "inProgress"`. +2. `item/fileChange/requestApproval` includes `itemId`, `threadId`, `turnId`, optional `reason`, and optional `grantRoot`. +3. Client responds with one of the file change approval decisions above. +4. 
`serverRequest/resolved` confirms that the pending request has been answered or cleared. +5. `item/completed` returns the final `fileChange` item with `status: completed | failed | declined`. + +### `tool/requestUserInput` + +When the client responds to `item/tool/requestUserInput`, app-server emits `serverRequest/resolved` with `{ threadId, requestId }`. If the pending request is cleared by turn start, turn completion, or turn interruption before the client answers, the server emits the same notification for that cleanup. + +### Dynamic tool calls (experimental) + +`dynamicTools` on `thread/start` and the corresponding `item/tool/call` request or response flow are experimental APIs. + +When a dynamic tool is invoked during a turn, app-server emits: + +1. `item/started` with `item.type = "dynamicToolCall"`, `status = "inProgress"`, plus `tool` and `arguments`. +2. `item/tool/call` as a server request to the client. +3. The client response payload with returned content items. +4. `item/completed` with `item.type = "dynamicToolCall"`, the final `status`, and any returned `contentItems` or `success` value. + +### MCP tool-call approvals (apps) + +App (connector) tool calls can also require approval. When an app tool call has side effects, the server may elicit approval with `tool/requestUserInput` and options such as **Accept**, **Decline**, and **Cancel**. Destructive tool annotations always trigger approval even when the tool also advertises less-privileged hints. If the user declines or cancels, the related `mcpToolCall` item completes with an error instead of running the tool. + +## Skills + +Invoke a skill by including `$` in the user text input. Add a `skill` input item (recommended) so the server injects full skill instructions instead of relying on the model to resolve the name. 
+ +```json +{ + "method": "turn/start", + "id": 101, + "params": { + "threadId": "thread-1", + "input": [ + { + "type": "text", + "text": "$skill-creator Add a new skill for triaging flaky CI." + }, + { + "type": "skill", + "name": "skill-creator", + "path": "/Users/me/.codex/skills/skill-creator/SKILL.md" + } + ] + } +} +``` + +If you omit the `skill` item, the model will still parse the `$` marker and try to locate the skill, which can add latency. + +Example: + +``` +$skill-creator Add a new skill for triaging flaky CI and include step-by-step usage. +``` + +Use `skills/list` to fetch available skills (optionally scoped by `cwds`, with `forceReload`). You can also include `perCwdExtraUserRoots` to scan extra absolute paths as `user` scope for specific `cwd` values. App-server ignores entries whose `cwd` isn't present in `cwds`. `skills/list` may reuse a cached result per `cwd`; set `forceReload: true` to refresh from disk. When present, the server reads `interface` and `dependencies` from `SKILL.json`. 
+ +```json +{ "method": "skills/list", "id": 25, "params": { + "cwds": ["/Users/me/project", "/Users/me/other-project"], + "forceReload": true, + "perCwdExtraUserRoots": [ + { + "cwd": "/Users/me/project", + "extraUserRoots": ["/Users/me/shared-skills"] + } + ] +} } +{ "id": 25, "result": { + "data": [{ + "cwd": "/Users/me/project", + "skills": [ + { + "name": "skill-creator", + "description": "Create or update a Codex skill", + "enabled": true, + "interface": { + "displayName": "Skill Creator", + "shortDescription": "Create or update a Codex skill" + }, + "dependencies": { + "tools": [ + { + "type": "env_var", + "value": "GITHUB_TOKEN", + "description": "GitHub API token" + }, + { + "type": "mcp", + "value": "github", + "transport": "streamable_http", + "url": "https://example.com/mcp" + } + ] + } + } + ], + "errors": [] + }] +} } +``` + +To enable or disable a skill by path: + +```json +{ + "method": "skills/config/write", + "id": 26, + "params": { + "path": "/Users/me/.codex/skills/skill-creator/SKILL.md", + "enabled": false + } +} +``` + +## Apps (connectors) + +Use `app/list` to fetch available apps. In the CLI/TUI, `/apps` is the user-facing picker; in custom clients, call `app/list` directly. Each entry includes both `isAccessible` (available to the user) and `isEnabled` (enabled in `config.toml`) so clients can distinguish install/access from local enabled state. App entries can also include optional `branding`, `appMetadata`, and `labels` fields. 
+ +```json +{ "method": "app/list", "id": 50, "params": { + "cursor": null, + "limit": 50, + "threadId": "thread-1", + "forceRefetch": false +} } +{ "id": 50, "result": { + "data": [ + { + "id": "demo-app", + "name": "Demo App", + "description": "Example connector for documentation.", + "logoUrl": "https://example.com/demo-app.png", + "logoUrlDark": null, + "distributionChannel": null, + "branding": null, + "appMetadata": null, + "labels": null, + "installUrl": "https://chatgpt.com/apps/demo-app/demo-app", + "isAccessible": true, + "isEnabled": true + } + ], + "nextCursor": null +} } +``` + +If you provide `threadId`, app feature gating (`features.apps`) uses that thread's config snapshot. When omitted, app-server uses the latest global config. + +`app/list` returns after both accessible apps and directory apps load. Set `forceRefetch: true` to bypass app caches and fetch fresh data. Cache entries are only replaced when refreshes succeed. + +The server also emits `app/list/updated` notifications whenever either source (accessible apps or directory apps) finishes loading. Each notification includes the latest merged app list. + +```json +{ + "method": "app/list/updated", + "params": { + "data": [ + { + "id": "demo-app", + "name": "Demo App", + "description": "Example connector for documentation.", + "logoUrl": "https://example.com/demo-app.png", + "logoUrlDark": null, + "distributionChannel": null, + "branding": null, + "appMetadata": null, + "labels": null, + "installUrl": "https://chatgpt.com/apps/demo-app/demo-app", + "isAccessible": true, + "isEnabled": true + } + ] + } +} +``` + +Invoke an app by inserting `$` in the text input and adding a `mention` input item with the `app://` path (recommended). + +```json +{ + "method": "turn/start", + "id": 51, + "params": { + "threadId": "thread-1", + "input": [ + { + "type": "text", + "text": "$demo-app Pull the latest updates from the team." 
+ }, + { + "type": "mention", + "name": "Demo App", + "path": "app://demo-app" + } + ] + } +} +``` + +### Config RPC examples for app settings + +Use `config/read`, `config/value/write`, and `config/batchWrite` to inspect or update app controls in `config.toml`. + +Read the effective app config shape (including `_default` and per-tool overrides): + +```json +{ "method": "config/read", "id": 60, "params": { "includeLayers": false } } +{ "id": 60, "result": { + "config": { + "apps": { + "_default": { + "enabled": true, + "destructive_enabled": true, + "open_world_enabled": true + }, + "google_drive": { + "enabled": true, + "destructive_enabled": false, + "default_tools_approval_mode": "prompt", + "tools": { + "files/delete": { "enabled": false, "approval_mode": "approve" } + } + } + } + } +} } +``` + +Update a single app setting: + +```json +{ + "method": "config/value/write", + "id": 61, + "params": { + "keyPath": "apps.google_drive.default_tools_approval_mode", + "value": "prompt", + "mergeStrategy": "replace" + } +} +``` + +Apply multiple app edits atomically: + +```json +{ + "method": "config/batchWrite", + "id": 62, + "params": { + "edits": [ + { + "keyPath": "apps._default.destructive_enabled", + "value": false, + "mergeStrategy": "upsert" + }, + { + "keyPath": "apps.google_drive.tools.files/delete.approval_mode", + "value": "approve", + "mergeStrategy": "upsert" + } + ] + } +} +``` + +### Detect and import external agent config + +Use `externalAgentConfig/detect` to discover migratable external-agent artifacts, then pass the selected entries to `externalAgentConfig/import`. 
+ +Detection example: + +```json +{ "method": "externalAgentConfig/detect", "id": 63, "params": { + "includeHome": true, + "cwds": ["/Users/me/project"] +} } +{ "id": 63, "result": { + "items": [ + { + "itemType": "AGENTS_MD", + "description": "Import /Users/me/project/CLAUDE.md to /Users/me/project/AGENTS.md.", + "cwd": "/Users/me/project" + }, + { + "itemType": "SKILLS", + "description": "Copy skill folders from /Users/me/.claude/skills to /Users/me/.agents/skills.", + "cwd": null + } + ] +} } +``` + +Import example: + +```json +{ "method": "externalAgentConfig/import", "id": 64, "params": { + "migrationItems": [ + { + "itemType": "AGENTS_MD", + "description": "Import /Users/me/project/CLAUDE.md to /Users/me/project/AGENTS.md.", + "cwd": "/Users/me/project" + } + ] +} } +{ "id": 64, "result": {} } +``` + +Supported `itemType` values are `AGENTS_MD`, `CONFIG`, `SKILLS`, and `MCP_SERVER_CONFIG`. Detection returns only items that still have work to do. For example, AGENTS migration is skipped when `AGENTS.md` already exists and is non-empty, and skill imports do not overwrite existing skill directories. + +## Auth endpoints + +The JSON-RPC auth/account surface exposes request/response methods plus server-initiated notifications (no `id`). Use these to determine auth state, start or cancel logins, logout, and inspect ChatGPT rate limits. + +### Authentication modes + +Codex supports three authentication modes. `account/updated.authMode` shows the active mode, and `account/read` also reports it. + +- **API key (`apikey`)** - the caller supplies an OpenAI API key and Codex stores it for API requests. +- **ChatGPT managed (`chatgpt`)** - Codex owns the ChatGPT OAuth flow, persists tokens, and refreshes them automatically. +- **ChatGPT external tokens (`chatgptAuthTokens`)** - a host app supplies `idToken` and `accessToken` directly. Codex stores these tokens in memory, and the host app must refresh them when asked. 
+ +### API overview + +- `account/read` - fetch current account info; optionally refresh tokens. +- `account/login/start` - begin login (`apiKey`, `chatgpt`, or `chatgptAuthTokens`). +- `account/login/completed` (notify) - emitted when a login attempt finishes (success or error). +- `account/login/cancel` - cancel a pending ChatGPT login by `loginId`. +- `account/logout` - sign out; triggers `account/updated`. +- `account/updated` (notify) - emitted whenever auth mode changes (`authMode`: `apikey`, `chatgpt`, `chatgptAuthTokens`, or `null`). +- `account/chatgptAuthTokens/refresh` (server request) - request fresh externally managed ChatGPT tokens after an authorization error. +- `account/rateLimits/read` - fetch ChatGPT rate limits. +- `account/rateLimits/updated` (notify) - emitted whenever a user's ChatGPT rate limits change. +- `mcpServer/oauthLogin/completed` (notify) - emitted after a `mcpServer/oauth/login` flow finishes; payload includes `{ name, success, error? }`. + +### 1) Check auth state + +Request: + +```json +{ "method": "account/read", "id": 1, "params": { "refreshToken": false } } +``` + +Response examples: + +```json +{ "id": 1, "result": { "account": null, "requiresOpenaiAuth": false } } +``` + +```json +{ "id": 1, "result": { "account": null, "requiresOpenaiAuth": true } } +``` + +```json +{ + "id": 1, + "result": { "account": { "type": "apiKey" }, "requiresOpenaiAuth": true } +} +``` + +```json +{ + "id": 1, + "result": { + "account": { + "type": "chatgpt", + "email": "user@example.com", + "planType": "pro" + }, + "requiresOpenaiAuth": true + } +} +``` + +Field notes: + +- `refreshToken` (boolean): set `true` to force a token refresh in managed ChatGPT mode. In external token mode (`chatgptAuthTokens`), app-server ignores this flag. +- `requiresOpenaiAuth` reflects the active provider; when `false`, Codex can run without OpenAI credentials. + +### 2) Log in with an API key + +1. 
Send: + + ```json + { + "method": "account/login/start", + "id": 2, + "params": { "type": "apiKey", "apiKey": "sk-..." } + } + ``` + +2. Expect: + + ```json + { "id": 2, "result": { "type": "apiKey" } } + ``` + +3. Notifications: + + ```json + { + "method": "account/login/completed", + "params": { "loginId": null, "success": true, "error": null } + } + ``` + + ```json + { "method": "account/updated", "params": { "authMode": "apikey" } } + ``` + +### 3) Log in with ChatGPT (browser flow) + +1. Start: + + ```json + { "method": "account/login/start", "id": 3, "params": { "type": "chatgpt" } } + ``` + + ```json + { + "id": 3, + "result": { + "type": "chatgpt", + "loginId": "", + "authUrl": "https://chatgpt.com/...&redirect_uri=http%3A%2F%2Flocalhost%3A%2Fauth%2Fcallback" + } + } + ``` + +2. Open `authUrl` in a browser; the app-server hosts the local callback. +3. Wait for notifications: + + ```json + { + "method": "account/login/completed", + "params": { "loginId": "", "success": true, "error": null } + } + ``` + + ```json + { "method": "account/updated", "params": { "authMode": "chatgpt" } } + ``` + +### 3b) Log in with externally managed ChatGPT tokens (`chatgptAuthTokens`) + +Use this mode when a host application owns the user's ChatGPT auth lifecycle and supplies tokens directly. + +1. Send: + + ```json + { + "method": "account/login/start", + "id": 7, + "params": { + "type": "chatgptAuthTokens", + "idToken": "", + "accessToken": "" + } + } + ``` + +2. Expect: + + ```json + { "id": 7, "result": { "type": "chatgptAuthTokens" } } + ``` + +3. 
Notifications: + + ```json + { + "method": "account/login/completed", + "params": { "loginId": null, "success": true, "error": null } + } + ``` + + ```json + { + "method": "account/updated", + "params": { "authMode": "chatgptAuthTokens" } + } + ``` + +When the server receives a `401 Unauthorized`, it may request refreshed tokens from the host app: + +```json +{ + "method": "account/chatgptAuthTokens/refresh", + "id": 8, + "params": { "reason": "unauthorized", "previousAccountId": "org-123" } +} +{ "id": 8, "result": { "idToken": "", "accessToken": "" } } +``` + +The server retries the original request after a successful refresh response. Requests time out after about 10 seconds. + +### 4) Cancel a ChatGPT login + +```json +{ "method": "account/login/cancel", "id": 4, "params": { "loginId": "" } } +{ "method": "account/login/completed", "params": { "loginId": "", "success": false, "error": "..." } } +``` + +### 5) Logout + +```json +{ "method": "account/logout", "id": 5 } +{ "id": 5, "result": {} } +{ "method": "account/updated", "params": { "authMode": null } } +``` + +### 6) Rate limits (ChatGPT) + +```json +{ "method": "account/rateLimits/read", "id": 6 } +{ "id": 6, "result": { + "rateLimits": { + "limitId": "codex", + "limitName": null, + "primary": { "usedPercent": 25, "windowDurationMins": 15, "resetsAt": 1730947200 }, + "secondary": null + }, + "rateLimitsByLimitId": { + "codex": { + "limitId": "codex", + "limitName": null, + "primary": { "usedPercent": 25, "windowDurationMins": 15, "resetsAt": 1730947200 }, + "secondary": null + }, + "codex_other": { + "limitId": "codex_other", + "limitName": "codex_other", + "primary": { "usedPercent": 42, "windowDurationMins": 60, "resetsAt": 1730950800 }, + "secondary": null + } + } +} } +{ "method": "account/rateLimits/updated", "params": { + "rateLimits": { + "limitId": "codex", + "primary": { "usedPercent": 31, "windowDurationMins": 15, "resetsAt": 1730948100 } + } +} } +``` + +Field notes: + +- `rateLimits` is the 
backward-compatible single-bucket view. +- `rateLimitsByLimitId` (when present) is the multi-bucket view keyed by metered `limit_id` (for example `codex`). +- `limitId` is the metered bucket identifier. +- `limitName` is an optional user-facing label for the bucket. +- `usedPercent` is current usage within the quota window. +- `windowDurationMins` is the quota window length. +- `resetsAt` is a Unix timestamp (seconds) for the next reset. diff --git a/specs/harness-spec/templates/NON_NEGOTIABLE_RULES.md b/specs/harness-spec/templates/NON_NEGOTIABLE_RULES.md new file mode 100644 index 0000000..fe5c99c --- /dev/null +++ b/specs/harness-spec/templates/NON_NEGOTIABLE_RULES.md @@ -0,0 +1,27 @@ +# Non-Negotiable Rules + +These rules are absolute. No exceptions, no workarounds, no "we'll fix it later." Every agent and every human must follow them on every change. Violations block merge unconditionally. + +--- + +## Rule 1: 100% Test Coverage + +Every line of code must be covered by tests. No exceptions. + +- Every new function, method, module, and code path must have corresponding tests before it can be merged. +- If you change existing code, update or add tests to cover the change. +- Coverage is measured mechanically and enforced in CI. If coverage drops, the build fails. +- "It's too hard to test" is not an excuse — refactor the code until it is testable. +- Test coverage includes unit tests, integration tests, or both — whichever is appropriate for the code under test. +- Dead code that cannot be reached by tests must be deleted, not excluded from coverage. + +### Why + +Untested code is unverified code. In an agent-driven codebase, tests are the only reliable contract between what the code claims to do and what it actually does. Without full coverage, agents will build on top of broken assumptions, and bugs compound silently. + +### Enforcement + +- CI runs coverage analysis on every PR. +- PRs that reduce coverage below 100% are blocked from merging. 
+- The `harnesscli lint` and `harnesscli test` commands both verify coverage thresholds. +- Coverage reports are tracked in `docs/generated/coverage-report.json`.