docs/RECEIPTS.md (139 additions, 0 deletions)
@@ -0,0 +1,139 @@
# Runtime Receipts

This document sketches a future read-only receipt export for completed runtime
turns. It is a protocol note, not an implemented endpoint.

The goal is to let a local supervisor audit one completed turn without
screen-scraping the terminal transcript. A receipt should summarize the durable
runtime records that CodeWhale already owns: thread metadata, turn status, turn
items, event sequence lineage, usage when available, approval decisions, and
side-effect boundaries.

## Non-Goals

A receipt is not a safety certification, provider compatibility certification,
or hosted attestation. It must not call providers, execute tools, write memory,
write project files, mutate runtime state, or expose API keys.

Receipts should not export raw chain-of-thought or private reasoning by default.
When reasoning custody is represented, use stable item ids, counts, hashes, or
explicit `unavailable` fields rather than raw hidden content.
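
As a minimal sketch of that custody rule, the Python below reports reasoning by item id and SHA-256 digest and never copies the text. The `(item_id, raw_text)` input shape and the `item_count`, `items`, and `sha256` output fields are assumptions for illustration; only `raw_reasoning_exported`, `available`, and `reason` appear in the draft schema.

```python
import hashlib
from typing import Any, Dict, Iterable, Optional, Tuple


def reasoning_custody(
    reasoning_items: Optional[Iterable[Tuple[str, str]]],
) -> Dict[str, Any]:
    """Summarize reasoning custody without exporting raw reasoning.

    `reasoning_items` is a hypothetical list of (item_id, raw_text) pairs;
    None means the store persisted no reasoning at all.
    """
    if reasoning_items is None:
        # Nothing persisted: say so explicitly instead of guessing.
        return {
            "raw_reasoning_exported": False,
            "available": False,
            "reason": "reasoning blocks are not persisted as receipt-ready records",
        }
    entries = []
    for item_id, raw_text in reasoning_items:
        # Only a digest leaves this function; the raw text is discarded.
        digest = hashlib.sha256(raw_text.encode("utf-8")).hexdigest()
        entries.append({"item_id": item_id, "sha256": digest})
    return {
        "raw_reasoning_exported": False,
        "available": True,
        "item_count": len(entries),
        "items": entries,
    }
```

A digest lets a supervisor confirm that two receipts refer to the same reasoning block without either receipt revealing it.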

## Candidate Surfaces

Potential local-only surfaces:

```text
codewhale receipt export --thread <thread_id> --turn <turn_id> --format json
GET /v1/threads/{thread_id}/turns/{turn_id}/receipt
```

Both surfaces should share the existing runtime API auth boundary. They should
only read persisted runtime records and append-only events.
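
A hypothetical route matcher shows how narrow the HTTP surface could be: it accepts only `GET` on the proposed path and extracts the two ids. The function name and regex are illustrative, and authentication is assumed to be enforced by the existing runtime API layer before this code runs.

```python
import re
from typing import Optional, Tuple

# Mirrors the proposed GET /v1/threads/{thread_id}/turns/{turn_id}/receipt path.
# Ids are assumed not to contain "/".
RECEIPT_ROUTE = re.compile(
    r"/v1/threads/(?P<thread_id>[^/]+)/turns/(?P<turn_id>[^/]+)/receipt"
)


def match_receipt_route(method: str, path: str) -> Optional[Tuple[str, str]]:
    """Return (thread_id, turn_id) for a receipt request, else None.

    Only GET is accepted because the receipt surface is read-only.
    """
    if method.upper() != "GET":
        return None
    m = RECEIPT_ROUTE.fullmatch(path)
    if m is None:
        return None
    return m.group("thread_id"), m.group("turn_id")
```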

## Current Data Sources

The current runtime store already persists the core inputs a receipt builder
would need:

- `ThreadRecord`: model, workspace, mode, shell/trust/auto-approve flags,
title, task linkage, and latest turn metadata.
- `TurnRecord`: turn status, input summary, timestamps, duration, usage, error,
steer count, and item ids.
- `TurnItemRecord`: item kind, lifecycle status, summary, optional detail,
metadata, artifact refs, and item timestamps.
- `RuntimeEventRecord`: thread id, turn id, item id, event name, JSON payload,
timestamp, and a `seq` number that increases monotonically within each runtime
store.
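
For prototyping a builder, the records above can be modeled as Python dataclasses. These are stand-ins, not the runtime's actual types: field names such as `task_id`, `latest_turn_id`, `artifact_refs`, and `completed_at` are assumptions chosen to match the descriptions in the list.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Illustrative stand-ins for the runtime records; real field names may differ.


@dataclass
class ThreadRecord:
    id: str
    model: str
    workspace: str
    mode: str
    allow_shell: bool
    trust_mode: bool
    auto_approve: bool
    title: Optional[str] = None
    task_id: Optional[str] = None         # task linkage
    latest_turn_id: Optional[str] = None  # latest turn metadata


@dataclass
class TurnRecord:
    id: str
    thread_id: str
    status: str  # e.g. "completed" or "failed"
    input_summary: str
    started_at: Optional[str] = None
    ended_at: Optional[str] = None
    duration_ms: Optional[int] = None
    usage: Optional[Dict[str, int]] = None  # None: nothing was persisted
    error: Optional[str] = None
    steer_count: int = 0
    item_ids: List[str] = field(default_factory=list)


@dataclass
class TurnItemRecord:
    id: str
    turn_id: str
    kind: str
    status: str  # lifecycle status
    summary: str
    detail: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    artifact_refs: List[str] = field(default_factory=list)
    started_at: Optional[str] = None
    completed_at: Optional[str] = None


@dataclass
class RuntimeEventRecord:
    seq: int  # increases monotonically within one runtime store
    thread_id: str
    event: str
    timestamp: str
    turn_id: Optional[str] = None
    item_id: Optional[str] = None
    payload: Dict[str, Any] = field(default_factory=dict)
```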

Not every receipt field can be filled from those records today. If a provider or
store does not persist a value, the receipt should say `available: false` or
`unavailable`, not infer it from UI text.
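
One way to keep that rule mechanical is a small wrapper that turns a missing persisted value into an explicit `available: false` entry. This is a sketch; the helper name `evidence` is hypothetical, and its output mirrors the `usage_evidence` shape in the draft schema.

```python
from typing import Any, Dict, Optional


def evidence(key: str, value: Optional[Any], reason: str) -> Dict[str, Any]:
    """Wrap a possibly-missing persisted value as receipt evidence.

    None becomes {"available": False, "reason": ...}; the value is never
    reconstructed from UI text or transcript scraping.
    """
    if value is None:
        return {"available": False, "reason": reason}
    return {"available": True, key: value}
```

An empty dict or a zero still counts as available: a value the store actually persisted is evidence, whereas `None` means nothing was recorded.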

## Draft Schema Shape

```json
{
  "schema_id": "codewhale.conformance-receipt/v0",
  "thread": {
    "id": "thr_...",
    "model": "deepseek-v4-pro",
    "mode": "agent",
    "auto_approve": false,
    "trust_mode": false,
    "allow_shell": false
  },
  "turn": {
    "id": "turn_...",
    "status": "completed",
    "started_at": "2026-06-02T01:00:00Z",
    "ended_at": "2026-06-02T01:00:12Z",
    "duration_ms": 12000
  },
  "reasoning_custody": {
    "raw_reasoning_exported": false,
    "available": false,
    "reason": "reasoning blocks are not persisted as receipt-ready records"
  },
  "tool_lineage": {
    "tool_call_count": 1,
    "tool_result_count": 1,
    "unmatched_tool_call_ids": [],
    "unmatched_tool_result_ids": []
  },
  "usage_evidence": {
    "available": true,
    "usage": {
      "prompt_tokens": 123,
      "completion_tokens": 45
    },
    "provider_cache_breakdown_available": false
  },
  "source_event_lineage": {
    "first_seq": 10,
    "last_seq": 42,
    "event_count": 33,
    "missing_event_ranges": []
  },
  "side_effect_boundary": {
    "approval_required_count": 1,
    "approval_allowed_count": 0,
    "approval_denied_count": 1,
    "command_execution_count": 0,
    "file_change_count": 0,
    "sandbox_denied_count": 0
  },
  "claim_ceiling": [
    "local_receipt_only",
    "not_safety_certification",
    "not_provider_compatibility_certification"
  ]
}
```
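
A validator could check a few internal invariants of this draft shape before a receipt is emitted. The sketch below is illustrative and the invariants are assumptions: in particular, that every allow/deny decision answers a recorded approval request, and that because `seq` is allocated per runtime store, a turn's `event_count` may be smaller than its `first_seq`..`last_seq` span but never larger.

```python
from typing import Any, Dict, List

REQUIRED_CLAIMS = {
    "local_receipt_only",
    "not_safety_certification",
    "not_provider_compatibility_certification",
}


def check_receipt(receipt: Dict[str, Any]) -> List[str]:
    """Return problems found in a draft-v0 receipt (empty list if none)."""
    problems: List[str] = []
    if receipt.get("schema_id") != "codewhale.conformance-receipt/v0":
        problems.append("unexpected schema_id")
    if receipt.get("reasoning_custody", {}).get("raw_reasoning_exported") is not False:
        problems.append("raw_reasoning_exported must be false")

    # Other turns can interleave in a store-wide seq, so only an upper bound holds.
    lineage = receipt.get("source_event_lineage", {})
    first, last = lineage.get("first_seq"), lineage.get("last_seq")
    if first is not None and last is not None:
        if last < first or lineage.get("event_count", 0) > last - first + 1:
            problems.append("event_count does not fit the first_seq..last_seq span")

    # Matched calls and matched results should pair up one-to-one.
    tools = receipt.get("tool_lineage", {})
    matched_calls = tools.get("tool_call_count", 0) - len(tools.get("unmatched_tool_call_ids", []))
    matched_results = tools.get("tool_result_count", 0) - len(tools.get("unmatched_tool_result_ids", []))
    if matched_calls != matched_results:
        problems.append("matched tool calls and results disagree")

    # Assumed invariant: every decision answers a recorded approval request.
    b = receipt.get("side_effect_boundary", {})
    decisions = b.get("approval_allowed_count", 0) + b.get("approval_denied_count", 0)
    if decisions > b.get("approval_required_count", 0):
        problems.append("more approval decisions than approval requests")

    missing = REQUIRED_CLAIMS - set(receipt.get("claim_ceiling", []))
    if missing:
        problems.append("claim_ceiling missing: " + ", ".join(sorted(missing)))
    return problems
```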

## Builder Rules

A receipt builder should be deterministic and conservative:

1. Load the thread and turn by id, and reject the request if the turn's
`thread_id` does not match the requested thread.
2. Load only item ids referenced by the turn.
3. Read event records for the thread and filter by `turn_id`.
4. Preserve event sequence boundaries with `first_seq`, `last_seq`, and any
detected gaps.
5. Count approval, command, file, sandbox, and tool events from typed records or
known event names only.
6. Mark unavailable evidence explicitly instead of deriving it from free-form
summaries.
7. Emit no raw tool output beyond existing item summaries unless a later schema
adds a separate redaction policy.
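
The rules above can be sketched as one pure function over plain dicts. Every event name here (`approval.denied`, `tool.call`, and so on), the `call_id` payload key, the `ReceiptError` type, and the extra `items` field are hypothetical placeholders, not CodeWhale's real names. The sketch omits `reasoning_custody` and does not attempt gap detection: the note does not yet define a gap when `seq` is allocated store-wide, so it reports `missing_event_ranges` as `None` (not computed) rather than an empty list.

```python
from typing import Any, Dict, List

# Hypothetical event names mapped to side-effect counters.
SIDE_EFFECT_EVENTS = {
    "approval.required": "approval_required_count",
    "approval.allowed": "approval_allowed_count",
    "approval.denied": "approval_denied_count",
    "command.executed": "command_execution_count",
    "file.changed": "file_change_count",
    "sandbox.denied": "sandbox_denied_count",
}
TOOL_CALL, TOOL_RESULT = "tool.call", "tool.result"  # hypothetical names


class ReceiptError(ValueError):
    """Raised when the requested thread/turn pair is inconsistent."""


def build_receipt(thread: Dict[str, Any], turn: Dict[str, Any],
                  items_by_id: Dict[str, Dict[str, Any]],
                  thread_events: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Rule 1: the turn must belong to the requested thread.
    if turn["thread_id"] != thread["id"]:
        raise ReceiptError("turn does not belong to the requested thread")

    # Rule 2: use only items the turn references.
    items = [items_by_id[i] for i in turn["item_ids"] if i in items_by_id]

    # Rule 3: keep only this turn's events, ordered by seq.
    events = sorted(
        (e for e in thread_events
         if e.get("thread_id") == thread["id"] and e.get("turn_id") == turn["id"]),
        key=lambda e: e["seq"],
    )

    # Rule 5: count known event names only; anything else is ignored.
    boundary = {name: 0 for name in SIDE_EFFECT_EVENTS.values()}
    calls: List[str] = []
    results: List[str] = []
    for e in events:
        counter = SIDE_EFFECT_EVENTS.get(e["event"])
        if counter is not None:
            boundary[counter] += 1
        elif e["event"] == TOOL_CALL:
            calls.append(e["payload"]["call_id"])
        elif e["event"] == TOOL_RESULT:
            results.append(e["payload"]["call_id"])

    # Rule 4: keep seq boundaries; gaps are not computed (None, not []).
    seqs = [e["seq"] for e in events]
    lineage = {
        "first_seq": seqs[0] if seqs else None,
        "last_seq": seqs[-1] if seqs else None,
        "event_count": len(seqs),
        "missing_event_ranges": None,
    }

    # Rule 6: missing usage is marked unavailable, never inferred.
    usage = turn.get("usage")
    usage_evidence = ({"available": True, "usage": usage} if usage is not None
                      else {"available": False, "reason": "usage not persisted"})

    return {
        "schema_id": "codewhale.conformance-receipt/v0",
        "thread": {k: thread[k] for k in
                   ("id", "model", "mode", "auto_approve", "trust_mode", "allow_shell")},
        "turn": {k: turn.get(k) for k in
                 ("id", "status", "started_at", "ended_at", "duration_ms")},
        "tool_lineage": {
            "tool_call_count": len(calls),
            "tool_result_count": len(results),
            "unmatched_tool_call_ids": sorted(set(calls) - set(results)),
            "unmatched_tool_result_ids": sorted(set(results) - set(calls)),
        },
        "usage_evidence": usage_evidence,
        "source_event_lineage": lineage,
        "side_effect_boundary": boundary,
        # Rule 7: existing item summaries only; no raw tool output.
        "items": [{"id": it["id"], "kind": it["kind"], "summary": it["summary"]}
                  for it in items],
        "claim_ceiling": ["local_receipt_only", "not_safety_certification",
                          "not_provider_compatibility_certification"],
    }
```

Keeping the builder pure (records in, dict out) makes the snapshot fixtures in the implementation path straightforward: the same inputs always produce the same receipt.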

## Incremental Implementation Path

The safest implementation path is:

1. Land this protocol note and settle field names and non-goals.
2. Add protocol structs and JSON snapshot fixtures for completed, failed, and
approval-denied turns.
3. Add a pure builder over `ThreadRecord`, `TurnRecord`, `TurnItemRecord`, and
`RuntimeEventRecord`.
4. Expose the local runtime API endpoint.
5. Add the CLI export command and optional validation mode.
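
For the JSON snapshot fixtures in step 2, deterministic serialization matters more than pretty output. A minimal Python sketch, assuming fixtures are stored as sorted-key, two-space-indented JSON with a trailing newline (a convention chosen here, not one the note specifies):

```python
import json
from typing import Any, Dict


def canonical_json(receipt: Dict[str, Any]) -> str:
    """Serialize a receipt deterministically for snapshot fixtures."""
    # sort_keys fixes key order and indent fixes whitespace;
    # ensure_ascii=False keeps non-ASCII summaries readable in fixtures.
    return json.dumps(receipt, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


def matches_snapshot(receipt: Dict[str, Any], fixture_text: str) -> bool:
    """Compare a freshly built receipt with a stored fixture's text."""
    return canonical_json(receipt) == fixture_text
```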

docs/RUNTIME_API.md (7 additions, 0 deletions)
@@ -22,6 +22,10 @@ macOS workbench (or any local supervisor)
The engine runs as a local-only process. All APIs bind to `localhost` by
default. No hosted relay, no provider-token custody, no secret leakage.

For a proposed read-only audit export over completed turns, see
[`docs/RECEIPTS.md`](RECEIPTS.md). That document is a protocol note; the receipt
CLI/API surfaces are not implemented yet.

## ACP stdio adapter: `codewhale serve --acp`

`codewhale serve --acp` speaks JSON-RPC 2.0 over newline-delimited stdio for
@@ -215,6 +219,9 @@ accept an empty string to clear a previously-set value. Added in v0.8.10 (#562):
**Events** (SSE replay + live stream)
- `GET /v1/threads/{id}/events?since_seq=<u64>`

**Receipts** (future read-only audit export)
- Proposed only: `GET /v1/threads/{thread_id}/turns/{turn_id}/receipt`

**Compatibility stream** (one-shot, backwards-compatible)
- `POST /v1/stream`
