Hi CodeWhale maintainers,
I have been looking at the public CodeWhale runtime surface as a local supervisor integration point. The existing runtime model already has most of the ingredients that downstream tools would need for auditability: durable ThreadRecord / TurnRecord / TurnItemRecord, append-only events, usage aggregation, approval events, sandbox boundaries, and workspace rollback.
Would a small read-only receipt export fit the project roadmap?
The narrow goal would be to let local supervisors verify one completed turn without screen-scraping terminal output:
- reasoning-block custody without exporting raw reasoning by default
- tool-call to tool-result lineage
- cache/usage evidence when available, with explicit unavailable when provider-specific hit/miss is not stored
- event sequence lineage across replay/resume
- approval, sandbox, YOLO/auto-approve, and command/file-change side-effect boundaries
- rollback/snapshot availability
One possible local-only surface:
codewhale receipt export --thread <thread_id> --turn <turn_id> --format json
GET /v1/threads/{thread_id}/turns/{turn_id}/receipt
This should be read-only and reuse the existing local runtime auth boundary. It should not call providers, execute tools, write memory, write target files, or expose API keys. For privacy, raw reasoning should stay omitted by default; a receipt can use counts, hashes/refs, and item/event ids instead.
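To illustrate the "hashes instead of raw reasoning" idea, here is a minimal Python sketch. The function name and the assumption that reasoning blocks are available as plain strings are mine, not taken from the CodeWhale codebase; the output keys mirror the reasoning_custody object in the example sketch.

```python
import hashlib
from typing import Iterable


def reasoning_custody(blocks: Iterable[str]) -> dict:
    """Summarise reasoning blocks without exporting their raw text.

    Hypothetical helper: assumes each persisted reasoning block can be
    read back as a UTF-8 string.
    """
    hashes = [
        "sha256:" + hashlib.sha256(block.encode("utf-8")).hexdigest()
        for block in blocks
    ]
    return {
        "raw_reasoning_exported": False,       # raw text never leaves the store
        "reasoning_blocks_observed": bool(hashes),
        "reasoning_block_hashes": hashes,      # one digest per block, in order
    }
```

One caveat with this approach: an unkeyed hash of very short text can be guessed by brute force, so a keyed hash or an opaque item id may be a safer reference if that matters for the threat model.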
Example sketch:
{
  "schema_id": "codewhale.conformance-receipt/v0",
  "thread": {
    "id": "thr_...",
    "model": "deepseek-v4-pro",
    "mode": "agent",
    "auto_approve": false
  },
  "turn": {
    "id": "turn_...",
    "status": "completed"
  },
  "reasoning_custody": {
    "raw_reasoning_exported": false,
    "reasoning_blocks_observed": true,
    "reasoning_block_hashes": ["sha256:..."]
  },
  "tool_lineage": {
    "tool_call_count": 1,
    "tool_result_count": 1,
    "unmatched_tool_call_ids": [],
    "unmatched_tool_result_ids": []
  },
  "cache_evidence": {
    "available": true,
    "cached_tokens": 0,
    "reasoning_tokens": 0,
    "prompt_cache_hit_tokens": null,
    "prompt_cache_miss_tokens": null
  },
  "source_event_lineage": {
    "first_seq": 1,
    "last_seq": 42,
    "event_count": 42,
    "missing_event_ranges": []
  },
  "side_effect_boundary": {
    "mode": "agent",
    "auto_approve": false,
    "approval_required_count": 1,
    "approval_allowed_count": 0,
    "approval_denied_count": 1,
    "command_execution_count": 0,
    "file_change_count": 0,
    "sandbox_denied_count": 0
  },
  "claim_ceiling": [
    "local_receipt_only",
    "not_safety_certification",
    "not_provider_compatibility_certification"
  ]
}
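For context on how a local supervisor might consume such a receipt, here is a rough Python sketch of an internal-consistency check. It is only an illustration: check_receipt is a hypothetical name, and it assumes that missing_event_ranges holds inclusive [first, last] pairs, that event_count counts the events actually present, and that allowed plus denied approvals never exceed required approvals. None of these semantics are defined yet.

```python
from __future__ import annotations


def check_receipt(receipt: dict) -> list[str]:
    """Return a list of consistency problems; an empty list means none found."""
    problems: list[str] = []

    # Event lineage: the count should equal the seq span minus declared gaps.
    lineage = receipt["source_event_lineage"]
    span = lineage["last_seq"] - lineage["first_seq"] + 1
    missing = sum(hi - lo + 1 for lo, hi in lineage["missing_event_ranges"])
    if lineage["event_count"] != span - missing:
        problems.append("event_count does not match the seq range")

    # Tool lineage: every call should have a result and vice versa.
    tools = receipt["tool_lineage"]
    if tools["unmatched_tool_call_ids"] or tools["unmatched_tool_result_ids"]:
        problems.append("unmatched tool calls or results")

    # Side effects: decisions should not outnumber approval requests.
    side = receipt["side_effect_boundary"]
    decided = side["approval_allowed_count"] + side["approval_denied_count"]
    if decided > side["approval_required_count"]:
        problems.append("more approval decisions than approval requests")

    # Privacy default: raw reasoning should not be present.
    if receipt["reasoning_custody"]["raw_reasoning_exported"]:
        problems.append("raw reasoning was exported")

    return problems
```

Run against the example receipt above, this check would report no problems under those assumptions.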
A low-risk implementation path could be docs-first:
- Add a short receipt-export section or docs/RECEIPTS.md with the intended schema and boundaries.
- Add protocol structs and snapshot tests.
- Add a pure receipt builder over persisted thread/turn/item/event records.
- Expose the local runtime API endpoint.
- Add the CLI export/validate command.
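To make the "pure receipt builder" step more concrete, here is a small Python sketch of what I have in mind. The Event type and its fields (seq, kind, call_id) are hypothetical stand-ins for whatever the persisted event records actually contain, and Python is used only for brevity. The point is that the builder is a pure function over already-stored records, with no provider calls or writes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Event:
    """Hypothetical persisted event; real records will differ."""
    seq: int
    kind: str          # e.g. "tool_call", "tool_result", "message"
    call_id: str = ""  # assumed to be set on tool_call / tool_result events


def build_lineage(events: Sequence[Event]) -> dict:
    """Derive tool and event lineage from stored events, without side effects."""
    calls = [e.call_id for e in events if e.kind == "tool_call"]
    results = [e.call_id for e in events if e.kind == "tool_result"]

    # Gaps between consecutive seq numbers, as inclusive [first, last] pairs.
    seqs = sorted(e.seq for e in events)
    gaps = [[a + 1, b - 1] for a, b in zip(seqs, seqs[1:]) if b - a > 1]

    return {
        "tool_lineage": {
            "tool_call_count": len(calls),
            "tool_result_count": len(results),
            "unmatched_tool_call_ids": sorted(set(calls) - set(results)),
            "unmatched_tool_result_ids": sorted(set(results) - set(calls)),
        },
        "source_event_lineage": {
            "first_seq": seqs[0] if seqs else None,
            "last_seq": seqs[-1] if seqs else None,
            "event_count": len(seqs),
            "missing_event_ranges": gaps,
        },
    }
```

Keeping the builder pure like this would also make the snapshot tests straightforward, since the same stored events should always produce the same receipt.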
Open questions:
- Are reasoning blocks persisted as item metadata, or only streamed to the UI?
- Is per-turn DeepSeek cache hit/miss metadata available, or only aggregate cached/reasoning tokens?
- Is side-git snapshot metadata addressable per turn from the runtime store?
- Would this be better as docs/protocol-fixture first, rather than starting with runtime endpoint code?
This is not a compatibility certification, safety claim, or endorsement request. It is a proposal for a small local audit/export surface that seems aligned with the runtime API contract and CodeWhale's local-first security boundary.