Proposal: read-only conformance receipt export for runtime thread/turn replay #2556

@ctxyao

Hi CodeWhale maintainers,

I have been looking at the public CodeWhale runtime surface as a local supervisor integration point. The existing runtime model already has most of the ingredients that downstream tools would need for auditability: durable ThreadRecord / TurnRecord / TurnItemRecord, append-only events, usage aggregation, approval events, sandbox boundaries, and workspace rollback.

Would a small read-only receipt export fit the project roadmap?

The narrow goal would be to let local supervisors verify one completed turn without screen-scraping terminal output:

  • reasoning-block custody without exporting raw reasoning by default
  • tool-call to tool-result lineage
  • cache/usage evidence when available, with an explicit "unavailable" marker when provider-specific hit/miss data is not stored
  • event sequence lineage across replay/resume
  • side-effect boundaries: approvals, sandbox decisions, YOLO/auto-approve mode, and command/file-change activity
  • rollback/snapshot availability

One possible local-only surface:

codewhale receipt export --thread <thread_id> --turn <turn_id> --format json
GET /v1/threads/{thread_id}/turns/{turn_id}/receipt

This should be read-only and reuse the existing local runtime auth boundary. It should not call providers, execute tools, write memory, write target files, or expose API keys. For privacy, raw reasoning should stay omitted by default; a receipt can use counts, hashes/refs, and item/event ids instead.
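To make the "hashes instead of raw reasoning" idea concrete, here is a minimal sketch of how the reasoning_custody section could be derived. The function name and the assumption that reasoning blocks are available as plain strings are mine, not existing CodeWhale APIs; whether blocks are persisted at all is one of the open questions below.

```python
import hashlib


def reasoning_custody(blocks: list[str]) -> dict:
    """Summarize reasoning blocks without exporting their text.

    `blocks` is a hypothetical input: the raw reasoning strings for one turn,
    assuming the runtime store can provide them.
    """
    hashes = [
        "sha256:" + hashlib.sha256(block.encode("utf-8")).hexdigest()
        for block in blocks
    ]
    return {
        "raw_reasoning_exported": False,  # only digests leave the runtime
        "reasoning_blocks_observed": bool(blocks),
        "reasoning_block_hashes": hashes,
    }
```

A supervisor that already holds a reasoning block from another channel could recompute the digest and compare it. If plain SHA-256 over short blocks is considered too easy to guess, a keyed hash might be a better fit.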

Example sketch:

{
  "schema_id": "codewhale.conformance-receipt/v0",
  "thread": {
    "id": "thr_...",
    "model": "deepseek-v4-pro",
    "mode": "agent",
    "auto_approve": false
  },
  "turn": {
    "id": "turn_...",
    "status": "completed"
  },
  "reasoning_custody": {
    "raw_reasoning_exported": false,
    "reasoning_blocks_observed": true,
    "reasoning_block_hashes": ["sha256:..."]
  },
  "tool_lineage": {
    "tool_call_count": 1,
    "tool_result_count": 1,
    "unmatched_tool_call_ids": [],
    "unmatched_tool_result_ids": []
  },
  "cache_evidence": {
    "available": true,
    "cached_tokens": 0,
    "reasoning_tokens": 0,
    "prompt_cache_hit_tokens": null,
    "prompt_cache_miss_tokens": null
  },
  "source_event_lineage": {
    "first_seq": 1,
    "last_seq": 42,
    "event_count": 42,
    "missing_event_ranges": []
  },
  "side_effect_boundary": {
    "mode": "agent",
    "auto_approve": false,
    "approval_required_count": 1,
    "approval_allowed_count": 0,
    "approval_denied_count": 1,
    "command_execution_count": 0,
    "file_change_count": 0,
    "sandbox_denied_count": 0
  },
  "claim_ceiling": [
    "local_receipt_only",
    "not_safety_certification",
    "not_provider_compatibility_certification"
  ]
}
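As an illustration of what a local supervisor could do with such a receipt, the sketch below checks it for internal consistency. The function name and the specific rules are assumptions for discussion, not part of any existing CodeWhale contract. In particular, it assumes missing_event_ranges holds inclusive [start, end] pairs, that tool calls and results match one-to-one, and that approval decisions never outnumber approval requests.

```python
def check_receipt(receipt: dict) -> list[str]:
    """Return a list of internal inconsistencies; an empty list means none found."""
    problems = []

    lineage = receipt["source_event_lineage"]
    span = lineage["last_seq"] - lineage["first_seq"] + 1
    # Assumption: each missing range is an inclusive [start, end] pair.
    missing = sum(end - start + 1 for start, end in lineage["missing_event_ranges"])
    if lineage["event_count"] != span - missing:
        problems.append("event_count does not match the sequence range")

    tools = receipt["tool_lineage"]
    matched_calls = tools["tool_call_count"] - len(tools["unmatched_tool_call_ids"])
    matched_results = tools["tool_result_count"] - len(tools["unmatched_tool_result_ids"])
    if matched_calls != matched_results:
        problems.append("matched tool calls and results disagree")

    effects = receipt["side_effect_boundary"]
    decided = effects["approval_allowed_count"] + effects["approval_denied_count"]
    if decided > effects["approval_required_count"]:
        problems.append("more approval decisions than approval requests")
    if effects["auto_approve"] != receipt["thread"]["auto_approve"]:
        problems.append("auto_approve differs between thread and side_effect_boundary")

    custody = receipt["reasoning_custody"]
    if custody["reasoning_block_hashes"] and not custody["reasoning_blocks_observed"]:
        problems.append("hashes present but no reasoning blocks observed")

    return problems
```

Run against the example sketch above, a check like this should report no problems: 42 events cover sequence numbers 1 through 42, one tool call matches one tool result, and the single denied approval does not exceed the one approval request.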

A low-risk implementation path could be docs-first:

  1. Add a short receipt-export section or docs/RECEIPTS.md with the intended schema and boundaries.
  2. Add protocol structs and snapshot tests.
  3. Add a pure receipt builder over persisted thread/turn/item/event records.
  4. Expose the local runtime API endpoint.
  5. Add the CLI export/validate command.
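Step 3 could look roughly like the sketch below: a pure function over already-persisted events, with no I/O, provider calls, or writes. The Event shape here (seq, kind, call_id) is a hypothetical stand-in for whatever the runtime actually stores, and the "tool_call" / "tool_result" kind names are assumptions. It only covers the tool_lineage and source_event_lineage sections.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical persisted event; real record fields may differ."""
    seq: int
    kind: str                      # assumed values: "tool_call", "tool_result", ...
    call_id: Optional[str] = None  # links a tool result back to its call


def build_lineage(events: list[Event]) -> dict:
    """Derive tool and event lineage from stored events without side effects."""
    seqs = sorted(e.seq for e in events)

    # Record gaps in the sequence as inclusive [start, end] ranges.
    missing = [
        [prev + 1, cur - 1]
        for prev, cur in zip(seqs, seqs[1:])
        if cur - prev > 1
    ]

    calls = {e.call_id for e in events if e.kind == "tool_call" and e.call_id is not None}
    results = {e.call_id for e in events if e.kind == "tool_result" and e.call_id is not None}

    return {
        "tool_lineage": {
            "tool_call_count": len(calls),
            "tool_result_count": len(results),
            "unmatched_tool_call_ids": sorted(calls - results),
            "unmatched_tool_result_ids": sorted(results - calls),
        },
        "source_event_lineage": {
            "first_seq": seqs[0] if seqs else None,
            "last_seq": seqs[-1] if seqs else None,
            "event_count": len(seqs),
            "missing_event_ranges": missing,
        },
    }
```

Keeping the builder pure would make the snapshot tests in step 2 straightforward: a fixed list of events in, a fixed receipt fragment out.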

Open questions:

  • Are reasoning blocks persisted as item metadata, or only streamed to the UI?
  • Is per-turn DeepSeek cache hit/miss metadata available, or only aggregate cached/reasoning tokens?
  • Is side-git snapshot metadata addressable per turn from the runtime store?
  • Would this be better as docs/protocol-fixture first, rather than starting with runtime endpoint code?

This is not a compatibility certification, safety claim, or endorsement request. It is a proposal for a small local audit/export surface that seems aligned with the runtime API contract and CodeWhale's local-first security boundary.

Labels: documentation, enhancement

Status: Done