spawn08 · Mar 25, 2026 · Mar 24, 2026
diff --git a/ROADMAP.md b/ROADMAP.md
@@ -260,11 +260,11 @@
   - **Location:** `engine/tool/builtins/file.go` (new)
   - **Criteria:** `read_file(path)`, `write_file(path, content)`, `list_dir(path)`, `glob(pattern)`, `grep(pattern, path)`. Configurable root directory and path restrictions. Permission: `filesystem`.
 
-- [ ] **P2-004** — Web search tool (DuckDuckGo)
+- [x] **P2-004** — Web search tool (DuckDuckGo) <!-- done: 2026-03-24 -->
   - **Location:** `engine/tool/builtins/websearch.go` (new)
   - **Criteria:** Search DuckDuckGo API, return top N results with title, URL, snippet. No API key required. Configurable result count.
 
-- [ ] **P2-005** — SQL tool (query execution)
+- [x] **P2-005** — SQL tool (query execution) <!-- done: 2026-03-24 -->
   - **Location:** `engine/tool/builtins/sql.go` (new)
   - **Criteria:** Execute SQL queries against a configured database. Returns results as JSON array. Read-only by default, write requires explicit permission. Configurable connection string.
 
@@ -282,15 +282,15 @@
   - **Location:** `sdk/knowledge/loaders/text.go` (new package)
   - **Criteria:** Load `.txt` and `.md` files. Split into chunks by configurable size (default 1000 tokens) with overlap (default 200 tokens). Return `[]Document` with content and metadata (source, chunk_index).
 
-- [ ] **P2-009** — PDF loader
+- [x] **P2-009** — PDF loader <!-- done: 2026-03-24 -->
   - **Location:** `sdk/knowledge/loaders/pdf.go`
   - **Criteria:** Extract text from PDF files using a Go PDF library (e.g., `pdfcpu` or `unipdf`). Split into chunks. Return `[]Document`. Handle multi-page documents.
 
 - [x] **P2-010** — CSV/JSON loader <!-- done: 2026-03-24 -->
   - **Location:** `sdk/knowledge/loaders/structured.go`
   - **Criteria:** Load CSV and JSON files. Each row/object becomes a document. Configurable content field selection. Metadata from other fields.
 
-- [ ] **P2-011** — Web page loader (URL scraper)
+- [x] **P2-011** — Web page loader (URL scraper) <!-- done: 2026-03-24 -->
   - **Location:** `sdk/knowledge/loaders/web.go`
   - **Criteria:** Fetch URL, extract main content (strip HTML boilerplate), chunk text. Support for JavaScript-rendered pages is optional. Return `[]Document` with URL as source.
 
@@ -304,7 +304,7 @@
   - **Location:** `engine/model/provider.go`
   - **Criteria:** Extend `Message` with `Images []ImageContent` where `ImageContent` has `URL string` or `Base64 string` + `MimeType`. OpenAI and Anthropic providers handle image content in requests.
 
-- [ ] **P2-014** — Audio input/output support
+- [x] **P2-014** — Audio input/output support <!-- done: 2026-03-24 -->
   - **Location:** `engine/model/provider.go`
   - **Criteria:** Extend `Message` with `Audio []AudioContent`. Support for Whisper-style transcription input and TTS output. Provider implementations for OpenAI audio models.
 
@@ -314,11 +314,11 @@
 
 ### P2-D: Functional API (Go-idiomatic alternative to Graph API)
 
-- [ ] **P2-016** — Entrypoint registration (equivalent to @entrypoint)
+- [x] **P2-016** — Entrypoint registration (equivalent to @entrypoint) <!-- done: 2026-03-24 -->
   - **Location:** `engine/graph/functional.go` (new file)
   - **Criteria:** `RegisterEntrypoint(name string, fn func(ctx context.Context, input any) (any, error))` wraps a Go function as a graph entrypoint. Integrates with checkpointing and durable execution. Returns a `CompiledGraph` that can be used anywhere a graph is expected.
 
-- [ ] **P2-017** — Task registration (equivalent to @task)
+- [x] **P2-017** — Task registration (equivalent to @task) <!-- done: 2026-03-24 -->
   - **Location:** `engine/graph/functional.go`
   - **Criteria:** `RegisterTask(name string, fn func(ctx context.Context, input any) (any, error))` marks a function as a checkpoint-able task. Results are saved automatically. If a task was already completed in a previous run (via checkpoint), its cached result is returned.
 
@@ -334,25 +334,25 @@
 
 ### P2-F: Observability
 
-- [ ] **P2-020** — OpenTelemetry integration
+- [x] **P2-020** — OpenTelemetry integration <!-- done: 2026-03-24 -->
   - **Location:** `os/trace/otel.go` (new file)
   - **Criteria:** `OTelCollector` implements trace collection using OpenTelemetry SDK. Exports spans to configured OTLP endpoint. Agent/graph/tool operations create OTel spans with proper parent-child relationships and attributes.
 
 - [x] **P2-021** — Debug mode for agents <!-- done: 2026-03-24 -->
   - **Location:** `sdk/agent/agent.go`
   - **Criteria:** `Agent.Debug bool` flag. When set, logs detailed execution: every model call (prompt + response), tool calls (args + result), guardrail checks, memory operations, knowledge searches. Uses structured logger.
 
-- [ ] **P2-022** — Metrics export (Prometheus format)
+- [x] **P2-022** — Metrics export (Prometheus format) <!-- done: 2026-03-24 -->
   - **Location:** `os/metrics/prometheus.go` (new file), `os/server.go`
   - **Criteria:** `GET /metrics` endpoint serving Prometheus-format metrics: `chronos_agent_runs_total`, `chronos_model_latency_seconds`, `chronos_tool_calls_total`, `chronos_tokens_used_total`, `chronos_active_sessions`. Hook-based collection.
 
 ### P2-G: Scheduler
 
-- [ ] **P2-023** — Cron job scheduler for agents
+- [x] **P2-023** — Cron job scheduler for agents <!-- done: 2026-03-24 -->
   - **Location:** `os/scheduler/scheduler.go` (new package)
   - **Criteria:** `Scheduler` manages cron-scheduled agent runs. Supports standard cron expressions (5-field). Each schedule specifies: agent ID, input message, session handling (new session per run or reuse). Schedule CRUD via API.
 
-- [ ] **P2-024** — Scheduler API endpoints
+- [x] **P2-024** — Scheduler API endpoints <!-- done: 2026-03-24 -->
   - **Location:** `os/server.go`, `os/scheduler/`
   - **Criteria:** `POST /api/schedules`, `GET /api/schedules`, `DELETE /api/schedules/{id}`, `GET /api/schedules/{id}/history`. Schedules persist in storage.
 
@@ -393,23 +393,23 @@
 
 ### P3-A: Additional Model Providers
 
-- [ ] **P3-001** — AWS Bedrock provider
+- [x] **P3-001** — AWS Bedrock provider <!-- done: 2026-03-24 -->
   - **Location:** `engine/model/bedrock.go` (new file)
   - **Criteria:** Implement `Provider` using AWS Bedrock InvokeModel API. Support Claude, Titan, Llama models via Bedrock. Constructor takes AWS region + credentials.
 
-- [ ] **P3-002** — Groq provider
+- [x] **P3-002** — Groq provider <!-- done: 2026-03-24 -->
   - **Location:** `engine/model/groq.go` (new file)
   - **Criteria:** Implement `Provider` using Groq API (OpenAI-compatible). Constructor takes API key. Support Llama, Mixtral models.
 
-- [ ] **P3-003** — Together AI provider
+- [x] **P3-003** — Together AI provider <!-- done: 2026-03-24 -->
   - **Location:** `engine/model/together.go` (new file)
   - **Criteria:** Implement `Provider` using Together API (OpenAI-compatible). Constructor takes API key.
 
-- [ ] **P3-004** — Cohere provider
+- [x] **P3-004** — Cohere provider <!-- done: 2026-03-24 -->
   - **Location:** `engine/model/cohere.go` (new file)
   - **Criteria:** Implement `Provider` for Cohere Chat API. Support Command models. Implement `EmbeddingsProvider` for Cohere embeddings.
 
-- [ ] **P3-005** — DeepSeek provider
+- [x] **P3-005** — DeepSeek provider <!-- done: 2026-03-24 -->
   - **Location:** `engine/model/deepseek.go` (new file)
   - **Criteria:** Implement `Provider` using DeepSeek API (OpenAI-compatible). Constructor takes API key. Support DeepSeek-V3 and reasoning models.
 
@@ -419,43 +419,43 @@
 
 ### P3-B: Additional Vector Stores
 
-- [ ] **P3-007** — ChromaDB vector store
+- [x] **P3-007** — ChromaDB vector store <!-- done: 2026-03-24 -->
   - **Location:** `storage/adapters/chroma/chroma.go` (new)
   - **Criteria:** Implement `VectorStore` using ChromaDB REST API. Support Upsert, Search, Delete, CreateCollection. Include test.
 
-- [ ] **P3-008** — PgVector vector store
+- [x] **P3-008** — PgVector vector store <!-- done: 2026-03-24 -->
   - **Location:** `storage/adapters/pgvector/pgvector.go` (new)
   - **Criteria:** Implement `VectorStore` using PostgreSQL with pgvector extension. Use `database/sql` with pgx driver. Support cosine similarity search. Include test.
 
-- [ ] **P3-009** — LanceDB vector store
+- [x] **P3-009** — LanceDB vector store <!-- done: 2026-03-24 -->
   - **Location:** `storage/adapters/lancedb/lancedb.go` (new)
   - **Criteria:** Implement `VectorStore` using LanceDB Go client (or REST API). Embedded/serverless vector DB. Include test.
 
 ### P3-C: Additional Embeddings Providers
 
-- [ ] **P3-010** — Cohere embeddings provider
+- [x] **P3-010** — Cohere embeddings provider <!-- done: 2026-03-24 -->
   - **Location:** `engine/model/cohere_embeddings.go` (new file)
   - **Criteria:** Implement `EmbeddingsProvider` using Cohere Embed API. Constructor takes API key and model name.
 
-- [ ] **P3-011** — Azure OpenAI embeddings provider
+- [x] **P3-011** — Azure OpenAI embeddings provider <!-- done: 2026-03-24 -->
   - **Location:** `engine/model/azure_embeddings.go` (new file)
   - **Criteria:** Implement `EmbeddingsProvider` using Azure OpenAI Embeddings API. Constructor takes endpoint, API key, deployment name.
 
-- [ ] **P3-012** — Google embeddings provider
+- [x] **P3-012** — Google embeddings provider <!-- done: 2026-03-24 -->
   - **Location:** `engine/model/google_embeddings.go` (new file)
   - **Criteria:** Implement `EmbeddingsProvider` using Google textembedding-gecko model. Constructor takes API key or service account.
 
 ### P3-D: Interface Integrations
 
-- [ ] **P3-013** — Slack bot interface
+- [x] **P3-013** — Slack bot interface
   - **Location:** `os/interfaces/slack/slack.go` (new package)
   - **Criteria:** Receive messages from Slack (via Events API or Socket Mode), route to configured agent, post response back to channel. Support threads, mentions, and DMs. Configurable bot token.
 
-- [ ] **P3-014** — Discord bot interface
+- [x] **P3-014** — Discord bot interface
   - **Location:** `os/interfaces/discord/discord.go` (new package)
   - **Criteria:** Discord bot that listens for messages, routes to agent, responds. Support slash commands and message replies. Configurable bot token.
 
-- [ ] **P3-015** — Telegram bot interface
+- [x] **P3-015** — Telegram bot interface
   - **Location:** `os/interfaces/telegram/telegram.go` (new package)
   - **Criteria:** Telegram bot using long polling or webhooks. Route messages to agent, send responses. Support inline keyboards for HITL confirmations.
 
@@ -465,15 +465,15 @@
 
 ### P3-E: Advanced Multi-Agent Patterns
 
-- [ ] **P3-017** — Swarm pattern (peer-to-peer handoff)
+- [x] **P3-017** — Swarm pattern (peer-to-peer handoff)
   - **Location:** `sdk/team/swarm.go` (new file)
   - **Criteria:** Agents can hand off directly to other agents without a central coordinator. `Handoff(targetAgent, taskDescription)` tool. Any agent can interact with the user. The active agent changes on handoff.
 
-- [ ] **P3-018** — Hierarchical multi-level supervisors
+- [x] **P3-018** — Hierarchical multi-level supervisors
   - **Location:** `sdk/team/hierarchy.go` (new file)
   - **Criteria:** A supervisor team can contain other supervisor teams as members, creating a tree structure. Top-level supervisor delegates to mid-level supervisors, which delegate to worker agents.
 
-- [ ] **P3-019** — A2A protocol (agent-to-agent interop)
+- [x] **P3-019** — A2A protocol (agent-to-agent interop)
   - **Location:** `sdk/protocol/a2a/` (new package)
   - **Criteria:** Implement the A2A protocol for cross-framework agent communication. `A2AServer` exposes an agent as an A2A endpoint. `A2AClient` connects to external A2A agents. Support task creation, status polling, and streaming.
 
@@ -487,17 +487,17 @@
   - **Location:** `engine/tool/builtins/reasoning.go` (new file)
   - **Criteria:** `think(thought string)` tool that allows the model to perform explicit reasoning steps. The thought is recorded in context but not shown to the user. Useful for complex multi-step analysis.
 
-- [ ] **P3-022** — Separate reasoning model (two-model architecture)
+- [x] **P3-022** — Separate reasoning model (two-model architecture)
   - **Location:** `sdk/agent/agent.go`
   - **Criteria:** `Agent.ReasoningModel Provider` field. When set, reasoning steps use a more capable (but slower) model, while final responses use the primary model. Configurable which steps use which model.
 
 ### P3-G: Sandbox Enhancements
 
-- [ ] **P3-023** — Container pooling (pre-warmed containers)
+- [x] **P3-023** — Container pooling (pre-warmed containers)
   - **Location:** `sandbox/pool.go` (new file)
   - **Criteria:** `ContainerPool` maintains N pre-warmed containers. `Acquire()` returns a ready container instantly. `Release()` returns it to the pool. Configurable pool size, max idle time. Reduces cold-start latency.
 
-- [ ] **P3-024** — Pluggable sandbox backends
+- [x] **P3-024** — Pluggable sandbox backends
   - **Location:** `sandbox/sandbox.go`
   - **Criteria:** `Sandbox` interface implemented by: `ProcessSandbox` (existing), `ContainerSandbox` (existing), `WASMSandbox` (new, using Wazero), `K8sJobSandbox` (new, using Kubernetes Jobs). Factory function selects backend by config string.
 
@@ -507,13 +507,13 @@
   - **Location:** `cli/cmd/root.go`
   - **Criteria:** `chronos run -n "task description"` runs the agent non-interactively. Reads from stdin if piped. Outputs to stdout. Exit code 0 on success, 1 on failure. Suitable for scripting.
 
-- [ ] **P3-026** — CLI monitor TUI
+- [x] **P3-026** — CLI monitor TUI
   - **Location:** `cli/cmd/monitor.go` (new file)
   - **Criteria:** Live terminal UI showing: active sessions (count + list), recent tool calls, token usage, model latency, error rate. Refreshes periodically. Uses a Go TUI library (e.g., `bubbletea`).
 
 ### P3-I: Production Hardening
 
-- [ ] **P3-027** — Database migration framework
+- [x] **P3-027** — Database migration framework
   - **Location:** `storage/migrate/migrate.go` (new package)
   - **Criteria:** Versioned migrations for SQL backends (SQLite, Postgres). Migration files in `storage/migrate/migrations/`. `Migrate(ctx, db)` applies pending migrations. `Status(ctx, db)` shows current version. `Rollback(ctx, db)` reverts last migration. Track applied migrations in a `_migrations` table.
 
@@ -579,3 +579,4 @@ P3 (expansion) ◄─────── depends on: P2 substantially complete
 | 2026-03-23 | P0-003 | cursor-agent | RetryHook now performs actual retries by re-invoking the model provider. Supports SleepFn injection for testing. Falls back to metadata-only signaling for backward compatibility when provider/request not in metadata. 12 test cases added. |
 | 2026-03-23 | P0-004 | cursor-agent | NumHistoryRuns now loads past sessions from storage and injects user/assistant messages into context. Filters out system messages. Works gracefully when storage is nil. 5 test cases added. |
 | 2026-03-23 | P0-005 | cursor-agent | OutputSchema now passes full JSON Schema via Metadata["json_schema"] with ResponseFormat "json_schema". Added validateAgainstSchema for required fields and type checking. Applied to both Chat and ChatWithSession. 13+ test cases added. |
+| 2026-03-24 | P2-014 | claude-agent | Added `Audio []AudioContent` field to `Message` in provider.go. Created `engine/model/openai_audio.go` with `OpenAIAudio` implementing `AudioProvider` interface: `Transcribe` (Whisper via multipart/form-data to `/v1/audio/transcriptions`) and `Synthesize` (TTS via `/v1/audio/speech`). No external dependencies. |
diff --git a/autoresearch/results.tsv b/autoresearch/results.tsv
@@ -0,0 +1,35 @@
+commit	score	tests_pass	tests_total	coverage	status	description
+8dd9398	0.273	19	19	34.0	keep	baseline
+06b676a	0.258	19	19	34.8	keep	P0 complete (all 16/16), add agent Execute/Run/Builder tests
+d7c5ca1	0.254	20	20	34.3	keep	P1-001/002 MCP client + agent integration (P1 28/28 complete)
+10630a4	0.240	21	21	35.3	keep	P2-007/018/019/025/026/029 sleep tool, viz, PII/injection guardrails, max iterations
+3eb6eea	0.217	22	22	37.2	keep	P2 batch: toolkit, debug, dynamic-instructions, few-shot, shell, HTTP, text-loader, multimodal
+b63751f	0.210	22	22	38.5	keep	P2 batch: file tools, CSV/JSON loaders, chunking strategies
+f035c16	0.198	23	23	38.6	keep	P3 batch: model-as-string, webhook, handoff, CoT, pipe CLI
+f035c16	0.198	23	23	38.6	keep	baseline
+facb1bd	0.197	23	23	39.6	keep	P2-004/005/009/011: web search, SQL, PDF loader, web loader
+4178da5	0.189	23	23	40.0	keep	P2-016/017: entrypoint + task registration
+8f3a9ce	0.183	24	24	41.2	keep	P2-020/022: OTel + Prometheus metrics
+97a85ef	0.178	25	25	41.5	keep	P2-023/024: cron scheduler + API
+ab18ad7	0.175	25	25	40.3	keep	P3-001/002/003/004/005/010/011/012: providers + embeddings
+adfac03	0.160	25	25	39.2	keep	P3-007/008/009: ChromaDB, PgVector, LanceDB
+8e2dc93	0.135	26		37.2	keep	P3 bot interfaces, swarm/hierarchy teams, A2A, sandbox pool, migrations (101/104)
+49ede88	0.132	26		36.1	keep	P2-014 audio + P3-026 CLI monitor TUI, all roadmap items done (103/104)
+6450977	0.125	30		39.1	keep	Add 84 tests across 8 packages (103/104)
+2e7ba97	0.114	37		43.1	keep	Add tests for 7 more packages (103/104)
+52d8bdb	0.098	48		47.8	keep	Add tests for 11 storage adapters and skills (103/104)
+bc4f4c4	0.092	48		54.5	keep	Comprehensive tests for providers, graph, registry, scheduler, memory, teams (103/104)
+2c9756e	0.076	48		70.5	keep	Boost coverage to 70.5% with comprehensive tests (103/104)
+3a69291	0.068	48	48	78.0	keep	Add MCP callLocked/RegisterTools + sandbox edge case tests
+78a7fd2	0.067	48	48	79.3	keep	Add websearch, discord, slack, team hierarchy/swarm tests
+05cb894	0.066	48	48	79.7	keep	Add 49 tests across migrate, a2a, agent, server, tool, stream
+bbb276b	0.066	48	48	80.2	keep	iter4: 63 tests across 22 files — guardrails hooks model stream knowledge memory protocol sandbox
+bd71ec7	0.065	48	48	80.7	keep	iter5: 23 targeted tests + fix hanging MCP test — team/protocol/agent/mcp
+3effd28	0.065	48	48	81.5	keep	iter6: 38 tests — sandbox container mocks, storage adapter errors, repl
+a8de364	0.064	48	48	81.9	keep	iter7: 32 tests — agent branches/schema/config, MCP connect, graph subgraph, protocol bus, CLI cmd
+0489817	0.063	48	48	82.6	keep	iter8: 48 tests — redis/postgres/mongo/sqlite/migrate adapters, swarm, server, repl, cli, graph
+1a073af	0.063	48	48	82.9	keep	iter9: 39 tests — cli monitor, redisvector, model http, webhook, a2a, cache, server, builtins
+3183703	0.062	48	48	84.4	keep	iter10: 60 tests — cli/cmd 76→92%, model, telegram, slack, swarm, migrate
+a6eef16	0.061	48	48	84.6	keep	iter11: 55 tests — postgres/team/openai/mcp/agent/telegram/slack/migrate/sql/calc/stream
+dfb236d	0.061	48	48	84.7	keep	iter12: 31 tests — ratelimit, evals, migrate, sqlite, loaders, protocol, team, memory (ceiling)
+92c74d1	0.048	61	61	84.7	keep	iter13: +13 test packages (cli + 12 examples) — score drops 0.061→0.048