
Commit 4e2da9a

Merge pull request #15 from spawn08/autoresearch/mar24
Implement multiple tools, metrics, schedulers, and extensive tests
2 parents 67531b8 + 307b927 commit 4e2da9a

250 files changed, 35736 additions and 54 deletions

ROADMAP.md

Lines changed: 34 additions & 33 deletions
@@ -260,11 +260,11 @@
 - **Location:** `engine/tool/builtins/file.go` (new)
 - **Criteria:** `read_file(path)`, `write_file(path, content)`, `list_dir(path)`, `glob(pattern)`, `grep(pattern, path)`. Configurable root directory and path restrictions. Permission: `filesystem`.

-- [ ] **P2-004** — Web search tool (DuckDuckGo)
+- [x] **P2-004** — Web search tool (DuckDuckGo) <!-- done: 2026-03-24 -->
 - **Location:** `engine/tool/builtins/websearch.go` (new)
 - **Criteria:** Search DuckDuckGo API, return top N results with title, URL, snippet. No API key required. Configurable result count.

-- [ ] **P2-005** — SQL tool (query execution)
+- [x] **P2-005** — SQL tool (query execution) <!-- done: 2026-03-24 -->
 - **Location:** `engine/tool/builtins/sql.go` (new)
 - **Criteria:** Execute SQL queries against a configured database. Returns results as JSON array. Read-only by default, write requires explicit permission. Configurable connection string.

@@ -282,15 +282,15 @@
 - **Location:** `sdk/knowledge/loaders/text.go` (new package)
 - **Criteria:** Load `.txt` and `.md` files. Split into chunks by configurable size (default 1000 tokens) with overlap (default 200 tokens). Return `[]Document` with content and metadata (source, chunk_index).

-- [ ] **P2-009** — PDF loader
+- [x] **P2-009** — PDF loader <!-- done: 2026-03-24 -->
 - **Location:** `sdk/knowledge/loaders/pdf.go`
 - **Criteria:** Extract text from PDF files using a Go PDF library (e.g., `pdfcpu` or `unipdf`). Split into chunks. Return `[]Document`. Handle multi-page documents.

 - [x] **P2-010** — CSV/JSON loader <!-- done: 2026-03-24 -->
 - **Location:** `sdk/knowledge/loaders/structured.go`
 - **Criteria:** Load CSV and JSON files. Each row/object becomes a document. Configurable content field selection. Metadata from other fields.

-- [ ] **P2-011** — Web page loader (URL scraper)
+- [x] **P2-011** — Web page loader (URL scraper) <!-- done: 2026-03-24 -->
 - **Location:** `sdk/knowledge/loaders/web.go`
 - **Criteria:** Fetch URL, extract main content (strip HTML boilerplate), chunk text. Support for JavaScript-rendered pages is optional. Return `[]Document` with URL as source.
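
The loader criteria in this hunk all chunk text by a configurable size with overlap. A minimal sketch of that windowing follows. It assumes a "token" is a whitespace-separated word, which is a simplification; the `Document` fields and the `ChunkText` name are illustrative and not taken from `sdk/knowledge/loaders`.

```go
package main

import (
	"fmt"
	"strings"
)

// Document mirrors the shape named in the criteria; the field names are assumptions.
type Document struct {
	Content  string
	Metadata map[string]any
}

// ChunkText splits text into windows of at most size tokens, where each window
// repeats the last overlap tokens of the previous one. Tokens are approximated
// as whitespace-separated words.
func ChunkText(source, text string, size, overlap int) ([]Document, error) {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil, fmt.Errorf("invalid chunking: size=%d overlap=%d", size, overlap)
	}
	tokens := strings.Fields(text)
	step := size - overlap // how far the window advances each iteration
	var docs []Document
	for start := 0; start < len(tokens); start += step {
		end := start + size
		if end > len(tokens) {
			end = len(tokens)
		}
		docs = append(docs, Document{
			Content:  strings.Join(tokens[start:end], " "),
			Metadata: map[string]any{"source": source, "chunk_index": len(docs)},
		})
		if end == len(tokens) {
			break // the final window already reached the end of the text
		}
	}
	return docs, nil
}

func main() {
	docs, err := ChunkText("notes.txt", "a b c d e f g", 4, 2)
	if err != nil {
		panic(err)
	}
	for _, d := range docs {
		fmt.Printf("%v: %q\n", d.Metadata["chunk_index"], d.Content)
	}
}
```

With the roadmap defaults this would be called as `ChunkText(path, text, 1000, 200)`; a real loader would likely count model tokens rather than words.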

@@ -304,7 +304,7 @@
 - **Location:** `engine/model/provider.go`
 - **Criteria:** Extend `Message` with `Images []ImageContent` where `ImageContent` has `URL string` or `Base64 string` + `MimeType`. OpenAI and Anthropic providers handle image content in requests.

-- [ ] **P2-014** — Audio input/output support
+- [x] **P2-014** — Audio input/output support <!-- done: 2026-03-24 -->
 - **Location:** `engine/model/provider.go`
 - **Criteria:** Extend `Message` with `Audio []AudioContent`. Support for Whisper-style transcription input and TTS output. Provider implementations for OpenAI audio models.

@@ -314,11 +314,11 @@

 ### P2-D: Functional API (Go-idiomatic alternative to Graph API)

-- [ ] **P2-016** — Entrypoint registration (equivalent to @entrypoint)
+- [x] **P2-016** — Entrypoint registration (equivalent to @entrypoint) <!-- done: 2026-03-24 -->
 - **Location:** `engine/graph/functional.go` (new file)
 - **Criteria:** `RegisterEntrypoint(name string, fn func(ctx context.Context, input any) (any, error))` wraps a Go function as a graph entrypoint. Integrates with checkpointing and durable execution. Returns a `CompiledGraph` that can be used anywhere a graph is expected.

-- [ ] **P2-017** — Task registration (equivalent to @task)
+- [x] **P2-017** — Task registration (equivalent to @task) <!-- done: 2026-03-24 -->
 - **Location:** `engine/graph/functional.go`
 - **Criteria:** `RegisterTask(name string, fn func(ctx context.Context, input any) (any, error))` marks a function as a checkpoint-able task. Results are saved automatically. If a task was already completed in a previous run (via checkpoint), its cached result is returned.

@@ -334,25 +334,25 @@

 ### P2-F: Observability

-- [ ] **P2-020** — OpenTelemetry integration
+- [x] **P2-020** — OpenTelemetry integration <!-- done: 2026-03-24 -->
 - **Location:** `os/trace/otel.go` (new file)
 - **Criteria:** `OTelCollector` implements trace collection using OpenTelemetry SDK. Exports spans to configured OTLP endpoint. Agent/graph/tool operations create OTel spans with proper parent-child relationships and attributes.

 - [x] **P2-021** — Debug mode for agents <!-- done: 2026-03-24 -->
 - **Location:** `sdk/agent/agent.go`
 - **Criteria:** `Agent.Debug bool` flag. When set, logs detailed execution: every model call (prompt + response), tool calls (args + result), guardrail checks, memory operations, knowledge searches. Uses structured logger.

-- [ ] **P2-022** — Metrics export (Prometheus format)
+- [x] **P2-022** — Metrics export (Prometheus format) <!-- done: 2026-03-24 -->
 - **Location:** `os/metrics/prometheus.go` (new file), `os/server.go`
 - **Criteria:** `GET /metrics` endpoint serving Prometheus-format metrics: `chronos_agent_runs_total`, `chronos_model_latency_seconds`, `chronos_tool_calls_total`, `chronos_tokens_used_total`, `chronos_active_sessions`. Hook-based collection.

 ### P2-G: Scheduler

-- [ ] **P2-023** — Cron job scheduler for agents
+- [x] **P2-023** — Cron job scheduler for agents <!-- done: 2026-03-24 -->
 - **Location:** `os/scheduler/scheduler.go` (new package)
 - **Criteria:** `Scheduler` manages cron-scheduled agent runs. Supports standard cron expressions (5-field). Each schedule specifies: agent ID, input message, session handling (new session per run or reuse). Schedule CRUD via API.

-- [ ] **P2-024** — Scheduler API endpoints
+- [x] **P2-024** — Scheduler API endpoints <!-- done: 2026-03-24 -->
 - **Location:** `os/server.go`, `os/scheduler/`
 - **Criteria:** `POST /api/schedules`, `GET /api/schedules`, `DELETE /api/schedules/{id}`, `GET /api/schedules/{id}/history`. Schedules persist in storage.

@@ -393,23 +393,23 @@

 ### P3-A: Additional Model Providers

-- [ ] **P3-001** — AWS Bedrock provider
+- [x] **P3-001** — AWS Bedrock provider <!-- done: 2026-03-24 -->
 - **Location:** `engine/model/bedrock.go` (new file)
 - **Criteria:** Implement `Provider` using AWS Bedrock InvokeModel API. Support Claude, Titan, Llama models via Bedrock. Constructor takes AWS region + credentials.

-- [ ] **P3-002** — Groq provider
+- [x] **P3-002** — Groq provider <!-- done: 2026-03-24 -->
 - **Location:** `engine/model/groq.go` (new file)
 - **Criteria:** Implement `Provider` using Groq API (OpenAI-compatible). Constructor takes API key. Support Llama, Mixtral models.

-- [ ] **P3-003** — Together AI provider
+- [x] **P3-003** — Together AI provider <!-- done: 2026-03-24 -->
 - **Location:** `engine/model/together.go` (new file)
 - **Criteria:** Implement `Provider` using Together API (OpenAI-compatible). Constructor takes API key.

-- [ ] **P3-004** — Cohere provider
+- [x] **P3-004** — Cohere provider <!-- done: 2026-03-24 -->
 - **Location:** `engine/model/cohere.go` (new file)
 - **Criteria:** Implement `Provider` for Cohere Chat API. Support Command models. Implement `EmbeddingsProvider` for Cohere embeddings.

-- [ ] **P3-005** — DeepSeek provider
+- [x] **P3-005** — DeepSeek provider <!-- done: 2026-03-24 -->
 - **Location:** `engine/model/deepseek.go` (new file)
 - **Criteria:** Implement `Provider` using DeepSeek API (OpenAI-compatible). Constructor takes API key. Support DeepSeek-V3 and reasoning models.

@@ -419,43 +419,43 @@

 ### P3-B: Additional Vector Stores

-- [ ] **P3-007** — ChromaDB vector store
+- [x] **P3-007** — ChromaDB vector store <!-- done: 2026-03-24 -->
 - **Location:** `storage/adapters/chroma/chroma.go` (new)
 - **Criteria:** Implement `VectorStore` using ChromaDB REST API. Support Upsert, Search, Delete, CreateCollection. Include test.

-- [ ] **P3-008** — PgVector vector store
+- [x] **P3-008** — PgVector vector store <!-- done: 2026-03-24 -->
 - **Location:** `storage/adapters/pgvector/pgvector.go` (new)
 - **Criteria:** Implement `VectorStore` using PostgreSQL with pgvector extension. Use `database/sql` with pgx driver. Support cosine similarity search. Include test.

-- [ ] **P3-009** — LanceDB vector store
+- [x] **P3-009** — LanceDB vector store <!-- done: 2026-03-24 -->
 - **Location:** `storage/adapters/lancedb/lancedb.go` (new)
 - **Criteria:** Implement `VectorStore` using LanceDB Go client (or REST API). Embedded/serverless vector DB. Include test.

 ### P3-C: Additional Embeddings Providers

-- [ ] **P3-010** — Cohere embeddings provider
+- [x] **P3-010** — Cohere embeddings provider <!-- done: 2026-03-24 -->
 - **Location:** `engine/model/cohere_embeddings.go` (new file)
 - **Criteria:** Implement `EmbeddingsProvider` using Cohere Embed API. Constructor takes API key and model name.

-- [ ] **P3-011** — Azure OpenAI embeddings provider
+- [x] **P3-011** — Azure OpenAI embeddings provider <!-- done: 2026-03-24 -->
 - **Location:** `engine/model/azure_embeddings.go` (new file)
 - **Criteria:** Implement `EmbeddingsProvider` using Azure OpenAI Embeddings API. Constructor takes endpoint, API key, deployment name.

-- [ ] **P3-012** — Google embeddings provider
+- [x] **P3-012** — Google embeddings provider <!-- done: 2026-03-24 -->
 - **Location:** `engine/model/google_embeddings.go` (new file)
 - **Criteria:** Implement `EmbeddingsProvider` using Google textembedding-gecko model. Constructor takes API key or service account.

 ### P3-D: Interface Integrations

-- [ ] **P3-013** — Slack bot interface
+- [x] **P3-013** — Slack bot interface
 - **Location:** `os/interfaces/slack/slack.go` (new package)
 - **Criteria:** Receive messages from Slack (via Events API or Socket Mode), route to configured agent, post response back to channel. Support threads, mentions, and DMs. Configurable bot token.

-- [ ] **P3-014** — Discord bot interface
+- [x] **P3-014** — Discord bot interface
 - **Location:** `os/interfaces/discord/discord.go` (new package)
 - **Criteria:** Discord bot that listens for messages, routes to agent, responds. Support slash commands and message replies. Configurable bot token.

-- [ ] **P3-015** — Telegram bot interface
+- [x] **P3-015** — Telegram bot interface
 - **Location:** `os/interfaces/telegram/telegram.go` (new package)
 - **Criteria:** Telegram bot using long polling or webhooks. Route messages to agent, send responses. Support inline keyboards for HITL confirmations.

@@ -465,15 +465,15 @@

 ### P3-E: Advanced Multi-Agent Patterns

-- [ ] **P3-017** — Swarm pattern (peer-to-peer handoff)
+- [x] **P3-017** — Swarm pattern (peer-to-peer handoff)
 - **Location:** `sdk/team/swarm.go` (new file)
 - **Criteria:** Agents can hand off directly to other agents without a central coordinator. `Handoff(targetAgent, taskDescription)` tool. Any agent can interact with the user. The active agent changes on handoff.

-- [ ] **P3-018** — Hierarchical multi-level supervisors
+- [x] **P3-018** — Hierarchical multi-level supervisors
 - **Location:** `sdk/team/hierarchy.go` (new file)
 - **Criteria:** A supervisor team can contain other supervisor teams as members, creating a tree structure. Top-level supervisor delegates to mid-level supervisors, which delegate to worker agents.

-- [ ] **P3-019** — A2A protocol (agent-to-agent interop)
+- [x] **P3-019** — A2A protocol (agent-to-agent interop)
 - **Location:** `sdk/protocol/a2a/` (new package)
 - **Criteria:** Implement the A2A protocol for cross-framework agent communication. `A2AServer` exposes an agent as an A2A endpoint. `A2AClient` connects to external A2A agents. Support task creation, status polling, and streaming.

@@ -487,17 +487,17 @@
 - **Location:** `engine/tool/builtins/reasoning.go` (new file)
 - **Criteria:** `think(thought string)` tool that allows the model to perform explicit reasoning steps. The thought is recorded in context but not shown to the user. Useful for complex multi-step analysis.

-- [ ] **P3-022** — Separate reasoning model (two-model architecture)
+- [x] **P3-022** — Separate reasoning model (two-model architecture)
 - **Location:** `sdk/agent/agent.go`
 - **Criteria:** `Agent.ReasoningModel Provider` field. When set, reasoning steps use a more capable (but slower) model, while final responses use the primary model. Configurable which steps use which model.

 ### P3-G: Sandbox Enhancements

-- [ ] **P3-023** — Container pooling (pre-warmed containers)
+- [x] **P3-023** — Container pooling (pre-warmed containers)
 - **Location:** `sandbox/pool.go` (new file)
 - **Criteria:** `ContainerPool` maintains N pre-warmed containers. `Acquire()` returns a ready container instantly. `Release()` returns it to the pool. Configurable pool size, max idle time. Reduces cold-start latency.

-- [ ] **P3-024** — Pluggable sandbox backends
+- [x] **P3-024** — Pluggable sandbox backends
 - **Location:** `sandbox/sandbox.go`
 - **Criteria:** `Sandbox` interface implemented by: `ProcessSandbox` (existing), `ContainerSandbox` (existing), `WASMSandbox` (new, using Wazero), `K8sJobSandbox` (new, using Kubernetes Jobs). Factory function selects backend by config string.

@@ -507,13 +507,13 @@
 - **Location:** `cli/cmd/root.go`
 - **Criteria:** `chronos run -n "task description"` runs the agent non-interactively. Reads from stdin if piped. Outputs to stdout. Exit code 0 on success, 1 on failure. Suitable for scripting.

-- [ ] **P3-026** — CLI monitor TUI
+- [x] **P3-026** — CLI monitor TUI
 - **Location:** `cli/cmd/monitor.go` (new file)
 - **Criteria:** Live terminal UI showing: active sessions (count + list), recent tool calls, token usage, model latency, error rate. Refreshes periodically. Uses a Go TUI library (e.g., `bubbletea`).

 ### P3-I: Production Hardening

-- [ ] **P3-027** — Database migration framework
+- [x] **P3-027** — Database migration framework
 - **Location:** `storage/migrate/migrate.go` (new package)
 - **Criteria:** Versioned migrations for SQL backends (SQLite, Postgres). Migration files in `storage/migrate/migrations/`. `Migrate(ctx, db)` applies pending migrations. `Status(ctx, db)` shows current version. `Rollback(ctx, db)` reverts last migration. Track applied migrations in a `_migrations` table.

@@ -579,3 +579,4 @@ P3 (expansion) ◄─────── depends on: P2 substantially complete
 | 2026-03-23 | P0-003 | cursor-agent | RetryHook now performs actual retries by re-invoking the model provider. Supports SleepFn injection for testing. Falls back to metadata-only signaling for backward compatibility when provider/request not in metadata. 12 test cases added. |
 | 2026-03-23 | P0-004 | cursor-agent | NumHistoryRuns now loads past sessions from storage and injects user/assistant messages into context. Filters out system messages. Works gracefully when storage is nil. 5 test cases added. |
 | 2026-03-23 | P0-005 | cursor-agent | OutputSchema now passes full JSON Schema via Metadata["json_schema"] with ResponseFormat "json_schema". Added validateAgainstSchema for required fields and type checking. Applied to both Chat and ChatWithSession. 13+ test cases added. |
+| 2026-03-24 | P2-014 | claude-agent | Added `Audio []AudioContent` field to `Message` in provider.go. Created `engine/model/openai_audio.go` with `OpenAIAudio` implementing `AudioProvider` interface: `Transcribe` (Whisper via multipart/form-data to `/v1/audio/transcriptions`) and `Synthesize` (TTS via `/v1/audio/speech`). No external dependencies. |

autoresearch/results.tsv

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+commit score tests_pass tests_total coverage status description
+8dd9398 0.273 19 19 34.0 keep baseline
+06b676a 0.258 19 19 34.8 keep P0 complete (all 16/16), add agent Execute/Run/Builder tests
+d7c5ca1 0.254 20 20 34.3 keep P1-001/002 MCP client + agent integration (P1 28/28 complete)
+10630a4 0.240 21 21 35.3 keep P2-007/018/019/025/026/029 sleep tool, viz, PII/injection guardrails, max iterations
+3eb6eea 0.217 22 22 37.2 keep P2 batch: toolkit, debug, dynamic-instructions, few-shot, shell, HTTP, text-loader, multimodal
+b63751f 0.210 22 22 38.5 keep P2 batch: file tools, CSV/JSON loaders, chunking strategies
+f035c16 0.198 23 23 38.6 keep P3 batch: model-as-string, webhook, handoff, CoT, pipe CLI
+f035c16 0.198 23 23 38.6 keep baseline
+facb1bd 0.197 23 23 39.6 keep P2-004/005/009/011: web search, SQL, PDF loader, web loader
+4178da5 0.189 23 23 40.0 keep P2-016/017: entrypoint + task registration
+8f3a9ce 0.183 24 24 41.2 keep P2-020/022: OTel + Prometheus metrics
+97a85ef 0.178 25 25 41.5 keep P2-023/024: cron scheduler + API
+ab18ad7 0.175 25 25 40.3 keep P3-001/002/003/004/005/010/011/012: providers + embeddings
+adfac03 0.160 25 25 39.2 keep P3-007/008/009: ChromaDB, PgVector, LanceDB
+8e2dc93 0.135 26 37.2 101/104 P3 bot interfaces, swarm/hierarchy teams, A2A, sandbox pool, migrations KEPT
+49ede88 0.132 26 36.1 103/104 P2-014 audio + P3-026 CLI monitor TUI, all roadmap items done KEPT
+6450977 0.125 30 39.1 103/104 Add 84 tests across 8 packages KEPT
+2e7ba97 0.114 37 43.1 103/104 Add tests for 7 more packages KEPT
+52d8bdb 0.098 48 47.8 103/104 Add tests for 11 storage adapters and skills KEPT
+bc4f4c4 0.092 48 54.5 103/104 Comprehensive tests for providers, graph, registry, scheduler, memory, teams KEPT
+2c9756e 0.076 48 70.5 103/104 Boost coverage to 70.5% with comprehensive tests KEPT
+3a69291 0.068 48 48 78.0 keep Add MCP callLocked/RegisterTools + sandbox edge case tests
+78a7fd2 0.067 48 48 79.3 keep Add websearch, discord, slack, team hierarchy/swarm tests
+05cb894 0.066 48 48 79.7 keep Add 49 tests across migrate, a2a, agent, server, tool, stream
+bbb276b 0.066 48/48 80.2 KEPT iter4: 63 tests across 22 files — guardrails hooks model stream knowledge memory protocol sandbox
+bd71ec7 0.065 48/48 80.7 KEPT iter5: 23 targeted tests + fix hanging MCP test — team/protocol/agent/mcp
+3effd28 0.065 48/48 81.5 KEPT iter6: 38 tests — sandbox container mocks, storage adapter errors, repl
+a8de364 0.064 48/48 81.9 KEPT iter7: 32 tests — agent branches/schema/config, MCP connect, graph subgraph, protocol bus, CLI cmd
+0489817 0.063 48/48 82.6 KEPT iter8: 48 tests — redis/postgres/mongo/sqlite/migrate adapters, swarm, server, repl, cli, graph
+1a073af 0.063 48/48 82.9 KEPT iter9: 39 tests — cli monitor, redisvector, model http, webhook, a2a, cache, server, builtins
+3183703 0.062 48/48 84.4 KEPT iter10: 60 tests — cli/cmd 76→92%, model, telegram, slack, swarm, migrate
+a6eef16 0.061 48/48 84.6 KEPT iter11: 55 tests — postgres/team/openai/mcp/agent/telegram/slack/migrate/sql/calc/stream
+dfb236d 0.061 48/48 84.7 KEPT iter12: 31 tests — ratelimit, evals, migrate, sqlite, loaders, protocol, team, memory (ceiling)
+92c74d1 0.048 61/61 84.7 KEPT iter13: +13 test packages (cli + 12 examples) — score drops 0.061→0.048
