Draft
59 commits
7384b54
docs(supervisor): add Autopilot Supervisor v2 design (spec)
cursoragent Apr 30, 2026
69d2533
docs(supervisor): add Autopilot Supervisor v2 implementation plan (M0…
cursoragent Apr 30, 2026
e854828
supervisor(M0): add empty module skeleton
cursoragent Apr 30, 2026
cf81c8f
supervisor(M0): add SupervisorConfig with defaults
cursoragent Apr 30, 2026
8e33f15
supervisor(M0): add sup_* tables to memory migrations
cursoragent Apr 30, 2026
d5f4018
supervisor(M0): suppress dead_code on SupervisorConfig until M1
cursoragent Apr 30, 2026
9442695
supervisor(M1): Task, TaskType, RiskLevel, ExecutionMode, TaskStatus
cursoragent Apr 30, 2026
06ff730
supervisor(M1): Job, JobType, JobStatus, JobOutput contract
cursoragent Apr 30, 2026
e76c6ed
supervisor(M1): explicit state transition table
cursoragent Apr 30, 2026
fafba25
supervisor(M1): TaskStore CRUD + transition audit log
cursoragent Apr 30, 2026
e97c26b
supervisor(M1): IntakeRouter::normalize
cursoragent Apr 30, 2026
3151476
supervisor(M1): HeuristicClassifier (no LLM dependency)
cursoragent Apr 30, 2026
84ec1ae
supervisor(M1): LlmBackedClassifier scaffold (heuristic in M1, LLM pa…
cursoragent Apr 30, 2026
37e7558
supervisor(M1): PolicyEngine deterministic decision table
cursoragent Apr 30, 2026
62371a3
supervisor(M1): ArtifactManager (filesystem + sup_artifacts index)
cursoragent Apr 30, 2026
3234338
supervisor(M1): Supervisor::submit end-to-end (intake→classify→policy…
cursoragent Apr 30, 2026
78b16f4
supervisor(M1): replace unwrap with FromSqlConversionFailure for enum…
cursoragent Apr 30, 2026
b686b20
supervisor(M1): use PolicyEngine unit struct directly in tests (review)
cursoragent Apr 30, 2026
a8e7a24
chore: fix pre-existing clippy test warnings (useless_vec, unused imp…
cursoragent Apr 30, 2026
0d081d6
supervisor(M2): Backend trait + capability-based Registry
cursoragent Apr 30, 2026
1f4eb20
supervisor(M2): ReasoningBackend wrapping existing Agent
cursoragent Apr 30, 2026
8d9153b
supervisor(M2): ShellBackend with sandbox validation
cursoragent Apr 30, 2026
6f93a92
supervisor(M2): McpBackend delegating to McpManager
cursoragent Apr 30, 2026
e7f83ca
supervisor(M2): ClaudeCodeCliBackend, CodexCliBackend, ScriptBackend
cursoragent Apr 30, 2026
ce92fc6
supervisor(M2): cargo fmt
cursoragent Apr 30, 2026
8c23e0a
supervisor(M2): enforce job timeout in CLI backends with kill_on_drop…
cursoragent Apr 30, 2026
038a512
supervisor(M2): document ShellBackend sandbox-validation limitation (…
cursoragent Apr 30, 2026
780189b
supervisor(M3): WorkflowTemplate (Fast/Standard/Rigorous stages)
cursoragent Apr 30, 2026
77f4e32
supervisor(M3): Planner producing 1- and 3-job plans
cursoragent Apr 30, 2026
e40205a
supervisor(M3): TaskStore::create_job / jobs_for_task / update_job_st…
cursoragent Apr 30, 2026
3ee3e72
supervisor(M3): Orchestrator sequential single-backend execution
cursoragent Apr 30, 2026
b104e39
supervisor(M3): VerificationEngine evidence gate
cursoragent Apr 30, 2026
d865e02
supervisor(M3): Reporter human-readable summary
cursoragent Apr 30, 2026
9954228
supervisor(M3): Supervisor::execute_now fast-mode end-to-end
cursoragent Apr 30, 2026
f2363be
supervisor(M3): wire Supervisor into Telegram /supervise command (par…
cursoragent Apr 30, 2026
0890b8d
supervisor(M3): cargo fmt
cursoragent Apr 30, 2026
0c8f1e4
supervisor(M3): satisfy clippy if_same_then_else and unused_imports (…
cursoragent Apr 30, 2026
c7da175
supervisor(M4): WorkspaceManager (branch + optional worktree)
cursoragent Apr 30, 2026
7d82290
supervisor(M4): insert PREPARE_WORKSPACE stage for code tasks
cursoragent Apr 30, 2026
7d2ad98
supervisor(M5): skills can hint workflow + required capabilities
cursoragent Apr 30, 2026
ac5ca34
supervisor(M5): bundle five default workflow skill packs
cursoragent Apr 30, 2026
12471c7
supervisor(M5): SkillAwareClassifier consults skill hints
cursoragent Apr 30, 2026
ff5717f
supervisor(M6): parallel job groups in Plan + Orchestrator
cursoragent Apr 30, 2026
6a1a09c
supervisor(M6): fallback backends per capability
cursoragent Apr 30, 2026
c8518a3
supervisor(M6): subjob spawning via RunContext
cursoragent Apr 30, 2026
8d0cdee
supervisor(M7): risk-threshold-driven autonomy gate
cursoragent Apr 30, 2026
5a784c0
supervisor(M7): pause/resume + resumable task discovery on startup
cursoragent Apr 30, 2026
1b41ea8
supervisor(M7): /tasks /resume /cancel /approve /clarify Telegram com…
cursoragent Apr 30, 2026
487e09a
supervisor(M7): secret-redaction filter on artifacts and logs
cursoragent Apr 30, 2026
58cd2d9
supervisor(M7): wire RiskThresholdsConfig from config.toml into produ…
cursoragent Apr 30, 2026
a43705c
supervisor(M7): add end-to-end resume test (review)
cursoragent Apr 30, 2026
a887f5d
supervisor: DoD smoke test (intake→classify→policy→plan→result for ev…
cursoragent Apr 30, 2026
0f3c950
supervisor: document v2 supervisor architecture in CLAUDE.md
cursoragent Apr 30, 2026
e486150
supervisor: record Execute->Review->Verify for Rigorous mode (final r…
cursoragent Apr 30, 2026
1fd432d
supervisor: fix parallel group iteration to not skip non-grouped jobs…
cursoragent Apr 30, 2026
f58807d
supervisor: register ReasoningBackend + ShellBackend in production re…
cursoragent Apr 30, 2026
c0e1a30
fix(config): update comments for local ollama base URL in config.exam…
chinkan Apr 30, 2026
f81df5b
fix: address all C/S/D/A review items from PR feedback
Copilot May 3, 2026
83bfb42
Merge remote-tracking branch 'origin/main' into cursor/autopilot-supe…
chinkan May 3, 2026
154 changes: 154 additions & 0 deletions CLAUDE.md
@@ -189,3 +189,157 @@ All skills are represented in the system prompt by **metadata only** (name + des
- `config.toml` - Contains API keys and tokens
- `.env` - Environment variables
- `/target/` - Build artifacts

## Supervisor (Autopilot v2)

The supervisor is a generic autonomous task runner that lives alongside the
existing chat agent. It accepts a free-form request, classifies it, picks a
plan, dispatches work to one or more **backends** (reasoning, shell, MCP,
Claude Code CLI, Codex CLI, scripts), verifies the result, and persists
artifacts + audit transitions to SQLite.

### Module tree (`src/supervisor/`)

```
src/supervisor/
  mod.rs            - Supervisor facade: submit / execute_now / pause / resume / state / artifacts
  task.rs           - Task, TaskType, RiskLevel, ExecutionMode, TaskStatus enums
  job.rs            - Job, JobType, JobStatus, JobOutput, Evidence
  state.rs          - transition_allowed(): single source of truth for the state machine
  store.rs          - TaskStore: CRUD over sup_tasks / sup_jobs / sup_transitions
  intake.rs         - IntakeRouter::normalize() → Task from raw text
  classifier.rs     - Classifier trait + HeuristicClassifier / LlmBackedClassifier / SkillAwareClassifier
  policy.rs         - PolicyEngine: AutoExecute | Clarify | RequireApproval | UseFallbackBackend | StopAndReport
  planner.rs        - Planner: Task → Plan { jobs, parallel_groups }
  workflow.rs       - Fast / Standard / Rigorous workflow stage templates
  orchestrator.rs   - Orchestrator: executes Plan with fallback + parallel groups + subjob spawning
  verification.rs   - VerificationEngine: ≥1 evidence per job gate
  artifact.rs       - ArtifactManager: write_text() (redacts) + list()
  workspace.rs      - WorkspaceManager: per-task git branch / optional worktree
  reporter.rs       - Human-readable per-job summary
  redact.rs         - Secret scrubber for api_key / password / secret / token / bearer values
  backend/
    mod.rs          - Backend trait + BackendCapabilities + Registry + RunContext
    reasoning.rs    - Wraps the chat Agent
    shell.rs        - Sandboxed shell commands
    mcp.rs          - Calls tools on a connected MCP server
    claude_code.rs  - Spawns the `claude` CLI as a backend
    codex.rs        - Spawns the `codex` CLI as a backend
    script.rs       - Runs a script file from the sandbox
```
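The `verification.rs` entry describes a gate that requires at least one piece of evidence per job. A minimal sketch of such a gate, assuming hypothetical `Evidence` and `CheckedJob` types (the real `Evidence` and `Job` in `job.rs` carry more fields and may be shaped differently):

```rust
/// Hypothetical stand-in for the supervisor's `Evidence` type.
#[derive(Debug, Clone)]
pub struct Evidence {
    pub description: String,
}

/// Hypothetical, trimmed-down job record: an id plus the evidence it collected.
#[derive(Debug)]
pub struct CheckedJob {
    pub id: u64,
    pub evidence: Vec<Evidence>,
}

/// Returns Ok(()) when every job has at least one piece of evidence;
/// otherwise returns the ids of the jobs that failed the gate.
pub fn verify_evidence(jobs: &[CheckedJob]) -> Result<(), Vec<u64>> {
    let missing: Vec<u64> = jobs
        .iter()
        .filter(|job| job.evidence.is_empty())
        .map(|job| job.id)
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(missing)
    }
}
```

Returning the failing job ids instead of a bare boolean is one way to let a reporter explain why verification failed.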

### Lifecycle

```
INTAKE → CLASSIFY → ROUTE
ROUTE → (CLARIFY) | (PREPARE_WORKSPACE)? → PLAN → EXECUTE
EXECUTE ⇄ Paused
EXECUTE → (REVIEW)? → VERIFY          REVIEW runs in rigorous mode
VERIFY → REPORT → ARCHIVE → DONE
↘ Failed   ↘ Cancelled
```

`state.rs::transition_allowed(from, to)` enumerates every legal edge. Add the
matching arms there before introducing a new state or edge; the rest of the
supervisor treats any transition not listed there as a bug.
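A minimal sketch of the explicit-edge style, assuming a hypothetical `State` enum whose variants mirror the lifecycle diagram above. The real names and edges live in `state.rs` and may differ; the edges marked as assumptions in the comments are not stated anywhere in this document.

```rust
/// Hypothetical mirror of the lifecycle states; the real enum lives in the supervisor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Intake, Classify, Route, Clarify, PrepareWorkspace, Plan, Execute,
    Paused, Review, Verify, Report, Archive, Done, Failed, Cancelled,
}

/// Every legal edge is spelled out; anything that falls through to `_` is rejected.
pub fn transition_allowed(from: State, to: State) -> bool {
    use State::*;
    match (from, to) {
        (Intake, Classify) | (Classify, Route) => true,
        // PREPARE_WORKSPACE is optional, so ROUTE may go straight to PLAN.
        (Route, Clarify) | (Route, PrepareWorkspace) | (Route, Plan) => true,
        // Assumption: an answered clarification re-enters routing.
        (Clarify, Route) => true,
        (PrepareWorkspace, Plan) | (Plan, Execute) => true,
        (Execute, Paused) | (Paused, Execute) => true,
        // REVIEW is the rigorous-mode stage; assumed to be skipped in other modes.
        (Execute, Review) | (Review, Verify) | (Execute, Verify) => true,
        (Verify, Report) | (Report, Archive) | (Archive, Done) => true,
        // Assumption: any non-terminal state may fail or be cancelled.
        (from, Failed) | (from, Cancelled) => !matches!(from, Done | Failed | Cancelled),
        _ => false,
    }
}
```

Keeping the table in one `match` means a new state fails closed: until someone adds its arms, every transition into or out of it returns `false`.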

### Backend trait + adding a new backend

Every backend implements the `Backend` trait from `src/supervisor/backend/mod.rs`.
The trait already provides default implementations of the spec §10 hooks
(`prepare`, `collect_result`, `verify_result`, `cancel`, `resume`), so most
backends only implement `name`, `capabilities`, `can_handle`, and `run`.
Register an `Arc<MyBackend>` with the `Registry` at startup.

```rust
use std::sync::Arc;
use rustfox::supervisor::backend::{Backend, BackendCapabilities, Registry, RunContext};
use rustfox::supervisor::job::{Job, JobOutput, JobType};

struct EchoBackend;

#[async_trait::async_trait]
impl Backend for EchoBackend {
    fn name(&self) -> &str { "echo" }
    // Advertise only the reasoning capability; every other flag keeps its default.
    fn capabilities(&self) -> BackendCapabilities {
        BackendCapabilities { reasoning: true, ..Default::default() }
    }
    fn can_handle(&self, _job_type: &JobType) -> bool { true }
    async fn run(&self, _job: &mut Job, _ctx: &RunContext) -> anyhow::Result<JobOutput> {
        // Do the work here and return a JobOutput (see job.rs for its fields).
        todo!()
    }
}

// Registration happens at startup inside a function (a `let` cannot sit at module level).
fn build_registry() -> Registry {
    let mut reg = Registry::new();
    reg.register(Arc::new(EchoBackend));
    reg
}
```

### Adding a workflow skill pack

Drop a `skills/sup-<name>/SKILL.md` with frontmatter:

```yaml
---
name: sup-<name>
description: One-line summary
supervisor:
  workflow: research            # one of: research | writing | refactor | ops | review
  required_capabilities: [research, reasoning]
---
```

Skill packs are auto-loaded by the existing `SkillRegistry` at startup. The
`SkillAwareClassifier` consults them and overrides the default
`required_capabilities` when a keyword in the request matches a skill's name
(the `sup-` prefix is stripped before matching).
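A minimal sketch of that override, assuming a hypothetical `SkillHint` struct holding a skill's name and its `supervisor.required_capabilities`, and assuming whole-word, case-insensitive matching (the real `SkillAwareClassifier` may tokenize or match differently):

```rust
/// Hypothetical, trimmed-down view of a loaded `sup-*` skill pack.
pub struct SkillHint {
    pub name: String,                       // e.g. "sup-research"
    pub required_capabilities: Vec<String>, // from the `supervisor:` frontmatter block
}

/// Returns the capabilities of the first skill whose name (minus the `sup-`
/// prefix) appears as a whole word in the request, or `defaults` otherwise.
pub fn capabilities_for(request: &str, skills: &[SkillHint], defaults: &[String]) -> Vec<String> {
    // Split on anything that is not part of a word; keep '-' and '_' inside words.
    let words: Vec<String> = request
        .split(|c: char| !c.is_alphanumeric() && c != '-' && c != '_')
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect();
    for skill in skills {
        let keyword = skill
            .name
            .strip_prefix("sup-")
            .unwrap_or(skill.name.as_str())
            .to_lowercase();
        if words.iter().any(|w| *w == keyword) {
            return skill.required_capabilities.clone();
        }
    }
    defaults.to_vec()
}
```

First match wins here, so if two packs could match the same request, their order in the slice decides which capabilities apply.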

### TOML config keys

```toml
[supervisor]
default_autonomy_mode = "standard" # "fast" | "standard" | "rigorous"
artifacts_dir = "supervisor/artifacts"

[supervisor.risk]
require_approval_for_low = false
require_approval_for_medium = false
auto_execute_only_low = false # when true, Medium escalates to RequireApproval
```

The defaults (all `false`) preserve the M1–M6 behavior, in which Medium-risk
tasks auto-execute. Flip individual fields to tighten the gate.
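A minimal sketch of how the three `[supervisor.risk]` flags could drive the gate. `RiskLevel`, `Gate`, and `RiskThresholds` here are hypothetical stand-ins, and the High-risk branch is an assumption because the keys above only cover Low and Medium:

```rust
/// Hypothetical mirror of the supervisor's risk levels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskLevel { Low, Medium, High }

/// The two PolicyEngine outcomes this gate chooses between.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gate { AutoExecute, RequireApproval }

/// Mirrors the `[supervisor.risk]` keys; `Default` yields the all-false defaults.
#[derive(Debug, Clone, Copy, Default)]
pub struct RiskThresholds {
    pub require_approval_for_low: bool,
    pub require_approval_for_medium: bool,
    pub auto_execute_only_low: bool,
}

pub fn gate(risk: RiskLevel, cfg: &RiskThresholds) -> Gate {
    let needs_approval = match risk {
        RiskLevel::Low => cfg.require_approval_for_low,
        // `auto_execute_only_low` escalates Medium just like the explicit flag.
        RiskLevel::Medium => cfg.require_approval_for_medium || cfg.auto_execute_only_low,
        // Assumption: High risk always needs approval; the config keys do not cover it.
        RiskLevel::High => true,
    };
    if needs_approval { Gate::RequireApproval } else { Gate::AutoExecute }
}
```

Deriving `Default` keeps the all-`false` configuration equivalent to the pre-M7 behavior, so an absent `[supervisor.risk]` table changes nothing.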

### Bot commands

| Command | Behaviour |
|---------|-----------|
| `/supervise <text>` | Submit a new supervisor task |
| `/tasks` | List active / recent tasks |
| `/resume <id>` | Resume a paused task |
| `/cancel <id>` | Cancel a task |
| `/approve <id>` | Approve a task that hit `RequireApproval` |
| `/clarify <id> <text>` | Reply to a `Clarify` prompt |

The command **parser** is wired up, and `main.rs` logs a line about it at
startup. Routing these commands to supervisor handlers in the live Telegram
dispatcher is only a minimum-viable integration (M3.8 / M7.3); the full handler
surface is a follow-up task.
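For readers extending the dispatcher, here is one possible shape for parsing the commands in the table. `SupervisorCommand` and `parse_command` are hypothetical names; this sketch is not the parser that `main.rs` already wires in:

```rust
/// Hypothetical command enum covering the bot commands table.
#[derive(Debug, PartialEq, Eq)]
pub enum SupervisorCommand {
    Supervise(String),
    Tasks,
    Resume(String),
    Cancel(String),
    Approve(String),
    Clarify { id: String, text: String },
}

/// Parses one message; returns None for anything that is not a supervisor
/// command or that lacks a required argument.
pub fn parse_command(message: &str) -> Option<SupervisorCommand> {
    let message = message.trim();
    // Split "/cmd rest of line" at the first whitespace character.
    let (cmd, rest) = match message.split_once(char::is_whitespace) {
        Some((cmd, rest)) => (cmd, rest.trim()),
        None => (message, ""),
    };
    // Commands that take a task id require a non-empty argument.
    let task_id = || (!rest.is_empty()).then(|| rest.to_string());
    match cmd {
        "/supervise" if !rest.is_empty() => Some(SupervisorCommand::Supervise(rest.to_string())),
        "/tasks" => Some(SupervisorCommand::Tasks),
        "/resume" => task_id().map(SupervisorCommand::Resume),
        "/cancel" => task_id().map(SupervisorCommand::Cancel),
        "/approve" => task_id().map(SupervisorCommand::Approve),
        "/clarify" => {
            // `/clarify <id> <text>`: both parts are required.
            let (id, text) = rest.split_once(char::is_whitespace)?;
            let text = text.trim();
            (!text.is_empty()).then(|| SupervisorCommand::Clarify {
                id: id.to_string(),
                text: text.to_string(),
            })
        }
        _ => None,
    }
}
```

Returning `None` for a missing argument lets the dispatcher reply with usage help instead of submitting an empty task.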

### Artifacts

Per-task artifacts are written to `<supervisor.artifacts_dir>/<task_id>/<filename>`
and indexed in `sup_artifacts` (`kind`, `path`, `sha256`, `bytes`). Every
artifact write goes through `redact::redact()`, which scrubs values that
follow `api_key`, `password`, `secret`, `token`, or `bearer` (case-insensitive)
and replaces them with `***` while preserving the key + separator so the
file stays human-readable. Standard kinds emitted by the pipeline: `intake`,
`classification`, `policy`, `plan`, `workspace` (when workspace prepared),
and `result` (Reporter Markdown summary).

### Database tables added

| Table | Purpose |
|-------|---------|
| `sup_tasks` | One row per submitted task — title, user_request, classification (`task_type` / `risk_level` / `execution_mode`), current `state`, platform / user / chat origin |
| `sup_jobs` | One row per job dispatched within a task — backend, goal, prompt, status, result_summary, error, optional `parent_job_id` for spawned subjobs |
| `sup_transitions` | Append-only audit log of every state change (`from_state`, `to_state`, `actor`, `reason`, `occurred_at`) |
| `sup_artifacts` | Index of files written under `artifacts_dir` (`task_id`, `job_id`, `kind`, `path`, `sha256`, `bytes`) |

All four tables are created idempotently in `MemoryStore` at startup.
13 changes: 13 additions & 0 deletions Cargo.lock


3 changes: 3 additions & 0 deletions Cargo.toml
@@ -63,5 +63,8 @@ rand = "0.8"
sha2 = "0.10"
base64 = "0.22"

# Secret-redaction filter (M7.4)
regex = "1"

[dev-dependencies]
tempfile = "3"
2 changes: 2 additions & 0 deletions config.example.toml
@@ -14,6 +14,8 @@ api_key = "YOUR_OPENROUTER_API_KEY"
model = "moonshotai/kimi-k2.5"
# API base URL (usually no need to change)
base_url = "https://openrouter.ai/api/v1"
# Alternative using local ollama
# base_url = "http://localhost:11434/v1"
# Maximum tokens in response
max_tokens = 4096
# System prompt for the AI assistant