From a1e536854182730262e11a4cd607450a7cd0551f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=B6=85=E6=B8=A1=E6=B3=95=E5=B8=AB?=
 <chaodu-agent@openab.dev>
Date: Wed, 13 May 2026 22:02:59 +0000
Subject: [PATCH 1/6] docs: add goal-driven agent loop design spec

---
 docs/goal-driven-agent-loop.md | 258 +++++++++++++++++++++++++++++++++
 1 file changed, 258 insertions(+)
 create mode 100644 docs/goal-driven-agent-loop.md
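The runner loop state machine specified in this patch can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the project's implementation: the names `GoalState`, `Outcome`, and `step` are hypothetical, `done_check` is assumed to run before the `max_rounds` cap is tested, and the first round is assumed to count as a delta because there is no earlier snapshot to compare against.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    DONE = "done"
    CONTINUE = "continue"
    ESCALATE = "escalate"


@dataclass
class GoalState:
    # Hypothetical in-memory view of one goal's counters.
    round: int = 0
    stuck_counter: int = 0
    last_snapshot: Optional[str] = None


def step(state, done_exit_code, snapshot, max_rounds=10, stuck_threshold=3):
    """Advance one evaluation round and return what the runner should do next."""
    state.round += 1
    if done_exit_code == 0:
        return Outcome.DONE                      # EVAL -> DONE
    if state.round > max_rounds:
        return Outcome.ESCALATE                  # hard stop
    has_delta = snapshot != state.last_snapshot  # COMPARE
    state.last_snapshot = snapshot
    if has_delta:
        state.stuck_counter = 0                  # COMPARE -> CONTINUE
        return Outcome.CONTINUE
    state.stuck_counter += 1                     # COMPARE -> STUCK
    if state.stuck_counter >= stuck_threshold:
        return Outcome.ESCALATE                  # STUCK -> ESCALATE
    return Outcome.CONTINUE                      # STUCK -> CONTINUE (with warning)
```

With the defaults, three consecutive rounds that return the same snapshot after the first one lead to `Outcome.ESCALATE`.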

diff --git a/docs/goal-driven-agent-loop.md b/docs/goal-driven-agent-loop.md
new file mode 100644
index 00000000..859b23cc
--- /dev/null
+++ b/docs/goal-driven-agent-loop.md
@@ -0,0 +1,258 @@
+# Goal-Driven Agent Loop
+
+Design spec for a goal-oriented execution mode where agents work autonomously until a defined objective is achieved.
+
+## Problem
+
+Today, agents respond to individual messages reactively. There is no mechanism to assign a persistent **goal** that agents must work toward across multiple rounds, self-organizing their approach without explicit step-by-step instructions.
+
+## Non-Goals (MVP)
+
+- Multi-agent goal contention / auto-claiming
+- Complex scoring or partial-credit evaluation
+- Long-term memory rewrite between rounds
+- LLM judge involvement on every round
+- UI/dashboard for goal management
+
+## Concept: "Escape Room" Mode
+
+The human sets a goal and a success condition. A CronJob periodically evaluates whether the goal is met. If not, it posts to the channel — agents must **self-organize** to figure out how to achieve it. They are not told what to do, only what the goal is and that it hasn't been met yet.
+
+```
+Human sets goal + eval command
+         │
+         ▼
+┌──► CronJob fires (on interval)
+│         │
+│         ▼
+│    Run done_check command
+│         │
+│    ┌────┴─────┐
+│    │  Pass?   │
+│    └────┬─────┘
+│     No  │  Yes
+│     │   │    │
+│     ▼   │    ▼
+│  Post to channel:    Goal achieved ✅
+│  "Goal not met,      Disable CronJob
+│   keep working"      Notify human
+│         │
+│         ▼
+│  Agents discuss & act
+│  (self-organized)
+│         │
+└─────────┘
+     Next interval
+```
+
+## Goal Schema
+
+```toml
+[[goals]]
+id = "goal-001"
+description = "All unit tests pass on main branch"
+done_check = "cd /repo && npm test"
+progress_check = "cd /repo && git log --oneline -5"
+interval = "10m"
+max_rounds = 10
+stuck_threshold = 3          # rounds without state delta → escalate
+channel = "123456789012345678"
+thread_id = ""               # optional: confine to existing thread
+owner = ""                   # optional: assigned agent UID
+enabled = true
+```
+
+| Field | Required | Default | Description |
+|-------|----------|---------|-------------|
+| `id` | ✅ | — | Unique goal identifier |
+| `description` | ✅ | — | Human-readable goal statement |
+| `done_check` | ✅ | — | Shell command; exit 0 = goal achieved |
+| `progress_check` | | — | Command to capture state snapshot for delta detection |
+| `interval` | | `"10m"` | Evaluation interval (e.g. `5m`, `1h`) |
+| `max_rounds` | | `10` | Hard cap on evaluation rounds |
+| `stuck_threshold` | | `3` | Consecutive rounds without state delta before escalation |
+| `channel` | ✅ | — | Target channel for agent communication |
+| `thread_id` | | — | Confine discussion to a specific thread |
+| `owner` | | — | Agent UID responsible for execution |
+| `enabled` | | `true` | Toggle without removing config |
+
+## Runner Loop State Machine
+
+```
+         ┌─────────┐
+         │  IDLE   │ ◄── goal created, waiting for first interval
+         └────┬────┘
+              │ interval fires
+              ▼
+         ┌─────────┐
+         │  EVAL   │ ◄── run done_check
+         └────┬────┘
+              │
+       ┌──────┴──────┐
+       │             │
+   exit 0        exit != 0
+       │             │
+       ▼             ▼
+  ┌────────┐   ┌──────────┐
+  │  DONE  │   │ COMPARE  │ ◄── compute state delta
+  └────────┘   └────┬─────┘
+                    │
+             ┌──────┴──────┐
+             │             │
+        has delta      no delta
+             │             │
+             ▼             ▼
+       ┌──────────┐  ┌──────────┐
+       │ CONTINUE │  │  STUCK   │ ◄── increment stuck_counter
+       └──────────┘  └────┬─────┘
+                          │
+                   stuck_counter >= threshold?
+                     │            │
+                    Yes           No
+                     │            │
+                     ▼            ▼
+               ┌───────────┐  ┌──────────┐
+               │ ESCALATE  │  │ CONTINUE │
+               └───────────┘  └──────────┘
+```
+
+### State Transitions
+
+| From | Event | To | Action |
+|------|-------|----|--------|
+| IDLE | interval fires | EVAL | Run `done_check` |
+| EVAL | exit 0 | DONE | Notify channel ✅, disable goal |
+| EVAL | exit != 0 | COMPARE | Run `progress_check`, compute delta |
+| COMPARE | has delta | CONTINUE | Reset stuck_counter, post round message |
+| COMPARE | no delta | STUCK | Increment stuck_counter |
+| STUCK | counter < threshold | CONTINUE | Post round message with warning |
+| STUCK | counter >= threshold | ESCALATE | Notify human, pause goal |
+| Any | round > max_rounds | ESCALATE | Hard stop, notify human |
+
+## State Snapshot & Delta Detection
+
+Each round captures a **state snapshot** via `progress_check`. Delta is computed by comparing current snapshot to previous round's snapshot.
+
+### Supported Delta Signals (MVP)
+
+| Signal | How to detect |
+|--------|---------------|
+| New commits | `git log --oneline` diff |
+| File changes | `git diff --stat` |
+| Test result change | Test output diff (pass/fail count) |
+| PR/Issue status | `gh pr view` / `gh issue view` |
+| Artifact existence | `ls` / `stat` on expected path |
+
+If no `progress_check` is defined, delta detection falls back to comparing `done_check` stdout/stderr between rounds.
+
+## Round Message Format
+
+Posted to channel/thread each round when goal is not yet achieved:
+
+```
+🔐 Goal: All unit tests pass on main branch
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Round: 4 / 10
+Status: ❌ Not achieved
+Eval output:
+  FAIL src/auth.test.ts — TypeError: undefined is not a function
+  Tests: 12 passed, 1 failed
+Progress: ✅ Delta detected (new commit abc1234)
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Agents, keep working on it.
+```
+
+When stuck (no delta):
+
+```
+🔐 Goal: All unit tests pass on main branch
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Round: 7 / 10
+Status: ❌ Not achieved
+Eval output:
+  FAIL src/auth.test.ts — TypeError: undefined is not a function
+Progress: ⚠️ No state delta (2 / 3 rounds until escalation)
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Agents, keep working on it.
+```
+
+## Escalation Payload
+
+When stuck_threshold is reached or max_rounds exceeded:
+
+```
+⚠️ Goal Stuck — Escalating
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Goal: All unit tests pass on main branch
+Last successful delta: Round 5 — fixed auth.test.ts (commit abc1234)
+Blocked reason: No state change for 3 consecutive rounds
+Current eval output:
+  FAIL src/auth.test.ts — TypeError: undefined is not a function
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Owner decision needed:
+1️⃣ Give a hint so the agents can continue
+2️⃣ Owner fixes it, then the agents verify
+3️⃣ Adjust the goal or eval command
+4️⃣ Abandon this goal
+```
+
+## Done Confirmation (Optional LLM Judge)
+
+When `done_check` passes (exit 0), an optional LLM judge can confirm intent alignment:
+
+```
+done_check passes
+       │
+       ▼
+  LLM Judge: "Does the current state satisfy the goal description?"
+       │
+  ┌────┴────┐
+  │         │
+confirm   reject + reason
+  │         │
+  ▼         ▼
+DONE     CONTINUE (post rejection reason to channel)
+```
+
+This is a **tie-breaker only** — not involved in every round. Only fires after Layer 1 (deterministic check) passes.
+
+## Integration with Existing CronJob
+
+This feature extends the existing `[[cron.jobs]]` system. Implementation options:
+
+1. **New config section** `[[goals]]` — separate from `[[cron.jobs]]`, dedicated runner logic
+2. **Extension of cron** — add `goal_mode = true` fields to existing cron entries
+
+Recommended: **Option 1** — separate section. Goal semantics (state tracking, delta detection, escalation) are fundamentally different from simple scheduled messages.
+
+## MVP Test Scenario
+
+**Setup:**
+1. A repo with one failing test
+2. Goal: `done_check = "npm test"` with exit 0 = success
+3. Agent has write access to the repo
+
+**Expected behavior:**
+1. CronJob fires → runs `npm test` → fails → posts round message
+2. Agents discuss in thread, identify the bug, push a fix
+3. Next CronJob fires → runs `npm test` → passes → posts ✅ Done
+4. Goal disabled
+
+**Stuck scenario:**
+1. Agents cannot figure out the fix
+2. 3 consecutive rounds with no new commits
+3. Escalation message posted, goal paused
+
+## Open Questions
+
+1. **Persistence** — Where is goal state stored between rounds? In-memory (lost on restart) or persisted (DB/file)?
+2. **Thread vs channel** — Should each goal auto-create a dedicated thread, or reuse an existing one?
+3. **Multi-agent coordination** — In escape room mode, how do agents avoid conflicting actions? First-come-first-serve? Or coordinator (超渡) assigns sub-tasks?
+4. **Goal lifecycle commands** — How does the human create/pause/cancel goals? Slash commands? Config file reload?
+5. **Observability** — How to surface goal progress history (rounds, deltas, escalations)?
+6. **Security** — `done_check` runs arbitrary shell commands. Sandboxing? Allowed command whitelist?
+
+## References
+
+- [Existing CronJob docs](./cronjob.md)
+- [Discord thread for this design discussion](https://discord.com/channels/1491295927620169908/1504239931940409587)

From ef5e803be593db77f5d8adc40b60f0b8e97d7bc9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=B6=85=E6=B8=A1=E6=B3=95=E5=B8=AB?=
 <chaodu-agent@openab.dev>
Date: Wed, 13 May 2026 22:04:23 +0000
Subject: [PATCH 2/6] =?UTF-8?q?docs:=20address=20review=20findings=20?=
 =?UTF-8?q?=E2=80=94=20security,=20persistence,=20escalation=20recovery,?=
 =?UTF-8?q?=20context=20overflow?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/goal-driven-agent-loop.md | 60 ++++++++++++++++++++++++++++++----
 1 file changed, 54 insertions(+), 6 deletions(-)
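The persistence and escalation recovery rules added in this patch could look roughly like the sketch below. It assumes a single JSON file keyed by goal id; the helper names (`load_states`, `save_states`, `apply_recovery`) and the action labels (`hint`, `human_fix`, `adjust_goal`, `abandon`) are hypothetical stand-ins for options 1️⃣ to 4️⃣. How a paused goal returns to `active` is not defined in the spec and is left out here.

```python
import json
import os
import tempfile


def load_states(path):
    """Return the persisted goal states, or an empty dict if no file exists yet."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def save_states(path, states):
    """Write to a temp file, then rename, so a crash cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(states, fh, indent=2)
        os.replace(tmp_path, path)  # atomic replacement on the same filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


def apply_recovery(state, action):
    """Apply one human response to an escalation; `round` survives options 1 and 2."""
    if action in ("hint", "human_fix"):
        state["stuck_counter"] = 0
    elif action == "adjust_goal":
        state["stuck_counter"] = 0
        state["round"] = 0          # redefined goal, counters start over
    elif action == "abandon":
        state["status"] = "abandoned"
    else:
        raise ValueError(f"unknown recovery action: {action}")
    return state
```

Writing through a temporary file and `os.replace` is one way to keep `max_rounds` and `stuck_threshold` trustworthy across restarts, since a half-written state file would otherwise be indistinguishable from a missing one.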

diff --git a/docs/goal-driven-agent-loop.md b/docs/goal-driven-agent-loop.md
index 859b23cc..8c0daa65 100644
--- a/docs/goal-driven-agent-loop.md
+++ b/docs/goal-driven-agent-loop.md
@@ -243,14 +243,62 @@ Recommended: **Option 1** — separate section. Goal semantics (state tracking,
 2. 3 consecutive rounds with no new commits
 3. Escalation message posted, goal paused
 
+## Security: Shell Execution
+
+`done_check` and `progress_check` execute arbitrary shell commands. Mitigation strategy:
+
+| Phase | Mitigation |
+|-------|-----------|
+| MVP | Trust config source — only repo maintainers can define goals. Document that commands run with agent's permissions. |
+| v2 | Allowed command whitelist + read-only mode for `progress_check` |
+| v3 | Container isolation — run eval commands in ephemeral sandbox with no network/write access to host |
+
+MVP explicitly does NOT sandbox. This is acceptable because config is maintainer-controlled (same trust model as existing `[[cron.jobs]]`).
+
+## Persistence
+
+Goal state **must be persisted** to survive process restarts. Without persistence, `max_rounds` and `stuck_threshold` safety valves can be bypassed by restarts.
+
+Persisted state per goal:
+
+```json
+{
+  "goal_id": "goal-001",
+  "round": 4,
+  "stuck_counter": 1,
+  "last_snapshot": "abc1234...",
+  "last_eval_output": "FAIL src/auth.test.ts...",
+  "status": "active",
+  "history": [
+    { "round": 1, "delta": true, "timestamp": "..." },
+    { "round": 2, "delta": true, "timestamp": "..." },
+    { "round": 3, "delta": false, "timestamp": "..." }
+  ]
+}
+```
+
+MVP storage: local JSON file (`goals-state.json`). Future: DB or object store.
+
+## Escalation Recovery Rules
+
+When the human responds to an escalation:
+
+| Human action | Effect on counters |
+|---|---|
+| 1️⃣ Give hint, continue | `stuck_counter` resets to 0; `round` continues (does NOT reset) |
+| 2️⃣ Human fixes, agents verify | `stuck_counter` resets to 0; `round` continues |
+| 3️⃣ Adjust goal/eval | `stuck_counter` resets to 0; `round` resets to 0 (new goal effectively) |
+| 4️⃣ Abandon goal | `status` = `abandoned`, goal disabled |
+
+Key principle: **`max_rounds` never resets** unless the goal itself is redefined (option 3). This prevents infinite loops even with repeated escalations.
+
 ## Open Questions
 
-1. **Persistence** — Where is goal state stored between rounds? In-memory (lost on restart) or persisted (DB/file)?
-2. **Thread vs channel** — Should each goal auto-create a dedicated thread, or reuse an existing one?
-3. **Multi-agent coordination** — In escape room mode, how do agents avoid conflicting actions? First-come-first-serve? Or coordinator (超渡) assigns sub-tasks?
-4. **Goal lifecycle commands** — How does the human create/pause/cancel goals? Slash commands? Config file reload?
-5. **Observability** — How to surface goal progress history (rounds, deltas, escalations)?
-6. **Security** — `done_check` runs arbitrary shell commands. Sandboxing? Allowed command whitelist?
+1. **Thread vs channel** — Should each goal auto-create a dedicated thread, or reuse an existing one?
+2. **Multi-agent coordination** — In escape room mode, how do agents avoid conflicting actions? First-come-first-serve? Or coordinator (超渡) assigns sub-tasks?
+3. **Goal lifecycle commands** — How does the human create/pause/cancel goals? Slash commands? Config file reload?
+4. **Observability** — How to surface goal progress history (rounds, deltas, escalations)?
+5. **Context window overflow** — Long-running goals accumulate thread history. Should each round message include a condensed summary of prior rounds to prevent context overflow? Or implement a sliding window / summarization step?
 
 ## References
 

From e7270a88b8edac8153132e02a37f7b85cacc47e0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=B6=85=E6=B8=A1=E6=B3=95=E5=B8=AB?=
 <chaodu-agent@openab.dev>
Date: Wed, 13 May 2026 22:05:10 +0000
Subject: [PATCH 3/6] docs: fix Discord URL, add thread lifecycle, make
 persistence definitive

---
 docs/goal-driven-agent-loop.md | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)
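The thread lifecycle rules added in this patch amount to a small lookup. The sketch below is illustrative only: `resolve_thread_id` and `thread_title` are hypothetical names, `create_thread` stands in for whatever chat-platform call actually creates a thread, and giving the configured `thread_id` priority over the persisted one is an assumption.

```python
def resolve_thread_id(configured, state, create_thread):
    """Pick the thread for this round, creating one at most once per goal."""
    if configured:                      # `thread_id` provided in config
        return configured
    if state.get("thread_id"):          # created on an earlier round and persisted
        return state["thread_id"]
    state["thread_id"] = create_thread()  # first round: create, then persist
    return state["thread_id"]


def thread_title(description, round_no, max_rounds):
    """Format the status title described in the spec."""
    return f"🔐 Goal: {description} [Round {round_no}/{max_rounds}]"
```

Because the created id is stored in the goal state, later rounds reuse the same thread even after a restart, provided the state itself is persisted.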

diff --git a/docs/goal-driven-agent-loop.md b/docs/goal-driven-agent-loop.md
index 8c0daa65..5397c913 100644
--- a/docs/goal-driven-agent-loop.md
+++ b/docs/goal-driven-agent-loop.md
@@ -277,7 +277,7 @@ Persisted state per goal:
 }
 ```
 
-MVP storage: local JSON file (`goals-state.json`). Future: DB or object store.
+MVP storage: **local JSON state file** (`goals-state.json`) — loaded on startup, written after each round. This is a hard requirement, not optional. Future: DB or object store.
 
 ## Escalation Recovery Rules
 
@@ -292,15 +292,30 @@ When the human responds to an escalation:
 
 Key principle: **`max_rounds` never resets** unless the goal itself is redefined (option 3). This prevents infinite loops even with repeated escalations.
 
+## Thread Lifecycle (MVP)
+
+Each goal **must** run in a single, persistent thread to preserve agent context across rounds.
+
+| Scenario | Behavior |
+|----------|----------|
+| `thread_id` provided | Use that thread for all rounds |
+| `thread_id` empty | Auto-create a dedicated thread on first round; persist `thread_id` in goal state |
+
+Rules:
+- All round messages, agent discussions, and escalations happen in the **same thread**
+- Thread is never re-created between rounds
+- Thread title updated with status: `🔐 Goal: <description> [Round N/max]`
+
+This ensures agents always have full conversation history as context.
+
 ## Open Questions
 
-1. **Thread vs channel** — Should each goal auto-create a dedicated thread, or reuse an existing one?
-2. **Multi-agent coordination** — In escape room mode, how do agents avoid conflicting actions? First-come-first-serve? Or coordinator (超渡) assigns sub-tasks?
-3. **Goal lifecycle commands** — How does the human create/pause/cancel goals? Slash commands? Config file reload?
-4. **Observability** — How to surface goal progress history (rounds, deltas, escalations)?
-5. **Context window overflow** — Long-running goals accumulate thread history. Should each round message include a condensed summary of prior rounds to prevent context overflow? Or implement a sliding window / summarization step?
+1. **Multi-agent coordination** — In escape room mode, how do agents avoid conflicting actions? First-come-first-serve? Or coordinator (超渡) assigns sub-tasks?
+2. **Goal lifecycle commands** — How does the human create/pause/cancel goals? Slash commands? Config file reload?
+3. **Observability** — How to surface goal progress history (rounds, deltas, escalations)?
+4. **Context window overflow** — Long-running goals accumulate thread history. Should each round message include a condensed summary of prior rounds to prevent context overflow? Or implement a sliding window / summarization step?
 
 ## References
 
 - [Existing CronJob docs](./cronjob.md)
-- [Discord thread for this design discussion](https://discord.com/channels/1491295927620169908/1504239931940409587)
+- [Discord thread for this design discussion](https://discord.com/channels/1491295327620169908/1504239931940409587)

From de17b0947483a73bfabb444d156d11397b6400f0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=B6=85=E6=B8=A1=E6=B3=95=E5=B8=AB?=
 <chaodu-agent@openab.dev>
Date: Wed, 13 May 2026 22:06:27 +0000
Subject: [PATCH 4/6] =?UTF-8?q?docs:=20restructure=20spec=20=E2=80=94=20Ph?=
 =?UTF-8?q?ase=201=20cron=20extension,=20Phase=202=20full=20goal=20runner?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/goal-driven-agent-loop.md | 367 ++++++++++++---------------------
 1 file changed, 137 insertions(+), 230 deletions(-)
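The Phase 1 behavior introduced in this patch (run the command, auto-disable on exit 0, otherwise send the message) can be sketched as follows. This is a rough sketch, not the cron runner's actual code: `goal_met` and `on_fire` are hypothetical names, `job` and `state` are plain dicts mirroring the config and persisted state shown in the spec, and `send_message` stands in for the existing message delivery path.

```python
import subprocess


def goal_met(command, timeout=60, working_dir=None):
    """Return True only if the shell command exits 0 within the timeout."""
    try:
        result = subprocess.run(
            command,
            shell=True,
            cwd=working_dir or None,
            timeout=timeout,
            stdout=subprocess.DEVNULL,   # output is not needed in Phase 1
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # The shell is killed on timeout; its descendants may need extra cleanup.
        return False                     # timeout is treated as "goal not met"
    return result.returncode == 0


def on_fire(job, state, send_message):
    """Handle one schedule fire for a job that sets `disable_on_success`."""
    if not state.get("enabled", True):
        return                           # already auto-disabled
    met = goal_met(
        job["disable_on_success"],
        timeout=job.get("timeout", 60),
        working_dir=job.get("working_dir"),
    )
    if met:
        state["enabled"] = False         # goal achieved: disable, send nothing
        return
    send_message(job["message"])         # goal not met: normal cron message
```

The caller would still need to write `state` to `cron-state.json` after each change so that an auto-disabled job stays disabled across restarts.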

diff --git a/docs/goal-driven-agent-loop.md b/docs/goal-driven-agent-loop.md
index 5397c913..6e1a59b4 100644
--- a/docs/goal-driven-agent-loop.md
+++ b/docs/goal-driven-agent-loop.md
@@ -6,314 +6,221 @@ Design spec for a goal-oriented execution mode where agents work autonomously un
 
 Today, agents respond to individual messages reactively. There is no mechanism to assign a persistent **goal** that agents must work toward across multiple rounds, self-organizing their approach without explicit step-by-step instructions.
 
-## Non-Goals (MVP)
-
-- Multi-agent goal contention / auto-claiming
-- Complex scoring or partial-credit evaluation
-- Long-term memory rewrite between rounds
-- LLM judge involvement on every round
-- UI/dashboard for goal management
-
 ## Concept: "Escape Room" Mode
 
 The human sets a goal and a success condition. A CronJob periodically evaluates whether the goal is met. If not, it posts to the channel — agents must **self-organize** to figure out how to achieve it. They are not told what to do, only what the goal is and that it hasn't been met yet.
 
 ```
-Human sets goal + eval command
+Human sets goal via cron config
          │
          ▼
-┌──► CronJob fires (on interval)
+┌──► CronJob fires (on schedule)
 │         │
 │         ▼
-│    Run done_check command
+│    Run disable_on_success command
 │         │
 │    ┌────┴─────┐
-│    │  Pass?   │
+│    │ exit 0?  │
 │    └────┬─────┘
 │     No  │  Yes
 │     │   │    │
 │     ▼   │    ▼
-│  Post to channel:    Goal achieved ✅
-│  "Goal not met,      Disable CronJob
-│   keep working"      Notify human
+│  Send message:     Goal achieved ✅
+│  agents continue   Auto-disable job
+│  working           (no message sent)
 │         │
 │         ▼
 │  Agents discuss & act
 │  (self-organized)
 │         │
 └─────────┘
-     Next interval
+     Next schedule
 ```
 
-## Goal Schema
+---
+
+# Phase 1: Cron Auto-Disable on Success (MVP)
+
+Minimal extension to the existing `[[cron.jobs]]` system: one new field, `disable_on_success`, plus two supporting options (`timeout` and `working_dir`).
+
+## Configuration
 
 ```toml
-[[goals]]
-id = "goal-001"
-description = "All unit tests pass on main branch"
-done_check = "cd /repo && npm test"
-progress_check = "cd /repo && git log --oneline -5"
-interval = "10m"
-max_rounds = 10
-stuck_threshold = 3          # rounds without state delta → escalate
+[[cron.jobs]]
+schedule = "*/10 * * * *"
 channel = "123456789012345678"
-thread_id = ""               # optional: confine to existing thread
-owner = ""                   # optional: assigned agent UID
-enabled = true
+thread_id = ""                                    # optional: auto-created on first fire if empty
+message = "Goal not met: all unit tests must pass. <@&1496247626675257384> please continue."
+disable_on_success = "cd /repo && npm test"       # NEW: command to evaluate goal
+timeout = 60                                      # NEW: command timeout in seconds
+working_dir = "/repo"                             # NEW: optional working directory
 ```
 
+### New Fields
+
 | Field | Required | Default | Description |
 |-------|----------|---------|-------------|
-| `id` | ✅ | — | Unique goal identifier |
-| `description` | ✅ | — | Human-readable goal statement |
-| `done_check` | ✅ | — | Shell command; exit 0 = goal achieved |
-| `progress_check` | | — | Command to capture state snapshot for delta detection |
-| `interval` | | `"10m"` | Evaluation interval (e.g. `5m`, `1h`) |
-| `max_rounds` | | `10` | Hard cap on evaluation rounds |
-| `stuck_threshold` | | `3` | Consecutive rounds without state delta before escalation |
-| `channel` | ✅ | — | Target channel for agent communication |
-| `thread_id` | | — | Confine discussion to a specific thread |
-| `owner` | | — | Agent UID responsible for execution |
-| `enabled` | | `true` | Toggle without removing config |
-
-## Runner Loop State Machine
-
-```
-         ┌─────────┐
-         │  IDLE   │ ◄── goal created, waiting for first interval
-         └────┬────┘
-              │ interval fires
-              ▼
-         ┌─────────┐
-         │  EVAL   │ ◄── run done_check
-         └────┬────┘
-              │
-       ┌──────┴──────┐
-       │             │
-   exit 0        exit != 0
-       │             │
-       ▼             ▼
-  ┌────────┐   ┌──────────┐
-  │  DONE  │   │ COMPARE  │ ◄── compute state delta
-  └────────┘   └────┬─────┘
-                    │
-             ┌──────┴──────┐
-             │             │
-        has delta      no delta
-             │             │
-             ▼             ▼
-       ┌──────────┐  ┌──────────┐
-       │ CONTINUE │  │  STUCK   │ ◄── increment stuck_counter
-       └──────────┘  └────┬─────┘
-                          │
-                   stuck_counter >= threshold?
-                     │            │
-                    Yes           No
-                     │            │
-                     ▼            ▼
-               ┌───────────┐  ┌──────────┐
-               │ ESCALATE  │  │ CONTINUE │
-               └───────────┘  └──────────┘
-```
-
-### State Transitions
-
-| From | Event | To | Action |
-|------|-------|----|--------|
-| IDLE | interval fires | EVAL | Run `done_check` |
-| EVAL | exit 0 | DONE | Notify channel ✅, disable goal |
-| EVAL | exit != 0 | COMPARE | Run `progress_check`, compute delta |
-| COMPARE | has delta | CONTINUE | Reset stuck_counter, post round message |
-| COMPARE | no delta | STUCK | Increment stuck_counter |
-| STUCK | counter < threshold | CONTINUE | Post round message with warning |
-| STUCK | counter >= threshold | ESCALATE | Notify human, pause goal |
-| Any | round > max_rounds | ESCALATE | Hard stop, notify human |
-
-## State Snapshot & Delta Detection
-
-Each round captures a **state snapshot** via `progress_check`. Delta is computed by comparing current snapshot to previous round's snapshot.
+| `disable_on_success` | | — | Shell command; if exit 0, job auto-disables and message is NOT sent |
+| `timeout` | | `60` | Max seconds for `disable_on_success` to run before being killed |
+| `working_dir` | | — | Working directory for command execution |
 
-### Supported Delta Signals (MVP)
+### Behavior
 
-| Signal | How to detect |
-|--------|---------------|
-| New commits | `git log --oneline` diff |
-| File changes | `git diff --stat` |
-| Test result change | Test output diff (pass/fail count) |
-| PR/Issue status | `gh pr view` / `gh issue view` |
-| Artifact existence | `ls` / `stat` on expected path |
+When a cron job has `disable_on_success` set:
 
-If no `progress_check` is defined, delta detection falls back to comparing `done_check` stdout/stderr between rounds.
+1. Schedule fires
+2. Run `disable_on_success` command (with `timeout` and `working_dir`)
+3. **exit 0** → Goal achieved. Set `enabled = false` in persisted state. Do NOT send message.
+4. **exit != 0** → Goal not met. Send `message` to channel/thread as normal.
+5. **timeout exceeded** → Treat as exit != 0 (goal not met). Send message.
 
-## Round Message Format
+### Thread Lifecycle
 
-Posted to channel/thread each round when goal is not yet achieved:
-
-```
-🔐 Goal: All unit tests pass on main branch
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-Round: 4 / 10
-Status: ❌ Not achieved
-Eval output:
-  FAIL src/auth.test.ts — TypeError: undefined is not a function
-  Tests: 12 passed, 1 failed
-Progress: ✅ Delta detected (new commit abc1234)
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-Agents, keep working on it.
-```
-
-When stuck (no delta):
+| Scenario | Behavior |
+|----------|----------|
+| `thread_id` provided | Use that thread for all fires |
+| `thread_id` empty | Auto-create a thread on first fire; persist `thread_id` in state |
 
-```
-🔐 Goal: All unit tests pass on main branch
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-Round: 7 / 10
-Status: ❌ Not achieved
-Eval output:
-  FAIL src/auth.test.ts — TypeError: undefined is not a function
-Progress: ⚠️ No state delta (2 / 3 rounds until escalation)
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-Agents, keep working on it.
-```
+All messages go to the **same thread** — agents need conversation history as context.
 
-## Escalation Payload
+### Persistence
 
-When stuck_threshold is reached or max_rounds exceeded:
+Auto-disable state must survive restarts. Persisted per job:
 
-```
-⚠️ Goal Stuck — Escalating
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-Goal: All unit tests pass on main branch
-Last successful delta: Round 5 — fixed auth.test.ts (commit abc1234)
-Blocked reason: No state change for 3 consecutive rounds
-Current eval output:
-  FAIL src/auth.test.ts — TypeError: undefined is not a function
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-Owner decision needed:
-1️⃣ Give a hint so the agents can continue
-2️⃣ Owner fixes it, then the agents verify
-3️⃣ Adjust the goal or eval command
-4️⃣ Abandon this goal
+```json
+{
+  "job_key": "cron-<schedule_hash>-<channel>",
+  "enabled": true,
+  "thread_id": "1504239931940409587",
+  "auto_disabled_at": null
+}
 ```
 
-## Done Confirmation (Optional LLM Judge)
+Storage: **local JSON state file** (`cron-state.json`) — loaded on startup, written on state change.
 
-When `done_check` passes (exit 0), an optional LLM judge can confirm intent alignment:
+Key rule: **config reload does NOT re-enable an auto-disabled job.** Only explicit `enabled = true` in config (human intent) can re-enable it.
 
-```
-done_check passes
-       │
-       ▼
-  LLM Judge: "Does the current state satisfy the goal description?"
-       │
-  ┌────┴────┐
-  │         │
-confirm   reject + reason
-  │         │
-  ▼         ▼
-DONE     CONTINUE (post rejection reason to channel)
-```
-
-This is a **tie-breaker only** — not involved in every round. Only fires after Layer 1 (deterministic check) passes.
+### Security
 
-## Integration with Existing CronJob
+`disable_on_success` executes arbitrary shell commands. MVP mitigation:
 
-This feature extends the existing `[[cron.jobs]]` system. Implementation options:
+- Trust config source (same model as existing `[[cron.jobs]]` message execution)
+- Only repo maintainers can define cron jobs
+- Commands run with agent's permissions
+- `timeout` prevents runaway processes
 
-1. **New config section** `[[goals]]` — separate from `[[cron.jobs]]`, dedicated runner logic
-2. **Extension of cron** — add `goal_mode = true` fields to existing cron entries
+## Phase 1 Non-Goals
 
-Recommended: **Option 1** — separate section. Goal semantics (state tracking, delta detection, escalation) are fundamentally different from simple scheduled messages.
+- State delta / progress detection
+- Stuck detection / escalation
+- LLM judge
+- Max rounds
+- Multi-agent coordination logic
+- Goal lifecycle slash commands
 
 ## MVP Test Scenario
 
 **Setup:**
 1. A repo with one failing test
-2. Goal: `done_check = "npm test"` with exit 0 = success
-3. Agent has write access to the repo
+2. Cron job: `disable_on_success = "cd /repo && npm test"`, schedule every 10 min
+3. Agents have write access to the repo
 
 **Expected behavior:**
-1. CronJob fires → runs `npm test` → fails → posts round message
+1. Cron fires → `npm test` fails (exit 1) → message sent to thread
 2. Agents discuss in thread, identify the bug, push a fix
-3. Next CronJob fires → runs `npm test` → passes → posts ✅ Done
-4. Goal disabled
+3. Next cron fires → `npm test` passes (exit 0) → job auto-disables, no message sent
+4. Done. The job stays disabled until a human re-enables it.
 
-**Stuck scenario:**
-1. Agents cannot figure out the fix
-2. 3 consecutive rounds with no new commits
-3. Escalation message posted, goal paused
+**Edge cases:**
+- Process restarts between fires → state file preserves `thread_id` and `enabled` status
+- Command hangs → killed after `timeout` seconds, treated as failure, message sent
+- Human sets `enabled = true` in config → job re-activates (intentional reset)
 
-## Security: Shell Execution
+---
 
-`done_check` and `progress_check` execute arbitrary shell commands. Mitigation strategy:
+# Phase 2: Full Goal Runner (Future Design)
 
-| Phase | Mitigation |
-|-------|-----------|
-| MVP | Trust config source — only repo maintainers can define goals. Document that commands run with agent's permissions. |
-| v2 | Allowed command whitelist + read-only mode for `progress_check` |
-| v3 | Container isolation — run eval commands in ephemeral sandbox with no network/write access to host |
+Once Phase 1 is proven, extend it with richer goal semantics.
 
-MVP explicitly does NOT sandbox. This is acceptable because config is maintainer-controlled (same trust model as existing `[[cron.jobs]]`).
+## Additional Capabilities
 
-## Persistence
+### State Delta Detection
 
-Goal state **must be persisted** to survive process restarts. Without persistence, `max_rounds` and `stuck_threshold` safety valves can be bypassed by restarts.
+Track progress between rounds using a `progress_check` command:
 
-Persisted state per goal:
-
-```json
-{
-  "goal_id": "goal-001",
-  "round": 4,
-  "stuck_counter": 1,
-  "last_snapshot": "abc1234...",
-  "last_eval_output": "FAIL src/auth.test.ts...",
-  "status": "active",
-  "history": [
-    { "round": 1, "delta": true, "timestamp": "..." },
-    { "round": 2, "delta": true, "timestamp": "..." },
-    { "round": 3, "delta": false, "timestamp": "..." }
-  ]
-}
+```toml
+[[goals]]
+id = "goal-001"
+description = "All unit tests pass"
+done_check = "cd /repo && npm test"
+progress_check = "cd /repo && git log --oneline -5"
+interval = "10m"
+max_rounds = 10
+stuck_threshold = 3
+channel = "123456789012345678"
 ```
 
-MVP storage: **local JSON state file** (`goals-state.json`) — loaded on startup, written after each round. This is a hard requirement, not optional. Future: DB or object store.
+Delta signals: git commits, file changes, test result transitions, PR status, artifact existence.
+
+### Stuck Detection & Escalation
+
+| Signal | Judgment |
+|--------|----------|
+| Has state delta + eval fail | Progressing, continue |
+| No state delta + eval fail | Stuck, increment counter |
+| Counter >= stuck_threshold | Escalate to human |
+| Round > max_rounds | Hard stop, escalate |
 
-## Escalation Recovery Rules
+### Escalation Message
+
+```
+⚠️ Goal Stuck — Escalating
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Goal: All unit tests pass
+Last successful delta: Round 5 — commit abc1234
+Blocked reason: No state change for 3 consecutive rounds
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+1️⃣ Give hint, continue
+2️⃣ Human fixes, agents verify
+3️⃣ Adjust goal/eval command
+4️⃣ Abandon goal
+```
 
-When the human responds to an escalation:
+### Escalation Recovery Rules
 
-| Human action | Effect on counters |
+| Human action | Effect |
 |---|---|
-| 1️⃣ Give hint, continue | `stuck_counter` resets to 0; `round` continues (does NOT reset) |
-| 2️⃣ Human fixes, agents verify | `stuck_counter` resets to 0; `round` continues |
-| 3️⃣ Adjust goal/eval | `stuck_counter` resets to 0; `round` resets to 0 (new goal effectively) |
-| 4️⃣ Abandon goal | `status` = `abandoned`, goal disabled |
+| 1️⃣ Give hint | `stuck_counter` resets; `round` continues |
+| 2️⃣ Human fixes | `stuck_counter` resets; `round` continues |
+| 3️⃣ Adjust goal | Full reset (new goal) |
+| 4️⃣ Abandon | Goal disabled |
 
-Key principle: **`max_rounds` never resets** unless the goal itself is redefined (option 3). This prevents infinite loops even with repeated escalations.
+Key: the **`round` counter never resets** unless the goal is redefined (option 3), so `max_rounds` stays a hard cap across repeated escalations.
 
-## Thread Lifecycle (MVP)
+### LLM Judge (Tie-Breaker Only)
 
-Each goal **must** run in a single, persistent thread to preserve agent context across rounds.
+After `done_check` passes, an LLM can optionally confirm that the result matches the goal's intent. The judge is not invoked on every round.
 
-| Scenario | Behavior |
-|----------|----------|
-| `thread_id` provided | Use that thread for all rounds |
-| `thread_id` empty | Auto-create a dedicated thread on first round; persist `thread_id` in goal state |
+### Round Message Format
 
-Rules:
-- All round messages, agent discussions, and escalations happen in the **same thread**
-- Thread is never re-created between rounds
-- Thread title updated with status: `🔐 Goal: <description> [Round N/max]`
+```
+🔐 Goal: All unit tests pass
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Round: 4 / 10
+Status: ❌ Not achieved
+Eval output: FAIL src/auth.test.ts — TypeError
+Progress: ✅ Delta detected (commit abc1234)
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+```
 
-This ensures agents always have full conversation history as context.
+---
 
 ## Open Questions
 
-1. **Multi-agent coordination** — In escape room mode, how do agents avoid conflicting actions? First-come-first-serve? Or coordinator (超渡) assigns sub-tasks?
-2. **Goal lifecycle commands** — How does the human create/pause/cancel goals? Slash commands? Config file reload?
-3. **Observability** — How to surface goal progress history (rounds, deltas, escalations)?
-4. **Context window overflow** — Long-running goals accumulate thread history. Should each round message include a condensed summary of prior rounds to prevent context overflow? Or implement a sliding window / summarization step?
+1. **Multi-agent coordination** — How do agents avoid conflicting actions in escape room mode?
+2. **Goal lifecycle commands** — Slash commands? Config reload?
+3. **Observability** — How to surface goal progress history?
+4. **Context window overflow** — Summarization strategy for long-running goals?
 
 ## References
 

From 947d7dde697e78da32791acfb4ca5065b2354b7f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=B6=85=E6=B8=A1=E6=B3=95=E5=B8=AB?=
 <chaodu-agent@openab.dev>
Date: Wed, 13 May 2026 22:07:27 +0000
Subject: [PATCH 5/6] =?UTF-8?q?docs:=20clarify=20state=20vs=20config=20pre?=
 =?UTF-8?q?cedence,=20add=20Phase=201=E2=86=922=20migration=20note?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/goal-driven-agent-loop.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/goal-driven-agent-loop.md b/docs/goal-driven-agent-loop.md
index 6e1a59b4..5cc6b4f6 100644
--- a/docs/goal-driven-agent-loop.md
+++ b/docs/goal-driven-agent-loop.md
@@ -98,7 +98,7 @@ Auto-disable state must survive restarts. Persisted per job:
 
 Storage: **local JSON state file** (`cron-state.json`) — loaded on startup, written on state change.
 
-Key rule: **config reload does NOT re-enable an auto-disabled job.** Only explicit `enabled = true` in config (human intent) can re-enable it.
+Key rule: **Persisted state takes precedence over config for auto-disabled jobs.** When a job is auto-disabled (exit 0), the state file records `auto_disabled_at`. From that point, the config `enabled` field is ignored for this job. To re-enable, the human must **both** set `enabled = true` in config **and** remove the `auto_disabled_at` entry from state (or delete the state entry entirely). This prevents config reload from accidentally resurrecting a completed goal.
 
 ### Security
 
@@ -140,7 +140,7 @@ Key rule: **config reload does NOT re-enable an auto-disabled job.** Only explic
 
 # Phase 2: Full Goal Runner (Future Design)
 
-When Phase 1 is proven, extend with richer goal semantics.
+When Phase 1 is proven, extend with richer goal semantics. Phase 2 will introduce a new `[[goals]]` config section; Phase 1 `[[cron.jobs]]` entries with `disable_on_success` remain valid and coexist — no migration required.
 
 ## Additional Capabilities
 

From 4b3c4a7a5f38647c0ce55dc796eb64e695cc020a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=B6=85=E6=B8=A1=E6=B3=95=E5=B8=AB?=
 <chaodu-agent@openab.dev>
Date: Wed, 13 May 2026 22:08:08 +0000
Subject: [PATCH 6/6] docs: add job id, generation-based re-enable, prefix
 field names

---
 docs/goal-driven-agent-loop.md | 38 ++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 15 deletions(-)
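
Note for reviewers: the generation check added to the Behavior steps in this patch can be read as the small decision function below. This is only a sketch under assumptions, not the runner's implementation: `JobState`, `run_check`, and `on_fire` are hypothetical names, and the state shape merely mirrors the `cron-state.json` example. The spec does not say what happens when config `generation` is lower than the persisted value; the sketch assumes the job stays disabled in that case.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JobState:
    """Persisted per-job state (hypothetical shape mirroring cron-state.json)."""
    generation: int = 1
    auto_disabled: bool = False


def run_check(command: str, timeout_secs: int = 60,
              working_dir: Optional[str] = None) -> bool:
    """Return True only if the shell command exits 0 within the timeout."""
    try:
        result = subprocess.run(command, shell=True, cwd=working_dir,
                                timeout=timeout_secs)
    except subprocess.TimeoutExpired:
        return False  # a timeout is treated as "goal not met"
    return result.returncode == 0


def on_fire(state: JobState, config_generation: int,
            check: Callable[[], bool]) -> str:
    """Decide what one schedule fire does: 'skip', 'disable', or 'send'."""
    # Stay disabled while the human has not bumped `generation`.
    # ASSUMPTION: a config generation lower than the persisted one also skips.
    if state.auto_disabled and config_generation <= state.generation:
        return "skip"
    # Config generation is ahead of persisted state: clear the auto-disable.
    if config_generation > state.generation:
        state.generation = config_generation
        state.auto_disabled = False
    # Evaluate the goal; exit 0 disables the job and suppresses the message.
    if check():
        state.auto_disabled = True
        return "disable"
    return "send"
```

A caller would pass something like `lambda: run_check("npm test", 60, "/repo")` as `check` and write `state` back to the state file whenever `on_fire` changes it.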

diff --git a/docs/goal-driven-agent-loop.md b/docs/goal-driven-agent-loop.md
index 5cc6b4f6..3358a04f 100644
--- a/docs/goal-driven-agent-loop.md
+++ b/docs/goal-driven-agent-loop.md
@@ -47,32 +47,38 @@ Minimal extension to existing `[[cron.jobs]]` — add a single field `disable_on
 
 ```toml
 [[cron.jobs]]
+id = "unit-tests-pass"                            # REQUIRED for disable_on_success jobs
 schedule = "*/10 * * * *"
 channel = "123456789012345678"
 thread_id = ""                                    # optional: auto-created on first fire if empty
 message = "Goal not met: all unit tests must pass. <@&1496247626675257384> please continue."
-disable_on_success = "cd /repo && npm test"       # NEW: command to evaluate goal
-timeout = 60                                      # NEW: command timeout in seconds
-working_dir = "/repo"                             # NEW: optional working directory
+disable_on_success = "npm test"                   # NEW: command to evaluate goal
+disable_on_success_timeout_secs = 60              # NEW: command timeout
+disable_on_success_working_dir = "/repo"          # NEW: working directory for command
+generation = 1                                    # NEW: bump to re-enable after auto-disable
 ```
 
 ### New Fields
 
 | Field | Required | Default | Description |
 |-------|----------|---------|-------------|
+| `id` | ✅ (when `disable_on_success` set) | — | Stable unique identifier for state persistence |
 | `disable_on_success` | | — | Shell command; if exit 0, job auto-disables and message is NOT sent |
-| `timeout` | | `60` | Max seconds for `disable_on_success` to run before being killed |
-| `working_dir` | | — | Working directory for command execution |
+| `disable_on_success_timeout_secs` | | `60` | Max seconds for command to run before being killed |
+| `disable_on_success_working_dir` | | — | Working directory for command execution |
+| `generation` | | `1` | Bump this number to re-enable an auto-disabled job |
 
 ### Behavior
 
 When a cron job has `disable_on_success` set:
 
 1. Schedule fires
-2. Run `disable_on_success` command (with `timeout` and `working_dir`)
-3. **exit 0** → Goal achieved. Set `enabled = false` in persisted state. Do NOT send message.
-4. **exit != 0** → Goal not met. Send `message` to channel/thread as normal.
-5. **timeout exceeded** → Treat as exit != 0 (goal not met). Send message.
+2. Check: if persisted `generation` matches config `generation` AND job is auto-disabled → skip (stay disabled)
+3. If config `generation` > persisted `generation` → reset auto-disable state (re-enabled by human)
+4. Run `disable_on_success` command (with `disable_on_success_timeout_secs` and `disable_on_success_working_dir`)
+5. **exit 0** → Goal achieved. Persist auto-disabled state with current `generation`. Do NOT send message.
+6. **exit != 0** → Goal not met. Send `message` to channel/thread as normal.
+7. **timeout exceeded** → Treat as exit != 0 (goal not met). Send message.
 
 ### Thread Lifecycle
 
@@ -85,20 +91,22 @@ All messages go to the **same thread** — agents need conversation history as c
 
 ### Persistence
 
-Auto-disable state must survive restarts. Persisted per job:
+Auto-disable state must survive restarts. Persisted per job (keyed by `id`):
 
 ```json
 {
-  "job_key": "cron-<schedule_hash>-<channel>",
-  "enabled": true,
-  "thread_id": "1504239931940409587",
-  "auto_disabled_at": null
+  "unit-tests-pass": {
+    "generation": 1,
+    "auto_disabled": true,
+    "auto_disabled_at": "2026-05-13T22:00:00Z",
+    "thread_id": "1504239931940409587"
+  }
 }
 ```
 
 Storage: **local JSON state file** (`cron-state.json`) — loaded on startup, written on state change.
 
-Key rule: **Persisted state takes precedence over config for auto-disabled jobs.** When a job is auto-disabled (exit 0), the state file records `auto_disabled_at`. From that point, the config `enabled` field is ignored for this job. To re-enable, the human must **both** set `enabled = true` in config **and** remove the `auto_disabled_at` entry from state (or delete the state entry entirely). This prevents config reload from accidentally resurrecting a completed goal.
+**Re-enable logic:** When config `generation` > persisted `generation`, the auto-disable is cleared and the job runs again. This gives humans a clear, unambiguous way to restart a completed goal — just bump the number.
 
 ### Security