diff --git a/skills/workflow-builder/SKILL.md b/skills/workflow-builder/SKILL.md index 4ca935e..ccf6f63 100644 --- a/skills/workflow-builder/SKILL.md +++ b/skills/workflow-builder/SKILL.md @@ -1,12 +1,12 @@ --- name: workflow-builder -version: 0.1.0 +version: 0.2.0 description: Design, build, and maintain autonomous OpenClaw workflows (stewards). Use when creating new workflow agents, improving existing ones, evaluating automation opportunities, or debugging workflow reliability. Triggers on "build a workflow", "create a steward", "automate this process", "workflow audit", "what should I - automate". + automate", "create a cron job", "schedule a recurring task", "build a scheduled job". metadata: openclaw: emoji: "🏗️" @@ -219,10 +219,68 @@ Write confidence thresholds to `rules.md` so the user can tune them. ### Pattern 3: Sub-Agent Orchestration -Match intelligence to task complexity: +Match intelligence to task complexity, and **always use sub-agents for loops.** + +#### Rule: Never Loop Over Collections in the Orchestrator + +**Any time you iterate over a list (contacts, emails, tasks, records), spawn a sub-agent +per item.** This preserves the parent context for coordination and prevents pollution. + +**Pattern:** + +``` +Orchestrator (parent): +1. Fetch the list (from API, file, database) +2. Query tracking state to filter already-processed items +3. FOR EACH new item: Spawn a sub-agent with that item's details +4. Sub-agent processes one item, returns structured result +5. Parent collects results, updates tracking state, alerts if needed + +Sub-agent: +- Receives: One item + context needed for that item +- Does: All the reasoning, decision-making, work +- Returns: Structured summary (status, action taken, errors, alerts) +- Never accesses parent's full context +``` + +**Why:** Each sub-agent gets a fresh context window. Parent stays clean for +orchestration logic. No pollution from per-item reasoning. 
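The orchestrator/sub-agent split above can be sketched in Python. Here `spawn_subagent()` is a hypothetical stand-in for whatever spawning mechanism the runtime provides, and the single-column tracking table is deliberately simplified (real workflows define their schema in `db-setup.md`):

```python
import sqlite3

def spawn_subagent(item):
    # Hypothetical stand-in for the runtime's sub-agent call.
    # A real sub-agent receives one item plus minimal context, does all
    # reasoning in a fresh context window, and returns a structured result.
    return {"id": item["id"], "status": "classified", "alert": None}

def run_orchestrator(items, db_path="processed.db"):
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS processed "
        "(item_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    # Step 2: query tracking state to filter already-processed items
    seen = {row[0] for row in db.execute("SELECT item_id FROM processed")}
    results = []
    for item in items:
        if item["id"] in seen:
            continue  # already handled on a previous run
        # Steps 3-4: one sub-agent per item; per-item reasoning
        # never touches the parent's context
        result = spawn_subagent(item)
        # Step 5: parent updates tracking state and collects results
        db.execute(
            "INSERT OR REPLACE INTO processed (item_id, status) VALUES (?, ?)",
            (item["id"], result["status"]),
        )
        db.commit()
        seen.add(item["id"])
        results.append(result)
    db.close()
    return results
```

The parent only ever holds the item list and each sub-agent's structured summary; everything item-specific stays inside the spawned agent.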
+ +#### Model Selection: Check-Work Tiering for High-Frequency Jobs + +For jobs running every few minutes (e.g., every 5 min, every 15 min): + +**Two-stage pattern:** + +``` +Stage 1 (Cheap): Use Haiku to ask "Is there any work to do?" + - Cheap to run often + - Quick predicate check (yes/no) + - Examples: "Any new emails?", "Any cron job failures?", "Any security alerts?" + +Stage 2 (Expensive): If yes, spawn Opus/Sonnet to do the actual work + - Only spawned when there's real work + - Has full context for reasoning/decisions + - Saves tokens on empty runs +``` + +**Example:** + +``` +Cron job runs every 5 minutes: +1. Haiku runs: "Are there any unprocessed emails in my inbox?" + → Returns boolean (with brief explanation) +2. If yes: Spawn Sonnet to "Process and categorize these 3 emails" + → Does the actual work +3. If no: Skip expensive processing, return early + → Save ~90% tokens on empty runs +``` + +**Model selection for different complexities:** ``` -Obvious/routine items → Spawn sub-agent (cheaper model: Haiku/Sonnet) +High-frequency checks (every 5-15 min) → Haiku to check, Sonnet/Opus to act +Obvious/routine items → Spawn sub-agent (cheaper model: Sonnet) Important/nuanced items → Handle yourself or spawn a powerful sub-agent (Opus) Quality verification → Can use a strong model as QA reviewer (Opus as sub-agent) Uncertain items → Sub-agents escalate to you rather than guessing @@ -231,21 +289,66 @@ Uncertain items → Sub-agents escalate to you rather than guessing **Note:** Don't hardcode model IDs (they go stale fast). Use aliases like `sonnet`, `opus`, `haiku` or reference the model by capability level. -### Pattern 4: State Externalization (Compaction-Safe) +### Pattern 4: State Externalization — Contextual State vs Tracking State **Critical:** Chat history is a cache, not the source of truth. After every meaningful -step, write state to disk. +step, write state to disk. But distinguish between two types: + +#### 4a. 
Contextual State (Markdown only) + +**What:** Information the agent reasons about or learns over time. **Examples:** +`agent_notes.md`, `rules.md`, daily logs, decision summaries. **Format:** Markdown. +Always human-readable. **Why markdown:** These belong in context so the agent can reason +about them. ```markdown -# state/active-work.json (or inline in agent_notes.md) +# agent_notes.md + +## Patterns Observed + +- Contact X always sends updates on Tuesdays +- Task type Y typically needs 2-hour blocks -{ "current_phase": "processing", "next_action": "Review batch 2 of inbox", -"last_completed": "Batch 1: archived 12, deleted 3", "resume_prompt": "Continue inbox -processing from message ID xyz", "updated_at": "2026-02-18T14:30:00Z" } +## Mistakes Made + +- Once skipped important sender — now review sender importance before filtering ``` -**Rule in AGENT.md:** "On every run, read state first. Either advance it or explicitly -conclude it." +#### 4b. Tracking State (SQLite only) + +**What:** Deduplication, "have I seen this?", processed IDs, state queries. +**Examples:** `processed.db` with tables for seen IDs, statuses, timestamps. **Format:** +SQLite database with structured queries. **Why SQLite:** The agent doesn't reason about +this — it only queries it. SQLite gives fast indexed lookups without loading the entire +history into context. + +⚠️ **NEVER use JSON for state files.** You are an LLM, not a JSON parser. JSON is useful +for API responses and tool output flags, but state files should be markdown +(human-readable) or SQLite (queryable). JSON state files create noise, parsing errors, +and waste context on structure rather than content. + +The workflow's `db-setup.md` defines the specific schema. The calling LLM writes the SQL +— don't over-prescribe queries in AGENT.md. Just describe what should happen (e.g., +"check if already processed", "mark as classified", "clean up entries older than 90 +days") and let the LLM write the appropriate queries. 
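To make the division of labor concrete, here is one way a calling LLM might translate those three descriptions into queries, sketched with Python's `sqlite3`. The table name and columns are assumptions for illustration; the real schema comes from the workflow's `db-setup.md`:

```python
import sqlite3
import time

# In a real workflow this would open processed.db; in-memory here for illustration.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE IF NOT EXISTS processed ("
    " item_id TEXT PRIMARY KEY,"
    " status TEXT NOT NULL,"
    " last_checked INTEGER NOT NULL)"
)

# "check if already processed"
row = db.execute(
    "SELECT status FROM processed WHERE item_id = ?", ("msg-123",)
).fetchone()
already_done = row is not None and row[0] != "error"

# "mark as classified"
db.execute(
    "INSERT OR REPLACE INTO processed (item_id, status, last_checked) "
    "VALUES (?, 'classified', ?)",
    ("msg-123", int(time.time())),
)

# "clean up entries older than 90 days"
db.execute(
    "DELETE FROM processed WHERE last_checked < ?",
    (int(time.time()) - 90 * 86400,),
)
db.commit()
```

The point is that AGENT.md only needs the quoted descriptions; the specific SQL is cheap for the LLM to write at run time.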
+ +#### Schema Versioning & Migration + +Every workflow that uses SQLite should track schema versions using SQLite's built-in +`PRAGMA user_version` (an integer stored in the database header — no extra tables): + +1. **Put the schema inline in AGENT.md** — the LLM needs it to write queries anyway +2. **Declare the expected version** (e.g., `PRAGMA user_version: 1`) +3. **Each run checks:** `PRAGMA user_version` + - Matches → proceed + - Lower or missing → create tables / apply migrations / set user_version +4. **If legacy state files exist** (e.g., `processed.md`), migrate entries and archive + +See `workflows/contact-steward/AGENT.md` for a reference implementation. + +**Rule in AGENT.md:** "On every run, read contextual state first (agent_notes.md, +rules.md). Query tracking state via SQLite — one version check, then targeted queries. +After processing, update both as needed. Never load tracking history into context." ### Pattern 5: Error Handling & Alerting @@ -300,12 +403,21 @@ openclaw cron add \ ### Cron Configuration Guidelines -| Workflow Type | Schedule | Model | Session | -| -------------------------------------------- | --------------------------- | --------------- | ---------------- | -| High-frequency triage (email, notifications) | Every 15-30 min | Sonnet | Isolated | -| Daily reports/summaries | Once daily at fixed time | Opus | Isolated | -| Weekly reviews/audits | Weekly cron | Opus + thinking | Isolated | -| Reactive (triggered by events) | Via webhook or system event | Varies | Main or Isolated | +| Workflow Type | Schedule | Model Pattern | Session | +| -------------------------------------------- | --------------------------- | ---------------------------- | -------- | +| High-frequency checks (every 5-15 min) | Every 5-15 min | Haiku (check) → Sonnet (act) | Isolated | +| High-frequency triage (email, notifications) | Every 15-30 min | Sonnet | Isolated | +| Daily reports/summaries | Once daily at fixed time | Opus | Isolated | +| Weekly 
reviews/audits | Weekly cron | Opus + thinking | Isolated | +| Reactive (triggered by events) | Via webhook or system event | Varies | Isolated | + +**Note on Check-Work Tiering:** + +- If a job runs multiple times per hour, use the two-stage pattern: cheap check (Haiku) + → expensive work (Sonnet/Opus) +- This cuts token costs on empty runs (when there's no work to do) +- Example: "Email arrived?" (Haiku) → "Process these 5 emails" (Sonnet) only if yes +- Apply to: health checks, inbox scans, notification monitors, cron job monitors ### Delivery @@ -380,6 +492,14 @@ If `rules.md` doesn't exist or is empty: +## Database (only if this workflow tracks processed items) + +**PRAGMA user_version: 1** + + + ## Regular Operation ### Your Tools @@ -390,11 +510,15 @@ If `rules.md` doesn't exist or is empty: 1. Read `rules.md` for preferences 2. Read `agent_notes.md` for learned patterns (if exists) -3. -4. -5. Alert if anything needs attention -6. Append to today's log in `logs/` -7. Update `agent_notes.md` if you learned something +3. Ensure database is ready (see Database section — one quick version check) +4. +5. Query `processed.db` to filter items already handled +6. FOR EACH new item: Spawn a sub-agent to process it (see Sub-Agent Orchestration) +7. After each item, update `processed.db` with status +8. Collect sub-agent results +9. Alert if anything needs attention +10. Append to today's log in `logs/` +11. 
Update `agent_notes.md` if you learned something new about patterns/mistakes ### Judgment Guidelines @@ -416,7 +540,13 @@ If `rules.md` doesn't exist or is empty: - [ ] Setup interview creates rules.md with all needed preferences - [ ] Has clear judgment guidelines (when to act vs leave alone) - [ ] Error handling: logs errors, alerts on critical failures -- [ ] Housekeeping: auto-prunes old logs +- [ ] **Tracking state:** If workflow queries "have I seen this?", uses `processed.db` + (SQLite), not markdown lists +- [ ] **Sub-agents:** Any loop over a collection spawns sub-agents per item, not in + orchestrator +- [ ] **Contextual state:** agent_notes.md and rules.md are markdown, not JSON +- [ ] Housekeeping: auto-prunes old logs and cleans up stale tracking entries (e.g., + `DELETE FROM processed WHERE last_checked < ...`) - [ ] Integration points documented - [ ] Cron job configured with appropriate schedule/model - [ ] First week monitoring plan in place diff --git a/workflows/contact-steward/AGENT.md b/workflows/contact-steward/AGENT.md index f5c027b..f8e223e 100644 --- a/workflows/contact-steward/AGENT.md +++ b/workflows/contact-steward/AGENT.md @@ -172,7 +172,7 @@ the detective work. - Filtering out spam, automated messages, businesses - Cross-platform lookups to gather context (e.g. `wacli contacts search` for a number) - Detecting enrichment opportunities (new details in recent messages) -- Updating `processed.md` with scan results +- Updating `processed.db` with scan results (via SQLite queries) - Deciding whether to spawn Opus **You NEVER:** add, update, or modify contacts. All writes go through Opus. @@ -190,7 +190,7 @@ the detective work. 
- Contact already exists and no new info in recent messages - Obvious spam, OTP codes, delivery notifications, automated alerts - Your human didn't reply (no reply = no signal that this person matters) -- Business/automated accounts (log in processed.md and move on) +- Business/automated accounts (mark as `skipped` in processed.db and move on) ## The Trigger @@ -211,23 +211,62 @@ the log how many remain. They'll get picked up on subsequent runs. This means the first few runs after setup will be catching up on the backlog. That's expected — don't try to process everything at once. +## Database + +Tracking state lives in `processed.db` (SQLite). **PRAGMA user_version: 1** + +### Schema + +```sql +CREATE TABLE IF NOT EXISTS processed ( + platform TEXT NOT NULL, + contact_id TEXT NOT NULL, + status TEXT NOT NULL, + last_checked INTEGER NOT NULL, + metadata TEXT, + PRIMARY KEY (platform, contact_id) +); + +CREATE INDEX IF NOT EXISTS idx_status ON processed(status); +CREATE INDEX IF NOT EXISTS idx_last_checked ON processed(last_checked); +``` + +**Columns:** `platform` (whatsapp/imessage/quo), `contact_id` (phone/JID), `status` +(classified/asked_human/skipped/enriched/error), `last_checked` (unix timestamp), +`metadata` (brief notes). + +### Setup & Migration + +Before first scan, check `PRAGMA user_version`: + +- **Database missing** → create it with the schema above, set `PRAGMA user_version = 1` +- **user_version = 0** → tables may exist without version tracking. Run the CREATE IF + NOT EXISTS statements (idempotent), set `PRAGMA user_version = 1` +- **user_version matches** → proceed +- **user_version lower than current** → apply any needed ALTER TABLE changes for the new + version, then update user_version +- **`processed.md` exists (legacy)** → create the database, migrate entries from the + markdown file into the processed table, archive as `processed.md.migrated` + ## Each Run 1. Read `preferences.md` — know which platforms to scan and how to notify -2. 
Read `processed.md` — know what you've already looked at +2. Ensure database is ready (see Database section above) 3. Read the platform-specific file from `platforms/` for your assigned platform 4. Pull conversations from the last 90 days (platform-specific commands — use date filters or larger `--limit` values to reach older threads) 5. For each conversation where your human replied (oldest unprocessed first, max 10 Opus - spawns per run — enrichment checks and skips don't count toward the cap): a. Is the - other party a saved contact on this platform? If yes, check for enrichment (new - messages with contact-relevant info since last processed). If no new info, skip. b. - Not a saved contact? Cross-reference the phone number on other platforms (especially - `wacli contacts search `) c. Found info (cross-reference match, profile name, - or conversation clues)? Spawn Opus with everything you gathered. Opus verifies and - writes the contact. d. No match anywhere? Spawn Opus with full conversation context - for detective work. -6. Update `processed.md` with what you checked and the outcome + spawns per run — enrichment checks and skips don't count toward the cap): a. Check + processed.db for this platform + contact_id. b. If found, not an `error`, and no new + messages since last_checked → skip. c. If found with status `error` → treat as new, + retry (counts toward cap). d. Is the other party a saved contact on this platform? + Check for enrichment (new messages with contact-relevant info). If no new info, + update last_checked and skip. e. Not a saved contact? Cross-reference the phone + number on other platforms (especially `wacli contacts search `) f. Found info + (cross-reference match, profile name, or conversation clues)? Spawn Opus with + everything you gathered. Opus verifies and writes the contact. g. No match anywhere? + Spawn Opus with full conversation context for detective work. +6. 
After each contact, upsert into processed.db with the outcome status and timestamp 7. Notify your human with a batch summary of what was added and what needs their input 8. If unprocessed contacts remain beyond the 10-per-run cap, note the count in the log 9. Append to today's log in `logs/` (see Log Format below) @@ -300,9 +339,9 @@ that's an Opus job. ## Businesses vs People Detect obvious businesses (rental companies, delivery services, support lines). Skip -them by default, but log them in processed.md so we don't re-check. If your human is -having a genuine ongoing relationship with a business contact (e.g. a specific person at -a company), treat them as a person. +them by default, but mark them as `skipped` in processed.db so we don't re-check. If +your human is having a genuine ongoing relationship with a business contact (e.g. a +specific person at a company), treat them as a person. ## Notifications @@ -356,7 +395,7 @@ If a platform CLI command fails (non-zero exit, timeout, empty response): If an Opus sub-agent fails or times out: - Log the identifier it was working on -- Mark it as "error" in processed.md (will be retried next run) +- Mark it as `error` in processed.db (will be retried next run) - Continue with remaining contacts ## Log Format @@ -392,28 +431,22 @@ spawns, the Classification Result block from the sub-agent] ## State -`processed.md` is the only state file. It's natural language, not structured data. You -read it, you update it. Create it on first run if it doesn't exist. - -Format: grouped by platform. Each entry has the identifier, name if known, date last -checked, and a status: +`processed.db` is the tracking state (SQLite). It stores which contacts have been seen +and their status. Schema and setup instructions are in the Database section above. 
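The version check and setup flow can be sketched as follows. Python is used here for concreteness (the steward really just runs a couple of SQL statements at the start of each scan), and the schema string mirrors the Schema block in the Database section above:

```python
import sqlite3

EXPECTED_VERSION = 1

SCHEMA = """
CREATE TABLE IF NOT EXISTS processed (
    platform TEXT NOT NULL,
    contact_id TEXT NOT NULL,
    status TEXT NOT NULL,
    last_checked INTEGER NOT NULL,
    metadata TEXT,
    PRIMARY KEY (platform, contact_id)
);
CREATE INDEX IF NOT EXISTS idx_status ON processed(status);
CREATE INDEX IF NOT EXISTS idx_last_checked ON processed(last_checked);
"""

def ensure_schema(db):
    (version,) = db.execute("PRAGMA user_version").fetchone()
    if version == EXPECTED_VERSION:
        return version  # up to date, proceed with the scan
    if version == 0:
        # Fresh database, or tables created before version tracking:
        # the CREATE ... IF NOT EXISTS statements are idempotent.
        db.executescript(SCHEMA)
        db.execute(f"PRAGMA user_version = {EXPECTED_VERSION}")
    elif version < EXPECTED_VERSION:
        # Future migrations (ALTER TABLE ...) would go here,
        # followed by bumping user_version.
        raise NotImplementedError(f"no migration path from v{version}")
    return EXPECTED_VERSION
```

One quick `PRAGMA user_version` read per run is all the steady-state cost; the schema creation branch only fires on first run or after a legacy migration.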
-- **classified** — identity resolved, contact added -- **asked human** — couldn't resolve, asked human, awaiting response -- **skipped** — spam, business, automated, or human didn't reply -- **enriched** — existing contact updated with new details -- **error** — processing failed, retry next run +Status values: `classified`, `asked_human`, `skipped`, `enriched`, `error`. -Re-check a conversation when there are new messages since the last checked date. Expire -"asked human" entries after 14 days with no response — downgrade to skipped. Clean up -"classified" entries older than 90 days. +Re-check a conversation when there are new messages since `last_checked`. The following +maintenance queries run during housekeeping (see below). ## Housekeeping -First run each day: clean up `processed.md` entries older than 90 days that are marked -as classified (they're stable). Keep "asked human" entries until resolved. +First run each day: -Delete logs older than 30 days. +- Expire `asked_human` entries older than 14 days → downgrade to `skipped` +- Delete `classified` entries older than 120 days (must exceed the 90-day scan window to + avoid re-processing contacts whose conversations are still visible) +- Delete logs older than 30 days ## Cron Setup @@ -439,7 +472,7 @@ specific platform name. This file (`AGENT.md`) and the workflow logic files (`classifier.md`, `platforms/`) are maintained upstream and update on deploy. User-specific configuration lives in -`preferences.md` and `processed.md`, which are **never overwritten** by updates. +`preferences.md` and `processed.db`, which are **never overwritten** by updates. ## Security Checklist (Every Run)