From cae6baf8b1937bd3ea2e23ceaa4f9984c3efd5ef Mon Sep 17 00:00:00 2001
From: Nick Sullivan
Date: Sun, 29 Mar 2026 12:57:11 -0500
Subject: [PATCH 1/5] Workflow builder v0.2.0: SQLite tracking, sub-agent loops, check-work tiering
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three new best practices for workflow design:
- Pattern 3: Never loop over collections in orchestrator — spawn sub-agents per item
- Pattern 3: Check-work tiering — cheap model checks if work exists, expensive model acts
- Pattern 4: Split contextual state (markdown) vs tracking state (SQLite), ban JSON
- Pattern 4: Schema versioning mechanism with per-workflow db-setup.md

Contact steward migrated from processed.md to processed.db with schema versioning,
legacy migration path, and automatic initialization for new installs.

Co-Authored-By: Claude Opus 4.6
---
 skills/workflow-builder/SKILL.md      | 195 ++++++++++++++++++++++----
 workflows/contact-steward/AGENT.md    |  71 ++++++----
 workflows/contact-steward/db-setup.md | 126 +++++++++++++++++
 3 files changed, 337 insertions(+), 55 deletions(-)
 create mode 100644 workflows/contact-steward/db-setup.md

diff --git a/skills/workflow-builder/SKILL.md b/skills/workflow-builder/SKILL.md
index 4ca935e..7934021 100644
--- a/skills/workflow-builder/SKILL.md
+++ b/skills/workflow-builder/SKILL.md
@@ -1,12 +1,12 @@
 ---
 name: workflow-builder
-version: 0.1.0
+version: 0.2.0
 description: Design, build, and maintain autonomous OpenClaw workflows (stewards).
   Use when creating new workflow agents, improving existing ones, evaluating automation
   opportunities, or debugging workflow reliability. Triggers on "build a workflow",
   "create a steward", "automate this process", "workflow audit", "what should I
-  automate".
+  automate", "create a cron job", "schedule a recurring task", "build a scheduled job".
 metadata:
   openclaw:
     emoji: "🏗️"
@@ -219,10 +219,68 @@ Write confidence thresholds to `rules.md` so the user can tune them.
 ### Pattern 3: Sub-Agent Orchestration

-Match intelligence to task complexity:
+Match intelligence to task complexity, and **always use sub-agents for loops.**
+
+#### Rule: Never Loop Over Collections in the Orchestrator
+
+**Any time you iterate over a list (contacts, emails, tasks, records), spawn a sub-agent
+per item.** This preserves the parent context for coordination and prevents pollution.
+
+**Pattern:**
+
+```
+Orchestrator (parent):
+1. Fetch the list (from API, file, database)
+2. Query tracking state to filter already-processed items
+3. FOR EACH new item: Spawn a sub-agent with that item's details
+4. Sub-agent processes one item, returns structured result
+5. Parent collects results, updates tracking state, alerts if needed
+
+Sub-agent:
+- Receives: One item + context needed for that item
+- Does: All the reasoning, decision-making, work
+- Returns: Structured summary (status, action taken, errors, alerts)
+- Never accesses parent's full context
+```
+
+**Why:** Each sub-agent gets a fresh context window. Parent stays clean for
+orchestration logic. No pollution from per-item reasoning.
+
+#### Model Selection: Check-Work Tiering for High-Frequency Jobs
+
+For jobs running every few minutes (e.g., every 5 min, every 15 min):
+**Two-stage pattern:**
+
+```
+Stage 1 (Cheap): Use Haiku to ask "Is there any work to do?"
+  - Cheap to run often
+  - Quick predicate check (yes/no)
+  - Examples: "Any new emails?", "Any cron job failures?", "Any security alerts?"
+
+Stage 2 (Expensive): If yes, spawn Opus/Sonnet to do the actual work
+  - Only spawned when there's real work
+  - Has full context for reasoning/decisions
+  - Saves tokens on empty runs
+```
+
+**Example:**
+
+```
+Cron job runs every 5 minutes:
+1. Haiku runs: "Are there any unprocessed emails in my inbox?"
+   → Returns boolean (with brief explanation)
+2. If yes: Spawn Sonnet to "Process and categorize these 3 emails"
+   → Does the actual work
+3. If no: Skip expensive processing, return early
+   → Save ~90% tokens on empty runs
 ```
-Obvious/routine items → Spawn sub-agent (cheaper model: Haiku/Sonnet)
+
+**Model selection for different complexities:**
+
+```
+High-frequency checks (every 5-15 min) → Haiku to check, Sonnet/Opus to act
+Obvious/routine items → Spawn sub-agent (cheaper model: Sonnet)
 Important/nuanced items → Handle yourself or spawn a powerful sub-agent (Opus)
 Quality verification → Can use a strong model as QA reviewer (Opus as sub-agent)
 Uncertain items → Sub-agents escalate to you rather than guessing
@@ -231,21 +289,84 @@ Uncertain items → Sub-agents escalate to you rather than guessing

 **Note:** Don't hardcode model IDs (they go stale fast). Use aliases like `sonnet`,
 `opus`, `haiku` or reference the model by capability level.

-### Pattern 4: State Externalization (Compaction-Safe)
+### Pattern 4: State Externalization — Contextual State vs Tracking State

 **Critical:** Chat history is a cache, not the source of truth. After every meaningful
-step, write state to disk.
+step, write state to disk. But distinguish between two types:
+
+#### 4a. Contextual State (Markdown only)
+
+**What:** Information the agent reasons about or learns over time. **Examples:**
+`agent_notes.md`, `rules.md`, daily logs, decision summaries. **Format:** Markdown.
+Always human-readable. **Why markdown:** These belong in context so the agent can reason
+about them.

 ```markdown
-# state/active-work.json (or inline in agent_notes.md)
+# agent_notes.md

-{ "current_phase": "processing", "next_action": "Review batch 2 of inbox",
-"last_completed": "Batch 1: archived 12, deleted 3", "resume_prompt": "Continue inbox
-processing from message ID xyz", "updated_at": "2026-02-18T14:30:00Z" }
+## Patterns Observed
+
+- Contact X always sends updates on Tuesdays
+- Task type Y typically needs 2-hour blocks
+
+## Mistakes Made
+
+- Once skipped important sender — now review sender importance before filtering
 ```

-**Rule in AGENT.md:** "On every run, read state first. Either advance it or explicitly
-conclude it."
+#### 4b. Tracking State (SQLite only)
+
+**What:** Deduplication, "have I seen this?", processed IDs, state queries.
+**Examples:** `processed.db` with tables for seen IDs, statuses, timestamps. **Format:**
+SQLite database with structured queries. **Why SQLite:** The agent doesn't reason about
+this — it only queries it. SQLite gives fast indexed lookups without loading the entire
+history into context.
+
+⚠️ **NEVER use JSON for state files.** You are an LLM, not a JSON parser. JSON is useful
+for API responses and tool output flags, but state files should be markdown
+(human-readable) or SQLite (queryable). JSON state files create noise, parsing errors,
+and waste context on structure rather than content.
+
+The workflow's `db-setup.md` defines the specific schema. The calling LLM writes the SQL
+— don't over-prescribe queries in AGENT.md. Just describe what should happen (e.g.,
+"check if already processed", "mark as classified", "clean up entries older than 90
+days") and let the LLM write the appropriate queries.
+
+#### Schema Versioning & Migration
+
+Every workflow that uses SQLite must track schema versions so upgrades happen
+automatically:
+
+1. **Store version in the database** via a `schema_meta` table
+2. **Declare the expected version in AGENT.md** (e.g., `Schema version: 1`)
+3. **Each run checks with one query:** `SELECT version FROM schema_meta LIMIT 1`
+   - Matches → proceed (99% of runs, no extra reads)
+   - Lower → read `db-setup.md` for migration steps
+   - Missing → run inline initialization SQL
+4. **Keep initialization SQL inline in AGENT.md** (idempotent `CREATE IF NOT EXISTS`)
+5. **Keep migration steps in a separate `db-setup.md`** — only read on version mismatch
+   or legacy conversion
+
+**Per-workflow `db-setup.md`** contains:
+
+- Target schema with column reference
+- Schema version history table
+- Legacy migration instructions (e.g., `processed.md` → `processed.db`)
+- Versioned migration blocks (e.g., "Version 1 → 2: ALTER TABLE ADD COLUMN ...")
+- Common queries for reference
+
+This pattern handles all scenarios automatically:
+
+- **New server:** No database → initialization SQL creates it
+- **Legacy server:** `processed.md` exists → db-setup.md migration
+- **Schema upgrade pushed:** Version mismatch detected → db-setup.md migration
+- **Normal run:** Version matches → zero overhead
+
+See `workflows/contact-steward/db-setup.md` for a reference implementation.
+
+**Rule in AGENT.md:** "On every run, read contextual state first (agent_notes.md,
+rules.md). Query tracking state via SQLite — one version check, then targeted queries.
+After processing, update both as needed. Never load tracking history into context."
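Pattern 3's orchestrator loop combined with Pattern 4b's tracking state can be sketched in Python with the standard-library `sqlite3` module. This is a minimal illustration under stated assumptions, not the workflow's implementation: the `processed` table follows the contact-steward schema in this series, `needs_work` and `orchestrate` are hypothetical helpers, and `spawn_subagent` is a hypothetical callable standing in for however OpenClaw actually spawns a sub-agent. Retrying rows marked `error` mirrors the contact steward's retry rule.

```python
from __future__ import annotations

import sqlite3
from typing import Callable, Iterable


def needs_work(conn: sqlite3.Connection, platform: str, contact_id: str) -> bool:
    """One targeted lookup per item: unseen rows and `error` rows need processing."""
    row = conn.execute(
        "SELECT status FROM processed WHERE platform = ? AND contact_id = ?",
        (platform, contact_id),
    ).fetchone()
    return row is None or row[0] == "error"


def orchestrate(
    conn: sqlite3.Connection,
    platform: str,
    contact_ids: Iterable[str],
    spawn_subagent: Callable[[str], dict],  # hypothetical: one fresh context per item
    now: int,
) -> list[dict]:
    """Filter via SQLite, hand each item to its own worker, record each outcome."""
    results = []
    for contact_id in contact_ids:
        if not needs_work(conn, platform, contact_id):
            continue  # already handled; its history never enters the orchestrator
        result = spawn_subagent(contact_id)  # structured summary, e.g. {"status": ...}
        conn.execute(
            "INSERT OR REPLACE INTO processed "
            "(platform, contact_id, status, last_checked, metadata) VALUES (?, ?, ?, ?, ?)",
            (platform, contact_id, result["status"], now, result.get("note")),
        )
        conn.commit()  # persist after each item so an interrupted run loses at most one
        results.append({"contact_id": contact_id, **result})
    return results


def fake_subagent(contact_id: str) -> dict:
    """Stand-in worker for the demo; a real one would do the per-item reasoning."""
    return {"status": "skipped", "note": f"demo result for {contact_id}"}


# Demo against an in-memory database shaped like the contact-steward schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed (platform TEXT NOT NULL, contact_id TEXT NOT NULL, "
    "status TEXT NOT NULL, last_checked INTEGER NOT NULL, metadata TEXT, "
    "PRIMARY KEY (platform, contact_id))"
)
conn.execute("INSERT INTO processed VALUES ('whatsapp', 'alice', 'classified', 0, NULL)")
conn.execute("INSERT INTO processed VALUES ('whatsapp', 'carol', 'error', 0, NULL)")
demo = orchestrate(conn, "whatsapp", ["alice", "bob", "carol"], fake_subagent, now=1_700_000_000)
print([r["contact_id"] for r in demo])  # prints ['bob', 'carol']
```

Each item costs one primary-key lookup, so the orchestrator never reads the full tracking history; committing after every item trades a little write volume for losing at most one result if a run is interrupted.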

 ### Pattern 5: Error Handling & Alerting

@@ -300,12 +421,21 @@ openclaw cron add \

 ### Cron Configuration Guidelines

-| Workflow Type                                | Schedule                    | Model           | Session          |
-| -------------------------------------------- | --------------------------- | --------------- | ---------------- |
-| High-frequency triage (email, notifications) | Every 15-30 min             | Sonnet          | Isolated         |
-| Daily reports/summaries                      | Once daily at fixed time    | Opus            | Isolated         |
-| Weekly reviews/audits                        | Weekly cron                 | Opus + thinking | Isolated         |
-| Reactive (triggered by events)               | Via webhook or system event | Varies          | Main or Isolated |
+| Workflow Type                                | Schedule                    | Model Pattern                | Session  |
+| -------------------------------------------- | --------------------------- | ---------------------------- | -------- |
+| High-frequency checks (every 5-15 min)       | Every 5-15 min              | Haiku (check) → Sonnet (act) | Isolated |
+| High-frequency triage (email, notifications) | Every 15-30 min             | Sonnet                       | Isolated |
+| Daily reports/summaries                      | Once daily at fixed time    | Opus                         | Isolated |
+| Weekly reviews/audits                        | Weekly cron                 | Opus + thinking              | Isolated |
+| Reactive (triggered by events)               | Via webhook or system event | Varies                       | Isolated |
+
+**Note on Check-Work Tiering:**
+
+- If a job runs multiple times per hour, use the two-stage pattern: cheap check (Haiku)
+  → expensive work (Sonnet/Opus)
+- This cuts token costs on empty runs (when there's no work to do)
+- Example: "Email arrived?" (Haiku) → "Process these 5 emails" (Sonnet) only if yes
+- Apply to: health checks, inbox scans, notification monitors, cron job monitors

 ### Delivery

@@ -380,6 +510,13 @@ If `rules.md` doesn't exist or is empty:
+## Database (only if this workflow tracks processed items)
+
+**Schema version: 1** — See `db-setup.md` for full schema.
+
+Before processing, verify schema_meta.version matches the version above. If missing,
+mismatched, or legacy state files exist → read `db-setup.md`.
+
 ## Regular Operation

 ### Your Tools
@@ -390,11 +527,15 @@ If `rules.md` doesn't exist or is empty:

 1. Read `rules.md` for preferences
 2. Read `agent_notes.md` for learned patterns (if exists)
-3.
-4.
-5. Alert if anything needs attention
-6. Append to today's log in `logs/`
-7. Update `agent_notes.md` if you learned something
+3. Ensure database is ready (see Database section — one quick version check)
+4.
+5. Query `processed.db` to filter items already handled
+6. FOR EACH new item: Spawn a sub-agent to process it (see Sub-Agent Orchestration)
+7. After each item, update `processed.db` with status
+8. Collect sub-agent results
+9. Alert if anything needs attention
+10. Append to today's log in `logs/`
+11. Update `agent_notes.md` if you learned something new about patterns/mistakes

 ### Judgment Guidelines

@@ -416,7 +557,13 @@ If `rules.md` doesn't exist or is empty:
 - [ ] Setup interview creates rules.md with all needed preferences
 - [ ] Has clear judgment guidelines (when to act vs leave alone)
 - [ ] Error handling: logs errors, alerts on critical failures
-- [ ] Housekeeping: auto-prunes old logs
+- [ ] **Tracking state:** If workflow queries "have I seen this?", uses `processed.db`
+  (SQLite), not markdown lists
+- [ ] **Sub-agents:** Any loop over a collection spawns sub-agents per item, not in
+  orchestrator
+- [ ] **Contextual state:** agent_notes.md and rules.md are markdown, not JSON
+- [ ] Housekeeping: auto-prunes old logs and cleans up stale tracking entries (e.g.,
+  `DELETE FROM processed WHERE last_checked < ...`)
 - [ ] Integration points documented
 - [ ] Cron job configured with appropriate schedule/model
 - [ ] First week monitoring plan in place
diff --git a/workflows/contact-steward/AGENT.md b/workflows/contact-steward/AGENT.md
index f5c027b..297cbcf 100644
--- a/workflows/contact-steward/AGENT.md
+++ b/workflows/contact-steward/AGENT.md
@@ -172,7 +172,7 @@ the detective work.
 - Filtering out spam, automated messages, businesses
 - Cross-platform lookups to gather context (e.g. `wacli contacts search` for a number)
 - Detecting enrichment opportunities (new details in recent messages)
-- Updating `processed.md` with scan results
+- Updating `processed.db` with scan results (via SQLite queries)
 - Deciding whether to spawn Opus

 **You NEVER:** add, update, or modify contacts. All writes go through Opus.
@@ -190,7 +190,7 @@ the detective work.
 - Contact already exists and no new info in recent messages
 - Obvious spam, OTP codes, delivery notifications, automated alerts
 - Your human didn't reply (no reply = no signal that this person matters)
-- Business/automated accounts (log in processed.md and move on)
+- Business/automated accounts (mark as `skipped` in processed.db and move on)

 ## The Trigger

@@ -211,23 +211,38 @@ the log how many remain. They'll get picked up on subsequent runs.
 This means the first few runs after setup will be catching up on the backlog. That's
 expected — don't try to process everything at once.

+## Database
+
+**Schema version: 1** — See `db-setup.md` for the full schema definition.
+
+Tracking state lives in `processed.db` (SQLite). Before first scan, check:
+
+- If `processed.db` doesn't exist or `schema_meta` table is missing → create the
+  database using the schema in `db-setup.md`
+- If `processed.md` exists (legacy) → read `db-setup.md` for migration instructions
+- If `schema_meta.version` is lower than the version above → read `db-setup.md` for
+  upgrade steps
+- If version matches → proceed normally
+
 ## Each Run

 1. Read `preferences.md` — know which platforms to scan and how to notify
-2. Read `processed.md` — know what you've already looked at
+2. Ensure database is ready (see Database section above)
 3. Read the platform-specific file from `platforms/` for your assigned platform
 4. Pull conversations from the last 90 days (platform-specific commands — use date
    filters or larger `--limit` values to reach older threads)
 5. For each conversation where your human replied (oldest unprocessed first, max 10 Opus
-   spawns per run — enrichment checks and skips don't count toward the cap): a. Is the
-   other party a saved contact on this platform? If yes, check for enrichment (new
-   messages with contact-relevant info since last processed). If no new info, skip. b.
-   Not a saved contact? Cross-reference the phone number on other platforms (especially
-   `wacli contacts search `) c. Found info (cross-reference match, profile name,
+   spawns per run — enrichment checks and skips don't count toward the cap): a. Check
+   processed.db for this platform + contact_id. b. If found and no new messages since
+   last_checked → skip. c. If found with status `error` → retry (counts toward cap). d.
+   Not in database and saved contact on platform? Check for enrichment (new messages
+   with contact-relevant info). If no new info, skip. e. Not a saved contact?
+   Cross-reference the phone number on other platforms (especially
+   `wacli contacts search `) f. Found info (cross-reference match, profile name,
    or conversation clues)? Spawn Opus with everything you gathered. Opus verifies and
-   writes the contact. d. No match anywhere? Spawn Opus with full conversation context
+   writes the contact. g. No match anywhere? Spawn Opus with full conversation context
    for detective work.
-6. Update `processed.md` with what you checked and the outcome
+6. After each contact, upsert into processed.db with the outcome status and timestamp
 7. Notify your human with a batch summary of what was added and what needs their input
 8. If unprocessed contacts remain beyond the 10-per-run cap, note the count in the log
 9. Append to today's log in `logs/` (see Log Format below)
@@ -300,9 +315,9 @@ that's an Opus job.
 ## Businesses vs People

 Detect obvious businesses (rental companies, delivery services, support lines). Skip
-them by default, but log them in processed.md so we don't re-check. If your human is
-having a genuine ongoing relationship with a business contact (e.g. a specific person at
-a company), treat them as a person.
+them by default, but mark them as `skipped` in processed.db so we don't re-check. If
+your human is having a genuine ongoing relationship with a business contact (e.g. a
+specific person at a company), treat them as a person.

 ## Notifications

@@ -356,7 +371,7 @@ If a platform CLI command fails (non-zero exit, timeout, empty response):
 If an Opus sub-agent fails or times out:

 - Log the identifier it was working on
-- Mark it as "error" in processed.md (will be retried next run)
+- Mark it as `error` in processed.db (will be retried next run)
 - Continue with remaining contacts

 ## Log Format
@@ -392,28 +407,22 @@ spawns, the Classification Result block from the sub-agent]

 ## State

-`processed.md` is the only state file. It's natural language, not structured data. You
-read it, you update it. Create it on first run if it doesn't exist.
-
-Format: grouped by platform. Each entry has the identifier, name if known, date last
-checked, and a status:
+`processed.db` is the tracking state (SQLite). It stores which contacts have been seen
+and their status. The database schema and setup instructions are in the Database section
+above. For migration and upgrade details, see `db-setup.md`.

-- **classified** — identity resolved, contact added
-- **asked human** — couldn't resolve, asked human, awaiting response
-- **skipped** — spam, business, automated, or human didn't reply
-- **enriched** — existing contact updated with new details
-- **error** — processing failed, retry next run
+Status values: `classified`, `asked_human`, `skipped`, `enriched`, `error`.

-Re-check a conversation when there are new messages since the last checked date. Expire
-"asked human" entries after 14 days with no response — downgrade to skipped. Clean up
-"classified" entries older than 90 days.
+Re-check a conversation when there are new messages since `last_checked`. The following
+maintenance queries run during housekeeping (see below).

 ## Housekeeping

-First run each day: clean up `processed.md` entries older than 90 days that are marked
-as classified (they're stable). Keep "asked human" entries until resolved.
+First run each day:

-Delete logs older than 30 days.
+- Expire `asked_human` entries older than 14 days → downgrade to `skipped`
+- Delete `classified` entries older than 90 days (they're stable, no need to track)
+- Delete logs older than 30 days

 ## Cron Setup

@@ -439,7 +448,7 @@ specific platform name.

 This file (`AGENT.md`) and the workflow logic files (`classifier.md`, `platforms/`) are
 maintained upstream and update on deploy. User-specific configuration lives in
-`preferences.md` and `processed.md`, which are **never overwritten** by updates.
+`preferences.md` and `processed.db`, which are **never overwritten** by updates.

 ## Security Checklist (Every Run)

diff --git a/workflows/contact-steward/db-setup.md b/workflows/contact-steward/db-setup.md
new file mode 100644
index 0000000..3049a6f
--- /dev/null
+++ b/workflows/contact-steward/db-setup.md
@@ -0,0 +1,126 @@
+# Contact Steward — Database Setup & Migration
+
+Only read this file when AGENT.md directs you here — during first-time setup, legacy
+migration, or schema upgrade. Do not read on normal runs.
+
+## Prerequisites
+
+Verify sqlite3 is available:
+
+```bash
+which sqlite3
+```
+
+If not found:
+
+- **macOS:** Already installed at `/usr/bin/sqlite3`. If missing: `brew install sqlite`
+- **Ubuntu/Debian:** `sudo apt install sqlite3`
+
+## Schema Version History
+
+| Version | Changes                                                                                    | Date       |
+| ------- | ------------------------------------------------------------------------------------------ | ---------- |
+| 1       | Initial schema — processed table with platform, contact_id, status, last_checked, metadata | 2026-03-29 |
+
+## Target Schema (Current: Version 1)
+
+```sql
+CREATE TABLE IF NOT EXISTS schema_meta (version INTEGER NOT NULL);
+
+CREATE TABLE IF NOT EXISTS processed (
+  platform TEXT NOT NULL,
+  contact_id TEXT NOT NULL,
+  status TEXT NOT NULL,
+  last_checked INTEGER NOT NULL,
+  metadata TEXT,
+  PRIMARY KEY (platform, contact_id)
+);
+
+CREATE INDEX IF NOT EXISTS idx_status ON processed(status);
+CREATE INDEX IF NOT EXISTS idx_last_checked ON processed(last_checked);
+```
+
+### Column Reference
+
+| Column       | Type    | Description                                                           |
+| ------------ | ------- | --------------------------------------------------------------------- |
+| platform     | TEXT    | `whatsapp`, `imessage`, or `quo`                                      |
+| contact_id   | TEXT    | Phone number, JID, or platform-specific identifier                    |
+| status       | TEXT    | One of: `classified`, `asked_human`, `skipped`, `enriched`, `error`   |
+| last_checked | INTEGER | Unix timestamp of last processing                                     |
+| metadata     | TEXT    | Brief notes (e.g., "enriched from WhatsApp", "spam — pizza delivery") |
+
+### Status Values
+
+- **classified** — Identity resolved, contact added to platform
+- **asked_human** — Couldn't resolve, asked human, awaiting response
+- **skipped** — Spam, business, automated, or human didn't reply
+- **enriched** — Existing contact updated with new details
+- **error** — Processing failed, will retry next run
+
+## Scenario: New Installation (No Database)
+
+Run the initialization SQL from AGENT.md. It's inline there and fully idempotent. You
+don't need this file for new installations — AGENT.md has everything.
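A minimal sketch of the version check and idempotent initialization, using Python's standard-library `sqlite3` module. It follows the later revision in this series that stores the version in `PRAGMA user_version` instead of a `schema_meta` table; `SCHEMA_VERSION`, `user_version`, and `ensure_schema` are illustrative names, not part of the workflow. Because connecting to a path that does not exist creates an empty database whose `user_version` is 0, "database missing" and "created before version tracking" end up on the same path.

```python
import sqlite3

SCHEMA_VERSION = 1  # assumed to mirror "Schema version: 1" declared in AGENT.md

# Target schema (version 1) from db-setup.md; every statement is idempotent.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS processed (
    platform     TEXT NOT NULL,
    contact_id   TEXT NOT NULL,
    status       TEXT NOT NULL,
    last_checked INTEGER NOT NULL,
    metadata     TEXT,
    PRIMARY KEY (platform, contact_id)
);
CREATE INDEX IF NOT EXISTS idx_status ON processed(status);
CREATE INDEX IF NOT EXISTS idx_last_checked ON processed(last_checked);
"""


def user_version(conn: sqlite3.Connection) -> int:
    """Read the integer SQLite keeps in the database header (0 if never set)."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def ensure_schema(conn: sqlite3.Connection) -> int:
    """Bring the database to SCHEMA_VERSION and return the version it had before."""
    before = user_version(conn)
    if before == SCHEMA_VERSION:
        return before  # normal run: a single PRAGMA read, no DDL
    if before > SCHEMA_VERSION:
        raise RuntimeError(f"database is at v{before}, newer than v{SCHEMA_VERSION}")
    # Version 0 -> 1: create whatever is missing.
    conn.executescript(SCHEMA_SQL)
    # PRAGMA statements cannot take bound parameters, so the int is formatted in.
    conn.execute(f"PRAGMA user_version = {int(SCHEMA_VERSION)}")
    conn.commit()
    return before


conn = sqlite3.connect(":memory:")
first = ensure_schema(conn)   # fresh database reports 0 and gets initialized
second = ensure_schema(conn)  # already current: returns 1 without re-running DDL
print(first, second, user_version(conn))  # prints 0 1 1
```

A later schema version would add its own ordered upgrade steps (for example an `ALTER TABLE ... ADD COLUMN`) before bumping `user_version`; the sketch only covers version 0 to 1.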
+
+## Scenario: Legacy Migration (processed.md exists)
+
+If `processed.md` exists from a previous version, migrate its entries to SQLite.
+
+**Step 1:** Run the initialization SQL from AGENT.md to create the database.
+
+**Step 2:** Read `processed.md`. It's natural language grouped by platform. Each entry
+has an identifier, optional name, date, and status. For each entry, insert:
+
+```bash
+sqlite3 workflows/contact-steward/processed.db \
+  "INSERT OR IGNORE INTO processed (platform, contact_id, status, last_checked, metadata) \
+   VALUES ('', '', '', , '')"
+```
+
+Use `INSERT OR IGNORE` to skip duplicates safely. Map the natural language statuses to
+the standard values: classified, asked_human, skipped, enriched, error.
+
+**Step 3:** Verify by comparing counts:
+
+```bash
+sqlite3 workflows/contact-steward/processed.db \
+  "SELECT platform, COUNT(*) FROM processed GROUP BY platform"
+```
+
+**Step 4:** Archive the old file:
+
+```bash
+mv workflows/contact-steward/processed.md workflows/contact-steward/processed.md.migrated
+```
+
+Keep `.migrated` for a few weeks as a safety net, then delete it.
+
+## Scenario: Schema Upgrade (Version Mismatch)
+
+When AGENT.md's `schema_version` is higher than the database's version, apply migrations
+in order. Each migration block is idempotent — safe to re-run.
+
+### Migrating from Version 0 → 1 (No schema_meta table)
+
+If `SELECT version FROM schema_meta` errors (table doesn't exist), the database was
+created before version tracking. The processed table likely already exists with the
+correct columns. Run:
+
+```bash
+sqlite3 workflows/contact-steward/processed.db <<'SQL'
+CREATE TABLE IF NOT EXISTS schema_meta (version INTEGER NOT NULL);
+INSERT INTO schema_meta VALUES (1);
+CREATE INDEX IF NOT EXISTS idx_status ON processed(status);
+CREATE INDEX IF NOT EXISTS idx_last_checked ON processed(last_checked);
+SQL
+```
+
+

From 86a9b398830ddcf16ee1426ae6bca2968e6aa7e0 Mon Sep 17 00:00:00 2001
From: Nick Sullivan
Date: Sun, 29 Mar 2026 12:58:46 -0500
Subject: [PATCH 2/5] Fix three issues from code review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- schema_meta: Add PRIMARY KEY + CHECK constraint to prevent duplicate rows
- Cleanup window: 90 days → 120 days to avoid race with 90-day scan window
- SKILL.md: Pattern 4 no longer says inline SQL — schema lives in db-setup.md

Co-Authored-By: Claude Opus 4.6
---
 skills/workflow-builder/SKILL.md      | 11 ++++++-----
 workflows/contact-steward/AGENT.md    |  3 ++-
 workflows/contact-steward/db-setup.md | 12 +++++++++---
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/skills/workflow-builder/SKILL.md b/skills/workflow-builder/SKILL.md
index 7934021..05230b5 100644
--- a/skills/workflow-builder/SKILL.md
+++ b/skills/workflow-builder/SKILL.md
@@ -339,13 +339,14 @@ automatically:

 1. **Store version in the database** via a `schema_meta` table
 2. **Declare the expected version in AGENT.md** (e.g., `Schema version: 1`)
-3. **Each run checks with one query:** `SELECT version FROM schema_meta LIMIT 1`
+3. **Each run checks with one query:** `SELECT version FROM schema_meta`
    - Matches → proceed (99% of runs, no extra reads)
    - Lower → read `db-setup.md` for migration steps
-   - Missing → run inline initialization SQL
-4. **Keep initialization SQL inline in AGENT.md** (idempotent `CREATE IF NOT EXISTS`)
-5. **Keep migration steps in a separate `db-setup.md`** — only read on version mismatch
-   or legacy conversion
+   - Missing or error → read `db-setup.md` for initialization
+4. **Keep the schema definition in `db-setup.md`** — the calling LLM creates tables from
+   the schema, no need to inline SQL in AGENT.md
+5. **Keep migration steps in `db-setup.md`** — only read on version mismatch, missing
+   database, or legacy conversion

 **Per-workflow `db-setup.md`** contains:

diff --git a/workflows/contact-steward/AGENT.md b/workflows/contact-steward/AGENT.md
index 297cbcf..60b7f04 100644
--- a/workflows/contact-steward/AGENT.md
+++ b/workflows/contact-steward/AGENT.md
@@ -421,7 +421,8 @@ maintenance queries run during housekeeping (see below).
 First run each day:

 - Expire `asked_human` entries older than 14 days → downgrade to `skipped`
-- Delete `classified` entries older than 90 days (they're stable, no need to track)
+- Delete `classified` entries older than 120 days (must exceed the 90-day scan window to
+  avoid re-processing contacts whose conversations are still visible)
 - Delete logs older than 30 days

 ## Cron Setup
diff --git a/workflows/contact-steward/db-setup.md b/workflows/contact-steward/db-setup.md
index 3049a6f..5d6d38b 100644
--- a/workflows/contact-steward/db-setup.md
+++ b/workflows/contact-steward/db-setup.md
@@ -25,7 +25,10 @@ If not found:
 ## Target Schema (Current: Version 1)

 ```sql
-CREATE TABLE IF NOT EXISTS schema_meta (version INTEGER NOT NULL);
+CREATE TABLE IF NOT EXISTS schema_meta (
+  id INTEGER PRIMARY KEY CHECK(id = 1),
+  version INTEGER NOT NULL
+);

 CREATE TABLE IF NOT EXISTS processed (
   platform TEXT NOT NULL,
@@ -109,8 +112,11 @@ correct columns. Run:

 ```bash
 sqlite3 workflows/contact-steward/processed.db <<'SQL'
-CREATE TABLE IF NOT EXISTS schema_meta (version INTEGER NOT NULL);
-INSERT INTO schema_meta VALUES (1);
+CREATE TABLE IF NOT EXISTS schema_meta (
+  id INTEGER PRIMARY KEY CHECK(id = 1),
+  version INTEGER NOT NULL
+);
+INSERT OR REPLACE INTO schema_meta (id, version) VALUES (1, 1);
 CREATE INDEX IF NOT EXISTS idx_status ON processed(status);
 CREATE INDEX IF NOT EXISTS idx_last_checked ON processed(last_checked);
 SQL

From fda5a06a0ada177c57f6733e134b976b4d16f2eb Mon Sep 17 00:00:00 2001
From: Nick Sullivan
Date: Sun, 29 Mar 2026 13:00:25 -0500
Subject: [PATCH 3/5] Replace schema_meta table with PRAGMA user_version

SQLite has a built-in integer for version tracking in the database header.
No extra table, no constraints, no duplicate row risks.

Co-Authored-By: Claude Opus 4.6
---
 skills/workflow-builder/SKILL.md      | 11 ++++++-----
 workflows/contact-steward/AGENT.md    | 10 +++++-----
 workflows/contact-steward/db-setup.md | 32 ++++++++------------------
 3 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/skills/workflow-builder/SKILL.md b/skills/workflow-builder/SKILL.md
index 05230b5..8e21c09 100644
--- a/skills/workflow-builder/SKILL.md
+++ b/skills/workflow-builder/SKILL.md
@@ -337,12 +337,13 @@ days") and let the LLM write the appropriate queries.
 Every workflow that uses SQLite must track schema versions so upgrades happen
 automatically:

-1. **Store version in the database** via a `schema_meta` table
+1. **Use SQLite's built-in `PRAGMA user_version`** to track schema version (no extra
+   tables needed)
 2. **Declare the expected version in AGENT.md** (e.g., `Schema version: 1`)
-3. **Each run checks with one query:** `SELECT version FROM schema_meta`
+3. **Each run checks:** `PRAGMA user_version`
    - Matches → proceed (99% of runs, no extra reads)
    - Lower → read `db-setup.md` for migration steps
-   - Missing or error → read `db-setup.md` for initialization
+   - Database missing → read `db-setup.md` for initialization
 4. **Keep the schema definition in `db-setup.md`** — the calling LLM creates tables from
    the schema, no need to inline SQL in AGENT.md
 5. **Keep migration steps in `db-setup.md`** — only read on version mismatch, missing
@@ -515,8 +516,8 @@ If `rules.md` doesn't exist or is empty:

 **Schema version: 1** — See `db-setup.md` for full schema.

-Before processing, verify schema_meta.version matches the version above. If missing,
-mismatched, or legacy state files exist → read `db-setup.md`.
+Before processing, check `PRAGMA user_version`. If it doesn't match the version above,
+or the database is missing → read `db-setup.md`.

 ## Regular Operation

diff --git a/workflows/contact-steward/AGENT.md b/workflows/contact-steward/AGENT.md
index 60b7f04..5fcac8f 100644
--- a/workflows/contact-steward/AGENT.md
+++ b/workflows/contact-steward/AGENT.md
@@ -215,14 +215,14 @@ expected — don't try to process everything at once.

 **Schema version: 1** — See `db-setup.md` for the full schema definition.

-Tracking state lives in `processed.db` (SQLite). Before first scan, check:
+Tracking state lives in `processed.db` (SQLite). Before first scan, check
+`PRAGMA user_version` on the database:

-- If `processed.db` doesn't exist or `schema_meta` table is missing → create the
-  database using the schema in `db-setup.md`
+- If `processed.db` doesn't exist → read `db-setup.md` to create it
 - If `processed.md` exists (legacy) → read `db-setup.md` for migration instructions
-- If `schema_meta.version` is lower than the version above → read `db-setup.md` for
+- If `user_version` is lower than the schema version above → read `db-setup.md` for
   upgrade steps
-- If version matches → proceed normally
+- If `user_version` matches → proceed normally

 ## Each Run

diff --git a/workflows/contact-steward/db-setup.md b/workflows/contact-steward/db-setup.md
index 5d6d38b..aa68716 100644
--- a/workflows/contact-steward/db-setup.md
+++ b/workflows/contact-steward/db-setup.md
@@ -25,11 +25,6 @@ If not found:
 ## Target Schema (Current: Version 1)

 ```sql
-CREATE TABLE IF NOT EXISTS schema_meta (
-  id INTEGER PRIMARY KEY CHECK(id = 1),
-  version INTEGER NOT NULL
-);
-
 CREATE TABLE IF NOT EXISTS processed (
   platform TEXT NOT NULL,
   contact_id TEXT NOT NULL,
@@ -63,8 +58,11 @@ CREATE INDEX IF NOT EXISTS idx_last_checked ON processed(last_checked);

 ## Scenario: New Installation (No Database)

-Run the initialization SQL from AGENT.md. It's inline there and fully idempotent. You
-don't need this file for new installations — AGENT.md has everything.
+Create the database using the target schema above, then set the version:
+
+```sql
+PRAGMA user_version = 1;
+```

 ## Scenario: Legacy Migration (processed.md exists)

@@ -104,23 +102,11 @@ Keep `.migrated` for a few weeks as a safety net, then delete it.
 When AGENT.md's `schema_version` is higher than the database's version, apply migrations
 in order. Each migration block is idempotent — safe to re-run.

-### Migrating from Version 0 → 1 (No schema_meta table)
-
-If `SELECT version FROM schema_meta` errors (table doesn't exist), the database was
-created before version tracking. The processed table likely already exists with the
-correct columns. Run:
+### Migrating from Version 0 → 1 (user_version is 0)

-```bash
-sqlite3 workflows/contact-steward/processed.db <<'SQL'
-CREATE TABLE IF NOT EXISTS schema_meta (
-  id INTEGER PRIMARY KEY CHECK(id = 1),
-  version INTEGER NOT NULL
-);
-INSERT OR REPLACE INTO schema_meta (id, version) VALUES (1, 1);
-CREATE INDEX IF NOT EXISTS idx_status ON processed(status);
-CREATE INDEX IF NOT EXISTS idx_last_checked ON processed(last_checked);
-SQL
-```
+If `PRAGMA user_version` returns 0, the database was created before version tracking or
+is brand new. Ensure the processed table and indexes exist (the CREATE IF NOT EXISTS
+statements are idempotent), then set `PRAGMA user_version = 1`.


From bffd52ed162705eec3a4f5d3eee3eb1f6059ec03 Mon Sep 17 00:00:00 2001
From: Nick Sullivan
Date: Sun, 29 Mar 2026 13:05:14 -0500
Subject: [PATCH 5/5] Fix enrichment gap for tracked contacts with new messages
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Step 5d previously required "not in database AND saved contact" — so contacts
already in processed.db with new messages since last_checked had no explicit path.
Now enrichment check applies to all saved contacts regardless of DB state.

Co-Authored-By: Claude Opus 4.6
---
 workflows/contact-steward/AGENT.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/workflows/contact-steward/AGENT.md b/workflows/contact-steward/AGENT.md
index bc02e24..f8e223e 100644
--- a/workflows/contact-steward/AGENT.md
+++ b/workflows/contact-steward/AGENT.md
@@ -257,15 +257,15 @@ Before first scan, check `PRAGMA user_version`:
    filters or larger `--limit` values to reach older threads)
 5. For each conversation where your human replied (oldest unprocessed first, max 10 Opus
    spawns per run — enrichment checks and skips don't count toward the cap): a. Check
-   processed.db for this platform + contact_id. b. If found and no new messages since
-   last_checked → skip. c. If found with status `error` → retry (counts toward cap). d.
-   Not in database and saved contact on platform? Check for enrichment (new messages
-   with contact-relevant info). If no new info, skip. e. Not a saved contact?
-   Cross-reference the phone number on other platforms (especially
-   `wacli contacts search `) f. Found info (cross-reference match, profile name,
-   or conversation clues)? Spawn Opus with everything you gathered. Opus verifies and
-   writes the contact. g. No match anywhere? Spawn Opus with full conversation context
-   for detective work.
+   processed.db for this platform + contact_id. b. If found, not an `error`, and no new
+   messages since last_checked → skip. c. If found with status `error` → treat as new,
+   retry (counts toward cap). d. Is the other party a saved contact on this platform?
+   Check for enrichment (new messages with contact-relevant info). If no new info,
+   update last_checked and skip. e. Not a saved contact? Cross-reference the phone
+   number on other platforms (especially `wacli contacts search `) f. Found info
+   (cross-reference match, profile name, or conversation clues)? Spawn Opus with
+   everything you gathered. Opus verifies and writes the contact. g. No match anywhere?
+   Spawn Opus with full conversation context for detective work.
 6. After each contact, upsert into processed.db with the outcome status and timestamp
 7. Notify your human with a batch summary of what was added and what needs their input
 8. If unprocessed contacts remain beyond the 10-per-run cap, note the count in the log
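Step 6's upsert and the daily housekeeping rules can be sketched with Python's standard-library `sqlite3` module against the `processed` table from `db-setup.md`. `record_outcome` and `housekeeping` are hypothetical helpers. The upsert uses SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` syntax (SQLite 3.24 or later). The sketch also assumes an entry's age is measured from its `last_checked` timestamp, using the 14-day and 120-day windows from the housekeeping section. The phone numbers are made-up examples.

```python
from __future__ import annotations

import sqlite3
from typing import Optional

DAY = 86_400  # seconds; last_checked holds Unix timestamps


def record_outcome(
    conn: sqlite3.Connection,
    platform: str,
    contact_id: str,
    status: str,
    now: int,
    note: Optional[str] = None,
) -> None:
    """Upsert one contact's outcome, keyed on the (platform, contact_id) primary key."""
    conn.execute(
        """
        INSERT INTO processed (platform, contact_id, status, last_checked, metadata)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (platform, contact_id) DO UPDATE SET
            status = excluded.status,
            last_checked = excluded.last_checked,
            metadata = excluded.metadata
        """,
        (platform, contact_id, status, now, note),
    )
    conn.commit()


def housekeeping(conn: sqlite3.Connection, now: int) -> tuple[int, int]:
    """Daily cleanup; returns (asked_human rows expired, classified rows deleted)."""
    expired = conn.execute(
        "UPDATE processed SET status = 'skipped' "
        "WHERE status = 'asked_human' AND last_checked < ?",
        (now - 14 * DAY,),
    ).rowcount
    deleted = conn.execute(
        # 120 days: longer than the 90-day scan window, per the housekeeping rule
        "DELETE FROM processed WHERE status = 'classified' AND last_checked < ?",
        (now - 120 * DAY,),
    ).rowcount
    conn.commit()
    return expired, deleted


# Demo on an in-memory copy of the processed table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed (platform TEXT NOT NULL, contact_id TEXT NOT NULL, "
    "status TEXT NOT NULL, last_checked INTEGER NOT NULL, metadata TEXT, "
    "PRIMARY KEY (platform, contact_id))"
)
now = 1_700_000_000
record_outcome(conn, "imessage", "+15550100", "asked_human", now - 20 * DAY)
record_outcome(conn, "imessage", "+15550101", "classified", now - 130 * DAY)
record_outcome(conn, "imessage", "+15550102", "classified", now - 100 * DAY)
record_outcome(conn, "imessage", "+15550102", "enriched", now, "new email address")
print(housekeeping(conn, now))  # prints (1, 1)
```

Inside the upsert, `excluded` names the row that failed to insert, so re-recording a contact overwrites its status and timestamp in place rather than creating a duplicate.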