Architecture — selftune

selftune is a local-first feedback loop for AI agent skills. It turns saved agent activity into trustworthy local evidence, uses that evidence to improve low-risk skill behavior, and exposes the result through CLI surfaces and a local dashboard SPA.

Agent-First Design Principle

selftune is a skill consumed by AI agents, not a CLI tool for humans. The user installs the skill (npx skills add selftune-dev/selftune), then interacts through their coding agent ("set up selftune", "improve my skills"). The agent reads skill/SKILL.md to discover commands, routes to the correct workflow doc, and executes CLI commands on the user's behalf.

This means:

skill/SKILL.md is the primary product surface (agent reads this to know what to do)
skill/workflows/*.md are the agent's step-by-step guides
cli/selftune/ is the agent's API (the CLI binary the agent calls)
Error messages and output should be machine-parseable (JSON) and guide the agent to the next action

If you are new to the repo, read these in order:

Architecture At A Glance

flowchart LR
  Agent[Claude Code / Codex / OpenCode / OpenClaw] --> Sources[Transcripts / rollouts / session stores]
  Agent -. hook hints .-> Hooks[Claude hooks]

  Sources --> Sync[selftune sync]
  Hooks --> SQLite[(SQLite — sole write target)]
  Sync --> SQLite
  Sync --> Repaired[Repaired skill-usage overlay]

  SQLite --> Eval[Eval + grading]
  Repaired --> Eval
  Eval --> Orchestrate[selftune orchestrate]
  Orchestrate --> Evolution[Evolve / deploy / audit]
  Orchestrate --> Monitoring[Watch / rollback]

  Evolution --> SQLite
  Monitoring --> SQLite

  Logs[JSONL files — recovery only] -. disaster recovery .-> Materializer[Materializer — one-time rebuild]
  Materializer --> SQLite

  SQLite --> API[dashboard-server v2 API]
  SQLite -. WAL watch .-> API
  API -. SSE push .-> SPA[apps/local-dashboard]
  API --> CLI[status / last / badge]

  SQLite -. alpha enrolled .-> AlphaUpload[alpha-upload pipeline]
  AlphaUpload --> Queue[(upload_queue table)]
  Queue --> Flush[flush + retry]
  Flush --> CloudAPI[cloud API — POST /api/v1/push]
  CloudAPI --> Postgres[(Neon Postgres — canonical tables)]

Operating Rules

Source-truth first. Transcripts, rollouts, and session stores are authoritative. Hooks are low-latency hints.
Shared local evidence. Downstream modules communicate through SQLite (sole operational store) and repaired overlays. Legacy JSONL files are retained on disk for disaster recovery only.
Autonomy with safeguards. Low-risk description evolution can deploy automatically, but validation, watch, and rollback remain mandatory.
Local-first product surfaces. status, last, and the dashboard read from local evidence, not external services.
Alpha data pipeline. Opted-in users upload V2 canonical push payloads to the cloud API via alpha-upload/. Uploads are fail-open and never block the orchestrate loop.
Generic scheduling first. selftune cron setup is the main automation path (auto-detects platform). selftune schedule is a backward-compatible alias.

Domain Map

Domain	Directory / File	Responsibility	Quality Grade
Bootstrap	`cli/selftune/init.ts`	Agent detection, config bootstrap, setup guidance	B
Telemetry	`cli/selftune/hooks/`	Claude Code hook-based prompt, session, and skill-use hints	B
Hooks Shared	`cli/selftune/hooks-shared/`	Universal hook types, normalizers, and utilities for multi-platform support	B
Platform Adapters	`cli/selftune/adapters/`	Per-platform hook handlers and install commands (Codex, OpenCode, Cline)	B
Ingestors	`cli/selftune/ingestors/`	Normalize Claude, Codex, OpenCode, and OpenClaw data into shared logs	B
Source Sync	`cli/selftune/sync.ts`, `cli/selftune/repair/`	Rebuild source-truth local evidence and repaired overlays	B
Scheduling	`cli/selftune/schedule.ts`	Generic cron/launchd/systemd artifact generation and install	B
Cron Adapter	`cli/selftune/cron/`	Optional OpenClaw cron integration	B
Eval	`cli/selftune/eval/`	False-negative detection, eval generation, baseline, unit tests, composability	B
Grading	`cli/selftune/grading/`	Three-tier session grading with deterministic pre-gates and agent-based evaluation	B
Evolution	`cli/selftune/evolution/`	Propose, structurally validate, runtime/fixture replay validate, deploy, audit, rollback, and shared validation-mode policy	B
Orchestrator	`cli/selftune/orchestrate.ts`	Autonomy-first sync -> candidate selection -> evolve -> watch loop	B
Monitoring	`cli/selftune/monitoring/`	Post-deploy regression detection and rollback triggers	B
Local DB	`cli/selftune/localdb/`	SQLite materialization and payload-oriented queries	B
Dashboard	`cli/selftune/dashboard.ts`, `cli/selftune/dashboard-server.ts`, `apps/local-dashboard/`	Local SPA shell, v2 API with SSE live updates, overview/report/status UI	B
Observability CLI	`cli/selftune/status.ts`, `cli/selftune/last.ts`, `cli/selftune/badge/`	Fast local readouts of health, recent activity, and badge state	B
Alpha Upload	`cli/selftune/alpha-upload/`, `cli/selftune/alpha-identity.ts`	Alpha data pipeline: queue, V2 payload build, flush, HTTP transport with API key auth	B
Contribute	`cli/selftune/contribute/`	Opt-in anonymized export for community signal pooling	C
Skill	`skill/`	Agent-facing routing table, workflows, and references	B

Dependency Direction

Dependencies are intended to flow forward through the pipeline:

flowchart TD
  Shared[Shared types / constants / utils]
  Hooks[Hooks]
  Ingestors[Ingestors]
  Sync[Sync + repair]
  Eval[Eval]
  Grading[Grading]
  Evolution[Evolution]
  Orchestrate[Orchestrator]
  Monitoring[Monitoring]
  LocalDB[LocalDB]
  Dashboard[Dashboard]

  Shared --> Hooks
  Shared --> Ingestors
  Shared --> Sync
  Shared --> Eval
  Shared --> Grading
  Shared --> Evolution
  Shared --> Orchestrate
  Shared --> Monitoring
  Shared --> LocalDB
  Shared --> Dashboard

  Hooks --> Sync
  Ingestors --> Sync
  Sync --> Eval
  Eval --> Grading
  Eval --> Evolution
  Grading --> Evolution
  Evolution --> Orchestrate
  Evolution --> Monitoring
  Sync --> LocalDB
  Evolution --> LocalDB
  Monitoring --> LocalDB
  LocalDB --> Dashboard

Important practical interpretation:

Hooks should not import grading or evolution code.
The dashboard should consume payload-oriented queries, not rebuild business logic itself.
The orchestrator should coordinate existing modules, not duplicate evolution or monitoring logic.

Two Operating Modes

selftune has two distinct operating modes with different execution models:

Interactive Mode (agent-driven)

The user talks to their coding agent. The agent reads skill/SKILL.md, routes to the correct workflow, and runs CLI commands. The agent is the operator.

User: "improve my skills"
  → Agent reads SKILL.md → routes to Orchestrate workflow
  → Agent runs: selftune orchestrate
  → Agent summarizes results to user

Automated Mode (OS-driven)

System scheduling (cron/launchd/systemd) calls the CLI binary directly. No agent session needed, no token cost. Set up via selftune cron setup.

OS scheduler fires every 6 hours
  → selftune orchestrate --max-skills 3
  → sync → status → auto-grade ungraded → candidate selection → evolve → watch → write results to SQLite
  → Next interactive session sees improved SKILL.md

The agent is NOT in the loop for automated runs. This is intentional: automated runs are routine maintenance (sync, low-risk evolutions) that don't need agent intelligence or user interaction.

Data Architecture

SQLite is the sole write target and operational database. Hooks and sync write directly to SQLite via localdb/direct-write.ts. JSONL writes have been removed (Phase 3 complete). Existing JSONL files are retained on disk but only cover pre-cutover history. Post-cutover recovery requires selftune export snapshots or SQLite backups. The skill_usage table still exists in the schema alongside skill_invocations for backward compatibility; new consumers should use skill_invocations via localdb/queries.ts.

Primary Store: SQLite (~/.selftune/selftune.db)
├── Hooks write directly via localdb/direct-write.ts (sole write path)
├── Sync writes directly via localdb/direct-write.ts
├── All reads (orchestrate, evolve, grade, status, dashboard) query SQLite
└── Target freshness model: WAL-mode watch powers SSE live updates

Legacy JSONL files (~/.claude/*.jsonl) — pre-cutover history only, no longer written
├── session_telemetry_log.jsonl    Session telemetry records
├── skill_usage_log.jsonl          Skill trigger/miss records (deprecated; consolidated into skill_invocations SQLite table)
├── all_queries_log.jsonl          User prompt log
├── evolution_audit_log.jsonl      Evolution decisions + evidence
├── orchestrate_runs.jsonl         Orchestrate run reports
└── canonical_telemetry_log.jsonl  Normalized cross-platform records

Core Loop: reads SQLite
├── orchestrate.ts  → db.query("SELECT ... FROM sessions ...")
├── evolve.ts       → db.query("SELECT ... FROM evolution_audit ...")
├── grade.ts        → db.query("SELECT ... FROM sessions ...")
└── status.ts       → db.query("SELECT ... FROM sessions, skill_usage, queries ...")

Rebuild Paths:
├── materialize.ts  — runs once on startup for historical JSONL backfill
└── selftune export — generates JSONL from SQLite on demand

Alpha Upload Path (opted-in users only):
├── stage-canonical.ts  — reads canonical records from SQLite + evolution evidence + orchestrate_runs into canonical_upload_staging table
├── build-payloads.ts   — reads staging table via single monotonic cursor, produces V2 canonical push payloads
├── flush.ts            — POSTs to cloud API (POST /api/v1/push) with Bearer auth, handles 409/401/403
└── Cloud storage: Neon Postgres (raw_pushes for lossless ingest → canonical tables for analysis)

Hooks and sync write exclusively to SQLite. JSONL writes have been removed (Phase 3 complete). All local product reads go through SQLite. The materializer runs once on startup to backfill any historical JSONL data not yet in the database. selftune export can regenerate JSONL from SQLite when needed for portability or debugging.

The dashboard uses WAL-based invalidation for SSE live updates — JSONL file watchers have been removed from the dashboard server.

Repository Shape

cli/selftune/
├── index.ts              CLI entry point
├── init.ts               Config bootstrap and environment detection
├── sync.ts               Source-truth sync orchestration
├── orchestrate.ts        Main autonomous loop
├── schedule.ts           Generic scheduler install/preview
├── dashboard.ts          Dashboard command entry point
├── dashboard-server.ts   Bun.serve API + SPA shell + SSE live updates
├── dashboard-contract.ts Shared overview/report/run-report payload types
├── constants.ts          Paths and log file constants
├── types.ts              Shared TypeScript interfaces
├── utils/                JSONL, transcript, logging, schema, CLI error handler, agent-call helpers
├── hooks/                Claude Code hook handlers (prompt-log, skill-eval, session-stop, guards)
├── hooks-shared/         Universal hook types, normalizers, and shared utilities
├── adapters/             Per-platform hook adapters (codex/, opencode/, cline/)
├── ingestors/            Claude/Codex/OpenCode/OpenClaw batch ingest adapters
├── repair/               Rebuild repaired skill-usage overlay
├── routes/               HTTP route handlers (extracted from dashboard-server)
├── eval/                 False-negative detection and eval generation
├── grading/              Session grading
├── evolution/            Propose / validate / runtime-or-fixture replay-validate / deploy / rollback / shared validation contract
├── monitoring/           Post-deploy watch and rollback
├── localdb/              SQLite schema, materialization, queries
├── contribute/           Opt-in anonymized export
├── cron/                 OpenClaw scheduler adapter
├── memory/               Evolution memory persistence
└── workflows/            Multi-skill workflow discovery and persistence

apps/local-dashboard/
├── src/pages/            Overview, per-skill report, and system status routes
├── src/components/       Dashboard components
├── src/hooks/            Data-fetch hooks + SSE live update hook
└── src/types.ts          Frontend types from dashboard-contract.ts

skill/
├── SKILL.md              Agent-facing routing table
├── workflows/            Workflow docs for each command
└── references/           Logs, grading, and taxonomy references

Module Definitions

Module	Files	Responsibility	May Import From
Shared	`types.ts`, `constants.ts`, `utils/*.ts`	Core shared types, paths, JSONL helpers, transcript parsing, CLI error handler, agent-call helpers	Bun built-ins only
Bootstrap	`init.ts`, `observability.ts`	Config bootstrap and health checks	Shared
Hooks	`hooks/*.ts`	Claude Code hook handlers: prompt logging, skill eval, session stop, guards	Shared
Hooks Shared	`hooks-shared/*.ts`	Universal hook types, platform normalizers, session state, git metadata, skill path utils	Shared
Adapters	`adapters/{codex,opencode,cline}/*.ts`	Per-platform hook handlers and `install` commands; delegate to Hooks for business logic	Shared, Hooks, Hooks Shared
Ingestors	`ingestors/*.ts`	Normalize platform-specific session sources (batch backfill)	Shared
Source Sync	`sync.ts`, `repair/*.ts`	Produce trustworthy local evidence before downstream decisions	Shared, Ingestors
Scheduling	`schedule.ts`	Build and optionally install generic scheduling artifacts	Shared
Cron Adapter	`cron/*.ts`	OpenClaw-specific scheduling setup/list/remove	Shared
Eval	`eval/*.ts`	Build eval sets, detect false negatives, baseline and composability analysis	Shared
Grading	`grading/*.ts`	Session grading and pre-gates	Shared, Eval
Evolution	`evolution/*.ts` (including `validate-host-replay.ts`)	Description/body/routing proposal, structural + replay-backed validation, Claude runtime routing replay with fixture fallback, deploy, rollback, audit	Shared, Eval, Grading
Orchestrator	`orchestrate.ts`	Coordinate sync, candidate selection, evolve, and watch	Shared, Sync, Evolution, Monitoring, Status
Monitoring	`monitoring/*.ts`	Watch deployed changes and trigger rollback	Shared, Evolution
Local DB	`localdb/*.ts`	Materialize logs and audits into overview/report/query shapes	Shared, Sync outputs, Evolution audit
Dashboard	`dashboard.ts`, `dashboard-server.ts`, `apps/local-dashboard/`	Serve and render the local dashboard experience	Shared, LocalDB, Status, Observability, Evolution (evidence)
Skill	`skill/`	Provide agent-facing command routing and workflow guidance	Reads public CLI behavior and references

Truth Model: Hooks vs. Source Systems

flowchart LR
  Hooks[Hook events] --> Hints[Low-latency hints]
  Stores[Transcripts / rollouts / session stores] --> Sync[selftune sync]
  Sync --> Truth[Trustworthy local evidence]
  Hints -. enrich .-> Truth

Why this matters:

Hooks can be missing, polluted, or agent-specific.
Source sync is how selftune stays cross-agent and backfillable.
Autonomous changes should be justified from the synced evidence path, not from hooks alone.

Autonomous Loop

sequenceDiagram
  participant User
  participant Orchestrate
  participant Sync
  participant Status
  participant Evolution
  participant Monitoring

  User->>Orchestrate: selftune orchestrate
  Orchestrate->>Sync: rebuild source-truth telemetry
  Sync-->>Orchestrate: shared logs + repaired overlay
  Orchestrate->>Status: compute current skill health
  Status-->>Orchestrate: candidates + reasons
  Orchestrate->>Evolution: evolve selected low-risk descriptions
  Evolution-->>Orchestrate: deployed proposals + audit entries
  Orchestrate->>Monitoring: watch recent deployments
  Monitoring-->>Orchestrate: stable or rollback result
  Orchestrate-->>User: decision report

Current policy:

Low-risk description evolution is autonomous by default.
--review-required is an opt-in stricter policy mode.
Validation, watch, and rollback are the main safety system.

Signal-Reactive Improvement

In addition to scheduled and interactive orchestration, selftune detects high-priority improvement signals in real-time and triggers focused orchestration automatically.

sequenceDiagram
  participant User
  participant PromptLog as prompt-log hook
  participant SignalLog as improvement_signals (SQLite)
  participant SessionStop as session-stop hook
  participant Orchestrate

  User->>PromptLog: "why didn't you use the commit skill?"
  PromptLog->>SignalLog: append signal (correction, skill=commit)
  Note over PromptLog: continues normal prompt logging
  User->>SessionStop: session ends
  SessionStop->>SignalLog: read pending signals
  SessionStop->>Orchestrate: spawn background (--max-skills 2)
  Note over SessionStop: exits immediately (fire-and-forget)
  Orchestrate->>SignalLog: read signals, boost signaled skills
  Orchestrate->>Orchestrate: evolve with signal-aware priority
  Orchestrate->>SignalLog: mark signals consumed

Signal detection is pure regex in the prompt-log hook — no LLM calls, no network. Patterns include corrections ("why didn't you use X?", "you should have used X"), explicit requests ("please use the X skill"), and manual invocations. Skill names are matched against the installed skill directory listing.

The orchestrator boosts signaled skills by +150 priority per signal (capped at +450) and relaxes the minimum evidence gate and UNGRADED gate for skills with pending signals. After a run completes, signals are marked consumed so they don't affect subsequent runs.

Config System

selftune init writes ~/.selftune/config.json.

Field	Type	Description
`agent_type`	`claude_code \| codex \| opencode \| openclaw \| unknown`	Detected host agent
`cli_path`	`string`	Absolute path to the selftune CLI entry point
`llm_mode`	`agent \| api`	How grading/evolution run model calls
`agent_cli`	`string \| null`	Preferred agent binary
`hooks_installed`	`boolean`	Whether Claude hooks are configured
`initialized_at`	`string`	ISO timestamp of the last bootstrap

Shared Local Artifacts

Artifact	Writer	Reader
`~/.claude/session_telemetry_log.jsonl`	Legacy / export-only (`selftune export`)	Materializer recovery, export
`~/.claude/skill_usage_log.jsonl`	Legacy / export-only (`selftune export`)	Materializer recovery (deprecated — consolidated into `skill_invocations` table in SQLite)
`~/.claude/skill_usage_repaired.jsonl`	Legacy / export-only (`selftune export`)	Materializer recovery (deprecated — consolidated into `skill_invocations` table in SQLite)
`~/.claude/all_queries_log.jsonl`	Legacy / export-only (`selftune export`)	Materializer recovery, export
`~/.claude/evolution_audit_log.jsonl`	Legacy / export-only (`selftune export`)	Materializer recovery, export
`~/.claude/orchestrate_runs.jsonl`	Legacy / export-only (`selftune export`)	Materializer recovery, export
`~/.claude/improvement_signals.jsonl`	Legacy / export-only (`selftune export`)	Materializer recovery, export
`~/.claude/.orchestrate.lock`	Orchestrator	session-stop hook (staleness check)
`~/.selftune/*.sqlite`	Hooks (direct-write), sync, materializer (backfill)	All reads: orchestrate, evolve, grade, status, dashboard

The Evaluation Model

Tier	What It Checks	Automated
Tier 1 — Trigger	Did the skill fire when it should have?	Yes
Tier 2 — Process	Did the session follow the expected workflow?	Yes
Tier 3 — Quality	Was the resulting work actually good enough?	Yes, via agent-as-grader

Invocation Taxonomy

Type	Description
Explicit	The user names the skill directly
Implicit	The task matches the skill without naming it
Contextual	The task is implicit with real-world domain noise
Negative	Nearby queries that should not trigger the skill

Current Known Tensions

Candidate selection is improving, but still needs stronger real-world evidence gating.
Local and cloud dashboard semantics should converge on the same payload contracts.
The CLI core still avoids runtime dependencies, while the local SPA intentionally uses frontend build-time dependencies.
OpenClaw cron remains supported, but it is no longer the primary automation story.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Architecture — selftune

Agent-First Design Principle

Architecture At A Glance

Operating Rules

Domain Map

Dependency Direction

Two Operating Modes

Interactive Mode (agent-driven)

Automated Mode (OS-driven)

Data Architecture

Repository Shape

Module Definitions

Truth Model: Hooks vs. Source Systems

Autonomous Loop

Signal-Reactive Improvement

Config System

Shared Local Artifacts

The Evaluation Model

Invocation Taxonomy

Current Known Tensions

Related Docs

FilesExpand file tree

ARCHITECTURE.md

Latest commit

History

ARCHITECTURE.md

File metadata and controls

Architecture — selftune

Agent-First Design Principle

Architecture At A Glance

Operating Rules

Domain Map

Dependency Direction

Two Operating Modes

Interactive Mode (agent-driven)

Automated Mode (OS-driven)

Data Architecture

Repository Shape

Module Definitions

Truth Model: Hooks vs. Source Systems

Autonomous Loop

Signal-Reactive Improvement

Config System

Shared Local Artifacts

The Evaluation Model

Invocation Taxonomy

Current Known Tensions

Related Docs