14 changes: 9 additions & 5 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -12,7 +12,7 @@ here explicitly.

### Changed

- **Project data moved out of repo** — all per-workspace artifacts (SQLite DB, config, search index, reports, telemetry NDJSON, backups, sampler stop files) now live under `~/.kaizen/projects/<slug>/` instead of `<workspace>/.kaizen/`. Slug = canonical path with `/` → `-`. `KAIZEN_HOME` overrides `~/.kaizen`. Existing in-repo `.kaizen/` directories auto-migrate on first use; a `MIGRATED.txt` marker is left behind and the old directory is safe to delete. Identity (workspace key) is unchanged. See [ADR 007](docs/adr/007-project-data-in-home.md).
- **Target repositories are read-only** — all per-workspace artifacts live under `~/.kaizen/projects/<slug>/`; `KAIZEN_HOME` overrides `~/.kaizen`. Legacy in-repo `.kaizen/` data is copied without moving, deleting, or marking the source. Host hooks live in user-level agent configuration. See [ADR 011](docs/adr/011-target-repositories-read-only.md).
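
A minimal sketch of the path resolution this entry describes, not Kaizen's actual code. It assumes the slug rule from the superseded entry (canonical path with `/` replaced by `-`) still applies, and it reads that rule literally, so an absolute path yields a slug with a leading `-`. The names `slug` and `project_dir` are hypothetical, and the `KAIZEN_HOME` override is modeled as an optional argument rather than read from the environment.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: turn a canonical workspace path into a slug by
/// replacing every `/` with `-` (literal reading of the changelog rule).
fn slug(canonical: &Path) -> String {
    canonical.to_string_lossy().replace('/', "-")
}

/// Hypothetical helper: resolve the per-project data directory.
/// `kaizen_home` stands in for the `KAIZEN_HOME` override; when it is
/// absent the root falls back to `<home>/.kaizen`.
fn project_dir(home: &Path, kaizen_home: Option<&Path>, canonical: &Path) -> PathBuf {
    let root = match kaizen_home {
        Some(p) => p.to_path_buf(),
        None => home.join(".kaizen"),
    };
    root.join("projects").join(slug(canonical))
}

fn main() {
    let home = Path::new("/home/ada");
    let workspace = Path::new("/home/ada/src/app");
    // Default root under the home directory.
    println!("{}", project_dir(home, None, workspace).display());
    // Same workspace with an explicit override root.
    println!("{}", project_dir(home, Some(Path::new("/tmp/kaizen")), workspace).display());
}
```

Keeping the override as a parameter makes the function pure, which is convenient when checking that nothing resolves to a path inside the target repository.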

### Added

@@ -32,9 +32,9 @@ here explicitly.

### Changed

- **Machine-local project registry** lives in `~/.kaizen/machine.db` (SQLite) instead of `workspaces.json`; `kaizen init` upserts the current repo. Legacy `workspaces.json` is imported once and renamed to `workspaces.json.migrated`. `kaizen doctor` reports registry status. `--all-workspaces` still merges per-repo stores and now includes inited projects that do not yet have `.kaizen/kaizen.db`.
- **Machine-local project registry** lives in `~/.kaizen/machine.db` (SQLite) instead of `workspaces.json`; `kaizen init` upserts the current repo. Legacy `workspaces.json` is imported once and renamed to `workspaces.json.migrated`. `kaizen doctor` reports registry status. `--all-workspaces` merges available project databases and ignores missing or unsafe roots.
- **Telemetry wire format (third-party sinks):** PostHog capture is one event per canonical row (`kaizen.event`, `kaizen.tool_span`, `kaizen.repo_snapshot_chunk`, `kaizen.workspace_fact_snapshot`); Datadog uses the [Logs API v2](https://docs.datadoghq.com/api/latest/logs/) (`POST /api/v2/logs`) with one JSON log per canonical item instead of the Events API. OTLP remains a placeholder with `tracing::debug` of expanded item counts. Primary Kaizen `POST` ingest and outbox JSON shapes for `events` / `tool_spans` / `repo_snapshots` are unchanged; when sync is enabled, workspace skill/rule discovery can enqueue **`workspace_facts`** for the new `/v1/workspace-facts` path.
- **CLI — read paths and telemetry:** `summary`, `insights`, `metrics`, `guidance`, and `retro` accept `--source local|provider|mixed` (default `local`). With `provider` or `mixed`, a background provider pull runs when `[telemetry.query].cache_ttl_seconds` has expired, or when you pass `--refresh` (in addition to transcript rescan where applicable). New `kaizen telemetry` subcommands: `init` (alias of `configure`), `doctor`, `pull --days`, and `print-schema`. MCP tools keep the previous local-only behavior.
- **CLI — read paths and telemetry:** `summary`, `insights`, `metrics`, `guidance`, and `retro` accept `--source local|provider|mixed` (default `local`). With `provider` or `mixed`, a background provider pull runs when `[telemetry.query].cache_ttl_seconds` has expired, or when you pass `--refresh` (in addition to bounded local tail ingest where applicable). New `kaizen telemetry` subcommands: `init` (alias of `configure`), `doctor`, `pull --days`, and `print-schema`. MCP tools keep the previous local-only behavior.
- `Cargo.toml` no longer excludes `assets/` so `cargo publish` / docs.rs builds resolve
`include_str!` for embedded defaults and the retro skill template.
- Release workflow **`update-homebrew-tap`**: `scripts/render-homebrew-tap-formula.sh` + push to
@@ -49,6 +49,10 @@ here explicitly.

### Fixed

- The Web dashboard now selects the most recently active valid project, refreshes from SQLite/WAL changes within one second, preserves the selected session, and exposes bounded session details plus tool, attention, and telemetry-coverage insights.
- Hook-backed tool activity appears as named `ToolCall` and `ToolResult` events in Web and TUI views; legacy stored hook rows are normalized at read time.
- Hook ingestion no longer commits and merges the Tantivy index after every event. Search writes use one indexing worker and one merge worker, commit in bounded batches, and flush when a session stops.
- Transcript refreshes cap recent files and growing Claude/Codex JSONL tails, append only unseen events in one transaction, and avoid repeated Git enrichment and full derived-row rebuilds.
- `kaizen daemon status` now exits successfully with a stable `status: stopped` line after the
daemon has been stopped, instead of surfacing a raw Unix socket connection error.
- `kaizen sessions tree <id>` now prints a non-empty placeholder for existing sessions with no
@@ -72,9 +76,9 @@ here explicitly.

### Added

- `kaizen init` now creates `.cursor/hooks.json` and `.claude/settings.json` from scratch when absent (in addition to patching them when present), so a single command is enough to instrument a fresh workspace for both Cursor and Claude Code.
- `kaizen init` now creates or patches `~/.cursor/hooks.json` and `~/.claude/settings.json`, so one user-level setup covers every workspace without writing to the target repository.
- Local retention: `[retention].hot_days` (default 30) prunes old sessions from SQLite after rescans (throttled to once per 24h); `hot_days = 0` disables auto-prune. `kaizen gc` with optional `--days` and `--vacuum`.
- `[scan].min_rescan_seconds` (default 300) skips full transcript rescans; `--refresh` / `-r` on `sessions list`, `summary`, `insights`, `metrics`, and `retro` forces a rescan. MCP tools accept `refresh=true` for the same behavior.
- `[scan].min_rescan_seconds` (default 300) throttles bounded transcript-tail ingestion; `--refresh` / `-r` on `sessions list`, `summary`, `insights`, `metrics`, and `retro` ingests recently changed tails. MCP tools accept `refresh=true` for the same behavior.
- Composite index `sessions(workspace, started_at_ms)` for faster session listing.
- `docs/telemetry-journey.md` — end-to-end “session → data” learning path; README and `docs/`
index point to it. Root `README` clarifies that long-form docs live in the GitHub `docs/`
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -354,6 +354,7 @@ clap_complete = "4"
indicatif = "0.18"
arboard = "3"
anyhow = "1"
memchr = "2"
tempfile = "3"
tar = "0.4"
rusqlite = { version = "0.40", features = ["bundled"] }
2 changes: 1 addition & 1 deletion docs/config.md
@@ -50,7 +50,7 @@ When you pass **`--all-workspaces`** (or MCP `all_workspaces: true`), Kaizen loa
| Key | Default | Purpose |
|-----|---------|--------|
| `roots` | `["~/.cursor/projects"]` | Transcript index roots (Cursor projects layout) |
| `min_rescan_seconds` | `300` | Minimum seconds between full transcript rescans when a command is already in refresh mode (`--refresh` on the CLI or `refresh=true` over MCP). The daemon uses the same value for its workspace scanner loop after `kaizen init`. |
| `min_rescan_seconds` | `300` | Minimum seconds between bounded incremental transcript scans in refresh mode (`--refresh` on the CLI or `refresh=true` over MCP). The daemon uses the same value after `kaizen init`; Claude, Codex, and Cursor discovery caps each source at 32 recent transcripts, while Claude and Codex growing JSONL reads are capped at 256 KiB. |
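
The `min_rescan_seconds` throttle can be pictured roughly as below. This is an illustration under stated assumptions, not the project's implementation: `ScanThrottle` and `try_scan` are hypothetical names, and the last-scan time is held in memory here, whereas a real tool would likely persist it between runs.

```rust
use std::time::{Duration, Instant};

/// Hypothetical throttle: a scan may run only when at least `min_rescan`
/// has elapsed since the previous scan, or when no scan has run yet.
struct ScanThrottle {
    min_rescan: Duration,
    last_scan: Option<Instant>,
}

impl ScanThrottle {
    fn new(min_rescan_seconds: u64) -> Self {
        Self {
            min_rescan: Duration::from_secs(min_rescan_seconds),
            last_scan: None,
        }
    }

    /// Returns true and records `now` when a scan is due; false otherwise.
    fn try_scan(&mut self, now: Instant) -> bool {
        let due = match self.last_scan {
            None => true,
            Some(prev) => now.duration_since(prev) >= self.min_rescan,
        };
        if due {
            self.last_scan = Some(now);
        }
        due
    }
}

fn main() {
    // 300 mirrors the documented default for `min_rescan_seconds`.
    let mut throttle = ScanThrottle::new(300);
    let t0 = Instant::now();
    println!("first request runs: {}", throttle.try_scan(t0));
    println!("10 s later runs: {}", throttle.try_scan(t0 + Duration::from_secs(10)));
    println!("300 s later runs: {}", throttle.try_scan(t0 + Duration::from_secs(300)));
}
```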

## `[retention]`

2 changes: 1 addition & 1 deletion docs/experiments.md
@@ -125,7 +125,7 @@ kaizen exp list
kaizen exp status <id>
kaizen exp tag <id> --variant treatment # manual override
kaizen exp report <id> # markdown + bootstrap CI + sequential decision
kaizen exp report <id> --refresh # optional: full transcript rescan first; can take a while
kaizen exp report <id> --refresh # optional: ingest changed transcript tails first
kaizen exp conclude <id> # Running → Concluded
kaizen exp archive <id> # Concluded → Archived
```
6 changes: 3 additions & 3 deletions docs/mcp.md
@@ -120,7 +120,7 @@ Set any value to `false` to skip that agent’s local scan (useful if a VS Code
|------|----------------|--------|
| `kaizen_capabilities` | (no CLI; static text) | Read first: which tool to use for cost rollups vs repo metrics, sessions, retro, etc. |
| `kaizen_ingest_hook` | `kaizen ingest hook` | Pass hook JSON in `payload` (not stdin). `source`: `cursor` or `claude`. |
| `kaizen_sessions_list` | `kaizen sessions list` | Optional `json: true`, `refresh: true` (full transcript rescan; matches `--refresh`), `all_workspaces: true`, `limit` (cap rows, newest first). |
| `kaizen_sessions_list` | `kaizen sessions list` | Optional `json: true`, `refresh: true` (bounded changed-tail ingest; matches `--refresh`), `all_workspaces: true`, `limit` (cap rows, newest first). |
| `kaizen_session_show` | `kaizen sessions show` | `id` + optional `workspace`. |
| `kaizen_search_sessions` | `kaizen search` | Structured BM25 event search. Args: `query`, optional `since`, `agent`, `kind`, `limit`, `workspace`. `kaizen sessions search` remains a compatible CLI alias. Returns `hits[]` with session id, seq, ts, score, snippet, paths, skills, and `tokens_total`. |
| `kaizen_query` | `kaizen query` | Structured trace query. |
@@ -143,7 +143,7 @@ Set any value to `false` to skip that agent’s local scan (useful if a VS Code
| `kaizen_exp_list` | `kaizen exp list` | |
| `kaizen_exp_status` | `kaizen exp status` | |
| `kaizen_exp_tag` | `kaizen exp tag` | |
| `kaizen_exp_report` | `kaizen exp report` | `json` and optional `refresh: true` (full rescan before report; matches CLI `--refresh`). Includes `sequential_decision` and `srm_warning`. |
| `kaizen_exp_report` | `kaizen exp report` | `json` and optional `refresh: true` (bounded changed-tail ingest before report; matches CLI `--refresh`). Includes `sequential_decision` and `srm_warning`. |
| `kaizen_exp_conclude` | `kaizen exp conclude` | Running → Concluded. |
| `kaizen_exp_archive` | `kaizen exp archive` | Concluded → Archived. |
| `kaizen_retro` | `kaizen retro` | `json`, `refresh`, etc. Set `json: true` for the same `Report` JSON as `kaizen retro --json`. |
@@ -152,7 +152,7 @@ Set any value to `false` to skip that agent’s local scan (useful if a VS Code

- **Workspace**: most tools accept optional `workspace` (string path) or `project` (short project name — resolved from existing rows shown by `kaizen projects`; mutually exclusive with `workspace`). If neither is given, the server uses the process current directory, matching CLI defaults. Use `kaizen projects --include-missing` to inspect stale registry rows; MCP workspace reads ignore them.
- **Data source**: `kaizen_summary`, `kaizen_insights`, `kaizen_metrics`, and `kaizen_retro` use the local DB only (`DataSource::Local`), matching CLI default `--source local`. The MCP server does not expose CLI `--source` switches; use the CLI if you need another source.
- **Rescan**: list/summary/insights/metrics/retro stay on the cached local DB unless you pass `refresh: true` (same as CLI `--refresh`). `kaizen_exp_report` defaults to cache-first as well; set `refresh: true` to force a full transcript rescan before computing the report.
- **Refresh**: list/summary/insights/metrics/retro stay on the cached local DB unless you pass `refresh: true` (same as CLI `--refresh`). `kaizen_exp_report` defaults to cache-first as well; refresh ingests only bounded, recently changed transcript tails.
- **Aggregation**: `kaizen_sessions_list`, `kaizen_summary`, `kaizen_insights`, and `kaizen_metrics` accept `all_workspaces: true`. Kaizen opens each existing registered workspace DB separately and merges the results in memory.
- **Blocking work** is run on a blocking thread pool so the async MCP runtime is not starved; long `retro` or metrics runs may take time.
- **Version** in the MCP `initialize` response is the built-in string configured for the server (keep in sync with releases when using strict client checks).
2 changes: 1 addition & 1 deletion docs/retro.md
@@ -3,7 +3,7 @@
`kaizen retro` reads recent sessions and cached local repo facts, then produces
a ranked Markdown report of changes that may make agents cheaper, faster, or
more accurate in this codebase. Default runs are cache-first. Use `--refresh`
when the local store may be stale; it rescans agent transcripts first and can
when the local store may be stale; it ingests bounded changed tails first and can
take a while on large workspaces.

The engine is deterministic and pure:
3 changes: 2 additions & 1 deletion docs/tui.md
@@ -20,6 +20,8 @@ kaizen tui --workspace /path/to/project

The TUI requires an interactive terminal. It watches the SQLite WAL and
coalesces refreshes, so active sessions update without a busy polling loop.
Hook-backed `PreToolUse` and `PostToolUse` rows appear as named tool calls and
results; lifecycle hooks remain lifecycle events.

## Keys

@@ -54,4 +56,3 @@ help view instead of terminating the interface.
| Wrong repository appears | Start with `--workspace /path/to/project`. |
| Navigation seems stuck | Press `Esc` to close overlays, then `Tab` to select the intended pane. |
| Terminal display is damaged after a crash | Run `reset`, then restart `kaizen tui`. |

4 changes: 2 additions & 2 deletions docs/tutorial/02-observe.md
@@ -22,7 +22,7 @@ kaizen sessions list --json
kaizen sessions list --refresh
```

**`--json`** is for scripts and MCP-shaped tooling. **`--refresh`** forces a full transcript rescan (subject to min interval).
**`--json`** is for scripts and MCP-shaped tooling. **`--refresh`** ingests bounded, recently changed transcript tails (subject to the minimum interval).

## Show one session

@@ -43,7 +43,7 @@ The JSON shape includes rollups by agent and model, total cost, and when availab

## Data source: local, provider, or mixed

Most of this tutorial assumes the default **`--source local`**: numbers come from the workspace’s local SQLite store (and the usual transcript rescan rules with `--refresh`).
Most of this tutorial assumes the default **`--source local`**: numbers come from the workspace’s local SQLite store (and the bounded changed-tail rules used by `--refresh`).

When you have **[sync]** identity in config (`team_id`, `team_salt_hex`, …) *and* a **[telemetry.query](https://github.com/marquesds/kaizen/blob/main/docs/config.md#telemetryquery) provider** (PostHog or Datadog) configured, you can ask read commands to fold in **provider-pulled events** cached under `remote_events` in the same DB:

2 changes: 1 addition & 1 deletion docs/tutorial/06-experiments.md
@@ -23,7 +23,7 @@ kaizen exp status <id>
kaizen exp tag <id> --session <sid> --variant treatment
kaizen exp report <id>
kaizen exp report <id> --json
kaizen exp report <id> --refresh # full transcript rescan before report if the store may be stale
kaizen exp report <id> --refresh # ingest changed transcript tails if the store may be stale
kaizen exp conclude <id>
```

11 changes: 7 additions & 4 deletions docs/usage-observe.md
@@ -2,9 +2,9 @@

[Back to CLI index](usage.md).

These commands are cache-first. Pass `--refresh` when they should rescan local
transcripts before rendering. With `--source provider|mixed`, refresh can also
refresh a configured remote provider cache.
These commands are cache-first. Pass `--refresh` to ingest recently changed,
bounded local transcript tails before rendering. With
`--source provider|mixed`, `--refresh` can also update a configured remote cache.

## `kaizen sessions`

@@ -33,7 +33,10 @@ Text output shows a placeholder when no spans exist; JSON returns `[]`.

`search` uses the rebuildable Tantivy index at
`~/.kaizen/projects/<slug>/search/`. It indexes redacted event text. Payload
bodies remain in SQLite and are not copied into the index. Rebuild with:
bodies remain in SQLite and are not copied into the index. Daemon-backed hook
sessions commit search batches at 256 documents, on the first event after a
60-second batch window, or immediately when the session stops. SQLite session
and event views remain live before that secondary-index commit. Rebuild with:

```bash
kaizen search reindex
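
The search-batch commit rule added in this hunk (256 documents, the first event after a 60-second window, or session stop) could be modeled along these lines. This is a sketch under assumptions: `SearchBatch`, `push`, and `stop` are hypothetical names, the type only decides when to commit, and it does not touch Tantivy at all.

```rust
use std::time::{Duration, Instant};

const MAX_BATCH_DOCS: usize = 256;
const BATCH_WINDOW: Duration = Duration::from_secs(60);

/// Hypothetical batcher that tracks pending documents and the start of the
/// current batch window. It reports when a commit is due; it does no I/O.
struct SearchBatch {
    pending: usize,
    window_start: Option<Instant>,
}

impl SearchBatch {
    fn new() -> Self {
        Self { pending: 0, window_start: None }
    }

    /// Queue one document. Returns true when the batch should be committed:
    /// either it reached 256 documents, or this event arrived at least
    /// 60 seconds after the batch was opened.
    fn push(&mut self, now: Instant) -> bool {
        let start = *self.window_start.get_or_insert(now);
        self.pending += 1;
        let full = self.pending >= MAX_BATCH_DOCS;
        let window_over = now.duration_since(start) >= BATCH_WINDOW;
        if full || window_over {
            self.reset();
            true
        } else {
            false
        }
    }

    /// Session stop: returns true when pending documents need a final commit.
    fn stop(&mut self) -> bool {
        let had_pending = self.pending > 0;
        self.reset();
        had_pending
    }

    fn reset(&mut self) {
        self.pending = 0;
        self.window_start = None;
    }
}

fn main() {
    let mut batch = SearchBatch::new();
    let t0 = Instant::now();
    println!("commit after one event: {}", batch.push(t0));
    println!("commit after window: {}", batch.push(t0 + Duration::from_secs(61)));
    println!("commit on stop with nothing pending: {}", batch.stop());
}
```

Because the window is only checked when an event arrives, an idle session holds its pending documents until the next event or the stop, which matches the "first event after a 60-second batch window" wording.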
9 changes: 6 additions & 3 deletions docs/usage-setup.md
@@ -19,7 +19,7 @@ Every command resolves a workspace through one of three mechanisms:
opens each registered project database separately and merges results. See
[machine-local registry](config.md#machine-local-registry).

After a full transcript rescan, Kaizen may delete sessions older than
After a transcript refresh, Kaizen may delete sessions older than
`[retention].hot_days`, at most once per 24 hours. Set `hot_days = 0` to disable
automatic pruning; use `kaizen gc` for explicit pruning.

@@ -93,8 +93,11 @@ kaizen sessions load --json
```

Use `load` after installing or upgrading when existing sessions should appear
in reports. Use `sessions list --refresh` when one read command should rescan
before rendering.
in reports. Claude, Codex, and Cursor imports consider at most 32 recent
transcripts per source. Claude and Codex decode at most the latest 256 KiB of a
growing JSONL file. `load` also enriches imported sessions with Git binding
metadata. Use `sessions list --refresh` to ingest recently changed tails before
one read.
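
One way to read "at most the latest 256 KiB of a growing JSONL file" is sketched below, as an assumption-laden illustration rather than Kaizen's code. `read_tail_lines` is a hypothetical helper: it seeks to the last `cap` bytes, drops the first line when the read started mid-file (it may be partial), and ignores trailing text that has no newline yet because the writer may still be appending to it.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Cap taken from the documented limit for growing Claude/Codex JSONL files.
const TAIL_CAP: u64 = 256 * 1024;

/// Hypothetical helper: return the complete lines found in the last `cap`
/// bytes of the file at `path`.
fn read_tail_lines(path: &Path, cap: u64) -> io::Result<Vec<String>> {
    let mut file = File::open(path)?;
    let len = file.metadata()?.len();
    let start = len.saturating_sub(cap);
    file.seek(SeekFrom::Start(start))?;

    let mut buf = Vec::new();
    file.read_to_end(&mut buf)?;
    let decoded = String::from_utf8_lossy(&buf);
    let text: &str = &decoded;

    // Keep only text up to the last newline: a growing file may end mid-record.
    let complete = match text.rfind('\n') {
        Some(i) => &text[..=i],
        None => "",
    };

    let mut lines = complete.lines();
    if start > 0 {
        // The read began mid-file, so the first line may be partial.
        // It is dropped conservatively, even if the cut hit a line boundary.
        lines.next();
    }
    Ok(lines.map(str::to_owned).collect())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("kaizen_tail_sketch_demo.jsonl");
    fs::write(&path, "{\"seq\":1}\n{\"seq\":2}\n{\"seq\":3")?;
    for line in read_tail_lines(&path, TAIL_CAP)? {
        println!("{line}");
    }
    fs::remove_file(&path)
}
```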

## `kaizen outcomes`

4 changes: 2 additions & 2 deletions docs/usage.md
@@ -12,8 +12,8 @@ Use `--no-daemon` or `KAIZEN_DAEMON=0` for direct SQLite mode. Both modes use
the same project database. See [daemon.md](daemon.md).

Cache-first reads use the local SQLite database without rescanning agent
transcripts. Pass `--refresh` when a command should ingest new transcript data
before rendering.
transcripts. Pass `--refresh` to ingest recently changed transcript tails before
rendering. Refresh work is bounded; it does not replay all historical files.

## Reference

12 changes: 7 additions & 5 deletions docs/web.md
@@ -23,16 +23,19 @@ kaizen open --no-browser

The dashboard provides:

- project selection, including a manual local path;
- automatic selection of the most recently active valid project, plus a manual
local-path fallback;
- session, active-session, error, and cost totals;
- project-level tool, attention, and telemetry-coverage insights;
- the latest 30 sessions for the selected project;
- selected-session facts, recent events, nested tool spans, touched files, and
top tools;
- the exact bounded report under **Developer details**.

Selected-session detail is capped at 40 events, 40 spans, and 40 files. Those
limits keep refresh latency and memory use predictable. The page refreshes every
20 seconds while connected; **Refresh now** requests an immediate snapshot.
limits keep refresh latency and memory use predictable. The server watches the
selected project's SQLite database and WAL; a committed change requests a new
snapshot within one second. **Refresh now** remains available for manual checks.

Web is an Observe-only surface. It cannot mutate experiments, guidance, sync,
configuration, or local data. Use the CLI or MCP for those workflows.
@@ -55,6 +58,5 @@ for protocol and runtime-file details.
| Browser did not open | Run `kaizen open --no-browser` and open the printed URL. |
| Page says connection failed | Run `kaizen daemon status`; restart with `kaizen daemon stop` followed by `kaizen open`. |
| URL has no valid token | Run `kaizen open --no-browser` again instead of editing the URL. |
| Expected project is missing | Open its path manually or run `kaizen sessions list --refresh` from that repository. |
| Expected project is missing | Run one Kaizen command from that repository to register it, then reload. Unsafe roots containing `KAIZEN_HOME` and missing paths are ignored. |
| Default port is busy | Use the URL Kaizen prints; the daemon automatically chooses another loopback port. |
