Part 6 — Improve: experiments

Experiments let you bind a hypothesis to commits or manual session tags, then compare a metric with bootstrap confidence intervals. Full reference: experiments.md.

Create an experiment

kaizen exp new --name add-skill \
  --hypothesis "skill cuts tokens" \
  --change "add .cursor/skills/my-skill" \
  --metric tokens_per_session \
  --bind git \
  --duration-days 14 --target-pct=-10

Metrics include tokens_per_session, cost_per_session, success_rate, tool_loops, duration_minutes, files_per_session.

Operate the lifecycle

kaizen exp list
kaizen exp status <id>
kaizen exp tag <id> --session <sid> --variant treatment
kaizen exp report <id>
kaizen exp report <id> --json
kaizen exp report <id> --refresh   # ingest changed transcript tails if the store may be stale
kaizen exp conclude <id>

Use tag when binding is manual; use git binding when control and treatment are defined by commits (see product doc for your workflow).

Exercise

Create a draft experiment with a hypothesis you actually believe.
Run exp list and exp status until you understand state transitions.
When you have enough tagged sessions, run exp report and read the CI around the median delta.

Next: Part 7 — Proxy, sync, telemetry

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Part 6 — Improve: experiments

Create an experiment

Operate the lifecycle

Exercise

Uh oh!

FilesExpand file tree

06-experiments.md

Latest commit

History

06-experiments.md

File metadata and controls

Part 6 — Improve: experiments

Create an experiment

Operate the lifecycle

Exercise