Experiments let you bind a hypothesis to commits or to manually tagged sessions, then compare a metric between control and treatment using bootstrap confidence intervals. Full reference: `experiments.md`.
kaizen exp new --name add-skill \
--hypothesis "skill cuts tokens" \
--change "add .cursor/skills/my-skill" \
--metric tokens_per_session \
--bind git \
--duration-days 14 --target-pct=-10

Metrics include `tokens_per_session`, `cost_per_session`, `success_rate`, `tool_loops`, `duration_minutes`, and `files_per_session`.
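Read naturally, `--target-pct=-10` means "aim for a 10% drop in the metric." The sketch below applies that reading to a point estimate (the median) of `tokens_per_session`. This reading is an assumption. kaizen's own decision rule is not described in this section; for example, it may check the confidence interval rather than the median. The treatment of positive targets as "rise by at least this much" (useful for `success_rate`) is also an assumption. The helper names and sample numbers are hypothetical.

```python
from statistics import median


def pct_change_of_median(control, treatment):
    """Percent change of the treatment median relative to the control median."""
    base = median(control)
    if base == 0:
        raise ValueError("control median is 0; percent change is undefined")
    return (median(treatment) - base) / base * 100.0


def meets_target(control, treatment, target_pct):
    """Assumed reading of --target-pct: a negative target asks for a drop of at
    least |target_pct| percent, a positive one for a rise of at least target_pct."""
    change = pct_change_of_median(control, treatment)
    return change <= target_pct if target_pct < 0 else change >= target_pct


# Hypothetical tokens_per_session samples, for illustration only.
control = [12000, 15000, 11000, 14000, 13000]
treatment = [11000, 10500, 12500, 11500, 10000]

change = pct_change_of_median(control, treatment)
print(f"{change:+.1f}%")  # medians 13000 -> 11000; prints -15.4%
print(meets_target(control, treatment, -10))  # prints True
```

A point estimate alone says nothing about noise, which is why the report pairs the delta with a confidence interval.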
kaizen exp list
kaizen exp status <id>
kaizen exp tag <id> --session <sid> --variant treatment
kaizen exp report <id>
kaizen exp report <id> --json
kaizen exp report <id> --refresh # ingest changed transcript tails if the store may be stale
kaizen exp conclude <id>

Use `exp tag` when binding is manual; use `--bind git` when control and treatment are defined by commits (see the product doc for your workflow).
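With manual binding, tagging many sessions one command at a time gets tedious. Below is a minimal sketch of a batch wrapper. It assumes you already have the session IDs from your own records. It only builds and runs the `kaizen exp tag <id> --session <sid> --variant <variant>` form shown above. The helper names and placeholder IDs are hypothetical, and variant values other than `treatment` are not covered by this section.

```python
import shlex
import subprocess


def tag_commands(exp_id, session_ids, variant):
    """Build one `kaizen exp tag` argv per session, using only the documented flags."""
    return [
        ["kaizen", "exp", "tag", exp_id, "--session", sid, "--variant", variant]
        for sid in session_ids
    ]


def run_tags(exp_id, session_ids, variant, dry_run=True):
    """Print the commands (dry run, the default) or execute them one by one."""
    for argv in tag_commands(exp_id, session_ids, variant):
        if dry_run:
            print(shlex.join(argv))
        else:
            # check=True stops at the first command that exits non-zero
            subprocess.run(argv, check=True)


# Placeholder IDs: substitute your experiment ID and real session IDs.
run_tags("EXP_ID", ["SESSION_1", "SESSION_2"], "treatment")
```

Defaulting to a dry run lets you review the exact commands before anything touches the experiment store.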
- Create a draft experiment with a hypothesis you actually believe.
- Run `exp list` and `exp status` until you understand the state transitions.
- When you have enough tagged sessions, run `exp report` and read the CI around the median delta.
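To make "the CI around the median delta" concrete, here is a percentile-bootstrap sketch for `median(treatment) - median(control)`. It illustrates the general technique named in the introduction and is not kaizen's implementation. The resample count, the seed, the 95% level, and the sample data are assumptions. If the whole interval sits below zero, the treatment lowered the metric. If it straddles zero, there is no clear effect yet.

```python
import random
from statistics import median, quantiles


def bootstrap_median_delta_ci(control, treatment, n_resamples=2000, seed=0):
    """Return (point, low, high): the observed median delta and a 95% percentile
    bootstrap CI. Each resample draws both groups independently, with replacement."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    deltas = []
    for _ in range(n_resamples):
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        deltas.append(median(t) - median(c))
    # quantiles(n=40) returns 39 cut points spaced 2.5% apart: the first is the
    # 2.5th percentile and the last is the 97.5th, i.e. a 95% interval.
    cuts = quantiles(deltas, n=40, method="inclusive")
    point = median(treatment) - median(control)
    return point, cuts[0], cuts[-1]


# Hypothetical tokens_per_session samples, for illustration only.
control = [12000, 15000, 11000, 14000, 13000, 12500, 13500]
treatment = [11000, 10500, 12500, 11500, 10000, 11200, 10800]

point, low, high = bootstrap_median_delta_ci(control, treatment)
print(f"median delta {point:+.0f} tokens, 95% CI [{low:+.0f}, {high:+.0f}]")
if high < 0:
    print("CI is entirely below zero: treatment lowered the metric")
elif low > 0:
    print("CI is entirely above zero: treatment raised the metric")
else:
    print("CI includes zero: no clear effect yet")
```

With only a handful of sessions per group, the interval will be wide. That is a good reason to wait for more tagged sessions before running `exp conclude`.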