diff --git a/CHANGELOG.md b/CHANGELOG.md
index be47357..1e901d4 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -101,6 +101,22 @@ Read the philosophy: https://garryslist.org/posts/boil-the-ocean
- Added `bin/gstack-slug` helper (5-line bash) with unit tests. Outputs `SLUG=` and `BRANCH=` lines, sanitizes `/` to `-`.
- New TODOs: smart review relevance detection (P3), `/merge` skill for review-gated PR merge (P2).
+## fork: multi-stack portability — 2026-03-17
+
+- **`/ship` works on Node.js, TypeScript, and Python projects now.** It used to hardcode `bin/test-lane` (Rails) + `npm run test`. Now it auto-detects your test runner: vitest, jest, pytest, make test, or the Rails test lane. If you have both a frontend and backend test suite, they run in parallel. No config needed.
+- **`/ship` respects your version format.** Projects using standard 3-digit semver (`1.2.3`) are handled correctly — no more confusion from the internal 4-digit format. If you have no `VERSION` file at all, the version step is silently skipped.
+- **Eval suites in `/ship` are now Rails-only.** The eval gate only activates when a `test/evals/` directory and Rails are both detected. TypeScript and Python projects skip it cleanly with a one-line message instead of failing.
+- **`/retro` shows your local time, not Pacific.** Timestamps, histograms, session times, and streak dates are now displayed in your system's timezone. Works on macOS and Linux.
+- **`/qa` no longer hard-fails on uncommitted changes.** Instead of refusing to start, it offers to stash your changes, run QA, and restore them automatically when done. The fix loop still stays atomic — the stash just handles the setup for you.
+- **`/plan-ceo-review` system audit works on any stack.** The pre-review grep for TODOs/FIXMEs now covers `.py`, `.ts`, `.tsx`, `.go`, `.rs` alongside `.rb`. Recently-touched-files detection picks the right lock file anchor automatically (pnpm-lock, poetry.lock, go.sum, etc.).
+- **Diff-aware QA and design review use the correct base branch.** Previously hardcoded to `main` — now uses the dynamically detected base branch, so stacked PRs and `master`-default repos work correctly.
+
+### For contributors
+
+- Added `{{PROJECT_DETECT}}` resolver to `gen-skill-docs.ts` — stack detection block (test runner, VERSION format, languages, eval suite flag) injected into `/ship`. Follows the `{{QA_METHODOLOGY}}` / `{{DESIGN_METHODOLOGY}}` pattern.
+- Fixed unescaped `${}` shell interpolations in TypeScript template literals.
+- `{{QA_METHODOLOGY}}` and `{{DESIGN_METHODOLOGY}}` diff-aware mode: replaced hardcoded `main` with a placeholder for the detected base branch.
+
## 0.5.0 — 2026-03-16
- **Your site just got a design review.** `/plan-design-review` opens your site and reviews it like a senior product designer — typography, spacing, hierarchy, color, responsive, interactions, and AI slop detection. Get letter grades (A-F) per category, a dual headline "Design Score" + "AI Slop Score", and a structured first impression that doesn't pull punches.
diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md
index d7953a9..3bde692 100644
--- a/plan-ceo-review/SKILL.md
+++ b/plan-ceo-review/SKILL.md
@@ -187,8 +187,32 @@ Run the following commands:
git log --oneline -30 # Recent history
git diff --stat # What's already changed
git stash list # Any stashed work
-grep -r "TODO\|FIXME\|HACK\|XXX" --include="*.rb" --include="*.js" -l
-find . -name "*.rb" -newer Gemfile.lock | head -20 # Recently touched files
+grep -r "TODO\|FIXME\|HACK\|XXX" \
+ --include="*.rb" --include="*.py" --include="*.ts" --include="*.tsx" \
+ --include="*.js" --include="*.jsx" --include="*.go" --include="*.rs" \
+ -l 2>/dev/null | head -20 # Files with deferred work
+```
+
+Detect recently touched source files based on the project's lock/manifest file:
+```bash
+# Use the first lock/manifest file that exists (in priority order) as the freshness anchor
+_ANCHOR=""
+[ -f Gemfile.lock ] && _ANCHOR="Gemfile.lock"
+[ -z "$_ANCHOR" ] && [ -f pnpm-lock.yaml ] && _ANCHOR="pnpm-lock.yaml"
+[ -z "$_ANCHOR" ] && [ -f package-lock.json ] && _ANCHOR="package-lock.json"
+[ -z "$_ANCHOR" ] && [ -f poetry.lock ] && _ANCHOR="poetry.lock"
+[ -z "$_ANCHOR" ] && [ -f pyproject.toml ] && _ANCHOR="pyproject.toml"
+[ -z "$_ANCHOR" ] && [ -f go.sum ] && _ANCHOR="go.sum"
+[ -z "$_ANCHOR" ] && [ -f package.json ] && _ANCHOR="package.json"
+
+if [ -n "$_ANCHOR" ]; then
+ find . \( -name "*.rb" -o -name "*.py" -o -name "*.ts" -o -name "*.tsx" \
+ -o -name "*.js" -o -name "*.jsx" -o -name "*.go" -o -name "*.rs" \) \
+ -newer "$_ANCHOR" \
+ -not -path "*/node_modules/*" -not -path "*/.git/*" \
+ -not -path "*/__pycache__/*" -not -path "*/dist/*" -not -path "*/build/*" \
+ 2>/dev/null | head -20
+fi
```
Then read CLAUDE.md, TODOS.md, and any existing architecture docs. When reading TODOS.md, specifically:
* Note any TODOs this plan touches, blocks, or unlocks
diff --git a/plan-ceo-review/SKILL.md.tmpl b/plan-ceo-review/SKILL.md.tmpl
index 8695dd8..80e0f6a 100644
--- a/plan-ceo-review/SKILL.md.tmpl
+++ b/plan-ceo-review/SKILL.md.tmpl
@@ -66,8 +66,32 @@ Run the following commands:
git log --oneline -30 # Recent history
git diff --stat # What's already changed
git stash list # Any stashed work
-grep -r "TODO\|FIXME\|HACK\|XXX" --include="*.rb" --include="*.js" -l
-find . -name "*.rb" -newer Gemfile.lock | head -20 # Recently touched files
+grep -r "TODO\|FIXME\|HACK\|XXX" \
+ --include="*.rb" --include="*.py" --include="*.ts" --include="*.tsx" \
+ --include="*.js" --include="*.jsx" --include="*.go" --include="*.rs" \
+ -l 2>/dev/null | head -20 # Files with deferred work
+```
+
+Detect recently touched source files based on the project's lock/manifest file:
+```bash
+# Use the first lock/manifest file that exists (in priority order) as the freshness anchor
+_ANCHOR=""
+[ -f Gemfile.lock ] && _ANCHOR="Gemfile.lock"
+[ -z "$_ANCHOR" ] && [ -f pnpm-lock.yaml ] && _ANCHOR="pnpm-lock.yaml"
+[ -z "$_ANCHOR" ] && [ -f package-lock.json ] && _ANCHOR="package-lock.json"
+[ -z "$_ANCHOR" ] && [ -f poetry.lock ] && _ANCHOR="poetry.lock"
+[ -z "$_ANCHOR" ] && [ -f pyproject.toml ] && _ANCHOR="pyproject.toml"
+[ -z "$_ANCHOR" ] && [ -f go.sum ] && _ANCHOR="go.sum"
+[ -z "$_ANCHOR" ] && [ -f package.json ] && _ANCHOR="package.json"
+
+if [ -n "$_ANCHOR" ]; then
+ find . \( -name "*.rb" -o -name "*.py" -o -name "*.ts" -o -name "*.tsx" \
+ -o -name "*.js" -o -name "*.jsx" -o -name "*.go" -o -name "*.rs" \) \
+ -newer "$_ANCHOR" \
+ -not -path "*/node_modules/*" -not -path "*/.git/*" \
+ -not -path "*/__pycache__/*" -not -path "*/dist/*" -not -path "*/build/*" \
+ 2>/dev/null | head -20
+fi
```
Then read CLAUDE.md, TODOS.md, and any existing architecture docs. When reading TODOS.md, specifically:
* Note any TODOs this plan touches, blocks, or unlocks
diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md
index f0b2fdd..87f8a82 100644
--- a/plan-design-review/SKILL.md
+++ b/plan-design-review/SKILL.md
@@ -187,7 +187,7 @@ Comprehensive review: 10-15 pages, every interaction flow, exhaustive checklist.
### Diff-aware (automatic when on a feature branch with no URL)
When on a feature branch, scope to pages affected by the branch changes:
-1. Analyze the branch diff: `git diff main...HEAD --name-only`
+1. Analyze the branch diff: `git diff ...HEAD --name-only` (use base branch from preamble)
2. Map changed files to affected pages/routes
3. Detect running app on common local ports (3000, 4000, 8080)
4. Audit only affected pages, compare design quality before/after
diff --git a/qa-design-review/SKILL.md b/qa-design-review/SKILL.md
index 1d6200c..38a965d 100644
--- a/qa-design-review/SKILL.md
+++ b/qa-design-review/SKILL.md
@@ -355,7 +355,7 @@ Comprehensive review: 10-15 pages, every interaction flow, exhaustive checklist.
### Diff-aware (automatic when on a feature branch with no URL)
When on a feature branch, scope to pages affected by the branch changes:
-1. Analyze the branch diff: `git diff main...HEAD --name-only`
+1. Analyze the branch diff: `git diff ...HEAD --name-only` (use base branch from preamble)
2. Map changed files to affected pages/routes
3. Detect running app on common local ports (3000, 4000, 8080)
4. Audit only affected pages, compare design quality before/after
diff --git a/qa-only/SKILL.md b/qa-only/SKILL.md
index 594979b..b06a220 100644
--- a/qa-only/SKILL.md
+++ b/qa-only/SKILL.md
@@ -187,10 +187,10 @@ Before falling back to git diff heuristics, check for richer test plan sources:
This is the **primary mode** for developers verifying their work. When the user says `/qa` without a URL and the repo is on a feature branch, automatically:
-1. **Analyze the branch diff** to understand what changed:
+1. **Analyze the branch diff** to understand what changed (use the base branch detected in the preamble, defaulting to `main`):
```bash
- git diff main...HEAD --name-only
- git log main..HEAD --oneline
+ git diff ...HEAD --name-only
+ git log ..HEAD --oneline
```
2. **Identify affected pages/routes** from the changed files:
diff --git a/qa/SKILL.md b/qa/SKILL.md
index 10e5071..eba8130 100644
--- a/qa/SKILL.md
+++ b/qa/SKILL.md
@@ -168,14 +168,21 @@ You are a QA engineer AND a bug-fix engineer. Test web applications like a real
**If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below). This is the most common case — the user just shipped code on a branch and wants to verify it works.
-**Require clean working tree before starting:**
+**Check working tree state:**
```bash
-if [ -n "$(git status --porcelain)" ]; then
- echo "ERROR: Working tree is dirty. Commit or stash changes before running /qa."
- exit 1
-fi
+_DIRTY=$(git status --porcelain 2>/dev/null)
+[ -n "$_DIRTY" ] && echo "DIRTY_WORKING_TREE:" && echo "$_DIRTY" | head -10 || echo "CLEAN"
```
+If `DIRTY_WORKING_TREE` is printed, use AskUserQuestion:
+- Re-ground: project, branch, the uncommitted changes shown above
+- Explain: there are unsaved changes that would mix with QA's own commits if not handled first
+- `RECOMMENDATION: Choose A — stashing is safe and reversible`
+- A) Stash now (`git stash`) and proceed — stash will be restored automatically after Phase 10
+- B) Cancel — I'll commit or stash manually and re-run `/qa`
+
+If the user chooses A: run `git stash` and note that `git stash pop` will be run automatically after Phase 10 completes. If B: stop.
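
The stash-and-restore flow described above can be sketched as a small wrapper. This is a minimal illustration, not the skill's actual implementation: `run_with_stash` is a hypothetical name, and it uses `git stash push --include-untracked` (rather than the plain `git stash` named above) on the assumption that untracked files should also be set aside, since `git status --porcelain` reports them.

```bash
# Hypothetical helper (not part of the skill): stash uncommitted changes,
# run the given command, then restore the stash and return the command's status.
run_with_stash() {
  local stashed=0 status=0
  if [ -n "$(git status --porcelain 2>/dev/null)" ]; then
    # Assumption: untracked files are stashed too, so the tree is fully clean.
    git stash push --include-untracked --quiet || return 1
    stashed=1
  fi
  "$@" || status=$?
  if [ "$stashed" -eq 1 ]; then
    # May conflict if the command committed changes to the same files.
    git stash pop --quiet || return 1
  fi
  return "$status"
}
```

Restoring only when a stash was actually created avoids popping an unrelated, older stash entry when the tree was already clean.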
+
**Find the browse binary:**
## SETUP (run this check BEFORE any browse command)
@@ -382,10 +389,10 @@ Before falling back to git diff heuristics, check for richer test plan sources:
This is the **primary mode** for developers verifying their work. When the user says `/qa` without a URL and the repo is on a feature branch, automatically:
-1. **Analyze the branch diff** to understand what changed:
+1. **Analyze the branch diff** to understand what changed (use the base branch detected in the preamble, defaulting to `main`):
```bash
- git diff main...HEAD --name-only
- git log main..HEAD --oneline
+ git diff ...HEAD --name-only
+ git log ..HEAD --oneline
```
2. **Identify affected pages/routes** from the changed files:
@@ -860,7 +867,7 @@ If the repo has a `TODOS.md`:
## Additional Rules (qa-specific)
-11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
+11. **Clean working tree required.** If `git status --porcelain` is non-empty, offer to stash (AskUserQuestion A/B). Never start the fix loop with uncommitted changes present — QA's own commits must be atomic and isolated. If stash was used, run `git stash pop` after Phase 10.
12. **One commit per fix.** Never bundle multiple fixes into one commit.
13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files.
14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
diff --git a/qa/SKILL.md.tmpl b/qa/SKILL.md.tmpl
index bd94deb..0644686 100644
--- a/qa/SKILL.md.tmpl
+++ b/qa/SKILL.md.tmpl
@@ -47,14 +47,21 @@ You are a QA engineer AND a bug-fix engineer. Test web applications like a real
**If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below). This is the most common case — the user just shipped code on a branch and wants to verify it works.
-**Require clean working tree before starting:**
+**Check working tree state:**
```bash
-if [ -n "$(git status --porcelain)" ]; then
- echo "ERROR: Working tree is dirty. Commit or stash changes before running /qa."
- exit 1
-fi
+_DIRTY=$(git status --porcelain 2>/dev/null)
+[ -n "$_DIRTY" ] && echo "DIRTY_WORKING_TREE:" && echo "$_DIRTY" | head -10 || echo "CLEAN"
```
+If `DIRTY_WORKING_TREE` is printed, use AskUserQuestion:
+- Re-ground: project, branch, the uncommitted changes shown above
+- Explain: there are unsaved changes that would mix with QA's own commits if not handled first
+- `RECOMMENDATION: Choose A — stashing is safe and reversible`
+- A) Stash now (`git stash`) and proceed — stash will be restored automatically after Phase 10
+- B) Cancel — I'll commit or stash manually and re-run `/qa`
+
+If the user chooses A: run `git stash` and note that `git stash pop` will be run automatically after Phase 10 completes. If B: stop.
+
**Find the browse binary:**
{{BROWSE_SETUP}}
@@ -298,7 +305,7 @@ If the repo has a `TODOS.md`:
## Additional Rules (qa-specific)
-11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
+11. **Clean working tree required.** If `git status --porcelain` is non-empty, offer to stash (AskUserQuestion A/B). Never start the fix loop with uncommitted changes present — QA's own commits must be atomic and isolated. If stash was used, run `git stash pop` after Phase 10.
12. **One commit per fix.** Never bundle multiple fixes into one commit.
13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files.
14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
diff --git a/retro/SKILL.md b/retro/SKILL.md
index 71eab98..a932598 100644
--- a/retro/SKILL.md
+++ b/retro/SKILL.md
@@ -146,7 +146,9 @@ When the user types `/retro`, run this skill.
## Instructions
-Parse the argument to determine the time window. Default to 7 days if no argument given. Use `--since="N days ago"`, `--since="N hours ago"`, or `--since="N weeks ago"` (for `w` units) for git log queries. All times should be reported in **Pacific time** (use `TZ=America/Los_Angeles` when converting timestamps).
+Parse the argument to determine the time window. Default to 7 days if no argument given. Use `--since="N days ago"`, `--since="N hours ago"`, or `--since="N weeks ago"` (for `w` units) for git log queries.
+
+**Timezone:** detect the system's local timezone once at the start of Step 1, then use it for all timestamp conversions. All displayed times should use the local timezone (not hardcoded Pacific).
**Argument validation:** If the argument doesn't match a number followed by `d`, `h`, or `w`, the word `compare`, or `compare` followed by a number and `d`/`h`/`w`, show this usage and stop:
```
@@ -161,14 +163,22 @@ Usage: /retro [window]
### Step 1: Gather Raw Data
-First, fetch origin and identify the current user:
+First, detect the local timezone, fetch origin, and identify the current user:
```bash
+# Detect local timezone (macOS + Linux compatible)
+_TZ=$(readlink /etc/localtime 2>/dev/null | sed -n 's|.*/zoneinfo/||p')
+[ -z "$_TZ" ] && _TZ=$(cat /etc/timezone 2>/dev/null)
+[ -z "$_TZ" ] && _TZ=$(timedatectl show --property=Timezone --value 2>/dev/null)
+_TZ="${_TZ:-UTC}"
+echo "LOCAL_TZ: $_TZ"
git fetch origin --quiet
# Identify who is running the retro
git config user.name
git config user.email
```
+Use the printed `LOCAL_TZ` value for all timestamp conversions in subsequent steps.
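
Before the detected value is used as `TZ=`, it may be worth checking that it names a real zone, since many systems silently treat an unknown `TZ` value as UTC. The sketch below is one way to do that; `resolve_tz` is a hypothetical helper, and the default `/usr/share/zoneinfo` location is an assumption that does not hold on every system.

```bash
# Hypothetical helper (not part of the skill): return the zone name if a
# matching zoneinfo file exists, otherwise fall back to UTC.
# The default zoneinfo directory is an assumption; pass another as $2.
resolve_tz() {
  local tz=$1 zoneinfo_dir=${2:-/usr/share/zoneinfo}
  if [ -n "$tz" ] && [ -f "$zoneinfo_dir/$tz" ]; then
    echo "$tz"
  else
    echo "UTC"
  fi
}
```

A possible use is `_TZ=$(resolve_tz "$_TZ")` followed by `TZ="$_TZ" date +%Y-%m-%d`, so that an empty or misspelled zone is reported as `UTC` explicitly rather than implicitly.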
+
The name returned by `git config user.name` is **"you"** — the person reading this retro. All other authors are teammates. Use this to orient the narrative: "your" commits vs teammate contributions.
Run ALL of these git commands in parallel (they are independent):
@@ -183,8 +193,8 @@ git log origin/ --since="" --format="%H|%aN|%ae|%ai|%s" --short
git log origin/ --since="" --format="COMMIT:%H|%aN" --numstat
# 3. Commit timestamps for session detection and hourly distribution (with author)
-# Use TZ=America/Los_Angeles for Pacific time conversion
-TZ=America/Los_Angeles git log origin/ --since="" --format="%at|%aN|%ai|%s" | sort -n
+# Use LOCAL_TZ detected above for time conversion
+TZ= git log origin/ --since="" --date=iso-local --format="%at|%aN|%ad|%s" | sort -n
# 4. Files most frequently changed (hotspot analysis)
git log origin/ --since="" --format="" --name-only | grep -v '^$' | sort | uniq -c | sort -rn
@@ -264,7 +274,7 @@ If TODOS.md doesn't exist, skip the Backlog Health row.
### Step 3: Commit Time Distribution
-Show hourly histogram in Pacific time using bar chart:
+Show hourly histogram in local time using bar chart:
```
Hour Commits ████████████████
@@ -282,7 +292,7 @@ Identify and call out:
### Step 4: Work Session Detection
Detect sessions using **45-minute gap** threshold between consecutive commits. For each session report:
-- Start/end time (Pacific)
+- Start/end time (local timezone)
- Number of commits
- Duration in minutes
@@ -368,11 +378,11 @@ If the time window is 14 days or more, split into weekly buckets and show trends
Count consecutive days with at least 1 commit to origin/, going back from today. Track both team streak and personal streak:
```bash
-# Team streak: all unique commit dates (Pacific time) — no hard cutoff
-TZ=America/Los_Angeles git log origin/ --format="%ad" --date=format:"%Y-%m-%d" | sort -u
+# Team streak: all unique commit dates (local timezone) — no hard cutoff
+TZ= git log origin/ --format="%ad" --date=format-local:"%Y-%m-%d" | sort -u
# Personal streak: only the current user's commits
-TZ=America/Los_Angeles git log origin/ --author="" --format="%ad" --date=format:"%Y-%m-%d" | sort -u
+TZ= git log origin/ --author="" --format="%ad" --date=format-local:"%Y-%m-%d" | sort -u
```
Count backward from today — how many consecutive days have at least one commit? This queries the full history so streaks of any length are reported accurately. Display both:
@@ -411,7 +421,7 @@ mkdir -p .context/retros
Determine the next sequence number for today (substitute the actual date for `$(date +%Y-%m-%d)`):
```bash
# Count existing retros for today to get next sequence number
-today=$(TZ=America/Los_Angeles date +%Y-%m-%d)
+today=$(TZ= date +%Y-%m-%d)
existing=$(ls .context/retros/${today}-*.json 2>/dev/null | wc -l | tr -d ' ')
next=$((existing + 1))
# Save as .context/retros/${today}-${next}.json
@@ -608,7 +618,7 @@ When the user runs `/retro compare` (or `/retro compare 14d`):
- ALL narrative output goes directly to the user in the conversation. The ONLY file written is the `.context/retros/` JSON snapshot.
- Use `origin/` for all git queries (not local main which may be stale)
-- Convert all timestamps to Pacific time for display (use `TZ=America/Los_Angeles`)
+- Convert all timestamps to local time for display (use `TZ=` detected in Step 1)
- If the window has zero commits, say so and suggest a different window
- Round LOC/hour to nearest 50
- Treat merge commits as PR boundaries
diff --git a/retro/SKILL.md.tmpl b/retro/SKILL.md.tmpl
index bfbc200..8aa4d38 100644
--- a/retro/SKILL.md.tmpl
+++ b/retro/SKILL.md.tmpl
@@ -42,7 +42,9 @@ When the user types `/retro`, run this skill.
## Instructions
-Parse the argument to determine the time window. Default to 7 days if no argument given. Use `--since="N days ago"`, `--since="N hours ago"`, or `--since="N weeks ago"` (for `w` units) for git log queries. All times should be reported in **Pacific time** (use `TZ=America/Los_Angeles` when converting timestamps).
+Parse the argument to determine the time window. Default to 7 days if no argument given. Use `--since="N days ago"`, `--since="N hours ago"`, or `--since="N weeks ago"` (for `w` units) for git log queries.
+
+**Timezone:** detect the system's local timezone once at the start of Step 1, then use it for all timestamp conversions. All displayed times should use the local timezone (not hardcoded Pacific).
**Argument validation:** If the argument doesn't match a number followed by `d`, `h`, or `w`, the word `compare`, or `compare` followed by a number and `d`/`h`/`w`, show this usage and stop:
```
@@ -57,14 +59,22 @@ Usage: /retro [window]
### Step 1: Gather Raw Data
-First, fetch origin and identify the current user:
+First, detect the local timezone, fetch origin, and identify the current user:
```bash
+# Detect local timezone (macOS + Linux compatible)
+_TZ=$(readlink /etc/localtime 2>/dev/null | sed -n 's|.*/zoneinfo/||p')
+[ -z "$_TZ" ] && _TZ=$(cat /etc/timezone 2>/dev/null)
+[ -z "$_TZ" ] && _TZ=$(timedatectl show --property=Timezone --value 2>/dev/null)
+_TZ="${_TZ:-UTC}"
+echo "LOCAL_TZ: $_TZ"
git fetch origin --quiet
# Identify who is running the retro
git config user.name
git config user.email
```
+Use the printed `LOCAL_TZ` value for all timestamp conversions in subsequent steps.
+
The name returned by `git config user.name` is **"you"** — the person reading this retro. All other authors are teammates. Use this to orient the narrative: "your" commits vs teammate contributions.
Run ALL of these git commands in parallel (they are independent):
@@ -79,8 +89,8 @@ git log origin/ --since="" --format="%H|%aN|%ae|%ai|%s" --short
git log origin/ --since="" --format="COMMIT:%H|%aN" --numstat
# 3. Commit timestamps for session detection and hourly distribution (with author)
-# Use TZ=America/Los_Angeles for Pacific time conversion
-TZ=America/Los_Angeles git log origin/ --since="" --format="%at|%aN|%ai|%s" | sort -n
+# Use LOCAL_TZ detected above for time conversion
+TZ= git log origin/ --since="" --date=iso-local --format="%at|%aN|%ad|%s" | sort -n
# 4. Files most frequently changed (hotspot analysis)
git log origin/ --since="" --format="" --name-only | grep -v '^$' | sort | uniq -c | sort -rn
@@ -160,7 +170,7 @@ If TODOS.md doesn't exist, skip the Backlog Health row.
### Step 3: Commit Time Distribution
-Show hourly histogram in Pacific time using bar chart:
+Show hourly histogram in local time using bar chart:
```
Hour Commits ████████████████
@@ -178,7 +188,7 @@ Identify and call out:
### Step 4: Work Session Detection
Detect sessions using **45-minute gap** threshold between consecutive commits. For each session report:
-- Start/end time (Pacific)
+- Start/end time (local timezone)
- Number of commits
- Duration in minutes
@@ -264,11 +274,11 @@ If the time window is 14 days or more, split into weekly buckets and show trends
Count consecutive days with at least 1 commit to origin/, going back from today. Track both team streak and personal streak:
```bash
-# Team streak: all unique commit dates (Pacific time) — no hard cutoff
-TZ=America/Los_Angeles git log origin/ --format="%ad" --date=format:"%Y-%m-%d" | sort -u
+# Team streak: all unique commit dates (local timezone) — no hard cutoff
+TZ= git log origin/ --format="%ad" --date=format-local:"%Y-%m-%d" | sort -u
# Personal streak: only the current user's commits
-TZ=America/Los_Angeles git log origin/ --author="" --format="%ad" --date=format:"%Y-%m-%d" | sort -u
+TZ= git log origin/ --author="" --format="%ad" --date=format-local:"%Y-%m-%d" | sort -u
```
Count backward from today — how many consecutive days have at least one commit? This queries the full history so streaks of any length are reported accurately. Display both:
@@ -307,7 +317,7 @@ mkdir -p .context/retros
Determine the next sequence number for today (substitute the actual date for `$(date +%Y-%m-%d)`):
```bash
# Count existing retros for today to get next sequence number
-today=$(TZ=America/Los_Angeles date +%Y-%m-%d)
+today=$(TZ= date +%Y-%m-%d)
existing=$(ls .context/retros/${today}-*.json 2>/dev/null | wc -l | tr -d ' ')
next=$((existing + 1))
# Save as .context/retros/${today}-${next}.json
@@ -504,7 +514,7 @@ When the user runs `/retro compare` (or `/retro compare 14d`):
- ALL narrative output goes directly to the user in the conversation. The ONLY file written is the `.context/retros/` JSON snapshot.
- Use `origin/` for all git queries (not local main which may be stale)
-- Convert all timestamps to Pacific time for display (use `TZ=America/Los_Angeles`)
+- Convert all timestamps to local time for display (use `TZ=` detected in Step 1)
- If the window has zero commits, say so and suggest a different window
- Round LOC/hour to nearest 50
- Treat merge commits as PR boundaries
diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts
index d2e86ec..5183960 100644
--- a/scripts/gen-skill-docs.ts
+++ b/scripts/gen-skill-docs.ts
@@ -249,10 +249,10 @@ function generateQAMethodology(): string {
This is the **primary mode** for developers verifying their work. When the user says \`/qa\` without a URL and the repo is on a feature branch, automatically:
-1. **Analyze the branch diff** to understand what changed:
+1. **Analyze the branch diff** to understand what changed (use the base branch detected in the preamble, defaulting to \`main\`):
\`\`\`bash
- git diff main...HEAD --name-only
- git log main..HEAD --oneline
+ git diff ...HEAD --name-only
+ git log ..HEAD --oneline
\`\`\`
2. **Identify affected pages/routes** from the changed files:
@@ -533,7 +533,7 @@ Comprehensive review: 10-15 pages, every interaction flow, exhaustive checklist.
### Diff-aware (automatic when on a feature branch with no URL)
When on a feature branch, scope to pages affected by the branch changes:
-1. Analyze the branch diff: \`git diff main...HEAD --name-only\`
+1. Analyze the branch diff: \`git diff ...HEAD --name-only\` (use base branch from preamble)
2. Map changed files to affected pages/routes
3. Detect running app on common local ports (3000, 4000, 8080)
4. Audit only affected pages, compare design quality before/after
@@ -1048,6 +1048,79 @@ Only commit if there are changes. Stage all bootstrap files (config, test direct
---`;
}
+function generateProjectDetect(): string {
+ return `## Step 0: Detect project stack
+
+Run this before any stack-sensitive step. Outputs are printed to stdout — use them in prose to
+determine which commands to run in subsequent steps.
+
+\`\`\`bash
+# Package manager
+_PKG_MGR="npm"
+[ -f pnpm-lock.yaml ] && _PKG_MGR="pnpm"
+[ -f yarn.lock ] && _PKG_MGR="yarn"
+[ -f bun.lockb ] && _PKG_MGR="bun"
+
+# Detect test runner (first match wins)
+_TEST_CMD=""
+_TEST_LABEL=""
+if [ -f package.json ]; then
+ if grep -qE '"vitest"' package.json 2>/dev/null || \
+ (node -e "const s=require('./package.json').scripts||{}; process.exit(Object.values(s).some(v=>v.includes('vitest'))?0:1)" 2>/dev/null); then
+ _TEST_CMD="$_PKG_MGR run test" _TEST_LABEL="vitest"
+ elif grep -qE '"jest"' package.json 2>/dev/null; then
+ _TEST_CMD="$_PKG_MGR run test" _TEST_LABEL="jest"
+ elif node -e "const s=require('./package.json').scripts||{}; process.exit(s.test?0:1)" 2>/dev/null; then
+ _TEST_CMD="$_PKG_MGR run test" _TEST_LABEL="npm-script"
+ fi
+fi
+if [ -z "$_TEST_CMD" ]; then
+ if command -v pytest &>/dev/null && \
+ ([ -f pytest.ini ] || [ -f setup.cfg ] || ([ -f pyproject.toml ] && grep -qF '[tool.pytest' pyproject.toml 2>/dev/null)); then
+ _TEST_CMD="pytest" _TEST_LABEL="pytest"
+ fi
+fi
+if [ -z "$_TEST_CMD" ] && [ -x bin/test-lane ]; then
+ _TEST_CMD="bin/test-lane" _TEST_LABEL="rails-test-lane"
+fi
+if [ -z "$_TEST_CMD" ] && [ -f Makefile ] && grep -q '^test:' Makefile 2>/dev/null; then
+ _TEST_CMD="make test" _TEST_LABEL="makefile"
+fi
+echo "TEST_CMD: \${_TEST_CMD:-NONE_DETECTED} (\$_TEST_LABEL)"
+
+# VERSION format
+if [ -f VERSION ]; then
+ _VER=$(cat VERSION | tr -d '[:space:]')
+ _VER_DIGITS=$(echo "\$_VER" | awk -F. '{print NF}')
+ echo "VERSION: \$_VER (\${_VER_DIGITS}-digit)"
+else
+ echo "VERSION: NO_VERSION_FILE"
+fi
+
+# Project language(s)
+_LANGS=""
+[ -f package.json ] && _LANGS="\${_LANGS}nodejs "
+[ -f pyproject.toml ] || [ -f requirements.txt ] && _LANGS="\${_LANGS}python "
+[ -f Gemfile ] && _LANGS="\${_LANGS}ruby "
+[ -f go.mod ] && _LANGS="\${_LANGS}go "
+[ -f Cargo.toml ] && _LANGS="\${_LANGS}rust "
+echo "LANGUAGES: \${_LANGS:-unknown}"
+
+# Eval suite (Rails only)
+_HAS_EVALS=""
+[ -d test/evals ] && [ -f Gemfile ] && grep -q 'rails' Gemfile 2>/dev/null && _HAS_EVALS="yes"
+echo "EVAL_SUITE: \${_HAS_EVALS:-no}"
+\`\`\`
+
+Use the printed values in all subsequent steps:
+- **TEST_CMD** — the command to run the test suite; if \`NONE_DETECTED\`, skip tests and note it
+- **VERSION** — format determines bump logic (3-digit = semver, 4-digit = extended)
+- **LANGUAGES** — determines which file patterns to use in grep/find commands
+- **EVAL_SUITE** — only run eval suites when this is \`yes\`
+
+---`;
+}
+
const RESOLVERS: Record<string, () => string> = {
COMMAND_REFERENCE: generateCommandReference,
SNAPSHOT_FLAGS: generateSnapshotFlags,
@@ -1058,6 +1131,7 @@ const RESOLVERS: Record<string, () => string> = {
DESIGN_METHODOLOGY: generateDesignMethodology,
REVIEW_DASHBOARD: generateReviewDashboard,
TEST_BOOTSTRAP: generateTestBootstrap,
+ PROJECT_DETECT: generateProjectDetect,
};
// ─── Template Processing ────────────────────────────────────
diff --git a/ship/SKILL.md b/ship/SKILL.md
index e2b524d..af6bc16 100644
--- a/ship/SKILL.md
+++ b/ship/SKILL.md
@@ -139,6 +139,75 @@ branch name wherever the instructions say "the base branch."
---
+## Step 0: Detect project stack
+
+Run this before any stack-sensitive step. Outputs are printed to stdout — use them in prose to
+determine which commands to run in subsequent steps.
+
+```bash
+# Package manager
+_PKG_MGR="npm"
+[ -f pnpm-lock.yaml ] && _PKG_MGR="pnpm"
+[ -f yarn.lock ] && _PKG_MGR="yarn"
+[ -f bun.lockb ] && _PKG_MGR="bun"
+
+# Detect test runner (first match wins)
+_TEST_CMD=""
+_TEST_LABEL=""
+if [ -f package.json ]; then
+ if grep -qE '"vitest"' package.json 2>/dev/null || (node -e "const s=require('./package.json').scripts||{}; process.exit(Object.values(s).some(v=>v.includes('vitest'))?0:1)" 2>/dev/null); then
+ _TEST_CMD="$_PKG_MGR run test" _TEST_LABEL="vitest"
+ elif grep -qE '"jest"' package.json 2>/dev/null; then
+ _TEST_CMD="$_PKG_MGR run test" _TEST_LABEL="jest"
+ elif node -e "const s=require('./package.json').scripts||{}; process.exit(s.test?0:1)" 2>/dev/null; then
+ _TEST_CMD="$_PKG_MGR run test" _TEST_LABEL="npm-script"
+ fi
+fi
+if [ -z "$_TEST_CMD" ]; then
+ if command -v pytest &>/dev/null && ([ -f pytest.ini ] || [ -f setup.cfg ] || ([ -f pyproject.toml ] && grep -qF '[tool.pytest' pyproject.toml 2>/dev/null)); then
+ _TEST_CMD="pytest" _TEST_LABEL="pytest"
+ fi
+fi
+if [ -z "$_TEST_CMD" ] && [ -x bin/test-lane ]; then
+ _TEST_CMD="bin/test-lane" _TEST_LABEL="rails-test-lane"
+fi
+if [ -z "$_TEST_CMD" ] && [ -f Makefile ] && grep -q '^test:' Makefile 2>/dev/null; then
+ _TEST_CMD="make test" _TEST_LABEL="makefile"
+fi
+echo "TEST_CMD: ${_TEST_CMD:-NONE_DETECTED} ($_TEST_LABEL)"
+
+# VERSION format
+if [ -f VERSION ]; then
+ _VER=$(cat VERSION | tr -d '[:space:]')
+ _VER_DIGITS=$(echo "$_VER" | awk -F. '{print NF}')
+ echo "VERSION: $_VER (${_VER_DIGITS}-digit)"
+else
+ echo "VERSION: NO_VERSION_FILE"
+fi
+
+# Project language(s)
+_LANGS=""
+[ -f package.json ] && _LANGS="${_LANGS}nodejs "
+[ -f pyproject.toml ] || [ -f requirements.txt ] && _LANGS="${_LANGS}python "
+[ -f Gemfile ] && _LANGS="${_LANGS}ruby "
+[ -f go.mod ] && _LANGS="${_LANGS}go "
+[ -f Cargo.toml ] && _LANGS="${_LANGS}rust "
+echo "LANGUAGES: ${_LANGS:-unknown}"
+
+# Eval suite (Rails only)
+_HAS_EVALS=""
+[ -d test/evals ] && [ -f Gemfile ] && grep -q 'rails' Gemfile 2>/dev/null && _HAS_EVALS="yes"
+echo "EVAL_SUITE: ${_HAS_EVALS:-no}"
+```
+
+Use the printed values in all subsequent steps:
+- **TEST_CMD** — the command to run the test suite; if `NONE_DETECTED`, skip tests and note it
+- **VERSION** — format determines bump logic (3-digit = semver, 4-digit = extended)
+- **LANGUAGES** — determines which file patterns to use in grep/find commands
+- **EVAL_SUITE** — only run eval suites when this is `yes`
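
As an illustration of how the **VERSION** output might drive bump logic, the sketch below classifies a version by its number of dot-separated parts and increments the last part. Both function names are hypothetical, and "bump the last component" is an assumed policy; the actual bump rules for the 3-digit and 4-digit formats are not spelled out in this step.

```bash
# Hypothetical helpers (not part of the skill).

# Classify a version string by its number of dot-separated parts,
# mirroring the "(N-digit)" label printed by the detection block.
version_kind() {
  case "$(echo "$1" | awk -F. '{print NF}')" in
    3) echo "semver" ;;
    4) echo "extended" ;;
    *) echo "unknown" ;;
  esac
}

# Assumed policy: increment the last component, whatever the format.
# A non-numeric suffix (e.g. "3-beta") is coerced to its leading number by awk.
bump_last_component() {
  echo "$1" | awk -F. -v OFS=. '{ $NF = $NF + 1; print }'
}
```

With these definitions, `bump_last_component 1.2.3` prints `1.2.4` and `bump_last_component 0.5.0.9` prints `0.5.0.10`.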
+
+---
+
# Ship: Fully Automated Ship Workflow
You are running the `/ship` workflow. This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step. The user said `/ship` which means DO IT. Run straight through and output the PR URL at the end.
@@ -410,19 +479,28 @@ Only commit if there are changes. Stage all bootstrap files (config, test direct
## Step 3: Run tests (on merged code)
-**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
-`db:test:prepare` internally, which loads the schema into the correct lane database.
-Running bare test migrations without INSTANCE hits an orphan DB and corrupts structure.sql.
+Use the **TEST_CMD** detected in Step 0.
+
+**If TEST_CMD is `NONE_DETECTED`:** Note "No test runner detected — skipping tests." and continue to Step 3.25.
-Run both test suites in parallel:
+**If LANGUAGES includes `nodejs` AND another language (e.g., `python` or `ruby`):** run both test suites in parallel:
```bash
-bin/test-lane 2>&1 | tee /tmp/ship_tests.txt &
-npm run test 2>&1 | tee /tmp/ship_vitest.txt &
+# Run the detected suites in parallel: put one detected TEST_CMD in front of each redirect below
+ 2>&1 | tee /tmp/ship_tests_1.txt &
+ 2>&1 | tee /tmp/ship_tests_2.txt &
wait
```
-After both complete, read the output files and check pass/fail.
+**If only one test suite:** run it directly:
+
+```bash
+ 2>&1 | tee /tmp/ship_tests.txt
+```
+
+**Rails note:** If TEST_CMD is `bin/test-lane`, do NOT run `RAILS_ENV=test bin/rails db:migrate` separately — `bin/test-lane` calls `db:test:prepare` internally.
+
+After the run completes, read the output file(s) and check pass/fail.
**If any test fails:** Show the failures and **STOP**. Do not proceed.
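
One thing to watch with the parallel form: a bare `wait` returns 0, and a pipeline into `tee` reports `tee`'s status unless `pipefail` is set. The step above checks pass/fail by reading the output files; the sketch below shows an alternative that also keeps each suite's exit status. The `suite_1` and `suite_2` commands are placeholders standing in for the detected TEST_CMDs, and the temp directory is an assumption made to keep the example self-contained.

```shell
#!/usr/bin/env bash
# With pipefail, a pipeline fails when the test command fails, not only when tee fails.
set -o pipefail

# Placeholder commands (assumptions): one passing suite, one failing suite.
suite_1="true"
suite_2="false"

out_dir=$(mktemp -d)

( $suite_1 2>&1 | tee "$out_dir/ship_tests_1.txt" ) &
pid_1=$!
( $suite_2 2>&1 | tee "$out_dir/ship_tests_2.txt" ) &
pid_2=$!

# `wait <pid>` returns that job's exit status.
status_1=0; wait "$pid_1" || status_1=$?
status_2=0; wait "$pid_2" || status_2=$?

echo "suite 1 exit: $status_1"
echo "suite 2 exit: $status_2"
if [ "$status_1" -ne 0 ] || [ "$status_2" -ne 0 ]; then
  echo "At least one suite failed; stop here."
fi
```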
@@ -430,9 +508,13 @@ After both complete, read the output files and check pass/fail.
---
-## Step 3.25: Eval Suites (conditional)
+## Step 3.25: Eval Suites (Rails + eval infrastructure only)
+
+**Skip this step entirely if EVAL_SUITE is `no` (detected in Step 0).** This step only applies to Rails projects with a `test/evals/` directory. Non-Rails projects (Node.js, Python, and so on) skip directly to Step 3.5.
-Evals are mandatory when prompt-related files change. Skip this step entirely if no prompt files are in the diff.
+If EVAL_SUITE is `yes`:
+
+Evals are mandatory when prompt-related files change. Skip if no prompt files are in the diff.
**1. Check if the diff touches prompt-related files:**
@@ -440,14 +522,10 @@ Evals are mandatory when prompt-related files change. Skip this step entirely if
git diff origin/ --name-only
```
-Match against these patterns (from CLAUDE.md):
-- `app/services/*_prompt_builder.rb`
-- `app/services/*_generation_service.rb`, `*_writer_service.rb`, `*_designer_service.rb`
-- `app/services/*_evaluator.rb`, `*_scorer.rb`, `*_classifier_service.rb`, `*_analyzer.rb`
-- `app/services/concerns/*voice*.rb`, `*writing*.rb`, `*prompt*.rb`, `*token*.rb`
-- `app/services/chat_tools/*.rb`, `app/services/x_thread_tools/*.rb`
-- `config/system_prompts/*.txt`
-- `test/evals/**/*` (eval infrastructure changes affect all suites)
+Match against the patterns listed in CLAUDE.md under "Prompt/LLM changes". If CLAUDE.md has no such section, match against:
+- `*prompt*`, `*generation_service*`, `*evaluator*`, `*scorer*`
+- `config/system_prompts/`
+- `test/evals/`
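
The fallback patterns can be checked mechanically. The sketch below is one possible reading of them as a single extended regex, where a glob like `*prompt*` becomes a plain substring match; the file list is made up and stands in for the `git diff ... --name-only` output.

```shell
#!/usr/bin/env bash
# Fallback patterns expressed as one extended regex (an interpretation, not the skill's own code).
prompt_re='prompt|generation_service|evaluator|scorer|^config/system_prompts/|^test/evals/'

# Made-up changed-file list for illustration.
changed='app/models/user.rb
app/services/post_generation_service.rb
README.md'

# grep exits 1 when nothing matches, so guard it to keep the script going.
matches=$(printf '%s\n' "$changed" | grep -E "$prompt_re" || true)

if [ -n "$matches" ]; then
  echo "Prompt-related files changed:"
  echo "$matches"
else
  echo "No prompt-related files changed"
fi
```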
**If no matches:** Print "No prompt-related files changed — skipping evals." and continue to Step 3.5.
@@ -459,31 +537,26 @@ Each eval runner (`test/evals/*_eval_runner.rb`) declares `PROMPT_SOURCE_FILES`
grep -l "changed_file_basename" test/evals/*_eval_runner.rb
```
-Map runner → test file: `post_generation_eval_runner.rb` → `post_generation_eval_test.rb`.
-
**Special cases:**
-- Changes to `test/evals/judges/*.rb`, `test/evals/support/*.rb`, or `test/evals/fixtures/` affect ALL suites that use those judges/support files. Check imports in the eval test files to determine which.
-- Changes to `config/system_prompts/*.txt` — grep eval runners for the prompt filename to find affected suites.
-- If unsure which suites are affected, run ALL suites that could plausibly be impacted. Over-testing is better than missing a regression.
+- Changes to `test/evals/judges/`, `test/evals/support/`, or `test/evals/fixtures/` affect ALL suites.
+- If unsure, run ALL suites that could plausibly be impacted. Over-testing beats missing a regression.
**3. Run affected suites at `EVAL_JUDGE_TIER=full`:**
-`/ship` is a pre-merge gate, so always use full tier (Sonnet structural + Opus persona judges).
-
```bash
EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/_eval_test.rb 2>&1 | tee /tmp/ship_evals.txt
```
-If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.
+Run sequentially. Stop at first failure — don't burn API cost on remaining suites.
**4. Check results:**
-- **If any eval fails:** Show the failures, the cost dashboard, and **STOP**. Do not proceed.
+- **If any eval fails:** Show the failures, the cost dashboard, and **STOP**.
- **If all pass:** Note pass counts and cost. Continue to Step 3.5.
**5. Save eval output** — include eval results and cost dashboard in the PR body (Step 8).
-**Tier reference (for context — /ship always uses `full`):**
+**Tier reference:**
| Tier | When | Speed (cached) | Cost |
|------|------|----------------|------|
| `fast` (Haiku) | Dev iteration, smoke tests | ~5s (14x faster) | ~$0.07/run |
@@ -707,18 +780,34 @@ For each classified comment:
## Step 4: Version bump (auto-decide)
-1. Read the current `VERSION` file (4-digit format: `MAJOR.MINOR.PATCH.MICRO`)
+1. Use the **VERSION** value detected in Step 0.
+ - If `NO_VERSION_FILE`: skip this step entirely. Note "No VERSION file — skipping version bump."
+ - **4-digit format** (`MAJOR.MINOR.PATCH.MICRO`): bump MICRO for tiny changes, PATCH for features/fixes
+ - **3-digit format** (`MAJOR.MINOR.PATCH`): standard semver — bump PATCH for fixes, MINOR for features (ASK), MAJOR for breaking (ASK)
+ - Any other format: treat as 3-digit semver
2. **Auto-decide the bump level based on the diff:**
- - Count lines changed (`git diff origin/...HEAD --stat | tail -1`)
+
+ Count lines changed:
+ ```bash
+ git diff origin/...HEAD --stat | tail -1
+ ```
+
+ **For 4-digit VERSION:**
- **MICRO** (4th digit): < 50 lines changed, trivial tweaks, typos, config
- **PATCH** (3rd digit): 50+ lines changed, bug fixes, small-medium features
- - **MINOR** (2nd digit): **ASK the user** — only for major features or significant architectural changes
- - **MAJOR** (1st digit): **ASK the user** — only for milestones or breaking changes
+ - **MINOR** (2nd digit): **ASK the user** — major features or significant architectural changes
+ - **MAJOR** (1st digit): **ASK the user** — milestones or breaking changes
+
+ **For 3-digit VERSION (standard semver):**
+ - **PATCH** (3rd digit): bug fixes, small changes (< 200 lines)
+ - **MINOR** (2nd digit): **ASK the user** — new features (200+ lines or clear feature addition)
+ - **MAJOR** (1st digit): **ASK the user** — breaking changes only
3. Compute the new version:
- Bumping a digit resets all digits to its right to 0
- - Example: `0.19.1.0` + PATCH → `0.19.2.0`
+ - 4-digit example: `0.19.1.0` + PATCH → `0.19.2.0`
+ - 3-digit example: `1.4.2` + PATCH → `1.4.3`
4. Write the new version to the `VERSION` file.
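
The compute rule in step 3 can be sketched as a small bash function that works for both formats, because it keeps however many components the input has. `bump_version` is a hypothetical helper, not something the skill ships, and it assumes purely numeric components without leading zeros.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bump one component of a dotted version and zero everything to its right.
# $1 = current version, $2 = 1-based position of the component to bump.
bump_version() {
  local IFS=.
  local -a parts
  read -r -a parts <<< "$1"          # split on dots
  local idx=$(( $2 - 1 ))
  parts[idx]=$(( parts[idx] + 1 ))
  local i
  for (( i = idx + 1; i < ${#parts[@]}; i++ )); do
    parts[i]=0
  done
  echo "${parts[*]}"                 # join with dots (first character of IFS)
}

bump_version 0.19.1.0 3   # 4-digit PATCH
bump_version 1.4.2 3      # 3-digit PATCH
bump_version 1.4.2 2      # 3-digit MINOR
```

Passing the position rather than a name such as PATCH keeps the helper format-agnostic: position 3 is PATCH in both formats, and position 4 is MICRO only when a fourth component exists.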
@@ -898,7 +987,7 @@ EOF
- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
- **Never force push.** Use regular `git push` only.
- **Never ask for confirmation** except for MINOR/MAJOR version bumps and pre-landing review ASK items (batched into at most one AskUserQuestion).
-- **Always use the 4-digit version format** from the VERSION file.
+- **Preserve the VERSION file's digit format** (3-digit semver or 4-digit extended). Never change the format.
- **Date format in CHANGELOG:** `YYYY-MM-DD`
- **Split commits for bisectability** — each commit = one logical change.
- **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done.
diff --git a/ship/SKILL.md.tmpl b/ship/SKILL.md.tmpl
index e059fc6..d887d85 100644
--- a/ship/SKILL.md.tmpl
+++ b/ship/SKILL.md.tmpl
@@ -18,6 +18,8 @@ allowed-tools:
{{BASE_BRANCH_DETECT}}
+{{PROJECT_DETECT}}
+
# Ship: Fully Automated Ship Workflow
You are running the `/ship` workflow. This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step. The user said `/ship` which means DO IT. Run straight through and output the PR URL at the end.
@@ -102,19 +104,28 @@ git fetch origin && git merge origin/ --no-edit
## Step 3: Run tests (on merged code)
-**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
-`db:test:prepare` internally, which loads the schema into the correct lane database.
-Running bare test migrations without INSTANCE hits an orphan DB and corrupts structure.sql.
+Use the **TEST_CMD** detected in Step 0.
+
+**If TEST_CMD is `NONE_DETECTED`:** Note "No test runner detected — skipping tests." and continue to Step 3.25.
-Run both test suites in parallel:
+**If LANGUAGES includes `nodejs` AND another language (e.g., `python` or `ruby`):** run both test suites in parallel:
```bash
-bin/test-lane 2>&1 | tee /tmp/ship_tests.txt &
-npm run test 2>&1 | tee /tmp/ship_vitest.txt &
+# Run the detected suites in parallel: put one detected TEST_CMD in front of each redirect below
+ 2>&1 | tee /tmp/ship_tests_1.txt &
+ 2>&1 | tee /tmp/ship_tests_2.txt &
wait
```
-After both complete, read the output files and check pass/fail.
+**If only one test suite:** run it directly:
+
+```bash
+ 2>&1 | tee /tmp/ship_tests.txt
+```
+
+**Rails note:** If TEST_CMD is `bin/test-lane`, do NOT run `RAILS_ENV=test bin/rails db:migrate` separately — `bin/test-lane` calls `db:test:prepare` internally.
+
+After the run completes, read the output file(s) and check pass/fail.
**If any test fails:** Show the failures and **STOP**. Do not proceed.
@@ -122,9 +133,13 @@ After both complete, read the output files and check pass/fail.
---
-## Step 3.25: Eval Suites (conditional)
+## Step 3.25: Eval Suites (Rails + eval infrastructure only)
+
+**Skip this step entirely if EVAL_SUITE is `no` (detected in Step 0).** This step only applies to Rails projects with a `test/evals/` directory. Non-Rails projects (Node.js, Python, and so on) skip directly to Step 3.5.
-Evals are mandatory when prompt-related files change. Skip this step entirely if no prompt files are in the diff.
+If EVAL_SUITE is `yes`:
+
+Evals are mandatory when prompt-related files change. Skip if no prompt files are in the diff.
**1. Check if the diff touches prompt-related files:**
@@ -132,14 +147,10 @@ Evals are mandatory when prompt-related files change. Skip this step entirely if
git diff origin/ --name-only
```
-Match against these patterns (from CLAUDE.md):
-- `app/services/*_prompt_builder.rb`
-- `app/services/*_generation_service.rb`, `*_writer_service.rb`, `*_designer_service.rb`
-- `app/services/*_evaluator.rb`, `*_scorer.rb`, `*_classifier_service.rb`, `*_analyzer.rb`
-- `app/services/concerns/*voice*.rb`, `*writing*.rb`, `*prompt*.rb`, `*token*.rb`
-- `app/services/chat_tools/*.rb`, `app/services/x_thread_tools/*.rb`
-- `config/system_prompts/*.txt`
-- `test/evals/**/*` (eval infrastructure changes affect all suites)
+Match against the patterns listed in CLAUDE.md under "Prompt/LLM changes". If CLAUDE.md has no such section, match against:
+- `*prompt*`, `*generation_service*`, `*evaluator*`, `*scorer*`
+- `config/system_prompts/`
+- `test/evals/`
**If no matches:** Print "No prompt-related files changed — skipping evals." and continue to Step 3.5.
@@ -151,31 +162,26 @@ Each eval runner (`test/evals/*_eval_runner.rb`) declares `PROMPT_SOURCE_FILES`
grep -l "changed_file_basename" test/evals/*_eval_runner.rb
```
-Map runner → test file: `post_generation_eval_runner.rb` → `post_generation_eval_test.rb`.
-
**Special cases:**
-- Changes to `test/evals/judges/*.rb`, `test/evals/support/*.rb`, or `test/evals/fixtures/` affect ALL suites that use those judges/support files. Check imports in the eval test files to determine which.
-- Changes to `config/system_prompts/*.txt` — grep eval runners for the prompt filename to find affected suites.
-- If unsure which suites are affected, run ALL suites that could plausibly be impacted. Over-testing is better than missing a regression.
+- Changes to `test/evals/judges/`, `test/evals/support/`, or `test/evals/fixtures/` affect ALL suites.
+- If unsure, run ALL suites that could plausibly be impacted. Over-testing beats missing a regression.
**3. Run affected suites at `EVAL_JUDGE_TIER=full`:**
-`/ship` is a pre-merge gate, so always use full tier (Sonnet structural + Opus persona judges).
-
```bash
EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/_eval_test.rb 2>&1 | tee /tmp/ship_evals.txt
```
-If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.
+Run sequentially. Stop at first failure — don't burn API cost on remaining suites.
**4. Check results:**
-- **If any eval fails:** Show the failures, the cost dashboard, and **STOP**. Do not proceed.
+- **If any eval fails:** Show the failures, the cost dashboard, and **STOP**.
- **If all pass:** Note pass counts and cost. Continue to Step 3.5.
**5. Save eval output** — include eval results and cost dashboard in the PR body (Step 8).
-**Tier reference (for context — /ship always uses `full`):**
+**Tier reference:**
| Tier | When | Speed (cached) | Cost |
|------|------|----------------|------|
| `fast` (Haiku) | Dev iteration, smoke tests | ~5s (14x faster) | ~$0.07/run |
@@ -399,18 +405,34 @@ For each classified comment:
## Step 4: Version bump (auto-decide)
-1. Read the current `VERSION` file (4-digit format: `MAJOR.MINOR.PATCH.MICRO`)
+1. Use the **VERSION** value detected in Step 0.
+ - If `NO_VERSION_FILE`: skip this step entirely. Note "No VERSION file — skipping version bump."
+ - **4-digit format** (`MAJOR.MINOR.PATCH.MICRO`): bump MICRO for tiny changes, PATCH for features/fixes
+ - **3-digit format** (`MAJOR.MINOR.PATCH`): standard semver — bump PATCH for fixes, MINOR for features (ASK), MAJOR for breaking (ASK)
+ - Any other format: treat as 3-digit semver
2. **Auto-decide the bump level based on the diff:**
- - Count lines changed (`git diff origin/...HEAD --stat | tail -1`)
+
+ Count lines changed:
+ ```bash
+ git diff origin/...HEAD --stat | tail -1
+ ```
+
+ **For 4-digit VERSION:**
- **MICRO** (4th digit): < 50 lines changed, trivial tweaks, typos, config
- **PATCH** (3rd digit): 50+ lines changed, bug fixes, small-medium features
- - **MINOR** (2nd digit): **ASK the user** — only for major features or significant architectural changes
- - **MAJOR** (1st digit): **ASK the user** — only for milestones or breaking changes
+ - **MINOR** (2nd digit): **ASK the user** — major features or significant architectural changes
+ - **MAJOR** (1st digit): **ASK the user** — milestones or breaking changes
+
+ **For 3-digit VERSION (standard semver):**
+ - **PATCH** (3rd digit): bug fixes, small changes (< 200 lines)
+ - **MINOR** (2nd digit): **ASK the user** — new features (200+ lines or clear feature addition)
+ - **MAJOR** (1st digit): **ASK the user** — breaking changes only
3. Compute the new version:
- Bumping a digit resets all digits to its right to 0
- - Example: `0.19.1.0` + PATCH → `0.19.2.0`
+ - 4-digit example: `0.19.1.0` + PATCH → `0.19.2.0`
+ - 3-digit example: `1.4.2` + PATCH → `1.4.3`
4. Write the new version to the `VERSION` file.
@@ -590,7 +612,7 @@ EOF
- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
- **Never force push.** Use regular `git push` only.
- **Never ask for confirmation** except for MINOR/MAJOR version bumps and pre-landing review ASK items (batched into at most one AskUserQuestion).
-- **Always use the 4-digit version format** from the VERSION file.
+- **Preserve the VERSION file's digit format** (3-digit semver or 4-digit extended). Never change the format.
- **Date format in CHANGELOG:** `YYYY-MM-DD`
- **Split commits for bisectability** — each commit = one logical change.
- **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done.