1 | | -# Integration Tests |
| 1 | +# Testing the Ways System |
2 | 2 |
3 | | -## Way Activation Test |
| 3 | +Three test layers, from fast/automated to slow/interactive. |
4 | 4 |
5 | | -Tests that contextual hooks fire correctly for parent agents and subagents. |
| 5 | +## 1. Fixture Tests (BM25 vs NCD scorer comparison) |
| 6 | + |
| 7 | +Runs 32 test prompts against a fixed 7-way corpus, comparing the BM25 binary against a gzip NCD (normalized compression distance) baseline. Reports TP/FP/TN/FN for each scorer.
| 8 | + |
| 9 | +```bash |
| 10 | +bash tools/way-match/test-harness.sh |
| 11 | +``` |
| 12 | + |
| 13 | +Options: `--bm25-only`, `--ncd-only`, `--verbose` |
| 14 | + |
| 15 | +**What it covers**: Scorer accuracy, false positive rate, head-to-head comparison. Tests direct vocabulary matches, synonym/paraphrase variants, and negative controls. |
| 16 | + |
| 17 | +**Current baseline**: BM25 26/32, NCD 24/32, 0 FP for both. |
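
For context, the gzip NCD baseline is the standard normalized compression distance: NCD(a, b) = (C(ab) - min(C(a), C(b))) / max(C(a), C(b)), where C(x) is the gzip-compressed size of x. A minimal shell sketch of the idea (illustrative only; the `ncd` helper name is made up and this is not the harness's code):

```shell
# Normalized compression distance via gzip: near 0 = very similar,
# near 1 = unrelated. Illustrative sketch, not the harness's scorer.
ncd() {
  local ca cb cab
  ca=$(printf '%s' "$1" | gzip -c | wc -c)
  cb=$(printf '%s' "$2" | gzip -c | wc -c)
  cab=$(printf '%s%s' "$1" "$2" | gzip -c | wc -c)
  awk -v a="$ca" -v b="$cb" -v ab="$cab" \
    'BEGIN { m = (a < b) ? a : b; M = (a > b) ? a : b; printf "%.3f\n", (ab - m) / M }'
}
```

A low score means the two texts compress well together, i.e. share structure.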
| 18 | + |
| 19 | +## 2. Integration Tests (real way files) |
| 20 | + |
| 21 | +Scores 31 test prompts against actual `way.md` files extracted from the live ways directory. Tests the real frontmatter extraction pipeline. |
| 22 | + |
| 23 | +```bash |
| 24 | +bash tools/way-match/test-integration.sh |
| 25 | +``` |
| 26 | + |
| 27 | +**What it covers**: End-to-end scoring with real way vocabulary, multi-way discrimination (does the right way win?), and threshold behavior with each way's actual threshold value.
| 28 | + |
| 29 | +**Current baseline**: BM25 27/31 (0 FP), NCD 15/31 (3 FP). |
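
For context, the BM25 side can be read as the standard Okapi formula: each query term t contributes idf(t) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl/avgdl)). A toy shell/awk sketch with the usual defaults k1 = 1.2, b = 0.75 (illustrative only; `way-match.c` is the real implementation and this `bm25` helper is made up):

```shell
# Toy Okapi BM25: rank each vocabulary doc (one per argument) against a
# query. Illustrative only; way-match.c is the real scorer.
bm25() {
  local query="$1"; shift
  printf '%s\n' "$@" | awk -v q="$query" '
    {
      n = split(tolower($0), w, /[^a-z0-9]+/)
      len = 0
      for (j = 1; j <= n; j++) if (w[j] != "") { tf[NR, w[j]]++; len++ }
      dl[NR] = len; total += len
    }
    END {
      N = NR; avgdl = total / N; k1 = 1.2; b = 0.75
      nq = split(tolower(q), qt, /[^a-z0-9]+/)
      for (i = 1; i <= nq; i++) {              # df: how many docs hold term
        t = qt[i]
        if (t == "" || t in seen) continue
        seen[t] = 1
        for (d = 1; d <= N; d++) if ((d, t) in tf) df[t]++
      }
      for (d = 1; d <= N; d++) {               # Okapi score per doc
        score = 0
        for (t in seen) {
          if (!((d, t) in tf)) continue
          f = tf[d, t]
          idf = log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
          score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl[d] / avgdl))
        }
        printf "doc%d %.3f\n", d, score
      }
    }'
}
```

Running this over real `way.md` vocabularies would still need the frontmatter extraction that the integration test exercises; the sketch only shows the ranking math.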
| 30 | + |
| 31 | +## 3. Activation Test (live agent + subagent) |
| 32 | + |
| 33 | +Interactive test protocol that verifies the full hook pipeline in a running Claude Code session. Tests regex matching, BM25 semantic matching, negative controls, and subagent injection. |
6 | 34 |
7 | 35 | **To run**: Start a fresh session from `~/.claude/` and type: |
8 | 36 |
9 | 37 | ``` |
10 | 38 | read and run the activation test at tests/way-activation-test.md |
11 | 39 | ``` |
12 | 40 |
13 | | -Claude reads the test protocol, then walks you through 7 steps: |
14 | | -- Steps 1, 5, 6, 7: Claude-only (just watch) |
15 | | -- Steps 2, 3, 4: You type specific phrases when prompted |
| 41 | +Claude reads the test file directly (so the test's own text doesn't pass through the prompt hooks), then walks you through 7 steps:
| 42 | + |
| 43 | +| Step | Who | Tests | |
| 44 | +|------|-----|-------| |
| 45 | +| 1 | Claude | Session baseline (no premature domain activation) | |
| 46 | +| 2 | User types prompt | Regex pattern matching (commits way) | |
| 47 | +| 3 | User types prompt | BM25 semantic matching (security way) | |
| 48 | +| 4 | User types prompt | Negative control (no false positives) | |
| 49 | +| 5 | Claude | Subagent injection (Testing Way via SubagentStart) | |
| 50 | +| 6 | Claude | Subagent negative (no injection on irrelevant prompt) | |
| 51 | +| 7 | Claude | Summary table | |
| 52 | + |
| 53 | +Takes about 3 minutes. **Current baseline**: 6/6 PASS (step 7 is the summary, not a scored step).
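
For a sense of what step 2 exercises, a regex trigger is just a pattern test against the raw prompt. A hedged sketch (the function name and pattern here are invented for illustration; they are not the real hook's):

```shell
# Sketch of a regex-style trigger: exit 0 when the prompt looks like commit
# work. Pattern and function name are illustrative, not the production hook.
matches_commits_way() {
  printf '%s' "$1" | grep -qiE '(^|[^a-z])commit(s|ted|ting)?([^a-z]|$)'
}
```

The semantic leg (step 3) goes through the BM25 scorer instead.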
| 54 | + |
| 55 | +## Ad-Hoc Vocabulary Testing |
| 56 | + |
| 57 | +The `/test-way` skill scores a prompt against all semantic ways and reports BM25 scores. Use it during vocabulary tuning to check discrimination between ways. |
| 58 | + |
| 59 | +``` |
| 60 | +/test-way "write some unit tests for this module" |
| 61 | +``` |
| 62 | + |
| 63 | +## When to Run Which |
16 | 64 |
17 | | -Takes about 3 minutes. |
| 65 | +| Scenario | Test | |
| 66 | +|----------|------| |
| 67 | +| Changed `way-match.c` or rebuilt binary | Fixture tests + integration tests | |
| 68 | +| Changed a way's vocabulary or threshold | Integration tests + `/test-way` | |
| 69 | +| Changed hook scripts (check-*.sh, inject-*.sh, match-way.sh) | Activation test | |
| 70 | +| Added a new way | Integration tests + `/test-way` + activation test | |
| 71 | +| Sanity check after merge | All three | |