1 | | -# Integration Tests |
| 1 | +# Testing the Ways System |
2 | 2 |
3 | | -## Way Activation Test |
| 3 | +Three test layers, from fast/automated to slow/interactive. |
4 | 4 |
5 | | -Tests that contextual hooks fire correctly for parent agents and subagents. |
| 5 | +## 1. Fixture Tests (BM25 vs NCD scorer comparison) |
| 6 | + |
| 7 | +Runs 32 test prompts against a fixed 7-way corpus, comparing the BM25 binary against a gzip NCD (normalized compression distance) baseline. Reports TP/FP/TN/FN for each scorer.
| 8 | + |
| 9 | +```bash |
| 10 | +bash tools/way-match/test-harness.sh |
| 11 | +``` |
| 12 | + |
| 13 | +Options: `--bm25-only`, `--ncd-only`, `--verbose` |
| 14 | + |
| 15 | +**What it covers**: Scorer accuracy, false positive rate, head-to-head comparison. Tests direct vocabulary matches, synonym/paraphrase variants, and negative controls. |
| 16 | + |
| 17 | +**Current baseline**: BM25 26/32, NCD 24/32, 0 FP for both. |
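
For context, the gzip NCD baseline is the standard normalized compression distance: NCD(a, b) = (C(ab) - min(C(a), C(b))) / max(C(a), C(b)), where C(x) is the gzip-compressed size of x. A minimal shell sketch of the idea (illustrative only; the `ncd` helper name is made up and this is not the harness's code):

```shell
# Normalized compression distance via gzip: near 0 = very similar,
# near 1 = unrelated. Illustrative sketch, not the harness's scorer.
ncd() {
  local ca cb cab
  ca=$(printf '%s' "$1" | gzip -c | wc -c)
  cb=$(printf '%s' "$2" | gzip -c | wc -c)
  cab=$(printf '%s%s' "$1" "$2" | gzip -c | wc -c)
  awk -v a="$ca" -v b="$cb" -v ab="$cab" \
    'BEGIN { m = (a < b) ? a : b; M = (a > b) ? a : b; printf "%.3f\n", (ab - m) / M }'
}
```

A low score means the two texts compress well together, i.e. share structure.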
| 18 | + |
| 19 | +## 2. Integration Tests (real way files) |
| 20 | + |
| 21 | +Scores 31 test prompts against actual `way.md` files extracted from the live ways directory. Tests the real frontmatter extraction pipeline. |
| 22 | + |
| 23 | +```bash |
| 24 | +bash tools/way-match/test-integration.sh |
| 25 | +``` |
| 26 | + |
| 27 | +**What it covers**: End-to-end scoring with real way vocabulary, multi-way discrimination (does the right way win?), and threshold behavior with each way's actual threshold value.
| 28 | + |
| 29 | +**Current baseline**: BM25 27/31 (0 FP), NCD 15/31 (3 FP). |
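
For context, the BM25 side can be read as the standard Okapi formula: each query term t contributes idf(t) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl/avgdl)). A toy shell/awk sketch with the usual defaults k1 = 1.2, b = 0.75 (illustrative only; `way-match.c` is the real implementation and this `bm25` helper is made up):

```shell
# Toy Okapi BM25: rank each vocabulary doc (one per argument) against a
# query. Illustrative only; way-match.c is the real scorer.
bm25() {
  local query="$1"; shift
  printf '%s\n' "$@" | awk -v q="$query" '
    {
      n = split(tolower($0), w, /[^a-z0-9]+/)
      len = 0
      for (j = 1; j <= n; j++) if (w[j] != "") { tf[NR, w[j]]++; len++ }
      dl[NR] = len; total += len
    }
    END {
      N = NR; avgdl = total / N; k1 = 1.2; b = 0.75
      nq = split(tolower(q), qt, /[^a-z0-9]+/)
      for (i = 1; i <= nq; i++) {              # df: how many docs hold term
        t = qt[i]
        if (t == "" || t in seen) continue
        seen[t] = 1
        for (d = 1; d <= N; d++) if ((d, t) in tf) df[t]++
      }
      for (d = 1; d <= N; d++) {               # Okapi score per doc
        score = 0
        for (t in seen) {
          if (!((d, t) in tf)) continue
          f = tf[d, t]
          idf = log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
          score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl[d] / avgdl))
        }
        printf "doc%d %.3f\n", d, score
      }
    }'
}
```

Running this over real `way.md` vocabularies would still need the frontmatter extraction that the integration test exercises; the sketch only shows the ranking math.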
| 30 | + |
| 31 | +## 3. Activation Test (live agent + subagent) |
| 32 | + |
| 33 | +Interactive test protocol that verifies the full hook pipeline in a running Claude Code session. Tests regex matching, BM25 semantic matching, negative controls, and subagent injection. |
6 | 34 |
7 | 35 | **To run**: Start a fresh session from `~/.claude/` and type: |
8 | 36 |
9 | 37 | ``` |
10 | 38 | read and run the activation test at tests/way-activation-test.md |
11 | 39 | ``` |
12 | 40 |
13 | | -Claude reads the test protocol, then walks you through 7 steps: |
14 | | -- Steps 1, 5, 6, 7: Claude-only (just watch) |
15 | | -- Steps 2, 3, 4: You type specific phrases when prompted |
| 41 | +Claude reads the test file directly (so the test's own text doesn't pass through the prompt hooks), then walks you through 7 steps:
| 42 | + |
| 43 | +| Step | Who | Tests | |
| 44 | +|------|-----|-------| |
| 45 | +| 1 | Claude | Session baseline (no premature domain activation) | |
| 46 | +| 2 | User types prompt | Regex pattern matching (commits way) | |
| 47 | +| 3 | User types prompt | BM25 semantic matching (security way) | |
| 48 | +| 4 | User types prompt | Negative control (no false positives) | |
| 49 | +| 5 | Claude | Subagent injection (Testing Way via SubagentStart) | |
| 50 | +| 6 | Claude | Subagent negative (no injection on irrelevant prompt) | |
| 51 | +| 7 | Claude | Summary table | |
| 52 | + |
| 53 | +Takes about 3 minutes. **Current baseline**: 6/6 PASS (step 7 is the summary, not a scored step).
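
For a sense of what step 2 exercises, a regex trigger is just a pattern test against the raw prompt. A hedged sketch (the function name and pattern here are invented for illustration; they are not the real hook's):

```shell
# Sketch of a regex-style trigger: exit 0 when the prompt looks like commit
# work. Pattern and function name are illustrative, not the production hook.
matches_commits_way() {
  printf '%s' "$1" | grep -qiE '(^|[^a-z])commit(s|ted|ting)?([^a-z]|$)'
}
```

The semantic leg (step 3) goes through the BM25 scorer instead.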
| 54 | + |
| 55 | +## Ad-Hoc Vocabulary Testing |
| 56 | + |
| 57 | +The `/test-way` skill scores a prompt against all semantic ways and reports BM25 scores. Use it during vocabulary tuning to check discrimination between ways. |
| 58 | + |
| 59 | +``` |
| 60 | +/test-way "write some unit tests for this module" |
| 61 | +``` |
| 62 | + |
| 63 | +## When to Run Which |
16 | 64 |
17 | | -Takes about 3 minutes. |
| 65 | +| Scenario | Test | |
| 66 | +|----------|------| |
| 67 | +| Changed `way-match.c` or rebuilt binary | Fixture tests + integration tests | |
| 68 | +| Changed a way's vocabulary or threshold | Integration tests + `/test-way` | |
| 69 | +| Changed hook scripts (check-*.sh, inject-*.sh, match-way.sh) | Activation test | |
| 70 | +| Added a new way | Integration tests + `/test-way` + activation test | |
| 71 | +| Sanity check after merge | All three | |