---
name: advocacy-testing-strategy
description: Spec-first test generation, assertion quality review, mutation testing, five anti-patterns to avoid — for AI-assisted advocacy development where silent test failures mean lost evidence or exposed activists
---
# Testing Strategy

## When to Use
- Writing or generating any tests
- Reviewing AI-generated test code
- Setting up test infrastructure for a new feature
- When test suite quality is in question (flaky tests, low mutation scores, false confidence)

## Process

### Step 1: Read the Specification
Before writing any test, identify the specification or acceptance criteria for the behavior under test. If no spec exists, write one — even a brief description of what the code should do and what constitutes failure. Without a spec, AI generates tests that mirror the implementation rather than the intent, producing circular validation.

### Step 2: Write Failing Tests from the Spec (Pattern 2 — Spec-First)
Generate tests from the specification BEFORE writing implementation. Each test should encode a business rule you can state in words. For advocacy software: "investigation records must be anonymized before export," "coalition data must not cross organizational boundaries without explicit agreement," "graphic content must never display without a content warning." Write the test. Verify it fails. This is the preferred generation pattern.

### Step 3: Verify Tests Fail for the Right Reason
A failing test is only useful if it fails because the behavior is absent — not because of a setup error, typo, or misconfigured test environment. Read each failure message. Confirm it describes the missing behavior, not a broken test.

### Step 4: Implement Until Tests Pass
Write the minimum implementation that makes the failing tests pass. Do not write more code than the tests demand.

### Step 5: Review Assertions Against the Spec, Not the Code
This is the critical step for AI-generated tests. Ask three questions of every assertion:
1. **Does this test fail if the code is wrong?** If you break the implementation and the test still passes, it is worthless.
2. **Does the assertion encode a domain rule?** If you cannot name the rule, it is a snapshot, not a test.
3. **Would mutation testing kill this?** If changing `+` to `-` leaves the test green, the assertion is weak.

NEVER accept tautological assertions — tests that assert output equals the output of the same function call. This is the single most dangerous pattern in AI-generated tests.

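The contrast is easiest to see side by side. In this sketch, `redact` is a hypothetical helper; the first test is tautological and passes no matter what the function does, while the second encodes a nameable rule:

```python
import re

def redact(text: str) -> str:
    # Hypothetical helper: replace anything shaped like an email address.
    return re.sub(r"\S+@\S+", "[REDACTED]", text)

# WORTHLESS: compares the function to itself. Stays green even if redact()
# is changed to return its input unchanged.
def test_redact_tautology():
    assert redact("tip from a@b.org") == redact("tip from a@b.org")

# MEANINGFUL: encodes the rule "exported text contains no email addresses."
def test_redact_removes_emails():
    out = redact("tip from a@b.org")
    assert "a@b.org" not in out
    assert "[REDACTED]" in out
```

Applying question 1 above: replace `redact`'s body with `return text` and the tautology still passes, while the second test fails immediately.
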
### Step 6: Run Mutation Testing
Run a mutation testing tool against the test suite. Surviving mutants reveal assertions that look thorough but verify nothing. Feed surviving mutants to the AI and ask it to write tests that kill them. Mutation score is the primary quality metric — not coverage percentage. A suite with 90% coverage and a 40% mutation score gives a false sense of security.

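A hand-rolled illustration of what a mutation tool (e.g. `mutmut` or `cosmic-ray` in Python — the tool choice is an assumption about your stack) automates: apply a small mutant to the code and check whether the suite notices. All function names here are illustrative:

```python
def tally(counts):
    # Original behavior: total number of reports across regions.
    return sum(counts)

def tally_mutant(counts):
    # Mutant a tool might generate: sum replaced with max.
    return max(counts)

def weak_test(fn):
    # SURVIVES the mutant: for a single-element list, sum and max agree.
    return fn([5]) == 5

def strong_test(fn):
    # KILLS the mutant: sum([2, 3]) == 5 but max([2, 3]) == 3.
    return fn([2, 3]) == 5
```

A real tool generates hundreds of such mutants automatically; every mutant that passes your suite is a `weak_test` waiting to be strengthened.
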
### Step 7: Fix Weak Tests
For each surviving mutant, write a targeted test that kills it. This closes the loop between test generation and test quality.

## Five Generation Patterns (Know When to Use Each)
1. **Implementation-first** — generate tests from existing code. Dangerous: tests mirror code, not intent. Use only when no spec exists and you need characterization tests.
2. **Spec-first** — generate tests from specification before coding. Preferred pattern. Produces tests that encode intent.
3. **Edge-case generation** — give the AI a function signature and ask specifically for: empty inputs, boundary values, null/undefined, unicode, timezone boundaries, concurrent access, overflow. AI excels here.
4. **Characterization tests** — for legacy or AI-generated code that lacks tests: capture current behavior before changing anything. Cover before you change (Feathers).
5. **Mutation-guided improvement** — run mutation testing, feed surviving mutants to AI, generate targeted tests.

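Pattern 3 in miniature, applied to a hypothetical `normalize_tag` helper for investigation labels. The edge cases follow the checklist above (empty input, whitespace, unicode, null-equivalent input):

```python
def normalize_tag(tag):
    # Hypothetical stand-in for the function under test.
    if tag is None:
        raise ValueError("tag must not be None")
    return tag.strip().lower()

edge_cases = [
    ("", ""),                    # empty input
    ("  Arrest  ", "arrest"),    # surrounding whitespace
    ("DÉTENTION", "détention"),  # unicode, not just ASCII
]

for raw, expected in edge_cases:
    assert normalize_tag(raw) == expected

# Null-equivalent input must be rejected loudly, not silently coerced.
try:
    normalize_tag(None)
    assert False, "None should raise ValueError"
except ValueError:
    pass
```

Asking the AI for this category of inputs explicitly is what makes the pattern work; left to itself it tends to generate only the happy path.
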
## Five Anti-Patterns to Reject on Sight
1. **Snapshot trap** — tests that snapshot current output and assert against it. They pass today and break on any correct change. They verify nothing about correctness.
2. **Mock everything** — over-mocked tests verify that mocks behave as expected, not that real code works. Mock only at system boundaries: external APIs, databases, file systems.
3. **Happy path only** — AI-generated tests overwhelmingly test the success path. Explicitly request error path, boundary condition, and adversarial input tests. In advocacy software, error paths are where people get hurt.
4. **Test-after-commit** — writing tests after code is committed defeats the feedback loop. Tests must exist during development.
5. **Coverage theater** — chasing coverage numbers with meaningless assertions. A line "covered" by a test with no assertion is not tested.

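A sketch of boundary-only mocking (anti-pattern 2 done right). The HTTP transport is faked because it is a system boundary, but the de-duplication logic under test runs for real; `submit_reports` and `FakeTransport` are hypothetical names:

```python
class FakeTransport:
    """Stands in for the real HTTP client -- the system boundary."""
    def __init__(self):
        self.sent = []

    def post(self, payload):
        self.sent.append(payload)

def submit_reports(reports, transport):
    # Real logic under test: de-duplicate before anything crosses the boundary.
    seen = set()
    for report in reports:
        if report["id"] not in seen:
            seen.add(report["id"])
            transport.post(report)

transport = FakeTransport()
submit_reports([{"id": 1}, {"id": 1}, {"id": 2}], transport)
assert len(transport.sent) == 2  # duplicates never reach the network
```

If `submit_reports` itself were mocked, the test would only verify the mock, which is exactly the failure mode the anti-pattern warns about.
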
## Advocacy-Specific Testing
- Contract tests at every service boundary, especially coalition cross-organization APIs where different groups have different security postures
- Test adversarial inputs: SQL injection through investigation search, XSS through testimony display, path traversal through evidence uploads
- Verify progressive disclosure: graphic content must not render without explicit opt-in
- Test offline behavior: what happens when connectivity drops during evidence sync
- Fast test execution is non-negotiable for AI agent loops — a 10-minute suite across 15 iterations burns 2.5 hours
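An adversarial-input sketch for the path-traversal case above, assuming a hypothetical `safe_evidence_path` helper that sanitizes upload filenames before anything touches the filesystem:

```python
from pathlib import PurePosixPath

UPLOAD_ROOT = PurePosixPath("/srv/evidence")  # assumed upload directory

def safe_evidence_path(filename: str) -> PurePosixPath:
    # Keep only the final path component, discarding any directory parts
    # an attacker smuggled in ("../../etc/passwd" becomes "passwd").
    name = PurePosixPath(filename).name
    if name in ("", ".", ".."):
        raise ValueError("invalid evidence filename")
    return UPLOAD_ROOT / name

# Benign upload resolves inside the upload root.
assert safe_evidence_path("photo.jpg") == PurePosixPath("/srv/evidence/photo.jpg")

# Adversarial inputs either get sanitized into the root or are rejected.
for attack in ("../../etc/passwd", "/etc/shadow", "..", ""):
    try:
        resolved = safe_evidence_path(attack)
        assert str(resolved).startswith(str(UPLOAD_ROOT) + "/")
    except ValueError:
        pass  # rejection is also an acceptable outcome
```

The same structure applies to the SQL-injection and XSS bullets: feed known attack strings through the real input path and assert the invariant ("query stays parameterized", "output stays escaped") rather than mocking the sanitizer.
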