Co-activation testing, new ways, vocabulary tuning by aaronsb · Pull Request #31 · aaronsb/claude-code-config

aaronsb · 2026-02-17T18:32:02Z

Summary

Addresses #27, #28, #29 in a single branch. The three issues are tightly coupled — co-activation testing validates the new ways, and vocabulary tuning validates both.

Co-activation test infrastructure (#27, "Add co-activation quality testing for way matching"): Extended the test harness and integration tests to support fixtures that expect more than one way. All 6 co-activation tests pass with FULL results.
New ways (#28, "Author new softwaredev ways: threat-modeling, standards, RFC expansion"): Authored architecture/threat-modeling (STRIDE, trust boundaries) and docs/standards (team norms, conventions). Expanded the architecture/design vocabulary with RFC/proposal terms.
Vocabulary tuning (#29, "Tune vocabulary for newly-semantic ways via real session data"): Fixed cross-talk caused by the debugging way's description, and removed the generic terms "risk" and "standard" from the new ways' vocabularies after negative controls caught false positives (FPs). Added 2 new negative controls.
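The FULL/PARTIAL/MISS classification used for co-activation fixtures can be sketched roughly as follows. This is a minimal illustration, not the harness's actual `eval_scorer`: the function names `parse_expected` and `score_coactivation` are hypothetical, the example fixture is made up, and the meaning of PARTIAL (some but not all expected ways fired) is an assumption.

```python
def parse_expected(field: str) -> set[str]:
    """Split a comma-separated list of way IDs into a set, ignoring blanks."""
    return {part.strip() for part in field.split(",") if part.strip()}


def score_coactivation(expected: set[str], fired: set[str]) -> str:
    """Classify one fixture as FULL, PARTIAL, or MISS.

    Assumption: FULL means every expected way fired, PARTIAL means some
    but not all fired, and MISS means none fired.
    """
    hits = expected & fired
    if expected and hits == expected:
        return "FULL"
    if hits:
        return "PARTIAL"
    return "MISS"


# Hypothetical fixture: a prompt expected to activate two architecture ways.
expected = parse_expected("architecture/threat-modeling, architecture/design")
print(score_coactivation(expected, {"architecture/threat-modeling", "architecture/design"}))
print(score_coactivation(expected, {"architecture/design"}))
print(score_coactivation(expected, {"debugging"}))
```

Under these assumptions, extra ways that fire alongside the expected ones do not downgrade a FULL result; whether the real scorer treats them the same way is not stated here.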

Test Results

Fixture tests (70 prompts, 20-way corpus):

BM25: TP=55 FP=0 TN=8 FN=7  accuracy=63/70
Co-activation (6 tests): full=6  partial=0  miss=0

Integration tests (34 prompts, real way files):

BM25: TP=24 FP=0 TN=4 FN=6  accuracy=28/34
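The accuracy figures follow directly from the confusion counts: correct predictions are TP + TN, and the total is the sum of all four cells. A small sketch to check the arithmetic (the `Confusion` class is illustrative only and is not part of the test tooling):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one test run."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    @property
    def correct(self) -> int:
        # A prediction is correct when it is a true positive or true negative.
        return self.tp + self.tn


# Counts copied from the BM25 result lines reported above.
runs = {
    "fixture": Confusion(tp=55, fp=0, tn=8, fn=7),
    "integration": Confusion(tp=24, fp=0, tn=4, fn=6),
}

for name, c in runs.items():
    print(f"{name}: {c.correct}/{c.total}")
# fixture: 63/70
# integration: 28/34
```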

Test plan

Fixture harness: bash tools/way-match/test-harness.sh --bm25-only → 63/70, 0 FP
Integration: bash tools/way-match/test-integration.sh → 28/34, 0 FP
Co-activation: 6/6 FULL (all expected ways fire)
Negative controls: 8/8 TN (including 2 new controls for new ways)
Cross-talk: debugging no longer fires on "add error handling" (score 1.29 < 2.0)
Activation test (manual, post-merge)
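The cross-talk and negative-control checks both reduce to comparing a match score against a firing threshold. The sketch below assumes a single global threshold of 2.0 (the value quoted for the debugging way) and an inclusive comparison; neither detail is confirmed here, and `fired_ways` and `is_true_negative` are hypothetical names.

```python
THRESHOLD = 2.0  # assumed firing threshold, taken from the "1.29 < 2.0" check


def fired_ways(scores: dict[str, float], threshold: float = THRESHOLD) -> list[str]:
    """Return the ways whose score reaches the threshold (inclusive by assumption)."""
    return sorted(way for way, score in scores.items() if score >= threshold)


def is_true_negative(scores: dict[str, float], threshold: float = THRESHOLD) -> bool:
    """A negative control passes when no way fires for the prompt."""
    return not fired_ways(scores, threshold)


# "add error handling" scored 1.29 against the debugging way, so nothing fires.
scores = {"debugging": 1.29}
print(fired_ways(scores))        # []
print(is_true_negative(scores))  # True
```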

Co-activation test infrastructure (#27):
- Test harness supports array expected values for multi-way fixtures
- New eval_scorer function with FULL/PARTIAL/MISS result types
- 6 co-activation fixtures covering delivery, code, and architecture overlaps
- Integration tests support comma-separated expected way IDs

New ways (#28):
- architecture/threat-modeling: STRIDE analysis, trust boundaries, attack surfaces
- docs/standards: team norms, coding conventions, accessibility
- Expanded design way vocabulary with RFC/proposal terms

Vocabulary tuning (#29):
- Fixed debugging description cross-talk ("errors" → "failures")
- Removed generic "risk" from threat-modeling vocabulary (caused FP)
- Removed generic "standard" from standards vocabulary (caused FP)
- Added 2 new negative controls for the new ways
- 0 FP maintained across 70 fixture tests and 34 integration tests

aaronsb merged commit 413b1d4 into main Feb 17, 2026
2 checks passed

aaronsb deleted the ways-coactivation-and-content branch February 17, 2026 18:44

aaronsb mentioned this pull request Feb 17, 2026

Author new softwaredev ways: threat-modeling, standards, RFC expansion #28

Closed

aaronsb merged 1 commit into main from ways-coactivation-and-content
