diff --git a/hooks/ways/softwaredev/architecture/design/way.md b/hooks/ways/softwaredev/architecture/design/way.md index eb1c55b..384d66c 100644 --- a/hooks/ways/softwaredev/architecture/design/way.md +++ b/hooks/ways/softwaredev/architecture/design/way.md @@ -1,6 +1,6 @@ --- description: software system design architecture patterns database schema component modeling -vocabulary: architecture pattern database schema modeling interface component modules factory observer strategy monolith microservice domain layer coupling cohesion abstraction singleton +vocabulary: architecture pattern database schema modeling interface component modules factory observer strategy monolith microservice domain layer coupling cohesion abstraction singleton proposal rfc sketch deliberation whiteboard threshold: 2.0 scope: agent, subagent --- @@ -16,6 +16,14 @@ scope: agent, subagent When the design involves architectural trade-offs worth documenting, escalate to an ADR (see ADR Way). +## RFCs and Proposals + +RFCs are the "before" to an ADR's "after" — they capture deliberation while the design is still open. Use them for changes that affect multiple teams or systems. 
+ +- **RFC**: Proposes a change, invites feedback, converges on a decision +- **ADR**: Records the decision after deliberation is complete +- Start with a sketch or whiteboard session, formalize as an RFC if the scope warrants it + ## Common Patterns | Pattern | When to Use | When NOT to Use | diff --git a/hooks/ways/softwaredev/architecture/threat-modeling/way.md b/hooks/ways/softwaredev/architecture/threat-modeling/way.md new file mode 100644 index 0000000..9f5353b --- /dev/null +++ b/hooks/ways/softwaredev/architecture/threat-modeling/way.md @@ -0,0 +1,75 @@ +--- +description: threat modeling, STRIDE analysis, trust boundaries, attack surface assessment, security design review +vocabulary: threat model stride attack surface trust boundary mitigation adversary dread spoofing tampering repudiation elevation +threshold: 2.0 +scope: agent, subagent +provenance: + policy: + - uri: governance/policies/operations.md + type: governance-doc + controls: + - id: OWASP Threat Modeling Cheat Sheet + justifications: + - STRIDE framework applied at design phase before code review + - Trust boundaries identified between components and external systems + - id: NIST SP 800-30 (Risk Assessment) + justifications: + - Risk register documents accepted risks with expiration dates + - Likelihood and impact assessed for each identified threat + verified: 2026-02-17 + rationale: > + Security Way covers code-level detection (SQL injection, XSS, secrets). + Threat modeling operates at design altitude — understanding adversaries, + trust boundaries, and systemic risks before they become code bugs. +--- +# Threat Modeling Way + +Threat modeling is security at design altitude. Where the Security Way catches code-level issues (injection, exposed secrets), this way maps adversaries, trust boundaries, and systemic risks. 
+ +## When to Threat Model + +- New service or component with external-facing surface +- Authentication/authorization redesign +- Data flow changes crossing trust boundaries +- Third-party integration adding new attack vectors + +## STRIDE Framework + +Analyze each component interaction for: + +| Threat | Question | Mitigation Pattern | +|--------|----------|--------------------| +| **S**poofing | Can an attacker impersonate a user or service? | Authentication, mutual TLS, signed tokens | +| **T**ampering | Can data be modified in transit or at rest? | Integrity checks, HMAC, immutable logs | +| **R**epudiation | Can actions be denied after the fact? | Audit trails, signed events, timestamps | +| **I**nformation Disclosure | Can sensitive data leak? | Encryption, access controls, data classification | +| **D**enial of Service | Can availability be degraded? | Rate limiting, circuit breakers, redundancy | +| **E**levation of Privilege | Can an attacker gain higher access? | Least privilege, role separation, input validation | + +## Risk Register + +Document accepted risks with expiration — risks don't stay accepted forever. + +```markdown +| Risk | Likelihood | Impact | Mitigation | Status | Expires | +|------|-----------|--------|------------|--------|---------| +| API rate limiting absent | Medium | High | Planned for Q2 | Accepted | 2026-06-01 | +``` + +Expired accepted risks must be re-evaluated or mitigated. + +## Trust Boundaries + +Identify where data crosses trust levels: +- Browser to API gateway (untrusted → semi-trusted) +- API to internal service (semi-trusted → trusted) +- Service to third-party API (trusted → external) + +Each boundary crossing needs: authentication, input validation, output encoding. + +## Relationship to Security Way + +- **Threat modeling**: "What could go wrong?" (design phase) +- **Security Way**: "Is this code safe?" (implementation phase) + +Both may fire on security-related prompts. Threat modeling adds the systemic view. 
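Review note on the Risk Register section above: the rule "expired accepted risks must be re-evaluated or mitigated" could be checked mechanically. The following is a minimal sketch, assuming the register is kept as a Markdown table with exactly the six columns shown in the example and ISO `YYYY-MM-DD` dates. `expired_risks` is a hypothetical helper, not something this change adds.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this diff): read a Markdown risk register
# on stdin and print the Risk column of every "Accepted" row whose Expires
# date is earlier than the date passed as $1.
# Assumes the column layout: Risk | Likelihood | Impact | Mitigation | Status | Expires
expired_risks() {
  local today="$1"   # e.g. "$(date +%F)"
  awk -F'|' -v today="$today" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NF >= 8 {
      risk = trim($2); status = trim($6); expires = trim($7)
      # ISO dates sort lexicographically, so a string comparison is enough.
      # Appending "" forces string (not numeric) comparison.
      if (status == "Accepted" &&
          expires ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ &&
          (expires "") < (today ""))
        print risk
    }'
}
```

A possible invocation would be `expired_risks "$(date +%F)" < risk-register.md`, where the file name is only an illustration. Header and separator rows are skipped because their Status cell is not `Accepted`.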
diff --git a/hooks/ways/softwaredev/docs/standards/way.md b/hooks/ways/softwaredev/docs/standards/way.md new file mode 100644 index 0000000..ecfef79 --- /dev/null +++ b/hooks/ways/softwaredev/docs/standards/way.md @@ -0,0 +1,54 @@ +--- +description: establishing team norms, coding conventions, testing philosophy, dependency policy, accessibility requirements +vocabulary: convention norm guideline accessibility style guide linting rule agreement philosophy +threshold: 2.0 +scope: agent, subagent +--- +# Standards Way + +Standards define how a team works. They're distinct from quality metrics (the Quality Way measures adherence) — this way is about establishing and documenting the norms themselves. + +## When Standards Come Up + +- Starting a new project and defining conventions +- Onboarding context: "what are our standards?" +- Policy decisions: dependency criteria, accessibility requirements +- Resolving disagreements about style or approach + +## Writing Standards Documents + +Structure standards as actionable rules, not aspirational prose: + +```markdown +## [Category] Standards + +### Rule: [Concise directive] +**Rationale**: Why this matters. +**Example**: What compliance looks like. +**Exception**: When this doesn't apply (if any). +``` + +Keep them scannable. A standard nobody reads is a standard nobody follows. + +## Common Standards Areas + +| Area | Covers | Not | +|------|--------|----| +| Coding style | Formatting, naming, file structure | Architecture patterns (Design Way) | +| Testing philosophy | When to test, coverage expectations | Test mechanics (Testing Way) | +| Dependency policy | Evaluation criteria, update cadence | Package management (Deps Way) | +| Accessibility | WCAG compliance level, testing requirements | UI implementation details | +| Documentation | What to document, where, format | How to write docs (Docs Way) | + +## Establishing vs Enforcing + +This way helps **establish** standards. 
Enforcement belongs elsewhere: +- Linters and formatters for style rules +- CI checks for coverage thresholds +- Review checklists for process compliance + +## Avoid + +- Standards without rationale (rules need "why") +- Standards that duplicate tooling (if the linter catches it, don't write a standard for it) +- Aspirational standards nobody plans to enforce diff --git a/hooks/ways/softwaredev/environment/debugging/way.md b/hooks/ways/softwaredev/environment/debugging/way.md index 3653c18..de2dce7 100644 --- a/hooks/ways/softwaredev/environment/debugging/way.md +++ b/hooks/ways/softwaredev/environment/debugging/way.md @@ -1,5 +1,5 @@ --- -description: debugging code issues, troubleshooting errors, investigating broken behavior, fixing bugs +description: debugging code issues, troubleshooting failures, investigating broken behavior, fixing bugs vocabulary: debug breakpoint stacktrace investigate troubleshoot regression bisect crash error fail bug log trace exception segfault hang timeout threshold: 2.0 scope: agent, subagent diff --git a/tests/README.md b/tests/README.md index 3b4cf8d..b2af604 100644 --- a/tests/README.md +++ b/tests/README.md @@ -13,7 +13,7 @@ Three layers, from fast/automated to slow/interactive. See [way-match/results.md ### 1. Fixture Tests (BM25 vs NCD scorer comparison) -Runs 54 test prompts against a fixed 18-way corpus (all softwaredev ways with BM25 semantic matching). Compares BM25 binary against gzip NCD baseline. Reports TP/FP/TN/FN for each scorer. +Runs 70 test prompts against a fixed 20-way corpus (all softwaredev ways with BM25 semantic matching). Compares BM25 binary against gzip NCD baseline. Reports TP/FP/TN/FN for each scorer. Includes co-activation fixtures that validate multi-way triggering. 
```bash tests/way-match/run-tests.sh fixture --verbose @@ -23,13 +23,13 @@ bash tools/way-match/test-harness.sh --verbose Options: `--bm25-only`, `--ncd-only`, `--verbose` -**What it covers**: Scorer accuracy, false positive rate, head-to-head comparison. Tests direct vocabulary matches, synonym/paraphrase variants, and negative controls. +**What it covers**: Scorer accuracy, false positive rate, head-to-head comparison. Tests direct vocabulary matches, synonym/paraphrase variants, negative controls, and co-activation (multi-way expected sets). -**Current baseline**: BM25 48/54, 0 FP. +**Current baseline**: BM25 63/70, 0 FP. Co-activation: 6/6 FULL. ### 2. Integration Tests (real way files) -Scores 31 test prompts against actual `way.md` files extracted from the live ways directory. Tests the real frontmatter extraction pipeline. +Scores 34 test prompts (including 3 co-activation) against actual `way.md` files extracted from the live ways directory. Tests the real frontmatter extraction pipeline. ```bash tests/way-match/run-tests.sh integration @@ -39,7 +39,7 @@ bash tools/way-match/test-integration.sh **What it covers**: End-to-end scoring with real way vocabulary, multi-way discrimination (does the right way win?), threshold behavior with actual threshold values. -**Current baseline**: BM25 27/31 (0 FP), NCD 15/31 (3 FP). +**Current baseline**: BM25 28/34 (0 FP), NCD 16/34 (3 FP). ### 3. 
Activation Test (live agent + subagent) diff --git a/tools/way-match/test-fixtures.jsonl b/tools/way-match/test-fixtures.jsonl index b0fac6f..46a26ca 100644 --- a/tools/way-match/test-fixtures.jsonl +++ b/tools/way-match/test-fixtures.jsonl @@ -52,3 +52,19 @@ {"prompt": "what time zone is Tokyo in", "expected": null, "match": false, "category": "negative"} {"prompt": "summarize this document for me", "expected": null, "match": false, "category": "negative"} {"prompt": "translate this paragraph to Spanish", "expected": null, "match": false, "category": "negative"} +{"prompt": "create a migration to alter the users table and add an index on email", "expected": ["softwaredev-delivery-migrations", "softwaredev-delivery-patches"], "match": true, "category": "coactivation", "note": "delivery domain overlap: migrations schema + patches diff"} +{"prompt": "write tests to verify the exception handling works correctly", "expected": ["softwaredev-code-testing", "softwaredev-code-errors"], "match": true, "category": "coactivation", "note": "testing verify + errors exception handling"} +{"prompt": "audit our dependencies for security vulnerabilities", "expected": ["softwaredev-environment-deps", "softwaredev-code-security"], "match": true, "category": "coactivation", "note": "shared vulnerability and audit terms"} +{"prompt": "debug the unhandled exception and add proper error handling", "expected": ["softwaredev-environment-debugging", "softwaredev-code-errors"], "match": true, "category": "coactivation", "note": "debugging investigate + errors exception handling"} +{"prompt": "design the database schema for the new microservice", "expected": ["softwaredev-architecture-design", "softwaredev-delivery-migrations"], "match": true, "category": "coactivation", "note": "shared schema/database terms across architecture and delivery"} +{"prompt": "refactor this module to reduce coupling and improve cohesion", "expected": ["softwaredev-code-quality", "softwaredev-architecture-design"], 
"match": true, "category": "coactivation", "note": "shared coupling/cohesion between quality metrics and design patterns"} +{"prompt": "perform a STRIDE analysis on the authentication flow", "expected": "softwaredev-architecture-threat-modeling", "match": true, "category": "direct"} +{"prompt": "document the trust boundaries between services", "expected": "softwaredev-architecture-threat-modeling", "match": true, "category": "direct"} +{"prompt": "update the risk register with the new attack surface", "expected": "softwaredev-architecture-threat-modeling", "match": true, "category": "direct"} +{"prompt": "establish coding standards for the team", "expected": "softwaredev-docs-standards", "match": true, "category": "direct"} +{"prompt": "write a dependency policy with evaluation guidelines", "expected": "softwaredev-docs-standards", "match": true, "category": "direct"} +{"prompt": "define accessibility guidelines for the frontend", "expected": "softwaredev-docs-standards", "match": true, "category": "direct"} +{"prompt": "write an RFC for the new caching architecture", "expected": "softwaredev-architecture-design", "match": true, "category": "direct", "note": "RFC vocabulary expansion"} +{"prompt": "sketch out a proposal for the event-driven refactoring", "expected": "softwaredev-architecture-design", "match": true, "category": "synonym", "note": "design way RFC/proposal expansion"} +{"prompt": "what are the risks of going to the beach today", "expected": null, "match": false, "category": "negative", "note": "risk is in threat-modeling but context is non-technical"} +{"prompt": "follow the standard operating procedure for checkout", "expected": null, "match": false, "category": "negative", "note": "standard is in standards way but context is non-technical"} diff --git a/tools/way-match/test-harness.sh b/tools/way-match/test-harness.sh index 2f51da5..fac5c7a 100755 --- a/tools/way-match/test-harness.sh +++ b/tools/way-match/test-harness.sh @@ -24,7 +24,7 @@ 
WAY_DESC[softwaredev-docs-api]="designing REST APIs, HTTP endpoints, API version WAY_VOCAB[softwaredev-docs-api]="endpoint api rest route http status pagination versioning graphql request response header payload crud webhook" WAY_THRESH[softwaredev-docs-api]="2.0" -WAY_DESC[softwaredev-environment-debugging]="debugging code issues, troubleshooting errors, investigating broken behavior, fixing bugs" +WAY_DESC[softwaredev-environment-debugging]="debugging code issues, troubleshooting failures, investigating broken behavior, fixing bugs" WAY_VOCAB[softwaredev-environment-debugging]="debug breakpoint stacktrace investigate troubleshoot regression bisect crash error fail bug log trace exception segfault hang timeout" WAY_THRESH[softwaredev-environment-debugging]="2.0" @@ -33,7 +33,7 @@ WAY_VOCAB[softwaredev-code-security]="authentication secrets password credential WAY_THRESH[softwaredev-code-security]="2.0" WAY_DESC[softwaredev-architecture-design]="software system design architecture patterns database schema component modeling" -WAY_VOCAB[softwaredev-architecture-design]="architecture pattern database schema modeling interface component modules factory observer strategy monolith microservice domain layer coupling cohesion abstraction singleton" +WAY_VOCAB[softwaredev-architecture-design]="architecture pattern database schema modeling interface component modules factory observer strategy monolith microservice domain layer coupling cohesion abstraction singleton proposal rfc sketch deliberation whiteboard" WAY_THRESH[softwaredev-architecture-design]="2.0" WAY_DESC[softwaredev-environment-config]="application configuration, environment variables, dotenv files, config file management" @@ -88,7 +88,15 @@ WAY_DESC[softwaredev-docs]="README authoring, docstrings, technical prose, Merma WAY_VOCAB[softwaredev-docs]="readme docstring technical writing mermaid diagram flowchart sequence onboarding" WAY_THRESH[softwaredev-docs]="2.0" -WAY_IDS=(softwaredev-code-testing 
softwaredev-docs-api softwaredev-environment-debugging softwaredev-code-security softwaredev-architecture-design softwaredev-environment-config softwaredev-architecture-adr-context softwaredev-delivery-commits softwaredev-delivery-github softwaredev-delivery-patches softwaredev-delivery-release softwaredev-delivery-migrations softwaredev-code-errors softwaredev-code-quality softwaredev-code-performance softwaredev-environment-deps softwaredev-environment-ssh softwaredev-docs) +WAY_DESC[softwaredev-architecture-threat-modeling]="threat modeling, STRIDE analysis, trust boundaries, attack surface assessment, security design review" +WAY_VOCAB[softwaredev-architecture-threat-modeling]="threat model stride attack surface trust boundary mitigation adversary dread spoofing tampering repudiation elevation" +WAY_THRESH[softwaredev-architecture-threat-modeling]="2.0" + +WAY_DESC[softwaredev-docs-standards]="establishing team norms, coding conventions, testing philosophy, dependency policy, accessibility requirements" +WAY_VOCAB[softwaredev-docs-standards]="convention norm guideline accessibility style guide linting rule agreement philosophy" +WAY_THRESH[softwaredev-docs-standards]="2.0" + +WAY_IDS=(softwaredev-code-testing softwaredev-docs-api softwaredev-environment-debugging softwaredev-code-security softwaredev-architecture-design softwaredev-environment-config softwaredev-architecture-adr-context softwaredev-delivery-commits softwaredev-delivery-github softwaredev-delivery-patches softwaredev-delivery-release softwaredev-delivery-migrations softwaredev-code-errors softwaredev-code-quality softwaredev-code-performance softwaredev-environment-deps softwaredev-environment-ssh softwaredev-docs softwaredev-architecture-threat-modeling softwaredev-docs-standards) # --- Options --- RUN_NCD=true @@ -117,6 +125,7 @@ fi ncd_tp=0 ncd_fp=0 ncd_tn=0 ncd_fn=0 bm25_tp=0 bm25_fp=0 bm25_tn=0 bm25_fn=0 bm25_wins=0 ncd_wins=0 ties=0 +coact_full=0 coact_partial=0 coact_miss=0 coact_total=0 
total=0 # --- NCD scorer --- @@ -214,71 +223,109 @@ echo "" while IFS= read -r line; do prompt=$(echo "$line" | jq -r '.prompt') - expected=$(echo "$line" | jq -r '.expected // "none"') - should_match=$(echo "$line" | jq -r '.match') category=$(echo "$line" | jq -r '.category') note=$(echo "$line" | jq -r '.note // ""') + # Parse expected: null → negative, string → single, array → co-activation + expected_type=$(echo "$line" | jq -r '.expected | type') + expected_list=() + is_negative=false + is_coact=false + + case "$expected_type" in + null) is_negative=true ;; + string) expected_list=("$(echo "$line" | jq -r '.expected')") ;; + array) mapfile -t expected_list < <(echo "$line" | jq -r '.expected[]') + [[ ${#expected_list[@]} -gt 1 ]] && is_coact=true ;; + esac + total=$((total + 1)) + $is_coact && coact_total=$((coact_total + 1)) ncd_result="skip" bm25_result="skip" - # NCD scoring - if [[ "$RUN_NCD" == true ]]; then - if [[ "$expected" == "none" ]]; then + # --- Scorer evaluation function --- + # Usage: eval_scorer <scorer> <prompt> [expected_way...] + # Prints: result token on stdout (TN, FP:<way>, TP, FN, FULL, PARTIAL:<missed>, MISS) + eval_scorer() { + local scorer="$1" prompt="$2" + shift 2 + local exp_list=("$@") + local result="" + + if $is_negative; then # Negative test: check no way matches - any_match=false + local any_match=false for way_id in "${WAY_IDS[@]}"; do - if ncd_matches_way "$prompt" "$way_id"; then + if "${scorer}_matches_way" "$prompt" "$way_id"; then any_match=true - ncd_result="FP:$way_id" + result="FP:$way_id" break fi done if [[ "$any_match" == false ]]; then - ncd_result="TN" - ncd_tn=$((ncd_tn + 1)) + result="TN" + fi + elif $is_coact; then + # Co-activation: check ALL expected ways match + local matched=0 + local missed="" + for exp in "${exp_list[@]}"; do + if "${scorer}_matches_way" "$prompt" "$exp"; then + matched=$((matched + 1)) + else + missed+="${exp##*-} " + fi + done + if [[ $matched -eq ${#exp_list[@]} ]]; then + result="FULL" + elif [[ $matched -gt 0 ]]; then + result="PARTIAL:${missed% }" else -
ncd_fp=$((ncd_fp + 1)) + result="MISS" fi else - # Positive test: check expected way matches - if ncd_matches_way "$prompt" "$expected"; then - ncd_result="TP" - ncd_tp=$((ncd_tp + 1)) + # Single-expected: check the one expected way matches + if "${scorer}_matches_way" "$prompt" "${exp_list[0]}"; then + result="TP" else - ncd_result="FN" - ncd_fn=$((ncd_fn + 1)) + result="FN" fi fi + + echo "$result" + } + + # NCD scoring + if [[ "$RUN_NCD" == true ]]; then + ncd_result=$(eval_scorer "ncd" "$prompt" "${expected_list[@]+"${expected_list[@]}"}") + case "$ncd_result" in + TP|FULL) ncd_tp=$((ncd_tp + 1)) ;; + TN) ncd_tn=$((ncd_tn + 1)) ;; + FN|MISS) ncd_fn=$((ncd_fn + 1)) ;; + FP:*) ncd_fp=$((ncd_fp + 1)) ;; + PARTIAL:*) ncd_fn=$((ncd_fn + 1)) ;; + esac fi # BM25 scoring if [[ "$RUN_BM25" == true ]]; then - if [[ "$expected" == "none" ]]; then - any_match=false - for way_id in "${WAY_IDS[@]}"; do - if bm25_matches_way "$prompt" "$way_id"; then - any_match=true - bm25_result="FP:$way_id" - break - fi - done - if [[ "$any_match" == false ]]; then - bm25_result="TN" - bm25_tn=$((bm25_tn + 1)) - else - bm25_fp=$((bm25_fp + 1)) - fi - else - if bm25_matches_way "$prompt" "$expected"; then - bm25_result="TP" - bm25_tp=$((bm25_tp + 1)) - else - bm25_result="FN" - bm25_fn=$((bm25_fn + 1)) - fi + bm25_result=$(eval_scorer "bm25" "$prompt" "${expected_list[@]+"${expected_list[@]}"}") + case "$bm25_result" in + TP|FULL) bm25_tp=$((bm25_tp + 1)) ;; + TN) bm25_tn=$((bm25_tn + 1)) ;; + FN|MISS) bm25_fn=$((bm25_fn + 1)) ;; + FP:*) bm25_fp=$((bm25_fp + 1)) ;; + PARTIAL:*) bm25_fn=$((bm25_fn + 1)) ;; + esac + # Track co-activation detail for BM25 + if $is_coact; then + case "$bm25_result" in + FULL) coact_full=$((coact_full + 1)) ;; + PARTIAL:*) coact_partial=$((coact_partial + 1)) ;; + MISS) coact_miss=$((coact_miss + 1)) ;; + esac fi fi @@ -286,8 +333,8 @@ while IFS= read -r line; do if [[ "$RUN_NCD" == true ]] && [[ "$RUN_BM25" == true ]]; then ncd_correct=false bm25_correct=false - 
[[ "$ncd_result" == "TP" || "$ncd_result" == "TN" ]] && ncd_correct=true - [[ "$bm25_result" == "TP" || "$bm25_result" == "TN" ]] && bm25_correct=true + [[ "$ncd_result" == "TP" || "$ncd_result" == "TN" || "$ncd_result" == "FULL" ]] && ncd_correct=true + [[ "$bm25_result" == "TP" || "$bm25_result" == "TN" || "$bm25_result" == "FULL" ]] && bm25_correct=true if [[ "$bm25_correct" == true ]] && [[ "$ncd_correct" == false ]]; then bm25_wins=$((bm25_wins + 1)) @@ -298,26 +345,31 @@ while IFS= read -r line; do fi fi - # Output - if [[ "$VERBOSE" == true ]] || [[ "$ncd_result" == "FN" ]] || [[ "$ncd_result" == FP:* ]] || [[ "$bm25_result" == "FN" ]] || [[ "$bm25_result" == FP:* ]]; then + # Output — show failures always, everything in verbose + show=false + if [[ "$VERBOSE" == true ]]; then show=true; fi + case "$ncd_result" in FN|MISS|FP:*|PARTIAL:*) show=true ;; esac + case "$bm25_result" in FN|MISS|FP:*|PARTIAL:*) show=true ;; esac + + if $show; then printf "%-3s " "$total" - printf "[%-7s] " "$category" + printf "[%-12s] " "$category" # NCD result if [[ "$RUN_NCD" == true ]]; then case "$ncd_result" in - TP|TN) printf "${GREEN}NCD:%-6s${NC} " "$ncd_result" ;; - FN) printf "${RED}NCD:%-6s${NC} " "$ncd_result" ;; - FP:*) printf "${YELLOW}NCD:%-6s${NC} " "$ncd_result" ;; + TP|TN|FULL) printf "${GREEN}NCD:%-10s${NC} " "$ncd_result" ;; + FN|MISS) printf "${RED}NCD:%-10s${NC} " "$ncd_result" ;; + FP:*|PARTIAL:*) printf "${YELLOW}NCD:%-10s${NC} " "$ncd_result" ;; esac fi # BM25 result if [[ "$RUN_BM25" == true ]]; then case "$bm25_result" in - TP|TN) printf "${GREEN}BM25:%-6s${NC} " "$bm25_result" ;; - FN) printf "${RED}BM25:%-6s${NC} " "$bm25_result" ;; - FP:*) printf "${YELLOW}BM25:%-6s${NC} " "$bm25_result" ;; + TP|TN|FULL) printf "${GREEN}BM25:%-10s${NC} " "$bm25_result" ;; + FN|MISS) printf "${RED}BM25:%-10s${NC} " "$bm25_result" ;; + FP:*|PARTIAL:*) printf "${YELLOW}BM25:%-10s${NC} " "$bm25_result" ;; esac fi @@ -349,3 +401,8 @@ if [[ "$RUN_NCD" == true ]] && [[ 
"$RUN_BM25" == true ]]; then echo "" echo "Head-to-head: BM25 wins=$bm25_wins NCD wins=$ncd_wins ties=$ties" fi + +if [[ $coact_total -gt 0 ]]; then + echo "" + echo "Co-activation ($coact_total tests): full=$coact_full partial=$coact_partial miss=$coact_miss" +fi diff --git a/tools/way-match/test-integration.sh b/tools/way-match/test-integration.sh index 36f3c24..2cae8f3 100755 --- a/tools/way-match/test-integration.sh +++ b/tools/way-match/test-integration.sh @@ -94,6 +94,10 @@ TEST_CASES=( "softwaredev-code-security|are our API keys exposed anywhere" "softwaredev-architecture-design|should we use a monolith or microservices architecture" "softwaredev-environment-config|the database connection string needs updating" + # Co-activation cases — comma-separated expected ways + "softwaredev-environment-debugging,softwaredev-code-errors|debug the unhandled exception and add proper error handling" + "softwaredev-environment-deps,softwaredev-code-security|audit our dependencies for security vulnerabilities" + "softwaredev-architecture-design,softwaredev-delivery-migrations|design the database schema for the new microservice" ) # --- Run tests --- @@ -136,6 +140,11 @@ for test_case in "${TEST_CASES[@]}"; do fi done + # Parse expected: comma-separated for co-activation (e.g., "way-a,way-b") + IFS=',' read -ra expected_list <<< "$expected" + is_coact=false + [[ ${#expected_list[@]} -gt 1 ]] && is_coact=true + # Evaluate BM25 bm25_ok=false if [[ "$expected" == "NONE" ]]; then @@ -145,11 +154,15 @@ for test_case in "${TEST_CASES[@]}"; do bm25_fp=$((bm25_fp + 1)) fi else - found=false - for m in "${bm25_matches[@]}"; do - [[ "$m" == "$expected" ]] && found=true + all_found=true + for exp in "${expected_list[@]}"; do + found=false + for m in "${bm25_matches[@]}"; do + [[ "$m" == "$exp" ]] && found=true && break + done + [[ "$found" == false ]] && all_found=false done - if [[ "$found" == true ]]; then + if [[ "$all_found" == true ]]; then bm25_tp=$((bm25_tp + 1)); bm25_ok=true 
else bm25_fn=$((bm25_fn + 1)) @@ -165,11 +178,15 @@ for test_case in "${TEST_CASES[@]}"; do ncd_fp=$((ncd_fp + 1)) fi else - found=false - for m in "${ncd_matches[@]}"; do - [[ "$m" == "$expected" ]] && found=true + all_found=true + for exp in "${expected_list[@]}"; do + found=false + for m in "${ncd_matches[@]}"; do + [[ "$m" == "$exp" ]] && found=true && break + done + [[ "$found" == false ]] && all_found=false done - if [[ "$found" == true ]]; then + if [[ "$all_found" == true ]]; then ncd_tp=$((ncd_tp + 1)); ncd_ok=true else ncd_fn=$((ncd_fn + 1)) @@ -193,6 +210,8 @@ for test_case in "${TEST_CASES[@]}"; do if [[ "$expected" == "NONE" ]]; then printf "expect=NONE " + elif $is_coact; then + printf "expect=[%s] " "$(echo "$expected" | sed 's/softwaredev-//g')" else printf "expect=%-28s " "$(echo "$expected" | sed 's/softwaredev-//')" fi
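
Review note on the co-activation logic in `test-harness.sh` and `test-integration.sh`: both scripts decide whether every expected way matched, but only the harness distinguishes FULL, PARTIAL, and MISS. The sketch below isolates that three-way classification as a standalone function so it can be exercised without a scorer. `classify_coact` is a hypothetical name, and taking both sets as comma-separated strings is an assumption borrowed from the integration test's `TEST_CASES` format. Unlike the harness, it reports full way IDs in the PARTIAL suffix rather than `${exp##*-}`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this diff): classify a co-activation
# outcome from the expected way IDs ($1) and the matched way IDs ($2),
# both given as comma-separated strings.
classify_coact() {
  local expected_csv="$1" matched_csv="$2"
  local -a expected=() matched=() missed=()
  local hit=0 exp m found

  IFS=',' read -ra expected <<< "$expected_csv"
  IFS=',' read -ra matched <<< "$matched_csv"

  for exp in "${expected[@]}"; do
    found=false
    for m in "${matched[@]}"; do
      [[ "$m" == "$exp" ]] && { found=true; break; }
    done
    if $found; then
      hit=$((hit + 1))
    else
      missed+=("$exp")
    fi
  done

  if (( hit == ${#expected[@]} )); then
    echo "FULL"                      # every expected way fired
  elif (( hit > 0 )); then
    local IFS=' '
    echo "PARTIAL:${missed[*]}"      # some fired; list the ones that did not
  else
    echo "MISS"                      # none of the expected ways fired
  fi
}
```

Extra matched ways do not affect the result, which mirrors the harness: co-activation fixtures assert that the expected set fires, not that nothing else does.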