docs: sync all markdown files with v0.5.0 final state

Sephyi · Sephyi · commit 08d6d4893abc · 2026-03-22T05:30:36.000+01:00
- Rename v0.5.0 release to "Beyond the Diff"
- Test count 340 → 367, secret patterns 25 → 24
- PRD v4.2: add FR-062 (security hardening), FR-063 (prompt optimization),
  update eval harness description, fix test file list, fuzz targets 3 → 5
- CHANGELOG: add Security, Prompt Quality, Testing, API subsections
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,17 +8,45 @@ SPDX-License-Identifier: PolyForm-Noncommercial-1.0.0
 
 All notable changes to CommitBee are documented here.
 
-## `v0.5.0` — Understand Everything (current)
-
-- **Full signature extraction** — The LLM sees `pub fn connect(host: &str, timeout: Duration) -> Result<Connection>`, not just "Function connect." Extracted from tree-sitter AST nodes using a two-strategy approach: `child_by_field_name("body")` primary, body-node-kind scan fallback. Works across all 10 languages.
-- **Signature diffs for modified symbols** — When a function signature changes, the prompt shows `[~] old_sig → new_sig` so the LLM understands exactly what was modified.
-- **Cross-file connection detection** — Detects when a changed file calls a symbol defined in another changed file. Shown as `CONNECTIONS: validator calls parse() — both changed` in the prompt.
-- **Semantic change classification** — Modified symbols are classified as whitespace-only or semantic via character-stream comparison. Formatting-only changes (cargo fmt, prettier) auto-detected as `style` type when all modified symbols are whitespace-only.
-- **Dual old/new line tracking** — `classify_span_change` correctly tracks old-file and new-file line numbers independently, handling cases where symbols shift positions due to added/removed lines above them.
-- **Token budget rebalance** — Symbol section gets 30% of budget (up from 20%) when signatures are present, since richer symbols reduce the LLM's dependency on raw diff.
-- **BODY_NODE_KINDS coverage** — Signature extraction verified across all 10 languages with dedicated tests for Java, C, C++, Ruby, and C#.
-- **Connection reliability** — Short symbol names (<4 chars) filtered to prevent false positives, short-circuit after 5 connections, sort+dedup for correctness.
-- **Fixed false positive in breaking change detection** — Modified public symbols were incorrectly counted as "removed APIs", causing spurious `breaking_change` validator violations and retry exhaustion.
+## `v0.5.0` — Beyond the Diff (current)
+
+### Semantic Analysis
+
+- **Full signature extraction** — The LLM sees `pub fn connect(host: &str, timeout: Duration) -> Result<Connection>`, not just "Function connect." Two-strategy body detection: `child_by_field_name("body")` primary, `BODY_NODE_KINDS` fallback. Works across all 10 languages.
+- **Signature diffs for modified symbols** — When a function signature changes, the prompt shows `[~] old_sig → new_sig`.
+- **Cross-file connection detection** — Detects when a changed file calls a symbol defined in another changed file. Shown as `CONNECTIONS: validator calls parse() — both changed`.
+- **Semantic change classification** — Modified symbols classified as whitespace-only or semantic via character-stream comparison. Formatting-only changes auto-detected as `style`.
+- **Dual old/new line tracking** — Correctly handles symbols shifting positions between HEAD and staged.
+- **Token budget rebalance** — Symbol section gets 30% of budget (up from 20%) when signatures present.
+
+### Security
+
+- **Block project config URL overrides** — `.commitbee.toml` can no longer override `openai_base_url`, `anthropic_base_url`, or `ollama_host`, which prevents SSRF and exfiltration of API keys and staged code.
+- **Cap streaming line_buffer** — All 3 LLM providers cap `line_buffer` at 1 MB to prevent unbounded memory growth from malicious servers.
+- **Strip URLs from error messages** — `reqwest::Error` display uses `without_url()` to prevent leaking configured base URLs.
+- **Broadened OpenAI secret pattern** — Detects `sk-proj-` and `sk-svcacct-` prefixed keys alongside legacy `sk-` format.
+- **Replaced Box::leak with Cow** — Custom secret pattern names use `Cow<'static, str>` instead of leaked heap allocations.
+
+### Prompt Quality
+
+- **Fixed breaking change subject budget** — Subject character budget now accounts for `!` suffix, preventing guaranteed validator rejection on breaking changes.
+- **Omit empty EVIDENCE section** — Saves ~200 chars when all flags are at default (most changes).
+- **Symbol marker legend** — SYSTEM_PROMPT now explains `[+] added, [-] removed, [~] modified`.
+- **Removed duplicate JSON schema** — System prompt no longer includes a competing schema template.
+- **Replaced emoji with text** — `⚠` replaced with `WARNING:` for better small-model tokenization.
+- **Enhanced Python queries** — Tree-sitter now captures decorated functions and classes.
+
+### Testing & Evaluation
+
+- **Evaluation harness** — 36 fixtures covering all 11 commit types, AST features, and edge cases. Per-type accuracy reporting with `EvalSummary`.
+- **15+ new unit tests** — Coverage for `detect_primary_change`, `detect_metadata_breaking`, `detect_bug_evidence` (all 7 patterns), Deleted/Renamed status, signature edge cases, connection content assertions.
+- **5 fuzz targets** — `fuzz_sanitizer`, `fuzz_safety`, `fuzz_diff_parser`, `fuzz_signature`, `fuzz_classify_span`.
+- **367 tests** total (up from 308 at v0.4.0).
+
+### API
+
+- **Demoted internal types** — `SymbolChangeType`, `GitService`, `Progress` changed from `pub` to `pub(crate)`.
+- **Added `#[non_exhaustive]`** to `SymbolChangeType` for future-safe extension.
 
 ## `v0.4.0` — See Everything
 
@@ -27,7 +55,7 @@ All notable changes to CommitBee are documented here.
 - **Multi-language commit messages** — Generate messages in any language with `--locale` flag or `locale` config (e.g., `--locale de` for German).
 - **Commit history style learning** — Learns from recent commit history to match your project's style (`learn_from_history`, `history_sample_size` config).
 - **Rename detection** — Detects file renames with similarity percentage via `git diff --find-renames`, displayed as `old → new (N% similar)` in prompts and split suggestions. Configurable threshold (default 70%, set to 0 to disable).
-- **Expanded secret scanning** — 25 built-in patterns across 13 categories (cloud providers, AI/ML, source control, communication, payment, database, cryptographic, generic). Pluggable engine: add custom regex patterns or disable built-ins by name via config.
+- **Expanded secret scanning** — 24 built-in patterns across 13 categories (cloud providers, AI/ML, source control, communication, payment, database, cryptographic, generic). Pluggable engine: add custom regex patterns or disable built-ins by name via config.
 - **Progress indicators** — Contextual `indicatif` spinners during pipeline phases (analyzing, scanning, generating). Auto-suppressed in non-TTY environments (git hooks, pipes).
 - **Evaluation harness** — `cargo test --features eval` for structured LLM output quality benchmarking.
 - **Fuzz testing** — `cargo-fuzz` targets for sanitizer and diff parser robustness.
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -132,7 +132,7 @@ src/
     ├── analyzer.rs      # AnalyzerService (tree-sitter queries, parallel via rayon)
     ├── context.rs       # ContextBuilder (token budget)
     ├── history.rs       # HistoryService (commit style learning)
-    ├── safety.rs        # Secret scanning (25 patterns), conflict detection
+    ├── safety.rs        # Secret scanning (24 patterns), conflict detection
     ├── sanitizer.rs     # CommitSanitizer (JSON + plain text, BREAKING CHANGE footer)
     ├── splitter.rs      # CommitSplitter (multi-commit detection)
     ├── template.rs      # TemplateService (custom prompt templates)
@@ -192,7 +192,7 @@ src/
 ### Running Tests
 
 ```bash
-cargo test                    # All tests (340 tests)
+cargo test                    # All tests (367 tests)
 cargo test --test sanitizer   # CommitSanitizer tests
 cargo test --test safety      # Safety module tests
 cargo test --test context     # ContextBuilder tests
diff --git a/DOCS.md b/DOCS.md
@@ -175,7 +175,7 @@ think = false
 # Set to 0 to disable rename detection
 rename_threshold = 70
 
-# Custom secret patterns (regex). Added to the 25 built-in patterns.
+# Custom secret patterns (regex). Added to the 24 built-in patterns.
 # custom_secret_patterns = ["CUSTOM_KEY_[a-zA-Z0-9]{32}"]
 
 # Disable built-in secret patterns by name (case-insensitive).
@@ -452,7 +452,7 @@ If the sanitizer can't produce a valid commit message, you get a clear error exp
 
 ### Secret Scanning
 
-Before anything is sent to an LLM, CommitBee scans all staged content with **25 built-in patterns** across 13 categories:
+Before anything is sent to an LLM, CommitBee scans all staged content with **24 built-in patterns** across 13 categories:
 
 | Category | Patterns |
 | --- | --- |
@@ -657,7 +657,7 @@ src/
     ├── git.rs           # GitService — gix for discovery, git CLI for diffs
     ├── analyzer.rs      # AnalyzerService — tree-sitter parsing via rayon
     ├── context.rs       # ContextBuilder — evidence flags, token budget
-    ├── safety.rs        # Secret scanning (25 patterns), conflict detection
+    ├── safety.rs        # Secret scanning (24 patterns), conflict detection
     ├── sanitizer.rs     # CommitSanitizer + CommitValidator
     ├── splitter.rs      # CommitSplitter — diff-shape + Jaccard clustering
     ├── progress.rs      # Progress indicators (indicatif spinners, TTY-aware)
@@ -694,7 +694,7 @@ No panics in user-facing code paths. The sanitizer and validator are tested with
 
 ### Testing Strategy
 
-CommitBee has 340 tests across multiple strategies:
+CommitBee has 367 tests across multiple strategies:
 
 | Strategy | What It Covers |
 | --- | --- |
@@ -707,7 +707,7 @@ CommitBee has 340 tests across multiple strategies:
 Run them:
 
 ```bash
-cargo test                    # All 340 tests
+cargo test                    # All 367 tests
 cargo test --test sanitizer   # Just sanitizer tests
 cargo test --test integration # LLM provider mocks
 COMMITBEE_LOG=debug cargo test -- --nocapture  # With logging
diff --git a/PRD.md b/PRD.md
@@ -6,19 +6,20 @@ SPDX-License-Identifier: PolyForm-Noncommercial-1.0.0
 
 # CommitBee — Product Requirements Document
 
-**Version**: 4.1
+**Version**: 4.2
 **Date**: 2026-03-22
 **Status**: Active  
 **Author**: [Sephyi](https://github.com/Sephyi) + [Claude Opus 4.6](https://www.anthropic.com/news/claude-opus-4-6)  
 
 ## Changelog
 
 <details>
-<summary>Revision history (v3.3 → v4.1)</summary>
+<summary>Revision history (v3.3 → v4.2)</summary>
 
 | Version | Date       | Summary |
 |---------|------------|---------|
-| 4.1     | 2026-03-22 | AST context overhaul (v0.5.0): full signature extraction from tree-sitter nodes, semantic change classification (whitespace vs body vs signature), old→new signature diffs, cross-file connection detection, formatting auto-detection via symbols. 340 tests. |
+| 4.2     | 2026-03-22 | v0.5.0 hardening: security fixes (SSRF prevention, streaming caps), prompt optimization (budget fix, evidence omission, emoji removal), eval harness (36 fixtures, per-type reporting), test coverage (15+ new tests), API hygiene (pub(crate) demotions), 5 fuzz targets. 367 tests. |
+| 4.1     | 2026-03-22 | AST context overhaul (v0.5.0): full signature extraction from tree-sitter nodes, semantic change classification (whitespace vs body vs signature), old→new signature diffs, cross-file connection detection, formatting auto-detection via symbols. 367 tests. |
 | 4.0     | 2026-03-13 | PRD normalization: aligned phases with shipped versions (v0.2.0/v0.3.x/v0.4.0), collapsed revision history, unified status markers, resolved stale critical issues, canonicalized test count to 308, removed dead cross-references. FR-031 (Exclude Files) and FR-033 (Copy to Clipboard) shipped. |
 | 3.3     | 2026-03-13 | v0.4.0 full feature completion — FR-030 (Custom Prompt Templates), FR-032 (Multi-Language), FR-036 (Tree-sitter Query Patterns), FR-057 (Additional Languages), FR-058 (History Learning), TR-006 (Eval Harness), TR-007 (Fuzzing). 308 tests. |
 | 3.2     | 2026-03-13 | FR-035 (Rename Detection), FR-037 (Expanded Secret Scanning), FR-038 (Progress Indicators). 202 tests. |
@@ -91,7 +92,7 @@ CommitBee is a Rust-native CLI tool that uses tree-sitter semantic analysis and
 | Multiple message generation (pick from N)          | Common (aicommits, aicommit2) | ✅ v0.2.0       |
 | Commit splitting (multi-concern detection)         | No competitor has this        | ✅ v0.2.0       |
 | Custom prompt/instruction files                    | Growing (Copilot, aicommit2)  | ✅ v0.4.0       |
-| Unit/integration tests                             | Non-negotiable for quality    | ✅ 340 tests    |
+| Unit/integration tests                             | Non-negotiable for quality    | ✅ 367 tests    |
 
 ## 3. Architecture
 
@@ -158,7 +159,7 @@ commitbee
 │       ├── git.rs           # GitService trait + impl (async, single-diff)
 │       ├── analyzer.rs      # AnalyzerService (parallel parsing via rayon)
 │       ├── context.rs       # ContextBuilder (fixed budget math, fallback ladder)
-│       ├── safety.rs        # Secret scanning (25 patterns, pluggable engine)
+│       ├── safety.rs        # Secret scanning (24 patterns, pluggable engine)
 │       ├── sanitizer.rs     # CommitSanitizer (UTF-8 safe) + CommitValidator (7 rules)
 │       ├── splitter.rs      # CommitSplitter (Jaccard + fingerprinting)
 │       ├── template.rs      # TemplateService (custom prompt templates)
@@ -170,16 +171,19 @@ commitbee
 │           └── anthropic.rs # Anthropic Claude
 ├── tests/
 │   ├── snapshots/           # insta snapshot files
-│   ├── fixtures/            # Test git repos, diff samples, golden semantic fixtures, eval fixtures
-│   ├── languages.rs         # Feature-gated language tests
-│   ├── sanitizer.rs         # Unit + snapshot + proptest
-│   ├── context.rs           # Unit + snapshot
-│   ├── safety.rs            # Unit + proptest
-│   ├── analyzer.rs          # Unit + snapshot with fixture files
-│   ├── git.rs               # Integration with tempfile repos
-│   ├── ollama.rs            # Integration with wiremock
-│   └── cli.rs               # CLI integration with assert_cmd
-├── fuzz/                    # cargo-fuzz targets (sanitizer, safety, diff parser)
+│   ├── fixtures/            # Eval fixtures (36 scenarios), diff samples
+│   ├── helpers.rs           # Shared test helpers (make_file_change, make_staged_changes)
+│   ├── context.rs           # ContextBuilder, type inference, evidence, signatures, connections
+│   ├── sanitizer.rs         # CommitSanitizer + CommitValidator (unit + snapshot + proptest)
+│   ├── splitter.rs          # CommitSplitter grouping and merge logic
+│   ├── languages.rs         # Feature-gated per-language symbol + signature extraction
+│   ├── safety.rs            # Secret scanning patterns + conflict detection
+│   ├── integration.rs       # LLM provider round-trips with wiremock
+│   ├── history.rs           # HistoryService with tempfile git repos
+│   ├── template.rs          # TemplateService custom/default templates
+│   ├── commit_type.rs       # CommitType parsing and ALL sync
+│   └── eval.rs              # Eval harness fixture validation (feature-gated)
+├── fuzz/                    # cargo-fuzz targets (sanitizer, safety, diff parser, signature, classify_span)
 └── completions/             # Generated shell completions
 ```
 
@@ -438,11 +442,11 @@ Config: `learn_from_history` (default `false`), `history_sample_size` (default 5
 
 #### TR-006: Evaluation Harness ✅
 
-`commitbee eval` — runs full pipeline against fixture diffs, compares against expected snapshots. Feature-gated (`eval` feature). Fixtures in `tests/fixtures/eval/`. Pass/fail report with diff of expected vs. actual.
+`commitbee eval` — runs full pipeline against fixture diffs with assertion-based validation. Feature-gated (`eval` feature). 36 fixtures in `tests/fixtures/eval/` covering all 11 commit types, AST features (signatures, connections, whitespace classification), and edge cases. Each fixture has `metadata.toml` (assertions for type, evidence flags, prompt content, connections, breaking changes), `diff.patch`, and optional `symbols.toml` (injected CodeSymbol data). `EvalSummary` reports per-type accuracy and overall score. `run_sync()` method for integration test access.
 
 #### TR-007: Fuzzing ✅
 
-3 `cargo-fuzz` targets: `fuzz_sanitizer`, `fuzz_safety`, `fuzz_diff_parser`. `fuzz/Cargo.toml` with `libfuzzer-sys`.
+5 `cargo-fuzz` targets: `fuzz_sanitizer`, `fuzz_safety`, `fuzz_diff_parser`, `fuzz_signature`, `fuzz_classify_span`. `fuzz/Cargo.toml` with `libfuzzer-sys`.
 
 #### FR-031: Exclude Files ✅
 
@@ -466,6 +470,14 @@ Modified symbols (same name+kind+file in both HEAD and staged) are classified as
 
 Scans added diff lines for `symbol_name(` call patterns referencing symbols defined in other changed files. Connections displayed in new `CONNECTIONS:` prompt section (e.g., `validator calls parse() — both changed`). Capped at 5 connections to prevent prompt bloat. SYSTEM_PROMPT updated with connection-aware guidance. 1 test + 1 splitter integration test.
 
+#### FR-062: Security Hardening ✅
+
+Project-level `.commitbee.toml` can no longer override `openai_base_url`, `anthropic_base_url`, or `ollama_host` (SSRF/exfiltration prevention). All 3 streaming LLM providers cap `line_buffer` at `MAX_RESPONSE_BYTES` (1 MB) to prevent unbounded memory growth. `reqwest::Error` display stripped of URLs via `without_url()`. OpenAI secret pattern broadened to `sk-proj-` and `sk-svcacct-` prefixes. `Box::leak` replaced with `Cow<'static, str>` for custom secret pattern names.
+
+#### FR-063: Prompt Optimization for Small Models ✅
+
+Subject character budget accounts for `!` suffix on breaking changes. EVIDENCE section omitted when all flags are default (~200 chars saved). Symbol marker legend added to SYSTEM_PROMPT (`[+] added, [-] removed, [~] modified`). Duplicate JSON schema removed from system prompt. Emoji replaced with text labels (`WARNING:` instead of `⚠`). CONNECTIONS instruction softened for small models. Python tree-sitter queries enhanced with `decorated_definition` support.
+
 ### 4.6 Future — v0.6.0+ (Market Leadership)
 
 #### FR-050: MCP Server Mode
@@ -648,7 +660,7 @@ commitbee eval                         # Run evaluation harness (dev, feature-ga
 
 ## 8. Testing Requirements
 
-**Current test count: 334**
+**Current test count: 367**
 
 ### TR-001: Unit Tests
 
@@ -806,7 +818,7 @@ Invalid JSON → retry once with repair prompt. Second failure → heuristic ext
 | 2 | v0.3.x | ✅ Shipped | Differentiation — heuristics, validation, spec compliance |
 | 3 | v0.4.0 | ✅ Shipped | Feature completion — templates, languages, rename, history, eval, fuzzing |
 | 4 | v0.4.x | ✅ Shipped | Remaining polish — exclude files (FR-031), clipboard (FR-033) |
-| 5 | v0.5.0 | ✅ Shipped | AST context overhaul — full signatures, semantic change classification, cross-file connections. 340 tests. |
+| 5 | v0.5.0 | ✅ Shipped | AST context overhaul — full signatures, semantic change classification, cross-file connections. 367 tests. |
 | 6 | v0.6.0+ | 📋 Planned | Market leadership — MCP server, changelog, monorepo, version bumping, GitHub Action |
 
 ## 12. Success Metrics
diff --git a/README.md b/README.md