security(prompt): pre-prompt injection sanitizer + defensive frame (closes #5, v1.0.7)#7
Merged
Conversation
…loses #5, v1.0.7) The Apr 29 2026 audit flagged that pasted command output containing prompt-injection markers ("ignore previous instructions", role-overrides, etc) can steer Claude Opus toward incorrect hypotheses, wasting the investigation budget on a bad path. The deterministic Layer 1-4 validator still holds — the LLM cannot be tricked into running dangerous commands — but the *conclusions* of an investigation can be misled. This is a defense-in-layers fix, not a claim of perfection. Changes - src/ghosthunter/security/prompt_sanitizer.py (new): 7 known injection patterns redacted to [INJECTION-PATTERN-REDACTED] placeholders. Patterns cover: ignore-previous-instructions, you-are-now-X, system-role-override, forget-role, disregard-everything-above, admin/system/override/sudo XML tags, "new instructions:" markers. Returns SanitizationResult with per-pattern hit counts so callers can log redactions to audit.log (future v1.0.7+). - src/ghosthunter/investigator.py: _format_for_compression() now runs stdout + stderr through sanitize_for_prompt() AND wraps the result in a <command_output> defensive frame. Both layers run unconditionally — even if a future injection shape evades the pattern list, the wrapper still tells the LLM not to follow embedded instructions. - tests/test_prompt_sanitizer.py (new): 26 tests across hits/misses, multi-pattern counts, defensive wrapper, pattern-registry invariants, and end-to-end integration with _format_for_compression. False positives explicitly tolerated; false negatives are not. - SECURITY.md: section 1 of "What Ghosthunter does NOT protect against" updated with the v1.0.7 mitigation note. Honest about scope — best-effort, not absolute. Verified - pytest tests/test_prompt_sanitizer.py -> 26 passed - pytest tests/ -> 1226 passed (all green, including the existing awk-block tests from #4) - ruff check + ruff format --check -> clean Closes #5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #5.
Pre-prompt sanitizer for the Apr 29 audit's Gap #1 (prompt injection via pasted command output). Defense in layers, not absolute.
What ships:
security/prompt_sanitizer.py— 7 injection patterns, redaction to[INJECTION-PATTERN-REDACTED]placeholder, per-pattern hit counts for future audit logging.investigator.py::_format_for_compressionnow sanitizes stdout/stderr AND wraps the result in a<command_output>defensive frame telling the LLM the contents are untrusted data.tests/test_prompt_sanitizer.pycovering hits, misses, counts, wrapper, registry invariants, and end-to-end integration.Verification:
pytest tests/test_prompt_sanitizer.py→ 26 passedpytest tests/→ 1226 passed🤖 Generated with Claude Code