security(prompt): pre-prompt injection sanitizer + defensive frame (closes #5, v1.0.7) by avinash-matrixgard · Pull Request #7 · avinash-matrixgard/ghosthunter

avinash-matrixgard · 2026-04-30T00:36:52Z

Closes #5.

Pre-prompt sanitizer for the Apr 29 audit's Gap #1 (prompt injection via pasted command output). Defense in layers, not absolute.

What ships:

New security/prompt_sanitizer.py — 7 injection patterns, redaction to [INJECTION-PATTERN-REDACTED] placeholder, per-pattern hit counts for future audit logging.
investigator.py::_format_for_compression now sanitizes stdout/stderr AND wraps the result in a <command_output> defensive frame telling the LLM the contents are untrusted data.
26 new tests in tests/test_prompt_sanitizer.py covering hits, misses, counts, wrapper, registry invariants, and end-to-end integration.
SECURITY.md item 1 updated with the v1.0.7 mitigation note (honest about scope).

Verification:

pytest tests/test_prompt_sanitizer.py → 26 passed
pytest tests/ → 1226 passed
ruff check + format → clean

🤖 Generated with Claude Code

…loses #5, v1.0.7) The Apr 29 2026 audit flagged that pasted command output containing prompt-injection markers ("ignore previous instructions", role-overrides, etc) can steer Claude Opus toward incorrect hypotheses, wasting the investigation budget on a bad path. The deterministic Layer 1-4 validator still holds — the LLM cannot be tricked into running dangerous commands — but the *conclusions* of an investigation can be misled. This is a defense-in-layers fix, not a claim of perfection. Changes - src/ghosthunter/security/prompt_sanitizer.py (new): 7 known injection patterns redacted to [INJECTION-PATTERN-REDACTED] placeholders. Patterns cover: ignore-previous-instructions, you-are-now-X, system-role-override, forget-role, disregard-everything-above, admin/system/override/sudo XML tags, "new instructions:" markers. Returns SanitizationResult with per-pattern hit counts so callers can log redactions to audit.log (future v1.0.7+). - src/ghosthunter/investigator.py: _format_for_compression() now runs stdout + stderr through sanitize_for_prompt() AND wraps the result in a <command_output> defensive frame. Both layers run unconditionally — even if a future injection shape evades the pattern list, the wrapper still tells the LLM not to follow embedded instructions. - tests/test_prompt_sanitizer.py (new): 26 tests across hits/misses, multi-pattern counts, defensive wrapper, pattern-registry invariants, and end-to-end integration with _format_for_compression. False positives explicitly tolerated; false negatives are not. - SECURITY.md: section 1 of "What Ghosthunter does NOT protect against" updated with the v1.0.7 mitigation note. Honest about scope — best-effort, not absolute. Verified - pytest tests/test_prompt_sanitizer.py -> 26 passed - pytest tests/ -> 1226 passed (all green, including the existing awk-block tests from #4) - ruff check + ruff format --check -> clean Closes #5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

avinash-matrixgard merged commit 9cd7c71 into main Apr 30, 2026
4 checks passed

avinash-matrixgard deleted the claude/v1.0.7-prompt-sanitizer-issue-5 branch April 30, 2026 00:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

security(prompt): pre-prompt injection sanitizer + defensive frame (closes #5, v1.0.7)#7

security(prompt): pre-prompt injection sanitizer + defensive frame (closes #5, v1.0.7)#7
avinash-matrixgard merged 1 commit into
mainfrom
claude/v1.0.7-prompt-sanitizer-issue-5

avinash-matrixgard commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

avinash-matrixgard commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant