Skip to content

security(prompt): pre-prompt injection sanitizer + defensive frame (closes #5, v1.0.7)#7

Merged
avinash-matrixgard merged 1 commit into
mainfrom
claude/v1.0.7-prompt-sanitizer-issue-5
Apr 30, 2026
Merged

security(prompt): pre-prompt injection sanitizer + defensive frame (closes #5, v1.0.7)#7
avinash-matrixgard merged 1 commit into
mainfrom
claude/v1.0.7-prompt-sanitizer-issue-5

Conversation

@avinash-matrixgard
Copy link
Copy Markdown
Owner

Closes #5.

Pre-prompt sanitizer for the Apr 29 audit's Gap #1 (prompt injection via pasted command output). Defense in layers, not absolute.

What ships:

  • New security/prompt_sanitizer.py — 7 injection patterns, redaction to [INJECTION-PATTERN-REDACTED] placeholder, per-pattern hit counts for future audit logging.
  • investigator.py::_format_for_compression now sanitizes stdout/stderr AND wraps the result in a <command_output> defensive frame telling the LLM the contents are untrusted data.
  • 26 new tests in tests/test_prompt_sanitizer.py covering hits, misses, counts, wrapper, registry invariants, and end-to-end integration.
  • SECURITY.md item 1 updated with the v1.0.7 mitigation note (honest about scope).

Verification:

  • pytest tests/test_prompt_sanitizer.py → 26 passed
  • pytest tests/ → 1226 passed
  • ruff check + format → clean

🤖 Generated with Claude Code

…loses #5, v1.0.7)

The Apr 29 2026 audit flagged that pasted command output containing
prompt-injection markers ("ignore previous instructions", role-overrides,
etc) can steer Claude Opus toward incorrect hypotheses, wasting the
investigation budget on a bad path. The deterministic Layer 1-4 validator
still holds — the LLM cannot be tricked into running dangerous commands —
but the *conclusions* of an investigation can be misled.

This is a defense-in-layers fix, not a claim of perfection.

Changes
- src/ghosthunter/security/prompt_sanitizer.py (new): 7 known injection
  patterns redacted to [INJECTION-PATTERN-REDACTED] placeholders.
  Patterns cover: ignore-previous-instructions, you-are-now-X,
  system-role-override, forget-role, disregard-everything-above,
  admin/system/override/sudo XML tags, "new instructions:" markers.
  Returns SanitizationResult with per-pattern hit counts so callers
  can log redactions to audit.log (future v1.0.7+).
- src/ghosthunter/investigator.py: _format_for_compression() now runs
  stdout + stderr through sanitize_for_prompt() AND wraps the
  result in a <command_output> defensive frame. Both layers run
  unconditionally — even if a future injection shape evades the
  pattern list, the wrapper still tells the LLM not to follow
  embedded instructions.
- tests/test_prompt_sanitizer.py (new): 26 tests across hits/misses,
  multi-pattern counts, defensive wrapper, pattern-registry
  invariants, and end-to-end integration with _format_for_compression.
  False positives explicitly tolerated; false negatives are not.
- SECURITY.md: section 1 of "What Ghosthunter does NOT protect against"
  updated with the v1.0.7 mitigation note. Honest about scope —
  best-effort, not absolute.

Verified
- pytest tests/test_prompt_sanitizer.py -> 26 passed
- pytest tests/ -> 1226 passed (all green, including the existing
  awk-block tests from #4)
- ruff check + ruff format --check -> clean

Closes #5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@avinash-matrixgard avinash-matrixgard merged commit 9cd7c71 into main Apr 30, 2026
4 checks passed
@avinash-matrixgard avinash-matrixgard deleted the claude/v1.0.7-prompt-sanitizer-issue-5 branch April 30, 2026 00:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Security · v1.0.7] Pre-paste prompt-injection sanitizer + explicit doc warning

1 participant