Skip to content

feat: v0.2–v0.5 — corpus, description scanner, argument fuzzer, demo, MCPTox/MCPSecBench corpus expansion#30

Merged
ksek87 merged 8 commits into
mainfrom
claude/plan-fuzzd-project-88AMD
May 14, 2026
Merged

feat: v0.2–v0.5 — corpus, description scanner, argument fuzzer, demo, MCPTox/MCPSecBench corpus expansion#30
ksek87 merged 8 commits into
mainfrom
claude/plan-fuzzd-project-88AMD

Conversation

@ksek87
Copy link
Copy Markdown
Owner

@ksek87 ksek87 commented May 9, 2026

Summary

End-to-end security fuzzing pipeline for MCP servers, built across v0.2–v0.5. Closes #6, #7, #8, #9, #10, #11, #12.

v0.2 — Attack Corpus

  • AttackRecord schema (src/corpus/schema.rs) — typed Category, Severity, Vector enums with serde, Display/FromStr for CLI
  • Corpus loader (src/corpus/loader.rs) — Corpus::embedded() via include_str! macros (self-contained binary); load_file(), load_dir(), by_category(), by_min_severity()
  • 12 seed records (corpus/tool_poisoning/TPA-001..012) — 4 per MCPTox paradigm (Wang et al., 2025); paradigms 1 (explicit trigger), 2 (implicit/background), 3 (persistent injection)
  • corpus list, corpus validate, corpus add CLI subcommands

v0.3 — Static Description Scanner

  • DescriptionScanner::scan(&[ToolDefinition]) -> Vec<Finding> in src/fuzzer/description.rs
  • 46 patterns covering 7 signals: ImperativeOverride, CredentialReference, PrivilegedPath, ExfiltrationMechanism, StealthLanguage, SessionPersistence, CrossToolContamination
  • fuzzd scan --schema tools.json exits 0 (clean) or 1 (blocking findings ≥ High)

v0.4 — Argument Fuzzer + Payload Library

  • ArgumentFuzzer::fuzz(&schema) -> Vec<FuzzCase> in src/fuzzer/argument.rs
  • Generates empty/null/boundary/type-confusion/required-field-omission/unknown-field cases per JSON Schema
  • 8 injection payload categories in src/fuzzer/payloads.rs: path traversal, command injection, SQL, LDAP, NoSQL, format string, template, XML
  • 22 integer boundary values

v0.5 — Corpus Expansion (MCPTox, MCPSecBench, Invariant Labs)

  • 2 new Category variants: ToolShadowing, RugPull
  • 4 new Signal types: FakePrerequisite, ArgumentInterception, HtmlInjectionTag, ConditionalActivation
  • 24 new detection patterns in the description scanner
  • 11 new corpus records:
    • TPA-013..017: MCPTox Template-1/2/3 patterns + Invariant Labs <IMPORTANT>/<SYSTEM> XML injection
    • TS-001..003 (corpus/tool_shadowing/): name squatting, capability override, typosquatting
    • RUG-001..003 (corpus/rug_pull/): trigger-file sleeper, invocation-count sleeper, time-delayed activation
  • Embedded corpus: 12 → 23 records

Demo

  • demo/servers/clean.json + demo/servers/poisoned.json — realistic MCP servers, one clean, one with 4 TPA payloads
  • demo/run.sh — end-to-end demo script (corpus list → clean scan → poisoned scan → validate)
  • demo/github-actions.yml — drop-in CI workflow using fuzzd scan as a blocking gate
  • demo/README.md — integration guide (Makefile, stdio/HTTP tool export, CI usage)

Test plan

  • cargo test — 96 tests, all passing
  • cargo clippy -- -D warnings clean
  • fuzzd corpus list — shows all 23 embedded records
  • fuzzd scan --schema demo/servers/clean.json exits 0
  • fuzzd scan --schema demo/servers/poisoned.json exits 1 with findings
  • New signals detected: <IMPORTANT> tag, append to every, .mcp-triggered, to unlock this, if previously triggered

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk

claude added 5 commits May 9, 2026 20:28
Closes #6, #7, #8.

Schema (src/corpus/schema.rs):
- AttackRecord, Category, Severity, Vector types with serde impls
- Display + FromStr on Category and Severity for CLI filter parsing
- Severity derives Ord so filters can use >= comparisons

Loader (src/corpus/loader.rs):
- Corpus::embedded() — 12 seed records included at compile time via
  include_str!, binary is self-contained with no runtime file deps
- Corpus::load_file() — parses and validates a single JSON record
- Corpus::load_dir() — walks a directory tree, skips invalid files
  with a tracing::warn rather than failing the entire load
- by_category() / by_min_severity() filter helpers

Seed corpus (corpus/tool_poisoning/TPA-001..012.json):
- 4 records per MCPTox paradigm (Wang et al., 2025):
  Paradigm 1 — explicit trigger, function hijacking (SSH/AWS/git/env)
  Paradigm 2 — implicit trigger, background hijacking (staging/cron/in-band/interception)
  Paradigm 3 — persistent instruction injection (BCC/credential log/cmd suffix/output exfil)

CLI (src/main.rs):
- corpus list: loads embedded + optional --corpus-dir, filters by
  category and severity, prints formatted table
- corpus validate: parses a single file, exits 1 on error
- corpus add: validates then copies to --corpus-dir if given

README: removed Recut AI section, replaced Contributing with a
substantive call for contributors covering corpus records, fuzzer
modules, real-server testing, and detection signals.

57 tests passing.

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
Closes #9, #10.

Signal types (src/fuzzer/mod.rs):
- Finding struct: tool_name, signal, severity, matched_text, detail, corpus_refs
- Signal enum: ImperativeOverride, CredentialReference, PrivilegedPath,
  ExfiltrationMechanism, StealthLanguage, SessionPersistence, CrossToolContamination

Description scanner (src/fuzzer/description.rs):
- 46 patterns across all 7 signal categories, each with severity,
  detail text, and corpus cross-references (TPA-001..012)
- Case-insensitive matching against lowercased description text
- extract_snippet() returns up to 40 chars of original context around
  each match with … elision markers
- DescriptionScanner::scan(&[ToolDefinition]) -> Vec<Finding>
- Tools with no description are skipped silently

Pattern coverage (by category):
  ImperativeOverride — 9 patterns: "you must", "must first", "[system:",
    "standing rule", "global execution policy", etc.
  CredentialReference — 7 patterns: id_rsa, .ssh/, .aws/credentials,
    bash_history, .netrc, .pgpass, ~./aws
  PrivilegedPath — 5 patterns: /etc/passwd, /etc/shadow, /etc/cron,
    /root/, /tmp/.
  ExfiltrationMechanism — 7 patterns: | sh, | bash, curl -, wget,
    http://c2, > /dev/null, fire-and-forget
  StealthLanguage — 7 patterns: silently, do not disclose, do not mention,
    never mention, not for display, without mentioning, do not report
  SessionPersistence — 6 patterns: for the remainder of this session,
    cannot be overridden, supersedes all, takes precedence, etc.
  CrossToolContamination — 5 patterns: regardless of which tool,
    whenever any, background monitor, passive observer, background agent

CLI (src/main.rs): fuzzd scan --schema tools.json now runs the
scanner and prints a finding-per-line report with severity, signal,
detail, matched text, and corpus refs. Accepts both bare array and
MCP tools/list envelope formats.

17 new tests — all signal categories, case insensitivity, multi-signal
descriptions, TPA-001 payload regression, clean description false-positive check.

74 tests passing total.

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
…ibrary

Closes #11, #12.

Payload library (src/fuzzer/payloads.rs):
- 8 PayloadCategory statics: PATH_TRAVERSAL, COMMAND_INJECTION,
  SQL_INJECTION, LDAP_INJECTION, NOSQL_INJECTION, FORMAT_STRING,
  TEMPLATE_INJECTION, XML_INJECTION — 46 total payloads
- INTEGER_BOUNDARIES: 22 boundary values covering 0, ±1, signed/unsigned
  8/16/32/64-bit extremes (i32::MAX/MIN, u32::MAX, i64::MAX/MIN, etc.)
- ALL_CATEGORIES slice for uniform iteration in the fuzzer

Argument fuzzer (src/fuzzer/argument.rs):
- ArgumentFuzzer::fuzz(&Value) -> Vec<FuzzCase> — takes a JSON Schema
  (tool's inputSchema) and generates one complete arg object per mutation
- FuzzCase { label, args } — label identifies the mutation for reporting
- Per-field dispatch: string, integer/number, boolean, array, object, unknown
- String mutations: 10 base cases (empty, null byte, long 256/64k, unicode
  RTL/BOM, wrong types) + all 46 injection payloads + enum validation
- Integer mutations: all 22 boundary values + float, NaN string, Infinity,
  wrong types
- Boolean mutations: type confusion (string "true", integer 0/1, null)
- Array mutations: empty, null element, 1000-element, deeply nested, wrong types
- Object mutations: empty, extra field, deeply nested, wrong types
- Cross-cutting: empty args, null args, required field omissions (one per
  required field), extra unknown field injection
- Non-mutated fields filled with type-appropriate defaults

14 new tests — per-type coverage, required omissions, defaults, enum,
all-JSON-valid round-trip, combinatorial count guard.

91 tests passing total.

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
demo/:
- demo/servers/clean.json — 5-tool MCP filesystem server with clean descriptions
- demo/servers/poisoned.json — same server with 3 tools carrying live TPA payloads
  (Paradigm 1: SSH key exfiltration, Paradigm 2: silent cross-tool staging,
   Paradigm 3: persistent BCC injection + C2 beacon on all shell commands)
- demo/run.sh — end-to-end runnable demo; shows corpus list, clean scan (pass),
  poisoned scan (fail/block), and corpus record validation
- demo/github-actions.yml — drop-in GitHub Actions workflow for CI/CD integration
- demo/README.md — workflow integration guide with CLI usage, Makefile example,
  and tool definition export instructions

scan exit code:
- fuzzd scan now exits 1 if any critical or high findings are present,
  enabling fuzzd as a blocking CI gate with no extra configuration needed.
  Medium/low findings are reported but do not block.

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
…erns

Adds 11 new corpus records (TPA-013–017, TS-001–003, RUG-001–003) and 4 new
detection signals derived from the MCPTox, MCPSecBench, and Invariant Labs
research datasets. The embedded corpus grows from 12 to 23 records spanning
three attack categories.

New categories: ToolShadowing, RugPull
New signals: FakePrerequisite, ArgumentInterception, HtmlInjectionTag, ConditionalActivation
New patterns (24 added to description scanner):
  - MCPTox Template-2 fake-enabling-prerequisite language
  - MCPTox Template-3 argument-hijacking / suffix-to-every-command
  - Invariant Labs <IMPORTANT>/<SYSTEM>/<INST> XML tag injection
  - Rug-pull/sleeper conditional-activation and .mcp-triggered sentinel file

96 tests pass, clippy clean.

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
@ksek87 ksek87 changed the title feat(v0.2): corpus schema, loader, and 12 seed attack records feat: v0.2–v0.5 — corpus, description scanner, argument fuzzer, demo, MCPTox/MCPSecBench corpus expansion May 14, 2026
claude added 3 commits May 14, 2026 00:49
…dataset

Benchmark (bench/):
- bench/mcptox_actual.json: 485 real attack payloads from MCPTox-Benchmark
  dataset (Wang et al., 2025) across 45 real-world MCP servers
- bench/mcptox_representative.json: 44 fuzzd-native representative cases
- bench/clean_tools.json: 20 clean tools for false-positive measurement
- bench/run.sh: detection-rate report script (per-paradigm breakdown)
- bench/README.md: methodology, results, per-risk-category breakdown

Scanner improvements:
- Added 16 new patterns from MCPTox gap analysis:
  "priority is higher than", "higher than the user", "you should first",
  "before use the tool", "always override", "always replace", "always rewrite",
  "modify the prompt/query/pattern", "rewrite query", "override the argument",
  "gcloud credentials", "oauth2_credentials", "/.gcloud/",
  "not return results until", "will not process"
- Pattern count: 70 → 86 across 11 signals

Results on actual MCPTox dataset:
- 399/485 (82.3%) detection rate — up from 0% before scanner existed
- 0/20 false positives on clean tools
- Per-paradigm: T1 77.9%, T2 79.8%, T3 85.8%

README updated: benchmark moved to lead section, architecture reflects current state.

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
Performance:
- Replace O(N*M) linear pattern scan with O(M) Aho-Corasick automaton
  (N=86 patterns, M=description length)
- Automaton built once via OnceLock<AhoCorasick>, shared across all scans
- ASCII case-insensitive matching eliminates to_lowercase() per description
- extract_snippet now takes (start, end) byte positions from the match
  instead of re-searching the string (eliminates second O(M) scan per finding)

Code quality:
- Each pattern fires at most once per tool (HashSet deduplication of pattern index)
- extract_snippet made pub(crate) for potential reuse by future scanners

TDD invariant tests (7 new — written before implementation):
- pattern_needles_are_lowercase_and_non_empty: automaton invariant guard
- each_pattern_needle_detectable_by_scanner: catches stale/dead patterns
- scan_is_deterministic: same input → identical output
- scan_result_unaffected_by_adjacent_tools: tool isolation
- matched_text_is_non_empty_for_every_finding: snippet correctness
- each_tool_finding_carries_correct_tool_name: attribution correctness
- no_duplicate_patterns_per_tool: at-most-once-per-pattern invariant

103 tests pass, clippy clean, rustfmt clean.

https://claude.ai/code/session_014T1x8ZiDbJcVvkZBfP91nk
@ksek87 ksek87 merged commit 5c33b0a into main May 14, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[v0.2] AttackRecord corpus schema (corpus/schema.rs)

2 participants