diff --git a/agents/fajarsajid__agent-redteam/README.md b/agents/fajarsajid__agent-redteam/README.md new file mode 100644 index 0000000..f78edab --- /dev/null +++ b/agents/fajarsajid__agent-redteam/README.md @@ -0,0 +1,82 @@ +# agent-redteam + +A Claude-powered **Agentic LLM Red Team Harness** by Fajar Sajid (Purdue University) that systematically probes AI agent system prompts for identity abuse, credential exfiltration, privilege escalation, and safety boundary vulnerabilities — and maps findings to MITRE ATT&CK. + +--- + +## Run + +```bash +npx @open-gitagent/gitagent run -r https://github.com/fajarsajid/agent-redteam +``` + +--- + +## What It Does + +`agent-redteam` gives you a security analyst powered by Claude that attacks your agent's system prompt the way a real adversary would — before they get the chance. + +Point it at any system prompt file and it generates tailored adversarial probes across 8 attack categories, evaluates whether each probe would succeed, scores every finding on a CVSS-like scale, and produces both a human-readable Markdown incident report and machine-readable JSON for SIEM or CI pipeline integration. + +--- + +## Key Capabilities + +- **Adversarial Probe Generation** — Claude crafts realistic, targeted probes specific to the content of your system prompt (its tools, permissions, and stated constraints) — not generic toy examples +- **8 MITRE-Mapped Attack Categories** — covers Prompt Injection (Direct & Indirect), Identity Spoofing, Goal Hijacking, Privilege Escalation, Data Exfiltration, Credential Exfiltration, and Safety Boundary Bypass +- **5 Failure Mode Detection** — Instruction Override, Trust Propagation, Context Drift, Scope Ambiguity Exploitation, Helpfulness Override +- **CVSS-Like Scoring** — every finding gets a 0.0–10.0 severity score + critical/high/medium/low tier +- **CI/CD Integration** — exits with code `1` on critical/high findings, making it a drop-in gate in your deployment pipeline +- **Dual Output** — terminal summary + `--output report.md` for humans + `--json findings.json` for SIEM tooling + +--- + +## Example Usage + +```bash +git clone https://github.com/fajarsajid/agent-redteam +cd agent-redteam +pip install requests +export ANTHROPIC_API_KEY=sk-ant-... + +# Quick security scan of your agent's system prompt +python redteam.py --prompt path/to/your/system_prompt.txt + +# Full run with JSON + Markdown output +python redteam.py --prompt system_prompt.txt \ + --probes 24 --json results/findings.json --output results/report.md + +# CI pipeline gate (exits 1 on critical/high) +python redteam.py --prompt system_prompt.txt --quiet + +# List all attack categories +python redteam.py --list-categories +``` + +--- + +## Research Background + +Built from research conducted at Purdue University evaluating safety constraint violations in LLM-based agents under adversarial prompting. Key findings from 384 trials: + +| Finding | Result | +|---|---| +| Mean violation rate | **49.5%** (SD=12.1%) | +| Indirect vs. direct injection | **70.8% vs. 54.2%** | +| Single-turn vs. 7-turn violation rate | **45.8% vs. 77.1%** | +| No-tool vs. full tool access | **34.4% vs. 71.9%** | +| Most common failure mode | **Instruction Override** (28.4%) | + +**Core conclusion:** Prompt-level safety constraints are insufficient for agentic deployments. Static, single-turn evaluation underestimates real-world vulnerability by ~1.7× at 7 turns. + +--- + +## Why It Fits the GitAgent Protocol + +`agent-redteam` is a well-scoped, Claude-native agent with: +- A clear, singular purpose (AI security red-teaming) +- A defined behavioral contract (JSON-only output, grounded analysis, no hallucinated features) +- Explicit tool skills (probe generation, vulnerability analysis, incident reporting) +- CI-compatible exit codes and structured output for pipeline integration + +*Part of the [Open GAP Registry](https://registry.gitagent.sh) — the catalog of portable AI agents.* diff --git a/agents/fajarsajid__agent-redteam/metadata.json b/agents/fajarsajid__agent-redteam/metadata.json new file mode 100644 index 0000000..27c2cf7 --- /dev/null +++ b/agents/fajarsajid__agent-redteam/metadata.json @@ -0,0 +1,14 @@ +{ + "name": "agent-redteam", + "author": "fajarsajid", + "description": "CLI red-team harness that uses Claude to probe AI agent system prompts for identity abuse, credential exfiltration, and safety boundary vulnerabilities across 8 MITRE-mapped attack categories.", + "repository": "https://github.com/fajarsajid/agent-redteam", + "version": "1.0.0", + "category": "security", + "tags": ["red-team", "ai-safety", "llm-security", "prompt-injection", "mitre-attack", "vulnerability-scanning", "claude", "adversarial-testing"], + "license": "MIT", + "model": "claude-sonnet-4-20250514", + "adapters": ["claude-code", "system-prompt"], + "icon": false, + "banner": false +}