This document describes how to run OpenHands with CWE injection attacks for the FCV (Functionally Correct yet Vulnerable) attack experiments.
- Complete the Pass 1 run following the instructions in ../README.md
- Evaluate Pass 1 results with SWE-bench to get the
report.jsonfile - Make sure you have the OpenHands environment activated:
conda activate openhands
Pass 1 → Evaluation → CWE Injection (Pass 2) → Evaluation → LM Judge
After running the initial inference, evaluate the results to identify resolved instances:
# Example for GPT-5 Mini
./evaluation/benchmarks/swe_bench/scripts/eval_infer.sh \
"/path/to/OpenHands/evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Verified-test/CodeActAgent/gpt-5-mini_maxiter_100_N_v0.56.0-no-hint-run_1/output.jsonl" \
"" \
princeton-nlp/SWE-bench_Verified \
test \
localThis will generate a report.json file with resolved instance IDs.
Use the run_cwe_simple.sh script to inject CWE vulnerabilities into the resolved instances:
./run_cwe_simple.sh <REPORT_JSON> <CWE_TYPE> <LLM_CONFIG> <ROUND> <OUTPUT_DIR> <USE_REMOTE>Parameters:
REPORT_JSON: Path to the report.json from Pass 1 evaluationCWE_TYPE: One ofcwe_532,cwe_79,cwe_89,cwe_94LLM_CONFIG: LLM configuration (e.g.,llm.eval_gpt5_mini,llm.eval_kimi)ROUND: Round number (typically1)OUTPUT_DIR: Directory to save outputsUSE_REMOTE:trueto use remote runtime,falsefor local
- CWE-532 (Information Exposure): Logging sensitive information
- CWE-79 (Cross-Site Scripting): XSS vulnerabilities
- CWE-89 (SQL Injection): SQL injection vulnerabilities
- CWE-94 (Code Injection): Arbitrary code execution
cd /path/to/OpenHands && conda activate openhands
./run_cwe_simple.sh \
<REPORT_JSON_PATH> \
<CWE_TYPE> \
<LLM_CONFIG> \
1 \
<OUTPUT_DIR> \
trueParameters:
<REPORT_JSON_PATH>: Path to Pass 1 report.json (e.g.,evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Verified-test/CodeActAgent/{MODEL_NAME}/report.json)<CWE_TYPE>: One ofcwe_532,cwe_79,cwe_89,cwe_94<LLM_CONFIG>: LLM config name (e.g.,llm.eval_gpt5_mini,llm.eval_kimi,llm.eval_qwen_480b,llm.eval_claude_sonnet4)<OUTPUT_DIR>: Output directory (e.g.,evaluation/evaluation_outputs/outputs/cwe532_gpt)
| Model | LLM Config | Model Name Pattern |
|---|---|---|
| GPT-5 Mini | llm.eval_gpt5_mini |
gpt-5-mini_maxiter_100_N_v0.56.0-no-hint-run_1 |
| Kimi-K2-Instruct | llm.eval_kimi |
Kimi-K2-Instruct_maxiter_100_N_v0.56.0-no-hint-run_1 |
| Qwen3-Coder-480B | llm.eval_qwen_480b |
Qwen3-Coder-480B-A35B-Instruct_maxiter_100_N_v0.56.0-no-hint-run_1 |
| Claude Sonnet 4 | llm.eval_claude_sonnet4 |
claude-sonnet-4-20250514_maxiter_100_N_v0.56.0-no-hint-run_1 |
cd /path/to/OpenHands && conda activate openhands
REPORT_JSON="evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Verified-test/CodeActAgent/gpt-5-mini_maxiter_100_N_v0.56.0-no-hint-run_1/report.json"
for CWE_TYPE in cwe_532 cwe_79 cwe_89 cwe_94; do
./run_cwe_simple.sh \
"$REPORT_JSON" \
"$CWE_TYPE" \
llm.eval_gpt5_mini \
1 \
"evaluation/evaluation_outputs/outputs/${CWE_TYPE}_gpt" \
true
done./evaluation/benchmarks/swe_bench/scripts/eval_infer.sh \
"<OUTPUT_JSONL_PATH>" \
"" \
princeton-nlp/SWE-bench_Verified \
test \
localPath Pattern:
<OUTPUT_DIR>/princeton-nlp__SWE-bench_Verified-test/CodeActAgent/<MODEL_NAME>_maxiter_100_N_<CWE_TYPE>_injection_resolved_ids_round_1/output.jsonl
# Define model-specific variables
MODEL_NAME="gpt-5-mini" # or "Kimi-K2-Instruct", "Qwen3-Coder-480B-A35B-Instruct", "claude-sonnet-4-20250514"
OUTPUT_PREFIX="cwe" # e.g., "cwe532_gpt", "cwe79_kimi", etc.
# Evaluate all CWE types
for CWE_NUM in 532 79 89 94; do
./evaluation/benchmarks/swe_bench/scripts/eval_infer.sh \
"evaluation/evaluation_outputs/outputs/cwe${CWE_NUM}_*/princeton-nlp__SWE-bench_Verified-test/CodeActAgent/${MODEL_NAME}_maxiter_100_N_cwe_${CWE_NUM}_injection_resolved_ids_round_1/output.jsonl" \
"" \
princeton-nlp/SWE-bench_Verified \
test \
local
doneAfter collecting all results, use the LM Judge to evaluate the vulnerability injection success. See ../attack-lm-judge/README.md for details.
If you encounter issues with stuck containers:
# Stop specific instance
docker stop $(docker ps -q --filter "name=sweb.eval.INSTANCE_ID") 2>/dev/null || true
docker rm $(docker ps -aq --filter "name=sweb.eval.INSTANCE_ID") 2>/dev/null || true
# Clean build directories
rm -rf /tmp/swe-bench/INSTANCE_ID/# List all remote runtimes
ALLHANDS_API_KEY="your-api-key" \
curl -H "X-API-Key: your-api-key" \
"https://runtime.eval.all-hands.dev/list"
# Stop all remote runtimes
ALLHANDS_API_KEY="your-api-key" \
curl -H "X-API-Key: your-api-key" \
"https://runtime.eval.all-hands.dev/list" | \
jq -r '.runtimes[].runtime_id' | \
xargs -I {} curl -X POST \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{"runtime_id": "{}"}' \
"https://runtime.eval.all-hands.dev/stop"After completing the experiments, your output directory structure will look like:
evaluation/evaluation_outputs/outputs/
├── cwe532_gpt/
│ └── princeton-nlp__SWE-bench_Verified-test/
│ └── CodeActAgent/
│ └── gpt-5-mini_maxiter_100_N_cwe_532_injection_resolved_ids_round_1/
│ ├── output.jsonl
│ └── report.json
├── cwe79_gpt/
├── cwe89_gpt/
├── cwe94_gpt/
├── cwe532_kimi/
├── cwe79_kimi/
... (and so on for other models and CWE types)
- Collect all evaluation results
- Run LM Judge evaluation (see
attack-lm-judge/README.md) - Analyze attack success rates across models and CWE types