
Commit 71f4165

Shrey Modi committed swe-bench (1 parent: 1e868cd)

7 files changed (+1451, -0 lines)

examples/swebench/README.md

Lines changed: 57 additions & 0 deletions

SWE-bench (Remote) - Local (non-Docker) Setup and Usage

Prerequisites
- Python 3.12 environment (the same one you use for this repo)
- Fireworks API key
- mini-swe-agent and datasets (for patch generation)
- SWE-bench harness installed (for evaluation)
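
Before starting, you can sanity-check the Python version and the Python packages from the list above with a small shell helper. This is a sketch, not part of the repo: `python_is_312` and `check_prereqs` are hypothetical names, and the import names `minisweagent`, `datasets`, and `swebench` are assumed to be the modules the listed packages install.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers (not part of this repo).

# Succeeds if the given "major.minor[.patch]" version string is 3.12.x.
python_is_312() {
  case "$1" in
    3.12|3.12.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Warns about a non-3.12 interpreter and about Python modules that cannot be
# imported; returns non-zero if anything looks missing. Installs nothing.
check_prereqs() {
  local status=0 version module
  version="$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])' 2>/dev/null)"
  if ! python_is_312 "$version"; then
    echo "warning: expected Python 3.12, found '${version:-none}'" >&2
    status=1
  fi
  # Import names are assumptions about what each package installs.
  for module in minisweagent datasets swebench; do
    if ! python3 -c "import $module" 2>/dev/null; then
      echo "warning: Python module '$module' is not importable" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_prereqs && echo "prerequisites look OK"`. The helper only reports problems; fixing them is still up to the setup steps.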

Set up mini-swe-agent (non-Docker)
1) Install dependencies
```bash
pip install mini-swe-agent datasets
```

2) Configure the API key for mini-swe-agent
```bash
mini-extra config set FIREWORKS_API_KEY <your_fireworks_key>
```

3) (Optional) Test connectivity
```bash
python3 examples/swebench/run_swe_agent_fw.py fireworks_ai/accounts/fireworks/models/kimi-k2-instruct-0905 --test
```

Install the SWE-bench evaluation harness
```bash
git clone https://github.com/princeton-nlp/SWE-bench
pip install -e SWE-bench
```

Environment
```bash
export FIREWORKS_API_KEY="<your_fireworks_key>"
```
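
Since the key is expected in the environment before the server starts, a fail-fast guard can save a wasted run. A minimal sketch, assuming bash: `require_fireworks_key` is a hypothetical helper, and rejecting the literal placeholder from this README is an added convenience, not behavior the server is known to have.

```bash
# Hypothetical guard (not part of this repo): fail fast if FIREWORKS_API_KEY
# is missing, empty, or still the placeholder text from this README.
require_fireworks_key() {
  if [ -z "${FIREWORKS_API_KEY:-}" ]; then
    echo "error: FIREWORKS_API_KEY is not set; export it first" >&2
    return 1
  fi
  # Catch the placeholder copied verbatim from the Environment step.
  if [ "$FIREWORKS_API_KEY" = "<your_fireworks_key>" ]; then
    echo "error: FIREWORKS_API_KEY still holds the README placeholder" >&2
    return 1
  fi
}
```

Usage: `require_fireworks_key && python examples/swebench/server.py`.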

Run the server
```bash
python examples/swebench/server.py
```

What the server does
- Invokes `run_swe_agent_fw.py` in batch mode with a single-instance slice per request
- Writes outputs to a per-row directory, `./row_{index}/`, containing:
  - `row_{index}/preds.json`
  - `row_{index}/<instance_id>/<instance_id>.traj.json`
- Runs the SWE-bench harness on `row_{index}/preds.json`
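
To inspect what a single request produced, the per-row layout above can be checked from the shell. A sketch that assumes the layout is exactly as listed; `summarize_row` is a hypothetical helper that prints the `preds.json` path and every `<instance_id>/<instance_id>.traj.json` trajectory it finds.

```bash
# Hypothetical helper (not part of this repo): summarize one per-row output
# directory laid out as
#   row_{index}/preds.json
#   row_{index}/<instance_id>/<instance_id>.traj.json
summarize_row() {
  local row_dir="$1" traj count=0
  if [ -f "$row_dir/preds.json" ]; then
    echo "preds: $row_dir/preds.json"
  else
    echo "preds: missing"
  fi
  for traj in "$row_dir"/*/*.traj.json; do
    [ -e "$traj" ] || continue   # the glob matched nothing
    # Count only files named after their parent directory (<id>/<id>.traj.json).
    if [ "$(basename "$traj" .traj.json)" = "$(basename "$(dirname "$traj")")" ]; then
      echo "traj: $traj"
      count=$((count + 1))
    fi
  done
  echo "trajectories: $count"
}
```

For example, `summarize_row ./row_0` after the first request completes.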

Run pytest to evaluate a model on SWE-bench
```bash
cd /path/to/python-sdk  # repository root
pytest examples/swebench/tests/test_swebench.py -v -s
```

Notes
- The test currently generates 10 rows by numeric index (0–9).
- Each request triggers the server to run one SWE-bench instance and write its outputs to its own `row_{index}/` directory.
- Control the number of harness workers with `SWEBENCH_EVAL_WORKERS`, e.g. `export SWEBENCH_EVAL_WORKERS=5`.
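
After a pytest run, you can confirm that each of the 10 rows wrote its predictions. A minimal sketch, assuming the `row_{index}` directories land under the directory you pass in (default: the current directory); `missing_rows` is a hypothetical helper, and it only checks for `preds.json`, not for harness results.

```bash
# Hypothetical post-run check (not part of this repo): report which of the
# rows 0-9 generated by the test lack row_{index}/preds.json.
missing_rows() {
  local base="${1:-.}" i missing=""
  for ((i = 0; i < 10; i++)); do
    [ -f "$base/row_$i/preds.json" ] || missing="$missing $i"
  done
  if [ -n "$missing" ]; then
    echo "rows missing preds.json:$missing"
    return 1
  fi
  echo "all 10 rows have preds.json"
}
```

`missing_rows .` prints the indices of incomplete rows and returns non-zero, so it can gate a follow-up script.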
