| 1 | +SWE-bench (Remote) - Local (non-Docker) Setup and Usage |
| 2 | + |
| 3 | +Prerequisites |
| 4 | +- Python 3.12 environment (same one you use for this repo) |
| 5 | +- Fireworks API key |
| 6 | +- mini-swe-agent and datasets (for patch generation) |
| 7 | +- SWE-bench harness installed (for evaluation) |
| 8 | + |
| 9 | +Setup mini-swe-agent (non-Docker) |
| 10 | +1) Install dependencies |
| 11 | +```bash |
| 12 | +pip install mini-swe-agent datasets |
| 13 | +``` |
| 14 | + |
| 15 | +2) Configure API key for mini-swe-agent |
| 16 | +```bash |
| 17 | +mini-extra config set FIREWORKS_API_KEY <your_fireworks_key> |
| 18 | +``` |
| 19 | + |
| 20 | +3) (Optional) Test connectivity |
| 21 | +```bash |
| 22 | +python3 examples/swebench/run_swe_agent_fw.py fireworks_ai/accounts/fireworks/models/kimi-k2-instruct-0905 --test |
| 23 | +``` |
| 24 | + |
| 25 | +Install SWE-bench evaluation harness |
| 26 | +```bash |
| 27 | +git clone https://github.com/princeton-nlp/SWE-bench |
| 28 | +pip install -e SWE-bench |
| 29 | +``` |
| 30 | + |
| 31 | +Environment |
| 32 | +```bash |
| 33 | +export FIREWORKS_API_KEY="<your_fireworks_key>" |
| 34 | +``` |
| 35 | + |
| 36 | +Run the server |
| 37 | +```bash |
| 38 | +python examples/swebench/server.py |
| 39 | +``` |
| 40 | + |
| 41 | +What the server does |
| 42 | +- Invokes `run_swe_agent_fw.py` in batch mode on a one-instance slice per request |
| 43 | +- Writes outputs to a per-row directory: `./row_{index}/` |
| 44 | + - `row_{index}/preds.json` |
| 45 | + - `row_{index}/<instance_id>/<instance_id>.traj.json` |
| 46 | +- Runs the SWE-bench harness on `row_{index}/preds.json` |
| 47 | + |
| 48 | +Run pytest to evaluate a model on SWE-bench |
| 49 | +```bash |
| 50 | +cd <path_to_your_python-sdk_checkout> |
| 51 | +pytest examples/swebench/tests/test_swebench.py -v -s |
| 52 | +``` |
| 53 | + |
| 54 | +Notes |
| 55 | +- The test currently generates 10 rows by numeric index (0–9) |
| 56 | +- Each request triggers the server to run one SWE-bench instance and write its outputs to that row's own `row_{index}/` directory |
| 57 | +- Control harness workers via: `export SWEBENCH_EVAL_WORKERS=5` |
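
Since each request writes to its own `row_{index}/` directory, it can help to gather the per-row `preds.json` files after a run. The following is a minimal sketch, not part of this change. It assumes each `preds.json` is a JSON object keyed by instance ID (the README does not state the file's schema), and `collect_predictions` is a hypothetical helper name.

```python
import json
from pathlib import Path


def collect_predictions(root: str = ".") -> dict:
    """Merge every row_<index>/preds.json found under `root` into one dict."""

    def row_index(path: Path) -> int:
        # Sort numerically so that row_10 comes after row_9.
        suffix = path.name[len("row_"):]
        return int(suffix) if suffix.isdigit() else -1

    merged = {}
    row_dirs = [p for p in Path(root).glob("row_*") if p.is_dir()]
    for row_dir in sorted(row_dirs, key=row_index):
        preds_path = row_dir / "preds.json"
        if not preds_path.is_file():
            continue  # this row has not produced predictions (yet)
        with preds_path.open() as fh:
            preds = json.load(fh)
        # Assumption: the file holds a JSON object keyed by instance ID.
        if isinstance(preds, dict):
            merged.update(preds)
    return merged


if __name__ == "__main__":
    predictions = collect_predictions(".")
    print(f"{len(predictions)} prediction(s) found")
    for instance_id in predictions:
        print(instance_id)
```

Run it from the directory where the server wrote its `row_{index}/` folders. Rows without a `preds.json` are skipped rather than treated as errors.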