The minimal AI software engineering agent

📣 Run mini-swe-agent on our new & extremely challenging benchmark, ProgramBench
📣 New tutorial on building minimal AI agents
📣 Gemini 3 Pro reaches 74% on SWE-bench verified with mini-swe-agent!
📣 New blogpost: Randomly switching between GPT-5 and Sonnet 4 boosts performance

Warning

This is mini-swe-agent v2. Read the migration guide. For the previous version, check out the v1 branch.

In 2024, we built SWE-bench & SWE-agent and helped kickstart the coding agent revolution.

We now ask: What if our agent was 100x simpler, and still worked nearly as well?

mini is

Widely adopted: Used by Meta, NVIDIA, Essential AI, IBM, Nebius, Anyscale, Princeton University, Stanford University, and many more.
Minimal: Only about 100 lines of Python for the agent class (and a bit more for the environment, model, and run script), with no fancy dependencies!
Performant: Scores >74% on the SWE-bench verified benchmark; starts much faster than Claude Code
Deployable: Supports local environments, docker/podman, singularity/apptainer, bubblewrap, contree, and more
Compatible: Supports all models via litellm, openrouter, portkey, and more. Support for /completion and /response endpoints, interleaved thinking etc.
Built by the Princeton & Stanford team behind SWE-bench, SWE-agent, and more

More motivation (for research)

SWE-agent jump-started the development of AI agents in 2024. Back then, we placed a lot of emphasis on tools and special interfaces for the agent. However, one year later, as LMs have become more capable, a lot of this is not needed at all to build a useful agent! In fact, the mini agent

Does not have any tools other than bash — it doesn't even need to use the tool-calling interface of the LMs. This means that you can run it with literally any model. When running in sandboxed environments you also don't need to take care of installing a single package — all it needs is bash.
Has a completely linear history — every step of the agent just appends to the messages and that's it. So there's no difference between the trajectory and the messages that you pass on to the LM. Great for debugging & fine-tuning.
Executes actions with subprocess.run — every action is completely independent (as opposed to keeping a stateful shell session running). This makes it trivial to execute the actions in sandboxes (literally just switch out subprocess.run with docker exec) and to scale up effortlessly. Seriously, this is a big deal, trust me.
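
The three properties above fit in a few lines. The following is a minimal sketch, not the actual mini-swe-agent source: `parse_action`, `run_agent`, and `scripted_model` are hypothetical names, the fenced-bash reply format is an assumption, and the scripted model stands in for a real LM so the example runs offline.

```python
import re
import subprocess

FENCE = "`" * 3  # built this way only to keep the example easy to embed


def parse_action(reply):
    """Return the first bash-fenced command in the reply, or None."""
    pattern = re.escape(FENCE) + r"bash\s*\n(.*?)\n" + re.escape(FENCE)
    match = re.search(pattern, reply, re.DOTALL)
    return match.group(1).strip() if match else None


def run_agent(task, query_model, max_steps=10):
    """Linear loop: query the model, run one bash action, append the output."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = query_model(messages)
        messages.append({"role": "assistant", "content": reply})
        action = parse_action(reply)
        if action is None:  # no command in the reply: treat as finished
            break
        # Every action is an independent subprocess; no shell session is kept.
        result = subprocess.run(
            action, shell=True, capture_output=True, text=True, timeout=60
        )
        observation = f"returncode: {result.returncode}\n{result.stdout}{result.stderr}"
        messages.append({"role": "user", "content": observation})
    return messages  # the trajectory is exactly the message list


def scripted_model(messages):
    """Hypothetical stand-in for an LM: one command, then stop."""
    if len(messages) == 1:
        return "Running.\n" + FENCE + "bash\necho hello\n" + FENCE
    return "Done."


history = run_agent("Say hello", scripted_model)
for message in history:
    print(message["role"], "->", message["content"].strip())
```

Because the returned list is both the trajectory and the prompt history, it can be dumped to JSON for debugging or reused as a fine-tuning example without any conversion step.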

This makes it a perfect baseline system, and a good fit for any setup that puts the language model (rather than the agent scaffold) at the center of attention. You can see the results on the SWE-bench (bash only) leaderboard, which evaluates the performance of different LMs with mini.

More motivation (as a tool)

Some agents are overfitted research artifacts. Others are UI-heavy frontend monsters.

The mini agent wants to be a hackable tool, not a black box.

Simple enough to understand at a glance
Convenient enough to use in daily workflows
Flexible to extend

Compared to other agents (including our own swe-agent), it is radically simpler because it:

Does not have any tools other than bash — it doesn't even need to use the tool-calling interface of the LMs. Instead of implementing custom tools for every specific thing the agent might want to do, the focus is fully on the LM utilizing the shell to its full potential. Want it to do something specific like opening a PR? Just tell the LM to figure it out rather than spending time to implement it in the agent.
Executes actions with subprocess.run — every action is completely independent (as opposed to keeping a stateful shell session running). This is a big deal for the stability of the agent, trust me.
Has a completely linear history — every step of the agent just appends to the messages that are passed to the LM in the next step and that's it. This is great for debugging and understanding what the LM is prompted with.
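
One practical consequence of independent actions is that shell state does not carry over between steps. The sketch below illustrates this with plain `subprocess.run`; `run_action` and the variable name `MINI_DEMO_VAR` are hypothetical, and a POSIX shell is assumed.

```python
import subprocess


def run_action(command, cwd=None):
    """Run one command in a fresh shell and return its stripped stdout."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, cwd=cwd
    )
    return result.stdout.strip()


# A variable exported in one action is gone in the next one.
run_action("export MINI_DEMO_VAR=hi")
print(run_action("echo ${MINI_DEMO_VAR:-unset}"))  # prints "unset"

# State has to travel inside a single action instead.
print(run_action("export MINI_DEMO_VAR=hi && echo $MINI_DEMO_VAR"))  # prints "hi"
```

Since no step can leave a shell session in a broken state, a failed or hung command affects only that step, and the LM simply chains commands with `&&` when it needs shared state.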

Should I use SWE-agent or mini-SWE-agent?

You should consider mini-swe-agent your default choice. In particular, you should use mini-swe-agent if

You want a quick command line tool that works locally
You want an agent with a very simple control flow
You want even faster, simpler & more stable sandboxing & benchmark evaluations
You are doing fine-tuning (FT) or reinforcement learning (RL) and don't want to overfit to a specific agent scaffold

You should use swe-agent if

You want to experiment with different sets of tools, each with their own interface
You want to experiment with different history processors

What you get with both

Excellent performance on SWE-bench
A trajectory browser

CLI (`mini`)	Batch inference	Trajectory browser	Python bindings

Python bindings example:

agent = DefaultAgent(
    LitellmModel(model_name=...),
    LocalEnvironment(),
)
agent.run("Write a sudoku game")

Let's get started!

Option 1: If you just want to try out the CLI (the package is installed in an anonymous virtual environment)

pip install uv && uvx mini-swe-agent
# or
pip install pipx && pipx ensurepath && pipx run mini-swe-agent

Option 2: Install CLI & python bindings in current environment

pip install mini-swe-agent
mini  # run the CLI

Option 3: Install from source (developer setup)

git clone https://github.com/SWE-agent/mini-swe-agent.git
cd mini-swe-agent && pip install -e .
mini  # run the CLI

Verified source quick start with Claude Opus 4.8

The following flow is the fastest way to start this fork from source, verify that your Anthropic key works, and run a real end-to-end mini agent loop.

Important

Do not commit API keys. Export them in your shell or keep them in a local .env file outside the repository.

git clone https://github.com/benchflow-ai/mini-swe-agent.git
cd mini-swe-agent

# Create an isolated developer environment.
uv venv .venv
source .venv/bin/activate
uv pip install -e ".[opencode,dev]"

Set your Anthropic API key for the current shell:

export ANTHROPIC_API_KEY="<your-anthropic-api-key>"

First, verify the key and model with a tiny direct LiteLLM request:

python - <<'PY'
from litellm import completion

model = "anthropic/claude-opus-4-8"
response = completion(
    model=model,
    messages=[{"role": "user", "content": "Please answer exactly: startup ok"}],
    max_tokens=32,
)
print("model:", model)
print("answer:", response.choices[0].message.content.strip())
PY

Then run a real mini end-to-end smoke test. This exercises the full path: CLI -> config loading -> LiteLLM -> model tool call -> local bash execution -> trajectory save.

MSWEA_MODEL_RETRY_STOP_AFTER_ATTEMPT=1 \
MSWEA_COST_TRACKING=ignore_errors \
mini -y --exit-immediately \
  -m anthropic/claude-opus-4-8 \
  -c mini.yaml \
  -c model.model_kwargs.max_tokens=1024 \
  -t 'This is an end-to-end smoke test. First run exactly this command: echo mini_e2e_ok. After observing it succeeds, finish by issuing exactly this command and nothing else: echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT.' \
  -o /tmp/mini-swe-agent-opus48-e2e.traj.json

A successful run prints mini_e2e_ok, then exits after COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT, and saves the trajectory to /tmp/mini-swe-agent-opus48-e2e.traj.json.
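
To check what was saved, a short script can summarize the trajectory file. This is a sketch under assumptions: it assumes the file is a JSON object with a `messages` list whose entries carry a `role` key, which this section does not confirm, and `summarize_trajectory` is a hypothetical helper. It falls back to empty results if the layout differs.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_trajectory(path):
    """Summarize a saved trajectory (assumed layout: {"messages": [...]})."""
    data = json.loads(Path(path).read_text())
    if not isinstance(data, dict):
        return {"top_level_keys": [], "message_count": 0, "roles": {}}
    messages = data.get("messages", [])
    if not isinstance(messages, list):
        messages = []
    # Count messages per role; entries without a role are reported as "unknown".
    roles = Counter(
        m.get("role", "unknown") for m in messages if isinstance(m, dict)
    )
    return {
        "top_level_keys": sorted(data),
        "message_count": len(messages),
        "roles": dict(roles),
    }


if __name__ == "__main__":
    target = Path("/tmp/mini-swe-agent-opus48-e2e.traj.json")
    if target.exists():
        print(summarize_trajectory(target))
    else:
        print(f"{target} not found; run the smoke test first")
```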

Useful local checks:

MSWEA_SILENT_STARTUP=1 pytest -q \
  tests/models tests/agents tests/config tests/utils \
  tests/run/test_batch_progress.py tests/run/test_inspector.py

MSWEA_SILENT_STARTUP=1 pytest -q \
  tests/models/test_init.py tests/run/test_run_hello_world.py \
  tests/run/test_local.py tests/run/test_save.py

MSWEA_SILENT_STARTUP=1 ruff check src tests

If you see invalid x-api-key, your shell is using an invalid or stale ANTHROPIC_API_KEY; export a valid key again in the same shell. If you see LiteLLM cost metadata errors for a newly released model, keep MSWEA_COST_TRACKING=ignore_errors for the smoke test or add model pricing to a LiteLLM registry file.
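
Before spending a request on the smoke test, a small preflight check can catch the most common key problems. This is an illustrative sketch, not part of mini-swe-agent: `check_api_key` is a hypothetical helper, and it can only detect a missing, padded, or placeholder value, not whether the provider accepts the key.

```python
import os

PLACEHOLDER = "<your-anthropic-api-key>"  # the placeholder used in this guide


def check_api_key(env, name="ANTHROPIC_API_KEY"):
    """Return a problem description, or None if the value looks usable."""
    value = env.get(name, "")
    if not value:
        return f"{name} is not set in this shell"
    if value != value.strip():
        return f"{name} has leading or trailing whitespace"
    if value == PLACEHOLDER:
        return f"{name} still holds the placeholder value"
    return None


if __name__ == "__main__":
    problem = check_api_key(os.environ)
    print(problem or "ANTHROPIC_API_KEY looks set; validity is checked by the API call")
```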

Run in the opencode TUI (this fork)

This fork can run mini-swe-agent behind opencode's terminal UI — self-contained: a prebuilt TUI binary is bundled, so no external opencode repo or bun is needed at runtime.

Set ANTHROPIC_API_KEY for Claude, or set the corresponding provider key such as OPENAI_API_KEY or GEMINI_API_KEY.

git clone https://github.com/benchflow-ai/mini-swe-agent.git
cd mini-swe-agent

uv venv .venv
source .venv/bin/activate
uv pip install -e ".[opencode]"

export ANTHROPIC_API_KEY="<your-anthropic-api-key>"
mkdir -p /tmp/mini-swe-agent-scratch
mini-opencode --attach --cwd /tmp/mini-swe-agent-scratch

This opens opencode's TUI in the same terminal. Pick any model, type a task, and the agent's bash steps render as native tool calls; errors show in the conversation. The agent runs commands locally without confirmation in --cwd, so point it at a scratch directory.

Note: the bundled binary targets macOS arm64; on other platforms, rebuild it once. Full details: docs/usage/opencode_tui.md.

Attribution

If you found this work helpful, please consider citing the SWE-agent paper in your work:

@inproceedings{yang2024sweagent,
  title={{SWE}-agent: Agent-Computer Interfaces Enable Automated Software Engineering},
  author={John Yang and Carlos E Jimenez and Alexander Wettig and Kilian Lieret and Shunyu Yao and Karthik R Narasimhan and Ofir Press},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://arxiv.org/abs/2405.15793}
}
