A ReAct coding agent with strict tool gating, pre-call cost projection, and persistent project memory.
Runs Claude, GPT, Gemini, or a local Ollama model. Sessions, cost ledger, and project memory stay on your machine. Hosted-provider API calls send your code to that provider — only Ollama keeps it fully local.
mAIke is a personal research and learning project — an end-to-end coding agent built to explore specific design questions: pre-call cost projection, risk-tiered tool gating, persistent auto-memory, multi-provider routing, and what a cheap-executor + frontier-advisor pairing can or can't do.
It is not trying to compete with Claude Code, Aider, OpenHands, or Devin. The repo is open so the design and the in-progress eval data can be inspected, discussed, copied, and argued with. If something here is useful in your own work, take it.
mAIke explores three ideas: every API call should be priced before it fires (not reconciled after); every tool that mutates state should pass through an explicit risk gate before it runs; and the project knowledge an agent earns in one session should carry into the next without re-exploration. The agent process runs on your machine and treats provider choice as a per-session decision; with Ollama, the entire loop stays on-device. Everything else — sub-agents, the eval harness — falls out of those questions.
- Multi-provider — Anthropic, OpenAI, Gemini (direct API or Vertex), and any Ollama model. Configured per session via
--provider/--model. - Pre-call cost projection — every API call is priced before it fires. Sessions abort gracefully at 95% of your budget instead of blowing past it.
- Tool risk gating — every tool is tiered
READ → WRITE → EXECUTE → DESTRUCTIVE. Writes need a git checkpoint; execution needs inline approval. - Read-before-edit enforcement — the Edit tool refuses to run on a file the agent hasn't Read this turn, killing a whole class of cascading edit bugs.
- Auto-memory — at session end, mAIke distills project overview / key decisions / resolved pitfalls into
.maike/memories/. The next session starts with that knowledge already in context. - Async sub-agents — spawn read-only delegates that run in parallel; the main agent collects results with
Delegate(action="check"|"wait"). - SWE-bench built in —
maike swe-bench --variant lite|verified|fullruns on real instances with clone caching and resume support. - Threads & worktrees — multiple concurrent conversations per workspace, isolated git worktrees for branch work.
- Extensible — Skills, Plugins, MCP servers, and LSP language servers as first-class surfaces.
Early in-progress eval data, not a leaderboard claim. Cold-start eval against SWE-bench Verified using the official Docker harness running hidden FAIL_TO_PASS tests. Stratified 48-instance subset of the 500-instance benchmark (April 2026, models: gemini-3.1-pro-preview, gemini-3.1-flash-lite-preview).
| Configuration | Resolved on subset | Total cost |
|---|---|---|
| Gemini 3.1 Pro alone | 25/48 (52.1%) | $62.93 |
| Gemini 3.1 Flash Lite alone | 9/48 (18.8%) | $8.25 |
| Bucket | n | Pro | Flash Lite |
|---|---|---|---|
Easy: <15min + 15min–1h |
36 | 24/36 (66.7%) | 9/36 (25.0%) |
Hard-tail: 1–4h + >4h |
12 | 1/12 (8.3%) | 0/12 (0%) |
Hard-tail instances make up 25% of this subset but only ~9% of full Verified, so the subset is biased toward harder cases.
- They are: a directional, in-progress signal that the agent + harness produce real (cold-start, FAIL_TO_PASS-verified) fixes on a stratified 48-instance subset.
- They are not: a SWE-bench Verified leaderboard score. A full 500-instance run is on the roadmap; until then, treat as early data.
- mAIke is alpha, single-engineer development. Behaviour, costs, and surface area are still moving.
- Numbers are mAIke alone. The advisor pairing mode (see below) has a known regression and is excluded from this run.
# Generate predictions
maike swe-bench --variant verified \
--provider gemini --model gemini-3.1-pro-preview \
--budget 5.0 --timeout 1800 \
--output predictions.jsonl
# Score with the official SWE-bench Docker harness
python -m swebench.harness.run_evaluation \
--predictions_path predictions.jsonl \
--dataset_name SWE-bench/SWE-bench_Verified \
--run_id maike-pro-verifiedThis is a personal research project, not a product. The CLI is stable enough for me to use day-to-day; it is not designed for long-term API stability and will change as I poke at the design questions above. Issues, PRs, and disagreement-with-receipts are all welcome — see CONTRIBUTING.md before non-trivial PRs.
- Quick Start
- Providers & Models
- Interactive Mode
- One-shot & Scripted Runs
- Cost & Budgets
- Tool Safety Model
- Auto-Memory
- Worktrees
- Advisor
- Sub-Agents & Delegation
- Extensibility
- Evaluation
- Configuration
- Architecture
- CLI Reference
# Python 3.11+
git clone https://github.com/lkshay/mAIke-oss.git
cd mAIke-oss
pip install -e .
# Set keys interactively (writes ~/.config/maike/.env)
maike setup
# Or export directly
export GEMINI_API_KEY="..." # ANTHROPIC_API_KEY / OPENAI_API_KEY / GOOGLE_API_KEY also accepted
# Ollama? Just `ollama serve` — no key needed.
# One-time per workspace — generates MAIKE.md (project context + protected files)
maike init
# Launch the TUI
maike| Provider | Auth | Notes |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY |
Claude family |
| OpenAI | OPENAI_API_KEY |
GPT family |
| Gemini (direct) | GEMINI_API_KEY or GOOGLE_API_KEY |
Native thought-signature support, 1M context |
| Gemini (Vertex) | gcloud auth application-default login + GOOGLE_GENAI_USE_VERTEXAI=True |
No API key needed |
| Ollama (local) | none | Set OLLAMA_HOST for remote daemons. Zero pricing. |
Default model and pricing live in maike/constants.py and maike/models_default.yaml. Override per-session with --provider / --model, or persist to ~/.config/maike/models.yaml (deep-merged with defaults).
maike --provider gemini --model gemini-3.1-flash-lite-preview
maike --provider anthropic --model claude-opus-4-20250514
maike --provider ollama --model gemma4:26bmaike (or equivalently maike chat) opens the Textual-based TUI in the current directory. Add --no-tui for the legacy text REPL.
maike # TUI, defaults
maike --provider gemini --budget 5.00 # override provider + budget
maike --yes --verbose # auto-approve + inline traces
maike chat --no-tui # legacy text REPL instead of TUI| Flag | Purpose |
|---|---|
--provider |
anthropic / openai / gemini / ollama |
--model / -m |
Override the provider default |
--budget / -b |
Session budget in USD (default 0 = unlimited) |
--yes / -y |
Auto-approve tool calls |
--verbose / -v |
Stream LLM and tool traces inline |
--no-tui |
Use the legacy text REPL instead of the Textual TUI |
--advisor |
Enable advisor pairing (alpha — see Advisor section) |
Inside the TUI
- Streamed output with collapsible tool-call widgets
@-mention to attach files or directories- Inline approval widgets — APPROVE / APPROVE_ALWAYS / DENY, with optional typed reason
- Ctrl+C copies selection (or quits when nothing is selected)
Slash commands
| Command | Purpose |
|---|---|
/help |
Show help and keybindings |
/cost |
Session cost and tokens |
/status |
Provider, model, budget, workspace |
/new |
Start a new conversation thread |
/clear |
Clear the screen |
/agent, /create-agent |
List / create custom agents |
/team, /create-team |
List / create agent teams |
/skill |
List, load, or install skills |
/plugin |
List / install / enable / disable plugins |
/worktree |
Manage git worktrees |
/quit |
Exit (aliases: /exit, /q) |
For CI, automation, or cron jobs:
maike run "Add input validation to the User model"
maike run "Fix failing tests" --provider gemini --budget 2.00 --yes
maike resume <session-id> --workspace <path>maike run is non-interactive — the agent runs to completion (or budget exhaustion) and exits with a status code.
mAIke prices every call before firing it. Two enforcement layers:
CostTracker(session-level) — runscheck_projected_session_budget()against the next call's expected cost. If the projection would push the session past 95% of budget, the call is blocked and the session terminates gracefully.BudgetEnforcer(per-agent) — sub-agents and delegates have their own caps so a runaway delegate can't drain the parent's budget.
maike --budget 5.00 # USD per session
maike cost # last/current session breakdown
maike cost <session-id> # specific session
maike history # workspace session historySessions persist a structured cost ledger in .maike/ for later analysis.
Every built-in tool carries a RiskLevel. The SafetyLayer intercepts every tool call and gates it:
| Level | Examples | Gate |
|---|---|---|
| READ | Read, Grep, SemanticSearch, WebSearch, WebFetch |
runs freely; Grep, SemanticSearch, WebSearch, and WebFetch execute in parallel |
| WRITE | Write, Edit |
requires a git checkpoint |
| EXECUTE | Bash |
requires checkpoint + inline approval (skipped with --yes) |
| DESTRUCTIVE | rm -rf, drop-table style |
always prompts; pattern-matched in safety/rules.py |
Read-before-edit: the Edit tool refuses to run on a file the agent hasn't Read in this turn. The bug class this prevents: an agent edits a file based on stale or hallucinated content after context pruning dropped the original Read — silently corrupting the edit because the line numbers, surrounding context, or assumed state no longer match. After every successful edit mAIke re-reads the file from disk and verifies the result matches what it wrote; the read state is refreshed so consecutive edits on the same file don't need a redundant Read. If verification fails (encoding round-trip, race), the read state is cleared and the next attempt forces a fresh Read.
Bash idle timeout: commands are killed if they produce no output for idle_timeout seconds (floor 10s). Errors include actionable recovery hints (use timeout_class="long" or background=true).
At session end, mAIke distills durable project knowledge into Markdown files under <workspace>/.maike/memories/. The distillation is synchronous and deterministic — no LLM call, no async work — which means it survives TUI cancellation, network failure, and budget exhaustion. Three memory types:
project_overview.md— purpose, tech stack, key modules. Extracted from the session's tool-call history (which files were read, which commands ran, what the test framework is).key_decisions.md— architectural choices the agent made or surfaced from the codebase, parsed from the agent's own milestone messages.pitfalls.md— errors encountered and the fix that resolved them, captured from theRepeatedFailureTracker's 13-category error classifier.
Each memory is a Markdown file with YAML frontmatter (name, description, type, created_at). On collision, a counter is appended rather than overwriting — older memories are preserved verbatim. A MEMORY.md index file lists all entries with their descriptions for quick scanning.
The next session reads MEMORY.md plus the topic files at startup (capped at 8K tokens total, 2K per file) and injects them as a <maike-memory priority="high" source="auto-memory"> context block before the first user message. The result: no re-exploration of the same files, no re-discovery of the same pitfalls, no repeated wrong turns across sessions.
For isolated branch work, maike worktree add/list/remove wraps git worktree so an agent can operate on a side branch without touching your main checkout.
maike worktree add <name> # new worktree on a new branch
maike worktree list # list mAIke-managed worktrees
maike worktree remove <name> # remove a worktree(Threads — multiple concurrent conversations per workspace — are tracked in the session DB but not yet exposed via dedicated CLI commands. Use maike history to inspect past sessions.)
⚠️ Status: not currently working. The advisor pairing mode has a runaway-trace regression that emits multi-GB of trace events on stuck loops in some configurations. The eval results above were run without the advisor. Treat this section as a description of the intended design, not a working feature, until the regression is fixed.
The intended design: pair a cheap fast executor (e.g. Gemini 3.1 Flash Lite) with a frontier-model advisor that fires only at decisive moments — long exploration, repeated failures, before the first edit, before completion.
# Same-provider pairing (matches the eval above)
maike --provider gemini --model gemini-3.1-flash-lite-preview \
--advisor --advisor-provider gemini --advisor-model gemini-3.1-pro-preview \
--advisor-budget-pct 0.2
# Cross-provider pairing also works
maike --provider gemini --model gemini-3.1-flash-lite-preview \
--advisor --advisor-provider anthropic --advisor-model claude-opus-4-20250514The advisor never runs tools. It returns 1–3 sentences of guidance that mAIke injects into the executor's next turn as a <maike-advisor> context block. It has its own budget cap (default 20% of session). Trigger conditions live in maike/agents/advisor.py.
The Delegate tool spawns read-only sub-agents in the background.
# Inside an agent's tool call (illustrative)
Delegate(
task="Find all callers of `parse_config`",
agent_type="explore", # explore | plan | verify | review | debug | implement | test
background=True, # default — returns a handle
)
# Later:
Delegate(action="check", handle="...")
Delegate(action="wait", handle="...")- Tool profiles scope what each delegate type can do —
exploregetsRead/Grep/SemanticSearch/read-onlyBash;debugaddsEdit;implement/testget everything. - Result delivery is inline (up to ~3K tokens / 12,000 chars) — not file pointers — and persisted as
agent_runsin the session DB. - Auto-resume waits for all running delegates to complete before injecting results in one batch — preventing the parent from re-spawning delegates for already-covered work.
MAX_ASYNC_DELEGATES = 5. The error message guides the LLM to useaction="wait"if hit.
Quality validation: assess_delegate_quality() checks delegate outputs. Zero-tool-call delegates auto-retry once with a nudge. Quality flags (good/suspect/hallucinated) appear in notifications.
| Surface | What it gives you | Install |
|---|---|---|
| Skills | Reusable Markdown procedures the agent invokes on demand | maike skill install <path-or-url> |
| Plugins | Bundles of skills, agents, hooks, MCP and LSP configs | maike plugin install <path-or-url> |
| MCP | Any Model Context Protocol server appears as a first-class tool | .mcp.json |
| LSP | Language servers feed diagnostics and symbols into context | declared in plugin manifest |
| Custom agents / teams | Build via /create-agent and /create-team interactively |
TUI |
Built-in agentic test cases — seeders inject bugs into a workspace, verifiers run the project's tests, and per-feature outcomes are scored.
maike eval --suite all
maike eval --suite hard-agentic --keep-workspacesThe agentic_eval_score() weights: workspace verified (35%) · session completed (25%) · tests passing (15%) · error recovery (10%) · change minimality (10%) · wasted-call efficiency (5%).
For SWE-bench, see SWE-bench Verified for results and the reproduction recipe; the CLI Reference lists every maike swe-bench flag.
| Variable | Purpose |
|---|---|
ANTHROPIC_API_KEY / OPENAI_API_KEY / GEMINI_API_KEY (or GOOGLE_API_KEY) |
Provider keys |
GOOGLE_GENAI_USE_VERTEXAI=True + GOOGLE_CLOUD_PROJECT + GOOGLE_CLOUD_LOCATION |
Use Gemini via Vertex AI |
OLLAMA_HOST |
Override the local Ollama endpoint |
BRAVE_SEARCH_API_KEY / GOOGLE_SEARCH_API_KEY + GOOGLE_SEARCH_ENGINE_ID |
Web search backends |
MAIKE_DEFAULT_BUDGET_USD |
Default session budget (default 5.00) |
MAIKE_WORKSPACE |
Default workspace path |
MAIKE_LOG_LEVEL |
Logging level |
maike init writes a per-workspace MAIKE.md with:
- Project context (auto-detected language, build commands, test commands)
- A
## Protected Filessection — files matching these patterns cannot be modified by the agent (post-hoc enforcement inagents/core.py)
MAIKE.md is yours to edit — mAIke reads it but never writes to it.
CLI → Orchestrator → AgentCore (ReAct loop)
├─ LLMGateway provider adapters · retry · adaptive routing
├─ ToolRegistry Read · Write · Edit · Bash · Grep · …
├─ SafetyLayer risk-tier gating · checkpoint gates
├─ CostTracker pre-call projection · 95% cutoff
└─ Memory SQLite session store · ChromaDB long-term
See docs/ARCHITECTURE.md for the design decisions behind each component.
maike # interactive TUI (alias of `maike chat`)
maike chat --no-tui # legacy text REPL
maike setup # write API keys to ~/.config/maike/.env
maike init # generate MAIKE.md for the current workspace
maike run <task> [--provider --model --budget --yes] # one-shot non-interactive
maike resume <session-id> --workspace <path> # resume a specific session
maike cost [<session-id>] # cost breakdown
maike history # workspace session history
maike eval --suite <name> [--keep-workspaces]
maike swe-bench --variant <lite|verified|full> [--instance-ids ... | --resume FILE]
maike worktree <add|list|remove> ... # git worktree management
maike skill install <path-or-url> # install a skill (no list/remove subcommands yet)
maike plugin <list|install|enable|disable|update|uninstall|validate> ...Apache 2.0 — see the LICENSE file for the full text.
