An open-source autonomous coding agent built on three ideas:
a coding engine that verifies its own work, infinite memory via Recursive Language Models,
and scalable multi-agent orchestration on an event-driven runtime.
Memfun is built on three foundational ideas that work together. Each one solves a hard problem in autonomous coding agents. Together, they produce something greater than the sum of their parts.
| Pillar | Summary |
|---|---|
| Self-verifying coding engine | A multi-strategy solver that writes, verifies, and reviews its own code. Starts fast (2 LLM calls), escalates to parallel specialists when needed, and always runs your linters before delivering results. |
| Infinite memory via RLM | DSPy RLM separates reasoning from memory. The agent navigates million-line codebases via a sandboxed REPL -- variable state lives outside the token window. Effective context is unbounded. |
| Scalable orchestration | A pluggable backend architecture with 4 tiers (in-process to NATS JetStream) and 8 protocol interfaces. The same code runs on a laptop or a distributed cluster. Agents coordinate via an event bus. |
Every coding agent today does roughly the same thing: cram files into a prompt, call the LLM, dump the output. If it fails, maybe retry. Memfun takes a different approach -- it has a three-tier escalation strategy that starts cheap and fast, automatically scales up for complex tasks, and verifies everything it produces.
Most tasks don't need 20 iterative loops. They need the right context and one good answer. The context-first solver does exactly this:
- Code Map -- Extracts a structural index of your codebase (classes, functions, methods with full signatures), using Python's `ast` module for Python and regex for JS/TS/Go/Rust/Java. The planner sees what's in each file, not just filenames.
- Planner (1 LLM call) -- Selects files to read, patterns to search, and optional web queries. Projects under 200 KB skip this step entirely.
- Gather (0 LLM calls) -- Reads the selected files, runs searches, fetches web results. Pure I/O.
- Solve (1 LLM call) -- Produces structured file operations (write, edit, run) in a single shot with all context available.
- Execute + Verify -- Applies operations, auto-detects your linter (ruff/eslint/go vet/cargo check), runs it, fixes errors (up to 2 cycles).
- Consistency Review -- Semantic check: does the output match what was asked? If not, targeted polish in edit-only mode.
Total: 2 LLM calls for the happy path. ~30 seconds. No iteration loops.
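In code, the happy path reads roughly like this. A minimal sketch: every helper below is hypothetical shorthand for the documented step of the same name, not Memfun's actual API.

```python
# Illustrative control flow only -- each helper is a hypothetical
# stand-in for the documented step, not Memfun's real API.

def context_first(task: str, project_size_kb: int):
    code_map = build_code_map()                 # structural index, 0 LLM calls

    if project_size_kb < 200:
        selection = select_everything(code_map) # small projects skip planning
    else:
        selection = plan(task, code_map)        # LLM call #1: files, searches, web

    context = gather(selection)                 # pure I/O, 0 LLM calls
    result = execute(solve(task, context))      # LLM call #2: structured file ops

    for _ in range(2):                          # up to 2 lint-fix cycles
        errors = run_linter(result)             # ruff / eslint / go vet / cargo check
        if not errors:
            break
        result = fix_lint(result, errors)       # extra LLM call, edit-only

    if not matches_request(task, result):       # consistency review
        result = polish(result)                 # extra LLM call, edit-only
    return result
```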
When context-first fails or truncates, Memfun automatically escalates to a full multi-agent workflow. No user intervention needed.
```
Task Decomposer (DAG with dependency analysis)
│
├── Group 1 (parallel)    FileAgent (read/analyze)    PlannerAgent (plan approach)
│
├── Group 2 (parallel)    CoderAgent (implement)    TestAgent (write tests)    SecurityAgent (audit vulns)
│                         each runs: RLM → Verify → Review → Polish
│
├── Cross-Agent Review    ReviewAgent checks all outputs for conflicts
│
└── Revision              up to 2 rounds: re-run failing agents with feedback
```
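Conceptually, the runtime awaits each dependency group in order and fans the agents inside a group out concurrently. A runnable toy sketch with asyncio, where `run_agent` is a stand-in for a full RLM → Verify → Review → Polish loop:

```python
import asyncio

# Toy sketch of dependency-ordered execution: agents within a group run
# concurrently; groups run in DAG order. run_agent is a stand-in, not
# Memfun's actual agent loop.

async def run_agent(name: str, task: str) -> str:
    await asyncio.sleep(0.1)   # placeholder for the real agent iteration loop
    return f"{name}: done"

async def run_workflow(task: str) -> list[str]:
    groups = [                              # groups from the decomposer's DAG
        ["file-agent", "planner-agent"],
        ["coder-agent", "test-agent", "security-agent"],
        ["review-agent"],                   # cross-agent review runs last
    ]
    results: list[str] = []
    for group in groups:
        results += await asyncio.gather(*(run_agent(a, task) for a in group))
    return results

print(asyncio.run(run_workflow("refactor db layer")))
```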
9 specialist agents, each with a focused system prompt and iteration cap:
| Agent | Role | Max Iterations |
|---|---|---|
| CoderAgent | Production code generation (prefers `edit_file` over `write_file`) | 15 |
| TestAgent | Test writing and execution | 10 |
| DebugAgent | Bug diagnosis and root cause analysis | 12 |
| ReviewAgent | Quality review and cross-agent conflict detection | 8 |
| FileAgent | File reading and analysis (never creates code) | 8 |
| SecurityAgent | Vulnerability detection (injection, SSRF, secrets, path traversal) | 8 |
| PlannerAgent | Sub-problem decomposition and approach planning | 6 |
| WebSearchAgent | Web search via DuckDuckGo | 8 |
| WebFetchAgent | URL content extraction and summarization | 8 |
Every code-producing agent runs the same quality pipeline after its RLM loop: auto-detect linters, verify, fix, consistency review, polish. Issues are caught per-agent before the cross-agent review, not after.
Last resort for tasks that require deep iterative exploration. The agent gets a sandboxed Python REPL and can read files, run commands, make sub-LLM calls, and accumulate state across iterations. See Pillar 2 for how this works.
Every code path -- context-first and each specialist -- runs the same post-processing:
```
Code produced
│
├─ Auto-detect linters (ruff, eslint, go vet, cargo check)
├─ Run verification (subprocess, 0 LLM calls)
├─ Fix lint errors if any (1 LLM call, edit-only)
├─ Consistency review: does output match the request? (1 LLM call)
└─ Polish if issues found (1 LLM call, edit-only)
```
Agents are explicitly prevented from destroying your code:
- Destructive write guard -- `write_file` on existing files is blocked if it would lose >30% of content; `edit_file` is suggested instead.
- Edit-only mode -- Fix and polish steps can only modify existing files via targeted `edit_file`, never `write_file`.
- Fuzzy edit matching -- 3 strategies (exact, whitespace-normalized, difflib sliding window) so edits land even when LLM output has minor differences; a sketch follows this list.
- Read caching -- MD5 hash per file; re-reads of unchanged files return a short summary instead of full content, preventing context burn.
- Stall detection -- Warns the agent when it's looping (no action ops at midpoint, same file read 3+ times, approaching iteration limit).
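The fuzzy matcher can be approximated with the standard library alone. A runnable sketch of the three-strategy idea; the 0.9 similarity cutoff and the exact window logic are illustrative, not Memfun's actual values:

```python
import difflib

def find_edit_span(source: str, old: str, cutoff: float = 0.9) -> str | None:
    """Locate `old` in `source`: exact match, then whitespace-normalized,
    then a difflib sliding window. Cutoff is illustrative."""
    if old in source:                                  # strategy 1: exact
        return old

    def norm(s: str) -> str:
        return " ".join(s.split())

    src_lines = source.splitlines()
    old_lines = old.splitlines()
    windows = [
        src_lines[i : i + len(old_lines)]
        for i in range(len(src_lines) - len(old_lines) + 1)
    ]
    for w in windows:                                  # strategy 2: normalized
        if [norm(ln) for ln in w] == [norm(ln) for ln in old_lines]:
            return "\n".join(w)

    best, best_ratio = None, cutoff                    # strategy 3: fuzzy window
    for w in windows:
        candidate = "\n".join(w)
        ratio = difflib.SequenceMatcher(None, old, candidate).ratio()
        if ratio >= best_ratio:
            best, best_ratio = candidate, ratio
    return best

# The edit then becomes a plain replace on the located span:
# span = find_edit_span(src, old); src = src.replace(span, new) if span else src
```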
Every coding agent today hits the same wall: context windows are finite. Feed a 100k-line codebase into any frontier model and the agent either truncates, hallucinates, or gives up. Multi-file refactors across large monorepos remain out of reach.
Memfun uses the Recursive Language Model pattern from DSPy. Instead of cramming an entire codebase into a single prompt, the agent gets a sandboxed Python REPL where variable state lives outside the token window.
```
Traditional Agent:          Memfun RLM Agent:
┌─────────────────┐         ┌──────────────────┐
│ Context Window  │         │ Context Window   │  <-- working register
│  (128k tokens)  │         │  (128k tokens)   │
│                 │         ├──────────────────┤
│ [entire codebase│         │ REPL Variables   │  <-- unbounded state
│  crammed in]    │         │  files, ASTs,    │
│                 │         │  search results, │
│ [truncated...]  │         │  computations    │
│                 │         ├──────────────────┤
│                 │         │ Persistent DB    │  <-- infinite memory
│                 │         │  learnings,      │
│                 │         │  preferences,    │
│                 │         │  project facts   │
└─────────────────┘         └──────────────────┘
```
The agent can read files, parse ASTs, run searches, store intermediate results in variables, and make recursive sub-LLM calls -- all without exceeding context limits. The token window becomes a working register, not a hard ceiling. This is what enables Memfun to handle repositories that are orders of magnitude larger than any model's native context window.
The RLM sandbox exposes these tools to the agent:
| Tool | Description |
|---|---|
| `read_file(path)` | Read file with MD5 caching (re-reads return a short summary) |
| `write_file(path, content)` | Write with destructive-overwrite guard |
| `edit_file(path, old, new)` | Targeted edit with fuzzy matching |
| `run_cmd(cmd)` | Execute shell commands |
| `list_files(pattern)` | Glob with optional timestamps |
| `llm_query(prompt, context)` | Sub-LLM call for complex reasoning |
| `llm_query_batched(queries)` | Parallel sub-LLM calls |
| `web_search(query)` | DuckDuckGo search |
| `web_fetch(url)` | Fetch a URL and convert it to markdown |
| `search_history(query)` | Search conversation history |
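For example, the agent might execute something like this inside the sandbox. The tool names come from the table above; the paths and prompts are made up. Only the final print re-enters the main context window:

```python
# Hypothetical code the agent might run inside the sandboxed REPL.
# read_file, list_files, and llm_query_batched are the injected tools
# from the table above; paths and prompts are illustrative.

paths = list_files("src/**/*.py")             # could be thousands of files
sources = {p: read_file(p) for p in paths}    # lives in REPL state, not in context

# Fan out cheap sub-LLM calls instead of pasting every file into one prompt.
summaries = llm_query_batched([
    f"Summarize the public API of this module:\n{src}"
    for src in sources.values()
])

# Only the distilled overview re-enters the main context window.
api_overview = "\n".join(f"{p}: {s}" for p, s in zip(paths, summaries))
print(api_overview[:2000])
```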
The same unbounded-memory principle extends across sessions. After every conversation turn, the agent extracts reusable knowledge and stores it in two layers:
- MEMORY.md -- Human-readable, editable file. Always loaded as context. Manage with `/remember` and `/forget`.
- SQLite MemoryStore -- TF-IDF indexed database for efficient retrieval. Scales to thousands of entries.
Before each turn, relevant memories are retrieved and injected as high-priority context. The agent learns from how you work and applies it automatically.
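To make the retrieval idea concrete, here is a self-contained toy TF-IDF ranker. The real MemoryStore is a SQLite-backed index; this only illustrates the scoring principle:

```python
import math
from collections import Counter

# Toy TF-IDF ranking over stored memories -- illustrative only,
# not Memfun's actual MemoryStore implementation.

def rank_memories(query: str, memories: list[str], k: int = 3) -> list[str]:
    docs = [m.lower().split() for m in memories]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))      # document frequency
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}

    def score(doc: list[str]) -> float:
        tf = Counter(doc)
        return sum(tf[t] / len(doc) * idf.get(t, 0.0) for t in query.lower().split())

    return sorted(memories, key=lambda m: score(m.lower().split()), reverse=True)[:k]

memories = [
    "always run tests with pytest, never unittest",
    "the database layer uses asyncpg",
    "user prefers tabs over spaces",
]
print(rank_memories("how should tests be run", memories))
```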
A single-process agent is fine for experiments. Production needs horizontal scaling, fault tolerance, and shared state across nodes. Most frameworks punt on this entirely.
Memfun's runtime is built from the ground up on 8 protocol interfaces that abstract the infrastructure layer. Every agent, tool, and skill runs identically across all backend tiers -- swap at config time, no code changes.
```
┌──────────────────────────────────────────────────────┐
│                     Your Agents                      │
│   CoderAgent   TestAgent   ReviewAgent   CustomAgent │
├──────────────────────────────────────────────────────┤
│                8 Protocol Interfaces                 │
│  EventBus │ StateStore │ TaskQueue │ AgentRegistry   │
│  SessionMgr │ SandboxAdapter │ KnowledgeBase │ Tools │
├──────────────────────────────────────────────────────┤
│                    Backend Tier                      │
│  T0: asyncio │ T1: SQLite │ T2: Redis │ T3: NATS     │
└──────────────────────────────────────────────────────┘
```
| Tier | Backend | When to Use |
|---|---|---|
| T0 | In-Process (asyncio queues) | Unit tests, CI pipelines, quick experiments. Zero dependencies. |
| T1 | SQLite (WAL mode) | Individual developers, local projects. Single file, zero infrastructure. Default. |
| T2 | Redis (pub/sub + streams) | Team environments. Multiple developers sharing agent state. |
| T3 | NATS JetStream | Production. Distributed, fault-tolerant, multi-node clustering. NATS is a single binary, Apache 2.0, no ZooKeeper or Kafka needed. |
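The portability comes from typing.Protocol: agents depend on an interface, and any tier that satisfies it can be swapped in at config time. A minimal illustrative sketch of what one such interface could look like (method names are hypothetical, not Memfun's actual definitions):

```python
import asyncio
from typing import Awaitable, Callable, Protocol

# Hypothetical sketch of one protocol interface plus its T0 backend.
# Method names are illustrative; Memfun's real interfaces may differ.

Handler = Callable[[dict], Awaitable[None]]

class EventBus(Protocol):
    async def publish(self, topic: str, event: dict) -> None: ...
    async def subscribe(self, topic: str, handler: Handler) -> None: ...

class InProcessBus:
    """T0 tier: plain asyncio, zero external dependencies."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Handler]] = {}

    async def publish(self, topic: str, event: dict) -> None:
        for handler in self._subs.get(topic, []):
            await handler(event)

    async def subscribe(self, topic: str, handler: Handler) -> None:
        self._subs.setdefault(topic, []).append(handler)

async def on_done(event: dict) -> None:
    print("event:", event)

async def main() -> None:
    # A Redis- or NATS-backed bus would satisfy the same Protocol,
    # so agent code never changes across tiers.
    bus: EventBus = InProcessBus()
    await bus.subscribe("task.done", on_done)
    await bus.publish("task.done", {"agent": "coder-agent", "status": "ok"})

asyncio.run(main())
```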
Agents are defined as AGENT.md files -- human-readable markdown documents that specify capabilities, constraints, and delegation rules. The orchestrator discovers and routes tasks to the right specialist dynamically.
```markdown
# coder-agent

## Capabilities
- Write production code across multiple files
- Prefer edit_file over write_file for existing code
- Run linters and fix errors

## Constraints
- Max 15 iterations per task
- Never modify test files (delegate to test-agent)

## Delegation
- test-agent: when tests need writing
- review-agent: before final delivery
```

Define custom agents in `.memfun/agents/` or `agents/`.
All tools are exposed via FastMCP 3.0 (Model Context Protocol), making them portable across any MCP-compatible AI tool:
- Code tools: file read/write/edit, ripgrep search, ast-grep, git operations
- Web tools: DuckDuckGo search, URL fetch with markdown conversion, SSRF prevention
- Agent tools: agents and skills exposed as MCP tools for cross-system interoperability
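As an illustration, exposing a tool over MCP with FastMCP looks roughly like this. A toy read-only tool with an assumed server name, not Memfun's actual implementation:

```python
from fastmcp import FastMCP

# Toy example of registering a tool via FastMCP's decorator API.
# Server name and tool body are illustrative, not Memfun's code.
mcp = FastMCP("memfun-tools-demo")

@mcp.tool()
def read_project_file(path: str) -> str:
    """Return the contents of a file in the current project."""
    with open(path, encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    mcp.run()  # serve the tool to any MCP-compatible client
```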
Every memfun session launches a real-time web dashboard at http://localhost:8081
that shows what the agent is doing as it works:
- Active requests with status, elapsed time, and token count
- Sub-task breakdown for multi-agent workflows (which specialist is running, how many iterations)
- Event stream with live WebSocket updates
- Session history of all requests and results
Multiple terminals in the same project share a single dashboard instance via lockfile coordination and HTTP event forwarding -- open 3 terminals, see all their work in one place.
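A simplified sketch of how lockfile coordination can work: the first process to create the lockfile hosts the dashboard, and later processes detect it and forward events over HTTP instead. The path and mechanism here are illustrative, not Memfun's implementation:

```python
import os

# Simplified single-dashboard coordination sketch; path and mechanism
# are assumptions, not Memfun's actual code.
LOCK = ".memfun/dashboard.lock"

def acquire_dashboard_lock() -> bool:
    """First process to create the lockfile hosts the dashboard;
    later processes forward their events to it over HTTP instead."""
    os.makedirs(os.path.dirname(LOCK), exist_ok=True)
    try:
        fd = os.open(LOCK, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        return True    # this terminal owns the dashboard
    except FileExistsError:
        return False   # another terminal owns it; become an event forwarder
```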
```bash
# One-line install
curl -fsSL https://raw.githubusercontent.com/indoor47/memfun/main/install.sh | bash

# Initialize (LLM provider, API key, backend)
memfun init

# Start chatting
memfun
```

Or install manually:

```bash
git clone https://github.com/indoor47/memfun.git
cd memfun
uv sync --all-packages
```

`memfun init` walks you through:
- LLM provider -- Anthropic (Claude), OpenAI, Ollama, or custom endpoint
- API key -- securely stored in `~/.memfun/credentials.json`
- Backend -- SQLite (default), in-memory, Redis, or NATS
- Sandbox -- local (default), Docker, or Modal
```
$ memfun
memfun > add JWT auth to the Flask API with refresh tokens
Context-first: gathered 12 files via code map (48 KB)
Solving...
Operations:
write src/auth/jwt.py JWT creation, verification, refresh
write src/auth/middleware.py require_auth decorator
edit src/routes/api.py added @require_auth to protected endpoints
write tests/test_auth.py 12 test cases
run ruff check src/ passed
Verified: 0 lint errors • Consistency: ok
32s • 8.2k tokens
memfun > now refactor the database layer to async with connection pooling
Workflow: decomposed into 3 parallel agents
✓ coder-agent 8 iter, 4 ops (42s)
✓ test-agent 5 iter, 2 ops (28s)
✓ review-agent 3 iter, 0 ops (12s)
Cross-agent review: no conflicts
Operations:
edit src/db/connection.py replaced sync pool with asyncpg
edit src/db/queries.py async context managers throughout
edit src/db/models.py async classmethod factories
write tests/test_db_async.py 8 new async test cases
run ruff check src/ passed
run pytest tests/test_db*.py 6 passed
1m 48s • 22.1k tokens
```
| Command | Description |
|---|---|
| `/help` | Show available commands |
| `/remember <text>` | Store a preference or project fact |
| `/memory` | View stored memories |
| `/forget <target>` | Remove a memory entry |
| `/context` | Rescan project files |
| `/agents` | Show running specialist agents |
| `/workflow` | Show current workflow DAG and status |
| `/model` | Show or switch the LLM model |
| `/traces` | List recent execution traces |
| `/clear` | Clear conversation history |
```bash
memfun                            # Interactive chat (default)
memfun init                       # Initialize project
memfun ask "how does auth work"   # Ask a question
memfun analyze src/               # Analyze code structure
memfun fix "TypeError in login"   # Fix a bug
memfun review src/auth.py         # Review code
memfun skill list                 # List available skills
memfun agent list                 # List agent definitions
```

```toml
# memfun.toml
[project]
name = "my-project"
[llm]
provider = "anthropic" # anthropic | openai | ollama | custom
model = "claude-opus-4-6"
api_key_env = "ANTHROPIC_API_KEY"
temperature = 0.0
max_tokens = 128000
[backend]
tier = "sqlite" # in-process | sqlite | redis | nats
sqlite_path = ".memfun/memfun.db"
[sandbox]
backend = "local" # local | docker | modal
timeout_seconds = 30
[web]
search_backend = "duckduckgo" # duckduckgo | brave | tavily | searxng
```

```
memfun/
├── packages/
│ ├── memfun-core/ Types, config (memfun.toml), logging, errors
│ ├── memfun-runtime/ 8 protocol interfaces, 4 backend tiers, BaseAgent
│ ├── memfun-agent/ Coding agent, context-first solver, 9 specialists,
│ │ workflow engine, RLM, code map, DSPy signatures
│ ├── memfun-tools/ MCP server (FastMCP 3.0): code, git, web tools
│ ├── memfun-skills/ Agent Skills: discovery, loading, execution, synthesis
│ ├── memfun-optimizer/ Trace analysis, agent synthesis, MIPROv2, memory
│ └── memfun-cli/ Interactive chat, setup wizard, live dashboard, CLI
├── skills/ 8 built-in Agent Skills (SKILL.md format)
├── agents/ Built-in agent definitions (AGENT.md format)
├── evals/ SWE-bench + Terminal-Bench evaluation harnesses
└── tests/ 597 tests (88 skipped for Redis/NATS without servers)
```
8 portable skills following the Agent Skills standard:
| Skill | Description |
|---|---|
| `analyze-code` | Analyze code structure, quality, and issues |
| `review-code` | Structured code review with actionable feedback |
| `fix-bugs` | Diagnose and fix bugs from a description |
| `explain-code` | Explain how code works |
| `generate-tests` | Generate test cases for code |
| `security-audit` | Security vulnerability analysis |
| `refactor` | Refactor code for improved quality |
| `ask` | General-purpose coding questions |
Skills are SKILL.md files, portable across Claude Code, Codex CLI, Cursor,
Gemini CLI, and 20+ other AI tools.
```bash
git clone https://github.com/indoor47/memfun.git
cd memfun
uv sync --all-packages

make test        # 597 pass, 88 skip
make lint        # 0 errors
make typecheck   # Pyright strict mode

uv run memfun    # run in dev mode
```

| Layer | Technology |
|---|---|
| Intelligence | DSPy 2.6+ (RLM, MIPROv2, structured signatures) |
| Tools | FastMCP 3.0 (Model Context Protocol) |
| Backends | asyncio (T0), aiosqlite (T1), redis (T2), nats-py (T3) |
| Code Analysis | tree-sitter, ast-grep, Python ast |
| Web | DuckDuckGo Search, httpx, markdownify |
| CLI | Typer + Rich |
| Quality | Ruff, Pyright, pytest + pytest-asyncio |
| Language | Python 3.12+, asyncio, typing.Protocol |
See CONTRIBUTING.md for development setup, conventions, and pull request process.
See SECURITY.md for vulnerability reporting and security practices.
Apache License 2.0. See LICENSE.