zipcode is a Rust-based local-only AI coding agent. It runs GGUF models via llama.cpp (Rust bindings) or candle, providing Claude Code-like functionality (file editing, shell execution, code search) offline by default, with explicit user-requested GitHub repository fetching as the network exception. Designed for air-gapped environments — ships as a single ZIP.
The project knowledge base lives in `wiki/`. Read `wiki/index.md` for the page catalog.
When learning something new about the project, update or create wiki pages.
- Language: Rust 2021
- Workspace: 4 crates (`inference`, `tools`, `runtime`, `cli`)
- Inference backends: `candle` (default feature) and `llama-cpp-2` (optional `llama-cpp` feature)
- CLI: clap, rustyline, termimad
```bash
# Build (candle backend only, default)
cargo build --release

# Build with llama-cpp backend (requires cmake + libclang-dev)
cargo build --release -p zipcode-inference --features llama-cpp

# Build CLI with llama-cpp
# First: set features = ["llama-cpp"] in crates/cli/Cargo.toml for the zipcode-inference dependency
cargo build --release -p zipcode

# Test
cargo test --workspace

# Lint
cargo clippy --workspace --all-targets -- -D warnings

# Format
cargo fmt --all -- --check
```

```
cli → runtime → inference
         ↓
       tools
```
- `inference`: GGUF model loading, tokenization, streaming generation, Gemma chat template
- `tools`: 11 tool implementations behind the `Tool` trait + `ToolRegistry`
- `runtime`: agentic conversation loop, config, permissions, sessions
- `cli`: REPL, one-shot mode, `doctor` command
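To make the Gemma turn format concrete, here is a minimal sketch of what a Gemma template does when zipcode renders prompts itself (the candle and llama-cpp-rs paths). The names `Role`, `Message`, `render_gemma_prompt`, and `extract_tool_calls` are hypothetical, and the closing `</tool_call>` tag is an assumption; only `<start_of_turn>`, `<end_of_turn>`, and `<tool_call>` are named in this file.

```rust
/// Chat roles; Gemma calls the assistant role "model".
#[derive(Clone, Copy)]
pub enum Role {
    User,
    Model,
}

pub struct Message<'a> {
    pub role: Role,
    pub content: &'a str,
}

/// Renders each turn as `<start_of_turn>{role}\n{content}<end_of_turn>\n`
/// and ends with an open model turn so generation continues as the assistant.
pub fn render_gemma_prompt(messages: &[Message<'_>]) -> String {
    let mut prompt = String::new();
    for m in messages {
        let role = match m.role {
            Role::User => "user",
            Role::Model => "model",
        };
        prompt.push_str(&format!("<start_of_turn>{role}\n{}<end_of_turn>\n", m.content));
    }
    prompt.push_str("<start_of_turn>model\n");
    prompt
}

/// Collects the trimmed bodies of `<tool_call>...</tool_call>` spans in model output.
/// An unterminated `<tool_call>` is ignored.
pub fn extract_tool_calls(output: &str) -> Vec<&str> {
    const OPEN: &str = "<tool_call>";
    const CLOSE: &str = "</tool_call>";
    let mut calls = Vec::new();
    let mut rest = output;
    while let Some(start) = rest.find(OPEN) {
        let after = &rest[start + OPEN.len()..];
        match after.find(CLOSE) {
            Some(end) => {
                calls.push(after[..end].trim());
                rest = &after[end + CLOSE.len()..];
            }
            None => break,
        }
    }
    calls
}
```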
- `InferenceProvider` trait abstracts inference backends (candle, llama-cpp, mock)
- `ConversationLoop` works against any backend through a `Box<dyn InferenceProvider>`
- `MockInferenceProvider` queues predetermined responses for testing
- Tool output auto-truncated at 8KB (`MAX_TOOL_OUTPUT_BYTES`)
- Path traversal prevention via `resolve_and_validate_path()` in all file tools
- Tool-call loop capped at 25 iterations (`MAX_TOOL_ITERATIONS`)
- Chat template hardcoded for Gemma format (`<start_of_turn>`/`<end_of_turn>`, `<tool_call>` tags)
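The backend trait, the mock provider, and the two loop guards listed above fit together roughly as follows. This is a sketch under stated assumptions, not the `runtime` crate's code: `ModelReply`, `generate`, `run`, `truncate_tool_output`, and the `run_tool` callback are hypothetical, and the real trait also covers streaming generation, which this omits. It keeps the documented limits of 8KB for `MAX_TOOL_OUTPUT_BYTES` and 25 for `MAX_TOOL_ITERATIONS`.

```rust
use std::collections::VecDeque;

const MAX_TOOL_ITERATIONS: usize = 25;
const MAX_TOOL_OUTPUT_BYTES: usize = 8 * 1024;

/// Hypothetical per-turn result from a backend.
#[derive(Debug, Clone, PartialEq)]
pub enum ModelReply {
    Text(String),
    ToolCall { name: String, input: String },
}

pub trait InferenceProvider {
    fn generate(&mut self, transcript: &[String]) -> ModelReply;
}

/// Replays a fixed queue of replies, in the spirit of the mock backend.
pub struct MockInferenceProvider {
    replies: VecDeque<ModelReply>,
}

impl MockInferenceProvider {
    pub fn new(replies: Vec<ModelReply>) -> Self {
        Self { replies: replies.into() }
    }
}

impl InferenceProvider for MockInferenceProvider {
    fn generate(&mut self, _transcript: &[String]) -> ModelReply {
        // An exhausted queue yields an empty text reply, which ends the loop.
        self.replies.pop_front().unwrap_or_else(|| ModelReply::Text(String::new()))
    }
}

/// Cuts output to at most MAX_TOOL_OUTPUT_BYTES, backing off to a UTF-8
/// char boundary so the slice never splits a multi-byte character.
pub fn truncate_tool_output(output: &str) -> String {
    if output.len() <= MAX_TOOL_OUTPUT_BYTES {
        return output.to_string();
    }
    let mut end = MAX_TOOL_OUTPUT_BYTES;
    while !output.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}\n[truncated {} bytes]", &output[..end], output.len() - end)
}

pub struct ConversationLoop {
    provider: Box<dyn InferenceProvider>,
}

impl ConversationLoop {
    pub fn new(provider: Box<dyn InferenceProvider>) -> Self {
        Self { provider }
    }

    /// Calls the model until it answers in plain text, or gives up after
    /// MAX_TOOL_ITERATIONS model calls. `run_tool` stands in for registry dispatch.
    pub fn run(&mut self, user: &str, run_tool: impl Fn(&str, &str) -> String) -> Result<String, String> {
        let mut transcript = vec![user.to_string()];
        for _ in 0..MAX_TOOL_ITERATIONS {
            match self.provider.generate(&transcript) {
                ModelReply::Text(answer) => return Ok(answer),
                ModelReply::ToolCall { name, input } => {
                    transcript.push(truncate_tool_output(&run_tool(&name, &input)));
                }
            }
        }
        Err(format!("stopped after {MAX_TOOL_ITERATIONS} iterations"))
    }
}
```

Backing off to a char boundary matters because slicing a `&str` in the middle of a code point panics, so a naive `&output[..8192]` could crash on non-ASCII tool output.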
- `ZIPCODE_LLAMA_SERVER_BIN`: path to the `llama-server` binary
- `ZIPCODE_LLAMA_SERVER_ALIAS`: model alias (default: `"zipcode"`)
- `ZIPCODE_LLAMA_SERVER_CTX`: context window size (default: 131072, i.e. Gemma 4's native 128K)
- `ZIPCODE_GPU_LAYERS`: number of layers to offload to the GPU (e.g., `99`)
- `ZIPCODE_FLASH_ATTENTION`: enable flash attention (`"1"` or `"true"`)
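A sketch of reading these variables with their documented defaults, plus the `llama-server` lookup order used for the Gemma 4 fallback (an explicit `ZIPCODE_LLAMA_SERVER_BIN` first, then `PATH`). All struct and function names are hypothetical. Falling back to the default when a number does not parse is an assumption; the real code may reject such values instead.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Hypothetical bundle of the llama-server settings.
#[derive(Debug, PartialEq)]
pub struct LlamaServerEnv {
    pub bin: Option<String>,
    pub alias: String,
    pub ctx: u32,
    pub gpu_layers: Option<u32>,
    pub flash_attention: bool,
}

/// Reads settings through a lookup function so tests need not touch the process env.
pub fn read_settings(get: impl Fn(&str) -> Option<String>) -> LlamaServerEnv {
    LlamaServerEnv {
        bin: get("ZIPCODE_LLAMA_SERVER_BIN"),
        alias: get("ZIPCODE_LLAMA_SERVER_ALIAS").unwrap_or_else(|| "zipcode".to_string()),
        // 131072 = 128 * 1024 tokens, Gemma 4's native 128K window.
        ctx: get("ZIPCODE_LLAMA_SERVER_CTX")
            .and_then(|v| v.parse().ok())
            .unwrap_or(131_072),
        gpu_layers: get("ZIPCODE_GPU_LAYERS").and_then(|v| v.parse().ok()),
        // Only the two documented spellings enable flash attention.
        flash_attention: matches!(get("ZIPCODE_FLASH_ATTENTION").as_deref(), Some("1" | "true")),
    }
}

/// Reads from the real process environment.
pub fn read_settings_from_env() -> LlamaServerEnv {
    read_settings(|key| env::var(key).ok())
}

/// An explicit override wins; otherwise search each PATH entry for the binary.
pub fn find_llama_server(override_bin: Option<&OsStr>, path_var: Option<&OsStr>) -> Option<PathBuf> {
    if let Some(bin) = override_bin {
        return Some(PathBuf::from(bin));
    }
    let exe = if cfg!(windows) { "llama-server.exe" } else { "llama-server" };
    env::split_paths(path_var?)
        .map(|dir| dir.join(exe))
        .find(|candidate| candidate.is_file())
}
```

At startup the lookup could be called as `find_llama_server(std::env::var_os("ZIPCODE_LLAMA_SERVER_BIN").as_deref(), std::env::var_os("PATH").as_deref())`.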
- Gemma 4 on native bindings is still blocked: `llama-cpp-rs` 0.1.141 still lacks `gemma4` architecture support, so zipcode falls back to a recent external `llama-server` binary when `ZIPCODE_LLAMA_SERVER_BIN` is set or `llama-server` is on `PATH`.
- candle backend has no quantized Gemma: candle 0.8 has no `quantized_gemma` module, so `quantized_llama` is used as a stand-in. This only compiles; it does not actually load a Gemma GGUF.
- Native-format templates are metadata only on llama-server: the `render_prompt` implementations of `ChatMLTemplate`, `Llama31Template`, and `GemmaTemplate` are all bypassed when llama-server runs with `--jinja` (the production default). The GGUF's embedded Jinja template renders prompts, and OpenAI-shaped `tool_calls` are parsed by `parse_response_tool_calls`. The trait still drives the candle/llama-cpp-rs paths and the entire `EmulatorTemplate` path. See the module docstring in `crates/inference/src/chat_template.rs` for the table.
- Small instruction-tuned models can't drive sub-agent or 12-call compaction flows end-to-end: the harness logic is verified by mock-based integration tests (`tests/agent_delegation.rs`, `tests/compaction.rs`), but a real `zipcode prompt` run that exercises `spawn_child` or tier-1 compaction needs a model with strong tool-stopping training. Gemma 4 E2B reliably handles single tool calls and a skill's `tool_allowlist`. The `supergemma4-26b-uncensored` variant from Phase 5 testing under-stops on multi-tool tasks (it hits the `MAX_TOOL_ITERATIONS = 25` guard), but that is a model issue, not a harness one; the guard fires correctly.
- `zipcode skill` was previously bypassing `tool_allowlist`: fixed in commit `121bc87`. CLI one-shot now applies the same `ToolRegistry::create_filtered` step that the in-loop `SkillTool::execute` uses, so skill authors can rely on the allowlist as a real security boundary.
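The allowlist fix works by filtering the registry before any tool can run. A minimal sketch of that idea, assuming a synchronous `Tool` trait with `name`/`execute` methods and a `create_filtered` that takes a slice of tool names; these signatures and the `EchoTool`/`UpperTool` types are hypothetical, not the `tools` crate's actual API.

```rust
use std::collections::HashMap;

/// Hypothetical tool interface: a stable name plus a synchronous execute step.
pub trait Tool {
    fn name(&self) -> &str;
    fn execute(&self, input: &str) -> Result<String, String>;
}

/// Tools keyed by name.
#[derive(Default)]
pub struct ToolRegistry {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl ToolRegistry {
    pub fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name().to_string(), tool);
    }

    /// Dispatches by name; unknown or filtered-out tools are an error.
    pub fn execute(&self, name: &str, input: &str) -> Result<String, String> {
        match self.tools.get(name) {
            Some(tool) => tool.execute(input),
            None => Err(format!("tool not available: {name}")),
        }
    }

    /// Keeps only allowlisted tools. Every entry point (in-loop skill
    /// execution and CLI one-shot alike) should build its registry this way.
    pub fn create_filtered(mut self, allowlist: &[&str]) -> ToolRegistry {
        self.tools.retain(|name, _| allowlist.contains(&name.as_str()));
        self
    }
}

/// Hypothetical example tools, for illustration only.
pub struct EchoTool;
impl Tool for EchoTool {
    fn name(&self) -> &str {
        "echo"
    }
    fn execute(&self, input: &str) -> Result<String, String> {
        Ok(input.to_string())
    }
}

pub struct UpperTool;
impl Tool for UpperTool {
    fn name(&self) -> &str {
        "upper"
    }
    fn execute(&self, input: &str) -> Result<String, String> {
        Ok(input.to_uppercase())
    }
}
```

Because a filtered-out tool is simply absent from the registry, a call to it fails at lookup time rather than relying on the prompt to discourage it, which is what lets the allowlist act as a boundary.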
- `.zipcode.md` in project root: project-specific instructions injected into the system prompt
- `.zipcode.json` in project root: project config (overrides `~/.zipcode/config.json`)
- `.zipcode-todos.json`: todo list created by the `TodoWrite` tool
- `~/.zipcode/models/`: GGUF model storage
- `~/.zipcode/sessions/`: conversation history persistence
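One way the project file might overlay the global one: a sketch assuming key-by-key precedence, where a key set in `.zipcode.json` wins and an unset key falls back to `~/.zipcode/config.json`. The `Config` fields are hypothetical, and this file does not say whether the real loader merges per key or replaces the whole file. JSON parsing is omitted to keep the sketch dependency-free.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical subset of config keys; `None` means "not set in this file".
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Config {
    pub model: Option<String>,
    pub gpu_layers: Option<u32>,
}

impl Config {
    /// Key-by-key overlay: project values win, unset keys fall back to global ones.
    pub fn overlay(global: Config, project: Config) -> Config {
        Config {
            model: project.model.or(global.model),
            gpu_layers: project.gpu_layers.or(global.gpu_layers),
        }
    }
}

/// Returns the (global, project) config file locations.
pub fn config_paths(home: &Path, project_root: &Path) -> (PathBuf, PathBuf) {
    (
        home.join(".zipcode").join("config.json"),
        project_root.join(".zipcode.json"),
    )
}
```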
- Tests first when possible. Currently 75 tests across all crates.
- `cargo clippy -- -D warnings` must pass before commit.
- Commit messages in English, conventional format (`feat:`, `fix:`, `test:`, `docs:`, `chore:`).
- Do not commit `.env` or model files.
- Keep inference backends behind feature flags to avoid forcing heavy deps.
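A sketch of the feature-flag rule: items guarded by `#[cfg(feature = "llama-cpp")]` are not compiled unless the feature is enabled, so the `llama-cpp-2` dependency and its cmake/libclang build requirements stay out of default builds. In Cargo this usually pairs with an optional dependency that the feature turns on. `Backend`, `default_backend`, and the `llama_backend` module are hypothetical; the real crate may pick backends at runtime instead.

```rust
/// Hypothetical backend selector.
#[derive(Debug, PartialEq)]
pub enum Backend {
    Candle,
    LlamaCpp,
}

/// Compiled only with `--features llama-cpp`; this is where code that
/// uses the heavy `llama-cpp-2` crate would live.
#[cfg(feature = "llama-cpp")]
mod llama_backend {
    // Backend wiring for llama-cpp-2 would go here.
}

/// With the feature enabled, prefer the llama-cpp backend.
#[cfg(feature = "llama-cpp")]
pub fn default_backend() -> Backend {
    Backend::LlamaCpp
}

/// Default build: candle only, no native toolchain needed.
#[cfg(not(feature = "llama-cpp"))]
pub fn default_backend() -> Backend {
    Backend::Candle
}
```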