Adaptive Test-time Learning and Autonomous Specialization
ATLAS is a self-hosted coding assistant built on intelligent inference infrastructure. You point it at an open-weight model running locally, and it turns that model into something that competes with frontier systems, with no fine-tuning, no API calls, and no cloud in between.
Instead of training a larger model or routing to a hosted one, ATLAS wraps a frozen local model in a pipeline that plans before generating, verifies its own output against constraints it extracts from the problem, scores candidates with an energy-based lens, and repairs failures through self-generated test feedback. The weights never change. The intelligence lives in the scaffolding around them.
The result is a serious coding assistant that runs on a single consumer GPU for fractions of a cent per task. Nothing leaves your machine, no vendor can pull the model out from under you, and the entire stack is open source. One model, one GPU, no one else's infrastructure in the loop.
- 2026-04-13 - "How to Run an AI Coding Assistant on a $500 GPU and Beat Claude Sonnet" - devtrends.ru
- 2026-04-05 - V3.0.1 released - interactive CLI, Docker Compose deployment, 95.8% reliability
- 2026-04-03 - "$500 GPU Beats Claude: Local AI Revolution for Web Devs" - ownet.it
- 2026-03-29 - "A $500 GPU Just Outscored Claude Sonnet on Coding Benchmarks" - Aivy
- 2026-03-28 - "Why a $500 GPU Can Beat Claude Sonnet on Coding Benchmarks" - Data Science Collective
- 2026-03-27 - "ATLAS: A $500 GPU Outperforms Claude Sonnet" - Clauday
- 2026-03-26 - Hacker News front page - 489 points, 285 comments
- 2026-03-05 - V3.0 released - 74.6% LiveCodeBench pass@1-v(k=3) on frozen Qwen3-14B
- 2026-02-18 - V2.0 released - benchmark infrastructure, HumanEval/MBPP/LiveCodeBench/GPQA/SciCode evaluation suite
- atlas-proxy - Go-based agent loop that orchestrates the entire system.
- a. Tool-call routing - classifies file operations by complexity tier
- b. Grammar enforcement - GBNF schemas guarantee 100% valid JSON output
- c. Safety limits - turn caps, token budgets, timeout enforcement
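The README doesn't publish ATLAS's actual schemas, but grammar enforcement of the kind described works by giving llama.cpp a GBNF grammar that makes invalid JSON unrepresentable at the token level. A hypothetical sketch of such a grammar, constraining output to a single `{"tool": ..., "args": {...}}` call (rule names and shape are illustrative, not ATLAS's real schema):

```gbnf
root   ::= "{" ws "\"tool\"" ws ":" ws string ws "," ws "\"args\"" ws ":" ws object ws "}"
object ::= "{" ws ( pair ( ws "," ws pair )* )? ws "}"
pair   ::= string ws ":" ws value
value  ::= string | number | "true" | "false" | "null" | object
string ::= "\"" [a-zA-Z0-9_ ./-]* "\""
number ::= "-"? [0-9]+ ( "." [0-9]+ )?
ws     ::= [ \t\n]*
```

Because the sampler can only emit tokens the grammar permits, the "100% valid JSON" guarantee holds by construction rather than by post-hoc parsing and retrying.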
- V3 Pipeline - multi-phase code generation that turns a single prompt into verified, high-quality output.
- a. PlanSearch - constraint-driven structured planning
- b. DivSampling - diverse candidate generation across temperature and strategy
- c. Budget Forcing - controls thinking token allocation per phase
- d. PR-CoT Repair - self-generated test cases for iterative fix cycles
- e. Refinement Loops - repeated sandbox verification and correction
- f. Derivation Chains - multi-step reasoning for complex problems
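The source doesn't specify the pipeline's internals, but the phases above compose naturally as plan once, sample diverse candidates, verify, and repair. A minimal Python sketch of that control flow, with the model-facing steps (`plan`, `generate`, `verify`, `repair`) passed in as callables since their real implementations are not described here:

```python
from typing import Callable, Optional

def v3_pipeline(problem: str,
                plan: Callable[[str], str],
                generate: Callable[[str, float], str],
                verify: Callable[[str], bool],
                repair: Callable[[str, str], str],
                k: int = 3, max_repairs: int = 2) -> Optional[str]:
    """Plan once, sample k diverse candidates, verify each in a
    sandbox, and run a bounded repair loop on failures."""
    blueprint = plan(problem)           # PlanSearch-style structured plan
    temps = [0.2 + 0.3 * i for i in range(k)]  # DivSampling: vary temperature
    for t in temps:
        candidate = generate(blueprint, t)
        for _ in range(max_repairs + 1):
            if verify(candidate):
                return candidate        # first verified candidate wins
            # PR-CoT-style cycle: feed the failure back for a fix attempt
            candidate = repair(blueprint, candidate)
    return None  # no candidate survived verification
```

The bounded inner loop mirrors the refinement loops above: verification is cheap relative to generation, so spending a few extra repair turns per candidate is usually worth it.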
- Geometric Lens - energy-based scoring and retrieval without external oracles.
- a. C(x) Cost Field - MLP that scores candidate quality from embeddings
- b. G(x) Quality Prediction - XGBoost model for selection decisions
- c. RAG / PageIndex V2 - AST-aware code retrieval and project indexing
- d. Confidence Router - Thompson Sampling routes compute where it matters
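The Confidence Router's actual model isn't published here, but Thompson Sampling over routing decisions is commonly implemented as a Beta-Bernoulli bandit: maintain a success/failure posterior per compute tier, sample from each posterior, and send the task to the highest sample. A small illustrative sketch (tier names are hypothetical):

```python
import random

class ThompsonRouter:
    """Beta-Bernoulli bandit: route a task to the compute tier most
    likely to succeed, learning online from pass/fail feedback."""
    def __init__(self, tiers):
        # Beta(1, 1) prior per tier: [successes + 1, failures + 1]
        self.stats = {t: [1, 1] for t in tiers}

    def pick(self):
        # Sample a success rate from each tier's posterior; take the max.
        return max(self.stats, key=lambda t: random.betavariate(*self.stats[t]))

    def update(self, tier, passed):
        self.stats[tier][0 if passed else 1] += 1
```

This is why the router "routes compute where it matters": tiers that keep failing get sampled less without ever being hard-disabled, so the router keeps exploring cheaply.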
- Sandbox - isolated execution environment for build verification.
- a. Multi-language execution - Python, Rust, Go, C, Shell, and more
- b. Compilation and linting - syntax verification before scoring
- c. Test running - executes generated and existing test suites
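ATLAS's sandbox implementation isn't shown here; the essential contract, though, is "run untrusted candidate code with a hard timeout and report pass/fail plus output". A process-level Python sketch of that contract (a real sandbox would add container- or namespace-level isolation on top):

```python
import os
import subprocess
import sys
import tempfile

def run_in_sandbox(code: str, timeout: float = 5.0):
    """Write candidate code to a temp dir and execute it in a child
    process with a hard timeout. Returns (ok, stdout, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "candidate.py")
        with open(path, "w") as f:
            f.write(code)
        try:
            proc = subprocess.run([sys.executable, path],
                                  capture_output=True, text=True,
                                  timeout=timeout, cwd=tmp)
        except subprocess.TimeoutExpired:
            return False, "", "timeout"
        return proc.returncode == 0, proc.stdout, proc.stderr
```

The exit-code-plus-timeout signal is exactly what the refinement loops need: a cheap, unambiguous verdict to feed back into the repair cycle.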
- llama-server - local LLM inference on a single consumer GPU.
- a. CUDA acceleration - quantized model inference (Q6_K / Q4_K_M)
- b. Grammar-constrained decoding - structured output at the token level
- c. Self-embeddings - embedding extraction without a separate model
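Grammar-constrained decoding reaches llama-server through its HTTP API: llama.cpp's `/completion` endpoint accepts a `grammar` field carrying a GBNF grammar alongside the usual sampling parameters. A sketch of building such a request (the host/port and the commented send are assumptions about a default local deployment, not ATLAS's actual wiring):

```python
import json
import urllib.request

def completion_request(prompt, grammar=None, n_predict=512, temperature=0.2):
    """Build the JSON body for llama.cpp's /completion endpoint.
    A non-None `grammar` constrains decoding at the token level."""
    body = {"prompt": prompt,
            "n_predict": n_predict,
            "temperature": temperature}
    if grammar is not None:
        body["grammar"] = grammar
    return body

# Sending it (assumes llama-server on localhost at its default port 8080):
# body = completion_request("def add(a, b):", grammar='root ::= [a-z]+')
# req = urllib.request.Request("http://localhost:8080/completion",
#                              data=json.dumps(body).encode(),
#                              headers={"Content-Type": "application/json"})
# text = json.load(urllib.request.urlopen(req))["content"]
```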
- Interactive CLI - type atlas in any project directory and start building.
- a. Tool-call agent loop - read, write, edit, delete, run commands
- b. Streaming output - real-time response via SSE
- c. Project-aware context - automatic file discovery and injection
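The CLI's agent loop isn't reproduced in this README, but its shape follows from the pieces above: the model emits a grammar-guaranteed JSON tool call, the loop dispatches it, and the tool's output becomes the next observation, under a turn cap. A minimal sketch with scripted stand-ins for the model and tools (all names here are illustrative):

```python
import json

def agent_loop(model_step, tools, max_turns=8):
    """Minimal tool-call loop: ask the model for a JSON action,
    dispatch it, feed the result back, and stop on a 'done' call
    or when the turn cap (a safety limit) is hit."""
    observation = ""
    for _ in range(max_turns):
        # Grammar enforcement upstream guarantees this parse succeeds.
        call = json.loads(model_step(observation))
        if call["tool"] == "done":
            return call["args"].get("summary", "")
        observation = tools[call["tool"]](**call["args"])
    return None  # turn cap exceeded without a 'done'
```

In the real CLI the `tools` table would hold the read/write/edit/delete/run handlers, and `model_step` would stream from llama-server over SSE.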
Full documentation - setup guides, architecture, configuration, troubleshooting, and benchmark reports - lives in the docs/ directory.
ATLAS requires a GPU with 16GB+ VRAM, Docker (with nvidia-container-toolkit) or Podman, and Python 3.9+. Currently tested on NVIDIA GPUs - ATLAS is not NVIDIA-specific, and ROCm support for AMD GPUs is on the roadmap. See SETUP.md for full installation instructions covering Docker Compose, bare-metal, and K3s deployment. Once running, type atlas in any project directory and start building.
- Tested on NVIDIA only - ATLAS uses llama.cpp for inference, which supports multiple accelerator backends. ROCm support is a V3.1 priority.
- 9B model not formally benchmarked - the CLI ships Qwen3.5-9B with the full V3 pipeline, but formal LiveCodeBench scores are from the 14B model. 9B benchmarks are V3.1 work.
- Complex feature additions can fail - adding features to existing projects succeeds ~67% of the time. The model sometimes over-explores instead of writing code.
- Grammar-constrained inference speed - ~51 tok/s on llama-server. Faster grammar integration is planned for V3.1.
V3.0.1 - Current release. Interactive CLI, Docker Compose deployment, V3 pipeline integration.
V3.1 - In progress.
- ROCm support - AMD GPU inference via llama.cpp ROCm backend
- Formal 9B benchmarks - LiveCodeBench, GPQA Diamond, SciCode on Qwen3.5-9B
- CLI reliability - expanded testing, targeting L6 ≥ 90%
- Grammar speed - C-side sampler chain for faster constrained decoding
We're building ATLAS in the open and we're actively looking for contributors and core maintainers. Whether you're fixing a bug, adding accelerator support, or rethinking a whole subsystem - there's a place for you here. If you believe open models deserve better infrastructure, come build with us.
Found a bug or hit a wall? Open an issue - you don't need to submit a fix. Bug reports and feedback help just as much as code.
See CONTRIBUTING.md for guidelines.
Licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
