Local-first TUI tool that takes you from zero to a fully working local LLM stack.
Async, cross-platform, zero-overhead. Production-grade Rust.
LocalForge is a terminal-based application that guides you through:
- Setup Flow (Wizard): Hardware detection → Backend selection → llama.cpp build → Model download
- Main Workspace: Chat interface with streaming + Settings for server exposure
All steps are resumable and expertise-aware (Beginner/Intermediate/Expert).
- Hardware Detection — Auto-detect CPU, GPU (NVIDIA/AMD/Apple), VRAM, CUDA, ROCm, WSL
- Optimal Backend Selection — Ranked recommendations (CUDA, Metal, ROCm, Vulkan, CPU)
- Automated llama.cpp Build — Async build with tuned CMake flags per backend
- Optional llama-swap — Hot-swap models without restarting server
- Curated Model Registry — Hardware-fitted GGUF recommendations
- HuggingFace Search — Async search with GGUF filtering
- Robust Downloads — Async downloads with resume support
- Chat Interface — Streaming token generation, context meter, history
- Model Hot-Swap — Switch models instantly (if llama-swap installed)
- HTTPS Server — OpenAI-compatible API with API key authentication
- KV Cache Advisor — Intelligent slot/VRAM management for 5-10 concurrent users
- Expertise-Aware UI — Dynamic tooltips based on user level
- Async Runtime — Built on Tokio for responsive I/O
- Structured Logging — Tracing with rotating file appenders
- Typed Errors — Thiserror for robust error handling
- Retry Logic — Exponential backoff for all network ops
- Minimal Dependencies — Single binary, no heavy runtimes
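The retry behavior above can be sketched as follows. This is an illustrative model of exponential backoff with jitter, not LocalForge's actual implementation; the random factor is passed in as a parameter so the sketch stays dependency-free and deterministic:

```rust
use std::time::Duration;

/// Illustrative retry policy; mirrors the shape of the `RetryPolicy`
/// struct shown in the architecture section below.
pub struct RetryPolicy {
    pub max_attempts: u32,
    pub initial_backoff: Duration,
    pub backoff_multiplier: f64,
    pub jitter: bool,
}

impl RetryPolicy {
    /// Delay before retry `attempt` (0-based): initial * multiplier^attempt,
    /// optionally scaled by a factor in [0.5, 1.0] when jitter is on.
    /// `rand01` stands in for a random draw in [0.0, 1.0].
    pub fn backoff(&self, attempt: u32, rand01: f64) -> Duration {
        let base = self.initial_backoff.as_secs_f64()
            * self.backoff_multiplier.powi(attempt as i32);
        let scaled = if self.jitter { base * (0.5 + 0.5 * rand01) } else { base };
        Duration::from_secs_f64(scaled)
    }
}

fn main() {
    let policy = RetryPolicy {
        max_attempts: 4,
        initial_backoff: Duration::from_millis(250),
        backoff_multiplier: 2.0,
        jitter: false,
    };
    // Without jitter: 250ms, 500ms, 1s, 2s.
    for attempt in 0..policy.max_attempts {
        println!("attempt {attempt}: wait {:?}", policy.backoff(attempt, 0.0));
    }
}
```

Jitter spreads out retries from many clients so they don't hammer a recovering server in lockstep.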
Before building llama.cpp, you need Git, CMake, and a C/C++ compiler.

Linux (Ubuntu/Debian):

```sh
sudo apt update
sudo apt install git cmake build-essential pkg-config libssl-dev
```

Linux (Fedora):

```sh
sudo dnf install git cmake gcc gcc-c++ make
```

macOS:

```sh
xcode-select --install
brew install cmake pkg-config
```

Windows:

```sh
winget install Git.Git Kitware.CMake
# Install Visual Studio Build Tools (C++ workload)
```

Optional GPU backend SDKs:
- CUDA (NVIDIA): NVIDIA CUDA Toolkit
- HIP (AMD): AMD ROCm
- Vulkan: Vulkan SDK
One-line install:

```sh
curl -fsSL https://raw.githubusercontent.com/madbeast-16/LocalForge/localforge-v1/install.sh | bash
```

The install script auto-detects your OS/architecture, downloads the v0.1.5 binary from GitHub Releases, and falls back to building from source if needed.

You can control the installation:

```sh
# Install to a specific directory
INSTALL_DIR=/usr/local/bin curl -fsSL https://raw.githubusercontent.com/madbeast-16/LocalForge/localforge-v1/install.sh | bash

# Install a specific version
VERSION=v0.1.5 curl -fsSL https://raw.githubusercontent.com/madbeast-16/LocalForge/localforge-v1/install.sh | bash
```

Pre-built binaries are available at: https://github.com/madbeast-16/LocalForge/releases/tag/v0.1.5

Supported platforms:
- `x86_64-unknown-linux-gnu` (Linux x86_64)
- `aarch64-unknown-linux-gnu` (Linux ARM64)
- `x86_64-apple-darwin` (macOS Intel)
- `aarch64-apple-darwin` (macOS Apple Silicon)
- `x86_64-pc-windows-msvc` (Windows)
To build from source:

```sh
git clone https://github.com/madbeast-16/LocalForge
cd LocalForge
cargo build --release
```

The binary will be at `target/release/localforge` (~3-4 MB). Alternatively, install with Cargo:

```sh
cargo install --path .
```

Then launch it:

```sh
localforge
```

The TUI guides you through the complete setup flow.
Headless mode:

```sh
localforge --headless build --cuda   # Force CUDA build
localforge --headless build --metal  # Force Metal build
```

Detect hardware:

```sh
localforge detect
```

Show model recommendations:

```sh
localforge models        # Hardware-fitted
localforge models --all  # Show all curated
```

Search HuggingFace:

```sh
localforge search "llama 3"
```

Build llama.cpp:

```sh
localforge build           # Auto-detect
localforge build --cuda    # Force CUDA
localforge build --vulkan  # Force Vulkan
```

Configuration:

```sh
localforge config show
localforge config --init  # Write default config
```

Location: `~/.config/localforge/config.toml`
```toml
install_prefix = "~/.local"
models_dir = "~/.local/share/localforge/models"
headless = false
# backend = "Cuda"
# parallel_jobs = 8
```

Clean, layered module structure:
```
src/
├── main.rs      # Entry point, CLI args
├── lib.rs       # Library root
├── app.rs       # App state & routing
├── config/      # Config/state persistence
├── error.rs     # Typed errors + retry
├── logger.rs    # Tracing setup
├── hardware/    # Hardware detection
├── build/       # llama.cpp pipeline
├── models/      # Model discovery & download
├── inference/   # Inference engines
├── server/      # HTTP API server
├── compute/     # KV cache calculator
└── tui/         # Terminal UI
```
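The `compute/` module is described as a KV cache calculator. A back-of-the-envelope sketch of that sizing arithmetic follows; the struct, field names, and model shape here are hypothetical, not taken from LocalForge's code:

```rust
/// Illustrative KV-cache sizing. Per token, the K and V caches each store
/// layers × kv_heads × head_dim elements, so total bytes scale linearly
/// with context length and with the number of concurrent slots.
struct KvCacheSpec {
    n_layers: u64,
    n_kv_heads: u64,
    head_dim: u64,
    bytes_per_elem: u64, // 2 for an f16 KV cache
}

impl KvCacheSpec {
    /// Total KV-cache bytes for `slots` concurrent users, each with `ctx` tokens.
    fn total_bytes(&self, ctx: u64, slots: u64) -> u64 {
        // Factor of 2: one cache for K, one for V.
        2 * self.n_layers * self.n_kv_heads * self.head_dim
            * self.bytes_per_elem * ctx * slots
    }
}

fn main() {
    // Hypothetical 8B-class shape: 32 layers, 8 KV heads (GQA), head_dim 128, f16.
    let spec = KvCacheSpec { n_layers: 32, n_kv_heads: 8, head_dim: 128, bytes_per_elem: 2 };
    let bytes = spec.total_bytes(4096, 8); // 4k context, 8 concurrent slots
    println!("KV cache: {:.2} GiB", bytes as f64 / (1024.0 * 1024.0 * 1024.0));
}
```

With this shape, 8 slots at 4k context consume 4 GiB of VRAM for KV cache alone, on top of the model weights; budgeting that against detected VRAM is the kind of trade-off the advisor manages for 5-10 concurrent users.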
AppState: Central state with expertise level and persistence

```rust
struct AppState {
    expertise: ExpertiseLevel, // Beginner/Intermediate/Expert
    setup_complete: bool,
    build_complete: bool,
    downloaded_models: Vec<String>,
    server_running: bool,
}
```

RetryPolicy: Exponential backoff with jitter
```rust
pub struct RetryPolicy {
    pub max_attempts: u32,
    pub initial_backoff: Duration,
    pub backoff_multiplier: f64,
    pub jitter: bool,
}
```

InferenceEngine: Pluggable inference backend
```rust
#[async_trait]
pub trait InferenceEngine: Send + Sync {
    async fn load_model(&mut self, model: &ModelEntry) -> Result<()>;
    async fn generate(&mut self, session: &ChatSession) -> Result<String>;
}
```

Build, test, and generate docs:

```sh
cargo build --release
cargo test
cargo doc --no-deps --open
```

| Platform | Status | Notes |
|---|---|---|
| Linux x86_64 | ✅ Full | Primary target |
| Linux ARM64 | ✅ Full | Tested on AWS Graviton |
| macOS Intel | ✅ Full | Tested |
| macOS Apple Silicon | ✅ Full | Native Metal support |
| Windows x86_64 | ✅ Full | Native + WSL |
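To illustrate the pluggable-engine design, here is a minimal stand-in engine. This is a simplified synchronous analogue of the async `InferenceEngine` trait shown earlier; `EchoEngine` and the bodies of `ModelEntry` and `ChatSession` are hypothetical, not LocalForge's real types:

```rust
// Simplified stand-ins for the project's model and session types.
struct ModelEntry { name: String }
struct ChatSession { last_user_message: String }

// Synchronous analogue of the async InferenceEngine trait.
trait Engine {
    fn load_model(&mut self, model: &ModelEntry) -> Result<(), String>;
    fn generate(&mut self, session: &ChatSession) -> Result<String, String>;
}

/// Toy engine that echoes the prompt; a real engine would drive llama.cpp.
struct EchoEngine { loaded: Option<String> }

impl Engine for EchoEngine {
    fn load_model(&mut self, model: &ModelEntry) -> Result<(), String> {
        self.loaded = Some(model.name.clone());
        Ok(())
    }
    fn generate(&mut self, session: &ChatSession) -> Result<String, String> {
        let model = self.loaded.as_ref().ok_or("no model loaded")?;
        Ok(format!("[{model}] {}", session.last_user_message))
    }
}

fn main() {
    let mut engine = EchoEngine { loaded: None };
    engine.load_model(&ModelEntry { name: "demo.gguf".into() }).unwrap();
    let reply = engine
        .generate(&ChatSession { last_user_message: "hi".into() })
        .unwrap();
    println!("{reply}"); // [demo.gguf] hi
}
```

Because callers depend only on the trait, a llama.cpp-backed engine and a llama-swap-backed engine can be swapped without touching the chat UI.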
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Run `cargo fmt` and `cargo clippy`
- Submit a PR
- Binary Size: < 12 MB (release, stripped)
- Idle RAM: < 30 MB
- Zero Dependencies: Single static binary
MIT
Built with: