We build open-source tools for running large language models locally — on servers, desktops, mobile devices, and bare metal. No cloud. No vendor lock-in. Every language, every device.
pip install mullama && mullama run llama3.2:1b "Hello, local AI"
Cloud LLM APIs charge per token, require sending your data to third parties, and break when the internet does. We think inference should be a local capability — something your software has, not something it calls.
We cover the full stack. Most projects solve one layer. We solve five — from a unikernel that boots straight into inference, to a Flutter plugin that runs models in your pocket.
We embed, not just serve. Load a model inside your Python, Rust, Go, PHP, Node.js, Dart, or C app and call it like a function. No HTTP. No sidecar. No daemon. Zero overhead. Or spin up an OpenAI-compatible server when you actually need one.
We're polyglot by design. Six native language bindings, not wrappers around REST. PHP developers get the same first-class inference as Rust developers. That's rare — and intentional.
| Project | Strength | |
|---|---|---|
| mullama | Local LLM server & library | Drop-in Ollama replacement. 6 native language bindings (Python, Node.js, Go, PHP, Rust, C/C++). OpenAI + Anthropic API compatible. Embed in-process or run as a server — your choice. |
| llamafu | Mobile inference (Flutter) | Run LLMs directly on iOS and Android via FFI. Vision, tool calling, streaming, LoRA hot-swapping. No cloud, no latency, complete privacy. Works offline. |
| unillm | Inference runtime (Rust) | 47 model architectures through three composable abstractions. Hybrid KV cache (RadixAttention + PagedAttention). Continuous batching. Loads SafeTensors, GGUF, and PyTorch weights. |
| cllm | Bare-metal unikernel (C) | No operating system. Boots directly on x86 hardware, initializes a NIC driver, and serves inference over HTTP. The kernel is the application. |
| zigllm | Educational LLM (Zig) | Learn by building. 18 model families, 285+ tests, 6 progressive layers from raw tensors to text generation. Every component documented to teach why, not just how. |
47 model architectures — LLaMA, Qwen, Gemma, Phi, DeepSeek, Mistral, GPT-2, Whisper, BERT, Mamba, and 37 more. All sharing a unified Model trait.
6 native bindings — Python, Node.js, Go, PHP, Rust, C/C++. Call model.generate() directly. No HTTP serialization, no connection pooling, no JSON parsing.
7 GPU backends — CUDA, Metal, ROCm, OpenCL, Vulkan, SYCL, RPC. Run on whatever hardware you have.
18+ quantization formats — K-quant, IQ-quant, up to 95% memory reduction. Run 7B models on a phone.
Multimodal — Text, image, and real-time audio with voice activity detection. Vision models on mobile via llamafu.
Full Ollama compatibility — Same CLI commands, same Modelfile format, same model registry. Migrate in minutes.
# Local LLM server + library
pip install mullama && mullama run llama3.2:1b "Explain gravity"
# Mobile (Flutter)
flutter pub add llamafu
# Rust runtime
cargo add mullama
# Build an LLM from scratch
git clone https://github.com/cognisoc/zigllm && cd zigllm && zig build test
# Boot a unikernel
git clone https://github.com/cognisoc/cllm && cd cllm && make runEvery project is MIT or Apache-2.0. We especially welcome: