Cognisoc

LLM inference. Everywhere.

We build open-source tools for running large language models locally — on servers, desktops, mobile devices, and bare metal. No cloud. No vendor lock-in. Every language, every device.

pip install mullama && mullama run llama3.2:1b "Hello, local AI"

Why we exist

Cloud LLM APIs charge per token, require sending your data to third parties, and break when the internet does. We think inference should be a local capability — something your software has, not something it calls.

We cover the full stack. Most projects solve one layer. We solve five — from a unikernel that boots straight into inference, to a Flutter plugin that runs models in your pocket.

We embed, not just serve. Load a model inside your Python, Rust, Go, PHP, Node.js, Dart, or C app and call it like a function. No HTTP. No sidecar. No daemon. Zero overhead. Or spin up an OpenAI-compatible server when you actually need one.

We're polyglot by design. Six native language bindings, not wrappers around REST. PHP developers get the same first-class inference as Rust developers. That's rare — and intentional.

The Stack

	Project	Strength
mullama	Local LLM server & library	Drop-in Ollama replacement. 6 native language bindings (Python, Node.js, Go, PHP, Rust, C/C++). OpenAI + Anthropic API compatible. Embed in-process or run as a server — your choice.
llamafu	Mobile inference (Flutter)	Run LLMs directly on iOS and Android via FFI. Vision, tool calling, streaming, LoRA hot-swapping. No cloud, no latency, complete privacy. Works offline.
unillm	Inference runtime (Rust)	47 model architectures through three composable abstractions. Hybrid KV cache (RadixAttention + PagedAttention). Continuous batching. Loads SafeTensors, GGUF, and PyTorch weights.
cllm	Bare-metal unikernel (C)	No operating system. Boots directly on x86 hardware, initializes a NIC driver, and serves inference over HTTP. The kernel is the application.
zigllm	Educational LLM (Zig)	Learn by building. 18 model families, 285+ tests, 6 progressive layers from raw tensors to text generation. Every component documented to teach why, not just how.

What sets us apart

47 model architectures — LLaMA, Qwen, Gemma, Phi, DeepSeek, Mistral, GPT-2, Whisper, BERT, Mamba, and 37 more. All sharing a unified Model trait.

6 native bindings — Python, Node.js, Go, PHP, Rust, C/C++. Call model.generate() directly. No HTTP serialization, no connection pooling, no JSON parsing.

7 GPU backends — CUDA, Metal, ROCm, OpenCL, Vulkan, SYCL, RPC. Run on whatever hardware you have.

18+ quantization formats — K-quant, IQ-quant, up to 95% memory reduction. Run 7B models on a phone.

Multimodal — Text, image, and real-time audio with voice activity detection. Vision models on mobile via llamafu.

Full Ollama compatibility — Same CLI commands, same Modelfile format, same model registry. Migrate in minutes.

Get started

# Local LLM server + library
pip install mullama && mullama run llama3.2:1b "Explain gravity"

# Mobile (Flutter)
flutter pub add llamafu

# Rust runtime
cargo add mullama

# Build an LLM from scratch
git clone https://github.com/cognisoc/zigllm && cd zigllm && zig build test

# Boot a unikernel
git clone https://github.com/cognisoc/cllm && cd cllm && make run

Contributing

Every project is MIT or Apache-2.0. We especially welcome:

New model architectures for unillm
Language bindings and platform support for mullama
Mobile optimizations for llamafu
Educational content and tests for zigllm
Driver and inference work for cllm

cognisoc.com

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cognisoc

LLM inference. Everywhere.

Why we exist

The Stack

What sets us apart

Get started

Contributing

Pinned Loading

Repositories

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

People

Top languages

Uh oh!

Most used topics

Uh oh!