Skip to content
@cognisoc

Cognisoc

LLM inference. Everywhere.

LLM inference. Everywhere.

We build open-source tools for running large language models locally — on servers, desktops, mobile devices, and bare metal. No cloud. No vendor lock-in. Every language, every device.

pip install mullama && mullama run llama3.2:1b "Hello, local AI"

Why we exist

Cloud LLM APIs charge per token, require sending your data to third parties, and break when the internet does. We think inference should be a local capability — something your software has, not something it calls.

We cover the full stack. Most projects solve one layer. We solve five — from a unikernel that boots straight into inference, to a Flutter plugin that runs models in your pocket.

We embed, not just serve. Load a model inside your Python, Rust, Go, PHP, Node.js, Dart, or C app and call it like a function. No HTTP. No sidecar. No daemon. Zero overhead. Or spin up an OpenAI-compatible server when you actually need one.

We're polyglot by design. Six native language bindings, not wrappers around REST. PHP developers get the same first-class inference as Rust developers. That's rare — and intentional.


The Stack

Project Strength
mullama Local LLM server & library Drop-in Ollama replacement. 6 native language bindings (Python, Node.js, Go, PHP, Rust, C/C++). OpenAI + Anthropic API compatible. Embed in-process or run as a server — your choice.
llamafu Mobile inference (Flutter) Run LLMs directly on iOS and Android via FFI. Vision, tool calling, streaming, LoRA hot-swapping. No cloud, no latency, complete privacy. Works offline.
unillm Inference runtime (Rust) 47 model architectures through three composable abstractions. Hybrid KV cache (RadixAttention + PagedAttention). Continuous batching. Loads SafeTensors, GGUF, and PyTorch weights.
cllm Bare-metal unikernel (C) No operating system. Boots directly on x86 hardware, initializes a NIC driver, and serves inference over HTTP. The kernel is the application.
zigllm Educational LLM (Zig) Learn by building. 18 model families, 285+ tests, 6 progressive layers from raw tensors to text generation. Every component documented to teach why, not just how.

What sets us apart

47 model architectures — LLaMA, Qwen, Gemma, Phi, DeepSeek, Mistral, GPT-2, Whisper, BERT, Mamba, and 37 more. All sharing a unified Model trait.

6 native bindings — Python, Node.js, Go, PHP, Rust, C/C++. Call model.generate() directly. No HTTP serialization, no connection pooling, no JSON parsing.

7 GPU backends — CUDA, Metal, ROCm, OpenCL, Vulkan, SYCL, RPC. Run on whatever hardware you have.

18+ quantization formats — K-quant, IQ-quant, up to 95% memory reduction. Run 7B models on a phone.

Multimodal — Text, image, and real-time audio with voice activity detection. Vision models on mobile via llamafu.

Full Ollama compatibility — Same CLI commands, same Modelfile format, same model registry. Migrate in minutes.


Get started

# Local LLM server + library
pip install mullama && mullama run llama3.2:1b "Explain gravity"

# Mobile (Flutter)
flutter pub add llamafu

# Rust runtime
cargo add mullama

# Build an LLM from scratch
git clone https://github.com/cognisoc/zigllm && cd zigllm && zig build test

# Boot a unikernel
git clone https://github.com/cognisoc/cllm && cd cllm && make run

Contributing

Every project is MIT or Apache-2.0. We especially welcome:

  • New model architectures for unillm
  • Language bindings and platform support for mullama
  • Mobile optimizations for llamafu
  • Educational content and tests for zigllm
  • Driver and inference work for cllm

cognisoc.com

Pinned Loading

  1. mullama mullama Public

    Run any LLM locally. Use it from any language. Deploy anywhere.

    Python 1

  2. llamafu llamafu Public

    Run AI models directly on mobile devices. No cloud. No latency. Complete privacy.

    Dart 1 1

  3. zigllm zigllm Public

    Learn how LLMs work by building one in Zig -- from tensors to text generation.

    Zig 5

  4. cllm cllm Public

    A bare-metal C unikernel for serving large language models -- no OS, no overhead.

    C

  5. unillm unillm Public

    A modular LLM inference runtime written in Rust.

    Rust

Repositories

Showing 7 of 7 repositories

People

This organization has no public members. You must be a member to see who’s a part of this organization.

Top languages

Loading…

Most used topics

Loading…