Native Apple Metal GPU inference via mlx-c CGO bindings, implementing the inference.Backend and inference.TextModel interfaces from go-inference on Apple Silicon (M1-M4). Supports the Gemma 3, Gemma 4 (dense and MoE), Qwen 2/3, and Llama 3 architectures, loaded from HuggingFace safetensors directories or GGUF checkpoints, with fused Metal kernels for RMSNorm, RoPE, and scaled dot-product attention, plus KV cache management, LoRA fine-tuning with AdamW, and batch inference.

The root package also exposes an RFC-style direct model API (mlx.LoadModel, model.Generate, model.GenerateStream) and a non-LLM frame-compute API (mlx.NewSession, Session.BeginFrame, Session.FinishFrame, PixelBuffer, KernelRGB565ToRGBA8, KernelNearestScale, KernelScanlineFilter, KernelCRTFilter, KernelSoftenFilter, KernelSharpenFilter) for GPU-accelerated image and emulator workloads. A Python subprocess backend (mlxlm) is available as a CGO-free alternative. The package is platform-restricted to darwin/arm64; a no-op stub compiles on all other platforms.
Module: dappco.re/go/mlx
Licence: EUPL-1.2
Language: Go 1.26
Streaming generation through the go-inference registry:

```go
import (
	"context"
	"fmt"

	"dappco.re/go/inference"
	_ "dappco.re/go/mlx" // registers the "metal" backend via init()
)

model, err := inference.LoadModel("/Volumes/Data/lem/safetensors/gemma-3-1b/")
if err != nil {
	panic(err)
}
defer model.Close()

for tok := range model.Generate(context.Background(), "Hello", inference.WithMaxTokens(256)) {
	fmt.Print(tok.Text)
}
if err := model.Err(); err != nil {
	panic(err)
}
```
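The blank import above works through Go's init-time side effects: importing the package for effect only runs its init(), which registers the backend by name. A minimal sketch of that pattern, using an illustrative `Register` function and `backends` map rather than go-inference's actual internals:

```go
package main

import "fmt"

// backends maps backend names to constructors. In go-inference the real
// registry is internal to the package; these names are illustrative only.
var backends = map[string]func() string{}

// Register is what a backend package's init() would call. The init() runs
// as a side effect of a blank import such as `_ "dappco.re/go/mlx"`.
func Register(name string, ctor func() string) {
	backends[name] = ctor
}

func init() {
	// What the mlx package's init() conceptually does.
	Register("metal", func() string { return "metal backend" })
}

func main() {
	ctor, ok := backends["metal"]
	fmt.Println(ok, ctor())
}
```

Without the blank import, the lookup fails at load time, which is why the import must appear even though no identifier from the package is referenced.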
Direct model API, bypassing the registry:

```go
import (
	"fmt"

	mlx "dappco.re/go/mlx"
)

model, err := mlx.LoadModel("/path/to/model",
	mlx.WithContextLength(8192),
	mlx.WithQuantization(4),
	mlx.WithDevice("gpu"),
)
if err != nil {
	panic(err)
}
defer model.Close()

reply, err := model.Generate("Explain Gemma 4 shared KV layers", mlx.WithMaxTokens(128))
if err != nil {
	panic(err)
}
fmt.Println(reply)
```
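WithQuantization(4) requests 4-bit weights. As a rough illustration of what quantizing to that bit width involves, here is an affine per-group scheme with hypothetical helpers quantize4/dequantize4; the group size, packing, and exact scheme used by the library are not shown here, and the real kernels pack codes into words and run on the GPU:

```go
package main

import "fmt"

// quantize4 maps one group of weights to 4-bit codes in [0, 15] using a
// per-group scale and minimum (an affine scheme). Illustrative only.
func quantize4(group []float32) (codes []uint8, scale, min float32) {
	min, max := group[0], group[0]
	for _, v := range group {
		if v < min {
			min = v
		}
		if v > max {
			max = v
		}
	}
	scale = (max - min) / 15
	if scale == 0 {
		scale = 1
	}
	codes = make([]uint8, len(group))
	for i, v := range group {
		q := int((v-min)/scale + 0.5) // round to nearest code
		if q > 15 {
			q = 15
		}
		codes[i] = uint8(q)
	}
	return codes, scale, min
}

// dequantize4 reconstructs approximate weights from the codes.
func dequantize4(codes []uint8, scale, min float32) []float32 {
	out := make([]float32, len(codes))
	for i, q := range codes {
		out[i] = float32(q)*scale + min
	}
	return out
}

func main() {
	group := []float32{-1, -0.5, 0, 0.5, 1}
	codes, scale, min := quantize4(group)
	fmt.Println(codes, dequantize4(codes, scale, min))
}
```

At 4 bits per weight plus a small per-group scale and offset, storage drops to roughly a quarter of float16, at the cost of bounded reconstruction error per group.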
A frame-compute session for image and emulator workloads:

```go
import mlx "dappco.re/go/mlx"

session, err := mlx.NewSession(mlx.WithSessionLabel("frame-pipeline"))
if err != nil {
	panic(err)
}
defer session.Close()

// 320x224 RGB565 source (2 bytes per pixel, stride 640).
src, err := session.NewPixelBuffer(mlx.PixelBufferDesc{
	Width:  320,
	Height: 224,
	Stride: 640,
	Format: mlx.PixelRGB565,
})
if err != nil {
	panic(err)
}

// RGBA8 intermediate at the same resolution (4 bytes per pixel).
rgba, err := session.NewPixelBuffer(mlx.PixelBufferDesc{
	Width:  320,
	Height: 224,
	Stride: 1280,
	Format: mlx.PixelRGBA8,
})
if err != nil {
	panic(err)
}

// 3x upscale target.
scaled, err := session.NewPixelBuffer(mlx.PixelBufferDesc{
	Width:  960,
	Height: 672,
	Stride: 3840,
	Format: mlx.PixelRGBA8,
})
if err != nil {
	panic(err)
}

frameBytes := make([]byte, src.Descriptor().SizeBytes())
if err := src.Upload(frameBytes); err != nil {
	panic(err)
}

if err := session.BeginFrame(); err != nil {
	panic(err)
}
if err := session.Run(mlx.KernelRGB565ToRGBA8, mlx.KernelArgs{
	Inputs:  map[string]mlx.Buffer{"src": src},
	Outputs: map[string]mlx.Buffer{"dst": rgba},
}); err != nil {
	panic(err)
}
if err := session.Run(mlx.KernelNearestScale, mlx.KernelArgs{
	Inputs:  map[string]mlx.Buffer{"src": rgba},
	Outputs: map[string]mlx.Buffer{"dst": scaled},
}); err != nil {
	panic(err)
}
if err := session.Run(mlx.KernelScanlineFilter, mlx.KernelArgs{
	Inputs:  map[string]mlx.Buffer{"src": scaled},
	Outputs: map[string]mlx.Buffer{"dst": scaled},
	Scalars: map[string]float64{"strength": 0.3},
}); err != nil {
	panic(err)
}
frameMetrics, err := session.FinishFrame()
if err != nil {
	panic(err)
}

finalFrame, err := scaled.Read()
if err != nil {
	panic(err)
}
_ = finalFrame
_ = frameMetrics
```
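As a CPU reference, the expansion performed by KernelRGB565ToRGBA8 can be written in a few lines. The 565 byte order (little-endian) and the bit-replication expansion below are assumptions for illustration, not the Metal kernel's actual implementation:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// rgb565ToRGBA8 is a CPU sketch of the RGB565 -> RGBA8 conversion: each
// little-endian 16-bit pixel (5 bits red, 6 green, 5 blue) expands to
// 8-bit channels by bit replication, with alpha forced to 255.
func rgb565ToRGBA8(src []byte) []byte {
	dst := make([]byte, len(src)/2*4)
	for i := 0; i+1 < len(src); i += 2 {
		p := binary.LittleEndian.Uint16(src[i:])
		r := uint8(p >> 11 & 0x1f)
		g := uint8(p >> 5 & 0x3f)
		b := uint8(p & 0x1f)
		o := i * 2
		dst[o+0] = r<<3 | r>>2 // replicate top bits into the low bits
		dst[o+1] = g<<2 | g>>4
		dst[o+2] = b<<3 | b>>2
		dst[o+3] = 0xff
	}
	return dst
}

func main() {
	// One white pixel (0xFFFF) and one pure-red pixel (0xF800).
	src := []byte{0xff, 0xff, 0x00, 0xf8}
	fmt.Printf("% x\n", rgb565ToRGBA8(src))
}
```

Bit replication (r<<3 | r>>2) maps the 5-bit maximum 0x1f to exactly 0xff, so full white in 565 stays full white in RGBA8.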
go-mlx is positioned as a Go-native, research-grade model runner, not just an inference engine. The root package exposes the full training and operations pipeline so harnesses can stop reaching for Python mlx-lm:
| Feature | Function | What it does |
|---|---|---|
| LoRA fine-tuning | `mlx.ApplyLoRA` + `mlx.NewAdamW` | Low-rank adaptation training with AdamW, mixed precision, gradient checkpointing |
| LoRA fusion | `mlx.FuseLoRAIntoModelPack(ctx, opts)` | Bake a trained LoRA adapter into the base model as a fresh safetensors pack |
| Knowledge distillation | `mlx.RunKnowledgeDistillation(ctx, runner, dataset, cfg)` | KL or soft-CE loss against a teacher's logits, with checkpoint resumption |
| GRPO | `mlx.RunGRPOReasoningTraining(ctx, runner, dataset, cfg)` | Group-relative policy optimisation with reward functions and reference KL |
| Eval | `mlx.RunModelEval(ctx, model, dataset, cfg)` | Dataset-native perplexity plus pluggable quality probes |
| Model merge | `mlx.MergeModelPacks(ctx, opts)` | Linear / SLERP / TIES / DARE merging of multiple model packs with provenance |
| GGUF quantise | `mlx.QuantizeModelPackToGGUF(ctx, opts)` | Native Go safetensors → GGUF Q8_0 / Q4_0 / Q4_K_M |
| KV snapshot | `snapshot.Save(path)` / `mlx.LoadKVSnapshot(path)` | Portable binary KV cache (Float32 or Q8 symmetric int8) for session restore |
| HF fit | `mlx.PlanHFModelFits(ctx, cfg)` | HuggingFace Hub metadata search to plan what fits on local hardware |
| Attention probe | `inference.AttentionInspector` adapter | Extract post-RoPE K vectors per head per layer for analysis |
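The Q8 symmetric int8 option in the KV snapshot row can be sketched in a few lines. This is an illustrative per-tensor scheme, not the package's actual on-disk format:

```go
package main

import (
	"fmt"
	"math"
)

// quantizeQ8 performs symmetric int8 quantization: one scale per tensor,
// codes in [-127, 127], and zero maps exactly to code zero. The snapshot
// format's real layout and granularity are defined by the package.
func quantizeQ8(x []float32) (codes []int8, scale float32) {
	var amax float32
	for _, v := range x {
		if a := float32(math.Abs(float64(v))); a > amax {
			amax = a
		}
	}
	scale = amax / 127
	if scale == 0 {
		scale = 1
	}
	codes = make([]int8, len(x))
	for i, v := range x {
		codes[i] = int8(math.RoundToEven(float64(v / scale)))
	}
	return codes, scale
}

func main() {
	codes, scale := quantizeQ8([]float32{-2, -1, 0, 1, 2})
	fmt.Println(codes, scale)
}
```

A symmetric scheme maps zero to code zero exactly, so zero-valued cache entries survive the round trip losslessly, which an asymmetric (affine) scheme does not guarantee.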
See docs/ and examples/ for the full surface.
- Compute Guide — frame-oriented Metal compute sessions, pixel buffers, kernels, metrics
- Architecture — CGO binding, model architectures, weight loading, KV cache, attention, batch inference, LoRA training, mlxlm backend
- Models — model loading, supported architectures, tokenisation, chat templates
- Training — LoRA fine-tuning, AdamW, gradient computation, checkpoints, fusion
- Distillation — knowledge distillation (KL, soft cross-entropy)
- GRPO — group-relative policy optimisation for RL
- Eval — dataset-native perplexity, quality probes, eval reports
- Model Operations — merge, GGUF quantise, KV snapshot, HF fit
- Development Guide — prerequisites (mlx-c CMake build), CGO flags, test patterns, benchmarks
- Project History — completed phases, commit hashes, known limitations
- Examples — runnable usage examples organised by type
```sh
git submodule update --init --recursive
go generate ./...   # builds the mlx-c C library (required first time)
go test ./...
go build ./...
```

European Union Public Licence 1.2 — see LICENCE for details.