

ZigFormer


An educational transformer-based LLM in pure Zig


ZigFormer is a fully functional implementation of a transformer-based large language model (LLM) written in the Zig programming language. It aims to provide a clean, easy-to-understand LLM implementation with no large dependencies like PyTorch or TensorFlow. ZigFormer was made mainly for learning how a conventional transformer-based LLM works under the hood. It is inspired by Andrej Karpathy's nanoGPT and nanochat projects and follows the architecture described in the "Attention Is All You Need" and "Language Models are Unsupervised Multitask Learners" papers. It can be used as a Zig library for building LLMs or as a standalone application for training, inference, and chatting with the model.

The diagrams below show the high-level architecture and its core components.

ZigFormer Architecture

ZigFormer Workflow

Features

  • Implements core transformer architecture with multi-head self-attention
  • Supports both pretraining and instruction fine-tuning
  • Provides multiple decoding strategies (such as greedy decoding, beam search, and top-k sampling)
  • Includes a CLI for training and inference
  • Has a web-based UI for chatting with the model
  • Supports model checkpointing and the use of configuration files

See ROADMAP.md for the list of implemented and planned features.

Important

ZigFormer is in early development, so bugs and breaking changes are expected. Please use the issues page to report bugs or request features.


Getting Started

You can get started with ZigFormer by following the steps below.

Installation

git clone https://github.com/CogitatorTech/zigformer.git
cd zigformer
zig build

Important

ZigFormer is developed and tested with Zig 0.15.2. It may work with newer versions, but this is not guaranteed.
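
To see all the build steps and options exposed by the build script (including the run and run-gui steps used below), you can ask the build system for help:

zig build --help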

Training a Model

zig build run -- --save-model model.bin

This will:

  1. Load the training datasets from datasets/simple_dataset/
  2. Build a vocabulary of tokens from the data
  3. Pretrain the model on the pretraining examples (raw text)
  4. Train (or fine-tune) the model on question-answer pairs
  5. Save the trained model to model.bin

Training parameters can be provided through a configuration file or CLI arguments. For example, to use a configuration file:

zig build run -- --config my_config.json

Sample CLI Configuration:

{
    "pretrain_path": "datasets/simple_dataset/pretrain.json",
    "train_path": "datasets/simple_dataset/train.json",
    "pre_epochs": 10,
    "chat_epochs": 10,
    "batch_size": 32,
    "accumulation_steps": 1,
    "pre_lr": 0.0005,
    "chat_lr": 0.0001,
    "save_model_path": "model.bin",
    "interactive": true
}
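
If memory is limited, gradient accumulation can keep the effective batch size while lowering the per-step batch size. The sketch below assumes (as is conventional) that accumulation_steps accumulates gradients over that many batches before each optimizer update, so the effective batch size is batch_size × accumulation_steps (here 8 × 4 = 32, matching the sample above):

{
    "pretrain_path": "datasets/simple_dataset/pretrain.json",
    "train_path": "datasets/simple_dataset/train.json",
    "pre_epochs": 10,
    "chat_epochs": 10,
    "batch_size": 8,
    "accumulation_steps": 4,
    "pre_lr": 0.0005,
    "chat_lr": 0.0001,
    "save_model_path": "model.bin",
    "interactive": false
}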

Important

A saved model can only be loaded with the same model configuration it was trained with.

Using the Web UI

You can run the web-based UI to chat with the trained model:

zig build run-gui -- --load-model model.bin

The UI can be accessed at http://localhost:8085 by default.
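
Once the server is running, you can check that it responds by fetching the root page (assuming the UI is served at the root path of the address above):

curl -s http://localhost:8085/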

You can also provide a configuration file for the UI:

zig build run-gui -- --config gui_config.json

Sample Web UI Configuration:

{
    "port": 8085,
    "host": "0.0.0.0",
    "pretrain_path": "datasets/simple_dataset/pretrain.json",
    "train_path": "datasets/simple_dataset/train.json",
    "load_model_path": "model.bin",
    "max_request_size": 1048576,
    "max_prompt_length": 1000,
    "timeout_seconds": 30
}
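
In the sample above, max_request_size is likely given in bytes (1048576 bytes = 1 MiB). For a machine-local setup, a sketch that keeps the same keys but binds only to the loopback interface:

{
    "port": 8085,
    "host": "127.0.0.1",
    "pretrain_path": "datasets/simple_dataset/pretrain.json",
    "train_path": "datasets/simple_dataset/train.json",
    "load_model_path": "model.bin",
    "max_request_size": 1048576,
    "max_prompt_length": 1000,
    "timeout_seconds": 30
}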

Available Options (CLI and Web UI)

zig build run -- --help
zig build run -- predict --help
zig build run-gui -- --help

Example Usage

# Train the model (using the default dataset and saving the weights to 'model.bin')
zig build run -- --save-model model.bin
# Generate coherent text (using beam search with a beam width of 5)
zig build run -- predict --prompt "How do mountains form?" --beam-width 5
# Generate more diverse text (using top-k sampling with k=5)
zig build run -- predict --prompt "How do mountains form?" --top-k 5 --load-model model.bin
# Launch the web UI server and chat with the trained model on http://localhost:8085
zig build run-gui -- --load-model model.bin

ZigFormer Web UI


Documentation

You can find the full API documentation for the latest release of ZigFormer here.


Contributing

Contributions are always welcome! See CONTRIBUTING.md for details on how to make a contribution.

License

ZigFormer is licensed under the MIT License (see LICENSE).

Acknowledgements

  • The logo is from SVG Repo with some modifications.
  • This project uses the Chilli CLI framework.
