

ZigFormer


An educational transformer-based LLM in pure Zig


ZigFormer is a fully functional implementation of a transformer-based large language model (LLM) written in the Zig programming language. It aims to provide a clean, easy-to-understand LLM implementation with no large dependencies like PyTorch or TensorFlow. ZigFormer was made mainly for learning how a conventional transformer-based LLM works under the hood. It is inspired by Andrej Karpathy's nanoGPT and nanochat projects and follows the architecture described in the "Attention Is All You Need" and "Language Models are Unsupervised Multitask Learners" papers. It can be used as a Zig library for building LLMs or as a standalone application for training, inference, and chatting with the model.

The diagrams below show the high-level architecture and its core components.

ZigFormer Architecture

ZigFormer Workflow

Features

  • Implements core transformer architecture with multi-head self-attention
  • Supports both pretraining and instruction fine-tuning
  • Provides multiple decoding strategies (such as greedy decoding, beam search, and top-k sampling)
  • Includes a CLI for training and inference
  • Has a web-based UI for chatting with the model
  • Supports model checkpointing and the use of configuration files

See ROADMAP.md for the list of implemented and planned features.

Important

ZigFormer is in early development, so bugs and breaking changes are expected. Please use the issues page to report bugs or request features.


Getting Started

You can get started with ZigFormer by following the steps below.

Installation

git clone https://github.com/CogitatorTech/zigformer.git
cd zigformer
zig build

Important

ZigFormer is developed and tested with Zig 0.15.2. It may work with newer versions, but this is not guaranteed.
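
To see all the build steps and options exposed by the build script (including the run and run-gui steps used below), you can ask the build system for help:

zig build --help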

Training a Model

zig build run -- --save-model model.bin

This will:

  1. Load the training datasets from datasets/simple_dataset/
  2. Build a vocabulary of tokens from the data
  3. Pretrain the model on the pretraining examples (raw text)
  4. Train (or fine-tune) the model on question-answer pairs
  5. Save the trained model to model.bin

Training parameters can be provided through a configuration file or CLI arguments. For example, to use a configuration file:

zig build run -- --config my_config.json

Sample CLI Configuration:

{
    "pretrain_path": "datasets/simple_dataset/pretrain.json",
    "train_path": "datasets/simple_dataset/train.json",
    "pre_epochs": 10,
    "chat_epochs": 10,
    "batch_size": 32,
    "accumulation_steps": 1,
    "pre_lr": 0.0005,
    "chat_lr": 0.0001,
    "save_model_path": "model.bin",
    "interactive": true
}
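
If memory is limited, gradient accumulation can keep the effective batch size while lowering the per-step batch size. The sketch below assumes (as is conventional) that accumulation_steps accumulates gradients over that many batches before each optimizer update, so the effective batch size is batch_size × accumulation_steps (here 8 × 4 = 32, matching the sample above):

{
    "pretrain_path": "datasets/simple_dataset/pretrain.json",
    "train_path": "datasets/simple_dataset/train.json",
    "pre_epochs": 10,
    "chat_epochs": 10,
    "batch_size": 8,
    "accumulation_steps": 4,
    "pre_lr": 0.0005,
    "chat_lr": 0.0001,
    "save_model_path": "model.bin",
    "interactive": false
}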

Important

A saved model can only be loaded with the same model configuration it was trained with.

Using the Web UI

You can run the web-based UI to chat with the trained model:

zig build run-gui -- --load-model model.bin

The UI can be accessed at http://localhost:8085 by default.
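
Once the server is running, you can check that it responds by fetching the root page (assuming the UI is served at the root path of the address above):

curl -s http://localhost:8085/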

You can also provide a configuration file for the UI:

zig build run-gui -- --config gui_config.json

Sample Web UI Configuration:

{
    "port": 8085,
    "host": "0.0.0.0",
    "pretrain_path": "datasets/simple_dataset/pretrain.json",
    "train_path": "datasets/simple_dataset/train.json",
    "load_model_path": "model.bin",
    "max_request_size": 1048576,
    "max_prompt_length": 1000,
    "timeout_seconds": 30
}
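
In the sample above, max_request_size is likely given in bytes (1048576 bytes = 1 MiB). For a machine-local setup, a sketch that keeps the same keys but binds only to the loopback interface:

{
    "port": 8085,
    "host": "127.0.0.1",
    "pretrain_path": "datasets/simple_dataset/pretrain.json",
    "train_path": "datasets/simple_dataset/train.json",
    "load_model_path": "model.bin",
    "max_request_size": 1048576,
    "max_prompt_length": 1000,
    "timeout_seconds": 30
}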

Available Options (CLI and Web UI)

zig build run -- --help
zig build run -- predict --help
zig build run-gui -- --help

Example Usage

# Train the model (using the default dataset and saving the weights to 'model.bin')
zig build run -- --save-model model.bin
# Generate coherent text (using beam search with a beam width of 5)
zig build run -- predict --prompt "How do mountains form?" --beam-width 5
# Generate more diverse text (using top-k sampling with k=5)
zig build run -- predict --prompt "How do mountains form?" --top-k 5 --load-model model.bin
# Launch the web UI server and chat with the trained model on http://localhost:8085
zig build run-gui -- --load-model model.bin

ZigFormer Web UI


Documentation

You can find the full API documentation for the latest release of ZigFormer here.


Contributing

Contributions are always welcome! See CONTRIBUTING.md for details on how to make a contribution.

License

ZigFormer is licensed under the MIT License (see LICENSE).

Acknowledgements

  • The logo is from SVG Repo with some modifications.
  • This project uses the Chilli CLI framework.
