This project started as a C port of Andrej Karpathy's microGPT.py — a ~200 line Python GPT that trains a character-level Transformer from scratch. We rewrote it in pure C99 with zero dependencies, and as you'd expect from C, it's much faster.
Then we asked a bigger question: can tiny models actually be intelligent?
Not by making them bigger — the industry already does that. Instead, by making them work together. We took the same ~460K parameter engine and trained it on different tasks: one becomes a planner, another becomes a player, another becomes a judge. Each one starts as the same blank "stem cell" and differentiates based on its training data.
We call them organelles — like the specialised structures inside a biological cell.
The result surprised us. A single organelle playing Connect-4 wins about 55% of the time. But when a planner and player coordinate through a shared protocol, the system hits 90% — even though the individual models are still wrong half the time. The pipeline catches the mistakes. The coordination is the intelligence.
We've now tested this across 11 logic games, from Tic-Tac-Toe to Sudoku, with models ranging from 30K to 460K parameters. The pattern holds: right-sized specialists working together consistently outperform a single larger model working alone.
Then we asked: does it work on real-world data?
We ran a lottery prediction experiment as a negative control for organelle intelligence. The lottery model hit an entropy floor at 0.50 loss — it learned nothing, because lottery draws are random. This protects the engine's integrity: the 78–91% accuracy on games such as Mastermind and Connect-4 reflects the model genuinely learning underlying rules, not some hidden flaw in the training engine.
We also explored applying OPA to continuous-valued domains like financial time-series. This revealed a fundamental insight: the 31-character vocabulary that makes game coordination reliable destroys the continuous gradients that prediction requires — what we call the "Discretisation Wall." Bridging categorical reasoning with numerical sensing is an active research direction.
Same engine. Same architecture. One learns game patterns, one hits an entropy floor on random data, and one maps the boundary between pattern matching and temporal prediction. That's three kinds of proof.
The full research journey — from character-level Transformer to VM-based code generation through the calibrated three-bound scaling-curve closure — is documented in Composable Intelligence at the Edge (21 chapters + appendix, online version).
Honest-claim note (May 2026): the project's headline numbers were re-audited after the closing scaling-curve experiment caught a curator-self-overlap leakage incident. The restated calibrated claim — ~75–80% retrieval on novel-paraphrase tests in distinctive-noun domains, three documented structural bounds (curator-, model-, domain-bounded), audit infrastructure baked in via `tools/scaling_leakage_audit.sh` — replaces earlier inflated retrieval claims. See `docs/research/ORGANELLE_STATE.md` for the synthesis and `docs/engineering/CLEAN_ROOM_IMPLEMENTATION/RESEARCH_DISCLOSURE.md` for the regulator-friendly disclosure register. This repository is research-only; the productisation strategy and per-vertical implementation plans were migrated to a private companion repo (organelles.bio) on 2026-05-01 — see `docs/MIGRATED_TO_ORGANELLES_BIO.md` for the index.
git clone https://github.com/enjector/microgpt-c.git
cd microgpt-c
mkdir build && cd build
cmake ..
cmake --build . --config Release
# Train a name generator in < 1 second (4K params)
./names_demo
# Train Shakespeare text generation (840K params, character-level)
./shakespeare_demo
# Train Shakespeare word-level generation (510K params, ~40K tok/s inference, 2 min training)
./shakespeare_word_demo
# Generate infinite Word-Level Shakespeare using Memory Sparse Attention (MSA)
./msa_infinite_shakespeare
# Generate word-level Shakespeare with TurboQuant 4-bit memory compression
cd demos/turbo_quant
../../build/tq_shakespeare_tq
# Run a multi-organelle game pipeline (88% win rate)
./connect4_demo

All 11 game experiments, the lottery negative control, 3 pretrained checkpoints, 97 unit tests, and 22 benchmarks are included. See the full list in demos/character-level/.
All benchmarks on Apple M2 Max (dev machine), single-threaded unless noted. Models are 360KB–5.4MB and compile anywhere with a C99 compiler. Edge device testing is a future research stage. See PERFORMANCE for full details.
| Engine | Params | Training | Inference | Notes |
|---|---|---|---|---|
| Character-level (Shakespeare) | 841K | 28K tok/s | 16K tok/s | 14 min, 12 threads |
| Word-level (Shakespeare) | 510K | 12.5K tok/s | 40K tok/s | 2 min, 12 threads |
| VM engine (dispatch) | — | — | 3.7–5.8M ops/s | Single-threaded |
| Micro-benchmark (tiny model) | 6.5K | 642K tok/s | 1.55M infer/s | Float32, 1 thread |
| SSD ensemble (5-vote, prefix cache) | 6.5K | — | 1.9× faster | vs old ensemble (arXiv:2603.03251) |
vs. Karpathy's microGPT.py: ~1,000× faster training, ~700× faster inference (expected for C vs Python; the real contribution is the orchestration layer).
All games: trained organelle vs random opponent, 100 evaluation games each. Full details in RESEARCH_ORGANELLE_GAMES.
| Game | Organelles | Params | Size | Total | Training | Result |
|---|---|---|---|---|---|---|
| Pentago | 2 | 92K | 1.1 MB | 2.2 MB | ~9 min | 91% win |
| 8-Puzzle | 5 | 460K | 5.4 MB | 27 MB | ~7 min | 90% solve |
| Connect-4 | 2 | 460K | 5.4 MB | 10.8 MB | ~21 min | 88% win |
| Tic-Tac-Toe | 2 | 460K | 5.4 MB | 10.8 MB | ~17 min | 87% w+d |
| Mastermind | 2 | 92K | 1.1 MB | 2.2 MB | ~8 min | 79% solve |
| Sudoku | 2 | 160K | 1.9 MB | 3.8 MB | ~3 min | 78% solve |
| Othello | 2 | 92K | 1.1 MB | 2.2 MB | ~8 min | 67% win |
| Klotski | 2 | 30K | 360 KB | 720 KB | ~36 sec | 62% solve |
| Hex | 2 | 92K | 1.1 MB | 2.2 MB | ~3 min | 27% win |
| Red Donkey | 2 | 30K | 360 KB | 720 KB | ~38 sec | 19% solve |
| Lights Out | 2 | 160K | 1.9 MB | 3.8 MB | ~4 min | 10% solve |
| Experiment | Organelles | Params | Size | Training | Result | Interpretation |
|---|---|---|---|---|---|---|
| Lottery | 2 | 163K | 1.9 MB | ~5 min | Entropy floor | Negative control ✓ |
Key technical contributions shipped in this engine:
| Innovation | Description | Evidence |
|---|---|---|
| 🧬 Organelle Pipeline Architecture | Composable specialist micro-models coordinated by deterministic C scaffolding | 11 games, 91% win (Pentago) to 90% solve (8-Puzzle) |
| 💾 Memory Sparse Attention (MSA) | Infinite sequence lengths routed via a sparse attention memory pool | `msa_infinite_shakespeare` demo |
| 🗜️ TurboQuant Memory Compression | 4-bit dual-state quantisation (MSE codebooks + 1-bit QJL residuals). 8× memory reduction with +25% generation speedup, validated at 1.3M+ encodes/sec | `tq_shakespeare_tq` demo |
| 💪 TinyLlama-Class Resiliency | Zero NaN instability. SwiGLU, RMSNorm, grouped-query attention, and decoupled weight decay, rigorously audited against PyTorch output logits | Zero invalid moves across all 11 games |
| ⚡ Prefix KV Cache Sharing | Prompt processed once, KV state copied per ensemble vote — eliminates redundant inference | 1.9–5.7× ensemble speedup (arXiv:2603.03251) |
| 🔮 Speculative Decoding | Draft organelle generates candidates, target verifies with KV rollback on rejection | Functional with acceptance statistics tracking |
| 🧠 Neural Algorithmic Reasoning | Deterministic scaffolding (Kanban, cycle detector, judge) frees model capacity for pattern matching | ~340 lines of C replaces what gradient descent handles poorly |
| 📝 Dual Tokenisation | Character-level (zero `<unk>`) and word-level (O(1) hash, 2.5× faster inference) | Shakespeare: 16K→40K tok/s |
| 🔧 Compile-Time Architecture | `N_EMBD`, `N_LAYER`, `BLOCK_SIZE` etc. as CMake defines — zero runtime overhead | 30K–841K params, 360KB–5.4MB |
| 🖥️ Metal GPU + SIMD + BLAS | Optional Apple Metal shaders, NEON auto-vectorisation, Accelerate BLAS | All opt-in, zero-dependency baseline |
| 📦 Paged KV Cache | Memory-efficient attention for constrained deployments | Opt-in via `-DMICROGPT_PAGED_KV=ON` |
| 🔀 Block Attention Residuals | Learned depth-attention replaces additive residuals — preserves prompt signal through deep layers | Opt-in via `-DMICROGPT_ATTN_RES=ON` (paper) |
| 🎯 Negative Control Methodology | Lottery experiment proves engine learns patterns, not artefacts | Entropy floor at 0.50 (theoretical maximum) |
| 🧭 DeepSeek-V4 Port Stack | Active-attention triumvirate (Partial RoPE + Attention Sink + Q/K RMSNorm) ported from DeepSeek-V4 §2.3.3 onto a CPU-first C99 engine. Rope-aware MSA pool/recency injection makes long-context inference relative-position-correct. All four flags off by default; combined stack opt-in. | −8.7% held-out PPL on deep config (4-layer 138K-param), 0 new params, ~1% extra runtime. See V4 port roadmap. |
| 🔌 Pipeline IR + Wiring Organelle (multi-organelle + manifold-retrieval) | Typed graph IR (DAG + verifier + text round-trip + DOT) emitted by a 540K-param word-level wiring organelle plus a 540K-param planner organelle. Phase 2c ships anchor-retrieval generation over a 20D Geodesic manifold: a 20-entry canonical @graph table indexed by Geodesic top-1 prediction over a handcoded keyword embedder. Replaces autoregressive token generation with table lookup. Phase 2d leakage audit (§38) confirmed that 13 of 20 original held-out prompts were verbatim in the wiring training corpus (introduced by Phase 13). Restated honest headlines: anchor-retrieval mechanism — 🎯 100% (20/20) on the clean Phase 2c paraphrases (no training-corpus overlap); wiring transformer alone — 7/20 (35%) on the same clean set. Run ./wiring_organelle_demo --clean-only (anchor retrieval, 100%), ./wiring_organelle_demo --composition (multi-stage composition, 60%), or ./wiring_organelle_demo --no-anchor --clean-only (wiring-only baseline, 35%). For Phase 4: ./corpus_expand pipeline_corpus_phase4_train.txt 42 then ./manifold_tfidf_demo pipeline_corpus_adversarial.txt pipeline_corpus_phase4_train.txt (TF-IDF on expanded corpus, 90% on adversarial axis-2 vs handcoded 10%). See the standalone paper, the development log including §38 leakage audit, §39 works/doesn't-work examples, §40 pre-registered Phase 3, §41 Phase 3a falsification, §42+§43 Phase 3b shipped, §44 state-of-the-arc, §45 pre-registered Phase 4 + §46 corpus expansion shipped. Manifold-learning research. |
| Topic | Link |
|---|---|
| 🧭 Where the research stands today (start here) | ORGANELLE_STATE |
| 📖 Book: Composable Intelligence at the Edge | PDF · Online · Chapters |
| ❓ FAQ | FAQ.md |
| 🧬 The stem cell philosophy | VISION.md |
| 🏆 Game leaderboard (11 games) | RESEARCH_ORGANELLE_GAMES |
| 🎲 Lottery experiment (entropy baseline) | lottery/README.md |
| 🔬 Pipeline architecture (white paper) | RESEARCH_ORGANELLE_PIPELINE |
| 🧠 Reasoning conclusion | RESEARCH_ORGANELLE_REASONING |
| 🔌 Pipeline IR + Wiring Organelle (paper) | RESEARCH_WIRING_ORGANELLE_PAPER |
| 🔬 Pipeline IR + Wiring Organelle (full development log) | RESEARCH_PIPELINE_IR |
| 📐 Calibrated three-bound scaling claim (post-Phase-3) | wiring_scaling_post_phase3 |
| 🛡️ Standing leakage-audit protection | tools/scaling_leakage_audit.sh |
| 📑 Honest disclosure register (cancelled phases, restated headlines) | RESEARCH_DISCLOSURE |
| 🏛️ Clean-room rebuild-test corpus (BS / TDD / FS / BRD / FRD / NFRD / TRACEABILITY) | docs/engineering/CLEAN_ROOM_IMPLEMENTATION/ |
| 🌐 Manifold-learning composition (research sketch) | RESEARCH_MANIFOLD_LEARNING |
| 📚 Using as a library | FUNCTIONAL_SPEC |
| ⚡ Performance & benchmarks | PERFORMANCE |
| 🚀 SSD inference optimisations | RESEARCH_SSD |
| 🔀 Attention Residuals research | RESEARCH_ATTN_RES |
| 🔧 Build options (Metal, BLAS, INT8, SIMD) | BUILD_OPTIONS |
| 🛠️ Extending the Wiring Organelle | EXTENDING_WIRING_ORGANELLE |
| 🌳 Project as a Node-style runtime (analysis) | NODE_ANALYSIS |
| 🤝 Contributing | CONTRIBUTING.md |
| 📋 Data licensing | DATA_LICENSE.md |
| 🔒 Productisation artefacts (migrated to private companion repo) | MIGRATED_TO_ORGANELLES_BIO |
- C99 compiler (GCC, Clang, MSVC)
- CMake 3.10+
- No other dependencies
Optional: Git LFS for pretrained checkpoints (`git lfs pull`).
MicroGPT-C runs entirely on-device with no telemetry, no cloud calls, and no data collection. Small models trained on narrow corpora inherit the biases of that corpus — be aware of this when deploying. High confidence means the model has seen similar patterns, not that the output is correct. Always validate through deterministic checks (the Judge pattern) or human review for safety-critical applications.
See CONTRIBUTING.md for ethics guidelines.
This project was built transparently with human–AI collaboration — the same philosophy of coordinated intelligence that MicroGPT-C explores.
| Role | Member |
|---|---|
| 🧭 Principal Research Manager | Ajay Soni — research direction, validation, and decisions |
| 💻 Engineering & Documentation | Claude — coding, documentation, and junior research |
| 🔬 Senior Research Assistant | Grok — in-depth analysis and insights |
| 🎨 Senior Research Assistant | Gemini — creative synthesis and validation |
| 📚 Community Education | NotebookLM — accessible explanations and education materials |
MIT — see LICENSE.
