This project started as a C port of Andrej Karpathy's microGPT.py — a ~200 line Python GPT that trains a character-level Transformer from scratch. We rewrote it in pure C99 with zero dependencies, and as you'd expect from C, it's much faster.
Then we asked a bigger question: can tiny models actually be intelligent?
Not by making them bigger — the industry already does that. Instead, by making them work together. We took the same ~460K parameter engine and trained it on different tasks: one becomes a planner, another becomes a player, another becomes a judge. Each one starts as the same blank "stem cell" and differentiates based on its training data.
We call them organelles — like the specialised structures inside a biological cell.
The result surprised us. A single organelle playing Connect-4 wins about 55% of the time. But when a planner and player coordinate through a shared protocol, the system hits 90% — even though the individual models are still wrong half the time. The pipeline catches the mistakes. The coordination is the intelligence.
We've now tested this across 11 logic games, from Tic-Tac-Toe to Sudoku, with models ranging from 30K to 460K parameters. The pattern holds: right-sized specialists working together consistently outperform a single larger model working alone.
Then we asked: does it work on real-world data?
We ran a lottery prediction experiment as a negative control for organelle intelligence. The lottery model hit an entropy floor at 0.50 loss — it learned nothing, because lottery draws are random. This protects the engine's integrity: the 78–91% accuracy on games such as Mastermind and Connect-4 reflects the model genuinely learning underlying rules, not some hidden flaw in the training engine.
We also explored applying OPA to continuous-valued domains like financial time-series. This revealed a fundamental insight: the 31-character vocabulary that makes game coordination reliable destroys the continuous gradients that prediction requires — what we call the "Discretisation Wall." Bridging categorical reasoning with numerical sensing is an active research direction.
Same engine. Same architecture. One learns game patterns, one hits an entropy floor on random data, and one maps the boundary between pattern matching and temporal prediction. That's three kinds of proof.
The full research journey — from character-level Transformer to VM-based code generation through the calibrated three-bound scaling-curve closure — is documented in Composable Intelligence at the Edge (21 chapters + appendix, online version).
Honest-claim note (May 2026): the project's headline numbers were re-audited after the closing scaling-curve experiment caught a curator-self-overlap leakage incident. The restated calibrated claim — ~75–80% retrieval on novel-paraphrase tests in distinctive-noun domains, three documented structural bounds (curator-, model-, domain-bounded), audit infrastructure baked in via `tools/scaling_leakage_audit.sh` — replaces earlier inflated retrieval claims. See `docs/research/ORGANELLE_STATE.md` for the synthesis and `docs/engineering/CLEAN_ROOM_IMPLEMENTATION/RESEARCH_DISCLOSURE.md` for the regulator-friendly disclosure register. This repository is research-only; the productisation strategy and per-vertical implementation plans were migrated to a private companion repo (organelles.bio) on 2026-05-01 — see `docs/MIGRATED_TO_ORGANELLES_BIO.md` for the index.
git clone https://github.com/enjector/microgpt-c.git
cd microgpt-c
mkdir build && cd build
cmake ..
cmake --build . --config Release
# Train a name generator in < 1 second (4K params)
./names_demo
# Train Shakespeare text generation (840K params, character-level)
./shakespeare_demo
# Train Shakespeare word-level generation (510K params, ~40K tok/s inference, 2 min training)
./shakespeare_word_demo
# Generate infinite Word-Level Shakespeare using Memory Sparse Attention (MSA)
./msa_infinite_shakespeare
# Generate word-level Shakespeare with TurboQuant 4-bit memory compression
cd demos/turbo_quant
../../build/tq_shakespeare_tq
# Run a multi-organelle game pipeline (88% win rate)
./connect4_demo

All 11 game experiments, the lottery negative control, 3 pretrained checkpoints, 97 unit tests, and 22 benchmarks are included. See the full list in demos/character-level/.
All benchmarks on Apple M2 Max (dev machine), single-threaded unless noted. Models are 360KB–5.4MB and compile anywhere with a C99 compiler. Edge device testing is a future research stage. See PERFORMANCE for full details.
| Engine | Params | Training | Inference | Notes |
|---|---|---|---|---|
| Character-level (Shakespeare) | 841K | 28K tok/s | 16K tok/s | 14 min, 12 threads |
| Word-level (Shakespeare) | 510K | 12.5K tok/s | 40K tok/s | 2 min, 12 threads |
| VM engine (dispatch) | — | — | 3.7–5.8M ops/s | Single-threaded |
| Micro-benchmark (tiny model) | 6.5K | 642K tok/s | 1.55M infer/s | Float32, 1 thread |
| SSD ensemble (5-vote, prefix cache) | 6.5K | — | 1.9× faster | vs old ensemble (arXiv:2603.03251) |
vs. Karpathy's microGPT.py: ~1,000× faster training, ~700× faster inference (expected for C vs Python; the real contribution is the orchestration layer).
All games: trained organelle vs random opponent, 100 evaluation games each. Full details in RESEARCH_ORGANELLE_GAMES.
| Game | Organelles | Params | Size | Total | Training | Result |
|---|---|---|---|---|---|---|
| Pentago | 2 | 92K | 1.1 MB | 2.2 MB | ~9 min | 91% win |
| 8-Puzzle | 5 | 460K | 5.4 MB | 27 MB | ~7 min | 90% solve |
| Connect-4 | 2 | 460K | 5.4 MB | 10.8 MB | ~21 min | 88% win |
| Tic-Tac-Toe | 2 | 460K | 5.4 MB | 10.8 MB | ~17 min | 87% w+d |
| Mastermind | 2 | 92K | 1.1 MB | 2.2 MB | ~8 min | 79% solve |
| Sudoku | 2 | 160K | 1.9 MB | 3.8 MB | ~3 min | 78% solve |
| Othello | 2 | 92K | 1.1 MB | 2.2 MB | ~8 min | 67% win |
| Klotski | 2 | 30K | 360 KB | 720 KB | ~36 sec | 62% solve |
| Hex | 2 | 92K | 1.1 MB | 2.2 MB | ~3 min | 27% win |
| Red Donkey | 2 | 30K | 360 KB | 720 KB | ~38 sec | 19% solve |
| Lights Out | 2 | 160K | 1.9 MB | 3.8 MB | ~4 min | 10% solve |
| Experiment | Organelles | Params | Size | Training | Result | Interpretation |
|---|---|---|---|---|---|---|
| Lottery | 2 | 163K | 1.9 MB | ~5 min | Entropy floor | Negative control ✓ |
Key technical contributions shipped in this engine:
| Innovation | Description | Evidence |
|---|---|---|
| 🧬 Organelle Pipeline Architecture | Composable specialist micro-models coordinated by deterministic C scaffolding | 11 games, 91% win (Pentago) to 90% solve (8-Puzzle) |
| 💾 Memory Sparse Attention (MSA) | Infinite sequence lengths routed via a sparse attention memory pool | `msa_infinite_shakespeare` demo |
| 🗜️ TurboQuant Memory Compression | 4-bit dual-state quantisation (MSE codebooks + 1-bit QJL residuals). 8× memory reduction with +25% generation speedup, validated at 1.3M+ encodes/sec | `tq_shakespeare_tq` demo |
| 💪 TinyLlama-Class Resiliency | Zero NaN instability. SwiGLU, RMSNorm, grouped-query attention, and decoupled weight decay, rigorously audited against PyTorch output logits | Zero invalid moves across all 11 games |
| ⚡ Prefix KV Cache Sharing | Prompt processed once, KV state copied per ensemble vote — eliminates redundant inference | 1.9–5.7× ensemble speedup (arXiv:2603.03251) |
| 🔮 Speculative Decoding | Draft organelle generates candidates, target verifies with KV rollback on rejection | Functional with acceptance statistics tracking |
| 🧠 Neural Algorithmic Reasoning | Deterministic scaffolding (Kanban, cycle detector, judge) frees model capacity for pattern matching | ~340 lines of C replaces what gradient descent handles poorly |
| 📝 Dual Tokenisation | Character-level (zero `<unk>`) and word-level (O(1) hash, 2.5× faster inference) | Shakespeare: 16K→40K tok/s |
| 🔧 Compile-Time Architecture | `N_EMBD`, `N_LAYER`, `BLOCK_SIZE` etc. as CMake defines — zero runtime overhead | 30K–841K params, 360KB–5.4MB |
| 🖥️ Metal GPU + SIMD + BLAS | Optional Apple Metal shaders, NEON auto-vectorisation, Accelerate BLAS | All opt-in, zero-dependency baseline |
| 📦 Paged KV Cache | Memory-efficient attention for constrained deployments | Opt-in via `-DMICROGPT_PAGED_KV=ON` |
| 🔀 Block Attention Residuals | Learned depth-attention replaces additive residuals — preserves prompt signal through deep layers | Opt-in via `-DMICROGPT_ATTN_RES=ON` (paper) |
| 🎯 Negative Control Methodology | Lottery experiment proves engine learns patterns, not artefacts | Entropy floor at 0.50 (theoretical maximum) |
| 🧭 DeepSeek-V4 Port Stack | Active-attention triumvirate (Partial RoPE + Attention Sink + Q/K RMSNorm) ported from DeepSeek-V4 §2.3.3 onto a CPU-first C99 engine. Rope-aware MSA pool/recency injection makes long-context inference relative-position-correct. All four flags off by default; combined stack opt-in. | −8.7% held-out PPL on deep config (4-layer 138K-param), 0 new params, ~1% extra runtime. See V4 port roadmap. |
| 🔌 Pipeline IR + Wiring Organelle (multi-organelle + manifold-retrieval) | Typed graph IR (DAG + verifier + text round-trip + DOT) emitted by a 540K-param word-level wiring organelle plus a 540K-param planner organelle. Phase 2c ships anchor-retrieval generation over a 20D Geodesic manifold: a 20-entry canonical @graph table indexed by Geodesic top-1 prediction over a handcoded keyword embedder. Replaces autoregressive token generation with table lookup. Phase 2d leakage audit (§38) confirmed that 13 of 20 original held-out prompts were verbatim in the wiring training corpus (introduced by Phase 13). Restated honest headlines: anchor-retrieval mechanism — 🎯 100% (20/20) on the clean Phase 2c paraphrases (no training-corpus overlap); wiring transformer alone — 7/20 (35%) on the same clean set. Run ./wiring_organelle_demo --clean-only (anchor retrieval, 100%), ./wiring_organelle_demo --composition (multi-stage composition, 60%), or ./wiring_organelle_demo --no-anchor --clean-only (wiring-only baseline, 35%). For Phase 4: ./corpus_expand pipeline_corpus_phase4_train.txt 42 then ./manifold_tfidf_demo pipeline_corpus_adversarial.txt pipeline_corpus_phase4_train.txt (TF-IDF on expanded corpus, 90% on adversarial axis-2 vs handcoded 10%). See the standalone paper, the development log including §38 leakage audit, §39 works/doesn't-work examples, §40 pre-registered Phase 3, §41 Phase 3a falsification, §42+§43 Phase 3b shipped, §44 state-of-the-arc, §45 pre-registered Phase 4 + §46 corpus expansion shipped. Manifold-learning research. |
| Topic | Link |
|---|---|
| 🧭 Where the research stands today (start here) | ORGANELLE_STATE |
| 📖 Book: Composable Intelligence at the Edge | PDF · Online · Chapters |
| ❓ FAQ | FAQ.md |
| 🧬 The stem cell philosophy | VISION.md |
| 🏆 Game leaderboard (11 games) | RESEARCH_ORGANELLE_GAMES |
| 🎲 Lottery experiment (entropy baseline) | lottery/README.md |
| 🔬 Pipeline architecture (white paper) | RESEARCH_ORGANELLE_PIPELINE |
| 🧠 Reasoning conclusion | RESEARCH_ORGANELLE_REASONING |
| 🔌 Pipeline IR + Wiring Organelle (paper) | RESEARCH_WIRING_ORGANELLE_PAPER |
| 🔬 Pipeline IR + Wiring Organelle (full development log) | RESEARCH_PIPELINE_IR |
| 📐 Calibrated three-bound scaling claim (post-Phase-3) | wiring_scaling_post_phase3 |
| 🛡️ Standing leakage-audit protection | tools/scaling_leakage_audit.sh |
| 📑 Honest disclosure register (cancelled phases, restated headlines) | RESEARCH_DISCLOSURE |
| 🏛️ Clean-room rebuild-test corpus (BS / TDD / FS / BRD / FRD / NFRD / TRACEABILITY) | docs/engineering/CLEAN_ROOM_IMPLEMENTATION/ |
| 🌐 Manifold-learning composition (research sketch) | RESEARCH_MANIFOLD_LEARNING |
| 📚 Using as a library | FUNCTIONAL_SPEC |
| ⚡ Performance & benchmarks | PERFORMANCE |
| 🚀 SSD inference optimisations | RESEARCH_SSD |
| 🔀 Attention Residuals research | RESEARCH_ATTN_RES |
| 🔧 Build options (Metal, BLAS, INT8, SIMD) | BUILD_OPTIONS |
| 🛠️ Extending the Wiring Organelle | EXTENDING_WIRING_ORGANELLE |
| 🌳 Project as a Node-style runtime (analysis) | NODE_ANALYSIS |
| 🤝 Contributing | CONTRIBUTING.md |
| 📋 Data licensing | DATA_LICENSE.md |
| 🔒 Productisation artefacts (migrated to private companion repo) | MIGRATED_TO_ORGANELLES_BIO |
- C99 compiler (GCC, Clang, MSVC)
- CMake 3.10+
- No other dependencies
Optional: Git LFS for pretrained checkpoints (`git lfs pull`).
MicroGPT-C runs entirely on-device with no telemetry, no cloud calls, and no data collection. Small models trained on narrow corpora inherit the biases of that corpus — be aware of this when deploying. High confidence means the model has seen similar patterns, not that the output is correct. Always validate through deterministic checks (the Judge pattern) or human review for safety-critical applications.
See CONTRIBUTING.md for ethics guidelines.
This project was built transparently with human–AI collaboration — the same philosophy of coordinated intelligence that MicroGPT-C explores.
| Role | Member |
|---|---|
| 🧭 Principal Research Manager | Ajay Soni — research direction, validation, and decisions |
| 💻 Engineering & Documentation | Claude — coding, documentation, and junior research |
| 🔬 Senior Research Assistant | Grok — in-depth analysis and insights |
| 🎨 Senior Research Assistant | Gemini — creative synthesis and validation |
| 📚 Community Education | NotebookLM — accessible explanations and education materials |
MIT — see LICENSE.
