| title | One Compiler, 50 Years: From Z80 (1976) to NVIDIA GPU (2026) |
|---|---|
| author | MinZ Project |
| date | 2026-03-27 |
Same source code. Same frontend. Five backends. Z80 to GPU.
```
   1976                      2026
┌─────────┐           ┌──────────────┐
│  Zilog  │           │ NVIDIA A100  │
│   Z80   │           │ AMD RX 580   │
│ 3.5 MHz │           │  Apple M2    │
│  64 KB  │           │   16+ GB     │
└────┬────┘           └──────┬───────┘
     │                       │
     └─────────┐      ┌──────┘
               │      │
          ┌────┴──────┴────┐
          │      MinZ      │
          │    Compiler    │
          │  8 frontends   │
          │  5 backends    │
          └───────┬────────┘
                  │
          ┌───────┴────────┐
          │  Nanz / Frill  │
          │   C23 / ABAP   │
          │  source code   │
          └────────────────┘
```
```minz
fun double(x: u8) -> u8 {
    return x + x
}
```
On the Z80, this function compiles to:

```asm
double:
    ADD A, A    ; 1 byte, 4 T-states
    RET         ; 1 byte, 10 T-states
                ; Total: 2 bytes, 14 T-states
```
```cuda
// CUDA — all 256 inputs computed in parallel: ~0.001 ms
__global__ void double_all(uint8_t* in, uint8_t* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + in[i];
}
```

```opencl
// OpenCL ("kernel" is a reserved word in OpenCL C, so the function needs its own name)
__kernel void double_all(__global uchar* in, __global uchar* out, int n) {
    int i = get_global_id(0);
    if (i < n) out[i] = in[i] + in[i];
}
```

```glsl
// Vulkan — GLSL compute shader
layout(set = 0, binding = 0) buffer In  { uint data_in[];  };
layout(set = 0, binding = 1) buffer Out { uint data_out[]; };

void main() {
    uint i = gl_GlobalInvocationID.x;
    data_out[i] = (data_in[i] + data_in[i]) & 0xFF;
}
```

```metal
// Metal
kernel void double_all(device uint8_t* in, device uint8_t* out,
                       uint i [[thread_position_in_grid]]) {
    out[i] = in[i] + in[i];
}
```

One function. Five targets. 50 years of hardware.
```
Source (.nanz / .frl / .c / .abap)
   │
   ├─ Frontend (8 languages)
   │  Nanz, Frill, C23, ABAP, Pascal, PL/M, Lanz, Lizp
   │
   ▼
HIR (High-Level IR)
   │
   ▼
MIR2 (Mid-Level IR — target-independent)
   │
   ├─────────────────┬────────────────────┐
   │                 │                    │
   ▼                 ▼                    ▼
Z80 Backend      GPU Backend          QBE Backend
(VIR + PBQP)     (mir2gpu)            (correctness oracle)
   │                 │
   ├─────┐    ┌──────┼───────┬─────────┐
   │     │    │      │       │         │
   ▼     ▼    ▼      ▼       ▼         ▼
  Z80   CP/M CUDA OpenCL  Vulkan     Metal
  asm   .COM .cu    .cl    GLSL     .metal
```
The key insight: MIR2 is target-independent. It has ~30 opcodes (add, sub, mul, cmp, branch, call, return). Any backend that can lower these opcodes to its target can compile any MinZ program.
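The lowering contract can be sketched in Go, the compiler's implementation language. This is an illustrative sketch only: the `Opcode`, `Instr`, and `Backend` names are hypothetical, the opcode list is abbreviated, and the Z80 lowering here is a stand-in for the real backend.

```go
package main

import "fmt"

// Opcode is a hypothetical MIR2-style opcode enum (abbreviated).
type Opcode int

const (
	OpAdd Opcode = iota
	OpSub
	OpMul
	OpCmp
	OpBranch
	OpCall
	OpReturn
)

// Instr is one target-independent instruction.
type Instr struct {
	Op   Opcode
	Args []int // virtual-register operands
}

// Backend is the whole contract: lower each opcode and you can
// compile any program that reaches MIR2.
type Backend interface {
	Lower(i Instr) string
}

type z80Backend struct{}

func (z80Backend) Lower(i Instr) string {
	switch i.Op {
	case OpAdd:
		return "ADD A, B"
	case OpReturn:
		return "RET"
	default:
		return "; unhandled opcode"
	}
}

func main() {
	prog := []Instr{{Op: OpAdd}, {Op: OpReturn}}
	var b Backend = z80Backend{}
	for _, ins := range prog {
		fmt.Println(b.Lower(ins))
	}
}
```

Because the interface is this small, adding a target is mostly a matter of writing one `Lower` per opcode.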
The GPU backend (mir2gpu) is 700 LOC total. 95% is shared across all four GPU APIs. Only 5% is backend-specific: kernel qualifiers, thread ID syntax, parameter passing.
Compile the same function to Z80 and GPU. Run all 256 inputs on GPU in parallel (~0.001ms). Run all 256 inputs on Z80 emulator (~0.1s). Compare outputs.
If they match → mathematically proven correct for the entire u8 domain.
This is not testing. This is exhaustive proof.
```
GPU (parallel)                Z80 (sequential)
Input:  [0, 1, 2, ... 255]    [0, 1, 2, ... 255]
Output: [0, 2, 4, ... 254]    [0, 2, 4, ... 254]
             ↓                        ↓
         MATCH → PROVEN CORRECT ✓
```
For u16 functions: GPU tests all 65,536 inputs in parallel. For two-argument u8 functions: 256×256 = 65,536 combinations. Still instant on GPU.
The same MIR2 function is lowered by completely independent backends:
- Z80 backend: VIR (Z3 SMT solver) + PBQP heuristic
- CUDA backend: direct C translation
- OpenCL backend: direct C translation
- Vulkan backend: GLSL compute shader
If all four backends produce the same output for all inputs, the MIR2 semantics are correct. Each backend is a witness to the others.
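The agreement check itself is simple to state. In this Go sketch the closures are stand-ins for the real Z80/CUDA/OpenCL/Vulkan lowerings of one MIR2 function, and `allAgree` is a hypothetical name, not part of MinZ:

```go
package main

import "fmt"

// witnesses are independently written implementations of the same
// MIR2 function (here: double a u8). Each stands in for one backend.
var witnesses = []func(uint8) uint8{
	func(x uint8) uint8 { return x << 1 },            // "Z80": ADD A,A is a left shift
	func(x uint8) uint8 { return x + x },             // "CUDA"
	func(x uint8) uint8 { return uint8(2 * int(x)) }, // "OpenCL"
}

// allAgree reports whether every witness returns the same value
// for every possible u8 input.
func allAgree(fs []func(uint8) uint8) bool {
	for i := 0; i < 256; i++ {
		x := uint8(i)
		want := fs[0](x)
		for _, f := range fs[1:] {
			if f(x) != want {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println("all backends agree:", allAgree(witnesses))
}
```

A shared bug would have to reproduce itself identically in four independent lowerings to slip through, which is why agreement across all inputs is strong evidence for the MIR2 semantics.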
Frill (ML-style) compiles through the same pipeline:
```frill
type Entity = Player | Enemy | Bullet | Coin | Wall

let is_solid (e : u8) : u8 =
  match e with
  | Player -> 0 | Enemy -> 1 | Bullet -> 0
  | Coin -> 0 | Wall -> 1
  end
```
This compiles to:
- Z80: 175 bytes, pattern match → conditional jumps
- CUDA: parallel evaluation of all entity types
ADTs, pattern matching, pipe operators — all on GPU. Not through a VM or interpreter. Native compiled code.
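One natural lowering for a match over a small closed type is a per-variant lookup table, so the whole function collapses into a single indexed load. A Go sketch of the idea (the tags and values mirror the Frill example above; the table representation is illustrative, not MinZ's actual codegen):

```go
package main

import "fmt"

// Variant tags for Entity, in declaration order.
const (
	Player = iota
	Enemy
	Bullet
	Coin
	Wall
)

// One byte per variant: the entire is_solid match becomes isSolid[e].
var isSolid = [5]uint8{Player: 0, Enemy: 1, Bullet: 0, Coin: 0, Wall: 1}

func main() {
	for e, name := range []string{"Player", "Enemy", "Bullet", "Coin", "Wall"} {
		fmt.Printf("is_solid(%s) = %d\n", name, isSolid[e])
	}
}
```

A table like this works equally well as data in Z80 memory and as a constant array in GPU code, which is what lets the same match run on both ends of the hardware span.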
| Frontend | Z80 | CUDA | OpenCL | Vulkan | Metal |
|---|---|---|---|---|---|
| Nanz (Swift-like) | ✅ | ✅ | ✅ | ✅ | ✅* |
| Frill (ML) | ✅ | ✅ | ✅ | ✅ | ✅* |
| C23 | ✅ | ✅ | ✅ | ✅ | ✅* |
| ABAP | ✅ | ✅ | ✅ | ✅ | ✅* |
| Pascal | ✅ | ✅ | ✅ | ✅ | ✅* |
| PL/M | ✅ | ✅ | ✅ | ✅ | ✅* |
| Lanz | ✅ | ✅ | ✅ | ✅ | ✅* |
| Lizp | ✅ | ✅ | ✅ | ✅ | ✅* |
All 4 GPU backends verified 256/256 on real hardware (NVIDIA, AMD RX 580, Apple M2).
Any function written in any of the 8 languages can run on any of the 5 backends. The MIR2 intermediate representation is the universal bridge.
| Metric | Value |
|---|---|
| Frontend languages | 8 |
| Backend targets | 5 (Z80 + CUDA + OpenCL + Vulkan + Metal) |
| Year span | 1976–2026 (50 years) |
| GPU backend LOC | 700 (95% shared across 4 APIs) |
| CUDA verification | 256/256 on NVIDIA ✅ |
| OpenCL verification | 256/256 on NVIDIA ✅ |
| Vulkan verification | 256/256 on AMD RX 580 ✅ |
| Metal verification | 256/256 on Apple M2 ✅ |
| Z80 corpus asserts | 1046 |
| Z80 VIR codegen | -71% vs SDCC |
| GPU precomputed tables | 83.6M register allocations, 501 arithmetic sequences |
MinZ isn't a transpiler. It's a real compiler with:
- Z3 SMT solver for provably optimal register allocation
- PBQP heuristic fallback for complex functions
- ISLE term rewriting for instruction selection
- Grace graph rewriting for CFG optimization
- GPU precomputed tables (83.6M entries) for O(1) register allocation
- RLCA sled (9-byte multi-entry barrel shifter) for Z80 rotation
- TSMC (True Self-Modifying Code) for runtime optimization
- #embed (C23) for compile-time binary data inclusion
- BCD packed decimal types for COBOL/financial arithmetic
The Z80 backend alone is ~11K LOC. The entire compiler is ~90K LOC in Go.
- BCD arithmetic on GPU: verify Z80 DAA sequences against GPU reference
- FP16 soft-float: GPU-precomputed mantissa tables
- COBOL frontend: PIC 9 types → BCD → DAA on Z80 / parallel on GPU
- BASIC frontend: The most iconic retro language, now compilable to GPU
- WebGPU backend: Run the same code in the browser
```bash
# Z80
git clone https://github.com/oisee/minz
cd minz/minzc && go build -o mz ./cmd/minzc
./mz examples/frill/state_machine.frl -o out.a80    # 175 bytes

# GPU (requires the feat/mir2gpu branch)
./mz examples/nanz/01_hello.nanz --target=cuda -o out.cu
nvcc out.cu -o gpu_test && ./gpu_test               # 256/256 correct
```

50 years. One compiler. Five backends. Zero compromises.
MinZ v0.23.0 — Birthday Marathon Release. "The compiler never fails. It only varies in how optimal the result is."