Merged
956 commits
604cdbf
am: large allocs aligned to 2mb to use 2mb pages (#15609)
nimlgen Apr 5, 2026
5e134aa
hcq: add write/poll_bit commands (#15610)
nimlgen Apr 5, 2026
e0988db
hcq: support non for signal_t and compute_t (#15611)
nimlgen Apr 5, 2026
e3986a6
mlx: init runtime (#15612)
nimlgen Apr 5, 2026
6a334ce
hotfix: fix bert (#15613)
nimlgen Apr 5, 2026
e39cfe6
validate lr, momentum, weight_decay in optimizers (#15576)
Andrew-most-likely Apr 5, 2026
ff0c941
remove redundant iteration and toposort in _deepwalk (#15532)
13Perrius Apr 5, 2026
86c4431
add gpu_family detection to Metal, target MSL 4.0 on macOS 26+ (#15079)
valtterivalo Apr 5, 2026
e270047
mlx: cleaner (#15617)
nimlgen Apr 6, 2026
01b49c8
support int operand for shifts (#15618)
chenyuxyz Apr 6, 2026
a444be1
lower fuzz_symbolic_symbolic_div timeout (#15619)
chenyuxyz Apr 6, 2026
6e30a5f
update shifts in torch backend (#15622)
chenyuxyz Apr 6, 2026
1483f7e
support shift by Tensor (#15623)
chenyuxyz Apr 6, 2026
66ec188
more activations to mixin (#15624)
chenyuxyz Apr 6, 2026
2f7d085
shared _normalize_indices for getitem (#15625)
chenyuxyz Apr 6, 2026
8ba5830
viz: reenable tests (#15626)
Qazalin Apr 6, 2026
19e9649
interface in DEV (#15620)
sirhcm Apr 6, 2026
810d7c0
llama: unify scripts (#15628)
wozeparrot Apr 7, 2026
2b01ca5
USB driver for custom ASM firmware (#15597)
geohot Apr 7, 2026
d3de63d
improvements to apps.llm (#15631)
geohot Apr 7, 2026
d29f0ef
viz: speed up profiler first render (#15632)
Qazalin Apr 7, 2026
b78b384
mlx: graph (#15621)
nimlgen Apr 7, 2026
890286e
update llama profile.sh (#15633)
Qazalin Apr 7, 2026
9c6e925
move lerp to mixin (#15634)
chenyuxyz Apr 7, 2026
a508b8f
viz: delete redundant things (#15637)
Qazalin Apr 7, 2026
bf37635
llm: buffer SSE chunks to fix parse errors from split reads (#15641)
b1tg Apr 8, 2026
f930579
llm: change the default port to 8000 so you can remember it (match vLLM)
geohot Apr 8, 2026
bcf6931
fix: comma 4 does not have pcie (#15642)
sirhcm Apr 8, 2026
70dbd35
llama: move custom_kernel into flat_llama (#15643)
wozeparrot Apr 8, 2026
dc6a51e
viz: add # of bytes to sdma (#15639)
Qazalin Apr 8, 2026
39a029e
remove ASM_GEMM context var (#15645)
Qazalin Apr 8, 2026
35e3983
Add Q5_0, Q5_1, and bfloat16 GGUF types (#15644)
geohot Apr 8, 2026
3ac16b3
viz: add wmma row, update exec duration logic (#15646)
Qazalin Apr 8, 2026
b1e52ba
the slowest line in hcq graph (#15635)
nimlgen Apr 8, 2026
1ebeb52
RDNA4 asm gemm (#15427)
geohot Apr 8, 2026
dae9dea
clean up tensor random functions (#15648)
chenyuxyz Apr 8, 2026
839d37b
update median_step_time in model_train.py (#15649)
chenyuxyz Apr 8, 2026
71c83cc
viz: put OTHER_ on the wave row (#15650)
Qazalin Apr 8, 2026
1b44cb2
split update stat from execitem (#15654)
nimlgen Apr 8, 2026
28b14b0
mlx: remove to_be, use helpers (#15655)
nimlgen Apr 8, 2026
cb681da
move UOp.pad to mixin (#15657)
chenyuxyz Apr 8, 2026
4cf2759
fix merge_reduce_ends (#15659)
chenyuxyz Apr 8, 2026
742b389
viz/cli: add pmc printer (#15651)
Qazalin Apr 8, 2026
d08c76d
c.Struct cleanup (#15640)
sirhcm Apr 9, 2026
6837881
remove same_shape_noop [pr] (#15662)
chenyuxyz Apr 9, 2026
48a7627
add RDNA4 support to copy WMMA (#15663)
geohot Apr 9, 2026
0ff30b0
am: reset queues from spi (#15664)
nimlgen Apr 9, 2026
057dc17
beam uop (#15660)
nimlgen Apr 9, 2026
fa02105
hotfix: pin amd isa xml version
geohot Apr 9, 2026
dbc23e8
move HCQ_VISIBLE_DEVICES into DEV (#15668)
sirhcm Apr 10, 2026
ed2a72b
work on abstractions4 (#15671)
geohot Apr 10, 2026
16f3448
Add HIP to abstractions4 (#15672)
geohot Apr 10, 2026
55bcd7c
llama amax outside (#15670)
wozeparrot Apr 10, 2026
9ab1415
llm: fix streaming UTF-8 decode (#15653)
b1tg Apr 10, 2026
9092f2a
llm: add shared_expert and rope_dim support from qwen35 (#15673)
geohot Apr 10, 2026
8e7fcc8
remove _include_initial in _cumalu (#15674)
chenyuxyz Apr 10, 2026
e1334d3
move canonicalize_device to device.py (#15675)
chenyuxyz Apr 10, 2026
0d5cdc9
viz: split draw loop (#15676)
Qazalin Apr 10, 2026
58646f9
usb fast copyout (#15677)
nimlgen Apr 10, 2026
aa012d6
usb: faster custom (#15678)
nimlgen Apr 10, 2026
590464c
llama: only support wqkv path + cleanups (#15680)
wozeparrot Apr 10, 2026
b5a9465
llm: add support for moonlight (deepseek MLA) (#15466)
geohot Apr 11, 2026
29238b7
AMD USB: support for 0xF3 power toggle
geohot Apr 11, 2026
457508d
llama: save more 2 (#15681)
wozeparrot Apr 11, 2026
5156a04
add support for AM_POWER_LIMIT (#15684)
geohot Apr 11, 2026
4ca844e
add Q1_0 gguf type (#15683)
graham-ro Apr 11, 2026
054d78e
fix llama profile.sh NULL source (#15685)
Qazalin Apr 11, 2026
938cba4
amd: a bit faster usb, skip interrupts on sync (#15686)
nimlgen Apr 11, 2026
e706f40
suppress test warnings from numpy (#15688)
chenyuxyz Apr 12, 2026
e9b2e15
add jitbeam to tinygpu docs (#15691)
nimlgen Apr 12, 2026
0254cfe
move usum and uprod to mixin (#15690)
chenyuxyz Apr 12, 2026
ff1de5a
normalize logsumexp contiguous_backward to mixin (#15692)
chenyuxyz Apr 12, 2026
77385cc
more trivial stuff to mixin (#15693)
chenyuxyz Apr 12, 2026
f7ff480
start mixin getitem tests (#15695)
chenyuxyz Apr 12, 2026
2ada38f
viz: execv after all producers complete (#15696)
Qazalin Apr 12, 2026
2b5ba00
qwen3.5 (#15210)
b1tg Apr 13, 2026
6f5d756
Tests for GatedDeltaNetBlock + fix multi after assign issue (#15700)
geohot Apr 13, 2026
0cec42d
Revert "Tests for GatedDeltaNetBlock + fix multi after assign issue (…
geohot Apr 13, 2026
4c1fb18
Revert "Revert "Tests for GatedDeltaNetBlock + fix multi after assign…
geohot Apr 13, 2026
ac02705
viz: no global state (#15705)
Qazalin Apr 13, 2026
16f50a4
remove REMU from tree (#15706)
geohot Apr 13, 2026
84d64b5
hotfix: abstractions4 works in mock except asm
geohot Apr 13, 2026
7610bdc
block multistore, it's not supported (#15708)
geohot Apr 13, 2026
931d6cc
basic getitem to mixin (#15697)
chenyuxyz Apr 13, 2026
b370f5c
hcq: call free for unmap (#15710)
nimlgen Apr 13, 2026
eac481b
mlx: fix ctypes (#15711)
nimlgen Apr 13, 2026
ac41f15
cumsum to mixin (#15712)
chenyuxyz Apr 13, 2026
d83707e
autogen: explicit types (#15679)
sirhcm Apr 13, 2026
905b8ad
viz: cli and server cleanups (#15713)
Qazalin Apr 13, 2026
355e272
viz: keep program UOp in data (#15714)
Qazalin Apr 13, 2026
70883a6
cat the stack to mixin (#15715)
chenyuxyz Apr 13, 2026
5683126
llm: support for tekken tokenizer (#15720)
geohot Apr 14, 2026
2b8d303
allreduce in precast dtype (#15689)
wozeparrot Apr 14, 2026
359b158
amd: EMU DPP support (#15719)
geohot Apr 14, 2026
528faa1
update env_vars.md (#15722)
chenyuxyz Apr 14, 2026
2450c8c
rename to callify + fix mypy (#15727)
geohot Apr 14, 2026
e9ecc99
amd: add r9700 devid (#15721)
nimlgen Apr 14, 2026
3394d18
size*itemsize -> nbytes (#15729)
chenyuxyz Apr 14, 2026
adc96cd
qcom: synchronize for copyin (#15731)
sirhcm Apr 14, 2026
480ad26
llama: per device amax (#15735)
wozeparrot Apr 15, 2026
3721c60
llama: bs 16 (#15737)
wozeparrot Apr 15, 2026
1ae6528
move schedule into schedule (#15736)
geohot Apr 15, 2026
1c36878
DEV: suggest alternatives (#15732)
sirhcm Apr 15, 2026
7cbfa18
comment out unused arm, triton in toml (#15741)
chenyuxyz Apr 15, 2026
1f26584
viz/cli: cleanups from linter (#15745)
Qazalin Apr 15, 2026
1644956
test_graph to use uops (#15746)
nimlgen Apr 15, 2026
507c02c
fix symbolic contiguous_view_offset (#15749)
chenyuxyz Apr 15, 2026
be8005c
DEV: secondary targets (#15748)
sirhcm Apr 15, 2026
41421c3
BUFFER size is their arg (#15750)
chenyuxyz Apr 15, 2026
96092d1
fix process_replay Ops.BEAM [pr] (#15752)
Qazalin Apr 15, 2026
10c262c
update tests that use UOp.size (#15753)
chenyuxyz Apr 16, 2026
8bd4fea
UOp.size -> prod(max_shape) (#15755)
chenyuxyz Apr 16, 2026
983a7bb
exclude __del__ from TRACEMETA wrapping (#15747)
humblemuzzu Apr 16, 2026
d090732
usbgpu: reset endpoint for custom fw (#15754)
wozeparrot Apr 16, 2026
218d6b8
delete old UOp.size [pr] (#15756)
chenyuxyz Apr 16, 2026
d24466c
CALL with return value is FUNCTION (#15758)
geohot Apr 16, 2026
d1cce7a
put the ranges on store instead of after (#15759)
geohot Apr 16, 2026
c04f3ea
jit: capturedjit is linear (#15743)
nimlgen Apr 16, 2026
f57380c
simplify GatedDeltaNetBlock using two state tensors (#15704)
geohot Apr 16, 2026
126cda4
viz/cli: cleanups, add memory printer (#15762)
Qazalin Apr 16, 2026
d147e2a
update test_nested_after_contiguous_store (#15763)
chenyuxyz Apr 16, 2026
4e88d87
llm: glm 4.7 flash (#15738)
b1tg Apr 16, 2026
f0c12a2
another form of assign to itself (#15770)
chenyuxyz Apr 16, 2026
12c653a
remove opts arg in get_program, everything uses opts_to_apply [pr] (#…
Qazalin Apr 16, 2026
6d9320f
add NO_COLOR (#15765)
Qazalin Apr 16, 2026
9f4b7be
add pickled jit regression test (#15774)
sirhcm Apr 16, 2026
2d196fb
move Tensor.size to mixin (#15775)
chenyuxyz Apr 16, 2026
0e69388
viz/cli: add DEBUG, optional number of rows (#15777)
Qazalin Apr 17, 2026
ec00cef
llm is the only app (#15779)
geohot Apr 17, 2026
1fac03c
softmax and friends to mixin (#15778)
chenyuxyz Apr 17, 2026
a9b6cfe
refactor llm into files (#15780)
geohot Apr 17, 2026
9e60e4a
llama: native fp8 (#15733)
wozeparrot Apr 17, 2026
e1d13bc
add GGUF IQ4_XS support (#15766)
geohot Apr 17, 2026
7bdb3ad
viz/cli: simplification and reordering (#15785)
Qazalin Apr 17, 2026
9f2a578
unskip TestCall.test_call_gemm_uop [pr] (#15786)
Qazalin Apr 17, 2026
afc3904
viz/cli: unit tests in CI (#15788)
Qazalin Apr 17, 2026
601d137
viz: rename to rewrites_data, only use ContextVar (#15790)
Qazalin Apr 17, 2026
a227dbe
viz/cli: reconstruct DEBUG output (#15791)
Qazalin Apr 17, 2026
482c8c1
Fix no module named error (#15792)
googlefan256 Apr 17, 2026
8fcaaed
fix root cause of TestVizIntegration.test_link_sched_codegen flakines…
Qazalin Apr 17, 2026
23ca680
run_linear (#15784)
nimlgen Apr 17, 2026
0191cc7
update arange range check (#15794)
chenyuxyz Apr 17, 2026
2581985
viz/cli: multi device profiler output, print markers (#15795)
Qazalin Apr 17, 2026
8da3085
update test_assign_changes_alt with clone (#15802)
chenyuxyz Apr 18, 2026
6adf4c3
MOCKGPU interfaces (#15796)
sirhcm Apr 18, 2026
0634309
llama: combined w13 (#15803)
wozeparrot Apr 18, 2026
022d8c4
remove jit_cache usage in extra/examples (#15808)
nimlgen Apr 18, 2026
5bdfd48
update test_assign (#15809)
chenyuxyz Apr 19, 2026
f28ea84
llama: fused silu fp8 amax (#15798)
wozeparrot Apr 19, 2026
cace07c
clean up untag_and_append [pr] (#15812)
chenyuxyz Apr 19, 2026
50a7b82
merge untag_and_append and append_after [pr] (#15815)
chenyuxyz Apr 19, 2026
c6d8753
viz/cli: --json support, refine docs (#15528)
Qazalin Apr 19, 2026
2a5a623
UOp.empty and UOp.empty_like (#15816)
chenyuxyz Apr 19, 2026
8b87b35
more UOp empty cleanups [pr] (#15818)
chenyuxyz Apr 19, 2026
b05b101
viz/cli: ux cleanups, show user python (#15817)
Qazalin Apr 20, 2026
f551a4b
add threefry const folding (#15787)
oxrinz Apr 20, 2026
a1696e8
objc: fix _classmethods_ dispatch flag (#14854)
KartikVashishta Apr 20, 2026
538841d
remove_tags and _remove_all_tags are the same [pr] (#15819)
chenyuxyz Apr 20, 2026
67ed4c4
move gguf stuff from nn/state.py to llm/gguf.py (#15783)
geohot Apr 20, 2026
5819c0a
fix gc in gguf (#15820)
geohot Apr 20, 2026
c0d7135
do not use jit_cache in test (#15823)
nimlgen Apr 20, 2026
80c7327
resolve Metal ARC FIXME with explanation comment (#13688)
ayanhan Apr 20, 2026
601b9d3
viz/cli: dedup DEBUG=3 pyrender (#15826)
Qazalin Apr 20, 2026
72ecc61
use more UOp method [pr] (#15821)
chenyuxyz Apr 20, 2026
04e8dbd
remove getitem check in get_shape (#15830)
chenyuxyz Apr 20, 2026
3a55701
delete UOp.get_consumer_map [pr] (#15832)
chenyuxyz Apr 20, 2026
b017044
einsum to ReduceMixin (#15833)
chenyuxyz Apr 20, 2026
8eeb77a
flat_to_grouped and resolve_pool_pads to helpers (#15834)
chenyuxyz Apr 20, 2026
667b30b
tensor pad arg cleanups (#15836)
chenyuxyz Apr 20, 2026
e00cc8a
split Tensor._conv2d_winograd (#15837)
chenyuxyz Apr 20, 2026
b8d3bf8
run_linear in jit (#15827)
nimlgen Apr 20, 2026
cabc347
conv2d and conv_transpose2d to mixin (#15838)
chenyuxyz Apr 20, 2026
1a8ba4c
CPU renderers use arch (#15839)
sirhcm Apr 21, 2026
f9655af
viz/cli: move to tinygrad (#15835)
Qazalin Apr 21, 2026
01ac1c8
remove all run_schedule from tests (#15846)
nimlgen Apr 21, 2026
ae9b84d
rm beam uop (#15844)
nimlgen Apr 21, 2026
d08b5d0
full to mixin (#15840)
chenyuxyz Apr 21, 2026
bfe28ee
rm run_schedule (#15847)
nimlgen Apr 21, 2026
9192c93
Tensor.invalid -> Tesnor.invalids (#15849)
chenyuxyz Apr 21, 2026
420e4c4
zeros, ones, invalids to mixin (#15850)
chenyuxyz Apr 21, 2026
86ceb3b
arange to mixin (#15852)
chenyuxyz Apr 21, 2026
0fbe0a6
viz/cli: ux tweaks (#15853)
Qazalin Apr 21, 2026
1946ae8
linspace and eye to mixin (#15854)
chenyuxyz Apr 21, 2026
99a0deb
Device.count() (#15842)
sirhcm Apr 21, 2026
e36ff22
fix dev syntax in emulated amd tests, skip test_tk (#15856)
Qazalin Apr 21, 2026
75ee51a
triu tril _tri to mixin (#15857)
chenyuxyz Apr 21, 2026
697e7aa
MOCK+AMD and MOCK+NV interfaces (#15858)
sirhcm Apr 21, 2026
f911a63
don't allow negative num_classes in one_hot (#15859)
chenyuxyz Apr 21, 2026
3821e44
_one_hot_along_dim and one_hot to mixin (#15861)
chenyuxyz Apr 22, 2026
0560fa7
add shape to range/special (#15862)
geohot Apr 22, 2026
8737833
llama: fused mul quantize fp8 (#15863)
wozeparrot Apr 22, 2026
d4c344b
hotfix: keep VCONST exclude in viz
geohot Apr 22, 2026
de8f588
move elf assembler to renderer (#15855)
Qazalin Apr 22, 2026
2d7fa58
fix shapes to match vecless (#15866)
geohot Apr 22, 2026
719a7bd
viz: respect optional estimates in kernel info (#15867)
Qazalin Apr 22, 2026
af93a67
llm: glm 4.5 air (#15771)
b1tg Apr 22, 2026
09ff3e1
hotfix: add bytes back to llm
geohot Apr 22, 2026
3c8daa9
update test_where_removal (#15872)
chenyuxyz Apr 22, 2026
e9ebd03
update reduce_to_acc index dtype [pr] (#15873)
chenyuxyz Apr 22, 2026
2041945
cuda graph to linear (#15870)
nimlgen Apr 22, 2026
b9e2bc6
simplify bool.cast() != const (#15874)
chenyuxyz Apr 22, 2026
e5891ac
jit: precompile (#15848)
nimlgen Apr 22, 2026
b0dc95a
AMX in arch, better docs (#15871)
sirhcm Apr 22, 2026
684e95e
UOp binary op broadcasts dtype (#15875)
chenyuxyz Apr 23, 2026
1fc4b37
cummax/cummin to mixin (#15877)
chenyuxyz Apr 23, 2026
7c9bc29
Tensor method raise if arg is on different device (#15879)
chenyuxyz Apr 23, 2026
0c3260d
rename VECTORIZE to STACK (#15880)
geohot Apr 23, 2026
d3cbd78
llama: use fused norm mul quantize for w13 (#15878)
wozeparrot Apr 23, 2026
e469618
cleaner cuda graph (#15886)
nimlgen Apr 23, 2026
5cf4ad2
fix resolve param (#15889)
nimlgen Apr 23, 2026
87223f8
logcumsumexp, argmax, argmin, sequential to mixin (#15890)
chenyuxyz Apr 23, 2026
f0dbc68
gather to mixin (#15891)
chenyuxyz Apr 23, 2026
11c1979
interpolate and cross_entropy to mixin (#15895)
chenyuxyz Apr 23, 2026
ee76449
viz/cli: -t default number (#15894)
Qazalin Apr 23, 2026
7745e05
sqtt: update wave end packet names (#15896)
Qazalin Apr 23, 2026
782bc6a
broadcast in ElementwiseMixin.div [pr] (#15897)
chenyuxyz Apr 23, 2026
3072862
metal to linear (#15884)
nimlgen Apr 23, 2026
8cc2c69
fix isclose mixin (#15898)
chenyuxyz Apr 24, 2026
08d9106
scatter_reduce and sparse_categorical_crossentropy to mixin (#15902)
chenyuxyz Apr 24, 2026
c24da99
avg_pool2d, max_pool2d to mixin (#15903)
chenyuxyz Apr 24, 2026
f379b5a
sqtt: match amd's TS_DELTA_SHORT offset (#15901)
Qazalin Apr 24, 2026
aab50d1
llm: dedup MLA cache_v (#15887)
b1tg Apr 24, 2026
9d134a2
llama: fix fakedata timing (#15905)
wozeparrot Apr 24, 2026
cbf4946
usb: multiple gpus and better error messages (#15900)
sirhcm Apr 24, 2026
c0f77c2
hcq graph to linear (#15888)
nimlgen Apr 24, 2026
5eb6413
viz/cli: select kernel events in -s DEV (#15909)
Qazalin Apr 24, 2026
48d7ab2
no uv.lock (#15893)
eitanturok Apr 24, 2026
7a1adfd
update Tensor.allclose to return Tensor (#15904)
chenyuxyz Apr 24, 2026
4010aa4
jit: no jit_cache in graphrunner (#15907)
nimlgen Apr 24, 2026
03a7604
sort argsort topk allclose to mixin (#15910)
chenyuxyz Apr 24, 2026
56a9f1e
remove last jit_cahce (#15911)
nimlgen Apr 24, 2026
f275195
remove linear_to_schedule from tests (#15912)
nimlgen Apr 24, 2026
2f9fdb4
scatter to mixin (#15917)
chenyuxyz Apr 24, 2026
b501ba3
nll_loss to mixin (#15918)
chenyuxyz Apr 24, 2026
d337801
schedule() -> schedule_linear() in tests (batch 1) (#15915)
nimlgen Apr 24, 2026
4b908b6
llama: fused ce loss (#15920)
wozeparrot Apr 25, 2026
57fbaa3
amd: fallback to llvm when comgr is not available (#15914)
sirhcm Apr 25, 2026
8b2826e
nv: fix shader local memory for NAK (#15921)
sirhcm Apr 25, 2026
1fdcb13
webgpu: fix weight lookup in export_model after compile_net key chang…
germiBest Apr 25, 2026
3c8a2db
remove schedule() from tests batch 2 (#15923)
nimlgen Apr 25, 2026
d2ab6ea
remove schedule batch 3 (#15924)
nimlgen Apr 25, 2026
a5e9ea7
remove schedule batch 4 (#15927)
nimlgen Apr 25, 2026
768106a
remove schedule from extra/docs/examples (#15929)
nimlgen Apr 25, 2026
9a23de7
viz/cli: unify profile and rewrites, -s ALL default (#15931)
Qazalin Apr 25, 2026
e0ff6cc
remove old schedule (#15930)
nimlgen Apr 25, 2026
e27444a
remove unused UOp.shard_size [pr] (#15933)
chenyuxyz Apr 25, 2026
bb65235
remove execitem (#15932)
nimlgen Apr 25, 2026
ac3494a
remove some runners (#15934)
nimlgen Apr 25, 2026
e9983e3
remove unused QCOMTextureInfo, QueueType [pr] (#15935)
chenyuxyz Apr 25, 2026
62ff96b
Merge remote-tracking branch 'tinygrad/master' into HEAD
Discountchubbs Apr 26, 2026
38 changes: 23 additions & 15 deletions .github/actions/setup-tinygrad/action.yml
@@ -45,12 +45,16 @@ inputs:
description: "Install mesa"
required: false
default: 'false'
tinydreno:
description: "Install tinydreno"
required: false
default: 'false'
runs:
using: "composite"
steps:
- name: Set up Python ${{ inputs.python-version }}
id: setup-python
uses: actions/setup-python@v5
uses: actions/setup-python@v6
with:
python-version: ${{ inputs.python-version }}

@@ -62,14 +66,14 @@ runs:
uses: actions/cache/restore@v4
with:
path: ${{ github.workspace }}/.venv
key: venv-${{ runner.os }}-python-${{ steps.setup-python.outputs.python-version }}-${{ inputs.deps }}-${{ inputs.pydeps }}-${{ env.CACHE_VERSION }}
key: venv-${{ runner.os }}-${{ runner.arch }}-python-${{ steps.setup-python.outputs.python-version }}-${{ inputs.deps }}-${{ inputs.pydeps }}-${{ env.CACHE_VERSION }}
- name: Cache Python packages
if: github.event_name != 'pull_request'
id: restore-venv
uses: actions/cache@v4
uses: actions/cache@v5
with:
path: ${{ github.workspace }}/.venv
key: venv-${{ runner.os }}-python-${{ steps.setup-python.outputs.python-version }}-${{ inputs.deps }}-${{ inputs.pydeps }}-${{ env.CACHE_VERSION }}
key: venv-${{ runner.os }}-${{ runner.arch }}-python-${{ steps.setup-python.outputs.python-version }}-${{ inputs.deps }}-${{ inputs.pydeps }}-${{ env.CACHE_VERSION }}

# **** Caching downloads ****

@@ -81,7 +85,7 @@ runs:
key: downloads-${{ github.job }}-${{ inputs.key }}-${{ env.CACHE_VERSION }}
- name: Cache downloads
if: inputs.key != '' && github.event_name != 'pull_request'
uses: actions/cache@v4
uses: actions/cache@v5
with:
path: ${{ runner.os == 'Linux' && '~/.cache/tinygrad/downloads/' || '~/Library/Caches/tinygrad/downloads/' }}
key: downloads-${{ github.job }}-${{ inputs.key }}-${{ env.CACHE_VERSION }}
@@ -145,7 +149,7 @@ runs:
run: |
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
sudo tee /etc/apt/sources.list.d/rocm.list <<EOF
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.2 $(lsb_release -cs) main
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.1 $(lsb_release -cs) main
EOF
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600

@@ -195,13 +199,13 @@ runs:
uses: actions/cache/restore@v4
with:
path: /var/cache/apt/archives/
key: ${{ runner.os }}-apt-${{ steps.apt-pkgs.outputs.hash }}-${{ env.CACHE_VERSION }}
key: ${{ runner.os }}-${{ runner.arch }}-apt-${{ steps.apt-pkgs.outputs.hash }}-${{ env.CACHE_VERSION }}
- name: Cache apt
if: runner.os == 'Linux' && (inputs.opencl == 'true' || inputs.amd == 'true' || inputs.cuda == 'true' || inputs.webgpu == 'true' || inputs.llvm == 'true') && github.event_name != 'pull_request'
uses: actions/cache@v4
uses: actions/cache@v5
with:
path: /var/cache/apt/archives/
key: ${{ runner.os }}-apt-${{ steps.apt-pkgs.outputs.hash }}-${{ env.CACHE_VERSION }}
key: ${{ runner.os }}-${{ runner.arch }}-apt-${{ steps.apt-pkgs.outputs.hash }}-${{ env.CACHE_VERSION }}

- name: Run apt Update + Install
if: runner.os == 'Linux' && (inputs.opencl == 'true' || inputs.amd == 'true' || inputs.cuda == 'true' || inputs.webgpu == 'true' || inputs.llvm == 'true')
@@ -221,22 +225,19 @@ runs:
if: inputs.amd == 'true' && runner.os == 'Linux'
shell: bash
run: |
cargo build --release --manifest-path ./extra/remu/Cargo.toml
sudo ln -sf ${{ github.workspace }}/extra/remu/target/release/libremu.so /usr/local/lib/libremu.so
sudo tee --append /etc/ld.so.conf.d/rocm.conf <<'EOF'
/opt/rocm/lib
/opt/rocm/lib64
EOF
sudo ldconfig
- name: Setup AMD comgr+remu (macOS)
- name: Setup AMD comgr (macOS)
if: inputs.amd == 'true' && runner.os == 'macOS'
shell: bash
run: |
sudo mkdir -p /usr/local/lib
curl -s -H "Authorization: token $GH_TOKEN" curl -s https://api.github.com/repos/nimlgen/amdcomgr_dylib/releases/latest | \
curl -s -H "Authorization: token $GH_TOKEN" curl -s https://api.github.com/repos/tinygrad/amdcomgr_dylib/releases/latest | \
jq -r '.assets[] | select(.name == "libamd_comgr.dylib").browser_download_url' | \
sudo xargs curl -fL -o /usr/local/lib/libamd_comgr.dylib
cargo build --release --manifest-path ./extra/remu/Cargo.toml

# **** gpuocelot ****

@@ -265,7 +266,7 @@ runs:
- name: Cache gpuocelot
if: inputs.ocelot == 'true' && github.event_name != 'pull_request'
id: cache-build
uses: actions/cache@v4
uses: actions/cache@v5
env:
cache-name: cache-gpuocelot-build-1
with:
@@ -283,6 +284,7 @@ runs:

CMAKE_ARGS="-Wno-dev -G Ninja -DOCELOT_BUILD_TOOLS=OFF -DCMAKE_BUILD_ALWAYS=0 -DBUILD_TESTS_CUDA=OFF -DCMAKE_POLICY_VERSION_MINIMUM=3.5"
if [[ "${{ runner.os }}" == "macOS" ]]; then
sudo xcode-select -s /Applications/Xcode_16.2.app/Contents/Developer
CMAKE_ARGS="$CMAKE_ARGS -DBoost_INCLUDE_DIR=$(brew --prefix boost)/include -DBoost_LIBRARY_DIR=$(brew --prefix boost)/lib"
fi

@@ -326,3 +328,9 @@ runs:
if: inputs.mesa == 'true' && runner.os == 'macOS'
shell: bash
run: brew install sirhcm/tinymesa/tinymesa_cpu

# *** tinydreno ***
- name: Install tinydreno (linux)
if: inputs.tinydreno == 'true' && runner.os == 'Linux'
shell: bash
run: sudo curl -fL https://github.com/sirhcm/tinydreno/raw/refs/heads/master/libllvm-qcom.so -o /usr/lib/libllvm-qcom.so
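The cache-key changes in this file add `${{ runner.arch }}` between the OS and the remaining key components. A minimal shell sketch of the new venv key layout follows; `venv_cache_key` and its arguments are hypothetical names used only for illustration, and the sample values (`X64`, `ARM64`, `testing`) are assumptions rather than values taken from this PR. It suggests why runners that share an OS but differ in architecture would no longer restore each other's cached `.venv`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: composes a key in the same order as the new
# "venv-<os>-<arch>-python-<version>-<deps>-<pydeps>-<CACHE_VERSION>" template.
venv_cache_key() {
  local os="$1" arch="$2" pyver="$3" deps="$4" pydeps="$5" cache_version="$6"
  printf 'venv-%s-%s-python-%s-%s-%s-%s\n' \
    "$os" "$arch" "$pyver" "$deps" "$pydeps" "$cache_version"
}

# Same OS, different architecture: the keys differ, so the caches stay separate.
venv_cache_key Linux X64 3.12 testing "" 1
venv_cache_key Linux ARM64 3.12 testing "" 1
```

With the old template, which had no architecture component, both calls would have produced the same key.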
50 changes: 30 additions & 20 deletions .github/workflows/autogen.yml
@@ -28,10 +28,11 @@ jobs:
timeout-minutes: 15
steps:
- name: Checkout Code
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: Setup Environment
uses: ./.github/actions/setup-tinygrad
with:
key: 'autogen'
opencl: 'true'
amd: 'true'
cuda: 'true'
@@ -43,29 +44,33 @@ jobs:
run: sudo apt-get install -y --no-install-recommends libclang-20-dev llvm-20-dev hip-dev libusb-1.0-0-dev libdrm-dev
- name: Regenerate autogen files
run: |
find tinygrad/runtime/autogen -type f -name "*.py" -not -name "__init__.py" -not -name "comgr_3.py" -not -name "metal.py" -not -name "iokit.py" -not -name "corefoundation.py" -not -name "libclang.py" -delete
find tinygrad/runtime/autogen -type f -name "*.py" -not -path "*/amd/*" -not -name "__init__.py" -not -name "comgr.py" -not -name "metal.py" -not -name "iokit.py" -not -name "corefoundation.py" -not -name "libclang.py" -delete
python3 -c "from tinygrad.runtime.autogen import opencl"
python3 -c "from tinygrad.runtime.autogen import cuda, nvrtc, nvjitlink, nv_570, nv_580, nv"
python3 -c "from tinygrad.runtime.autogen import comgr, hsa, hip, amd_gpu, sqtt, rocprof, amdgpu_kd, amdgpu_drm"
python3 -c "from tinygrad.runtime.autogen.am import am, pm4_soc15, pm4_nv, sdma_4_0_0, sdma_5_0_0, sdma_6_0_0, smu_v13_0_0, smu_v13_0_6, smu_v14_0_2"
python3 -c "from tinygrad.runtime.autogen import comgr_3, hsa, hip, amd_gpu, sqtt, rocprof, amdgpu_kd, amdgpu_drm"
python3 -c "from tinygrad.runtime.autogen.am import am, pm4_soc15, pm4_nv, sdma_4_0_0, sdma_5_0_0, sdma_6_0_0, smu_v13_0_0, smu_v13_0_6, smu_v13_0_12, smu_v14_0_2"
python3 -c "from tinygrad.runtime.autogen import libc, kfd, io_uring, ib, pci, vfio"
python3 -c "from tinygrad.runtime.autogen import llvm"
python3 -c "from tinygrad.runtime.autogen import webgpu"
python3 -c "from tinygrad.runtime.autogen import kgsl, qcom_dsp"
python3 -c "from tinygrad.runtime.autogen import libusb"
python3 -c "from tinygrad.runtime.autogen import mesa"
python3 -c "from tinygrad.runtime.autogen import avcodec"
python3 -c "from tinygrad.runtime.autogen import llvm_qcom"
python3 -c "from tinygrad.runtime.autogen import mlx5"
python3 -c "from tinygrad.runtime.autogen import ggml_common"
REGEN=1 python3 -c "from tinygrad.runtime.autogen import libclang"
- name: Check for differences
run: |
if ! git diff --quiet; then
git diff
git diff > autogen-ubuntu.patch
echo "Autogen files out of date. Apply patch from: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}#artifacts"
echo "Autogen mismatch detected. Patch available at: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}#artifacts"
exit 1
fi
- name: Upload patch artifact
if: failure()
uses: actions/upload-artifact@v4
uses: actions/upload-artifact@v7
with:
name: autogen-ubuntu-patch
path: autogen-ubuntu.patch
@@ -76,10 +81,11 @@ jobs:
timeout-minutes: 15
steps:
- name: Checkout Code
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: Setup Environment
uses: ./.github/actions/setup-tinygrad
with:
key: 'autogen-mac'
llvm: 'true'
- name: Regenerate autogen files
run: |
@@ -88,49 +94,53 @@ jobs:
- name: Check for differences
run: |
if ! git diff --quiet; then
git diff
git diff > autogen-macos.patch
echo "Autogen files out of date. Apply patch from: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}#artifacts"
echo "Autogen mismatch detected. Patch available at: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}#artifacts"
exit 1
fi
- name: Upload patch artifact
if: failure()
uses: actions/upload-artifact@v4
uses: actions/upload-artifact@v7
with:
name: autogen-macos-patch
path: autogen-macos.patch

autogen-comgr-3:
name: In-tree Autogen (comgr 3)
autogen-comgr-2:
name: In-tree Autogen (comgr 2)
runs-on: ubuntu-24.04
timeout-minutes: 15
steps:
- name: Checkout Code
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: Setup Environment
uses: ./.github/actions/setup-tinygrad
with:
key: 'autogen-comgr'
- name: Install autogen support packages
run: |
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
sudo tee /etc/apt/sources.list.d/rocm.list <<EOF
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.4 $(lsb_release -cs) main
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.2 $(lsb_release -cs) main
EOF
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt -qq update || true
sudo apt-get install -y --no-install-recommends libclang-20-dev comgr
- name: Regenerate autogen files
run: |
rm tinygrad/runtime/autogen/comgr_3.py
python3 -c "from tinygrad.runtime.autogen import comgr_3"
rm tinygrad/runtime/autogen/comgr.py
python3 -c "from tinygrad.runtime.autogen import comgr"
- name: Check for differences
run: |
if ! git diff --quiet; then
git diff > autogen-comgr3.patch
echo "Autogen files out of date. Apply patch from: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}#artifacts"
git diff
git diff > autogen-comgr2.patch
echo "Autogen mismatch detected. Patch available at: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}#artifacts"
exit 1
fi
- name: Upload patch artifact
if: failure()
uses: actions/upload-artifact@v4
uses: actions/upload-artifact@v7
with:
name: autogen-comgr3-patch
path: autogen-comgr3.patch
name: autogen-comgr2-patch
path: autogen-comgr2.patch
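After this change, the three "Check for differences" steps in autogen.yml share one pattern: print the diff to the job log, save it as a patch file, report the mismatch, and fail. The sketch below expresses that pattern as a shell function and exercises it in a throwaway repository. `check_autogen_diff`, the patch file names, and the demo file are hypothetical and not part of the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical function mirroring the "Check for differences" steps.
check_autogen_diff() {
  local patch_file="$1"
  if ! git diff --quiet; then
    git diff                    # show the diff in the job log
    git diff > "$patch_file"    # save it so it can be uploaded as an artifact
    echo "Autogen mismatch detected. Patch written to: $patch_file"
    return 1
  fi
  return 0
}

# Demo in a temporary repository (assumes git is installed).
repo_dir="$(mktemp -d)"
patch_dir="$(mktemp -d)"
cd "$repo_dir" || exit 1
git init -q .
echo "generated = 1" > autogen.py
git add autogen.py
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

check_autogen_diff "$patch_dir/clean.patch" && echo "clean tree: no patch written"
echo "generated = 2" > autogen.py    # simulate regenerated output that differs
check_autogen_diff "$patch_dir/dirty.patch" || echo "mismatch reported"
```

Note that `git diff --quiet` only looks at tracked files, so in this sketch a newly generated file that was never added to the repository would not trigger the failure path.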