refactor(compute): consolidate quantized matmul methods by dndungu · Pull Request #76 · zerfoo/ztensor

dndungu · 2026-04-06T21:07:57Z

Summary

Extract 6 shared helper functions from 14 copy-paste quantized matmul methods into new file compute/gpu_engine_matmul.go
Rewrite matMulQ4, matMulQ4K, matMulQ5_0, matMulQ5K, matMulQ6K, matMulQ8, matMulBF16 and their BWeight variants to use shared helpers
matMulMmap/matMulMmapB left unchanged (meta-dispatchers with different pattern)
Net reduction: -557 lines (4318→3521 in gpu_engine.go, +240 in new file)
Zero exported API changes, all method signatures preserved

Closes E63 tasks T63.1.1, T63.1.2.

Test plan

go build ./... passes
go vet ./compute/ clean (no new warnings)
go test ./compute/ -timeout 120s passes
DGX Spark benchmark for Q4_K, Q5_0, Q8, BF16 (T63.2.1)
Full ztensor test suite with race detector (T63.2.2)
zerfoo inference parity tests (T63.2.3)

Consolidate repeated patterns from 14 quantized matmul methods into 6 shared helpers: uploadRawBytes, aShapeCheck2D, bweightShapeMKN, quantGemvResult, dequantSgemm, and sgemmNTOrFallback. Each original method is now a thin wrapper calling these helpers, reducing gpu_engine.go by 797 lines (net -557 across both files). Zero behavioral changes -- all method signatures remain identical.

…ature

The GemvQ5_0F32 kernel was updated to accept qhOffset and qsOffset parameters for the GPU-separated layout, but the test still called with the old 6-arg signature. Fix all 3 call sites:
- TestGemvQ5_0F32_Parity
- TestGemvQ5_0F32_MultipleSizes
- BenchmarkGemvQ5_0F32_4096

Add q5_0ToGPULayout helper to convert standard block format to the GPU-separated layout (scales | qh | qs) needed by the kernel.

dndungu added 3 commits April 6, 2026 14:06

fix(ci): exclude metal and pjrt from go vet

601a677

These packages use unsafe.Pointer for GPU/accelerator runtime bindings via purego/dlopen, same as cuda/hip/opencl. The pjrt package was added after the initial CI exclusion list.

dndungu merged commit 5a7fdc3 into main Apr 6, 2026
1 check passed

dndungu deleted the e63-matmul-dispatcher branch April 6, 2026 21:15

dndungu merged 3 commits into main from e63-matmul-dispatcher
