refactor(compute): consolidate quantized matmul methods#76

Merged
dndungu merged 3 commits into main from e63-matmul-dispatcher
Apr 6, 2026

@dndungu dndungu commented Apr 6, 2026

Summary

  • Extract 6 shared helper functions from 14 copy-pasted quantized matmul methods into a new file, compute/gpu_engine_matmul.go
  • Rewrite matMulQ4, matMulQ4K, matMulQ5_0, matMulQ5K, matMulQ6K, matMulQ8, matMulBF16 and their BWeight variants to use the shared helpers
  • matMulMmap/matMulMmapB left unchanged (meta-dispatchers with a different pattern)
  • Net reduction: -557 lines (4318→3521 in gpu_engine.go, +240 in new file)
  • Zero exported API changes, all method signatures preserved

Closes E63 tasks T63.1.1, T63.1.2.

Test plan

  • go build ./... passes
  • go vet ./compute/ clean (no new warnings)
  • go test ./compute/ -timeout 120s passes
  • DGX Spark benchmark for Q4_K, Q5_0, Q8, BF16 (T63.2.1)
  • Full ztensor test suite with race detector (T63.2.2)
  • zerfoo inference parity tests (T63.2.3)

dndungu added 3 commits April 6, 2026 14:06
Consolidate repeated patterns from 14 quantized matmul methods into 6
shared helpers: uploadRawBytes, aShapeCheck2D, bweightShapeMKN,
quantGemvResult, dequantSgemm, and sgemmNTOrFallback. Each original
method is now a thin wrapper calling these helpers, reducing
gpu_engine.go by 797 lines (net -557 across both files).

Zero behavioral changes -- all method signatures remain identical.
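
As a rough illustration of the "thin wrapper over shared helpers" shape described above, here is a CPU-only toy in Go. Everything in it is an assumption made for illustration: `shapeCheck2D`, `dequantThenGemm`, `toyMatMulQ8`, and `toyMatMulBF16` are hypothetical names and signatures, loosely modeled on the idea behind aShapeCheck2D and dequantSgemm, and are not the helpers in compute/gpu_engine_matmul.go. The toy formats (int8 with one global scale, raw bfloat16 bits) are also simplifications.

```go
package main

import (
	"fmt"
	"math"
)

// shapeCheck2D (hypothetical) validates that A is a 2-D [m, k] tensor.
func shapeCheck2D(shape []int) (m, k int, err error) {
	if len(shape) != 2 {
		return 0, 0, fmt.Errorf("expected 2-D A, got %d dims", len(shape))
	}
	return shape[0], shape[1], nil
}

// dequantThenGemm (hypothetical) holds the logic every format shares:
// dequantize B to float32, then compute row-major C[m,n] = A[m,k] * B[k,n].
func dequantThenGemm(a []float32, m, k, n int, dequant func() []float32) ([]float32, error) {
	b := dequant()
	if len(a) != m*k || len(b) != k*n {
		return nil, fmt.Errorf("size mismatch: len(a)=%d len(b)=%d", len(a), len(b))
	}
	c := make([]float32, m*n)
	for i := 0; i < m; i++ {
		for p := 0; p < k; p++ {
			av := a[i*k+p]
			for j := 0; j < n; j++ {
				c[i*n+j] += av * b[p*n+j]
			}
		}
	}
	return c, nil
}

// toyMatMulQ8 is a thin wrapper: only the dequantization step is format-specific.
// Toy format: int8 values sharing one scale (real Q8 formats use per-block scales).
func toyMatMulQ8(a []float32, aShape []int, q []int8, scale float32, n int) ([]float32, error) {
	m, k, err := shapeCheck2D(aShape)
	if err != nil {
		return nil, err
	}
	return dequantThenGemm(a, m, k, n, func() []float32 {
		out := make([]float32, len(q))
		for i, v := range q {
			out[i] = float32(v) * scale
		}
		return out
	})
}

// toyMatMulBF16 is a second wrapper over the same helpers.
// A bfloat16 value is the upper 16 bits of the corresponding float32.
func toyMatMulBF16(a []float32, aShape []int, b []uint16, n int) ([]float32, error) {
	m, k, err := shapeCheck2D(aShape)
	if err != nil {
		return nil, err
	}
	return dequantThenGemm(a, m, k, n, func() []float32 {
		out := make([]float32, len(b))
		for i, v := range b {
			out[i] = math.Float32frombits(uint32(v) << 16)
		}
		return out
	})
}
```

In this sketch each wrapper keeps its own signature and contributes only a dequantization closure, which is one way a refactor of this kind can shrink the per-format methods without changing how they are called.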
…ature

The GemvQ5_0F32 kernel was updated to accept qhOffset and qsOffset
parameters for the GPU-separated layout, but the test still called
with the old 6-arg signature. Fix all 3 call sites:
- TestGemvQ5_0F32_Parity
- TestGemvQ5_0F32_MultipleSizes
- BenchmarkGemvQ5_0F32_4096

Add q5_0ToGPULayout helper to convert standard block format to the
GPU-separated layout (scales | qh | qs) needed by the kernel.
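
To make the layout conversion concrete, the following Go sketch regroups interleaved Q5_0 blocks into three contiguous regions (scales | qh | qs). It is not the q5_0ToGPULayout helper from this PR: `splitQ5_0` is a hypothetical name, and the sketch assumes the ggml-style Q5_0 block (2-byte fp16 scale, 4 bytes of high bits, 16 bytes of packed low nibbles, 22 bytes per 32 values), byte offsets, and no padding or alignment between regions. The real helper and kernel may differ on any of these points.

```go
package main

import "fmt"

const (
	q5ScaleBytes = 2  // fp16 scale per block (assumed ggml-style layout)
	q5QhBytes    = 4  // one high bit for each of the 32 values
	q5QsBytes    = 16 // 32 low nibbles, two per byte
	q5BlockBytes = q5ScaleBytes + q5QhBytes + q5QsBytes
)

// splitQ5_0 (hypothetical) copies interleaved blocks [d|qh|qs][d|qh|qs]...
// into one buffer laid out as [all d][all qh][all qs]. It returns the buffer
// and the byte offsets at which the qh and qs regions start.
func splitQ5_0(raw []byte) (out []byte, qhOffset, qsOffset int, err error) {
	if len(raw)%q5BlockBytes != 0 {
		return nil, 0, 0, fmt.Errorf("len %d is not a multiple of %d", len(raw), q5BlockBytes)
	}
	nb := len(raw) / q5BlockBytes
	qhOffset = nb * q5ScaleBytes
	qsOffset = qhOffset + nb*q5QhBytes
	out = make([]byte, len(raw))
	for b := 0; b < nb; b++ {
		blk := raw[b*q5BlockBytes : (b+1)*q5BlockBytes]
		copy(out[b*q5ScaleBytes:], blk[:q5ScaleBytes])
		copy(out[qhOffset+b*q5QhBytes:], blk[q5ScaleBytes:q5ScaleBytes+q5QhBytes])
		copy(out[qsOffset+b*q5QsBytes:], blk[q5ScaleBytes+q5QhBytes:])
	}
	return out, qhOffset, qsOffset, nil
}
```

A test helper along these lines would return the two offsets so that a call site can pass them as the extra qhOffset and qsOffset arguments; the total byte count is unchanged, only the grouping differs.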

These packages use unsafe.Pointer for GPU/accelerator runtime
bindings via purego/dlopen, same as cuda/hip/opencl. The pjrt
package was added after the initial CI exclusion list.
@dndungu dndungu merged commit 5a7fdc3 into main Apr 6, 2026
1 check passed
@dndungu dndungu deleted the e63-matmul-dispatcher branch April 6, 2026 21:15