refactor(compute): consolidate quantized matmul methods #76
Merged
Consolidate repeated patterns from 14 quantized matmul methods into 6 shared helpers: uploadRawBytes, aShapeCheck2D, bweightShapeMKN, quantGemvResult, dequantSgemm, and sgemmNTOrFallback. Each original method is now a thin wrapper calling these helpers, reducing gpu_engine.go by 797 lines (net -557 across both files). Zero behavioral changes -- all method signatures remain identical.
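The consolidation pattern described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `checkShape2D` and `matMulDims` are hypothetical names with assumed signatures, and the weight matrix is assumed to be stored as [N, K]. The point is only that validation duplicated across many methods moves into one helper that each wrapper calls.

```go
package main

import "fmt"

// checkShape2D is a hypothetical shared helper: it validates that a tensor
// shape is 2D with positive dimensions and returns (rows, cols).
func checkShape2D(name string, shape []int) (rows, cols int, err error) {
	if len(shape) != 2 {
		return 0, 0, fmt.Errorf("%s: expected 2D shape, got %d dims", name, len(shape))
	}
	if shape[0] <= 0 || shape[1] <= 0 {
		return 0, 0, fmt.Errorf("%s: non-positive dimension in %v", name, shape)
	}
	return shape[0], shape[1], nil
}

// matMulDims is a hypothetical helper that derives (M, K, N) for A[M,K] times
// a weight matrix assumed to be stored as B[N,K], checking that K agrees.
func matMulDims(aShape, bShape []int) (m, k, n int, err error) {
	m, k, err = checkShape2D("A", aShape)
	if err != nil {
		return 0, 0, 0, err
	}
	var bk int
	n, bk, err = checkShape2D("B", bShape)
	if err != nil {
		return 0, 0, 0, err
	}
	if bk != k {
		return 0, 0, 0, fmt.Errorf("inner dimension mismatch: A has K=%d, B has K=%d", k, bk)
	}
	return m, k, n, nil
}

func main() {
	// Each quantized matmul wrapper would call the shared helper first
	// instead of repeating the same checks inline.
	m, k, n, err := matMulDims([]int{2, 64}, []int{8, 64})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("M, K, N =", m, k, n)
}
```

With helpers of this kind, each original method keeps its signature and reduces to a few calls, which is consistent with the line counts reported above.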
…ature The GemvQ5_0F32 kernel was updated to accept qhOffset and qsOffset parameters for the GPU-separated layout, but the test still called it with the old 6-arg signature. Fix all 3 call sites:
- TestGemvQ5_0F32_Parity
- TestGemvQ5_0F32_MultipleSizes
- BenchmarkGemvQ5_0F32_4096

Add q5_0ToGPULayout helper to convert standard block format to the GPU-separated layout (scales | qh | qs) needed by the kernel.
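As a rough illustration of what such a layout conversion involves, here is a minimal sketch. It assumes the common Q5_0 block format of 22 bytes per 32 elements (a 2-byte fp16 scale, 4 bytes of high bits `qh`, and 16 bytes of packed low nibbles `qs`) and a tightly packed output with no alignment padding. The function name `toSeparatedLayout` and its return values are hypothetical; the actual q5_0ToGPULayout helper may differ, for example by padding each region.

```go
package main

import "fmt"

// Assumed Q5_0 block layout: fp16 scale, then qh, then qs.
const (
	scaleBytes = 2
	qhBytes    = 4
	qsBytes    = 16
	blockBytes = scaleBytes + qhBytes + qsBytes // 22
)

// toSeparatedLayout is a hypothetical helper that regroups interleaved
// blocks (scale, qh, qs per block) into three contiguous regions:
// all scales, then all qh words, then all qs bytes. It returns the new
// buffer and the byte offsets where the qh and qs regions begin.
func toSeparatedLayout(raw []byte) (out []byte, qhOffset, qsOffset int, err error) {
	if len(raw)%blockBytes != 0 {
		return nil, 0, 0, fmt.Errorf("input length %d is not a multiple of %d", len(raw), blockBytes)
	}
	nBlocks := len(raw) / blockBytes
	qhOffset = nBlocks * scaleBytes
	qsOffset = qhOffset + nBlocks*qhBytes
	out = make([]byte, len(raw))
	for i := 0; i < nBlocks; i++ {
		blk := raw[i*blockBytes : (i+1)*blockBytes]
		copy(out[i*scaleBytes:], blk[:scaleBytes])
		copy(out[qhOffset+i*qhBytes:], blk[scaleBytes:scaleBytes+qhBytes])
		copy(out[qsOffset+i*qsBytes:], blk[scaleBytes+qhBytes:])
	}
	return out, qhOffset, qsOffset, nil
}

func main() {
	// Two synthetic blocks: bytes 0..21 and 100..121.
	raw := make([]byte, 2*blockBytes)
	for i := 0; i < blockBytes; i++ {
		raw[i] = byte(i)
		raw[blockBytes+i] = byte(100 + i)
	}
	out, qhOff, qsOff, err := toSeparatedLayout(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("qhOffset:", qhOff, "qsOffset:", qsOff, "len:", len(out))
}
```

Offsets computed this way correspond to the kind of values a kernel taking qhOffset and qsOffset parameters would need, under the stated packing assumption.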
These packages use unsafe.Pointer for GPU/accelerator runtime bindings via purego/dlopen, the same as cuda/hip/opencl. The pjrt package was added after the initial CI exclusion list was written.
Summary
compute/gpu_engine_matmul.go
Closes E63 tasks T63.1.1, T63.1.2.
Test plan
- go build ./... passes
- go vet ./compute/ clean (no new warnings)
- go test ./compute/ -timeout 120s passes