v1.2.0

github-actions released this 02 Apr 07:26

1.2.0 (2026-04-01)

Features

  • cuda: add Q6_K, Q5_K, Q5_0 GPU dequant kernels for M>1 prefill (d57e37e)
  • cuda: add Q8 Gather kernel for GPU embedding lookup (30eb9c4)
  • tensor: add QuantizeQ4K for float32 to Q4_K quantization (d0d3a82)

Bug Fixes

  • compute: add Q4KStorage to UploadWeights F32 skip list (cc071b6)
  • compute: CPU dequant fallback for Q4_K when K%256!=0 (f50ffa7)
  • compute: use dequant+cuBLAS for Q4_K when K%256!=0 (5f21cbb)
  • compute: use pool-backed GPUStorage for pool allocations (4367330)
  • cuda: byte-wise loads in Q5_0 GEMV for ARM64 alignment (5f19e54)
  • kernels: check null function pointer in FusedSoftmaxVMulF32 (935ad61)
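The two K%256 fixes above concern shapes where the inner dimension is not a multiple of the Q4_K super-block size (256 weights per super-block in the GGML format). A kernel that walks whole super-blocks presumably cannot be used for such shapes, so the weights are dequantized first and a dense matmul is run instead. The following is a minimal Python sketch of that dispatch decision only; the function name and the returned path labels are hypothetical and are not taken from the project's code.

```python
QK_K = 256  # weights per Q4_K super-block in the GGML format


def q4k_matmul_path(k: int) -> str:
    """Pick a matmul path for a Q4_K weight matrix with inner dimension k.

    The path names are illustrative labels, not real function names.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if k % QK_K == 0:
        # Every row splits into whole super-blocks: a fused quantized
        # kernel can consume the packed data directly.
        return "fused_q4k"
    # Otherwise dequantize to float32 first, then run a dense matmul
    # (cuBLAS on GPU, or a plain CPU loop as a fallback).
    return "dequant_then_dense"


if __name__ == "__main__":
    for k in (4096, 1000):
        print(k, q4k_matmul_path(k))
```

Under these assumptions, K = 4096 takes the fused path and K = 1000 takes the dequantize-first path.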

Performance Improvements

  • cuda: separated GPU layout for Q5_0 GEMV (d456c39)
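Both Q5_0 GEMV entries in this release (the ARM64 alignment fix and the separated layout) are easier to read with the block format in view. In the GGML Q5_0 format, a block of 32 weights occupies 22 bytes: a 2-byte fp16 scale, 4 bytes of high bits, and 16 bytes of packed low nibbles. Because 22 is not a multiple of 4, the 4-byte high-bit field is generally not 4-byte aligned, which is a plausible reason to assemble it from individual bytes. The sketch below is a Python illustration of that decoding, not the project's CUDA kernel; the function name is hypothetical.

```python
import struct

Q5_0_BLOCK_SIZE = 32   # weights per block
Q5_0_BLOCK_BYTES = 22  # 2 (fp16 scale) + 4 (high bits) + 16 (low nibbles)


def dequant_q5_0_block(buf: bytes, offset: int = 0) -> list:
    """Dequantize one Q5_0 block starting at `offset` in `buf`."""
    # fp16 scale, little-endian
    (d,) = struct.unpack_from("<e", buf, offset)
    # Assemble the 32 high bits one byte at a time instead of issuing a
    # single 32-bit load from a possibly misaligned address.
    b = buf[offset + 2 : offset + 6]
    qh = b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24)
    qs = buf[offset + 6 : offset + Q5_0_BLOCK_BYTES]

    out = [0.0] * Q5_0_BLOCK_SIZE
    for j in range(16):
        hi0 = ((qh >> j) << 4) & 0x10   # fifth bit of weight j
        hi1 = (qh >> (j + 12)) & 0x10   # fifth bit of weight j + 16
        out[j] = (((qs[j] & 0x0F) | hi0) - 16) * d
        out[j + 16] = (((qs[j] >> 4) | hi1) - 16) * d
    return out
```

Storing the scales, high bits, and nibbles in separate arrays would give each field its natural alignment, which may be what the separated GPU layout does; the release notes do not say.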