Releases: defilantech/LLMKube

v0.7.0

18 Apr 07:38
237a4d8

0.7.0 (2026-04-18)

⚠ BREAKING CHANGES

  • sharding: sharding.strategy: tensor on a Model now correctly maps to llama.cpp's --split-mode row instead of silently falling back to --split-mode layer. Configs that set strategy: tensor expecting layer behavior may see performance regressions or new failure modes under concurrent load (particularly on consumer PCIe multi-GPU setups with quantized models). Explicitly set strategy: layer to retain the previous behavior. (#291)
  • vllm: InferenceService spec.extraArgs is now forwarded to the vLLM runtime. Previously extraArgs was silently ignored when runtime: vllm. Configs that placed llama.cpp-only flags in extraArgs on a vLLM InferenceService will start failing at pod startup. Audit any vLLM InferenceService that sets extraArgs before upgrading. (#291)
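Both changes above affect manifests rather than code. As a sketch (the `apiVersion` and surrounding fields are assumptions, not taken from the release notes; only `sharding.strategy`, `runtime`, and `extraArgs` are named there), pinning the old split behavior and auditing vLLM flags might look like:

```yaml
# Sketch: pin the pre-0.7.0 behavior explicitly (strategy: tensor now
# really means --split-mode row). apiVersion is assumed.
apiVersion: llmkube.dev/v1alpha1
kind: Model
metadata:
  name: llama-70b
spec:
  sharding:
    strategy: layer   # explicit: keeps the old --split-mode layer behavior
---
# Sketch: extraArgs are now forwarded to vLLM, so llama.cpp-only flags
# here will fail at pod startup after upgrading.
apiVersion: llmkube.dev/v1alpha1
kind: InferenceService
metadata:
  name: qwen-vllm
spec:
  runtime: vllm
  extraArgs:
    - "--max-model-len=8192"   # a vLLM flag: forwarded and accepted
    # - "--ctx-size=8192"      # llama.cpp-only flag: would now fail startup
```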

Features

  • add hybrid GPU/CPU offloading support for MoE models (#281) (2287f66)
  • add tensor overrides and batch size controls for hybrid offloading (#283) (8be4adc)
  • expose additional runtime controls for llama.cpp and vllm (#291) (2245718)
  • recognize runtime-resolved sources (HF repo IDs) in Model controller (#293) (953e8a7)
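The dedicated fields added by #283 for tensor overrides and batch size are not documented in these notes, so as a non-authoritative sketch the same hybrid MoE offload can be expressed through `extraArgs` with llama.cpp flags (all field names besides `extraArgs` are assumptions):

```yaml
# Sketch: keep MoE expert tensors on CPU while the rest runs on GPU,
# using llama.cpp flags via the extraArgs escape hatch. apiVersion assumed.
apiVersion: llmkube.dev/v1alpha1
kind: InferenceService
metadata:
  name: mixtral-hybrid
spec:
  extraArgs:
    - "--override-tensor=exps=CPU"   # llama.cpp: route expert tensors to CPU
    - "--batch-size=512"             # llama.cpp logical batch size
```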

Bug Fixes

  • inherit runAsUser/runAsGroup from podSecurityContext (#274) (72b9b5c)

Documentation

  • surface breaking behavior changes for 0.7.0 (#294) (e234a40)

llmkube-0.7.0

18 Apr 07:39
237a4d8

A Helm chart for LLMKube, a Kubernetes operator for GPU-accelerated LLM inference

v0.6.0

08 Apr 00:52
02a9242

0.6.0 (2026-04-08)

⚠ BREAKING CHANGES

  • update default CUDA image to server-cuda13 for Qwen3.5 and Blackwell support (#262)

Features

  • add first-class PersonaPlex (Moshi) runtime backend (#272) (2b1c948)
  • add Grafana inference metrics dashboard (#269) (be376c6)
  • add HPA autoscaling for InferenceService (#260) (2d16502)
  • add pluggable runtime backends for non-llama.cpp inference engines (#271) (bb1576c)
  • add vLLM and TGI runtime backends with per-runtime HPA metrics (#273) (441c7c7)
  • separate image registry from repository in Helm chart (#268) (5c059a4)
  • support custom layer splits from GPUShardingSpec (#267) (a37701c)
  • update default CUDA image to server-cuda13 for Qwen3.5 and Blackwell support (#262) (cc9a95e)

llmkube-0.6.0

08 Apr 00:52
02a9242

A Helm chart for LLMKube, a Kubernetes operator for GPU-accelerated LLM inference

v0.5.3

01 Apr 17:10
86f9bbe

0.5.3 (2026-04-01)

Features

  • add KV cache type configuration and extraArgs escape hatch (#256) (7a4b855)
  • add Ollama as runtime backend for Metal agent (#258) (6148b89)
  • add oMLX as alternative runtime backend for Metal agent (#257) (eaf9045)
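The extraArgs escape hatch from #256 passes flags straight to the runtime, which is useful for llama.cpp options not yet modeled in the CRD such as KV cache types. A minimal sketch, assuming the `apiVersion` and everything except `extraArgs` (the `--cache-type-k`/`--cache-type-v` flags are standard llama.cpp server options):

```yaml
# Sketch: quantize the llama.cpp KV cache via the extraArgs escape hatch.
apiVersion: llmkube.dev/v1alpha1   # assumed
kind: InferenceService
metadata:
  name: mistral-gguf
spec:
  extraArgs:
    - "--cache-type-k=q8_0"   # llama.cpp: quantized K cache
    - "--cache-type-v=q8_0"   # llama.cpp: quantized V cache
```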

llmkube-0.5.3

01 Apr 17:10
86f9bbe

A Helm chart for LLMKube, a Kubernetes operator for GPU-accelerated LLM inference

v0.5.2

28 Mar 02:36
eed8274

0.5.2 (2026-03-27)

Features

  • add pod security context defaults and CRD overrides (#239) (904432b)

llmkube-0.5.2

28 Mar 02:36
eed8274

A Helm chart for LLMKube, a Kubernetes operator for GPU-accelerated LLM inference

v0.5.1

16 Mar 06:48
4a22006

0.5.1 (2026-03-16)

Features

  • add memory pressure watchdog with runtime monitoring (#216) (5fa6d54)
  • add pvc:// model source and SHA256 integrity verification (#229) (1b94f5d)
  • auto-detect llama-server from Homebrew paths on macOS (#215) (a1e4302)
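Only the `pvc://` scheme and SHA256 verification come from #229; the exact spec field names are not shown in these notes. A hedged sketch of what a cluster-local model source with integrity checking might look like (all field names, the path, and the digest are illustrative assumptions):

```yaml
# Sketch: load a model from an in-cluster PVC and verify its digest.
apiVersion: llmkube.dev/v1alpha1   # assumed
kind: Model
metadata:
  name: phi-local
spec:
  source: pvc://models-pvc/phi-3-mini.gguf   # hypothetical PVC name and path
  sha256: "0000000000000000000000000000000000000000000000000000000000000000"  # placeholder digest
```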

Bug Fixes

  • controller metrics port declarations and ServiceMonitor consistency (#214) (296ec99)
  • correct CHANGELOG entry from 0.4.21 to 0.5.0 (#212) (f7f703a)
  • quote job-level if expression to fix YAML parsing in helm-chart workflow (8714b9f)

llmkube-0.5.1

16 Mar 06:48
4a22006

A Helm chart for LLMKube, a Kubernetes operator for GPU-accelerated LLM inference