README.md: 7 additions & 3 deletions
@@ -29,8 +29,10 @@ limitations under the License. -->
 
 
 ### 📢 News
-- 🎉 We recently have released our [xLLM Technical Report](https://arxiv.org/abs/2510.14686) on arXiv, providing comprehensive technical blueprints and implementation insights.
-
+- 2025-12-05: 🎉 We now support high-performance inference for the [GLM-4.5/GLM-4.6](https://github.com/zai-org/GLM-4.5/blob/main/README_zh.md) series models.
+- 2025-12-05: 🎉 We now support high-performance inference for the [VLM-R1](https://github.com/om-ai-lab/VLM-R1) model.
+- 2025-12-05: 🎉 We have built hybrid KV cache management based on [Mooncake](https://github.com/kvcache-ai/Mooncake), supporting global KV cache management with intelligent offloading and prefetching.
+- 2025-10-16: 🎉 We recently released our [xLLM Technical Report](https://arxiv.org/abs/2510.14686) on arXiv, providing comprehensive technical blueprints and implementation insights.
 
 
 ## 1. Project Overview
@@ -112,6 +114,8 @@ Supported models list:
 - Qwen2.5-VL
 - Qwen3 / Qwen3-MoE
 - Qwen3-VL / Qwen3-VL-MoE
+- GLM4.5 / GLM4.6
+- VLM-R1
 
 ---
 
@@ -244,4 +248,4 @@ If you think this repository is helpful to you, welcome to cite us: