zenlm · hanzo-dev · May 30, 2026
diff --git a/docs/DEEPSEEKV2.md b/docs/DEEPSEEKV2.md
@@ -1,6 +1,6 @@
-# DeepSeek V2: [`deepseek-ai/DeepSeek-V2-Lite`](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite)
+# V2: [`deepseek-ai/DeepSeek-V2-Lite`](https://huggingface.co/deepseek-ai/-Lite)
 
-The DeepSeek V2 is a mixture of expert (MoE) model featuring ["Multi-head Latent Attention"](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite#5-model-architecture).
+The V2 is a mixture of expert (MoE) model featuring ["Multi-head Latent Attention"](https://huggingface.co/deepseek-ai/-Lite#5-model-architecture).
 
 - Context length of **32k tokens** (Lite model), **128k tokens** (full model)
 - 64 routed experts (Lite model), 160 routed experts (full model)

diff --git a/docs/DEEPSEEKV3.md b/docs/DEEPSEEKV3.md
@@ -1,6 +1,6 @@
-# DeepSeek V3: [`deepseek-ai/DeepSeek-V3`](https://huggingface.co/deepseek-ai/DeepSeek-V3), [`deepseek-ai/DeepSeek-R1`](https://huggingface.co/deepseek-ai/DeepSeek-R1)
+# V3: [`deepseek-ai/DeepSeek-V3`](https://huggingface.co/deepseek-ai/), [`deepseek-ai/DeepSeek-R1`](https://huggingface.co/deepseek-ai/-R1)
 
-The DeepSeek V3 is a mixture of expert (MoE) model.
+The V3 is a mixture of expert (MoE) model.
 
 ```
 ./mistralrs-server --isq 4 -i plain -m deepseek-ai/DeepSeek-R1

diff --git a/docs/ISQ.md b/docs/ISQ.md
@@ -48,8 +48,8 @@ When using ISQ, it will automatically load ISQ-able weights into CPU memory befo
 
 For Mixture of Expert models, a method called [MoQE](https://arxiv.org/abs/2310.02410) can be applied to only quantize MoE layers. This is configured via the ISQ "organization" parameter in all APIs. The following models support MoQE:
 - [Phi 3.5 MoE](PHI3.5MOE.md)
-- [DeepSeek V2](DEEPSEEKV2.md)
-- [DeepSeek V3 / DeepSeek R1](DEEPSEEKV3.md)
+- [ V2](DEEPSEEKV2.md)
+- [ V3 / R1](DEEPSEEKV3.md)
 
 ## Accuracy
 

diff --git a/docs/QWEN2VL.md b/docs/QWEN2VL.md
@@ -1,6 +1,6 @@
-# Qwen 2 Vision Model: [`Qwen2-VL Collection`](https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d)
+# 2 Vision Model: [`Qwen2-VL Collection`](https://huggingface.co/collections//qwen2-vl-66cee7455501d7126940800d)
 
-Mistral.rs supports the Qwen2-VL vision model family, with examples in the Rust, Python, and HTTP APIs. ISQ quantization is supported to allow running the model with less memory requirements.
+Mistral.rs supports the -VL vision model family, with examples in the Rust, Python, and HTTP APIs. ISQ quantization is supported to allow running the model with less memory requirements.
 
 UQFF quantizations are also available.
 
@@ -14,7 +14,7 @@ The Rust API takes an image from the [image](https://docs.rs/image/latest/image/
 > Note: When using device mapping or model topology, only the text model and its layers will be managed. This is because it contains most of the model parameters. *The text model has 28 layers*.
 
 ## ToC
-- [Qwen 2 Vision Model: `Qwen2-VL Collection`](#qwen-2-vision-model-qwen2-vl-collection)
+- [ 2 Vision Model: `Qwen2-VL Collection`](#qwen-2-vision-model-qwen2-vl-collection)
   - [ToC](#toc)
   - [Interactive mode](#interactive-mode)
   - [HTTP server](#http-server)
@@ -25,7 +25,7 @@ The Rust API takes an image from the [image](https://docs.rs/image/latest/image/
 
 Mistral.rs supports interactive mode for vision models! It is an easy way to interact with the model.
 
-1) Start up interactive mode with the Qwen2-VL model
+1) Start up interactive mode with the -VL model
 
 > [!NOTE]
 > You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.

diff --git a/docs/QWEN3.md b/docs/QWEN3.md
@@ -1,6 +1,6 @@
-# Qwen 3: [`collection`](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f)
+# 3: [`collection`](https://huggingface.co/collections//qwen3-67dd247413f0e2e4f653967f)
 
-The Qwen 3 family is a collection of hybrid reasoning MoE and non-MoE models ranging from 0.6b to 235b parameters.
+The 3 family is a collection of hybrid reasoning MoE and non-MoE models ranging from 0.6b to 235b parameters.
 
 ```
 ./mistralrs-server --isq 4 -i plain -m Qwen/Qwen3-8B
@@ -12,7 +12,7 @@ The Qwen 3 family is a collection of hybrid reasoning MoE and non-MoE models ran
 > Note: tool calling support is fully implemented for the Qwen 3 models, including agentic web search.
 
 ## Enabling thinking
-The Qwen 3 models are hybrid reasoning models which can be controlled at inference-time. **By default, reasoning is enabled for these models.** To dynamically control this, it is recommended to either add `/no_think` or `/think` to your prompt. Alternatively, you can specify the `enable_thinking` flag as detailed by the API-specific examples.
+The 3 models are hybrid reasoning models which can be controlled at inference-time. **By default, reasoning is enabled for these models.** To dynamically control this, it is recommended to either add `/no_think` or `/think` to your prompt. Alternatively, you can specify the `enable_thinking` flag as detailed by the API-specific examples.
 
 ## HTTP API
 You can find a more detailed example demonstrating enabling/disabling thinking [here](../examples/server/qwen3.py).

diff --git a/docs/README.md b/docs/README.md
@@ -11,15 +11,15 @@
 - [Phi 3.5 MoE](PHI3.5MOE.md)
 - [Phi 3.5 Vision](PHI3V.md)
 - [Llama 3.2 Vision](VLLAMA.md)
-- [Qwen2-VL](QWEN2VL.md)
+- [-VL](QWEN2VL.md)
 - [Idefics 3 and Smol VLM](IDEFICS3.md)
-- [DeepSeek V2](DEEPSEEKV2.md)
-- [DeepSeek V3](DEEPSEEKV3.md)
+- [ V2](DEEPSEEKV2.md)
+- [ V3](DEEPSEEKV3.md)
 - [MiniCPM-O 2.6](MINICPMO_2_6.md)
 - [Gemma 3](GEMMA3.md)
 - [Mistral 3](MISTRAL3.md)
 - [Llama 4](LLAMA4.md)
-- [Qwen 3](QWEN3.md)
+- [ 3](QWEN3.md)
 
 ## Adapters
 - [Docs](ADAPTER_MODELS.md)

diff --git a/docs/TOOL_CALLING.md b/docs/TOOL_CALLING.md
@@ -20,8 +20,8 @@ We support the following models' tool calling in OpenAI-compatible and parse nat
 - Mistral Nemo
 - Hermes 2 Pro
 - Hermes 3
-- DeepSeek V2/V3/R1
-- Qwen 3
+- V2/V3/R1
+- 3
 
 All models that support tool calling will respond according to the OpenAI tool calling API.
 

diff --git a/docs/VISION_MODELS.md b/docs/VISION_MODELS.md
@@ -8,7 +8,7 @@ Please see docs for the following model types:
 - Idefics2: [IDEFICS2.md](IDEFICS2.md)
 - LLaVA and LLaVANext: [LLAVA.md](LLaVA.md)
 - Llama 3.2 Vision: [VLLAMA.md](VLLAMA.md)
-- Qwen2-VL: [QWEN2VL.md](QWEN2VL.md)
+- -VL: [QWEN2VL.md](QWEN2VL.md)
 - Idefics 3 and Smol VLM: [IDEFICS3.md](IDEFICS3.md)
 - Phi 4 Multimodal: [PHI4MM.md](PHI4MM.md)
 

diff --git a/docs/WEB_SEARCH.md b/docs/WEB_SEARCH.md
@@ -7,7 +7,7 @@ This works with all models that support [tool calling](TOOL_CALLING.md). However
 - Hermes 3 3b/8b
 - Mistral 3 24b
 - Llama 4 Scout/Maverick
-- Qwen 3 (⭐ Recommended!)
+- 3 (⭐ Recommended!)
 
 Web search is supported both in streaming and completion responses! This makes it easy to integrate and test out in interactive mode!