Merged

4 changes: 2 additions & 2 deletions docs/DEEPSEEKV2.md
@@ -1,6 +1,6 @@
-# DeepSeek V2: [`deepseek-ai/DeepSeek-V2-Lite`](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite)
+# V2: [`deepseek-ai/DeepSeek-V2-Lite`](https://huggingface.co/deepseek-ai/-Lite)

-The DeepSeek V2 is a mixture of expert (MoE) model featuring ["Multi-head Latent Attention"](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite#5-model-architecture).
+The V2 is a mixture of expert (MoE) model featuring ["Multi-head Latent Attention"](https://huggingface.co/deepseek-ai/-Lite#5-model-architecture).

 - Context length of **32k tokens** (Lite model), **128k tokens** (full model)
 - 64 routed experts (Lite model), 160 routed experts (full model)
4 changes: 2 additions & 2 deletions docs/DEEPSEEKV3.md
@@ -1,6 +1,6 @@
-# DeepSeek V3: [`deepseek-ai/DeepSeek-V3`](https://huggingface.co/deepseek-ai/DeepSeek-V3), [`deepseek-ai/DeepSeek-R1`](https://huggingface.co/deepseek-ai/DeepSeek-R1)
+# V3: [`deepseek-ai/DeepSeek-V3`](https://huggingface.co/deepseek-ai/), [`deepseek-ai/DeepSeek-R1`](https://huggingface.co/deepseek-ai/-R1)

-The DeepSeek V3 is a mixture of expert (MoE) model.
+The V3 is a mixture of expert (MoE) model.

 ```
 ./mistralrs-server --isq 4 -i plain -m deepseek-ai/DeepSeek-R1
4 changes: 2 additions & 2 deletions docs/ISQ.md
@@ -48,8 +48,8 @@ When using ISQ, it will automatically load ISQ-able weights into CPU memory befo

 For Mixture of Expert models, a method called [MoQE](https://arxiv.org/abs/2310.02410) can be applied to only quantize MoE layers. This is configured via the ISQ "organization" parameter in all APIs. The following models support MoQE:
 - [Phi 3.5 MoE](PHI3.5MOE.md)
-- [DeepSeek V2](DEEPSEEKV2.md)
-- [DeepSeek V3 / DeepSeek R1](DEEPSEEKV3.md)
+- [ V2](DEEPSEEKV2.md)
+- [ V3 / R1](DEEPSEEKV3.md)

 ## Accuracy

8 changes: 4 additions & 4 deletions docs/QWEN2VL.md
@@ -1,6 +1,6 @@
-# Qwen 2 Vision Model: [`Qwen2-VL Collection`](https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d)
+# 2 Vision Model: [`Qwen2-VL Collection`](https://huggingface.co/collections//qwen2-vl-66cee7455501d7126940800d)

-Mistral.rs supports the Qwen2-VL vision model family, with examples in the Rust, Python, and HTTP APIs. ISQ quantization is supported to allow running the model with less memory requirements.
+Mistral.rs supports the -VL vision model family, with examples in the Rust, Python, and HTTP APIs. ISQ quantization is supported to allow running the model with less memory requirements.

 UQFF quantizations are also available.

@@ -14,7 +14,7 @@ The Rust API takes an image from the [image](https://docs.rs/image/latest/image/
 > Note: When using device mapping or model topology, only the text model and its layers will be managed. This is because it contains most of the model parameters. *The text model has 28 layers*.

 ## ToC
-- [Qwen 2 Vision Model: `Qwen2-VL Collection`](#qwen-2-vision-model-qwen2-vl-collection)
+- [ 2 Vision Model: `Qwen2-VL Collection`](#qwen-2-vision-model-qwen2-vl-collection)
 - [ToC](#toc)
 - [Interactive mode](#interactive-mode)
 - [HTTP server](#http-server)
@@ -25,7 +25,7 @@ The Rust API takes an image from the [image](https://docs.rs/image/latest/image/

 Mistral.rs supports interactive mode for vision models! It is an easy way to interact with the model.

-1) Start up interactive mode with the Qwen2-VL model
+1) Start up interactive mode with the -VL model

 > [!NOTE]
 > You should replace `--features ...` with one of the features specified [here](../README.md#supported-accelerators), or remove it for pure CPU inference.
6 changes: 3 additions & 3 deletions docs/QWEN3.md
@@ -1,6 +1,6 @@
-# Qwen 3: [`collection`](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f)
+# 3: [`collection`](https://huggingface.co/collections//qwen3-67dd247413f0e2e4f653967f)

-The Qwen 3 family is a collection of hybrid reasoning MoE and non-MoE models ranging from 0.6b to 235b parameters.
+The 3 family is a collection of hybrid reasoning MoE and non-MoE models ranging from 0.6b to 235b parameters.

 ```
 ./mistralrs-server --isq 4 -i plain -m Qwen/Qwen3-8B
@@ -12,7 +12,7 @@ The Qwen 3 family is a collection of hybrid reasoning MoE and non-MoE models ran
 > Note: tool calling support is fully implemented for the Qwen 3 models, including agentic web search.

 ## Enabling thinking
-The Qwen 3 models are hybrid reasoning models which can be controlled at inference-time. **By default, reasoning is enabled for these models.** To dynamically control this, it is recommended to either add `/no_think` or `/think` to your prompt. Alternatively, you can specify the `enable_thinking` flag as detailed by the API-specific examples.
+The 3 models are hybrid reasoning models which can be controlled at inference-time. **By default, reasoning is enabled for these models.** To dynamically control this, it is recommended to either add `/no_think` or `/think` to your prompt. Alternatively, you can specify the `enable_thinking` flag as detailed by the API-specific examples.

 ## HTTP API
 You can find a more detailed example demonstrating enabling/disabling thinking [here](../examples/server/qwen3.py).
8 changes: 4 additions & 4 deletions docs/README.md
@@ -11,15 +11,15 @@
 - [Phi 3.5 MoE](PHI3.5MOE.md)
 - [Phi 3.5 Vision](PHI3V.md)
 - [Llama 3.2 Vision](VLLAMA.md)
-- [Qwen2-VL](QWEN2VL.md)
+- [-VL](QWEN2VL.md)
 - [Idefics 3 and Smol VLM](IDEFICS3.md)
-- [DeepSeek V2](DEEPSEEKV2.md)
-- [DeepSeek V3](DEEPSEEKV3.md)
+- [ V2](DEEPSEEKV2.md)
+- [ V3](DEEPSEEKV3.md)
 - [MiniCPM-O 2.6](MINICPMO_2_6.md)
 - [Gemma 3](GEMMA3.md)
 - [Mistral 3](MISTRAL3.md)
 - [Llama 4](LLAMA4.md)
-- [Qwen 3](QWEN3.md)
+- [ 3](QWEN3.md)

 ## Adapters
 - [Docs](ADAPTER_MODELS.md)
4 changes: 2 additions & 2 deletions docs/TOOL_CALLING.md
@@ -20,8 +20,8 @@ We support the following models' tool calling in OpenAI-compatible and parse nat
 - Mistral Nemo
 - Hermes 2 Pro
 - Hermes 3
-- DeepSeek V2/V3/R1
-- Qwen 3
+- V2/V3/R1
+- 3

 All models that support tool calling will respond according to the OpenAI tool calling API.

2 changes: 1 addition & 1 deletion docs/VISION_MODELS.md
@@ -8,7 +8,7 @@ Please see docs for the following model types:
 - Idefics2: [IDEFICS2.md](IDEFICS2.md)
 - LLaVA and LLaVANext: [LLAVA.md](LLaVA.md)
 - Llama 3.2 Vision: [VLLAMA.md](VLLAMA.md)
-- Qwen2-VL: [QWEN2VL.md](QWEN2VL.md)
+- -VL: [QWEN2VL.md](QWEN2VL.md)
 - Idefics 3 and Smol VLM: [IDEFICS3.md](IDEFICS3.md)
 - Phi 4 Multimodal: [PHI4MM.md](PHI4MM.md)

2 changes: 1 addition & 1 deletion docs/WEB_SEARCH.md
@@ -7,7 +7,7 @@ This works with all models that support [tool calling](TOOL_CALLING.md). However
 - Hermes 3 3b/8b
 - Mistral 3 24b
 - Llama 4 Scout/Maverick
-- Qwen 3 (⭐ Recommended!)
+- 3 (⭐ Recommended!)

 Web search is supported both in streaming and completion responses! This makes it easy to integrate and test out in interactive mode!
