This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Commit d8d0b21

add(vision): add new documentation for vision
1 parent 49ae0fe commit d8d0b21

2 files changed

+66
-0
lines changed

docs/docs/features/vision.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
---
title: Vision
description: Inference engine for vision, compatible with OpenAI's API
keywords: [Nitro, Jan, fast inference, inference server, local AI, large language model, OpenAI compatible, open source, llava, bakllava, vision]
---

## Load model

As with a chat model, a vision model must be loaded before use. A vision model needs two files:
- the `GGUF model`
- the `mmproj model`

You can load the model using:

```bash title="Load Model" {3,4}
curl -X POST 'http://127.0.0.1:3928/inferences/llamacpp/loadmodel' -H 'Content-Type: application/json' -d '{
    "llama_model_path": "/path/to/gguf/model/",
    "mmproj": "/path/to/mmproj/model/",
    "ctx_len": 2048,
    "ngl": 100,
    "cont_batching": false,
    "embedding": false,
    "system_prompt": "",
    "user_prompt": "\n### Instruction:\n",
    "ai_prompt": "\n### Response:\n"
  }'
```

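The load request assumes both files are already on disk. As a minimal sketch, the helpers below check that the two paths are readable and then print the same request body with the paths filled in. The names `check_vision_files` and `loadmodel_body` are hypothetical and not part of Nitro, and the body builder assumes the paths contain no double quotes or backslashes.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Nitro) for preparing a loadmodel request.

# Return 0 only if both model files exist and are readable.
check_vision_files() {
  local gguf="$1" mmproj="$2" status=0 f
  for f in "$gguf" "$mmproj"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Print the loadmodel body from this page with the two paths substituted.
# Assumption: the paths contain no double quotes or backslashes.
loadmodel_body() {
  printf '{"llama_model_path": "%s", "mmproj": "%s", ' "$1" "$2"
  printf '"ctx_len": 2048, "ngl": 100, "cont_batching": false, "embedding": false, '
  printf '"system_prompt": "", "user_prompt": "\\n### Instruction:\\n", '
  printf '"ai_prompt": "\\n### Response:\\n"}\n'
}

# Example usage (GGUF and MMPROJ are placeholder variables you set yourself):
#   check_vision_files "$GGUF" "$MMPROJ" &&
#     curl -X POST 'http://127.0.0.1:3928/inferences/llamacpp/loadmodel' \
#       -H 'Content-Type: application/json' \
#       -d "$(loadmodel_body "$GGUF" "$MMPROJ")"
```

Checking the files first gives a clear local error message when a path is wrong, before any request is sent.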
Download the models here:
- [Llava Model](https://huggingface.co/jartine/llava-v1.5-7B-GGUF/tree/main): LLaVA (Large Language and Vision Assistant), which achieves SoTA on 11 benchmarks.
- [Bakllava Model](https://huggingface.co/mys/ggml_bakllava-1/tree/main): a Mistral 7B base augmented with the LLaVA architecture.

## Inference
33+
34+
Nitro currently only works with images converted to base64 format. Use this [base64 converter](https://www.base64-image.de/) to prepare your images.
35+
36+
To get the model's understanding of an image, do the following:
37+
38+
```bash title="Inference"
39+
curl http://127.0.0.1:3928/v1/chat/completions \
40+
-H "Content-Type: application/json" \
41+
-H "Authorization: Bearer $OPENAI_API_KEY" \
42+
-d '{
43+
"model": "gpt-4-vision-preview",
44+
"messages": [
45+
{
46+
"role": "user",
47+
"content": [
48+
{
49+
"type": "text",
50+
"text": "What’s in this image?"
51+
},
52+
{
53+
"type": "image_url",
54+
"image_url": {
55+
"url": "<base64>"
56+
}
57+
}
58+
]
59+
}
60+
],
61+
"max_tokens": 300
62+
}'
63+
```
64+
If the base64 string is too long and causes errors, consider using [Postman](https://www.postman.com/) as an alternative.
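
Another way to avoid a very long command line without leaving the shell is to write the request body to a file and let curl read it with `-d @file`. The sketch below assumes a `base64` utility that reads standard input, as the GNU and BSD versions do. `encode_image` and `build_payload` are hypothetical helper names, and the prompt text is hard-coded so that no JSON escaping is needed.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Nitro): build the vision request body on
# disk so the base64 string never appears as a command-line argument.

# Encode a file as single-line base64 (strips the line wrapping some tools add).
encode_image() {
  base64 < "$1" | tr -d '\n'
}

# Write a chat-completions payload for image $1 to file $2.
# The base64 string is placed directly in "url", as in the request on this page.
build_payload() {
  local image_path="$1" out_path="$2"
  {
    printf '{"model": "gpt-4-vision-preview", "messages": [{"role": "user", "content": ['
    printf '{"type": "text", "text": "What is in this image?"}, '
    printf '{"type": "image_url", "image_url": {"url": "'
    encode_image "$image_path"
    printf '"}}]}], "max_tokens": 300}\n'
  } > "$out_path"
}

# Example usage (photo.jpg is a placeholder file name):
#   build_payload ./photo.jpg payload.json
#   curl http://127.0.0.1:3928/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d @payload.json
```

Because base64 output uses only letters, digits, `+`, `/`, and `=`, it can be embedded in the JSON string without escaping.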

docs/sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -39,6 +39,7 @@ const sidebars = {
       items: [
         "features/chat",
         "features/embed",
+        "features/vision"
       ],
     },
     {
