---
title: Vision
description: An OpenAI-compatible inference engine for vision models
keywords: [Nitro, Jan, fast inference, inference server, local AI, large language model, OpenAI compatible, open source, llava, bakllava, vision]
---

## Load model
Loading a vision model works much like loading a chat model, except that it requires two files:
- the `GGUF model`
- the `mmproj model` (the multimodal projector that connects the vision encoder to the language model)

You can load both with a single request:

```bash title="Load Model" {3,4}
curl -X POST 'http://127.0.0.1:3928/inferences/llamacpp/loadmodel' -H 'Content-Type: application/json' -d '{
  "llama_model_path": "/path/to/gguf/model/",
  "mmproj": "/path/to/mmproj/model/",
  "ctx_len": 2048,
  "ngl": 100,
  "cont_batching": false,
  "embedding": false,
  "system_prompt": "",
  "user_prompt": "\n### Instruction:\n",
  "ai_prompt": "\n### Response:\n"
}'
```
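When scripting the load call, it can help to keep the body in a file and let curl read it with the `@file` syntax. A minimal sketch, where the model paths are placeholders and `load_model.json` is an illustrative name:

```shell
# Keep the load parameters in a JSON file so they are easy to edit and reuse.
cat > load_model.json <<'EOF'
{
  "llama_model_path": "/path/to/gguf/model/",
  "mmproj": "/path/to/mmproj/model/",
  "ctx_len": 2048,
  "ngl": 100
}
EOF

# curl reads the request body from the file; assumes a Nitro server on the
# default port (the `|| true` just ignores the error if none is running).
curl -X POST 'http://127.0.0.1:3928/inferences/llamacpp/loadmodel' \
  -H 'Content-Type: application/json' \
  -d @load_model.json || true
```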

Download the models here:
- [Llava Model](https://huggingface.co/jartine/llava-v1.5-7B-GGUF/tree/main): the Large Language and Vision Assistant, which achieves state-of-the-art results on 11 benchmarks.
- [Bakllava Model](https://huggingface.co/mys/ggml_bakllava-1/tree/main): a Mistral 7B base model augmented with the LLaVA architecture.

## Inference

Nitro currently accepts images only as base64-encoded strings. You can use this [base64 converter](https://www.base64-image.de/) to prepare your images.
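If you prefer to stay on the command line, the same conversion can be done locally with the standard `base64` tool. A minimal sketch, where `image.jpg` is a placeholder standing in for your real image:

```shell
# Placeholder file standing in for a real image; point this at your own image instead.
printf 'example-bytes' > image.jpg

# Encode to base64 and strip newlines so the string can be pasted into a JSON body.
base64 image.jpg | tr -d '\n' > image.b64

cat image.b64
```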

To ask the model about an image, send a chat completion request:

```bash title="Inference"
curl http://127.0.0.1:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4-vision-preview",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What’s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "<base64>"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'
```
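Because the reply follows the OpenAI chat completion format, the answer text sits in `choices[0].message.content`. A small sketch of extracting it with `jq`, using a made-up sample response in place of real server output:

```shell
# Sample response in the OpenAI chat completion shape, standing in for real output.
cat > response.json <<'EOF'
{"choices":[{"message":{"role":"assistant","content":"A cat on a sofa."}}]}
EOF

# Pull out just the assistant's answer.
jq -r '.choices[0].message.content' response.json
```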

If the base64 string is too long to pass on the command line and causes errors, consider sending the request with [Postman](https://www.postman.com/) instead.
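Another way around command-line length limits is to write the whole request body to a file and let curl read it. A minimal sketch, where the file names and the placeholder image are illustrative and the server is assumed to run on the default port:

```shell
# Placeholder file standing in for a real image; use your own image instead.
printf 'example-bytes' > image.jpg

# Encode the image on one line.
IMG_B64=$(base64 image.jpg | tr -d '\n')

# Build the request body in a file; curl's @file syntax sidesteps shell limits
# on argument length for very large payloads.
cat > vision_request.json <<EOF
{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        { "type": "image_url", "image_url": { "url": "${IMG_B64}" } }
      ]
    }
  ],
  "max_tokens": 300
}
EOF

# Send it (the `|| true` just ignores the error if no server is running).
curl http://127.0.0.1:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @vision_request.json || true
```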