This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Commit 5503006

Merge pull request #174 from janhq/tidy-Nitro: Tidy Nitro docs

2 parents 0110656 + f72cc2a

File tree: 12 files changed, +119 −9925 lines

README.md

Lines changed: 41 additions & 43 deletions

````diff
@@ -17,11 +17,9 @@
 - Quick Setup: Approximately 10-second initialization for swift deployment.
 - Enhanced Web Framework: Incorporates drogon cpp to boost web service efficiency.
 
-## Documentation
-
 ## About Nitro
 
-Nitro is a light-weight integration layer (and soon to be inference engine) for cutting edge inference engine, make deployment of AI models easier than ever before!
+Nitro is a high-efficiency C++ inference engine for edge computing, powering [Jan](https://jan.ai/). It is lightweight and embeddable, ideal for product integration.
 
 The binary of nitro after zipped is only ~3mb in size with none to minimal dependencies (if you use a GPU need CUDA for example) make it desirable for any edge/server deployment 👍.
 
@@ -40,37 +38,57 @@ The binary of nitro after zipped is only ~3mb in size with none to minimal depen
 
 ## Quickstart
 
-**Step 1: Download Nitro**
+**Step 1: Install Nitro**
 
-To use Nitro, download the released binaries from the release page below:
+- For Linux and MacOS
 
-[![Download Nitro](https://img.shields.io/badge/Download-Nitro-blue.svg)](https://github.com/janhq/nitro/releases)
+```bash
+curl -sfL https://raw.githubusercontent.com/janhq/nitro/main/install.sh | sudo /bin/bash -
+```
 
-After downloading the release, double-click on the Nitro binary.
+- For Windows
 
-**Step 2: Download a Model**
+```bash
+powershell -Command "& { Invoke-WebRequest -Uri 'https://raw.githubusercontent.com/janhq/nitro/main/install.bat' -OutFile 'install.bat'; .\install.bat; Remove-Item -Path 'install.bat' }"
+```
 
-Download a llama model to try running the llama C++ integration. You can find a "GGUF" model on The Bloke's page below:
+**Step 2: Downloading a Model**
 
-[![Download Model](https://img.shields.io/badge/Download-Model-green.svg)](https://huggingface.co/TheBloke)
+```bash
+mkdir model && cd model
+wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf?download=true
+```
 
-**Step 3: Run Nitro**
+**Step 3: Run Nitro server**
 
-Double-click on Nitro to run it. After downloading your model, make sure it's saved to a specific path. Then, make an API call to load your model into Nitro.
+```bash title="Run Nitro server"
+nitro
+```
 
+**Step 4: Load model**
 
-```zsh
-curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
+```bash title="Load model"
+curl http://localhost:3928/inferences/llamacpp/loadmodel \
   -H 'Content-Type: application/json' \
   -d '{
-    "llama_model_path": "/path/to/your_model.gguf",
-    "ctx_len": 2048,
+    "llama_model_path": "/model/llama-2-7b-model.gguf",
+    "ctx_len": 512,
     "ngl": 100,
-    "embedding": true,
-    "n_parallel": 4,
-    "pre_prompt": "A chat between a curious user and an artificial intelligence",
-    "user_prompt": "USER: ",
-    "ai_prompt": "ASSISTANT: "
+  }'
+```
+
+**Step 5: Making an Inference**
+
+```bash title="Nitro Inference"
+curl http://localhost:3928/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "messages": [
+      {
+        "role": "user",
+        "content": "Who won the world series in 2020?"
+      },
+    ]
 }'
 ```
 
@@ -89,7 +107,6 @@ Table of parameters
 | `system_prompt` | String | The prompt to use for system rules. |
 | `pre_prompt` | String | The prompt to use for internal configuration. |
 
-
 ***OPTIONAL***: You can run Nitro on a different port like 5000 instead of 3928 by running it manually in terminal
 ```zsh
 ./nitro 1 127.0.0.1 5000 ([thread_num] [host] [port])
@@ -98,32 +115,13 @@ Table of parameters
 - host : host value normally 127.0.0.1 or 0.0.0.0
 - port : the port that nitro got deployed onto
 
-**Step 4: Perform Inference on Nitro for the First Time**
-
-```zsh
-curl --location 'http://localhost:3928/inferences/llamacpp/chat_completion' \
---header 'Content-Type: application/json' \
---header 'Accept: text/event-stream' \
---header 'Access-Control-Allow-Origin: *' \
---data '{
-    "messages": [
-        {"content": "Hello there 👋", "role": "assistant"},
-        {"content": "Can you write a long story", "role": "user"}
-    ],
-    "stream": true,
-    "model": "gpt-3.5-turbo",
-    "max_tokens": 2000
-}'
-```
-
 Nitro server is compatible with the OpenAI format, so you can expect the same output as the OpenAI ChatGPT API.
 
 ## Compile from source
-To compile nitro please visit [Compile from source](docs/manual_install.md)
+To compile nitro please visit [Compile from source](docs/new/build-source.md)
 
 ### Contact
 
 - For support, please file a GitHub ticket.
 - For questions, join our Discord [here](https://discord.gg/FTk2MvZwJH).
-- For long-form inquiries, please email hello@jan.ai.
-
+- For long-form inquiries, please email hello@jan.ai.
````
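The new README Quickstart boils down to two HTTP calls: load a model, then POST an OpenAI-style chat request. As a rough illustration (not part of the commit), the same flow can be sketched with Python's standard library. The endpoint paths and the localhost:3928 default come from the docs; the helper names are ours, and the payloads are emitted as strict JSON, avoiding the trailing commas that appear in the curl bodies above.

```python
import json
from urllib import request

BASE = "http://localhost:3928"  # default Nitro port per the README


def load_model_payload(model_path: str, ctx_len: int = 512, ngl: int = 100) -> str:
    # Mirrors the Step 4 curl body; strict JSON, so no trailing commas.
    return json.dumps({"llama_model_path": model_path, "ctx_len": ctx_len, "ngl": ngl})


def chat_payload(prompt: str) -> str:
    # Mirrors the Step 5 curl body.
    return json.dumps({"messages": [{"role": "user", "content": prompt}]})


def post(path: str, body: str) -> bytes:
    # Illustrative helper: requires a running Nitro server.
    req = request.Request(
        BASE + path,
        data=body.encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    print(load_model_payload("/model/llama-2-7b-model.gguf"))
    # With a server running:
    # post("/inferences/llamacpp/loadmodel", load_model_payload("/model/llama-2-7b-model.gguf"))
    # post("/v1/chat/completions", chat_payload("Who won the world series in 2020?"))
```

The `post` calls are left commented out so the sketch runs without a server.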

docs/docs/examples/chatbox.md

Lines changed: 44 additions & 4 deletions

````diff
@@ -2,10 +2,50 @@
 title: Nitro with Chatbox
 ---
 
-:::info COMING SOON
-:::
+This guide demonstrates how to integrate Nitro with Chatbox, showcasing the compatibility of Nitro with various platforms.
 
-<!--
 ## What is Chatbox?
+Chatbox is a versatile desktop client that supports multiple cutting-edge Large Language Models (LLMs). It is available for Windows, Mac, and Linux operating systems.
 
-## How to use Nitro as backend -->
+For more information, please visit the [Chatbox official GitHub page](https://github.com/Bin-Huang/chatbox).
+
+## Downloading and Installing Chatbox
+
+To download and install Chatbox, follow the instructions available at this [link](https://github.com/Bin-Huang/chatbox#download).
+
+## Using Nitro as a Backend
+
+1. Start Nitro server
+
+Open your command line tool and enter:
+```
+nitro
+```
+
+> Ensure you are using the latest version of [Nitro](new/install.md)
+
+2. Run the Model
+
+To load the model, use the following command:
+
+```
+curl http://localhost:3928/inferences/llamacpp/loadmodel \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "llama_model_path": "model/llama-2-7b-chat.Q5_K_M.gguf",
+    "ctx_len": 512,
+    "ngl": 100,
+  }'
+```
+
+3. Config chatbox
+Adjust the `settings` in Chatbox to connect with Nitro. Change your settings to match the configuration shown in the image below:
+
+![Settings](img/chatbox.PNG)
+
+4. Chat with the Model
+
+Once the setup is complete, you can start chatting with the model using Chatbox. All functions of Chatbox are now enabled with Nitro as the backend.
+
+## Video demo
````
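Before pointing Chatbox at Nitro, it helps to confirm that something is actually listening on the server's port. This small stdlib sketch is ours, not part of the guide; the 3928 default comes from the docs.

```python
import socket


def nitro_reachable(host: str = "127.0.0.1", port: int = 3928, timeout: float = 1.0) -> bool:
    # True if something accepts TCP connections on the Nitro port;
    # this does not verify that the listener is actually Nitro.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Nitro port open:", nitro_reachable())
```

If this returns False, start `nitro` first; Chatbox will fail to connect otherwise.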

docs/docs/examples/img/chatbox.PNG

Binary file added (66 KB)

docs/docs/new/about.md

Lines changed: 1 addition & 5 deletions

```diff
@@ -1,6 +1,6 @@
 ---
 title: About Nitro
-slug: /docs
+slug: /about
 ---
 
 Nitro is a high-efficiency C++ inference engine for edge computing, powering [Jan](https://jan.ai/). It is lightweight and embeddable, ideal for product integration.
@@ -119,7 +119,3 @@ Nitro welcomes contributions in various forms, not just coding. Here are some wa
 
 - [drogon](https://github.com/drogonframework/drogon): The fast C++ web framework
 - [llama.cpp](https://github.com/ggerganov/llama.cpp): Inference of LLaMA model in pure C/C++
-
-## FAQ
-:::info COMING SOON
-:::
```

docs/docs/new/architecture.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -1,5 +1,6 @@
 ---
 title: Architecture
+slug: /achitecture
 ---
 
 ![Nitro Architecture](img/architecture.drawio.png)
```

docs/docs/new/build-source.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -1,5 +1,6 @@
 ---
 title: Build From Source
+slug: /build-source
 ---
 
 This guide provides step-by-step instructions for building Nitro from source on Linux, macOS, and Windows systems.
```

docs/docs/new/faq.md

Lines changed: 20 additions & 0 deletions

```diff
@@ -0,0 +1,20 @@
+---
+title: FAQs
+slug: /faq
+---
+
+### 1. Is Nitro the same as Llama.cpp with an API server?
+
+Yes, that's correct. However, Nitro isn't limited to just Llama.cpp; it will soon integrate multiple other models like Whisper, Bark, and Stable Diffusion, all in a single binary. This eliminates the need for you to develop a separate API server on top of AI models. Nitro is a comprehensive solution, designed for ease of use and efficiency.
+
+### 2. Is Nitro simply Llama-cpp-python?
+
+Indeed, Nitro isn't bound to Python, which allows you to leverage high-performance software that fully utilizes your system's capabilities. With Nitro, learning how to deploy a Python web server or use FastAPI isn't necessary. The Nitro web server is already fully optimized.
+
+### 3. Why should I switch to Nitro over Ollama?
+
+While Ollama does provide similar functionalities, its design serves a different purpose. Ollama has a larger size (around 200MB) compared to Nitro's 3MB distribution. Nitro's compact size allows for easy embedding into subprocesses, ensuring minimal concerns about package size for your application. This makes Nitro a more suitable choice for applications where efficiency and minimal resource usage are key.
+
+### 4. Why is the model named "chat-gpt-3.5"?
+
+Many applications implement the OpenAI ChatGPT API, and we want Nitro to be versatile for any AI client. While you can use any model name, we've ensured that if you're already using the chatgpt API, switching to Nitro is seamless. Just replace api.openai.com with localhost:3928 in your client settings (like Chatbox, Sillytavern, Oobaboga, etc.), and it will work smoothly with Nitro.
```
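FAQ 4's claim — swap api.openai.com for localhost:3928 and an OpenAI-style client keeps working — amounts to rewriting the endpoint URL while keeping the path. A sketch of that swap; the function name is ours, and the paths follow the OpenAI-compatible API the docs describe.

```python
from urllib.parse import urlparse, urlunparse


def to_nitro(url: str, host: str = "localhost:3928") -> str:
    # Re-point an OpenAI-style endpoint at a local Nitro server:
    # keep the path and query, swap the scheme and host.
    p = urlparse(url)
    return urlunparse(("http", host, p.path, p.params, p.query, p.fragment))


if __name__ == "__main__":
    print(to_nitro("https://api.openai.com/v1/chat/completions"))
```

Clients that expose a configurable base URL (Chatbox, SillyTavern, etc.) apply the same substitution in their settings.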

docs/docs/new/model-cycle.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -1,5 +1,6 @@
 ---
 title: Model Life Cycle
+slug: /model-cycle
 ---
 
 ## Load model
```

docs/docs/new/quickstart.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -1,5 +1,6 @@
 ---
 title: Quickstart
+slug: /quickstart
 ---
 
 ## Step 1: Install Nitro
```

docs/openapi/NitroAPI.yaml

Lines changed: 8 additions & 1 deletion

```diff
@@ -437,6 +437,10 @@ components:
         default: true
         nullable: true
         description: Determines if output generation is in a streaming manner.
+      cache_prompt:
+        type: boolean
+        default: true
+        description: Optimize performance in repeated or similar requests.
       temp:
         type: number
         default: 0.7
@@ -577,7 +581,10 @@ components:
         min: 0
         max: 1
         description: Set probability threshold for more relevant outputs
-
+      cache_prompt:
+        type: boolean
+        default: true
+        description: Optimize performance in repeated or similar requests.
     ChatCompletionResponse:
       type: object
       description: Description of the response structure
```
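The schema change above adds a `cache_prompt` flag (boolean, default true) to both completion request bodies. A sketch of a request body that sets it explicitly — the field name and default come from the YAML, while the surrounding payload is illustrative:

```python
import json

# Chat-completion body including the newly documented cache_prompt flag.
body = json.dumps({
    "messages": [{"role": "user", "content": "Hello"}],
    "cache_prompt": True,  # per the schema: optimize repeated/similar requests
})

if __name__ == "__main__":
    print(body)
```

Since the schema defaults `cache_prompt` to true, omitting the field should behave the same; setting it to false opts out of the optimization.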
