This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Commit b20a8b7

Merge pull request #84 from janhq/76-massively-improve-nitro-documentation
76 massively improve nitro documentation
2 parents 1e51200 + 515978b commit b20a8b7

File tree

3 files changed: +73 −29 lines changed

README.md

Lines changed: 73 additions & 29 deletions
@@ -4,62 +4,106 @@
   <img alt="nitrologo" src="https://user-images.githubusercontent.com/69952136/266939567-4a7d24f0-9338-4ab5-9261-cb3c71effe35.png">
 </p>
 
-<p align="center">
-  <!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->
-  <img alt="GitHub commit activity" src="https://img.shields.io/github/commit-activity/m/janhq/nitro"/>
-  <img alt="Github Last Commit" src="https://img.shields.io/github/last-commit/janhq/nitro"/>
-  <img alt="Github Contributors" src="https://img.shields.io/github/contributors/janhq/nitro"/>
-  <img alt="GitHub closed issues" src="https://img.shields.io/github/issues-closed/janhq/nitro"/>
-  <img alt="Discord" src="https://img.shields.io/discord/1107178041848909847?label=discord"/>
-</p>
-
 <p align="center">
   <a href="https://docs.jan.ai/">Getting Started</a> - <a href="https://docs.jan.ai">Docs</a>
   - <a href="https://docs.jan.ai/changelog/">Changelog</a> - <a href="https://github.com/janhq/nitro/issues">Bug reports</a> - <a href="https://discord.gg/AsJ8krTT3N">Discord</a>
 </p>
 
 > ⚠️ **Nitro is currently in Development**: Expect breaking changes and bugs!
 
-
 ## Features
 
 ### Supported features
-- Simple http webserver to do inference on triton (without triton client)
-- Upload inference result to s3 (txt2img)
 - GGML inference support (llama.cpp, etc...)
 
 ### TODO:
 - [ ] Local file server
 - [ ] Cache
-- [ ] Plugins support
+- [ ] Plugin support
 
-### Nitro Endpoints
+## Documentation
 
-```zsh
-WIP
+## About Nitro
+
+Nitro is a lightweight integration layer (and soon a standalone inference engine) for cutting-edge inference engines, making deployment of AI models easier than ever!
+
+Zipped, the Nitro binary is only ~3 MB with few to no dependencies (CUDA is needed only if you use a GPU), which makes it well suited for any edge or server deployment 👍.
+
+### Repo Structure
+
+```
+.
+├── controllers
+├── docs
+├── llama.cpp -> Upstream llama C++
+├── nitro_deps -> Dependencies of the Nitro project as a sub-project
+└── utils
 ```
 
-## Documentation
+## Quickstart
 
-## Installation
+**Step 1: Download Nitro**
 
-WIP
+To use Nitro, download the released binaries from the release page below:
 
-## About Nitro
+[![Download Nitro](https://img.shields.io/badge/Download-Nitro-blue.svg)](https://github.com/janhq/nitro/releases)
 
-### Repo Structure
+After downloading the release, double-click the Nitro binary.
 
-WIP
+**Step 2: Download a Model**
 
-### Architecture
-![Current architecture](docs/architecture.png)
+Download a Llama model to try out the llama C++ integration. You can find "GGUF" models on TheBloke's Hugging Face page below:
+
+[![Download Model](https://img.shields.io/badge/Download-Model-green.svg)](https://huggingface.co/TheBloke)
+
+**Step 3: Run Nitro**
 
-### Contributing
+Double-click Nitro to run it. After downloading your model, make sure it is saved to a specific path, then make an API call to load it into Nitro.
 
-WIP
+```zsh
+curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "llama_model_path": "/path/to/your_model.gguf",
+    "ctx_len": 2048,
+    "ngl": 100,
+    "embedding": true
+  }'
+```
+
+`ctx_len` (context length) and `ngl` (number of layers to offload to the GPU) are standard llama C++ parameters, and `embedding` determines whether the embedding endpoint is enabled.
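The same load call can be scripted; here is a minimal Python sketch using only the standard library. The endpoint and payload fields come straight from the curl example above, while the helper names and the model path are illustrative, not part of Nitro's API.

```python
# Minimal sketch of Nitro's model-load call (standard library only).
# Endpoint and payload fields mirror the curl example; helper names
# and the model path are illustrative assumptions.
import json
import urllib.request

LOAD_URL = "http://localhost:3928/inferences/llamacpp/loadmodel"

def build_load_request(model_path, ctx_len=2048, ngl=100, embedding=True):
    """Build the JSON body Nitro expects when loading a llama.cpp model."""
    return {
        "llama_model_path": model_path,  # path to your .gguf file
        "ctx_len": ctx_len,              # context window length
        "ngl": ngl,                      # layers to offload to the GPU
        "embedding": embedding,          # enable the embedding endpoint
    }

def load_model(model_path):
    """POST the load request and return the server's raw response text."""
    body = json.dumps(build_load_request(model_path)).encode("utf-8")
    req = urllib.request.Request(
        LOAD_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

With a Nitro server running locally, `load_model("/path/to/your_model.gguf")` should behave like the curl call above.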
+
+**Step 4: Perform Inference with Nitro for the First Time**
+
+```zsh
+curl --location 'http://localhost:3928/inferences/llamacpp/chat_completion' \
+  --header 'Content-Type: application/json' \
+  --header 'Accept: text/event-stream' \
+  --header 'Access-Control-Allow-Origin: *' \
+  --data '{
+    "messages": [
+      {"content": "Hello there 👋", "role": "assistant"},
+      {"content": "Can you write a long story", "role": "user"}
+    ],
+    "stream": true,
+    "model": "gpt-3.5-turbo",
+    "max_tokens": 2000
+  }'
+```
+
+The Nitro server is compatible with the OpenAI format, so you can expect the same output as the OpenAI ChatGPT API.
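The chat call above can likewise be sketched in Python with only the standard library. The endpoint, headers, and body fields mirror the curl example; the helper names are illustrative assumptions, not part of Nitro's API.

```python
# Sketch of Nitro's OpenAI-compatible chat call (standard library only).
# Endpoint, headers, and fields mirror the curl example above; helper
# names are illustrative assumptions.
import json
import urllib.request

CHAT_URL = "http://localhost:3928/inferences/llamacpp/chat_completion"

def build_chat_request(user_message, stream=True, max_tokens=2000):
    """Build an OpenAI-style chat completion body for Nitro."""
    return {
        "messages": [{"content": user_message, "role": "user"}],
        "stream": stream,          # stream tokens as server-sent events
        "model": "gpt-3.5-turbo",  # model field as in the curl example
        "max_tokens": max_tokens,
    }

def stream_chat(user_message):
    """POST the request and print SSE lines as they arrive."""
    body = json.dumps(build_chat_request(user_message)).encode("utf-8")
    req = urllib.request.Request(
        CHAT_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # each streamed event arrives as a "data: ..." line
            print(line.decode("utf-8").rstrip())
```

Because the response format follows the OpenAI API, existing OpenAI client code should generally work by pointing it at the Nitro server's URL.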
+
+## Compile from source
+To compile Nitro yourself, see [Compile from source](docs/manual_install.md).
+
+### Architecture
+Nitro is an integration layer over the most cutting-edge inference engines. Its structure can be simplified as follows:
+
+![Current architecture](docs/architecture.png)
 
 ### Contact
 
-- For support: please file a Github ticket
-- For questions: join our Discord [here](https://discord.gg/FTk2MvZwJH)
-- For long form inquiries: please email hello@jan.ai
+- For support, please file a GitHub ticket.
+- For questions, join our Discord [here](https://discord.gg/FTk2MvZwJH).
+- For long-form inquiries, please email hello@jan.ai.
+
docs/architecture.png

778 KB
File renamed without changes.

0 commit comments