
Commit 96d913c: "format"

1 parent 1e51200 commit 96d913c

File tree

1 file changed: +64 −16 lines changed


README.md

Lines changed: 64 additions & 16 deletions
# Nitro - Accelerated AI Inference Engine

<p align="center">

…
> ⚠️ **Nitro is currently in Development**: Expect breaking changes and bugs!
## Features

### Supported features

- Simple HTTP webserver for inference on Triton (without Triton client)
- Upload inference results to S3 (txt2img)
- GGML inference support (llama.cpp, etc.)

### TODO:

- [ ] Local file server
- [ ] Cache
- [ ] Plugin support
## Documentation

## Quickstart

Step 1: Download Nitro

To use Nitro, download the released binaries from the release page below.

[Download Nitro](https://github.com/janhq/nitro/releases)

After downloading the release, double-click on the Nitro binary.
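If you prefer the command line, a sketch like the following also works. The tag and asset name below are placeholders, not real release assets, so check the release page for the file that matches your OS and architecture:

```zsh
# Placeholder values: substitute the real tag and asset name from the releases page.
VERSION="v0.x.y"
ASSET="nitro-your-platform.zip"

curl -L -o nitro.zip "https://github.com/janhq/nitro/releases/download/${VERSION}/${ASSET}"
unzip nitro.zip
chmod +x ./nitro

# Running the binary starts the HTTP server used in the steps below
# (the examples assume it listens on localhost:3928).
./nitro
```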
Step 2: Download a Model

Download a llama model to try running the llama.cpp integration. You can find a GGUF model on The Bloke's page below.

[Download Model](https://huggingface.co/TheBloke)
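For example, you can fetch a quantized Llama 2 chat model from the command line. This particular repository and file name are just one illustration; any GGUF model from that page should work:

```zsh
# Download one example GGUF file from TheBloke's Hugging Face page.
# Save it to the path you will pass to the loadmodel call in Step 3.
curl -L -o /path/to/your_model.gguf \
  "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf"
```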
Step 3: Run Nitro

Double-click on Nitro to run it. After downloading your model, make sure it is saved to a specific path. Then, make an API call to load your model into Nitro:

```zsh
curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/path/to/your_model.gguf",
    "ctx_len": 2048,
    "ngl": 100,
    "embedding": true
  }'
```

`ctx_len` and `ngl` are typical llama.cpp parameters, and `embedding` determines whether the embedding endpoint is enabled.
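For instance, assuming `ngl` maps to llama.cpp's GPU-layer count (`n_gpu_layers`), setting it to 0 would keep inference entirely on the CPU. This variant is a sketch under that assumption, not an official example:

```zsh
# Hypothetical CPU-only variant of the loadmodel call above:
# ngl set to 0 so no layers are offloaded to the GPU (assuming llama.cpp semantics).
curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/path/to/your_model.gguf",
    "ctx_len": 2048,
    "ngl": 0,
    "embedding": false
  }'
```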
Step 4: Perform Inference on Nitro for the First Time

```zsh
curl --location 'http://localhost:3928/inferences/llamacpp/chat_completion' \
  --header 'Content-Type: application/json' \
  --header 'Accept: text/event-stream' \
  --header 'Access-Control-Allow-Origin: *' \
  --data '{
    "messages": [
      {"content": "Hello there 👋", "role": "assistant"},
      {"content": "Can you write a long story", "role": "user"}
    ],
    "stream": true,
    "model": "gpt-3.5-turbo",
    "max_tokens": 2000
  }'
```

The Nitro server is compatible with the OpenAI API format, so you can expect the same output format as the OpenAI ChatGPT API.
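For comparison, here is the same request shaped for OpenAI's own chat completions endpoint; apart from the URL and the auth header, the JSON body is identical in structure:

```zsh
# The equivalent call against OpenAI's API (requires an OPENAI_API_KEY).
curl 'https://api.openai.com/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $OPENAI_API_KEY" \
  --data '{
    "messages": [
      {"content": "Hello there 👋", "role": "assistant"},
      {"content": "Can you write a long story", "role": "user"}
    ],
    "stream": true,
    "model": "gpt-3.5-turbo",
    "max_tokens": 2000
  }'
```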
## About Nitro

### Repo Structure

```
.
├── controllers
├── docs
├── llama.cpp -> Upstream llama C++
├── nitro_deps -> Dependencies of the Nitro project as a sub-project
└── utils
```
### Architecture

Nitro is an integration layer on top of cutting-edge inference engines. Its structure can be simplified as follows:

![Current architecture](docs/architecture.png)
### Contact

- For support: please file a GitHub ticket
- For questions: join our Discord [here](https://discord.gg/FTk2MvZwJH)
- For long-form inquiries: please email hello@jan.ai
