<img alt="nitrologo" src="https://user-images.githubusercontent.com/69952136/266939567-4a7d24f0-9338-4ab5-9261-cb3c71effe35.png">
</p>

<p align="center">
  <a href="https://docs.jan.ai/">Getting Started</a> - <a href="https://docs.jan.ai">Docs</a>
  - <a href="https://docs.jan.ai/changelog/">Changelog</a> - <a href="https://github.com/janhq/nitro/issues">Bug reports</a> - <a href="https://discord.gg/AsJ8krTT3N">Discord</a>
</p>

> ⚠️ **Nitro is currently in Development**: Expect breaking changes and bugs!

## Features

### Supported features
- GGML inference support (llama.cpp, etc...)

### TODO:
- [ ] Local file server
- [ ] Cache
- [ ] Plugin support

## Documentation

## About Nitro

Nitro is a lightweight integration layer (and soon-to-be inference engine) for cutting-edge inference engines, making it easier than ever to deploy AI models!

The zipped Nitro binary is only ~3 MB, with minimal to no dependencies (for example, CUDA is needed only if you use a GPU), making it a good fit for any edge or server deployment 👍.

### Repo Structure

```
.
├── controllers
├── docs
├── llama.cpp -> Upstream llama C++
├── nitro_deps -> Dependencies of the Nitro project as a sub-project
└── utils
```

## Quickstart

**Step 1: Download Nitro**

To use Nitro, download a released binary from the releases page below:

[Download Nitro](https://github.com/janhq/nitro/releases)

After downloading the release, double-click on the Nitro binary.

**Step 2: Download a Model**

Download a Llama model to try the llama.cpp integration. You can find "GGUF" models on The Bloke's Hugging Face page below:

[TheBloke on Hugging Face](https://huggingface.co/TheBloke)

**Step 3: Run Nitro**

Double-click on Nitro to run it. Note the path where you saved your downloaded model, then make an API call to load the model into Nitro:

```zsh
curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/path/to/your_model.gguf",
    "ctx_len": 2048,
    "ngl": 100,
    "embedding": true
  }'
```

`ctx_len` (context length) and `ngl` (number of GPU layers) are standard llama.cpp parameters; `embedding` determines whether the embedding endpoint is enabled.
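The same load request can be issued from any language. Below is a minimal Python sketch using only the standard library; the helper names (`build_load_request`, `load_model`) and the model path are illustrative, and a Nitro server is assumed to be listening on its default port 3928:

```python
import json
import urllib.request


def build_load_request(model_path, ctx_len=2048, ngl=100, embedding=True):
    """Build the JSON body for Nitro's /inferences/llamacpp/loadmodel endpoint."""
    return {
        "llama_model_path": model_path,
        "ctx_len": ctx_len,
        "ngl": ngl,
        "embedding": embedding,
    }


def load_model(model_path, base_url="http://localhost:3928"):
    # POST the payload to the loadmodel endpoint, mirroring the curl call above.
    body = json.dumps(build_load_request(model_path)).encode()
    req = urllib.request.Request(
        f"{base_url}/inferences/llamacpp/loadmodel",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example (requires a running Nitro server):
# load_model("/path/to/your_model.gguf")
```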

**Step 4: Perform Inference on Nitro for the First Time**

```zsh
curl --location 'http://localhost:3928/inferences/llamacpp/chat_completion' \
  --header 'Content-Type: application/json' \
  --header 'Accept: text/event-stream' \
  --header 'Access-Control-Allow-Origin: *' \
  --data '{
    "messages": [
      {"content": "Hello there 👋", "role": "assistant"},
      {"content": "Can you write a long story", "role": "user"}
    ],
    "stream": true,
    "model": "gpt-3.5-turbo",
    "max_tokens": 2000
  }'
```

The Nitro server is compatible with the OpenAI API format, so you can expect responses in the same shape as the OpenAI Chat Completions API.
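With `"stream": true`, responses arrive as OpenAI-style server-sent events: each line looks like `data: {...}` and the stream ends with `data: [DONE]`. A minimal sketch of reassembling the generated text from such a stream (the helper name `collect_stream_text` is illustrative, assuming the standard OpenAI delta format):

```python
import json


def collect_stream_text(sse_lines):
    """Concatenate content deltas from OpenAI-style 'data: {...}' stream lines."""
    text = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        text.append(delta.get("content", ""))
    return "".join(text)


# Example with two content chunks followed by the end sentinel:
sample = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # -> Hello world
```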

## Compile from source

To compile Nitro from source, see [Compile from source](docs/manual_install.md).

### Architecture

Nitro is an integration layer over cutting-edge inference engines. Its structure can be simplified as follows:



### Contact

- For support, please file a GitHub ticket.
- For questions, join our Discord [here](https://discord.gg/FTk2MvZwJH).
- For long-form inquiries, please email hello@jan.ai.