|
| 1 | +Sure, I've reformatted your Markdown text into a code block and fixed some grammar issues: |
| 2 | + |
1 | 3 | # Nitro - Accelerated AI Inference Engine |
2 | 4 |
|
3 | 5 | <p align="center"> |
|
20 | 22 |
|
21 | 23 | > ⚠️ **Nitro is currently in Development**: Expect breaking changes and bugs! |
22 | 24 |
|
23 | | - |
24 | 25 | ## Features |
25 | 26 |
|
26 | 27 | ### Supported features |
27 | | -- Simple http webserver to do inference on triton (without triton client) |
28 | | -- Upload inference result to s3 (txt2img) |
| 28 | +- Simple HTTP webserver for inference on Triton (without Triton client) |
| 29 | +- Upload inference results to S3 (txt2img) |
29 | 30 | - GGML inference support (llama.cpp, etc...) |
30 | 31 |
|
31 | 32 | ### TODO: |
32 | 33 | - [ ] Local file server |
33 | 34 | - [ ] Cache |
34 | | -- [ ] Plugins support |
| 35 | +- [ ] Plugin support |
| 36 | + |
| 37 | + |
| 38 | + |
| 39 | +## Documentation |
| 40 | + |
| 41 | +## Quickstart |
| 42 | + |
| 43 | +Step 1: Download Nitro |
| 44 | +To use Nitro, download the released binaries from the release page below. |
| 45 | +[Download Nitro](https://github.com/janhq/nitro/releases) |
| 46 | + |
| 47 | +After downloading the release, double-click on the Nitro binary. |
| 48 | + |
| 49 | +Step 2: Download a Model |
| 50 | +Download a Llama model to try the llama.cpp integration. You can find models in GGUF format on TheBloke's Hugging Face page below. |
| 51 | +[Download Model](https://huggingface.co/TheBloke) |
35 | 52 |
|
36 | | -### Nitro Endpoints |
| 53 | +Step 3: Run Nitro |
| 54 | +Double-click the Nitro binary to start the server. Note the path where you saved your model, then make an API call to load that model into Nitro. |
37 | 55 |
|
38 | 56 | ```zsh |
39 | | -WIP |
| 57 | +curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \ |
| 58 | + -H 'Content-Type: application/json' \ |
| 59 | + -d '{ |
| 60 | + "llama_model_path": "/path/to/your_model.gguf", |
| 61 | + "ctx_len": 2048, |
| 62 | + "ngl": 100, |
| 63 | + "embedding": true |
| 64 | + }' |
40 | 65 | ``` |
| 66 | +`ctx_len` (context length) and `ngl` (number of layers offloaded to the GPU) are standard llama.cpp parameters, and `embedding` determines whether the embedding endpoint is enabled. |
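
For readers who prefer a script to the curl call, here is a minimal Python sketch of the same load request, using only the standard library. The URL and JSON keys are copied from the curl example above. The helper names `build_load_request` and `load_model` are hypothetical and not part of Nitro, and the shape of the server's reply is not assumed.

```python
import json
import urllib.request

# Address and route copied from the curl example above.
NITRO_LOAD_URL = "http://localhost:3928/inferences/llamacpp/loadmodel"


def build_load_request(model_path, ctx_len=2048, ngl=100, embedding=True,
                       url=NITRO_LOAD_URL):
    """Build (but do not send) the POST request that loads a model.

    Hypothetical helper: the field names mirror the curl payload.
    """
    payload = {
        "llama_model_path": model_path,
        "ctx_len": ctx_len,
        "ngl": ngl,
        "embedding": embedding,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_model(model_path, **kwargs):
    """Send the request and return the raw response body as text.

    Requires a running Nitro server; the response format is not assumed.
    """
    request = build_load_request(model_path, **kwargs)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `load_model("/path/to/your_model.gguf")` should then be equivalent to the curl command, assuming the server is listening on port 3928.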
41 | 67 |
|
42 | | -## Documentation |
| 68 | +Step 4: Perform Inference on Nitro for the First Time |
43 | 69 |
|
44 | | -## Installation |
| 70 | +```zsh |
| 71 | +curl --location 'http://localhost:3928/inferences/llamacpp/chat_completion' \ |
| 72 | + --header 'Content-Type: application/json' \ |
| 73 | + --header 'Accept: text/event-stream' \ |
| 74 | + --header 'Access-Control-Allow-Origin: *' \ |
| 75 | + --data '{ |
| 76 | + "messages": [ |
| 77 | + {"content": "Hello there 👋", "role": "assistant"}, |
| 78 | + {"content": "Can you write a long story", "role": "user"} |
| 79 | + ], |
| 80 | + "stream": true, |
| 81 | + "model": "gpt-3.5-turbo", |
| 82 | + "max_tokens": 2000 |
| 83 | + }' |
| 84 | +``` |
45 | 85 |
|
46 | | -WIP |
| 86 | +The Nitro server follows the OpenAI chat completion format, so responses have the same structure as those returned by the OpenAI Chat Completions API. |
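
If the streamed response does match the OpenAI format, each event should arrive as a `data: {...}` line, with the text in `choices[0].delta.content`, and the stream should end with `data: [DONE]`. The Python sketch below extracts the text under that assumption. `iter_stream_text` is a hypothetical helper, and whether Nitro emits exactly these events has not been verified here.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style server-sent event lines.

    Assumes the OpenAI streaming chunk layout; lines that are not
    `data:` events (blank separators, comments) are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # OpenAI-style end-of-stream marker
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

To use it, pass the decoded lines of the HTTP response body, for example `"".join(iter_stream_text(body.splitlines()))`.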
47 | 87 |
|
48 | 88 | ## About Nitro |
49 | 89 |
|
50 | 90 | ### Repo Structure |
51 | 91 |
|
52 | | -WIP |
| 92 | +``` |
| 93 | +. |
| 94 | +├── controllers |
| 95 | +├── docs |
| 96 | +├── llama.cpp -> Upstream llama.cpp |
| 97 | +├── nitro_deps -> Dependencies of the Nitro project as a sub-project |
| 98 | +└── utils |
| 99 | +``` |
53 | 100 |
|
54 | 101 | ### Architecture |
55 | | - |
56 | | - |
57 | | -### Contributing |
| 102 | +Nitro is an integration layer on top of existing inference engines such as llama.cpp. Its structure can be simplified as follows: |
58 | 103 |
|
59 | | -WIP |
| 104 | + |
60 | 105 |
|
61 | 106 | ### Contact |
62 | 107 |
|
63 | | -- For support: please file a Github ticket |
| 108 | +- For support: please file a GitHub ticket |
64 | 109 | - For questions: join our Discord [here](https://discord.gg/FTk2MvZwJH) |
65 | | -- For long form inquiries: please email hello@jan.ai |
| 110 | +- For long-form inquiries: please email hello@jan.ai |
| 111 | +``` |
| 112 | +
|
| 113 | +I've made formatting improvements and fixed some grammatical issues. If you have any further questions or need additional assistance, please let me know! |