format

tikikun · tikikun · commit 96d913cd51d6 · 2023-10-18T16:57:05.000+07:00
diff --git a/README.md b/README.md
@@ -1,3 +1,5 @@
+Sure, I've reformatted your Markdown text into a code block and fixed some grammar issues:
+
 # Nitro - Accelerated AI Inference Engine
 
 <p align="center">
@@ -20,46 +22,92 @@
 
 > ⚠️ **Nitro is currently in Development**: Expect breaking changes and bugs!
 
-
 ## Features
 
 ### Supported features
-- Simple http webserver to do inference on triton (without triton client)
-- Upload inference result to s3 (txt2img)
+- Simple HTTP webserver for inference on Triton (without Triton client)
+- Upload inference results to S3 (txt2img)
 - GGML inference support (llama.cpp, etc...)
 
 ### TODO:
 - [ ] Local file server
 - [ ] Cache
-- [ ] Plugins support
+- [ ] Plugin support
+
+
+
+## Documentation
+
+## Quickstart
+
+Step 1: Download Nitro
+To use Nitro, download the released binaries from the release page below.
+[Download Nitro](https://github.com/janhq/nitro/releases)
+
+After downloading the release, double-click on the Nitro binary.
+
+Step 2: Download a Model
+Download a llama model to try the llama.cpp integration. You can find GGUF models on TheBloke's Hugging Face page below.
+[Download Model](https://huggingface.co/TheBloke)
 
-### Nitro Endpoints
+Step 3: Run Nitro
+Start Nitro by double-clicking the binary. Once your model has finished downloading, note the path where it is saved, then make an API call to load the model into Nitro.
 
 ```zsh
-WIP
+curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "llama_model_path": "/path/to/your_model.gguf",
+    "ctx_len": 2048,
+    "ngl": 100,
+    "embedding": true
+  }'
 ```
+`ctx_len` (context length) and `ngl` (number of layers offloaded to the GPU) are standard llama.cpp parameters; `embedding` controls whether the embedding endpoint is enabled.
 
-## Documentation
+Step 4: Perform Inference on Nitro for the First Time
 
-## Installation
+```zsh
+curl --location 'http://localhost:3928/inferences/llamacpp/chat_completion' \
+     --header 'Content-Type: application/json' \
+     --header 'Accept: text/event-stream' \
+     --header 'Access-Control-Allow-Origin: *' \
+     --data '{
+        "messages": [
+            {"content": "Hello there 👋", "role": "assistant"},
+            {"content": "Can you write a long story", "role": "user"}
+        ],
+        "stream": true,
+        "model": "gpt-3.5-turbo",
+        "max_tokens": 2000
+     }'
+```
 
-WIP
+The Nitro server is compatible with the OpenAI format, so you can expect responses in the same format as the OpenAI chat completion API.
 
 ## About Nitro
 
 ### Repo Structure
 
-WIP
+```
+.
+├── controllers
+├── docs 
+├── llama.cpp -> Upstream llama C++
+├── nitro_deps -> Dependencies of the Nitro project as a sub-project
+└── utils
+```
 
 ### Architecture
-![Current architecture](docs/architecture.png)
-
-### Contributing
+Nitro is an integration layer on top of cutting-edge inference engines. Its structure can be summarized as follows:
 
-WIP
+![Current architecture](docs/architecture.png)
 
 ### Contact
 
-- For support: please file a Github ticket
+- For support: please file a GitHub ticket
 - For questions: join our Discord [here](https://discord.gg/FTk2MvZwJH)
-- For long form inquiries: please email hello@jan.ai
+- For long-form inquiries: please email hello@jan.ai
+```
+
+I've made formatting improvements and fixed some grammatical issues. If you have any further questions or need additional assistance, please let me know!
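
Review note: the two Quickstart calls added in this commit (`loadmodel` and `chat_completion`) can also be driven from a script. The following is a minimal Python sketch, not part of the commit. The URLs and JSON fields are copied from the curl examples in the diff; the helper names (`load_model_payload`, `chat_payload`, `build_request`, `parse_sse_line`, `run_quickstart`) are hypothetical. The streamed response layout (`data: {...}` lines carrying `choices[0].delta.content`, ending with `data: [DONE]`) is an assumption based on the README's statement that Nitro follows the OpenAI format; it has not been verified against Nitro itself.

```python
import json
import urllib.request

# Host, port, and route prefix as shown in the README's curl examples.
BASE_URL = "http://localhost:3928/inferences/llamacpp"


def load_model_payload(model_path, ctx_len=2048, ngl=100, embedding=True):
    """Build the JSON body for the loadmodel call (fields from Step 3)."""
    return {
        "llama_model_path": model_path,
        "ctx_len": ctx_len,
        "ngl": ngl,
        "embedding": embedding,
    }


def chat_payload(messages, stream=True, model="gpt-3.5-turbo", max_tokens=2000):
    """Build the JSON body for the chat_completion call (fields from Step 4)."""
    return {
        "messages": messages,
        "stream": stream,
        "model": model,
        "max_tokens": max_tokens,
    }


def build_request(endpoint, payload):
    """Create a POST request with a JSON body for the given endpoint."""
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_sse_line(raw_line):
    """Extract the text delta from one streamed line, or return None.

    Assumption: chunks follow the OpenAI streaming layout, which the
    README implies but this sketch does not verify against Nitro.
    """
    line = raw_line.decode("utf-8").strip()
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    if body == "[DONE]":
        return None
    chunk = json.loads(body)
    choices = chunk.get("choices") or [{}]
    return (choices[0].get("delta") or {}).get("content")


def run_quickstart(model_path):
    """Load a model, then stream one chat completion to stdout.

    Requires a running Nitro server; nothing here is called on import.
    """
    load_req = build_request("loadmodel", load_model_payload(model_path))
    with urllib.request.urlopen(load_req) as resp:
        print(resp.read().decode("utf-8"))

    messages = [{"role": "user", "content": "Can you write a long story"}]
    chat_req = build_request("chat_completion", chat_payload(messages))
    with urllib.request.urlopen(chat_req) as resp:
        for raw_line in resp:  # the response object yields one line at a time
            text = parse_sse_line(raw_line)
            if text:
                print(text, end="", flush=True)
```

With a Nitro server running locally, calling `run_quickstart("/path/to/your_model.gguf")` would issue both requests in order. The payload builders are kept separate from the network calls so the request bodies can be inspected without a server.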