OpenAI-compatible API server for Jina embeddings and reranking models.
- Embedding: jina-embeddings-v5-text-small (with LoRA task adapters)
- Reranking: jina-reranker-v3
- OpenAI-compatible API: Works with existing OpenAI client libraries
- Task-adaptive embeddings: Switch LoRA adapter per request (
retrieval,text-matching,clustering,classification)
uv sync
Fetch_Models.bat #This is for windows, it contains two 'hf download' commands, adapt for linux accordinglyStart the server:
uv run jina_server.pyOr activate the virtual environment:
uv venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
python jina_server.pyCreate embeddings for input text. Supports task-adaptive LoRA adapters.
| Parameter | Type | Default | Description |
|---|---|---|---|
input |
string | string[] |
(required) | Text(s) to embed |
task |
string |
"retrieval" |
Task adapter: retrieval, text-matching, clustering, classification |
prompt_name |
string |
null |
Required when task="retrieval": "query" or "document" |
model |
string |
"jina-embeddings-v5-text-small" |
Model name |
batch_size |
int |
32 |
Processing batch size (1–128) |
Retrieval (query side):
curl http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"input": "What is machine learning?", "task": "retrieval", "prompt_name": "query"}'Retrieval (document side):
curl http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"input": "Machine learning is a field of AI...", "task": "retrieval", "prompt_name": "document"}'Text matching:
curl http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"input": ["Hello", "Hi"], "task": "text-matching"}'Classification:
curl http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"input": "This product is great!", "task": "classification"}'Clustering:
curl http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"input": ["neural networks", "monetary policy"], "task": "clustering"}'Rerank documents based on query relevance.
curl http://localhost:8000/v1/rerank \
-H "Content-Type: application/json" \
-d '{
"query": "What is machine learning?",
"documents": ["Doc 1", "Doc 2"],
"top_n": 3
}'List available models.
Health check endpoint.
Run the test suite:
uv run test_server.pyI'm telling my opencode to implement avx512 for my 9900x rig, brb
MIT