Welcome to OpenArc documentation!
This document collects information about the codebase structure, APIs, architecture, and design patterns to help you explore the project.
- Server - FastAPI server documentation with endpoint details
- Model Registration - How models are registered, loaded, and managed
- Worker Orchestration - Worker system architecture and request routing
- Inference - Inference engines, class structure, and implementation details
```
┌─────────────────┐
│     FastAPI     │  HTTP API Layer
│     Server      │  (OpenAI-compatible endpoints)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  WorkerRegistry │  Request Routing & Orchestration
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  ModelRegistry  │  Model Lifecycle Management
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Inference    │  Engine-specific implementations
│     Engines     │  (OVGenAI, Optimum, OpenVINO)
└─────────────────┘
```
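The layering in the diagram can be sketched as plain async functions, top to bottom. This is a minimal, illustrative sketch only; all names here are hypothetical and do not match the actual OpenArc classes in `src/server/`.

```python
import asyncio

# Inference engine layer: stands in for an engine-specific implementation.
async def engine_generate(prompt: str) -> str:
    return f"echo: {prompt}"

# Model registry layer: maps a loaded model id to its engine (hypothetical).
MODEL_REGISTRY = {"my-llm": engine_generate}

# Worker registry layer: look up the loaded model and dispatch to it.
async def route_request(model_id: str, prompt: str) -> str:
    engine = MODEL_REGISTRY[model_id]
    return await engine(prompt)

# HTTP layer: an OpenAI-style endpoint would unpack the request body here.
async def chat_endpoint(body: dict) -> dict:
    text = await route_request(body["model"], body["prompt"])
    return {"choices": [{"text": text}]}

print(asyncio.run(chat_endpoint({"model": "my-llm", "prompt": "hi"})))
```

Each layer only talks to the one directly below it, which is what keeps engine-specific code out of the HTTP layer.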
- Server (`src/server/main.py`)
  - FastAPI application with OpenAI-compatible endpoints
  - Authentication middleware
  - Request/response handling
- Model Registry (`src/server/model_registry.py`)
  - Model lifecycle management (load/unload)
  - Status tracking
  - Factory pattern for engine instantiation
- Worker Registry (`src/server/worker_registry.py`)
  - Per-model worker queues
  - Request routing and orchestration
  - Async packet processing
- Inference Engines (`src/engine/`)
  - OVGenAI: LLM, VLM, Whisper models
  - Optimum: Embedding, Reranker models
  - OpenVINO: Kokoro TTS models
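The per-model worker queues and async packet processing mentioned above can be sketched with `asyncio.Queue`. This is a hedged sketch under assumed names; the real packet format and routing code in `src/server/worker_registry.py` may differ.

```python
import asyncio

class WorkerRegistry:
    """Illustrative per-model queue router (not the actual OpenArc class)."""

    def __init__(self):
        self._queues: dict[str, asyncio.Queue] = {}

    def queue_for(self, model_id: str) -> asyncio.Queue:
        # One queue per model, created lazily on first use.
        if model_id not in self._queues:
            self._queues[model_id] = asyncio.Queue()
        return self._queues[model_id]

    async def submit(self, model_id: str, prompt: str) -> str:
        # Each packet carries a future so the caller can await the result.
        fut = asyncio.get_running_loop().create_future()
        await self.queue_for(model_id).put((prompt, fut))
        return await fut

    async def worker(self, model_id: str):
        # Drains the model's queue; a real worker would call the engine.
        q = self.queue_for(model_id)
        while True:
            prompt, fut = await q.get()
            fut.set_result(f"[{model_id}] {prompt}")
            q.task_done()

async def main():
    reg = WorkerRegistry()
    task = asyncio.create_task(reg.worker("my-llm"))
    out = await reg.submit("my-llm", "hello")
    task.cancel()
    return out

print(asyncio.run(main()))  # [my-llm] hello
```

Because each model gets its own queue, a slow model cannot block requests destined for other loaded models.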
Supported model types:

- LLM: Text-to-text language models
- VLM: Vision-language models (image-to-text)
- Whisper: Automatic speech recognition
- Kokoro: Text-to-speech
- Embedding: Text-to-vector embeddings
- Reranker: Document reranking
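For the LLM type, clients talk to the server through OpenAI-style request bodies. The payload below is a generic OpenAI-format example; the exact endpoint paths and supported fields depend on the actual server implementation.

```python
import json

# Generic OpenAI-style chat payload for an LLM-type model; the model id
# "my-llm" is a placeholder, and field support may vary by server.
payload = {
    "model": "my-llm",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,
}
print(json.dumps(payload))
```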
Engine backends:

- OVGenAI: OpenVINO GenAI pipeline (LLM, VLM, Whisper)
- Optimum: Optimum-Intel (Embedding, Reranker)
- OpenVINO: Native OpenVINO runtime (Kokoro TTS)
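The factory pattern used for engine instantiation reduces to a mapping from model type to backend. This is a hedged sketch of that idea; the class and table names are illustrative, not the actual factory in `src/server/model_registry.py`.

```python
# Stand-in classes for the three backends listed above (hypothetical names).
class OVGenAIEngine:
    backend = "ovgenai"   # OpenVINO GenAI pipeline

class OptimumEngine:
    backend = "optimum"   # Optimum-Intel

class OpenVINOEngine:
    backend = "openvino"  # native OpenVINO runtime

# Model type -> engine backend, mirroring the lists above.
ENGINE_BY_TYPE = {
    "llm": OVGenAIEngine,
    "vlm": OVGenAIEngine,
    "whisper": OVGenAIEngine,
    "embedding": OptimumEngine,
    "reranker": OptimumEngine,
    "kokoro": OpenVINOEngine,
}

def create_engine(model_type: str):
    # Factory: instantiate the backend registered for this model type.
    try:
        return ENGINE_BY_TYPE[model_type]()
    except KeyError:
        raise ValueError(f"unsupported model type: {model_type}") from None

print(create_engine("whisper").backend)  # ovgenai
```

Keeping the mapping in one table means adding a new backend only touches the registry, not the request path.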
This project focuses on Intel devices, but expect that we may expand to other frameworks and libraries in the future.