The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
Updated Mar 16, 2026 - Python
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
A conceptual framework for a high-scale Agentic AI orchestrator, inspired by enterprise-grade inference platforms.
AI inference platform architecture lab demonstrating admission control, fairness scheduling, bounded queues, and graceful degradation under burst traffic.
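As a rough illustration of the admission-control and bounded-queue ideas named above, here is a minimal sketch in Python. The `AdmissionController` class, its `capacity` parameter, and the `submit` method are hypothetical names chosen for this example, not APIs from any of the repositories listed here; real platforms layer fairness scheduling and graceful degradation on top of this basic fail-fast pattern.

```python
import queue

class AdmissionController:
    """Hypothetical sketch: reject requests beyond a fixed capacity
    (fail fast) instead of letting an unbounded backlog degrade
    latency for every caller."""

    def __init__(self, capacity: int):
        # Bounded queue: holds at most `capacity` pending requests.
        self.pending = queue.Queue(maxsize=capacity)

    def submit(self, request) -> bool:
        """Admit the request if there is room; otherwise shed load."""
        try:
            self.pending.put_nowait(request)
            return True   # admitted into the bounded queue
        except queue.Full:
            return False  # rejected: caller should back off or retry

controller = AdmissionController(capacity=2)
print(controller.submit("req-1"))  # True
print(controller.submit("req-2"))  # True
print(controller.submit("req-3"))  # False: queue full, request shed
```

Under burst traffic, the bounded queue caps tail latency at roughly `capacity` queued requests' worth of work, and rejected callers get an immediate signal to retry later rather than timing out.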