EvalRouteOps

Distributed adaptive inference infrastructure for reinforcement-learning-driven LLM routing, online optimization, and production-scale serving experimentation.

EvalRouteOps is a production-oriented AI systems platform for studying how adaptive routing policies optimize tradeoffs between:

response quality
latency
infrastructure cost
throughput
reliability

under realistic distributed inference constraints.

The platform combines:

contextual bandits
Thompson Sampling
policy-gradient routing
adaptive traffic shaping
Redis-backed distributed workers
Kubernetes deployment infrastructure
GPU-aware scheduling manifests
Prometheus/OpenTelemetry observability
streaming inference APIs
large-scale deterministic benchmarking

Technical Report

Full systems and infrastructure report:

Markdown version: docs/technical_report.md
PDF version: docs/EvalRouteOps_Technical_Report.pdf

The report covers:

adaptive routing optimization
reinforcement-learning-driven inference allocation
distributed serving infrastructure
Kubernetes orchestration
observability systems
benchmark methodology
experimental analysis
infrastructure tradeoff evaluation

EvalRouteOps is designed as intelligent serving infrastructure, not as a chatbot wrapper or a prompt-engineering project.

Core Capabilities

Adaptive Routing

Implemented routing strategies include:

latency-aware routing
cost-aware routing
quality-first routing
epsilon-greedy contextual bandits
Thompson Sampling
policy-gradient routing
adaptive traffic shaping
Pareto tradeoff analysis
oracle benchmarking
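As an illustration of one strategy in the list above, here is a minimal Thompson Sampling sketch. It is not the project's implementation: it assumes a non-contextual Beta-Bernoulli model in which each request yields a binary "acceptable response" reward, and the backend names and success rates are made up for the example.

```python
import random


def thompson_route(successes, failures, rng):
    """Sample each backend's Beta posterior and route to the largest draw."""
    samples = {
        name: rng.betavariate(successes[name] + 1, failures[name] + 1)
        for name in successes
    }
    return max(samples, key=samples.get)


def simulate(true_rates, n_requests, seed=0):
    """Replay n_requests against synthetic backends; return routing counts."""
    rng = random.Random(seed)  # seeded so repeated runs are identical
    successes = {name: 0 for name in true_rates}
    failures = {name: 0 for name in true_rates}
    counts = {name: 0 for name in true_rates}
    for _ in range(n_requests):
        backend = thompson_route(successes, failures, rng)
        counts[backend] += 1
        # Synthetic reward: 1 with the backend's (hypothetical) success rate.
        if rng.random() < true_rates[backend]:
            successes[backend] += 1
        else:
            failures[backend] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical backends and rates, chosen only for illustration.
    rates = {"small": 0.60, "medium": 0.75, "large": 0.90}
    print(simulate(rates, n_requests=5000))
```

Over enough requests the allocation should concentrate on the backend with the highest success rate, while the other backends still receive occasional exploratory traffic.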

Distributed Inference Infrastructure

EvalRouteOps supports distributed inference execution through:

Redis-backed inference queues
asynchronous inference workers
provider retry wrappers
timeout wrappers
fallback provider chains
concurrency-limited providers
streaming provider interfaces
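The timeout, retry, and fallback wrappers listed above can be composed roughly as follows. This is a sketch under stated assumptions, not the project's code: the function names, the backoff constants, and the three toy providers are all hypothetical, and a provider is assumed to be any async callable that takes a prompt.

```python
import asyncio


async def call_with_timeout(provider, prompt, timeout_s):
    """Cancel the provider call if it runs longer than timeout_s."""
    return await asyncio.wait_for(provider(prompt), timeout=timeout_s)


async def call_with_retry(provider, prompt, timeout_s, retries):
    """Try once, then retry up to `retries` times with exponential backoff."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return await call_with_timeout(provider, prompt, timeout_s)
        except (asyncio.TimeoutError, RuntimeError) as exc:
            last_error = exc
            await asyncio.sleep(0.01 * 2 ** attempt)
    raise last_error


async def call_with_fallback(providers, prompt, timeout_s=0.05, retries=1):
    """Walk the provider chain in order; return the first successful result."""
    errors = []
    for provider in providers:
        try:
            return await call_with_retry(provider, prompt, timeout_s, retries)
        except (asyncio.TimeoutError, RuntimeError) as exc:
            errors.append(exc)
    raise RuntimeError(f"all providers failed: {errors!r}")


# Hypothetical providers used only to exercise the wrappers.
async def slow_provider(prompt):
    await asyncio.sleep(1.0)  # always exceeds the default timeout
    return "slow:" + prompt


async def flaky_provider(prompt):
    raise RuntimeError("backend unavailable")


async def stable_provider(prompt):
    return "stable:" + prompt


if __name__ == "__main__":
    chain = [slow_provider, flaky_provider, stable_provider]
    print(asyncio.run(call_with_fallback(chain, "hi")))
```

A concurrency limit could be layered on in the same style by acquiring an `asyncio.Semaphore` around the provider call; that part is omitted here to keep the sketch short.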

Infrastructure components include:

Docker Compose
Kubernetes deployment manifests
horizontal autoscaling (HPA)
GPU-aware scheduling manifests

Serving Layer

The serving stack includes:

FastAPI routing APIs
streaming inference APIs
structured request logging
Prometheus metrics
OpenTelemetry tracing
request timing instrumentation

Experimental Scale

Current benchmark results:

| Metric | Result |
| --- | --- |
| Routing simulation scale | 100,000 requests |
| Adaptive routing experiments | 20,000 requests |
| Replay throughput | 7,600+ requests/sec |
| Live API throughput | 58 requests/sec |
| API P95 latency | ~28 ms |
| Automated tests | 53 passing |
| Failure rate (fallback-enabled routing) | 0% |
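For readers unfamiliar with the latency and throughput metrics above, the sketch below shows one common way to derive them from raw per-request timings. The README does not say how the project computes P95, so the nearest-rank method here is an assumption, and the function names and sample data are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the smallest value with at least pct percent of samples <= it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def summarize(latencies_ms, wall_time_s):
    """Summarize one load-test run from per-request latencies."""
    return {
        "requests": len(latencies_ms),
        "throughput_rps": len(latencies_ms) / wall_time_s,
        "p50_ms": percentile_nearest_rank(latencies_ms, 50),
        "p95_ms": percentile_nearest_rank(latencies_ms, 95),
    }


if __name__ == "__main__":
    # Synthetic latencies (1 ms to 100 ms), not measurements from the project.
    sample = [float(i) for i in range(1, 101)]
    print(summarize(sample, wall_time_s=2.0))
```

Other percentile definitions (for example, linear interpolation) give slightly different values on small samples, so the method should be stated alongside any reported number.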

Benchmark Visualizations

Figures: Bandit Cumulative Regret, Bandit Backend Allocation, Bandit Rolling Quality, Load Sweep Throughput.

Architecture

Client
  |
  v
FastAPI Serving Layer
  |
  v
Routing Policies
  |
  v
Redis Queue Backend
  |
  +----------------------+
  |                      |
  v                      v
CPU Workers         GPU Workers
  |                      |
  +----------+-----------+
             |
             v
     Provider Execution Layer
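The flow in the diagram can be sketched in a single process. This is only an illustration: `asyncio.Queue` stands in for the Redis queue backend, the length-based routing rule is invented for the example, and provider execution is reduced to recording which worker pool handled each request.

```python
import asyncio


def choose_pool(request):
    """Hypothetical routing rule: long prompts go to the GPU pool."""
    return "gpu" if len(request["prompt"]) > 32 else "cpu"


async def worker(pool, queue, results):
    """Pull requests from one queue and 'execute' them."""
    while True:
        request = await queue.get()
        try:
            # Stand-in for the provider execution layer.
            results.append((request["id"], pool))
        finally:
            queue.task_done()


async def run_pipeline(requests):
    """Route each request to a queue, then wait for the workers to drain it."""
    queues = {"cpu": asyncio.Queue(), "gpu": asyncio.Queue()}
    results = []
    workers = [
        asyncio.create_task(worker(pool, queue, results))
        for pool, queue in queues.items()
    ]
    for request in requests:
        await queues[choose_pool(request)].put(request)
    for queue in queues.values():
        await queue.join()  # block until every queued request is processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return sorted(results)


if __name__ == "__main__":
    demo = [
        {"id": 1, "prompt": "short prompt"},
        {"id": 2, "prompt": "x" * 64},
    ]
    print(asyncio.run(run_pipeline(demo)))
```

In a real deployment the queues would live in Redis and the workers would run as separate processes or pods; the in-process version only shows the ordering of the stages.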

System Architecture

See:

docs/images/architecture_diagram.md

Quickstart

Environment Setup

python -m venv .venv

# Windows PowerShell
.venv\Scripts\Activate.ps1

# macOS/Linux
# source .venv/bin/activate

pip install -e ".[dev]"

Run Tests

pytest
ruff check .

Start API Server

uvicorn evalrouteops.serving.main:app --reload

Interactive API docs:

http://127.0.0.1:8000/docs

Run API Load Test

python scripts/run_api_loadtest.py

Run Adaptive Routing Experiments

python scripts/run_adaptive_traffic_experiment.py
python scripts/run_policy_gradient_experiment.py

Docker Deployment

Build the image:

docker build -t evalrouteops:latest .

Run the distributed stack:

docker compose up

Compose services include:

EvalRouteOps API
Redis backend
distributed inference worker

Kubernetes Deployment

Kubernetes manifests are provided in:

k8s/

Supported deployment infrastructure includes:

API deployment
distributed workers
Redis deployment
horizontal autoscaling
GPU-aware worker scheduling

Deploy core stack:

kubectl apply -f k8s/redis-deployment.yaml
kubectl apply -f k8s/api-deployment.yaml
kubectl apply -f k8s/worker-deployment.yaml
kubectl apply -f k8s/services.yaml

Optional GPU workers:

kubectl apply -f k8s/gpu-worker-deployment.yaml

Optional autoscaling:

kubectl apply -f k8s/hpa.yaml

Documentation

Additional documentation is available in:

docs/

Included documentation:

architecture overview
benchmark methodology
deployment documentation
API reference
reproducibility guarantees
benchmark summaries

Design Principles

EvalRouteOps is built around:

deterministic experimentation
reproducible benchmarks
adaptive online optimization
production-oriented infrastructure
typed interfaces
observability-first design
scalable distributed execution

Research Objective

EvalRouteOps investigates whether reinforcement-learning-inspired routing strategies can dynamically optimize distributed inference systems under competing objectives:

latency
quality
infrastructure cost
throughput
reliability

using adaptive online optimization and scalable serving infrastructure.
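When objectives compete like this, no single policy is usually best on every axis, which is why Pareto tradeoff analysis appears among the routing capabilities. The sketch below shows the basic dominance test on three of the objectives. It is an assumption-laden illustration: the `PolicyResult` type, the policy names, and all numbers are hypothetical and are not benchmark results.

```python
from typing import NamedTuple


class PolicyResult(NamedTuple):
    name: str
    quality: float      # higher is better
    latency_ms: float   # lower is better
    cost: float         # lower is better


def dominates(a, b):
    """True if a is no worse than b on every axis and better on at least one."""
    no_worse = (
        a.quality >= b.quality
        and a.latency_ms <= b.latency_ms
        and a.cost <= b.cost
    )
    strictly_better = (
        a.quality > b.quality
        or a.latency_ms < b.latency_ms
        or a.cost < b.cost
    )
    return no_worse and strictly_better


def pareto_front(results):
    """Keep only the results that no other result dominates."""
    return [
        r for r in results
        if not any(dominates(other, r) for other in results)
    ]


if __name__ == "__main__":
    # Synthetic numbers for illustration only.
    results = [
        PolicyResult("quality_first", 0.92, 180.0, 1.00),
        PolicyResult("latency_aware", 0.80, 40.0, 0.60),
        PolicyResult("cost_aware", 0.74, 95.0, 0.20),
        PolicyResult("dominated", 0.70, 120.0, 0.70),
    ]
    print([r.name for r in pareto_front(results)])
```

Policies on the front represent distinct tradeoffs; choosing among them requires an explicit preference, such as a latency budget or a cost ceiling.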

