Arbiter is a high-performance inference server for DeBERTa-based text classification models. It provides a REST API that accepts text inputs and returns classification predictions with confidence scores. The server batches work: concurrent requests are automatically grouped and run through the model together for optimal GPU utilization and throughput. Compared with issuing requests one at a time, this significantly reduces end-to-end latency when classifying many texts, making Arbiter practical for production workloads that require real-time text classification at scale.

To build and run Arbiter, you will need:
- Rust 1.88 or later
- CUDA 12.4+ (for GPU support, optional)
- A DeBERTa model (from Hugging Face Hub or local path)
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd arbiter
  ```
- Build the project:

  ```bash
  # For CPU-only inference
  cargo build --release

  # For GPU inference (requires CUDA)
  cargo build --release --features cuda
  ```
To start the server:

```bash
# Using a Hugging Face model
./target/release/arbiter --model-id microsoft/deberta-v3-base

# Using a local model
./target/release/arbiter --model-path /path/to/local/model
```

The server accepts the following command-line options:

- `--host`: Server host (default: `127.0.0.1`)
- `--port`: Server port (default: `8000`)
- `--batch-size`: Batch size for processing (default: `8`)
- `--tick-duration-ms`: Batch processing interval in milliseconds (default: `100`)
- `--max-sequence-length`: Maximum input sequence length (default: `512`)
- `--cpu-only`: Force CPU-only inference
- `--id2label`: Label mapping in the format `"0=No Claim,1=Claim"`
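As an illustration, the options above can be combined in a single invocation; the values here are arbitrary, not recommendations:

```bash
./target/release/arbiter \
  --model-id microsoft/deberta-v3-base \
  --host 0.0.0.0 \
  --port 8000 \
  --batch-size 16 \
  --tick-duration-ms 50 \
  --max-sequence-length 256 \
  --id2label "0=No Claim,1=Claim"
```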
Once the server is running, classify text via the REST API:

```bash
curl -X POST http://localhost:8000/classify \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/deberta-v3-base",
    "input": ["This is a test sentence", "Another text to classify"]
  }'
```
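Because the server collects requests on each tick, calls issued concurrently are typically served together in one batch. A rough way to observe this from the shell, assuming the server is running with default settings:

```bash
# Fire several requests in parallel; with the default 100 ms tick and
# batch size 8, these should usually be grouped into a single batch.
for i in 1 2 3 4; do
  curl -s -X POST http://localhost:8000/classify \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"microsoft/deberta-v3-base\", \"input\": [\"Concurrent request $i\"]}" &
done
wait
```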
Arbiter can also be built and run with Docker:

```bash
# CPU version
docker build --target cpu -t arbiter:cpu .

# GPU version
docker build --target gpu -t arbiter:gpu .
```
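A CPU-only container can be run the same way, minus the GPU flags; this sketch assumes the `cpu` image honors the same `MODEL_ID` environment variable as the GPU example below:

```bash
# Run the CPU image (no --gpus flag required)
docker run -p 8000:8000 -e MODEL_ID=microsoft/deberta-v3-base arbiter:cpu
```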
```bash
# Run with GPU support
docker run --gpus all -p 8000:8000 -e MODEL_ID=microsoft/deberta-v3-base arbiter:gpu
```

The server exposes Prometheus metrics at `/metrics` for monitoring request throughput, latency, and other operational metrics.
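To check what is exported, you can scrape the endpoint directly; the exact metric names are not enumerated in this README, so inspect the output:

```bash
# Inspect the exported Prometheus metrics
curl http://localhost:8000/metrics
```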