whisperX REST API

The whisperX API is a tool for enhancing and analyzing audio content. It provides a suite of services for processing audio and video files, including transcription, alignment, diarization, and combining transcripts with diarization results.

Documentation

Swagger UI for all services is available at /docs, and a dump of the OpenAPI definition is kept in the app/docs folder. You can also explore it directly in Swagger Editor.

See the WhisperX Documentation for details on whisperX functions.

Language and Whisper model settings

The following settings can be defined in .env:

DEFAULT_LANG: Default language (default: en; can also be set in the request)
WHISPER_MODEL: Whisper model (can also be set in the request)
LOG_LEVEL: Logging level (default: DEBUG in development, INFO in production)
ENVIRONMENT: Environment name (default: production)
DEV: Boolean indicating whether the environment is development (default: true)
FILTER_WARNING: Boolean enabling or disabling filtering of specific warnings (default: true)

Supported File Formats

Audio Files

.oga, .m4a, .aac, .wav, .amr, .wma, .awb, .mp3, .ogg

Video Files

.wmv, .mkv, .avi, .mov, .mp4
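To reject unsupported files before uploading them, a client can check the extension against the two lists above. This is a minimal client-side sketch, not the service's own validation; matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions copied from the "Supported File Formats" lists above.
AUDIO_EXTENSIONS = {".oga", ".m4a", ".aac", ".wav", ".amr", ".wma", ".awb", ".mp3", ".ogg"}
VIDEO_EXTENSIONS = {".wmv", ".mkv", ".avi", ".mov", ".mp4"}


def classify_media(filename: str) -> str:
    """Return "audio", "video", or "unsupported" based on the file extension."""
    suffix = Path(filename).suffix.lower()  # assumption: case-insensitive matching
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return "unsupported"


print(classify_media("meeting.MP3"))  # audio
print(classify_media("lecture.mkv"))  # video
print(classify_media("notes.txt"))    # unsupported
```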

Available Services

Speech-to-Text (/speech-to-text)
- Upload audio/video files for transcription
- Supports multiple languages and Whisper models
Speech-to-Text URL (/speech-to-text-url)
- Transcribe audio/video from URLs
- Same features as direct upload
Individual Services:
- Transcribe (/service/transcribe): Convert speech to text
- Align (/service/align): Align transcript with audio
- Diarize (/service/diarize): Speaker diarization
- Combine (/service/combine): Merge transcript with diarization
Task Management:
- Get all tasks (/task/all)
- Get task status (/task/{identifier})
Health Check Endpoints:
- Basic health check (/health): Simple service status check
- Liveness probe (/health/live): Verifies if application is running
- Readiness probe (/health/ready): Checks if application is ready to accept requests (includes database connectivity check)
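The upload and service endpoints run as background tasks, so a client typically submits a job and then polls /task/{identifier}. The stdlib sketch below shows the polling half. The `status` field and its terminal values `completed` and `failed` are hypothetical; check the task schema in Swagger UI at /docs. The fetch function is injectable so the loop can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:8000"  # local address from "Getting Started"

# Hypothetical status values; verify them against the task schema in /docs.
TERMINAL_STATES = {"completed", "failed"}


def fetch_task(identifier: str) -> dict:
    """GET /task/{identifier} and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/task/{identifier}") as resp:
        return json.load(resp)


def wait_for_task(
    identifier: str,
    fetch: Callable[[str], dict] = fetch_task,
    interval: float = 2.0,
    timeout: float = 600.0,
) -> dict:
    """Poll a background task until it reaches a terminal state, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch(identifier)
        if task.get("status") in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {identifier} still {task.get('status')!r} after {timeout}s")
        time.sleep(interval)
```

The identifier comes from the response of the submitting POST (for example /speech-to-text). This section does not name the exact response field.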

OpenAI-compatible audio endpoints

The API also exposes synchronous OpenAI Whisper-compatible endpoints:

POST /v1/audio/transcriptions
POST /v1/audio/translations

These endpoints accept the same multipart/form-data style requests expected by OpenAI SDK clients and return the transcript directly in the response body. Each request is also persisted as a task, so its result (or error) can be retrieved later via the /task endpoints.

model="whisper-1" maps to the local checkpoint configured by WHISPER_MODEL
Direct local Whisper checkpoint names such as tiny, base, large-v3, or distil-large-v3 are also accepted
response_format supports json, text, srt, verbose_json, and vtt
timestamp_granularities[]=word is supported with response_format=verbose_json on /v1/audio/transcriptions and triggers alignment for word timings
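A client can check these rules before it sends a request. The hypothetical helper below assembles the non-file form fields for /v1/audio/transcriptions and refuses word timestamps without verbose_json. The field names follow the OpenAI form-data convention used in this section. This section does not say whether the server rejects or silently ignores other combinations.

```python
from __future__ import annotations

# Values listed in this section; the helper itself is hypothetical.
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_transcription_fields(
    model: str = "whisper-1",
    response_format: str = "json",
    timestamp_granularities: list[str] | None = None,
) -> list[tuple[str, str]]:
    """Return the non-file multipart fields for POST /v1/audio/transcriptions."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    granularities = timestamp_granularities or []
    if "word" in granularities and response_format != "verbose_json":
        raise ValueError("word timestamps need response_format='verbose_json'")
    fields = [("model", model), ("response_format", response_format)]
    # Array fields are repeated once per value: timestamp_granularities[]=word, ...
    fields += [("timestamp_granularities[]", g) for g in granularities]
    return fields


print(build_transcription_fields(response_format="verbose_json", timestamp_granularities=["word"]))
```

You can pass the returned list to any multipart encoder along with the file part.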

Example with the official OpenAI Python SDK:

from openai import OpenAI

# Point the SDK at the local server instead of api.openai.com.
# The SDK requires an api_key; any value works unless AUTH__ENABLED=true.
client = OpenAI(
    api_key="not-used-but-required-by-some-clients",
    base_url="http://127.0.0.1:8000/v1",
)

with open("tests/test_files/audio_en.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",  # mapped to the local WHISPER_MODEL checkpoint
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["segment"],
    )

print(transcript.text)

Task management and result storage

flowchart TD
    Client(["Client / OpenAI SDK"])

    subgraph AsyncAPI ["Asynchronous API — background jobs"]
        STT["POST /speech-to-text<br/>POST /speech-to-text-url"]
        SVC["POST /service/transcribe<br/>POST /service/align<br/>POST /service/diarize<br/>POST /service/combine"]
        BG{{"Background task<br/>WhisperX pipeline"}}
        TASK["GET /task/all<br/>GET /task/{identifier}<br/>DELETE /task/{identifier}/delete"]
    end

    subgraph SyncAPI ["OpenAI-compatible API — synchronous"]
        OAI["POST /v1/audio/transcriptions<br/>POST /v1/audio/translations"]
    end

    subgraph SpeakerAPI ["Speaker management"]
        SPK["POST / GET / PUT / DELETE /speakers<br/>POST /speakers/search<br/>POST /speakers/identify"]
    end

    SEM(["GPU semaphore<br/>MAX_CONCURRENT_GPU_TASKS"])
    DB[("Database<br/>tasks, results and speaker embeddings")]

    Client -->|submit job| STT
    Client -->|submit job| SVC
    Client -->|poll / manage| TASK
    Client -->|request| OAI
    Client -->|manage speakers| SPK

    STT --> BG
    SVC --> BG
    BG -->|store status + result| DB
    OAI -->|store status + result| DB
    DB -->|read| TASK

    SPK -->|CRUD / search / identify| DB
    BG -. identify / auto-store speakers .-> DB

    BG -. acquire .-> SEM
    OAI -. acquire .-> SEM
    OAI -->|transcript in response| Client

The asynchronous endpoints enqueue a background WhisperX job, persist its status and result to the database, and let clients poll or manage it via the /task endpoints. The OpenAI-compatible endpoints run the pipeline synchronously and return the transcript directly in the response, while also persisting the task and its result to the same database — so completed (and failed) synchronous requests are queryable via the /task endpoints just like the asynchronous ones. Both paths share the same GPU semaphore (MAX_CONCURRENT_GPU_TASKS) to prevent out-of-memory errors.
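The sketch below shows how one shared limit caps GPU work across both paths. It assumes an asyncio.Semaphore sized by MAX_CONCURRENT_GPU_TASKS. The WhisperX pipeline is replaced by a short sleep, and nothing here is the service's actual code.

```python
import asyncio

# Illustrative value; in the service this comes from MAX_CONCURRENT_GPU_TASKS.
MAX_CONCURRENT_GPU_TASKS = 2


async def run_pipeline(name: str, gpu: asyncio.Semaphore, stats: dict) -> str:
    """Stand-in for a WhisperX job: wait for a GPU slot, then work briefly."""
    async with gpu:  # async background jobs and sync /v1 requests await the same semaphore
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for transcription/alignment/diarization
        stats["active"] -= 1
    return name


async def main() -> dict:
    gpu = asyncio.Semaphore(MAX_CONCURRENT_GPU_TASKS)
    stats = {"active": 0, "peak": 0}
    # Three "background task" callers and three "OpenAI-compatible" callers share one limit.
    jobs = [run_pipeline(f"bg-{i}", gpu, stats) for i in range(3)]
    jobs += [run_pipeline(f"sync-{i}", gpu, stats) for i in range(3)]
    done = await asyncio.gather(*jobs)
    return {"done": done, "peak": stats["peak"]}


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Waiting for a slot, rather than failing, is what keeps a burst of requests from loading more models onto the GPU than it can hold.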

The /speakers endpoints provide CRUD, similarity search, and identification over speaker embeddings persisted in the same database. Diarization tasks can optionally identify against, or auto-store into, these embeddings (identify_speakers / auto_store_speakers).

Task status and results are stored in a database via async SQLAlchemy. The DB connection is configured with DB_URL (default: sqlite:///records.db).

See SQLAlchemy Engine configuration for supported database URLs.

Async drivers are required — the application rewrites the URL scheme automatically:

`DB_URL` scheme	Async driver used
`sqlite://`	`aiosqlite` (included by default)
`postgresql://`	`asyncpg` (install with `--extra postgres`)
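The scheme rewrite in this table amounts to a small string transformation. This is a sketch only. The mapping comes from the table, and rejecting sync-driver schemes mirrors the troubleshooting notes. How the application actually performs the rewrite is not documented here, and leaving already-async URLs unchanged is an assumption.

```python
# Scheme mapping from the table above; the function is a hypothetical sketch.
ASYNC_SCHEMES = {
    "sqlite": "sqlite+aiosqlite",
    "postgresql": "postgresql+asyncpg",
}


def to_async_url(db_url: str) -> str:
    """Rewrite a plain DB_URL scheme to the matching async-driver scheme."""
    scheme, sep, rest = db_url.partition("://")
    if not sep:
        raise ValueError(f"not a database URL: {db_url!r}")
    if scheme in ASYNC_SCHEMES.values():
        return db_url  # assumption: an already-async URL is left unchanged
    if "+" in scheme:
        # A sync driver such as postgresql+psycopg2 cannot be used with async SQLAlchemy.
        raise ValueError(f"use the plain scheme instead of {scheme}://")
    if scheme not in ASYNC_SCHEMES:
        raise ValueError(f"unsupported DB_URL scheme: {scheme}")
    return f"{ASYNC_SCHEMES[scheme]}://{rest}"


print(to_async_url("sqlite:///records.db"))  # sqlite+aiosqlite:///records.db
```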

For PostgreSQL, install the driver extra: uv sync --no-dev --extra postgres. The Docker image includes it automatically.

Performance note: SQLite is suitable for development and low-concurrency use. For production or sustained concurrent load, use PostgreSQL — it sustains 350+ req/s at 200 concurrent users vs. ~15 req/s with SQLite. See Async SQLAlchemy concurrency guide for full load test results.

Database schema

The database structure is described in DB Schema.

Compute Settings

Configure compute options in .env:

DEVICE: Device for inference (cuda or cpu, default: cuda)
COMPUTE_TYPE: Computation type (float16, float32, int8, default: float16)

Note: When using CPU, COMPUTE_TYPE must be set to int8
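DEVICE=cpu combined with the default COMPUTE_TYPE=float16 breaks the rule above, so a startup check can fail fast. The hypothetical sketch below reads both variables with their documented defaults. The service may report an invalid combination differently.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

DEVICES = {"cuda", "cpu"}
COMPUTE_TYPES = {"float16", "float32", "int8"}


def compute_settings(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Resolve DEVICE and COMPUTE_TYPE with the documented defaults and the CPU rule."""
    device = env.get("DEVICE", "cuda")
    compute_type = env.get("COMPUTE_TYPE", "float16")
    if device not in DEVICES:
        raise ValueError(f"DEVICE must be one of {sorted(DEVICES)}, got {device!r}")
    if compute_type not in COMPUTE_TYPES:
        raise ValueError(f"COMPUTE_TYPE must be one of {sorted(COMPUTE_TYPES)}, got {compute_type!r}")
    if device == "cpu" and compute_type != "int8":
        raise ValueError("COMPUTE_TYPE must be int8 when DEVICE=cpu")
    return device, compute_type
```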

Available Models

WhisperX supports these model sizes:

tiny, tiny.en
base, base.en
small, small.en
medium, medium.en
large, large-v1, large-v2, large-v3, large-v3-turbo
Distilled models: distil-large-v2, distil-medium.en, distil-small.en, distil-large-v3
Custom models: nyrahealth/faster_CrisperWhisper

Set the default model in .env with WHISPER_MODEL (default: tiny).
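The hypothetical sketch below shows how a requested model name relates to WHISPER_MODEL. It combines this list with the whisper-1 alias described for the OpenAI-compatible endpoints. Rejecting names that are not listed is an assumption made for illustration; the service may pass other checkpoint names through.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Model names listed above.
KNOWN_MODELS = {
    "tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en",
    "large", "large-v1", "large-v2", "large-v3", "large-v3-turbo",
    "distil-large-v2", "distil-medium.en", "distil-small.en", "distil-large-v3",
    "nyrahealth/faster_CrisperWhisper",
}


def resolve_model(requested: str | None, env: Mapping[str, str] = os.environ) -> str:
    """Pick the checkpoint for a request: explicit name, else WHISPER_MODEL, else tiny."""
    default = env.get("WHISPER_MODEL", "tiny")
    # On /v1/audio/*, "whisper-1" is an alias for the configured WHISPER_MODEL.
    if not requested or requested == "whisper-1":
        return default
    if requested not in KNOWN_MODELS:
        raise ValueError(f"unknown Whisper model: {requested!r}")
    return requested
```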

API protection: upload cap, rate limiting, concurrency, and auth

The transcription endpoints (/speech-to-text, /service/*, /v1/audio/*) can be protected with optional, configurable safeguards. Every option is a no-op by default, so existing deployments are unaffected until they opt in.

Variable	Default	Effect
`MAX_UPLOAD_SIZE_MB`	`0`	Reject uploads larger than this many MB with HTTP 413, checked from `Content-Length` before the body is read. `0` = unlimited. (Requests without a `Content-Length`, e.g. chunked uploads, are not pre-checked — see the note below.)
`MAX_QUEUED_GPU_REQUESTS`	`0`	Cap on concurrent in-flight transcription requests admitted across the API, returning HTTP 503 (with `Retry-After`) when exceeded. `0` = unlimited. Use `>= 2` for an exact split; `1` admits up to 2 (one per path).
`SYNC_GPU_QUOTA_FRACTION`	`0.5`	Fraction of `MAX_QUEUED_GPU_REQUESTS` reserved for the synchronous (`/v1/audio/*`) path; the async path gets the remainder. For `total >= 2` the split is exact and each path keeps at least one slot.
`RATE_LIMIT__ENABLED`	`false`	Enable per-caller rate limiting (slowapi). Returns HTTP 429 with `Retry-After`.
`RATE_LIMIT__REQUESTS_PER_MINUTE`	`60`	Sustained per-caller budget per minute.
`RATE_LIMIT__BURST`	`10`	Short-term per-caller burst budget per second.
`RATE_LIMIT__KEY_STRATEGY`	`ip`	How callers are identified: `ip` or `bearer_token`.
`AUTH__ENABLED`	`false`	Require a shared bearer token on protected endpoints (HTTP 401 otherwise).
`AUTH__BEARER_TOKEN`	(empty)	The shared token. Required when `AUTH__ENABLED=true`.
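The sketch below expresses two of the numeric rules in this table as plain functions. Both are hypothetical illustrations, not the service's code. Treating MB as MiB is an assumption, and so is rounding the synchronous share down before clamping. These choices reproduce the documented cases: 0 means unlimited, a cap of 1 admits one request per path, and each path keeps at least one slot when the total is 2 or more.

```python
from __future__ import annotations

import math

MIB = 1024 * 1024  # assumption: "MB" is treated as MiB


def upload_too_large(content_length: str | None, max_upload_size_mb: int) -> bool:
    """True if the Content-Length pre-check would reject the request with HTTP 413."""
    if max_upload_size_mb <= 0:
        return False  # 0 = unlimited
    if content_length is None:
        return False  # e.g. chunked uploads: not pre-checked
    # Content-Length covers the whole body, multipart boundaries and form fields included.
    return int(content_length) > max_upload_size_mb * MIB


def split_gpu_quota(total: int, sync_fraction: float = 0.5) -> tuple[int, int]:
    """Return (sync_slots, async_slots) for MAX_QUEUED_GPU_REQUESTS; (0, 0) = unlimited."""
    if total <= 0:
        return 0, 0
    if total == 1:
        return 1, 1  # documented: a cap of 1 admits up to 2, one per path
    # Assumption: round the sync share down, then clamp so each path keeps a slot.
    sync_slots = min(max(math.floor(total * sync_fraction), 1), total - 1)
    return sync_slots, total - sync_slots


print(split_gpu_quota(20, 0.5))  # (10, 10)
print(split_gpu_quota(1))        # (1, 1)
print(upload_too_large(str(30 * MIB), 25))  # True
```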

Example .env snippet enabling all of them:

MAX_UPLOAD_SIZE_MB=25
MAX_QUEUED_GPU_REQUESTS=20
SYNC_GPU_QUOTA_FRACTION=0.5
RATE_LIMIT__ENABLED=true
RATE_LIMIT__REQUESTS_PER_MINUTE=60
RATE_LIMIT__BURST=10
RATE_LIMIT__KEY_STRATEGY=ip
AUTH__ENABLED=true
AUTH__BEARER_TOKEN=replace-with-a-long-random-secret
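Once these options are enabled, clients must send the shared token and back off when the server answers 429 or 503. The stdlib sketch below assumes the token travels as a standard `Authorization: Bearer` header. The caller chooses the URL passed to `urllib_sender` and the retry budget. The OpenAI SDK sends its `api_key` as a bearer token, so passing the shared token as `api_key` should presumably work for /v1/audio/*.

```python
from __future__ import annotations

import time
import urllib.error
import urllib.request
from typing import Callable

# Must match AUTH__BEARER_TOKEN on the server (placeholder value from the snippet above).
TOKEN = "replace-with-a-long-random-secret"


def retry_after_seconds(value: str | None, default: float = 1.0) -> float:
    """Parse a Retry-After value given in seconds; fall back to `default` otherwise."""
    # Retry-After may also be an HTTP date; this sketch only handles the seconds form.
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default


def urllib_sender(url: str, data: bytes | None = None, content_type: str | None = None) -> Callable[[], tuple]:
    """Build a callable that sends one authorized request and returns (status, headers, body)."""
    headers = {"Authorization": f"Bearer {TOKEN}"}
    if content_type:
        headers["Content-Type"] = content_type

    def send() -> tuple:
        req = urllib.request.Request(url, data=data, headers=headers)
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.status, resp.headers, resp.read()
        except urllib.error.HTTPError as err:  # 401, 429, 503, ... arrive as exceptions
            return err.code, err.headers, err.read()

    return send


def send_with_retry(
    send: Callable[[], tuple],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, bytes]:
    """Retry on HTTP 429 (rate limited) and 503 (queue full), honoring Retry-After."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in (429, 503) or attempt == max_attempts - 1:
            return status, body  # success, a non-retryable error such as 401, or out of attempts
        # Case-insensitive lookup that works for dicts and http.client headers alike.
        retry_after = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
        sleep(retry_after_seconds(retry_after))
```

A 401 is returned immediately, because retrying with the same token cannot succeed.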

Notes:

Values are read from the environment at startup and are fixed for the life of the process; changing them requires a restart.

Rate-limit and concurrency state are kept in-process. Running multiple workers (uvicorn --workers >1) gives each worker its own budget, so these limits — like the GPU semaphore — assume a single worker process.

The upload cap is enforced from the Content-Length header. Uploads sent without one (chunked transfer encoding) are not rejected up front and are bounded only by available memory/disk; the common multipart upload path always sends Content-Length.

For async endpoints (/speech-to-text, /service/*), the concurrency gate is admission control for the request phase (validation, audio decode, enqueue). The GPU pipeline runs in a background task bounded by MAX_CONCURRENT_GPU_TASKS, not by this gate.
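Unlike the GPU semaphore, which makes work wait, an admission gate rejects new requests immediately once it is full. The hypothetical sketch below assumes a single asyncio worker, the same single-process assumption as the notes above. A stand-in coroutine replaces the real validation, decode, and enqueue steps, and the per-path split is omitted.

```python
import asyncio
from typing import Awaitable, Callable


class AdmissionGate:
    """Counts in-flight requests and rejects new ones when full (no waiting queue)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit  # 0 = unlimited, like MAX_QUEUED_GPU_REQUESTS=0
        self.in_flight = 0

    def try_enter(self) -> bool:
        # No await between the check and the increment, so this is safe on one event loop.
        if self.limit and self.in_flight >= self.limit:
            return False
        self.in_flight += 1
        return True

    def leave(self) -> None:
        self.in_flight -= 1


async def admit(gate: AdmissionGate, request_phase: Callable[[], Awaitable[None]]) -> bool:
    """Run the request phase inside the gate; False means the caller gets HTTP 503."""
    if not gate.try_enter():
        return False
    try:
        await request_phase()
    finally:
        gate.leave()  # released once the job is enqueued; GPU work is bounded elsewhere
    return True


async def demo() -> list:
    gate = AdmissionGate(limit=1)

    async def request_phase() -> None:
        await asyncio.sleep(0.01)  # stand-in for validation, audio decode and enqueue

    return list(await asyncio.gather(admit(gate, request_phase), admit(gate, request_phase)))


print(asyncio.run(demo()))  # [True, False]
```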

System Requirements

NVIDIA GPU with CUDA 12.8+ support
At least 8GB RAM (16GB+ recommended for large models)
Storage space for models (varies by model size):
- tiny/base: ~1GB
- small: ~2GB
- medium: ~5GB
- large: ~10GB

Getting Started

Local Run

To get started with the API, follow these steps:

Install uv package manager

Create virtual environment and install dependencies:

# For production dependencies only
uv sync --no-dev

# For development (includes testing, linting, async SQLite driver)
uv sync --all-extras

Configure your environment (see .env file setup below)

Note: This project uses uv for dependency management with platform-specific PyTorch configuration (CUDA 12.8 on Linux, CPU-only on macOS/Windows). All dependencies are defined in pyproject.toml.

Logging Configuration

The application uses two logging configuration files:

uvicorn_log_conf.yaml: Used by Uvicorn for logging configuration.
gunicorn_logging.conf: Used by Gunicorn for logging configuration (located in the root of the app directory).

Ensure these files are correctly configured and placed in the app directory.

Create .env file

Define your Whisper model and Hugging Face token:

HF_TOKEN=<<YOUR HUGGINGFACE TOKEN>>
WHISPER_MODEL=<<WHISPER MODEL SIZE>>
LOG_LEVEL=<<LOG LEVEL>>

Run the FastAPI application:

uvicorn app.main:app --reload --log-config uvicorn_log_conf.yaml --log-level $LOG_LEVEL

The API will be accessible at http://127.0.0.1:8000.

Docker Build

Create .env file

Define your Whisper model and Hugging Face token:

HF_TOKEN=<<YOUR HUGGINGFACE TOKEN>>
WHISPER_MODEL=<<WHISPER MODEL SIZE>>
LOG_LEVEL=<<LOG LEVEL>>

Build Image

Using docker-compose.yml:

# Build the image and start the container using the compose file
docker-compose up

Alternative approach:

# Build the image
docker build -t whisperx-service .

# Run the container
docker run -d --gpus all -p 8000:8000 --env-file .env whisperx-service

The API will be accessible at http://127.0.0.1:8000.

Note: The Docker build uses uv for installing dependencies, as specified in the Dockerfile. The main entrypoint for the Docker container is via Gunicorn (not Uvicorn directly), using the configuration in app/gunicorn_logging.conf.

Important: For GPU support in Docker, you must have CUDA drivers 12.8+ installed on your host system.

Model cache

The models used by whisperX are stored in /root/.cache. To avoid downloading them every time the container starts, keep this cache on persistent storage; docker-compose.yml defines a volume, whisperx-models-cache, for this purpose.

faster-whisper cache: /root/.cache/huggingface/hub
pyannote and other model caches: /root/.cache/torch

Troubleshooting

Common Issues

Environment Variables Not Loaded
- Ensure your .env file is correctly formatted and placed in the root directory.
- Verify that all required environment variables are defined.
Database Connection Issues
- Check the DB_URL environment variable for correctness.
- Ensure the database server is running and accessible.
- PostgreSQL driver: when using DB_URL=postgresql://... outside Docker, install the driver with uv sync --extra postgres.
- Async driver mismatch: if you set a DB_URL with a sync scheme (e.g. postgresql+psycopg2://), the app will fail to start. Use the plain scheme (postgresql://) and let the app rewrite it to postgresql+asyncpg:// automatically.
Model Download Failures
- Verify your internet connection.
- Ensure the HF_TOKEN is correctly set in the .env file.
GPU Not Detected
- Ensure NVIDIA drivers and CUDA are correctly installed.
- Verify that Docker is configured to use the GPU (nvidia-docker).
Warnings Not Filtered
- Ensure the FILTER_WARNING environment variable is set to true in the .env file.

Logs and Debugging

Check the logs for detailed error messages.
Use the LOG_LEVEL environment variable to set the appropriate logging level (DEBUG, INFO, WARNING, ERROR).

Monitoring and Health Checks

The API provides built-in health check endpoints that can be used for monitoring and orchestration:

Basic Health Check (/health)
- Returns a simple status check with HTTP 200 if the service is running
- Useful for basic availability monitoring
Liveness Probe (/health/live)
- Includes a timestamp with status information
- Designed for Kubernetes liveness probes or similar orchestration systems
- Returns HTTP 200 if the application is running
Readiness Probe (/health/ready)
- Tests if the application is fully ready to accept requests
- Checks connectivity to the database
- Returns HTTP 200 if all dependencies are available
- Returns HTTP 503 if there's an issue with dependencies (e.g., database connection)
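Scripts that must wait for the service, such as a CI job or a deployment step, can poll the readiness probe. In this stdlib sketch only HTTP 200 counts as ready, which matches the description above. The URL, timeout, and interval are arbitrary illustration values.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def ready_status(url: str = "http://127.0.0.1:8000/health/ready") -> int:
    """Return the readiness probe's HTTP status, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the database is unavailable
    except OSError:
        return 0  # connection refused, DNS failure or timeout: not up yet


def wait_until_ready(
    probe: Callable[[], int] = ready_status,
    timeout: float = 120.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the probe returns 200; return False once `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe() == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```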

Support

For further assistance, please open an issue on the GitHub repository.
