# OpenAI-Compatible Text-to-Speech API

Features • Quick Start • API Reference • Deployment • Configuration
## Features

- **OpenAI-Compatible API** - Drop-in replacement for OpenAI's TTS API
- **High Performance** - Optimized with NumPy vectorization and async processing
- **Multiple Formats** - Support for MP3, WAV, FLAC, Opus, AAC, and PCM
- **Multiple Voices** - OpenAI voice names mapped to native Supertonic styles
- **API Key Authentication** - Secure access with usage tracking
- **Docker Ready** - Production-ready containerization with nginx
- **GPU Acceleration** - Support for CUDA, CoreML, and Metal backends
- **Smart Text Processing** - Automatic text normalization and chunking
- **Multi-Version Support** - Supports both Supertonic v1 and v2 models
## Requirements

- Python 3.10+
- ONNX Runtime (CPU/CUDA/CoreML)
- Supertonic TTS library
## Quick Start

### Docker (Recommended)

```bash
# Clone the repository
git clone https://github.com/yourusername/supertonic-fastapi.git
cd supertonic-fastapi

# Start with Docker Compose
docker-compose up -d

# API will be available at http://localhost:8800
```

### Local Installation

```bash
# Clone and install
git clone https://github.com/yourusername/supertonic-fastapi.git
cd supertonic-fastapi

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the server
python -m app.main
```

### Basic Usage

```bash
# Generate speech
curl -X POST "http://localhost:8800/v1/audio/speech" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, this is a test of the Supertonic TTS API!",
    "voice": "alloy",
    "response_format": "mp3"
  }' \
  --output speech.mp3
```

## API Reference

### POST /v1/audio/speech
```bash
curl -X POST "http://localhost:8800/v1/audio/speech" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Your text here...",
    "voice": "alloy",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output output.mp3
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | `tts-1` | TTS model (`tts-1`, `tts-1-hd`, `tts-2`, `tts-2-hd`, `supertonic`, `supertonic-v2`) |
| `input` | string | required | Text to convert (max 4096 chars) |
| `voice` | string | `alloy` | Voice: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer` |
| `response_format` | string | `mp3` | Output format: `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm` |
| `speed` | float | `1.0` | Speed multiplier (0.25 to 4.0) |
| `normalize` | boolean | `true` | Pre-normalize text for better synthesis |
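For programmatic use, it can help to validate a request body against these limits client-side before sending it. A minimal sketch (the `build_speech_request` helper is hypothetical, not part of this API; the checks mirror the table above):

```python
# Hypothetical client-side helper mirroring the documented parameter limits.
VALID_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
VALID_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}

def build_speech_request(text, voice="alloy", response_format="mp3",
                         speed=1.0, model="tts-1", normalize=True):
    """Validate parameters and return a JSON-ready request body."""
    if not text or len(text) > 4096:
        raise ValueError("input must be 1-4096 characters")
    if voice not in VALID_VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if response_format not in VALID_FORMATS:
        raise ValueError(f"unsupported format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {"model": model, "input": text, "voice": voice,
            "response_format": response_format, "speed": speed,
            "normalize": normalize}
```

The resulting dict can be sent as the JSON body of a `POST /v1/audio/speech` request with any HTTP client.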
### GET /v1/models

```bash
curl "http://localhost:8800/v1/models"
```

### GET /voices

```bash
curl "http://localhost:8800/voices" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

### GET /health

```bash
curl "http://localhost:8800/health"
```

## Voices

| OpenAI Voice | Description |
|---|---|
| `alloy` | Neutral, balanced voice |
| `echo` | Warm, conversational voice |
| `fable` | Expressive, storytelling voice |
| `onyx` | Deep, authoritative voice |
| `nova` | Friendly, upbeat voice |
| `shimmer` | Soft, gentle voice |
## Configuration

Environment variables can be set in a `.env` file:

```bash
# Server
HOST=0.0.0.0
PORT=8800
LOG_LEVEL=INFO

# Model Performance
MODEL_THREADS=12
MODEL_INTER_THREADS=12
MAX_WORKERS=8

# GPU Acceleration
FORCE_PROVIDERS=auto  # auto, cuda, coreml, metal, cpu

# Audio Settings
SAMPLE_RATE=44100
gap_trim_ms=100

# Model Version (v1 or v2)
DEFAULT_MODEL_VERSION=v1
```

Set `FORCE_PROVIDERS` based on your hardware:

| Value | Description |
|---|---|
| `auto` | Auto-detect best available provider |
| `cuda` | NVIDIA GPU acceleration |
| `coreml` | Apple CoreML (M-series chips) |
| `metal` | Apple Metal (maps to CoreML) |
| `cpu` | CPU only |
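The table above can be read as a preference-ordered scan over whatever ONNX Runtime reports as available. A sketch of that selection logic (the helper name and exact ordering are illustrative, not the server's actual implementation):

```python
# Illustrative provider selection: prefer GPU backends, always fall back to CPU.
PREFERENCE = {
    "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "coreml": ["CoreMLExecutionProvider", "CPUExecutionProvider"],
    "metal": ["CoreMLExecutionProvider", "CPUExecutionProvider"],  # metal maps to CoreML
    "cpu": ["CPUExecutionProvider"],
}

def resolve_providers(force: str, available: list[str]) -> list[str]:
    """Pick ONNX Runtime execution providers, honouring FORCE_PROVIDERS."""
    if force == "auto":
        # Prefer GPU providers, keeping CPU as the final fallback.
        order = ["CUDAExecutionProvider", "CoreMLExecutionProvider",
                 "CPUExecutionProvider"]
        return [p for p in order if p in available] or ["CPUExecutionProvider"]
    wanted = PREFERENCE.get(force, ["CPUExecutionProvider"])
    return [p for p in wanted if p in available] or ["CPUExecutionProvider"]
```

In real code, `available` would come from `onnxruntime.get_available_providers()`, and the resulting list is passed to `onnxruntime.InferenceSession(..., providers=...)`.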
## Deployment

### Docker Compose

```yaml
# docker-compose.yml
version: "3.8"
services:
  tts-api:
    build: .
    ports:
      - "8800:8800"
    environment:
      - FORCE_PROVIDERS=auto
    volumes:
      - ./data:/app/data
```

### Kubernetes

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: supertonic-tts
spec:
  replicas: 2
  selector:
    matchLabels:
      app: supertonic-tts
  template:
    metadata:
      labels:
        app: supertonic-tts
    spec:
      containers:
        - name: tts-api
          image: supertonic-tts:latest
          ports:
            - containerPort: 8800
          resources:
            limits:
              nvidia.com/gpu: 1  # Optional GPU
```

## Performance

Optimized for high-throughput production workloads:
- NumPy Vectorization - Audio processing uses vectorized operations for 10x faster silence detection
- Pre-compiled Regex - Text normalization patterns compiled at startup
- Async Processing - Non-blocking I/O for concurrent requests
- Connection Pooling - Efficient database connections with Tortoise ORM
- Semaphore Limits - Configurable concurrency control
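As an illustration of the vectorization point, trimming silence from a waveform can use a single boolean mask computed in C rather than a per-sample Python loop. A minimal sketch (not the service's actual code; the threshold value is an assumption):

```python
import numpy as np

def trim_silence(audio: np.ndarray, threshold: float = 1e-3) -> np.ndarray:
    """Trim leading/trailing silence with vectorized NumPy ops (no Python loop)."""
    loud = np.abs(audio) > threshold   # boolean mask over all samples at once
    if not loud.any():
        return audio[:0]               # entirely silent input
    idx = np.nonzero(loud)[0]          # indices of non-silent samples
    return audio[idx[0]: idx[-1] + 1]  # keep first..last loud sample
```

The same mask-based approach generalizes to detecting silent gaps between chunks, which is where the speedup over a sample-by-sample loop comes from.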
## Development

```bash
# Install dev dependencies
pip install -r requirements.txt

# Run in development mode
uvicorn app.main:app --reload --port 8800

# Run tests
python -m pytest tests/
```

## Project Structure

```
supertonic-fastapi/
├── app/
│   ├── api/
│   │   ├── routes.py                  # API endpoints
│   │   ├── schemas.py                 # Pydantic models
│   │   ├── deps.py                    # Dependencies
│   │   └── auth/                      # Authentication
│   ├── core/
│   │   ├── config.py                  # Configuration
│   │   ├── database.py                # Database setup
│   │   └── voices.py                  # Voice mappings
│   ├── services/
│   │   ├── tts.py                     # TTS service
│   │   ├── audio.py                   # Audio processing
│   │   └── streaming_audio_writer.py  # Format encoding
│   ├── utils/
│   │   └── text.py                    # Text processing
│   ├── inference/
│   │   └── base.py                    # Data models
│   └── main.py                        # FastAPI app
├── tests/
├── Dockerfile
├── docker-compose.yml
└── requirements.txt
```
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgements

- Supertonic - TTS engine
- FastAPI - Web framework
- PyAV - Audio encoding

Made with ❤️ by the community