Transcribo Backend is a Python FastAPI service that provides audio and video transcription with speaker diarization and AI-powered text summarization. Transcription is performed via OpenAI's Whisper API, and summaries are generated with large language models.
- Audio & Video Transcription: High-quality transcription of audio and video files using OpenAI's Whisper API
- Speaker Diarization: Identify and separate different speakers in recordings
- Language Detection: Automatic language detection or specify the source language
- AI Summarization: Generate intelligent summaries of transcribed text using LLMs
- Asynchronous Processing: Task-based processing with status tracking for long-running transcriptions
- Multi-format Support: Handle various audio formats (MP3, WAV, etc.) and video files
- Audio Conversion: Automatic conversion to MP3 format for optimal processing
- Privacy-Focused: Pseudonymized user tracking for usage analytics
- Framework: FastAPI with Python 3.12+
- Package Manager: uv
- Transcription: OpenAI Whisper API integration
- AI Models: LLM integration for text summarization
- Audio Processing: Audio format conversion with audioop
- Logging: Structured logging with structlog
- Containerization: Docker and Docker Compose
- Python 3.12+
- uv package manager
- Docker and Docker Compose (for containerized deployment)
- Access to OpenAI Whisper API or compatible service
- LLM API access for summarization features
Create a .env file in the project root with the required environment variables:
```env
# Whisper API Configuration
WHISPER_API=http://localhost:8001
WHISPER_API_KEY=your_whisper_api_key_here

# LLM API Configuration
LLM_API=http://localhost:8002
LLM_API_KEY=your_llm_api_key_here

# Security
HMAC_SECRET=your_secret_key_here

# Client Configuration (optional)
CLIENT_PORT=3000
CLIENT_URL=http://localhost:${CLIENT_PORT}
```

Note: Configure the Whisper API and LLM API endpoints to match your deployment setup.
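As a rough illustration of how these variables reach the application, a settings loader can validate them once at startup. This is a minimal stdlib sketch under assumed names; the project's actual `config.py` may use a different mechanism:

```python
# Minimal sketch (hypothetical) of loading the required settings at startup;
# the real config.py may use a different approach, e.g. a settings library.
import os

REQUIRED_VARS = ("WHISPER_API", "WHISPER_API_KEY", "LLM_API", "LLM_API_KEY", "HMAC_SECRET")

def load_settings() -> dict[str, str]:
    """Collect required environment variables, failing fast if any are missing."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```

Failing fast like this surfaces misconfiguration at boot rather than on the first API call.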
Install dependencies using uv:
```sh
make install
```

This will:
- Create a virtual environment using uv
- Install all dependencies
- Install pre-commit hooks
Start the development server:

```sh
uv run fastapi dev ./src/transcribo_backend/app.py
```

Or use the provided task:

```sh
make dev
```

Run code quality checks:
```sh
# Run all quality checks
make check

# Format code with ruff
uv run ruff format .

# Run linting
uv run ruff check .

# Run type checking
uv run pyrefly check
```

Run the production server:

```sh
make run
```

The application includes a Dockerfile and Docker Compose configuration for easy deployment:
```sh
# Start all services with Docker Compose
docker compose up -d

# Build and start all services
docker compose up --build -d

# View logs
docker compose logs -f
```

To build and run the image directly:

```sh
# Build the Docker image
docker build -t transcribo-backend .

# Run the container
docker run --rm --env-file .env -p 8000:8000 transcribo-backend
```

Run tests with pytest:
```sh
# Run tests
make test

# Run tests with pytest directly
uv run pytest
```
- `POST /transcribe`: Submit an audio or video file for transcription
  - Parameters:
    - `audio_file`: The audio/video file to transcribe
    - `num_speakers` (optional): Number of speakers for diarization
    - `language` (optional): Source language code
  - Returns: Task status with a task ID for tracking
- `GET /task/{task_id}/status`: Get the status of a transcription task
  - Returns: Current task status (pending, processing, completed, failed)
- `GET /task/{task_id}/result`: Get the transcription result
  - Returns: Transcription response with text and metadata
- `POST /summarize`: Generate an AI summary of transcribed text
  - Body: `SummaryRequest` with transcript text
  - Returns: Generated summary
- `GET /health/liveness`: Liveness probe for Kubernetes deployments
  - Returns: Application status and uptime
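The task-based flow (submit, poll the status endpoint, fetch the result) can be sketched as a small polling client. This is a hypothetical illustration: the base URL and the `"status"` field name in the JSON responses are assumptions, not confirmed by this README.

```python
# Hypothetical polling client for the task endpoints above (stdlib only).
# BASE_URL and the "status" JSON field name are assumptions for illustration.
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"

def _get_json(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())

def wait_for_result(task_id: str, poll_seconds: float = 2.0, fetch=_get_json) -> dict:
    """Poll /task/{task_id}/status until the task finishes, then return the result."""
    while True:
        status = fetch(f"{BASE_URL}/task/{task_id}/status")["status"]
        if status == "failed":
            raise RuntimeError(f"transcription task {task_id} failed")
        if status == "completed":
            return fetch(f"{BASE_URL}/task/{task_id}/result")
        time.sleep(poll_seconds)
```

The injectable `fetch` parameter lets the flow be exercised (or unit-tested) without a running server.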
```
src/transcribo_backend/
├── app.py                          # FastAPI application entry point
├── config.py                       # Configuration management
├── helpers/                        # Helper utilities
│   └── file_type.py                # File type detection
├── models/                         # Data models and schemas
│   ├── progress.py                 # Progress tracking models
│   ├── response_format.py          # Response format definitions
│   ├── summary.py                  # Summary models
│   ├── task_status.py              # Task status models
│   └── transcription_response.py   # Transcription response models
├── services/                       # Business logic services
│   ├── audio_converter.py          # Audio format conversion
│   ├── summary_service.py          # Text summarization service
│   └── whisper_service.py          # Whisper API integration
└── utils/                          # Utility functions
    ├── logger.py                   # Logging configuration
    └── usage_tracking.py           # Privacy-focused usage analytics
```
This application is based on Transcribo from the Statistical Office of the Canton of Zurich. We have rewritten the functionality of the original application to fit into a modular and modern web application that separates frontend, backend and AI models.
MIT © Data Competence Center Basel-Stadt
Developed with ❤️ by DCC - Data Competence Center
