
Transcribo Backend

Transcribo Backend is a Python FastAPI service that provides audio and video transcription with speaker diarization and AI-powered text summarization. Transcription runs against OpenAI's Whisper API (or a compatible service), and summaries are generated with large language models.



DCC Documentation & Guidelines | DCC Website


Features

  • Audio & Video Transcription: High-quality transcription of audio and video files using OpenAI's Whisper API
  • Speaker Diarization: Identify and separate different speakers in recordings
  • Language Detection: Automatic language detection or specify the source language
  • AI Summarization: Generate intelligent summaries of transcribed text using LLMs
  • Asynchronous Processing: Task-based processing with status tracking for long-running transcriptions
  • Multi-format Support: Handle various audio formats (MP3, WAV, etc.) and video files
  • Audio Conversion: Automatic conversion to MP3 format for optimal processing
  • Privacy-Focused: Pseudonymized user tracking for usage analytics

Technology Stack

  • Framework: FastAPI with Python 3.12+
  • Package Manager: uv
  • Transcription: OpenAI Whisper API integration
  • AI Models: LLM integration for text summarization
  • Audio Processing: Audio format conversion using Python's audioop module
  • Logging: Structured logging with structlog
  • Containerization: Docker and Docker Compose

Setup

Prerequisites

  • Python 3.12+
  • uv package manager
  • Docker and Docker Compose (for containerized deployment)
  • Access to OpenAI Whisper API or compatible service
  • LLM API access for summarization features

Environment Configuration

Create a .env file in the project root with the required environment variables:

# Whisper API Configuration
WHISPER_API=http://localhost:8001
WHISPER_API_KEY=your_whisper_api_key_here

# LLM API Configuration
LLM_API=http://localhost:8002
LLM_API_KEY=your_llm_api_key_here

# Security
HMAC_SECRET=your_secret_key_here

# Client Configuration (optional)
CLIENT_PORT=3000
CLIENT_URL=http://localhost:${CLIENT_PORT}

Note: Configure the Whisper API and LLM API endpoints to match your deployment setup.
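At startup, the service needs all of these variables to be present. The actual `config.py` may load settings differently (for example via pydantic-settings), so the following is only a minimal sketch of fail-fast validation for the names shown in the `.env` example:

```python
# Minimal sketch of startup validation for the required variables.
# The project's config.py may use a different mechanism; the names
# below simply mirror the .env example above.
import os

REQUIRED = ("WHISPER_API", "WHISPER_API_KEY", "LLM_API", "LLM_API_KEY", "HMAC_SECRET")

def load_config(env=os.environ) -> dict[str, str]:
    """Collect the required settings, failing fast if any are missing."""
    missing = [name for name in REQUIRED if name not in env]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED}
```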

Install Dependencies

Install dependencies using uv:

make install

This will:

  • Create a virtual environment using uv
  • Install all dependencies
  • Install pre-commit hooks

Development

Start the Development Server

uv run fastapi dev ./src/transcribo_backend/app.py

Or use the provided Makefile target:

make dev

Code Quality Tools

Run code quality checks:

# Run all quality checks
make check

# Format code with ruff
uv run ruff format .

# Run linting
uv run ruff check .

# Run type checking
uv run pyrefly check

Production

Run the production server:

make run

Docker Deployment

The application includes a Dockerfile and Docker Compose configuration for easy deployment:

Using Docker Compose

# Start all services with Docker Compose
docker compose up -d

# Build and start all services
docker compose up --build -d

# View logs
docker compose logs -f

Using Dockerfile Only

# Build the Docker image
docker build -t transcribo-backend .

# Run the container
docker run --rm --env-file .env -p 8000:8000 transcribo-backend

Testing & Development Tools

Run tests with pytest:

# Run tests
make test

# Run tests with pytest directly
uv run pytest

API Endpoints

Transcription

  • POST /transcribe: Submit an audio or video file for transcription

    • Parameters:
      • audio_file: The audio/video file to transcribe
      • num_speakers (optional): Number of speakers for diarization
      • language (optional): Source language code
    • Returns: Task status with task ID for tracking
  • GET /task/{task_id}/status: Get the status of a transcription task

    • Returns: Current task status (pending, processing, completed, failed)
  • GET /task/{task_id}/result: Get the transcription result

    • Returns: Transcription response with text and metadata

Summarization

  • POST /summarize: Generate an AI summary of transcribed text
    • Body: SummaryRequest with transcript text
    • Returns: Generated summary
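A call to this endpoint might look as follows; the field names in `SummaryRequest` and the response (`transcript`, `summary`) are assumptions and may differ from the actual schema:

```python
# Hypothetical client for /summarize. The request/response field names
# ("transcript", "summary") are guesses, not confirmed by the README.
import requests

def summarize(transcript: str, base_url: str = "http://localhost:8000") -> str:
    """Send transcribed text to /summarize and return the generated summary."""
    resp = requests.post(f"{base_url}/summarize", json={"transcript": transcript})
    resp.raise_for_status()
    return resp.json()["summary"]
```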

Health Checks

  • GET /health/liveness: Liveness probe for Kubernetes deployments
    • Returns: Application status and uptime

Project Architecture

src/transcribo_backend/
├── app.py                      # FastAPI application entry point
├── config.py                   # Configuration management
├── helpers/                    # Helper utilities
│   └── file_type.py           # File type detection
├── models/                     # Data models and schemas
│   ├── progress.py            # Progress tracking models
│   ├── response_format.py     # Response format definitions
│   ├── summary.py             # Summary models
│   ├── task_status.py         # Task status models
│   └── transcription_response.py  # Transcription response models
├── services/                   # Business logic services
│   ├── audio_converter.py     # Audio format conversion
│   ├── summary_service.py     # Text summarization service
│   └── whisper_service.py     # Whisper API integration
└── utils/                      # Utility functions
    ├── logger.py              # Logging configuration
    └── usage_tracking.py      # Privacy-focused usage analytics
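The privacy-focused usage tracking mentioned in the features list presumably derives stable but non-reversible user identifiers. A plausible sketch uses a keyed hash with the HMAC_SECRET from the environment; the actual `usage_tracking.py` may use a different scheme:

```python
# Plausible sketch of pseudonymized usage tracking: an HMAC gives each
# user a stable, non-reversible analytics token. This is an assumption
# about usage_tracking.py, not its confirmed implementation; the secret
# would come from the HMAC_SECRET environment variable.
import hashlib
import hmac

def pseudonymize(user_id: str, secret: str) -> str:
    """Map a user identifier to a stable, non-reversible analytics token."""
    return hmac.new(secret.encode(), user_id.encode(), hashlib.sha256).hexdigest()
```

Because the same user and secret always yield the same token, usage can be aggregated per user without storing the identity itself.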

Acknowledgments

This application is based on Transcribo from the Statistical Office of the Canton of Zurich. We have rebuilt the original application as a modular, modern web application that separates the frontend, backend, and AI models.

License

MIT © Data Competence Center Basel-Stadt


Developed with ❤️ by DCC - Data Competence Center
