facuzarate04/media-ms

Audio Intelligence Service

Microservice for audio ingestion, metadata extraction, and AI enrichment. Receives audio assets, validates them, extracts technical metadata, and orchestrates AI capabilities like transcription. Exposes job status and results through a stable API.

Highlights

  • Upload by remote URL or presigned S3 flow
  • Async job processing with BullMQ and Redis
  • Metadata extraction separated from AI enrichment
  • Partial success model for resilient processing
  • Clean architecture layering across domain, application, infrastructure, and HTTP
  • Test suite with integration and service-level coverage

Architecture

Client → HTTP API (Express) → AudioService (fast path: validate + enqueue)
                                         ↓
                                   BullMQ Queue (Redis)
                                         ↓
                                   AudioJobWorker → S3 + metadata + transcription
                                         ↓
                                   MongoDB (job state + results)
                                         ↓
Client ← HTTP API (polling) ← AudioService (read path)

Layers:

Layer            Directory             Responsibility
Domain           src/domain/           Pure interfaces, types, port definitions. Zero external dependencies.
Application      src/application/      Use-case orchestration. Depends only on domain ports.
Infrastructure   src/infrastructure/   MongoDB repos, S3 storage, BullMQ queue/worker, OpenAI adapter.
HTTP             src/http/             Express controllers, validators, routes.

API

All audio endpoints live under /v1/audio.

Job creation

POST /v1/audio/jobs
{ "sourceUrl": "https://...", "capabilities": ["transcription"] }
→ 202 { "data": { "_id": "...", "status": "pending", ... } }

POST /v1/audio/jobs/upload-url
{ "filename": "episode.mp3", "contentType": "audio/mpeg", "capabilities": ["transcription"] }
→ 201 { "data": { "uploadUrl": "...", "storageKey": "...", "jobId": "..." } }

POST /v1/audio/jobs/:id/confirm
→ 202 { "data": { "_id": "...", "status": "pending", ... } }
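The presigned flow above is three round trips: request an upload URL, PUT the bytes to S3, then confirm. A hedged client-side sketch (the fetch function is injected so the flow can be exercised without a live service; baseUrl, apiKey, and the helper name are placeholders):

```typescript
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: unknown },
) => Promise<{ ok: boolean; status: number; json(): Promise<any> }>;

async function uploadViaPresignedUrl(
  baseUrl: string,
  apiKey: string,
  file: { name: string; contentType: string; bytes: Uint8Array },
  fetchFn: FetchLike = fetch as unknown as FetchLike,
): Promise<string> {
  const headers = {
    "content-type": "application/json",
    authorization: `Bearer ${apiKey}`,
  };

  // 1. Ask the service for a presigned S3 URL plus a job id.
  const res = await fetchFn(`${baseUrl}/v1/audio/jobs/upload-url`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      filename: file.name,
      contentType: file.contentType,
      capabilities: ["transcription"],
    }),
  });
  const { data } = await res.json();

  // 2. PUT the bytes straight to S3 (no auth header; the URL is presigned).
  await fetchFn(data.uploadUrl, {
    method: "PUT",
    headers: { "content-type": file.contentType },
    body: file.bytes,
  });

  // 3. Confirm so the service enqueues processing.
  await fetchFn(`${baseUrl}/v1/audio/jobs/${data.jobId}/confirm`, {
    method: "POST",
    headers,
  });
  return data.jobId;
}
```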

POST /v1/audio/jobs/batch
{ "items": [{ "sourceUrl": "...", "capabilities": ["transcription"] }, ...] }
→ 202 { "data": { "batchId": "...", "jobs": [...] } }

Query

GET /v1/audio/jobs/:id           → 200 job object
GET /v1/audio/jobs/:id/result    → 200 metadata + enrichment status
GET /v1/audio/transcripts/:jobId → 200 transcript
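Since results are read by polling, a small helper loop is the typical client pattern. A sketch with the job lookup injected (in a real client it would wrap GET /v1/audio/jobs/:id; interval and attempt counts are illustrative defaults):

```typescript
type JobStatus = "pending" | "processing" | "ready" | "failed";

async function pollUntilDone(
  getJob: (id: string) => Promise<{ status: JobStatus }>,
  id: string,
  intervalMs = 2000,
  maxAttempts = 30,
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { status } = await getJob(id);
    // "ready" and "failed" are the terminal job states.
    if (status === "ready" || status === "failed") return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`job ${id} not terminal after ${maxAttempts} attempts`);
}
```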

System

GET /health → 200 (liveness)
GET /ready  → 200/503 (readiness, checks MongoDB)
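The readiness contract above reduces to: 200 if MongoDB answers, 503 if not. A sketch with the database ping injected (the real route would wire in a Mongoose ping; the function name is illustrative):

```typescript
// Returns the HTTP status code the /ready endpoint should respond with.
async function readiness(pingDb: () => Promise<void>): Promise<number> {
  try {
    await pingDb();
    return 200; // MongoDB reachable: ready for traffic
  } catch {
    return 503; // dependency down: signal not-ready to the orchestrator
  }
}
```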

Auth

When API_KEY is configured, requests to /v1/* must include:

Authorization: Bearer <API_KEY>

Job lifecycle

Job status:    pending → processing → ready | failed
Enrichment:    pending → processing → completed | failed
  • ready means metadata was extracted successfully; capabilities may still be pending or failed.
  • failed occurs only when the file cannot be downloaded or metadata extraction fails.
  • Capability failures (e.g. transcription) do NOT fail the job — partial success model.
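The partial-success rule above can be expressed as a small pure function over the two status sets (types mirror the statuses listed; the helper name is illustrative, not part of the service's API):

```typescript
type JobStatus = "pending" | "processing" | "ready" | "failed";
type EnrichmentStatus = "pending" | "processing" | "completed" | "failed";

function summarizeResult(
  job: JobStatus,
  enrichments: Record<string, EnrichmentStatus>,
) {
  const withStatus = (s: EnrichmentStatus) =>
    Object.keys(enrichments).filter((k) => enrichments[k] === s);
  return {
    job,
    // A "ready" job with a failed capability is the partial-success case:
    // metadata succeeded, so the job itself is never failed by enrichment.
    partialSuccess: job === "ready" && withStatus("failed").length > 0,
    completed: withStatus("completed"),
    failed: withStatus("failed"),
  };
}
```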

Setup

Prerequisites

  • Node.js 20+
  • MongoDB 7+
  • Redis 7+

Environment variables

# Required
MONGO_URL=mongodb://127.0.0.1:27017/media
S3_REGION=us-east-1
S3_BUCKET=your-bucket
S3_ACCESS_KEY_ID=your-key
S3_SECRET_ACCESS_KEY=your-secret

# Optional
APP_PORT=3000
API_KEY=your-api-key
REDIS_HOST=127.0.0.1
REDIS_PORT=6379
QUEUE_NAME=audio-processing
QUEUE_CONCURRENCY=3
QUEUE_MAX_RETRIES=3
QUEUE_RETRY_DELAY=5000                # milliseconds
PRESIGNED_URL_EXPIRATION=3600         # seconds
MAX_TRANSCRIPTION_FILE_SIZE=26214400  # bytes (25 MB)
OPENAI_API_KEY=sk-...
OPENAI_WHISPER_MODEL=whisper-1
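A hand-rolled sketch of how these variables might be loaded with defaults applied (the actual service validates env with Zod; the Config shape and loader name here are illustrative, covering only a few of the variables):

```typescript
interface Config {
  mongoUrl: string;
  appPort: number;
  redisHost: string;
  redisPort: number;
  queueConcurrency: number;
}

function loadConfig(env: Record<string, string | undefined>): Config {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`missing required env var ${name}`);
    return value;
  };
  return {
    mongoUrl: required("MONGO_URL"), // required: fail fast at startup
    appPort: Number(env.APP_PORT ?? 3000),
    redisHost: env.REDIS_HOST ?? "127.0.0.1",
    redisPort: Number(env.REDIS_PORT ?? 6379),
    queueConcurrency: Number(env.QUEUE_CONCURRENCY ?? 3),
  };
}
```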

Example request

curl -X POST http://localhost:3000/v1/audio/jobs \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer your-api-key' \
  -d '{
    "sourceUrl": "https://example.com/audio/episode.mp3",
    "capabilities": ["transcription"]
  }'

Development

# Start MongoDB + Redis
docker compose -f docker-compose.dev.yaml up -d

# Install dependencies
npm install

# Run in dev mode
npm run dev

Production

npm run build
npm start

Docker

docker compose up

Tests

npm test

Project structure

src/
  domain/           # Core entities and ports
  application/      # Use-case orchestration
  infrastructure/   # MongoDB, S3, BullMQ, OpenAI adapters
  http/             # Express routes, controllers, validators
  providers/        # Capability registry
  shared/           # Middleware, errors, health checks

Provider strategy

AI providers are behind capability-oriented interfaces. The application layer depends on abstract ports, not vendor SDKs.

Currently implemented:

  • Transcription: OpenAI Whisper API (whisper-1)

Planned:

  • Summarization
  • Topic/keyword extraction
  • Language detection
  • Embeddings

To add a new provider, implement the corresponding interface in src/infrastructure/ and register it in src/server.ts.
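An illustrative sketch of such a capability registry, assuming a port like TranscriptionProvider; actual names in src/providers/ and src/server.ts may differ:

```typescript
interface TranscriptionProvider {
  transcribe(audioKey: string): Promise<{ text: string }>;
}

class CapabilityRegistry {
  private providers = new Map<string, TranscriptionProvider>();

  register(capability: string, provider: TranscriptionProvider): void {
    this.providers.set(capability, provider);
  }

  resolve(capability: string): TranscriptionProvider {
    const provider = this.providers.get(capability);
    // Unknown capabilities fail loudly instead of silently skipping.
    if (!provider) throw new Error(`no provider registered for "${capability}"`);
    return provider;
  }
}

// In server wiring one would register the concrete adapter, e.g.:
//   registry.register("transcription", new OpenAIWhisperProvider(...));
```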

Tech stack

  • Runtime: Node.js 20, TypeScript
  • Framework: Express
  • Database: MongoDB (Mongoose)
  • Queue: BullMQ + Redis
  • Storage: AWS S3 (presigned URLs)
  • AI: OpenAI Whisper API
  • Validation: Zod
  • Tests: Vitest + Supertest
