Library Metadata Lookup

A FastAPI service for WXYC radio that searches the library catalog and cross-references results with Discogs metadata.

Dependents

These projects depend on library-metadata-lookup:

| Project | Relationship |
|---|---|
| request-o-matic | Primary consumer. Calls POST /api/v1/lookup to resolve song requests before posting to Slack. |
| Backend-Service | Calls this service for library search and Discogs metadata. |
| dj-site | Indirect via Backend-Service. DJ flowsheet and card catalog frontend. |
| wxyc-ios-64 | Indirect via Backend-Service. iOS/macOS/tvOS/watchOS app. |
| WXYC-Android | Indirect via Backend-Service. Android app. |
| discogs-cache | Shared data. Its ETL pipeline produces library.db (consumed at runtime by this service) and populates the PostgreSQL Discogs cache (queried by discogs/cache_service.py). |

Dependencies

| Package | Purpose |
|---|---|
| wxyc-etl | Shared Rust library (PyO3 bindings) for artist name normalization, diacritics stripping, compilation artist detection, and library.db schema validation. |

What it does

Given an artist, song, and/or album, the service:

  1. Corrects artist spelling via fuzzy matching against the library catalog
  2. Resolves album names from Discogs when only a song title is provided
  3. Searches the library catalog with a multi-strategy fallback chain
  4. Validates fallback results against Discogs tracklists
  5. Fetches album artwork from Discogs
  6. Returns enriched results with metadata
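
To make step 1 concrete, here is a minimal sketch of fuzzy artist correction using Python's standard-library difflib. It is an illustration only: the function name `correct_artist`, the 0.8 cutoff, and the sample catalog are assumptions, and the service's own matching (which relies on wxyc-etl normalization and the library search in library/db.py) may work differently.

```python
from __future__ import annotations

from difflib import get_close_matches


def correct_artist(name: str, catalog_artists: list[str], cutoff: float = 0.8) -> str | None:
    """Return the closest catalog spelling of `name`, or None if nothing is close enough.

    Hypothetical helper: the cutoff and the use of difflib are assumptions,
    not the service's actual algorithm.
    """
    # Compare case-insensitively, but return the catalog's own spelling.
    by_lower = {artist.lower(): artist for artist in catalog_artists}
    matches = get_close_matches(name.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


# Illustrative catalog, not real library data.
catalog = ["Stereolab", "Stereophonics", "Yo La Tengo"]
print(correct_artist("Sterolab", catalog))
```

A misspelled "Sterolab" resolves to the catalog entry "Stereolab", while a name with no close match yields None, so the caller can fall through to the other steps.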

API

POST /api/v1/lookup

Primary endpoint. Accepts a parsed request and returns library results with artwork.

Request:

{
  "artist": "Stereolab",
  "song": "Percolator",
  "raw_message": "Play Percolator by Stereolab"
}

Response:

{
  "results": [
    {
      "library_item": {
        "id": 10,
        "artist": "Stereolab",
        "title": "Emperor Tomato Ketchup",
        "call_letters": "S",
        "artist_call_number": 1,
        "release_call_number": 1,
        "genre": "Rock",
        "format": "CD"
      },
      "artwork": {
        "album": "Emperor Tomato Ketchup",
        "artist": "Stereolab",
        "release_id": 123456,
        "release_url": "https://www.discogs.com/release/123456",
        "artwork_url": "https://img.discogs.com/...",
        "confidence": 0.95
      }
    }
  ],
  "search_type": "direct",
  "song_not_found": false,
  "found_on_compilation": false,
  "context_message": null,
  "corrected_artist": null,
  "cache_stats": null
}
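
A client call to this endpoint might look like the sketch below, which uses only the standard library. The helper names `build_lookup_request` and `summarize` are hypothetical, the base URL is an assumption (uvicorn's default local address), and the payload carries only the fields shown in the example request above. The Authorization header is added only when a key is passed, matching the optional inbound auth described later in this README.

```python
from __future__ import annotations

import json
import urllib.request


def build_lookup_request(base_url: str, artist: str | None, song: str | None,
                         raw_message: str, api_key: str | None = None) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/lookup request. Hypothetical helper."""
    payload = {"raw_message": raw_message}
    # Only include parsed fields that are actually known.
    if artist is not None:
        payload["artist"] = artist
    if song is not None:
        payload["song"] = song
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/lookup",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def summarize(response: dict) -> list[str]:
    """One line per result, built from the library_item fields shown above."""
    lines = []
    for result in response["results"]:
        item = result["library_item"]
        lines.append(f"{item['artist']} - {item['title']} ({item['format']})")
    return lines


# Assumed local address; sending is left to the caller:
# req = build_lookup_request("http://localhost:8000", "Stereolab", "Percolator",
#                            "Play Percolator by Stereolab")
# with urllib.request.urlopen(req) as resp:
#     print(summarize(json.load(resp)))
```

Keeping request construction separate from sending makes the payload easy to inspect in tests without a running server.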

GET /api/v1/library/search

Direct library catalog search.

POST /api/v1/discogs/search

Search Discogs releases.

GET /api/v1/discogs/track-releases

Find all releases containing a specific track.

GET /api/v1/discogs/release/{release_id}

Get full release metadata from Discogs.

POST /api/v1/releases/resolve

Resolve a Discogs release URL, a Bandcamp album URL, or an explicit (source, id) pair to canonical release metadata, cross-source identifiers, and streaming availability. The result is the prefill payload for tubafrenzy's rotation-release create form.

Request: {"url": "https://www.discogs.com/release/12345"} or {"url": "https://artist.bandcamp.com/album/slug"} or {"source": "discogs_release", "id": "12345"}.

Response includes canonical (artist, title, label, catno, year), identifiers (cross-source IDs learned from Discogs + the streaming check), streaming (per-service availability), and warnings[] for non-fatal issues. The endpoint always returns 200; on a partial prefill, the form falls back to manual entry.
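
Because the endpoint always returns 200, a caller has to inspect the body rather than the status code. The sketch below shows one way to build the two request shapes and to pull out the prefill fields together with any warnings. The helper names are hypothetical, and the handling of missing keys is an assumption about how a partial prefill might look.

```python
from __future__ import annotations


def resolve_payload(url: str | None = None, source: str | None = None,
                    release_id: str | None = None) -> dict:
    """Build a request body: either {"url": ...} or {"source": ..., "id": ...}."""
    if url is not None and source is None and release_id is None:
        return {"url": url}
    if url is None and source is not None and release_id is not None:
        return {"source": source, "id": str(release_id)}
    raise ValueError("pass either url, or both source and release_id")


def prefill_fields(response: dict) -> tuple[dict, list[str]]:
    """Extract the canonical fields and warnings, tolerating a partial response."""
    canonical = response.get("canonical") or {}
    # Missing fields become None so a form can fall back to manual entry.
    fields = {key: canonical.get(key) for key in ("artist", "title", "label", "catno", "year")}
    warnings = list(response.get("warnings") or [])
    return fields, warnings


print(resolve_payload(url="https://www.discogs.com/release/12345"))
print(resolve_payload(source="discogs_release", release_id="12345"))
```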

POST /admin/upload-library-db

Upload a new library.db file. Requires Authorization: Bearer <ADMIN_TOKEN> header. The file is validated (must be a SQLite database with a library table), then atomically replaces the current database. Returns {"status": "ok", "row_count": <int>, "timestamp": "<ISO8601>"}.
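
The validate-then-replace behavior described above can be sketched with the standard library. This is not the code in routers/admin.py: `install_library_db` is a hypothetical helper, and it assumes the upload has already been written to a temporary file on the same filesystem as the target, which is what makes os.replace an atomic swap.

```python
import os
import sqlite3
from datetime import datetime, timezone


def install_library_db(upload_path: str, target_path: str) -> dict:
    """Validate an uploaded SQLite file, then atomically swap it into place."""
    try:
        conn = sqlite3.connect(upload_path)
        try:
            # The file must contain a table named "library".
            has_table = conn.execute(
                "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'library'"
            ).fetchone()
            if not has_table:
                raise ValueError("uploaded database has no library table")
            row_count = conn.execute("SELECT COUNT(*) FROM library").fetchone()[0]
        finally:
            conn.close()
    except sqlite3.DatabaseError as exc:
        # Raised by the first query when the file is not a SQLite database.
        raise ValueError("uploaded file is not a SQLite database") from exc

    # Readers see either the old file or the new one, never a partial write.
    os.replace(upload_path, target_path)
    return {
        "status": "ok",
        "row_count": row_count,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Validating before the swap means a bad upload never displaces a working database.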

GET /health

Health check with real connectivity probes for the database, Discogs API, and Discogs cache.

Search Strategy Pipeline

The lookup orchestrator tries strategies in order until results are found:

| Strategy | Condition | What it does |
|---|---|---|
| ARTIST_PLUS_ALBUM | Has artist, album, or song | Search by artist + album(s) from Discogs, fall back to artist + song, then artist only |
| SWAPPED_INTERPRETATION | No results + ambiguous "X - Y" format | Try both "X as artist, Y as title" and vice versa |
| TRACK_ON_COMPILATION | Song not found + has artist and song | Cross-reference Discogs track listings with library to find compilations |
| SONG_AS_ARTIST | No results + song parsed but no artist | Try the parsed song title as an artist name |
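
The "try in order until something matches" control flow can be sketched as follows. This is a simplified illustration, not the code in lookup/orchestrator.py or core/search.py: the `Query` and `Strategy` types, the `run_pipeline` function, and the toy catalog are all assumptions, and the "no results" part of each condition is expressed implicitly by the ordering.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Query:
    artist: str | None = None
    song: str | None = None
    album: str | None = None


@dataclass
class Strategy:
    name: str
    applies: Callable[[Query], bool]       # the "Condition" column
    search: Callable[[Query], list[str]]   # the "What it does" column


def run_pipeline(query: Query, strategies: list[Strategy]) -> tuple[str | None, list[str]]:
    """Return (strategy name, results) for the first strategy that finds anything."""
    for strategy in strategies:
        if not strategy.applies(query):
            continue
        results = strategy.search(query)
        if results:
            return strategy.name, results
    return None, []


# Toy catalog keyed by artist; stands in for the real library search.
CATALOG = {"stereolab": ["Emperor Tomato Ketchup"]}

strategies = [
    Strategy("ARTIST_PLUS_ALBUM",
             lambda q: bool(q.artist),
             lambda q: CATALOG.get((q.artist or "").lower(), [])),
    Strategy("SONG_AS_ARTIST",
             lambda q: bool(q.song) and not q.artist,
             lambda q: CATALOG.get((q.song or "").lower(), [])),
]

print(run_pipeline(Query(song="Stereolab"), strategies))
```

In the usage line, the query has no artist, so the first strategy is skipped and the parsed song title is retried as an artist name.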

After the pipeline, if results came from an artist-only fallback (song_not_found=True), each album is validated against Discogs tracklists to filter to only albums containing the requested track.
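
A minimal sketch of that post-pipeline filter is shown below. The `tracklists` mapping stands in for tracklists fetched from Discogs, and both function names are hypothetical. The normalization here is only case and whitespace folding, which is an assumption; the real comparison lives in discogs/matching.py and is likely more forgiving.

```python
def normalize(title: str) -> str:
    """Fold case and collapse whitespace so trivially different titles compare equal."""
    return " ".join(title.casefold().split())


def filter_albums_by_track(albums: list[str], song: str,
                           tracklists: dict[str, list[str]]) -> list[str]:
    """Keep only the albums whose tracklist contains the requested song."""
    wanted = normalize(song)
    kept = []
    for album in albums:
        tracks = {normalize(track) for track in tracklists.get(album, [])}
        if wanted in tracks:
            kept.append(album)
    return kept


# Illustrative, partial tracklists standing in for Discogs responses.
tracklists = {
    "Emperor Tomato Ketchup": ["Metronomic Underground", "Cybele's Reverie", "Percolator"],
    "Dots and Loops": ["Brakhage", "Miss Modular"],
}
print(filter_albums_by_track(list(tracklists), "percolator", tracklists))
```

An album with no known tracklist is dropped in this sketch; whether the service keeps or drops such albums is not stated here.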

Project Structure

library-metadata-lookup/
  main.py                      # FastAPI app entry point
  config/settings.py           # Environment-based configuration
  core/
    dependencies.py            # DI for LibraryDB + DiscogsService
    search.py                  # Search strategy pattern + ambiguous format detection
    telemetry.py               # PostHog telemetry
    logging.py, sentry.py, exceptions.py
  discogs/
    service.py                 # Discogs API client with caching + confidence scoring
    cache_service.py           # PostgreSQL cache (asyncpg + pg_trgm)
    matching.py                # Discogs-specific normalization (suffix stripping, track comparison)
    memory_cache.py            # In-memory TTL cache
    lookup.py                  # Track/artist lookup helpers
    models.py, ratelimit.py, router.py
  library/
    db.py                      # SQLite FTS5 + fuzzy fallback search
    models.py, router.py
  generated/
    api_models.py              # Pydantic v2 models from wxyc-shared/api.yaml
  lookup/
    orchestrator.py            # Core search pipeline
    models.py                  # Re-exports generated API contract models
    router.py                  # POST /lookup endpoint
  routers/
    admin.py                   # POST /admin/upload-library-db
    health.py                  # GET /health
  scripts/
    benchmark_cache.py         # Discogs PG cache vs API benchmarks
    generate_api_models.sh     # Generate Python models from api.yaml
  services/
    parser.py                  # Minimal ParsedRequest model (no Groq)
  tests/
    factories.py               # Shared test factories (make_library_item, make_discogs_result)
    unit/                      # 472 mocked unit tests (97% source coverage)
    integration/               # 48 integration tests with real SQLite/FTS5

Development

Prerequisites

  • Python 3.12+
  • uv for dependency management

Running locally

uvicorn main:app --reload

Running tests

The repo follows the architecture-A marker scheme (see the WXYC test-patterns guide, Section 3): markers route CI by infrastructure, not by tier. Tier directories (tests/unit/, tests/integration/, tests/e2e/) survive for documentation only.

# Default: every unmarked test (unit-equivalent + the in-memory-SQLite/mocked
# integration + e2e tests). Excludes pg + external_api per pyproject addopts.
uv run pytest -v

# PG-backed tests (entity store CRUD, etc.). Needs DATABASE_URL_TEST.
uv run pytest -v -m pg

# Real-Discogs-API tests. Needs DISCOGS_TOKEN.
uv run pytest -v -m external_api

Environment Variables

Required:

  • DISCOGS_TOKEN -- Discogs API token for artwork and track lookups

Optional:

  • DATABASE_URL_DISCOGS -- PostgreSQL URL for Discogs cache (e.g. postgresql://user:pass@host:5432/discogs)
  • SENTRY_DSN -- Sentry error tracking
  • POSTHOG_API_KEY -- PostHog telemetry
  • LIBRARY_DB_PATH -- Path to SQLite library database (default: library.db)
  • ADMIN_TOKEN -- Bearer token for admin endpoints (library.db upload)
  • STREAMING_WEBHOOK_URLS -- Comma-separated URLs to POST streaming status changes after library.db upload
  • ETL_NOTIFY_KEY -- Bearer token used by LML when pushing the streaming-status webhook to tubafrenzy
  • LML_API_KEY -- Bearer token required from tubafrenzy / Backend-Service callers on protected endpoints (see "Inbound auth" below)
  • LML_REQUIRE_AUTH -- When true, enforce LML_API_KEY on protected endpoints. Defaults to false so that the auth dependency can be deployed before consumers are updated; set it to true once all callers send the bearer header.
  • LOG_LEVEL -- Logging level (default: INFO)
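
For orientation, the sketch below reads a few of these variables with the defaults listed above. It is not the code in config/settings.py: the `Settings` dataclass and `load_settings` function are hypothetical, only a subset of variables is covered, and failing fast on a missing DISCOGS_TOKEN is an assumption based on it being listed as required.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    discogs_token: str
    library_db_path: str = "library.db"
    log_level: str = "INFO"
    lml_require_auth: bool = False
    streaming_webhook_urls: tuple[str, ...] = ()


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Read settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    token = env.get("DISCOGS_TOKEN")
    if not token:
        raise RuntimeError("DISCOGS_TOKEN is required")
    # STREAMING_WEBHOOK_URLS is comma-separated; ignore empty entries.
    raw_urls = env.get("STREAMING_WEBHOOK_URLS", "")
    urls = tuple(url.strip() for url in raw_urls.split(",") if url.strip())
    return Settings(
        discogs_token=token,
        library_db_path=env.get("LIBRARY_DB_PATH", "library.db"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        lml_require_auth=env.get("LML_REQUIRE_AUTH", "false").strip().lower() == "true",
        streaming_webhook_urls=urls,
    )
```

Passing a plain dict instead of os.environ keeps the loader easy to exercise in unit tests.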

Inbound auth (LML_API_KEY)

Tubafrenzy and Backend-Service call LML for streaming checks, library search, autocomplete, and Discogs lookups. When LML_REQUIRE_AUTH=true, those callers must send Authorization: Bearer <LML_API_KEY> on every request to:

  • POST /api/v1/streaming-check
  • POST /api/v1/lookup
  • GET /api/v1/library/search
  • GET /api/v1/discogs/... (all five endpoints)

/health, /admin/* (uses its own ADMIN_TOKEN), and /identity/* are not gated by LML_API_KEY.

If LML_REQUIRE_AUTH=true and LML_API_KEY is unset, protected endpoints return 500 (fail loudly rather than silently accepting all requests).
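
The decision logic above can be summarized as a small pure function. This is a sketch rather than the service's FastAPI dependency: `check_inbound_auth` is a hypothetical name, and the 401 returned for a missing or wrong token is an assumption, since only the 500 case is specified here.

```python
from __future__ import annotations

import hmac


def check_inbound_auth(authorization: str | None, require_auth: bool,
                       api_key: str | None) -> int:
    """Return the HTTP status a protected endpoint would answer with."""
    if not require_auth:
        return 200  # enforcement is off: accept every request
    if not api_key:
        return 500  # misconfigured: fail loudly instead of accepting everything
    expected = f"Bearer {api_key}"
    # Constant-time comparison avoids leaking the key through timing.
    if authorization is not None and hmac.compare_digest(
        authorization.encode("utf-8"), expected.encode("utf-8")
    ):
        return 200
    return 401  # assumed status for a missing or invalid token


print(check_inbound_auth("Bearer s3cret", require_auth=True, api_key="s3cret"))
print(check_inbound_auth(None, require_auth=True, api_key=None))
```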

Discogs cache TTL settings

  • DISCOGS_TRACK_CACHE_TTL -- In-memory track cache TTL in seconds (default: 3600)
  • DISCOGS_RELEASE_CACHE_TTL -- In-memory release cache TTL (default: 14400)
  • DISCOGS_SEARCH_CACHE_TTL -- In-memory search cache TTL (default: 3600)
  • DISCOGS_CACHE_MAXSIZE -- Max entries per cache (default: 1000)
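
To show how a TTL and a maximum size interact, here is a minimal in-memory cache sketch. It is not the implementation in discogs/memory_cache.py: the `TTLCache` class, its oldest-first eviction, and the injectable clock are all assumptions made for illustration.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable


class TTLCache:
    """Tiny TTL cache: entries expire after `ttl` seconds; oldest entries are evicted first."""

    def __init__(self, ttl: float, maxsize: int = 1000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._maxsize = maxsize
        self._clock = clock
        self._items: "OrderedDict[Hashable, tuple[float, Any]]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return default
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._items.pop(key, None)  # re-inserting moves the key to the newest position
        self._items[key] = (self._clock() + self._ttl, value)
        while len(self._items) > self._maxsize:
            self._items.popitem(last=False)  # evict the oldest insertion


# Defaults mirror DISCOGS_RELEASE_CACHE_TTL and DISCOGS_CACHE_MAXSIZE above.
release_cache = TTLCache(ttl=14400, maxsize=1000)
release_cache.set(123456, {"title": "Emperor Tomato Ketchup"})
print(release_cache.get(123456))
```

Using a monotonic clock keeps expiry correct even if the system wall clock is adjusted.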

Discogs rate limiting settings

  • DISCOGS_RATE_LIMIT -- Max requests/minute (default: 50)
  • DISCOGS_MAX_CONCURRENT -- Max concurrent requests (default: 5)
  • DISCOGS_MAX_RETRIES -- Max retries on 429 errors (default: 2)
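
The concurrency cap and the retry budget can be combined as in the asyncio sketch below. It is not the code in discogs/ratelimit.py: `RateLimited` is a stand-in for an HTTP 429 response, `fetch_with_limits` is a hypothetical helper, and the exponential backoff between retries is an assumption. The per-minute limit (DISCOGS_RATE_LIMIT) is not modeled here.

```python
import asyncio
from typing import Any, Awaitable, Callable


class RateLimited(Exception):
    """Stand-in for an HTTP 429 response from the upstream API."""


async def fetch_with_limits(
    fetch: Callable[[], Awaitable[Any]],
    semaphore: asyncio.Semaphore,
    max_retries: int = 2,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Any:
    """Run `fetch` under a concurrency cap, retrying up to `max_retries` times on 429."""
    attempt = 0
    while True:
        async with semaphore:  # at most N requests in flight at once
            try:
                return await fetch()
            except RateLimited:
                if attempt >= max_retries:
                    raise  # retry budget exhausted
        # The semaphore slot is released before sleeping, so other requests can proceed.
        attempt += 1
        await sleep(base_delay * 2 ** (attempt - 1))


async def main() -> None:
    semaphore = asyncio.Semaphore(5)  # mirrors DISCOGS_MAX_CONCURRENT

    async def fake_fetch() -> str:
        return "release metadata"

    print(await fetch_with_limits(fake_fetch, semaphore, max_retries=2))


asyncio.run(main())
```

With max_retries=2, a request is attempted at most three times before the 429 is surfaced to the caller.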

Deployment

Hosted on Railway with CI-driven deploys (automatic deploys are disabled).

  • main branch -- CI deploys to staging after lint, typecheck, and unit tests pass
  • prod branch -- CI deploys to production after lint, typecheck, and unit tests pass
  • Health check at /health with real dependency probes
  • Optional PostgreSQL cache for Discogs data via DATABASE_URL_DISCOGS (gracefully degrades to API-only)
  • Railway volume mounted at /data stores library.db persistently across deploys

Library Database

The library.db file is uploaded to the Railway volume via POST /admin/upload-library-db, not committed to git. The discogs-cache ETL script (scripts/sync-library.sh) handles daily uploads to both staging and production environments.

On first deploy, the volume is empty. The service starts and serves non-database endpoints, but /health reports unhealthy (503) until library.db is uploaded.

The API contract is defined in wxyc-shared/api.yaml with generated models for Python, TypeScript, Swift, and Kotlin.
