A FastAPI service for WXYC radio that searches the library catalog and cross-references results with Discogs metadata.
These projects depend on library-metadata-lookup:
| Project | Relationship |
|---|---|
| request-o-matic | Primary consumer. Calls POST /api/v1/lookup to resolve song requests before posting to Slack. |
| Backend-Service | Calls this service for library search and Discogs metadata. |
| dj-site | Indirect via Backend-Service. DJ flowsheet and card catalog frontend. |
| wxyc-ios-64 | Indirect via Backend-Service. iOS/macOS/tvOS/watchOS app. |
| WXYC-Android | Indirect via Backend-Service. Android app. |
| discogs-cache | Shared data. Its ETL pipeline produces library.db (consumed at runtime by this service) and populates the PostgreSQL Discogs cache (queried by discogs/cache_service.py). |
The service also depends on this shared package:
| Package | Purpose |
|---|---|
| wxyc-etl | Shared Rust library (PyO3 bindings) for artist name normalization, diacritics stripping, compilation artist detection, and library.db schema validation. |
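The diacritics-stripping side of that normalization can be sketched in pure Python (illustrative only — the real rules live in the Rust crate, and `normalize_artist` is a hypothetical name):

```python
import unicodedata

def normalize_artist(name: str) -> str:
    """Rough sketch of the kind of normalization wxyc-etl performs:
    strip diacritics, lowercase, and collapse whitespace."""
    # NFKD-decompose so accented characters split into base + combining marks,
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().split())
```

The same idea underlies fuzzy artist matching: compare normalized forms rather than raw strings.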
Given an artist, song, and/or album, the service:
- Corrects artist spelling via fuzzy matching against the library catalog
- Resolves album names from Discogs when only a song title is provided
- Searches the library catalog with a multi-strategy fallback chain
- Validates fallback results against Discogs tracklists
- Fetches album artwork from Discogs
- Returns enriched results with metadata
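The flow above can be exercised with a minimal client sketch (assuming the service runs locally on port 8000; any HTTP client works — stdlib `urllib` is used here to stay dependency-free):

```python
import json
from urllib.request import Request, urlopen

LOOKUP_URL = "http://localhost:8000/api/v1/lookup"  # assumed local deployment

payload = {
    "artist": "Stereolab",
    "song": "Percolator",
    "raw_message": "Play Percolator by Stereolab",
}

def lookup(url: str = LOOKUP_URL) -> dict:
    req = Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:  # network call -- only runs under __main__
        return json.load(resp)

if __name__ == "__main__":
    print(lookup()["search_type"])
```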
Primary endpoint. Accepts a parsed request and returns library results with artwork.

Request:

```json
{
  "artist": "Stereolab",
  "song": "Percolator",
  "raw_message": "Play Percolator by Stereolab"
}
```

Response:
```json
{
  "results": [
    {
      "library_item": {
        "id": 10,
        "artist": "Stereolab",
        "title": "Emperor Tomato Ketchup",
        "call_letters": "S",
        "artist_call_number": 1,
        "release_call_number": 1,
        "genre": "Rock",
        "format": "CD"
      },
      "artwork": {
        "album": "Emperor Tomato Ketchup",
        "artist": "Stereolab",
        "release_id": 123456,
        "release_url": "https://www.discogs.com/release/123456",
        "artwork_url": "https://img.discogs.com/...",
        "confidence": 0.95
      }
    }
  ],
  "search_type": "direct",
  "song_not_found": false,
  "found_on_compilation": false,
  "context_message": null,
  "corrected_artist": null,
  "cache_stats": null
}
```

Direct library catalog search.
Search Discogs releases.
Find all releases containing a specific track.
Get full release metadata from Discogs.
Resolve a Discogs release URL or Bandcamp album URL (or an explicit (source, id) pair) to canonical release metadata, cross-source identifiers, and streaming availability — the prefill payload for tubafrenzy's rotation-release create form.
Request: `{"url": "https://www.discogs.com/release/12345"}`, `{"url": "https://artist.bandcamp.com/album/slug"}`, or `{"source": "discogs_release", "id": "12345"}`.
Response includes canonical (artist, title, label, catno, year), identifiers (cross-source IDs learned from Discogs + the streaming check), streaming (per-service availability), and warnings[] for non-fatal issues. Always returns 200 — the form falls back to manual entry on partial prefill.
Upload a new library.db file. Requires Authorization: Bearer <ADMIN_TOKEN> header.
The file is validated (must be a SQLite database with a library table), then atomically
replaces the current database. Returns {"status": "ok", "row_count": <int>, "timestamp": "<ISO8601>"}.
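The upload-time validation described above can be sketched like this (illustrative, not the actual `routers/admin.py` code — the function name and error shape are assumptions):

```python
import sqlite3

def validate_library_db(path: str) -> int:
    """Check that the file is a SQLite database with a `library` table.
    Returns the row count on success, raises ValueError otherwise."""
    try:
        conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
        tables = {
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        }
        if "library" not in tables:
            raise ValueError("missing 'library' table")
        return conn.execute("SELECT COUNT(*) FROM library").fetchone()[0]
    except sqlite3.DatabaseError as exc:
        # Non-SQLite uploads surface here on the first query.
        raise ValueError(f"not a valid SQLite database: {exc}") from exc
```

Only after a check like this passes is the file atomically swapped in for the current database.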
Health check with real connectivity probes for the database, Discogs API, and Discogs cache.
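Aggregating those probes into an overall status might look like this (a sketch with assumed names, not the actual `routers/health.py` logic):

```python
def health_status(probes: dict[str, bool]) -> tuple[int, dict]:
    """Collapse per-dependency probe results into an HTTP status + body:
    200 only when every probe passed, 503 otherwise."""
    healthy = all(probes.values())
    return (
        200 if healthy else 503,
        {"status": "healthy" if healthy else "unhealthy", "checks": probes},
    )
```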
The lookup orchestrator tries strategies in order until results are found:
| Strategy | Condition | What it does |
|---|---|---|
| ARTIST_PLUS_ALBUM | Has artist, album, or song | Search by artist + album(s) from Discogs, fall back to artist + song, then artist only |
| SWAPPED_INTERPRETATION | No results + ambiguous "X - Y" format | Try both "X as artist, Y as title" and vice versa |
| TRACK_ON_COMPILATION | Song not found + has artist and song | Cross-reference Discogs track listings with library to find compilations |
| SONG_AS_ARTIST | No results + song parsed but no artist | Try the parsed song title as an artist name |
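The chain can be sketched as a guard-plus-search strategy list walked in order (names and shapes are illustrative, not the actual `lookup/orchestrator.py` types):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Strategy:
    name: str
    applies: Callable[[dict], bool]   # should this strategy run for the request?
    search: Callable[[dict], list]    # perform the search, return results

def run_pipeline(request: dict, strategies: list[Strategy]) -> tuple[str, list]:
    """Try strategies in order; stop at the first non-empty result set."""
    for strategy in strategies:
        if not strategy.applies(request):
            continue
        results = strategy.search(request)
        if results:
            return strategy.name, results
    return "none", []
```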
After the pipeline, if results came from an artist-only fallback (song_not_found=True), each album is validated against Discogs tracklists to filter to only albums containing the requested track.
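A minimal sketch of that validation step, assuming a simple casefolded title comparison (the real service uses the Discogs-aware track comparison in `discogs/matching.py`):

```python
def filter_by_tracklist(albums: list[dict], song: str) -> list[dict]:
    """Keep only albums whose Discogs tracklist contains the requested song."""
    wanted = song.casefold().strip()
    return [
        album
        for album in albums
        if any(t.casefold().strip() == wanted for t in album.get("tracklist", []))
    ]
```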
```
library-metadata-lookup/
    main.py                      # FastAPI app entry point
    config/settings.py           # Environment-based configuration
    core/
        dependencies.py          # DI for LibraryDB + DiscogsService
        search.py                # Search strategy pattern + ambiguous format detection
        telemetry.py             # PostHog telemetry
        logging.py, sentry.py, exceptions.py
    discogs/
        service.py               # Discogs API client with caching + confidence scoring
        cache_service.py         # PostgreSQL cache (asyncpg + pg_trgm)
        matching.py              # Discogs-specific normalization (suffix stripping, track comparison)
        memory_cache.py          # In-memory TTL cache
        lookup.py                # Track/artist lookup helpers
        models.py, ratelimit.py, router.py
    library/
        db.py                    # SQLite FTS5 + fuzzy fallback search
        models.py, router.py
    generated/
        api_models.py            # Pydantic v2 models from wxyc-shared/api.yaml
    lookup/
        orchestrator.py          # Core search pipeline
        models.py                # Re-exports generated API contract models
        router.py                # POST /lookup endpoint
    routers/
        admin.py                 # POST /admin/upload-library-db
        health.py                # GET /health
    scripts/
        benchmark_cache.py       # Discogs PG cache vs API benchmarks
        generate_api_models.sh   # Generate Python models from api.yaml
    services/
        parser.py                # Minimal ParsedRequest model (no Groq)
    tests/
        factories.py             # Shared test factories (make_library_item, make_discogs_result)
        unit/                    # 472 mocked unit tests (97% source coverage)
        integration/             # 48 integration tests with real SQLite/FTS5
```
- Python 3.12+
- uv for dependency management
```sh
uvicorn main:app --reload
```

The repo follows the architecture-A marker scheme (see the WXYC test-patterns guide, Section 3): markers route CI by infrastructure, not by tier. The tier directories (`tests/unit/`, `tests/integration/`, `tests/e2e/`) are kept for documentation only.
```sh
# Default: every unmarked test (unit-equivalent + the in-memory-SQLite/mocked
# integration + e2e tests). Excludes pg + external_api per pyproject addopts.
uv run pytest -v

# PG-backed tests (entity store CRUD, etc.). Needs DATABASE_URL_TEST.
uv run pytest -v -m pg

# Real-Discogs-API tests. Needs DISCOGS_TOKEN.
uv run pytest -v -m external_api
```

Required:
- `DISCOGS_TOKEN` -- Discogs API token for artwork and track lookups
Optional:
- `DATABASE_URL_DISCOGS` -- PostgreSQL URL for Discogs cache (e.g. `postgresql://user:pass@host:5432/discogs`)
- `SENTRY_DSN` -- Sentry error tracking
- `POSTHOG_API_KEY` -- PostHog telemetry
- `LIBRARY_DB_PATH` -- Path to SQLite library database (default: `library.db`)
- `ADMIN_TOKEN` -- Bearer token for admin endpoints (library.db upload)
- `STREAMING_WEBHOOK_URLS` -- Comma-separated URLs to POST streaming status changes after library.db upload
- `ETL_NOTIFY_KEY` -- Bearer token used by LML when pushing the streaming-status webhook to tubafrenzy
- `LML_API_KEY` -- Bearer token required from tubafrenzy / Backend-Service callers on protected endpoints (see "Inbound auth" below)
- `LML_REQUIRE_AUTH` -- When `true`, enforce `LML_API_KEY` on protected endpoints. Defaults to `false` so the service can be deployed before consumers are updated; flip to `true` after all callers send the bearer header.
- `LOG_LEVEL` -- Logging level (default: `INFO`)
Tubafrenzy and Backend-Service call LML for streaming checks, library search, autocomplete, and Discogs lookups. When LML_REQUIRE_AUTH=true, those callers must send Authorization: Bearer <LML_API_KEY> on every request to:
- `POST /api/v1/streaming-check`
- `POST /api/v1/lookup`
- `GET /api/v1/library/search`
- `GET /api/v1/discogs/...` (all five endpoints)
`/health`, `/admin/*` (which uses its own `ADMIN_TOKEN`), and `/identity/*` are not gated by `LML_API_KEY`.
If `LML_REQUIRE_AUTH=true` and `LML_API_KEY` is unset, protected endpoints return 500 (fail loudly rather than silently accepting all requests).
- `DISCOGS_TRACK_CACHE_TTL` -- In-memory track cache TTL in seconds (default: 3600)
- `DISCOGS_RELEASE_CACHE_TTL` -- In-memory release cache TTL in seconds (default: 14400)
- `DISCOGS_SEARCH_CACHE_TTL` -- In-memory search cache TTL in seconds (default: 3600)
- `DISCOGS_CACHE_MAXSIZE` -- Max entries per cache (default: 1000)
- `DISCOGS_RATE_LIMIT` -- Max requests/minute (default: 50)
- `DISCOGS_MAX_CONCURRENT` -- Max concurrent requests (default: 5)
- `DISCOGS_MAX_RETRIES` -- Max retries on 429 errors (default: 2)
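One way a requests-per-minute cap like `DISCOGS_RATE_LIMIT` can be enforced is a sliding-window limiter; this is an illustrative sketch, not the actual `discogs/ratelimit.py` implementation (the injectable `clock` exists only to make the sketch testable):

```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_per_minute: int, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.clock = clock
        self.calls: deque[float] = deque()  # timestamps of recent requests

    def try_acquire(self) -> bool:
        """Return True and record the call if under the per-minute limit."""
        now = self.clock()
        # Drop timestamps that have aged out of the 60-second window.
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()
        if len(self.calls) >= self.max_per_minute:
            return False
        self.calls.append(now)
        return True
```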
Hosted on Railway with CI-driven deploys (automatic deploys are disabled).
- `main` branch -- CI deploys to staging after lint, typecheck, and unit tests pass
- `prod` branch -- CI deploys to production after lint, typecheck, and unit tests pass
- Health check at `/health` with real dependency probes
- Optional PostgreSQL cache for Discogs data via `DATABASE_URL_DISCOGS` (gracefully degrades to API-only)
- Railway volume mounted at `/data` stores `library.db` persistently across deploys
The library.db file is uploaded to the Railway volume via POST /admin/upload-library-db,
not committed to git. The discogs-cache ETL script (scripts/sync-library.sh) handles
daily uploads to both staging and production environments.
On first deploy, the volume is empty. The service starts healthy for non-database endpoints
but the health check reports unhealthy (503) until library.db is uploaded.
The API contract is defined in wxyc-shared/api.yaml with generated models for Python, TypeScript, Swift, and Kotlin.
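The shape of one such contract model, sketched here as a plain dataclass for illustration (the real generated code in `generated/api_models.py` is Pydantic v2, produced from `wxyc-shared/api.yaml`):

```python
from dataclasses import dataclass

@dataclass
class LibraryItem:
    """Fields mirror the library_item object in the lookup response above."""
    id: int
    artist: str
    title: str
    call_letters: str
    artist_call_number: int
    release_call_number: int
    genre: str
    format: str
```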