Merged
added 3 commits
March 11, 2026 12:29
Partition dedup by (master_id, format) instead of just master_id so different formats (CD, LP, Cassette, etc.) of the same album survive dedup independently. Add format-aware verify/prune that checks the library's owned formats against release formats, downgrading exact-match KEEP releases to PRUNE when the format doesn't match.

- Add lib/format_normalization.py with normalize_format(), normalize_library_format(), and format_matches() functions
- Add format column to release table schema and import pipeline with normalize_format transform
- Change dedup PARTITION BY from master_id to (master_id, format)
- Update LibraryIndex to track format_by_pair from library.db (with backward-compatible fallback for old schemas)
- Update classify_all_releases to apply format filtering on exact-match KEEP releases
- Update copy-swap column lists and COPY_TABLE_SPEC to include format
- Update test fixtures with format data in library.db and release_artist.csv
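A minimal sketch of how normalization and the KEEP-to-PRUNE downgrade could fit together, assuming a small hand-written mapping in which the library's "LP" label and the release-side "Vinyl" label both normalize to "Vinyl" (consistent with the E2E expectation that an owned LP keeps a Vinyl release). The names normalize_format, normalize_library_format, and format_matches come from the commit message, but the bodies here are illustrative only. apply_format_filter is a hypothetical helper standing in for the logic inside classify_all_releases, and treating an unknown format as a match is an assumption.

```python
from typing import Iterable, Optional

# Hypothetical canonical mappings; the real tables in
# lib/format_normalization.py are not shown in the PR and may differ.
_RELEASE_FORMATS = {"cd": "CD", "vinyl": "Vinyl", "cassette": "Cassette"}
_LIBRARY_FORMATS = {"cd": "CD", "lp": "Vinyl", "vinyl": "Vinyl", "cassette": "Cassette"}


def normalize_format(raw: Optional[str]) -> Optional[str]:
    """Map a raw release format to a canonical category (None if unknown)."""
    if not raw:
        return None
    return _RELEASE_FORMATS.get(raw.strip().lower())


def normalize_library_format(raw: Optional[str]) -> Optional[str]:
    """Map a library format label such as "LP" to a canonical category."""
    if not raw:
        return None
    return _LIBRARY_FORMATS.get(raw.strip().lower())


def format_matches(release_format: Optional[str], owned_formats: Iterable[str]) -> bool:
    """Return True if the release's format is one the library owns.

    Assumption: if either side is unknown, treat it as a match so that
    missing format data never triggers a prune on its own.
    """
    release = normalize_format(release_format)
    owned = {normalize_library_format(f) for f in owned_formats} - {None}
    if release is None or not owned:
        return True
    return release in owned


def apply_format_filter(decision: str, release_format: Optional[str],
                        owned_formats: Iterable[str]) -> str:
    """Hypothetical helper: downgrade an exact-match KEEP to PRUNE on format mismatch."""
    if decision == "KEEP" and not format_matches(release_format, owned_formats):
        return "PRUNE"
    return decision


owned = ["CD", "LP"]
for fmt in ("CD", "Vinyl", "Cassette"):
    print(fmt, apply_format_filter("KEEP", fmt, owned))
# CD KEEP
# Vinyl KEEP
# Cassette PRUNE
```

In this sketch the filter only ever downgrades: a release already marked PRUNE stays PRUNE regardless of format.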
The import_csv.py script is invoked as a subprocess by run_pipeline.py, which means the repo root may not be on Python's module search path. Add sys.path.insert following the same pattern used by verify_cache.py and run_pipeline.py.
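The sys.path pattern might look like the sketch below. The helper name ensure_repo_root_on_path and the assumption that import_csv.py sits one directory below the repo root are hypothetical; the exact form used in verify_cache.py and run_pipeline.py may differ.

```python
import sys
from pathlib import Path


def ensure_repo_root_on_path(script_file: str, levels_up: int = 1) -> Path:
    """Prepend the repository root to sys.path if it is not already present.

    Assumption: the script lives `levels_up` directories below the repo root
    (e.g. scripts/import_csv.py with levels_up=1).
    """
    # parents[0] is the script's own directory, parents[1] is one level up, etc.
    repo_root = Path(script_file).resolve().parents[levels_up]
    root_str = str(repo_root)
    if root_str not in sys.path:
        sys.path.insert(0, root_str)
    return repo_root


# In the script itself this would run before any `from lib... import ...`:
# ensure_repo_root_on_path(__file__)
```

Inserting at index 0 makes the repo's own packages win over any same-named installed package, and the membership check keeps repeated calls from growing sys.path.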
Format-aware dedup partitions by (master_id, format), so fixture releases with different formats all survive dedup. Update E2E test expectations: test_format_aware_dedup_and_prune verifies that both CD and Vinyl survive while Cassette is pruned (the library owns only CD and LP). Label-aware dedup tests verify that all format variants survive, and tests check that master_id persists when no dedup copy-swap runs. Add test_format_column_present.
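To see why the fixture's format variants all survive, here is a sketch of ROW_NUMBER-based dedup run against an in-memory SQLite table (window functions need SQLite 3.25 or later). The schema, the rows, and the lowest-id tie-break are assumptions for illustration; the pipeline's actual database and ordering criteria are not shown in this PR text.

```python
import sqlite3

# Stand-in schema and rows; the real release table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE release (id INTEGER, master_id INTEGER, format TEXT)")
conn.executemany(
    "INSERT INTO release VALUES (?, ?, ?)",
    [
        (1, 100, "CD"),
        (2, 100, "CD"),  # second CD pressing of the same master
        (3, 100, "Vinyl"),
        (4, 100, "Cassette"),
    ],
)


def dedup(conn: sqlite3.Connection, partition_by: str) -> list:
    """Keep one row per partition; the lowest id wins (an assumed tie-break).

    partition_by is interpolated directly, so only pass trusted column lists.
    """
    sql = f"""
        SELECT id, master_id, format FROM (
            SELECT id, master_id, format,
                   ROW_NUMBER() OVER (PARTITION BY {partition_by} ORDER BY id) AS rn
            FROM release
        )
        WHERE rn = 1
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


legacy = dedup(conn, "master_id")              # old behavior
survivors = dedup(conn, "master_id, format")   # format-aware behavior
print(legacy)     # [(1, 100, 'CD')]
print(survivors)  # [(1, 100, 'CD'), (3, 100, 'Vinyl'), (4, 100, 'Cassette')]
```

Partitioning by master_id alone leaves only the first CD, while partitioning by (master_id, format) keeps one CD, one Vinyl, and one Cassette. Pruning the Cassette is a separate step: the format-aware verify/prune pass, not dedup, removes it when the library owns only CD and LP.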
Summary
Add a format column to the release table schema, partitioning dedup by (master_id, format) so different formats of the same album survive independently.

Closes #42
Test plan