SoulSync

History

Broque Thomas 4b15fe0b75 Fix album MBID inconsistency: detector + persistent release-MBID cache Discord report (Samuel [KC]): tracks of the same album sometimes carry different MUSICBRAINZ_ALBUMID tags, which causes Navidrome (and other media servers grouping by album MBID) to split the album into multiple entries. Two-part fix — one for existing libraries, one for the root cause that lets new imports drift. Part 1 — Detector + fix action (catches existing dissenters): `core/repair_jobs/mbid_mismatch_detector.py`: - New helpers: `_read_album_mbid_from_file` and `_write_album_mbid_to_file` use the Picard-standard tag conventions (`TXXX:MusicBrainz Album Id` for MP3, `MUSICBRAINZ_ALBUMID` for FLAC/OGG, `----:com.apple.iTunes:MusicBrainz Album Id` for MP4). - New scan phase `_scan_album_mbid_consistency` runs after the existing track-MBID scan: groups tracks by DB `album_id`, reads each track's embedded album MBID, finds the consensus (most-common) MBID via `Counter`, flags dissenters. Tracks without an album MBID at all are skipped (they don't break Navidrome — only an explicit MBID disagreement does). Albums where MBIDs are perfectly tied (no clear consensus) are skipped too — surface as a manual decision instead of fixing toward a 1/N tie. - New finding type `album_mbid_mismatch` carries `consensus_mbid`, `wrong_mbid`, `consensus_count`, `total_tracks_with_mbid`, and a human-readable reason string. `core/repair_worker.py`: - Added `'album_mbid_mismatch': self._fix_album_mbid_mismatch` to the fix dispatch dict and to the `fixable_types` tuple so auto-fix + bulk-fix paths pick it up. - New `_fix_album_mbid_mismatch` method reads `consensus_mbid` from finding details, resolves the dissenter's file path via the shared library resolver, calls `_write_album_mbid_to_file` to rewrite the tag in place. Doesn't touch the album's other tracks (they're already in agreement). Part 2 — Root cause fix (prevents new SoulSync imports from drifting): The original in-memory `mb_release_cache` in `core/metadata/source.py` maps `(normalized_album, artist) -> release_mbid` so per-track enrichment of the same album hits the cache and writes the same MUSICBRAINZ_ALBUMID to every track. That cache is bounded (4096 entries) and in-process — so cache eviction (when other albums are processed in between) and server restart can BOTH cause inconsistency. Per-track album-name variation (e.g. some tracks tagged `"Album"`, others tagged `"Album (Deluxe)"`) and per-track artist variation (features) make it worse. `core/metadata/album_mbid_cache.py` (new module): - DB-backed `lookup(normalized_album, artist) -> release_mbid` and `record(...)` functions. Same key shape as the in-memory cache. - Strict additive design: every public function is wrapped in try/except and degrades to None / no-op on ANY database error. The existing in-memory cache + MusicBrainz lookup remains the authoritative fallback. If this module breaks, downloads continue exactly as they would today. `database/music_database.py`: - New `mb_album_release_cache` table with composite primary key `(normalized_album_key, artist_key)`. Reverse-lookup index on `release_mbid` for future debug tooling. Created via the existing `CREATE TABLE IF NOT EXISTS` migration pattern — idempotent, no schema version bump needed. `core/metadata/source.py`: - Surgical change inside the existing `embed_source_ids` in-memory-cache-miss branch: BEFORE calling MusicBrainz, consult the persistent cache. If a previous SoulSync run already resolved this album's release MBID, reuse it. After a successful MB lookup, store in BOTH caches. Both calls wrapped in defensive try/except so any failure falls through to existing logic. Tests: - `tests/metadata/test_album_mbid_cache.py` — 16 cache tests: round-trip, idempotent re-record, overwrite semantics, clear_all, album+artist independence (no Greatest Hits collisions), defensive None-on-empty-input, graceful degradation when the DB is unavailable / connection raises / commit fails, schema sanity (table + index exist after init). - `tests/test_album_mbid_consistency.py` — 13 detector tests: tag read/write round-trip on real FLAC files, Picard-standard tag descriptors, defensive paths (unreadable file, empty input), detector behavior (agreement → no flags, lone dissenter → flag, ties → no flag, single-track albums → skipped, no-MBID tracks → skipped, unresolvable file paths → skipped). - `tests/metadata/test_metadata_enrichment.py` — added autouse fixture monkeypatching the persistent cache to no-op for tests in this file. The existing tests pin per-call MB counts and in-memory cache state; without the fixture, persistent rows from earlier tests would bypass the MB call. Persistent layer has its own dedicated tests. Verified: 1782 tests pass (29 new), ruff clean, smoke test confirms end-to-end cache round-trip works. WHATS_NEW entry under '2.4.2' dev cycle.		3 weeks ago
..
__init__.py	Move image URL normalization into metadata helpers	3 weeks ago
album_mbid_cache.py	Fix album MBID inconsistency: detector + persistent release-MBID cache	3 weeks ago
album_tracks.py	Move metadata helpers into package modules	4 weeks ago
artist_image.py	Move metadata helpers into package modules	4 weeks ago
artwork.py	Move image URL normalization into metadata helpers	3 weeks ago
cache.py	Move metadata API into package	4 weeks ago
common.py	Tighten metadata and import safety	4 weeks ago
completion.py	Move metadata helpers into package modules	4 weeks ago
discography.py	Move metadata helpers into package modules	4 weeks ago
enrichment.py	fix post-download tagging, and enable tagging for hifi	3 weeks ago
lookup.py	Move metadata API into package	4 weeks ago
lyrics.py	Move shared metadata helpers into package	4 weeks ago
registry.py	Split metadata source and Spotify status	3 weeks ago
service.py	Move metadata API into package	4 weeks ago
similar_artists.py	Move metadata helpers into package modules	4 weeks ago
source.py	Fix album MBID inconsistency: detector + persistent release-MBID cache	3 weeks ago
status.py	Make Spotify status updates event-driven	3 weeks ago