Discord report (Samuel [KC]): tracks of the same album sometimes carry
different MUSICBRAINZ_ALBUMID tags, which causes Navidrome (and other
media servers grouping by album MBID) to split the album into multiple
entries. Two-part fix — one for existing libraries, one for the root
cause that lets new imports drift.
Part 1 — Detector + fix action (catches existing dissenters):
`core/repair_jobs/mbid_mismatch_detector.py`:
- New helpers: `_read_album_mbid_from_file` and
`_write_album_mbid_to_file` use the Picard-standard tag conventions
(`TXXX:MusicBrainz Album Id` for MP3, `MUSICBRAINZ_ALBUMID` for
FLAC/OGG, `----:com.apple.iTunes:MusicBrainz Album Id` for MP4).
- New scan phase `_scan_album_mbid_consistency` runs after the
existing track-MBID scan: groups tracks by DB `album_id`, reads
each track's embedded album MBID, finds the consensus
(most-common) MBID via `Counter`, flags dissenters. Tracks without
an album MBID at all are skipped (they don't break Navidrome —
only an explicit MBID disagreement does). Albums where MBIDs are
perfectly tied (no clear consensus) are skipped too — surface as
a manual decision instead of fixing toward a 1/N tie.
- New finding type `album_mbid_mismatch` carries `consensus_mbid`,
`wrong_mbid`, `consensus_count`, `total_tracks_with_mbid`, and a
human-readable reason string.
`core/repair_worker.py`:
- Added `'album_mbid_mismatch': self._fix_album_mbid_mismatch` to the
fix dispatch dict and to the `fixable_types` tuple so auto-fix +
bulk-fix paths pick it up.
- New `_fix_album_mbid_mismatch` method reads `consensus_mbid` from
finding details, resolves the dissenter's file path via the shared
library resolver, calls `_write_album_mbid_to_file` to rewrite the
tag in place. Doesn't touch the album's other tracks (they're
already in agreement).
Part 2 — Root cause fix (prevents new SoulSync imports from drifting):
The original in-memory `mb_release_cache` in `core/metadata/source.py`
maps `(normalized_album, artist) -> release_mbid` so per-track
enrichment of the same album hits the cache and writes the same
MUSICBRAINZ_ALBUMID to every track. That cache is bounded (4096
entries) and in-process — so cache eviction (when other albums are
processed in between) and server restart can BOTH cause
inconsistency. Per-track album-name variation (e.g. some tracks
tagged `"Album"`, others tagged `"Album (Deluxe)"`) and per-track
artist variation (features) make it worse.
`core/metadata/album_mbid_cache.py` (new module):
- DB-backed `lookup(normalized_album, artist) -> release_mbid` and
`record(...)` functions. Same key shape as the in-memory cache.
- Strict additive design: every public function is wrapped in
try/except and degrades to None / no-op on ANY database error.
The existing in-memory cache + MusicBrainz lookup remains the
authoritative fallback. If this module breaks, downloads continue
exactly as they would today.
`database/music_database.py`:
- New `mb_album_release_cache` table with composite primary key
`(normalized_album_key, artist_key)`. Reverse-lookup index on
`release_mbid` for future debug tooling. Created via the existing
`CREATE TABLE IF NOT EXISTS` migration pattern — idempotent, no
schema version bump needed.
`core/metadata/source.py`:
- Surgical change inside the existing `embed_source_ids`
in-memory-cache-miss branch: BEFORE calling MusicBrainz, consult
the persistent cache. If a previous SoulSync run already resolved
this album's release MBID, reuse it. After a successful MB lookup,
store in BOTH caches. Both calls wrapped in defensive try/except
so any failure falls through to existing logic.
Tests:
- `tests/metadata/test_album_mbid_cache.py` — 16 cache tests:
round-trip, idempotent re-record, overwrite semantics, clear_all,
album+artist independence (no Greatest Hits collisions),
defensive None-on-empty-input, graceful degradation when the DB
is unavailable / connection raises / commit fails, schema sanity
(table + index exist after init).
- `tests/test_album_mbid_consistency.py` — 13 detector tests:
tag read/write round-trip on real FLAC files, Picard-standard tag
descriptors, defensive paths (unreadable file, empty input),
detector behavior (agreement → no flags, lone dissenter → flag,
ties → no flag, single-track albums → skipped, no-MBID tracks →
skipped, unresolvable file paths → skipped).
- `tests/metadata/test_metadata_enrichment.py` — added autouse
fixture monkeypatching the persistent cache to no-op for tests in
this file. The existing tests pin per-call MB counts and
in-memory cache state; without the fixture, persistent rows from
earlier tests would bypass the MB call. Persistent layer has its
own dedicated tests.
Verified: 1782 tests pass (29 new), ruff clean, smoke test confirms
end-to-end cache round-trip works.
WHATS_NEW entry under '2.4.2' dev cycle.