SoulSync

__init__.py	basic db structure	10 months ago
music_database.py	Fix album MBID inconsistency: detector + persistent release-MBID cache	3 weeks ago

Broque Thomas 4b15fe0b75 Fix album MBID inconsistency: detector + persistent release-MBID cache

Discord report (Samuel [KC]): tracks of the same album sometimes carry
different MUSICBRAINZ_ALBUMID tags, which causes Navidrome (and other
media servers grouping by album MBID) to split the album into multiple
entries. Two-part fix — one for existing libraries, one for the root
cause that lets new imports drift.

Part 1 — Detector + fix action (catches existing dissenters):

`core/repair_jobs/mbid_mismatch_detector.py`:
- New helpers: `_read_album_mbid_from_file` and
  `_write_album_mbid_to_file` use the Picard-standard tag conventions
  (`TXXX:MusicBrainz Album Id` for MP3, `MUSICBRAINZ_ALBUMID` for
  FLAC/OGG, `----:com.apple.iTunes:MusicBrainz Album Id` for MP4).
- New scan phase `_scan_album_mbid_consistency` runs after the
  existing track-MBID scan: groups tracks by DB `album_id`, reads
  each track's embedded album MBID, finds the consensus
  (most-common) MBID via `Counter`, flags dissenters. Tracks without
  an album MBID at all are skipped (they don't break Navidrome —
  only an explicit MBID disagreement does). Albums where MBIDs are
  perfectly tied (no clear consensus) are skipped too: they are left
  for a manual decision instead of being fixed toward a 1/N tie.
- New finding type `album_mbid_mismatch` carries `consensus_mbid`,
  `wrong_mbid`, `consensus_count`, `total_tracks_with_mbid`, and a
  human-readable reason string.
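
The consensus step can be sketched roughly as follows. This is an
illustration, not the detector's actual code: the function names, the
`path` field, and the input shape (a dict of track path to embedded
album MBID) are assumptions, and "perfectly tied" is interpreted here
as "the two most common MBIDs have the same count".

```python
from collections import Counter
from typing import Optional


def find_consensus_mbid(mbids: list[str]) -> Optional[tuple[str, int]]:
    """Return (consensus_mbid, count), or None when there is no clear winner."""
    counts = Counter(m for m in mbids if m)  # tracks without an MBID are ignored
    if len(counts) < 2:
        return None  # nothing tagged, or every tagged track already agrees
    (top, top_count), (_, second_count) = counts.most_common(2)
    if top_count == second_count:
        return None  # tie at the top: leave it for a manual decision
    return top, top_count


def find_dissenters(album_tracks: dict[str, Optional[str]]) -> list[dict]:
    """Build one finding per track whose album MBID differs from the consensus.

    album_tracks maps a (hypothetical) track path to its embedded album MBID.
    """
    tagged = [m for m in album_tracks.values() if m]
    consensus = find_consensus_mbid(tagged)
    if consensus is None:
        return []
    consensus_mbid, consensus_count = consensus
    return [
        {
            "type": "album_mbid_mismatch",
            "path": path,
            "consensus_mbid": consensus_mbid,
            "wrong_mbid": mbid,
            "consensus_count": consensus_count,
            "total_tracks_with_mbid": len(tagged),
        }
        for path, mbid in album_tracks.items()
        if mbid and mbid != consensus_mbid
    ]
```

With three tagged tracks where two agree, only the third is flagged; a
1-vs-1 split produces no findings at all.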

`core/repair_worker.py`:
- Added `'album_mbid_mismatch': self._fix_album_mbid_mismatch` to the
  fix dispatch dict and to the `fixable_types` tuple so auto-fix +
  bulk-fix paths pick it up.
- New `_fix_album_mbid_mismatch` method reads `consensus_mbid` from
  finding details, resolves the dissenter's file path via the shared
  library resolver, and calls `_write_album_mbid_to_file` to rewrite
  the tag in place. It doesn't touch the album's other tracks (they're
  already in agreement).
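
A minimal sketch of how the dispatch wiring might look. The class name,
the injected `write_tag` / `resolve_path` callables, and the finding
shape (`file_path`, `details`) are hypothetical stand-ins; only the
`album_mbid_mismatch` key, `fixable_types`, and the
`_fix_album_mbid_mismatch` method name come from this commit.

```python
class RepairWorkerSketch:
    """Illustrative stand-in for the repair worker's fix dispatch."""

    def __init__(self, write_tag, resolve_path):
        # Hypothetical injected helpers: write_tag(path, mbid) -> bool,
        # resolve_path(stored_path) -> usable path or None.
        self._write_tag = write_tag
        self._resolve_path = resolve_path
        self._fixers = {"album_mbid_mismatch": self._fix_album_mbid_mismatch}
        self.fixable_types = tuple(self._fixers)

    def fix(self, finding: dict) -> bool:
        handler = self._fixers.get(finding.get("type"))
        if handler is None:
            return False  # not an auto-fixable finding type
        return handler(finding)

    def _fix_album_mbid_mismatch(self, finding: dict) -> bool:
        consensus = finding.get("details", {}).get("consensus_mbid")
        path = self._resolve_path(finding.get("file_path"))
        if not consensus or not path:
            return False  # nothing safe to write
        # Only the dissenting file is rewritten; agreeing tracks are untouched.
        return bool(self._write_tag(path, consensus))
```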

Part 2 — Root cause fix (prevents new SoulSync imports from drifting):

The original in-memory `mb_release_cache` in `core/metadata/source.py`
maps `(normalized_album, artist) -> release_mbid` so per-track
enrichment of the same album hits the cache and writes the same
MUSICBRAINZ_ALBUMID to every track. That cache is bounded (4096
entries) and in-process — so cache eviction (when other albums are
processed in between) and server restart can BOTH cause
inconsistency. Per-track album-name variation (e.g. some tracks
tagged `"Album"`, others tagged `"Album (Deluxe)"`) and per-track
artist variation (features) make it worse.
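
To make the eviction problem concrete, here is a toy bounded cache with
a capacity of 2 instead of 4096. The LRU eviction policy and the
`BoundedCache` class are assumptions for illustration; the commit only
states that the real cache is bounded and in-process.

```python
from collections import OrderedDict


class BoundedCache:
    """Tiny LRU-style cache keyed by (normalized_album, artist)."""

    def __init__(self, max_entries: int):
        self._max = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # drop the least recently used entry


cache = BoundedCache(max_entries=2)
cache.put(("album", "artist"), "mbid-1")
cache.put(("other one", "artist"), "mbid-2")
cache.put(("other two", "artist"), "mbid-3")  # pushes ("album", "artist") out

# Later tracks of "album" now miss the cache and must be resolved again,
# which is where a different release MBID can slip in.
missed = cache.get(("album", "artist"))
```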

`core/metadata/album_mbid_cache.py` (new module):
- DB-backed `lookup(normalized_album, artist) -> release_mbid` and
  `record(...)` functions. Same key shape as the in-memory cache.
- Strictly additive design: every public function is wrapped in
  try/except and degrades to None / no-op on ANY database error.
  The existing in-memory cache + MusicBrainz lookup remains the
  authoritative fallback. If this module breaks, downloads continue
  exactly as they would today.
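
The "degrade to None / no-op" behavior could look something like the
sketch below. It assumes a SQLite-style connection and passes a
hypothetical `get_conn` callable explicitly so the example is
self-contained; the real functions take only the key (and the MBID for
`record`), and their internals may differ. `INSERT OR REPLACE` is one
possible way to get overwrite semantics.

```python
import sqlite3
from typing import Callable, Optional


def lookup(get_conn: Callable, normalized_album: str, artist: str) -> Optional[str]:
    """Return the cached release MBID, or None on a miss or on ANY failure."""
    if not normalized_album or not artist:
        return None
    try:
        row = get_conn().execute(
            "SELECT release_mbid FROM mb_album_release_cache "
            "WHERE normalized_album_key = ? AND artist_key = ?",
            (normalized_album, artist),
        ).fetchone()
        return row[0] if row else None
    except Exception:
        return None  # caller falls back to in-memory cache + MusicBrainz


def record(get_conn: Callable, normalized_album: str, artist: str,
           release_mbid: str) -> bool:
    """Store the mapping; return False (no-op) on bad input or ANY failure."""
    if not (normalized_album and artist and release_mbid):
        return False
    try:
        conn = get_conn()
        conn.execute(
            "INSERT OR REPLACE INTO mb_album_release_cache "
            "(normalized_album_key, artist_key, release_mbid) VALUES (?, ?, ?)",
            (normalized_album, artist, release_mbid),
        )
        conn.commit()
        return True
    except Exception:
        return False


def broken_conn():
    # Simulates the database being unavailable.
    raise sqlite3.OperationalError("database is unavailable")


# Neither call raises; both simply report "nothing cached / nothing stored".
degraded_lookup = lookup(broken_conn, "album", "artist")
degraded_record = record(broken_conn, "album", "artist", "mbid-1")
```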

`database/music_database.py`:
- New `mb_album_release_cache` table with composite primary key
  `(normalized_album_key, artist_key)`. Reverse-lookup index on
  `release_mbid` for future debug tooling. Created via the existing
  `CREATE TABLE IF NOT EXISTS` migration pattern — idempotent, no
  schema version bump needed.
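
A possible shape for the table, assuming SQLite. The table and column
names are taken from this commit; the column types, the index name, and
the `init_schema` helper are guesses for illustration only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS mb_album_release_cache (
    normalized_album_key TEXT NOT NULL,
    artist_key           TEXT NOT NULL,
    release_mbid         TEXT NOT NULL,
    PRIMARY KEY (normalized_album_key, artist_key)
);
CREATE INDEX IF NOT EXISTS idx_mb_album_release_cache_mbid
    ON mb_album_release_cache (release_mbid);
"""


def init_schema(conn: sqlite3.Connection) -> None:
    """Safe to call on every startup: IF NOT EXISTS makes it idempotent."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
init_schema(conn)
init_schema(conn)  # second run is a no-op rather than an error

# The composite key lets same-named albums by different artists coexist.
conn.executemany(
    "INSERT INTO mb_album_release_cache VALUES (?, ?, ?)",
    [
        ("greatest hits", "artist a", "mbid-a"),
        ("greatest hits", "artist b", "mbid-b"),
    ],
)
conn.commit()
```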

`core/metadata/source.py`:
- Surgical change inside the existing `embed_source_ids`
  in-memory-cache-miss branch: BEFORE calling MusicBrainz, consult
  the persistent cache. If a previous SoulSync run already resolved
  this album's release MBID, reuse it. After a successful MB lookup,
  store the result in BOTH caches. Both calls are wrapped in defensive
  try/except so any failure falls through to the existing logic.
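
The resulting lookup order (in-memory cache, then persistent cache, then
MusicBrainz) might be sketched like this. `resolve_release_mbid` and its
parameters are hypothetical; the real change lives inline in
`embed_source_ids`. Warming the in-memory cache on a persistent hit is
an assumption, not something this commit states.

```python
from typing import Callable, Optional


def resolve_release_mbid(
    key: tuple,                    # (normalized_album, artist)
    memory_cache: dict,
    persistent_lookup: Callable,   # (album, artist) -> mbid or None
    persistent_record: Callable,   # (album, artist, mbid) -> None
    mb_lookup: Callable,           # (album, artist) -> mbid or None
) -> Optional[str]:
    cached = memory_cache.get(key)
    if cached:
        return cached

    # In-memory miss: consult the persistent cache BEFORE MusicBrainz.
    try:
        mbid = persistent_lookup(*key)
    except Exception:
        mbid = None  # any failure falls through to the existing logic
    if mbid:
        memory_cache[key] = mbid  # assumption: warm the in-memory cache
        return mbid

    mbid = mb_lookup(*key)
    if mbid:
        memory_cache[key] = mbid
        try:
            persistent_record(*key, mbid)
        except Exception:
            pass  # persistence is best-effort only
    return mbid
```

Clearing `memory_cache` between two calls with the same key (as a
restart would) should leave the MusicBrainz call count at one, since
the second call is answered by the persistent layer.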

Tests:
- `tests/metadata/test_album_mbid_cache.py` — 16 cache tests:
  round-trip, idempotent re-record, overwrite semantics, clear_all,
  album+artist independence (no Greatest Hits collisions),
  defensive None-on-empty-input, graceful degradation when the DB
  is unavailable / connection raises / commit fails, schema sanity
  (table + index exist after init).
- `tests/test_album_mbid_consistency.py` — 13 detector tests:
  tag read/write round-trip on real FLAC files, Picard-standard tag
  descriptors, defensive paths (unreadable file, empty input),
  detector behavior (agreement → no flags, lone dissenter → flag,
  ties → no flag, single-track albums → skipped, no-MBID tracks →
  skipped, unresolvable file paths → skipped).
- `tests/metadata/test_metadata_enrichment.py` — added autouse
  fixture monkeypatching the persistent cache to no-op for tests in
  this file. The existing tests pin per-call MB counts and
  in-memory cache state; without the fixture, persistent rows from
  earlier tests would bypass the MB call. Persistent layer has its
  own dedicated tests.

Verified: 1782 tests pass (29 new), ruff clean, smoke test confirms
end-to-end cache round-trip works.

WHATS_NEW entry under '2.4.2' dev cycle.
