AcoustID scanner: prefer track_artist for compilation albums

Discord report (Skowl): downloaded a compilation album ("High Tea Music: Vol 1") where every track has a different artist (Eclypse, Andromedik, T & Sugah, Gourski, etc.) and the AcoustID scanner flagged every single track as Wrong Song. The file tags had the correct per-track artist (e.g. "Eclypse" for "City Lights"), but the scanner compared against the album-level artist ("Andromedik", the curator). Raw similarity 12% → Wrong Song flag. # Why the prior multi-value fix didn't help Foxxify's case (just-merged PR): AcoustID returned multi-value credit "Okayracer, aldrch & poptropicaslutz!" — primary IS in the credit. Splitting found it. Skowl's case: both sides single-value but DIFFERENT artists. Splitter has nothing to find — Eclypse simply isn't in "Andromedik". Different bug. # Cause Scanner SQL at `core/repair_jobs/acoustid_scanner.py:281` joined the `artists` table via `tracks.artist_id` which points at the ALBUM artist (the curator/label-name applied to every row in a compilation). The `tracks.track_artist` column already holds the correct per-track artist for compilations — populated by every server-scan path (Plex `originalTitle`, Jellyfin `ArtistItems`, Navidrome per-track `artist`) AND the auto-import / direct-download post-process flow (`record_soulsync_library_entry` writes it when different from album artist). Scanner just wasn't reading it. # Fix ```sql SELECT t.id, t.title, COALESCE(NULLIF(t.track_artist, ''), ar.name) AS artist, ... ``` Prefers per-track artist when populated, falls back to album artist for legacy rows / single-artist albums where `track_artist` is NULL. `NULLIF(t.track_artist, '')` handles the empty-string-instead-of-null case some legacy rows might have. # Composes with Foxxify's multi-value fix For the rare compilation track where AcoustID ALSO returns a multi-value credit (e.g. compilation track has multiple credited performers), both paths work together — `track_artist` gives the correct expected primary, then the helper splits the credit and finds it. # Tests added (2) - `test_load_db_tracks_prefers_track_artist_for_compilation` — reporter's exact case: track with `track_artist='Eclypse'` AND `artist_id` pointing at album artist 'Andromedik' resolves to 'Eclypse'. Second track with NULL `track_artist` falls back to album artist 'Andromedik' (single-artist + legacy compat). - `test_load_db_tracks_falls_back_when_track_artist_empty_string` — empty string in `track_artist` (some legacy rows) → NULLIF returns NULL → COALESCE falls back to album artist. Both use a real SQLite DB so the COALESCE/NULLIF logic + JOIN runs against actual schema (SimpleNamespace fakes can't simulate JOINs). # Verification - 6/6 scanner tests pass (2 new + 4 existing) - 2586 full suite passes (+2 from prior commit) - Ruff clean
3 days ago · 812db1fbbf
parent 08a0b39b91
commit 812db1fbbf
3 changed files with 149 additions and 1 deletions
--- a/core/repair_jobs/acoustid_scanner.py
+++ b/core/repair_jobs/acoustid_scanner.py
@ -274,8 +274,20 @@ class AcoustIDScannerJob(RepairJob):
        try:
            conn = context.db._get_connection()
            cursor = conn.cursor()
+            # Discord report (Skowl): compilation albums like "High Tea
+            # Music: Vol 1" have a different artist per track but the
+            # `tracks.artist_id` foreign key points at the ALBUM artist
+            # (curator / label-name applied to every track). AcoustID
+            # returns the actual per-track artist → 12% similarity →
+            # Wrong Song flag. Fix: prefer `tracks.track_artist` (the
+            # per-track artist, populated by every server-scan + auto-
+            # import path when different from album artist) and fall
+            # back to the album artist only when the per-track column
+            # is NULL or empty (legacy rows / single-artist albums).
            cursor.execute("""
-                SELECT t.id, t.title, ar.name, t.file_path, t.track_number,
+                SELECT t.id, t.title,
+                       COALESCE(NULLIF(t.track_artist, ''), ar.name) AS artist,
+                       t.file_path, t.track_number,
                       al.title AS album_title, al.thumb_url, ar.thumb_url
                FROM tracks t
                LEFT JOIN artists ar ON ar.id = t.artist_id
--- a/tests/test_acoustid_scanner.py
+++ b/tests/test_acoustid_scanner.py
@ -225,3 +225,138 @@ class JobResultStub:
    errors = 0
    scanned = 0
    skipped = 0
+
+
+# ---------------------------------------------------------------------------
+# Compilation albums — Skowl Discord report
+# ---------------------------------------------------------------------------
+#
+# Compilation albums (e.g. "High Tea Music: Vol 1") have different
+# artists per track but `tracks.artist_id` points at the ALBUM artist
+# (curator / label name applied to every row). The scanner used to
+# compare AcoustID's per-track artist against the album artist →
+# 12% sim → Wrong Song flag on every track. The `tracks.track_artist`
+# column already holds the correct per-track artist for these cases
+# (populated by every server-scan + auto-import path) — scanner just
+# wasn't reading it. Post-fix `_load_db_tracks` prefers track_artist
+# via `COALESCE(NULLIF(t.track_artist, ''), ar.name)`.
+
+
+def _make_real_db_context(tmp_path):
+    """Build a context with a REAL SQLite DB so the scanner's
+    multi-table JOIN runs against actual schema. SimpleNamespace
+    fakes can't simulate the JOIN."""
+    import sqlite3
+    db_path = tmp_path / "test.db"
+    conn = sqlite3.connect(str(db_path))
+    conn.row_factory = sqlite3.Row
+    cursor = conn.cursor()
+    cursor.executescript("""
+        CREATE TABLE artists (
+            id TEXT PRIMARY KEY,
+            name TEXT NOT NULL,
+            thumb_url TEXT
+        );
+        CREATE TABLE albums (
+            id TEXT PRIMARY KEY,
+            title TEXT NOT NULL,
+            thumb_url TEXT
+        );
+        CREATE TABLE tracks (
+            id TEXT PRIMARY KEY,
+            title TEXT,
+            artist_id TEXT,
+            album_id TEXT,
+            file_path TEXT,
+            track_number INTEGER,
+            track_artist TEXT
+        );
+    """)
+    conn.commit()
+    conn.close()
+
+    class _RealDB:
+        def _get_connection(self):
+            c = sqlite3.connect(str(db_path))
+            c.row_factory = sqlite3.Row
+            return c
+
+    return _RealDB()
+
+
+def test_load_db_tracks_prefers_track_artist_for_compilation():
+    """Reporter's exact case (Skowl) — compilation album where
+    every track has a different artist credited via track_artist
+    column, while artist_id points at the album-level curator."""
+    import tempfile, pathlib
+    tmp = pathlib.Path(tempfile.mkdtemp())
+    db = _make_real_db_context(tmp)
+
+    conn = db._get_connection()
+    cursor = conn.cursor()
+    cursor.execute("INSERT INTO artists (id, name) VALUES ('andro', 'Andromedik')")
+    cursor.execute(
+        "INSERT INTO albums (id, title) VALUES ('hightea', 'High Tea Music: Vol 1')"
+    )
+    cursor.execute(
+        "INSERT INTO tracks (id, title, artist_id, album_id, file_path, track_artist) "
+        "VALUES (?, ?, ?, ?, ?, ?)",
+        ('city-lights', 'City Lights', 'andro', 'hightea',
+         '/music/citylights.mp3', 'Eclypse'),
+    )
+    cursor.execute(
+        "INSERT INTO tracks (id, title, artist_id, album_id, file_path, track_artist) "
+        "VALUES (?, ?, ?, ?, ?, ?)",
+        ('invasion', 'Invasion', 'andro', 'hightea',
+         '/music/invasion.mp3', None),  # NULL track_artist falls back
+    )
+    conn.commit()
+    conn.close()
+
+    job = AcoustIDScannerJob()
+    context = SimpleNamespace(
+        db=db,
+        config_manager=SimpleNamespace(get=lambda *a, **k: None),
+    )
+    tracks = job._load_db_tracks(context)
+
+    # Track with track_artist populated → Eclypse (per-track), NOT
+    # Andromedik (album-artist via artist_id).
+    assert tracks['city-lights']['artist'] == 'Eclypse', (
+        f"Compilation track must use track_artist; got {tracks['city-lights']['artist']!r}"
+    )
+    # Track with NULL track_artist → falls back to album artist
+    # via COALESCE. Backward compat for legacy rows + single-artist
+    # albums where track_artist isn't populated.
+    assert tracks['invasion']['artist'] == 'Andromedik'
+
+
+def test_load_db_tracks_falls_back_when_track_artist_empty_string():
+    """Defensive: NULLIF treats empty string as NULL too. Some
+    legacy rows might have stored '' instead of NULL."""
+    import tempfile, pathlib
+    tmp = pathlib.Path(tempfile.mkdtemp())
+    db = _make_real_db_context(tmp)
+
+    conn = db._get_connection()
+    cursor = conn.cursor()
+    cursor.execute("INSERT INTO artists (id, name) VALUES ('a', 'Album Artist')")
+    cursor.execute("INSERT INTO albums (id, title) VALUES ('alb', 'Album')")
+    cursor.execute(
+        "INSERT INTO tracks (id, title, artist_id, album_id, file_path, track_artist) "
+        "VALUES (?, ?, ?, ?, ?, ?)",
+        ('t1', 'T1', 'a', 'alb', '/music/t1.mp3', ''),  # empty string
+    )
+    conn.commit()
+    conn.close()
+
+    job = AcoustIDScannerJob()
+    context = SimpleNamespace(
+        db=db,
+        config_manager=SimpleNamespace(get=lambda *a, **k: None),
+    )
+    tracks = job._load_db_tracks(context)
+
+    # Empty string in track_artist → NULLIF returns NULL → COALESCE
+    # falls back to album artist
+    assert tracks['t1']['artist'] == 'Album Artist'
--- a/webui/static/helper.js
+++ b/webui/static/helper.js
@ -3416,6 +3416,7 @@ const WHATS_NEW = {
    '2.4.3': [
        // --- post-release patch work on the 2.4.3 line — entries hidden by _getLatestWhatsNewVersion until the build version bumps ---
        { date: 'Unreleased — 2.4.3 patch work' },
+        { title: 'AcoustID Scanner: Compilation Albums No Longer Flag Every Track', desc: 'discord report (skowl): downloaded a compilation album like "high tea music: vol 1" where every track has a different artist (eclypse, andromedik, t & sugah, gourski, himmes, sektor, lexurus, etc.) and the acoustid scanner flagged every single track as wrong song — the file tag had the correct per-track artist (e.g. "eclypse" for "city lights") but the scanner compared against the album-level artist ("andromedik", the curator). raw similarity 12% → wrong song flag. the multi-value-credit fix from the prior pr (foxxify) didn\'t help because both sides were single-value but DIFFERENT artists. cause: scanner sql joined `artists` table via `tracks.artist_id` which points at the ALBUM artist, not the per-track artist. but `tracks.track_artist` column was already populated with the correct per-track value by every server scan + auto-import path that handles compilations. scanner just wasn\'t reading it. fix: changed the scanner select to `COALESCE(NULLIF(t.track_artist, \'\'), ar.name)` — prefers per-track artist when populated, falls back to album artist for legacy rows / single-artist albums where track_artist is null. NULLIF handles the empty-string-instead-of-null case for legacy data. composes with foxxify\'s multi-value fix — for the rare compilation track where acoustid ALSO returns a multi-value credit, both paths work together. 2 new tests pin: compilation track uses per-track artist (reporter\'s exact case), null/empty track_artist falls back to album artist via coalesce.', page: 'library' },
        { title: 'AcoustID Scanner: Multi-Artist Songs No Longer Flagged As Wrong', desc: 'discord report (foxxify): the acoustid scanner repair job was flagging multi-artist tracks as "wrong song" because acoustid returns the full credit ("okayracer, aldrch & poptropicaslutz!") while the library db carries only the primary artist ("okayracer"). raw similarity scored ~43% — well below the 60% threshold — so the scanner created a wrong-song finding even though the audio was correct. user couldn\'t fix without lowering the global artist threshold to ~30% (which would let real mismatches through). cause: scanner used raw `SequenceMatcher` comparison that doesn\'t recognise the primary artist is just one of several contributors in the credit string. fix: extended the shared `core/matching/artist_aliases.py::artist_names_match` helper (lifted in #441) with credit-token splitting on common separators (comma, ampersand, semicolon, slash, plus, "feat.", "ft.", "featuring", "with", "vs.", "x"). when actual artist contains separators, helper splits into individual contributors and checks each against expected — primary-in-credit cases now resolve at 100% instead of 43%. composes with existing alias path so cross-script multi-artist credits ("hiroyuki sawano" expected, "澤野弘之, featured" actual) work too. wired into `core/repair_jobs/acoustid_scanner.py` — replaces the raw similarity call. acoustid post-download verifier already used the helper from #441 so it inherits the same fix automatically. 14 new tests pin: split-by-separator across 12 credit-string formats, primary at start/middle/end of credit, no-mask on genuine mismatches, single-token actual falls through to direct compare, multi-value composes with aliases, threshold still respected, end-to-end scanner integration with reporter\'s exact case (okayracer in okayracer-aldrch-poptropicaslutz credit → no finding), end-to-end scanner still flags genuine mismatches.', page: 'library' },
        { title: 'Deezer Cover Art: Embedded Covers No Longer Look Blurry', desc: 'discord report (tim): downloaded cover art via deezer metadata source came out visibly blurry in navidrome and on phones — particularly noticeable on large displays. cause: deezer\'s api returns `cover_xl` urls at 1000×1000 but the underlying cdn serves up to 1900×1900 by rewriting the size segment in the url path. soulsync wasn\'t doing the rewrite — same as iTunes mzstatic and spotify scdn already get upgraded. now `_upgrade_deezer_cover_url` (mirrors `_upgrade_spotify_image_url` pattern) rewrites the cdn url to request 1900×1900 before download. cdn serves source-native size when source < target so asking for 1900 on smaller-source albums returns the same bytes (no upscaling, no failure). applied at both download sites — auto post-process flow + the enhanced library view\'s "write tags to file" feature. existing `prefer_caa_art` toggle in settings → library → post-processing remains as the orthogonal workaround for users who want even higher quality (musicbrainz cover art archive, often 3000×3000+). 16 new tests pin: standard upgrade, alternate dzcdn host, artist picture urls, custom target sizes, idempotency on already-upgraded urls, defensive on non-deezer urls (spotify/itunes/caa/lastfm/random), empty/none handling.', page: 'settings' },
        { title: 'Cross-Script Artist Names No Longer Quarantine Files (Hiroyuki Sawano / 澤野弘之, Сергей Лазарев / Sergey Lazarev)', desc: 'github issue #442 (afonsog6): files where the artist tag was in one script and the expected metadata was in another — japanese kanji `澤野弘之` for `hiroyuki sawano`, cyrillic `сергей лазарев` for `sergey lazarev`, etc. — got quarantined post-download because acoustid verification scored the artist similarity at 0% (the two scripts share no characters). reporter could not even rescue the file via manual import — the import-modal goes through the same verifier and re-quarantined the same file. cause: verifier compared expected vs actual artist with raw `_similarity` and never consulted musicbrainz aliases, even though MB exposes them on every artist record. fix: new `core/matching/artist_aliases.py` pure helper with alias-aware comparison + new `artists.aliases` JSON column populated by the existing MB enrichment worker on every artist match (one extra `inc=aliases` request per artist) + new multi-tier resolver `MusicBrainzService.lookup_artist_aliases` (library DB → cache → live MB) so the verifier finds aliases even for un-enriched artists without thrashing the MB API. verifier resolves aliases ONCE per `verify_audio_file` call and feeds them through three artist comparison sites (best-match scoring, secondary scan when title matches but artist doesn\'t, final fallback scan). reporter\'s exact two cases reproduced as regression tests with stubbed MB service. backward compat: aliases unavailable / MB unreachable → verifier falls back to direct similarity (identical to pre-fix behaviour — never quarantines stricter than today). 70 new tests pin every layer: pure helper (28), service methods (31), verifier integration (11). audited adjacent artist-comparison sites (auto-import single-track id, discovery scoring, matching engine) — left untouched per scope discipline since they aren\'t the user-reported pain.', page: 'downloads' },