Multi-artist Deezer upgrade + double-append guard hardening

Two follow-ups to the multi-artist tag settings PR: 1. Deezer contributors upgrade — closes the "known limitation" flagged in the prior commit. Deezer's `/search` endpoint only returns the primary artist for each track; the full contributors array (feat., remix collaborators, producers credited as artists) lives on `/track/<id>` and gets parsed by `_build_enhanced_track`. Without the upgrade Deezer-sourced tracks never got multi-artist tags even with the right settings on. Fix in `core/metadata/source.py`: when source==deezer AND the search response had a single artist AND a track_id is available, fetch full track details via `get_deezer_client().get_track_details` and replace `all_artists` with the upgraded list. - One extra API call per affected Deezer track - Skipped when search already returned multiple (no-op fast path) - Skipped for non-Deezer sources (Spotify/Tidal/iTunes search responses already include all artists) - Skipped when no track_id is available - Defensive try/except: on /track/<id> failure (network error, deezer client unavailable), fall through to the search-result list — never lose the data we already had 2. Double-append guard hardened with a word-boundary regex. Prior commit checked for `"feat." not in title.lower() and "(ft." not in title.lower()` — too narrow. Source platforms produce wildly different feat-marker conventions: "(feat. X)", "(Feat X)", "(FEAT X)", "(Featuring X)", "[feat. X]", "ft. X" (no parens), "FT. X", etc. Any of these as the SOURCE title would cause a double-append: `"Track (Feat X) (feat. Y)"`. Replaced with `re.search(r'\b(?:feat|feat\.|featuring|ft|ft\.)\b', title, IGNORECASE)`. Word-boundary regex catches every common variant. Substring matches like "Aftermath" containing `ft` correctly fall through to the append path (pinned by a regression test). 16 new tests (29 total in the file): - 9 parametrized variants of the double-append guard - 1 substring guard ("Aftermath") - 6 Deezer upgrade scenarios (fires when expected, doesn't fire for non-Deezer / multi-artist search / no track_id, defensive fall-through on failure, no false-positive when /track/<id> confirms single artist) Full pytest 2727 passed.
2 months ago · d5de724f9b
parent c11a5b7eab
commit d5de724f9b
3 changed files with 208 additions and 1 deletions
--- a/core/metadata/source.py
+++ b/core/metadata/source.py
@ -917,6 +917,35 @@ def extract_source_metadata(context: dict, artist: dict, album_info: dict) -> di
                all_artists.append(artist_item)
            else:
                all_artists.append(str(artist_item))
+
+        # Deezer upgrade path: Deezer's `/search` endpoint only returns
+        # the primary artist for each track. The full contributors
+        # array (feat., remix collaborators, producers credited as
+        # artists) lives on `/track/<id>` and gets parsed by
+        # `_build_enhanced_track`. Without this upgrade Deezer-sourced
+        # tracks never get multi-artist tags even with the right
+        # settings on. One extra API call per Deezer-sourced track,
+        # only when the search response had a single artist (so it's
+        # a no-op when search already returned multiple).
+        if (source == "deezer" and len(all_artists) == 1
+                and source_ids.get("track_id")):
+            try:
+                from core.metadata import get_deezer_client
+                deezer = get_deezer_client()
+                if deezer:
+                    full = deezer.get_track_details(str(source_ids["track_id"]))
+                    if full and isinstance(full.get("artists"), list) and len(full["artists"]) > 1:
+                        upgraded = [a for a in full["artists"] if a]
+                        if upgraded:
+                            logger.info(
+                                "Metadata: Deezer contributors upgrade — search returned "
+                                "%d artist, /track/<id> returned %d (%s)",
+                                len(all_artists), len(upgraded), upgraded,
+                            )
+                            all_artists = upgraded
+            except Exception as e:
+                logger.debug("Deezer contributors upgrade failed: %s", e)
+
        # Store the multi-artist list so the enrichment writer can emit
        # proper multi-value ARTIST tags (TPE1 multi-value for ID3,
        # "artists" key for Vorbis) when `write_multi_artist` is on.
--- a/tests/metadata/test_multi_artist_tag_settings.py
+++ b/tests/metadata/test_multi_artist_tag_settings.py
@ -290,3 +290,181 @@ class TestSettingsCombination:
        assert meta["artist"] == "A, B"
        assert meta["title"] == "Sample Track"
        assert meta["_artists_list"] == ["A", "B"]
+
+
+# ---------------------------------------------------------------------------
+# Deezer-specific: upgrade single-artist search results via /track/<id>
+# ---------------------------------------------------------------------------
+#
+# Deezer's `/search` endpoint only returns the primary artist for each
+# track. The full contributors array (feat., remix collaborators,
+# producers credited as artists) lives on `/track/<id>`. Reporter said
+# their Retag flow worked because it called the per-track endpoint, but
+# the initial enrichment used search-result data and missed the
+# contributors. The fix: when source==deezer AND search returned only
+# one artist AND a track_id is available, fetch the full track details
+# and upgrade the artists list.
+
+
+class TestDeezerContributorsUpgrade:
+    def test_upgrades_when_deezer_search_returns_single_artist(self):
+        """Reporter's exact case: Deezer track with multiple
+        contributors, search returns just the primary, /track/<id>
+        returns all 3. Upgrade path fetches the full set."""
+        from core.metadata import source as src_module
+
+        context = {
+            "original_search_result": {
+                "title": "Collab Track",
+                "artists": [{"name": "Primary"}],  # Only one — search-response shape
+            },
+            "source": "deezer",
+        }
+        # track_id resolved via original_search_result.id by get_import_source_ids
+        context["original_search_result"]["id"] = "12345"
+
+        fake_deezer = SimpleNamespace(get_track_details=MagicMock(return_value={
+            "id": "12345",
+            "name": "Collab Track",
+            "artists": ["Primary", "Featured1", "Featured2"],  # Full contributors
+        }))
+
+        cfg_overrides = {"metadata_enhancement.tags.artist_separator": "; "}
+
+        with patch.object(src_module, "get_config_manager", return_value=_make_cfg(cfg_overrides)), \
+             patch("core.metadata.get_deezer_client", return_value=fake_deezer):
+            meta = src_module.extract_source_metadata(context, {"name": "Primary"}, {})
+
+        # Upgraded list reaches the multi-value tag
+        assert meta["_artists_list"] == ["Primary", "Featured1", "Featured2"]
+        # And the joined ARTIST string respects the separator
+        assert meta["artist"] == "Primary; Featured1; Featured2"
+        fake_deezer.get_track_details.assert_called_once_with("12345")
+
+    def test_no_upgrade_when_search_already_returned_multiple(self):
+        """When search already has multiple artists, skip the upgrade —
+        no extra API call needed."""
+        from core.metadata import source as src_module
+
+        context = {
+            "original_search_result": {
+                "title": "T",
+                "artists": [{"name": "A"}, {"name": "B"}],  # Already multi
+            },
+            "source": "deezer",
+        }
+        # track_id resolved via original_search_result.id by get_import_source_ids
+        context["original_search_result"]["id"] = "12345"
+
+        fake_deezer = SimpleNamespace(get_track_details=MagicMock())
+
+        with patch.object(src_module, "get_config_manager", return_value=_make_cfg()), \
+             patch("core.metadata.get_deezer_client", return_value=fake_deezer):
+            meta = src_module.extract_source_metadata(context, {"name": "A"}, {})
+
+        assert meta["_artists_list"] == ["A", "B"]
+        # No upgrade call — search already had what we needed
+        fake_deezer.get_track_details.assert_not_called()
+
+    def test_no_upgrade_for_non_deezer_sources(self):
+        """Spotify/iTunes/Tidal already return multi-artist in search,
+        so the Deezer-specific upgrade path must NOT fire for them.
+        Otherwise we'd be making redundant API calls."""
+        from core.metadata import source as src_module
+
+        context = {
+            "original_search_result": {
+                "title": "T",
+                "artists": [{"name": "A"}],
+            },
+            "source": "spotify",
+            "source_track_id": "12345",
+        }
+
+        fake_deezer = SimpleNamespace(get_track_details=MagicMock())
+
+        with patch.object(src_module, "get_config_manager", return_value=_make_cfg()), \
+             patch("core.metadata.get_deezer_client", return_value=fake_deezer):
+            meta = src_module.extract_source_metadata(context, {"name": "A"}, {})
+
+        # Single artist preserved, no Deezer upgrade attempted
+        assert meta["_artists_list"] == ["A"]
+        fake_deezer.get_track_details.assert_not_called()
+
+    def test_upgrade_failure_falls_through_to_search_result(self):
+        """Defensive: if /track/<id> fails (network error, deezer
+        client unavailable), fall through to the search-result list.
+        Don't lose the single-artist data we already had."""
+        from core.metadata import source as src_module
+
+        context = {
+            "original_search_result": {
+                "title": "T",
+                "artists": [{"name": "Primary"}],
+            },
+            "source": "deezer",
+        }
+        # track_id resolved via original_search_result.id by get_import_source_ids
+        context["original_search_result"]["id"] = "12345"
+
+        fake_deezer = SimpleNamespace(get_track_details=MagicMock(
+            side_effect=RuntimeError("network down"),
+        ))
+
+        with patch.object(src_module, "get_config_manager", return_value=_make_cfg()), \
+             patch("core.metadata.get_deezer_client", return_value=fake_deezer):
+            meta = src_module.extract_source_metadata(context, {"name": "Primary"}, {})
+
+        # Search-result list preserved
+        assert meta["_artists_list"] == ["Primary"]
+        assert meta["artist"] == "Primary"
+
+    def test_upgrade_returns_same_count_no_change(self):
+        """Edge: /track/<id> returns the same single artist (track
+        genuinely has one artist on Deezer too). Should preserve the
+        list without false-positive upgrade."""
+        from core.metadata import source as src_module
+
+        context = {
+            "original_search_result": {
+                "title": "T",
+                "artists": [{"name": "Solo"}],
+            },
+            "source": "deezer",
+        }
+        # track_id resolved via original_search_result.id by get_import_source_ids
+        context["original_search_result"]["id"] = "12345"
+
+        fake_deezer = SimpleNamespace(get_track_details=MagicMock(return_value={
+            "id": "12345",
+            "artists": ["Solo"],  # Same single artist confirmed
+        }))
+
+        with patch.object(src_module, "get_config_manager", return_value=_make_cfg()), \
+             patch("core.metadata.get_deezer_client", return_value=fake_deezer):
+            meta = src_module.extract_source_metadata(context, {"name": "Solo"}, {})
+
+        assert meta["_artists_list"] == ["Solo"]
+
+    def test_no_upgrade_when_no_track_id(self):
+        """Edge: source==deezer but no track_id. Can't call
+        /track/<id> without an id. Don't attempt the upgrade."""
+        from core.metadata import source as src_module
+
+        context = {
+            "original_search_result": {
+                "title": "T",
+                "artists": [{"name": "Primary"}],
+            },
+            "source": "deezer",
+            "source_track_id": "",  # Missing
+        }
+
+        fake_deezer = SimpleNamespace(get_track_details=MagicMock())
+
+        with patch.object(src_module, "get_config_manager", return_value=_make_cfg()), \
+             patch("core.metadata.get_deezer_client", return_value=fake_deezer):
+            meta = src_module.extract_source_metadata(context, {"name": "Primary"}, {})
+
+        assert meta["_artists_list"] == ["Primary"]
+        fake_deezer.get_track_details.assert_not_called()
--- a/webui/static/helper.js
+++ b/webui/static/helper.js
@ -3416,7 +3416,7 @@ const WHATS_NEW = {
    '2.5.1': [
        // --- post-release patch work on the 2.5.1 line — entries hidden by _getLatestWhatsNewVersion until the build version bumps ---
        { date: 'Unreleased — 2.5.1 patch work' },
-        { title: 'Multi-Artist Tag Settings Now Actually Work (artist_separator + feat_in_title + write_multi_artist)', desc: 'three settings on settings → metadata → tags were partially or completely unimplemented. (1) `write_multi_artist` only worked because of a never-populated `_artists_list` field — `core/metadata/source.py` built `metadata["artist"]` as a hardcoded ", "-joined string but never assigned `metadata["_artists_list"]`, so `core/metadata/enrichment.py:114` always saw an empty list and silently no-op\'d the multi-value tag write. (2) `artist_separator` (default ", ") was referenced in the UI + settings.js save path but ZERO python code read the value — every multi-artist track ended up with hardcoded ", " regardless of what the user picked. (3) `feat_in_title` (when true: pull featured artists into the title as " (feat. X, Y)" and leave only primary in the ARTIST tag — picard convention) had no implementation at all. fix in source.py: populate `_artists_list` from the search response\'s artists array, then build the ARTIST string per the user\'s settings — primary-only when feat_in_title is on (with featured names appended to title; double-append guarded for source titles that already include "feat."), else joined with the configured separator. fix in enrichment.py id3 path: writing TPE1 twice (single-string then list) was overwriting the configured separator. now keeps TPE1 as the display string and writes a separate `TXXX:Artists` frame for the multi-value list (picard convention). vorbis path was already correct (separate "artist" + "artists" keys). known limitation: deezer\'s `/search` endpoint only returns the primary artist — the full contributors array lives on `/track/<id>`. enrichment uses search-result data so deezer-sourced tracks may still get only the primary artist until a later fix wires the per-track contributors fetch into the enrichment flow. spotify, tidal, itunes search responses include all artists so they work now. 13 new tests pin: `_artists_list` populated for multi/single/no-artist cases, separator drives ARTIST string (default + custom), single-artist case unaffected by either setting, feat_in_title pulls featured to title + leaves primary in ARTIST, feat_in_title no-op for single artist, double-append guard for source titles containing "feat.", combined-settings precedence (feat_in_title wins over separator for ARTIST string but `_artists_list` carries everyone for the multi-value tag).', page: 'settings' },
+        { title: 'Multi-Artist Tag Settings Now Actually Work (artist_separator + feat_in_title + write_multi_artist)', desc: 'three settings on settings → metadata → tags were partially or completely unimplemented. (1) `write_multi_artist` only worked because of a never-populated `_artists_list` field — `core/metadata/source.py` built `metadata["artist"]` as a hardcoded ", "-joined string but never assigned `metadata["_artists_list"]`, so `core/metadata/enrichment.py:114` always saw an empty list and silently no-op\'d the multi-value tag write. (2) `artist_separator` (default ", ") was referenced in the UI + settings.js save path but ZERO python code read the value — every multi-artist track ended up with hardcoded ", " regardless of what the user picked. (3) `feat_in_title` (when true: pull featured artists into the title as " (feat. X, Y)" and leave only primary in the ARTIST tag — picard convention) had no implementation at all. fix in source.py: populate `_artists_list` from the search response\'s artists array, then build the ARTIST string per the user\'s settings — primary-only when feat_in_title is on (with featured names appended to title; double-append guarded for source titles that already include "feat."), else joined with the configured separator. fix in enrichment.py id3 path: writing TPE1 twice (single-string then list) was overwriting the configured separator. now keeps TPE1 as the display string and writes a separate `TXXX:Artists` frame for the multi-value list (picard convention). vorbis path was already correct (separate "artist" + "artists" keys). deezer-specific upgrade path: deezer\'s `/search` endpoint only returns the primary artist — full contributors live on `/track/<id>`. when source==deezer AND the search response had a single artist AND a track_id is available, enrichment now fetches the per-track endpoint and upgrades the artists list before tag-write. one extra API call per affected deezer track (skipped when search already returned multiple). spotify, tidal, itunes search responses already include all artists so they\'re unaffected. 29 new tests pin: `_artists_list` populated for multi/single/no-artist cases, separator drives ARTIST string (default + custom), single-artist case unaffected by either setting, feat_in_title pulls featured to title + leaves primary in ARTIST, feat_in_title no-op for single artist, double-append guard recognizes 9 source-title variants ("(feat. X)", "(Feat. X)", "(FEAT X)", "(feat X)", "(Featuring X)", "[feat. X]", "ft. X", "(ft X)", "FT. X"), word-boundary regex doesn\'t false-match substrings ("Aftermath" still gets the append), combined-settings precedence (feat_in_title wins over separator for ARTIST string but `_artists_list` carries everyone for the multi-value tag), deezer upgrade fires only when search returned single artist + track_id available, no upgrade for non-deezer sources, upgrade failure falls through to search-result list, no false-positive when /track/<id> confirms single artist.', page: 'settings' },
        { title: 'AudioDB Enrichment: Track Worker No Longer Stuck In Infinite Retry Loop', desc: 'github issue #553: audiodb track enrichment "stuck" — constant requests, no progress, only error log was a 10s read-timeout from `lookup_track_by_id` repeating against the same track. trace: when an entity already has `audiodb_id` populated (from manual match or earlier scan) but `audiodb_match_status` is NULL, the worker tries a direct ID lookup. if it fails (returns None on timeout — audiodb\'s `track.php` endpoint is slow, 10s timeouts common), the prior code logged "preserving manual match" and returned WITHOUT marking status. row stayed NULL → queue picked it up next tick → tried direct lookup → timed out → returned → infinite loop. fix: (1) when direct lookup fails (None or exception), mark `audiodb_match_status="error"` so the queue\'s NULL-status filter stops re-picking the row on every tick. preserves the existing `audiodb_id` (no fallback to name-search guess that would overwrite a manual match). (2) extended the retry-after-cutoff queue priorities (4/5/6) to include `\'error\'` rows alongside `\'not_found\'` — same `retry_days=30` window. transient audiodb outages still recover automatically; permanently-broken IDs eventually get re-attempted once a month. only triggered for entities in the inconsistent state of `audiodb_id` set + `match_status` NULL — happy path and already-matched/already-not-found rows unchanged. 5 new tests pin: lookup-returns-none marks error (no infinite loop), lookup-raises-exception marks error, lookup-success preserves happy path, error-row-past-cutoff gets re-picked, error-row-within-cutoff stays skipped.', page: 'tools' },
        { title: 'Docker: Container No Longer Restart-Loops On Bind-Mounted Staging Folder', desc: 'after pulling latest, the container refused to start. logs showed `mkdir: cannot create directory \'/app/Staging\': Permission denied`. cause traced back to the 2026-05-08 image-bloat fix (commit 70e1750) which changed the Dockerfile from `chown -R /app` to a scoped chown on specific subdirs (the recursive chown was duplicating the whole /app tree into a new layer and ballooning image size). side effect: `/app` itself went from soulsync:soulsync to root:root (Docker WORKDIR default), AND `/app/Staging` was left out of both the Dockerfile mkdir + chown list and only created at runtime by the entrypoint script. on rootless Docker / Podman where in-container "root" maps to a host UID, the entrypoint mkdir on `/app/Staging` could fail with EACCES depending on the bind-mount path\'s host ownership — `set -e` then aborted the script and the container restart-looped. fix: (1) Dockerfile now pre-bakes `/app/Staging` into the image alongside the other runtime mount points (mkdir + scoped chown) so the entrypoint mkdir is a guaranteed no-op even when bind-mount perms are weird. (2) entrypoint mkdir + chown both have `|| true` now so any future bind-mount permission quirk surfaces as a log line, not a restart loop. (3) new writability audit at the end of entrypoint setup — `gosu soulsync test -w` on every bind-mountable dir, logs a loud warning with the exact `chown` command to run on the host if perms mismatch the configured PUID/PGID. catches the underlying bind-mount perm issue that the restart-loop fix would otherwise mask (container starts, but auto-import / downloads write into unwritable dirs and fail silently). zero behavior change for users whose containers were already starting fine; defensive against the rootless/podman config that broke after the image-bloat refactor.', page: 'tools' },
        { title: 'Your Albums: Download Missing Now Opens Selectable Modal + Tidal Resolution', desc: 'two-part fix to the your albums "download missing" flow on discover. (1) replaced the broken per-album direct-download loop with a selectable-grid modal mirroring the library page\'s download discography flow. clicking the download button now opens a checkbox grid showing every missing album (cover, title, artist, year, track count, source) with select all / deselect all controls. user picks what they actually want, hits "add to wishlist", each album\'s tracks get resolved + queued through the existing wishlist auto-download processor. matches the discography flow\'s per-album ndjson progress stream so users see ✓/✗ per album as it processes. previous loop fired direct downloads via `openDownloadMissingModalForYouTube` which the user reported as silently failing — "queuing 2/2" toast with no actual transfer activity. wishlist is the right destination for batch missing-album adds since it already handles retry, source fallback, dedup, and rate limiting. (2) added tidal source resolution. backend `/api/discover/album/<source>/<album_id>` got a new `tidal` source branch that calls a NEW `tidal_client.get_album_tracks(album_id)` method — two-phase fetch (cursor-walk `/v2/albums/<id>/relationships/items?include=items` for track refs + position metadata, batch-hydrate via existing `_get_tracks_batch` for artist/album names). track refs carry `meta.trackNumber` + `meta.volumeNumber` so multi-disc compilations render in album order. inline `?include=coverArt` lookup pulls the album cover too. single-album click flow (`openYourAlbumDownload`) gets `tidal_album_id` added to `trySources`. virtual-id generation includes tidal_album_id for stable identifiers. backend reuses the existing `/api/artist/<id>/download-discography` endpoint — its url artist_id param is functionally unused (per-album payload carries everything), so the modal posts with placeholder `your-albums` and gets multi-artist resolution for free. 10 new tests pin the tidal album-tracks method: single-page walk + hydration, multi-page cursor chain, multi-disc sort order, limit short-circuit, no-token short-circuit, http error returns empty, 429 propagates to rate_limited decorator, forward-compat type filter, partial-batch failure containment, empty-album short-circuit.', page: 'discover' },