Multi-artist Deezer upgrade + double-append guard hardening

Two follow-ups to the multi-artist tag settings PR:

1. Deezer contributors upgrade — closes the "known limitation"
   flagged in the prior commit. Deezer's `/search` endpoint only
   returns the primary artist for each track; the full contributors
   array (feat., remix collaborators, producers credited as artists)
   lives on `/track/<id>` and gets parsed by `_build_enhanced_track`.
   Without the upgrade Deezer-sourced tracks never got multi-artist
   tags even with the right settings on.

   Fix in `core/metadata/source.py`: when source==deezer AND the
   search response had a single artist AND a track_id is available,
   fetch full track details via `get_deezer_client().get_track_details`
   and replace `all_artists` with the upgraded list.

   - One extra API call per affected Deezer track
   - Skipped when search already returned multiple (no-op fast path)
   - Skipped for non-Deezer sources (Spotify/Tidal/iTunes search
     responses already include all artists)
   - Skipped when no track_id is available
   - Defensive try/except: on /track/<id> failure (network error,
     deezer client unavailable), fall through to the search-result
     list — never lose the data we already had

2. Double-append guard hardened with a word-boundary regex.
   Prior commit checked for `"feat." not in title.lower() and "(ft."
   not in title.lower()` — too narrow. Source platforms produce
   wildly different feat-marker conventions: "(feat. X)", "(Feat X)",
   "(FEAT X)", "(Featuring X)", "[feat. X]", "ft. X" (no parens),
   "FT. X", etc. Any of these as the SOURCE title would cause a
   double-append: `"Track (Feat X) (feat. Y)"`.

   Replaced with `re.search(r'\b(?:feat|feat\.|featuring|ft|ft\.)\b',
   title, IGNORECASE)`. Word-boundary regex catches every common
   variant. Substring matches like "Aftermath" containing `ft`
   correctly fall through to the append path (pinned by a regression
   test).

16 new tests (29 total in the file):
- 9 parametrized variants of the double-append guard
- 1 substring guard ("Aftermath")
- 6 Deezer upgrade scenarios (fires when expected, doesn't fire
  for non-Deezer / multi-artist search / no track_id, defensive
  fall-through on failure, no false-positive when /track/<id>
  confirms single artist)

Full pytest 2727 passed.
pull/556/head
Broque Thomas 2 months ago
parent c11a5b7eab
commit d5de724f9b

@ -917,6 +917,35 @@ def extract_source_metadata(context: dict, artist: dict, album_info: dict) -> di
all_artists.append(artist_item)
else:
all_artists.append(str(artist_item))
# Deezer upgrade path: Deezer's `/search` endpoint only returns
# the primary artist for each track. The full contributors
# array (feat., remix collaborators, producers credited as
# artists) lives on `/track/<id>` and gets parsed by
# `_build_enhanced_track`. Without this upgrade Deezer-sourced
# tracks never get multi-artist tags even with the right
# settings on. One extra API call per Deezer-sourced track,
# only when the search response had a single artist (so it's
# a no-op when search already returned multiple).
if (source == "deezer" and len(all_artists) == 1
and source_ids.get("track_id")):
try:
from core.metadata import get_deezer_client
deezer = get_deezer_client()
if deezer:
full = deezer.get_track_details(str(source_ids["track_id"]))
if full and isinstance(full.get("artists"), list) and len(full["artists"]) > 1:
upgraded = [a for a in full["artists"] if a]
if upgraded:
logger.info(
"Metadata: Deezer contributors upgrade — search returned "
"%d artist, /track/<id> returned %d (%s)",
len(all_artists), len(upgraded), upgraded,
)
all_artists = upgraded
except Exception as e:
logger.debug("Deezer contributors upgrade failed: %s", e)
# Store the multi-artist list so the enrichment writer can emit
# proper multi-value ARTIST tags (TPE1 multi-value for ID3,
# "artists" key for Vorbis) when `write_multi_artist` is on.

@ -290,3 +290,181 @@ class TestSettingsCombination:
assert meta["artist"] == "A, B"
assert meta["title"] == "Sample Track"
assert meta["_artists_list"] == ["A", "B"]
# ---------------------------------------------------------------------------
# Deezer-specific: upgrade single-artist search results via /track/<id>
# ---------------------------------------------------------------------------
#
# Deezer's `/search` endpoint only returns the primary artist for each
# track. The full contributors array (feat., remix collaborators,
# producers credited as artists) lives on `/track/<id>`. Reporter said
# their Retag flow worked because it called the per-track endpoint, but
# the initial enrichment used search-result data and missed the
# contributors. The fix: when source==deezer AND search returned only
# one artist AND a track_id is available, fetch the full track details
# and upgrade the artists list.
class TestDeezerContributorsUpgrade:
def test_upgrades_when_deezer_search_returns_single_artist(self):
"""Reporter's exact case: Deezer track with multiple
contributors, search returns just the primary, /track/<id>
returns all 3. Upgrade path fetches the full set."""
from core.metadata import source as src_module
context = {
"original_search_result": {
"title": "Collab Track",
"artists": [{"name": "Primary"}], # Only one — search-response shape
},
"source": "deezer",
}
# track_id resolved via original_search_result.id by get_import_source_ids
context["original_search_result"]["id"] = "12345"
fake_deezer = SimpleNamespace(get_track_details=MagicMock(return_value={
"id": "12345",
"name": "Collab Track",
"artists": ["Primary", "Featured1", "Featured2"], # Full contributors
}))
cfg_overrides = {"metadata_enhancement.tags.artist_separator": "; "}
with patch.object(src_module, "get_config_manager", return_value=_make_cfg(cfg_overrides)), \
patch("core.metadata.get_deezer_client", return_value=fake_deezer):
meta = src_module.extract_source_metadata(context, {"name": "Primary"}, {})
# Upgraded list reaches the multi-value tag
assert meta["_artists_list"] == ["Primary", "Featured1", "Featured2"]
# And the joined ARTIST string respects the separator
assert meta["artist"] == "Primary; Featured1; Featured2"
fake_deezer.get_track_details.assert_called_once_with("12345")
def test_no_upgrade_when_search_already_returned_multiple(self):
"""When search already has multiple artists, skip the upgrade —
no extra API call needed."""
from core.metadata import source as src_module
context = {
"original_search_result": {
"title": "T",
"artists": [{"name": "A"}, {"name": "B"}], # Already multi
},
"source": "deezer",
}
# track_id resolved via original_search_result.id by get_import_source_ids
context["original_search_result"]["id"] = "12345"
fake_deezer = SimpleNamespace(get_track_details=MagicMock())
with patch.object(src_module, "get_config_manager", return_value=_make_cfg()), \
patch("core.metadata.get_deezer_client", return_value=fake_deezer):
meta = src_module.extract_source_metadata(context, {"name": "A"}, {})
assert meta["_artists_list"] == ["A", "B"]
# No upgrade call — search already had what we needed
fake_deezer.get_track_details.assert_not_called()
def test_no_upgrade_for_non_deezer_sources(self):
"""Spotify/iTunes/Tidal already return multi-artist in search,
so the Deezer-specific upgrade path must NOT fire for them.
Otherwise we'd be making redundant API calls."""
from core.metadata import source as src_module
context = {
"original_search_result": {
"title": "T",
"artists": [{"name": "A"}],
},
"source": "spotify",
"source_track_id": "12345",
}
fake_deezer = SimpleNamespace(get_track_details=MagicMock())
with patch.object(src_module, "get_config_manager", return_value=_make_cfg()), \
patch("core.metadata.get_deezer_client", return_value=fake_deezer):
meta = src_module.extract_source_metadata(context, {"name": "A"}, {})
# Single artist preserved, no Deezer upgrade attempted
assert meta["_artists_list"] == ["A"]
fake_deezer.get_track_details.assert_not_called()
def test_upgrade_failure_falls_through_to_search_result(self):
"""Defensive: if /track/<id> fails (network error, deezer
client unavailable), fall through to the search-result list.
Don't lose the single-artist data we already had."""
from core.metadata import source as src_module
context = {
"original_search_result": {
"title": "T",
"artists": [{"name": "Primary"}],
},
"source": "deezer",
}
# track_id resolved via original_search_result.id by get_import_source_ids
context["original_search_result"]["id"] = "12345"
fake_deezer = SimpleNamespace(get_track_details=MagicMock(
side_effect=RuntimeError("network down"),
))
with patch.object(src_module, "get_config_manager", return_value=_make_cfg()), \
patch("core.metadata.get_deezer_client", return_value=fake_deezer):
meta = src_module.extract_source_metadata(context, {"name": "Primary"}, {})
# Search-result list preserved
assert meta["_artists_list"] == ["Primary"]
assert meta["artist"] == "Primary"
def test_upgrade_returns_same_count_no_change(self):
"""Edge: /track/<id> returns the same single artist (track
genuinely has one artist on Deezer too). Should preserve the
list without false-positive upgrade."""
from core.metadata import source as src_module
context = {
"original_search_result": {
"title": "T",
"artists": [{"name": "Solo"}],
},
"source": "deezer",
}
# track_id resolved via original_search_result.id by get_import_source_ids
context["original_search_result"]["id"] = "12345"
fake_deezer = SimpleNamespace(get_track_details=MagicMock(return_value={
"id": "12345",
"artists": ["Solo"], # Same single artist confirmed
}))
with patch.object(src_module, "get_config_manager", return_value=_make_cfg()), \
patch("core.metadata.get_deezer_client", return_value=fake_deezer):
meta = src_module.extract_source_metadata(context, {"name": "Solo"}, {})
assert meta["_artists_list"] == ["Solo"]
def test_no_upgrade_when_no_track_id(self):
"""Edge: source==deezer but no track_id. Can't call
/track/<id> without an id. Don't attempt the upgrade."""
from core.metadata import source as src_module
context = {
"original_search_result": {
"title": "T",
"artists": [{"name": "Primary"}],
},
"source": "deezer",
"source_track_id": "", # Missing
}
fake_deezer = SimpleNamespace(get_track_details=MagicMock())
with patch.object(src_module, "get_config_manager", return_value=_make_cfg()), \
patch("core.metadata.get_deezer_client", return_value=fake_deezer):
meta = src_module.extract_source_metadata(context, {"name": "Primary"}, {})
assert meta["_artists_list"] == ["Primary"]
fake_deezer.get_track_details.assert_not_called()

@ -3416,7 +3416,7 @@ const WHATS_NEW = {
'2.5.1': [
// --- post-release patch work on the 2.5.1 line — entries hidden by _getLatestWhatsNewVersion until the build version bumps ---
{ date: 'Unreleased — 2.5.1 patch work' },
{ title: 'Multi-Artist Tag Settings Now Actually Work (artist_separator + feat_in_title + write_multi_artist)', desc: 'three settings on settings → metadata → tags were partially or completely unimplemented. (1) `write_multi_artist` only worked because of a never-populated `_artists_list` field — `core/metadata/source.py` built `metadata["artist"]` as a hardcoded ", "-joined string but never assigned `metadata["_artists_list"]`, so `core/metadata/enrichment.py:114` always saw an empty list and silently no-op\'d the multi-value tag write. (2) `artist_separator` (default ", ") was referenced in the UI + settings.js save path but ZERO python code read the value — every multi-artist track ended up with hardcoded ", " regardless of what the user picked. (3) `feat_in_title` (when true: pull featured artists into the title as " (feat. X, Y)" and leave only primary in the ARTIST tag — picard convention) had no implementation at all. fix in source.py: populate `_artists_list` from the search response\'s artists array, then build the ARTIST string per the user\'s settings — primary-only when feat_in_title is on (with featured names appended to title; double-append guarded for source titles that already include "feat."), else joined with the configured separator. fix in enrichment.py id3 path: writing TPE1 twice (single-string then list) was overwriting the configured separator. now keeps TPE1 as the display string and writes a separate `TXXX:Artists` frame for the multi-value list (picard convention). vorbis path was already correct (separate "artist" + "artists" keys). known limitation: deezer\'s `/search` endpoint only returns the primary artist — the full contributors array lives on `/track/<id>`. enrichment uses search-result data so deezer-sourced tracks may still get only the primary artist until a later fix wires the per-track contributors fetch into the enrichment flow. spotify, tidal, itunes search responses include all artists so they work now. 13 new tests pin: `_artists_list` populated for multi/single/no-artist cases, separator drives ARTIST string (default + custom), single-artist case unaffected by either setting, feat_in_title pulls featured to title + leaves primary in ARTIST, feat_in_title no-op for single artist, double-append guard for source titles containing "feat.", combined-settings precedence (feat_in_title wins over separator for ARTIST string but `_artists_list` carries everyone for the multi-value tag).', page: 'settings' },
{ title: 'Multi-Artist Tag Settings Now Actually Work (artist_separator + feat_in_title + write_multi_artist)', desc: 'three settings on settings → metadata → tags were partially or completely unimplemented. (1) `write_multi_artist` only worked because of a never-populated `_artists_list` field — `core/metadata/source.py` built `metadata["artist"]` as a hardcoded ", "-joined string but never assigned `metadata["_artists_list"]`, so `core/metadata/enrichment.py:114` always saw an empty list and silently no-op\'d the multi-value tag write. (2) `artist_separator` (default ", ") was referenced in the UI + settings.js save path but ZERO python code read the value — every multi-artist track ended up with hardcoded ", " regardless of what the user picked. (3) `feat_in_title` (when true: pull featured artists into the title as " (feat. X, Y)" and leave only primary in the ARTIST tag — picard convention) had no implementation at all. fix in source.py: populate `_artists_list` from the search response\'s artists array, then build the ARTIST string per the user\'s settings — primary-only when feat_in_title is on (with featured names appended to title; double-append guarded for source titles that already include "feat."), else joined with the configured separator. fix in enrichment.py id3 path: writing TPE1 twice (single-string then list) was overwriting the configured separator. now keeps TPE1 as the display string and writes a separate `TXXX:Artists` frame for the multi-value list (picard convention). vorbis path was already correct (separate "artist" + "artists" keys). deezer-specific upgrade path: deezer\'s `/search` endpoint only returns the primary artist — full contributors live on `/track/<id>`. when source==deezer AND the search response had a single artist AND a track_id is available, enrichment now fetches the per-track endpoint and upgrades the artists list before tag-write. one extra API call per affected deezer track (skipped when search already returned multiple). spotify, tidal, itunes search responses already include all artists so they\'re unaffected. 29 new tests pin: `_artists_list` populated for multi/single/no-artist cases, separator drives ARTIST string (default + custom), single-artist case unaffected by either setting, feat_in_title pulls featured to title + leaves primary in ARTIST, feat_in_title no-op for single artist, double-append guard recognizes 9 source-title variants ("(feat. X)", "(Feat. X)", "(FEAT X)", "(feat X)", "(Featuring X)", "[feat. X]", "ft. X", "(ft X)", "FT. X"), word-boundary regex doesn\'t false-match substrings ("Aftermath" still gets the append), combined-settings precedence (feat_in_title wins over separator for ARTIST string but `_artists_list` carries everyone for the multi-value tag), deezer upgrade fires only when search returned single artist + track_id available, no upgrade for non-deezer sources, upgrade failure falls through to search-result list, no false-positive when /track/<id> confirms single artist.', page: 'settings' },
{ title: 'AudioDB Enrichment: Track Worker No Longer Stuck In Infinite Retry Loop', desc: 'github issue #553: audiodb track enrichment "stuck" — constant requests, no progress, only error log was a 10s read-timeout from `lookup_track_by_id` repeating against the same track. trace: when an entity already has `audiodb_id` populated (from manual match or earlier scan) but `audiodb_match_status` is NULL, the worker tries a direct ID lookup. if it fails (returns None on timeout — audiodb\'s `track.php` endpoint is slow, 10s timeouts common), the prior code logged "preserving manual match" and returned WITHOUT marking status. row stayed NULL → queue picked it up next tick → tried direct lookup → timed out → returned → infinite loop. fix: (1) when direct lookup fails (None or exception), mark `audiodb_match_status="error"` so the queue\'s NULL-status filter stops re-picking the row on every tick. preserves the existing `audiodb_id` (no fallback to name-search guess that would overwrite a manual match). (2) extended the retry-after-cutoff queue priorities (4/5/6) to include `\'error\'` rows alongside `\'not_found\'` — same `retry_days=30` window. transient audiodb outages still recover automatically; permanently-broken IDs eventually get re-attempted once a month. only triggered for entities in the inconsistent state of `audiodb_id` set + `match_status` NULL — happy path and already-matched/already-not-found rows unchanged. 5 new tests pin: lookup-returns-none marks error (no infinite loop), lookup-raises-exception marks error, lookup-success preserves happy path, error-row-past-cutoff gets re-picked, error-row-within-cutoff stays skipped.', page: 'tools' },
{ title: 'Docker: Container No Longer Restart-Loops On Bind-Mounted Staging Folder', desc: 'after pulling latest, the container refused to start. logs showed `mkdir: cannot create directory \'/app/Staging\': Permission denied`. cause traced back to the 2026-05-08 image-bloat fix (commit 70e1750) which changed the Dockerfile from `chown -R /app` to a scoped chown on specific subdirs (the recursive chown was duplicating the whole /app tree into a new layer and ballooning image size). side effect: `/app` itself went from soulsync:soulsync to root:root (Docker WORKDIR default), AND `/app/Staging` was left out of both the Dockerfile mkdir + chown list and only created at runtime by the entrypoint script. on rootless Docker / Podman where in-container "root" maps to a host UID, the entrypoint mkdir on `/app/Staging` could fail with EACCES depending on the bind-mount path\'s host ownership — `set -e` then aborted the script and the container restart-looped. fix: (1) Dockerfile now pre-bakes `/app/Staging` into the image alongside the other runtime mount points (mkdir + scoped chown) so the entrypoint mkdir is a guaranteed no-op even when bind-mount perms are weird. (2) entrypoint mkdir + chown both have `|| true` now so any future bind-mount permission quirk surfaces as a log line, not a restart loop. (3) new writability audit at the end of entrypoint setup — `gosu soulsync test -w` on every bind-mountable dir, logs a loud warning with the exact `chown` command to run on the host if perms mismatch the configured PUID/PGID. catches the underlying bind-mount perm issue that the restart-loop fix would otherwise mask (container starts, but auto-import / downloads write into unwritable dirs and fail silently). zero behavior change for users whose containers were already starting fine; defensive against the rootless/podman config that broke after the image-bloat refactor.', page: 'tools' },
{ title: 'Your Albums: Download Missing Now Opens Selectable Modal + Tidal Resolution', desc: 'two-part fix to the your albums "download missing" flow on discover. (1) replaced the broken per-album direct-download loop with a selectable-grid modal mirroring the library page\'s download discography flow. clicking the download button now opens a checkbox grid showing every missing album (cover, title, artist, year, track count, source) with select all / deselect all controls. user picks what they actually want, hits "add to wishlist", each album\'s tracks get resolved + queued through the existing wishlist auto-download processor. matches the discography flow\'s per-album ndjson progress stream so users see ✓/✗ per album as it processes. previous loop fired direct downloads via `openDownloadMissingModalForYouTube` which the user reported as silently failing — "queuing 2/2" toast with no actual transfer activity. wishlist is the right destination for batch missing-album adds since it already handles retry, source fallback, dedup, and rate limiting. (2) added tidal source resolution. backend `/api/discover/album/<source>/<album_id>` got a new `tidal` source branch that calls a NEW `tidal_client.get_album_tracks(album_id)` method — two-phase fetch (cursor-walk `/v2/albums/<id>/relationships/items?include=items` for track refs + position metadata, batch-hydrate via existing `_get_tracks_batch` for artist/album names). track refs carry `meta.trackNumber` + `meta.volumeNumber` so multi-disc compilations render in album order. inline `?include=coverArt` lookup pulls the album cover too. single-album click flow (`openYourAlbumDownload`) gets `tidal_album_id` added to `trySources`. virtual-id generation includes tidal_album_id for stable identifiers. backend reuses the existing `/api/artist/<id>/download-discography` endpoint — its url artist_id param is functionally unused (per-album payload carries everything), so the modal posts with placeholder `your-albums` and gets multi-artist resolution for free. 10 new tests pin the tidal album-tracks method: single-page walk + hydration, multi-page cursor chain, multi-disc sort order, limit short-circuit, no-token short-circuit, http error returns empty, 429 propagates to rate_limited decorator, forward-compat type filter, partial-batch failure containment, empty-album short-circuit.', page: 'discover' },

Loading…
Cancel
Save