Defensive followup to the relevance fix. Deezer's advanced search
syntax (`artist:"X"`) is documented as substring match, but in
practice it's brittle on artist name variants ("Foreigner [US]",
"The Foreigner") and on tracks indexed under non-canonical title
spellings. When the advanced query returns nothing, we'd previously
land at "No matches" — a regression vs. pre-fix behaviour where
free-text would have returned a less-relevant but non-empty set.
Fix: when the advanced query returns 0 results AND the caller used
field-scoped kwargs, fall back to a free-text join of the same
kwargs and re-query. Caller-side rerank still tightens whatever the
fallback returns, so the worst-case post-fix behaviour is the
pre-fix behaviour — never strictly worse.
Pulled the cache + parse + store dance into a private helper
(`_search_tracks_with_query`) so the orchestration can call it
twice (advanced → fallback) without code duplication. Single API
call when the advanced query has results — no wasted requests.
Diagnostic logger.debug fires when the fallback triggers so we can
see in production whether it's happening (and to which queries).
# Tests added (4)
- `test_falls_back_to_free_text_when_advanced_empty` — advanced
query returns 0, free-text returns hits; client returns the
free-text hits + both API calls fire.
- `test_no_fallback_when_advanced_query_has_results` — single hit
on advanced query → no second API call.
- `test_no_fallback_when_legacy_free_text_call` — legacy callers
already exhausted the only path; empty result is final.
- `test_no_fallback_when_query_unchanged` — empty kwargs path
doesn't trigger the fallback branch (used_advanced=False).
# Existing tests updated
The 4 prior `TestSearchTracksQueryWiring` + `TestSearchTracksCacheKey`
tests were stubbing `_api_get` to return empty `{'data': []}` and
asserting `assert_called_once`. With the new fallback, those stubs
trigger a second API call and the assertions break — even though
the FIRST call construction is what the tests cared about. Updated
the stubs to return one fake hit so the fallback doesn't fire, and
switched to `call_args_list[0]` for first-call inspection.
# Verification
- 18/18 deezer query tests pass (14 prior + 4 new)
- 2445 full suite passes (+4 from prior commit)
- Ruff clean
// --- post-release patch work on the 2.4.3 line — entries hidden by _getLatestWhatsNewVersion until the build version bumps ---
{date:'Unreleased — 2.4.3 patch work'},
{title:'Search For Match: No More Karaoke / Cover / "Originally Performed By" Junk At The Top',desc:'github issue #534 (radoslav-orlov): typing "dirty white boy" + "foreigner" into the import-modal "search for match" dialog returned re-recordings, karaoke versions, "originally performed by" compilations, and tribute-band cuts ranked above the actual foreigner studio recording. user had to scroll past 5+ junk results before finding the canonical track. cause: deezer endpoint joined track + artist into a single free-text string and passed that to deezer\'s `q` param — which fuzzy-matches across title / lyrics / artist / album / contributors and orders by global popularity, so anything that appears across many compilations outranks the canonical track. fix has two layers. (1) deezer client now supports field-scoped kwargs (`track="X" artist="Y"`) which build deezer\'s advanced search syntax `track:"X" artist:"Y"` — massively tighter relevance because each term matches the right field instead of fuzzy-matching everywhere. backward compat preserved: legacy free-text callers still work. (2) new `core/metadata/relevance.py` helper reranks results locally with cover/karaoke/tribute/re-recorded penalties + exact-artist-match boost + variant-tag (live/acoustic/remix) penalty (skipped when user explicitly typed the variant). applied at the deezer + itunes + spotify search-tracks endpoints so all three sources behave consistently from the user\'s perspective. variant penalty only fires when user did NOT ask for the variant — searching "track (live)" still ranks live versions correctly. 71 new tests pin every component: pattern detection (10 cover patterns, 8 variant patterns, 3 fields), score composition (real-cut > karaoke > re-recorded), the issue #534 screenshot reproduced as a regression test, deezer client query construction, three search-modal endpoints end-to-end.',page:'import'},
{title:'Search For Match: No More Karaoke / Cover / "Originally Performed By" Junk At The Top',desc:'github issue #534 (radoslav-orlov): typing "dirty white boy" + "foreigner" into the import-modal "search for match" dialog returned re-recordings, karaoke versions, "originally performed by" compilations, and tribute-band cuts ranked above the actual foreigner studio recording. user had to scroll past 5+ junk results before finding the canonical track. cause: deezer endpoint joined track + artist into a single free-text string and passed that to deezer\'s `q` param — which fuzzy-matches across title / lyrics / artist / album / contributors and orders by global popularity, so anything that appears across many compilations outranks the canonical track. fix has three layers. (1) deezer client now supports field-scoped kwargs (`track="X" artist="Y"`) which build deezer\'s advanced search syntax `track:"X" artist:"Y"` — massively tighter relevance because each term matches the right field instead of fuzzy-matching everywhere. backward compat preserved: legacy free-text callers still work. (2) new `core/metadata/relevance.py` helper reranks results locally with cover/karaoke/tribute/re-recorded penalties + exact-artist-match boost + variant-tag (live/acoustic/remix) penalty (skipped when user explicitly typed the variant). applied at the deezer + itunes + spotify search-tracks endpoints so all three sources behave consistently from the user\'s perspective. variant penalty only fires when user did NOT ask for the variant — searching "track (live)" still ranks live versions correctly. (3) safety net: when deezer\'s advanced query returns 0 results (sometimes happens on artist name variants like "foreigner [us]" or non-canonical title spellings), client falls back to free-text search so the user never sees an empty result list when the API would have returned the prior less-relevant set. caller-side rerank still tightens whatever the fallback returns. 75 new tests pin every component: pattern detection (10 cover patterns, 8 variant patterns, 3 fields), score composition (real-cut > karaoke > re-recorded), the issue #534 screenshot reproduced as a regression test, deezer client query construction + free-text fallback path, three search-modal endpoints end-to-end.',page:'import'},
{title:'Auto-Import: Album Duration Is Album Total + Re-Imports Fill Metadata Gaps',desc:'two more parity gaps closed in the soulsync standalone library write path. (1) album row\'s `duration` column was being written with the FIRST imported track\'s duration instead of the album total — pre-existing bug that survived the prior parity commit. soulsync_client deep scan computes `sum(t.duration for t in self._tracks)` for each album; auto-import now mirrors that by computing the sum across every matched track in the worker and threading it through context to the album INSERT. (2) `record_soulsync_library_entry` was insert-only on artists + albums — once a row existed (matched by id OR name fallback), subsequent imports of the same artist or album skipped completely. meant: artist genres / thumb / source-id reflected ONLY whatever the FIRST imported album supplied, never refreshing as more albums by that artist landed (ten more deezer/spotify imports later, artist row still had whatever the first random import wrote). new conservative UPDATE path: when an existing row matches, fill ONLY the columns whose current value is NULL or empty — never overwrites populated values. protects manual edits + enrichment-worker writes the same way scanner UPDATEs preserve enrichment columns. f-string column names are validated against an allowlist (`_SOULSYNC_FILLABLE_COLUMNS`) before interpolation — defensive against accidental misuse adding columns without an allowlist update. 4 new tests pin: album duration uses sum not single-track, re-import fills empty thumb + genres on existing artist row, re-import does NOT clobber populated values, re-import fills empty source-id columns when later import has them.',page:'import'},
{title:'Auto-Import: Genre Tags Land On The Artists Row + ISRC/MBID Type Hardening',desc:'small followup to the standalone-library parity commit. (1) auto-import now reads the GENRE tag from each matched audio file (mutagen easy mode, supports flac / mp3 / m4a) and aggregates the deduped set across the album onto the new artists row\'s genres column. matches what soulsync_client._scan_transfer would have written if you\'d done a fresh deep scan after the import — your imported artists no longer feel hollow compared to plex / jellyfin / navidrome scans. dedup is case-insensitive but preserves original casing + insertion order so the json column reads naturally ("Hip-Hop, Rap, Trap" not "hip-hop, rap, trap"). (2) defensive `str()` cast on the worker\'s isrc + mbid extraction. metadata source clients all coerce to string today via `_build_album_track_entry`, but if a future source ever returned int / None for either id the side-effects layer would crash on `.strip()`. cheap insurance. 3 new tests pin: genre aggregation produces deduped insertion-order list, empty when no GENRE tags, isrc/mbid hostile-type input (int, None) coerced to safe string before propagation.',page:'import'},
{title:'Auto-Import: SoulSync Standalone Library Now Gets Full Server-Quality Rows',desc:'soulsync standalone is meant to be a full replacement for plex / jellyfin / navidrome — the imported tracks should land in the db with the same field richness a media server scan would write. they weren\'t. the auto-import context dict (the payload it handed to the post-process pipeline) had no `source` field anywhere, so `record_soulsync_library_entry` couldn\'t pick the right source-id column on the new tracks/albums/artists rows. result: every auto-imported track landed with NULL on `spotify_track_id` / `deezer_id` / `itunes_track_id` / etc. — watchlist scans (which match by stable source IDs) couldn\'t recognise these tracks as already in library and would re-download them on the next pass. fixed by threading `identification[\'source\']` onto the top-level context, plus per-recording IDs (`isrc`, `musicbrainz_recording_id`) onto track_info so picard-tagged libraries land their per-recording metadata directly. also extracted the artist source ID from the metadata source\'s search response (`_search_metadata_source` and `_search_single_track` now pull `best_result.artists[0][\'id\']`) and threaded it through identification → context → standalone library write, so the artists row finally gets its source-ID column populated instead of staying NULL forever. also added `_download_username=\'auto_import\'` so library history shows "Auto-Import" instead of mislabeling every staging import as "Soulseek" (the fallback default), and an "auto_import" → "Auto-Import" mapping in the source-map dicts at side_effects.py to honour it. record_soulsync_library_entry tracks INSERT now also writes `musicbrainz_recording_id` + `isrc` columns directly (matches the navidrome scanner write path). 17 new tests pin: auto-import context carries source for every metadata source (spotify/deezer/itunes/discogs), `_download_username=auto_import`, isrc + mbid pass-through to track_info, album-id back-reference on track_info, artist source-id flows from identification → context (and not from album_id, the prior copy-paste bug), `_search_metadata_source` extracts artist_id from search response, soulsync library writes mbid + isrc to dedicated columns, deezer source maps to deezer_id column, library history + provenance use Auto-Import / auto_import labels.',page:'import'},