mirror of https://github.com/Nezreka/SoulSync.git
dev
main
fix/quarantine-source-dedup
release/2.5.3
fix/disable-beatport-features
johnbaumb-discover-redesign
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
2.4.0
2.4.1
2.4.2
2.5.0
2.5.1
2.5.2
2.5.3
2.5.4
2.5.5
2.5.6
2.5.7
2.5.9
2.6.0
2.6.1
v0.65
${ noResults }
2 Commits (dev)
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
402d851cac |
Deezer search: drop advanced-syntax at endpoint, free-text + rerank wins
Live-API verification revealed advanced-syntax queries hurt more than they help on this endpoint. Switching the import-modal Deezer search back to free-text + local rerank. # What live testing showed Hit Deezer's public API with both query forms for the issue #534 case (`Dirty White Boy` + `Foreigner`): **Free-text (`q=Dirty White Boy Foreigner`):** - Returns 21 results - Real Foreigner Head Games studio cut at #1 - Live versions at #2-10 - Karaoke / cover variants at #11-15 **Advanced (`q=track:"Dirty White Boy" artist:"Foreigner"`):** - Returns 12 results - "(2008 Remaster)" at #1 — canonical Head Games cut MISSING from top 8 entirely - Live + alt-album versions follow Advanced syntax DOES filter karaoke at the API level (none in the 12-result set vs. 5 at positions 11-15 in free-text), but it has its own ranking bias that surfaces remasters / "Best Of" cuts ahead of the canonical recording. Net regression for the user- facing goal. # Fix 1. Endpoint reverts to free-text query with local rerank applied. 2. Local rerank gains "remaster" / "remastered" / "reissue" patterns under VARIANT_TAG_PATTERNS (soft 0.4× penalty — user may want them but they shouldn't outrank the original). 3. Client kwarg support (`track=` / `artist=` / `album=`) preserved for future opt-in callers (e.g. exact-match flows where API- level filtering matters more than ranking). # Verified end-to-end against live Deezer API Re-ran the exact #534 case through the live API + new rerank. Top 15 results post-rerank: 1. Dirty White Boy — Foreigner — Head Games ← REAL CUT AT TOP 2-10. Various Live versions 11-15. Karaoke / cover / tribute variants ← BURIED Real Foreigner Head Games studio cut at #1, exactly the user's ask. # Tests - `test_relevance.py` — variant tag patterns extended; existing tests still pass (50 tests). - `test_search_match_endpoints.py::test_joins_track_and_artist_into_free_text_query` — replaces `test_passes_track_and_artist_as_kwargs`; verifies endpoint sends free-text join, NOT field-scoped kwargs (the prior test asserted the wrong direction now). - Karaoke-burying assertion at the endpoint still pins the user-visible behaviour. - Client kwarg path tests untouched (still pin advanced-syntax construction for future opt-in callers). # Verification - 75 relevance + endpoint + query tests pass - 2445 full suite passes - Ruff clean - Live Deezer API shows real cut at #1 post-rerank |
2 weeks ago |
|
|
1cc37081a6 |
Fix Deezer search relevance — issue #534
# Background User reported (#534) that the import-modal "Search for Match" dialog returned irrelevant results when Deezer was the metadata source. Searching `Dirty White Boy` + `Foreigner` returned 5+ karaoke / "originally performed by" / "in the style of" / "re-recorded" / tribute-band results ranked above the actual Foreigner studio cut from Head Games. User had to scroll past the junk every time, or fall back to iTunes search which is much slower. # Root cause — two layers 1. **Endpoint joined `track + artist` into free-text query.** `/api/deezer/search_tracks` was passing `q=Dirty White Boy Foreigner` to Deezer's `/search/track` API. Deezer fuzzy-matches that string across title / lyrics / artist / album / contributors and orders by global popularity — anything that appears across many compilations outranks the canonical recording. 2. **No local rerank.** None of the search-modal endpoints applied any post-filtering. Deezer's API order shipped straight to the user. # Fix — same architectural shape Cin would build ## Layer 1: field-scoped query at the client boundary `core/deezer_client.py::search_tracks()` now accepts optional `track`, `artist`, `album` kwargs. When provided, builds Deezer's advanced search syntax: `q=track:"X" artist:"Y" album:"Z"`. Massive relevance improvement because each term matches the right field instead of fuzzy-matching everywhere. Backward compat preserved: legacy free-text `query=` callers still work unchanged. Field-scoped path takes precedence when both are provided. Empty input fast-fails without an API call. Embedded double-quotes stripped (Deezer's syntax has no escape mechanism). ## Layer 2: provider-neutral relevance reranker New `core/metadata/relevance.py` module — pure-function rerank over the canonical `Track` dataclass. Composable scoring: - **Cover/karaoke patterns** (multiplier 0.05, effectively buries): matches "karaoke", "originally performed by", "in the style of", "made famous by", "tribute", "vocal version", "backing track", "cover version", "re-recorded", "cover by", etc. across title, album, AND artist fields. Catches the screenshot's exact junk: artist credits like "Pop Music Workshop" / "The Karaoke Channel" / "Foreigner Tribute Band". - **Variant tags** (multiplier 0.4): live / acoustic / demo / instrumental / remix / radio edit / club mix etc. — softer penalty since the user MAY want them. Skipped entirely when the expected_title contains the same tag (so searching "Track (Live)" still ranks Live versions first). - **Exact artist boost** (multiplier 1.5): primary artist exactly matches expected_artist after normalisation. Single strongest signal for "this is the canonical recording". - **Title + artist similarity** via SequenceMatcher (parentheticals + punctuation stripped before comparison). - **Album-type weighting**: album=1.0 > single/ep=0.85 > compilation=0.7. Compilations are more likely tribute / karaoke repackages. Each component is a standalone function so tests pin them individually without standing up the full pipeline. ## Wired at three search-modal endpoints - `/api/deezer/search_tracks` — uses both layers (field-scoped query + rerank). - `/api/itunes/search_tracks` — uses rerank only (iTunes API has no advanced-syntax search, but karaoke / cover variants still leak through and need the local penalty). - `/api/spotify/search_tracks` — already builds field-scoped `track:X artist:Y` query; rerank added as the consistency safety net so all three sources behave the same from the user's perspective. Other Deezer call sites (matching engine, watchlist scanner, auto-import single-track ID) deliberately not touched in this PR — they have their own elaborate scoring pipelines tuned to their specific contexts and aren't surfacing the user-reported issue. Per Cin: "don't refactor beyond what the task requires." # Tests 71 new tests across 3 files: - `tests/metadata/test_relevance.py` (50 tests) — every scoring component pinned individually + the issue #534 screenshot reproduced as a regression test (real Foreigner cut wins after rerank, karaoke variants drop to bottom). - `tests/metadata/test_deezer_search_query.py` (14 tests) — advanced-syntax query construction, field-scoped wiring at the client boundary, free-text path unchanged, kwargs win when ambiguous, limit clamping, cache key consistency. - `tests/imports/test_search_match_endpoints.py` (7 tests) — end-to-end through Flask test client: Deezer endpoint passes kwargs not joined query; karaoke buried at bottom for all three sources; legacy query param still works without rerank. # Verification - 2441 full suite passes (+71 from baseline 2370) - 0 failures (the prior watchdog flake fix held) - Ruff clean across all changed files - JS parses clean (`node -c webui/static/helper.js`) # Architectural standards followed - **Logic at the right boundary.** Query construction lives in the client (every caller benefits from one change). Rerank lives in a neutral module (`core/metadata/relevance.py`) over the canonical `Track` dataclass — works for any source, not Deezer- specific. - **Explicit > implicit.** Every scoring rule has its own named function. Pattern tables are module-level constants tests can introspect. - **Scope discipline.** Audited every Deezer search call site; fixed the user-reported one + the consistent siblings. Did NOT speculatively normalise every Deezer call across the codebase. - **Backward compat.** Free-text `query=` callers untouched. Kwargs added to existing client method signature with safe defaults. - **Tests pin contract at correct boundary.** Pure-function rerank tests don't mock anything; client-query tests stub at `_api_get`; endpoint tests run through the real Flask app. |
2 weeks ago |