SoulSync

Commit Graph

Author	SHA1	Message	Date
BoulderBadgeDad	b202c176f7	Cover-art sources: skip low-res art (min-resolution guard) + max-res iTunes Follow-up to the preferred-art feature. Real test runs showed a source could win on priority while handing back a small cover: Cover Art Archive is volunteer-uploaded with no size floor, so CAA-first gave a 599x531 (Taylor Swift) and a 600x600 (Kendrick GNX) -- front-1200 only caps the max, so a ~600px upload stays ~600px -- and Deezer/iTunes lower in the order never got a turn. Fix: - Minimum-resolution guard: artwork._min_size_art_validator builds the resolver's validate hook -- it fetches each candidate, caches the bytes (so the winner isn't fetched twice), and accepts art only when its shortest side >= metadata_enhancement.min_art_size (default 1000px; 0 disables). Art that's too small is a miss, so the resolver falls through to the next source instead of winning on priority. Unmeasurable images are accepted (don't over-reject; fallback is still today's art). Wired into both embed_album_art_metadata and download_cover_art. - iTunes art upgraded to /3000x3000bb/ (was the 600px default) so it contributes high-res when it wins. - select_preferred_art_url gains a validate passthrough to the resolver. - config default metadata_enhancement.min_art_size: 1000. Effect: with an order like caa > deezer > spotify > itunes, a ~600px CAA upload is now skipped and Deezer's ~1900px wins -- consistent big art. (Spotify art often maxes ~640px, so it's skipped at the 1000 floor in favor of bigger sources; lower min_art_size to ~640 to allow it.) Tests: tests/metadata/test_art_min_size.py (6 -- incl. the real 599x531 and 600x600 cases, shortest-side logic, unmeasurable-accept, no-bytes-reject, 0-disables) + iTunes max-res upgrade test. Full metadata suite green (617).	3 weeks ago
BoulderBadgeDad	6bc2836f47	Feature: preferred album-art source selection (opt-in, ordered, with fallback) Lets users pick which providers' cover art to use and in what priority, generalizing the single prefer_caa_art toggle into an ordered, mix-and-match list (Sokhi's request). Fully opt-in: default album_art_order is [], so every existing install is byte-for-byte unchanged until the user enables sources. How it works: - Per album, walk the user's ordered sources top-to-bottom; the first source that actually has THIS album's cover wins. A miss falls through to the next; if all miss, the download's own art is kept (today's default). The worst case is always exactly the cover you'd get today -- never wrong art, never an error into the download. - Connection-gated: a source is only tried when the user is connected to it (free sources CAA/Deezer/iTunes/AudioDB always; Spotify only when authenticated). Tidal/Qobuz/HiFi deferred (cover-URL construction + no clean core accessor -- not shipping unverified extraction). - Album-match validated: a source's art is used only when the album it returns matches the requested artist+album (significant-token subset, tolerant of Deluxe/Remastered/articles/feat./multi-artist). A loose top search hit for a different record is treated as a miss -> guarantees no wrong-album art. - The list supersedes the legacy prefer_caa_art toggle: when album_art_order is non-empty it is the sole authority (add 'caa' to the list to use Cover Art Archive), and prefer_caa_art is neutralized for both the embedded-tag art and cover.jpg paths. With an empty list, prefer_caa_art behaves exactly as before. Implementation: - core/metadata/art_sources.py: pure resolver -- effective_art_order (config + legacy back-compat) and resolve_cover_art (ordered walk + fallback, exception-safe per source). No network/config/DB; fully unit-testable. - core/metadata/art_lookup.py: availability gating, per-source lookups against existing clients (Deezer/iTunes/AudioDB/Spotify search + CAA via MBID), album-match validation, per-album caching, and select_preferred_art_url -- the single gate the pipeline calls (no-op unless an explicit list is set). - core/metadata/artwork.py: wired into embed_album_art_metadata and download_cover_art, gated so no configured list == current behavior. - web_server.py: GET /api/metadata/art-sources (connected sources only). - config/settings.py: default album_art_order: []. - webui (index.html + settings.js): reorderable list in Core Features reusing the hybrid-source-list pattern + real service logos (with emoji fallback); load/save wired through the existing metadata_enhancement settings flow. loadArtSourceOrder populates the saved order synchronously (filtered to known sources, not availability) so a save before the availability fetch resolves, or a temporarily-disconnected source, can never wipe the saved order. Tests: 40 unit/seam tests (resolver ordering/fallback/back-compat, availability, per-source extraction, album-match validation incl. wrong-album/wrong-artist rejection, caching, exception-safety, the off-by-default gate). Full metadata suite still green (610 passed) -- the gated integration changes nothing when no list is configured. Note: the settings UI (DOM-heavy, not unit-testable in the JS harness) and the live per-source art-fetch quality are validated by manual testing.	3 weeks ago
BoulderBadgeDad	163de6c146	MusicBrainz manual search: field-scope the artist in non-strict mode (#754 ) The user-facing Search-for-Match / Fix popup runs non-strict MB searches. That path built a bare "track artist" query with no field scoping, so the artist was just a free fuzzy term — covers and karaoke whose TITLES contained the artist name outranked the canonical recording. Reproduced live: searching "Say You Will" / Foreigner returned cover artists with Foreigner absent, and "Sweet Child O Mine" / Guns N Roses returned only covers (Presnyakov, PMJ…), never the Guns N' Roses original. Keep the track/album side loose (no phrase quotes → diacritic + bracketed- suffix recall, the reason non-strict exists) but field-scope the artist as artist:(...) so it constrains. The artist value is Lucene-escaped via _escape_lucene() — without it, names like "Sunn O)))" or "Anthony Green (Saosin)" would close the artist:( group early (returning unrelated artists) or break the query (zero results). Same fix applied to search_release. Verified against the live MB API: both reporter queries now return the real artist top-to-bottom; diacritic recall is preserved (artist:(Bjork) folds to Björk); and paren/?/!-laden artist names produce valid, balanced queries. Tests pin the constructed query string (no network): non-strict scopes and escapes the artist while keeping the track loose/unquoted; strict path unchanged; plus _escape_lucene unit coverage.	3 weeks ago
BoulderBadgeDad	3b5a5518a6	Cache cap test: exercise the REAL _run_maintenance_write, not a stub Self-review caught a test-fidelity hole: the temp cache overrode _run_maintenance_write with a simplified version, so evict_over_capacity was tested against the stub's plumbing, not production's (retry + connection handling). Removed the override — _get_db is now the only injected seam, so the test runs the genuine code path. Differential-verified the LRU assertions are real: flipping ORDER BY ASC->DESC makes them fail. 8/8 pass; ruff clean.	3 weeks ago
BoulderBadgeDad	bb2241498f	Metadata cache: hard LRU row cap to stop unbounded growth (7.6GB incident) Investigation (not assumption): the cache's TTL eviction + junk cleanup ARE correct and DO run automatically every 6h (CacheEvictorJob, auto_fix=True). The real gap is there's NO SIZE CEILING — TTL-only eviction means 'how big can it get' = 'however much you fetch within the 30-day window', so heavy discovery/enrichment legitimately grew metadata_cache_entities to ~1.8M rows / 7.6 GB, bloating the main DB (a factor in the corruption incident). Fix — add a bounded LRU cap: - entities_to_evict_for_capacity(total, max_rows): pure decision fn (cap<=0 disables), unit-testable like core.db_integrity.prune_backups. - MetadataCache.evict_over_capacity(): deletes the least-recently-ACCESSED rows (uses the already-stored last_accessed_at; NULL = never-touched = evicted first) down to the ceiling. Default 250k rows, tunable. - Wired as Phase 5 of CacheEvictorJob — runs LAST, after TTL/junk/orphan/null cleanup, so it only trims a still-oversized HEALTHY cache. Verified safe to bound/wipe: audited every cache reader (get_entity/ get_entities_batch/get_search_results/get_entity_detail/browse) — all degrade to None/[]/empty on miss, treated as 'go fetch'. Nothing depends on a row existing, so eviction can't break callers. Tests: tests/metadata/test_cache_capacity_eviction.py (8) — pure-fn coverage + real temp-DB proof that it drops the LRU rows specifically (not arbitrary) and NULL-access rows go first. 18 adjacent cache tests still green; ruff clean. Follow-ups (separate phases, scoped): (2) move the cache to its own bounded metadata_cache.sqlite3 (no JOINs to library tables — confirmed clean to split; invalidate-and-rebuild rather than migrate the 7.6GB), (3) kill the raw_json + 22-extracted-column double storage.	3 weeks ago
BoulderBadgeDad	560156abee	Fix: import overwrites album-artist tag to "Unknown Artist" (#735 ) Reported by CubeComming: importing media keeps the track artist correct (e.g. Billie Eilish) but changes the album-artist tag ("Albuminterpret") to "Unknown Artist", breaking grouping in the media server. Cause: in extract_source_metadata (core/metadata/source.py), album_artist is seeded from the resolved track artist, then overridden by the album CONTEXT's first artist. When the album lookup comes back unresolved, that first artist is the literal "Unknown Artist" placeholder — which is truthy, so it clobbered the real artist. Fix: treat "Unknown Artist" (and empty) as a non-value — only let the album context override the album_artist when it names a real artist. A genuine album artist (e.g. "Various Artists") still overrides as before. Tests: tests/metadata/test_album_artist_unknown.py — placeholder doesn't clobber, real album artist still used, no-album-context falls back to track artist, empty doesn't clobber. (Pre-existing test_album_mbid_cache.py failures are an unrelated sandbox DB disk-I/O issue.)	4 weeks ago
BoulderBadgeDad	abdea631a7	HiFi/MB cover art: use CAA 1200px thumbnail, not the flaky /front original Follow-up to the album-art resolution fix. That change upgraded MusicBrainz Cover Art Archive thumbnails (/front-250) to the bare /front original — but /front redirects to archive.org, which is unreliable: probing release-group covers showed intermittent HTTP 500s (same URL 500s one second, serves the next) and multi-MB originals (2.9 MB seen). The result was the user-reported flakiness: cover art that "sometimes works, sometimes shows nothing", and a huge image embedded into every track when it did work. The sized thumbnails (/front-250, -500, -1200) are served by CAA's own CDN, not the archive.org redirect — which is why /front-250 (240p) was always reliable. Upgrade to /front-1200 instead: 1200x1200 is a massive jump from 240p, reliably CDN-served, and a sane ~40 KB instead of multi-MB. Applied in all three CAA spots for consistency: the _upgrade_art_url helper (embed + cover.jpg paths) and both prefer_caa ("CCA") blocks, which fetched the bare /front directly with no fallback — so CCA-on users hit the same flakiness. _fetch_art_bytes still falls back to the original /front-250 if /front-1200 is ever refused. Tests updated to assert the 1200px target, idempotency, and that the bare /front original is intentionally left untouched.	4 weeks ago
BoulderBadgeDad	c9ad4f496f	Embed highest-resolution album art across all art paths User report: embedded album art came out ~600x600 while the cover.jpg in the folder was high-res. The cover.jpg path upgraded the source CDN URL to its highest resolution, but the tag-embed path fetched the raw URL — so iTunes art embedded at its 600x600 default, Spotify at 640, Deezer at 1000. The "Write Tags to File" retag path had the same gap (Deezer-only upgrade), and MusicBrainz art was worse still: every Cover Art Archive URL is built as the /front-250 thumbnail, so MB-sourced downloads embedded 250x250. Factor the resolution upgrade + fetch into two shared helpers in core/metadata/artwork.py and route every art path through them: _upgrade_art_url(url) — bump to the source's highest resolution: - Spotify (i.scdn.co) -> original master (~2000px+) - iTunes (mzstatic.com) -> 3000x3000 - Deezer (dzcdn) -> 1900x1900 - Cover Art Archive -> /front original (was /front-250) _fetch_art_bytes(url) — upgrade, fetch, and fall back once to the original size if the CDN refuses the larger one (non-regressive). Now consistent across: embed-into-tags (post-process), folder cover.jpg (post-process), and the enhanced-library "Write Tags to File" retag flow. The YouTube path already upgraded via Album.from_spotify_album, unchanged. De-duplicates the per-source upgrade code that was copied across sites and drops the now-unused urllib import from tag_writer. Not covered (follow-up): Last.fm / Amazon / Tidal / Qobuz have no explicit upgrade yet — some already serve full-res, others may hand over a capped size that passes through unchanged. Tests: new tests/metadata/test_artwork_resolution.py pins every upgrade (Spotify 300/640->master, iTunes 100/600->3000, Deezer->1900, CAA thumbnail->original, unrecognized/empty unchanged) and the fetch fallback. Updated the two tag_writer fallback tests to patch the network at its new home in artwork.	4 weeks ago
BoulderBadgeDad	b14d504cc1	Fix: MusicBrainz artist discography capped at 25 releases Artist-detail discography from MusicBrainz fetched releases via the artist lookup (`/artist/<mbid>?inc=release-groups`), which MusicBrainz hard-caps at 25 embedded release-groups and which ignores the `limit` param entirely. Prolific artists had ~85% of their catalogue silently dropped — Kendrick Lamar has 167 release-groups on the site but only the first 25 ever reached SoulSync. Reported by Sokhi: "a lot of albums are missing when searching vs what's showing on the site." Switch `get_artist_albums` to walk the paginated browse endpoint (`/release-group?artist=<mbid>`, offset loop) — the same pattern the basic-search path already uses — fetching the full catalogue up to the caller's limit. No type filter and no studio-only filter here: the artist-detail page wants every primary/secondary type so its tabs mirror musicbrainz.org. Verified live: now returns all 167 for Kendrick. Adds 7 tests covering pagination past the cap, offset advance, short-page stop, limit cap, cross-page dedup, type->bucket mapping, and a regression pin asserting the capped inc=release-groups lookup is no longer the discography source.	4 weeks ago
Broque Thomas	8dbbf13c61	Branch cleanup: lift manual-match helpers, fix length-pref ordering, profile-scope view toggle Self-review pass on the prior three commits — kettui-style cleanup that should have landed first time. Length-preference sort ordering (real bug): The `search_tracks_with_artist` stable sort that promoted length-known recordings ran in `core/musicbrainz_search.py`, but the MB endpoint in `web_server.py:search_musicbrainz_tracks` runs `rerank_tracks` after it — which re-sorts by relevance score and dropped the length-pref ordering down to tiebreaker-only. For canonical-same-song MB duplicates that all score identically the tiebreaker survived, but the order-of-operations was wrong. Moved into `rerank_tracks` itself via a new `prefer_known_duration` flag. Sort key sits between relevance score and the stable-order tiebreaker so relevance still wins (length only decides ties, never overrides a higher-relevance match). The MB endpoint opts in via `prefer_known_duration=True`; Spotify / iTunes / Deezer callers stay on the default-off path since their search results always include length. Pinned with three new `TestRerankTracks` cases: ties-promote-length, relevance-still-wins, default-off-unchanged. Route logic lifted to `core/discovery/manual_match.py`: Two pieces lived as inline route logic in `web_server.py` — the `derive_manual_match_provider` fallback chain (payload.source → active source → 'spotify') used by `update_youtube_discovery_match`, and the `is_drifted_for_redo` predicate (cached provider differs from active AND not manual_match) used by `prepare_mirrored_discovery`. Per kettui's "extract logic from web_server.py, don't AST-parse it" standard, both helpers now live in `core/discovery/manual_match.py` with 12 dedicated unit tests covering fallback resolution order, non-dict payload defenses, manual_match exemption from drift, absent-provider legacy default, and edge cases. Side benefits from the lift: - `match_source` now derived once before the cache-save try block instead of being duplicated in try + except (the except block existed only because the original used `match_source` later — pre-computing killed the duplication). - `prepare_mirrored_discovery`'s `has_cached` check now reuses `is_drifted_for_redo` with inverted polarity instead of restating the field whitelist inline, so a future schema change only has to land in one place. - The mirrored-DB persist block now gates on `matched_data is not None` to avoid a pre-existing latent NameError if the cache-save block raised before matched_data construction. Enhanced toggle localStorage key now profile-scoped: `soulsync-library-view-mode` was global — two admin profiles would share one preference. Wrapped in `_libraryViewModeKey()` which appends `:${currentProfile.id}` when a profile is loaded, falls back to the unsuffixed key otherwise (preserves pre-multi-profile saved values). Tests: - 12 new in `tests/discovery/test_manual_match.py` pinning both helpers. - 3 new in `tests/metadata/test_relevance.py` pinning the `prefer_known_duration` semantics. - `test_search_tracks_with_artist_prefers_results_with_known_length` renamed to `_does_not_resort_by_length` since the sort moved out of this method. 664 tests pass across discovery + metadata suites.	4 weeks ago
Broque Thomas	acc5eb77ea	Fix popup: anchor artist field in MB search to stop title-collision covers `/api/musicbrainz/search_tracks` powers the Fix popup's auto-search cascade for users on MusicBrainz as primary. When both track + artist fields were filled, `search_tracks_with_artist` always took the bare keyword path (`<track> <artist>` joined as one query string). MB's recording-search scorer weights title matches far above artist matches, so for "Coffee Break" + "Zeds Dead" the top results were Emapea / The Vidalias / West One Orchestra's "Coffee Break" — three unrelated cover- title collisions ahead of the canonical Zeds Dead recording. The endpoint's `rerank_tracks` pass can't fix this when the right answer is below the API's 50-result cutoff. Both-fields mode now uses a strict field-scoped Lucene query first (`recording:"<t>" AND artist:"<a>"`) which anchors the artist and prunes title-collision covers at the source. `min_score=0` because the field-scoped query is itself precise; rerank still does final ordering. Bare query stays as the fallback when strict returns nothing — covers the diacritic / alias cases the original `strict=False` path was added for ("Bjork" query vs canonical "Björk" artist where Lucene phrase match never hits the recording). Single-field mode (track-only or artist-only) is unchanged: still bare- query directly, since there's no artist value to anchor. Also stable-sort results to prefer entries with non-zero `duration_ms`. MB has multiple recordings per song (single release, album release, remasters, compilations) and not every recording carries length data. Without the preference sort, the user sees a 0:00 row first while a sibling recording with the real 3:04 sits two rows below — matches the report where MBID-paste lookup of the canonical recording (length 3:04) contradicted the search-result's 0:00 row for the same song. Tests: - new `test_search_tracks_with_artist_strict_first_when_both_fields` pins the strict=True call when both fields present - new `test_search_tracks_with_artist_falls_back_to_bare_when_strict_empty` pins the Björk-style fall-through path - new `test_search_tracks_with_artist_prefers_results_with_known_length` pins the length-preference sort - existing `..._keeps_low_score_for_rerank` updated to side_effect so the bare-fallback path is exercised; behaviour pinned identically - existing `..._uses_bare_query_mode` renamed + repurposed for strict- first; old name's behaviour no longer accurate	4 weeks ago
Broque Thomas	4ca3f70bf3	Show MusicBrainz release variants in import Expand matched MusicBrainz release groups into concrete releases for specific album searches so import users can choose the correct edition by track count, format, country, and disambiguation. Preserve distinct MusicBrainz release IDs instead of deduping same-title variants, carry release metadata through import matching, and surface those details on album result cards. Add coverage for variant preservation and release-group expansion.	4 weeks ago
Broque Thomas	b9af4ef4ef	Handle transient SQLite IO during maintenance Keep full refresh moving when post-clear VACUUM hits a transient disk I/O error, and retry clear_server_data once when the clear step itself sees the same transient SQLite failure. Retry metadata cache maintenance writes once on transient disk I/O errors so first-attempt cache jobs do not fail when an immediate retry would succeed. Tests cover best-effort VACUUM, clear retry behavior, and cache maintenance retry behavior.	1 month ago
Broque Thomas	136d665c8a	feat(webui): cache artwork images on disk Add a disk-backed image cache with hashed browser URLs, SQLite metadata, size/type validation, stale fallback, and per-image fetch locking. Route normalized artwork through /api/image-cache while keeping /api/image-proxy as a compatibility shim, and align browser max-age with the image cache TTL. Add focused tests for cache behavior and image URL normalization.	1 month ago
Broque Thomas	987409508b	fix(metadata): surface MusicBrainz 'Other' release-groups in discography (#650 ) S-Bryce reported that for some artists (Vocaloid producers, JP indie acts, niche Western indie) the artist detail page was missing whole release-groups visible on musicbrainz.org. Downloaded tracks from those release-groups appeared in artist track counts but were not bound to any visible album / single card — orphan "ghost" tracks the user couldn't browse to. Two duplicated bugs fed each other: 1. `core/musicbrainz_search.py` browsed MB release-groups with `release_types=['album', 'ep', 'single']`. MB's primary-type vocabulary is {Album, Single, EP, Broadcast, Other} — music videos, one-off web releases, and broadcast singles use Other. Pre-fix the filter dropped them at the API layer. 2. Three sites duplicated the same "raw primary-type → internal album_type" mapping with slightly different vocabularies and all silently defaulted unknown values (including 'Other') to 'album': core/musicbrainz_search.py `_map_release_type` core/metadata/types.py inline `{single:single, ep:ep}.get(...)` core/metadata/cache.py Deezer-specific record_type guard Letting Other through the filter without a real mapper would have placed music videos in the Albums view alongside LPs — visually misleading. Fix shape: - New `core/metadata/release_type.py` — single canonical mapper consumed by every provider's raw→Album projection. Knows the full MB vocabulary including 'other' and 'broadcast'; routes both into the singles bucket since they're functionally single-track releases. Compilation secondary-type override preserved (MB's canonical Greatest-Hits pattern is `primary=Album, secondary=[Compilation]`). - `core/musicbrainz_search.py` `_map_release_type` becomes a thin alias for the new helper so the six internal call sites stay intact. API filter gains 'other'. - `core/metadata/types.py` Album projection drops its inline mini- mapper and calls the canonical helper. Now also handles the compilation secondary-type override it was previously missing. - The Deezer-specific cache.py guard stays as-is — Deezer's record_type vocabulary is closed (album\|single\|ep), not affected by this issue. Verified end-to-end against MB for S-Bryce's artist (`46196b9c-affa- 4616-b53b-e967c8bd70e0`, inabakumori): pre-fix returned 22 release- groups; post-fix returns 27, with the 5 extra all landing in the Singles section with album_type='single' as intended. 23 new unit tests pin the mapper contract (case-insensitive primary types, compilation secondary override, Other/Broadcast → single, unknown → album default preserved, defensive empty/None inputs). 2 new tests in test_musicbrainz_search pin the API filter inclusion of 'other' and the round-trip into the Singles bucket. All 516 existing metadata tests still green — refactor leaves historical behaviour for {album, ep, single, compilation} unchanged.	1 month ago
Broque Thomas	daf9a527d9	feat(fix-popup): include MusicBrainz in the auto-search cascade The Fix Track Match modal's auto-search was hardcoded to query only Spotify -> Deezer -> iTunes, ignoring MusicBrainz entirely — even for users with MB set as their primary metadata source. MB-niche recordings (canonical entries with diacritics, fringe / non-mainstream tracks that the commercial catalogues don't carry) had no chance. Wiring: - New `MusicBrainzSearchClient.search_tracks_with_artist(track, artist, limit)` for surfaces that already have title + artist split. Uses MB's bare-query mode (strict=False) — diacritic-folded, alias/sortname indexed — same recall rationale as the earlier MBID-paste endpoint. - New route `GET /api/musicbrainz/search_tracks` mirrors the existing /api/{spotify,itunes,deezer}/search_tracks endpoints exactly: accepts `track`+`artist` (or legacy `query`) + `limit`, returns `{tracks: [{id, name, artists, album, duration_ms, image_url, source}]}`. Applies the same `core.metadata.relevance.rerank_tracks` pass Deezer / iTunes use, which is critical because MB's free-text scoring weighs title-text matches heavily and would otherwise rank cover / tribute recordings above the canonical version. - `_search_tracks_text` gains a `min_score` parameter. The cascade path passes 20 (vs the enhanced-search-tab default of 80) so MB recordings whose title doesn't literally contain the artist name still enter the candidate pool — without that, "Army of Me" + "Bjork" only surfaces the HIRS Collective cover (score 100) and drops Björk's canonical recording (score 28). The rerank pass then surfaces Björk by artist match. Verified against real MB API: pre-fix returned only the cover; post-fix top 5 are all Björk. - Fix popup `allSources` array (wishlist-tools.js) gets MB appended. The existing `activeIdx` reorder logic moves MB to the front when it's the active primary; otherwise MB sits last (1 req/sec rate limit makes it the slowest source). 7 new unit tests on the adapter: bare-query mode is used, missing artist falls back to None (drops AND-clause), empty inputs short-circuit, low-score candidates are kept for rerank to handle, default strict + default min_score behaviour preserved for the existing search-tab path, client errors are swallowed so the cascade falls through to the next source. Discogs intentionally absent — Discogs has no track-level search API (see core/discogs_client.py:575 — returns []). Adding a Flask endpoint that always returns empty would be a permanent no-op.	1 month ago
Broque Thomas	036faff8b1	feat(fix-popup): paste MusicBrainz URL/MBID to match directly Power-user escape hatch on the Discovery Fix Track Match modal — when fuzzy auto-search ranks the wrong recording among many same-title versions (10 remasters, live cuts, alt sessions), paste the MusicBrainz recording URL or bare UUID into the new field and resolve straight to that record. Layout: - Shape adapter `get_recording_flat(mbid)` lives in `core/musicbrainz_search.py` next to existing `get_track_details`. Returns the flat Fix-popup track shape (artists as `string[]`, album as string, single `image_url`) — distinct from the Spotify-shaped nested dict `get_track_details` returns. - New route `GET /api/musicbrainz/recording/<mbid>` is a thin wrapper: validates MBID format with an anchored UUID regex, calls the adapter, returns 400 / 404 / 200 with no inline shape massaging. - Frontend `parseMusicBrainzMbid()` lives in `shared-helpers.js` — pure URL/UUID parser, reusable from other surfaces (failed-MB cache, manual match) without duplication. - Fix modal HTML gets one new input row + button; existing search row and result render pipeline are untouched. New `lookupDiscoveryFixByMbid()` fetches the endpoint and feeds the single result through the existing `renderDiscoveryFixResults` -> confirm-dialog -> match pipeline, so MB- paste matches go through the exact same selection flow as auto-search results. - Enter-key bound on the MBID input via a separate handler ref so its lifecycle matches the search-input handlers without conflating the two submit targets. 7 unit tests cover the adapter: happy path, empty/None MBID, MB returns None, recording-without-release (empty album), multi-artist credits, includes-list contract, and client-error swallow. Out of scope: the Fix popup's fuzzy cascade is still hardcoded to spotify/deezer/itunes regardless of which primary source the user has configured. Adding MB to that cascade (when MB is the active primary) is a separate concern.	1 month ago
Broque Thomas	43ed30b4d2	fix(musicbrainz): user-facing search recall + album-detail 404 Two bugs surfacing on the Fix popup and enhanced-search MB tab: 1. Strict Lucene phrase queries (`recording:"X" AND artist:"Y"`) killed recall on user-facing manual search — diacritics ("Bjork" vs canonical "Björk"), bracketed suffixes like "(Live)", and any AND-clause mismatch returned zero results. Added `strict: bool = True` param to `search_release` / `search_recording`; when False, sends a bare query joining title + artist so MB hits alias/sortname indexes with diacritic folding. `/api/musicbrainz/search` (Fix popup) and `core/library/service_search.py` (service tabs) now pass strict=False. Enrichment workers stay on strict mode — precision matters there because they auto-accept the top hit above a confidence threshold. 2. Every MB album click was silently 404-ing — `_render_release_as_album` passed `cover-art-archive` as an MB `inc` param, but it's not a valid include for the /release resource (MB rejects with 400). The CAA flags come back on every release response by default, so dropping the bad include preserves the image-scope picker logic intact.	1 month ago
Broque Thomas	e0e31079e6	Update test: get_release includes cover-art-archive	1 month ago
Broque Thomas	5bc5fbb662	Add MusicBrainz as a metadata source Register MusicBrainz as a first-class metadata source alongside Deezer, iTunes, Spotify, Discogs, and Hydrabase. Expose the shared client through metadata services, add the settings option, and expand the MusicBrainz search adapter with source-compatible artist, album, track, and detail methods. Carry MusicBrainz IDs through similar-artist discovery, recommended artists, artist map serialization, and personalized playlist selection. Update DB migrations and lookup filters so similar_artist_musicbrainz_id is preserved on older schemas and used for source requirements and library exclusion. Normalize MusicBrainz album adapter output for import context and add regression coverage for registry mapping, typed album conversion, and similar-artist filtering. Verified by user with 120 focused tests passing.	1 month ago
Broque Thomas	3a4017ea2b	feat: artist-detail deep linking — /artist-detail/:source/:id Artist detail pages previously always pushed /artist-detail to the URL, so refreshing the page or sharing a link would drop users on a broken empty page with no artist loaded. URL format is now /artist-detail/:source/:id (e.g. /artist-detail/spotify/4tZwfgrHOc3mvqsCAfo4LT or /artist-detail/library/42). The source segment lets the backend synthesize a response from the right metadata client without a DB hit. Changes: Client routing (legacy shell + TanStack bridge) - buildArtistDetailPath / _getDeepLinkArtistDetail added to init.js; parse both new :source/:id and legacy bare :id formats so old bookmarks still work - navigateToPage passes artistId + artistSource through to the router bridge, which builds the dynamic href instead of hardcoding route.path - resolveShellPageFromPath / resolveLegacyShellPageFromPath use a prefix match so /artist-detail/* resolves to artist-detail page-id - globals.d.ts typed for artistId / artistSource options - activateLegacyPath and syncActivePageFromLocation (popstate) both restore artist from URL using skipRouteChange:true to avoid a re-navigation loop back to /artist-detail - loadInitialData restores artist from URL on page load (router not yet mounted at DOMContentLoaded so legacy path runs unconditionally) - Same-artist guard in navigateToArtistDetail prevents double-fetch when the router fires activateLegacyPath after the initial navigation Server - artist_source_detail.build_source_only_artist_detail now resolves artist name from the source API when none is supplied, so deep-link restores with an empty name string still render correctly Tests - test_spa_deep_linking: /artist-detail/42 and /artist-detail/spotify/ID both serve index.html - bridge.test.ts: source-aware URL building and library fallback - route-manifest.test.ts: prefix path resolution - artist_source_detail: name resolved from source when input is empty	1 month ago
Broque Thomas	54dbd150cb	Preserve full release dates in audio tags	1 month ago
Broque Thomas	025007b97f	Tighten artist discography soundtrack matching	1 month ago
Broque Thomas	121651da2c	Add amazon_id column to artists table for full source parity Schema: ALTER TABLE artists ADD COLUMN amazon_id TEXT with index, added via _add_amazon_columns migration called after Discogs in _run_migrations. SOURCE_ID_FIELD: add "amazon" -> "amazon_id" entry. find_library_artist_for_ source now looks up Amazon artists by slug before falling back to name match, same as every other source. artist_source_detail already stamps artist_info [source_id_field] = artist_id so the amazon_id is set on source-only payloads. Tests: add "amazon": "amazon_id" to EXPECTED_SOURCE_ID_FIELD; revert test assertion back to strict equality (SOURCE_ONLY_ARTIST_SOURCES == SOURCE_ID_ FIELD.keys() holds again now that amazon has a column).	1 month ago
Broque Thomas	265fe5233e	Fix Amazon artist detail: library upgrade lookup and artist images Library upgrade: find_library_artist_for_source returned None immediately for Amazon because SOURCE_ID_FIELD has no 'amazon' entry (no DB column for Amazon artist IDs). The name-based fallback was unreachable. Fix: only skip the column query when column is None, not the whole function — name lookup now runs for any source when artist_name + active_server are provided. Artist images: add AmazonClient._get_artist_image_from_albums so the standard _get_artist_image_from_source path in metadata/artist_image.py can call it as a fallback (same hook iTunes/Deezer/Discogs expose). Searches by unslugified artist name, matches primary artist, fetches album cover from album_metadata. Test: updated test_source_only_set_matches_mapping_keys → _contains_all_mapped_ sources to assert subset (not equality) — SOURCE_ONLY_ARTIST_SOURCES intentionally includes sources without a DB column that rely on name-only lookup.	1 month ago
Broque Thomas	30f017d1f0	Stop writing TRCK as "6/0" when album total_tracks is unknown Discord report (netti93): downloaded album tracks were tagged with TRCK = "6/0" instead of "6/13" when source data was incomplete. The retag tool wrote correct "6/13" because core/tag_writer.py already handled the case. Trace: core/metadata/enrichment.py:105 formatted unconditionally as f"{track_number}/{total_tracks}" and many album-dict construction sites pass total_tracks: 0 (per types.py, 0 means "unknown" — not a real count). That 0 propagated straight to disk. Fix at the consumer boundary so every album-dict constructor stays unchanged. Lifted to pure helper core/metadata/track_number_format.py:format_track_number_tag that drops the /N suffix when total is 0 / None / negative — emits just "6" instead. Matches retag's behavior + ID3 spec convention (TRCK can be "N" or "N/M"). MP4 trkn tuple gets the same treatment via format_track_number_tuple returning (6, 0) per spec's "unknown total" marker. Wired into all three format-write sites in enrichment.py: ID3 (TRCK), Vorbis (tracknumber), MP4 (trkn). When source data has correct total_tracks (album downloads via the metadata-source pipeline, retag flow), behavior unchanged — still writes "6/13". 16 boundary tests pin every shape: known total / zero total / none total / none track / zero track / negative inputs / string coercion / unparseable strings / floats truncate. Full suite: 3113 passed.	1 month ago
Broque Thomas	c9d4b02a02	Fix Deezer contributors tagging silently dropping for cache-polluted tracks Closes #588. Contributing-artist tagging worked for some tracks but silently dropped them for others — most reproducibly when the album had been fetched before the per-track post-process ran. Trace: get_track_details cache check used `track_position in cached` as the "full payload" sentinel. Both `/track/<id>` AND `/album/<id>/tracks` set track_position. Only `/track/<id>` sets the `contributors` array. When album-tracks data hit the cache first, get_track_details returned the partial record → _build_enhanced_track found no contributors → metadata-source contributors-upgrade silently fell back to single-artist. Reporter's case (Andrea Botez - Sacrifice): the album fetch logged "Retrieved 4 tracks for album 673558211" before the post-process, which cached all 4 tracks as partial records. The contributors- upgrade then hit the partial cache and the upgrade log line never fired because len(upgraded) was never > 1. Lifted cache-validity to a pure helper `_is_full_track_payload` that requires BOTH `track_position` AND `contributors` key presence. Empty list `[]` is valid — single-artist tracks fetched via `/track/<id>` carry it explicitly. Partial cache hits fall through to a fresh `/track/<id>` fetch, which writes the full payload back to cache. 11 boundary tests pin every shape: full payload, single-artist with empty contributors list, partial album-tracks shape, search-result shape, none/non-dict, and the cache-hit/cache-miss/api-failure paths on get_track_details (including the exact reporter-scenario regression). Full suite: 3021 passed.	1 month ago
Broque Thomas	0769fcd5cc	Fix Soulseek downloads losing collab artist tags Soulseek matched-download contexts populate `original_search_result` with `artist` (singular string) and no `artists` list — the full multi-artist array lives on `track_info` (the matched Spotify track object). `extract_source_metadata` only read `original_search.artists`, so the Soulseek path always fell through to the single-artist branch and TPE1 ended up with the primary artist only. Deezer-direct downloads were unaffected because their context populates `original_search.artists` as a proper list. Lifted artist resolution into a pure helper `core/metadata/artist_resolution.py:resolve_track_artists` that walks `original_search.artists` → `track_info.artists` → `artist_dict.name` fallback chain. Normalizes mixed list-item shapes (Spotify-style dicts, bare strings, anything else stringified) and drops empty entries. 13 new tests pin the resolution order, fallback chain, mixed-shape normalization, whitespace stripping, and empty/none handling. The existing `_artists_list` no-fall-through test in `test_multi_artist_tag_settings.py` was updated to reflect the new contract (always populated; multi-value write still gated on `len > 1`) plus a new regression test for the Soulseek shape. Composes with the existing Deezer per-track upgrade (still fires when single-artist + track_id available) and feat_in_title / artist_separator settings (still drive the joined ARTIST string downstream).	1 month ago
Broque Thomas	46206b3240	Pin type='track' / type='artist' collision case for album-type normalizer	1 month ago
Broque Thomas	5eae24b8bb	Fix $albumtype defaulting to album for non-Spotify sources - legacy duck-typed builder only checked the `album_type` key; deezer uses `record_type`, tidal uses `type` (uppercase), some flattened musicbrainz shapes use `primary-type` — all defaulted to album, so EPs and singles ended up filed under Album/ in user templates that reference $albumtype - widen lookup to album_type / record_type / type / primary-type and route through new pure `_normalize_album_type` helper that case-folds + validates against the canonical token set (album / single / ep / compilation), unknown → album - typed-converter path (spotify / deezer / itunes / discogs / mb / hydrabase / qobuz) unchanged — those were already correct Discord report (CAL).	1 month ago
Broque Thomas	4892baf8d4	Skip already-owned tracks during download discography - new track_already_owned helper wraps db.check_track_exists at the same confidence threshold the discography backfill repair job uses (0.7) — name+artist+album, format-agnostic so blasphemy-mode libraries (flac → mp3 + delete original) match correctly - endpoint runs the check after the artist + content-type filters and before add_to_wishlist, so a second discography click on the same artist no longer re-queues every track that already downloaded - per-album response carries a new tracks_skipped_owned counter alongside the existing artist/content/wishlist skip categories Discord report (Skowl).	1 month ago
Broque Thomas	d4ad5bf57f	Filter cross-artist + content-type tracks during download discography - drop tracks where the requested artist isn't named in track.artists (keeps features, drops compilation / appears_on contamination) - honor watchlist.global_include_live/remixes/acoustic/instrumentals the same way the discography backfill repair job already does - surface per-album skip counts in the ndjson stream (artist mismatch + content filter) so the ui can show what was filtered Closes #559.	1 month ago
Broque Thomas	d5de724f9b	Multi-artist Deezer upgrade + double-append guard hardening Two follow-ups to the multi-artist tag settings PR: 1. Deezer contributors upgrade — closes the "known limitation" flagged in the prior commit. Deezer's `/search` endpoint only returns the primary artist for each track; the full contributors array (feat., remix collaborators, producers credited as artists) lives on `/track/<id>` and gets parsed by `_build_enhanced_track`. Without the upgrade Deezer-sourced tracks never got multi-artist tags even with the right settings on. Fix in `core/metadata/source.py`: when source==deezer AND the search response had a single artist AND a track_id is available, fetch full track details via `get_deezer_client().get_track_details` and replace `all_artists` with the upgraded list. - One extra API call per affected Deezer track - Skipped when search already returned multiple (no-op fast path) - Skipped for non-Deezer sources (Spotify/Tidal/iTunes search responses already include all artists) - Skipped when no track_id is available - Defensive try/except: on /track/<id> failure (network error, deezer client unavailable), fall through to the search-result list — never lose the data we already had 2. Double-append guard hardened with a word-boundary regex. Prior commit checked for `"feat." not in title.lower() and "(ft." not in title.lower()` — too narrow. Source platforms produce wildly different feat-marker conventions: "(feat. X)", "(Feat X)", "(FEAT X)", "(Featuring X)", "[feat. X]", "ft. X" (no parens), "FT. X", etc. Any of these as the SOURCE title would cause a double-append: `"Track (Feat X) (feat. Y)"`. Replaced with `re.search(r'\b(?:feat\|feat\.\|featuring\|ft\|ft\.)\b', title, IGNORECASE)`. Word-boundary regex catches every common variant. Substring matches like "Aftermath" containing `ft` correctly fall through to the append path (pinned by a regression test). 16 new tests (29 total in the file): - 9 parametrized variants of the double-append guard - 1 substring guard ("Aftermath") - 6 Deezer upgrade scenarios (fires when expected, doesn't fire for non-Deezer / multi-artist search / no track_id, defensive fall-through on failure, no false-positive when /track/<id> confirms single artist) Full pytest 2727 passed.	1 month ago
Broque Thomas	c11a5b7eab	Multi-artist tag settings: implement artist_separator + feat_in_title + populate _artists_list Three settings on Settings → Metadata → Tags were partially or completely unimplemented. Reporter (Netti93) traced each one. (1) `write_multi_artist` only "worked" because of a never-populated `_artists_list` field. `core/metadata/source.py` built `metadata["artist"]` as a hardcoded ", "-joined string but never assigned `metadata["_artists_list"]`. `core/metadata/enrichment.py` line 107 reads that field and gates the multi-value tag write on `len(_artists_list) > 1` — always saw an empty list, silently no-op'd the write. (2) `artist_separator` (default ", ") was referenced in the UI + settings.js save path but ZERO Python code read the value. Every multi-artist track ended up with hardcoded ", " regardless of what the user picked. (3) `feat_in_title` (when true: pull featured artists into the title as " (feat. X, Y)" and leave only primary in the ARTIST tag — Picard convention) had no implementation at all. Fix in source.py: * Populate `_artists_list` from the search response's artists array * Read `feat_in_title` and `artist_separator` configs * When `feat_in_title=True` and >1 artist: ARTIST = primary only, append "(feat. X, Y)" to title with double-append guard * Else: ARTIST = artists joined with `artist_separator` * Single-artist case unaffected by either setting Double-append guard uses a word-boundary regex catching all common "feat" variants source platforms produce — `feat`, `feat.`, `featuring`, `ft`, `ft.` — case-insensitive. Substring matches (e.g. "Aftermath" containing "ft") correctly fall through to the append path. Fix in enrichment.py ID3 branch: * TPE1 stays as the display string (with separator or primary-only per the user's settings) * Multi-value list goes to a separate `TXXX:Artists` frame (Picard convention) when `write_multi_artist` is on * Pre-fix the ID3 path wrote TPE1 twice — single-string then list — and the second `add` overwrote the first, clobbering both the configured separator AND the feat_in_title semantics. Vorbis path was already correct (separate "artist" + "artists" keys). Known limitation (flagged in WHATS_NEW): Deezer's `/search` endpoint only returns the primary artist. The full contributors array lives on `/track/<id>`. Enrichment uses search-result data so Deezer- sourced tracks may still get only the primary artist until a follow- up commit wires the per-track contributors fetch into the enrichment flow. Spotify, Tidal, and iTunes search responses include all artists so they work now. 23 new tests in `tests/metadata/test_multi_artist_tag_settings.py`: * `_artists_list` populated for multi/single/no-artist cases * `artist_separator` drives ARTIST string (default ", " + custom ";" + custom "; " + " & ") * Single-artist case unaffected by either setting * `feat_in_title=True` pulls featured to title, leaves primary in ARTIST * `feat_in_title` no-op for single artist * Double-append guard recognizes 9 source-title variants ("(feat. X)", "(Feat. X)", "(FEAT X)", "(feat X)", "(Featuring X)", "[feat. X]", "ft. X", "(ft X)", "FT. X") * Substring guard test pins "Aftermath" doesn't false-positive * Combined-settings precedence: feat_in_title wins ARTIST string but `_artists_list` carries everyone for multi-value tag Full pytest 2711 passed.	1 month ago
Broque Thomas	8a4c0dc92a	Deezer cover-art download: fallback to original URL on CDN refusal Defensive followup. If Deezer CDN ever refuses the upgraded 1900×1900 URL for a specific album (rare — empirically tested 4 albums and none hit it), pre-fix would have succeeded with the 1000×1000 URL and post-fix would have failed entirely. Both download sites now retry with the original URL when the upgraded URL fails: - `core/metadata/artwork.py::download_cover_art` — auto post-process flow. Resolves the original URL from album_info / context the same way the existing path does. - `core/tag_writer.py::download_cover_art` — captures the original URL before upgrade so the retry has it without a second context lookup. Strictly non-regressive: worst plausible post-fix case is now identical to pre-fix (cover at 1000×1000 succeeds). Fallback only fires on the rare CDN-refusal edge. Tests added (2): - `test_tag_writer_retries_with_original_on_failure` — upgraded URL raises, original succeeds, both attempts logged in call order - `test_tag_writer_no_fallback_for_non_dzcdn_url` — non-Deezer URLs go through unchanged, no fallback path triggered (single attempt) Verification: - 18/18 helper + integration tests pass - 2561 full suite passes - Ruff clean	1 month ago
Broque Thomas	80cf16339c	Deezer cover art: upgrade CDN URL to 1900×1900 (was embedding 1000×1000) Discord report (Tim): downloaded cover art via Deezer metadata source came out visibly blurry in Navidrome / on phones — large displays exposed the limited resolution. # Cause Deezer's API returns `cover_xl` URLs at 1000×1000. The underlying CDN actually serves up to 1900×1900 by rewriting the size segment in the URL path (same trick the iTunes mzstatic + Spotify scdn upgrades already use). SoulSync wasn't doing the rewrite — every Deezer-sourced cover got embedded at 1000×1000 regardless of how much higher resolution the CDN had available. # Verified empirically ``` $ for size in 1000 1400 1800 1900 2000; do curl -I "...{size}x{size}-..."; done 1000: 200 OK 106 KB 1400: 200 OK 198 KB 1800: 200 OK 331 KB 1900: 200 OK 371 KB 2000: 403 Forbidden ``` 1900 is the safe ceiling. Above that the CDN returns 403. CDN serves source-native bytes when source < target (smaller-source albums get same bytes whether we ask for 1000 or 1900), so asking for 1900 universally is safe. # Fix New `_upgrade_deezer_cover_url(url, target_size=1900)` helper in `core/deezer_client.py`. Pure function, mirrors the `_upgrade_spotify_image_url` pattern that already lives in `core/spotify_client.py`. Defensive on every input shape: - Empty / None → returned as-is - Non-Deezer URL (no `dzcdn`) → returned as-is - No size segment in URL → returned as-is - Already at/above target → returned as-is (idempotent, never downgrades) Applied at both cover-download sites: - `core/metadata/artwork.py::download_cover_art` — auto post-process flow. Mirrors the existing iTunes mzstatic upgrade right above it. - `core/tag_writer.py::download_cover_art` — enhanced library view's "Write Tags to File" feature. # Scope discipline - Helper applied at the DOWNLOAD boundary, not the source extraction point in `deezer_client.py`. Means cached entries in the metadata cache + DB row `image_url` columns keep the original 1000×1000 URL Deezer's API returned. Future CDN behavior changes only affect the download path, not stored data. - Pre-existing `prefer_caa_art` toggle (Settings → Library → Post-Processing) untouched — orthogonal workaround for users who want even higher quality (MusicBrainz Cover Art Archive, often 3000×3000+). - iTunes / Spotify upgrade paths untouched — they already worked. # Tests added (16) `tests/metadata/test_deezer_cover_url_upgrade.py`: - Standard upgrade: default target 1900 on cover URL, alternate dzcdn host (`e-cdns-images.dzcdn.net` vs `cdn-images.dzcdn.net`), artist picture URLs (same path pattern), 500×500 source upgrades too - Custom target size: smaller target = no-op (never downgrade), larger target works - Idempotent: already at/above target returned unchanged - Defensive on non-Deezer URLs: parametrised across 5 hosts (Spotify scdn, iTunes mzstatic, MB CAA, Last.fm, random) — all returned untouched - Defensive on malformed Deezer URL (no size segment) → returned as-is - Empty / None handling # Verification - 16/16 helper tests pass - 560/560 metadata + imports tests pass (no regression) - 2559 full suite passes - Ruff clean	1 month ago
Broque Thomas	59992d42a8	Deezer search: free-text fallback when advanced query returns 0 Defensive followup to the relevance fix. Deezer's advanced search syntax (`artist:"X"`) is documented as substring match, but in practice it's brittle on artist name variants ("Foreigner [US]", "The Foreigner") and on tracks indexed under non-canonical title spellings. When the advanced query returns nothing, we'd previously land at "No matches" — a regression vs. pre-fix behaviour where free-text would have returned a less-relevant but non-empty set. Fix: when the advanced query returns 0 results AND the caller used field-scoped kwargs, fall back to a free-text join of the same kwargs and re-query. Caller-side rerank still tightens whatever the fallback returns, so the worst-case post-fix behaviour is the pre-fix behaviour — never strictly worse. Pulled the cache + parse + store dance into a private helper (`_search_tracks_with_query`) so the orchestration can call it twice (advanced → fallback) without code duplication. Single API call when the advanced query has results — no wasted requests. Diagnostic logger.debug fires when the fallback triggers so we can see in production whether it's happening (and to which queries). # Tests added (4) - `test_falls_back_to_free_text_when_advanced_empty` — advanced query returns 0, free-text returns hits; client returns the free-text hits + both API calls fire. - `test_no_fallback_when_advanced_query_has_results` — single hit on advanced query → no second API call. - `test_no_fallback_when_legacy_free_text_call` — legacy callers already exhausted the only path; empty result is final. - `test_no_fallback_when_query_unchanged` — empty kwargs path doesn't trigger the fallback branch (used_advanced=False). # Existing tests updated The 4 prior `TestSearchTracksQueryWiring` + `TestSearchTracksCacheKey` tests were stubbing `_api_get` to return empty `{'data': []}` and asserting `assert_called_once`. With the new fallback, those stubs trigger a second API call and the assertions break — even though the FIRST call construction is what the tests cared about. Updated the stubs to return one fake hit so the fallback doesn't fire, and switched to `call_args_list[0]` for first-call inspection. # Verification - 18/18 deezer query tests pass (14 prior + 4 new) - 2445 full suite passes (+4 from prior commit) - Ruff clean	1 month ago
Broque Thomas	1cc37081a6	Fix Deezer search relevance — issue #534 # Background User reported (#534) that the import-modal "Search for Match" dialog returned irrelevant results when Deezer was the metadata source. Searching `Dirty White Boy` + `Foreigner` returned 5+ karaoke / "originally performed by" / "in the style of" / "re-recorded" / tribute-band results ranked above the actual Foreigner studio cut from Head Games. User had to scroll past the junk every time, or fall back to iTunes search which is much slower. # Root cause — two layers 1. Endpoint joined `track + artist` into free-text query. `/api/deezer/search_tracks` was passing `q=Dirty White Boy Foreigner` to Deezer's `/search/track` API. Deezer fuzzy-matches that string across title / lyrics / artist / album / contributors and orders by global popularity — anything that appears across many compilations outranks the canonical recording. 2. No local rerank. None of the search-modal endpoints applied any post-filtering. Deezer's API order shipped straight to the user. # Fix — same architectural shape Cin would build ## Layer 1: field-scoped query at the client boundary `core/deezer_client.py::search_tracks()` now accepts optional `track`, `artist`, `album` kwargs. When provided, builds Deezer's advanced search syntax: `q=track:"X" artist:"Y" album:"Z"`. Massive relevance improvement because each term matches the right field instead of fuzzy-matching everywhere. Backward compat preserved: legacy free-text `query=` callers still work unchanged. Field-scoped path takes precedence when both are provided. Empty input fast-fails without an API call. Embedded double-quotes stripped (Deezer's syntax has no escape mechanism). ## Layer 2: provider-neutral relevance reranker New `core/metadata/relevance.py` module — pure-function rerank over the canonical `Track` dataclass. Composable scoring: - Cover/karaoke patterns (multiplier 0.05, effectively buries): matches "karaoke", "originally performed by", "in the style of", "made famous by", "tribute", "vocal version", "backing track", "cover version", "re-recorded", "cover by", etc. across title, album, AND artist fields. Catches the screenshot's exact junk: artist credits like "Pop Music Workshop" / "The Karaoke Channel" / "Foreigner Tribute Band". - Variant tags (multiplier 0.4): live / acoustic / demo / instrumental / remix / radio edit / club mix etc. — softer penalty since the user MAY want them. Skipped entirely when the expected_title contains the same tag (so searching "Track (Live)" still ranks Live versions first). - Exact artist boost (multiplier 1.5): primary artist exactly matches expected_artist after normalisation. Single strongest signal for "this is the canonical recording". - Title + artist similarity via SequenceMatcher (parentheticals + punctuation stripped before comparison). - Album-type weighting: album=1.0 > single/ep=0.85 > compilation=0.7. Compilations are more likely tribute / karaoke repackages. Each component is a standalone function so tests pin them individually without standing up the full pipeline. ## Wired at three search-modal endpoints - `/api/deezer/search_tracks` — uses both layers (field-scoped query + rerank). - `/api/itunes/search_tracks` — uses rerank only (iTunes API has no advanced-syntax search, but karaoke / cover variants still leak through and need the local penalty). - `/api/spotify/search_tracks` — already builds field-scoped `track:X artist:Y` query; rerank added as the consistency safety net so all three sources behave the same from the user's perspective. Other Deezer call sites (matching engine, watchlist scanner, auto-import single-track ID) deliberately not touched in this PR — they have their own elaborate scoring pipelines tuned to their specific contexts and aren't surfacing the user-reported issue. Per Cin: "don't refactor beyond what the task requires." # Tests 71 new tests across 3 files: - `tests/metadata/test_relevance.py` (50 tests) — every scoring component pinned individually + the issue #534 screenshot reproduced as a regression test (real Foreigner cut wins after rerank, karaoke variants drop to bottom). - `tests/metadata/test_deezer_search_query.py` (14 tests) — advanced-syntax query construction, field-scoped wiring at the client boundary, free-text path unchanged, kwargs win when ambiguous, limit clamping, cache key consistency. - `tests/imports/test_search_match_endpoints.py` (7 tests) — end-to-end through Flask test client: Deezer endpoint passes kwargs not joined query; karaoke buried at bottom for all three sources; legacy query param still works without rerank. # Verification - 2441 full suite passes (+71 from baseline 2370) - 0 failures (the prior watchdog flake fix held) - Ruff clean across all changed files - JS parses clean (`node -c webui/static/helper.js`) # Architectural standards followed - Logic at the right boundary. Query construction lives in the client (every caller benefits from one change). Rerank lives in a neutral module (`core/metadata/relevance.py`) over the canonical `Track` dataclass — works for any source, not Deezer- specific. - Explicit > implicit. Every scoring rule has its own named function. Pattern tables are module-level constants tests can introspect. - Scope discipline. Audited every Deezer search call site; fixed the user-reported one + the consistent siblings. Did NOT speculatively normalise every Deezer call across the codebase. - Backward compat. Free-text `query=` callers untouched. Kwargs added to existing client method signature with safe defaults. - Tests pin contract at correct boundary. Pure-function rerank tests don't mock anything; client-query tests stub at `_api_get`; endpoint tests run through the real Flask app.	1 month ago
Broque Thomas	cf5461f2f1	Fix: maintenance findings badge inflated when scan dedup-skipped `_create_finding` silently dedup-skipped re-discovered issues but the caller incremented `findings_created` regardless. So a re-scan that found the same issues as a prior scan reported 364 findings in the badge while 0 NEW pending rows hit the db, leaving the findings tab empty. `_create_finding` now returns bool (True on insert, False on dedup-skip / db error). All 16 repair jobs updated to only increment `findings_created` on True. Added `findings_skipped_dedup` counter surfaced in scan log: "Done: X scanned, 0 fixed, 0 findings (363 already existed), 0 errors". Also fixed a missing `job_id` kwarg in album_tag_consistency that was silently breaking finding creation for that scan.	2 months ago
Broque Thomas	77c54ab7a7	Migrate discography + quality scanner to typed Album path Three more album-shape consumers now route through Album.from_<source>_dict() when caller passes a known source: - _build_discography_release_dict (artist discography cards) - _build_artist_detail_release_card (artist detail release cards) - _normalize_track_album (quality scanner result normalization) Legacy duck-typing stays as fallback for unknown source, non-dict input, or converter errors. Pure additive — existing callers without source kwarg unchanged.	2 months ago
Broque Thomas	967c7f7c0a	Migrate album-info builders to typed Album path Steps 2+3 of typed metadata migration. Two album-info builders now route through Album.from_<source>_dict() when caller passes a known source: - _build_album_info (album-tracks lookups) - _build_single_import_context_payload (single-track import context) Legacy duck-typing stays as fallback for unknown source, non-dict input, or converter errors. Pure additive — existing callers without source kwarg unchanged.	2 months ago
Broque Thomas	eab1297afc	Add Qobuz + Tidal album converters Audit caught two missing providers from the foundation pr. Both return album-shaped data via their clients (search + download flows). Tidal uses tidalapi objects rather than dicts so the converter is from_tidal_object, not _dict. Enrichment-only providers (lastfm/genius/acoustid/listenbrainz/ audiodb) intentionally have no album converter — they enrich existing rows, never return album shapes. Tests: +8 cases. 40 total now.	2 months ago
Broque Thomas	529486a2d1	Foundation: typed Album/Track/Artist + per-provider converters New core/metadata/types.py with canonical dataclasses + classmethod converters for spotify/itunes/deezer/discogs/musicbrainz/hydrabase. Each converter is the single place that knows that provider's wire shape — addresses the duck-typing pattern Cin flagged. Pure additive: no consumer code changed. Follow-up PRs migrate consumers one at a time. Migration plan at docs/metadata-types-migration.md. Tests: 32 cases pin per-provider semantics + cross-provider invariants. Also stabilized a flaky discogs test that depended on local config state.	2 months ago
Broque Thomas	4b15fe0b75	Fix album MBID inconsistency: detector + persistent release-MBID cache Discord report (Samuel [KC]): tracks of the same album sometimes carry different MUSICBRAINZ_ALBUMID tags, which causes Navidrome (and other media servers grouping by album MBID) to split the album into multiple entries. Two-part fix — one for existing libraries, one for the root cause that lets new imports drift. Part 1 — Detector + fix action (catches existing dissenters): `core/repair_jobs/mbid_mismatch_detector.py`: - New helpers: `_read_album_mbid_from_file` and `_write_album_mbid_to_file` use the Picard-standard tag conventions (`TXXX:MusicBrainz Album Id` for MP3, `MUSICBRAINZ_ALBUMID` for FLAC/OGG, `----:com.apple.iTunes:MusicBrainz Album Id` for MP4). - New scan phase `_scan_album_mbid_consistency` runs after the existing track-MBID scan: groups tracks by DB `album_id`, reads each track's embedded album MBID, finds the consensus (most-common) MBID via `Counter`, flags dissenters. Tracks without an album MBID at all are skipped (they don't break Navidrome — only an explicit MBID disagreement does). Albums where MBIDs are perfectly tied (no clear consensus) are skipped too — surface as a manual decision instead of fixing toward a 1/N tie. - New finding type `album_mbid_mismatch` carries `consensus_mbid`, `wrong_mbid`, `consensus_count`, `total_tracks_with_mbid`, and a human-readable reason string. `core/repair_worker.py`: - Added `'album_mbid_mismatch': self._fix_album_mbid_mismatch` to the fix dispatch dict and to the `fixable_types` tuple so auto-fix + bulk-fix paths pick it up. - New `_fix_album_mbid_mismatch` method reads `consensus_mbid` from finding details, resolves the dissenter's file path via the shared library resolver, calls `_write_album_mbid_to_file` to rewrite the tag in place. Doesn't touch the album's other tracks (they're already in agreement). Part 2 — Root cause fix (prevents new SoulSync imports from drifting): The original in-memory `mb_release_cache` in `core/metadata/source.py` maps `(normalized_album, artist) -> release_mbid` so per-track enrichment of the same album hits the cache and writes the same MUSICBRAINZ_ALBUMID to every track. That cache is bounded (4096 entries) and in-process — so cache eviction (when other albums are processed in between) and server restart can BOTH cause inconsistency. Per-track album-name variation (e.g. some tracks tagged `"Album"`, others tagged `"Album (Deluxe)"`) and per-track artist variation (features) make it worse. `core/metadata/album_mbid_cache.py` (new module): - DB-backed `lookup(normalized_album, artist) -> release_mbid` and `record(...)` functions. Same key shape as the in-memory cache. - Strict additive design: every public function is wrapped in try/except and degrades to None / no-op on ANY database error. The existing in-memory cache + MusicBrainz lookup remains the authoritative fallback. If this module breaks, downloads continue exactly as they would today. `database/music_database.py`: - New `mb_album_release_cache` table with composite primary key `(normalized_album_key, artist_key)`. Reverse-lookup index on `release_mbid` for future debug tooling. Created via the existing `CREATE TABLE IF NOT EXISTS` migration pattern — idempotent, no schema version bump needed. `core/metadata/source.py`: - Surgical change inside the existing `embed_source_ids` in-memory-cache-miss branch: BEFORE calling MusicBrainz, consult the persistent cache. If a previous SoulSync run already resolved this album's release MBID, reuse it. After a successful MB lookup, store in BOTH caches. Both calls wrapped in defensive try/except so any failure falls through to existing logic. Tests: - `tests/metadata/test_album_mbid_cache.py` — 16 cache tests: round-trip, idempotent re-record, overwrite semantics, clear_all, album+artist independence (no Greatest Hits collisions), defensive None-on-empty-input, graceful degradation when the DB is unavailable / connection raises / commit fails, schema sanity (table + index exist after init). - `tests/test_album_mbid_consistency.py` — 13 detector tests: tag read/write round-trip on real FLAC files, Picard-standard tag descriptors, defensive paths (unreadable file, empty input), detector behavior (agreement → no flags, lone dissenter → flag, ties → no flag, single-track albums → skipped, no-MBID tracks → skipped, unresolvable file paths → skipped). - `tests/metadata/test_metadata_enrichment.py` — added autouse fixture monkeypatching the persistent cache to no-op for tests in this file. The existing tests pin per-call MB counts and in-memory cache state; without the fixture, persistent rows from earlier tests would bypass the MB call. Persistent layer has its own dedicated tests. Verified: 1782 tests pass (29 new), ruff clean, smoke test confirms end-to-end cache round-trip works. WHATS_NEW entry under '2.4.2' dev cycle.	2 months ago
Antti Kettunen	b85a05fb88	Move image URL normalization into metadata helpers - keep existing /api/image-proxy URLs from being wrapped again - reuse the shared metadata package instead of duplicating URL logic in web_server.py - add regression coverage for proxy passthrough and internal URL normalization	2 months ago
elmerohueso	f9f47f978e	fix post-download tagging, and enable tagging for hifi	2 months ago
Broque Thomas	7e32618f86	Drop old per-service enrichment routes after registry cutover Followup to the enrichment-bubble registry consolidation. The dashboard polling + click handlers all hit /api/enrichment/<service>/{status,pause,resume} now, so the 30 hand-rolled per-service routes in web_server.py have zero callers and can come out: /api/musicbrainz/{status,pause,resume} /api/audiodb/{status,pause,resume} /api/discogs/{status,pause,resume} /api/deezer/{status,pause,resume} /api/spotify-enrichment/{status,pause,resume} /api/itunes-enrichment/{status,pause,resume} /api/lastfm-enrichment/{status,pause,resume} /api/genius-enrichment/{status,pause,resume} /api/tidal-enrichment/{status,pause,resume} /api/qobuz-enrichment/{status,pause,resume} Worker init blocks stay (they still construct the workers + persist pause state). Section comment headers are preserved with a one-line note pointing readers at the new generic blueprint. Test fixtures in tests/conftest.py and tests/metadata/test_enrichment_events.py also updated to use the new URL paths so they reflect production reality. They were synthetic stubs that never depended on the production routes — purely cosmetic alignment. Net: ~510 lines deleted from web_server.py. Full pytest 1541 passed; ruff clean.	2 months ago
Antti Kettunen	74e3cc460c	Simplify service status and labels - Flatten the Spotify service-status rendering so it shows rate-limit and recovery states explicitly, while otherwise displaying the active metadata provider directly. - Keep the Spotify auth controls and metadata-source picker aligned with the real session state after authenticate and disconnect flows. - Return "Unmapped" for unknown metadata source labels instead of implying iTunes. - Update the metadata registry tests to cover the new label fallback.	2 months ago
Antti Kettunen	55603be14c	Clarify Spotify auth flow and sync UI - Send Spotify auth completion back to the opener so the settings page refreshes immediately - Make the local auth flow go straight through to Spotify instead of showing the temporary instruction page - Keep the remote/docker instruction page available for manual callback setups - Sync Spotify status, connect/disconnect buttons, and metadata source selection after auth and disconnect - Keep the disconnect behavior aligned with the active primary metadata source	2 months ago
Antti Kettunen	9646f6ca7f	Clarify Spotify auth actions - Hide the auth button when a Spotify session is active - Treat disconnect as a session change, not a provider swap - Share metadata source labels in the registry - Tighten rate-limit copy around Spotify-specific behavior	2 months ago

1 2

59 Commits (adbdda7b0eeecaaabce5f5b2fa52c026ff84400a)