SoulSync

Commit Graph

Author	SHA1	Message	Date
Broque Thomas	4ca3f70bf3	Show MusicBrainz release variants in import Expand matched MusicBrainz release groups into concrete releases for specific album searches so import users can choose the correct edition by track count, format, country, and disambiguation. Preserve distinct MusicBrainz release IDs instead of deduping same-title variants, carry release metadata through import matching, and surface those details on album result cards. Add coverage for variant preservation and release-group expansion.	21 hours ago
Broque Thomas	b9af4ef4ef	Handle transient SQLite IO during maintenance Keep full refresh moving when post-clear VACUUM hits a transient disk I/O error, and retry clear_server_data once when the clear step itself sees the same transient SQLite failure. Retry metadata cache maintenance writes once on transient disk I/O errors so first-attempt cache jobs do not fail when an immediate retry would succeed. Tests cover best-effort VACUUM, clear retry behavior, and cache maintenance retry behavior.	4 days ago
Broque Thomas	136d665c8a	feat(webui): cache artwork images on disk Add a disk-backed image cache with hashed browser URLs, SQLite metadata, size/type validation, stale fallback, and per-image fetch locking. Route normalized artwork through /api/image-cache while keeping /api/image-proxy as a compatibility shim, and align browser max-age with the image cache TTL. Add focused tests for cache behavior and image URL normalization.	5 days ago
Broque Thomas	987409508b	fix(metadata): surface MusicBrainz 'Other' release-groups in discography (#650 ) S-Bryce reported that for some artists (Vocaloid producers, JP indie acts, niche Western indie) the artist detail page was missing whole release-groups visible on musicbrainz.org. Downloaded tracks from those release-groups appeared in artist track counts but were not bound to any visible album / single card — orphan "ghost" tracks the user couldn't browse to. Two duplicated bugs fed each other: 1. `core/musicbrainz_search.py` browsed MB release-groups with `release_types=['album', 'ep', 'single']`. MB's primary-type vocabulary is {Album, Single, EP, Broadcast, Other} — music videos, one-off web releases, and broadcast singles use Other. Pre-fix the filter dropped them at the API layer. 2. Three sites duplicated the same "raw primary-type → internal album_type" mapping with slightly different vocabularies and all silently defaulted unknown values (including 'Other') to 'album': core/musicbrainz_search.py `_map_release_type` core/metadata/types.py inline `{single:single, ep:ep}.get(...)` core/metadata/cache.py Deezer-specific record_type guard Letting Other through the filter without a real mapper would have placed music videos in the Albums view alongside LPs — visually misleading. Fix shape: - New `core/metadata/release_type.py` — single canonical mapper consumed by every provider's raw→Album projection. Knows the full MB vocabulary including 'other' and 'broadcast'; routes both into the singles bucket since they're functionally single-track releases. Compilation secondary-type override preserved (MB's canonical Greatest-Hits pattern is `primary=Album, secondary=[Compilation]`). - `core/musicbrainz_search.py` `_map_release_type` becomes a thin alias for the new helper so the six internal call sites stay intact. API filter gains 'other'. - `core/metadata/types.py` Album projection drops its inline mini- mapper and calls the canonical helper. 
Now also handles the compilation secondary-type override it was previously missing. - The Deezer-specific cache.py guard stays as-is: Deezer's record_type vocabulary is closed (album|single|ep), not affected by this issue. Verified end-to-end against MB for S-Bryce's artist (`46196b9c-affa-4616-b53b-e967c8bd70e0`, inabakumori): pre-fix returned 22 release-groups; post-fix returns 27, with the 5 extra all landing in the Singles section with album_type='single' as intended. 23 new unit tests pin the mapper contract (case-insensitive primary types, compilation secondary override, Other/Broadcast → single, unknown → album default preserved, defensive empty/None inputs). 2 new tests in test_musicbrainz_search pin the API filter inclusion of 'other' and the round-trip into the Singles bucket. All 516 existing metadata tests still green; the refactor leaves historical behaviour for {album, ep, single, compilation} unchanged.	5 days ago
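The mapper contract described in this commit can be sketched roughly as follows. This is an illustration only, not the contents of `core/metadata/release_type.py`: the function name `map_release_type`, its signature, and the choice to apply the compilation override regardless of primary type are assumptions.

```python
from typing import Iterable, Optional

# Hypothetical lookup table: MusicBrainz primary types -> internal album_type.
# 'other' and 'broadcast' go to the singles bucket, as the commit describes.
_PRIMARY_TO_ALBUM_TYPE = {
    "album": "album",
    "single": "single",
    "ep": "ep",
    "other": "single",
    "broadcast": "single",
}


def map_release_type(primary: Optional[str],
                     secondary: Optional[Iterable[str]] = None) -> str:
    """Map a raw primary/secondary type pair to an internal album_type."""
    secondary_folded = {str(s).strip().lower() for s in (secondary or []) if s}
    # Assumption: a Compilation secondary type overrides any primary type.
    if "compilation" in secondary_folded:
        return "compilation"
    key = (primary or "").strip().lower()
    # Unknown, empty, or None primary types keep the historical 'album' default.
    return _PRIMARY_TO_ALBUM_TYPE.get(key, "album")
```

With this sketch, `map_release_type("Other")` lands in the singles bucket while `map_release_type("Album", ["Compilation"])` is reported as a compilation.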
Broque Thomas	5bc5fbb662	Add MusicBrainz as a metadata source Register MusicBrainz as a first-class metadata source alongside Deezer, iTunes, Spotify, Discogs, and Hydrabase. Expose the shared client through metadata services, add the settings option, and expand the MusicBrainz search adapter with source-compatible artist, album, track, and detail methods. Carry MusicBrainz IDs through similar-artist discovery, recommended artists, artist map serialization, and personalized playlist selection. Update DB migrations and lookup filters so similar_artist_musicbrainz_id is preserved on older schemas and used for source requirements and library exclusion. Normalize MusicBrainz album adapter output for import context and add regression coverage for registry mapping, typed album conversion, and similar-artist filtering. Verified by user with 120 focused tests passing.	6 days ago
Broque Thomas	54dbd150cb	Preserve full release dates in audio tags	1 week ago
Broque Thomas	025007b97f	Tighten artist discography soundtrack matching	1 week ago
Broque Thomas	42a833fcb2	Amazon Music: UI badges, enrichment match chips, watchlist linking, metadata cache - Artist cards, hero section, and enhanced view now show Amazon Music badges when amazon_id is populated (AMAZON_LOGO_URL constant, orange #FF9900 brand) - Enhanced view artist and album match status rows include amazon_match_status chip with click-to-rematch via openManualMatchModal - getServiceUrl: added amazon (album/track ASIN → music.amazon.com) and fixed missing discogs entries; serviceLabels adds tidal/qobuz/amazon - Enhanced view enhanced-artist-id-badges includes amazon_id entry - DB SELECTs for library artists list and artist detail now return amazon_id; both response dicts include the field - watchlist_artists migration adds amazon_artist_id column - Watchlist config GET: amazon_artist_id in SELECT/WHERE/response (index 18) - Watchlist artists list response includes amazon_artist_id - link-provider endpoint: amazon added to valid_providers and col_map - _populateLinkedProviderSection: amazonId param + Amazon Music source row - Watchlist card source badges render Amazon pill (watchlist-source-amazon CSS) - _openSourceSearch labels map includes amazon - service_search: amazon_worker injected via init(); _search_service amazon branch uses search_artists/albums/tracks, same {id,name,image,extra} return shape - _SERVICE_ID_COLUMNS: amazon → amazon_id for artist/album/track - _init_service_search call passes amazon_worker_obj - amazon_client._fetch_album_metas: 5-minute TTL cache per ASIN — cached hits skip _rate_limit() and HTTP call entirely; fixes ~10s artist detail load - registry.py: removed amazon from METADATA_SOURCE_PRIORITY and METADATA_SOURCE_LABELS — T2Tunes has no discography API, cannot serve as a primary metadata source; Amazon remains a download source + ASIN enricher - Settings metadata source dropdown and help text updated accordingly	1 week ago
Broque Thomas	1f579cede8	Add Amazon Music as a primary metadata source Wires AmazonClient into the metadata source registry following the exact same pattern as DeezerClient. No existing source paths touched. - Add get_album_metadata / get_artist_info / get_artist_albums_list aliases to AmazonClient (mirrors DeezerClient interface aliases) - Register amazon in METADATA_SOURCE_PRIORITY and METADATA_SOURCE_LABELS - Add _get_amazon_factory() + get_amazon_client() to registry.py - Add amazon branch to get_client_for_source(); thread amazon_client_factory kwarg through get_primary_client() and get_primary_source_status() - Re-export get_amazon_client from the core.metadata_service shim - Add Amazon Music option to Settings metadata source dropdown - 3530 tests pass	1 week ago
Broque Thomas	d9529fc801	Token leak round 2: artist endpoint + playlist sync + URL-encoded redaction The first token-leak fix scrubbed the artwork URL fixer's own log calls. This catches three more sites that ALSO leaked tokens, plus one upstream gap that let URL-encoded tokens slip through the redactor. Three sites in `web_server.py` (artist endpoint at line 8765-8773): - "Artist image before fix: '...'" -- logged the raw image_url with the auth token in plain form. - "Artist image after fix: '...'" -- logged the URL-encoded form after it had been wrapped in the image proxy (`/api/image-proxy?url=<percent-encoded-token>`). - "Final artist data being sent: {...}" -- dumped the entire artist_info dict on every render, including the image_url field. All three were dev-time debug noise. Removed entirely. The "No artist image URL found" warning at line 8770 stays (no URL, just the artist name). One site in `core/discovery/sync.py:402`: - "[PLAYLIST IMAGE] image_url=..." -- logged the playlist poster URL during sync. Same auth-token leak risk for Plex / Jellyfin playlists. Changed to log only `has_image=True/False`. Upstream gap in `_redact_url_secrets`: - The original regex only matched plain query params (`?key=value`). When an auth-bearing URL gets wrapped inside another URL's query string (our `/api/image-proxy?url=<encoded>` flow) the auth params end up percent-encoded -- `%3FX-Plex-Token%3D...` -- and slipped through. - New second pattern catches the URL-encoded form. Both passes run on every redact call; idempotent. Verified manually: /api/image-proxy?url=...%3FX-Plex-Token%3DABC... -> /api/image-proxy?url=...%3FX-Plex-Token%3D*REDACTED* 6 artwork tests pass.	1 week ago
Broque Thomas	2fe1926074	Stop leaking Plex / Jellyfin / Navidrome tokens into app.log The artwork URL normalizer was logging the full constructed media- server URL on every cover-art lookup at INFO level, including the auth query params (X-Plex-Token / X-Emby-Token / Subsonic t+s+p). Those lines pile up in app.log on disk -- anyone with read access to the log file gains full read access to the user's media server. Also dropped the noisy per-call "Plex/Jellyfin/Navidrome config - base_url: ..., token: ..." INFO lines that fired on every thumbnail. Even the truncated `token[:10]` form is enough partial-known-plaintext to be uncomfortable to leak. - New `_redact_url_secrets` helper masks the values of X-Plex-Token, X-Emby-Token, api_key, apikey, Subsonic t / s / p, generic token / password query params. Regex anchored on `?` or `&` boundary so short keys like `t` don't false-match inside `format=Jpg`. - "Fixed URL: ..." log calls moved from INFO to DEBUG so they don't persist by default, and the URL passed in is run through the redactor first. - Per-call "Plex config - ..." / "Jellyfin config - ..." / "Navidrome config - ..." INFO lines removed entirely. Config inspection has dedicated UI; per-thumbnail spam belongs to no one. - Error-path logging (line 149) also routed through the redactor in case the failing URL had auth params attached. Users with existing app.log files containing the leaked tokens should rotate / wipe the log. Plex tokens can be regenerated by signing out of all devices in Plex settings; Jellyfin api_keys can be revoked from the dashboard; Navidrome users should rotate the account password.	1 week ago
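A minimal sketch of the two-pass redaction described in the two token-leak commits above, assuming the key list given in the message. The regexes, the constant names, and the public name `redact_url_secrets` are illustrative guesses, not the project's actual `_redact_url_secrets` implementation.

```python
import re

# Key names taken from the commit message; the patterns themselves are assumed.
_SECRET_KEYS = ("X-Plex-Token", "X-Emby-Token", "api_key", "apikey",
                "token", "password", "t", "s", "p")
_KEY_ALT = "|".join(re.escape(k) for k in _SECRET_KEYS)

# Pass 1: plain query params. Anchoring on '?' or '&' keeps short keys such
# as 't' from matching inside 'format=Jpg'.
_PLAIN = re.compile(rf"([?&](?:{_KEY_ALT})=)[^&#\s]*", re.IGNORECASE)

# Pass 2: the same params percent-encoded inside another URL's query string
# ('?' -> %3F, '&' -> %26, '=' -> %3D). The value stops at the next %26.
_ENCODED = re.compile(
    rf"((?:%3F|%26)(?:{_KEY_ALT})%3D)(?:(?!%26)[^&#\s])*", re.IGNORECASE
)


def redact_url_secrets(url: str) -> str:
    """Mask secret query values in both plain and URL-encoded form."""
    if not url:
        return url
    url = _PLAIN.sub(r"\g<1>*REDACTED*", url)
    return _ENCODED.sub(r"\g<1>*REDACTED*", url)
```

Both passes run on every call, and running the function on its own output leaves the string unchanged.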
Broque Thomas	30f017d1f0	Stop writing TRCK as "6/0" when album total_tracks is unknown Discord report (netti93): downloaded album tracks were tagged with TRCK = "6/0" instead of "6/13" when source data was incomplete. The retag tool wrote correct "6/13" because core/tag_writer.py already handled the case. Trace: core/metadata/enrichment.py:105 formatted unconditionally as f"{track_number}/{total_tracks}" and many album-dict construction sites pass total_tracks: 0 (per types.py, 0 means "unknown" — not a real count). That 0 propagated straight to disk. Fix at the consumer boundary so every album-dict constructor stays unchanged. Lifted to pure helper core/metadata/track_number_format.py:format_track_number_tag that drops the /N suffix when total is 0 / None / negative — emits just "6" instead. Matches retag's behavior + ID3 spec convention (TRCK can be "N" or "N/M"). MP4 trkn tuple gets the same treatment via format_track_number_tuple returning (6, 0) per spec's "unknown total" marker. Wired into all three format-write sites in enrichment.py: ID3 (TRCK), Vorbis (tracknumber), MP4 (trkn). When source data has correct total_tracks (album downloads via the metadata-source pipeline, retag flow), behavior unchanged — still writes "6/13". 16 boundary tests pin every shape: known total / zero total / none total / none track / zero track / negative inputs / string coercion / unparseable strings / floats truncate. Full suite: 3113 passed.	2 weeks ago
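The formatting rule in this commit might look something like the sketch below. The two function names come from the commit message; the bodies, the `_to_positive_int` helper, and the choice to return `None` when the track number itself is missing or invalid are assumptions.

```python
from typing import Optional, Tuple


def _to_positive_int(value) -> Optional[int]:
    """Coerce ints, numeric strings, and floats (truncated) to a positive int."""
    try:
        number = int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None
    return number if number > 0 else None


def format_track_number_tag(track_number, total_tracks) -> Optional[str]:
    """Text form for ID3 TRCK / Vorbis tracknumber: 'N/M', or 'N' if M is unknown."""
    track = _to_positive_int(track_number)
    if track is None:
        return None  # assumption: nothing is written without a track number
    total = _to_positive_int(total_tracks)
    return f"{track}/{total}" if total else str(track)


def format_track_number_tuple(track_number, total_tracks) -> Optional[Tuple[int, int]]:
    """Tuple form for MP4 trkn: (N, M), with M = 0 marking an unknown total."""
    track = _to_positive_int(track_number)
    if track is None:
        return None
    return (track, _to_positive_int(total_tracks) or 0)
```

Under these assumptions, a total of 0, `None`, or a negative number yields `"6"` rather than `"6/0"`, and the MP4 tuple becomes `(6, 0)`.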
Broque Thomas	0769fcd5cc	Fix Soulseek downloads losing collab artist tags Soulseek matched-download contexts populate `original_search_result` with `artist` (singular string) and no `artists` list — the full multi-artist array lives on `track_info` (the matched Spotify track object). `extract_source_metadata` only read `original_search.artists`, so the Soulseek path always fell through to the single-artist branch and TPE1 ended up with the primary artist only. Deezer-direct downloads were unaffected because their context populates `original_search.artists` as a proper list. Lifted artist resolution into a pure helper `core/metadata/artist_resolution.py:resolve_track_artists` that walks `original_search.artists` → `track_info.artists` → `artist_dict.name` fallback chain. Normalizes mixed list-item shapes (Spotify-style dicts, bare strings, anything else stringified) and drops empty entries. 13 new tests pin the resolution order, fallback chain, mixed-shape normalization, whitespace stripping, and empty/none handling. The existing `_artists_list` no-fall-through test in `test_multi_artist_tag_settings.py` was updated to reflect the new contract (always populated; multi-value write still gated on `len > 1`) plus a new regression test for the Soulseek shape. Composes with the existing Deezer per-track upgrade (still fires when single-artist + track_id available) and feat_in_title / artist_separator settings (still drive the joined ARTIST string downstream).	2 weeks ago
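The fallback chain described here could be sketched as below. The function name `resolve_track_artists` appears in the commit message, but its signature, the `_normalize_artists` helper, and the dict shapes are assumptions made for illustration.

```python
from collections.abc import Mapping
from typing import Any, List, Optional


def _normalize_artists(raw: Any) -> List[str]:
    """Flatten a mixed list (dicts with 'name', bare strings, others) to names."""
    if not isinstance(raw, (list, tuple)):
        return []
    names = []
    for item in raw:
        name = item.get("name") if isinstance(item, Mapping) else item
        text = str(name).strip() if name is not None else ""
        if text:  # drop empty entries
            names.append(text)
    return names


def resolve_track_artists(original_search: Optional[Mapping],
                          track_info: Optional[Mapping],
                          artist_dict: Optional[Mapping]) -> List[str]:
    """Walk original_search.artists -> track_info.artists -> artist_dict.name."""
    for source in (original_search, track_info):
        names = _normalize_artists((source or {}).get("artists"))
        if names:
            return names
    fallback = str((artist_dict or {}).get("name") or "").strip()
    return [fallback] if fallback else []
```

For the Soulseek shape, where `original_search` carries only a singular `artist` string, the sketch falls through to `track_info["artists"]` and returns the full collaborator list.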
BoulderBadgeDad	c77aa61fdf	Merge pull request #530 from dlynas/feat/explicit-badges feat: add explicit badges to discography modal and artist-detail cards	2 weeks ago
Broque Thomas	5eae24b8bb	Fix $albumtype defaulting to album for non-Spotify sources - legacy duck-typed builder only checked the `album_type` key; deezer uses `record_type`, tidal uses `type` (uppercase), some flattened musicbrainz shapes use `primary-type` — all defaulted to album, so EPs and singles ended up filed under Album/ in user templates that reference $albumtype - widen lookup to album_type / record_type / type / primary-type and route through new pure `_normalize_album_type` helper that case-folds + validates against the canonical token set (album / single / ep / compilation), unknown → album - typed-converter path (spotify / deezer / itunes / discogs / mb / hydrabase / qobuz) unchanged — those were already correct Discord report (CAL).	2 weeks ago
dlynas	42bee21c9f	feat: add explicit badges to discography modal and artist-detail cards Adds an explicit field to the Album dataclass in core/metadata/types.py and the client-level Album dataclasses in deezer_client.py, itunes_client.py, and hydrabase_client.py (the legacy discography path reads from client objects, not typed dicts). Deezer extracts explicit_lyrics (int→bool), iTunes extracts collectionExplicitness ('explicit' string), Hydrabase forwards the explicit field from the server response. Spotify, Discogs, MusicBrainz, Qobuz, and Tidal have no explicit signal and stay None. The flag threads through both builder functions in discography.py and renders as a small "E" badge next to explicit titles in the discography download modal and artist-detail page cards. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2 weeks ago
Broque Thomas	4892baf8d4	Skip already-owned tracks during download discography - new track_already_owned helper wraps db.check_track_exists at the same confidence threshold the discography backfill repair job uses (0.7) — name+artist+album, format-agnostic so blasphemy-mode libraries (flac → mp3 + delete original) match correctly - endpoint runs the check after the artist + content-type filters and before add_to_wishlist, so a second discography click on the same artist no longer re-queues every track that already downloaded - per-album response carries a new tracks_skipped_owned counter alongside the existing artist/content/wishlist skip categories Discord report (Skowl).	2 weeks ago
Broque Thomas	d4ad5bf57f	Filter cross-artist + content-type tracks during download discography - drop tracks where the requested artist isn't named in track.artists (keeps features, drops compilation / appears_on contamination) - honor watchlist.global_include_live/remixes/acoustic/instrumentals the same way the discography backfill repair job already does - surface per-album skip counts in the ndjson stream (artist mismatch + content filter) so the ui can show what was filtered Closes #559.	2 weeks ago
Broque Thomas	d5de724f9b	Multi-artist Deezer upgrade + double-append guard hardening Two follow-ups to the multi-artist tag settings PR: 1. Deezer contributors upgrade: closes the "known limitation" flagged in the prior commit. Deezer's `/search` endpoint only returns the primary artist for each track; the full contributors array (feat., remix collaborators, producers credited as artists) lives on `/track/<id>` and gets parsed by `_build_enhanced_track`. Without the upgrade Deezer-sourced tracks never got multi-artist tags even with the right settings on. Fix in `core/metadata/source.py`: when source==deezer AND the search response had a single artist AND a track_id is available, fetch full track details via `get_deezer_client().get_track_details` and replace `all_artists` with the upgraded list. - One extra API call per affected Deezer track - Skipped when search already returned multiple (no-op fast path) - Skipped for non-Deezer sources (Spotify/Tidal/iTunes search responses already include all artists) - Skipped when no track_id is available - Defensive try/except: on /track/<id> failure (network error, deezer client unavailable), fall through to the search-result list; never lose the data we already had 2. Double-append guard hardened with a word-boundary regex. Prior commit checked for `"feat." not in title.lower() and "(ft." not in title.lower()`, which was too narrow. Source platforms produce wildly different feat-marker conventions: "(feat. X)", "(Feat X)", "(FEAT X)", "(Featuring X)", "[feat. X]", "ft. X" (no parens), "FT. X", etc. Any of these as the SOURCE title would cause a double-append: `"Track (Feat X) (feat. Y)"`. Replaced with `re.search(r'\b(?:feat|feat\.|featuring|ft|ft\.)\b', title, IGNORECASE)`. Word-boundary regex catches every common variant. Substring matches like "Aftermath" containing `ft` correctly fall through to the append path (pinned by a regression test). 
16 new tests (29 total in the file): - 9 parametrized variants of the double-append guard - 1 substring guard ("Aftermath") - 6 Deezer upgrade scenarios (fires when expected, doesn't fire for non-Deezer / multi-artist search / no track_id, defensive fall-through on failure, no false-positive when /track/<id> confirms single artist) Full pytest 2727 passed.	2 weeks ago
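The double-append guard can be illustrated with the sketch below. The regex is the one quoted in the commit message; the wrapper `title_with_featured` and its output format are hypothetical and stand in for whatever `core/metadata/source.py` actually does.

```python
import re
from typing import Sequence

# Word-boundary regex from the commit message, compiled case-insensitively.
_FEAT_MARKER = re.compile(r"\b(?:feat|feat\.|featuring|ft|ft\.)\b", re.IGNORECASE)


def title_with_featured(title: str, featured: Sequence[str]) -> str:
    """Append '(feat. X, Y)' unless the title already carries a feat marker."""
    if not featured or _FEAT_MARKER.search(title):
        return title
    return f"{title} (feat. {', '.join(featured)})"
```

Because the pattern requires a word boundary before `ft`, a title such as "Aftermath" does not trip the guard and still receives the suffix, while "Track (Feat X)" is left untouched.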
Broque Thomas	c11a5b7eab	Multi-artist tag settings: implement artist_separator + feat_in_title + populate _artists_list Three settings on Settings → Metadata → Tags were partially or completely unimplemented. Reporter (Netti93) traced each one. (1) `write_multi_artist` only "worked" because of a never-populated `_artists_list` field. `core/metadata/source.py` built `metadata["artist"]` as a hardcoded ", "-joined string but never assigned `metadata["_artists_list"]`. `core/metadata/enrichment.py` line 107 reads that field and gates the multi-value tag write on `len(_artists_list) > 1` — always saw an empty list, silently no-op'd the write. (2) `artist_separator` (default ", ") was referenced in the UI + settings.js save path but ZERO Python code read the value. Every multi-artist track ended up with hardcoded ", " regardless of what the user picked. (3) `feat_in_title` (when true: pull featured artists into the title as " (feat. X, Y)" and leave only primary in the ARTIST tag — Picard convention) had no implementation at all. Fix in source.py: * Populate `_artists_list` from the search response's artists array * Read `feat_in_title` and `artist_separator` configs * When `feat_in_title=True` and >1 artist: ARTIST = primary only, append "(feat. X, Y)" to title with double-append guard * Else: ARTIST = artists joined with `artist_separator` * Single-artist case unaffected by either setting Double-append guard uses a word-boundary regex catching all common "feat" variants source platforms produce — `feat`, `feat.`, `featuring`, `ft`, `ft.` — case-insensitive. Substring matches (e.g. "Aftermath" containing "ft") correctly fall through to the append path. 
Fix in enrichment.py ID3 branch: * TPE1 stays as the display string (with separator or primary-only per the user's settings) * Multi-value list goes to a separate `TXXX:Artists` frame (Picard convention) when `write_multi_artist` is on * Pre-fix the ID3 path wrote TPE1 twice — single-string then list — and the second `add` overwrote the first, clobbering both the configured separator AND the feat_in_title semantics. Vorbis path was already correct (separate "artist" + "artists" keys). Known limitation (flagged in WHATS_NEW): Deezer's `/search` endpoint only returns the primary artist. The full contributors array lives on `/track/<id>`. Enrichment uses search-result data so Deezer- sourced tracks may still get only the primary artist until a follow- up commit wires the per-track contributors fetch into the enrichment flow. Spotify, Tidal, and iTunes search responses include all artists so they work now. 23 new tests in `tests/metadata/test_multi_artist_tag_settings.py`: * `_artists_list` populated for multi/single/no-artist cases * `artist_separator` drives ARTIST string (default ", " + custom ";" + custom "; " + " & ") * Single-artist case unaffected by either setting * `feat_in_title=True` pulls featured to title, leaves primary in ARTIST * `feat_in_title` no-op for single artist * Double-append guard recognizes 9 source-title variants ("(feat. X)", "(Feat. X)", "(FEAT X)", "(feat X)", "(Featuring X)", "[feat. X]", "ft. X", "(ft X)", "FT. X") * Substring guard test pins "Aftermath" doesn't false-positive * Combined-settings precedence: feat_in_title wins ARTIST string but `_artists_list` carries everyone for multi-value tag Full pytest 2711 passed.	2 weeks ago
Broque Thomas	8a4c0dc92a	Deezer cover-art download: fallback to original URL on CDN refusal Defensive followup. If Deezer CDN ever refuses the upgraded 1900×1900 URL for a specific album (rare — empirically tested 4 albums and none hit it), pre-fix would have succeeded with the 1000×1000 URL and post-fix would have failed entirely. Both download sites now retry with the original URL when the upgraded URL fails: - `core/metadata/artwork.py::download_cover_art` — auto post-process flow. Resolves the original URL from album_info / context the same way the existing path does. - `core/tag_writer.py::download_cover_art` — captures the original URL before upgrade so the retry has it without a second context lookup. Strictly non-regressive: worst plausible post-fix case is now identical to pre-fix (cover at 1000×1000 succeeds). Fallback only fires on the rare CDN-refusal edge. Tests added (2): - `test_tag_writer_retries_with_original_on_failure` — upgraded URL raises, original succeeds, both attempts logged in call order - `test_tag_writer_no_fallback_for_non_dzcdn_url` — non-Deezer URLs go through unchanged, no fallback path triggered (single attempt) Verification: - 18/18 helper + integration tests pass - 2561 full suite passes - Ruff clean	2 weeks ago
Broque Thomas	80cf16339c	Deezer cover art: upgrade CDN URL to 1900×1900 (was embedding 1000×1000) Discord report (Tim): downloaded cover art via Deezer metadata source came out visibly blurry in Navidrome / on phones — large displays exposed the limited resolution. # Cause Deezer's API returns `cover_xl` URLs at 1000×1000. The underlying CDN actually serves up to 1900×1900 by rewriting the size segment in the URL path (same trick the iTunes mzstatic + Spotify scdn upgrades already use). SoulSync wasn't doing the rewrite — every Deezer-sourced cover got embedded at 1000×1000 regardless of how much higher resolution the CDN had available. # Verified empirically ``` $ for size in 1000 1400 1800 1900 2000; do curl -I "...{size}x{size}-..."; done 1000: 200 OK 106 KB 1400: 200 OK 198 KB 1800: 200 OK 331 KB 1900: 200 OK 371 KB 2000: 403 Forbidden ``` 1900 is the safe ceiling. Above that the CDN returns 403. CDN serves source-native bytes when source < target (smaller-source albums get same bytes whether we ask for 1000 or 1900), so asking for 1900 universally is safe. # Fix New `_upgrade_deezer_cover_url(url, target_size=1900)` helper in `core/deezer_client.py`. Pure function, mirrors the `_upgrade_spotify_image_url` pattern that already lives in `core/spotify_client.py`. Defensive on every input shape: - Empty / None → returned as-is - Non-Deezer URL (no `dzcdn`) → returned as-is - No size segment in URL → returned as-is - Already at/above target → returned as-is (idempotent, never downgrades) Applied at both cover-download sites: - `core/metadata/artwork.py::download_cover_art` — auto post-process flow. Mirrors the existing iTunes mzstatic upgrade right above it. - `core/tag_writer.py::download_cover_art` — enhanced library view's "Write Tags to File" feature. # Scope discipline - Helper applied at the DOWNLOAD boundary, not the source extraction point in `deezer_client.py`. 
Means cached entries in the metadata cache + DB row `image_url` columns keep the original 1000×1000 URL Deezer's API returned. Future CDN behavior changes only affect the download path, not stored data. - Pre-existing `prefer_caa_art` toggle (Settings → Library → Post-Processing) untouched — orthogonal workaround for users who want even higher quality (MusicBrainz Cover Art Archive, often 3000×3000+). - iTunes / Spotify upgrade paths untouched — they already worked. # Tests added (16) `tests/metadata/test_deezer_cover_url_upgrade.py`: - Standard upgrade: default target 1900 on cover URL, alternate dzcdn host (`e-cdns-images.dzcdn.net` vs `cdn-images.dzcdn.net`), artist picture URLs (same path pattern), 500×500 source upgrades too - Custom target size: smaller target = no-op (never downgrade), larger target works - Idempotent: already at/above target returned unchanged - Defensive on non-Deezer URLs: parametrised across 5 hosts (Spotify scdn, iTunes mzstatic, MB CAA, Last.fm, random) — all returned untouched - Defensive on malformed Deezer URL (no size segment) → returned as-is - Empty / None handling # Verification - 16/16 helper tests pass - 560/560 metadata + imports tests pass (no regression) - 2559 full suite passes - Ruff clean	2 weeks ago
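The size-segment rewrite described in this commit might be sketched as follows. This is not the project's `_upgrade_deezer_cover_url`; the regex and the assumption that the size segment looks like `/1000x1000-` in the URL path are inferred from the curl example in the message.

```python
import re
from typing import Optional

# Assumed shape of the size segment: '/<W>x<H>' immediately followed by '-'.
_SIZE_SEGMENT = re.compile(r"/(\d+)x(\d+)(?=-)")


def upgrade_deezer_cover_url(url: Optional[str],
                             target_size: int = 1900) -> Optional[str]:
    """Rewrite the size segment of a dzcdn URL upward; never downgrade."""
    if not url or "dzcdn" not in url:
        return url  # empty, None, or non-Deezer URL: returned as-is
    match = _SIZE_SEGMENT.search(url)
    if match is None:
        return url  # no size segment: returned as-is
    if min(int(match.group(1)), int(match.group(2))) >= target_size:
        return url  # already at or above target (idempotent)
    return f"{url[:match.start()]}/{target_size}x{target_size}{url[match.end():]}"
```

Applying the function twice gives the same result as applying it once, which matches the idempotence requirement stated in the commit.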
Broque Thomas	402d851cac	Deezer search: drop advanced-syntax at endpoint, free-text + rerank wins Live-API verification revealed advanced-syntax queries hurt more than they help on this endpoint. Switching the import-modal Deezer search back to free-text + local rerank. # What live testing showed Hit Deezer's public API with both query forms for the issue #534 case (`Dirty White Boy` + `Foreigner`): Free-text (`q=Dirty White Boy Foreigner`): - Returns 21 results - Real Foreigner Head Games studio cut at #1 - Live versions at #2-10 - Karaoke / cover variants at #11-15 Advanced (`q=track:"Dirty White Boy" artist:"Foreigner"`): - Returns 12 results - "(2008 Remaster)" at #1 — canonical Head Games cut MISSING from top 8 entirely - Live + alt-album versions follow Advanced syntax DOES filter karaoke at the API level (none in the 12-result set vs. 5 at positions 11-15 in free-text), but it has its own ranking bias that surfaces remasters / "Best Of" cuts ahead of the canonical recording. Net regression for the user- facing goal. # Fix 1. Endpoint reverts to free-text query with local rerank applied. 2. Local rerank gains "remaster" / "remastered" / "reissue" patterns under VARIANT_TAG_PATTERNS (soft 0.4× penalty — user may want them but they shouldn't outrank the original). 3. Client kwarg support (`track=` / `artist=` / `album=`) preserved for future opt-in callers (e.g. exact-match flows where API- level filtering matters more than ranking). # Verified end-to-end against live Deezer API Re-ran the exact #534 case through the live API + new rerank. Top 15 results post-rerank: 1. Dirty White Boy — Foreigner — Head Games ← REAL CUT AT TOP 2-10. Various Live versions 11-15. Karaoke / cover / tribute variants ← BURIED Real Foreigner Head Games studio cut at #1, exactly the user's ask. # Tests - `test_relevance.py` — variant tag patterns extended; existing tests still pass (50 tests). 
- `test_search_match_endpoints.py::test_joins_track_and_artist_into_free_text_query` — replaces `test_passes_track_and_artist_as_kwargs`; verifies endpoint sends free-text join, NOT field-scoped kwargs (the prior test asserted the wrong direction now). - Karaoke-burying assertion at the endpoint still pins the user-visible behaviour. - Client kwarg path tests untouched (still pin advanced-syntax construction for future opt-in callers). # Verification - 75 relevance + endpoint + query tests pass - 2445 full suite passes - Ruff clean - Live Deezer API shows real cut at #1 post-rerank	2 weeks ago
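To make the two query forms compared in this commit concrete, here is a small sketch of how each string could be built. Both helper names are hypothetical; the stripping of embedded double quotes follows the commit log's note that Deezer's advanced syntax has no escape mechanism.

```python
def build_free_text_query(track: str, artist: str = "") -> str:
    """Join the non-empty terms into one free-text query string."""
    return " ".join(p.strip() for p in (track, artist) if p and p.strip())


def build_advanced_query(track: str = "", artist: str = "", album: str = "") -> str:
    """Build field-scoped syntax: track:"X" artist:"Y" album:"Z"."""
    parts = []
    for field, value in (("track", track), ("artist", artist), ("album", album)):
        cleaned = (value or "").replace('"', "").strip()  # no escape mechanism
        if cleaned:
            parts.append(f'{field}:"{cleaned}"')
    return " ".join(parts)
```

After this commit the import-modal endpoint would send the first form and rely on the local rerank, while the second form stays available to opt-in callers.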
Broque Thomas	1cc37081a6	Fix Deezer search relevance — issue #534 # Background User reported (#534) that the import-modal "Search for Match" dialog returned irrelevant results when Deezer was the metadata source. Searching `Dirty White Boy` + `Foreigner` returned 5+ karaoke / "originally performed by" / "in the style of" / "re-recorded" / tribute-band results ranked above the actual Foreigner studio cut from Head Games. User had to scroll past the junk every time, or fall back to iTunes search which is much slower. # Root cause — two layers 1. Endpoint joined `track + artist` into free-text query. `/api/deezer/search_tracks` was passing `q=Dirty White Boy Foreigner` to Deezer's `/search/track` API. Deezer fuzzy-matches that string across title / lyrics / artist / album / contributors and orders by global popularity — anything that appears across many compilations outranks the canonical recording. 2. No local rerank. None of the search-modal endpoints applied any post-filtering. Deezer's API order shipped straight to the user. # Fix — same architectural shape Cin would build ## Layer 1: field-scoped query at the client boundary `core/deezer_client.py::search_tracks()` now accepts optional `track`, `artist`, `album` kwargs. When provided, builds Deezer's advanced search syntax: `q=track:"X" artist:"Y" album:"Z"`. Massive relevance improvement because each term matches the right field instead of fuzzy-matching everywhere. Backward compat preserved: legacy free-text `query=` callers still work unchanged. Field-scoped path takes precedence when both are provided. Empty input fast-fails without an API call. Embedded double-quotes stripped (Deezer's syntax has no escape mechanism). ## Layer 2: provider-neutral relevance reranker New `core/metadata/relevance.py` module — pure-function rerank over the canonical `Track` dataclass. 
Composable scoring: - Cover/karaoke patterns (multiplier 0.05, effectively buries): matches "karaoke", "originally performed by", "in the style of", "made famous by", "tribute", "vocal version", "backing track", "cover version", "re-recorded", "cover by", etc. across title, album, AND artist fields. Catches the screenshot's exact junk: artist credits like "Pop Music Workshop" / "The Karaoke Channel" / "Foreigner Tribute Band". - Variant tags (multiplier 0.4): live / acoustic / demo / instrumental / remix / radio edit / club mix etc. — softer penalty since the user MAY want them. Skipped entirely when the expected_title contains the same tag (so searching "Track (Live)" still ranks Live versions first). - Exact artist boost (multiplier 1.5): primary artist exactly matches expected_artist after normalisation. Single strongest signal for "this is the canonical recording". - Title + artist similarity via SequenceMatcher (parentheticals + punctuation stripped before comparison). - Album-type weighting: album=1.0 > single/ep=0.85 > compilation=0.7. Compilations are more likely tribute / karaoke repackages. Each component is a standalone function so tests pin them individually without standing up the full pipeline. ## Wired at three search-modal endpoints - `/api/deezer/search_tracks` — uses both layers (field-scoped query + rerank). - `/api/itunes/search_tracks` — uses rerank only (iTunes API has no advanced-syntax search, but karaoke / cover variants still leak through and need the local penalty). - `/api/spotify/search_tracks` — already builds field-scoped `track:X artist:Y` query; rerank added as the consistency safety net so all three sources behave the same from the user's perspective. Other Deezer call sites (matching engine, watchlist scanner, auto-import single-track ID) deliberately not touched in this PR — they have their own elaborate scoring pipelines tuned to their specific contexts and aren't surfacing the user-reported issue. 
Per Cin: "don't refactor beyond what the task requires." # Tests 71 new tests across 3 files: - `tests/metadata/test_relevance.py` (50 tests) — every scoring component pinned individually + the issue #534 screenshot reproduced as a regression test (real Foreigner cut wins after rerank, karaoke variants drop to bottom). - `tests/metadata/test_deezer_search_query.py` (14 tests) — advanced-syntax query construction, field-scoped wiring at the client boundary, free-text path unchanged, kwargs win when ambiguous, limit clamping, cache key consistency. - `tests/imports/test_search_match_endpoints.py` (7 tests) — end-to-end through Flask test client: Deezer endpoint passes kwargs not joined query; karaoke buried at bottom for all three sources; legacy query param still works without rerank. # Verification - 2441 full suite passes (+71 from baseline 2370) - 0 failures (the prior watchdog flake fix held) - Ruff clean across all changed files - JS parses clean (`node -c webui/static/helper.js`) # Architectural standards followed - Logic at the right boundary. Query construction lives in the client (every caller benefits from one change). Rerank lives in a neutral module (`core/metadata/relevance.py`) over the canonical `Track` dataclass — works for any source, not Deezer-specific. - Explicit > implicit. Every scoring rule has its own named function. Pattern tables are module-level constants tests can introspect. - Scope discipline. Audited every Deezer search call site; fixed the user-reported one + the consistent siblings. Did NOT speculatively normalise every Deezer call across the codebase. - Backward compat. Free-text `query=` callers untouched. Kwargs added to existing client method signature with safe defaults. - Tests pin contract at correct boundary. Pure-function rerank tests don't mock anything; client-query tests stub at `_api_get`; endpoint tests run through the real Flask app.	2 weeks ago
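To illustrate commit 1cc37081a6, here is a minimal sketch of the two layers it describes: a field-scoped query builder and a multiplier-based rerank. The function names, the dict-shaped candidates, and the shortened pattern tuples are assumptions made for this example; they are not taken from `core/deezer_client.py` or `core/metadata/relevance.py`. Only the multipliers (0.05, 0.4, 1.5) and the quote-stripping rule come from the commit message. Album-type weighting and artist similarity are omitted for brevity.

```python
import re
from difflib import SequenceMatcher

# Shortened, illustrative pattern tables (assumed; the real tables are longer).
COVER_PATTERNS = ("karaoke", "originally performed by", "in the style of",
                  "made famous by", "tribute", "backing track", "re-recorded")
VARIANT_TAGS = ("live", "acoustic", "demo", "instrumental", "remix", "radio edit")


def build_field_query(track="", artist="", album=""):
    """Build a field-scoped query string; returns "" when every field is empty."""
    parts = []
    for field, value in (("track", track), ("artist", artist), ("album", album)):
        cleaned = value.replace('"', "").strip()  # no escape mechanism, so drop quotes
        if cleaned:
            parts.append(f'{field}:"{cleaned}"')
    return " ".join(parts)


def _norm(text):
    """Lowercase, drop parentheticals and punctuation, collapse whitespace."""
    text = re.sub(r"\([^)]*\)", " ", text.lower())
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", text).split())


def _has_tag(tag, text):
    """True when the tag appears as a whole word or phrase in the text."""
    return re.search(rf"\b{re.escape(tag)}\b", text.lower()) is not None


def score(candidate, expected_title, expected_artist):
    title, artist = candidate["title"], candidate["artist"]
    album = candidate.get("album", "")
    result = SequenceMatcher(None, _norm(title), _norm(expected_title)).ratio()
    haystack = f"{title} {artist} {album}".lower()
    if any(p in haystack for p in COVER_PATTERNS):
        result *= 0.05  # cover/karaoke: effectively buried
    if any(_has_tag(t, title) and not _has_tag(t, expected_title) for t in VARIANT_TAGS):
        result *= 0.4   # variant the user did not ask for
    if _norm(artist) == _norm(expected_artist):
        result *= 1.5   # exact primary-artist boost
    return result


def rerank(candidates, expected_title, expected_artist):
    """Return candidates sorted best-first by the composite score."""
    return sorted(candidates, key=lambda c: score(c, expected_title, expected_artist),
                  reverse=True)
```

Keeping each rule as a separate multiplier mirrors the "composable scoring" idea in the message: a rule can be tested or tuned without touching the others.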
Broque Thomas	3246490800	Auto-import: MBID/ISRC fast paths + duration sanity gate Brings the auto-import matcher to picard / beets / roon parity by reaching for the existing AcoustID-grade infrastructure (typed Album foundation, integrity check thresholds) and layering id-based exact matches on top of the fuzzy scorer. Picard-tagged libraries now land every track with full confidence on the first pass. Three layered phases in `core/imports/album_matching.match_files_to_tracks`: 1. MBID exact match — file has `musicbrainz_trackid` tag, source returns the same id → instant pair, full confidence, no fuzzy scoring. Picard's primary identifier; per-recording. 2. ISRC exact match — file has `isrc` tag, source returns the same id → same fast-path, slightly lower priority than mbid (isrc can be shared across remasters). Both ids normalised before compare (uppercase + strip dashes/spaces for isrc, lowercase for mbid). 3. Duration sanity gate — files in the fuzzy phase whose audio length differs from the candidate track's duration by more than `DURATION_TOLERANCE_MS` (3s, matching the post-download integrity check) are rejected before scoring runs. Defends against the cross-disc / cross-release / wrong-edit problem the integrity check used to catch only AFTER the file had already been moved + tagged + db-inserted. 
Tag reader (`_read_file_tags`) extended: - Reads `isrc` (uppercased; dash/space stripping deferred to the matcher) - Reads `musicbrainz_trackid` as `mbid` (lowercased) - Reads `audio.info.length` and converts to `duration_ms` to match the metadata-source convention Metadata-source layer (`_build_album_track_entry`) extended: - Propagates `isrc` from top-level OR `external_ids.isrc` (spotify shape — would otherwise be stripped before reaching the matcher) - Propagates `musicbrainz_id` from top-level OR `external_ids.mbid` / `external_ids.musicbrainz` - Without this layer, fast paths would silently never fire in production even though unit tests pass — pinned by `test_album_track_entry_propagates_isrc_and_mbid_from_source` 18 new tests in `tests/imports/test_album_matching_exact_id.py`: - Direct: `find_exact_id_matches` with mbid, isrc, isrc normalisation, mbid > isrc priority, spotify-shape `external_ids.isrc`, no-id empty result, file-used-at-most-once - Direct: `duration_sanity_ok` within / outside tolerance, missing durations defer - End-to-end via `match_files_to_tracks`: mbid match short-circuits fuzzy scoring, id-matched files excluded from fuzzy phase, duration gate rejects wrong-disc collisions in fuzzy phase, normal matches pass through the gate, missing durations fall through, deezer seconds-vs-ms conversion, full picard-tagged 10-track album via mbid only - Production-shape: `_build_album_track_entry` propagates isrc + mbid from spotify-shape (`external_ids.isrc`) AND itunes-shape (top-level `isrc`) Verification: - 35 album-matching tests pass total (17 helper + 18 fast-path) - 23 multi-disc tests still pass after the extension (additive) - Full suite: 2311 passed (+18 new), 1 pre-existing flaky timing test failure (`test_watchdog_warns_about_stuck_workers` — passes in isolation, fails only in full-suite runs, unrelated to this PR) - Ruff clean For users: - Picard / Beets / Mp3Tag-tagged libraries (anyone who's organised their music) get instant 
perfect-confidence matches every time. - Soulseek-tagged downloads (which usually carry isrc when sourced via metadata-aware soulseekers) get the fast path too. - Naively-named files with no useful tags fall through to the improved fuzzy + duration-gated path — same correctness as before for the common case, much harder for the matcher to confidently pair the wrong file. - One step closer to standalone-DB feature parity with plex / jellyfin / navidrome scanners. Acoustid fingerprint fallback (for files with NO useful tags AND no MBID/ISRC) is the next followup PR.	2 weeks ago
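The layered matching in commit 3246490800 can be sketched roughly as follows. This is an assumption-laden illustration, not the code in `core/imports/album_matching`: the helper names (`exact_id_pairs`, `duration_ok`), the dict shapes for files and tracks, and the index-based return value are all invented here. The 3-second tolerance, the normalisation rules, and the MBID-before-ISRC priority are taken from the commit message.

```python
DURATION_TOLERANCE_MS = 3000  # 3 s, the tolerance quoted in the commit message


def normalize_isrc(value):
    """Uppercase and strip dashes/spaces so tag and source spellings compare equal."""
    return value.upper().replace("-", "").replace(" ", "")


def normalize_mbid(value):
    return value.strip().lower()


def exact_id_pairs(files, tracks):
    """Return {file_index: track_index} for exact MBID matches, then ISRC matches."""
    pairs = {}
    used_tracks = set()
    # MBID runs first, so it wins whenever a file carries both identifiers.
    for key, normalize in (("mbid", normalize_mbid), ("isrc", normalize_isrc)):
        index = {}
        for t_i, track in enumerate(tracks):
            if track.get(key) and t_i not in used_tracks:
                index.setdefault(normalize(track[key]), t_i)
        for f_i, file_tags in enumerate(files):
            if f_i in pairs or not file_tags.get(key):
                continue  # each file is paired at most once
            t_i = index.get(normalize(file_tags[key]))
            if t_i is not None and t_i not in used_tracks:
                pairs[f_i] = t_i
                used_tracks.add(t_i)
    return pairs


def duration_ok(file_ms, track_ms):
    """Gate for the fuzzy phase; a missing duration defers instead of rejecting."""
    if not file_ms or not track_ms:
        return True
    return abs(file_ms - track_ms) <= DURATION_TOLERANCE_MS
```

Files left unpaired by `exact_id_pairs` would then go to fuzzy scoring, with `duration_ok` applied to each candidate before any score is computed.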
Broque Thomas	9602d1827c	Final silent-exception sweep + ruff S110 lint guardrail — ~45 sites Catches the silent excepts the awk-based earlier sweeps missed: - Bare `except:` followed by `pass` (also swallows KeyboardInterrupt and SystemExit — actively wrong). Upgraded to `except Exception as e: logger.debug("...: %s", e)`. ~14 sites across connection_detect, soulseek_client, listenbrainz_manager, watchlist_scanner, youtube_client, navidrome_client, jellyfin_client, web_server. - `except Exception:` + pass that the awk pattern missed (e.g. multi-line or unusual whitespace). ~31 sites across automation_engine, database_update_worker, music_database, spotify_client, web_server, others. - 14 legitimate cleanup sites left silent with explicit `# noqa: S110` + comment explaining why (atexit handlers, finally-block conn.close calls). Logging during shutdown can itself crash because file handles get torn down before the handler fires. Also enables `S110` rule in pyproject.toml so this pattern fails CI going forward — drift fails at PR review instead of at runtime against a wedged worker thread. Tests path keeps S110 ignored (test fixtures legitimately use try-except-pass for cleanup). Adds a WHATS_NEW entry to helper.js summarizing the full #369 sweep. Verified: `python -m ruff check .` → All checks passed. Verified: `python -m pytest tests/` → 2188 passed. Closes #369	3 weeks ago
Broque Thomas	aa54bed818	Surface silent exceptions across remaining modules — ~70 sites Final sweep. Covers: - Downloads: candidates / lifecycle / master / monitor / wishlist_failed - Metadata: source / registry / cache / common / artwork (+ plex_client) - Imports: pipeline / resolution / file_ops / paths / guards - Library: path_resolver / retag / duplicate_cleaner - Stats / playlists / wishlist / discovery / automation / enrichment - Misc: hydrabase_client, soulsync_client, tag_writer, debug_info, api_call_tracker, album_consistency, beatport_unified_scraper, reorganize_runner, seasonal_discovery, lidarr_download_client, services/sync_service.py, automation_engine, automation/progress Two `_e` renames in imports/file_ops.py (outer scope binding `e`). A few finally-block sites in metadata/album_mbid_cache.py, library/track_identity.py, listening_stats_worker.py, watchlist/auto_scan.py left silent — same reason as the rest of the sweep (logger calls during cleanup paths can themselves raise). Refs #369	3 weeks ago
Broque Thomas	822759740d	Fix Download Discography pulling wrong artist + log routing Two fixes. (1) Discography endpoint now does server-side per-source ID resolution. When the user clicked Download Discography on a library artist, the endpoint received whichever artist ID the frontend happened to pick (spotify_artist_id || itunes_artist_id || deezer_id || library_db_id) and dispatched it as-is to whichever source it queried. If the picked ID didn't match the queried source's ID format, the lookup returned wrong-artist results (numeric ID collisions) or fell back to a fuzzy name search that picked a wrong artist. Two reproducible cases: - 50 Cent's library row had DB id 194687 — coincidentally a real Deezer artist ID for "Young Hot Rod". When the frontend's /enhanced fetch silently fell back to the DB id, the backend sent 194687 to Deezer, and Deezer returned Young Hot Rod's 50 albums in 50 Cent's discography modal. - Weird Al's library row had a stored Spotify ID. The frontend sent that to Deezer, which rejected the alphanumeric ID and fell back to fuzzy name search — which picked The Beatles somehow, returning 45 Beatles albums. The mechanism for per-source ID dispatch already exists in ``MetadataLookupOptions.artist_source_ids``, and the watchlist scanner already uses it; the on-demand discography endpoint just wasn't wired to it. Fix: when the URL artist_id matches a library row by ANY stored ID (DB id, spotify_artist_id, itunes_artist_id, deezer_id, or musicbrainz_id), pull every stored provider ID and pass them as ``artist_source_ids``. Each source gets its OWN stored ID regardless of which one the URL carries. When the URL ID is a non-library source-native ID and the row lookup misses entirely, behavior is identical to before (single-ID dispatch fallback). Logged the resolved per-source ID dict at INFO so future "wrong artist showed up" diagnostics are immediately legible in app.log. 
(2) Logger namespace fix in core/artists/quality.py and core/metadata/multi_source_search.py. Both modules used ``logging.getLogger(__name__)`` which resolves to ``core.artists.quality`` / ``core.metadata.multi_source_search`` — neither under the ``soulsync`` namespace where the file handler is wired. Result: every [Enhance], [MultiSourceSearch], and direct-lookup INFO line was being written to a logger with no handlers and silently dropped. App log showed the slow-request warning but no diagnostic detail. Switched both to ``get_logger()`` from utils.logging_config so the soulsync.* namespace picks them up. Same content, now actually lands in app.log. Confirmed working in live test: ``[Enhance] Direct lookup matched: deezer ID 1476162252 → 'Desastre'`` No behavior change in any other caller. Empty ``artist_source_ids`` (no library row matched) reaches lookup as ``None`` → identical to current single-ID dispatch path. Logger fix is pure routing — no content change.	3 weeks ago
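The per-source ID resolution described in fix (1) of commit 822759740d might look roughly like this. The function name, the dict-shaped library rows, and the placeholder ID strings are assumptions for illustration; only the column names and the "match by ANY stored ID, then hand each source its own ID" rule come from the commit message.

```python
# Maps a source name to the library column that stores that source's artist ID.
ID_COLUMNS = {
    "spotify": "spotify_artist_id",
    "itunes": "itunes_artist_id",
    "deezer": "deezer_id",
    "musicbrainz": "musicbrainz_id",
}


def resolve_artist_source_ids(url_artist_id, library_rows):
    """Return {source: stored_id} for the library row matching the URL ID, else None."""
    wanted = str(url_artist_id)
    for row in library_rows:
        stored = {str(row[col]) for col in ID_COLUMNS.values() if row.get(col)}
        stored.add(str(row.get("id")))  # the DB id also identifies the row
        if wanted in stored:
            # Each source receives its OWN stored ID, whichever ID the URL carried.
            return {src: str(row[col]) for src, col in ID_COLUMNS.items() if row.get(col)}
    return None  # no library row matched: caller keeps the single-ID dispatch path


# Placeholder IDs below are made up for the example.
rows = [{
    "id": 194687,
    "name": "50 Cent",
    "spotify_artist_id": "sp-placeholder",
    "itunes_artist_id": None,
    "deezer_id": "dz-placeholder",
    "musicbrainz_id": None,
}]
resolved = resolve_artist_source_ids(194687, rows)
```

With this shape, a DB id such as 194687 is never forwarded to Deezer; Deezer only ever sees the row's stored `deezer_id`.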
Broque Thomas	7316646b01	Extract multi-source search; Enhance Quality matches Redownload coverage Track Redownload had been doing parallel multi-source metadata search across every configured source the whole time; Enhance Quality was running a single-source primary fallback that returned junk matches with empty fields when the primary was iTunes (Discord report: "unknown artist - unknown album - unknown track" wishlist entries for users with neither Spotify nor Deezer connected). Lift the redownload search into core/metadata/multi_source_search.py and point both flows at it. Same scoring, same per-source query optimization (Deezer's structured artist:/track: form), same current-match flagging via stored source IDs. ArtistQualityDeps now takes get_metadata_search_sources (returns [(name, client), ...] for every configured source) instead of the single-primary get_metadata_fallback_client + get_metadata_fallback_source. Spotify direct-lookup stays as a fast-path optimization (only Spotify exposes get_track_details(id) returning rich raw payload); when it doesn't fire, the multi-source parallel search picks the cross-source best match. Empty-field matches still rejected before wishlist add. Tests: _build_deps helper updated to accept the new search_sources contract while preserving fallback_client/fallback_source ergonomics. Reframed tests for the new semantics — direct-lookup is no longer gated on Spotify being the active primary; failure reason now lists every searched source. Added a test pinning the no-sources-configured prompt. 17/17 quality tests green, 2128/2128 full suite green.	3 weeks ago
Broque Thomas	77c54ab7a7	Migrate discography + quality scanner to typed Album path Three more album-shape consumers now route through Album.from_<source>_dict() when caller passes a known source: - _build_discography_release_dict (artist discography cards) - _build_artist_detail_release_card (artist detail release cards) - _normalize_track_album (quality scanner result normalization) Legacy duck-typing stays as fallback for unknown source, non-dict input, or converter errors. Pure additive — existing callers without source kwarg unchanged.	3 weeks ago
Broque Thomas	967c7f7c0a	Migrate album-info builders to typed Album path Steps 2+3 of typed metadata migration. Two album-info builders now route through Album.from_<source>_dict() when caller passes a known source: - _build_album_info (album-tracks lookups) - _build_single_import_context_payload (single-track import context) Legacy duck-typing stays as fallback for unknown source, non-dict input, or converter errors. Pure additive — existing callers without source kwarg unchanged.	3 weeks ago
Broque Thomas	eab1297afc	Add Qobuz + Tidal album converters Audit caught two missing providers from the foundation PR. Both return album-shaped data via their clients (search + download flows). Tidal uses tidalapi objects rather than dicts so the converter is from_tidal_object, not _dict. Enrichment-only providers (lastfm/genius/acoustid/listenbrainz/audiodb) intentionally have no album converter — they enrich existing rows, never return album shapes. Tests: +8 cases. 40 total now.	3 weeks ago
Broque Thomas	529486a2d1	Foundation: typed Album/Track/Artist + per-provider converters New core/metadata/types.py with canonical dataclasses + classmethod converters for spotify/itunes/deezer/discogs/musicbrainz/hydrabase. Each converter is the single place that knows that provider's wire shape — addresses the duck-typing pattern Cin flagged. Pure additive: no consumer code changed. Follow-up PRs migrate consumers one at a time. Migration plan at docs/metadata-types-migration.md. Tests: 32 cases pin per-provider semantics + cross-provider invariants. Also stabilized a flaky discogs test that depended on local config state.	3 weeks ago
Broque Thomas	4b15fe0b75	Fix album MBID inconsistency: detector + persistent release-MBID cache Discord report (Samuel [KC]): tracks of the same album sometimes carry different MUSICBRAINZ_ALBUMID tags, which causes Navidrome (and other media servers grouping by album MBID) to split the album into multiple entries. Two-part fix — one for existing libraries, one for the root cause that lets new imports drift. Part 1 — Detector + fix action (catches existing dissenters): `core/repair_jobs/mbid_mismatch_detector.py`: - New helpers: `_read_album_mbid_from_file` and `_write_album_mbid_to_file` use the Picard-standard tag conventions (`TXXX:MusicBrainz Album Id` for MP3, `MUSICBRAINZ_ALBUMID` for FLAC/OGG, `----:com.apple.iTunes:MusicBrainz Album Id` for MP4). - New scan phase `_scan_album_mbid_consistency` runs after the existing track-MBID scan: groups tracks by DB `album_id`, reads each track's embedded album MBID, finds the consensus (most-common) MBID via `Counter`, flags dissenters. Tracks without an album MBID at all are skipped (they don't break Navidrome — only an explicit MBID disagreement does). Albums where MBIDs are perfectly tied (no clear consensus) are skipped too — surface as a manual decision instead of fixing toward a 1/N tie. - New finding type `album_mbid_mismatch` carries `consensus_mbid`, `wrong_mbid`, `consensus_count`, `total_tracks_with_mbid`, and a human-readable reason string. `core/repair_worker.py`: - Added `'album_mbid_mismatch': self._fix_album_mbid_mismatch` to the fix dispatch dict and to the `fixable_types` tuple so auto-fix + bulk-fix paths pick it up. - New `_fix_album_mbid_mismatch` method reads `consensus_mbid` from finding details, resolves the dissenter's file path via the shared library resolver, calls `_write_album_mbid_to_file` to rewrite the tag in place. Doesn't touch the album's other tracks (they're already in agreement). 
Part 2 — Root cause fix (prevents new SoulSync imports from drifting): The original in-memory `mb_release_cache` in `core/metadata/source.py` maps `(normalized_album, artist) -> release_mbid` so per-track enrichment of the same album hits the cache and writes the same MUSICBRAINZ_ALBUMID to every track. That cache is bounded (4096 entries) and in-process — so cache eviction (when other albums are processed in between) and server restart can BOTH cause inconsistency. Per-track album-name variation (e.g. some tracks tagged `"Album"`, others tagged `"Album (Deluxe)"`) and per-track artist variation (features) make it worse. `core/metadata/album_mbid_cache.py` (new module): - DB-backed `lookup(normalized_album, artist) -> release_mbid` and `record(...)` functions. Same key shape as the in-memory cache. - Strict additive design: every public function is wrapped in try/except and degrades to None / no-op on ANY database error. The existing in-memory cache + MusicBrainz lookup remains the authoritative fallback. If this module breaks, downloads continue exactly as they would today. `database/music_database.py`: - New `mb_album_release_cache` table with composite primary key `(normalized_album_key, artist_key)`. Reverse-lookup index on `release_mbid` for future debug tooling. Created via the existing `CREATE TABLE IF NOT EXISTS` migration pattern — idempotent, no schema version bump needed. `core/metadata/source.py`: - Surgical change inside the existing `embed_source_ids` in-memory-cache-miss branch: BEFORE calling MusicBrainz, consult the persistent cache. If a previous SoulSync run already resolved this album's release MBID, reuse it. After a successful MB lookup, store in BOTH caches. Both calls wrapped in defensive try/except so any failure falls through to existing logic. 
Tests: - `tests/metadata/test_album_mbid_cache.py` — 16 cache tests: round-trip, idempotent re-record, overwrite semantics, clear_all, album+artist independence (no Greatest Hits collisions), defensive None-on-empty-input, graceful degradation when the DB is unavailable / connection raises / commit fails, schema sanity (table + index exist after init). - `tests/test_album_mbid_consistency.py` — 13 detector tests: tag read/write round-trip on real FLAC files, Picard-standard tag descriptors, defensive paths (unreadable file, empty input), detector behavior (agreement → no flags, lone dissenter → flag, ties → no flag, single-track albums → skipped, no-MBID tracks → skipped, unresolvable file paths → skipped). - `tests/metadata/test_metadata_enrichment.py` — added autouse fixture monkeypatching the persistent cache to no-op for tests in this file. The existing tests pin per-call MB counts and in-memory cache state; without the fixture, persistent rows from earlier tests would bypass the MB call. Persistent layer has its own dedicated tests. Verified: 1782 tests pass (29 new), ruff clean, smoke test confirms end-to-end cache round-trip works. WHATS_NEW entry under '2.4.2' dev cycle.	3 weeks ago
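The consensus rule from Part 1 of commit 4b15fe0b75 can be sketched with `collections.Counter`. The function name and the path-to-MBID mapping are assumptions for this example; the real scan phase reads tags from files and groups by DB `album_id`. The skip rules (untagged tracks ignored, ties left for manual review) follow the commit message.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def find_album_mbid_dissenters(
    track_mbids: Dict[str, Optional[str]],
) -> Tuple[Optional[str], List[str]]:
    """Return (consensus_mbid, dissenting_paths) for one album.

    Returns (None, []) when there is nothing to flag: fewer than two tagged
    tracks, full agreement, or a tie for the most common MBID.
    """
    # Tracks without an album MBID are skipped entirely.
    tagged = {path: mbid for path, mbid in track_mbids.items() if mbid}
    if len(tagged) < 2:
        return None, []
    ranked = Counter(tagged.values()).most_common()
    if len(ranked) < 2:
        return None, []  # every tagged track already agrees
    if ranked[0][1] == ranked[1][1]:
        return None, []  # tie: no clear consensus, leave for a manual decision
    consensus = ranked[0][0]
    dissenters = sorted(path for path, mbid in tagged.items() if mbid != consensus)
    return consensus, dissenters
```

For an album where two files carry `m1`, one carries `m2`, and one has no tag, the sketch returns `m1` as consensus and flags only the `m2` file.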
Broque Thomas	34ba26f5c8	Persist source IDs at download time + backfill onto tracks on sync Followup to fix/watchlist-external-id-match. The companion PR closed the demand side — the watchlist scanner asks for tracks by external IDs before falling back to fuzzy. But for users on Plex / Jellyfin / Navidrome the supply side was still broken: tracks.spotify_track_id (and the other ID columns) only got populated by the asynchronous enrichment workers, sometimes hours after the file was actually written. During that window the ID match fell through to fuzzy and the bug returned. We were already collecting every ID during post-processing — they live in the `pp` dict in core/metadata/source.py:embed_source_ids and get embedded into file tags. We just dropped the in-memory copy afterwards. This PR persists them and uses them: - Schema migration adds spotify_track_id / itunes_track_id / deezer_track_id / tidal_track_id / qobuz_track_id / musicbrainz_recording_id / audiodb_id / soul_id / isrc columns + indexes to the existing track_downloads table (already keyed by file_path). - core/metadata/source.py:embed_source_ids exposes pp["id_tags"] and the resolved ISRC back to the import context as _embedded_id_tags / _isrc. - core/imports/side_effects.py:record_download_provenance reads those context fields and passes them to db.record_track_download, which now accepts the new ID kwargs and persists them. - New db.get_provenance_by_file_path with exact + basename-suffix fallback (handles container mount-root differences between download-time path and media-server-reported path). - New db.backfill_track_external_ids_from_provenance copies IDs from track_downloads onto a tracks row idempotently — COALESCE on every column preserves any value the enrichment worker already wrote (enrichment is more authoritative for late binding). 
- database/music_database.py:insert_or_update_media_track (the single insertion point used by every Plex / Jellyfin / Navidrome sync) calls the backfill immediately after each INSERT/UPDATE. - New core/library/track_identity.py:find_provenance_by_external_id used as a second-tier fallback in watchlist_scanner.is_track_missing_from_library — catches the window between download and media-server sync. Caller checks os.path.exists on the provenance file_path before treating it as "already in library" so a deleted file doesn't prevent re-download. Effect: freshly downloaded files become ID-recognizable to the watchlist on the very next scan, no enrichment-wait window. 19 regression tests in tests/test_provenance_id_persistence.py: - Schema migration adds expected columns + indexes - record_track_download persists every ID kwarg - record_track_download backward-compat (old kwargs still work) - get_provenance_by_file_path: exact match, basename fallback for mount-root differences, multi-record latest-wins, defensive None - backfill: copies all IDs, preserves existing via COALESCE, no-op when no provenance exists - find_provenance_by_external_id: per-ID lookup, ISRC cross-bridge, OR semantics, latest-wins on multiple matches Out of scope: backfilling provenance for files downloaded BEFORE this PR (their track_downloads rows don't carry the new IDs). Those continue to wait for enrichment. Acceptable — only affects historical files; new downloads benefit immediately. Full pytest 1625 passed; ruff clean.	3 weeks ago
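The COALESCE-based backfill in commit 34ba26f5c8 can be demonstrated against an in-memory SQLite database. The two-column schema and the `backfill_ids` helper are simplified assumptions for illustration; the real tables carry more ID columns and the real lookup also has a basename fallback. The point shown is that COALESCE fills only NULL columns, so values written earlier by enrichment survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracks (
    id INTEGER PRIMARY KEY, file_path TEXT, spotify_track_id TEXT, isrc TEXT
);
CREATE TABLE track_downloads (
    file_path TEXT, spotify_track_id TEXT, isrc TEXT
);
""")


def backfill_ids(conn, track_id, file_path):
    """Copy IDs from the download record onto the track row; return False if none exists."""
    row = conn.execute(
        "SELECT spotify_track_id, isrc FROM track_downloads WHERE file_path = ?",
        (file_path,),
    ).fetchone()
    if row is None:
        return False  # no provenance: nothing to copy
    conn.execute(
        """UPDATE tracks
           SET spotify_track_id = COALESCE(spotify_track_id, ?),
               isrc = COALESCE(isrc, ?)
           WHERE id = ?""",
        (row[0], row[1], track_id),
    )
    return True


# Track 1 already has a Spotify ID from enrichment; its ISRC is still NULL.
conn.execute("INSERT INTO tracks VALUES (1, '/music/a.flac', 'enriched-id', NULL)")
conn.execute("INSERT INTO track_downloads VALUES ('/music/a.flac', 'download-id', 'ISRC-EXAMPLE')")
backfill_ids(conn, 1, "/music/a.flac")
result = conn.execute("SELECT spotify_track_id, isrc FROM tracks WHERE id = 1").fetchone()
```

Running the helper a second time changes nothing, which is what makes it safe to call after every INSERT/UPDATE during a media-server sync.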
Antti Kettunen	b85a05fb88	Move image URL normalization into metadata helpers - keep existing /api/image-proxy URLs from being wrapped again - reuse the shared metadata package instead of duplicating URL logic in web_server.py - add regression coverage for proxy passthrough and internal URL normalization	3 weeks ago
Antti Kettunen	36131656dd	Make Spotify status updates event-driven - move Spotify status publishing onto auth, disconnect, and rate-limit transitions - keep dashboard and debug consumers on the shared cached snapshot - leave only the initial snapshot seed as a fallback probe	3 weeks ago
Antti Kettunen	cc13fb8f01	Move metadata status cache into core/metadata - move metadata-source and Spotify status caching out of web_server.py - keep the public /status payload unchanged while shrinking server-side glue - centralize invalidation and TTL handling in core/metadata/status.py	3 weeks ago
Antti Kettunen	e2bd0e1871	Split metadata source and Spotify status - Keep the primary metadata provider snapshot generic and move Spotify auth/rate-limit details into a separate status object. - Update the websocket fixture and dashboard/settings consumers to read the two buckets independently.	3 weeks ago
elmerohueso	cd19aa0301	revert tidal artist/track id name for hifi downloads Co-authored-by: Copilot <copilot@github.com>	3 weeks ago
elmerohueso	4ddb86522c	name tidal and hifi tags the same way	3 weeks ago
elmerohueso	e78dd7f593	get tidal tags during download, without needing to go through the enrichment pipeline	3 weeks ago
elmerohueso	1f4e8e5e3b	get hifi tags during download, without needing to go through the enrichment pipeline	3 weeks ago
elmerohueso	b363afe195	bpm for tidal, copyright and bpm for hifi	3 weeks ago
elmerohueso	f9f47f978e	fix post-download tagging, and enable tagging for hifi	3 weeks ago
Antti Kettunen	74e3cc460c	Simplify service status and labels - Flatten the Spotify service-status rendering so it shows rate-limit and recovery states explicitly, while otherwise displaying the active metadata provider directly. - Keep the Spotify auth controls and metadata-source picker aligned with the real session state after authenticate and disconnect flows. - Return "Unmapped" for unknown metadata source labels instead of implying iTunes. - Update the metadata registry tests to cover the new label fallback.	3 weeks ago
Antti Kettunen	55603be14c	Clarify Spotify auth flow and sync UI - Send Spotify auth completion back to the opener so the settings page refreshes immediately - Make the local auth flow go straight through to Spotify instead of showing the temporary instruction page - Keep the remote/docker instruction page available for manual callback setups - Sync Spotify status, connect/disconnect buttons, and metadata source selection after auth and disconnect - Keep the disconnect behavior aligned with the active primary metadata source	3 weeks ago
Antti Kettunen	9646f6ca7f	Clarify Spotify auth actions - Hide the auth button when a Spotify session is active - Treat disconnect as a session change, not a provider swap - Share metadata source labels in the registry - Tighten rate-limit copy around Spotify-specific behavior	3 weeks ago
Antti Kettunen	e6c2bee427	Move profile Spotify cache into registry - let core.metadata.registry own per-profile Spotify client caching - register the DB-backed profile credentials provider from web_server.py - invalidate only the affected profile cache entry on save, delete, and auth	4 weeks ago
Antti Kettunen	11be8834eb	Use metadata registry for web_server clients - make web_server.py read and refresh Spotify from core.metadata.registry - add single-key metadata cache eviction for Spotify reauth - export the new cache helper through the metadata package shims	4 weeks ago

58 Commits (dev)