From cceffbd8ecf9defead87c8e2e3a7fc3bed87ddd0 Mon Sep 17 00:00:00 2001
From: Broque Thomas <26755000+Nezreka@users.noreply.github.com>
Date: Wed, 6 May 2026 19:00:53 -0700
Subject: [PATCH] Honor manually-matched source IDs in per-source enrichment workers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

GitHub issue #501 (@Tacobell444). After manually matching an album to a
specific source ID via the match-chip UI, clicking "Enrich" on that album
would fuzzy-search by name and overwrite the manual match with whatever the
search returned — or revert the match status to ``not_found`` if the name
search missed. Reorganize would then read the now-wrong ID and move files
to the wrong destination.

Root cause was in the per-source enrichment workers'
``_process_*_individual`` methods. Several workers (Spotify, iTunes) ran
search-by-name unconditionally, with no check for an existing stored ID.
Others (Deezer, Tidal, Qobuz) skipped when an ID already existed but never
refreshed metadata — they preserved the ID without actually honoring the
user's intent of "use this match to pull fresh data".

Cin-shape lift: the same fix was needed in 5 workers, so the shared
behavior is extracted into ``core/enrichment/manual_match_honoring.py``:

    honor_stored_match(
        db, entity_table, entity_id, id_column,
        client_fetch_fn, on_match_fn, log_prefix,
    ) -> bool

Per-worker variability (DB column name, client fetch method, response
shape) plugs in via callbacks. Workers call the helper at the top of
``_process_album_individual`` / ``_process_track_individual``; if it
returns True, the manual match was honored and the search-by-name fallback
is skipped. If False (no stored ID, fetch failed, or empty response), the
worker's existing search-by-name flow runs as before.
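The contract described above can be sketched in a few lines of Python. This is a simplified stand-in, not the shipped implementation (the real helper also does the whitelisted SQL column read and the logging); the name ``honor_stored_match_sketch`` is illustrative only:

```python
from typing import Any, Callable, Optional


def honor_stored_match_sketch(
    stored_id: Optional[str],
    client_fetch_fn: Callable[[str], Any],
    on_match_fn: Callable[[str, Any], None],
) -> bool:
    """True -> manual match honored; caller skips search-by-name.
    False -> no stored ID / fetch failed / empty payload; caller
    falls through to its legacy search-by-name flow."""
    if not stored_id or not stored_id.strip():
        return False                      # empty string treated as no ID
    try:
        api_data = client_fetch_fn(stored_id)
    except Exception:
        return False                      # transient API failure: fall through
    if not api_data:
        return False                      # dead catalog entry: fall through
    on_match_fn(stored_id, api_data)      # exceptions here propagate
    return True


def failing_fetch(stored_id: str) -> Any:
    raise RuntimeError("API down")


refreshed = []
assert honor_stored_match_sketch(
    'SP-ABC', lambda i: {'id': i}, lambda i, d: refreshed.append(i)) is True
assert refreshed == ['SP-ABC']
assert honor_stored_match_sketch(None, dict, print) is False
assert honor_stored_match_sketch('SP-X', failing_fetch, print) is False
```

Note the asymmetry the sketch mirrors: fetch exceptions are swallowed (caller falls through to search), while ``on_match_fn`` exceptions propagate to the worker.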
Workers wired:
- spotify_worker — album + track (was overwriting; now honors)
- itunes_worker — album + track (was overwriting; now honors)
- deezer_worker — album + track (was skip-on-id; now refreshes)
- tidal_worker — album + track (was skip-on-id; now refreshes)
- qobuz_worker — album + track (was skip-on-id; now refreshes)

Workers left alone (already correct):
- discogs_worker — already had an inline stored-ID fast path that refreshes
  metadata. Same behavior, just inline; refactoring to use the shared helper
  would be churn for zero behavior change.
- audiodb_worker — same — inline fast path with full metadata refresh.
- musicbrainz_worker — preserves the existing MBID and marks status, which
  is the correct behavior for MB (the MBID itself is the match payload — no
  separate metadata fetch).
- lastfm_worker / genius_worker — name-based services with no
  source-specific IDs to honor. Inherent re-search per call.

Reorganize fixed indirectly — it always honored stored IDs correctly via
``library_reorganize._extract_source_ids``. The "Reorganize broken" symptom
was downstream of broken Enrich corrupting the stored ID.

Tests:
- ``tests/enrichment/test_manual_match_honoring.py`` — 11 tests pinning the
  shared helper contract: stored-ID fast path, no-ID fallthrough,
  empty-string treated as no ID, missing row, fetch exception caught and
  falls through, fetch returns None falls through, callback exceptions
  propagate, configurable table + column, defensive table-name whitelist.
- Per-worker wiring NOT tested individually — the workers depend on live
  DB / client objects that are heavy to mock. The shared helper's contract
  is pinned; per-worker call sites are short enough to verify by code
  review.

2173/2173 full suite green.

Closes #501.
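The call-site pattern the wired workers share (fast path first, search-by-name only on fallthrough) can be illustrated with a hypothetical stand-in worker. All names here are illustrative, not the shipped worker code; ``honor_fn`` stands in for ``honor_stored_match``:

```python
from typing import Any, Callable, Dict


class DemoWorker:
    """Hypothetical per-source worker skeleton; not shipped code."""

    def __init__(self, honor_fn: Callable[..., bool]):
        self.honor_fn = honor_fn          # stands in for honor_stored_match
        self.stats = {'matched': 0, 'searched': 0}

    def _search_by_name(self, item: Dict[str, Any]) -> None:
        # Legacy fallback: the fuzzy search that used to overwrite
        # manual matches when run unconditionally.
        self.stats['searched'] += 1

    def _process_album_individual(self, item: Dict[str, Any]) -> None:
        # Issue #501 pattern: try the stored-ID fast path first...
        if self.honor_fn(entity_id=item['id']):
            self.stats['matched'] += 1
            return                        # manual match honored; no search
        # ...and only run search-by-name for never-matched entities.
        self._search_by_name(item)


matched = DemoWorker(lambda **kw: True)
matched._process_album_individual({'id': 42})
unmatched = DemoWorker(lambda **kw: False)
unmatched._process_album_individual({'id': 7})
assert matched.stats == {'matched': 1, 'searched': 0}
assert unmatched.stats == {'matched': 0, 'searched': 1}
```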
--- core/deezer_worker.py | 41 ++- core/enrichment/manual_match_honoring.py | 128 ++++++++ core/itunes_worker.py | 51 +++ core/qobuz_worker.py | 36 ++- core/spotify_worker.py | 55 ++++ core/tidal_worker.py | 40 ++- tests/enrichment/__init__.py | 0 .../enrichment/test_manual_match_honoring.py | 302 ++++++++++++++++++ webui/static/helper.js | 14 + 9 files changed, 645 insertions(+), 22 deletions(-) create mode 100644 core/enrichment/manual_match_honoring.py create mode 100644 tests/enrichment/__init__.py create mode 100644 tests/enrichment/test_manual_match_honoring.py diff --git a/core/deezer_worker.py b/core/deezer_worker.py index 4940d16d..bdaf2fb2 100644 --- a/core/deezer_worker.py +++ b/core/deezer_worker.py @@ -9,6 +9,7 @@ from utils.logging_config import get_logger from database.music_database import MusicDatabase from core.deezer_client import DeezerClient from core.worker_utils import interruptible_sleep, set_album_api_track_count +from core.enrichment.manual_match_honoring import honor_stored_match logger = get_logger("deezer_worker") @@ -383,11 +384,33 @@ class DeezerWorker: self.stats['not_found'] += 1 logger.debug(f"No match for artist '{artist_name}'") + def _refresh_album_via_stored_id(self, album_id, stored_id, full_album_dict): + """Issue #501 callback. Stored ID exists → fetched full Deezer + album payload. 
Use it as both args to ``_update_album`` (search- + result and full-data shapes overlap on the fields we need — + artist verification skipped since manual match presumably + already vetted).""" + self._update_album(album_id, full_album_dict, full_album_dict) + + def _refresh_track_via_stored_id(self, track_id, stored_id, full_track_dict): + """Issue #501 callback for tracks — same pattern as albums.""" + self._update_track(track_id, full_track_dict, full_track_dict) + def _process_album(self, album_id: int, album_name: str, artist_name: str, item: Dict[str, Any]): """Process an album: search Deezer, verify, fetch full details, store metadata""" - existing_id = self._get_existing_id('album', album_id) - if existing_id: - logger.debug(f"Preserving existing Deezer ID for album '{album_name}': {existing_id}") + # Issue #501: honor manual matches. Pre-fix this method just + # SKIPPED when a stored ID was present (preserved the ID but + # never refreshed metadata). Now it goes through the full + # refresh path via the stored ID, picking up label / genres / + # explicit updates without ever overwriting the manual match. + if honor_stored_match( + db=self.db, entity_table='albums', entity_id=album_id, + id_column='deezer_id', + client_fetch_fn=self.client.get_album_raw, + on_match_fn=self._refresh_album_via_stored_id, + log_prefix='Deezer', + ): + self.stats['matched'] += 1 return result = self.client.search_album(artist_name, album_name) @@ -430,9 +453,15 @@ class DeezerWorker: def _process_track(self, track_id: int, track_name: str, artist_name: str, item: Dict[str, Any]): """Process a track: search Deezer, verify, fetch full details for BPM, store metadata""" - existing_id = self._get_existing_id('track', track_id) - if existing_id: - logger.debug(f"Preserving existing Deezer ID for track '{track_name}': {existing_id}") + # Issue #501: honor manual matches (see _process_album). 
+ if honor_stored_match( + db=self.db, entity_table='tracks', entity_id=track_id, + id_column='deezer_id', + client_fetch_fn=self.client.get_track_raw, + on_match_fn=self._refresh_track_via_stored_id, + log_prefix='Deezer', + ): + self.stats['matched'] += 1 return result = self.client.search_track(artist_name, track_name) diff --git a/core/enrichment/manual_match_honoring.py b/core/enrichment/manual_match_honoring.py new file mode 100644 index 00000000..734d2cd4 --- /dev/null +++ b/core/enrichment/manual_match_honoring.py @@ -0,0 +1,128 @@ +"""Honor manually-matched source IDs in per-source enrichment workers. + +GitHub issue #501 (@Tacobell444): several per-source enrichment workers' +``_process_*_individual`` methods ran a fuzzy text search on the album / +track name and overwrote the stored source ID with whatever the search +returned (others preserved the stored ID but skipped the metadata +refresh). If the user had manually matched an album to a specific source +ID (e.g. set ``albums.spotify_album_id = 'ABC'`` via the match-chip UI), +the next "Enrich" click would search by name → pick a different result +→ overwrite the manual match with the wrong ID, OR fail to match +anything and revert the status to ``not_found``. + +This module lifts the "honor stored ID" fast path into one shared +helper. Each wired per-source worker (Spotify / iTunes / Deezer / +Tidal / Qobuz) calls it before falling back +to its existing search-by-name flow. The same fix in 5 workers gets +exactly one implementation; per-worker variability (column name, +client fetch method, response shape) plugs in via callbacks. + +Lift what's truly shared. Caller knows its own column + client +method + update logic; the helper just orchestrates. 
+""" + +from __future__ import annotations + +from typing import Any, Callable, Optional + +from utils.logging_config import get_logger + +logger = get_logger("enrichment.manual_match_honoring") + + +def _read_id_column(db, entity_table: str, entity_id, id_column: str) -> Optional[str]: + """Read the stored source ID for one entity. Returns None when the + column is empty / unset.""" + if entity_table not in ('albums', 'tracks', 'artists'): + # Defensive: we only operate on these three. Avoids SQL injection + # via a bad table name (id_column is also restricted to known + # column names by callers but defense in depth never hurts). + return None + conn = db._get_connection() + try: + cursor = conn.cursor() + cursor.execute( + f"SELECT {id_column} FROM {entity_table} WHERE id = ?", + (entity_id,), + ) + row = cursor.fetchone() + finally: + conn.close() + if not row: + return None + value = row[0] if not hasattr(row, 'keys') else row[id_column] + return str(value).strip() if value else None + + +def honor_stored_match( + *, + db, + entity_table: str, + entity_id, + id_column: str, + client_fetch_fn: Callable[[str], Any], + on_match_fn: Callable[[Any, str, Any], None], + log_prefix: str = '', +) -> bool: + """Fast-path enrichment via a stored source ID — preserves manual + matches. + + Args: + db: ``MusicDatabase`` instance (for the column read). + entity_table: ``'albums'``, ``'tracks'``, or ``'artists'``. + entity_id: Library DB ID of the entity to enrich. + id_column: Column on ``entity_table`` that stores the source- + specific ID (``spotify_album_id`` / ``itunes_album_id`` / + ``deezer_id`` / etc). + client_fetch_fn: Callable taking the stored ID and returning + the source's raw response (Album dataclass, dict, or + whatever the client returns). Typically + ``self.client.get_album`` or ``self.client.get_track``. + on_match_fn: Worker callback invoked with + ``(entity_id, stored_id, api_response)`` to apply the + metadata refresh. 
Worker knows the response shape; helper + doesn't. + log_prefix: Display name for log lines (``'Spotify'`` / + ``'iTunes'`` / etc). + + Returns: + True if a stored ID was found AND the fetch returned data AND + the on-match callback ran. Caller skips its search-by-name + flow and counts a match. + + False if no stored ID is set, the fetch failed, or the fetch + returned empty. Caller falls through to its existing search- + by-name flow (the legacy behavior for un-matched entities). + + Notes: + - Exceptions in ``client_fetch_fn`` are caught and logged at + warning level — caller falls through to search. + - Exceptions in ``on_match_fn`` propagate (those are real + DB errors the worker should know about). + """ + stored_id = _read_id_column(db, entity_table, entity_id, id_column) + if not stored_id: + return False + + try: + api_data = client_fetch_fn(stored_id) + except Exception as exc: + logger.warning( + f"[{log_prefix}] Stored-ID fetch failed for " + f"{entity_table[:-1]} #{entity_id} (id={stored_id}): {exc}" + ) + return False + + if not api_data: + logger.debug( + f"[{log_prefix}] Stored ID {stored_id} for " + f"{entity_table[:-1]} #{entity_id} returned empty data — " + f"falling through to search-by-name" + ) + return False + + on_match_fn(entity_id, stored_id, api_data) + logger.info( + f"[{log_prefix}] Honored manual match: " + f"{entity_table[:-1]} #{entity_id} → {id_column}={stored_id}" + ) + return True diff --git a/core/itunes_worker.py b/core/itunes_worker.py index 1e1c2e9d..eebca734 100644 --- a/core/itunes_worker.py +++ b/core/itunes_worker.py @@ -3,12 +3,14 @@ import re import threading import time from difflib import SequenceMatcher +from types import SimpleNamespace from typing import Optional, Dict, Any, List from datetime import datetime, timedelta from utils.logging_config import get_logger from database.music_database import MusicDatabase from core.itunes_client import iTunesClient from core.worker_utils import interruptible_sleep, 
set_album_api_track_count +from core.enrichment.manual_match_honoring import honor_stored_match logger = get_logger("itunes_worker") @@ -526,11 +528,49 @@ class iTunesWorker: # ── Individual fallback processing ───────────────────────────────── + def _refresh_album_via_stored_id(self, album_id, stored_id, api_album_dict): + """Issue #501 callback. Convert ``client.get_album()`` dict into + the Album-shaped object ``_update_album`` expects, then call it. + Preserves the manual match — never overwrites the stored ID + with a different name-search result.""" + images = api_album_dict.get('images') or [] + image_url = '' + if images and isinstance(images[0], dict): + image_url = images[0].get('url', '') or '' + adapter = SimpleNamespace( + id=api_album_dict.get('id') or stored_id, + name=api_album_dict.get('name', ''), + image_url=image_url, + album_type=api_album_dict.get('album_type', 'album'), + release_date=api_album_dict.get('release_date', ''), + total_tracks=api_album_dict.get('total_tracks', 0), + ) + self._update_album(album_id, adapter) + + def _refresh_track_via_stored_id(self, track_id, stored_id, api_track_dict): + """Track-level callback — track update only writes ID + status, + no metadata backfill, so the dict shape is irrelevant beyond + carrying the stored ID through.""" + adapter = SimpleNamespace(id=api_track_dict.get('id') or stored_id) + self._update_track_from_search(track_id, adapter) + def _process_album_individual(self, item: Dict[str, Any]): album_id = item['id'] album_name = item['name'] artist_name = item.get('artist', '') + # Issue #501: honor manual matches (see SpotifyWorker for full + # explanation — same pattern across every per-source worker). 
+ if honor_stored_match( + db=self.db, entity_table='albums', entity_id=album_id, + id_column='itunes_album_id', + client_fetch_fn=self.client.get_album, + on_match_fn=self._refresh_album_via_stored_id, + log_prefix='iTunes', + ): + self.stats['matched'] += 1 + return + query = f"{artist_name} {album_name}" if artist_name else album_name results = self.client.search_albums(query, limit=5) @@ -561,6 +601,17 @@ class iTunesWorker: track_name = item['name'] artist_name = item.get('artist', '') + # Issue #501: honor manual matches. + if honor_stored_match( + db=self.db, entity_table='tracks', entity_id=track_id, + id_column='itunes_track_id', + client_fetch_fn=self.client.get_track_details, + on_match_fn=self._refresh_track_via_stored_id, + log_prefix='iTunes', + ): + self.stats['matched'] += 1 + return + query = f"{artist_name} {track_name}" if artist_name else track_name results = self.client.search_tracks(query, limit=5) diff --git a/core/qobuz_worker.py b/core/qobuz_worker.py index 5e85c508..31d26c78 100644 --- a/core/qobuz_worker.py +++ b/core/qobuz_worker.py @@ -9,6 +9,7 @@ from utils.logging_config import get_logger from database.music_database import MusicDatabase from core.qobuz_client import _qobuz_is_rate_limited from core.worker_utils import interruptible_sleep +from core.enrichment.manual_match_honoring import honor_stored_match logger = get_logger("qobuz_worker") @@ -426,12 +427,26 @@ class QobuzWorker: self.stats['not_found'] += 1 logger.debug(f"No match for artist '{artist_name}'") + def _refresh_album_via_stored_id(self, album_id, stored_id, full_album_dict): + """Issue #501 callback. 
Same shape as Tidal/Deezer — pass the + full-album dict in both arg slots.""" + self._update_album(album_id, full_album_dict, full_album_dict) + + def _refresh_track_via_stored_id(self, track_id, stored_id, full_track_dict): + self._update_track(track_id, full_track_dict, full_track_dict) + def _process_album(self, album_id: int, album_name: str, artist_name: str, item: Dict[str, Any]): """Process an album: search Qobuz, verify, fetch full details, store metadata""" - existing_id = self._get_existing_id('album', album_id) - if existing_id: - logger.debug(f"Preserving existing Qobuz ID for album '{album_name}': {existing_id}") - self._mark_status('album', album_id, 'matched') + # Issue #501: honor manual matches. Pre-fix this just marked + # status='matched' without refreshing metadata. + if honor_stored_match( + db=self.db, entity_table='albums', entity_id=album_id, + id_column='qobuz_id', + client_fetch_fn=self.client.get_album, + on_match_fn=self._refresh_album_via_stored_id, + log_prefix='Qobuz', + ): + self.stats['matched'] += 1 return result = self.client.search_album(artist_name, album_name) @@ -484,10 +499,15 @@ class QobuzWorker: def _process_track(self, track_id: int, track_name: str, artist_name: str, item: Dict[str, Any]): """Process a track: search Qobuz, verify, fetch full details, store metadata""" - existing_id = self._get_existing_id('track', track_id) - if existing_id: - logger.debug(f"Preserving existing Qobuz ID for track '{track_name}': {existing_id}") - self._mark_status('track', track_id, 'matched') + # Issue #501: honor manual matches. 
+ if honor_stored_match( + db=self.db, entity_table='tracks', entity_id=track_id, + id_column='qobuz_id', + client_fetch_fn=self.client.get_track, + on_match_fn=self._refresh_track_via_stored_id, + log_prefix='Qobuz', + ): + self.stats['matched'] += 1 return result = self.client.search_track(artist_name, track_name) diff --git a/core/spotify_worker.py b/core/spotify_worker.py index a3e9cd89..3fa503c4 100644 --- a/core/spotify_worker.py +++ b/core/spotify_worker.py @@ -3,12 +3,14 @@ import re import threading import time from difflib import SequenceMatcher +from types import SimpleNamespace from typing import Optional, Dict, Any, List from datetime import datetime, date, timedelta from utils.logging_config import get_logger from database.music_database import MusicDatabase from core.spotify_client import SpotifyClient, SpotifyRateLimitError from core.worker_utils import interruptible_sleep, set_album_api_track_count +from core.enrichment.manual_match_honoring import honor_stored_match logger = get_logger("spotify_worker") @@ -630,11 +632,53 @@ class SpotifyWorker: # ── Individual fallback processing ───────────────────────────────── + def _refresh_album_via_stored_id(self, album_id, stored_id, api_album_dict): + """``honor_stored_match`` callback. Wraps the dict from + ``client.get_album(stored_id)`` in a SimpleNamespace adapter + with the attributes ``_update_album`` reads, then calls it. 
+ Preserves the manual match — never reaches search-by-name.""" + images = api_album_dict.get('images') or [] + image_url = '' + if images and isinstance(images[0], dict): + image_url = images[0].get('url', '') or '' + adapter = SimpleNamespace( + id=api_album_dict.get('id') or stored_id, + name=api_album_dict.get('name', ''), + image_url=image_url, + album_type=api_album_dict.get('album_type', 'album'), + release_date=api_album_dict.get('release_date', ''), + total_tracks=api_album_dict.get('total_tracks', 0), + ) + self._update_album(album_id, adapter) + + def _refresh_track_via_stored_id(self, track_id, stored_id, api_track_dict): + """``honor_stored_match`` callback for tracks. The track-level + update only writes the ID + match status — no metadata + backfill, so the dict shape is irrelevant beyond carrying the + stored ID through.""" + adapter = SimpleNamespace(id=api_track_dict.get('id') or stored_id) + self._update_track_from_search(track_id, adapter) + def _process_album_individual(self, item: Dict[str, Any]): album_id = item['id'] album_name = item['name'] artist_name = item.get('artist', '') + # Issue #501: honor manual matches. If the user has already + # set spotify_album_id on this album row (via match-chip UI), + # refresh metadata via that ID and skip search-by-name — which + # would otherwise overwrite the manual match with whatever + # name-search returned. + if honor_stored_match( + db=self.db, entity_table='albums', entity_id=album_id, + id_column='spotify_album_id', + client_fetch_fn=self.client.get_album, + on_match_fn=self._refresh_album_via_stored_id, + log_prefix='Spotify', + ): + self.stats['matched'] += 1 + return + query = f"{artist_name} {album_name}" if artist_name else album_name results = self.client.search_albums(query, limit=5) @@ -672,6 +716,17 @@ class SpotifyWorker: track_name = item['name'] artist_name = item.get('artist', '') + # Issue #501: honor manual matches (see _process_album_individual). 
+ if honor_stored_match( + db=self.db, entity_table='tracks', entity_id=track_id, + id_column='spotify_track_id', + client_fetch_fn=self.client.get_track_details, + on_match_fn=self._refresh_track_via_stored_id, + log_prefix='Spotify', + ): + self.stats['matched'] += 1 + return + query = f"{artist_name} {track_name}" if artist_name else track_name results = self.client.search_tracks(query, limit=5) diff --git a/core/tidal_worker.py b/core/tidal_worker.py index fa578323..a9f7d41c 100644 --- a/core/tidal_worker.py +++ b/core/tidal_worker.py @@ -8,6 +8,7 @@ from utils.logging_config import get_logger from database.music_database import MusicDatabase from core.tidal_client import TidalClient from core.worker_utils import interruptible_sleep +from core.enrichment.manual_match_honoring import honor_stored_match logger = get_logger("tidal_worker") @@ -435,12 +436,30 @@ class TidalWorker: self.stats['not_found'] += 1 logger.debug(f"No match for artist '{artist_name}'") + def _refresh_album_via_stored_id(self, album_id, stored_id, full_album_dict): + """Issue #501 callback. Stored ID exists → fetched full Tidal + album → call ``_update_album`` with the dict in both arg slots + (search-result and full-data shapes overlap on the fields we + need).""" + self._update_album(album_id, full_album_dict, full_album_dict) + + def _refresh_track_via_stored_id(self, track_id, stored_id, full_track_dict): + """Issue #501 callback for tracks — same pattern as albums.""" + self._update_track(track_id, full_track_dict, full_track_dict) + def _process_album(self, album_id: int, album_name: str, artist_name: str, item: Dict[str, Any]): """Process an album: search Tidal, verify, fetch full details, store metadata""" - existing_id = self._get_existing_id('album', album_id) - if existing_id: - logger.debug(f"Preserving existing Tidal ID for album '{album_name}': {existing_id}") - self._mark_status('album', album_id, 'matched') + # Issue #501: honor manual matches. 
Pre-fix this just marked + # status='matched' without refreshing metadata. Now goes + # through the full refresh path via the stored ID. + if honor_stored_match( + db=self.db, entity_table='albums', entity_id=album_id, + id_column='tidal_id', + client_fetch_fn=self.client.get_album, + on_match_fn=self._refresh_album_via_stored_id, + log_prefix='Tidal', + ): + self.stats['matched'] += 1 return result = self.client.search_album(artist_name, album_name) @@ -486,10 +505,15 @@ class TidalWorker: def _process_track(self, track_id: int, track_name: str, artist_name: str, item: Dict[str, Any]): """Process a track: search Tidal, verify, fetch full details, store metadata""" - existing_id = self._get_existing_id('track', track_id) - if existing_id: - logger.debug(f"Preserving existing Tidal ID for track '{track_name}': {existing_id}") - self._mark_status('track', track_id, 'matched') + # Issue #501: honor manual matches. + if honor_stored_match( + db=self.db, entity_table='tracks', entity_id=track_id, + id_column='tidal_id', + client_fetch_fn=self.client.get_track, + on_match_fn=self._refresh_track_via_stored_id, + log_prefix='Tidal', + ): + self.stats['matched'] += 1 return result = self.client.search_track(artist_name, track_name) diff --git a/tests/enrichment/__init__.py b/tests/enrichment/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/enrichment/test_manual_match_honoring.py b/tests/enrichment/test_manual_match_honoring.py new file mode 100644 index 00000000..c6031461 --- /dev/null +++ b/tests/enrichment/test_manual_match_honoring.py @@ -0,0 +1,302 @@ +"""Tests for ``core.enrichment.manual_match_honoring.honor_stored_match``. + +The helper is the shared "fast-path enrichment via stored source ID" +used by every per-source enrichment worker (Spotify / iTunes / Deezer +/ Discogs / MusicBrainz / AudioDB / Tidal / Qobuz). 
It reads the stored ID from a configurable column, fetches via a caller-supplied client method, and invokes a caller-supplied update callback. Pin the contract so per-worker wiring can rely on uniform semantics. + +Issue #501: enrichment workers were running fuzzy name search and +overwriting manually-set source IDs. This helper is the lift point — +all 5 wired workers plug in the same way. +""" + +from __future__ import annotations + +import sqlite3 +from unittest.mock import MagicMock + +import pytest + +from core.enrichment.manual_match_honoring import honor_stored_match + + +# --------------------------------------------------------------------------- +# Fake DB fixture (just enough to exercise _read_id_column) +# --------------------------------------------------------------------------- + + +class _FakeConn: + """Minimal sqlite3.Connection-like that the helper can use.""" + + def __init__(self, real_conn): + self._real = real_conn + + def cursor(self): + return self._real.cursor() + + def close(self): + # Don't actually close — tests share the connection. + pass + + +class _FakeDB: + """Stand-in MusicDatabase. Supports ``_get_connection()`` returning + a wrapper that doesn't close, so per-test in-memory state survives + across helper invocations.""" + + def __init__(self): + self._conn = sqlite3.connect(":memory:") + self._conn.row_factory = sqlite3.Row + cur = self._conn.cursor() + cur.execute(""" + CREATE TABLE albums ( + id INTEGER PRIMARY KEY, + spotify_album_id TEXT, + deezer_id TEXT, + itunes_album_id TEXT + ) + """) + cur.execute(""" + CREATE TABLE tracks ( + id INTEGER PRIMARY KEY, + spotify_track_id TEXT, + deezer_id TEXT + ) + """) + self._conn.commit() + + def insert_album(self, album_id, **id_columns): + cols = ['id'] + list(id_columns.keys()) + placeholders = ','.join('?' 
for _ in cols) + values = [album_id] + list(id_columns.values()) + self._conn.execute( + f"INSERT INTO albums ({','.join(cols)}) VALUES ({placeholders})", + values, + ) + self._conn.commit() + + def insert_track(self, track_id, **id_columns): + cols = ['id'] + list(id_columns.keys()) + placeholders = ','.join('?' for _ in cols) + values = [track_id] + list(id_columns.values()) + self._conn.execute( + f"INSERT INTO tracks ({','.join(cols)}) VALUES ({placeholders})", + values, + ) + self._conn.commit() + + def _get_connection(self): + return _FakeConn(self._conn) + + +@pytest.fixture +def db(): + return _FakeDB() + + +# --------------------------------------------------------------------------- +# Stored-ID fast path (the new behavior) +# --------------------------------------------------------------------------- + + +def test_honors_stored_id_when_present(db): + """Pin: stored ID found → fetch → on_match called → returns True. + Caller skips its search-by-name flow.""" + db.insert_album(42, spotify_album_id='SP-ABC') + api_payload = {'id': 'SP-ABC', 'name': 'Real Album'} + fetch = MagicMock(return_value=api_payload) + on_match = MagicMock() + + result = honor_stored_match( + db=db, entity_table='albums', entity_id=42, + id_column='spotify_album_id', + client_fetch_fn=fetch, on_match_fn=on_match, + log_prefix='Spotify', + ) + + assert result is True + fetch.assert_called_once_with('SP-ABC') + on_match.assert_called_once_with(42, 'SP-ABC', api_payload) + + +def test_returns_false_when_no_stored_id(db): + """Pin: no stored ID → returns False, no fetch attempted, no + callback. 
Caller proceeds with its search-by-name fallback.""" + db.insert_album(42, spotify_album_id=None) + fetch = MagicMock() + on_match = MagicMock() + + result = honor_stored_match( + db=db, entity_table='albums', entity_id=42, + id_column='spotify_album_id', + client_fetch_fn=fetch, on_match_fn=on_match, + log_prefix='Spotify', + ) + + assert result is False + fetch.assert_not_called() + on_match.assert_not_called() + + +def test_returns_false_when_stored_id_empty_string(db): + """Pin: empty string treated same as NULL — no fetch, return False.""" + db.insert_album(42, spotify_album_id='') + fetch = MagicMock() + on_match = MagicMock() + + result = honor_stored_match( + db=db, entity_table='albums', entity_id=42, + id_column='spotify_album_id', + client_fetch_fn=fetch, on_match_fn=on_match, + ) + + assert result is False + fetch.assert_not_called() + + +def test_returns_false_when_entity_not_in_db(db): + """Pin: missing row → returns False, no fetch, no callback.""" + fetch = MagicMock() + on_match = MagicMock() + + result = honor_stored_match( + db=db, entity_table='albums', entity_id=999, + id_column='spotify_album_id', + client_fetch_fn=fetch, on_match_fn=on_match, + ) + + assert result is False + fetch.assert_not_called() + + +# --------------------------------------------------------------------------- +# Failure paths — fall through to search instead of crashing the worker +# --------------------------------------------------------------------------- + + +def test_returns_false_when_fetch_raises(db): + """Pin: client.get_X(stored_id) raises → caught, logged at warning, + returns False so caller falls through to search. 
Worker must not
+    crash on a transient API failure."""
+    db.insert_album(42, spotify_album_id='SP-ABC')
+    fetch = MagicMock(side_effect=RuntimeError("API down"))
+    on_match = MagicMock()
+
+    result = honor_stored_match(
+        db=db, entity_table='albums', entity_id=42,
+        id_column='spotify_album_id',
+        client_fetch_fn=fetch, on_match_fn=on_match,
+    )
+
+    assert result is False
+    on_match.assert_not_called()
+
+
+def test_returns_false_when_fetch_returns_none(db):
+    """Pin: stored ID points at a removed/invalid catalog entry →
+    fetch returns None → falls through to search instead of writing
+    junk to DB."""
+    db.insert_album(42, spotify_album_id='SP-DEAD')
+    fetch = MagicMock(return_value=None)
+    on_match = MagicMock()
+
+    result = honor_stored_match(
+        db=db, entity_table='albums', entity_id=42,
+        id_column='spotify_album_id',
+        client_fetch_fn=fetch, on_match_fn=on_match,
+    )
+
+    assert result is False
+    on_match.assert_not_called()
+
+
+def test_returns_false_when_fetch_returns_empty_dict(db):
+    """Pin: empty dict treated same as None — falsy result skips
+    callback."""
+    db.insert_album(42, spotify_album_id='SP-EMPTY')
+    fetch = MagicMock(return_value={})
+    on_match = MagicMock()
+
+    result = honor_stored_match(
+        db=db, entity_table='albums', entity_id=42,
+        id_column='spotify_album_id',
+        client_fetch_fn=fetch, on_match_fn=on_match,
+    )
+
+    assert result is False
+    on_match.assert_not_called()
+
+
+def test_on_match_exceptions_propagate(db):
+    """Pin: exceptions inside on_match (DB write errors) propagate to
+    the worker — they're real errors the worker should surface, not
+    swallowed silently."""
+    db.insert_album(42, spotify_album_id='SP-ABC')
+    fetch = MagicMock(return_value={'id': 'SP-ABC'})
+    on_match = MagicMock(side_effect=ValueError("bad write"))
+
+    with pytest.raises(ValueError, match="bad write"):
+        honor_stored_match(
+            db=db, entity_table='albums', entity_id=42,
+            id_column='spotify_album_id',
+            client_fetch_fn=fetch, on_match_fn=on_match,
+        )
+
+
+# ---------------------------------------------------------------------------
+# Per-table / per-column wiring
+# ---------------------------------------------------------------------------
+
+
+def test_works_with_tracks_table(db):
+    """Pin: ``entity_table='tracks'`` works the same as 'albums' — the
+    helper is generic across both."""
+    db.insert_track(7, spotify_track_id='SP-T-1')
+    fetch = MagicMock(return_value={'id': 'SP-T-1'})
+    on_match = MagicMock()
+
+    result = honor_stored_match(
+        db=db, entity_table='tracks', entity_id=7,
+        id_column='spotify_track_id',
+        client_fetch_fn=fetch, on_match_fn=on_match,
+    )
+
+    assert result is True
+    fetch.assert_called_once_with('SP-T-1')
+
+
+def test_works_with_alternate_columns(db):
+    """Pin: ``id_column`` is configurable so each worker reads its own
+    column (deezer_id, itunes_album_id, etc)."""
+    db.insert_album(42, deezer_id='12345', itunes_album_id='IT-99')
+    fetch = MagicMock(return_value={'id': '12345'})
+    on_match = MagicMock()
+
+    # Read the deezer_id column even though spotify_album_id exists too.
+    result = honor_stored_match(
+        db=db, entity_table='albums', entity_id=42,
+        id_column='deezer_id',
+        client_fetch_fn=fetch, on_match_fn=on_match,
+    )
+
+    assert result is True
+    fetch.assert_called_once_with('12345')
+
+
+def test_rejects_invalid_table_name(db):
+    """Pin: defensive — only known tables (albums/tracks/artists)
+    accepted. Avoids SQL injection via crafted table name even though
+    every caller is hard-coded."""
+    fetch = MagicMock()
+    on_match = MagicMock()
+
+    result = honor_stored_match(
+        db=db, entity_table='not_a_real_table; DROP TABLE albums',
+        entity_id=1, id_column='spotify_album_id',
+        client_fetch_fn=fetch, on_match_fn=on_match,
+    )
+
+    assert result is False
+    fetch.assert_not_called()
diff --git a/webui/static/helper.js b/webui/static/helper.js
index 878276fb..1a99afed 100644
--- a/webui/static/helper.js
+++ b/webui/static/helper.js
@@ -3432,6 +3432,7 @@ const WHATS_NEW = {
     '2.4.2': [
         // --- post-2.4.1 dev work — entries hidden by _getLatestWhatsNewVersion until the build version bumps ---
         { date: 'Unreleased — 2.4.2 dev cycle' },
+        { title: 'Fix: Enrich Honors Manual Album Matches', desc: 'github issue #501 (tacobell444): if you manually matched an album to a specific source ID via the match-chip UI, then clicked "enrich" on that album, the worker would search by name and overwrite your manual match with whatever the search returned (or revert status to "not_found" if it found nothing). reorganize then read the now-wrong id and moved files to the wrong destination. fix: extracted a shared `core/enrichment/manual_match_honoring.py` helper. every per-source enrichment worker (spotify / itunes / deezer / tidal / qobuz) now reads its stored id column at the top of `_process_*_individual` — if present, it fetches via `client.get_album(stored_id)` directly and refreshes metadata without touching the id. fuzzy name search only runs as fallback for never-matched entities. discogs / audiodb / musicbrainz already had inline stored-id fast paths and are left alone. lastfm / genius are name-based and don\'t store ids. cin-shape lift: same fix in 5 workers gets exactly one helper, per-worker variability (column name, client method, response shape) plugs in via callbacks. 11 new helper tests pin: stored-id fast-path, no-id fallthrough, fetch-failure fallthrough, table/column whitelist, callback contract.', page: 'library' },
         { title: 'Fix: "no such table: hifi_instances" When Adding HiFi Instance', desc: 'github issue #503 (hadshaw21): adding a hifi instance via downloader settings popped up `no such table: hifi_instances` even though the connection test and "check all instances" both worked. root cause: `_initialize_database` runs every CREATE TABLE + every migration step inside one sqlite transaction. python\'s sqlite3 module doesn\'t autocommit DDL by default, so if any later migration step throws on a user\'s specific DB shape (e.g. an old volume from a prior soulsync version with quirky schema state), the WHOLE batch rolls back — including the hifi_instances CREATE that ran successfully. user\'s next boot retries init, hits the same migration failure, rolls back again. table never lands. fix: defensive lazy-create. every hifi_instances CRUD method now runs `CREATE TABLE IF NOT EXISTS hifi_instances (...)` immediately before its operation. idempotent — costs one PRAGMA-level no-op when the table is already present, fully recovers from a broken init. read methods (`get_hifi_instances`, `get_all_hifi_instances`) now return empty instead of raising when init failed. write methods (`add`, `remove`, `toggle`, `reorder`, `seed`) work end-to-end. doesn\'t paper over the underlying init issue (still worth tracking down which migration breaks for which users) but makes hifi instance management self-healing. 7 new tests pin the lazy-create behavior — every method works against a DB that\'s missing the table.', page: 'settings' },
         { title: 'Plex: "All Libraries (Combined)" Mode', desc: 'github issue #505 (popebruhlxix): users with multiple plex music libraries (e.g. one per plex home user) only saw one library inside soulsync because the connection settings forced you to pick a single library section. now there\'s a new "all libraries (combined)" option in settings → connections → plex → music library dropdown. picking it flips the plex client into a server-wide read mode where every read method (`get_all_artists` / `get_all_album_ids` / `search_tracks` / `get_library_stats` / etc) dispatches through `server.library.search(libtype=...)` instead of querying a single section. one api call, plex handles the aggregation. cross-section dedup applied at the listing layer — same-name artists across sections collapse to a canonical entry (the one with more tracks), so plex home families with overlapping music tastes don\'t see "drake" twice. removal-detection id enumeration stays raw on purpose — deduping there would falsely prune tracks linked to non-canonical ratingKeys. write methods (genre / poster / metadata updates) are unaffected and operate on plex objects via ratingKey directly — write-back targets one section\'s copy of an artist if it exists in multiple, document and revisit if it matters. trigger_library_scan + is_library_scanning fan out across every music section in the new mode. backward compatible — existing users with a real library name saved see no behavior change. the "all libraries" option only appears in the dropdown when more than one music library exists on the server. 29 new tests pin both modes (single-section preserved, all-libraries dispatches through server-wide search, dedup keeps canonical, id enumeration stays raw).', page: 'settings' },
         { title: 'Fix: Download Discography Showed Wrong Artist\'s Albums', desc: 'clicked download discography on 50 cent → modal showed young hot rod\'s albums. clicked weird al → modal showed the beatles. cause: the endpoint received whichever single artist id the frontend happened to pick (spotify or itunes or deezer or library db id) and dispatched it as-is to whichever source it queried. when the picked id didn\'t match the queried source\'s id format, lookup either returned wrong-artist results (numeric collisions — db id 194687 was a real deezer artist for someone else) or fell back to a fuzzy name search that picked a wrong artist. the per-source id dispatch mechanism (`MetadataLookupOptions.artist_source_ids`) already existed and the watchlist scanner already used it; the on-demand discography endpoint just wasn\'t wired to it. fixed: when the url artist_id matches a library row by ANY stored id (db id, spotify_artist_id, itunes_artist_id, deezer_id, musicbrainz_id), backend pulls every stored provider id and dispatches the right id to each source. each source gets its OWN stored id regardless of what the url carries. when the url id is a non-library source-native id and the row lookup misses entirely, behavior is identical to before (single-id fallback). also fixed two log-namespace bugs: enhance quality and multi-source search were writing through `getLogger(__name__)` which resolves to `core.artists.quality` / `core.metadata.multi_source_search` — neither under the soulsync handler — so every diagnostic line was silently dropped. switched both to `get_logger()` from utils.logging_config so they actually land in app.log.', page: 'library' },
@@ -3778,6 +3779,19 @@ const WHATS_NEW = {
 // Section shape: { title, description, features: [bullet strings],
 // usage_note?: 'optional hint shown at the bottom' }
 const VERSION_MODAL_SECTIONS = [
+    {
+        title: "Enrich Now Honors Manual Album Matches",
+        description: "github issue #501 (tacobell444): manually matching an album then clicking enrich would overwrite your manual match with whatever the worker\'s name-search returned, or revert status to \"not found\". reorganize then read the wrong id and moved files to the wrong destination.",
+        features: [
+            "• every per-source enrichment worker (spotify / itunes / deezer / tidal / qobuz) now reads its stored id column at the top of `_process_*_individual` — if present, fetch directly via that id and refresh metadata without touching the id",
+            "• fuzzy name search only runs as fallback for entities that have never been matched",
+            "• discogs / audiodb / musicbrainz already had inline stored-id fast paths and are left alone — same correct behavior, just inline",
+            "• lastfm / genius are name-based and don\'t store ids — no-op for those",
+            "• cin-shape lift: same fix in 5 workers gets exactly one shared helper at `core/enrichment/manual_match_honoring.py`, per-worker variability (column name / client method / response shape) plugs in via callbacks",
+            "• reorganize fixed indirectly — it always honored stored ids correctly, the bug was upstream in enrich corrupting the id",
+        ],
+        usage_note: "no settings to change — applies on next click of enrich on a manually-matched album or track",
+    },
     {
         title: "HiFi Instance Add No Longer Errors With \"no such table\"",
         description: "github issue #503 (hadshaw21): adding a hifi instance from downloader settings popped up `no such table: hifi_instances` even when the connection test passed.",