SoulSync

History

Broque Thomas ecb8939c80 Match library tracks by external IDs before fuzzy in watchlist scan	4 weeks ago
static	Match library tracks by external IDs before fuzzy in watchlist scan	4 weeks ago
index.html	Hide dashboard status placeholders until ready	4 weeks ago

Broque Thomas ecb8939c80 Match library tracks by external IDs before fuzzy in watchlist scan

Reported case (CAL): a track already on disk got re-downloaded by the
watchlist scanner on every scan. Library DB had stale album metadata
for the file (track tagged on album "Left Alone") while the metadata
source reported it on a different album ("NPC" single). The
title+artist+album fuzzy block correctly said the album names didn't
match and declared the track missing — but the file's stable external
IDs (Spotify ID, ISRC, etc.) unambiguously identified it as the same
recording.

The earlier compilation-album fix (PR #461) handled qualifier drift
("OST" vs "Music From The Motion Picture"). This case is two
genuinely different album names referring to the same song.

Fix: provider-neutral external-ID short-circuit before the fuzzy
block in `is_track_missing_from_library`. Pulls every recognized ID
off the source track (Spotify / iTunes / Deezer / Tidal / Qobuz /
MusicBrainz / AudioDB / Hydrabase / ISRC), runs a single SELECT
against the indexed external-ID columns on the `tracks` table, and
treats any hit as "track exists in library — don't re-download".

If no IDs are available (older imports without enrichment, library
scans that didn't populate external IDs), the check falls through to
the existing fuzzy logic so the safety net stays intact.
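
The ordering described above (ID short-circuit first, fuzzy fallback second) can be sketched roughly as follows. This is a minimal illustration, not the actual body of `is_track_missing_from_library`: the real signature is not shown in this commit message, so the three injected callables (`extract_ids`, `lookup_by_ids`, `fuzzy_match`) are hypothetical stand-ins.

```python
from typing import Callable, Dict, Optional


def is_track_missing_from_library(
    track: dict,
    extract_ids: Callable[[dict], Dict[str, str]],
    lookup_by_ids: Callable[[Dict[str, str]], Optional[object]],
    fuzzy_match: Callable[[dict], bool],
) -> bool:
    """Return True when the track should be treated as missing.

    All three callables are assumptions made for this sketch.
    """
    external_ids = extract_ids(track)
    if external_ids and lookup_by_ids(external_ids) is not None:
        # Any external-ID hit means the recording is already in the library,
        # even if the album names disagree.
        return False
    # No IDs available, or no ID matched: the existing fuzzy logic decides.
    return not fuzzy_match(track)
```

Passing the helpers in as arguments keeps the sketch self-contained; the point is only that an ID hit returns early and everything else reaches the fuzzy block unchanged.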

New `core/library/track_identity.py` module with two helpers:
- `extract_external_ids(track)`: handles dict and object-style track
  shapes, direct-field aliases (spotify_id / spotify_track_id /
  SPOTIFY_TRACK_ID), and provider-disambiguated native `id` fields
  (when track has `provider='deezer'` and `id='X'`, treats X as a
  Deezer ID).
- `find_library_track_by_external_id(db, external_ids,
  server_source)`: builds an OR of indexed column matches with
  IS NOT NULL guards, applies an optional server_source filter that
  also passes legacy NULL rows, and limits the result to a single row.
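
A rough sketch of the extraction behaviour described for `extract_external_ids`, assuming a small alias table. Only the three Spotify aliases come from the commit message; the other field names (`deezer_id`, `isrc`) and the fallback to a `source` field are illustrative guesses, and the real module covers more providers.

```python
from typing import Any, Dict

# Hypothetical alias table. Only the Spotify aliases are named in the commit
# message; the Deezer and ISRC field names are assumptions for this sketch.
DIRECT_ALIASES = {
    "spotify": ("spotify_id", "spotify_track_id", "SPOTIFY_TRACK_ID"),
    "deezer": ("deezer_id",),
    "isrc": ("isrc",),
}


def _get(track: Any, name: str) -> Any:
    """Read a field from either a dict-style or an object-style track."""
    if isinstance(track, dict):
        return track.get(name)
    return getattr(track, name, None)


def _clean(value: Any) -> str:
    """Coerce to a stripped string; None and blank values become ''."""
    return "" if value is None else str(value).strip()


def extract_external_ids(track: Any) -> Dict[str, str]:
    """Collect recognized external IDs as {provider: id_string}."""
    if track is None:
        return {}
    ids: Dict[str, str] = {}
    # Direct fields: the first non-empty alias wins for each provider.
    for provider, aliases in DIRECT_ALIASES.items():
        for alias in aliases:
            value = _clean(_get(track, alias))
            if value:
                ids[provider] = value
                break
    # Provider-disambiguated native id: provider='deezer', id='X' -> Deezer ID.
    provider = _get(track, "provider") or _get(track, "source")
    native_id = _clean(_get(track, "id"))
    if (
        isinstance(provider, str)
        and provider in DIRECT_ALIASES
        and provider != "isrc"
        and provider not in ids
        and native_id
    ):
        ids[provider] = native_id
    return ids
```

A bare `id` with no provider is ignored on purpose: without knowing which service issued it, the value cannot be matched against a specific column.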

ISRC bridges across providers — a library track imported via Deezer
can be matched against a Spotify scan when both sides carry the
same ISRC.
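
The lookup and the ISRC bridge can be illustrated with an in-memory SQLite database. This is a sketch under stated assumptions: the column names, the `ID_COLUMNS` map, the selected columns, and the `'plex'` server_source value are all hypothetical, since the commit message does not spell out the `tracks` schema.

```python
import sqlite3
from typing import Dict, Optional

# Hypothetical provider -> column map; the real column names are not given.
ID_COLUMNS = {"spotify": "spotify_id", "deezer": "deezer_id", "isrc": "isrc"}


def find_library_track_by_external_id(
    db: sqlite3.Connection,
    external_ids: Dict[str, str],
    server_source: Optional[str] = None,
) -> Optional[tuple]:
    clauses, params = [], []
    for provider, value in external_ids.items():
        column = ID_COLUMNS.get(provider)
        if column and value:
            # Column names come from the fixed map above, never from input.
            clauses.append(f"({column} IS NOT NULL AND {column} = ?)")
            params.append(value)
    if not clauses:
        return None  # no usable IDs: the caller falls back to fuzzy matching
    sql = f"SELECT id, title FROM tracks WHERE ({' OR '.join(clauses)})"
    if server_source is not None:
        # Legacy rows with a NULL server_source still pass the filter.
        sql += " AND (server_source = ? OR server_source IS NULL)"
        params.append(server_source)
    sql += " LIMIT 1"
    return db.execute(sql, params).fetchone()


# Usage: a row imported via Deezer is found from a Spotify-side scan via ISRC.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE tracks (id INTEGER PRIMARY KEY, title TEXT, spotify_id TEXT,"
    " deezer_id TEXT, isrc TEXT, server_source TEXT)"
)
db.execute(
    "INSERT INTO tracks (title, deezer_id, isrc, server_source)"
    " VALUES ('Song', '555', 'USABC2400001', NULL)"
)
hit = find_library_track_by_external_id(
    db, {"spotify": "sp-1", "isrc": "USABC2400001"}, server_source="plex"
)
```

Here the Spotify ID matches nothing, but the shared ISRC does, so the single-row query still reports the track as present.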

43 regression tests in `tests/test_library_track_identity.py`:
- 9 ID-extraction tests for direct fields (Spotify / iTunes / Deezer /
  ISRC / MBID / AudioDB / Hydrabase)
- 8 ID-extraction tests via the provider field (8 providers + source
  alias + missing-provider-ignored)
- 7 mixed/defensive tests (multiple IDs, object-style, empty strings,
  None track, numeric coercion)
- 8 lookup tests (per-provider + ISRC cross-bridge)
- 3 OR-semantics tests
- 4 server_source filter tests
- 2 ID-column-map sanity tests

Full pytest 1606 passed; ruff clean.
