SoulSync

__init__.py	basic db structure	10 months ago
music_database.py	Persist source IDs at download time + backfill onto tracks on sync	3 weeks ago

Broque Thomas 34ba26f5c8 Persist source IDs at download time + backfill onto tracks on sync

Follow-up to fix/watchlist-external-id-match. The companion PR closed
the demand side: the watchlist scanner asks for tracks by external IDs
before falling back to fuzzy matching. But for users on Plex /
Jellyfin / Navidrome the supply side was still broken:
tracks.spotify_track_id (and the other ID columns) only got populated
by the asynchronous enrichment workers, sometimes hours after the file
was actually written. During that window the ID match fell through to
fuzzy matching and the bug returned.

We were already collecting every ID during post-processing — they
live in the `pp` dict in core/metadata/source.py:embed_source_ids and
get embedded into file tags. We just dropped the in-memory copy
afterwards.

This PR persists them and uses them:

- Schema migration adds spotify_track_id / itunes_track_id /
  deezer_track_id / tidal_track_id / qobuz_track_id /
  musicbrainz_recording_id / audiodb_id / soul_id / isrc columns +
  indexes to the existing track_downloads table (already keyed by
  file_path).
- core/metadata/source.py:embed_source_ids exposes pp["id_tags"] and
  the resolved ISRC back to the import context as _embedded_id_tags
  / _isrc.
- core/imports/side_effects.py:record_download_provenance reads those
  context fields and passes them to db.record_track_download, which
  now accepts the new ID kwargs and persists them.
- New db.get_provenance_by_file_path with exact + basename-suffix
  fallback (handles container mount-root differences between
  download-time path and media-server-reported path).
- New db.backfill_track_external_ids_from_provenance copies IDs
  from track_downloads onto a tracks row idempotently — COALESCE on
  every column preserves any value the enrichment worker already
  wrote (enrichment is more authoritative for late binding).
- database/music_database.py:insert_or_update_media_track (the
  single insertion point used by every Plex / Jellyfin / Navidrome
  sync) calls the backfill immediately after each INSERT/UPDATE.
- New core/library/track_identity.py:find_provenance_by_external_id
  used as a second-tier fallback in
  watchlist_scanner.is_track_missing_from_library; it catches the
  window between download and media-server sync. Caller checks
  os.path.exists on the provenance file_path before treating it as
  "already in library" so a deleted file doesn't prevent re-download.
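
The COALESCE-based backfill described above can be sketched as follows.
This is a minimal illustration, not the code in
database/music_database.py: it assumes SQLite via Python's sqlite3
module, a cut-down tracks table with only two of the ID columns, and a
hypothetical helper name (backfill_ids).

```python
import sqlite3

# Subset of the ID columns named in the list above (illustrative only).
ID_COLUMNS = ("spotify_track_id", "isrc")


def backfill_ids(conn, track_id, provenance):
    """Copy IDs onto a tracks row without overwriting existing values.

    Hypothetical helper: `provenance` is a plain dict standing in for a
    track_downloads row.
    """
    # COALESCE(col, ?) keeps the current value when it is not NULL and
    # only falls back to the provenance value otherwise.
    assignments = ", ".join(f"{col} = COALESCE({col}, ?)" for col in ID_COLUMNS)
    params = [provenance.get(col) for col in ID_COLUMNS] + [track_id]
    conn.execute(f"UPDATE tracks SET {assignments} WHERE id = ?", params)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracks (id INTEGER PRIMARY KEY, spotify_track_id TEXT, isrc TEXT)"
)
# The enrichment worker already wrote a Spotify ID; the ISRC is still NULL.
conn.execute(
    "INSERT INTO tracks (id, spotify_track_id, isrc) VALUES (1, 'from-enrichment', NULL)"
)

provenance = {"spotify_track_id": "from-download", "isrc": "TEST00000001"}
backfill_ids(conn, 1, provenance)
backfill_ids(conn, 1, provenance)  # running it again changes nothing

row = conn.execute("SELECT spotify_track_id, isrc FROM tracks WHERE id = 1").fetchone()
```

In this sketch the existing Spotify ID is kept and only the missing
ISRC is filled in, which is why the backfill is safe to call after
every INSERT/UPDATE.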

Effect: freshly downloaded files become ID-recognizable to the
watchlist on the very next scan, with no enrichment-wait window.
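
A rough sketch of the second-tier provenance check with the
file-existence guard, under stated assumptions: provenance rows are
modelled as plain dicts, and covered_by_provenance is a hypothetical
name, not the function in core/library/track_identity.py.

```python
import os
import tempfile


def covered_by_provenance(wanted_ids, provenance_rows):
    """Return True if a provenance row shares an ID and its file still exists.

    Hypothetical helper. OR semantics: one matching non-empty ID is enough.
    """
    for row in provenance_rows:
        shares_id = any(
            value is not None and row.get(key) == value
            for key, value in wanted_ids.items()
        )
        # A provenance row whose file was deleted must not block a re-download.
        if shares_id and os.path.exists(row["file_path"]):
            return True
    return False


# Create one real file so the os.path.exists guard can be exercised.
with tempfile.NamedTemporaryFile(suffix=".flac", delete=False) as handle:
    existing_path = handle.name

rows = [
    {"file_path": existing_path, "isrc": "TEST00000001", "spotify_track_id": None},
    {"file_path": existing_path + ".gone", "isrc": "TEST00000002", "spotify_track_id": None},
]

present = covered_by_provenance({"isrc": "TEST00000001"}, rows)
deleted = covered_by_provenance({"isrc": "TEST00000002"}, rows)
os.remove(existing_path)
```

Here the first lookup is treated as "already in library" because the
file is on disk, while the second is not, since its provenance path no
longer exists.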

19 regression tests in tests/test_provenance_id_persistence.py:
- Schema migration adds expected columns + indexes
- record_track_download persists every ID kwarg
- record_track_download backward-compat (old kwargs still work)
- get_provenance_by_file_path: exact match, basename fallback for
  mount-root differences, multi-record latest-wins, defensive None
- backfill: copies all IDs, preserves existing via COALESCE,
  no-op when no provenance exists
- find_provenance_by_external_id: per-ID lookup, ISRC cross-bridge,
  OR semantics, latest-wins on multiple matches
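
For illustration, the exact-then-basename lookup exercised by the
get_provenance_by_file_path tests might look roughly like this. It is a
sketch only: it assumes SQLite, a simplified track_downloads table with
an autoincrementing id used for latest-wins ordering, and a
hypothetical function name (lookup_provenance).

```python
import os
import sqlite3


def lookup_provenance(conn, file_path):
    """Find the newest provenance row for a path (hypothetical helper)."""
    if not file_path:
        return None  # defensive: nothing to look up
    exact = conn.execute(
        "SELECT file_path, isrc FROM track_downloads "
        "WHERE file_path = ? ORDER BY id DESC LIMIT 1",
        (file_path,),
    ).fetchone()
    if exact is not None:
        return exact
    basename = os.path.basename(file_path)
    if not basename:
        return None
    # Escape LIKE wildcards so "_" and "%" in file names match literally.
    escaped = basename.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    # Suffix match covers differing mount roots (e.g. /downloads vs /music).
    return conn.execute(
        "SELECT file_path, isrc FROM track_downloads "
        "WHERE file_path LIKE ? ESCAPE '\\' ORDER BY id DESC LIMIT 1",
        ("%/" + escaped,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track_downloads "
    "(id INTEGER PRIMARY KEY AUTOINCREMENT, file_path TEXT, isrc TEXT)"
)
conn.executemany(
    "INSERT INTO track_downloads (file_path, isrc) VALUES (?, ?)",
    [
        ("/downloads/A/song_1.flac", "ISRC_OLD"),
        ("/downloads/A/song_1.flac", "ISRC_NEW"),
        ("/downloads/A/songX1.flac", "OTHER"),
    ],
)

exact_hit = lookup_provenance(conn, "/downloads/A/song_1.flac")
remapped_hit = lookup_provenance(conn, "/music/A/song_1.flac")
no_hit = lookup_provenance(conn, "/music/A/unknown.flac")
```

Note that SQLite's LIKE is case-insensitive for ASCII by default, so a
real implementation may want a stricter comparison than this sketch.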

Out of scope: backfilling provenance for files downloaded BEFORE
this PR (their track_downloads rows don't carry the new IDs). Those
continue to wait for enrichment. Acceptable — only affects historical
files; new downloads benefit immediately.

Full pytest run: 1625 passed; ruff clean.
