Follow-up to fix/watchlist-external-id-match. The companion PR closed
the demand side: the watchlist scanner now asks for tracks by external IDs
before falling back to fuzzy matching. But for users on Plex / Jellyfin /
Navidrome the supply side was still broken. tracks.spotify_track_id
(and the other ID columns) only got populated by the asynchronous
enrichment workers, sometimes hours after the file was actually
written. During that window the ID match fell through to fuzzy matching
and the bug returned.
We were already collecting every ID during post-processing — they
live in the `pp` dict in core/metadata/source.py:embed_source_ids and
get embedded into file tags. We just dropped the in-memory copy
afterwards.
This PR persists them and uses them:
- Schema migration adds spotify_track_id / itunes_track_id /
deezer_track_id / tidal_track_id / qobuz_track_id /
musicbrainz_recording_id / audiodb_id / soul_id / isrc columns +
indexes to the existing track_downloads table (already keyed by
file_path).
- core/metadata/source.py:embed_source_ids exposes pp["id_tags"] and
the resolved ISRC back to the import context as _embedded_id_tags
/ _isrc.
- core/imports/side_effects.py:record_download_provenance reads those
context fields and passes them to db.record_track_download, which
now accepts the new ID kwargs and persists them.
- New db.get_provenance_by_file_path with exact + basename-suffix
fallback (handles container mount-root differences between
download-time path and media-server-reported path).
- New db.backfill_track_external_ids_from_provenance copies IDs
from track_downloads onto a tracks row idempotently — COALESCE on
every column preserves any value the enrichment worker already
wrote (enrichment is more authoritative for late binding).
- database/music_database.py:insert_or_update_media_track (the
single insertion point used by every Plex / Jellyfin / Navidrome
sync) calls the backfill immediately after each INSERT/UPDATE.
- New core/library/track_identity.py:find_provenance_by_external_id
used as a second-tier fallback in
watchlist_scanner.is_track_missing_from_library. It catches the window
between download and media-server sync. The caller checks
os.path.exists on the provenance file_path before treating it as
"already in library", so a deleted file doesn't prevent re-download.
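A minimal sketch of the migration and the COALESCE backfill, assuming plain sqlite3, TEXT columns, index names of the form idx_track_downloads_<column>, and a tracks table keyed by `id` whose ID columns share the track_downloads names. The function names are illustrative stand-ins, not the real db methods. It shows why rerunning either step is harmless: the migration checks PRAGMA table_info before altering, and COALESCE only fills columns that are still NULL.

```python
import sqlite3

# Assumed column names, taken from the migration bullet; the tracks table is assumed
# to use the same names for its ID columns. TEXT is a guess for every type.
ID_COLUMNS = (
    "spotify_track_id", "itunes_track_id", "deezer_track_id", "tidal_track_id",
    "qobuz_track_id", "musicbrainz_recording_id", "audiodb_id", "soul_id", "isrc",
)


def migrate_track_downloads(conn: sqlite3.Connection) -> None:
    """Add missing ID columns and their indexes; safe to run on every startup."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(track_downloads)")}
    for col in ID_COLUMNS:  # names come from the constant above, never from input
        if col not in existing:
            conn.execute(f"ALTER TABLE track_downloads ADD COLUMN {col} TEXT")
        conn.execute(
            f"CREATE INDEX IF NOT EXISTS idx_track_downloads_{col} "
            f"ON track_downloads ({col})"
        )
    conn.commit()


def backfill_track_external_ids(conn: sqlite3.Connection, track_id: int, provenance: dict) -> None:
    """Copy provenance IDs onto a tracks row without overwriting existing values."""
    # COALESCE(current, incoming) keeps any non-NULL value enrichment already wrote,
    # so repeating the call with the same provenance changes nothing.
    assignments = ", ".join(f"{col} = COALESCE({col}, ?)" for col in ID_COLUMNS)
    params = [provenance.get(col) for col in ID_COLUMNS] + [track_id]
    conn.execute(f"UPDATE tracks SET {assignments} WHERE id = ?", params)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE track_downloads (id INTEGER PRIMARY KEY, file_path TEXT)")
conn.execute(
    "CREATE TABLE tracks (id INTEGER PRIMARY KEY, "
    + ", ".join(f"{col} TEXT" for col in ID_COLUMNS) + ")"
)
migrate_track_downloads(conn)
migrate_track_downloads(conn)  # second run is a no-op
conn.execute("INSERT INTO tracks (id, spotify_track_id) VALUES (1, 'from_enrichment')")
backfill_track_external_ids(conn, 1, {"spotify_track_id": "from_download", "isrc": "USRC17607839"})
print(conn.execute("SELECT spotify_track_id, isrc FROM tracks WHERE id = 1").fetchone())
# ('from_enrichment', 'USRC17607839')
```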
Effect: freshly downloaded files become ID-recognizable to the
watchlist on the very next scan, with no wait for enrichment.
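The exact-then-basename lookup behind db.get_provenance_by_file_path might look roughly like this. It is a sketch, not the real method: it reads "basename-suffix" as "stored path ends with `/` plus the same filename", assumes forward-slash paths in track_downloads, and treats the highest autoincrement `id` as the latest record.

```python
import sqlite3
from typing import Optional


def _first_row_as_dict(cur: sqlite3.Cursor) -> Optional[dict]:
    row = cur.fetchone()
    if row is None:
        return None
    return dict(zip((d[0] for d in cur.description), row))


def get_provenance_by_file_path(conn: sqlite3.Connection, file_path: Optional[str]) -> Optional[dict]:
    """Exact path match first, then a same-filename match under any mount root."""
    if not file_path:
        return None  # defensive: nothing to look up
    # Newest row wins when the same path was recorded more than once
    # (assumes an autoincrement id reflects insertion order).
    cur = conn.execute(
        "SELECT * FROM track_downloads WHERE file_path = ? ORDER BY id DESC LIMIT 1",
        (file_path,),
    )
    hit = _first_row_as_dict(cur)
    if hit is not None:
        return hit
    # Fallback: compare only the trailing "/<basename>", so /downloads/A/B/x.flac
    # and /data/music/A/B/x.flac are treated as the same file.
    basename = file_path.replace("\\", "/").rstrip("/").rsplit("/", 1)[-1]
    if not basename:
        return None
    suffix = "/" + basename
    # substr(X, -n) returns the last n characters of X.
    cur = conn.execute(
        "SELECT * FROM track_downloads WHERE substr(file_path, ?) = ? "
        "ORDER BY id DESC LIMIT 1",
        (-len(suffix), suffix),
    )
    return _first_row_as_dict(cur)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE track_downloads (id INTEGER PRIMARY KEY, file_path TEXT, isrc TEXT)")
conn.executemany(
    "INSERT INTO track_downloads (file_path, isrc) VALUES (?, ?)",
    [("/downloads/Artist/Album/01 - Song.flac", "OLD"),
     ("/downloads/Artist/Album/01 - Song.flac", "NEW")],
)
# The media server reports the same file under a different mount root.
print(get_provenance_by_file_path(conn, "/data/music/Artist/Album/01 - Song.flac")["isrc"])  # NEW
```

Matching on the filename alone can collide when two albums share a name such as `01.flac`; comparing a couple of trailing path components would be a stricter variant of the same idea.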
19 regression tests in tests/test_provenance_id_persistence.py:
- Schema migration adds expected columns + indexes
- record_track_download persists every ID kwarg
- record_track_download backward-compat (old kwargs still work)
- get_provenance_by_file_path: exact match, basename fallback for
mount-root differences, multi-record latest-wins, defensive None handling
- backfill: copies all IDs, preserves existing via COALESCE,
no-op when no provenance exists
- find_provenance_by_external_id: per-ID lookup, ISRC cross-bridge,
OR semantics, latest-wins on multiple matches
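The OR lookup and the caller's existence check can be sketched together. `provenance_file_still_present` is a hypothetical stand-in for the check inside is_track_missing_from_library, and the column whitelist and latest-wins-by-`id` ordering are assumptions. The example also illustrates the ISRC cross-bridge: a row recorded from a Deezer download is found by a Spotify-sourced query because both carry the same ISRC.

```python
import os
import sqlite3
import tempfile
from typing import Optional

# ID columns accepted as lookup keys (assumed to match the migrated columns).
# Whitelisting keeps caller-supplied names out of the SQL text.
LOOKUP_COLUMNS = (
    "spotify_track_id", "itunes_track_id", "deezer_track_id", "tidal_track_id",
    "qobuz_track_id", "musicbrainz_recording_id", "audiodb_id", "soul_id", "isrc",
)


def find_provenance_by_external_id(conn: sqlite3.Connection, **ids: Optional[str]) -> Optional[dict]:
    """Newest track_downloads row matching ANY of the given IDs (OR semantics)."""
    clauses, params = [], []
    for col, value in ids.items():
        if col in LOOKUP_COLUMNS and value:  # unknown names and empty values are skipped
            clauses.append(f"{col} = ?")
            params.append(value)
    if not clauses:
        return None
    cur = conn.execute(
        f"SELECT * FROM track_downloads WHERE {' OR '.join(clauses)} "
        "ORDER BY id DESC LIMIT 1",
        params,
    )
    row = cur.fetchone()
    return None if row is None else dict(zip((d[0] for d in cur.description), row))


def provenance_file_still_present(conn: sqlite3.Connection, **ids: Optional[str]) -> bool:
    """Count a provenance hit as 'already in library' only if its file still exists."""
    hit = find_provenance_by_external_id(conn, **ids)
    return bool(hit and hit.get("file_path") and os.path.exists(hit["file_path"]))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track_downloads (id INTEGER PRIMARY KEY, file_path TEXT, "
    + ", ".join(f"{col} TEXT" for col in LOOKUP_COLUMNS) + ")"
)
with tempfile.NamedTemporaryFile(suffix=".flac", delete=False) as fh:
    path = fh.name
# Downloaded via Deezer, so only the Deezer ID and the ISRC were recorded.
conn.execute(
    "INSERT INTO track_downloads (file_path, deezer_track_id, isrc) VALUES (?, ?, ?)",
    (path, "dz123", "USRC17607839"),
)
# The watchlist entry comes from Spotify; its ISRC bridges the two sources.
print(provenance_file_still_present(conn, spotify_track_id="sp999", isrc="USRC17607839"))  # True
os.remove(path)
print(provenance_file_still_present(conn, spotify_track_id="sp999", isrc="USRC17607839"))  # False
```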
Out of scope: backfilling provenance for files downloaded BEFORE
this PR (their track_downloads rows don't carry the new IDs). Those
continue to wait for enrichment. This is acceptable: it only affects
historical files, and new downloads benefit immediately.
Full pytest suite: 1625 passed; ruff clean.
User report: SoulSync was only pulling MusicBrainz genres from the
recording (track-level) endpoint. Most MB recordings don't carry genres
at the track level — they live on the release (album) or artist. So
the MB tier was contributing nothing to the genre merge for the
overwhelming majority of tracks.
Fix:
- Added `'genres'` to the release-detail `includes` (was missing).
- After release-detail processing, if pp['mb_genres'] is still empty,
populate from release_detail['genres'] (sorted by count, descending).
- If still empty AND artist_mbid is set, fetch artist with
`includes=['genres']` and use those.
No extra API call is made when the recording or release already has
genres: the artist fetch only fires when both upstream tiers come back empty.
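A minimal sketch of the three-tier fallback as a pure function. It assumes the recording and release payloads are parsed MusicBrainz JSON requested with `genres` in the includes, where each genre entry carries `name` and `count`; `resolve_mb_genres` and the `fetch_artist` callable are hypothetical names, and the real code writes into pp['mb_genres'] instead of returning a list. Sorting the recording tier by count as well is a choice made here, not something the fix specifies.

```python
from typing import Callable, Optional


def _names_by_count(genres: list) -> list:
    """Return genre names, highest MusicBrainz count first."""
    ranked = sorted(genres, key=lambda g: g.get("count", 0), reverse=True)
    return [g["name"] for g in ranked if g.get("name")]


def resolve_mb_genres(
    recording: dict,
    release: dict,
    artist_mbid: Optional[str],
    fetch_artist: Callable[[str], dict],
) -> list:
    """Recording genres, else release genres, else artist genres (one extra request)."""
    genres = _names_by_count(recording.get("genres", []))
    if genres:
        return genres
    genres = _names_by_count(release.get("genres", []))
    if genres:
        return genres
    if artist_mbid:
        # Only reached when both upstream tiers are empty, so the artist request is
        # never made for tracks whose recording or release already carried genres.
        return _names_by_count(fetch_artist(artist_mbid).get("genres", []))
    return []


calls = []


def fake_fetch_artist(mbid: str) -> dict:
    calls.append(mbid)
    return {"genres": [{"name": "shoegaze", "count": 2}, {"name": "dream pop", "count": 5}]}


print(resolve_mb_genres({"genres": []}, {"genres": []}, "artist-mbid", fake_fetch_artist))
# ['dream pop', 'shoegaze']
print(resolve_mb_genres({}, {"genres": [{"name": "rock", "count": 1}]}, "artist-mbid", fake_fetch_artist))
# ['rock']
print(calls)  # ['artist-mbid']: only the first call needed the artist tier
```

Passing the artist fetch in as a callable keeps the sketch testable without network access and makes the "no extra API call" property easy to check.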
The downstream genre merge in _embed_metadata_genres is unchanged; this
just makes the MB feed into it richer.
Tests: 4 new (recording present, recording empty → release, recording
+ release empty → artist, all empty → []). Full suite 873 passing.
Ruff clean.
Reported by @kcaoyef421 in Discord.
- Normalize album import track display handling so queue labels and match rows stay consistent
- Bound MusicBrainz caches and avoid caching transient lookup failures
- Stop swallowing programmer errors in source enrichment helpers
- Restore import config test seams without reintroducing lazy imports
- Guard task completion calls and fix the Windows path test expectation
- Keep file lock tracking from growing without bound
- Relocate the shared metadata helper module from core/metadata_common.py into core/metadata/common.py.
- Update the new metadata package, the import pipeline, and the web entrypoint to use the package-scoped helper.
- Keep the shared config, mutagen, file-lock, and tag-writing helpers centralized without touching unrelated files.
- Pass the live runtime bundle into the shared metadata facade so worker-backed source enrichment can actually run.
- Forward runtime from the import pipeline and web-server wrapper into embed_source_ids.
- Add a regression test that verifies the runtime object reaches the source-ID embedding path.
- Keep existing metadata_cache and metadata_service at the top level for now
- Move the new branch-local metadata helpers under core/metadata
- Share MusicBrainz release cache state from core.metadata.source and update import sites
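For the "bound MusicBrainz caches and avoid caching transient lookup failures" item, one common shape is an LRU keyed by MBID that stores completed lookups but lets exceptions pass through uncached. This is a generic sketch, not the project's cache: `BoundedLookupCache` is a hypothetical name, and it assumes the lookup raises on transient failures (timeouts, 5xx) while returning a value, possibly None, for definitive answers.

```python
import threading
from collections import OrderedDict
from collections.abc import Hashable
from typing import Any, Callable


class BoundedLookupCache:
    """LRU cache that keeps completed lookups and never stores exceptions.

    Assumption: `lookup` raises on transient failures and returns a value
    (possibly None for a definitive "not found") otherwise.
    """

    def __init__(self, lookup: Callable[[Hashable], Any], maxsize: int = 1024) -> None:
        if maxsize <= 0:
            raise ValueError("maxsize must be positive")
        self._lookup = lookup
        self._maxsize = maxsize
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key: Hashable) -> Any:
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)  # mark as most recently used
                return self._data[key]
        # Call outside the lock so a slow request doesn't block other workers.
        # If this raises, the exception propagates and nothing is cached.
        value = self._lookup(key)
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            while len(self._data) > self._maxsize:
                self._data.popitem(last=False)  # evict least recently used
        return value

    def __len__(self) -> int:
        with self._lock:
            return len(self._data)


attempts = {"n": 0}


def flaky_release_lookup(mbid: str) -> dict:
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise ConnectionError("MusicBrainz request timed out")
    return {"id": mbid}


cache = BoundedLookupCache(flaky_release_lookup, maxsize=2)
try:
    cache.get("release-a")
except ConnectionError:
    pass  # the failure is not remembered
print(cache.get("release-a"))  # {'id': 'release-a'}: retried, then cached
cache.get("release-b")
cache.get("release-c")  # pushes the cache past maxsize, so "release-a" is evicted
print(len(cache))  # 2
```

Calling the lookup outside the lock means two workers can occasionally fetch the same key at once; that trades a rare duplicate request for not serializing every worker behind one slow HTTP call.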