Soulseek matched-download contexts populate `original_search_result`
with `artist` (singular string) and no `artists` list — the full
multi-artist array lives on `track_info` (the matched Spotify track
object). `extract_source_metadata` only read `original_search.artists`,
so the Soulseek path always fell through to the single-artist branch
and TPE1 ended up with the primary artist only. Deezer-direct
downloads were unaffected because their context populates
`original_search.artists` as a proper list.
Lifted artist resolution into a pure helper
`core/metadata/artist_resolution.py:resolve_track_artists` that walks
`original_search.artists` → `track_info.artists` → `artist_dict.name`
fallback chain. Normalizes mixed list-item shapes (Spotify-style
dicts, bare strings, anything else stringified) and drops empty
entries.
13 new tests pin the resolution order, fallback chain, mixed-shape
normalization, whitespace stripping, and empty/none handling. The
existing `_artists_list` no-fall-through test in
`test_multi_artist_tag_settings.py` was updated to reflect the new
contract (always populated; multi-value write still gated on
`len > 1`) plus a new regression test for the Soulseek shape.
Composes with the existing Deezer per-track upgrade (still fires when
single-artist + track_id available) and feat_in_title /
artist_separator settings (still drive the joined ARTIST string
downstream).
Two follow-ups to the multi-artist tag settings PR:
1. Deezer contributors upgrade — closes the "known limitation"
flagged in the prior commit. Deezer's `/search` endpoint only
returns the primary artist for each track; the full contributors
array (feat., remix collaborators, producers credited as artists)
lives on `/track/<id>` and gets parsed by `_build_enhanced_track`.
Without the upgrade Deezer-sourced tracks never got multi-artist
tags even with the right settings on.
Fix in `core/metadata/source.py`: when source==deezer AND the
search response had a single artist AND a track_id is available,
fetch full track details via `get_deezer_client().get_track_details`
and replace `all_artists` with the upgraded list.
- One extra API call per affected Deezer track
- Skipped when search already returned multiple (no-op fast path)
- Skipped for non-Deezer sources (Spotify/Tidal/iTunes search
responses already include all artists)
- Skipped when no track_id is available
- Defensive try/except: on /track/<id> failure (network error,
deezer client unavailable), fall through to the search-result
list — never lose the data we already had
2. Double-append guard hardened with a word-boundary regex.
Prior commit checked for `"feat." not in title.lower() and "(ft."
not in title.lower()` — too narrow. Source platforms produce
wildly different feat-marker conventions: "(feat. X)", "(Feat X)",
"(FEAT X)", "(Featuring X)", "[feat. X]", "ft. X" (no parens),
"FT. X", etc. Any of these as the SOURCE title would cause a
double-append: `"Track (Feat X) (feat. Y)"`.
Replaced with `re.search(r'\b(?:feat|feat\.|featuring|ft|ft\.)\b',
title, IGNORECASE)`. Word-boundary regex catches every common
variant. Substring matches like "Aftermath" containing `ft`
correctly fall through to the append path (pinned by a regression
test).
16 new tests (29 total in the file):
- 9 parametrized variants of the double-append guard
- 1 substring guard ("Aftermath")
- 6 Deezer upgrade scenarios (fires when expected, doesn't fire
for non-Deezer / multi-artist search / no track_id, defensive
fall-through on failure, no false-positive when /track/<id>
confirms single artist)
Full pytest 2727 passed.
Three settings on Settings → Metadata → Tags were partially or
completely unimplemented. Reporter (Netti93) traced each one.
(1) `write_multi_artist` only "worked" because of a never-populated
`_artists_list` field. `core/metadata/source.py` built
`metadata["artist"]` as a hardcoded ", "-joined string but never
assigned `metadata["_artists_list"]`. `core/metadata/enrichment.py`
line 107 reads that field and gates the multi-value tag write
on `len(_artists_list) > 1` — always saw an empty list, silently
no-op'd the write.
(2) `artist_separator` (default ", ") was referenced in the UI +
settings.js save path but ZERO Python code read the value. Every
multi-artist track ended up with hardcoded ", " regardless of
what the user picked.
(3) `feat_in_title` (when true: pull featured artists into the title
as " (feat. X, Y)" and leave only primary in the ARTIST tag —
Picard convention) had no implementation at all.
Fix in source.py:
* Populate `_artists_list` from the search response's artists array
* Read `feat_in_title` and `artist_separator` configs
* When `feat_in_title=True` and >1 artist: ARTIST = primary only,
append "(feat. X, Y)" to title with double-append guard
* Else: ARTIST = artists joined with `artist_separator`
* Single-artist case unaffected by either setting
Double-append guard uses a word-boundary regex catching all common
"feat" variants source platforms produce — `feat`, `feat.`,
`featuring`, `ft`, `ft.` — case-insensitive. Substring matches
(e.g. "Aftermath" containing "ft") correctly fall through to the
append path.
Fix in enrichment.py ID3 branch:
* TPE1 stays as the display string (with separator or primary-only
per the user's settings)
* Multi-value list goes to a separate `TXXX:Artists` frame (Picard
convention) when `write_multi_artist` is on
* Pre-fix the ID3 path wrote TPE1 twice — single-string then list
— and the second `add` overwrote the first, clobbering both the
configured separator AND the feat_in_title semantics. Vorbis path
was already correct (separate "artist" + "artists" keys).
Known limitation (flagged in WHATS_NEW): Deezer's `/search` endpoint
only returns the primary artist. The full contributors array lives
on `/track/<id>`. Enrichment uses search-result data so Deezer-
sourced tracks may still get only the primary artist until a follow-
up commit wires the per-track contributors fetch into the enrichment
flow. Spotify, Tidal, and iTunes search responses include all
artists so they work now.
23 new tests in `tests/metadata/test_multi_artist_tag_settings.py`:
* `_artists_list` populated for multi/single/no-artist cases
* `artist_separator` drives ARTIST string (default ", " + custom
";" + custom "; " + " & ")
* Single-artist case unaffected by either setting
* `feat_in_title=True` pulls featured to title, leaves primary in
ARTIST
* `feat_in_title` no-op for single artist
* Double-append guard recognizes 9 source-title variants ("(feat.
X)", "(Feat. X)", "(FEAT X)", "(feat X)", "(Featuring X)",
"[feat. X]", "ft. X", "(ft X)", "FT. X")
* Substring guard test pins "Aftermath" doesn't false-positive
* Combined-settings precedence: feat_in_title wins ARTIST string
but `_artists_list` carries everyone for multi-value tag
Full pytest 2711 passed.
Discord report (Samuel [KC]): tracks of the same album sometimes carry
different MUSICBRAINZ_ALBUMID tags, which causes Navidrome (and other
media servers grouping by album MBID) to split the album into multiple
entries. Two-part fix — one for existing libraries, one for the root
cause that lets new imports drift.
Part 1 — Detector + fix action (catches existing dissenters):
`core/repair_jobs/mbid_mismatch_detector.py`:
- New helpers: `_read_album_mbid_from_file` and
`_write_album_mbid_to_file` use the Picard-standard tag conventions
(`TXXX:MusicBrainz Album Id` for MP3, `MUSICBRAINZ_ALBUMID` for
FLAC/OGG, `----:com.apple.iTunes:MusicBrainz Album Id` for MP4).
- New scan phase `_scan_album_mbid_consistency` runs after the
existing track-MBID scan: groups tracks by DB `album_id`, reads
each track's embedded album MBID, finds the consensus
(most-common) MBID via `Counter`, flags dissenters. Tracks without
an album MBID at all are skipped (they don't break Navidrome —
only an explicit MBID disagreement does). Albums where MBIDs are
perfectly tied (no clear consensus) are skipped too — surface as
a manual decision instead of fixing toward a 1/N tie.
- New finding type `album_mbid_mismatch` carries `consensus_mbid`,
`wrong_mbid`, `consensus_count`, `total_tracks_with_mbid`, and a
human-readable reason string.
`core/repair_worker.py`:
- Added `'album_mbid_mismatch': self._fix_album_mbid_mismatch` to the
fix dispatch dict and to the `fixable_types` tuple so auto-fix +
bulk-fix paths pick it up.
- New `_fix_album_mbid_mismatch` method reads `consensus_mbid` from
finding details, resolves the dissenter's file path via the shared
library resolver, calls `_write_album_mbid_to_file` to rewrite the
tag in place. Doesn't touch the album's other tracks (they're
already in agreement).
Part 2 — Root cause fix (prevents new SoulSync imports from drifting):
The original in-memory `mb_release_cache` in `core/metadata/source.py`
maps `(normalized_album, artist) -> release_mbid` so per-track
enrichment of the same album hits the cache and writes the same
MUSICBRAINZ_ALBUMID to every track. That cache is bounded (4096
entries) and in-process — so cache eviction (when other albums are
processed in between) and server restart can BOTH cause
inconsistency. Per-track album-name variation (e.g. some tracks
tagged `"Album"`, others tagged `"Album (Deluxe)"`) and per-track
artist variation (features) make it worse.
`core/metadata/album_mbid_cache.py` (new module):
- DB-backed `lookup(normalized_album, artist) -> release_mbid` and
`record(...)` functions. Same key shape as the in-memory cache.
- Strict additive design: every public function is wrapped in
try/except and degrades to None / no-op on ANY database error.
The existing in-memory cache + MusicBrainz lookup remains the
authoritative fallback. If this module breaks, downloads continue
exactly as they would today.
`database/music_database.py`:
- New `mb_album_release_cache` table with composite primary key
`(normalized_album_key, artist_key)`. Reverse-lookup index on
`release_mbid` for future debug tooling. Created via the existing
`CREATE TABLE IF NOT EXISTS` migration pattern — idempotent, no
schema version bump needed.
`core/metadata/source.py`:
- Surgical change inside the existing `embed_source_ids`
in-memory-cache-miss branch: BEFORE calling MusicBrainz, consult
the persistent cache. If a previous SoulSync run already resolved
this album's release MBID, reuse it. After a successful MB lookup,
store in BOTH caches. Both calls wrapped in defensive try/except
so any failure falls through to existing logic.
Tests:
- `tests/metadata/test_album_mbid_cache.py` — 16 cache tests:
round-trip, idempotent re-record, overwrite semantics, clear_all,
album+artist independence (no Greatest Hits collisions),
defensive None-on-empty-input, graceful degradation when the DB
is unavailable / connection raises / commit fails, schema sanity
(table + index exist after init).
- `tests/test_album_mbid_consistency.py` — 13 detector tests:
tag read/write round-trip on real FLAC files, Picard-standard tag
descriptors, defensive paths (unreadable file, empty input),
detector behavior (agreement → no flags, lone dissenter → flag,
ties → no flag, single-track albums → skipped, no-MBID tracks →
skipped, unresolvable file paths → skipped).
- `tests/metadata/test_metadata_enrichment.py` — added autouse
fixture monkeypatching the persistent cache to no-op for tests in
this file. The existing tests pin per-call MB counts and
in-memory cache state; without the fixture, persistent rows from
earlier tests would bypass the MB call. Persistent layer has its
own dedicated tests.
Verified: 1782 tests pass (29 new), ruff clean, smoke test confirms
end-to-end cache round-trip works.
WHATS_NEW entry under '2.4.2' dev cycle.
Followup to fix/watchlist-external-id-match. The companion PR closed
the demand side — the watchlist scanner asks for tracks by external IDs
before falling back to fuzzy. But for users on Plex / Jellyfin /
Navidrome the supply side was still broken: tracks.spotify_track_id
(and the other ID columns) only got populated by the asynchronous
enrichment workers, sometimes hours after the file was actually
written. During that window the ID match fell through to fuzzy and
the bug returned.
We were already collecting every ID during post-processing — they
live in the `pp` dict in core/metadata/source.py:embed_source_ids and
get embedded into file tags. We just dropped the in-memory copy
afterwards.
This PR persists them and uses them:
- Schema migration adds spotify_track_id / itunes_track_id /
deezer_track_id / tidal_track_id / qobuz_track_id /
musicbrainz_recording_id / audiodb_id / soul_id / isrc columns +
indexes to the existing track_downloads table (already keyed by
file_path).
- core/metadata/source.py:embed_source_ids exposes pp["id_tags"] and
the resolved ISRC back to the import context as _embedded_id_tags
/ _isrc.
- core/imports/side_effects.py:record_download_provenance reads those
context fields and passes them to db.record_track_download, which
now accepts the new ID kwargs and persists them.
- New db.get_provenance_by_file_path with exact + basename-suffix
fallback (handles container mount-root differences between
download-time path and media-server-reported path).
- New db.backfill_track_external_ids_from_provenance copies IDs
from track_downloads onto a tracks row idempotently — COALESCE on
every column preserves any value the enrichment worker already
wrote (enrichment is more authoritative for late binding).
- database/music_database.py:insert_or_update_media_track (the
single insertion point used by every Plex / Jellyfin / Navidrome
sync) calls the backfill immediately after each INSERT/UPDATE.
- New core/library/track_identity.py:find_provenance_by_external_id
used as a second-tier fallback in watchlist_scanner.is_track_missing
_from_library — catches the window between download and media-server
sync. Caller checks os.path.exists on the provenance file_path
before treating it as "already in library" so a deleted file
doesn't prevent re-download.
Effect: freshly downloaded files become ID-recognizable to the
watchlist on the very next scan, no enrichment-wait window.
19 regression tests in tests/test_provenance_id_persistence.py:
- Schema migration adds expected columns + indexes
- record_track_download persists every ID kwarg
- record_track_download backward-compat (old kwargs still work)
- get_provenance_by_file_path: exact match, basename fallback for
mount-root differences, multi-record latest-wins, defensive None
- backfill: copies all IDs, preserves existing via COALESCE,
no-op when no provenance exists
- find_provenance_by_external_id: per-ID lookup, ISRC cross-bridge,
OR semantics, latest-wins on multiple matches
Out of scope: backfilling provenance for files downloaded BEFORE
this PR (their track_downloads rows don't carry the new IDs). Those
continue to wait for enrichment. Acceptable — only affects historical
files; new downloads benefit immediately.
Full pytest 1625 passed; ruff clean.
User report: SoulSync was only pulling MusicBrainz genres from the
recording (track-level) endpoint. Most MB recordings don't carry genres
at the track level — they live on the release (album) or artist. So
the MB tier was contributing nothing to the genre merge for the
overwhelming majority of tracks.
Fix:
- Added `'genres'` to the release-detail `includes` (was missing).
- After release-detail processing, if pp['mb_genres'] is still empty,
populate from release_detail['genres'] (sorted by count desc).
- If still empty AND artist_mbid is set, fetch artist with
`includes=['genres']` and use those.
No extra API call when the recording (or release) already had genres —
the artist fetch only fires when both upstream tiers came back empty.
The downstream genre merge in _embed_metadata_genres is unchanged; this
just makes the MB feed into it richer.
Tests: 4 new (recording present, recording empty → release, recording
+ release empty → artist, all empty → []). Full suite 873 passing.
Ruff clean.
Reported by @kcaoyef421 in Discord.
- Normalize album import track display handling so queue labels and match rows stay consistent
- Bound MusicBrainz caches and avoid caching transient lookup failures
- Stop swallowing programmer errors in source enrichment helpers
- Restore import config test seams without reintroducing lazy imports
- Guard task completion calls and fix the Windows path test expectation
- Keep file lock tracking from growing without bound
- Relocate the shared metadata helper module from core/metadata_common.py into core/metadata/common.py.
- Update the new metadata package, the import pipeline, and the web entrypoint to use the package-scoped helper.
- Keep the shared config, mutagen, file-lock, and tag-writing helpers centralized without touching unrelated files.
- Pass the live runtime bundle into the shared metadata facade so worker-backed source enrichment can actually run.
- Forward runtime from the import pipeline and web-server wrapper into embed_source_ids.
- Add a regression test that verifies the runtime object reaches the source-ID embedding path.
- Keep existing metadata_cache and metadata_service at the top level for now
- Move the new branch-local metadata helpers under core/metadata
- Share MusicBrainz release cache state from core.metadata.source and update import sites