Expand matched MusicBrainz release groups into concrete releases for specific album searches so import users can choose the correct edition by track count, format, country, and disambiguation. Preserve distinct MusicBrainz release IDs instead of deduping same-title variants, carry release metadata through import matching, and surface those details on album result cards. Add coverage for variant preservation and release-group expansion.
Keep full refresh moving when post-clear VACUUM hits a transient disk I/O error, and retry clear_server_data once when the clear step itself sees the same transient SQLite failure.
Retry metadata cache maintenance writes once on transient disk I/O errors so first-attempt cache jobs do not fail when an immediate retry would succeed.
Tests cover best-effort VACUUM, clear retry behavior, and cache maintenance retry behavior.
Add a disk-backed image cache with hashed browser URLs, SQLite metadata, size/type validation, stale fallback, and per-image fetch locking. Route normalized artwork through /api/image-cache while keeping /api/image-proxy as a compatibility shim, and align browser max-age with the image cache TTL. Add focused tests for cache behavior and image URL normalization.
S-Bryce reported that for some artists (Vocaloid producers, JP indie
acts, niche Western indie) the artist detail page was missing whole
release-groups visible on musicbrainz.org. Downloaded tracks from
those release-groups appeared in artist track counts but were not
bound to any visible album / single card — orphan "ghost" tracks the
user couldn't browse to.
Two related bugs fed each other:
1. `core/musicbrainz_search.py` browsed MB release-groups with
`release_types=['album', 'ep', 'single']`. MB's primary-type
vocabulary is {Album, Single, EP, Broadcast, Other} — music
videos, one-off web releases, and broadcast singles use Other.
Pre-fix the filter dropped them at the API layer.
2. Three sites duplicated the same "raw primary-type → internal
album_type" mapping with slightly different vocabularies and all
silently defaulted unknown values (including 'Other') to 'album':
core/musicbrainz_search.py `_map_release_type`
core/metadata/types.py inline `{single:single, ep:ep}.get(...)`
core/metadata/cache.py Deezer-specific record_type guard
Letting Other through the filter without a real mapper would have
placed music videos in the Albums view alongside LPs — visually
misleading.
Fix shape:
- New `core/metadata/release_type.py` — single canonical mapper
consumed by every provider's raw→Album projection. Knows the full
MB vocabulary including 'other' and 'broadcast'; routes both into
the singles bucket since they're functionally single-track
releases. Compilation secondary-type override preserved (MB's
canonical Greatest-Hits pattern is `primary=Album,
secondary=[Compilation]`).
- `core/musicbrainz_search.py` `_map_release_type` becomes a thin
alias for the new helper so the six internal call sites stay
intact. API filter gains 'other'.
- `core/metadata/types.py` Album projection drops its inline mini-
mapper and calls the canonical helper. Now also handles the
compilation secondary-type override it was previously missing.
- The Deezer-specific cache.py guard stays as-is — Deezer's
record_type vocabulary is closed (album|single|ep), not affected
by this issue.
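A minimal sketch of the mapper contract described above. The function name `map_release_type` and the lookup table are illustrative assumptions, not the contents of `core/metadata/release_type.py`, and the sketch assumes the compilation override applies regardless of the primary type:

```python
from typing import Iterable, Optional

# Hypothetical lookup table; the real module may organise this differently.
_PRIMARY_TO_ALBUM_TYPE = {
    "album": "album",
    "single": "single",
    "ep": "ep",
    "other": "single",      # music videos, one-off web releases
    "broadcast": "single",  # broadcast singles
}


def map_release_type(primary: Optional[str],
                     secondary: Optional[Iterable[str]] = None) -> str:
    """Map a raw MB primary/secondary type pair to an internal album_type."""
    secondary_lower = {str(s).strip().lower() for s in (secondary or []) if s}
    # Assumption: a Compilation secondary type wins over any primary type.
    if "compilation" in secondary_lower:
        return "compilation"
    key = (primary or "").strip().lower()
    # Unknown or empty values keep the historical 'album' default.
    return _PRIMARY_TO_ALBUM_TYPE.get(key, "album")
```

Keeping the table in one place means a new MB primary type only needs one new entry instead of a change at every call site.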
Verified end-to-end against MB for S-Bryce's artist
(`46196b9c-affa-4616-b53b-e967c8bd70e0`, inabakumori): pre-fix returned
22 release-groups; post-fix returns 27, with the 5 extra all landing in
the Singles section with album_type='single' as intended.
23 new unit tests pin the mapper contract (case-insensitive primary
types, compilation secondary override, Other/Broadcast → single,
unknown → album default preserved, defensive empty/None inputs).
2 new tests in test_musicbrainz_search pin the API filter inclusion
of 'other' and the round-trip into the Singles bucket. All 516
existing metadata tests still green — refactor leaves historical
behaviour for {album, ep, single, compilation} unchanged.
The Fix Track Match modal's auto-search was hardcoded to query only
Spotify -> Deezer -> iTunes, ignoring MusicBrainz entirely — even for
users with MB set as their primary metadata source. MB-niche recordings
(canonical entries with diacritics, fringe / non-mainstream tracks that
the commercial catalogues don't carry) had no chance.
Wiring:
- New `MusicBrainzSearchClient.search_tracks_with_artist(track, artist,
limit)` for surfaces that already have title + artist split. Uses MB's
bare-query mode (strict=False) — diacritic-folded, alias/sortname
indexed — same recall rationale as the earlier MBID-paste endpoint.
- New route `GET /api/musicbrainz/search_tracks` mirrors the existing
/api/{spotify,itunes,deezer}/search_tracks endpoints exactly: accepts
`track`+`artist` (or legacy `query`) + `limit`, returns
`{tracks: [{id, name, artists, album, duration_ms, image_url, source}]}`.
Applies the same `core.metadata.relevance.rerank_tracks` pass Deezer /
iTunes use, which is critical because MB's free-text scoring weighs
title-text matches heavily and would otherwise rank cover / tribute
recordings above the canonical version.
- `_search_tracks_text` gains a `min_score` parameter. The cascade path
passes 20 (vs the enhanced-search-tab default of 80) so MB recordings
whose title doesn't literally contain the artist name still enter the
candidate pool — without that, "Army of Me" + "Bjork" only surfaces
the HIRS Collective cover (score 100) and drops Björk's canonical
recording (score 28). The rerank pass then surfaces Björk by artist
match. Verified against real MB API: pre-fix returned only the cover;
post-fix top 5 are all Björk.
- Fix popup `allSources` array (wishlist-tools.js) gets MB appended.
The existing `activeIdx` reorder logic moves MB to the front when
it's the active primary; otherwise MB sits last (1 req/sec rate
limit makes it the slowest source).
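A rough sketch of why the low `min_score` plus rerank combination recovers the canonical recording. `filter_and_rerank` is a hypothetical stand-in for the real `rerank_tracks` pass, and the candidate dict shape (`score`, `artists`) is assumed for illustration only:

```python
import unicodedata
from typing import Dict, List


def _fold(text: str) -> str:
    """Casefold and strip combining marks so 'Björk' compares equal to 'bjork'."""
    decomposed = unicodedata.normalize("NFKD", text or "")
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold().strip()


def filter_and_rerank(candidates: List[Dict], artist: str,
                      min_score: int = 20) -> List[Dict]:
    """Keep candidates at or above min_score, then float artist matches up."""
    kept = [c for c in candidates if c.get("score", 0) >= min_score]
    wanted = _fold(artist)

    def rank(candidate: Dict) -> int:
        names = candidate.get("artists", [])
        return 0 if any(_fold(name) == wanted for name in names) else 1

    # sorted() is stable, so the upstream score order survives within each group.
    return sorted(kept, key=rank)
```

With the scores quoted above (cover at 100, Björk at 28), a threshold of 80 leaves only the cover, while a threshold of 20 keeps both and lets the artist match decide the order.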
7 new unit tests on the adapter: bare-query mode is used, missing
artist falls back to None (drops AND-clause), empty inputs short-circuit,
low-score candidates are kept for rerank to handle, default strict +
default min_score behaviour preserved for the existing search-tab path,
client errors are swallowed so the cascade falls through to the next
source.
Discogs intentionally absent — Discogs has no track-level search API
(see core/discogs_client.py:575 — returns []). Adding a Flask endpoint
that always returns empty would be a permanent no-op.
Power-user escape hatch on the Discovery Fix Track Match modal — when
fuzzy auto-search ranks the wrong recording among many same-title
versions (10 remasters, live cuts, alt sessions), paste the MusicBrainz
recording URL or bare UUID into the new field and resolve straight to
that record.
Layout:
- Shape adapter `get_recording_flat(mbid)` lives in
`core/musicbrainz_search.py` next to existing `get_track_details`.
Returns the flat Fix-popup track shape (artists as `string[]`,
album as string, single `image_url`) — distinct from the
Spotify-shaped nested dict `get_track_details` returns.
- New route `GET /api/musicbrainz/recording/<mbid>` is a thin wrapper:
validates MBID format with an anchored UUID regex, calls the adapter,
returns 400 / 404 / 200 with no inline shape massaging.
- Frontend `parseMusicBrainzMbid()` lives in `shared-helpers.js` —
pure URL/UUID parser, reusable from other surfaces (failed-MB cache,
manual match) without duplication.
- Fix modal HTML gets one new input row + button; existing search row
and result render pipeline are untouched. New `lookupDiscoveryFixByMbid()`
fetches the endpoint and feeds the single result through the existing
`renderDiscoveryFixResults` -> confirm-dialog -> match pipeline, so MB-
paste matches go through the exact same selection flow as auto-search
results.
- Enter-key bound on the MBID input via a separate handler ref so its
lifecycle matches the search-input handlers without conflating the
two submit targets.
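For illustration, the URL-or-UUID parsing and the anchored validation could look like the Python sketch below. The frontend helper itself is JavaScript; `parse_recording_mbid` and both patterns are assumptions written for this example, not the shipped code:

```python
import re
from typing import Optional

_UUID = (r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
         r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")
_BARE_MBID_RE = re.compile(_UUID)
_RECORDING_URL_RE = re.compile(
    rf"https?://(?:www\.)?musicbrainz\.org/recording/({_UUID})(?:[/?#].*)?"
)


def parse_recording_mbid(text: str) -> Optional[str]:
    """Return the lowercase MBID from a bare UUID or a recording URL, else None."""
    value = (text or "").strip()
    # fullmatch anchors both ends, so trailing junk cannot sneak through.
    if _BARE_MBID_RE.fullmatch(value):
        return value.lower()
    match = _RECORDING_URL_RE.fullmatch(value)
    return match.group(1).lower() if match else None
```

Anchoring matters on the server side too: an unanchored search would accept any string that merely contains a UUID.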
7 unit tests cover the adapter: happy path, empty/None MBID, MB returns
None, recording-without-release (empty album), multi-artist credits,
includes-list contract, and client-error swallow.
Out of scope: the Fix popup's fuzzy cascade is still hardcoded to
spotify/deezer/itunes regardless of which primary source the user has
configured. Adding MB to that cascade (when MB is the active primary)
is a separate concern.
Two bugs surfacing on the Fix popup and enhanced-search MB tab:
1. Strict Lucene phrase queries (`recording:"X" AND artist:"Y"`) killed
recall on user-facing manual search — diacritics ("Bjork" vs canonical
"Björk"), bracketed suffixes like "(Live)", and any AND-clause
mismatch returned zero results. Added `strict: bool = True` param to
`search_release` / `search_recording`; when False, sends a bare query
joining title + artist so MB hits alias/sortname indexes with
diacritic folding. `/api/musicbrainz/search` (Fix popup) and
`core/library/service_search.py` (service tabs) now pass strict=False.
Enrichment workers stay on strict mode — precision matters there
because they auto-accept the top hit above a confidence threshold.
2. Every MB album click was silently 404-ing — `_render_release_as_album`
passed `cover-art-archive` as an MB `inc` param, but it's not a valid
include for the /release resource (MB rejects with 400). The CAA flags
come back on every release response by default, so dropping the bad
include preserves the image-scope picker logic intact.
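The difference between the two query modes can be sketched as follows. `build_recording_query` is a hypothetical helper written for this note; the real `search_recording` may assemble its query differently:

```python
from typing import Optional


def _quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_recording_query(title: str, artist: Optional[str] = None,
                          strict: bool = True) -> str:
    """Build either a strict field-scoped phrase query or a bare query."""
    title = (title or "").strip()
    artist = (artist or "").strip()
    if not title:
        return ""
    if strict:
        query = f"recording:{_quote(title)}"
        if artist:
            query += f" AND artist:{_quote(artist)}"
        return query
    # Bare mode: plain terms, no field scoping and no AND clause.
    return " ".join(part for part in (title, artist) if part)
```

For example, `build_recording_query("Army of Me", "Bjork")` yields the strict phrase form, while passing `strict=False` yields the plain string `Army of Me Bjork`.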
Register MusicBrainz as a first-class metadata source alongside Deezer, iTunes, Spotify, Discogs, and Hydrabase. Expose the shared client through metadata services, add the settings option, and expand the MusicBrainz search adapter with source-compatible artist, album, track, and detail methods.
Carry MusicBrainz IDs through similar-artist discovery, recommended artists, artist map serialization, and personalized playlist selection. Update DB migrations and lookup filters so similar_artist_musicbrainz_id is preserved on older schemas and used for source requirements and library exclusion.
Normalize MusicBrainz album adapter output for import context and add regression coverage for registry mapping, typed album conversion, and similar-artist filtering. Verified by user with 120 focused tests passing.
Artist detail pages previously always pushed /artist-detail to the URL,
so refreshing the page or sharing a link would drop users on a broken
empty page with no artist loaded.
URL format is now /artist-detail/:source/:id (e.g.
/artist-detail/spotify/4tZwfgrHOc3mvqsCAfo4LT or
/artist-detail/library/42). The source segment lets the backend
synthesize a response from the right metadata client without a DB hit.
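The path grammar, including the legacy bare-id form, can be sketched in Python as below. The shipped helpers live in `init.js`; the function names here are hypothetical, and treating a bare id as a library artist is an assumption made for this example:

```python
from typing import Optional, Tuple
from urllib.parse import quote, unquote

PREFIX = "/artist-detail"


def build_artist_detail_path(artist_id: str, source: str = "library") -> str:
    """Build /artist-detail/:source/:id with both segments percent-encoded."""
    return f"{PREFIX}/{quote(source, safe='')}/{quote(str(artist_id), safe='')}"


def parse_artist_detail_path(path: str) -> Optional[Tuple[str, str]]:
    """Return (source, artist_id) for new and legacy paths, else None."""
    if path != PREFIX and not path.startswith(PREFIX + "/"):
        return None
    parts = [unquote(p) for p in path[len(PREFIX):].split("/") if p]
    if len(parts) == 2:
        return parts[0], parts[1]
    if len(parts) == 1:
        # Legacy bare :id bookmark; assumed here to mean a library artist.
        return "library", parts[0]
    return None
```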
Changes:
Client routing (legacy shell + TanStack bridge)
- buildArtistDetailPath / _getDeepLinkArtistDetail added to init.js;
parse both new :source/:id and legacy bare :id formats so old
bookmarks still work
- navigateToPage passes artistId + artistSource through to the router
bridge, which builds the dynamic href instead of hardcoding route.path
- resolveShellPageFromPath / resolveLegacyShellPageFromPath use a prefix
match so /artist-detail/* resolves to artist-detail page-id
- globals.d.ts typed for artistId / artistSource options
- activateLegacyPath and syncActivePageFromLocation (popstate) both
restore artist from URL using skipRouteChange:true to avoid a
re-navigation loop back to /artist-detail
- loadInitialData restores artist from URL on page load (router not yet
mounted at DOMContentLoaded so legacy path runs unconditionally)
- Same-artist guard in navigateToArtistDetail prevents double-fetch
when the router fires activateLegacyPath after the initial navigation
Server
- artist_source_detail.build_source_only_artist_detail now resolves
artist name from the source API when none is supplied, so deep-link
restores with an empty name string still render correctly
Tests
- test_spa_deep_linking: /artist-detail/42 and /artist-detail/spotify/ID
both serve index.html
- bridge.test.ts: source-aware URL building and library fallback
- route-manifest.test.ts: prefix path resolution
- artist_source_detail: name resolved from source when input is empty
Schema: ALTER TABLE artists ADD COLUMN amazon_id TEXT with index, added via
_add_amazon_columns migration called after Discogs in _run_migrations.
SOURCE_ID_FIELD: add "amazon" -> "amazon_id" entry.
find_library_artist_for_source now looks up Amazon artists by slug before
falling back to name match, same as every other source. artist_source_detail
already stamps artist_info[source_id_field] = artist_id so the amazon_id is
set on source-only payloads.
Tests: add "amazon": "amazon_id" to EXPECTED_SOURCE_ID_FIELD; revert test
assertion back to strict equality
(SOURCE_ONLY_ARTIST_SOURCES == SOURCE_ID_FIELD.keys() holds again now that
amazon has a column).
Library upgrade: find_library_artist_for_source returned None immediately for
Amazon because SOURCE_ID_FIELD has no 'amazon' entry (no DB column for Amazon
artist IDs). The name-based fallback was unreachable. Fix: only skip the column
query when column is None, not the whole function — name lookup now runs for
any source when artist_name + active_server are provided.
Artist images: add AmazonClient._get_artist_image_from_albums so the standard
_get_artist_image_from_source path in metadata/artist_image.py can call it as
a fallback (same hook iTunes/Deezer/Discogs expose). Searches by unslugified
artist name, matches primary artist, fetches album cover from album_metadata.
Test: renamed test_source_only_set_matches_mapping_keys →
_contains_all_mapped_sources and changed it to assert subset (not
equality), since SOURCE_ONLY_ARTIST_SOURCES intentionally includes sources
without a DB column that rely on name-only lookup.
Discord report (netti93): downloaded album tracks were tagged with
TRCK = "6/0" instead of "6/13" when source data was incomplete. The
retag tool wrote correct "6/13" because core/tag_writer.py already
handled the case.
Trace: core/metadata/enrichment.py:105 formatted unconditionally as
f"{track_number}/{total_tracks}" and many album-dict construction
sites pass total_tracks: 0 (per types.py, 0 means "unknown" — not a
real count). That 0 propagated straight to disk.
Fix at the consumer boundary so every album-dict constructor stays
unchanged. Lifted to pure helper
core/metadata/track_number_format.py:format_track_number_tag that
drops the /N suffix when total is 0 / None / negative — emits just
"6" instead. Matches retag's behavior + ID3 spec convention (TRCK
can be "N" or "N/M"). MP4 trkn tuple gets the same treatment via
format_track_number_tuple returning (6, 0) per spec's "unknown
total" marker.
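A minimal sketch of the two helpers under stated assumptions: the function names come from the message above, but the bodies are illustrative, and returning None for a missing or non-positive track number is a guess rather than confirmed behaviour:

```python
from typing import Optional, Tuple


def _to_int(value) -> Optional[int]:
    """Coerce ints, floats, and numeric strings to int; None when unparseable."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None


def format_track_number_tag(track_number, total_tracks) -> Optional[str]:
    """Return 'N/M' when the total is known, plain 'N' otherwise."""
    track = _to_int(track_number)
    if track is None or track <= 0:
        return None  # assumption: nothing useful to write
    total = _to_int(total_tracks)
    if total is None or total <= 0:
        return str(track)
    return f"{track}/{total}"


def format_track_number_tuple(track_number, total_tracks) -> Optional[Tuple[int, int]]:
    """MP4 trkn variant: a total of 0 marks an unknown total."""
    track = _to_int(track_number)
    if track is None or track <= 0:
        return None
    total = _to_int(total_tracks)
    return (track, total if total is not None and total > 0 else 0)
```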
Wired into all three format-write sites in enrichment.py: ID3 (TRCK),
Vorbis (tracknumber), MP4 (trkn). When source data has correct
total_tracks (album downloads via the metadata-source pipeline,
retag flow), behavior unchanged — still writes "6/13".
16 boundary tests pin every shape: known total / zero total / none
total / none track / zero track / negative inputs / string coercion
/ unparseable strings / floats truncate.
Full suite: 3113 passed.
Closes #588. Contributing-artist tagging worked for some tracks but
silently dropped the contributors for others, most reproducibly when
the album had been fetched before the per-track post-process ran.
Trace: get_track_details cache check used `track_position in cached`
as the "full payload" sentinel. Both `/track/<id>` AND
`/album/<id>/tracks` set track_position. Only `/track/<id>` sets the
`contributors` array. When album-tracks data hit the cache first,
get_track_details returned the partial record →
_build_enhanced_track found no contributors → metadata-source
contributors-upgrade silently fell back to single-artist.
Reporter's case (Andrea Botez - Sacrifice): the album fetch logged
"Retrieved 4 tracks for album 673558211" before the post-process,
which cached all 4 tracks as partial records. The contributors-
upgrade then hit the partial cache and the upgrade log line never
fired because len(upgraded) was never > 1.
Lifted cache-validity to a pure helper `_is_full_track_payload` that
requires BOTH `track_position` AND `contributors` key presence. Empty
list `[]` is valid — single-artist tracks fetched via `/track/<id>`
carry it explicitly. Partial cache hits fall through to a fresh
`/track/<id>` fetch, which writes the full payload back to cache.
11 boundary tests pin every shape: full payload, single-artist with
empty contributors list, partial album-tracks shape, search-result
shape, none/non-dict, and the cache-hit/cache-miss/api-failure paths
on get_track_details (including the exact reporter-scenario
regression).
Full suite: 3021 passed.
Soulseek matched-download contexts populate `original_search_result`
with `artist` (singular string) and no `artists` list — the full
multi-artist array lives on `track_info` (the matched Spotify track
object). `extract_source_metadata` only read `original_search.artists`,
so the Soulseek path always fell through to the single-artist branch
and TPE1 ended up with the primary artist only. Deezer-direct
downloads were unaffected because their context populates
`original_search.artists` as a proper list.
Lifted artist resolution into a pure helper
`core/metadata/artist_resolution.py:resolve_track_artists` that walks
`original_search.artists` → `track_info.artists` → `artist_dict.name`
fallback chain. Normalizes mixed list-item shapes (Spotify-style
dicts, bare strings, anything else stringified) and drops empty
entries.
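The fallback chain can be sketched as below. The function name matches the message above, but the signature and body are assumptions made for illustration:

```python
from typing import Any, Iterable, List, Optional


def _normalize(items: Optional[Iterable[Any]]) -> List[str]:
    """Flatten Spotify-style dicts, bare strings, and anything else into names."""
    names = []
    for item in items or []:
        raw = item.get("name") if isinstance(item, dict) else item
        text = str(raw).strip() if raw is not None else ""
        if text:
            names.append(text)
    return names


def resolve_track_artists(original_search: Optional[dict],
                          track_info: Optional[dict],
                          artist_dict: Optional[dict]) -> List[str]:
    """Walk original_search.artists, then track_info.artists, then artist_dict.name."""
    for source in (original_search, track_info):
        artists = (source or {}).get("artists")
        if isinstance(artists, (list, tuple)):
            names = _normalize(artists)
            if names:
                return names
    name = str((artist_dict or {}).get("name") or "").strip()
    return [name] if name else []
```

With the Soulseek shape (`artist` string only on `original_search`), the first step yields nothing and the list on `track_info` is used instead.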
13 new tests pin the resolution order, fallback chain, mixed-shape
normalization, whitespace stripping, and empty/none handling. The
existing `_artists_list` no-fall-through test in
`test_multi_artist_tag_settings.py` was updated to reflect the new
contract (always populated; multi-value write still gated on
`len > 1`) plus a new regression test for the Soulseek shape.
Composes with the existing Deezer per-track upgrade (still fires when
single-artist + track_id available) and feat_in_title /
artist_separator settings (still drive the joined ARTIST string
downstream).
- legacy duck-typed builder only checked the `album_type` key; deezer
uses `record_type`, tidal uses `type` (uppercase), some flattened
musicbrainz shapes use `primary-type` — all defaulted to album, so
EPs and singles ended up filed under Album/ in user templates that
reference $albumtype
- widen lookup to album_type / record_type / type / primary-type and
route through new pure `_normalize_album_type` helper that
case-folds + validates against the canonical token set
(album / single / ep / compilation), unknown → album
- typed-converter path (spotify / deezer / itunes / discogs / mb /
hydrabase / qobuz) unchanged — those were already correct
Discord report (CAL).
- new track_already_owned helper wraps db.check_track_exists at
the same confidence threshold the discography backfill repair job
uses (0.7) — name+artist+album, format-agnostic so blasphemy-mode
libraries (flac → mp3 + delete original) match correctly
- endpoint runs the check after the artist + content-type filters and
before add_to_wishlist, so a second discography click on the same
artist no longer re-queues every track that already downloaded
- per-album response carries a new tracks_skipped_owned counter
alongside the existing artist/content/wishlist skip categories
Discord report (Skowl).
- drop tracks where the requested artist isn't named in track.artists
(keeps features, drops compilation / appears_on contamination)
- honor watchlist.global_include_live/remixes/acoustic/instrumentals
the same way the discography backfill repair job already does
- surface per-album skip counts in the ndjson stream (artist mismatch
+ content filter) so the ui can show what was filtered
Closes #559.
Two follow-ups to the multi-artist tag settings PR:
1. Deezer contributors upgrade — closes the "known limitation"
flagged in the prior commit. Deezer's `/search` endpoint only
returns the primary artist for each track; the full contributors
array (feat., remix collaborators, producers credited as artists)
lives on `/track/<id>` and gets parsed by `_build_enhanced_track`.
Without the upgrade Deezer-sourced tracks never got multi-artist
tags even with the right settings on.
Fix in `core/metadata/source.py`: when source==deezer AND the
search response had a single artist AND a track_id is available,
fetch full track details via `get_deezer_client().get_track_details`
and replace `all_artists` with the upgraded list.
- One extra API call per affected Deezer track
- Skipped when search already returned multiple (no-op fast path)
- Skipped for non-Deezer sources (Spotify/Tidal/iTunes search
responses already include all artists)
- Skipped when no track_id is available
- Defensive try/except: on /track/<id> failure (network error,
deezer client unavailable), fall through to the search-result
list — never lose the data we already had
2. Double-append guard hardened with a word-boundary regex.
Prior commit checked for `"feat." not in title.lower() and "(ft."
not in title.lower()` — too narrow. Source platforms produce
wildly different feat-marker conventions: "(feat. X)", "(Feat X)",
"(FEAT X)", "(Featuring X)", "[feat. X]", "ft. X" (no parens),
"FT. X", etc. Any of these as the SOURCE title would cause a
double-append: `"Track (Feat X) (feat. Y)"`.
Replaced with `re.search(r'\b(?:feat|feat\.|featuring|ft|ft\.)\b',
title, IGNORECASE)`. Word-boundary regex catches every common
variant. Substring matches like "Aftermath" containing `ft`
correctly fall through to the append path (pinned by a regression
test).
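The guard can be exercised in isolation with the sketch below, which uses the pattern quoted above; the `has_feat_marker` wrapper and the sample titles are illustrative:

```python
import re

# Same pattern as in the message above, compiled case-insensitively.
FEAT_MARKER_RE = re.compile(r"\b(?:feat|feat\.|featuring|ft|ft\.)\b", re.IGNORECASE)


def has_feat_marker(title: str) -> bool:
    """True when the title already carries a feat/ft/featuring marker."""
    return bool(FEAT_MARKER_RE.search(title or ""))


SOURCE_TITLE_VARIANTS = [
    "Track (feat. X)",
    "Track (Feat X)",
    "Track (FEAT X)",
    "Track (Featuring X)",
    "Track [feat. X]",
    "Track ft. X",
    "Track FT. X",
]

if __name__ == "__main__":
    for title in SOURCE_TITLE_VARIANTS + ["Aftermath"]:
        print(f"{title!r}: {has_feat_marker(title)}")
```

"Aftermath" is rejected because there is no word boundary between "A" and "ft". The dotted alternatives are redundant in practice, since `feat` and `ft` already end on a word boundary before the dot, but they are harmless.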
16 new tests (29 total in the file):
- 9 parametrized variants of the double-append guard
- 1 substring guard ("Aftermath")
- 6 Deezer upgrade scenarios (fires when expected, doesn't fire
for non-Deezer / multi-artist search / no track_id, defensive
fall-through on failure, no false-positive when /track/<id>
confirms single artist)
Full pytest 2727 passed.
Three settings on Settings → Metadata → Tags were partially or
completely unimplemented. Reporter (Netti93) traced each one.
(1) `write_multi_artist` only "worked" because of a never-populated
`_artists_list` field. `core/metadata/source.py` built
`metadata["artist"]` as a hardcoded ", "-joined string but never
assigned `metadata["_artists_list"]`. `core/metadata/enrichment.py`
line 107 reads that field and gates the multi-value tag write
on `len(_artists_list) > 1` — always saw an empty list, silently
no-op'd the write.
(2) `artist_separator` (default ", ") was referenced in the UI +
settings.js save path but ZERO Python code read the value. Every
multi-artist track ended up with hardcoded ", " regardless of
what the user picked.
(3) `feat_in_title` (when true: pull featured artists into the title
as " (feat. X, Y)" and leave only primary in the ARTIST tag —
Picard convention) had no implementation at all.
Fix in source.py:
* Populate `_artists_list` from the search response's artists array
* Read `feat_in_title` and `artist_separator` configs
* When `feat_in_title=True` and >1 artist: ARTIST = primary only,
append "(feat. X, Y)" to title with double-append guard
* Else: ARTIST = artists joined with `artist_separator`
* Single-artist case unaffected by either setting
Double-append guard uses a word-boundary regex catching all common
"feat" variants source platforms produce — `feat`, `feat.`,
`featuring`, `ft`, `ft.` — case-insensitive. Substring matches
(e.g. "Aftermath" containing "ft") correctly fall through to the
append path.
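How the three settings interact can be sketched as a single pure function. `build_artist_tags` and its return shape are assumptions for this example, not the code in `core/metadata/source.py`:

```python
import re
from typing import Dict, List

_FEAT_RE = re.compile(r"\b(?:feat|feat\.|featuring|ft|ft\.)\b", re.IGNORECASE)


def build_artist_tags(title: str, artists: List[str],
                      feat_in_title: bool = False,
                      artist_separator: str = ", ") -> Dict[str, object]:
    """Compute the title, the ARTIST display string, and the full artist list."""
    names = [a for a in artists if a]
    artist = names[0] if names else ""
    if len(names) > 1:
        if feat_in_title:
            # ARTIST stays primary-only; featured names move into the title
            # unless the source title already carries a feat marker.
            if not _FEAT_RE.search(title):
                title = f"{title} (feat. {', '.join(names[1:])})"
        else:
            artist = artist_separator.join(names)
    # The full list is always carried so the multi-value tag can use it.
    return {"title": title, "artist": artist, "_artists_list": names}
```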
Fix in enrichment.py ID3 branch:
* TPE1 stays as the display string (with separator or primary-only
per the user's settings)
* Multi-value list goes to a separate `TXXX:Artists` frame (Picard
convention) when `write_multi_artist` is on
* Pre-fix the ID3 path wrote TPE1 twice — single-string then list
— and the second `add` overwrote the first, clobbering both the
configured separator AND the feat_in_title semantics. Vorbis path
was already correct (separate "artist" + "artists" keys).
Known limitation (flagged in WHATS_NEW): Deezer's `/search` endpoint
only returns the primary artist. The full contributors array lives
on `/track/<id>`. Enrichment uses search-result data so Deezer-
sourced tracks may still get only the primary artist until a follow-
up commit wires the per-track contributors fetch into the enrichment
flow. Spotify, Tidal, and iTunes search responses include all
artists so they work now.
23 new tests in `tests/metadata/test_multi_artist_tag_settings.py`:
* `_artists_list` populated for multi/single/no-artist cases
* `artist_separator` drives ARTIST string (default ", " + custom
";" + custom "; " + " & ")
* Single-artist case unaffected by either setting
* `feat_in_title=True` pulls featured to title, leaves primary in
ARTIST
* `feat_in_title` no-op for single artist
* Double-append guard recognizes 9 source-title variants ("(feat.
X)", "(Feat. X)", "(FEAT X)", "(feat X)", "(Featuring X)",
"[feat. X]", "ft. X", "(ft X)", "FT. X")
* Substring guard test pins "Aftermath" doesn't false-positive
* Combined-settings precedence: feat_in_title wins ARTIST string
but `_artists_list` carries everyone for multi-value tag
Full pytest 2711 passed.
Defensive followup. If Deezer CDN ever refuses the upgraded
1900×1900 URL for a specific album (rare — empirically tested 4
albums and none hit it), pre-fix would have succeeded with the
1000×1000 URL and post-fix would have failed entirely.
Both download sites now retry with the original URL when the
upgraded URL fails:
- `core/metadata/artwork.py::download_cover_art` — auto post-process
flow. Resolves the original URL from album_info / context the same
way the existing path does.
- `core/tag_writer.py::download_cover_art` — captures the original
URL before upgrade so the retry has it without a second context
lookup.
Strictly non-regressive: worst plausible post-fix case is now
identical to pre-fix (cover at 1000×1000 succeeds). Fallback only
fires on the rare CDN-refusal edge.
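The retry shape, sketched with a hypothetical `fetch` callable standing in for the real HTTP download (which is assumed here to raise on a non-2xx response):

```python
from typing import Callable, Optional


def download_with_fallback(upgraded_url: str,
                           original_url: Optional[str],
                           fetch: Callable[[str], bytes]) -> Optional[bytes]:
    """Try the upgraded URL first, then the original URL if it differs."""
    urls = [upgraded_url]
    if original_url and original_url != upgraded_url:
        urls.append(original_url)
    for url in urls:
        try:
            return fetch(url)
        except Exception:
            # Any failure falls through to the next candidate.
            continue
    return None
```

When the URL was never upgraded, both values are equal and only one attempt is made, which matches the non-dzcdn test described below.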
Tests added (2):
- `test_tag_writer_retries_with_original_on_failure` — upgraded URL
raises, original succeeds, both attempts logged in call order
- `test_tag_writer_no_fallback_for_non_dzcdn_url` — non-Deezer URLs
go through unchanged, no fallback path triggered (single attempt)
Verification:
- 18/18 helper + integration tests pass
- 2561 full suite passes
- Ruff clean
Discord report (Tim): downloaded cover art via Deezer metadata
source came out visibly blurry in Navidrome / on phones — large
displays exposed the limited resolution.
# Cause
Deezer's API returns `cover_xl` URLs at 1000×1000. The underlying
CDN actually serves up to 1900×1900 by rewriting the size segment
in the URL path (same trick the iTunes mzstatic + Spotify scdn
upgrades already use). SoulSync wasn't doing the rewrite — every
Deezer-sourced cover got embedded at 1000×1000 regardless of how
much higher resolution the CDN had available.
# Verified empirically
```
$ for size in 1000 1400 1800 1900 2000; do curl -I "...${size}x${size}-..."; done
1000: 200 OK 106 KB
1400: 200 OK 198 KB
1800: 200 OK 331 KB
1900: 200 OK 371 KB
2000: 403 Forbidden
```
1900 is the safe ceiling. Above that the CDN returns 403. CDN
serves source-native bytes when source < target (smaller-source
albums get same bytes whether we ask for 1000 or 1900), so asking
for 1900 universally is safe.
# Fix
New `_upgrade_deezer_cover_url(url, target_size=1900)` helper in
`core/deezer_client.py`. Pure function, mirrors the
`_upgrade_spotify_image_url` pattern that already lives in
`core/spotify_client.py`. Defensive on every input shape:
- Empty / None → returned as-is
- Non-Deezer URL (no `dzcdn`) → returned as-is
- No size segment in URL → returned as-is
- Already at/above target → returned as-is (idempotent, never
downgrades)
Applied at both cover-download sites:
- `core/metadata/artwork.py::download_cover_art` — auto post-process
flow. Mirrors the existing iTunes mzstatic upgrade right above it.
- `core/tag_writer.py::download_cover_art` — enhanced library view's
"Write Tags to File" feature.
# Scope discipline
- Helper applied at the DOWNLOAD boundary, not the source extraction
point in `deezer_client.py`. Means cached entries in the metadata
cache + DB row `image_url` columns keep the original 1000×1000 URL
Deezer's API returned. Future CDN behavior changes only affect the
download path, not stored data.
- Pre-existing `prefer_caa_art` toggle (Settings → Library →
Post-Processing) untouched — orthogonal workaround for users who
want even higher quality (MusicBrainz Cover Art Archive, often
3000×3000+).
- iTunes / Spotify upgrade paths untouched — they already worked.
# Tests added (16)
`tests/metadata/test_deezer_cover_url_upgrade.py`:
- Standard upgrade: default target 1900 on cover URL, alternate
dzcdn host (`e-cdns-images.dzcdn.net` vs `cdn-images.dzcdn.net`),
artist picture URLs (same path pattern), 500×500 source upgrades
too
- Custom target size: smaller target = no-op (never downgrade),
larger target works
- Idempotent: already at/above target returned unchanged
- Defensive on non-Deezer URLs: parametrised across 5 hosts
(Spotify scdn, iTunes mzstatic, MB CAA, Last.fm, random) — all
returned untouched
- Defensive on malformed Deezer URL (no size segment) → returned
as-is
- Empty / None handling
# Verification
- 16/16 helper tests pass
- 560/560 metadata + imports tests pass (no regression)
- 2559 full suite passes
- Ruff clean
Defensive followup to the relevance fix. Deezer's advanced search
syntax (`artist:"X"`) is documented as substring match, but in
practice it's brittle on artist name variants ("Foreigner [US]",
"The Foreigner") and on tracks indexed under non-canonical title
spellings. When the advanced query returns nothing, we'd previously
land at "No matches" — a regression vs. pre-fix behaviour where
free-text would have returned a less-relevant but non-empty set.
Fix: when the advanced query returns 0 results AND the caller used
field-scoped kwargs, fall back to a free-text join of the same
kwargs and re-query. Caller-side rerank still tightens whatever the
fallback returns, so the worst-case post-fix behaviour is the
pre-fix behaviour — never strictly worse.
Pulled the cache + parse + store dance into a private helper
(`_search_tracks_with_query`) so the orchestration can call it
twice (advanced → fallback) without code duplication. Single API
call when the advanced query has results — no wasted requests.
Diagnostic logger.debug fires when the fallback triggers so we can
see in production whether it's happening (and to which queries).
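The orchestration can be sketched as follows. `run_query` is a hypothetical callable standing in for the private `_search_tracks_with_query` helper, and the function name and signature are assumptions for this example:

```python
from typing import Callable, Dict, List


def search_with_fallback(run_query: Callable[[str], List],
                         fields: Dict[str, str],
                         legacy_query: str = "") -> List:
    """Run the field-scoped query first; retry as free text only if it is empty."""
    parts = {k: v.strip() for k, v in fields.items() if v and v.strip()}
    used_advanced = bool(parts)
    if used_advanced:
        query = " ".join(f'{name}:"{value}"' for name, value in parts.items())
    else:
        query = legacy_query.strip()
    if not query:
        return []  # empty input: no API call at all
    results = run_query(query)
    if results or not used_advanced:
        return results  # a hit, or a legacy call with no second path to try
    # Advanced query came back empty: join the same values as free text.
    return run_query(" ".join(parts.values()))
```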
# Tests added (4)
- `test_falls_back_to_free_text_when_advanced_empty` — advanced
query returns 0, free-text returns hits; client returns the
free-text hits + both API calls fire.
- `test_no_fallback_when_advanced_query_has_results` — single hit
on advanced query → no second API call.
- `test_no_fallback_when_legacy_free_text_call` — legacy callers
already exhausted the only path; empty result is final.
- `test_no_fallback_when_query_unchanged` — empty kwargs path
doesn't trigger the fallback branch (used_advanced=False).
# Existing tests updated
The 4 prior `TestSearchTracksQueryWiring` + `TestSearchTracksCacheKey`
tests were stubbing `_api_get` to return empty `{'data': []}` and
asserting `assert_called_once`. With the new fallback, those stubs
trigger a second API call and the assertions break — even though
the FIRST call construction is what the tests cared about. Updated
the stubs to return one fake hit so the fallback doesn't fire, and
switched to `call_args_list[0]` for first-call inspection.
# Verification
- 18/18 deezer query tests pass (14 prior + 4 new)
- 2445 full suite passes (+4 from prior commit)
- Ruff clean
# Background
User reported (#534) that the import-modal "Search for Match" dialog
returned irrelevant results when Deezer was the metadata source.
Searching `Dirty White Boy` + `Foreigner` returned 5+ karaoke /
"originally performed by" / "in the style of" / "re-recorded" /
tribute-band results ranked above the actual Foreigner studio cut
from Head Games. User had to scroll past the junk every time, or
fall back to iTunes search which is much slower.
# Root cause — two layers
1. **Endpoint joined `track + artist` into free-text query.**
`/api/deezer/search_tracks` was passing `q=Dirty White Boy Foreigner`
to Deezer's `/search/track` API. Deezer fuzzy-matches that
string across title / lyrics / artist / album / contributors and
orders by global popularity — anything that appears across many
compilations outranks the canonical recording.
2. **No local rerank.** None of the search-modal endpoints applied
any post-filtering. Deezer's API order shipped straight to the
user.
# Fix — same architectural shape Cin would build
## Layer 1: field-scoped query at the client boundary
`core/deezer_client.py::search_tracks()` now accepts optional
`track`, `artist`, `album` kwargs. When provided, builds Deezer's
advanced search syntax: `q=track:"X" artist:"Y" album:"Z"`. Massive
relevance improvement because each term matches the right field
instead of fuzzy-matching everywhere.
Backward compat preserved: legacy free-text `query=` callers still
work unchanged. Field-scoped path takes precedence when both are
provided. Empty input fast-fails without an API call. Embedded
double-quotes stripped (Deezer's syntax has no escape mechanism).
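A minimal sketch of the query construction described above. `build_deezer_query` is a hypothetical helper name, not the actual function in `core/deezer_client.py`; the behaviour (field-scoped wins, quotes stripped, empty input yields an empty string) follows the description.

```python
from typing import Optional


def build_deezer_query(
    query: str = "",
    track: Optional[str] = None,
    artist: Optional[str] = None,
    album: Optional[str] = None,
) -> str:
    """Return the `q` value to send, or "" when there is nothing to search for."""

    def clean(value: Optional[str]) -> str:
        # The advanced syntax has no escape for double quotes, so drop them.
        return (value or "").replace('"', "").strip()

    fields = (("track", clean(track)), ("artist", clean(artist)), ("album", clean(album)))
    scoped = " ".join(f'{name}:"{value}"' for name, value in fields if value)
    # Field-scoped path takes precedence; legacy free text is the fallback.
    return scoped or (query or "").strip()
```

A caller would skip the API request entirely when the returned string is empty.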
## Layer 2: provider-neutral relevance reranker
New `core/metadata/relevance.py` module — pure-function rerank over
the canonical `Track` dataclass. Composable scoring:
- **Cover/karaoke patterns** (multiplier 0.05, effectively buries):
matches "karaoke", "originally performed by", "in the style of",
"made famous by", "tribute", "vocal version", "backing track",
"cover version", "re-recorded", "cover by", etc. across title,
album, AND artist fields. Catches the screenshot's exact junk:
artist credits like "Pop Music Workshop" / "The Karaoke Channel"
/ "Foreigner Tribute Band".
- **Variant tags** (multiplier 0.4): live / acoustic / demo /
instrumental / remix / radio edit / club mix etc. — softer
penalty since the user MAY want them. Skipped entirely when the
expected_title contains the same tag (so searching
"Track (Live)" still ranks Live versions first).
- **Exact artist boost** (multiplier 1.5): primary artist exactly
matches expected_artist after normalisation. Single strongest
signal for "this is the canonical recording".
- **Title + artist similarity** via SequenceMatcher (parentheticals
and punctuation stripped before comparison).
- **Album-type weighting**: album=1.0 > single/ep=0.85 > compilation=0.7.
Compilations are more likely tribute / karaoke repackages.
Each component is a standalone function so tests pin them
individually without standing up the full pipeline.
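To make the composition concrete, here is a simplified sketch of how the multipliers could combine. It is not the code in `core/metadata/relevance.py`: the `Track` stand-in, the shortened pattern tables, the 50/50 title/artist blend, and all function names are assumptions; only the multiplier values come from the list above.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class Track:  # simplified stand-in for the canonical Track dataclass
    title: str
    artist: str
    album: str = ""
    album_type: str = "album"


COVER_PATTERNS = ("karaoke", "originally performed by", "in the style of",
                  "made famous by", "tribute", "backing track", "re-recorded")
VARIANT_TAGS = ("live", "acoustic", "demo", "instrumental", "remix", "radio edit")
ALBUM_TYPE_WEIGHT = {"album": 1.0, "single": 0.85, "ep": 0.85, "compilation": 0.7}


def _norm(text: str) -> str:
    text = re.sub(r"\([^)]*\)", " ", text.lower())  # drop parentheticals
    return " ".join(re.sub(r"[^\w\s]", " ", text).split())  # drop punctuation


def _has_tag(tag: str, text: str) -> bool:
    return re.search(rf"\b{re.escape(tag)}\b", text.lower()) is not None


def score(track: Track, expected_title: str, expected_artist: str) -> float:
    title_sim = SequenceMatcher(None, _norm(track.title), _norm(expected_title)).ratio()
    artist_sim = SequenceMatcher(None, _norm(track.artist), _norm(expected_artist)).ratio()
    value = 0.5 * title_sim + 0.5 * artist_sim  # assumed blend
    haystack = f"{track.title} {track.album} {track.artist}".lower()
    if any(pattern in haystack for pattern in COVER_PATTERNS):
        value *= 0.05  # cover / karaoke: effectively buried
    if any(_has_tag(t, track.title) and not _has_tag(t, expected_title) for t in VARIANT_TAGS):
        value *= 0.4  # variant the user did not ask for
    if _norm(track.artist) == _norm(expected_artist):
        value *= 1.5  # exact artist boost
    return value * ALBUM_TYPE_WEIGHT.get(track.album_type, 1.0)


def rerank(tracks: List[Track], expected_title: str, expected_artist: str) -> List[Track]:
    return sorted(tracks, key=lambda t: score(t, expected_title, expected_artist), reverse=True)
```

Because every penalty is a multiplier, the components stay independent and can be tested one at a time.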
## Wired at three search-modal endpoints
- `/api/deezer/search_tracks` — uses both layers (field-scoped
query + rerank).
- `/api/itunes/search_tracks` — uses rerank only (iTunes API has
no advanced-syntax search, but karaoke / cover variants still
leak through and need the local penalty).
- `/api/spotify/search_tracks` — already builds field-scoped
`track:X artist:Y` query; rerank added as the consistency safety
net so all three sources behave the same from the user's
perspective.
Other Deezer call sites (matching engine, watchlist scanner,
auto-import single-track ID) deliberately not touched in this PR
— they have their own elaborate scoring pipelines tuned to their
specific contexts and aren't surfacing the user-reported issue.
Per Cin: "don't refactor beyond what the task requires."
# Tests
71 new tests across 3 files:
- `tests/metadata/test_relevance.py` (50 tests) — every scoring
component pinned individually + the issue #534 screenshot
reproduced as a regression test (real Foreigner cut wins after
rerank, karaoke variants drop to bottom).
- `tests/metadata/test_deezer_search_query.py` (14 tests) —
advanced-syntax query construction, field-scoped wiring at the
client boundary, free-text path unchanged, kwargs win when
ambiguous, limit clamping, cache key consistency.
- `tests/imports/test_search_match_endpoints.py` (7 tests) —
end-to-end through Flask test client: Deezer endpoint passes
kwargs not joined query; karaoke buried at bottom for all three
sources; legacy query param still works without rerank.
# Verification
- 2441 full suite passes (+71 from baseline 2370)
- 0 failures (the prior watchdog flake fix held)
- Ruff clean across all changed files
- JS parses clean (`node -c webui/static/helper.js`)
# Architectural standards followed
- **Logic at the right boundary.** Query construction lives in the
client (every caller benefits from one change). Rerank lives in
a neutral module (`core/metadata/relevance.py`) over the
canonical `Track` dataclass — works for any source, not Deezer-
specific.
- **Explicit > implicit.** Every scoring rule has its own named
function. Pattern tables are module-level constants tests can
introspect.
- **Scope discipline.** Audited every Deezer search call site;
fixed the user-reported one + the consistent siblings. Did NOT
speculatively normalise every Deezer call across the codebase.
- **Backward compat.** Free-text `query=` callers untouched. Kwargs
added to existing client method signature with safe defaults.
- **Tests pin contract at correct boundary.** Pure-function rerank
tests don't mock anything; client-query tests stub at `_api_get`;
endpoint tests run through the real Flask app.
`_create_finding` silently dedup-skipped re-discovered issues but
the caller incremented `findings_created` regardless. So a re-scan
that found the same issues as a prior scan reported 364 findings
in the badge while 0 NEW pending rows hit the DB, leaving the
findings tab empty.
`_create_finding` now returns bool (True on insert, False on
dedup-skip / DB error). All 16 repair jobs updated to only
increment `findings_created` on True. Added `findings_skipped_dedup`
counter surfaced in scan log: "Done: X scanned, 0 fixed, 0
findings (363 already existed), 0 errors".
Also fixed a missing `job_id` kwarg in album_tag_consistency that
was silently breaking finding creation for that scan.
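The counting change can be sketched as follows. The table layout, the `INSERT OR IGNORE` dedup mechanism, and the function signatures are assumptions for illustration; the real `_create_finding` may dedup differently. What matters is that the caller branches on the returned bool.

```python
import sqlite3
from typing import Dict, Iterable, Tuple


def create_finding(conn: sqlite3.Connection, job_id: str, entity_id: str, finding_type: str) -> bool:
    """Return True only when a new row was inserted."""
    try:
        cur = conn.execute(
            "INSERT OR IGNORE INTO findings (entity_id, finding_type, job_id) VALUES (?, ?, ?)",
            (entity_id, finding_type, job_id),
        )
        conn.commit()
        return cur.rowcount == 1  # 0 means the row already existed
    except sqlite3.Error:
        return False


def run_scan(conn: sqlite3.Connection, job_id: str, issues: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    stats = {"scanned": 0, "findings_created": 0, "findings_skipped_dedup": 0}
    for entity_id, finding_type in issues:
        stats["scanned"] += 1
        if create_finding(conn, job_id, entity_id, finding_type):
            stats["findings_created"] += 1
        else:
            stats["findings_skipped_dedup"] += 1
    return stats
```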
Three more album-shape consumers now route through
Album.from_<source>_dict() when caller passes a known source:
- _build_discography_release_dict (artist discography cards)
- _build_artist_detail_release_card (artist detail release cards)
- _normalize_track_album (quality scanner result normalization)
Legacy duck-typing stays as fallback for unknown source,
non-dict input, or converter errors. Pure additive — existing
callers without source kwarg unchanged.
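The routing pattern shared by these builders looks roughly like this. The `Album` fields, the output keys, and `build_release_card` are simplified assumptions, not the real builders; the point is the order: typed converter first, legacy duck-typing on any miss.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Album:  # trimmed-down stand-in for the canonical dataclass
    name: str
    total_tracks: int = 0

    @classmethod
    def from_spotify_dict(cls, raw: Dict[str, Any]) -> "Album":
        return cls(name=raw["name"], total_tracks=raw.get("total_tracks", 0))

    @classmethod
    def from_itunes_dict(cls, raw: Dict[str, Any]) -> "Album":
        return cls(name=raw["collectionName"], total_tracks=raw.get("trackCount", 0))


def build_release_card(raw: Any, source: Optional[str] = None) -> Dict[str, Any]:
    converter = getattr(Album, f"from_{source}_dict", None) if source else None
    if converter is not None and isinstance(raw, dict):
        try:
            album = converter(raw)
            return {"title": album.name, "track_count": album.total_tracks}
        except (KeyError, TypeError, ValueError):
            pass  # converter error: fall through to the legacy path

    # Legacy duck-typing: accept dicts or attribute-style objects.
    if isinstance(raw, dict):
        get = raw.get
    else:
        get = lambda key, default=None: getattr(raw, key, default)
    return {"title": get("name") or get("title") or "", "track_count": get("total_tracks") or 0}
```

Callers that never pass `source` take the legacy branch exactly as before, which is what keeps the change additive.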
Steps 2+3 of typed metadata migration. Two album-info builders now
route through Album.from_<source>_dict() when caller passes a
known source:
- _build_album_info (album-tracks lookups)
- _build_single_import_context_payload (single-track import context)
Legacy duck-typing stays as fallback for unknown source, non-dict
input, or converter errors. Pure additive — existing callers
without source kwarg unchanged.
Audit caught two missing providers from the foundation PR. Both
return album-shaped data via their clients (search + download
flows). Tidal uses tidalapi objects rather than dicts so the
converter is from_tidal_object, not _dict.
Enrichment-only providers (lastfm/genius/acoustid/listenbrainz/
audiodb) intentionally have no album converter — they enrich
existing rows, never return album shapes.
Tests: +8 cases. 40 total now.
New core/metadata/types.py with canonical dataclasses + classmethod
converters for spotify/itunes/deezer/discogs/musicbrainz/hydrabase.
Each converter is the single place that knows that provider's wire
shape — addresses the duck-typing pattern Cin flagged.
Pure additive: no consumer code changed. Follow-up PRs migrate
consumers one at a time. Migration plan at
docs/metadata-types-migration.md.
Tests: 32 cases pin per-provider semantics + cross-provider
invariants. Also stabilized a flaky discogs test that depended on
local config state.
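As an illustration of "one converter per wire shape", a cut-down sketch with two providers. The field set of `Album` here is an assumption and much smaller than the real dataclass in `core/metadata/types.py`; the input keys (`nb_tracks`, `record_type`, `primary-type`) are the ones Deezer and MusicBrainz use in their JSON responses.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Album:
    id: str
    name: str
    album_type: str
    total_tracks: int = 0
    source: str = ""

    @classmethod
    def from_deezer_dict(cls, raw: Dict[str, Any]) -> "Album":
        # Deezer: title / nb_tracks / record_type
        return cls(
            id=str(raw.get("id", "")),
            name=raw.get("title", ""),
            album_type=(raw.get("record_type") or "album").lower(),
            total_tracks=int(raw.get("nb_tracks") or 0),
            source="deezer",
        )

    @classmethod
    def from_musicbrainz_dict(cls, raw: Dict[str, Any]) -> "Album":
        # MusicBrainz release-group: title / primary-type
        return cls(
            id=str(raw.get("id", "")),
            name=raw.get("title", ""),
            album_type=(raw.get("primary-type") or "album").lower(),
            source="musicbrainz",
        )
```

Consumers then read `album.name` and `album.album_type` without caring which provider produced the row.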
Discord report (Samuel [KC]): tracks of the same album sometimes carry
different MUSICBRAINZ_ALBUMID tags, which causes Navidrome (and other
media servers grouping by album MBID) to split the album into multiple
entries. Two-part fix — one for existing libraries, one for the root
cause that lets new imports drift.
Part 1 — Detector + fix action (catches existing dissenters):
`core/repair_jobs/mbid_mismatch_detector.py`:
- New helpers: `_read_album_mbid_from_file` and
`_write_album_mbid_to_file` use the Picard-standard tag conventions
(`TXXX:MusicBrainz Album Id` for MP3, `MUSICBRAINZ_ALBUMID` for
FLAC/OGG, `----:com.apple.iTunes:MusicBrainz Album Id` for MP4).
- New scan phase `_scan_album_mbid_consistency` runs after the
existing track-MBID scan: groups tracks by DB `album_id`, reads
each track's embedded album MBID, finds the consensus
(most-common) MBID via `Counter`, flags dissenters. Tracks without
an album MBID at all are skipped (they don't break Navidrome —
only an explicit MBID disagreement does). Albums where MBIDs are
perfectly tied (no clear consensus) are skipped too — surface as
a manual decision instead of fixing toward a 1/N tie.
- New finding type `album_mbid_mismatch` carries `consensus_mbid`,
`wrong_mbid`, `consensus_count`, `total_tracks_with_mbid`, and a
human-readable reason string.
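The consensus step can be sketched like this. The function name, the input shape (file path mapped to its embedded album MBID), and the `file_path` key are assumptions; the remaining finding fields match the list above.

```python
from collections import Counter
from typing import Dict, List, Optional


def find_album_mbid_dissenters(track_mbids: Dict[str, Optional[str]]) -> List[dict]:
    """Flag tracks whose album MBID disagrees with the album's most common one."""
    tagged = {path: mbid for path, mbid in track_mbids.items() if mbid}  # skip untagged tracks
    counts = Counter(tagged.values())
    if len(counts) < 2:
        return []  # full agreement, or nothing to compare

    (consensus, top), (_, runner_up) = counts.most_common(2)
    if top == runner_up:
        return []  # tie: no clear consensus, leave for a manual decision

    return [
        {
            "file_path": path,
            "consensus_mbid": consensus,
            "wrong_mbid": mbid,
            "consensus_count": top,
            "total_tracks_with_mbid": len(tagged),
        }
        for path, mbid in tagged.items()
        if mbid != consensus
    ]
```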
`core/repair_worker.py`:
- Added `'album_mbid_mismatch': self._fix_album_mbid_mismatch` to the
fix dispatch dict and to the `fixable_types` tuple so auto-fix +
bulk-fix paths pick it up.
- New `_fix_album_mbid_mismatch` method reads `consensus_mbid` from
finding details, resolves the dissenter's file path via the shared
library resolver, calls `_write_album_mbid_to_file` to rewrite the
tag in place. Doesn't touch the album's other tracks (they're
already in agreement).
Part 2 — Root cause fix (prevents new SoulSync imports from drifting):
The original in-memory `mb_release_cache` in `core/metadata/source.py`
maps `(normalized_album, artist) -> release_mbid` so per-track
enrichment of the same album hits the cache and writes the same
MUSICBRAINZ_ALBUMID to every track. That cache is bounded (4096
entries) and in-process — so cache eviction (when other albums are
processed in between) and server restart can BOTH cause
inconsistency. Per-track album-name variation (e.g. some tracks
tagged `"Album"`, others tagged `"Album (Deluxe)"`) and per-track
artist variation (features) make it worse.
`core/metadata/album_mbid_cache.py` (new module):
- DB-backed `lookup(normalized_album, artist) -> release_mbid` and
`record(...)` functions. Same key shape as the in-memory cache.
- Strict additive design: every public function is wrapped in
try/except and degrades to None / no-op on ANY database error.
The existing in-memory cache + MusicBrainz lookup remains the
authoritative fallback. If this module breaks, downloads continue
exactly as they would today.
`database/music_database.py`:
- New `mb_album_release_cache` table with composite primary key
`(normalized_album_key, artist_key)`. Reverse-lookup index on
`release_mbid` for future debug tooling. Created via the existing
`CREATE TABLE IF NOT EXISTS` migration pattern — idempotent, no
schema version bump needed.
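A self-contained sketch of the table plus the degrade-to-None accessors. Table and column names follow the description above; the index name, the explicit `conn` parameter, and the use of `INSERT OR REPLACE` for overwrite semantics are assumptions, and the real module may obtain its connection differently.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS mb_album_release_cache (
    normalized_album_key TEXT NOT NULL,
    artist_key TEXT NOT NULL,
    release_mbid TEXT NOT NULL,
    PRIMARY KEY (normalized_album_key, artist_key)
);
CREATE INDEX IF NOT EXISTS idx_mb_album_release_cache_mbid
    ON mb_album_release_cache (release_mbid);
"""


def init_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)  # idempotent: safe to run on every startup


def lookup(conn: sqlite3.Connection, normalized_album: str, artist: str) -> Optional[str]:
    if not normalized_album or not artist:
        return None
    try:
        row = conn.execute(
            "SELECT release_mbid FROM mb_album_release_cache "
            "WHERE normalized_album_key = ? AND artist_key = ?",
            (normalized_album, artist),
        ).fetchone()
        return row[0] if row else None
    except sqlite3.Error:
        return None  # any DB problem degrades to a cache miss


def record(conn: sqlite3.Connection, normalized_album: str, artist: str, release_mbid: str) -> None:
    if not (normalized_album and artist and release_mbid):
        return
    try:
        conn.execute(
            "INSERT OR REPLACE INTO mb_album_release_cache VALUES (?, ?, ?)",
            (normalized_album, artist, release_mbid),
        )
        conn.commit()
    except sqlite3.Error:
        pass  # no-op on failure; callers fall back to the existing lookup path
```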
`core/metadata/source.py`:
- Surgical change inside the existing `embed_source_ids`
in-memory-cache-miss branch: BEFORE calling MusicBrainz, consult
the persistent cache. If a previous SoulSync run already resolved
this album's release MBID, reuse it. After a successful MB lookup,
store in BOTH caches. Both calls wrapped in defensive try/except
so any failure falls through to existing logic.
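The resulting lookup order, sketched with injected callables. `resolve_release_mbid` and its parameters are hypothetical names, and warming the in-memory cache on a persistent hit is an assumption; the real change lives inline in `embed_source_ids`.

```python
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str]  # (normalized_album, artist)


def resolve_release_mbid(
    key: Key,
    memory_cache: Dict[Key, str],
    persistent_lookup: Callable[[str, str], Optional[str]],
    persistent_record: Callable[[str, str, str], None],
    mb_lookup: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    if key in memory_cache:
        return memory_cache[key]

    try:
        mbid = persistent_lookup(*key)  # consulted BEFORE MusicBrainz
    except Exception:
        mbid = None  # persistent layer is strictly best-effort

    if mbid is None:
        mbid = mb_lookup(*key)
        if mbid:
            try:
                persistent_record(*key, mbid)
            except Exception:
                pass  # a failed write must not break enrichment

    if mbid:
        memory_cache[key] = mbid
    return mbid
```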
Tests:
- `tests/metadata/test_album_mbid_cache.py` — 16 cache tests:
round-trip, idempotent re-record, overwrite semantics, clear_all,
album+artist independence (no Greatest Hits collisions),
defensive None-on-empty-input, graceful degradation when the DB
is unavailable / connection raises / commit fails, schema sanity
(table + index exist after init).
- `tests/test_album_mbid_consistency.py` — 13 detector tests:
tag read/write round-trip on real FLAC files, Picard-standard tag
descriptors, defensive paths (unreadable file, empty input),
detector behavior (agreement → no flags, lone dissenter → flag,
ties → no flag, single-track albums → skipped, no-MBID tracks →
skipped, unresolvable file paths → skipped).
- `tests/metadata/test_metadata_enrichment.py` — added autouse
fixture monkeypatching the persistent cache to no-op for tests in
this file. The existing tests pin per-call MB counts and
in-memory cache state; without the fixture, persistent rows from
earlier tests would bypass the MB call. Persistent layer has its
own dedicated tests.
Verified: 1782 tests pass (29 new), ruff clean, smoke test confirms
end-to-end cache round-trip works.
WHATS_NEW entry under '2.4.2' dev cycle.
- keep existing /api/image-proxy URLs from being wrapped again
- reuse the shared metadata package instead of duplicating URL logic in web_server.py
- add regression coverage for proxy passthrough and internal URL normalization
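A minimal sketch of the passthrough rule. The helper name and the `url` query parameter are assumptions for illustration, not the actual function in the shared metadata package.

```python
from typing import Optional
from urllib.parse import quote

PROXY_PATH = "/api/image-proxy"


def to_proxy_url(url: Optional[str]) -> Optional[str]:
    """Wrap an external image URL once; leave already-proxied URLs alone."""
    if not url:
        return url
    if url.startswith(PROXY_PATH):
        return url  # already proxied: do not wrap again
    return f"{PROXY_PATH}?url={quote(url, safe='')}"
```

Applying the helper twice yields the same result as applying it once, which is the property the regression tests are after.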
Followup to the enrichment-bubble registry consolidation. The
dashboard polling + click handlers all hit
/api/enrichment/<service>/{status,pause,resume} now, so the 30
hand-rolled per-service routes in web_server.py have zero callers
and can come out:
/api/musicbrainz/{status,pause,resume}
/api/audiodb/{status,pause,resume}
/api/discogs/{status,pause,resume}
/api/deezer/{status,pause,resume}
/api/spotify-enrichment/{status,pause,resume}
/api/itunes-enrichment/{status,pause,resume}
/api/lastfm-enrichment/{status,pause,resume}
/api/genius-enrichment/{status,pause,resume}
/api/tidal-enrichment/{status,pause,resume}
/api/qobuz-enrichment/{status,pause,resume}
Worker init blocks stay (they still construct the workers + persist
pause state). Section comment headers are preserved with a one-line
note pointing readers at the new generic blueprint.
Test fixtures in tests/conftest.py and
tests/metadata/test_enrichment_events.py also updated to use the
new URL paths so they reflect production reality. They were
synthetic stubs that never depended on the production routes —
purely cosmetic alignment.
Net: ~510 lines deleted from web_server.py. Full pytest 1541
passed; ruff clean.
- Flatten the Spotify service-status rendering so it shows rate-limit and recovery states explicitly, while otherwise displaying the active metadata provider directly.
- Keep the Spotify auth controls and metadata-source picker aligned with the real session state after authenticate and disconnect flows.
- Return "Unmapped" for unknown metadata source labels instead of implying iTunes.
- Update the metadata registry tests to cover the new label fallback.
- Send Spotify auth completion back to the opener so the settings page refreshes immediately
- Make the local auth flow go straight through to Spotify instead of showing the temporary instruction page
- Keep the remote/docker instruction page available for manual callback setups
- Sync Spotify status, connect/disconnect buttons, and metadata source selection after auth and disconnect
- Keep the disconnect behavior aligned with the active primary metadata source
- Hide the auth button when a Spotify session is active
- Treat disconnect as a session change, not a provider swap
- Share metadata source labels in the registry
- Tighten rate-limit copy around Spotify-specific behavior
- let core.metadata.registry own per-profile Spotify client caching
- register the DB-backed profile credentials provider from web_server.py
- invalidate only the affected profile cache entry on save, delete, and auth
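One way to picture the per-profile cache, as a sketch only. The class name, constructor arguments, and locking are assumptions; the real registry API in `core.metadata.registry` may look different.

```python
import threading
from typing import Any, Callable, Dict


class ProfileClientCache:
    """Cache one client per profile; invalidation touches a single entry."""

    def __init__(
        self,
        credentials_provider: Callable[[int], Any],  # e.g. a DB-backed lookup
        client_factory: Callable[[Any], Any],
    ) -> None:
        self._credentials_provider = credentials_provider
        self._client_factory = client_factory
        self._clients: Dict[int, Any] = {}
        self._lock = threading.Lock()

    def get(self, profile_id: int) -> Any:
        with self._lock:
            if profile_id not in self._clients:
                creds = self._credentials_provider(profile_id)
                self._clients[profile_id] = self._client_factory(creds)
            return self._clients[profile_id]

    def invalidate(self, profile_id: int) -> None:
        # Called on save, delete, and auth for that profile only.
        with self._lock:
            self._clients.pop(profile_id, None)
```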
- split metadata lookup logic into core/metadata/*
- keep core/metadata_service.py as the legacy barrel
- update tests and artist-detail code to patch concrete modules
User report: SoulSync was only pulling MusicBrainz genres from the
recording (track-level) endpoint. Most MB recordings don't carry genres
at the track level — they live on the release (album) or artist. So
the MB tier was contributing nothing to the genre merge for the
overwhelming majority of tracks.
Fix:
- Added `'genres'` to the release-detail `includes` (was missing).
- After release-detail processing, if pp['mb_genres'] is still empty,
populate from release_detail['genres'] (sorted by count desc).
- If still empty AND artist_mbid is set, fetch artist with
`includes=['genres']` and use those.
No extra API call when the recording (or release) already had genres —
the artist fetch only fires when both upstream tiers came back empty.
The downstream genre merge in _embed_metadata_genres is unchanged; this
just makes the MB feed into it richer.
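The tiering can be sketched as a small pure function. `resolve_mb_genres` and the injected `fetch_artist` callable are hypothetical names; the genre entries are assumed to be dicts with `name` and `count`, as in MusicBrainz JSON responses.

```python
from typing import Any, Callable, Dict, List, Optional


def resolve_mb_genres(
    recording: Optional[Dict[str, Any]],
    release_detail: Optional[Dict[str, Any]],
    artist_mbid: Optional[str],
    fetch_artist: Callable[[str], Dict[str, Any]],
) -> List[str]:
    """Recording genres first, then release, then artist (fetched lazily)."""

    def names(entity: Optional[Dict[str, Any]]) -> List[str]:
        genres = (entity or {}).get("genres") or []
        ordered = sorted(genres, key=lambda g: g.get("count", 0), reverse=True)
        return [g["name"] for g in ordered if g.get("name")]

    genres = names(recording) or names(release_detail)
    if not genres and artist_mbid:
        # Only reached when both upstream tiers were empty: the one extra call.
        genres = names(fetch_artist(artist_mbid))
    return genres
```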
Tests: 4 new (recording present, recording empty → release, recording
and release empty → artist, all empty → []). Full suite 873 passing.
Ruff clean.
Reported by @kcaoyef421 in Discord.
- Normalize album import track display handling so queue labels and match rows stay consistent
- Bound MusicBrainz caches and avoid caching transient lookup failures
- Stop swallowing programmer errors in source enrichment helpers
- Restore import config test seams without reintroducing lazy imports
- Guard task completion calls and fix the Windows path test expectation
- Keep file lock tracking from growing without bound
- Cover search_result fallback normalization and ambiguous album detection.
- Add staging metadata, multi-disc path, and MusicBrainz enrichment cases.
- Move the single-track context test next to the imports code it exercises.
- keep single-track import lookup in imports/resolution.py
- normalize simple-download search_result data before wishlist matching
- run wishlist cleanup for simple-download post-processing
- keep source-only artist detail on resolved names and MB short-circuit
- Move the import pipeline runtime factory into core.imports.pipeline
- Move the metadata runtime factory into core.metadata.enrichment
- Keep the web server wiring thin and drop the shared glue module
- Add contract tests that keep the two runtime bundles separate
- Move the metadata and MusicBrainz-related tests into a dedicated tests/metadata subfolder.
- Keep the rest of the suite flat for now.
- Preserve the existing test filenames so the change stays organizational rather than behavioral.