Register MusicBrainz as a first-class metadata source alongside Deezer, iTunes, Spotify, Discogs, and Hydrabase. Expose the shared client through metadata services, add the settings option, and expand the MusicBrainz search adapter with source-compatible artist, album, track, and detail methods.
Carry MusicBrainz IDs through similar-artist discovery, recommended artists, artist map serialization, and personalized playlist selection. Update DB migrations and lookup filters so similar_artist_musicbrainz_id is preserved on older schemas and used for source requirements and library exclusion.
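The function below is a minimal sketch of a migration that keeps older schemas working. It assumes SQLite and a `similar_artists` table. The `ensure_column` helper is hypothetical and is not the project's migration code:

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, decl: str) -> bool:
    """Add `column` to `table` if it is missing. Returns True when a column was added.

    `table` and `column` are interpolated into SQL, so they must be trusted
    identifiers written in code, never user input.
    """
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # An "older" schema that predates the MusicBrainz column.
    conn.execute("CREATE TABLE similar_artists (id INTEGER PRIMARY KEY, similar_artist_name TEXT)")
    print(ensure_column(conn, "similar_artists", "similar_artist_musicbrainz_id", "TEXT"))
    print(ensure_column(conn, "similar_artists", "similar_artist_musicbrainz_id", "TEXT"))
```

Running it twice is safe. The first call adds the column and the second is a no-op, so the migration can run on every startup.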
Normalize MusicBrainz album adapter output for import context and add regression coverage for registry mapping, typed album conversion, and similar-artist filtering. Verified by user with 120 focused tests passing.
Four selection-quality fixes for the SoulSync-made Discover playlists.
None of them changes a public method signature; each one tightens
behavior that is already there.
(1) Diversity for Hidden Gems + Discovery Shuffle
Both used to be `RANDOM() LIMIT N` with no diversity control, so a
skewed discovery pool could return 50 tracks from one artist or 20
from one album. Both now over-fetch 3x and run the results through
the existing `_apply_diversity_filter`:
- Hidden Gems: max 2 per album, 3 per artist
- Discovery Shuffle: max 2 per album, 2 per artist (tighter — shuffle
should feel maximally varied)
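The over-fetch-then-cap step can be sketched as follows. The sketch assumes track dicts keyed by `artist_name` and `album_name`. `apply_diversity_filter` is an illustrative stand-in for `_apply_diversity_filter`, not its actual body:

```python
from collections import Counter

OVERFETCH = 3  # over-fetch factor used for Hidden Gems / Discovery Shuffle


def apply_diversity_filter(tracks, *, max_per_album, max_per_artist, limit):
    """Walk tracks in order, skip any that would exceed a per-album or
    per-artist cap, and stop once `limit` tracks have been kept."""
    per_album = Counter()
    per_artist = Counter()
    kept = []
    for track in tracks:
        artist = track["artist_name"]
        # Key albums by (artist, title) so same-titled albums by different artists stay separate.
        album = (artist, track["album_name"])
        if per_album[album] >= max_per_album or per_artist[artist] >= max_per_artist:
            continue
        per_album[album] += 1
        per_artist[artist] += 1
        kept.append(track)
        if len(kept) >= limit:
            break
    return kept
```

A caller would request `limit * OVERFETCH` rows from SQL and then call, for example, `apply_diversity_filter(rows, max_per_album=2, max_per_artist=3, limit=limit)` for Hidden Gems. Discovery Shuffle would pass `max_per_artist=2` instead. Over-fetching gives the caps room to discard rows and still reach `limit`.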
(2) Source-aware popularity thresholds
The `popularity >= 60` threshold for "Popular Picks" and
`popularity < 40` for "Hidden Gems" assumed Spotify's 0-100 scale.
Deezer writes its `rank` value into that column (often six-digit
integers), and iTunes writes nothing meaningful. For Deezer-primary users:
- Popular Picks pulled essentially everything (rank >= 60 = all)
- Hidden Gems pulled essentially nothing (rank < 40 = none)
A new `_get_popularity_thresholds(source)` helper returns per-source
(Popular Picks minimum, Hidden Gems maximum) pairs:
- Spotify: (60, 40) — the existing 0-100 scale
- Deezer: (500_000, 100_000) — ballpark from real rank values
- iTunes / unknown: (None, None) — skip the popularity filter
entirely, fall back to random + diversity
`get_popular_picks` and `get_hidden_gems` now consult the helper.
When a threshold is None, they skip the popularity SQL filter; the
diversity filter and the ID gate still apply.
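Here is a Python sketch of the threshold table and the skip-when-None rule. The values come from the list above. `popularity_clause` is a hypothetical glue function, and the real methods may assemble their SQL differently:

```python
def get_popularity_thresholds(source):
    """Return (popular_min, hidden_max) for a metadata source; None = skip filter."""
    thresholds = {
        "spotify": (60, 40),           # Spotify popularity, 0-100 scale
        "deezer": (500_000, 100_000),  # Deezer rank values stored in the same column
    }
    return thresholds.get(source, (None, None))  # iTunes / unknown: no usable signal


def popularity_clause(source, *, popular):
    """SQL fragment + params for Popular Picks (popular=True) or Hidden Gems.

    Returns ("", []) when the threshold is None so the caller just omits it.
    """
    popular_min, hidden_max = get_popularity_thresholds(source)
    if popular:
        return ("", []) if popular_min is None else (" AND popularity >= ?", [popular_min])
    return ("", []) if hidden_max is None else (" AND popularity < ?", [hidden_max])
```

Passing the threshold as a bound parameter keeps the SQL text identical across sources. Only the number changes.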
(3) Push genre keyword filter into SQL
`get_genre_playlist` used to fetch `limit=1_000_000` rows into Python
and then run a substring keyword filter on `artist_genres`, which
scales badly on big discovery pools.
Now the keyword OR chain is generated as SQL placeholders:
AND (artist_genres LIKE ? OR artist_genres LIKE ? OR ...)
Each placeholder gets `f'%{keyword.lower()}%'` via `extra_params`.
`fetch_limit` drops back to `limit * 10`. The `_genre_matches` Python
helper is deleted (its only caller was in this file; verified via grep).
Parent-genre expansion via `GENRE_MAPPING` is preserved: the expanded
keyword list feeds the LIKE chain unchanged.
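The sketch below generates the LIKE chain and checks it against an in-memory SQLite table. It assumes `artist_genres` is stored as a JSON-like string, and `genre_like_clause` is a hypothetical name. Two caveats apply:

- A `%` or `_` inside a keyword would act as a wildcard.
- SQLite's LIKE is already case-insensitive for ASCII by default, so the lowercasing is a belt-and-braces measure there.

```python
import sqlite3


def genre_like_clause(keywords):
    """Build ' AND (artist_genres LIKE ? OR ...)' plus one %keyword% param each.

    An empty keyword list yields ("", []) so the caller never emits 'AND ()'.
    """
    if not keywords:
        return "", []
    chain = " OR ".join("artist_genres LIKE ?" for _ in keywords)
    return f" AND ({chain})", [f"%{kw.lower()}%" for kw in keywords]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE discovery_pool (track_name TEXT, artist_genres TEXT)")
    conn.executemany("INSERT INTO discovery_pool VALUES (?, ?)", [
        ("a", '["indie rock", "shoegaze"]'),
        ("b", '["trap"]'),
        ("c", '["Post-Rock"]'),
    ])
    clause, params = genre_like_clause(["rock", "shoegaze"])  # e.g. parent + child keywords
    sql = "SELECT track_name FROM discovery_pool WHERE 1=1" + clause + " ORDER BY track_name"
    return [row[0] for row in conn.execute(sql, params)]
```

`demo()` returns `['a', 'c']`. The filtering happens inside SQLite, so the Python side only ever sees matching rows.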
(4) Filter out tracks already in library
Discovery pool can include tracks the user already owns. Hidden Gems
/ Shuffle / Popular Picks shouldn't surface those.
`_select_discovery_tracks` gained an `exclude_owned: bool = True`
parameter. When True, it adds a correlated NOT EXISTS subquery against
the `tracks` table covering all three source IDs:
    AND NOT EXISTS (
        SELECT 1 FROM tracks t WHERE
        (t.spotify_track_id IS NOT NULL AND t.spotify_track_id = discovery_pool.spotify_track_id)
        OR (t.itunes_track_id IS NOT NULL AND t.itunes_track_id = discovery_pool.itunes_track_id)
        OR (t.deezer_id IS NOT NULL AND t.deezer_id = discovery_pool.deezer_track_id)
    )
Note the column-name asymmetry: `tracks.deezer_id` vs
`discovery_pool.deezer_track_id`. An inline comment marks the trap. All
5 public discovery methods benefit automatically (the flag defaults to
True). Seasonal Playlist doesn't go through the helper, so it's
unaffected (its content is curated, and dropping owned tracks is the
wrong intent there).
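The following is a self-contained SQLite check of the subquery, including the Deezer column trap. The tables are cut down to the ID columns, plus a `track_name` label added for readability. `select_unowned` and `build_demo_db` are hypothetical names:

```python
import sqlite3

OWNED_FILTER = """
  AND NOT EXISTS (
    SELECT 1 FROM tracks t WHERE
         (t.spotify_track_id IS NOT NULL AND t.spotify_track_id = discovery_pool.spotify_track_id)
      OR (t.itunes_track_id IS NOT NULL AND t.itunes_track_id = discovery_pool.itunes_track_id)
      -- Trap: the library column is deezer_id, the pool column is deezer_track_id.
      OR (t.deezer_id IS NOT NULL AND t.deezer_id = discovery_pool.deezer_track_id)
  )
"""


def select_unowned(conn, exclude_owned=True):
    sql = "SELECT track_name FROM discovery_pool WHERE 1=1"
    if exclude_owned:
        sql += OWNED_FILTER
    sql += " ORDER BY track_name"
    return [row[0] for row in conn.execute(sql)]


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tracks (id INTEGER PRIMARY KEY, spotify_track_id TEXT,
                             itunes_track_id TEXT, deezer_id TEXT);
        CREATE TABLE discovery_pool (track_name TEXT, spotify_track_id TEXT,
                                     itunes_track_id TEXT, deezer_track_id TEXT);
        INSERT INTO tracks (spotify_track_id, itunes_track_id, deezer_id) VALUES (NULL, NULL, 'dz1');
        INSERT INTO tracks (spotify_track_id, itunes_track_id, deezer_id) VALUES ('sp1', NULL, NULL);
        INSERT INTO discovery_pool VALUES ('owned on deezer', NULL, NULL, 'dz1');
        INSERT INTO discovery_pool VALUES ('owned on spotify', 'sp1', NULL, NULL);
        INSERT INTO discovery_pool VALUES ('not owned', NULL, NULL, 'dz2');
    """)
    return conn
```

With the filter on, only `'not owned'` survives. Passing `exclude_owned=False` returns all three rows.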
Tests
12 new tests in `tests/test_personalized_playlists_id_gate.py` (27
total in the file):
- Hidden Gems + Discovery Shuffle apply diversity (cap proven by
inserting 10 same-artist + same-album rows and asserting return
count ≤ per-album cap)
- Popularity thresholds: Spotify (60, 40), Deezer larger scale,
iTunes None / None
- Popular Picks skips threshold filter when None
- Genre playlist pushes filter to SQL (parent + child genre expansion)
- Owned-track exclusion: filtered when match, kept when no match,
opt-out flag works
- Deezer column-name asymmetry pinned (regression footgun)
The test fixture re-adds a minimal `tracks` table (4 columns: id,
spotify_track_id, itunes_track_id, deezer_id), only what the new
NOT EXISTS subquery needs to join, plus an `insert_library_track`
helper.
Verification
- 27/27 in this test file pass (15 prior + 12 new)
- 2232/2232 full suite green
- ruff clean
LOC delta:
- core/personalized_playlists.py: 1030 → 1101 (+71)
- tests/test_personalized_playlists_id_gate.py: 352 → 616 (+264)
The original gate baked into `_select_discovery_tracks` only checked
Spotify + iTunes:
AND (spotify_track_id IS NOT NULL OR itunes_track_id IS NOT NULL)
For Deezer-primary users, discovery_pool rows have a populated
`deezer_track_id` but NULL Spotify and iTunes IDs. The gate filtered
out every row, so Time Machine, Genre Browser, Hidden Gems, Discovery
Shuffle, and Popular Picks all rendered "no tracks found" on every tab
of every Deezer-primary install.
Extended the gate to include `deezer_track_id` and added that column
to the standard SELECT column tuple. `_build_track_dict` already
exposed `deezer_track_id` in its output shape, so frontend rendering
needed no changes.
Regression pinned via new test
`test_discovery_helper_accepts_deezer_only_id_rows` — inserts a row
with NULL Spotify + NULL iTunes but a populated `deezer_track_id`
and asserts it survives the gate.
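The before/after behavior can be reproduced in a few lines of SQLite. This mirrors the scenario of `test_discovery_helper_accepts_deezer_only_id_rows` but is not that test. The table is reduced to the three ID columns:

```python
import sqlite3

# Old gate: a Deezer-only row fails it.
OLD_GATE = "(spotify_track_id IS NOT NULL OR itunes_track_id IS NOT NULL)"
# Extended gate: any one of the three source IDs is enough.
NEW_GATE = ("(spotify_track_id IS NOT NULL OR itunes_track_id IS NOT NULL"
            " OR deezer_track_id IS NOT NULL)")


def count_passing(conn, gate):
    return conn.execute(f"SELECT COUNT(*) FROM discovery_pool WHERE {gate}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE discovery_pool "
             "(spotify_track_id TEXT, itunes_track_id TEXT, deezer_track_id TEXT)")
conn.execute("INSERT INTO discovery_pool VALUES (NULL, NULL, '12345')")  # Deezer-only row
print(count_passing(conn, OLD_GATE), count_passing(conn, NEW_GATE))  # prints: 0 1
```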
2220/2220 full suite green.
Owner decision: not worth shipping. The four library-driven personalized
sections were stubbed returning [] for ages because their schema
prereqs didn't exist; the prior commit re-enabled them by routing
through a new `_select_library_tracks` helper. Owner reviewed and chose
to delete the sections entirely instead.
Removed everywhere:
- `core/personalized_playlists.py` — `get_recently_added`,
`get_top_tracks`, `get_forgotten_favorites`, `get_familiar_favorites`
+ the `_select_library_tracks` helper (no other callers; verified
via grep).
- `web_server.py` — 4 route handlers
(`/api/discover/personalized/recently-added`, `top-tracks`,
`forgotten-favorites`, `familiar-favorites`).
- `webui/index.html` — 4 `<div class="discover-section">` blocks
(`#personalized-recently-added`, `#personalized-top-tracks`,
`#personalized-forgotten-favorites`,
`#personalized-familiar-favorites`).
- `webui/static/discover.js` — 4 load functions
(`loadPersonalizedRecentlyAdded`, `loadPersonalizedTopTracks`,
`loadPersonalizedForgottenFavorites`, `loadFamiliarFavorites`),
plus their entries in `loadDiscoverPage`'s Promise.all, plus
4 module-level state vars + 6 dead branches across
`openDownloadModalForDiscoverPlaylist` / `startDiscoverPlaylistSync`
and the sync-progress / rehydrate dispatchers.
- `webui/static/helper.js` — 4 tooltip / docs entries.
- `webui/static/sync-spotify.js` — 1 stale rehydrate dispatcher
branch (`discover_familiar_favorites`) caught during the global
grep pass.
- `tests/test_personalized_playlists_id_gate.py` — 3 library-method
tests + the test infrastructure that supported them
(`tracks` schema, `insert_library_track` helper). Documentation
header updated to reflect the deletion.
Net: -527 / +2 lines across 7 files.
What stays:
- Daily Mixes (also in personalized package, intentionally paused —
separate decision).
- Popular Picks + Hidden Gems + Discovery Shuffle (alive, not
affected by this deletion).
- All 14 tests in the personalized-playlists test file still pass.
- The PersonalizedPlaylistsService lift from the prior commit
(`_select_discovery_tracks` etc.); those helpers are still in active use
by the surviving discovery_pool methods.
DISCOVER_TRACK_SELECTION_REVIEW.md at repo root contains historical
references to the four deleted endpoints. Treated as historical
context (same policy as WHATS_NEW), left alone.
2219/2219 full suite green (was 2222 - 3 deleted tests = 2219).
JS parses clean, ruff clean.
User-facing bug found in the discover-page audit: multiple sections
(hidden gems, discovery shuffle, popular picks, decade browser,
genre browser) had no `WHERE (spotify_track_id IS NOT NULL OR
itunes_track_id IS NOT NULL ...)` gate. Tracks with no source IDs
in the discovery pool were displayed; when the user clicked download,
the download failed silently because there was nothing to look up.
Lift + gate
`PersonalizedPlaylistsService` had 5 selection methods that all shared
the same shape — connect to DB, run a SELECT against `discovery_pool`
with different WHERE clauses, optionally apply diversity, return
list of track dicts. ~366 lines of business logic, ~55% of which was
repeated boilerplate.
Three new private helpers consolidate everything:
- `_select_discovery_tracks(*, source, extra_where, extra_params,
order_by, fetch_limit, extra_columns)` — shared SELECT against
`discovery_pool`. The mandatory ID gate is hard-coded into the
WHERE clause: no opt-out flag, every method inherits it for free.
Plus the source filter and the blacklist filter — same shape every
selector needs.
- `_apply_diversity_filter(tracks, *, max_per_album, max_per_artist,
limit)` — per-album / per-artist cap loop, returns trimmed list.
Lifted from the inline duplicates in decade / genre / popular_picks.
- `_compute_adaptive_diversity_limits(tracks, *, relaxed=False)` —
step-function tiers based on unique-artist count. `relaxed=True`
gives the slightly looser limits the genre playlist used vs the
decade playlist.
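Below is a rough sketch of the shared selector's query assembly. It assumes SQLite, a `source` column on `discovery_pool`, and an `artist_name` column on both `discovery_pool` and `discovery_artist_blacklist`; all of these are assumptions. The gate matches the Spotify + iTunes form introduced in this commit. The Deezer fix elsewhere in this log adds `deezer_track_id` to it:

```python
import sqlite3

# Assumed column set; the real SELECT tuple may contain more fields.
BASE_COLUMNS = ("track_name", "artist_name", "album_name",
                "spotify_track_id", "itunes_track_id")

# Mandatory ID gate: hard-coded, so no caller can opt out of it.
ID_GATE = "(spotify_track_id IS NOT NULL OR itunes_track_id IS NOT NULL)"

# Name-based blacklist, compared case-insensitively.
BLACKLIST_FILTER = (
    "NOT EXISTS (SELECT 1 FROM discovery_artist_blacklist b"
    " WHERE b.artist_name = discovery_pool.artist_name COLLATE NOCASE)"
)


def select_discovery_tracks(conn, *, source, extra_where="", extra_params=(),
                            order_by="RANDOM()", fetch_limit=100, extra_columns=()):
    """Shared SELECT against discovery_pool; returns a list of row dicts.

    extra_where and order_by are trusted SQL fragments written in code;
    any user-supplied values must go through extra_params.
    """
    columns = ", ".join(BASE_COLUMNS + tuple(extra_columns))
    sql = (
        f"SELECT {columns} FROM discovery_pool"
        f" WHERE source = ? AND {ID_GATE} AND {BLACKLIST_FILTER} {extra_where}"
        f" ORDER BY {order_by} LIMIT ?"
    )
    cur = conn.execute(sql, [source, *extra_params, fetch_limit])
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]
```

Because the gate and the blacklist live in the one shared string, each selector only contributes its own `extra_where`, `order_by`, and `fetch_limit`.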
Re-enable 4 library methods
`get_recently_added`, `get_top_tracks`, `get_forgotten_favorites`,
`get_familiar_favorites` were all stubs (`return []`) because they
predated the schema columns they need. Schema now has them:
`tracks.created_at`, `tracks.play_count`, `tracks.last_played`, and
the source ID columns added in earlier work.
New `_select_library_tracks(*, where_clause, params, order_by, limit)`
helper mirrors the discovery selector but targets the `tracks` table
joined against `albums` + `artists`. Mandatory ID gate lives in the
helper too: every library method automatically rejects rows where
spotify_track_id, itunes_track_id, deezer_id,
musicbrainz_recording_id, AND audiodb_id are all NULL.
Selection rules:
- `get_recently_added` — ORDER BY created_at DESC
- `get_top_tracks` — WHERE play_count > 0 ORDER BY play_count DESC
- `get_forgotten_favorites` — WHERE play_count > 5 AND last_played
< (now - 90 days) ORDER BY play_count DESC
- `get_familiar_favorites` — WHERE play_count BETWEEN 3 AND 15
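The only rule with a computed parameter is the forgotten-favorites cutoff. The sketch below assumes `last_played` is stored as an ISO-8601 UTC string in the same format `isoformat()` produces, so that string comparison orders correctly. `forgotten_favorites_filter` is a hypothetical name:

```python
from datetime import datetime, timedelta, timezone


def forgotten_favorites_filter(now=None, *, min_plays=5, days=90):
    """WHERE fragment + params for: play_count > 5 AND last_played < (now - 90 days)."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).isoformat()
    return "play_count > ? AND last_played < ?", [min_plays, cutoff]
```

The caller would append `ORDER BY play_count DESC`. Passing `now` explicitly keeps the rule testable without freezing the clock.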
Tests
`tests/test_personalized_playlists_id_gate.py` — 17 tests pinning:
- `_select_discovery_tracks` filters NULL-id rows, honors source +
blacklist + extra_where
- `_apply_diversity_filter` caps per-album + per-artist + stops at
limit
- `_compute_adaptive_diversity_limits` returns the right tier for
unique-artist count + relaxed flag
- All 5 discovery methods (decade, popular_picks, hidden_gems,
discovery_shuffle, genre is exercised via the helper) reject
NULL-id rows
- All 4 library methods reject NULL-id rows + honor their
play-count rules
Behavior preserved
Same diversity tiers, same over-fetch multipliers (10x for decade /
genre, 3x for popular_picks), same `popularity DESC, RANDOM()`
ordering, same `popularity >= 60` / `< 40` thresholds, same
blacklist filter. Public method signatures unchanged — `web_server.py`
needs zero edits.
Net file: 1089 → ~1170 LOC (helpers + docstrings), but actual
business logic across the 9 methods went from ~418 lines down to
~195 (-53%).
2222/2222 full suite green (was 2205 + 17 new). Ruff clean.
All callers of _create_fallback_client() and _get_configured_fallback_source()
now use get_primary_client() and get_primary_source() directly. No more
legacy alias usage anywhere in the codebase.
All metadata source decisions now flow through get_primary_source() and
get_primary_client() in core/metadata_service.py. Previously 6 different
files reimplemented this logic with inconsistent defaults ('itunes' vs
'deezer') and auth checks, causing bugs when any one was missed.
Changes:
- metadata_service.py: Added canonical get_primary_source/get_primary_client
- web_server.py: _get_metadata_fallback_source() and _get_active_discovery_source()
are now thin wrappers delegating to metadata_service
- seasonal_discovery.py: _get_source() delegates to metadata_service
- personalized_playlists.py: _get_active_source() delegates to metadata_service
- spotify_client.py: Fixed _fallback_source default from 'itunes' to 'deezer'
- watchlist_scanner.py: _get_fallback_metadata_client() delegates to metadata_service
Future changes to source selection only need to update one file.
Seasonal discovery, personalized playlists, and playlist explorer all
defaulted to Spotify when authenticated, ignoring the user's configured
primary source. Now they read from config first.
Spotify's related_artists API (no Deezer/iTunes equivalent) is preserved
as a fallback for all users in personalized playlists. Artist discography
endpoint intentionally unchanged — ID-based lookups need the source that
owns the ID.
- New discovery_artist_blacklist table with NOCASE name matching
- Filter blacklisted artists from all 6 discovery pool queries, hero
endpoint, and recent releases via SQL subquery and Python set check
- Name-based filtering means one block covers all sources (Spotify/iTunes/Deezer)
- Hover any discovery track row → ✕ button to quick-block that artist
- 🚫 button on Discover hero opens management modal with search-to-add
(powered by enhanced search) and list of blocked artists with unblock
- CRUD API: GET/POST/DELETE /api/discover/artist-blacklist
- Updated changelogs
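The sketch below covers the NOCASE table, the SQL-side filter, and the Python set check. It assumes SQLite and an `artist_name` column; the column and function names are hypothetical. The comparison carries an explicit `COLLATE NOCASE` so that case-insensitivity does not depend on operand order. Without an explicit collation, SQLite uses the left operand's column collation, and `discovery_pool.artist_name` is assumed to be a plain (BINARY) column:

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE discovery_artist_blacklist (
            id INTEGER PRIMARY KEY,
            artist_name TEXT NOT NULL COLLATE NOCASE UNIQUE  -- 'Radiohead' == 'radiohead'
        );
        CREATE TABLE discovery_pool (track_name TEXT, artist_name TEXT);
    """)
    return conn


def block_artist(conn, name):
    # UNIQUE under NOCASE makes a re-block with different casing a no-op.
    conn.execute("INSERT OR IGNORE INTO discovery_artist_blacklist (artist_name) VALUES (?)", (name,))


def unblocked_pool_tracks(conn):
    """SQL-side filter, as used by the discovery pool queries."""
    rows = conn.execute("""
        SELECT track_name FROM discovery_pool dp
        WHERE NOT EXISTS (SELECT 1 FROM discovery_artist_blacklist b
                          WHERE b.artist_name = dp.artist_name COLLATE NOCASE)
        ORDER BY track_name""")
    return [r[0] for r in rows]


def filter_api_results(conn, items):
    """Python-side check for results assembled outside SQL.

    casefold() also folds non-ASCII letters, which SQLite's NOCASE does not;
    the two checks agree for ASCII names.
    """
    blocked = {r[0].casefold() for r in conn.execute("SELECT artist_name FROM discovery_artist_blacklist")}
    return [it for it in items if it["artist_name"].casefold() not in blocked]
```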
Genre explorer and deep dive modal now combine data from all available
metadata sources (iTunes + Deezer always, Spotify when authenticated).
Artists are deduplicated by name across sources, preferring entries
with images. Source dots (green/red/purple) indicate data origin.
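The name-based merge can be sketched as follows. It assumes each source yields dicts with `name`, `source`, and `image_url` keys, which is a hypothetical shape. The first entry per name wins unless a later one brings an image that the current one lacks:

```python
def merge_artists(*source_lists):
    """Deduplicate artists by case-insensitive name across sources.

    Each kept entry retains its 'source' so the UI can draw the origin dot.
    Dict insertion order keeps the first-seen position even when replaced.
    """
    merged = {}
    for artists in source_lists:
        for artist in artists:
            key = artist["name"].strip().casefold()
            current = merged.get(key)
            if current is None or (not current.get("image_url") and artist.get("image_url")):
                merged[key] = artist
    return list(merged.values())
```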
Deezer genre support:
- Extract genre_id from Deezer album search responses via ID-to-name
mapping table (26 Deezer genre categories)
- Extract full genre names from Deezer get_album responses
- One-time backfill updates existing cached albums from stored raw_json
- Propagate album genres to Deezer artist entities
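The following sketches both extraction paths plus the raw_json backfill helper. The three genre IDs shown are an illustrative subset believed to match Deezer's public genre list; verify them before relying on them. The nested `genres.data` shape of full album responses is also assumed:

```python
import json

# Illustrative subset only; the real table maps 26 Deezer genre categories.
DEEZER_GENRE_NAMES = {132: "Pop", 152: "Rock", 116: "Rap/Hip Hop"}


def genres_from_search_item(item, id_to_name=DEEZER_GENRE_NAMES):
    """Album search items carry only a numeric genre_id; map it to a name."""
    name = id_to_name.get(item.get("genre_id"))
    return [name] if name else []


def genres_from_album_detail(album):
    """Full album responses are assumed to nest genres as genres.data[].name."""
    data = (album.get("genres") or {}).get("data") or []
    return [g["name"] for g in data if g.get("name")]


def genres_from_raw_json(raw_json):
    """Backfill helper: re-derive genres from a cached album's stored raw_json."""
    try:
        payload = json.loads(raw_json)
    except (TypeError, json.JSONDecodeError):
        return []
    if not isinstance(payload, dict):
        return []
    # Prefer the full genre names; fall back to the ID mapping.
    return genres_from_album_detail(payload) or genres_from_search_item(payload)
```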
Cross-source album routing:
- /api/discover/album endpoint uses source-specific client (iTunes or
Deezer) based on the item's source, not just the active fallback
- Spotify path falls back to active fallback when album not found
- Track clicks use album_id directly instead of name-based resolution
- resolve-cache-album adds partial match and live search fallback
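Below is a sketch of the routing rule with duck-typed clients. The `get_album` / `search_album` method names, the item keys, and retrying the fallback by name on a Spotify miss are all assumptions:

```python
def fetch_album(item, clients, active_fallback):
    """Route an album lookup to the client that owns the item's album ID."""
    source = item["source"]
    if source in ("itunes", "deezer"):
        # ID-based lookups must go to the source that issued the ID.
        return clients[source].get_album(item["album_id"])
    album = clients["spotify"].get_album(item["album_id"])
    if album is None:
        # Spotify miss: a Spotify ID means nothing to iTunes/Deezer, so this
        # sketch retries the active fallback by name instead (assumption).
        album = clients[active_fallback].search_album(item["album_name"], item["artist_name"])
    return album
```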
Other fixes:
- Genre explorer positioned at top of Discover page (below hero)
- Genre explorer results cached 24hr in-memory for fast reload
- Related genres computed from all albums by matched artists
- Artist clicks open Artists page with discography (not library detail)
- Discovery pool genre queries restored to source-filtered (Browse by
Genre tabs stay source-isolated as designed)
The genre query filtered by active source (spotify/deezer/itunes)
but discovery pool entries keep their original source. Switching
metadata sources caused all genres to disappear. Removed the source
filter since artist genres are source-agnostic metadata.
- personalized_playlists._get_active_source() now returns 'deezer' when
configured instead of always falling back to 'itunes'
- Add deezer_track_id to _build_track_dict() for discovery pool tracks
- Include album_deezer_id and artist_deezer_id in get_discovery_recent_albums()
response — fixes "No deezer album ID available" error when clicking cards
- Skip Spotify library section entirely when Spotify is not authenticated
Users can now choose between iTunes/Apple Music and Deezer as their free
metadata source in Settings. Spotify always takes priority when authenticated;
the fallback handles all lookups when it's not.
Core changes:
- DeezerClient: full metadata interface (search, albums, artists, tracks)
matching iTunesClient's API surface with identical dataclass return types
- SpotifyClient: configurable _fallback property switches between iTunes/Deezer
based on live config reads (no restart needed)
- MetadataService, web_server, watchlist_scanner, api/search, repair_worker,
seasonal_discovery, personalized_playlists: all direct iTunesClient imports
replaced with fallback-aware helpers
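Here is a sketch of a fallback property that re-reads config on every access, so a Settings change takes effect without a restart. The `fallback_source` key and the factory dict are hypothetical. The 'itunes' default reflects this commit; a later change in this log switches it to 'deezer':

```python
class FallbackAwareClient:
    """Sketch: the fallback client is chosen from live config on every access."""

    def __init__(self, read_config, factories, default="itunes"):
        self._read_config = read_config  # callable returning the current config dict
        self._factories = factories      # e.g. {"itunes": ITunesCtor, "deezer": DeezerCtor}
        self._default = default
        self._cache = {}                 # one instance per source, built lazily

    @property
    def _fallback(self):
        source = self._read_config().get("fallback_source", self._default)
        if source not in self._factories:
            source = self._default
        if source not in self._cache:
            self._cache[source] = self._factories[source]()
        return self._cache[source]
```

Caching one instance per source keeps switching cheap. Only the config read happens on every access.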
Database:
- deezer_artist_id on watchlist_artists and similar_artists tables
- deezer_track_id/album_id/artist_id on discovery_pool and discovery_cache
- Full CRUD for Deezer IDs: add, read, update, backfill, metadata enrichment
- Watchlist duplicate detection by artist name prevents re-adding across sources
- SimilarArtist dataclass and all query/insert methods handle Deezer columns
Bug fixes found during review:
- Similar artist backfill was writing Deezer IDs into iTunes columns
- Discover hero was storing resolved Deezer IDs in wrong column
- Status cache not invalidating on settings save (source name lag)
- Watchlist add allowing duplicates when switching metadata sources