- watchlist_scanner: fall back to album.image_url when album object has no
images list (affects MusicBrainz CAA URLs, iTunes, Deezer — all use
image_url on the Album dataclass, not the Spotify-style images array)
- Pulse Downloads nav icon while active downloads are in progress, same
pattern as watchlist scan animation
Add MusicBrainz watchlist artist ID storage, badges, linked-provider editing, and per-artist preferred source support.
Backfill watchlist MusicBrainz matches from already-enriched library artists so existing MusicBrainz worker matches appear in watchlist cards and settings.
Extend bulk watchlist add, liked artist matching, artist map source picking, and service status labels to recognize MusicBrainz, with regression tests for watchlist ID persistence and backfill.
Register MusicBrainz as a first-class metadata source alongside Deezer, iTunes, Spotify, Discogs, and Hydrabase. Expose the shared client through metadata services, add the settings option, and expand the MusicBrainz search adapter with source-compatible artist, album, track, and detail methods.
Carry MusicBrainz IDs through similar-artist discovery, recommended artists, artist map serialization, and personalized playlist selection. Update DB migrations and lookup filters so similar_artist_musicbrainz_id is preserved on older schemas and used for source requirements and library exclusion.
Normalize MusicBrainz album adapter output for import context and add regression coverage for registry mapping, typed album conversion, and similar-artist filtering. Verified by user with 120 focused tests passing.
Snapshots now track when their source data changes. Watchlist scan
emits stale flags on the playlists whose underlying pool just got
refreshed; the next pipeline run sees the flag and regenerates the
snapshot before syncing, so the server playlist never lags the source.
Schema:
- new `is_stale INTEGER NOT NULL DEFAULT 0` column on
`personalized_playlists`, plus an idempotent ADD COLUMN migration
in `ensure_personalized_schema` for installs created before this PR.
- `PlaylistRecord.is_stale: bool = False` exposed on the dataclass so
callers can branch on freshness without re-querying.
Manager:
- new `mark_kinds_stale(kinds, profile_id=None)` flips the flag in
bulk for a list of kinds (used by upstream data refreshers).
- `_persist_snapshot` clears `is_stale = 0` on successful refresh.
- SELECT statements + `_row_to_record` updated to read the column
(with tuple-form length guard for safety).
Pipeline:
- `_build_payloads_for_kinds` now branches: refresh_first=True OR
`existing.is_stale` -> refresh_playlist, else read existing
snapshot. So the auto-refresh kicks in without needing the user to
toggle the refresh-each-run option.
Watchlist scanner emits stale flags at three sites:
- after `update_discovery_pool_timestamp` -> marks pool-fed kinds
stale: hidden_gems, discovery_shuffle, popular_picks, time_machine,
genre_playlist, daily_mix.
- after release_radar `save_curated_playlist` -> marks `fresh_tape`.
- after discovery_weekly `save_curated_playlist` -> marks `archives`.
All three calls go through a module-level `_mark_personalized_kinds_stale`
helper that builds a PersonalizedPlaylistManager with `deps=None` (only
DB access is needed for the flag update — no generator dispatch). Each
call is wrapped in try/except so a flag failure can never abort the
scan itself.
Tests:
- new `TestStaleFlag` class in `test_personalized_manager.py` (6
tests): default-false, single-kind flip, multi-kind, profile
scoping, refresh-clears, empty-list noop.
- two new pipeline tests pin the auto-refresh dispatch:
`test_stale_snapshot_auto_refreshes_even_without_refresh_first`
and `test_non_stale_snapshot_skips_refresh`.
- existing stub-manager `SimpleNamespace` returns gained
`is_stale=False` so the new attribute read doesn't AttributeError.
Full suite: 3391 pass.
User-facing WHATS_NEW entry added under 2.5.2 (above the prior
pipeline auto-sync entry) describing the auto-refresh behavior.
Catches the silent excepts the awk-based earlier sweeps missed:
- Bare `except:` followed by `pass` (also swallows KeyboardInterrupt
and SystemExit — actively wrong). Upgraded to `except Exception as
e: logger.debug("...: %s", e)`. ~14 sites across connection_detect,
soulseek_client, listenbrainz_manager, watchlist_scanner,
youtube_client, navidrome_client, jellyfin_client, web_server.
- `except Exception:` + pass that the awk pattern missed (e.g.
multi-line or unusual whitespace). ~31 sites across automation_engine,
database_update_worker, music_database, spotify_client, web_server,
others.
- 14 legitimate cleanup sites left silent with explicit `# noqa: S110`
+ comment explaining why (atexit handlers, finally-block conn.close
calls). Logging during shutdown can itself crash because file handles
get torn down before the handler fires.
Also enables `S110` rule in pyproject.toml so this pattern fails CI
going forward — drift fails at PR review instead of at runtime against
a wedged worker thread. Tests path keeps S110 ignored (test fixtures
legitimately use try-except-pass for cleanup).
Adds a WHATS_NEW entry to helper.js summarizing the full #369 sweep.
Verified: `python -m ruff check .` → All checks passed.
Verified: `python -m pytest tests/` → 2188 passed.
Closes#369
Smoke-testing the just-merged provenance PR against live logs revealed
the new ID-match block was silently no-opping: no [ExtID Match] /
[Provenance Match] log lines despite the code path being live. Tracing
revealed two related gaps in extract_external_ids' source detection:
1. **Underscore-prefixed key.** Deezer / Discogs / Hydrabase clients
tag normalized track dicts with ``_source`` (underscore prefix —
convention used in 8+ places across core/). The extractor only
looked for ``provider`` and ``source``, so Deezer-sourced tracks
silently returned no IDs.
2. **No provider field at all.** Spotify and iTunes raw API responses
carry ``id`` but no provider/source key of any kind. The extractor
couldn't disambiguate the native ``id``, so Spotify-primary scans
would have hit the same silent miss once the user switched primary
sources.
Two-part fix:
- ``extract_external_ids`` now recognizes ``_source`` as another
candidate provider field.
- New optional ``source_hint`` parameter lets the caller supply the
configured primary source as a fallback when the track dict has no
provider field of its own. Track-side provider field still wins
when present (defensive against a wrong hint).
Watchlist scanner now passes ``get_primary_source()`` as the hint so
both naming conventions (Deezer-style _source, Spotify-style no-tag)
get handled uniformly.
6 new regression tests cover:
- _source recognized for Deezer
- _source recognized for Hydrabase (cross-provider mapping)
- _source recognized for Discogs (no library column — verifies
graceful no-crash)
- source_hint disambiguates raw tracks for spotify/itunes/deezer
- track-side provider takes precedence over hint
- None hint defaults safely
Full pytest 1630 passed; ruff clean. After this lands and the server
restarts, watchlist scans should produce [ExtID Match] /
[Provenance Match] log lines for tracks already on disk regardless of
which metadata source the user has configured as primary.
Followup to fix/watchlist-external-id-match. The companion PR closed
the demand side — the watchlist scanner asks for tracks by external IDs
before falling back to fuzzy. But for users on Plex / Jellyfin /
Navidrome the supply side was still broken: tracks.spotify_track_id
(and the other ID columns) only got populated by the asynchronous
enrichment workers, sometimes hours after the file was actually
written. During that window the ID match fell through to fuzzy and
the bug returned.
We were already collecting every ID during post-processing — they
live in the `pp` dict in core/metadata/source.py:embed_source_ids and
get embedded into file tags. We just dropped the in-memory copy
afterwards.
This PR persists them and uses them:
- Schema migration adds spotify_track_id / itunes_track_id /
deezer_track_id / tidal_track_id / qobuz_track_id /
musicbrainz_recording_id / audiodb_id / soul_id / isrc columns +
indexes to the existing track_downloads table (already keyed by
file_path).
- core/metadata/source.py:embed_source_ids exposes pp["id_tags"] and
the resolved ISRC back to the import context as _embedded_id_tags
/ _isrc.
- core/imports/side_effects.py:record_download_provenance reads those
context fields and passes them to db.record_track_download, which
now accepts the new ID kwargs and persists them.
- New db.get_provenance_by_file_path with exact + basename-suffix
fallback (handles container mount-root differences between
download-time path and media-server-reported path).
- New db.backfill_track_external_ids_from_provenance copies IDs
from track_downloads onto a tracks row idempotently — COALESCE on
every column preserves any value the enrichment worker already
wrote (enrichment is more authoritative for late binding).
- database/music_database.py:insert_or_update_media_track (the
single insertion point used by every Plex / Jellyfin / Navidrome
sync) calls the backfill immediately after each INSERT/UPDATE.
- New core/library/track_identity.py:find_provenance_by_external_id
used as a second-tier fallback in watchlist_scanner.is_track_missing
_from_library — catches the window between download and media-server
sync. Caller checks os.path.exists on the provenance file_path
before treating it as "already in library" so a deleted file
doesn't prevent re-download.
Effect: freshly downloaded files become ID-recognizable to the
watchlist on the very next scan, no enrichment-wait window.
19 regression tests in tests/test_provenance_id_persistence.py:
- Schema migration adds expected columns + indexes
- record_track_download persists every ID kwarg
- record_track_download backward-compat (old kwargs still work)
- get_provenance_by_file_path: exact match, basename fallback for
mount-root differences, multi-record latest-wins, defensive None
- backfill: copies all IDs, preserves existing via COALESCE,
no-op when no provenance exists
- find_provenance_by_external_id: per-ID lookup, ISRC cross-bridge,
OR semantics, latest-wins on multiple matches
Out of scope: backfilling provenance for files downloaded BEFORE
this PR (their track_downloads rows don't carry the new IDs). Those
continue to wait for enrichment. Acceptable — only affects historical
files; new downloads benefit immediately.
Full pytest 1625 passed; ruff clean.
Reported case (CAL): a track already on disk got re-downloaded by the
watchlist scanner on every scan. Library DB had stale album metadata
for the file (track tagged on album "Left Alone") while the metadata
source reported it on a different album ("NPC" single). The
title+artist+album fuzzy block correctly said the album names didn't
match and declared the track missing — but the file's stable external
IDs (Spotify ID, ISRC, etc.) unambiguously identified it as the same
recording.
The earlier compilation-album fix (PR #461) handled qualifier drift
("OST" vs "Music From The Motion Picture"). This case is two
genuinely different album names referring to the same song.
Fix: provider-neutral external-ID short-circuit before the fuzzy
block in `is_track_missing_from_library`. Pulls every recognized ID
off the source track (Spotify / iTunes / Deezer / Tidal / Qobuz /
MusicBrainz / AudioDB / Hydrabase / ISRC), runs a single SELECT
against the indexed external-ID columns on the `tracks` table, and
treats any hit as "track exists in library — don't re-download".
If no IDs are available (older imports without enrichment, library
scans that didn't populate external IDs), falls through to the
existing fuzzy logic so the safety net stays intact.
New `core/library/track_identity.py` module with two helpers:
- `extract_external_ids(track)`: handles dict and object-style track
shapes, direct-field aliases (spotify_id / spotify_track_id /
SPOTIFY_TRACK_ID), and provider-disambiguated native `id` fields
(when track has `provider='deezer'` and `id='X'`, treats X as a
Deezer ID).
- `find_library_track_by_external_id(db, external_ids,
server_source)`: builds an OR of indexed column matches with
IS NOT NULL guards, optional server_source filter that also
passes legacy NULL rows, single-row LIMIT.
ISRC bridges across providers — a library track imported via Deezer
can be matched against a Spotify scan when both sides carry the
same ISRC.
43 regression tests in `tests/test_library_track_identity.py`:
- 9 ID-extraction tests for direct fields (Spotify / iTunes / Deezer /
ISRC / MBID / AudioDB / Hydrabase)
- 8 ID-extraction tests via the provider field (8 providers + source
alias + missing-provider-ignored)
- 7 mixed/defensive tests (multiple IDs, object-style, empty strings,
None track, numeric coercion)
- 8 lookup tests (per-provider + ISRC cross-bridge)
- 3 OR-semantics tests
- 4 server_source filter tests
- 2 ID-column-map sanity tests
Full pytest 1606 passed; ruff clean.
Two related bugs reported on Discord by Mushy.
1. The watchlist re-downloaded the same OST track up to 7 times.
``is_track_missing_from_library`` compared Spotify's album name and
the media-server scan's album name with a raw SequenceMatcher at a
strict 0.85 threshold. Compilations and soundtracks routinely fail
this — Spotify reports
``"Napoleon Dynamite (Music From The Motion Picture)"`` while the
Plex / Navidrome / Jellyfin tag scan saves it as
``"Napoleon Dynamite OST"``. Raw similarity ≈ 0.49, so the scanner
declared the track missing on every 30-minute scan and added it back
to the wishlist. The wishlist then issued a fresh download. slskd
appended ``_<19-digit-ns-timestamp>`` to each new copy because the
target file already existed, and the user ended up with seven copies
of one song in one folder.
Fix: extract two pure helpers — ``_normalize_album_for_match``
strips qualifier parentheticals (Music From X, OST, Deluxe Edition,
Remastered, Anniversary, etc.) and trailing dash-clauses;
``_albums_likely_match`` checks equality after normalization,
substring containment, and a relaxed 0.6 fuzzy ratio. A volume /
part / disc / standalone-trailing-number guard rejects pairs like
``"Greatest Hits Vol. 1"`` vs ``"Greatest Hits Vol. 2"`` so the
relaxed threshold doesn't introduce false positives on serialized
releases. After this change the Napoleon Dynamite case collapses
to ``"napoleon dynamite" == "napoleon dynamite"`` via the equality
short-circuit and the redownload loop dies.
2. The duplicate detector found only one of the seven dupe files.
The detector buckets tracks by the first 4 chars of their normalized
tag title. Files written by slskd directly into a library folder
often get inconsistent (or blank) tags from the media-server rescan,
so the seven copies were bucketed apart by parsed title and never
compared.
Fix: refactor the per-bucket comparison into ``_scan_bucket``, then
add a second pass — ``_build_filename_buckets`` re-buckets leftover
tracks by canonical filename stem (slskd dedup tail stripped via
``_strip_slskd_dedup_suffix``, same regex the import-cleanup PR uses)
plus extension. Filename agreement is itself strong evidence the
files came from the same source download, so the second pass calls
``_scan_bucket`` with ``require_metadata_match=False`` to skip the
title / artist / cross-album gates. The same-physical-file guard
still runs so bind-mount duplicates aren't flagged.
72 new regression tests across two files cover the album-match
helpers (28 tests including the Napoleon Dynamite scenario, 7 volume
disagreements, 8 positive/negative pairs, 5 defensive cases) and the
new filename-bucket pass (16 tests across bucket construction, scan
integration, and existing title-pass behavior). Full pytest 1509
passed; ruff clean.
Reported by Mushy in Discord.
- Extract the import pipeline, album import, staging, path, file ops, guards, runtime state, side effects, and metadata enrichment out of .
- Canonicalize the refactored import path around and remove legacy , , , and request shapes from the import endpoints.
- Make album and track metadata lookups follow the configured provider priority instead of hard-coding Spotify, while still falling back when needed.
- Update the import routes and frontend payloads to use the new core helpers.
- Add coverage for the extracted helpers and the refactored import flows.
PS. apologies to anyone who might check this commit out - the intention was to start small, but things kinda snowballed out of control at some point since the logic just kept going on and on, and everything kinda had to be changed all at once for it all to make any sense
Two bugs reported in issue #320:
1. Auto-watchlist scan bypassed Global Override settings.
scan_watchlist_profile applied _apply_global_watchlist_overrides, but
the scheduled auto-scan called scan_watchlist_artists directly —
bypassing the override. Users who unchecked "Albums" or "Live" under
Watchlist → Global Override still saw full albums and live tracks
added during nightly scans (per-artist defaults, which include
everything, won).
Moved override application into scan_watchlist_artists itself so
every entry point respects it. scan_watchlist_profile now forwards
the apply_global_overrides flag through to avoid double-application.
2. is_live_version (watchlist + discography backfill) and
live_commentary_cleaner's content patterns used bare \blive\b, which
matched verb uses like "What We Live For" by American Authors,
"Live Forever" by Oasis, "Live and Let Die" by Wings.
Tightened the live patterns to require clear recording context:
(Live) / [Live Version] / - Live / Live at|from|in|on|version|
session|recording|performance|album|show|tour|concert|edit|cut|take
/ In Concert / On Stage / Unplugged / Concert.
Locked in 11 regression tests covering the reported false positives
(What We Live For, Live Forever, Living on a Prayer, Live and Let Die)
and the reported true positives (Dimension - Live at Big Day Out,
MTV Unplugged, etc.).
Version bumped to 2.37 with changelog entries.
PR #340 added ruff to the build-and-test.yml CI gate, which surfaced
286 pre-existing lint errors. Left unfixed, every feature branch push
fails CI. This commit resolves all of them so CI goes green and
contributors can actually land work.
Auto-fixes (248 of 286): removed unused f-string prefixes (F541),
renamed unused loop control variables with underscore prefix (B007),
removed duplicate imports (F811).
Manually fixed 10 latent bugs ruff caught (all wrapped in try/except
today, silently failing):
- music_database.py: _add_discovery_tables() called undefined
conn.commit() — would have crashed the iTunes-support migration
for existing databases. Now uses cursor.connection.commit().
- web_server.py settings GET: referenced undefined download_orchestrator
when it should be soulseek_client. Feature (_source_status on the
settings payload) was silently missing for UI auto-disable logic.
- web_server.py _process_wishlist_automatically: active_server
undefined in track-ownership check. Auto-wishlist was falling
through to the error handler and re-downloading owned tracks.
- web_server.py start_wishlist_missing_downloads: same active_server
bug in the manual wishlist path.
- web_server.py _process_failed_tracks_to_wishlist_exact: emitted
wishlist_item_added automation event with undefined artist_name
and track. Automation event silently never fired correctly.
- web_server.py discovery metadata enrichment: referenced cache
without calling get_metadata_cache() first. Track enrichment from
cached API responses was silently skipped.
- web_server.py Beatport discovery worker: wing-it fallback branch
used undefined successful_discoveries variable. Wing-it counter
never incremented correctly. Now uses state['spotify_matches']
consistently with the rest of the function.
- web_server.py _run_full_missing_tracks_process: stale import json
mid-function shadowed the module-level import, making an earlier
json.dumps() call reference an unbound local (F823).
- web_server.py discovery loop: platform loop variable shadowed
the module-level platform import (F402).
- core/watchlist_scanner.py: 7 lambda captures of loop variables
(B023 classic Python closure-in-loop bug) now bind at creation.
No existing tests had to change. Full suite stays at 263 passed.
Users can now override which metadata provider (Spotify, Deezer, Apple Music,
Discogs) is used when scanning a specific watchlist artist for new releases.
The selector appears in the artist config modal and only shows sources the
artist has enrichment IDs for. Default behavior is unchanged — all artists
use the global metadata source unless explicitly overridden.
Switch similar-artist backfill to the shared provider-priority flow instead of assuming iTunes as the fallback.
Reuse the generic metadata search helpers, keep a compatibility alias for the old helper name, and update the scanner tests to cover the new path.
Add a regression test that verifies backfill walks each available fallback provider and persists the resolved IDs per source.
Shift similar-artist lookup to the shared metadata provider priority flow.
Use generic provider clients for search and metadata extraction instead of
branching on Spotify/iTunes-specific paths.
Add a regression test that verifies MusicMap matching queries the provider
priority list and preserves canonical metadata from the best match.
Make discovery pool population and curated playlists follow the configured metadata source order. Keep Spotify strict where fallback would corrupt source-specific IDs, and trim fan-out with smaller similar-artist samples and page caps. Leave the remaining incremental path for follow-up.
Reduce request volume in the discovery helpers while keeping the source-priority model intact.
- make cache_discovery_recent_albums source-priority aware
- cap Spotify artist-album pagination in the discovery and incremental paths
- reduce the similar-artist sample size for the cache-refresh helper
- keep Spotify strict where fallback would contaminate source-specific IDs
- add regression coverage for source order, strict Spotify lookups, and pagination caps
Watchlist scanner: empty discography (no new releases in lookback) was
treated as API failure, causing "Failed to get artist discography" for
artists like Kendrick Lamar who simply had no recent releases. Now
distinguishes None (API failure → try next source) from [] (success,
no new tracks). Spotify backfill now uses the authenticated client
instance instead of creating a fresh unauthenticated one.
Wishlist nebula: album remove now sends album_name (API updated to
accept album_name as fallback alongside album_id). Track remove
re-renders the nebula after deletion. Toned down processing pulse
animation.
Updated test to verify fallback triggers on API failure (None), not
on empty results.
Make discovery pool population respect provider priority while keeping Spotify strict, and reduce unnecessary request volume in the hot discovery paths.
- keep discovery fan-out source-priority aware
- preserve cache use where freshness is not required
- cap Spotify artist-album pagination in discovery and cache refresh paths
- keep incremental release checks to a single page, since they only need the newest releases
- add regression coverage for provider order, strict Spotify handling, and pagination caps
Resolve Spotify artist matching through the exact Spotify client only, so watchlist ID backfill cannot drift to fallback-provider results. Remove the remaining preemptive provider availability check from the backfill loop.
Drop the old active-provider artist lookup helpers from watchlist_scanner now that the web scan flow resolves sources through the shared metadata priority.
Keep the Spotify-specific feature toggles in place for discovery and sync paths that still use them.
Move the web watchlist scan core onto the shared metadata source priority so primary provider settings are respected during artist, album, and image resolution.
Add coverage for primary-source-first discography lookup and fallback to later providers when the primary source has no albums.
Bring placeholder tracklist skipping back into the shared watchlist scan path, and centralize the DB-only artist image backfill helper so both web scan entrypoints reuse the same logic.
Drop the legacy watchlist scan entrypoints that are no longer used by the web scan flow, and keep the live refresh path pointed at the shared scanner helper.
Move the shared watchlist scan loop into core/watchlist_scanner.py so web_server.py only handles triggers, locks, progress, and post-scan orchestration.
Manual and scheduled watchlist scans now share the same scanner-side core, while the web entrypoints keep profile selection and automation progress updates.
Spotify was being called for album/artist data fetching across multiple
background workers and the Artists page search even when the user had
Deezer or iTunes set as their primary metadata source. Being authenticated
for playlist sync was treated as permission to use Spotify for everything.
- watchlist_scanner: add _spotify_is_primary_source() that checks both
auth and primary source config; use it for all album/artist data fetching
(discovery pool, recent album caching, playlist curation, similar artist
ID matching, proactive ID backfill). _spotify_available_for_run() is kept
for sync_spotify_library_cache which must run regardless of primary source
- repair_jobs/metadata_gap_filler: gate Spotify ISRC lookup on primary
source being 'spotify'; MusicBrainz lookup unaffected
- repair_jobs/unknown_artist_fixer: replace hardcoded spotify_client with
source-aware client selection — primary source ID tried first, each ID
matched to its correct client (fixes latent bug passing Deezer IDs to
Spotify)
- web_server.py /api/match/search: Artists page search was hardcoded to
spotify_client.search_artists(); now uses _get_metadata_fallback_client()
so results come from the configured primary source
Adds a new Last.fm Radio section to the Discover page that lets users
search a track on Last.fm, generate a similar-tracks playlist, and run
it through the existing discovery/download/sync pipeline. Also generates
playlists automatically from top listening history during watchlist scans
(max once per week).
- core/lastfm_client.py: Add get_similar_tracks() using track.getsimilar
- core/listenbrainz_manager.py: Add save_lastfm_radio_playlist() with
deterministic MBID (MD5 seed), cleanup limit of 5 for lastfm_radio type
- web_server.py: Add /api/lastfm/configured, /api/lastfm/search/tracks,
/api/lastfm/radio/generate, /api/discover/listenbrainz/lastfm-radio;
fix playlist['name'] KeyError in discovery worker that was resetting
phase back to 'fresh' after completion
- core/watchlist_scanner.py: Add _generate_lastfm_radio_playlists() with
weekly throttle, called at end of scan_all_watchlist_artists()
- webui/index.html: Add #lastfm-radio-section above ListenBrainz section,
hidden unless Last.fm API key is configured
- webui/static/script.js: Search/generation/card-load functions; fix
discovery modal labels (Last.fm Radio vs ListenBrainz), description
update on completion, belt-and-suspenders completion handling inside
updateYouTubeDiscoveryModal; fix album/duration display for tracks
without metadata; music note SVG placeholder for missing art
- webui/static/style.css: Styles for search bar, dropdown, result rows
Deezer and iTunes defaulted to 50 albums max, silently truncating large
discographies. Deezer now paginates (100 per page) up to 200. iTunes
raised to 200 (single call). All callers in web_server.py updated to
use the new defaults instead of hardcoding limit=50.
Also adds diagnostic logging for allow_duplicates album comparison
to help debug inconsistent singles behavior.
Singles like "idol" weren't added when the same song existed on a
different album because check_track_exists used album-aware matching
that found the track via album name. Now skips the album hint when
allow_duplicates is on so matching is title+artist only, then compares
album names ourselves with a strict 0.85 threshold. Only affects users
with allow_duplicates enabled.
The setting only affected wishlist dedup but the watchlist scanner's
library check still skipped tracks by title+artist regardless. Now
when allow_duplicates is enabled, the scanner compares album names
and only skips if the same album matches. Same song on a different
album is allowed through to the wishlist.
Spotify lists unreleased albums with placeholder names like "Track 1",
"Track 2" before the real tracklist is revealed. The scanner was trying
to download these, searching Soulseek for "Track 1" by artist which
matches random files. Now skips any album where more than half the
tracks match the placeholder pattern. Covers both the watchlist scan
and discovery pool paths.
Stripped 4,200+ emoji characters from print(), logger calls across
39 Python files. Logs are now clean text — easier to grep, more
professional, no encoding issues on terminals without Unicode support.
Seasonal config icons preserved for UI display.
Clients are for the most part being initialized per-request, which leads to a lot of redundant client initialization, as well as noise on the logs, since each client initialization emits a row on the logs, eg. 'Deezer client initialized'
All metadata source decisions now flow through get_primary_source() and
get_primary_client() in core/metadata_service.py. Previously 6 different
files reimplemented this logic with inconsistent defaults ('itunes' vs
'deezer') and auth checks, causing bugs when any one was missed.
Changes:
- metadata_service.py: Added canonical get_primary_source/get_primary_client
- web_server.py: _get_metadata_fallback_source() and _get_active_discovery_source()
are now thin wrappers delegating to metadata_service
- seasonal_discovery.py: _get_source() delegates to metadata_service
- personalized_playlists.py: _get_active_source() delegates to metadata_service
- spotify_client.py: Fixed _fallback_source default from 'itunes' to 'deezer'
- watchlist_scanner.py: _get_fallback_metadata_client() delegates to metadata_service
Future changes to source selection only need to update one file.
Albums announced but not yet released have no real audio available,
causing Soulseek to match random tracks with similar names. Both
discography methods (Spotify and generic client) now filter out
albums with release dates in the future. Skipped albums are not
marked as processed — they will be picked up on the first scan
after their release date passes.
Addresses all three points from community rate-limiting report:
1. Watchlist scans fetched ALL albums then filtered — 262 albums = 27
API calls per artist. Now determines upfront if full discography is
needed: subsequent scans and time-bounded lookbacks use max_pages=1
(1 API call). Only "full discography" global setting fetches all.
2. MIN_API_INTERVAL (350ms) now configurable via spotify.min_api_interval
setting. Users who get rate-limited frequently can increase the delay.
Floor at 100ms to prevent abuse.
3. Retry-After header extraction improved: added diagnostic logging when
headers exist but lack Retry-After key, plus regex fallback to parse
the value from the error message string.
- Add discogs_artist_id column to watchlist_artists table (migration)
- Add discogs_artist_id to WatchlistArtist dataclass
- Add to get_watchlist_artists optional_columns and constructor
- Add update_watchlist_discogs_id DB method
- Backfill loop includes Discogs when token is configured
- Add _match_to_discogs for cross-provider artist matching
- Backfill maps updated: id_attr, match_fn, update_fn all include discogs
- Was only backfilling the active provider — artists added via Deezer
never got Spotify/iTunes IDs, and vice versa
- Now backfills iTunes (always), Deezer (always), and Spotify (if
authenticated) at the start of every scan
- Added _match_to_deezer() and update_watchlist_deezer_id() for
Deezer cross-provider matching
- Generalized backfill with provider→attribute/function maps
- Add rate limiting to all 4 Spotify pagination loops (get_artist_albums,
get_user_playlists, get_playlist_tracks, get_album_tracks) — these
called sp.next() bypassing the rate_limited decorator entirely, causing
unthrottled API calls that triggered 429 bans
- Track pagination calls in API rate monitor (separate endpoint names)
- Increase DELAY_BETWEEN_ARTISTS from 2s to 4s in watchlist scanner
- Abort watchlist scan immediately if Spotify rate limit detected mid-scan
instead of continuing to hammer the API