Both the auto and manual wishlist download paths called
`remove_tracks_already_in_library` before submitting the batch — a
serial DB lookup per track per artist (~1s/track on a 24-track
wishlist). The batches set `force_download_all=True` which is
explicitly documented as "skip the expensive library check" — the
pre-flight cleanup was contradicting that flag.
Removed the cleanup call from both flows. Kept `remove_wishlist_duplicates`
(fast SQL DELETE) and the standalone `/api/wishlist/cleanup` endpoint
that exposes the library scan as explicit user-triggered maintenance.
Safety check on the trade-off:
- post-processing at `core/imports/pipeline.py:576-624` already handles
re-downloads defensively: existing file with metadata → skip overwrite
+ delete source duplicate, no library corruption.
- Master worker's analysis loop normally removes wishlist entries for
found tracks via `_check_and_remove_track_from_wishlist_by_metadata`,
so stale wishlist entries should be rare in practice.
- Worst case for the rare orphan: one redundant download attempt that
the post-processing layer no-ops on. Bandwidth waste, not data damage.
Tests updated:
- `..._does_not_run_library_cleanup` (renamed from `_skips_enhance_tracks_during_cleanup`)
asserts no DB track-existence checks happen and no wishlist removals
fire — both `enhance` and "owned" tracks reach the master worker.
- `..._marks_batch_complete_when_wishlist_genuinely_empty` (renamed from
`..._after_cleanup`) covers the path where the wishlist starts empty.
Full suite: 1232 passing. Ruff clean.
The manual wishlist download endpoint blocked the request thread on a
slow library-cleanup pass before submitting the batch — for a 24-track
wishlist that's ~50 per-track DB lookups serialised in the request
handler, taking 30+ seconds before the frontend got a response. The
modal sat at "Pending..." with no progress visible the whole time.
Split start_manual_wishlist_download_batch into:
1. SYNC path (request handler):
- Generate batch_id, create download_batches entry with phase=analysis
and analysis_total=0 placeholder.
- Submit a single bg job (`_prepare_and_run_manual_wishlist_batch`) to
the missing-download executor.
- Return 200 with batch_id immediately. Frontend can start polling
/api/active-processes status right away.
2. BG path (executor thread):
- db.remove_wishlist_duplicates (slow-ish, single SQL)
- remove_tracks_already_in_library (the slow one — per-track DB checks)
- wishlist_service.get_wishlist_tracks_for_download
- sanitize + dedupe + filter (track_ids / category)
- Update batch.analysis_total with the real filtered count
- add_activity_item("Wishlist Download Started", ...)
- run_full_missing_tracks_process (master worker)
Edge case: if cleanup empties the wishlist, the bg job marks the batch
phase='complete' with error='No tracks in wishlist' (instead of the old
synchronous 400 response). Frontend status poll picks this up and the
modal can close cleanly.
Tests: existing 2 manual-download tests updated to drive the bg job
explicitly via a new `_run_submitted_bg_job` helper. Added 2 new tests:
- `..._returns_immediately_with_placeholder` — proves the sync path
doesn't trigger any cleanup or master-worker calls; analysis_total=0.
- `..._marks_batch_complete_when_wishlist_empty_after_cleanup` —
cleanup empties the list, master worker never invoked, batch ends
with phase='complete'.
Full suite: 1232 passing (was 1230). Ruff clean.
Final lift in the PR5 discovery-workers series. Pulls the 328-line
library quality scanner out of `web_server.py` into its own focused
module under `core/discovery/`. Pure 1:1 lift — wrapper keeps the
original entry-point name.
What the quality scanner does:
1. Reset scanner state (counters, results), load quality profile +
minimum acceptable tier from QUALITY_TIERS.
2. Load tracks from DB based on scope:
- 'watchlist' → tracks for watchlisted artists only.
- other → all library tracks.
3. For each track:
- Stop-request gate (state['status'] != 'running').
- Quality-tier check via _get_quality_tier_from_extension(file_path).
- Skip tracks meeting standards (tier_num <= min_acceptable_tier).
- For low-quality tracks: matching_engine search query gen, score
candidates against Spotify (artist + title similarity, album-type
bonus), pick best match >= 0.7 confidence.
- On match: add full Spotify track to wishlist via
`wishlist_service.add_spotify_track_to_wishlist` with
source_type='quality_scanner' and a source_context that captures
original file_path, format tier, bitrate, and match confidence.
4. After all tracks: status='finished', progress=100, activity feed
entry, emit `quality_scan_completed` event for automation engine.
5. On critical exception: status='error', error message captured.
Wishlist service interaction is via the public
`add_spotify_track_to_wishlist` API only — no overlap with kettui's
planned `core/wishlist/` package extraction (the import lives inside
the function, exactly as in the original, and will follow whatever
path that package takes).
Dependencies injected via `QualityScannerDeps` (8 fields) —
quality_scanner_state dict, quality_scanner_lock, QUALITY_TIERS
constant, spotify_client, matching_engine, automation_engine, plus 2
callable helpers (get_quality_tier_from_extension, add_activity_item).
Diff vs original after `deps.X` → global X normalization is **zero
differences** — 328 lines orig = 328 lines lifted, byte-identical body
(including all whitespace, comments, log strings, and the inline
`from core.wishlist_service import get_wishlist_service` /
`from database.music_database import MusicDatabase` imports at the
top of the function).
Tests: 11 new under tests/discovery/test_discovery_quality_scanner.py
covering state init/reset, no-watchlist-artists short-circuit,
unauthenticated Spotify error, high-quality skip, low-quality search
trigger, match → wishlist add (with full source_context payload),
no-match no-add, mid-loop stop request, completion phase + progress,
automation engine event emission, all-library scope load.
Full suite: 1152 passing (was 1141). Ruff clean.
End of the PR5 series — `web_server.py` lost ~328 lines on this commit
alone; total trim across PR5a–PR5h is ~2,400 lines of discovery worker
code moved into focused `core/discovery/*.py` modules. The remaining
discovery-adjacent worker `_process_watchlist_scan_automatically` was
deliberately deferred to avoid overlap with kettui's planned wishlist
extraction.
Seventh lift in the PR5 discovery-workers series. Pulls the 286-line
ListenBrainz discovery worker out of `web_server.py` into its own
focused module under `core/discovery/`. Pure 1:1 lift — wrapper keeps
the original entry-point name.
What the ListenBrainz discovery worker does:
1. Pause enrichment workers (release shared resources).
2. For each ListenBrainz track:
- Cancellation gate (state['phase'] != 'discovering').
- Discovery cache lookup; cache hit short-circuits the search.
- Strategy 1: matching_engine search queries with confidence scoring
against Spotify (preferred) or iTunes (fallback).
- Strategy 2: swapped artist/title query.
- Strategy 3: album-based query (uses album_name when available —
unique to LB, since YouTube tracks don't have album metadata).
- Strategy 4: extended search with limit=50.
- On match → save to discovery cache with image extracted from album
images or matched_track.image_url fallback.
- On miss → Wing It stub stored as 'wing-it' status.
3. After all tracks: phase='discovered', status='complete', activity feed
entry mentioning 'ListenBrainz Discovery Complete'.
4. On error: state['status']='error', phase='fresh'.
5. Finally: resume enrichment workers.
Dependencies injected via `ListenbrainzDiscoveryDeps` (16 fields) —
listenbrainz_playlist_states, spotify_client, matching_engine, plus 13
callable helpers (pause/resume enrichment, get_active_discovery_source,
get_metadata_fallback_client, get_discovery_cache_key, get_database,
validate_discovery_cache_artist, extract_artist_name,
spotify_rate_limited, discovery_score_candidates, get_metadata_cache,
build_discovery_wing_it_stub, add_activity_item).
Diff vs original after `deps.X` → global X normalization is **zero
differences** — 286 lines orig = 286 lines lifted, byte-identical body
(including all whitespace, comments, log strings).
Pre-existing bug preserved (not fixed): if `listenbrainz_playlist_states[
state_key]` raises KeyError on entry, the outer except handler tries to
mutate `state` which is unbound → secondary UnboundLocalError. Same bug
in the original (and the YouTube discovery worker). Documented here for
future cleanup but out of scope for the lift.
Tests: 11 new under tests/discovery/test_discovery_listenbrainz.py
covering cache hit short-circuit, Strategy 1 confidence match, Wing It
fallback, iTunes fallback (Spotify unauthenticated and rate-limited),
cancellation (phase change), completion phase update, activity feed
entry, per-track error handling, float duration_ms tolerance (regression
for the :02d format crash fixed earlier), enrichment workers resume on
finally.
Full suite: 1141 passing (was 1130). Ruff clean.
Sixth lift in the PR5 discovery-workers series. Pulls the 323-line
Beatport chart discovery worker out of `web_server.py` into its own
focused module under `core/discovery/`. Pure 1:1 lift — wrapper keeps
the original entry-point name.
What the Beatport discovery worker does:
1. Pause enrichment workers (release shared resources).
2. For each Beatport track:
- Cancellation gate (state['phase'] != 'discovering').
- Clean Beatport text (artist/title) of common annotations via
`clean_beatport_text` helper.
- Single-string artist normalization for "CID,Taylr Renee"-style
entries — split on comma, take the first.
- Discovery cache lookup; cache hit short-circuits the search and
normalizes cached artists from ['str'] → [{'name': 'str'}] to
match the frontend's expected list-of-objects shape.
- matching_engine search-query generation (with high min_confidence
of 0.9 to avoid bad matches).
- Strategy 1: scored candidates from initial Spotify/iTunes searches.
- Strategy 4: extended search with limit=50 if no high-confidence
match found.
- On Spotify match: format artists as [{'name': str}] objects, pull
full album object from raw cache when available, fallback to
reconstructed album dict otherwise.
- On iTunes match: format with image_url-derived album.images entry
(300x300 spec), source set to discovery_source.
- Save matched result to discovery cache when confidence >= 0.75
(note: lower than search threshold; discovery still benefits from
these less-confident matches as user-visible suggestions).
- On miss: Wing It stub stored as 'wing-it' status (success ticked).
3. After all tracks: phase='discovered', activity feed entry, sync
discovery results back to mirrored playlist via
`_sync_discovery_results_to_mirrored` with 'beatport' tag.
4. On error: state['phase']='fresh' + status='error'.
5. Finally: resume enrichment workers.
Dependencies injected via `BeatportDiscoveryDeps` (17 fields) —
beatport_chart_states, spotify_client, matching_engine, plus 14
callable helpers (pause/resume enrichment, get_active_discovery_source,
get_metadata_fallback_client, clean_beatport_text,
get_discovery_cache_key, get_database, validate_discovery_cache_artist,
spotify_rate_limited, discovery_score_candidates, get_metadata_cache,
build_discovery_wing_it_stub, add_activity_item,
sync_discovery_results_to_mirrored).
Diff vs original after `deps.X` → global X normalization is **zero
differences** — 323 lines orig = 323 lines lifted, byte-identical body
(including all whitespace, comments, log strings).
Tests: 12 new under tests/discovery/test_discovery_beatport.py covering
cache hit short-circuit (with cached-artist normalization), Spotify
match formatting (list and string artist inputs), iTunes match
(image_url to album.images), Wing It fallback, cancellation
(phase change), completion phase update, activity feed entry, mirrored
sync invocation, top-level error handler, per-track error handling,
comma-separated artist split.
Full suite: 1130 passing (was 1118). Ruff clean.
Fifth lift in the PR5 discovery-workers series. Pulls the 278-line
public-Spotify-link discovery worker out of `web_server.py` into its
own focused module under `core/discovery/`. Pure 1:1 lift — wrapper
keeps the original entry-point name.
What the Spotify Public discovery worker does:
1. Pause enrichment workers (release shared resources).
2. For each track:
- Cancellation gate (state['cancelled']).
- Normalize artists to plain string list (handles dict + str inputs).
- Discovery cache lookup; cache hit short-circuits the search and
populates display fields from the cached match.
- SimpleNamespace duck-type → `_search_spotify_for_tidal_track`
(shared search helper, returns tuple for Spotify or dict for iTunes).
- On Spotify match: build `match_data` preserving track_number /
disc_number from raw API data; image extracted from album images
or track object fallback; release_date filled from track.release_date
when album dict is missing it.
- On iTunes match: dict result → match_data with source set to
discovery_source; image extracted from album images.
- Save matched result to discovery cache.
- On miss: Wing It stub stored as 'wing-it' status.
3. After all tracks: phase='discovered', activity feed entry.
4. On error: state['phase']='error' + status with error string.
5. Finally: resume enrichment workers.
This worker is structurally close to the Deezer worker (see PR5d) but
intentionally diverges on:
- Track-data field names (`spotify_public_track` vs `deezer_track`).
- Artist normalization (Spotify Public can pass dicts or strings).
- No mirrored-playlist DB writeback (sync is handled separately).
Dependencies injected via `SpotifyPublicDiscoveryDeps` (12 fields) —
spotify_public_discovery_states, spotify_client, plus 10 callable
helpers (pause/resume enrichment, get_active_discovery_source,
get_metadata_fallback_client, get_discovery_cache_key, get_database,
validate_discovery_cache_artist, search_spotify_for_tidal_track,
build_discovery_wing_it_stub, add_activity_item).
Diff vs original after `deps.X` → global X normalization is **zero
differences** — 278 lines orig = 278 lines lifted, byte-identical body
(including all whitespace, comments, log strings).
Tests: 10 new under tests/discovery/test_discovery_spotify_public.py
covering cache hit short-circuit, dict-artist normalization, Spotify
tuple match (track/disc preservation), iTunes dict match path, Wing It
fallback, cancellation, completion phase update, activity feed entry,
top-level error handler, per-track error handling.
Full suite: 1118 passing (was 1108). Ruff clean.
Fourth lift in the PR5 discovery-workers series. Pulls the 270-line
Deezer discovery worker out of `web_server.py` into its own focused
module under `core/discovery/`. Pure 1:1 lift — wrapper keeps the
original entry-point name so the existing call sites continue to work
without changes.
What the Deezer discovery worker does:
1. Pause enrichment workers (release shared resources).
2. For each Deezer track:
- Cancellation gate (state['cancelled']).
- Discovery cache lookup; cache hit short-circuits the search and
populates display fields from the cached match (artist string,
album name).
- SimpleNamespace duck-type → `_search_spotify_for_tidal_track`
(shared search helper, returns tuple for Spotify or dict for iTunes).
- On Spotify match: build `match_data` preserving track_number /
disc_number from raw API data, image extracted from album images
or track object fallback, release_date filled from track.release_date
when album dict is missing it.
- On iTunes match: dict result populated as `match_data`, source set
to discovery_source, image extracted from album images.
- Save matched result to discovery cache.
- On miss: Wing It stub stored as 'wing-it' status.
3. After all tracks: phase='discovered', activity feed entry, sync
discovery results back to mirrored playlist via
`_sync_discovery_results_to_mirrored`.
4. On error: state['phase']='error' + status with error string.
5. Finally: resume enrichment workers.
Dependencies injected via `DeezerDiscoveryDeps` (13 fields) —
deezer_discovery_states dict, spotify_client, plus 11 callable helpers
(pause/resume enrichment, get_active_discovery_source,
get_metadata_fallback_client, get_discovery_cache_key, get_database,
validate_discovery_cache_artist, search_spotify_for_tidal_track,
build_discovery_wing_it_stub, add_activity_item,
sync_discovery_results_to_mirrored).
Diff vs original after `deps.X` → global X normalization is **zero
differences** — 270 lines orig = 270 lines lifted, byte-identical body
(including all whitespace, comments, log strings).
Tests: 10 new under tests/discovery/test_discovery_deezer.py covering
cache hit short-circuit, Spotify tuple match (track/disc number
preservation), iTunes dict match path, Wing It fallback, cancellation,
completion phase update, activity feed entry, mirrored sync invocation,
top-level error handler, per-track error handling.
Full suite: 1108 passing (was 1098). Ruff clean.
Third lift in the PR5 discovery-workers series. Pulls the 323-line
mirrored-playlist discovery worker out of `web_server.py` into its own
focused module under `core/discovery/`. Pure 1:1 lift — wrapper keeps
the original entry-point name so the existing call site
(`_run_playlist_discovery_worker(pls, automation_id=None)` from the
automation engine) continues to work without changes.
What the playlist discovery worker does:
1. Pause enrichment workers (release shared resources).
2. Pre-compute total track count across all playlists for the automation
progress card.
3. For each playlist:
- Fast pre-scan separates already-discovered tracks (skipped, unless
incomplete metadata or a Wing It stub) from undiscovered ones.
- For each undiscovered track:
- Cancellation gate via _playlist_discovery_cancelled set.
- Discovery cache lookup (with artist validation).
- matching_engine search-query generation, then Spotify (preferred)
or iTunes (fallback) search + scoring.
- Extended search fallback (limit=50) if no high-confidence match.
- On match → enrich album from metadata cache (id, images,
total_tracks, album_type, release_date, artists, plus track_number
and disc_number), build matched_data, write to track.extra_data,
save to discovery cache.
- On miss → Wing It stub stored as 'wing_it_fallback' provider.
4. After all playlists: emit `discovery_completed` event when at least
one new track was discovered, mark automation progress 'finished'.
5. On error → automation progress 'error', traceback printed.
6. Finally: resume enrichment workers.
Dependencies injected via `PlaylistDiscoveryDeps` (16 fields) —
spotify_client, matching_engine, automation_engine, the cancellation
set, plus 12 callable helpers (pause/resume enrichment,
get_active_discovery_source, get_metadata_fallback_client/source,
update_automation_progress, get_database, get_discovery_cache_key,
validate_discovery_cache_artist, discovery_score_candidates,
get_metadata_cache, build_discovery_wing_it_stub).
Diff vs original after `deps.X` → global X normalization is **zero
differences** — 323 lines orig = 323 lines lifted, byte-identical body
(including all whitespace, comments, log strings).
Tests: 15 new under tests/discovery/test_discovery_playlist.py covering
empty playlists, no-tracks playlist skip, complete-discovery skip,
incomplete-discovery re-run, Wing It always re-run, unmatched_by_user
respect, cache hit short-circuit, match above threshold (extra_data +
cache save), match below threshold falls to Wing It, iTunes fallback,
neither-provider error path, cancellation, discovery_completed event
emit, no-event on zero-discovered, multi-playlist grand_total
aggregation.
Full suite: 1098 passing (was 1083). Ruff clean.
yt_dlp sometimes returns float `duration_ms` for YouTube tracks. The
discovery workers format the duration with `f"{x // 60000}:{(x % 60000)
// 1000:02d}"` — and `:02d` requires an int. When the duration is a
float, the format string raises:
Unknown format code 'd' for object of type 'float'
Caught when running YouTube discovery on a real playlist (bbno$ tracks)
— every track failed with status='Error'.
Pre-existing bug, surfaced now because of yt_dlp returning float
durations on this playlist. Fixed at all 8 sites by casting through
`int()` before the `// 60000` and `% 60000` operations:
- core/discovery/youtube.py: 2 sites in run_youtube_discovery_worker
(cache hit + main result construction).
- web_server.py L29238/L29372: 2 sites in _run_listenbrainz_discovery_worker.
- web_server.py L40112/L40136/L40161/L40178: 4 sites in the YouTube
retry/pre-discovered results assembly path.
The `if duration_ms` / `if dur` guard already protects against None and 0,
so `int(...)` is only called on truthy numeric values.
Tests: 1 new regression test under tests/discovery/test_discovery_youtube.py
(`test_float_duration_does_not_crash_format`) — passes a float
duration_ms and asserts the worker completes without an error result.
Ruff clean.
Second lift in the PR5 discovery-workers series. Pulls the 332-line
YouTube discovery worker out of `web_server.py` into its own focused
module under `core/discovery/`. Pure 1:1 lift — wrappers keep the
original entry-point name so the two callers
(`youtube_discovery_executor.submit(_run_youtube_discovery_worker, ...)`)
continue to work without changes.
What the YouTube discovery worker does:
1. Pause enrichment workers (release shared resources).
2. For each YouTube playlist track:
- Cancellation check (phase != 'discovering' aborts).
- Discovery cache lookup; cache hit short-circuits the search.
- Strategy 1: matching_engine search queries with confidence scoring
against Spotify (preferred) or iTunes (fallback).
- Strategy 2: swapped artist/title query.
- Strategy 3: raw (untokenized) query.
- Strategy 4: extended search with limit=50.
- On match → save to discovery cache.
- On miss → build Wing It stub from raw source data.
3. After loop: phase='discovered', sort results by index, and for mirrored
playlists write extra_data back to the DB.
4. Activity feed entry with match summary.
5. On error → state['status']='error', phase='fresh'.
6. Finally: resume enrichment workers.
Dependencies injected via `YoutubeDiscoveryDeps` (16 fields) —
youtube_playlist_states, spotify_client, matching_engine, plus 13
callable helpers (pause/resume enrichment, get_active_discovery_source,
get_metadata_fallback_client, discovery cache key/validate, extract
artist name, spotify_rate_limited, discovery_score_candidates,
get_metadata_cache, build_discovery_wing_it_stub, get_database,
add_activity_item).
Diff vs original after `deps.X` → global X normalization is **zero
differences** — 332 lines orig = 332 lines lifted, byte-identical body
(including all whitespace).
Pre-existing bug preserved (not fixed): if `youtube_playlist_states[url_hash]`
raises KeyError on entry, the outer except handler tries to mutate
`state` which is unbound → secondary UnboundLocalError. Same bug in
the original. Documented here for future cleanup but out of scope
for the lift.
Tests: 14 new under tests/discovery/test_discovery_youtube.py covering
cache hit short-circuit, Strategy 1 confidence match, Wing It fallback,
iTunes fallback path (Spotify unauthenticated and rate-limited),
cancellation (phase changed), skip_discovery flag, completion phase
update, activity feed entry, mirrored playlist DB writeback, non-mirrored
no-writeback, enrichment workers pause/resume, error-during-loop resume,
results sorted by index after retry.
Full suite: 1082 passing (was 1068). Ruff clean.
- remove the redundant wishlist-service injection from the runtime wrappers
- keep the package owning its own singleton service access
- simplify the route runtime API and update the wishlist tests to match
- ignore unconfigured backends when clearing completed downloads
- keep the post-download cleanup route best-effort after a successful wishlist run
- add regression coverage for the orchestrator clear step
First lift in the new PR5 discovery-workers series. Pulls the 448-line
playlist sync background worker out of `web_server.py` into its own
focused module under `core/discovery/`. Pure 1:1 lift — wrappers keep
the original entry-point name so the four callers
(`sync_executor.submit(_run_sync_task, ...)`) continue to work without
changes.
What the sync worker does:
1. Convert frontend JSON tracks → SpotifyTrack/SpotifyPlaylist objects.
2. Normalize artist/album shapes for downstream wishlist parity.
3. Wire a progress_callback that updates `sync_states` + automation card.
4. Patch sync_service for database-only fallback when no media server is
connected.
5. `run_async(sync_service.sync_playlist(...))` and capture the result.
6. Update sync_states to 'finished', push playlist poster image to
Plex / Jellyfin / Emby, record sync history (with re-sync vs new-sync
branching), emit `playlist_synced` event for automation engine, and
persist sync status with a tracks_hash for smart-skip on the next
scheduled sync.
7. On exception → mark error in sync_states + automation; finally clear
progress callback + drop `_original_tracks_map` from sync_service.
Dependencies injected via `SyncDeps` (11 fields) — config_manager,
sync_service, plex_client, jellyfin_client, automation_engine, run_async,
record_sync_history_start, update_automation_progress,
update_and_save_sync_status, sync_states dict, sync_lock. The only
structural drift from a pure paste is the top-of-function variable
binding: original used `global sync_states, sync_service`, lifted version
rebinds them as locals from deps (`sync_states = deps.sync_states` etc.)
since the names aren't module-level in the new file. Same behaviour
otherwise — diff against the original after `deps.X` → global X
normalization is **zero differences**.
Tests: 18 new under tests/discovery/test_discovery_sync.py covering
sync history recording (new + resync), setup error path (with and
without automation_id), missing sync_service handling, sync_playlist
exception handling, successful sync state transition, unmatched-tracks
summary, playlist image upload (plex + jellyfin + zero-synced gate),
automation engine emit, automation progress finished call, sync history
DB persistence (completion + match_details), tracks_hash persistence,
and finally-block cleanup (callback clear + map drop).
Full suite: 1068 passing (was 1050). Ruff clean.
Kicks off the PR5 series — 9 discovery workers totaling ~2,400 lines
across `_run_sync_task`, `_run_*_discovery_worker` family,
`_run_quality_scanner`, and `_process_watchlist_scan_automatically`.
Wishlist-related extractions deliberately skipped to avoid overlap with
kettui's planned `core/wishlist/` package.
- add module-level loggers for the wishlist package instead of threading the web server logger through runtime objects
- default wishlist helper runtimes and cleanup helpers to their package logger while still allowing test overrides
- keep web_server.py as a thin caller that no longer injects its logger into wishlist flows
- extract the remaining wishlist endpoint behavior from web_server.py into core/wishlist/routes.py
- keep web_server.py as a thin Flask adapter around the new route helpers
- add tests that cover wishlist counts, stats, track listing, clear/remove flows, cycle updates, and album-track adds
- add core/wishlist as the home for wishlist payload, resolution, state, processing, reporting, and selection helpers
- move wishlist-specific tests into tests/wishlist alongside the new package layout
- keep web_server.py and the import/search callers as thin adapters for now
Three drifts caught in line-by-line review against original — all reverted
so the lifted body is now byte-identical to the web_server.py original
(after deps.X → global X normalization):
1. `import traceback` was hoisted to module-level imports. Original imports
it inline inside the except block. Moved back to inline (same behavior,
but strict location parity).
2. Trailing whitespace on the blank line after analysis_results.append.
3. Trailing whitespace on the blank line inside the wishlist except handler.
Diff against original now produces ZERO differences. Tests still 21/21
passing, ruff clean.
Final extraction in the download orchestrator series. Lifts the 586-line
master worker that drives the entire missing-tracks pipeline from
`web_server.py` into `core/downloads/master.py`. Pure 1:1 lift — wrappers
keep the original entry-point name so the three callers
(`missing_download_executor.submit(_run_full_missing_tracks_process, ...)`)
continue to work without changes.
What the master worker does:
1. PHASE 1 ANALYSIS — per-track DB ownership check with album fast path
(lookup album by name+artist, match tracks within it) plus a
MusicBrainz release-cache preflight so per-track post-processing all
uses the same release MBID (prevents Navidrome album splits).
2. Wishlist removal for tracks already in the library.
3. Explicit-content filter.
4. PHASE 2 transition — if nothing missing, mark batch complete, update
per-source playlist phases, kick auto-wishlist completion handler.
5. Soulseek album pre-flight — search for a complete album folder before
falling back to track-by-track search, cache the source for reuse.
6. Wishlist album grouping — derive per-album disc counts and resolve ONE
artist context per album so collab albums don't fold-split.
7. Task creation with explicit album/artist context injection +
playlist-folder-mode flag propagation.
8. Hand off to download_monitor + start_next_batch_of_downloads.
9. Error handler — phase=error, reset YouTube playlist phase to
'discovered', reset auto-wishlist globals on auto-initiated batches.
Dependencies injected via `MasterDeps` (21 fields) — wide surface
covering config, MB caches/locks, soulseek client, source-page state
dicts, multiple callbacks (wishlist removal, explicit filter, executor
+ auto-completion fn, monitor, start_next_batch). The only behaviour
difference from a pure paste is `import traceback` hoisted to module
scope (was inline in the except block) — same behaviour. Trailing
whitespace on two blank lines also got normalized away by the editor;
neither has any runtime effect.
`reset_wishlist_auto_processing` callback wraps the
`global wishlist_auto_processing, wishlist_auto_processing_timestamp`
write + `wishlist_timer_lock` since `global` can't reach back into
web_server.py from a separate module.
Tests: 21 new under tests/downloads/test_downloads_master.py covering
analysis-phase state, force_download_all, found-track wishlist removal,
explicit filter, no-missing complete + per-source state updates, auto
wishlist completion submit, album fast path (direct + fallthrough),
MB preflight (caches both keys, no-mb-worker no-op), task creation
(queue + tasks dict, explicit context for albums, wishlist album
grouping consistency, playlist folder mode), monitor + next-batch
handoff, multi-disc total_discs computation, error handler (phase set,
youtube reset, auto wishlist reset), and batch-removed-mid-flight
defensive path.
Full suite: 1050 passing (was 1029). Ruff clean.
End of the PR4 series — `web_server.py` lost ~590 lines on this commit
alone; total trim across PR4a–PR4h is ~2900 lines of orchestrator code
moved into focused `core/downloads/*.py` modules.
Seventh sub-PR in the download orchestrator series. Strict 1:1 lift —
zero behavior change. ~570 lines moved out of web_server.py.
What moved (lifted as 3 tightly-coupled functions in one module):
- _start_next_batch_of_downloads → start_next_batch_of_downloads
- _on_download_completed → on_download_completed
- _check_batch_completion_v2 → check_batch_completion_v2
Dependencies bundled in `LifecycleDeps` (15+ refs):
- config_manager, automation_engine, download_monitor, repair_worker,
mb_worker (live globals)
- is_shutting_down (lambda over IS_SHUTTING_DOWN flag)
- get_batch_lock (web_server helper for batch_locks dict)
- submit_download_track_worker (lambda wrapping
missing_download_executor.submit + _download_track_worker)
- submit_failed_to_wishlist + submit_failed_to_wishlist_with_auto_completion
(async, used by on_download_completed) AND process_failed_to_wishlist
+ process_failed_to_wishlist_with_auto_completion (sync, used by
check_batch_completion_v2 — direct call matches original v2 behavior;
the non-v2 path always submitted to executor)
- ensure_spotify_track_format, get_track_artist_name,
check_and_remove_from_wishlist, regenerate_batch_m3u (web_server
helpers — large, will lift in follow-up PRs)
- youtube_playlist_states, tidal_discovery_states,
deezer_discovery_states, spotify_public_discovery_states
(per-source playlist state dicts — phase transitions on batch
completion)
Direct imports for already-lifted helpers:
- core.runtime_state.{download_tasks, download_batches, tasks_lock,
add_activity_item}
- core.downloads.history.record_sync_history_completion (PR4a)
- core.album_consistency.run_album_consistency
- core.metadata.common.get_file_lock
Behavior parity verified line-by-line:
- start_next_batch: same batch lock acquisition, same shutdown gate,
same V2-cancelled-task skip, same searching-status-set-before-submit,
same submit-fails-no-ghost-worker semantics
- on_download_completed: same duplicate-call detection (skip decrement
but still check completion), same failed-track tracking with
spotify_track formatting + activity items + automation event emission,
same wishlist removal on success, same active_count decrement, same
stuck-detection (searching > 10min → not_found, post_processing >
5min → completed), same M3U regeneration + repair worker hand-off
+ album consistency pass + wishlist failed-tracks submission
- check_batch_completion_v2: same finished-count tally, same stuck
detection, same already-complete short-circuit returning True, same
per-source playlist phase updates, same album consistency pass,
same DIRECT (sync) wishlist processing call (NOT submit-to-executor
— matches original v2 which called process_* functions directly)
CRITICAL drift caught + fixed during review:
- Initial lift had v2 routing wishlist calls through submit_* deps
(async). Original v2 called process_* directly (sync). Added separate
process_* deps to LifecycleDeps and routed v2 to them. Tests updated.
Two minor defensive additions documented:
- `is_auto_batch = False` initialized before conditional in v2 (Python
scope rules made this unnecessary in original, but explicit is safer)
- Variable rename inside the queue-completion-check loop in
on_download_completed: `task_id` → `queue_task_id` to avoid shadowing
the outer parameter. Log output preserves the same task ID.
Tests: 28 new under tests/downloads/test_downloads_lifecycle.py
covering start-next (early-returns, shutdown gate, max_concurrent,
cancelled-task skip, searching-status-set, submit-failure-no-ghost,
orphan task), on-complete (decrement, duplicate skip, failed/cancelled
tracking, automation emit, wishlist removal, batch completion + emit
+ source phase update, stuck detection, auto vs manual routing),
check-v2 (missing batch, not-complete, complete-marking, already-
complete, auto routing, exception handling).
Full suite: 1029 passing (was 1001). Ruff clean.
Fifth sub-PR in the download orchestrator series. Strict 1:1 lift —
zero behavior change.
What moved:
- _build_batch_status_data → build_batch_status_data
- get_batch_download_status route body → build_single_batch_status
- get_batched_download_statuses route body → build_batched_status
- get_all_downloads_unified route body → build_unified_downloads_response
- Status priority dict → module-level _STATUS_PRIORITY constant
Dependencies bundled in `StatusDeps` dataclass:
- config_manager, docker_resolve_path, find_completed_file,
make_context_key, submit_post_processing (lambda wrapping
missing_download_executor.submit + _run_post_processing_worker),
get_cached_transfer_data
Direct imports from core.runtime_state for download_tasks /
download_batches / tasks_lock (already lifted by kettui).
Behavior parity:
- Same response payload shape across all 3 endpoints
- Same safety-valve mutation: stuck downloading task with file recovered
→ status='post_processing' + submit worker; stuck searching → not_found;
stuck downloading no file → failed
- Same live transfer state mapping (Cancelled/Canceled, Failed/Errored/
Rejected/TimedOut, Completed/Succeeded with byte-mismatch verification,
InProgress, default queued)
- Same intermediate post_processing status promotion + single-shot worker
submission (only when status != 'post_processing')
- Same 'Errored' handling: keeps current status to let monitor retry
- Same 17-key item dict in unified response with same field order
- Same artist/album/artwork normalization (handles string, dict, list,
list-of-dicts, list-of-strings variants)
- Same sort: (priority asc, -timestamp desc)
- Same batch summary aggregation
- Same items[:limit] slicing
- Same logger messages text-for-text
- Same lock scope (single tasks_lock per call) — no new contention
Pre-existing bug preserved (will fix in follow-up PR):
- batched_status `debug_info` block iterates `response["batches"]` and
guards with `if "error" not in batch_status`. Every successful
payload includes `"error": batch.get('error')` (key always present,
value usually None) so the guard is always False and debug_info
never populates in production. Test documents the buggy behavior so
the next PR can flip the check to `batch_status.get('error') is None`.
Tests: 32 new under tests/downloads/test_downloads_status.py covering
phase routing (analysis vs downloading vs unknown), task formatting +
sort + V2 fields, every live transfer state mapping (Cancelled,
Succeeded with full + partial bytes, InProgress, Errored, terminal-
not-overridden), safety valve (stuck searching → not_found, stuck
downloading recovered → post_processing, stuck downloading no file →
failed), all 3 route helpers (single, batched, unified), unified
artist/album/artwork normalization, batch summary aggregation, limit
slicing, plus debug_info bug documentation.
Full suite: 982 passing (was 950). Ruff clean.
Fourth sub-PR in the download orchestrator series. Strict 1:1 lift —
zero behavior change. ~407 lines moved out of web_server.py.
What moved:
- _run_post_processing_worker → run_post_processing_worker
The lifted function is intentionally kept as one ~400-line block to
preserve byte-for-byte parity with the original. Refactoring it into
smaller helpers (context lookup, file search loop, transfer-folder
handler, downloads-folder handler) gets its own follow-up PR.
Dependencies: 9 callbacks bundled in `PostProcessDeps` dataclass.
- config_manager, soulseek_client, run_async (live refs)
- docker_resolve_path, extract_filename, make_context_key
(small utilities still in web_server.py — will lift in a future PR
alongside other shared utilities)
- find_completed_file (file search helper, still in web_server.py)
- enhance_file_metadata, wipe_source_tags (web_server wrappers around
core.metadata.enrichment)
- post_process_with_verification (web_server wrapper around
core.imports.pipeline)
- mark_task_completed (wraps runtime_state.mark_task_completed +
session counter)
- on_download_completed (deferred to PR4g batch lifecycle)
Direct imports for already-lifted helpers (no injection needed):
- core.imports.album_naming.resolve_album_group
- core.imports.context.{get_import_clean_title, get_import_clean_album,
get_import_original_search, get_import_context_artist,
get_import_context_album, normalize_import_context}
- core.imports.filename.extract_track_number_from_filename
- core.metadata.enrichment (re-exported as metadata_enrichment)
- core.runtime_state.{download_tasks, tasks_lock,
matched_downloads_context, matched_context_lock}
Behavior parity:
- Same control flow: missing-task short-circuit → cancelled/completed
short-circuit → missing-filename failure → docker path resolution →
context lookup with fuzzy fallback → expected filename generation →
YouTube special-case path resolution → 5-attempt search loop with
Strategy 1 (original filename in download+transfer) and Strategy 2
(expected final filename in transfer) → file-not-found failure →
transfer-folder handler with metadata enhancement → downloads-folder
handler with full post-process verification
- Same retry count (5), sleep duration (5s), per-attempt logging
- Same album_info dict construction with is_album=True for explicit
album downloads
- Same album grouping skip when context.is_album_download is True
- Same wipe_source_tags fallback when enhancement context missing
- Same matched_downloads_context cleanup on success
- Same exception swallowing at processing-error and critical-error
layers, both setting status='failed' + error_message + calling
on_download_completed(b, t, success=False)
- Every logger message text preserved verbatim (so log filters keep
working)
Tests: 16 new under tests/downloads/test_downloads_post_processing.py
covering missing task, cancelled, already-completed, stream_processed,
missing filename + username, file-not-found-after-retries with sleep
mocked, stream-processor-completes-mid-search, transfer-folder with
metadata enhanced + with no context (wipes tags), downloads-folder
with + without context, processing exception, critical outer
exception, YouTube special path, fuzzy context matching.
Full suite: 950 passing (was 934). Ruff clean.
Third sub-PR in the download orchestrator series. Strict 1:1 lift —
zero behavior change.
What moved:
- _automatic_wishlist_cleanup_after_db_update → cleanup_wishlist_after_db_update
The lifted fn takes config_manager as an arg (so core/downloads/cleanup.py
doesn't need to import web_server). Other deps (wishlist_service,
MusicDatabase, get_database) stay as in-function imports — matches the
original deferred-import pattern.
The single caller in web_server.py (missing_download_executor.submit at
L18028) keeps using the same wrapper name with no signature change.
Behavior parity:
- Same per-profile iteration via get_all_profiles()
- Same essential-field skip (no name / no artists / no spotify_track_id)
- Same artist normalization (string / dict / fallback to str())
- Same 0.7 confidence threshold for db match
- Same break-on-first-artist-match semantics
- Same album extraction (dict.name vs string passthrough)
- Same active_server pulled via config_manager.get_active_media_server()
- Same per-track exception swallowing inside the loops
- Same top-level exception swallow with traceback.print_exc()
- Same logger messages (exact text match for "[Auto Cleanup]" prefix)
Tests: 13 new under tests/downloads/test_downloads_cleanup.py covering
empty wishlist short-circuit, found-in-db removal, missed track stays,
low-confidence skip, missing-fields skip, dict + string artist formats,
break-on-first-match, multi-profile walk, album dict/string handling,
db check failure continuing to next artist, top-level exception swallow,
active server propagation.
Full suite: 934 passing (was 921). Ruff clean.
Second sub-PR in the download orchestrator series. Strict 1:1 lift —
zero behavior change.
What moved:
- cancel_download (single slskd cancel) → cancel_single_download
- cancel_all_downloads (cancel + clear + sweep) → cancel_all_active
- clear_finished_downloads (slskd clear + sweep) → clear_finished_active
- clear_completed_downloads (local task tracker prune) → clear_completed_local
Slskd-touching helpers take (soulseek_client, run_async, sweep_callback)
explicitly so the route layer wires the live client + the existing
_sweep_empty_download_directories helper. The local-state helper imports
download_tasks/download_batches/batch_locks/tasks_lock straight from
core.runtime_state since those are module-level shared globals.
Prep change: `batch_locks` dict moved from web_server.py global into
core/runtime_state.py alongside the other download globals. web_server.py
re-imports from runtime_state so the ~3 existing call sites in
web_server.py keep resolving without modification. Identity preserved
(same dict across all importers).
Out of scope (deferred to PR4g batch lifecycle):
- cancel_download_task (calls _on_download_completed)
- cancel_task_v2 + _atomic_cancel_task + _find_task_by_playlist_track
(manipulate batch active_count directly, deeply coupled to lifecycle)
Behavior parity:
- Same response shapes + status codes on each route
- Same call order (cancel_all → clear_all_completed → sweep)
- Same conditional sweep on clear_finished (skipped on failure)
- Same sweep ALWAYS runs after cancel_all even if clear_all returns False
(matches original — clear failure was non-fatal in cancel_all path)
- Same TERMINAL_STATUSES set: completed/failed/not_found/cancelled/skipped/
already_owned (lifted to module-level constant)
- Same empty-batch pruning + same batch_locks cleanup
- Same lock acquisition pattern (single tasks_lock)
Tests: 14 new under tests/downloads/test_downloads_cancel.py covering
single cancel, cancel-all happy + failure paths, clear-finished + sweep
gate, local task pruning across all 7 active/terminal states, batch
queue trimming, batch_locks cleanup.
Full suite: 921 passing (was 907). Ruff clean.
First sub-PR in the download orchestrator series. Strict 1:1 lift —
zero behavior change.
What moved:
- _record_sync_history_start → record_sync_history_start
- _record_sync_history_completion → record_sync_history_completion
- _detect_sync_source → detect_sync_source
- Source prefix map → module-level _SOURCE_PREFIX_MAP constant
What stayed:
- web_server.py keeps three thin wrappers (_detect_sync_source,
_record_sync_history_start, _record_sync_history_completion) that
delegate into core/downloads/history.py. ~60 callers of these names
in web_server.py keep resolving without touching every site.
Each lifted function takes `database` as an arg (was
`db = MusicDatabase()` inline). The wrappers construct
`MusicDatabase()` per call to mirror the exact original behavior —
each invocation got a fresh DB connection.
Behavior parity:
- Same SQL UPDATE statement (preserves the in-place update path when
a sync_history entry already exists for the playlist_id)
- Same JSON serialization with ensure_ascii=False
- Same thumb URL extraction order (album_context.images → image_url
→ first track album.images)
- Same per-track result shape (index, name, artist, album, image_url,
duration_ms, source_track_id, status, confidence, matched_track,
download_status)
- Same status mapping (found/not_found, completed/failed)
- Same best-effort exception swallowing (sync history failure must
never break the actual download)
- Reads `download_tasks` from core.runtime_state (already lifted by
kettui in PR378)
Tests: 34 new under tests/downloads/test_downloads_history.py
covering source detection (16 prefixes), start happy paths + thumb
extraction + duplicate-update + DB error swallowing, completion stats
+ per-track results JSON shape + edge cases.
Full suite: 907 passing (was 873). Ruff clean.
User report: SoulSync was only pulling MusicBrainz genres from the
recording (track-level) endpoint. Most MB recordings don't carry genres
at the track level — they live on the release (album) or artist. So
the MB tier was contributing nothing to the genre merge for the
overwhelming majority of tracks.
Fix:
- Added `'genres'` to the release-detail `includes` (was missing).
- After release-detail processing, if pp['mb_genres'] is still empty,
populate from release_detail['genres'] (sorted by count desc).
- If still empty AND artist_mbid is set, fetch artist with
`includes=['genres']` and use those.
No extra API call when the recording (or release) already had genres —
the artist fetch only fires when both upstream tiers came back empty.
The downstream genre merge in _embed_metadata_genres is unchanged; this
just makes the MB feed into it richer.
Tests: 4 new (recording present, recording empty → release, recording
+ release empty → artist, all empty → []). Full suite 873 passing.
Ruff clean.
Reported by @kcaoyef421 in Discord.
The endpoint was returning a 200-line literal dict inline. Moved the
three lists (TRIGGERS, ACTIONS, NOTIFICATIONS) to module-level constants
in core/automation/blocks.py. Route shrinks to 7 lines. Data is now
importable for tests + future docs.
Added 8 shape tests so a typo in the dict (missing 'type', wrong
field type, missing options on a select, etc.) gets caught by CI
instead of breaking the builder UI silently.
The `known_signals` field stays computed at request time via
_collect_known_signals(database) since it's dynamic.
No behavior change. Same response shape. 869 tests passing (was 861).
Ruff clean.
Routes moved to thin parse-args/jsonify handlers; logic now lives in
three focused modules under core/automation/. 436 lines deleted from
web_server.py; 53 added back as wrappers.
Module split:
- core/automation/api.py — CRUD + run + history helpers. Each function
takes (database, automation_engine, ...) explicitly and returns
(response_body, http_status). Includes signal cycle detection
preflight checks for create + update.
- core/automation/progress.py — owns the in-memory progress state dict
+ lock (mirroring the original web_server.py globals as module-level
shared state so all callers see one view), init/update/history
helpers, and the WebSocket emit loop.
- core/automation/signals.py — collect_known_signals for the builder
autocomplete.
Out of scope (deferred):
- _register_automation_handlers — the 23+ action handler closures stay
in web_server.py because each one is tightly coupled to feature-
specific implementations (wishlist, watchlist, library scan, etc.).
- Worker functions (_process_wishlist_automatically, etc.) — belong
with their feature lifts.
- _run_sync_task / _run_playlist_discovery_worker — sync + discovery
PRs.
Behavior preserved 1:1:
- Same route response shapes + status codes
- Same JSON field hydration (trigger_config, action_config,
notify_config, last_result, then_actions)
- Same backward-compat: empty then_actions + notify_type set →
synthesize then_actions from notify_type/notify_config
- Same signal cycle detection behavior on create + update
- Same system-automation protection on delete + duplicate
- Same reschedule/cancel logic on toggle + bulk-toggle + update
- Same progress state shape (status, progress, phase, current_item,
log capped at 50, started_at/finished_at, action_type)
- Same emit-on-finish socketio push from update_progress
- Same emit loop semantics (1s tick, snapshot active states, reap
finished after window)
Pre-existing bugs preserved (will fix in follow-up PRs):
- emit_progress_loop uses naive datetime.now() against tz-aware
started_at/finished_at, so the timeout-zombie check raises
TypeError → caught → never fires, and the cleanup-after-window
check raises → caught → state is reaped on FIRST tick regardless
of the window. Tests document this behavior so the next PR can
flip them to the corrected expectation.
Tests: 72 new under tests/automation/ (signals 10, progress 24,
api 38). Full suite: 861 passing (was 789). Ruff clean.
Three drifts caught in line-by-line review against the pre-lift
web_server.py. All addressed for strict 1:1 behavior parity.
1. /api/enhanced-search/source/<src> now returns plain JSON
`{"artists":[],"albums":[],"tracks":[],"available":false}` (or
`{"videos":[],"available":false}` for youtube_videos) when the
source's client isn't available, matching the original endpoint
contract. Previously streamed an NDJSON `{"type":"done"}` line
instead.
Restructured by splitting the orchestrator into resolve+stream
helpers:
- `resolve_client(source_name, deps)` — already existed, used
for /api/enhanced-search single-source mode
- `resolve_youtube_videos_client(deps)` — new, returns the
soulseek_client.youtube subclient or None
- `stream_metadata_source(source_name, query, client)` — pure
NDJSON generator, caller resolves client first
- `stream_youtube_videos(query, youtube_client, run_async)` —
same shape for the yt-dlp path
The route now decides plain-JSON-vs-stream based on resolution
result, mirroring the original control flow exactly.
2. core/search/library_check.py — reverted the defensive `(x or '')`
and `getattr(plex_client, 'server', None) is not None` patterns
to original byte-for-byte (`x.get('name', '')`,
`plex_client.server`, no try/except around `get_plex_config`).
Lift PR shouldn't change crash semantics; if the original raises
on malformed input, mine should too. Pre-existing edge cases get
their own follow-up PR.
3. core/search/stream.py — same revert: `soulseek_client.youtube`
instead of `getattr(..., 'youtube', None)` etc.
Also removed the module-level `EMPTY_SOURCE` from sources.py and
moved its (per-call) duplicate into _fan_out_response as a local —
the original used a per-request local dict and the identity-check
behavior depends on that. Module-level was a footgun for future
mutations.
789 tests still pass (95 search), ruff clean.
Line-by-line review of the search lift caught one drift: cache.get_cache_key
was coercing falsy provider returns ('', None, 0) to 'unknown' / False.
Original web_server.py only fell back to those sentinels on exception, not
on falsy success values.
Real-world impact: low — get_active_media_server() and get_primary_source()
return non-empty strings in practice. But cache keys are tuples with no
schema enforcement, so any drift here can silently fragment the cache.
Restored 1:1 parity with original semantics.
Added test covering the falsy-success path so this can't drift again.
789 tests pass, ruff clean.
Routes moved to thin parse-args/jsonify handlers; logic now lives in
six focused modules under core/search/. 720 lines deleted from
web_server.py; 109 added back as wrappers; ~700 lines of new core code
plus ~700 lines of tests.
Module split:
- core/search/cache.py — TTL+LRU cache for enhanced-search responses,
keyed by (query, active_server, fallback_source, hydrabase_active,
source_tag) so config changes don't poison stale entries.
- core/search/sources.py — per-kind metadata search (artists/albums/
tracks) and the multi-kind ThreadPoolExecutor that fans them out.
- core/search/library_check.py — library + wishlist presence check
with Plex thumb URL resolution; profile-aware wishlist with legacy
fallback for older DBs missing the profile_id column.
- core/search/stream.py — single-track preview search; effective stream
mode resolution, query-variant generation, retry walk, matching
engine integration.
- core/search/basic.py — flat Soulseek file search, quality-sorted.
- core/search/orchestrator.py — main enhanced-search dispatch
(short-query fast path, single-source bypass, hydrabase-primary fan
out, alternate source list builder), NDJSON streaming generator
for /source/<src>, and the SearchDeps dataclass that bundles the
cross-cutting deps.
Routes pass clients (spotify, hydrabase, hydrabase_worker, soulseek)
and helpers (config_manager, fix_artist_image_url,
_is_hydrabase_active, _get_metadata_fallback_*, _run_background_
comparison, run_async, dev_mode_enabled_provider) into core/search via
a SearchDeps bundle built per-request. fix_artist_image_url stays in
web_server.py because it touches 31 other call sites.
Behavior preserved 1:1:
- Same response shapes (db_artists, spotify_artists, spotify_albums,
spotify_tracks, primary_source, metadata_source, alternate_sources,
source_available)
- Same NDJSON line ordering (artists/albums/tracks as they finish, plus
done marker)
- Same per-kind exception swallowing
- Same hydrabase-worker mirror on dev mode
- Same cache key shape (5-tuple) and TTL/LRU semantics
- Same stream-track effective-mode resolution including the
Soulseek-coerce-to-YouTube edge case
- Same library-check Plex thumb URL rewriting and wishlist fallback
for older DBs
Tests: 94 new (cache TTL/LRU/key, sources happy/partial/all-fail,
library presence with library + wishlist + thumbs, stream effective
mode + query gen + retry, orchestrator client resolution + short
query + single source + fan-out alternates + hydrabase primary +
NDJSON drain). Full suite: 788 passing (was 694).
Ruff clean.
Stats route logic moves into core/stats/queries.py as pure-ish functions
that take dependencies (database, image-url fixer, listening worker) as
arguments. The 13 route handlers in web_server.py shrink to thin
parse-args / jsonify wrappers.
What moved to core/stats/queries.py:
- stats_cached: 3-key metadata cache lookup + image url fix-up
- stats_overview / timeline / genres / library_health / db_storage
- stats_top_artists / top_albums / top_tracks: top-N + DB enrichment
- stats_recent: listening_history readback
- stats_resolve_track: title+artist -> file_path lookup for playback
- listening_stats_sync: spawns daemon thread that runs worker._poll
- listening_stats_status: stats payload, with None-worker fallback shape
No behavior change. Same response shapes, same error handling, same
silent-except on per-row enrichment failure. fix_artist_image_url
stays in web_server.py and is passed through as a callback so we
don't have to lift its config_manager / media-server dependencies in
this PR.
Adds tests/stats/test_stats_queries.py — 27 tests covering happy
paths, edge cases, image-url plumbing, worker glue.
Ruff clean. 694 tests pass (was 667 + 27 new).
Lifted-then-not-deleted leftovers from the PR378 merge:
- web_server.py `_resolve_album_group` and `_build_final_path_for_track`
were already imported at module top from `core/imports/`. Removed the
shadowing local copies.
- Mutagen reimports (FLAC/MP4/OggVorbis) at L17736-17738 shadowed the
top-of-file imports. Picture/MP4Cover/MP4FreeForm were unused. Dropped
the whole block.
- core/imports/context.py: `getattr(artist, "name")` -> `artist.name`
(B009).
Ruff clean, 667 tests pass.
- Normalize album import track display handling so queue labels and match rows stay consistent
- Bound MusicBrainz caches and avoid caching transient lookup failures
- Stop swallowing programmer errors in source enrichment helpers
- Restore import config test seams without reintroducing lazy imports
- Guard task completion calls and fix the Windows path test expectation
- Keep file lock tracking from growing without bound
- Cover search_result fallback normalization and ambiguous album detection.
- Add staging metadata, multi-disc path, and MusicBrainz enrichment cases.
- Move the single-track context test next to the imports code it exercises.
- keep single-track import lookup in imports/resolution.py
- normalize simple-download search_result data before wishlist matching
- run wishlist cleanup for simple-download post-processing
- keep source-only artist detail on resolved names and MB short-circuit
- pass the selected manual match through singles import
- keep the import context source-aware so artist and album stay correct
- avoid treating non-Spotify IDs as wishlist Spotify IDs
- make wishlist logging and local variable names source-neutral
- Move the import pipeline runtime factory into core.imports.pipeline
- Move the metadata runtime factory into core.metadata.enrichment
- Keep the web server wiring thin and drop the shared glue module
- Add contract tests that keep the two runtime bundles separate
- Relocate the shared metadata helper module from core/metadata_common.py into core/metadata/common.py.
- Update the new metadata package, the import pipeline, and the web entrypoint to use the package-scoped helper.
- Keep the shared config, mutagen, file-lock, and tag-writing helpers centralized without touching unrelated files.
- Pass the live runtime bundle into the shared metadata facade so worker-backed source enrichment can actually run.
- Forward runtime from the import pipeline and web-server wrapper into embed_source_ids.
- Add a regression test that verifies the runtime object reaches the source-ID embedding path.
- Keep existing metadata_cache and metadata_service at the top level for now
- Move the new branch-local metadata helpers under core/metadata
- Share MusicBrainz release cache state from core.metadata.source and update import sites
- Move app-wide task and activity registries out of core/imports
- Share one runtime-state module across the web server, API, and import pipeline
- Keep import-specific helpers focused on context and post-processing
- Move import flow modules into a dedicated package
- Update app and test imports to the new namespace
- Group the import-focused tests under tests/imports