mirror of https://github.com/Nezreka/SoulSync.git
main
dev
video
fix/disable-beatport-features
johnbaumb-discover-redesign
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
2.4.0
2.4.1
2.4.2
2.5.0
2.5.1
2.5.2
2.5.3
2.5.4
2.5.5
2.5.6
2.5.7
2.5.9
2.6.0
2.6.1
2.6.2
2.6.3
2.6.4
2.6.5
2.6.6
2.6.7
2.6.8
2.6.9
2.7.0
2.7.1
2.7.2
2.7.3
2.7.4
2.7.5
2.7.6
2.7.7
2.7.8
2.7.9
2.8.0
v0.65
${ noResults }
1160 Commits (aa2806180e00a09e205f41d088b8f73ea9fe6ccb)
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
aa2806180e |
Fix: Soulseek album poll hangs on a stalled peer; failed batches never cleared
Two related bugs from the Slipknot album never finishing. 1) _poll_album_bundle_downloads hung when the peer stalled. The finish check needs every transfer terminal (completed/failed); the #715 grace only covers 'slskd says Completed but file not on disk'. A transfer stuck InProgress / Queued, or dropped by slskd, is none of those — so it blocked both the finish AND the grace exit, and the poll spun to the full ~6h timeout. Add a bundle-level stall guard: track a progress marker (#terminal transfers, total bytes across pending). If NOTHING advances for _stall_grace (180s) — no terminal transition AND no pending byte movement — the peer has stalled; mark the stuck transfers failed so the existing finish/all-failed checks resolve the bundle with whatever completed (missing tracks then fall back to the per-track matcher). Conservative: only trips when EVERYTHING is frozen, so a slow-but-progressing or still-queued transfer is unaffected. 2) Failed batches lingered in the UI forever ('No tracks loaded'). The auto-cleanup gate removed only complete/error/cancelled phases — 'failed' (e.g. an album-bundle hard failure) was missing, so it never aged out. Add 'failed' to the terminal set so it's removed after 5 minutes like the others. Tests (tests/test_soulseek_album_poll_stall.py): stalled peer → gives up with the completed subset (not the deadline); progressing bundle not falsely stalled; all-stalled → empty; dropped transfers stall out; clean finish unaffected. 124 download/soulseek tests pass; ruff clean. |
4 weeks ago |
|
|
cea897cbd1 |
Wishlist: import Optional (fix ruff F821 in processing.py)
make_wishlist_batch_row / _run_wishlist_cycle annotate params with Optional, but the typing import only had Any/Callable/Dict. Slipped past py_compile + tests because 'from __future__ import annotations' makes annotations strings (never evaluated at runtime), but ruff flags it statically (F821). |
4 weeks ago |
|
|
d3c897fb9d |
Wishlist: route the manual flow through the shared engine (manual == auto)
Stage 2: the manual 'Download Wishlist' flow now calls the same _run_wishlist_cycle engine the auto timer uses, so a manual scan runs the exact same code path as an auto scan. The old bespoke manual orchestration (build payloads + SERIAL inline dispatch) is deleted — its grouping/dispatch was a near-duplicate of auto's that had already drifted. Behavior changes (all intended, discussed): - Manual now dispatches album bundles in PARALLEL (album pool) like auto, instead of serially on one thread. A single cycle='albums' engine call covers the whole selection (albums bundled, singles/ungroupable -> per-track residual), so no 'both cycles' pass is needed. - The manual placeholder batch_id is reused as the engine's first sub-batch (first_batch_id), so the modal's existing poll target stays valid. - WishlistManualDownloadRuntime gains album_bundle_executor (wired in web_server, falls back to the shared pool when unset). - 'Don't start manual while auto is running' is unchanged — the existing route guard (is_wishlist_actually_processing -> 409) already covers it; no queue added. NOT touched: process_wishlist_automatically's behavior (proven by test_automation staying green in Stage 1) and the per-track download mechanics. test_manual_download.py rewritten to characterize the new behavior (engine dispatch via the executor, parallel, placeholder reuse, album-context). Full wishlist suite green (131); wishlist + automation = 392 passed. |
4 weeks ago |
|
|
db1e51109c |
Wishlist: extract shared _run_wishlist_cycle engine; auto delegates to it
Stage 1 of unifying the auto + manual wishlist flows. Extract the group -> per-album+residual batches -> register -> dispatch logic that lived inline in process_wishlist_automatically into a standalone _run_wishlist_cycle() engine (built on make_wishlist_batch_row). The auto path now just calls it. Per-flow differences are arguments (auto_initiated stamps the auto-only fields + selects auto vs manual naming/logging; first_batch_id lets a caller reuse a pre-created placeholder). Album batches dispatch to the dedicated album pool, residual to the shared pool (unchanged from #740). Auto behavior is PROVABLY unchanged: its full characterization suite (test_automation.py) stays green (10/10), and the whole wishlist suite passes (131). This commit does NOT touch the manual flow yet (Stage 2) and does not change what auto does — it only moves auto's logic behind a shared entrypoint the manual flow will call next. |
4 weeks ago |
|
|
e4b5cbbe60 |
Wishlist: unify batch-row construction into make_wishlist_batch_row
The auto and manual wishlist flows each built the same ~20-field
download_batches row in separate places (auto album, auto residual, manual
placeholder, manual sub-batches) — four near-identical literals that could (and
did) drift apart, producing subtly different batch shapes between the flows.
Extract make_wishlist_batch_row() as the single source of truth: it emits the
consistent core field set, with the genuinely per-flow differences as explicit
arguments — initial phase ('queued' for auto / 'analysis' for manual), the
auto-only auto_initiated/auto_processing_timestamp/current_cycle via
extra_fields, and album-vs-residual contexts. All four sites now go through it,
so every wishlist batch has an IDENTICAL shape (this also removes the field
drift that confused the modal-hydration code).
Deliberately NOT unified — and left explicit in each caller, per the
'don't cargo-cult genuinely-different code' principle: the grouping decision
(auto groups only on the albums cycle), batch-id allocation (manual reuses the
caller's placeholder id for the first sub-batch), and dispatch (auto
parallel-submits album batches to the dedicated pool + residual to the shared
pool; manual runs them serially on one thread). Those are real behavioral
differences, not duplication.
Behavior-preserving: verified safe to normalize the row shape (grep confirmed
every reader uses .get() with defaults, no key-presence checks). The existing
auto (test_automation.py) and manual (test_manual_download.py) characterization
suites stay green = differential proof of identical behavior. Adds
test_batch_factory.py (core fields, album/residual, extra_fields, no shared
mutable state, consistent key shape). 131 wishlist tests pass.
|
4 weeks ago |
|
|
0898014364 |
Fix #740: run wishlist album-bundle downloads on a dedicated pool
A 2.6.3 change (
|
4 weeks ago |
|
|
4fcc461616 |
Source IDs: add canonical registry, adopt at the highest-value sites
The same provider ID is stored under inconsistent column names across tables (deezer_id vs deezer_artist_id vs album_deezer_id vs similar_artist_deezer_id; spotify/itunes keep an entity qualifier, others don't; musicbrainz uses three nouns), so code checks 2-5 name variants everywhere. Add core/source_ids.py as the single source of truth for (provider, entity) -> column, with accessors that read an ID from a dict/sqlite3.Row robustly (canonical column first, then known aliases). NO database columns are renamed — these are the real names today; the registry just centralizes the knowledge. Targeted adoption (behavior-identical, verified): - core/artist_source_lookup.SOURCE_ID_FIELD now derives from the registry instead of duplicating the mapping (values unchanged). - web_server.py artist-detail builds artist_source_ids via source_id_map(...) instead of a hand-rolled per-source .get() dict. Broader call-site adoption deferred as clearly-scoped follow-up. Tests: tests/test_source_ids_registry.py (canonical columns, alias fallback, canonical-preferred, sqlite3.Row, source_id_map, SOURCE_ID_FIELD unchanged). Existing artist_source_lookup + artist_full_detail suites still green. |
4 weeks ago |
|
|
54d0fed345
|
Merge pull request #728 from IamGroot60/fix/usenet-album-progress-sab-fetch
Fix Usenet album bundle: stuck at 99% (SAB post-processing in History) + writable staging + client→local path resolution (#721) |
1 month ago |
|
|
560156abee |
Fix: import overwrites album-artist tag to "Unknown Artist" (#735)
Reported by CubeComming: importing media keeps the track artist correct
(e.g. Billie Eilish) but changes the album-artist tag ("Albuminterpret") to
"Unknown Artist", breaking grouping in the media server.
Cause: in extract_source_metadata (core/metadata/source.py), album_artist is
seeded from the resolved track artist, then overridden by the album CONTEXT's
first artist. When the album lookup comes back unresolved, that first artist is
the literal "Unknown Artist" placeholder — which is truthy, so it clobbered the
real artist.
Fix: treat "Unknown Artist" (and empty) as a non-value — only let the album
context override the album_artist when it names a real artist. A genuine album
artist (e.g. "Various Artists") still overrides as before.
Tests: tests/metadata/test_album_artist_unknown.py — placeholder doesn't
clobber, real album artist still used, no-album-context falls back to track
artist, empty doesn't clobber. (Pre-existing test_album_mbid_cache.py failures
are an unrelated sandbox DB disk-I/O issue.)
|
1 month ago |
|
|
0b325da3e9 |
Usenet bundle: writable staging dir + client→local path resolution (#721)
Follow-up to the poll fix, covering the two things that blocked a
successful end-to-end album import once the poll itself stopped
freezing:
1. Staging dir permissions
The album-bundle private staging path defaults to
'storage/album_bundle_staging' -> /app/storage, but /app/storage was
never created or chowned by the image (unlike /app/Staging,
/app/Transfer, etc.), and /app is root-owned. The copy failed with
"[Errno 13] Permission denied: 'storage'" under the non-root soulsync
UID. Added /app/storage to the Dockerfile build-time mkdir+chown and
the entrypoint PUID/PGID chown, exactly like the sibling runtime dirs.
2. Client->local path resolution
Usenet/torrent clients report save paths from inside THEIR OWN
container (e.g. SAB '/data/downloads/music/<album>'); SoulSync often
mounts the same files at a different point ('/app/downloads/<album>').
Feeding the client path straight to the audio walker yields
"No audio files found" though the files are physically present.
New resolve_reported_save_path():
a. use the reported path as-is if readable (mirrored mounts),
b. apply explicit download_source.usenet_path_mappings
({from,to}, Sonarr/Radarr-style) for non-shared layouts,
c. basename fallback under SoulSync's own download roots —
zero-config for the standard shared-volume arr setup.
Wired into both call sites in usenet.py AND torrent.py
(download_album_to_staging + _finalize_download), logging any
translation and including the resolved path in the no-audio error.
Tests: resolver verbatim / explicit-mapping / basename-fallback /
priority / not-found / empty / mapping-miss-then-basename. ruff +
compileall + pytest green (645 in the download suites).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
1 month ago |
|
|
b8384beef9 |
Fix: Usenet bundle stuck at 99%/100% — SAB reports post-processing in History as non-terminal (#721)
The earlier #721 fix tolerated a ~10s "completed but no save_path" window, but the real production stall sits upstream of that: SABnzbd removes a finished download from the queue and runs par2 verify / repair / unpack *in History*, exposing the live stage in the slot `status` ('Verifying' / 'Repairing' / 'Extracting' / 'Moving' / ...) with `storage` empty until the final move. `_parse_history_slot` mapped EVERY non-'Failed' status to 'completed', so a still-extracting 1.7 GB FLAC album looked "completed with no save_path" the instant download hit 100%. The poll burned its completed-no-path budget mid-PP and bailed, freezing the UI on the last download emit (the stuck-at-99%/100% signature). SAB then finished fine — which is why the job shows Completed in History but SoulSync never staged it. Root fix - `_parse_history_slot` routes `status` through `_map_state`, so PP stages stay NON-terminal: the poll keeps waiting (as 'downloading') for as long as post-processing takes and only a real 'Completed' flips to terminal success. `save_path` is trusted only on true completion (mid-PP path fields may point at the incomplete dir). Supporting / defensive - `UsenetStatus.incomplete_path`: surfaced separately from save_path (SAB `incomplete_path`) and used by the poll loops as a LAST RESORT after the completed-no-path window, to recover the case where `storage` never lands but the files are physically on disk. - `poll_album_download`: dedicated, configurable completed-no-path window (~120s via `download_source.album_bundle_completed_no_path_seconds`) decoupled from the ~10s transient-miss window; incomplete_path fallback; a 30s heartbeat log so the previously-silent poll loop is diagnosable. - `usenet.py` `_download_thread`: per-track parity — it was erroring immediately on the first completed-no-path read. - `album_bundle_dispatch.py` / `status.py` / `monitor.py`: use the project `get_logger` so download-flow logs land in app.log under the `soulsync.*` namespace (they were console-only before, which hid the `[Album Bundle] flow failed` line during triage). Tests - PP-history state mapping; end-to-end Hunky Dory PP regression (download -> Verifying/Extracting in History past both budgets -> Completed+storage -> success); completed-no-path window + incomplete_path fallback; per-track thread parity. ruff + compileall + pytest all green (the only local failures are environmental: missing tzdata + local tools/ffmpeg.exe, neither present on CI). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
1 month ago |
|
|
d5f6a14ba1 |
Discovery lift (10/N): save_*_bubble_snapshot -> shared helper
Final cluster: the four structurally-identical snapshot endpoints
(discover_downloads, artist_bubbles, search_bubbles, beatport_bubbles) ->
core.discovery.endpoints.save_bubble_snapshot(...), wired via
_save_source_bubble_snapshot. All four validate a payload key, persist via
db.save_bubble_snapshot(kind, items, profile_id=...), and return a count +
timestamp; they differ only by:
- payload_key ('downloads' for discover, 'bubbles' for the rest) + its
no_data_error message.
- snapshot_kind, success_noun, and the info/except log subject + noun
("downloads"/"artists"/"albums/tracks"/"charts").
get_database / get_current_profile_id injected; get_json (request.json) invoked
inside the try, preserving the original 400/500 behavior incl. traceback dump.
Tests: +5 (missing key 400, None body 400, happy path with kind/profile/count/
timestamp, discover_downloads variant, exception -> 500). Full discovery suite:
210 passed.
web_server.py: -98 lines.
|
1 month ago |
|
|
4caf36deb1 |
Discovery lift (9/N): update_*_playlist_phase -> shared helper
Ninth cluster: update_<source>_playlist_phase for the five sources sharing the
identical validation + full-message response (Tidal, Deezer, Qobuz,
Spotify-Public, YouTube) -> core.discovery.endpoints.update_playlist_phase(...),
wired via _update_source_playlist_phase + the _PHASE_LIST/_PHASE_LIST_YT
constants.
Per-source params:
- valid_phases — YouTube additionally allows 'parsed'.
- apply_extra_fields — Deezer/Qobuz/Spotify-Public also persist
download_process_id / converted_spotify_playlist_id from the body; Tidal and
YouTube do NOT, so they pass False (kept strictly 1:1 — the generic won't
apply those keys for them even if a caller sent them).
- not_found_message / error_label; get_json invoked inside the try.
NOT folded in: iTunes-Link — uses data.get('phase') (no "Phase not provided"
400) and returns a no-message payload.
Tests: +7 (404, missing-phase 400, invalid 400, happy path with extra-fields
suppressed, extra-fields applied when enabled, YouTube 'parsed' allowed,
exception -> 500). Full discovery suite: 205 passed.
web_server.py: -123 lines.
|
1 month ago |
|
|
50ebfbd82f |
Discovery lift (8/N): update_*_discovery_match -> shared helper
Eighth cluster, the heavyweights (~110 lines each). The fix-modal
update_<source>_discovery_match for the four sources with the identical
structure (Tidal, Deezer, Qobuz, Spotify-Public) ->
core.discovery.endpoints.update_discovery_match(...), wired via
_update_source_discovery_match. Applies the user-selected Spotify track to the
discovery result (status/artist/album/duration/spotify_data/match-count) and
writes the manual fix to the discovery cache.
Per-source pieces are params:
- source_log_label / error_label.
- original_track_key ('tidal_track' / 'deezer_track' / ...).
- original_artist_getter: Tidal handles string-or-object artists
(first_artist_str_or_obj); the rest assume strings (first_artist_plain).
- web_server helpers (join/extract artist, build_fix_modal_spotify_data,
cache-key, get_database, active-discovery-source) injected.
- get_json passed as a callable and invoked INSIDE the try, preserving the
original's "request.get_json() inside try" behavior (malformed body -> 500).
NOT folded in (genuinely divergent): iTunes-Link (saves spotify_data directly
via a different cache signature), YouTube (multi-key original_track fallback),
ListenBrainz (entirely different unmatch-capable structure, no cache write),
Beatport.
Tests: +9 (extractors; 400/404/400 guards; full happy path with result
mutation + duration formatting + match-count + cache-save args; no-increment
when already found; cache error swallowed; get_json raise -> 500). Full
discovery suite: 198 passed.
web_server.py: -400 lines.
|
1 month ago |
|
|
17c9e9b7b9 |
Discovery lift (7/N): start_*_sync -> shared helper
Seventh cluster: start_<source>_sync for the five sources with the identical
flow (Tidal, Deezer, Qobuz, Spotify-Public, YouTube) ->
core.discovery.endpoints.start_sync(...), wired via _start_source_sync.
Validates phase, converts discovery results, seeds sync state, posts a
"... Sync Started" activity item, and submits to the sync executor. Per-source
pieces are params:
- sync_id_prefix (f"{prefix}_{key}"), not_found/not_ready messages, convert_fn.
- name/image accessors: Tidal reads an object (playlist_name_obj/
playlist_image_obj), the rest a dict (playlist_name_strict/playlist_image_dict).
- activity_label vs error_label DIFFER for Spotify-Public ("Spotify Link
Sync Started" activity, "Spotify Public" logs).
- submit_sync_task glue (_submit_sync_task) closes over sync_executor /
_run_sync_task / get_current_profile_id so the helper stays global-free.
NOT folded in: iTunes-Link (no final info log), ListenBrainz (submits the
task WITHOUT a playlist_image_url arg), Beatport (extra debug logging, chart).
Tests: +6 (404, not-ready 400, no-matches 400, full happy path with
state/sync-infra/submit/activity assertions, resync phases allowed,
exception -> 500). Full discovery suite: 189 passed.
web_server.py: -172 lines.
|
1 month ago |
|
|
7b6615b65a |
Discovery lift (6/N): get_*_playlist_states -> shared helper
Sixth cluster: the bulk-hydration get_<source>_playlist_states endpoints for
the five sources that build the identical per-entry dict + {"states": [...]}
shape (Tidal, Deezer, Qobuz, Spotify-Public, iTunes-Link) ->
core.discovery.endpoints.get_playlist_states(states, *, error_label,
info_log_label=None), wired via _get_source_playlist_states.
iTunes-Link is the only one of the five without the "Returning N stored ..."
info log, so info_log_label is optional (iTunes passes None to suppress it).
NOT folded in: the YouTube/ListenBrainz get_all_*_playlists endpoints. They
return {"playlists": [...]} (different key) with a different field set
(url / created_at / playlist, no discovery_results) and filter out
mirrored_/profile-scoped entries — genuinely divergent, kept as-is.
Tests: +4 (list build + last_accessed bump + exact shape, empty, optional ids
default None, missing-required-field -> 500). Full discovery suite: 183 passed.
web_server.py: -116 lines.
|
1 month ago |
|
|
44b032b6c0 |
Discovery lift (5/N): reset_*_playlist -> shared helper
Fifth cluster: reset_<source>_playlist for the four sources with byte-
identical bodies (Tidal, Deezer, Qobuz, Spotify-Public) ->
core.discovery.endpoints.reset_playlist(states, key, *, label,
not_found_message), wired via _reset_source_playlist. Resets phase/status to
'fresh', clears discovery/sync fields, cancels any discovery_future, and
preserves the original playlist payload.
Left with their own bodies (genuinely divergent):
- YouTube: status -> 'parsed' (not 'fresh'), no download_process_id, logs the
playlist name, "reset to fresh state".
- ListenBrainz: status -> 'cached', logs playlist title, returns
{"success": True, "phase": "fresh"} (different payload), _lb_state_key.
- iTunes-Link: state.update(...), no info log, "iTunes Link reset to fresh
phase".
Tests: +4 (404, full clear + playlist preserved + future cancelled, no-future
path, exception -> 500). Full discovery suite: 179 passed.
web_server.py: -100 lines.
|
1 month ago |
|
|
8a9ed677ab |
Discovery lift (4/N): get_*_discovery_status -> shared helper
Fourth cluster: get_<source>_discovery_status (all eight sources, Beatport
included) -> core.discovery.endpoints.get_discovery_status(states, key, *,
not_found_message, error_label), wired via _get_source_discovery_status.
Unlike sync-status, the discovery-status response shape is byte-identical
across every source (phase/status/progress/spotify_matches/spotify_total/
results/complete), so Beatport folds in here too. Only the 404 string
("... discovery not found" vs "... playlist not found" vs "Beatport chart
not found") and the except-log label vary. ListenBrainz key via _lb_state_key.
NOT touched this cluster: get_*_playlist_state (the sibling endpoints).
Those genuinely diverge per source — different id-key name (playlist_id /
url_hash / playlist_mbid), presence of url / created_at / download_process_id,
Tidal's playlist.__dict__ serialization, and YouTube's strict (non-.get)
field access. Folding them would need a flag pile that wouldn't be a clean
1:1, so they keep their own bodies.
Tests: +4 (404, full response + last_accessed bump, complete=False when not
'discovered', missing-field -> 500). Full discovery suite: 175 passed.
web_server.py: -155 lines.
|
1 month ago |
|
|
aad1d2b8f0 |
Discovery lift (3/N): get_*_sync_status -> shared helper
Third cluster: the get_<source>_sync_status routes (Tidal, Deezer, Qobuz, Spotify-Public, iTunes-Link, YouTube, ListenBrainz) -> core.discovery. endpoints.get_sync_status(...), wired via _get_source_sync_status glue. This cluster carries the real per-source quirks, all captured 1:1 as params: - not_found_message (iTunes-Link uses "iTunes Link not found"). - error_label vs activity_subject — these DIFFER for Spotify-Public: the activity feed says "Spotify Link playlist ..." while the except log says "Error getting Spotify Public sync status". - playlist-name accessor, three styles lifted verbatim as named helpers: playlist_name_attr_or_unknown (Tidal: object .name), playlist_name_strict (Deezer/Qobuz/Spotify-Public/iTunes: state['playlist']['name'], can raise), playlist_name_safe (YouTube/ListenBrainz: .get default). The strict getter preserves the original's behavior of raising -> 500 AFTER phase/sync_progress were already mutated. - ListenBrainz key via _lb_state_key (caller-resolved). Beatport stays separate (different payload: status not sync_status, sync_id, no lock, chart key). Tests: +9 (3 name accessors incl. raise/fallback semantics; status 404s, running-no-mutation, finished+activity, error+revert+activity, and strict- getter-missing -> 500 after partial mutation). Full discovery suite: 171 passed. web_server.py: -244 lines. |
1 month ago |
|
|
2d76a7c061 |
Discovery lift (2/N): cancel_*_sync + delete_*_playlist -> shared helpers
Second cluster. Two more sets of byte-identical per-source bodies: cancel_<source>_sync (Tidal, Deezer, Qobuz, Spotify-Public, iTunes-Link, YouTube, ListenBrainz) -> core.discovery.endpoints.cancel_sync(states, key, *, label, not_found_message, sync_lock, sync_states, active_sync_workers). Returns (payload, status_code); a thin web_server glue (_cancel_source_sync) wires the sync-infra globals + jsonify. Caller passes the resolved key (ListenBrainz transforms via _lb_state_key) and the exact 404 string (iTunes-Link uses "iTunes Link not found"). delete_<source>_playlist (Tidal, Deezer, Qobuz, Spotify-Public) -> delete_playlist_state(states, key, *, label, not_found_message), wired via _delete_source_playlist. Intentionally left with their own bodies (genuinely divergent, not 1:1): - Beatport cancel (cancels a stored sync_future, no message, warning log). - iTunes-Link / YouTube / ListenBrainz / Beatport deletes (different success messages, info-log wording, playlist-name extraction, /remove route, chart key). Tests: +11 in tests/discovery/test_discovery_endpoints.py covering cancel (404, active-worker cancel + state revert, worker-absent, no-sync-in-progress, label in message, exception->500) and delete (404, future cancel + removal, no/falsy future, exception->500 leaves state). Full discovery suite: 162 passed. web_server.py: -216 lines. |
1 month ago |
|
|
628395eda5 |
Discovery lift (1/N): convert_*_results_to_spotify_tracks -> shared helper
First cluster of the per-source playlist-discovery deduplication. The convert_<source>_results_to_spotify_tracks functions (Tidal, Deezer, Qobuz, Spotify-Public, YouTube, ListenBrainz) plus the already-generic _convert_link_results_to_spotify_tracks were byte-identical apart from the source label used in their log line. Lift the shared body into core/discovery/endpoints.py as convert_results_to_spotify_tracks(results, source_label); the 7 web_server functions become 1-line delegations (names/signatures unchanged, so all callers and behavior are identical — 1:1). Beatport is intentionally NOT folded in: its converter coerces artist objects to strings and emits a different track shape (source field, album dict), so it keeps its own implementation. Tests: tests/discovery/test_discovery_endpoints.py (12) pin both input shapes (manual spotify_data / auto spotify_track+found), optional track/disc numbers, falsy-0 omission, field defaults, skip-on-neither, order preservation, if/elif precedence, empty input. web_server.py: -209 lines. Full discovery suite: 151 passed. |
1 month ago |
|
|
abdea631a7 |
HiFi/MB cover art: use CAA 1200px thumbnail, not the flaky /front original
Follow-up to the album-art resolution fix. That change upgraded MusicBrainz
Cover Art Archive thumbnails (/front-250) to the bare /front original — but
/front redirects to archive.org, which is unreliable: probing release-group
covers showed intermittent HTTP 500s (same URL 500s one second, serves the
next) and multi-MB originals (2.9 MB seen). The result was the user-reported
flakiness: cover art that "sometimes works, sometimes shows nothing", and a
huge image embedded into every track when it did work.
The sized thumbnails (/front-250, -500, -1200) are served by CAA's own CDN,
not the archive.org redirect — which is why /front-250 (240p) was always
reliable. Upgrade to /front-1200 instead: 1200x1200 is a massive jump from
240p, reliably CDN-served, and a sane ~40 KB instead of multi-MB.
Applied in all three CAA spots for consistency: the _upgrade_art_url helper
(embed + cover.jpg paths) and both prefer_caa ("CCA") blocks, which fetched
the bare /front directly with no fallback — so CCA-on users hit the same
flakiness. _fetch_art_bytes still falls back to the original /front-250 if
/front-1200 is ever refused.
Tests updated to assert the 1200px target, idempotency, and that the bare
/front original is intentionally left untouched.
|
1 month ago |
|
|
c9ad4f496f |
Embed highest-resolution album art across all art paths
User report: embedded album art came out ~600x600 while the cover.jpg in
the folder was high-res. The cover.jpg path upgraded the source CDN URL
to its highest resolution, but the tag-embed path fetched the raw URL —
so iTunes art embedded at its 600x600 default, Spotify at 640, Deezer at
1000. The "Write Tags to File" retag path had the same gap (Deezer-only
upgrade), and MusicBrainz art was worse still: every Cover Art Archive
URL is built as the /front-250 thumbnail, so MB-sourced downloads
embedded 250x250.
Factor the resolution upgrade + fetch into two shared helpers in
core/metadata/artwork.py and route every art path through them:
_upgrade_art_url(url) — bump to the source's highest resolution:
- Spotify (i.scdn.co) -> original master (~2000px+)
- iTunes (mzstatic.com) -> 3000x3000
- Deezer (dzcdn) -> 1900x1900
- Cover Art Archive -> /front original (was /front-250)
_fetch_art_bytes(url) — upgrade, fetch, and fall back once to the
original size if the CDN refuses the larger one (non-regressive).
Now consistent across: embed-into-tags (post-process), folder cover.jpg
(post-process), and the enhanced-library "Write Tags to File" retag flow.
The YouTube path already upgraded via Album.from_spotify_album, unchanged.
De-duplicates the per-source upgrade code that was copied across sites
and drops the now-unused urllib import from tag_writer.
Not covered (follow-up): Last.fm / Amazon / Tidal / Qobuz have no
explicit upgrade yet — some already serve full-res, others may hand over
a capped size that passes through unchanged.
Tests: new tests/metadata/test_artwork_resolution.py pins every upgrade
(Spotify 300/640->master, iTunes 100/600->3000, Deezer->1900, CAA
thumbnail->original, unrecognized/empty unchanged) and the fetch
fallback. Updated the two tag_writer fallback tests to patch the network
at its new home in artwork.
|
1 month ago |
|
|
b14d504cc1 |
Fix: MusicBrainz artist discography capped at 25 releases
Artist-detail discography from MusicBrainz fetched releases via the artist lookup (`/artist/<mbid>?inc=release-groups`), which MusicBrainz hard-caps at 25 embedded release-groups and which ignores the `limit` param entirely. Prolific artists had ~85% of their catalogue silently dropped — Kendrick Lamar has 167 release-groups on the site but only the first 25 ever reached SoulSync. Reported by Sokhi: "a lot of albums are missing when searching vs what's showing on the site." Switch `get_artist_albums` to walk the paginated browse endpoint (`/release-group?artist=<mbid>`, offset loop) — the same pattern the basic-search path already uses — fetching the full catalogue up to the caller's limit. No type filter and no studio-only filter here: the artist-detail page wants every primary/secondary type so its tabs mirror musicbrainz.org. Verified live: now returns all 167 for Kendrick. Adds 7 tests covering pagination past the cap, offset advance, short-page stop, limit cap, cross-page dedup, type->bucket mapping, and a regression pin asserting the capped inc=release-groups lookup is no longer the discography source. |
1 month ago |
|
|
7145368d42 |
Basic search: visual overhaul + per-source picker in hybrid mode
Two things in this commit. Functional download / matched-download behaviour is untouched — same JS handlers, same routes for the download actions, same album-expand interaction. VISUAL REDESIGN - Glass search-bar card with accent radial wash + focus ring + pill primary search button - Source chip row above the search bar (see below) - Always-visible compact filter pill row (Type / Format / Sort) — pills carry both ``bs-filter-pill`` (new visual) and ``filter-btn`` (legacy class for ``resetFilters`` + ``applyFiltersAndSort`` in wishlist-tools.js to keep working) - Accent-tinted status pill matching the dashboard / auto-sync look - Album result cards: glass card with accent left-edge stripe, 52px brand-tinted cover icon, chevron expand indicator, pill action buttons (Download / Matched Album), accent glow on hover - Track result cards: glass row with accent stripe, 44px icon, pill action buttons (Stream / Download / Matched Download) - Multi-disc separators inside expanded album track lists styled with the accent treatment - Responsive: action button columns stack vertically below 900px New CSS lives in a self-contained ``webui/static/basic-search-v2.css`` sheet linked from index.html. Selectors are scoped to ``#basic-search-section`` for any class that already exists in style.css (``.album-result-card``, ``.album-icon``, ``.track-*``, etc.); the new ``bs-*`` prefixed classes for the search bar / filters / source row / status are unscoped because they only exist in the new markup. ``!important`` is used on the card-level rules to defeat the original unscoped ``.album-result-card`` etc. rules in style.css that would otherwise leak heavyweight padding / box-shadow / 56px icon styles into the new design. Also removed ``overflow: hidden`` from the original ``.album-result-card`` and ``.track-result-card`` rules in style.css — those two classes only render in ``downloads.js`` basic search results (verified via grep, two render sites only), so the removal can't impact any other UI. SOURCE PICKER (hybrid mode) - New ``GET /api/search/sources`` endpoint returns the list of active sources from the orchestrator's chain (or the single active source in single-source mode). - Frontend renders a chip row above the search bar. Click a chip to target that source for the next search; the chip's brand accent fills. - In single-source mode the lone chip is rendered as a dashed- border label so the user always knows what they're searching but can't accidentally try to switch to sources that aren't configured. - ``/api/search`` accepts an optional ``source`` body param. When set, ``core/search/basic.py:run_basic_search`` resolves the client directly via ``orchestrator.client(source)`` and calls its ``.search()`` instead of going through the hybrid chain. - Backwards compatible: omitting ``source`` falls through to the original ``orchestrator.search()`` call exactly as before. Unknown source names also fall back to the default — typo protection. TESTS (5 new + 6 pre-existing = 11 total in test_search_basic.py) - source param routes to specific client, NOT orchestrator chain - no source param preserves original orchestrator-default behaviour - unknown source name falls back to orchestrator default - ``run_basic_soulseek_search`` backwards-compat alias preserved - source-targeted path serialises albums + tracks correctly 101 search-suite tests pass. |
1 month ago |
|
|
258905ff5c |
Fix: duplicate tracks in albums with Japanese / CJK titles (#722)
Reporter @Sokhii: downloading the Mushoku Tensei Original
Soundtrack II via Apple Music metadata + Tidal download
produced duplicate library entries — same audio file landed
under multiple track positions in the album view.
Root cause (verified by direct probe + isolated repro):
``MusicMatchingEngine.normalize_string`` correctly skipped
unidecode for CJK text (kanji→pinyin would have produced
gibberish — see the inline comment at line 74-76), but then
ran ``re.sub(r'[^a-z0-9\s$]', '', text)`` which stripped EVERY
CJK character. Every Japanese title normalised to ``''``.
``similarity_score`` has an early-out guard
if not str1 or not str2: return 0.0
so EVERY CJK-vs-CJK title comparison returned 0.000.
Downstream effect: the matcher fell back to duration+artist
alone. For an OST album with 24 tracks all by the same artist
with similar durations, multiple iTunes track queries landed
on the SAME Tidal candidate. SoulSync wrote each download to
a different output filename (per the iTunes track position),
so on disk there were N copies of the same audio under
different track numbers. The user's library showed 34 entries
for an album with 24 actual tracks.
Probed iTunes album 1753240110 directly — 24 distinct tracks,
zero (disc, track_number) collisions, both US + JP storefronts.
So the duplicate origin was definitely downstream of metadata
fetch.
Fix: when CJK is detected upstream, the alphanumeric-strip step
also preserves CJK Unified Ideographs + radicals
(⺀-鿿), Hiragana + Katakana (-ヿ), Halfwidth
/ Fullwidth forms (-), and Hangul syllables
(가-). CJK titles now produce a comparable normalised
form instead of an empty string. ``similarity_score`` works as
intended:
'命の灯火' vs '命の灯火' → 1.000 (was 0.000)
'命の灯火' vs '無職転生' → 0.000 (was 0.000, but now from
actual char comparison
not from the empty-string
guard)
Latin-only normalisation is completely unchanged. ``has_cjk``
is False for Latin input, so both the CJK-lowercase branch AND
the new CJK-preserve strip branch are skipped — Latin titles
go through the original unidecode + lowercase + strip path
verbatim. Tested via 4 regression tests that pin the Latin
baseline (simple, unidecode target, $-preservation, identical
+ different similarity scores).
16 new unit tests in ``tests/test_matching_engine_cjk.py``:
- Kanji / Hiragana / Katakana / Hangul / Chinese all survive
- CJK-only strip still removes Latin punctuation in the
CJK branch
- Mixed Latin + CJK lowercases the Latin half
- Identical CJK titles → 1.0
- Disjoint CJK titles → near 0
- Partially overlapping CJK titles → midrange
- CJK doesn't falsely match unrelated Latin
- 4 Latin-baseline regression pins
- Real-world Mushoku Tensei OST scenario
371 text + imports + new CJK tests pass after the fix.
|
1 month ago |
|
|
df675c7c9f |
Fix: Usenet bundle stuck on "downloading release" when SAB History flips before storage lands (#721)
Follow-up to the 2.6.3 queue→history handoff fix (#706). User @IamGroot60 reported in #721 that on 2.6.3 the bundle still gets stuck mid-flight: SoulSync UI sits on "Usenet downloading release 61%" forever, SAB History shows the job as Completed 2+ minutes ago, files are physically present in the slskd downloads folder but never copied into ``storage/album_bundle_staging/<batch>/``. Root cause: a second-stage gap in the SAB pipeline. SAB flips a job's ``status`` to ``Completed`` in History as soon as par2 + unrar finish, but its post-processing pipeline writes the final ``storage`` field a few seconds LATER (the move-to-final step). ``poll_album_download`` saw the first ``Completed`` read with ``save_path=None`` and bailed: if status.state in complete_states: return last_save_path # ← None at this point ``download_album_to_staging`` got ``save_path=None``, set ``result['error']`` and returned. The bundle was marked failed but the LAST progress emit before the failure was ``downloading progress=0.61``, so the UI froze on "61%" — the terminal ``failed`` emit never registered on the user's screen because the renderer holds the last-known progress. Fix - ``poll_album_download`` now tracks a separate transient counter for "complete state seen, save_path not yet set." Up to ``transient_miss_threshold`` (default 5) consecutive reads in that state are tolerated before the poll bails. SAB writes the ``storage`` field within 2-10 seconds of the History flip in practice — the default 5 × 2s = 10s window covers it. - When save_path eventually lands, return it normally. - When the threshold is exhausted with save_path still empty, emit terminal ``failed`` with an explicit message pointing at the missing save_path field — no more 6-hour silent spin. - Earlier ``downloading`` reads with a non-empty ``save_path`` (qBit / Transmission set this from the start of the download) remain "sticky" — if the eventual ``completed`` read has empty save_path, the cached one applies. So torrent flows aren't affected by the retry path. SAB adapter (``_parse_history_slot``) - Widened the save_path field fallback chain: storage → path → download_path → dirname → incomplete_path Covers SAB version differences (older builds populated ``path``) and forks that expose ``download_path`` or ``dirname``. ``incomplete_path`` is the last resort — SAB's in-progress dir before the final move — so the bundle plugin at least has a path to scan when nothing else lands. - Whitespace-only values are skipped. - Loud debug log when none of the known fields land — users on SAB versions / forks with novel field names need to see this in logs so we can grow ``_HISTORY_SAVE_PATH_KEYS``. Tests - ``test_album_bundle.py`` (3 new): - tolerates_completed_with_late_save_path_arrival — the #721 scenario; first Completed read has no save_path, third has it; poll returns the path normally - gives_up_when_completed_with_no_save_path_persists — past the threshold the poll fails loudly instead of silent-spinning - uses_save_path_from_earlier_downloading_emit_if_completed_lacks_one — sticky save_path keeps torrent flows working - ``test_usenet_client_adapters.py`` (6 new): - falls back to ``path`` when ``storage`` empty - falls back to ``download_path`` - prefers ``storage`` when multiple fields present - returns ``None`` when all fields empty (the #721 gap window) - ignores whitespace-only values - uses ``incomplete_path`` as last resort 132 album-bundle + usenet tests pass. Branch is on dev parented at 2.6.3 — user @IamGroot60 offered to test on dev, so this is a candidate cherry-pick for either a 2.6.4 hotfix or merge straight into dev for the next release. |
1 month ago |
|
|
5771c5ba77 |
Album-bundle staging: clean Soulseek copies + sweep orphans at startup
Two related leaks in ``storage/album_bundle_staging/<batch_id>/``:
1. **Soulseek bundle cleanup was excluded.** The per-batch cleanup
at the end of a bundle download gated on:
(album_bundle_source or '').lower() in ('torrent', 'usenet')
The comment justified it as "slskd keeps its own completed
folders" — but the Soulseek bundle path ALSO copies completed
files into the private staging dir (``soulseek_client.py:1599``,
``copy_audio_files_atomically(completed, Path(staging_dir))``)
for the per-track workers to claim. Those copies persisted
forever; long-running installs accumulated stale GB. Extended
the cleanup gate's allow-list to include ``soulseek`` so the
per-batch dir is removed on bundle completion — same code path
that already worked for torrent / usenet.
2. **No sweep for orphan dirs.** Any leftover ``<batch_id>``
subdir from a previous-session crash, an errored batch, or a
pre-fix Soulseek bundle stayed on disk forever. Added
``sweep_orphan_album_bundle_staging(staging_root, active_batch_ids)``
that runs ONCE at server startup, before any batch can register
a staging dir. Removes every ``<batch_id>``-shaped subdir
whose id isn't in the active set. Safe by construction:
- Only touches subdirs of the configured staging root.
- Name-shape check (``entry.name == _safe_batch_dirname(entry.name)``)
rejects hand-placed dirs like ``.git`` or stray docs.
- ``shutil.rmtree`` errors log + continue — sweep must not
crash app startup over a permission glitch.
- active_batch_ids normalised through ``_safe_batch_dirname``
so colon-bearing batch_ids match their on-disk form.
Wired into the web_server startup right after the stuck-flags
diagnostic so it fires before anything else touches batches.
Tests
- ``test_downloads_lifecycle.py`` gained one regression test
pinning that Soulseek bundles now have their staging dir
cleaned (sibling to the existing torrent test).
- ``test_album_bundle_staging_sweep.py`` (NEW, 11 tests)
covers: orphan removal with no actives, active dirs preserved,
special-char batch_id normalisation, no-op on missing /empty
/empty-string staging root, non-dir entries skipped, unsafe-
name dirs preserved (.git etc.), partial rmtree failure doesn't
abort the rest, listdir failure returns 0 cleanly, default
None active set, defensive against empty / None entries in
the active set.
488 downloads tests pass.
For users with an existing "clean up old files" automation pointed
at this dir: stop pointing it there if you want — the auto-cleanup
+ startup sweep cover it now. Or leave it as belt-and-suspenders
with a relaxed (1h+) mtime threshold so it can't race a mid-batch
download.
|
1 month ago |
|
|
f976a6da53 |
Fix: Soulseek album-bundle downloads stuck on "failed" after slskd
finished the release (#715) Symptom (user @pavelcreates / @IamGroot60 on 2.6.2): - Click Download on an album in the search modal - slskd starts + completes every track of the release - 22+ minutes after the last completed download, batch flips to "failed" with no clear log line explaining why - Per-track Soulseek downloads on the same machine were fine Root cause: ``core/soulseek_client._resolve_downloaded_album_file`` probed three hard-coded candidate paths to locate each downloaded file in the slskd download dir: candidates = [ download_path / remote_filename, download_path / basename, download_path / *normalized_path_parts, ] On the common slskd config ``directories.downloads.username = true`` slskd writes files at ``<download_dir>/<username>/<filename>`` — none of the three candidates carry a username segment, so the resolver returned None for every file even though the file was physically present in a subdir one level deeper. ``_poll_album _bundle_downloads`` saw 0 completed_paths, kept spinning, and hit the master deadline (~30 min) before bailing the batch. Why per-track worked: ``web_server._find_completed_file_robust`` already does a recursive walk-by-basename + path-confirm against the remote directory components, so any layout slskd writes ends up resolved. The bundle path didn't go through it. Fix - Lifted the robust finder into ``core/downloads/file_finder.py`` as a pure function ``find_completed_audio_file(download_dir, api_filename, transfer_dir=None) -> (path, location)``. Zero globals; recursive walk; handles slskd dedup suffix ``_<10+digit-timestamp>``, YouTube / Tidal ``id||title`` encoded filenames, the AcoustID-quarantine subdir skip, basename collisions disambiguated by remote-path components, and a fuzzy-basename fallback above 0.85. - ``_resolve_downloaded_album_file`` keeps the three-candidate fast path (cheap probe for the slskd-flat default) but now delegates to the new helper when none hit, instead of giving up. - ``_poll_album_bundle_downloads`` tracks "slskd reports Completed but local resolver returns None" per key. When every remaining key has been in that state past a 45-second grace window, the poll exits early with an explicit error pointing at the likely ``soulseek.download_path`` mismatch instead of silently spinning until the master deadline. - ``web_server._find_completed_file_robust`` becomes a thin delegate so both callers share one finder. Legacy inline impl kept as ``_find_completed_file_robust_legacy`` for reference; to be removed next release. - Fixed misleading ``"(0 tracks, quality=)"`` log on the preflight- reuse path — was reading attrs off a None ``picked`` object. Tests (17 new in tests/downloads/test_file_finder.py) - Flat slskd layout - Username-prefixed (the #715 case) - Full remote tree preserved - Deeply nested username + tree - File genuinely missing returns None - Basename collision disambiguated by remote dirs - Single basename match wins regardless of dirs - slskd dedup suffix match - Short ``_<digits>`` (year) not treated as dedup - AcoustID quarantine subdir skipped - YouTube / Tidal ``id||title`` encoded filenames - transfer_dir fallback - Both dirs miss → (None, None) - Non-audio files ignored - Empty api_filename - Fuzzy match on punctuation variant - Fuzzy rejects below threshold 475 downloads tests pass after the lift. |
1 month ago |
|
|
01a867e589 |
Auto-Sync: fix LB pipelines stuck on "Refreshing:" for 5+ minutes
Pipeline-driven Auto-Sync runs against any ListenBrainz playlist
(Weekly Jams, Weekly Exploration, Top Discoveries, etc.) would sit
on ``Refreshing: "<name>"`` with no UI updates for 5-7 minutes
before the pipeline progressed. Two real bugs stacked:
1. **Double discovery.** The refresh handler called
``_maybe_discover`` (matching engine, per-track Spotify/iTunes/
Deezer matches) inline for any source returning
``needs_discovery=True`` tracks. Phase 2 of the pipeline then
ran the SAME matching engine via ``run_playlist_discovery_worker``
on the same tracks. The refresh-side run blocked the loop with
zero progress emission; Phase 2's already has the timed
progress-poll pattern. So LB tracks discovered twice, the first
time silently.
Pipeline now sets ``skip_discovery=True`` on its refresh config.
The handler honors the flag and lets Phase 2 handle discovery
end-to-end. Standalone callers (Sync-page tab, registration
action) leave the flag unset so they still get matched_data
on refresh.
2. **No targeted LB refresh.** The LB adapter's ``refresh_playlist``
called ``manager.update_all_playlists()`` — the only refresh
entry-point the manager exposed — which re-pulls every cached
LB playlist's details from the API (~12+ round-trips) even
when only one playlist needed refreshing. Wasteful;
tax-on-everyone for one-playlist work.
Added ``LBManager.refresh_playlist(mbid)`` — reads the cached
playlist_type, fetches just that playlist's details, runs the
normal ``_update_playlist`` upsert path. Defaults type to
``user`` for un-cached mbids so new-playlist discovery still
works. Skips ``_cleanup_old_playlists`` and
``_ensure_rolling_mirrors_from_cache`` (wasted work for a
single-playlist refresh).
Also: killed a silent ``except Exception: pass`` in the LB
adapter's old refresh wrapper that was masking every LB API
failure as a stale-cache hit. Refresh errors now log with full
traceback at warning level and propagate ``None`` so the outer
handler at ``refresh_mirrored.py:104`` counts the error and
surfaces it to the run-history error tally.
Pinned with 12 new unit tests across:
- ``tests/test_listenbrainz_manager.py`` (8): targeted refresh
happy path, unauthenticated guard, empty-mbid guard, upstream
``None`` return, default playlist_type for unknown mbid,
exception propagation, cost guard skipping cleanup, skipped-
when-unchanged signal
- ``tests/test_playlist_sources_adapters.py`` (3): adapter uses
targeted call (not legacy), adapter returns ``None`` on manager
error (not silent swallow), adapter resolves synthetic series
ids before calling the manager
- ``tests/automation/test_handlers_playlist.py`` (1):
skip_discovery flag bypasses ``_maybe_discover`` end-to-end
|
1 month ago |
|
|
45ecf2730d |
Wishlist: harden Spotify backfill — poisoned tn=1 can't mask lean album
Residual per-track wishlist downloads (single tracks from different
albums, below the album-bundle threshold) were producing folders
without a year subfolder whenever the wishlist row carried a stale
``track_number=1`` from an older payload default.
Why: ``core/downloads/candidates.py`` had a single API-fetch branch
that served two concerns — resolving the track position AND
hydrating the lean ``spotify_album_context`` (release_date /
total_tracks / cover image) — gated entirely on track_number being
unresolved. When the wishlist row's ``track_number`` happened to
be 1 (a poisoned default rather than a real value), the gate
short-circuited and the album hydration the same call would have
done was skipped. Deezer-sourced discovery matches don't ship
release_date in their search-result album shape, so without the
backfill the folder lost its year.
The two concerns split:
- track_number resolution keeps its track_info → track object →
API precedence chain. track_info defaults still win.
- album hydration runs whenever release_date or total_tracks are
missing, independent of where (or whether) track_number was
resolved.
The single API round-trip still serves both — the cost contract
is preserved. The side-effect coupling is gone.
Lifted into ``core/downloads/track_metadata_backfill.py``
(``hydrate_download_metadata``) so the precedence chain is pinned
in isolation. 24 unit tests cover the precedence chain, the
poisoned-tn=1 regression case, defensive non-dict/None inputs,
the cost guard (API called at most once per invocation), and
disc_number resolution.
Also lands the upstream piece: ``core/wishlist/routes.py:_build_track_data``
no longer defaults ``track_number=1`` / ``disc_number=1`` /
``total_tracks=1`` / ``release_date=''`` when the library-modal add
payload omits them. Missing values now flow through as ``None`` so
the downstream pipeline can detect-and-recover instead of locking
to a fake position.
|
1 month ago |
|
|
997732ee63 |
Wishlist: fix three regressions causing all imports to land as track 01 with no year
Real-world regression triggered by the album-bundle work earlier in
2.6.3. Tracks with full Spotify metadata were importing as
``01 - <title>`` under ``Artist - Album/`` (no year), even when the
source filename carried the correct track number and Spotify's
release_date was available.
Investigation via DB inspection of stored wishlist rows:
```
"Never Gonna Give You Up" → track_number=None, release_date=""
"idfc" → track_number=1, release_date=""
"No Sleep Till Brooklyn" → track_number=1, release_date=""
```
Source-of-truth Spotify metadata had release_date AND real track
positions, but the wishlist row was poisoned. Three regressions
compounded the loss:
**Fix A — ``track_object_to_dict`` (``core/wishlist/payloads.py:295``)
preserved only album.name during Track→dict conversion.**
Pre-fix:
```python
album_name = "Unknown Album"
if hasattr(track_object, "album") and track_object.album:
if hasattr(track_object.album, "name"):
album_name = track_object.album.name
else:
album_name = str(track_object.album)
result = {
...
"album": {"name": album_name}, # ← release_date / images / etc. all dropped
...
}
```
When a wishlist payload arrived as a Track dataclass instead of a
raw spotify_data dict, the Track→dict conversion stripped
release_date, images, album_type, total_tracks, id, and album-level
artists. Every wishlist row added through this path landed in the
DB with ``album={'name': X}`` only.
Post-fix: three branches handle the three album shapes
- ``album_attr`` is a dict → ``dict(album_attr)`` preserves every key
- ``album_attr`` is a sub-object → pull all common Album-dataclass
attrs (id, release_date, album_type, total_tracks, images, ...)
- ``album_attr`` is a bare string → build a dict from the track
object's adjacent attrs (release_date, album_id, album_type, ...)
and surface ``image_url`` as ``album.images``
**Fix B — ``core/discovery/playlist.py:309`` only added
``track_number`` / ``disc_number`` keys when truthy.**
Pre-fix:
```python
matched_data = { 'id': ..., 'name': ..., ... } # no track_number / disc_number
if track_number:
matched_data['track_number'] = track_number
if disc_number:
matched_data['disc_number'] = disc_number
```
Deezer-sourced matches always hit this branch with ``track_number=None``
because the cache enrichment at line 304 reads ``_raw.get('track_number')``
literally, but Deezer's raw shape uses ``track_position``. So the key
was omitted from ``matched_data``, downstream consumers couldn't
distinguish "missing key" from "value is 1", and the chain silently
filled 1.
Post-fix: keys are ALWAYS present (None when unknown). Also adds a
``best_match.track_number`` fallback so the Track-dataclass-mapped
value (which DOES include ``track_position``→``track_number``
mapping) gets used when the cache lookup misses.
**Fix C — Pipeline only consulted ``album_info.track_number`` before
falling to the filename (``core/imports/pipeline.py:645``).**
VA-collection source files like ``417 Fountains of Wayne - Stacys
Mom.flac`` have a leading playlist-position number that isn't the
album track number. The previous chain (album_info → filename →
floor-1) couldn't recover the real position because the filename
extractor either returned 417 (wrong) or None (caught by the floor).
But the wishlist payload's ``track_info.spotify_data.track_number``
HAD the right answer all along — Spotify says Stacy's Mom is track
3 on Welcome Interstate Managers.
Post-fix: resolution chain extracted into ``core/imports/track_number.py:resolve_track_number``
as a pure function:
1. ``album_info.track_number`` (album-bundle dispatch authoritative)
2. ``track_info.track_number`` (per-track flow payload)
3. ``track_info.spotify_data.track_number`` (nested fallback)
4. ``extract_explicit_track_number(file_path)`` (filename, returns
0 when no numeric prefix — vs the default helper that returns 1)
5. Caller (pipeline) applies the final >=1 floor
Each step coerces to a positive int or falls through to the next.
Pure function = unit-testable in isolation = single place to fix
the rule.
**Test coverage (37 new tests):**
- ``tests/wishlist/test_payloads.py`` (+4) — Track→dict conversion
preserves full album dict (dict / object / string album shapes) +
None-track-number stays None.
- ``tests/discovery/test_discovery_playlist.py`` (+2) — matched_data
always includes track_number/disc_number keys (None when unknown)
+ falls back to best_match attrs when cache misses.
- ``tests/imports/test_track_number_resolver.py`` (+16) — every
resolution-chain branch pinned: album_info-wins, track_info
fallback, spotify_data nested, JSON-string parsing, garbage-string
fall-through, zero / negative / non-numeric / string-numeric
coercion, filename fallback, explicit extractor vs default
extractor semantics, defensive None inputs, VA-collection
filename behaviour, all-sources-missing → None.
1571 wider-suite tests pass (wishlist + imports + discovery +
downloads + metadata). Ruff clean.
**Migration note:** existing wishlist rows that were saved under
the OLD ``track_object_to_dict`` (with stripped album metadata) still
have ``release_date=''`` in the DB blob. Those won't self-heal — the
next attempt loads from the poisoned blob. Users can remove + re-add
those tracks to refresh, or wait for the next sync run that
re-discovers them with full metadata. No automatic migration shipped
in this PR (scope creep — the forward path is fixed, backfill is a
separate concern).
|
1 month ago |
|
|
6841128dc2 |
Wishlist: distinguish Queued from Analyzing for executor-pending batches
PR 4 of 4 in the wishlist-album-bundle issue series. UI fix only —
zero behavior change.
User's 26-track wishlist run rendered all 26 sub-batches as
"Analyzing..." simultaneously. Pre-fix the rows were created with
``phase='analysis'`` BEFORE being submitted to ``missing_download_executor``
(max_workers=3 by default), so 23 batches sat in the executor queue
visually identical to the 3 actually running. Misled users into
thinking SoulSync was processing 26 in parallel; really only 3 ever
ran at once with the rest waiting their turn.
Fix:
- Wishlist auto-flow submission sites now create batch rows with
``phase='queued'``.
- The master worker (``core/downloads/master.py:328``) already flipped
phase to ``'analysis'`` as its first action on entry — that
transition becomes the real signal that the executor picked the
batch up.
- ``core/downloads/status.py`` surfaces ``analysis_progress`` for
the ``queued`` phase too so the UI has the track count to render
"Queued — N tracks" instead of an empty card.
- Frontend (``webui/static/pages-extra.js``, ``downloads.js``) renders
"Queued ⏳" for ``phase='queued'`` distinct from the spinner-laden
"Analyzing..." for ``phase='analysis'``.
Scope choices:
- Only the auto-wishlist submission sites flipped this PR
(``core/wishlist/processing.py:860`` album sub-batches +
``core/wishlist/processing.py:907`` residual). The manual-wishlist
sites at ``:451`` and ``:627`` use the same executor + worker, but
those create a caller-allocated batch_id that the frontend polls
immediately — wanted to verify the manual-poll path handles
``queued`` cleanly before flipping those. Trivial follow-up.
- Other submission sites in album_bundle_dispatch / web_server.py /
task_worker.py left untouched — they don't go through the
executor-queue pattern that causes this UI confusion.
Tests:
- Updated ``test_process_wishlist_automatically_creates_batch_for_matching_tracks``
to assert ``phase='queued'`` on creation (was ``'analysis'``); explanatory
comment names the executor-pool reason.
- New ``test_queued_phase_surfaces_analysis_progress_for_ui_count`` in
``tests/downloads/test_downloads_status.py`` pinning the new
``queued ⊂ analysis_progress`` rendering contract.
- 884 tests pass across wishlist + downloads + imports suites.
- Ruff clean on changed Python files; JS syntax OK on changed
webui files.
PR 3 (sibling-completion gate) was investigated and dropped — the
"1/26 finalized" symptom turns out to be downstream of the
staging-match bug (PR 2's instrumentation will catch it on the
user's next reproduction run), not an independent sibling-gate bug.
The gate logic itself is correct.
|
1 month ago |
|
|
66d7029276 |
Wishlist payloads: preserve real track_number + release_date end-to-end
Two confirmed-from-code-reading bugs in the wishlist retry chain.
Both cause downstream post-process to render every retried file as
``01 - <title>`` without year in the folder path, even when the
source slskd file had the correct track number embedded and Spotify
had the album release date.
**Bug A — track_number defaults to 1 at every link in the chain.**
Pre-fix: ``.get('track_number', 1)`` defaulted at four sites:
- ``core/wishlist/payloads.py:121`` ``ensure_wishlist_track_format``
- ``core/wishlist/payloads.py:282`` Track-object conversion
- ``core/imports/context.py:421`` legacy album-info builder
- ``core/imports/pipeline.py:645`` final processing read
Each step "filled in" 1 when the upstream had dropped the key. The
downstream filename-extract fallback at ``pipeline.py:652`` ONLY
runs when the value is None — pre-filled 1 never matched, so the
fallback never fired, so the source filename's track number (e.g.
``08. No Sleep Till Brooklyn.flac``) was discarded in favour of the
default-1.
Fix: change every default from ``1`` to ``None`` along the chain.
The pipeline already has the right detect-and-recover logic — it
just needs the chain to stop poisoning it. Final ``< 1`` floor at
``pipeline.py:660`` still defaults to 1 as last resort, so callers
that genuinely have nothing still produce a valid number.
**Bug B — release_date dropped from cancelled-task wishlist payload.**
Pre-fix: ``build_cancelled_task_wishlist_payload`` only ``setdefault``ed
``name`` / ``album_type`` / ``images`` on the album dict. The
release_date field copy was load-bearing (when input was a dict, the
``dict(album_raw)`` copy preserved it), but when input was a bare
string the constructed dict had only name + album_type — no
release_date / total_tracks / etc.
Fix:
- Explicit comment on the dict-shape branch that release_date survives
via the unconditional ``dict(album_raw)`` copy + setdefault
semantics — so a future refactor that switches to a stricter copy
doesn't silently strip the field.
- String-shape branch now pulls release_date from
``track_info.album_release_date`` or ``track_info.release_date``
when present so the round-trip preserves the year for the path
template.
- track_data shape itself now carries ``track_number`` / ``disc_number``
at the top level (Bug A intersect — was dropping it entirely).
**Tests:** 4 new in tests/wishlist/test_payloads.py:
- ``test_ensure_wishlist_track_format_preserves_real_track_number``
- ``test_ensure_wishlist_track_format_keeps_missing_track_number_as_none``
- ``test_build_cancelled_task_wishlist_payload_preserves_track_number``
- ``test_build_cancelled_task_wishlist_payload_string_album_pulls_release_date_from_track_info``
14 payload tests pass; 879 across wishlist + imports + downloads
suites still green; 1410 wider suite all pass. Ruff clean.
Commits 2 + 3 of 3 in PR 2/4 of the wishlist-album-bundle issue fix
series. Commit 1 (
|
1 month ago |
|
|
94ba1d733d |
Staging match: log rejection reason on every silent-False exit
Pre-fix: ``try_staging_match`` silently returned False on three exit points (empty cache, no track title, low best-score). Could not diagnose the "track gets staged via album-bundle but never claimed → re-added to wishlist → infinite loop" bug from app.log because the match-attempt + rejection was invisible. Now every False exit logs at INFO with enough context to debug from a single grep: - ``[Staging] No match attempted for <track> — staging cache empty for batch <id>`` - ``[Staging] No match attempted for task <id> — track has empty title`` - ``[Staging] No match for <track> in batch <id> — best candidate <file> (title_sim=X, artist_sim=Y, combined=Z) below 0.75 threshold`` - ``[Staging] No match for <track> in batch <id> — N staging files but none had usable title variants`` Per-candidate skips (no title variants / title_sim < 0.80) log at DEBUG so the noise stays out of INFO unless explicitly enabled. Logs the near-miss candidate score on rejection so a 0.74 (one point below threshold) surfaces as a different kind of bug than a 0.10 (completely wrong file in staging). Same shape SAB's adapter logs now use for transient-vs-terminal status calls (PR #717). Zero behavior change — pure logging. Enables the follow-up commit that actually fixes the staging-match drop, by giving us real evidence of WHERE the wishlist tracks are being rejected during the user's next album-bundle run. 24 staging tests still pass; behavior unchanged. Commit 1 of 3 in PR 2/4 of the wishlist-album-bundle issue fix series. See ``memory/feedback_always_build_kettui_grade.md`` for the instrument-before-blind-fix rule that drove this ordering. |
1 month ago |
|
|
dd32e3bbe1 |
Wishlist: only engage album-bundle when multiple tracks from same album (PR 1/4)
Real-world wishlist case the original
|
1 month ago |
|
|
62ef39c4b7 |
Wire automation engine through next_run_at + register monthly_time (PR 2/4)
PR 1 (commit
|
1 month ago |
|
|
3e61105a1d |
Close three review gaps before PR 1 ships
Self-review pass on
|
1 month ago |
|
|
ec4a55c104 |
Add next_run_at pure function for Auto-Sync schedule types (PR 1/4)
Backend plumbing for upcoming weekly + monthly Auto-Sync schedules.
PR 1 of 4 in the schedule-types feature — see
``memory/project_auto_sync_schedule_types.md`` for the full plan.
Net behaviour change in this PR: zero. The automation engine still
computes next_run via its existing inline ``_calc_delay_seconds`` /
``_next_weekly_occurrence`` helpers; this module is unused until PR 2
wires the engine through. Lands separately so the foundation can sit
on dev for a beat before the engine change.
``core/automation/schedule.py:next_run_at(trigger_type, trigger_config,
now_utc, default_tz)``:
- Pure function. ``now_utc`` injected (tests freeze time without
monkeypatching ``datetime.now``); ``default_tz`` injected (so daily /
weekly / monthly schedules compute against the USER's timezone, not
the server's — the same class of bug that produced the May 2026
"Auto-Sync next in 8h" timezone fix).
- Returns aware-UTC ``datetime`` ready to serialise to the DB
``next_run`` column, or ``None`` for unrecognised / event-based
triggers (callers should not write a next_run for those).
- Naive ``now_utc`` inputs are assumed UTC for defensive symmetry
with the engine's DB-string parser convention.
Trigger types covered:
- ``schedule``: ``{interval: N, unit: 'minutes'|'hours'|'days'|'weeks'}``
— matches engine's existing ``_calc_delay_seconds``. Unknown unit
defaults to hours; zero/negative interval clamps to 1 (preserves
the engine's guard against scheduling for the past); non-numeric
interval falls back to 1.
- ``daily_time``: ``{time: 'HH:MM', tz: '<IANA>'}`` — DST-aware via
``zoneinfo``; ``tz`` falls back to ``default_tz``; unknown IANA
string falls back to UTC; garbage ``time`` falls back to 00:00.
- ``weekly_time``: ``{time, days: ['mon',...], tz}`` — empty / all-
invalid ``days`` list means "every day" (matches engine fallback);
abbreviations case-insensitive; 8-day scan finds the next match.
- ``monthly_time``: ``{time, day_of_month: 1-31, tz}`` — NEW shape.
Day clamped to [1, 31]. Months too short for the target day clamp
to the LAST valid day rather than skipping a month (standard cron
convention; running a day early in February is less surprising
than missing the whole month). 12-iteration loop cap so a
pathological config can't infinite-loop.
Tests (36 cases, all passing):
- Interval: every unit, unknown-unit fallback, zero/negative/garbage
interval clamp, tz field ignored on interval (wall-clock-independent).
- Daily: today-at-future-time runs today, today-at-past-time rolls to
tomorrow, exact-match rolls to tomorrow (no schedule-now-then-schedule-
again-immediately), user-tz vs server-tz, default_tz fallback,
garbage time / unknown tz defensive returns.
- Weekly: same-day-still-future qualifies, same-day-past rolls to next
allowed day, wraps across week boundary, empty days = every day,
garbage abbreviations dropped, case-insensitive, tz across day
boundary (LA Wednesday evening is Thursday UTC).
- Monthly: target day this month, rolls to next month when passed,
Feb 31 → Feb 28 / Feb 29 leap year, day_of_month above 31 / below
1 clamp, Dec → Jan year roll, user-tz pre-midnight edge case.
- Result-shape contract: every returned datetime is aware UTC at
offset zero (engine relies on this when serialising to the
``next_run`` string column).
Added ``tzdata==2026.2`` to requirements.txt. Windows ``zoneinfo`` and
minimal Docker base images ship without the system tz database;
without ``tzdata`` ``ZoneInfo('America/Los_Angeles')`` raises
``ZoneInfoNotFoundError`` and the helper silently falls back to UTC.
No WHATS_NEW entry — no user-visible behaviour change in this PR.
PR 2 (engine wire-through) will land the user-facing changelog entry
when ``monthly_time`` becomes a real schedulable trigger.
|
1 month ago |
|
|
e2d45c51e5 |
Address kettui-flagged items on usenet poll fix (#706)
Follow-up to
|
1 month ago |
|
|
f13d339584 |
Usenet album poll: tolerate SAB queue→history handoff, emit terminal failure (#706)
User reported usenet album downloads getting stuck on "downloading
release" while SABnzbd reported the job as complete. Container restart
did not help; reproducible on every usenet album download.
Three independent issues all causing the same symptom — the download
modal freezes mid-flow with no error surfaced to the user:
1. SAB queue → history transition window
SAB removes a slot from its queue BEFORE adding it to the history,
and on a busy server (par2 verify, unrar, multi-file move) that
window can span several poll iterations. The poll treated a single
None status as terminal failure ("disappeared from client") and
gave up. Now the poll tolerates up to ~10s of consecutive misses
(5 polls at the default 2s interval) before declaring the job gone.
2. SAB queue states like `Pp` were unmapped
`_SAB_QUEUE_STATE_MAP` didn't cover SAB's `Pp` (post-processing
summary), `Unpacking`, `Trying`, `Deleted`, or the `Prop_paused`
/ `Prop_failed` variants. Unmapped states fell through to the
default-'error' fallback, and the poll loop only treated explicit
'failed' / 'completed' as terminal — 'error' was neither, so the
loop spun until the 6-hour timeout. Map now covers every Status
value from SAB's `sabnzbd/api.py`, and the poll treats the default-
'error' fallback as a transient miss (warn-logged, retry within
the same tolerance window) so a brand-new unmapped state can't
infinite-loop the way `Pp` did here.
3. No terminal failure emit
The poll only logged on failure / timeout / disappeared — never
called the progress callback with 'failed', so the download modal
stayed at the last 'downloading' emit forever. Plumb a 'failed'
emit through every failure exit path so the UI flips out of the
downloading state when the poll gives up.
Plus:
4. SAB direct nzo_ids lookup instead of paging all-history
`_get_status_sync` was fetching the latest 50 history entries on
every poll and iterating to find the target nzo_id. On busy
servers (many recent downloads), the target job could roll past
the 50-entry window and look like a "disappeared" job. Replaced
with a targeted `mode=queue&nzo_ids=<id>` → `mode=history&nzo_ids=<id>`
chain. Falls back to the bulk path for SAB versions that pre-date
the nzo_ids filter — the transient-miss tolerance covers any
short-lived gap there too.
Implementation:
Lifted the album-bundle poll loop out of `usenet.py` and `torrent.py`
into `core/download_plugins/album_bundle.py:poll_album_download` —
near-duplicate implementations are now a single function with deps
injected so it's testable in isolation (kettui's extract-don't-AST-parse
standard; can't unit-test a `time.sleep` loop inside a plugin method).
The lifted helper takes:
- `get_status` callable bound to job_id, so the same loop works for
usenet UsenetStatus and torrent TorrentStatus shapes
- `complete_states` set so torrent's `{'seeding', 'completed'}` and
usenet's `{'completed'}` both Just Work
- `failed_states` set so torrent's `{'error'}` is terminal while
usenet's default-'error' fallback is transient
- `transient_miss_threshold` (default 5 ≈ 10s at 2s poll)
- `sleep` / `monotonic` injectables for deterministic tests
Per-track flows in both plugins gained the same transient-miss
tolerance inline — they don't use the emit pattern (update an
`active_downloads[id]` row dict via lock instead), so reusing the
helper would have required threading a no-op emit through. Inline
fix is small enough.
Tests:
- 11 new tests in `tests/test_album_bundle.py:poll_album_download`
cover the happy path, transient-miss tolerance with recovery,
hard-failure threshold, explicit-failed surface, timeout-emit,
default-'error' transient treatment, shutdown clean exit,
torrent's `seeding`-counts-as-complete, save_path captured across
iterations, and adapter-exception treated as transient miss.
- 521 download-suite tests pass (33 in test_album_bundle, others
pin existing torrent + usenet contracts).
- Ruff clean.
Closes #706.
|
1 month ago |
|
|
1d6ced286b |
Discogs: strip artist disambiguation suffixes at every name surface (#634)
Discogs uses two disambiguation conventions for duplicate artist names: - legacy `(N)` numeric suffix: "Bullet (2)", "Madonna (3)" - newer `*` asterisk suffix: "John Smith*", "Foo*" Both were leaking through to the UI on artist search and album search, and worse — through the import path into folder names on disk (reported: importing yielded folders literally named `Foo*`). The pre-existing cleanup only handled `(N)` and only at ONE site — `get_user_collection` (line 469) and one path inside `extract_track_from_release` (line 448 — `re.sub(r'\s*\(\d+\)$', '', artist_name)`). Every other surface (artist search, album search, album-track lookups, get_artist_albums feature matching) returned the raw Discogs string. Centralized into `_clean_discogs_artist_name(name)` at module top, with regex covering both suffixes including repeated forms (`Baz**`, `Foo (3)*`). Applied at six sites: - `Artist.from_discogs_artist` (artist search) - `Album.from_discogs_release` (album search — three fallbacks: array, string, title-split) - `Track.from_discogs_track` (track lookup — track-level + release-level fallback) - `extract_track_from_release` (replaces the inline `(N)`-only re.sub) - `get_user_collection` (existing site, now also strips `*`) - `get_artist_albums` (artist_name used for primary-vs-feature matching; cleaning prevents `Beyoncé*` from failing equality vs `Beyoncé`) - `get_album` (artists_list + per-track artists in the tracklist projection) Tests: - New `test_clean_discogs_artist_name` parametrized over 14 cases covering `(N)`, `*`, repeated `**`, combined `(N) *`, whitespace handling, empty/None defensive returns. - New `test_get_user_collection_strips_discogs_asterisk_disambiguation` pinning the asterisk path end-to-end through the collection import flow (sibling to the existing `(N)` test). - Existing 37 discogs tests still pass. Out of scope (separate issue): the same #634 report flagged track-count and year fields rendering as 0 / empty in Discogs album search. Both are inherent to Discogs `/database/search` response shape — search results don't carry `tracklist` (only release detail does) and `year` is often `0` in search payloads. Fixing requires lazy-fetching release detail per row, which hits the 25 req/min unauth limit hard. Not bundled here. |
1 month ago |
|
|
65d7756da2 |
Resolve pre-existing ruff lint errors blocking CI
Five pre-existing lint errors on dev baseline (all introduced May 25-26 before this branch was cut) were blocking CI on this PR. Cleared as courtesy fixes so the merge isn't gated on unrelated tech debt: - web_server.py:22613 — F811 duplicate `urlparse` import inside `_parse_itunes_link_url` (already imported at module top, line 20). Removed from the inline `from urllib.parse import parse_qs, urlparse`; kept `parse_qs` since that one is only used here. - core/listenbrainz_manager.py:746 — S110 silenced with `# noqa: S110 — best-effort lookup, delete proceeds either way`. Matches the existing project convention used in web_server.py:1693, core/watchlist/auto_scan.py:463, core/library_reorganize.py:548. - core/playlists/sources/listenbrainz.py:236 — B905 `zip()` without explicit `strict=`. Added `strict=False` — preserves existing behaviour where `matched` can legitimately be shorter than `match_indices` on partial discover failure. - core/playlists/sources/listenbrainz.py:273 — S110 silenced with `# noqa: S110 — caller falls back to last cached playlist on refresh failure`. - core/playlists/sources/soulsync_discovery.py:105 — S110 silenced with `# noqa: S110 — manager persists last_generation_error on failure; surface existing snapshot`. The existing multi-line comment that already explained the swallow was rolled into the noqa justification so the rule + reason live on one line. Ruff `python -m ruff check .` now passes; 664 discovery + metadata tests still pass. |
1 month ago |
|
|
6125ef8834 |
MB rerank: prefer_known_duration is now a score boost, not a tiebreaker
Live smoke against `/api/musicbrainz/search_tracks?track=Coffee+Break&artist=Zeds+Dead` exposed the edge case the tiebreaker implementation couldn't reach: The canonical Zeds Dead "Coffee Break" recording (mbid 6e2d4a70, length 184000ms) lives on the Coffee Break Single release — album_type='single', which carries a 0.85 album_type_weight in `score_track`. A sibling length-less recording (mbid 3b89bf3c) lives on an Album release — album_type='album', weight 1.0. After multiplying by EXACT_ARTIST_BOOST the canonical sat at 1.275 while the length-less sibling sat at 1.5. The previous tiebreaker only kicked in on equal scores, so the length-less album edition wins and the user sees 0:00 first instead of the actionable 3:04 row. Bug reproduced: ordering came out length-less / canonical / Omar-LinX-collab. Switched `prefer_known_duration` to a 1.25x score boost on recordings with non-zero duration_ms. The multiplier is sized above the album-vs-single weight spread (0.176) so length-known recordings can overcome an album-type penalty when scores would otherwise tie on title + artist match, but stays small enough that cover/karaoke penalty (0.05) and variant-tag penalty (0.85) still dominate — a length-known tribute still loses to a length-less canonical. Post-fix live response: 6e2d4a70 (canonical, 184000ms) sits first, 8ec2ce3f (Zeds Dead + Omar LinX collab, 153000ms) second, 3b89bf3c (length-less album edition) third. Verified Björk diacritic fallback path unaffected — `Bjork` + `Army of Me` still cascades strict-empty → bare and returns all 10 Björk recordings. 122 metadata tests pass — the three `prefer_known_duration` cases were designed to pin behaviour, not the specific multiplier value, so they all still pass under the boost implementation: ties promote length-known, relevance still beats length-pref, default-off behaviour unchanged. |
1 month ago |
|
|
8dbbf13c61 |
Branch cleanup: lift manual-match helpers, fix length-pref ordering, profile-scope view toggle
Self-review pass on the prior three commits — kettui-style cleanup
that should have landed first time.
**Length-preference sort ordering (real bug):**
The `search_tracks_with_artist` stable sort that promoted length-known
recordings ran in `core/musicbrainz_search.py`, but the MB endpoint in
`web_server.py:search_musicbrainz_tracks` runs `rerank_tracks` after
it — which re-sorts by relevance score and dropped the length-pref
ordering down to tiebreaker-only. For canonical-same-song MB duplicates
that all score identically the tiebreaker survived, but the
order-of-operations was wrong.
Moved into `rerank_tracks` itself via a new `prefer_known_duration`
flag. Sort key sits between relevance score and the stable-order
tiebreaker so relevance still wins (length only decides ties, never
overrides a higher-relevance match). The MB endpoint opts in via
`prefer_known_duration=True`; Spotify / iTunes / Deezer callers stay
on the default-off path since their search results always include
length. Pinned with three new `TestRerankTracks` cases:
ties-promote-length, relevance-still-wins, default-off-unchanged.
**Route logic lifted to `core/discovery/manual_match.py`:**
Two pieces lived as inline route logic in `web_server.py` — the
`derive_manual_match_provider` fallback chain (payload.source →
active source → 'spotify') used by `update_youtube_discovery_match`,
and the `is_drifted_for_redo` predicate (cached provider differs from
active AND not manual_match) used by `prepare_mirrored_discovery`.
Per kettui's "extract logic from web_server.py, don't AST-parse it"
standard, both helpers now live in `core/discovery/manual_match.py`
with 12 dedicated unit tests covering fallback resolution order,
non-dict payload defenses, manual_match exemption from drift,
absent-provider legacy default, and edge cases.
Side benefits from the lift:
- `match_source` now derived once before the cache-save try block
instead of being duplicated in try + except (the except block existed
only because the original used `match_source` later — pre-computing
killed the duplication).
- `prepare_mirrored_discovery`'s `has_cached` check now reuses
`is_drifted_for_redo` with inverted polarity instead of restating
the field whitelist inline, so a future schema change only has to
land in one place.
- The mirrored-DB persist block now gates on `matched_data is not None`
to avoid a pre-existing latent NameError if the cache-save block
raised before matched_data construction.
**Enhanced toggle localStorage key now profile-scoped:**
`soulsync-library-view-mode` was global — two admin profiles would
share one preference. Wrapped in `_libraryViewModeKey()` which appends
`:${currentProfile.id}` when a profile is loaded, falls back to the
unsuffixed key otherwise (preserves pre-multi-profile saved values).
Tests:
- 12 new in `tests/discovery/test_manual_match.py` pinning both helpers.
- 3 new in `tests/metadata/test_relevance.py` pinning the
`prefer_known_duration` semantics.
- `test_search_tracks_with_artist_prefers_results_with_known_length`
renamed to `_does_not_resort_by_length` since the sort moved out of
this method. 664 tests pass across discovery + metadata suites.
|
1 month ago |
|
|
39f582a690 |
Mirrored playlist: stop Playlist Pipeline from reverting manual Fix-popup matches
User reported that manually mapping a mirrored-playlist track via the
Fix popup (either by search or by pasting an MBID) worked end-to-end
once — match saved, library track downloaded — but the next Playlist
Pipeline run flipped the track back to "Provider Changed" and forced
them to re-do the manual map every cycle.
Three independent issues were combining to cause this:
1. Hardcoded `provider: 'spotify'` on manual-fix save
`update_youtube_discovery_match` (the endpoint the Fix popup posts
to, also used by mirrored playlists since the frontend routes
`platform === 'mirrored'` through the YouTube endpoint) always
stamped the cached match as Spotify-provided. The Fix-popup cascade
actually queries the user's primary metadata source first and falls
back to Spotify / Deezer / iTunes / MusicBrainz — so a user on
MusicBrainz primary picking an MB result still had it saved as
`provider: 'spotify'`. The next prepare-discovery call (which
compares cached_provider to the active source) then immediately
classified the match as drifted and pending re-discovery. Fixed by
deriving `match_source` from `spotify_track.get('source')` (every
*_search_tracks endpoint stamps `source` on results) with a fallback
to `_get_active_discovery_source()` for the MBID-paste path (which
uses the lean flat shape that doesn't carry source). `matched_data['source']`
and the mirrored `extra_data['provider']` both now use the derived
value. `match_source` is also recomputed in the cache-save except
handler so the downstream mirrored-DB save still has it.
2. Discovery worker re-queueing manual matches as "incomplete"
`run_playlist_discovery_worker` in `core/discovery/playlist.py`
re-adds any track to `undiscovered_tracks` when its `matched_data`
lacks `track_number` or `album.id` / `album.release_date`. The
check was designed as a legacy-fix backfill for old discoveries
that lost those fields to a Track-dataclass stripping bug. But
manual fixes from the popup are *intentionally* lean — search-
result rows don't include `track_number` (none of the search
endpoints return it), and the MBID-lookup flat shape doesn't
carry `album.id` / `release_date` (the recording lookup returns
only `album.name`). So every manual match looked "incomplete" and
got re-discovered every pipeline run, overwriting the user's pick
with whatever the auto-search ranked first. Manual matches now
short-circuit ahead of the incomplete-data branch.
3. `prepare_mirrored_discovery` ignored the `manual_match` flag
Independent of the provider-stamping fix above, the prepare-
discovery endpoint that powers the mirrored-playlist UI did its
own `cached_provider != current_provider` check and didn't honour
manual_match either. Defence in depth — even if a future code
path stamps the wrong provider on a manual match, the flag now
anchors it as cached. `has_cached` also extended so manual
matches with off-provider stamps still count toward the cached
tally for phase classification.
Tests:
- new `test_manual_match_skipped_even_when_matched_data_incomplete`
in `tests/discovery/test_discovery_playlist.py` pins the worker
short-circuit using a realistic MB-shape matched_data (album dict
without id / release_date, no top-level track_number). 16 existing
tests still green; 848 across discovery / metadata / automation
suites pass.
|
1 month ago |
|
|
acc5eb77ea |
Fix popup: anchor artist field in MB search to stop title-collision covers
`/api/musicbrainz/search_tracks` powers the Fix popup's auto-search
cascade for users on MusicBrainz as primary. When both track + artist
fields were filled, `search_tracks_with_artist` always took the bare
keyword path (`<track> <artist>` joined as one query string). MB's
recording-search scorer weights title matches far above artist matches,
so for "Coffee Break" + "Zeds Dead" the top results were Emapea / The
Vidalias / West One Orchestra's "Coffee Break" — three unrelated cover-
title collisions ahead of the canonical Zeds Dead recording. The
endpoint's `rerank_tracks` pass can't fix this when the right answer
is below the API's 50-result cutoff.
Both-fields mode now uses a strict field-scoped Lucene query first
(`recording:"<t>" AND artist:"<a>"`) which anchors the artist and
prunes title-collision covers at the source. `min_score=0` because the
field-scoped query is itself precise; rerank still does final ordering.
Bare query stays as the fallback when strict returns nothing — covers
the diacritic / alias cases the original `strict=False` path was added
for ("Bjork" query vs canonical "Björk" artist where Lucene phrase
match never hits the recording).
Single-field mode (track-only or artist-only) is unchanged: still bare-
query directly, since there's no artist value to anchor.
Also stable-sort results to prefer entries with non-zero `duration_ms`.
MB has multiple recordings per song (single release, album release,
remasters, compilations) and not every recording carries length data.
Without the preference sort, the user sees a 0:00 row first while a
sibling recording with the real 3:04 sits two rows below — matches the
report where MBID-paste lookup of the canonical recording (length 3:04)
contradicted the search-result's 0:00 row for the same song.
Tests:
- new `test_search_tracks_with_artist_strict_first_when_both_fields`
pins the strict=True call when both fields present
- new `test_search_tracks_with_artist_falls_back_to_bare_when_strict_empty`
pins the Björk-style fall-through path
- new `test_search_tracks_with_artist_prefers_results_with_known_length`
pins the length-preference sort
- existing `..._keeps_low_score_for_rerank` updated to side_effect so
the bare-fallback path is exercised; behaviour pinned identically
- existing `..._uses_bare_query_mode` renamed + repurposed for strict-
first; old name's behaviour no longer accurate
|
1 month ago |
|
|
4555ff7eb9 |
Wishlist modal: surface most-advanced live phase, not least-complete
The sibling-merge aggregator from
|
1 month ago |
|
|
7f751202d2 |
Wishlist modal: merge sibling sub-batches into one status response
Phase 1c.2.1 splits each wishlist run across multiple
``download_batches`` rows (per-album bundle dispatch). The
download-missing modal opens against the original batch_id
allocated by ``start_manual_wishlist_download_batch`` /
``process_wishlist_automatically``. Pre-fix that batch_id was
just one sibling among N, so the modal went stale as soon as the
primary sub-batch finished — subsequent albums downloaded fine
but no live status reached the UI.
Fix: backend merges every sibling sub-batch's tasks +
analysis_results into the response keyed under the originally-
requested batch_id. Modal sees one unified view of the whole run
without knowing about the split. Frontend untouched.
Architecture (Kettui standards):
- ``core/downloads/wishlist_aggregator.py`` — pure
``merge_wishlist_run_status(primary, siblings)`` helper.
No IO, no runtime state, no globals. Lifted out of
``status.py`` so the merge contract can be pinned via unit
tests without standing up the live ``download_batches`` /
``download_tasks`` state.
- ``core/downloads/status.py``'s ``build_batched_status`` now
pre-indexes ``download_batches`` by ``wishlist_run_id`` inside
the existing ``tasks_lock`` snapshot, then runs the merge
helper whenever a requested batch has a sibling.
Merge rules pinned by 12 tests:
- ``track_index`` re-indexed globally 0..N-1 across the merged
``analysis_results`` so the modal's ``data-track-index`` DOM
keys don't collide between siblings. Tasks' ``track_index``
follows the same remap so the analysis-results ↔ tasks
cross-reference stays intact.
- ``task_id`` is uuid per task — no collision concern.
- Phase: error is sticky; otherwise the LEAST-complete
pre-terminal phase wins (analysis < album_downloading <
downloading). All-complete returns ``complete``; mixed
complete + active returns ``downloading`` so the modal stays
alive until every sibling lands.
- ``album_bundle``: picks whichever sibling currently has an
active bundle download (state in
``{searching, downloading, downloading_release, staging}``).
Falls back to the first non-empty bundle so a completed run
still shows a progress bar.
- ``analysis_progress`` summed across siblings.
- ``active_count`` summed; ``max_concurrent`` keeps primary's
value as the representative.
- ``playlist_id`` + ``playlist_name`` preserved from the primary
(the row the modal originally opened against).
Legacy single-batch wishlist runs (no ``wishlist_run_id`` on the
batch) skip the merge entirely — passthrough. Back-compat by
absence.
1108 tests across downloads + wishlist + automation + imports +
playlist-sources + lb-series suites green. 12 new aggregator
tests pin the merge contract.
Closes the open UX gap from the Phase 1c.2.1 ship — modal now
tracks every sibling sub-batch's progress for the full duration
of the wishlist run.
|
1 month ago |
|
|
c002014f10 |
Wishlist: reify run id + gate cycle toggle on last-sibling completion
Phase 1c.2.1 splits each wishlist invocation into per-album sub- batches so the album-bundle dispatch can engage once per album. Side effect: the completion handler ``finalize_auto_wishlist_completion`` ran end-of-run logic (cycle toggle + state reset + automation event emit) once per BATCH, so a 2-album run fired the cycle toggle twice + emitted two ``wishlist_processing_completed`` events. The cycle landed at the right value either way but the state machine had become per-batch instead of per-run. Fix: reify "wishlist run" as a first-class concept via a shared ``wishlist_run_id`` UUID. Generated once per wishlist invocation in both the auto- and manual-wishlist paths, stamped on every sub-batch row in ``download_batches``. ``finalize_auto_wishlist_completion`` now reads the completing batch's ``wishlist_run_id`` and, when present, scans ``download_batches`` for siblings still in pre-terminal phases. If any sibling is still active, the per-batch summary records but the cycle toggle + state reset + automation emit are deferred. Only the last completing sibling fires the run-level finalization. Legacy single-batch runs (no run_id field) keep their toggle-immediately behavior — back-compat by absence. The run_id also lays groundwork for frontend grouping (one logical row in the Downloads view per wishlist run instead of N sibling rows), but that UX work is deferred. 3 new tests in ``test_processing.py`` pin: defer-when-siblings- active, toggle-when-last-sibling-done, back-compat-without-run_id. 1 new assertion in ``test_automation.py`` confirms all sub-batches of one auto-wishlist invocation share the same run_id. 309 tests across wishlist + automation suites green. Notes: dispatch concurrency unchanged — sub-batches still run via the shared download worker pool. Slskd serializes per-uploader at its own layer (same uploader = automatic queue, different uploaders = legit parallel), so SoulSync-side serial enforcement would duplicate work the right layer already handles. |
1 month ago |