SoulSync

Commit Graph

Author	SHA1	Message	Date
Broque Thomas	4ca3f70bf3	Show MusicBrainz release variants in import Expand matched MusicBrainz release groups into concrete releases for specific album searches so import users can choose the correct edition by track count, format, country, and disambiguation. Preserve distinct MusicBrainz release IDs instead of deduping same-title variants, carry release metadata through import matching, and surface those details on album result cards. Add coverage for variant preservation and release-group expansion.	1 day ago
Broque Thomas	eba7f61e04	Surface metadata source on Import album results (#681) Import album search silently fell through to the next source in METADATA_SOURCE_PRIORITY when the configured primary returned zero matches -- intentional behavior shared with the auto-import worker (see core/auto_import_worker.py:1316). With MusicBrainz selected and a query MB couldn't resolve, users saw Deezer cards with no indication their primary was bypassed. Backend now echoes `primary_source` on /api/import/search/albums, /api/import/search/tracks, and /api/import/staging/suggestions. Frontend renders a per-card 'via {source}' badge when the served source differs from the primary, plus a banner above the grid when every card came from a fallback source. Fallback semantics unchanged. Also collapses an inline duplicate of _renderSuggestionCard inside importPageSearchAlbum into a single shared renderer. Regression test pins the contract: response carries primary_source + per-album source when the chain falls back.	2 days ago
Broque Thomas	6c9b43225a	Add torrent and usenet release staging support Adds torrent/usenet as release-oriented download sources with album-bundle staging, live progress reporting, and post-processing that selects the requested audio file from completed releases instead of blindly importing the first file. Keeps album-bundle behavior gated to single-source torrent/usenet album downloads, excludes release sources from hybrid album per-track searches, and allows hybrid non-album tracks to use release results safely. Improves staged-release matching for featured/bonus track filenames while preserving version mismatches, records torrent/usenet provenance in library history, and updates service/status UI labels. Covers the flow with focused lifecycle, status, staging, validation, task worker, post-processing, and import side-effect tests.	4 days ago
Broque Thomas	daaed373e7	fix(provenance): label torrent/usenet/staging downloads correctly in history The download history modal was tagging every torrent / usenet album-bundle download as 'Soulseek FLAC 24bit' because: - core/imports/side_effects.py's source_service dict didn't have entries for 'staging', 'torrent', or 'usenet' usernames. The staging matcher in core/downloads/staging.py sets download_tasks[task_id]['username'] = 'staging', which fell through to the dict's default and got recorded as 'soulseek' in the track download provenance row. Same fate for any amazon or other source that wasn't whitelisted. - The album-bundle flow specifically wants to be labeled as 'torrent' or 'usenet' (where the bytes actually came from), not 'staging' (the intermediate). The plugin already stashes the source on the batch state as ``album_bundle_source`` for the Downloads-page status card; provenance recording can read the same field. Fixes: - core/downloads/staging.py: when marking a task post_processing after a staging match, check the batch's album_bundle_source override and use that for username instead of 'staging' when set. Falls back to 'staging' when no override exists (manual file-drop case). - core/imports/side_effects.py: source_service map gets entries for 'staging', 'torrent', 'usenet', and the previously-missing 'amazon' (which was also falling through to 'soulseek'). - webui/static/library.js: the redownload modal's serviceLabels / serviceIcons dicts extended to cover lidarr, amazon, soundcloud, auto_import, staging, torrent, usenet so badges render the correct name instead of either the raw source_service string or no badge at all. - webui/static/wishlist-tools.js: history-source-chip color palette extended for the new source labels (Torrent sky-blue, Usenet violet, Staging / Auto-Import neutral grey). Note: existing tracks in the DB still carry the wrong 'soulseek' label — only NEW downloads after this fix get the right label. 
A future migration could rewrite historical rows but it's cosmetic and the underlying audio + metadata are correct.	5 days ago
Broque Thomas	79ad4d885d	fix(quarantine): drop already-quarantined sources from candidate picker (#652) When a file failed AcoustID verification and got quarantined, the next auto-wishlist cycle would search for the same track, the deterministic quality picker would re-select the same (uploader, filename) source, re-download it, and re-quarantine it. Users woke up to hundreds of duplicate .quarantined entries from a single bad upload -- same source URL repeatedly, byte-for-byte identical files. Root cause: `SoulseekClient.filter_results_by_quality_preference` ranks candidates by quality + bitrate density only. Quarantine history wasn't consulted, so a high-bitrate FLAC upload with a wrong-track AcoustID fingerprint kept winning the picker against every other candidate. Fix shape: - New helper `core/imports/quarantine.py::get_quarantined_source_keys` reads every quarantine sidecar's `context.original_search_result` and returns the set of `(username, filename)` tuples for O(1) membership checks. Sidecars missing the context field (legacy thin sidecars written pre-Feb 2026, or orphaned files) and corrupt JSON are skipped silently -- defensive against transient FS / encoding issues. - `SoulseekClient._drop_quarantined_sources` runs the membership filter against incoming TrackResults, drops matches, logs a single INFO line with the skip count. Called first inside `filter_results_by_quality_preference` so all four callers (search-and-download, master worker, validation, orchestrator) benefit transparently. - Approving or deleting a quarantine entry removes its sidecar, so the dedup key disappears from the set on the next search -- gives the user a way to opt back in to a previously-quarantined source without restarting the app.
7 helper tests cover: missing dir, empty dir, well-formed sidecars collected as tuples, legacy sidecars skipped, empty source fields skipped (so empty-string keys can't accidentally drop unrelated results), corrupt JSON tolerated, duplicate quarantines collapse. 5 integration tests pin: clean candidates pass, known-bad candidates drop, missing quarantine dir returns input unchanged, filesystem errors swallowed (defensive), full `filter_results_by_quality_preference` runs the dedup BEFORE the quality picker — so a high-quality quarantined source can't win on bitrate. 692 existing download + import tests still green. Cosmetic surface of the fix is invisible — same UX as today when no quarantine entries exist; loop only kicks in once a sidecar has been written. Out of scope: bulk-select / multi-delete UI for the quarantine tab — S-Bryce mentioned this as a separate pain point in the issue, but it's its own UX work, not a one-commit drive-by.	6 days ago
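A minimal sketch of the sidecar scan and pre-filter described in the quarantine fix above. The real helper lives in `core/imports/quarantine.py`; the `*.json` sidecar naming, the dict-shaped search results, and the `demo` function are assumptions made for illustration only.

```python
import json
import tempfile
from pathlib import Path


def get_quarantined_source_keys(quarantine_dir):
    """Collect (username, filename) pairs from quarantine sidecars."""
    keys = set()
    root = Path(quarantine_dir)
    if not root.is_dir():
        return keys  # missing quarantine dir: nothing to exclude
    for sidecar in root.glob("*.json"):  # assumed sidecar naming
        try:
            data = json.loads(sidecar.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or corrupt sidecar: skip silently
        context = data.get("context") if isinstance(data, dict) else None
        source = context.get("original_search_result") if isinstance(context, dict) else None
        if not isinstance(source, dict):
            continue  # legacy thin sidecar without embedded context
        username, filename = source.get("username"), source.get("filename")
        if username and filename:  # empty fields must never become keys
            keys.add((username, filename))
    return keys


def drop_quarantined_sources(results, keys):
    """Remove candidates whose (username, filename) was quarantined before."""
    return [r for r in results if (r["username"], r["filename"]) not in keys]


def demo():
    """Write one good, one corrupt, and one thin sidecar, then scan them."""
    with tempfile.TemporaryDirectory() as tmp:
        good = {"context": {"original_search_result": {
            "username": "uploader1", "filename": "Album/01 Track.flac"}}}
        Path(tmp, "good.json").write_text(json.dumps(good), encoding="utf-8")
        Path(tmp, "corrupt.json").write_text("{not json", encoding="utf-8")
        Path(tmp, "thin.json").write_text(json.dumps({"reason": "acoustid"}), encoding="utf-8")
        return get_quarantined_source_keys(tmp)
```

Running the filter before any quality ranking is what stops a high-bitrate quarantined upload from winning the picker again.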
Broque Thomas	f25433ea57	Harden quarantine approval flows	1 week ago
Broque Thomas	791e3630ff	fix(amazon): wire amazon into all streaming-source guards `validation.py` had amazon absent from `_streaming_sources`, causing Amazon TrackResult objects (bitrate=None, size=0) to fall through to the Soulseek P2P code path and get rejected by `filter_results_by_quality_preference`. Every album track was marked not found. Fix: add 'amazon' to every streaming-source guard tuple/set that was previously missing it: - core/downloads/validation.py: primary bug fix (quality-filter bypass) - core/downloads/status.py: _STREAMING_SOURCE_NAMES frozenset - core/downloads/task_worker.py: hybrid fallback client map - core/imports/side_effects.py: `||` filename→stream-id extraction - web_server.py: is_streaming_source, transfer list display, candidate source label, _try_source_reuse, _store_batch_source - tests/test_download_plugin_conformance.py: registry count + parametrize Also updates the 2.5.3 What's New entry to drop the stale "not yet wired" disclaimer.	1 week ago
Broque Thomas	b42cafa150	AcoustID + quarantine modal: three bug fixes (closes #607, closes #608) Issue #607 (AfonsoG6) -- two AcoustID problems: 1. Live recordings false-quarantining as "Version mismatch: expected '... (Live at Venue)' (live) but file is '...' (original)" because MusicBrainz often stores the recording entity with a bare title -- the venue / live annotation lives on the release entity, not the recording. The audio fingerprint correctly identifies the live recording, but the title-text comparison flagged it as wrong. New pure helper `core/matching/version_mismatch.py:is_acceptable_version_mismatch` accepts the mismatch only when: - One-sided AND involves 'live': exactly one side is 'live' and the other is bare 'original'. Two-sided mismatches stay strict. - Fingerprint score >= 0.85 (stricter than the existing 0.80 minimum -- escape valve only fires when AcoustID is more confident than its own threshold). - Bare title similarity >= 0.70. - Artist similarity >= 0.60. Other version markers (instrumental, remix, acoustic, demo, etc) stay strict -- those have distinct fingerprints AND MB always annotates them in the recording title. The existing test_acoustid_version_mismatch.py suite passes unchanged. 2. Audio-mismatch failure message reported "identified as '' by '' (artist=100%)" when AcoustID returned multiple recordings -- prior code mixed `recordings[0]`'s strings (which can be empty) with `best_rec`'s scores. Now uses `matched_title` / `matched_artist` consistently in both the high-confidence-skip path and the final fail message. Issue #608 (AfonsoG6) -- quarantine modal: 3. Approve / Delete buttons silently no-op'd when the filename contained an apostrophe -- the unescaped quote broke the inline JS in the onclick handler. Now wraps the id via `escapeHtml(JSON.stringify(id))`, which round-trips quotes / backslashes / unicode / newlines safely through the HTML attribute to JS string boundary. 4.
Bonus UX: quarantine entry expanded view now shows source uploader (username) and original soulseek filename when the sidecar carries that context -- helps trace which uploader the bad file came from. Backend exposes `source_username` + `source_filename` fields from `sidecar.context.original_search_result`. Degrades to '' on legacy thin sidecars. Tests: - 23 new boundary tests in tests/matching/test_version_mismatch.py pin every shape: equal versions trivial, one-sided live both directions, threshold floors (each just below default -> reject), two-sided strict, non-live one-sided strict (covers exact test_instrumental_returned_for_vocal_request_fails scenario), custom-threshold overrides. - 4 existing test_acoustid_version_mismatch.py tests pass unchanged. - 507 AcoustID / matching / imports tests pass.	1 week ago
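The acceptance rule for live-recording mismatches can be sketched as a pure function. The thresholds come from the commit message above; the parameter names, the plain-string version labels, and the choice to return `True` for equal versions are assumptions, not the project's actual signature.

```python
def is_acceptable_version_mismatch(
    expected_version,
    actual_version,
    fingerprint_score,
    title_similarity,
    artist_similarity,
    min_fingerprint=0.85,
    min_title=0.70,
    min_artist=0.60,
):
    """Return True when a version mismatch can be tolerated."""
    if expected_version == actual_version:
        return True  # nothing to forgive (assumed trivial case)
    # Only the one-sided live case qualifies: one side 'live', the other 'original'.
    if {expected_version, actual_version} != {"live", "original"}:
        return False  # remix / instrumental / two-sided mismatches stay strict
    # Every floor must hold; a single weak signal rejects the file.
    return (
        fingerprint_score >= min_fingerprint
        and title_similarity >= min_title
        and artist_similarity >= min_artist
    )
```

For example, `is_acceptable_version_mismatch("live", "original", 0.90, 0.80, 0.70)` passes, while the same call with a fingerprint score of 0.84 is rejected.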
Broque Thomas	f4cff78f13	Quarantine management — list, approve, delete, recover Closes #584. Quarantined files used to sit in ss_quarantine/ with a thin sidecar — no UI, no recovery, no way to see what got dropped. This adds the management surface the user needs without going to the filesystem. UI: new "Quarantine" button on the downloads page header opens a modal with every quarantined file (filename, expected track/artist, reason, when, size). Three actions per row: - Approve (one-click): restores the file, re-runs the post-process pipeline with ONLY the failing check skipped, lands in the library with full tags + lyrics + scan - Recover (legacy fallback): moves to Staging for thin-sidecar entries that lack the embedded context Approve needs - Delete: permanent removal of file + sidecar Per-check bypass: context['_skip_quarantine_check'] = 'integrity' / 'acoustid' / 'bit_depth'. Skips ONLY the named check — other quality gates stay live. No blanket bypass-all flag. Sidecar expansion: move_to_quarantine now persists the full json-serializable context via serialize_quarantine_context (drops non-JSON-safe values, walks nested dicts/lists/sets, str-coerces unknown objects) plus the trigger name. Existing thin sidecars are detected and routed to Recover instead of Approve. Pure helpers in core/imports/quarantine.py: list_quarantine_entries / delete_quarantine_entry / approve_quarantine_entry / recover_to_staging / serialize_quarantine_context. 27 tests pin every shape: orphan files / orphan sidecars / corrupt sidecars / collision-safe filename restoration / full-context vs thin-sidecar dispatch / json round-trip safety. Four new endpoints in web_server.py — thin glue around the helpers: GET /api/quarantine/list, DELETE /api/quarantine/<id>, POST /api/quarantine/<id>/approve, POST /api/quarantine/<id>/recover. 
Download modal status differentiates "🛡️ Quarantined" from "❌ Failed" so recoverable files are visible at a glance — checked against the error_message text, no schema change needed. Pipeline changes are three minimal per-check conditionals at the existing quarantine sites in core/imports/pipeline.py. Each move_to_quarantine call now passes its trigger name so the sidecar records which check fired. Full suite: 2992 passed.	2 weeks ago
Broque Thomas	177bd85355	Configurable duration tolerance for downloaded-file integrity check Previously hardcoded at 3s (5s for tracks >10min) — files drifting past that got quarantined with no user override. Live recordings, alternate masterings, and some legitimate uploads routinely drift further. New setting `post_processing.duration_tolerance_seconds`. Default 0 means "use auto-scaled defaults" (unchanged behavior for users who don't touch it). Positive value overrides the per-track defaults. Capped at 60s — past that the check is effectively off. Logic lifted to pure helper `resolve_duration_tolerance` in file_integrity.py. Coerces every plausible input (None / empty / zero / negative / unparseable / above-cap / numeric string / float) to either a float override or None for auto. 12 tests pin every shape. Wired into `core/imports/pipeline.py` at the integrity-check call site — runs for ALL matched downloads (Soulseek / Tidal / Qobuz / HiFi / YouTube / Deezer-direct) since they all share that pipeline. Settings UI input under Settings → Metadata → Post-Processing.	2 weeks ago
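A rough sketch of the coercion that `resolve_duration_tolerance` is described as performing. Clamping above-cap values to 60 (rather than rejecting them) is an assumption; the commit message only says the setting is capped at 60s.

```python
import math

MAX_TOLERANCE_SECONDS = 60.0  # cap stated in the commit message


def resolve_duration_tolerance(raw):
    """Return a float override in seconds, or None to use auto-scaled defaults."""
    if raw is None:
        return None
    try:
        value = float(raw)  # accepts ints, floats, and numeric strings
    except (TypeError, ValueError):
        return None  # empty string, unparseable text, or an unsupported type
    if math.isnan(value) or value <= 0:
        return None  # zero and negatives mean "auto"
    return min(value, MAX_TOLERANCE_SECONDS)  # assumed: clamp instead of reject
```

Returning `None` for every "unset" shape keeps the call site to a single branch: use the override when one is present, otherwise fall back to the per-track defaults.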
Broque Thomas	8a11a660af	Extract manual import route handlers Move the remaining manual import endpoint logic out of web_server.py and into core.imports.routes behind ImportRouteRuntime. The Flask endpoints now stay as thin compatibility wrappers for album/track search, album match/process, single-file import processing, and batched singles processing. Keep legacy test patch points intact by re-exporting build_album_import_match_payload from web_server and routing singles_process through an injected process_single_import_file callable. This preserves existing route-level monkeypatch behavior while keeping the extracted helper testable. Add focused helper coverage for Hydrabase enqueueing, search limit clamping, album match payload forwarding, album import side effects, single-file worker outcomes, malformed manual matches, and singles aggregation/injected-worker behavior. Verification: py_compile and git diff --check passed locally; bundled-Python smoke covered the extracted helpers. Claude reran the project tests and reported all tests passing.	2 weeks ago
Broque Thomas	d703d33178	Extract import staging route helpers Move import staging files/groups/hints/suggestions controller logic out of web_server.py and into core.imports.routes behind an ImportRouteRuntime dependency object. Keep the existing Flask routes as thin compatibility wrappers so the UI endpoint surface stays unchanged. Add focused tests for staging file filtering, album grouping, hint generation, cached suggestions, empty missing staging paths, and error payloads from failed path/metadata reads. Verification: py_compile passed for web_server.py, core/imports/routes.py, and tests/imports/test_import_routes.py. A bundled-Python smoke pass covered the extracted helper behavior; pytest was not available in this Windows shell because the bundled Python lacks pytest and the repo venv is WSL/Linux-only here.	2 weeks ago
Broque Thomas	abab663eb7	Auto-import: album duration = album total + conservative re-import UPDATE path Two pre-existing parity gaps in `record_soulsync_library_entry` that the prior parity commits left untouched. Both close real holes between auto-import writes and what the soulsync_client deep scan would have produced. # Gap 1: Album duration was the first-imported track's duration `record_soulsync_library_entry` is called once per track. The album INSERT only fires for the FIRST track of a new album (subsequent tracks find the album row already exists). The INSERT was passing `duration_ms` — `track_info["duration_ms"]` — as the album's `duration` column. That's the duration of one track, not the album total. Compare to `SoulSyncAlbum.duration` in soulsync_client which is `sum(t.duration for t in self._tracks)`. Fix: - Worker computes `album_total_duration_ms = sum(...)` across every matched track and threads it onto context as `album.duration_ms`. - side_effects reads that value (or falls back to the per-track duration for legacy non-auto-import callers) and writes it as the album row's `duration`. # Gap 2: Re-imports of the same artist/album were insert-only When the SELECT-by-id or SELECT-by-name found an existing soulsync artist or album row, the function skipped completely — no UPDATE path. Meant: artist genres / thumb / source-id reflected ONLY whatever the FIRST imported album supplied, never refreshing as more albums by that artist landed. Ten more imports later, the artist row still held whatever the first random import wrote. Conservative fix: when an existing row matches, run an UPDATE that fills only the columns whose current value is NULL or empty. Never overwrites populated values — protects manual edits + enrichment-worker writes the same way the scanner UPDATE path preserves enrichment columns. Implementation note: the empty-check happens in Python, NOT SQL. 
Initial pass tried `COALESCE(NULLIF(col, ''), NULLIF(col, 0), ?)` but SQLite's `NULLIF(text_col, 0)` returns the original text value instead of NULL — different types, no coercion. So the SQL-only conditional was unreliable on text columns. New helper does `SELECT cols FROM table WHERE id`, compares each column in Python, and emits UPDATE clauses only for the ones that need filling. Allowlist defense: f-string column names go through `_SOULSYNC_FILLABLE_COLUMNS` validation before interpolation. Misuse adding new columns without an allowlist update fails closed (logger.debug + skip). # Tests added (4) - `test_album_duration_uses_album_total_not_single_track` — album with single-track context carrying explicit `album.duration_ms = 2_500_000` writes 2_500_000 to the album row, not the per-track 200_000 fallback. - `test_re_import_fills_empty_artist_fields` — first import lands artist with empty thumb + empty genres; second import for same artist with thumb + genres present updates the existing row. - `test_re_import_does_not_clobber_populated_artist_fields` — first import writes rich genres + thumb; second import with worse / different metadata leaves the existing row untouched. - `test_re_import_fills_empty_source_id_when_missing` — first import had no source artist ID; second import does — fills the empty `spotify_artist_id` column on the existing row. # Verification - 10/10 side-effects tests pass (including 4 new + 4 from prior parity commit + 2 history/provenance) - 217 imports tests pass (no regression) - 2369 full suite passes (+4 from prior, +22 PR-total from baseline 2347) - 1 pre-existing flake (`test_watchdog_warns_about_stuck_workers`, passes in isolation, unrelated) - Ruff clean	2 weeks ago
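The conservative fill-only UPDATE can be illustrated with the standard-library `sqlite3` module. This is a sketch under assumptions: the table layout, the `FILLABLE_COLUMNS` contents, and the function name are hypothetical stand-ins for `_SOULSYNC_FILLABLE_COLUMNS` and the project's real helper.

```python
import sqlite3

# Hypothetical allowlist; only these names may be interpolated into SQL.
FILLABLE_COLUMNS = {
    "artists": {"thumb_url", "genres", "spotify_artist_id"},
}


def fill_empty_columns(conn, table, row_id, values):
    """Fill NULL/empty columns on an existing row; never overwrite populated ones."""
    allowed = FILLABLE_COLUMNS.get(table, set())
    cols = [c for c in values if c in allowed]  # unknown table/column: fail closed
    if not cols:
        return []
    row = conn.execute(
        f"SELECT {', '.join(cols)} FROM {table} WHERE id = ?", (row_id,)
    ).fetchone()
    if row is None:
        return []
    # The emptiness check happens in Python, not in SQL.
    to_fill = [
        c for c, current in zip(cols, row)
        if current in (None, "", 0) and values[c] not in (None, "")
    ]
    if to_fill:
        assignments = ", ".join(f"{c} = ?" for c in to_fill)
        conn.execute(
            f"UPDATE {table} SET {assignments} WHERE id = ?",
            [values[c] for c in to_fill] + [row_id],
        )
    return to_fill
```

Because column names reach the f-string only after the allowlist check, a caller that passes an unlisted column gets a silent skip rather than interpolated SQL.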
Broque Thomas	8493be207e	Auto-import: SoulSync standalone library writes server-quality rows # Background SoulSync standalone is meant to be a full replacement for Plex / Jellyfin / Navidrome — files imported via auto-import (or any other import path) should land in the database with the same field richness a media-server scan would write. They weren't. # Gaps fixed The auto-import worker built a context dict for each track and handed it to `_post_process_matched_download` (the same callback the regular download flow uses). That dict was missing three things downstream needed: 1. No `source` field anywhere. `record_soulsync_library_entry` reads `get_import_source(context)` to pick the source-aware ID columns (`spotify_track_id` / `deezer_id` / `itunes_track_id` / etc.) on the artists / albums / tracks rows. With no source, the resolver returned an empty string → `get_library_source_id_columns("")` returned an empty dict → the `UPDATE tracks SET <source>_id = ?` blocks were silently skipped. Result: every auto-imported track landed with NULL on every source-id column. Watchlist scans (which match by stable source IDs to detect "this track is already in library") couldn't recognise these rows and would re-download them on the next pass. 2. No `_download_username='auto_import'`. Both `record_library_history_download` and `record_download_provenance` default to "Soulseek" when no `username` is in the context. Every staging-folder import was being labelled as a Soulseek download in library history + provenance — false signal in the UI. 3. No per-recording IDs (`isrc`, `musicbrainz_recording_id`) on track_info. The Navidrome scanner already writes `musicbrainz_recording_id` directly to the tracks row when present. Picard-tagged libraries always carry MBID; metadata sources (Spotify via MusicBrainz enrichment, Deezer, etc.) carry ISRC. 
Auto-import had access to both via the metadata-source response but didn't propagate them — so the soulsync row went in with NULL on both columns. # Changes `core/auto_import_worker.py` — `_process_matches`: - Top-level `'source': source` (from `identification['source']`) - `'_download_username': 'auto_import'` - `track_info['isrc']`, `track_info['musicbrainz_recording_id']` — pulled from the per-track payload returned by the metadata source - `track_info['album_id']` — back-reference so source-aware ID resolution works on sources whose API nests album under `track.album.id` rather than `track.album_id` - `spotify_artist['id']` now correctly carries the artist's source ID (was `identification['album_id']`, a copy-paste bug from the original implementation that made artist-id resolution fall back to fuzzy matching) - `spotify_album['artists'][0]['id']` carries artist source ID for the same resolution path `core/imports/side_effects.py`: - `record_library_history_download` source_map: add `"auto_import": "Auto-Import"` — tags imported tracks correctly - `record_download_provenance` source_service: add `"auto_import": "auto_import"` — provenance shows real source - `record_soulsync_library_entry` track INSERT: now includes `musicbrainz_recording_id` + `isrc` columns (matches `insert_or_update_media_track`'s shape for Navidrome / Plex / Jellyfin scans). Both default to NULL when not present. # Behavior preserved - Files still land in the same library template path (no path-build change) - Other media-server flows (Plex / Jellyfin / Navidrome users) unaffected — `record_soulsync_library_entry` still gates on `get_active_media_server() == "soulsync"`. Auto-import on those servers continues to drop the file in the library folder + emits `batch_complete` for the scan-trigger automation, same as before. - Direct downloads (search → Download button) unaffected — they already passed `source` + `username` correctly. 
# Tests added `tests/imports/test_auto_import_context_shape.py` (8 tests, new file): - Worker context carries `source` for every metadata source (parametrised across spotify / deezer / itunes / discogs) - `_download_username='auto_import'` set unconditionally - ISRC + MBID propagate from track payload to track_info when present - ISRC + MBID default to empty string when absent (downstream normalises to NULL at write time) - track_info includes album-id back-reference `tests/imports/test_import_side_effects.py` (4 new tests + 2 schema column adds): - `record_soulsync_library_entry` writes mbid + isrc columns when present in track_info - Deezer source maps to deezer_id column (regression case for source-aware column resolver) - `record_library_history_download` labels `_download_username='auto_import'` as "Auto-Import" not "Soulseek" - `record_download_provenance` registers source_service as "auto_import" not "soulseek" # Verification - 8/8 new context-shape tests pass - 6/6 side-effects tests pass (4 new + 2 existing) - 207 imports tests pass - 2359 full suite passes (+12 from baseline 2347, no regressions) - 1 pre-existing flake (`test_watchdog_warns_about_stuck_workers`, passes in isolation, unrelated to this change) - Ruff clean	2 weeks ago
Broque Thomas	e11786ee40	Auto-import matching: fix Deezer source classification + bump tolerance User report: all 6 staging candidates failing with "Could not match tracks to album tracklist" despite identification correctly resolving each album. 18 properly-tagged Chris Brown F.A.M.E. tracks, 21 properly-tagged Mr. Morale tracks, etc. — every match attempt rejected by the duration sanity gate. Root cause: I had Deezer in `_SECONDS_DURATION_SOURCES`, assuming Deezer's `duration` field was raw seconds (which the API returns). But `DeezerClient.get_album_tracks` already converts seconds → ms INTERNALLY (`'duration_ms': item.get('duration', 0) * 1000`) before the value reaches the matcher. My helper saw `source='deezer'` → multiplied by 1000 again → 255000 ms became 255,000,000 ms (70 hours). Every track-file pair failed the gate by a factor of 1000×. Diagnostic chain that got me there: 1. Added `[Album Matching] No matches: X files, Y tracks, Z duration-rejected, W below threshold` summary log so future "0 matches" reports surface the rejection reason. 2. Fixed the helper's logger from `logging.getLogger(__name__)` (which resolves outside the soulsync handler tree → invisible in app.log) to `get_logger("imports.album_matching")` (under the namespace the file handler watches). 3. Added per-rejection-type diagnostic showing actual file vs track duration values + raw track keys + source. That third diagnostic surfaced `track 'United In Grief' resolved=255000000 (raw duration_ms=255000, raw duration=None, source='deezer')` — making the bug obvious. Fixes: - Moved Deezer from `_SECONDS_DURATION_SOURCES` to `_MS_DURATION_SOURCES`. Comment documents WHY (the client converts before returning) so a future reader doesn't "fix" the classification back the wrong way. - Bumped `DURATION_TOLERANCE_MS` from 3000 → 10000 (3s → 10s) to match Picard ~7s / Beets ~10-15s / Plex ~10s industry baselines. 
3s was a defensive copy of the post-download integrity check threshold but that's a different problem (catching truncated downloads, not identifying recordings across remasters/encodings). - `_track_duration_ms` magnitude heuristic kept as fallback for unknown / missing source (mocked test data without `source` field). - Added `Match aborted` warnings at the three earlier silent return points in `_match_tracks` (no client, no album_data, no tracks) so future "Could not match" reports show WHICH step bailed. - Added per-run diagnostic in `match_files_to_tracks` that logs the first duration rejection's actual values — surfaces unit mismatches + drift problems without spamming N×M lines per run. Test changes: - `test_deezer_seconds_duration_converted_to_ms` renamed + rewritten as `test_deezer_already_normalised_to_ms_by_client` to pin the actual contract (matcher receives ms from the Deezer client, takes as-is). - `test_track_duration_source_aware_dispatch` updated — Deezer test case now uses ms input + expects ms output. - New `test_raw_deezer_seconds_falls_back_to_magnitude_heuristic` pins the rare edge case where raw Deezer items WITHOUT `source` reach the matcher (no client conversion path) — heuristic catches it. Verification: - 179 import tests pass after changes - Live test: all 6 user staging candidates now matching at 95-100% confidence - Multi-disc Mr. Morale lands with proper Disc 1 / Disc 2 / Disc 3 folder structure - Picard-tagged libraries hit MBID fast paths (verified earlier) - Tracks process in parallel via the existing scan-now thread spawn (next commit refactors this to a proper bounded executor)	2 weeks ago
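A sketch of source-aware duration resolution as it stands after the Deezer fix above, together with the widened sanity gate. The set contents follow the commit messages; the un-prefixed function names and the exact key lookup order are assumptions.

```python
# Sources whose clients hand the matcher milliseconds. Deezer belongs here
# because its client converts seconds to ms before returning.
MS_DURATION_SOURCES = frozenset(
    {"spotify", "itunes", "qobuz", "hifi", "hydrabase", "tidal", "deezer"}
)
SECONDS_DURATION_SOURCES = frozenset({"discogs", "musicbrainz"})
DURATION_TOLERANCE_MS = 10_000  # 10s, per the bump described above
SECONDS_HEURISTIC_CEILING = 30_000  # fallback only: smaller values look like seconds


def track_duration_ms(track):
    """Resolve a metadata track's duration to milliseconds."""
    raw = track.get("duration_ms") or track.get("duration") or 0
    source = (track.get("source") or "").lower()
    if source in MS_DURATION_SOURCES:
        return int(raw)  # already ms: never multiply again
    if source in SECONDS_DURATION_SOURCES:
        return int(raw * 1000)
    # Unknown or missing source: fall back to the magnitude heuristic.
    return int(raw * 1000) if 0 < raw < SECONDS_HEURISTIC_CEILING else int(raw)


def duration_sanity_ok(file_ms, track_ms, tolerance_ms=DURATION_TOLERANCE_MS):
    """Reject pairs whose lengths differ by more than the tolerance."""
    if not file_ms or not track_ms:
        return True  # missing durations defer to fuzzy scoring
    return abs(file_ms - track_ms) <= tolerance_ms
```

With Deezer classified as a millisecond source, a 255000 ms track stays at 255000 instead of being multiplied a second time.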
Broque Thomas	f2cd95e0f1	Auto-import polish: real-file tag reader test, source-aware duration, pin consolation Cin-pass on the MBID/ISRC fast-paths + duration-gate work. Three small but real gaps closed. Gap 1 — Real-file tag reader integration test (tests/imports/test_auto_import_tag_reader_real_files.py, 6 tests): The matcher unit tests use dict fixtures, which prove the algorithm handles the right shapes once tags are read. They DON'T prove the tag reader itself extracts the right values from real files. Mutagen's easy-mode key normalisation (across FLAC / MP3 / M4A) is the exact spot a future mutagen version could silently drift and break the fast paths in production while every unit test stays green. These tests write real FLAC files via mutagen (using the same `_make_minimal_flac` pattern from `test_album_mbid_consistency.py`) and assert `_read_file_tags` extracts: - Picard's `MUSICBRAINZ_TRACKID` (lowercase normalisation in reader) - `ISRC` (uppercase normalisation in reader; matcher strips formatting at compare time) - "track/total" parsing (TRACKNUMBER='5/12' → 5) - Duration via `audio.info.length` from synthesised STREAMINFO - Graceful empty-default return for tagless files - Graceful empty-default return for invalid audio (not a crash) Acknowledged gap (carried forward): MP3 + M4A integration coverage not added — mutagen docs say easy-mode normalisation is identical across all three formats, but only FLAC is pinned here. Followup candidate. Gap 2 — Source-aware duration dispatch (core/imports/album_matching.py, 4 tests in test_album_matching_exact_id.py): The previous `_track_duration_ms` helper used a magnitude heuristic ("anything below 30000 is seconds, convert × 1000") to decide whether a track's duration was in seconds or ms. That worked for typical tracks but had a real edge case: an actual sub-30-second Spotify track (intros, interludes, skits) would be detected as seconds and converted to 8.5 hours, breaking the duration sanity gate. 
Replaced with deterministic source-aware dispatch: - Spotify / iTunes / Qobuz / HiFi / Hydrabase → ms (canonical) - Deezer / Discogs / MusicBrainz → seconds, × 1000 - Tidal classified as ms (album-tracks endpoint convention; flagged in code comment as needing real-world verification — defensive if wrong) - Magnitude heuristic kept as fallback for unknown / missing source (mocked test data without source field) Tests pin all four paths: confirmed-ms source, confirmed-seconds source, unknown source falls back to heuristic, and the regression case (sub-30s real track on a known-ms source — must not be × 1000-converted). Gap 3 — Cross-disc consolation rationale (tests/imports/test_album_matching_helper.py, 1 test): The `CROSS_DISC_POSITION_WEIGHT = 0.05` magic number had no test proving it was load-bearing. Anyone could have set it to 0 thinking "strict matching is better" without realising it would silently break a real scenario. New test (`test_cross_disc_consolation_is_load_bearing_for_imperfect_titles`) constructs the exact case the consolation exists for: file has the right title spelling but the metadata source returns a slightly- different version (e.g. "Auntie Diaries" file vs "Auntie Diaries (Remix)" track), AND the file's disc tag is wrong while the track number agrees. Title sim ~0.78 × 0.45 = ~0.35 (below MATCH_THRESHOLD 0.4). Without the 5% consolation → file goes unmatched. With it → ~0.40, just clears. The test doesn't justify "why 0.05 specifically" — that's still a tuned knob, not a measured value. But it forces a deliberate decision if someone wants to drop it: failing this test gives them the "you broke imperfect-title cross-disc matching" message explicitly. 
Verification: - 10 new tests across 3 files, all pass - 35 album-matching tests total now (including pre-existing 17 + 18 fast-path) - Full suite: 2321 passed, 1 pre-existing flaky timing test (`test_watchdog_warns_about_stuck_workers` — passes in isolation, fails only in full-suite runs, unrelated to this PR) - Ruff clean - All changes still scoped to import flow — download flow byte-identical (verified by grep on every changed file)	2 weeks ago
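The source-aware duration dispatch described in the commit above could look roughly like the sketch below. The source groupings and the 30000 cutoff come from the commit message; the function name `to_duration_ms`, its signature, and the lowercase source keys are assumptions for illustration, not the project's actual helper.

```python
from typing import Optional

# Groupings as listed in the commit message; Tidal is treated as ms there too.
MS_SOURCES = {"spotify", "itunes", "qobuz", "hifi", "hydrabase", "tidal"}
SECONDS_SOURCES = {"deezer", "discogs", "musicbrainz"}
HEURISTIC_CUTOFF = 30000  # legacy fallback: values below this are assumed to be seconds


def to_duration_ms(raw_duration: Optional[float], source: Optional[str]) -> Optional[int]:
    """Hypothetical helper: convert a source-reported duration to milliseconds."""
    if raw_duration is None or raw_duration <= 0:
        return None
    key = (source or "").strip().lower()
    if key in MS_SOURCES:
        return int(raw_duration)          # already ms, never rescale
    if key in SECONDS_SOURCES:
        return int(raw_duration * 1000)   # seconds -> ms
    # Unknown or missing source: fall back to the magnitude heuristic.
    if raw_duration < HEURISTIC_CUTOFF:
        return int(raw_duration * 1000)
    return int(raw_duration)
```

With this shape, a 25-second interlude from a known-ms source stays at 25000 ms instead of being multiplied by 1000, which is the regression case the commit pins.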
Broque Thomas	3246490800	Auto-import: MBID/ISRC fast paths + duration sanity gate Brings the auto-import matcher to picard / beets / roon parity by reaching for the existing AcoustID-grade infrastructure (typed Album foundation, integrity check thresholds) and layering id-based exact matches on top of the fuzzy scorer. Picard-tagged libraries now land every track with full confidence on the first pass. Three layered phases in `core/imports/album_matching.match_files_to_tracks`: 1. MBID exact match — file has `musicbrainz_trackid` tag, source returns the same id → instant pair, full confidence, no fuzzy scoring. Picard's primary identifier; per-recording. 2. ISRC exact match — file has `isrc` tag, source returns the same id → same fast-path, slightly lower priority than mbid (isrc can be shared across remasters). Both ids normalised before compare (uppercase + strip dashes/spaces for isrc, lowercase for mbid). 3. Duration sanity gate — files in the fuzzy phase whose audio length differs from the candidate track's duration by more than `DURATION_TOLERANCE_MS` (3s, matching the post-download integrity check) are rejected before scoring runs. Defends against the cross-disc / cross-release / wrong-edit problem the integrity check used to catch only AFTER the file had already been moved + tagged + db-inserted. 
Tag reader (`_read_file_tags`) extended: - Reads `isrc` (uppercased, strip / / spaces normalisation deferred to matcher) - Reads `musicbrainz_trackid` as `mbid` (lowercased) - Reads `audio.info.length` and converts to `duration_ms` to match the metadata-source convention Metadata-source layer (`_build_album_track_entry`) extended: - Propagates `isrc` from top-level OR `external_ids.isrc` (spotify shape — would otherwise be stripped before reaching the matcher) - Propagates `musicbrainz_id` from top-level OR `external_ids.mbid` / `external_ids.musicbrainz` - Without this layer, fast paths would silently never fire in production even though unit tests pass — pinned by `test_album_track_entry_propagates_isrc_and_mbid_from_source` 18 new tests in `tests/imports/test_album_matching_exact_id.py`: - Direct: `find_exact_id_matches` with mbid, isrc, isrc normalisation, mbid > isrc priority, spotify-shape `external_ids.isrc`, no-id empty result, file-used-at-most-once - Direct: `duration_sanity_ok` within / outside tolerance, missing durations defer - End-to-end via `match_files_to_tracks`: mbid match short-circuits fuzzy scoring, id-matched files excluded from fuzzy phase, duration gate rejects wrong-disc collisions in fuzzy phase, normal matches pass through the gate, missing durations fall through, deezer seconds-vs-ms conversion, full picard-tagged 10-track album via mbid only - Production-shape: `_build_album_track_entry` propagates isrc + mbid from spotify-shape (`external_ids.isrc`) AND itunes-shape (top- level `isrc`) Verification: - 35 album-matching tests pass total (17 helper + 18 fast-path) - 23 multi-disc tests still pass after the extension (additive) - Full suite: 2311 passed (+18 new), 1 pre-existing flaky timing test failure (`test_watchdog_warns_about_stuck_workers` — passes in isolation, fails only in full-suite runs, unrelated to this PR) - Ruff clean For users: - Picard / Beets / Mp3Tag-tagged libraries (anyone who's organised their music) get instant 
perfect-confidence matches every time. - Soulseek-tagged downloads (which usually carry isrc when sourced via metadata-aware soulseekers) get the fast path too. - Naively-named files with no useful tags fall through to the improved fuzzy + duration-gated path — same correctness as before for the common case, much harder for the matcher to confidently pair the wrong file. - One step closer to standalone-DB feature parity with plex / jellyfin / navidrome scanners. Acoustid fingerprint fallback (for files with NO useful tags AND no MBID/ISRC) is the next followup PR.	2 weeks ago
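The ID normalisation and duration gate from the commit above can be sketched as follows. The 3000 ms tolerance and the normalisation rules are taken from the commit message; the function names, the tag keys (`mbid`, `isrc`, `musicbrainz_id`), and the return values are hypothetical and only illustrate the idea.

```python
from typing import Optional

DURATION_TOLERANCE_MS = 3000  # 3s, per the commit message


def normalise_isrc(value: Optional[str]) -> str:
    """Uppercase and strip dashes/spaces so formatting differences do not matter."""
    return (value or "").upper().replace("-", "").replace(" ", "")


def normalise_mbid(value: Optional[str]) -> str:
    """MBIDs are compared lowercased."""
    return (value or "").strip().lower()


def exact_id_match(file_tags: dict, track: dict) -> Optional[str]:
    """Return which identifier matched ('mbid' before 'isrc'), or None."""
    file_mbid = normalise_mbid(file_tags.get("mbid"))
    if file_mbid and file_mbid == normalise_mbid(track.get("musicbrainz_id")):
        return "mbid"
    file_isrc = normalise_isrc(file_tags.get("isrc"))
    if file_isrc and file_isrc == normalise_isrc(track.get("isrc")):
        return "isrc"
    return None


def duration_within_tolerance(file_ms, track_ms, tolerance_ms=DURATION_TOLERANCE_MS) -> bool:
    """Missing durations defer to the fuzzy scorer instead of rejecting."""
    if not file_ms or not track_ms:
        return True
    return abs(file_ms - track_ms) <= tolerance_ms
```

Checking MBID before ISRC mirrors the stated priority: an ISRC can be shared across remasters, so it is the weaker of the two identifiers.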
Broque Thomas	f9f74ac511	Lift auto-import matching to testable helper + pin contracts Cin-pass on the #524 + multi-disc fixes. Pre-merge polish. Lifts: `core/imports/album_matching.py` `AutoImportWorker._match_tracks` was a 100+-line method buried in a 1400-line class. Testing it required monkey-patching `_read_file_tags` + mocking the metadata client just to exercise the matching algorithm. Per Cin's "lift logic out of monolithic classes" pattern (same shape as the album-info builders / discography / quality scanner lifts), moved the dedup + scoring into `core/imports/album_matching.py` as pure functions over already-fetched data. Helper exposes: - Constants for every match weight (TITLE_WEIGHT, ARTIST_WEIGHT, POSITION_WEIGHT, NEAR_POSITION_WEIGHT, CROSS_DISC_POSITION_WEIGHT, ALBUM_WEIGHT, MATCH_THRESHOLD). Magic numbers killed. - `dedupe_files_by_position(audio_files, file_tags, *, quality_rank)` — position-keyed quality dedup. - `score_file_against_track(file_path, file_tags, track, *, target_album, similarity)` — pure per-(file, track) scorer. - `match_files_to_tracks(audio_files, file_tags, tracks, *, target_album, similarity, quality_rank)` — full matching with greedy best-per-track + first-come-first-serve over deduped files. Worker shrinks from 100 lines of inline algorithm to 8 lines that fetch tags + delegate to the helper. 
Tests added (26 new across 3 files): `tests/imports/test_album_matching_helper.py` (19 tests): - Constants pin: weights sum to 1.0, threshold above position-only - `dedupe_files_by_position`: quality wins, cross-disc preserved, tag-less files passed through, first-wins on equal quality - `score_file_against_track`: perfect-agreement = 1.0, position needs both disc+track, near-position only same-disc, missing artist tags handled, disc field aliases (Spotify/Deezer/iTunes), filename fallback when title tag missing - `match_files_to_tracks`: happy path, file used at-most-once, below-threshold left unmatched - Edge case Cin would flag: tag-less file with strong filename title matches multi-disc album track via title alone (perfect-name scenario works); tag-less file with weak filename title against multi-disc API correctly stays unmatched (the behavior delta from the disc-aware fix — pinned so future readers see it's intentional) `tests/test_import_album_match_endpoint.py` (3 tests): - Backend warning fires when source missing from match POST - No warning fires on the legit path (catches noisy-warning regression) - Endpoint actually forwards source/name/artist to the payload builder (catches "logging the right warning but doing the wrong lookup" regression) `tests/test_import_page_album_lookup_pattern.py` (4 tests): - Source-text guard for the import-page #524 fix in stats-automations.js. Until the file is modularized enough for a behavioral JS test (under the existing tests/static/.mjs pattern), regex-based assertions pin: the `_albumLookup` field exists, the click handler reads from it, both card renderers populate it before emitting onclick, and the cache stores `source` per entry. Caveat documented in the test module docstring. Verification: - All 26 new tests pass. - Existing multi-disc tests (test_auto_import_multi_disc_matching.py) still pass after the lift — proves the helper is behavior-equivalent to the inline implementation it replaced. 
- Full suite: 2293 passed, 1 flaky-timing failure (test_library_reorganize_orchestrator.py::test_watchdog_warns_about_stuck_workers — passes in isolation, fails only in full-suite runs, pre-existing, unrelated to this PR). - Ruff clean. Notes for the reviewer: - The frontend stats-automations.js JS test is structural-only. Behavioral JS testing for that file requires modularizing the ~7k-line monolith first — out of scope for this fix. - The cross-disc 5% consolation bonus is a small behavior change for users with weak/missing tag info on multi-disc albums. Pinned explicitly in `test_tagless_file_with_weak_title_unmatched_in_multidisc` so the trade-off is visible: correct multi-disc matching wins over optimistic position-only matching that produced wrong-disc files.	2 weeks ago
Broque Thomas	aa54bed818	Surface silent exceptions across remaining modules — ~70 sites Final sweep. Covers: - Downloads: candidates / lifecycle / master / monitor / wishlist_failed - Metadata: source / registry / cache / common / artwork (+ plex_client) - Imports: pipeline / resolution / file_ops / paths / guards - Library: path_resolver / retag / duplicate_cleaner - Stats / playlists / wishlist / discovery / automation / enrichment - Misc: hydrabase_client, soulsync_client, tag_writer, debug_info, api_call_tracker, album_consistency, beatport_unified_scraper, reorganize_runner, seasonal_discovery, lidarr_download_client, services/sync_service.py, automation_engine, automation/progress Two `_e` renames in imports/file_ops.py (outer scope binding `e`). A few finally-block sites in metadata/album_mbid_cache.py, library/track_identity.py, listening_stats_worker.py, watchlist/auto_scan.py left silent — same reason as the rest of the sweep (logger calls during cleanup paths can themselves raise). Refs #369	3 weeks ago
Broque Thomas	de348981a5	Surface silent exceptions in import pipeline — 11 sites - imports/side_effects.py: 8 sites (post-import cleanup paths, thumbnail+lyrics pulls, Plex refresh) - auto_import_worker.py: 3 sites (queue/dedup helpers) All converted to `logger.debug("...: %s", e)`. Refs #369	3 weeks ago
Broque Thomas	967c7f7c0a	Migrate album-info builders to typed Album path Steps 2+3 of typed metadata migration. Two album-info builders now route through Album.from_<source>_dict() when caller passes a known source: - _build_album_info (album-tracks lookups) - _build_single_import_context_payload (single-track import context) Legacy duck-typing stays as fallback for unknown source, non-dict input, or converter errors. Pure additive — existing callers without source kwarg unchanged.	3 weeks ago
Broque Thomas	2ab460f5c4	Add Library Disk Usage card to System Statistics Discord request (Samuel [KC]): show how much disk space the library takes on the Stats page. Implementation piggybacks on the existing deep scan — Plex/Jellyfin/Navidrome all return file size in their track API responses, so we read it during the deep scan and store it on the tracks row. Aggregation is then a single SQL query — no filesystem walk, no extra I/O during the scan, no separate stat job. SoulSync standalone gets size from os.path.getsize at insert time (different code path; the file is local when we write the row). Schema (`database/music_database.py`): - New `file_size INTEGER` column on `tracks`. Migration uses the established `try SELECT, except ALTER TABLE ADD COLUMN` pattern. Idempotent; safe on existing installs. NULL on legacy rows so they don't contribute to totals until next deep scan refreshes. - Added the column to the canonical CREATE TABLE so fresh installs get it without going through the migration path. Track-object plumbing: - `core/jellyfin_client.py` — JellyfinTrack reads MediaSources[0].Size alongside existing Bitrate read. None when 0 / missing. - `core/navidrome_client.py` — NavidromeTrack reads `size` from the Subsonic song object (int coercion + None on parse fail). - `core/soulsync_client.py` — SoulSyncTrack does os.path.getsize (only "server" where size has to come from disk). - Plex needs no client-side change: track.media[0].parts[0].size is read directly inside insert_or_update_media_track. Persistence — TWO separate insert paths: (a) `database/music_database.py:insert_or_update_media_track` — Plex/Jellyfin/Navidrome flows. Reads file_size from Plex's MediaPart OR `track_obj.file_size` wrapper attribute (defensive Plex-attr-not-present check + > 0 type guard). INSERT writes the new column. UPDATE uses COALESCE(?, file_size) so a None from the server on a re-sync (rare Jellyfin Size omission) doesn't blank an existing value. Pinned via test. 
(b) `core/imports/side_effects.py:record_soulsync_library_entry` — SoulSync standalone flow. Completely separate code path: the standalone deep scan moves files to staging for auto-import rather than calling insert_or_update_media_track. After the auto-import processes them, side_effects writes the tracks row directly. Reads file_size via os.path.getsize(final_path) at insert time (file is local) and includes it in the INSERT column list. SoulSync only does INSERT-if-not-exists (no UPDATE path), so no COALESCE concern. Aggregator (`database/music_database.py:get_library_disk_usage`): - SELECT COALESCE(SUM(file_size), 0), COUNT(file_size), COUNT(*) - COUNT(file_size) for the totals. - Per-format breakdown done in Python via os.path.splitext over (file_path, file_size) rows — sidesteps SQLite's first-vs-last-dot ambiguity for paths like /music/Kendrick/M.A.A.D City/01.flac. - Defensive: skips empty paths, paths without extension, and implausibly long extensions (>6 chars). Returns the full empty-shape dict (NOT a partial / undefined) when the column doesn't exist or queries fail, so the UI's `if (!data.has_data)` branch handles fresh installs cleanly. API + UI: - `core/stats/queries.py` — thin pass-through get_library_disk_usage matching the existing query-helper convention. - `web_server.py` — new /api/stats/library-disk-usage endpoint mirroring the /api/stats/db-storage pattern. - `webui/index.html` — new card in System Statistics above the Database Storage card. - `webui/static/stats-automations.js` — _loadLibraryDiskUsage + _renderLibraryDiskUsage. Empty state: "Run a Deep Scan to populate (X tracks pending)". Partial: "X measured (+Y pending)". Full: total + format bars proportional to the largest format. - `webui/static/style.css` — .stats-disk-* styled to match the Database Storage card. 
Backward compatibility: - Migration is additive; existing rows get NULL file_size; the empty-shape return from the aggregator means the UI renders cleanly without errors before any deep scan runs. - Old installs upgrading will see "Run a Deep Scan to populate (N tracks pending)". Running their next deep scan fills sizes — the existing scan flow doesn't need any changes, just consumes the new track-wrapper attribute. Tests: - `tests/test_library_disk_usage.py` — 13 cases covering schema migration, NULL defaults on legacy inserts, fresh-install empty shape, summing with mixed NULL/known sizes, per-format breakdown, mixed-case extensions, paths with album-name dots, missing extensions, empty file_path, implausibly long extensions, JellyfinTrack.file_size persistence via insert_or_update_media_track, COALESCE preservation on null re-sync. - `tests/imports/test_import_side_effects.py` — extended the existing record_soulsync_library_entry test to assert track_row['file_size'] == os.path.getsize(final_path), pinning the SoulSync-standalone path. Test fixture's tracks schema also updated to include the file_size column. Verified: full suite 1813 pass (13 new, 1 existing-test extension), ruff clean, smoke test populating + reading the column round-trips correctly. WHATS_NEW entry under '2.4.2' dev cycle.	3 weeks ago
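The per-format breakdown described for the disk-usage aggregator above can be sketched like this. It assumes rows arrive as `(file_path, file_size)` tuples and that the 6-character limit applies to the extension without its leading dot; `format_breakdown` is a hypothetical name, not the project's function.

```python
import os
from collections import defaultdict

MAX_EXTENSION_CHARS = 6  # assumption: counted without the leading dot


def format_breakdown(rows):
    """Sum file sizes per lowercase extension, skipping rows that cannot be classified."""
    totals = defaultdict(int)
    for file_path, file_size in rows:
        if not file_path or file_size is None:
            continue  # empty path or unmeasured legacy row
        # splitext only looks at the last dot of the final path component,
        # so dots in album folder names do not confuse it.
        ext = os.path.splitext(file_path)[1].lstrip(".").lower()
        if not ext or len(ext) > MAX_EXTENSION_CHARS:
            continue  # no extension, or implausibly long one
        totals[ext] += file_size
    return dict(totals)
```

Doing this in Python rather than SQL avoids having to decide between the first and last dot of a path inside a query.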
Broque Thomas	04a14f7e96	Fix: tasks showed Completed when file was quarantined User caught downloading Kendrick Mr. Morale: three tracks (Rich Interlude, Savior Interlude, Savior) showed ✅ Completed in the modal but were missing on disk. Log forensics revealed two layered bugs. Bug 1 — Verification wrapper assumed success on quarantined files (`core/imports/pipeline.py`): The outer `post_process_matched_download_with_verification` had a fallback at the "no `_final_processed_path` in context" branch that marked the task completed and notified `success=True`. The inner post-processor sets `_final_processed_path` only when the file actually reaches its destination. Integrity-rejected files (`_integrity_failure_msg` set) and race-guard-failed files (`_race_guard_failed` set) get quarantined or skipped without ever setting `_final_processed_path`, so they fell straight into the "assume success" branch. Confirmed in user's log: No _final_processed_path in context for task d5b88b84-... — cannot verify, assuming success That line fired for the same task right after the integrity check quarantined the source file. Result: ✅ Completed in UI, file in quarantine, never delivered. Fix: explicit checks for `_integrity_failure_msg` and `_race_guard_failed` markers BEFORE the assume-success fallback. Either marker set → task status='failed' with descriptive error_message + `_notify_download_completed(success=False)`. The pre-existing assume-success behavior preserved when no failure markers are set (some legitimate flows complete without setting `_final_processed_path`). Bug 2 — AcoustID skip-logic too lenient (`core/acoustid_verification.py`): The "language/script" exemption was: if best_score >= 0.95 and (title_sim >= 0.55 or artist_sim >= ARTIST_MATCH_THRESHOLD): The OR-clause fired for English-vs-English titles by the same artist that share NO actual content. Confirmed in user's log: requested "Rich (Interlude)" by Kendrick Lamar, AcoustID identified the audio as "R.O.T.C. 
(interlude)" by Kendrick Lamar (a totally different song from his 2010 mixtape) — same artist scored ≥ARTIST threshold, shared word "interlude" pushed title_sim above 0.55, skip fired. Verification returned SKIP instead of FAIL, the wrong file was accepted as the answer for three different track requests. Fix: skip now requires positive evidence the mismatch is a real language/script case: (a) Non-ASCII chars present in either title AND artist matches strongly → real transliteration case (kanji ↔ romaji etc) (b) BOTH title_sim >= 0.80 AND artist_sim >= ARTIST threshold → minor punctuation/casing differences English-vs-English with very different titles by the same artist no longer skipped — verification correctly returns FAIL, the wrong file gets quarantined, the new wrapper logic above marks the task failed. Tests: - `tests/test_integrity_failure_marks_task_failed.py` — 4 cases pinning the wrapper-level state machine: integrity marker → failed, race-guard marker → failed, no markers → still assumes success (legacy path preserved), integrity-failure-takes-priority over missing-final-path fallback. - `tests/test_acoustid_skip_logic.py` — 7 cases pinning the skip exemption: user's R.O.T.C-vs-Rich case → FAIL (regression test), Savior-vs-R.O.T.C → FAIL (same bug surface), Japanese kanji → romaji → SKIP (real language case still works), MAAD vs M.A.A.D → PASS or SKIP (punctuation tolerance), low fingerprint score → never skipped, high score but artist mismatch → no longer skipped, Crown vs Crown of Thorns → no longer skipped. Verified: full suite 1793 pass (11 new), ruff clean. WHATS_NEW entry under '2.4.2' dev cycle.	3 weeks ago
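The tightened skip rule from Bug 2 above might be expressed along these lines. The 0.95 and 0.80 cutoffs are from the commit message; the value of `ARTIST_MATCH_THRESHOLD` is not stated there, so 0.6 below is a placeholder, and treating "artist matches strongly" as the same threshold is also an assumption. The function name and signature are hypothetical.

```python
ARTIST_MATCH_THRESHOLD = 0.6   # placeholder value; the real constant is not given above
HIGH_FINGERPRINT_SCORE = 0.95
STRONG_TITLE_SIM = 0.80


def has_non_ascii(text: str) -> bool:
    """True when the text contains characters outside the ASCII range."""
    return any(ord(ch) > 127 for ch in text)


def should_skip_mismatch(best_score, title_sim, artist_sim,
                         requested_title, identified_title) -> bool:
    """Skip (rather than fail) only with positive evidence of a language/script case."""
    if best_score < HIGH_FINGERPRINT_SCORE:
        return False  # a weak fingerprint match is never skipped
    artist_ok = artist_sim >= ARTIST_MATCH_THRESHOLD
    # (a) transliteration: non-ASCII in either title plus a matching artist
    script_case = artist_ok and (
        has_non_ascii(requested_title) or has_non_ascii(identified_title)
    )
    # (b) punctuation/casing: titles nearly identical AND artist matches
    punctuation_case = artist_ok and title_sim >= STRONG_TITLE_SIM
    return script_case or punctuation_case
```

Under this rule the reported "Rich (Interlude)" vs "R.O.T.C. (interlude)" pair no longer qualifies: both titles are ASCII, and a title similarity just above 0.55 falls short of 0.80.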
Broque Thomas	75fe04907f	Wire SoundCloud as a first-class download source Plug the previously-built SoundcloudClient (PR #478, the build-and-verify phase) into every place a download source needs to appear. Follows the same wiring contract as Tidal/Qobuz/HiFi/Deezer/Lidarr — orchestrator routing, hybrid-mode picker, search dispatch, queue/cancel/clear, provenance + library history, sidebar source label, settings UI all work plug-and-play. Backend wiring: - `core/download_orchestrator.py` — import SoundcloudClient, _safe_init it at startup, add to _client() lookup, get_source_status(), check_connection's sources_to_check default, search source_names map, search_and_download_best _streaming_sources tuple, download source_map + source_names, and every iteration loop in reload_settings download-path-update / get_all_downloads / get_download_status / cancel_download (route + iterate) / clear_all_completed_downloads / cancel_all_downloads. - `core/downloads/monitor.py` — added SoundCloud to the per-client loop that fetches active downloads outside the orchestrator (uses getattr fallback for older soulseek_client snapshots). - `core/downloads/task_worker.py` — added SoundCloud (and Lidarr, which was missing too — bonus fix) to source_clients dict for hybrid fallback dispatch. - `core/downloads/validation.py` — added 'soundcloud' to _streaming_sources so SoundCloud results go through the matching engine validation path instead of the Soulseek quality-filter path. - `core/imports/side_effects.py` — three call sites: source_map for download_source label written to library_history, streaming-source guard for the `\|\|`-encoded stream_id parsing, and source_service map for provenance recording. All three now include 'soundcloud'. - `web_server.py` — five streaming-source detection tuples updated. 
New `/api/soundcloud/status` endpoint returns {available, configured, reachable} mirroring the Deezer/HiFi status-endpoint pattern; reachability runs a real cheap yt-dlp search so the settings Test Connection button gives a meaningful pass/fail signal. - `config/settings.py` — added empty `soundcloud_download` defaults block so future tier-2 OAuth (SoundCloud Go+ session) doesn't have to migrate existing configs. Frontend: - `webui/index.html` — new `<option value="soundcloud">` in the download-source-mode dropdown, SoundCloud added to both hidden legacy hybrid-source selects, new settings container with info text + Test Connection button. - `webui/static/settings.js` — HYBRID_SOURCES entry (with the SoundCloud cloud SVG icon), _hybridSourceEnabled default, updateDownloadSourceUI container display, allSources for legacy hybrid picker, testSoundcloudConnection function (hits the new status endpoint, color-codes the result), saveSettings soundcloud_download empty block. - `webui/static/shared-helpers.js` — sidebar source-name map includes SoundCloud + Lidarr (Lidarr was also missing, bonus fix). - `webui/static/helper.js` — WHATS_NEW entry under '2.4.2' dev cycle describing the user-visible change in the chill terse voice. Tests: - `tests/test_download_orchestrator_soundcloud.py` — 14 integration tests verifying the wiring: client constructed at startup, _client lookup resolves 'soundcloud', get_source_status includes it, download dispatcher routes username='soundcloud' to the SoundCloud client (and unknown usernames still fall back to Soulseek), hybrid search iterates SoundCloud when in order and skips it cleanly when unconfigured, get_all_downloads / get_download_status / cancel / clear walk SoundCloud, soundcloud-only mode dispatches only to SoundCloud, _streaming_sources tuple in validation includes 'soundcloud'. 
- `tests/downloads/test_download_orchestrator.py` — added `soundcloud` to the test fixture's _build_orchestrator helper so the new orchestrator attribute doesn't AttributeError in pre-existing tests that bypass __init__. Verified: - Full suite green (1728 passed, 2 deselected for soundcloud_live) - Ruff clean - Live SoundCloud-only mode search returns 25 SoundCloud tracks for "kendrick lamar luther" in <2s, returning properly-shaped TrackResult objects with username='soundcloud' and dispatch-key filename ready for the download path. Out of scope (intentional deferrals): - SoundCloud Go+ OAuth tier (256 kbps AAC) — anonymous-only for now. Adding auth later is a settings-page extension, no orchestrator changes needed. - Album/playlist support — SoundCloud has playlists but they don't map to the album model the rest of SoulSync expects. Singles only.	3 weeks ago
Broque Thomas	42f3026eef	Reject broken downloads before tagging via universal integrity check Discord report (fresh.dumbledore [VRN]): slskd sometimes ships broken files (truncated transfers, corrupt FLAC, wrong file substituted on filename match). They flowed through post-processing and only surfaced later — Plex/Jellyfin scan failures, dead-air playback, duplicate detector tripping over the wrong length. By that point the file was already tagged, copied, mirrored to the media server, and recorded in provenance. New module `core/imports/file_integrity.py`: - `check_audio_integrity(path, expected_duration_ms=None) -> IntegrityResult` - Three tiered checks, cheapest to most expensive: 1. File size sanity (catches 0-byte stubs and stub transfers) 2. Mutagen parse (catches header damage, wrong-format-with-right-extension) 3. Duration agreement vs. metadata source's expected length, ±3s tolerance (5s for tracks over 10 minutes — long tracks naturally drift more) - Returns IntegrityResult with `ok`, human-readable `reason`, and per-check `checks` dict for debugging - Never raises; pathological inputs return ok=False with explanation Pipeline integration in `core/imports/pipeline.py:post_process_matched_download`: - Hooks between the existing file-stability wait and AcoustID verification - On failure: quarantine via existing `move_to_quarantine` helper, mark task failed with descriptive error, clear matched-context, fire `on_download_completed(success=False)` so the slot is released for retry - Mirrors the existing AcoustID-failure path so retry behavior stays consistent - Wrapped in try/except so an unexpected failure inside the check itself cannot block downloads — logs and continues This is intentionally tier 1: universal across formats, no external deps. A future tier could verify FLAC STREAMINFO MD5 by decoding audio (needs flac binary or libflac wrapper) — skipped for now since tier 1 catches the dominant Discord-reported cases (truncated, 0-byte, wrong file). 
Tests: - `tests/imports/test_file_integrity.py` — 14 cases covering all three check tiers, edge cases (zero/negative expected duration, long-track wider tolerance, caller tolerance override), and the mutagen-unavailable degradation path - `tests/imports/test_import_pipeline.py` — two existing tests use 5-byte fixture files that the new check would reject; they monkeypatch the integrity check since they're testing plumbing (notification + metadata_runtime forwarding), not integrity behavior WHATS_NEW entry under '2.4.2' dev cycle.	3 weeks ago
Broque Thomas	34ba26f5c8	Persist source IDs at download time + backfill onto tracks on sync Followup to fix/watchlist-external-id-match. The companion PR closed the demand side — the watchlist scanner asks for tracks by external IDs before falling back to fuzzy. But for users on Plex / Jellyfin / Navidrome the supply side was still broken: tracks.spotify_track_id (and the other ID columns) only got populated by the asynchronous enrichment workers, sometimes hours after the file was actually written. During that window the ID match fell through to fuzzy and the bug returned. We were already collecting every ID during post-processing — they live in the `pp` dict in core/metadata/source.py:embed_source_ids and get embedded into file tags. We just dropped the in-memory copy afterwards. This PR persists them and uses them: - Schema migration adds spotify_track_id / itunes_track_id / deezer_track_id / tidal_track_id / qobuz_track_id / musicbrainz_recording_id / audiodb_id / soul_id / isrc columns + indexes to the existing track_downloads table (already keyed by file_path). - core/metadata/source.py:embed_source_ids exposes pp["id_tags"] and the resolved ISRC back to the import context as _embedded_id_tags / _isrc. - core/imports/side_effects.py:record_download_provenance reads those context fields and passes them to db.record_track_download, which now accepts the new ID kwargs and persists them. - New db.get_provenance_by_file_path with exact + basename-suffix fallback (handles container mount-root differences between download-time path and media-server-reported path). - New db.backfill_track_external_ids_from_provenance copies IDs from track_downloads onto a tracks row idempotently — COALESCE on every column preserves any value the enrichment worker already wrote (enrichment is more authoritative for late binding). 
- database/music_database.py:insert_or_update_media_track (the single insertion point used by every Plex / Jellyfin / Navidrome sync) calls the backfill immediately after each INSERT/UPDATE. - New core/library/track_identity.py:find_provenance_by_external_id used as a second-tier fallback in watchlist_scanner.is_track_missing_from_library — catches the window between download and media-server sync. Caller checks os.path.exists on the provenance file_path before treating it as "already in library" so a deleted file doesn't prevent re-download. Effect: freshly downloaded files become ID-recognizable to the watchlist on the very next scan, no enrichment-wait window. 19 regression tests in tests/test_provenance_id_persistence.py: - Schema migration adds expected columns + indexes - record_track_download persists every ID kwarg - record_track_download backward-compat (old kwargs still work) - get_provenance_by_file_path: exact match, basename fallback for mount-root differences, multi-record latest-wins, defensive None - backfill: copies all IDs, preserves existing via COALESCE, no-op when no provenance exists - find_provenance_by_external_id: per-ID lookup, ISRC cross-bridge, OR semantics, latest-wins on multiple matches Out of scope: backfilling provenance for files downloaded BEFORE this PR (their track_downloads rows don't carry the new IDs). Those continue to wait for enrichment. Acceptable — only affects historical files; new downloads benefit immediately. Full pytest 1625 passed; ruff clean.	3 weeks ago
Broque Thomas	486116c34f	Honor lossy_copy.delete_original after successful conversion Reported case (CAL): with lossy_copy.enabled=True, lossy_copy.delete_original=True, and codec=mp3, every download left both the original FLAC AND the converted MP3 in the target folder. Users opting into a lossy-only library ended up dual-format on every import. Root cause: ``core/imports/file_ops.py:create_lossy_copy`` reads ``lossy_copy.codec`` and ``lossy_copy.bitrate`` from config but never reads ``lossy_copy.delete_original``. The setting is only consulted by the pre-move source-vanished check at ``core/imports/pipeline.py:651`` (so the pipeline knows to look for a lossy variant when the FLAC has already moved on), but no code path actually deletes the source after conversion. Fix: after ffmpeg returns success and the QUALITY tag is written, check ``lossy_copy.delete_original`` and ``os.remove`` the original when enabled. Belt-and-suspenders: - Same-path guard (``os.path.normpath(out_path) != os.path.normpath(final_path)``) prevents accidentally wiping the just-converted file if a future codec choice somehow resolves out_path to the source path. - ``FileNotFoundError`` is treated as success (concurrent worker / dedup cleanup got there first). - Other ``OSError`` (permission denied, locked file) is logged but doesn't propagate — the conversion already succeeded, the user just has to clean up the original manually. Failure paths skip the delete: - ffmpeg returns non-zero → returns None, original stays - lossy_copy.enabled=False → early return before conversion runs - delete_original=False (default) → original stays 7 regression tests cover honored-when-enabled, kept-when-disabled, default-keep, ffmpeg-failure-path, lossy-disabled-path, racing-delete, and locked-file paths. Full pytest 1563 passed; ruff clean. Note: this PR does NOT address the second bug CAL mentioned (track re-downloaded despite already existing on disk). 
That symptom is caused by stale album metadata on the user's existing files — the library DB has the track tagged on a different album than the metadata source reports — combined with wishlist.allow_duplicate_tracks defaulting to True. Same class of issue partially addressed in PR fix/watchlist-redownload-and-duplicate-detection but compilation-album drift is the only currently-handled case. Tracking separately.	3 weeks ago
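The guarded delete described in the lossy_copy fix above could be sketched as below. The guards (same-path check, `FileNotFoundError` as success, other `OSError` logged and swallowed) follow the commit message; the function name, parameters, and boolean return value are assumptions made for illustration.

```python
import logging
import os

logger = logging.getLogger(__name__)


def remove_original_after_conversion(final_path: str, out_path: str,
                                     delete_original: bool) -> bool:
    """Return True when the original is gone after the call, False when it was kept."""
    if not delete_original:
        return False
    # Same-path guard: never delete the file that was just produced.
    if os.path.normpath(out_path) == os.path.normpath(final_path):
        return False
    try:
        os.remove(final_path)
    except FileNotFoundError:
        return True  # another worker or cleanup step already removed it
    except OSError as exc:
        # Conversion already succeeded; leave the original for manual cleanup.
        logger.warning("Could not delete original %s: %s", final_path, exc)
        return False
    return True
```

This would only be called after the conversion step has reported success, so the failure paths listed in the commit never reach the delete.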
Antti Kettunen	2b3022f6b0	Fix Spotify source ID fallback - Prefer real Spotify IDs when importing Spotify contexts - Skip numeric fallback IDs so Deezer values do not leak into spotify_* columns - Add regressions for import context and SoulSync library writes - Keep the route test asserting the Spotify album link	3 weeks ago
elmerohueso	f9f47f978e	fix post-download tagging, and enable tagging for hifi	3 weeks ago
Broque Thomas	46d8e15674	Prune slskd dedup orphans after import slskd appends "_<19-digit unix-nanosecond timestamp>" to a downloaded filename when the destination already contains a same-named file (concurrent downloads of the same track, partial-file retries after a connection drop, cancelled-then-redownloaded files, the same track surfacing in multiple synced playlists). The file-finder code already recognized the suffix when matching a download to its source, but after the canonical file moved into the library, the leftover "_<timestamp>" siblings sat orphaned in the downloads folder forever. Reported on Discord by Shdjfgatdif. cleanup_slskd_dedup_siblings() runs at the end of each successful import (3 safe_move_file sites in pipeline.py) and prunes any remaining siblings that strip down to the canonical stem with the same extension. Conservative match (>= 18 trailing digits) keeps legitimate filenames like "Track 5" and "Album 1995" untouched. Per-file unlink failures are swallowed so a single locked file doesn't block the rest. 17 regression tests cover the suffix-strip primitive, orphan removal, no-op cases, mismatched extensions, subdirectories, and partial-failure recovery.	3 weeks ago
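The suffix-strip and sibling-pruning idea from 46d8e15674 might look something like the sketch below. It is not the actual cleanup_slskd_dedup_siblings() implementation: the helper names and signatures are hypothetical, and it assumes the timestamp suffix sits at the end of the filename stem, just before the extension.

```python
import re
from pathlib import Path
from typing import List

# Assumption: "_<digits>" sits at the end of the stem. Requiring at least
# 18 digits keeps names such as "Track 5" or "Album 1995" untouched.
_DEDUP_SUFFIX = re.compile(r"_\d{18,}$")


def strip_dedup_suffix(stem: str) -> str:
    """Drop a trailing slskd-style timestamp suffix from a filename stem."""
    return _DEDUP_SUFFIX.sub("", stem)


def find_dedup_siblings(folder: Path, canonical_name: str) -> List[Path]:
    """List files in `folder` that reduce to `canonical_name` once stripped."""
    canonical = Path(canonical_name)
    siblings = []
    for candidate in folder.iterdir():
        if not candidate.is_file():
            continue
        if candidate.suffix.lower() != canonical.suffix.lower():
            continue  # a different extension is never a sibling
        if candidate.stem == canonical.stem:
            continue  # the canonical name itself carries no suffix
        if strip_dedup_suffix(candidate.stem) == canonical.stem:
            siblings.append(candidate)
    return sorted(siblings)


def prune_dedup_siblings(folder: Path, canonical_name: str) -> int:
    """Delete leftover siblings; return how many were actually removed."""
    removed = 0
    for sibling in find_dedup_siblings(folder, canonical_name):
        try:
            sibling.unlink()
            removed += 1
        except OSError:
            # One locked file should not block the rest.
            continue
    return removed
```

Calling `prune_dedup_siblings(downloads_dir, "Song.flac")` after the canonical file has moved would remove `Song_<timestamp>.flac` leftovers while leaving a `Song_<timestamp>.mp3` in place, since the extension differs.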
Broque Thomas	99a38a6201	Route imported singles/EPs through album_path template Discord-reported (winecountrygames + fresh.dumbledore): "Import only makes Albums folder no singles or eps". Users with a ${albumtype}s/$albumartist/... album_path template saw an "Albums" folder fill up correctly but never any "Singles" or "EPs" folder. build_import_album_info detected an album using ``total_tracks > 1`` AND ``album_name != track_title``. Spotify singles fail both — total_tracks is 1 and the album is usually named after the song. The result was that staging/auto-import routed singles through single_path, which doesn't honour $albumtype, so the user's per-type folder layout never applied. Now also treats the metadata source's explicit release-type classification ("single", "ep", "compilation") as evidence that this is an album-shaped release, so it routes through album_path and the user's $albumtype substitution runs. The default fallback value "album" is deliberately excluded from this check so single-track downloads with no real metadata behave exactly as before. Adds 10 regression tests covering the reported scenario, EP and compilation explicit types, and three guards: normal multi-track albums still detected, default 'album' type falls through, and empty/unknown types fall through.	4 weeks ago
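The widened detection rule from 99a38a6201 can be illustrated with a small predicate. This is a sketch under assumptions, not the body of build_import_album_info: the function name `is_album_shaped` and its parameter names are hypothetical, and only the logic stated in the commit message is reproduced.

```python
from typing import Optional

# Release types the commit message names as explicit evidence. The default
# fallback "album" is deliberately absent, so tracks with no real metadata
# behave exactly as before.
EXPLICIT_RELEASE_TYPES = {"single", "ep", "compilation"}


def is_album_shaped(total_tracks: int, album_name: str, track_title: str,
                    album_type: Optional[str]) -> bool:
    """Decide whether an import should be routed through album_path."""
    # Original heuristic: more than one track and a distinct album name.
    looks_like_album = total_tracks > 1 and album_name != track_title
    # New evidence: the metadata source explicitly classified the release.
    explicit_type = (album_type or "").strip().lower() in EXPLICIT_RELEASE_TYPES
    return looks_like_album or explicit_type
```

With this rule a one-track Spotify single named after its song is album-shaped when the source says "single", while the same track with the default "album" type or an empty type still falls through to single_path.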
BoulderBadgeDad	58a4c1905b	Merge pull request #419 from kettui/refactor/metadata-service-split-and-metadata-client-management-optimizations Split metadata service logic into separate modules, move client management out of web_server	4 weeks ago
Antti Kettunen	50e1ae3a3f	Move metadata helpers into package modules - split metadata lookup logic into core/metadata/* - keep core/metadata_service.py as the legacy barrel - update tests and artist-detail code to patch concrete modules	4 weeks ago
Broque Thomas	d97d105b97	fix: substitute $cdnum in download paths and skip auto disc folder when template uses it User report: multi-disc albums on the latest dev had literal "$cdnum" in their filenames instead of the expected "CDxx" label, plus a redundant "Disc N" folder on top of the in-filename label. Two bugs in core/imports/paths.py: 1. _replace_template_variables (the substitution helper used by every download path builder) had no handling for $cdnum or ${cdnum}. The matching helper in web_server.py and core/repair_jobs/library_reorganize.py did the substitution; this one didn't, so production downloads passed the placeholder through unchanged. Added a cdnum_value computation (CD%02d when total_discs > 1, empty otherwise) plus the corresponding bracket_map entry and $cdnum replace before $track (matches the ordering in the other path builders). 2. The album-path branch of build_final_path_for_track auto-injected a "Disc N" folder whenever total_discs > 1, suppressed only when the template contained $disc. Templates using $cdnum (or ${disc} / ${discnum} / ${cdnum}) got both a "CDxx" label in the filename and the auto folder. Widened the user_controls_disc check to cover all the disc-bearing placeholders. Bonus cleanup along the way: - Folder-part stripping now drops a leading $cdnum token (mirrors the existing $disc / $discnum / $quality strip; defensive against an empty cdnum landing alone in a folder segment). - Filename cleanup now strips a leading " - " left behind when $cdnum expands to empty on a single-disc album (mirrors the same regex in library_reorganize.py). - album_template config access switched from the dotted-path key to the nested-dict access pattern used by the rest of the function, which handles both production config_manager and the flat _Config used in tests.
Tests: 4 new under tests/imports/test_import_paths.py - multi-disc cdnum substitution produces "CD02" - single-disc cdnum collapses to empty - folder-part containing only $cdnum is dropped - build_final_path_for_track with $cdnum template produces no auto "Disc N" folder Full suite: 1276 passing (was 1272). Ruff clean.	4 weeks ago
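The cdnum handling described in d97d105b97 can be sketched as below. This is a simplified illustration rather than the code in core/imports/paths.py: `render_filename` and `template_controls_disc` are hypothetical names, only three placeholders are handled, and the cleanup regex is an assumed equivalent of the one the commit mentions.

```python
import re

# Any disc-bearing placeholder, bare or bracketed.
_DISC_PLACEHOLDER = re.compile(r"\$\{?(?:discnum|disc|cdnum)\}?")


def cdnum_value(disc_number: int, total_discs: int) -> str:
    """'CDxx' label for multi-disc releases, empty for single-disc ones."""
    return f"CD{disc_number:02d}" if total_discs > 1 else ""


def template_controls_disc(template: str) -> bool:
    """True when the template places the disc itself, so no automatic
    'Disc N' folder should be injected."""
    return bool(_DISC_PLACEHOLDER.search(template))


def render_filename(template: str, disc_number: int, total_discs: int,
                    track_number: int, title: str) -> str:
    """Substitute cdnum before track, then tidy a dangling separator."""
    values = [
        ("cdnum", cdnum_value(disc_number, total_discs)),
        ("track", f"{track_number:02d}"),
        ("title", title),
    ]
    out = template
    for name, value in values:
        # Bracketed form first, then the bare form.
        out = out.replace("${" + name + "}", value).replace("$" + name, value)
    # An empty cdnum leaves a leading " - " behind on single-disc albums.
    return re.sub(r"^\s*-\s+", "", out).strip()
```

For the template `$cdnum - $track - $title`, disc 2 of 2 renders as `CD02 - 03 - Intro`, and a single-disc album renders as `03 - Intro`.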
Antti Kettunen	f32fc9d56e	Extract wishlist logic into dedicated package - add core/wishlist as the home for wishlist payload, resolution, state, processing, reporting, and selection helpers - move wishlist-specific tests into tests/wishlist alongside the new package layout - keep web_server.py and the import/search callers as thin adapters for now	4 weeks ago
Broque Thomas	313b5677a5	Drop stale post-PR378 redefs and fix B009 Lifted-then-not-deleted leftovers from the PR378 merge: - web_server.py `_resolve_album_group` and `_build_final_path_for_track` were already imported at module top from `core/imports/`. Removed the shadowing local copies. - Mutagen reimports (FLAC/MP4/OggVorbis) at L17736-17738 shadowed the top-of-file imports. Picture/MP4Cover/MP4FreeForm were unused. Dropped the whole block. - core/imports/context.py: `getattr(artist, "name")` -> `artist.name` (B009). Ruff clean, 667 tests pass.	4 weeks ago
Antti Kettunen	02305096a3	Tighten metadata and import safety - Normalize album import track display handling so queue labels and match rows stay consistent - Bound MusicBrainz caches and avoid caching transient lookup failures - Stop swallowing programmer errors in source enrichment helpers - Restore import config test seams without reintroducing lazy imports - Guard task completion calls and fix the Windows path test expectation - Keep file lock tracking from growing without bound	4 weeks ago
Antti Kettunen	9315e74bea	Broaden import and metadata test coverage - Cover search_result fallback normalization and ambiguous album detection. - Add staging metadata, multi-disc path, and MusicBrainz enrichment cases. - Move the single-track context test next to the imports code it exercises.	4 weeks ago
Antti Kettunen	4f236baa6d	Fix import normalization and task completion locking - Promote legacy _source into source during import normalization. - Keep the normalized import context neutral after stripping aliases. - Avoid re-entering tasks_lock when marking completed download tasks.	4 weeks ago
Antti Kettunen	4c819681a1	Move single-track resolver; fix wishlist cleanup - keep single-track import lookup in imports/resolution.py - normalize simple-download search_result data before wishlist matching - run wishlist cleanup for simple-download post-processing - keep source-only artist detail on resolved names and MB short-circuit	4 weeks ago
Antti Kettunen	d04573f397	Fix single import source handling - pass the selected manual match through singles import - keep the import context source-aware so artist and album stay correct - avoid treating non-Spotify IDs as wishlist Spotify IDs - make wishlist logging and local variable names source-neutral	4 weeks ago
Antti Kettunen	9b2b6d856f	Split runtime builders into owning modules - Move the import pipeline runtime factory into core.imports.pipeline - Move the metadata runtime factory into core.metadata.enrichment - Keep the web server wiring thin and drop the shared glue module - Add contract tests that keep the two runtime bundles separate	4 weeks ago
Antti Kettunen	9e496397da	Move shared metadata helpers into package - Relocate the shared metadata helper module from core/metadata_common.py into core/metadata/common.py. - Update the new metadata package, the import pipeline, and the web entrypoint to use the package-scoped helper. - Keep the shared config, mutagen, file-lock, and tag-writing helpers centralized without touching unrelated files.	4 weeks ago
Antti Kettunen	9656dbd46a	Thread runtime through metadata enrichment - Pass the live runtime bundle into the shared metadata facade so worker-backed source enrichment can actually run. - Forward runtime from the import pipeline and web-server wrapper into embed_source_ids. - Add a regression test that verifies the runtime object reaches the source-ID embedding path.	4 weeks ago
Antti Kettunen	8319c6679f	Move new metadata helpers into a package - Keep existing metadata_cache and metadata_service at the top level for now - Move the new branch-local metadata helpers under core/metadata - Share MusicBrainz release cache state from core.metadata.source and update import sites	4 weeks ago
Antti Kettunen	bdef127dd6	Lift shared runtime state into core - Move app-wide task and activity registries out of core/imports - Share one runtime-state module across the web server, API, and import pipeline - Keep import-specific helpers focused on context and post-processing	4 weeks ago
Antti Kettunen	e10df4caf2	Rehome import helpers into core/imports - Move import flow modules into a dedicated package - Update app and test imports to the new namespace - Group the import-focused tests under tests/imports	4 weeks ago

47 Commits (dev)