SoulSync

Commit Graph

Author	SHA1	Message	Date
Broque Thomas	6c90d68de3	Discogs: count rows with empty type_ as real tracks too
Reported by kettui on PR #374 review: the inline filter that backed `set_album_api_track_count` only counted rows where `type_ == 'track'`, but `discogs_client.get_album_tracks` itself accepts both `'track'` AND empty `type_` as real songs (line 660: `type_ in ('track', '')`). Releases where Discogs returns some real tracks with an empty `type_` field would be undercounted, which would silently disagree with the repair job's fallback `_get_expected_total` path (which calls into `get_album_tracks_for_source` and therefore uses the client's count).
Extracted the filter into `count_discogs_real_tracks(tracklist)`: a single source of truth for the rule, testable in isolation, and the worker call site is now a one-liner that names what it's doing. Also defensive about the input shape: `type_ == None`, a missing field, and an empty/None tracklist are all handled cleanly.
10 tests pin the behavior:
- empty/missing/None type_ all count as a real track (the kettui case)
- 'heading', 'index', 'sub_track' excluded
- unknown future type strings excluded conservatively
- realistic multi-disc tracklist with mixed shapes counts correctly
- empty/None input returns 0 without raising
Credit: kettui, whose PR #374 review comment flagged this.	1 month ago
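The counting rule above fits in a few lines. What follows is a minimal Python sketch of it, not the repository's actual `count_discogs_real_tracks`. It assumes each tracklist row is a dict shaped like Discogs API JSON with a `type_` key. A missing key or `None` is treated the same as an empty string, and any other type string, known or unknown, is excluded.

```python
from collections.abc import Iterable, Mapping
from typing import Any, Optional

# Types that count as real songs: 'track' plus the empty type_ the client accepts.
_REAL_TRACK_TYPES = frozenset({"track", ""})


def count_discogs_real_tracks(tracklist: Optional[Iterable[Mapping[str, Any]]]) -> int:
    """Count tracklist rows that represent actual songs.

    Assumption: rows look like Discogs API JSON, e.g.
    {"position": "A1", "type_": "track", "title": "..."}.
    """
    if not tracklist:
        return 0  # empty list or None: nothing to count, never raise
    count = 0
    for row in tracklist:
        type_ = row.get("type_")  # a missing key yields None
        if type_ is None:
            type_ = ""  # None and missing behave like an empty type_
        if type_ in _REAL_TRACK_TYPES:
            count += 1
        # 'heading', 'index', 'sub_track' and unknown future types fall through
    return count
```

Using an allow-list means a type string that Discogs introduces later is excluded by default. That is the conservative behavior the tests describe.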
Broque Thomas	a60546929e	Fix Album Completeness job reporting zero findings for every album
Reported by sassmastawillis: the Album Completeness maintenance job scans 3127 albums in 0.1 seconds and reports 0 findings for every user, regardless of whether their library is actually complete. Restoring an older DB surfaced 7 correct findings, so the code logic works; the DB state is what's making everything look complete.
Root cause: `albums.track_count` is only ever written by server-sync paths (Plex's `leafCount`/`childCount` and SoulSync standalone's `len(tracks)`). It's the OBSERVED count of tracks SoulSync has indexed, which is always exactly what `COUNT(tracks)` returns for that album. The completeness job treated it as the EXPECTED total and compared it against the observed count. They're equal by construction, so `actual >= expected` is always true: skip, 0.1s scan, 0 findings.
Fix: new `api_track_count INTEGER` column on `albums`, written only by metadata-source code paths. It is populated in two places so the scan is fast and the fallback is robust.
1. Enrichment workers: shared helper `set_album_api_track_count` in `core/worker_utils.py`, called by each worker's existing `_update_album` method alongside its other album-column UPDATEs:
- spotify_worker: `album_obj.total_tracks` from the Spotify Album dataclass (already in hand, zero new API calls)
- itunes_worker: same, from the iTunes Album dataclass
- deezer_worker: `nb_tracks` from full_data, falling back to search_data when the full lookup didn't run
- discogs_worker: count of tracklist rows where `type_=='track'` (Discogs tracklists interleave heading and index rows that shouldn't count as songs)
The helper skips the write on zero/None/negative/non-numeric inputs so a source lacking track info can't clobber a good value a different source already wrote. The caller owns the transaction: the helper just queues an UPDATE on the caller's cursor without committing, so it batches cleanly with each worker's existing multi-UPDATE pattern.
The Hydrabase worker is deliberately not touched: it's a P2P mirror that doesn't write album metadata to the local DB. Hydrabase-primary users hit the fallback path below.
2. Album Completeness repair job: new `al.api_track_count` column in the SELECT, read first in the scan loop. On a miss (album never enriched, or enrichment workers haven't run yet on a fresh install), it falls through to the existing `_get_expected_total()` API lookup and persists the result via the same shared helper (wrapped in connection/commit management, since the repair job runs outside a worker's batched transaction).
Also removed `al.track_count` from the scan's SELECT. It is now unused, since comparing against the observed count was the whole source of this bug, and leaving a dead SELECT would invite a future engineer to re-introduce the same comparison.
The help text on the job card was reworded so it honestly describes current behavior ("counts cached during normal enrichment are used when available; otherwise the job queries a metadata source directly") rather than the old "active provider first, then others as fallback" phrasing, which doesn't match how the cache actually fills: any enrichment worker that runs can populate it, and the last writer wins.
Document-only follow-up if this edge case ever bites in practice: add an `api_track_count_source` column so the scan can prefer the configured primary source's count over others (e.g. deluxe vs. standard edition mismatches). Not worth the complexity today.
For existing users, the first completeness scan after upgrade is fast to the extent their library is already enriched: the workers already ran and populated `api_track_count` on their normal schedule. For brand-new installs, the scan's fallback path handles the cold start. That is slower, but correct, and subsequent scans are fast.
Does NOT affect:
- Download / post-processing / wishlist / sync code paths: none of them read `track_count` for completeness semantics.
- Plex / Jellyfin / Navidrome / standalone sync: still write `track_count` exactly as before; `api_track_count` is a separate column they never touch.
- Other repair jobs.
- Any UI path: same finding schema, just correct counts now.
Files:
- database/music_database.py: idempotent migration adding `api_track_count INTEGER DEFAULT NULL` to the existing album-column check block.
- core/worker_utils.py: new `set_album_api_track_count` helper with the documented skip-on-bad-input contract.
- core/spotify_worker.py, itunes_worker.py, deezer_worker.py, discogs_worker.py: one-liner call from each `_update_album`.
- core/repair_jobs/album_completeness.py: scan uses the cache; fallback path persists API-lookup results via the shared helper; help text updated to match actual behavior.
- tests/test_worker_utils_album_track_count.py: 9 tests covering the helper's write/skip contract + no-commit invariant.
- tests/test_album_completeness_job.py: 2 tests for the repair job's fallback-path wrapper.
- webui/static/helper.js: WHATS_NEW entry.
Credit: sassmastawillis spotted the bug; the "restored older DB finds 7 albums" signal pinpointed DB state over code logic and made the diagnosis tractable.	1 month ago
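The commit above has three parts: the idempotent migration, the write/skip helper that never commits, and the cache-first lookup with a persisting fallback. Below is a sketch of all three against SQLite's `sqlite3` module. Several details are assumptions rather than the repository's code. The `(cursor, album_id, count)` signature for `set_album_api_track_count` is assumed, as is an `albums` table keyed by integer `id`. `ensure_api_track_count_column`, `resolve_expected_total`, and the `lookup` callable are hypothetical names, and `lookup` stands in for `_get_expected_total()`.

```python
import sqlite3
from typing import Any, Callable, Optional


def ensure_api_track_count_column(conn: sqlite3.Connection) -> None:
    """Idempotent migration: add albums.api_track_count only if it is missing."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    columns = {row[1] for row in conn.execute("PRAGMA table_info(albums)")}
    if "api_track_count" not in columns:
        conn.execute("ALTER TABLE albums ADD COLUMN api_track_count INTEGER DEFAULT NULL")


def set_album_api_track_count(cursor: sqlite3.Cursor, album_id: int, count: Any) -> bool:
    """Queue an UPDATE on the caller's cursor without committing (assumed signature).

    Skips None, zero, negative, and non-numeric input so a source that lacks
    track info cannot clobber a value another source already wrote.
    Returns True only when an UPDATE was issued.
    """
    try:
        value = int(count)
    except (TypeError, ValueError, OverflowError):
        return False  # None, "abc", NaN, inf, ...
    if value <= 0:
        return False
    cursor.execute("UPDATE albums SET api_track_count = ? WHERE id = ?", (value, album_id))
    return True  # the caller owns the transaction and decides when to commit


def resolve_expected_total(
    conn: sqlite3.Connection,
    album_id: int,
    cached: Optional[int],
    lookup: Callable[[int], Optional[int]],
) -> Optional[int]:
    """Hypothetical cache-first expected total that persists API results on a miss."""
    if cached is not None and cached > 0:
        return cached  # enrichment already stored it: no API call needed
    fetched = lookup(album_id)  # hypothetical stand-in for _get_expected_total()
    # The repair job runs outside a worker's batch, so it commits its own write
    if set_album_api_track_count(conn.cursor(), album_id, fetched):
        conn.commit()
    return fetched
```

Keeping the commit out of the helper lets enrichment workers batch it with their other UPDATEs. Only the repair-job wrapper, which has no surrounding transaction, commits.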
Broque Thomas	288776a7f3	Add genre whitelist for filtering junk tags during enrichment
New core/genre_filter.py with ~180 curated default genres. When strict mode is enabled in Settings → Library Preferences → Genre Whitelist, only whitelisted genres pass through during enrichment. Junk tags from Last.fm (artist names, radio shows, playlist names) are silently dropped.
Applied at all 10 genre write points: Spotify, Last.fm, AudioDB, Deezer, Discogs, iTunes, Qobuz enrichment workers + post-processing genre merge + initial download artist/album creation.
Strict mode is OFF by default, so there is zero behavior change for existing users. The first time it is enabled, the whitelist is auto-populated with the defaults. Users can add, remove, search, and reset genres via the Settings UI.	1 month ago
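The whitelist rule can be shown as a small filter. This is a hedged Python sketch, not the contents of `core/genre_filter.py`. `filter_genres` and the five-entry `DEFAULT_GENRES` set are hypothetical placeholders for the real ~180 curated defaults. Case-insensitive matching is an assumption the commit does not state.

```python
from typing import Iterable, List, Optional, Set

# Hypothetical stand-in for the ~180 curated defaults in core/genre_filter.py
DEFAULT_GENRES: Set[str] = {"rock", "jazz", "hip hop", "electronic", "classical"}


def filter_genres(
    genres: Iterable[str],
    whitelist: Optional[Set[str]] = None,
    strict: bool = False,
) -> List[str]:
    """Drop genres that are not on the whitelist when strict mode is on."""
    genres = list(genres)
    if not strict:
        return genres  # strict mode off (the default): behavior unchanged
    # A whitelist that was never configured falls back to the defaults
    source = DEFAULT_GENRES if whitelist is None else whitelist
    allowed = {g.strip().lower() for g in source}  # case-insensitive match (assumption)
    # Junk tags such as artist names or radio shows are silently dropped
    return [g for g in genres if g.strip().lower() in allowed]
```

An explicitly empty whitelist is treated as "allow nothing" rather than as "use the defaults". That keeps a deliberate user choice distinct from a never-configured one.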
Antti Kettunen	aec3047216	Improve graceful shutdown and rollback safety
- Add interruptible stop events to background workers so shutdown wakes out of long sleeps instead of waiting on fixed delays.
- Stop scan managers, repair worker, executors, and cleanup helpers deterministically so process exit does not leave background threads alive.
- Add startup warnings for stale SQLite WAL/SHM sidecars so unclean shutdowns are easier to spot before init/migration errors cascade.
- Prevent forced kills from leaving SQLite sidecars behind, which made rollbacks to older branches fail with malformed database errors.	1 month ago
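Two ideas from this commit can be sketched in Python. The first is a worker whose sleep is `threading.Event.wait`, so setting the event wakes it immediately rather than after the full delay. The second is a startup check for `-wal`/`-shm` files. In WAL mode SQLite normally removes these when the last connection closes cleanly, so finding them before the database is opened hints at an unclean shutdown. `InterruptibleWorker` and `stale_sqlite_sidecars` are hypothetical names, not the project's classes.

```python
import os
import threading
from typing import List


class InterruptibleWorker:
    """Background loop whose sleep is an Event.wait, so stop() wakes it at once."""

    def __init__(self, interval_seconds: float) -> None:
        self.interval_seconds = interval_seconds
        self.iterations = 0
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, name="background-worker")

    def start(self) -> None:
        self._thread.start()

    def stop(self, timeout: float = 5.0) -> bool:
        """Signal the loop and join it; True means the thread really exited."""
        self._stop_event.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()

    def _run(self) -> None:
        while not self._stop_event.is_set():
            self.iterations += 1  # placeholder for one unit of real work
            # Unlike time.sleep, wait() returns True early once the event is set
            if self._stop_event.wait(self.interval_seconds):
                break


def stale_sqlite_sidecars(db_path: str) -> List[str]:
    """List -wal/-shm files found next to the database; call before opening it."""
    candidates = (db_path + "-wal", db_path + "-shm")
    return [path for path in candidates if os.path.exists(path)]
```

Because `stop()` joins with a timeout and reports whether the thread is still alive, shutdown code can log a worker that failed to exit instead of hanging on it.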
Broque Thomas	240dd87727	Fix Discogs cache: add field extractor, wire worker caching, browser UI
- Add _extract_discogs_fields to metadata cache; handles Discogs field names (title vs name, images array, Artist - Title format)
- Worker uses _fetch_and_cache_artist/_fetch_and_cache_album helpers that cache raw data while returning it for enrichment
- All search/lookup methods cache results for repeat queries
- Cache browser: Discogs stat pill, source filter, clear button, badge
- Fixes albums showing as 'Unknown' and artists missing images in cache	2 months ago
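The field differences listed in this commit, together with the cache-while-returning pattern, can be illustrated in Python. This sketch is not the project's `_extract_discogs_fields` or `_fetch_and_cache_*` helpers. It assumes Discogs API JSON shapes: releases carry `title`, artists carry `name`, search results format titles as "Artist - Title", and `images` entries carry `type` and `uri`. The output keys and `fetch_and_cache` are hypothetical.

```python
from typing import Any, Callable, Dict, Optional

Payload = Dict[str, Any]


def extract_discogs_fields(data: Payload) -> Dict[str, Optional[str]]:
    """Normalize a Discogs artist or release payload (output keys are hypothetical)."""
    artist: Optional[str] = None
    title = data.get("title")
    if title:
        name: Optional[str] = title
        # Search results format titles as "Artist - Title"; split on the first " - "
        if " - " in title:
            artist, name = title.split(" - ", 1)
    else:
        name = data.get("name")  # artist payloads use name instead of title
    images = data.get("images") or []
    # Prefer the image flagged primary, else the first one listed
    chosen = next((img for img in images if img.get("type") == "primary"), None)
    if chosen is None and images:
        chosen = images[0]
    return {"name": name, "artist": artist, "image_url": chosen.get("uri") if chosen else None}


def fetch_and_cache(cache: Dict[str, Payload], key: str,
                    fetch: Callable[[], Optional[Payload]]) -> Optional[Payload]:
    """Return the cached raw payload, or fetch it, cache it, and return it."""
    if key in cache:
        return cache[key]
    raw = fetch()
    if raw is not None:
        cache[key] = raw  # store raw data; the caller still enriches from the return value
    return raw
```

Splitting on the first " - " is a heuristic. A release title that itself contains " - " would be split at the artist boundary only if the artist name has no " - " of its own.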
Broque Thomas	b68aa09469	Add Discogs enrichment worker with full metadata extraction
- New core/discogs_worker.py: background worker enriching artists and albums with Discogs metadata, following the AudioDBWorker pattern exactly
- Artist enrichment: discogs_id, bio, members, URLs, image backfill, genre backfill, summary backfill from bio
- Album enrichment: discogs_id, genres, styles (400+ taxonomy), label, catalog number, country, community rating, image backfill
- DB migration: discogs columns on artists (id, match_status, bio, members, urls) and albums (id, match_status, genres, styles, label, catno, country, rating, rating_count)
- Worker initialization with pause/resume persistence
- Status/pause/resume API endpoints
- Integrated into enrichment status system, rate monitor, auto-pause during downloads/scans, WebSocket status emission	2 months ago
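The album-enrichment bullet amounts to mapping a Discogs release payload onto the new album columns. The Python sketch below assumes the Discogs release JSON fields `id`, `genres`, `styles`, `labels[].name`, `labels[].catno`, `country`, and `community.rating.average`/`count`. The `discogs_*` column names and the comma-joined storage of genres and styles are hypothetical, not the worker's actual schema.

```python
from typing import Any, Dict


def discogs_album_columns(release: Dict[str, Any]) -> Dict[str, Any]:
    """Map a Discogs release payload to album columns (column names hypothetical)."""
    labels = release.get("labels") or []
    first_label = labels[0] if labels else {}  # assume the first label is the main one
    rating = (release.get("community") or {}).get("rating") or {}
    return {
        "discogs_id": release.get("id"),
        # Storage as a comma-joined string is an assumption; empty lists become None
        "discogs_genres": ", ".join(release.get("genres") or []) or None,
        "discogs_styles": ", ".join(release.get("styles") or []) or None,
        "discogs_label": first_label.get("name"),
        "discogs_catno": first_label.get("catno"),
        "discogs_country": release.get("country"),
        "discogs_rating": rating.get("average"),
        "discogs_rating_count": rating.get("count"),
    }
```

Every lookup tolerates a missing key, so a sparse release yields `None` columns instead of raising inside the worker loop.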

6 Commits (a9dcd60d3f0465b44bb80c8ff8fcc6c80886e9f7)