Foundation commit for issue #442 — Japanese kanji ↔ romanized name
quarantines and equivalent cross-script mismatches. MusicBrainz
exposes alternate-spelling aliases on every artist record but
SoulSync's matching never consulted them; cross-script comparison
scored 0% on raw similarity and the file got quarantined even when
MusicBrainz knew both names belonged to the same artist.
This commit only adds the column. Subsequent commits in this PR:
- Build a pure alias-aware artist comparison helper
- Wire the MusicBrainz worker to populate aliases on enrichment
- Add a live MB lookup with cache for un-enriched artists
- Wire the helper into the AcoustID verifier where the quarantine
decision actually fires
Schema change is additive (NULL default), gated by the same
`PRAGMA table_info` check the existing `_add_musicbrainz_columns`
helper uses, so re-running on databases that already have the
column is a no-op.
Verified:
- New `artists.aliases` column present in fresh DB init
- JSON round-trip works (mirrors the existing `genres` column pattern)
- No existing tests broken
Catches the silent excepts the awk-based earlier sweeps missed:
- Bare `except:` followed by `pass` (also swallows KeyboardInterrupt
and SystemExit — actively wrong). Upgraded to `except Exception as
e: logger.debug("...: %s", e)`. ~14 sites across connection_detect,
soulseek_client, listenbrainz_manager, watchlist_scanner,
youtube_client, navidrome_client, jellyfin_client, web_server.
- `except Exception:` + pass that the awk pattern missed (e.g.
multi-line or unusual whitespace). ~31 sites across automation_engine,
database_update_worker, music_database, spotify_client, web_server,
others.
- 14 legitimate cleanup sites left silent with explicit `# noqa: S110`
+ comment explaining why (atexit handlers, finally-block conn.close
calls). Logging during shutdown can itself crash because file handles
get torn down before the handler fires.
Also enables `S110` rule in pyproject.toml so this pattern fails CI
going forward — drift fails at PR review instead of at runtime against
a wedged worker thread. Tests path keeps S110 ignored (test fixtures
legitimately use try-except-pass for cleanup).
Adds a WHATS_NEW entry to helper.js summarizing the full #369 sweep.
Verified: `python -m ruff check .` → All checks passed.
Verified: `python -m pytest tests/` → 2188 passed.
Closes#369
Mostly schema-migration ALTER TABLE fallbacks (column-already-exists
is the silent expected case) plus a few cache-purge/notify-migration
spots. Same pattern as the web_server sweep: `except Exception as e:
logger.debug("...: %s", e)`.
Refs #369
GitHub issue #503 (@hadshaw21). Adding a HiFi instance via downloader
settings popped up ``no such table: hifi_instances`` even though
"Test Connection" and "Check All Instances" both worked.
Root cause: ``MusicDatabase._initialize_database`` runs every
``CREATE TABLE`` + every migration step inside one sqlite transaction.
Python's sqlite3 module doesn't autocommit DDL by default, so if any
later migration step throws on a user's specific DB shape (e.g. an
old volume from a prior SoulSync version with quirky schema state),
the WHOLE batch rolls back — including the ``hifi_instances`` CREATE
that ran earlier in the function. The user's next boot retries init,
hits the same migration failure, rolls back again. The ``hifi_instances``
table never lands no matter how many restarts.
Fix: defensive lazy-create. New ``_ensure_hifi_instances_table(cursor)``
helper runs ``CREATE TABLE IF NOT EXISTS`` on demand, called immediately
before every CRUD operation that touches ``hifi_instances``:
- ``get_hifi_instances`` / ``get_all_hifi_instances`` (read)
- ``add_hifi_instance`` / ``remove_hifi_instance`` (CRUD)
- ``toggle_hifi_instance`` / ``reorder_hifi_instances`` (CRUD)
- ``seed_hifi_instances`` (defaults seed)
Idempotent — costs one no-op CREATE check when the table is already
present, fully recovers from a broken init state. Read methods now
return empty instead of raising when init failed; write methods work
end-to-end.
Doesn't paper over the underlying init issue (still worth tracking
which migration step breaks for which user DB shapes — separate
concern) but makes HiFi instance management self-healing in the
meantime.
Tests:
- 7 obsolete tests that pinned ``raises sqlite3.OperationalError``
removed — that contract is no longer correct
- 7 new tests pin the lazy-create behavior: every CRUD method works
against a DB that's missing the ``hifi_instances`` table, verifying
the table gets created and the operation completes
2162/2162 full suite green. Pure additive — no behavior change for
users with a healthy DB; affected users get back to working hifi
instance management.
Closes#503.
Discord request: pull user's Discogs collection into the Your Albums
section on Discover, similar to how Spotify Liked Albums works.
Implementation extends the existing 3-source pipeline (Spotify /
Tidal / Deezer) to a 4-source pipeline with click-context dispatch —
Discogs-only albums open with rich Discogs release detail (vinyl/CD
format, year, label, country, tracklist). Mirrors the per-source
dispatch pattern from enhanced/global search.
Discogs client (`core/discogs_client.py`):
- New `get_authenticated_username()` resolves the username for the
configured personal token via Discogs's `/oauth/identity` endpoint.
Cached on the instance so subsequent collection page-fetches don't
re-hit it.
- New `get_user_collection(username=None, folder_id=0, per_page=100,
max_pages=50)` walks all pages of `/users/{username}/collection/
folders/{folder_id}/releases`. Returns normalized dicts ready for
upsert_liked_album. folder_id=0 = Discogs's "All" folder.
Pagination cap of max_pages*per_page = 5000 releases — bounds
runtime on heavy collections.
- New `get_release(release_id)` thin wrapper for `/releases/{id}` —
returns the raw API response so the album-detail endpoint can
render rich context.
- Both methods defensive: missing token → empty list, malformed
responses → skipped, falsy ids → None. Disambiguation suffix
stripping (`Madonna (3)` → `Madonna`) so Discogs artist names
match what Spotify/Tidal/Deezer use.
Schema (`database/music_database.py`):
- New `discogs_release_id TEXT` column on `liked_albums_pool`.
Migration uses the established `try SELECT, except ALTER TABLE`
pattern. Idempotent; safe on existing installs.
- Added the column to the canonical CREATE TABLE for fresh installs.
- `upsert_liked_album` extended with `'discogs': 'discogs_release_id'`
in BOTH the INSERT and UPDATE id-column maps so Discogs source_id
routes to the new column. INSERT statement column count + value
count updated together.
Backend (`web_server.py`):
- `/api/discover/your-albums/sources` — adds Discogs to the
`connected` list when `discogs.token` config is set.
- `_fetch_liked_albums` — new branch for Discogs. Lazy-imports
DiscogsClient, respects the `enabled_sources` config, walks the
collection, upserts each release. Same try/except shape as the
existing source branches.
- `/api/discover/album/<source>/<album_id>` — new `discogs` branch
fetches the release via DiscogsClient.get_release, normalizes the
Discogs tracklist format, parses Discogs's `MM:SS`/`HH:MM:SS`
duration strings to milliseconds, returns the same response shape
as the Spotify/Deezer/iTunes branches.
Frontend (`webui/static/discover.js`):
- `openYourAlbumsSourcesModal` — adds Discogs to `sourceInfo` with
the vinyl emoji icon. Existing toggle/save plumbing handles it.
- `openYourAlbumDownload` — restructured the per-source dispatch:
builds an ordered list of (source, id) tuples, tries each in turn,
breaks on the first successful response. Pure-Discogs albums go
straight to the Discogs detail endpoint → modal opens with Discogs
context. Multi-source albums prefer Spotify/Deezer first since
their tracklists carry proper streaming IDs ready for download.
Tests: `tests/test_discogs_collection_source.py` — 12 cases:
- get_user_collection: empty without token, normalizes response
shape, strips disambiguation suffix, handles missing year, skips
malformed releases, paginates correctly, caps at max_pages,
uses explicit username when provided.
- get_release: passes id through to /releases/{id}, returns None
for invalid ids without API call.
- liked_albums_pool: discogs_release_id round-trips through upsert
+ get; multi-source dedup carries both Spotify and Discogs IDs
on the same row.
Verified: full suite 1825 pass (12 new), ruff clean, smoke test
populating + reading the discogs_release_id column round-trips
correctly via the real DB.
WHATS_NEW entry under '2.4.2' dev cycle.
Discord request (Samuel [KC]): show how much disk space the library
takes on the Stats page. Implementation piggybacks on the existing
deep scan — Plex/Jellyfin/Navidrome all return file size in their
track API responses, so we read it during the deep scan and store
it on the tracks row. Aggregation is then a single SQL query — no
filesystem walk, no extra I/O during the scan, no separate stat
job. SoulSync standalone gets size from os.path.getsize at insert
time (different code path; the file is local when we write the row).
Schema (`database/music_database.py`):
- New `file_size INTEGER` column on `tracks`. Migration uses the
established `try SELECT, except ALTER TABLE ADD COLUMN` pattern.
Idempotent; safe on existing installs. NULL on legacy rows so
they don't contribute to totals until next deep scan refreshes.
- Added the column to the canonical CREATE TABLE so fresh installs
get it without going through the migration path.
Track-object plumbing:
- `core/jellyfin_client.py` — JellyfinTrack reads MediaSources[0].Size
alongside existing Bitrate read. None when 0 / missing.
- `core/navidrome_client.py` — NavidromeTrack reads `size` from
the Subsonic song object (int coercion + None on parse fail).
- `core/soulsync_client.py` — SoulSyncTrack does os.path.getsize
(only "server" where size has to come from disk).
- Plex needs no client-side change: track.media[0].parts[0].size
is read directly inside insert_or_update_media_track.
Persistence — TWO separate insert paths:
(a) `database/music_database.py:insert_or_update_media_track` —
Plex/Jellyfin/Navidrome flows. Reads file_size from Plex's
MediaPart OR `track_obj.file_size` wrapper attribute (defensive
Plex-attr-not-present check + > 0 type guard).
INSERT writes the new column.
UPDATE uses COALESCE(?, file_size) so a None from the server
on a re-sync (rare Jellyfin Size omission) doesn't blank an
existing value. Pinned via test.
(b) `core/imports/side_effects.py:record_soulsync_library_entry` —
SoulSync standalone flow. Completely separate code path: the
standalone deep scan moves files to staging for auto-import
rather than calling insert_or_update_media_track. After the
auto-import processes them, side_effects writes the tracks row
directly. Reads file_size via os.path.getsize(final_path) at
insert time (file is local) and includes it in the INSERT
column list. SoulSync only does INSERT-if-not-exists (no
UPDATE path), so no COALESCE concern.
Aggregator (`database/music_database.py:get_library_disk_usage`):
- SELECT COALESCE(SUM(file_size), 0), COUNT(file_size),
COUNT(*) - COUNT(file_size) for the totals.
- Per-format breakdown done in Python via os.path.splitext over
(file_path, file_size) rows — sidesteps SQLite's first-vs-last-dot
ambiguity for paths like /music/Kendrick/M.A.A.D City/01.flac.
- Defensive: skips empty paths, paths without extension, and
implausibly long extensions (>6 chars). Returns the full
empty-shape dict (NOT a partial / undefined) when the column
doesn't exist or queries fail, so the UI's `if (!data.has_data)`
branch handles fresh installs cleanly.
API + UI:
- `core/stats/queries.py` — thin pass-through get_library_disk_usage
matching the existing query-helper convention.
- `web_server.py` — new /api/stats/library-disk-usage endpoint
mirroring the /api/stats/db-storage pattern.
- `webui/index.html` — new card in System Statistics above the
Database Storage card.
- `webui/static/stats-automations.js` — _loadLibraryDiskUsage +
_renderLibraryDiskUsage. Empty state: "Run a Deep Scan to
populate (X tracks pending)". Partial: "X measured (+Y pending)".
Full: total + format bars proportional to the largest format.
- `webui/static/style.css` — .stats-disk-* styled to match the
Database Storage card.
Backward compatibility:
- Migration is additive; existing rows get NULL file_size; the
empty-shape return from the aggregator means the UI renders
cleanly without errors before any deep scan runs.
- Old installs upgrading will see "Run a Deep Scan to populate
(N tracks pending)". Running their next deep scan fills sizes —
the existing scan flow doesn't need any changes, just consumes
the new track-wrapper attribute.
Tests:
- `tests/test_library_disk_usage.py` — 13 cases covering schema
migration, NULL defaults on legacy inserts, fresh-install empty
shape, summing with mixed NULL/known sizes, per-format breakdown,
mixed-case extensions, paths with album-name dots, missing
extensions, empty file_path, implausibly long extensions,
JellyfinTrack.file_size persistence via insert_or_update_media_track,
COALESCE preservation on null re-sync.
- `tests/imports/test_import_side_effects.py` — extended the
existing record_soulsync_library_entry test to assert
track_row['file_size'] == os.path.getsize(final_path), pinning
the SoulSync-standalone path. Test fixture's tracks schema also
updated to include the file_size column.
Verified: full suite 1813 pass (13 new, 1 existing-test extension),
ruff clean, smoke test populating + reading the column round-trips
correctly.
WHATS_NEW entry under '2.4.2' dev cycle.
Discord report (Samuel [KC]): tracks of the same album sometimes carry
different MUSICBRAINZ_ALBUMID tags, which causes Navidrome (and other
media servers grouping by album MBID) to split the album into multiple
entries. Two-part fix — one for existing libraries, one for the root
cause that lets new imports drift.
Part 1 — Detector + fix action (catches existing dissenters):
`core/repair_jobs/mbid_mismatch_detector.py`:
- New helpers: `_read_album_mbid_from_file` and
`_write_album_mbid_to_file` use the Picard-standard tag conventions
(`TXXX:MusicBrainz Album Id` for MP3, `MUSICBRAINZ_ALBUMID` for
FLAC/OGG, `----:com.apple.iTunes:MusicBrainz Album Id` for MP4).
- New scan phase `_scan_album_mbid_consistency` runs after the
existing track-MBID scan: groups tracks by DB `album_id`, reads
each track's embedded album MBID, finds the consensus
(most-common) MBID via `Counter`, flags dissenters. Tracks without
an album MBID at all are skipped (they don't break Navidrome —
only an explicit MBID disagreement does). Albums where MBIDs are
perfectly tied (no clear consensus) are skipped too — surface as
a manual decision instead of fixing toward a 1/N tie.
- New finding type `album_mbid_mismatch` carries `consensus_mbid`,
`wrong_mbid`, `consensus_count`, `total_tracks_with_mbid`, and a
human-readable reason string.
`core/repair_worker.py`:
- Added `'album_mbid_mismatch': self._fix_album_mbid_mismatch` to the
fix dispatch dict and to the `fixable_types` tuple so auto-fix +
bulk-fix paths pick it up.
- New `_fix_album_mbid_mismatch` method reads `consensus_mbid` from
finding details, resolves the dissenter's file path via the shared
library resolver, calls `_write_album_mbid_to_file` to rewrite the
tag in place. Doesn't touch the album's other tracks (they're
already in agreement).
Part 2 — Root cause fix (prevents new SoulSync imports from drifting):
The original in-memory `mb_release_cache` in `core/metadata/source.py`
maps `(normalized_album, artist) -> release_mbid` so per-track
enrichment of the same album hits the cache and writes the same
MUSICBRAINZ_ALBUMID to every track. That cache is bounded (4096
entries) and in-process — so cache eviction (when other albums are
processed in between) and server restart can BOTH cause
inconsistency. Per-track album-name variation (e.g. some tracks
tagged `"Album"`, others tagged `"Album (Deluxe)"`) and per-track
artist variation (features) make it worse.
`core/metadata/album_mbid_cache.py` (new module):
- DB-backed `lookup(normalized_album, artist) -> release_mbid` and
`record(...)` functions. Same key shape as the in-memory cache.
- Strict additive design: every public function is wrapped in
try/except and degrades to None / no-op on ANY database error.
The existing in-memory cache + MusicBrainz lookup remains the
authoritative fallback. If this module breaks, downloads continue
exactly as they would today.
`database/music_database.py`:
- New `mb_album_release_cache` table with composite primary key
`(normalized_album_key, artist_key)`. Reverse-lookup index on
`release_mbid` for future debug tooling. Created via the existing
`CREATE TABLE IF NOT EXISTS` migration pattern — idempotent, no
schema version bump needed.
`core/metadata/source.py`:
- Surgical change inside the existing `embed_source_ids`
in-memory-cache-miss branch: BEFORE calling MusicBrainz, consult
the persistent cache. If a previous SoulSync run already resolved
this album's release MBID, reuse it. After a successful MB lookup,
store in BOTH caches. Both calls wrapped in defensive try/except
so any failure falls through to existing logic.
Tests:
- `tests/metadata/test_album_mbid_cache.py` — 16 cache tests:
round-trip, idempotent re-record, overwrite semantics, clear_all,
album+artist independence (no Greatest Hits collisions),
defensive None-on-empty-input, graceful degradation when the DB
is unavailable / connection raises / commit fails, schema sanity
(table + index exist after init).
- `tests/test_album_mbid_consistency.py` — 13 detector tests:
tag read/write round-trip on real FLAC files, Picard-standard tag
descriptors, defensive paths (unreadable file, empty input),
detector behavior (agreement → no flags, lone dissenter → flag,
ties → no flag, single-track albums → skipped, no-MBID tracks →
skipped, unresolvable file paths → skipped).
- `tests/metadata/test_metadata_enrichment.py` — added autouse
fixture monkeypatching the persistent cache to no-op for tests in
this file. The existing tests pin per-call MB counts and
in-memory cache state; without the fixture, persistent rows from
earlier tests would bypass the MB call. Persistent layer has its
own dedicated tests.
Verified: 1782 tests pass (29 new), ruff clean, smoke test confirms
end-to-end cache round-trip works.
WHATS_NEW entry under '2.4.2' dev cycle.
Followup to fix/watchlist-external-id-match. The companion PR closed
the demand side — the watchlist scanner asks for tracks by external IDs
before falling back to fuzzy. But for users on Plex / Jellyfin /
Navidrome the supply side was still broken: tracks.spotify_track_id
(and the other ID columns) only got populated by the asynchronous
enrichment workers, sometimes hours after the file was actually
written. During that window the ID match fell through to fuzzy and
the bug returned.
We were already collecting every ID during post-processing — they
live in the `pp` dict in core/metadata/source.py:embed_source_ids and
get embedded into file tags. We just dropped the in-memory copy
afterwards.
This PR persists them and uses them:
- Schema migration adds spotify_track_id / itunes_track_id /
deezer_track_id / tidal_track_id / qobuz_track_id /
musicbrainz_recording_id / audiodb_id / soul_id / isrc columns +
indexes to the existing track_downloads table (already keyed by
file_path).
- core/metadata/source.py:embed_source_ids exposes pp["id_tags"] and
the resolved ISRC back to the import context as _embedded_id_tags
/ _isrc.
- core/imports/side_effects.py:record_download_provenance reads those
context fields and passes them to db.record_track_download, which
now accepts the new ID kwargs and persists them.
- New db.get_provenance_by_file_path with exact + basename-suffix
fallback (handles container mount-root differences between
download-time path and media-server-reported path).
- New db.backfill_track_external_ids_from_provenance copies IDs
from track_downloads onto a tracks row idempotently — COALESCE on
every column preserves any value the enrichment worker already
wrote (enrichment is more authoritative for late binding).
- database/music_database.py:insert_or_update_media_track (the
single insertion point used by every Plex / Jellyfin / Navidrome
sync) calls the backfill immediately after each INSERT/UPDATE.
- New core/library/track_identity.py:find_provenance_by_external_id
used as a second-tier fallback in watchlist_scanner.is_track_missing
_from_library — catches the window between download and media-server
sync. Caller checks os.path.exists on the provenance file_path
before treating it as "already in library" so a deleted file
doesn't prevent re-download.
Effect: freshly downloaded files become ID-recognizable to the
watchlist on the very next scan, no enrichment-wait window.
19 regression tests in tests/test_provenance_id_persistence.py:
- Schema migration adds expected columns + indexes
- record_track_download persists every ID kwarg
- record_track_download backward-compat (old kwargs still work)
- get_provenance_by_file_path: exact match, basename fallback for
mount-root differences, multi-record latest-wins, defensive None
- backfill: copies all IDs, preserves existing via COALESCE,
no-op when no provenance exists
- find_provenance_by_external_id: per-ID lookup, ISRC cross-bridge,
OR semantics, latest-wins on multiple matches
Out of scope: backfilling provenance for files downloaded BEFORE
this PR (their track_downloads rows don't carry the new IDs). Those
continue to wait for enrichment. Acceptable — only affects historical
files; new downloads benefit immediately.
Full pytest 1625 passed; ruff clean.
Discord-reported scenario: a single "Super Single" by Artist1 feat.
Artist2 is also on Artist1's "Super Album". When the album is fully
owned, Artist1's discography correctly shows the single as complete,
but Artist2's discography (where the same track also appears as a
single) shows it as missing.
Two layers needed for the fix:
Scanner: the Jellyfin/Emby path was keeping only ArtistItems[0],
which is almost always equal to the album artist — so the
distinguishing per-track credit was silently suppressed. Now joins
every ArtistItems entry with "; " and stores the value when there
are multiple credits OR when the single credit differs from the
album artist. Plex's originalTitle already carries the full multi-
artist tag, so Plex users benefit without needing the scanner change.
Scorer: _calculate_track_confidence now splits track_artist on the
common multi-artist delimiters real-world tags use (",", ";", "&",
"feat.", "ft.", "featuring", "vs.", "x") and scores each piece
independently against the search artist, taking the max along with
the whole-string similarity as the floor. Never reduces a score —
purely additive matching for previously-missed featured-artist
credits.
Adds 12 regression tests covering the reported scenario, primary-
artist back-compat, every delimiter variant (parametrized), no-
regression on exact match, and the scanner storing every ArtistItem.
Existing Jellyfin-scanned rows persist their old single-artist value
until the next library scan rewrites them; Plex rows benefit
immediately on next match without needing a rescan.
Two bugs surfacing the same user-reported symptom: a Vaiana OST
track ("Where You Are" by Christopher Jackson) wouldn't match against
a Plex/Emby library because the album sits under the album artist
(Lin-Manuel Miranda).
Bug 1: the data was already there but scoring ignored it. The DB
schema has a tracks.track_artist column, the scanner populates it
from Plex's originalTitle and Jellyfin's ArtistItems[0], and the SQL
WHERE clause already searches it — but _rows_to_tracks dropped the
column on its way to the Python object, and _calculate_track_confidence
only scored against the album-artist JOIN. Candidates whose track-
artist matched got returned by the search and then immediately
filtered out by the low confidence score.
Fix: _rows_to_tracks now propagates row['track_artist'] onto the
returned object, and _calculate_track_confidence takes the better of
(album-artist similarity, track-artist similarity) so soundtracks
match through whichever credit the search query carries.
Bug 2: the album-aware fallback path constructed DatabaseTrack with
kwargs the dataclass doesn't accept (artist_name, album_title,
server_source). Every row TypeError'd, the outer except swallowed it
silently, and the fallback never matched anything since the column
was added — invisible because nothing logged it.
Fix: build DatabaseTrack with valid fields and attach the joined
columns afterwards, the same pattern _rows_to_tracks uses.
Adds 6 regression tests covering: track-artist match (the OST case),
album-artist still matches, scorer takes the better of the two,
defensive handling for tracks without track_artist, search-path
attribute propagation, and the previously-dead album-aware fallback.
- add neutral wishlist payload helpers while keeping legacy Spotify aliases
- route wishlist removal and classification through generic track data
- keep API and service compatibility for existing callers
Five issues kettui flagged on PR #377:
- Worker race (reorganize_queue.py): _next_queued() picked an item and
released the lock, then re-acquired to flip status='running'. A
cancel() landing in that window marked the item cancelled but the
worker still ran it. Replaced with _claim_next_or_wait() that picks
AND flips under one lock acquisition.
- Wakeup race (reorganize_queue.py): _wakeup.clear() after the empty
check could lose an enqueue's _wakeup.set(), parking a freshly-queued
album for up to 60 seconds. Replaced Lock + Event with a single
threading.Condition; cond.wait() releases and re-acquires atomically
on notify.
- Bulk dedupe (reorganize_queue.py:enqueue_many): looped single-item
enqueue, so a duplicate album_id later in the same batch could slip
through if the worker finished the first copy before the loop
reached the second. Now holds the lock for the whole batch and tracks
a per-batch seen set, so intra-batch duplicates dedupe against each
other and not just pre-existing items.
- Preview button stuck disabled (library.js:loadReorganizePreview):
early returns and thrown errors skipped the re-enable line. Moved
state into a canApply flag committed in finally, so any exit path
lands the button correctly.
- DB helpers swallowing failures (music_database.py): get_album_display_meta
and get_artist_albums_for_reorganize used to catch every Exception
and return None / [], so a real DB outage masqueraded as "album not
found" / "no albums". Now lets exceptions bubble; the route layer
already wraps them as 500.
Tests:
- test_cancel_and_run_are_mutually_exclusive — hammers enqueue+cancel
pairs and asserts the invariant that no successfully-cancelled item
ever ran (catches regressions to the atomic pick).
- test_enqueue_many_dedupes_batch_internal_duplicates — pins the
intra-batch dedupe.
- test_get_album_display_meta_propagates_db_errors and
test_get_artist_albums_for_reorganize_propagates_db_errors — pin
the bubble-up behavior.
Changelog updated in helper.js and version modal.
Replaces the single-slot "one reorganize at a time, return 409 on collision"
model with a per-user FIFO queue. Buttons stay clickable, "Reorganize All"
is one backend call instead of an N-call JS loop, and a status panel mounted
at the top of the artist actions bar shows live progress (active item,
queued count, recent completions) with per-item cancel buttons.
Backend
- core/reorganize_queue.py: singleton queue + worker thread, dedupe-on-
enqueue, cancel rules (queued cancellable, running not), enqueue_many
for bulk operations, progress fan-out via update_active_progress
- core/reorganize_runner.py: factory builds the worker's runner closure
with injected dependencies. Reads config per-call so changing the
download path in Settings takes effect on the next reorganize without
a server restart
- database/music_database.py: get_album_display_meta and
get_artist_albums_for_reorganize — moves the SQL out of route handlers
- web_server.py: thin enqueue/snapshot/cancel/clear endpoints, runner
registration at module load. Old _reorganize_state globals + status
endpoint deleted. Static-asset cache buster (?v=<server-start>)
added so JS/CSS updates ship live without users clearing cache
Frontend
- webui/static/library.js: status panel mount, polling (1.5s when
active, 8s when idle), expand/collapse, per-item cancel, debounced
enhanced-view reload (one reload per artist batch instead of N).
Per-album reorganize button paints with queued/running indicator
and short-circuits to a toast when the album is already in queue
- webui/static/style.css: panel + button styling matching the existing
glass-UI accents
- webui/static/helper.js + version modal: WHATS_NEW entry
Tests (22 new)
- tests/test_reorganize_queue.py (19 tests): FIFO order, dedupe,
per-item source, cancel rules, continue-on-failure, snapshot
shape, progress propagation, bulk enqueue
- tests/test_reorganize_runner.py (4 tests): per-call config reads,
setup-failure summary, dependency injection, progress fan-out
- tests/test_reorganize_db_methods.py (7 tests): SQL JOIN behavior,
ordering, fallback for blank strings, artist isolation
Full suite 549 passed in 27s.
Reported by kettui on PR #374 review:
> api_track_count is not copied during the ratingKey migration, so
> the cache disappears when an album row is rekeyed. Add it to
> enrichment_cols or the next completeness scan will fall back to
> live API lookups again.
When Plex changes an album's ratingKey (after a library rescan), the
sync code rekeys the album row by inserting a new row at the new ID
and copying enrichment columns from the old row. The list of
columns to copy did not include `api_track_count`, so the cached
authoritative track count was lost on rekey — and the next completeness
scan would hit the fallback path that calls back out to the
metadata source's API. Defeats the cache.
Added `api_track_count` to the album-level `enrichment_cols` at
`music_database.py:4724`. The artist-level lists at lines 4238 and
4554 don't need updating — those are for artist rekeys and don't
carry album-scoped fields.
No new test — existing migration code has no test infrastructure
and writing a Plex-mocked one is larger than this fix. Cin will say
if he wants test coverage in his next review pass.
Credit: kettui — PR #374 review comment that flagged the missing
column in the rekey allowlist.
Reported by sassmastawillis: the Album Completeness maintenance job
scans 3127 albums in 0.1 seconds and reports 0 findings — for every
user, regardless of whether their library is actually complete.
Restoring an older DB surfaced 7 correct findings, so the code logic
works; the DB state is what's making everything look complete.
Root cause: `albums.track_count` is only ever written by server-sync
paths — Plex's `leafCount`/`childCount` and SoulSync standalone's
`len(tracks)`. It's the OBSERVED count of tracks SoulSync has indexed,
which is always exactly what `COUNT(tracks)` returns for that album.
The completeness job treated it as the EXPECTED total and compared it
against the observed count. They're equal by construction, so
`actual >= expected` is always true: skip, 0.1s scan, 0 findings.
Fix: new `api_track_count INTEGER` column on `albums`, written only by
metadata-source code paths. Populated in two places so the scan is
fast and the fallback is robust.
1. Enrichment workers — shared helper `set_album_api_track_count`
in `core/worker_utils.py`. Called by each worker's existing
`_update_album` method alongside its other album-column UPDATEs:
- spotify_worker: `album_obj.total_tracks` from the Spotify Album
dataclass (already in hand, zero new API calls)
- itunes_worker: same, from the iTunes Album dataclass
- deezer_worker: `nb_tracks` from full_data, falling back to
search_data when the full lookup didn't run
- discogs_worker: count of tracklist rows where `type_=='track'`
(Discogs tracklists interleave heading and index rows that
shouldn't count as songs)
Helper skips the write on zero/None/negative/non-numeric inputs
so a source lacking track info can't clobber a good value a
different source already wrote. Caller owns the transaction —
helper just queues an UPDATE on the caller's cursor without
committing, so it batches cleanly with each worker's existing
multi-UPDATE pattern.
Hydrabase worker deliberately not touched — it's a P2P mirror
that doesn't write album metadata to the local DB. Hydrabase-
primary users hit the fallback path below.
2. Album Completeness repair job — new `al.api_track_count` column
in the SELECT, read first in the scan loop. On miss (album never
enriched, or enrichment workers haven't run yet on a fresh
install), falls through to the existing `_get_expected_total()`
API lookup and persists the result via the same shared helper
(wrapped in connection/commit management since the repair job
runs outside a worker's batched transaction).
Also removed `al.track_count` from the scan's SELECT — now unused
since the observed count was the whole source of this bug, and
leaving a dead SELECT would invite a future engineer to re-introduce
the same comparison.
Help text on the job card was reworded so it honestly describes
current behavior ("counts cached during normal enrichment are used
when available; otherwise the job queries a metadata source
directly") rather than the old "active provider first, then others
as fallback" phrasing, which doesn't match how the cache actually
fills — any enrichment worker that runs can populate it, and the
last writer wins. Document-only follow-up if this edge case ever
bites in practice: add a `api_track_count_source` column so the
scan can prefer the configured primary source's count over others
(e.g. deluxe vs. standard edition mismatches). Not worth the
complexity today.
For existing users, the first completeness scan after upgrade is
fast to the extent their library is already enriched: the workers
already ran and populated `api_track_count` on their normal schedule.
For brand-new installs, the scan's fallback path handles the cold
start — slower, but correct, and subsequent scans are fast.
Does NOT affect:
- Download / post-processing / wishlist / sync code paths — none
of them read `track_count` for completeness semantics.
- Plex / Jellyfin / Navidrome / standalone sync — still write
`track_count` exactly as before; `api_track_count` is a separate
column they never touch.
- Other repair jobs.
- Any UI path — same finding schema, just correct counts now.
Files:
- database/music_database.py — idempotent migration adding
`api_track_count INTEGER DEFAULT NULL` to the existing album-column
check block.
- core/worker_utils.py — new `set_album_api_track_count` helper with
the documented skip-on-bad-input contract.
- core/spotify_worker.py, itunes_worker.py, deezer_worker.py,
discogs_worker.py — one-liner call from each `_update_album`.
- core/repair_jobs/album_completeness.py — scan uses the cache;
fallback path persists API-lookup results via the shared helper;
help text updated to match actual behavior.
- tests/test_worker_utils_album_track_count.py — 9 tests covering
the helper's write/skip contract + no-commit invariant.
- tests/test_album_completeness_job.py — 2 tests for the repair
job's fallback-path wrapper.
- webui/static/helper.js — WHATS_NEW entry.
Credit: sassmastawillis spotted the bug; the "restored older DB
finds 7 albums" signal pinpointed DB state over code logic and
made the diagnosis tractable.
PR #340 added ruff to the build-and-test.yml CI gate, which surfaced
286 pre-existing lint errors. Left unfixed, every feature branch push
fails CI. This commit resolves all of them so CI goes green and
contributors can actually land work.
Auto-fixes (248 of 286): removed unused f-string prefixes (F541),
renamed unused loop control variables with underscore prefix (B007),
removed duplicate imports (F811).
Manually fixed 10 latent bugs ruff caught (all wrapped in try/except
today, silently failing):
- music_database.py: _add_discovery_tables() called undefined
conn.commit() — would have crashed the iTunes-support migration
for existing databases. Now uses cursor.connection.commit().
- web_server.py settings GET: referenced undefined download_orchestrator
when it should be soulseek_client. Feature (_source_status on the
settings payload) was silently missing for UI auto-disable logic.
- web_server.py _process_wishlist_automatically: active_server
undefined in track-ownership check. Auto-wishlist was falling
through to the error handler and re-downloading owned tracks.
- web_server.py start_wishlist_missing_downloads: same active_server
bug in the manual wishlist path.
- web_server.py _process_failed_tracks_to_wishlist_exact: emitted
wishlist_item_added automation event with undefined artist_name
and track. Automation event silently never fired correctly.
- web_server.py discovery metadata enrichment: referenced cache
without calling get_metadata_cache() first. Track enrichment from
cached API responses was silently skipped.
- web_server.py Beatport discovery worker: wing-it fallback branch
used undefined successful_discoveries variable. Wing-it counter
never incremented correctly. Now uses state['spotify_matches']
consistently with the rest of the function.
- web_server.py _run_full_missing_tracks_process: stale import json
mid-function shadowed the module-level import, making an earlier
json.dumps() call reference an unbound local (F823).
- web_server.py discovery loop: platform loop variable shadowed
the module-level platform import (F402).
- core/watchlist_scanner.py: 7 lambda captures of loop variables
(B023 classic Python closure-in-loop bug) now bind at creation.
No existing tests had to change. Full suite stays at 263 passed.
Artist detail pages ran check_album_exists_with_editions and check_track_exists
per discography item, each firing 5+ title variations times 3 artist variations
of fuzzy LIKE searches plus fallback broad-artist queries. For a 30-album artist
that was ~450 SQL round-trips just to answer "which of these do I own."
Hoist the artist's library albums and tracks into memory once per request via
two new helpers — get_candidate_albums_for_artist and get_candidate_tracks_for_albums —
and thread them through as optional candidate_albums / candidate_tracks params on
check_album_exists_with_editions, check_album_exists_with_completeness,
check_track_exists, check_album_completion, and check_single_completion.
Batched path scores the same _calculate_album_confidence / _calculate_track_confidence
against the in-memory list, preserving Smart Edition Matching and accuracy.
Title-only cross-artist fallback still fires for collaborative-album edge cases.
None on either param preserves legacy per-item SQL behavior for unaffected callers.
Applied to both /api/library/completion-stream (library artist detail page) and
iter_artist_discography_completion_events (Artists search page). Timing logs
added to confirm the pre-fetch cost and loop elapsed time.
On a Kendrick page load, per-album resolution drops from ~8 seconds to under
the 50ms streaming sleep floor. Observed ~100x SQL reduction on the happy path.
Four enrichment workers (Last.fm, MusicBrainz, Tidal, Qobuz) had a
bug where every background loop re-processed the same rows because
the existing-ID short-circuit path never set match_status, and two
workers queried the wrong column when checking for an existing ID.
lastfm_worker._get_existing_id queried a non-existent lastfm_id
column; the real column is lastfm_url. The method now reads
lastfm_url for all three entity types.
musicbrainz_worker._get_existing_id queried musicbrainz_id for all
entity types, but albums use musicbrainz_release_id and tracks use
musicbrainz_recording_id. The method now uses a per-type column map.
All four workers (lastfm, musicbrainz, tidal, qobuz) now write
match_status='matched' when they short-circuit on an already-present
external ID, so these rows are no longer re-selected on the next
worker sweep.
A new migration (_backfill_match_status_for_existing_ids) runs once
on startup to retroactively set match_status='matched' for rows that
already have an external ID but NULL match_status. This covers legacy
data, manual matches, and rows populated from file tags outside the
worker.
The /api/v1/library/tracks endpoint called search_tracks() to get
DatabaseTrack objects, then immediately called api_get_tracks_by_ids()
to re-hydrate full rows for serialization. Two round trips per search.
Added api_search_tracks() that returns dict rows with all track columns
plus artist_name, album_title, and album_thumb_url in a single query.
The basic and fuzzy search helpers were refactored to share raw-row
implementations, so the existing search_tracks() still returns
DatabaseTrack objects for the many internal callers that depend on
that shape (matching pipeline, repair worker, web UI search).
The wishlist list endpoint previously loaded and JSON-decoded the full
wishlist, filtered by category in Python, then sliced in memory. Cost
grew linearly with wishlist size on every page request.
get_wishlist_tracks now accepts offset and category parameters, both
applied in SQL via LIMIT/OFFSET and json_extract. get_wishlist_count
also accepts category so COUNT(*) matches the filtered page. The API
endpoint uses these to return only the requested page.
Backward compatible: other callers (core/wishlist_service) pass no
offset/category and still receive the full list.
Users can now override which metadata provider (Spotify, Deezer, Apple Music,
Discogs) is used when scanning a specific watchlist artist for new releases.
The selector appears in the artist config modal and only shows sources the
artist has enrichment IDs for. Default behavior is unchanged — all artists
use the global metadata source unless explicitly overridden.
Split Downloads page into main list (left) and batch panel (right).
Each active batch gets a color-coded card with artwork thumbnail,
progress bar, per-track status with download percentages, and
expandable track list. Download rows get matching color indicators.
- Click batch name to open its download/wishlist modal
- Filter icon narrows main list to one batch with clear banner
- Collapsible panel toggle for full-width list view
- Completed batches fade out after 15 seconds
- 7-day batch history with source type color dots
- Artwork fallback shows colored initial when no art available
- Per-track progress: download %, spinner for searching, proc label
- source_page column on sync_history for UI origin tracking
- /api/downloads/all includes batch summaries and per-track progress
- /api/downloads/batch-history endpoint for history queries
- Responsive layout, overflow-x hidden to prevent scroll flicker
Full auto-import pipeline: background worker watches the staging folder,
identifies music using embedded tags → folder name parsing → AcoustID
fingerprinting, matches files to metadata source tracklists, and
processes high-confidence matches through the existing post-processing
pipeline automatically.
Worker: AutoImportWorker with start/stop/pause/resume, configurable
scan interval (default 60s), confidence threshold (default 90%), and
auto-process toggle. Processes one folder per cycle, alphabetical
order. Disc folder detection, stability checking, content hash dedup.
Confidence gate: 90%+ auto-processes silently, 70-90% queued as
pending review with approve/dismiss actions, <70% flagged for manual
identification. Track matching uses weighted algorithm (title 45%,
artist 15%, track number 30%, album tag 10%).
Database: auto_import_history table tracks every scan result with
folder hash, match data JSON, confidence, status, timestamps.
API: 7 endpoints — status, toggle, settings (GET/POST), results
(filtered/paginated), approve, reject.
UI: Auto tab on Import page with enable toggle, confidence slider,
scan interval selector. Live result cards with album art, confidence
bar (green/yellow/red), status badges, match stats. 5-second polling.
Full automation page upgrade with group management and drag-and-drop:
Backend: batch_update_group() and bulk_set_enabled() DB methods, new
PUT /api/automations/group and POST /api/automations/bulk-toggle endpoints.
Group headers: rename (inline edit), delete (choice dialog — keep
automations or delete all), bulk toggle (enable/disable all in group).
Actions appear on hover, styled as small icon buttons.
Drag and drop: non-system cards are draggable between group sections.
Drop zones show dashed accent border feedback. Collapsed sections
auto-expand on 500ms drag-hover. System/Hub sections dimmed during drag.
dragenter counter pattern handles child element bubbling.
Delete group dialog: glass card modal with three options — keep
automations (move to My Automations), delete everything, or cancel.
Two bugs: (1) 'wishlist' was missing from the settings save whitelist,
so the toggle silently reset to ON on every page reload. (2) The
wishlist cleanup function unconditionally removed tracks sharing the
same name+artist regardless of album, ignoring the allow_duplicates
setting. Now when allow_duplicates is on, the dedup key includes the
album name so same song from different albums can coexist.
Explored status was stored only in frontend memory; on reload the badge
disappeared because the API never returned it. Added explored_at column
to mirrored_playlists (auto-migrated), written when build-tree completes,
and read back via SELECT * so the badge survives page refreshes.
track_downloads stores local Windows paths but tracks table stores
server-side paths (Plex/Jellyfin). Both the track_id lookup (NULL
due to failed auto-link at insert time) and exact file_path fallback
were failing.
Added filename-suffix LIKE matching as a final fallback in
get_track_source_info, plus a back-link so the track_id gets written
back for fast future lookups. Also improved the auto-link in
record_track_download to use the same suffix matching when exact path
fails.
Album completeness and downstream repair flow now follow the configured
primary provider first, with Discogs and Hydrabase support added alongside
existing Spotify, iTunes, and Deezer paths.
Keep spotify_track_id for compatibility while preserving source-aware track
IDs for provider-neutral handling.
Server Playlists was filtered to only show playlists matching mirrored_playlists entries,
but Discover syncs are stored in sync_history (not mirrored_playlists), so they were
excluded. Adds GET /api/sync/history/names returning distinct synced playlist names,
and includes those in the filter alongside mirrored playlists.
Builds a new Your Albums section on the Discover page that aggregates
saved/liked albums from all connected services, mirroring the Your Artists
pattern. Deezer works via both OAuth and ARL.
- tidal_client: add get_favorite_albums() with V2/V1 API fallback
- deezer_client: add get_user_favorite_albums() via OAuth (user/me/albums)
- deezer_download_client: add get_user_favorite_albums() via ARL session
- music_database: add liked_albums_pool table (deduped by artist::album
normalized key), upsert_liked_album, get_liked_albums,
get_liked_albums_last_fetch, clear_liked_albums
- web_server: GET /api/discover/your-albums (ownership-checked, paginated),
GET /api/discover/your-albums/sources, POST /api/discover/your-albums/refresh,
_fetch_liked_albums background worker (Spotify + Tidal + Deezer OAuth/ARL)
- frontend: Your Albums section with source selector cog, album grid reusing
spotify-library-card styles, search/filter/sort/pagination, download missing
button, auto-refresh poll on first load
Also fix: Deezer greyed out in Your Artists sources when using ARL — connection
check now accepts ARL auth (deezer_dl.is_authenticated()) in addition to OAuth,
and _fetch_and_match_liked_artists falls back to ARL client for artist fetching.
The wishlist table has a UNIQUE constraint on spotify_track_id, so
INSERT OR REPLACE silently overwrote the existing entry when the same
track appeared on a different album (same Spotify track ID). Now uses
a composite key (track_id::album_id) when allow_duplicates is on and
the base track ID already exists, allowing both versions to coexist.
Only affects users with allow_duplicates enabled.
One-time migration deletes metadata cache entries where artist_name is
null, empty, 'unknown', 'unknown artist', etc. The cache gate now
prevents new junk entries, but existing poisoned data from before the
fix needs cleaning so users don't keep getting Unknown Artist from
stale cache hits.
search_artists now uses unidecode_lower() and _normalize_for_comparison()
so 'Tiesto' finds 'Tiësto'. Track search already had this — artist
search was the only gap. No change to stored data, only the comparison.
- Add interruptible stop events to background workers so shutdown
wakes out of long sleeps instead of waiting on fixed delays.
- Stop scan managers, repair worker, executors, and cleanup helpers
deterministically so process exit does not leave background threads
alive.
- Add startup warnings for stale SQLite WAL/SHM sidecars so unclean
shutdowns are easier to spot before init/migration errors cascade.
- Prevent forced kills from leaving SQLite sidecars behind, which
made rollbacks to older branches fail with malformed database
errors.
When starting from scratch (no existing .db file), certain db init steps were being skipped. Upon subsequent startup, these remaining 4-5 steps would execute.
Up until recently this has not been much of an issue since all the db init steps were run repeatedly throughout the process' lifetime, but after the init was changed to be only done once per startup, this became more problematic
7-step full-screen wizard: Welcome, Metadata Source, Download Source,
Paths & Media Server, Add Artists, First Download, Done. All settings
save to DB identically to the Settings page. Supports all 6 download
sources with inline config and test buttons. First download goes through
the full matched download pipeline with metadata context.
Fixes:
- Download clients (YouTube/HiFi/Tidal/Qobuz/Deezer) now reload
download_path when settings change instead of caching from init
- watchlist_artists table migrations now include deezer_artist_id and
discogs_artist_id in all 3 table rebuild locations (was being dropped)
- CREATE TABLE for watchlist_artists includes all provider ID columns
- Serverless download sources (YouTube/HiFi/Qobuz) show green status
instead of red disconnected on sidebar and dashboard
- Suppress repeated slskd 401 errors — logs once then silences until
connection recovers
Stripped 4,200+ emoji characters from print(), logger calls across
39 Python files. Logs are now clean text — easier to grep, more
professional, no encoding issues on terminals without Unicode support.
Seasonal config icons preserved for UI display.
Navidrome provides musicBrainzId on tracks — now captured during
database updates so the MusicBrainz enrichment worker can skip
tracks that already have an MBID.
Uses COALESCE on UPDATE to never overwrite existing enrichment data
with NULL (safe for Plex/Jellyfin which don't provide this field).
Inspired by PR #279 — fixed data loss bug in the original where
unconditional UPDATE would erase existing MBIDs.