37725457 fixed _match_to_itunes to use the real iTunes client and flagged
the cross-source corruption as a possibility. Boulder's live DB proves it
happened: 6 of his 9 watchlist "iTunes" ids EQUAL the artist's Deezer id
(Taylor Swift's "iTunes" id was her Deezer id 12246; the real one is
159260351) — written back when the misnamed MetadataService.itunes slot
held a DeezerClient. The June-4 batch (Green Day, SOAD, Vulfpeck, ...) got
NULL instead because the slot now holds the Spotify primary.
The fix alone can't heal those rows: the backfill only fills EMPTY ids, so
a wrong non-empty id is permanent. New migration clears itunes_artist_id
where it equals deezer_artist_id (the corruption signature — distinct id
spaces, so a legitimate equal pair is effectively impossible, and the worst
case is a NULL that re-matches correctly on the next scan). Idempotent by
construction; similar_artists checked clean (its backfill always used the
registry correctly).
Tests: corrupted row cleared / legit + no-deezer rows kept / idempotent —
via a real re-init with the per-process init memo cleared (an app restart).
carlosjfcasero: 'Champagne Supernova (OurVinyl Sessions)' is in the library
but the artist page shows it unowned and wishlist cleanup never removes it.
Measured with the real catalogs: Deezer/iTunes title the TRACK with the
qualifier while the library track is bare (the qualifier lives in the album
title) — and _calculate_track_confidence crushed that pair to ~0.17: the
"clean" titles keep parenthetical words, so the length-ratio penalty treats
'Champagne Supernova' vs 'Champagne Supernova (OurVinyl Sessions)' as
different songs. (Also confirmed: the OurVinyl release is absent from
Deezer's discography for the artist, so the standard page's 25-release list
not showing it is the source catalog, not a bug.)
Fix 1 — core.text.title_match.strip_redundant_context_qualifiers: a
parenthetical qualifier whose text appears (word-bounded) in the db track's
ALBUM title — or in the other title — restates release context and is
stripped for a comparison variant scored with its own length guard. Genuine
version markers keep their penalty: '(Live)' on a studio album appears in no
context and still blocks; '(Live)' on 'Live at Wembley' correctly matches —
owning the live album IS owning the live cut. Wired into
_calculate_track_confidence, so every check_track_exists consumer (wishlist
cleanup, discography dedup, repair jobs) benefits.
Fix 2 — the artist-page ownership endpoint's album gate: when album-aware
narrowing eliminates EVERY library candidate (the source's album naming just
doesn't resemble the library's — 'Jillette Johnson | OurVinyl Sessions' vs
'Champagne Supernova (OurVinyl Sessions)' ~0.5), fall back to artist-wide
title matching instead of declaring everything unowned off a failed
album-NAME comparison.
Tests: 8 — the exact reported pair end-to-end through check_track_exists,
word-boundary containment ('live' in 'alive' doesn't count), version-marker
safety both ways, and prefix songs still blocked. 1125 matching/wishlist/
library tests pass.
User ask: "a modal that lists the tracks downloaded via watchlist" — extended,
as discussed, to playlists too. One modal, two tabs, opened from the Watchlist
page (watchlist tab preselected) and the Sync page (playlists tab) — same
shared-modal-different-entry-points UX as the rest of the app.
The data: library_history recorded which SERVICE a file came from but never
what TRIGGERED it. New origin/origin_context columns (migration + index) are
written once at the import chokepoint via core/downloads/origin.py, a pure
tested deriver that reads, in priority: an explicit _dl_origin stamp (set at
batch-task creation for direct playlist batches, where the playlist context
otherwise only survived in folder mode), the wishlist provenance already
riding in track_info.source_info (watchlist_artist_name / playlist_name —
watchlist_scanner has stamped these for ages), and the folder-mode playlist
thread. Manual downloads stay unclassified by design. History starts from
now — provenance can't be conjured retroactively.
API: GET /api/download-origins?origin=watchlist|playlist (paged) and POST
/api/download-origins/delete — deletes the file on disk (resolved through the
shared container/host path resolver), the matching library track row, and the
history entries; a file that refuses deletion keeps its row and reports the
error instead of lying.
UI: webui/static/origin-history.js — tabbed modal in the revamp design
language (accent light-edge, pill tabs, entry rows reusing the
library-history-entry components), per-row delete + select-all bulk delete
with honest result toasts, empty/loading states, per-tab totals.
Tests: 8 — deriver priority/shapes (incl. the exact watchlist_scanner
source_info shape and JSON-string survival), origin filtering + counts,
row fetch/delete isolation between origins, delete-track-by-path.
Users manually match an album to the regular edition, but enrichment/
repair keeps treating it as the deluxe (missing songs, renumbered tracks).
Root cause: an album has TWO identities — the enrichment match
(spotify_album_id, which manual-match sets and the worker already honors)
and a SEPARATE canonical version pin (canonical_album_id, added by #777).
The canonical pin is what track-number repair / reorganize / missing-track
detection actually read, and library_manual_match never wrote it — so it
was resolved independently and landed on the deluxe edition.
(So #777 did NOT solve #758: it added canonical pinning, but manual
matches didn't write the pin.)
Fix: a manual ALBUM match on a canonical-recognised source now also pins
AND locks the canonical version to the chosen release:
- new canonical_locked column (same migration pattern as the other
canonical cols).
- set_album_canonical(..., locked=False) gains an atomic WHERE-clause
guard: an auto write can't overwrite a locked pin; a manual write
(locked=True) always wins. get_album_canonical exposes `locked`.
- library_manual_match pins canonical for album matches via the pure
should_pin_manual_canonical(entity_type, source).
The auto resolve job already skips already-pinned albums, so the lock is
protected on two fronts; the new guard also covers any future
re-resolution. A new manual match still overrides.
18 tests: the pure gate (+ a sync-invariant test vs _ALBUM_ID_COLUMNS)
and the DB lock seam (auto can't clobber a manual lock; manual overrides;
auto-over-auto still works). Additive — locked defaults False, so the
auto path is unchanged unless a manual lock exists. Full suite clean.
Extends the manual "Import IDs from File Tags" backfill so newly-scanned
files get their embedded provider IDs pulled into the DB automatically —
no button press needed to keep up with new music.
How it works:
- insert_or_update_media_track now returns 'inserted' / 'updated' / False
(truthy-compatible; existing `if track_success` callers unaffected) so
the scan worker can tell a genuinely new row from an update.
- DatabaseUpdateWorker collects the ids it newly INSERTED this run
(self._new_track_ids) across all insert paths (Plex/Jellyfin/deep).
- After run()/run_deep_scan(), web_server calls _reconcile_after_scan(),
which gap-fills embedded IDs for just those new tracks. Runs as a
post-scan pass (the scan loop itself is untouched/fast — the media
server API never exposes these custom IDs, so the file must be read
once regardless; batching at the end keeps it out of the hot loop and
best-effort so it can never abort a scan). A progress phase ("Reading
file tags for N new tracks…") surfaces the full-refresh tail.
Shared engine:
- New reconcile_library() in core does the paging + lazy parent-map
loading (only loads albums/artists actually referenced — cheap when
scoped to a few new tracks) + per-page commits. BOTH the manual button
and the scan hook call it, so there's one tested orchestration, no
duplication. The backfill job was refactored onto it.
Same hardened safety: gap-fill only, atomically guarded against
overwrite, schema-introspected, idempotent. Scoped to new arrivals for
incremental/deep; full refresh re-inserts everything as new (recovering
the IDs a full-refresh wipe destroys).
+10 reconcile tests (reconcile_library scope/idempotency/progress/stop +
the engine). Full suite clean (only pre-existing soundcloud /app env
failures remain).
Ships the source-id cleanup to all users: a marker-gated one-time migration in
MusicDatabase init clears any source id (deezer/spotify/itunes/musicbrainz/
discogs/audiodb/qobuz/tidal) shared across differently-named artists — the
enrichment-corruption signature. Same-name cross-server duplicates are left
untouched (DISTINCT-name check). Cleared rows re-derive correct ids on the next
enrichment pass; the now name-guarded workers won't re-corrupt.
Runs once (CREATE TABLE _source_id_dedupe_v1 marker), idempotent, per-column
try/except so a missing column can't abort it. Test forces a re-run and asserts
corruption is cleared while a legit same-name dup survives.
Find & Add on the playlist-sync page only wrote sync_match_cache, which is
DELETEd wholesale after every DB scan — so the source->library pairing (and
the user's manual matches) reverted to 'extra'/red-dot on the next shallow
scan. The three match stores (sync_match_cache, manual_library_track_matches,
discovery extra_data) were disconnected and all pointed at tracks.id, which a
rescan re-keys (esp. Jellyfin/Navidrome GUIDs).
Unify the match so it's one durable fact, recorded once, honored everywhere:
- Find & Add also writes a durable manual_library_track_matches row (one-way;
the manual-match tool has no playlist to act on, so no reverse). Carries the
library file path.
- New library_file_path column (idempotent migration) + find_track_id_by_file_path:
re-resolve a stale library_track_id after a rescan re-keys the track, and
self-heal the row.
- The sync compare display's override lookup now falls back to the durable
manual match (resolve_durable_match_server_id) when sync_match_cache misses —
so the pairing persists across a scan instead of reverting to a red dot.
Purely additive: only adds matches when the cache returns nothing.
Tests: durable resolver (valid / stale-reresolve+self-heal / no-match / not-in-
playlist / missing-methods), file_path persistence + find_track_id_by_file_path.
resolve_mirrored_playlist tried the mirrored-playlists primary key FIRST for
any all-digit ref. Deezer upstream ids are all-numeric, so a Deezer playlist id
was mistaken for the PK and the organize-by-playlist toggle resolved a wrong row
(or nothing) — the toggle silently wouldn't save / 'Open in Mirrored' missed.
Resolve by (source, source_playlist_id) first, fall back to PK only when the
source lookup misses. Thread the batch/wishlist source through the download-path
callers so numeric upstream ids resolve correctly there too. Spotify (base62
ids) is unaffected.
Seam tests: numeric Deezer id resolves by source (not PK), spotify alphanumeric
by source, PK fallback still works, profile-scoped, empty refs -> None.
Adds get_recommendation_sources() — for each recommended similar artist it
resolves the polymorphic similar_artists.source_artist_id back to the display
names of the user's OWN artists (library + watchlist) that list it, by matching
against every provider-id column on both tables. The /api/discover/similar-artists
endpoint now attaches a 'because' array per recommendation so the UI can show
'because you have X, Y, Z' instead of just a count.
Seam tests cover: library + watchlist resolution across different provider-id
columns, dedup + name-sort, max_per cap, orphan source omission, profile scoping.
Closes the gap where similar artists only existed for WATCHLIST artists: a new
background worker populates them for the whole LIBRARY, slotting into the
existing enrichment-worker pattern (bubble + Manage Enrichment Workers modal,
status/pause/resume, matched/not_found/pending/errors).
Per source-matched library artist → get_musicmap_similar_artists(name, 25)
(the same matcher the artist-detail page uses: fetches MusicMap names, matches
each to the user's source chain — primary + active fallbacks — returns only
matched artists) → store via add_or_update_similar_artist keyed by the artist's
metadata source id, the SAME key the watchlist scanner + artist map use, so the
two cooperate (idempotent upsert + retry_days window).
- core/similar_artists_worker.py: pure seams (pick_source_artist_id,
map_payload_to_store_kwargs, process_artist) + the threaded worker; skips
artists not yet source-matched; classifies not_found vs transient error
(retry after 30d).
- DB migration: similar_artists_match_status / _last_attempted on artists
(mirrors every other source worker's tracking columns).
- Registered in EnrichmentService + instantiated in web_server, DEFAULT-PAUSED
(opt-in) like Amazon — MusicMap is scraped/outage-prone + this is library-wide.
- SERVICE_ENTITY_SUPPORT['similar_artists']=('artist',) so the modal breakdown
('artists with / without similars') + Retry work; manual-match (inapplicable
to a relationship) is gated out via relationship:true.
- 10 seam tests; existing 80 enrichment tests still pass.
Note: keys under profile 1 (single-profile setups); multi-profile is future work.
Persist organize_by_playlist on mirrored playlists and run playlist-folder
downloads from the auto-sync pipeline instead of the global wishlist phase.
Register SoulSync library rows after playlist-folder post-processing, route
failed organize batches to the wishlist correctly, and skip sync-time
unmatched wishlist only when organize download handles retries.
Invalidate stale playlist track caches on refresh (Spotify and Deezer ARL),
re-mirror on refetch, and improve standalone playlist modals (re-analysis,
Open in Mirrored). Add filesystem missing-track detection and tests.
Co-authored-by: Cursor <cursoragent@cursor.com>
Fixes a correctness bug and adds bulk re-queuing.
- Bug: per-row 'Retry' used clear-match, which sets an item to not_found
with last_attempted=NULL. The worker only retries not_found items where
last_attempted < (now - 30d), and 'NULL < cutoff' is false in SQLite, so
those items were never re-queued. Fixed by resetting match_status to NULL
(pending), which every worker's queue picks up on the next pass.
- New POST /api/enrichment/<id>/retry with scope 'item' | 'failed'
(failed = re-queue every not_found item of an entity type), backed by a
pure whitelisted build_reset_query + MusicDatabase.reset_enrichment().
- UI: per-row Retry now hits /retry; a 'Retry all failed' bulk button appears
when the current entity has not-found items (confirm + count toast); a hint
line explains retry/match/auto-retry behaviour.
- 11 new tests (38 enrichment tests total, all green).
Dashboard 'enrichment bubbles' could pause/hover but offered no way to
*manage* a worker. This adds a full management modal opened from a new
header button, covering all 11 enrichment sources.
Backend (testable core helper + seam tests; no live-DB dependency):
- core/enrichment/unmatched.py: pure, whitelisted SQL builders for the
unmatched browser. service/entity validated against a support map (never
interpolated raw); search + pagination bound as params; tracks join albums
for artwork; limit capped at 200.
- database/music_database.py: get_enrichment_unmatched() +
get_enrichment_breakdown() (the breakdown splits matched/not_found/pending,
which the existing get_stats().progress lumps together).
- core/enrichment/api.py: GET /api/enrichment/<id>/{unmatched,breakdown} on
the existing blueprint + a db_getter hook.
- web_server.py: wire db_getter=get_database.
- tests/enrichment/test_unmatched.py: 19 tests across builders, DB methods,
and Flask routes.
Frontend (vanilla, matches app conventions):
- webui/static/enrichment-manager.js: worker rail with live status + coverage
micro-bars, accent-themed detail panel (hero header, segmented matched/
not_found/pending stat cards, current item, pause/resume), and a searchable
paginated unmatched browser with inline manual match (reusing
search-service + manual-match) and retry (clear-match re-queues).
- Polish: entrance/exit motion, scroll-lock, Escape, refresh control,
flicker-free polling (in-place updates), skeleton loaders, relative
timestamps, per-worker accent theming, real dashboard logos reused at
runtime (with the same invert/circle treatment), responsive rail.
- index.html: header button + script include. style.css: full styling.
Reuses existing pause/resume, status, and manual search+assign endpoints.
Backend tests green (19 new + 11 existing enrichment tests).
Turns the Stage-1 scorer into an end-to-end resolver + persists the result.
Still DORMANT — no consumer reads it yet, so zero behavior change.
- core/metadata/canonical_resolver.py — resolve_canonical_for_album(): builds
candidate releases from the album's per-source IDs (in source-priority order),
fetches each tracklist via an INJECTED fetch_tracklist (so it's unit-testable
without live APIs), scores them with pick_canonical_release, and returns the
best-fit {source, album_id, score}. Skips sources with no id / failed fetch;
returns None when there are no files, no candidates, or nothing clears the
confidence floor.
- database/music_database.py — set_album_canonical() / get_album_canonical()
write/read the Stage-1 columns. get returns None when unresolved, which every
consumer will treat as "fall back to today's behavior".
Tests: tests/test_canonical_resolver.py (7) — best-fit beats priority, priority
breaks true ties, skips missing-id/failed-fetch sources, None on
no-candidates/no-files/below-floor, score rounding. tests/test_canonical_db.py
(4) — set/get round-trip incl. timestamp, unresolved -> None, overwrite,
missing-album -> False. 34 canonical + DB-migration tests pass.
Remaining for Stage 2 (the trigger): read on-disk file durations/titles for an
album, gather its source IDs, call the resolver, store — wired via a backfill
repair job + an enrichment hook. Then Stages 3-4 wire the Reorganizer and Track
Number Repair to READ the pinned canonical.
First stage of the canonical-album-version fix (#765 + #767-Bug2). Pins ONE
canonical (source, album_id) per album, chosen by best-fit to the user's actual
files, so the Reorganizer, Track Number Repair, and tagging stop re-resolving
independently and contradicting each other.
Ships DORMANT — nothing reads or writes the new data yet, so zero behavior
change. Later stages populate (Stage 2) and consume (Stages 3-4) it.
- core/metadata/canonical_version.py — pure scorer (the testable heart):
score_release_against_files() rates a candidate release by track-count fit +
duration alignment (greedy nearest within ±3s) + title overlap, dropping and
renormalizing missing signals so it never crashes on sparse metadata.
pick_canonical_release() takes candidates in source-priority order, picks the
best fit, breaks ties toward the earlier (higher-priority) candidate so the
choice is DETERMINISTIC — that determinism is what makes every tool agree
(#765), while count/duration fit picks the right EDITION (#767-Bug2). A
confidence floor (default 0.5) means a low-confidence guess is never pinned.
- database/music_database.py — additive, nullable columns on albums
(canonical_source / canonical_album_id / canonical_score /
canonical_resolved_at), guarded by the existing PRAGMA-table_info pattern.
NULL = unresolved = every consumer falls back to today's behavior.
Tests: tests/test_canonical_version.py (11) — edition discrimination (11 files
-> standard, 17 -> deluxe), deterministic priority tiebreak, duration
disambiguation on count ties, graceful degradation (no durations / counts only /
fuzzy titles), confidence floor, empty-input safety. tests/test_canonical_
columns_migration.py (4) — fresh DB has the columns, they're nullable w/ NULL
default, migration is idempotent, and it ALTERs them onto an old albums table.
60 DB/schema regression tests still pass.
Tracks NOT in the library were matched to a DIFFERENT song by the SAME artist
and reported with high confidence instead of as missing — e.g. "Dani
California" -> "Californication" (Red Hot Chili Peppers), "Under The Bridge"
-> "Around the World".
Root cause: _calculate_track_confidence scores 0.5*title + 0.5*artist. A
same-artist comparison always yields artist = 1.0, so the title score is the
only thing that can tell two of an artist's songs apart — but that score is a
SequenceMatcher CHARACTER ratio, which over-credits unrelated titles that
share a long substring ("californi…" = 0.67) or just a stopword ("the" =
0.62). With the flat 0.5 artist term, anything clearing the weak 0.6 char
floor lands at ~0.81-0.83, well over the 0.7 sync threshold. Reproduced on
dev: both reported pairs score 0.81/0.83.
Fix: new core/text/title_match.py:titles_plausibly_same, called in
_calculate_track_confidence right before the floor. It accepts a pair only
when it's near-identical char-wise (>=0.85, so typos / punctuation / casing
like "Beleive"->"Believe", "HUMBLE."->"Humble" still match) OR the titles
share at least one significant (non-stopword) word. Two different songs by the
same artist share no content word, so they're rejected and the real track is
correctly reported missing. ("the" is a stopword — that's what leaked "Under
The Bridge"/"Around the World".)
Scoped deliberately: the word-overlap test fires ONLY when at least one side
has 2+ content words. For single-word titles there is no other word to share,
so it defers to the existing char floor — otherwise legitimate stylized
spellings ("Grey"/"Gray", "Tonite"/"Tonight", "4ever"/"Forever") would become
new false-negatives. Verified those still match. The few single-word variants
that do score low (Ok/Okay, Thru/Through) were already rejected by the
pre-existing length-ratio penalty, not by this gate.
Both reported false positives now score 0.33/0.31 -> missing. Does NOT address
the harder case of two different same-artist songs that DO share a content
word (e.g. "Believe"/"Believer") — pre-existing and unworsened. Any residual
error fails safe: a false-missing is re-downloaded/wishlisted, vs the old
behavior which silently substituted the wrong song.
Tests: tests/test_title_match_guard.py (14) — pure-guard unit tests + a
13-pair battery driving the REAL _calculate_track_confidence (genuine matches
stay >=0.7, same-artist different songs drop below), plus an explicit
no-regression test for stylized single-word spellings. 292 matching/sync tests
pass.
Nothing was landing in the metadata cache browser because the
metadata_cache_entities / metadata_cache_searches tables did not exist, so every
cache write no-op-ed. Root cause: _add_metadata_cache_tables short-circuited on a
marker-only guard (if the metadata_cache_v1 marker row exists, return). After a
DB corruption-recovery the small metadata table (with the marker) survived but
the large cache tables did not, so the stale marker permanently blocked the
idempotent CREATE TABLE IF NOT EXISTS and the cache was dead forever.
Guard now skips only when the marker is set AND the tables actually exist, so a
stale marker self-heals: the tables are re-created on the next init.
Tests: marker present but tables dropped -> re-created; marker + tables present
-> no-op (idempotent).
The save endpoint coerced library_track_id with int(), which rejected
every non-numeric id with "Invalid library track id". Library ids are
str(ratingKey) — numeric for Plex but GUIDs/hashes for Navidrome,
Jellyfin, and other Subsonic servers — and are stored in the TEXT
tracks.id column, so the coercion broke manual matching on every
non-Plex server.
Replace the int() coercion with a normalize_library_track_id() helper
that trims and rejects only empty input, passing the opaque string id
straight through. Plex numeric ids are unaffected (SQLite INTEGER
affinity still stores a clean numeric string as an int, so existing
matches are byte-identical) and no schema migration is needed (the
INTEGER column already stores non-numeric ids as text).
Tests: pure-helper cases (numeric/GUID/whitespace/empty) plus a real-DB
round-trip proving a GUID id saves, reads back unchanged, and enriches.
listening_history was populated ONLY from the media server; the web player
recorded nothing. Now a play heard ~10s logs to listening_history AND bumps
tracks.play_count/last_played — so the existing 'recently played' query reflects
actual SoulSync listening, and the Phase-2 smart-radio recency signal gets real
data.
- core/playback/play_log.build_play_event(): pure, DB-agnostic normalizer from
player payload -> listening_history event shape. Caller supplies the
timestamp (stays pure). Composite/streamed ids never become the int
db_track_id; bool ids rejected; missing title -> skip. 9 unit tests.
- MusicDatabase.record_web_player_play(): inserts the history row + increments
play_count/last_played for the library track in one call.
- /api/library/log-play: thin endpoint, server-side timestamp, best-effort
(logging failure never 500s / never affects playback).
- Frontend: npMaybeLogPlay on timeupdate fires once per track at the 10s
threshold (flag reset in setTrackInfo, set-before-fetch so it can't
double-fire), fully fire-and-forget.
Pure builder is unit-tested; the DB write can't run in-sandbox (real DB throws)
so it's a thin straightforward insert+update. JS + web_server parse clean.
Replaces radio's pure ORDER BY RANDOM() with weighted ranking. Each tier now
fetches a generous random POOL (4x the needed count, floored) and
core/radio/selection ranks it before the collector keeps the best:
score_candidate = play_count(log-damped, w=1.0)
+ lastfm_playcount(log-damped, w=0.5)
- recently_played penalty(w=2.0)
+ stable per-id jitter(w=1.0, hash-derived so runs vary but
tests stay reproducible)
Modest weights so popularity guides without burying lesser-played tracks, and
jitter keeps radio from being identical every run. All intelligence is in pure
functions (rank_candidates / score_candidate) so it's tunable + unit-testable
without SQL.
Defensive: the DB method probes PRAGMA table_info(tracks) and omits
play_count/lastfm_playcount from the SELECT when absent (older DBs predating
the listening-history migration) — the scorer treats missing signals as 0, so
radio degrades to jitter-only instead of crashing on 'no such column'.
Tests (tests/radio/, 43 total):
- score_candidate / rank_candidates: deterministic unit coverage (popularity
ordering, lastfm contribution, recency penalty, garbage→0, stable jitter).
These CANNOT pass against pre-Phase-2 code.
- DB end-to-end: ranking surfaces the heavily-played track first out of a
decoy pool (wiring proof — probabilistic vs old random, documented honestly);
plus a no-rank-columns DB proving the defensive degrade path.
- All Phase-0a behavioral/refactor-equivalence tests still green.
60 radio + adjacent-DB tests pass; ruff clean.
First step of the stream/player/radio revamp (see revamp_plan.md). The radio
algorithm lived inline inside database.music_database.get_radio_tracks as raw
SQL tangled with selection logic — untestable without a live DB (which also
throws in the dev sandbox). Lifted the pure DECISIONS into core/radio/selection.py:
- parse_tags / merge_tags — JSON-or-CSV tag fields → ordered deduped list
- same_artist_cap — tier-1 30%-floored-at-5 cap
- build_like_conditions — OR-of-LIKEs SQL fragment + params per tier
- RadioCollector — dedup + cap + exclude-set + NOT-IN placeholder/value tracking
The DB method keeps the cursor work and now delegates every decision to these
helpers. Faithful extraction, not a rewrite — behavior unchanged.
This is the kettui foundation move: radio is now unit-testable, so Phase 2
(smart ranking — play-count / recency / feature seeding) becomes 'evolve a
tested function' instead of 'rewrite SQL and pray'.
Tests (tests/radio/):
- test_selection.py (22): unit coverage of every extracted helper
- test_get_radio_tracks_db.py (7): drive the REAL get_radio_tracks against
in-memory sqlite — tier fallback, dedup, exclude, file_path filter.
Behavior-pinned: these 7 pass against BOTH old inline and new extracted
code (refactor-equivalence proof). 52 adjacent DB+radio tests green.
Migration state was scattered across PRAGMA-table_info guards, sentinel marker
tables (_genius_search_fix_applied, ...) and metadata-flag rows
(id_columns_migrated, ...), with no single source of truth and no schema
version — so a half-migrated DB was undetectable.
Add a non-gating backstop: a schema_migrations(name, applied_at) ledger plus a
_sync_migration_ledger pass (runs last in init) that back-fills the ledger from
the existing signals and stamps PRAGMA user_version. ADDITIVE only — existing
migrations keep their own idempotency gates; nothing decides whether a
migration runs based on the ledger or the version. New one-time migrations call
_record_migration (the genres migration already does).
Tests: tests/test_db_migration_ledger.py — table exists, user_version stamped,
record idempotent, genres recorded on fresh init, backfill from flag + marker,
absent signals not recorded.
artists.genres / albums.genres stored EITHER a JSON array (new writes) OR a
legacy comma-separated string (old writes), forcing every reader to
try-JSON-then-split. Add a marker-gated one-time migration
(_normalize_genres_to_json) that rewrites legacy rows to JSON in place,
mirroring the readers' exact parse (JSON list, else comma-split/strip/
drop-empties) so genre VALUES are unchanged — only the storage format.
Per-row diffed (already-canonical rows untouched, no churn) and non-fatal on
error, consistent with the other migrations. Readers still tolerate both
formats, so this breaks nothing; it just removes the dual-format debt.
Tests: tests/test_db_genres_json_normalization.py — CSV->JSON, JSON-unchanged,
whitespace/empties dropped, albums table, legacy-reader-equivalence,
idempotent re-run, marker set on fresh init.
amazon_artist_id is added to watchlist_artists via ALTER (music_database.py
~1732), but both table-rebuild migrations — the spotify_id-nullable fix
(_fix_watchlist_spotify_id_nullable, two CREATE variants) and the
profile-scoped UNIQUE rebuild — recreated the table from a hardcoded column
list that omitted amazon_artist_id. Because shared_cols filters new_cols
against the old table, the column and any stored Amazon artist IDs were
silently dropped on every init (fresh OR upgraded), so Amazon watchlist IDs
never persisted at all.
Fix: add amazon_artist_id to all three rebuild CREATE schemas, both rebuild
new_cols lists, and the base CREATE TABLE (so fresh installs are consistent
and don't rely on the ALTER). Purely additive, column-named inserts + Row
factory mean column position is irrelevant.
Tests (tests/test_db_watchlist_amazon_id_migration.py): drive the real
migrations via MusicDatabase() against a seeded pre-migration temp DB and
assert the column + data survive; differential-proven to FAIL pre-fix.
Opening a library artist from a non-library search result (e.g. a
MusicBrainz hit) leaves the artist-detail page holding the source ID —
the MBID — not the integer library PK. The standard /api/artist-detail
route resolves that via find_library_artist_for_source, but the
enhanced-view (`/api/library/artist/<id>/enhanced`) and quality-analysis
endpoints call get_artist_full_detail directly with whatever ID the page
holds. Its lookup was `WHERE id = ?` only, so it 404'd ("Artist with ID
<mbid> not found") and the enhanced view failed to load.
When the direct PK lookup misses, fall back to matching any per-service
ID column, reusing SOURCE_ID_FIELD as the single source of truth so the
resolution covers every source (MusicBrainz, Spotify, Deezer, iTunes,
Discogs, Hydrabase, Amazon), not just MusicBrainz.
Adds 4 isolated DB-method tests: direct PK still works, resolves by
MBID, resolves by Spotify ID, and unknown IDs still 404.
Expose Navidrome album coverArt as a Subsonic getCoverArt thumbnail so library refreshes keep a real album-art URL. Preserve existing album thumb_url when an incoming server album has no thumbnail, preventing manual or server-corrected covers from being cleared and later replaced by loose missing-cover searches. Add regression tests for Navidrome album thumbnails and DB thumb preservation.
Three problems wrapped into one pass on the Playlist Auto-Sync surface:
1. Visual: the manager modal had its own vibe (radial gradient, pill
tabs, sky-blue chrome) that didn't line up with the rest of the
app. Reworked the modal shell, KPI summary, live pipeline monitor,
tab bar, schedule board sidebar, and column cards to use the
standard SoulSync patterns — gradient `#1a1a1a → #121212`,
accent-tinted 1px border, 20px radius, underline tabs, dense dark
card pattern that Automations + Library pages already use. Modal
now uses near-full screen so there's room for the schedule board
without horizontal scroll pain. Run history cards followed the
same path: slim horizontal row mirroring `.automation-card` plus
an expanded detail that mirrors the Automations run-history modal
(stats-grid + facts row + result pills + log section).
2. Hang: the previous SQL fix for the run-history "in library" count
added `COLLATE NOCASE` on the join columns of `tracks` and
`artists`. SQLite can't use `idx_artists_name` or `idx_tracks_title`
when the comparison collation doesn't match the column collation,
so the join did a full table scan per mirrored playlist track.
~18s per playlist × 30 playlists = `/api/mirrored-playlists` hung
indefinitely and the modal stayed at "Loading schedule…" forever.
Switched the join back to case-sensitive equality (~6ms per
playlist, 3000× faster). Spotify names canonicalize to the same
form as library imports so the recall loss is in the rounding
error of pure case-only mismatches.
3. Slowness: even after the hang fix, each modal open spent ~1.5s
gathering per-playlist status counts. The endpoint looped
`get_mirrored_playlist_status_counts(playlist_id)` per row, which
opened a fresh SQLite connection + PRAGMA setup each time. Added
`get_all_mirrored_playlist_status_counts(profile_id)` which
returns counts for every mirrored playlist owned by the active
profile in 4 batched `GROUP BY` queries over a single connection.
Modal load dropped to ~280ms.
Also fixed: `tracks.artist` reference in `get_mirrored_playlist_status_counts`
that never worked since the schema went relational — the query threw
"no such column", got swallowed by the try/except, and the in-library
count silently defaulted to 0 on every playlist. Rewired to join
through `artists`.
`get_mirrored_playlist_status_counts` (single-playlist) kept for
callers that still want it, but the modal endpoint uses the batched
version.
Persist per-playlist pipeline run snapshots from the shared playlist pipeline, expose a history API, and upgrade the Auto-Sync modal with live pipeline monitoring, Run now controls, and a runs-style history tab.
The Auto-Sync schedule board was detecting its own automations by
checking `group_name === 'Playlist Auto-Sync' || name.startsWith('Auto-Sync:')`.
That's fragile — renaming the row from the Automations page silently
hands ownership back to the read-only Automation Pipelines tab and the
board stops managing it.
This commit replaces the string convention with an explicit
`automations.owned_by` TEXT column:
- Migration `_add_automation_owned_by_column` adds the column and
backfills `'auto_sync'` for existing rows that match the legacy
`group_name`/`name`-prefix pattern, so users running the migration
don't lose their schedules.
- `database.create_automation` and `database.update_automation` accept
`owned_by` (the latter via its `allowed` kwarg set).
- `core/automation/api.py` forwards `owned_by` on both POST and PUT.
Missing field is left as None, preserving today's behavior for every
caller that doesn't opt in.
- The Auto-Sync schedule board posts `owned_by: 'auto_sync'` and the
detection helper now prefers that signal, falling back to the legacy
name/group convention so any hand-rolled rows still show up.
Tests: three new cases in `tests/automation/test_automation_api.py`
covering create-with-owned-by, create-without (defaults to None), and
update set/clear. The fake DB grew the matching kwarg.
Three changes folded into one perf+cleanup pass:
1. Indexed fast path for the per-artist pool fetch. The previous
`search_tracks(artist=name)` call hit
`unidecode_lower(artists.name) LIKE ?`, a function-in-WHERE that
can't use `idx_artists_name`. New `MusicDatabase.get_artist_tracks_indexed`
does a two-step lookup: exact-name match (indexed) plus a
case-insensitive fallback, then `tracks WHERE artist_id IN (...)`
via `idx_tracks_artist_id`. Drops per-artist fetch from seconds to
milliseconds for the common case. The sync helper falls back to
the old LIKE-based `search_tracks` only when the indexed lookup
finds nothing, preserving diacritic recall and `tracks.track_artist`
feature-artist matches with zero regression.
2. Public text-normalization helper. Lifted the body of
`MusicDatabase._normalize_for_comparison` into
`core/text/normalize.py:normalize_for_comparison` so callers outside
the database layer (matching engine, sync pool, future import-side
comparisons) don't reach across the module boundary into a
leading-underscore "private" method. The DB method now delegates,
so existing internal call sites stay untouched. Sync's lazy pool
now imports the public helper.
3. Artist-name walker extracted. `_artist_name` at module level in
`services/sync_service.py` replaces two near-identical inline
str-or-dict-or-fallback walkers (one in `sync_playlist`, one in
`_find_track_in_media_server`). Returns `''` for None instead of
the literal string `'None'`.
Plus three small tidies from the same review:
- `_POOL_FETCH_LIMIT = 10000` constant in place of the literal at the
pool-fetch call site.
- Trimmed the verbose docstring + comment block on the pool helper.
- Set-intersection predicate for the trigger-shape reset in
`core/automation/api.py` instead of a two-line `or` chain.
Also removed the duplicate `_get_active_media_client()` call at
sync_service.py:212/214 — pre-existing wart that was sitting in the
same block I was editing.
Tests: 21 new tests across `tests/database/`, `tests/sync/`, and
`tests/text/`, plus updates to the existing pool tests to cover the
new fast/fallback split. Full suite stays green (3953 passing).
Centralize mirrored playlist source reference normalization so edited links and IDs are stored consistently. Preserve URL-backed refresh refs, surface missing-source refresh failures, count background sync failures in pipeline summaries, and retry guarded automation skips after a short delay instead of losing a scheduled run. Add focused coverage for source refs, mirrored playlist source updates, refresh failures, and guarded retry behavior.
Keep full refresh moving when post-clear VACUUM hits a transient disk I/O error, and retry clear_server_data once when the clear step itself sees the same transient SQLite failure.
Retry metadata cache maintenance writes once on transient disk I/O errors so first-attempt cache jobs do not fail when an immediate retry would succeed.
Tests cover best-effort VACUUM, clear retry behavior, and cache maintenance retry behavior.
Ensure upgraded databases have the tracks.file_size and albums.api_track_count columns after all legacy migrations run. Add defensive repair paths for Jellyfin track imports and album track-count caching so stale schemas self-heal instead of dropping full-refresh track imports.
Tests cover legacy schema repair and api_track_count self-repair.
Add MusicBrainz watchlist artist ID storage, badges, linked-provider editing, and per-artist preferred source support.
Backfill watchlist MusicBrainz matches from already-enriched library artists so existing MusicBrainz worker matches appear in watchlist cards and settings.
Extend bulk watchlist add, liked artist matching, artist map source picking, and service status labels to recognize MusicBrainz, with regression tests for watchlist ID persistence and backfill.
Register MusicBrainz as a first-class metadata source alongside Deezer, iTunes, Spotify, Discogs, and Hydrabase. Expose the shared client through metadata services, add the settings option, and expand the MusicBrainz search adapter with source-compatible artist, album, track, and detail methods.
Carry MusicBrainz IDs through similar-artist discovery, recommended artists, artist map serialization, and personalized playlist selection. Update DB migrations and lookup filters so similar_artist_musicbrainz_id is preserved on older schemas and used for source requirements and library exclusion.
Normalize MusicBrainz album adapter output for import context and add regression coverage for registry mapping, typed album conversion, and similar-artist filtering. Verified by user with 120 focused tests passing.
Manual matches can be created from sync history as mirrored while wishlist and download flows later see the same track as wishlist or a provider source. Add a shared track-level lookup that falls back from exact source/id to source_track_id and title/artist, then use it for wishlist adds, cleanup, and download analysis so mapped tracks are not re-added or redownloaded.
Add coverage for mirrored-source matches being honored by wishlist cleanup and download batches, including the internal wishlist force-download path.
Show actionable missing album tracks in the enhanced library from canonical metadata, with a practical Manage flow for Add to Library or I Have This.
Implement I Have This as a non-destructive copy/import path: copy the chosen existing file, run normal post-processing with the missing track context, insert the real library row, and inherit album identity tags from target siblings so Navidrome does not split albums.
Improve the modal with selectable search results, visible import progress, disabled controls during import, and missing-track row styling.
- Artist cards, hero section, and enhanced view now show Amazon Music badges
when amazon_id is populated (AMAZON_LOGO_URL constant, orange #FF9900 brand)
- Enhanced view artist and album match status rows include amazon_match_status
chip with click-to-rematch via openManualMatchModal
- getServiceUrl: added amazon (album/track ASIN → music.amazon.com) and fixed
missing discogs entries; serviceLabels adds tidal/qobuz/amazon
- Enhanced view enhanced-artist-id-badges includes amazon_id entry
- DB SELECTs for library artists list and artist detail now return amazon_id;
both response dicts include the field
- watchlist_artists migration adds amazon_artist_id column
- Watchlist config GET: amazon_artist_id in SELECT/WHERE/response (index 18)
- Watchlist artists list response includes amazon_artist_id
- link-provider endpoint: amazon added to valid_providers and col_map
- _populateLinkedProviderSection: amazonId param + Amazon Music source row
- Watchlist card source badges render Amazon pill (watchlist-source-amazon CSS)
- _openSourceSearch labels map includes amazon
- service_search: amazon_worker injected via init(); _search_service amazon branch
uses search_artists/albums/tracks, same {id,name,image,extra} return shape
- _SERVICE_ID_COLUMNS: amazon → amazon_id for artist/album/track
- _init_service_search call passes amazon_worker_obj
- amazon_client._fetch_album_metas: 5-minute TTL cache per ASIN — cached hits
skip _rate_limit() and HTTP call entirely; fixes ~10s artist detail load
- registry.py: removed amazon from METADATA_SOURCE_PRIORITY and
METADATA_SOURCE_LABELS — T2Tunes has no discography API, cannot serve as a
primary metadata source; Amazon remains a download source + ASIN enricher
- Settings metadata source dropdown and help text updated accordingly
Background worker matching library artists/albums/tracks to Amazon ASINs
via T2Tunes search. Follows same 6-tier priority queue as Deezer/iTunes/
Spotify/Qobuz/Tidal workers. Backfills artist thumbnails from album cover
stand-ins (T2Tunes exposes no direct artist images).
- core/amazon_worker.py: new AmazonWorker class with full parity
- database/music_database.py: expand _add_amazon_columns to cover
amazon_id/amazon_match_status/amazon_last_attempted on artists,
albums, and tracks (was artists-only)
- web_server.py: import, init, register in enrichment panel, add to
scan pause/resume dicts and rate monitor key map
- helper.js: WHATS_NEW 2.5.3 entry for enrichment worker
Schema: ALTER TABLE artists ADD COLUMN amazon_id TEXT with index, added via
_add_amazon_columns migration called after Discogs in _run_migrations.
SOURCE_ID_FIELD: add "amazon" -> "amazon_id" entry. find_library_artist_for_
source now looks up Amazon artists by slug before falling back to name match,
same as every other source. artist_source_detail already stamps artist_info
[source_id_field] = artist_id so the amazon_id is set on source-only payloads.
Tests: add "amazon": "amazon_id" to EXPECTED_SOURCE_ID_FIELD; revert test
assertion back to strict equality (SOURCE_ONLY_ARTIST_SOURCES == SOURCE_ID_
FIELD.keys() holds again now that amazon has a column).
Snapshots now track when their source data changes. Watchlist scan
emits stale flags on the playlists whose underlying pool just got
refreshed; the next pipeline run sees the flag and regenerates the
snapshot before syncing, so the server playlist never lags the source.
Schema:
- new `is_stale INTEGER NOT NULL DEFAULT 0` column on
`personalized_playlists`, plus an idempotent ADD COLUMN migration
in `ensure_personalized_schema` for installs created before this PR.
- `PlaylistRecord.is_stale: bool = False` exposed on the dataclass so
callers can branch on freshness without re-querying.
Manager:
- new `mark_kinds_stale(kinds, profile_id=None)` flips the flag in
bulk for a list of kinds (used by upstream data refreshers).
- `_persist_snapshot` clears `is_stale = 0` on successful refresh.
- SELECT statements + `_row_to_record` updated to read the column
(with tuple-form length guard for safety).
Pipeline:
- `_build_payloads_for_kinds` now branches: refresh_first=True OR
`existing.is_stale` -> refresh_playlist, else read existing
snapshot. So the auto-refresh kicks in without needing the user to
toggle the refresh-each-run option.
Watchlist scanner emits stale flags at three sites:
- after `update_discovery_pool_timestamp` -> marks pool-fed kinds
stale: hidden_gems, discovery_shuffle, popular_picks, time_machine,
genre_playlist, daily_mix.
- after release_radar `save_curated_playlist` -> marks `fresh_tape`.
- after discovery_weekly `save_curated_playlist` -> marks `archives`.
All three calls go through a module-level `_mark_personalized_kinds_stale`
helper that builds a PersonalizedPlaylistManager with `deps=None` (only
DB access is needed for the flag update — no generator dispatch). Each
call is wrapped in try/except so a flag failure can never abort the
scan itself.
Tests:
- new `TestStaleFlag` class in `test_personalized_manager.py` (6
tests): default-false, single-kind flip, multi-kind, profile
scoping, refresh-clears, empty-list noop.
- two new pipeline tests pin the auto-refresh dispatch:
`test_stale_snapshot_auto_refreshes_even_without_refresh_first`
and `test_non_stale_snapshot_skips_refresh`.
- existing stub-manager `SimpleNamespace` returns gained
`is_stale=False` so the new attribute read doesn't AttributeError.
Full suite: 3391 pass.
User-facing WHATS_NEW entry added under 2.5.2 (above the prior
pipeline auto-sync entry) describing the auto-refresh behavior.
Begins the standardization of the personalized-playlist subsystem.
Pre-existing state was a patchwork: Group A (Fresh Tape / Archives /
Seasonal Mix) lived in `discovery_curated_playlists` and
`curated_seasonal_playlists` with inconsistent shapes; Group B
(Hidden Gems / Discovery Shuffle / Time Machine / Popular Picks /
Genre / Daily Mixes) was computed on-demand by
`PersonalizedPlaylistsService` with no persistence -- every call
reran the generator with `ORDER BY RANDOM()` so results rotated.
Post-overhaul (this PR) every personalized playlist lands in one
unified storage layer with stable identity, persistent track lists,
explicit refresh, and per-playlist user-tweakable config.
Foundation in this commit (no behavior change yet):
- `database/personalized_schema.py`: 3 tables created idempotently
at app startup (wired into `MusicDatabase._initialize_database`).
- `personalized_playlists`: one row per (profile, kind, variant)
with config_json, track_count, last_generated_at,
last_synced_at, last_generation_source, last_generation_error.
Variant '' (empty string) for singletons; non-empty for
time_machine / seasonal_mix / genre_playlist / daily_mix.
- `personalized_playlist_tracks`: current snapshot per playlist.
Atomically replaced on refresh.
- `personalized_track_history`: append-only log powering the
`exclude_recent_days` config knob.
- `core/personalized/types.py`: `Track`, `PlaylistConfig`,
`PlaylistRecord` dataclasses. `PlaylistConfig.merged()` for
partial-update PATCH semantics; `Track.from_dict()` accepts the
legacy generator output shape unchanged.
- `core/personalized/specs.py`: `PlaylistKindSpec` (kind,
name_template, default_config, generator, variant_resolver) and a
module-level registry. Generators register at import time;
manager dispatches by kind.
- `core/personalized/manager.py`: `PersonalizedPlaylistManager` --
the only thing that touches the new tables. Owns:
- ensure_playlist (auto-create row from kind defaults)
- get_playlist / list_playlists
- refresh_playlist (atomic snapshot replace; generator exception
preserves previous good snapshot + records error on row)
- get_playlist_tracks
- update_config (deep-merge with stored config, including extra dict)
- recent_track_ids (staleness lookup for generators)
35 boundary tests in `tests/test_personalized_manager.py` pin every
shape: config round-trip / merge semantics / extra deep-merge /
defaults; Track.from_dict tolerance + primary_id fallback chain;
registry dedup / display_name with+without variant; manager
ensure_playlist auto-create + idempotency, variant separation,
required-variant enforcement, unknown-kind error; refresh persists
+ replaces atomically + survives generator exception with previous
snapshot intact + records source from first track + round-trips
nested track_data_json; update_config patch semantics; list_playlists
profile scoping; staleness history scoped to (profile, kind, days).
3304 tests pass total. Generators ship in subsequent commits on this
branch -- each kind migrated one at a time with its own per-kind
boundary tests.
Foundation commit for issue #442 — Japanese kanji ↔ romanized name
quarantines and equivalent cross-script mismatches. MusicBrainz
exposes alternate-spelling aliases on every artist record but
SoulSync's matching never consulted them; cross-script comparison
scored 0% on raw similarity and the file got quarantined even when
MusicBrainz knew both names belonged to the same artist.
This commit only adds the column. Subsequent commits in this PR:
- Build a pure alias-aware artist comparison helper
- Wire the MusicBrainz worker to populate aliases on enrichment
- Add a live MB lookup with cache for un-enriched artists
- Wire the helper into the AcoustID verifier where the quarantine
decision actually fires
Schema change is additive (NULL default), gated by the same
`PRAGMA table_info` check the existing `_add_musicbrainz_columns`
helper uses, so re-running on databases that already have the
column is a no-op.
Verified:
- New `artists.aliases` column present in fresh DB init
- JSON round-trip works (mirrors the existing `genres` column pattern)
- No existing tests broken
Catches the silent excepts the awk-based earlier sweeps missed:
- Bare `except:` followed by `pass` (also swallows KeyboardInterrupt
and SystemExit — actively wrong). Upgraded to `except Exception as
e: logger.debug("...: %s", e)`. ~14 sites across connection_detect,
soulseek_client, listenbrainz_manager, watchlist_scanner,
youtube_client, navidrome_client, jellyfin_client, web_server.
- `except Exception:` + pass that the awk pattern missed (e.g.
multi-line or unusual whitespace). ~31 sites across automation_engine,
database_update_worker, music_database, spotify_client, web_server,
others.
- 14 legitimate cleanup sites left silent with explicit `# noqa: S110`
+ comment explaining why (atexit handlers, finally-block conn.close
calls). Logging during shutdown can itself crash because file handles
get torn down before the handler fires.
Also enables `S110` rule in pyproject.toml so this pattern fails CI
going forward — drift fails at PR review instead of at runtime against
a wedged worker thread. Tests path keeps S110 ignored (test fixtures
legitimately use try-except-pass for cleanup).
Adds a WHATS_NEW entry to helper.js summarizing the full #369 sweep.
Verified: `python -m ruff check .` → All checks passed.
Verified: `python -m pytest tests/` → 2188 passed.
Closes#369
Mostly schema-migration ALTER TABLE fallbacks (column-already-exists
is the silent expected case) plus a few cache-purge/notify-migration
spots. Same pattern as the web_server sweep: `except Exception as e:
logger.debug("...: %s", e)`.
Refs #369