Full Refresh INSERTs a per-track year (from file tags) into tracks.year, but that column
was only ever in the live INSERT — never in CREATE TABLE and never in a migration. So on
EVERY db (old and current — verified the shipped music_library.db lacks it too) every Full
Refresh track insert hard-failed with 'table tracks has no column named year', importing 0
tracks while artists/albums succeeded.
Fix (additive + nullable, nothing reads it but the writer):
- add year INTEGER to the tracks CREATE TABLE (new DBs)
- ALTER it onto existing tracks tables in _ensure_core_media_schema_columns (the repair
backstop that already runs every init), right beside the file_size repair
Tests (tests/test_tracks_year_migration.py): fresh-DB has it, nullable, idempotent, ALTERs
onto an old year-less table, and a regression that the exact Full Refresh insert fails
before the repair and succeeds after.
Re-running an export created a new LB playlist every time (LB keys on MBID, not name, and
create always mints a new one). Now remember which LB playlist a mirror was pushed to and
update it in place:
- listenbrainz_client: refactor batched-add into _add_tracks_in_batches; add
get_playlist_track_count, delete_playlist, update_playlist (verify exists -> clear items via
item/delete -> re-add -> edit title; reports gone=True if deleted on LB), and
create_or_update_playlist (update when we have a prior MBID, else create; falls back to
create if the remembered one was deleted). Stable URL/MBID across re-syncs.
- playlist_export_targets table + get/set_playlist_export_target: remember (mirror, target) -> LB MBID.
- export job consults/stores the target so push updates in place.
+6 mocked tests (clear+re-add same mbid, gone-fallback, create-or-update branches, delete). API
endpoints (item/delete, playlist/edit, playlist/delete, GET count) confirmed against LB docs;
live round-trip pending explicit auth.
Phase 3. Additive backbone for the export job:
- mb_recording_cache table (IF NOT EXISTS) + core/exports/recording_mbid_cache.py: persistent
(artist,title)->recording_mbid cache, mirrors album_mbid_cache (lazy DB, error-degrades to
miss). The MusicBrainz tail is ~1 req/s, so a resolved MBID is remembered once and reused
across every export/playlist.
- core/exports/playlist_export.py: resolve_playlist_tracks(tracks, resolve_fn) — walks tracks,
dedups repeated songs within a run (resolve once), builds the ordered pseudo-playlist, tallies
live stats (resolved/unmatched/deduped/by_source). Pure (I/O injected via resolve_fn + progress
callback), so dedup + accounting are unit-tested with no DB/network. 5 tests.
No wiring into runtime yet; nothing existing touched except the additive table.
The mirror_playlist fix only assigns stable ids to newly-imported playlists, so a
user with an existing file-import playlist would still have empty-id rows (and dead
Find & Add matches) until a manual re-import. Add an idempotent startup backfill that
assigns the SAME stable id a fresh import would to any mirrored track missing one —
so existing matches start sticking with no re-import. Runs once per db/process (the
init is guarded), only touches empty-id rows (no-op afterward), native ids untouched.
Tests: backfill fills empty ids with the exact fresh-import id, is idempotent (2nd
run = 0), and leaves native ids alone.
A Find & Add on a file-import (CSV/M3U/TXT) playlist track was silently dropped and
the track re-appeared as 'extra' (radoslav-orlov). Root cause: unlike Spotify/YouTube
(native ids), file-import + iTunes-only tracks arrive with an EMPTY source_track_id —
and the whole manual-match system keys on it. _persist_find_and_add_match is a no-op
on an empty id, and find_manual_library_match_by_source_track_id returns None for one,
so the match can be neither recorded nor looked up. That's the youtube-vs-file
difference the reporter noticed.
Fix: stable_source_track_id() derives a DETERMINISTIC 'file:<hash>' id from the track
identity (artist|title|album, normalized) when there's no native id; mirror_playlist
assigns it so the SAME song gets the SAME id across re-imports/discovery — exactly
what the match lookup needs. Native ids are used verbatim; bonus: discovery extra_data
now survives a re-import for these tracks too.
Tests: helper (native passthrough, deterministic + case/field-insensitive, distinct
per song, empty-on-no-title, file: prefix); mirror_playlist integration (file tracks
get stable distinct ids, stable across re-import, native ids untouched). 319 playlist/
sync/discovery/mirrored tests green.
Two issues behind #897:
1) Discoverability — the "Ignored" management modal (view/un-ignore/clear-all,
shipped with #874) was only reachable from the wishlist *overview modal*
footer, which most users never open. Add the same button to the wishlist
page toolbar next to Cleanup / Clear All, wired to openWishlistIgnoreModal().
2) Manual re-add silently blocked (carlosjfcasero) — the album-modal "add to
wishlist" endpoint passes source_type=album, but the ignore gate only
bypasses+clears for source_type=manual, so re-adding a previously-cancelled
track failed. We cannot just send manual: source_type drives Albums/Singles
categorisation and repair_worker legitimately uses album too. Thread an
explicit user_initiated flag (db.add_to_wishlist -> service -> album route)
that bypasses+clears the ignore while preserving the real source_type.
Regression test pins both: an automatic source_type=album add stays blocked,
the user_initiated add goes through, clears the ignore, and keeps source_type=album.
_get_artist_variations only widened the candidate fetch by diacritics, so a
request for "The Black Eyed Peas" never pulled a library track filed under
"Black Eyed Peas" (or vice-versa) — it "failed to match" and re-downloaded a
duplicate. Toggle the leading "The" in both directions when widening the fetch;
the confidence scorer (50/50 title/artist, 0.882 across the "The" gap) still has
the final say, so this can only widen what gets fetched, never merge genuinely
different artists. Mid-word "The" (e.g. "Theory of a Deadman") is untouched.
A single-use, user-designated 'which release does this track belong to' answer.
Written when the user picks a release in the Re-identify modal and the file is
staged; the import flow will read it at the top of matching and consume it.
- rematch_hints table (additive, IF NOT EXISTS + indexes) keyed on staged_path
with content_hash as a rename-proof fallback.
- core/imports/rematch_hints.py: pure DB seam over an injected cursor
(create/find/consume/list) + a cheap size+head+tail file fingerprint.
- exempt_dedup baked into the hint (a re-identify must bypass dedup-skip);
replace_track_id carried for deferred post-success cleanup.
Inert until wired (Phase 5) — nothing calls it yet. 9 seam tests.
radoslav-orlov: add AAC as a download quality option. AAC is more efficient than
MP3, so it's useful for Soulseek/torrents (streaming sources pick their own
codec; Amazon — the AAC-heavy one — is down).
Additive by construction: every quality tier already defaults enabled=false and
the waterfall is built only from enabled tiers, so AAC ships OFF and the bucketer
routes a not-enabled AAC file to the 'other' bucket EXACTLY as today (where it was
silently dropped). Only a user who turns AAC on makes it a first-class tier,
ranked above MP3 / below FLAC (priority 1.5, min-kbps gate so junk AAC can't beat
a good MP3).
- music_database: aac tier (disabled) in the default profile + all 3 presets.
- soulseek_client: map .m4a -> 'aac' in both result parsers (was 'unknown' ->
dropped); add the 'aac' bucket + a gated branch + a fallback size limit.
- settings UI: an 'AAC' tier toggle (unchecked) between FLAC and MP3; save
defaults its priority to 1.5 so upgraded profiles rank it right on first save.
7 seam tests pinning the additive guarantee (aac absent/disabled -> dropped as
before; FLAC/MP3 selection unchanged; aac on -> selectable, below FLAC, above
MP3); 81 quality/soulseek tests pass, ruff clean. quality_upgrade left untouched
(its AAC handling is unchanged).
A user who removes a wishlist track, or cancels an in-flight wishlist
download, would have it re-added on the next auto cycle (watchlist scan,
failed-track capture, or the cancel handler's own re-add), so the same
release downloaded -> failed/cancelled -> re-queued forever.
Adds a TTL'd skip-gate (30 days), softer than the blocklist: it expires
so the track is reconsidered later, and never blocks a manual
force-download — only the automatic re-queue.
- core/wishlist/ignore.py: pure TTL/normalization/display logic + a
best-effort orchestrator (no DB handle, caller passes now).
- database/music_database.py: migration-safe wishlist_ignore table +
add/check/remove/list(+purge)/clear methods, and the gate in
add_to_wishlist beside the blocklist guard. Fail-open throughout — an
ignore error can never block a legitimate add; a manual add bypasses
the gate AND clears the ignore.
- routes.py: user remove (single/album/batch) records an ignore. Hooked
at the route layer, NOT the DB remove, so success-cleanup never
ignores (regression-tested).
- web_server.py: cancel now ignores + removes from the wishlist instead
of re-adding for endless retry; three /api/wishlist/ignore-list*
endpoints.
- downloads.js: 'Ignored' modal (view / un-ignore / clear all).
- 13 tests: pure logic, DB seam, gate (block/bypass/fail-open),
route wiring, and the success-cleanup-does-not-ignore regression.
wolf's report: Spotify shows 'Calma - Remix', Find & Add searches that literal
string, but the library stores the track as just 'Calma' (only the 3:58 duration
marks it the remix). The literal LIKE '%calma - remix%' misses, so it fell to the
OR-fuzzy fallback which floods on the common word 'remix' (20 unrelated '... remix'
hits). Dropping '- Remix' (searching 'Calma') finds it instantly.
Fix: search_tracks (and api_search_tracks) now retry on the BASE title — the part
before Spotify's ' - ' version separator — BEFORE the OR-fuzzy flood. So
'Calma - Remix' resolves to 'Calma' (or 'Calma (Remix)') and the noise fallback is
never reached when the base matches. New core.text.title_match.base_title_before_dash
(splits the first spaced ' - '; leaves bare hyphens like 'Up-Tight' alone).
Tests: pure helper (3) + real-DB integration reproducing the Calma case, the
parenthesized-remix variant, plain-title-unaffected, and no-flood (4). 64
search/match tests green.
After saving a password or recovery question, a refresh made the section look
unset (passwords are never echoed back to the browser), so it seemed like you had
to redo it. Now the saved state is reflected:
- "✓ A login password is set" appears when the admin has a password; the field
becomes "Enter a new password to change it".
- "✓ Recovery question saved: <question>" appears, the saved question is pre-
selected (preset or custom), and the answer field becomes "Enter a new answer to
change it".
- Shown both on load (applyLoginSavedState from /api/profiles, which now includes
recovery_question — not secret, already shown on the sign-in screen) and
immediately after saving.
64 integrity tests pass.
The security section had grown into a flat pile of toggles with hidden
dependencies. Regrouped into three labelled cards so it reads top-to-bottom:
- 🔑 Lock with a PIN — set PIN (Step 1) → Require PIN
- 👤 User accounts (login) — Step 1 admin password → Step 2 recovery question →
Step 3 Require login. The Step 3 toggle is now visually LOCKED (greyed +
disabled + "set the admin password first" hint) until an admin password exists,
so the anti-lockout rule is obvious instead of surfacing as a 400 on save. It
unlocks the moment the password is saved.
- 🌐 Reverse proxy & remote access — the proxy toggle, with the auth-proxy header
nested under it (indented), plus WebSocket origins.
- get_all_profiles/get_profile now expose has_password + has_recovery so the UI
can reflect setup state; updateRequireLoginGate() drives the lock.
- New .security-subgroup/.security-subhead/.security-nested/.security-locked CSS.
All IDs + handlers preserved. Inert unless used; default install unaffected.
64 script-split integrity tests pass.
Closes the forgot-login-password gap. A per-profile recovery question + answer lets
a locked-out user reset their own password.
- DB: additive recovery_question + recovery_answer_hash columns (idempotent
migration). set/get-question/verify/has methods; answer is hashed (pbkdf2) and
matched forgivingly (trim + lowercase + collapse whitespace). No recovery set →
never verifies.
- Endpoints (allowlisted in the login gate so they work pre-auth):
GET /api/auth/recovery-question?username= (generic 404 when absent),
POST /api/auth/recovery-reset {username, answer, new_password} — brute-force
limited; a correct answer sets the new password + authenticates the session.
POST /api/profiles/<id>/set-recovery (admin or self) to configure it.
Tests: set/get/verify, forgiving match, hashed-not-plaintext, no-recovery-never-
verifies, full reset flow (wrong answer rejected + password intact; correct answer
resets), unknown-user 404. 25 tests pass. Next: the Settings + login-screen UI.
Opt-in username/password login — profiles become real accounts. This is the data
layer: a per-profile login password, kept SEPARATE from the quick-switch PIN
(different security purpose; a 4-digit PIN must not become the password guarding a
public instance).
- Additive migration: profiles.password_hash column (idempotent, metadata-flagged).
- set_profile_password / verify_profile_password / profile_has_password /
get_profile_by_name (the login username = profile name, unique + case-insensitive).
- Security default: a profile with NO password is NOT loginable (verify returns
False) — unlike the PIN where "no PIN = always valid". You can't authenticate to
an account with no credential.
Tests: migration adds the column; set/verify; no-password-never-loginable; clearing;
name lookup; and password is fully independent of the PIN. 6 tests pass. Next:
the login endpoint + require_login gate (increment 2).
Reported via Find & Add (Billie Eilish "bad guy"): the track was in the library
and on Plex, but never showed in the modal's 20 results. Root cause (proven
against the real 307k-track DB): the search did `ORDER BY tracks.title`, which is
case-SENSITIVE in SQLite (BINARY collation sorts 'B' before 'b'). Billie's title
is lowercase "bad guy"; everyone else's is "Bad Guy", so all the capitalised ones
sorted first, filled the LIMIT, and her exact match landed at ~#25 — cut off.
- search_tracks now ranks by relevance: exact title match first (case-insensitive
via unidecode_lower), then prefix, then alphabetical — so an exact match can't
be sorted below the limit by a capital letter. Helps every caller.
- Added a rank-only `rank_artist` hint (never filters): Find & Add already knows
the source track's artist, so it now passes it and the exact title+artist match
floats to #1. Filtering was deliberately avoided — if the track is tagged under
a slightly different artist on the server, a filter would re-hide it.
Verified on the real DB: title-only "bad guy" now surfaces Billie at #4 (was
>#20); with the artist hint she's #1. Seam tests: lowercase exact title isn't
buried; rank hint floats the match without filtering; exact title beats a
superstring title. 10 tests pass.
- ⚠ Unverified filter rows gain actions: inline play (range-streamed from the
history file path, server-side only), YouTube compare, Approve -> new
human_verified status (tag + history + tracks; AcoustID scanner skips these
entirely), Delete (file + entry)
- API: /api/verification/<id>/stream|approve|delete (path only from DB row)
- backfill: history rows with acoustid_result='fail' that exist at all were
imported despite the failure = force_imported (covers pre-fix fallback
imports like the user's 'My Ordinary Life')
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The pipeline has three success exits (simple download, playlist folder mode,
main) but only the main one persisted the verification status — force-imported
playlist tracks got no tag, no history status, and never appeared in the
Unverified filter. Extracted _persist_verification_status() and call it at
every exit. One-time idempotent backfill derives status for existing history
rows from their recorded acoustid_result (pass->verified, skip->unverified).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The persistent Completed list is built from library_history (not live tasks),
so the badge never showed after a session ended. Column added (additive),
written at import, passed through _build_history_download_item, rendered by
_adlVerifBadge next to the status label.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Also: evaluate() treats an empty expected artist as title-only comparison
(old scanner behaviour — a missing DB artist is no evidence of a wrong file),
and the thresholds are now defined once in the core and re-exported.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Second service. Each profile connects its own Tidal; its playlist reads use that
account, everything else stays global. The gotcha vs Spotify: TidalClient loads
AND saves tokens to one global slot (tidal_tokens), so a naive per-profile client
would clobber the admin's tokens on refresh.
- get_tidal_client_for_profile builds a dedicated TidalClient seeded with the
profile's tokens, refreshed via the shared/global app creds, and OVERRIDES its
_save_tokens to persist to the PROFILE row — never the global slot. Admin
(profile 1) + unconnected profiles use the global client unchanged. Cached per
profile + evicted on (dis)connect.
- DB: set_profile_tidal_tokens / get_profile_tidal (encrypted); the OAuth callback
now uses them + evicts the cached client.
- Wired the Tidal playlist reads (list + tracks) to the per-profile client; the
module import line left intact.
- My Accounts: Tidal row (Connect via /auth/tidal?profile_id=, status, Disconnect).
Connections API extended; disconnect made generic (/<service>/disconnect).
Admin sees "managed in Settings" for every service.
Tests: per-profile token refresh writes to the profile and leaves the global
tidal_tokens untouched (the safety guarantee); connect status + disconnect;
admin/unconnected → global client. 22 endpoint tests pass.
Groundwork for admin-created, per-profile-switchable credential sets ("pills")
across auth services (Spotify/Tidal/Deezer/Qobuz/Plex/Jellyfin/Navidrome).
Strictly additive and dormant — nothing reads it at runtime yet, so zero
behaviour change for existing installs.
- core/credentials/store.py: pure service registry + payload validation +
stale-safe active-set selection (pick_active_credential falls back to None
when a selected set was deleted, so a profile never breaks).
- migration service_credentials_v1: two new tables — service_credentials
(admin-created named sets; payload Fernet-encrypted at rest) and
profile_service_credentials (each profile's selected set per service).
- MusicDatabase CRUD: create/update/delete/list/get_service_credential
(list never returns the payload; get decrypts for the resolver), plus
set/get_profile_service_credential and resolve_profile_service_credential
(returns the profile's active payload or None → caller uses global default).
Tests: 12 — pure validation + stale-safe selection, and real-temp-DB storage
proving encryption round-trips, payload never lists, dup(service,label)
rejected, per-profile/per-service resolution, and delete clearing dangling
selections to a clean fallback. 95 migration/DB tests still pass.
Boulder: the live display was a cramped ~600px box showing a fraction of the
data the scan already tracks, with no animation and no history.
Live scan deck (replaces the three-column box, full width):
- Header: pulsing live dot, "x / y artists" progress text, and two live
counter chips (found / added) that pop when they change.
- Animated progress bar (artist index / total) with a shimmer sweep.
- Stage: artist avatar with accent glow + name + readable phase line
("Checking album 2 of 5"), album art + album + current track.
- "Added to wishlist this run" feed: taller, bigger art, slide-in animation
that plays once per new track (feed re-renders only when it changes).
- All data was already in scan_state (current_artist_index, total_artists,
tracks_found/added_this_scan, current_phase) — just never displayed. The
legacy fullscreen-modal markup shares element ids and lacks the new ones,
so it keeps working untouched.
Scan History (persistent):
- New watchlist_scan_runs table — one row per run (status, timestamps,
artists/found/added counts) + the full track ledger JSON. Saved at scan
completion AND cancellation; idempotent on run_id; pruned to the last 100
runs. Wishlist rows erode as tracks download, so this is the durable record.
- GET /api/watchlist/scan/history (runs) + /history/<run_id>/tracks (ledger).
- New History button on the Watchlist page → modal in the origins/blocklist
house style: run cards (date, cancelled chip, artists/found/added stats)
expanding into the Added / Skipped track lists with art and badges.
Tests: save+fetch with ledger, idempotent re-save, prune keeps newest,
unknown-run empty, cancelled runs recorded. 398 watchlist/wishlist/history
tests pass; JS syntax-checked; all rendered strings escaped.
carlosjfcasero round 2 (manual-add fix didn't help — different path). His log
pinned it: the mirrored sync auto-added 'Llamando a la tierra (Serenade From
the Stars)' by M-Clan every run even though his library has the song (stored
bare). Reproduced exactly: the subtitle restates no album context, so the #808
context strip keeps it, and the length-ratio penalty in
_calculate_track_confidence crushes the pair to 0.142 (needs 0.7). Sync →
"missing" → wishlist, forever; and the cleanup uses the SAME matcher, so it
deterministically never removed it. Self-reinforcing.
Fix at the matcher seam (benefits sync, cleanup, downloads, discography alike):
core/text/title_match.strip_subtitle_qualifiers(title, other) strips a
bracketed qualifier only when it (a) isn't restated in the other title, (b)
contains no version-marker token (EN + ES: live/remix/acoustic/version/
dueto/directo/vivo/...), and (c) introduces no new digit token ('(Pt. 2)',
'(2007)' stay different releases). Wired as a third comparison variant in
_calculate_track_confidence with its own length guard. Verified against his
log's other unmatched tracks: '(Live)' 0.15, '(Dueto 2007)' 0.179,
'(Versión 1988)' 0.167 all still correctly blocked — version qualifiers keep
their meaning; the M-Clan case goes 0.142 → 1.0 in both directions.
Also: sync's check_track_exists call now passes album= (cleanup already did),
enabling the album-aware fallback for multi-artist albums during sync.
Tests: tests/test_subtitle_qualifier_match.py — the reported case verbatim
(end-to-end through check_track_exists, both directions, batched candidate
path included), EN+ES version qualifiers still blocked, numeric guard,
'#769 Dani California' and '#808 OurVinyl' guards still hold. 1396
matcher/wishlist/sync tests pass.
Tacobell444: when tracks land in an album across multiple batches (a wishlist
run, the Album Completeness job, a missed track re-downloaded later), the folder
is rebuilt from API metadata each time — so when $albumtype or $year come back
blank/different on a later batch, the folder NAME changes and the album splits,
forcing a Reorganize.
Fix: build_final_path_for_track now checks whether the album already lives in a
single folder on disk and, if so, drops the new track there instead of a freshly
templated folder. Match (chosen): exact stored Spotify album id first, then a
STRICT >=0.85 name+artist match (vs the 0.7 used elsewhere) — a wrong match here
misplaces a file. New core/library/existing_album_folder.resolve_existing_album_folder
holds the logic; always-on with template fallback.
Safety rails: only returns a folder UNDER the transfer dir (never a read-only
library/NAS mount), only when the album lives in EXACTLY ONE folder (multiple =
disc subfolders, which DatabaseTrack can't disambiguate — those defer to the
template), and any failure falls through to the template path. Added
MusicDatabase.get_album_by_spotify_album_id for the id-first lookup.
Tests: single-folder reuse, no-match, below-threshold, multi-folder defer,
outside-transfer reject, id-first, missing transfer dir, no-files-on-disk.
8 tests; 1556 path/import/download tests pass (only the known soundcloud
failures remain).
Part 1 stopped existing full dates being destroyed; this adds first-class support
for full release dates so they can be set + persisted instead of truncated to a
year at the DB layer.
- Schema: new nullable `release_date TEXT` on the albums table (idempotent
ALTER-ADD-COLUMN repair on startup + the live CREATE). NULL = year-only, every
reader falls back to albums.year, so it ships safe/dormant.
- Tag writer: write_tags_to_file + build_tag_diff prefer db_data['release_date']
(the full date) over the year int; _date_to_write writes the full date. When
there's no release_date it's exactly Part-1 behavior (year, preserving an
equally-specific existing file date).
- Retag read path: SELECT al.release_date in the tag-preview/write queries and
thread it into _build_library_tag_db_data.
- Manual edit: release_date added to ALBUM_EDITABLE_FIELDS + a "Release Date"
field (YYYY-MM-DD, validated client-side) in the album editor; the artist-album
query returns it so existing values show. User-set dates are authoritative.
- Enrichment: Spotify + iTunes workers store the source's full release_date
(YYYY-MM / YYYY-MM-DD) when present, only when empty — never clobbering a
manual value.
Tests: writer uses release_date over year + overrides an existing file date;
falls back to year when absent; diff compares the full date. Migration verified
idempotent + enrichment no-clobber. 1435 tag/retag/db/library tests pass.
A Library Maintenance job that cleans up downloads tracked by Download Origins
once they pass a per-origin retention window — findings by default, opt-in
auto-delete.
A download is only ever proposed for deletion when ALL hold: older than its
origin's retention, NOT still in an actively-mirrored playlist / watched
artist, and played fewer than the keep-threshold (default 2 → "played more
than once is kept"). Only touches downloads recorded from the Download Origins
feature forward — never pre-existing or manual library.
- core/library/expired_cleanup.py: pure decision core (retention_cutoff,
is_expired, select_expired) — no DB/clock, fully tested. play_count is the
reliable listen signal (last_played is often unpopulated, so recency isn't
used).
- ExpiredDownloadCleanerJob: gathers facts (play_count via a new
get_origin_cleanup_candidates join; active-mirror via get_mirrored_playlists;
watch via get_watchlist_artists) and either creates 'expired_download'
findings or, with auto_delete on, deletes in-scan. Default OFF, both
retentions default 'off'. Settings auto-render in the Library Maintenance
panel (same as Cover Art / Lyrics / Re-tag).
- delete_origin_download(): shared delete (resolve path → remove file → drop
track row → drop history row); a file that won't delete keeps its row +
reports. Used by auto mode AND the _fix_expired_download apply handler.
- Frontend: type/action ('Delete')/result labels + finding detail render.
Tests: 9 on the pure brain (windows, off, per-origin, protected, play-count
threshold, bad age) + 7 on the job (no-op when off, findings, mirror/watch
protection, auto-delete, delete helper missing/real file). 185 repair/origin
tests pass.
Phase 1 guarded the wishlist; Phase 2a closes the other auto-acquisition path.
Playlist sync, album download, and discography backfill all flow through
run_full_missing_tracks_process, which queues missing tracks at one point —
right where the explicit-content filter already drops tracks. The blocklist
filter slots in beside it: each missing track is checked and a banned
artist/album/track is dropped before queueing (logged with a count), so a
blocked item can't slip in via these flows.
Same brain as Phase 1: the wishlist guard's matcher is generalized to
db.blocklist_reason_for_track(profile_id, track_data, source=None) — the new
`source` param lets the queue path supply the batch source, since an analysis
track dict may not carry a 'provider' field (artists still match by name
fallback regardless). One method, two callers (wishlist + queue), one cascade.
Manual single-track downloads (/api/download, candidate picker, redownload)
are deliberately NOT gated here — that's Phase 2b, pending a block-vs-warn-vs-
override policy decision.
Tests: source-fallback isolation (album id-only proves source drives the ID
match; artist name still matches sourceless), and a queue-filter simulation
mirroring master.py. 35 blocklist tests pass (the only failures in the
download family are the pre-existing soundcloud /app ones).
A proper artist/album/track blacklist (distinct from download_blacklist, which
stays untouched). ID-keyed across metadata sources so a ban survives a source
switch; profile-scoped; cascade artist→album→track.
- core/blocklist/matching.py — pure decision core (no I/O): build an index from
rows, candidate_block_reason() walks track→album→artist. Same-source ID match
is primary; artist NAME is a fallback (covers the ID-backfill window);
albums/tracks are ID-only (common titles like "Greatest Hits" must not
false-positive across artists). Source-isolated so a numeric Deezer id can't
collide with a numeric iTunes id of a different entity.
- DB: new `blocklist` table (profile_id, entity_type, name, 4 source-id cols,
match_status) + CRUD, match-row fetch, backfill-pending query, id-backfill
update (COALESCE — fills NULLs only).
- Guard: _wishlist_blocklist_reason at the top of add_to_wishlist — every
auto-acquisition path funnels through it, so one check covers watchlist,
discography backfill, repair, manual add. Fails OPEN (a guard error never
blocks a legitimate add).
- Discovery unified IN: legacy discovery_artist_blacklist is migrated into the
blocklist on upgrade (replicated to every profile so no global ban silently
stops working; idempotent; legacy table kept for rollback). Discovery reads
(hero + personalized-playlist SQL) now union the blocklist, so a new-modal
ban filters discovery too.
Tests: 13 on the pure matcher (cascade, id-vs-name rules, source isolation,
precedence) + 10 on the DB/guard (CRUD, profile isolation, dedup, backfill,
end-to-end wishlist refusal + cascade + the discovery migration upgrade path).
50 blocklist/personalized tests pass.
37725457 fixed _match_to_itunes to use the real iTunes client and flagged
the cross-source corruption as a possibility. Boulder's live DB proves it
happened: 6 of his 9 watchlist "iTunes" ids EQUAL the artist's Deezer id
(Taylor Swift's "iTunes" id was her Deezer id 12246; the real one is
159260351) — written back when the misnamed MetadataService.itunes slot
held a DeezerClient. The June-4 batch (Green Day, SOAD, Vulfpeck, ...) got
NULL instead because the slot now holds the Spotify primary.
The fix alone can't heal those rows: the backfill only fills EMPTY ids, so
a wrong non-empty id is permanent. New migration clears itunes_artist_id
where it equals deezer_artist_id (the corruption signature — distinct id
spaces, so a legitimate equal pair is effectively impossible, and the worst
case is a NULL that re-matches correctly on the next scan). Idempotent by
construction; similar_artists checked clean (its backfill always used the
registry correctly).
Tests: corrupted row cleared / legit + no-deezer rows kept / idempotent —
via a real re-init with the per-process init memo cleared (an app restart).
carlosjfcasero: 'Champagne Supernova (OurVinyl Sessions)' is in the library
but the artist page shows it unowned and wishlist cleanup never removes it.
Measured with the real catalogs: Deezer/iTunes title the TRACK with the
qualifier while the library track is bare (the qualifier lives in the album
title) — and _calculate_track_confidence crushed that pair to ~0.17: the
"clean" titles keep parenthetical words, so the length-ratio penalty treats
'Champagne Supernova' vs 'Champagne Supernova (OurVinyl Sessions)' as
different songs. (Also confirmed: the OurVinyl release is absent from
Deezer's discography for the artist, so the standard page's 25-release list
not showing it is the source catalog, not a bug.)
Fix 1 — core.text.title_match.strip_redundant_context_qualifiers: a
parenthetical qualifier whose text appears (word-bounded) in the db track's
ALBUM title — or in the other title — restates release context and is
stripped for a comparison variant scored with its own length guard. Genuine
version markers keep their penalty: '(Live)' on a studio album appears in no
context and still blocks; '(Live)' on 'Live at Wembley' correctly matches —
owning the live album IS owning the live cut. Wired into
_calculate_track_confidence, so every check_track_exists consumer (wishlist
cleanup, discography dedup, repair jobs) benefits.
Fix 2 — the artist-page ownership endpoint's album gate: when album-aware
narrowing eliminates EVERY library candidate (the source's album naming just
doesn't resemble the library's — 'Jillette Johnson | OurVinyl Sessions' vs
'Champagne Supernova (OurVinyl Sessions)' ~0.5), fall back to artist-wide
title matching instead of declaring everything unowned off a failed
album-NAME comparison.
Tests: 8 — the exact reported pair end-to-end through check_track_exists,
word-boundary containment ('live' in 'alive' doesn't count), version-marker
safety both ways, and prefix songs still blocked. 1125 matching/wishlist/
library tests pass.
User ask: "a modal that lists the tracks downloaded via watchlist" — extended,
as discussed, to playlists too. One modal, two tabs, opened from the Watchlist
page (watchlist tab preselected) and the Sync page (playlists tab) — same
shared-modal-different-entry-points UX as the rest of the app.
The data: library_history recorded which SERVICE a file came from but never
what TRIGGERED it. New origin/origin_context columns (migration + index) are
written once at the import chokepoint via core/downloads/origin.py, a pure
tested deriver that reads, in priority: an explicit _dl_origin stamp (set at
batch-task creation for direct playlist batches, where the playlist context
otherwise only survived in folder mode), the wishlist provenance already
riding in track_info.source_info (watchlist_artist_name / playlist_name —
watchlist_scanner has stamped these for ages), and the folder-mode playlist
thread. Manual downloads stay unclassified by design. History starts from
now — provenance can't be conjured retroactively.
API: GET /api/download-origins?origin=watchlist|playlist (paged) and POST
/api/download-origins/delete — deletes the file on disk (resolved through the
shared container/host path resolver), the matching library track row, and the
history entries; a file that refuses deletion keeps its row and reports the
error instead of lying.
UI: webui/static/origin-history.js — tabbed modal in the revamp design
language (accent light-edge, pill tabs, entry rows reusing the
library-history-entry components), per-row delete + select-all bulk delete
with honest result toasts, empty/loading states, per-tab totals.
Tests: 8 — deriver priority/shapes (incl. the exact watchlist_scanner
source_info shape and JSON-string survival), origin filtering + counts,
row fetch/delete isolation between origins, delete-track-by-path.
Users manually match an album to the regular edition, but enrichment/
repair keeps treating it as the deluxe (missing songs, renumbered tracks).
Root cause: an album has TWO identities — the enrichment match
(spotify_album_id, which manual-match sets and the worker already honors)
and a SEPARATE canonical version pin (canonical_album_id, added by #777).
The canonical pin is what track-number repair / reorganize / missing-track
detection actually read, and library_manual_match never wrote it — so it
was resolved independently and landed on the deluxe edition.
(So #777 did NOT solve #758: it added canonical pinning, but manual
matches didn't write the pin.)
Fix: a manual ALBUM match on a canonical-recognised source now also pins
AND locks the canonical version to the chosen release:
- new canonical_locked column (same migration pattern as the other
canonical cols).
- set_album_canonical(..., locked=False) gains an atomic WHERE-clause
guard: an auto write can't overwrite a locked pin; a manual write
(locked=True) always wins. get_album_canonical exposes `locked`.
- library_manual_match pins canonical for album matches via the pure
should_pin_manual_canonical(entity_type, source).
The auto resolve job already skips already-pinned albums, so the lock is
protected on two fronts; the new guard also covers any future
re-resolution. A new manual match still overrides.
18 tests: the pure gate (+ a sync-invariant test vs _ALBUM_ID_COLUMNS)
and the DB lock seam (auto can't clobber a manual lock; manual overrides;
auto-over-auto still works). Additive — locked defaults False, so the
auto path is unchanged unless a manual lock exists. Full suite clean.
Extends the manual "Import IDs from File Tags" backfill so newly-scanned
files get their embedded provider IDs pulled into the DB automatically —
no button press needed to keep up with new music.
How it works:
- insert_or_update_media_track now returns 'inserted' / 'updated' / False
(truthy-compatible; existing `if track_success` callers unaffected) so
the scan worker can tell a genuinely new row from an update.
- DatabaseUpdateWorker collects the ids it newly INSERTED this run
(self._new_track_ids) across all insert paths (Plex/Jellyfin/deep).
- After run()/run_deep_scan(), web_server calls _reconcile_after_scan(),
which gap-fills embedded IDs for just those new tracks. Runs as a
post-scan pass (the scan loop itself is untouched/fast — the media
server API never exposes these custom IDs, so the file must be read
once regardless; batching at the end keeps it out of the hot loop and
best-effort so it can never abort a scan). A progress phase ("Reading
file tags for N new tracks…") surfaces the full-refresh tail.
Shared engine:
- New reconcile_library() in core does the paging + lazy parent-map
loading (only loads albums/artists actually referenced — cheap when
scoped to a few new tracks) + per-page commits. BOTH the manual button
and the scan hook call it, so there's one tested orchestration, no
duplication. The backfill job was refactored onto it.
Same hardened safety: gap-fill only, atomically guarded against
overwrite, schema-introspected, idempotent. Scoped to new arrivals for
incremental/deep; full refresh re-inserts everything as new (recovering
the IDs a full-refresh wipe destroys).
+10 reconcile tests (reconcile_library scope/idempotency/progress/stop +
the engine). Full suite clean (only pre-existing soundcloud /app env
failures remain).
Ships the source-id cleanup to all users: a marker-gated one-time migration in
MusicDatabase init clears any source id (deezer/spotify/itunes/musicbrainz/
discogs/audiodb/qobuz/tidal) shared across differently-named artists — the
enrichment-corruption signature. Same-name cross-server duplicates are left
untouched (DISTINCT-name check). Cleared rows re-derive correct ids on the next
enrichment pass; the now name-guarded workers won't re-corrupt.
Runs once (CREATE TABLE _source_id_dedupe_v1 marker), idempotent, per-column
try/except so a missing column can't abort it. Test forces a re-run and asserts
corruption is cleared while a legit same-name dup survives.
Find & Add on the playlist-sync page only wrote sync_match_cache, which is
DELETEd wholesale after every DB scan — so the source->library pairing (and
the user's manual matches) reverted to 'extra'/red-dot on the next shallow
scan. The three match stores (sync_match_cache, manual_library_track_matches,
discovery extra_data) were disconnected and all pointed at tracks.id, which a
rescan re-keys (esp. Jellyfin/Navidrome GUIDs).
Unify the match so it's one durable fact, recorded once, honored everywhere:
- Find & Add also writes a durable manual_library_track_matches row (one-way;
the manual-match tool has no playlist to act on, so no reverse). Carries the
library file path.
- New library_file_path column (idempotent migration) + find_track_id_by_file_path:
re-resolve a stale library_track_id after a rescan re-keys the track, and
self-heal the row.
- The sync compare display's override lookup now falls back to the durable
manual match (resolve_durable_match_server_id) when sync_match_cache misses —
so the pairing persists across a scan instead of reverting to a red dot.
Purely additive: only adds matches when the cache returns nothing.
Tests: durable resolver (valid / stale-reresolve+self-heal / no-match / not-in-
playlist / missing-methods), file_path persistence + find_track_id_by_file_path.
resolve_mirrored_playlist tried the mirrored-playlists primary key FIRST for
any all-digit ref. Deezer upstream ids are all-numeric, so a Deezer playlist id
was mistaken for the PK and the organize-by-playlist toggle resolved a wrong row
(or nothing) — the toggle silently wouldn't save / 'Open in Mirrored' missed.
Resolve by (source, source_playlist_id) first, fall back to PK only when the
source lookup misses. Thread the batch/wishlist source through the download-path
callers so numeric upstream ids resolve correctly there too. Spotify (base62
ids) is unaffected.
Seam tests: numeric Deezer id resolves by source (not PK), spotify alphanumeric
by source, PK fallback still works, profile-scoped, empty refs -> None.
Adds get_recommendation_sources() — for each recommended similar artist it
resolves the polymorphic similar_artists.source_artist_id back to the display
names of the user's OWN artists (library + watchlist) that list it, by matching
against every provider-id column on both tables. The /api/discover/similar-artists
endpoint now attaches a 'because' array per recommendation so the UI can show
'because you have X, Y, Z' instead of just a count.
Seam tests cover: library + watchlist resolution across different provider-id
columns, dedup + name-sort, max_per cap, orphan source omission, profile scoping.
Closes the gap where similar artists only existed for WATCHLIST artists: a new
background worker populates them for the whole LIBRARY, slotting into the
existing enrichment-worker pattern (bubble + Manage Enrichment Workers modal,
status/pause/resume, matched/not_found/pending/errors).
Per source-matched library artist → get_musicmap_similar_artists(name, 25)
(the same matcher the artist-detail page uses: fetches MusicMap names, matches
each to the user's source chain — primary + active fallbacks — returns only
matched artists) → store via add_or_update_similar_artist keyed by the artist's
metadata source id, the SAME key the watchlist scanner + artist map use, so the
two cooperate (idempotent upsert + retry_days window).
- core/similar_artists_worker.py: pure seams (pick_source_artist_id,
map_payload_to_store_kwargs, process_artist) + the threaded worker; skips
artists not yet source-matched; classifies not_found vs transient error
(retry after 30d).
- DB migration: similar_artists_match_status / _last_attempted on artists
(mirrors every other source worker's tracking columns).
- Registered in EnrichmentService + instantiated in web_server, DEFAULT-PAUSED
(opt-in) like Amazon — MusicMap is scraped/outage-prone + this is library-wide.
- SERVICE_ENTITY_SUPPORT['similar_artists']=('artist',) so the modal breakdown
('artists with / without similars') + Retry work; manual-match (inapplicable
to a relationship) is gated out via relationship:true.
- 10 seam tests; existing 80 enrichment tests still pass.
Note: keys under profile 1 (single-profile setups); multi-profile is future work.
Persist organize_by_playlist on mirrored playlists and run playlist-folder
downloads from the auto-sync pipeline instead of the global wishlist phase.
Register SoulSync library rows after playlist-folder post-processing, route
failed organize batches to the wishlist correctly, and skip sync-time
unmatched wishlist only when organize download handles retries.
Invalidate stale playlist track caches on refresh (Spotify and Deezer ARL),
re-mirror on refetch, and improve standalone playlist modals (re-analysis,
Open in Mirrored). Add filesystem missing-track detection and tests.
Co-authored-by: Cursor <cursoragent@cursor.com>
Fixes a correctness bug and adds bulk re-queuing.
- Bug: per-row 'Retry' used clear-match, which sets an item to not_found
with last_attempted=NULL. The worker only retries not_found items where
last_attempted < (now - 30d), and 'NULL < cutoff' is false in SQLite, so
those items were never re-queued. Fixed by resetting match_status to NULL
(pending), which every worker's queue picks up on the next pass.
- New POST /api/enrichment/<id>/retry with scope 'item' | 'failed'
(failed = re-queue every not_found item of an entity type), backed by a
pure whitelisted build_reset_query + MusicDatabase.reset_enrichment().
- UI: per-row Retry now hits /retry; a 'Retry all failed' bulk button appears
when the current entity has not-found items (confirm + count toast); a hint
line explains retry/match/auto-retry behaviour.
- 11 new tests (38 enrichment tests total, all green).
Dashboard 'enrichment bubbles' could pause/hover but offered no way to
*manage* a worker. This adds a full management modal opened from a new
header button, covering all 11 enrichment sources.
Backend (testable core helper + seam tests; no live-DB dependency):
- core/enrichment/unmatched.py: pure, whitelisted SQL builders for the
unmatched browser. service/entity validated against a support map (never
interpolated raw); search + pagination bound as params; tracks join albums
for artwork; limit capped at 200.
- database/music_database.py: get_enrichment_unmatched() +
get_enrichment_breakdown() (the breakdown splits matched/not_found/pending,
which the existing get_stats().progress lumps together).
- core/enrichment/api.py: GET /api/enrichment/<id>/{unmatched,breakdown} on
the existing blueprint + a db_getter hook.
- web_server.py: wire db_getter=get_database.
- tests/enrichment/test_unmatched.py: 19 tests across builders, DB methods,
and Flask routes.
Frontend (vanilla, matches app conventions):
- webui/static/enrichment-manager.js: worker rail with live status + coverage
micro-bars, accent-themed detail panel (hero header, segmented matched/
not_found/pending stat cards, current item, pause/resume), and a searchable
paginated unmatched browser with inline manual match (reusing
search-service + manual-match) and retry (clear-match re-queues).
- Polish: entrance/exit motion, scroll-lock, Escape, refresh control,
flicker-free polling (in-place updates), skeleton loaders, relative
timestamps, per-worker accent theming, real dashboard logos reused at
runtime (with the same invert/circle treatment), responsive rail.
- index.html: header button + script include. style.css: full styling.
Reuses existing pause/resume, status, and manual search+assign endpoints.
Backend tests green (19 new + 11 existing enrichment tests).
Turns the Stage-1 scorer into an end-to-end resolver + persists the result.
Still DORMANT — no consumer reads it yet, so zero behavior change.
- core/metadata/canonical_resolver.py — resolve_canonical_for_album(): builds
candidate releases from the album's per-source IDs (in source-priority order),
fetches each tracklist via an INJECTED fetch_tracklist (so it's unit-testable
without live APIs), scores them with pick_canonical_release, and returns the
best-fit {source, album_id, score}. Skips sources with no id / failed fetch;
returns None when there are no files, no candidates, or nothing clears the
confidence floor.
- database/music_database.py — set_album_canonical() / get_album_canonical()
write/read the Stage-1 columns. get returns None when unresolved, which every
consumer will treat as "fall back to today's behavior".
Tests: tests/test_canonical_resolver.py (7) — best-fit beats priority, priority
breaks true ties, skips missing-id/failed-fetch sources, None on
no-candidates/no-files/below-floor, score rounding. tests/test_canonical_db.py
(4) — set/get round-trip incl. timestamp, unresolved -> None, overwrite,
missing-album -> False. 34 canonical + DB-migration tests pass.
Remaining for Stage 2 (the trigger): read on-disk file durations/titles for an
album, gather its source IDs, call the resolver, store — wired via a backfill
repair job + an enrichment hook. Then Stages 3-4 wire the Reorganizer and Track
Number Repair to READ the pinned canonical.
First stage of the canonical-album-version fix (#765 + #767-Bug2). Pins ONE
canonical (source, album_id) per album, chosen by best-fit to the user's actual
files, so the Reorganizer, Track Number Repair, and tagging stop re-resolving
independently and contradicting each other.
Ships DORMANT — nothing reads or writes the new data yet, so zero behavior
change. Later stages populate (Stage 2) and consume (Stages 3-4) it.
- core/metadata/canonical_version.py — pure scorer (the testable heart):
score_release_against_files() rates a candidate release by track-count fit +
duration alignment (greedy nearest within ±3s) + title overlap, dropping and
renormalizing missing signals so it never crashes on sparse metadata.
pick_canonical_release() takes candidates in source-priority order, picks the
best fit, breaks ties toward the earlier (higher-priority) candidate so the
choice is DETERMINISTIC — that determinism is what makes every tool agree
(#765), while count/duration fit picks the right EDITION (#767-Bug2). A
confidence floor (default 0.5) means a low-confidence guess is never pinned.
- database/music_database.py — additive, nullable columns on albums
(canonical_source / canonical_album_id / canonical_score /
canonical_resolved_at), guarded by the existing PRAGMA-table_info pattern.
NULL = unresolved = every consumer falls back to today's behavior.
Tests: tests/test_canonical_version.py (11) — edition discrimination (11 files
-> standard, 17 -> deluxe), deterministic priority tiebreak, duration
disambiguation on count ties, graceful degradation (no durations / counts only /
fuzzy titles), confidence floor, empty-input safety. tests/test_canonical_
columns_migration.py (4) — fresh DB has the columns, they're nullable w/ NULL
default, migration is idempotent, and it ALTERs them onto an old albums table.
60 DB/schema regression tests still pass.