- depth setting (light = core tags + matched source ids; full = same
multi-source enrichment cascade a fresh download gets, run additively
via embed_source_ids). Threaded through scan/finding/auto-apply and the
repair_worker fix handler.
- source now defaults to 'auto' (= your source priority / active source)
instead of blank.
- give native <option> popups a solid dark background (were white-on-white).
- tests for full-depth full_meta payload + enrich invocation + light no-op.
The job was the odd one out — auto_fix=False, no dry_run setting, so it never
showed the 'Dry Run' badge the other jobs do (the badge keys off
settings.dry_run === true). Aligned it to the standard pattern:
- auto_fix=True + dry_run setting defaulting True. Default behavior is unchanged
(findings only, nothing written) AND it now shows the Dry Run badge.
- Turning dry_run off makes the scan auto-apply in place (result.auto_fixed),
no finding — the opt-in 'just retag it' mode.
- Extracted a shared apply_track_plans() used by both the scan auto-apply and
the repair_worker fix handler (handler now resolves Docker paths then
delegates — one code path, no duplication).
Tests: dry_run=False auto-applies + writes + no finding; existing dry-run
finding/skip/apply tests still green. 410 passing.
Closes the kettui gap — the orchestration was unproven. Injected-fake seam
tests (temp sqlite + real empty track files, no metadata APIs / no real tag
writes):
- embed_known_source_ids: builds the right canonical id_tags from flat db keys,
honors the musicbrainz embed gate, no-ops when there's nothing to write.
- library_retag scan: produces a detailed finding with the per-track old->new
diff + stamped source ids, and skips an album that's already correct.
- _add_source_ids: per-source key mapping.
- _fix_library_retag apply: writes each track's payload, and reports failure
when files are unreachable.
476 tests pass; ruff clean.
The old per-download Retag Tool was limited (only native-pipeline downloads,
100-group cap, manual per-group) and did the wrong thing — it moved/reorganized
files instead of just tagging. It's superseded by the new Library Re-tag job
(whole-library, in-place) + the enhanced-library 'Write Tags' button.
Removed: the post-download record_retag_download ingestion hook (stops writing
retag_groups on every download), core/library/retag.py, the web_server state +
deps + /api/retag/* endpoints + the tool:retag WebSocket emit, the dashboard
card + both modals (index.html), the core.js socket handler, and the tools-page
wiring + help entry (wishlist-tools.js). Updated the import-pipeline test.
Verified: web_server parses, app + core imports OK, 392 tests pass, no live
references to removed symbols.
Left as inert (harmless) for a careful follow-up sweep: the retag_groups/
retag_tracks tables + their DB CRUD methods (no longer written/read), and the
now-orphaned retag JS helper functions (no entry point/wiring/socket calls them;
interspersed with wishlist functions, so not blind-deleted).
The testable core for the new library-wide re-tag job. Given a source album's
metadata + tracklist and the library tracks' current file tags, it:
- matches source tracks to library tracks (disc+track number, then title sim),
- computes the per-field diff (old -> new) for the dry-run finding,
- builds the minimal write_tags_to_file payload — only fields that actually
change under the chosen mode (overwrite vs fill-missing), so applying never
touches unrelated/unchanged tags.
No IO/network/DB — 10 unit tests cover matching, both modes, blank-source
fields, and the album-artist/track-count payload mapping.
Verification found a non-additive edge: embed_album_art_metadata uses FLAC
add_picture(), which APPENDS — so applying to an album where some tracks already
had art would have added a duplicate embedded picture. The apply now checks each
file and skips any that already carry art (shared _audio_has_art helper), so it
only ever ADDS art to files missing it. Test covers the skip (no re-embed).
Previously the filler only flagged albums whose DB thumb_url was empty and, on
apply, only updated that DB thumb_url — so albums whose files had no embedded
art and no cover.jpg (but whose DB row had a URL) were never found, and even
'applying' art never touched the files. That's the reported 'doesn't scan all
albums' gap.
New core.metadata.art_apply (reuses the post-processing standard so the user's
album_art_order is honored):
- album_has_art_on_disk(): cheap-first check — folder cover.jpg/folder.jpg
sidecar, then embedded art in a representative track (FLAC/ID3/MP4/Vorbis).
- apply_art_to_album_files(): embeds via embed_album_art_metadata + writes
cover.jpg via download_cover_art; only ADDS art (never rewrites the user's
tags); read-only/unwritable files are skipped + counted, never crash.
Scan now examines every titled album and flags it when art is missing in the DB
OR on disk. Apply embeds into the album's audio files + writes cover.jpg in
addition to the DB thumbnail (media-server-only albums fall back to DB-only).
Tests cover sidecar/embedded detection, the cheap-first short-circuit, and the
apply orchestration (embeds each file + cover.jpg; read-only failures counted).
The title/artist fallback search took results[0]'s artwork unconditionally, so
a loose full-text match returned the wrong album's cover (the 'new sources give
incorrect art' reports). Now it pulls a few results and only accepts one whose
title matches (subset, to allow Deluxe/Remaster) AND whose artist matches
exactly — the artist being the strong guard against wrong covers. Falls back to
an exact title match when a result carries no artist.
The album's own stored source-id path is unchanged (that id is authoritative).
Tests: wrong-artist rejected, skips wrong result for a matching one, + unit
coverage of the matcher (deluxe/feat/stopwords accepted, wrong artist/title
rejected).
qBittorrent 5.2.0 changed /api/v2/auth/login to return HTTP 204 (No Content)
on success instead of HTTP 200 with body 'Ok.'. The adapter required the body
to equal 'Ok.', so every login on 5.2.0+ failed with 'HTTP 204 body=' — the
connection probe and all torrent actions were broken.
Treat login as successful on the SID auth cookie and/or a success body: 'Ok.'
(<=5.1) or an empty HTTP 204 (>=5.2.0). Still reject bad creds, which
qBittorrent reports as HTTP 200 + 'Fails.' (not a 4xx).
Tests: 204-empty -> success, SID-cookie+empty-body -> success, 'Fails.' (even
with a stale cookie) -> failure.
resolve_mirrored_playlist tried the mirrored-playlists primary key FIRST for
any all-digit ref. Deezer upstream ids are all-numeric, so a Deezer playlist id
was mistaken for the PK and the organize-by-playlist toggle resolved a wrong row
(or nothing) — the toggle silently wouldn't save / 'Open in Mirrored' missed.
Resolve by (source, source_playlist_id) first, fall back to PK only when the
source lookup misses. Thread the batch/wishlist source through the download-path
callers so numeric upstream ids resolve correctly there too. Spotify (base62
ids) is unaffected.
Seam tests: numeric Deezer id resolves by source (not PK), spotify alphanumeric
by source, PK fallback still works, profile-scoped, empty refs -> None.
Adds get_recommendation_sources() — for each recommended similar artist it
resolves the polymorphic similar_artists.source_artist_id back to the display
names of the user's OWN artists (library + watchlist) that list it, by matching
against every provider-id column on both tables. The /api/discover/similar-artists
endpoint now attaches a 'because' array per recommendation so the UI can show
'because you have X, Y, Z' instead of just a count.
Seam tests cover: library + watchlist resolution across different provider-id
columns, dedup + name-sort, max_per cap, orphan source omission, profile scoping.
The worker's WARNING observability proved the '38 errors' were almost all
MusicMap returning 404 (artist has no map page) — a genuine not-found, not a
fetch failure. But iter_musicmap_similar_artist_events flattened every
RequestException to status_code 502, and the worker maps 400/404 -> not_found
/ everything-else -> error, so these inflated the error count.
Surface the real HTTP status from the exception's response (404 stays 404),
falling back to 502 only when there's no response (timeout/connection drop,
which is correctly still an error eligible for retry).
Regression tests: 404 -> 404 (not_found), timeout -> 502 (error), 500 stays
error, plus an end-to-end worker check that a 404 result marks 'not_found'
and stores nothing.
Verified against live data: 1312/1313 stored similars carry a metadata source id,
but 1 slipped through name-only (a match on a source with no id column, e.g.
discogs). Enforce the standard: process_artist now SKIPS any similar whose match
doesn't map to a storable id column (spotify/itunes/deezer/musicbrainz) instead
of writing a useless name-only row. Regression test covers discogs-match + no-id
cases. Now 100% of newly-stored similars are actionable.
The kettui move: 38/79 fetches errored on the first live run, but they were
logged at DEBUG only — invisible in app.log, so the cause (rate-limit vs
no-providers vs bug) is unprovable. process_artist now returns a (status, count,
detail) triple carrying the error reason (status code + message / exception),
and the worker logs the first 15 errors per session at WARNING (rest DEBUG) +
keeps _last_error. No blind pacing tweak — let it run, read the real reason, then
fix the proven cause. Seam tests updated + assert the reason is captured.
SoulSync standalone matches library tracks without Plex fetchItem,
reports missing counts correctly, and skips server playlist writes.
Automation re-syncs when the mirror grows; after sync finishes, starts
organize download (organize-by-playlist) or wishlist processing.
UI: Spotify URL playlist-folder controls, organize toggle layout in the
discovery modal, reload organize preference when reopening Download Missing.
Co-authored-by: Cursor <cursoragent@cursor.com>
Closes the gap where similar artists only existed for WATCHLIST artists: a new
background worker populates them for the whole LIBRARY, slotting into the
existing enrichment-worker pattern (bubble + Manage Enrichment Workers modal,
status/pause/resume, matched/not_found/pending/errors).
Per source-matched library artist → get_musicmap_similar_artists(name, 25)
(the same matcher the artist-detail page uses: fetches MusicMap names, matches
each to the user's source chain — primary + active fallbacks — returns only
matched artists) → store via add_or_update_similar_artist keyed by the artist's
metadata source id, the SAME key the watchlist scanner + artist map use, so the
two cooperate (idempotent upsert + retry_days window).
- core/similar_artists_worker.py: pure seams (pick_source_artist_id,
map_payload_to_store_kwargs, process_artist) + the threaded worker; skips
artists not yet source-matched; classifies not_found vs transient error
(retry after 30d).
- DB migration: similar_artists_match_status / _last_attempted on artists
(mirrors every other source worker's tracking columns).
- Registered in EnrichmentService + instantiated in web_server, DEFAULT-PAUSED
(opt-in) like Amazon — MusicMap is scraped/outage-prone + this is library-wide.
- SERVICE_ENTITY_SUPPORT['similar_artists']=('artist',) so the modal breakdown
('artists with / without similars') + Retry work; manual-match (inapplicable
to a relationship) is gated out via relationship:true.
- 10 seam tests; existing 80 enrichment tests still pass.
Note: keys under profile 1 (single-profile setups); multi-profile is future work.
Persist organize_by_playlist on mirrored playlists and run playlist-folder
downloads from the auto-sync pipeline instead of the global wishlist phase.
Register SoulSync library rows after playlist-folder post-processing, route
failed organize batches to the wishlist correctly, and skip sync-time
unmatched wishlist only when organize download handles retries.
Invalidate stale playlist track caches on refresh (Spotify and Deezer ARL),
re-mirror on refetch, and improve standalone playlist modals (re-analysis,
Open in Mirrored). Add filesystem missing-track detection and tests.
Co-authored-by: Cursor <cursoragent@cursor.com>
- Verified end-to-end: fetch_public_playlist_full pulled all 236 tracks of the
test playlist via SpotipyFree (the library handles the client-auth that 429'd
the raw approach). Name + tracks correct.
- requirements.txt: declare spotipyFree>=1.1.2 as a normal pip dependency (like
spotDL, also MIT — aggregation, not vendored) + websockets (a transitive dep
SpotipyFree/spotapi needs that pip doesn't pull automatically). Code still
soft-imports + falls back to embed, so it's never a hard runtime requirement.
- meta fetch uses limit=1 (name/owner only) so we don't pull the whole list
twice. 9 tests green.
The in-house anonymous-token path is blocked by Spotify (429 without the web
player's rotating client-auth). Switch the full-fetch to SpotipyFree — the
maintained no-creds spotipy drop-in spotDL uses, which tracks that machinery.
- core/spotify_public_api.fetch_public_playlist_full now uses a SpotipyFree
client (playlist + playlist_items + next), normalising the spotipy-shaped
items to the embed scraper's shape. Injectable client_factory keeps it
unit-testable without the library or network. Dropped the dead in-house
token/pagination code.
- Licensing: SpotipyFree is GPL-3.0, so it is NOT bundled/required (SoulSync is
MIT). Optional, user-installed: the import is soft, and on ImportError (or any
failure) fetch_spotify_public falls back to the embed scraper (~100). So the
shipped project stays cleanly MIT and the link path never regresses.
- requirements.txt: documents it as a commented optional extra
(pip install SpotipyFree) with the GPL/MIT rationale.
- 9 tests: normalisation, pagination past 100, library-missing -> raises (->
fallback), and the embed-fallback orchestration.
Needs a live click-through with SpotipyFree installed to confirm the exact
class/method names match (SpotipyFree.Spotify / playlist / playlist_items).
Live debugging the 'shows 100' report:
- The full playlist page no longer embeds an accessToken, and get_access_token
/ server-time now 403/404. The EMBED page (open.spotify.com/embed/playlist/{id})
still ships a usable anonymous token. Was fetching the wrong page -> no token
-> raised -> embed fallback (100). Now reads the embed page for the token.
- Confirmed live: token extraction + embed parse work; the token is accepted by
the Web API (429 rate-limit, not 401). Could not show >100 from here because
the test IP got rate-limited from probing; needs a clean-IP click-through.
While in there, made it more robust against the rate-limiting that's clearly in
play:
- Refactored scrape_spotify_embed -> reusable parse_embed_html.
- fetch_public_playlist_full now does ONE embed fetch for token + name + first
page (no separate metadata call = fewer requests = less 429 surface), then
paginates the API. If the API is unavailable/rate-limited, it keeps the embed
page's tracks (<=100) instead of raising — so the result is always >= today's
behaviour, never worse.
- 12 tests incl. the new API-fails-but-embed-tracks-survive path.
Caveat unchanged: rides Spotify's undocumented embed-page token; degrades to the
embed fallback, never crashes.
The no-auth 'add by link' path scrapes Spotify's embed widget, which only ever
contains ~100 tracks and can't paginate — so big public playlists got
truncated. This adds an in-house anonymous fetch that pulls the FULL list:
- core/spotify_public_api.py: reads the anonymous web-player accessToken Spotify
already embeds in its own open.spotify.com page HTML (no app credentials, and
no rotating TOTP secret for us to maintain), then paginates
/v1/playlists/{id}/tracks 100 at a time until the whole playlist is pulled.
Returns the embed scraper's exact shape. Pure helpers + injected http_get so
it's unit-testable without the network.
- core/spotify_public_scraper.fetch_spotify_public(): tries the full fetch for
playlists; on ANY failure (or for albums) falls back to scrape_spotify_embed.
Worst case == today's behaviour, so the link path can't regress.
- web_server: the link-tab endpoint and the authed flow's last-resort scrape
now both go through fetch_spotify_public.
Scoped entirely to the spotify_public_* (no-auth) path — the authenticated
playlist sync is untouched. 11 tests (token extraction, normalisation,
pagination past 100, and the embed-fallback orchestration).
Caveat: rides Spotify's undocumented page-embedded token — expected to break
when they change their page; it degrades to the embed fallback, never crashes.
Needs a live click-through to confirm the token path works end to end (can't
hit Spotify from the test env).
The onclick-coverage guard only scans the split modules + a hardcoded extras
list, so it flagged openEnrichmentManager() (defined in the new, loaded
enrichment-manager.js) as undefined. Add enrichment-manager.js to the scanned
non-split files. The function genuinely exists and is loaded via its script tag.
- #1 Unconfigured-source banner: when a source has enabled=false, show a
notice that browsing works but matches/retries won't run until it's set up.
- #2 Rate-limit detail: when rate_limited, surface 'resumes in ~Xm' (from the
status payload) instead of just a pill.
- #3 Richer rows: unmatched items now show parent context — an album's artist,
a track's album — via a parent expression in the query (+ test).
- #4 Bulk select: per-row checkboxes + a bulk bar to retry several at once
(capped concurrency), reusing the /retry item endpoint.
- #5 Remember last worker: selection persists in localStorage and is restored
on open; openEnrichmentManager(workerId) supports future deep-linking
(bubbles left on their pause-on-click behaviour).
- #6 Keyboard nav: ArrowUp/Down moves focus between rows; actions are native
buttons (Enter/Space) and Escape closes — list isn't poll-refreshed so focus
is stable.
53 enrichment tests green; JS syntax clean.
Per-worker processing-order override + UI polish.
Feature — pin an entity group to enrich first:
- Each worker normally runs artist -> album -> track. A user can pin one
group (artist/album/track) to run first from the modal; the worker keeps
that group first until it's exhausted, then resumes the normal chain.
- core/worker_utils.py: read_enrichment_priority() (reads
<service>_enrichment_priority each loop, live) + priority_pending_item()
(shared, whitelisted query returning the worker's expected item shape;
Spotify/iTunes get album_individual/track_individual via a type map).
- A guarded ~6-line hook at the top of all 11 workers' _get_next_item.
CRITICAL: when nothing is pinned (default) the hook returns immediately,
so default enrichment order is byte-identical to before. Discogs (no track)
and Genius (no album) only honor their supported entities.
- core/enrichment/api.py: GET/POST /api/enrichment/<id>/priority (+ config_get
hook); POST validates the entity against what the source enriches.
- 14 new tests (helper shapes, exhaustion, route get/set/clear/validate).
UI:
- Refined hero header: identity + inline status left, single Pause right,
'now enriching' quiet sub-line; overall coverage % moved into the stats
section ('82% matched · 1,203 of 1,460'). Hero gently pulses while running.
- New processing-order strip: artist→album→track steps showing the live phase
(pulsing 'now'), pinned group ('first' + 📌), and done/remaining; click a
step to pin it, click again for auto.
py_compile clean across all 11 workers; 52 enrichment tests green.
Fixes a correctness bug and adds bulk re-queuing.
- Bug: per-row 'Retry' used clear-match, which sets an item to not_found
with last_attempted=NULL. The worker only retries not_found items where
last_attempted < (now - 30d), and 'NULL < cutoff' is false in SQLite, so
those items were never re-queued. Fixed by resetting match_status to NULL
(pending), which every worker's queue picks up on the next pass.
- New POST /api/enrichment/<id>/retry with scope 'item' | 'failed'
(failed = re-queue every not_found item of an entity type), backed by a
pure whitelisted build_reset_query + MusicDatabase.reset_enrichment().
- UI: per-row Retry now hits /retry; a 'Retry all failed' bulk button appears
when the current entity has not-found items (confirm + count toast); a hint
line explains retry/match/auto-retry behaviour.
- 11 new tests (38 enrichment tests total, all green).
Dashboard 'enrichment bubbles' could pause/hover but offered no way to
*manage* a worker. This adds a full management modal opened from a new
header button, covering all 11 enrichment sources.
Backend (testable core helper + seam tests; no live-DB dependency):
- core/enrichment/unmatched.py: pure, whitelisted SQL builders for the
unmatched browser. service/entity validated against a support map (never
interpolated raw); search + pagination bound as params; tracks join albums
for artwork; limit capped at 200.
- database/music_database.py: get_enrichment_unmatched() +
get_enrichment_breakdown() (the breakdown splits matched/not_found/pending,
which the existing get_stats().progress lumps together).
- core/enrichment/api.py: GET /api/enrichment/<id>/{unmatched,breakdown} on
the existing blueprint + a db_getter hook.
- web_server.py: wire db_getter=get_database.
- tests/enrichment/test_unmatched.py: 19 tests across builders, DB methods,
and Flask routes.
Frontend (vanilla, matches app conventions):
- webui/static/enrichment-manager.js: worker rail with live status + coverage
micro-bars, accent-themed detail panel (hero header, segmented matched/
not_found/pending stat cards, current item, pause/resume), and a searchable
paginated unmatched browser with inline manual match (reusing
search-service + manual-match) and retry (clear-match re-queues).
- Polish: entrance/exit motion, scroll-lock, Escape, refresh control,
flicker-free polling (in-place updates), skeleton loaders, relative
timestamps, per-worker accent theming, real dashboard logos reused at
runtime (with the same invert/circle treatment), responsive rail.
- index.html: header button + script include. style.css: full styling.
Reuses existing pause/resume, status, and manual search+assign endpoints.
Backend tests green (19 new + 11 existing enrichment tests).
The canonical source_selection setting was rendering as a free-text box — easy
to typo an invalid mode. Added a generic choice mechanism so it's a dropdown:
- RepairJob.setting_options: {key: [allowed values]} (default {} — opt-in).
- CanonicalVersionResolveJob declares source_selection's three modes.
- repair_worker.get_all_job_info() includes setting_options in the job payload.
- enrichment.js renders a <select> (options prettified, current value selected)
for any key listed in setting_options; everything else renders by value type
as before. The save path already reads <select>.value as a string, so no
change needed there.
Generic — any future job can get dropdowns the same way. Jobs that don't
declare setting_options are untouched (empty dict -> existing input rendering).
Tests: source_selection exposes the 3 options and its default is one of them.
23 repair-job/worker + canonical tests pass (other jobs unaffected).
Per request, pack each finding with everything available WITHOUT extra API
calls (kettui: reuse what's already fetched, read the album row we already
loaded, degrade per-field, keep it tested):
- Pinned release's track titles — already fetched during scoring, so free
(capped at 60 to bound details_json).
- From the album row (free): year, DB track count, total duration, genres-free
context, and the album's currently-linked source IDs.
- file_track_titles (your library's titles) for a side-by-side with the release.
- Artist + album thumbs (artist via the guarded lookup) and names.
_describe_pin now renders: "Artist — Album (year)", the fit breakdown, "Currently
linked: … → pinning X", "Beat: <alternatives>", and the release tracklist — so
the card is judge-able at a glance, and the structured fields are in details for
a richer UI.
NOT included (would cost an extra per-album API fetch, left as opt-in): the
*release's* own year/type/cover/URL from get_album_for_source, vs the library's.
Tests: _describe_pin rich-render (year/linked/tracklist), resolver release-titles,
orchestration free-context fields. 94 canonical + reorganize regression pass.
Findings now carry artist_thumb_url alongside album_thumb_url (same key the
track-repair findings use, so the findings UI already renders it).
Fetched via a guarded _lookup_artist_thumb() — checks the artists table has a
thumb_url column first and swallows any error — rather than adding ar.thumb_url
to the shared load_album_and_tracks SELECT. The shared-loader approach was
tried first and REVERTED: it crashed reorganize on schemas whose artists table
has no thumb_url column (caught by 40 orchestrator tests). The lookup only runs
for albums that actually resolve, so it adds no cost to the no-source-id
short-circuit majority.
Tests: orchestration test asserts artist_name + album_thumb_url + artist_thumb_url
flow through. 47 canonical + 104 canonical/reorganize regression tests pass.
Live-run feedback: "Best-fit release: deezer (665666731), score 1.0" is too thin
to trust/accept. Each finding now explains WHY:
- score_release_detail() exposes the per-signal breakdown (count/duration/title)
instead of just the blended score.
- resolve_canonical_for_album returns an enriched result: the breakdown,
file_track_count vs release_track_count, and a `candidates` list of every
source it scored (so a finding can show what the winner beat).
- resolve_and_store adds album/artist/thumb context from the row it already
loaded (no extra query). Storage still only reads source/album_id/score.
- The job builds a real description via _describe_pin(), e.g.:
"Pin deezer release 665666731 (confidence 100%).
Fit to your library: 11 files vs 11 tracks on this release — track count
100%, durations 100%, titles 100%.
Beat: spotify 65% (17 tk)."
and a clearer title ("Pin deezer as canonical: <artist> — <album>").
Tests: resolver enrichment (breakdown + candidate comparison fields), and
_describe_pin (judge-able text incl. the beaten alternatives, and honest "n/a"
for a missing signal). 42 canonical tests pass.
Note: the description string carries the judge-able info regardless of UI; how
the findings tab renders the extra details keys (thumb image, candidates table)
is still UI-dependent and unverified.
Feedback from the live dry-run: the job was pinning whichever source best fit
the files regardless of which source it was, which was surprising — users
expect it to respect their active metadata source. Made it a per-job setting
instead of a baked-in policy.
source_selection (default 'active_preferred'):
- active_preferred — use the active/primary metadata source's release when the
album has an ID for it AND it clears the score floor; otherwise fall back to
the best-fit among the other sources. Respects the configured source but
self-heals when that link is clearly broken (below floor / no ID).
- active_only — only ever the active source; never considers others.
- best_fit — previous behavior: whichever source matches the files best.
resolve_canonical_for_album gains mode + primary_source; the orchestration
threads the primary source through; the job reads source_selection from its
settings. Note: active_preferred respects the active source as long as it clears
the floor, so it will NOT override a deluxe-vs-standard mismatch on the primary
(#767-Bug2) — that's what best_fit is for; the choice is now the user's.
Tests: per-mode coverage in test_canonical_resolver.py (active_preferred uses
primary when it fits, falls back when primary is below floor, keeps primary even
when another fits better; active_only pins primary / never falls back; best_fit
unchanged), orchestration default-mode test, and the setting default. 39
canonical tests pass.
The populate trigger that turns the (until now dormant) feature on. Until a user
enables and runs this job, no album has a canonical -> both read sides (Stages
3-4) fall back -> zero behavior change. So the whole feature ships safely off.
- core/repair_jobs/canonical_version_resolve.py — "Resolve Canonical Album
Versions". Iterates the active server's albums, skips ones already pinned, and
calls the tested resolve_and_store_canonical_for_album per album. Opt-in
(default_enabled=False) and dry-run-by-default: resolving compares an album's
candidate releases across sources (metadata-source API calls, once per album),
so it's deliberately user-triggered. Dry run reports a finding per album it
would pin; live mode stores. Registered in _JOB_MODULES.
- core/metadata/canonical_resolver.py — resolve_and_store gains store=True; the
job's dry run passes store=False to resolve-without-writing.
Tests: tests/test_canonical_version_job.py (6) — registered, opt-in + dry-run
defaults, live resolves+stores (auto_fixed), dry run creates findings without
persisting, already-pinned albums skipped. Registry loads all 19 jobs cleanly.
145 tests across the full feature + reorganize/track-repair/DB regression pass.
_resolve_album_tracklist gains a Fallback -1: if the album has a pinned
canonical (source, album_id), use it before the existing 6-level cascade — so
Track Number Repair resolves the SAME release the Reorganizer does (Stage 3) and
the two stop contradicting each other (#765, the Spotify-4 vs MusicBrainz-3
conflict).
Gated + additive: the entire existing cascade is untouched for albums without a
canonical, so this job's all-01-album rescue (which relies on the MusicBrainz/
AudioDB fallbacks for albums with no DB source ID) is fully preserved — that's
the regression we explicitly refused to take in a reactive fix.
New helper _lookup_canonical_from_db() mirrors _lookup_album_ids_from_db
(file-path -> track -> album), returns None when no DB / no match / columns
absent / unresolved.
Tests: tests/test_track_repair_canonical.py (4) — returns canonical when pinned,
None when unresolved / file untracked / no DB. Existing track_number_repair
tests still pass (no regression).
_resolve_source now prefers the album's pinned canonical (source, album_id) when
set, before the source-priority walk. So once an album's canonical is resolved,
reorganize agrees with Track Number Repair (Stage 4) and stops mislabelling a
standard album as deluxe (#767-Bug2).
Gated + side-effect-free: only changes behavior for albums that already carry a
canonical (none do until the populate step runs), an explicit user source pick
(strict_source) still wins over the canonical, and a failed canonical fetch
falls through to today's priority walk. So this stage is behavior-neutral until
canonical is populated.
Tests: tests/test_reorganize_canonical_source.py (4) — canonical preferred over
priority, fetch-failure falls back, strict_source ignores canonical, no-canonical
unchanged. 113 reorganize-orchestrator/tag-source/unknown-artist tests still pass
(no regression).
Completes Stage 2's populate path. Still dormant — no consumer calls it yet.
- resolve_and_store_canonical_for_album(db, album_id, ...): loads the album's
source IDs + its tracks' (duration_ms, title) from the DB via the SAME
loader the Reorganizer uses (load_album_and_tracks + _extract_source_ids), so
the canonical is chosen over exactly the source IDs the reorganizer sees;
scores off the DB track rows (the library's view of the files — no per-file
disk reads), resolves the best fit, and persists it. Returns the stored result
or None when unresolved.
- default_fetch_tracklist(): production fetcher wrapping
get_album_tracks_for_source, normalising to {title, track_number, duration_ms}
(duration best-effort; sec->ms; absent -> scorer leans on count+title).
Design note: chose LAZY resolution (Stages 3-4 consumers call this when they hit
an album with no canonical) over a standalone backfill repair job — no new
scheduling/UI surface, resolves only when a tool actually needs it, and stays
gated (NULL canonical = today's behavior).
Tests: tests/test_canonical_orchestration.py (5) — end-to-end on a real temp DB
(11 files pick the 11-track release over a 17-track deluxe and persist it),
no-source-ids -> None, missing-album -> None, and default_fetch_tracklist
normalization (dict items, seconds->ms) + failure -> None. All canonical +
DB-migration tests green.
Turns the Stage-1 scorer into an end-to-end resolver + persists the result.
Still DORMANT — no consumer reads it yet, so zero behavior change.
- core/metadata/canonical_resolver.py — resolve_canonical_for_album(): builds
candidate releases from the album's per-source IDs (in source-priority order),
fetches each tracklist via an INJECTED fetch_tracklist (so it's unit-testable
without live APIs), scores them with pick_canonical_release, and returns the
best-fit {source, album_id, score}. Skips sources with no id / failed fetch;
returns None when there are no files, no candidates, or nothing clears the
confidence floor.
- database/music_database.py — set_album_canonical() / get_album_canonical()
write/read the Stage-1 columns. get returns None when unresolved, which every
consumer will treat as "fall back to today's behavior".
Tests: tests/test_canonical_resolver.py (7) — best-fit beats priority, priority
breaks true ties, skips missing-id/failed-fetch sources, None on
no-candidates/no-files/below-floor, score rounding. tests/test_canonical_db.py
(4) — set/get round-trip incl. timestamp, unresolved -> None, overwrite,
missing-album -> False. 34 canonical + DB-migration tests pass.
Remaining for Stage 2 (the trigger): read on-disk file durations/titles for an
album, gather its source IDs, call the resolver, store — wired via a backfill
repair job + an enrichment hook. Then Stages 3-4 wire the Reorganizer and Track
Number Repair to READ the pinned canonical.
First stage of the canonical-album-version fix (#765 + #767-Bug2). Pins ONE
canonical (source, album_id) per album, chosen by best-fit to the user's actual
files, so the Reorganizer, Track Number Repair, and tagging stop re-resolving
independently and contradicting each other.
Ships DORMANT — nothing reads or writes the new data yet, so zero behavior
change. Later stages populate (Stage 2) and consume (Stages 3-4) it.
- core/metadata/canonical_version.py — pure scorer (the testable heart):
score_release_against_files() rates a candidate release by track-count fit +
duration alignment (greedy nearest within ±3s) + title overlap, dropping and
renormalizing missing signals so it never crashes on sparse metadata.
pick_canonical_release() takes candidates in source-priority order, picks the
best fit, breaks ties toward the earlier (higher-priority) candidate so the
choice is DETERMINISTIC — that determinism is what makes every tool agree
(#765), while count/duration fit picks the right EDITION (#767-Bug2). A
confidence floor (default 0.5) means a low-confidence guess is never pinned.
- database/music_database.py — additive, nullable columns on albums
(canonical_source / canonical_album_id / canonical_score /
canonical_resolved_at), guarded by the existing PRAGMA-table_info pattern.
NULL = unresolved = every consumer falls back to today's behavior.
Tests: tests/test_canonical_version.py (11) — edition discrimination (11 files
-> standard, 17 -> deluxe), deterministic priority tiebreak, duration
disambiguation on count ties, graceful degradation (no durations / counts only /
fuzzy titles), confidence floor, empty-input safety. tests/test_canonical_
columns_migration.py (4) — fresh DB has the columns, they're nullable w/ NULL
default, migration is idempotent, and it ALTERs them onto an old albums table.
60 DB/schema regression tests still pass.
A source row with no art of its own (e.g. a YouTube source, which provides
none at mirror time) now borrows the cover from its MATCHED server track, so
both sides of the sync editor show an image.
The endpoint already had a borrow fallback (_server_art_map), but it matched by
an exact normalized "{artist}|{title}" key — so a YouTube-shaped row like
"Arctic Monkeys - Do I Wanna Know?" never matched the library's "Do I Wanna
Know?" and stayed blank even though the server had the cover. This borrow is
keyed off the ACTUAL source<->server pairing the reconcile already computed, so
it works for those rows once #768's canonical matching pairs them.
Done in the pure reconcile_playlist (final pass), so no frontend change is
needed — the editor already renders source_track.image_url. Guarded so it only
fills an EMPTY source image (Spotify/CDN art is never overwritten) and only when
the matched server track actually has a thumb.
Composes with the rest: #766 made the server cover URL work, #768 made the
YouTube row match, this makes the matched source row borrow that cover — so an
artless YouTube row matched to a Navidrome track with art shows on both sides.
Tests: tests/test_playlist_reconcile.py (+4) — artless source borrows the
matched cover; source with its own art keeps it; unmatched source has nothing to
borrow; borrow skipped when the server track has no thumb. 15 reconcile + 59
sync/navidrome tests pass.
The sync editor renders server covers as <img src="/api/navidrome/cover/{id}">,
but no Flask route ever served that path — so every Navidrome cover 404'd, on
every album, art or not. The source (left) side then went blank too: a source
row with no native art (e.g. YouTube, which provides none at mirror time) falls
back to borrowing the matched server track's cover — i.e. that same dead route.
So both sides collapsed to nothing.
Fix:
- New NavidromeClient.build_cover_art_url(cover_id) — builds the absolute,
Subsonic-authenticated getCoverArt URL (base_url + token/salt), keeping
credentials server-side. Uses a FIXED cover-art salt so the URL is
deterministic for a given (server, password, cover_id): a rotating salt (as
in _generate_auth_params) would make every request a unique URL → image-cache
miss every time + a dead, never-reused cache row per fetch. Token auth doesn't
require a unique salt, and the password is never exposed (only its salted md5).
- New route /api/navidrome/cover/<cover_id> — resolves that URL and streams the
image through the shared image cache (same pattern as /api/image-proxy), with
a private max-age so the browser caches by the stable route URL.
Effect: server side works for any album that has art in Navidrome; matched
source rows with no native art now borrow the (now-working) server cover.
Unmatched YouTube rows stay blank — no image exists anywhere to show.
Tests: tests/test_navidrome_cover_url.py (8) — URL structure + salted-token auth
(never the raw password), determinism (same id -> same URL so the cache hits;
different id/password -> different URL), optional size, and the not-connected /
no-id / no-credentials guards.
Caveats: not executed against a live Navidrome (no server in CI) — the URL
builder is unit-tested; the route's cache→HTTP→bytes round-trip is read-verified
only. Scope is the sync editor's Navidrome route; Plex/Jellyfin server-cover
branches and any other modals using a different mechanism are untouched.
The reorganize preview (dry run) was physically creating destination album
folders, littering the library with empty dirs and making "changes" before the
user ever hit Apply.
Cause: preview_album_reorganize calls build_final_path_for_track purely to
COMPUTE the destination path string — but that shared helper has 9 os.makedirs
side effects (it's also the live download/import path builder, where creating
the dir is correct). So computing the preview path created "Lenka (Expanded
Edition)/" on disk.
Fix: build_final_path_for_track gains create_dirs=True; all 9 makedirs now route
through a gated helper. The reorganize PREVIEW passes create_dirs=False, so a
dry run computes the exact destination path with zero filesystem side effects.
Everything else keeps the default True:
- the download/import post-process flow (still writes files into the dir),
- retag,
- the reorganize APPLY path — verified it goes through post_process_fn (the real
pipeline → build_final_path_for_track with create_dirs=True), so live moves
still create their destination dirs. The gate only silences the dry run.
Tests: tests/imports/test_import_paths.py — create_dirs=False computes the
correct path (matching the reported "01 - The Show.flac") but writes NOTHING to
disk (not even the Transfer root); create_dirs=True still creates folders; both
yield an identical path. Updated two reorganize-orchestrator test doubles to
accept the new kwarg. 148 reorganize/paths/retag/pipeline tests pass.
Does NOT fix the second half of #767 (Expanded Edition picked over the standard
album). That is NOT a reorganizer bug: the library album row was linked to the
deluxe release at enrichment time (its stored spotify_album_id/itunes_album_id/
deezer_id points at "Lenka (Expanded Edition)"), and the reorganizer faithfully
reorganizes to whatever the album is linked to. The real fix is in album
enrichment's edition preference — tracked separately.
Three compounding bugs hit tracks whose source metadata is YouTube/streaming-
shaped — title "Artist - Song", artist "Official Artist"/"Artist - Topic"/
"ArtistVEVO" (reported: "Arctic Monkeys - Do I Wanna Know?" by "Official Arctic
Monkeys"). Server-agnostic — affects Plex/Jellyfin/Navidrome, not just the
reporter's Navidrome.
Bug A — the match fails. The confidence scorer and the editor's reconcile both
compared the raw "Artist - Song" title against the library's clean "Song"; the
length-ratio penalty + floor drove it to ~0.18 (NO-MATCH), so the track showed
unmatched while its server copy showed as an orphan "extra". New pure
core/text/source_title.py (clean_source_artist / strip_artist_prefix /
canonical_source_track) strips the channel/video decoration, applied at BOTH
matching seams: services/sync_service._find_track_in_media_server (tries raw
then canonical, keeps the best) and the editor reconcile. Conservative: a title
prefix is stripped only when it equals the artist, so "Self-Titled", "Jay-Z",
and "Marvin Gaye" (by another artist) are untouched, and the canonical form is
an additional best-of candidate so it can only help.
Bug B — manual matches never persisted. get_server_playlist_tracks built the
per-source entry WITHOUT source_track_id, so "Find & add" posted an empty id
and _persist_find_and_add_match returned early. The match reverted to "extra"
on reload and re-adding looped. The editor's 3-pass matcher is now lifted to a
pure, tested core.sync.playlist_reconcile.reconcile_playlist that includes
source_track_id (the frontend at pages-extra.js:1836 already reads + sends it).
Bug C — manual match duplicated + delete wiped all copies. "Find & add" always
inserted, so linking a source to an already-present server track appended a
duplicate (pos 72, 73...); remove filtered out EVERY entry with the target id.
New pure core.sync.playlist_edit (plan_playlist_add: link-don't-duplicate when
the target is already present; remove_one_occurrence: drop a single copy) wired
into the Plex/Jellyfin/Navidrome add + remove branches.
Tests (extreme): tests/test_source_title.py (35), tests/test_playlist_reconcile.py
(11 — incl. the reported case, parity for override/exact/fuzzy/extra, and
duplicate-server handling), tests/test_playlist_edit.py (12). 286 matching/sync
tests still pass.
Caveats: the sync_service change and the add/remove/editor endpoints are
read-verified, not executed against a live media server (none in CI). The pure
cores they call are exhaustively unit-tested; output-shape parity of the
reconcile lift is covered. Delete removes the first matching copy (duplicates
are identical, so harmless).
Tracks NOT in the library were matched to a DIFFERENT song by the SAME artist
and reported with high confidence instead of as missing — e.g. "Dani
California" -> "Californication" (Red Hot Chili Peppers), "Under The Bridge"
-> "Around the World".
Root cause: _calculate_track_confidence scores 0.5*title + 0.5*artist. A
same-artist comparison always yields artist = 1.0, so the title score is the
only thing that can tell two of an artist's songs apart — but that score is a
SequenceMatcher CHARACTER ratio, which over-credits unrelated titles that
share a long substring ("californi…" = 0.67) or just a stopword ("the" =
0.62). With the flat 0.5 artist term, anything clearing the weak 0.6 char
floor lands at ~0.81-0.83, well over the 0.7 sync threshold. Reproduced on
dev: both reported pairs score 0.81/0.83.
Fix: new core/text/title_match.py:titles_plausibly_same, called in
_calculate_track_confidence right before the floor. It accepts a pair only
when it's near-identical char-wise (>=0.85, so typos / punctuation / casing
like "Beleive"->"Believe", "HUMBLE."->"Humble" still match) OR the titles
share at least one significant (non-stopword) word. Two different songs by the
same artist share no content word, so they're rejected and the real track is
correctly reported missing. ("the" is a stopword — that's what leaked "Under
The Bridge"/"Around the World".)
Scoped deliberately: the word-overlap test fires ONLY when at least one side
has 2+ content words. For single-word titles there is no other word to share,
so it defers to the existing char floor — otherwise legitimate stylized
spellings ("Grey"/"Gray", "Tonite"/"Tonight", "4ever"/"Forever") would become
new false-negatives. Verified those still match. The few single-word variants
that do score low (Ok/Okay, Thru/Through) were already rejected by the
pre-existing length-ratio penalty, not by this gate.
Both reported false positives now score 0.33/0.31 -> missing. Does NOT address
the harder case of two different same-artist songs that DO share a content
word (e.g. "Believe"/"Believer") — pre-existing and unworsened. Any residual
error fails safe: a false-missing is re-downloaded/wishlisted, vs the old
behavior which silently substituted the wrong song.
Tests: tests/test_title_match_guard.py (14) — pure-guard unit tests + a
13-pair battery driving the REAL _calculate_track_confidence (genuine matches
stay >=0.7, same-artist different songs drop below), plus an explicit
no-regression test for stylized single-word spellings. 292 matching/sync tests
pass.
The manual-import routes (album + singles) call post_process_matched_download
directly. When the pipeline quarantines a file — integrity / AcoustID / FLAC
bit-depth — or hits the race guard, it sets a context flag and RETURNS
NORMALLY (it only marks the task failed + notifies when there's a task_id,
which manual imports don't have). So the inner pipeline raised no exception,
and routes.py counted `processed += 1` for a file that had just been moved to
ss_quarantine, not the library. Result: the UI shows a green "Done" while the
track silently vanished — exactly the #764 report (Coldplay - Yellow.flac ->
ss_quarantine, but "Done").
The download path already handles this in
post_process_matched_download_with_verification (it reads the same flags and
marks the task failed); only the manual-import routes were missing the check.
Fix: new pure helper import_rejection_reason(context) returns a human-readable
reason for any terminal rejection (_integrity_failure_msg / _acoustid_quarantined
/ _bitdepth_rejected / _race_guard_failed) or None for a clean import. Both
manual-import routes now consult it: album_process reports the track in
`errors` instead of counting it processed; process_single_import_file returns
("error", reason) instead of ("ok", ...). Verified every move_to_quarantine
call site (4, all in pipeline.py) sets one of those flags, so no quarantine
path slips through. This also delivers the "direct display of the error" the
reporter asked for — the reason now surfaces in the response `errors` list.
Does NOT address the reverse symptom ("failed even though it moved correctly")
— not yet root-caused — nor the separate bit-depth hole on the download-path
wrapper.
Tests: tests/imports/test_import_rejection_reason.py (10) — each trigger
detected, falsy flags ignored, deterministic ordering, plus two route-level
tests driving the REAL process_single_import_file (quarantine -> "error";
clean -> "ok").
enhance_file_metadata rebuilds tags from scratch: for FLAC it calls
clear_pictures(), for MP3/MP4 it clears the whole tag block — and it does
this UP FRONT, then saves the file, long before it tries to fetch and embed
the replacement art. So every way the re-embed could come up empty left the
file saved with the original art destroyed and nothing put back:
- extract_source_metadata returns nothing -> early save, no embed
- no album-art URL / art download fails / rejected by the min-size guard
-> embed_album_art_metadata returns early without adding a picture
- art embedding disabled in config -> embed skipped entirely
- embed raises mid-enrichment -> file left cleared on disk
This is the "cover art gets corrupted/destroyed during import" half of #764
(continuation of #755); distinct from #750's truncated-cache DISPLAY bug.
Fix: new core/metadata/art_preservation.py snapshots the existing art
(the live Picture / APIC / MP4Cover objects, so they re-apply verbatim)
BEFORE the clear, and restores it before each save IFF the file currently
has none. Wired into all three exit paths in enhance_file_metadata
(no-metadata early return, the final save, and the except handler). The
restore is a strict no-op when art is already present, so the happy path —
new art embedded — is byte-for-byte unchanged: it never clobbers or
duplicates a freshly-embedded cover. embed_album_art_metadata now returns a
bool so the intent (embedded / didn't) is explicit.
Tests:
- tests/test_art_preservation.py (5) — snapshot/restore round-trips through
real mutagen FLAC + ID3 objects; restore no-ops when new art is present.
- tests/test_enrichment_art_preservation.py (4) — runs the REAL
enhance_file_metadata over a real FLAC with embedded art and asserts the
art survives on disk for missing-metadata / failed-embed / embed-raises,
and is correctly REPLACED (exactly one picture, new bytes) on success.
1019 tests pass across the metadata/enrichment/imports/acoustid suites.
The DB-update + deep-scan automation monitor used a hard 2-hour TOTAL cap
(while elapsed < 7200). It tracked progress but only used it to print a stall
warning — the only thing that actually timed out was wall-clock. So a large
library that scans for >2h while progressing fine (reported: 4781 artists) trips
the cap and the automation card flips to 'error: timed out after 2 hours' even
though the scan thread is healthy and still running (the timeout never cancels
it, which is why it keeps progressing in the logs after the 'error').
Time out on STALL, not total runtime:
- 30 min with NO progress -> error ('stalled'); catches a genuinely hung scan.
- 10 min idle -> warning (repeats); unchanged heads-up.
- 24h absolute backstop, purely a runaway-loop guard.
- An actively-progressing scan keeps resetting the idle clock, so it never
times out no matter how many hours the whole library takes.
- Progress is judged on (processed, progress, current_item) so a slow stretch
where the rounded % holds steady (but the artist keeps changing) isn't a
false stall.
The decision is extracted into a pure, testable scan_wait_action(); both the
deep-scan and full-refresh handlers share the monitor loop, so both are fixed.
Tests: tests/test_scan_wait_action.py (9) — headline regression (5h/12h total
but progressing -> 'continue', not timeout), finished/stall-warn/stall-timeout/
abs-cap thresholds, and ordering. 280 automation tests still pass.
Defense-in-depth follow-up to #760. Even with the entrypoint chown fix, if the
album-bundle staging dir ever can't be created/written (permissions, read-only
mount, disk full), the dispatch caught the plugin exception and marked the whole
batch failed — even though the album had already downloaded (the #715 symptom:
'release finishes downloading but the batch fails').
Now an OSError from the plugin is flagged fallback-eligible, so the dispatch
returns to the per-track flow instead of hard-failing. OSError covers the
staging/filesystem failure that motivated this (#760's PermissionError) and, by
Python's IOError==OSError aliasing, any propagated transient I/O error —
falling back is never worse than hard-failing, and per-track is the universal
graceful path. Programming errors (TypeError, KeyError, RuntimeError, …) are
NOT OSError and stay terminal, so genuine bugs still fail loudly — the existing
'plugin exception => failure' contract and its test are preserved.
Test: new test_dispatch_staging_oserror_falls_back_to_per_track (PermissionError
on the staging dir -> result False, phase 'analysis', not failed). Existing
RuntimeError-is-terminal test still passes. 131 album-bundle/plugin tests green.
After an update, installs became unusable: the Amazon enrichment worker runs by
default, the default public T2Tunes proxy (t2tunes.site) was returning
503 'Amazon Music API is not initialized', and the worker treated every album
as an individual error -- logging an ERROR per item, churning network + DB
continuously across the whole library, and marking every row 'error' (a state
the retry tiers never re-attempt, so even after the proxy recovered nothing
re-enriched). The reporter couldn't reach the UI to turn it off.
Two-part fix:
1. Source-outage circuit breaker (core/amazon_outage.py, pure + tested):
- is_source_outage(exc) distinguishes a whole-source outage (HTTP 5xx,
'not initialized', connection failure, non-JSON error page) from a real
per-item miss (404, transient 400, etc.).
- On an outage the worker now leaves the item UNTOUCHED (so it's retried once
the proxy recovers instead of being permanently burned to 'error'), logs
ONCE per streak, and backs off with next_poll_delay_seconds() -- escalating
30s -> 60s -> ... capped at 30 min -- instead of grinding every 2s. It
auto-resumes the normal cadence the moment the source answers (success OR a
non-outage error both clear the streak).
- AmazonClientError now carries status_code so detection doesn't rely on
message parsing.
2. Opt-in by default (web_server.py): amazon_enrichment_paused now defaults to
True. Because enrichment depends on an external public proxy that can be
down, it stays paused unless the user explicitly enables it -- a proxy outage
can no longer take down installs that never opted in. (Behaviour change:
anyone on the old auto-on default is now paused; re-enable in Settings.)
Together: on update the worker is paused -> no flood -> UI accessible; opted-in
users are protected from future outages by the breaker.
Tests: tests/test_amazon_outage.py (12) pin the classifier across every error
surface (incl. the exact 503 'not initialized' case) and the back-off schedule
(monotonic, capped). 157 Amazon tests pass; lint clean.
Note: could not reproduce the exact 'UI fully unreachable' mechanism remotely
(WAL + 8 gthreads shouldn't hard-lock); the fix removes the flood/churn that is
the practical cause and defaults the feature off.
The Duplicate Detector's 'Keep Best' auto-selection ranked copies by highest
bitrate -> duration -> track number, with no notion of format. A FLAC whose
bitrate the library scan never populated (a common gap) therefore lost to a
282 kbps MP3: 282 > 0, so the MP3 was kept and the FLAC deleted (reported on
Havok 'Prepare For Attack', and again on Kendrick GNX).
Fix: rank by format/lossless tier FIRST, then bitrate, duration, track number.
A lossless file now always beats a lossy one regardless of the recorded
bitrate; bitrate/duration/track# only break ties within the same format.
- core/library/duplicate_keep.py (new): pure, importable pick_duplicate_to_keep
+ duplicate_keep_sort_key + format_rank_for_path (extension rank mirroring
auto_import_worker._quality_rank: flac=10 ... mp3=5 ... unknown=1).
- core/repair_worker.py: _fix_duplicates auto-pick now calls
pick_duplicate_to_keep instead of the bitrate-first max().
- webui/static/enrichment.js: the KEEP/REMOVE recommendation mirrors the same
format-first ranking so the badge matches what the backend will delete.
Parity: Python uses '.ext' keys (os.path.splitext), JS uses 'ext'
(split('.').pop()) -> identical results; both keep the first copy on a full
tie. Verified the only other dedup path (the standalone Duplicate Cleaner
automation, core/library/duplicate_cleaner.py) was already format-priority-first
and correct -- no change needed there.
Tests: tests/test_duplicate_keep.py (11 -- incl. the exact FLAC-with-missing-
bitrate vs 282 kbps MP3 case, format ranking, within-format tie-breakers, and
edge cases). 147 repair/duplicate tests still pass.
Note: why FLAC bitrate is NULL in the DB is a separate library-scan gap;
format-first ranking makes the keep decision correct regardless.
Follow-up to the preferred-art feature. Real test runs showed a source could
win on priority while handing back a small cover: Cover Art Archive is
volunteer-uploaded with no size floor, so CAA-first gave a 599x531 (Taylor
Swift) and a 600x600 (Kendrick GNX) -- front-1200 only caps the max, so a
~600px upload stays ~600px -- and Deezer/iTunes lower in the order never got a
turn.
Fix:
- Minimum-resolution guard: artwork._min_size_art_validator builds the
resolver's validate hook -- it fetches each candidate, caches the bytes (so
the winner isn't fetched twice), and accepts art only when its shortest side
>= metadata_enhancement.min_art_size (default 1000px; 0 disables). Art that's
too small is a miss, so the resolver falls through to the next source instead
of winning on priority. Unmeasurable images are accepted (don't over-reject;
fallback is still today's art). Wired into both embed_album_art_metadata and
download_cover_art.
- iTunes art upgraded to /3000x3000bb/ (was the 600px default) so it
contributes high-res when it wins.
- select_preferred_art_url gains a validate passthrough to the resolver.
- config default metadata_enhancement.min_art_size: 1000.
Effect: with an order like caa > deezer > spotify > itunes, a ~600px CAA upload
is now skipped and Deezer's ~1900px wins -- consistent big art. (Spotify art
often maxes ~640px, so it's skipped at the 1000 floor in favor of bigger
sources; lower min_art_size to ~640 to allow it.)
Tests: tests/metadata/test_art_min_size.py (6 -- incl. the real 599x531 and
600x600 cases, shortest-side logic, unmeasurable-accept, no-bytes-reject,
0-disables) + iTunes max-res upgrade test. Full metadata suite green (617).