mirror of https://github.com/Nezreka/SoulSync.git
main
dev
video
fix/disable-beatport-features
johnbaumb-discover-redesign
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
2.4.0
2.4.1
2.4.2
2.5.0
2.5.1
2.5.2
2.5.3
2.5.4
2.5.5
2.5.6
2.5.7
2.5.9
2.6.0
2.6.1
2.6.2
2.6.3
2.6.4
2.6.5
2.6.6
2.6.7
2.6.8
2.6.9
2.7.0
2.7.1
2.7.2
2.7.3
2.7.4
2.7.5
2.7.6
v0.65
${ noResults }
80 Commits (adbdda7b0eeecaaabce5f5b2fa52c026ff84400a)
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
3ea9da1cba |
Tagging: write source IDs too (Write Tags button + library re-tag now complete)
write_tags_to_file wrote the core fields + cover but never the source IDs (Spotify/iTunes/MusicBrainz) the import post-process embeds. Added a focused source.embed_known_source_ids() that writes ALREADY-KNOWN ids (from db_data) via the canonical, Picard-compatible frame writer the import uses (_write_embedded_metadata) — no API re-fetch, frames correct by construction. write_tags_to_file now calls it whenever db_data carries id keys. Fed from both paths: the enhanced-library 'Write Tags' button now carries the track's spotify/itunes/musicbrainz ids, and the Library Re-tag job stamps the matched album/track source ids onto each track. So both now write the full tag set, not a subset. |
3 weeks ago |
|
|
405b0988d6 |
Cover Art Filler: skip files that already have art (keep apply purely additive)
Verification found a non-additive edge: embed_album_art_metadata uses FLAC add_picture(), which APPENDS — so applying to an album where some tracks already had art would have added a duplicate embedded picture. The apply now checks each file and skips any that already carry art (shared _audio_has_art helper), so it only ever ADDS art to files missing it. Test covers the skip (no re-embed). |
3 weeks ago |
|
|
33965c7cbd |
Cover Art Filler: detect missing art ON DISK + actually write it to files
Previously the filler only flagged albums whose DB thumb_url was empty and, on apply, only updated that DB thumb_url — so albums whose files had no embedded art and no cover.jpg (but whose DB row had a URL) were never found, and even 'applying' art never touched the files. That's the reported 'doesn't scan all albums' gap. New core.metadata.art_apply (reuses the post-processing standard so the user's album_art_order is honored): - album_has_art_on_disk(): cheap-first check — folder cover.jpg/folder.jpg sidecar, then embedded art in a representative track (FLAC/ID3/MP4/Vorbis). - apply_art_to_album_files(): embeds via embed_album_art_metadata + writes cover.jpg via download_cover_art; only ADDS art (never rewrites the user's tags); read-only/unwritable files are skipped + counted, never crash. Scan now examines every titled album and flags it when art is missing in the DB OR on disk. Apply embeds into the album's audio files + writes cover.jpg in addition to the DB thumbnail (media-server-only albums fall back to DB-only). Tests cover sidecar/embedded detection, the cheap-first short-circuit, and the apply orchestration (embeds each file + cover.jpg; read-only failures counted). |
3 weeks ago |
|
|
f883e99feb |
Fix: MusicMap 404s miscounted as errors in similar-artists worker
The worker's WARNING observability proved the '38 errors' were almost all MusicMap returning 404 (artist has no map page) — a genuine not-found, not a fetch failure. But iter_musicmap_similar_artist_events flattened every RequestException to status_code 502, and the worker maps 400/404 -> not_found / everything-else -> error, so these inflated the error count. Surface the real HTTP status from the exception's response (404 stays 404), falling back to 502 only when there's no response (timeout/connection drop, which is correctly still an error eligible for retry). Regression tests: 404 -> 404 (not_found), timeout -> 502 (error), 500 stays error, plus an end-to-end worker check that a 404 result marks 'not_found' and stores nothing. |
3 weeks ago |
|
|
eaf74732f9 |
Canonical: fix ruff lint (B023 loop-bound lambda, S110 bare except-pass)
- B023: default_fetch_tracklist built a per-item lambda closing over the loop variable `it`. Replaced with a module-level _item_get(item, key, default) helper (takes the item as a param — no closure). Behavior unchanged; the dict/object normalization test still passes. - S110: the two best-effort guards in the canonical job (skip-already-pinned read, estimate_scope active-server read) now carry `# noqa: S110 — <reason>`, matching the repo's existing convention for intentional swallow-and-continue. ruff check passes on all canonical files + tests; 30 affected tests green. |
3 weeks ago |
|
|
2fcdfd3145 |
Canonical findings: include as much (free) data as possible
Per request, pack each finding with everything available WITHOUT extra API calls (kettui: reuse what's already fetched, read the album row we already loaded, degrade per-field, keep it tested): - Pinned release's track titles — already fetched during scoring, so free (capped at 60 to bound details_json). - From the album row (free): year, DB track count, total duration, genres-free context, and the album's currently-linked source IDs. - file_track_titles (your library's titles) for a side-by-side with the release. - Artist + album thumbs (artist via the guarded lookup) and names. _describe_pin now renders: "Artist — Album (year)", the fit breakdown, "Currently linked: … → pinning X", "Beat: <alternatives>", and the release tracklist — so the card is judge-able at a glance, and the structured fields are in details for a richer UI. NOT included (would cost an extra per-album API fetch, left as opt-in): the *release's* own year/type/cover/URL from get_album_for_source, vs the library's. Tests: _describe_pin rich-render (year/linked/tracklist), resolver release-titles, orchestration free-context fields. 94 canonical + reorganize regression pass. |
3 weeks ago |
|
|
03d099fb1d |
Canonical findings: add artist image (guarded, schema-safe)
Findings now carry artist_thumb_url alongside album_thumb_url (same key the track-repair findings use, so the findings UI already renders it). Fetched via a guarded _lookup_artist_thumb() — checks the artists table has a thumb_url column first and swallows any error — rather than adding ar.thumb_url to the shared load_album_and_tracks SELECT. The shared-loader approach was tried first and REVERTED: it crashed reorganize on schemas whose artists table has no thumb_url column (caught by 40 orchestrator tests). The lookup only runs for albums that actually resolve, so it adds no cost to the no-source-id short-circuit majority. Tests: orchestration test asserts artist_name + album_thumb_url + artist_thumb_url flow through. 47 canonical + 104 canonical/reorganize regression tests pass. |
3 weeks ago |
|
|
ec8091caad |
Canonical: richer, judge-able findings (the why behind a pin)
Live-run feedback: "Best-fit release: deezer (665666731), score 1.0" is too thin
to trust/accept. Each finding now explains WHY:
- score_release_detail() exposes the per-signal breakdown (count/duration/title)
instead of just the blended score.
- resolve_canonical_for_album returns an enriched result: the breakdown,
file_track_count vs release_track_count, and a `candidates` list of every
source it scored (so a finding can show what the winner beat).
- resolve_and_store adds album/artist/thumb context from the row it already
loaded (no extra query). Storage still only reads source/album_id/score.
- The job builds a real description via _describe_pin(), e.g.:
"Pin deezer release 665666731 (confidence 100%).
Fit to your library: 11 files vs 11 tracks on this release — track count
100%, durations 100%, titles 100%.
Beat: spotify 65% (17 tk)."
and a clearer title ("Pin deezer as canonical: <artist> — <album>").
Tests: resolver enrichment (breakdown + candidate comparison fields), and
_describe_pin (judge-able text incl. the beaten alternatives, and honest "n/a"
for a missing signal). 42 canonical tests pass.
Note: the description string carries the judge-able info regardless of UI; how
the findings tab renders the extra details keys (thumb image, candidates table)
is still UI-dependent and unverified.
|
3 weeks ago |
|
|
57e039e34d |
Canonical: make source selection a job setting (default active-preferred)
Feedback from the live dry-run: the job was pinning whichever source best fit the files regardless of which source it was, which was surprising — users expect it to respect their active metadata source. Made it a per-job setting instead of a baked-in policy. source_selection (default 'active_preferred'): - active_preferred — use the active/primary metadata source's release when the album has an ID for it AND it clears the score floor; otherwise fall back to the best-fit among the other sources. Respects the configured source but self-heals when that link is clearly broken (below floor / no ID). - active_only — only ever the active source; never considers others. - best_fit — previous behavior: whichever source matches the files best. resolve_canonical_for_album gains mode + primary_source; the orchestration threads the primary source through; the job reads source_selection from its settings. Note: active_preferred respects the active source as long as it clears the floor, so it will NOT override a deluxe-vs-standard mismatch on the primary (#767-Bug2) — that's what best_fit is for; the choice is now the user's. Tests: per-mode coverage in test_canonical_resolver.py (active_preferred uses primary when it fits, falls back when primary is below floor, keeps primary even when another fits better; active_only pins primary / never falls back; best_fit unchanged), orchestration default-mode test, and the setting default. 39 canonical tests pass. |
3 weeks ago |
|
|
f9271c0cd8 |
Canonical album version — backfill job (the opt-in activation)
The populate trigger that turns the (until now dormant) feature on. Until a user enables and runs this job, no album has a canonical -> both read sides (Stages 3-4) fall back -> zero behavior change. So the whole feature ships safely off. - core/repair_jobs/canonical_version_resolve.py — "Resolve Canonical Album Versions". Iterates the active server's albums, skips ones already pinned, and calls the tested resolve_and_store_canonical_for_album per album. Opt-in (default_enabled=False) and dry-run-by-default: resolving compares an album's candidate releases across sources (metadata-source API calls, once per album), so it's deliberately user-triggered. Dry run reports a finding per album it would pin; live mode stores. Registered in _JOB_MODULES. - core/metadata/canonical_resolver.py — resolve_and_store gains store=True; the job's dry run passes store=False to resolve-without-writing. Tests: tests/test_canonical_version_job.py (6) — registered, opt-in + dry-run defaults, live resolves+stores (auto_fixed), dry run creates findings without persisting, already-pinned albums skipped. Registry loads all 19 jobs cleanly. 145 tests across the full feature + reorganize/track-repair/DB regression pass. |
3 weeks ago |
|
|
43878b4d3d |
Canonical album version — Stage 2 (trigger): resolve+store orchestration
Completes Stage 2's populate path. Still dormant — no consumer calls it yet.
- resolve_and_store_canonical_for_album(db, album_id, ...): loads the album's
source IDs + its tracks' (duration_ms, title) from the DB via the SAME
loader the Reorganizer uses (load_album_and_tracks + _extract_source_ids), so
the canonical is chosen over exactly the source IDs the reorganizer sees;
scores off the DB track rows (the library's view of the files — no per-file
disk reads), resolves the best fit, and persists it. Returns the stored result
or None when unresolved.
- default_fetch_tracklist(): production fetcher wrapping
get_album_tracks_for_source, normalising to {title, track_number, duration_ms}
(duration best-effort; sec->ms; absent -> scorer leans on count+title).
Design note: chose LAZY resolution (Stages 3-4 consumers call this when they hit
an album with no canonical) over a standalone backfill repair job — no new
scheduling/UI surface, resolves only when a tool actually needs it, and stays
gated (NULL canonical = today's behavior).
Tests: tests/test_canonical_orchestration.py (5) — end-to-end on a real temp DB
(11 files pick the 11-track release over a 17-track deluxe and persist it),
no-source-ids -> None, missing-album -> None, and default_fetch_tracklist
normalization (dict items, seconds->ms) + failure -> None. All canonical +
DB-migration tests green.
|
3 weeks ago |
|
|
f37bc34082 |
Canonical album version — Stage 2 (core): resolver + persistence (dormant)
Turns the Stage-1 scorer into an end-to-end resolver + persists the result.
Still DORMANT — no consumer reads it yet, so zero behavior change.
- core/metadata/canonical_resolver.py — resolve_canonical_for_album(): builds
candidate releases from the album's per-source IDs (in source-priority order),
fetches each tracklist via an INJECTED fetch_tracklist (so it's unit-testable
without live APIs), scores them with pick_canonical_release, and returns the
best-fit {source, album_id, score}. Skips sources with no id / failed fetch;
returns None when there are no files, no candidates, or nothing clears the
confidence floor.
- database/music_database.py — set_album_canonical() / get_album_canonical()
write/read the Stage-1 columns. get returns None when unresolved, which every
consumer will treat as "fall back to today's behavior".
Tests: tests/test_canonical_resolver.py (7) — best-fit beats priority, priority
breaks true ties, skips missing-id/failed-fetch sources, None on
no-candidates/no-files/below-floor, score rounding. tests/test_canonical_db.py
(4) — set/get round-trip incl. timestamp, unresolved -> None, overwrite,
missing-album -> False. 34 canonical + DB-migration tests pass.
Remaining for Stage 2 (the trigger): read on-disk file durations/titles for an
album, gather its source IDs, call the resolver, store — wired via a backfill
repair job + an enrichment hook. Then Stages 3-4 wire the Reorganizer and Track
Number Repair to READ the pinned canonical.
|
3 weeks ago |
|
|
818c4f0bff |
Canonical album version — Stage 1: schema + pure scorer (dormant)
First stage of the canonical-album-version fix (#765 + #767-Bug2). Pins ONE canonical (source, album_id) per album, chosen by best-fit to the user's actual files, so the Reorganizer, Track Number Repair, and tagging stop re-resolving independently and contradicting each other. Ships DORMANT — nothing reads or writes the new data yet, so zero behavior change. Later stages populate (Stage 2) and consume (Stages 3-4) it. - core/metadata/canonical_version.py — pure scorer (the testable heart): score_release_against_files() rates a candidate release by track-count fit + duration alignment (greedy nearest within ±3s) + title overlap, dropping and renormalizing missing signals so it never crashes on sparse metadata. pick_canonical_release() takes candidates in source-priority order, picks the best fit, breaks ties toward the earlier (higher-priority) candidate so the choice is DETERMINISTIC — that determinism is what makes every tool agree (#765), while count/duration fit picks the right EDITION (#767-Bug2). A confidence floor (default 0.5) means a low-confidence guess is never pinned. - database/music_database.py — additive, nullable columns on albums (canonical_source / canonical_album_id / canonical_score / canonical_resolved_at), guarded by the existing PRAGMA-table_info pattern. NULL = unresolved = every consumer falls back to today's behavior. Tests: tests/test_canonical_version.py (11) — edition discrimination (11 files -> standard, 17 -> deluxe), deterministic priority tiebreak, duration disambiguation on count ties, graceful degradation (no durations / counts only / fuzzy titles), confidence floor, empty-input safety. tests/test_canonical_ columns_migration.py (4) — fresh DB has the columns, they're nullable w/ NULL default, migration is idempotent, and it ALTERs them onto an old albums table. 60 DB/schema regression tests still pass. |
3 weeks ago |
|
|
3dfec8a157 |
Fix #764: import no longer destroys embedded cover art
enhance_file_metadata rebuilds tags from scratch: for FLAC it calls
clear_pictures(), for MP3/MP4 it clears the whole tag block — and it does
this UP FRONT, then saves the file, long before it tries to fetch and embed
the replacement art. So every way the re-embed could come up empty left the
file saved with the original art destroyed and nothing put back:
- extract_source_metadata returns nothing -> early save, no embed
- no album-art URL / art download fails / rejected by the min-size guard
-> embed_album_art_metadata returns early without adding a picture
- art embedding disabled in config -> embed skipped entirely
- embed raises mid-enrichment -> file left cleared on disk
This is the "cover art gets corrupted/destroyed during import" half of #764
(continuation of #755); distinct from #750's truncated-cache DISPLAY bug.
Fix: new core/metadata/art_preservation.py snapshots the existing art
(the live Picture / APIC / MP4Cover objects, so they re-apply verbatim)
BEFORE the clear, and restores it before each save IFF the file currently
has none. Wired into all three exit paths in enhance_file_metadata
(no-metadata early return, the final save, and the except handler). The
restore is a strict no-op when art is already present, so the happy path —
new art embedded — is byte-for-byte unchanged: it never clobbers or
duplicates a freshly-embedded cover. embed_album_art_metadata now returns a
bool so the intent (embedded / didn't) is explicit.
Tests:
- tests/test_art_preservation.py (5) — snapshot/restore round-trips through
real mutagen FLAC + ID3 objects; restore no-ops when new art is present.
- tests/test_enrichment_art_preservation.py (4) — runs the REAL
enhance_file_metadata over a real FLAC with embedded art and asserts the
art survives on disk for missing-metadata / failed-embed / embed-raises,
and is correctly REPLACED (exactly one picture, new bytes) on success.
1019 tests pass across the metadata/enrichment/imports/acoustid suites.
|
3 weeks ago |
|
|
b202c176f7 |
Cover-art sources: skip low-res art (min-resolution guard) + max-res iTunes
Follow-up to the preferred-art feature. Real test runs showed a source could win on priority while handing back a small cover: Cover Art Archive is volunteer-uploaded with no size floor, so CAA-first gave a 599x531 (Taylor Swift) and a 600x600 (Kendrick GNX) -- front-1200 only caps the max, so a ~600px upload stays ~600px -- and Deezer/iTunes lower in the order never got a turn. Fix: - Minimum-resolution guard: artwork._min_size_art_validator builds the resolver's validate hook -- it fetches each candidate, caches the bytes (so the winner isn't fetched twice), and accepts art only when its shortest side >= metadata_enhancement.min_art_size (default 1000px; 0 disables). Art that's too small is a miss, so the resolver falls through to the next source instead of winning on priority. Unmeasurable images are accepted (don't over-reject; fallback is still today's art). Wired into both embed_album_art_metadata and download_cover_art. - iTunes art upgraded to /3000x3000bb/ (was the 600px default) so it contributes high-res when it wins. - select_preferred_art_url gains a validate passthrough to the resolver. - config default metadata_enhancement.min_art_size: 1000. Effect: with an order like caa > deezer > spotify > itunes, a ~600px CAA upload is now skipped and Deezer's ~1900px wins -- consistent big art. (Spotify art often maxes ~640px, so it's skipped at the 1000 floor in favor of bigger sources; lower min_art_size to ~640 to allow it.) Tests: tests/metadata/test_art_min_size.py (6 -- incl. the real 599x531 and 600x600 cases, shortest-side logic, unmeasurable-accept, no-bytes-reject, 0-disables) + iTunes max-res upgrade test. Full metadata suite green (617). |
3 weeks ago |
|
|
6bc2836f47 |
Feature: preferred album-art source selection (opt-in, ordered, with fallback)
Lets users pick which providers' cover art to use and in what priority, generalizing the single prefer_caa_art toggle into an ordered, mix-and-match list (Sokhi's request). Fully opt-in: default album_art_order is [], so every existing install is byte-for-byte unchanged until the user enables sources. How it works: - Per album, walk the user's ordered sources top-to-bottom; the first source that actually has THIS album's cover wins. A miss falls through to the next; if all miss, the download's own art is kept (today's default). The worst case is always exactly the cover you'd get today -- never wrong art, never an error into the download. - Connection-gated: a source is only tried when the user is connected to it (free sources CAA/Deezer/iTunes/AudioDB always; Spotify only when authenticated). Tidal/Qobuz/HiFi deferred (cover-URL construction + no clean core accessor -- not shipping unverified extraction). - Album-match validated: a source's art is used only when the album it returns matches the requested artist+album (significant-token subset, tolerant of Deluxe/Remastered/articles/feat./multi-artist). A loose top search hit for a different record is treated as a miss -> guarantees no wrong-album art. - The list supersedes the legacy prefer_caa_art toggle: when album_art_order is non-empty it is the sole authority (add 'caa' to the list to use Cover Art Archive), and prefer_caa_art is neutralized for both the embedded-tag art and cover.jpg paths. With an empty list, prefer_caa_art behaves exactly as before. Implementation: - core/metadata/art_sources.py: pure resolver -- effective_art_order (config + legacy back-compat) and resolve_cover_art (ordered walk + fallback, exception-safe per source). No network/config/DB; fully unit-testable. - core/metadata/art_lookup.py: availability gating, per-source lookups against existing clients (Deezer/iTunes/AudioDB/Spotify search + CAA via MBID), album-match validation, per-album caching, and select_preferred_art_url -- the single gate the pipeline calls (no-op unless an explicit list is set). - core/metadata/artwork.py: wired into embed_album_art_metadata and download_cover_art, gated so no configured list == current behavior. - web_server.py: GET /api/metadata/art-sources (connected sources only). - config/settings.py: default album_art_order: []. - webui (index.html + settings.js): reorderable list in Core Features reusing the hybrid-source-list pattern + real service logos (with emoji fallback); load/save wired through the existing metadata_enhancement settings flow. loadArtSourceOrder populates the saved order synchronously (filtered to known sources, not availability) so a save before the availability fetch resolves, or a temporarily-disconnected source, can never wipe the saved order. Tests: 40 unit/seam tests (resolver ordering/fallback/back-compat, availability, per-source extraction, album-match validation incl. wrong-album/wrong-artist rejection, caching, exception-safety, the off-by-default gate). Full metadata suite still green (610 passed) -- the gated integration changes nothing when no list is configured. Note: the settings UI (DOM-heavy, not unit-testable in the JS harness) and the live per-source art-fetch quality are validated by manual testing. |
3 weeks ago |
|
|
bb2241498f |
Metadata cache: hard LRU row cap to stop unbounded growth (7.6GB incident)
Investigation (not assumption): the cache's TTL eviction + junk cleanup ARE correct and DO run automatically every 6h (CacheEvictorJob, auto_fix=True). The real gap is there's NO SIZE CEILING — TTL-only eviction means 'how big can it get' = 'however much you fetch within the 30-day window', so heavy discovery/enrichment legitimately grew metadata_cache_entities to ~1.8M rows / 7.6 GB, bloating the main DB (a factor in the corruption incident). Fix — add a bounded LRU cap: - entities_to_evict_for_capacity(total, max_rows): pure decision fn (cap<=0 disables), unit-testable like core.db_integrity.prune_backups. - MetadataCache.evict_over_capacity(): deletes the least-recently-ACCESSED rows (uses the already-stored last_accessed_at; NULL = never-touched = evicted first) down to the ceiling. Default 250k rows, tunable. - Wired as Phase 5 of CacheEvictorJob — runs LAST, after TTL/junk/orphan/null cleanup, so it only trims a still-oversized HEALTHY cache. Verified safe to bound/wipe: audited every cache reader (get_entity/ get_entities_batch/get_search_results/get_entity_detail/browse) — all degrade to None/[]/empty on miss, treated as 'go fetch'. Nothing depends on a row existing, so eviction can't break callers. Tests: tests/metadata/test_cache_capacity_eviction.py (8) — pure-fn coverage + real temp-DB proof that it drops the LRU rows specifically (not arbitrary) and NULL-access rows go first. 18 adjacent cache tests still green; ruff clean. Follow-ups (separate phases, scoped): (2) move the cache to its own bounded metadata_cache.sqlite3 (no JOINs to library tables — confirmed clean to split; invalidate-and-rebuild rather than migrate the 7.6GB), (3) kill the raw_json + 22-extracted-column double storage. |
3 weeks ago |
|
|
560156abee |
Fix: import overwrites album-artist tag to "Unknown Artist" (#735)
Reported by CubeComming: importing media keeps the track artist correct
(e.g. Billie Eilish) but changes the album-artist tag ("Albuminterpret") to
"Unknown Artist", breaking grouping in the media server.
Cause: in extract_source_metadata (core/metadata/source.py), album_artist is
seeded from the resolved track artist, then overridden by the album CONTEXT's
first artist. When the album lookup comes back unresolved, that first artist is
the literal "Unknown Artist" placeholder — which is truthy, so it clobbered the
real artist.
Fix: treat "Unknown Artist" (and empty) as a non-value — only let the album
context override the album_artist when it names a real artist. A genuine album
artist (e.g. "Various Artists") still overrides as before.
Tests: tests/metadata/test_album_artist_unknown.py — placeholder doesn't
clobber, real album artist still used, no-album-context falls back to track
artist, empty doesn't clobber. (Pre-existing test_album_mbid_cache.py failures
are an unrelated sandbox DB disk-I/O issue.)
|
4 weeks ago |
|
|
abdea631a7 |
HiFi/MB cover art: use CAA 1200px thumbnail, not the flaky /front original
Follow-up to the album-art resolution fix. That change upgraded MusicBrainz
Cover Art Archive thumbnails (/front-250) to the bare /front original — but
/front redirects to archive.org, which is unreliable: probing release-group
covers showed intermittent HTTP 500s (same URL 500s one second, serves the
next) and multi-MB originals (2.9 MB seen). The result was the user-reported
flakiness: cover art that "sometimes works, sometimes shows nothing", and a
huge image embedded into every track when it did work.
The sized thumbnails (/front-250, -500, -1200) are served by CAA's own CDN,
not the archive.org redirect — which is why /front-250 (240p) was always
reliable. Upgrade to /front-1200 instead: 1200x1200 is a massive jump from
240p, reliably CDN-served, and a sane ~40 KB instead of multi-MB.
Applied in all three CAA spots for consistency: the _upgrade_art_url helper
(embed + cover.jpg paths) and both prefer_caa ("CCA") blocks, which fetched
the bare /front directly with no fallback — so CCA-on users hit the same
flakiness. _fetch_art_bytes still falls back to the original /front-250 if
/front-1200 is ever refused.
Tests updated to assert the 1200px target, idempotency, and that the bare
/front original is intentionally left untouched.
|
4 weeks ago |
|
|
c9ad4f496f |
Embed highest-resolution album art across all art paths
User report: embedded album art came out ~600x600 while the cover.jpg in
the folder was high-res. The cover.jpg path upgraded the source CDN URL
to its highest resolution, but the tag-embed path fetched the raw URL —
so iTunes art embedded at its 600x600 default, Spotify at 640, Deezer at
1000. The "Write Tags to File" retag path had the same gap (Deezer-only
upgrade), and MusicBrainz art was worse still: every Cover Art Archive
URL is built as the /front-250 thumbnail, so MB-sourced downloads
embedded 250x250.
Factor the resolution upgrade + fetch into two shared helpers in
core/metadata/artwork.py and route every art path through them:
_upgrade_art_url(url) — bump to the source's highest resolution:
- Spotify (i.scdn.co) -> original master (~2000px+)
- iTunes (mzstatic.com) -> 3000x3000
- Deezer (dzcdn) -> 1900x1900
- Cover Art Archive -> /front original (was /front-250)
_fetch_art_bytes(url) — upgrade, fetch, and fall back once to the
original size if the CDN refuses the larger one (non-regressive).
Now consistent across: embed-into-tags (post-process), folder cover.jpg
(post-process), and the enhanced-library "Write Tags to File" retag flow.
The YouTube path already upgraded via Album.from_spotify_album, unchanged.
De-duplicates the per-source upgrade code that was copied across sites
and drops the now-unused urllib import from tag_writer.
Not covered (follow-up): Last.fm / Amazon / Tidal / Qobuz have no
explicit upgrade yet — some already serve full-res, others may hand over
a capped size that passes through unchanged.
Tests: new tests/metadata/test_artwork_resolution.py pins every upgrade
(Spotify 300/640->master, iTunes 100/600->3000, Deezer->1900, CAA
thumbnail->original, unrecognized/empty unchanged) and the fetch
fallback. Updated the two tag_writer fallback tests to patch the network
at its new home in artwork.
|
4 weeks ago |
|
|
6125ef8834 |
MB rerank: prefer_known_duration is now a score boost, not a tiebreaker
Live smoke against `/api/musicbrainz/search_tracks?track=Coffee+Break&artist=Zeds+Dead` exposed the edge case the tiebreaker implementation couldn't reach: The canonical Zeds Dead "Coffee Break" recording (mbid 6e2d4a70, length 184000ms) lives on the Coffee Break Single release — album_type='single', which carries a 0.85 album_type_weight in `score_track`. A sibling length-less recording (mbid 3b89bf3c) lives on an Album release — album_type='album', weight 1.0. After multiplying by EXACT_ARTIST_BOOST the canonical sat at 1.275 while the length-less sibling sat at 1.5. The previous tiebreaker only kicked in on equal scores, so the length-less album edition wins and the user sees 0:00 first instead of the actionable 3:04 row. Bug reproduced: ordering came out length-less / canonical / Omar-LinX-collab. Switched `prefer_known_duration` to a 1.25x score boost on recordings with non-zero duration_ms. The multiplier is sized above the album-vs-single weight spread (0.176) so length-known recordings can overcome an album-type penalty when scores would otherwise tie on title + artist match, but stays small enough that cover/karaoke penalty (0.05) and variant-tag penalty (0.85) still dominate — a length-known tribute still loses to a length-less canonical. Post-fix live response: 6e2d4a70 (canonical, 184000ms) sits first, 8ec2ce3f (Zeds Dead + Omar LinX collab, 153000ms) second, 3b89bf3c (length-less album edition) third. Verified Björk diacritic fallback path unaffected — `Bjork` + `Army of Me` still cascades strict-empty → bare and returns all 10 Björk recordings. 122 metadata tests pass — the three `prefer_known_duration` cases were designed to pin behaviour, not the specific multiplier value, so they all still pass under the boost implementation: ties promote length-known, relevance still beats length-pref, default-off behaviour unchanged. |
4 weeks ago |
|
|
8dbbf13c61 |
Branch cleanup: lift manual-match helpers, fix length-pref ordering, profile-scope view toggle
Self-review pass on the prior three commits — kettui-style cleanup
that should have landed first time.
**Length-preference sort ordering (real bug):**
The `search_tracks_with_artist` stable sort that promoted length-known
recordings ran in `core/musicbrainz_search.py`, but the MB endpoint in
`web_server.py:search_musicbrainz_tracks` runs `rerank_tracks` after
it — which re-sorts by relevance score and dropped the length-pref
ordering down to tiebreaker-only. For canonical-same-song MB duplicates
that all score identically the tiebreaker survived, but the
order-of-operations was wrong.
Moved into `rerank_tracks` itself via a new `prefer_known_duration`
flag. Sort key sits between relevance score and the stable-order
tiebreaker so relevance still wins (length only decides ties, never
overrides a higher-relevance match). The MB endpoint opts in via
`prefer_known_duration=True`; Spotify / iTunes / Deezer callers stay
on the default-off path since their search results always include
length. Pinned with three new `TestRerankTracks` cases:
ties-promote-length, relevance-still-wins, default-off-unchanged.
**Route logic lifted to `core/discovery/manual_match.py`:**
Two pieces lived as inline route logic in `web_server.py` — the
`derive_manual_match_provider` fallback chain (payload.source →
active source → 'spotify') used by `update_youtube_discovery_match`,
and the `is_drifted_for_redo` predicate (cached provider differs from
active AND not manual_match) used by `prepare_mirrored_discovery`.
Per kettui's "extract logic from web_server.py, don't AST-parse it"
standard, both helpers now live in `core/discovery/manual_match.py`
with 12 dedicated unit tests covering fallback resolution order,
non-dict payload defenses, manual_match exemption from drift,
absent-provider legacy default, and edge cases.
Side benefits from the lift:
- `match_source` now derived once before the cache-save try block
instead of being duplicated in try + except (the except block existed
only because the original used `match_source` later — pre-computing
killed the duplication).
- `prepare_mirrored_discovery`'s `has_cached` check now reuses
`is_drifted_for_redo` with inverted polarity instead of restating
the field whitelist inline, so a future schema change only has to
land in one place.
- The mirrored-DB persist block now gates on `matched_data is not None`
to avoid a pre-existing latent NameError if the cache-save block
raised before matched_data construction.
**Enhanced toggle localStorage key now profile-scoped:**
`soulsync-library-view-mode` was global — two admin profiles would
share one preference. Wrapped in `_libraryViewModeKey()` which appends
`:${currentProfile.id}` when a profile is loaded, falls back to the
unsuffixed key otherwise (preserves pre-multi-profile saved values).
Tests:
- 12 new in `tests/discovery/test_manual_match.py` pinning both helpers.
- 3 new in `tests/metadata/test_relevance.py` pinning the
`prefer_known_duration` semantics.
- `test_search_tracks_with_artist_prefers_results_with_known_length`
renamed to `_does_not_resort_by_length` since the sort moved out of
this method. 664 tests pass across discovery + metadata suites.
|
4 weeks ago |
|
|
4ca3f70bf3 |
Show MusicBrainz release variants in import
Expand matched MusicBrainz release groups into concrete releases for specific album searches so import users can choose the correct edition by track count, format, country, and disambiguation. Preserve distinct MusicBrainz release IDs instead of deduping same-title variants, carry release metadata through import matching, and surface those details on album result cards. Add coverage for variant preservation and release-group expansion. |
1 month ago |
|
|
b9af4ef4ef |
Handle transient SQLite IO during maintenance
Keep full refresh moving when post-clear VACUUM hits a transient disk I/O error, and retry clear_server_data once when the clear step itself sees the same transient SQLite failure. Retry metadata cache maintenance writes once on transient disk I/O errors so first-attempt cache jobs do not fail when an immediate retry would succeed. Tests cover best-effort VACUUM, clear retry behavior, and cache maintenance retry behavior. |
1 month ago |
|
|
136d665c8a |
feat(webui): cache artwork images on disk
Add a disk-backed image cache with hashed browser URLs, SQLite metadata, size/type validation, stale fallback, and per-image fetch locking. Route normalized artwork through /api/image-cache while keeping /api/image-proxy as a compatibility shim, and align browser max-age with the image cache TTL. Add focused tests for cache behavior and image URL normalization. |
1 month ago |
|
|
987409508b |
fix(metadata): surface MusicBrainz 'Other' release-groups in discography (#650)
S-Bryce reported that for some artists (Vocaloid producers, JP indie
acts, niche Western indie) the artist detail page was missing whole
release-groups visible on musicbrainz.org. Downloaded tracks from
those release-groups appeared in artist track counts but were not
bound to any visible album / single card — orphan "ghost" tracks the
user couldn't browse to.
Two duplicated bugs fed each other:
1. `core/musicbrainz_search.py` browsed MB release-groups with
`release_types=['album', 'ep', 'single']`. MB's primary-type
vocabulary is {Album, Single, EP, Broadcast, Other} — music
videos, one-off web releases, and broadcast singles use Other.
Pre-fix the filter dropped them at the API layer.
2. Three sites duplicated the same "raw primary-type → internal
album_type" mapping with slightly different vocabularies and all
silently defaulted unknown values (including 'Other') to 'album':
core/musicbrainz_search.py `_map_release_type`
core/metadata/types.py inline `{single:single, ep:ep}.get(...)`
core/metadata/cache.py Deezer-specific record_type guard
Letting Other through the filter without a real mapper would have
placed music videos in the Albums view alongside LPs — visually
misleading.
Fix shape:
- New `core/metadata/release_type.py` — single canonical mapper
consumed by every provider's raw→Album projection. Knows the full
MB vocabulary including 'other' and 'broadcast'; routes both into
the singles bucket since they're functionally single-track
releases. Compilation secondary-type override preserved (MB's
canonical Greatest-Hits pattern is `primary=Album,
secondary=[Compilation]`).
- `core/musicbrainz_search.py` `_map_release_type` becomes a thin
alias for the new helper so the six internal call sites stay
intact. API filter gains 'other'.
- `core/metadata/types.py` Album projection drops its inline mini-
mapper and calls the canonical helper. Now also handles the
compilation secondary-type override it was previously missing.
- The Deezer-specific cache.py guard stays as-is — Deezer's
record_type vocabulary is closed (album|single|ep), not affected
by this issue.
Verified end-to-end against MB for S-Bryce's artist (`46196b9c-affa-
4616-b53b-e967c8bd70e0`, inabakumori): pre-fix returned 22 release-
groups; post-fix returns 27, with the 5 extra all landing in the
Singles section with album_type='single' as intended.
23 new unit tests pin the mapper contract (case-insensitive primary
types, compilation secondary override, Other/Broadcast → single,
unknown → album default preserved, defensive empty/None inputs).
2 new tests in test_musicbrainz_search pin the API filter inclusion
of 'other' and the round-trip into the Singles bucket. All 516
existing metadata tests still green — refactor leaves historical
behaviour for {album, ep, single, compilation} unchanged.
|
1 month ago |
|
|
5bc5fbb662 |
Add MusicBrainz as a metadata source
Register MusicBrainz as a first-class metadata source alongside Deezer, iTunes, Spotify, Discogs, and Hydrabase. Expose the shared client through metadata services, add the settings option, and expand the MusicBrainz search adapter with source-compatible artist, album, track, and detail methods. Carry MusicBrainz IDs through similar-artist discovery, recommended artists, artist map serialization, and personalized playlist selection. Update DB migrations and lookup filters so similar_artist_musicbrainz_id is preserved on older schemas and used for source requirements and library exclusion. Normalize MusicBrainz album adapter output for import context and add regression coverage for registry mapping, typed album conversion, and similar-artist filtering. Verified by user with 120 focused tests passing. |
1 month ago |
|
|
54dbd150cb |
Preserve full release dates in audio tags
|
1 month ago |
|
|
025007b97f |
Tighten artist discography soundtrack matching
|
1 month ago |
|
|
42a833fcb2 |
Amazon Music: UI badges, enrichment match chips, watchlist linking, metadata cache
- Artist cards, hero section, and enhanced view now show Amazon Music badges
when amazon_id is populated (AMAZON_LOGO_URL constant, orange #FF9900 brand)
- Enhanced view artist and album match status rows include amazon_match_status
chip with click-to-rematch via openManualMatchModal
- getServiceUrl: added amazon (album/track ASIN → music.amazon.com) and fixed
missing discogs entries; serviceLabels adds tidal/qobuz/amazon
- Enhanced view enhanced-artist-id-badges includes amazon_id entry
- DB SELECTs for library artists list and artist detail now return amazon_id;
both response dicts include the field
- watchlist_artists migration adds amazon_artist_id column
- Watchlist config GET: amazon_artist_id in SELECT/WHERE/response (index 18)
- Watchlist artists list response includes amazon_artist_id
- link-provider endpoint: amazon added to valid_providers and col_map
- _populateLinkedProviderSection: amazonId param + Amazon Music source row
- Watchlist card source badges render Amazon pill (watchlist-source-amazon CSS)
- _openSourceSearch labels map includes amazon
- service_search: amazon_worker injected via init(); _search_service amazon branch
uses search_artists/albums/tracks, same {id,name,image,extra} return shape
- _SERVICE_ID_COLUMNS: amazon → amazon_id for artist/album/track
- _init_service_search call passes amazon_worker_obj
- amazon_client._fetch_album_metas: 5-minute TTL cache per ASIN — cached hits
skip _rate_limit() and HTTP call entirely; fixes ~10s artist detail load
- registry.py: removed amazon from METADATA_SOURCE_PRIORITY and
METADATA_SOURCE_LABELS — T2Tunes has no discography API, cannot serve as a
primary metadata source; Amazon remains a download source + ASIN enricher
- Settings metadata source dropdown and help text updated accordingly
|
1 month ago |
|
|
1f579cede8 |
Add Amazon Music as a primary metadata source
Wires AmazonClient into the metadata source registry following the exact same pattern as DeezerClient. No existing source paths touched. - Add get_album_metadata / get_artist_info / get_artist_albums_list aliases to AmazonClient (mirrors DeezerClient interface aliases) - Register amazon in METADATA_SOURCE_PRIORITY and METADATA_SOURCE_LABELS - Add _get_amazon_factory() + get_amazon_client() to registry.py - Add amazon branch to get_client_for_source(); thread amazon_client_factory kwarg through get_primary_client() and get_primary_source_status() - Re-export get_amazon_client from the core.metadata_service shim - Add Amazon Music option to Settings metadata source dropdown - 3530 tests pass |
1 month ago |
|
|
d9529fc801 |
Token leak round 2: artist endpoint + playlist sync + URL-encoded redaction
The first token-leak fix scrubbed the artwork URL fixer's own log
calls. This catches three more sites that ALSO leaked tokens, plus
one upstream gap that let URL-encoded tokens slip through the
redactor.
Three sites in `web_server.py` (artist endpoint at line 8765-8773):
- "Artist image before fix: '...'" -- logged the raw image_url with
the auth token in plain form.
- "Artist image after fix: '...'" -- logged the URL-encoded form
after it had been wrapped in the image proxy
(`/api/image-proxy?url=<percent-encoded-token>`).
- "Final artist data being sent: {...}" -- dumped the entire
artist_info dict on every render, including the image_url field.
All three were dev-time debug noise. Removed entirely. The "No
artist image URL found" warning at line 8770 stays (no URL, just
the artist name).
One site in `core/discovery/sync.py:402`:
- "[PLAYLIST IMAGE] image_url=..." -- logged the playlist poster URL
during sync. Same auth-token leak risk for Plex / Jellyfin
playlists. Changed to log only `has_image=True/False`.
Upstream gap in `_redact_url_secrets`:
- The original regex only matched plain query params (`?key=value`).
When an auth-bearing URL gets wrapped inside another URL's query
string (our `/api/image-proxy?url=<encoded>` flow) the auth params
end up percent-encoded -- `%3FX-Plex-Token%3D...` -- and slipped
through.
- New second pattern catches the URL-encoded form. Both passes run
on every redact call; idempotent.
Verified manually:
/api/image-proxy?url=...%3FX-Plex-Token%3DABC...
-> /api/image-proxy?url=...%3FX-Plex-Token%3D***REDACTED***
6 artwork tests pass.
|
1 month ago |
|
|
2fe1926074 |
Stop leaking Plex / Jellyfin / Navidrome tokens into app.log
The artwork URL normalizer was logging the full constructed media- server URL on every cover-art lookup at INFO level, including the auth query params (X-Plex-Token / X-Emby-Token / Subsonic t+s+p). Those lines pile up in app.log on disk -- anyone with read access to the log file gains full read access to the user's media server. Also dropped the noisy per-call "Plex/Jellyfin/Navidrome config - base_url: ..., token: ..." INFO lines that fired on every thumbnail. Even the truncated `token[:10]` form is enough partial-known-plaintext to be uncomfortable to leak. - New `_redact_url_secrets` helper masks the values of X-Plex-Token, X-Emby-Token, api_key, apikey, Subsonic t / s / p, generic token / password query params. Regex anchored on `?` or `&` boundary so short keys like `t` don't false-match inside `format=Jpg`. - "Fixed URL: ..." log calls moved from INFO to DEBUG so they don't persist by default, and the URL passed in is run through the redactor first. - Per-call "Plex config - ..." / "Jellyfin config - ..." / "Navidrome config - ..." INFO lines removed entirely. Config inspection has dedicated UI; per-thumbnail spam belongs to no one. - Error-path logging (line 149) also routed through the redactor in case the failing URL had auth params attached. Users with existing app.log files containing the leaked tokens should rotate / wipe the log. Plex tokens can be regenerated by signing out of all devices in Plex settings; Jellyfin api_keys can be revoked from the dashboard; Navidrome users should rotate the account password. |
1 month ago |
|
|
30f017d1f0 |
Stop writing TRCK as "6/0" when album total_tracks is unknown
Discord report (netti93): downloaded album tracks were tagged with
TRCK = "6/0" instead of "6/13" when source data was incomplete. The
retag tool wrote correct "6/13" because core/tag_writer.py already
handled the case.
Trace: core/metadata/enrichment.py:105 formatted unconditionally as
f"{track_number}/{total_tracks}" and many album-dict construction
sites pass total_tracks: 0 (per types.py, 0 means "unknown" — not a
real count). That 0 propagated straight to disk.
Fix at the consumer boundary so every album-dict constructor stays
unchanged. Lifted to pure helper
core/metadata/track_number_format.py:format_track_number_tag that
drops the /N suffix when total is 0 / None / negative — emits just
"6" instead. Matches retag's behavior + ID3 spec convention (TRCK
can be "N" or "N/M"). MP4 trkn tuple gets the same treatment via
format_track_number_tuple returning (6, 0) per spec's "unknown
total" marker.
Wired into all three format-write sites in enrichment.py: ID3 (TRCK),
Vorbis (tracknumber), MP4 (trkn). When source data has correct
total_tracks (album downloads via the metadata-source pipeline,
retag flow), behavior unchanged — still writes "6/13".
16 boundary tests pin every shape: known total / zero total / none
total / none track / zero track / negative inputs / string coercion
/ unparseable strings / floats truncate.
Full suite: 3113 passed.
|
1 month ago |
|
|
0769fcd5cc |
Fix Soulseek downloads losing collab artist tags
Soulseek matched-download contexts populate `original_search_result` with `artist` (singular string) and no `artists` list — the full multi-artist array lives on `track_info` (the matched Spotify track object). `extract_source_metadata` only read `original_search.artists`, so the Soulseek path always fell through to the single-artist branch and TPE1 ended up with the primary artist only. Deezer-direct downloads were unaffected because their context populates `original_search.artists` as a proper list. Lifted artist resolution into a pure helper `core/metadata/artist_resolution.py:resolve_track_artists` that walks `original_search.artists` → `track_info.artists` → `artist_dict.name` fallback chain. Normalizes mixed list-item shapes (Spotify-style dicts, bare strings, anything else stringified) and drops empty entries. 13 new tests pin the resolution order, fallback chain, mixed-shape normalization, whitespace stripping, and empty/none handling. The existing `_artists_list` no-fall-through test in `test_multi_artist_tag_settings.py` was updated to reflect the new contract (always populated; multi-value write still gated on `len > 1`) plus a new regression test for the Soulseek shape. Composes with the existing Deezer per-track upgrade (still fires when single-artist + track_id available) and feat_in_title / artist_separator settings (still drive the joined ARTIST string downstream). |
1 month ago |
|
|
c77aa61fdf
|
Merge pull request #530 from dlynas/feat/explicit-badges
feat: add explicit badges to discography modal and artist-detail cards |
1 month ago |
|
|
5eae24b8bb |
Fix $albumtype defaulting to album for non-Spotify sources
- legacy duck-typed builder only checked the `album_type` key; deezer uses `record_type`, tidal uses `type` (uppercase), some flattened musicbrainz shapes use `primary-type` — all defaulted to album, so EPs and singles ended up filed under Album/ in user templates that reference $albumtype - widen lookup to album_type / record_type / type / primary-type and route through new pure `_normalize_album_type` helper that case-folds + validates against the canonical token set (album / single / ep / compilation), unknown → album - typed-converter path (spotify / deezer / itunes / discogs / mb / hydrabase / qobuz) unchanged — those were already correct Discord report (CAL). |
1 month ago |
|
|
42bee21c9f |
feat: add explicit badges to discography modal and artist-detail cards
Adds an explicit field to the Album dataclass in core/metadata/types.py
and the client-level Album dataclasses in deezer_client.py,
itunes_client.py, and hydrabase_client.py (the legacy discography path
reads from client objects, not typed dicts).
Deezer extracts explicit_lyrics (int→bool), iTunes extracts
collectionExplicitness ('explicit' string), Hydrabase forwards the
explicit field from the server response. Spotify, Discogs, MusicBrainz,
Qobuz, and Tidal have no explicit signal and stay None.
The flag threads through both builder functions in discography.py and
renders as a small "E" badge next to explicit titles in the discography
download modal and artist-detail page cards.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
1 month ago |
|
|
4892baf8d4 |
Skip already-owned tracks during download discography
- new track_already_owned helper wraps db.check_track_exists at the same confidence threshold the discography backfill repair job uses (0.7) — name+artist+album, format-agnostic so blasphemy-mode libraries (flac → mp3 + delete original) match correctly - endpoint runs the check after the artist + content-type filters and before add_to_wishlist, so a second discography click on the same artist no longer re-queues every track that already downloaded - per-album response carries a new tracks_skipped_owned counter alongside the existing artist/content/wishlist skip categories Discord report (Skowl). |
1 month ago |
|
|
d4ad5bf57f |
Filter cross-artist + content-type tracks during download discography
- drop tracks where the requested artist isn't named in track.artists (keeps features, drops compilation / appears_on contamination) - honor watchlist.global_include_live/remixes/acoustic/instrumentals the same way the discography backfill repair job already does - surface per-album skip counts in the ndjson stream (artist mismatch + content filter) so the ui can show what was filtered Closes #559. |
1 month ago |
|
|
d5de724f9b |
Multi-artist Deezer upgrade + double-append guard hardening
Two follow-ups to the multi-artist tag settings PR:
1. Deezer contributors upgrade — closes the "known limitation"
flagged in the prior commit. Deezer's `/search` endpoint only
returns the primary artist for each track; the full contributors
array (feat., remix collaborators, producers credited as artists)
lives on `/track/<id>` and gets parsed by `_build_enhanced_track`.
Without the upgrade Deezer-sourced tracks never got multi-artist
tags even with the right settings on.
Fix in `core/metadata/source.py`: when source==deezer AND the
search response had a single artist AND a track_id is available,
fetch full track details via `get_deezer_client().get_track_details`
and replace `all_artists` with the upgraded list.
- One extra API call per affected Deezer track
- Skipped when search already returned multiple (no-op fast path)
- Skipped for non-Deezer sources (Spotify/Tidal/iTunes search
responses already include all artists)
- Skipped when no track_id is available
- Defensive try/except: on /track/<id> failure (network error,
deezer client unavailable), fall through to the search-result
list — never lose the data we already had
2. Double-append guard hardened with a word-boundary regex.
Prior commit checked for `"feat." not in title.lower() and "(ft."
not in title.lower()` — too narrow. Source platforms produce
wildly different feat-marker conventions: "(feat. X)", "(Feat X)",
"(FEAT X)", "(Featuring X)", "[feat. X]", "ft. X" (no parens),
"FT. X", etc. Any of these as the SOURCE title would cause a
double-append: `"Track (Feat X) (feat. Y)"`.
Replaced with `re.search(r'\b(?:feat|feat\.|featuring|ft|ft\.)\b',
title, IGNORECASE)`. Word-boundary regex catches every common
variant. Substring matches like "Aftermath" containing `ft`
correctly fall through to the append path (pinned by a regression
test).
16 new tests (29 total in the file):
- 9 parametrized variants of the double-append guard
- 1 substring guard ("Aftermath")
- 6 Deezer upgrade scenarios (fires when expected, doesn't fire
for non-Deezer / multi-artist search / no track_id, defensive
fall-through on failure, no false-positive when /track/<id>
confirms single artist)
Full pytest 2727 passed.
|
1 month ago |
|
|
c11a5b7eab |
Multi-artist tag settings: implement artist_separator + feat_in_title + populate _artists_list
Three settings on Settings → Metadata → Tags were partially or
completely unimplemented. Reporter (Netti93) traced each one.
(1) `write_multi_artist` only "worked" because of a never-populated
`_artists_list` field. `core/metadata/source.py` built
`metadata["artist"]` as a hardcoded ", "-joined string but never
assigned `metadata["_artists_list"]`. `core/metadata/enrichment.py`
line 107 reads that field and gates the multi-value tag write
on `len(_artists_list) > 1` — always saw an empty list, silently
no-op'd the write.
(2) `artist_separator` (default ", ") was referenced in the UI +
settings.js save path but ZERO Python code read the value. Every
multi-artist track ended up with hardcoded ", " regardless of
what the user picked.
(3) `feat_in_title` (when true: pull featured artists into the title
as " (feat. X, Y)" and leave only primary in the ARTIST tag —
Picard convention) had no implementation at all.
Fix in source.py:
* Populate `_artists_list` from the search response's artists array
* Read `feat_in_title` and `artist_separator` configs
* When `feat_in_title=True` and >1 artist: ARTIST = primary only,
append "(feat. X, Y)" to title with double-append guard
* Else: ARTIST = artists joined with `artist_separator`
* Single-artist case unaffected by either setting
Double-append guard uses a word-boundary regex catching all common
"feat" variants source platforms produce — `feat`, `feat.`,
`featuring`, `ft`, `ft.` — case-insensitive. Substring matches
(e.g. "Aftermath" containing "ft") correctly fall through to the
append path.
Fix in enrichment.py ID3 branch:
* TPE1 stays as the display string (with separator or primary-only
per the user's settings)
* Multi-value list goes to a separate `TXXX:Artists` frame (Picard
convention) when `write_multi_artist` is on
* Pre-fix the ID3 path wrote TPE1 twice — single-string then list
— and the second `add` overwrote the first, clobbering both the
configured separator AND the feat_in_title semantics. Vorbis path
was already correct (separate "artist" + "artists" keys).
Known limitation (flagged in WHATS_NEW): Deezer's `/search` endpoint
only returns the primary artist. The full contributors array lives
on `/track/<id>`. Enrichment uses search-result data so Deezer-
sourced tracks may still get only the primary artist until a follow-
up commit wires the per-track contributors fetch into the enrichment
flow. Spotify, Tidal, and iTunes search responses include all
artists so they work now.
23 new tests in `tests/metadata/test_multi_artist_tag_settings.py`:
* `_artists_list` populated for multi/single/no-artist cases
* `artist_separator` drives ARTIST string (default ", " + custom
";" + custom "; " + " & ")
* Single-artist case unaffected by either setting
* `feat_in_title=True` pulls featured to title, leaves primary in
ARTIST
* `feat_in_title` no-op for single artist
* Double-append guard recognizes 9 source-title variants ("(feat.
X)", "(Feat. X)", "(FEAT X)", "(feat X)", "(Featuring X)",
"[feat. X]", "ft. X", "(ft X)", "FT. X")
* Substring guard test pins "Aftermath" doesn't false-positive
* Combined-settings precedence: feat_in_title wins ARTIST string
but `_artists_list` carries everyone for multi-value tag
Full pytest 2711 passed.
|
1 month ago |
|
|
8a4c0dc92a |
Deezer cover-art download: fallback to original URL on CDN refusal
Defensive followup. If Deezer CDN ever refuses the upgraded 1900×1900 URL for a specific album (rare — empirically tested 4 albums and none hit it), pre-fix would have succeeded with the 1000×1000 URL and post-fix would have failed entirely. Both download sites now retry with the original URL when the upgraded URL fails: - `core/metadata/artwork.py::download_cover_art` — auto post-process flow. Resolves the original URL from album_info / context the same way the existing path does. - `core/tag_writer.py::download_cover_art` — captures the original URL before upgrade so the retry has it without a second context lookup. Strictly non-regressive: worst plausible post-fix case is now identical to pre-fix (cover at 1000×1000 succeeds). Fallback only fires on the rare CDN-refusal edge. Tests added (2): - `test_tag_writer_retries_with_original_on_failure` — upgraded URL raises, original succeeds, both attempts logged in call order - `test_tag_writer_no_fallback_for_non_dzcdn_url` — non-Deezer URLs go through unchanged, no fallback path triggered (single attempt) Verification: - 18/18 helper + integration tests pass - 2561 full suite passes - Ruff clean |
1 month ago |
|
|
80cf16339c |
Deezer cover art: upgrade CDN URL to 1900×1900 (was embedding 1000×1000)
Discord report (Tim): downloaded cover art via Deezer metadata
source came out visibly blurry in Navidrome / on phones — large
displays exposed the limited resolution.
# Cause
Deezer's API returns `cover_xl` URLs at 1000×1000. The underlying
CDN actually serves up to 1900×1900 by rewriting the size segment
in the URL path (same trick the iTunes mzstatic + Spotify scdn
upgrades already use). SoulSync wasn't doing the rewrite — every
Deezer-sourced cover got embedded at 1000×1000 regardless of how
much higher resolution the CDN had available.
# Verified empirically
```
$ for size in 1000 1400 1800 1900 2000; do curl -I "...{size}x{size}-..."; done
1000: 200 OK 106 KB
1400: 200 OK 198 KB
1800: 200 OK 331 KB
1900: 200 OK 371 KB
2000: 403 Forbidden
```
1900 is the safe ceiling. Above that the CDN returns 403. CDN
serves source-native bytes when source < target (smaller-source
albums get same bytes whether we ask for 1000 or 1900), so asking
for 1900 universally is safe.
# Fix
New `_upgrade_deezer_cover_url(url, target_size=1900)` helper in
`core/deezer_client.py`. Pure function, mirrors the
`_upgrade_spotify_image_url` pattern that already lives in
`core/spotify_client.py`. Defensive on every input shape:
- Empty / None → returned as-is
- Non-Deezer URL (no `dzcdn`) → returned as-is
- No size segment in URL → returned as-is
- Already at/above target → returned as-is (idempotent, never
downgrades)
Applied at both cover-download sites:
- `core/metadata/artwork.py::download_cover_art` — auto post-process
flow. Mirrors the existing iTunes mzstatic upgrade right above it.
- `core/tag_writer.py::download_cover_art` — enhanced library view's
"Write Tags to File" feature.
# Scope discipline
- Helper applied at the DOWNLOAD boundary, not the source extraction
point in `deezer_client.py`. Means cached entries in the metadata
cache + DB row `image_url` columns keep the original 1000×1000 URL
Deezer's API returned. Future CDN behavior changes only affect the
download path, not stored data.
- Pre-existing `prefer_caa_art` toggle (Settings → Library →
Post-Processing) untouched — orthogonal workaround for users who
want even higher quality (MusicBrainz Cover Art Archive, often
3000×3000+).
- iTunes / Spotify upgrade paths untouched — they already worked.
# Tests added (16)
`tests/metadata/test_deezer_cover_url_upgrade.py`:
- Standard upgrade: default target 1900 on cover URL, alternate
dzcdn host (`e-cdns-images.dzcdn.net` vs `cdn-images.dzcdn.net`),
artist picture URLs (same path pattern), 500×500 source upgrades
too
- Custom target size: smaller target = no-op (never downgrade),
larger target works
- Idempotent: already at/above target returned unchanged
- Defensive on non-Deezer URLs: parametrised across 5 hosts
(Spotify scdn, iTunes mzstatic, MB CAA, Last.fm, random) — all
returned untouched
- Defensive on malformed Deezer URL (no size segment) → returned
as-is
- Empty / None handling
# Verification
- 16/16 helper tests pass
- 560/560 metadata + imports tests pass (no regression)
- 2559 full suite passes
- Ruff clean
|
1 month ago |
|
|
402d851cac |
Deezer search: drop advanced-syntax at endpoint, free-text + rerank wins
Live-API verification revealed advanced-syntax queries hurt more than they help on this endpoint. Switching the import-modal Deezer search back to free-text + local rerank. # What live testing showed Hit Deezer's public API with both query forms for the issue #534 case (`Dirty White Boy` + `Foreigner`): **Free-text (`q=Dirty White Boy Foreigner`):** - Returns 21 results - Real Foreigner Head Games studio cut at #1 - Live versions at #2-10 - Karaoke / cover variants at #11-15 **Advanced (`q=track:"Dirty White Boy" artist:"Foreigner"`):** - Returns 12 results - "(2008 Remaster)" at #1 — canonical Head Games cut MISSING from top 8 entirely - Live + alt-album versions follow Advanced syntax DOES filter karaoke at the API level (none in the 12-result set vs. 5 at positions 11-15 in free-text), but it has its own ranking bias that surfaces remasters / "Best Of" cuts ahead of the canonical recording. Net regression for the user- facing goal. # Fix 1. Endpoint reverts to free-text query with local rerank applied. 2. Local rerank gains "remaster" / "remastered" / "reissue" patterns under VARIANT_TAG_PATTERNS (soft 0.4× penalty — user may want them but they shouldn't outrank the original). 3. Client kwarg support (`track=` / `artist=` / `album=`) preserved for future opt-in callers (e.g. exact-match flows where API- level filtering matters more than ranking). # Verified end-to-end against live Deezer API Re-ran the exact #534 case through the live API + new rerank. Top 15 results post-rerank: 1. Dirty White Boy — Foreigner — Head Games ← REAL CUT AT TOP 2-10. Various Live versions 11-15. Karaoke / cover / tribute variants ← BURIED Real Foreigner Head Games studio cut at #1, exactly the user's ask. # Tests - `test_relevance.py` — variant tag patterns extended; existing tests still pass (50 tests). - `test_search_match_endpoints.py::test_joins_track_and_artist_into_free_text_query` — replaces `test_passes_track_and_artist_as_kwargs`; verifies endpoint sends free-text join, NOT field-scoped kwargs (the prior test asserted the wrong direction now). - Karaoke-burying assertion at the endpoint still pins the user-visible behaviour. - Client kwarg path tests untouched (still pin advanced-syntax construction for future opt-in callers). # Verification - 75 relevance + endpoint + query tests pass - 2445 full suite passes - Ruff clean - Live Deezer API shows real cut at #1 post-rerank |
1 month ago |
|
|
1cc37081a6 |
Fix Deezer search relevance — issue #534
# Background User reported (#534) that the import-modal "Search for Match" dialog returned irrelevant results when Deezer was the metadata source. Searching `Dirty White Boy` + `Foreigner` returned 5+ karaoke / "originally performed by" / "in the style of" / "re-recorded" / tribute-band results ranked above the actual Foreigner studio cut from Head Games. User had to scroll past the junk every time, or fall back to iTunes search which is much slower. # Root cause — two layers 1. **Endpoint joined `track + artist` into free-text query.** `/api/deezer/search_tracks` was passing `q=Dirty White Boy Foreigner` to Deezer's `/search/track` API. Deezer fuzzy-matches that string across title / lyrics / artist / album / contributors and orders by global popularity — anything that appears across many compilations outranks the canonical recording. 2. **No local rerank.** None of the search-modal endpoints applied any post-filtering. Deezer's API order shipped straight to the user. # Fix — same architectural shape Cin would build ## Layer 1: field-scoped query at the client boundary `core/deezer_client.py::search_tracks()` now accepts optional `track`, `artist`, `album` kwargs. When provided, builds Deezer's advanced search syntax: `q=track:"X" artist:"Y" album:"Z"`. Massive relevance improvement because each term matches the right field instead of fuzzy-matching everywhere. Backward compat preserved: legacy free-text `query=` callers still work unchanged. Field-scoped path takes precedence when both are provided. Empty input fast-fails without an API call. Embedded double-quotes stripped (Deezer's syntax has no escape mechanism). ## Layer 2: provider-neutral relevance reranker New `core/metadata/relevance.py` module — pure-function rerank over the canonical `Track` dataclass. Composable scoring: - **Cover/karaoke patterns** (multiplier 0.05, effectively buries): matches "karaoke", "originally performed by", "in the style of", "made famous by", "tribute", "vocal version", "backing track", "cover version", "re-recorded", "cover by", etc. across title, album, AND artist fields. Catches the screenshot's exact junk: artist credits like "Pop Music Workshop" / "The Karaoke Channel" / "Foreigner Tribute Band". - **Variant tags** (multiplier 0.4): live / acoustic / demo / instrumental / remix / radio edit / club mix etc. — softer penalty since the user MAY want them. Skipped entirely when the expected_title contains the same tag (so searching "Track (Live)" still ranks Live versions first). - **Exact artist boost** (multiplier 1.5): primary artist exactly matches expected_artist after normalisation. Single strongest signal for "this is the canonical recording". - **Title + artist similarity** via SequenceMatcher (parentheticals + punctuation stripped before comparison). - **Album-type weighting**: album=1.0 > single/ep=0.85 > compilation=0.7. Compilations are more likely tribute / karaoke repackages. Each component is a standalone function so tests pin them individually without standing up the full pipeline. ## Wired at three search-modal endpoints - `/api/deezer/search_tracks` — uses both layers (field-scoped query + rerank). - `/api/itunes/search_tracks` — uses rerank only (iTunes API has no advanced-syntax search, but karaoke / cover variants still leak through and need the local penalty). - `/api/spotify/search_tracks` — already builds field-scoped `track:X artist:Y` query; rerank added as the consistency safety net so all three sources behave the same from the user's perspective. Other Deezer call sites (matching engine, watchlist scanner, auto-import single-track ID) deliberately not touched in this PR — they have their own elaborate scoring pipelines tuned to their specific contexts and aren't surfacing the user-reported issue. Per Cin: "don't refactor beyond what the task requires." # Tests 71 new tests across 3 files: - `tests/metadata/test_relevance.py` (50 tests) — every scoring component pinned individually + the issue #534 screenshot reproduced as a regression test (real Foreigner cut wins after rerank, karaoke variants drop to bottom). - `tests/metadata/test_deezer_search_query.py` (14 tests) — advanced-syntax query construction, field-scoped wiring at the client boundary, free-text path unchanged, kwargs win when ambiguous, limit clamping, cache key consistency. - `tests/imports/test_search_match_endpoints.py` (7 tests) — end-to-end through Flask test client: Deezer endpoint passes kwargs not joined query; karaoke buried at bottom for all three sources; legacy query param still works without rerank. # Verification - 2441 full suite passes (+71 from baseline 2370) - 0 failures (the prior watchdog flake fix held) - Ruff clean across all changed files - JS parses clean (`node -c webui/static/helper.js`) # Architectural standards followed - **Logic at the right boundary.** Query construction lives in the client (every caller benefits from one change). Rerank lives in a neutral module (`core/metadata/relevance.py`) over the canonical `Track` dataclass — works for any source, not Deezer- specific. - **Explicit > implicit.** Every scoring rule has its own named function. Pattern tables are module-level constants tests can introspect. - **Scope discipline.** Audited every Deezer search call site; fixed the user-reported one + the consistent siblings. Did NOT speculatively normalise every Deezer call across the codebase. - **Backward compat.** Free-text `query=` callers untouched. Kwargs added to existing client method signature with safe defaults. - **Tests pin contract at correct boundary.** Pure-function rerank tests don't mock anything; client-query tests stub at `_api_get`; endpoint tests run through the real Flask app. |
1 month ago |
|
|
3246490800 |
Auto-import: MBID/ISRC fast paths + duration sanity gate
Brings the auto-import matcher to picard / beets / roon parity by reaching for the existing AcoustID-grade infrastructure (typed Album foundation, integrity check thresholds) and layering id-based exact matches on top of the fuzzy scorer. Picard-tagged libraries now land every track with full confidence on the first pass. Three layered phases in `core/imports/album_matching.match_files_to_tracks`: 1. **MBID exact match** — file has `musicbrainz_trackid` tag, source returns the same id → instant pair, full confidence, no fuzzy scoring. Picard's primary identifier; per-recording. 2. **ISRC exact match** — file has `isrc` tag, source returns the same id → same fast-path, slightly lower priority than mbid (isrc can be shared across remasters). Both ids normalised before compare (uppercase + strip dashes/spaces for isrc, lowercase for mbid). 3. **Duration sanity gate** — files in the fuzzy phase whose audio length differs from the candidate track's duration by more than `DURATION_TOLERANCE_MS` (3s, matching the post-download integrity check) are rejected before scoring runs. Defends against the cross-disc / cross-release / wrong-edit problem the integrity check used to catch only AFTER the file had already been moved + tagged + db-inserted. Tag reader (`_read_file_tags`) extended: - Reads `isrc` (uppercased, strip / / spaces normalisation deferred to matcher) - Reads `musicbrainz_trackid` as `mbid` (lowercased) - Reads `audio.info.length` and converts to `duration_ms` to match the metadata-source convention Metadata-source layer (`_build_album_track_entry`) extended: - Propagates `isrc` from top-level OR `external_ids.isrc` (spotify shape — would otherwise be stripped before reaching the matcher) - Propagates `musicbrainz_id` from top-level OR `external_ids.mbid` / `external_ids.musicbrainz` - Without this layer, fast paths would silently never fire in production even though unit tests pass — pinned by `test_album_track_entry_propagates_isrc_and_mbid_from_source` 18 new tests in `tests/imports/test_album_matching_exact_id.py`: - Direct: `find_exact_id_matches` with mbid, isrc, isrc normalisation, mbid > isrc priority, spotify-shape `external_ids.isrc`, no-id empty result, file-used-at-most-once - Direct: `duration_sanity_ok` within / outside tolerance, missing durations defer - End-to-end via `match_files_to_tracks`: mbid match short-circuits fuzzy scoring, id-matched files excluded from fuzzy phase, duration gate rejects wrong-disc collisions in fuzzy phase, normal matches pass through the gate, missing durations fall through, deezer seconds-vs-ms conversion, full picard-tagged 10-track album via mbid only - Production-shape: `_build_album_track_entry` propagates isrc + mbid from spotify-shape (`external_ids.isrc`) AND itunes-shape (top- level `isrc`) Verification: - 35 album-matching tests pass total (17 helper + 18 fast-path) - 23 multi-disc tests still pass after the extension (additive) - Full suite: 2311 passed (+18 new), 1 pre-existing flaky timing test failure (`test_watchdog_warns_about_stuck_workers` — passes in isolation, fails only in full-suite runs, unrelated to this PR) - Ruff clean For users: - Picard / Beets / Mp3Tag-tagged libraries (anyone who's organised their music) get instant perfect-confidence matches every time. - Soulseek-tagged downloads (which usually carry isrc when sourced via metadata-aware soulseekers) get the fast path too. - Naively-named files with no useful tags fall through to the improved fuzzy + duration-gated path — same correctness as before for the common case, much harder for the matcher to confidently pair the wrong file. - One step closer to standalone-DB feature parity with plex / jellyfin / navidrome scanners. Acoustid fingerprint fallback (for files with NO useful tags AND no MBID/ISRC) is the next followup PR. |
2 months ago |
|
|
9602d1827c |
Final silent-exception sweep + ruff S110 lint guardrail — ~45 sites
Catches the silent excepts the awk-based earlier sweeps missed:
- Bare `except:` followed by `pass` (also swallows KeyboardInterrupt
and SystemExit — actively wrong). Upgraded to `except Exception as
e: logger.debug("...: %s", e)`. ~14 sites across connection_detect,
soulseek_client, listenbrainz_manager, watchlist_scanner,
youtube_client, navidrome_client, jellyfin_client, web_server.
- `except Exception:` + pass that the awk pattern missed (e.g.
multi-line or unusual whitespace). ~31 sites across automation_engine,
database_update_worker, music_database, spotify_client, web_server,
others.
- 14 legitimate cleanup sites left silent with explicit `# noqa: S110`
+ comment explaining why (atexit handlers, finally-block conn.close
calls). Logging during shutdown can itself crash because file handles
get torn down before the handler fires.
Also enables `S110` rule in pyproject.toml so this pattern fails CI
going forward — drift fails at PR review instead of at runtime against
a wedged worker thread. Tests path keeps S110 ignored (test fixtures
legitimately use try-except-pass for cleanup).
Adds a WHATS_NEW entry to helper.js summarizing the full #369 sweep.
Verified: `python -m ruff check .` → All checks passed.
Verified: `python -m pytest tests/` → 2188 passed.
Closes #369
|
2 months ago |
|
|
aa54bed818 |
Surface silent exceptions across remaining modules — ~70 sites
Final sweep. Covers: - Downloads: candidates / lifecycle / master / monitor / wishlist_failed - Metadata: source / registry / cache / common / artwork (+ plex_client) - Imports: pipeline / resolution / file_ops / paths / guards - Library: path_resolver / retag / duplicate_cleaner - Stats / playlists / wishlist / discovery / automation / enrichment - Misc: hydrabase_client, soulsync_client, tag_writer, debug_info, api_call_tracker, album_consistency, beatport_unified_scraper, reorganize_runner, seasonal_discovery, lidarr_download_client, services/sync_service.py, automation_engine, automation/progress Two `_e` renames in imports/file_ops.py (outer scope binding `e`). A few finally-block sites in metadata/album_mbid_cache.py, library/track_identity.py, listening_stats_worker.py, watchlist/ auto_scan.py left silent — same reason as the rest of the sweep (logger calls during cleanup paths can themselves raise). Refs #369 |
2 months ago |
|
|
822759740d |
Fix Download Discography pulling wrong artist + log routing
Two fixes. (1) Discography endpoint now does server-side per-source ID resolution. When the user clicked Download Discography on a library artist, the endpoint received whichever artist ID the frontend happened to pick (spotify_artist_id || itunes_artist_id || deezer_id || library_db_id) and dispatched it as-is to whichever source it queried. If the picked ID didn't match the queried source's ID format, the lookup returned wrong-artist results (numeric ID collisions) or fell back to a fuzzy name search that picked a wrong artist. Two reproducible cases: - 50 Cent's library row had DB id 194687 — coincidentally a real Deezer artist ID for "Young Hot Rod". When the frontend's /enhanced fetch silently fell back to the DB id, the backend sent 194687 to Deezer, and Deezer returned Young Hot Rod's 50 albums in 50 Cent's discography modal. - Weird Al's library row had a stored Spotify ID. The frontend sent that to Deezer, which rejected the alphanumeric ID and fell back to fuzzy name search — which picked The Beatles somehow, returning 45 Beatles albums. The mechanism for per-source ID dispatch already exists in ``MetadataLookupOptions.artist_source_ids``, and the watchlist scanner already uses it; the on-demand discography endpoint just wasn't wired to it. Fix: when the URL artist_id matches a library row by ANY stored ID (DB id, spotify_artist_id, itunes_artist_id, deezer_id, or musicbrainz_id), pull every stored provider ID and pass them as ``artist_source_ids``. Each source gets its OWN stored ID regardless of which one the URL carries. When the URL ID is a non-library source-native ID and the row lookup misses entirely, behavior is identical to before (single-ID dispatch fallback). Logged the resolved per-source ID dict at INFO so future "wrong artist showed up" diagnostics are immediately legible in app.log. (2) Logger namespace fix in core/artists/quality.py and core/metadata/multi_source_search.py. Both modules used ``logging.getLogger(__name__)`` which resolves to ``core.artists.quality`` / ``core.metadata.multi_source_search`` — neither under the ``soulsync`` namespace where the file handler is wired. Result: every [Enhance], [MultiSourceSearch], and direct-lookup INFO line was being written to a logger with no handlers and silently dropped. App log showed the slow-request warning but no diagnostic detail. Switched both to ``get_logger()`` from utils.logging_config so the soulsync.* namespace picks them up. Same content, now actually lands in app.log. Confirmed working in live test: ``[Enhance] Direct lookup matched: deezer ID 1476162252 → 'Desastre'`` No behavior change in any other caller. Empty ``artist_source_ids`` (no library row matched) reaches lookup as ``None`` → identical to current single-ID dispatch path. Logger fix is pure routing — no content change. |
2 months ago |