WHATS_NEW: cross-script artist names no longer quarantine files (#442)

1 week ago · 80e9398e16
parent 7066233c37
commit 80e9398e16
1 changed files with 1 additions and 0 deletions
--- a/webui/static/helper.js
+++ b/webui/static/helper.js
@ -3416,6 +3416,7 @@ const WHATS_NEW = {
    '2.4.3': [
        // --- post-release patch work on the 2.4.3 line — entries hidden by _getLatestWhatsNewVersion until the build version bumps ---
        { date: 'Unreleased — 2.4.3 patch work' },
+        { title: 'Cross-Script Artist Names No Longer Quarantine Files (Hiroyuki Sawano / 澤野弘之, Сергей Лазарев / Sergey Lazarev)', desc: 'github issue #442 (afonsog6): files where the artist tag was in one script and the expected metadata was in another — japanese kanji `澤野弘之` for `hiroyuki sawano`, cyrillic `сергей лазарев` for `sergey lazarev`, etc. — got quarantined post-download because acoustid verification scored the artist similarity at 0% (the two scripts share no characters). reporter could not even rescue the file via manual import — the import-modal goes through the same verifier and re-quarantined the same file. cause: verifier compared expected vs actual artist with raw `_similarity` and never consulted musicbrainz aliases, even though MB exposes them on every artist record. fix: new `core/matching/artist_aliases.py` pure helper with alias-aware comparison + new `artists.aliases` JSON column populated by the existing MB enrichment worker on every artist match (one extra `inc=aliases` request per artist) + new multi-tier resolver `MusicBrainzService.lookup_artist_aliases` (library DB → cache → live MB) so the verifier finds aliases even for un-enriched artists without thrashing the MB API. verifier resolves aliases ONCE per `verify_audio_file` call and feeds them through three artist comparison sites (best-match scoring, secondary scan when title matches but artist doesn\'t, final fallback scan). reporter\'s exact two cases reproduced as regression tests with stubbed MB service. backward compat: aliases unavailable / MB unreachable → verifier falls back to direct similarity (identical to pre-fix behaviour — never quarantines stricter than today). 70 new tests pin every layer: pure helper (28), service methods (31), verifier integration (11). audited adjacent artist-comparison sites (auto-import single-track id, discovery scoring, matching engine) — left untouched per scope discipline since they aren\'t the user-reported pain.', page: 'downloads' },
        { title: 'Plex: Library Scan Trigger No Longer Fails On Non-English Section Names', desc: 'github issue #535 (adrigzr): plex servers with the music library named anything other than "music" — Música, Musique, Musik, Musica, etc. — got a `Failed to trigger library scan for "Music": Invalid library section: Music` error after every import cycle, and `wishlist.processing` kept reporting "missing from media server after sync" for tracks that DID import correctly because the post-import scan never fired. cause: `trigger_library_scan` and `is_library_scanning` ignored the auto-detected `self.music_library` (correctly populated by `_find_music_library` filtering by `section.type == "artist"`) and called `self.server.library.section(library_name)` with a hardcoded "music" default — raised NotFound on any non-english server. read methods like `get_artists` already routed through `_get_music_sections` so they didn\'t have the bug; this aligns the scan-trigger path with the same resolution. fix: both single-library branches prefer `self.music_library` first, fall back to literal section lookup only when auto-detection hasn\'t run. activity-feed match in `is_library_scanning` also corrected to use the resolved section\'s actual title instead of the unused `library_name` arg — the prior log line read "triggered scan for music" even on Spanish servers. 13 new tests pin: trigger uses auto-detected section across 6 locale variants (Música / Musique / Musik / Musica / 音乐 / موسيقى), backward-compat fallback when music_library is None, explicit library_name kwarg ignored when auto-detected section exists, log line surfaces correct section title, scan-status check uses auto-detected section\'s `refreshing` attr, activity-feed match filters by resolved title (not library_name).', page: 'settings' },
        { title: 'Search For Match: No More Karaoke / Cover / "Originally Performed By" Junk At The Top', desc: 'github issue #534 (radoslav-orlov): typing "dirty white boy" + "foreigner" into the import-modal "search for match" dialog returned karaoke versions, "originally performed by" compilations, and tribute-band cuts ranked above the actual foreigner studio recording in some regions. user had to scroll past 5+ junk results before finding the canonical track. fix: new `core/metadata/relevance.py` helper reranks results locally with cover/karaoke/tribute/re-recorded penalties (multiplier 0.05× — effectively buries) + exact-artist-match boost (1.5×) + variant-tag (live/acoustic/remix/remaster) penalty (0.4×, skipped when user explicitly typed the variant — searching "track (live)" still ranks live versions correctly). applied at the deezer + itunes + spotify search-tracks endpoints so all three sources behave consistently. validated against live deezer api with the actual #534 query: real foreigner head games cut now lands at #1, live versions follow, karaoke / cover / tribute variants drop to positions 11-15. deezer client also gained optional field-scoped query kwargs (`track="X" artist="Y"`) that build deezer\'s advanced search syntax `track:"X" artist:"Y"` for future opt-in callers (e.g. exact-match flows where api-level filtering is more important than ranking) — kept in client but NOT used at the import-modal endpoint after live testing showed the advanced syntax has its own ranking bias (surfaced "(2008 remaster)" instead of the canonical recording). free-text + local rerank is the more reliable combination here. 75 new tests pin every scoring component, pattern detection (13 cover patterns, 11 variant patterns, 3 fields), score composition (real-cut > karaoke > remaster > re-recorded), the issue #534 screenshot reproduced as a regression test, deezer client query construction + free-text fallback safety net.', page: 'import' },
        { title: 'Auto-Import: Album Duration Is Album Total + Re-Imports Fill Metadata Gaps', desc: 'two more parity gaps closed in the soulsync standalone library write path. (1) album row\'s `duration` column was being written with the FIRST imported track\'s duration instead of the album total — pre-existing bug that survived the prior parity commit. soulsync_client deep scan computes `sum(t.duration for t in self._tracks)` for each album; auto-import now mirrors that by computing the sum across every matched track in the worker and threading it through context to the album INSERT. (2) `record_soulsync_library_entry` was insert-only on artists + albums — once a row existed (matched by id OR name fallback), subsequent imports of the same artist or album skipped completely. meant: artist genres / thumb / source-id reflected ONLY whatever the FIRST imported album supplied, never refreshing as more albums by that artist landed (ten more deezer/spotify imports later, artist row still had whatever the first random import wrote). new conservative UPDATE path: when an existing row matches, fill ONLY the columns whose current value is NULL or empty — never overwrites populated values. protects manual edits + enrichment-worker writes the same way scanner UPDATEs preserve enrichment columns. f-string column names are validated against an allowlist (`_SOULSYNC_FILLABLE_COLUMNS`) before interpolation — defensive against accidental misuse adding columns without an allowlist update. 4 new tests pin: album duration uses sum not single-track, re-import fills empty thumb + genres on existing artist row, re-import does NOT clobber populated values, re-import fills empty source-id columns when later import has them.', page: 'import' },