SoulSync

Broque Thomas df304eb016 AcoustID scanner: handle multi-value artist credits

Discord report (Foxxify): the AcoustID scanner repair job flagged multi-artist tracks as Wrong Song because AcoustID returns the FULL credit ("Okayracer, aldrch & poptropicaslutz!") while the library DB carries only the primary artist ("Okayracer"). Raw SequenceMatcher similarity scored ~43%, well below the 60% threshold, so the scanner created a finding even though the audio was correct. User couldn't fix without lowering the global artist threshold to ~30% (which would let real mismatches through).

# Fix

Extended the shared `core/matching/artist_aliases.py::artist_names_match` helper (originally lifted for #441) with credit-token splitting. When the actual artist string contains common separators:

- punctuation: `,` `&` `;` `/` `+`
- keywords (whitespace-bounded): `feat.` `ft.` `featuring` `with` `vs.` `x`

the helper splits into individual contributors and checks each against the expected artist. Primary-in-credit cases now resolve at 100% instead of 43%.

Two pattern groups because punctuation separators don't need surrounding whitespace, but keyword separators MUST be whitespace-bounded; otherwise we'd split artists with `x` / `with` etc. in their names ("JAY-X" → "JAY-" / "" issue).

Composes with the existing alias path: cross-script multi-artist credits ("Hiroyuki Sawano" expected, "澤野弘之, FeaturedJp" actual) work via alias-token-against-credit-token compare.

# Wire-in

Scanner at `core/repair_jobs/acoustid_scanner.py:202` replaces the raw `SequenceMatcher` call with `artist_names_match`. Pass RAW artist strings (not pre-normalised by `_normalize`) so the splitter can recognise separators: `_normalize` strips ALL punctuation, which destroyed the very tokens the splitter needs.

The AcoustID post-download verifier (`core/acoustid_verification.py`) already routes through `_alias_aware_artist_sim`, which calls the same helper, so it gets the multi-value benefit automatically without a separate wire-in.

# New `split_artist_credit` exported helper

Pure-function helper for callers who want token-level access to the credit list (debugging, UI, future per-token enrichment). Same splitter logic, exposed as a top-level function.

# Tests added (14)

`tests/matching/test_artist_aliases.py` (+11):

- `TestSplitArtistCredit`: parametrised across 12 credit-string formats (comma, ampersand, semicolon, slash, plus, feat./ft./featuring, with, vs., x, single-token, empty), drops empty tokens, strips per-token whitespace
- `TestMultiValueCreditMatching`: reporter's exact case (Okayracer in 3-artist credit → 100%), primary in middle/end of credit, genuine mismatch still fails, single-token actual falls through to direct compare, multi-value composes with aliases, threshold still respected

`tests/test_acoustid_scanner.py` (+3):

- Reporter's case end-to-end through `_scan_file`: fingerprint 99% / title 100% / multi-artist credit → no finding created
- Genuine artist mismatch still creates finding (no false suppression of real mismatches)
- `JobResultStub` minimal scaffold for the integration tests

# Verification

- 14 new tests pass (49 helper + 5 scanner total in their files)
- 110 matching + scanner tests pass total
- 2584 full suite passes (+25 from baseline 2559)
- Ruff clean
- Reporter's exact case (Okayracer in `Okayracer, aldrch & poptropicaslutz!`) now scores 100% match → no Wrong Song flag

2 weeks ago
__init__.py	Add pure artist-name comparison helper with alias awareness	2 weeks ago
test_acoustid_verification_aliases.py	Tighten alias-lookup trust + add ambiguity gate + diagnostic log	2 weeks ago
test_artist_alias_service.py	Tighten alias-lookup trust + add ambiguity gate + diagnostic log	2 weeks ago
test_artist_aliases.py	AcoustID scanner: handle multi-value artist credits	2 weeks ago