Discord report (Foxxify): the AcoustID scanner repair job flagged
multi-artist tracks as Wrong Song because AcoustID returns the
FULL credit ("Okayracer, aldrch & poptropicaslutz!") while the
library DB carries only the primary artist ("Okayracer"). Raw
SequenceMatcher similarity scored ~43% — well below the 60%
threshold — so the scanner created a finding even though the
audio was correct. User couldn't fix without lowering the global
artist threshold to ~30% (which would let real mismatches through).
# Fix
Extended the shared `core/matching/artist_aliases.py::artist_names_match`
helper (originally lifted for #441) with credit-token splitting.
When the actual artist string contains common separators —
- punctuation: `,` `&` `;` `/` `+`
- keywords (whitespace-bounded): `feat.` `ft.` `featuring` `with`
`vs.` `x`
— the helper splits into individual contributors and checks each
against the expected artist. Primary-in-credit cases now resolve
at 100% instead of 43%.
Two pattern groups because punctuation separators don't need
surrounding whitespace, but keyword separators MUST be
whitespace-bounded — otherwise we'd split artists with `x` /
`with` etc. in their names ("JAY-X" → "JAY-" / "" issue).
Composes with the existing alias path: cross-script multi-artist
credits ("Hiroyuki Sawano" expected, "澤野弘之, FeaturedJp"
actual) work via alias-token-against-credit-token compare.
# Wire-in
Scanner at `core/repair_jobs/acoustid_scanner.py:202` replaces
the raw `SequenceMatcher` call with `artist_names_match`. Pass
RAW artist strings (not pre-normalised by `_normalize`) so the
splitter can recognise separators — `_normalize` strips ALL
punctuation, which destroyed the very tokens the splitter needs.
The AcoustID post-download verifier (`core/acoustid_verification.py`)
already routes through `_alias_aware_artist_sim` which calls the
same helper — gets the multi-value benefit automatically without
a separate wire-in.
# New `split_artist_credit` exported helper
Pure-function helper for callers who want token-level access to
the credit list (debugging, UI, future per-token enrichment). Same
splitter logic, exposed as a top-level function.
# Tests added (14)
`tests/matching/test_artist_aliases.py` (+11):
- `TestSplitArtistCredit` — parametrised across 12 credit-string
formats (comma, ampersand, semicolon, slash, plus, feat./ft./
featuring, with, vs., x, single-token, empty), drops empty
tokens, strips per-token whitespace
- `TestMultiValueCreditMatching` — reporter's exact case
(Okayracer in 3-artist credit → 100%), primary in middle/end of
credit, genuine-mismatch still fails, single-token actual falls
through to direct compare, multi-value composes with aliases,
threshold still respected
`tests/test_acoustid_scanner.py` (+3):
- Reporter's case end-to-end through `_scan_file` — fingerprint
99% / title 100% / multi-artist credit → no finding created
- Genuine artist mismatch still creates finding (no false
suppression of real mismatches)
- `JobResultStub` minimal scaffold for the integration tests
# Verification
- 14 new tests pass (49 helper + 5 scanner total in their files)
- 110 matching + scanner tests pass total
- 2584 full suite passes (+25 from baseline 2559)
- Ruff clean
- Reporter's exact case (Okayracer in `Okayracer, aldrch &
poptropicaslutz!`) now scores 100% match → no Wrong Song flag