mirror of https://github.com/Nezreka/SoulSync.git
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
main
dev
fix/usenet-album-poll-sab-handoff
fix/quarantine-source-dedup
release/2.5.3
fix/disable-beatport-features
johnbaumb-discover-redesign
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
2.4.0
2.4.1
2.4.2
2.5.0
2.5.1
2.5.2
2.5.3
2.5.4
2.5.5
2.5.6
2.5.7
2.5.9
2.6.0
2.6.1
2.6.2
2.6.3
2.6.4
v0.65
${ noResults }
Discord report (Foxxify): the AcoustID scanner repair job flagged
multi-artist tracks as Wrong Song because AcoustID returns the
FULL credit ("Okayracer, aldrch & poptropicaslutz!") while the
library DB carries only the primary artist ("Okayracer"). Raw
SequenceMatcher similarity scored ~43% — well below the 60%
threshold — so the scanner created a finding even though the
audio was correct. User couldn't fix without lowering the global
artist threshold to ~30% (which would let real mismatches through).
# Fix
Extended the shared `core/matching/artist_aliases.py::artist_names_match`
helper (originally lifted for #441) with credit-token splitting.
When the actual artist string contains common separators —
- punctuation: `,` `&` `;` `/` `+`
- keywords (whitespace-bounded): `feat.` `ft.` `featuring` `with`
`vs.` `x`
— the helper splits into individual contributors and checks each
against the expected artist. Primary-in-credit cases now resolve
at 100% instead of 43%.
Two pattern groups because punctuation separators don't need
surrounding whitespace, but keyword separators MUST be
whitespace-bounded — otherwise we'd split artists with `x` /
`with` etc. in their names ("JAY-X" → "JAY-" / "" issue).
Composes with the existing alias path: cross-script multi-artist
credits ("Hiroyuki Sawano" expected, "澤野弘之, FeaturedJp"
actual) work via alias-token-against-credit-token compare.
# Wire-in
Scanner at `core/repair_jobs/acoustid_scanner.py:202` replaces
the raw `SequenceMatcher` call with `artist_names_match`. Pass
RAW artist strings (not pre-normalised by `_normalize`) so the
splitter can recognise separators — `_normalize` strips ALL
punctuation, which destroyed the very tokens the splitter needs.
The AcoustID post-download verifier (`core/acoustid_verification.py`)
already routes through `_alias_aware_artist_sim` which calls the
same helper — gets the multi-value benefit automatically without
a separate wire-in.
# New `split_artist_credit` exported helper
Pure-function helper for callers who want token-level access to
the credit list (debugging, UI, future per-token enrichment). Same
splitter logic, exposed as a top-level function.
# Tests added (14)
`tests/matching/test_artist_aliases.py` (+11):
- `TestSplitArtistCredit` — parametrised across 12 credit-string
formats (comma, ampersand, semicolon, slash, plus, feat./ft./
featuring, with, vs., x, single-token, empty), drops empty
tokens, strips per-token whitespace
- `TestMultiValueCreditMatching` — reporter's exact case
(Okayracer in 3-artist credit → 100%), primary in middle/end of
credit, genuine-mismatch still fails, single-token actual falls
through to direct compare, multi-value composes with aliases,
threshold still respected
`tests/test_acoustid_scanner.py` (+3):
- Reporter's case end-to-end through `_scan_file` — fingerprint
99% / title 100% / multi-artist credit → no finding created
- Genuine artist mismatch still creates finding (no false
suppression of real mismatches)
- `JobResultStub` minimal scaffold for the integration tests
# Verification
- 14 new tests pass (49 helper + 5 scanner total in their files)
- 110 matching + scanner tests pass total
- 2584 full suite passes (+25 from baseline 2559)
- Ruff clean
- Reporter's exact case (Okayracer in `Okayracer, aldrch &
poptropicaslutz!`) now scores 100% match → no Wrong Song flag
|
3 weeks ago | |
|---|---|---|
| .. | ||
| __init__.py | Add pure artist-name comparison helper with alias awareness | 3 weeks ago |
| artist_aliases.py | AcoustID scanner: handle multi-value artist credits | 3 weeks ago |