Sokhi (continued from #806): volume-numbered series ('B小町 …キャラクター
ソングCD Vol.2' / 'Vol.2.5' / 'Vol.4' / 'Vol.4.5') got each other's art from
both normal downloads and the retag tool. Two distinct holes, one principle:
1. The art picker's _album_matches validates by significant-token SUBSET,
built to tolerate '(Deluxe)'/'- Remastered' suffixes. The normalizer strips
CJK entirely, so Vol.4 → {b,tv,cd,vol,4}, a clean subset of
Vol.4.5's {b,tv,cd,vol,4,5}: the wrong volume validated as "the same
album with a suffix". This affected every fuzzy art source (iTunes, Deezer,
AudioDB, Spotify) in downloads, retag, and the missing-art repair.
2. MusicBrainz match_release scores by string similarity. Vol.4 vs Vol.4.5
scores 0.973, so the wrong volume could win the match outright, and its MBID
then feeds Cover Art Archive with NO downstream validation (CAA is
MBID-keyed, trusted by design). With Sokhi's MB metadata source this is
the likely path in his logs (his release-group 404s trigger re-matching).
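Both holes can be reproduced in a few lines. This is a minimal sketch, not
the project's code: tokens, subset_match, and the placeholder titles are
hypothetical, the normalizer is assumed to keep only ASCII letters and
digits, and difflib stands in for whatever similarity metric match_release
actually uses (so the ratio printed here will not equal 0.973).

```python
import re
from difflib import SequenceMatcher


def tokens(title: str) -> set[str]:
    """Hypothetical normalizer: lowercase, keep ASCII alphanumeric runs only."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def subset_match(wanted: str, candidate: str) -> bool:
    """Suffix-tolerant check: one title's tokens are contained in the other's."""
    a, b = tokens(wanted), tokens(candidate)
    return a <= b or b <= a


# Placeholder titles for illustration, not the reported ones.
VOL_4 = "サンプル TV CD Vol.4"
VOL_4_5 = "サンプル TV CD Vol.4.5"

# Hole 1: the CJK text vanishes and Vol.4's tokens become a subset of Vol.4.5's.
print(sorted(tokens(VOL_4)))
print(sorted(tokens(VOL_4_5)))
print(subset_match(VOL_4, VOL_4_5))

# Hole 2: the titles differ by two characters, so string similarity stays high.
print(SequenceMatcher(None, VOL_4, VOL_4_5).ratio())
```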
The shared rule (core.text.title_match.numeric_tokens_differ): digit-bearing
tokens must be IDENTICAL between the two titles. A number on one side only
(volume, part, sequel, remaster year) marks a different release, never a
suffix. '1989' vs '1989 (Deluxe)' still matches (digits shared); 'Album' vs
'Album 2' now rejects (sequels!). The art picker rejects outright and falls
through to the next source or the download's own art, which is the designed
cost of a false reject. The MB matcher halves the candidate's confidence,
landing it below the 70 gate while the exact-volume result is untouched.
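The rule and its two consumers can be sketched as follows. This is an
illustration under stated assumptions, not the shipped helper: only the name
numeric_tokens_differ and the 70 gate come from this change; the
tokenization, art_candidate_ok, penalized_confidence, and the 0-100
confidence scale are hypothetical.

```python
import re

CONFIDENCE_GATE = 70  # gate value from the description; a 0-100 scale is assumed


def _numeric_tokens(title: str) -> list[str]:
    """Sorted digit-bearing tokens of a title (hypothetical tokenization)."""
    found = re.findall(r"[a-z0-9]+", title.lower())
    return sorted(t for t in found if any(c.isdigit() for c in t))


def numeric_tokens_differ(a: str, b: str) -> bool:
    """True when the digit-bearing tokens of the two titles are not identical."""
    return _numeric_tokens(a) != _numeric_tokens(b)


def art_candidate_ok(wanted: str, candidate: str) -> bool:
    """Art picker side: a numeric mismatch rejects the candidate outright.

    In this sketch it is the only check; the real picker would also run
    its existing subset validation.
    """
    return not numeric_tokens_differ(wanted, candidate)


def penalized_confidence(confidence: float, wanted: str, candidate: str) -> float:
    """MB matcher side: halve the confidence on a numeric mismatch."""
    if numeric_tokens_differ(wanted, candidate):
        return confidence / 2
    return confidence


print(numeric_tokens_differ("1989", "1989 (Deluxe)"))
print(numeric_tokens_differ("Album", "Album 2"))
print(numeric_tokens_differ("CD Vol.4", "CD Vol.4.5"))
print(art_candidate_ok("CD Vol.4", "CD Vol.4.5"))
print(penalized_confidence(97.3, "CD Vol.4", "CD Vol.4.5") < CONFIDENCE_GATE)
```

Under the assumed 0-100 scale a halved confidence can never exceed 50, so
any candidate with a numeric mismatch lands below the 70 gate regardless of
its raw score.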
Tests: helper truth table, the exact reported pairs through _album_matches,
and match_release end-to-end (wrong volume alone → no match, which beats a
wrong MBID; exact volume beats a near-identical wrong one despite a lower MB
score).
828 matching/metadata + 301 musicbrainz/retag/artwork tests pass.