Issue #607 (AfonsoG6) -- two AcoustID problems:
1. Live recordings were falsely quarantined as "Version mismatch: expected
'... (Live at Venue)' (live) but file is '...' (original)" because
MusicBrainz often stores the recording entity with a bare title --
the venue / live annotation lives on the release entity, not the
recording. The audio fingerprint correctly identifies the live
recording, but the title-text comparison flagged it as wrong.
New pure helper `core/matching/version_mismatch.py:is_acceptable_version_mismatch`
accepts the mismatch only when:
- One-sided AND involves 'live': exactly one side is 'live' and
the other is bare 'original'. Two-sided mismatches stay strict.
- Fingerprint score >= 0.85 (stricter than the existing 0.80
minimum -- escape valve only fires when AcoustID is more
confident than its own threshold).
- Bare title similarity >= 0.70.
- Artist similarity >= 0.60.
Other version markers (instrumental, remix, acoustic, demo, etc.)
stay strict -- those have distinct fingerprints AND MB always
annotates them in the recording title. The existing
test_acoustid_version_mismatch.py suite passes unchanged.
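A minimal sketch of the acceptance rule, assuming the caller has already computed the two version labels and the three scores. The signature and keyword names are illustrative, not copied from `core/matching/version_mismatch.py`, and treating equal versions as trivially acceptable is an assumption based on the test description below.

```python
def is_acceptable_version_mismatch(
    expected_version: str,
    matched_version: str,
    fingerprint_score: float,
    title_sim: float,
    artist_sim: float,
    *,
    min_fingerprint: float = 0.85,
    min_title_sim: float = 0.70,
    min_artist_sim: float = 0.60,
) -> bool:
    """Return True when a version mismatch should NOT quarantine the file (sketch)."""
    if expected_version == matched_version:
        # Assumption: equal versions are not a mismatch, so trivially acceptable.
        return True
    # Only the one-sided live <-> original pair gets the escape valve;
    # two-sided mismatches and every non-live marker stay strict.
    if {expected_version, matched_version} != {"live", "original"}:
        return False
    return (
        fingerprint_score >= min_fingerprint
        and title_sim >= min_title_sim
        and artist_sim >= min_artist_sim
    )
```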
2. Audio-mismatch failure message reported "identified as '' by ''
(artist=100%)" when AcoustID returned multiple recordings -- prior
code mixed `recordings[0]`'s strings (which can be empty) with
`best_rec`'s scores. Now uses `matched_title` / `matched_artist`
consistently in both the high-confidence-skip path and the final
fail message.
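The fix amounts to reading the displayed title and artist from the same recording that produced the scores. A rough sketch, assuming AcoustID-style recording dicts (`title`, plus an `artists` list of `{"name": ...}` objects); the helper names and the exact message wording are hypothetical.

```python
from typing import Any, Dict, Tuple


def matched_identity(rec: Dict[str, Any]) -> Tuple[str, str]:
    """Title and joined artist names taken from ONE recording."""
    title = rec.get("title") or ""
    names = [a.get("name", "") for a in rec.get("artists") or [] if a.get("name")]
    return title, ", ".join(names)


def mismatch_message(best_rec: Dict[str, Any], title_sim: float, artist_sim: float) -> str:
    # Strings and scores both come from best_rec, so a blank recordings[0]
    # can no longer produce "identified as '' by ''" next to best_rec's scores.
    matched_title, matched_artist = matched_identity(best_rec)
    return (
        f"Audio mismatch: identified as '{matched_title}' by '{matched_artist}' "
        f"(title={title_sim:.0%}, artist={artist_sim:.0%})"
    )
```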
Issue #608 (AfonsoG6) -- quarantine modal:
3. Approve / Delete buttons silently did nothing when the filename
contained an apostrophe -- the unescaped quote broke the inline JS
in the onclick handler. Now wraps the id via
`escapeHtml(JSON.stringify(id))`, which round-trips quotes /
backslashes / unicode / newlines safely across the boundary
between the HTML attribute and the JS string literal.
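The same escaping chain can be reproduced in Python to check the round trip: `json.dumps` plays the role of `JSON.stringify`, `html.escape` plays `escapeHtml`, and `browser_decode_arg` roughly imitates what the browser does when it reads the attribute and evaluates the argument. This is a test-harness sketch with hypothetical helper names, not the modal's code.

```python
import html
import json


def onclick_attr(handler: str, entry_id: str) -> str:
    """Build an onclick attribute equivalent to escapeHtml(JSON.stringify(id))."""
    # Quoted string literal: escapes " \ and control chars; non-ASCII becomes \uXXXX.
    js_literal = json.dumps(entry_id)
    # Escapes & < > " ' so nothing can terminate the attribute early.
    return f'onclick="{handler}({html.escape(js_literal, quote=True)})"'


def browser_decode_arg(attr: str, handler: str) -> str:
    """Rough imitation of the browser: unescape the attribute, then read the JS literal."""
    value = attr[len('onclick="'):-1]          # attribute value between its delimiters
    js_source = html.unescape(value)            # &quot; -> ", &#x27; -> ', &amp; -> &
    literal = js_source[len(handler) + 1:-1]    # strip "handler(" and ")"
    return json.loads(literal)                  # ASCII-only JSON string == valid JS string
```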
4. Bonus UX: quarantine entry expanded view now shows source uploader
(username) and original Soulseek filename when the sidecar carries
that context -- helps trace which uploader the bad file came from.
Backend exposes `source_username` + `source_filename` fields from
`sidecar.context.original_search_result`. Degrades to '' on legacy
thin sidecars.
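A sketch of the degrade-to-empty lookup, assuming the search result stores the peer under `username` and the remote path under `filename` (both key names are assumptions; only `context.original_search_result` comes from the description above).

```python
from typing import Any, Dict


def quarantine_source_fields(sidecar: Dict[str, Any]) -> Dict[str, str]:
    """Expose uploader context for the modal; '' whenever the sidecar lacks it."""
    context = sidecar.get("context") or {}
    result = context.get("original_search_result") or {}
    if not isinstance(result, dict):
        result = {}  # defensive: unexpected shapes degrade like legacy thin sidecars
    return {
        "source_username": str(result.get("username") or ""),
        "source_filename": str(result.get("filename") or ""),
    }
```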
Tests:
- 23 new boundary tests in tests/matching/test_version_mismatch.py
pin every shape: equal versions trivial, one-sided live both
directions, threshold floors (each just below default -> reject),
two-sided strict, non-live one-sided strict (covers exact
test_instrumental_returned_for_vocal_request_fails scenario),
custom-threshold overrides.
- 4 existing test_acoustid_version_mismatch.py tests pass unchanged.
- 507 AcoustID / matching / imports tests pass.
Cin pre-review pass on the false-positive risk. Three tightenings:
# 1. Bumped MB-search trust threshold from 0.6 → 0.85
`MusicBrainzService.lookup_artist_aliases` previously trusted any
MB search match scoring ≥ 0.6 combined (name-similarity + MB
relevance). For distinctive cross-script artists the user-reported
case targets (Hiroyuki Sawano, Сергей Лазарев, etc.) real matches
score ~1.0 — well above 0.85. The 0.6 floor was loose enough to
let in moderate matches for ambiguous names, risking aliases for
the wrong artist getting cached + applied.
Bumped to 0.85. Tighter without rejecting any of the legit
cross-script cases the PR is for.
# 2. Ambiguity gate — skip when results within 0.1 of best
When MB search returns multiple results all scoring high (within
0.1 of the best), the artist name is ambiguous — common name with
multiple distinct artists ("John Smith" returning 10 different
John Smiths). Pulling aliases for any one of them risks the wrong
artist's data bridging incorrectly to a file's tag.
Added explicit ambiguity detection: when 2+ results within 0.1,
skip alias lookup entirely + cache empty. Matches Cin's
"explicit > implicit" — the prior code just picked the highest
score blindly.
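Taken together, the threshold bump and the ambiguity gate reduce to a small selection rule. Sketch only: it assumes the combined score per MB search result is already computed, that the ambiguity window is measured across all returned results, and `pick_trusted_artist` is a hypothetical name.

```python
from typing import List, Optional, Tuple

TRUST_THRESHOLD = 0.85   # previously 0.6
AMBIGUITY_WINDOW = 0.1


def pick_trusted_artist(scored: List[Tuple[str, float]]) -> Optional[str]:
    """Return the MB artist id to pull aliases from, or None to skip (and cache empty)."""
    if not scored:
        return None
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    best_id, best_score = ranked[0]
    if best_score < TRUST_THRESHOLD:
        return None  # moderate match: not trusted enough to import aliases
    contenders = [score for _, score in ranked if best_score - score <= AMBIGUITY_WINDOW]
    if len(contenders) >= 2:
        return None  # ambiguous name, e.g. several distinct "John Smith" entities
    return best_id
```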
# 3. Diagnostic log when alias rescues a comparison
When the alias path triggers a PASS that direct similarity would
have FAILed, emit an INFO log: `Artist alias rescued comparison:
expected='X' vs actual='Y' (direct sim=0.00, alias 'Z' →
score=1.00)`.
Lets future bug reports trace which alias triggered which decision.
Doesn't change behavior — visibility only. Logs ONLY the rescue
case, not happy-path direct matches (no log spam).
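A sketch of the rescue-only logging, with `difflib.SequenceMatcher` standing in for the verifier's `_similarity` normaliser and `ARTIST_MATCH_THRESHOLD` assumed to be 0.60. The log text mirrors the format quoted above; the logger name is `soulsync.acoustid.verification`.

```python
import logging
from difflib import SequenceMatcher
from typing import Iterable, Optional

logger = logging.getLogger("soulsync.acoustid.verification")
ARTIST_MATCH_THRESHOLD = 0.60  # assumed value for illustration


def _similarity(a: str, b: str) -> float:
    # Stand-in for the verifier's normaliser.
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def alias_aware_artist_sim(expected: str, actual: str, aliases: Iterable[str]) -> float:
    direct = _similarity(expected, actual)
    if direct >= ARTIST_MATCH_THRESHOLD:
        return direct  # happy path: no log
    best_alias: Optional[str] = None
    best_score = direct
    for alias in aliases:
        score = _similarity(alias, actual)
        if score > best_score:
            best_alias, best_score = alias, score
    # Log only when an alias turns a FAIL into a PASS.
    if best_alias is not None and best_score >= ARTIST_MATCH_THRESHOLD:
        logger.info(
            "Artist alias rescued comparison: expected=%r vs actual=%r "
            "(direct sim=%.2f, alias %r → score=%.2f)",
            expected, actual, direct, best_alias, best_score,
        )
    return best_score
```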
# Tests added (6)
`test_artist_alias_service.py` (+3):
- `test_moderate_confidence_match_now_skipped_strict_threshold`
- `test_ambiguous_results_skipped`
- `test_unambiguous_high_confidence_match_succeeds`
`test_acoustid_verification_aliases.py` (+3):
- `test_alias_rescue_emits_info_log` — direct-fail + alias-pass
emits INFO log
- `test_no_log_when_direct_match_succeeds` — happy path quiet
- `test_no_log_when_alias_doesnt_help` — failed path also quiet
# Test infrastructure note
Logging tests use a directly-attached `ListHandler` on
`soulsync.acoustid.verification` (the actual logger name —
dot-separated by `get_logger`), NOT pytest's caplog. Same pattern
as the prior watchdog-test fix — caplog is intermittently flaky
in full-suite runs for soulsync namespace loggers. An owned
handler sidesteps both issues.
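A minimal version of the owned-handler pattern using only the standard `logging` module. `ListHandler` matches the name used above, but its body and the `captured_logger` context manager are illustrative assumptions. Raising the logger's level inside the block matters: if the effective level is still WARNING, an INFO record is dropped before any handler sees it.

```python
import logging
from contextlib import contextmanager
from typing import Iterator, List


class ListHandler(logging.Handler):
    """Stores every record it receives so tests can assert on them directly."""

    def __init__(self) -> None:
        super().__init__(level=logging.DEBUG)
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


@contextmanager
def captured_logger(name: str) -> Iterator[ListHandler]:
    """Attach a ListHandler to the exact named logger for the duration of the block."""
    logger = logging.getLogger(name)
    handler = ListHandler()
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)  # otherwise INFO is filtered out before handlers run
    try:
        yield handler
    finally:
        logger.removeHandler(handler)
        logger.setLevel(previous_level)
```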
# Verification
- 85/85 matching tests pass (+6 from prior commit)
- 2543 full suite passes (+6 from prior, +85 PR-total)
- Ruff clean
- Reporter's Japanese + Russian regression tests still pass —
legit cross-script case (sim ≈ 1.0) clears the new 0.85
threshold easily
Two perf gaps that would have failed Cin's review:
# Gap #1: alias lookup fired unconditionally
Before this commit, `_resolve_expected_artist_aliases` ran at
the top of every `verify_audio_file` call regardless of whether
the direct artist match would have passed. For users whose library
is mostly same-script (95% of cases), every successful verification
was paying for a wasted DB query (and possibly a wasted MB API
call for un-enriched artists).
Restructured the helper to accept a callable provider instead of a
pre-resolved list. Provider invoked LAZILY only when direct
similarity falls below `ARTIST_MATCH_THRESHOLD`. Verifier passes a
memoising thunk that resolves once across the 3 comparison sites
within one verification.
`_alias_aware_artist_sim` now accepts `aliases` as either:
- iterable of strings (used eagerly — backward compat with tests
that already know the aliases)
- callable returning the iterable (resolved on first need within
a verification)
Happy path (direct match passes): zero DB queries, zero MB calls.
Cross-script case: one resolution shared across 3 sites — same as
the prior contract.
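A sketch of the dual-shape API and the memoising thunk. `SequenceMatcher` stands in for the verifier's `_similarity`, the threshold value is assumed, and `memoised_provider` is a hypothetical name for whatever the verifier uses to resolve aliases once per `verify_audio_file` call.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Union

ARTIST_MATCH_THRESHOLD = 0.60  # assumed value for illustration
AliasSource = Union[Iterable[str], Callable[[], Iterable[str]]]


def _similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def memoised_provider(resolve: Callable[[], Iterable[str]]) -> Callable[[], List[str]]:
    """Wrap an expensive alias lookup so it runs at most once per verification."""
    cache: List[str] = []
    resolved = False

    def thunk() -> List[str]:
        nonlocal resolved
        if not resolved:
            cache.extend(resolve())
            resolved = True
        return cache

    return thunk


def alias_aware_artist_sim(expected: str, actual: str, aliases: AliasSource) -> float:
    direct = _similarity(expected, actual)
    if direct >= ARTIST_MATCH_THRESHOLD:
        return direct  # happy path: the provider is never invoked
    names = aliases() if callable(aliases) else aliases  # callable = lazy, iterable = eager
    return max([direct, *(_similarity(alias, actual) for alias in names)])
```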
# Gap #2: existing-MBID artists never got alias backfill
Worker's `_process_item` artist branch had an `existing_id` short-
circuit (line 296) that updated MBID status but skipped alias
fetch. Result: every user with an already-enriched library had
MBIDs but NULL aliases on day-one of this PR. Live MB lookup at
verify-time covered them, but at the cost of N live calls for N
artists across the library.
Added one-time backfill: when existing-MBID is found AND
`artists.aliases` for that row is empty, fetch + persist aliases.
Subsequent re-scan cycles short-circuit on the populated column —
no repeated MB calls.
New helper `_artist_aliases_empty(artist_id)` does the cheap NULL
check via direct SQL. Best-effort: defensively returns True on
errors so backfill happens (a redundant MB call is cheaper than
missing the backfill entirely).
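A sketch of the empty-column check and one-time backfill against SQLite, assuming aliases are stored as JSON text in `artists.aliases` and that `fetch_aliases` wraps the MB call. Both helper signatures are illustrative; only the table and column names come from the description above.

```python
import json
import sqlite3
from typing import Callable, List


def artist_aliases_empty(conn: sqlite3.Connection, artist_id: int) -> bool:
    try:
        row = conn.execute("SELECT aliases FROM artists WHERE id = ?", (artist_id,)).fetchone()
    except sqlite3.Error:
        return True  # best-effort: a redundant MB call beats a missed backfill
    return row is None or row[0] in (None, "", "[]")


def backfill_aliases_if_empty(
    conn: sqlite3.Connection,
    artist_id: int,
    mbid: str,
    fetch_aliases: Callable[[str], List[str]],
) -> bool:
    """Fetch + persist aliases once; returns True only when a backfill was written."""
    if not artist_aliases_empty(conn, artist_id):
        return False  # already populated: re-scans short-circuit, no MB call
    try:
        aliases = fetch_aliases(mbid)
    except Exception:
        return False  # backfill failure must not break the MBID match itself
    conn.execute(
        "UPDATE artists SET aliases = ? WHERE id = ?",
        (json.dumps(aliases, ensure_ascii=False), artist_id),
    )
    conn.commit()
    return True
```

In this sketch an artist with genuinely no aliases is stored as `[]`, which still reads as empty, so it would be re-fetched on each scan unless the real code records that case differently.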
# Tests added (9)
`test_acoustid_verification_aliases.py` (+6):
- `TestLazyAliasResolution` (3): no lookup when direct match passes,
lookup fires only when direct fails, lookup memoised across the
3 sites within one verification.
- `TestAliasProviderCallable` (3): iterable passed directly,
callable resolves lazily, callable returning empty falls back to
direct sim.
`test_artist_alias_service.py` (+3):
- `test_existing_mbid_path_backfills_aliases_when_column_empty`
- `test_existing_mbid_path_skips_backfill_when_aliases_already_set`
- `test_existing_mbid_backfill_failure_does_not_break_match`
# Verification
- 79/79 matching tests pass (+9 from prior commit)
- 2537 full suite passes (+9, +79 PR-total)
- Ruff clean
- Backward compat: every prior-commit test still passes (the
iterable-shape API still works alongside the new callable shape)
This is the user-visible commit. The reporter's exact two cases
(Japanese kanji, Russian Cyrillic) now pass verification instead of
being quarantined.
# What changed
Verifier's three artist-similarity sites now route through the
shared `core.matching.artist_aliases.artist_names_match` helper
instead of raw `_similarity`:
- `_find_best_title_artist_match` (per-recording scoring at the
best-match stage)
- Secondary scan when title matches but best-match's artist doesn't
(line ~355 pre-fix)
- Final fallback scan over all recordings (line ~403 pre-fix)
Aliases for the expected artist are resolved ONCE at the top of
`verify_audio_file` via `_resolve_expected_artist_aliases`, which
calls the new `MusicBrainzService.lookup_artist_aliases` chain
(library DB → cache → live MB). Single resolution per verification
regardless of how many AcoustID recordings come back — pinned by
test.
New helper `_alias_aware_artist_sim(expected, actual, aliases)`
wraps the pure helper with the verifier's normaliser
(`_similarity`) and threshold (`ARTIST_MATCH_THRESHOLD`). Returns a
single float so existing threshold-comparison code paths keep their
shape — minimal diff.
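The library DB → cache → live MB chain can be sketched with injected lookups. The function shape, caching empty live results, and not caching failures (so a transient MB outage can be retried) are assumptions for illustration, not the signature of `MusicBrainzService.lookup_artist_aliases`.

```python
from typing import Callable, Dict, List, Optional


def lookup_artist_aliases(
    name: str,
    library_lookup: Callable[[str], Optional[List[str]]],
    cache: Dict[str, List[str]],
    live_lookup: Callable[[str], List[str]],
) -> List[str]:
    aliases = library_lookup(name)
    if aliases:
        return aliases  # enriched library row: no network at all
    if name in cache:
        return cache[name]  # includes cached empty results
    try:
        aliases = live_lookup(name)
    except Exception:
        return []  # MB failure == "no aliases available"; not cached
    cache[name] = aliases
    return aliases
```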
# Reporter's cases — verified
Case 1 (issue #442 verbatim):
File: YAMANAIAME by 澤野弘之
Expected: YAMANAIAME by Hiroyuki Sawano
Pre-fix: Quarantined (artist=0%)
Post-fix: PASS (alias '澤野弘之' resolved from MB)
Case 2 (issue #442 verbatim):
File: On the Other Side by Sergey Lazarev
Expected: On the other side by Сергей Лазарев
Pre-fix: Quarantined (artist=7%)
Post-fix: PASS (alias 'Sergey Lazarev' resolved from MB)
Both reproduced as regression tests with stubbed MB service.
# Backward compat
Three test cases pin that no-aliases / failure paths preserve
pre-fix behaviour exactly:
- Clear artist mismatch (different artist, same script) still
FAILs — aliases bridge synonyms, not unrelated artists.
- Exact title + artist match still PASSes regardless of aliases.
- MB service raise → verifier completes with direct similarity
(treats failure as "no aliases available" — same as pre-fix).
Also covers manual import: the import-modal "Search for Match"
flow goes through the same verifier, so the reporter's complaint
that "manual import simply throws them back in quarantine again"
is fixed by the same change.
# Tests added (11)
`tests/matching/test_acoustid_verification_aliases.py`:
- `_alias_aware_artist_sim`: alias bridges score ↑, no-aliases
falls back, aliases don't mask genuine mismatches
- `_find_best_title_artist_match` accepts + uses aliases
- Reporter's case 1 (Japanese) end-to-end
- Reporter's case 2 (Russian) end-to-end
- Backward compat: no-aliases mismatch still fails, exact match
still passes, MB-service-raise doesn't break verification
- Performance: alias lookup fires ONCE per verification regardless
of recording count
# Verification
- 11 new verifier tests pass
- 31 prior service tests pass
- 28 prior helper tests pass
- 294 matching + imports tests pass total (no regression)
- Ruff clean
Discord report (corruption [BWC]): downloads coming through as the
instrumental cut when a vocal track was requested. The verification
step's `_normalize` function strips parentheticals and version-suffix
tags ("(Instrumental)", "- Live", etc.) so legitimate name variations
don't false-fail the title-similarity check. That also means "In My
Feelings" and "In My Feelings (Instrumental)" both normalize to "in
my feelings", title similarity is 1.0, and the wrong cut passes
verification.
Detect the version label on each side BEFORE normalization runs. If
the expected and matched recordings disagree on version (one is
original, the other is instrumental / live / acoustic / remix /
etc), return FAIL — the fingerprint identified a real song, just
not the version the caller asked for.
Reuses `MusicMatchingEngine.detect_version_type` so the same regex
patterns the pre-download Soulseek matcher applies also drive
post-download verification. No duplicated tables.
Also gates the secondary fallback scan, so a wrong-version variant
sitting in the same fingerprint cluster can't win the loop after
the best match has already been version-rejected.
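A sketch of why the order matters, with simplified, hypothetical regexes standing in for `MusicMatchingEngine.detect_version_type` and a rough approximation of `_normalize`: both titles collapse to the same string, so the version check has to run on the raw titles. The `live` pattern here requires a preceding `(`, `[` or `-` so that a title like "Live Forever" is not misread; the real patterns may differ.

```python
import re
from typing import Iterable, Optional

# Simplified, hypothetical patterns (the real table lives in MusicMatchingEngine).
_VERSION_PATTERNS = [
    ("instrumental", re.compile(r"\binstrumental\b", re.IGNORECASE)),
    ("acoustic", re.compile(r"\bacoustic\b", re.IGNORECASE)),
    ("remix", re.compile(r"\bremix\b", re.IGNORECASE)),
    ("live", re.compile(r"[(\[-]\s*live\b", re.IGNORECASE)),
]


def detect_version_type(title: str) -> str:
    for label, pattern in _VERSION_PATTERNS:
        if pattern.search(title):
            return label
    return "original"


def normalize(title: str) -> str:
    # Approximation: drop (...) / [...] groups and " - suffix" tags, lowercase.
    title = re.sub(r"\s*[(\[][^)\]]*[)\]]", "", title)
    title = re.sub(r"\s+-\s+.*$", "", title)
    return " ".join(title.lower().split())


def versions_agree(expected: str, matched: str) -> bool:
    # Must run on RAW titles: after normalize() the version marker is gone.
    return detect_version_type(expected) == detect_version_type(matched)


def secondary_scan(expected: str, candidates: Iterable[str]) -> Optional[str]:
    for title in candidates:
        if not versions_agree(expected, title):
            continue  # a wrong-version variant cannot win the fallback loop
        if normalize(title) == normalize(expected):
            return title
    return None
```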
6 tests pin the behavior:
- instrumental returned for vocal request → FAIL
- vocal returned for instrumental request → FAIL
- live vs acoustic → FAIL
- matching versions on both sides → PASS
- original-to-original happy path → PASS (regression guard)
- secondary scan skips wrong-version recordings → not PASS
2194/2194 full suite green (was 2188 + 6 new).
A user downloading Kendrick Lamar's Mr. Morale caught three tracks (Rich
Interlude, Savior Interlude, Savior) that showed ✅ Completed in the modal
but were missing on disk. Log forensics revealed two layered bugs.
Bug 1 — Verification wrapper assumed success on quarantined files
(`core/imports/pipeline.py`):
The outer `post_process_matched_download_with_verification` had a
fallback at the "no `_final_processed_path` in context" branch that
marked the task completed and notified `success=True`. The inner
post-processor sets `_final_processed_path` only when the file
actually reaches its destination. Integrity-rejected files
(`_integrity_failure_msg` set) and race-guard-failed files
(`_race_guard_failed` set) get quarantined or skipped without ever
setting `_final_processed_path`, so they fell straight into the
"assume success" branch.
Confirmed in user's log:
No _final_processed_path in context for task d5b88b84-... —
cannot verify, assuming success
That line fired for the same task right after the integrity check
quarantined the source file. Result: ✅ Completed in UI, file in
quarantine, never delivered.
Fix: explicit checks for `_integrity_failure_msg` and
`_race_guard_failed` markers BEFORE the assume-success fallback.
Either marker set → task status='failed' with descriptive
error_message + `_notify_download_completed(success=False)`. The
pre-existing assume-success behavior preserved when no failure
markers are set (some legitimate flows complete without setting
`_final_processed_path`).
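The ordering of the checks is the whole fix. A sketch, assuming the wrapper reads plain dict keys from the context; the function name and error strings are hypothetical, and the real wrapper's file verification on the `_final_processed_path` branch is omitted.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_task_outcome(context: Dict[str, Any]) -> Tuple[str, Optional[str]]:
    """Return (status, error_message) for the download task."""
    integrity_msg = context.get("_integrity_failure_msg")
    if integrity_msg:
        return "failed", f"Integrity check failed: {integrity_msg}"
    if context.get("_race_guard_failed"):
        return "failed", "File skipped by race guard"
    if not context.get("_final_processed_path"):
        # Legacy assume-success fallback, now reachable only with no failure markers.
        return "completed", None
    # Real wrapper verifies the delivered file here (omitted in this sketch).
    return "completed", None
```

The caller would then set the task status and notify with `success` equal to `status == "completed"`.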
Bug 2 — AcoustID skip-logic too lenient
(`core/acoustid_verification.py`):
The "language/script" exemption was:
if best_score >= 0.95 and (title_sim >= 0.55 or
artist_sim >= ARTIST_MATCH_THRESHOLD):
The OR-clause fired for English-vs-English titles by the same artist
that share NO actual content. Confirmed in user's log: requested
"Rich (Interlude)" by Kendrick Lamar, AcoustID identified the audio
as "R.O.T.C. (interlude)" by Kendrick Lamar (a totally different
song from his 2010 mixtape) — same artist scored ≥ARTIST threshold,
shared word "interlude" pushed title_sim above 0.55, skip fired.
Verification returned SKIP instead of FAIL, so the wrong file was
accepted as the answer for three different track requests.
Fix: skip now requires positive evidence the mismatch is a real
language/script case:
(a) Non-ASCII chars present in either title AND artist matches strongly
→ real transliteration case (kanji ↔ romaji etc)
(b) BOTH title_sim >= 0.80 AND artist_sim >= ARTIST threshold
→ minor punctuation/casing differences
English-vs-English with very different titles by the same artist no
longer skipped — verification correctly returns FAIL, the wrong file
gets quarantined, the new wrapper logic above marks the task failed.
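Old and new exemption predicates side by side, as a sketch: detecting "non-ASCII chars present" via code points above 127 is an assumption about how the check is done, `ARTIST_MATCH_THRESHOLD` is assumed to be 0.60, and the non-Latin title in the test is a placeholder.

```python
ARTIST_MATCH_THRESHOLD = 0.60  # assumed value
HIGH_FINGERPRINT = 0.95


def _has_non_ascii(text: str) -> bool:
    return any(ord(ch) > 127 for ch in text)


def old_should_skip(best_score: float, title_sim: float, artist_sim: float) -> bool:
    # Pre-fix rule: the OR let same-artist, different-song pairs through.
    return best_score >= HIGH_FINGERPRINT and (
        title_sim >= 0.55 or artist_sim >= ARTIST_MATCH_THRESHOLD
    )


def new_should_skip(
    best_score: float,
    expected_title: str,
    matched_title: str,
    title_sim: float,
    artist_sim: float,
) -> bool:
    if best_score < HIGH_FINGERPRINT:
        return False
    artist_ok = artist_sim >= ARTIST_MATCH_THRESHOLD
    # (a) transliteration case: non-ASCII title on either side plus a strong artist match
    script_case = (_has_non_ascii(expected_title) or _has_non_ascii(matched_title)) and artist_ok
    # (b) near-identical titles: punctuation / casing differences only
    near_identical = title_sim >= 0.80 and artist_ok
    return script_case or near_identical
```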
Tests:
- `tests/test_integrity_failure_marks_task_failed.py` — 4 cases
pinning the wrapper-level state machine: integrity marker → failed,
race-guard marker → failed, no markers → still assumes success
(legacy path preserved), integrity-failure-takes-priority over
missing-final-path fallback.
- `tests/test_acoustid_skip_logic.py` — 7 cases pinning the skip
exemption: user's R.O.T.C-vs-Rich case → FAIL (regression test),
Savior-vs-R.O.T.C → FAIL (same bug surface), Japanese kanji →
romaji → SKIP (real language case still works), MAAD vs M.A.A.D →
PASS or SKIP (punctuation tolerance), low fingerprint score →
never skipped, high score but artist mismatch → no longer skipped,
Crown vs Crown of Thorns → no longer skipped.
Verified: full suite 1793 pass (11 new), ruff clean.
WHATS_NEW entry under '2.4.2' dev cycle.
User reported searching "Maduk - Leave A Light On" on Tidal silently
downloaded Tom Walker's completely different song of the same name, then
embedded Maduk's metadata into Tom Walker's audio. Three layers of
defense all failed permissively. Two of them are fixed here; the third
(score formula weights) was left alone since these two together cover it.
Layer 1 fix — candidate artist gate (web_server.py:27782)
Old: `if _best_artist < 0.4 and confidence < 0.85: continue`
New: `if _best_artist < 0.5 and confidence < 0.85: continue`
SequenceMatcher returns exactly 0.400 for "maduk" vs "tom walker"
(5-char vs 10-char strings with coincidental char matches), which
slipped past the strict `< 0.4` check. The word-boundary containment
check earlier in the function already short-circuits legitimate
formatting variations to sim=1.0, so falling to SequenceMatcher means
strings are genuinely different. 0.5 closes the fencepost AND gives
a small safety buffer.
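A sketch of the gate with the fencepost made visible. The word-boundary containment check is an assumed implementation of the one described above. The `SequenceMatcher` score of exactly 0.4 for "maduk" vs "tom walker" follows from 3 single-character matches over 15 total characters (2 * 3 / 15).

```python
import re
from difflib import SequenceMatcher


def artist_similarity(expected: str, candidate: str) -> float:
    e, c = expected.casefold().strip(), candidate.casefold().strip()
    # Word-boundary containment: formatting variants short-circuit to 1.0.
    if e and c and (re.search(rf"\b{re.escape(e)}\b", c) or re.search(rf"\b{re.escape(c)}\b", e)):
        return 1.0
    return SequenceMatcher(None, e, c).ratio()


def rejects_candidate(best_artist: float, confidence: float, floor: float = 0.5) -> bool:
    # The old floor was 0.4: a score of exactly 0.400 slipped past `< 0.4`.
    return best_artist < floor and confidence < 0.85
```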
Layer 3 fix — AcoustID verification (acoustid_verification.py:316)
When title matches but artist doesn't AND expected artist isn't found
anywhere in AcoustID's returned recordings:
Old: always SKIP (let file through, assume cover/collab)
New: FAIL if artist_sim < 0.3 (clear mismatch)
SKIP if artist_sim >= 0.3 (ambiguous — cover/collab/formatting)
The 0.3 cutoff catches hard mismatches like Maduk/Tom Walker (sim ~0.2)
while preserving benefit-of-the-doubt for borderline artist formatting
differences. Legitimate covers and collabs where the expected artist
appears anywhere in AcoustID's recordings still PASS via the existing
secondary-match loop above.
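The new branch as a decision table, sketched with a hypothetical function name; `expected_artist_in_recordings` stands for the outcome of the secondary-match loop described above.

```python
def title_match_artist_mismatch_verdict(
    artist_sim: float,
    expected_artist_in_recordings: bool,
    hard_floor: float = 0.3,
) -> str:
    if expected_artist_in_recordings:
        return "PASS"  # cover / collab credited elsewhere in AcoustID's results
    if artist_sim < hard_floor:
        return "FAIL"  # clear mismatch, e.g. Maduk vs Tom Walker
    return "SKIP"      # ambiguous: benefit of the doubt (previously the only outcome)
```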
Both fixes are defense-in-depth — either alone would have caught this
bug. Together they close the pre-download AND post-download gaps.
All 292 tests pass. Version bumped to 2.39 with changelog entries.
The high-confidence fingerprint skip (≥0.95) assumed title mismatches
were language/script differences and bypassed verification. But a high
fingerprint score just means AcoustID identified the audio confidently —
not that it matches the requested track. Now requires partial title
(≥0.55) or artist (≥0.60) similarity before skipping, so completely
wrong files (e.g. different song/artist from same remix producer) are
correctly rejected.
When the fingerprint score is >=0.95 but title/artist don't match
(e.g. English expected vs Japanese returned), SKIP instead of FAIL.
A 95%+ fingerprint means the audio IS the correct recording — the
metadata mismatch is just a language/script difference, not a wrong
file. Prevents Japanese, Chinese, Korean, and other non-Latin tracks
from being falsely quarantined.
Happy path unchanged — matching title/artist still returns PASS at
the earlier check before this code is reached.
AcoustID sometimes returns featuring info in square brackets like [W/ Barnes Blvd.] instead of parentheses. The normalizer only stripped parenthetical featuring tags, so these tracks failed verification and got quarantined despite being correct. Now strips [W/ ...], [with ...], [feat. ...], and [ft. ...] bracket patterns too.
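A sketch of the bracket pattern; the exact regex in the normalizer may differ. This version requires whitespace after the tag, so a title such as "[Without You]" is left alone.

```python
import re

BRACKET_FEAT = re.compile(r"\s*\[(?:w/|with|feat\.?|ft\.?)\s+[^\]]*\]", re.IGNORECASE)


def strip_bracket_featuring(title: str) -> str:
    """Remove [W/ ...], [with ...], [feat. ...], [ft. ...] tags (sketch)."""
    return BRACKET_FEAT.sub("", title).strip()
```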
Expand track title normalization and refine album grouping logic.
- core/acoustid_verification.py: Broadened the parenthetical-suffix regex to strip year-based remasters and additional variants (e.g. "2025 Remaster", "single edit", "album edit") while still removing common extras like (Live), (Deluxe), (Radio Edit), and featuring tags.
- web_server.py: Restrict the smart album grouping to only run for singles/auto-detected albums; explicit album downloads now preserve the original Spotify album name to avoid mangling names (e.g. reworked/remastered vs deluxe). Added explicit logging for both smart grouping and skipped grouping paths. The verification post-processing worker now checks an is_album_download flag in context and skips re-grouping when true, with fallback logging on errors.
These changes prevent unintended renaming of explicit album downloads and improve normalization of common title suffixes.
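A sketch of a broadened parenthetical-suffix pattern covering the examples listed above. The alternation is an illustrative assumption, not the regex in `core/acoustid_verification.py`.

```python
import re

_SUFFIX_BODY = (
    r"(?:\d{4}\s+)?"                      # optional leading year: "2025 Remaster"
    r"(?:remaster(?:ed)?(?:\s+\d{4})?"    # "Remaster", "Remastered 2011"
    r"|live|deluxe|radio edit|single edit|album edit"
    r"|(?:feat\.?|ft\.?)\s.*?)"           # featuring tags
)
PAREN_SUFFIX = re.compile(rf"\s*\(\s*{_SUFFIX_BODY}\s*\)", re.IGNORECASE)


def strip_title_suffixes(title: str) -> str:
    return PAREN_SUFFIX.sub("", title).strip()
```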
Add optional post-download audio fingerprint verification using AcoustID.
Downloads are verified against expected track/artist using fuzzy string
matching on AcoustID results. Mismatched files are quarantined and
automatically added to the wishlist for retry.
- AcoustID verification with title/artist fuzzy matching (not MBID comparison)
- Quarantine system with JSON metadata sidecars for failed verifications
- fpcalc binary auto-download for Windows, macOS (universal), and Linux
- MusicBrainz enrichment worker with live status UI and track badges
- Settings page AcoustID section with real-fingerprint connection test
- Source reuse for album downloads to keep tracks from same Soulseek user
- Enhanced search queries for better track matching
- Bug fixes: wishlist tracking, album splitting, regex `&` handling, log rotation
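The quarantine step can be sketched as a move plus a JSON sidecar written next to the file. The sidecar field names (other than `context`), the `<file>.json` naming, and the lack of collision handling are assumptions for illustration.

```python
import json
import shutil
import time
from pathlib import Path
from typing import Any, Dict


def quarantine_file(src: Path, quarantine_dir: Path, reason: str, context: Dict[str, Any]) -> Path:
    """Move a failed file into quarantine and record why, next to it."""
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    dest = quarantine_dir / src.name  # no collision handling in this sketch
    shutil.move(str(src), str(dest))
    sidecar = {
        "original_path": str(src),
        "reason": reason,
        "quarantined_at": time.time(),
        "context": context,  # e.g. original_search_result for uploader tracing
    }
    dest.with_name(dest.name + ".json").write_text(
        json.dumps(sidecar, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return dest
```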