SoulSync

Commit Graph

Author	SHA1	Message	Date
Broque Thomas	b42cafa150	AcoustID + quarantine modal: three bug fixes (closes #607, closes #608) Issue #607 (AfonsoG6) -- two AcoustID problems: 1. Live recordings false-quarantining as "Version mismatch: expected '... (Live at Venue)' (live) but file is '...' (original)" because MusicBrainz often stores the recording entity with a bare title -- the venue / live annotation lives on the release entity, not the recording. The audio fingerprint correctly identifies the live recording, but the title-text comparison flagged it as wrong. New pure helper `core/matching/version_mismatch.py:is_acceptable_version_mismatch` accepts the mismatch only when: - One-sided AND involves 'live': exactly one side is 'live' and the other is bare 'original'. Two-sided mismatches stay strict. - Fingerprint score >= 0.85 (stricter than the existing 0.80 minimum -- escape valve only fires when AcoustID is more confident than its own threshold). - Bare title similarity >= 0.70. - Artist similarity >= 0.60. Other version markers (instrumental, remix, acoustic, demo, etc.) stay strict -- those have distinct fingerprints AND MB always annotates them in the recording title. The existing test_acoustid_version_mismatch.py suite passes unchanged. 2. Audio-mismatch failure message reported "identified as '' by '' (artist=100%)" when AcoustID returned multiple recordings -- prior code mixed `recordings[0]`'s strings (which can be empty) with `best_rec`'s scores. Now uses `matched_title` / `matched_artist` consistently in both the high-confidence-skip path and the final fail message. Issue #608 (AfonsoG6) -- quarantine modal: 3. Approve / Delete buttons silently no-op'd when the filename contained an apostrophe -- the unescaped quote broke the inline JS in the onclick handler. Now wraps the id via `escapeHtml(JSON.stringify(id))`, which round-trips quotes / backslashes / unicode / newlines safely across the HTML-attribute-to-JS-string boundary. 4. Bonus UX: the expanded quarantine entry view now shows the source uploader (username) and original Soulseek filename when the sidecar carries that context -- helps trace which uploader the bad file came from. Backend exposes `source_username` + `source_filename` fields from `sidecar.context.original_search_result`. Degrades to '' on legacy thin sidecars. Tests: - 23 new boundary tests in tests/matching/test_version_mismatch.py pin every shape: equal versions trivial, one-sided live both directions, threshold floors (each just below default -> reject), two-sided strict, non-live one-sided strict (covers the exact test_instrumental_returned_for_vocal_request_fails scenario), custom-threshold overrides. - 4 existing test_acoustid_version_mismatch.py tests pass unchanged. - 507 AcoustID / matching / imports tests pass.	1 week ago
Broque Thomas	bc34d39ce9	Tighten alias-lookup trust + add ambiguity gate + diagnostic log Cin pre-review pass on the false-positive risk. Three tightenings: # 1. Bumped MB-search trust threshold from 0.6 → 0.85 `MusicBrainzService.lookup_artist_aliases` previously trusted any MB search match scoring ≥ 0.6 combined (name-similarity + MB relevance). For the distinctive cross-script artists the user-reported case targets (Hiroyuki Sawano, Сергей Лазарев, etc.) real matches score ~1.0 — well above 0.85. The 0.6 floor was loose enough to let in moderate matches for ambiguous names, risking aliases for the wrong artist getting cached + applied. Bumped to 0.85: tighter, without rejecting any of the legit cross-script cases the PR is for. # 2. Ambiguity gate — skip when results within 0.1 of best When MB search returns multiple results all scoring high (within 0.1 of the best), the artist name is ambiguous — common name with multiple distinct artists ("John Smith" returning 10 different John Smiths). Pulling aliases for any one of them risks the wrong artist's data bridging incorrectly to a file's tag. Added explicit ambiguity detection: when 2+ results within 0.1, skip alias lookup entirely + cache empty. Matches Cin's "explicit > implicit" — the prior code just picked the highest score blindly. # 3. Diagnostic log when alias rescues a comparison When the alias path triggers a PASS that direct similarity would have FAILed, emit an INFO log: `Artist alias rescued comparison: expected='X' vs actual='Y' (direct sim=0.00, alias 'Z' → score=1.00)`. Lets future bug reports trace which alias triggered which decision. Doesn't change behavior — visibility only. Logs ONLY the rescue case, not happy-path direct matches (no log spam). # Tests added (6) `test_artist_alias_service.py` (+3): - `test_moderate_confidence_match_now_skipped_strict_threshold` - `test_ambiguous_results_skipped` - `test_unambiguous_high_confidence_match_succeeds` `test_acoustid_verification_aliases.py` (+3): - `test_alias_rescue_emits_info_log` — direct-fail + alias-pass emits INFO log - `test_no_log_when_direct_match_succeeds` — happy path quiet - `test_no_log_when_alias_doesnt_help` — failed path also quiet # Test infrastructure note Logging tests use a directly-attached `ListHandler` on `soulsync.acoustid.verification` (the actual logger name — dot-separated by `get_logger`), NOT pytest's caplog. Same pattern as the prior watchdog-test fix — caplog is intermittently flaky in full-suite runs for soulsync namespace loggers. An owned handler sidesteps both issues. # Verification - 85/85 matching tests pass (+6 from prior commit) - 2543 full suite passes (+6 from prior, +85 PR-total) - Ruff clean - Reporter's Japanese + Russian regression tests still pass — legit cross-script case (sim ≈ 1.0) clears the new 0.85 threshold easily	2 weeks ago
Broque Thomas	11397307b2	Alias resolution polish: lazy-fire on direct-match failure + worker backfill Two perf gaps that would have failed Cin's review: # Gap #1: alias lookup fired unconditionally Before this commit, `_resolve_expected_artist_aliases` ran at the top of every `verify_audio_file` call regardless of whether the direct artist match would have passed. For users whose library is mostly same-script (95% of cases), every successful verification was paying for a wasted DB query (and possibly a wasted MB API call for un-enriched artists). Restructured the helper to accept a callable provider instead of a pre-resolved list. Provider invoked LAZILY only when direct similarity falls below `ARTIST_MATCH_THRESHOLD`. Verifier passes a memoising thunk that resolves once across the 3 comparison sites within one verification. `_alias_aware_artist_sim` now accepts `aliases` as either: - iterable of strings (used eagerly — backward compat with tests that already know the aliases) - callable returning the iterable (resolved on first need within a verification) Happy path (direct match passes): zero DB queries, zero MB calls. Cross-script case: one resolution shared across 3 sites — same as the prior contract. # Gap #2: existing-MBID artists never got alias backfill Worker's `_process_item` artist branch had an `existing_id` short-circuit (line 296) that updated MBID status but skipped alias fetch. Result: every user with an already-enriched library had MBIDs but NULL aliases on day one of this PR. Live MB lookup at verify-time covered them, but at the cost of N live calls for N artists across the library. Added one-time backfill: when existing-MBID is found AND `artists.aliases` for that row is empty, fetch + persist aliases. Subsequent re-scan cycles short-circuit on the populated column — no repeated MB calls. New helper `_artist_aliases_empty(artist_id)` does the cheap NULL check via direct SQL. Best-effort: defensively returns True on errors so backfill happens (a redundant MB call is cheaper than missing the backfill entirely). # Tests added (9) `test_acoustid_verification_aliases.py` (+6): - `TestLazyAliasResolution` (3): no lookup when direct match passes, lookup fires only when direct fails, lookup memoised across the 3 sites within one verification. - `TestAliasProviderCallable` (3): iterable passed directly, callable resolves lazily, callable returning empty falls back to direct sim. `test_artist_alias_service.py` (+3): - `test_existing_mbid_path_backfills_aliases_when_column_empty` - `test_existing_mbid_path_skips_backfill_when_aliases_already_set` - `test_existing_mbid_backfill_failure_does_not_break_match` # Verification - 79/79 matching tests pass (+9 from prior commit) - 2537 full suite passes (+9, +79 PR-total) - Ruff clean - Backward compat: every prior-commit test still passes (the iterable-shape API still works alongside the new callable shape)	2 weeks ago
Broque Thomas	7066233c37	Wire alias-aware artist match into AcoustID verifier — fixes #442 This is the user-visible commit. The reporter's exact two cases (Japanese kanji, Russian Cyrillic) now pass verification instead of being quarantined. # What changed Verifier's three artist-similarity sites now route through the shared `core.matching.artist_aliases.artist_names_match` helper instead of raw `_similarity`: - `_find_best_title_artist_match` (per-recording scoring at the best-match stage) - Secondary scan when title matches but best-match's artist doesn't (line ~355 pre-fix) - Final fallback scan over all recordings (line ~403 pre-fix) Aliases for the expected artist are resolved ONCE at the top of `verify_audio_file` via `_resolve_expected_artist_aliases`, which calls the new `MusicBrainzService.lookup_artist_aliases` chain (library DB → cache → live MB). Single resolution per verification regardless of how many AcoustID recordings come back — pinned by test. New helper `_alias_aware_artist_sim(expected, actual, aliases)` wraps the pure helper with the verifier's normaliser (`_similarity`) and threshold (`ARTIST_MATCH_THRESHOLD`). Returns a single float so existing threshold-comparison code paths keep their shape — minimal diff. # Reporter's cases — verified Case 1 (issue #442 verbatim): File: YAMANAIAME by 澤野弘之 Expected: YAMANAIAME by Hiroyuki Sawano Pre-fix: Quarantined (artist=0%) Post-fix: PASS (alias '澤野弘之' resolved from MB) Case 2 (issue #442 verbatim): File: On the Other Side by Sergey Lazarev Expected: On the other side by Сергей Лазарев Pre-fix: Quarantined (artist=7%) Post-fix: PASS (alias 'Sergey Lazarev' resolved from MB) Both reproduced as regression tests with stubbed MB service. # Backward compat Three test cases pin that no-aliases / failure paths preserve pre-fix behaviour exactly: - Clear artist mismatch (different artist, same script) still FAILs — aliases bridge synonyms, not unrelated artists. - Exact title + artist match still PASSes regardless of aliases. - MB service raises → verifier completes with direct similarity (treats failure as "no aliases available" — same as pre-fix). Also covers manual import: the import-modal "Search for Match" flow goes through the same verifier, so the reporter's complaint that "manual import simply throws them back in quarantine again" is fixed by the same change. # Tests added (11) `tests/matching/test_acoustid_verification_aliases.py`: - `_alias_aware_artist_sim`: alias bridges score ↑, no-aliases falls back, aliases don't mask genuine mismatches - `_find_best_title_artist_match` accepts + uses aliases - Reporter's case 1 (Japanese) end-to-end - Reporter's case 2 (Russian) end-to-end - Backward compat: no-aliases mismatch still fails, exact match still passes, MB-service-raise doesn't break verification - Performance: alias lookup fires ONCE per verification regardless of recording count # Verification - 11 new verifier tests pass - 31 prior service tests pass - 28 prior helper tests pass - 294 matching + imports tests pass total (no regression) - Ruff clean	2 weeks ago
Broque Thomas	01c528fd5f	Reject AcoustID matches whose version disagrees with the expected track Discord report (corruption [BWC]): downloads coming through as the instrumental cut when a vocal track was requested. The verification step's `_normalize` function strips parentheticals and version-suffix tags ("(Instrumental)", "- Live", etc) so legitimate name variations don't false-fail the title-similarity check. That also means "In My Feelings" and "In My Feelings (Instrumental)" both normalize to "in my feelings", title similarity is 1.0, and the wrong cut passes verification. Detect the version label on each side BEFORE normalization runs. If the expected and matched recordings disagree on version (one is original, the other is instrumental / live / acoustic / remix / etc), return FAIL — the fingerprint identified a real song, just not the version the caller asked for. Reuses `MusicMatchingEngine.detect_version_type` so the same regex patterns the pre-download Soulseek matcher applies also drive post-download verification. No duplicated tables. Also gates the secondary fallback scan, so a wrong-version variant sitting in the same fingerprint cluster can't win the loop after the best match has already been version-rejected. 6 tests pin the behavior: - instrumental returned for vocal request → FAIL - vocal returned for instrumental request → FAIL - live vs acoustic → FAIL - matching versions on both sides → PASS - original-to-original happy path → PASS (regression guard) - secondary scan skips wrong-version recordings → not PASS 2194/2194 full suite green (was 2188 + 6 new).	3 weeks ago
Broque Thomas	04a14f7e96	Fix: tasks showed Completed when file was quarantined A user downloading Kendrick's Mr. Morale caught this: three tracks (Rich Interlude, Savior Interlude, Savior) showed ✅ Completed in the modal but were missing on disk. Log forensics revealed two layered bugs. Bug 1 — Verification wrapper assumed success on quarantined files (`core/imports/pipeline.py`): The outer `post_process_matched_download_with_verification` had a fallback at the "no `_final_processed_path` in context" branch that marked the task completed and notified `success=True`. The inner post-processor sets `_final_processed_path` only when the file actually reaches its destination. Integrity-rejected files (`_integrity_failure_msg` set) and race-guard-failed files (`_race_guard_failed` set) get quarantined or skipped without ever setting `_final_processed_path`, so they fell straight into the "assume success" branch. Confirmed in user's log: No _final_processed_path in context for task d5b88b84-... — cannot verify, assuming success That line fired for the same task right after the integrity check quarantined the source file. Result: ✅ Completed in UI, file in quarantine, never delivered. Fix: explicit checks for `_integrity_failure_msg` and `_race_guard_failed` markers BEFORE the assume-success fallback. Either marker set → task status='failed' with descriptive error_message + `_notify_download_completed(success=False)`. The pre-existing assume-success behavior is preserved when no failure markers are set (some legitimate flows complete without setting `_final_processed_path`). Bug 2 — AcoustID skip-logic too lenient (`core/acoustid_verification.py`): The "language/script" exemption was: if best_score >= 0.95 and (title_sim >= 0.55 or artist_sim >= ARTIST_MATCH_THRESHOLD): The OR-clause fired for English-vs-English titles by the same artist that share NO actual content. Confirmed in user's log: requested "Rich (Interlude)" by Kendrick Lamar, AcoustID identified the audio as "R.O.T.C. (interlude)" by Kendrick Lamar (a totally different song from his 2010 mixtape) — same artist scored ≥ARTIST threshold, shared word "interlude" pushed title_sim above 0.55, skip fired. Verification returned SKIP instead of FAIL, and the wrong file was accepted as the answer for three different track requests. Fix: skip now requires positive evidence the mismatch is a real language/script case: (a) Non-ASCII chars present in either title AND artist matches strongly → real transliteration case (kanji ↔ romaji etc) (b) BOTH title_sim >= 0.80 AND artist_sim >= ARTIST threshold → minor punctuation/casing differences. English-vs-English pairs with very different titles by the same artist are no longer skipped — verification correctly returns FAIL, the wrong file gets quarantined, and the new wrapper logic above marks the task failed. Tests: - `tests/test_integrity_failure_marks_task_failed.py` — 4 cases pinning the wrapper-level state machine: integrity marker → failed, race-guard marker → failed, no markers → still assumes success (legacy path preserved), integrity-failure-takes-priority over missing-final-path fallback. - `tests/test_acoustid_skip_logic.py` — 7 cases pinning the skip exemption: user's R.O.T.C-vs-Rich case → FAIL (regression test), Savior-vs-R.O.T.C → FAIL (same bug surface), Japanese kanji → romaji → SKIP (real language case still works), MAAD vs M.A.A.D → PASS or SKIP (punctuation tolerance), low fingerprint score → never skipped, high score but artist mismatch → no longer skipped, Crown vs Crown of Thorns → no longer skipped. Verified: full suite 1793 pass (11 new), ruff clean. WHATS_NEW entry under '2.4.2' dev cycle.	3 weeks ago
Broque Thomas	8f85b0c251	Fix silent wrong-artist track downloads (Maduk/Tom Walker bug) User reported searching "Maduk - Leave A Light On" on Tidal silently downloaded Tom Walker's completely different song of the same name, then embedded Maduk's metadata into Tom Walker's audio. Three layers of defense all failed permissively. Two of them are fixed here; the third (score formula weights) was left alone since these two together cover it. Layer 1 fix — candidate artist gate (web_server.py:27782) Old: `if _best_artist < 0.4 and confidence < 0.85: continue` New: `if _best_artist < 0.5 and confidence < 0.85: continue` SequenceMatcher returns exactly 0.400 for "maduk" vs "tom walker" (5-char vs 10-char strings with coincidental char matches), which slipped past the strict `< 0.4` check. The word-boundary containment check earlier in the function already short-circuits legitimate formatting variations to sim=1.0, so falling to SequenceMatcher means strings are genuinely different. 0.5 closes the fencepost AND gives a small safety buffer. Layer 3 fix — AcoustID verification (acoustid_verification.py:316) When title matches but artist doesn't AND expected artist isn't found anywhere in AcoustID's returned recordings: Old: always SKIP (let file through, assume cover/collab) New: FAIL if artist_sim < 0.3 (clear mismatch) SKIP if artist_sim >= 0.3 (ambiguous — cover/collab/formatting) The 0.3 cutoff catches hard mismatches like Maduk/Tom Walker (sim ~0.2) while preserving benefit-of-the-doubt for borderline artist formatting differences. Legitimate covers and collabs where the expected artist appears anywhere in AcoustID's recordings still PASS via the existing secondary-match loop above. Both fixes are defense-in-depth — either alone would have caught this bug. Together they close the pre-download AND post-download gaps. All 292 tests pass. Version bumped to 2.39 with changelog entries.	1 month ago
Antti Kettunen	01d118daa6	Separate AcoustID file logging - keep AcoustID logs out of app.log - route client and verification to logs/acoustid.log - align tag writer with the soulsync logger namespace	1 month ago
Broque Thomas	1695953705	Fix AcoustID high-confidence skip letting wrong files through The high-confidence fingerprint skip (≥0.95) assumed title mismatches were language/script differences and bypassed verification. But a high fingerprint score just means AcoustID identified the audio confidently — not that it matches the requested track. Now requires partial title (≥0.55) or artist (≥0.60) similarity before skipping, so completely wrong files (e.g. different song/artist from same remix producer) are correctly rejected.	1 month ago
Broque Thomas	f68cae64a7	Skip AcoustID verification for high-confidence cross-language matches When the fingerprint score is >=0.95 but title/artist don't match (e.g. English expected vs Japanese returned), SKIP instead of FAIL. A 95%+ fingerprint means the audio IS the correct recording — the metadata mismatch is just a language/script difference, not a wrong file. Prevents Japanese, Chinese, Korean, and other non-Latin tracks from being falsely quarantined. Happy path unchanged — matching title/artist still returns PASS at the earlier check before this code is reached.	2 months ago
Broque Thomas	e5a6ac7821	Strip all parentheticals in AcoustID title normalization	2 months ago
Broque Thomas	6c4de45b32	fix acoustID match issue and css changes	2 months ago
Broque Thomas	98746961aa	Improve AcoustID verification normalization for version tags and suffixes	3 months ago
Broque Thomas	e38c0adc57	Fix: Various artist compilations caused failure on acoustID check.	3 months ago
Broque Thomas	890d47928d	Fix AcoustID false rejections for tracks with featured artists in square brackets AcoustID sometimes returns featuring info in square brackets like [W/ Barnes Blvd.] instead of parentheses. The normalizer only stripped parenthetical featuring tags, so these tracks failed verification and got quarantined despite being correct. Now strips [W/ ...], [with ...], [feat. ...], and [ft. ...] bracket patterns too.	3 months ago
Broque Thomas	a0828c7aad	Improve album grouping and normalize suffixes Expand track title normalization and refine album grouping logic. - core/acoustid_verification.py: Broadened the parenthetical-suffix regex to strip year-based remasters and additional variants (e.g. "2025 Remaster", "single edit", "album edit") while still removing common extras like (Live), (Deluxe), (Radio Edit), and featuring tags. - web_server.py: Restrict the smart album grouping to only run for singles/auto-detected albums; explicit album downloads now preserve the original Spotify album name to avoid mangling names (e.g. reworked/remastered vs deluxe). Added explicit logging for both smart grouping and skipped grouping paths. The verification post-processing worker now checks an is_album_download flag in context and skips re-grouping when true, with fallback logging on errors. These changes prevent unintended renaming of explicit album downloads and improve normalization of common title suffixes.	3 months ago
Broque Thomas	3f0854e070	Fix AcoustID verification: MusicBrainz metadata fallback and quarantine reliability	3 months ago
Broque Thomas	1071337659	fix normalization issue on comparison with acoustID	3 months ago
Broque Thomas	d9efcbdf99	feat: AcoustID audio verification, MusicBrainz enrichment UI, v1.5 Add optional post-download audio fingerprint verification using AcoustID. Downloads are verified against expected track/artist using fuzzy string matching on AcoustID results. Mismatched files are quarantined and automatically added to the wishlist for retry. - AcoustID verification with title/artist fuzzy matching (not MBID comparison) - Quarantine system with JSON metadata sidecars for failed verifications - fpcalc binary auto-download for Windows, macOS (universal), and Linux - MusicBrainz enrichment worker with live status UI and track badges - Settings page AcoustID section with real-fingerprint connection test - Source reuse for album downloads to keep tracks from same Soulseek user - Enhanced search queries for better track matching - Bug fixes: wishlist tracking, album splitting, regex & handling, log rotation	4 months ago

19 Commits (dev)
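
Commit b42cafa150 lists the conditions under which `is_acceptable_version_mismatch` tolerates a version disagreement. The sketch below encodes those conditions. It assumes version labels are plain strings such as `'live'` and `'original'` and that the similarity scores are already computed in [0, 1]. The signature is hypothetical, not the module's real one. Treating equal labels as acceptable is also an assumption.

```python
def is_acceptable_version_mismatch(
    expected_version: str,
    actual_version: str,
    fingerprint_score: float,
    title_sim: float,
    artist_sim: float,
    *,
    min_fingerprint: float = 0.85,
    min_title_sim: float = 0.70,
    min_artist_sim: float = 0.60,
) -> bool:
    """Hypothetical sketch of the live-only escape valve from b42cafa150."""
    if expected_version == actual_version:
        return True  # assumed: equal labels are not a mismatch at all
    # One-sided live only: exactly one side 'live', the other bare 'original'.
    # Two-sided (live vs acoustic) and non-live (instrumental, remix, ...) stay strict.
    if {expected_version, actual_version} != {"live", "original"}:
        return False
    # Every floor must clear; keyword overrides mirror the custom-threshold tests.
    return (
        fingerprint_score >= min_fingerprint
        and title_sim >= min_title_sim
        and artist_sim >= min_artist_sim
    )
```

The set comparison covers both directions (live requested and bare returned, or the reverse) in a single check.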
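
The #608 fix in commit b42cafa150 wraps quarantine ids as `escapeHtml(JSON.stringify(id))` before placing them in an inline `onclick`. The Python sketch below reproduces that two-layer encoding with the standard library: `json.dumps` stands in for `JSON.stringify`, and `html.escape` stands in for the project's `escapeHtml`, whose exact behavior is assumed here. The `approveQuarantine` handler name is hypothetical.

```python
import html
import json


def onclick_id_literal(entry_id: str) -> str:
    """Encode an id as a JS string literal safe inside a double-quoted HTML attribute."""
    # json.dumps emits a JSON string literal (valid JS): quotes, backslashes and
    # newlines are backslash-escaped; non-ASCII becomes \uXXXX (ensure_ascii=True).
    js_literal = json.dumps(entry_id)
    # html.escape (quote=True by default) turns " and ' into entities, so the
    # literal can no longer terminate the surrounding attribute early.
    return html.escape(js_literal)


def approve_button(entry_id: str) -> str:
    # Hypothetical markup; only the escaping pattern matters here.
    return f'<button onclick="approveQuarantine({onclick_id_literal(entry_id)})">Approve</button>'
```

A browser reverses the layers in order: it decodes the attribute's character references, then parses the JS literal. That is why an apostrophe in a filename no longer breaks the handler.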
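
Commit bc34d39ce9 adds two tightenings to alias lookup: a 0.85 trust floor and an ambiguity gate that skips when a second result is within 0.1 of the best. Both can be sketched as one selection function over already-scored MusicBrainz search results. `pick_trusted_match` and its `(artist_id, combined_score)` input shape are hypothetical. The real logic lives in `MusicBrainzService.lookup_artist_aliases`.

```python
from typing import Optional, Sequence, Tuple

TRUST_THRESHOLD = 0.85  # combined score floor from bc34d39ce9
AMBIGUITY_WINDOW = 0.1  # "2+ results within 0.1 of the best" -> skip


def pick_trusted_match(
    scored: Sequence[Tuple[str, float]],
    trust: float = TRUST_THRESHOLD,
    window: float = AMBIGUITY_WINDOW,
) -> Optional[str]:
    """Return the one artist id worth pulling aliases for, or None to skip."""
    if not scored:
        return None
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    best_id, best_score = ranked[0]
    if best_score < trust:
        return None  # moderate match: not trusted enough to cache aliases
    if len(ranked) > 1 and best_score - ranked[1][1] <= window:
        return None  # ambiguous name (e.g. many "John Smith"s): explicit skip
    return best_id
```

In this sketch, a `None` result corresponds to the message's "skip alias lookup entirely + cache empty".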
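
Commit 11397307b2 makes `_alias_aware_artist_sim` accept either an iterable or a callable, and the verifier passes it a memoising thunk. This sketch shows that contract in isolation. `alias_aware_artist_sim`, `memoised`, the case-folded `SequenceMatcher` stand-in for `_similarity`, and the 0.6 threshold are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Union

ARTIST_MATCH_THRESHOLD = 0.6  # assumed value for this sketch

AliasSource = Union[Iterable[str], Callable[[], Iterable[str]]]


def _similarity(a: str, b: str) -> float:
    # Stand-in for the verifier's normaliser: case-folded SequenceMatcher ratio.
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def alias_aware_artist_sim(expected: str, actual: str, aliases: AliasSource) -> float:
    direct = _similarity(expected, actual)
    if direct >= ARTIST_MATCH_THRESHOLD:
        return direct  # happy path: the alias source is never touched
    # Resolve lazily: the provider runs only once the direct match has failed.
    resolved = aliases() if callable(aliases) else aliases
    return max([direct, *(_similarity(alias, actual) for alias in resolved)])


def memoised(provider: Callable[[], Iterable[str]]) -> Callable[[], List[str]]:
    """Wrap a provider so it runs at most once, however many sites call it."""
    cache: List[str] = []
    resolved = False

    def thunk() -> List[str]:
        nonlocal resolved
        if not resolved:
            cache.extend(provider())
            resolved = True
        return cache

    return thunk
```

Because the thunk caches its first result, three comparison sites within one verification trigger at most one lookup, and a passing direct match triggers none.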
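
Commit 04a14f7e96 narrows the high-confidence skip so that it needs positive evidence of a script difference or near-identical titles. The sketch below is a minimal reading of rules (a) and (b). It assumes "artist matches strongly" means `artist_sim >= ARTIST_MATCH_THRESHOLD` and that the threshold is 0.6, as the ≥0.60 artist figure in commit 1695953705 suggests. The function names are hypothetical.

```python
ARTIST_MATCH_THRESHOLD = 0.6  # assumed; the real constant lives in the verifier


def has_non_ascii(text: str) -> bool:
    """True if any character falls outside 7-bit ASCII (kanji, Cyrillic, ...)."""
    return any(ord(ch) > 127 for ch in text)


def should_skip_high_confidence(
    best_score: float,
    title_sim: float,
    artist_sim: float,
    expected_title: str,
    matched_title: str,
    artist_threshold: float = ARTIST_MATCH_THRESHOLD,
) -> bool:
    if best_score < 0.95:
        return False  # low fingerprint confidence is never skipped
    artist_ok = artist_sim >= artist_threshold
    # (a) script difference: non-ASCII in either title plus a strong artist match.
    script_case = artist_ok and (has_non_ascii(expected_title) or has_non_ascii(matched_title))
    # (b) near-identical titles: minor punctuation/casing differences only.
    near_identical = artist_ok and title_sim >= 0.80
    return script_case or near_identical
```

Under this reading, the "Rich (Interlude)" vs "R.O.T.C. (interlude)" pair (both ASCII, title similarity below 0.80) is no longer skipped and falls through to FAIL.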
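
Commit 8f85b0c251 says `SequenceMatcher` scores "maduk" vs "tom walker" at exactly 0.400, which slipped past a strict `< 0.4` gate. The check below confirms that value with Python's `difflib` and sketches both fixed layers. `candidate_rejected` and `title_only_verdict` are hypothetical names for logic the message places in `web_server.py` and `acoustid_verification.py`. The 0.7 confidence in the test is an illustrative value.

```python
from difflib import SequenceMatcher


def candidate_rejected(best_artist_sim: float, confidence: float, floor: float = 0.5) -> bool:
    """Layer 1: drop a candidate with a weak artist match unless confidence is high."""
    return best_artist_sim < floor and confidence < 0.85


def title_only_verdict(artist_sim: float) -> str:
    """Layer 3: title matched, expected artist absent from every returned recording."""
    return "FAIL" if artist_sim < 0.3 else "SKIP"


# 3 matching characters ('m', 'a', 'k') across 5 + 10 chars: 2 * 3 / 15 = 0.4
sim = SequenceMatcher(None, "maduk", "tom walker").ratio()
```

With the old floor of 0.4 the comparison `0.4 < 0.4` is false, so the wrong artist passed. Raising the floor to 0.5 closes that fencepost.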