SoulSync

Commit Graph

Author	SHA1	Message	Date
Broque Thomas	9cc09118bf	AcoustID scanner: multi-candidate match + duration guard + multi-value retag. Closes #587. Three coordinated fixes per codex's diagnosis. AcoustID verification gate left intact — these fixes target the upstream scanner false-positive surface plus a separate retag-path gap. Bug 1 — scanner used recordings[0] as authoritative: `core/repair_jobs/acoustid_scanner.py:_scan_file` only checked the top fingerprint match's metadata. AcoustID often returns multiple recordings per fingerprint (sample collisions, multi-MB-record cases) and the wrong-credited recording can outrank the right-credited one. Foxxify case 2 (Nana / Nana): top match credited the wrong artist while a lower-ranked candidate matched the user's expected metadata exactly. Lifted the verifier's all-candidates check to a shared pure helper `core/matching/acoustid_candidates.py:find_matching_recording`. Both verifier and scanner can now ask "given these candidates, does ANY of them match expected (title, artist)?" with the same contract. Scanner suppresses the finding when any candidate matches. Bug 2 — no duration check guards against fingerprint hash collisions: Foxxify case 3: 17-minute mashup edit fingerprinted to a 5-minute late-70s Japanese hiphop track (different songs, fingerprint hash collision on a sampled section). Scanner had no signal to detect this and would have recommended retagging the 17-min file as the 5-min track. `duration_mismatches_strongly` in the same helper module flags drifts beyond max(60s, 35%). Scanner now skips findings when the candidate's duration disagrees strongly with the file's expected duration. Loaded duration via the existing tracks SQL (added `t.duration` to the SELECT). Returns False when either side is unknown — no behavior change for older rows without duration data. Bug 3 — scanner retag bypassed multi-value ARTISTS tag setting: `core/repair_worker.py:_fix_wrong_song` called `write_tags_to_file` with single-string artist updates. The writer only wrote TPE1 (single string) and never read the user's `metadata_enhancement.tags.write_multi_artist` config. Multi-value ARTISTS tags got stripped on every retag, contradicting the post-download enrichment pipeline's behavior. Per codex's pick (option B over routing through enhance_file_metadata), extended `write_tags_to_file` with an optional `artists_list` parameter. Each format-specific writer respects the config flag the same way enrichment.py does: - ID3: TPE1 stays as joined display string + TXXX:Artists multi-value - Vorbis/Opus/FLAC: `artist` display string + `artists` multi-value key - MP4: \xa9ART as list when on, single string when off Scanner retag derives the per-artist list by splitting AcoustID's credit through the existing `split_artist_credit` helper (same separators the matching layer already uses). Backward compatible: callers that don't pass `artists_list` get the exact same single-string write as before. No regression for the write_artist_image button or any other tag_writer caller. 15 tests on the candidate helper + duration guard. 13 tests on the tag_writer multi-value path (write/skip/single/no-list cases for FLAC + the config-gate helper). 4 new scanner regression tests pinning lower-ranked candidate suppression, no-suppression when no candidate matches, duration mismatch skip, no-skip when duration matches. Existing scanner tests updated for the new 11-column SQL select (added duration column to fake schema + test row tuples). Full suite: 3097 passed. Ruff clean.	2 weeks ago
Broque Thomas	7a23d60f28	AcoustID scanner: file-tag fallback for legacy compilation tracks. Follow-up to the prior compilation-album scanner fix. That patch made the scanner read `tracks.track_artist` (per-track artist column) via COALESCE so compilation tracks would compare against the right value. But tracks downloaded BEFORE the `track_artist` column existed have track_artist=NULL — COALESCE falls back to album artist (the curator) and the wrong-comparison case returns. Fix: explicit 3-tier resolution in `_scan_file`: 1. DB `tracks.track_artist` if populated → trust it. Respects manual edits from the enhanced library view (user who curated the DB value but didn't re-tag the file gets their edit respected, not overridden by stale file tag). 2. File's ARTIST tag via mutagen if present → use it. Tidal / Spotify / Deezer all write the per-track artist into the audio file at download time regardless of SoulSync's DB schema, so it's ground truth even when the DB column is stale or NULL. File is already open for fingerprinting so mutagen tag-read is essentially free. 3. Album artist → final fallback for files without proper ARTIST tags AND no DB track_artist. Existing pre-fix behavior. `_load_db_tracks` SELECT now surfaces `track_artist` (raw, may be empty/NULL via NULLIF) and `album_artist` separately in addition to the COALESCE'd `artist` field — so `_scan_file` can tell the difference between 'DB has a curated value' and 'DB fell back to album artist'. Without this distinction, the file-tag fallback would create false positives when DB is curated but file is stale. 5 new tests (11 total in the file) pin: - File-tag-trumps-DB resolves the legacy NULL case (DB says 'Andromedik' (album curator), file says 'Eclypse', AcoustID says 'Eclypse' → no flag) - Tag-missing falls back to album artist (preserves existing genuine-mismatch contract — file without tag + AcoustID mismatch still flags) - Mutagen exception swallowed (debug log, fall-through) - File-tag matches DB → no behavioral change - DB curated value trumps stale file tag (false-positive guard — user edited DB without re-tagging file shouldn't get flagged) Two existing test fixtures (`_make_context` callers) updated to the new 10-column row shape. SQL behavior verified empirically against real SQLite: NULL and empty-string both flow through NULLIF → None in Python → file-tag-fallback path. Modern populated values trump file tag.	2 weeks ago
Broque Thomas	812db1fbbf	AcoustID scanner: prefer track_artist for compilation albums. Discord report (Skowl): downloaded a compilation album ("High Tea Music: Vol 1") where every track has a different artist (Eclypse, Andromedik, T & Sugah, Gourski, etc.) and the AcoustID scanner flagged every single track as Wrong Song. The file tags had the correct per-track artist (e.g. "Eclypse" for "City Lights"), but the scanner compared against the album-level artist ("Andromedik", the curator). Raw similarity 12% → Wrong Song flag. # Why the prior multi-value fix didn't help Foxxify's case (just-merged PR): AcoustID returned multi-value credit "Okayracer, aldrch & poptropicaslutz!" — primary IS in the credit. Splitting found it. Skowl's case: both sides single-value but DIFFERENT artists. Splitter has nothing to find — Eclypse simply isn't in "Andromedik". Different bug. # Cause Scanner SQL at `core/repair_jobs/acoustid_scanner.py:281` joined the `artists` table via `tracks.artist_id` which points at the ALBUM artist (the curator/label-name applied to every row in a compilation). The `tracks.track_artist` column already holds the correct per-track artist for compilations — populated by every server-scan path (Plex `originalTitle`, Jellyfin `ArtistItems`, Navidrome per-track `artist`) AND the auto-import / direct-download post-process flow (`record_soulsync_library_entry` writes it when different from album artist). Scanner just wasn't reading it. # Fix `SELECT t.id, t.title, COALESCE(NULLIF(t.track_artist, ''), ar.name) AS artist, ...` Prefers per-track artist when populated, falls back to album artist for legacy rows / single-artist albums where `track_artist` is NULL. `NULLIF(t.track_artist, '')` handles the empty-string-instead-of-null case some legacy rows might have. # Composes with Foxxify's multi-value fix For the rare compilation track where AcoustID ALSO returns a multi-value credit (e.g. compilation track has multiple credited performers), both paths work together — `track_artist` gives the correct expected primary, then the helper splits the credit and finds it. # Tests added (2) - `test_load_db_tracks_prefers_track_artist_for_compilation` — reporter's exact case: track with `track_artist='Eclypse'` AND `artist_id` pointing at album artist 'Andromedik' resolves to 'Eclypse'. Second track with NULL `track_artist` falls back to album artist 'Andromedik' (single-artist + legacy compat). - `test_load_db_tracks_falls_back_when_track_artist_empty_string` — empty string in `track_artist` (some legacy rows) → NULLIF returns NULL → COALESCE falls back to album artist. Both use a real SQLite DB so the COALESCE/NULLIF logic + JOIN runs against actual schema (SimpleNamespace fakes can't simulate JOINs). # Verification - 6/6 scanner tests pass (2 new + 4 existing) - 2586 full suite passes (+2 from prior commit) - Ruff clean	3 weeks ago
Broque Thomas	df304eb016	AcoustID scanner: handle multi-value artist credits. Discord report (Foxxify): the AcoustID scanner repair job flagged multi-artist tracks as Wrong Song because AcoustID returns the FULL credit ("Okayracer, aldrch & poptropicaslutz!") while the library DB carries only the primary artist ("Okayracer"). Raw SequenceMatcher similarity scored ~43% — well below the 60% threshold — so the scanner created a finding even though the audio was correct. User couldn't fix without lowering the global artist threshold to ~30% (which would let real mismatches through). # Fix Extended the shared `core/matching/artist_aliases.py::artist_names_match` helper (originally lifted for #441) with credit-token splitting. When the actual artist string contains common separators — - punctuation: `,` `&` `;` `/` `+` - keywords (whitespace-bounded): `feat.` `ft.` `featuring` `with` `vs.` `x` — the helper splits into individual contributors and checks each against the expected artist. Primary-in-credit cases now resolve at 100% instead of 43%. Two pattern groups because punctuation separators don't need surrounding whitespace, but keyword separators MUST be whitespace-bounded — otherwise we'd split artists with `x` / `with` etc. in their names ("JAY-X" → "JAY-" / "" issue). Composes with the existing alias path: cross-script multi-artist credits ("Hiroyuki Sawano" expected, "澤野弘之, FeaturedJp" actual) work via alias-token-against-credit-token compare. # Wire-in Scanner at `core/repair_jobs/acoustid_scanner.py:202` replaces the raw `SequenceMatcher` call with `artist_names_match`. Pass RAW artist strings (not pre-normalised by `_normalize`) so the splitter can recognise separators — `_normalize` strips ALL punctuation, which destroyed the very tokens the splitter needs. The AcoustID post-download verifier (`core/acoustid_verification.py`) already routes through `_alias_aware_artist_sim` which calls the same helper — gets the multi-value benefit automatically without a separate wire-in. # New `split_artist_credit` exported helper Pure-function helper for callers who want token-level access to the credit list (debugging, UI, future per-token enrichment). Same splitter logic, exposed as a top-level function. # Tests added (14) `tests/matching/test_artist_aliases.py` (+11): - `TestSplitArtistCredit` — parametrised across 12 credit-string formats (comma, ampersand, semicolon, slash, plus, feat./ft./featuring, with, vs., x, single-token, empty), drops empty tokens, strips per-token whitespace - `TestMultiValueCreditMatching` — reporter's exact case (Okayracer in 3-artist credit → 100%), primary in middle/end of credit, genuine-mismatch still fails, single-token actual falls through to direct compare, multi-value composes with aliases, threshold still respected `tests/test_acoustid_scanner.py` (+3): - Reporter's case end-to-end through `_scan_file` — fingerprint 99% / title 100% / multi-artist credit → no finding created - Genuine artist mismatch still creates finding (no false suppression of real mismatches) - `JobResultStub` minimal scaffold for the integration tests # Verification - 14 new tests pass (49 helper + 5 scanner total in their files) - 110 matching + scanner tests pass total - 2584 full suite passes (+25 from baseline 2559) - Ruff clean - Reporter's exact case (Okayracer in `Okayracer, aldrch & poptropicaslutz!`) now scores 100% match → no Wrong Song flag	3 weeks ago
Antti Kettunen	88e2527b96	Fix null-pointer error in acoustid_scanner. The root cause (null track ids) needs to be solved elsewhere, but this is a band-aid for now.	1 month ago
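
The credit-token splitting described in the df304eb016 message can be illustrated roughly as follows. This is a minimal sketch, not the code in `core/matching/artist_aliases.py`: the function name `split_credit_sketch` and both regex patterns are assumptions, built only from the separator lists quoted in that commit message. It shows why keyword separators are whitespace-bounded while punctuation separators are not.

```python
import re

# Punctuation separators from the commit message; no surrounding
# whitespace is required for these.
_PUNCT = r"[,&;/+]"
# Keyword separators; they must have whitespace on both sides so that
# names such as "JAY-X" are not split on the trailing "X".
_KEYWORDS = r"\s+(?:feat\.|ft\.|featuring|with|vs\.|x)\s+"
# Hypothetical combined pattern; the real helper may be structured differently.
_SPLIT_RE = re.compile(f"{_KEYWORDS}|{_PUNCT}", re.IGNORECASE)


def split_credit_sketch(credit: str) -> list[str]:
    """Split a credit string into contributor names, dropping empty tokens."""
    if not credit:
        return []
    # Strip per-token whitespace, then discard tokens that end up empty.
    tokens = (tok.strip() for tok in _SPLIT_RE.split(credit))
    return [tok for tok in tokens if tok]


if __name__ == "__main__":
    print(split_credit_sketch("Okayracer, aldrch & poptropicaslutz!"))
    print(split_credit_sketch("JAY-X"))
```

With a splitter like this, the scanner-side check reduces to comparing the expected primary artist against each token instead of against the whole credit string. Note that this sketch would also split a name containing a bare `/`; how the real helper treats such names is not stated in the commit message.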

5 Commits (dev)
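
The duration guard from commit 9cc09118bf ("flags drifts beyond max(60s, 35%)", returning False when either side is unknown) might look something like the sketch below. The function name `duration_mismatch_sketch` is hypothetical, and two details are assumptions the commit message does not settle: that the 35% is taken relative to the file's expected duration, and that a duration of 0 counts as unknown.

```python
from typing import Optional


def duration_mismatch_sketch(
    expected_s: Optional[float], candidate_s: Optional[float]
) -> bool:
    """Return True only when both durations are known and differ strongly."""
    # Unknown (None) or zero on either side: never flag, so older rows
    # without duration data behave as before. Treating 0 as unknown is
    # an assumption of this sketch.
    if not expected_s or not candidate_s:
        return False
    # Assumption: the 35% is measured against the expected duration.
    tolerance = max(60.0, 0.35 * expected_s)
    return abs(expected_s - candidate_s) > tolerance


if __name__ == "__main__":
    # The 17-minute file vs. 5-minute candidate case from the commit message.
    print(duration_mismatch_sketch(17 * 60, 5 * 60))
    # Missing duration on one side.
    print(duration_mismatch_sketch(None, 300))
```

Under these assumptions the 17-minute versus 5-minute case differs by 720 s against a tolerance of 357 s, so it is flagged, while a short track that drifts by less than a minute is not.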