mirror of https://github.com/Nezreka/SoulSync.git
dev
main
fix/usenet-album-poll-sab-handoff
fix/quarantine-source-dedup
release/2.5.3
fix/disable-beatport-features
johnbaumb-discover-redesign
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
2.4.0
2.4.1
2.4.2
2.5.0
2.5.1
2.5.2
2.5.3
2.5.4
2.5.5
2.5.6
2.5.7
2.5.9
2.6.0
2.6.1
2.6.2
2.6.3
v0.65
${ noResults }
5 Commits (dev)
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
9cc09118bf |
AcoustID scanner: multi-candidate match + duration guard + multi-value retag
Closes #587. Three coordinated fixes per codex's diagnosis. AcoustID verification gate left intact — these fixes target the upstream scanner false-positive surface plus a separate retag-path gap. Bug 1 — scanner used recordings[0] as authoritative `core/repair_jobs/acoustid_scanner.py:_scan_file` only checked the top fingerprint match's metadata. AcoustID often returns multiple recordings per fingerprint (sample collisions, multi-MB-record cases) and the wrong-credited recording can outrank the right- credited one. Foxxify case 2 (Nana / Nana): top match credited the wrong artist while a lower-ranked candidate matched the user's expected metadata exactly. Lifted the verifier's all-candidates check to a shared pure helper `core/matching/acoustid_candidates.py:find_matching_recording`. Both verifier and scanner can now ask "given these candidates, does ANY of them match expected (title, artist)?" with the same contract. Scanner suppresses the finding when any candidate matches. Bug 2 — no duration check guards against fingerprint hash collisions Foxxify case 3: 17-minute mashup edit fingerprinted to a 5-minute late-70s Japanese hiphop track (different songs, fingerprint hash collision on a sampled section). Scanner had no signal to detect this and would have recommended retagging the 17-min file as the 5-min track. `duration_mismatches_strongly` in the same helper module flags drifts beyond max(60s, 35%). Scanner now skips findings when the candidate's duration disagrees strongly with the file's expected duration. Loaded duration via the existing tracks SQL (added `t.duration` to the SELECT). Returns False when either side is unknown — no behavior change for older rows without duration data. Bug 3 — scanner retag bypassed multi-value ARTISTS tag setting `core/repair_worker.py:_fix_wrong_song` called `write_tags_to_file` with single-string artist updates. The writer only wrote TPE1 (single string) and never read the user's `metadata_enhancement.tags.write_multi_artist` config. Multi-value ARTISTS tags got stripped on every retag, contradicting the post-download enrichment pipeline's behavior. Per codex's pick (option B over routing through enhance_file_metadata), extended `write_tags_to_file` with an optional `artists_list` parameter. Each format-specific writer respects the config flag the same way enrichment.py does: - ID3: TPE1 stays as joined display string + TXXX:Artists multi-value - Vorbis/Opus/FLAC: `artist` display string + `artists` multi-value key - MP4: \xa9ART as list when on, single string when off Scanner retag derives the per-artist list by splitting AcoustID's credit through the existing `split_artist_credit` helper (same separators the matching layer already uses). Backward compatible: callers that don't pass `artists_list` get the exact same single-string write as before. No regression for the write_artist_image button or any other tag_writer caller. 15 tests on the candidate helper + duration guard. 13 tests on the tag_writer multi-value path (write/skip/single/ no-list cases for FLAC + the config-gate helper). 4 new scanner regression tests pinning lower-ranked candidate suppression, no-suppression when no candidate matches, duration mismatch skip, no-skip when duration matches. Existing scanner tests updated for the new 11-column SQL select (added duration column to fake schema + test row tuples). Full suite: 3097 passed. Ruff clean. |
2 weeks ago |
|
|
7a23d60f28 |
AcoustID scanner: file-tag fallback for legacy compilation tracks
Follow-up to the prior compilation-album scanner fix. That patch
made the scanner read `tracks.track_artist` (per-track artist
column) via COALESCE so compilation tracks would compare against
the right value. But tracks downloaded BEFORE the `track_artist`
column existed have track_artist=NULL — COALESCE falls back to
album artist (the curator) and the wrong-comparison case returns.
Fix: explicit 3-tier resolution in `_scan_file`:
1. DB `tracks.track_artist` if populated → trust it. Respects
manual edits from the enhanced library view (user who curated
the DB value but didn't re-tag the file gets their edit
respected, not overridden by stale file tag).
2. File's ARTIST tag via mutagen if present → use it. Tidal /
Spotify / Deezer all write the per-track artist into the
audio file at download time regardless of SoulSync's DB
schema, so it's ground truth even when the DB column is
stale or NULL. File is already open for fingerprinting so
mutagen tag-read is essentially free.
3. Album artist → final fallback for files without proper ARTIST
tags AND no DB track_artist. Existing pre-fix behavior.
`_load_db_tracks` SELECT now surfaces `track_artist` (raw, may be
empty/NULL via NULLIF) and `album_artist` separately in addition
to the COALESCE'd `artist` field — so `_scan_file` can tell the
difference between 'DB has a curated value' and 'DB fell back to
album artist'. Without this distinction, the file-tag fallback
would create false positives when DB is curated but file is stale.
5 new tests (11 total in the file) pin:
- File-tag-trumps-DB resolves the legacy NULL case (DB says
'Andromedik' (album curator), file says 'Eclypse', AcoustID
says 'Eclypse' → no flag)
- Tag-missing falls back to album artist (preserves existing
genuine-mismatch contract — file without tag + AcoustID
mismatch still flags)
- Mutagen exception swallowed (debug log, fall-through)
- File-tag matches DB → no behavioral change
- DB curated value trumps stale file tag (false-positive guard
— user edited DB without re-tagging file shouldn't get flagged)
Two existing test fixtures (`_make_context` callers) updated to
the new 10-column row shape.
SQL behavior verified empirically against real SQLite: NULL and
empty-string both flow through NULLIF → None in Python →
file-tag-fallback path. Modern populated values trump file tag.
|
2 weeks ago |
|
|
812db1fbbf |
AcoustID scanner: prefer track_artist for compilation albums
Discord report (Skowl): downloaded a compilation album ("High Tea
Music: Vol 1") where every track has a different artist (Eclypse,
Andromedik, T & Sugah, Gourski, etc.) and the AcoustID scanner
flagged every single track as Wrong Song. The file tags had the
correct per-track artist (e.g. "Eclypse" for "City Lights"), but
the scanner compared against the album-level artist ("Andromedik",
the curator). Raw similarity 12% → Wrong Song flag.
# Why the prior multi-value fix didn't help
Foxxify's case (just-merged PR): AcoustID returned multi-value
credit "Okayracer, aldrch & poptropicaslutz!" — primary IS in the
credit. Splitting found it.
Skowl's case: both sides single-value but DIFFERENT artists.
Splitter has nothing to find — Eclypse simply isn't in "Andromedik".
Different bug.
# Cause
Scanner SQL at `core/repair_jobs/acoustid_scanner.py:281` joined
the `artists` table via `tracks.artist_id` which points at the
ALBUM artist (the curator/label-name applied to every row in a
compilation). The `tracks.track_artist` column already holds the
correct per-track artist for compilations — populated by every
server-scan path (Plex `originalTitle`, Jellyfin `ArtistItems`,
Navidrome per-track `artist`) AND the auto-import / direct-download
post-process flow (`record_soulsync_library_entry` writes it when
different from album artist). Scanner just wasn't reading it.
# Fix
```sql
SELECT t.id, t.title,
COALESCE(NULLIF(t.track_artist, ''), ar.name) AS artist,
...
```
Prefers per-track artist when populated, falls back to album artist
for legacy rows / single-artist albums where `track_artist` is NULL.
`NULLIF(t.track_artist, '')` handles the empty-string-instead-of-null
case some legacy rows might have.
# Composes with Foxxify's multi-value fix
For the rare compilation track where AcoustID ALSO returns a
multi-value credit (e.g. compilation track has multiple credited
performers), both paths work together — `track_artist` gives the
correct expected primary, then the helper splits the credit and
finds it.
# Tests added (2)
- `test_load_db_tracks_prefers_track_artist_for_compilation` —
reporter's exact case: track with `track_artist='Eclypse'` AND
`artist_id` pointing at album artist 'Andromedik' resolves to
'Eclypse'. Second track with NULL `track_artist` falls back to
album artist 'Andromedik' (single-artist + legacy compat).
- `test_load_db_tracks_falls_back_when_track_artist_empty_string`
— empty string in `track_artist` (some legacy rows) → NULLIF
returns NULL → COALESCE falls back to album artist.
Both use a real SQLite DB so the COALESCE/NULLIF logic + JOIN
runs against actual schema (SimpleNamespace fakes can't simulate
JOINs).
# Verification
- 6/6 scanner tests pass (2 new + 4 existing)
- 2586 full suite passes (+2 from prior commit)
- Ruff clean
|
3 weeks ago |
|
|
df304eb016 |
AcoustID scanner: handle multi-value artist credits
Discord report (Foxxify): the AcoustID scanner repair job flagged
multi-artist tracks as Wrong Song because AcoustID returns the
FULL credit ("Okayracer, aldrch & poptropicaslutz!") while the
library DB carries only the primary artist ("Okayracer"). Raw
SequenceMatcher similarity scored ~43% — well below the 60%
threshold — so the scanner created a finding even though the
audio was correct. User couldn't fix without lowering the global
artist threshold to ~30% (which would let real mismatches through).
# Fix
Extended the shared `core/matching/artist_aliases.py::artist_names_match`
helper (originally lifted for #441) with credit-token splitting.
When the actual artist string contains common separators —
- punctuation: `,` `&` `;` `/` `+`
- keywords (whitespace-bounded): `feat.` `ft.` `featuring` `with`
`vs.` `x`
— the helper splits into individual contributors and checks each
against the expected artist. Primary-in-credit cases now resolve
at 100% instead of 43%.
Two pattern groups because punctuation separators don't need
surrounding whitespace, but keyword separators MUST be
whitespace-bounded — otherwise we'd split artists with `x` /
`with` etc. in their names ("JAY-X" → "JAY-" / "" issue).
Composes with the existing alias path: cross-script multi-artist
credits ("Hiroyuki Sawano" expected, "澤野弘之, FeaturedJp"
actual) work via alias-token-against-credit-token compare.
# Wire-in
Scanner at `core/repair_jobs/acoustid_scanner.py:202` replaces
the raw `SequenceMatcher` call with `artist_names_match`. Pass
RAW artist strings (not pre-normalised by `_normalize`) so the
splitter can recognise separators — `_normalize` strips ALL
punctuation, which destroyed the very tokens the splitter needs.
The AcoustID post-download verifier (`core/acoustid_verification.py`)
already routes through `_alias_aware_artist_sim` which calls the
same helper — gets the multi-value benefit automatically without
a separate wire-in.
# New `split_artist_credit` exported helper
Pure-function helper for callers who want token-level access to
the credit list (debugging, UI, future per-token enrichment). Same
splitter logic, exposed as a top-level function.
# Tests added (14)
`tests/matching/test_artist_aliases.py` (+11):
- `TestSplitArtistCredit` — parametrised across 12 credit-string
formats (comma, ampersand, semicolon, slash, plus, feat./ft./
featuring, with, vs., x, single-token, empty), drops empty
tokens, strips per-token whitespace
- `TestMultiValueCreditMatching` — reporter's exact case
(Okayracer in 3-artist credit → 100%), primary in middle/end of
credit, genuine-mismatch still fails, single-token actual falls
through to direct compare, multi-value composes with aliases,
threshold still respected
`tests/test_acoustid_scanner.py` (+3):
- Reporter's case end-to-end through `_scan_file` — fingerprint
99% / title 100% / multi-artist credit → no finding created
- Genuine artist mismatch still creates finding (no false
suppression of real mismatches)
- `JobResultStub` minimal scaffold for the integration tests
# Verification
- 14 new tests pass (49 helper + 5 scanner total in their files)
- 110 matching + scanner tests pass total
- 2584 full suite passes (+25 from baseline 2559)
- Ruff clean
- Reporter's exact case (Okayracer in `Okayracer, aldrch &
poptropicaslutz!`) now scores 100% match → no Wrong Song flag
|
3 weeks ago |
|
|
88e2527b96 |
Fix null-pointer error in acoustid_scanner
The root cause (null track ids) needs to be solved elsewhere, but this is a band-aid for now |
1 month ago |