SoulSync

Commit Graph

Author	SHA1	Message	Date
BoulderBadgeDad	bba0836324	Fix #768 : playlist sync editor refusing to match certain tracks Three compounding bugs hit tracks whose source metadata is YouTube/streaming- shaped — title "Artist - Song", artist "Official Artist"/"Artist - Topic"/ "ArtistVEVO" (reported: "Arctic Monkeys - Do I Wanna Know?" by "Official Arctic Monkeys"). Server-agnostic — affects Plex/Jellyfin/Navidrome, not just the reporter's Navidrome. Bug A — the match fails. The confidence scorer and the editor's reconcile both compared the raw "Artist - Song" title against the library's clean "Song"; the length-ratio penalty + floor drove it to ~0.18 (NO-MATCH), so the track showed unmatched while its server copy showed as an orphan "extra". New pure core/text/source_title.py (clean_source_artist / strip_artist_prefix / canonical_source_track) strips the channel/video decoration, applied at BOTH matching seams: services/sync_service._find_track_in_media_server (tries raw then canonical, keeps the best) and the editor reconcile. Conservative: a title prefix is stripped only when it equals the artist, so "Self-Titled", "Jay-Z", and "Marvin Gaye" (by another artist) are untouched, and the canonical form is an additional best-of candidate so it can only help. Bug B — manual matches never persisted. get_server_playlist_tracks built the per-source entry WITHOUT source_track_id, so "Find & add" posted an empty id and _persist_find_and_add_match returned early. The match reverted to "extra" on reload and re-adding looped. The editor's 3-pass matcher is now lifted to a pure, tested core.sync.playlist_reconcile.reconcile_playlist that includes source_track_id (the frontend at pages-extra.js:1836 already reads + sends it). Bug C — manual match duplicated + delete wiped all copies. "Find & add" always inserted, so linking a source to an already-present server track appended a duplicate (pos 72, 73...); remove filtered out EVERY entry with the target id. New pure core.sync.playlist_edit (plan_playlist_add: link-don't-duplicate when the target is already present; remove_one_occurrence: drop a single copy) wired into the Plex/Jellyfin/Navidrome add + remove branches. Tests (extreme): tests/test_source_title.py (35), tests/test_playlist_reconcile.py (11 — incl. the reported case, parity for override/exact/fuzzy/extra, and duplicate-server handling), tests/test_playlist_edit.py (12). 286 matching/sync tests still pass. Caveats: the sync_service change and the add/remove/editor endpoints are read-verified, not executed against a live media server (none in CI). The pure cores they call are exhaustively unit-tested; output-shape parity of the reconcile lift is covered. Delete removes the first matching copy (duplicates are identical, so harmless).	4 weeks ago
BoulderBadgeDad	174513d351	Fix #769 : playlist sync matched wrong same-artist track with high confidence Tracks NOT in the library were matched to a DIFFERENT song by the SAME artist and reported with high confidence instead of as missing — e.g. "Dani California" -> "Californication" (Red Hot Chili Peppers), "Under The Bridge" -> "Around the World". Root cause: _calculate_track_confidence scores 0.5title + 0.5artist. A same-artist comparison always yields artist = 1.0, so the title score is the only thing that can tell two of an artist's songs apart — but that score is a SequenceMatcher CHARACTER ratio, which over-credits unrelated titles that share a long substring ("californi…" = 0.67) or just a stopword ("the" = 0.62). With the flat 0.5 artist term, anything clearing the weak 0.6 char floor lands at ~0.81-0.83, well over the 0.7 sync threshold. Reproduced on dev: both reported pairs score 0.81/0.83. Fix: new core/text/title_match.py:titles_plausibly_same, called in _calculate_track_confidence right before the floor. It accepts a pair only when it's near-identical char-wise (>=0.85, so typos / punctuation / casing like "Beleive"->"Believe", "HUMBLE."->"Humble" still match) OR the titles share at least one significant (non-stopword) word. Two different songs by the same artist share no content word, so they're rejected and the real track is correctly reported missing. ("the" is a stopword — that's what leaked "Under The Bridge"/"Around the World".) Scoped deliberately: the word-overlap test fires ONLY when at least one side has 2+ content words. For single-word titles there is no other word to share, so it defers to the existing char floor — otherwise legitimate stylized spellings ("Grey"/"Gray", "Tonite"/"Tonight", "4ever"/"Forever") would become new false-negatives. Verified those still match. The few single-word variants that do score low (Ok/Okay, Thru/Through) were already rejected by the pre-existing length-ratio penalty, not by this gate. Both reported false positives now score 0.33/0.31 -> missing. Does NOT address the harder case of two different same-artist songs that DO share a content word (e.g. "Believe"/"Believer") — pre-existing and unworsened. Any residual error fails safe: a false-missing is re-downloaded/wishlisted, vs the old behavior which silently substituted the wrong song. Tests: tests/test_title_match_guard.py (14) — pure-guard unit tests + a 13-pair battery driving the REAL _calculate_track_confidence (genuine matches stay >=0.7, same-artist different songs drop below), plus an explicit no-regression test for stylized single-word spellings. 292 matching/sync tests pass.	4 weeks ago
Broque Thomas	feb6778af4	Address Cin review: extract helpers, indexed pool fetch, tidy nits Three changes folded into one perf+cleanup pass: 1. Indexed fast path for the per-artist pool fetch. The previous `search_tracks(artist=name)` call hit `unidecode_lower(artists.name) LIKE ?`, a function-in-WHERE that can't use `idx_artists_name`. New `MusicDatabase.get_artist_tracks_indexed` does a two-step lookup: exact-name match (indexed) plus a case-insensitive fallback, then `tracks WHERE artist_id IN (...)` via `idx_tracks_artist_id`. Drops per-artist fetch from seconds to milliseconds for the common case. The sync helper falls back to the old LIKE-based `search_tracks` only when the indexed lookup finds nothing, preserving diacritic recall and `tracks.track_artist` feature-artist matches with zero regression. 2. Public text-normalization helper. Lifted the body of `MusicDatabase._normalize_for_comparison` into `core/text/normalize.py:normalize_for_comparison` so callers outside the database layer (matching engine, sync pool, future import-side comparisons) don't reach across the module boundary into a leading-underscore "private" method. The DB method now delegates, so existing internal call sites stay untouched. Sync's lazy pool now imports the public helper. 3. Artist-name walker extracted. `_artist_name` at module level in `services/sync_service.py` replaces two near-identical inline str-or-dict-or-fallback walkers (one in `sync_playlist`, one in `_find_track_in_media_server`). Returns `''` for None instead of the literal string `'None'`. Plus three small tidies from the same review: - `_POOL_FETCH_LIMIT = 10000` constant in place of the literal at the pool-fetch call site. - Trimmed the verbose docstring + comment block on the pool helper. - Set-intersection predicate for the trigger-shape reset in `core/automation/api.py` instead of a two-line `or` chain. Also removed the duplicate `_get_active_media_client()` call at sync_service.py:212/214 — pre-existing wart that was sitting in the same block I was editing. Tests: 21 new tests across `tests/database/`, `tests/sync/`, and `tests/text/`, plus updates to the existing pool tests to cover the new fast/fallback split. Full suite stays green (3953 passing).	1 month ago

Author

SHA1

Message

Date

BoulderBadgeDad

bba0836324

Fix #768 : playlist sync editor refusing to match certain tracks

Three compounding bugs hit tracks whose source metadata is YouTube/streaming-
shaped — title "Artist - Song", artist "Official Artist"/"Artist - Topic"/
"ArtistVEVO" (reported: "Arctic Monkeys - Do I Wanna Know?" by "Official Arctic
Monkeys"). Server-agnostic — affects Plex/Jellyfin/Navidrome, not just the
reporter's Navidrome.

Bug A — the match fails. The confidence scorer and the editor's reconcile both
compared the raw "Artist - Song" title against the library's clean "Song"; the
length-ratio penalty + floor drove it to ~0.18 (NO-MATCH), so the track showed
unmatched while its server copy showed as an orphan "extra". New pure
core/text/source_title.py (clean_source_artist / strip_artist_prefix /
canonical_source_track) strips the channel/video decoration, applied at BOTH
matching seams: services/sync_service._find_track_in_media_server (tries raw
then canonical, keeps the best) and the editor reconcile. Conservative: a title
prefix is stripped only when it equals the artist, so "Self-Titled", "Jay-Z",
and "Marvin Gaye" (by another artist) are untouched, and the canonical form is
an additional best-of candidate so it can only help.

Bug B — manual matches never persisted. get_server_playlist_tracks built the
per-source entry WITHOUT source_track_id, so "Find & add" posted an empty id
and _persist_find_and_add_match returned early. The match reverted to "extra"
on reload and re-adding looped. The editor's 3-pass matcher is now lifted to a
pure, tested core.sync.playlist_reconcile.reconcile_playlist that includes
source_track_id (the frontend at pages-extra.js:1836 already reads + sends it).

Bug C — manual match duplicated + delete wiped all copies. "Find & add" always
inserted, so linking a source to an already-present server track appended a
duplicate (pos 72, 73...); remove filtered out EVERY entry with the target id.
New pure core.sync.playlist_edit (plan_playlist_add: link-don't-duplicate when
the target is already present; remove_one_occurrence: drop a single copy) wired
into the Plex/Jellyfin/Navidrome add + remove branches.

Tests (extreme): tests/test_source_title.py (35), tests/test_playlist_reconcile.py
(11 — incl. the reported case, parity for override/exact/fuzzy/extra, and
duplicate-server handling), tests/test_playlist_edit.py (12). 286 matching/sync
tests still pass.

Caveats: the sync_service change and the add/remove/editor endpoints are
read-verified, not executed against a live media server (none in CI). The pure
cores they call are exhaustively unit-tested; output-shape parity of the
reconcile lift is covered. Delete removes the first matching copy (duplicates
are identical, so harmless).

BoulderBadgeDad

174513d351

Fix #769 : playlist sync matched wrong same-artist track with high confidence

Tracks NOT in the library were matched to a DIFFERENT song by the SAME artist
and reported with high confidence instead of as missing — e.g. "Dani
California" -> "Californication" (Red Hot Chili Peppers), "Under The Bridge"
-> "Around the World".

Root cause: _calculate_track_confidence scores 0.5*title + 0.5*artist. A
same-artist comparison always yields artist = 1.0, so the title score is the
only thing that can tell two of an artist's songs apart — but that score is a
SequenceMatcher CHARACTER ratio, which over-credits unrelated titles that
share a long substring ("californi…" = 0.67) or just a stopword ("the" =
0.62). With the flat 0.5 artist term, anything clearing the weak 0.6 char
floor lands at ~0.81-0.83, well over the 0.7 sync threshold. Reproduced on
dev: both reported pairs score 0.81/0.83.

Fix: new core/text/title_match.py:titles_plausibly_same, called in
_calculate_track_confidence right before the floor. It accepts a pair only
when it's near-identical char-wise (>=0.85, so typos / punctuation / casing
like "Beleive"->"Believe", "HUMBLE."->"Humble" still match) OR the titles
share at least one significant (non-stopword) word. Two different songs by the
same artist share no content word, so they're rejected and the real track is
correctly reported missing. ("the" is a stopword — that's what leaked "Under
The Bridge"/"Around the World".)

Scoped deliberately: the word-overlap test fires ONLY when at least one side
has 2+ content words. For single-word titles there is no other word to share,
so it defers to the existing char floor — otherwise legitimate stylized
spellings ("Grey"/"Gray", "Tonite"/"Tonight", "4ever"/"Forever") would become
new false-negatives. Verified those still match. The few single-word variants
that do score low (Ok/Okay, Thru/Through) were already rejected by the
pre-existing length-ratio penalty, not by this gate.

Both reported false positives now score 0.33/0.31 -> missing. Does NOT address
the harder case of two different same-artist songs that DO share a content
word (e.g. "Believe"/"Believer") — pre-existing and unworsened. Any residual
error fails safe: a false-missing is re-downloaded/wishlisted, vs the old
behavior which silently substituted the wrong song.

Tests: tests/test_title_match_guard.py (14) — pure-guard unit tests + a
13-pair battery driving the REAL _calculate_track_confidence (genuine matches
stay >=0.7, same-artist different songs drop below), plus an explicit
no-regression test for stylized single-word spellings. 292 matching/sync tests
pass.

Broque Thomas

feb6778af4

Address Cin review: extract helpers, indexed pool fetch, tidy nits

Three changes folded into one perf+cleanup pass:

1. Indexed fast path for the per-artist pool fetch. The previous
   `search_tracks(artist=name)` call hit
   `unidecode_lower(artists.name) LIKE ?`, a function-in-WHERE that
   can't use `idx_artists_name`. New `MusicDatabase.get_artist_tracks_indexed`
   does a two-step lookup: exact-name match (indexed) plus a
   case-insensitive fallback, then `tracks WHERE artist_id IN (...)`
   via `idx_tracks_artist_id`. Drops per-artist fetch from seconds to
   milliseconds for the common case. The sync helper falls back to
   the old LIKE-based `search_tracks` only when the indexed lookup
   finds nothing, preserving diacritic recall and `tracks.track_artist`
   feature-artist matches with zero regression.

2. Public text-normalization helper. Lifted the body of
   `MusicDatabase._normalize_for_comparison` into
   `core/text/normalize.py:normalize_for_comparison` so callers outside
   the database layer (matching engine, sync pool, future import-side
   comparisons) don't reach across the module boundary into a
   leading-underscore "private" method. The DB method now delegates,
   so existing internal call sites stay untouched. Sync's lazy pool
   now imports the public helper.

3. Artist-name walker extracted. `_artist_name` at module level in
   `services/sync_service.py` replaces two near-identical inline
   str-or-dict-or-fallback walkers (one in `sync_playlist`, one in
   `_find_track_in_media_server`). Returns `''` for None instead of
   the literal string `'None'`.

Plus three small tidies from the same review:

- `_POOL_FETCH_LIMIT = 10000` constant in place of the literal at the
  pool-fetch call site.
- Trimmed the verbose docstring + comment block on the pool helper.
- Set-intersection predicate for the trigger-shape reset in
  `core/automation/api.py` instead of a two-line `or` chain.

Also removed the duplicate `_get_active_media_client()` call at
sync_service.py:212/214 — pre-existing wart that was sitting in the
same block I was editing.

Tests: 21 new tests across `tests/database/`, `tests/sync/`, and
`tests/text/`, plus updates to the existing pool tests to cover the
new fast/fallback split. Full suite stays green (3953 passing).

3 Commits (2fcdfd3145d415355b9202c406d8985fda8eb003)