mirror of https://github.com/Nezreka/SoulSync.git
285 Commits (cd9e4abc7c54c9fdc41b2529366a5b4d2ba269cb)
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
174513d351 |
Fix #769: playlist sync matched wrong same-artist track with high confidence
Tracks NOT in the library were matched to a DIFFERENT song by the SAME artist
and reported with high confidence instead of as missing — e.g. "Dani
California" -> "Californication" (Red Hot Chili Peppers), "Under The Bridge"
-> "Around the World".
Root cause: _calculate_track_confidence scores 0.5*title + 0.5*artist. A
same-artist comparison always yields artist = 1.0, so the title score is the
only thing that can tell two of an artist's songs apart — but that score is a
SequenceMatcher CHARACTER ratio, which over-credits unrelated titles that
share a long substring ("californi…" = 0.67) or just a stopword ("the" =
0.62). With the flat 0.5 artist term, anything clearing the weak 0.6 char
floor lands at ~0.81-0.83, well over the 0.7 sync threshold. Reproduced on
dev: both reported pairs score 0.81/0.83.
Fix: new core/text/title_match.py:titles_plausibly_same, called in
_calculate_track_confidence right before the floor. It accepts a pair only
when it's near-identical char-wise (>=0.85, so typos / punctuation / casing
like "Beleive"->"Believe", "HUMBLE."->"Humble" still match) OR the titles
share at least one significant (non-stopword) word. Two different songs by the
same artist share no content word, so they're rejected and the real track is
correctly reported missing. ("the" is a stopword — that's what leaked "Under
The Bridge"/"Around the World".)
Scoped deliberately: the word-overlap test fires ONLY when at least one side
has 2+ content words. For single-word titles there is no other word to share,
so it defers to the existing char floor — otherwise legitimate stylized
spellings ("Grey"/"Gray", "Tonite"/"Tonight", "4ever"/"Forever") would become
new false-negatives. Verified those still match. The few single-word variants
that do score low (Ok/Okay, Thru/Through) were already rejected by the
pre-existing length-ratio penalty, not by this gate.
Both reported false positives now score 0.33/0.31 -> missing. Does NOT address
the harder case of two different same-artist songs that DO share a content
word (e.g. "Believe"/"Believer") — pre-existing and unworsened. Any residual
error fails safe: a false-missing is re-downloaded/wishlisted, vs the old
behavior which silently substituted the wrong song.
Tests: tests/test_title_match_guard.py (14) — pure-guard unit tests + a
13-pair battery driving the REAL _calculate_track_confidence (genuine matches
stay >=0.7, same-artist different songs drop below), plus an explicit
no-regression test for stylized single-word spellings. 292 matching/sync tests
pass.
|
2 weeks ago |
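The guard described in 174513d351 can be sketched roughly as follows. This is a minimal illustration, not the code in core/text/title_match.py: the stopword list, the normalization step, and the helper names `_normalize` / `_content_words` are assumptions; only the 0.85 threshold, the "shared content word" rule, and the single-word deferral come from the commit message.

```python
import re
from difflib import SequenceMatcher

# Assumed stopword list for illustration; the real list is not shown in the commit.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}
NEAR_IDENTICAL = 0.85  # threshold quoted in the commit message


def _normalize(title: str) -> str:
    """Lowercase and drop punctuation so 'HUMBLE.' compares equal to 'Humble'."""
    return re.sub(r"[^\w\s]", "", title.lower()).strip()


def _content_words(title: str) -> set:
    """Significant (non-stopword) words of a title."""
    return {w for w in _normalize(title).split() if w not in STOPWORDS}


def titles_plausibly_same(a: str, b: str) -> bool:
    # Near-identical character-wise: typos, casing, punctuation still match.
    if SequenceMatcher(None, _normalize(a), _normalize(b)).ratio() >= NEAR_IDENTICAL:
        return True
    words_a, words_b = _content_words(a), _content_words(b)
    # Single-word titles have no other word to share, so the gate does not
    # fire and the caller's existing character floor decides.
    if max(len(words_a), len(words_b)) < 2:
        return True
    # Otherwise require at least one shared content word.
    return bool(words_a & words_b)
```

With these assumptions, both reported pairs ("Dani California"/"Californication" and "Under The Bridge"/"Around the World") are rejected, while "HUMBLE."/"Humble" and "Grey"/"Gray" are accepted.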
|
|
efe3895d5d |
Fix: metadata cache tables silently missing after DB recovery (stale migration marker)
Nothing was landing in the metadata cache browser because the metadata_cache_entities / metadata_cache_searches tables did not exist, so every cache write no-op-ed. Root cause: _add_metadata_cache_tables short-circuited on a marker-only guard (if the metadata_cache_v1 marker row exists, return). After a DB corruption-recovery the small metadata table (with the marker) survived but the large cache tables did not, so the stale marker permanently blocked the idempotent CREATE TABLE IF NOT EXISTS and the cache was dead forever. Guard now skips only when the marker is set AND the tables actually exist, so a stale marker self-heals: the tables are re-created on the next init. Tests: marker present but tables dropped -> re-created; marker + tables present -> no-op (idempotent). |
2 weeks ago |
|
|
ce9ec3f6f4 |
Manual library match: accept non-numeric library track ids (#754)
The save endpoint coerced library_track_id with int(), which rejected every non-numeric id with "Invalid library track id". Library ids are str(ratingKey) — numeric for Plex but GUIDs/hashes for Navidrome, Jellyfin, and other Subsonic servers — and are stored in the TEXT tracks.id column, so the coercion broke manual matching on every non-Plex server. Replace the int() coercion with a normalize_library_track_id() helper that trims and rejects only empty input, passing the opaque string id straight through. Plex numeric ids are unaffected (SQLite INTEGER affinity still stores a clean numeric string as an int, so existing matches are byte-identical) and no schema migration is needed (the INTEGER column already stores non-numeric ids as text). Tests: pure-helper cases (numeric/GUID/whitespace/empty) plus a real-DB round-trip proving a GUID id saves, reads back unchanged, and enriches. |
2 weeks ago |
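A helper with the behavior ce9ec3f6f4 describes (trim, reject only empty input, pass the opaque id through) could be as small as the sketch below. The signature and the exception type are assumptions; only the function name and the error text appear in the commit message.

```python
def normalize_library_track_id(raw) -> str:
    """Return the library track id as a trimmed, opaque string.

    Numeric Plex ids and GUID/hash ids from other servers are treated alike;
    only missing or blank input is rejected.
    """
    if raw is None:
        raise ValueError("Invalid library track id")
    value = str(raw).strip()
    if not value:
        raise ValueError("Invalid library track id")
    return value
```

For example, `normalize_library_track_id(" 12345 ")` yields `"12345"`, and a GUID such as `"3f2a9c1e-0b7d-4e55-9a10-7c2d8e4f6b21"` is returned unchanged.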
|
|
bf2a2ca928 |
Player: log SoulSync web-player plays (recently-played + smart-radio recency)
listening_history was populated ONLY from the media server; the web player recorded nothing. Now a play heard ~10s logs to listening_history AND bumps tracks.play_count/last_played, so the existing 'recently played' query reflects actual SoulSync listening, and the Phase-2 smart-radio recency signal gets real data.
- core/playback/play_log.build_play_event(): pure, DB-agnostic normalizer from player payload -> listening_history event shape. Caller supplies the timestamp (stays pure). Composite/streamed ids never become the int db_track_id; bool ids rejected; missing title -> skip. 9 unit tests.
- MusicDatabase.record_web_player_play(): inserts the history row + increments play_count/last_played for the library track in one call.
- /api/library/log-play: thin endpoint, server-side timestamp, best-effort (logging failure never 500s / never affects playback).
- Frontend: npMaybeLogPlay on timeupdate fires once per track at the 10s threshold (flag reset in setTrackInfo, set-before-fetch so it can't double-fire), fully fire-and-forget.
Pure builder is unit-tested; the DB write can't run in-sandbox (real DB throws) so it's a thin straightforward insert+update. JS + web_server parse clean.
|
2 weeks ago |
|
|
c3aea58b03 |
Player revamp Phase 2: smart radio ranking (play-count + popularity)
Replaces radio's pure ORDER BY RANDOM() with weighted ranking. Each tier now
fetches a generous random POOL (4x the needed count, floored) and
core/radio/selection ranks it before the collector keeps the best:
score_candidate = play_count(log-damped, w=1.0)
+ lastfm_playcount(log-damped, w=0.5)
- recently_played penalty(w=2.0)
+ stable per-id jitter(w=1.0, hash-derived so runs vary but
tests stay reproducible)
Modest weights so popularity guides without burying lesser-played tracks, and
jitter keeps radio from being identical every run. All intelligence is in pure
functions (rank_candidates / score_candidate) so it's tunable + unit-testable
without SQL.
Defensive: the DB method probes PRAGMA table_info(tracks) and omits
play_count/lastfm_playcount from the SELECT when absent (older DBs predating
the listening-history migration) — the scorer treats missing signals as 0, so
radio degrades to jitter-only instead of crashing on 'no such column'.
Tests (tests/radio/, 43 total):
- score_candidate / rank_candidates: deterministic unit coverage (popularity
ordering, lastfm contribution, recency penalty, garbage→0, stable jitter).
These CANNOT pass against pre-Phase-2 code.
- DB end-to-end: ranking surfaces the heavily-played track first out of a
decoy pool (wiring proof — probabilistic vs old random, documented honestly);
plus a no-rank-columns DB proving the defensive degrade path.
- All Phase-0a behavioral/refactor-equivalence tests still green.
60 radio + adjacent-DB tests pass; ruff clean.
|
2 weeks ago |
|
|
cbc001e283 |
Player revamp Phase 0a: extract radio selection into testable core/radio/
First step of the stream/player/radio revamp (see revamp_plan.md). The radio
algorithm lived inline inside database.music_database.get_radio_tracks as raw
SQL tangled with selection logic — untestable without a live DB (which also
throws in the dev sandbox). Lifted the pure DECISIONS into core/radio/selection.py:
- parse_tags / merge_tags — JSON-or-CSV tag fields → ordered deduped list
- same_artist_cap — tier-1 30%-floored-at-5 cap
- build_like_conditions — OR-of-LIKEs SQL fragment + params per tier
- RadioCollector — dedup + cap + exclude-set + NOT-IN placeholder/value tracking
The DB method keeps the cursor work and now delegates every decision to these
helpers. Faithful extraction, not a rewrite — behavior unchanged.
This is the kettui foundation move: radio is now unit-testable, so Phase 2
(smart ranking — play-count / recency / feature seeding) becomes 'evolve a
tested function' instead of 'rewrite SQL and pray'.
Tests (tests/radio/):
- test_selection.py (22): unit coverage of every extracted helper
- test_get_radio_tracks_db.py (7): drive the REAL get_radio_tracks against
in-memory sqlite — tier fallback, dedup, exclude, file_path filter.
Behavior-pinned: these 7 pass against BOTH old inline and new extracted
code (refactor-equivalence proof). 52 adjacent DB+radio tests green.
|
2 weeks ago |
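Three of the helpers named in cbc001e283 are simple enough to sketch. The function names are from the commit message; the signatures, the case-insensitive dedup, and the reading of "30%-floored-at-5" as `max(5, 30% of the limit)` are assumptions.

```python
import json


def merge_tags(*tag_lists):
    """Merge tag lists into one ordered list, dropping blanks and duplicates."""
    seen, merged = set(), []
    for tags in tag_lists:
        for tag in tags:
            cleaned = str(tag).strip()
            key = cleaned.lower()  # assumed: dedup ignores case
            if cleaned and key not in seen:
                seen.add(key)
                merged.append(cleaned)
    return merged


def parse_tags(raw):
    """Parse a tag field stored either as a JSON array or as a CSV string."""
    if not raw or not isinstance(raw, str):
        return []
    try:
        parsed = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        parsed = None
    items = parsed if isinstance(parsed, list) else raw.split(",")
    return merge_tags(items)


def same_artist_cap(limit: int) -> int:
    """Tier-1 cap: 30% of the requested limit, but never below 5."""
    return max(5, limit * 3 // 10)
```

Keeping these as pure functions is what lets them be unit-tested without a live database, as the commit describes.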
|
|
b55faff54b |
DB: add schema_migrations ledger + PRAGMA user_version backstop
Migration state was scattered across PRAGMA-table_info guards, sentinel marker tables (_genius_search_fix_applied, ...) and metadata-flag rows (id_columns_migrated, ...), with no single source of truth and no schema version — so a half-migrated DB was undetectable. Add a non-gating backstop: a schema_migrations(name, applied_at) ledger plus a _sync_migration_ledger pass (runs last in init) that back-fills the ledger from the existing signals and stamps PRAGMA user_version. ADDITIVE only — existing migrations keep their own idempotency gates; nothing decides whether a migration runs based on the ledger or the version. New one-time migrations call _record_migration (the genres migration already does). Tests: tests/test_db_migration_ledger.py — table exists, user_version stamped, record idempotent, genres recorded on fresh init, backfill from flag + marker, absent signals not recorded. |
2 weeks ago |
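A non-gating ledger of the kind b55faff54b describes could be sketched as below. The table name, its two columns, and `PRAGMA user_version` are from the commit message; the version number, the function signatures, and the way detected migrations are passed in are assumptions.

```python
import sqlite3

SCHEMA_VERSION = 1  # hypothetical value; the real version is not shown


def ensure_ledger(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "name TEXT PRIMARY KEY, "
        "applied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )


def record_migration(conn: sqlite3.Connection, name: str) -> None:
    """Idempotent: recording the same migration twice keeps a single row."""
    conn.execute("INSERT OR IGNORE INTO schema_migrations (name) VALUES (?)", (name,))


def sync_migration_ledger(conn: sqlite3.Connection, detected_names) -> None:
    """Back-fill the ledger from already-detected signals and stamp the version.

    Purely additive bookkeeping: nothing here decides whether a migration runs.
    """
    ensure_ledger(conn)
    for name in detected_names:
        record_migration(conn, name)
    # PRAGMA values cannot be bound as parameters, so format a checked int.
    conn.execute(f"PRAGMA user_version = {int(SCHEMA_VERSION)}")
    conn.commit()
```

Here `detected_names` stands in for whatever the existing guards report (marker tables, metadata-flag rows); absent signals are simply not passed in and therefore not recorded.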
|
|
c5b02c0026 |
DB: normalize legacy comma-separated genres to canonical JSON
artists.genres / albums.genres stored EITHER a JSON array (new writes) OR a legacy comma-separated string (old writes), forcing every reader to try-JSON-then-split. Add a marker-gated one-time migration (_normalize_genres_to_json) that rewrites legacy rows to JSON in place, mirroring the readers' exact parse (JSON list, else comma-split/strip/drop-empties) so genre VALUES are unchanged; only the storage format changes. Per-row diffed (already-canonical rows untouched, no churn) and non-fatal on error, consistent with the other migrations. Readers still tolerate both formats, so this breaks nothing; it just removes the dual-format debt.
Tests: tests/test_db_genres_json_normalization.py: CSV->JSON, JSON-unchanged, whitespace/empties dropped, albums table, legacy-reader-equivalence, idempotent re-run, marker set on fresh init.
|
2 weeks ago |
|
|
2bb935b9d7 |
DB: stop watchlist_artists rebuilds from dropping amazon_artist_id
amazon_artist_id is added to watchlist_artists via ALTER (music_database.py ~1732), but both table-rebuild migrations — the spotify_id-nullable fix (_fix_watchlist_spotify_id_nullable, two CREATE variants) and the profile-scoped UNIQUE rebuild — recreated the table from a hardcoded column list that omitted amazon_artist_id. Because shared_cols filters new_cols against the old table, the column and any stored Amazon artist IDs were silently dropped on every init (fresh OR upgraded), so Amazon watchlist IDs never persisted at all. Fix: add amazon_artist_id to all three rebuild CREATE schemas, both rebuild new_cols lists, and the base CREATE TABLE (so fresh installs are consistent and don't rely on the ALTER). Purely additive, column-named inserts + Row factory mean column position is irrelevant. Tests (tests/test_db_watchlist_amazon_id_migration.py): drive the real migrations via MusicDatabase() against a seeded pre-migration temp DB and assert the column + data survive; differential-proven to FAIL pre-fix. |
2 weeks ago |
|
|
f7ed41867d |
Fix: enhanced artist view 404s for library artists opened via source ID
Opening a library artist from a non-library search result (e.g. a
MusicBrainz hit) leaves the artist-detail page holding the source ID —
the MBID — not the integer library PK. The standard /api/artist-detail
route resolves that via find_library_artist_for_source, but the
enhanced-view (`/api/library/artist/<id>/enhanced`) and quality-analysis
endpoints call get_artist_full_detail directly with whatever ID the page
holds. Its lookup was `WHERE id = ?` only, so it 404'd ("Artist with ID
<mbid> not found") and the enhanced view failed to load.
When the direct PK lookup misses, fall back to matching any per-service
ID column, reusing SOURCE_ID_FIELD as the single source of truth so the
resolution covers every source (MusicBrainz, Spotify, Deezer, iTunes,
Discogs, Hydrabase, Amazon), not just MusicBrainz.
Adds 4 isolated DB-method tests: direct PK still works, resolves by
MBID, resolves by Spotify ID, and unknown IDs still 404.
|
2 weeks ago |
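The fallback lookup in f7ed41867d can be illustrated with the sketch below. `SOURCE_ID_FIELD` is named in the commit, but its contents here are a hypothetical subset (only the `"amazon" -> "amazon_id"` entry appears elsewhere in this log), and the function name and table layout are assumptions.

```python
import sqlite3

# Hypothetical subset of the source -> column mapping.
SOURCE_ID_FIELD = {
    "musicbrainz": "musicbrainz_id",
    "spotify": "spotify_id",
    "amazon": "amazon_id",
}


def find_artist_row(conn: sqlite3.Connection, artist_id):
    """Look up an artist by library PK, then by any per-service ID column."""
    row = conn.execute(
        "SELECT id, name FROM artists WHERE id = ?", (artist_id,)
    ).fetchone()
    if row is not None:
        return row
    # Column names come from a fixed in-code mapping, never from user input,
    # so building the OR clause with string formatting is safe here.
    clause = " OR ".join(f"{col} = ?" for col in SOURCE_ID_FIELD.values())
    return conn.execute(
        f"SELECT id, name FROM artists WHERE {clause} LIMIT 1",
        [str(artist_id)] * len(SOURCE_ID_FIELD),
    ).fetchone()
```

Driving the OR clause from the mapping means a newly added source is covered by adding one dictionary entry, which is the "single source of truth" point the commit makes.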
|
|
96e6ba0ed7 |
Preserve Navidrome album cover art
Expose Navidrome album coverArt as a Subsonic getCoverArt thumbnail so library refreshes keep a real album-art URL. Preserve existing album thumb_url when an incoming server album has no thumbnail, preventing manual or server-corrected covers from being cleared and later replaced by loose missing-cover searches. Add regression tests for Navidrome album thumbnails and DB thumb preservation. |
3 weeks ago |
|
|
dfdc6c6277 |
Restyle Auto-Sync manager and fix loading regressions
Three problems wrapped into one pass on the Playlist Auto-Sync surface:
1. Visual: the manager modal had its own vibe (radial gradient, pill tabs, sky-blue chrome) that didn't line up with the rest of the app. Reworked the modal shell, KPI summary, live pipeline monitor, tab bar, schedule board sidebar, and column cards to use the standard SoulSync patterns: gradient `#1a1a1a → #121212`, accent-tinted 1px border, 20px radius, underline tabs, dense dark card pattern that Automations + Library pages already use. Modal now uses near-full screen so there's room for the schedule board without horizontal scroll pain. Run history cards followed the same path: slim horizontal row mirroring `.automation-card` plus an expanded detail that mirrors the Automations run-history modal (stats-grid + facts row + result pills + log section).
2. Hang: the previous SQL fix for the run-history "in library" count added `COLLATE NOCASE` on the join columns of `tracks` and `artists`. SQLite can't use `idx_artists_name` or `idx_tracks_title` when the comparison collation doesn't match the column collation, so the join did a full table scan per mirrored playlist track. ~18s per playlist × 30 playlists = `/api/mirrored-playlists` hung indefinitely and the modal stayed at "Loading schedule…" forever. Switched the join back to case-sensitive equality (~6ms per playlist, 3000× faster). Spotify names canonicalize to the same form as library imports so the recall loss is in the rounding error of pure case-only mismatches.
3. Slowness: even after the hang fix, each modal open spent ~1.5s gathering per-playlist status counts. The endpoint looped `get_mirrored_playlist_status_counts(playlist_id)` per row, which opened a fresh SQLite connection + PRAGMA setup each time. Added `get_all_mirrored_playlist_status_counts(profile_id)` which returns counts for every mirrored playlist owned by the active profile in 4 batched `GROUP BY` queries over a single connection. Modal load dropped to ~280ms.
Also fixed: `tracks.artist` reference in `get_mirrored_playlist_status_counts` that never worked since the schema went relational. The query threw "no such column", got swallowed by the try/except, and the in-library count silently defaulted to 0 on every playlist. Rewired to join through `artists`. `get_mirrored_playlist_status_counts` (single-playlist) kept for callers that still want it, but the modal endpoint uses the batched version.
|
3 weeks ago |
|
|
efdcde1892 |
Add playlist auto-sync run history
Persist per-playlist pipeline run snapshots from the shared playlist pipeline, expose a history API, and upgrade the Auto-Sync modal with live pipeline monitoring, Run now controls, and a runs-style history tab. |
3 weeks ago |
|
|
9b086c5a65 |
Add owned_by column for Auto-Sync schedule ownership
The Auto-Sync schedule board was detecting its own automations by
checking `group_name === 'Playlist Auto-Sync' || name.startsWith('Auto-Sync:')`.
That's fragile — renaming the row from the Automations page silently
hands ownership back to the read-only Automation Pipelines tab and the
board stops managing it.
This commit replaces the string convention with an explicit
`automations.owned_by` TEXT column:
- Migration `_add_automation_owned_by_column` adds the column and
backfills `'auto_sync'` for existing rows that match the legacy
`group_name`/`name`-prefix pattern, so users running the migration
don't lose their schedules.
- `database.create_automation` and `database.update_automation` accept
`owned_by` (the latter via its `allowed` kwarg set).
- `core/automation/api.py` forwards `owned_by` on both POST and PUT.
Missing field is left as None, preserving today's behavior for every
caller that doesn't opt in.
- The Auto-Sync schedule board posts `owned_by: 'auto_sync'` and the
detection helper now prefers that signal, falling back to the legacy
name/group convention so any hand-rolled rows still show up.
Tests: three new cases in `tests/automation/test_automation_api.py`
covering create-with-owned-by, create-without (defaults to None), and
update set/clear. The fake DB grew the matching kwarg.
|
3 weeks ago |
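The ownership check from 9b086c5a65 lives in the frontend, but its logic can be sketched in Python as follows. The `owned_by` value, group name, and name prefix are from the commit message; the function name and the dict shape of an automation row are assumptions.

```python
LEGACY_GROUP = "Playlist Auto-Sync"
LEGACY_PREFIX = "Auto-Sync:"


def is_auto_sync_owned(automation: dict) -> bool:
    """Prefer the explicit owned_by signal; fall back to the legacy convention."""
    if automation.get("owned_by") == "auto_sync":
        return True
    # Legacy fallback so hand-rolled or not-yet-backfilled rows still show up.
    if automation.get("group_name") == LEGACY_GROUP:
        return True
    return str(automation.get("name") or "").startswith(LEGACY_PREFIX)
```

With the explicit column, a row renamed from the Automations page stays owned by the schedule board because the name is no longer the only signal.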
|
|
feb6778af4 |
Address Cin review: extract helpers, indexed pool fetch, tidy nits
Three changes folded into one perf+cleanup pass:
1. Indexed fast path for the per-artist pool fetch. The previous `search_tracks(artist=name)` call hit `unidecode_lower(artists.name) LIKE ?`, a function-in-WHERE that can't use `idx_artists_name`. New `MusicDatabase.get_artist_tracks_indexed` does a two-step lookup: exact-name match (indexed) plus a case-insensitive fallback, then `tracks WHERE artist_id IN (...)` via `idx_tracks_artist_id`. Drops per-artist fetch from seconds to milliseconds for the common case. The sync helper falls back to the old LIKE-based `search_tracks` only when the indexed lookup finds nothing, preserving diacritic recall and `tracks.track_artist` feature-artist matches with zero regression.
2. Public text-normalization helper. Lifted the body of `MusicDatabase._normalize_for_comparison` into `core/text/normalize.py:normalize_for_comparison` so callers outside the database layer (matching engine, sync pool, future import-side comparisons) don't reach across the module boundary into a leading-underscore "private" method. The DB method now delegates, so existing internal call sites stay untouched. Sync's lazy pool now imports the public helper.
3. Artist-name walker extracted. `_artist_name` at module level in `services/sync_service.py` replaces two near-identical inline str-or-dict-or-fallback walkers (one in `sync_playlist`, one in `_find_track_in_media_server`). Returns `''` for None instead of the literal string `'None'`.
Plus three small tidies from the same review:
- `_POOL_FETCH_LIMIT = 10000` constant in place of the literal at the pool-fetch call site.
- Trimmed the verbose docstring + comment block on the pool helper.
- Set-intersection predicate for the trigger-shape reset in `core/automation/api.py` instead of a two-line `or` chain.
Also removed the duplicate `_get_active_media_client()` call at sync_service.py:212/214, a pre-existing wart that was sitting in the same block I was editing.
Tests: 21 new tests across `tests/database/`, `tests/sync/`, and `tests/text/`, plus updates to the existing pool tests to cover the new fast/fallback split. Full suite stays green (3953 passing).
|
3 weeks ago |
|
|
73bd2db547 |
Harden playlist pipeline source refresh
Centralize mirrored playlist source reference normalization so edited links and IDs are stored consistently. Preserve URL-backed refresh refs, surface missing-source refresh failures, count background sync failures in pipeline summaries, and retry guarded automation skips after a short delay instead of losing a scheduled run. Add focused coverage for source refs, mirrored playlist source updates, refresh failures, and guarded retry behavior. |
3 weeks ago |
|
|
b9af4ef4ef |
Handle transient SQLite IO during maintenance
Keep full refresh moving when post-clear VACUUM hits a transient disk I/O error, and retry clear_server_data once when the clear step itself sees the same transient SQLite failure. Retry metadata cache maintenance writes once on transient disk I/O errors so first-attempt cache jobs do not fail when an immediate retry would succeed. Tests cover best-effort VACUUM, clear retry behavior, and cache maintenance retry behavior. |
3 weeks ago |
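The retry-once behavior in b9af4ef4ef can be sketched as a small wrapper. The helper names and the message-based detection of a transient error are assumptions; SQLite reports this failure as an `OperationalError` whose text contains "disk I/O error".

```python
import sqlite3


def is_transient_io_error(exc: Exception) -> bool:
    """Assumed heuristic: match SQLite's 'disk I/O error' message."""
    return isinstance(exc, sqlite3.OperationalError) and "disk i/o error" in str(exc).lower()


def run_with_one_retry(operation):
    """Run `operation`; on a transient disk I/O error, retry exactly once.

    Any other error, or a second failure, propagates to the caller.
    """
    try:
        return operation()
    except sqlite3.OperationalError as exc:
        if not is_transient_io_error(exc):
            raise
        return operation()
```

A best-effort step such as the post-clear VACUUM would instead catch the error and continue, since the commit says a full refresh should keep moving when that step fails.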
|
|
f1d4f78e0e |
Repair stale media schema during refresh
Ensure upgraded databases have the tracks.file_size and albums.api_track_count columns after all legacy migrations run. Add defensive repair paths for Jellyfin track imports and album track-count caching so stale schemas self-heal instead of dropping full-refresh track imports. Tests cover legacy schema repair and api_track_count self-repair. |
3 weeks ago |
|
|
f3ad65de34 |
Complete MusicBrainz watchlist source parity
Add MusicBrainz watchlist artist ID storage, badges, linked-provider editing, and per-artist preferred source support. Backfill watchlist MusicBrainz matches from already-enriched library artists so existing MusicBrainz worker matches appear in watchlist cards and settings. Extend bulk watchlist add, liked artist matching, artist map source picking, and service status labels to recognize MusicBrainz, with regression tests for watchlist ID persistence and backfill. |
4 weeks ago |
|
|
5bc5fbb662 |
Add MusicBrainz as a metadata source
Register MusicBrainz as a first-class metadata source alongside Deezer, iTunes, Spotify, Discogs, and Hydrabase. Expose the shared client through metadata services, add the settings option, and expand the MusicBrainz search adapter with source-compatible artist, album, track, and detail methods. Carry MusicBrainz IDs through similar-artist discovery, recommended artists, artist map serialization, and personalized playlist selection. Update DB migrations and lookup filters so similar_artist_musicbrainz_id is preserved on older schemas and used for source requirements and library exclusion. Normalize MusicBrainz album adapter output for import context and add regression coverage for registry mapping, typed album conversion, and similar-artist filtering. Verified by user with 120 focused tests passing. |
4 weeks ago |
|
|
aaf312cd34 |
Honor manual library matches across source labels
Manual matches can be created from sync history as mirrored while wishlist and download flows later see the same track as wishlist or a provider source. Add a shared track-level lookup that falls back from exact source/id to source_track_id and title/artist, then use it for wishlist adds, cleanup, and download analysis so mapped tracks are not re-added or redownloaded. Add coverage for mirrored-source matches being honored by wishlist cleanup and download batches, including the internal wishlist force-download path. |
4 weeks ago |
|
|
e061f12a05 |
Filter owned artists from discovery recommendations
|
4 weeks ago |
|
|
025007b97f |
Tighten artist discography soundtrack matching
|
4 weeks ago |
|
|
0345478361 |
Skip wishlist adds for manual library matches
|
4 weeks ago |
|
|
42f4aa5eac |
Add manual library track matching
|
4 weeks ago |
|
|
3b62bcab0c |
Add missing-track import from existing library files
Show actionable missing album tracks in the enhanced library from canonical metadata, with a practical Manage flow for Add to Library or I Have This. Implement I Have This as a non-destructive copy/import path: copy the chosen existing file, run normal post-processing with the missing track context, insert the real library row, and inherit album identity tags from target siblings so Navidrome does not split albums. Improve the modal with selectable search results, visible import progress, disabled controls during import, and missing-track row styling. |
4 weeks ago |
|
|
42a833fcb2 |
Amazon Music: UI badges, enrichment match chips, watchlist linking, metadata cache
- Artist cards, hero section, and enhanced view now show Amazon Music badges
when amazon_id is populated (AMAZON_LOGO_URL constant, orange #FF9900 brand)
- Enhanced view artist and album match status rows include amazon_match_status
chip with click-to-rematch via openManualMatchModal
- getServiceUrl: added amazon (album/track ASIN → music.amazon.com) and fixed
missing discogs entries; serviceLabels adds tidal/qobuz/amazon
- Enhanced view enhanced-artist-id-badges includes amazon_id entry
- DB SELECTs for library artists list and artist detail now return amazon_id;
both response dicts include the field
- watchlist_artists migration adds amazon_artist_id column
- Watchlist config GET: amazon_artist_id in SELECT/WHERE/response (index 18)
- Watchlist artists list response includes amazon_artist_id
- link-provider endpoint: amazon added to valid_providers and col_map
- _populateLinkedProviderSection: amazonId param + Amazon Music source row
- Watchlist card source badges render Amazon pill (watchlist-source-amazon CSS)
- _openSourceSearch labels map includes amazon
- service_search: amazon_worker injected via init(); _search_service amazon branch
uses search_artists/albums/tracks, same {id,name,image,extra} return shape
- _SERVICE_ID_COLUMNS: amazon → amazon_id for artist/album/track
- _init_service_search call passes amazon_worker_obj
- amazon_client._fetch_album_metas: 5-minute TTL cache per ASIN — cached hits
skip _rate_limit() and HTTP call entirely; fixes ~10s artist detail load
- registry.py: removed amazon from METADATA_SOURCE_PRIORITY and
METADATA_SOURCE_LABELS — T2Tunes has no discography API, cannot serve as a
primary metadata source; Amazon remains a download source + ASIN enricher
- Settings metadata source dropdown and help text updated accordingly
|
4 weeks ago |
|
|
4fce832ae1 |
Add Amazon Music enrichment worker
Background worker matching library artists/albums/tracks to Amazon ASINs via T2Tunes search. Follows same 6-tier priority queue as Deezer/iTunes/Spotify/Qobuz/Tidal workers. Backfills artist thumbnails from album cover stand-ins (T2Tunes exposes no direct artist images).
- core/amazon_worker.py: new AmazonWorker class with full parity
- database/music_database.py: expand _add_amazon_columns to cover amazon_id/amazon_match_status/amazon_last_attempted on artists, albums, and tracks (was artists-only)
- web_server.py: import, init, register in enrichment panel, add to scan pause/resume dicts and rate monitor key map
- helper.js: WHATS_NEW 2.5.3 entry for enrichment worker
|
4 weeks ago |
|
|
121651da2c |
Add amazon_id column to artists table for full source parity
Schema: ALTER TABLE artists ADD COLUMN amazon_id TEXT with index, added via _add_amazon_columns migration called after Discogs in _run_migrations.
SOURCE_ID_FIELD: add "amazon" -> "amazon_id" entry. find_library_artist_for_source now looks up Amazon artists by slug before falling back to name match, same as every other source. artist_source_detail already stamps artist_info[source_id_field] = artist_id so the amazon_id is set on source-only payloads.
Tests: add "amazon": "amazon_id" to EXPECTED_SOURCE_ID_FIELD; revert test assertion back to strict equality (SOURCE_ONLY_ARTIST_SOURCES == SOURCE_ID_FIELD.keys() holds again now that amazon has a column).
|
4 weeks ago |
|
|
877d0e7d81 |
Personalized pipeline: auto-refresh stale snapshots after watchlist scan
Snapshots now track when their source data changes. Watchlist scan emits stale flags on the playlists whose underlying pool just got refreshed; the next pipeline run sees the flag and regenerates the snapshot before syncing, so the server playlist never lags the source.
Schema:
- new `is_stale INTEGER NOT NULL DEFAULT 0` column on `personalized_playlists`, plus an idempotent ADD COLUMN migration in `ensure_personalized_schema` for installs created before this PR.
- `PlaylistRecord.is_stale: bool = False` exposed on the dataclass so callers can branch on freshness without re-querying.
Manager:
- new `mark_kinds_stale(kinds, profile_id=None)` flips the flag in bulk for a list of kinds (used by upstream data refreshers).
- `_persist_snapshot` clears `is_stale = 0` on successful refresh.
- SELECT statements + `_row_to_record` updated to read the column (with tuple-form length guard for safety).
Pipeline:
- `_build_payloads_for_kinds` now branches: refresh_first=True OR `existing.is_stale` -> refresh_playlist, else read existing snapshot. So the auto-refresh kicks in without needing the user to toggle the refresh-each-run option.
Watchlist scanner emits stale flags at three sites:
- after `update_discovery_pool_timestamp` -> marks pool-fed kinds stale: hidden_gems, discovery_shuffle, popular_picks, time_machine, genre_playlist, daily_mix.
- after release_radar `save_curated_playlist` -> marks `fresh_tape`.
- after discovery_weekly `save_curated_playlist` -> marks `archives`.
All three calls go through a module-level `_mark_personalized_kinds_stale` helper that builds a PersonalizedPlaylistManager with `deps=None` (only DB access is needed for the flag update; no generator dispatch). Each call is wrapped in try/except so a flag failure can never abort the scan itself.
Tests:
- new `TestStaleFlag` class in `test_personalized_manager.py` (6 tests): default-false, single-kind flip, multi-kind, profile scoping, refresh-clears, empty-list noop.
- two new pipeline tests pin the auto-refresh dispatch: `test_stale_snapshot_auto_refreshes_even_without_refresh_first` and `test_non_stale_snapshot_skips_refresh`.
- existing stub-manager `SimpleNamespace` returns gained `is_stale=False` so the new attribute read doesn't AttributeError.
Full suite: 3391 pass.
User-facing WHATS_NEW entry added under 2.5.2 (above the prior pipeline auto-sync entry) describing the auto-refresh behavior.
|
4 weeks ago |
|
|
79224ed294 |
Personalized playlists (1/N): unified storage + manager foundation
Begins the standardization of the personalized-playlist subsystem.
Pre-existing state was a patchwork: Group A (Fresh Tape / Archives /
Seasonal Mix) lived in `discovery_curated_playlists` and
`curated_seasonal_playlists` with inconsistent shapes; Group B
(Hidden Gems / Discovery Shuffle / Time Machine / Popular Picks /
Genre / Daily Mixes) was computed on-demand by
`PersonalizedPlaylistsService` with no persistence -- every call
reran the generator with `ORDER BY RANDOM()` so results rotated.
Post-overhaul (this PR) every personalized playlist lands in one
unified storage layer with stable identity, persistent track lists,
explicit refresh, and per-playlist user-tweakable config.
Foundation in this commit (no behavior change yet):
- `database/personalized_schema.py`: 3 tables created idempotently
at app startup (wired into `MusicDatabase._initialize_database`).
- `personalized_playlists`: one row per (profile, kind, variant)
with config_json, track_count, last_generated_at,
last_synced_at, last_generation_source, last_generation_error.
Variant '' (empty string) for singletons; non-empty for
time_machine / seasonal_mix / genre_playlist / daily_mix.
- `personalized_playlist_tracks`: current snapshot per playlist.
Atomically replaced on refresh.
- `personalized_track_history`: append-only log powering the
`exclude_recent_days` config knob.
- `core/personalized/types.py`: `Track`, `PlaylistConfig`,
`PlaylistRecord` dataclasses. `PlaylistConfig.merged()` for
partial-update PATCH semantics; `Track.from_dict()` accepts the
legacy generator output shape unchanged.
- `core/personalized/specs.py`: `PlaylistKindSpec` (kind,
name_template, default_config, generator, variant_resolver) and a
module-level registry. Generators register at import time;
manager dispatches by kind.
- `core/personalized/manager.py`: `PersonalizedPlaylistManager` --
the only thing that touches the new tables. Owns:
- ensure_playlist (auto-create row from kind defaults)
- get_playlist / list_playlists
- refresh_playlist (atomic snapshot replace; generator exception
preserves previous good snapshot + records error on row)
- get_playlist_tracks
- update_config (deep-merge with stored config, including extra dict)
- recent_track_ids (staleness lookup for generators)
35 boundary tests in `tests/test_personalized_manager.py` pin every
shape: config round-trip / merge semantics / extra deep-merge /
defaults; Track.from_dict tolerance + primary_id fallback chain;
registry dedup / display_name with+without variant; manager
ensure_playlist auto-create + idempotency, variant separation,
required-variant enforcement, unknown-kind error; refresh persists
+ replaces atomically + survives generator exception with previous
snapshot intact + records source from first track + round-trips
nested track_data_json; update_config patch semantics; list_playlists
profile scoping; staleness history scoped to (profile, kind, days).
3304 tests pass total. Generators ship in subsequent commits on this
branch -- each kind migrated one at a time with its own per-kind
boundary tests.
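A minimal sketch of the "atomic snapshot replace" contract described for
`refresh_playlist`, under stated assumptions: the two-table schema, the
`refresh_snapshot` name, and the `make_demo_db` helper are hypothetical
simplifications for illustration, not the actual `personalized_schema.py`
layout. The point is only the ordering: run the generator first, then swap
the snapshot inside one transaction, and on generator failure leave the
previous rows untouched while recording the error on the playlist row.

```python
import sqlite3


def refresh_snapshot(conn, playlist_id, generator):
    """Replace a playlist's track snapshot atomically (hypothetical helper)."""
    try:
        # Run the generator BEFORE touching any table.
        tracks = list(generator())
    except Exception as exc:
        # Generator failed: keep the previous snapshot, record the error.
        with conn:
            conn.execute(
                "UPDATE playlists SET last_generation_error = ? WHERE id = ?",
                (str(exc), playlist_id),
            )
        return False
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "DELETE FROM playlist_tracks WHERE playlist_id = ?", (playlist_id,)
        )
        conn.executemany(
            "INSERT INTO playlist_tracks (playlist_id, position, track_id) "
            "VALUES (?, ?, ?)",
            [(playlist_id, i, t) for i, t in enumerate(tracks)],
        )
        conn.execute(
            "UPDATE playlists SET track_count = ?, last_generation_error = NULL "
            "WHERE id = ?",
            (len(tracks), playlist_id),
        )
    return True


def make_demo_db():
    """In-memory stand-in schema, much smaller than the real one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE playlists (
            id INTEGER PRIMARY KEY,
            track_count INTEGER DEFAULT 0,
            last_generation_error TEXT
        );
        CREATE TABLE playlist_tracks (
            playlist_id INTEGER, position INTEGER, track_id TEXT
        );
        INSERT INTO playlists (id) VALUES (1);
        """
    )
    return conn
```

A failing generator after a successful refresh leaves the earlier track
list readable, which is the behavior the boundary tests above pin.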
|
4 weeks ago |
|
|
43f168a048 |
Add artists.aliases column for cross-script artist matching
Foundation commit for issue #442 -- Japanese kanji ↔ romanized name
quarantines and equivalent cross-script mismatches. MusicBrainz exposes
alternate-spelling aliases on every artist record but SoulSync's matching
never consulted them; cross-script comparison scored 0% on raw similarity
and the file got quarantined even when MusicBrainz knew both names
belonged to the same artist.
This commit only adds the column. Subsequent commits in this PR:
- Build a pure alias-aware artist comparison helper
- Wire the MusicBrainz worker to populate aliases on enrichment
- Add a live MB lookup with cache for un-enriched artists
- Wire the helper into the AcoustID verifier where the quarantine
decision actually fires
Schema change is additive (NULL default), gated by the same
`PRAGMA table_info` check the existing `_add_musicbrainz_columns` helper
uses, so re-running on databases that already have the column is a no-op.
Verified:
- New `artists.aliases` column present in fresh DB init
- JSON round-trip works (mirrors the existing `genres` column pattern)
- No existing tests broken |
1 month ago |
|
|
9602d1827c |
Final silent-exception sweep + ruff S110 lint guardrail — ~45 sites
Catches the silent excepts the awk-based earlier sweeps missed:
- Bare `except:` followed by `pass` (also swallows KeyboardInterrupt
and SystemExit — actively wrong). Upgraded to `except Exception as
e: logger.debug("...: %s", e)`. ~14 sites across connection_detect,
soulseek_client, listenbrainz_manager, watchlist_scanner,
youtube_client, navidrome_client, jellyfin_client, web_server.
- `except Exception:` + pass that the awk pattern missed (e.g.
multi-line or unusual whitespace). ~31 sites across automation_engine,
database_update_worker, music_database, spotify_client, web_server,
others.
- 14 legitimate cleanup sites left silent with explicit `# noqa: S110`
+ comment explaining why (atexit handlers, finally-block conn.close
calls). Logging during shutdown can itself crash because file handles
get torn down before the handler fires.
Also enables `S110` rule in pyproject.toml so this pattern fails CI
going forward — drift fails at PR review instead of at runtime against
a wedged worker thread. Tests path keeps S110 ignored (test fixtures
legitimately use try-except-pass for cleanup).
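For illustration, the upgraded pattern looks roughly like the sketch
below. The logger name and the `close_quietly` helper are hypothetical and
not taken from the codebase; only the `except Exception as e:
logger.debug(...)` shape comes from this commit.

```python
import logging

logger = logging.getLogger("soulsync.example")  # hypothetical logger name


def close_quietly(resource):
    """Best-effort close that logs failures at debug level instead of hiding them."""
    try:
        resource.close()
    except Exception as e:
        # `except Exception` (not a bare `except:`) lets KeyboardInterrupt
        # and SystemExit propagate, since they derive from BaseException.
        logger.debug("close failed: %s", e)
        return False
    return True
```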
Adds a WHATS_NEW entry to helper.js summarizing the full #369 sweep.
Verified: `python -m ruff check .` → All checks passed.
Verified: `python -m pytest tests/` → 2188 passed.
Closes #369
|
1 month ago |
|
|
bfef2c7579 |
Surface silent exceptions in music_database.py — 18 sites
Mostly schema-migration ALTER TABLE fallbacks (column-already-exists
is the silent expected case) plus a few cache-purge/notify-migration
spots. Same pattern as the web_server sweep: `except Exception as e:
logger.debug("...: %s", e)`.
Refs #369
|
1 month ago |
|
|
fd5ccf4cb8 |
Fix "no such table: hifi_instances" via defensive lazy-create
GitHub issue #503 (@hadshaw21). Adding a HiFi instance via downloader
settings popped up ``no such table: hifi_instances`` even though "Test
Connection" and "Check All Instances" both worked.
Root cause: ``MusicDatabase._initialize_database`` runs every
``CREATE TABLE`` + every migration step inside one sqlite transaction.
Python's sqlite3 module doesn't autocommit DDL by default, so if any later
migration step throws on a user's specific DB shape (e.g. an old volume
from a prior SoulSync version with quirky schema state), the WHOLE batch
rolls back -- including the ``hifi_instances`` CREATE that ran earlier in
the function. The user's next boot retries init, hits the same migration
failure, rolls back again. The ``hifi_instances`` table never lands no
matter how many restarts.
Fix: defensive lazy-create. New ``_ensure_hifi_instances_table(cursor)``
helper runs ``CREATE TABLE IF NOT EXISTS`` on demand, called immediately
before every CRUD operation that touches ``hifi_instances``:
- ``get_hifi_instances`` / ``get_all_hifi_instances`` (read)
- ``add_hifi_instance`` / ``remove_hifi_instance`` (CRUD)
- ``toggle_hifi_instance`` / ``reorder_hifi_instances`` (CRUD)
- ``seed_hifi_instances`` (defaults seed)
Idempotent -- costs one no-op CREATE check when the table is already
present, fully recovers from a broken init state. Read methods now return
empty instead of raising when init failed; write methods work end-to-end.
Doesn't paper over the underlying init issue (still worth tracking which
migration step breaks for which user DB shapes -- separate concern) but
makes HiFi instance management self-healing in the meantime.
Tests:
- 7 obsolete tests that pinned ``raises sqlite3.OperationalError``
removed -- that contract is no longer correct
- 7 new tests pin the lazy-create behavior: every CRUD method works
against a DB that's missing the ``hifi_instances`` table, verifying the
table gets created and the operation completes
2162/2162 full suite green.
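The lazy-create shape can be sketched as follows. The column layout and
the simplified method signatures (taking a connection argument) are
assumptions for illustration; only the helper name and the
``CREATE TABLE IF NOT EXISTS`` approach come from this commit.

```python
import sqlite3


def _ensure_hifi_instances_table(cursor):
    # Hypothetical columns; the real table definition is not shown here.
    cursor.execute(
        """
        CREATE TABLE IF NOT EXISTS hifi_instances (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            url TEXT NOT NULL UNIQUE,
            enabled INTEGER NOT NULL DEFAULT 1,
            position INTEGER NOT NULL DEFAULT 0
        )
        """
    )


def add_hifi_instance(conn, url):
    cursor = conn.cursor()
    _ensure_hifi_instances_table(cursor)  # no-op when the table already exists
    cursor.execute("INSERT INTO hifi_instances (url) VALUES (?)", (url,))
    conn.commit()


def get_hifi_instances(conn):
    cursor = conn.cursor()
    _ensure_hifi_instances_table(cursor)  # a read on a broken DB returns []
    cursor.execute("SELECT url FROM hifi_instances ORDER BY position, id")
    return [row[0] for row in cursor.fetchall()]
```

Against a database where init never created the table, the read returns an
empty list and the write succeeds, which is the contract the 7 new tests
pin.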
Pure additive — no behavior change for users with a healthy DB; affected users get back to working hifi instance management. Closes #503. |
1 month ago |
|
|
4b23bee4a9 |
Add Discogs collection as a Your Albums source
Discord request: pull user's Discogs collection into the Your Albums
section on Discover, similar to how Spotify Liked Albums works.
Implementation extends the existing 3-source pipeline (Spotify /
Tidal / Deezer) to a 4-source pipeline with click-context dispatch —
Discogs-only albums open with rich Discogs release detail (vinyl/CD
format, year, label, country, tracklist). Mirrors the per-source
dispatch pattern from enhanced/global search.
Discogs client (`core/discogs_client.py`):
- New `get_authenticated_username()` resolves the username for the
configured personal token via Discogs's `/oauth/identity` endpoint.
Cached on the instance so subsequent collection page-fetches don't
re-hit it.
- New `get_user_collection(username=None, folder_id=0, per_page=100,
max_pages=50)` walks all pages of `/users/{username}/collection/
folders/{folder_id}/releases`. Returns normalized dicts ready for
upsert_liked_album. folder_id=0 = Discogs's "All" folder.
Pagination cap of max_pages*per_page = 5000 releases — bounds
runtime on heavy collections.
- New `get_release(release_id)` thin wrapper for `/releases/{id}` —
returns the raw API response so the album-detail endpoint can
render rich context.
- Both methods defensive: missing token → empty list, malformed
responses → skipped, falsy ids → None. Disambiguation suffix
stripping (`Madonna (3)` → `Madonna`) so Discogs artist names
match what Spotify/Tidal/Deezer use.
Schema (`database/music_database.py`):
- New `discogs_release_id TEXT` column on `liked_albums_pool`.
Migration uses the established `try SELECT, except ALTER TABLE`
pattern. Idempotent; safe on existing installs.
- Added the column to the canonical CREATE TABLE for fresh installs.
- `upsert_liked_album` extended with `'discogs': 'discogs_release_id'`
in BOTH the INSERT and UPDATE id-column maps so Discogs source_id
routes to the new column. INSERT statement column count + value
count updated together.
Backend (`web_server.py`):
- `/api/discover/your-albums/sources` — adds Discogs to the
`connected` list when `discogs.token` config is set.
- `_fetch_liked_albums` — new branch for Discogs. Lazy-imports
DiscogsClient, respects the `enabled_sources` config, walks the
collection, upserts each release. Same try/except shape as the
existing source branches.
- `/api/discover/album/<source>/<album_id>` — new `discogs` branch
fetches the release via DiscogsClient.get_release, normalizes the
Discogs tracklist format, parses Discogs's `MM:SS`/`HH:MM:SS`
duration strings to milliseconds, returns the same response shape
as the Spotify/Deezer/iTunes branches.
Frontend (`webui/static/discover.js`):
- `openYourAlbumsSourcesModal` — adds Discogs to `sourceInfo` with
the vinyl emoji icon. Existing toggle/save plumbing handles it.
- `openYourAlbumDownload` — restructured the per-source dispatch:
builds an ordered list of (source, id) tuples, tries each in turn,
breaks on the first successful response. Pure-Discogs albums go
straight to the Discogs detail endpoint → modal opens with Discogs
context. Multi-source albums prefer Spotify/Deezer first since
their tracklists carry proper streaming IDs ready for download.
Tests: `tests/test_discogs_collection_source.py` — 12 cases:
- get_user_collection: empty without token, normalizes response
shape, strips disambiguation suffix, handles missing year, skips
malformed releases, paginates correctly, caps at max_pages,
uses explicit username when provided.
- get_release: passes id through to /releases/{id}, returns None
for invalid ids without API call.
- liked_albums_pool: discogs_release_id round-trips through upsert
+ get; multi-source dedup carries both Spotify and Discogs IDs
on the same row.
Verified: full suite 1825 pass (12 new), ruff clean, smoke test
populating + reading the discogs_release_id column round-trips
correctly via the real DB.
WHATS_NEW entry under '2.4.2' dev cycle.
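Two of the normalization steps above are small enough to sketch. The
function names and the exact regex are hypothetical, not the code in
`core/discogs_client.py` or `web_server.py`; the behavior shown is the
suffix stripping (`Madonna (3)` -> `Madonna`) and the `MM:SS`/`HH:MM:SS`
to milliseconds parse.

```python
import re

# Trailing numeric disambiguation suffix such as " (3)".
_DISAMBIGUATION = re.compile(r"\s+\(\d+\)$")


def strip_disambiguation(name):
    """Drop a trailing ' (N)' suffix from an artist name."""
    return _DISAMBIGUATION.sub("", name.strip())


def duration_to_ms(text):
    """Parse 'MM:SS' or 'HH:MM:SS' into milliseconds; None if malformed."""
    parts = text.strip().split(":") if text else []
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts):
        return None
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + int(part)
    return seconds * 1000
```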
|
1 month ago |
|
|
2ab460f5c4 |
Add Library Disk Usage card to System Statistics
Discord request (Samuel [KC]): show how much disk space the library
takes on the Stats page. Implementation piggybacks on the existing
deep scan — Plex/Jellyfin/Navidrome all return file size in their
track API responses, so we read it during the deep scan and store
it on the tracks row. Aggregation is then a single SQL query — no
filesystem walk, no extra I/O during the scan, no separate stat
job. SoulSync standalone gets size from os.path.getsize at insert
time (different code path; the file is local when we write the row).
Schema (`database/music_database.py`):
- New `file_size INTEGER` column on `tracks`. Migration uses the
established `try SELECT, except ALTER TABLE ADD COLUMN` pattern.
Idempotent; safe on existing installs. NULL on legacy rows so
they don't contribute to totals until next deep scan refreshes.
- Added the column to the canonical CREATE TABLE so fresh installs
get it without going through the migration path.
Track-object plumbing:
- `core/jellyfin_client.py` — JellyfinTrack reads MediaSources[0].Size
alongside existing Bitrate read. None when 0 / missing.
- `core/navidrome_client.py` — NavidromeTrack reads `size` from
the Subsonic song object (int coercion + None on parse fail).
- `core/soulsync_client.py` — SoulSyncTrack does os.path.getsize
(only "server" where size has to come from disk).
- Plex needs no client-side change: track.media[0].parts[0].size
is read directly inside insert_or_update_media_track.
Persistence — TWO separate insert paths:
(a) `database/music_database.py:insert_or_update_media_track` —
Plex/Jellyfin/Navidrome flows. Reads file_size from Plex's
MediaPart OR `track_obj.file_size` wrapper attribute (defensive
Plex-attr-not-present check + > 0 type guard).
INSERT writes the new column.
UPDATE uses COALESCE(?, file_size) so a None from the server
on a re-sync (rare Jellyfin Size omission) doesn't blank an
existing value. Pinned via test.
(b) `core/imports/side_effects.py:record_soulsync_library_entry` —
SoulSync standalone flow. Completely separate code path: the
standalone deep scan moves files to staging for auto-import
rather than calling insert_or_update_media_track. After the
auto-import processes them, side_effects writes the tracks row
directly. Reads file_size via os.path.getsize(final_path) at
insert time (file is local) and includes it in the INSERT
column list. SoulSync only does INSERT-if-not-exists (no
UPDATE path), so no COALESCE concern.
Aggregator (`database/music_database.py:get_library_disk_usage`):
- SELECT COALESCE(SUM(file_size), 0), COUNT(file_size),
COUNT(*) - COUNT(file_size) for the totals.
- Per-format breakdown done in Python via os.path.splitext over
(file_path, file_size) rows — sidesteps SQLite's first-vs-last-dot
ambiguity for paths like /music/Kendrick/M.A.A.D City/01.flac.
- Defensive: skips empty paths, paths without extension, and
implausibly long extensions (>6 chars). Returns the full
empty-shape dict (NOT a partial / undefined) when the column
doesn't exist or queries fail, so the UI's `if (!data.has_data)`
branch handles fresh installs cleanly.
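A rough sketch of the aggregation described above, assuming a `tracks`
table with `file_path` and `file_size` columns. Apart from `has_data`,
the keys of the returned dict are hypothetical, and the error-handling
path that returns the empty shape is omitted for brevity.

```python
import os
import sqlite3
from collections import defaultdict


def get_library_disk_usage(conn):
    # Totals in one query: COUNT(file_size) skips NULLs, so the difference
    # from COUNT(*) is the number of rows still waiting for a deep scan.
    total, measured, pending = conn.execute(
        "SELECT COALESCE(SUM(file_size), 0), COUNT(file_size), "
        "COUNT(*) - COUNT(file_size) FROM tracks"
    ).fetchone()

    by_format = defaultdict(int)
    rows = conn.execute(
        "SELECT file_path, file_size FROM tracks WHERE file_size IS NOT NULL"
    )
    for path, size in rows:
        if not path:
            continue
        # splitext only looks at the last path component, so dots in
        # album names like "M.A.A.D City" don't confuse it.
        ext = os.path.splitext(path)[1].lstrip(".").lower()
        if not ext or len(ext) > 6:
            continue
        by_format[ext] += size

    return {
        "has_data": measured > 0,
        "total_bytes": total,
        "measured": measured,
        "pending": pending,
        "by_format": dict(by_format),
    }
```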
API + UI:
- `core/stats/queries.py` — thin pass-through get_library_disk_usage
matching the existing query-helper convention.
- `web_server.py` — new /api/stats/library-disk-usage endpoint
mirroring the /api/stats/db-storage pattern.
- `webui/index.html` — new card in System Statistics above the
Database Storage card.
- `webui/static/stats-automations.js` — _loadLibraryDiskUsage +
_renderLibraryDiskUsage. Empty state: "Run a Deep Scan to
populate (X tracks pending)". Partial: "X measured (+Y pending)".
Full: total + format bars proportional to the largest format.
- `webui/static/style.css` — .stats-disk-* styled to match the
Database Storage card.
Backward compatibility:
- Migration is additive; existing rows get NULL file_size; the
empty-shape return from the aggregator means the UI renders
cleanly without errors before any deep scan runs.
- Old installs upgrading will see "Run a Deep Scan to populate
(N tracks pending)". Running their next deep scan fills sizes —
the existing scan flow doesn't need any changes, just consumes
the new track-wrapper attribute.
Tests:
- `tests/test_library_disk_usage.py` — 13 cases covering schema
migration, NULL defaults on legacy inserts, fresh-install empty
shape, summing with mixed NULL/known sizes, per-format breakdown,
mixed-case extensions, paths with album-name dots, missing
extensions, empty file_path, implausibly long extensions,
JellyfinTrack.file_size persistence via insert_or_update_media_track,
COALESCE preservation on null re-sync.
- `tests/imports/test_import_side_effects.py` — extended the
existing record_soulsync_library_entry test to assert
track_row['file_size'] == os.path.getsize(final_path), pinning
the SoulSync-standalone path. Test fixture's tracks schema also
updated to include the file_size column.
Verified: full suite 1813 pass (13 new, 1 existing-test extension),
ruff clean, smoke test populating + reading the column round-trips
correctly.
WHATS_NEW entry under '2.4.2' dev cycle.
|
1 month ago |
|
|
4b15fe0b75 |
Fix album MBID inconsistency: detector + persistent release-MBID cache
Discord report (Samuel [KC]): tracks of the same album sometimes carry
different MUSICBRAINZ_ALBUMID tags, which causes Navidrome (and other
media servers grouping by album MBID) to split the album into multiple
entries. Two-part fix -- one for existing libraries, one for the root
cause that lets new imports drift.
Part 1 -- Detector + fix action (catches existing dissenters):
`core/repair_jobs/mbid_mismatch_detector.py`:
- New helpers: `_read_album_mbid_from_file` and
`_write_album_mbid_to_file` use the Picard-standard tag conventions
(`TXXX:MusicBrainz Album Id` for MP3, `MUSICBRAINZ_ALBUMID` for FLAC/OGG,
`----:com.apple.iTunes:MusicBrainz Album Id` for MP4).
- New scan phase `_scan_album_mbid_consistency` runs after the existing
track-MBID scan: groups tracks by DB `album_id`, reads each track's
embedded album MBID, finds the consensus (most-common) MBID via
`Counter`, flags dissenters. Tracks without an album MBID at all are
skipped (they don't break Navidrome -- only an explicit MBID disagreement
does). Albums where MBIDs are perfectly tied (no clear consensus) are
skipped too -- surface as a manual decision instead of fixing toward a
1/N tie.
- New finding type `album_mbid_mismatch` carries `consensus_mbid`,
`wrong_mbid`, `consensus_count`, `total_tracks_with_mbid`, and a
human-readable reason string.
`core/repair_worker.py`:
- Added `'album_mbid_mismatch': self._fix_album_mbid_mismatch` to the fix
dispatch dict and to the `fixable_types` tuple so auto-fix + bulk-fix
paths pick it up.
- New `_fix_album_mbid_mismatch` method reads `consensus_mbid` from
finding details, resolves the dissenter's file path via the shared
library resolver, calls `_write_album_mbid_to_file` to rewrite the tag in
place. Doesn't touch the album's other tracks (they're already in
agreement).
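The consensus rule from Part 1 can be sketched in a few lines. The
function name and the input shape (a dict of track id to embedded album
MBID) are hypothetical; the `Counter`-based majority, the skip for
untagged tracks, and the skip on ties follow the description above.

```python
from collections import Counter


def find_album_mbid_dissenters(track_mbids):
    """Return (consensus_mbid, dissenting_track_ids) for one album.

    track_mbids maps track id -> embedded album MBID (None/'' if untagged).
    """
    # Tracks without an album MBID never count for or against consensus.
    tagged = {tid: mbid for tid, mbid in track_mbids.items() if mbid}
    counts = Counter(tagged.values()).most_common()
    if not counts:
        return None, []
    # Perfect tie at the top: no clear consensus, leave for manual review.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None, []
    consensus = counts[0][0]
    dissenters = sorted(tid for tid, mbid in tagged.items() if mbid != consensus)
    return consensus, dissenters
```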
Part 2 -- Root cause fix (prevents new SoulSync imports from drifting):
The original in-memory `mb_release_cache` in `core/metadata/source.py`
maps `(normalized_album, artist) -> release_mbid` so per-track enrichment
of the same album hits the cache and writes the same MUSICBRAINZ_ALBUMID
to every track. That cache is bounded (4096 entries) and in-process -- so
cache eviction (when other albums are processed in between) and server
restart can BOTH cause inconsistency. Per-track album-name variation
(e.g. some tracks tagged `"Album"`, others tagged `"Album (Deluxe)"`) and
per-track artist variation (features) make it worse.
`core/metadata/album_mbid_cache.py` (new module):
- DB-backed `lookup(normalized_album, artist) -> release_mbid` and
`record(...)` functions. Same key shape as the in-memory cache.
- Strict additive design: every public function is wrapped in try/except
and degrades to None / no-op on ANY database error. The existing
in-memory cache + MusicBrainz lookup remains the authoritative fallback.
If this module breaks, downloads continue exactly as they would today.
`database/music_database.py`:
- New `mb_album_release_cache` table with composite primary key
`(normalized_album_key, artist_key)`. Reverse-lookup index on
`release_mbid` for future debug tooling. Created via the existing
`CREATE TABLE IF NOT EXISTS` migration pattern -- idempotent, no schema
version bump needed.
`core/metadata/source.py`:
- Surgical change inside the existing `embed_source_ids`
in-memory-cache-miss branch: BEFORE calling MusicBrainz, consult the
persistent cache. If a previous SoulSync run already resolved this
album's release MBID, reuse it. After a successful MB lookup, store in
BOTH caches. Both calls wrapped in defensive try/except so any failure
falls through to existing logic.
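A minimal sketch of the degrade-to-None contract for the persistent
cache. Passing the connection as an argument, the `ensure_cache_table`
helper, and the index name are assumptions made to keep the example
self-contained; the table name, key columns, and `lookup` / `record`
function names come from the description above.

```python
import sqlite3


def ensure_cache_table(conn):
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS mb_album_release_cache (
            normalized_album_key TEXT NOT NULL,
            artist_key TEXT NOT NULL,
            release_mbid TEXT NOT NULL,
            PRIMARY KEY (normalized_album_key, artist_key)
        )
        """
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_mb_album_release_mbid "  # hypothetical name
        "ON mb_album_release_cache (release_mbid)"
    )


def lookup(conn, normalized_album, artist):
    """Return the cached release MBID, or None on a miss or ANY DB error."""
    if not normalized_album or not artist:
        return None
    try:
        row = conn.execute(
            "SELECT release_mbid FROM mb_album_release_cache "
            "WHERE normalized_album_key = ? AND artist_key = ?",
            (normalized_album, artist),
        ).fetchone()
        return row[0] if row else None
    except Exception:
        # Deliberate: the caller falls back to the in-memory cache / MusicBrainz.
        return None


def record(conn, normalized_album, artist, release_mbid):
    """Store a resolved MBID; returns False (no-op) on bad input or DB error."""
    if not (normalized_album and artist and release_mbid):
        return False
    try:
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO mb_album_release_cache VALUES (?, ?, ?)",
                (normalized_album, artist, release_mbid),
            )
        return True
    except Exception:
        return False
```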
Tests:
- `tests/metadata/test_album_mbid_cache.py` -- 16 cache tests:
round-trip, idempotent re-record, overwrite semantics, clear_all,
album+artist independence (no Greatest Hits collisions), defensive
None-on-empty-input, graceful degradation when the DB is unavailable /
connection raises / commit fails, schema sanity (table + index exist
after init).
- `tests/test_album_mbid_consistency.py` -- 13 detector tests: tag
read/write round-trip on real FLAC files, Picard-standard tag
descriptors, defensive paths (unreadable file, empty input), detector
behavior (agreement → no flags, lone dissenter → flag, ties → no flag,
single-track albums → skipped, no-MBID tracks → skipped, unresolvable
file paths → skipped).
- `tests/metadata/test_metadata_enrichment.py` -- added autouse fixture
monkeypatching the persistent cache to no-op for tests in this file. The
existing tests pin per-call MB counts and in-memory cache state; without
the fixture, persistent rows from earlier tests would bypass the MB call.
Persistent layer has its own dedicated tests.
Verified: 1782 tests pass (29 new), ruff clean, smoke test confirms
end-to-end cache round-trip works.
WHATS_NEW entry under '2.4.2' dev cycle. |
1 month ago |
|
|
34ba26f5c8 |
Persist source IDs at download time + backfill onto tracks on sync
Followup to fix/watchlist-external-id-match. The companion PR closed the
demand side -- the watchlist scanner asks for tracks by external IDs
before falling back to fuzzy. But for users on Plex / Jellyfin /
Navidrome the supply side was still broken: tracks.spotify_track_id (and
the other ID columns) only got populated by the asynchronous enrichment
workers, sometimes hours after the file was actually written. During that
window the ID match fell through to fuzzy and the bug returned.
We were already collecting every ID during post-processing -- they live
in the `pp` dict in core/metadata/source.py:embed_source_ids and get
embedded into file tags. We just dropped the in-memory copy afterwards.
This PR persists them and uses them:
- Schema migration adds spotify_track_id / itunes_track_id /
deezer_track_id / tidal_track_id / qobuz_track_id /
musicbrainz_recording_id / audiodb_id / soul_id / isrc columns + indexes
to the existing track_downloads table (already keyed by file_path).
- core/metadata/source.py:embed_source_ids exposes pp["id_tags"] and the
resolved ISRC back to the import context as _embedded_id_tags / _isrc.
- core/imports/side_effects.py:record_download_provenance reads those
context fields and passes them to db.record_track_download, which now
accepts the new ID kwargs and persists them.
- New db.get_provenance_by_file_path with exact + basename-suffix
fallback (handles container mount-root differences between download-time
path and media-server-reported path).
- New db.backfill_track_external_ids_from_provenance copies IDs from
track_downloads onto a tracks row idempotently -- COALESCE on every
column preserves any value the enrichment worker already wrote
(enrichment is more authoritative for late binding).
- database/music_database.py:insert_or_update_media_track (the single
insertion point used by every Plex / Jellyfin / Navidrome sync) calls the
backfill immediately after each INSERT/UPDATE.
- New core/library/track_identity.py:find_provenance_by_external_id used
as a second-tier fallback in
watchlist_scanner.is_track_missing_from_library -- catches the window
between download and media-server sync. Caller checks os.path.exists on
the provenance file_path before treating it as "already in library" so a
deleted file doesn't prevent re-download.
Effect: freshly downloaded files become ID-recognizable to the watchlist
on the very next scan, no enrichment-wait window.
19 regression tests in tests/test_provenance_id_persistence.py:
- Schema migration adds expected columns + indexes
- record_track_download persists every ID kwarg
- record_track_download backward-compat (old kwargs still work)
- get_provenance_by_file_path: exact match, basename fallback for
mount-root differences, multi-record latest-wins, defensive None
- backfill: copies all IDs, preserves existing via COALESCE, no-op when
no provenance exists
- find_provenance_by_external_id: per-ID lookup, ISRC cross-bridge, OR
semantics, latest-wins on multiple matches
Out of scope: backfilling provenance for files downloaded BEFORE this PR
(their track_downloads rows don't carry the new IDs). Those continue to
wait for enrichment. Acceptable -- only affects historical files; new
downloads benefit immediately.
Full pytest 1625 passed; ruff clean. |
1 month ago |
|
|
ddef904414 |
Match featured-artist tracks across discography completion
Discord-reported scenario: a single "Super Single" by Artist1 feat.
Artist2 is also on Artist1's "Super Album". When the album is fully
owned, Artist1's discography correctly shows the single as complete,
but Artist2's discography (where the same track also appears as a
single) shows it as missing.
Two layers needed for the fix:
Scanner: the Jellyfin/Emby path was keeping only ArtistItems[0],
which is almost always equal to the album artist — so the
distinguishing per-track credit was silently suppressed. Now joins
every ArtistItems entry with "; " and stores the value when there
are multiple credits OR when the single credit differs from the
album artist. Plex's originalTitle already carries the full multi-
artist tag, so Plex users benefit without needing the scanner change.
Scorer: _calculate_track_confidence now splits track_artist on the
common multi-artist delimiters real-world tags use (",", ";", "&",
"feat.", "ft.", "featuring", "vs.", "x") and scores each piece
independently against the search artist, taking the max along with
the whole-string similarity as the floor. Never reduces a score —
purely additive matching for previously-missed featured-artist
credits.
Adds 12 regression tests covering the reported scenario, primary-
artist back-compat, every delimiter variant (parametrized), no-
regression on exact match, and the scanner storing every ArtistItem.
Existing Jellyfin-scanned rows persist their old single-artist value
until the next library scan rewrites them; Plex rows benefit
immediately on next match without needing a rescan.
|
1 month ago |
|
|
345273df22 |
Match soundtrack tracks against per-track artist, fix dead fallback
Two bugs surfacing the same user-reported symptom: a Vaiana OST
track ("Where You Are" by Christopher Jackson) wouldn't match against
a Plex/Emby library because the album sits under the album artist
(Lin-Manuel Miranda).
Bug 1: the data was already there but scoring ignored it. The DB
schema has a tracks.track_artist column, the scanner populates it
from Plex's originalTitle and Jellyfin's ArtistItems[0], and the SQL
WHERE clause already searches it — but _rows_to_tracks dropped the
column on its way to the Python object, and _calculate_track_confidence
only scored against the album-artist JOIN. Candidates whose track-
artist matched got returned by the search and then immediately
filtered out by the low confidence score.
Fix: _rows_to_tracks now propagates row['track_artist'] onto the
returned object, and _calculate_track_confidence takes the better of
(album-artist similarity, track-artist similarity) so soundtracks
match through whichever credit the search query carries.
Bug 2: the album-aware fallback path constructed DatabaseTrack with
kwargs the dataclass doesn't accept (artist_name, album_title,
server_source). Every row TypeError'd, the outer except swallowed it
silently, and the fallback never matched anything since the column
was added — invisible because nothing logged it.
Fix: build DatabaseTrack with valid fields and attach the joined
columns afterwards, the same pattern _rows_to_tracks uses.
Adds 6 regression tests covering: track-artist match (the OST case),
album-artist still matches, scorer takes the better of the two,
defensive handling for tracks without track_artist, search-path
attribute propagation, and the previously-dead album-aware fallback.
|
1 month ago |
|
|
0fa692f935 |
Make wishlist respect configured providers
- add neutral wishlist payload helpers while keeping legacy Spotify aliases
- route wishlist removal and classification through generic track data
- keep API and service compatibility for existing callers |
2 months ago |
|
|
7f94597706 |
validate hifi instance reorder against pre-existing instances
|
2 months ago |
|
|
eedd040318 |
update hifi db methods to return, rather than quash, sqlite errors
|
2 months ago |
|
|
788b7011d0 |
fix hifi instance reorder and enable/disable
|
2 months ago |
|
|
6ae1cb471e |
user-editable hifi instances
|
2 months ago |
|
|
37aefd2ff1 |
Reorganize queue: race + dedupe fixes from kettui review
Five issues kettui flagged on PR #377:
- Worker race (reorganize_queue.py): _next_queued() picked an item and
released the lock, then re-acquired to flip status='running'. A cancel()
landing in that window marked the item cancelled but the worker still ran
it. Replaced with _claim_next_or_wait() that picks AND flips under one
lock acquisition.
- Wakeup race (reorganize_queue.py): _wakeup.clear() after the empty
check could lose an enqueue's _wakeup.set(), parking a freshly-queued
album for up to 60 seconds. Replaced Lock + Event with a single
threading.Condition; cond.wait() releases and re-acquires atomically on
notify.
- Bulk dedupe (reorganize_queue.py:enqueue_many): looped single-item
enqueue, so a duplicate album_id later in the same batch could slip
through if the worker finished the first copy before the loop reached the
second. Now holds the lock for the whole batch and tracks a per-batch
seen set, so intra-batch duplicates dedupe against each other and not
just pre-existing items.
- Preview button stuck disabled (library.js:loadReorganizePreview): early
returns and thrown errors skipped the re-enable line. Moved state into a
canApply flag committed in finally, so any exit path lands the button
correctly.
- DB helpers swallowing failures (music_database.py):
get_album_display_meta and get_artist_albums_for_reorganize used to catch
every Exception and return None / [], so a real DB outage masqueraded as
"album not found" / "no albums". Now lets exceptions bubble; the route
layer already wraps them as 500.
Tests:
- test_cancel_and_run_are_mutually_exclusive -- hammers enqueue+cancel
pairs and asserts the invariant that no successfully-cancelled item ever
ran (catches regressions to the atomic pick).
- test_enqueue_many_dedupes_batch_internal_duplicates -- pins the
intra-batch dedupe.
- test_get_album_display_meta_propagates_db_errors and
test_get_artist_albums_for_reorganize_propagates_db_errors -- pin the
bubble-up behavior.
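The first three fixes share one idea: do the check and the state change
under a single `threading.Condition`. The class below is a simplified
sketch with hypothetical names and a much smaller state model than
`core/reorganize_queue.py`; it only illustrates pick-and-flip under one
lock hold, waiting on the condition instead of a separate Event, and
per-batch dedupe.

```python
import threading
from collections import OrderedDict


class ReorganizeQueueSketch:
    def __init__(self):
        self._cond = threading.Condition()
        self._items = OrderedDict()  # album_id -> status, insertion order = FIFO

    def enqueue_many(self, album_ids):
        added = []
        with self._cond:  # one lock hold for the whole batch
            seen = set()
            for album_id in album_ids:
                active = self._items.get(album_id) in ("queued", "running")
                if album_id in seen or active:
                    continue  # dedupe within the batch and against live items
                seen.add(album_id)
                self._items.pop(album_id, None)  # finished items re-queue at the tail
                self._items[album_id] = "queued"
                added.append(album_id)
            if added:
                self._cond.notify_all()
        return added

    def cancel(self, album_id):
        with self._cond:
            if self._items.get(album_id) != "queued":
                return False  # running or finished items are not cancellable
            self._items[album_id] = "cancelled"
            return True

    def claim_next_or_wait(self, timeout=None):
        with self._cond:
            while True:
                for album_id, status in self._items.items():
                    if status == "queued":
                        # Pick AND flip under the same lock acquisition, so a
                        # cancel() can never land between the two steps.
                        self._items[album_id] = "running"
                        return album_id
                # wait() releases the lock and re-acquires it on notify, so an
                # enqueue between the scan and the wait cannot be lost.
                if not self._cond.wait(timeout):
                    return None  # timed out with nothing queued
```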
Changelog updated in helper.js and version modal. |
2 months ago |
|
|
d6094a3587 |
Library reorganize: FIFO queue with live status panel
Replaces the single-slot "one reorganize at a time, return 409 on
collision" model with a per-user FIFO queue. Buttons stay clickable,
"Reorganize All" is one backend call instead of an N-call JS loop, and a
status panel mounted at the top of the artist actions bar shows live
progress (active item, queued count, recent completions) with per-item
cancel buttons.
Backend
- core/reorganize_queue.py: singleton queue + worker thread,
dedupe-on-enqueue, cancel rules (queued cancellable, running not),
enqueue_many for bulk operations, progress fan-out via
update_active_progress
- core/reorganize_runner.py: factory builds the worker's runner closure
with injected dependencies. Reads config per-call so changing the
download path in Settings takes effect on the next reorganize without a
server restart
- database/music_database.py: get_album_display_meta and
get_artist_albums_for_reorganize -- moves the SQL out of route handlers
- web_server.py: thin enqueue/snapshot/cancel/clear endpoints, runner
registration at module load. Old _reorganize_state globals + status
endpoint deleted. Static-asset cache buster (?v=<server-start>) added so
JS/CSS updates ship live without users clearing cache
Frontend
- webui/static/library.js: status panel mount, polling (1.5s when active,
8s when idle), expand/collapse, per-item cancel, debounced enhanced-view
reload (one reload per artist batch instead of N). Per-album reorganize
button paints with queued/running indicator and short-circuits to a toast
when the album is already in queue
- webui/static/style.css: panel + button styling matching the existing
glass-UI accents
- webui/static/helper.js + version modal: WHATS_NEW entry
Tests (22 new)
- tests/test_reorganize_queue.py (19 tests): FIFO order, dedupe, per-item
source, cancel rules, continue-on-failure, snapshot shape, progress
propagation, bulk enqueue
- tests/test_reorganize_runner.py (4 tests): per-call config reads,
setup-failure summary, dependency injection, progress fan-out
- tests/test_reorganize_db_methods.py (7 tests): SQL JOIN behavior,
ordering, fallback for blank strings, artist isolation
Full suite 549 passed in 27s. |
2 months ago |
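The reorganize_runner.py bullet in the commit above describes a factory that builds the runner closure from injected dependencies and reads config on every call. A rough sketch of that idea follows; `build_runner`, the `get_config` callable, the `"download_path"` key, and the fake reorganize function are all hypothetical and stand in for whatever the real module uses.

```python
def build_runner(get_config, reorganize_album):
    """Return a runner closure; both arguments are injected dependencies."""

    def run(item, on_progress):
        # Config is read inside the closure, on every call, so a changed
        # setting applies to the next item without rebuilding the runner.
        download_path = get_config("download_path")
        return reorganize_album(item["album_id"], download_path, on_progress)

    return run


# Usage with stand-in dependencies (no real config store or file moves).
settings = {"download_path": "/music/old"}
calls = []


def fake_reorganize(album_id, download_path, on_progress):
    on_progress(album_id, 100)  # report completion to the caller
    calls.append((album_id, download_path))
    return {"album_id": album_id, "path": download_path}


runner = build_runner(settings.get, fake_reorganize)
runner({"album_id": 1}, lambda album_id, percent: None)
settings["download_path"] = "/music/new"  # simulates a Settings change
runner({"album_id": 2}, lambda album_id, percent: None)
```

Capturing the path once in `build_runner` instead would freeze the value at registration time, which is the restart-required behavior the commit says it avoids.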
|
|
751b19c7b1 |
Preserve api_track_count across Plex ratingKey rekeys
Reported by kettui on PR #374 review:
> api_track_count is not copied during the ratingKey migration, so
> the cache disappears when an album row is rekeyed. Add it to
> enrichment_cols or the next completeness scan will fall back to
> live API lookups again.
When Plex changes an album's ratingKey (after a library rescan), the
sync code rekeys the album row by inserting a new row at the new ID and
copying enrichment columns from the old row. The list of columns to copy
did not include `api_track_count`, so the cached authoritative track
count was lost on rekey, and the next completeness scan would hit the
fallback path that calls back out to the metadata source's API. Defeats
the cache.
Added `api_track_count` to the album-level `enrichment_cols` at
`music_database.py:4724`. The artist-level lists at lines 4238 and 4554
don't need updating; those are for artist rekeys and don't carry
album-scoped fields.
No new test: existing migration code has no test infrastructure and
writing a Plex-mocked one is larger than this fix. Cin will say if he
wants test coverage in his next review pass.
Credit: kettui, for the PR #374 review comment that flagged the missing
column in the rekey allowlist. |
2 months ago |
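The rekey described in the commit above (insert a row at the new ID, copy an allowlist of enrichment columns from the old row) can be illustrated with a small SQLite sketch. The table layout, the `spotify_id` column, and the `rekey_album` function are assumptions for the example; only `api_track_count` and the `enrichment_cols` allowlist idea come from the commit message.

```python
import sqlite3

# Allowlist of columns carried across a rekey. Any column missing from this
# list is silently dropped, which is the bug the commit fixes.
ENRICHMENT_COLS = ["spotify_id", "api_track_count"]


def rekey_album(conn, old_id, new_id, title):
    """Move an album row to a new primary key, preserving enrichment data."""
    cur = conn.cursor()
    cols = ", ".join(ENRICHMENT_COLS)
    old = cur.execute(
        f"SELECT {cols} FROM albums WHERE id = ?", (old_id,)
    ).fetchone()
    cur.execute("INSERT INTO albums (id, title) VALUES (?, ?)", (new_id, title))
    if old is not None:
        assignments = ", ".join(f"{col} = ?" for col in ENRICHMENT_COLS)
        cur.execute(
            f"UPDATE albums SET {assignments} WHERE id = ?", (*old, new_id)
        )
    cur.execute("DELETE FROM albums WHERE id = ?", (old_id,))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE albums ("
    "id TEXT PRIMARY KEY, title TEXT, spotify_id TEXT, "
    "api_track_count INTEGER DEFAULT NULL)"
)
conn.execute(
    "INSERT INTO albums VALUES (?, ?, ?, ?)", ("1001", "Example", "sp-1", 12)
)
rekey_album(conn, "1001", "2002", "Example")
```

With `api_track_count` removed from `ENRICHMENT_COLS`, the new row would keep `spotify_id` but come back with a NULL track count, forcing the next scan onto its slower fallback.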
|
|
a60546929e |
Fix Album Completeness job reporting zero findings for every album
Reported by sassmastawillis: the Album Completeness maintenance job
scans 3127 albums in 0.1 seconds and reports 0 findings — for every
user, regardless of whether their library is actually complete.
Restoring an older DB surfaced 7 correct findings, so the code logic
works; the DB state is what's making everything look complete.
Root cause: `albums.track_count` is only ever written by server-sync
paths — Plex's `leafCount`/`childCount` and SoulSync standalone's
`len(tracks)`. It's the OBSERVED count of tracks SoulSync has indexed,
which is always exactly what `COUNT(tracks)` returns for that album.
The completeness job treated it as the EXPECTED total and compared it
against the observed count. They're equal by construction, so
`actual >= expected` is always true: skip, 0.1s scan, 0 findings.
Fix: new `api_track_count INTEGER` column on `albums`, written only by
metadata-source code paths. Populated in two places so the scan is
fast and the fallback is robust.
1. Enrichment workers — shared helper `set_album_api_track_count`
in `core/worker_utils.py`. Called by each worker's existing
`_update_album` method alongside its other album-column UPDATEs:
- spotify_worker: `album_obj.total_tracks` from the Spotify Album
dataclass (already in hand, zero new API calls)
- itunes_worker: same, from the iTunes Album dataclass
- deezer_worker: `nb_tracks` from full_data, falling back to
search_data when the full lookup didn't run
- discogs_worker: count of tracklist rows where `type_=='track'`
(Discogs tracklists interleave heading and index rows that
shouldn't count as songs)
Helper skips the write on zero/None/negative/non-numeric inputs
so a source lacking track info can't clobber a good value a
different source already wrote. Caller owns the transaction —
helper just queues an UPDATE on the caller's cursor without
committing, so it batches cleanly with each worker's existing
multi-UPDATE pattern.
Hydrabase worker deliberately not touched — it's a P2P mirror
that doesn't write album metadata to the local DB. Hydrabase-
primary users hit the fallback path below.
2. Album Completeness repair job — new `al.api_track_count` column
in the SELECT, read first in the scan loop. On miss (album never
enriched, or enrichment workers haven't run yet on a fresh
install), falls through to the existing `_get_expected_total()`
API lookup and persists the result via the same shared helper
(wrapped in connection/commit management since the repair job
runs outside a worker's batched transaction).
Also removed `al.track_count` from the scan's SELECT — now unused
since the observed count was the whole source of this bug, and
leaving a dead SELECT would invite a future engineer to re-introduce
the same comparison.
Help text on the job card was reworded so it honestly describes
current behavior ("counts cached during normal enrichment are used
when available; otherwise the job queries a metadata source
directly") rather than the old "active provider first, then others
as fallback" phrasing, which doesn't match how the cache actually
fills — any enrichment worker that runs can populate it, and the
last writer wins. Document-only follow-up if this edge case ever
bites in practice: add an `api_track_count_source` column so the
scan can prefer the configured primary source's count over others
(e.g. deluxe vs. standard edition mismatches). Not worth the
complexity today.
For existing users, the first completeness scan after upgrade is
fast to the extent their library is already enriched: the workers
already ran and populated `api_track_count` on their normal schedule.
For brand-new installs, the scan's fallback path handles the cold
start — slower, but correct, and subsequent scans are fast.
Does NOT affect:
- Download / post-processing / wishlist / sync code paths — none
of them read `track_count` for completeness semantics.
- Plex / Jellyfin / Navidrome / standalone sync — still write
`track_count` exactly as before; `api_track_count` is a separate
column they never touch.
- Other repair jobs.
- Any UI path — same finding schema, just correct counts now.
Files:
- database/music_database.py — idempotent migration adding
`api_track_count INTEGER DEFAULT NULL` to the existing album-column
check block.
- core/worker_utils.py — new `set_album_api_track_count` helper with
the documented skip-on-bad-input contract.
- core/spotify_worker.py, itunes_worker.py, deezer_worker.py,
discogs_worker.py — one-liner call from each `_update_album`.
- core/repair_jobs/album_completeness.py — scan uses the cache;
fallback path persists API-lookup results via the shared helper;
help text updated to match actual behavior.
- tests/test_worker_utils_album_track_count.py — 9 tests covering
the helper's write/skip contract + no-commit invariant.
- tests/test_album_completeness_job.py — 2 tests for the repair
job's fallback-path wrapper.
- webui/static/helper.js — WHATS_NEW entry.
Credit: sassmastawillis spotted the bug; the "restored older DB
finds 7 albums" signal pinpointed DB state over code logic and
made the diagnosis tractable.
|
2 months ago |
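The commit above describes two pieces that fit together: a helper that writes `api_track_count` only for usable values and never commits, and a scan that compares that expected count against the observed number of track rows. The sketch below shows one way that contract could look. The function bodies, the return values, and the minimal schema are assumptions, not the code in core/worker_utils.py or album_completeness.py, and the scan here simply skips albums with no cached count where the real job falls back to an API lookup.

```python
import sqlite3


def set_album_api_track_count(cursor, album_id, count):
    """Queue an UPDATE on the caller's cursor; skip unusable counts.

    Returns True if an UPDATE was issued. Never commits: the caller owns
    the transaction.
    """
    try:
        value = int(count)
    except (TypeError, ValueError, OverflowError):
        return False  # None or non-numeric input
    if value <= 0:
        return False  # zero/negative must not clobber a good value
    cursor.execute(
        "UPDATE albums SET api_track_count = ? WHERE id = ?", (value, album_id)
    )
    return True


def find_incomplete_albums(conn):
    """Compare expected (API) counts against observed track rows."""
    rows = conn.execute(
        """
        SELECT al.id, al.api_track_count, COUNT(t.id)
        FROM albums al
        LEFT JOIN tracks t ON t.album_id = al.id
        GROUP BY al.id
        ORDER BY al.id
        """
    ).fetchall()
    findings = []
    for album_id, expected, actual in rows:
        if expected is None:
            continue  # cache miss; the real job would query a metadata source
        if actual < expected:
            findings.append((album_id, actual, expected))
    return findings
```

Comparing against a server-synced `track_count` instead would make expected and actual equal by construction, which is why the original scan reported nothing.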