SoulSync

Commit Graph

Author	SHA1	Message	Date
Broque Thomas	54dbd150cb	Preserve full release dates in audio tags	1 week ago
Broque Thomas	0769fcd5cc	Fix Soulseek downloads losing collab artist tags Soulseek matched-download contexts populate `original_search_result` with `artist` (singular string) and no `artists` list — the full multi-artist array lives on `track_info` (the matched Spotify track object). `extract_source_metadata` only read `original_search.artists`, so the Soulseek path always fell through to the single-artist branch and TPE1 ended up with the primary artist only. Deezer-direct downloads were unaffected because their context populates `original_search.artists` as a proper list. Lifted artist resolution into a pure helper `core/metadata/artist_resolution.py:resolve_track_artists` that walks `original_search.artists` → `track_info.artists` → `artist_dict.name` fallback chain. Normalizes mixed list-item shapes (Spotify-style dicts, bare strings, anything else stringified) and drops empty entries. 13 new tests pin the resolution order, fallback chain, mixed-shape normalization, whitespace stripping, and empty/none handling. The existing `_artists_list` no-fall-through test in `test_multi_artist_tag_settings.py` was updated to reflect the new contract (always populated; multi-value write still gated on `len > 1`) plus a new regression test for the Soulseek shape. Composes with the existing Deezer per-track upgrade (still fires when single-artist + track_id available) and feat_in_title / artist_separator settings (still drive the joined ARTIST string downstream).	2 weeks ago
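
The fallback chain described in the commit above (`original_search.artists` → `track_info.artists` → `artist_dict.name`) could look roughly like the sketch below. The signature and the dict shapes are assumptions made for illustration; the real `resolve_track_artists` helper may differ.

```python
def resolve_track_artists(original_search, track_info, artist_dict):
    """Return a cleaned list of artist names from the first non-empty source.

    Hypothetical signature: the three arguments are assumed to be plain dicts
    (or None), mirroring the names used in the commit message.
    """

    def normalize(items):
        # Only lists/tuples count as an artists array; a bare string does not.
        if not isinstance(items, (list, tuple)):
            return []
        names = []
        for item in items:
            # Spotify-style dicts carry the name under "name"; anything else
            # (bare strings, other objects) is stringified.
            raw = item.get("name") if isinstance(item, dict) else item
            name = "" if raw is None else str(raw).strip()
            if name:  # drop empty entries
                names.append(name)
        return names

    # Walk the chain in priority order and stop at the first non-empty list.
    for source in (original_search, track_info):
        artists = normalize((source or {}).get("artists"))
        if artists:
            return artists

    # Last resort: the single primary artist name.
    fallback = str((artist_dict or {}).get("name") or "").strip()
    return [fallback] if fallback else []
```

With a Soulseek-shaped context, where `original_search` only has a singular `artist` string, the sketch falls through to `track_info.artists` and keeps every collaborator.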
Broque Thomas	d5de724f9b	Multi-artist Deezer upgrade + double-append guard hardening Two follow-ups to the multi-artist tag settings PR: 1. Deezer contributors upgrade: closes the "known limitation" flagged in the prior commit. Deezer's `/search` endpoint only returns the primary artist for each track; the full contributors array (feat., remix collaborators, producers credited as artists) lives on `/track/<id>` and gets parsed by `_build_enhanced_track`. Without the upgrade Deezer-sourced tracks never got multi-artist tags even with the right settings on. Fix in `core/metadata/source.py`: when source==deezer AND the search response had a single artist AND a track_id is available, fetch full track details via `get_deezer_client().get_track_details` and replace `all_artists` with the upgraded list. - One extra API call per affected Deezer track - Skipped when search already returned multiple (no-op fast path) - Skipped for non-Deezer sources (Spotify/Tidal/iTunes search responses already include all artists) - Skipped when no track_id is available - Defensive try/except: on /track/<id> failure (network error, Deezer client unavailable), fall through to the search-result list so we never lose the data we already had 2. Double-append guard hardened with a word-boundary regex. Prior commit checked for `"feat." not in title.lower() and "(ft." not in title.lower()`, which was too narrow. Source platforms produce wildly different feat-marker conventions: "(feat. X)", "(Feat X)", "(FEAT X)", "(Featuring X)", "[feat. X]", "ft. X" (no parens), "FT. X", etc. Any of these as the SOURCE title would cause a double-append: `"Track (Feat X) (feat. Y)"`. Replaced with `re.search(r'\b(?:feat|feat\.|featuring|ft|ft\.)\b', title, IGNORECASE)`. Word-boundary regex catches every common variant. Substring matches like "Aftermath" containing `ft` correctly fall through to the append path (pinned by a regression test).
16 new tests (29 total in the file): - 9 parametrized variants of the double-append guard - 1 substring guard ("Aftermath") - 6 Deezer upgrade scenarios (fires when expected, doesn't fire for non-Deezer / multi-artist search / no track_id, defensive fall-through on failure, no false-positive when /track/<id> confirms single artist) Full pytest 2727 passed.	2 weeks ago
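
As a rough illustration of the double-append guard from the commit above, the following sketch reuses the regex quoted in the message. The function name `append_featured` and the `(feat. X, Y)` suffix format applied here are assumptions for the example, not the project's actual helper.

```python
import re

# Regex as quoted in the commit message: word boundaries keep "Aftermath"
# (which merely contains "ft") from counting as a feat marker.
FEAT_MARKER = re.compile(r"\b(?:feat|feat\.|featuring|ft|ft\.)\b", re.IGNORECASE)


def append_featured(title, featured):
    """Append "(feat. X, Y)" unless the title already carries a feat marker.

    Hypothetical helper: `featured` is assumed to be a list of artist names.
    """
    if not featured or FEAT_MARKER.search(title):
        return title  # nothing to add, or the source title already says "feat"
    return f"{title} (feat. {', '.join(featured)})"
```

For example, `append_featured("Track (Feat X)", ["Y"])` leaves the title untouched, while `append_featured("Aftermath", ["Y"])` takes the append path.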
Broque Thomas	c11a5b7eab	Multi-artist tag settings: implement artist_separator + feat_in_title + populate _artists_list Three settings on Settings → Metadata → Tags were partially or completely unimplemented. Reporter (Netti93) traced each one. (1) `write_multi_artist` only "worked" because of a never-populated `_artists_list` field. `core/metadata/source.py` built `metadata["artist"]` as a hardcoded ", "-joined string but never assigned `metadata["_artists_list"]`. `core/metadata/enrichment.py` line 107 reads that field and gates the multi-value tag write on `len(_artists_list) > 1` — always saw an empty list, silently no-op'd the write. (2) `artist_separator` (default ", ") was referenced in the UI + settings.js save path but ZERO Python code read the value. Every multi-artist track ended up with hardcoded ", " regardless of what the user picked. (3) `feat_in_title` (when true: pull featured artists into the title as " (feat. X, Y)" and leave only primary in the ARTIST tag — Picard convention) had no implementation at all. Fix in source.py: * Populate `_artists_list` from the search response's artists array * Read `feat_in_title` and `artist_separator` configs * When `feat_in_title=True` and >1 artist: ARTIST = primary only, append "(feat. X, Y)" to title with double-append guard * Else: ARTIST = artists joined with `artist_separator` * Single-artist case unaffected by either setting Double-append guard uses a word-boundary regex catching all common "feat" variants source platforms produce — `feat`, `feat.`, `featuring`, `ft`, `ft.` — case-insensitive. Substring matches (e.g. "Aftermath" containing "ft") correctly fall through to the append path. 
Fix in enrichment.py ID3 branch: * TPE1 stays as the display string (with separator or primary-only per the user's settings) * Multi-value list goes to a separate `TXXX:Artists` frame (Picard convention) when `write_multi_artist` is on * Pre-fix, the ID3 path wrote TPE1 twice (single-string, then list), and the second `add` overwrote the first, clobbering both the configured separator AND the feat_in_title semantics. Vorbis path was already correct (separate "artist" + "artists" keys). Known limitation (flagged in WHATS_NEW): Deezer's `/search` endpoint only returns the primary artist. The full contributors array lives on `/track/<id>`. Enrichment uses search-result data so Deezer-sourced tracks may still get only the primary artist until a follow-up commit wires the per-track contributors fetch into the enrichment flow. Spotify, Tidal, and iTunes search responses include all artists so they work now. 23 new tests in `tests/metadata/test_multi_artist_tag_settings.py`: * `_artists_list` populated for multi/single/no-artist cases * `artist_separator` drives ARTIST string (default ", " + custom ";" + custom "; " + " & ") * Single-artist case unaffected by either setting * `feat_in_title=True` pulls featured to title, leaves primary in ARTIST * `feat_in_title` no-op for single artist * Double-append guard recognizes 9 source-title variants ("(feat. X)", "(Feat. X)", "(FEAT X)", "(feat X)", "(Featuring X)", "[feat. X]", "ft. X", "(ft X)", "FT. X") * Substring guard test pins "Aftermath" doesn't false-positive * Combined-settings precedence: feat_in_title wins ARTIST string but `_artists_list` carries everyone for multi-value tag Full pytest 2711 passed.	2 weeks ago
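
The settings precedence and the ID3 frame split described in the commit above can be sketched as follows. Both function names and the returned dict shapes are hypothetical; the sketch only plans which frames would be written and does not call a tagging library.

```python
import re

_FEAT = re.compile(r"\b(?:feat|feat\.|featuring|ft|ft\.)\b", re.IGNORECASE)


def build_artist_fields(title, artists, feat_in_title=False, artist_separator=", "):
    """Derive the ARTIST string, title, and full artist list from the settings."""
    artists_list = list(artists)
    if len(artists_list) <= 1:
        # Single-artist case: neither setting has any effect.
        artist = artists_list[0] if artists_list else ""
    elif feat_in_title:
        # Primary artist only; featured artists move into the title once.
        artist = artists_list[0]
        if not _FEAT.search(title):
            title = f"{title} (feat. {', '.join(artists_list[1:])})"
    else:
        artist = artist_separator.join(artists_list)
    return {"artist": artist, "title": title, "_artists_list": artists_list}


def plan_id3_frames(metadata, write_multi_artist=True):
    """Return a frame-name -> values plan (assumed shape, for illustration)."""
    frames = {"TPE1": [metadata["artist"]]}  # display string, written once
    if write_multi_artist and len(metadata["_artists_list"]) > 1:
        # Multi-value list goes to its own frame instead of overwriting TPE1.
        frames["TXXX:Artists"] = list(metadata["_artists_list"])
    return frames
```

Keeping the display string and the multi-value list in separate frames is what prevents the second write from clobbering the first.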
Broque Thomas	aa54bed818	Surface silent exceptions across remaining modules: ~70 sites Final sweep. Covers: - Downloads: candidates / lifecycle / master / monitor / wishlist_failed - Metadata: source / registry / cache / common / artwork (+ plex_client) - Imports: pipeline / resolution / file_ops / paths / guards - Library: path_resolver / retag / duplicate_cleaner - Stats / playlists / wishlist / discovery / automation / enrichment - Misc: hydrabase_client, soulsync_client, tag_writer, debug_info, api_call_tracker, album_consistency, beatport_unified_scraper, reorganize_runner, seasonal_discovery, lidarr_download_client, services/sync_service.py, automation_engine, automation/progress Two `_e` renames in imports/file_ops.py (outer scope binding `e`). A few finally-block sites in metadata/album_mbid_cache.py, library/track_identity.py, listening_stats_worker.py, watchlist/auto_scan.py left silent, for the same reason as the rest of the sweep (logger calls during cleanup paths can themselves raise). Refs #369	3 weeks ago
Broque Thomas	4b15fe0b75	Fix album MBID inconsistency: detector + persistent release-MBID cache Discord report (Samuel [KC]): tracks of the same album sometimes carry different MUSICBRAINZ_ALBUMID tags, which causes Navidrome (and other media servers grouping by album MBID) to split the album into multiple entries. Two-part fix — one for existing libraries, one for the root cause that lets new imports drift. Part 1 — Detector + fix action (catches existing dissenters): `core/repair_jobs/mbid_mismatch_detector.py`: - New helpers: `_read_album_mbid_from_file` and `_write_album_mbid_to_file` use the Picard-standard tag conventions (`TXXX:MusicBrainz Album Id` for MP3, `MUSICBRAINZ_ALBUMID` for FLAC/OGG, `----:com.apple.iTunes:MusicBrainz Album Id` for MP4). - New scan phase `_scan_album_mbid_consistency` runs after the existing track-MBID scan: groups tracks by DB `album_id`, reads each track's embedded album MBID, finds the consensus (most-common) MBID via `Counter`, flags dissenters. Tracks without an album MBID at all are skipped (they don't break Navidrome — only an explicit MBID disagreement does). Albums where MBIDs are perfectly tied (no clear consensus) are skipped too — surface as a manual decision instead of fixing toward a 1/N tie. - New finding type `album_mbid_mismatch` carries `consensus_mbid`, `wrong_mbid`, `consensus_count`, `total_tracks_with_mbid`, and a human-readable reason string. `core/repair_worker.py`: - Added `'album_mbid_mismatch': self._fix_album_mbid_mismatch` to the fix dispatch dict and to the `fixable_types` tuple so auto-fix + bulk-fix paths pick it up. - New `_fix_album_mbid_mismatch` method reads `consensus_mbid` from finding details, resolves the dissenter's file path via the shared library resolver, calls `_write_album_mbid_to_file` to rewrite the tag in place. Doesn't touch the album's other tracks (they're already in agreement). 
Part 2 — Root cause fix (prevents new SoulSync imports from drifting): The original in-memory `mb_release_cache` in `core/metadata/source.py` maps `(normalized_album, artist) -> release_mbid` so per-track enrichment of the same album hits the cache and writes the same MUSICBRAINZ_ALBUMID to every track. That cache is bounded (4096 entries) and in-process — so cache eviction (when other albums are processed in between) and server restart can BOTH cause inconsistency. Per-track album-name variation (e.g. some tracks tagged `"Album"`, others tagged `"Album (Deluxe)"`) and per-track artist variation (features) make it worse. `core/metadata/album_mbid_cache.py` (new module): - DB-backed `lookup(normalized_album, artist) -> release_mbid` and `record(...)` functions. Same key shape as the in-memory cache. - Strict additive design: every public function is wrapped in try/except and degrades to None / no-op on ANY database error. The existing in-memory cache + MusicBrainz lookup remains the authoritative fallback. If this module breaks, downloads continue exactly as they would today. `database/music_database.py`: - New `mb_album_release_cache` table with composite primary key `(normalized_album_key, artist_key)`. Reverse-lookup index on `release_mbid` for future debug tooling. Created via the existing `CREATE TABLE IF NOT EXISTS` migration pattern — idempotent, no schema version bump needed. `core/metadata/source.py`: - Surgical change inside the existing `embed_source_ids` in-memory-cache-miss branch: BEFORE calling MusicBrainz, consult the persistent cache. If a previous SoulSync run already resolved this album's release MBID, reuse it. After a successful MB lookup, store in BOTH caches. Both calls wrapped in defensive try/except so any failure falls through to existing logic. 
Tests: - `tests/metadata/test_album_mbid_cache.py` — 16 cache tests: round-trip, idempotent re-record, overwrite semantics, clear_all, album+artist independence (no Greatest Hits collisions), defensive None-on-empty-input, graceful degradation when the DB is unavailable / connection raises / commit fails, schema sanity (table + index exist after init). - `tests/test_album_mbid_consistency.py` — 13 detector tests: tag read/write round-trip on real FLAC files, Picard-standard tag descriptors, defensive paths (unreadable file, empty input), detector behavior (agreement → no flags, lone dissenter → flag, ties → no flag, single-track albums → skipped, no-MBID tracks → skipped, unresolvable file paths → skipped). - `tests/metadata/test_metadata_enrichment.py` — added autouse fixture monkeypatching the persistent cache to no-op for tests in this file. The existing tests pin per-call MB counts and in-memory cache state; without the fixture, persistent rows from earlier tests would bypass the MB call. Persistent layer has its own dedicated tests. Verified: 1782 tests pass (29 new), ruff clean, smoke test confirms end-to-end cache round-trip works. WHATS_NEW entry under '2.4.2' dev cycle.	3 weeks ago
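
The consensus rule from Part 1 of the commit above (most-common MBID via `Counter`, untagged tracks ignored, ties skipped) might be sketched like this. The function name and the path-to-MBID input mapping are assumptions for the example; the real detector works on DB rows and file tags.

```python
from collections import Counter


def find_album_mbid_dissenters(track_mbids):
    """Return (consensus_mbid, dissenting_paths) for one album.

    Hypothetical input: a dict mapping file path -> embedded album MBID,
    with None or "" for tracks that carry no album MBID.
    """
    # Tracks without an album MBID are skipped entirely.
    tagged = {path: mbid for path, mbid in track_mbids.items() if mbid}
    if len(tagged) < 2:
        return None, []  # nothing to compare against

    ranked = Counter(tagged.values()).most_common()
    if len(ranked) == 1:
        return ranked[0][0], []  # every tagged track already agrees
    if ranked[0][1] == ranked[1][1]:
        return None, []  # perfect tie: leave it as a manual decision

    consensus = ranked[0][0]
    dissenters = sorted(path for path, mbid in tagged.items() if mbid != consensus)
    return consensus, dissenters
```

Only the dissenting paths would then be handed to a fix step that rewrites their tag to the consensus value.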
Broque Thomas	34ba26f5c8	Persist source IDs at download time + backfill onto tracks on sync Followup to fix/watchlist-external-id-match. The companion PR closed the demand side — the watchlist scanner asks for tracks by external IDs before falling back to fuzzy. But for users on Plex / Jellyfin / Navidrome the supply side was still broken: tracks.spotify_track_id (and the other ID columns) only got populated by the asynchronous enrichment workers, sometimes hours after the file was actually written. During that window the ID match fell through to fuzzy and the bug returned. We were already collecting every ID during post-processing — they live in the `pp` dict in core/metadata/source.py:embed_source_ids and get embedded into file tags. We just dropped the in-memory copy afterwards. This PR persists them and uses them: - Schema migration adds spotify_track_id / itunes_track_id / deezer_track_id / tidal_track_id / qobuz_track_id / musicbrainz_recording_id / audiodb_id / soul_id / isrc columns + indexes to the existing track_downloads table (already keyed by file_path). - core/metadata/source.py:embed_source_ids exposes pp["id_tags"] and the resolved ISRC back to the import context as _embedded_id_tags / _isrc. - core/imports/side_effects.py:record_download_provenance reads those context fields and passes them to db.record_track_download, which now accepts the new ID kwargs and persists them. - New db.get_provenance_by_file_path with exact + basename-suffix fallback (handles container mount-root differences between download-time path and media-server-reported path). - New db.backfill_track_external_ids_from_provenance copies IDs from track_downloads onto a tracks row idempotently — COALESCE on every column preserves any value the enrichment worker already wrote (enrichment is more authoritative for late binding). 
- database/music_database.py:insert_or_update_media_track (the single insertion point used by every Plex / Jellyfin / Navidrome sync) calls the backfill immediately after each INSERT/UPDATE. - New core/library/track_identity.py:find_provenance_by_external_id used as a second-tier fallback in watchlist_scanner.is_track_missing_from_library, which catches the window between download and media-server sync. Caller checks os.path.exists on the provenance file_path before treating it as "already in library" so a deleted file doesn't prevent re-download. Effect: freshly downloaded files become ID-recognizable to the watchlist on the very next scan, no enrichment-wait window. 19 regression tests in tests/test_provenance_id_persistence.py: - Schema migration adds expected columns + indexes - record_track_download persists every ID kwarg - record_track_download backward-compat (old kwargs still work) - get_provenance_by_file_path: exact match, basename fallback for mount-root differences, multi-record latest-wins, defensive None - backfill: copies all IDs, preserves existing via COALESCE, no-op when no provenance exists - find_provenance_by_external_id: per-ID lookup, ISRC cross-bridge, OR semantics, latest-wins on multiple matches Out of scope: backfilling provenance for files downloaded BEFORE this PR (their track_downloads rows don't carry the new IDs). Those continue to wait for enrichment. Acceptable: only affects historical files; new downloads benefit immediately. Full pytest 1625 passed; ruff clean.	3 weeks ago
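
The idempotent COALESCE backfill described in the commit above can be illustrated with a small `sqlite3` sketch. The function name, the reduced column set, and the simplified `tracks` / `track_downloads` schemas (with an integer `id` on each) are assumptions for the example, not the project's actual schema.

```python
import sqlite3

# Reduced, illustrative subset of the ID columns named in the commit message.
ID_COLUMNS = ("spotify_track_id", "deezer_track_id", "isrc")


def backfill_track_ids(conn, track_id, file_path):
    """Copy IDs from the latest provenance row onto a tracks row.

    COALESCE(col, ?) keeps any value already stored on the track, so a value
    written earlier by an enrichment worker is never overwritten.
    """
    row = conn.execute(
        f"SELECT {', '.join(ID_COLUMNS)} FROM track_downloads "
        "WHERE file_path = ? ORDER BY id DESC LIMIT 1",
        (file_path,),
    ).fetchone()
    if row is None:
        return False  # no provenance recorded: nothing to backfill

    assignments = ", ".join(f"{col} = COALESCE({col}, ?)" for col in ID_COLUMNS)
    conn.execute(f"UPDATE tracks SET {assignments} WHERE id = ?", (*row, track_id))
    return True
```

Running the function twice for the same track yields the same row, which is what makes it safe to call after every media-server sync insert or update.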
elmerohueso	cd19aa0301	revert tidal artist/track id name for hifi downloads Co-authored-by: Copilot <copilot@github.com>	3 weeks ago
elmerohueso	4ddb86522c	name tidal and hifi tags the same way	3 weeks ago
elmerohueso	e78dd7f593	get tidal tags during download, without needing to go through the enrichment pipeline	3 weeks ago
elmerohueso	1f4e8e5e3b	get hifi tags during download, without needing to go through the enrichment pipeline	3 weeks ago
elmerohueso	b363afe195	bpm for tidal, copyright and bpm for hifi	3 weeks ago
elmerohueso	f9f47f978e	fix post-download tagging, and enable tagging for hifi	3 weeks ago
Antti Kettunen	a759f778b6	Move metadata API into package - add package-owned metadata API, cache, registry, and lookup modules - keep legacy metadata_service and metadata_cache paths as explicit shims - update metadata call sites and tests to use package-owned helpers	4 weeks ago
Broque Thomas	c121582557	MusicBrainz genres: fall back to release then artist when recording is empty User report: SoulSync was only pulling MusicBrainz genres from the recording (track-level) endpoint. Most MB recordings don't carry genres at the track level — they live on the release (album) or artist. So the MB tier was contributing nothing to the genre merge for the overwhelming majority of tracks. Fix: - Added `'genres'` to the release-detail `includes` (was missing). - After release-detail processing, if pp['mb_genres'] is still empty, populate from release_detail['genres'] (sorted by count desc). - If still empty AND artist_mbid is set, fetch artist with `includes=['genres']` and use those. No extra API call when the recording (or release) already had genres — the artist fetch only fires when both upstream tiers came back empty. The downstream genre merge in _embed_metadata_genres is unchanged; this just makes the MB feed into it richer. Tests: 4 new (recording present, recording empty → release, recording + release empty → artist, all empty → []). Full suite 873 passing. Ruff clean. Reported by @kcaoyef421 in Discord.	4 weeks ago
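
The recording → release → artist fallback from the commit above might be sketched as below. The function name, the `fetch_artist` callback, and the `{"name": ..., "count": ...}` genre shape are assumptions for illustration; sorting by count is stated in the commit only for release genres and is applied to every tier here for simplicity.

```python
def pick_mb_genres(recording, release, fetch_artist, artist_mbid=None):
    """Return genre names from the first tier that has any.

    `fetch_artist` is a hypothetical callable taking an artist MBID and
    returning a dict with a "genres" list; it is only invoked when both
    the recording and the release came back empty.
    """

    def names(entity):
        genres = (entity or {}).get("genres") or []
        ranked = sorted(genres, key=lambda g: g.get("count", 0), reverse=True)
        return [g["name"] for g in ranked if g.get("name")]

    genres = names(recording) or names(release)
    if not genres and artist_mbid:
        # Extra API call happens only on this last-resort path.
        genres = names(fetch_artist(artist_mbid))
    return genres
```

Passing the artist fetch as a callback keeps the no-extra-call property easy to verify in a test: the callback simply must not run when an earlier tier has genres.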
Antti Kettunen	02305096a3	Tighten metadata and import safety - Normalize album import track display handling so queue labels and match rows stay consistent - Bound MusicBrainz caches and avoid caching transient lookup failures - Stop swallowing programmer errors in source enrichment helpers - Restore import config test seams without reintroducing lazy imports - Guard task completion calls and fix the Windows path test expectation - Keep file lock tracking from growing without bound	4 weeks ago
Antti Kettunen	9e496397da	Move shared metadata helpers into package - Relocate the shared metadata helper module from core/metadata_common.py into core/metadata/common.py. - Update the new metadata package, the import pipeline, and the web entrypoint to use the package-scoped helper. - Keep the shared config, mutagen, file-lock, and tag-writing helpers centralized without touching unrelated files.	4 weeks ago
Antti Kettunen	9656dbd46a	Thread runtime through metadata enrichment - Pass the live runtime bundle into the shared metadata facade so worker-backed source enrichment can actually run. - Forward runtime from the import pipeline and web-server wrapper into embed_source_ids. - Add a regression test that verifies the runtime object reaches the source-ID embedding path.	4 weeks ago
Antti Kettunen	8319c6679f	Move new metadata helpers into a package - Keep existing metadata_cache and metadata_service at the top level for now - Move the new branch-local metadata helpers under core/metadata - Share MusicBrainz release cache state from core.metadata.source and update import sites	4 weeks ago

19 Commits (dev)