SoulSync

Commit Graph

Author	SHA1	Message	Date
Broque Thomas	f9f74ac511	Lift auto-import matching to testable helper + pin contracts Cin-pass on the #524 + multi-disc fixes. Pre-merge polish. Lifts: `core/imports/album_matching.py` `AutoImportWorker._match_tracks` was a 100+-line method buried in a 1400-line class. Testing it required monkey-patching `_read_file_tags` + mocking the metadata client just to exercise the matching algorithm. Per Cin's "lift logic out of monolithic classes" pattern (same shape as the album-info builders / discography / quality scanner lifts), moved the dedup + scoring into `core/imports/album_matching.py` as pure functions over already-fetched data. Helper exposes: - Constants for every match weight (TITLE_WEIGHT, ARTIST_WEIGHT, POSITION_WEIGHT, NEAR_POSITION_WEIGHT, CROSS_DISC_POSITION_WEIGHT, ALBUM_WEIGHT, MATCH_THRESHOLD). Magic numbers killed. - `dedupe_files_by_position(audio_files, file_tags, , quality_rank)` — position-keyed quality dedup. - `score_file_against_track(file_path, file_tags, track, , target_album, similarity)` — pure per-(file, track) scorer. - `match_files_to_tracks(audio_files, file_tags, tracks, , target_album, similarity, quality_rank)` — full matching with greedy best-per-track + first-come-first-serve over deduped files. Worker shrinks from 100 lines of inline algorithm to 8 lines that fetch tags + delegate to the helper. Tests added (26 new across 3 files): `tests/imports/test_album_matching_helper.py` (19 tests): - Constants pin: weights sum to 1.0, threshold above position-only - `dedupe_files_by_position`: quality wins, cross-disc preserved, tag-less files passed through, first-wins on equal quality - `score_file_against_track`: perfect-agreement = 1.0, position needs both disc+track, near-position only same-disc, missing artist tags handled, disc field aliases (Spotify/Deezer/iTunes), filename fallback when title tag missing - `match_files_to_tracks`: happy path, file used at-most-once, below-threshold left unmatched - Edge case Cin would flag: tag-less file with strong filename title matches multi-disc album track via title alone (perfect-name scenario works); tag-less file with weak filename title against multi-disc API correctly stays unmatched (the behavior delta from the disc-aware fix — pinned so future readers see it's intentional) `tests/test_import_album_match_endpoint.py` (3 tests): - Backend warning fires when source missing from match POST - No warning fires on the legit path (catches noisy-warning regression) - Endpoint actually forwards source/name/artist to the payload builder (catches "logging the right warning but doing the wrong lookup" regression) `tests/test_import_page_album_lookup_pattern.py` (4 tests): - Source-text guard for the import-page #524 fix in stats-automations.js. Until the file is modularized enough for a behavioral JS test (under the existing tests/static/.mjs pattern), regex-based assertions pin: the `_albumLookup` field exists, the click handler reads from it, both card renderers populate it before emitting onclick, and the cache stores `source` per entry. Caveat documented in the test module docstring. Verification: - All 26 new tests pass. - Existing multi-disc tests (test_auto_import_multi_disc_matching.py) still pass after the lift — proves the helper is behavior-equivalent to the inline implementation it replaced. - Full suite: 2293 passed, 1 flaky-timing failure (test_library_reorganize_orchestrator.py::test_watchdog_warns_about_stuck_workers — passes in isolation, fails only in full-suite runs, pre-existing, unrelated to this PR). - Ruff clean. Notes for the reviewer: - The frontend stats-automations.js JS test is structural-only. Behavioral JS testing for that file requires modularizing the ~7k-line monolith first — out of scope for this fix. - The cross-disc 5% consolation bonus is a small behavior change for users with weak/missing tag info on multi-disc albums. Pinned explicitly in `test_tagless_file_with_weak_title_unmatched_in_multidisc` so the trade-off is visible: correct multi-disc matching wins over optimistic position-only matching that produced wrong-disc files.	1 month ago
Broque Thomas	c03edc3cb4	Auto-import: respect disc_number in dedup + match scoring Caught while live-testing the #524 fix with kendrick lamar mr morale & the big steppers (3 discs). User dropped discs 1+2 loose in staging root + disc 3 in its own folder, every file perfectly tagged with disc_number/track_number/title — only 9 tracks ended up in the library, the rest got integrity-rejected and quarantined. Two related bugs in `AutoImportWorker._match_tracks`: 1. Quality dedup keyed on track_number alone. The dedup loop kept `seen_track_nums[track_number] = file` and dropped any later file with the same number, treating it as a quality duplicate. On a multi-disc release where every disc has tracks 1..N, that collapses the album to one disc's worth of files BEFORE the matcher runs. User's 18 loose disc-1+disc-2 files reduced to 9 before any title/disc info was even consulted. 2. Match scoring ignored disc_number. The 30% track-number bonus fired whenever `ft[track_number] == track_num` regardless of disc. File with tag (disc=2, track=6, "Auntie Diaries", 281s) got the full bonus matching API track (disc=1, track=6, "Rich Interlude", 103s) — wrong file → wrong destination → integrity check correctly rejected and quarantined the file. Same for tracks 7, 8, 9. Fix: - Dedup keys on `(disc_number, track_number)` tuples — multi-disc files with parallel numbering all survive. - Match scoring's 30% bonus only when BOTH disc AND track agree. Cross-disc same-track-number collisions get a small 5% consolation bonus so title similarity has to carry the match (covers cases where tag disc info is missing or wrong). - API track disc_number read from `disc_number` (Spotify) / `disk_number` (Deezer) / `discNumber` (iTunes) defaulting to 1. 4 new pinning tests in `tests/imports/test_auto_import_multi_disc_matching.py`: - 18-file 2-disc regression case (dedup preserves all) - (disc=2, track=6) file matches API (disc=2, track=6) track, not the disc-1 same-numbered track - Single-disc albums still match normally (no regression) - Quality dedup within a single (disc, track) position still picks higher-quality format (.flac over .mp3) Verification: - 2268 full pytest suite passes (+4 new), 1 skipped, 0 failed - Ruff clean Same branch as the #524 fix because both surfaced from the same import session — easier reviewer context if they ship together.	1 month ago
Broque Thomas	de348981a5	Surface silent exceptions in import pipeline — 11 sites - imports/side_effects.py: 8 sites (post-import cleanup paths, thumbnail+lyrics pulls, Plex refresh) - auto_import_worker.py: 3 sites (queue/dedup helpers) All converted to `logger.debug("...: %s", e)`. Refs #369	1 month ago
Broque Thomas	03a7ccd74a	Rename unused loop var to silence ruff B007 `sub_name` is unused — the recursion only needs the path. Rename to `_sub_name` to satisfy ruff's B007 check.	1 month ago
Broque Thomas	cdd408b6f3	Auto-import: live card updates + multi-disc + featured-artist tag fixes The 'Live Per-Track Progress' work shipped a backend in-progress row + top-of-tab progress text but the history cards themselves stayed visually stale during processing — lowercase "processing" badge, neutral styling, no per-track hint. Smoke-testing also surfaced two latent identification bugs that prevented multi-disc rips with features (Kendrick GKMC Deluxe) from importing at all. Card-level live progress (`webui/static/stats-automations.js`): - Cache `/api/auto-import/status` response in `_autoImportLastStatus`; poller awaits status before re-rendering results so the card has the live data. - Add 'processing' entries to statusLabels / statusIcons / statusClass. - When card folder_name matches `current_folder`, swap the meta line to `track N/M: <track name>` and tag the matching row in the expanded list as `auto-import-track-row-active`; prior rows tag as `-row-done`. Card styling (`webui/static/style.css`): - `.auto-import-processing` blue left border, `.auto-import-badge-processing` pulse animation, active/done track-row classes. Multi-disc enumeration (`core/auto_import_worker.py:_scan_directory`): - Old code skipped disc folders during recursion AND only attached them to a parent that had its own loose audio. A folder containing only `Disc 1/`, `Disc 2/` was invisible. Now: when a directory has only disc subdirs and no loose audio, treat that directory itself as the album candidate. Disc folders still skipped when standing alone. - Add `FolderCandidate.is_staging_root` flag (set when the staging dir itself becomes the candidate via this path) so identification can refuse to use the meaningless folder name. Tag identification (`core/auto_import_worker.py:_identify_from_tags`): - Per-track `artist` tag fragmented consensus on albums with features ("Kendrick Lamar" / "Kendrick Lamar, Drake" / "Kendrick Lamar, Dr. Dre" produced 3 separate `(album, artist)` keys for one album). Now group by album first, then pick the most-common artist within that album group. - `_read_file_tags` now prefers `albumartist` over `artist` for album-level identity; falls back to `artist` for files without albumartist. - Add INFO-level log when tag identification rejects, showing top albums and their counts so the user can diagnose multi-disc / tagging issues. Folder-name false-match guard (`core/auto_import_worker.py:_identify_folder`): - When `is_staging_root` is set, skip the folder-name strategy entirely. Logs the skip and falls through to AcoustID. Without this, dropping disc folders directly into staging caused the scanner to search the metadata source for the literal name "Staging", which false-matched against random albums (e.g. "Stamina, Dinos" — a French rap album — at 13% confidence). What's New entries added under 2.4.2 dev cycle.	1 month ago
Broque Thomas	783c543c3e	Auto-import: live per-track progress + in-progress history row User reported (Mushy / generally) that dropping an album into the staging folder left the auto-import history blank for the entire processing window — sometimes 5+ minutes for a full album. Pre- existing UX gap, not caused by the recent context-builder refactor. Two root causes: 1. ``_record_result`` only fired AFTER ``_process_matches`` returned. For a 14-track album with ~30s/track post-processing, that meant ~7 minutes of zero rows in auto_import_history → nothing for ``/api/auto-import/results`` to return → empty UI. 2. ``_current_status`` only ever transitioned between 'idle' and 'scanning' — never 'processing'. ``get_status()`` had no per- track index/name fields, so the UI had no way to render "Processing track 3/14: Mine" even if it wanted to. Fix: - New ``_record_in_progress`` inserts a status='processing' row up-front (before the per-track loop starts) so the UI sees the import the moment it begins. Returns the row id. - New ``_finalize_result`` updates that same row with the final outcome (completed/failed) when processing finishes. One row per album, not per track — keeps the history list clean. - Both share ``_serialize_match_data`` (extracted from the original ``_record_result``) so the in-progress row carries the same match payload shape the existing review UI already understands. - ``_process_matches`` updates ``_current_track_index``, ``_current_track_total``, and ``_current_track_name`` BEFORE each per-track callback fires, so a polling UI sees consistent "processing N/M: <name>" snapshots. - ``_scan_cycle`` flips ``_current_status`` to 'processing' before the per-album loop, resets it + the per-track fields after. Defensive ``finally`` clears progress even if the inner code path raised. - ``get_status()`` exposes the new fields so the UI's existing /api/auto-import/status polling picks them up. - Frontend (stats-automations.js): renders the new ``current_status='processing'`` state with track index/total/name in the existing progress bar element. New 'processing' status class for styling parity with 'scanning'. 8 regression tests in tests/imports/test_auto_import_live_progress.py: - get_status surfaces the new fields with sane defaults - track_index advances 1, 2, 3 during a 3-track loop - track_total set BEFORE the first callback fires (no '1/0' flicker) - _record_in_progress writes status='processing' with no processed_at - _finalize_result updates the same row to completed + processed_at, no second insert - _finalize_result with failed status leaves processed_at NULL - _finalize_result with row_id=None is a safe no-op - Per-track fields cleared by _scan_cycle's finally block Full pytest 1643 passed; ruff clean.	1 month ago
Broque Thomas	fd014e2745	Use parent folder name as artist override in auto-import When staging files are organized as Artist/Albums/AlbumFolder or Artist/AlbumFolder, the auto-import now uses the parent folder name as the artist instead of trusting embedded file tags. Uses relative path from staging root to determine folder depth, so albums directly in staging root don't accidentally pick up container paths as artist names. Common category subfolder names (Albums, Singles, EPs, Mixtapes, etc.) are recognized and skipped. Fixes mixtapes and compilations where file tags have DJ names or incorrect artists (e.g. files tagged as "Slim" in a 2Pac folder).	2 months ago
Broque Thomas	6d5538de74	Fix single import: prefer tag data over weak metadata/AcoustID matches Files with embedded tags (artist+title from post-processing) were failing import because the metadata search scored low (66%) and the AcoustID result returned before the tag-preference code could run. - Tag-based identification now returns 85% confidence when embedded tags have an artist field, borrowing album art from weak metadata - AcoustID search result only accepted at 80%+ confidence, otherwise kept as fallback (doesn't short-circuit past tag preference) - AcoustID None artist/title falls back to tag data via 'or' operator - Stop retrying failed/unidentified items every scan cycle	2 months ago
Broque Thomas	7bad4a4fa9	Stop auto-import retrying failed/unidentified items every scan cycle Items with status needs_identification, failed, or rejected were not in the skip list, causing them to be re-scanned and re-logged every 60 seconds indefinitely. Now skips all terminal statuses.	2 months ago
Broque Thomas	bbf5af1ce1	Fix auto-import rescan race condition, coverage penalty, and UI Race condition: scanner re-scanned folders while post-processing was still moving files, causing partial matches and ghost failures. Now tracks in-progress paths and skips them on subsequent scans. Coverage penalty fix: individual tracks that match at 80%+ confidence now auto-import even when overall album coverage is low (e.g. 2 of 18 tracks present). Previously low coverage killed the entire import. Import page: stats bar, filter pills, Scan Now, Approve All, Clear History (clears imported + failed), live scan progress.	2 months ago
Broque Thomas	a2e3ce8000	Fix auto-import track numbers, dates, cover art, and track name display - Track numbers defaulted to 1 instead of using metadata source values - Release dates not captured, causing missing year in path templates - Cover art missing for Deezer (direct image_url not checked) - Track names in expanded view showed Unknown (wrong JSON field name) - Read year/date from embedded file tags as fallback - Add Deezer get_album_metadata/get_album_tracks fallbacks - Handle Deezer tracks.data response format	2 months ago
Broque Thomas	d2c6979ce4	Recursive staging scan, singles support, and improved import UI Auto-import now scans the staging folder recursively — any folder structure depth works (Artist/Album/tracks, Album/tracks, etc.). Loose audio files are treated as singles with tag/filename/AcoustID identification. Import results UI redesigned: - Click cards to expand per-track match details with confidence scores - Shows identification method badge (Tags, Folder Name, AcoustID) - Per-track grid: track name, matched filename, confidence percentage - Time ago labels, folder path, better status badges - Approve/Dismiss buttons use event.stopPropagation for clean UX	2 months ago
Broque Thomas	d66adb3c6e	Add single file support to auto-import worker Loose audio files in the staging root are now picked up alongside album folders. Singles are identified via embedded tags, filename parsing (Artist - Title.ext), or AcoustID fingerprinting, then matched against the configured metadata source. Confidence-gated processing applies the same way as album folders (90%+ auto, 70-90% review, <70% manual).	2 months ago
Broque Thomas	308773ea7c	Add Auto-Import — background staging folder watcher with smart matching Full auto-import pipeline: background worker watches the staging folder, identifies music using embedded tags → folder name parsing → AcoustID fingerprinting, matches files to metadata source tracklists, and processes high-confidence matches through the existing post-processing pipeline automatically. Worker: AutoImportWorker with start/stop/pause/resume, configurable scan interval (default 60s), confidence threshold (default 90%), and auto-process toggle. Processes one folder per cycle, alphabetical order. Disc folder detection, stability checking, content hash dedup. Confidence gate: 90%+ auto-processes silently, 70-90% queued as pending review with approve/dismiss actions, <70% flagged for manual identification. Track matching uses weighted algorithm (title 45%, artist 15%, track number 30%, album tag 10%). Database: auto_import_history table tracks every scan result with folder hash, match data JSON, confidence, status, timestamps. API: 7 endpoints — status, toggle, settings (GET/POST), results (filtered/paginated), approve, reject. UI: Auto tab on Import page with enable toggle, confidence slider, scan interval selector. Live result cards with album art, confidence bar (green/yellow/red), status badges, match stats. 5-second polling.	2 months ago

14 Commits (f9f74ac51157166e01a2513993eaea2add079dec)