Cin-pass on the #524 + multi-disc fixes. Pre-merge polish.
Lifts: `core/imports/album_matching.py`
`AutoImportWorker._match_tracks` was a 100+-line method buried in a
1400-line class. Testing it required monkey-patching `_read_file_tags`
+ mocking the metadata client just to exercise the matching algorithm.
Per Cin's "lift logic out of monolithic classes" pattern (same shape
as the album-info builders / discography / quality scanner lifts),
moved the dedup + scoring into `core/imports/album_matching.py` as
pure functions over already-fetched data.
Helper exposes:
- Constants for every match weight (TITLE_WEIGHT, ARTIST_WEIGHT,
POSITION_WEIGHT, NEAR_POSITION_WEIGHT, CROSS_DISC_POSITION_WEIGHT,
ALBUM_WEIGHT, MATCH_THRESHOLD). Magic numbers killed.
- `dedupe_files_by_position(audio_files, file_tags, *, quality_rank)` —
position-keyed quality dedup.
- `score_file_against_track(file_path, file_tags, track, *,
target_album, similarity)` — pure per-(file, track) scorer.
- `match_files_to_tracks(audio_files, file_tags, tracks, *,
target_album, similarity, quality_rank)` — full matching with
greedy best-per-track + first-come-first-serve over deduped files.
Worker shrinks from 100 lines of inline algorithm to 8 lines that
fetch tags + delegate to the helper.
Tests added (26 new across 3 files):
`tests/imports/test_album_matching_helper.py` (19 tests):
- Constants pin: weights sum to 1.0, threshold above position-only
- `dedupe_files_by_position`: quality wins, cross-disc preserved,
tag-less files passed through, first-wins on equal quality
- `score_file_against_track`: perfect-agreement = 1.0, position
needs both disc+track, near-position only same-disc, missing
artist tags handled, disc field aliases (Spotify/Deezer/iTunes),
filename fallback when title tag missing
- `match_files_to_tracks`: happy path, file used at-most-once,
below-threshold left unmatched
- Edge case Cin would flag: tag-less file with strong filename title
matches multi-disc album track via title alone (perfect-name
scenario works); tag-less file with weak filename title against
multi-disc API correctly stays unmatched (the behavior delta from
the disc-aware fix — pinned so future readers see it's intentional)
`tests/test_import_album_match_endpoint.py` (3 tests):
- Backend warning fires when source missing from match POST
- No warning fires on the legit path (catches noisy-warning regression)
- Endpoint actually forwards source/name/artist to the payload
builder (catches "logging the right warning but doing the wrong
lookup" regression)
`tests/test_import_page_album_lookup_pattern.py` (4 tests):
- Source-text guard for the import-page #524 fix in stats-automations.js.
Until the file is modularized enough for a behavioral JS test (under
the existing tests/static/*.mjs pattern), regex-based assertions pin:
the `_albumLookup` field exists, the click handler reads from it,
both card renderers populate it before emitting onclick, and the
cache stores `source` per entry. Caveat documented in the test
module docstring.
Verification:
- All 26 new tests pass.
- Existing multi-disc tests (test_auto_import_multi_disc_matching.py)
still pass after the lift — proves the helper is behavior-equivalent
to the inline implementation it replaced.
- Full suite: 2293 passed, 1 flaky-timing failure
(test_library_reorganize_orchestrator.py::test_watchdog_warns_about_stuck_workers
— passes in isolation, fails only in full-suite runs, pre-existing,
unrelated to this PR).
- Ruff clean.
Notes for the reviewer:
- The frontend stats-automations.js JS test is structural-only.
Behavioral JS testing for that file requires modularizing the
~7k-line monolith first — out of scope for this fix.
- The cross-disc 5% consolation bonus is a small behavior change for
users with weak/missing tag info on multi-disc albums. Pinned
explicitly in `test_tagless_file_with_weak_title_unmatched_in_multidisc`
so the trade-off is visible: correct multi-disc matching wins over
optimistic position-only matching that produced wrong-disc files.
Caught while live-testing the #524 fix with kendrick lamar
mr morale & the big steppers (3 discs). User dropped discs 1+2
loose in staging root + disc 3 in its own folder, every file
perfectly tagged with disc_number/track_number/title — only 9
tracks ended up in the library, the rest got integrity-rejected
and quarantined.
Two related bugs in `AutoImportWorker._match_tracks`:
1. **Quality dedup keyed on track_number alone.** The dedup loop
kept `seen_track_nums[track_number] = file` and dropped any later
file with the same number, treating it as a quality duplicate.
On a multi-disc release where every disc has tracks 1..N, that
collapses the album to one disc's worth of files BEFORE the
matcher runs. User's 18 loose disc-1+disc-2 files reduced to 9
before any title/disc info was even consulted.
2. **Match scoring ignored disc_number.** The 30% track-number bonus
fired whenever `ft[track_number] == track_num` regardless of disc.
File with tag (disc=2, track=6, "Auntie Diaries", 281s) got the
full bonus matching API track (disc=1, track=6, "Rich Interlude",
103s) — wrong file → wrong destination → integrity check correctly
rejected and quarantined the file. Same for tracks 7, 8, 9.
Fix:
- Dedup keys on `(disc_number, track_number)` tuples — multi-disc
files with parallel numbering all survive.
- Match scoring's 30% bonus only when BOTH disc AND track agree.
Cross-disc same-track-number collisions get a small 5% consolation
bonus so title similarity has to carry the match (covers cases
where tag disc info is missing or wrong).
- API track disc_number read from `disc_number` (Spotify) /
`disk_number` (Deezer) / `discNumber` (iTunes) defaulting to 1.
4 new pinning tests in `tests/imports/test_auto_import_multi_disc_matching.py`:
- 18-file 2-disc regression case (dedup preserves all)
- (disc=2, track=6) file matches API (disc=2, track=6) track, not
the disc-1 same-numbered track
- Single-disc albums still match normally (no regression)
- Quality dedup within a single (disc, track) position still picks
higher-quality format (.flac over .mp3)
Verification:
- 2268 full pytest suite passes (+4 new), 1 skipped, 0 failed
- Ruff clean
Same branch as the #524 fix because both surfaced from the same
import session — easier reviewer context if they ship together.
The 'Live Per-Track Progress' work shipped a backend in-progress row + top-of-tab
progress text but the history cards themselves stayed visually stale during
processing — lowercase "processing" badge, neutral styling, no per-track hint.
Smoke-testing also surfaced two latent identification bugs that prevented
multi-disc rips with features (Kendrick GKMC Deluxe) from importing at all.
Card-level live progress (`webui/static/stats-automations.js`):
- Cache `/api/auto-import/status` response in `_autoImportLastStatus`; poller
awaits status before re-rendering results so the card has the live data.
- Add 'processing' entries to statusLabels / statusIcons / statusClass.
- When card folder_name matches `current_folder`, swap the meta line to
`track N/M: <track name>` and tag the matching row in the expanded list
as `auto-import-track-row-active`; prior rows tag as `-row-done`.
Card styling (`webui/static/style.css`):
- `.auto-import-processing` blue left border, `.auto-import-badge-processing`
pulse animation, active/done track-row classes.
Multi-disc enumeration (`core/auto_import_worker.py:_scan_directory`):
- Old code skipped disc folders during recursion AND only attached them to a
parent that had its own loose audio. A folder containing only `Disc 1/`,
`Disc 2/` was invisible. Now: when a directory has only disc subdirs and no
loose audio, treat that directory itself as the album candidate. Disc folders
still skipped when standing alone.
- Add `FolderCandidate.is_staging_root` flag (set when the staging dir itself
becomes the candidate via this path) so identification can refuse to use the
meaningless folder name.
Tag identification (`core/auto_import_worker.py:_identify_from_tags`):
- Per-track `artist` tag fragmented consensus on albums with features
("Kendrick Lamar" / "Kendrick Lamar, Drake" / "Kendrick Lamar, Dr. Dre"
produced 3 separate `(album, artist)` keys for one album). Now group by
album first, then pick the most-common artist within that album group.
- `_read_file_tags` now prefers `albumartist` over `artist` for album-level
identity; falls back to `artist` for files without albumartist.
- Add INFO-level log when tag identification rejects, showing top albums and
their counts so the user can diagnose multi-disc / tagging issues.
Folder-name false-match guard (`core/auto_import_worker.py:_identify_folder`):
- When `is_staging_root` is set, skip the folder-name strategy entirely. Logs
the skip and falls through to AcoustID. Without this, dropping disc folders
directly into staging caused the scanner to search the metadata source for
the literal name "Staging", which false-matched against random albums (e.g.
"Stamina, Dinos" — a French rap album — at 13% confidence).
What's New entries added under 2.4.2 dev cycle.
User reported (Mushy / generally) that dropping an album into the
staging folder left the auto-import history blank for the entire
processing window — sometimes 5+ minutes for a full album. Pre-
existing UX gap, not caused by the recent context-builder refactor.
Two root causes:
1. ``_record_result`` only fired AFTER ``_process_matches`` returned.
For a 14-track album with ~30s/track post-processing, that meant
~7 minutes of zero rows in auto_import_history → nothing for
``/api/auto-import/results`` to return → empty UI.
2. ``_current_status`` only ever transitioned between 'idle' and
'scanning' — never 'processing'. ``get_status()`` had no per-
track index/name fields, so the UI had no way to render
"Processing track 3/14: Mine" even if it wanted to.
Fix:
- New ``_record_in_progress`` inserts a status='processing' row
up-front (before the per-track loop starts) so the UI sees the
import the moment it begins. Returns the row id.
- New ``_finalize_result`` updates that same row with the final
outcome (completed/failed) when processing finishes. One row per
album, not per track — keeps the history list clean.
- Both share ``_serialize_match_data`` (extracted from the original
``_record_result``) so the in-progress row carries the same match
payload shape the existing review UI already understands.
- ``_process_matches`` updates ``_current_track_index``,
``_current_track_total``, and ``_current_track_name`` BEFORE each
per-track callback fires, so a polling UI sees consistent
"processing N/M: <name>" snapshots.
- ``_scan_cycle`` flips ``_current_status`` to 'processing' before
the per-album loop, resets it + the per-track fields after.
Defensive ``finally`` clears progress even if the inner code path
raised.
- ``get_status()`` exposes the new fields so the UI's existing
/api/auto-import/status polling picks them up.
- Frontend (stats-automations.js): renders the new
``current_status='processing'`` state with track index/total/name
in the existing progress bar element. New 'processing' status
class for styling parity with 'scanning'.
8 regression tests in tests/imports/test_auto_import_live_progress.py:
- get_status surfaces the new fields with sane defaults
- track_index advances 1, 2, 3 during a 3-track loop
- track_total set BEFORE the first callback fires (no '1/0' flicker)
- _record_in_progress writes status='processing' with no
processed_at
- _finalize_result updates the same row to completed +
processed_at, no second insert
- _finalize_result with failed status leaves processed_at NULL
- _finalize_result with row_id=None is a safe no-op
- Per-track fields cleared by _scan_cycle's finally block
Full pytest 1643 passed; ruff clean.
When staging files are organized as Artist/Albums/AlbumFolder or
Artist/AlbumFolder, the auto-import now uses the parent folder name
as the artist instead of trusting embedded file tags.
Uses relative path from staging root to determine folder depth, so
albums directly in staging root don't accidentally pick up container
paths as artist names. Common category subfolder names (Albums,
Singles, EPs, Mixtapes, etc.) are recognized and skipped.
Fixes mixtapes and compilations where file tags have DJ names or
incorrect artists (e.g. files tagged as "Slim" in a 2Pac folder).
Files with embedded tags (artist+title from post-processing) were
failing import because the metadata search scored low (66%) and the
AcoustID result returned before the tag-preference code could run.
- Tag-based identification now returns 85% confidence when embedded
tags have an artist field, borrowing album art from weak metadata
- AcoustID search result only accepted at 80%+ confidence, otherwise
kept as fallback (doesn't short-circuit past tag preference)
- AcoustID None artist/title falls back to tag data via 'or' operator
- Stop retrying failed/unidentified items every scan cycle
Items with status needs_identification, failed, or rejected were not
in the skip list, causing them to be re-scanned and re-logged every
60 seconds indefinitely. Now skips all terminal statuses.
Race condition: scanner re-scanned folders while post-processing was
still moving files, causing partial matches and ghost failures. Now
tracks in-progress paths and skips them on subsequent scans.
Coverage penalty fix: individual tracks that match at 80%+ confidence
now auto-import even when overall album coverage is low (e.g. 2 of 18
tracks present). Previously low coverage killed the entire import.
Import page: stats bar, filter pills, Scan Now, Approve All, Clear
History (clears imported + failed), live scan progress.
- Track numbers defaulted to 1 instead of using metadata source values
- Release dates not captured, causing missing year in path templates
- Cover art missing for Deezer (direct image_url not checked)
- Track names in expanded view showed Unknown (wrong JSON field name)
- Read year/date from embedded file tags as fallback
- Add Deezer get_album_metadata/get_album_tracks fallbacks
- Handle Deezer tracks.data response format
Loose audio files in the staging root are now picked up alongside album
folders. Singles are identified via embedded tags, filename parsing
(Artist - Title.ext), or AcoustID fingerprinting, then matched against
the configured metadata source. Confidence-gated processing applies
the same way as album folders (90%+ auto, 70-90% review, <70% manual).
Full auto-import pipeline: background worker watches the staging folder,
identifies music using embedded tags → folder name parsing → AcoustID
fingerprinting, matches files to metadata source tracklists, and
processes high-confidence matches through the existing post-processing
pipeline automatically.
Worker: AutoImportWorker with start/stop/pause/resume, configurable
scan interval (default 60s), confidence threshold (default 90%), and
auto-process toggle. Processes one folder per cycle, alphabetical
order. Disc folder detection, stability checking, content hash dedup.
Confidence gate: 90%+ auto-processes silently, 70-90% queued as
pending review with approve/dismiss actions, <70% flagged for manual
identification. Track matching uses weighted algorithm (title 45%,
artist 15%, track number 30%, album tag 10%).
Database: auto_import_history table tracks every scan result with
folder hash, match data JSON, confidence, status, timestamps.
API: 7 endpoints — status, toggle, settings (GET/POST), results
(filtered/paginated), approve, reject.
UI: Auto tab on Import page with enable toggle, confidence slider,
scan interval selector. Live result cards with album art, confidence
bar (green/yellow/red), status badges, match stats. 5-second polling.