From ca5c93162c3ef7eb34787beda81e8cb29c0bf258 Mon Sep 17 00:00:00 2001 From: Broque Thomas <26755000+Nezreka@users.noreply.github.com> Date: Wed, 6 May 2026 21:18:20 -0700 Subject: [PATCH] Rewrite Library Reorganize job to delegate to per-album planner MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit GitHub issue #500 (@bafoed). Library Reorganize repair job moved album tracks to single-template paths because of a fragile classification heuristic. Concrete symptom: a track at ``Surf Curse/Surf Curse - Nothing Yet (2017)/01 - Christine F.flac`` got proposed for a move to ``Surf Curse/Surf Curse - Christine F/Surf Curse - Christine F.flac`` (single template) instead of staying under the album folder. Root cause: the job had its own tag-reading + transfer-folder-walk + template-application implementation. The classification was ``is_album = (group_size > 1)`` where ``group_size`` was the count of same-album tracks currently sitting in the transfer folder being scanned. Two failure modes: - only one track of an album was in the transfer folder (rest already moved to the library, or not yet downloaded), or - album tags varied slightly across tracks (e.g. ``"Buds"`` vs ``"Buds (Bonus)"``) Either case gave a 1-element group → routed through the SINGLE template → wrong destination. Rewrite — delegate to the per-album planner the artist-detail "Reorganize" modal already uses: - ``core.library_reorganize.preview_album_reorganize`` for path computation (DB-driven, knows the album has N tracks regardless of how many sit in transfer; album-vs-single is structurally correct) - ``core.reorganize_queue.enqueue_many`` for apply mode; the queue worker dispatches via ``reorganize_album`` which handles file move + post-processing + DB update + sidecar through the same code path the per-album modal uses Job's per-album loop: - iterate albums for the active media server only (matches the artist- detail modal's scope; multi-server users won't have the job touch the inactive server's files at paths they can't see) - preview each album, catch exceptions per-album so one bad row doesn't abort the scan - branch on planner status: - ``no_album`` / ``no_tracks`` (race: album deleted mid-scan) → skip silently - ``no_source_id`` (album never enriched) → emit ONE album-level "needs enrichment first" finding (vs N per-track findings cluttering the UI) - ``planned`` → filter mismatched tracks (matched + new_path + not unchanged + file_exists), emit per-track findings (dry-run) or collect album for bulk enqueue (apply) - bulk enqueue at end of loop using the queue's correct return-shape (``{'enqueued': N, 'already_queued': M, 'total': K}``) What's gone (~500 LOC): - ``_read_tag_metadata`` / ``_get_audio_quality`` / transfer-folder walk - ``_load_album_years`` / ``_lookup_years_from_api`` (planner does this) - ``_apply_path_template`` / ``_build_path_from_template`` - direct ``shutil.move`` + sidecar move logic (queue handles) - the fragile ``is_album = group_size > 1`` heuristic — structurally gone - ``move_sidecars`` setting (no longer applicable; queue's post-process re-downloads cover art at the destination) What stays: - dry-run vs apply toggle - ``file_organization.enabled`` gate - stop / pause respect - progress reporting - findings for the UI Cleaner separation of concerns: - this job: DB-known tracks at wrong paths (active server only) - ``orphan_file_detector``: files on disk with no DB entry - ``dead_file_cleaner``: DB entries pointing to nonexistent files Tests: 12 tests in ``tests/test_library_reorganize.py`` pin the delegation contract — every status branch, every track-filter case, exception handling, apply-mode enqueue payload, active-server scope, estimate-scope shape. Three obsolete ``_lookup_years_*`` tests removed (year handling moved to planner). Closes #500 (the misclassification half — orphan + dead-file are downstream sync-gap symptoms, separate concern). --- core/repair_jobs/library_reorganize.py | 1079 +++++++----------------- tests/test_library_reorganize.py | 529 ++++++++++-- webui/static/helper.js | 15 + 3 files changed, 772 insertions(+), 851 deletions(-) diff --git a/core/repair_jobs/library_reorganize.py b/core/repair_jobs/library_reorganize.py index 7ab4aa9b..e984c97e 100644 --- a/core/repair_jobs/library_reorganize.py +++ b/core/repair_jobs/library_reorganize.py @@ -1,247 +1,51 @@ """Library Reorganize Job — moves files to match the current file organization template. +Pre-rewrite this job had its own tag-reading + transfer-folder-walk + +template-application implementation that worked off file tags. The +classification heuristic ``is_album = (group_size > 1)`` (where +``group_size`` was the count of tracks for the same album currently +sitting in the transfer folder being scanned) misclassified album +tracks as singles whenever: + +- only one track of an album lived in the transfer folder (the rest + already moved to the library, or not yet downloaded), or +- album tags varied slightly across tracks (e.g. "Buds" vs + "Buds (Bonus)"), splitting them into separate single-element groups. + +Result: those tracks got routed through the SINGLE template and ended +up at e.g. ``Surf Curse/Surf Curse - Christine F/Surf Curse - Christine F.flac`` +instead of the album destination. + +GitHub issue #500 (@bafoed). Fix: delegate to the per-album planner +(``core.library_reorganize.preview_album_reorganize`` / +``reorganize_album``) the per-album reorganize modal already uses. The +planner is DB-driven — it knows the album has multiple tracks +regardless of how many currently sit in the transfer folder, so the +album-vs-single classification is structurally correct. + +Apply mode delegates to ``core.reorganize_queue`` so the actual file +move + post-processing + DB update + sidecar handling all flow +through the same code path the per-album modal uses. No second move +implementation to keep in sync. + Safety design: -- Dry run mode is ON by default. The job only creates findings (reports) showing - what WOULD move. The user must explicitly disable dry_run in job settings. -- The job is disabled by default (default_enabled=False) so it never runs - automatically unless the user explicitly enables it. -- Case-insensitive path comparison on Windows prevents false moves. -- Destination collision check prevents overwriting existing files. -- Files without usable tags are skipped, not guessed at. -- Moves are always within the transfer folder; cannot escape to parent dirs. +- Dry run mode is ON by default. Disabled by user explicitly. +- Job is disabled by default — never auto-runs unless user enables. +- Only DB-known tracks are considered. Files in transfer with no DB + entry are handled by the separate ``orphan_file_detector`` job. +- Albums with no matching metadata source ID are skipped with a + clear "needs enrichment first" finding rather than guessed at. """ -import json import os -import re -import shutil -import sys +from typing import Optional -from core.metadata_service import get_client_for_source, get_primary_source, get_source_priority from core.repair_jobs import register_job from core.repair_jobs.base import JobContext, JobResult, RepairJob from utils.logging_config import get_logger logger = get_logger("repair_job.library_reorganize") -AUDIO_EXTENSIONS = {'.mp3', '.flac', '.ogg', '.opus', '.m4a', '.aac', '.wav', '.wma', '.aiff', '.aif'} -SIDECAR_EXTENSIONS = {'.lrc', '.jpg', '.jpeg', '.png', '.nfo', '.txt', '.cue'} -ALBUM_SIDECAR_NAMES = { - 'cover.jpg', 'cover.jpeg', 'cover.png', - 'folder.jpg', 'folder.jpeg', 'folder.png', - 'album.jpg', 'album.jpeg', 'album.png', - 'front.jpg', 'front.jpeg', 'front.png', - 'thumb.jpg', 'thumb.png', -} - -# Windows and macOS use case-insensitive filesystems by default -_CASE_INSENSITIVE = sys.platform in ('win32', 'darwin') - - -def _paths_equivalent(path_a: str, path_b: str) -> bool: - """Compare two paths, case-insensitive on Windows/macOS.""" - a = os.path.normpath(path_a) - b = os.path.normpath(path_b) - if _CASE_INSENSITIVE: - return a.lower() == b.lower() - return a == b - - -def _sanitize_filename(filename: str) -> str: - """Sanitize filename for file system compatibility.""" - sanitized = re.sub(r'[<>:"/\\|?*]', '_', filename) - sanitized = re.sub(r'\s+', ' ', sanitized).strip() - sanitized = sanitized.rstrip('. ') or '_' - if re.match(r'^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\.|$)', sanitized, re.IGNORECASE): - sanitized = '_' + sanitized - return sanitized[:200] - - -def _sanitize_context_values(context: dict) -> dict: - """Sanitize all string values for path safety. - - Empty strings are preserved so that template cleanup regexes can - remove surrounding decorators (e.g. ``($year)`` → ``()`` → removed). - """ - sanitized = {} - for key, value in context.items(): - if isinstance(value, str): - sanitized[key] = _sanitize_filename(value) if value else '' - else: - sanitized[key] = value - return sanitized - - -def _apply_path_template(template: str, context: dict) -> str: - """Apply template variables to build a path string.""" - clean = _sanitize_context_values(context) - # $cdnum — smart CD label. "CD01"/"CD02" only when multi-disc, empty for - # single-disc (collapses cleanly via the double-dash regex at the end). - _total_discs = int(clean.get('total_discs', 1) or 1) - _disc_number = int(clean.get('disc_number', 1) or 1) - cdnum_value = f"CD{_disc_number:02d}" if _total_discs > 1 else '' - - result = template - result = result.replace('${cdnum}', cdnum_value) - result = result.replace('$albumartist', clean.get('albumartist', clean.get('artist', 'Unknown Artist'))) - result = result.replace('$albumtype', clean.get('albumtype', 'Album')) - result = result.replace('$playlist', clean.get('playlist_name', '')) - result = result.replace('$artistletter', (clean.get('artist', 'U') or 'U')[0].upper()) - result = result.replace('$artist', clean.get('artist', 'Unknown Artist')) - result = result.replace('$album', clean.get('album', 'Unknown Album')) - result = result.replace('$title', clean.get('title', 'Unknown Track')) - result = result.replace('$cdnum', cdnum_value) - result = result.replace('$track', f"{clean.get('track_number', 1):02d}") - result = result.replace('$year', str(clean.get('year', ''))) - result = re.sub(r'\s+', ' ', result) - result = re.sub(r'\s*-\s*-\s*', ' - ', result) - return result.strip() - - -def _build_path_from_template(template: str, context: dict) -> tuple: - """Build (folder_path, filename_base) from a template string and context.""" - full_path = _apply_path_template(template, context) - quality_value = context.get('quality', '') - disc_value = f"{context.get('disc_number', 1):02d}" - - path_parts = full_path.split('/') - if len(path_parts) > 1: - folder_parts = path_parts[:-1] - filename_base = path_parts[-1] - - cleaned_folders = [] - for part in folder_parts: - part = part.replace('$quality', '') - part = part.replace('$disc', '') - part = re.sub(r'\s*\[\s*\]', '', part) - part = re.sub(r'\s*\(\s*\)', '', part) - part = re.sub(r'\s*\{\s*\}', '', part) - part = re.sub(r'\s*-\s*$', '', part) - part = re.sub(r'^\s*-\s*', '', part) - part = re.sub(r'\s+', ' ', part).strip() - if part: - cleaned_folders.append(part) - - filename_base = filename_base.replace('$quality', quality_value) - filename_base = filename_base.replace('$disc', disc_value) - filename_base = re.sub(r'\s*\[\s*\]', '', filename_base) - filename_base = re.sub(r'\s*\(\s*\)', '', filename_base) - filename_base = re.sub(r'\s*\{\s*\}', '', filename_base) - filename_base = re.sub(r'\s*-\s*$', '', filename_base) - # Leading dash cleanup — lets $cdnum at the start of a filename - # cleanly disappear on single-disc albums (empty-value case). - filename_base = re.sub(r'^\s*-\s*', '', filename_base) - filename_base = re.sub(r'\s+', ' ', filename_base).strip() - - sanitized_folders = [_sanitize_filename(p) for p in cleaned_folders] - folder_path = os.path.join(*sanitized_folders) if sanitized_folders else '' - return folder_path, _sanitize_filename(filename_base) - else: - full_path = full_path.replace('$quality', quality_value) - full_path = full_path.replace('$disc', disc_value) - full_path = re.sub(r'\s*\[\s*\]', '', full_path) - full_path = re.sub(r'\s*\(\s*\)', '', full_path) - full_path = re.sub(r'\s*\{\s*\}', '', full_path) - full_path = re.sub(r'\s*-\s*$', '', full_path) - full_path = re.sub(r'\s+', ' ', full_path).strip() - return '', _sanitize_filename(full_path) - - -def _get_audio_quality(file_path: str) -> str: - """Read audio file and return a quality descriptor string.""" - try: - ext = os.path.splitext(file_path)[1].lower() - if ext == '.flac': - from mutagen.flac import FLAC - audio = FLAC(file_path) - bits = audio.info.bits_per_sample - return f"FLAC {bits}bit" - elif ext == '.mp3': - from mutagen.mp3 import MP3, BitrateMode - audio = MP3(file_path) - kbps = audio.info.bitrate // 1000 - if audio.info.bitrate_mode == BitrateMode.VBR: - return "MP3-VBR" - return f"MP3-{kbps}" - elif ext in ('.m4a', '.aac', '.mp4'): - from mutagen.mp4 import MP4 - audio = MP4(file_path) - kbps = audio.info.bitrate // 1000 - return f"M4A-{kbps}" - elif ext == '.ogg': - from mutagen.oggvorbis import OggVorbis - audio = OggVorbis(file_path) - kbps = audio.info.bitrate // 1000 - return f"OGG-{kbps}" - elif ext == '.opus': - from mutagen.oggopus import OggOpus - audio = OggOpus(file_path) - kbps = audio.info.bitrate // 1000 - return f"OPUS-{kbps}" - return '' - except Exception: - return '' - - -def _read_tag_metadata(file_path: str) -> dict: - """Read artist, album, title, track_number, disc_number, year from file tags.""" - try: - from mutagen import File as MutagenFile - audio = MutagenFile(file_path, easy=True) - if audio is None: - return {} - - def first(tag_list): - if isinstance(tag_list, list) and tag_list: - return str(tag_list[0]) - if isinstance(tag_list, str): - return tag_list - return '' - - meta = {} - meta['artist'] = first(audio.get('artist', [''])) - meta['albumartist'] = first(audio.get('albumartist', [''])) or meta['artist'] - meta['album'] = first(audio.get('album', [''])) - meta['title'] = first(audio.get('title', [''])) - - # Track number: may be "3/12" format - raw_track = first(audio.get('tracknumber', ['1'])) - try: - meta['track_number'] = int(raw_track.split('/')[0]) - except (ValueError, IndexError): - meta['track_number'] = 1 - - # Disc number: may be "1/2" format - raw_disc = first(audio.get('discnumber', ['1'])) - try: - meta['disc_number'] = int(raw_disc.split('/')[0]) - except (ValueError, IndexError): - meta['disc_number'] = 1 - - # Year - raw_date = first(audio.get('date', [''])) - meta['year'] = raw_date[:4] if raw_date and len(raw_date) >= 4 else '' - - return meta - except Exception as e: - logger.debug("Failed to read tags from %s: %s", file_path, e) - return {} - - -def _remove_empty_dirs(directory: str, root: str): - """Remove empty directories up to root. Never removes root itself.""" - directory = os.path.normpath(directory) - root = os.path.normpath(root) - while directory != root and len(directory) > len(root) and directory.startswith(root): - try: - if os.path.isdir(directory) and not os.listdir(directory): - os.rmdir(directory) - directory = os.path.dirname(directory) - else: - break - except OSError: - break - @register_job class LibraryReorganizeJob(RepairJob): @@ -249,36 +53,36 @@ class LibraryReorganizeJob(RepairJob): display_name = 'Library Reorganize' description = 'Moves files to match the current file organization template (dry run by default)' help_text = ( - 'Scans your transfer folder and reads each audio file\'s tags (artist, album, title, ' - 'track number, disc number) to compute the expected file path based on your current ' - 'file organization template from Settings.\n\n' - 'Any file whose actual path doesn\'t match the expected template gets flagged. In dry ' - 'run mode (default), a finding is created showing the current and expected paths. ' - 'Disable dry run to have the job move files automatically.\n\n' - 'Safety features: case-insensitive path comparison on Windows/macOS, collision ' - 'detection, path escape prevention, and sidecar file handling (.lrc, .nfo, etc.).\n\n' + 'Iterates every album in your library and computes the expected file path for each ' + 'track using the same per-album planner the artist-detail "Reorganize" modal uses. ' + 'Any track whose current path doesn\'t match the expected path gets flagged in dry-run ' + 'mode or queued for a move in live mode.\n\n' + 'In live mode, moves are dispatched to the same reorganize queue the per-album modal ' + 'uses — file move + post-processing + DB update + sidecar handling all flow through ' + 'one code path.\n\n' + 'Albums with no matching metadata source ID are skipped — run enrichment first to ' + 'populate at least one of spotify_album_id / itunes_album_id / deezer_id.\n\n' + 'Files in the transfer folder that aren\'t tracked in the database are handled by ' + 'the separate Orphan File Detector job.\n\n' + 'Sidecars (.lrc, .jpg, .nfo, cover.jpg, etc) are handled by the underlying ' + 'reorganize queue: per-track sidecars are deleted at the source and album-level ' + 'cover art is re-downloaded fresh at the destination via the same post-processing ' + 'pipeline downloads use.\n\n' 'Settings:\n' - '- Dry Run: When enabled, only reports what would change without moving files\n' - '- Move Sidecars: Also move associated files (.lrc, .jpg, .nfo) alongside audio files' + '- Dry Run: When enabled, only reports what would change without moving files' ) icon = 'repair-icon-reorganize' default_enabled = False default_interval_hours = 168 # Weekly — but disabled by default so won't auto-run default_settings = { 'dry_run': True, - 'move_sidecars': True, } auto_fix = True def scan(self, context: JobContext) -> JobResult: result = JobResult() - transfer = context.transfer_folder - if not os.path.isdir(transfer): - logger.warning("Transfer folder does not exist: %s", transfer) - return result - - # Get template config cm = context.config_manager + if not cm: logger.error("No config manager available") return result @@ -286,603 +90,322 @@ class LibraryReorganizeJob(RepairJob): if not cm.get('file_organization.enabled', True): logger.info("File organization is disabled — skipping reorganize") if context.report_progress: - context.report_progress(phase='Skipped — file organization disabled', - log_line='File organization is disabled in settings', - log_type='skip') + context.report_progress( + phase='Skipped — file organization disabled', + log_line='File organization is disabled in settings', + log_type='skip', + ) return result - templates = cm.get('file_organization.templates', {}) - album_template = templates.get('album_path', '$albumartist/$albumartist - $album/$track - $title') - single_template = templates.get('single_path', '$artist/$artist - $title/$title') - disc_label = cm.get('file_organization.disc_label', 'Disc') - logger.info(f"Library Reorganize templates — album: '{album_template}', single: '{single_template}' (raw config: {templates})") - dry_run = self._get_setting(context, 'dry_run', True) - move_sidecars = self._get_setting(context, 'move_sidecars', True) - if context.report_progress: - mode_label = 'DRY RUN' if dry_run else 'LIVE' - context.report_progress(phase=f'Scanning files ({mode_label})...', - log_line=f'Mode: {mode_label} — Scanning {transfer}', - log_type='info') + # Imports kept inside scan() so the module can be imported in + # contexts that don't have web_server's Flask app booted. + from core.imports.paths import build_final_path_for_track + from core.library.path_resolver import resolve_library_file_path + from core.library_reorganize import preview_album_reorganize + + transfer_dir = context.transfer_folder + download_folder = cm.get('soulseek.download_path', '') if cm else '' + + def _resolve(file_path): + return resolve_library_file_path( + file_path, + transfer_folder=transfer_dir, + download_folder=download_folder, + config_manager=cm, + ) - # Collect all audio files - audio_files = [] - for root_dir, _dirs, files in os.walk(transfer): - if context.check_stop(): - return result - for fname in files: - ext = os.path.splitext(fname)[1].lower() - if ext in AUDIO_EXTENSIONS: - audio_files.append(os.path.join(root_dir, fname)) + # Scope to the active media server only — the artist-detail + # reorganize modal does the same. Multi-server users (Plex + + # Jellyfin etc) shouldn't have this job touch the inactive + # server's files (different paths, likely shouldn't move). + active_server = None + try: + active_server = cm.get_active_media_server() + except Exception as exc: + logger.warning("Couldn't read active media server: %s", exc) + + album_rows = self._load_albums(context.db, active_server=active_server) + total = len(album_rows) - total = len(audio_files) if total == 0: - logger.info("No audio files found in transfer folder") + logger.info( + "No albums in DB to reorganize (active server: %s)", active_server, + ) if context.report_progress: - context.report_progress(phase='No files found', log_line='No audio files in transfer folder', - log_type='info') + context.report_progress( + phase='No albums to scan', + log_line=f'No albums for server "{active_server}" in database', + log_type='info', + ) return result if context.report_progress: - context.report_progress(phase=f'Reading tags from {total} files...', - log_line=f'Found {total} audio files', - log_type='info', scanned=0, total=total) - - # Pre-read all tags and group by album for multi-disc detection - file_tags = {} # fpath -> tags dict - album_groups = {} # (albumartist, album) -> [fpath, ...] - for fpath in audio_files: - tags = _read_tag_metadata(fpath) - file_tags[fpath] = tags - key = (tags.get('albumartist', '') or tags.get('artist', ''), - tags.get('album', '')) - if key not in album_groups: - album_groups[key] = [] - album_groups[key].append(fpath) - - # Compute total_discs per album group - album_total_discs = {} - for key, fpaths in album_groups.items(): - max_disc = max((file_tags[fp].get('disc_number', 1) for fp in fpaths), default=1) - album_total_discs[key] = max_disc - - # Pre-load album years from DB for files missing year tags - db_album_years = {} # (artist, album) -> year string - needs_year = '$year' in (album_template + single_template) - if needs_year: - db_album_years = self._load_album_years(context.db) - - # API fallback: find (artist, album) pairs still missing year, batch-lookup - if needs_year and db_album_years is not None: - missing_pairs = set() - for _fpath, tags in file_tags.items(): - year = tags.get('year', '') - if year: - continue - artist = tags.get('artist', '') or tags.get('albumartist', '') - album = tags.get('album', '') or tags.get('title', '') - if not artist or not album: - continue - key = (artist.lower(), album.lower()) - if key not in db_album_years: - missing_pairs.add((artist, album)) - - if missing_pairs: - api_years = self._lookup_years_from_api(context, missing_pairs) - db_album_years.update(api_years) - - # Track claimed destinations to detect in-batch collisions - claimed_destinations = set() - # Track src_dir -> dest_dir for post-pass sidecar cleanup - moved_dirs = {} # src_dir -> dest_dir (last used destination) - - for i, fpath in enumerate(audio_files): + mode_label = 'DRY RUN' if dry_run else 'LIVE' + context.report_progress( + phase=f'Scanning {total} albums ({mode_label})...', + log_line=f'Mode: {mode_label} — Iterating {total} albums', + log_type='info', + scanned=0, total=total, + ) + + items_to_enqueue = [] + + for i, album_row in enumerate(album_rows): if context.check_stop(): return result - if i % 50 == 0 and context.wait_if_paused(): + if i % 20 == 0 and context.wait_if_paused(): return result - result.scanned += 1 - fname = os.path.basename(fpath) - file_ext = os.path.splitext(fname)[1] + album_id = album_row['id'] + album_title = album_row['title'] or 'Unknown Album' - tags = file_tags.get(fpath, {}) - - # Skip files without minimum usable tags - title = tags.get('title', '') or '' - artist = tags.get('artist', '') or '' - if not title and not artist: - result.skipped += 1 - if context.report_progress and i % 20 == 0: - context.report_progress(scanned=i + 1, total=total, - phase=f'Processing ({i+1}/{total})...') - continue - - # Use defaults only when tags exist but are empty - artist = artist or 'Unknown Artist' - albumartist = tags.get('albumartist', '') or artist - album = tags.get('album', '') or '' - title = title or 'Unknown Track' - track_number = tags.get('track_number', 1) or 1 - disc_number = tags.get('disc_number', 1) or 1 - year = tags.get('year', '') - - # Fallback: if file tags have no year, try the DB album year - if not year and db_album_years: - year = db_album_years.get((artist.lower(), (album or title).lower()), '') - if not year: - year = db_album_years.get((albumartist.lower(), (album or title).lower()), '') - - # Read quality for $quality template variable - quality = _get_audio_quality(fpath) - - # Determine template type: album or single - album_key = (albumartist, album) - group_size = len(album_groups.get(album_key, [])) - is_album = bool(album) and group_size > 1 - total_discs = album_total_discs.get(album_key, 1) - - template_context = { - 'artist': artist, - 'albumartist': albumartist, - 'album': album or title, - 'title': title, - 'track_number': track_number, - 'disc_number': disc_number, - # total_discs lets $cdnum decide whether to emit "CDxx" or - # stay empty (single-disc albums). - 'total_discs': total_discs, - 'year': year, - 'quality': quality, - 'albumtype': 'Album', - } - - if is_album: - template = album_template - user_controls_disc = '$disc' in template - folder_path, filename_base = _build_path_from_template(template, template_context) - if folder_path and filename_base: - if total_discs > 1 and not user_controls_disc: - disc_folder = f"{disc_label} {disc_number}" - expected = os.path.join(transfer, folder_path, disc_folder, filename_base + file_ext) - else: - expected = os.path.join(transfer, folder_path, filename_base + file_ext) - else: - result.skipped += 1 - continue - else: - template = single_template - folder_path, filename_base = _build_path_from_template(template, template_context) - if folder_path and filename_base: - expected = os.path.join(transfer, folder_path, filename_base + file_ext) - else: - result.skipped += 1 - continue - - # Safety: verify destination is still inside transfer folder - expected_norm = os.path.normpath(expected) - transfer_norm = os.path.normpath(transfer) - if not expected_norm.startswith(transfer_norm + os.sep) and expected_norm != transfer_norm: - logger.warning("Computed path escapes transfer folder, skipping: %s", expected_norm) - result.skipped += 1 + try: + preview = preview_album_reorganize( + album_id=str(album_id), + db=context.db, + transfer_dir=transfer_dir, + resolve_file_path_fn=_resolve, + build_final_path_fn=build_final_path_for_track, + ) + except Exception as exc: + logger.warning( + "Reorganize preview failed for album %s ('%s'): %s", + album_id, album_title, exc, + ) + result.errors += 1 continue - actual_norm = os.path.normpath(fpath) + tracks = preview.get('tracks', []) + result.scanned += len(tracks) - # Case-insensitive comparison on Windows/macOS - if _paths_equivalent(actual_norm, expected_norm): - if context.report_progress and i % 20 == 0: - context.report_progress(scanned=i + 1, total=total, - phase=f'Processing ({i+1}/{total})...') - continue + status = preview.get('status', '') - # Check for in-batch destination collision - dest_key = expected_norm.lower() if _CASE_INSENSITIVE else expected_norm - if dest_key in claimed_destinations: + if status in ('no_album', 'no_tracks'): + # Album was deleted between the SELECT and the preview, + # or has no tracks. Nothing to do. result.skipped += 1 - if context.report_progress: - context.report_progress( - scanned=i + 1, total=total, - log_line=f'SKIP (duplicate dest): {os.path.basename(fpath)}', - log_type='skip' - ) continue - claimed_destinations.add(dest_key) - # File needs to move - if dry_run: - rel_actual = os.path.relpath(actual_norm, transfer) - rel_expected = os.path.relpath(expected_norm, transfer) - if context.create_finding: + if status == 'no_source_id': + # Can't compute destinations without a metadata source — + # skip cleanly with a single finding rather than 12 per-track + # "no source" findings that would clutter the UI. + result.skipped += len(tracks) or 1 + if dry_run and context.create_finding and tracks: inserted = context.create_finding( job_id=self.job_id, - finding_type='path_mismatch', + finding_type='album_needs_enrichment', severity='info', - entity_type='file', - entity_id=None, - file_path=fpath, - title=f'Would move: {os.path.basename(fpath)}', - description=f'From: {rel_actual}\nTo: {rel_expected}', - details={'from': rel_actual, 'to': rel_expected} + entity_type='album', + entity_id=str(album_id), + file_path=None, + title=f'Needs enrichment: {album_title}', + description=( + f"Album '{album_title}' by {preview.get('artist', '?')} " + "has no metadata source ID — run enrichment first to " + "populate at least one of spotify_album_id / " + "itunes_album_id / deezer_id / discogs_id / soul_id." + ), + details={'album_id': str(album_id), 'reason': 'no_source_id'}, ) if inserted: result.findings_created += 1 else: result.findings_skipped_dedup += 1 - if context.report_progress: + continue + + # Successful plan — count mismatched tracks + mismatched = [ + t for t in tracks + if t.get('matched') + and t.get('new_path') + and not t.get('unchanged') + and t.get('file_exists') + ] + + if not mismatched: + if context.report_progress and (i + 1) % 25 == 0: context.report_progress( scanned=i + 1, total=total, - phase=f'Dry run ({i+1}/{total})...', - log_line=f'[DRY] {os.path.basename(fpath)} -> {os.path.relpath(expected_norm, transfer)}', - log_type='info' + phase=f'Scanning ({i+1}/{total})...', ) - else: - # Actually move the file - try: - dest_dir = os.path.dirname(expected_norm) - os.makedirs(dest_dir, exist_ok=True) - - # Collision: skip if destination already exists and is a different file - if os.path.exists(expected_norm): - # On case-insensitive FS, check if it's the same file (case rename) - try: - same_file = os.path.samefile(actual_norm, expected_norm) - except (OSError, ValueError): - same_file = False - - if not same_file: - result.skipped += 1 - if context.report_progress: - context.report_progress( - scanned=i + 1, total=total, - log_line=f'SKIP (exists): {os.path.basename(fpath)}', - log_type='skip' - ) - continue - - # Same file, different case — use two-step rename to avoid - # OS refusing rename to "same" path on case-insensitive FS - if _CASE_INSENSITIVE: - tmp_path = expected_norm + '.tmp_rename' - shutil.move(actual_norm, tmp_path) - shutil.move(tmp_path, expected_norm) - else: - shutil.move(actual_norm, expected_norm) - else: - shutil.move(actual_norm, expected_norm) - - result.auto_fixed += 1 - # Record src->dest for post-pass sidecar cleanup - # For multi-disc, use album root (parent of Disc N/) as destination - sidecar_dest = dest_dir - if is_album and total_discs > 1 and '$disc' not in album_template: - sidecar_dest = os.path.dirname(dest_dir) - moved_dirs[os.path.dirname(actual_norm)] = sidecar_dest - - # Move sidecar files (LRC, cover art, etc.) - if move_sidecars: - stem = os.path.splitext(os.path.basename(actual_norm))[0] - src_dir = os.path.dirname(actual_norm) - # Track-level sidecars (same stem as audio file) - for sidecar_ext in SIDECAR_EXTENSIONS: - sidecar_src = os.path.join(src_dir, stem + sidecar_ext) - if os.path.isfile(sidecar_src): - new_stem = os.path.splitext(os.path.basename(expected_norm))[0] - sidecar_dst = os.path.join(dest_dir, new_stem + sidecar_ext) - try: - shutil.move(sidecar_src, sidecar_dst) - except Exception as se: - logger.debug("Failed to move sidecar %s: %s", sidecar_src, se) - # Album-level sidecars (cover.jpg, folder.jpg, etc.) - # For multi-disc, place in the album root (parent of Disc N/) - album_dest = dest_dir - if is_album and total_discs > 1 and '$disc' not in album_template: - album_dest = os.path.dirname(dest_dir) - os.makedirs(album_dest, exist_ok=True) - for album_sidecar in ALBUM_SIDECAR_NAMES: - sidecar_src = os.path.join(src_dir, album_sidecar) - if os.path.isfile(sidecar_src): - sidecar_dst = os.path.join(album_dest, album_sidecar) - if not os.path.exists(sidecar_dst): - try: - shutil.move(sidecar_src, sidecar_dst) - except Exception as se: - logger.debug("Failed to move album sidecar %s: %s", sidecar_src, se) - - # Update DB file_path if there's a matching track - self._update_db_path(context.db, actual_norm, expected_norm, transfer) - - # Clean up empty source directories - _remove_empty_dirs(os.path.dirname(actual_norm), transfer) - - if context.report_progress: - context.report_progress( - scanned=i + 1, total=total, - phase=f'Moving ({i+1}/{total})...', - log_line=f'Moved: {os.path.basename(fpath)}', - log_type='success' - ) - except Exception as e: - logger.error("Failed to move %s -> %s: %s", fpath, expected_norm, e) - result.errors += 1 - if context.report_progress: - context.report_progress( - scanned=i + 1, total=total, - log_line=f'ERROR: {os.path.basename(fpath)} -- {e}', - log_type='error' - ) + continue - if context.update_progress and (i + 1) % 10 == 0: + if dry_run: + # One finding per track that would move. + for t in mismatched: + try: + if context.create_finding: + inserted = context.create_finding( + job_id=self.job_id, + finding_type='path_mismatch', + severity='info', + entity_type='track', + entity_id=str(t.get('track_id') or ''), + file_path=t.get('current_path') or '', + title=f"Would move: {os.path.basename(t.get('current_path') or '') or t.get('title', '')}", + description=( + f"From: {t.get('current_path') or '?'}\n" + f"To: {t.get('new_path') or '?'}" + ), + details={ + 'from': t.get('current_path') or '', + 'to': t.get('new_path') or '', + 'album_id': str(album_id), + 'album_title': album_title, + 'source': preview.get('source'), + 'track_id': t.get('track_id'), + }, + ) + if inserted: + result.findings_created += 1 + else: + result.findings_skipped_dedup += 1 + except Exception as e: + logger.debug( + "Error creating path_mismatch finding for track %s: %s", + t.get('track_id'), e, + ) + result.errors += 1 + else: + # Apply mode: enqueue the album for the live reorganize + # queue worker. The queue handles file move + post-process + # + DB update + sidecar via the same code path the per- + # album modal uses — no second move implementation. + items_to_enqueue.append({ + 'album_id': str(album_id), + 'album_title': album_title, + 'artist_id': str(album_row.get('artist_id') or ''), + 'artist_name': preview.get('artist') or album_row.get('artist_name') or 'Unknown Artist', + 'source': preview.get('source'), + }) + + if context.update_progress and (i + 1) % 25 == 0: context.update_progress(i + 1, total) + if context.report_progress and (i + 1) % 25 == 0: + context.report_progress( + scanned=i + 1, total=total, + phase=f'{"Dry run" if dry_run else "Queueing"} ({i+1}/{total})...', + ) + + # Bulk enqueue collected items in one batch (apply mode only). + # Queue's tally shape: {'enqueued': N, 'already_queued': M, 'total': K}. + if items_to_enqueue: + try: + from core.reorganize_queue import get_queue + queue_summary = get_queue().enqueue_many(items_to_enqueue) + enqueued_count = queue_summary.get('enqueued', 0) + already_queued = queue_summary.get('already_queued', 0) + result.auto_fixed += enqueued_count + # Dedupe-skipped albums are tracked separately so the + # repair-job summary doesn't double-count. + result.skipped += already_queued + logger.info( + "Reorganize: enqueued %d albums (%d already in queue)", + enqueued_count, already_queued, + ) + except Exception as exc: + logger.error("Failed to enqueue reorganize items: %s", exc) + result.errors += 1 if context.update_progress: context.update_progress(total, total) - # Post-pass: move leftover sidecar files from directories that lost all audio - if not dry_run and move_sidecars and moved_dirs: - for src_dir, dest_dir in moved_dirs.items(): - if not os.path.isdir(src_dir): - continue - # Check if any audio files remain in this directory - has_audio = any( - os.path.splitext(f)[1].lower() in AUDIO_EXTENSIONS - for f in os.listdir(src_dir) - if os.path.isfile(os.path.join(src_dir, f)) - ) - if has_audio: - continue - # Move all remaining sidecar-type files to the destination - for f in os.listdir(src_dir): - fpath_full = os.path.join(src_dir, f) - if not os.path.isfile(fpath_full): - continue - ext = os.path.splitext(f)[1].lower() - if ext in SIDECAR_EXTENSIONS: - dst = os.path.join(dest_dir, f) - if not os.path.exists(dst): - try: - shutil.move(fpath_full, dst) - except Exception: - pass - # Try cleaning up the now-potentially-empty directory - _remove_empty_dirs(src_dir, transfer) - - mode_text = 'Dry run' if dry_run else 'Reorganize' - summary = f"{mode_text} complete: {result.scanned} scanned, {result.auto_fixed} moved, {result.findings_created} findings, {result.skipped} skipped, {result.errors} errors" + mode_text = 'Dry run' if dry_run else 'Enqueue' + summary = ( + f"{mode_text} complete: {result.scanned} tracks scanned across {total} albums, " + f"{result.auto_fixed} albums queued, {result.findings_created} findings, " + f"{result.skipped} skipped, {result.errors} errors" + ) logger.info(summary) if context.report_progress: context.report_progress( phase='Complete', log_line=summary, log_type='success', - scanned=total, total=total + scanned=total, total=total, ) return result def estimate_scope(self, context: JobContext) -> int: - transfer = context.transfer_folder - if not os.path.isdir(transfer): + """Estimate is the active-server album count — matches what + scan() iterates over.""" + active_server = None + if context.config_manager: + try: + active_server = context.config_manager.get_active_media_server() + except Exception: + pass + try: + conn = context.db._get_connection() + try: + cursor = conn.cursor() + if active_server: + cursor.execute( + "SELECT COUNT(*) FROM albums WHERE server_source = ?", + (active_server,), + ) + else: + cursor.execute("SELECT COUNT(*) FROM albums") + row = cursor.fetchone() + return int(row[0]) if row else 0 + finally: + conn.close() + except Exception: return 0 - count = 0 - for _root, _dirs, files in os.walk(transfer): - for fname in files: - if os.path.splitext(fname)[1].lower() in AUDIO_EXTENSIONS: - count += 1 - return count - def _get_setting(self, context: JobContext, key: str, default): - """Read a job-specific setting from config.""" - if context.config_manager: - return context.config_manager.get(f'repair.jobs.{self.job_id}.settings.{key}', default) - return default + def _load_albums(self, db, active_server: Optional[str] = None) -> list: + """Load minimal album metadata (id, title, artist_id, artist_name) + for albums on the active media server. - def _load_album_years(self, db) -> dict: - """Load all album years from DB. Returns {(artist_lower, album_lower): year_str}. + SoulSync's DB stores rows for every configured server (Plex + + Jellyfin + Navidrome + SoulSync standalone) distinguished by + the ``server_source`` column. The artist-detail reorganize + modal only sees the active server's library; this job matches + that scope so users don't accidentally try to reorganize the + inactive server's files (which live at different paths and + likely shouldn't move). - Checks the main albums table first, then falls back to discovery_pool - release_date for tracks that were playlist-synced without year metadata. + ``active_server=None`` falls back to "no filter" — used by + legacy callers / tests that don't have a config_manager. """ - years = {} conn = None try: conn = db._get_connection() cursor = conn.cursor() - - # Source 1: albums table (most authoritative) - cursor.execute(""" - SELECT ar.name, al.title, al.year - FROM albums al - JOIN artists ar ON ar.id = al.artist_id - WHERE al.year IS NOT NULL AND al.year != 0 - """) - for row in cursor.fetchall(): - artist_name, album_title, year = row - if artist_name and album_title and year: - key = (artist_name.lower(), album_title.lower()) - years[key] = str(year)[:4] - - # Source 2: discovery_pool release_date (covers playlist-synced tracks) - try: + if active_server: cursor.execute(""" - SELECT artist_name, album_name, release_date - FROM discovery_pool - WHERE release_date IS NOT NULL AND release_date != '' - """) - for row in cursor.fetchall(): - artist_name, album_name, release_date = row - if artist_name and album_name and release_date: - key = (artist_name.lower(), album_name.lower()) - if key not in years: # Don't override albums table - year_str = str(release_date)[:4] - if len(year_str) == 4 and year_str.isdigit(): - years[key] = year_str - except Exception: - pass # discovery_pool may not exist on all installs - - # Source 3: recent_releases (watchlist artist releases) - try: - cursor.execute(""" - SELECT wa.artist_name, rr.album_name, rr.release_date - FROM recent_releases rr - JOIN watchlist_artists wa ON wa.id = rr.watchlist_artist_id - WHERE rr.release_date IS NOT NULL AND rr.release_date != '' - """) - for row in cursor.fetchall(): - artist_name, album_name, release_date = row - if artist_name and album_name and release_date: - key = (artist_name.lower(), album_name.lower()) - if key not in years: - year_str = str(release_date)[:4] - if len(year_str) == 4 and year_str.isdigit(): - years[key] = year_str - except Exception: - pass # recent_releases may not exist on all installs - - # Source 4: wishlist tracks (spotify_data JSON contains release date) - try: + SELECT al.id, al.title, al.artist_id, ar.name AS artist_name + FROM albums al + LEFT JOIN artists ar ON ar.id = al.artist_id + WHERE al.server_source = ? + ORDER BY al.id + """, (active_server,)) + else: cursor.execute(""" - SELECT spotify_data FROM wishlist_tracks - WHERE spotify_data IS NOT NULL AND spotify_data != '' + SELECT al.id, al.title, al.artist_id, ar.name AS artist_name + FROM albums al + LEFT JOIN artists ar ON ar.id = al.artist_id + ORDER BY al.id """) - for row in cursor.fetchall(): - try: - data = json.loads(row[0]) - artist_name = '' - if data.get('artists'): - a = data['artists'][0] - artist_name = a.get('name', '') if isinstance(a, dict) else str(a) - album_data = data.get('album', {}) - album_name = album_data.get('name', '') if isinstance(album_data, dict) else '' - release_date = album_data.get('release_date', '') if isinstance(album_data, dict) else '' - if artist_name and album_name and release_date: - key = (artist_name.lower(), album_name.lower()) - if key not in years: - year_str = str(release_date)[:4] - if len(year_str) == 4 and year_str.isdigit(): - years[key] = year_str - except (ValueError, KeyError, TypeError): - continue - except Exception: - pass # wishlist_tracks may not exist or have different schema - + return [dict(row) for row in cursor.fetchall()] except Exception as e: - logger.debug("Failed to load album years from DB: %s", e) + logger.error("Failed to load album list for reorganize: %s", e) + return [] finally: if conn: - conn.close() - return years - - def _lookup_years_from_api(self, context, missing_pairs) -> dict: - """Batch-lookup release years from the configured metadata providers for albums not found in DB. - - Args: - context: JobContext with config_manager - missing_pairs: set of (artist, album) tuples needing year lookup - - Returns: - dict of {(artist_lower, album_lower): year_str} - """ - years = {} - if not missing_pairs: - return years - - primary_source = get_primary_source() - source_priority = get_source_priority(primary_source) - - # Cap lookups to avoid excessive API calls - max_lookups = 200 - pairs_list = list(missing_pairs)[:max_lookups] - logger.info("Looking up %d album years from configured metadata providers", len(pairs_list)) - - if context.report_progress: - context.report_progress( - phase=f'Looking up {len(pairs_list)} album years from metadata providers...', - log_line=f'Fetching release years for {len(pairs_list)} albums', - log_type='info' - ) - - for artist, album in pairs_list: - if context.check_stop(): - break - key = (artist.lower(), album.lower()) - for source_name in source_priority: try: - search_client = get_client_for_source(source_name) - if not search_client or not hasattr(search_client, 'search_albums'): - continue - results = search_client.search_albums(f"{artist} {album}", limit=3) - if results: - for r in results: - release_date = getattr(r, 'release_date', '') or '' - if release_date and len(release_date) >= 4: - year_str = release_date[:4] - if year_str.isdigit(): - years[key] = year_str - break - if key in years: - break - if context.sleep_or_stop(0.1): # Rate limit courtesy - return years - except Exception as e: - logger.debug("API year lookup failed for %s - %s via %s: %s", artist, album, source_name, e) - - logger.info("API year lookup: found %d/%d years", len(years), len(pairs_list)) - return years - - def _update_db_path(self, db, old_path: str, new_path: str, transfer_folder: str = ''): - """Update file_path in the tracks table when a file is moved. - - DB may store server-side paths (e.g. /mnt/musicBackup/Artist/Album/track.flac) - while local paths use the transfer folder (e.g. H:\\Music\\Artist\\Album\\track.flac). - Falls back to suffix matching when exact match fails. - """ - conn = None - try: - conn = db._get_connection() - cursor = conn.cursor() - - # Try exact match first - cursor.execute("UPDATE tracks SET file_path = ? WHERE file_path = ?", - (new_path, old_path)) - if cursor.rowcount > 0: - conn.commit() - return - - # Try normalized path match - cursor.execute("UPDATE tracks SET file_path = ? WHERE file_path = ?", - (new_path, os.path.normpath(old_path))) - if cursor.rowcount > 0: - conn.commit() - return - - # Suffix match: compute path relative to transfer folder and match - # against DB paths that may use a different base prefix - if transfer_folder: - try: - rel_suffix = os.path.relpath(old_path, transfer_folder).replace('\\', '/') - # Escape LIKE wildcards (% _ ^) so artist/album names are literal - escaped = rel_suffix.replace('^', '^^').replace('%', '^%').replace('_', '^_') - cursor.execute( - "UPDATE tracks SET file_path = ? WHERE file_path LIKE ? ESCAPE '^'", - (new_path, '%/' + escaped) - ) - if cursor.rowcount > 0: - conn.commit() - return - # Also try with backslash separators (Windows DB paths) - escaped_bs = escaped.replace('/', '\\') - cursor.execute( - "UPDATE tracks SET file_path = ? WHERE file_path LIKE ? ESCAPE '^'", - (new_path, '%\\' + escaped_bs) - ) + conn.close() except Exception: pass - conn.commit() - except Exception as e: - logger.debug("DB path update failed for %s: %s", old_path, e) - finally: - if conn: - conn.close() + def _get_setting(self, context: JobContext, key: str, default): + """Read a job-specific setting from config.""" + if context.config_manager: + return context.config_manager.get( + f'repair.jobs.{self.job_id}.settings.{key}', default, + ) + return default diff --git a/tests/test_library_reorganize.py b/tests/test_library_reorganize.py index 50f97fa9..6c34c285 100644 --- a/tests/test_library_reorganize.py +++ b/tests/test_library_reorganize.py @@ -1,8 +1,29 @@ +"""Tests for the rewritten Library Reorganize repair job. + +Issue #500: pre-rewrite the job had its own tag-reading + transfer- +folder-grouping + template-application implementation. The +``is_album = group_size > 1`` heuristic misclassified album tracks +as singles when only one track of an album sat in the transfer folder +or when album tags varied across tracks. + +Post-rewrite the job delegates to the per-album planner +(``core.library_reorganize.preview_album_reorganize`` / +``reorganize_queue``) — no second move/template implementation. These +tests pin the delegation contract so future drift fails here instead +of at runtime against a real library. +""" + +from __future__ import annotations + import sys import types from types import SimpleNamespace +from unittest.mock import MagicMock + +import pytest -# Stub optional Spotify dependency so metadata_service can import in tests. + +# ── stubs (same shape used elsewhere in the suite) ────────────────────── if 'spotipy' not in sys.modules: spotipy = types.ModuleType('spotipy') oauth2 = types.ModuleType('spotipy.oauth2') @@ -36,97 +57,459 @@ if 'config.settings' not in sys.modules: sys.modules['config'] = config_mod sys.modules['config.settings'] = settings_mod -from core.repair_jobs import library_reorganize as lr +from core.repair_jobs.library_reorganize import LibraryReorganizeJob +from core.repair_jobs.base import JobContext -class _FakeSearchClient: - def __init__(self, source_name, year=None): - self.source_name = source_name - self.year = year - self.calls = [] - def search_albums(self, query, limit=3): - self.calls.append((query, limit)) - if self.year is None: - return [] - return [SimpleNamespace(release_date=f"{self.year}-01-01")] +# ── fixtures ────────────────────────────────────────────────────────── -def test_lookup_years_prefers_primary_source(monkeypatch): - deezer_client = _FakeSearchClient('deezer', '2022') - spotify_client = _FakeSearchClient('spotify', '1999') +class _FakeConfigManager: + """Minimal config manager. Reads via ``get(key, default)``.""" - monkeypatch.setattr(lr, 'get_primary_source', lambda: 'deezer') - monkeypatch.setattr(lr, 'get_source_priority', lambda primary: [primary, 'itunes', 'spotify']) - monkeypatch.setattr( - lr, - 'get_client_for_source', - lambda source: {'deezer': deezer_client, 'spotify': spotify_client}.get(source), - ) + def __init__(self, settings: dict | None = None, file_org_enabled: bool = True, + active_server: str = 'plex'): + self._settings = settings or {} + self._file_org_enabled = file_org_enabled + self._active_server = active_server - job = lr.LibraryReorganizeJob() - result = job._lookup_years_from_api(SimpleNamespace(report_progress=None, check_stop=lambda: False, sleep_or_stop=lambda *_: False), {('Artist', 'Album')}) + def get(self, key, default=None): + if key == 'file_organization.enabled': + return self._file_org_enabled + return self._settings.get(key, default) - assert result == {('artist', 'album'): '2022'} - assert deezer_client.calls == [('Artist Album', 3)] - assert spotify_client.calls == [] + def get_active_media_server(self): + return self._active_server -def test_lookup_years_falls_through_to_later_source(monkeypatch): - deezer_client = _FakeSearchClient('deezer', None) - spotify_client = _FakeSearchClient('spotify', '1999') +class _FakeDB: + """Stand-in for MusicDatabase. Returns the canned album list from + ``_load_albums``'s SELECT. - monkeypatch.setattr(lr, 'get_primary_source', lambda: 'deezer') - monkeypatch.setattr(lr, 'get_source_priority', lambda primary: [primary, 'itunes', 'spotify']) - monkeypatch.setattr( - lr, - 'get_client_for_source', - lambda source: {'deezer': deezer_client, 'spotify': spotify_client}.get(source), - ) + Supports per-server filtering: pass ``rows_by_server`` as a dict to + return different album sets depending on the SQL parameters. The + helper inspects the SQL string for ``server_source = ?`` and returns + the matching slice. Falls back to ``album_rows`` when ``rows_by_server`` + isn't set (back-compat with single-server tests).""" - job = lr.LibraryReorganizeJob() - result = job._lookup_years_from_api( - SimpleNamespace(report_progress=None, check_stop=lambda: False, sleep_or_stop=lambda *_: False), - {('Artist', 'Album')}, - ) + def __init__(self, album_rows: list = None, rows_by_server: dict = None): + self._album_rows = album_rows or [] + self._rows_by_server = rows_by_server or {} + + def _get_connection(self): + cursor = MagicMock() + rows_by_server = self._rows_by_server + default_rows = self._album_rows + + def _execute(sql, params=()): + if rows_by_server and 'server_source = ?' in sql and params: + cursor._captured_rows = rows_by_server.get(params[0], []) + else: + cursor._captured_rows = default_rows + cursor.fetchall.return_value = cursor._captured_rows + cursor.fetchone.return_value = (len(cursor._captured_rows),) + + cursor.execute.side_effect = _execute + cursor.fetchall.return_value = default_rows + cursor.fetchone.return_value = (len(default_rows),) + conn = MagicMock() + conn.cursor.return_value = cursor + return conn + + +@pytest.fixture +def make_context(): + """Build a JobContext with optional finding-collector.""" + + def _make(*, db, cm=None, dry_run=True, transfer='/tmp/transfer'): + findings = [] + + def _create_finding(**kwargs): + findings.append(kwargs) + return True # 'inserted' return value + + ctx = JobContext( + db=db, + transfer_folder=transfer, + config_manager=cm or _FakeConfigManager( + settings={ + f'repair.jobs.library_reorganize.settings.dry_run': dry_run, + }, + ), + create_finding=_create_finding, + ) + # Attach so tests can inspect. + ctx._captured_findings = findings # type: ignore[attr-defined] + return ctx + + return _make + + +def _make_album_row(*, id_, title='Test Album', artist_id=10, artist_name='Test Artist'): + """Match the row shape ``_load_albums`` returns.""" + return { + 'id': id_, + 'title': title, + 'artist_id': artist_id, + 'artist_name': artist_name, + } + + +def _stub_preview(monkeypatch, response_by_album_id: dict): + """Patch ``preview_album_reorganize`` import inside scan() so it + returns a canned response per album_id.""" + from core import library_reorganize as core_lr + + def _fake_preview(*, album_id, **kwargs): + return response_by_album_id.get(album_id) or { + 'success': False, 'status': 'no_album', 'tracks': [], + } + monkeypatch.setattr(core_lr, 'preview_album_reorganize', _fake_preview) + + +# ── core delegation contract ───────────────────────────────────────── + + +def test_scan_skips_when_file_organization_disabled(make_context): + """Pin: file_organization.enabled=False → scan returns immediately + with empty result, no DB iteration, no preview calls.""" + db = _FakeDB([]) + cm = _FakeConfigManager(file_org_enabled=False) + ctx = make_context(db=db, cm=cm) + + job = LibraryReorganizeJob() + result = job.scan(ctx) + + assert result.scanned == 0 + assert result.findings_created == 0 + assert ctx._captured_findings == [] # type: ignore[attr-defined] + + +def test_scan_returns_empty_when_no_albums(make_context): + """Pin: empty DB → empty result, no errors.""" + db = _FakeDB([]) + ctx = make_context(db=db) + + job = LibraryReorganizeJob() + result = job.scan(ctx) + + assert result.scanned == 0 + assert result.findings_created == 0 + assert result.errors == 0 + + +def test_scan_emits_path_mismatch_finding_for_each_changed_track(make_context, monkeypatch): + """Pin: dry-run mode emits one finding per matched-but-not-unchanged + track returned by the planner. Album with two tracks, both + mismatched → two findings.""" + db = _FakeDB([_make_album_row(id_='A1')]) + _stub_preview(monkeypatch, { + 'A1': { + 'success': True, 'status': 'planned', + 'source': 'spotify', + 'album': 'Album One', + 'artist': 'Artist One', + 'tracks': [ + { + 'track_id': 't1', 'title': 'Track One', 'track_number': 1, + 'current_path': 'old/path/01 - Track One.flac', + 'new_path': 'new/path/01 - Track One.flac', + 'matched': True, 'unchanged': False, 'file_exists': True, + }, + { + 'track_id': 't2', 'title': 'Track Two', 'track_number': 2, + 'current_path': 'old/path/02 - Track Two.flac', + 'new_path': 'new/path/02 - Track Two.flac', + 'matched': True, 'unchanged': False, 'file_exists': True, + }, + ], + }, + }) + ctx = make_context(db=db, dry_run=True) + + job = LibraryReorganizeJob() + result = job.scan(ctx) + + assert result.findings_created == 2 + assert result.scanned == 2 + findings = ctx._captured_findings # type: ignore[attr-defined] + titles = {f['title'] for f in findings} + assert any('Track One' in t for t in titles) + assert any('Track Two' in t for t in titles) + + +def test_scan_skips_unchanged_tracks(make_context, monkeypatch): + """Pin: tracks with unchanged=True don't produce findings — they're + already at the right path.""" + db = _FakeDB([_make_album_row(id_='A1')]) + _stub_preview(monkeypatch, { + 'A1': { + 'success': True, 'status': 'planned', + 'source': 'spotify', + 'album': 'Album One', 'artist': 'Artist One', + 'tracks': [ + { + 'track_id': 't1', 'title': 'Track One', + 'current_path': 'right/path/01 - Track One.flac', + 'new_path': 'right/path/01 - Track One.flac', + 'matched': True, 'unchanged': True, 'file_exists': True, + }, + ], + }, + }) + ctx = make_context(db=db, dry_run=True) + + job = LibraryReorganizeJob() + result = job.scan(ctx) + + assert result.findings_created == 0 + assert result.scanned == 1 + + +def test_scan_skips_unmatched_tracks_within_planned_album(make_context, monkeypatch): + """Pin: tracks the planner couldn't match (matched=False) are + skipped — no path was computed for them, can't reorganize.""" + db = _FakeDB([_make_album_row(id_='A1')]) + _stub_preview(monkeypatch, { + 'A1': { + 'success': True, 'status': 'planned', + 'source': 'spotify', + 'album': 'Album One', 'artist': 'Artist One', + 'tracks': [ + { + 'track_id': 't1', 'title': 'Bonus Track', + 'current_path': 'old/path/12 - Bonus.flac', + 'new_path': '', + 'matched': False, 'unchanged': False, 'file_exists': True, + 'reason': 'No matching track in spotify tracklist', + }, + ], + }, + }) + ctx = make_context(db=db, dry_run=True) + + job = LibraryReorganizeJob() + result = job.scan(ctx) + + assert result.findings_created == 0 + assert result.scanned == 1 + + +def test_scan_skips_tracks_with_missing_files(make_context, monkeypatch): + """Pin: file_exists=False → not eligible for move (handled by the + Dead File Cleaner job instead). No path_mismatch finding emitted.""" + db = _FakeDB([_make_album_row(id_='A1')]) + _stub_preview(monkeypatch, { + 'A1': { + 'success': True, 'status': 'planned', + 'source': 'spotify', + 'album': 'Album One', 'artist': 'Artist One', + 'tracks': [ + { + 'track_id': 't1', 'title': 'Missing Track', + 'current_path': 'gone/01 - Missing.flac', + 'new_path': 'new/01 - Missing.flac', + 'matched': True, 'unchanged': False, 'file_exists': False, + }, + ], + }, + }) + ctx = make_context(db=db, dry_run=True) + + job = LibraryReorganizeJob() + result = job.scan(ctx) + + assert result.findings_created == 0 + + +def test_scan_emits_album_needs_enrichment_when_planner_returns_no_source_id(make_context, monkeypatch): + """Pin: planner returns status='no_source_id' → emit ONE + album-level finding ('needs enrichment') instead of N per-track + 'no source' findings (which would clutter the UI).""" + db = _FakeDB([_make_album_row(id_='A1', title='Unenriched Album')]) + _stub_preview(monkeypatch, { + 'A1': { + 'success': False, 'status': 'no_source_id', + 'source': None, + 'album': 'Unenriched Album', 'artist': 'Some Artist', + 'tracks': [ + {'track_id': 't1', 'title': 'Track 1', 'matched': False, 'reason': '...'}, + {'track_id': 't2', 'title': 'Track 2', 'matched': False, 'reason': '...'}, + ], + }, + }) + ctx = make_context(db=db, dry_run=True) + + job = LibraryReorganizeJob() + result = job.scan(ctx) + + findings = ctx._captured_findings # type: ignore[attr-defined] + assert result.findings_created == 1 + assert findings[0]['finding_type'] == 'album_needs_enrichment' + assert 'Unenriched Album' in findings[0]['title'] + + +def test_scan_skips_albums_planner_reports_as_no_album(make_context, monkeypatch): + """Pin: planner returns 'no_album' (race: album deleted between + SELECT and preview) → silently skipped, no finding, no error.""" + db = _FakeDB([_make_album_row(id_='A1')]) + _stub_preview(monkeypatch, { + 'A1': {'success': False, 'status': 'no_album', 'tracks': []}, + }) + ctx = make_context(db=db, dry_run=True) + + job = LibraryReorganizeJob() + result = job.scan(ctx) + + assert result.findings_created == 0 + assert result.errors == 0 + assert result.skipped == 1 + + +def test_scan_handles_preview_exceptions_gracefully(make_context, monkeypatch): + """Pin: preview raising for one album doesn't abort the whole + scan — counts as one error, continues to next album.""" + db = _FakeDB([ + _make_album_row(id_='A1'), + _make_album_row(id_='A2', title='Good Album'), + ]) + from core import library_reorganize as core_lr + + def _flaky(*, album_id, **kwargs): + if album_id == 'A1': + raise RuntimeError("preview boom") + return { + 'success': True, 'status': 'planned', + 'source': 'spotify', + 'album': 'Good Album', 'artist': 'Artist', + 'tracks': [ + { + 'track_id': 't1', 'title': 'Track', + 'current_path': 'old/01 - Track.flac', + 'new_path': 'new/01 - Track.flac', + 'matched': True, 'unchanged': False, 'file_exists': True, + }, + ], + } + monkeypatch.setattr(core_lr, 'preview_album_reorganize', _flaky) + + ctx = make_context(db=db, dry_run=True) + job = LibraryReorganizeJob() + result = job.scan(ctx) + + assert result.errors == 1 # A1 failed + assert result.findings_created == 1 # A2 succeeded + + +def test_scan_apply_mode_enqueues_albums_via_reorganize_queue(make_context, monkeypatch): + """Pin: dry_run=False → mismatched albums are bulk-enqueued via + ``core.reorganize_queue.get_queue().enqueue_many(...)``. Repair + job does NOT do file moves itself — delegates to the queue worker + which uses the same code path the per-album modal does.""" + db = _FakeDB([ + _make_album_row(id_='A1', title='First Album', artist_id=10, artist_name='A'), + _make_album_row(id_='A2', title='Second Album', artist_id=20, artist_name='B'), + ]) + _stub_preview(monkeypatch, { + 'A1': { + 'success': True, 'status': 'planned', + 'source': 'spotify', + 'album': 'First Album', 'artist': 'A', + 'tracks': [ + { + 'track_id': 't1', 'title': 'X', + 'current_path': 'old/01 - X.flac', 'new_path': 'new/01 - X.flac', + 'matched': True, 'unchanged': False, 'file_exists': True, + }, + ], + }, + 'A2': { + 'success': True, 'status': 'planned', + 'source': 'deezer', + 'album': 'Second Album', 'artist': 'B', + 'tracks': [ + { + 'track_id': 't2', 'title': 'Y', + 'current_path': 'old/01 - Y.flac', 'new_path': 'new/01 - Y.flac', + 'matched': True, 'unchanged': False, 'file_exists': True, + }, + ], + }, + }) + + enqueue_calls = [] + + class _StubQueue: + def enqueue_many(self, items): + enqueue_calls.append(items) + # Match the real queue's return shape: + # {'enqueued': N, 'already_queued': M, 'total': K} + return {'enqueued': len(items), 'already_queued': 0, 'total': len(items)} + + import core.reorganize_queue as queue_mod + monkeypatch.setattr(queue_mod, 'get_queue', lambda: _StubQueue()) + + ctx = make_context(db=db, dry_run=False) + job = LibraryReorganizeJob() + result = job.scan(ctx) + + assert len(enqueue_calls) == 1 + queued = enqueue_calls[0] + assert {q['album_id'] for q in queued} == {'A1', 'A2'} + assert {q['source'] for q in queued} == {'spotify', 'deezer'} + assert result.auto_fixed == 2 + # Apply mode does NOT emit findings — it enqueues for actual move. + assert result.findings_created == 0 + + +def test_scan_only_iterates_albums_for_active_server(make_context, monkeypatch): + """Pin: multi-server users (Plex + Jellyfin etc) — the job only + iterates albums on the ACTIVE server. Inactive server's rows are + skipped so we don't move files at paths the user can't see in + the artist-detail UI.""" + db = _FakeDB(rows_by_server={ + 'plex': [_make_album_row(id_='plex_album_1', title='Plex-only Album')], + 'jellyfin': [ + _make_album_row(id_='jelly_album_1', title='Jelly Album 1'), + _make_album_row(id_='jelly_album_2', title='Jelly Album 2'), + ], + }) + cm = _FakeConfigManager(active_server='jellyfin') + ctx = make_context(db=db, cm=cm) + + seen_album_ids = [] - assert result == {('artist', 'album'): '1999'} - assert deezer_client.calls == [('Artist Album', 3)] - assert spotify_client.calls == [('Artist Album', 3)] + from core import library_reorganize as core_lr + def _track_calls(*, album_id, **kwargs): + seen_album_ids.append(album_id) + return {'success': False, 'status': 'no_album', 'tracks': []} + monkeypatch.setattr(core_lr, 'preview_album_reorganize', _track_calls) -def test_lookup_years_rechecks_client_availability_per_album(monkeypatch): - availability = {'spotify': True} + job = LibraryReorganizeJob() + job.scan(ctx) - class _SpotifyClient(_FakeSearchClient): - def search_albums(self, query, limit=3): - self.calls.append((query, limit)) - availability['spotify'] = False - return [] + # Only Jellyfin's two albums were processed; the Plex album was + # filtered out by the SQL active-server clause. + assert sorted(seen_album_ids) == ['jelly_album_1', 'jelly_album_2'] - spotify_client = _SpotifyClient('spotify', None) - itunes_client = _FakeSearchClient('itunes', '2002') - helper_calls = [] - def fake_get_client_for_source(source): - helper_calls.append(source) - if source == 'spotify' and not availability['spotify']: - return None - return {'spotify': spotify_client, 'itunes': itunes_client}.get(source) +def test_estimate_scope_returns_album_count(monkeypatch): + """Pin: ``estimate_scope`` returns the DB album count (matches + what scan iterates over).""" + cursor = MagicMock() + cursor.fetchone.return_value = (42,) + conn = MagicMock() + conn.cursor.return_value = cursor - monkeypatch.setattr(lr, 'get_primary_source', lambda: 'spotify') - monkeypatch.setattr(lr, 'get_source_priority', lambda primary: [primary, 'itunes']) - monkeypatch.setattr(lr, 'get_client_for_source', fake_get_client_for_source) + db = MagicMock() + db._get_connection.return_value = conn - job = lr.LibraryReorganizeJob() - result = job._lookup_years_from_api( - SimpleNamespace(report_progress=None, check_stop=lambda: False, sleep_or_stop=lambda *_: False), - {('Artist A', 'Album A'), ('Artist B', 'Album B')}, + cm = _FakeConfigManager() + ctx = JobContext( + db=db, transfer_folder='/tmp', config_manager=cm, ) - assert result == {('artist a', 'album a'): '2002', ('artist b', 'album b'): '2002'} - assert helper_calls.count('spotify') == 2 - assert helper_calls.count('itunes') == 2 - assert len(spotify_client.calls) == 1 - assert spotify_client.calls[0] in [('Artist A Album A', 3), ('Artist B Album B', 3)] - assert set(itunes_client.calls) == {('Artist A Album A', 3), ('Artist B Album B', 3)} + job = LibraryReorganizeJob() + assert job.estimate_scope(ctx) == 42 diff --git a/webui/static/helper.js b/webui/static/helper.js index 1a99afed..9cd40428 100644 --- a/webui/static/helper.js +++ b/webui/static/helper.js @@ -3432,6 +3432,7 @@ const WHATS_NEW = { '2.4.2': [ // --- post-2.4.1 dev work — entries hidden by _getLatestWhatsNewVersion until the build version bumps --- { date: 'Unreleased — 2.4.2 dev cycle' }, + { title: 'Fix: Library Reorganize Job Misclassified Album Tracks As Singles', desc: 'github issue #500 (bafoed): library reorganize repair job moved tracks like `Surf Curse/Surf Curse - Nothing Yet (2017)/01 - Christine F.flac` to single-template paths like `Surf Curse/Surf Curse - Christine F/Surf Curse - Christine F.flac`. root cause: the job used `is_album = (group_size > 1)` where `group_size` was the count of tracks for the same album currently sitting in the transfer folder being scanned — when only one track of an album was in transfer (rest already moved to library, or album tags varied across tracks like "Buds" vs "Buds (Bonus)"), each track became a 1-element group → all routed through single template. fix: rewrote the job to delegate to the per-album planner (`core.library_reorganize.preview_album_reorganize` / `reorganize_queue`) — the same planner the artist-detail "reorganize" modal uses. db-driven: the planner knows the album has multiple tracks regardless of how many sit in the transfer folder, so the album-vs-single classification is structurally correct. apply mode delegates to the existing reorganize queue → file move + post-processing + db update + sidecar handling all flow through one code path. only iterates albums for the ACTIVE media server (matches the artist-detail modal\'s scope) — multi-server users (plex + jellyfin etc) won\'t accidentally have the job touch the inactive server\'s files. albums missing a metadata source id get a single "needs enrichment first" finding instead of n per-track "no source" findings. dropped ~500 loc of tag-reading + transfer-walk + template logic that was duplicated against the per-album path. files in transfer with no db entry are now exclusively the orphan_file_detector\'s domain (clean separation). 12 tests pin the delegation contract.', page: 'library' }, { title: 'Fix: Enrich Honors Manual Album Matches', desc: 'github issue #501 (tacobell444): if you manually matched an album to a specific source ID via the match-chip UI, then clicked "enrich" on that album, the worker would search by name and overwrite your manual match with whatever the search returned (or revert status to "not_found" if it found nothing). reorganize then read the now-wrong id and moved files to the wrong destination. fix: extracted a shared `core/enrichment/manual_match_honoring.py` helper. every per-source enrichment worker (spotify / itunes / deezer / tidal / qobuz) now reads its stored id column at the top of `_process_*_individual` — if present, it fetches via `client.get_album(stored_id)` directly and refreshes metadata without touching the id. fuzzy name search only runs as fallback for never-matched entities. discogs / audiodb / musicbrainz already had inline stored-id fast paths and are left alone. lastfm / genius are name-based and don\'t store ids. cin-shape lift: same fix in 5 workers gets exactly one helper, per-worker variability (column name, client method, response shape) plugs in via callbacks. 11 new helper tests pin: stored-id fast-path, no-id fallthrough, fetch-failure fallthrough, table/column whitelist, callback contract.', page: 'library' }, { title: 'Fix: "no such table: hifi_instances" When Adding HiFi Instance', desc: 'github issue #503 (hadshaw21): adding a hifi instance via downloader settings popped up `no such table: hifi_instances` even though the connection test and "check all instances" both worked. root cause: `_initialize_database` runs every CREATE TABLE + every migration step inside one sqlite transaction. python\'s sqlite3 module doesn\'t autocommit DDL by default, so if any later migration step throws on a user\'s specific DB shape (e.g. an old volume from a prior soulsync version with quirky schema state), the WHOLE batch rolls back — including the hifi_instances CREATE that ran successfully. user\'s next boot retries init, hits the same migration failure, rolls back again. table never lands. fix: defensive lazy-create. every hifi_instances CRUD method now runs `CREATE TABLE IF NOT EXISTS hifi_instances (...)` immediately before its operation. idempotent — costs one PRAGMA-level no-op when the table is already present, fully recovers from a broken init. read methods (`get_hifi_instances`, `get_all_hifi_instances`) now return empty instead of raising when init failed. write methods (`add`, `remove`, `toggle`, `reorder`, `seed`) work end-to-end. doesn\'t paper over the underlying init issue (still worth tracking down which migration breaks for which users) but makes hifi instance management self-healing. 7 new tests pin the lazy-create behavior — every method works against a DB that\'s missing the table.', page: 'settings' }, { title: 'Plex: "All Libraries (Combined)" Mode', desc: 'github issue #505 (popebruhlxix): users with multiple plex music libraries (e.g. one per plex home user) only saw one library inside soulsync because the connection settings forced you to pick a single library section. now there\'s a new "all libraries (combined)" option in settings → connections → plex → music library dropdown. picking it flips the plex client into a server-wide read mode where every read method (`get_all_artists` / `get_all_album_ids` / `search_tracks` / `get_library_stats` / etc) dispatches through `server.library.search(libtype=...)` instead of querying a single section. one api call, plex handles the aggregation. cross-section dedup applied at the listing layer — same-name artists across sections collapse to a canonical entry (the one with more tracks), so plex home families with overlapping music tastes don\'t see "drake" twice. removal-detection id enumeration stays raw on purpose — deduping there would falsely prune tracks linked to non-canonical ratingKeys. write methods (genre / poster / metadata updates) are unaffected and operate on plex objects via ratingKey directly — write-back targets one section\'s copy of an artist if it exists in multiple, document and revisit if it matters. trigger_library_scan + is_library_scanning fan out across every music section in the new mode. backward compatible — existing users with a real library name saved see no behavior change. the "all libraries" option only appears in the dropdown when more than one music library exists on the server. 29 new tests pin both modes (single-section preserved, all-libraries dispatches through server-wide search, dedup keeps canonical, id enumeration stays raw).', page: 'settings' }, @@ -3779,6 +3780,20 @@ const WHATS_NEW = { // Section shape: { title, description, features: [bullet strings], // usage_note?: 'optional hint shown at the bottom' } const VERSION_MODAL_SECTIONS = [ + { + title: "Library Reorganize No Longer Mistakes Album Tracks for Singles", + description: "github issue #500 (bafoed): library reorganize repair job was moving album tracks like `01 - Christine F.flac` to single-template paths because of a fragile classification heuristic.", + features: [ + "• pre-rewrite the job had its own tag-reading + transfer-folder walk + template logic — used `is_album = (group_size > 1)` where group_size was the count of same-album tracks in the transfer folder being scanned", + "• when only one track of an album sat in transfer (rest already moved, or album tags varied slightly like \"Buds\" vs \"Buds (Bonus)\") → group size 1 → routed to single template → wrong destination", + "• fix: delegate to the per-album planner the artist-detail \"reorganize\" modal already uses — db-driven, knows the album has n tracks regardless of how many currently sit in transfer", + "• only iterates albums on the ACTIVE media server (matches what the artist-detail modal sees) — multi-server users (plex + jellyfin etc) won\'t accidentally have the job touch the inactive server\'s files", + "• apply mode dispatches to the existing reorganize queue → one code path for file move + post-processing + db update + sidecar", + "• albums missing a metadata source id get a single \"needs enrichment first\" finding instead of n per-track \"no source\" findings cluttering the ui", + "• dropped ~500 loc that was duplicated against the per-album logic — files in transfer with no db entry are now exclusively the orphan file detector\'s domain", + ], + usage_note: "no settings to change — applies on next library reorganize repair job run", + }, { title: "Enrich Now Honors Manual Album Matches", description: "github issue #501 (tacobell444): manually matching an album then clicking enrich would overwrite your manual match with whatever the worker\'s name-search returned, or revert status to \"not found\". reorganize then read the wrong id and moved files to the wrong destination.",