Retag now re-embeds LYRICS tag instead of leaving it empty

Discord report (netti93). The download flow runs `enhance_file_metadata` (clears all tags) then `generate_lrc_file` (writes .lrc sidecar AND embeds USLT). The retag flow only ran the first half — `enhance_file_metadata` cleared USLT and there was no follow-up to restore it. Two coordinated fixes (no new setting per kettui scope discipline — user described it as "might even be an idea," consistency was the load-bearing ask). Fix 1 — retag calls generate_lrc_file after enhance `core/library/retag.py:execute_retag` now invokes `deps.generate_lrc_file` right after the `enhance_file_metadata` call, mirroring the download pipeline. New `generate_lrc_file` field on `RetagDeps`, defaults to None for backward compat with any test caller that builds RetagDeps without it. Web_server's `_build_retag_deps()` factory wires in the real `core.metadata.lyrics.generate_lrc_file`. Placement matters — runs BEFORE `safe_move_file` so the helper sees the audio file at its current path with its existing sidecar (which retag hasn't moved yet). After the embed, the audio file gets moved with USLT now present; the sidecar move step that follows is unaffected. Fix 2 — create_lrc_file re-embeds from existing sidecar `core/lyrics_client.py:create_lrc_file` used to early-return True when an .lrc / .txt sidecar already existed (skipping the LRClib fetch). For the retag case the sidecar is already there, so the shortcut hit and USLT was never re-written. Now the helper reads the existing sidecar and calls `_embed_lyrics` with its content before returning. Empty / unreadable sidecars short-circuit silently — defensive, no crash. Download flow unaffected because no sidecar exists at fetch time. 7 boundary tests pin: existing .lrc triggers re-embed, existing .txt triggers re-embed, empty sidecar skips embed, unreadable sidecar swallows error, no sidecar falls through to LRClib (download path regression guard), RetagDeps.generate_lrc_file field accepted, field optional for backward compat. Full suite: 3120 passed.
22 hours ago · 2f284efa57
parent 853c32bdb2
commit 2f284efa57
5 changed files with 268 additions and 3 deletions
--- a/core/library/retag.py
+++ b/core/library/retag.py
@ -37,7 +37,7 @@ import os
 import traceback
 from dataclasses import dataclass
 from difflib import SequenceMatcher
-from typing import Any, Callable
+from typing import Any, Callable, Optional

 logger = logging.getLogger(__name__)

@ -62,6 +62,13 @@ class RetagDeps:
    _get_retag_state: Callable[[], dict]
    _set_retag_state: Callable[[dict], None]
    get_database: Callable[[], Any]
+    # Discord report (Netti93) — retag was clearing the LYRICS / USLT
+    # tag without rewriting it, while the download pipeline calls
+    # `generate_lrc_file` after enrichment to refetch + embed lyrics.
+    # Injected here so retag mirrors the same post-enrichment step.
+    # Optional for backward compat with any test caller that builds
+    # RetagDeps without the new field — empty default no-ops the call.
+    generate_lrc_file: Optional[Callable] = None

    @property
    def retag_state(self) -> dict:
@ -230,6 +237,20 @@ def execute_retag(group_id, album_id, deps: RetagDeps):
            except Exception as meta_err:
                logger.error(f"[Retag] Metadata write failed for '{track_title}': {meta_err}")

+            # Discord report (Netti93) — `enhance_file_metadata` clears
+            # ALL tags (incl. USLT lyrics) and rewrites only the source
+            # metadata. The download pipeline calls `generate_lrc_file`
+            # after enrichment to refetch + embed lyrics — retag was
+            # missing that step and dropped the LYRICS tag with no
+            # rewrite. Mirroring the download path's post-enrichment
+            # step. Same args, same `lrclib_enabled` config gate, same
+            # idempotency (skip when sidecar already present).
+            if deps.generate_lrc_file:
+                try:
+                    deps.generate_lrc_file(current_file_path, context, new_artist, album_info)
+                except Exception as lrc_err:
+                    logger.debug("[Retag] generate_lrc_file failed for '%s': %s", track_title, lrc_err)
+
            # Compute new path and move if different
            file_ext = os.path.splitext(current_file_path)[1]
            try:
--- a/core/lyrics_client.py
+++ b/core/lyrics_client.py
@ -52,9 +52,29 @@ class LyricsClient:
            lrc_path = os.path.splitext(audio_file_path)[0] + '.lrc'
            txt_path = os.path.splitext(audio_file_path)[0] + '.txt'

-            # Skip if lyrics file already exists (either .lrc or .txt)
+            # Sidecar already exists — skip the LRClib fetch but still
+            # re-embed the lyrics in the audio file's tag. The retag
+            # flow clears all tags including USLT and then runs this
+            # helper to restore them; without the embed step the
+            # LYRICS tag stays empty even though the .lrc is right
+            # there next to the file (Discord report — Netti93).
            if os.path.exists(lrc_path) or os.path.exists(txt_path):
-                logger.debug(f"Lyrics file already exists for: {os.path.basename(audio_file_path)}")
+                existing_path = lrc_path if os.path.exists(lrc_path) else txt_path
+                try:
+                    with open(existing_path, 'r', encoding='utf-8') as f:
+                        existing_lyrics = f.read().strip()
+                    if existing_lyrics:
+                        self._embed_lyrics(audio_file_path, existing_lyrics)
+                        logger.debug(
+                            "Re-embedded lyrics from existing %s for: %s",
+                            os.path.basename(existing_path),
+                            os.path.basename(audio_file_path),
+                        )
+                except Exception as e:
+                    logger.debug(
+                        "Could not re-embed lyrics from existing sidecar %s: %s",
+                        os.path.basename(existing_path), e,
+                    )
                return True

            # Fetch lyrics from LRClib
--- a/tests/test_lyrics_reembed_from_sidecar.py
+++ b/tests/test_lyrics_reembed_from_sidecar.py
@ -0,0 +1,220 @@
+"""Tests for re-embedding lyrics from an existing sidecar file.
+
+Discord report (Netti93): retag was clearing the LYRICS / USLT tag
+without rewriting it. Cause was two-fold:
+
+1. `core/library/retag.py:execute_retag` never called
+   `generate_lrc_file` after `enhance_file_metadata`. The download
+   pipeline does — retag was inconsistent.
+2. Even with the call added, `lyrics_client.create_lrc_file` used to
+   short-circuit when an .lrc / .txt sidecar already existed (the
+   typical retag case — sidecar moved alongside the audio file).
+   Pre-fix: returned True without re-embedding USLT. Post-fix: reads
+   the existing sidecar and re-embeds the USLT tag.
+"""
+
+from __future__ import annotations
+
+import os
+import tempfile
+from unittest.mock import patch, MagicMock
+
+import pytest
+
+
+@pytest.fixture
+def fake_audio_file(tmp_path):
+    """Build a minimal FLAC file with no LYRICS tag."""
+    fd, path = tempfile.mkstemp(suffix='.flac', dir=str(tmp_path))
+    os.close(fd)
+    minimal = (
+        b'fLaC'
+        + b'\x80\x00\x00\x22'
+        + b'\x00\x10\x00\x10'
+        + b'\x00\x00\x00\x00\x00\x00'
+        + b'\x0a\xc4\x42\xf0\x00\x00\x00\x00'
+        + b'\x00' * 16
+    )
+    with open(path, 'wb') as f:
+        f.write(minimal)
+    yield path
+
+
+# ──────────────────────────────────────────────────────────────────────
+# create_lrc_file — re-embed when sidecar present
+# ──────────────────────────────────────────────────────────────────────
+
+def test_existing_lrc_sidecar_triggers_reembed(fake_audio_file):
+    """The exact retag scenario — sidecar already exists alongside the
+    audio file (moved during retag), USLT got cleared by enrichment.
+    Helper should read the sidecar and re-embed without hitting LRClib."""
+    from core.lyrics_client import LyricsClient
+
+    sidecar_path = os.path.splitext(fake_audio_file)[0] + '.lrc'
+    with open(sidecar_path, 'w', encoding='utf-8') as f:
+        f.write('[00:01.00]Test lyric line\n[00:05.00]Second line')
+
+    client = LyricsClient()
+    client.api = MagicMock()  # API stub — should NOT be called
+    client._embed_lyrics = MagicMock()
+
+    result = client.create_lrc_file(
+        audio_file_path=fake_audio_file,
+        track_name='Test',
+        artist_name='Artist',
+    )
+
+    assert result is True
+    # API never hit — sidecar shortcut
+    client.api.get_lyrics.assert_not_called()
+    client.api.search_lyrics.assert_not_called()
+    # USLT was re-embedded
+    client._embed_lyrics.assert_called_once()
+    call_args = client._embed_lyrics.call_args
+    assert call_args.args[0] == fake_audio_file
+    assert 'Test lyric line' in call_args.args[1]
+
+
+def test_existing_txt_sidecar_also_triggers_reembed(fake_audio_file):
+    """Same shape with .txt sidecar (plain lyrics, no timestamps)."""
+    from core.lyrics_client import LyricsClient
+
+    sidecar_path = os.path.splitext(fake_audio_file)[0] + '.txt'
+    with open(sidecar_path, 'w', encoding='utf-8') as f:
+        f.write('Just plain lyrics no timestamps')
+
+    client = LyricsClient()
+    client.api = MagicMock()
+    client._embed_lyrics = MagicMock()
+
+    result = client.create_lrc_file(
+        audio_file_path=fake_audio_file,
+        track_name='T', artist_name='A',
+    )
+
+    assert result is True
+    client._embed_lyrics.assert_called_once_with(
+        fake_audio_file, 'Just plain lyrics no timestamps'
+    )
+
+
+def test_empty_sidecar_does_not_embed(fake_audio_file):
+    """Defensive — if the sidecar exists but is empty, don't write an
+    empty USLT tag."""
+    from core.lyrics_client import LyricsClient
+
+    sidecar_path = os.path.splitext(fake_audio_file)[0] + '.lrc'
+    with open(sidecar_path, 'w', encoding='utf-8') as f:
+        f.write('   \n  ')  # whitespace only
+
+    client = LyricsClient()
+    client.api = MagicMock()
+    client._embed_lyrics = MagicMock()
+
+    result = client.create_lrc_file(
+        audio_file_path=fake_audio_file,
+        track_name='T', artist_name='A',
+    )
+
+    assert result is True
+    client._embed_lyrics.assert_not_called()
+
+
+def test_unreadable_sidecar_swallows_error_returns_true(fake_audio_file):
+    """If the sidecar is somehow unreadable, return True (don't try
+    LRClib again — the early-return contract holds), just skip the
+    embed silently."""
+    from core.lyrics_client import LyricsClient
+
+    sidecar_path = os.path.splitext(fake_audio_file)[0] + '.lrc'
+    with open(sidecar_path, 'wb') as f:
+        f.write(b'\xff\xfe\x00\x00')  # invalid UTF-8
+
+    client = LyricsClient()
+    client.api = MagicMock()
+    client._embed_lyrics = MagicMock()
+
+    result = client.create_lrc_file(
+        audio_file_path=fake_audio_file,
+        track_name='T', artist_name='A',
+    )
+
+    assert result is True
+    client.api.get_lyrics.assert_not_called()
+
+
+def test_no_sidecar_falls_through_to_lrclib(fake_audio_file):
+    """No sidecar → original LRClib fetch path runs (download flow)."""
+    from core.lyrics_client import LyricsClient
+
+    client = LyricsClient()
+    fake_lyrics = MagicMock()
+    fake_lyrics.synced_lyrics = '[00:01.00]synced from api'
+    fake_lyrics.plain_lyrics = None
+    client.api = MagicMock()
+    client.api.get_lyrics.return_value = None
+    client.api.search_lyrics.return_value = [fake_lyrics]
+    client._embed_lyrics = MagicMock()
+
+    result = client.create_lrc_file(
+        audio_file_path=fake_audio_file,
+        track_name='T', artist_name='A',
+    )
+
+    assert result is True
+    client.api.search_lyrics.assert_called_once()
+    # Sidecar was created
+    lrc = os.path.splitext(fake_audio_file)[0] + '.lrc'
+    assert os.path.exists(lrc)
+    # And USLT was embedded
+    client._embed_lyrics.assert_called_once()
+
+
+# ──────────────────────────────────────────────────────────────────────
+# RetagDeps integration — generate_lrc_file is now wired
+# ──────────────────────────────────────────────────────────────────────
+
+def test_retagdeps_accepts_generate_lrc_file_field():
+    from core.library.retag import RetagDeps
+
+    # Mock the required + optional deps with do-nothing callables
+    deps = RetagDeps(
+        config_manager=MagicMock(),
+        retag_lock=MagicMock(),
+        spotify_client=MagicMock(),
+        get_audio_quality_string=lambda *a: '',
+        enhance_file_metadata=lambda *a: True,
+        build_final_path_for_track=lambda *a: ('', ''),
+        safe_move_file=lambda *a: None,
+        cleanup_empty_directories=lambda *a: None,
+        download_cover_art=lambda *a: None,
+        docker_resolve_path=lambda x: x,
+        _get_retag_state=lambda: {},
+        _set_retag_state=lambda v: None,
+        get_database=lambda: MagicMock(),
+        generate_lrc_file=lambda *a: True,
+    )
+    assert callable(deps.generate_lrc_file)
+
+
+def test_retagdeps_generate_lrc_file_optional_for_backward_compat():
+    """Tests that built RetagDeps without the new field don't break."""
+    from core.library.retag import RetagDeps
+
+    deps = RetagDeps(
+        config_manager=MagicMock(),
+        retag_lock=MagicMock(),
+        spotify_client=MagicMock(),
+        get_audio_quality_string=lambda *a: '',
+        enhance_file_metadata=lambda *a: True,
+        build_final_path_for_track=lambda *a: ('', ''),
+        safe_move_file=lambda *a: None,
+        cleanup_empty_directories=lambda *a: None,
+        download_cover_art=lambda *a: None,
+        docker_resolve_path=lambda x: x,
+        _get_retag_state=lambda: {},
+        _set_retag_state=lambda v: None,
+        get_database=lambda: MagicMock(),
+    )
+    # Field defaults to None — no crash on construction.
+    assert deps.generate_lrc_file is None
--- a/web_server.py
+++ b/web_server.py
@ -15164,6 +15164,8 @@ def _build_retag_deps():
        global retag_state
        retag_state = value

+    from core.metadata.lyrics import generate_lrc_file as _generate_lrc_file
+
    return _library_retag.RetagDeps(
        config_manager=config_manager,
        retag_lock=retag_lock,
@ -15178,6 +15180,7 @@ def _build_retag_deps():
        _get_retag_state=_get_state,
        _set_retag_state=_set_state,
        get_database=_get_db,
+        generate_lrc_file=_generate_lrc_file,
    )


--- a/webui/static/helper.js
+++ b/webui/static/helper.js
@ -3416,6 +3416,7 @@ const WHATS_NEW = {
    '2.5.2': [
        // --- May 13, 2026 — 2.5.2 release ---
        { date: 'May 13, 2026 — 2.5.2 release' },
+        { title: 'Retag No Longer Strips LYRICS Tag Without Rewriting', desc: 'discord report (netti93): retag tool was clearing the LYRICS / USLT tag and never rewriting it, while the download flow correctly embeds lyrics. asymmetry trace: download pipeline (`core/imports/pipeline.py`) calls `enhance_file_metadata` (clears all tags) then `generate_lrc_file` (writes .lrc sidecar + embeds USLT). retag (`core/library/retag.py`) only called the first half — `enhance_file_metadata` cleared USLT and there was no follow-up to restore it. fix 1: retag now calls `generate_lrc_file` after `enhance_file_metadata`, mirroring the download flow. injectable via `RetagDeps.generate_lrc_file` (optional default for backward compat). fix 2: `lyrics_client.create_lrc_file` used to short-circuit when an .lrc/.txt sidecar already existed (the typical retag case — sidecar moved alongside the audio). pre-fix: returned True without re-embedding USLT. post-fix: reads the existing sidecar and re-embeds the USLT tag. download flow unaffected (no sidecar at fetch time → original LRClib path runs). 7 boundary tests pin: existing .lrc triggers re-embed, existing .txt triggers re-embed, empty sidecar skips embed, unreadable sidecar swallows error, no sidecar falls through to LRClib (download path), `RetagDeps.generate_lrc_file` field accepted + optional for backward compat.', page: 'tools' },
        { title: 'Track Number Tag No Longer Writes "6/0" When Album Total Is Unknown', desc: 'discord report (netti93): downloaded album tracks were tagged with `TRCK = "6/0"` instead of `"6/13"` when source data lacked total_tracks. retag tool wrote correct `"6/13"` because `core/tag_writer.py` already handled the case. trace: `core/metadata/enrichment.py:105` formatted unconditionally as `f"{track_number}/{total_tracks}"` and many album-dict construction sites pass `total_tracks: 0` (per `types.py`, 0 means "unknown" — not a real count). that 0 propagated straight to disk. fix at the consumer boundary so every album-dict constructor stays unchanged: lifted to pure helper `core/metadata/track_number_format.py:format_track_number_tag` that drops the `/N` suffix when total is 0 / None / negative — emits just `"6"` instead. matches retag\'s behavior + ID3 spec convention (TRCK can be `"N"` or `"N/M"`). MP4 trkn tuple gets the same treatment via `format_track_number_tuple` returning `(6, 0)` per spec\'s "unknown total" marker. 16 boundary tests pin every shape: known total / zero total / none total / none track / zero track / negative inputs / string coercion / unparseable strings / floats truncate.', page: 'tools' },
        { title: 'AcoustID Scanner: Multi-Candidate Match + Duration Guard + Multi-Value Retag', desc: 'discord report (foxxify) issue #587: scanner produced false-positive "Wrong Song" findings for tracks where AcoustID returned multiple recordings per fingerprint and the top match was a wrong-credited recording (different MB entry sharing the same fingerprint). also: applying an AcoustID match retag stripped multi-value ARTISTS tags. three coordinated fixes per codex diagnosis. fix 1: scanner now iterates ALL AcoustID candidates (not just `recordings[0]`) — if any candidate matches expected title + artist, no finding. lifted to a shared pure helper `core/matching/acoustid_candidates.py:find_matching_recording` so both verifier and scanner use the same logic. fix 2: duration guard catches fingerprint hash collisions (foxxify\'s 17-minute mashup edit fingerprinted to a 5-minute late-70s japanese hiphop track — different songs, same fingerprint hash collision on a sampled section). when file duration vs AcoustID candidate duration differs by more than max(60s, 35%), the scanner skips the finding. fix 3: scanner retag (`_fix_wrong_song`) was bypassing the user\'s `metadata_enhancement.tags.write_multi_artist` setting because `write_tags_to_file` only wrote single-string TPE1. now extended with optional `artists_list` parameter that, when supplied + setting on, writes the multi-value tag (TXXX:Artists for ID3, `artists` key for vorbis/opus/flac, list-form `\xa9ART` for mp4) — exact same behavior as the post-download enrichment pipeline. AcoustID retag derives the per-artist list by splitting AcoustID\'s credit on the same separators (comma / ampersand / feat. / ft. / etc) the matching layer already uses. AcoustID version-mismatch gate left intact (still correctly catches genuinely-wrong files). 15 tests on the candidate helper + duration guard, 13 tests on the multi-value tag write path, 4 new scanner regression tests pinning every shape: lower-ranked candidate match suppression, no-suppression when no candidate matches, duration mismatch skip, no-skip when duration matches.', page: 'tools' },
        { title: 'Cross-Script Artist Aliases: Cyrillic / Kanji Canonical Names Now Bridge', desc: 'github issue #586 (follow-up to #442): "Dmitry Yablonsky" tracks were quarantining as audio mismatch — file identified as "Русская филармония, Дмитрий Яблонский" (4% artist similarity) — even though the Cyrillic spelling is just the russian transliteration. three layered bugs in the alias resolution chain. fix 1: `fetch_artist_aliases` only read `data.aliases` and IGNORED the artist record\'s canonical `name` and `sort-name`. for artists where the canonical form is the cross-script spelling (e.g. MB stores "Дмитрий Яблонский" as canonical, "Dmitry Yablonsky" as alias — or vice versa), the missing direction never made it into the alias list. now both canonical name + sort-name are included alongside the explicit aliases (deduped). fix 2: `lookup_artist_aliases` ran search in strict mode only (`artist:"..."` lucene query), which skips MB\'s alias and sortname indexes. cross-script searches found nothing under strict. now falls back to non-strict (bare query, hits all indexes) when strict returns empty OR all results fail the trust gate. fix 3: trust gate weighted local similarity 70% — cross-script pairs have similarity ~0 → combined score ~0.30 → below the 0.85 threshold → cached as empty even when MB\'s own confidence was 100. new escape: when MB score is ≥ 95 AND the result is unambiguous (top result clearly leads), accept regardless of local similarity. covers cases where MB definitely knows the right artist but our local sim collapses to zero. existing #442 same-script path (Hiroyuki Sawano ↔ 澤野弘之) still passes via combined-score path. 12 new tests pin every layer + the exact reporter scenario end-to-end via `artist_names_match`. existing alias tests updated to reflect canonical-name inclusion + 2-call strict+non-strict pattern.', page: 'downloads' },