Fix: ReplayGain wrote same +52 dB gain to every track

User report: every downloaded track in an album came out with ``replaygain_track_gain: +52.00 dB`` regardless of actual loudness. Root cause: the parser at ``core/replaygain.py:79`` used ``re.search('I:\s+...')`` which returns the FIRST match. ffmpeg's ebur128 filter emits ``I:`` per measurement window (running partial integrated loudness) AND in a final Summary block. The first per-window reading is at t=0.5s — almost always ~-70 LUFS because nearly every track starts with silence / encoder padding. So: gain = RG2_reference - lufs = -18 - (-70) = +52.00 dB …on EVERY track. Same regex pattern, same first per-window match, same +52 dB written to every file's REPLAYGAIN_TRACK_GAIN tag. Verified by running ffmpeg ebur128 against a real generated FLAC and inspecting the stderr output — first per-window line at t=0.5s shows ``I: -70.0 LUFS`` (silent intro), and the Summary block at the end shows the real integrated value (e.g. ``I: -27.8 LUFS`` for the test sine wave). Old code captured the -70.0 reading. Fix: anchor LUFS parsing to the ``Summary:`` block via ``stderr.rfind('Summary:')``. The Summary block is always emitted last and contains the authoritative final integrated loudness. Peak parsing already worked correctly (per-window output uses ``TPK:``/``FTPK:`` labels; only the Summary uses ``Peak:``), but applied the same Summary anchor for consistency. Defensive fallback: if no Summary block is present (truncated output / unusual ffmpeg version), use the LAST per-window reading instead of the first. Still better than the buggy first-window behavior. Smoke verified end-to-end: a freshly-generated FLAC of a -24 dBFS sine wave now reports LUFS=-27.80, gain=+9.80 dB (correct, was +52.00 before fix). Tests: ``tests/test_replaygain_summary_parse.py`` — 7 cases pinning the parser behavior with realistic ffmpeg ebur128 stderr samples: - Summary value parsed correctly even when first per-window is -70 - Resulting gain is realistic (NOT +52) - Two tracks with same first per-window but different summaries get different LUFS (regression assertion for "all tracks same gain") - Per-window reading higher than Summary doesn't leak through - Fallback to last per-window when Summary absent - Clean RuntimeError raised when no LUFS values anywhere - Peak still correctly anchored to Summary Verified: full suite 1800 pass (7 new), ruff clean. WHATS_NEW entry under '2.4.2' dev cycle.
3 weeks ago · 776d195f71
parent fbf4bad47a
commit 776d195f71
3 changed files with 236 additions and 7 deletions
--- a/core/replaygain.py
+++ b/core/replaygain.py
@ -75,18 +75,48 @@ def analyze_track(file_path: str) -> Tuple[float, float]:

    stderr = result.stderr

-    # Parse integrated loudness: "    I:         -18.3 LUFS"
-    lufs_match = re.search(r'I:\s+([-\d.]+)\s+LUFS', stderr)
-    # Parse true peak: "    Peak:\s+([-\d.]+) dBFS" (may appear per-channel; take max)
-    peak_matches = re.findall(r'Peak:\s+([-\d.]+)\s+dBFS', stderr)
-
-    if not lufs_match:
+    # ebur128 emits two kinds of output:
+    #   (a) Per-window progress lines like
+    #       "[Parsed_ebur128_0 @ ...] t: 0.5 ... I: -70.0 LUFS ..."
+    #       — the "I:" here is the PARTIAL integrated loudness up to that
+    #       window. The very first window of nearly any track is silent
+    #       (intro fade-in / encoder padding) and reads ~-70 LUFS.
+    #   (b) A final "Summary:" block like:
+    #         Summary:
+    #           Integrated loudness:
+    #             I:         -14.3 LUFS
+    #           ...
+    #           True peak:
+    #             Peak:        -0.4 dBFS
+    #
+    # The OLD code used ``re.search('I:\s+...')`` which returned the FIRST
+    # match — i.e. the first per-window partial reading. For nearly every
+    # track that came back as ~-70 LUFS, producing gain = -18 - (-70) =
+    # +52.00 dB on EVERY track. (User report: "all tracks seem to get
+    # the same track_gain ... +52.00 dB".)
+    #
+    # Fix: anchor parsing to the Summary block so we always read the
+    # final integrated value, never a per-window partial.
+    summary_idx = stderr.rfind('Summary:')
+    if summary_idx >= 0:
+        summary_block = stderr[summary_idx:]
+        lufs_values = re.findall(r'I:\s+([-\d.]+)\s+LUFS', summary_block)
+        peak_matches = re.findall(r'Peak:\s+([-\d.]+)\s+dBFS', summary_block)
+    else:
+        # No Summary block in the output (truncated / unexpected ffmpeg
+        # version). Defensive fallback to the LAST per-window reading,
+        # which is at least closer to the final integrated value than
+        # the first window (which is always silence).
+        lufs_values = re.findall(r'I:\s+([-\d.]+)\s+LUFS', stderr)
+        peak_matches = re.findall(r'Peak:\s+([-\d.]+)\s+dBFS', stderr)
+
+    if not lufs_values:
        raise RuntimeError(
            f"Could not parse ebur128 output for '{file_path}'. "
            f"FFmpeg exit code: {result.returncode}"
        )

-    integrated_lufs = float(lufs_match.group(1))
+    integrated_lufs = float(lufs_values[-1])

    if peak_matches:
        true_peak_dbfs = max(float(v) for v in peak_matches)
--- a/tests/test_replaygain_summary_parse.py
+++ b/tests/test_replaygain_summary_parse.py
@ -0,0 +1,198 @@
+"""Pin the ReplayGain analysis fix.
+
+User report: every track in a downloaded album got the same
+``replaygain_track_gain`` of ``+52.00 dB`` after post-processing.
+Smoking gun: ``-18 (RG2 reference) - (-70.0) = +52.00``. Every track's
+first ebur128 measurement window reads ~-70 LUFS because the first
+window covers the silent intro / encoder padding.
+
+The old code used ``re.search('I:\\s+...')`` which returns the FIRST
+match — capturing that initial -70 LUFS reading instead of the final
+integrated value from the Summary block.
+
+These tests use representative ffmpeg ebur128 output (per-window
+progress + final Summary block) to pin: parser anchors to the
+Summary, ignores per-window partials, and falls back gracefully when
+Summary is absent.
+"""
+
+from __future__ import annotations
+
+import re
+import subprocess
+from unittest.mock import patch
+
+import pytest
+
+from core.replaygain import analyze_track
+
+
+# ---------------------------------------------------------------------------
+# Fabricated ebur128 stderr samples
+# ---------------------------------------------------------------------------
+
+_REAL_EBUR128_STDERR = """
+[Parsed_ebur128_0 @ 0x000001] Summary:
+[Parsed_ebur128_0 @ 0x000001] t: 0.500000   TARGET:-23 LUFS    M: -70.0 S:-70.0     I: -70.0 LUFS       LRA:   0.0 LU  FTPK: -70.0 dBFS  TPK: -70.0 dBFS
+[Parsed_ebur128_0 @ 0x000001] t: 1.000000   TARGET:-23 LUFS    M: -50.0 S:-60.0     I: -60.0 LUFS       LRA:   0.0 LU  FTPK: -50.0 dBFS  TPK: -50.0 dBFS
+[Parsed_ebur128_0 @ 0x000001] t: 1.500000   TARGET:-23 LUFS    M: -20.0 S:-30.0     I: -25.0 LUFS       LRA:   0.0 LU  FTPK: -2.5 dBFS   TPK: -2.5 dBFS
+[Parsed_ebur128_0 @ 0x000001] t: 2.000000   TARGET:-23 LUFS    M: -18.0 S:-20.0     I: -14.5 LUFS       LRA:   1.5 LU  FTPK: -0.4 dBFS   TPK: -0.4 dBFS
+[Parsed_ebur128_0 @ 0x000001] Summary:
+
+  Integrated loudness:
+    I:         -14.3 LUFS
+    Threshold: -24.3 LUFS
+
+  Loudness range:
+    LRA:         3.2 LU
+    Threshold: -34.3 LUFS
+    LRA low:   -16.5 LUFS
+    LRA high:  -13.3 LUFS
+
+  True peak:
+    Peak:       -0.4 dBFS
+[out#0/null @ 0x000002] video:0KiB audio:172KiB
+"""
+
+
+def _stub_ffmpeg(stderr_output: str):
+    """Patch subprocess.run to return a fake ffmpeg result with the
+    given stderr."""
+    class _FakeResult:
+        def __init__(self):
+            self.stderr = stderr_output
+            self.returncode = 0
+    return patch.object(subprocess, 'run', return_value=_FakeResult())
+
+
+# ---------------------------------------------------------------------------
+# Headline regression: don't grab the first per-window reading
+# ---------------------------------------------------------------------------
+
+
+def test_parses_summary_lufs_not_first_per_window_reading():
+    """The per-window stream contains 'I: -70.0 LUFS' (silent intro)
+    BEFORE the Summary block's 'I: -14.3 LUFS'. Parser must return
+    -14.3 (summary), NOT -70.0 (first per-window).
+
+    This is the exact bug from the user's +52.00 dB report:
+    -18 RG2 reference - (-70.0) = +52.00 was the symptom."""
+    with _stub_ffmpeg(_REAL_EBUR128_STDERR):
+        lufs, peak = analyze_track('/fake/path.flac')
+
+    assert lufs == pytest.approx(-14.3, abs=0.01)
+    assert peak == pytest.approx(-0.4, abs=0.01)
+
+
+def test_resulting_gain_is_realistic_not_plus_52():
+    """Computed gain must be a normal real-world value (a few dB
+    range), NOT the symptomatic +52.00 dB the bug produced."""
+    with _stub_ffmpeg(_REAL_EBUR128_STDERR):
+        lufs, _peak = analyze_track('/fake/path.flac')
+    gain = -18.0 - lufs  # RG2 reference
+    assert -10.0 < gain < 10.0, f"Unrealistic gain {gain:+.2f} dB — bug regression"
+
+
+# ---------------------------------------------------------------------------
+# Different per-track values stay different
+# ---------------------------------------------------------------------------
+
+
+def _make_stderr(per_window_lufs: list[float], summary_lufs: float, summary_peak: float) -> str:
+    """Build an ebur128 stderr blob with controllable per-window and
+    summary values. Lets each test verify the summary is what gets
+    used regardless of what's in the per-window stream."""
+    per_window = '\n'.join(
+        f"[Parsed_ebur128_0 @ 0x1] t: {(i + 1) * 0.5:.6f}   TARGET:-23 LUFS    "
+        f"M: {lufs:.1f} S:{lufs:.1f}     I: {lufs:.1f} LUFS       LRA:   0.0 LU  "
+        f"FTPK: {lufs / 2:.1f} dBFS  TPK: {lufs / 2:.1f} dBFS"
+        for i, lufs in enumerate(per_window_lufs)
+    )
+    return f"""{per_window}
+[Parsed_ebur128_0 @ 0x1] Summary:
+
+  Integrated loudness:
+    I:         {summary_lufs:.1f} LUFS
+    Threshold: -24.0 LUFS
+
+  Loudness range:
+    LRA:         3.2 LU
+
+  True peak:
+    Peak:       {summary_peak:+.1f} dBFS
+"""
+
+
+def test_two_tracks_with_different_summaries_get_different_lufs():
+    """Two simulated tracks with the SAME first per-window value (-70)
+    but DIFFERENT summary integrated loudness values. Old buggy parser
+    would report -70 for both. Fixed parser correctly reports the
+    distinct summary values."""
+    track_a_stderr = _make_stderr([-70.0, -50.0, -20.0], summary_lufs=-14.3, summary_peak=-0.4)
+    track_b_stderr = _make_stderr([-70.0, -45.0, -10.0], summary_lufs=-7.8, summary_peak=-1.2)
+
+    with _stub_ffmpeg(track_a_stderr):
+        lufs_a, _ = analyze_track('/fake/a.flac')
+    with _stub_ffmpeg(track_b_stderr):
+        lufs_b, _ = analyze_track('/fake/b.flac')
+
+    assert lufs_a != lufs_b
+    assert lufs_a == pytest.approx(-14.3, abs=0.01)
+    assert lufs_b == pytest.approx(-7.8, abs=0.01)
+
+
+def test_per_window_lufs_with_higher_value_doesnt_leak_into_summary():
+    """If per-window readings include a value HIGHER than the summary
+    (a transient loud window), the parser must still return the
+    summary value, not the loudest per-window."""
+    stderr = _make_stderr([-70.0, -5.0, -3.0], summary_lufs=-12.0, summary_peak=-0.5)
+    with _stub_ffmpeg(stderr):
+        lufs, _ = analyze_track('/fake/loud.flac')
+    assert lufs == pytest.approx(-12.0, abs=0.01)
+
+
+# ---------------------------------------------------------------------------
+# Defensive fallback when Summary block is absent
+# ---------------------------------------------------------------------------
+
+
+def test_falls_back_to_last_per_window_when_no_summary():
+    """Some ffmpeg versions or truncated outputs may lack a Summary
+    block. Defensive fallback uses the LAST per-window reading (still
+    closer to the final integrated value than the first)."""
+    stderr = """
+[Parsed_ebur128_0 @ 0x1] t: 0.5   I: -70.0 LUFS  LRA: 0.0 LU
+[Parsed_ebur128_0 @ 0x1] t: 1.0   I: -25.0 LUFS  LRA: 0.0 LU
+[Parsed_ebur128_0 @ 0x1] t: 1.5   I: -14.5 LUFS  LRA: 1.5 LU
+""".strip()
+
+    with _stub_ffmpeg(stderr):
+        lufs, _peak = analyze_track('/fake/no_summary.flac')
+
+    # Last per-window reading, NOT the first
+    assert lufs == pytest.approx(-14.5, abs=0.01)
+
+
+def test_raises_when_no_lufs_anywhere():
+    """If ffmpeg output contains no LUFS values at all (parse failure
+    / wrong format), surface a clear RuntimeError so the caller can
+    decide whether to skip RG analysis."""
+    stderr = "ffmpeg: garbled output, no LUFS data anywhere\n"
+    with _stub_ffmpeg(stderr):
+        with pytest.raises(RuntimeError, match='Could not parse'):
+            analyze_track('/fake/garbage.flac')
+
+
+# ---------------------------------------------------------------------------
+# Peak parsing — ensure it stays anchored to summary too
+# ---------------------------------------------------------------------------
+
+
+def test_peak_uses_summary_value_not_per_window_max():
+    """Per-window output uses 'TPK:'/'FTPK:' labels; only the summary
+    uses 'Peak:'. Pin that the parser only catches the summary peak
+    even when per-window TPK values would be larger."""
+    stderr = _make_stderr([-70.0, -45.0, -10.0], summary_lufs=-12.0, summary_peak=-0.4)
+    with _stub_ffmpeg(stderr):
+        _lufs, peak = analyze_track('/fake/peak.flac')
+    assert peak == pytest.approx(-0.4, abs=0.01)
--- a/webui/static/helper.js
+++ b/webui/static/helper.js
@ -3444,6 +3444,7 @@ const WHATS_NEW = {
    '2.4.2': [
        // --- post-2.4.1 dev work — entries hidden by _getLatestWhatsNewVersion until the build version bumps ---
        { date: 'Unreleased — 2.4.2 dev cycle' },
+        { title: 'Fix: ReplayGain Wrote Same +52 dB Gain to Every Track', desc: 'noticed every downloaded track came out with `replaygain_track_gain: +52.00 dB` regardless of actual loudness. cause: parser used `re.search` which returned the FIRST `I:` (integrated loudness) reading from ffmpeg\'s ebur128 output. that\'s the per-window measurement at t=0.5s — almost always ~-70 LUFS because tracks start with silence/encoder padding. -18 (RG2 reference) - (-70) = +52 dB on every track. fix: parser now anchors to the `Summary:` block at the end of ffmpeg\'s output and reads the actual integrated loudness from there, not the silent-intro partial. defensive fallback uses the LAST per-window reading if Summary is missing (still better than the first). gains now reflect real per-track loudness.', page: 'downloads' },
        { title: 'Fix: Tracks Showed Completed When File Was Quarantined', desc: 'caught downloading kendrick mr morale: three tracks (rich interlude, savior interlude, savior) showed ✅ completed in the modal but were missing on disk. two layered bugs. (1) the post-process verification wrapper had a fallback that assumed success when no `_final_processed_path` was in context — but integrity-rejected files (which get quarantined instead of moved) leave that path unset, so the wrapper marked them complete. now wrapper explicitly checks `_integrity_failure_msg` and `_race_guard_failed` markers before the assume-success fallback. failed integrity = task marked failed, batch tracker notified with success=false. (2) acoustid skip-logic was too lenient — when fingerprint confidence was very high and either title OR artist matched a bit, it skipped verification with reason "likely same song in different language/script." that fired for english-vs-english by the same artist with the word "interlude" in both — same artist + 0.55 title sim = skip = wrong file accepted. tightened: skip now requires non-ASCII chars present (real language/script case) AND artist match, OR very high title similarity (≥0.80) AND artist match. english-vs-english with very different titles by same artist no longer skipped — verification correctly returns FAIL and the wrong file gets quarantined.', page: 'downloads' },
        { title: 'Stop Navidrome From Splitting Albums Over Inconsistent MBIDs', desc: 'discord report (samuel [KC]): tracks of the same album sometimes carry different MUSICBRAINZ_ALBUMID tags, which causes navidrome to split the album into multiple entries. two-part fix: (1) the MBID Mismatch Detector now does a second scan that groups tracks by db album, finds the consensus (most-common) album mbid, and flags dissenters — fix action rewrites the dissenter\'s tag to match. catches existing inconsistencies in your library. (2) root cause: per-track musicbrainz release lookups went through an in-memory cache that\'s capped at 4096 entries and dies on server restart, so big libraries / restarts could resolve different release ids for tracks of the same album. added a persistent sqlite-backed cache so a release mbid resolved ONCE for an album applies to every future track of that album for the install\'s lifetime. strictly additive: any failure in the persistent layer falls through to the live musicbrainz lookup exactly as before.', page: 'library' },
        { title: 'Lidarr: Right Track Lands on Disk + Profile Lookup Stops Failing', desc: 'lidarr is an album-grabber — when you ask for one track it grabs the whole album, then we pick the wanted track out. old code blindly took the first imported file as the result, so any track you asked for got mistagged as track 1 of the album. now matches the wanted title against lidarr\'s track list (with punctuation-tolerant fuzzy compare) and copies only that file. also fixed a hardcoded `metadataProfileId=1` that broke artist-add on installs where someone had renamed/recreated profiles, and a polling-loop bug where the inner break never escaped the outer poll loop so completion detection was delayed. settings tooltip updated to be honest: lidarr is best for full-album grabs and effectively a no-op for playlist sync (track searches return nothing useful, hybrid mode falls through to your other sources).', page: 'settings' },