mirror of https://github.com/Nezreka/SoulSync.git
User report: all 6 staging candidates failing with "Could not match
tracks to album tracklist" despite identification correctly resolving
each album. 18 properly-tagged Chris Brown F.A.M.E. tracks, 21
properly-tagged Mr. Morale tracks, etc. — every match attempt
rejected by the duration sanity gate.
Root cause: I had Deezer in `_SECONDS_DURATION_SOURCES`, assuming
Deezer's `duration` field was raw seconds (which the API returns).
But `DeezerClient.get_album_tracks` already converts seconds → ms
INTERNALLY (`'duration_ms': item.get('duration', 0) * 1000`) before
the value reaches the matcher. My helper saw `source='deezer'` →
multiplied by 1000 again → 255000 ms became 255,000,000 ms (~70.8 hours).
Every track-file pair missed the gate by a factor of 1000.
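The double conversion can be reproduced in a few lines. This is a
minimal sketch, not the project's code: `client_track` and
`buggy_duration_ms` are hypothetical stand-ins for
`DeezerClient.get_album_tracks` and the old matcher helper.

```python
def client_track(item: dict) -> dict:
    """Stand-in for the client: converts API seconds to duration_ms."""
    return {"duration_ms": item.get("duration", 0) * 1000, "source": "deezer"}


def buggy_duration_ms(track: dict) -> int:
    """Old helper behaviour: treats Deezer values as seconds and scales again."""
    value = track["duration_ms"]
    if track.get("source") == "deezer":
        return value * 1000  # second, unwanted conversion
    return value


track = client_track({"duration": 255})   # raw API value, in seconds
resolved = buggy_duration_ms(track)
hours = resolved / 3_600_000              # ms per hour

print(track["duration_ms"], resolved, round(hours, 1))
```

With a 255-second track the client hands over 255000 ms, the old helper
turns it into 255000000 ms, and no real file can land within tolerance.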
Diagnostic chain that got me there:
1. Added `[Album Matching] No matches: X files, Y tracks, Z
duration-rejected, W below threshold` summary log so future "0
matches" reports surface the rejection reason.
2. Fixed the helper's logger from `logging.getLogger(__name__)` (which
resolves outside the soulsync handler tree → invisible in app.log)
to `get_logger("imports.album_matching")` (under the namespace the
file handler watches).
3. Added per-rejection-type diagnostic showing actual file vs track
duration values + raw track keys + source.
That third diagnostic surfaced `track 'United In Grief' resolved=255000000
(raw duration_ms=255000, raw duration=None, source='deezer')` —
making the bug obvious.
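Step 2 comes down to how the stdlib logger hierarchy routes records.
The sketch below uses only `logging`; the `soulsync` namespace and the
`core.imports.album_matching` module path are assumptions for
illustration, and the project's `get_logger` helper is not reproduced.

```python
import io
import logging

# Stand-in for the app.log file handler, attached to the app namespace.
stream = io.StringIO()
app_logger = logging.getLogger("soulsync")  # assumed namespace name
app_logger.setLevel(logging.INFO)
app_logger.propagate = False
app_logger.addHandler(logging.StreamHandler(stream))

# A module-path logger (what __name__ might resolve to) sits outside
# that tree, so the handler above never sees its records.
outside = logging.getLogger("core.imports.album_matching")
outside.addHandler(logging.NullHandler())  # keep stderr quiet in this demo
outside.warning("dropped: duration rejected")

# A logger named under the namespace propagates up to the handler.
inside = logging.getLogger("soulsync.imports.album_matching")
inside.warning("captured: duration rejected")

print(stream.getvalue())
```

Only the second message reaches the captured stream, which is why the
earlier diagnostics never showed up in the log file.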
Fixes:
- Moved Deezer from `_SECONDS_DURATION_SOURCES` to
`_MS_DURATION_SOURCES`. Comment documents WHY (the client converts
before returning) so a future reader doesn't "fix" the
classification back the wrong way.
- Bumped `DURATION_TOLERANCE_MS` from 3000 → 10000 (3s → 10s) to
match Picard ~7s / Beets ~10-15s / Plex ~10s industry baselines.
The 3s value was copied defensively from the post-download integrity
check, which solves a different problem (catching truncated
downloads, not identifying recordings across remasters/encodings).
- `_track_duration_ms` magnitude heuristic kept as fallback for
unknown / missing source (mocked test data without `source` field).
- Added `Match aborted` warnings at the three earlier silent return
points in `_match_tracks` (no client, no album_data, no tracks)
so future "Could not match" reports show WHICH step bailed.
- Added per-run diagnostic in `match_files_to_tracks` that logs the
first duration rejection's actual values — surfaces unit mismatches
+ drift problems without spamming N×M lines per run.
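The resulting dispatch and gate might look roughly like the sketch
below. Only the constant names and the 10000 ms tolerance come from
this commit. The function names, the seconds-source entry, the 30000
magnitude cutoff, and the "missing duration passes" rule are
assumptions made for illustration.

```python
from typing import Optional

DURATION_TOLERANCE_MS = 10_000
# Deezer is listed as ms because its client converts before returning.
_MS_DURATION_SOURCES = frozenset({"deezer"})
_SECONDS_DURATION_SOURCES = frozenset({"hypothetical_seconds_source"})
_SECONDS_MAGNITUDE_CUTOFF = 30_000  # assumed: smaller values read as seconds


def track_duration_ms(track: dict) -> Optional[int]:
    """Resolve a track duration to ms, trusting the source label first."""
    raw = track.get("duration_ms")
    if raw is None:
        raw = track.get("duration")
    if not raw:
        return None
    source = track.get("source")
    if source in _MS_DURATION_SOURCES:
        return int(raw)
    if source in _SECONDS_DURATION_SOURCES:
        return int(raw * 1000)
    # Unknown or missing source: fall back to the magnitude heuristic.
    return int(raw * 1000) if raw < _SECONDS_MAGNITUDE_CUTOFF else int(raw)


def passes_duration_gate(file_ms: int, track_ms: Optional[int]) -> bool:
    """Accept a pair when durations agree within the tolerance."""
    if track_ms is None:
        return True  # assumption: a missing duration does not reject
    return abs(file_ms - track_ms) <= DURATION_TOLERANCE_MS


deezer = track_duration_ms({"duration_ms": 255000, "source": "deezer"})
unlabelled = track_duration_ms({"duration": 255})
print(deezer, unlabelled, passes_duration_gate(259_000, deezer))
```

A 4-second drift between file and track now passes, while the old
1000× inflated value would still be rejected.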
Test changes:
- `test_deezer_seconds_duration_converted_to_ms` renamed +
rewritten as `test_deezer_already_normalised_to_ms_by_client`
to pin the actual contract (matcher receives ms from the Deezer
client, takes as-is).
- `test_track_duration_source_aware_dispatch` updated — Deezer test
case now uses ms input + expects ms output.
- New `test_raw_deezer_seconds_falls_back_to_magnitude_heuristic`
pins the rare edge case where raw Deezer items WITHOUT `source`
reach the matcher (no client conversion path) — heuristic catches
it.
Verification:
- 179 import tests pass after changes
- Live test: all 6 user staging candidates now matching at 95-100%
confidence
- Multi-disc Mr. Morale lands with proper Disc 1 / Disc 2 / Disc 3
folder structure
- Picard-tagged libraries hit MBID fast paths (verified earlier)
- Tracks process in parallel via the existing scan-now thread spawn
(next commit refactors this to a proper bounded executor)
pull/536/head
parent a478747a89
commit e11786ee40