mirror of https://github.com/Nezreka/SoulSync.git
Cin-pass on the MBID/ISRC fast-paths + duration-gate work.
Three small but real gaps closed.
Gap 1 — Real-file tag reader integration test
(tests/imports/test_auto_import_tag_reader_real_files.py, 6 tests):
The matcher unit tests use dict fixtures, which prove the algorithm
handles the right shapes once tags are read. They DON'T prove the tag
reader itself extracts the right values from real files. Mutagen's
easy-mode key normalisation (across FLAC / MP3 / M4A) is the exact
spot a future mutagen version could silently drift and break the
fast paths in production while every unit test stays green.
These tests write real FLAC files via mutagen (using the same
`_make_minimal_flac` pattern from `test_album_mbid_consistency.py`)
and assert `_read_file_tags` extracts:
- Picard's `MUSICBRAINZ_TRACKID` (lowercase normalisation in reader)
- `ISRC` (uppercase normalisation in reader; matcher strips
  formatting at compare time)
- "track/total" parsing (TRACKNUMBER='5/12' → 5)
- Duration via `audio.info.length` from synthesised STREAMINFO
- Graceful empty-default return for tagless files
- Graceful empty-default return for invalid audio (not a crash)
Acknowledged gap (carried forward): MP3 + M4A integration coverage
not added — mutagen docs say easy-mode normalisation is identical
across all three formats, but only FLAC is pinned here. Followup
candidate.
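
A minimal sketch of the normalisation these tests pin, for illustration
only. It is not the project's `_read_file_tags`: the function names, the
returned keys, and the dict-of-lists input shape are assumptions, and the
real reader goes through mutagen rather than a plain dict.

```python
def parse_track_number(raw):
    """Return the leading integer of a 'track/total' tag, or 0 if unparseable."""
    if not raw:
        return 0
    head = str(raw).split("/", 1)[0].strip()
    return int(head) if head.isdigit() else 0


def normalise_tags(raw):
    """Hypothetical stand-in for the normalisation done inside the reader.

    `raw` maps lowercase tag keys to lists of strings (assumed shape).
    Missing keys fall back to empty defaults instead of raising.
    """
    def first(key):
        values = raw.get(key) or [""]
        return str(values[0]).strip()

    return {
        "mbid": first("musicbrainz_trackid").lower(),   # lowercase in reader
        "isrc": first("isrc").upper(),                  # uppercase in reader
        "track_number": parse_track_number(first("tracknumber")),
    }
```

A tagless input such as `normalise_tags({})` yields the empty defaults,
mirroring the "graceful empty-default return" cases listed above.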
Gap 2 — Source-aware duration dispatch
(core/imports/album_matching.py, 4 tests in test_album_matching_exact_id.py):
The previous `_track_duration_ms` helper used a magnitude heuristic
("anything below 30000 is seconds, convert × 1000") to decide
whether a track's duration was in seconds or ms. That worked for
typical tracks but had a real edge case: an actual sub-30-second
Spotify track (intros, interludes, skits) reported in ms would be
read as seconds and multiplied by 1000 (e.g. 25000 ms treated as
25000 s, roughly 7 hours), breaking the duration sanity gate.
Replaced with deterministic source-aware dispatch:
- Spotify / iTunes / Qobuz / HiFi / Hydrabase → ms (canonical)
- Deezer / Discogs / MusicBrainz → seconds, × 1000
- Tidal classified as ms (album-tracks endpoint convention; flagged
  in code comment as needing real-world verification; defensive
  if wrong)
- Magnitude heuristic kept as fallback for unknown / missing source
  (mocked test data without source field)
Tests pin all four paths: confirmed-ms source, confirmed-seconds
source, unknown source falls back to heuristic, and the regression
case (sub-30s real track on a known-ms source — must not be
× 1000-converted).
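
The dispatch described above can be sketched roughly as follows. This is
an illustration under assumptions, not the code in
`core/imports/album_matching.py`: the `source` and `duration` field
names, the lowercase source identifiers, and the function name are all
hypothetical; only the source groupings and the 30000 cutoff come from
this message.

```python
# Sources grouped by the unit this commit message says they report.
MS_SOURCES = {"spotify", "itunes", "qobuz", "hifi", "hydrabase", "tidal"}
SECONDS_SOURCES = {"deezer", "discogs", "musicbrainz"}

# Fallback only: values below this are assumed to be seconds.
HEURISTIC_CUTOFF = 30000


def track_duration_ms(track):
    """Return the track duration in ms, dispatching on the metadata source."""
    raw = track.get("duration") or 0            # hypothetical field name
    source = (track.get("source") or "").lower()

    if source in MS_SOURCES:
        return int(raw)                         # already ms, never rescale
    if source in SECONDS_SOURCES:
        return int(raw * 1000)                  # seconds to ms
    # Unknown or missing source: fall back to the magnitude heuristic.
    return int(raw * 1000) if 0 < raw < HEURISTIC_CUTOFF else int(raw)
```

With this shape, a 25000 ms Spotify interlude stays at 25000, while the
same number with no source still goes through the heuristic and is
scaled, which is why the heuristic is kept only as a fallback.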
Gap 3 — Cross-disc consolation rationale
(tests/imports/test_album_matching_helper.py, 1 test):
The `CROSS_DISC_POSITION_WEIGHT = 0.05` magic number had no test
proving it was load-bearing. Anyone could have set it to 0 thinking
"strict matching is better" without realising it would silently
break a real scenario.
New test (`test_cross_disc_consolation_is_load_bearing_for_imperfect_titles`)
constructs the exact case the consolation exists for: file has the
right title spelling but the metadata source returns a slightly-
different version (e.g. "Auntie Diaries" file vs "Auntie Diaries
(Remix)" track), AND the file's disc tag is wrong while the track
number agrees. Title sim ~0.78 × 0.45 = ~0.35 (below
MATCH_THRESHOLD 0.4). Without the 5% consolation → file goes
unmatched. With it → ~0.40, just clears.
The test doesn't justify "why 0.05 specifically" — that's still a
tuned knob, not a measured value. But it forces a deliberate
decision if someone wants to drop it: failing this test gives them
the "you broke imperfect-title cross-disc matching" message
explicitly.
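
The arithmetic behind that scenario, as a rough sketch. The scoring
function here is a simplified assumption that models only the cross-disc
case; `TITLE_WEIGHT` is read off the "~0.78 × 0.45" figure above, and
the real matcher may combine more signals than this.

```python
TITLE_WEIGHT = 0.45                  # assumed from the 0.78 x 0.45 arithmetic
CROSS_DISC_POSITION_WEIGHT = 0.05    # the knob under test
MATCH_THRESHOLD = 0.4


def cross_disc_score(title_sim, track_number_agrees, consolation):
    """Score a file whose disc tag disagrees with the candidate track.

    Simplified model: title contribution plus an optional consolation
    when the track number still agrees across discs.
    """
    score = title_sim * TITLE_WEIGHT
    if track_number_agrees:
        score += consolation
    return score


with_consolation = cross_disc_score(0.78, True, CROSS_DISC_POSITION_WEIGHT)
without_consolation = cross_disc_score(0.78, True, 0.0)

print(with_consolation >= MATCH_THRESHOLD)     # clears the threshold
print(without_consolation >= MATCH_THRESHOLD)  # falls short
```

Setting the consolation to 0 drops the score back to about 0.35, which
is the regression the new test is meant to surface.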
Verification:
- 10 new tests across 3 files, all pass
- 35 album-matching tests total now (including pre-existing 17 +
  18 fast-path)
- Full suite: 2321 passed, 1 pre-existing flaky timing test
  (`test_watchdog_warns_about_stuck_workers`: passes in isolation,
  fails only in full-suite runs, unrelated to this PR)
- Ruff clean
- All changes still scoped to import flow; download flow
  byte-identical (verified by grep on every changed file)
pull/536/head
parent
3246490800
commit
f2cd95e0f1