mirror of https://github.com/Nezreka/SoulSync.git
Cin-pass on the #524 + multi-disc fixes. Pre-merge polish. Lifts: `core/imports/album_matching.py` `AutoImportWorker._match_tracks` was a 100+-line method buried in a 1400-line class. Testing it required monkey-patching `_read_file_tags` + mocking the metadata client just to exercise the matching algorithm. Per Cin's "lift logic out of monolithic classes" pattern (same shape as the album-info builders / discography / quality scanner lifts), moved the dedup + scoring into `core/imports/album_matching.py` as pure functions over already-fetched data. Helper exposes: - Constants for every match weight (TITLE_WEIGHT, ARTIST_WEIGHT, POSITION_WEIGHT, NEAR_POSITION_WEIGHT, CROSS_DISC_POSITION_WEIGHT, ALBUM_WEIGHT, MATCH_THRESHOLD). Magic numbers killed. - `dedupe_files_by_position(audio_files, file_tags, *, quality_rank)` — position-keyed quality dedup. - `score_file_against_track(file_path, file_tags, track, *, target_album, similarity)` — pure per-(file, track) scorer. - `match_files_to_tracks(audio_files, file_tags, tracks, *, target_album, similarity, quality_rank)` — full matching with greedy best-per-track + first-come-first-serve over deduped files. Worker shrinks from 100 lines of inline algorithm to 8 lines that fetch tags + delegate to the helper. Tests added (26 new across 3 files): `tests/imports/test_album_matching_helper.py` (19 tests): - Constants pin: weights sum to 1.0, threshold above position-only - `dedupe_files_by_position`: quality wins, cross-disc preserved, tag-less files passed through, first-wins on equal quality - `score_file_against_track`: perfect-agreement = 1.0, position needs both disc+track, near-position only same-disc, missing artist tags handled, disc field aliases (Spotify/Deezer/iTunes), filename fallback when title tag missing - `match_files_to_tracks`: happy path, file used at-most-once, below-threshold left unmatched - Edge case Cin would flag: tag-less file with strong filename title matches multi-disc album track via title alone (perfect-name scenario works); tag-less file with weak filename title against multi-disc API correctly stays unmatched (the behavior delta from the disc-aware fix — pinned so future readers see it's intentional) `tests/test_import_album_match_endpoint.py` (3 tests): - Backend warning fires when source missing from match POST - No warning fires on the legit path (catches noisy-warning regression) - Endpoint actually forwards source/name/artist to the payload builder (catches "logging the right warning but doing the wrong lookup" regression) `tests/test_import_page_album_lookup_pattern.py` (4 tests): - Source-text guard for the import-page #524 fix in stats-automations.js. Until the file is modularized enough for a behavioral JS test (under the existing tests/static/*.mjs pattern), regex-based assertions pin: the `_albumLookup` field exists, the click handler reads from it, both card renderers populate it before emitting onclick, and the cache stores `source` per entry. Caveat documented in the test module docstring. Verification: - All 26 new tests pass. - Existing multi-disc tests (test_auto_import_multi_disc_matching.py) still pass after the lift — proves the helper is behavior-equivalent to the inline implementation it replaced. - Full suite: 2293 passed, 1 flaky-timing failure (test_library_reorganize_orchestrator.py::test_watchdog_warns_about_stuck_workers — passes in isolation, fails only in full-suite runs, pre-existing, unrelated to this PR). - Ruff clean. Notes for the reviewer: - The frontend stats-automations.js JS test is structural-only. Behavioral JS testing for that file requires modularizing the ~7k-line monolith first — out of scope for this fix. - The cross-disc 5% consolation bonus is a small behavior change for users with weak/missing tag info on multi-disc albums. Pinned explicitly in `test_tagless_file_with_weak_title_unmatched_in_multidisc` so the trade-off is visible: correct multi-disc matching wins over optimistic position-only matching that produced wrong-disc files.pull/536/head
parent
c03edc3cb4
commit
f9f74ac511
@ -0,0 +1,118 @@
|
||||
"""Pin the ``/api/import/album/match`` endpoint's source-routing
|
||||
behavior — github issue #524 regression guard.
|
||||
|
||||
The bug: clicking an album in the import page POSTed only ``album_id``,
|
||||
dropping the ``source`` field that the backend needs to route the
|
||||
lookup to the correct metadata client. The backend silently fell back
|
||||
to its primary-source-priority chain, which fails for cross-source
|
||||
album_ids (Deezer numeric id vs Spotify primary, etc.) → broken
|
||||
fallback dict written to the library DB.
|
||||
|
||||
The frontend fix populates source on every match POST. These tests
|
||||
pin the BACKEND defense: when source is dropped (curl, third-party,
|
||||
regression in another caller), a clear warning lands in the logs so
|
||||
the regression is grep-able instead of silent.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from unittest.mock import patch
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def import_match_client(monkeypatch):
|
||||
"""Flask test client, with the album-match payload builder mocked
|
||||
so we don't have to spin up real metadata clients."""
|
||||
with patch("web_server.add_activity_item"):
|
||||
with patch("web_server.SpotifyClient"):
|
||||
with patch("core.tidal_client.TidalClient"):
|
||||
from web_server import app as flask_app
|
||||
flask_app.config['TESTING'] = True
|
||||
yield flask_app.test_client()
|
||||
|
||||
|
||||
def test_missing_source_logs_warning(import_match_client, caplog):
|
||||
"""When the match POST omits source, backend logs a clear warning
|
||||
so the regression is visible in app.log even though the request
|
||||
still proceeds (best-effort lookup via primary-source priority).
|
||||
"""
|
||||
fake_payload = {'success': True, 'album': {}, 'matches': [], 'unmatched_files': []}
|
||||
with caplog.at_level(logging.WARNING, logger='soulsync'):
|
||||
with patch(
|
||||
'web_server.build_album_import_match_payload',
|
||||
return_value=fake_payload,
|
||||
):
|
||||
resp = import_match_client.post(
|
||||
'/api/import/album/match',
|
||||
json={'album_id': '1234567890'}, # no source
|
||||
)
|
||||
|
||||
assert resp.status_code == 200
|
||||
# The defensive log must mention the missing source AND the album_id
|
||||
# so ops can grep app.log for the offending caller.
|
||||
assert any(
|
||||
"Missing 'source'" in r.message and '1234567890' in r.message
|
||||
for r in caplog.records
|
||||
), (
|
||||
"Expected a warning naming the missing source + album_id. "
|
||||
"Got records: " + repr([r.message for r in caplog.records])
|
||||
)
|
||||
|
||||
|
||||
def test_source_provided_does_not_warn(import_match_client, caplog):
|
||||
"""When source IS provided (the common path), no warning fires.
|
||||
Catches regression where the warning becomes noisy from firing on
|
||||
every legit request."""
|
||||
fake_payload = {'success': True, 'album': {}, 'matches': [], 'unmatched_files': []}
|
||||
with caplog.at_level(logging.WARNING, logger='soulsync'):
|
||||
with patch(
|
||||
'web_server.build_album_import_match_payload',
|
||||
return_value=fake_payload,
|
||||
):
|
||||
resp = import_match_client.post(
|
||||
'/api/import/album/match',
|
||||
json={
|
||||
'album_id': '1234567890',
|
||||
'source': 'deezer',
|
||||
'album_name': 'Test Album',
|
||||
'album_artist': 'Test Artist',
|
||||
},
|
||||
)
|
||||
|
||||
assert resp.status_code == 200
|
||||
missing_source_warnings = [
|
||||
r for r in caplog.records if "Missing 'source'" in r.message
|
||||
]
|
||||
assert not missing_source_warnings, (
|
||||
"When source is supplied, no missing-source warning should fire. "
|
||||
f"Got: {[r.message for r in missing_source_warnings]}"
|
||||
)
|
||||
|
||||
|
||||
def test_source_passed_through_to_payload_builder(import_match_client):
|
||||
"""Verify the endpoint actually forwards source to the underlying
|
||||
payload builder. Without this, we'd be logging the warning correctly
|
||||
but still doing the wrong lookup."""
|
||||
fake_payload = {'success': True, 'album': {}, 'matches': [], 'unmatched_files': []}
|
||||
with patch(
|
||||
'web_server.build_album_import_match_payload',
|
||||
return_value=fake_payload,
|
||||
) as mock_builder:
|
||||
import_match_client.post(
|
||||
'/api/import/album/match',
|
||||
json={
|
||||
'album_id': 'abc123',
|
||||
'source': 'spotify',
|
||||
'album_name': 'X',
|
||||
'album_artist': 'Y',
|
||||
},
|
||||
)
|
||||
|
||||
mock_builder.assert_called_once()
|
||||
call_kwargs = mock_builder.call_args.kwargs
|
||||
assert call_kwargs['source'] == 'spotify'
|
||||
assert call_kwargs['album_name'] == 'X'
|
||||
assert call_kwargs['album_artist'] == 'Y'
|
||||
@ -0,0 +1,112 @@
|
||||
"""Pin the import-page album-lookup cache pattern in
|
||||
``webui/static/stats-automations.js`` — github issue #524 regression
|
||||
guard at the source-text level.
|
||||
|
||||
Why a structural test instead of a behavioral JS test:
|
||||
|
||||
``stats-automations.js`` is a ~7k-line file with a lot of global state
|
||||
+ inline DOM rendering. Loading it into a sandboxed Node `vm` context
|
||||
(the pattern used in `tests/static/test_discover_section_controller.mjs`)
|
||||
would require stubbing dozens of unrelated dependencies. The file
|
||||
needs to be modularized before behavioral tests are practical for
|
||||
arbitrary functions in it.
|
||||
|
||||
Until then, this test fails the suite if the critical pattern from
|
||||
the #524 fix gets removed:
|
||||
|
||||
1. The album cache (``_albumLookup`` field on ``importPageState``)
|
||||
2. Card renderers populating the cache before emitting the onclick
|
||||
3. The match-POST builder reading source/name/artist from the cache
|
||||
|
||||
If anyone deletes the cache, the click handler, or the cache writes,
|
||||
this test catches it before the regression ships.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
_REPO_ROOT = Path(__file__).resolve().parents[1]
|
||||
_SOURCE = _REPO_ROOT / "webui" / "static" / "stats-automations.js"
|
||||
|
||||
|
||||
@pytest.fixture(scope="module")
|
||||
def js_source() -> str:
|
||||
return _SOURCE.read_text(encoding="utf-8")
|
||||
|
||||
|
||||
def test_album_lookup_cache_field_exists_on_state(js_source: str):
|
||||
"""importPageState must have an `_albumLookup` field. Without it,
|
||||
card renderers have nowhere to stash source/name/artist for the
|
||||
click handler to read."""
|
||||
assert "_albumLookup:" in js_source, (
|
||||
"importPageState._albumLookup field missing — the album cache "
|
||||
"that backs the source-routing fix for issue #524 has been "
|
||||
"removed. The click handler will fall back to passing only "
|
||||
"album_id and the backend will silently misroute lookups again."
|
||||
)
|
||||
|
||||
|
||||
def test_select_album_handler_reads_cache(js_source: str):
|
||||
"""importPageSelectAlbum must read source / name / artist from
|
||||
the cache and include them in the match POST body. The whole
|
||||
point of the fix."""
|
||||
# Find the function body
|
||||
match = re.search(
|
||||
r"async function importPageSelectAlbum\([^)]*\) \{(.*?)^\}",
|
||||
js_source, re.DOTALL | re.MULTILINE,
|
||||
)
|
||||
assert match, "importPageSelectAlbum function not found"
|
||||
body = match.group(1)
|
||||
|
||||
# Must read from the lookup cache
|
||||
assert "_albumLookup[" in body, (
|
||||
"importPageSelectAlbum no longer reads from "
|
||||
"importPageState._albumLookup — match POST will drop source "
|
||||
"again, see issue #524."
|
||||
)
|
||||
|
||||
# Must build a matchBody that includes source + album_name + album_artist
|
||||
for required_field in ("source:", "album_name:", "album_artist:"):
|
||||
assert required_field in body, (
|
||||
f"matchBody missing required field {required_field!r}. "
|
||||
"Backend's get_artist_album_tracks needs source to route "
|
||||
"the lookup to the correct metadata client. Without it, "
|
||||
"cross-source album_ids fall through to the failure-fallback "
|
||||
"dict (Unknown Artist / album_id-as-title / 0 tracks). "
|
||||
"See issue #524 for the original symptom."
|
||||
)
|
||||
|
||||
|
||||
def test_card_renderers_populate_cache_before_onclick(js_source: str):
|
||||
"""Both renderers (suggestion card + search-result card) must write
|
||||
to ``_albumLookup`` before emitting the onclick — otherwise the
|
||||
click handler reads an empty cache for newly-displayed albums."""
|
||||
cache_writes = re.findall(
|
||||
r"_albumLookup\[a\.id\]\s*=\s*\{",
|
||||
js_source,
|
||||
)
|
||||
assert len(cache_writes) >= 2, (
|
||||
f"Expected >=2 _albumLookup writes (one per card renderer - "
|
||||
f"suggestions + search results), found {len(cache_writes)}. "
|
||||
"Adding a new card-rendering site without populating the cache "
|
||||
"regresses issue #524 for that path."
|
||||
)
|
||||
|
||||
|
||||
def test_cache_entry_carries_source_field(js_source: str):
|
||||
"""The cache must store `source:` per entry — not just id/name/artist."""
|
||||
write_blocks = re.findall(
|
||||
r"_albumLookup\[a\.id\]\s*=\s*\{[^}]*\}",
|
||||
js_source,
|
||||
)
|
||||
assert write_blocks, "no _albumLookup writes found"
|
||||
assert any("source:" in block for block in write_blocks), (
|
||||
"_albumLookup cache entries must include `source` — that's the "
|
||||
"field the click handler forwards to /api/import/album/match "
|
||||
"to route the lookup to the correct provider."
|
||||
)
|
||||
Loading…
Reference in new issue