fix(downloads): route torrent/usenet through streaming-result validation path

Live-test bug: Spotify-flow downloads with Torrent Only as the
active source produced 'download_failed' for every track. Searches
hit Prowlarr fine but no candidate ever got picked. Root cause was
in core/downloads/validation.py's get_valid_candidates:

- The streaming-source allowlist for the structured-metadata path
  didn't include 'torrent' / 'usenet', so torrent results fell into
  the Soulseek matching branch.
- Soulseek matching parses ``candidate.filename`` as a slskd-style
  ``Artist/Album/Track.flac`` path. Torrent / usenet filenames are
  encoded as ``<download_url>||<display_name>`` so the orchestrator
  can recover the URL. Splitting that string on slashes produced
  garbage path segments that never matched the expected artist, so
  every candidate failed the artist-folder gate, get_valid_candidates
  returned [], and the track status flipped to 'not_found'.
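A minimal sketch of the mismatch, assuming a purely slash-based parser.
The helper names ``slskd_segments`` and ``split_encoded_filename``, the
indexer URL, and the display name are hypothetical; the real parser and
projection code are not part of this diff.

```python
def slskd_segments(filename: str) -> list[str]:
    """Naive slskd-style parsing: treat the filename as Artist/Album/Track.ext."""
    # Normalize Windows separators, then split and drop empty segments.
    return [part for part in filename.replace("\\", "/").split("/") if part]


def split_encoded_filename(filename: str) -> tuple[str, str]:
    """Recover (download_url, display_name) from a '<url>||<name>' string."""
    url, sep, display = filename.partition("||")
    if not sep:
        # Not an encoded torrent/usenet filename: no URL to recover.
        return "", filename
    return url, display


slsk = "Kendrick Lamar/GNX/05 - luther.flac"
encoded = "https://indexer.example/download?id=123||Kendrick Lamar - GNX (2024) [FLAC]"

print(slskd_segments(slsk))
# ['Kendrick Lamar', 'GNX', '05 - luther.flac']
print(slskd_segments(encoded))
# ['https:', 'indexer.example', 'download?id=123||Kendrick Lamar - GNX (2024) [FLAC]']
print(split_encoded_filename(encoded))
# ('https://indexer.example/download?id=123', 'Kendrick Lamar - GNX (2024) [FLAC]')
```

Read as Artist/Album/Track, the encoded string in this example yields an
"artist" folder of ``https:`` and an "album" folder of the indexer host,
so no candidate could pass an artist-folder check.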

Fixes:
- _streaming_sources now includes 'torrent' and 'usenet'. They take
  the structured-metadata scoring path that reads r.title / r.artist
  directly (the projection layer pre-fills both correctly).
- The artist gate is skipped for torrent/usenet, as it already is for
  YouTube. Album-level releases legitimately don't expose a per-track
  artist: the projection falls back to the indexer name for the
  'artist' field, which would otherwise fail the gate against every
  Spotify artist.
- New album-name fallback scoring: for torrent/usenet only, the
  candidate title is ALSO scored against the wanted track's
  spotify_track.album field, and the max of (track-title score,
  album-title score) wins. This makes a candidate titled
  "GNX (2024) [FLAC]" match every track on the GNX album rather
  than scoring near zero against a specific track title like
  "Luther (with SZA)". When the album score wins, match_type is set to
  'album_release' for visibility.

All 9 existing validation tests still pass.
pull/671/head
Broque Thomas 6 days ago
parent e83b661471
commit a2db5382bb

```diff
@@ -97,8 +97,12 @@ def get_valid_candidates(results, spotify_track, query):
         return []
     # Streaming sources (YouTube, Tidal, Qobuz, HiFi, Deezer, SoundCloud) return structured API results
-    # with proper artist/title metadata — score using the same matching engine as Soulseek
-    _streaming_sources = ("youtube", "tidal", "qobuz", "hifi", "deezer_dl", "soundcloud", "amazon")
+    # with proper artist/title metadata — score using the same matching engine as Soulseek.
+    # Torrent / usenet results also belong here: their filename field is a download URL, not
+    # a slskd-style ``Artist/Album/Track.flac`` path, so the Soulseek matcher would extract
+    # garbage segments from it. Routing them through the streaming path means score_track_match
+    # reads ``r.title`` and ``r.artist`` directly (which the torrent/usenet projections pre-fill).
+    _streaming_sources = ("youtube", "tidal", "qobuz", "hifi", "deezer_dl", "soundcloud", "amazon", "torrent", "usenet")
     if results[0].username in _streaming_sources:
         source_label = results[0].username.replace('_dl', '').title()
         expected_artists = spotify_track.artists if spotify_track else []
```
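The routing decision in this hunk reduces to a membership test on the
first result's ``username``. A sketch, with a hypothetical ``Result``
dataclass standing in for the real search-result type and illustrative
path names; ``source_label`` uses the same expression as the diff.

```python
from dataclasses import dataclass

# Mirrors the tuple added in this hunk.
STREAMING_SOURCES = ("youtube", "tidal", "qobuz", "hifi", "deezer_dl",
                     "soundcloud", "amazon", "torrent", "usenet")


@dataclass
class Result:
    """Hypothetical stand-in for a search result; only ``username`` matters here."""
    username: str


def pick_path(results: list[Result]) -> str:
    """Name the matcher a batch would use (path names are illustrative)."""
    if not results:
        return "none"  # assumed to correspond to the early ``return []``
    # The whole batch is routed on the first result's source tag.
    if results[0].username in STREAMING_SOURCES:
        return "structured"
    return "soulseek"


def source_label(username: str) -> str:
    # Same expression as the diff: drop the '_dl' suffix, then title-case.
    return username.replace("_dl", "").title()


print(pick_path([Result("torrent")]))                     # structured
print(pick_path([Result("some_peer")]))                   # soulseek
print(source_label("deezer_dl"), source_label("usenet"))  # Deezer Usenet
```

Because the batch is routed on ``results[0]``, this appears to presume
that every result passed in one call comes from the same source.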
```diff
@@ -142,6 +146,27 @@ def get_valid_candidates(results, spotify_track, query):
                 candidate_duration_ms=r.duration or 0,
             )
+            # Torrent / usenet results are typically release-level (album torrents).
+            # Looking for "Luther (with SZA)" against a candidate titled
+            # "GNX (2024) [FLAC]" scores ~0 on track-title alone, even though
+            # the album torrent does in fact contain the wanted track. Score
+            # the candidate title against the wanted track's ALBUM name too
+            # and take the max, so album-level releases match every track on
+            # them. The album_track_count bonus only kicks in when we have
+            # a non-empty album string to compare against.
+            if r.username in ('torrent', 'usenet') and spotify_track and spotify_track.album:
+                album_conf, _ = matching_engine.score_track_match(
+                    source_title=spotify_track.album,
+                    source_artists=expected_artists,
+                    source_duration_ms=0, # albums don't have one duration
+                    candidate_title=r.title or '',
+                    candidate_artists=[r.artist] if r.artist else [],
+                    candidate_duration_ms=0,
+                )
+                if album_conf > confidence:
+                    confidence = album_conf
+                    match_type = 'album_release'
             # Version detection penalty — reject live/remix/acoustic when expecting original
             r_title_lower = (r.title or '').lower()
             is_wrong_version = False
```
```diff
@@ -162,8 +187,11 @@ def get_valid_candidates(results, spotify_track, query):
             # Artist gate — streaming APIs (Tidal/Qobuz/HiFi/Deezer) have reliable metadata,
             # so "My Will" by "B. Starr" should never match expected "B小町".
-            # Skip for YouTube — artist is parsed from video titles and often unreliable.
-            if r.username != 'youtube':
+            # Skip for YouTube (video-title parsing is unreliable) and torrent/usenet
+            # (album-level releases legitimately don't expose per-track artist —
+            # the projection layer fills artist with the indexer name as a fallback,
+            # which would otherwise fail the gate against every Spotify artist).
+            if r.username not in ('youtube', 'torrent', 'usenet'):
                 from difflib import SequenceMatcher
                 import re as _re
                 _cand_artist_raw = r.artist or ''
```
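The rest of the gate is cut off in this view, but the ``SequenceMatcher``
and ``re`` imports suggest a fuzzy artist comparison. A sketch under that
assumption: ``_norm_artist``, ``ARTIST_THRESHOLD`` (0.6) and
``passes_artist_gate`` are hypothetical, and only the exemption tuple
mirrors the diff.

```python
import re
from difflib import SequenceMatcher

# Mirrors the exemption tuple added in this hunk.
GATE_EXEMPT = ("youtube", "torrent", "usenet")
ARTIST_THRESHOLD = 0.6  # hypothetical cut-off; the real one is not visible here


def _norm_artist(name: str) -> str:
    """Lowercase and collapse punctuation/whitespace runs (assumed normalization)."""
    return re.sub(r"[^\w]+", " ", name.lower()).strip()


def passes_artist_gate(username: str, candidate_artist: str,
                       expected_artists: list[str]) -> bool:
    if username in GATE_EXEMPT:
        return True  # gate skipped: artist field is unreliable for these sources
    cand = _norm_artist(candidate_artist)
    # Pass if the candidate artist is close enough to any expected artist.
    return any(
        SequenceMatcher(None, cand, _norm_artist(a)).ratio() >= ARTIST_THRESHOLD
        for a in expected_artists
    )


print(passes_artist_gate("tidal", "B. Starr", ["B小町"]))                   # False
print(passes_artist_gate("tidal", "ExampleIndexer", ["Kendrick Lamar"]))    # False
print(passes_artist_gate("torrent", "ExampleIndexer", ["Kendrick Lamar"]))  # True
```

The last two calls show why the exemption is needed: with the indexer
name projected into ``artist``, the gate would reject the torrent result
whichever Spotify artist was requested.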
