Build SoundCloud download client (not yet wired into app)

Discord request (Toasti): some tracks (DJ mixes, sets, removed Spotify
content) only live on SoundCloud. Add SoundCloud as an option for the
existing multi-source download dispatch.

This commit only ships the client + tests. Integration into the search
dispatch / settings UI / web_server.py routes is intentionally deferred
to a follow-up PR — the user-requested workflow is build-and-verify
in isolation first, then wire up.

`core/soundcloud_client.py`:
- SoundcloudClient class mirrors the public surface of every other
  download client (TidalDownloadClient, QobuzClient, HiFiClient,
  DeezerDownloadClient): __init__(download_path), set_shutdown_check,
  is_available / is_configured / is_authenticated, async check_connection,
  async search returning (List[TrackResult], List[AlbumResult]),
  async download returning a download_id, _download_thread_worker /
  _download_sync / _update_download_progress, async get_all_downloads /
  get_download_status / cancel_download / clear_all_completed_downloads.
- Underlying lib: yt-dlp (already in requirements.txt as 2026.3.17).
- Anonymous-only — public SoundCloud tracks at the cap quality (typically
  128 kbps MP3, occasionally 256 kbps AAC depending on the upload).
  No FLAC ever; SoundCloud doesn't expose lossless. OAuth tier for
  SoundCloud Go+ is documented in the module header as a future tier.
- Returns standard TrackResult / DownloadStatus dataclasses from
  core.soulseek_client so downstream matching/post-processing stays
  source-agnostic.
- Filename dispatch key encodes track_id + permalink_url + display_name
  so the download worker has everything without re-querying.
- Heuristic "Artist - Title" parser handles SoundCloud uploaders'
  typical title format; falls back to uploader handle as artist when
  the title doesn't have a separator.
- Defensive: search returns empty on bad input, missing yt-dlp, or any
  raised exception. Download sync rejects files under 100KB (preview
  snippets / broken responses) and cleans them up.
- Cooperative cancellation via shutdown_check inside yt-dlp's
  progress_hooks. Cancelled state survives the download thread's
  terminal-state assignment.
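
The dispatch-key roundtrip described above can be sketched as follows. This is a minimal illustration, not the module's code: `DispatchKey`, `encode_dispatch_key` and `decode_dispatch_key` are hypothetical names. Only the `||` separator and the three-field layout come from the client.

```python
from typing import NamedTuple, Optional

SEPARATOR = "||"  # same separator the client uses in TrackResult.filename


class DispatchKey(NamedTuple):
    track_id: str
    permalink_url: str
    display_name: str


def encode_dispatch_key(track_id: str, permalink_url: str, display_name: str) -> str:
    """Pack the three fields into one string (hypothetical helper)."""
    return SEPARATOR.join((track_id, permalink_url, display_name))


def decode_dispatch_key(filename: str) -> Optional[DispatchKey]:
    """Unpack a key; returns None when the id or URL is missing."""
    # maxsplit=2 keeps any "||" inside the display name intact.
    parts = filename.split(SEPARATOR, 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        return None
    # Fall back to the track id when no display name was encoded.
    display_name = parts[2] if len(parts) > 2 else parts[0]
    return DispatchKey(parts[0], parts[1], display_name)
```

Splitting with a maximum of two splits means a display name that itself contains `||` survives the roundtrip, which is why the display name goes last.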

`tests/test_soundcloud_client.py`:
- 37 unit tests with yt-dlp stubbed: search shape correctness, the
  artist/title heuristic, the dispatch-key roundtrip, the download
  state machine (success / failure / shutdown / Cancelled-state
  preservation), the progress emitter (progress capping, time
  remaining), defensive paths (missing yt-dlp, raising yt-dlp,
  malformed entries, empty entries), and the cancel/clear ledger
  operations.
- 2 live integration tests gated behind `-m soundcloud_live` so CI
  doesn't run them by default. Run locally with:
    python -m pytest tests/test_soundcloud_client.py -m soundcloud_live -v
- All 37 unit tests pass; both live tests pass against real SoundCloud.
- Verified end-to-end with a real album download (Kendrick GNX, 12/12
  tracks, 4-7 MB each, completed under 60s per track).
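
For readers unfamiliar with the "yt-dlp stubbed" approach, here is one way such a stub could look. It is a sketch under assumptions: `make_fake_yt_dlp` and `extract_search_entries` are hypothetical names, and the real tests patch the client with `unittest.mock` and `monkeypatch` instead. The filtering logic mirrors what `_extract_search_entries` does with the info dict.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def make_fake_yt_dlp(canned_info: Any) -> SimpleNamespace:
    """Build a stand-in for the yt_dlp module that never touches the network."""

    class _FakeYoutubeDL:
        def __init__(self, opts: Dict[str, Any]) -> None:
            self.opts = opts

        def __enter__(self) -> "_FakeYoutubeDL":
            return self

        def __exit__(self, *exc_info: Any) -> bool:
            return False  # never swallow exceptions

        def extract_info(self, url: str, download: bool = False) -> Any:
            return canned_info

    return SimpleNamespace(YoutubeDL=_FakeYoutubeDL)


def extract_search_entries(yt_dlp_module: Any, search_url: str) -> List[Dict[str, Any]]:
    """Flat-extract search hits, keeping only dict entries."""
    opts = {"quiet": True, "skip_download": True, "extract_flat": True}
    with yt_dlp_module.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(search_url, download=False)
    if not isinstance(info, dict):
        return []
    return [e for e in (info.get("entries") or []) if isinstance(e, dict)]
```

Passing the module in as a parameter keeps the sketch self-contained; the client itself reads the module-level `yt_dlp` name, which is why the real tests swap that attribute.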

`pyproject.toml`:
- Register the `soundcloud_live` pytest marker so the unknown-mark
  warning is suppressed and the live tests can be cleanly gated.

Not changed: web_server.py, settings UI, search dispatch, matching
engine, WHATS_NEW. Integration is the next PR.
pull/481/head
Broque Thomas 3 weeks ago
parent 75ff5eefd8
commit 583c4f1e49

@@ -0,0 +1,624 @@
"""
SoundCloud Download Client
Alternative music download source using yt-dlp's SoundCloud extractor.
This client provides:
- SoundCloud search via the `scsearch` extractor
- Anonymous public-track downloads (no auth required)
- Drop-in replacement compatible with the existing TidalDownloadClient /
QobuzClient / HiFiClient / DeezerDownloadClient interface
The client is intentionally NOT wired into web_server.py, settings UI, or
the unified search dispatch. Build/test in isolation first; integration
ships in a follow-up PR once the client is verified end-to-end.
Quality reality check:
- Anonymous SoundCloud serves 128 kbps MP3 for most public tracks. A few
uploaders flag tracks for 256 kbps AAC streaming via SoundCloud Go+, but
those require an authenticated session; we only fetch the publicly
available transcoding.
- No FLAC. SoundCloud doesn't expose lossless to anyone, ever.
- Many tracks (especially DJ mixes) are >60 minutes long. Downloads can
be large; the integrity check still applies downstream.
"""
import os
import re
import asyncio
import uuid
import time
import threading
from typing import List, Optional, Dict, Any, Tuple, Callable
from pathlib import Path
try:
import yt_dlp
except ImportError:
yt_dlp = None
from utils.logging_config import get_logger
from config.settings import config_manager
# Standard data structures shared across all download clients so downstream
# matching/post-processing stays source-agnostic.
from core.soulseek_client import TrackResult, AlbumResult, DownloadStatus
logger = get_logger("soundcloud_client")
# Quality tiers — SoundCloud anonymous access only really delivers one
# quality, but we keep the structure consistent with other clients so
# UI/settings can reference a familiar shape later.
QUALITY_MAP = {
'standard': {
'label': 'MP3 128kbps',
'extension': 'mp3',
'bitrate': 128,
'codec': 'mp3',
},
}
# Hard limit on yt-dlp result count per search to keep search latency bounded.
DEFAULT_SEARCH_LIMIT = 25
MAX_SEARCH_LIMIT = 50
# Shorthand for `scsearch<N>:<query>` — yt-dlp's SoundCloud search prefix.
# Returns up to N tracks ranked by SoundCloud's own relevance.
_SC_SEARCH_PREFIX = "scsearch"
# Minimum acceptable download size — anything below is almost certainly a
# broken response or a "preview snippet" file. Real SoundCloud audio for
# even a 1-minute track exceeds 100KB at 128kbps.
_MIN_AUDIO_SIZE_BYTES = 100 * 1024
# Filesystem-safe replacement for the platform's reserved characters.
_UNSAFE_FILENAME_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
def _sanitize_filename(name: str) -> str:
"""Replace reserved filesystem characters with underscores."""
cleaned = _UNSAFE_FILENAME_CHARS.sub('_', name)
# Collapse runs of underscores so we don't produce "track______name".
cleaned = re.sub(r'_{2,}', '_', cleaned).strip(' ._')
return cleaned or 'soundcloud_track'
class SoundcloudClient:
"""SoundCloud download client built on yt-dlp's SoundCloud extractor.
Mirrors the public surface of TidalDownloadClient / QobuzClient so the
eventual integration step is a wiring change, not a refactor.
"""
def __init__(self, download_path: Optional[str] = None):
if yt_dlp is None:
logger.warning("yt-dlp not installed — SoundCloud downloads unavailable")
if download_path is None:
download_path = config_manager.get('soulseek.download_path', './downloads')
self.download_path = Path(download_path)
self.download_path.mkdir(parents=True, exist_ok=True)
logger.info(f"SoundCloud client using download path: {self.download_path}")
# Optional shutdown predicate — wired by the runtime to short-circuit
# in-flight downloads when the worker is shutting down.
self.shutdown_check: Optional[Callable[[], bool]] = None
self.active_downloads: Dict[str, Dict[str, Any]] = {}
self._download_lock = threading.Lock()
# ------------------------------------------------------------------
# Lifecycle / availability
# ------------------------------------------------------------------
def set_shutdown_check(self, check_callable: Optional[Callable[[], bool]]) -> None:
self.shutdown_check = check_callable
def is_available(self) -> bool:
"""True when yt-dlp is installed. Anonymous SoundCloud needs no auth."""
return yt_dlp is not None
def is_configured(self) -> bool:
"""True if the client has everything it needs to operate.
Anonymous-only for now: if yt-dlp is present, we're configured.
Future tier-2 OAuth would gate on stored credentials here.
"""
return self.is_available()
def is_authenticated(self) -> bool:
"""Anonymous-only client — always False until OAuth tier ships."""
return False
async def check_connection(self) -> bool:
"""Run a tiny SoundCloud query to verify the network path works."""
if not self.is_available():
return False
try:
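tracks, _albums = await self.search("test", timeout=15)
# search() swallows errors and returns empty lists, so an empty
# result is the only failure signal available here.
return bool(tracks)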
except Exception as exc:
logger.warning(f"SoundCloud connection check failed: {exc}")
return False
# ------------------------------------------------------------------
# Search
# ------------------------------------------------------------------
async def search(
self,
query: str,
timeout: Optional[int] = None,
progress_callback: Optional[Callable] = None,
) -> Tuple[List[TrackResult], List[AlbumResult]]:
"""Search SoundCloud for the given query.
Returns (tracks, albums). SoundCloud has no album concept (only
playlists, which don't map to the album model the rest of SoulSync
expects), so the album list is always empty.
"""
if not self.is_available():
logger.warning("SoundCloud not available for search (yt-dlp missing)")
return ([], [])
if not query or not isinstance(query, str):
logger.warning(f"Invalid SoundCloud search query: {query!r}")
return ([], [])
# SoundCloud or a transient yt-dlp parse can fail; the caller still
# gets an empty list, never a raised exception.
limit = min(MAX_SEARCH_LIMIT, max(1, DEFAULT_SEARCH_LIMIT))
search_url = f"{_SC_SEARCH_PREFIX}{limit}:{query}"
logger.info(f"Searching SoundCloud for: {query} (limit={limit})")
loop = asyncio.get_running_loop()
try:
entries = await loop.run_in_executor(None, self._extract_search_entries, search_url)
except Exception as exc:
logger.error(f"SoundCloud search failed: {exc}")
return ([], [])
if not entries:
logger.info(f"No SoundCloud results for: {query}")
return ([], [])
track_results: List[TrackResult] = []
for entry in entries:
try:
converted = self._sc_to_track_result(entry)
if converted is not None:
track_results.append(converted)
except Exception as exc:
logger.debug(f"Skipping SoundCloud entry conversion error: {exc}")
logger.info(f"Found {len(track_results)} SoundCloud tracks for '{query}'")
return (track_results, [])
def _extract_search_entries(self, search_url: str) -> List[Dict[str, Any]]:
"""Run yt-dlp in flat-extract mode to get a quick list of search hits.
Flat extraction skips per-entry HTTP roundtrips during search, so
results come back in roughly the time of one SoundCloud API call.
Per-entry resolution happens later, at download time.
"""
opts = {
'quiet': True,
'no_warnings': True,
'skip_download': True,
'extract_flat': True,
'noplaylist': False,
}
with yt_dlp.YoutubeDL(opts) as ydl:
info = ydl.extract_info(search_url, download=False)
if not info or not isinstance(info, dict):
return []
entries = info.get('entries') or []
return [e for e in entries if isinstance(e, dict)]
def _sc_to_track_result(self, entry: Dict[str, Any]) -> Optional[TrackResult]:
"""Convert a yt-dlp SoundCloud entry into the standard TrackResult.
Returns None when the entry lacks a usable URL. The worker can't
download it later anyway, so dropping it from search results saves
the user a confused "click → fail" interaction.
"""
url = entry.get('url') or entry.get('webpage_url')
if not url:
return None
# yt-dlp's flat-extract entry has `id`, `title`, `uploader`, and
# sometimes `duration`. Other fields (artist, album) are usually
# only present after a full extraction.
title = (entry.get('title') or '').strip()
uploader = (entry.get('uploader') or entry.get('uploader_id') or '').strip()
# Many SoundCloud titles are formatted "Artist - Title" by the
# uploader. If we don't have a separate artist field, try to peel
# one off the title; fall back to the uploader otherwise.
artist, parsed_title = self._split_artist_from_title(title, uploader)
duration_seconds = entry.get('duration')
duration_ms: Optional[int] = None
if isinstance(duration_seconds, (int, float)) and duration_seconds > 0:
duration_ms = int(duration_seconds * 1000)
sc_track_id = str(entry.get('id') or '')
if not sc_track_id:
# No stable id → can't pass through the filename-based dispatch.
return None
display_name = f"{artist} - {parsed_title}".strip(' -') or parsed_title or sc_track_id
# ``filename`` is the dispatch key downstream code uses to identify
# the download. We cram the SoundCloud URL into it so the download
# worker has everything it needs without re-querying SoundCloud.
filename = f"{sc_track_id}||{url}||{display_name}"
track_result = TrackResult(
username='soundcloud',
filename=filename,
size=0,
bitrate=128, # Anonymous SoundCloud cap
duration=duration_ms,
quality='mp3',
free_upload_slots=999,
upload_speed=999_999,
queue_length=0,
artist=artist or None,
title=parsed_title or None,
album=None,
track_number=None,
_source_metadata={
'source': 'soundcloud',
'track_id': sc_track_id,
'permalink_url': url,
'uploader': uploader or None,
'duration_seconds': duration_seconds,
},
)
return track_result
@staticmethod
def _split_artist_from_title(title: str, uploader: str) -> Tuple[str, str]:
"""Best-effort parse of "Artist - Title" out of a SoundCloud title.
SoundCloud uploaders frequently format their tracks as
``"Artist Name - Track Title"``. When that pattern is present, we
use it. Otherwise the uploader's display name is the artist and
the whole title stays as the title.
This is best-effort: the uploader handle is still kept in
`_source_metadata`, and the matching logic downstream can fall back
to fuzzy comparison if our split was wrong.
"""
if not title:
return (uploader, '')
# Match the FIRST " - " (most common separator). Avoid em-dash etc
# for now; uploaders use plain hyphen 95%+ of the time.
if ' - ' in title:
artist_part, _sep, title_part = title.partition(' - ')
artist_part = artist_part.strip()
title_part = title_part.strip()
# Sanity: very short artist parts (< 2 chars) are usually
# punctuation noise, not real names.
if len(artist_part) >= 2 and title_part:
return (artist_part, title_part)
return (uploader, title)
# ------------------------------------------------------------------
# Download orchestration
# ------------------------------------------------------------------
async def download(self, username: str, filename: str, file_size: int = 0) -> Optional[str]:
"""Kick off a SoundCloud download in a background thread.
Returns the internal download_id used by status/cancel calls,
matching the contract of every other download client.
"""
try:
parts = filename.split('||', 2)
if len(parts) < 2:
logger.error(f"Invalid SoundCloud filename format: {filename}")
return None
sc_track_id = parts[0]
permalink_url = parts[1]
display_name = parts[2] if len(parts) > 2 else sc_track_id
if not sc_track_id or not permalink_url:
logger.error(f"Missing SoundCloud track id or url in: {filename}")
return None
logger.info(f"Starting SoundCloud download: {display_name}")
download_id = str(uuid.uuid4())
with self._download_lock:
self.active_downloads[download_id] = {
'id': download_id,
'filename': filename,
'username': 'soundcloud',
'state': 'Initializing',
'progress': 0.0,
'size': 0,
'transferred': 0,
'speed': 0,
'time_remaining': None,
'track_id': sc_track_id,
'permalink_url': permalink_url,
'display_name': display_name,
'file_path': None,
}
download_thread = threading.Thread(
target=self._download_thread_worker,
args=(download_id, permalink_url, display_name, filename),
daemon=True,
)
download_thread.start()
logger.info(f"SoundCloud download {download_id} started in background")
return download_id
except Exception as exc:
logger.error(f"Failed to start SoundCloud download: {exc}")
import traceback
traceback.print_exc()
return None
def _download_thread_worker(self, download_id: str, permalink_url: str,
display_name: str, original_filename: str) -> None:
"""Background-thread wrapper around `_download_sync`.
Owns the state transitions on `self.active_downloads[...]` so the
sync impl can just return a path / None and not worry about state.
"""
try:
with self._download_lock:
if download_id in self.active_downloads:
self.active_downloads[download_id]['state'] = 'InProgress, Downloading'
file_path = self._download_sync(download_id, permalink_url, display_name)
if file_path:
with self._download_lock:
if download_id in self.active_downloads:
self.active_downloads[download_id]['state'] = 'Completed, Succeeded'
self.active_downloads[download_id]['progress'] = 100.0
self.active_downloads[download_id]['file_path'] = file_path
logger.info(f"SoundCloud download {download_id} completed: {file_path}")
else:
with self._download_lock:
if download_id in self.active_downloads:
# Don't clobber an explicit Cancelled state with Errored.
if self.active_downloads[download_id]['state'] != 'Cancelled':
self.active_downloads[download_id]['state'] = 'Errored'
logger.error(f"SoundCloud download {download_id} failed")
except Exception as exc:
logger.error(f"SoundCloud download thread failed for {download_id}: {exc}")
import traceback
traceback.print_exc()
with self._download_lock:
if download_id in self.active_downloads:
self.active_downloads[download_id]['state'] = 'Errored'
def _download_sync(self, download_id: str, permalink_url: str,
display_name: str) -> Optional[str]:
"""Synchronously download a single SoundCloud track via yt-dlp.
Returns the absolute path to the saved file, or None on failure.
Handles the shutdown_check via a per-progress yt-dlp hook so a
long DJ mix can still be interrupted mid-download.
"""
if not self.is_available():
logger.error("SoundCloud download attempted with yt-dlp unavailable")
return None
safe_name = _sanitize_filename(display_name)
# yt-dlp resolves the actual extension at download time (almost
# always .mp3 for anonymous SoundCloud). The %(ext)s placeholder
# lets it pick.
out_template = str(self.download_path / f"{safe_name}.%(ext)s")
speed_start = time.time()
def _progress_hook(progress: Dict[str, Any]) -> None:
if self.shutdown_check and self.shutdown_check():
# yt-dlp catches DownloadError and treats other exceptions
# as fatal — raise something it'll surface as a clean abort.
raise yt_dlp.utils.DownloadError("Shutdown requested")
status = progress.get('status')
if status == 'downloading':
downloaded = int(progress.get('downloaded_bytes') or 0)
total = int(progress.get('total_bytes') or progress.get('total_bytes_estimate') or 0)
self._update_download_progress(download_id, downloaded, total, speed_start)
elif status == 'finished':
# yt-dlp signals 'finished' once the bytes are on disk; the
# final size is authoritative. Progress stays capped at 99.9%
# here; the outer thread flips to 100% / Completed once we return.
downloaded = int(progress.get('total_bytes') or progress.get('downloaded_bytes') or 0)
self._update_download_progress(download_id, downloaded, downloaded, speed_start)
opts = {
'quiet': True,
'no_warnings': True,
'noplaylist': True,
'outtmpl': out_template,
'progress_hooks': [_progress_hook],
'format': 'bestaudio/best',
# Disable yt-dlp's own retry storm — surface failures fast so
# the worker decides whether to retry from another source.
'retries': 1,
'fragment_retries': 1,
}
try:
with yt_dlp.YoutubeDL(opts) as ydl:
info = ydl.extract_info(permalink_url, download=True)
except Exception as exc:
# Cover yt_dlp.utils.DownloadError + everything else.
logger.warning(f"SoundCloud download failed for '{display_name}': {exc}")
return None
if not isinstance(info, dict):
logger.warning(f"SoundCloud yt-dlp returned no info dict for '{display_name}'")
return None
# yt-dlp's prepare_filename gives us the resolved on-disk path
# honoring outtmpl + the actual extension it picked.
try:
with yt_dlp.YoutubeDL(opts) as ydl:
resolved_path = ydl.prepare_filename(info)
except Exception as exc:
logger.warning(f"Could not resolve final filename for '{display_name}': {exc}")
return None
if not resolved_path or not os.path.exists(resolved_path):
logger.warning(f"SoundCloud download claimed success but file missing: {resolved_path}")
return None
try:
final_size = os.path.getsize(resolved_path)
except OSError:
final_size = 0
if final_size < _MIN_AUDIO_SIZE_BYTES:
logger.warning(
f"SoundCloud download too small ({final_size} bytes) for "
f"'{display_name}' — likely a preview snippet, discarding"
)
try:
os.remove(resolved_path)
except OSError:
pass
return None
logger.info(
f"SoundCloud download complete: {resolved_path} "
f"({final_size / (1024 * 1024):.1f} MB)"
)
return resolved_path
def _update_download_progress(self, download_id: str, downloaded: int,
total: int, speed_start: float) -> None:
"""Push a progress update into the active_downloads ledger.
Mirrors the structure other download clients populate so the
existing /api/downloads endpoint can serialize it without caring
about the source.
"""
with self._download_lock:
if download_id not in self.active_downloads:
return
info = self.active_downloads[download_id]
info['transferred'] = downloaded
info['size'] = total
now = time.time()
elapsed = now - speed_start
info['speed'] = int(downloaded / elapsed) if elapsed > 0 else 0
if total > 0:
progress = (downloaded / total) * 100
# Cap pre-completion progress at 99.9% so the worker thread
# owns the final flip to 100% / Completed.
info['progress'] = round(min(progress, 99.9), 1)
time_remaining: Optional[int] = None
if info['speed'] > 0 and total > 0:
remaining = total - downloaded
if remaining > 0:
time_remaining = int(remaining / info['speed'])
info['time_remaining'] = time_remaining
# ------------------------------------------------------------------
# Status / cancellation
# ------------------------------------------------------------------
async def get_all_downloads(self) -> List[DownloadStatus]:
"""Snapshot every tracked download as DownloadStatus objects."""
out: List[DownloadStatus] = []
with self._download_lock:
for _download_id, info in self.active_downloads.items():
out.append(DownloadStatus(
id=info['id'],
filename=info['filename'],
username=info['username'],
state=info['state'],
progress=info['progress'],
size=info['size'],
transferred=info['transferred'],
speed=info['speed'],
time_remaining=info.get('time_remaining'),
file_path=info.get('file_path'),
))
return out
async def get_download_status(self, download_id: str) -> Optional[DownloadStatus]:
with self._download_lock:
info = self.active_downloads.get(download_id)
if info is None:
return None
return DownloadStatus(
id=info['id'],
filename=info['filename'],
username=info['username'],
state=info['state'],
progress=info['progress'],
size=info['size'],
transferred=info['transferred'],
speed=info['speed'],
time_remaining=info.get('time_remaining'),
file_path=info.get('file_path'),
)
async def cancel_download(self, download_id: str, username: Optional[str] = None,
remove: bool = False) -> bool:
"""Mark a download as cancelled.
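Cancellation is cooperative: we flip the ledger state to Cancelled,
and the worker thread leaves that state alone if the download then
fails. The yt-dlp progress hook only consults `shutdown_check`, so a
transfer that is already running is interrupted only when that
predicate returns True. Passing `remove=True` also drops the entry
from the ledger, mirroring TidalDownloadClient's behavior.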
"""
try:
with self._download_lock:
info = self.active_downloads.get(download_id)
if info is None:
logger.warning(f"SoundCloud download {download_id} not found")
return False
info['state'] = 'Cancelled'
logger.info(f"Marked SoundCloud download {download_id} as cancelled")
if remove:
del self.active_downloads[download_id]
logger.info(f"Removed SoundCloud download {download_id} from queue")
return True
except Exception as exc:
logger.error(f"Failed to cancel SoundCloud download {download_id}: {exc}")
return False
async def clear_all_completed_downloads(self) -> bool:
"""Drop terminal-state entries from the active_downloads ledger."""
try:
with self._download_lock:
terminal_states = {'Completed, Succeeded', 'Cancelled', 'Errored', 'Aborted'}
ids_to_remove = [
did for did, info in self.active_downloads.items()
if info.get('state', '') in terminal_states
]
for did in ids_to_remove:
del self.active_downloads[did]
logger.info(f"Cleared {len(ids_to_remove)} completed SoundCloud downloads")
return True
except Exception as exc:
logger.error(f"Failed to clear SoundCloud downloads: {exc}")
return False
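
The arithmetic inside `_update_download_progress` can be isolated as a pure function, which makes the 99.9% cap and the time-remaining rule easier to reason about. This is a sketch, not part of the module: `ProgressSnapshot` and `compute_progress` are hypothetical names, and returning `None` for an unknown percentage is an assumption (the client leaves the previous value in place instead).

```python
from typing import NamedTuple, Optional


class ProgressSnapshot(NamedTuple):
    speed: int                      # average bytes per second since start
    progress: Optional[float]       # percent, None when total size is unknown
    time_remaining: Optional[int]   # whole seconds, None when not computable


def compute_progress(downloaded: int, total: int, elapsed: float,
                     cap: float = 99.9) -> ProgressSnapshot:
    """Derive speed, capped percentage and ETA from raw byte counters."""
    speed = int(downloaded / elapsed) if elapsed > 0 else 0

    progress: Optional[float] = None
    if total > 0:
        # Cap below 100 so only the worker thread reports completion.
        progress = round(min(downloaded / total * 100, cap), 1)

    time_remaining: Optional[int] = None
    if speed > 0 and total > downloaded:
        time_remaining = int((total - downloaded) / speed)

    return ProgressSnapshot(speed, progress, time_remaining)
```

For example, 500,000 of 1,000,000 bytes after 5 seconds gives a speed of 100,000 B/s, 50.0% progress and 5 seconds remaining, while a fully transferred file still reports 99.9% until the worker marks it Completed.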

@@ -16,3 +16,8 @@ ignore = [
[tool.ruff.lint.per-file-ignores]
# Tests can use assert, magic values, etc.
"tests/**" = ["B", "F"]
[tool.pytest.ini_options]
markers = [
"soundcloud_live: live SoundCloud integration test (network-dependent, run with -m soundcloud_live)",
]

@@ -0,0 +1,680 @@
"""Unit + integration tests for ``core/soundcloud_client.py``.
The unit tests stub out ``yt_dlp`` so they run fast, deterministically,
and offline. They cover: search shape correctness, the artist/title
heuristic, the dispatch-key (``filename``) round trip, the download
state machine (success / failure / shutdown), the progress emitter, and
the cancel/clear ledger operations.
The integration tests are gated behind ``-m soundcloud_live`` so they
don't run in CI by default. Run them locally to verify against real
SoundCloud:
python -m pytest tests/test_soundcloud_client.py -m soundcloud_live -v -s
They hit the public SoundCloud surface, so they require network access
and a working yt-dlp install.
"""
from __future__ import annotations
import asyncio
import os
import threading
import time
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import MagicMock, patch
import pytest
from core import soundcloud_client
from core.soundcloud_client import SoundcloudClient, _sanitize_filename
from core.soulseek_client import AlbumResult, DownloadStatus, TrackResult
# ---------------------------------------------------------------------------
# Module-level helpers
# ---------------------------------------------------------------------------
def test_sanitize_filename_strips_reserved_chars() -> None:
# Reserved chars become underscores; trailing punctuation gets stripped.
assert _sanitize_filename('Track / Name : "Bad" ?') == 'Track _ Name _ _Bad'
# Repeated underscores collapse, leading/trailing underscores trimmed.
assert _sanitize_filename('////track////') == 'track'
# Empty input still returns a usable filename, never an empty string.
assert _sanitize_filename('') == 'soundcloud_track'
def test_split_artist_from_title_uses_dash_separator() -> None:
artist, title = SoundcloudClient._split_artist_from_title(
"Daft Punk - Get Lucky", "officialdaftpunk"
)
assert artist == "Daft Punk"
assert title == "Get Lucky"
def test_split_artist_from_title_falls_back_to_uploader_when_no_dash() -> None:
artist, title = SoundcloudClient._split_artist_from_title(
"Some Mix Title", "uploader_handle"
)
assert artist == "uploader_handle"
assert title == "Some Mix Title"
def test_split_artist_from_title_rejects_too_short_artist_part() -> None:
"""Things like "DJ - Mix" shouldn't get parsed as artist='DJ' / title='Mix'
when a 2-char artist is plausibly noise. Our threshold is >= 2, though,
so "DJ" does qualify; this test pins the boundary with a 1-char artist
part, which is rejected."""
artist, title = SoundcloudClient._split_artist_from_title("a - hello", "uploader")
# 'a' is 1 char → fall through to uploader
assert artist == "uploader"
assert title == "a - hello"
def test_split_artist_from_title_handles_empty_input() -> None:
artist, title = SoundcloudClient._split_artist_from_title("", "fallback")
assert artist == "fallback"
assert title == ""
# ---------------------------------------------------------------------------
# Construction / availability gates
# ---------------------------------------------------------------------------
@pytest.fixture
def tmp_dl(tmp_path: Path) -> Path:
p = tmp_path / "downloads"
p.mkdir()
return p
def test_is_available_when_yt_dlp_installed(tmp_dl: Path) -> None:
client = SoundcloudClient(download_path=str(tmp_dl))
# In our test env yt-dlp is installed (it's a hard dep)
assert client.is_available() is True
assert client.is_configured() is True
def test_is_available_false_when_yt_dlp_missing(tmp_dl: Path, monkeypatch) -> None:
monkeypatch.setattr(soundcloud_client, "yt_dlp", None)
client = SoundcloudClient(download_path=str(tmp_dl))
assert client.is_available() is False
assert client.is_configured() is False
def test_is_authenticated_always_false_until_oauth_ships(tmp_dl: Path) -> None:
"""Anonymous-only client. Pin the contract so a future OAuth tier
has to explicitly flip this."""
client = SoundcloudClient(download_path=str(tmp_dl))
assert client.is_authenticated() is False
def test_download_path_created_on_construction(tmp_path: Path) -> None:
target = tmp_path / "nested" / "deeper" / "downloads"
assert not target.exists()
SoundcloudClient(download_path=str(target))
assert target.exists() and target.is_dir()
def test_set_shutdown_check_assigns_callable(tmp_dl: Path) -> None:
client = SoundcloudClient(download_path=str(tmp_dl))
sentinel = lambda: True # noqa: E731
client.set_shutdown_check(sentinel)
assert client.shutdown_check is sentinel
# ---------------------------------------------------------------------------
# Search
# ---------------------------------------------------------------------------
def _run(coro):
"""Tiny helper — we have async methods to exercise but no async test runner."""
return asyncio.run(coro)
def test_search_returns_empty_when_unavailable(tmp_dl: Path, monkeypatch) -> None:
monkeypatch.setattr(soundcloud_client, "yt_dlp", None)
client = SoundcloudClient(download_path=str(tmp_dl))
tracks, albums = _run(client.search("anything"))
assert tracks == []
assert albums == []
def test_search_returns_empty_for_empty_or_invalid_query(tmp_dl: Path) -> None:
client = SoundcloudClient(download_path=str(tmp_dl))
assert _run(client.search("")) == ([], [])
assert _run(client.search(None)) == ([], []) # type: ignore[arg-type]
assert _run(client.search(42)) == ([], []) # type: ignore[arg-type]
def test_search_converts_yt_dlp_entries_to_track_results(tmp_dl: Path) -> None:
"""Happy-path search: yt-dlp returns a list of entries, the client
converts each into a TrackResult, and the album list stays empty."""
fake_entries = [
{
'id': '12345',
'title': 'Daft Punk - Around the World',
'uploader': 'daftpunkofficial',
'url': 'https://soundcloud.com/daftpunk/around-the-world',
'duration': 425.0,
},
{
'id': '67890',
'title': 'Some DJ Mix Set',
'uploader': 'somedj',
'url': 'https://soundcloud.com/somedj/some-mix',
'duration': 3600.0,
},
]
client = SoundcloudClient(download_path=str(tmp_dl))
with patch.object(client, '_extract_search_entries', return_value=fake_entries):
tracks, albums = _run(client.search("daft punk"))
assert albums == []
assert len(tracks) == 2
# First entry: "Artist - Title" parsing kicked in
t1 = tracks[0]
assert isinstance(t1, TrackResult)
assert t1.username == 'soundcloud'
assert t1.artist == 'Daft Punk'
assert t1.title == 'Around the World'
assert t1.bitrate == 128
assert t1.quality == 'mp3'
assert t1.duration == 425000 # ms
# Filename carries id + URL + display name for downstream dispatch
parts = t1.filename.split('||')
assert parts[0] == '12345'
assert parts[1] == 'https://soundcloud.com/daftpunk/around-the-world'
assert 'Daft Punk' in parts[2]
# Source metadata roundtrips
assert t1._source_metadata['source'] == 'soundcloud'
assert t1._source_metadata['track_id'] == '12345'
assert t1._source_metadata['permalink_url'] == 'https://soundcloud.com/daftpunk/around-the-world'
# Second entry: no " - " in title, fall back to uploader as artist
t2 = tracks[1]
assert t2.artist == 'somedj'
assert t2.title == 'Some DJ Mix Set'
assert t2.duration == 3_600_000


def test_search_skips_entries_without_url(tmp_dl: Path) -> None:
    """No URL → can't download later → drop from results."""
    fake_entries = [
        {'id': '1', 'title': 'has url', 'url': 'https://soundcloud.com/x/y'},
        {'id': '2', 'title': 'no url'},  # gets skipped
        {'id': '', 'title': 'empty id', 'url': 'https://soundcloud.com/x/z'},  # also skipped
    ]
    client = SoundcloudClient(download_path=str(tmp_dl))
    with patch.object(client, '_extract_search_entries', return_value=fake_entries):
        tracks, _ = _run(client.search("any"))
    assert len(tracks) == 1
    assert tracks[0]._source_metadata['track_id'] == '1'


def test_search_handles_yt_dlp_exception(tmp_dl: Path) -> None:
    """yt-dlp can raise on rate limit / network blip — caller still gets
    a clean empty list, never a raised exception."""
    client = SoundcloudClient(download_path=str(tmp_dl))
    with patch.object(client, '_extract_search_entries',
                      side_effect=RuntimeError("network down")):
        tracks, albums = _run(client.search("anything"))
    assert tracks == []
    assert albums == []


def test_search_handles_empty_entries(tmp_dl: Path) -> None:
    client = SoundcloudClient(download_path=str(tmp_dl))
    with patch.object(client, '_extract_search_entries', return_value=[]):
        tracks, _ = _run(client.search("nothing"))
    assert tracks == []


def test_search_handles_malformed_entries_individually(tmp_dl: Path) -> None:
    """One bad entry shouldn't poison the entire result set."""
    fake_entries = [
        {'id': '1', 'title': 'good', 'url': 'https://x/1'},
        # Missing all required fields → conversion returns None → skipped
        {'something': 'weird'},
        {'id': '2', 'title': 'also good', 'url': 'https://x/2'},
    ]
    client = SoundcloudClient(download_path=str(tmp_dl))
    with patch.object(client, '_extract_search_entries', return_value=fake_entries):
        tracks, _ = _run(client.search("any"))
    assert len(tracks) == 2


# ---------------------------------------------------------------------------
# Download orchestration
# ---------------------------------------------------------------------------


def test_download_rejects_invalid_filename_format(tmp_dl: Path) -> None:
    client = SoundcloudClient(download_path=str(tmp_dl))
    # No || separator
    assert _run(client.download('soundcloud', 'broken')) is None


def test_download_starts_thread_and_returns_id(tmp_dl: Path) -> None:
    """Verify the contract: returns a download_id, populates active_downloads,
    spawns a thread that ultimately drives state to terminal."""
    client = SoundcloudClient(download_path=str(tmp_dl))
    completed_path = tmp_dl / "track.mp3"
    completed_path.write_bytes(b"x" * (200 * 1024))  # > MIN_AUDIO_SIZE

    with patch.object(client, '_download_sync', return_value=str(completed_path)):
        download_id = _run(client.download(
            'soundcloud',
            '999||https://soundcloud.com/x/y||Display Name',
            file_size=0,
        ))
        assert download_id is not None

        # Thread runs async; wait briefly for terminal state
        deadline = time.time() + 2
        while time.time() < deadline:
            with client._download_lock:
                state = client.active_downloads[download_id]['state']
            if state == 'Completed, Succeeded':
                break
            time.sleep(0.05)

    info = client.active_downloads[download_id]
    assert info['state'] == 'Completed, Succeeded'
    assert info['progress'] == 100.0
    assert info['file_path'] == str(completed_path)
    assert info['username'] == 'soundcloud'


def test_download_thread_marks_failed_when_sync_returns_none(tmp_dl: Path) -> None:
    client = SoundcloudClient(download_path=str(tmp_dl))
    with patch.object(client, '_download_sync', return_value=None):
        download_id = _run(client.download(
            'soundcloud',
            '1||https://soundcloud.com/x/y||name',
        ))
        deadline = time.time() + 2
        while time.time() < deadline:
            with client._download_lock:
                state = client.active_downloads[download_id]['state']
            if state == 'Errored':
                break
            time.sleep(0.05)

    assert client.active_downloads[download_id]['state'] == 'Errored'


def test_download_thread_does_not_clobber_cancelled_state(tmp_dl: Path) -> None:
    """If a user cancels mid-download and the sync function then returns
    None, the thread should NOT overwrite the explicit Cancelled state
    with a generic Errored state."""
    client = SoundcloudClient(download_path=str(tmp_dl))

    def _slow_sync(download_id, *_):
        # Simulate cancellation racing a None return
        time.sleep(0.05)
        with client._download_lock:
            client.active_downloads[download_id]['state'] = 'Cancelled'
        return None

    with patch.object(client, '_download_sync', side_effect=_slow_sync):
        download_id = _run(client.download('soundcloud', '1||u||n'))
        deadline = time.time() + 2
        while time.time() < deadline:
            with client._download_lock:
                state = client.active_downloads[download_id]['state']
            if state == 'Cancelled':
                break
            time.sleep(0.05)

    assert client.active_downloads[download_id]['state'] == 'Cancelled'


# ---------------------------------------------------------------------------
# yt-dlp interaction (download_sync)
# ---------------------------------------------------------------------------


class _FakeYDL:
    """Minimal stand-in for yt_dlp.YoutubeDL used to exercise download_sync."""

    def __init__(self, opts):
        self.opts = opts
        self.last_url = None
        self.fake_info = {'id': 'abc', 'title': 'fake', 'ext': 'mp3'}

    def __enter__(self):
        return self

    def __exit__(self, *args):
        return False

    def extract_info(self, url, download=False):
        self.last_url = url
        if download:
            # Write a fake audio file to the resolved path
            resolved = self.prepare_filename(self.fake_info)
            Path(resolved).parent.mkdir(parents=True, exist_ok=True)
            Path(resolved).write_bytes(b"y" * (200 * 1024))
        return self.fake_info

    def prepare_filename(self, info):
        # Simulate yt-dlp's outtmpl substitution
        template = self.opts['outtmpl']
        return template.replace('%(ext)s', info.get('ext', 'mp3'))


def test_download_sync_writes_file_and_returns_path(tmp_dl: Path, monkeypatch) -> None:
    fake_yt_dlp = SimpleNamespace(
        YoutubeDL=_FakeYDL,
        utils=SimpleNamespace(DownloadError=Exception),
    )
    monkeypatch.setattr(soundcloud_client, "yt_dlp", fake_yt_dlp)

    client = SoundcloudClient(download_path=str(tmp_dl))
    with client._download_lock:
        client.active_downloads['dl1'] = {
            'id': 'dl1', 'filename': '', 'username': 'soundcloud',
            'state': 'Initializing', 'progress': 0.0, 'size': 0,
            'transferred': 0, 'speed': 0, 'time_remaining': None,
            'track_id': 'abc', 'permalink_url': 'u', 'display_name': 'My Track',
            'file_path': None,
        }

    result = client._download_sync('dl1', 'https://soundcloud.com/x/y', 'My Track')
    assert result is not None
    assert os.path.exists(result)
    assert os.path.getsize(result) > 100 * 1024


def test_download_sync_rejects_too_small_file(tmp_dl: Path, monkeypatch) -> None:
    """Files under MIN_AUDIO_SIZE_BYTES indicate yt-dlp got a preview
    snippet or junk response; reject and clean up."""

    class _TinyYDL(_FakeYDL):
        def __init__(self, opts):
            super().__init__(opts)
            self.fake_info = {'id': 'tiny', 'title': 'tiny', 'ext': 'mp3'}

        def extract_info(self, url, download=False):
            self.last_url = url
            if download:
                resolved = self.prepare_filename(self.fake_info)
                Path(resolved).parent.mkdir(parents=True, exist_ok=True)
                Path(resolved).write_bytes(b"y" * 500)  # Too small
            return self.fake_info

    fake_yt_dlp = SimpleNamespace(
        YoutubeDL=_TinyYDL,
        utils=SimpleNamespace(DownloadError=Exception),
    )
    monkeypatch.setattr(soundcloud_client, "yt_dlp", fake_yt_dlp)

    client = SoundcloudClient(download_path=str(tmp_dl))
    with client._download_lock:
        client.active_downloads['dl2'] = {
            'id': 'dl2', 'filename': '', 'username': 'soundcloud',
            'state': 'Initializing', 'progress': 0.0, 'size': 0,
            'transferred': 0, 'speed': 0, 'time_remaining': None,
            'track_id': 'tiny', 'permalink_url': 'u', 'display_name': 'Tiny',
            'file_path': None,
        }

    result = client._download_sync('dl2', 'https://soundcloud.com/x/y', 'Tiny')
    assert result is None
    # File got cleaned up after rejection
    target = tmp_dl / "Tiny.mp3"
    assert not target.exists()


def test_download_sync_handles_yt_dlp_raising(tmp_dl: Path, monkeypatch) -> None:
    """yt-dlp can raise DownloadError or any other exception. download_sync
    should surface a clean None instead of propagating."""

    class _BoomYDL:
        def __init__(self, opts):
            pass

        def __enter__(self):
            return self

        def __exit__(self, *args):
            return False

        def extract_info(self, *a, **kw):
            raise RuntimeError("boom")

        def prepare_filename(self, info):
            return ""

    fake_yt_dlp = SimpleNamespace(
        YoutubeDL=_BoomYDL,
        utils=SimpleNamespace(DownloadError=Exception),
    )
    monkeypatch.setattr(soundcloud_client, "yt_dlp", fake_yt_dlp)

    client = SoundcloudClient(download_path=str(tmp_dl))
    with client._download_lock:
        client.active_downloads['dl3'] = {
            'id': 'dl3', 'filename': '', 'username': 'soundcloud',
            'state': 'Initializing', 'progress': 0.0, 'size': 0,
            'transferred': 0, 'speed': 0, 'time_remaining': None,
        }

    assert client._download_sync('dl3', 'https://soundcloud.com/x/y', 'Boom') is None


def test_download_sync_returns_none_when_yt_dlp_unavailable(tmp_dl: Path, monkeypatch) -> None:
    monkeypatch.setattr(soundcloud_client, "yt_dlp", None)
    client = SoundcloudClient(download_path=str(tmp_dl))
    assert client._download_sync('any', 'u', 'name') is None


# ---------------------------------------------------------------------------
# Progress emitter
# ---------------------------------------------------------------------------


def test_update_download_progress_populates_ledger(tmp_dl: Path) -> None:
    client = SoundcloudClient(download_path=str(tmp_dl))
    with client._download_lock:
        client.active_downloads['p1'] = {
            'id': 'p1', 'filename': '', 'username': 'soundcloud',
            'state': 'InProgress, Downloading', 'progress': 0.0, 'size': 0,
            'transferred': 0, 'speed': 0, 'time_remaining': None,
        }

    speed_start = time.time() - 1.0  # 1 second ago
    client._update_download_progress('p1', downloaded=512_000, total=1_024_000,
                                     speed_start=speed_start)

    info = client.active_downloads['p1']
    assert info['transferred'] == 512_000
    assert info['size'] == 1_024_000
    # 50% complete, capped below 100
    assert 49.0 <= info['progress'] <= 51.0
    # Speed roughly 512KB/s
    assert info['speed'] > 0
    # Time remaining should be roughly 1 second
    assert info['time_remaining'] is not None
    assert 0 < info['time_remaining'] < 5


def test_update_download_progress_caps_at_99_9(tmp_dl: Path) -> None:
    client = SoundcloudClient(download_path=str(tmp_dl))
    with client._download_lock:
        client.active_downloads['p2'] = {
            'id': 'p2', 'filename': '', 'username': 'soundcloud',
            'state': 'InProgress, Downloading', 'progress': 0.0, 'size': 0,
            'transferred': 0, 'speed': 0, 'time_remaining': None,
        }

    client._update_download_progress('p2', downloaded=1_000_000,
                                     total=1_000_000, speed_start=time.time() - 1)
    assert client.active_downloads['p2']['progress'] == 99.9


def test_update_download_progress_silently_skips_unknown_id(tmp_dl: Path) -> None:
    """No-op if the download id isn't tracked — defensive against late hooks."""
    client = SoundcloudClient(download_path=str(tmp_dl))
    # Should not raise
    client._update_download_progress('does_not_exist', 100, 1000, time.time())


# ---------------------------------------------------------------------------
# Status / cancel / clear
# ---------------------------------------------------------------------------


def test_get_all_downloads_returns_status_objects(tmp_dl: Path) -> None:
    client = SoundcloudClient(download_path=str(tmp_dl))
    with client._download_lock:
        client.active_downloads['s1'] = {
            'id': 's1', 'filename': 'f', 'username': 'soundcloud',
            'state': 'InProgress, Downloading', 'progress': 33.3, 'size': 1000,
            'transferred': 333, 'speed': 100, 'time_remaining': 7,
            'file_path': None,
        }

    out = _run(client.get_all_downloads())
    assert len(out) == 1
    assert isinstance(out[0], DownloadStatus)
    assert out[0].id == 's1'
    assert out[0].progress == 33.3


def test_get_download_status_returns_none_for_unknown(tmp_dl: Path) -> None:
    client = SoundcloudClient(download_path=str(tmp_dl))
    assert _run(client.get_download_status('nope')) is None


def test_cancel_download_marks_state(tmp_dl: Path) -> None:
    client = SoundcloudClient(download_path=str(tmp_dl))
    with client._download_lock:
        client.active_downloads['c1'] = {
            'id': 'c1', 'filename': '', 'username': 'soundcloud',
            'state': 'InProgress, Downloading', 'progress': 50.0, 'size': 0,
            'transferred': 0, 'speed': 0, 'time_remaining': None,
        }

    assert _run(client.cancel_download('c1')) is True
    assert client.active_downloads['c1']['state'] == 'Cancelled'


def test_cancel_download_with_remove_drops_entry(tmp_dl: Path) -> None:
    client = SoundcloudClient(download_path=str(tmp_dl))
    with client._download_lock:
        client.active_downloads['c2'] = {
            'id': 'c2', 'filename': '', 'username': 'soundcloud',
            'state': 'InProgress, Downloading', 'progress': 0.0, 'size': 0,
            'transferred': 0, 'speed': 0, 'time_remaining': None,
        }

    assert _run(client.cancel_download('c2', remove=True)) is True
    assert 'c2' not in client.active_downloads


def test_cancel_download_returns_false_for_unknown(tmp_dl: Path) -> None:
    client = SoundcloudClient(download_path=str(tmp_dl))
    assert _run(client.cancel_download('not_real')) is False


def test_clear_completed_drops_terminal_entries_only(tmp_dl: Path) -> None:
    """Terminal states get cleared; in-flight downloads survive."""
    client = SoundcloudClient(download_path=str(tmp_dl))
    base = {'filename': '', 'username': 'soundcloud', 'progress': 0.0,
            'size': 0, 'transferred': 0, 'speed': 0, 'time_remaining': None}
    with client._download_lock:
        client.active_downloads['done'] = {**base, 'id': 'done', 'state': 'Completed, Succeeded'}
        client.active_downloads['err'] = {**base, 'id': 'err', 'state': 'Errored'}
        client.active_downloads['cnc'] = {**base, 'id': 'cnc', 'state': 'Cancelled'}
        client.active_downloads['live'] = {**base, 'id': 'live', 'state': 'InProgress, Downloading'}

    assert _run(client.clear_all_completed_downloads()) is True
    assert 'done' not in client.active_downloads
    assert 'err' not in client.active_downloads
    assert 'cnc' not in client.active_downloads
    assert 'live' in client.active_downloads


# ---------------------------------------------------------------------------
# Connection check
# ---------------------------------------------------------------------------


def test_check_connection_returns_false_when_unavailable(tmp_dl: Path, monkeypatch) -> None:
    monkeypatch.setattr(soundcloud_client, "yt_dlp", None)
    client = SoundcloudClient(download_path=str(tmp_dl))
    assert _run(client.check_connection()) is False


def test_check_connection_returns_true_on_successful_search(tmp_dl: Path) -> None:
    client = SoundcloudClient(download_path=str(tmp_dl))

    async def _fake_search(*_a, **_kw):
        return ([MagicMock()], [])

    with patch.object(client, 'search', side_effect=_fake_search):
        assert _run(client.check_connection()) is True


def test_check_connection_returns_false_when_search_raises(tmp_dl: Path) -> None:
    """Connection check shouldn't propagate the underlying exception."""
    client = SoundcloudClient(download_path=str(tmp_dl))

    async def _boom(*_a, **_kw):
        raise RuntimeError("network down")

    with patch.object(client, 'search', side_effect=_boom):
        assert _run(client.check_connection()) is False


# ---------------------------------------------------------------------------
# Live integration tests (gated)
# ---------------------------------------------------------------------------
# Run with: python -m pytest tests/test_soundcloud_client.py -m soundcloud_live -v -s
# These hit real SoundCloud — network required, slow, and skip in default CI.

pytestmark_live = pytest.mark.soundcloud_live


@pytestmark_live
def test_live_search_returns_real_results(tmp_dl: Path) -> None:
    """Real query against SoundCloud's public search."""
    client = SoundcloudClient(download_path=str(tmp_dl))
    tracks, albums = _run(client.search("daft punk around the world"))
    assert albums == []
    assert len(tracks) > 0
    # First result should at least have a title and a usable filename
    t = tracks[0]
    assert t.title or t.artist
    assert '||' in t.filename
    parts = t.filename.split('||')
    assert parts[0]  # track id
    assert parts[1].startswith('https://')


@pytestmark_live
def test_live_download_a_known_public_track(tmp_dl: Path) -> None:
    """Download a real public SoundCloud track end-to-end. This is the
    headline smoke test: if this passes, the client genuinely works.

    The track is picked via a search-then-download flow (first hit for a
    broad query) rather than a pinned URL, which avoids hammering any
    specific creator's stats. If the query ever stops returning results,
    swap it for another that reliably surfaces public free tracks.
    """
    client = SoundcloudClient(download_path=str(tmp_dl))
    # Search-then-download flow: pick the first hit for a popular query
    tracks, _ = _run(client.search("creative commons electronic music"))
    assert tracks, "Live search returned no results"
    first = tracks[0]

    download_id = _run(client.download(first.username, first.filename))
    assert download_id is not None

    # Wait up to 60s for completion
    deadline = time.time() + 60
    final_state = None
    final_path = None
    while time.time() < deadline:
        info = client.active_downloads.get(download_id, {})
        final_state = info.get('state')
        final_path = info.get('file_path')
        if final_state in {'Completed, Succeeded', 'Errored', 'Cancelled'}:
            break
        time.sleep(0.5)

    assert final_state == 'Completed, Succeeded', f"Live download didn't complete: {final_state}"
    assert final_path is not None
    assert os.path.exists(final_path)
    assert os.path.getsize(final_path) > 100 * 1024