- Verified end-to-end: fetch_public_playlist_full pulled all 236 tracks of the
test playlist via SpotipyFree (the library handles the client-auth that 429'd
the raw approach). Name + tracks correct.
- requirements.txt: declare spotipyFree>=1.1.2 as a normal pip dependency (like
spotDL, also MIT — aggregation, not vendored) + websockets (a transitive dep
SpotipyFree/spotapi needs that pip doesn't pull automatically). Code still
soft-imports + falls back to embed, so it's never a hard runtime requirement.
- meta fetch uses limit=1 (name/owner only) so we don't pull the whole list
twice. 9 tests green.
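
A minimal sketch of the soft-import-plus-fallback behaviour described above. The function names and signatures here (fetch_full, fetch_with_fallback) are hypothetical stand-ins, not the project's actual code, and the real client calls are deliberately omitted:

```python
from typing import Callable, Dict, List

Track = Dict[str, str]


def fetch_full(playlist_id: str) -> List[Track]:
    """Hypothetical stand-in for the full fetch: soft-imports the library."""
    try:
        # Imported lazily so a missing package is not a hard runtime requirement.
        import SpotipyFree  # noqa: F401  (module name taken from the notes above)
    except ImportError as exc:
        raise RuntimeError("SpotipyFree is not installed") from exc
    # The real client calls are left out of this sketch.
    raise NotImplementedError("client calls omitted in this sketch")


def fetch_with_fallback(
    playlist_id: str,
    full: Callable[[str], List[Track]],
    embed: Callable[[str], List[Track]],
) -> List[Track]:
    """Try the full fetch; on any failure return the embed scraper's result."""
    try:
        return full(playlist_id)
    except Exception:
        return embed(playlist_id)
```

Passing both fetchers in as arguments is only for illustration; it keeps the fallback logic testable without the library or the network.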
The in-house anonymous-token path is blocked by Spotify (it returns 429 without
the web player's rotating client-auth). Switch the full fetch to SpotipyFree,
the maintained no-creds spotipy drop-in that spotDL uses, which keeps up with
that client-auth machinery.
- core/spotify_public_api.fetch_public_playlist_full now uses a SpotipyFree
client (playlist + playlist_items + next), normalising the spotipy-shaped
items to the embed scraper's shape. Injectable client_factory keeps it
unit-testable without the library or network. Dropped the dead in-house
token/pagination code.
- Licensing: SpotipyFree is GPL-3.0, so it is NOT bundled/required (SoulSync is
MIT). Optional, user-installed: the import is soft, and on ImportError (or any
failure) fetch_spotify_public falls back to the embed scraper (~100). So the
shipped project stays cleanly MIT and the link path never regresses.
- requirements.txt: documents it as a commented optional extra
(pip install SpotipyFree) with the GPL/MIT rationale.
- 9 tests: normalisation, pagination past 100, library missing -> raises (which
triggers the fallback), and the embed-fallback orchestration.
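
The playlist_items + next loop with an injectable client_factory could look roughly like the sketch below. It assumes the client mirrors spotipy's playlist_items(...) and next(...) methods, which is not yet confirmed for SpotipyFree. The normalised output shape and the FakeClient are hypothetical and exist only to make the example runnable offline:

```python
from typing import Any, Callable, Dict, List, Optional


def normalise_item(item: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Map a spotipy-shaped playlist item to a flat dict (shape is assumed)."""
    track = item.get("track") or {}
    if not track.get("name"):
        return None  # skip removed or unavailable entries
    return {
        "name": track["name"],
        "artists": [a.get("name", "") for a in track.get("artists", [])],
    }


def fetch_all_tracks(
    playlist_id: str, client_factory: Callable[[], Any]
) -> List[Dict[str, Any]]:
    """Follow pages until the client reports no further page."""
    client = client_factory()
    page = client.playlist_items(playlist_id, limit=100)
    tracks: List[Dict[str, Any]] = []
    while page:
        for item in page.get("items", []):
            norm = normalise_item(item)
            if norm is not None:
                tracks.append(norm)
        page = client.next(page) if page.get("next") else None
    return tracks


class FakeClient:
    """Test double serving pages of 100 items, like a spotipy-style client."""

    def __init__(self, total: int, page_size: int = 100) -> None:
        self._items = [
            {"track": {"name": f"t{i}", "artists": [{"name": "a"}]}}
            for i in range(total)
        ]
        self._size = page_size

    def _page(self, offset: int) -> Dict[str, Any]:
        end = offset + self._size
        nxt = end if end < len(self._items) else None
        return {"items": self._items[offset:end], "next": nxt}

    def playlist_items(self, playlist_id: str, limit: int = 100) -> Dict[str, Any]:
        return self._page(0)

    def next(self, page: Dict[str, Any]) -> Dict[str, Any]:
        return self._page(page["next"])
```

Injecting the factory instead of importing the library at module level is what lets the pagination tests run without SpotipyFree installed.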
Needs a live click-through with SpotipyFree installed to confirm the exact
class/method names match (SpotipyFree.Spotify / playlist / playlist_items).
The full-fetch's logs used a bare module logger that app.log doesn't capture,
so we couldn't see whether the API path succeeded or why it fell back. Route
them to 'soulsync.spotify_public' and log whether a token was found, whether
the embed parsed, the API HTTP status on any non-200 response, and the
pagination result. Lets us see the exact failure (e.g. 401 vs 429) on the next
link-tab test.
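
As a rough illustration of the logging change, the sketch below uses a named logger. It assumes app.log is fed by a handler on a parent logger such as 'soulsync', so records from 'soulsync.spotify_public' propagate to it. The helper name and message wording are hypothetical:

```python
import logging

# Named logger; propagation to the app's handlers is an assumption here.
log = logging.getLogger("soulsync.spotify_public")


def log_fetch_outcome(
    token_found: bool, embed_parsed: bool, api_status: int, track_count: int
) -> None:
    """Record the four facts needed to diagnose a fallback."""
    log.info("token found: %s, embed parsed: %s", token_found, embed_parsed)
    if api_status != 200:
        # Distinguishes e.g. 401 (token rejected) from 429 (rate-limited).
        log.warning("API returned HTTP %s", api_status)
    log.info("pagination result: %d tracks", track_count)
```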
Live debugging the 'shows 100' report:
- The full playlist page no longer embeds an accessToken, and get_access_token
/ server-time now return 403/404. The EMBED page
(open.spotify.com/embed/playlist/{id}) still ships a usable anonymous token. We
were fetching the wrong page, so no token was found, the full fetch raised, and
we fell back to the embed scraper (100 tracks). It now reads the embed page for
the token.
- Confirmed live: token extraction + embed parse work; the token is accepted by
the Web API (429 rate-limit, not 401). Could not show >100 from here because
the test IP got rate-limited from probing; needs a clean-IP click-through.
While in there, made it more robust against the rate-limiting that's clearly in
play:
- Refactored scrape_spotify_embed -> reusable parse_embed_html.
- fetch_public_playlist_full now does ONE embed fetch for token + name + first
page (no separate metadata call = fewer requests = less 429 surface), then
paginates the API. If the API is unavailable/rate-limited, it keeps the embed
page's tracks (<=100) instead of raising — so the result is always >= today's
behaviour, never worse.
- 12 tests incl. the new API-fails-but-embed-tracks-survive path.
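
The "never worse than the embed" behaviour can be sketched as below. This is an illustration under stated assumptions: fetch_api_page is a hypothetical callable returning one page of tracks for a given offset, API pagination is assumed to start after the embed page's tracks, and keeping any pages already fetched when a later page fails is this sketch's choice rather than something confirmed above:

```python
from typing import Callable, Dict, List

Track = Dict[str, str]


def fetch_full_or_embed(
    embed_tracks: List[Track],
    fetch_api_page: Callable[[int], List[Track]],
) -> List[Track]:
    """Start from the embed page's tracks and extend them via the API."""
    tracks = list(embed_tracks)
    offset = len(tracks)
    try:
        while True:
            page = fetch_api_page(offset)
            if not page:
                break  # no more tracks to fetch
            tracks.extend(page)
            offset += len(page)
    except Exception:
        # API unavailable or rate-limited: keep what we already have.
        pass
    return tracks
```

Because the embed tracks are collected before the first API call, a failure at any point leaves at least the embed result in hand.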
Caveat unchanged: rides Spotify's undocumented embed-page token; degrades to the
embed fallback, never crashes.
The no-auth 'add by link' path scrapes Spotify's embed widget, which only ever
contains ~100 tracks and can't paginate — so big public playlists got
truncated. This adds an in-house anonymous fetch that pulls the FULL list:
- core/spotify_public_api.py: reads the anonymous web-player accessToken Spotify
already embeds in its own open.spotify.com page HTML (no app credentials, and
no rotating TOTP secret for us to maintain), then paginates
/v1/playlists/{id}/tracks 100 at a time until the whole playlist is pulled.
Returns the embed scraper's exact shape. Pure helpers + injected http_get so
it's unit-testable without the network.
- core/spotify_public_scraper.fetch_spotify_public(): tries the full fetch for
playlists; on ANY failure (or for albums) falls back to scrape_spotify_embed.
Worst case == today's behaviour, so the link path can't regress.
- web_server: the link-tab endpoint and the authed flow's last-resort scrape
now both go through fetch_spotify_public.
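
A minimal sketch of the token extraction and pagination described in the list above. The regex assumes the token appears as a JSON "accessToken" field in the page HTML, and the http_get signature (URL and headers in, status and parsed JSON out) is hypothetical. The items/next response fields follow the Spotify Web API's paging object:

```python
import re
from typing import Any, Callable, Dict, List, Optional, Tuple

# Assumed layout: the token sits in a JSON blob as "accessToken":"...".
TOKEN_RE = re.compile(r'"accessToken"\s*:\s*"([^"]+)"')

# Hypothetical injected transport: (url, headers) -> (status, parsed JSON).
HttpGet = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, Any]]]


def extract_access_token(html: str) -> Optional[str]:
    """Return the embedded token, or None if the page has none."""
    match = TOKEN_RE.search(html)
    return match.group(1) if match else None


def paginate_playlist(
    playlist_id: str, token: str, http_get: HttpGet
) -> List[Dict[str, Any]]:
    """Pull 100 items at a time, following 'next' until it is null."""
    url: Optional[str] = (
        f"https://api.spotify.com/v1/playlists/{playlist_id}/tracks"
        "?limit=100&offset=0"
    )
    headers = {"Authorization": f"Bearer {token}"}
    items: List[Dict[str, Any]] = []
    while url:
        status, body = http_get(url, headers)
        if status != 200:
            # Raising lets the caller fall back to the embed scraper.
            raise RuntimeError(f"Spotify API returned HTTP {status}")
        items.extend(body.get("items", []))
        url = body.get("next")
    return items
```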
Scoped entirely to the spotify_public_* (no-auth) path — the authenticated
playlist sync is untouched. 11 tests (token extraction, normalisation,
pagination past 100, and the embed-fallback orchestration).
Caveat: rides Spotify's undocumented page-embedded token — expected to break
when they change their page; it degrades to the embed fallback, never crashes.
Needs a live click-through to confirm the token path works end to end (can't
hit Spotify from the test env).