Four selection-quality fixes on the SoulSync-made discover playlists.
None change public method signatures; all are tightenings on what's
already there.
(1) Diversity for Hidden Gems + Discovery Shuffle
Both used to be `RANDOM() LIMIT N` with no diversity. Could return
50 tracks from one artist or 20 from one album if the discovery
pool happened to be skewed. Both now over-fetch 3x and run the
existing `_apply_diversity_filter`:
- Hidden Gems: max 2 per album, 3 per artist
- Discovery Shuffle: max 2 per album, 2 per artist (tighter — shuffle
should feel maximally varied)
(2) Source-aware popularity thresholds
`popularity >= 60` for "Popular Picks" and `popularity < 40` for
"Hidden Gems" was Spotify-shaped (0-100 scale). Deezer writes its
`rank` value into that column (often six-digit integers); iTunes
writes nothing meaningful. For Deezer-primary users:
- Popular Picks pulled essentially everything (rank >= 60 = all)
- Hidden Gems pulled essentially nothing (rank < 40 = none)
New `_get_popularity_thresholds(source)` helper returns per-source
values:
- Spotify: (60, 40) — the existing 0-100 scale
- Deezer: (500_000, 100_000) — ballpark from real rank values
- iTunes / unknown: (None, None) — skip the popularity filter
entirely, fall back to random + diversity
`get_popular_picks` and `get_hidden_gems` now consult the helper.
When threshold is None they skip the popularity SQL filter. Diversity
+ ID gate still apply.
(3) Push genre keyword filter into SQL
`get_genre_playlist` used to fetch `limit=1_000_000` rows into Python
then run a substring keyword filter on `artist_genres`. Bad on big
discovery pools.
Now the keyword OR chain is generated as SQL placeholders:
AND (artist_genres LIKE ? OR artist_genres LIKE ? OR ...)
Each placeholder gets `f'%{keyword.lower()}%'` via `extra_params`.
`fetch_limit` drops back to `limit * 10`. `_genre_matches` Python
helper deleted (only intra-file caller; verified via grep).
Parent-genre expansion via `GENRE_MAPPING` preserved — keywords list
feeds the LIKE chain unchanged.
(4) Filter out tracks already in library
Discovery pool can include tracks the user already owns. Hidden Gems
/ Shuffle / Popular Picks shouldn't surface those.
`_select_discovery_tracks` gained `exclude_owned: bool = True`
parameter. When True, adds a correlated NOT EXISTS subquery against
the `tracks` table covering all 3 source IDs:
AND NOT EXISTS (
SELECT 1 FROM tracks t WHERE
(t.spotify_track_id IS NOT NULL AND t.spotify_track_id = discovery_pool.spotify_track_id)
OR (t.itunes_track_id IS NOT NULL AND t.itunes_track_id = discovery_pool.itunes_track_id)
OR (t.deezer_id IS NOT NULL AND t.deezer_id = discovery_pool.deezer_track_id)
)
Note column-name asymmetry: tracks.deezer_id vs
discovery_pool.deezer_track_id. Inline comment marks the trap. All
5 public discovery methods automatically benefit (default True).
Seasonal Playlist doesn't go through the helper so it's unaffected
(curated content, dedup is wrong intent there).
Tests
12 new tests in `tests/test_personalized_playlists_id_gate.py` (27
total in the file):
- Hidden Gems + Discovery Shuffle apply diversity (cap proven by
inserting 10 same-artist + same-album rows and asserting return
count ≤ per-album cap)
- Popularity thresholds: Spotify (60, 40), Deezer larger scale,
iTunes None / None
- Popular Picks skips threshold filter when None
- Genre playlist pushes filter to SQL (parent + child genre expansion)
- Owned-track exclusion: filtered when match, kept when no match,
opt-out flag works
- Deezer column-name asymmetry pinned (regression footgun)
Test fixture re-added the minimal `tracks` table (4 columns: id,
spotify_track_id, itunes_track_id, deezer_id) — only what the new
NOT EXISTS subquery needs to join. Plus `insert_library_track`
helper.
Verification
- 27/27 in this test file pass (15 prior + 12 new)
- 2232/2232 full suite green
- ruff clean
LOC delta:
- core/personalized_playlists.py: 1030 → 1101 (+71)
- tests/test_personalized_playlists_id_gate.py: 352 → 616 (+264)