Discogs uses two disambiguation conventions for duplicate artist names:
- legacy `(N)` numeric suffix: "Bullet (2)", "Madonna (3)"
- newer `*` asterisk suffix: "John Smith*", "Foo*"
Both were leaking through to the UI on artist search and album search,
and worse — through the import path into folder names on disk
(reported: importing yielded folders literally named `Foo*`).
The pre-existing cleanup only handled `(N)`, and only at two sites:
`get_user_collection` (line 469) and one path inside
`extract_track_from_release` (line 448: `re.sub(r'\s*\(\d+\)$', '',
artist_name)`). Every other surface (artist search, album search,
album-track lookups, get_artist_albums feature matching) returned the
raw Discogs string.
Centralized into `_clean_discogs_artist_name(name)` at module top,
with regex covering both suffixes including repeated forms (`Baz**`,
`Foo (3)*`). Applied at seven sites:
- `Artist.from_discogs_artist` (artist search)
- `Album.from_discogs_release` (album search — three fallbacks: array,
string, title-split)
- `Track.from_discogs_track` (track lookup — track-level + release-level
fallback)
- `extract_track_from_release` (replaces the inline `(N)`-only re.sub)
- `get_user_collection` (existing site, now also strips `*`)
- `get_artist_albums` (artist_name used for primary-vs-feature matching;
cleaning prevents `Beyoncé*` from failing equality vs `Beyoncé`)
- `get_album` (artists_list + per-track artists in the tracklist projection)
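For reference, a minimal sketch of what a helper covering both suffix conventions could look like. The pattern, the function name without the leading underscore, and the choice to return an empty string for None are assumptions for illustration, not necessarily what `_clean_discogs_artist_name` does.

```python
import re

# Hypothetical pattern: one or more trailing "(N)" or "*" markers, in any
# order, with optional whitespace between them.
_DISAMBIGUATION_SUFFIX = re.compile(r"(?:\s*(?:\(\d+\)|\*))+\s*$")


def clean_discogs_artist_name(name):
    """Strip trailing Discogs disambiguation markers from an artist name."""
    if not name:
        return ""  # assumption: None and "" both normalize to an empty string
    return _DISAMBIGUATION_SUFFIX.sub("", name).strip()
```

Only trailing markers are removed, so a name with a parenthesized number in the middle is left alone.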
Tests:
- New `test_clean_discogs_artist_name` parametrized over 14 cases
covering `(N)`, `*`, repeated `**`, combined `(N) *`, whitespace
handling, empty/None defensive returns.
- New `test_get_user_collection_strips_discogs_asterisk_disambiguation`
pinning the asterisk path end-to-end through the collection import
flow (sibling to the existing `(N)` test).
- Existing 37 discogs tests still pass.
Out of scope (separate issue): the same #634 report flagged track-count
and year fields rendering as 0 / empty in Discogs album search. Both
are inherent to Discogs `/database/search` response shape — search
results don't carry `tracklist` (only release detail does) and `year`
is often `0` in search payloads. Fixing requires lazy-fetching release
detail per row, which hits the 25 req/min unauth limit hard. Not
bundled here.
Discord request: pull user's Discogs collection into the Your Albums
section on Discover, similar to how Spotify Liked Albums works.
Implementation extends the existing 3-source pipeline (Spotify /
Tidal / Deezer) to a 4-source pipeline with click-context dispatch —
Discogs-only albums open with rich Discogs release detail (vinyl/CD
format, year, label, country, tracklist). Mirrors the per-source
dispatch pattern from enhanced/global search.
Discogs client (`core/discogs_client.py`):
- New `get_authenticated_username()` resolves the username for the
configured personal token via Discogs's `/oauth/identity` endpoint.
Cached on the instance so subsequent collection page-fetches don't
re-hit it.
- New `get_user_collection(username=None, folder_id=0, per_page=100,
max_pages=50)` walks all pages of `/users/{username}/collection/
folders/{folder_id}/releases`. Returns normalized dicts ready for
upsert_liked_album. folder_id=0 = Discogs's "All" folder.
Pagination cap of max_pages*per_page = 5000 releases — bounds
runtime on heavy collections.
- New `get_release(release_id)` thin wrapper for `/releases/{id}` —
returns the raw API response so the album-detail endpoint can
render rich context.
- `get_user_collection` and `get_release` are defensive: missing
token → empty list, malformed responses → skipped, falsy ids →
None. Disambiguation suffixes are stripped (`Madonna (3)` →
`Madonna`) so Discogs artist names match what Spotify/Tidal/Deezer
use.
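A rough sketch of the capped page walk described above. `fetch_page` is a hypothetical callable standing in for the authenticated HTTP call, and the normalized dict shape (`id`, `title`, `year`) is an assumption; the `pagination`, `releases`, and `basic_information` keys follow the Discogs collection response.

```python
def walk_collection(fetch_page, per_page=100, max_pages=50):
    """Collect releases page by page until the last page or the cap.

    fetch_page(page, per_page) is a hypothetical callable returning the
    parsed JSON of one collection page, or None on failure.
    """
    releases = []
    for page in range(1, max_pages + 1):
        data = fetch_page(page, per_page)
        if not data:
            break  # missing token or failed request: stop with what we have
        for item in data.get("releases", []):
            info = item.get("basic_information") or {}
            if not item.get("id") or not info.get("title"):
                continue  # skip malformed entries instead of failing the walk
            releases.append({
                "id": str(item["id"]),
                "title": info["title"],
                "year": info.get("year") or None,  # Discogs uses 0 for unknown
            })
        total_pages = (data.get("pagination") or {}).get("pages", 1)
        if page >= total_pages:
            break
    return releases
```

With the defaults the walk stops after 50 pages of 100 items, which is where the 5000-release bound comes from.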
Schema (`database/music_database.py`):
- New `discogs_release_id TEXT` column on `liked_albums_pool`.
Migration uses the established `try SELECT, except ALTER TABLE`
pattern. Idempotent; safe on existing installs.
- Added the column to the canonical CREATE TABLE for fresh installs.
- `upsert_liked_album` extended with `'discogs': 'discogs_release_id'`
in BOTH the INSERT and UPDATE id-column maps so Discogs source_id
routes to the new column. INSERT statement column count + value
count updated together.
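The `try SELECT, except ALTER TABLE` migration pattern can be sketched as below. The function name is hypothetical and the real migration in `database/music_database.py` may differ in detail.

```python
import sqlite3


def ensure_discogs_column(conn):
    """Add discogs_release_id to liked_albums_pool if it is missing."""
    try:
        # Probing the column raises OperationalError when it does not exist.
        conn.execute("SELECT discogs_release_id FROM liked_albums_pool LIMIT 1")
    except sqlite3.OperationalError:
        conn.execute(
            "ALTER TABLE liked_albums_pool ADD COLUMN discogs_release_id TEXT"
        )
        conn.commit()
```

Running it a second time is a no-op because the probe SELECT succeeds, which is what makes the migration idempotent.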
Backend (`web_server.py`):
- `/api/discover/your-albums/sources` — adds Discogs to the
`connected` list when `discogs.token` config is set.
- `_fetch_liked_albums` — new branch for Discogs. Lazy-imports
DiscogsClient, respects the `enabled_sources` config, walks the
collection, upserts each release. Same try/except shape as the
existing source branches.
- `/api/discover/album/<source>/<album_id>` — new `discogs` branch
fetches the release via DiscogsClient.get_release, normalizes the
Discogs tracklist format, parses Discogs's `MM:SS`/`HH:MM:SS`
duration strings to milliseconds, returns the same response shape
as the Spotify/Deezer/iTunes branches.
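A minimal sketch of the duration conversion, assuming that empty or unparseable strings map to 0. The function name is hypothetical.

```python
def discogs_duration_to_ms(duration):
    """Convert 'MM:SS' or 'HH:MM:SS' to milliseconds; 0 if unparseable."""
    parts = (duration or "").strip().split(":")
    if len(parts) not in (2, 3):
        return 0
    try:
        numbers = [int(p) for p in parts]
    except ValueError:
        return 0
    seconds = 0
    for n in numbers:
        # Fold left to right: each step shifts the running total by base 60.
        seconds = seconds * 60 + n
    return seconds * 1000
```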
Frontend (`webui/static/discover.js`):
- `openYourAlbumsSourcesModal` — adds Discogs to `sourceInfo` with
the vinyl emoji icon. Existing toggle/save plumbing handles it.
- `openYourAlbumDownload` — restructured the per-source dispatch:
builds an ordered list of (source, id) tuples, tries each in turn,
breaks on the first successful response. Pure-Discogs albums go
straight to the Discogs detail endpoint → modal opens with Discogs
context. Multi-source albums prefer Spotify/Deezer first since
their tracklists carry proper streaming IDs ready for download.
Tests: `tests/test_discogs_collection_source.py` — 12 cases:
- get_user_collection: empty without token, normalizes response
shape, strips disambiguation suffix, handles missing year, skips
malformed releases, paginates correctly, caps at max_pages,
uses explicit username when provided.
- get_release: passes id through to /releases/{id}, returns None
for invalid ids without API call.
- liked_albums_pool: discogs_release_id round-trips through upsert
+ get; multi-source dedup carries both Spotify and Discogs IDs
on the same row.
Verified: full suite 1825 pass (12 new), ruff clean, smoke test
populating + reading the discogs_release_id column round-trips
correctly via the real DB.
WHATS_NEW entry under '2.4.2' dev cycle.
- Add _extract_discogs_fields to metadata cache — handles Discogs field
names (title vs name, images array, Artist - Title format)
- Worker uses _fetch_and_cache_artist/_fetch_and_cache_album helpers
that cache raw data while returning it for enrichment
- All search/lookup methods cache results for repeat queries
- Cache browser: Discogs stat pill, source filter, clear button, badge
- Fixes albums showing as 'Unknown' and artists missing images in cache
- get_album and get_album_tracks now try /masters/{id} first, fall back
to /releases/{id} — artist discography returns master IDs which are
in a different namespace than release IDs
- Fixes wrong album showing in download modal (master ID 3664443 for
GNX was hitting /releases/3664443 which is a different album)
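The masters-first lookup order can be sketched like this. `get_json` is a hypothetical callable that returns parsed JSON for an endpoint path, or None when the request fails or the ID does not exist in that namespace.

```python
def fetch_album_detail(get_json, album_id):
    """Try the master namespace first, then fall back to releases."""
    for endpoint in (f"/masters/{album_id}", f"/releases/{album_id}"):
        data = get_json(endpoint)
        if data:
            return data
    return None
```

The order matters: an ID taken from an artist discography is a master ID, so asking `/releases/{id}` first can return an unrelated album that happens to share the number.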
- Add Discogs source override to all 6 artist/album/track endpoints
- Add discogs_id to _resolve_db_album_id lookup
- Remove upfront master detail fetching (was 15+ API calls, 40+ seconds)
- Discography loads from releases list only (~1 second, 2 API calls)
- Track counts populate on-demand via get_album_tracks when clicking album
- New get_album_tracks method: tries /masters first, falls back to /releases,
returns Spotify-compatible format with proper disc/track numbering
- Album type defaults to 'album' for masters without format metadata —
Discogs limitation, singles only detectable from individual release format
- Search results return format as list ['Vinyl', 'LP'] while artist
releases return comma-separated string — handle both
- Fixes "'list' object has no attribute 'lower'" error
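One way to accept both shapes is to normalize to a list of lowercase tokens before any string method is called. A sketch with a hypothetical helper name:

```python
def normalize_formats(fmt):
    """Return lowercase format tokens from a list or a comma-separated string."""
    if not fmt:
        return []
    if isinstance(fmt, str):
        fmt = fmt.split(",")
    tokens = (str(part).strip().lower() for part in fmt)
    return [token for token in tokens if token]
```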
- Master releases now fetch /masters/{id} to get actual tracklist,
genres, styles, and images — fixes 0/0 track count display
- Album type re-evaluated with real track count: 1-3 = single,
4-6 = EP, 7+ = album
- Cover art from master detail used when search results have none
- Tested: Kendrick Lamar shows correct track counts, proper types,
and images for all albums
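The track-count thresholds above, as a small sketch. The lowercase type strings and the 'album' fallback for an unknown count are assumptions.

```python
def album_type_from_track_count(track_count):
    """Classify by track count: 1-3 single, 4-6 EP, 7+ album."""
    if track_count <= 0:
        return "album"  # assumption: unknown count falls back to the default
    if track_count <= 3:
        return "single"
    if track_count <= 6:
        return "ep"
    return "album"
```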
- Fetch artist name first, then compare against each release's primary
artist — skip releases where the artist is listed after Feat./Ft./&
- "Beyoncé Feat. Kendrick Lamar" → skipped (Kendrick is featured)
- "Kendrick Lamar Feat. Rihanna" → kept (Kendrick is primary)
- Fixes artist pages showing unrelated albums from other artists
- Add _normalize_name helper and re import to DiscogsClient
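A sketch of the primary-vs-featured check. The split pattern and both function names are illustrative; `normalize_name` here is a stand-in and not necessarily what `_normalize_name` does.

```python
import re

# Hypothetical separators that introduce secondary artists.
_FEATURE_SPLIT = re.compile(
    r"\s+(?:feat\.?|ft\.?|featuring)\s+|\s+&\s+", re.IGNORECASE
)


def normalize_name(name):
    """Collapse whitespace and ignore case for comparison."""
    return re.sub(r"\s+", " ", name or "").strip().casefold()


def is_primary_artist(release_artist, artist_name):
    """True when artist_name is credited before any Feat./Ft./& separator."""
    primary = _FEATURE_SPLIT.split(release_artist or "", maxsplit=1)[0]
    return normalize_name(primary) == normalize_name(artist_name)
```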
- Prefer master releases over individual pressings to avoid duplicates
(multiple pressings of same album showing separately)
- Individual releases only included if no master exists for that title
- Skip non-main roles (appearances, features, remixes by others)
- Better album type detection from format string: catches LP, Album,
EP, Single, Compilation from comma-separated format field
- Fetch more results (3x limit) to compensate for filtering
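The master-preference filter can be sketched as follows, assuming each item carries the `type`, `role`, and `title` keys from the artist releases listing. Treating a missing role as "Main" and matching titles case-insensitively are assumptions.

```python
def select_discography(releases):
    """Keep main-role masters, plus releases whose title has no master."""
    # Assumption: an item without a role is treated as a main release.
    main = [r for r in releases if r.get("role", "Main") == "Main"]
    masters = [r for r in main if r.get("type") == "master"]
    master_titles = {r.get("title", "").strip().casefold() for r in masters}
    extras = [
        r for r in main
        if r.get("type") != "master"
        and r.get("title", "").strip().casefold() not in master_titles
    ]
    return masters + extras
```

Note that this sketch lists masters first and so does not preserve the original ordering.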
- Fix source name mapping so sidebar/dashboard shows 'Discogs' instead
of falling through to 'iTunes'
- Fix album type detection: parse format string from artist releases
endpoint (e.g. "File, FLAC, Single, 320") to correctly identify
singles, EPs, albums, compilations — was defaulting everything to
'single' because track count was 0
- Remove fake track search that returned albums as tracks — Discogs
has no track-level search API, so tracks section is empty (honest)
- Track data available via album tracklists instead
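A sketch of type detection from the comma-separated format string. The function name, the lowercase return values, and the precedence (compilation, then single, then EP) are assumptions for illustration.

```python
def album_type_from_format(fmt):
    """Derive an album type from a string like 'File, FLAC, Single, 320'."""
    tokens = {token.strip().lower() for token in (fmt or "").split(",")}
    if "compilation" in tokens:
        return "compilation"
    if "single" in tokens:
        return "single"
    if "ep" in tokens:
        return "ep"
    return "album"  # covers 'LP', 'Album', and missing format metadata
```

Matching whole tokens rather than substrings avoids classifying a format such as "Repress" as an EP.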
- Full parity with iTunes/Deezer clients — same Track/Artist/Album
dataclasses, same method signatures (search_artists, search_albums,
search_tracks, get_artist, get_album, get_artist_albums)
- 25 req/min unauthenticated, 60 req/min with free personal token
- Rate limited via same decorator pattern with API call tracking
- Unique data: 400+ genre/style taxonomy, label info, catalog numbers,
community ratings, artist bios
- Smart "Artist - Title" parsing for search results
- Release deduplication (Discogs has many pressings of same album)
- Track search via release tracklist extraction
- Tested: artist/album/track search, artist detail with bio, album
detail with full tracklist + genres + styles + label
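The "Artist - Title" parsing for search results could look roughly like this. The function name and the empty-artist fallback for titles without a separator are assumptions.

```python
def split_artist_title(search_title):
    """Split a Discogs search title of the form 'Artist - Title'."""
    text = (search_title or "").strip()
    # Split on the first " - " only, so a title may itself contain one.
    artist, sep, title = text.partition(" - ")
    if not sep:
        return "", text  # no separator: treat the whole string as the title
    return artist.strip(), title.strip()
```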