Discogs uses two disambiguation conventions for duplicate artist names:
- legacy `(N)` numeric suffix: "Bullet (2)", "Madonna (3)"
- newer `*` asterisk suffix: "John Smith*", "Foo*"
Both were leaking through to the UI on artist search and album search,
and worse — through the import path into folder names on disk
(reported: importing yielded folders literally named `Foo*`).
The pre-existing cleanup only handled `(N)`, and only at two sites:
`get_user_collection` (line 469) and one path inside
`extract_track_from_release` (line 448: `re.sub(r'\s*\(\d+\)$', '',
artist_name)`). Every other surface (artist search, album search,
album-track lookups, get_artist_albums feature matching) returned the
raw Discogs string.
Centralized into `_clean_discogs_artist_name(name)` at module top,
with regex covering both suffixes including repeated forms (`Baz**`,
`Foo (3)*`). Applied at seven sites:
- `Artist.from_discogs_artist` (artist search)
- `Album.from_discogs_release` (album search — three fallbacks: array,
string, title-split)
- `Track.from_discogs_track` (track lookup — track-level + release-level
fallback)
- `extract_track_from_release` (replaces the inline `(N)`-only re.sub)
- `get_user_collection` (existing site, now also strips `*`)
- `get_artist_albums` (artist_name used for primary-vs-feature matching;
cleaning prevents `Beyoncé*` from failing equality vs `Beyoncé`)
- `get_album` (artists_list + per-track artists in the tracklist projection)
Tests:
- New `test_clean_discogs_artist_name` parametrized over 14 cases
covering `(N)`, `*`, repeated `**`, combined `(N) *`, whitespace
handling, empty/None defensive returns.
- New `test_get_user_collection_strips_discogs_asterisk_disambiguation`
pinning the asterisk path end-to-end through the collection import
flow (sibling to the existing `(N)` test).
- Existing 37 discogs tests still pass.
Out of scope (separate issue): the same #634 report flagged track-count
and year fields rendering as 0 / empty in Discogs album search. Both
are inherent to the Discogs `/database/search` response shape: search
results don't carry `tracklist` (only release detail does) and `year`
is often `0` in search payloads. Fixing requires lazy-fetching release
detail per row, which hits the 25 req/min unauth limit hard. Not
bundled here.
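To put a rough number on that cost, a back-of-envelope sketch. The 25
req/min figure is from the note above; the 50-row page size and the
one-detail-request-per-row model are assumptions for illustration.

```python
import math

UNAUTH_LIMIT_PER_MIN = 25  # unauthenticated limit cited above
ROWS_PER_PAGE = 50         # assumption: one search page of 50 rows


def rate_limit_windows(rows, limit_per_min=UNAUTH_LIMIT_PER_MIN):
    """One-minute rate-limit windows consumed by one detail fetch per row."""
    return math.ceil(rows / limit_per_min)


# A single page of results already spans more than one window.
print(rate_limit_windows(ROWS_PER_PAGE))
```

Under these assumptions, filling in track count and year for one page of
results would use the whole unauthenticated budget twice over.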