Closes two pre-existing parity gaps in `record_soulsync_library_entry`
that the prior parity commits left untouched. Both fixes close real
holes between what auto-import writes and what the soulsync_client
deep scan would have produced.
# Gap 1: Album duration was the first-imported track's duration
`record_soulsync_library_entry` is called once per track. The album
INSERT only fires for the FIRST track of a new album (subsequent
tracks find the album row already exists). The INSERT was passing
`duration_ms` — `track_info["duration_ms"]` — as the album's
`duration` column. That's the duration of one track, not the album
total. Compare `SoulSyncAlbum.duration` in soulsync_client, which is
computed as `sum(t.duration for t in self._tracks)`.
Fix:
- Worker computes `album_total_duration_ms = sum(...)` across every
matched track and threads it onto context as
`album.duration_ms`.
- side_effects reads that value (or falls back to the per-track
duration for legacy non-auto-import callers) and writes it as the
album row's `duration`.
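A minimal sketch of the two halves of the fix; the function names and context shape here are illustrative (only the `album.duration_ms` context key and the per-track fallback behaviour come from the change itself):

```python
def build_album_context(matched_tracks: list[dict]) -> dict:
    # Worker side: the album total is the sum across EVERY matched track,
    # mirroring SoulSyncAlbum.duration's sum over self._tracks.
    album_total_duration_ms = sum(t.get("duration_ms") or 0 for t in matched_tracks)
    return {"album": {"duration_ms": album_total_duration_ms}}


def album_duration_for_insert(context: dict, track_info: dict) -> int:
    # side_effects side: prefer the album total threaded through context;
    # fall back to the per-track duration for legacy non-auto-import callers.
    album = context.get("album") or {}
    return album.get("duration_ms") or track_info.get("duration_ms") or 0


tracks = [{"duration_ms": 200_000}, {"duration_ms": 180_000}, {"duration_ms": 210_000}]
context = build_album_context(tracks)
print(album_duration_for_insert(context, tracks[0]))  # 590000, not 200000
```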
# Gap 2: Re-imports of the same artist/album were insert-only
When the SELECT-by-id or SELECT-by-name found an existing soulsync
artist or album row, the function skipped it completely — there was no
UPDATE path. As a result, artist genres / thumb / source-id reflected
ONLY whatever the FIRST imported album supplied, and they never
refreshed as more albums by that artist landed. Ten imports later, the
artist row still held whatever that first random import wrote.
Conservative fix: when an existing row matches, run an UPDATE that
fills only the columns whose current value is NULL or empty. Never
overwrites populated values — protects manual edits +
enrichment-worker writes the same way the scanner UPDATE path
preserves enrichment columns.
Implementation note: the empty-check happens in Python, NOT SQL.
Initial pass tried `COALESCE(NULLIF(col, ''), NULLIF(col, 0), ?)`
but SQLite's `NULLIF(text_col, 0)` returns the original text value
instead of NULL — different types, no coercion. So the SQL-only
conditional was unreliable on text columns. The new helper does a
`SELECT cols FROM table WHERE id`, compares each column in Python,
and emits UPDATE clauses only for the ones that need filling.
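The NULLIF behaviour is easy to reproduce with the standard-library sqlite3 module (table and column names here are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE artists (thumb TEXT)")
con.execute("INSERT INTO artists VALUES ('http://img')")

# TEXT is never equal to INTEGER under SQLite's cross-type comparison
# rules, so NULLIF(text_col, 0) hands back the original text, not NULL.
print(con.execute("SELECT NULLIF(thumb, 0) FROM artists").fetchone()[0])
# prints: http://img

# The COALESCE chain from the initial pass fails the same way: an empty
# string slips past NULLIF(col, 0) and wins over the replacement value.
print(repr(con.execute(
    "SELECT COALESCE(NULLIF('', ''), NULLIF('', 0), 'replacement')"
).fetchone()[0]))
# prints: ''
```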
Allowlist defense: f-string column names go through
`_SOULSYNC_FILLABLE_COLUMNS` validation before interpolation.
Adding new columns without updating the allowlist fails closed
(logger.debug + skip).
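A sketch of the helper plus the allowlist check, under stated assumptions: the column names, table layout, and demo data are hypothetical, and only `_SOULSYNC_FILLABLE_COLUMNS` and the fill-only-empty semantics come from the change itself.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)

# Allowlist: the only column names ever interpolated into SQL.
_SOULSYNC_FILLABLE_COLUMNS = {"genres", "thumb_url", "spotify_artist_id"}


def fill_empty_columns(con, table, row_id, new_values):
    """UPDATE only the columns whose current value is NULL or empty.

    The empty-check happens in Python because SQLite's
    NULLIF(text_col, 0) never matches across types.
    """
    cols = [c for c in new_values if c in _SOULSYNC_FILLABLE_COLUMNS]
    rejected = set(new_values) - set(cols)
    if rejected:
        # Fail closed: columns missing from the allowlist are skipped.
        logger.debug("skipping non-allowlisted columns: %s", rejected)
    if not cols:
        return
    row = con.execute(
        f"SELECT {', '.join(cols)} FROM {table} WHERE id = ?", (row_id,)
    ).fetchone()
    to_fill = {
        col: new_values[col]
        for col, current in zip(cols, row)
        if current in (None, "") and new_values[col]
    }
    if to_fill:
        assignments = ", ".join(f"{col} = ?" for col in to_fill)
        con.execute(
            f"UPDATE {table} SET {assignments} WHERE id = ?",
            (*to_fill.values(), row_id),
        )


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE artists (id INTEGER PRIMARY KEY,"
            " genres TEXT, thumb_url TEXT, spotify_artist_id TEXT)")
con.execute("INSERT INTO artists VALUES (1, 'Rock', '', NULL)")
fill_empty_columns(con, "artists", 1,
                   {"genres": "Pop", "thumb_url": "http://img",
                    "spotify_artist_id": "abc123"})
# Populated 'Rock' survives; empty thumb and NULL source id are filled.
print(con.execute("SELECT genres, thumb_url, spotify_artist_id"
                  " FROM artists WHERE id = 1").fetchone())
```

The demo shows both halves of the contract: the existing `genres` value is never overwritten, while the empty and NULL columns pick up the new values.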
# Tests added (4)
- `test_album_duration_uses_album_total_not_single_track` —
album with single-track context carrying explicit
`album.duration_ms = 2_500_000` writes 2_500_000 to the album row,
not the per-track 200_000 fallback.
- `test_re_import_fills_empty_artist_fields` — first import lands
artist with empty thumb + empty genres; second import for same
artist with thumb + genres present updates the existing row.
- `test_re_import_does_not_clobber_populated_artist_fields` —
first import writes rich genres + thumb; second import with
worse / different metadata leaves the existing row untouched.
- `test_re_import_fills_empty_source_id_when_missing` — first
import had no source artist ID; second import does — fills the
empty `spotify_artist_id` column on the existing row.
# Verification
- 10/10 side-effects tests pass (including 4 new + 4 from prior
parity commit + 2 history/provenance)
- 217 imports tests pass (no regression)
- 2369 full suite passes (+4 from prior, +22 PR-total from baseline 2347)
- 1 pre-existing flake (`test_watchdog_warns_about_stuck_workers`,
passes in isolation, unrelated)
- Ruff clean
// --- post-release patch work on the 2.4.3 line — entries hidden by _getLatestWhatsNewVersion until the build version bumps ---
{date:'Unreleased — 2.4.3 patch work'},
{title:'Auto-Import: Album Duration Is Album Total + Re-Imports Fill Metadata Gaps',desc:'two more parity gaps closed in the soulsync standalone library write path. (1) album row\'s `duration` column was being written with the FIRST imported track\'s duration instead of the album total — pre-existing bug that survived the prior parity commit. soulsync_client deep scan computes `sum(t.duration for t in self._tracks)` for each album; auto-import now mirrors that by computing the sum across every matched track in the worker and threading it through context to the album INSERT. (2) `record_soulsync_library_entry` was insert-only on artists + albums — once a row existed (matched by id OR name fallback), subsequent imports of the same artist or album skipped completely. meant: artist genres / thumb / source-id reflected ONLY whatever the FIRST imported album supplied, never refreshing as more albums by that artist landed (ten more deezer/spotify imports later, artist row still had whatever the first random import wrote). new conservative UPDATE path: when an existing row matches, fill ONLY the columns whose current value is NULL or empty — never overwrites populated values. protects manual edits + enrichment-worker writes the same way scanner UPDATEs preserve enrichment columns. f-string column names are validated against an allowlist (`_SOULSYNC_FILLABLE_COLUMNS`) before interpolation — defensive against accidental misuse adding columns without an allowlist update. 4 new tests pin: album duration uses sum not single-track, re-import fills empty thumb + genres on existing artist row, re-import does NOT clobber populated values, re-import fills empty source-id columns when later import has them.',page:'import'},
{title:'Auto-Import: Genre Tags Land On The Artists Row + ISRC/MBID Type Hardening',desc:'small followup to the standalone-library parity commit. (1) auto-import now reads the GENRE tag from each matched audio file (mutagen easy mode, supports flac / mp3 / m4a) and aggregates the deduped set across the album onto the new artists row\'s genres column. matches what soulsync_client._scan_transfer would have written if you\'d done a fresh deep scan after the import — your imported artists no longer feel hollow compared to plex / jellyfin / navidrome scans. dedup is case-insensitive but preserves original casing + insertion order so the json column reads naturally ("Hip-Hop, Rap, Trap" not "hip-hop, rap, trap"). (2) defensive `str()` cast on the worker\'s isrc + mbid extraction. metadata source clients all coerce to string today via `_build_album_track_entry`, but if a future source ever returned int / None for either id the side-effects layer would crash on `.strip()`. cheap insurance. 3 new tests pin: genre aggregation produces deduped insertion-order list, empty when no GENRE tags, isrc/mbid hostile-type input (int, None) coerced to safe string before propagation.',page:'import'},
{title:'Auto-Import: SoulSync Standalone Library Now Gets Full Server-Quality Rows',desc:'soulsync standalone is meant to be a full replacement for plex / jellyfin / navidrome — the imported tracks should land in the db with the same field richness a media server scan would write. they weren\'t. the auto-import context dict (the payload it handed to the post-process pipeline) had no `source` field anywhere, so `record_soulsync_library_entry` couldn\'t pick the right source-id column on the new tracks/albums/artists rows. result: every auto-imported track landed with NULL on `spotify_track_id` / `deezer_id` / `itunes_track_id` / etc. — watchlist scans (which match by stable source IDs) couldn\'t recognise these tracks as already in library and would re-download them on the next pass. fixed by threading `identification[\'source\']` onto the top-level context, plus per-recording IDs (`isrc`, `musicbrainz_recording_id`) onto track_info so picard-tagged libraries land their per-recording metadata directly. also extracted the artist source ID from the metadata source\'s search response (`_search_metadata_source` and `_search_single_track` now pull `best_result.artists[0][\'id\']`) and threaded it through identification → context → standalone library write, so the artists row finally gets its source-ID column populated instead of staying NULL forever. also added `_download_username=\'auto_import\'` so library history shows "Auto-Import" instead of mislabeling every staging import as "Soulseek" (the fallback default), and an "auto_import" → "Auto-Import" mapping in the source-map dicts at side_effects.py to honour it. record_soulsync_library_entry tracks INSERT now also writes `musicbrainz_recording_id` + `isrc` columns directly (matches the navidrome scanner write path). 17 new tests pin: auto-import context carries source for every metadata source (spotify/deezer/itunes/discogs), `_download_username=auto_import`, isrc + mbid pass-through to track_info, album-id back-reference on track_info, artist source-id flows from identification → context (and not from album_id, the prior copy-paste bug), `_search_metadata_source` extracts artist_id from search response, soulsync library writes mbid + isrc to dedicated columns, deezer source maps to deezer_id column, library history + provenance use Auto-Import / auto_import labels.',page:'import'},
{title:'Auto-Import: Process Multiple Albums At Once',desc:'auto-import used to process one album at a time. drop 5 albums into staging → wait for the first to fully finish (identify + match + every track post-processed) before the second one even starts. on a slow network or with a big batch this means 30+ minutes of staring at "Processing AlbumOne" while the others sit untouched. now there\'s a small bounded thread pool (3 workers by default, configurable) — up to 3 albums process in parallel, the queue moves through the rest as workers free up. clicking "Scan Now" multiple times no longer spawns extra unbounded scan threads — every trigger (timer + manual button) routes through one shared scan lock so duplicate triggers no-op instead of stacking up. live progress widget on the auto-import card now lists EACH in-flight album with its own track index/total/name instead of one shared scalar that the parallel workers used to stomp on each other. graceful shutdown: stopping the worker waits for in-flight pool work to finish before reporting stopped — no half-moved files or partial DB writes mid-album. stats counters (`scanned` / `auto_processed` / `pending_review` / `failed`) now use a lock so parallel workers don\'t lose increments under load. 17 new tests pin: pool size config, scan lock dedup, executor dispatch + bounded parallelism, cross-trigger candidate dedup, graceful shutdown, per-candidate UI state isolation across parallel workers, stats counter thread-safety, and snapshot consistency.',page:'import'},