Self-review caught a test-fidelity hole: the temp cache overrode
_run_maintenance_write with a simplified version, so evict_over_capacity was
tested against the stub's plumbing, not production's (retry + connection
handling). Removed the override — _get_db is now the only injected seam, so the
test runs the genuine code path. Differential-verified the LRU assertions are
real: flipping ORDER BY ASC->DESC makes them fail. 8/8 pass; ruff clean.
Investigation (not assumption): the cache's TTL eviction + junk cleanup ARE
correct and DO run automatically every 6h (CacheEvictorJob, auto_fix=True).
The real gap is there's NO SIZE CEILING — TTL-only eviction means 'how big can
it get' = 'however much you fetch within the 30-day window', so heavy
discovery/enrichment legitimately grew metadata_cache_entities to ~1.8M rows /
7.6 GB, bloating the main DB (a factor in the corruption incident).
Fix — add a bounded LRU cap:
- entities_to_evict_for_capacity(total, max_rows): pure decision fn (cap<=0
disables), unit-testable like core.db_integrity.prune_backups.
- MetadataCache.evict_over_capacity(): deletes the least-recently-ACCESSED rows
(uses the already-stored last_accessed_at; NULL = never-touched = evicted
first) down to the ceiling. Default 250k rows, tunable.
- Wired as Phase 5 of CacheEvictorJob — runs LAST, after TTL/junk/orphan/null
cleanup, so it only trims a still-oversized HEALTHY cache.
Verified safe to bound/wipe: audited every cache reader (get_entity/
get_entities_batch/get_search_results/get_entity_detail/browse) — all degrade
to None/[]/empty on miss, treated as 'go fetch'. Nothing depends on a row
existing, so eviction can't break callers.
Tests: tests/metadata/test_cache_capacity_eviction.py (8) — pure-fn coverage +
real temp-DB proof that it drops the LRU rows specifically (not arbitrary) and
NULL-access rows go first. 18 adjacent cache tests still green; ruff clean.
Follow-ups (separate phases, scoped): (2) move the cache to its own bounded
metadata_cache.sqlite3 (no JOINs to library tables — confirmed clean to split;
invalidate-and-rebuild rather than migrate the 7.6GB), (3) kill the
raw_json + 22-extracted-column double storage.