mempalace-session: make --dry-run dedup-aware

Previously, --dry-run listed every qualifying session without indicating
which ones a real run would actually file into the palace. On a second
run against an already-mined corpus this was misleading: the output said
'Exported 62 session(s)' even though the real mine step would skip all 62.

The wrapper now queries the palace's chroma.sqlite3 (read-only, via a
file:...?mode=ro URI) for source_file values under the staging dir,
tags each exported session as [NEW] or [SKIP] in the listing, and
reports the split in the summary:

  Exported 62 session(s) to ~/.cache/mempalace-session/wing_conversations
    0 new   → will be filed on mine
    62 already filed → will be skipped (dedup by source_file)

  --dry-run: no new sessions to mine. A real run would skip all 62.
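A minimal Python sketch of the read-only lookup, assuming that Chroma's
SQLite store keeps record metadata as key/value rows in an
`embedding_metadata` table (`key`, `string_value` columns) and that
`source_file` holds the absolute path of the exported file. The name
`filed_sources` is hypothetical, not the wrapper's actual code.

```python
import os
import sqlite3
from contextlib import closing
from pathlib import Path


def filed_sources(palace_db: Path, staging_dir: Path) -> set[str]:
    """Return source_file values already in the palace that live under staging_dir.

    ASSUMPTION: Chroma's SQLite store keeps record metadata as key/value rows
    in an `embedding_metadata` table (`key`, `string_value` columns).
    """
    # as_uri() percent-encodes the path; "?mode=ro" opens it read-only, and
    # uri=True tells sqlite3 to parse the "file:" URI rather than treat it
    # as a literal filename. A missing file raises instead of being created.
    uri = palace_db.resolve().as_uri() + "?mode=ro"
    prefix = os.path.join(str(staging_dir), "")  # adds a trailing separator
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT DISTINCT string_value FROM embedding_metadata "
            "WHERE key = 'source_file'"
        ).fetchall()
    # Filter in Python: a LIKE prefix would treat the '_' in paths such as
    # wing_conversations as a single-character wildcard.
    return {v for (v,) in rows if v and v.startswith(prefix)}
```

Each exported session can then be tagged [SKIP] if its path string is in
the returned set and [NEW] otherwise; this only works if the miner records
the same absolute path the wrapper exported.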

Implementation notes:
- Classification is best-effort. If the palace is unreachable (fresh
  install, moved palace, permission denied, missing file), the wrapper
  falls back to treating all exports as NEW. The real mine step still
  delegates dedup to 'mempalace mine --mode convos', which remains the
  source of truth, so a wrong classification in --dry-run is cosmetic:
  the behaviour of a real run is unchanged.
- Palace path respects $MEMPALACE_PATH env var for non-default setups.
- The same classification is also shown on a real (non-dry-run) mine,
  so users see up front how much of the export set is actually new
  before the miner runs.
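A sketch of the path lookup and best-effort fallback described in these
notes. Assumptions: $MEMPALACE_PATH names the palace directory and the
database inside it is chroma.sqlite3; `palace_db_path` and `tag_sessions`
are hypothetical names, and `lookup` stands for whatever read-only query
returns the filed source_file paths. The default palace location is passed
in rather than guessed.

```python
import os
import sqlite3
from pathlib import Path
from typing import Callable, Iterable


def palace_db_path(default_palace: Path) -> Path:
    """Locate the palace's Chroma database, honouring $MEMPALACE_PATH.

    ASSUMPTION: $MEMPALACE_PATH names the palace directory and the database
    file inside it is called chroma.sqlite3. An empty value counts as unset.
    """
    palace = os.environ.get("MEMPALACE_PATH") or default_palace
    return Path(palace).expanduser() / "chroma.sqlite3"


def tag_sessions(
    exported: Iterable[Path], lookup: Callable[[], set[str]]
) -> dict[Path, str]:
    """Tag each exported session NEW or SKIP without ever failing on the palace."""
    try:
        filed = lookup()
    except (sqlite3.Error, OSError):
        # Fresh install, moved palace, permission denied, missing file:
        # fall back to "everything is NEW". A real run still leaves dedup to
        # `mempalace mine --mode convos`, so only the preview is affected.
        filed = set()
    return {p: ("SKIP" if str(p) in filed else "NEW") for p in exported}
```

Catching sqlite3.Error covers both an unopenable file (a mode=ro open fails
rather than creating one) and an unexpected schema ("no such table"), so the
preview degrades to all NEW instead of aborting the export.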

Verified both cases:
- All-already-filed case (current box, 62 sessions in palace): reports
  0 new, 62 skipped. --dry-run message correctly says 'would skip all'.
- Partial case (simulated by deleting one session's metadata from
  palace): reports 1 new, 61 skipped. --dry-run message correctly
  says 'would file 1 new'. Palace was restored from backup
  immediately after the test.
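Both verified cases reduce to a count over the per-session tags. A sketch
of the summary step, assuming tags are the strings "NEW" and "SKIP";
`summary` is a hypothetical name. The all-skipped --dry-run sentence is
copied from the example output above, while the partial-case sentence is
an assumed wording built around the quoted 'would file 1 new'.

```python
from collections import Counter
from typing import Mapping


def summary(tags: Mapping[str, str], staging_dir: str, dry_run: bool) -> list[str]:
    """Build the export summary lines from per-session NEW/SKIP tags."""
    counts = Counter(tags.values())  # missing keys count as 0
    new, skip = counts["NEW"], counts["SKIP"]
    lines = [
        f"Exported {len(tags)} session(s) to {staging_dir}",
        f"  {new} new   → will be filed on mine",
        f"  {skip} already filed → will be skipped (dedup by source_file)",
    ]
    if dry_run:
        lines.append("")
        if new == 0:
            lines.append(
                f"--dry-run: no new sessions to mine. A real run would skip all {skip}."
            )
        else:
            # HYPOTHETICAL wording: the commit only quotes 'would file 1 new'.
            lines.append(f"--dry-run: a real run would file {new} new and skip {skip}.")
    return lines
```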

README and SKILL.md both updated with the new dedup-aware output and
a direct answer to the FAQ 'will it mine the same sessions again?'
Joakim Persson
2026-04-30 08:33:36 +00:00
parent 72e7019101
commit 349a3a3d3d
3 changed files with 81 additions and 6 deletions
@@ -90,6 +90,16 @@ A docs-heavy repo should produce ~5-10 drawers per file. >15 drawers/file on a
Second run immediately after first → 0 new drawers, only the post-mine `repair` step runs (~5 min on 5k drawers).
**`mempalace-session --dry-run` is dedup-aware.** Each session listed is tagged `[NEW]` (would be filed) or `[SKIP]` (already in the palace), and the summary reports the split:
```
Exported 62 session(s) to ~/.cache/...
0 new → will be filed on mine
62 already filed → will be skipped (dedup by source_file)
```
So when a user asks "will it mine the same sessions again?", point them at `mempalace-session --dry-run` and read the summary. If it reports `0 new`, nothing will be re-filed. The classification check is best-effort (it falls back to "everything is new" if the palace is unreachable); the real mine step delegates to `mempalace mine --mode convos`, which is always the authoritative dedup source.
### Incremental catch-up
```bash