mempalace-session: make --dry-run dedup-aware
A --dry-run report showed all qualifying sessions without indicating
which would actually hit the palace on a real run. On a second run
against an already-mined corpus this was misleading — output said
'Exported 62 session(s)' but the real mine step would skip all 62.
The wrapper now queries the palace's chroma.sqlite3 (read-only, via
file:...?mode=ro URI) for source_file values under the staging dir,
then tags each exported session as [NEW] or [SKIP] during listing and
reports the split in the summary:
Exported 62 session(s) to ~/.cache/mempalace-session/wing_conversations
0 new → will be filed on mine
62 already filed → will be skipped (dedup by source_file)
--dry-run: no new sessions to mine. A real run would skip all 62.
Implementation notes:
- Classification is best-effort. If the palace is unreachable (fresh
install, moved, permission-denied, file missing) the wrapper falls
back to treating all exports as NEW — the real mine step still
delegates dedup to 'mempalace mine --mode convos' which is the
authoritative source of truth. Getting the classification wrong
in --dry-run is cosmetic; behaviour of a real run is unchanged.
- Palace path respects $MEMPALACE_PATH env var for non-default setups.
- Same classification also shown on a real (non-dry-run) mine so users
see upfront how much of the export set is actually new before the
miner runs.
Verified both directions:
- All-already-filed case (current box, 62 sessions in palace): reports
0 new, 62 skipped. --dry-run message correctly says 'would skip all'.
- Partial case (simulated by deleting one session's metadata from
palace): reports 1 new, 61 skipped. --dry-run message correctly
says 'would file 1 new'. Palace was restored from backup
immediately after the test.
README and SKILL.md both updated with the new dedup-aware output and
a direct answer to the FAQ 'will it mine the same sessions again?'
This commit is contained in:
@@ -90,6 +90,16 @@ A docs-heavy repo should produce ~5–10 drawers per file. >15 drawers/file on a
|
||||
|
||||
Second run immediately after first → 0 new drawers, only the post-mine `repair` step runs (~5 min on 5k drawers).
|
||||
|
||||
**`mempalace-session --dry-run` is dedup-aware.** Each session listed is tagged `[NEW]` (would be filed) or `[SKIP]` (already in the palace), and the summary reports the split:
|
||||
|
||||
```
|
||||
Exported 62 session(s) to ~/.cache/...
|
||||
0 new → will be filed on mine
|
||||
62 already filed → will be skipped (dedup by source_file)
|
||||
```
|
||||
|
||||
So when a user asks "will it mine the same sessions again?" — point them at `mempalace-session --dry-run` and read the summary line. If `N new = 0`, nothing will be re-filed. The classification check is best-effort (falls back to "everything is new" if palace unreachable); the real mine step delegates to `mempalace mine --mode convos`, which is always the authoritative dedup source.
|
||||
|
||||
### Incremental catch-up
|
||||
|
||||
```bash
|
||||
|
||||
Reference in New Issue
Block a user