mempalace-toolkit/SKILL.md
Joakim Persson 349a3a3d3d mempalace-session: make --dry-run dedup-aware
A --dry-run report showed all qualifying sessions without indicating
which would actually hit the palace on a real run. On a second run
against an already-mined corpus this was misleading — output said
'Exported 62 session(s)' but the real mine step would skip all 62.

The wrapper now queries the palace's chroma.sqlite3 (read-only, via
file:...?mode=ro URI) for source_file values under the staging dir,
then tags each exported session as [NEW] or [SKIP] during listing and
reports the split in the summary:

  Exported 62 session(s) to ~/.cache/mempalace-session/wing_conversations
    0 new   → will be filed on mine
    62 already filed → will be skipped (dedup by source_file)

  --dry-run: no new sessions to mine. A real run would skip all 62.

Implementation notes:
- Classification is best-effort. If the palace is unreachable (fresh
  install, moved, permission-denied, file missing) the wrapper falls
  back to treating all exports as NEW — the real mine step still
  delegates dedup to 'mempalace mine --mode convos' which is the
  authoritative source of truth. Getting the classification wrong
  in --dry-run is cosmetic; behaviour of a real run is unchanged.
- Palace path respects $MEMPALACE_PATH env var for non-default setups.
- Same classification also shown on a real (non-dry-run) mine so users
  see upfront how much of the export set is actually new before the
  miner runs.

Verified both directions:
- All-already-filed case (current box, 62 sessions in palace): reports
  0 new, 62 skipped. --dry-run message correctly says 'would skip all'.
- Partial case (simulated by deleting one session's metadata from
  palace): reports 1 new, 61 skipped. --dry-run message correctly
  says 'would file 1 new'. Palace was restored from backup
  immediately after the test.

README and SKILL.md both updated with the new dedup-aware output and
a direct answer to the FAQ 'will it mine the same sessions again?'
2026-04-30 08:33:36 +00:00

---
name: opencode-mempalace-bridge
description: Set up the producer side of MemPalace — feed opencode session history and project docs into the palace via the wrappers in the mempalace-toolkit repo. Use when provisioning a new machine, when the user asks how palace feeding works, when opencode sessions aren't showing up in searches, or when a project needs docs-only mining. Pairs with the `mempalace` skill (consumer side).
---
# Opencode ↔ MemPalace Bridge (producer side)
## Overview
The `mempalace` skill covers *using* the palace (search, diary, KG). This skill covers *feeding* it — specifically, how to wire opencode session history and project docs into the palace on a new machine or after a container recreate.
**Authoritative source:** `/workspace/mempalace-toolkit/ARCHITECTURE.md` (also at the root of the `mempalace-toolkit` repo on gitea). When in doubt, read that file — it's the canonical spec. This skill is the short-form checklist.
**Core idea:** two thin wrappers in `mempalace-toolkit/bin/` close gaps in the stock mempalace CLI:
| Gap | Wrapper |
| ---------------------------------------------------------------------------------------- | -------------------- |
| `mempalace mine` floods the palace with source code we don't want | `mempalace-docs` |
| `mempalace mine --mode convos` can't read opencode's SQLite DB | `mempalace-session` |
Both follow the same **stage-to-cache-then-mine** idiom — they curate input into `~/.cache/…/<wing>/`, then delegate to `mempalace mine`.
## When to Load This Skill
- User asks "how does the palace get fed?" or mentions setting up mempalace on a new machine.
- Opencode conversations are missing from palace searches (`wing_conversations` is empty or stale).
- A project needs to be mined but you want *docs only, no source code*.
- User asks about `mempalace-docs` or `mempalace-session`.
- After a container recreate on a devbox — the wrappers need reinstall.
- Planning to retire either wrapper once upstream PRs merge (see §6 of ARCHITECTURE.md).
## Setup Recipe (new machine)
Prerequisites: `opencode` installed with an active DB at `~/.local/share/opencode/opencode.db`, `mempalace` CLI v3.3.3+, Python 3 (stdlib `sqlite3` only — no extra deps).
**If mempalace itself isn't installed yet**, suggest `uv tool install mempalace` (not `pip install mempalace` — it fights PEP 668 on modern distros and leaks deps into system site-packages). For a system-wide install on a container or shared box, set `UV_TOOL_DIR=/opt/uv-tools` + `UV_TOOL_BIN_DIR=/usr/local/bin` before `uv tool install`, and ship an MCP wrapper on `PATH` that exec's the venv's Python — otherwise MCP clients fail silently with `ModuleNotFoundError`. Full recipe in `mempalace-toolkit/README.md#installing-mempalace-itself-prerequisite`.
```bash
# 1. Clone mempalace-toolkit (holds the two wrappers in bin/)
git clone ssh://git@gitea.jordbo.se:2222/joakimp/mempalace-toolkit.git ~/mempalace-toolkit
cd ~/mempalace-toolkit
# 2. Install — symlinks bin/* into ~/.local/bin, adds loader to rc file
./install.sh
# 3. Verify ~/.local/bin is on PATH
which mempalace-session mempalace-docs
# 4. Mine opencode session history into wing_conversations
# (No global init needed — the palace is created lazily on first write.
# `mempalace init <dir>` is per-project and optional.)
mempalace-session --dry-run # preview: which sessions qualify?
mempalace-session # do it (~20 min per 60 sessions)
# 5. Mine project docs per project (docs only — no source code)
# Optional: `mempalace init --yes <dir>` first to customize wing/entities
mempalace-docs /workspace/my_project --dry-run
mempalace-docs /workspace/my_project
# 6. If a long-lived MCP session is open, reconnect it
# (from inside the MCP client): mempalace_reconnect
```
### Containerized (devbox) specifics
Named Docker volumes preserve state across container recreate:
- `devbox-palace` → `~/.mempalace/palace`
- `devbox-data` → `~/.local/share/opencode`
Bind mount `/workspace/mempalace-toolkit` from the host — code survives recreate, syncs via gitea.
**After container recreate:** `~/.local/bin` is ephemeral. Just re-run `./install.sh` (idempotent) — everything else already persists.
## Key Operational Rules
### Always dry-run first on a cold system
```bash
mempalace-session --dry-run # shows qualifying sessions
mempalace-docs <dir> --dry-run # shows files that would be mined
```
A docs-heavy repo should produce ~5–10 drawers per file. An average above 15 drawers per file means code leaked in; investigate.
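The leak check is plain arithmetic. A minimal Python sketch, assuming you read the drawer and file counts off the tool output yourself; the function name and the 15.0 default are illustrative and not part of either wrapper:

```python
def code_leak_suspected(drawers: int, files: int, threshold: float = 15.0) -> bool:
    """Return True when the average drawers-per-file exceeds the threshold."""
    if files <= 0:
        raise ValueError("files must be positive")
    return drawers / files > threshold


# 40 files yielding 320 drawers averages 8 per file: nothing to investigate.
print(code_leak_suspected(320, 40))
# 40 files yielding 900 drawers averages 22.5 per file: check what was staged.
print(code_leak_suspected(900, 40))
```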
### Dedup is free — re-running is safe
- `mempalace-docs`: dedup keyed on `source_file` path + `mtime`. Unchanged files skipped.
- `mempalace-session`: dedup keyed on `source_file` path alone (no mtime check for convos). Staging filenames are deterministic per session (`<slug>_<id>.jsonl`), so re-runs skip already-filed sessions.
Second run immediately after first → 0 new drawers, only the post-mine `repair` step runs (~5 min on 5k drawers).
**`mempalace-session --dry-run` is dedup-aware.** Each session listed is tagged `[NEW]` (would be filed) or `[SKIP]` (already in the palace), and the summary reports the split:
```
Exported 62 session(s) to ~/.cache/...
0 new → will be filed on mine
62 already filed → will be skipped (dedup by source_file)
```
So when a user asks "will it mine the same sessions again?" — point them at `mempalace-session --dry-run` and read the summary line. If `N new = 0`, nothing will be re-filed. The classification check is best-effort (falls back to "everything is new" if palace unreachable); the real mine step delegates to `mempalace mine --mode convos`, which is always the authoritative dedup source.
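For readers who want to see roughly what such a best-effort check involves, here is a sketch using only stdlib `sqlite3`. It is not the wrapper's actual code. It assumes the palace's `chroma.sqlite3` keeps metadata in an `embedding_metadata` table with `key` and `string_value` columns, and that `source_file` holds the staged file's path as a string; the function names are hypothetical.

```python
import sqlite3
from pathlib import Path


def filed_source_files(palace_db: Path, staging_dir: Path) -> set[str]:
    """Return source_file values already in the palace under staging_dir.

    Best-effort: any SQLite error (missing file, permissions, unexpected
    schema) yields an empty set, so every export is then treated as NEW.
    """
    prefix = str(staging_dir).rstrip("/") + "/"
    try:
        # mode=ro opens the database read-only and never creates it.
        uri = palace_db.resolve().as_uri() + "?mode=ro"
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute(
                # ASSUMPTION: Chroma-style key/value metadata table.
                "SELECT DISTINCT string_value FROM embedding_metadata "
                "WHERE key = 'source_file'"
            ).fetchall()
        finally:
            conn.close()
    except sqlite3.Error:
        return set()
    return {v for (v,) in rows if v and v.startswith(prefix)}


def classify(staged: list[Path], filed: set[str]) -> dict[str, list[Path]]:
    """Split staged exports into NEW and SKIP by exact source_file match."""
    result: dict[str, list[Path]] = {"NEW": [], "SKIP": []}
    for path in staged:
        result["SKIP" if str(path) in filed else "NEW"].append(path)
    return result
```

Because the fallback is an empty set, a wrong answer here only mislabels the preview; the real dedup decision stays with `mempalace mine --mode convos`.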
### Incremental catch-up
```bash
mempalace-session --since 2026-04-20 # only recent sessions
mempalace-session --session ses_abc123 # one specific session
```
### Force re-mine
```bash
rm -rf ~/.cache/mempalace-session/<wing>/ # nukes staging dir
mempalace-session # stages + mines fresh
```
Staging is ephemeral by design; the palace is the source of truth.
## Operational Routine (when to invoke)
Until upstream opencode session hooks land, **`mempalace-session` is the entire mechanism** that gets opencode conversations into the palace. If the user's opencode history isn't showing up in `mempalace_search`, the most likely cause is "`mempalace-session` hasn't been run recently".
### Agent-level triggers
Suggest invoking the tool when any of these apply:
- User asks *"why can't you find our conversation from earlier?"* / *"you should remember this from last week"* → the palace may not have the opencode session yet. Check `mempalace_list_wings` for `wing_conversations`; if the last drawer there is older than the referenced conversation, run `mempalace-session`.
- User mentions an upcoming **container recreate / system migration / OS reinstall** → suggest a full mine first as a backup checkpoint. The opencode DB normally survives via named volumes, but an explicit mine is cheap insurance.
- User has just **set up a new machine** and asks about mempalace → part of the recipe is the initial backfill. Run `mempalace-session --dry-run` first to show scope, then the real mine.
- User completed a **substantive session they want preserved past `/exit`** → offer a targeted `mempalace-session --session <id>` right then.
- User asks *"how do I keep this up to date?"* → point at [`contrib/`](../../contrib/) (systemd timer or cron recipes).
### Cadence guidance
| Situation | Suggested cadence |
|---|---|
| Active devbox, daily opencode use | Weekly automated (systemd timer or cron) |
| Occasional opencode user | Monthly manual or weekly automated |
| Fresh machine / first setup | One-shot full backfill, then schedule |
| "I'm about to rebuild the container" | Run now, as a checkpoint |
| Automated daily mines | Pass `--no-repair` + schedule weekly repair separately |
Don't suggest running more often than daily — the post-mine HNSW repair (~5 min on 5k drawers) dominates cost, and session growth is slow enough that daily is already overkill.
### Relationship to the `mempalace` skill
The consumer-side `mempalace` skill covers in-session habits (wake-up search, wind-down diary). `mempalace-session` is **between-session maintenance** — it runs when no agent session is active, typically automated. Both are needed:
- Wake-up/wind-down → curated summaries (what mattered, what was decided)
- `mempalace-session` → raw turns (the actual words exchanged)
A machine running only one of these has half a memory. Agents loading the `mempalace` skill on an opencode machine should be aware this tool exists and point users at it if they ask about persistence.
### Quick automation pitch
If the user has a devbox and hasn't set up automation, offer it. Pick the scheduler that matches the OS:
**Linux (systemd user timer):**
```bash
cd ~/mempalace-toolkit
mkdir -p ~/.config/systemd/user
cp contrib/systemd/*.{service,timer} ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now mempalace-session.timer
sudo loginctl enable-linger "$USER" # optional, for headless boxes
```
**macOS (launchd user agent):**
```bash
cd ~/mempalace-toolkit
sed "s|USER|$USER|g" contrib/launchd/se.jordbo.mempalace-session.plist \
> ~/Library/LaunchAgents/se.jordbo.mempalace-session.plist
mkdir -p ~/Library/Logs
launchctl bootstrap "gui/$(id -u)" ~/Library/LaunchAgents/se.jordbo.mempalace-session.plist
launchctl enable "gui/$(id -u)/se.jordbo.mempalace-session"
```
**BSD or systemd-less Linux (cron):** see [`contrib/cron/`](../../contrib/cron/).
Full install/verify/uninstall recipes for all three are in [`contrib/README.md`](../../contrib/README.md). All three default to weekly runs on Monday 03:00 local time.
## Failure Modes & Fixes
| Symptom | Cause | Fix |
| ---------------------------------------------------------- | ----------------------------------------------- | ------------------------------------------------------- |
| `mempalace-session: command not found` | `~/.local/bin` wiped (container recreate) | `cd ~/mempalace-toolkit && ./install.sh --yes` |
| Sessions missing from palace | Fewer messages than `--min-messages` (default 3)| Lower threshold or `--session <id>` explicitly |
| "Error finding id" on search after mining | Stale HNSW index | `mempalace repair --yes` + `mempalace_reconnect` |
| Drawers doubled for a project | Someone ran raw `mempalace mine` alongside wrapper, or renamed wing mid-flight | Inspect `embedding_metadata` in `chroma.sqlite3`, purge duplicates by source prefix, then `mempalace repair` |
| Post-mine ChromaDB search returns stale results in MCP | MCP server caches old index | Call `mempalace_reconnect` from MCP |
| Opencode DB not at default path | Non-standard `XDG_DATA_HOME` or opencode config | `export OPENCODE_DB=/custom/path/opencode.db` or `--db` |
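The last row's lookup order can be pictured with a short sketch. The precedence shown (the `--db` value first, then `$OPENCODE_DB`, then the default path from the prerequisites) is an assumption about how the wrapper resolves the path, and `resolve_opencode_db` is a hypothetical helper, not a function in the toolkit.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location named in the setup prerequisites.
DEFAULT_DB = "~/.local/share/opencode/opencode.db"


def resolve_opencode_db(cli_db: Optional[str] = None,
                        env: Mapping[str, str] = os.environ) -> Path:
    """Pick the opencode DB path: --db value, then $OPENCODE_DB, then default."""
    # ASSUMPTION: an explicit flag outranks the environment variable.
    chosen = cli_db or env.get("OPENCODE_DB") or DEFAULT_DB
    return Path(chosen).expanduser()


print(resolve_opencode_db(None, {"OPENCODE_DB": "/custom/path/opencode.db"}))
```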
## What to File Under Which Wing
| Content type | Wing (convention) | Room | Tool |
| ----------------------------------- | ------------------------------ | ---------------- | ----------------------- |
| Opencode session transcripts | `wing_conversations` | auto (keyword) | `mempalace-session` |
| Project docs (md, yaml, Dockerfile) | `wing_<project-name>` | auto | `mempalace-docs` |
| Per-agent session diaries | `wing_<agent-name>` | `diary` | `mempalace_diary_write` (from the consumer-side `mempalace` skill) |
| Ad-hoc verbatim facts | any | any | `mempalace_add_drawer` |
## Cost Profile (reference)
From a 10-day opencode corpus (140 sessions / 1491 msgs / 4656 parts):
- Dry run: seconds.
- Full mine: ~21 min wall / ~38 min user CPU → 2378 drawers from 62 qualifying sessions.
- Dedup re-run: mine instant, repair ~5 min.
Budget **~20 minutes per 60-session batch**. Scales roughly linearly with message count.
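As a rough planning aid, the budget can be turned into a one-line estimate. This sketch scales by session count for simplicity, although the text above says cost tracks message count, so treat the result as a ballpark only; the function name is illustrative.

```python
def estimate_mine_minutes(sessions: int,
                          minutes_per_batch: float = 20.0,
                          batch_size: int = 60) -> float:
    """Linear estimate of mine time from the ~20 min per 60 sessions budget."""
    if sessions < 0:
        raise ValueError("sessions must be non-negative")
    return sessions * minutes_per_batch / batch_size


# The reference run filed 62 qualifying sessions in about 21 minutes.
print(round(estimate_mine_minutes(62), 1))
```

The estimate excludes the post-mine repair step, which the figures above put at about 5 minutes on 5k drawers.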
## Anti-Patterns
- **Don't run `mempalace mine` directly on a project.** Use `mempalace-docs` — otherwise source code floods the palace.
- **Don't try to point `mempalace mine --mode convos` at `opencode.db` directly.** The convos miner reads files (txt/md/json/jsonl) only — no SQLite support. Use `mempalace-session` to export first.
- **Don't delete staging dirs unnecessarily.** They're dedup anchors; deleting means a forced re-mine of everything in that wing.
- **Don't forget `mempalace_reconnect`** after a mine from inside a live MCP session — otherwise search hits the stale index.
- **Don't mine with `--min-messages 0` or `1`** — 78 out of 140 sessions in reference corpus were throwaway `/exit`'d sessions that would flood the palace with noise. Default 3 is sensible.
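The effect of the message threshold can be illustrated with a small filter. This is a sketch over an in-memory mapping, not the wrapper's query against the opencode DB; the session IDs are made up, and the `>=` comparison follows the failure-mode table, where sessions with fewer messages than `--min-messages` are the ones left out.

```python
def qualifying_sessions(message_counts: dict[str, int],
                        min_messages: int = 3) -> list[str]:
    """Keep session IDs whose message count reaches the threshold."""
    return [sid for sid, n in message_counts.items() if n >= min_messages]


# Hypothetical counts: one throwaway session and two real ones.
counts = {"ses_a": 1, "ses_b": 3, "ses_c": 12}
print(qualifying_sessions(counts))                  # default threshold of 3
print(qualifying_sessions(counts, min_messages=1))  # lets the throwaway session in
```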
## Upstream Roadmap (when to retire these wrappers)
- **[MemPalace PR #1213](https://github.com/MemPalace/mempalace/pull/1213)** merges → `mempalace-docs` becomes redundant (exclude patterns in `mempalace.yaml`). Retire to thin shim or delete.
- **Opencode session-stopping hooks merge** ([PR #16598](https://github.com/anomalyco/opencode/pull/16598) et al.) **AND** `hooks_cli.py` gains `opencode` harness → live auto-save works; `mempalace-session` becomes a manual-only backfill tool (cron / historic import).
- **SQLite mode lands in `mempalace mine --mode convos`** → `mempalace-session` loses its reason to exist entirely.
Check `ARCHITECTURE.md` §6 in `mempalace-toolkit/` for current upstream status before doing any retirement work.
## See Also
- `<mempalace-toolkit>/ARCHITECTURE.md`: **canonical spec** (diagrams, implementation notes, full troubleshooting).
- `<mempalace-toolkit>/README.md`: per-tool usage reference.
- `~/.agents/skills/mempalace/SKILL.md`: consumer-side skill (search, diary, KG); *pair this skill with that one*.