Files
mempalace-toolkit/README.md
T
Joakim Persson 349a3a3d3d mempalace-session: make --dry-run dedup-aware
A --dry-run report showed all qualifying sessions without indicating
which would actually hit the palace on a real run. On a second run
against an already-mined corpus this was misleading — output said
'Exported 62 session(s)' but the real mine step would skip all 62.

The wrapper now queries the palace's chroma.sqlite3 (read-only, via
file:...?mode=ro URI) for source_file values under the staging dir,
then tags each exported session as [NEW] or [SKIP] during listing and
reports the split in the summary:

  Exported 62 session(s) to ~/.cache/mempalace-session/wing_conversations
    0 new   → will be filed on mine
    62 already filed → will be skipped (dedup by source_file)

  --dry-run: no new sessions to mine. A real run would skip all 62.

Implementation notes:
- Classification is best-effort. If the palace is unreachable (fresh
  install, moved, permission-denied, file missing) the wrapper falls
  back to treating all exports as NEW — the real mine step still
  delegates dedup to 'mempalace mine --mode convos' which is the
  authoritative source of truth. Getting the classification wrong
  in --dry-run is cosmetic; behaviour of a real run is unchanged.
- Palace path respects $MEMPALACE_PATH env var for non-default setups.
- Same classification also shown on a real (non-dry-run) mine so users
  see upfront how much of the export set is actually new before the
  miner runs.

Verified both directions:
- All-already-filed case (current box, 62 sessions in palace): reports
  0 new, 62 skipped. --dry-run message correctly says 'would skip all'.
- Partial case (simulated by deleting one session's metadata from
  palace): reports 1 new, 61 skipped. --dry-run message correctly
  says 'would file 1 new'. Palace was restored from backup
  immediately after the test.

README and SKILL.md both updated with the new dedup-aware output and
a direct answer to the FAQ 'will it mine the same sessions again?'
2026-04-30 08:33:36 +00:00

316 lines
18 KiB
Markdown

# mempalace-toolkit
Producer-side tooling for [MemPalace](https://github.com/MemPalace/mempalace) — bridges that feed opencode session history and project documentation into the palace. Pairs with the consumer-side [`mempalace` agent skill](https://github.com/MemPalace/mempalace).
**What this repo contains:**
- `bin/mempalace-session` — exports [opencode](https://github.com/anomalyco/opencode) session history from its local SQLite DB to Claude Code JSONL, then mines it via `mempalace mine --mode convos`.
- `bin/mempalace-docs` — mines project directories into MemPalace while excluding source code, keeping the palace signal-dense.
- [`ARCHITECTURE.md`](ARCHITECTURE.md) — **canonical spec**: architecture diagram, component details, setup recipe, operational notes, upstream-retirement roadmap.
- [`SKILL.md`](SKILL.md) — the companion agent skill, symlinked into `~/.agents/skills/opencode-mempalace-bridge/` on install.
**If you're just trying to get this working on a new machine → jump to [Setup](#setup).**
**If you want the full architecture story → read [`ARCHITECTURE.md`](ARCHITECTURE.md).**
---
## Why this exists
MemPalace is the agent memory layer. Its stock CLI has two gaps that bite on a machine running opencode with a docs-first palace policy:
1. **`mempalace mine` floods the palace with source code** — every `__init__` fragment, every generated file, hundreds of low-signal drawers per project. `mempalace-docs` fixes this by staging only documentation-class files (`*.md`, `*.yml`, `Dockerfile`, etc.) before mining.
2. **`mempalace mine --mode convos` can't read opencode's SQLite DB** — only file-based chat formats (Claude Code JSONL, Claude.ai JSON, ChatGPT, Slack, Codex). Opencode persists every turn in `~/.local/share/opencode/opencode.db` and has no upstream hook into mempalace's auto-save. `mempalace-session` fixes this by exporting each session to Claude Code JSONL before mining.
Both wrappers follow the same **stage-to-cache-then-mine** idiom. Neither reimplements the miner; they curate input and delegate.
Long-term, both should retire:
- `mempalace-docs` → retires when [MemPalace PR #1213](https://github.com/MemPalace/mempalace/pull/1213) (`exclude_patterns` in `mempalace.yaml`) merges.
- `mempalace-session` → retires when opencode session-stopping hooks ([PR #16598](https://github.com/anomalyco/opencode/pull/16598) et al.) merge **and** `hooks_cli.py` gains an `opencode` harness. Until both land, this repo fills the gap.
See [`ARCHITECTURE.md`](ARCHITECTURE.md) §6 for the full upstream roadmap.
---
## Setup
### Prerequisites
- [MemPalace](https://github.com/MemPalace/mempalace) CLI v3.3.3+ — **see [Installing mempalace itself](#installing-mempalace-itself-prerequisite) below if you haven't already**.
- Python 3 (stdlib `sqlite3` only — no extra deps)
- [opencode](https://github.com/anomalyco/opencode) with an active session DB at `~/.local/share/opencode/opencode.db` *(only needed for `mempalace-session`)*
### Installing mempalace itself (prerequisite)
mempalace-toolkit wraps the mempalace CLI but does not bundle it. The upstream [MemPalace repo](https://github.com/MemPalace/mempalace) documents `pip install mempalace` as the install method; `uv tool install` is cleaner and is the flow used in production containers like [opencode-devbox](https://gitea.jordbo.se/joakimp/opencode-devbox).
**Why uv over pip:**
- Isolated venv per tool — mempalace's dependencies (chromadb, embedding model runtime, …) don't leak into system Python or your project venvs.
- No PEP 668 fight — modern Debian / Ubuntu / Homebrew Python all refuse `pip install` into the system site-packages. `uv tool install` sidesteps this entirely.
- The shim (`~/.local/bin/mempalace` by default) is a thin wrapper that automatically activates the isolated venv on invocation, so `mempalace` is available from any bash or zsh terminal without manual `source venv/bin/activate`.
**Install uv** if it's not already on the machine:
```bash
# macOS / Linux, official installer — puts uv in ~/.local/bin
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or: Homebrew on macOS
brew install uv
# Verify
uv --version
```
#### Personal machine (recommended default)
```bash
# Installs mempalace into an isolated venv under ~/.local/share/uv/tools/mempalace/,
# puts the `mempalace` shim into ~/.local/bin/.
uv tool install mempalace
# Make sure ~/.local/bin is on $PATH (uv prints this if it isn't)
export PATH="$HOME/.local/bin:$PATH" # add to ~/.bashrc or ~/.zshrc
# Verify
mempalace --version # should print the installed version
which mempalace # should point into ~/.local/bin/
```
After this, `mempalace` works the same from any bash or zsh terminal — interactive shell, script, cron, systemd user service, launchd agent, all fine.
To upgrade later: `uv tool upgrade mempalace` (or `--all`).
To uninstall: `uv tool uninstall mempalace`.
#### System-wide / container install (opencode-devbox pattern)
For a Docker image or a multi-user box where the shim should live on the system `PATH` rather than in each user's `~/.local/bin`, use `UV_TOOL_DIR` + `UV_TOOL_BIN_DIR` to relocate both the venv and the shim:
```bash
# In the Dockerfile — this is the pattern used by opencode-devbox
ENV UV_TOOL_DIR=/opt/uv-tools
ENV UV_TOOL_BIN_DIR=/usr/local/bin
RUN mkdir -p /opt/uv-tools && \
uv tool install --no-cache mempalace && \
/opt/uv-tools/mempalace/bin/python -c "import mempalace; print('mempalace installed')"
```
After this:
- `/opt/uv-tools/mempalace/` — the isolated venv.
- `/usr/local/bin/mempalace` — the CLI shim (globally on `PATH`, works for every user).
The last `python -c` line in the RUN step is a build-time sanity check: if the install silently failed, the build fails here rather than at runtime.
See [opencode-devbox/Dockerfile](https://gitea.jordbo.se/joakimp/opencode-devbox/src/branch/main/Dockerfile) §"MemPalace install" for the full production version (adds `INSTALL_MEMPALACE=true` build arg so the install can be skipped to shave ~300 MB off the image).
#### MCP server wrapper (required for MCP clients on a system install)
MCP clients (opencode, Claude Code, Kiro) spawn the mempalace MCP server as a subprocess. On a *personal-machine* install the command is just `mempalace-mcp` — the uv tool shim finds the venv's Python automatically.
**Pitfall that bit us during the first opencode-devbox attempt:** on a *system install* with `UV_TOOL_DIR=/opt/uv-tools`, the system `python3` cannot import `mempalace` because the modules live in the isolated venv under `/opt/uv-tools/mempalace/lib/...`, not in system site-packages. Any MCP config that reads
```json
{ "command": ["python3", "-m", "mempalace.mcp_server"] }
```
will fail at spawn with `ModuleNotFoundError: No module named 'mempalace'` — and because MCP failures are reported as "server unavailable" rather than surfacing the stderr, the root cause is easy to miss.
**Fix:** ship a thin wrapper on `PATH` that exec's the venv's own Python. opencode-devbox ships this as `/usr/local/bin/mempalace-mcp-server`:
```sh
#!/bin/sh
# Launcher for the MemPalace MCP server on a uv-tool install.
# System python3 cannot import mempalace from the isolated venv,
# so exec the venv's python directly with the mcp_server module.
exec /opt/uv-tools/mempalace/bin/python -m mempalace.mcp_server "$@"
```
…and MCP configs reference the wrapper instead:
```json
{ "command": ["mempalace-mcp-server"] }
```
If you're on a personal-machine install (default `uv tool install` paths), you don't need the wrapper — `mempalace-mcp` is already a shim that does the right thing. The wrapper is specifically the workaround for the non-default `UV_TOOL_DIR` setup.
See [opencode-devbox/AGENTS.md](https://gitea.jordbo.se/joakimp/opencode-devbox/src/branch/main/AGENTS.md) ("Critical conventions" → "MemPalace install path") for the authoritative reference.
#### Verification checklist
After any install (personal or system-wide), confirm:
```bash
# CLI reachable from PATH
which mempalace # → a shim path
mempalace --version # → v3.3.3+ without import errors
# CLI can import its own modules (catches venv vs site-packages mismatch)
mempalace status 2>&1 | head -3 # → either palace stats or "No palace found" — not a Python traceback
# MCP server reachable (system install — only relevant if you set up the wrapper)
which mempalace-mcp-server # personal install: skip, uses `mempalace-mcp` directly
mempalace-mcp-server --help 2>&1 | head -5 # should show MCP server help, not import error
```
If any of these produce `ModuleNotFoundError`, you've hit the venv-mismatch pitfall. Re-read the MCP wrapper section above.
### Install mempalace-toolkit
```bash
git clone ssh://git@gitea.jordbo.se:2222/joakimp/mempalace-toolkit.git ~/mempalace-toolkit
cd ~/mempalace-toolkit
./install.sh
```
The installer symlinks `bin/*` into `~/.local/bin/` and optionally installs the agent skill into `~/.agents/skills/opencode-mempalace-bridge/`.
Ensure `~/.local/bin` is on `$PATH`:
```bash
export PATH="$HOME/.local/bin:$PATH"
```
**If `install.sh` reports `Skipping <name> — already exists`:** there's a leftover symlink or file at `~/.local/bin/<name>` from a previous install (e.g. the pre-split `cli_utils` days). The installer prints the exact `rm && ./install.sh` command to fix it — remove the stale entry and re-run. It will never clobber an existing file without the user explicitly removing it first.
### First mine
```bash
# Mine opencode session history into wing_conversations (no init needed)
mempalace-session --dry-run # preview qualifying sessions
mempalace-session # do it (~20 min per 60 sessions)
# Mine a project (docs only). If you want to pre-init the project with a
# custom wing name or entity config, run `mempalace init --yes <dir>` first;
# otherwise `mempalace-docs` derives the wing from the directory name.
mempalace-docs /workspace/my_project --dry-run
mempalace-docs /workspace/my_project
```
> **Note:** mempalace has no one-time global init. The palace itself is created lazily on first write (at `~/.mempalace/palace/`). `mempalace init <dir>` is a *per-project* command that sets up a `mempalace.yaml` + entity list for a specific source directory — optional, not a prerequisite for either wrapper.
### Keeping it fresh (automation)
Manual invocation is fine while you're actively driving the machine, but long-running devboxes benefit from a weekly automated mine. [`contrib/`](contrib/) ships ready-to-install templates:
- **systemd user timer** (recommended on Linux): survives reboots, catches missed runs, logs to `journalctl`.
- **launchd user agent** (recommended on macOS): native-equivalent — logs to `~/Library/Logs/`, single-instance guarantees, `ProcessType=Background` throttling.
- **cron**: simplest, works on BSD and systemd-less distros. No user-unit awareness needed.
Quick-start (Linux / systemd, weekly Mon 03:00 local):
```bash
mkdir -p ~/.config/systemd/user
cp contrib/systemd/*.{service,timer} ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now mempalace-session.timer
sudo loginctl enable-linger "$USER" # optional, for headless boxes
```
Quick-start (macOS / launchd, same schedule):
```bash
sed "s|USER|$USER|g" contrib/launchd/se.jordbo.mempalace-session.plist \
> ~/Library/LaunchAgents/se.jordbo.mempalace-session.plist
mkdir -p ~/Library/Logs
launchctl bootstrap "gui/$(id -u)" ~/Library/LaunchAgents/se.jordbo.mempalace-session.plist
launchctl enable "gui/$(id -u)/se.jordbo.mempalace-session"
```
See [`contrib/README.md`](contrib/README.md) for full install/verify/uninstall recipes, tuning, and devbox/container caveats. The full operational routine (triggers, cadence, verification) is in [`ARCHITECTURE.md`](ARCHITECTURE.md) §5.
### Containerized (devbox) notes
On a Docker-based devbox, the palace and opencode DB should live on named volumes so they survive container recreate:
- `devbox-palace``~/.mempalace/palace`
- `devbox-data``~/.local/share/opencode`
This repo is typically bind-mounted from the host, so code survives recreate and syncs via git. After a container recreate, `~/.local/bin` is wiped — just re-run `./install.sh` (idempotent) to relink.
---
## `mempalace-docs`
Docs-only MemPalace miner. Stages documentation files into a cache dir and runs `mempalace mine` against the cache — never against the raw project dir.
```bash
mempalace-docs <directory> # mine with wing = dirname
mempalace-docs <directory> --wing my_project # override wing name
mempalace-docs <directory> --agent alice # record agent on drawers
mempalace-docs <directory> --dry-run # list files, don't file
mempalace-docs <directory> --no-repair # skip post-mine repair
mempalace-docs --help
```
**What gets mined:** `*.md`, `*.mdx`, `*.rst`, `*.txt`, `*.yml`, `*.yaml`, `*.toml`, `*.json`, `*.sh`, `*.bash`, `*.zsh`, `*.fish`, `Dockerfile*`, `Makefile*`, `*.conf`, `*.cfg`, `*.ini`, `LICENSE*`, `COPYING*`, `NOTICE*`.
**What gets skipped:** `.py`, `.ts`, `.tsx`, `.js`, `.jsx`, `.go`, `.rs`, `.java`, `.cpp`, `.c`, `.rb`, `.kt`, `.swift`, build output directories (`.git`, `.venv`, `node_modules`, `__pycache__`, `.mypy_cache`, `.pytest_cache`, `.ruff_cache`, `dist`, `build`, `.next`, `target`, `coverage`), lockfiles.
**Rationale:** the palace is for *context and intent*. Agents already have `grep`/`glob`/`Read` for code — always authoritative, never stale. Embedding source code creates a parallel, lossier, drift-prone copy that pollutes semantic search for years.
---
## `mempalace-session`
Opencode → MemPalace session bridge. Reads `~/.local/share/opencode/opencode.db`, transforms each session into Claude Code JSONL, and files via `mempalace mine --mode convos`.
```bash
mempalace-session # mine all sessions (≥3 msgs)
mempalace-session --wing my_convos # custom wing (default: wing_conversations)
mempalace-session --session ses_abc123 # one session only
mempalace-session --since 2026-04-01 # only sessions updated on/after date
mempalace-session --min-messages 6 # stricter short-session filter
mempalace-session --db /custom/path/opencode.db # non-default DB location
mempalace-session --dry-run # export + list, skip mine
mempalace-session --no-repair # skip post-mine index repair
mempalace-session --help
```
**What gets exported per session:**
- Synthetic header injected as the first user turn (`[session: <title> | <dir> | <date>]`) so the palace can find sessions by topic, not just by ID.
- Each message → Claude Code JSONL line (`{"type": "user"|"assistant", "message": {"content": ...}}`).
- Tool calls → `tool_use` blocks. Known tools (`Bash`, `Read`, `Grep`, `Edit`, `Write`) get formatted summaries; unknown tools are JSON-serialized.
- Tool outputs → `tool_result` blocks in a follow-up human message, folded back into the assistant turn by the mempalace normalizer.
- `step-start` / `step-finish` parts are dropped as noise. `reasoning` parts are kept with a `[reasoning]` prefix.
**Dedup:** staging at `~/.cache/mempalace-session/<wing>/` with deterministic per-session filenames (`<slug>_<id>.jsonl`). The convos miner keys on `source_file`, so re-runs skip unchanged sessions. To force re-mining a session, delete its JSONL from the staging dir.
**`--dry-run` is dedup-aware.** Each session is tagged `[NEW]` (would be filed) or `[SKIP]` (already in the palace), and the summary breaks down the count:
```
Exported 62 session(s) to ~/.cache/mempalace-session/wing_conversations
0 new → will be filed on mine
62 already filed → will be skipped (dedup by source_file)
--dry-run: no new sessions to mine. A real run would skip all 62.
```
If the palace is unreachable (fresh install, moved, permission-denied) the wrapper falls back to "everything is new" — the real mine step delegates dedup to `mempalace mine --mode convos`, which is always the source of truth. So running `mempalace-session` twice in a row is never destructive or wasteful: the second run's only cost is the post-mine HNSW `repair` step (~5 min on a ~5k-drawer palace).
**Filter:** sessions with fewer than `--min-messages` messages (default 3) are skipped — drops throwaway `/exit`'d sessions that would otherwise flood the palace. On a reference 140-session corpus, 78 were filtered this way.
**Cost profile:** ~20 minutes per 60-session batch. Scales roughly linearly with message count. Dedup re-run: mine step instant, only the post-mine `repair` runs (~5 min on 5k drawers).
---
## Companion agent skill
Installing this repo symlinks `SKILL.md` into `~/.agents/skills/opencode-mempalace-bridge/SKILL.md`, where it's auto-discovered by opencode (and by Claude Code / Kiro if you run `agents-sync` from [`cli_utils`](https://gitea.jordbo.se/joakimp/cli_utils)).
The skill is the *short-form checklist* for agents — when to use which wrapper, failure modes, setup recipes, anti-patterns. The canonical reference is always [`ARCHITECTURE.md`](ARCHITECTURE.md); the skill points there for deep context.
The skill pairs with the consumer-side [`mempalace` skill](https://github.com/MemPalace/mempalace) — that one covers using the palace (search, diary, KG); this one covers feeding it.
**Colocated skill pattern.** The skill lives here (not in [`skillset`](https://gitea.jordbo.se/joakimp/skillset)) because it moves in lockstep with the wrappers it documents. `install.sh` drops a `.skill-source` marker file in the deployed skill directory so sibling tooling (skillset's `deploy-skills.sh`, cli_utils's `agents-sync.zsh`) can tell the directory is externally owned. See [`AGENTS.md`](AGENTS.md) for the full convention and how to adopt it for future colocated skills.
---
## See also
- [`ARCHITECTURE.md`](ARCHITECTURE.md) — canonical spec: diagrams, setup recipe, failure modes, upstream roadmap.
- [`AGENTS.md`](AGENTS.md) — repo conventions for AI agents modifying this codebase.
- [MemPalace](https://github.com/MemPalace/mempalace) — the memory layer itself.
- [opencode](https://github.com/anomalyco/opencode) — the agent harness this bridges.
- [cli_utils](https://gitea.jordbo.se/joakimp/cli_utils) — sibling repo with shell quality-of-life tools (origin of these wrappers before the 2026-04-30 split).