joakimp 6352373a1f fix(feeders): make post-mine repair opt-in, not default
The three feeder wrappers (mempalace-docs, mempalace-pi-session,
mempalace-session) unconditionally ran 'mempalace repair --yes' after
mining, controllable only via --no-repair opt-out. The contrib launchd
and systemd templates did not pass --no-repair, so every scheduled tick
invoked the destructive in-place HNSW rebuild.

This has bitten us twice:
  - 2026-05-04 09:08: a kickstart triggered repair while an MCP
    subprocess held the DB open; the live collection was wiped (0
    drawers) and had to be restored from the palace.backup snapshot.
  - 2026-05-05 10:00: post-mine repair crashed mid-rebuild with
    'NotFoundError: Collection [<uuid>] does not exist' - chromadb's
    rebuild recreated the collection under a new UUID while the code
    still held the old handle. Live DB survived only by luck (crash
    hit before the swap).

Fix: flip the default.
  - New flag: --repair (opt-in). Prints a warning and sleeps 3s before
    invoking 'mempalace repair --yes'.
  - --no-repair is retained as a deprecated no-op alias for backward
    compatibility with any scripts/units still passing it.
  - Default behavior: no repair. Routine ChromaDB add() keeps HNSW
    consistent; repair is a recovery op, not a maintenance tick.

Docs updated to match: README, SKILL, ARCHITECTURE, AGENTS,
contrib/README. Scheduling guidance now explicitly warns against
enabling --repair on cron/launchd/systemd-timer runs.
2026-05-05 12:35:04 +02:00

---
name: opencode-mempalace-bridge
description: Set up the producer side of MemPalace — feed opencode session history and project docs into the palace via the wrappers in the mempalace-toolkit repo. Use when provisioning a new machine, when the user asks how palace feeding works, when opencode sessions aren't showing up in searches, or when a project needs docs-only mining. Pairs with the `mempalace` skill (consumer side).
---
# Opencode ↔ MemPalace Bridge (producer side)
## Overview
The `mempalace` skill covers *using* the palace (search, diary, KG). This skill covers *feeding* it — specifically, how to wire opencode session history and project docs into the palace on a new machine or after a container recreate.
**Authoritative source:** `/workspace/mempalace-toolkit/ARCHITECTURE.md` (also at the root of the `mempalace-toolkit` repo on gitea). When in doubt, read that file — it's the canonical spec. This skill is the short-form checklist.
**Core idea:** two thin wrappers in `mempalace-toolkit/bin/` close gaps in the stock mempalace CLI:
| Gap | Wrapper |
| ---------------------------------------------------------------------------------------- | -------------------- |
| `mempalace mine` floods the palace with source code we don't want | `mempalace-docs` |
| `mempalace mine --mode convos` can't read opencode's SQLite DB | `mempalace-session` |
Both follow the same **stage-to-cache-then-mine** idiom — they curate input into `~/.cache/…/<wing>/`, then delegate to `mempalace mine`.
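The idiom can be sketched in a few lines of shell. This is a hypothetical illustration, not the wrappers' actual code: the `stage_docs` name and the extension list are assumptions, and only the overall shape (curate into a cache directory, then hand that directory to `mempalace mine`) comes from the description above.

```bash
# Hypothetical sketch of stage-to-cache-then-mine; not the real wrapper.
# stage_docs SRC STAGE: copy only doc-like files from SRC into STAGE,
# preserving relative paths. STAGE must be an absolute path.
stage_docs() (
  src=$1 stage=$2
  mkdir -p "$stage"
  cd "$src" || exit 1
  # The extension list below is an assumption chosen for illustration.
  find . -type f \( -name '*.md' -o -name '*.yaml' -o -name '*.yml' \
                    -o -name 'Dockerfile' \) |
  while IFS= read -r f; do
    mkdir -p "$stage/$(dirname "$f")"
    cp -p "$f" "$stage/$f"   # -p keeps the original mtime
  done
)
# A wrapper would then delegate to `mempalace mine` on the staged directory.
```

Because the miner only ever sees the staged copy, anything the filter excludes (source code, in this sketch) cannot reach the palace.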
## When to Load This Skill
- User asks "how does the palace get fed?" or mentions setting up mempalace on a new machine.
- Opencode conversations are missing from palace searches (`wing_conversations` is empty or stale).
- A project needs to be mined but you want *docs only, no source code*.
- User asks about `mempalace-docs` or `mempalace-session`.
- After a container recreate on a devbox — the wrappers need reinstall.
- Planning to retire either wrapper once upstream PRs merge (see §6 of ARCHITECTURE.md).
## Setup Recipe (new machine)
Prerequisites: `opencode` installed with an active DB at `~/.local/share/opencode/opencode.db`, `mempalace` CLI v3.3.3+, Python 3 (stdlib `sqlite3` only — no extra deps).
**If mempalace itself isn't installed yet**, suggest `uv tool install mempalace` (not `pip install mempalace` — it fights PEP 668 on modern distros and leaks deps into system site-packages). For a system-wide install on a container or shared box, set `UV_TOOL_DIR=/opt/uv-tools` + `UV_TOOL_BIN_DIR=/usr/local/bin` before `uv tool install`, and ship an MCP wrapper on `PATH` that exec's the venv's Python — otherwise MCP clients fail silently with `ModuleNotFoundError`. Full recipe in `mempalace-toolkit/README.md#installing-mempalace-itself-prerequisite`.
```bash
# 1. Clone mempalace-toolkit (holds the two wrappers in bin/)
git clone ssh://git@gitea.jordbo.se:2222/joakimp/mempalace-toolkit.git ~/mempalace-toolkit
cd ~/mempalace-toolkit
# 2. Install — symlinks bin/* into ~/.local/bin, adds loader to rc file
./install.sh
# 3. Verify ~/.local/bin is on PATH
which mempalace-session mempalace-docs
# 4. Mine opencode session history into wing_conversations
# (No global init needed — the palace is created lazily on first write.
# `mempalace init <dir>` is per-project and optional.)
mempalace-session --dry-run # preview: which sessions qualify?
mempalace-session # do it (~20 min per 60 sessions)
# 5. Mine project docs per project (docs only — no source code)
# Optional: `mempalace init --yes <dir>` first to customize wing/entities
mempalace-docs /workspace/my_project --dry-run
mempalace-docs /workspace/my_project
# 6. If a long-lived MCP session is open, reconnect it
# (from inside the MCP client): mempalace_reconnect
```
### Containerized (devbox) specifics
Named Docker volumes preserve state across container recreate:
- `devbox-palace` → `~/.mempalace/palace`
- `devbox-data` → `~/.local/share/opencode`
Bind mount `/workspace/mempalace-toolkit` from the host — code survives recreate, syncs via gitea.
**After container recreate:** `~/.local/bin` is ephemeral. Just re-run `./install.sh` (idempotent) — everything else already persists.
## Key Operational Rules
### Always dry-run first on a cold system
```bash
mempalace-session --dry-run # shows qualifying sessions
mempalace-docs <dir> --dry-run # shows files that would be mined
```
A docs-heavy repo should produce roughly 5–10 drawers per file. An average above 15 drawers per file suggests code leaked in; investigate.
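The drawers-per-file rule of thumb is simple arithmetic. A minimal sketch, assuming you already have the total drawer count and the mined file count to hand (how you obtain them is not shown, and `check_drawer_density` is a hypothetical helper name):

```bash
# check_drawer_density DRAWERS FILES: apply the ">15 drawers/file" heuristic.
check_drawer_density() (
  drawers=$1 files=$2
  [ "$files" -gt 0 ] || { echo "no files mined"; exit 1; }
  # avg > 15  <=>  drawers > 15 * files (integer-only comparison)
  if [ "$drawers" -gt $((15 * files)) ]; then
    echo "investigate: more than 15 drawers/file on average"
  else
    echo "ok: $((drawers / files)) drawers/file (rounded down)"
  fi
)
```

For example, `check_drawer_density 80 10` reports an average of 8 and passes, while `check_drawer_density 200 10` flags the run for investigation.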
### Dedup is free — re-running is safe
- `mempalace-docs`: dedup keyed on `source_file` path + `mtime`. Unchanged files skipped.
- `mempalace-session`: dedup keyed on `source_file` path alone (no mtime check for convos). Staging filenames are deterministic per session (`<slug>_<id>.jsonl`), so re-runs skip already-filed sessions.
Second run immediately after first → 0 new drawers. Repair is opt-in, so the post-mine `repair` step (~5 min on 5k drawers) runs only when `--repair` is passed.
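The difference between the two dedup keys can be modelled in shell. This is a hypothetical model for intuition only: the real check happens inside `mempalace mine`, and the plain-text ledger file here is an assumed stand-in for the palace's own metadata.

```bash
# dedup_key MODE PATH [MTIME]: the key that decides "already filed?".
# MTIME is passed in explicitly to keep the sketch portable.
dedup_key() (
  mode=$1 path=$2 mtime=${3:-}
  case $mode in
    docs)   printf '%s|%s\n' "$path" "$mtime" ;;  # path + mtime
    convos) printf '%s\n' "$path" ;;              # path only
    *)      echo "unknown mode: $mode" >&2; exit 2 ;;
  esac
)

# already_filed KEY LEDGER: succeeds if KEY is recorded in the LEDGER file.
already_filed() {
  grep -Fxq -- "$1" "$2"
}
```

Under this model an edited doc (same path, new mtime) produces a new key and is re-mined, whereas a session file with the same staging path is skipped no matter when it was written.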
**`mempalace-session --dry-run` is dedup-aware.** Each session listed is tagged `[NEW]` (would be filed) or `[SKIP]` (already in the palace), and the summary reports the split:
```
Exported 62 session(s) to ~/.cache/...
0 new → will be filed on mine
62 already filed → will be skipped (dedup by source_file)
```
So when a user asks "will it mine the same sessions again?" — point them at `mempalace-session --dry-run` and read the summary line. If `N new = 0`, nothing will be re-filed. The classification check is best-effort (falls back to "everything is new" if palace unreachable); the real mine step delegates to `mempalace mine --mode convos`, which is always the authoritative dedup source.
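Reading the summary can be scripted. A minimal sketch, assuming the summary is written to stdout in exactly the format shown above (`new_session_count` is a hypothetical helper, not part of the toolkit):

```bash
# new_session_count: read dry-run output on stdin and print the number from
# the "<N> new → ..." summary line. Fails if no such line is present.
new_session_count() {
  awk '$2 == "new" { print $1; found = 1; exit }
       END { if (!found) exit 1 }'
}
```

Piping `mempalace-session --dry-run` into `new_session_count` and getting `0` would mean nothing will be re-filed; a non-zero exit status means the summary line was not found and the output should be read by hand.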
### Incremental catch-up
```bash
mempalace-session --since 2026-04-20 # only recent sessions
mempalace-session --session ses_abc123 # one specific session
```
### Force re-mine
```bash
rm -rf ~/.cache/mempalace-session/<wing>/ # nukes staging dir
mempalace-session # stages + mines fresh
```
Staging is ephemeral by design; the palace is the source of truth.
## Operational Routine (when to invoke)
Until upstream opencode session hooks land, **`mempalace-session` is the entire mechanism** that gets opencode conversations into the palace. If the user's opencode history isn't showing up in `mempalace_search`, the most likely cause is "`mempalace-session` hasn't been run recently".
### Agent-level triggers
Suggest invoking the tool when any of these apply:
- User asks *"why can't you find our conversation from earlier?"* / *"you should remember this from last week"* → the palace may not have the opencode session yet. Check `mempalace_list_wings` for `wing_conversations`; if the last drawer there is older than the referenced conversation, run `mempalace-session`.
- User mentions an upcoming **container recreate / system migration / OS reinstall** → suggest a full mine first as a backup checkpoint. The opencode DB normally survives via named volumes, but an explicit mine is cheap insurance.
- User has just **set up a new machine** and asks about mempalace → part of the recipe is the initial backfill. Run `mempalace-session --dry-run` first to show scope, then the real mine.
- User completed a **substantive session they want preserved past `/exit`** → offer a targeted `mempalace-session --session <id>` right then.
- User asks *"how do I keep this up to date?"* → point at [`contrib/`](../../contrib/) (systemd timer or cron recipes).
### Cadence guidance
| Situation | Suggested cadence |
|---|---|
| Active devbox, daily opencode use | Weekly automated (systemd timer or cron) |
| Occasional opencode user | Monthly manual or weekly automated |
| Fresh machine / first setup | One-shot full backfill, then schedule |
| "I'm about to rebuild the container" | Run now, as a checkpoint |
| Automated daily mines | Repair is now opt-in (`--repair`); **never** set it on an unattended schedule. Run `mempalace repair` by hand from a quiet session if HNSW genuinely needs rebuilding. |
Don't suggest running more often than daily: session growth is slow enough that daily is already overkill, and any run that does pass `--repair` adds an HNSW rebuild (~5 min on 5k drawers).
### Relationship to the `mempalace` skill — and why diaries still matter
The consumer-side `mempalace` skill covers in-session habits (wake-up search, wind-down diary write). `mempalace-session` is **between-session maintenance** — automated, raw-turn capture. The two are complementary, not alternatives:
| | `mempalace-session` (this tool) | Agent diary (`mempalace` skill) |
|---|---|---|
| Content | Every turn verbatim | Curated AAAK summary |
| Written by | Extraction from `opencode.db` | The agent, at wind-down |
| Answers | *"What did we say?"* | *"What did we decide / learn?"* |
| Retrieval | Semantic search | Recency scan |
**Common misconception agents fall into:** *"session mining runs weekly, so I don't need to write diary entries."* Wrong. Diary entries capture meta-observations that were never said during the session — "lesson learned", "pattern confirmed", "user caught a bug before ship", "pending next time". Mining raw turns can never surface those because the words don't exist verbatim; they're the agent's editorial judgment at wind-down.
If a user asks whether diaries are redundant given automated mining, point them at [`ARCHITECTURE.md` §5 → "Diary vs session mine: why keep both?"](../../ARCHITECTURE.md#diary-vs-session-mine-why-keep-both) for the full treatment. Short answer: release notes vs. git log — a repo keeps both, and so should the palace.
Both systems cover each other's failure modes too: a missed diary (agent `/exit`'d without writing) → session mine catches the raw content; a missed mine (cron behind schedule, new machine) → diary captures the essence. Belt and braces.
### Quick automation pitch
If the user has a devbox and hasn't set up automation, offer it. Pick the scheduler that matches the OS:
**Linux (systemd user timer):**
```bash
cd ~/mempalace-toolkit
mkdir -p ~/.config/systemd/user
cp contrib/systemd/*.{service,timer} ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now mempalace-session.timer
sudo loginctl enable-linger "$USER" # optional, for headless boxes
```
**macOS (launchd user agent):**
```bash
cd ~/mempalace-toolkit
sed "s|USER|$USER|g" contrib/launchd/se.jordbo.mempalace-session.plist \
> ~/Library/LaunchAgents/se.jordbo.mempalace-session.plist
mkdir -p ~/Library/Logs
launchctl bootstrap "gui/$(id -u)" ~/Library/LaunchAgents/se.jordbo.mempalace-session.plist
launchctl enable "gui/$(id -u)/se.jordbo.mempalace-session"
```
**BSD or systemd-less Linux (cron):** see [`contrib/cron/`](../../contrib/cron/).
Full install/verify/uninstall recipes for all three are in [`contrib/README.md`](../../contrib/README.md). All three default to weekly runs on Monday 03:00 local time.
## Failure Modes & Fixes
| Symptom | Cause | Fix |
| ---------------------------------------------------------- | ----------------------------------------------- | ------------------------------------------------------- |
| `mempalace-session: command not found` | `~/.local/bin` wiped (container recreate) | `cd ~/mempalace-toolkit && ./install.sh --yes` |
| Sessions missing from palace | Fewer messages than `--min-messages` (default 3)| Lower threshold or `--session <id>` explicitly |
| "Error finding id" on search after mining | Stale HNSW index | `mempalace repair --yes` + `mempalace_reconnect` |
| Drawers doubled for a project | Someone ran raw `mempalace mine` alongside wrapper, or renamed wing mid-flight | Inspect `embedding_metadata` in `chroma.sqlite3`, purge duplicates by source prefix, then `mempalace repair` |
| Post-mine ChromaDB search returns stale results in MCP | MCP server caches old index | Call `mempalace_reconnect` from MCP |
| Opencode DB not at default path | Non-standard `XDG_DATA_HOME` or opencode config | `export OPENCODE_DB=/custom/path/opencode.db` or `--db` |
## What to File Under Which Wing
| Content type | Wing (convention) | Room | Tool |
| ----------------------------------- | ------------------------------ | ---------------- | ----------------------- |
| Opencode session transcripts | `wing_conversations` | auto (keyword) | `mempalace-session` |
| Project docs (md, yaml, Dockerfile) | `wing_<project-name>` | auto | `mempalace-docs` |
| Per-agent session diaries | `wing_<agent-name>` | `diary` | `mempalace_diary_write` (from the consumer-side `mempalace` skill) |
| Ad-hoc verbatim facts | any | any | `mempalace_add_drawer` |
## Cost Profile (reference)
From a 10-day opencode corpus (140 sessions / 1491 msgs / 4656 parts):
- Dry run: seconds.
- Full mine: ~21 min wall / ~38 min user CPU → 2378 drawers from 62 qualifying sessions.
- Dedup re-run: mine instant; repair, when requested with `--repair`, ~5 min.
Budget **~20 minutes per 60-session batch**. Scales roughly linearly with message count.
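The budget translates into a one-line estimate. This sketch assumes cost is linear in the number of qualifying sessions, which is only an approximation of the per-message scaling noted above; `estimate_mine_minutes` is a hypothetical helper.

```bash
# estimate_mine_minutes SESSIONS: rough wall-clock budget in minutes,
# using ~20 min per 60 qualifying sessions, rounded up.
estimate_mine_minutes() {
  echo $(( ($1 * 20 + 59) / 60 ))
}
```

For the reference corpus, `estimate_mine_minutes 62` gives 21, in line with the measured ~21 min wall time.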
## Anti-Patterns
- **Don't run `mempalace mine` directly on a project.** Use `mempalace-docs` — otherwise source code floods the palace.
- **Don't try to point `mempalace mine --mode convos` at `opencode.db` directly.** The convos miner reads files (txt/md/json/jsonl) only — no SQLite support. Use `mempalace-session` to export first.
- **Don't delete staging dirs unnecessarily.** They're dedup anchors; deleting means a forced re-mine of everything in that wing.
- **Don't forget `mempalace_reconnect`** after a mine from inside a live MCP session — otherwise search hits the stale index.
- **Don't mine with `--min-messages 0` or `1`.** In the reference corpus, 78 of 140 sessions were throwaway `/exit`'d sessions that would flood the palace with noise. The default of 3 is sensible.
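The effect of the message threshold can be illustrated with a small filter. The input format here (one `session_id message_count` pair per line) is an assumption made for the example; the real wrapper reads opencode's SQLite DB, whose schema is not shown in this skill.

```bash
# qualifying_sessions [MIN]: read "session_id message_count" lines on stdin
# and print the ids with at least MIN messages (default 3).
qualifying_sessions() {
  awk -v min="${1:-3}" '$2 + 0 >= min + 0 { print $1 }'
}
```

With the default threshold, one- and two-message sessions drop out; passing `0` lets every session through, which is exactly the noise the rule above warns against.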
## Upstream Roadmap (when to retire these wrappers)
- **[MemPalace PR #1213](https://github.com/MemPalace/mempalace/pull/1213)** merges → `mempalace-docs` becomes redundant (exclude patterns in `mempalace.yaml`). Retire to thin shim or delete.
- **Opencode session-stopping hooks merge** ([PR #16598](https://github.com/anomalyco/opencode/pull/16598) et al.) **AND** `hooks_cli.py` gains `opencode` harness → live auto-save works; `mempalace-session` becomes a manual-only backfill tool (cron / historic import).
- **SQLite mode lands in `mempalace mine --mode convos`** → `mempalace-session` loses its reason to exist entirely.
Check `ARCHITECTURE.md` §6 in `mempalace-toolkit/` for current upstream status before doing any retirement work.
## See Also
- `<mempalace-toolkit>/ARCHITECTURE.md`: **canonical spec** (diagrams, implementation notes, full troubleshooting).
- `<mempalace-toolkit>/README.md` — per-tool usage reference.
- `~/.agents/skills/mempalace/SKILL.md` — consumer-side skill (search, diary, KG) — *pair this skill with that one*.