# MemPalace Feeding Architecture

This repository wires [opencode](https://github.com/anomalyco/opencode) and arbitrary project directories into [MemPalace](https://github.com/MemPalace/mempalace) via two thin wrappers in `bin/`. This document explains why they exist and how they fit together.

**Audience:** someone setting up a new machine (or reviewing what's already set up) and asking "how does the palace actually get fed?". Pairs with the `mempalace` agent skill, which covers the *consumer* side (searching, diary, KG). This document covers the *producer* side.

---

## 1. The problem

MemPalace is a persistent memory layer for AI agents — vector search over drawers (chunks of verbatim content), a knowledge graph, and per-agent diaries, all behind an MCP server. To be useful it has to be *fed*: project docs, conversation transcripts, session summaries.

The stock mempalace CLI has two feeders:

| Feeder | What it ingests | Gap |
| ------ | --------------- | --- |
| `mempalace mine` (default mode) | Any "readable" file in a directory (code + docs + misc) | Mines source code indiscriminately → embedding index floods with low-signal `__init__` fragments. |
| `mempalace mine --mode convos` | Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack, Codex JSONL | No opencode support. No SQLite support. Opencode persists its history in SQLite, not JSONL. |

And one auto-save path:

| Feeder | Harnesses supported | Gap |
| ------ | ------------------- | --- |
| `hooks_cli.py` (session-stop hooks) | `claude-code`, `codex` | No `opencode` harness → `/exit` mid-session leaves no diary entry behind. |

So on a machine using opencode + the "docs-first palace hygiene" policy, three gaps bite:

1. Mining a project floods the palace with source code we don't want.
2. Opencode session history is trapped in SQLite, invisible to `mine --mode convos`.
3. There's no auto-save on session stop — any persistence is best-effort heuristic.

The two wrappers in `bin/` close gaps **1** and **2**. Gap **3** is upstream work (see §6).

---

## 2. The architecture

```
Project dirs (/workspace/*)                 Opencode SQLite DB
├── *.md                                    ~/.local/share/opencode/opencode.db
├── *.yaml                                  ├── session (id, title, directory, time_created/updated)
├── Dockerfile                              ├── message (session_id, data JSON w/ role)
└── …                                       └── part    (message_id, data JSON w/ type: text|tool|…)
      │                                          │
      │                                          │
┌─────▼──────────┐                          ┌────▼──────────────┐
│ mempalace-docs │                          │ mempalace-session │
│ (bin/)         │                          │ (bin/)            │
│                │                          │                   │
│ stage docs     │                          │ export each       │
│ only via cp -p │                          │ session as Claude │
│ to cache dir   │                          │ Code JSONL to     │
│                │                          │ cache dir         │
└─────┬──────────┘                          └────┬──────────────┘
      │                                          │
      │ ~/.cache/mempalace-docs//                │ ~/.cache/mempalace-session//
      │                                          │
┌─────▼──────────┐                          ┌────▼──────────────┐
│ mempalace mine │                          │ mempalace mine    │
│                │                          │ --mode convos     │
└─────┬──────────┘                          └────┬──────────────┘
      │                                          │
      └───────────────────┬──────────────────────┘
                          │
                   ┌──────▼─────────┐
                   │ ChromaDB       │
                   │ ~/.mempalace/  │
                   │ palace/        │
                   └──────┬─────────┘
                          │
              MCP server (mempalace_*)
                          │
     AI agents (opencode, claude code, codex, …)
```

**Shared idiom:** *stage-to-cache-then-mine*. Neither wrapper reimplements the mempalace miner. They each:

1. Curate input (filter / transform / rename).
2. Write it to a deterministic path under `~/.cache/…//` with `mtime` preserved (via `cp -p` or explicit `os.utime`).
3. Delegate actual embedding + filing to `mempalace mine`, which already dedups on `source_file` path.

This keeps the wrappers thin. A third wrapper following the same idiom would justify factoring a shared helper library — two does not.

---

## 3. Component details

### `bin/mempalace-docs` (268 lines) — docs-first mining

**Input:** a project directory.
**Output:** palace drawers in `wing_` (or `--wing` override), only from documentation-class files.

What it files: `*.md`, `*.mdx`, `*.rst`, `*.txt`, `*.yml`, `*.yaml`, `*.toml`, selective `*.json`, shell scripts, Dockerfiles, Makefiles, license/notice files.

What it drops: source code (`.py`, `.ts`, `.go`, `.rs`, …), lockfiles, `.git`, `.venv`, `node_modules`, `__pycache__`, build output.

**Implementation notes:**

- Reads `mempalace.yaml` (if present) to discover the actual wing name — avoids drift if someone renamed the wing after init.
- Uses `cp -p` (not symlinks) because the miner skips symlinks (`miner.py` line 828).
- Auto-purges pre-existing drawers whose `source_file` is under the workspace path before re-mining, to prevent doubling on re-runs.
- Upstream [PR #1213](https://github.com/MemPalace/mempalace/pull/1213) will add `exclude_patterns` to `mempalace.yaml` — when merged, this wrapper should shrink to a thin shim.

### `bin/mempalace-session` (349 lines) — opencode → palace bridge

**Input:** the opencode SQLite DB (default `~/.local/share/opencode/opencode.db`).
**Output:** palace drawers in `wing_conversations` (or `--wing` override), one JSONL file per qualifying session.

**Transform pipeline, per session:**

1. Read `session` row (`id`, `title`, `directory`, `time_created`, `time_updated`).
2. Inject synthetic header as first user turn: `[session: | <directory> | <YYYY-MM-DD>]` → makes title/dir/date semantically searchable.
3. For each `message` ordered by `id`:
   - Read JSON `data` → get `role` (`user` / `assistant`).
   - For each `part` under the message, read JSON `data` → dispatch on `type`:
     - `text` → text block.
     - `tool` → Claude Code `tool_use` block + deferred `tool_result` as synthetic human message (the mempalace normalizer folds it back into the assistant turn via its `is_tool_only` branch).
     - `step-start` / `step-finish` → dropped as noise.
     - `reasoning` → kept, prefixed with `[reasoning]`.
4. Serialize as Claude Code JSONL (`{"type": "user"|"assistant", "message": {"content": [...]}}`) — the one convos format the miner already understands.
5. Stage at `~/.cache/mempalace-session/<wing>/<slug>_<id>.jsonl` with `mtime` = `session.time_updated` (deterministic, stable under dedup).

**Filters:**

- `--min-messages N` (default 3) — drops throwaway `/exit`'d sessions that would flood the palace.
- `--since YYYY-MM-DD` — incremental catch-up.
- `--session <id>` — one-shot mode.

**Then:** invokes `mempalace mine --mode convos` against the cache dir, followed by `mempalace repair` (unless `--no-repair`).

---

## 4. Setup recipe (new machine)

Assumes: opencode already installed, `~/.local/share/opencode/opencode.db` exists, `mempalace` CLI installed (v3.3.3+).

```bash
# 1. Clone mempalace-toolkit (holds the two wrappers in bin/)
git clone ssh://git@gitea.jordbo.se:2222/joakimp/mempalace-toolkit.git ~/mempalace-toolkit
cd ~/mempalace-toolkit

# 2. Install (symlinks bin/* into ~/.local/bin, adds loader to rc file)
./install.sh

# 3. Ensure ~/.local/bin is on PATH (installer warns if not)
export PATH="$HOME/.local/bin:$PATH"

# 4. Initialize palace if needed (one-time, platform-wide)
mempalace init --yes

# 5. Mine opencode history into the palace
mempalace-session --dry-run    # preview scope
mempalace-session              # do it for real (~20 min for ~60 sessions)

# 6. Mine project docs (per project)
mempalace-docs /workspace/my_project --dry-run
mempalace-docs /workspace/my_project

# 7. Restart any MCP-connected agent, or call mempalace_reconnect from inside one
```

### Containerized setup (devbox)

The devbox uses two named Docker volumes so these persist across container recreate:

- `devbox-palace` → `~/.mempalace/palace` (the palace itself)
- `devbox-data` → `~/.local/share/opencode` (opencode's SQLite DB)

Code at `/workspace/mempalace-toolkit` is a bind mount from the host — survives container recreate and syncs via gitea. Staging directories (`~/.cache/mempalace-{docs,session}/`) are ephemeral but cheap to rebuild.

**After container recreate**, just re-run `./install.sh` (idempotent) to relink `bin/` into the fresh `~/.local/bin/`.

---

## 5. Operational notes

### Dedup behavior

Both wrappers dedup via `mempalace mine`'s built-in key:

- `mempalace-docs`: keys on `source_file` path + `mtime` → edit a doc, it re-mines; unchanged files are skipped.
- `mempalace-session`: keys on `source_file` path alone (convos miner doesn't check mtime) → a session's JSONL filename is `<slug>_<id>.jsonl`, stable per session, so re-runs skip already-filed sessions. To force re-mining, delete the staging dir.

**Verified:** a second full `mempalace-session` run immediately after the first produces 0 new drawers. The only cost is the post-mine `repair` step (index rebuild — ~5 min on 5k drawers).

### When to re-mine

- `mempalace-docs`: after significant doc changes in a project.
- `mempalace-session`: opportunistically. Every few days catches new opencode sessions. Or wire to cron / systemd timer for true auto-save coverage (not yet done).

### Cost profile (reference)

Measured on a ~10-day opencode corpus of 140 sessions / 1491 messages / 4656 parts:

- Dry run: seconds.
- Full mine: **21 minutes** (38 min user CPU). Produced 2378 drawers from 62 qualifying sessions.
- Dedup re-run: mine step instant; only the repair runs (~5 min).

Scaling is roughly linear in message count. Budget ~20 minutes per 60-session batch.

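The budgeting rule in the cost profile can be turned into a quick back-of-the-envelope estimate. The sketch below is illustrative only and not part of either wrapper: `estimate_mine_minutes` is a hypothetical helper, and it assumes that the number of qualifying sessions is a usable proxy for message count, extrapolating linearly from the single reference measurement (21 minutes for 62 qualifying sessions). Real runs will vary with session length and hardware.

```python
# Hypothetical helper: linear extrapolation from the one measured run
# reported in the cost profile (21 minutes for 62 qualifying sessions).
REFERENCE_MINUTES = 21.0
REFERENCE_SESSIONS = 62


def estimate_mine_minutes(qualifying_sessions: int) -> float:
    """Estimate full-mine wall-clock minutes for a batch of sessions.

    Assumption: cost scales linearly with the number of qualifying
    sessions, used here as a stand-in for message count.
    """
    if qualifying_sessions < 0:
        raise ValueError("qualifying_sessions must be non-negative")
    return qualifying_sessions * REFERENCE_MINUTES / REFERENCE_SESSIONS


if __name__ == "__main__":
    for n in (10, 60, 200):
        print(f"{n:>4} sessions -> ~{estimate_mine_minutes(n):.0f} min")
```

The estimate covers the first full mine only. A dedup re-run skips the mine step, so its cost is dominated by the repair pass described above.
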
### Common failure modes

| Symptom | Cause | Fix |
| ------- | ----- | --- |
| `mempalace-session: command not found` after container recreate | `~/.local/bin` wiped with container | `cd ~/mempalace-toolkit && ./install.sh` |
| Search errors "Error finding id" post-mine | Stale HNSW index | `mempalace repair --yes` + `mempalace_reconnect` from MCP |
| Drawers doubled after re-mining a project | Someone renamed the wing or ran raw `mempalace mine` alongside the wrapper | Inspect `embedding_metadata` in `chroma.sqlite3`; purge duplicates by source prefix, then `mempalace repair` |
| Sessions missing from palace | Session has fewer than `--min-messages` messages | Lower the threshold or `--session <id>` explicitly |

---

## 6. Upstream roadmap

These gaps should ideally close upstream, making the wrappers thinner or obsolete:

1. **[MemPalace PR #1213](https://github.com/MemPalace/mempalace/pull/1213)** — `exclude_patterns` in `mempalace.yaml`. When merged, `mempalace-docs` shrinks to a thin shim (or disappears) since exclude-by-extension becomes a first-class config.
2. **Opencode session hooks** — [PR #16598](https://github.com/anomalyco/opencode/pull/16598) (session.stopping), [PR #16769](https://github.com/anomalyco/opencode/pull/16769) (shutdown), [PR #15224](https://github.com/anomalyco/opencode/pull/15224) (session.start), [issue #23503](https://github.com/anomalyco/opencode/issues/23503) (session.turn.completed). When at least one merges, opencode can fire hooks mempalace can receive.
3. **Opencode harness in `hooks_cli.py`** — mempalace's hooks CLI only knows `claude-code` + `codex` today. Adding `opencode` would let the auto-save diary path work on opencode too. Pairs with #2 above.
4. **SQLite mode for `mempalace mine --mode convos`** — if upstream ever adds direct SQLite ingest for opencode, `mempalace-session` loses its reason to exist (the export-to-JSONL dance goes away).

When #1 merges, retire `mempalace-docs` to a thin shim. When #2 + #3 land together, `mempalace-session` becomes a manual-only fallback (cron / backfill) while hooks handle live saves.

---

## 7. See also

- [`README.md`](README.md) — human-facing quickstart + per-tool usage reference.
- [`AGENTS.md`](AGENTS.md) — repo conventions for AI agents modifying this codebase.
- [`SKILL.md`](SKILL.md) — agent skill (producer side), symlinked into `~/.agents/skills/opencode-mempalace-bridge/` by `install.sh`.
- `~/.agents/skills/mempalace/SKILL.md` — agent skill for the **consumer** side (searching, diary, KG). Pair with `SKILL.md` in this repo.
- [`cli_utils`](https://gitea.jordbo.se/joakimp/cli_utils) — sibling repo: shell quality-of-life tools. Origin of these wrappers before the 2026-04-30 split.