# MemPalace Feeding Architecture

This repository wires [opencode](https://github.com/anomalyco/opencode) and arbitrary project directories into [MemPalace](https://github.com/MemPalace/mempalace) via two thin wrappers in `bin/`. This document explains why they exist and how they fit together.

**Audience:** someone setting up a new machine (or reviewing what's already set up) and asking "how does the palace actually get fed?". Pairs with the `mempalace` agent skill, which covers the *consumer* side (searching, diary, KG). This document covers the *producer* side.

---

## 1. The problem

MemPalace is a persistent memory layer for AI agents: vector search over drawers (chunks of verbatim content), a knowledge graph, and per-agent diaries, all behind an MCP server. To be useful it has to be *fed*: project docs, conversation transcripts, session summaries.

The stock mempalace CLI has two feeders:

| Feeder | What it ingests | Gap |
| --- | --- | --- |
| `mempalace mine` (default mode) | Any "readable" file in a directory (code + docs + misc) | Mines source code indiscriminately → embedding index floods with low-signal `__init__` fragments. |
| `mempalace mine --mode convos` | Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack, Codex JSONL | No opencode support. No SQLite support. Opencode persists its history in SQLite, not JSONL. |

And one auto-save path:

| Feeder | Harnesses supported | Gap |
| --- | --- | --- |
| `hooks_cli.py` (session-stop hooks) | `claude-code`, `codex` | No `opencode` harness → `/exit` mid-session leaves no diary entry behind. |

So on a machine using opencode + the "docs-first palace hygiene" policy, three gaps bite:

1. Mining a project floods the palace with source code we don't want.
2. Opencode session history is trapped in SQLite, invisible to `mine --mode convos`.
3. There's no auto-save on session stop; any persistence is a best-effort heuristic.

The two wrappers in `bin/` close gaps **1** and **2**. Gap **3** is upstream work (see §6).

---

## 2. The architecture

```
Project dirs (/workspace/*)                 Opencode SQLite DB
├── *.md                                    ~/.local/share/opencode/opencode.db
├── *.yaml                                  ├── session (id, title, directory, time_created/updated)
├── Dockerfile                              ├── message (session_id, data JSON w/ role)
└── …                                       └── part (message_id, data JSON w/ type: text|tool|…)
      │                                          │
      │                                          │
┌─────▼──────────┐                          ┌────▼──────────────┐
│ mempalace-docs │                          │ mempalace-session │
│ (bin/)         │                          │ (bin/)            │
│                │                          │                   │
│ stage docs     │                          │ export each       │
│ only via cp -p │                          │ session as Claude │
│ to cache dir   │                          │ Code JSONL to     │
│                │                          │ cache dir         │
└─────┬──────────┘                          └────┬──────────────┘
      │                                          │
      │ ~/.cache/mempalace-docs//                │ ~/.cache/mempalace-session//
      │                                          │
┌─────▼──────────┐                          ┌────▼──────────────┐
│ mempalace mine │                          │ mempalace mine    │
│                │                          │ --mode convos     │
└─────┬──────────┘                          └────┬──────────────┘
      │                                          │
      └───────────────────┬──────────────────────┘
                          │
                   ┌──────▼─────────┐
                   │ ChromaDB       │
                   │ ~/.mempalace/  │
                   │ palace/        │
                   └──────┬─────────┘
                          │
              MCP server (mempalace_*)
                          │
     AI agents (opencode, claude code, codex, …)
```

**Shared idiom:** *stage-to-cache-then-mine*. Neither wrapper reimplements the mempalace miner. They each:

1. Curate input (filter / transform / rename).
2. Write it to a deterministic path under `~/.cache/…//` with `mtime` preserved (via `cp -p` or explicit `os.utime`).
3. Delegate actual embedding + filing to `mempalace mine`, which already dedups on `source_file` path.

This keeps the wrappers thin. A third wrapper following the same idiom would justify factoring a shared helper library; two does not.

---

## 3. Component details

### `bin/mempalace-docs` (268 lines): docs-first mining

**Input:** a project directory.
**Output:** palace drawers in `wing_` (or `--wing` override), only from documentation-class files.

What it files: `*.md`, `*.mdx`, `*.rst`, `*.txt`, `*.yml`, `*.yaml`, `*.toml`, selective `*.json`, shell scripts, Dockerfiles, Makefiles, license/notice files.

What it drops: source code (`.py`, `.ts`, `.go`, `.rs`, …), lockfiles, `.git`, `.venv`, `node_modules`, `__pycache__`, build output.

**Implementation notes:**

- Reads `mempalace.yaml` (if present) to discover the actual wing name, which avoids drift if someone renamed the wing after init.
- Uses `cp -p` (not symlinks) because the miner skips symlinks (`miner.py` line 828).
- Auto-purges pre-existing drawers whose `source_file` is under the workspace path before re-mining, to prevent doubling on re-runs.
- Upstream [PR #1213](https://github.com/MemPalace/mempalace/pull/1213) will add `exclude_patterns` to `mempalace.yaml`; when merged, this wrapper should shrink to a thin shim.

### `bin/mempalace-session` (349 lines): opencode → palace bridge

**Input:** the opencode SQLite DB (default `~/.local/share/opencode/opencode.db`).
**Output:** palace drawers in `wing_conversations` (or `--wing` override), one JSONL file per qualifying session.

**Transform pipeline, per session:**

1. Read `session` row (`id`, `title`, `directory`, `time_created`, `time_updated`).
2. Inject synthetic header as first user turn: `[session: | <directory> | <YYYY-MM-DD>]` → makes title/dir/date semantically searchable.
3. For each `message` ordered by `id`:
   - Read JSON `data` → get `role` (`user` / `assistant`).
   - For each `part` under the message, read JSON `data` → dispatch on `type`:
     - `text` → text block.
     - `tool` → Claude Code `tool_use` block + deferred `tool_result` as synthetic human message (the mempalace normalizer folds it back into the assistant turn via its `is_tool_only` branch).
     - `step-start` / `step-finish` → dropped as noise.
     - `reasoning` → kept, prefixed with `[reasoning]`.
4. Serialize as Claude Code JSONL (`{"type": "user"|"assistant", "message": {"content": [...]}}`), the one convos format the miner already understands.
5. Stage at `~/.cache/mempalace-session/<wing>/<slug>_<id>.jsonl` with `mtime` = `session.time_updated` (deterministic, stable under dedup).

**Filters:**

- `--min-messages N` (default 3): drops throwaway `/exit`'d sessions that would flood the palace.
- `--since YYYY-MM-DD`: incremental catch-up.
- `--session <id>`: one-shot mode.

**Then:** invokes `mempalace mine --mode convos` against the cache dir. A post-mine `mempalace repair` is **opt-in** via `--repair`. It is intentionally *not* the default because the in-place HNSW rebuild has corrupted live palaces on past runs. Never pass `--repair` from an unattended schedule.

---

## 4. Setup recipe (new machine)

Assumes: opencode already installed, `~/.local/share/opencode/opencode.db` exists, `mempalace` CLI installed (v3.3.3+). If mempalace isn't installed yet, [`README.md`](README.md#installing-mempalace-itself-prerequisite) covers the `uv tool install mempalace` flow for both personal machines and the `/opt/uv-tools/` container pattern used by opencode-devbox.

```bash
# 1. Clone mempalace-toolkit (holds the two wrappers in bin/)
git clone ssh://git@gitea.jordbo.se:2222/joakimp/mempalace-toolkit.git ~/mempalace-toolkit
cd ~/mempalace-toolkit

# 2. Install (symlinks bin/* into ~/.local/bin, adds loader to rc file)
./install.sh

# 3. Ensure ~/.local/bin is on PATH (installer warns if not)
export PATH="$HOME/.local/bin:$PATH"

# 4. Mine opencode history into the palace
#    (No global init step needed: the palace is created on first write.
#    `mempalace init <dir>` is per-project, not global, and is optional.)
mempalace-session --dry-run   # preview scope
mempalace-session             # do it for real (~20 min for ~60 sessions)

# 5. Mine project docs (per project; run `mempalace init --yes <dir>`
#    first if you want to customize the wing name or entity detection)
mempalace-docs /workspace/my_project --dry-run
mempalace-docs /workspace/my_project

# 6. Restart any MCP-connected agent, or call mempalace_reconnect from inside one
```

### Containerized setup (devbox)

The devbox uses two named Docker volumes so these persist across container recreate:

- `devbox-palace` → `~/.mempalace/palace` (the palace itself)
- `devbox-data` → `~/.local/share/opencode` (opencode's SQLite DB)

Code at `/workspace/mempalace-toolkit` is a bind mount from the host, so it survives container recreate and syncs via gitea. Staging directories (`~/.cache/mempalace-{docs,session}/`) are ephemeral but cheap to rebuild.

**After container recreate**, just re-run `./install.sh` (idempotent) to relink `bin/` into the fresh `~/.local/bin/`.

---

## 5. Operational notes

### Dedup behavior

Both wrappers dedup via `mempalace mine`'s built-in key:

- `mempalace-docs`: keys on `source_file` path + `mtime` → edit a doc, it re-mines; unchanged files are skipped.
- `mempalace-session`: keys on `source_file` path alone (convos miner doesn't check mtime) → a session's JSONL filename is `<slug>_<id>.jsonl`, stable per session, so re-runs skip already-filed sessions. To force re-mining, delete the staging dir.

**Verified:** a second full `mempalace-session` run immediately after the first produces 0 new drawers. The only remaining cost is the post-mine `repair` step, and only when `--repair` is passed (index rebuild, ~5 min on 5k drawers).

### When to re-mine

- `mempalace-docs`: after significant doc changes in a project.
- `mempalace-session`: see the full **Operational Routine** below.
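The dedup rules above rely on each staged file keeping a stable path, and the session wrapper additionally pins `mtime` to `session.time_updated`. A minimal sketch of that staging step for a single session follows. It is not the wrapper's actual code: `slugify` and `stage_session` are hypothetical names, the slug rule is invented for illustration, and `time_updated` is assumed to be epoch seconds (the real column's unit is not stated in this document).

```python
import json
import os
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase, keep alphanumerics, join runs with hyphens (hypothetical rule)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def stage_session(cache_root: Path, wing: str, session_id: str, title: str,
                  time_updated: float, turns: list[dict]) -> Path:
    """Write one session as JSONL at a deterministic path with a pinned mtime.

    `time_updated` is assumed to be epoch seconds.
    """
    target_dir = cache_root / wing
    target_dir.mkdir(parents=True, exist_ok=True)
    # Stable name per session: a re-run overwrites the same file instead of
    # creating a new source_file path for the miner to file again.
    target = target_dir / f"{slugify(title)}_{session_id}.jsonl"
    with target.open("w", encoding="utf-8") as fh:
        for turn in turns:
            fh.write(json.dumps(turn, ensure_ascii=False) + "\n")
    # Pin atime/mtime to the session's last update so an unchanged session
    # yields a byte-identical file with an identical timestamp.
    os.utime(target, (time_updated, time_updated))
    return target
```

Calling `stage_session` twice with the same arguments returns the same path and leaves the same `mtime`, which is the property a path-based (or path + `mtime`) dedup key needs.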
### Operational Routine (the `mempalace-session` workflow)

Until opencode grows session hooks and `hooks_cli.py` grows an opencode harness (see §6), **`mempalace-session` is the entire mechanism that gets opencode conversations into the palace.** Skip it and your session history exists only inside `~/.local/share/opencode/opencode.db`, a local SQLite file that's invisible to `mempalace_search`, vulnerable to volume wipes, and lost if the devbox is replaced.

That makes the routine worth codifying:

#### Triggers (when to run)

| Trigger | What to run | Why |
|---|---|---|
| **Substantive session you want preserved past `/exit`** | `mempalace-session --session <id>` | Targeted save before destructive action; look up the ID in the `session` table of `~/.local/share/opencode/opencode.db`. |
| **Before a container recreate** | `mempalace-session` | The opencode DB lives in a named volume (`devbox-data`) so it normally survives, but a full mine right before is cheap insurance. |
| **Fresh machine, first provisioning** | `mempalace-session --dry-run` then `mempalace-session` | Backfills the whole corpus. Expect ~20 min / 60 sessions. |
| **Periodic sweep** | `mempalace-session` | A weekly run catches anything you didn't explicitly save. Dedup is free, so running more often costs nothing beyond the optional ~5 min repair (`--repair`). |
| **After upstream mempalace upgrade** | `mempalace-session` + `mempalace repair` | If the miner changed normalization or chunking, re-mine ensures the palace reflects current logic. Rare. |

#### Cadence (how often)

**Default: weekly.** Dedup is free on unchanged sessions, and `wing_conversations` growth is roughly linear in user activity. Weekly is frequent enough that searches almost always include recent context, and infrequent enough that the cost is negligible.

**Daily** is fine. Repair is now opt-in (`--repair`) and should never be set on an unattended schedule; run it manually from a quiet session if you suspect stale HNSW state.

**Monthly** is too infrequent.
You'll search for "that thing we discussed last Tuesday" and miss it.

#### Relationship to the session lifecycle

`mempalace-session` is **offline, inter-session maintenance**: it runs between agent sessions, not during them. It does not replace the in-session habits from the consumer-side `mempalace` skill:

| Habit | When | Who |
|---|---|---|
| Wake-up search (load recent diary) | Agent session start | Agent, during session |
| Wind-down diary write | Agent session end | Agent, during session |
| `mempalace-session` mine | Between sessions (manual or scheduled) | Operator or automation |

The first two are live; the third is batched. They're complementary, not alternatives. The next subsection explains why both matter.

#### Diary vs session mine: why keep both?

A reasonable question: *"if every session is mined into `wing_conversations` anyway, what's the point of the agent also writing a diary entry?"*

They're not redundant. They answer different questions and cover each other's failure modes.

| | Session mine (`wing_conversations`) | Diary (`wing_<agent>`) |
|---|---|---|
| Content | Every turn verbatim: prompts, responses, tool calls, dead ends, typos | Curated summary: what was decided, discovered, left pending |
| Granularity | One session ≈ 50–200 drawers | One session ≈ 1 drawer |
| Compression | None (raw JSONL → normalized turns) | High (AAAK dialect: dots + pipes + entity codes, ~30× reduction) |
| Written by | Nothing (extracted from `opencode.db`) | The agent that lived the session, at wind-down |
| Signal density | High noise (wrong turns, corrections, `/exit`'d threads) | High signal (agent's editorial judgment of what mattered) |
| Retrieval pattern | Semantic search (`mempalace_search("topic X")`) | Recency scan (`mempalace_diary_read(last_n=5)`) |
| Answers the question | *"What did we say exactly?"* | *"What did we accomplish / learn / decide?"* |

The distinguishing property of a diary entry is **editorial judgment by the author**.
The diary captures things that were *never said aloud during the session*, meta-observations the agent made about the session as a whole:

- *"this pattern came up again, worth remembering"*
- *"user caught the bug before I shipped it (lesson: verify CLI examples against `--help` first)"*
- *"10 commits across 3 repos today, all pushed"*
- *"healthy interruption: user stopped me before a long-running step"*

These are thoughts *about* the session, not utterances *during* it. Mining the raw turns will never surface them because the exact words were never spoken; they're the agent's reflection at wind-down.

**Three scenarios where the distinction matters in practice:**

1. **Wake-up token economics.** Reading `mempalace_diary_read(last_n=5)` returns five dense drawers, maybe 1–2k tokens total, 100% signal. Matching that orientation from the session mine would require semantic-searching for recent topics and reading chunks of raw turns: hundreds of drawers, tens of thousands of tokens, 90% noise.
2. **"What did we decide?" vs. "what did we say?"** If you ask *"when did we decide to split `mempalace-toolkit` from `cli_utils`?"* the diary gives you the crisp answer (date, trigger, rationale). The session mine gives you the actual seven-turn conversation that led up to the decision, including the turns where alternatives were considered. Both useful; different questions.
3. **Redundancy as safety.** If the agent `/exit`s without writing a diary (heuristic save missed it, no upstream hook), the session mine still catches the raw content. If `mempalace-session` hasn't run this week, the diary still captures the session's essence. The two systems cover each other's failure modes.

**Practical implications for how you work with mempalace:**

- **Don't skip diary writing** just because sessions are mined. A session without a diary entry is a session the next agent can read word-for-word but has no compressed summary of, which makes it expensive to orient against.
- **Don't skip session mining** just because agents write diaries. Diaries miss content (especially on `/exit`), and semantic search over raw turns is valuable when "what did we say exactly?" is the right question.
- **Do both, and let them specialize.** Treat the diary as your release notes (editorial, curated, recency-scanned) and the session mine as your git log (raw, searchable, complete). A repo keeps both; so should the palace.

If anything, automating session mining *increases* the value of diary entries. The agent can focus the diary on the parts mining cannot capture (meta-observations, self-critique, pattern noticing, pending work) rather than re-stating content the mine already has.

#### Automation

Pick one:

1. **systemd user timer** (recommended on Linux). Survives reboots, optional `Persistent=true` catch-up, logs to `journalctl`, background I/O priority. Templates in [`contrib/systemd/`](contrib/systemd/).
2. **launchd user agent** (recommended on macOS). The macOS-native equivalent: runs without a login session, logs to `~/Library/Logs/`, single-instance guarantees, `ProcessType=Background` throttling. Templates in [`contrib/launchd/`](contrib/launchd/).
3. **cron** (simplest, works on BSD and systemd-less Linux distros). Templates in [`contrib/cron/`](contrib/cron/).
4. **Manual**: run `mempalace-session` opportunistically. Fine on machines where you're in and out frequently; less fine on long-running devboxes.

Install recipes, verification commands, and uninstall steps for all four are in [`contrib/README.md`](contrib/README.md).
Quick-start (systemd user timer, Linux):

```bash
mkdir -p ~/.config/systemd/user
cp contrib/systemd/*.{service,timer} ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now mempalace-session.timer
# Optional on headless boxes: keep timer running when logged out
sudo loginctl enable-linger "$USER"
systemctl --user list-timers mempalace-session.timer
```

Quick-start (launchd user agent, macOS):

```bash
sed "s|USER|$USER|g" contrib/launchd/se.jordbo.mempalace-session.plist \
  > ~/Library/LaunchAgents/se.jordbo.mempalace-session.plist
mkdir -p ~/Library/Logs
launchctl bootstrap "gui/$(id -u)" ~/Library/LaunchAgents/se.jordbo.mempalace-session.plist
launchctl enable "gui/$(id -u)/se.jordbo.mempalace-session"
launchctl list | grep mempalace-session
```

Quick-start (cron):

```bash
sed "s|USER|$USER|g" contrib/cron/mempalace-session.cron \
  | (crontab -l 2>/dev/null; cat) | crontab -
mkdir -p ~/.cache/mempalace-session
```

#### Verification

After any run (manual or scheduled), confirm the palace grew:

```bash
mempalace-session --dry-run   # should list sessions
# Or from inside a live MCP client:
#   mempalace_status     : wing_conversations count
#   mempalace_reconnect  : refresh index after mine
```

A healthy run produces one of:

- **First run on fresh corpus**: several hundred to several thousand new drawers.
- **Incremental run**: zero to a few dozen new drawers (whatever grew since last run).
- **Rerun with no new activity**: zero new drawers; only the repair step runs, and only if `--repair` was passed.

A run that files far more drawers than expected may indicate a staging-dir wipe (forcing a full re-mine). Check `~/.cache/mempalace-session/<wing>/` modification times.

### Cost profile (reference)

Measured on a ~10-day opencode corpus of 140 sessions / 1491 messages / 4656 parts:

- Dry run: seconds.
- Full mine: **21 minutes** (38 min user CPU). Produced 2378 drawers from 62 qualifying sessions.
- Dedup re-run: mine step instant; only the repair takes time (~5 min), when it is requested with `--repair`.
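As a rough planning aid, the measured run above can be turned into a per-batch estimate. The sketch below assumes cost scales linearly with the number of qualifying sessions, which is only an approximation of the single data point reported here; `estimate_mine_minutes` and its constants are illustrative and not part of the toolkit.

```python
# Reference run from this document: 62 qualifying sessions mined in 21 minutes.
MEASURED_SESSIONS = 62
MEASURED_MINUTES = 21.0


def estimate_mine_minutes(qualifying_sessions: int) -> float:
    """Linear extrapolation from the one measured run (hypothetical helper)."""
    if qualifying_sessions < 0:
        raise ValueError("qualifying_sessions must be non-negative")
    return qualifying_sessions * MEASURED_MINUTES / MEASURED_SESSIONS


if __name__ == "__main__":
    for n in (60, 120, 300):
        print(f"{n} qualifying sessions: ~{estimate_mine_minutes(n):.0f} min")
```

Treat the result as a budget, not a prediction: sessions with many long tool outputs will take longer than short ones.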
Scaling is roughly linear in message count. Budget ~20 minutes per 60-session batch.

### Common failure modes

| Symptom | Cause | Fix |
| --- | --- | --- |
| `mempalace-session: command not found` after container recreate | `~/.local/bin` wiped with container | `cd ~/mempalace-toolkit && ./install.sh` |
| Search errors "Error finding id" post-mine | Stale HNSW index | `mempalace repair --yes` + `mempalace_reconnect` from MCP |
| Drawers doubled after re-mining a project | Someone renamed the wing or ran raw `mempalace mine` alongside the wrapper | Inspect `embedding_metadata` in `chroma.sqlite3`; purge duplicates by source prefix, then `mempalace repair` |
| Sessions missing from palace | Session has fewer than `--min-messages` messages | Lower the threshold or `--session <id>` explicitly |

---

## 6. Upstream roadmap

These gaps should ideally close upstream, making the wrappers thinner or obsolete:

1. **[MemPalace PR #1213](https://github.com/MemPalace/mempalace/pull/1213)**: `exclude_patterns` in `mempalace.yaml`. When merged, `mempalace-docs` shrinks to a thin shim (or disappears) since exclude-by-extension becomes a first-class config.
2. **Opencode session hooks**: [PR #16598](https://github.com/anomalyco/opencode/pull/16598) (session.stopping), [PR #16769](https://github.com/anomalyco/opencode/pull/16769) (shutdown), [PR #15224](https://github.com/anomalyco/opencode/pull/15224) (session.start), [issue #23503](https://github.com/anomalyco/opencode/issues/23503) (session.turn.completed). When at least one merges, opencode can fire hooks mempalace can receive.
3. **Opencode harness in `hooks_cli.py`**: mempalace's hooks CLI only knows `claude-code` + `codex` today. Adding `opencode` would let the auto-save diary path work on opencode too. Pairs with #2 above.
4. **SQLite mode for `mempalace mine --mode convos`**: if upstream ever adds direct SQLite ingest for opencode, `mempalace-session` loses its reason to exist (the export-to-JSONL dance goes away).

When #1 merges, retire `mempalace-docs` to a thin shim. When #2 + #3 land together, `mempalace-session` becomes a manual-only fallback (cron / backfill) while hooks handle live saves.

---

## 7. See also

- [`README.md`](README.md): human-facing quickstart + per-tool usage reference.
- [`AGENTS.md`](AGENTS.md): repo conventions for AI agents modifying this codebase.
- [`SKILL.md`](SKILL.md): agent skill (producer side), symlinked into `~/.agents/skills/opencode-mempalace-bridge/` by `install.sh`.
- [`extensions/pi/README.md`](extensions/pi/README.md): pi coding-agent bring-up, covering the MemPalace MCP bridge extension, mosh-friendly keybindings, settings template for starting pi without `--model`, and the `~/.config/pi/.env` + zsh loader pattern for AWS env vars. Out of scope for this document (which is producer-side feeding), but linked from `install.sh` which handles both.
- `~/.agents/skills/mempalace/SKILL.md`: agent skill for the **consumer** side (searching, diary, KG). Pair with `SKILL.md` in this repo.
- [`cli_utils`](https://gitea.jordbo.se/joakimp/cli_utils): sibling repo with shell quality-of-life tools. Origin of these wrappers before the 2026-04-30 split.