Producer-side MemPalace tooling: two bash wrappers that bridge opencode session history and project documentation into the palace.

Originally developed in cli_utils (2026-04-28); split into its own repo on 2026-04-30 because the conceptual fit was weak — cli_utils is interactive shell tooling, while this is agent memory infrastructure with its own architecture, dependency surface, and growth trajectory.

Contents:

- bin/mempalace-docs — docs-only mining wrapper (originally a2ddcc9 in cli_utils), bridges the gap until MemPalace PR #1213 (exclude_patterns) merges upstream.
- bin/mempalace-session — opencode → palace session bridge (originally dacca0e in cli_utils). Reads ~/.local/share/opencode/opencode.db, exports each session to Claude Code JSONL, mines via 'mempalace mine --mode convos'. Bridges the gap until opencode session-stopping hooks + an opencode harness in hooks_cli.py land upstream.
- ARCHITECTURE.md — canonical spec: architecture diagram, component details, setup recipe, operational notes, upstream-retirement roadmap. Originally a4cf314 in cli_utils.
- SKILL.md — companion agent skill (producer side). Pairs with the consumer-side mempalace skill. Symlinked into ~/.agents/skills/opencode-mempalace-bridge/ by install.sh.
- install.sh — idempotent installer, also handles --uninstall.
- AGENTS.md — repo conventions.

History of the individual files is not preserved in this split; see cli_utils (gitea.jordbo.se/joakimp/cli_utils) commits a2ddcc9, dacca0e, and a4cf314 for the original authorship context.

# MemPalace Feeding Architecture

This repository wires [opencode](https://github.com/anomalyco/opencode) and arbitrary project directories into [MemPalace](https://github.com/MemPalace/mempalace) via two thin wrappers in `bin/`. This document explains why they exist and how they fit together.

**Audience:** someone setting up a new machine (or reviewing what's already set up) and asking "how does the palace actually get fed?". Pairs with the `mempalace` agent skill, which covers the *consumer* side (searching, diary, KG). This document covers the *producer* side.

---

## 1. The problem

MemPalace is a persistent memory layer for AI agents — vector search over drawers (chunks of verbatim content), a knowledge graph, and per-agent diaries, all behind an MCP server. To be useful it has to be *fed*: project docs, conversation transcripts, session summaries.

The stock mempalace CLI has two feeders:

| Feeder | What it ingests | Gap |
| ------------------------------------- | -------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| `mempalace mine` (default mode) | Any "readable" file in a directory (code + docs + misc) | Mines source code indiscriminately → embedding index floods with low-signal `__init__` fragments. |
| `mempalace mine --mode convos` | Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack, Codex JSONL | No opencode support. No SQLite support. Opencode persists its history in SQLite, not JSONL. |

And one auto-save path:

| Feeder | Harnesses supported | Gap |
| ------------------------------------- | -------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| `hooks_cli.py` (session-stop hooks) | `claude-code`, `codex` | No `opencode` harness → `/exit` mid-session leaves no diary entry behind. |

So on a machine using opencode + the "docs-first palace hygiene" policy, three gaps bite:

1. Mining a project floods the palace with source code we don't want.
2. Opencode session history is trapped in SQLite, invisible to `mine --mode convos`.
3. There's no auto-save on session stop — any persistence is best-effort heuristic.

The two wrappers in `bin/` close gaps **1** and **2**. Gap **3** is upstream work (see §6).

---

## 2. The architecture

```
Project dirs (/workspace/*)                 Opencode SQLite DB
  ├── *.md                                  ~/.local/share/opencode/opencode.db
  ├── *.yaml                                  ├── session (id, title, directory, time_created/updated)
  ├── Dockerfile                              ├── message (session_id, data JSON w/ role)
  └── …                                       └── part    (message_id, data JSON w/ type: text|tool|…)
        │                                          │
        │                                          │
  ┌─────▼──────────┐                          ┌────▼──────────────┐
  │ mempalace-docs │                          │ mempalace-session │
  │ (bin/)         │                          │ (bin/)            │
  │                │                          │                   │
  │ stage docs     │                          │ export each       │
  │ only via cp -p │                          │ session as Claude │
  │ to cache dir   │                          │ Code JSONL to     │
  │                │                          │ cache dir         │
  └─────┬──────────┘                          └────┬──────────────┘
        │                                          │
        │ ~/.cache/mempalace-docs/<wing>/          │ ~/.cache/mempalace-session/<wing>/
        │                                          │
  ┌─────▼──────────┐                          ┌────▼──────────────┐
  │ mempalace mine │                          │ mempalace mine    │
  │                │                          │ --mode convos     │
  └─────┬──────────┘                          └────┬──────────────┘
        │                                          │
        └───────────────────┬──────────────────────┘
                            │
                     ┌──────▼─────────┐
                     │ ChromaDB       │
                     │ ~/.mempalace/  │
                     │ palace/        │
                     └──────┬─────────┘
                            │
                MCP server (mempalace_*)
                            │
       AI agents (opencode, claude code, codex, …)
```

**Shared idiom:** *stage-to-cache-then-mine*.

Neither wrapper reimplements the mempalace miner. They each:

1. Curate input (filter / transform / rename).
2. Write it to a deterministic path under `~/.cache/…/<wing>/` with `mtime` preserved (via `cp -p` or explicit `os.utime`).
3. Delegate actual embedding + filing to `mempalace mine`, which already dedups on `source_file` path.

This keeps the wrappers thin. A third wrapper following the same idiom would justify factoring out a shared helper library; two do not.
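
The staging step of this idiom can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the wrappers' actual code: the function names `stage_copy` and `stage_text` are hypothetical, `shutil.copy2` stands in for `cp -p`, and the final `mempalace mine` call is left out.

```python
import os
import shutil
from pathlib import Path


def stage_copy(src: Path, cache_dir: Path, rel: Path) -> Path:
    """Copy an existing file to cache_dir/rel, keeping its mtime (like `cp -p`)."""
    dest = cache_dir / rel
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # copy2 also copies the modification time
    return dest


def stage_text(text: str, cache_dir: Path, name: str, mtime: float) -> Path:
    """Write generated content to cache_dir/name, then pin its mtime explicitly."""
    dest = cache_dir / name
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(text, encoding="utf-8")
    os.utime(dest, (mtime, mtime))  # (atime, mtime), both in seconds
    return dest
```

Because both the destination path and the `mtime` are derived from the input alone, staging the same unchanged input twice should produce a file the miner can recognize as already seen.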

---

## 3. Component details

### `bin/mempalace-docs` (268 lines) — docs-first mining

**Input:** a project directory.
**Output:** palace drawers in `wing_<directory-name>` (or `--wing` override), only from documentation-class files.

What it files: `*.md`, `*.mdx`, `*.rst`, `*.txt`, `*.yml`, `*.yaml`, `*.toml`, selective `*.json`, shell scripts, Dockerfiles, Makefiles, license/notice files.

What it drops: source code (`.py`, `.ts`, `.go`, `.rs`, …), lockfiles, `.git`, `.venv`, `node_modules`, `__pycache__`, build output.
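
The keep/drop decision described above might look roughly like this in Python. Everything here is an assumption for illustration: `is_doc_file` is a hypothetical name, the sets are transcribed from the two paragraphs above rather than from the wrapper, `.sh` stands in for "shell scripts", and the lockfile test is only a guess at a reasonable rule. The "selective `*.json`" rule and build-output directories are omitted because the text does not spell them out.

```python
from pathlib import Path

# Hypothetical allow/deny lists, transcribed from the prose above.
DOC_SUFFIXES = {".md", ".mdx", ".rst", ".txt", ".yml", ".yaml", ".toml", ".sh"}
DOC_NAMES = {"Dockerfile", "Makefile", "LICENSE", "NOTICE"}
SKIP_DIRS = {".git", ".venv", "node_modules", "__pycache__"}


def is_doc_file(rel_path: str) -> bool:
    """Return True if a project-relative path looks like documentation-class content."""
    path = Path(rel_path)
    # Anything below an excluded directory is dropped, whatever its extension.
    if any(part in SKIP_DIRS for part in path.parts[:-1]):
        return False
    # Assumed lockfile rule: `*.lock` or a stem ending in `-lock`.
    if path.suffix == ".lock" or path.stem.endswith("-lock"):
        return False
    return path.name in DOC_NAMES or path.suffix.lower() in DOC_SUFFIXES
```

An allow-list keeps the default safe: an unknown extension is dropped rather than mined.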

**Implementation notes:**

- Reads `mempalace.yaml` (if present) to discover the actual wing name — avoids drift if someone renamed the wing after init.
- Uses `cp -p` (not symlinks) because the miner skips symlinks (`miner.py` line 828).
- Auto-purges pre-existing drawers whose `source_file` is under the workspace path before re-mining, to prevent doubling on re-runs.
- Upstream [PR #1213](https://github.com/MemPalace/mempalace/pull/1213) will add `exclude_patterns` to `mempalace.yaml` — when merged, this wrapper should shrink to a thin shim.

### `bin/mempalace-session` (349 lines) — opencode → palace bridge

**Input:** the opencode SQLite DB (default `~/.local/share/opencode/opencode.db`).
**Output:** palace drawers in `wing_conversations` (or `--wing` override), one JSONL file per qualifying session.

**Transform pipeline, per session:**

1. Read `session` row (`id`, `title`, `directory`, `time_created`, `time_updated`).
2. Inject synthetic header as first user turn: `[session: <title> | <directory> | <YYYY-MM-DD>]` → makes title/dir/date semantically searchable.
3. For each `message` ordered by `id`:
   - Read JSON `data` → get `role` (`user` / `assistant`).
   - For each `part` under the message, read JSON `data` → dispatch on `type`:
     - `text` → text block.
     - `tool` → Claude Code `tool_use` block + deferred `tool_result` as synthetic human message (the mempalace normalizer folds it back into the assistant turn via its `is_tool_only` branch).
     - `step-start` / `step-finish` → dropped as noise.
     - `reasoning` → kept, prefixed with `[reasoning]`.
4. Serialize as Claude Code JSONL (`{"type": "user"|"assistant", "message": {"content": [...]}}`) — the one convos format the miner already understands.
5. Stage at `~/.cache/mempalace-session/<wing>/<slug>_<id>.jsonl` with `mtime` = `session.time_updated` (deterministic, stable under dedup).
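
Steps 2 to 4 can be sketched as below. This is an illustration under several assumptions, not the wrapper's code: the part fields `text`, `id`, `tool`, `input` and `output` are hypothetical names for whatever opencode stores in a part's JSON, timestamps are assumed to be epoch milliseconds, the header date is assumed to come from `time_created` in UTC, and unknown part types are dropped along with `step-start` / `step-finish`.

```python
import json
from datetime import datetime, timezone


def session_header(title: str, directory: str, time_created_ms: int) -> str:
    """Build the synthetic header text (step 2). Assumes epoch milliseconds, UTC."""
    day = datetime.fromtimestamp(time_created_ms / 1000, tz=timezone.utc)
    return f"[session: {title} | {directory} | {day.strftime('%Y-%m-%d')}]"


def part_to_blocks(part: dict) -> list:
    """Map one part to content blocks (step 3). Field names are hypothetical."""
    kind = part.get("type")
    if kind == "text":
        return [{"type": "text", "text": part.get("text", "")}]
    if kind == "reasoning":
        return [{"type": "text", "text": "[reasoning] " + part.get("text", "")}]
    if kind == "tool":
        return [{"type": "tool_use", "id": part.get("id", ""),
                 "name": part.get("tool", ""), "input": part.get("input", {})}]
    return []  # step-start / step-finish (and, by assumption, unknown types)


def message_to_lines(role: str, parts: list) -> list:
    """Serialize one message, deferring tool results to a synthetic user turn."""
    blocks, results = [], []
    for part in parts:
        blocks.extend(part_to_blocks(part))
        if part.get("type") == "tool":
            results.append({"type": "tool_result", "tool_use_id": part.get("id", ""),
                            "content": str(part.get("output", ""))})
    lines = []
    if blocks:
        lines.append(json.dumps({"type": role, "message": {"content": blocks}}))
    if results:
        lines.append(json.dumps({"type": "user", "message": {"content": results}}))
    return lines


def session_to_jsonl(title: str, directory: str, time_created_ms: int, messages: list) -> str:
    """Join header and messages into JSONL (step 4). messages: [(role, parts), ...]."""
    header = {"type": "text", "text": session_header(title, directory, time_created_ms)}
    lines = [json.dumps({"type": "user", "message": {"content": [header]}})]
    for role, parts in messages:
        lines.extend(message_to_lines(role, parts))
    return "\n".join(lines) + "\n"
```

Each output line follows the `{"type": ..., "message": {"content": [...]}}` shape quoted in step 4; writing the result to the staging path with a pinned `mtime` (step 5) is a separate concern.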

**Filters:**

- `--min-messages N` (default 3) — drops throwaway `/exit`'d sessions that would flood the palace.
- `--since YYYY-MM-DD` — incremental catch-up.
- `--session <id>` — one-shot mode.
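
The first two filters amount to a grouped query over the `session` and `message` tables. The sketch below assumes the columns shown in the architecture diagram, assumes timestamps are epoch milliseconds, and assumes `--since` compares against `time_updated`; `qualifying_sessions` and `day_to_ms` are hypothetical names, not the wrapper's.

```python
import sqlite3
from datetime import datetime, timezone


def day_to_ms(day: str) -> int:
    """Convert YYYY-MM-DD (taken as UTC midnight) to epoch milliseconds."""
    parsed = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


def qualifying_sessions(conn: sqlite3.Connection, min_messages: int = 3, since=None) -> list:
    """Return ids of sessions with enough messages, optionally updated since a date."""
    since_ms = day_to_ms(since) if since else 0
    rows = conn.execute(
        """
        SELECT s.id
        FROM session AS s
        LEFT JOIN message AS m ON m.session_id = s.id
        WHERE s.time_updated >= ?
        GROUP BY s.id
        HAVING COUNT(m.id) >= ?
        ORDER BY s.time_updated, s.id
        """,
        (since_ms, min_messages),
    ).fetchall()
    return [row[0] for row in rows]
```

Pointing the connection at a copy of `opencode.db` rather than the live file is a cautious way to try this out.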

**Then:** invokes `mempalace mine --mode convos` against the cache dir, followed by `mempalace repair` (unless `--no-repair`).

---

## 4. Setup recipe (new machine)

Assumes: opencode already installed, `~/.local/share/opencode/opencode.db` exists, `mempalace` CLI installed (v3.3.3+).

```bash
# 1. Clone mempalace-toolkit (holds the two wrappers in bin/)
git clone ssh://git@gitea.jordbo.se:2222/joakimp/mempalace-toolkit.git ~/mempalace-toolkit
cd ~/mempalace-toolkit

# 2. Install (symlinks bin/* into ~/.local/bin, adds loader to rc file)
./install.sh

# 3. Ensure ~/.local/bin is on PATH (installer warns if not)
export PATH="$HOME/.local/bin:$PATH"

# 4. Initialize palace if needed (one-time, platform-wide)
mempalace init --yes

# 5. Mine opencode history into the palace
mempalace-session --dry-run   # preview scope
mempalace-session             # do it for real (~20 min for ~60 sessions)

# 6. Mine project docs (per project)
mempalace-docs /workspace/my_project --dry-run
mempalace-docs /workspace/my_project

# 7. Restart any MCP-connected agent, or call mempalace_reconnect from inside one
```

### Containerized setup (devbox)

The devbox uses two named Docker volumes so that the palace and the opencode database persist across container recreation:

- `devbox-palace` → `~/.mempalace/palace` (the palace itself)
- `devbox-data` → `~/.local/share/opencode` (opencode's SQLite DB)

Code at `/workspace/mempalace-toolkit` is a bind mount from the host, so it survives container recreation and syncs via gitea. Staging directories (`~/.cache/mempalace-{docs,session}/`) are ephemeral but cheap to rebuild.

**After a container recreate**, re-run `./install.sh` (idempotent) to relink `bin/` into the fresh `~/.local/bin/`.

---

## 5. Operational notes

### Dedup behavior

Both wrappers dedup via `mempalace mine`'s built-in key:

- `mempalace-docs`: keys on `source_file` path + `mtime` → edit a doc, it re-mines; unchanged files are skipped.
- `mempalace-session`: keys on `source_file` path alone (convos miner doesn't check mtime) → a session's JSONL filename is `<slug>_<id>.jsonl`, stable per session, so re-runs skip already-filed sessions. To force re-mining, delete the staging dir.

**Verified:** a second full `mempalace-session` run immediately after the first produces 0 new drawers. The only cost is the post-mine `repair` step (index rebuild — ~5 min on 5k drawers).
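
The difference between the two keys can be modeled in a few lines. This models the behavior described above, not the miner's internals; `docs_key`, `session_key` and `needs_mining` are hypothetical names.

```python
def docs_key(source_file: str, mtime: float) -> tuple:
    """Default-mode key as described above: path plus mtime."""
    return (source_file, mtime)


def session_key(source_file: str, mtime: float) -> tuple:
    """Convos-mode key as described above: path only, mtime ignored."""
    return (source_file,)


def needs_mining(seen: set, key: tuple) -> bool:
    """Return True (and remember the key) the first time a key is offered."""
    if key in seen:
        return False
    seen.add(key)
    return True
```

Under this model an edited doc (same path, newer `mtime`) is mined again, while a session file that has grown since the last run is skipped until its staged copy is removed.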

### When to re-mine

- `mempalace-docs`: after significant doc changes in a project.
- `mempalace-session`: opportunistically. Running it every few days catches new opencode sessions. Alternatively, wire it to cron or a systemd timer for true auto-save coverage (not yet done).

### Cost profile (reference)

Measured on a ~10-day opencode corpus of 140 sessions / 1491 messages / 4656 parts:

- Dry run: seconds.
- Full mine: **21 minutes** (38 min user CPU). Produced 2378 drawers from 62 qualifying sessions.
- Dedup re-run: mine step instant; only the repair runs (~5 min).

Scaling is roughly linear in message count. Budget ~20 minutes per batch of 60 qualifying sessions.
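
As a back-of-the-envelope check on that budget, the single measurement above can be turned into a per-session rate. The sketch uses qualifying sessions as a proxy for message count, since the text does not give message counts for the 62 qualifying sessions alone; `estimate_minutes` is a hypothetical helper and the result is only as good as that one data point.

```python
# Figures taken from the measurement above.
MEASURED_MINUTES = 21
MEASURED_SESSIONS = 62
MEASURED_DRAWERS = 2378


def estimate_minutes(qualifying_sessions: int) -> float:
    """Linear extrapolation of full-mine wall time from the one measured run."""
    return qualifying_sessions * MEASURED_MINUTES / MEASURED_SESSIONS


drawers_per_session = MEASURED_DRAWERS / MEASURED_SESSIONS

print(f"{estimate_minutes(60):.1f} min for 60 sessions")
print(f"{drawers_per_session:.1f} drawers per qualifying session")
```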

### Common failure modes

| Symptom | Cause | Fix |
| ---------------------------------------------- | ----------------------------------------------------- | --------------------------------------------------------- |
| `mempalace-session: command not found` after container recreate | `~/.local/bin` wiped with container | `cd ~/mempalace-toolkit && ./install.sh` |
| Search errors "Error finding id" post-mine | Stale HNSW index | `mempalace repair --yes` + `mempalace_reconnect` from MCP |
| Drawers doubled after re-mining a project | Someone renamed the wing or ran raw `mempalace mine` alongside the wrapper | Inspect `embedding_metadata` in `chroma.sqlite3`; purge duplicates by source prefix, then `mempalace repair` |
| Sessions missing from palace | Session has fewer than `--min-messages` messages | Lower the threshold or `--session <id>` explicitly |

---

## 6. Upstream roadmap

These gaps should ideally close upstream, making the wrappers thinner or obsolete:

1. **[MemPalace PR #1213](https://github.com/MemPalace/mempalace/pull/1213)** — `exclude_patterns` in `mempalace.yaml`. When merged, `mempalace-docs` shrinks to a thin shim (or disappears) since exclude-by-extension becomes a first-class config.
2. **Opencode session hooks** — [PR #16598](https://github.com/anomalyco/opencode/pull/16598) (session.stopping), [PR #16769](https://github.com/anomalyco/opencode/pull/16769) (shutdown), [PR #15224](https://github.com/anomalyco/opencode/pull/15224) (session.start), [issue #23503](https://github.com/anomalyco/opencode/issues/23503) (session.turn.completed). When at least one merges, opencode can fire hooks mempalace can receive.
3. **Opencode harness in `hooks_cli.py`** — mempalace's hooks CLI only knows `claude-code` + `codex` today. Adding `opencode` would let the auto-save diary path work on opencode too. Pairs with #2 above.
4. **SQLite mode for `mempalace mine --mode convos`** — if upstream ever adds direct SQLite ingest for opencode, `mempalace-session` loses its reason to exist (the export-to-JSONL dance goes away).

When #1 merges, retire `mempalace-docs` to a thin shim. When #2 + #3 land together, `mempalace-session` becomes a manual-only fallback (cron / backfill) while hooks handle live saves.

---

## 7. See also

- [`README.md`](README.md) — human-facing quickstart + per-tool usage reference.
- [`AGENTS.md`](AGENTS.md) — repo conventions for AI agents modifying this codebase.
- [`SKILL.md`](SKILL.md) — agent skill (producer side), symlinked into `~/.agents/skills/opencode-mempalace-bridge/` by `install.sh`.
- `~/.agents/skills/mempalace/SKILL.md` — agent skill for the **consumer** side (searching, diary, KG). Pair with `SKILL.md` in this repo.
- [`cli_utils`](https://gitea.jordbo.se/joakimp/cli_utils) — sibling repo: shell quality-of-life tools. Origin of these wrappers before the 2026-04-30 split.