# MemPalace Feeding Architecture

This repository wires [opencode](https://github.com/anomalyco/opencode) and arbitrary project directories into [MemPalace](https://github.com/MemPalace/mempalace) via two thin wrappers in `bin/`. This document explains why they exist and how they fit together.

**Audience:** someone setting up a new machine (or reviewing what's already set up) and asking "how does the palace actually get fed?". Pairs with the `mempalace` agent skill, which covers the *consumer* side (searching, diary, KG). This document covers the *producer* side.

---

## 1. The problem

MemPalace is a persistent memory layer for AI agents. It combines vector search over drawers (chunks of verbatim content), a knowledge graph, and per-agent diaries, all behind an MCP server. To be useful it has to be *fed*: project docs, conversation transcripts, session summaries.

The stock mempalace CLI has two feeders:

| Feeder | What it ingests | Gap |
| --- | --- | --- |
| `mempalace mine` (default mode) | Any "readable" file in a directory (code + docs + misc) | Mines source code indiscriminately → embedding index floods with low-signal `__init__` fragments. |
| `mempalace mine --mode convos` | Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack, Codex JSONL | No opencode support. No SQLite support. Opencode persists its history in SQLite, not JSONL. |

And one auto-save path:

| Feeder | Harnesses supported | Gap |
| --- | --- | --- |
| `hooks_cli.py` (session-stop hooks) | `claude-code`, `codex` | No `opencode` harness → `/exit` mid-session leaves no diary entry behind. |

So on a machine using opencode + the "docs-first palace hygiene" policy, three gaps bite:

1. Mining a project floods the palace with source code we don't want.
2. Opencode session history is trapped in SQLite, invisible to `mine --mode convos`.
3. There's no auto-save on session stop, so any persistence is a best-effort heuristic.

The two wrappers in `bin/` close gaps **1** and **2**. Gap **3** is upstream work (see §6).

---

## 2. The architecture

```
  Project dirs (/workspace/*)               Opencode SQLite DB
    ├── *.md                                ~/.local/share/opencode/opencode.db
    ├── *.yaml                                ├── session (id, title, directory, time_created/updated)
    ├── Dockerfile                            ├── message (session_id, data JSON w/ role)
    └── …                                     └── part    (message_id, data JSON w/ type: text|tool|…)
          │                                        │
          │                                        │
    ┌─────▼──────────┐                        ┌────▼──────────────┐
    │ mempalace-docs │                        │ mempalace-session │
    │ (bin/)         │                        │ (bin/)            │
    │                │                        │                   │
    │ stage docs     │                        │ export each       │
    │ only via cp -p │                        │ session as Claude │
    │ to cache dir   │                        │ Code JSONL to     │
    │                │                        │ cache dir         │
    └─────┬──────────┘                        └────┬──────────────┘
          │                                        │
          │ ~/.cache/mempalace-docs/<wing>/        │ ~/.cache/mempalace-session/<wing>/
          │                                        │
    ┌─────▼──────────┐                        ┌────▼──────────────┐
    │ mempalace mine │                        │ mempalace mine    │
    │                │                        │ --mode convos     │
    └─────┬──────────┘                        └────┬──────────────┘
          │                                        │
          └───────────────────┬────────────────────┘
                              │
                       ┌──────▼─────────┐
                       │ ChromaDB       │
                       │ ~/.mempalace/  │
                       │ palace/        │
                       └──────┬─────────┘
                              │
                  MCP server (mempalace_*)
                              │
         AI agents (opencode, claude code, codex, …)
```

**Shared idiom:** *stage-to-cache-then-mine*.

Neither wrapper reimplements the mempalace miner. They each:

1. Curate input (filter / transform / rename).
2. Write it to a deterministic path under `~/.cache/…/<wing>/` with `mtime` preserved (via `cp -p` or explicit `os.utime`).
3. Delegate actual embedding + filing to `mempalace mine`, which already dedups on `source_file` path.
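
Steps 1 and 2 of the idiom can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the wrappers' actual code: `stage_files`, its parameters, and the `keep` callback are hypothetical names, and `shutil.copy2` stands in for `cp -p`. Step 3 (handing the staged directory to `mempalace mine`) is deliberately left out.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def stage_files(
    src_root: Path,
    cache_root: Path,
    wing: str,
    keep: Callable[[Path], bool],
) -> List[Path]:
    """Copy files accepted by `keep` into <cache_root>/<wing>/, preserving mtime.

    Hypothetical helper: the name and signature are illustrative only.
    """
    dest_root = cache_root / wing
    staged: List[Path] = []
    for src in sorted(src_root.rglob("*")):
        # Skip directories and symlinks; only regular files are staged.
        if src.is_symlink() or not src.is_file():
            continue
        rel = src.relative_to(src_root)
        if not keep(rel):
            continue
        dest = dest_root / rel  # deterministic: same input path, same staged path
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 carries mtime over, like `cp -p`
        staged.append(dest)
    return staged
```

Because the staged path depends only on the wing and the relative source path, re-running the sketch overwrites the same files instead of creating new ones, which is what lets path-based dedup work downstream.
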

This keeps the wrappers thin. A third wrapper following the same idiom would justify factoring out a shared helper library; two do not.

---

## 3. Component details

### `bin/mempalace-docs` (268 lines): docs-first mining

**Input:** a project directory.
**Output:** palace drawers in `wing_<directory-name>` (or `--wing` override), only from documentation-class files.

What it files: `*.md`, `*.mdx`, `*.rst`, `*.txt`, `*.yml`, `*.yaml`, `*.toml`, selective `*.json`, shell scripts, Dockerfiles, Makefiles, license/notice files.

What it drops: source code (`.py`, `.ts`, `.go`, `.rs`, …), lockfiles, `.git`, `.venv`, `node_modules`, `__pycache__`, build output.
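
The keep/drop rule above amounts to a predicate over project-relative paths. The sketch below is a minimal Python approximation: the function name, the concrete sets, the `.sh` suffix for shell scripts, and the lockfile endings are all assumptions for illustration, and the "selective `*.json`" and build-output rules are not modelled.

```python
from pathlib import Path

# Illustrative sets only; the wrapper's real lists may differ.
DOC_SUFFIXES = {".md", ".mdx", ".rst", ".txt", ".yml", ".yaml", ".toml", ".sh"}
DOC_NAMES = {"Dockerfile", "Makefile", "LICENSE", "NOTICE"}
SKIP_DIRS = {".git", ".venv", "node_modules", "__pycache__"}
LOCKFILE_ENDINGS = (".lock", "-lock.yaml", "-lock.json")


def is_doc_file(rel_path: Path) -> bool:
    """Return True if a project-relative path looks like documentation."""
    # Anything below a skipped directory is dropped, whatever its extension.
    if any(part in SKIP_DIRS for part in rel_path.parts[:-1]):
        return False
    name = rel_path.name
    if name.endswith(LOCKFILE_ENDINGS):
        return False
    return name in DOC_NAMES or rel_path.suffix.lower() in DOC_SUFFIXES
```

A predicate like this is what a staging step would call once per file before copying anything into the cache directory.
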

**Implementation notes:**

- Reads `mempalace.yaml` (if present) to discover the actual wing name, which avoids drift if someone renamed the wing after init.
- Uses `cp -p` (not symlinks) because the miner skips symlinks (`miner.py` line 828).
- Auto-purges pre-existing drawers whose `source_file` is under the workspace path before re-mining, to prevent doubling on re-runs.
- Upstream [PR #1213](https://github.com/MemPalace/mempalace/pull/1213) will add `exclude_patterns` to `mempalace.yaml`; when merged, this wrapper should shrink to a thin shim.

### `bin/mempalace-session` (349 lines): opencode → palace bridge

**Input:** the opencode SQLite DB (default `~/.local/share/opencode/opencode.db`).
**Output:** palace drawers in `wing_conversations` (or `--wing` override), one JSONL file per qualifying session.

**Transform pipeline, per session:**

1. Read `session` row (`id`, `title`, `directory`, `time_created`, `time_updated`).
2. Inject synthetic header as first user turn: `[session: <title> | <directory> | <YYYY-MM-DD>]`. This makes title/dir/date semantically searchable.
3. For each `message` ordered by `id`:
   - Read JSON `data` → get `role` (`user` / `assistant`).
   - For each `part` under the message, read JSON `data` → dispatch on `type`:
     - `text` → text block.
     - `tool` → Claude Code `tool_use` block + deferred `tool_result` as synthetic human message (the mempalace normalizer folds it back into the assistant turn via its `is_tool_only` branch).
     - `step-start` / `step-finish` → dropped as noise.
     - `reasoning` → kept, prefixed with `[reasoning]`.
4. Serialize as Claude Code JSONL (`{"type": "user"|"assistant", "message": {"content": [...]}}`), a convos format the miner already understands.
5. Stage at `~/.cache/mempalace-session/<wing>/<slug>_<id>.jsonl` with `mtime` = `session.time_updated` (deterministic, stable under dedup).
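
The pipeline above can be sketched in Python as follows. Treat this as a simplified illustration, not the wrapper's code: the function names are hypothetical, the `text` key inside a part, the epoch-millisecond unit of `time_updated`, the use of `time_updated` for the header date, and the exact `[reasoning] ` spacing are assumptions, and the `tool` branch is omitted entirely.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional

DROPPED_TYPES = {"step-start", "step-finish"}


def part_to_block(part: dict) -> Optional[dict]:
    """Map one decoded `part.data` dict to a content block, or None to drop it."""
    kind = part.get("type")
    if kind in DROPPED_TYPES:
        return None
    if kind == "text":
        return {"type": "text", "text": part.get("text", "")}  # `text` key is assumed
    if kind == "reasoning":
        return {"type": "text", "text": "[reasoning] " + part.get("text", "")}
    return None  # `tool` parts (tool_use + deferred tool_result) are not covered here


def session_to_jsonl(session: dict, messages: List[dict]) -> str:
    """Serialize one session as JSONL, with the synthetic header as first user turn."""
    day = datetime.fromtimestamp(
        session["time_updated"] / 1000, tz=timezone.utc  # assumes epoch milliseconds
    ).strftime("%Y-%m-%d")
    header = f"[session: {session['title']} | {session['directory']} | {day}]"
    turns = [{"type": "user", "message": {"content": [{"type": "text", "text": header}]}}]
    for msg in messages:  # caller passes messages already ordered by id
        blocks = [b for b in map(part_to_block, msg["parts"]) if b is not None]
        if blocks:  # messages with nothing but dropped parts produce no turn
            turns.append({"type": msg["role"], "message": {"content": blocks}})
    return "".join(json.dumps(turn) + "\n" for turn in turns)


def stage_session(jsonl: str, dest: Path, time_updated_ms: int) -> None:
    """Write the export and pin its mtime to the session's last update."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(jsonl, encoding="utf-8")
    stamp = time_updated_ms / 1000
    os.utime(dest, (stamp, stamp))
```

Pinning `mtime` with `os.utime` rather than leaving it at export time is what makes a re-export of an unchanged session byte-for-byte and timestamp-for-timestamp identical.
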

**Filters:**

- `--min-messages N` (default 3): drops throwaway `/exit`'d sessions that would flood the palace.
- `--since YYYY-MM-DD`: incremental catch-up.
- `--session <id>`: one-shot mode.
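
How the three filters might combine is sketched below. The function and parameter names are hypothetical, and two behaviours are assumptions rather than documented facts: that `--since` compares against `time_updated` (taken to be epoch milliseconds), and that `--session <id>` bypasses the message threshold.

```python
from datetime import date, datetime, timezone
from typing import Optional


def qualifies(
    session: dict,
    message_count: int,
    *,
    min_messages: int = 3,
    since: Optional[date] = None,
    only_id: Optional[str] = None,
) -> bool:
    """Decide whether one session should be exported (illustrative only)."""
    if only_id is not None:
        # One-shot mode: select exactly that session, ignoring other filters.
        return session["id"] == only_id
    if message_count < min_messages:
        return False
    if since is not None:
        updated = datetime.fromtimestamp(
            session["time_updated"] / 1000, tz=timezone.utc  # assumes epoch ms
        ).date()
        if updated < since:
            return False
    return True
```
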

**Then:** invokes `mempalace mine --mode convos` against the cache dir, followed by `mempalace repair` (unless `--no-repair`).

---

## 4. Setup recipe (new machine)

Assumes: opencode already installed, `~/.local/share/opencode/opencode.db` exists, `mempalace` CLI installed (v3.3.3+). If mempalace isn't installed yet, [`README.md`](README.md#installing-mempalace-itself-prerequisite) covers the `uv tool install mempalace` flow for both personal machines and the `/opt/uv-tools/` container pattern used by opencode-devbox.

```bash
# 1. Clone mempalace-toolkit (holds the two wrappers in bin/)
git clone ssh://git@gitea.jordbo.se:2222/joakimp/mempalace-toolkit.git ~/mempalace-toolkit
cd ~/mempalace-toolkit

# 2. Install (symlinks bin/* into ~/.local/bin, adds loader to rc file)
./install.sh

# 3. Ensure ~/.local/bin is on PATH (installer warns if not)
export PATH="$HOME/.local/bin:$PATH"

# 4. Mine opencode history into the palace
#    (No global init step needed: the palace is created on first write.
#    `mempalace init <dir>` is per-project, not global, and is optional.)
mempalace-session --dry-run   # preview scope
mempalace-session             # do it for real (~20 min for ~60 sessions)

# 5. Mine project docs (per project; run `mempalace init --yes <dir>`
#    first if you want to customize the wing name or entity detection)
mempalace-docs /workspace/my_project --dry-run
mempalace-docs /workspace/my_project

# 6. Restart any MCP-connected agent, or call mempalace_reconnect from inside one
```

### Containerized setup (devbox)

The devbox uses two named Docker volumes so that the following persist across container recreate:

- `devbox-palace` → `~/.mempalace/palace` (the palace itself)
- `devbox-data` → `~/.local/share/opencode` (opencode's SQLite DB)

Code at `/workspace/mempalace-toolkit` is a bind mount from the host, so it survives container recreate and syncs via gitea. Staging directories (`~/.cache/mempalace-{docs,session}/`) are ephemeral but cheap to rebuild.

**After container recreate**, just re-run `./install.sh` (idempotent) to relink `bin/` into the fresh `~/.local/bin/`.

---

## 5. Operational notes

### Dedup behavior

Both wrappers dedup via `mempalace mine`'s built-in key:

- `mempalace-docs`: keys on `source_file` path + `mtime` → edit a doc, it re-mines; unchanged files are skipped.
- `mempalace-session`: keys on `source_file` path alone (convos miner doesn't check mtime) → a session's JSONL filename is `<slug>_<id>.jsonl`, stable per session, so re-runs skip already-filed sessions. To force re-mining, delete the staging dir.
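
The difference between the two keys can be captured in a toy model. This is not the miner's code; `needs_mining`, the `filed` mapping, and the `check_mtime` flag are hypothetical names used only to make the two behaviours concrete.

```python
from typing import Dict


def needs_mining(
    source_file: str,
    mtime: float,
    filed: Dict[str, float],
    check_mtime: bool,
) -> bool:
    """Toy model of the dedup decision; `filed` maps source_file -> mtime at filing."""
    if source_file not in filed:
        return True  # never filed before
    # Docs mode re-mines when mtime changed; convos mode ignores mtime entirely.
    return check_mtime and filed[source_file] != mtime
```

With `check_mtime=True` (the docs behaviour) an edited file is picked up again; with `check_mtime=False` (the convos behaviour) a known path is always skipped, which is why a stable `<slug>_<id>.jsonl` filename matters.
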

**Verified:** a second full `mempalace-session` run immediately after the first produces 0 new drawers. The only cost is the post-mine `repair` step (index rebuild, ~5 min on 5k drawers).

### When to re-mine

- `mempalace-docs`: after significant doc changes in a project.
- `mempalace-session`: see the full **Operational Routine** below.

### Operational Routine (the `mempalace-session` workflow)

Until opencode grows session hooks and `hooks_cli.py` grows an opencode harness (see §6), **`mempalace-session` is the entire mechanism that gets opencode conversations into the palace.** Skip it and your session history exists only inside `~/.local/share/opencode/opencode.db`, a local SQLite file that's invisible to `mempalace_search`, vulnerable to volume wipes, and lost if the devbox is replaced.

That makes the routine worth codifying:

#### Triggers (when to run)

| Trigger | What to run | Why |
|---|---|---|
| **Substantive session you want preserved past `/exit`** | `mempalace-session --session <id>` | Targeted save before destructive action; look up the ID in the `session` table of `~/.local/share/opencode/opencode.db`. |
| **Before a container recreate** | `mempalace-session` | The opencode DB lives in a named volume (`devbox-data`) so it normally survives, but a full mine right before is cheap insurance. |
| **Fresh machine, first provisioning** | `mempalace-session --dry-run` then `mempalace-session` | Backfills the whole corpus. Expect ~20 min / 60 sessions. |
| **Periodic sweep** | `mempalace-session` | A weekly run catches anything you didn't explicitly save. Dedup is free, so running more often only costs the ~5 min repair. |
| **After upstream mempalace upgrade** | `mempalace-session` + `mempalace repair` | If the miner changed normalization or chunking, re-mine ensures the palace reflects current logic. Rare. |
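
To find the `<id>` for the targeted-save trigger, the `session` table can be read directly. The sketch below is one way to do that in Python, opening the database read-only; `recent_sessions` is a hypothetical helper, while the column names are the ones listed for the `session` row earlier in this document.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import List, Tuple

DEFAULT_DB = Path.home() / ".local/share/opencode/opencode.db"


def recent_sessions(db_path: Path = DEFAULT_DB, limit: int = 10) -> List[Tuple]:
    """Return (id, title, directory) for the most recently updated sessions."""
    # mode=ro opens the file read-only so opencode's data is never modified.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        return conn.execute(
            "SELECT id, title, directory FROM session "
            "ORDER BY time_updated DESC LIMIT ?",
            (limit,),
        ).fetchall()
```

Calling `recent_sessions()` and picking the first column of the relevant row gives the value to pass to `mempalace-session --session <id>`.
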

#### Cadence (how often)

**Default: weekly.** Dedup is free on unchanged sessions, and `wing_conversations` growth is roughly linear in user activity. Weekly is frequent enough that searches almost always include recent context, and infrequent enough that the cost is negligible.

**Daily** is fine but wasteful: you'll pay the post-mine `repair` cost seven times as often as you need to. If you want daily runs, add `--no-repair` and schedule a separate weekly repair.

**Monthly** is too infrequent. You'll search for "that thing we discussed last Tuesday" and miss it.

#### Relationship to the session lifecycle

`mempalace-session` is **offline, inter-session maintenance**: it runs between agent sessions, not during them. It does not replace the in-session habits from the consumer-side `mempalace` skill:

| Habit | When | Who |
|---|---|---|
| Wake-up search (load recent diary) | Agent session start | Agent, during session |
| Wind-down diary write | Agent session end | Agent, during session |
| `mempalace-session` mine | Between sessions (manual or scheduled) | Operator or automation |

The first two are live; the third is batched. They're complementary, not alternatives. A machine doing only wake-up/wind-down keeps a diary but loses the actual conversation turns. A machine doing only `mempalace-session` captures the raw turns but not the curated summaries. Do both.

#### Automation

Pick one:

1. **systemd user timer** (recommended on Linux). Survives reboots, optional `Persistent=true` catch-up, logs to `journalctl`, background I/O priority. Templates in [`contrib/systemd/`](contrib/systemd/).
2. **launchd user agent** (recommended on macOS). The macOS-native equivalent: runs without a login session, logs to `~/Library/Logs/`, single-instance guarantees, `ProcessType=Background` throttling. Templates in [`contrib/launchd/`](contrib/launchd/).
3. **cron** (simplest, works on BSD and systemd-less Linux distros). Templates in [`contrib/cron/`](contrib/cron/).
4. **Manual**: run `mempalace-session` opportunistically. Fine on machines where you're in and out frequently; less fine on long-running devboxes.

Install recipes, verification commands, and uninstall steps for all four are in [`contrib/README.md`](contrib/README.md).

Quick-start (systemd user timer, Linux):

```bash
mkdir -p ~/.config/systemd/user
cp contrib/systemd/*.{service,timer} ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now mempalace-session.timer
# Optional on headless boxes: keep timer running when logged out
sudo loginctl enable-linger "$USER"
systemctl --user list-timers mempalace-session.timer
```

Quick-start (launchd user agent, macOS):

```bash
sed "s|USER|$USER|g" contrib/launchd/se.jordbo.mempalace-session.plist \
  > ~/Library/LaunchAgents/se.jordbo.mempalace-session.plist
mkdir -p ~/Library/Logs
launchctl bootstrap "gui/$(id -u)" ~/Library/LaunchAgents/se.jordbo.mempalace-session.plist
launchctl enable "gui/$(id -u)/se.jordbo.mempalace-session"
launchctl list | grep mempalace-session
```

Quick-start (cron):

```bash
sed "s|USER|$USER|g" contrib/cron/mempalace-session.cron \
  | (crontab -l 2>/dev/null; cat) | crontab -
mkdir -p ~/.cache/mempalace-session
```

#### Verification

After any run (manual or scheduled), confirm the palace grew:

```bash
mempalace-session --dry-run   # should list sessions
# Or from inside a live MCP client:
#   mempalace_status: wing_conversations count
#   mempalace_reconnect: refresh index after mine
```

A healthy run produces one of:

- **First run on fresh corpus**: several hundred to several thousand new drawers.
- **Incremental run**: zero to a few dozen new drawers (whatever grew since last run).
- **Rerun with no new activity**: zero new drawers, only the repair step runs.

A run that files far more drawers than expected may indicate a staging-dir wipe (forcing a full re-mine); check `~/.cache/mempalace-session/<wing>/` modification times.

### Cost profile (reference)

Measured on a ~10-day opencode corpus of 140 sessions / 1491 messages / 4656 parts:

- Dry run: seconds.
- Full mine: **21 minutes** (38 min user CPU). Produced 2378 drawers from 62 qualifying sessions.
- Dedup re-run: mine step instant; only the repair runs (~5 min).

Scaling is roughly linear in message count. Budget ~20 minutes per 60-session batch.
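
The budget figure follows from a straight-line extrapolation of the single measured run. The sketch below makes that arithmetic explicit; the function name is hypothetical, and it uses qualifying sessions as the unit (as the budget does), which assumes a similar average number of messages per session.

```python
MEASURED_MINUTES = 21.0  # full mine, from the measurement above
MEASURED_SESSIONS = 62   # qualifying sessions in that run


def estimate_mine_minutes(qualifying_sessions: int) -> float:
    """Linear extrapolation from one data point; a rough budget, not a guarantee."""
    return qualifying_sessions * MEASURED_MINUTES / MEASURED_SESSIONS
```

For a 60-session batch this gives 60 × 21 / 62 ≈ 20.3 minutes, which is where the "~20 minutes per 60-session batch" rule of thumb comes from.
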

### Common failure modes

| Symptom | Cause | Fix |
| --- | --- | --- |
| `mempalace-session: command not found` after container recreate | `~/.local/bin` wiped with container | `cd ~/mempalace-toolkit && ./install.sh` |
| Search errors "Error finding id" post-mine | Stale HNSW index | `mempalace repair --yes` + `mempalace_reconnect` from MCP |
| Drawers doubled after re-mining a project | Someone renamed the wing or ran raw `mempalace mine` alongside the wrapper | Inspect `embedding_metadata` in `chroma.sqlite3`; purge duplicates by source prefix, then `mempalace repair` |
| Sessions missing from palace | Session has fewer than `--min-messages` messages | Lower the threshold or `--session <id>` explicitly |

---

## 6. Upstream roadmap

These gaps should ideally close upstream, making the wrappers thinner or obsolete:

1. **[MemPalace PR #1213](https://github.com/MemPalace/mempalace/pull/1213)**: `exclude_patterns` in `mempalace.yaml`. When merged, `mempalace-docs` shrinks to a thin shim (or disappears) since exclude-by-extension becomes a first-class config.
2. **Opencode session hooks**: [PR #16598](https://github.com/anomalyco/opencode/pull/16598) (session.stopping), [PR #16769](https://github.com/anomalyco/opencode/pull/16769) (shutdown), [PR #15224](https://github.com/anomalyco/opencode/pull/15224) (session.start), [issue #23503](https://github.com/anomalyco/opencode/issues/23503) (session.turn.completed). When at least one merges, opencode can fire hooks mempalace can receive.
3. **Opencode harness in `hooks_cli.py`**: mempalace's hooks CLI only knows `claude-code` + `codex` today. Adding `opencode` would let the auto-save diary path work on opencode too. Pairs with #2 above.
4. **SQLite mode for `mempalace mine --mode convos`**: if upstream ever adds direct SQLite ingest for opencode, `mempalace-session` loses its reason to exist (the export-to-JSONL dance goes away).

When #1 merges, retire `mempalace-docs` to a thin shim. When #2 + #3 land together, `mempalace-session` becomes a fallback for backfill and scheduled sweeps (cron) while hooks handle live saves.

---

## 7. See also

- [`README.md`](README.md): human-facing quickstart + per-tool usage reference.
- [`AGENTS.md`](AGENTS.md): repo conventions for AI agents modifying this codebase.
- [`SKILL.md`](SKILL.md): agent skill (producer side), symlinked into `~/.agents/skills/opencode-mempalace-bridge/` by `install.sh`.
- `~/.agents/skills/mempalace/SKILL.md`: agent skill for the **consumer** side (searching, diary, KG). Pair with `SKILL.md` in this repo.
- [`cli_utils`](https://gitea.jordbo.se/joakimp/cli_utils): sibling repo with shell quality-of-life tools. Origin of these wrappers before the 2026-04-30 split.