Joakim Persson 36845e14b2 Document the operational routine + ship automation templates
Until opencode session-stopping hooks land upstream, mempalace-session
is the entire mechanism that gets opencode conversations into the
palace — skip it and session history stays trapped in a local SQLite
DB, invisible to semantic search. Previous docs covered setup well
but were thin on when and how often to run it.

- ARCHITECTURE.md §5: replace the one-line 'When to re-mine' note with
  a full Operational Routine section — triggers, cadence, relationship
  to the session lifecycle, automation pointers, verification.
- SKILL.md: add an Operational Routine section aimed at agents —
  when to suggest invoking the tool, cadence guidance, how to
  distinguish this producer-side tool from the consumer-side
  mempalace skill's in-session habits.
- README.md: add 'Keeping it fresh' subsection pointing at contrib/
  and the full docs.

contrib/ ships three ready-to-use templates:
- systemd/mempalace-session.{service,timer} — user units with weekly
  Mon 03:00 schedule, Persistent=true catch-up, RandomizedDelaySec for
  fleet-wide jitter, ConditionPathExists guard for opencode-less boxes,
  Nice+IOSchedulingClass=idle so it never fights interactive work.
- cron/mempalace-session.cron — sample crontab entry with log
  redirection and clear USER-substitution instructions.
- README.md with install/verify/uninstall recipes for both, a chooser
  table (systemd vs cron), container/devbox caveats, and tuning notes
  (daily vs weekly vs monthly trade-offs).

The user's LATER-list item 'wrap mempalace-session in cron/systemd
timer for true auto-save coverage' is now actionable: a single
systemctl --user enable --now command stands it up.
2026-04-30 06:29:55 +00:00


# MemPalace Feeding Architecture
This repository wires [opencode](https://github.com/anomalyco/opencode) and arbitrary project directories into [MemPalace](https://github.com/MemPalace/mempalace) via two thin wrappers in `bin/`. This document explains why they exist and how they fit together.

**Audience:** someone setting up a new machine (or reviewing what's already set up) and asking "how does the palace actually get fed?". Pairs with the `mempalace` agent skill, which covers the *consumer* side (searching, diary, KG). This document covers the *producer* side.

---
## 1. The problem
MemPalace is a persistent memory layer for AI agents — vector search over drawers (chunks of verbatim content), a knowledge graph, and per-agent diaries, all behind an MCP server. To be useful it has to be *fed*: project docs, conversation transcripts, session summaries.

The stock mempalace CLI has two feeders:

| Feeder | What it ingests | Gap |
| ------------------------------------- | -------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| `mempalace mine` (default mode) | Any "readable" file in a directory (code + docs + misc) | Mines source code indiscriminately → embedding index floods with low-signal `__init__` fragments. |
| `mempalace mine --mode convos` | Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack, Codex JSONL | No opencode support. No SQLite support. Opencode persists its history in SQLite, not JSONL. |

And one auto-save path:

| Feeder | Harnesses supported | Gap |
| ------------------------------------- | -------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| `hooks_cli.py` (session-stop hooks) | `claude-code`, `codex` | No `opencode` harness → `/exit` mid-session leaves no diary entry behind. |

So on a machine using opencode + the "docs-first palace hygiene" policy, three gaps bite:

1. Mining a project floods the palace with source code we don't want.
2. Opencode session history is trapped in SQLite, invisible to `mine --mode convos`.
3. There's no auto-save on session stop; any persistence is a best-effort heuristic.

The two wrappers in `bin/` close gaps **1** and **2**. Gap **3** is upstream work (see §6).

---
## 2. The architecture
```
Project dirs (/workspace/*) Opencode SQLite DB
├── *.md ~/.local/share/opencode/opencode.db
├── *.yaml ├── session (id, title, directory, time_created/updated)
├── Dockerfile ├── message (session_id, data JSON w/ role)
└── … └── part (message_id, data JSON w/ type: text|tool|…)
│ │
│ │
┌─────▼──────────┐ ┌────▼──────────────┐
│ mempalace-docs │ │ mempalace-session │
│ (bin/) │ │ (bin/) │
│ │ │ │
│ stage docs │ │ export each │
│ only via cp -p │ │ session as Claude │
│ to cache dir │ │ Code JSONL to │
│ │ │ cache dir │
└─────┬──────────┘ └────┬──────────────┘
│ │
│ ~/.cache/mempalace-docs/<wing>/ │ ~/.cache/mempalace-session/<wing>/
│ │
┌─────▼──────────┐ ┌────▼──────────────┐
│ mempalace mine │ │ mempalace mine │
│ │ │ --mode convos │
└─────┬──────────┘ └────┬──────────────┘
│ │
└───────────────────┬──────────────────────┘
┌──────▼─────────┐
│ ChromaDB │
│ ~/.mempalace/ │
│ palace/ │
└──────┬─────────┘
MCP server (mempalace_*)
AI agents (opencode, claude code, codex, …)
```
**Shared idiom:** *stage-to-cache-then-mine*.

Neither wrapper reimplements the mempalace miner. They each:

1. Curate input (filter / transform / rename).
2. Write it to a deterministic path under `~/.cache/…/<wing>/` with `mtime` preserved (via `cp -p` or explicit `os.utime`).
3. Delegate actual embedding + filing to `mempalace mine`, which already dedups on `source_file` path.
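
A minimal Python sketch of steps 1 and 2 for a single file, assuming the caller has already decided the file is worth keeping. `stage_file` is a hypothetical helper, not code from either wrapper. `shutil.copy2` plays the role of `cp -p` (it copies the data plus `mtime`), and the destination depends only on the wing and the file's project-relative path, so restaging always lands on the same path. Step 3 is left to `mempalace mine`.

```python
import shutil
from pathlib import Path


def stage_file(src: Path, project_root: Path, cache_root: Path, wing: str) -> Path:
    """Copy `src` into <cache_root>/<wing>/<path relative to project_root>.

    Hypothetical helper illustrating stage-to-cache: the destination is a pure
    function of (wing, relative path), and copy2 preserves mtime like `cp -p`.
    """
    dest = cache_root / wing / src.relative_to(project_root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # A real copy rather than a symlink: the miner skips symlinks.
    shutil.copy2(src, dest)
    return dest


# Example: stage_file(Path("/workspace/app/README.md"), Path("/workspace/app"),
#                     Path.home() / ".cache/mempalace-docs", "wing_app")
# returns ~/.cache/mempalace-docs/wing_app/README.md
```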

This keeps the wrappers thin. A third wrapper following the same idiom would justify factoring out a shared helper library; two do not.

---
## 3. Component details
### `bin/mempalace-docs` (268 lines) — docs-first mining
**Input:** a project directory.

**Output:** palace drawers in `wing_<directory-name>` (or `--wing` override), only from documentation-class files.

What it files: `*.md`, `*.mdx`, `*.rst`, `*.txt`, `*.yml`, `*.yaml`, `*.toml`, selective `*.json`, shell scripts, Dockerfiles, Makefiles, license/notice files.

What it drops: source code (`.py`, `.ts`, `.go`, `.rs`, …), lockfiles, `.git`, `.venv`, `node_modules`, `__pycache__`, build output.

**Implementation notes:**
- Reads `mempalace.yaml` (if present) to discover the actual wing name — avoids drift if someone renamed the wing after init.
- Uses `cp -p` (not symlinks) because the miner skips symlinks (`miner.py` line 828).
- Auto-purges pre-existing drawers whose `source_file` is under the workspace path before re-mining, to prevent doubling on re-runs.
- Upstream [PR #1213](https://github.com/MemPalace/mempalace/pull/1213) will add `exclude_patterns` to `mempalace.yaml` — when merged, this wrapper should shrink to a thin shim.
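
The keep/drop rules above amount to an allow-list, which can be sketched with shell-style globs. This is an approximation transcribed from the prose, not the wrapper's real pattern lists: `is_doc_file` is a hypothetical name, `build`/`dist` stand in for "build output", the lockfile patterns are illustrative, and the "selective `*.json`" rule is left out because its selection logic isn't described here.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Approximate allow-list transcribed from the prose above (not the wrapper's own list).
KEEP_PATTERNS = (
    "*.md", "*.mdx", "*.rst", "*.txt",
    "*.yml", "*.yaml", "*.toml",
    "*.sh",                      # shell scripts
    "Dockerfile", "Dockerfile.*",
    "Makefile",
    "LICENSE*", "NOTICE*",       # license/notice files
)
# Lockfiles are dropped even when their extension is otherwise allowed.
DROP_PATTERNS = ("*.lock", "*-lock.yaml", "*-lock.json")
# Directories pruned wholesale; "build" and "dist" are assumed build-output names.
SKIP_DIRS = {".git", ".venv", "node_modules", "__pycache__", "build", "dist"}


def is_doc_file(rel_path: str) -> bool:
    """Return True if a project-relative path would be staged as documentation."""
    path = PurePosixPath(rel_path)
    if any(part in SKIP_DIRS for part in path.parts[:-1]):
        return False
    if any(fnmatchcase(path.name, pattern) for pattern in DROP_PATTERNS):
        return False
    # Anything outside the allow-list (e.g. .py, .ts, .go) is dropped.
    return any(fnmatchcase(path.name, pattern) for pattern in KEEP_PATTERNS)
```

In this sketch unknown extensions fall through to `False`, so source code in a new language is dropped without having to be listed, which fits the docs-first intent.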
### `bin/mempalace-session` (349 lines) — opencode → palace bridge
**Input:** the opencode SQLite DB (default `~/.local/share/opencode/opencode.db`).

**Output:** palace drawers in `wing_conversations` (or `--wing` override), one JSONL file per qualifying session.

**Transform pipeline, per session:**
1. Read `session` row (`id`, `title`, `directory`, `time_created`, `time_updated`).
2. Inject synthetic header as first user turn: `[session: <title> | <directory> | <YYYY-MM-DD>]` → makes title/dir/date semantically searchable.
3. For each `message` ordered by `id`:
   - Read JSON `data` → get `role` (`user` / `assistant`).
   - For each `part` under the message, read JSON `data` → dispatch on `type`:
     - `text` → text block.
     - `tool` → Claude Code `tool_use` block + deferred `tool_result` as synthetic human message (the mempalace normalizer folds it back into the assistant turn via its `is_tool_only` branch).
     - `step-start` / `step-finish` → dropped as noise.
     - `reasoning` → kept, prefixed with `[reasoning]`.
4. Serialize as Claude Code JSONL (`{"type": "user"|"assistant", "message": {"content": [...]}}`) — the one convos format the miner already understands.
5. Stage at `~/.cache/mempalace-session/<wing>/<slug>_<id>.jsonl` with `mtime` = `session.time_updated` (deterministic, stable under dedup).
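
A condensed Python sketch of steps 2 to 5 for one session, assuming the `session` row and each message's parsed `data` are already loaded. Several details are assumptions rather than facts about opencode or the real exporter: timestamps in epoch milliseconds, the header date taken from `time_created`, the part field names (`text`, `callID`, `tool`, `input`, `output`), and the slug rule. All function names are hypothetical.

```python
import json
import os
import re
from datetime import datetime, timezone
from pathlib import Path


def session_header(title: str, directory: str, time_created_ms: int) -> str:
    """Step 2: synthetic first user turn (timestamps assumed to be epoch ms)."""
    day = datetime.fromtimestamp(time_created_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"[session: {title} | {directory} | {day}]"


def message_to_lines(role: str, parts: list) -> list:
    """Step 3: turn one opencode message into Claude Code JSONL records."""
    blocks, deferred = [], []
    for part in parts:
        kind = part.get("type")
        if kind == "text":
            blocks.append({"type": "text", "text": part.get("text", "")})
        elif kind == "reasoning":
            blocks.append({"type": "text", "text": "[reasoning] " + part.get("text", "")})
        elif kind == "tool":
            call_id = part.get("callID", "")
            blocks.append({"type": "tool_use", "id": call_id,
                           "name": part.get("tool", ""), "input": part.get("input", {})})
            deferred.append({"type": "tool_result", "tool_use_id": call_id,
                             "content": str(part.get("output", ""))})
        # step-start / step-finish (and unrecognised types) are dropped as noise.
    records = []
    if blocks:
        records.append({"type": role, "message": {"content": blocks}})
    if deferred:
        # Synthetic user turn carrying the tool results, emitted after the
        # assistant turn; the mempalace normalizer folds it back in.
        records.append({"type": "user", "message": {"content": deferred}})
    return records


def export_session(session: dict, messages: list, out_dir: Path) -> Path:
    """Steps 2-5: write one session as JSONL and pin its mtime to time_updated."""
    header = session_header(session["title"], session["directory"], session["time_created"])
    records = [{"type": "user", "message": {"content": [{"type": "text", "text": header}]}}]
    for role, parts in messages:  # messages: [(role, parts), ...] ordered by message id
        records.extend(message_to_lines(role, parts))
    slug = re.sub(r"[^a-z0-9]+", "-", session["title"].lower()).strip("-") or "session"
    dest = out_dir / f"{slug}_{session['id']}.jsonl"
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text("".join(json.dumps(r) + "\n" for r in records), encoding="utf-8")
    mtime = session["time_updated"] / 1000
    os.utime(dest, (mtime, mtime))  # deterministic mtime, independent of export time
    return dest
```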

**Filters:**

- `--min-messages N` (default 3) — drops throwaway `/exit`'d sessions that would flood the palace.
- `--since YYYY-MM-DD` — incremental catch-up.
- `--session <id>` — one-shot mode.
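
The two selection filters translate into a single SQL query over the `session` and `message` tables. A sketch with stated assumptions: `message.session_id` links messages to sessions (as in the §2 diagram), timestamps are epoch milliseconds, and `--since` is compared against `time_updated`. `select_sessions` is hypothetical, and `--session <id>` is left out.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def select_sessions(conn: sqlite3.Connection, min_messages: int = 3,
                    since: Optional[str] = None) -> list:
    """Return IDs of sessions with >= min_messages messages, updated on/after `since`."""
    query = """
        SELECT s.id
        FROM session AS s
        JOIN message AS m ON m.session_id = s.id
        WHERE s.time_updated >= ?
        GROUP BY s.id
        HAVING COUNT(m.id) >= ?
        ORDER BY s.time_updated
    """
    cutoff_ms = 0
    if since is not None:
        day = datetime.strptime(since, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        cutoff_ms = int(day.timestamp() * 1000)
    return [row[0] for row in conn.execute(query, (cutoff_ms, min_messages))]
```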

**Then:** invokes `mempalace mine --mode convos` against the cache dir, followed by `mempalace repair` (unless `--no-repair`).

---
## 4. Setup recipe (new machine)
Assumes: opencode already installed, `~/.local/share/opencode/opencode.db` exists, `mempalace` CLI installed (v3.3.3+).
```bash
# 1. Clone mempalace-toolkit (holds the two wrappers in bin/)
git clone ssh://git@gitea.jordbo.se:2222/joakimp/mempalace-toolkit.git ~/mempalace-toolkit
cd ~/mempalace-toolkit
# 2. Install (symlinks bin/* into ~/.local/bin, adds loader to rc file)
./install.sh
# 3. Ensure ~/.local/bin is on PATH (installer warns if not)
export PATH="$HOME/.local/bin:$PATH"
# 4. Initialize palace if needed (one-time, platform-wide)
mempalace init --yes
# 5. Mine opencode history into the palace
mempalace-session --dry-run # preview scope
mempalace-session # do it for real (~20 min for ~60 sessions)
# 6. Mine project docs (per project)
mempalace-docs /workspace/my_project --dry-run
mempalace-docs /workspace/my_project
# 7. Restart any MCP-connected agent, or call mempalace_reconnect from inside one
```
### Containerized setup (devbox)
The devbox uses two named Docker volumes so these persist across container recreate:

- `devbox-palace` → `~/.mempalace/palace` (the palace itself)
- `devbox-data` → `~/.local/share/opencode` (opencode's SQLite DB)

Code at `/workspace/mempalace-toolkit` is a bind mount from the host — survives container recreate and syncs via gitea. Staging directories (`~/.cache/mempalace-{docs,session}/`) are ephemeral but cheap to rebuild.

**After container recreate**, just re-run `./install.sh` (idempotent) to relink `bin/` into the fresh `~/.local/bin/`.

---
## 5. Operational notes
### Dedup behavior
Both wrappers dedup via `mempalace mine`'s built-in key:
- `mempalace-docs`: keys on `source_file` path + `mtime` → edit a doc, it re-mines; unchanged files are skipped.
- `mempalace-session`: keys on `source_file` path alone (convos miner doesn't check mtime) → a session's JSONL filename is `<slug>_<id>.jsonl`, stable per session, so re-runs skip already-filed sessions. To force re-mining, delete the staging dir.

**Verified:** a second full `mempalace-session` run immediately after the first produces 0 new drawers. The only cost is the post-mine `repair` step (index rebuild — ~5 min on 5k drawers).
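
A toy Python model of the two dedup keys, useful for reasoning about when something gets re-filed. It encodes only the behavior described above and is not mempalace's implementation; `Staged`, `dedup_key`, and `to_mine` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Staged:
    path: str
    mtime: float


def dedup_key(item: Staged, mode: str) -> tuple:
    """Key the miner is described as deduplicating on, per mode."""
    if mode == "docs":
        return (item.path, item.mtime)  # edit a doc -> new mtime -> new key -> re-mined
    if mode == "convos":
        return (item.path,)             # mtime ignored: same filename -> skipped
    raise ValueError(f"unknown mode: {mode}")


def to_mine(items: list, mode: str, seen: set) -> list:
    """Return the items not filed yet, recording their keys in `seen`."""
    fresh = []
    for item in items:
        key = dedup_key(item, mode)
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh
```

One consequence worth knowing, if the convos key really is path-only: a session that keeps growing after its first export keeps the same `<slug>_<id>.jsonl` name, so later turns are skipped on subsequent runs unless the staged file is removed first.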
### When to re-mine
- `mempalace-docs`: after significant doc changes in a project.
- `mempalace-session`: see the full **Operational Routine** below.
### Operational Routine (the `mempalace-session` workflow)
Until opencode grows session hooks and `hooks_cli.py` grows an opencode harness (see §6), **`mempalace-session` is the entire mechanism that gets opencode conversations into the palace.** Skip it and your session history exists only inside `~/.local/share/opencode/opencode.db` — a local SQLite file that's invisible to `mempalace_search`, vulnerable to volume wipes, and lost if the devbox is replaced.

That makes the routine worth codifying:
#### Triggers (when to run)
| Trigger | What to run | Why |
|---|---|---|
| **Substantive session you want preserved past `/exit`** | `mempalace-session --session <id>` | Targeted save before destructive action; see `~/.local/share/opencode/opencode.db` `session` table for the ID. |
| **Before a container recreate** | `mempalace-session` | The opencode DB lives in a named volume (`devbox-data`) so it normally survives, but a full mine right before is cheap insurance. |
| **Fresh machine, first provisioning** | `mempalace-session --dry-run` then `mempalace-session` | Backfills the whole corpus. Expect ~20 min / 60 sessions. |
| **Periodic sweep** | `mempalace-session` | Weekly catches anything you didn't explicitly save. Dedup is free, so running more often only costs the ~5 min repair. |
| **After upstream mempalace upgrade** | `mempalace-session` + `mempalace repair` | New sessions pick up changed normalization or chunking automatically. Already-filed sessions are skipped by dedup (see Dedup behavior), so they only reflect the new logic after a forced re-mine. Rare. |
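
For the targeted save in the first row, the session ID can be read straight from the `session` table. A read-only Python sketch using the column names from §3; `recent_sessions` is a hypothetical helper. Opening with `mode=ro` avoids writing to a database that a running opencode instance may hold open.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

OPENCODE_DB = Path.home() / ".local/share/opencode/opencode.db"


def recent_sessions(db_path: Path = OPENCODE_DB, limit: int = 10) -> list:
    """Return (id, title, directory) for the most recently updated sessions."""
    uri = db_path.resolve().as_uri() + "?mode=ro"  # read-only connection
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        return conn.execute(
            "SELECT id, title, directory FROM session "
            "ORDER BY time_updated DESC LIMIT ?",
            (limit,),
        ).fetchall()
```

Usage: `for sid, title, _ in recent_sessions(): print(sid, title)`, then pass the chosen ID to `mempalace-session --session <id>`.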
#### Cadence (how often)
**Default: weekly.** Dedup is free on unchanged sessions, and `wing_conversations` growth is roughly linear in user activity. Weekly is frequent enough that searches almost always include recent context, and infrequent enough that the cost is negligible.

**Daily** is fine but wasteful — you'll pay the post-mine `repair` cost seven times more often than you need. If you want daily runs, add `--no-repair` and schedule a separate weekly repair.

**Monthly** is too infrequent. You'll search for "that thing we discussed last Tuesday" and miss it.
#### Relationship to the session lifecycle
`mempalace-session` is **offline, inter-session maintenance** — it runs between agent sessions, not during them. It does not replace the in-session habits from the consumer-side `mempalace` skill:

| Habit | When | Who |
|---|---|---|
| Wake-up search (load recent diary) | Agent session start | Agent, during session |
| Wind-down diary write | Agent session end | Agent, during session |
| `mempalace-session` mine | Between sessions (manual or scheduled) | Operator or automation |

The first two are live; the third is batched. They're complementary, not alternatives. A machine doing only wake-up/wind-down keeps a diary but loses the actual conversation turns. A machine doing only `mempalace-session` captures the raw turns but not the curated summaries. Do both.
#### Automation
Pick one:
1. **systemd user timer** (recommended on modern Linux). Survives reboots, optional `Persistent=true` catch-up, logs to `journalctl`, background I/O priority. Templates in [`contrib/systemd/`](contrib/systemd/).
2. **cron** (simpler, works anywhere). Templates in [`contrib/cron/`](contrib/cron/).
3. **Manual** — run `mempalace-session` opportunistically. Fine on machines where you're in and out frequently; less fine on long-running devboxes.

Install recipes, verification commands, and uninstall steps for both automated options are in [`contrib/README.md`](contrib/README.md).

Quick-start (systemd user timer):
```bash
mkdir -p ~/.config/systemd/user
cp contrib/systemd/*.{service,timer} ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now mempalace-session.timer
# Optional on headless boxes: keep timer running when logged out
sudo loginctl enable-linger "$USER"
systemctl --user list-timers mempalace-session.timer
```
Quick-start (cron):
```bash
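# Substitute your username into the template and append it to the existing crontab.
# Re-running this appends a second copy of the entry; check `crontab -l` first.
sed "s|USER|$USER|g" contrib/cron/mempalace-session.cron \
  | (crontab -l 2>/dev/null; cat) | crontab -
# Ensure the cache dir exists before the first scheduled run.
mkdir -p ~/.cache/mempalace-session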
```
#### Verification
After any run (manual or scheduled), confirm the palace grew:
```bash
mempalace-session --dry-run # should list sessions
# Or from inside a live MCP client:
#   mempalace_reconnect: refresh the index after a mine
#   mempalace_status:    check the wing_conversations count
```
A healthy run produces one of:
- **First run on fresh corpus**: several hundred to several thousand new drawers.
- **Incremental run**: zero to a few dozen new drawers (whatever grew since last run).
- **Rerun with no new activity**: zero new drawers, only the repair step runs.

A run that files far more drawers than expected may indicate a staging-dir wipe (forcing a full re-mine). Check when `~/.cache/mempalace-session/<wing>/` itself was last modified: the staged files' own mtimes are pinned to `session.time_updated`, so they won't show a re-stage.
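
A Python sketch of that check. It reports the wing directory's `mtime`, which changes when entries are added or removed, and the newest staged file's `ctime`, which changes whenever a file is written or its timestamps are set (file `mtime` is pinned to `session.time_updated` and so says nothing about staging). `staging_report` is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def staging_report(wing_dir: Path) -> dict:
    """Summarise when a staging directory was last (re)populated."""
    files = list(wing_dir.glob("*.jsonl"))
    newest_ctime = max((f.stat().st_ctime for f in files), default=None)

    def iso(ts):
        if ts is None:
            return None
        return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(timespec="seconds")

    return {
        "files": len(files),
        "dir_mtime": iso(wing_dir.stat().st_mtime),  # entries added/removed
        "newest_ctime": iso(newest_ctime),           # a file (re)written or re-timestamped
    }
```

Usage, assuming the default wing: `staging_report(Path.home() / ".cache/mempalace-session/wing_conversations")`. A `dir_mtime` or `newest_ctime` much newer than your last scheduled run points at a re-stage.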
### Cost profile (reference)
Measured on a ~10-day opencode corpus of 140 sessions / 1491 messages / 4656 parts:
- Dry run: seconds.
- Full mine: **21 minutes** (38 min user CPU). Produced 2378 drawers from 62 qualifying sessions.
- Dedup re-run: mine step instant; only the repair runs (~5 min).

Scaling is roughly linear in message count. Budget ~20 minutes per 60-session batch.
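
The budget figure follows from the full-mine measurement, assuming later batches contain sessions of similar size to the measured one:

```math
\frac{21\ \text{min}}{62\ \text{sessions}} \approx 0.34\ \text{min/session},
\qquad 60 \times 0.34 \approx 20\ \text{min},
\qquad \frac{2378\ \text{drawers}}{62\ \text{sessions}} \approx 38\ \text{drawers/session}
```

At that rate a 60-session backfill should file on the order of 2,300 drawers, a useful sanity check against the "first run" expectation under Verification.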
### Common failure modes
| Symptom | Cause | Fix |
| ---------------------------------------------- | ----------------------------------------------------- | --------------------------------------------------------- |
| `mempalace-session: command not found` after container recreate | `~/.local/bin` wiped with container | `cd ~/mempalace-toolkit && ./install.sh` |
| Search errors "Error finding id" post-mine | Stale HNSW index | `mempalace repair --yes` + `mempalace_reconnect` from MCP |
| Drawers doubled after re-mining a project | Someone renamed the wing or ran raw `mempalace mine` alongside the wrapper | Inspect `embedding_metadata` in `chroma.sqlite3`; purge duplicates by source prefix, then `mempalace repair` |
| Sessions missing from palace | Session has fewer than `--min-messages` messages | Lower `--min-messages`, or mine it explicitly with `--session <id>` |
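
For the doubled-drawers row, a read-only count per `source_file` prefix shows which copy to purge before running `mempalace repair`. This Python sketch assumes ChromaDB's SQLite layout keeps metadata as key/value rows in `embedding_metadata` (`key`, `string_value`) and that the palace file sits at `~/.mempalace/palace/chroma.sqlite3`; `drawers_per_prefix` is hypothetical. It compares prefixes with `substr` rather than `LIKE` because `_` (as in `wing_<name>`) is a `LIKE` wildcard.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def drawers_per_prefix(db_path: Path, prefixes: list) -> dict:
    """Count drawers whose `source_file` metadata starts with each prefix (read-only)."""
    uri = db_path.resolve().as_uri() + "?mode=ro"
    counts = {}
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        for prefix in prefixes:
            # Exact prefix match; LIKE would treat "_" in paths as a wildcard.
            (n,) = conn.execute(
                "SELECT COUNT(*) FROM embedding_metadata "
                "WHERE key = 'source_file' "
                "AND substr(string_value, 1, length(?)) = ?",
                (prefix, prefix),
            ).fetchone()
            counts[prefix] = n
    return counts


# Example (hypothetical paths): compare the raw-mined copy with the wrapper's copy.
# drawers_per_prefix(Path.home() / ".mempalace/palace/chroma.sqlite3",
#                    ["/workspace/my_project/",
#                     str(Path.home() / ".cache/mempalace-docs/wing_my_project") + "/"])
```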
---
## 6. Upstream roadmap
These gaps should ideally close upstream, making the wrappers thinner or obsolete:
1. **[MemPalace PR #1213](https://github.com/MemPalace/mempalace/pull/1213)** — `exclude_patterns` in `mempalace.yaml`. When merged, `mempalace-docs` shrinks to a thin shim (or disappears) since exclude-by-extension becomes a first-class config.
2. **Opencode session hooks** — [PR #16598](https://github.com/anomalyco/opencode/pull/16598) (session.stopping), [PR #16769](https://github.com/anomalyco/opencode/pull/16769) (shutdown), [PR #15224](https://github.com/anomalyco/opencode/pull/15224) (session.start), [issue #23503](https://github.com/anomalyco/opencode/issues/23503) (session.turn.completed). When at least one merges, opencode can fire hooks mempalace can receive.
3. **Opencode harness in `hooks_cli.py`** — mempalace's hooks CLI only knows `claude-code` + `codex` today. Adding `opencode` would let the auto-save diary path work on opencode too. Pairs with #2 above.
4. **SQLite mode for `mempalace mine --mode convos`** — if upstream ever adds direct SQLite ingest for opencode, `mempalace-session` loses its reason to exist (the export-to-JSONL dance goes away).

When #1 merges, retire `mempalace-docs` to a thin shim. When #2 + #3 land together, `mempalace-session` becomes a manual-only fallback (cron / backfill) while hooks handle live saves.

---
## 7. See also
- [`README.md`](README.md) — human-facing quickstart + per-tool usage reference.
- [`AGENTS.md`](AGENTS.md) — repo conventions for AI agents modifying this codebase.
- [`SKILL.md`](SKILL.md) — agent skill (producer side), symlinked into `~/.agents/skills/opencode-mempalace-bridge/` by `install.sh`.
- `~/.agents/skills/mempalace/SKILL.md` — agent skill for the **consumer** side (searching, diary, KG). Pair with `SKILL.md` in this repo.
- [`cli_utils`](https://gitea.jordbo.se/joakimp/cli_utils) — sibling repo: shell quality-of-life tools. Origin of these wrappers before the 2026-04-30 split.