mempalace-toolkit/ARCHITECTURE.md
Joakim Persson 954c3f2ebb Initial commit — split out from cli_utils
Producer-side MemPalace tooling: two bash wrappers that bridge opencode
session history and project documentation into the palace. Originally
developed in cli_utils (2026-04-28); split into its own repo on
2026-04-30 because the conceptual fit was weak — cli_utils is
interactive shell tooling, while this is agent memory infrastructure
with its own architecture, dependency surface, and growth trajectory.

Contents:
- bin/mempalace-docs — docs-only mining wrapper (originally a2ddcc9 in
  cli_utils), bridges the gap until MemPalace PR #1213 (exclude_patterns)
  merges upstream.
- bin/mempalace-session — opencode → palace session bridge (originally
  dacca0e in cli_utils). Reads ~/.local/share/opencode/opencode.db,
  exports each session to Claude Code JSONL, mines via
  'mempalace mine --mode convos'. Bridges the gap until opencode
  session-stopping hooks + an opencode harness in hooks_cli.py land
  upstream.
- ARCHITECTURE.md — canonical spec: architecture diagram, component
  details, setup recipe, operational notes, upstream-retirement
  roadmap. Originally a4cf314 in cli_utils.
- SKILL.md — companion agent skill (producer side). Pairs with the
  consumer-side mempalace skill. Symlinked into
  ~/.agents/skills/opencode-mempalace-bridge/ by install.sh.
- install.sh — idempotent installer, also handles --uninstall.
- AGENTS.md — repo conventions.

History of the individual files is not preserved in this split; see
cli_utils (gitea.jordbo.se/joakimp/cli_utils) commits a2ddcc9, dacca0e,
and a4cf314 for the original authorship context.
2026-04-30 05:30:04 +00:00

MemPalace Feeding Architecture

This repository wires opencode and arbitrary project directories into MemPalace via two thin wrappers in bin/. This document explains why they exist and how they fit together.

Audience: someone setting up a new machine (or reviewing what's already set up) and asking "how does the palace actually get fed?". Pairs with the mempalace agent skill, which covers the consumer side (searching, diary, KG). This document covers the producer side.


1. The problem

MemPalace is a persistent memory layer for AI agents — vector search over drawers (chunks of verbatim content), a knowledge graph, and per-agent diaries, all behind an MCP server. To be useful it has to be fed: project docs, conversation transcripts, session summaries.

The stock mempalace CLI has two feeders:

| Feeder | What it ingests | Gap |
| --- | --- | --- |
| mempalace mine (default mode) | Any "readable" file in a directory (code + docs + misc) | Mines source code indiscriminately → embedding index floods with low-signal __init__ fragments. |
| mempalace mine --mode convos | Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack, Codex JSONL | No opencode support. No SQLite support. Opencode persists its history in SQLite, not JSONL. |

And one auto-save path:

| Feeder | Harnesses supported | Gap |
| --- | --- | --- |
| hooks_cli.py (session-stop hooks) | claude-code, codex | No opencode harness → /exit mid-session leaves no diary entry behind. |

So on a machine using opencode + the "docs-first palace hygiene" policy, three gaps bite:

  1. Mining a project floods the palace with source code we don't want.
  2. Opencode session history is trapped in SQLite, invisible to mine --mode convos.
  3. There's no auto-save on session stop, so any persistence is a best-effort heuristic.

The two wrappers in bin/ close gaps 1 and 2. Gap 3 is upstream work (see §6).


2. The architecture

  Project dirs (/workspace/*)                Opencode SQLite DB
  ├── *.md                                   ~/.local/share/opencode/opencode.db
  ├── *.yaml                                 ├── session  (id, title, directory, time_created/updated)
  ├── Dockerfile                             ├── message  (session_id, data JSON w/ role)
  └── …                                      └── part     (message_id, data JSON w/ type: text|tool|…)
        │                                          │
        │                                          │
  ┌─────▼──────────┐                          ┌────▼──────────────┐
  │ mempalace-docs │                          │ mempalace-session │
  │ (bin/)         │                          │ (bin/)            │
  │                │                          │                   │
  │ stage docs     │                          │ export each       │
  │ only via cp -p │                          │ session as Claude │
  │ to cache dir   │                          │ Code JSONL to     │
  │                │                          │ cache dir         │
  └─────┬──────────┘                          └────┬──────────────┘
        │                                          │
        │  ~/.cache/mempalace-docs/<wing>/          │  ~/.cache/mempalace-session/<wing>/
        │                                          │
  ┌─────▼──────────┐                          ┌────▼──────────────┐
  │ mempalace mine │                          │ mempalace mine    │
  │                │                          │ --mode convos     │
  └─────┬──────────┘                          └────┬──────────────┘
        │                                          │
        └───────────────────┬──────────────────────┘
                            │
                     ┌──────▼─────────┐
                     │ ChromaDB       │
                     │ ~/.mempalace/  │
                     │   palace/      │
                     └──────┬─────────┘
                            │
                     MCP server (mempalace_*)
                            │
                     AI agents (opencode, claude code, codex, …)

Shared idiom: stage-to-cache-then-mine.

Neither wrapper reimplements the mempalace miner. They each:

  1. Curate input (filter / transform / rename).
  2. Write it to a deterministic path under ~/.cache/…/<wing>/ with mtime preserved (via cp -p or explicit os.utime).
  3. Delegate actual embedding + filing to mempalace mine, which already dedups on source_file path.

This keeps the wrappers thin. A third wrapper following the same idiom would justify factoring out a shared helper library; two do not.
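The three steps above can be sketched in a few lines of Python. This is a minimal illustration of the idiom, not the wrappers' actual code: the function names `stage_file` and `build_mine_command` are hypothetical, and the argument order passed to `mempalace mine` is an assumption.

```python
import shutil
from pathlib import Path


def stage_file(src: Path, cache_root: Path, wing: str, rel_path: Path) -> Path:
    """Copy src to cache_root/<wing>/<rel_path>, preserving its mtime."""
    dest = cache_root / wing / rel_path
    dest.parent.mkdir(parents=True, exist_ok=True)
    # copy2 copies content plus metadata (including mtime), much like `cp -p`.
    shutil.copy2(src, dest)
    return dest


def build_mine_command(staged_dir: Path, extra_args=()) -> list:
    """Build the delegate command line; the argument order is an assumption."""
    return ["mempalace", "mine", str(staged_dir), *extra_args]
```

Because the staged path depends only on the wing and the relative path, re-staging an unchanged file yields the same path and the same mtime, which is what lets the miner's own dedup do the rest.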


3. Component details

bin/mempalace-docs (268 lines) — docs-first mining

Input: a project directory. Output: palace drawers in wing_<directory-name> (or --wing override), only from documentation-class files.

What it files: *.md, *.mdx, *.rst, *.txt, *.yml, *.yaml, *.toml, selective *.json, shell scripts, Dockerfiles, Makefiles, license/notice files.

What it drops: source code (.py, .ts, .go, .rs, …), lockfiles, .git, .venv, node_modules, __pycache__, build output.
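The file/drop rules above amount to a predicate over relative paths. The sketch below is one way to express it, under stated assumptions: `is_doc_file` is a hypothetical name, ".sh" stands in for "shell scripts", the exact file names and lockfile patterns are illustrative, and the selective *.json handling is left out because its criteria are not given here.

```python
from pathlib import Path

# Suffixes taken from the "What it files" list; ".sh" stands in for "shell scripts".
DOC_SUFFIXES = {".md", ".mdx", ".rst", ".txt", ".yml", ".yaml", ".toml", ".sh"}
# Illustrative exact names for Dockerfiles, Makefiles, and license/notice files.
DOC_NAMES = {"Dockerfile", "Makefile", "LICENSE", "NOTICE"}
# Directories named in the "What it drops" list.
SKIP_DIRS = {".git", ".venv", "node_modules", "__pycache__"}
# Hypothetical lockfile patterns; the wrapper's real list is not given in this document.
LOCKFILE_ENDINGS = (".lock", "-lock.yaml", "-lock.json")


def is_doc_file(rel_path: Path) -> bool:
    """Return True if rel_path looks like a documentation-class file."""
    if any(part in SKIP_DIRS for part in rel_path.parts[:-1]):
        return False
    if rel_path.name.endswith(LOCKFILE_ENDINGS):
        return False
    return rel_path.name in DOC_NAMES or rel_path.suffix.lower() in DOC_SUFFIXES
```

Source files such as `src/app.py` fall through to the final check and are rejected because neither their name nor their suffix is on an allow list.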

Implementation notes:

  • Reads mempalace.yaml (if present) to discover the actual wing name — avoids drift if someone renamed the wing after init.
  • Uses cp -p (not symlinks) because the miner skips symlinks (miner.py line 828).
  • Auto-purges pre-existing drawers whose source_file is under the workspace path before re-mining, to prevent doubling on re-runs.
  • Upstream PR #1213 will add exclude_patterns to mempalace.yaml — when merged, this wrapper should shrink to a thin shim.

bin/mempalace-session (349 lines) — opencode → palace bridge

Input: the opencode SQLite DB (default ~/.local/share/opencode/opencode.db). Output: palace drawers in wing_conversations (or --wing override), one JSONL file per qualifying session.

Transform pipeline, per session:

  1. Read session row (id, title, directory, time_created, time_updated).
  2. Inject synthetic header as first user turn: [session: <title> | <directory> | <YYYY-MM-DD>] → makes title/dir/date semantically searchable.
  3. For each message ordered by id:
    • Read JSON data → get role (user / assistant).
    • For each part under the message, read JSON data → dispatch on type:
      • text → text block.
      • tool → Claude Code tool_use block + deferred tool_result as synthetic human message (the mempalace normalizer folds it back into the assistant turn via its is_tool_only branch).
      • step-start / step-finish → dropped as noise.
      • reasoning → kept, prefixed with [reasoning].
  4. Serialize as Claude Code JSONL ({"type": "user"|"assistant", "message": {"content": [...]}}) — the one convos format the miner already understands.
  5. Stage at ~/.cache/mempalace-session/<wing>/<slug>_<id>.jsonl with mtime = session.time_updated (deterministic, stable under dedup).
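A rough Python sketch of the pipeline above follows. It is an illustration under assumptions, not the wrapper's code: the part field names (`text`, `tool`, `callID`, `state.input`, `state.output`) are guesses at opencode's JSON layout, the function names are hypothetical, unknown part types are simply dropped, and the date and mtime are passed in pre-computed because the timestamp units are not stated here.

```python
import json
import os
from pathlib import Path


def header_turn(title: str, directory: str, date: str) -> dict:
    """Synthetic first user turn that makes title/dir/date searchable."""
    text = f"[session: {title} | {directory} | {date}]"
    return {"type": "user", "message": {"content": [{"type": "text", "text": text}]}}


def convert_message(role: str, parts: list) -> list:
    """Turn one message's parts into Claude Code style JSONL records."""
    blocks, results = [], []
    for part in parts:
        kind = part.get("type")
        if kind == "text":
            blocks.append({"type": "text", "text": part.get("text", "")})
        elif kind == "reasoning":
            blocks.append({"type": "text", "text": "[reasoning] " + part.get("text", "")})
        elif kind == "tool":
            call_id = part.get("callID", "")  # assumed field name
            state = part.get("state", {})     # assumed field name
            blocks.append({"type": "tool_use", "id": call_id,
                           "name": part.get("tool", ""), "input": state.get("input", {})})
            results.append({"type": "tool_result", "tool_use_id": call_id,
                            "content": str(state.get("output", ""))})
        # step-start / step-finish (and, in this sketch, unknown types) are dropped.
    records = []
    if blocks:
        records.append({"type": role, "message": {"content": blocks}})
    if results:
        # Deferred tool results travel as a synthetic user message.
        records.append({"type": "user", "message": {"content": results}})
    return records


def stage_session(records, cache_dir, slug, session_id, time_updated_s) -> Path:
    """Write <slug>_<id>.jsonl and pin its mtime to the session's update time."""
    path = Path(cache_dir) / f"{slug}_{session_id}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(json.dumps(r) + "\n" for r in records), encoding="utf-8")
    os.utime(path, (time_updated_s, time_updated_s))
    return path
```

Pinning the mtime with `os.utime` rather than leaving it at export time is what keeps the staged file identical across runs for an unchanged session.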

Filters:

  • --min-messages N (default 3) — drops throwaway /exit'd sessions that would flood the palace.
  • --since YYYY-MM-DD — incremental catch-up.
  • --session <id> — one-shot mode.

Then: invokes mempalace mine --mode convos against the cache dir, followed by mempalace repair (unless --no-repair).
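The three filters can be pictured as one SQL query over the session and message tables. The sketch below is an approximation under assumptions: `qualifying_sessions` is a hypothetical name, timestamps are assumed to be epoch milliseconds, --since is assumed to compare against time_updated, and an explicit session id is assumed to bypass the message threshold.

```python
import sqlite3
from datetime import datetime, timezone


def qualifying_sessions(conn, min_messages=3, since=None, session_id=None):
    """Return (id, title, message_count) rows for sessions that pass the filters."""
    sql = (
        "SELECT s.id, s.title, COUNT(m.id) AS n "
        "FROM session s LEFT JOIN message m ON m.session_id = s.id "
        "WHERE 1 = 1"
    )
    params = []
    if session_id is not None:
        sql += " AND s.id = ?"
        params.append(session_id)
        min_messages = 0  # assumption: one-shot mode ignores the threshold
    if since is not None:
        day = datetime.strptime(since, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        sql += " AND s.time_updated >= ?"  # assumption: epoch milliseconds
        params.append(int(day.timestamp() * 1000))
    sql += " GROUP BY s.id HAVING COUNT(m.id) >= ? ORDER BY s.time_updated"
    params.append(min_messages)
    return conn.execute(sql, params).fetchall()
```

For example, `qualifying_sessions(sqlite3.connect(db_path), since="2026-04-25")` would list only sessions touched on or after that date that also have at least three messages, where `db_path` points at the opencode database.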


4. Setup recipe (new machine)

Assumes: opencode already installed, ~/.local/share/opencode/opencode.db exists, mempalace CLI installed (v3.3.3+).

# 1. Clone mempalace-toolkit (holds the two wrappers in bin/)
git clone ssh://git@gitea.jordbo.se:2222/joakimp/mempalace-toolkit.git ~/mempalace-toolkit
cd ~/mempalace-toolkit

# 2. Install (symlinks bin/* into ~/.local/bin, adds loader to rc file)
./install.sh

# 3. Ensure ~/.local/bin is on PATH (installer warns if not)
export PATH="$HOME/.local/bin:$PATH"

# 4. Initialize palace if needed (one-time, platform-wide)
mempalace init --yes

# 5. Mine opencode history into the palace
mempalace-session --dry-run              # preview scope
mempalace-session                        # do it for real (~20 min for ~60 sessions)

# 6. Mine project docs (per project)
mempalace-docs /workspace/my_project --dry-run
mempalace-docs /workspace/my_project

# 7. Restart any MCP-connected agent, or call mempalace_reconnect from inside one

Containerized setup (devbox)

The devbox uses two named Docker volumes so that the palace and opencode's history persist across container recreation:

  • devbox-palace → ~/.mempalace/palace (the palace itself)
  • devbox-data → ~/.local/share/opencode (opencode's SQLite DB)

Code at /workspace/mempalace-toolkit is a bind mount from the host — survives container recreate and syncs via gitea. Staging directories (~/.cache/mempalace-{docs,session}/) are ephemeral but cheap to rebuild.

After container recreate, just re-run ./install.sh (idempotent) to relink bin/ into the fresh ~/.local/bin/.


5. Operational notes

Dedup behavior

Both wrappers dedup via mempalace mine's built-in key:

  • mempalace-docs: keys on source_file path + mtime → edit a doc, it re-mines; unchanged files are skipped.
  • mempalace-session: keys on source_file path alone (convos miner doesn't check mtime) → a session's JSONL filename is <slug>_<id>.jsonl, stable per session, so re-runs skip already-filed sessions. To force re-mining, delete the staging dir.

Verified: a second full mempalace-session run immediately after the first produces 0 new drawers. The only cost is the post-mine repair step (index rebuild — ~5 min on 5k drawers).
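The difference between the two dedup keys can be shown with a toy model. This is not the miner's code, only a sketch of the behavior described above; `should_mine` and the `filed` mapping (source_file path to the mtime recorded at the last mine) are hypothetical.

```python
def should_mine(source_file: str, mtime: float, filed: dict, check_mtime: bool) -> bool:
    """Decide whether a staged file would be (re-)mined.

    check_mtime=True models the docs path (key: path + mtime);
    check_mtime=False models the convos path (key: path alone).
    """
    if source_file not in filed:
        return True  # never seen: always mined
    return check_mtime and filed[source_file] != mtime
```

With `check_mtime=False`, a changed mtime on an already-filed path still returns False, which is why forcing a session re-mine means deleting the staging dir rather than touching the file.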

When to re-mine

  • mempalace-docs: after significant doc changes in a project.
  • mempalace-session: opportunistically. Every few days catches new opencode sessions. Or wire to cron / systemd timer for true auto-save coverage (not yet done).

Cost profile (reference)

Measured on a ~10-day opencode corpus of 140 sessions / 1491 messages / 4656 parts:

  • Dry run: seconds.
  • Full mine: 21 minutes (38 min user CPU). Produced 2378 drawers from 62 qualifying sessions.
  • Dedup re-run: mine step instant; only the repair runs (~5 min).

Scaling is roughly linear in message count. Budget ~20 minutes per 60-session batch.
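The budget figure follows from the measured run by simple proportion. The sketch below uses qualifying-session count as a stand-in for message count, which is an assumption: the message count of the 62 qualifying sessions alone is not reported here. `estimate_minutes` is a hypothetical helper.

```python
MEASURED_MINUTES = 21    # full mine, from the reference run above
MEASURED_SESSIONS = 62   # qualifying sessions in that run


def estimate_minutes(qualifying_sessions: int) -> float:
    """Linear extrapolation from the single reference measurement."""
    return qualifying_sessions * MEASURED_MINUTES / MEASURED_SESSIONS


print(round(estimate_minutes(60), 1))  # 20.3, matching the ~20 minute budget
```

A corpus with unusually long sessions would take longer than this estimate suggests, since the real driver is message volume rather than session count.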

Common failure modes

| Symptom | Cause | Fix |
| --- | --- | --- |
| mempalace-session: command not found after container recreate | ~/.local/bin wiped with container | cd ~/mempalace-toolkit && ./install.sh |
| Search errors "Error finding id" post-mine | Stale HNSW index | mempalace repair --yes + mempalace_reconnect from MCP |
| Drawers doubled after re-mining a project | Someone renamed the wing or ran raw mempalace mine alongside the wrapper | Inspect embedding_metadata in chroma.sqlite3; purge duplicates by source prefix, then mempalace repair |
| Sessions missing from palace | Session has fewer than --min-messages messages | Lower the threshold or --session <id> explicitly |


6. Upstream roadmap

These gaps should ideally close upstream, making the wrappers thinner or obsolete:

  1. MemPalace PR #1213: exclude_patterns in mempalace.yaml. When merged, mempalace-docs shrinks to a thin shim (or disappears) since exclude-by-extension becomes a first-class config.
  2. Opencode session hooks: PR #16598 (session.stopping), PR #16769 (shutdown), PR #15224 (session.start), issue #23503 (session.turn.completed). When at least one merges, opencode can fire hooks mempalace can receive.
  3. Opencode harness in hooks_cli.py — mempalace's hooks CLI only knows claude-code + codex today. Adding opencode would let the auto-save diary path work on opencode too. Pairs with #2 above.
  4. SQLite mode for mempalace mine --mode convos — if upstream ever adds direct SQLite ingest for opencode, mempalace-session loses its reason to exist (the export-to-JSONL dance goes away).

When #1 merges, reduce mempalace-docs to a thin shim. When #2 and #3 land together, mempalace-session becomes a manual-only fallback (cron / backfill) while hooks handle live saves.


7. See also

  • README.md — human-facing quickstart + per-tool usage reference.
  • AGENTS.md — repo conventions for AI agents modifying this codebase.
  • SKILL.md — agent skill (producer side), symlinked into ~/.agents/skills/opencode-mempalace-bridge/ by install.sh.
  • ~/.agents/skills/mempalace/SKILL.md — agent skill for the consumer side (searching, diary, KG). Pair with SKILL.md in this repo.
  • cli_utils — sibling repo: shell quality-of-life tools. Origin of these wrappers before the 2026-04-30 split.