commit 954c3f2ebb6f78406eaf501846360131e9e7e2a0
Author: Joakim Persson
Date:   Thu Apr 30 05:30:04 2026 +0000

    Initial commit — split out from cli_utils

    Producer-side MemPalace tooling: two bash wrappers that bridge opencode session history and project documentation into the palace.

    Originally developed in cli_utils (2026-04-28); split into its own repo on 2026-04-30 because the conceptual fit was weak — cli_utils is interactive shell tooling, while this is agent memory infrastructure with its own architecture, dependency surface, and growth trajectory.

    Contents:

    - bin/mempalace-docs — docs-only mining wrapper (originally a2ddcc9 in cli_utils), bridges the gap until MemPalace PR #1213 (exclude_patterns) merges upstream.
    - bin/mempalace-session — opencode → palace session bridge (originally dacca0e in cli_utils). Reads ~/.local/share/opencode/opencode.db, exports each session to Claude Code JSONL, mines via 'mempalace mine --mode convos'. Bridges the gap until opencode session-stopping hooks + an opencode harness in hooks_cli.py land upstream.
    - ARCHITECTURE.md — canonical spec: architecture diagram, component details, setup recipe, operational notes, upstream-retirement roadmap. Originally a4cf314 in cli_utils.
    - SKILL.md — companion agent skill (producer side). Pairs with the consumer-side mempalace skill. Symlinked into ~/.agents/skills/opencode-mempalace-bridge/ by install.sh.
    - install.sh — idempotent installer, also handles --uninstall.
    - AGENTS.md — repo conventions.

    History of the individual files is not preserved in this split; see cli_utils (gitea.jordbo.se/joakimp/cli_utils) commits a2ddcc9, dacca0e, and a4cf314 for the original authorship context.

diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..b7e0194 --- /dev/null +++ b/.gitignore @@ -0,0 +1,15 @@ +# Per-project mempalace state — shouldn't live in the tool repo +mempalace.yaml +entities.json +.mempalace/ + +# Editor / OS noise +*.swp +*.swo +.DS_Store + +# Local caches +.mypy_cache/ +.ruff_cache/ +__pycache__/ +*.pyc diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..c018e59 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,79 @@ +# AGENTS.md + +## What this is + +Producer-side tooling for [MemPalace](https://github.com/MemPalace/mempalace). Two thin wrappers in `bin/` plus the companion agent skill. Pairs with the consumer-side `mempalace` skill. + +Read [`ARCHITECTURE.md`](ARCHITECTURE.md) first — it's the canonical spec for what this repo does and why. + +## Structure + +``` +install.sh # Idempotent installer — symlinks bin/* into ~/.local/bin + # and SKILL.md into ~/.agents/skills/opencode-mempalace-bridge/ +ARCHITECTURE.md # Canonical spec: diagrams, setup recipe, ops notes, upstream roadmap +README.md # Human-facing quickstart + per-tool usage reference +SKILL.md # Agent skill (symlinked into ~/.agents/skills/ on install) +bin/ + mempalace-docs # Docs-only MemPalace miner (bash wrapper) + mempalace-session # Opencode session → MemPalace bridge (bash + inline Python) +``` + +## Conventions + +- **Standalone executables** in `bin/` with `#!/usr/bin/env bash` shebang, no extension, `chmod +x`. Must work in non-interactive contexts (agent processes, cron, CI). +- **Thin wrappers only.** Neither tool reimplements the mempalace miner. Both follow the **stage-to-cache-then-mine** idiom: curate input to `~/.cache/…//`, then delegate to `mempalace mine`. +- **Idempotent + dry-runnable.** Every tool supports `--dry-run`. Second invocation on unchanged input is a no-op (dedup via `source_file` path, optionally + `mtime`). +- **No external Python deps.** Stdlib only (`sqlite3`, `json`, `pathlib`). 
Inline in the bash wrapper via heredoc. +- Argument parsing: `--help`/`-h` first, then mode flags, then positional args. +- Comment sections use `# ── Section Name ──────` style (matches sibling `cli_utils` repo). + +## Adding a new wrapper + +A third wrapper would justify factoring a shared helper library. Until then, copy the pattern from `mempalace-session` (richest example): + +1. Create `bin/` with `#!/usr/bin/env bash` + `chmod +x`. +2. Implement `--help`, `--dry-run`, `--no-repair` flags. +3. Stage to `~/.cache///` with deterministic filenames. +4. Invoke `mempalace mine ...` (choose `--mode convos` if input is chat-like). +5. End with `mempalace repair` unless `--no-repair`. +6. Update `README.md` with usage + rationale. +7. Update `install.sh`? No — `bin/*` is auto-linked. +8. Update `ARCHITECTURE.md` if the wrapper fills a new architectural gap. +9. Update `SKILL.md` if agents should know when to invoke it. + +## Testing + +Manual only. Integration-shaped: + +```bash +# Smoke test — does it parse args and list what would happen? +./bin/mempalace-session --help +./bin/mempalace-session --dry-run + +# Real test on a single session (safe, deterministic) +./bin/mempalace-session --session ses_ --dry-run +./bin/mempalace-session --session ses_ # file into palace +mempalace_search "a phrase from that session" # verify visibility +./bin/mempalace-session --session ses_ # re-run → should skip +``` + +For `mempalace-docs`, test on a small repo (e.g. this one) first: + +```bash +./bin/mempalace-docs "$PWD" --dry-run +``` + +## Gotchas + +- `install.sh` is idempotent but interactive — use `--yes` in non-interactive contexts. +- `~/.local/bin` must be on `$PATH`. The installer warns if not. +- The companion skill lives at `~/.agents/skills/opencode-mempalace-bridge/SKILL.md` and is a **symlink into this repo**. Editing that file edits `SKILL.md` here. To propagate to Claude Code / Kiro, run `agents-sync` from [`cli_utils`](https://gitea.jordbo.se/joakimp/cli_utils). 
+- The opencode DB path defaults to `~/.local/share/opencode/opencode.db`. Override via `$OPENCODE_DB` or `--db`. +- The mempalace miner **skips symlinks** (as of v3.3.3 — `miner.py` line ~828). That's why the wrappers use `cp -p` / explicit file writes for staging, not symlinks. +- The convos miner dedups on `source_file` path only (no mtime check). Staging filenames must be stable per session; deleting a staged JSONL forces a re-mine. +- The docs miner dedups on `source_file` path + `mtime`. That's why staging uses `cp -p` (preserves mtime). + +## History + +Split out from [`cli_utils`](https://gitea.jordbo.se/joakimp/cli_utils) on 2026-04-30. The wrappers originated there but the conceptual fit was weak (`cli_utils` is interactive shell tools; these are agent memory infrastructure). Some older diary entries and KG facts in the palace reference the original paths. diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md new file mode 100644 index 0000000..f3a27d6 --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1,233 @@ +# MemPalace Feeding Architecture + +This repository wires [opencode](https://github.com/anomalyco/opencode) and arbitrary project directories into [MemPalace](https://github.com/MemPalace/mempalace) via two thin wrappers in `bin/`. This document explains why they exist and how they fit together. + +**Audience:** someone setting up a new machine (or reviewing what's already set up) and asking "how does the palace actually get fed?". Pairs with the `mempalace` agent skill, which covers the *consumer* side (searching, diary, KG). This document covers the *producer* side. + +--- + +## 1. The problem + +MemPalace is a persistent memory layer for AI agents — vector search over drawers (chunks of verbatim content), a knowledge graph, and per-agent diaries, all behind an MCP server. To be useful it has to be *fed*: project docs, conversation transcripts, session summaries. 
+ +The stock mempalace CLI has two feeders: + +| Feeder | What it ingests | Gap | +| ------------------------------------- | -------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ | +| `mempalace mine` (default mode) | Any "readable" file in a directory (code + docs + misc) | Mines source code indiscriminately → embedding index floods with low-signal `__init__` fragments. | +| `mempalace mine --mode convos` | Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack, Codex JSONL | No opencode support. No SQLite support. Opencode persists its history in SQLite, not JSONL. | + +And one auto-save path: + +| Feeder | Harnesses supported | Gap | +| ------------------------------------- | -------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ | +| `hooks_cli.py` (session-stop hooks) | `claude-code`, `codex` | No `opencode` harness → `/exit` mid-session leaves no diary entry behind. | + +So on a machine using opencode + the "docs-first palace hygiene" policy, three gaps bite: + +1. Mining a project floods the palace with source code we don't want. +2. Opencode session history is trapped in SQLite, invisible to `mine --mode convos`. +3. There's no auto-save on session stop — any persistence is best-effort heuristic. + +The two wrappers in `bin/` close gaps **1** and **2**. Gap **3** is upstream work (see §6). + +--- + +## 2. 
The architecture + +``` + Project dirs (/workspace/*) Opencode SQLite DB + ├── *.md ~/.local/share/opencode/opencode.db + ├── *.yaml ├── session (id, title, directory, time_created/updated) + ├── Dockerfile ├── message (session_id, data JSON w/ role) + └── … └── part (message_id, data JSON w/ type: text|tool|…) + │ │ + │ │ + ┌─────▼──────────┐ ┌────▼──────────────┐ + │ mempalace-docs │ │ mempalace-session │ + │ (bin/) │ │ (bin/) │ + │ │ │ │ + │ stage docs │ │ export each │ + │ only via cp -p │ │ session as Claude │ + │ to cache dir │ │ Code JSONL to │ + │ │ │ cache dir │ + └─────┬──────────┘ └────┬──────────────┘ + │ │ + │ ~/.cache/mempalace-docs// │ ~/.cache/mempalace-session// + │ │ + ┌─────▼──────────┐ ┌────▼──────────────┐ + │ mempalace mine │ │ mempalace mine │ + │ │ │ --mode convos │ + └─────┬──────────┘ └────┬──────────────┘ + │ │ + └───────────────────┬──────────────────────┘ + │ + ┌──────▼─────────┐ + │ ChromaDB │ + │ ~/.mempalace/ │ + │ palace/ │ + └──────┬─────────┘ + │ + MCP server (mempalace_*) + │ + AI agents (opencode, claude code, codex, …) +``` + +**Shared idiom:** *stage-to-cache-then-mine*. + +Neither wrapper reimplements the mempalace miner. They each: + +1. Curate input (filter / transform / rename). +2. Write it to a deterministic path under `~/.cache/…//` with `mtime` preserved (via `cp -p` or explicit `os.utime`). +3. Delegate actual embedding + filing to `mempalace mine`, which already dedups on `source_file` path. + +This keeps the wrappers thin. A third wrapper following the same idiom would justify factoring a shared helper library — two does not. + +--- + +## 3. Component details + +### `bin/mempalace-docs` (268 lines) — docs-first mining + +**Input:** a project directory. +**Output:** palace drawers in `wing_` (or `--wing` override), only from documentation-class files. + +What it files: `*.md`, `*.mdx`, `*.rst`, `*.txt`, `*.yml`, `*.yaml`, `*.toml`, selective `*.json`, shell scripts, Dockerfiles, Makefiles, license/notice files. 
+ +What it drops: source code (`.py`, `.ts`, `.go`, `.rs`, …), lockfiles, `.git`, `.venv`, `node_modules`, `__pycache__`, build output. + +**Implementation notes:** + +- Reads `mempalace.yaml` (if present) to discover the actual wing name — avoids drift if someone renamed the wing after init. +- Uses `cp -p` (not symlinks) because the miner skips symlinks (`miner.py` line 828). +- Auto-purges pre-existing drawers whose `source_file` is under the workspace path before re-mining, to prevent doubling on re-runs. +- Upstream [PR #1213](https://github.com/MemPalace/mempalace/pull/1213) will add `exclude_patterns` to `mempalace.yaml` — when merged, this wrapper should shrink to a thin shim. + +### `bin/mempalace-session` (349 lines) — opencode → palace bridge + +**Input:** the opencode SQLite DB (default `~/.local/share/opencode/opencode.db`). +**Output:** palace drawers in `wing_conversations` (or `--wing` override), one JSONL file per qualifying session. + +**Transform pipeline, per session:** + +1. Read `session` row (`id`, `title`, `directory`, `time_created`, `time_updated`). +2. Inject synthetic header as first user turn: `[session: | <directory> | <YYYY-MM-DD>]` → makes title/dir/date semantically searchable. +3. For each `message` ordered by `id`: + - Read JSON `data` → get `role` (`user` / `assistant`). + - For each `part` under the message, read JSON `data` → dispatch on `type`: + - `text` → text block. + - `tool` → Claude Code `tool_use` block + deferred `tool_result` as synthetic human message (the mempalace normalizer folds it back into the assistant turn via its `is_tool_only` branch). + - `step-start` / `step-finish` → dropped as noise. + - `reasoning` → kept, prefixed with `[reasoning]`. +4. Serialize as Claude Code JSONL (`{"type": "user"|"assistant", "message": {"content": [...]}}`) — the one convos format the miner already understands. +5. 
Stage at `~/.cache/mempalace-session/<wing>/<slug>_<id>.jsonl` with `mtime` = `session.time_updated` (deterministic, stable under dedup). + +**Filters:** + +- `--min-messages N` (default 3) — drops throwaway `/exit`'d sessions that would flood the palace. +- `--since YYYY-MM-DD` — incremental catch-up. +- `--session <id>` — one-shot mode. + +**Then:** invokes `mempalace mine --mode convos` against the cache dir, followed by `mempalace repair` (unless `--no-repair`). + +--- + +## 4. Setup recipe (new machine) + +Assumes: opencode already installed, `~/.local/share/opencode/opencode.db` exists, `mempalace` CLI installed (v3.3.3+). + +```bash +# 1. Clone mempalace-toolkit (holds the two wrappers in bin/) +git clone ssh://git@gitea.jordbo.se:2222/joakimp/mempalace-toolkit.git ~/mempalace-toolkit +cd ~/mempalace-toolkit + +# 2. Install (symlinks bin/* into ~/.local/bin, adds loader to rc file) +./install.sh + +# 3. Ensure ~/.local/bin is on PATH (installer warns if not) +export PATH="$HOME/.local/bin:$PATH" + +# 4. Initialize palace if needed (one-time, platform-wide) +mempalace init --yes + +# 5. Mine opencode history into the palace +mempalace-session --dry-run # preview scope +mempalace-session # do it for real (~20 min for ~60 sessions) + +# 6. Mine project docs (per project) +mempalace-docs /workspace/my_project --dry-run +mempalace-docs /workspace/my_project + +# 7. Restart any MCP-connected agent, or call mempalace_reconnect from inside one +``` + +### Containerized setup (devbox) + +The devbox uses two named Docker volumes so these persist across container recreate: + +- `devbox-palace` → `~/.mempalace/palace` (the palace itself) +- `devbox-data` → `~/.local/share/opencode` (opencode's SQLite DB) + +Code at `/workspace/mempalace-toolkit` is a bind mount from the host — survives container recreate and syncs via gitea. Staging directories (`~/.cache/mempalace-{docs,session}/`) are ephemeral but cheap to rebuild. 
+ +**After container recreate**, just re-run `./install.sh` (idempotent) to relink `bin/` into the fresh `~/.local/bin/`. + +--- + +## 5. Operational notes + +### Dedup behavior + +Both wrappers dedup via `mempalace mine`'s built-in key: + +- `mempalace-docs`: keys on `source_file` path + `mtime` → edit a doc, it re-mines; unchanged files are skipped. +- `mempalace-session`: keys on `source_file` path alone (convos miner doesn't check mtime) → a session's JSONL filename is `<slug>_<id>.jsonl`, stable per session, so re-runs skip already-filed sessions. To force re-mining, delete the staging dir. + +**Verified:** a second full `mempalace-session` run immediately after the first produces 0 new drawers. The only cost is the post-mine `repair` step (index rebuild — ~5 min on 5k drawers). + +### When to re-mine + +- `mempalace-docs`: after significant doc changes in a project. +- `mempalace-session`: opportunistically. Every few days catches new opencode sessions. Or wire to cron / systemd timer for true auto-save coverage (not yet done). + +### Cost profile (reference) + +Measured on a ~10-day opencode corpus of 140 sessions / 1491 messages / 4656 parts: + +- Dry run: seconds. +- Full mine: **21 minutes** (38 min user CPU). Produced 2378 drawers from 62 qualifying sessions. +- Dedup re-run: mine step instant; only the repair runs (~5 min). + +Scaling is roughly linear in message count. Budget ~20 minutes per 60-session batch. 
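As a rough worked example of the budget above, the reference run (21 minutes for 62 qualifying sessions) can be extrapolated linearly. This is only a sketch: the text says cost scales with message count, so using session count as a proxy is an assumption, and `estimate_mine_minutes` is a hypothetical helper, not part of either wrapper.

```python
# Reference measurement quoted above: 21 minutes for 62 qualifying sessions.
REF_MINUTES = 21.0
REF_SESSIONS = 62


def estimate_mine_minutes(qualifying_sessions: int) -> float:
    """Linear extrapolation from the reference run (an assumption, not a guarantee)."""
    if qualifying_sessions < 0:
        raise ValueError("session count must be non-negative")
    return REF_MINUTES * qualifying_sessions / REF_SESSIONS


if __name__ == "__main__":
    # A 60-session batch lands close to the "~20 minutes" rule of thumb.
    print(f"{estimate_mine_minutes(60):.1f} min")
```

The estimate covers the mine step only; the post-mine `repair` time is separate and depends on total drawer count.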
+ +### Common failure modes + +| Symptom | Cause | Fix | +| ---------------------------------------------- | ----------------------------------------------------- | --------------------------------------------------------- | +| `mempalace-session: command not found` after container recreate | `~/.local/bin` wiped with container | `cd ~/mempalace-toolkit && ./install.sh` | +| Search errors "Error finding id" post-mine | Stale HNSW index | `mempalace repair --yes` + `mempalace_reconnect` from MCP | +| Drawers doubled after re-mining a project | Someone renamed the wing or ran raw `mempalace mine` alongside the wrapper | Inspect `embedding_metadata` in `chroma.sqlite3`; purge duplicates by source prefix, then `mempalace repair` | +| Sessions missing from palace | Session has fewer than `--min-messages` messages | Lower the threshold or `--session <id>` explicitly | + +--- + +## 6. Upstream roadmap + +These gaps should ideally close upstream, making the wrappers thinner or obsolete: + +1. **[MemPalace PR #1213](https://github.com/MemPalace/mempalace/pull/1213)** — `exclude_patterns` in `mempalace.yaml`. When merged, `mempalace-docs` shrinks to a thin shim (or disappears) since exclude-by-extension becomes a first-class config. +2. **Opencode session hooks** — [PR #16598](https://github.com/anomalyco/opencode/pull/16598) (session.stopping), [PR #16769](https://github.com/anomalyco/opencode/pull/16769) (shutdown), [PR #15224](https://github.com/anomalyco/opencode/pull/15224) (session.start), [issue #23503](https://github.com/anomalyco/opencode/issues/23503) (session.turn.completed). When at least one merges, opencode can fire hooks mempalace can receive. +3. **Opencode harness in `hooks_cli.py`** — mempalace's hooks CLI only knows `claude-code` + `codex` today. Adding `opencode` would let the auto-save diary path work on opencode too. Pairs with #2 above. +4. 
**SQLite mode for `mempalace mine --mode convos`** — if upstream ever adds direct SQLite ingest for opencode, `mempalace-session` loses its reason to exist (the export-to-JSONL dance goes away). + +When #1 merges, retire `mempalace-docs` to a thin shim. When #2 + #3 land together, `mempalace-session` becomes a manual-only fallback (cron / backfill) while hooks handle live saves. + +--- + +## 7. See also + +- [`README.md`](README.md) — human-facing quickstart + per-tool usage reference. +- [`AGENTS.md`](AGENTS.md) — repo conventions for AI agents modifying this codebase. +- [`SKILL.md`](SKILL.md) — agent skill (producer side), symlinked into `~/.agents/skills/opencode-mempalace-bridge/` by `install.sh`. +- `~/.agents/skills/mempalace/SKILL.md` — agent skill for the **consumer** side (searching, diary, KG). Pair with `SKILL.md` in this repo. +- [`cli_utils`](https://gitea.jordbo.se/joakimp/cli_utils) — sibling repo: shell quality-of-life tools. Origin of these wrappers before the 2026-04-30 split. diff --git a/README.md b/README.md new file mode 100644 index 0000000..5f451a9 --- /dev/null +++ b/README.md @@ -0,0 +1,153 @@ +# mempalace-toolkit + +Producer-side tooling for [MemPalace](https://github.com/MemPalace/mempalace) — bridges that feed opencode session history and project documentation into the palace. Pairs with the consumer-side [`mempalace` agent skill](https://github.com/MemPalace/mempalace). + +**What this repo contains:** + +- `bin/mempalace-session` — exports [opencode](https://github.com/anomalyco/opencode) session history from its local SQLite DB to Claude Code JSONL, then mines it via `mempalace mine --mode convos`. +- `bin/mempalace-docs` — mines project directories into MemPalace while excluding source code, keeping the palace signal-dense. +- [`ARCHITECTURE.md`](ARCHITECTURE.md) — **canonical spec**: architecture diagram, component details, setup recipe, operational notes, upstream-retirement roadmap. 
+- [`SKILL.md`](SKILL.md) — the companion agent skill, symlinked into `~/.agents/skills/opencode-mempalace-bridge/` on install. + +**If you're just trying to get this working on a new machine → jump to [Setup](#setup).** +**If you want the full architecture story → read [`ARCHITECTURE.md`](ARCHITECTURE.md).** + +--- + +## Why this exists + +MemPalace is the agent memory layer. Its stock CLI has two gaps that bite on a machine running opencode with a docs-first palace policy: + +1. **`mempalace mine` floods the palace with source code** — every `__init__` fragment, every generated file, hundreds of low-signal drawers per project. `mempalace-docs` fixes this by staging only documentation-class files (`*.md`, `*.yml`, `Dockerfile`, etc.) before mining. +2. **`mempalace mine --mode convos` can't read opencode's SQLite DB** — only file-based chat formats (Claude Code JSONL, Claude.ai JSON, ChatGPT, Slack, Codex). Opencode persists every turn in `~/.local/share/opencode/opencode.db` and has no upstream hook into mempalace's auto-save. `mempalace-session` fixes this by exporting each session to Claude Code JSONL before mining. + +Both wrappers follow the same **stage-to-cache-then-mine** idiom. Neither reimplements the miner; they curate input and delegate. + +Long-term, both should retire: +- `mempalace-docs` → retires when [MemPalace PR #1213](https://github.com/MemPalace/mempalace/pull/1213) (`exclude_patterns` in `mempalace.yaml`) merges. +- `mempalace-session` → retires when opencode session-stopping hooks ([PR #16598](https://github.com/anomalyco/opencode/pull/16598) et al.) merge **and** `hooks_cli.py` gains an `opencode` harness. Until both land, this repo fills the gap. + +See [`ARCHITECTURE.md`](ARCHITECTURE.md) §6 for the full upstream roadmap. 
+ +--- + +## Setup + +### Prerequisites + +- [MemPalace](https://github.com/MemPalace/mempalace) CLI v3.3.3+ +- Python 3 (stdlib `sqlite3` only — no extra deps) +- [opencode](https://github.com/anomalyco/opencode) with an active session DB at `~/.local/share/opencode/opencode.db` *(only needed for `mempalace-session`)* + +### Install + +```bash +git clone ssh://git@gitea.jordbo.se:2222/joakimp/mempalace-toolkit.git ~/mempalace-toolkit +cd ~/mempalace-toolkit +./install.sh +``` + +The installer symlinks `bin/*` into `~/.local/bin/` and optionally installs the agent skill into `~/.agents/skills/opencode-mempalace-bridge/`. + +Ensure `~/.local/bin` is on `$PATH`: + +```bash +export PATH="$HOME/.local/bin:$PATH" +``` + +### First mine + +```bash +# One-time palace init (if not done) +mempalace init --yes + +# Mine opencode session history into wing_conversations +mempalace-session --dry-run # preview qualifying sessions +mempalace-session # do it (~20 min per 60 sessions) + +# Mine a project (docs only) +mempalace-docs /workspace/my_project --dry-run +mempalace-docs /workspace/my_project +``` + +### Containerized (devbox) notes + +On a Docker-based devbox, the palace and opencode DB should live on named volumes so they survive container recreate: + +- `devbox-palace` → `~/.mempalace/palace` +- `devbox-data` → `~/.local/share/opencode` + +This repo is typically bind-mounted from the host, so code survives recreate and syncs via git. After a container recreate, `~/.local/bin` is wiped — just re-run `./install.sh` (idempotent) to relink. + +--- + +## `mempalace-docs` + +Docs-only MemPalace miner. Stages documentation files into a cache dir and runs `mempalace mine` against the cache — never against the raw project dir. 
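The staging step can be pictured with a short Python sketch. The wrapper itself is bash and uses `cp -p`; `shutil.copy2` is used here as the closest stdlib analogue because it also preserves `mtime`. The function name `stage_docs`, the suffix set, and the skip list are illustrative assumptions and are deliberately smaller than the wrapper's real lists.

```python
import shutil
from pathlib import Path

# Illustrative subsets only; the real wrapper handles more file types and directories.
DOC_SUFFIXES = {".md", ".rst", ".txt", ".yml", ".yaml", ".toml"}
SKIP_DIRS = {".git", ".venv", "node_modules", "__pycache__"}


def stage_docs(project: Path, cache: Path) -> list[Path]:
    """Copy documentation-class files from project into cache, keeping mtime."""
    staged = []
    for src in sorted(project.rglob("*")):
        rel = src.relative_to(project)
        if any(part in SKIP_DIRS for part in rel.parts):
            continue  # skip anything under an excluded directory
        if src.is_symlink() or not src.is_file():
            continue  # real files only, since the miner skips symlinks
        if src.suffix.lower() not in DOC_SUFFIXES:
            continue  # source code and other non-doc files are dropped
        dest = cache / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copies content and preserves mtime, like cp -p
        staged.append(dest)
    return staged
```

Mining would then run against `cache`, never against `project`, which is the point of the stage-to-cache-then-mine idiom.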
+ +```bash +mempalace-docs <directory> # mine with wing = dirname +mempalace-docs <directory> --wing my_project # override wing name +mempalace-docs <directory> --agent alice # record agent on drawers +mempalace-docs <directory> --dry-run # list files, don't file +mempalace-docs <directory> --no-repair # skip post-mine repair +mempalace-docs --help +``` + +**What gets mined:** `*.md`, `*.mdx`, `*.rst`, `*.txt`, `*.yml`, `*.yaml`, `*.toml`, `*.json`, `*.sh`, `*.bash`, `*.zsh`, `*.fish`, `Dockerfile*`, `Makefile*`, `*.conf`, `*.cfg`, `*.ini`, `LICENSE*`, `COPYING*`, `NOTICE*`. + +**What gets skipped:** `.py`, `.ts`, `.tsx`, `.js`, `.jsx`, `.go`, `.rs`, `.java`, `.cpp`, `.c`, `.rb`, `.kt`, `.swift`, build output directories (`.git`, `.venv`, `node_modules`, `__pycache__`, `.mypy_cache`, `.pytest_cache`, `.ruff_cache`, `dist`, `build`, `.next`, `target`, `coverage`), lockfiles. + +**Rationale:** the palace is for *context and intent*. Agents already have `grep`/`glob`/`Read` for code — always authoritative, never stale. Embedding source code creates a parallel, lossier, drift-prone copy that pollutes semantic search for years. + +--- + +## `mempalace-session` + +Opencode → MemPalace session bridge. Reads `~/.local/share/opencode/opencode.db`, transforms each session into Claude Code JSONL, and files via `mempalace mine --mode convos`. 
+ +```bash +mempalace-session # mine all sessions (≥3 msgs) +mempalace-session --wing my_convos # custom wing (default: wing_conversations) +mempalace-session --session ses_abc123 # one session only +mempalace-session --since 2026-04-01 # only sessions updated on/after date +mempalace-session --min-messages 6 # stricter short-session filter +mempalace-session --db /custom/path/opencode.db # non-default DB location +mempalace-session --dry-run # export + list, skip mine +mempalace-session --no-repair # skip post-mine index repair +mempalace-session --help +``` + +**What gets exported per session:** + +- Synthetic header injected as the first user turn (`[session: <title> | <dir> | <date>]`) so the palace can find sessions by topic, not just by ID. +- Each message → Claude Code JSONL line (`{"type": "user"|"assistant", "message": {"content": ...}}`). +- Tool calls → `tool_use` blocks. Known tools (`Bash`, `Read`, `Grep`, `Edit`, `Write`) get formatted summaries; unknown tools are JSON-serialized. +- Tool outputs → `tool_result` blocks in a follow-up human message, folded back into the assistant turn by the mempalace normalizer. +- `step-start` / `step-finish` parts are dropped as noise. `reasoning` parts are kept with a `[reasoning]` prefix. + +**Dedup:** staging at `~/.cache/mempalace-session/<wing>/` with deterministic per-session filenames (`<slug>_<id>.jsonl`). The convos miner keys on `source_file`, so re-runs skip unchanged sessions. To force re-mining a session, delete its JSONL from the staging dir. + +**Filter:** sessions with fewer than `--min-messages` messages (default 3) are skipped — drops throwaway `/exit`'d sessions that would otherwise flood the palace. On a reference 140-session corpus, 78 were filtered this way. + +**Cost profile:** ~20 minutes per 60-session batch. Scales roughly linearly with message count. Dedup re-run: mine step instant, only the post-mine `repair` runs (~5 min on 5k drawers). 
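The per-session export described in this section can be sketched roughly as follows. It is a simplified illustration, not the wrapper's actual inline Python: the part field name `text`, the Claude-style `{"type": "text", "text": ...}` content blocks, and the helper names `part_to_block` and `export_session` are assumptions, and tool parts are left out for brevity.

```python
import json

# Part types the text above says are dropped as noise.
DROPPED = {"step-start", "step-finish"}


def part_to_block(part: dict):
    """Map one opencode part (assumed shape) to a content block, or None to drop it."""
    kind = part.get("type")
    if kind in DROPPED:
        return None
    if kind == "text":
        return {"type": "text", "text": part.get("text", "")}
    if kind == "reasoning":
        return {"type": "text", "text": "[reasoning] " + part.get("text", "")}
    return None  # tool parts and unknown types are omitted from this sketch


def export_session(title: str, directory: str, date: str, messages) -> str:
    """messages: iterable of (role, parts) pairs with role 'user' or 'assistant'."""
    header = f"[session: {title} | {directory} | {date}]"
    # The synthetic header goes in as the first user turn.
    records = [{"type": "user", "message": {"content": [{"type": "text", "text": header}]}}]
    for role, parts in messages:
        blocks = [b for b in map(part_to_block, parts) if b is not None]
        if blocks:
            records.append({"type": role, "message": {"content": blocks}})
    # One JSON object per line, which is what makes the output JSONL.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
```

Writing the result to a stable `<slug>_<id>.jsonl` path is what lets the convos miner skip the session on later runs.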
+ +--- + +## Companion agent skill + +Installing this repo symlinks `SKILL.md` into `~/.agents/skills/opencode-mempalace-bridge/SKILL.md`, where it's auto-discovered by opencode (and by Claude Code / Kiro if you run `agents-sync` from [`cli_utils`](https://gitea.jordbo.se/joakimp/cli_utils)). + +The skill is the *short-form checklist* for agents — when to use which wrapper, failure modes, setup recipes, anti-patterns. The canonical reference is always [`ARCHITECTURE.md`](ARCHITECTURE.md); the skill points there for deep context. + +The skill pairs with the consumer-side [`mempalace` skill](https://github.com/MemPalace/mempalace) — that one covers using the palace (search, diary, KG); this one covers feeding it. + +--- + +## See also + +- [`ARCHITECTURE.md`](ARCHITECTURE.md) — canonical spec: diagrams, setup recipe, failure modes, upstream roadmap. +- [`AGENTS.md`](AGENTS.md) — repo conventions for AI agents modifying this codebase. +- [MemPalace](https://github.com/MemPalace/mempalace) — the memory layer itself. +- [opencode](https://github.com/anomalyco/opencode) — the agent harness this bridges. +- [cli_utils](https://gitea.jordbo.se/joakimp/cli_utils) — sibling repo with shell quality-of-life tools (origin of these wrappers before the 2026-04-30 split). diff --git a/SKILL.md b/SKILL.md new file mode 100644 index 0000000..f575552 --- /dev/null +++ b/SKILL.md @@ -0,0 +1,157 @@ +--- +name: opencode-mempalace-bridge +description: Set up the producer side of MemPalace — feed opencode session history and project docs into the palace via the cli_utils wrappers. Use when provisioning a new machine, when the user asks how palace feeding works, when opencode sessions aren't showing up in searches, or when a project needs docs-only mining. Pairs with the `mempalace` skill (consumer side). +--- + +# Opencode ↔ MemPalace Bridge (producer side) + +## Overview + +The `mempalace` skill covers *using* the palace (search, diary, KG). 
This skill covers *feeding* it — specifically, how to wire opencode session history and project docs into the palace on a new machine or after a container recreate. + +**Authoritative source:** `/workspace/cli_utils/ARCHITECTURE.md` (also at `<cli_utils>/ARCHITECTURE.md` in the gitea repo). When in doubt, read that file — it's the canonical spec. This skill is the short-form checklist. + +**Core idea:** two thin wrappers in `cli_utils/bin/` close gaps in the stock mempalace CLI: + +| Gap | Wrapper | +| ---------------------------------------------------------------------------------------- | -------------------- | +| `mempalace mine` floods the palace with source code we don't want | `mempalace-docs` | +| `mempalace mine --mode convos` can't read opencode's SQLite DB | `mempalace-session` | + +Both follow the same **stage-to-cache-then-mine** idiom — they curate input into `~/.cache/…/<wing>/`, then delegate to `mempalace mine`. + +## When to Load This Skill + +- User asks "how does the palace get fed?" or mentions setting up mempalace on a new machine. +- Opencode conversations are missing from palace searches (`wing_conversations` is empty or stale). +- A project needs to be mined but you want *docs only, no source code*. +- User asks about `mempalace-docs` or `mempalace-session`. +- After a container recreate on a devbox — the wrappers need reinstall. +- Planning to retire either wrapper once upstream PRs merge (see §6 of ARCHITECTURE.md). + +## Setup Recipe (new machine) + +Prerequisites: `opencode` installed with an active DB at `~/.local/share/opencode/opencode.db`, `mempalace` CLI v3.3.3+, Python 3 (stdlib `sqlite3` only — no extra deps). + +```bash +# 1. Clone cli_utils (holds the two wrappers in bin/) +git clone <gitea-url>/cli_utils ~/cli_utils +cd ~/cli_utils + +# 2. Install — symlinks bin/* into ~/.local/bin, adds loader to rc file +./install.sh + +# 3. Verify ~/.local/bin is on PATH +which mempalace-session mempalace-docs + +# 4. 
Initialize palace (one-time, platform-wide) +mempalace init --yes + +# 5. Mine opencode session history into wing_conversations +mempalace-session --dry-run # preview: which sessions qualify? +mempalace-session # do it (~20 min per 60 sessions) + +# 6. Mine project docs per project (docs only — no source code) +mempalace-docs /workspace/my_project --dry-run +mempalace-docs /workspace/my_project + +# 7. If a long-lived MCP session is open, reconnect it +# (from inside the MCP client): mempalace_reconnect +``` + +### Containerized (devbox) specifics + +Named Docker volumes preserve state across container recreate: + +- `devbox-palace` → `~/.mempalace/palace` +- `devbox-data` → `~/.local/share/opencode` + +Bind mount `/workspace/cli_utils` from the host — code survives recreate, syncs via gitea. + +**After container recreate:** `~/.local/bin` is ephemeral. Just re-run `./install.sh` (idempotent) — everything else already persists. + +## Key Operational Rules + +### Always dry-run first on a cold system + +```bash +mempalace-session --dry-run # shows qualifying sessions +mempalace-docs <dir> --dry-run # shows files that would be mined +``` + +A docs-heavy repo should produce ~5–10 drawers per file. >15 drawers/file on average = code leaked in; investigate. + +### Dedup is free — re-running is safe + +- `mempalace-docs`: dedup keyed on `source_file` path + `mtime`. Unchanged files skipped. +- `mempalace-session`: dedup keyed on `source_file` path alone (no mtime check for convos). Staging filenames are deterministic per session (`<slug>_<id>.jsonl`), so re-runs skip already-filed sessions. + +Second run immediately after first → 0 new drawers, only the post-mine `repair` step runs (~5 min on 5k drawers). 
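A toy model of the two dedup keys may make the difference concrete. This is only an illustration of the behavior described above, not the miner's code: `docs_key`, `convos_key`, and `needs_mining` are hypothetical names, and the set of filed keys stands in for whatever the palace actually stores.

```python
def docs_key(source_file: str, mtime: float) -> tuple:
    """Docs mining: path plus mtime, so an edited file counts as new."""
    return (source_file, mtime)


def convos_key(source_file: str, mtime: float) -> tuple:
    """Convos mining: path only, so mtime changes are ignored."""
    return (source_file,)


def needs_mining(key: tuple, already_filed: set) -> bool:
    """A file is mined only if its key has not been filed before."""
    return key not in already_filed


if __name__ == "__main__":
    filed_docs = {docs_key("README.md", 100.0)}
    filed_convos = {convos_key("fix-bug_ses_abc123.jsonl", 100.0)}
    print(needs_mining(docs_key("README.md", 200.0), filed_docs))  # edited doc
    print(needs_mining(convos_key("fix-bug_ses_abc123.jsonl", 200.0), filed_convos))  # same session file
```

Under this model an edited doc is re-mined, while a session file with an unchanged path is skipped even if its `mtime` moved. That is why forcing a session re-mine means deleting its staged JSONL.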
+ +### Incremental catch-up + +```bash +mempalace-session --since 2026-04-20 # only recent sessions +mempalace-session --session ses_abc123 # one specific session +``` + +### Force re-mine + +```bash +rm -rf ~/.cache/mempalace-session/<wing>/ # nukes staging dir +mempalace-session # stages + mines fresh +``` + +Staging is ephemeral by design; the palace is the source of truth. + +## Failure Modes & Fixes + +| Symptom | Cause | Fix | +| ---------------------------------------------------------- | ----------------------------------------------- | ------------------------------------------------------- | +| `mempalace-session: command not found` | `~/.local/bin` wiped (container recreate) | `cd ~/cli_utils && ./install.sh` | +| Sessions missing from palace | Fewer messages than `--min-messages` (default 3)| Lower threshold or `--session <id>` explicitly | +| "Error finding id" on search after mining | Stale HNSW index | `mempalace repair --yes` + `mempalace_reconnect` | +| Drawers doubled for a project | Someone ran raw `mempalace mine` alongside wrapper, or renamed wing mid-flight | Inspect `embedding_metadata` in `chroma.sqlite3`, purge duplicates by source prefix, then `mempalace repair` | +| Post-mine ChromaDB search returns stale results in MCP | MCP server caches old index | Call `mempalace_reconnect` from MCP | +| Opencode DB not at default path | Non-standard `XDG_DATA_HOME` or opencode config | `export OPENCODE_DB=/custom/path/opencode.db` or `--db` | + +## What to File Under Which Wing + +| Content type | Wing (convention) | Room | Tool | +| ----------------------------------- | ------------------------------ | ---------------- | ----------------------- | +| Opencode session transcripts | `wing_conversations` | auto (keyword) | `mempalace-session` | +| Project docs (md, yaml, Dockerfile) | `wing_<project-name>` | auto | `mempalace-docs` | +| Per-agent session diaries | `wing_<agent-name>` | `diary` | `mempalace_diary_write` (from the consumer-side 
`mempalace` skill) | +| Ad-hoc verbatim facts | any | any | `mempalace_add_drawer` | + +## Cost Profile (reference) + +From a 10-day opencode corpus (140 sessions / 1491 msgs / 4656 parts): + +- Dry run: seconds. +- Full mine: ~21 min wall / ~38 min user CPU → 2378 drawers from 62 qualifying sessions. +- Dedup re-run: mine instant, repair ~5 min. + +Budget **~20 minutes per 60-session batch**. Scales roughly linearly with message count. + +## Anti-Patterns + +- **Don't run `mempalace mine` directly on a project.** Use `mempalace-docs` — otherwise source code floods the palace. +- **Don't try to point `mempalace mine --mode convos` at `opencode.db` directly.** The convos miner reads files (txt/md/json/jsonl) only — no SQLite support. Use `mempalace-session` to export first. +- **Don't delete staging dirs unnecessarily.** They're dedup anchors; deleting means a forced re-mine of everything in that wing. +- **Don't forget `mempalace_reconnect`** after a mine from inside a live MCP session — otherwise search hits the stale index. +- **Don't mine with `--min-messages 0` or `1`** — 78 out of 140 sessions in reference corpus were throwaway `/exit`'d sessions that would flood the palace with noise. Default 3 is sensible. + +## Upstream Roadmap (when to retire these wrappers) + +- **[MemPalace PR #1213](https://github.com/MemPalace/mempalace/pull/1213)** merges → `mempalace-docs` becomes redundant (exclude patterns in `mempalace.yaml`). Retire to thin shim or delete. +- **Opencode session-stopping hooks merge** ([PR #16598](https://github.com/anomalyco/opencode/pull/16598) et al.) **AND** `hooks_cli.py` gains `opencode` harness → live auto-save works; `mempalace-session` becomes a manual-only backfill tool (cron / historic import). +- **SQLite mode lands in `mempalace mine --mode convos`** → `mempalace-session` loses its reason to exist entirely. + +Check `ARCHITECTURE.md` §6 in `cli_utils/` for current upstream status before doing any retirement work. 
+ +## See Also + +- `<cli_utils>/ARCHITECTURE.md` — **canonical spec** (diagrams, implementation notes, full troubleshooting). +- `<cli_utils>/README.md` — per-tool usage reference. +- `~/.agents/skills/mempalace/SKILL.md` — consumer-side skill (search, diary, KG) — *pair this skill with that one*. diff --git a/bin/mempalace-docs b/bin/mempalace-docs new file mode 100755 index 0000000..469ee57 --- /dev/null +++ b/bin/mempalace-docs @@ -0,0 +1,268 @@ +#!/usr/bin/env bash +# mempalace-docs — mine a project into MemPalace with docs-only filtering +# +# Works around the fact that upstream `mempalace mine` has a hardcoded +# READABLE_EXTENSIONS list that includes .py / .ts / .js / .go / .rs etc, +# which pollutes the palace with low-signal code-fragment drawers. +# +# Strategy: stage a copy of only docs/config/script files into +# ~/.cache/mempalace-docs/<wing>/, then run `mempalace mine` against it. +# Wing is derived from the source directory name (override with --wing). +# +# Once MemPalace PR #1213 (exclude_patterns in mempalace.yaml) lands, this +# wrapper becomes a thin shim over `mempalace mine` with a default +# exclude_patterns injected. +# +# Usage: +# mempalace-docs <directory> +# mempalace-docs <directory> --wing <name> +# mempalace-docs <directory> --agent <name> +# mempalace-docs <directory> --dry-run +# mempalace-docs --help +# +# Exit codes: +# 0 success +# 1 usage / argument error +# 2 source directory missing +# 3 mempalace CLI not installed +# 4 mine failed +# +# Dependencies: bash, find, cp, awk, sed, python3 (stdlib sqlite3), mempalace (v3.3.3+) + +set -euo pipefail + +# ── Defaults ───────────────────────────────────────────────────────── +AGENT="${USER:-mempalace}" +WING="" +SRC="" +DRY_RUN=0 +NO_REPAIR=0 + +# File patterns to include. Docs + config + intent-bearing scripts. +# Everything else (code) is excluded by omission. 
+INCLUDE_GLOBS=( + '*.md' '*.mdx' '*.rst' '*.txt' + '*.yml' '*.yaml' '*.toml' + '*.json' # includes package.json, pyproject companions; lockfiles filtered below + '*.sh' '*.bash' '*.zsh' '*.fish' + 'Dockerfile*' 'Makefile*' 'Containerfile*' + '*.conf' '*.cfg' '*.ini' + 'LICENSE*' 'COPYING*' 'NOTICE*' 'AUTHORS*' 'CONTRIBUTORS*' +) + +# Path segments to always skip (in addition to .gitignore). +SKIP_DIRS=( + '.git' '.venv' 'venv' '__pycache__' 'node_modules' + '.mypy_cache' '.pytest_cache' '.ruff_cache' '.tox' '.nox' + 'dist' 'build' '.next' '.nuxt' 'target' 'coverage' + '.DS_Store' +) + +# Filename patterns to skip even if caught by an include glob. +SKIP_FILES=( + 'package-lock.json' 'yarn.lock' 'pnpm-lock.yaml' 'poetry.lock' + 'Cargo.lock' 'Gemfile.lock' 'composer.lock' + '.gitignore' '.dockerignore' +) + +# ── Usage ──────────────────────────────────────────────────────────── +usage() { + cat <<'EOF' +mempalace-docs — mine a project into MemPalace, docs/config/scripts only + +Usage: + mempalace-docs <directory> [options] + +Options: + --wing <name> Override wing name (default: source directory name) + --agent <name> Agent name recorded on drawers (default: $USER) + --dry-run List files that would be mined; do not file + --no-repair Skip `mempalace repair` after mining + -h, --help Show this help + +What gets mined: + Docs: *.md *.mdx *.rst *.txt + Config: *.yml *.yaml *.toml *.json *.conf *.cfg *.ini + Scripts: *.sh *.bash *.zsh *.fish Dockerfile* Makefile* + Legal: LICENSE* COPYING* NOTICE* AUTHORS* + +What gets skipped (by design): + Source code: .py .ts .tsx .js .jsx .go .rs .java .cpp .c .rb .kt .swift + Caches / deps: .git .venv venv node_modules __pycache__ .mypy_cache + .pytest_cache .ruff_cache dist build .next target coverage + Lockfiles: package-lock.json yarn.lock poetry.lock Cargo.lock ... + +Rationale: + The palace is for context and intent. 
Agents read code directly via + grep/glob/Read — mining it creates a parallel, lossier, drift-prone + copy that pollutes semantic search. + + This wrapper is a bridge until MemPalace PR #1213 (exclude_patterns) + lands upstream. +EOF +} + +# ── Parse args ─────────────────────────────────────────────────────── +while [[ $# -gt 0 ]]; do + case "$1" in + -h|--help) usage; exit 0 ;; + --wing) WING="${2:-}"; shift 2 ;; + --agent) AGENT="${2:-}"; shift 2 ;; + --dry-run) DRY_RUN=1; shift ;; + --no-repair) NO_REPAIR=1; shift ;; + --) shift; break ;; + -*) echo "error: unknown option: $1" >&2; usage >&2; exit 1 ;; + *) if [[ -z "$SRC" ]]; then SRC="$1"; shift; else echo "error: unexpected arg: $1" >&2; exit 1; fi ;; + esac +done + +if [[ -z "$SRC" ]]; then usage >&2; exit 1; fi +if [[ ! -d "$SRC" ]]; then + echo "error: not a directory: $SRC" >&2; exit 2 +fi +if ! command -v mempalace >/dev/null 2>&1; then + echo "error: mempalace CLI not found in PATH" >&2; exit 3 +fi + +SRC="$(cd "$SRC" && pwd)" + +# Determine wing name with the following precedence: +# 1. explicit --wing flag (user override) +# 2. `wing:` value in $SRC/mempalace.yaml (respect existing project config) +# 3. sanitized source directory basename (hyphens → underscores, matching +# mempalace's convention for implicit wing names) +if [[ -z "$WING" && -f "$SRC/mempalace.yaml" ]]; then + WING="$(awk -F': *' '/^wing:/ { gsub(/["\x27 ]/,"",$2); print $2; exit }' "$SRC/mempalace.yaml" 2>/dev/null || true)" +fi +if [[ -z "$WING" ]]; then + WING="$(basename "$SRC" | tr '-' '_')" +fi + +# ── Build staging directory ────────────────────────────────────────── +# Use a deterministic, per-wing cache path so re-runs produce the same +# source_file paths the miner saw last time. This is critical: mempalace +# dedup keys on source_file + source_mtime, so a mktemp path would cause +# every run to re-file the entire wing. 
+CACHE_ROOT="${XDG_CACHE_HOME:-$HOME/.cache}/mempalace-docs" +STAGE="$CACHE_ROOT/$WING" +mkdir -p "$CACHE_ROOT" +rm -rf "$STAGE" +mkdir -p "$STAGE" +# Only clean up the per-wing stage on exit — leave $CACHE_ROOT itself +# alone in case other wings are staging concurrently. +trap 'rm -rf "$STAGE"' EXIT INT TERM + +# Build find expression +find_cmd=(find "$SRC" -type f) + +# Prune unwanted dirs +for d in "${SKIP_DIRS[@]}"; do + find_cmd+=('!' -path "*/$d/*" '!' -path "*/$d") +done + +# Include only matching names +find_cmd+=('(' -false) +for g in "${INCLUDE_GLOBS[@]}"; do + find_cmd+=('-o' '-name' "$g") +done +find_cmd+=(')') + +# Gather matches, then filter skip_files +mapfile -t matches < <("${find_cmd[@]}") + +filtered=() +for f in "${matches[@]}"; do + base="$(basename "$f")" + skip=0 + for sf in "${SKIP_FILES[@]}"; do + if [[ "$base" == "$sf" ]]; then skip=1; break; fi + done + [[ $skip -eq 0 ]] && filtered+=("$f") +done + +count="${#filtered[@]}" + +if [[ $count -eq 0 ]]; then + echo "no matching files found in $SRC" >&2 + exit 0 +fi + +if [[ $DRY_RUN -eq 1 ]]; then + echo "Would mine $count files into wing '$WING':" + printf ' %s\n' "${filtered[@]}" | sed "s#^ $SRC/# #" + exit 0 +fi + +# Copy into staging, preserving mtime (critical for mempalace dedup — +# the miner compares stored mtime against the staged copy's mtime). +for f in "${filtered[@]}"; do + rel="${f#$SRC/}" + dest="$STAGE/$rel" + mkdir -p "$(dirname "$dest")" + cp -p "$f" "$dest" +done + +# Purge any drawers in this wing that came from the original source +# directory. The miner records source_file = absolute path from the +# staging dir; this differs from a prior `mempalace mine <source>` run, +# so without this purge the wing would accumulate duplicates every time +# we switch between upstream `mempalace mine` and this wrapper. +# We only purge source_file paths matching $SRC/*, leaving other wings +# and other sources alone. 
+python3 - "$WING" "$SRC" <<'PY' +import sqlite3, sys, os +wing, src = sys.argv[1], sys.argv[2].rstrip("/") +db_path = os.path.expanduser("~/.mempalace/palace/chroma.sqlite3") +if not os.path.exists(db_path): + sys.exit(0) +db = sqlite3.connect(db_path) +cur = db.cursor() +# Find embedding ids in target wing whose source_file is under $SRC/ +q = """ +SELECT DISTINCT w.id +FROM embedding_metadata w +JOIN embedding_metadata s ON w.id = s.id AND s.key = 'source_file' +WHERE w.key = 'wing' + AND w.string_value = ? + AND (s.string_value LIKE ? OR s.string_value LIKE ?) +""" +pats = (f"{src}/%", f"{src}") +ids = [r[0] for r in cur.execute(q, (wing, pats[0], pats[1]))] +if ids: + ph = ",".join("?" * len(ids)) + for tbl in ("embedding_metadata", "embeddings"): + try: + cur.execute(f"DELETE FROM {tbl} WHERE id IN ({ph})", ids) + except sqlite3.OperationalError: + pass + db.commit() + print(f" purged {len(ids)} pre-existing drawers for {src} from wing '{wing}'") +db.close() +PY + +# Write mempalace.yaml into staging dir so the miner uses the right wing +cat > "$STAGE/mempalace.yaml" <<EOF +wing: $WING +rooms: + - name: general + description: Docs, config, and scripts from $WING + keywords: [general] +EOF + +echo "Staging $count files into wing '$WING'..." + +# ── Run the mine ───────────────────────────────────────────────────── +if ! mempalace mine "$STAGE" --agent "$AGENT" --wing "$WING"; then + echo "error: mempalace mine failed" >&2 + exit 4 +fi + +# ── Repair index ───────────────────────────────────────────────────── +if [[ $NO_REPAIR -eq 0 ]]; then + echo "" + echo "Rebuilding HNSW index..." + mempalace repair --yes +fi + +echo "" +echo "Done. Wing '$WING' is ready. Remember to reconnect any live MCP sessions." 
diff --git a/bin/mempalace-session b/bin/mempalace-session new file mode 100755 index 0000000..cb9ab6b --- /dev/null +++ b/bin/mempalace-session @@ -0,0 +1,349 @@ +#!/usr/bin/env bash +# mempalace-session — mine opencode session history into MemPalace +# +# Opencode persists every session (verbatim user/assistant turns + tool calls) +# in a local SQLite DB at ~/.local/share/opencode/opencode.db. There is +# currently no opencode session-stopping hook upstream, so the diary-based +# auto-save is best-effort; this wrapper closes the gap by mining the SQLite +# directly. +# +# Strategy: +# 1. Read opencode.db and export each qualifying session to a Claude Code +# JSONL file (format the mempalace normalizer already understands). +# 2. Stage exports under ~/.cache/mempalace-session/<wing>/. +# 3. Run `mempalace mine --mode convos` against the staging dir. +# +# Dedup: mempalace convos mode keys on source_file (absolute staging path). +# The staging path is deterministic (per-wing under XDG_CACHE_HOME) so re-runs +# are idempotent as long as session content hasn't changed. +# +# Session filter: sessions with fewer than --min-messages messages (default 3) +# are skipped to avoid filing throwaway /exit'd sessions. 
+# +# Usage: +# mempalace-session +# mempalace-session --wing <name> +# mempalace-session --session <id> +# mempalace-session --since 2026-04-01 +# mempalace-session --min-messages 6 +# mempalace-session --dry-run +# mempalace-session --help +# +# Exit codes: +# 0 success +# 1 usage / argument error +# 2 opencode.db missing or unreadable +# 3 mempalace CLI not installed +# 4 mine failed +# +# Dependencies: bash, python3 (stdlib sqlite3), mempalace (v3.3.3+) + +set -euo pipefail + +# ── Defaults ───────────────────────────────────────────────────────── +AGENT="${USER:-mempalace}" +WING="wing_conversations" +SESSION_ID="" +SINCE="" +MIN_MESSAGES=3 +DRY_RUN=0 +NO_REPAIR=0 +OPENCODE_DB="${OPENCODE_DB:-$HOME/.local/share/opencode/opencode.db}" + +# ── Usage ──────────────────────────────────────────────────────────── +usage() { + cat <<'EOF' +mempalace-session — mine opencode session history into MemPalace + +Usage: + mempalace-session [options] + +Options: + --wing <name> Target wing (default: wing_conversations) + --session <id> Export one session only (default: all qualifying) + --since <YYYY-MM-DD> Only sessions with time_updated on/after this date + --min-messages <N> Skip sessions with fewer than N messages (default: 3) + --agent <name> Agent name recorded on drawers (default: $USER) + --db <path> Path to opencode.db (default: $OPENCODE_DB or + ~/.local/share/opencode/opencode.db) + --dry-run Export + list; do not mine into palace + --no-repair Skip `mempalace repair` after mining + -h, --help Show this help + +What gets mined: + - Each qualifying session → one Claude Code JSONL file + - Staged under ~/.cache/mempalace-session/<wing>/ + - Filed via `mempalace mine --mode convos` + +Transcript shape per session: + - Synthetic header as first user turn: + [session: <title> | <directory> | <YYYY-MM-DD>] + - User/assistant messages extracted from message.data + part.data + - Tool calls → Claude Code `tool_use` blocks + - Tool outputs → `tool_result` blocks (folded 
into the assistant turn by the + mempalace normalizer) + - `step-start` / `step-finish` parts are dropped (noise) + - `reasoning` parts prefixed with `[reasoning]` and kept as text + +Dedup: + - source_file = absolute staging path (deterministic per session ID) + - Re-runs skip unchanged sessions. To force re-mining, delete the staging + dir: rm -rf ~/.cache/mempalace-session/<wing>/ + +Rationale: + Opencode lacks a session-stopping hook (upstream PRs #16598, #16769 still + open). Until that lands + mempalace hooks_cli.py gains an opencode harness, + this wrapper is how we get automatic session capture. +EOF +} + +# ── Parse args ─────────────────────────────────────────────────────── +while [[ $# -gt 0 ]]; do + case "$1" in + -h|--help) usage; exit 0 ;; + --wing) WING="${2:-}"; shift 2 ;; + --session) SESSION_ID="${2:-}"; shift 2 ;; + --since) SINCE="${2:-}"; shift 2 ;; + --min-messages) MIN_MESSAGES="${2:-}"; shift 2 ;; + --agent) AGENT="${2:-}"; shift 2 ;; + --db) OPENCODE_DB="${2:-}"; shift 2 ;; + --dry-run) DRY_RUN=1; shift ;; + --no-repair) NO_REPAIR=1; shift ;; + --) shift; break ;; + -*) echo "error: unknown option: $1" >&2; usage >&2; exit 1 ;; + *) echo "error: unexpected arg: $1" >&2; exit 1 ;; + esac +done + +# ── Preflight ──────────────────────────────────────────────────────── +if [[ ! -f "$OPENCODE_DB" ]]; then + echo "error: opencode.db not found at $OPENCODE_DB" >&2 + echo " override with --db <path> or OPENCODE_DB env var" >&2 + exit 2 +fi +if ! command -v mempalace >/dev/null 2>&1; then + echo "error: mempalace CLI not found in PATH" >&2 + exit 3 +fi +if ! [[ "$MIN_MESSAGES" =~ ^[0-9]+$ ]]; then + echo "error: --min-messages must be an integer" >&2 + exit 1 +fi + +# ── Staging dir ────────────────────────────────────────────────────── +# Deterministic per-wing path so source_file dedup works across re-runs. 
+CACHE_ROOT="${XDG_CACHE_HOME:-$HOME/.cache}/mempalace-session" +STAGE="$CACHE_ROOT/$WING" +mkdir -p "$STAGE" + +# ── Export sessions (Python heredoc) ──────────────────────────────── +# Writes one JSONL file per qualifying session into $STAGE. +# Prints: EXPORTED <count> on stdout, plus per-session lines. +export_count=$(python3 - "$OPENCODE_DB" "$STAGE" "$SESSION_ID" "$SINCE" "$MIN_MESSAGES" <<'PY' +import sqlite3, json, sys, os +from datetime import datetime, timezone +from pathlib import Path + +db_path, stage, session_filter, since, min_messages = sys.argv[1:6] +min_messages = int(min_messages) +stage = Path(stage) + +# Convert --since YYYY-MM-DD to epoch ms (opencode uses ms timestamps) +since_ms = None +if since: + try: + since_ms = int(datetime.strptime(since, "%Y-%m-%d").replace(tzinfo=timezone.utc).timestamp() * 1000) + except ValueError: + print(f"error: --since must be YYYY-MM-DD, got {since!r}", file=sys.stderr) + sys.exit(1) + +conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True) +conn.row_factory = sqlite3.Row +cur = conn.cursor() + +# Select sessions +q = "SELECT * FROM session WHERE 1=1" +params = [] +if session_filter: + q += " AND id = ?" + params.append(session_filter) +if since_ms is not None: + q += " AND time_updated >= ?" + params.append(since_ms) +q += " ORDER BY time_updated" +cur.execute(q, params) +sessions = [dict(r) for r in cur.fetchall()] + +if not sessions: + print("EXPORTED 0") + sys.exit(0) + +# Prefetch messages + parts for qualifying sessions +exported = 0 +skipped_short = 0 +for sess in sessions: + sid = sess["id"] + cur.execute("SELECT COUNT(*) FROM message WHERE session_id=?", (sid,)) + msg_count = cur.fetchone()[0] + if msg_count < min_messages: + skipped_short += 1 + continue + + cur.execute( + "SELECT * FROM message WHERE session_id=? ORDER BY time_created", (sid,) + ) + messages = [dict(r) for r in cur.fetchall()] + cur.execute( + "SELECT * FROM part WHERE session_id=? 
ORDER BY time_created", (sid,) + ) + parts_by_msg: dict[str, list] = {} + for r in cur.fetchall(): + d = dict(r) + parts_by_msg.setdefault(d["message_id"], []).append(d) + + # Build JSONL lines + out_lines: list[dict] = [] + + # Synthetic header as first user turn — injects title/directory/date + # into the transcript so semantic search can find sessions by topic, + # not just by session-id filename. + title = sess.get("title") or "(untitled)" + directory = sess.get("directory") or "?" + date_str = datetime.fromtimestamp( + sess["time_created"] / 1000, tz=timezone.utc + ).strftime("%Y-%m-%d") + header = f"[session: {title} | {directory} | {date_str}]" + out_lines.append({"type": "user", "message": {"content": header}}) + + for msg in messages: + mdata = json.loads(msg["data"]) + role = mdata.get("role") + if role not in ("user", "assistant"): + continue + parts = parts_by_msg.get(msg["id"], []) + + blocks = [] + tool_results = [] + for p in parts: + try: + pd = json.loads(p["data"]) + except json.JSONDecodeError: + continue + t = pd.get("type") + if t == "text": + txt = (pd.get("text") or "").strip() + if txt: + blocks.append({"type": "text", "text": txt}) + elif t == "tool": + # opencode tool part → tool_use block + deferred tool_result + state = pd.get("state") or {} + tool_name = pd.get("tool") or "Unknown" + call_id = pd.get("callID") or p["id"] + tool_input = state.get("input") or {} + tool_output = state.get("output") + blocks.append({ + "type": "tool_use", + "id": call_id, + "name": tool_name, + "input": tool_input, + }) + if tool_output: + tool_results.append({ + "type": "tool_result", + "tool_use_id": call_id, + "content": str(tool_output), + }) + elif t in ("step-start", "step-finish"): + continue + elif t == "reasoning": + rtext = (pd.get("text") or "").strip() + if rtext: + blocks.append({"type": "text", "text": f"[reasoning] {rtext}"}) + + if not blocks: + continue + + # Simplify single-text-block messages to a bare string (more tolerant + # of 
normalizer edge cases; mempalace accepts either shape). + if len(blocks) == 1 and blocks[0]["type"] == "text": + content = blocks[0]["text"] + else: + content = blocks + + out_lines.append({ + "type": role, + "message": {"content": content}, + }) + + # For assistants, follow up with a synthetic human tool_result message + # per tool call. The mempalace normalizer's `is_tool_only` branch + # folds these back into the assistant turn (see normalize.py:211-214). + if role == "assistant" and tool_results: + out_lines.append({ + "type": "human", + "message": {"content": tool_results}, + }) + + # Must have at least 2 turns for the normalizer to accept the file + if len(out_lines) < 2: + skipped_short += 1 + continue + + slug = sess.get("slug") or "session" + out_path = stage / f"{slug}_{sid}.jsonl" + with open(out_path, "w", encoding="utf-8") as f: + for obj in out_lines: + f.write(json.dumps(obj, ensure_ascii=False) + "\n") + + # Set mtime to session time_updated so dedup sees a stable value. + try: + ts = sess["time_updated"] / 1000 + os.utime(out_path, (ts, ts)) + except Exception: + pass + + exported += 1 + print(f" {out_path.name} ({msg_count} msgs, {len(out_lines)} turns)", + file=sys.stderr) + +print(f"EXPORTED {exported}") +if skipped_short: + print(f"SKIPPED_SHORT {skipped_short}", file=sys.stderr) +PY +) + +# Parse count from stdout +count="${export_count##*EXPORTED }" +count="${count%%[!0-9]*}" +count="${count:-0}" + +if [[ "$count" -eq 0 ]]; then + echo "no sessions qualified for export" + exit 0 +fi + +echo "" +echo "Exported $count session(s) to $STAGE" + +if [[ $DRY_RUN -eq 1 ]]; then + echo "--dry-run: skipping mine step" + exit 0 +fi + +# ── Run the mine ───────────────────────────────────────────────────── +echo "" +echo "Mining into wing '$WING'..." +if ! 
mempalace mine "$STAGE" --mode convos --wing "$WING" --agent "$AGENT"; then + echo "error: mempalace mine failed" >&2 + exit 4 +fi + +# ── Repair index ───────────────────────────────────────────────────── +if [[ $NO_REPAIR -eq 0 ]]; then + echo "" + echo "Rebuilding HNSW index..." + mempalace repair --yes +fi + +echo "" +echo "Done. Wing '$WING' updated. Remember to reconnect any live MCP sessions." diff --git a/install.sh b/install.sh new file mode 100644 index 0000000..84a7c43 --- /dev/null +++ b/install.sh @@ -0,0 +1,184 @@ +#!/usr/bin/env bash +# install.sh — install mempalace-toolkit executables + companion agent skill +# +# Idempotent. Safe to re-run after container recreate. + +set -euo pipefail + +# ── locate self ────────────────────────────────────── +SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" + +# ── targets ────────────────────────────────────────── +BIN_SRC="${SCRIPT_DIR}/bin" +BIN_DEST="${HOME}/.local/bin" + +SKILL_SRC="${SCRIPT_DIR}/SKILL.md" +SKILL_DEST_DIR="${HOME}/.agents/skills/opencode-mempalace-bridge" +SKILL_DEST="${SKILL_DEST_DIR}/SKILL.md" + +# ── args ───────────────────────────────────────────── +ACTION="install" +ASSUME_YES="no" + +while [[ $# -gt 0 ]]; do + case "$1" in + --uninstall) ACTION="uninstall"; shift ;; + -y|--yes) ASSUME_YES="yes"; shift ;; + -h|--help) + cat <<EOF +install.sh — install mempalace-toolkit + +Usage: + ./install.sh # install (interactive confirm) + ./install.sh --yes # install without prompt + ./install.sh --uninstall # remove symlinks + +What install does: + - Symlinks each executable in bin/ into ~/.local/bin/ + - Symlinks SKILL.md into ~/.agents/skills/opencode-mempalace-bridge/SKILL.md + (auto-discovered by opencode; run agents-sync from cli_utils to also + reach Claude Code and Kiro) + +What uninstall does: + - Removes symlinks in ~/.local/bin/ that point into this repo + - Removes the skill symlink if it points into this repo +EOF + exit 0 ;; + *) echo "Unknown flag: $1" >&2; exit 2 ;; 
+ esac +done + +# ── helpers ────────────────────────────────────────── +ok() { printf ' \e[32m✓\e[0m %s\n' "$*"; } +note() { printf '==> %s\n' "$*"; } +warn() { printf ' \e[33m!\e[0m %s\n' "$*" >&2; } +err() { printf ' \e[31m✗\e[0m %s\n' "$*" >&2; } + +confirm() { + [[ "$ASSUME_YES" == "yes" ]] && return 0 + read -r -p "Proceed? [y/N] " ans + [[ "$ans" =~ ^[Yy]$ ]] +} + +link_if_into_repo() { + # Return 0 if $1 is a symlink pointing into $SCRIPT_DIR + local target + [[ -L "$1" ]] || return 1 + target=$(readlink -f "$1") + [[ "$target" == "$SCRIPT_DIR"/* ]] +} + +# ── install ────────────────────────────────────────── +install_bin() { + mkdir -p "$BIN_DEST" + note "Symlinking bin/ executables into $BIN_DEST" + local count=0 + for src in "$BIN_SRC"/*; do + [[ -x "$src" && -f "$src" ]] || continue + local name; name=$(basename "$src") + local dest="$BIN_DEST/$name" + if [[ -e "$dest" || -L "$dest" ]]; then + if link_if_into_repo "$dest"; then + ok "Already linked: $name" + count=$((count+1)) + continue + else + warn "Skipping $name: $dest exists and is not our symlink" + continue + fi + fi + ln -s "$src" "$dest" + ok "Linked $name → $src" + count=$((count+1)) + done + echo + ok "Installed $count executable(s)" +} + +install_skill() { + note "Linking companion agent skill" + mkdir -p "$SKILL_DEST_DIR" + if [[ -e "$SKILL_DEST" || -L "$SKILL_DEST" ]]; then + if link_if_into_repo "$SKILL_DEST"; then + ok "Skill already linked" + return 0 + else + warn "Skipping skill: $SKILL_DEST exists and is not our symlink" + return 0 + fi + fi + ln -s "$SKILL_SRC" "$SKILL_DEST" + ok "Linked SKILL.md → $SKILL_SRC" +} + +check_path() { + case ":$PATH:" in + *":$BIN_DEST:"*) : ;; + *) warn "$BIN_DEST is not on \$PATH. 
Add to your shell rc:"; + printf ' export PATH="%s:$PATH"\n' "\$HOME/.local/bin" ;; + esac +} + +do_install() { + echo + echo "mempalace-toolkit installer" + echo "Repository: $SCRIPT_DIR" + echo + echo "==> Installation plan:" + echo " Symlink executables in bin/ into $BIN_DEST" + echo " Symlink SKILL.md into $SKILL_DEST" + echo + confirm || { echo "Aborted."; exit 0; } + echo + install_bin + echo + install_skill + echo + check_path + echo + ok "Done." + echo + echo "Next: ./bin/mempalace-session --dry-run" + echo " or: ./bin/mempalace-docs /path/to/project --dry-run" +} + +# ── uninstall ──────────────────────────────────────── +do_uninstall() { + echo + echo "mempalace-toolkit uninstaller" + echo "Repository: $SCRIPT_DIR" + echo + confirm || { echo "Aborted."; exit 0; } + echo + + note "Removing executable symlinks from $BIN_DEST" + local removed=0 + for src in "$BIN_SRC"/*; do + [[ -x "$src" && -f "$src" ]] || continue + local name; name=$(basename "$src") + local dest="$BIN_DEST/$name" + if link_if_into_repo "$dest"; then + rm "$dest" + ok "Removed $name" + removed=$((removed+1)) + fi + done + ok "Removed $removed executable symlink(s)" + + echo + note "Removing skill symlink" + if link_if_into_repo "$SKILL_DEST"; then + rm "$SKILL_DEST" + ok "Removed skill symlink" + else + ok "No skill symlink to remove" + fi + + echo + ok "Done." +} + +case "$ACTION" in + install) do_install ;; + uninstall) do_uninstall ;; +esac
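The installer's safety rule (only skip or remove a destination when it is a symlink pointing back into this repository) is what makes re-running and uninstalling safe. As a rough illustration of that check, here is a minimal Python sketch of the idea behind `link_if_into_repo`. It is not a port of the script: `is_link_into_repo` and `demo` are hypothetical names, and unlike the bash helper this sketch resolves the repository path as well, which is an assumption made here so the comparison holds when the repo is reached through a symlinked directory.

```python
import os
import tempfile


def is_link_into_repo(path: str, repo_dir: str) -> bool:
    """True only if `path` is a symlink whose resolved target lies inside `repo_dir`."""
    if not os.path.islink(path):
        return False  # regular files and missing paths are never "ours"
    target = os.path.realpath(path)
    repo = os.path.realpath(repo_dir)  # assumption: resolve both sides
    return target.startswith(repo + os.sep)


def demo() -> list:
    """Build a throwaway repo/bin layout and classify three destinations."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = os.path.join(tmp, "repo")
        dest = os.path.join(tmp, "bin")
        os.makedirs(os.path.join(repo, "bin"))
        os.makedirs(dest)

        tool = os.path.join(repo, "bin", "tool")
        open(tool, "w").close()

        ours = os.path.join(dest, "tool")
        os.symlink(tool, ours)  # symlink into the repo: managed by us

        foreign = os.path.join(dest, "other")
        open(foreign, "w").close()  # plain file: must be left alone

        missing = os.path.join(dest, "missing")  # nothing there at all
        return [
            is_link_into_repo(ours, repo),
            is_link_into_repo(foreign, repo),
            is_link_into_repo(missing, repo),
        ]


print(demo())
```

Only the first destination qualifies, which mirrors why the installer warns and skips when a same-named file it did not create already exists in the destination directory.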