From b17dc1fa1f52c26ab0957f9f7be65a3b9189a9ca Mon Sep 17 00:00:00 2001
From: pi
Date: Sun, 14 Jun 2026 18:06:47 +0200
Subject: [PATCH] docs: add single-writer MemPalace broker design (RFC, queue #4)

---
 docs/mempalace-broker-design.md | 243 ++++++++++++++++++++++++++++++++
 1 file changed, 243 insertions(+)
 create mode 100644 docs/mempalace-broker-design.md

diff --git a/docs/mempalace-broker-design.md b/docs/mempalace-broker-design.md
new file mode 100644
index 0000000..aea6dc9
--- /dev/null
+++ b/docs/mempalace-broker-design.md
@@ -0,0 +1,243 @@
# Design: single-writer MemPalace broker (cross-host serialization)

> **Status:** DRAFT / RFC (not yet implemented). Captures the design so it can be
> picked up later. Authored 2026-06-14.
> **Owner:** unassigned. **Tracking:** queue item #4 ("host-side mempalace-mcp
> daemon over a UNIX/shared socket").

## Problem

The pi-devbox container's `~/.mempalace` (`/home/developer/.mempalace`) is a
**virtiofs bind-mount of the host's `/Users/joakim/.mempalace`** (verified
2026-06-14 via `/proc/mounts`: `mac /home/developer/.mempalace virtiofs rw`).
Container pi and host-native pi therefore **read and write ONE shared palace**.
Full memory parity already exists, and nothing needs to be built to *enable* sharing.

The actual hazard is the opposite of sharing: **concurrency**. Two pi processes
(one native on the host, one in the container) can open the same
`chroma.sqlite3` / `knowledge_graph.sqlite3` and write to them at the same time.
The palace directory already shows the scars:

- `chroma.sqlite3.broken-20260505`
- many `*.corrupt-20260528` files
- a long run of `*.drift-2026*` files
- `locks/` with `mine_palace_*.lock` files, including a **stale** one.

These are the traces of mempalace's defensive locking and auto-snapshot/repair
machinery firing under concurrent access.
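
The shim needs to know whether it is running on the host or in the container, because the discovery algorithm says "UNIX if host; forwarded-sock / TCP if container". One possible heuristic reuses the `/proc/mounts` check quoted above: if the palace directory sits on a `virtiofs` mount, assume the container. This heuristic is an assumption of this sketch, not something mempalace or pi provides. The helper names `palace_fstype` and `running_in_container` are hypothetical. A missing `/proc/mounts`, as on the macOS host, is read as "host".

```python
import os
from typing import Optional


def palace_fstype(palace: str, mounts_text: str) -> Optional[str]:
    """Return the fstype of the mount that contains `palace`, or None.

    `mounts_text` follows the /proc/mounts layout (device, mount point,
    fstype, options, ...; one mount per line). The longest matching mount
    point wins, so a bind mount beats the root filesystem. Paths with
    spaces (escaped as \\040 by the kernel) are not unescaped here.
    """
    palace = os.path.normpath(palace)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        point, fstype = fields[1], fields[2]
        inside = palace == point or palace.startswith(point.rstrip("/") + "/")
        if inside and len(point) > len(best_point):
            best_point, best_type = point, fstype
    return best_type


def running_in_container(palace: str, mounts_path: str = "/proc/mounts") -> bool:
    """Hypothetical heuristic: a virtiofs-backed palace means "container"."""
    try:
        with open(mounts_path, encoding="utf-8") as fh:
            return palace_fstype(palace, fh.read()) == "virtiofs"
    except FileNotFoundError:
        return False  # no /proc/mounts (e.g. the macOS host): assume host
```

The parser is kept separate from the file read, so it can be tested against a captured `/proc/mounts` snippet such as the one quoted in this section.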

### Why a shared lock file is NOT sufficient

The container runs inside a Linux VM (OrbStack / Docker Desktop on macOS). The
palace bytes live on the macOS host and are surfaced into the VM via virtiofs.
Consequences:

- A **UNIX-domain socket file** at `~/.mempalace/broker.sock` is visible inside
  the container, but it is a *host-kernel* object. The container's kernel can
  see the inode but **cannot connect to it** across the VM boundary.
- **flock / advisory lockfiles are not coherent across the host↔VM boundary.**
  A lock taken on the host is not reliably seen in the container, and vice versa.
  (The stale `mine_palace_*.lock` is a symptom consistent with the existing lock
  scheme not being bulletproof across this boundary.)

**Therefore the only trustworthy serialization is to route every write through a
single process.** That single process is the broker. The design question is *not*
"how do we lock" but "**where does the one writer live, and how does every pi
(host or container) reach it across the VM boundary?**"

## Goals

1. Exactly one process opens the palace SQLite files at any time (single writer;
   concurrent reads are fine).
2. Works in all three topologies on a given host:
   - native pi only,
   - native pi + container pi,
   - container pi only.
3. pi configuration is **identical** in every topology (no per-environment MCP
   config divergence).
4. No new corruption pathway is introduced, and the system degrades safely when
   the broker is genuinely unreachable and there are no peers.

### Non-goals (for this iteration)

- opencode / opencode-devbox co-existence. This is deferred until the pi case is
  solved; see "Co-existence with opencode" below.
- Multi-host palace replication. This design covers one host's local palace.
- Changing mempalace's on-disk format or its public MCP tool surface.
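
The lock argument in "Why a shared lock file is NOT sufficient" rests on a distinction. `flock` is unreliable across the host↔VM boundary, but it is coherent within a single kernel, and the host-side election depends on that second property. That property is easy to check locally.

The sketch below assumes a POSIX host (Linux or macOS) and Python's standard `fcntl` module; `try_lock` is a hypothetical helper. It shows that two independent opens of the same lock file conflict, even inside one process, because a `flock` lock belongs to the open file description.

```python
import fcntl
import os
import tempfile
from typing import Optional


def try_lock(path: str) -> Optional[int]:
    """Open `path` and try a non-blocking exclusive flock.

    Returns the fd on success (close it to release the lock) or None if
    another open file description already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


with tempfile.TemporaryDirectory() as tmp:
    lock_path = os.path.join(tmp, "broker.lock")
    first = try_lock(lock_path)   # nobody holds it yet: wins
    second = try_lock(lock_path)  # separate open, same kernel: sees the lock
    os.close(first)               # closing the only fd releases the lock
    third = try_lock(lock_path)   # free again: wins
    os.close(third)
    results = (first is not None, second is None, third is not None)

print(results)  # (True, True, True)
```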

## Architecture

```
pi (host) ─stdio─► mp-shim ─┐
                            ├─► mempalace-broker ─► chroma.sqlite3
pi (ctr)  ─stdio─► mp-shim ─┘   (SINGLE owner;      knowledge_graph.sqlite3
                                 serialized writer,
                                 in-memory HNSW index,
                                 concurrent readers)
```

### `mempalace-broker`

A long-lived process that is the **only** opener of the palace SQLite files. It:

- runs the real mempalace engine,
- holds the HNSW index in memory,
- pushes all mutations through a single writer queue (reads may fan out),
- exposes the mempalace MCP JSON-RPC surface over one or more transports,
- is the canonical owner of palace state for the lifetime of the host session.

**Bonus:** a single always-resident owner also eliminates the stale-HNSW-index
problem that `mempalace_reconnect` exists to work around. There is never an
external writer that could leave the in-memory index out of sync with disk.

### `mp-shim`

A tiny stdio↔transport adapter. pi's mempalace MCP config points at the shim
**everywhere, unchanged**. pi still believes it is speaking stdio MCP to a local
server. The shim forwards JSON-RPC to the broker over whichever transport is
available and handles all of the discovery, startup, and election complexity.
Keeping pi's config identical across topologies is a hard requirement (goal #3),
and the shim is what makes that possible.

## Canonical owner = the host

The broker's home is **always the host**, because:

1. The palace bytes physically live there (`/Users/joakim/.mempalace`).
2. The host outlives any container, so ownership does not evaporate on
   `docker compose down`.
3. Containers already have a route back to it (`host.docker.internal` and the
   verified dssh ControlMaster bridge).

The broker binds **two listeners feeding one queue**:

- **AF_UNIX** at `$MEMPALACE_PATH/broker.sock`, for host-native pi (fast, and
  secured by filesystem permissions).
- a **cross-boundary** transport for container clients (below).
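
The "two listeners feeding one queue" rule can be sketched with a single worker thread that executes every mutation. This is a minimal sketch that assumes a Python broker. How the broker embeds mempalace's engine is still an open question, so `SingleWriter` and the submitted callables are hypothetical.

Any listener thread may call `submit()`, but only the worker thread ever runs a write. Reads would bypass the queue.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Any, Callable


class SingleWriter:
    """Run every palace mutation on one dedicated thread, in FIFO order.

    Listener threads (AF_UNIX, cross-boundary transport, ...) call submit();
    only the worker thread ever executes a write. Reads bypass this queue.
    """

    def __init__(self) -> None:
        self._jobs: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, name="palace-writer", daemon=True)
        self._worker.start()

    def submit(self, write: Callable[[], Any]) -> Future:
        fut: Future = Future()
        self._jobs.put((write, fut))
        return fut

    def close(self) -> None:
        self._jobs.put(None)  # sentinel: drain earlier jobs, then stop
        self._worker.join()

    def _run(self) -> None:
        while True:
            job = self._jobs.get()
            if job is None:
                return
            write, fut = job
            if not fut.set_running_or_notify_cancel():
                continue  # the caller cancelled before the write started
            try:
                fut.set_result(write())
            except Exception as exc:  # hand the failure back to the caller
                fut.set_exception(exc)


# Demo: 8 "listener" threads submit 50 writes each, concurrently.
writer = SingleWriter()
state = {"rows": 0, "active": 0, "overlap": False}


def append_row() -> int:
    state["active"] += 1
    if state["active"] > 1:
        state["overlap"] = True  # would mean two writes ran at once
    state["rows"] += 1
    state["active"] -= 1
    return state["rows"]


futures: list = []
futures_lock = threading.Lock()


def listener() -> None:
    for _ in range(50):
        fut = writer.submit(append_row)
        with futures_lock:
            futures.append(fut)


threads = [threading.Thread(target=listener) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
results = sorted(f.result() for f in futures)
writer.close()
print(state["rows"], state["overlap"])  # 400 False
```

Returning a `Future` lets each listener block on its own write's result, so one slow client never holds another client's write in a transport thread.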

## Transport matrix

| Topology | Broker runs on | Host pi reaches it via | Container pi reaches it via |
|---|---|---|---|
| native only | host | AF_UNIX socket | n/a |
| native + container | host | AF_UNIX socket | SSH-forwarded socket (preferred) or TCP |
| container only | host (started via bridge) | n/a | SSH-forwarded socket or TCP |

### Cross-boundary transport options

**(a) SSH-forwarded UNIX socket over the existing dssh ControlMaster (PREFERRED).**
The container's `setup-lan-access.sh` already establishes a ControlMaster to the
host with `ControlPersist 4h`. The container shim asks that master to forward
the host's broker socket to a local socket path:

```
# -O forward asks the running master for the forwarding without executing a
# command. The remote path is resolved on the host, so it must be the host's
# absolute palace path; "$HOME" here would expand to the container's home.
ssh -F ~/.ssh-local/config -O forward \
  -L "$XDG_RUNTIME_DIR/mp.sock:/Users/joakim/.mempalace/broker.sock" host
```

The shim then connects to the local forwarded socket. This option has several
advantages:

- Authentication is the SSH key.
- Nothing is exposed on the LAN.
- No extra shared secret is needed.
- It rides the persistent master, so setup cost is near zero.
- It is the most portable option across non-OrbStack hosts.

**(b) TCP on `host.docker.internal:PORT` (fallback).** This is simpler, but the
broker must bind a routable interface (not just `127.0.0.1`). That in turn
requires a **shared-secret token** to stop other local or LAN processes from
talking to it. The token is written to `broker.json` in the virtiofs-mounted
palace directory, which is readable from both sides. Getting the bind address
and authentication right takes more care.

## Discovery + on-demand start (the shim's algorithm)

The shim runs this on every pi session start, so the outcome is correct
regardless of what is already running:

```
1. If $MEMPALACE_BROKER is set → use it verbatim (escape hatch).
2. Read $MEMPALACE_PATH/broker.json → endpoint + pid + token.
   Try to connect (UNIX if host; forwarded-sock / TCP if container).
   If connected and healthy → done.
3. Broker not reachable → START IT:
   - On host: flock($MEMPALACE_PATH/broker.lock, non-blocking)
       win  → exec broker, wait for broker.json, connect.
       lose → someone else is starting it; backoff + retry connect.
   - In container: run `ssh host 'mempalace-broker --ensure'` (idempotent;
     performs the SAME flock election ON THE HOST), then forward and
     connect.
4. Last-resort fallback (no broker, and none can be started):
   open the palace DIRECTLY, but ONLY after asserting this process is the sole
   writer (no other live broker/pid recorded in broker.json). This degrades to
   today's behaviour for the genuinely-alone case and is never used when a
   broker exists.
```

**Key trick:** the host-side election uses `flock` on the host, where it is
coherent (same kernel) and therefore bulletproof. The cross-boundary case
**never relies on cross-VM locking**. Instead it relies on
`ssh host 'mempalace-broker --ensure'`, which runs the election on the host,
where flock works. That is what makes the design topology-independent.

### Lifecycle

- The broker writes `broker.json` (endpoint + pid + token) **atomically** after
  binding its listeners.
- The broker holds `broker.lock` for its entire lifetime, so there is at most one
  host broker.
- The broker exits after N idle minutes with no connected clients, and the next
  client re-elects. (Keep-alive is the alternative; idle-exit is friendlier on
  resources.)
- Clients treat `broker.json` as stale if the pid it records is dead. The
  `flock` on `broker.lock` needs no manual reclaim, because the kernel releases
  it when the holding process exits.
- Clients retry with backoff while a broker is mid-startup.

## The genuinely hard case

**Container-only with no SSH bridge configured** (e.g. plain Linux Docker,
`HOST_SSH_USER` unset, no `host.docker.internal`). The container can neither
start nor reach a host broker. Every option has a cost:

1. **Require the bridge** for multi-writer container setups, and document it as a
   precondition. This is reasonable: pi-devbox already ships
   `setup-lan-access.sh`, and the bridge is the supported path.
2. **Run the broker inside the container**, publishing a Docker port the host can
   later reach. This works, but it inverts ownership, and the broker dies with the
   container. It is only acceptable if containers are the *sole* writers on
   that host.
3. **Accept degraded mode** (algorithm step 4). A lone container with no peers
   has no concurrency, so direct access is safe *as long as* nothing else opens
   the palace concurrently. The host shim also checks `broker.json` before
   opening directly. Provided the degraded writer records itself there, a later
   host pi will not silently start a second uncoordinated writer.

**Summary:** the design is fully robust for native-only, native+container, and
container-only-with-bridge. The only residual sharp edge is container-only
*without* a bridge *plus* a later concurrent host writer. That edge is intrinsic,
because no coherent lock is shared across that boundary. It is best handled by
mandating the bridge rather than pretending file locks work.

## Co-existence with opencode / opencode-devbox (DEFERRED; context only)

The palace is shared by more than pi: opencode (native) and opencode-devbox
(container) also write to the same `~/.mempalace`. **Assumption to verify:**
opencode sessions write to **different wings** than pi sessions (pi uses
`wing_pi`, per-agent diaries, etc.). If so, cross-tool intermixing into the
*same* destination may be a non-issue at the application level.

However, the corruption risk is at the **SQLite-file level, not the wing
level**. Two processes writing different wings of the *same* `chroma.sqlite3`
concurrently are still writing one file concurrently. So the broker, once it
exists, is the right serialization point for opencode too: opencode's mempalace
client would route through the same broker via the same shim mechanism.

**Decision:** do not design for opencode co-existence yet. Resolve the pi case
first, then revisit whether opencode clients adopt the same shim. In the interim,
opencode still opens the SQLite files directly. The residual risk is therefore
any opencode session (native or container) writing the palace at the same time
as another writer, whether that is a second opencode session or the pi broker.
This risk is explicitly deferred ("cross that bridge later").
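
The shim's discovery algorithm (steps 1 to 4) and the lifecycle rules condense into a small retry loop. The control flow comes from this document: override, connect, start-or-wait with backoff, and a guarded fallback. Everything else in this sketch is an assumption. Every callable is a hypothetical hook the real shim would supply:

- `read_broker_json`
- `connect`
- `start_broker`
- `other_writer_recorded`

On the host, `start_broker` would run the `flock` election. In the container, it would run `ssh host 'mempalace-broker --ensure'`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BrokerInfo:
    """Hypothetical parsed form of broker.json (endpoint + pid + token)."""
    endpoint: str
    pid: int
    token: str


def discover(
    env_override: Optional[str],
    read_broker_json: Callable[[], Optional[BrokerInfo]],
    connect: Callable[[str], bool],
    start_broker: Callable[[], None],
    other_writer_recorded: Callable[[], bool],
    attempts: int = 5,
    base_delay: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the endpoint this session should use, or "direct"."""
    # Step 1: the escape hatch is used verbatim.
    if env_override:
        return env_override

    for attempt in range(attempts):
        # Step 2: broker.json is only trusted if its endpoint answers.
        info = read_broker_json()
        if info is not None and connect(info.endpoint):
            return info.endpoint
        # Step 3: ask for a broker. This must be idempotent: losing the
        # election just means another client is already starting one.
        start_broker()
        sleep(base_delay * (2 ** attempt))  # exponential backoff, then retry

    # Step 4: last resort, refused if any other writer is on record.
    if other_writer_recorded():
        raise RuntimeError("a writer is recorded in broker.json but unreachable")
    return "direct"
```

Injecting `sleep` keeps the loop testable. The sketch calls `start_broker()` on every failed attempt rather than once, which relies on it being idempotent. The document already requires that of `mempalace-broker --ensure`. It also lets a client re-elect if the previous winner died mid-startup.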

## Open questions / TODO before implementation

- Does the mempalace engine expose an embeddable entrypoint suitable for running
  inside a long-lived broker? Or does the broker wrap the existing MCP server
  binary and multiplex stdio clients onto it? (This affects whether reads can
  truly fan out or are also serialized.)
- The default idle-exit timeout, and whether to expose it via an environment
  variable.
- `broker.json` schema, atomic-write, and stale-pid-reclaim details.
- TCP-path token handling and safe bind-interface selection on Linux Docker
  (`--add-host=host.docker.internal:host-gateway`).
- Where the broker binary ships: baked into `Dockerfile.base`, or installed on
  the host via pi-toolkit / mempalace-toolkit? Both, since both sides need the
  shim and the host needs the broker.
- Smoke-test plan: prove the single-writer invariant under a deliberate concurrent
  host+container write storm (it should produce zero `.corrupt`/`.drift` snapshots).
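
For the `broker.json` item, one possible starting point is sketched below. It is an assumption, not a settled schema. The fields `endpoint`, `pid`, and `token` come from the discovery algorithm. The helpers `write_broker_json`, `read_live_broker`, and `token_ok` are hypothetical.

The write uses the common temp-file + `fsync` + `os.replace` pattern in the same directory. Readers should therefore see either the old file or the new one, but this still needs to be confirmed for rename semantics over virtiofs. The pid probe is only meaningful on the host, because a container cannot see host pids.

```python
import hmac
import json
import os
import secrets
import tempfile
from typing import Optional


def write_broker_json(palace: str, endpoint: str) -> dict:
    """Atomically publish endpoint + pid + a fresh token as broker.json."""
    info = {"endpoint": endpoint, "pid": os.getpid(), "token": secrets.token_hex(32)}
    # Temp file in the same directory, so os.replace() is a same-filesystem
    # rename. mkstemp creates it with mode 0600; widen that if the container
    # user maps to a different uid and must read the token.
    fd, tmp = tempfile.mkstemp(dir=palace, prefix=".broker.json.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(info, fh)
            fh.flush()
            os.fsync(fh.fileno())  # contents on disk before the rename
        os.replace(tmp, os.path.join(palace, "broker.json"))
    except BaseException:
        os.unlink(tmp)
        raise
    return info


def pid_alive(pid: int) -> bool:
    """Host-only liveness probe: signal 0 checks existence, delivers nothing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def read_live_broker(palace: str) -> Optional[dict]:
    """Return broker.json if present, well-formed, and its pid is alive."""
    try:
        with open(os.path.join(palace, "broker.json"), encoding="utf-8") as fh:
            info = json.load(fh)
        pid = int(info["pid"])
    except (FileNotFoundError, json.JSONDecodeError, KeyError, TypeError, ValueError):
        return None
    return info if pid_alive(pid) else None


def token_ok(expected: str, presented: str) -> bool:
    """Constant-time token check for the TCP transport."""
    return hmac.compare_digest(expected.encode(), presented.encode())
```

A stale or unreadable file is treated the same as a missing one, which sends the shim back into the election path rather than trusting a dead endpoint.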