Design: single-writer MemPalace broker (cross-host serialization)

Status: DRAFT / RFC — not yet implemented. Captures the design so it can be picked up later. Authored 2026-06-14. Owner: unassigned. Tracking: queue item #4 ("host-side mempalace-mcp daemon over a UNIX/shared socket").

Problem

The pi-devbox container's ~/.mempalace (/home/developer/.mempalace) is a virtiofs bind-mount of the host's /Users/joakim/.mempalace (verified 2026-06-14 via /proc/mounts: mac /home/developer/.mempalace virtiofs rw). Container pi and host-native pi therefore read and write ONE shared palace — full memory parity already exists; nothing needs to be built to enable sharing.

The actual hazard is a side effect of that sharing: concurrent writes. Two pi processes (one native on the host, one in the container) can open the same chroma.sqlite3 / knowledge_graph.sqlite3 and write at the same time. The palace directory already shows the scars of this:

  • chroma.sqlite3.broken-20260505
  • many *.corrupt-20260528
  • a long run of *.drift-2026*
  • locks/ with mine_palace_*.lock files, including a stale one.

These are mempalace's defensive lock + auto-snapshot/repair machinery firing under concurrent access.

Why a shared lock file is NOT sufficient

The container runs inside a Linux VM (OrbStack / Docker Desktop on macOS); the palace bytes live on the macOS host, surfaced into the VM via virtiofs. Consequences:

  • A UNIX-domain socket file visible at ~/.mempalace/broker.sock inside the container is a host-kernel object. The container's kernel can see the inode but cannot connect to it across the VM boundary.
  • flock / advisory lockfiles are not coherent across the host↔VM boundary. A lock taken on the host is not reliably seen in the container and vice-versa. (The stale mine_palace_*.lock is direct evidence the existing lock scheme is not bulletproof across this boundary.)

Therefore the only trustworthy serialization is to route every write through a single process. That single process is the broker. The design question is not "how do we lock" — it's "where does the one writer live, and how does every pi (host or container) reach it across the VM boundary?"

Goals

  1. Exactly one process has the palace SQLite files open at any time. That process serializes writes; concurrent reads inside it are fine.
  2. Works in all three topologies on a given host:
    • native pi only,
    • native pi + container pi,
    • container pi only.
  3. pi configuration is identical in every topology (no per-environment MCP config divergence).
  4. No new corruption pathway introduced; degrade safely when the broker is genuinely unreachable and there are no peers.

Non-goals (for this iteration)

  • opencode / opencode-devbox co-existence (see "Co-existence with opencode" below — deferred until the pi case is solved).
  • Multi-host palace replication. This is about one host's local palace.
  • Changing mempalace's on-disk format or its public MCP tool surface.

Architecture

pi (host)  ─stdio─►  mp-shim ─┐
                              ├─►  mempalace-broker  ─►  chroma.sqlite3
pi (ctr)   ─stdio─►  mp-shim ─┘     (SINGLE owner;        knowledge_graph.sqlite3
                                    serialized writer,    + in-memory HNSW index
                                    concurrent readers)

mempalace-broker

A long-lived process that is the only opener of the palace SQLite files. It:

  • runs the real mempalace engine,
  • holds the HNSW index in memory,
  • pushes all mutations through a single writer queue (reads may fan out),
  • exposes the mempalace MCP JSON-RPC surface over one or more transports,
  • is the canonical owner of palace state for the lifetime of the host session.

Bonus: a single always-resident owner also eliminates the stale-HNSW-index problem that mempalace_reconnect exists to work around — there is never an external writer to desync the in-memory index against.
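The single writer queue can be sketched in a few lines. This is only an illustration under stated assumptions: the class name SingleWriter and its submit/close methods are hypothetical, and it uses plain sqlite3 rather than the real mempalace engine, whose embeddable API is still an open question.

```python
import queue
import sqlite3
import threading
from concurrent.futures import Future


class SingleWriter:
    """Hypothetical sketch: every mutation runs on one thread that owns the connection."""

    def __init__(self, db_path: str) -> None:
        self._db_path = db_path
        self._jobs: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # The connection is created and used only on this thread.
        conn = sqlite3.connect(self._db_path)
        try:
            while True:
                job = self._jobs.get()
                if job is None:  # shutdown sentinel
                    break
                sql, params, done = job
                try:
                    with conn:  # commit on success, roll back on error
                        conn.execute(sql, params)
                    done.set_result(None)
                except Exception as exc:
                    done.set_exception(exc)
        finally:
            conn.close()

    def submit(self, sql: str, params: tuple = ()) -> Future:
        """Enqueue one mutation; the Future resolves once it has been committed."""
        done: Future = Future()
        self._jobs.put((sql, params, done))
        return done

    def close(self) -> None:
        self._jobs.put(None)
        self._thread.join()
```

Any number of client handlers can call submit() concurrently; ordering is decided by the queue, not by file locks.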

mp-shim

A tiny stdio↔transport adapter. pi's mempalace MCP config points at the shim everywhere, unchanged. pi still believes it is speaking stdio MCP to a local server; the shim forwards JSON-RPC to the broker over whichever transport is available, and handles all discovery / startup / election complexity. Keeping pi's config identical across topologies is a hard requirement (goal #3) and the shim is what makes it possible.
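A minimal sketch of the forwarding half of the shim, assuming JSON-RPC messages are framed one per line and that discovery has already produced a connected socket. The function names pump and run_shim are hypothetical.

```python
import socket
import threading
from typing import BinaryIO


def pump(src: BinaryIO, dst: BinaryIO) -> None:
    """Copy newline-delimited messages from src to dst until src reaches EOF."""
    for line in iter(src.readline, b""):
        dst.write(line)
        dst.flush()


def run_shim(stdin: BinaryIO, stdout: BinaryIO, sock: socket.socket) -> None:
    """Relay stdin -> broker and broker -> stdout over an already-connected socket."""
    to_broker = sock.makefile("wb")
    from_broker = sock.makefile("rb")
    replies = threading.Thread(target=pump, args=(from_broker, stdout), daemon=True)
    replies.start()
    pump(stdin, to_broker)          # returns when pi closes stdin
    sock.shutdown(socket.SHUT_WR)   # signal the broker that no more requests follow
    replies.join()                  # drain replies until the broker closes its side
```

In a real shim the arguments would be sys.stdin.buffer, sys.stdout.buffer, and a socket connected to the AF_UNIX or forwarded endpoint; the shim itself never parses the payload.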

Canonical owner = the host

The broker's home is always the host, because:

  1. The palace bytes physically live there (/Users/joakim/.mempalace).
  2. The host outlives any container — ownership does not evaporate on docker compose down.
  3. Containers already have a route back to it (host.docker.internal and the verified dssh ControlMaster bridge).

The broker binds two listeners feeding one queue:

  • AF_UNIX at $MEMPALACE_PATH/broker.sock — for host-native pi (fast, filesystem-perms-secured).
  • a cross-boundary transport for container clients (below).

Transport matrix

| Topology | Broker runs on | Host pi reaches it via | Container pi reaches it via |
| --- | --- | --- | --- |
| native only | host | AF_UNIX socket | n/a |
| native + container | host | AF_UNIX socket | SSH-forwarded socket (preferred) or TCP |
| container only | host (started via bridge) | n/a | SSH-forwarded socket or TCP |

Cross-boundary transport options

(a) SSH-forwarded UNIX socket over the existing dssh ControlMaster — PREFERRED. The container's setup-lan-access.sh already establishes a ControlMaster to the host with ControlPersist 4h. The container shim forwards the host broker socket over that master:

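# The remote half of -L is resolved on the host, but "$HOME" would expand in the
# container (/home/developer), so the host path is spelled out. -N: forward only.
ssh -F ~/.ssh-local/config -N \
    -L "$XDG_RUNTIME_DIR/mp.sock:/Users/joakim/.mempalace/broker.sock" host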

then connects to the local forwarded socket. Auth = SSH key; nothing is LAN-exposed; no extra shared secret needed; rides the persistent master so setup cost is near-zero. Most portable across non-OrbStack hosts.

(b) TCP on host.docker.internal:PORT — fallback. Simpler, but the broker must bind a routable interface (not just 127.0.0.1), which requires a shared-secret token to prevent other local/LAN processes from talking to it. The token is written to broker.json in the virtiofs-mounted palace dir (readable from both sides). More care required to get the bind + auth right.
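If the TCP path is used, the token handling could look roughly like this. It is a sketch only: the helper names are hypothetical, and how the token travels on the wire (for example as a first handshake message) is not decided in this design.

```python
import hmac
import secrets


def new_token() -> str:
    """Generate a fresh random token for the broker to publish in broker.json."""
    return secrets.token_urlsafe(32)


def token_ok(expected: str, presented: str) -> bool:
    """Compare tokens in constant time to avoid leaking a prefix match via timing."""
    return hmac.compare_digest(expected.encode(), presented.encode())
```

The broker would call new_token() once per start and reject any TCP client for which token_ok() is false before it accepts a single JSON-RPC message.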

Discovery + on-demand start (the shim's algorithm)

Run by the shim on every pi session start, so it is correct regardless of who is already running:

1. If $MEMPALACE_BROKER is set        → use it verbatim (escape hatch).
2. Read $MEMPALACE_PATH/broker.json   → endpoint + pid + token.
   Try to connect (UNIX if host; forwarded-sock / TCP if container).
   If connected & healthy             → done.
3. Broker not reachable → START IT:
   - On host:      flock($MEMPALACE_PATH/broker.lock, non-blocking)
                     win  → exec broker, wait for broker.json, connect.
                     lose → someone else is starting it; backoff + retry connect.
   - In container: run `ssh host 'mempalace-broker --ensure'` (idempotent;
                   performs the SAME flock election ON THE HOST), then forward +
                   connect.
4. Last-resort fallback (no broker, cannot start one):
   open the palace DIRECTLY — but ONLY after asserting this process is the sole
   writer (no other live broker/pid recorded in broker.json). Degrades to
   today's behaviour for the genuinely-alone case; never used when a broker
   exists.
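The four steps above can be sketched as one function with the side effects injected, which keeps the ordering easy to test. Everything named here (Resolution, resolve_broker, the callables, the endpoint string format) is hypothetical; only the step order and $MEMPALACE_BROKER come from the algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class Resolution:
    mode: str                 # "env", "existing", "started" or "direct"
    endpoint: Optional[str]   # None only in direct mode


def resolve_broker(
    env: Mapping[str, str],
    read_endpoint: Callable[[], Optional[str]],  # parses broker.json; None if absent
    is_healthy: Callable[[str], bool],           # connect + health probe
    ensure_started: Callable[[], bool],          # host flock election or ssh --ensure
    sole_writer: Callable[[], bool],             # no live broker/pid recorded
) -> Resolution:
    # Step 1: an explicit override wins, used verbatim.
    override = env.get("MEMPALACE_BROKER")
    if override:
        return Resolution("env", override)
    # Step 2: reuse an advertised, healthy broker.
    endpoint = read_endpoint()
    if endpoint and is_healthy(endpoint):
        return Resolution("existing", endpoint)
    # Step 3: start (or wait for) the broker, then re-read its endpoint.
    if ensure_started():
        endpoint = read_endpoint()
        if endpoint and is_healthy(endpoint):
            return Resolution("started", endpoint)
    # Step 4: direct access only when this process is provably alone.
    if sole_writer():
        return Resolution("direct", None)
    raise RuntimeError("broker unreachable and another writer may be live")
```

Refusing to continue in the last branch is one possible policy; the design only requires that direct access is never used while a broker exists.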

Key trick: host-side election uses flock on the host, where every contender shares one kernel and the lock is coherent. The cross-boundary case never relies on cross-VM locking; it relies on ssh host 'mempalace-broker --ensure', which runs the same election on the host, where flock works. That is what makes the design topology-independent.
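A minimal sketch of the host-side election, assuming a Python implementation and a hypothetical helper name try_win_election. It uses a non-blocking exclusive flock, so exactly one contender wins and the rest fall through to backoff and retry.

```python
import fcntl
import os
from typing import Optional


def try_win_election(lock_path: str) -> Optional[int]:
    """Return an open fd holding the exclusive lock, or None if someone else holds it."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another open file description already holds the lock: we lost.
        os.close(fd)
        return None
    return fd  # keep it open; closing the fd releases the lock
```

One detail to watch if the winner then execs the broker: Python marks descriptors non-inheritable by default, so the lock would be dropped across exec unless os.set_inheritable(fd, True) is called first, or the broker process takes the lock itself.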

Lifecycle

  • Broker writes broker.json (endpoint + pid + token) atomically after binding.
  • Broker holds broker.lock for its entire lifetime → at most one host broker.
  • Idle-exit after N minutes with no connected clients; the next client re-elects. (Or keep-alive; idle-exit is friendlier on resources.)
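  • Clients treat broker.json as stale if the pid it records is dead, and re-run the election. (The kernel releases a flock when its holder exits, so the lock itself cannot outlive the broker; only broker.json can go stale.)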
  • Clients retry with backoff while a broker is mid-startup.
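The atomic publish and the stale-pid probe could look like the sketch below. The field names mirror the "endpoint + pid + token" description; the function names and the temp-file prefix are hypothetical, and the exact schema remains an open question.

```python
import json
import os
import tempfile


def write_broker_json(palace_dir: str, endpoint: str, token: str) -> str:
    """Publish broker.json atomically: write a temp file, then rename over the target."""
    target = os.path.join(palace_dir, "broker.json")
    # mkstemp creates the file with mode 0600, which also keeps the token private.
    fd, tmp = tempfile.mkstemp(dir=palace_dir, prefix=".broker.json.")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"endpoint": endpoint, "pid": os.getpid(), "token": token}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, target)  # atomic when both paths are on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return target


def pid_alive(pid: int) -> bool:
    """Best-effort liveness probe; only meaningful where the broker's pid is visible."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

A caveat for the container side: a host pid cannot be probed from inside the VM, so a container shim would have to delegate the liveness check to the host (for example through the same ssh bridge) rather than call pid_alive() locally.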

Engine vs. shim — what the image must still ship

The component bundled in the images today is really two separable pieces:

  • the mempalace engine — opens the SQLite files, computes embeddings, owns the HNSW index (the heavy part: chromadb, embedding model, etc.), and
  • the thin client surface pi actually talks to.

In the brokered design these split cleanly:

  • the broker is the only thing that runs the engine;
  • the shim is engine-free — it just forwards MCP JSON-RPC. It needs no chromadb, no embedding model, no heavy deps. Embeddings/search happen broker-side. (Potential image-slimming opportunity, though see below for why we keep the engine bundled anyway.)

Whether the bundled engine is "used as-is" or merely fronted by the broker depends on who owns the broker:

A) Host runs the broker (native, or native+container — the common case). The host's engine is authoritative and used as-is. The broker is purely an intermediate step so writes can't collide; the host engine does the read/write. The container's bundled engine is dormant — the container uses only its shim to reach the host broker. The engine in the image is not needed for this path.

B) Container lands on a host with no mempalace (fresh-host case). The bundled engine earns its keep — you cannot conjure an engine onto the host without installing one. Either the container runs the broker itself (in-container ownership, bundled engine used as-is) or it falls back to degraded direct mode (single writer, bundled engine used directly).

Decision: keep shipping the engine in the images — but for three specific reasons, not because the brokered path needs it:

  1. Self-containedness — pi-devbox's promise is "works on any host." A container with no memory unless the host pre-installed mempalace breaks that, especially for the Docker Hub audience.
  2. Fresh-host bootstrap (case B) — no host engine to borrow.
  3. Degraded fallback — the no-broker-reachable path opens the DB locally and needs the engine present.

In the host-managed common case the bundled engine is just dormant insurance; the shim is the only piece the container actively uses.

Version-coherence note

Because only the broker's engine ever writes, its version defines the on-disk format. Host-vs-bundled engine version skew is therefore harmless in the brokered path (only one engine ever touches the bytes). Skew only bites in degraded direct mode, where the container writes with a possibly-different engine version than the host would. This argues for the broker pinning/owning the authoritative engine version and treating the bundled engine as fallback-only.

This partially resolves the "where the broker binary ships" open question below: the shim must ship on both sides; the engine must ship on the host (to run the broker) and stays bundled in the image as fallback/bootstrap insurance, not as the authoritative writer in the common case.

The genuinely hard case

Container-only with no SSH bridge configured (e.g. plain Linux Docker, HOST_SSH_USER unset, no host.docker.internal). The container cannot start or reach a host broker. Options, none free:

  1. Require the bridge for multi-writer container setups, and document it as a precondition. Reasonable: pi-devbox already ships setup-lan-access.sh and the bridge is the supported path.
  2. Run the broker inside the container, publishing a Docker port the host can later reach. Works, but inverts ownership and the broker dies with the container — only acceptable if containers are the sole writers on that host.
  3. Accept degraded mode (algorithm step 4): a lone container with no peers has no concurrency, so direct access is safe as long as nothing else opens the palace concurrently. The host shim also checks broker.json before opening directly, so a later host pi will not silently start a second uncoordinated writer.

Summary: fully robust for native-only, native+container, and container-only-with-bridge. The only residual sharp edge is container-only without a bridge and a future concurrent host writer — intrinsic (no shared coherent lock exists across that boundary), best handled by mandating the bridge rather than pretending file locks work.

Co-existence with opencode / opencode-devbox (DEFERRED — context only)

The palace is shared by more than pi. opencode (native) and opencode-devbox (container) also write to the same ~/.mempalace. Assumption to verify: opencode sessions write to different wings than pi sessions (pi uses wing_pi, diaries per-agent, etc.), so cross-tool intermixing into the same destination may be a non-issue at the application level.

However, the corruption risk here is at the SQLite-file level, not the wing level — two processes writing different wings of the same chroma.sqlite3 concurrently is still a concurrent write to one file. So the broker, once it exists, is the right serialization point for opencode too: opencode's mempalace client would route through the same broker via the same shim mechanism.

Decision: do not design for opencode co-existence yet. Resolve the pi case first; then revisit whether opencode clients adopt the same shim. The residual risk in the interim is native + container opencode sessions writing the same palace simultaneously — explicitly deferred ("cross that bridge later").

Open questions / TODO before implementation

  • Does the mempalace engine expose an embeddable entrypoint suitable for running inside a long-lived broker, or does the broker wrap the existing MCP server binary and multiplex stdio clients onto it? (Affects whether reads can truly fan out or are also serialized.)
  • Idle-exit timeout default + whether to expose it via env.
  • broker.json schema + atomic-write + stale-pid-reclaim details.
  • TCP-path token handling and safe bind interface selection on Linux Docker (--add-host=host.docker.internal:host-gateway).
  • Where the broker binary ships: baked into Dockerfile.base? host install via pi-toolkit / mempalace-toolkit? Both, since both sides need the shim and the host needs the broker.
  • Smoke-test plan: prove single-writer invariant under a deliberate concurrent host+container write storm (should produce zero .corrupt/.drift snapshots).
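For the smoke test, the pass criterion can be checked mechanically by comparing snapshot artifacts before and after the write storm. The sketch below assumes the artifact names follow the patterns seen in the Problem section (*.broken-*, *.corrupt-*, *.drift-*); the helper name and pattern list are hypothetical.

```python
from pathlib import Path

# Assumed from the file names listed in the Problem section; adjust if mempalace
# uses other suffixes for its repair snapshots.
SNAPSHOT_PATTERNS = ("*.corrupt-*", "*.drift-*", "*.broken-*")


def snapshot_artifacts(palace_dir: str) -> set[str]:
    """Return the names of repair/snapshot artifacts currently in the palace dir."""
    root = Path(palace_dir)
    return {p.name for pattern in SNAPSHOT_PATTERNS for p in root.glob(pattern)}
```

A test harness would record snapshot_artifacts() before the storm, run concurrent host and container writers through the broker, and assert that the set is unchanged afterwards.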