10 Commits

Author SHA1 Message Date
pi b17dc1fa1f docs: add single-writer MemPalace broker design (RFC, queue #4) 2026-06-14 18:06:47 +02:00
pi 3eec9bc23c docs: correct mempalace anyOf workaround watch-target (PR #1735 is dead)
Parity with opencode-devbox: PR #1735 (the diary_write root-anyOf fix) was
closed UNMERGED on 2026-06-11, so the old "remove once PR #1735 ships" TODO
pointed at a dead PR. Issue #1728 is still open; PR #1717 is the current live
candidate; mempalace PyPI latest is still 3.4.0 (== our pin), so the
workaround stays.

- Dockerfile.base: rewrite the upstream-tracking comment + TODO (#1735 dead,
  watch #1717, removal trigger = a PyPI release > 3.4.0 stripping root anyOf).
- CHANGELOG: Unreleased Docs entry.

Docs-only; no behavior change.
2026-06-14 15:52:34 +02:00
pi 4744f05232 ci: CI-resolve mempalace-toolkit to a pinned SHA
mempalace-toolkit is the only companion cloned in Dockerfile.base (all
others live in Dockerfile.variant), so it bypassed the resolve-versions ->
build-arg plumbing and its ref stayed a literal `main`. Because the base
only rebuilds on a content hash of Dockerfile.base + rootfs/* + entrypoints,
a toolkit-only fix would silently fail to land unless Dockerfile.base itself
changed (as it incidentally did in v1.1.1).

Changes:
- resolve-versions: new mempalace_toolkit_ref output (gitea commits API,
  mirrors pi-toolkit resolution; jq '.[0].sha // "main"' fallback).
- base-decide: needs resolve-versions; fold the resolved SHA into the
  base-tag hash so a moved toolkit forces a base rebuild automatically.
- build-base: needs resolve-versions; pass --build-arg MEMPALACE_TOOLKIT_REF.
- Dockerfile.base: switch clone from `git clone --branch` to a SHA-capable
  `git fetch <ref> + checkout FETCH_HEAD` (the --branch <SHA> footgun
  already fixed in Dockerfile.variant, run 374).

base_tag now reflects a live gitea lookup; on API blip it falls back to
`main`, triggering one extra rebuild, never a missed one.

No new tag — lands on the next v* release or workflow_dispatch.
2026-06-14 15:11:22 +02:00
pi 314c3767a8 release: v1.1.1 — pi 0.79.3 + mempalace-mcp hang fix
Publish Docker Image / base-decide (push) Successful in 9s
Publish Docker Image / resolve-versions (push) Successful in 12s
Publish Docker Image / build-base (push) Successful in 33m29s
Publish Docker Image / smoke (push) Successful in 3m20s
Publish Docker Image / smoke-studio (push) Successful in 3m38s
Publish Docker Image / build-variant (push) Successful in 15m27s
Publish Docker Image / promote-base-latest (push) Successful in 8s
Publish Docker Image / update-description (push) Successful in 11s
Publish Docker Image / build-variant-studio (push) Successful in 16m51s
2026-06-13 23:59:25 +02:00
pi 05e88c5c75 fix: mempalace-mcp uninterruptible hang resolved via toolkit ext timeout
The per-request timeout + stall-kill landed in mempalace-toolkit's
mempalace.ts pi extension (commit a3b8829), which the base clones at
build via MEMPALACE_TOOLKIT_REF=main. A base rebuild picks it up.

- CHANGELOG: move from 'Known issues' to 'Fixed'; document the env knobs
  (MEMPALACE_MCP_TIMEOUT_MS / MEMPALACE_MCP_INIT_TIMEOUT_MS) and why the
  standalone stdio-watchdog shim was dropped.
- Dockerfile.base: replace the TODO with a note pointing at the fix.
2026-06-13 23:49:36 +02:00
pi 7f67c36a1c docs: capture mempalace-mcp uninterruptible-hang diagnosis (2026-06-13)
Symptom: pi TUI blocks on a mempalace tool call, ESC does not abort.
Initial WAL-contention hypothesis ruled out (no other writer running).
Likely cause: virtiofs cold open of chroma.sqlite3 stalls the JSON-RPC
initialize handshake; pi has no per-call MCP timeout.

Recovery today: docker exec <ctr> pkill -9 -f mempalace-mcp, restart pi.

Planned fix (deferred until after opencode-devbox pi removal): stdio
watchdog shim with per-REQUEST timeout. A naive process-lifetime
timeout wrapper is wrong because mempalace-mcp is long-lived.

Sharing the palace across harnesses remains the goal.
2026-06-13 16:18:45 +02:00
pi ab5ff8ec56 feat: bundle dot-watch helper for live graphviz .dot -> PNG re-render in Studio
pi-studio renders Mermaid natively but has no DOT renderer. Its markdown
preview displays local PNG/JPG/GIF/WEBP images, so dot-watch closes the
loop for Graphviz: edit .dot -> auto-render <name>.png -> Studio
refresh-from-disk shows the update. Uses mtime polling (no inotify dep).

- rootfs/usr/local/bin/dot-watch: the helper (executable)
- Dockerfile.base: COPY + chmod, following the studio-expose pattern
- README.md: 'Graphviz diagrams in Studio' subsection
- CHANGELOG.md: Unreleased entry

graphviz was already in the base image; no new package.
2026-06-11 16:25:27 +02:00
pi 421558477d docs(studio): add commented studio ports + STUDIO_EXPOSE to basic-shape compose 2026-06-11 13:20:44 +02:00
pi b655faab9f docs(studio): render network-hop figure as mermaid flowchart 2026-06-11 11:25:23 +02:00
pi 3b0335f34e docs(studio): clarify studio-expose foreground + token, add remote/mosh end-to-end recipe 2026-06-11 11:23:15 +02:00
6 changed files with 534 additions and 9 deletions
+20 -1
@@ -47,6 +47,7 @@ env:
jobs:
# ── Phase 1: decide whether base needs rebuilding ──────────────────
base-decide:
needs: [resolve-versions]
runs-on: ubuntu-latest
container:
image: catthehacker/ubuntu:act-latest
@@ -75,6 +76,10 @@ jobs:
! -name '._*' \
-print0 2>/dev/null | sort -z | xargs -0 cat 2>/dev/null
cat entrypoint.sh entrypoint-user.sh
# mempalace-toolkit is cloned in Dockerfile.base at a ref CI
# resolves to a SHA; fold it in so base_tag changes when the
# toolkit moves (otherwise a toolkit-only fix never lands).
echo "${{ needs.resolve-versions.outputs.mempalace_toolkit_ref }}"
} | sha256sum | cut -c1-12
)
BASE_TAG="base-${HASH}"
@@ -117,6 +122,7 @@ jobs:
toolkit_ref: ${{ steps.resolve.outputs.toolkit_ref }}
extensions_ref: ${{ steps.resolve.outputs.extensions_ref }}
studio_ref: ${{ steps.resolve.outputs.studio_ref }}
mempalace_toolkit_ref: ${{ steps.resolve.outputs.mempalace_toolkit_ref }}
steps:
- name: Resolve pi version + companion refs
id: resolve
@@ -151,6 +157,16 @@ jobs:
[ -n "$EXTENSIONS_REF" ] || EXTENSIONS_REF=main
echo "toolkit_ref=${TOOLKIT_REF}" >> "$GITHUB_OUTPUT"
echo "extensions_ref=${EXTENSIONS_REF}" >> "$GITHUB_OUTPUT"
# Resolve mempalace-toolkit main HEAD to a SHA. UNLIKE the others,
# mempalace-toolkit is cloned in Dockerfile.base, so this SHA is
# ALSO folded into the base-decide hash to force a base rebuild
# when the toolkit moves (without it, a toolkit-only fix silently
# fails to land unless Dockerfile.base itself changes).
MEMPALACE_TOOLKIT_REF=$(curl -sf -H "Authorization: token ${GITEA_BUILD_TOKEN:-${GITHUB_TOKEN:-}}" \
"https://gitea.jordbo.se/api/v1/repos/joakimp/mempalace-toolkit/commits?limit=1&sha=main" \
| jq -r '.[0].sha // "main"' 2>/dev/null || echo "main")
[ -n "$MEMPALACE_TOOLKIT_REF" ] || MEMPALACE_TOOLKIT_REF=main
echo "mempalace_toolkit_ref=${MEMPALACE_TOOLKIT_REF}" >> "$GITHUB_OUTPUT"
# Resolve pi-studio (omaclaren/pi-studio) main HEAD to a SHA for
# the :latest-studio variant — same cache-busting rationale.
STUDIO_REF=$(curl -sf -H "Accept: application/vnd.github.sha" \
@@ -161,10 +177,11 @@ jobs:
echo "Resolved PI_FORK_REF=${FORK_REF}, PI_OBSMEM_REF=${OBSMEM_REF}"
echo "Resolved PI_TOOLKIT_REF=${TOOLKIT_REF}, PI_EXTENSIONS_REF=${EXTENSIONS_REF}"
echo "Resolved PI_STUDIO_REF=${STUDIO_REF}"
echo "Resolved MEMPALACE_TOOLKIT_REF=${MEMPALACE_TOOLKIT_REF}"
# ── Phase 2: build & push base (multi-arch), only when needed ──────
build-base:
needs: [base-decide]
needs: [base-decide, resolve-versions]
if: needs.base-decide.outputs.need_build == 'true'
runs-on: ubuntu-latest
container:
@@ -206,6 +223,7 @@ jobs:
shell: bash
env:
BASE_TAG_FULL: ${{ env.IMAGE }}:${{ needs.base-decide.outputs.base_tag }}
MEMPALACE_TOOLKIT_REF: ${{ needs.resolve-versions.outputs.mempalace_toolkit_ref }}
run: |
set -euo pipefail
# 3-attempt retry around `docker buildx build --push` for transient
@@ -219,6 +237,7 @@ jobs:
if docker buildx build \
--platform linux/amd64,linux/arm64 \
--file Dockerfile.base \
--build-arg MEMPALACE_TOOLKIT_REF="${MEMPALACE_TOOLKIT_REF}" \
--push \
--tag "${BASE_TAG_FULL}" \
.; then
+94
@@ -13,6 +13,100 @@ Pre-v1.0.0 tags followed the pi npm version (`v{pi_version}[letter]`).
## Unreleased
### Changed
- **`mempalace-toolkit` is now CI-resolved to a commit SHA**, closing a
silent-staleness footgun. It is the only companion cloned in
`Dockerfile.base` (all others are cloned in `Dockerfile.variant`), so it
was never run through the `resolve-versions` → build-arg plumbing. Its
ref stayed a literal `main`, and because the base only rebuilds when the
hash of `Dockerfile.base + rootfs/* + entrypoints` changes, a
toolkit-only fix would *not* land in the image unless `Dockerfile.base`
itself happened to change (as it did, incidentally, in v1.1.1).
Now `resolve-versions` resolves `mempalace-toolkit` `main` HEAD to a SHA
(new `mempalace_toolkit_ref` output), `base-decide` folds that SHA into
the base-tag hash (so a moved toolkit forces a base rebuild), and
`build-base` passes it as `--build-arg MEMPALACE_TOOLKIT_REF`. The base
clone switched from `git clone --branch` to a SHA-capable
`git fetch <ref> + checkout FETCH_HEAD` (the `--branch <40-char-SHA>`
footgun previously fixed in `Dockerfile.variant`, run 374).
Note: `base-decide` now depends on `resolve-versions`, so the base tag
reflects a live gitea API lookup. On an API blip it falls back to `main`
— which hashes differently than a SHA and triggers one *extra* rebuild,
never a *missed* one (fail-toward-rebuild).
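The fail-toward-rebuild property comes from hashing the resolved ref together with the base inputs. Below is a simplified sketch, not the workflow step. It hashes an explicit file list instead of the workflow's `find | sort -z | xargs -0 cat` pipeline, and `base_tag` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Simplified sketch of base-decide's tag derivation (hypothetical helper).
# base_tag <ref> <file>...: hash the base inputs plus the toolkit ref.
base_tag() {
  local ref=$1 hash
  shift
  # A change in any file OR in the ref changes the 12-hex-char prefix.
  hash=$( { cat "$@"; echo "$ref"; } | sha256sum | cut -c1-12 )
  echo "base-${hash}"
}

# A resolved SHA and the `main` fallback hash to different tags, so the
# fallback cannot collide with the tag built from a real SHA.
```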
### Docs (no image change)
- Correct the MemPalace `diary_write` anyOf workaround watch-target in
`Dockerfile.base`: upstream PR #1735 was **closed unmerged** (2026-06-11),
so the old “remove once #1735 ships” TODO pointed at a dead PR. Issue #1728
is still open; PR #1717 is the current live candidate; mempalace PyPI latest
is still 3.4.0 (== our pin), so the workaround stays. Removal trigger is now
a PyPI release > 3.4.0 that actually strips the root-level anyOf.
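The first half of that trigger is a version comparison. This sketch assumes GNU `sort -V`. That tool orders plain dotted versions correctly, but it does not implement PEP 440, so a pre-release such as `3.5.0rc1` still needs a human look. `version_gt` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# version_gt <a> <b>: succeed when version a sorts strictly after b.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Usage (PyPI JSON API; needs curl + jq):
#   latest=$(curl -sf https://pypi.org/pypi/mempalace/json | jq -r .info.version)
#   version_gt "$latest" 3.4.0 && echo "mempalace $latest > pin: check for root anyOf"
```

A newer version alone is not sufficient. The workaround should only be removed once that release actually drops the root-level `anyOf`.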
---
## v1.1.1 — 2026-06-13
Patch release: pi `0.79.1` → `0.79.3` (auto-resolved at build) plus the
mempalace-mcp hang fix below.
### Fixed
- **`mempalace-mcp` no longer hangs the pi TUI uninterruptibly.** When
the palace is bind-mounted from the macOS host (OrbStack virtiofs) and
the container opened a large `chroma.sqlite3` for the first time, a
cold storage open / HNSW load could stall the server before it emitted
its JSON-RPC response. The awaiting promise then hung forever and the
TUI froze — ESC cancels the LLM stream, not a pending MCP tool call, so
there was no way out short of `docker exec <container> pkill -9 -f
mempalace-mcp` and restarting pi.
The fix lives in the `mempalace.ts` pi extension shipped by
**mempalace-toolkit** (cloned into the base at build time via
`MEMPALACE_TOOLKIT_REF`, default `main`): the JSON-RPC client now arms
a **per-request** timeout. On expiry it rejects the request *and* kills
the stalled child (SIGTERM→SIGKILL), so pi surfaces an error instead of
hanging; the bridge then marks itself unavailable so subsequent calls
fail fast (restart pi to retry). This is deliberately per-REQUEST, not
a process-lifetime `timeout 60 mempalace-mcp` wrapper — the long-lived
server is only killed when a request genuinely stalls.
Tunables (env): `MEMPALACE_MCP_TIMEOUT_MS` (tool-call timeout, default
`60000`), `MEMPALACE_MCP_INIT_TIMEOUT_MS` (initialize/tools-list
handshake, default `120000`); set either to `0` to disable. Requires a
base rebuild to pull the updated extension. The earlier plan of a
standalone Python stdio-watchdog shim was dropped: the extension
already owns request/response correlation, so a separate
framing-reparsing shim is unnecessary.
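The per-request idea can be illustrated outside TypeScript. The sketch below is bash, not the `mempalace.ts` code. It assumes newline-delimited JSON-RPC with exactly one reply line per request. The names `send_request`, `kill_stalled` and `ms_to_s` are hypothetical. The child stays alive across requests, and only a request that misses its deadline kills it:

```bash
#!/usr/bin/env bash
# Illustrative sketch of a per-REQUEST timeout against a long-lived stdio
# child. Not the mempalace.ts implementation (that one is TypeScript).

# ms_to_s <ms>: milliseconds -> "S.mmm" for `read -t` (bash accepts fractions).
ms_to_s() { printf '%d.%03d' $(( $1 / 1000 )) $(( $1 % 1000 )); }

# kill_stalled <pid>: SIGTERM, up to ~0.5 s grace, then SIGKILL.
kill_stalled() {
  local pid=$1 i
  kill -TERM "$pid" 2>/dev/null || return 0   # already gone
  for i in 1 2 3 4 5; do
    kill -0 "$pid" 2>/dev/null || return 0
    sleep 0.1
  done
  kill -KILL "$pid" 2>/dev/null || true
}

# send_request <to-child-fd> <from-child-fd> <child-pid> <json>
# Prints the reply line; on timeout kills the child and returns 1.
send_request() {
  local in_fd=$1 out_fd=$2 pid=$3 req=$4 reply
  local timeout_ms=${MEMPALACE_MCP_TIMEOUT_MS:-60000}
  printf '%s\n' "$req" >&"$in_fd" || return 1
  if [ "$timeout_ms" -eq 0 ]; then
    read -r reply <&"$out_fd" || return 1      # 0 disables the deadline
  elif ! read -r -t "$(ms_to_s "$timeout_ms")" reply <&"$out_fd"; then
    kill_stalled "$pid"                         # stalled: surface an error
    return 1
  fi
  printf '%s\n' "$reply"
}
```

In real use the child would be started as `coproc MP { exec mempalace-mcp; }`. An idle but healthy server is never killed between requests, which is the difference from wrapping it in `timeout 60`.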
Still open (out of scope here): sharing the palace across harnesses
(native pi, container pi, opencode) remains the goal, since isolated
palaces defeat the point. That ideally wants a single host-side
`mempalace-mcp` daemon multiplexing stdio over a UNIX socket, so all
clients share one writer on native APFS rather than each cold-opening
over virtiofs.
### Added
- **`dot-watch` helper** (`/usr/local/bin/dot-watch`) — auto-rerenders a
Graphviz `.dot` file to PNG on every save via mtime polling (no
`inotify` dependency). pi-studio renders Mermaid natively but has no
DOT renderer; since its markdown preview displays local PNG/JPG/GIF/WEBP
images, this closes the loop for Graphviz: edit `.dot` → `dot-watch`
regenerates `<name>.png` → Studio *refresh-from-disk* shows the update.
`graphviz` was already in the base image, so no new package. Baked into
`Dockerfile.base` following the `studio-expose` pattern; documented in
the README Studio section.
## v1.1.0 — 2026-06-10
### Added — `:latest-studio` variant
+41 -7
@@ -48,6 +48,8 @@ ENV DEBIAN_FRONTEND=noninteractive
# preview/export pipelines and broadly useful for any
# agent-driven document workflow. ~200 MB.
# graphviz — `dot` rendering for many diagram tools. ~10 MB.
# See the bundled `dot-watch` helper for live .dot -> PNG
# re-render (handy with pi-studio's image preview).
# imagemagick — image conversion / resizing for thumbnails, etc. ~50 MB.
# yq — YAML-aware companion to jq.
# socat — TCP relay. Powers `studio-expose`, which bridges
@@ -277,6 +279,16 @@ RUN ARCH=$(case "${TARGETARCH}" in amd64) echo "x86_64" ;; arm64) echo "aarch64"
# Provides semantic search over conversation history via 29 MCP tools.
# Always installed in the base. Set INSTALL_MEMPALACE=false at base-build
# time to shave ~300 MB.
#
# Stall protection (fixed 2026-06-13): mempalace-mcp is launched by the
# `mempalace.ts` pi extension from mempalace-toolkit (cloned below). That
# extension now applies a per-REQUEST timeout in its JSON-RPC client and
# kills the child on stall, so a virtiofs cold-open of chroma.sqlite3 /
# HNSW load can no longer hang the pi TUI uninterruptibly. Tunables:
# MEMPALACE_MCP_TIMEOUT_MS (default 60000), MEMPALACE_MCP_INIT_TIMEOUT_MS
# (default 120000); 0 disables. A standalone stdio-watchdog shim is NOT
# needed — the extension already owns request/response correlation. See
# CHANGELOG.md "v1.1.1 > Fixed".
ARG INSTALL_MEMPALACE=true
# Pin to a known-good version. Bump deliberately, not implicitly: an
# unpinned install silently swept in mempalace 3.3.x/3.4.0 with a broken
@@ -304,12 +316,18 @@ RUN if [ "${INSTALL_MEMPALACE}" = "true" ]; then \
# kwarg alias so existing callers still work.
#
# Idempotent and self-deactivating: once upstream releases the fix the
# regex no longer matches and this RUN is a silent no-op.
# Upstream tracking:
# regex no longer matches (and the WARN below fires) — that's the signal
# to delete this RUN.
# Upstream status (last checked 2026-06-14):
# issue #1728 — STILL OPEN (root-level anyOf rejected by Anthropic/Codex)
# PR #1735 — CLOSED UNMERGED 2026-06-11; do NOT watch it (dead)
# PR #1717 — open; the current live fix candidate to watch
# mempalace PyPI latest = 3.4.0 (== our pin) → no release contains the fix yet
# https://github.com/MemPalace/mempalace/issues/1728
# https://github.com/MemPalace/mempalace/pull/1735
# TODO: remove this RUN once a mempalace release containing PR #1735 is on
# PyPI and installed by the line above.
# https://github.com/MemPalace/mempalace/pull/1717
# TODO: remove this RUN once a mempalace release > 3.4.0 that actually strips
# the root-level anyOf ships on PyPI and is installed by the line above.
# Keep MEMPALACE_VERSION in lockstep with opencode-devbox when bumping.
RUN if [ "${INSTALL_MEMPALACE}" = "true" ]; then \
MP_FILE="$(find /opt/uv-tools/mempalace -path '*/mempalace/mcp_server.py' | head -n1)" && \
if [ -z "$MP_FILE" ]; then echo "mempalace mcp_server.py not found" >&2; exit 1; fi && \
@@ -324,9 +342,23 @@ RUN if [ "${INSTALL_MEMPALACE}" = "true" ]; then \
# ── mempalace-toolkit — bash wrappers for session/docs mining ────────
ARG INSTALL_MEMPALACE_TOOLKIT=true
ARG MEMPALACE_TOOLKIT_REF=main
# MEMPALACE_TOOLKIT_REF accepts EITHER a branch name OR a commit SHA. CI
# resolves it to a SHA (resolve-versions job) and folds that SHA into the
# base-decide hash so the base rebuilds when the toolkit moves. `git clone
# --branch <40-char-SHA>` fails ("Remote branch not found") — the same
# footgun fixed in Dockerfile.variant (v1.0.0-rerun, run 374) — so use
# `git fetch <ref> + checkout FETCH_HEAD`, which works for name and SHA.
RUN if [ "${INSTALL_MEMPALACE}" = "true" ] && [ "${INSTALL_MEMPALACE_TOOLKIT}" = "true" ]; then \
git clone --depth 1 --branch "${MEMPALACE_TOOLKIT_REF}" \
https://gitea.jordbo.se/joakimp/mempalace-toolkit.git /opt/mempalace-toolkit && \
rm -rf /opt/mempalace-toolkit && mkdir -p /opt/mempalace-toolkit && \
git -C /opt/mempalace-toolkit init -q && \
git -C /opt/mempalace-toolkit remote add origin https://gitea.jordbo.se/joakimp/mempalace-toolkit.git && \
ok=0; for i in 1 2 3 4 5; do \
if git -C /opt/mempalace-toolkit fetch --depth 1 origin "${MEMPALACE_TOOLKIT_REF}" && \
git -C /opt/mempalace-toolkit checkout -q FETCH_HEAD; then ok=1; break; fi; \
echo "git fetch mempalace-toolkit@${MEMPALACE_TOOLKIT_REF} failed (attempt $i/5), retrying in $((i*5))s..."; \
sleep $((i*5)); \
done; \
[ "$ok" = "1" ] && \
ln -sf /opt/mempalace-toolkit/bin/mempalace-session /usr/local/bin/mempalace-session && \
ln -sf /opt/mempalace-toolkit/bin/mempalace-docs /usr/local/bin/mempalace-docs && \
chmod +x /opt/mempalace-toolkit/bin/mempalace-session /opt/mempalace-toolkit/bin/mempalace-docs && \
@@ -436,10 +468,12 @@ COPY rootfs/home/developer/.inputrc /etc/skel-devbox/.inputrc
# ── Entrypoint ────────────────────────────────────────────────────────
COPY rootfs/usr/local/lib/pi-devbox/ /usr/local/lib/pi-devbox/
COPY rootfs/usr/local/bin/studio-expose /usr/local/bin/studio-expose
COPY rootfs/usr/local/bin/dot-watch /usr/local/bin/dot-watch
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
COPY entrypoint-user.sh /usr/local/bin/entrypoint-user.sh
RUN chmod +x /usr/local/bin/entrypoint.sh /usr/local/bin/entrypoint-user.sh \
/usr/local/bin/studio-expose \
/usr/local/bin/dot-watch \
/usr/local/lib/pi-devbox/*.sh 2>/dev/null || true
# Start as root — entrypoint adjusts UID/GID then drops to developer
+77 -1
@@ -199,9 +199,15 @@ With `STUDIO_EXPOSE=1`, the entrypoint starts the bridge for you; just run
(leave `STUDIO_EXPOSE` unset), run `studio-expose` in a container shell:
```bash
studio-expose # bridges $STUDIO_PORT (default 8765); --help for details
studio-expose & # bridges $STUDIO_PORT (default 8765); --help for details
```
> **`studio-expose` runs in the foreground** (it's a `socat` relay) — it
> blocks the shell until Ctrl-C. Background it with `&` or run it in its
> own tmux pane. It only relays traffic; it does **not** print a token.
> The lines it prints ending in `...token=...` are literal help text, not
> a truncated URL — the real token comes from `/studio` (see below).
> **Security:** the bridge intentionally exposes Studio beyond loopback;
> its tokenized URL is the only auth. Keep the host-side publish on
> `127.0.0.1:` and use `ssh -L` for remote access. Default is **off**.
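Conceptually, the bridge is a single `socat` relay. It runs from the container's non-loopback address to Studio's loopback bind. The sketch below is hypothetical and is not the shipped `studio-expose` script. `studio_bridge_args` is an illustrative name, and `hostname -i` is assumed to return the container's address:

```bash
#!/usr/bin/env bash
# studio_bridge_args <container-ip> [port]: socat arguments for the relay
# (hypothetical; not the shipped studio-expose).
studio_bridge_args() {
  local ip=$1 port=${2:-8765}
  printf '%s\n' \
    "TCP-LISTEN:${port},bind=${ip},reuseaddr,fork" \
    "TCP:127.0.0.1:${port}"
}

# Usage (foreground, like studio-expose; Ctrl-C to stop):
#   mapfile -t args < <(studio_bridge_args "$(hostname -i | awk '{print $1}')")
#   socat "${args[@]}"
```

`fork` gives each browser connection its own relay child. Binding a specific address rather than `0.0.0.0` avoids colliding with Studio's own `127.0.0.1:8765` listener.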
@@ -221,10 +227,75 @@ tunnel alongside mosh (mosh for the shell, ssh for the port), or reach the
host's published port directly over a trusted network (LAN / Tailscale /
WireGuard).
#### End-to-end recipe: remote host, mosh shell, `studio-expose` bridge
The full path has four links: three network hops plus Studio's own loopback bind. The steps below set them up (the `docker -p` publish comes from compose):
```mermaid
flowchart LR
browser["laptop browser"]
host["host :8765"]
eth0["container eth0 :8765"]
loop["container 127.0.0.1 :8765"]
studio["pi-studio"]
browser -->|"ssh -L"| host
host -->|"docker -p"| eth0
eth0 -->|"studio-expose (socat)"| loop
studio -->|"binds"| loop
```
Assuming the compose file publishes `127.0.0.1:8765:8765` (see method B):
1. **In a container shell** — start the bridge (skip if `STUDIO_EXPOSE=1`
is set in compose, which auto-starts it):
```bash
studio-expose &
```
2. **In your pi session** (the pi TUI in the container) — start Studio and
print the tokenized URL. `/studio` is a slash command you type in the
TUI, not a shell command:
```
/studio --no-browser --port 8765
/studio --status # reprint the URL anytime
```
Copy the `http://…:8765/?token=<token>` it prints. **This** is where
the real token comes from — not `studio-expose`.
3. **On your laptop** — open the ssh port-forward alongside mosh:
```bash
ssh -N -L 8765:127.0.0.1:8765 user@docker-host   # -N: forward only; mosh keeps the shell
```
4. **In your laptop browser** — open `http://127.0.0.1:8765/?token=<token>`
(keep the port and token verbatim; only the host part is `127.0.0.1`).
> **Order check:** nothing listens on the container's `127.0.0.1:8765`
> until step 2 runs. If the browser can't connect, verify Studio is up
> (`/studio --status`) and the bridge is running (`ps aux | grep socat`).
> PDF export (`/studio-pdf`, `studio_export_pdf`) needs a LaTeX engine,
> which is **not** in `-studio` (only the planned `-studio-tex`). HTML
> export, KaTeX, Mermaid, and all REPL features work without it.
### Graphviz diagrams in Studio: `dot-watch`
pi-studio renders **Mermaid** natively but has **no Graphviz/DOT renderer**.
Its markdown preview *does* render local image links (`.png`/`.jpg`/`.gif`/
`.webp`), so the workflow for Graphviz is: write a `.dot` file, render it to
PNG with `dot`, and preview the PNG (directly, or embedded in a markdown
file). The bundled **`dot-watch`** helper automates the re-render so edits
show up on Studio's *refresh-from-disk*:
```bash
dot-watch graph.dot # dot engine, 150 dpi -> graph.png
dot-watch graph.dot neato 200 # pick layout engine + dpi
```
It polls the file's mtime (no `inotify` dependency) and regenerates
`<name>.png` on every save, printing timestamped status and indenting any
DOT syntax errors instead of crashing. Then in Studio: open the PNG (or a
`.md` that embeds it) and hit **refresh-from-disk** after each edit.
Note: SVG is **not** in Studio's local-image-link allowlist — use PNG.
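For readers adapting the idea, here is a hypothetical sketch of the same mtime-polling approach. It is not the bundled script. It assumes GNU `stat -c %Y`, which has one-second resolution. As a result, a save that lands in the same clock second as the previous poll can go unnoticed until the file changes again:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of mtime-polling re-render (not the bundled dot-watch).
mtime() { stat -c %Y "$1" 2>/dev/null || echo 0; }

# render <src.dot> <engine> <dpi>: write <name>.png next to the source.
render() { "$2" -Tpng -Gdpi="$3" "$1" -o "${1%.dot}.png"; }

# poll_once <src> <last-mtime> <engine> <dpi>: re-render if the mtime moved;
# always echo the current mtime so the caller can remember it.
poll_once() {
  local src=$1 last=$2 engine=$3 dpi=$4 now err
  now=$(mtime "$src")
  if [ "$now" != "$last" ]; then
    err=$(mktemp)
    if render "$src" "$engine" "$dpi" 2>"$err"; then
      echo "[$(date +%T)] rendered ${src%.dot}.png" >&2
    else
      echo "[$(date +%T)] render failed:" >&2
      sed 's/^/    /' "$err" >&2        # indent DOT errors, keep polling
    fi
    rm -f "$err"
  fi
  echo "$now"
}

# watch_loop <src> [engine] [dpi]: poll once a second until Ctrl-C.
watch_loop() {
  local src=$1 engine=${2:-dot} dpi=${3:-150} last=0
  while :; do
    last=$(poll_once "$src" "$last" "$engine" "$dpi")
    sleep 1
  done
}
```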
## docker-compose.yml — basic shape
```yaml
@@ -236,10 +307,15 @@ services:
container_name: pi-devbox
stdin_open: true
tty: true
# pi-studio (only on `-studio` images): publish loopback + enable the
# socat bridge so the browser UI is reachable. See "Using pi-studio".
# ports:
# - "127.0.0.1:8765:8765" # host-localhost only; use ssh -L for remote
env_file:
- .env
environment:
- TERM=xterm-256color
# - STUDIO_EXPOSE=1 # -studio only: auto-start the socat bridge on boot
- GITEA_ACCESS_TOKEN=${GITEA_ACCESS_TOKEN:-}
- GITEA_HOST=${GITEA_HOST:-}
- GITHUB_PERSONAL_ACCESS_TOKEN=${GITHUB_PERSONAL_ACCESS_TOKEN:-}
+243
@@ -0,0 +1,243 @@
# Design: single-writer MemPalace broker (cross-host serialization)
> **Status:** DRAFT / RFC — not yet implemented. Captures the design so it can be
> picked up later. Authored 2026-06-14.
> **Owner:** unassigned. **Tracking:** queue item #4 ("host-side mempalace-mcp
> daemon over a UNIX/shared socket").
## Problem
The pi-devbox container's `~/.mempalace` (`/home/developer/.mempalace`) is a
**virtiofs bind-mount of the host's `/Users/joakim/.mempalace`** (verified
2026-06-14 via `/proc/mounts`: `mac /home/developer/.mempalace virtiofs rw`).
Container pi and host-native pi therefore **read and write ONE shared palace**:
full memory parity already exists; nothing needs to be built to *enable* sharing.
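The `/proc/mounts` check can be scripted. In this small sketch, `mount_fstype` is a hypothetical helper. It matches the mount point exactly and does not decode the `\040` escapes that `/proc/mounts` uses for spaces:

```bash
#!/usr/bin/env bash
# mount_fstype <mountpoint> [mounts-file]: print the filesystem type of an
# exact mount point; fail if it is not one. The last matching line wins,
# since a later mount shadows an earlier one.
mount_fstype() {
  awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t; else exit 1 }' \
    "${2:-/proc/mounts}"
}

# In the container described above: mount_fstype /home/developer/.mempalace
```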
The actual hazard is the opposite of sharing: **concurrency**. Two pi processes
(one native on the host, one in the container) can open the same
`chroma.sqlite3` / `knowledge_graph.sqlite3` and write at the same time. The
palace directory already shows the scars of this:
- `chroma.sqlite3.broken-20260505`
- many `*.corrupt-20260528`
- a long run of `*.drift-2026*`
- `locks/` with `mine_palace_*.lock` files, including a **stale** one.
These are mempalace's defensive lock + auto-snapshot/repair machinery firing
under concurrent access.
### Why a shared lock file is NOT sufficient
The container runs inside a Linux VM (OrbStack / Docker Desktop on macOS); the
palace bytes live on the macOS host, surfaced into the VM via virtiofs.
Consequences:
- A **UNIX-domain socket file** visible at `~/.mempalace/broker.sock` inside the
container is a *host-kernel* object. The container's kernel can see the inode
but **cannot connect to it** across the VM boundary.
- **flock / advisory lockfiles are not coherent across the host↔VM boundary.**
A lock taken on the host is not reliably seen in the container and vice-versa.
(The stale `mine_palace_*.lock` is direct evidence the existing lock scheme is
not bulletproof across this boundary.)
**Therefore the only trustworthy serialization is to route every write through a
single process.** That single process is the broker. The design question is *not*
"how do we lock" — it's "**where does the one writer live, and how does every pi
(host or container) reach it across the VM boundary?**"
## Goals
1. Exactly one process opens the palace SQLite files at any time (single writer;
concurrent reads are fine).
2. Works in all three topologies on a given host:
- native pi only,
- native pi + container pi,
- container pi only.
3. pi configuration is **identical** in every topology (no per-environment MCP
config divergence).
4. No new corruption pathway introduced; degrade safely when the broker is
genuinely unreachable and there are no peers.
### Non-goals (for this iteration)
- opencode / opencode-devbox co-existence (see "Co-existence with opencode"
below — deferred until the pi case is solved).
- Multi-host palace replication. This is about one host's local palace.
- Changing mempalace's on-disk format or its public MCP tool surface.
## Architecture
```
pi (host) ─stdio─► mp-shim ─┐
├─► mempalace-broker ─► chroma.sqlite3
pi (ctr) ─stdio─► mp-shim ─┘ (SINGLE owner; knowledge_graph.sqlite3
serialized writer, + in-memory HNSW index
concurrent readers)
```
### `mempalace-broker`
A long-lived process that is the **only** opener of the palace SQLite files. It:
- runs the real mempalace engine,
- holds the HNSW index in memory,
- pushes all mutations through a single writer queue (reads may fan out),
- exposes the mempalace MCP JSON-RPC surface over one or more transports,
- is the canonical owner of palace state for the lifetime of the host session.
**Bonus:** a single always-resident owner also eliminates the stale-HNSW-index
problem that `mempalace_reconnect` exists to work around — there is never an
external writer to desync the in-memory index against.
### `mp-shim`
A tiny stdio↔transport adapter. pi's mempalace MCP config points at the shim
**everywhere, unchanged**. pi still believes it is speaking stdio MCP to a local
server; the shim forwards JSON-RPC to the broker over whichever transport is
available, and handles all discovery / startup / election complexity. Keeping
pi's config identical across topologies is a hard requirement (goal #3) and the
shim is what makes it possible.
## Canonical owner = the host
The broker's home is **always the host**, because:
1. The palace bytes physically live there (`/Users/joakim/.mempalace`).
2. The host outlives any container — ownership does not evaporate on
`docker compose down`.
3. Containers already have a route back to it (`host.docker.internal` and the
verified dssh ControlMaster bridge).
The broker binds **two listeners feeding one queue**:
- **AF_UNIX** at `$MEMPALACE_PATH/broker.sock` — for host-native pi (fast,
filesystem-perms-secured).
- a **cross-boundary** transport for container clients (below).
## Transport matrix
| Topology | Broker runs on | Host pi reaches it via | Container pi reaches it via |
|---|---|---|---|
| native only | host | AF_UNIX socket | — |
| native + container | host | AF_UNIX socket | SSH-forwarded socket (preferred) or TCP |
| container only | host (started via bridge) | — | SSH-forwarded socket or TCP |
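The shim's transport choice per topology reduces to picking one address. The sketch below uses `socat` as the byte pump and rests on several assumptions. First, `MEMPALACE_BROKER` holds a socat-style address. Second, the container reaches the broker via the SSH-forwarded socket at `$XDG_RUNTIME_DIR/mp.sock`. Third, `/.dockerenv` is a common but not guaranteed container marker. Health checks, startup and the TCP token are left out. `shim_address` is a hypothetical name:

```bash
#!/usr/bin/env bash
# shim_address <in_container:0|1>: the socat address the shim should dial.
shim_address() {
  local in_container=$1
  local palace=${MEMPALACE_PATH:-$HOME/.mempalace}
  if [ -n "${MEMPALACE_BROKER:-}" ]; then
    printf '%s\n' "$MEMPALACE_BROKER"                 # escape hatch, verbatim
  elif [ "$in_container" = 1 ]; then
    # SSH-forwarded copy of the host socket (preferred cross-boundary path).
    printf 'UNIX-CONNECT:%s/mp.sock\n' "${XDG_RUNTIME_DIR:-/tmp}"
  else
    printf 'UNIX-CONNECT:%s/broker.sock\n' "$palace"  # host-native AF_UNIX
  fi
}

# The shim proper is then a byte pump between pi's stdio and the broker:
#   in_ctr=0; [ -f /.dockerenv ] && in_ctr=1
#   exec socat STDIO "$(shim_address "$in_ctr")"
```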
### Cross-boundary transport options
**(a) SSH-forwarded UNIX socket over the existing dssh ControlMaster — PREFERRED.**
The container's `setup-lan-access.sh` already establishes a ControlMaster to the
host with `ControlPersist 4h`. The container shim forwards the host broker socket
over that master:
```
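# -O forward: ask the running ControlMaster to add the forward (no new login).
# Spell out the HOST path: $HOME here would expand in the container (/home/developer).
ssh -F ~/.ssh-local/config -O forward \
  -L "$XDG_RUNTIME_DIR/mp.sock:/Users/joakim/.mempalace/broker.sock" host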
```
then connects to the local forwarded socket. Auth = SSH key; nothing is
LAN-exposed; no extra shared secret needed; rides the persistent master so setup
cost is near-zero. Most portable across non-OrbStack hosts.
**(b) TCP on `host.docker.internal:PORT` — fallback.** Simpler, but the broker
must bind a routable interface (not just `127.0.0.1`), which requires a
**shared-secret token** to prevent other local/LAN processes from talking to it.
The token is written to `broker.json` in the virtiofs-mounted palace dir
(readable from both sides). More care required to get the bind + auth right.
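For the TCP fallback, the token only needs to be unguessable. This minimal sketch uses a hypothetical helper name and draws 32 bytes from the kernel CSPRNG. How the broker checks the token remains one of the open questions:

```bash
#!/usr/bin/env bash
# new_token: 32 random bytes from /dev/urandom, hex-encoded (64 chars).
new_token() {
  od -An -N32 -tx1 /dev/urandom | tr -d ' \n'
}
```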
## Discovery + on-demand start (the shim's algorithm)
Run by the shim on every pi session start, so it is correct regardless of who is
already running:
```
1. If $MEMPALACE_BROKER is set → use it verbatim (escape hatch).
2. Read $MEMPALACE_PATH/broker.json → endpoint + pid + token.
Try to connect (UNIX if host; forwarded-sock / TCP if container).
If connected & healthy → done.
3. Broker not reachable → START IT:
- On host: flock($MEMPALACE_PATH/broker.lock, non-blocking)
win → exec broker, wait for broker.json, connect.
lose → someone else is starting it; backoff + retry connect.
- In container: run `ssh host 'mempalace-broker --ensure'` (idempotent;
performs the SAME flock election ON THE HOST), then forward +
connect.
4. Last-resort fallback (no broker, cannot start one):
open the palace DIRECTLY — but ONLY after asserting this process is the sole
writer (no other live broker/pid recorded in broker.json). Degrades to
today's behaviour for the genuinely-alone case; never used when a broker
exists.
```
**Key trick:** host-side election uses `flock` on the host, where it is coherent
(same kernel) — bulletproof. The cross-boundary case **never relies on cross-VM
locking**; it relies on `ssh host 'mempalace-broker --ensure'`, which runs the election on
the host where flock works. That is what makes the design topology-independent.
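The host-side election can be sketched with util-linux `flock(1)`. This is an assumption about tooling: macOS does not ship `flock(1)`, so on a Mac host the broker would need a port of it or its own `flock(2)` call. `elect_and_start` is a hypothetical helper. Waiting for `broker.json` and connecting are left to the caller:

```bash
#!/usr/bin/env bash
# elect_and_start <lockfile> <broker-cmd...>
#   0: won the election, broker started in the background ($! is its pid)
#   1: lost; another process holds the lock, so back off and reconnect
elect_and_start() {
  local lock=$1 fd
  shift
  exec {fd}>>"$lock" || return 2
  if ! flock -n "$fd"; then
    exec {fd}>&-              # lost: someone else is starting/running it
    return 1
  fi
  "$@" &                      # won: the broker inherits the locked descriptor,
  exec {fd}>&-                # so drop ours; the lock lives as long as the broker
  return 0
}
```

Closing the shim's own copy matters because the shim is long-lived. If the shim kept the lock, a broker that idle-exits would leave the lock held and block re-election.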
### Lifecycle
- Broker writes `broker.json` (endpoint + pid + token) **atomically** after
binding.
- Broker holds `broker.lock` for its entire lifetime → at most one host broker.
- Idle-exit after N minutes with no connected clients; the next client
re-elects. (Or keep-alive; idle-exit is friendlier on resources.)
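- Clients treat `broker.json` as stale if its recorded pid is dead, and re-run
  the election. (On the host the `flock` on `broker.lock` is released by the
  kernel when its holder exits, so the lock itself never needs reclaiming there.)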
- Clients retry with backoff while a broker is mid-startup.
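The following sketch covers the atomic publish and the stale-pid test. It assumes a flat JSON object with a numeric `pid` and a hex token, since the schema is still an open question. `write_broker_json` and `broker_is_stale` are hypothetical names. The rename is atomic on the host filesystem, but whether virtiofs readers in the container see it atomically is untested here. `kill -0` only probes pids in the caller's own PID namespace, so the stale check is meaningful on the host, not from inside the container:

```bash
#!/usr/bin/env bash
# write_broker_json <dir> <endpoint> <pid> <token>: publish atomically.
write_broker_json() {
  local dir=$1 tmp
  tmp=$(mktemp "$dir/.broker.json.XXXXXX") || return 1   # mode 0600: token stays private
  printf '{"endpoint":"%s","pid":%d,"token":"%s"}\n' "$2" "$3" "$4" >"$tmp" &&
    mv -f "$tmp" "$dir/broker.json"    # rename(2): readers see old or new, never partial
}

# broker_is_stale <dir>: true if broker.json is missing or its pid is dead.
broker_is_stale() {
  local pid
  pid=$(sed -n 's/.*"pid":\([0-9][0-9]*\).*/\1/p' "$1/broker.json" 2>/dev/null)
  [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null
}
```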
## The genuinely hard case
**Container-only with no SSH bridge configured** (e.g. plain Linux Docker,
`HOST_SSH_USER` unset, no `host.docker.internal`). The container cannot start or
reach a host broker. Options, none free:
1. **Require the bridge** for multi-writer container setups, and document it as a
precondition. Reasonable: pi-devbox already ships `setup-lan-access.sh` and
the bridge is the supported path.
2. **Run the broker inside the container**, publishing a Docker port the host can
later reach. Works, but inverts ownership and the broker dies with the
container — only acceptable if containers are the *sole* writers on that host.
3. **Accept degraded mode** (algorithm step 4): a lone container with no peers
has no concurrency, so direct access is safe *as long as* nothing else opens
the palace concurrently. The host shim also checks `broker.json` before
opening directly, so a later host pi will not silently start a second
uncoordinated writer.
**Summary:** fully robust for native-only, native+container, and
container-only-with-bridge. The only residual sharp edge is container-only
*without* a bridge *and* a future concurrent host writer — intrinsic (no shared
coherent lock exists across that boundary), best handled by mandating the bridge
rather than pretending file locks work.
## Co-existence with opencode / opencode-devbox (DEFERRED — context only)
The palace is shared by more than pi. opencode (native) and opencode-devbox
(container) also write to the same `~/.mempalace`. **Assumption to verify:**
opencode sessions write to **different wings** than pi sessions (pi uses
`wing_pi`, diaries per-agent, etc.), so cross-tool intermixing into the *same*
destination may be a non-issue at the application level.

However, the corruption risk here is at the **SQLite-file level, not the wing
level** — two processes writing different wings of the *same* `chroma.sqlite3`
concurrently is still a concurrent write to one file. So the broker, once it
exists, is the right serialization point for opencode too: opencode's mempalace
client would route through the same broker via the same shim mechanism.

**Decision:** do not design for opencode co-existence yet. Resolve the pi case
first; then revisit whether opencode clients adopt the same shim. The residual
risk in the interim is native + container *opencode* sessions writing the same
palace simultaneously — explicitly deferred ("cross that bridge later").
## Open questions / TODO before implementation
- Does the mempalace engine expose an embeddable entrypoint suitable for running
inside a long-lived broker, or does the broker wrap the existing MCP server
binary and multiplex stdio clients onto it? (Affects whether reads can truly
fan out or are also serialized.)
- Idle-exit timeout default + whether to expose it via env.
- `broker.json` schema + atomic-write + stale-pid-reclaim details.
- TCP-path token handling and safe bind interface selection on Linux Docker
(`--add-host=host.docker.internal:host-gateway`).
- Where the broker binary ships: baked into `Dockerfile.base`? host install via
pi-toolkit / mempalace-toolkit? Both, since both sides need the shim and the
host needs the broker.
- Smoke-test plan: prove single-writer invariant under a deliberate concurrent
host+container write storm (should produce zero `.corrupt`/`.drift` snapshots).
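
A harness for that smoke test might take the shape of the sketch below. The writer command is a placeholder (a host write through the shim, or a `docker exec` into a container), and the snapshot-name patterns are assumed from the `.corrupt`/`.drift` wording above rather than taken from mempalace itself.

```bash
#!/usr/bin/env bash
# Sketch of a single-writer smoke-test harness.
# ASSUMPTIONS: the writer command is a placeholder; snapshot files are
# assumed to carry ".corrupt" or ".drift" in their names.
set -euo pipefail

# write_storm <palace_dir> <n> <writer-cmd...>
# Starts n writers at once, each invoked as: writer-cmd <palace_dir> <i>.
# Prints how many writers exited non-zero.
write_storm() {
  local palace="$1" n="$2" i pid failed=0
  shift 2
  local pids=()
  for ((i = 0; i < n; i++)); do
    "$@" "$palace" "$i" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed"
}

# count_snapshots <palace_dir>: corruption/drift snapshots found (want 0).
count_snapshots() {
  find "$1" \( -name '*.corrupt*' -o -name '*.drift*' \) -print | wc -l | tr -d ' '
}
```

A passing run needs both numbers to be zero: no failed writers and no new snapshots.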
@@ -0,0 +1,59 @@
#!/usr/bin/env bash
# dot-watch — auto-rerender a graphviz .dot file to PNG on every save.
#
# WHY THIS EXISTS
# pi-studio renders mermaid natively but has no graphviz/DOT renderer.
# Its markdown preview DOES render local image links (.png/.jpg/.gif/.webp),
# and the editor offers "refresh from disk". This helper closes the loop:
# edit a .dot file -> dot-watch regenerates <name>.png -> hit refresh in
# Studio to see the update. Uses mtime polling (no inotify dependency,
# which isn't in the trixie-slim base).
#
# USAGE
# dot-watch <file.dot> [layout] [dpi]
# layout: dot|neato|fdp|circo|twopi (default: dot)
# dpi: output resolution (default: 150)
# env: DOT_WATCH_INTERVAL=<seconds> poll interval (default: 1)
#
# EXAMPLES
# dot-watch /workspace/graph.dot
# dot-watch graph.dot neato 200
set -euo pipefail
SRC="${1:?usage: dot-watch <file.dot> [layout] [dpi]}"
LAYOUT="${2:-dot}"
DPI="${3:-150}"
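[[ -f "$SRC" ]] || { echo "error: no such file: $SRC" >&2; exit 1; }
# $LAYOUT is executed as a command below, so only accept the documented engines.
case "$LAYOUT" in
  dot|neato|fdp|circo|twopi) ;;
  *) echo "error: unsupported layout '$LAYOUT' (use dot|neato|fdp|circo|twopi)" >&2; exit 1 ;;
esac
[[ "$DPI" =~ ^[1-9][0-9]*$ ]] || { echo "error: dpi must be a positive integer: $DPI" >&2; exit 1; }
command -v "$LAYOUT" >/dev/null || { echo "error: layout engine '$LAYOUT' not found" >&2; exit 1; }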
OUT="${SRC%.dot}.png"
INTERVAL="${DOT_WATCH_INTERVAL:-1}" # seconds between polls
ERRLOG="$(mktemp -t dot-watch.XXXXXX.err)"
trap 'rm -f "$ERRLOG"' EXIT
render() {
  if "$LAYOUT" -Tpng -Gdpi="$DPI" "$SRC" -o "$OUT" 2> "$ERRLOG"; then
    printf '[%s] rendered -> %s\n' "$(date +%H:%M:%S)" "$OUT"
  else
    printf '[%s] DOT error:\n' "$(date +%H:%M:%S)"
    sed 's/^/ /' "$ERRLOG"
  fi
}
# portable mtime (GNU stat, fallback to BSD stat)
mtime() { stat -c %Y "$1" 2>/dev/null || stat -f %m "$1" 2>/dev/null; }
echo "watching $SRC ($LAYOUT, ${DPI}dpi) -> $OUT [Ctrl-C to stop]"
render
last="$(mtime "$SRC")"
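while true; do
  sleep "$INTERVAL"
  [[ -f "$SRC" ]] || continue
  # The file can vanish between the test above and stat (e.g. editors that
  # delete and recreate on save); skip this tick instead of letting set -e
  # kill the watcher.
  now="$(mtime "$SRC")" || continue
  if [[ "$now" != "$last" ]]; then
    last="$now"
    render
  fi
done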