diff --git a/CHANGELOG.md b/CHANGELOG.md
index 676333d..7746990 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -13,6 +13,37 @@ Pre-v1.0.0 tags followed the pi npm version (`v{pi_version}[letter]`).
 
 ## Unreleased
 
+### Known issues
+
+- **`mempalace-mcp` can hang the pi TUI uninterruptibly** when the
+  palace is bind-mounted from the macOS host (OrbStack virtiofs) and the
+  container opens a large `chroma.sqlite3` for the first time. Symptoms:
+  pi sits silently after a tool call, ESC does not abort, and there is no
+  progress output. The root cause is **not** WAL contention with another
+  writer (suspected at first, ruled out on 2026-06-13 with no other
+  mempalace process running). Most likely causes, in order:
+  1. SQLite cold-open `fcntl`/`flock` semantics over OrbStack virtiofs
+     stalling the chromadb open path before mempalace-mcp emits its
+     `initialize` JSON-RPC response, so pi blocks on the handshake.
+  2. Cold HNSW index load/rebuild for a large wing (~23k drawers) doing
+     random-access I/O over virtiofs.
+  3. Stale WAL recovery from a previously OOM-killed mempalace-mcp.
+
+  ESC not interrupting is a pi-side limitation: pi cancels the LLM stream
+  but keeps awaiting the MCP child's stdio, and pi's config has no per-call
+  MCP timeout. Workaround when stuck: run
+  `docker exec <container> pkill -9 -f mempalace-mcp`, then restart pi.
+
+  Planned fix: a thin Python stdio-watchdog shim in front of
+  `mempalace-mcp` that applies a per-request timeout and kills the child
+  only when a request stalls, instead of bounding the lifetime of the
+  long-lived server (a naive `timeout 60 mempalace-mcp` wrapper is wrong:
+  it kills a healthy server mid-session). Sharing the palace across
+  harnesses (native pi, container pi, opencode) remains the goal;
+  isolated palaces defeat the point. Longer term: run a single
+  mempalace-mcp daemon on the host and multiplex stdio over a UNIX
+  socket so all clients share one writer on native APFS.
+
 ### Added
 
 - **`dot-watch` helper** (`/usr/local/bin/dot-watch`) — auto-rerenders a
diff --git a/Dockerfile.base b/Dockerfile.base
index 191d03c..9f3ae0d 100644
--- a/Dockerfile.base
+++ b/Dockerfile.base
@@ -279,6 +279,15 @@ RUN ARCH=$(case "${TARGETARCH}" in amd64) echo "x86_64" ;; arm64) echo "aarch64"
 # Provides semantic search over conversation history via 29 MCP tools.
 # Always installed in the base. Set INSTALL_MEMPALACE=false at base-build
 # time to shave ~300 MB.
+#
+# TODO(2026-06-13): wrap mempalace-mcp with a stdio-watchdog shim that
+# applies a per-REQUEST timeout (not a per-process timeout; a naive
+# `timeout 60 mempalace-mcp` would kill the long-lived server mid-session).
+# When the palace is bind-mounted from macOS via OrbStack virtiofs, a cold
+# chroma.sqlite3 open or HNSW load can stall the JSON-RPC `initialize`
+# response and pi's TUI sits uninterruptibly (ESC cancels the LLM stream,
+# not the MCP child stdio). See CHANGELOG.md "Unreleased > Known issues".
+# Recovery today: `docker exec <container> pkill -9 -f mempalace-mcp`.
 ARG INSTALL_MEMPALACE=true
 # Pin to a known-good version. Bump deliberately, not implicitly: an
 # unpinned install silently swept in mempalace 3.3.x/3.4.0 with a broken
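Below is a minimal Python sketch of the planned watchdog shim described above. It is not code from mempalace or pi. It assumes the MCP stdio transport, which sends one JSON-RPC message per line. The names `RequestTracker`, `rpc_error`, `parse` and `run_shim` are hypothetical, as are the `-32000` error code and the error messages. Each deadline belongs to an outstanding request, not to the process, so an idle session can last any length of time. When a request stalls, the shim kills the child and answers every outstanding request with a JSON-RPC error. The client's await then returns instead of hanging. The sketch does not respawn `mempalace-mcp` or replay the `initialize` handshake.

```python
# Hypothetical sketch: names, the -32000 code and messages are illustrative, not
# mempalace/pi API. Assumes newline-delimited JSON-RPC over stdio (MCP stdio transport).
import json
import subprocess
import sys
import threading
import time


class RequestTracker:
    """In-flight JSON-RPC request ids and the monotonic time each was forwarded."""

    def __init__(self, timeout):
        self.timeout, self._sent, self._lock = timeout, {}, threading.Lock()

    def start(self, req_id, now):
        with self._lock:
            self._sent[req_id] = now

    def finish(self, req_id):
        with self._lock:
            self._sent.pop(req_id, None)

    def expired(self, now):
        # Only an outstanding request can expire; an idle server is never killed.
        with self._lock:
            return [i for i, t in self._sent.items() if now - t > self.timeout]

    def drain(self):
        with self._lock:
            ids, self._sent = list(self._sent), {}
        return ids


def rpc_error(req_id, message):
    # JSON-RPC 2.0 error; -32000 is in the implementation-defined server-error range.
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": -32000, "message": message}})


def parse(line):
    try:
        msg = json.loads(line)
    except ValueError:
        return None
    return msg if isinstance(msg, dict) else None  # batches are not tracked


def run_shim(cmd, timeout, client_in=sys.stdin, client_out=sys.stdout, poll=0.1):
    """Proxy JSON-RPC lines between client and child; return "timeout" or "exited"."""
    child = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                             text=True, bufsize=1)
    tracker, out_lock = RequestTracker(timeout), threading.Lock()

    def emit(line):
        with out_lock:
            client_out.write(line + "\n")
            client_out.flush()

    def pump_requests():  # client -> child
        for line in client_in:
            msg = parse(line)
            if msg and "method" in msg and "id" in msg:  # a request, not a notification
                tracker.start(msg["id"], time.monotonic())
            try:
                child.stdin.write(line if line.endswith("\n") else line + "\n")
                child.stdin.flush()
            except OSError:  # child already gone
                return
        try:
            child.stdin.close()  # client EOF: let a well-behaved server exit
        except OSError:
            pass

    def pump_responses():  # child -> client
        for line in child.stdout:
            msg = parse(line)
            if msg and "id" in msg and ("result" in msg or "error" in msg):
                tracker.finish(msg["id"])
            emit(line.rstrip("\n"))

    threads = [threading.Thread(target=f, daemon=True)
               for f in (pump_requests, pump_responses)]
    for t in threads:
        t.start()
    while True:
        if tracker.expired(time.monotonic()):
            child.kill()  # a request stalled: kill the child so the client unblocks
            child.wait()
            for req_id in tracker.drain():
                emit(rpc_error(req_id, "request timed out (watchdog)"))
            return "timeout"
        if child.poll() is not None and not threads[1].is_alive():
            for req_id in tracker.drain():  # child exited without answering these
                emit(rpc_error(req_id, "server exited"))
            return "exited"
        time.sleep(poll)
```

As a wrapper, `run_shim(["mempalace-mcp"], 60.0)` proxies the shim's own stdin and stdout. A small script that calls it could therefore be registered with pi in place of `mempalace-mcp`. How pi's MCP config names that command is not covered here. With this wrapper, a stall on the first `initialize` request (likely cause 1) shows up as an error after 60 s instead of an indefinite hang.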