docs: capture mempalace-mcp uninterruptible-hang diagnosis (2026-06-13)

Symptom: the pi TUI blocks on a mempalace tool call and ESC does not abort it.
Initial WAL-contention hypothesis ruled out (no other writer running).
Likely cause: virtiofs cold open of chroma.sqlite3 stalls the JSON-RPC
initialize handshake; pi has no per-call MCP timeout.

Recovery today: docker exec <ctr> pkill -9 -f mempalace-mcp, then restart pi.

Planned fix (deferred until after opencode-devbox pi removal): stdio
watchdog shim with per-REQUEST timeout. A naive process-lifetime
timeout wrapper is wrong because mempalace-mcp is long-lived.
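
A minimal sketch of what such a shim could look like, in Python. It is not
the planned implementation. It assumes newline-delimited JSON-RPC on stdio
and assumes pi unblocks when it receives an error response for the stalled
id. PendingRequests, REQUEST_TIMEOUT_S and error code -32000 are
hypothetical choices for illustration.

```python
import json
import subprocess
import sys
import threading
import time

REQUEST_TIMEOUT_S = 60.0  # hypothetical default, not taken from the commit


class PendingRequests:
    """Tracks in-flight JSON-RPC request ids and their deadlines."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._deadlines = {}
        self._lock = threading.Lock()

    @staticmethod
    def _parse(line):
        try:
            msg = json.loads(line)
        except ValueError:
            return None
        ok = isinstance(msg, dict) and isinstance(msg.get("id"), (str, int, float))
        return msg if ok else None

    def on_client_line(self, line):
        """Start the clock for a request (a message with "method" and "id")."""
        msg = self._parse(line)
        if msg is not None and "method" in msg:
            with self._lock:
                self._deadlines[msg["id"]] = self._clock() + self._timeout_s

    def on_server_line(self, line):
        """Return True if the server line should be forwarded to the client."""
        msg = self._parse(line)
        if msg is None or "method" in msg:
            return True  # notifications, server requests, non-JSON output
        with self._lock:
            # An untracked id means a timeout error was already sent: drop it.
            return self._deadlines.pop(msg["id"], None) is not None

    def expire(self):
        """Pop overdue requests and return one JSON-RPC error line for each."""
        now = self._clock()
        with self._lock:
            overdue = [i for i, d in self._deadlines.items() if d <= now]
            for i in overdue:
                del self._deadlines[i]
        error = {"code": -32000, "message": "request timed out"}
        return [json.dumps({"jsonrpc": "2.0", "id": i, "error": error})
                for i in overdue]


def main(argv):
    """Run argv as the long-lived child and police each request separately."""
    child = subprocess.Popen(argv, stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE, text=True, bufsize=1)
    pending = PendingRequests(REQUEST_TIMEOUT_S)
    out_lock = threading.Lock()

    def emit(line):
        with out_lock:
            sys.stdout.write(line.rstrip("\n") + "\n")
            sys.stdout.flush()

    def pump_server():
        for line in child.stdout:
            if pending.on_server_line(line):
                emit(line)

    def reap():
        while child.poll() is None:
            for line in pending.expire():
                emit(line)
            time.sleep(0.5)

    threading.Thread(target=pump_server, daemon=True).start()
    threading.Thread(target=reap, daemon=True).start()
    for line in sys.stdin:  # the child stays alive across many requests
        pending.on_client_line(line)
        child.stdin.write(line)
        child.stdin.flush()
    child.stdin.close()
    return child.wait()


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved under a hypothetical name such as watchdog.py, it would be invoked as
python3 watchdog.py mempalace-mcp. The timeout is scoped to each request id,
so the server process itself is never killed for being long-lived. The sketch
does not handle a child that exits mid-write.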

Sharing the palace across harnesses remains the goal.
pi
2026-06-13 16:18:45 +02:00
parent ab5ff8ec56
commit 7f67c36a1c
2 changed files with 40 additions and 0 deletions
@@ -279,6 +279,15 @@ RUN ARCH=$(case "${TARGETARCH}" in amd64) echo "x86_64" ;; arm64) echo "aarch64"
# Provides semantic search over conversation history via 29 MCP tools.
# Always installed in the base. Set INSTALL_MEMPALACE=false at base-build
# time to shave ~300 MB.
#
# TODO(2026-06-13): wrap mempalace-mcp with a stdio-watchdog shim that
# applies a per-REQUEST timeout (not a per-process timeout — naive
# `timeout 60 mempalace-mcp` would kill the long-lived server mid-session).
# When the palace is bind-mounted from macOS via OrbStack virtiofs, cold
# chroma.sqlite3 open or HNSW load can stall the JSON-RPC `initialize`
# response and pi's TUI sits uninterruptibly (ESC cancels the LLM stream,
# not the MCP child stdio). See CHANGELOG.md "Unreleased > Known issues".
# Recovery today: `docker exec <ctr> pkill -9 -f mempalace-mcp`.
ARG INSTALL_MEMPALACE=true
# Pin to a known-good version. Bump deliberately, not implicitly: an
# unpinned install silently swept in mempalace 3.3.x/3.4.0 with a broken