pi e12b624cf7 feat(pi-ext): self-healing respawn + scoped init timeout for mempalace-mcp

A stall-kill (or any crash) of mempalace-mcp was a permanent latch:
available flipped off and stayed off until pi restart. Now the next tool
call transparently respawns the server and retries.

- ensureAlive(): bounded respawn with capped exponential backoff
  (MEMPALACE_MCP_MAX_RESPAWNS, default 2; MEMPALACE_MCP_RESPAWN_BACKOFF_MS,
  default 1000). Respawn budget resets on any successful JSON-RPC response,
  so a recovered server regains full patience while a persistently-broken
  one hits the cap and stays down (no hot-loop).
- Init timeout default raised 120000 -> 300000 (scoped to init only): a
  genuine virtiofs cold-open shouldn't be killed mid-progress only to
  respawn and re-pay the same cost. Per-call timeout stays 60000.
- Concurrency hardening: generation counter so a late exit from a killed
  old process can't tear down a fresh respawn; explicit healthy flag
  replaces racy proc!=null liveness check.
- README: document self-heal, new env vars, and why generous-init +
  bounded-respawn compose rather than overlap.

2026-06-26 00:22:21 +02:00

pi ↔ MemPalace MCP bridge

This directory is the canonical source of ~/.pi/agent/extensions/mempalace.ts, the TypeScript extension that wires MemPalace's MCP server into the pi coding-agent harness. It installs wake-up context injection, per-tool schema passthrough, and a /mempalace-diary slash command.

This directory only holds the bridge. Pi's own base config (keybindings, environment loader, settings template) lives in the sibling pi-toolkit repo, which was split out on 2026-05-05 so that opencode-devbox can build slim containers that include pi without pulling in mempalace's dependencies (~300 MB).

Jump to:

What it does
The Type.Unsafe gotcha
Deploying pi with mempalace on a new machine
Fail-soft, identity, debugging

What it does

Spawns mempalace-mcp as a subprocess and does the MCP stdio JSON-RPC handshake (initialize + notifications/initialized + tools/list).
Registers each MCP tool as a pi tool with its real inputSchema passed through via Type.Unsafe(...) (see gotcha below).
Wake-up auto-injection (before_agent_start, one-shot per fresh session): calls mempalace_status + mempalace_diary_read and injects the result as a mempalace-wakeup system message so the agent orients itself the way ~/.agents/skills/mempalace/SKILL.md describes. Skipped on resume/fork (context is already in the thread).
Manual wind-down via a /mempalace-diary [topic] slash command: sends a prompt asking the LLM to call mempalace_diary_write with an AAAK-formatted entry summarizing the session. This is not fully automatic because pi sessions are typically short and tactical, and session_shutdown fires too late to drive another LLM turn.
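The handshake in the first item can be sketched as three JSON-RPC messages. This is a minimal illustration, not the code in mempalace.ts: the function names, the protocolVersion value, and the clientInfo fields are assumptions, and a real client waits for the initialize response before sending the notification.

```typescript
// Sketch of the handshake messages, in the order the bridge sends them.
// buildHandshake and encodeLines are hypothetical helper names.
type JsonRpcMessage = {
  jsonrpc: "2.0";
  id?: number;
  method: string;
  params?: Record<string, unknown>;
};

function buildHandshake(): JsonRpcMessage[] {
  return [
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2024-11-05", // assumed value for illustration
        capabilities: {},
        clientInfo: { name: "pi-mempalace-bridge", version: "0.0.0" }, // hypothetical
      },
    },
    // Notifications carry no id, so the server sends no response to this one.
    { jsonrpc: "2.0", method: "notifications/initialized" },
    { jsonrpc: "2.0", id: 2, method: "tools/list" },
  ];
}

// The stdio transport frames messages as one JSON object per line.
function encodeLines(messages: JsonRpcMessage[]): string {
  return messages.map((m) => JSON.stringify(m) + "\n").join("");
}
```

The tools/list response is what supplies each tool's name and inputSchema for registration.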

Fail-soft

If mempalace-mcp can't be spawned (PATH missing, binary crashes at startup, …), even after the bounded respawn described under Self-heal, the extension logs to stderr and returns early. pi keeps working without palace tools rather than refusing to start.

Identity

agent_name for diary calls comes from $MEMPALACE_AGENT_NAME, defaulting to "pi". First diary write against that identity creates wing_<name> in the palace. Set the env var if you want to run pi under a distinct identity on a given machine (e.g. pi-laptop vs pi-server).
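As a small sketch of that lookup (the helper names are hypothetical, and treating an empty or whitespace-only value as unset is an assumption, not confirmed behavior):

```typescript
// Resolve the diary identity from the environment, defaulting to "pi".
// Assumption: an empty or whitespace-only value falls back to the default.
function resolveAgentName(env: Record<string, string | undefined>): string {
  const name = env.MEMPALACE_AGENT_NAME?.trim();
  return name ? name : "pi";
}

// The wing that the first diary write creates for that identity.
function wingFor(agentName: string): string {
  return `wing_${agentName}`;
}
```

In the extension this would be called with process.env, so exporting MEMPALACE_AGENT_NAME=pi-laptop before starting pi yields a separate wing for that machine.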

Stall protection (per-request timeout)

Every JSON-RPC request to mempalace-mcp carries a timeout. Without it, a wedged server (classically: an OrbStack/virtiofs cold-open of a large chroma.sqlite3 or an HNSW load) leaves the awaiting promise pending forever, which freezes the pi TUI — ESC cancels the LLM stream, not a pending tool execute(). On timeout the extension rejects the request and kills the stalled child (SIGTERM→SIGKILL), so pi gets a clear error instead of hanging. This is a per-REQUEST timeout, not a process-lifetime one — the long-lived server is only killed when a request genuinely stalls.

MEMPALACE_MCP_TIMEOUT_MS — tool-call/request timeout. Default 60000. Kept short on purpose: a query taking this long is genuinely wedged.
MEMPALACE_MCP_INIT_TIMEOUT_MS — initialize + tools/list handshake timeout. Default 300000. Deliberately generous: a genuine first cold-open over virtiofs can legitimately take minutes, and killing a still-progressing init only to respawn and re-pay the same cold cost is strictly worse than waiting.
Set either to 0 to disable (legacy unbounded behavior).
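A per-request timeout of this kind can be sketched as follows. The helper names are hypothetical, and falling back to the default on an unparsable value is an assumption; only the env var names, the defaults, and the "0 disables" rule come from the text above.

```typescript
// Parse a timeout env var. 0 means "disabled"; anything that is not a
// non-negative number falls back to the default (an assumption).
function parseTimeoutMs(raw: string | undefined, fallbackMs: number): number {
  if (raw === undefined || raw.trim() === "") return fallbackMs;
  const n = Number(raw);
  return Number.isFinite(n) && n >= 0 ? n : fallbackMs;
}

// Bound a pending request. On expiry, onTimeout runs first (this is where
// the stalled child would be killed), then the promise rejects.
function withTimeout<T>(
  work: Promise<T>,
  timeoutMs: number,
  onTimeout: () => void,
): Promise<T> {
  if (timeoutMs === 0) return work; // legacy unbounded behavior
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      onTimeout();
      reject(new Error(`mempalace-mcp request timed out after ${timeoutMs} ms`));
    }, timeoutMs);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

With this shape, tool calls would use parseTimeoutMs(process.env.MEMPALACE_MCP_TIMEOUT_MS, 60000) and the handshake parseTimeoutMs(process.env.MEMPALACE_MCP_INIT_TIMEOUT_MS, 300000), so the two budgets stay independent.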

Self-heal (respawn instead of a permanent latch)

A stall-kill (or any crash) used to be a permanent latch: available flipped off and stayed off until you restarted pi. It is now self-healing — the next tool call transparently respawns mempalace-mcp and retries.

Respawns use capped exponential backoff so a persistently-broken server can't hot-loop: MEMPALACE_MCP_MAX_RESPAWNS attempts (default 2; set 0 to disable self-heal and keep the old fail-fast latch), with MEMPALACE_MCP_RESPAWN_BACKOFF_MS (default 1000) doubled per attempt.
The budget resets on any successful JSON-RPC response, which is proof the server is actually live. A server that recovers therefore regains full patience, while one that keeps dying hits the cap and stays down until you restart pi.
Why the long init timeout and bounded respawn compose rather than overlap: once a server has opened the palace once, the OS page cache is warm, so respawn cold-opens are fast. The long init timeout prevents killing a healthy first cold-open; the respawn handles a genuinely dead server cheaply afterwards. (Note the HNSW deserialize is CPU work that isn't page-cacheable across spawns, which is exactly why we can't rely on respawn-warming alone and keep the generous init budget.)
The initial startup is tolerant too: if the very first start() fails, the extension runs the same bounded respawn before falling back to fail-soft (pi keeps working without palace tools).
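The respawn budget described above can be sketched as a small counter. The class name is hypothetical, and the sketch assumes the first respawn waits exactly the base backoff, with each later attempt doubling it; the real extension may differ in where the doubling starts.

```typescript
// Bounded respawn budget with exponential backoff.
// Defaults mirror MEMPALACE_MCP_MAX_RESPAWNS and MEMPALACE_MCP_RESPAWN_BACKOFF_MS.
class RespawnBudget {
  private used = 0;
  private readonly maxRespawns: number;
  private readonly baseBackoffMs: number;

  constructor(maxRespawns = 2, baseBackoffMs = 1000) {
    this.maxRespawns = maxRespawns;
    this.baseBackoffMs = baseBackoffMs;
  }

  // Delay to wait before the next respawn, or null once the cap is hit.
  next(): number | null {
    if (this.used >= this.maxRespawns) return null;
    const delay = this.baseBackoffMs * 2 ** this.used;
    this.used += 1;
    return delay;
  }

  // Called on any successful JSON-RPC response.
  reset(): void {
    this.used = 0;
  }
}
```

Because next() returns null rather than a delay once the budget is spent, a persistently broken server cannot hot-loop, and a maxRespawns of 0 reproduces the old fail-fast latch.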

Debugging

MEMPALACE_EXT_DEBUG=1 — surface mempalace-mcp stderr into pi's stderr. Without this, stderr is drained silently so a misbehaving server doesn't flood the TUI.
If a tool call fails with a generic "Internal tool error", spawn mempalace-mcp manually with raw JSON-RPC on stdin to read the server-side error — much faster than guessing.
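For the manual approach, the raw lines to feed the server can be built with a sketch like the one below. The helper names are hypothetical; the tools/call shape follows the MCP convention of a tool name plus an arguments object, and the server expects the initialize handshake before any tools/call.

```typescript
type JsonRpcResponse = {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
};

// One newline-terminated tools/call request, ready to write to stdin.
function toolCallLine(
  id: number,
  name: string,
  args: Record<string, unknown>,
): string {
  const request = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  return JSON.stringify(request) + "\n";
}

// Summarize one response line read back from the server's stdout.
function describeResponse(line: string): string {
  const res = JSON.parse(line) as JsonRpcResponse;
  if (res.error) return `error ${res.error.code}: ${res.error.message}`;
  const result = res.result as { isError?: boolean } | undefined;
  // Tool-level failures can also arrive as a result flagged isError.
  if (result && typeof result === "object" && result.isError) {
    return "tool reported isError";
  }
  return "ok";
}
```

Writing toolCallLine(3, "mempalace_diary_read", { agent_name: "pi" }) to the server after the handshake, with MEMPALACE_EXT_DEBUG=1 set, shows the server-side error directly instead of pi's generic message.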

The Type.Unsafe gotcha

Earlier versions of this extension registered every MCP tool with parameters: Type.Object({}, { additionalProperties: true }), which discarded each tool's real inputSchema. The LLM then saw no parameter names and had to guess, leading to bugs like mempalace_diary_read being called with agent= instead of the required agent_name= and crashing the Python server with TypeError: missing 1 required positional argument.

The fix (≈ lines 160-170) is to wrap the incoming JSON Schema with Type.Unsafe<...>(tool.inputSchema). TypeBox schemas are plain JSON Schema at runtime plus a Symbol marker, so wrapping an externally-sourced schema with Unsafe is sufficient — no conversion to a full TypeBox tree is needed, and the LLM now sees every tool's real parameter names.

If you ever need to re-loosen the schema for debugging, fall back to the Type.Object({}, { additionalProperties: true }) default only for that specific tool, not globally.
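The per-tool fallback can be sketched in plain JSON Schema, without importing TypeBox. The names here are hypothetical, the sample inputSchema is made up for illustration, and LOOSE_SCHEMA is only roughly what Type.Object({}, { additionalProperties: true }) produces; in the extension the selected schema would still be wrapped with Type.Unsafe or built with Type.Object.

```typescript
type JsonSchema = Record<string, unknown>;
type McpTool = { name: string; inputSchema: JsonSchema };

// Approximate plain-JSON-Schema form of the old loose default.
const LOOSE_SCHEMA: JsonSchema = {
  type: "object",
  properties: {},
  additionalProperties: true,
};

// Pass the real schema through unless this specific tool is being debugged.
function parametersFor(tool: McpTool, loosened: ReadonlySet<string>): JsonSchema {
  return loosened.has(tool.name) ? LOOSE_SCHEMA : tool.inputSchema;
}

// The parameter names the LLM gets to see for a given schema.
function visibleParameterNames(schema: JsonSchema): string[] {
  const props = schema.properties;
  return typeof props === "object" && props !== null ? Object.keys(props) : [];
}

// Illustrative tool definition; the real schema comes from tools/list.
const diaryRead: McpTool = {
  name: "mempalace_diary_read",
  inputSchema: {
    type: "object",
    properties: { agent_name: { type: "string" } },
    required: ["agent_name"],
  },
};
```

With an empty loosened set the LLM sees agent_name; adding "mempalace_diary_read" to the set hides every parameter name for that one tool, which is the situation that led to the agent= guess.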

Deploying pi with mempalace on a new machine

This is the "pi + memory" recipe. For pi without mempalace, see pi-toolkit's README.

0. Prerequisites

Shell: zsh + oh-my-zsh recommended (both toolkits install loaders into ~/.oh-my-zsh/custom/). bash works too; in that case the installers print a snippet to source manually.
git, node ≥ 20, uv, tmux ≥ 3.2, pi installed upstream.
AWS credentials reachable via AWS_PROFILE — only if using amazon-bedrock as pi's provider.

1. Dotfiles (if you keep one)

Brings ~/.config/pi/.env (AWS creds, git-crypt encrypted), tmux CSI-u extended keys, and other machine state:

git clone <your-dotfiles> ~/src/dotfiles
cd ~/src/dotfiles
git-crypt unlock <key>
./provision.sh --profile <profile>    # or your equivalent tool

2. Install pi upstream

brew install pi-coding-agent       # macOS
# or see https://github.com/earendil-works/pi for Linux
pi --help                          # creates ~/.pi/agent/

3. Install pi-toolkit (base pi config)

git clone ssh://git@gitea.jordbo.se:2222/joakimp/pi-toolkit.git ~/pi-toolkit
cd ~/pi-toolkit && ./install.sh

Symlinks keybindings.json, copies pi-env.zsh into ~/.oh-my-zsh/custom/, and prints the settings.json bootstrap command.

4. Bootstrap pi settings

cp ~/pi-toolkit/settings.example.json ~/.pi/agent/settings.json
$EDITOR ~/.pi/agent/settings.json   # eu./us./anthropic: prefix

5. Install mempalace CLI + this toolkit

uv tool install mempalace
git clone ssh://git@gitea.jordbo.se:2222/joakimp/mempalace-toolkit.git ~/mempalace-toolkit
cd ~/mempalace-toolkit && ./install.sh

Detects pi, symlinks mempalace.ts into ~/.pi/agent/extensions/. Also detects pi-toolkit artifacts and prints a green check (or a warning telling you to install pi-toolkit first if you skipped step 3).

6. Register mempalace MCP with opencode (if applicable)

Skip if this box is pi-only. Otherwise:

Install opencode-toolkit so ~/.config/opencode/.env is sourced into every shell (GitHub / Gitea / other MCP server tokens).
Register the mempalace MCP server in ~/.config/opencode/opencode.json — see root README § Registering mempalace with opencode.

7. First run

exec zsh
pi   # should start with defaults; wake-up injection shows palace status

If the wake-up doesn't print, run MEMPALACE_EXT_DEBUG=1 pi to surface mempalace-mcp stderr.

Verification checklist

# MCP bridge in place
ls -la ~/.pi/agent/extensions/mempalace.ts    # → this repo

# pi-toolkit artifacts also in place
ls -la ~/.pi/agent/keybindings.json           # → pi-toolkit
ls -la ~/.oh-my-zsh/custom/pi-env.zsh         # cp from pi-toolkit

# Env loaded
zsh -ic 'echo $AWS_PROFILE $AWS_REGION'

# Palace reachable
mempalace status

Uninstall

cd ~/mempalace-toolkit && ./install.sh --uninstall --yes   # bridge only
cd ~/pi-toolkit        && ./install.sh --uninstall --yes   # pi base config
# Leaves pi itself, mempalace CLI, and ~/.config/pi/.env alone.

File layout

mempalace-toolkit/
└── extensions/
    └── pi/
        ├── README.md        ← this file
        └── mempalace.ts     ← symlinked into ~/.pi/agent/extensions/

Pi base config (keybindings, env loader, settings template) lives in pi-toolkit. install.sh detects pi via ~/.pi/agent/extensions/ and runs a check_pi_toolkit probe that warns if pi-toolkit's artifacts are missing.
