feat(skills): bake pi-extensions + mempalace fallback skills

The pi-toolkit global AGENTS.md tells every pi session to read
~/.agents/skills/pi-extensions/SKILL.md at start (the fork/recall
under-utilisation fix), but that skill lived only in the private skillset
repo — so the pointer dangled in any container started without skillset
mounted. Bake fallbacks so the pointer always resolves.

- pi-extensions (Option 1 + Option 2, layered):
  * Canonical skill promoted to the public pi-extensions package repo under
    skill/ (separate commit there); co-located with the code it documents.
  * rootfs/ carries a committed snapshot (the floor).
  * Dockerfile.variant copies /opt/pi-extensions/skill/ over the snapshot
    after the pinned clone, so a normal build ships the fresh package copy
    (recorded via PI_EXTENSIONS_REF) and an old-ref/mirror build still ships
    the snapshot. Helper evaluate-extension-usage.py travels with it.
- mempalace (Option 2 only): snapshot in rootfs/. Its consumer skill has no
  public package home (mempalace-toolkit ships a different skill,
  opencode-mempalace-bridge), so no build-time refresh.
- entrypoint links both (only-when-absent; mounted skillset still wins).
- smoke-test: build-time presence + package-match check + runtime symlink
  assertions; readiness gate now waits on the last-linked skill.
- docs: skills/VENDORED.md (provenance + refresh), README, AGENTS.md,
  CHANGELOG [Unreleased].

Note: shipped in the NEXT release; v1.2.0 (run 409) predates this.
This commit is contained in:
Joakim Persson
2026-06-23 15:32:04 +02:00
parent d619a6e2ec
commit a7d6a7d235
9 changed files with 848 additions and 4 deletions
@@ -0,0 +1,298 @@
---
name: pi-extensions
description: >-
Use the pi extensions (pi-fork, pi-observational-memory, ssh-controlmaster) effectively in the pi coding agent harness. Load this skill only when running inside pi (detection - `fork` and `recall` are present in your tool list, or `pi --ssh` was used to start the session). pi-fork dispatches focused subtasks to forked agents at fast/balanced/deep effort tiers; pi-observational-memory compacts long sessions into recallable observations + reflections; ssh-controlmaster rewires pi's read/write/edit/bash tools to execute on a remote host over a multiplexed SSH connection. This skill covers tier selection, task design, boundary discipline, when to use recall, and remote-pi mechanics.
---
# Pi Extensions: pi-fork, pi-observational-memory, ssh-controlmaster
## When to Load This Skill
Load only when **both** of these are true:
1. You are running inside the **pi coding agent harness** (not Claude Code, not opencode, not any other harness).
2. The `fork` and/or `recall` tools appear in your available tool list, **or** the session was started with `pi --ssh ...`.
If you do not see those tools, this skill does not apply — skip it. Other harnesses do not have these extensions and the patterns below will not work there.
This skill is most useful at the start of any non-trivial session where you may need to dispatch parallel subtasks, where the conversation is likely to compact (sessions running > ~80k tokens), or where pi is operating against a remote host.
## Pi extension landscape (where the wiring lives)
Pi has **two distinct extension locations** and it's easy to look in the wrong one:
| Location | Mechanism | Examples |
|---|---|---|
| `~/.pi/agent/extensions/*.ts` (or `.ts.off`) | **Local extensions** — TypeScript files, usually symlinks into `/opt/pi-extensions/extensions/` or similar. Toggled via `/ext` slash command. | `ssh-controlmaster`, `git-checkpoint`, `notify`, `todo`, `mempalace`, `mcp-loader`, `ext-toggle`, `confirm-destructive` |
| `~/.pi/agent/git/<host>/<owner>/<repo>/` | **Package extensions** — git-cloned npm packages registered via the `packages` array in `~/.pi/agent/settings.json`. | `pi-fork` (`github.com/elpapi42/pi-fork`), `pi-observational-memory` (`github.com/elpapi42/pi-observational-memory`, **default branch `master`** — a `main` branch does not exist, so `pi install git:...` resolves against `master`) |
When the user asks how to use "the X extension", **check both locations**`find ~/.pi/agent -maxdepth 4 -name "*X*"` covers both. The `/ext` slash command shows the local-extensions list with enable/disable state. There is also a distinct skill-bundled-script category (e.g. `ci-release-watcher`'s `ssh-control-master-setup.sh`) which is **not** a pi extension at all — it's a helper script inside a skill. Don't conflate the three.
## Why These Extensions Belong Together
pi-fork and pi-observational-memory are symbiotic. **pi-fork burns context** (each fork dispatches a focused subtask whose detailed exploration would otherwise pollute your main thread). **pi-observational-memory preserves context** (when the main thread eventually compacts, observations + reflections survive the fold and can be recalled by ID). Aggressive forking only works long-term if the surviving summary is high-fidelity, and OM only earns its keep when it's preserving genuinely valuable distilled work.
ssh-controlmaster is orthogonal but composes cleanly: when pi is operating remotely, fork still spawns local sub-agents (each fork *itself* doesn't ssh), but their `bash`/`read`/`write`/`edit` calls do — see Part 3 caveats.
---
## Part 1: pi-fork
### Effort tier mapping
Configured in `~/.pi/agent/settings.json` under `pi-fork.effortProfiles`. The conventional mapping is:
| Tier | Model | Use for |
|---|---|---|
| `fast` | haiku | mechanical edits, narrow lookups, file-listing, single-fact verification, simple syntactic checks |
| `balanced` | sonnet (default) | normal exploration, implementation, testing, code review, option analysis |
| `deep` | opus | architecture decisions, security analysis, concurrency reasoning, ambiguous debugging, high-risk reviews, runbook drafting where subtle mistakes are costly |
**Rule of thumb:** start at `balanced` unless you have a specific reason to go up or down. Going too cheap on a deep task wastes a fork; going too expensive on a mechanical task is just slow.
### When to fork vs. do it yourself
Fork when **any** of:
- The task requires reading many files whose contents you don't need to keep in your main context afterwards (the fork returns a dense summary; raw file contents stay in the fork's context and are discarded).
- You want to run multiple analyses in **parallel** (especially: comparing N options, where independent reasoning is itself a signal — see "parallel forks" below).
- The task is well-scoped enough to specify completely up front and well-bounded enough that returning a dense report is more useful than continuing the dialogue.
- You are about to do something that would burn a lot of tokens on tool calls (long file reads, many bash invocations) whose output you will mostly discard.
Don't fork when:
- The work fits in your current context budget without crowding out what comes next.
- The task is exploratory and you'll need to iterate based on what you find (forking turns iteration into round-trips with full task-spec rewrites).
- You need to make decisions during the work that depend on context only the main thread has.
### Task design: the four things a fork brief must contain
1. **Verified context up front.** Do not say "go look at the codebase and figure out X". Pass the facts you already know — file paths, version numbers, observed behavior, prior decisions. The fork should be reasoning *from* context, not *finding* context. Discovery work costs the fork tokens that don't come back to you.
2. **A specific deliverable.** "Analyze X" is too vague. "Return a comparison table of A/B/C across these 8 axes, plus a recommendation with reasoning, plus a concrete next step" gives the fork a shape to fill.
3. **Decision authority.** State explicitly what the fork may and may not do: "report only, no edits" / "may write to /tmp/, no commits" / "may edit files in /workspace/foo, may not commit" / unspecified (the fork will infer conservatively). **State this even when it seems obvious.** See "Boundary discipline" below.
4. **What "unsure" looks like.** Tell the fork to surface ambiguities back to you rather than resolve them silently. "Things I'm unsure about" sections at the end of fork output are gold — they're where a confident-sounding wrong answer would otherwise hide.
### Parallel forks for option-comparison
When facing a "which approach should we take" question with 24 candidate approaches, dispatching the candidates as parallel forks is high-leverage:
- They reason **independently**. No fork sees the others' work.
- **Convergence is signal.** If three forks at different effort tiers reach the same recommendation citing different evidence, that's a strong validation that doesn't depend on any one model's bias.
- **Divergence is also signal.** If one disagrees, read its reasoning carefully — it may have spotted something the others missed, or it may have a tier-specific weakness worth knowing.
Sample shape for an option-comparison call:
- Fork 1 (deep) — detailed runbook for option A, with timing/risk/rollback
- Fork 2 (balanced) — comparison table A vs B vs C across N axes, with a recommendation
- Fork 3 (fast) — focused sub-question (e.g., "which container image / library version / CLI flag")
This costs more than a single fork but the cross-validation is often worth it for decisions you'll execute on prod systems.
### Boundary discipline (observed behavior)
Forks **mostly** honor explicit decision-authority instructions, but not infallibly. Observed pattern from real sessions:
- **Pure analysis tasks** (no write authority, "report only") — high compliance. Forks reliably return analysis without editing files or committing.
- **Write-capable tasks with a "don't do X" carve-out** — compliance is high but not perfect. Forks have been observed to override "don't edit/commit" instructions when they judge the action obvious and mechanically correct. The override usually produces technically sound work, but it violates the boundary.
**Practical rules:**
- State decision authority explicitly, every time, even when "report only" feels redundant.
- For high-stakes write authority, verify the fork's actions afterwards (`git status`, `git log -1`, file diffs) rather than assuming compliance.
- If a boundary violation is unacceptable (e.g., compliance review, sandboxed exploration, "don't touch prod"), do not give the fork write tools at all — keep it strictly in analysis mode.
- The fact that the fork was "right anyway" is not the same as the fork having followed instructions.
### Anti-patterns
- **Forking trivial work.** A fork has overhead. If the task takes < 30 seconds in your main thread, just do it.
- **Vague briefs.** "Look into the database thing" returns vague output. The fork is not telepathic.
- **Forking iterative work.** Forks are one-shot. If you need to iterate, you'll re-spec the task each time — usually worse than doing it yourself.
- **Recursive forking** (forks spawning forks). Disabled by default and should stay disabled unless you have a specific batch-fanout use case.
- **Treating fork output as ground truth without verification.** Especially for cited code/commit hashes/URLs — forks can hallucinate these like any LLM. Spot-check decisive evidence.
---
## Part 2: pi-observational-memory
### How it actually works
Observational memory (OM v3, "session-ledger" architecture) runs an **observer agent** in the background as your conversation grows. When token thresholds are crossed (defaults: observe at 10k, reflect at 20k, compact at 81k), the observer distills the recent transcript into:
- **Observations** — timestamped events, each with a 12-character hex ID like `[3682ebfad7af]`. Compact one-liners describing what happened in the conversation.
- **Reflections** — durable, long-lived facts about the user, project, decisions, and constraints. Some reflections include observation IDs as evidence pointers.
When compaction fires, the raw transcript is folded away and replaced with a structured summary block containing the observations + reflections. **You — the next turn of the same agent — receive that summary block as your starting context.** That's the recovery mechanism.
**Storage is in-transcript, not on disk.** Do not grep for `observations.jsonl` or similar files; you will not find them. The artifact lives in the model's input context window.
Configuration lives in `~/.pi/agent/settings.json` under `observational-memory`. Tune `observeAfterTokens`, `reflectAfterTokens`, `compactAfterTokens`, and `observationsPoolMaxTokens` if observations feel sparse or noisy. The default 81k compaction threshold is well-calibrated for typical multi-task sessions.
### The `recall` tool
`recall(<12-char-hex-id>)` resolves a specific observation or reflection ID back to the original source context — the exact bash output, file contents, tool call results, commit message, or transcript fragment that the observation was distilled from.
**Use recall when:**
- You are about to make a decision that depends materially on a compacted observation or reflection whose details are unclear.
- You need exact wording, paths, commands, errors, commits, or user constraints behind a remembered claim.
- A broad reflection is relevant but you need its supporting observations to act safely.
- The user asks "why do you believe X" or "what supports that memory".
**Do not use recall for:**
- Semantic search (it's keyed by ID, not topic — you must already have a specific 12-char hex ID).
- Browsing the transcript out of curiosity.
- Preemptive lookup of every ID in your context "just in case".
Recall costs tokens. Use it when exact source context will materially change your next action.
> **Calibration note (from a real ~1-month trial, 2026-05/06):** across 20 logged container sessions, `recall` was invoked **0 times** while obsmem passively carried 529 observations across 6 compactions. Zero recall is a *warning sign*, not a badge of efficiency — it means decisions after a compaction were made on the distilled one-liner alone, without ever re-checking the source. The injected summary is **lossy by design**. Default habit to adopt: when you are about to **edit code, ship a change, or assert a fact** that rests on a `[high]`/`[critical]` observation or a reflection you did not produce *this* turn, `recall` its ID **first**. One recall before a load-bearing action is cheap; redoing finished work or contradicting a prior correction is not.
### Reading the compaction summary
When you see a block like `The conversation history before this point was compacted into the following summary:` at the start of a session or turn, that's OM output. Standard structure:
- **Reflections** at the top: stable facts. Some have IDs in brackets.
- **Observations** below, chronological: timestamped events with IDs in brackets and importance markers (`[high]`, `[critical]`, etc.).
When entries conflict, **the most recent observation reflects the latest known state.** Work that prior observations describe as completed should not be redone unless the user explicitly asks to revisit it.
### Anti-patterns
- **Treating compacted memory as definitive without recall** when stakes are high. Compaction is lossy; the observation may have lost a constraint that was on the line above it in the original transcript.
- **Recalling every ID preemptively.** Wasteful. Recall on demand.
- **Assuming the disk holds OM artifacts.** It doesn't. Don't waste time looking.
- **Ignoring the summary block** when starting a session. It's there because the prior session was real work — read it before answering questions about past work.
---
## Quick Reference
```
fork(task=..., effort=fast|balanced|deep)
- state decision authority explicitly
- pass verified context up front
- specify deliverable shape
- ask for "unsure about" section
recall(id=<12-char-hex>)
- only when stakes justify the cost
- id must already be visible in your context
- not a search tool
```
```
~/.pi/agent/settings.json
pi-fork.effortProfiles — model + thinking-depth per tier
pi-fork.defaultEffort — usually "balanced"
observational-memory.* — token thresholds, model, agentMaxTurns
observational-memory.debugLog: true — opt-in NDJSON telemetry at
~/.pi/agent/observational-memory/debug/<session>.ndjson (off by default)
```
### Installing on a fresh machine (host)
These are git-sourced pi packages (pi-fork is **not** on npm). Add to the
`packages` array in `~/.pi/agent/settings.json`, or:
```
pi install git:github.com/elpapi42/pi-fork
pi install git:github.com/elpapi42/pi-observational-memory # default branch: master (no main)
# obsmem is also published: pi install npm:pi-observational-memory
```
Restart pi after install. Enable `observational-memory.debugLog` if you want
the next window instrumented.
### Evaluating usage
`evaluate-extension-usage.py` (bundled next to this skill) mines pi session
transcripts for fork/recall counts and obsmem compaction stats. Run it per
machine (transcripts live at `~/.pi/agent/sessions/`) for a combined
host+container picture:
```
./evaluate-extension-usage.py # ~/.pi/agent/sessions
./evaluate-extension-usage.py /path/a /path/b # multiple roots
```
---
## Part 3: ssh-controlmaster
### What it does
When pi is launched with `--ssh`, this extension **rewires pi's `read`, `write`, `edit`, and `bash` tools to execute on the remote machine**, multiplexed over a single SSH ControlMaster socket. Pi is still running locally — the LLM, the UI, the MCP servers, the fork dispatcher all live on your local box — but anything those tools touch on the filesystem is the *remote's* filesystem.
This is fundamentally different from running pi locally and using `bash` to ssh inside it: with `--ssh`, the tool layer itself is remoted, so the LLM thinks it's working in the remote's `cwd` (the system prompt is rewritten to say so).
### Usage
```bash
# Key-based auth (preferred), remote cwd defaults to remote $HOME
pi --ssh lagret
# Pin to a specific remote directory
pi --ssh lagret:/volume1/docker/portainer/compose/119
# Password auth (input is NOT masked when typing)
pi --ssh user@host --ssh-ask-pass
```
The `lagret` form requires a `Host lagret` block in `~/.ssh/config` or a resolvable hostname. The status bar shows `SSH ⚡ own master <host>:<cwd>` or `SSH ⚡ system master <host>:<cwd>` once connected.
### How it cooperates with system SSH config
It reads `ssh -G <host>` to learn the effective config, then:
| `~/.ssh/config` for the host | Behavior |
|---|---|
| `ControlMaster auto` or `yes` with a `ControlPath` | Reuses the system master socket. Does **not** tear it down on pi exit ("it was the system's to manage before pi arrived"). |
| No ControlMaster configured (or explicitly `no`) | Creates its own master at `/tmp/pi-cm-<pid>.sock` with `ControlPersist=yes`. Tears it down on pi `session_shutdown`. |
This means it composes cleanly with the system-wide `ssh-control-master-setup.sh` helper from the `ci-release-watcher` skill: if that script has already configured `~/.ssh/config` for the host, `pi --ssh` rides on the existing master rather than opening a parallel connection.
### Caveats and edge cases
- **Local vs remote tool boundary.** Only `read`/`write`/`edit`/`bash` are remoted. **MCP servers are still local**`mempalace` files drawers and diary entries against the local palace even when your shell work happens remotely. Same for `fork`, `recall`, `todo`, and any other custom tool. This is usually what you want (palace memory survives across remote sessions) but worth knowing.
- **fork over ssh.** Forks spawn locally and inherit the same `--ssh` mode by virtue of the parent's tool wiring; the fork's bash calls hit the same ControlMaster. Forks burn the same SSH socket, not a parallel one — multiplexing wins again.
- **macOS Unix socket path limit.** The own-master socket lives at `/tmp/pi-cm-<pid>.sock` to stay under macOS's ~104-char limit. If you have a non-default `TMPDIR` long enough to blow this, ssh will fail to start the master.
- **Password auth password visibility.** From the source: *"input is NOT masked — the password is visible while typing."* The password is written to a chmod-700 SSH_ASKPASS script in `/tmp` and deleted after the master establishes; not persisted, but on-screen during entry.
- **Remote bash environment.** The remote shell is whatever `ssh user@host '<cmd>'` invokes — typically a non-login non-interactive bash. Don't expect `~/.bashrc` aliases or PATH manipulations from `~/.profile`. Pin tool paths or invoke via `bash -lc '...'` if you need login-shell behavior.
- **Path translation is naive.** The extension does `path.replace(localCwd, remoteCwd)` to translate paths in tool calls. If the LLM emits an absolute remote path that doesn't share the local-cwd prefix, the path is passed through unchanged — usually fine but pathological for paths that happen to contain the local-cwd substring.
### When to use it
- Editing configs on a NAS / homelab host without scp ping-pong (`pi --ssh lagret:/volume1/...`)
- Operating against a host whose tools/data you need but whose disk is too slow to mount via SSHFS
- Investigating runner state, container configs, etc., on a remote host as if local
- Multi-step remote work where opening a fresh ssh connection per step would burn your CGNAT flow budget
### Anti-patterns
- **Using `pi --ssh` for one-off shell work.** Just `ssh` directly. The extension shines when there are dozens of tool calls per session.
- **Filing palace drawers expecting them on the remote.** They go to the local palace. If you want palace artifacts on the remote host, ssh into the remote and run pi *there* against its local palace.
- **Forgetting `--ssh` in followup sessions.** Status bar is the canary — if you don't see `SSH ⚡` you're operating locally despite intending remote. Easy mistake on a fresh terminal.
### Reaching the devbox host from inside the container (`dssh` / `dscp`)
Distinct from `pi --ssh` above. When the **pi-devbox container** runs under OrbStack / Docker Desktop on macOS, it can SSH back to its own host. The entrypoint's `setup-lan-access.sh` regenerates `~/.ssh-local/config` on **every container start** (the in-container `~/.ssh` is mounted read-only, so a sidecar config + `known_hosts` + `ControlPath` under `~/.ssh-local/` is used instead).
```bash
# Interactive shells get aliases (from ~/.bash_aliases):
dssh host 'cmd' # = ssh -F ~/.ssh-local/config host
dscp file host:/path # = scp -F ~/.ssh-local/config ...
```
**The agent's `bash` tool is non-interactive — those aliases are NOT loaded.** Use the explicit form:
```bash
ssh -F ~/.ssh-local/config host 'cmd'
scp -F ~/.ssh-local/config <src> host:<dst>
```
- Host aliases `host` and `mac` both resolve to `host.docker.internal` (user varies per host machine — check `~/.ssh-local/config` for the active `User` value, key `~/.ssh-local/devbox_jump_ed25519`, `ControlMaster auto` / `ControlPersist 4h`).
- The config chains `Include ~/.config/devbox-shell/ssh-lan.conf` then `Include ~/.ssh/config`, so LAN targets are reachable too (add `ProxyJump host` to those entries).
- **Use it for:** enabling/inspecting the host's pi config (`~/.pi/agent/settings.json`), running `evaluate-extension-usage.py` against the host's `~/.pi/agent/sessions/` for a combined host+container metric, or copying host transcripts into the container. The host's pi runs natively there; its palace, sessions, and extensions are separate from the container's.
---
## Cross-Skill Notes
- **mempalace** is for cross-session persistent memory (diary, knowledge graph, drawer storage). OM is for **within-session** context survival across compaction. They complement each other: write a diary entry at session end *and* let OM compact your work-in-progress mid-session.
- **systematic-debugging** and **test-driven-development** skills pair well with deep-tier forks: a deep fork can carry out a focused debugging investigation or write a failing test suite without polluting your main context.
- **ci-release-watcher** ships a `scripts/ssh-control-master-setup.sh` helper that configures system-wide SSH ControlMaster in `~/.ssh/config`. That's a separate mechanism from the `ssh-controlmaster` pi extension — they compose, they don't overlap. Use the script for persistent host-wide multiplexing, the extension for per-pi-session remote operation.