The pi-toolkit global AGENTS.md tells every pi session to read
~/.agents/skills/pi-extensions/SKILL.md at start (the fork/recall
under-utilisation fix), but that skill lived only in the private skillset
repo — so the pointer dangled in any container started without skillset
mounted. Bake fallbacks so the pointer always resolves.
- pi-extensions (Option 1 + Option 2, layered):
  * Canonical skill promoted to the public pi-extensions package repo under
    skill/ (separate commit there); co-located with the code it documents.
  * rootfs/ carries a committed snapshot (the floor).
  * Dockerfile.variant copies /opt/pi-extensions/skill/ over the snapshot
    after the pinned clone, so a normal build ships the fresh package copy
    (recorded via PI_EXTENSIONS_REF) and an old-ref/mirror build still ships
    the snapshot. Helper evaluate-extension-usage.py travels with it.
- mempalace (Option 2 only): snapshot in rootfs/. Its consumer skill has no
  public package home (mempalace-toolkit ships a different skill,
  opencode-mempalace-bridge), so no build-time refresh.
- entrypoint links both (only-when-absent; mounted skillset still wins).
- smoke-test: build-time presence + package-match check + runtime symlink
  assertions; readiness gate now waits on the last-linked skill.
- docs: skills/VENDORED.md (provenance + refresh), README, AGENTS.md,
  CHANGELOG [Unreleased].
Note: shipped in the NEXT release; v1.2.0 (run 409) predates this.
| name | description |
|---|---|
| pi-extensions | Use the pi extensions (pi-fork, pi-observational-memory, ssh-controlmaster) effectively in the pi coding agent harness. Load this skill only when running inside pi (detection - `fork` and `recall` are present in your tool list, or `pi --ssh` was used to start the session). pi-fork dispatches focused subtasks to forked agents at fast/balanced/deep effort tiers; pi-observational-memory compacts long sessions into recallable observations + reflections; ssh-controlmaster rewires pi's read/write/edit/bash tools to execute on a remote host over a multiplexed SSH connection. This skill covers tier selection, task design, boundary discipline, when to use recall, and remote-pi mechanics. |
Pi Extensions: pi-fork, pi-observational-memory, ssh-controlmaster
When to Load This Skill
Load only when both of these are true:
- You are running inside the pi coding agent harness (not Claude Code, not opencode, not any other harness).
- The `fork` and/or `recall` tools appear in your available tool list, or the session was started with `pi --ssh ...`.
If you do not see those tools, this skill does not apply — skip it. Other harnesses do not have these extensions and the patterns below will not work there.
This skill is most useful at the start of any non-trivial session where you may need to dispatch parallel subtasks, where the conversation is likely to compact (sessions running > ~80k tokens), or where pi is operating against a remote host.
Pi extension landscape (where the wiring lives)
Pi has two distinct extension locations and it's easy to look in the wrong one:
| Location | Mechanism | Examples |
|---|---|---|
| `~/.pi/agent/extensions/*.ts` (or `.ts.off`) | Local extensions: TypeScript files, usually symlinks into `/opt/pi-extensions/extensions/` or similar. Toggled via the `/ext` slash command. | ssh-controlmaster, git-checkpoint, notify, todo, mempalace, mcp-loader, ext-toggle, confirm-destructive |
| `~/.pi/agent/git/<host>/<owner>/<repo>/` | Package extensions: git-cloned npm packages registered via the `packages` array in `~/.pi/agent/settings.json`. | pi-fork (github.com/elpapi42/pi-fork), pi-observational-memory (github.com/elpapi42/pi-observational-memory; default branch `master`, a `main` branch does not exist, so `pi install git:...` resolves against `master`) |
When the user asks how to use "the X extension", check both locations: `find ~/.pi/agent -maxdepth 4 -name "*X*"` covers both. The `/ext` slash command shows the local-extensions list with enable/disable state. There is also a distinct skill-bundled-script category (e.g. ci-release-watcher's `ssh-control-master-setup.sh`) which is not a pi extension at all; it's a helper script inside a skill. Don't conflate the three.
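The lookup above can also be scripted. The following is a minimal sketch, assuming you want the same result as the `find` command; `find_extension` is a hypothetical helper name, not part of pi.

```python
from pathlib import Path


def find_extension(name: str, root: Path, max_depth: int = 4) -> list[Path]:
    """Return entries under root whose name contains `name`, at most max_depth levels down.

    Hypothetical helper mirroring `find <root> -maxdepth 4 -name "*X*"`.
    """
    root = root.expanduser()
    if not root.is_dir():
        return []
    hits = []
    for path in root.rglob("*"):
        depth = len(path.relative_to(root).parts)
        if depth <= max_depth and name in path.name:
            hits.append(path)
    return sorted(hits)
```

Calling `find_extension("fork", Path("~/.pi/agent"))` should surface matches from both the local `extensions/` directory and the package clones under `git/`, because both sit within four levels of the root.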
Why These Extensions Belong Together
pi-fork and pi-observational-memory are symbiotic. pi-fork burns context (each fork dispatches a focused subtask whose detailed exploration would otherwise pollute your main thread). pi-observational-memory preserves context (when the main thread eventually compacts, observations + reflections survive the fold and can be recalled by ID). Aggressive forking only works long-term if the surviving summary is high-fidelity, and OM only earns its keep when it's preserving genuinely valuable distilled work.
ssh-controlmaster is orthogonal but composes cleanly: when pi is operating remotely, fork still spawns local sub-agents (each fork itself doesn't ssh), but their bash/read/write/edit calls do — see Part 3 caveats.
Part 1: pi-fork
Effort tier mapping
Configured in `~/.pi/agent/settings.json` under `pi-fork.effortProfiles`. The conventional mapping is:
| Tier | Model | Use for |
|---|---|---|
| `fast` | haiku | mechanical edits, narrow lookups, file-listing, single-fact verification, simple syntactic checks |
| `balanced` | sonnet (default) | normal exploration, implementation, testing, code review, option analysis |
| `deep` | opus | architecture decisions, security analysis, concurrency reasoning, ambiguous debugging, high-risk reviews, runbook drafting where subtle mistakes are costly |
Rule of thumb: start at balanced unless you have a specific reason to go up or down. Going too cheap on a deep task wastes a fork; going too expensive on a mechanical task is just slow.
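As a rough illustration of the rule of thumb, the tier choice can be written as a lookup with `balanced` as the fallback. The task-kind labels and the `pick_effort` name below are hypothetical, paraphrased from the table; they are not identifiers that pi-fork defines.

```python
# Hypothetical task kinds, paraphrased from the tier table above.
TIER_BY_TASK = {
    "mechanical-edit": "fast",
    "narrow-lookup": "fast",
    "implementation": "balanced",
    "code-review": "balanced",
    "architecture": "deep",
    "security-analysis": "deep",
    "ambiguous-debugging": "deep",
}


def pick_effort(task_kind: str) -> str:
    """Return the effort tier for a task kind, defaulting to 'balanced'."""
    return TIER_BY_TASK.get(task_kind, "balanced")
```

The default matters more than the table: anything you cannot classify confidently lands on `balanced`.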
When to fork vs. do it yourself
Fork when any of:
- The task requires reading many files whose contents you don't need to keep in your main context afterwards (the fork returns a dense summary; raw file contents stay in the fork's context and are discarded).
- You want to run multiple analyses in parallel (especially: comparing N options, where independent reasoning is itself a signal — see "parallel forks" below).
- The task is well-scoped enough to specify completely up front and well-bounded enough that returning a dense report is more useful than continuing the dialogue.
- You are about to do something that would burn a lot of tokens on tool calls (long file reads, many bash invocations) whose output you will mostly discard.
Don't fork when:
- The work fits in your current context budget without crowding out what comes next.
- The task is exploratory and you'll need to iterate based on what you find (forking turns iteration into round-trips with full task-spec rewrites).
- You need to make decisions during the work that depend on context only the main thread has.
Task design: the four things a fork brief must contain
- Verified context up front. Do not say "go look at the codebase and figure out X". Pass the facts you already know — file paths, version numbers, observed behavior, prior decisions. The fork should be reasoning from context, not finding context. Discovery work costs the fork tokens that don't come back to you.
- A specific deliverable. "Analyze X" is too vague. "Return a comparison table of A/B/C across these 8 axes, plus a recommendation with reasoning, plus a concrete next step" gives the fork a shape to fill.
- Decision authority. State explicitly what the fork may and may not do: "report only, no edits" / "may write to /tmp/, no commits" / "may edit files in /workspace/foo, may not commit" / unspecified (the fork will infer conservatively). State this even when it seems obvious. See "Boundary discipline" below.
- What "unsure" looks like. Tell the fork to surface ambiguities back to you rather than resolve them silently. "Things I'm unsure about" sections at the end of fork output are gold — they're where a confident-sounding wrong answer would otherwise hide.
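One way to make the four parts hard to forget is to build the brief from a small structure before passing it to `fork`. This is a sketch only; `ForkBrief` and its field names are hypothetical and the rendered layout is an assumption, not a format that pi-fork requires.

```python
from dataclasses import dataclass


@dataclass
class ForkBrief:
    """Hypothetical container for the four parts of a fork brief."""

    context: list[str]   # facts already verified in the main thread
    deliverable: str     # the exact shape of the report to return
    authority: str = "report only, no edits"
    unsure_note: str = "List anything you are unsure about instead of resolving it silently."

    def render(self) -> str:
        # Refuse to produce a brief that would send the fork on a discovery hunt.
        if not self.context:
            raise ValueError("a brief needs verified context, not a discovery request")
        facts = "\n".join(f"- {fact}" for fact in self.context)
        return (
            f"Verified context:\n{facts}\n\n"
            f"Deliverable: {self.deliverable}\n\n"
            f"Decision authority: {self.authority}\n\n"
            f"Uncertainty: {self.unsure_note}"
        )
```

The defaults are deliberately conservative: a brief that does not mention authority still says "report only, no edits" explicitly.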
Parallel forks for option-comparison
When facing a "which approach should we take" question with 2–4 candidate approaches, dispatching the candidates as parallel forks is high-leverage:
- They reason independently. No fork sees the others' work.
- Convergence is signal. If three forks at different effort tiers reach the same recommendation citing different evidence, that's a strong validation that doesn't depend on any one model's bias.
- Divergence is also signal. If one disagrees, read its reasoning carefully — it may have spotted something the others missed, or it may have a tier-specific weakness worth knowing.
Sample shape for an option-comparison call:
- Fork 1 (deep) — detailed runbook for option A, with timing/risk/rollback
- Fork 2 (balanced) — comparison table A vs B vs C across N axes, with a recommendation
- Fork 3 (fast) — focused sub-question (e.g., "which container image / library version / CLI flag")
This costs more than a single fork but the cross-validation is often worth it for decisions you'll execute on prod systems.
Boundary discipline (observed behavior)
Forks mostly honor explicit decision-authority instructions, but not infallibly. Observed pattern from real sessions:
- Pure analysis tasks (no write authority, "report only") — high compliance. Forks reliably return analysis without editing files or committing.
- Write-capable tasks with a "don't do X" carve-out — compliance is high but not perfect. Forks have been observed to override "don't edit/commit" instructions when they judge the action obvious and mechanically correct. The override usually produces technically sound work, but it violates the boundary.
Practical rules:
- State decision authority explicitly, every time, even when "report only" feels redundant.
- For high-stakes write authority, verify the fork's actions afterwards (`git status`, `git log -1`, file diffs) rather than assuming compliance.
- If a boundary violation is unacceptable (e.g., compliance review, sandboxed exploration, "don't touch prod"), do not give the fork write tools at all; keep it strictly in analysis mode.
- The fact that the fork was "right anyway" is not the same as the fork having followed instructions.
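The post-fork check can be sketched as a small audit over `git status --porcelain`. The helper names below are hypothetical, and the parser assumes the plain v1 porcelain format without quoted paths. It does not detect commits, so pair it with `git log -1`.

```python
import subprocess


def changed_paths(porcelain: str) -> list[str]:
    """Parse `git status --porcelain` (v1) output into the list of touched paths."""
    paths = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        status, path = line[:2], line[3:]
        # Renames and copies are reported as "old -> new"; keep the new name.
        if ("R" in status or "C" in status) and " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def audit_fork(repo: str, allowed_prefixes: tuple[str, ...] = ()) -> list[str]:
    """Return changed paths in `repo` that fall outside the allowed prefixes.

    With the default empty tuple, every change counts as a violation,
    which matches a "report only" brief.
    """
    out = subprocess.run(
        ["git", "-C", repo, "status", "--porcelain"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [p for p in changed_paths(out) if not p.startswith(allowed_prefixes)]
```

An empty result from `audit_fork` only says the working tree is clean relative to the allowed paths; it says nothing about whether the fork's reasoning was sound.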
Anti-patterns
- Forking trivial work. A fork has overhead. If the task takes < 30 seconds in your main thread, just do it.
- Vague briefs. "Look into the database thing" returns vague output. The fork is not telepathic.
- Forking iterative work. Forks are one-shot. If you need to iterate, you'll re-spec the task each time — usually worse than doing it yourself.
- Recursive forking (forks spawning forks). Disabled by default and should stay disabled unless you have a specific batch-fanout use case.
- Treating fork output as ground truth without verification. Especially for cited code/commit hashes/URLs — forks can hallucinate these like any LLM. Spot-check decisive evidence.
Part 2: pi-observational-memory
How it actually works
Observational memory (OM v3, "session-ledger" architecture) runs an observer agent in the background as your conversation grows. When token thresholds are crossed (defaults: observe at 10k, reflect at 20k, compact at 81k), the observer distills the recent transcript into:
- Observations: timestamped events, each with a 12-character hex ID like `[3682ebfad7af]`. Compact one-liners describing what happened in the conversation.
- Reflections: durable, long-lived facts about the user, project, decisions, and constraints. Some reflections include observation IDs as evidence pointers.
When compaction fires, the raw transcript is folded away and replaced with a structured summary block containing the observations + reflections. You — the next turn of the same agent — receive that summary block as your starting context. That's the recovery mechanism.
Storage is in-transcript, not on disk. Do not grep for `observations.jsonl` or similar files; you will not find them. The artifact lives in the model's input context window.
Configuration lives in `~/.pi/agent/settings.json` under `observational-memory`. Tune `observeAfterTokens`, `reflectAfterTokens`, `compactAfterTokens`, and `observationsPoolMaxTokens` if observations feel sparse or noisy. The default 81k compaction threshold is well-calibrated for typical multi-task sessions.
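To reason about how the thresholds interact, here is a simplified model. It assumes each threshold is a plain cumulative token count and that the three setting names map onto the defaults quoted above (10k / 20k / 81k); the extension's real accounting may differ, and `stages_due` is a hypothetical name.

```python
from typing import Optional

# Assumed pairing of setting names with the documented defaults.
DEFAULTS = {
    "observeAfterTokens": 10_000,
    "reflectAfterTokens": 20_000,
    "compactAfterTokens": 81_000,
}


def stages_due(tokens: int, settings: Optional[dict] = None) -> list[str]:
    """Return which OM stages have crossed their threshold at `tokens`."""
    cfg = {**DEFAULTS, **(settings or {})}
    order = [
        ("observe", "observeAfterTokens"),
        ("reflect", "reflectAfterTokens"),
        ("compact", "compactAfterTokens"),
    ]
    return [stage for stage, key in order if tokens >= cfg[key]]
```

Lowering `compactAfterTokens` in this model makes compaction fire earlier, which is the trade-off to weigh before tuning it in `settings.json`.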
The recall tool
`recall(<12-char-hex-id>)` resolves a specific observation or reflection ID back to the original source context: the exact bash output, file contents, tool call results, commit message, or transcript fragment that the observation was distilled from.
Use recall when:
- You are about to make a decision that depends materially on a compacted observation or reflection whose details are unclear.
- You need exact wording, paths, commands, errors, commits, or user constraints behind a remembered claim.
- A broad reflection is relevant but you need its supporting observations to act safely.
- The user asks "why do you believe X" or "what supports that memory".
Do not use recall for:
- Semantic search (it's keyed by ID, not topic — you must already have a specific 12-char hex ID).
- Browsing the transcript out of curiosity.
- Preemptive lookup of every ID in your context "just in case".
Recall costs tokens. Use it when exact source context will materially change your next action.
Calibration note (from a real ~1-month trial, 2026-05/06): across 20 logged container sessions, `recall` was invoked 0 times while obsmem passively carried 529 observations across 6 compactions. Zero recall is a warning sign, not a badge of efficiency: it means decisions after a compaction were made on the distilled one-liner alone, without ever re-checking the source. The injected summary is lossy by design. Default habit to adopt: when you are about to edit code, ship a change, or assert a fact that rests on a `[high]`/`[critical]` observation or a reflection you did not produce this turn, `recall` its ID first. One recall before a load-bearing action is cheap; redoing finished work or contradicting a prior correction is not.
Reading the compaction summary
When you see a block like `The conversation history before this point was compacted into the following summary:` at the start of a session or turn, that's OM output. Standard structure:
- Reflections at the top: stable facts. Some have IDs in brackets.
- Observations below, chronological: timestamped events with IDs in brackets and importance markers (`[high]`, `[critical]`, etc.).
When entries conflict, the most recent observation reflects the latest known state. Work that prior observations describe as completed should not be redone unless the user explicitly asks to revisit it.
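Picking out the IDs that deserve a `recall` before a load-bearing action can be sketched as below. The exact line layout of the summary block is an assumption (ID and importance marker on the same line), and `ids_worth_recalling` is a hypothetical helper, not part of OM.

```python
import re

# A 12-character lowercase hex ID in square brackets, e.g. [3682ebfad7af].
ID_RE = re.compile(r"\[([0-9a-f]{12})\]")
LOAD_BEARING = ("[high]", "[critical]")


def ids_worth_recalling(summary: str) -> list[str]:
    """Return IDs found on lines marked [high] or [critical], in order of appearance."""
    found = []
    for line in summary.splitlines():
        if any(marker in line for marker in LOAD_BEARING):
            for obs_id in ID_RE.findall(line):
                if obs_id not in found:
                    found.append(obs_id)
    return found
```

This narrows the candidates; it does not replace judgment. Recall only the IDs whose source would change your next action.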
Anti-patterns
- Treating compacted memory as definitive without recall when stakes are high. Compaction is lossy; the observation may have lost a constraint that was on the line above it in the original transcript.
- Recalling every ID preemptively. Wasteful. Recall on demand.
- Assuming the disk holds OM artifacts. It doesn't. Don't waste time looking.
- Ignoring the summary block when starting a session. It's there because the prior session was real work — read it before answering questions about past work.
Quick Reference
fork(task=..., effort=fast|balanced|deep)
- state decision authority explicitly
- pass verified context up front
- specify deliverable shape
- ask for "unsure about" section
recall(id=<12-char-hex>)
- only when stakes justify the cost
- id must already be visible in your context
- not a search tool
~/.pi/agent/settings.json
pi-fork.effortProfiles — model + thinking-depth per tier
pi-fork.defaultEffort — usually "balanced"
observational-memory.* — token thresholds, model, agentMaxTurns
observational-memory.debugLog: true — opt-in NDJSON telemetry at
~/.pi/agent/observational-memory/debug/<session>.ndjson (off by default)
Installing on a fresh machine (host)
These are git-sourced pi packages (pi-fork is not on npm). Add to the
packages array in ~/.pi/agent/settings.json, or:
pi install git:github.com/elpapi42/pi-fork
pi install git:github.com/elpapi42/pi-observational-memory # default branch: master (no main)
# obsmem is also published: pi install npm:pi-observational-memory
Restart pi after install. Enable observational-memory.debugLog if you want
the next window instrumented.
Evaluating usage
evaluate-extension-usage.py (bundled next to this skill) mines pi session
transcripts for fork/recall counts and obsmem compaction stats. Run it per
machine (transcripts live at ~/.pi/agent/sessions/) for a combined
host+container picture:
./evaluate-extension-usage.py # ~/.pi/agent/sessions
./evaluate-extension-usage.py /path/a /path/b # multiple roots
Part 3: ssh-controlmaster
What it does
When pi is launched with --ssh, this extension rewires pi's read, write, edit, and bash tools to execute on the remote machine, multiplexed over a single SSH ControlMaster socket. Pi is still running locally — the LLM, the UI, the MCP servers, the fork dispatcher all live on your local box — but anything those tools touch on the filesystem is the remote's filesystem.
This is fundamentally different from running pi locally and using bash to ssh inside it: with --ssh, the tool layer itself is remoted, so the LLM thinks it's working in the remote's cwd (the system prompt is rewritten to say so).
Usage
# Key-based auth (preferred), remote cwd defaults to remote $HOME
pi --ssh lagret
# Pin to a specific remote directory
pi --ssh lagret:/volume1/docker/portainer/compose/119
# Password auth (input is NOT masked when typing)
pi --ssh user@host --ssh-ask-pass
The `lagret` form requires a `Host lagret` block in `~/.ssh/config` or a resolvable hostname. The status bar shows `SSH ⚡ own master <host>:<cwd>` or `SSH ⚡ system master <host>:<cwd>` once connected.
How it cooperates with system SSH config
It reads the output of `ssh -G <host>` to learn the effective config, then:
| `~/.ssh/config` for the host | Behavior |
|---|---|
| `ControlMaster auto` or `yes` with a `ControlPath` | Reuses the system master socket. Does not tear it down on pi exit ("it was the system's to manage before pi arrived"). |
| No `ControlMaster` configured (or explicitly `no`) | Creates its own master at `/tmp/pi-cm-<pid>.sock` with `ControlPersist=yes`. Tears it down on pi `session_shutdown`. |
This means it composes cleanly with the system-wide `ssh-control-master-setup.sh` helper from the ci-release-watcher skill: if that script has already configured `~/.ssh/config` for the host, `pi --ssh` rides on the existing master rather than opening a parallel connection.
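The decision in the table can be approximated as follows. This is a sketch of the logic as described here, not the extension's source: the function names are hypothetical, and the accepted spellings (`yes`/`true`/`auto`) and the `none` sentinel for an unset `controlpath` are assumptions about what `ssh -G` prints.

```python
def parse_ssh_g(output: str) -> dict[str, str]:
    """Turn `ssh -G <host>` output (one "key value" pair per line) into a dict."""
    config = {}
    for line in output.splitlines():
        key, _, value = line.strip().partition(" ")
        if key:
            config[key.lower()] = value.strip()  # later duplicates overwrite earlier ones
    return config


def master_plan(config: dict[str, str]) -> str:
    """Return 'reuse-system' or 'own-master', following the table above."""
    mode = config.get("controlmaster", "no").lower()
    path = config.get("controlpath", "")
    if mode in {"auto", "yes", "true"} and path and path.lower() != "none":
        return "reuse-system"
    return "own-master"
```

To check a real host, feed it the text of `ssh -G <host>`, for example via `subprocess.run(["ssh", "-G", host], capture_output=True, text=True).stdout`.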
Caveats and edge cases
- Local vs remote tool boundary. Only `read`/`write`/`edit`/`bash` are remoted. MCP servers are still local: `mempalace` files drawers and diary entries against the local palace even when your shell work happens remotely. Same for `fork`, `recall`, `todo`, and any other custom tool. This is usually what you want (palace memory survives across remote sessions) but worth knowing.
- fork over ssh. Forks spawn locally and inherit the same `--ssh` mode by virtue of the parent's tool wiring; the fork's bash calls hit the same ControlMaster. Forks burn the same SSH socket, not a parallel one, so multiplexing wins again.
- macOS Unix socket path limit. The own-master socket lives at `/tmp/pi-cm-<pid>.sock` to stay under macOS's ~104-char limit. If you have a non-default `TMPDIR` long enough to blow this, ssh will fail to start the master.
- Password auth password visibility. From the source: "input is NOT masked — the password is visible while typing." The password is written to a chmod-700 SSH_ASKPASS script in `/tmp` and deleted after the master establishes; not persisted, but on-screen during entry.
- Remote bash environment. The remote shell is whatever `ssh user@host '<cmd>'` invokes, typically a non-login non-interactive bash. Don't expect `~/.bashrc` aliases or PATH manipulations from `~/.profile`. Pin tool paths or invoke via `bash -lc '...'` if you need login-shell behavior.
- Path translation is naive. The extension does `path.replace(localCwd, remoteCwd)` to translate paths in tool calls. If the LLM emits an absolute remote path that doesn't share the local-cwd prefix, the path is passed through unchanged. That is usually fine but pathological for paths that happen to contain the local-cwd substring.
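The path-translation caveat is easiest to see in a small model. The sketch below mimics a first-occurrence substring replace, which is how a JavaScript string-pattern `replace` behaves; the example paths are made up, and `translate_prefix_only` is a stricter variant shown for comparison, not what the extension does.

```python
def translate(path: str, local_cwd: str, remote_cwd: str) -> str:
    """Naive translation: swap the first occurrence of local_cwd, wherever it appears."""
    return path.replace(local_cwd, remote_cwd, 1)


def translate_prefix_only(path: str, local_cwd: str, remote_cwd: str) -> str:
    """Stricter variant (hypothetical): translate only when local_cwd is a true path prefix."""
    local = local_cwd.rstrip("/")
    if path == local or path.startswith(local + "/"):
        return remote_cwd.rstrip("/") + path[len(local):]
    return path


LOCAL, REMOTE = "/home/me/proj", "/srv/app"

print(translate("/home/me/proj/src/a.ts", LOCAL, REMOTE))   # /srv/app/src/a.ts
print(translate("/etc/hosts", LOCAL, REMOTE))               # /etc/hosts (passed through)
print(translate("/backup/home/me/proj/x", LOCAL, REMOTE))   # /backup/srv/app/x (pathological)
print(translate_prefix_only("/backup/home/me/proj/x", LOCAL, REMOTE))  # /backup/home/me/proj/x
```

The third call is the failure mode to watch for: a path that merely contains the local cwd gets rewritten in the middle.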
When to use it
- Editing configs on a NAS / homelab host without scp ping-pong (`pi --ssh lagret:/volume1/...`)
- Operating against a host whose tools/data you need but whose disk is too slow to mount via SSHFS
- Investigating runner state, container configs, etc., on a remote host as if local
- Multi-step remote work where opening a fresh ssh connection per step would burn your CGNAT flow budget
Anti-patterns
- Using `pi --ssh` for one-off shell work. Just `ssh` directly. The extension shines when there are dozens of tool calls per session.
- Filing palace drawers expecting them on the remote. They go to the local palace. If you want palace artifacts on the remote host, ssh into the remote and run pi there against its local palace.
- Forgetting `--ssh` in followup sessions. Status bar is the canary: if you don't see `SSH ⚡` you're operating locally despite intending remote. Easy mistake on a fresh terminal.
Reaching the devbox host from inside the container (dssh / dscp)
Distinct from `pi --ssh` above. When the pi-devbox container runs under OrbStack / Docker Desktop on macOS, it can SSH back to its own host. The entrypoint's `setup-lan-access.sh` regenerates `~/.ssh-local/config` on every container start (the in-container `~/.ssh` is mounted read-only, so a sidecar config + known_hosts + ControlPath under `~/.ssh-local/` is used instead).
# Interactive shells get aliases (from ~/.bash_aliases):
dssh host 'cmd' # = ssh -F ~/.ssh-local/config host
dscp file host:/path # = scp -F ~/.ssh-local/config ...
The agent's bash tool is non-interactive — those aliases are NOT loaded. Use the explicit form:
ssh -F ~/.ssh-local/config host 'cmd'
scp -F ~/.ssh-local/config <src> host:<dst>
- Host aliases `host` and `mac` both resolve to `host.docker.internal` (user varies per host machine; check `~/.ssh-local/config` for the active `User` value, key `~/.ssh-local/devbox_jump_ed25519`, `ControlMaster auto` / `ControlPersist 4h`).
- The config chains `Include ~/.config/devbox-shell/ssh-lan.conf` then `Include ~/.ssh/config`, so LAN targets are reachable too (add `ProxyJump host` to those entries).
- Use it for: enabling/inspecting the host's pi config (`~/.pi/agent/settings.json`), running `evaluate-extension-usage.py` against the host's `~/.pi/agent/sessions/` for a combined host+container metric, or copying host transcripts into the container. The host's pi runs natively there; its palace, sessions, and extensions are separate from the container's.
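If you drive these commands from a script rather than typing them, the explicit form can be wrapped so the `-F` flag is never forgotten. A minimal sketch; `dssh_argv` and `dscp_argv` are hypothetical names that only mirror the aliases.

```python
import os

# Sidecar config regenerated by the entrypoint on every container start.
SSH_CONFIG = os.path.expanduser("~/.ssh-local/config")


def dssh_argv(command: str, host: str = "host") -> list[str]:
    """Build the explicit equivalent of `dssh host 'cmd'`."""
    return ["ssh", "-F", SSH_CONFIG, host, command]


def dscp_argv(src: str, dst: str, host: str = "host") -> list[str]:
    """Build the explicit equivalent of `dscp file host:/path`."""
    return ["scp", "-F", SSH_CONFIG, src, f"{host}:{dst}"]
```

Pass the result to `subprocess.run(argv, check=True)`. Building an argv list avoids a local shell, so the remote command string reaches `ssh` as a single argument.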
Cross-Skill Notes
- mempalace is for cross-session persistent memory (diary, knowledge graph, drawer storage). OM is for within-session context survival across compaction. They complement each other: write a diary entry at session end and let OM compact your work-in-progress mid-session.
- systematic-debugging and test-driven-development skills pair well with deep-tier forks: a deep fork can carry out a focused debugging investigation or write a failing test suite without polluting your main context.
- ci-release-watcher ships a `scripts/ssh-control-master-setup.sh` helper that configures system-wide SSH ControlMaster in `~/.ssh/config`. That's a separate mechanism from the `ssh-controlmaster` pi extension: they compose, they don't overlap. Use the script for persistent host-wide multiplexing, the extension for per-pi-session remote operation.