feat(skills): bake pi-extensions + mempalace fallback skills

The pi-toolkit global AGENTS.md tells every pi session to read ~/.agents/skills/pi-extensions/SKILL.md at start (the fork/recall under-utilisation fix), but that skill lived only in the private skillset repo — so the pointer dangled in any container started without skillset mounted. Bake fallbacks so the pointer always resolves. - pi-extensions (Option 1 + Option 2, layered): * Canonical skill promoted to the public pi-extensions package repo under skill/ (separate commit there); co-located with the code it documents. * rootfs/ carries a committed snapshot (the floor). * Dockerfile.variant copies /opt/pi-extensions/skill/ over the snapshot after the pinned clone, so a normal build ships the fresh package copy (recorded via PI_EXTENSIONS_REF) and an old-ref/mirror build still ships the snapshot. Helper evaluate-extension-usage.py travels with it. - mempalace (Option 2 only): snapshot in rootfs/. Its consumer skill has no public package home (mempalace-toolkit ships a different skill, opencode-mempalace-bridge), so no build-time refresh. - entrypoint links both (only-when-absent; mounted skillset still wins). - smoke-test: build-time presence + package-match check + runtime symlink assertions; readiness gate now waits on the last-linked skill. - docs: skills/VENDORED.md (provenance + refresh), README, AGENTS.md, CHANGELOG [Unreleased]. Note: shipped in the NEXT release; v1.2.0 (run 409) predates this.
2026-06-23 15:32:04 +02:00
parent d619a6e2ec
commit a7d6a7d235
9 changed files with 848 additions and 4 deletions
@@ -0,0 +1,298 @@
+---
+name: pi-extensions
+description: >-
+  Use the pi extensions (pi-fork, pi-observational-memory, ssh-controlmaster) effectively in the pi coding agent harness. Load this skill only when running inside pi (detection - `fork` and `recall` are present in your tool list, or `pi --ssh` was used to start the session). pi-fork dispatches focused subtasks to forked agents at fast/balanced/deep effort tiers; pi-observational-memory compacts long sessions into recallable observations + reflections; ssh-controlmaster rewires pi's read/write/edit/bash tools to execute on a remote host over a multiplexed SSH connection. This skill covers tier selection, task design, boundary discipline, when to use recall, and remote-pi mechanics.
+---
+
+# Pi Extensions: pi-fork, pi-observational-memory, ssh-controlmaster
+
+## When to Load This Skill
+
+Load only when **both** of these are true:
+
+1. You are running inside the **pi coding agent harness** (not Claude Code, not opencode, not any other harness).
+2. The `fork` and/or `recall` tools appear in your available tool list, **or** the session was started with `pi --ssh ...`.
+
+If you do not see those tools, this skill does not apply — skip it. Other harnesses do not have these extensions and the patterns below will not work there.
+
+This skill is most useful at the start of any non-trivial session where you may need to dispatch parallel subtasks, where the conversation is likely to compact (sessions running > ~80k tokens), or where pi is operating against a remote host.
+
+## Pi extension landscape (where the wiring lives)
+
+Pi has **two distinct extension locations** and it's easy to look in the wrong one:
+
+| Location | Mechanism | Examples |
+|---|---|---|
+| `~/.pi/agent/extensions/*.ts` (or `.ts.off`) | **Local extensions** — TypeScript files, usually symlinks into `/opt/pi-extensions/extensions/` or similar. Toggled via `/ext` slash command. | `ssh-controlmaster`, `git-checkpoint`, `notify`, `todo`, `mempalace`, `mcp-loader`, `ext-toggle`, `confirm-destructive` |
+| `~/.pi/agent/git/<host>/<owner>/<repo>/` | **Package extensions** — git-cloned npm packages registered via the `packages` array in `~/.pi/agent/settings.json`. | `pi-fork` (`github.com/elpapi42/pi-fork`), `pi-observational-memory` (`github.com/elpapi42/pi-observational-memory`, **default branch `master`** — a `main` branch does not exist, so `pi install git:...` resolves against `master`) |
+
+When the user asks how to use "the X extension", **check both locations** — `find ~/.pi/agent -maxdepth 4 -name "*X*"` covers both. The `/ext` slash command shows the local-extensions list with enable/disable state. There is also a distinct skill-bundled-script category (e.g. `ci-release-watcher`'s `ssh-control-master-setup.sh`) which is **not** a pi extension at all — it's a helper script inside a skill. Don't conflate the three.
+
+## Why These Extensions Belong Together
+
+pi-fork and pi-observational-memory are symbiotic. **pi-fork burns context** (each fork dispatches a focused subtask whose detailed exploration would otherwise pollute your main thread). **pi-observational-memory preserves context** (when the main thread eventually compacts, observations + reflections survive the fold and can be recalled by ID). Aggressive forking only works long-term if the surviving summary is high-fidelity, and OM only earns its keep when it's preserving genuinely valuable distilled work.
+
+ssh-controlmaster is orthogonal but composes cleanly: when pi is operating remotely, fork still spawns local sub-agents (each fork *itself* doesn't ssh), but their `bash`/`read`/`write`/`edit` calls do — see Part 3 caveats.
+
+---
+
+## Part 1: pi-fork
+
+### Effort tier mapping
+
+Configured in `~/.pi/agent/settings.json` under `pi-fork.effortProfiles`. The conventional mapping is:
+
+| Tier | Model | Use for |
+|---|---|---|
+| `fast` | haiku | mechanical edits, narrow lookups, file-listing, single-fact verification, simple syntactic checks |
+| `balanced` | sonnet (default) | normal exploration, implementation, testing, code review, option analysis |
+| `deep` | opus | architecture decisions, security analysis, concurrency reasoning, ambiguous debugging, high-risk reviews, runbook drafting where subtle mistakes are costly |
+
+**Rule of thumb:** start at `balanced` unless you have a specific reason to go up or down. Going too cheap on a deep task wastes a fork; going too expensive on a mechanical task is just slow.
+
+### When to fork vs. do it yourself
+
+Fork when **any** of:
+- The task requires reading many files whose contents you don't need to keep in your main context afterwards (the fork returns a dense summary; raw file contents stay in the fork's context and are discarded).
+- You want to run multiple analyses in **parallel** (especially: comparing N options, where independent reasoning is itself a signal — see "parallel forks" below).
+- The task is well-scoped enough to specify completely up front and well-bounded enough that returning a dense report is more useful than continuing the dialogue.
+- You are about to do something that would burn a lot of tokens on tool calls (long file reads, many bash invocations) whose output you will mostly discard.
+
+Don't fork when:
+- The work fits in your current context budget without crowding out what comes next.
+- The task is exploratory and you'll need to iterate based on what you find (forking turns iteration into round-trips with full task-spec rewrites).
+- You need to make decisions during the work that depend on context only the main thread has.
+
+### Task design: the four things a fork brief must contain
+
+1. **Verified context up front.** Do not say "go look at the codebase and figure out X". Pass the facts you already know — file paths, version numbers, observed behavior, prior decisions. The fork should be reasoning *from* context, not *finding* context. Discovery work costs the fork tokens that don't come back to you.
+2. **A specific deliverable.** "Analyze X" is too vague. "Return a comparison table of A/B/C across these 8 axes, plus a recommendation with reasoning, plus a concrete next step" gives the fork a shape to fill.
+3. **Decision authority.** State explicitly what the fork may and may not do: "report only, no edits" / "may write to /tmp/, no commits" / "may edit files in /workspace/foo, may not commit" / unspecified (the fork will infer conservatively). **State this even when it seems obvious.** See "Boundary discipline" below.
+4. **What "unsure" looks like.** Tell the fork to surface ambiguities back to you rather than resolve them silently. "Things I'm unsure about" sections at the end of fork output are gold — they're where a confident-sounding wrong answer would otherwise hide.
+
+### Parallel forks for option-comparison
+
+When facing a "which approach should we take" question with 2–4 candidate approaches, dispatching the candidates as parallel forks is high-leverage:
+
+- They reason **independently**. No fork sees the others' work.
+- **Convergence is signal.** If three forks at different effort tiers reach the same recommendation citing different evidence, that's a strong validation that doesn't depend on any one model's bias.
+- **Divergence is also signal.** If one disagrees, read its reasoning carefully — it may have spotted something the others missed, or it may have a tier-specific weakness worth knowing.
+
+Sample shape for an option-comparison call:
+- Fork 1 (deep) — detailed runbook for option A, with timing/risk/rollback
+- Fork 2 (balanced) — comparison table A vs B vs C across N axes, with a recommendation
+- Fork 3 (fast) — focused sub-question (e.g., "which container image / library version / CLI flag")
+
+This costs more than a single fork but the cross-validation is often worth it for decisions you'll execute on prod systems.
+
+### Boundary discipline (observed behavior)
+
+Forks **mostly** honor explicit decision-authority instructions, but not infallibly. Observed pattern from real sessions:
+
+- **Pure analysis tasks** (no write authority, "report only") — high compliance. Forks reliably return analysis without editing files or committing.
+- **Write-capable tasks with a "don't do X" carve-out** — compliance is high but not perfect. Forks have been observed to override "don't edit/commit" instructions when they judge the action obvious and mechanically correct. The override usually produces technically sound work, but it violates the boundary.
+
+**Practical rules:**
+- State decision authority explicitly, every time, even when "report only" feels redundant.
+- For high-stakes write authority, verify the fork's actions afterwards (`git status`, `git log -1`, file diffs) rather than assuming compliance.
+- If a boundary violation is unacceptable (e.g., compliance review, sandboxed exploration, "don't touch prod"), do not give the fork write tools at all — keep it strictly in analysis mode.
+- The fact that the fork was "right anyway" is not the same as the fork having followed instructions.
+
+### Anti-patterns
+
+- **Forking trivial work.** A fork has overhead. If the task takes < 30 seconds in your main thread, just do it.
+- **Vague briefs.** "Look into the database thing" returns vague output. The fork is not telepathic.
+- **Forking iterative work.** Forks are one-shot. If you need to iterate, you'll re-spec the task each time — usually worse than doing it yourself.
+- **Recursive forking** (forks spawning forks). Disabled by default and should stay disabled unless you have a specific batch-fanout use case.
+- **Treating fork output as ground truth without verification.** Especially for cited code/commit hashes/URLs — forks can hallucinate these like any LLM. Spot-check decisive evidence.
+
+---
+
+## Part 2: pi-observational-memory
+
+### How it actually works
+
+Observational memory (OM v3, "session-ledger" architecture) runs an **observer agent** in the background as your conversation grows. When token thresholds are crossed (defaults: observe at 10k, reflect at 20k, compact at 81k), the observer distills the recent transcript into:
+
+- **Observations** — timestamped events, each with a 12-character hex ID like `[3682ebfad7af]`. Compact one-liners describing what happened in the conversation.
+- **Reflections** — durable, long-lived facts about the user, project, decisions, and constraints. Some reflections include observation IDs as evidence pointers.
+
+When compaction fires, the raw transcript is folded away and replaced with a structured summary block containing the observations + reflections. **You — the next turn of the same agent — receive that summary block as your starting context.** That's the recovery mechanism.
+
+**Storage is in-transcript, not on disk.** Do not grep for `observations.jsonl` or similar files; you will not find them. The artifact lives in the model's input context window.
+
+Configuration lives in `~/.pi/agent/settings.json` under `observational-memory`. Tune `observeAfterTokens`, `reflectAfterTokens`, `compactAfterTokens`, and `observationsPoolMaxTokens` if observations feel sparse or noisy. The default 81k compaction threshold is well-calibrated for typical multi-task sessions.
+
+### The `recall` tool
+
+`recall(<12-char-hex-id>)` resolves a specific observation or reflection ID back to the original source context — the exact bash output, file contents, tool call results, commit message, or transcript fragment that the observation was distilled from.
+
+**Use recall when:**
+- You are about to make a decision that depends materially on a compacted observation or reflection whose details are unclear.
+- You need exact wording, paths, commands, errors, commits, or user constraints behind a remembered claim.
+- A broad reflection is relevant but you need its supporting observations to act safely.
+- The user asks "why do you believe X" or "what supports that memory".
+
+**Do not use recall for:**
+- Semantic search (it's keyed by ID, not topic — you must already have a specific 12-char hex ID).
+- Browsing the transcript out of curiosity.
+- Preemptive lookup of every ID in your context "just in case".
+
+Recall costs tokens. Use it when exact source context will materially change your next action.
+
+> **Calibration note (from a real ~1-month trial, 2026-05/06):** across 20 logged container sessions, `recall` was invoked **0 times** while obsmem passively carried 529 observations across 6 compactions. Zero recall is a *warning sign*, not a badge of efficiency — it means decisions after a compaction were made on the distilled one-liner alone, without ever re-checking the source. The injected summary is **lossy by design**. Default habit to adopt: when you are about to **edit code, ship a change, or assert a fact** that rests on a `[high]`/`[critical]` observation or a reflection you did not produce *this* turn, `recall` its ID **first**. One recall before a load-bearing action is cheap; redoing finished work or contradicting a prior correction is not.
+
+### Reading the compaction summary
+
+When you see a block like `The conversation history before this point was compacted into the following summary:` at the start of a session or turn, that's OM output. Standard structure:
+
+- **Reflections** at the top: stable facts. Some have IDs in brackets.
+- **Observations** below, chronological: timestamped events with IDs in brackets and importance markers (`[high]`, `[critical]`, etc.).
+
+When entries conflict, **the most recent observation reflects the latest known state.** Work that prior observations describe as completed should not be redone unless the user explicitly asks to revisit it.
+
+### Anti-patterns
+
+- **Treating compacted memory as definitive without recall** when stakes are high. Compaction is lossy; the observation may have lost a constraint that was on the line above it in the original transcript.
+- **Recalling every ID preemptively.** Wasteful. Recall on demand.
+- **Assuming the disk holds OM artifacts.** It doesn't. Don't waste time looking.
+- **Ignoring the summary block** when starting a session. It's there because the prior session was real work — read it before answering questions about past work.
+
+---
+
+## Quick Reference
+
+```
+fork(task=..., effort=fast|balanced|deep)
+  - state decision authority explicitly
+  - pass verified context up front
+  - specify deliverable shape
+  - ask for "unsure about" section
+
+recall(id=<12-char-hex>)
+  - only when stakes justify the cost
+  - id must already be visible in your context
+  - not a search tool
+```
+
+```
+~/.pi/agent/settings.json
+  pi-fork.effortProfiles    — model + thinking-depth per tier
+  pi-fork.defaultEffort     — usually "balanced"
+  observational-memory.*    — token thresholds, model, agentMaxTurns
+  observational-memory.debugLog: true  — opt-in NDJSON telemetry at
+    ~/.pi/agent/observational-memory/debug/<session>.ndjson (off by default)
+```
+
+### Installing on a fresh machine (host)
+
+These are git-sourced pi packages (pi-fork is **not** on npm). Add to the
+`packages` array in `~/.pi/agent/settings.json`, or:
+
+```
+pi install git:github.com/elpapi42/pi-fork
+pi install git:github.com/elpapi42/pi-observational-memory   # default branch: master (no main)
+# obsmem is also published: pi install npm:pi-observational-memory
+```
+
+Restart pi after install. Enable `observational-memory.debugLog` if you want
+the next window instrumented.
+
+### Evaluating usage
+
+`evaluate-extension-usage.py` (bundled next to this skill) mines pi session
+transcripts for fork/recall counts and obsmem compaction stats. Run it per
+machine (transcripts live at `~/.pi/agent/sessions/`) for a combined
+host+container picture:
+
+```
+./evaluate-extension-usage.py                  # ~/.pi/agent/sessions
+./evaluate-extension-usage.py /path/a /path/b  # multiple roots
+```
+
+---
+
+## Part 3: ssh-controlmaster
+
+### What it does
+
+When pi is launched with `--ssh`, this extension **rewires pi's `read`, `write`, `edit`, and `bash` tools to execute on the remote machine**, multiplexed over a single SSH ControlMaster socket. Pi is still running locally — the LLM, the UI, the MCP servers, the fork dispatcher all live on your local box — but anything those tools touch on the filesystem is the *remote's* filesystem.
+
+This is fundamentally different from running pi locally and using `bash` to ssh inside it: with `--ssh`, the tool layer itself is remoted, so the LLM thinks it's working in the remote's `cwd` (the system prompt is rewritten to say so).
+
+### Usage
+
+```bash
+# Key-based auth (preferred), remote cwd defaults to remote $HOME
+pi --ssh lagret
+
+# Pin to a specific remote directory
+pi --ssh lagret:/volume1/docker/portainer/compose/119
+
+# Password auth (input is NOT masked when typing)
+pi --ssh user@host --ssh-ask-pass
+```
+
+The `lagret` form requires a `Host lagret` block in `~/.ssh/config` or a resolvable hostname. The status bar shows `SSH ⚡ own master <host>:<cwd>` or `SSH ⚡ system master <host>:<cwd>` once connected.
+
+### How it cooperates with system SSH config
+
+It reads `ssh -G <host>` to learn the effective config, then:
+
+| `~/.ssh/config` for the host | Behavior |
+|---|---|
+| `ControlMaster auto` or `yes` with a `ControlPath` | Reuses the system master socket. Does **not** tear it down on pi exit ("it was the system's to manage before pi arrived"). |
+| No ControlMaster configured (or explicitly `no`) | Creates its own master at `/tmp/pi-cm-<pid>.sock` with `ControlPersist=yes`. Tears it down on pi `session_shutdown`. |
+
+This means it composes cleanly with the system-wide `ssh-control-master-setup.sh` helper from the `ci-release-watcher` skill: if that script has already configured `~/.ssh/config` for the host, `pi --ssh` rides on the existing master rather than opening a parallel connection.
+
+### Caveats and edge cases
+
+- **Local vs remote tool boundary.** Only `read`/`write`/`edit`/`bash` are remoted. **MCP servers are still local** — `mempalace` files drawers and diary entries against the local palace even when your shell work happens remotely. Same for `fork`, `recall`, `todo`, and any other custom tool. This is usually what you want (palace memory survives across remote sessions) but worth knowing.
+- **fork over ssh.** Forks spawn locally and inherit the same `--ssh` mode by virtue of the parent's tool wiring; the fork's bash calls hit the same ControlMaster. Forks burn the same SSH socket, not a parallel one — multiplexing wins again.
+- **macOS Unix socket path limit.** The own-master socket lives at `/tmp/pi-cm-<pid>.sock` to stay under macOS's ~104-char limit. If you have a non-default `TMPDIR` long enough to blow this, ssh will fail to start the master.
+- **Password auth password visibility.** From the source: *"input is NOT masked — the password is visible while typing."* The password is written to a chmod-700 SSH_ASKPASS script in `/tmp` and deleted after the master establishes; not persisted, but on-screen during entry.
+- **Remote bash environment.** The remote shell is whatever `ssh user@host '<cmd>'` invokes — typically a non-login non-interactive bash. Don't expect `~/.bashrc` aliases or PATH manipulations from `~/.profile`. Pin tool paths or invoke via `bash -lc '...'` if you need login-shell behavior.
+- **Path translation is naive.** The extension does `path.replace(localCwd, remoteCwd)` to translate paths in tool calls. If the LLM emits an absolute remote path that doesn't share the local-cwd prefix, the path is passed through unchanged — usually fine but pathological for paths that happen to contain the local-cwd substring.
+
+### When to use it
+
+- Editing configs on a NAS / homelab host without scp ping-pong (`pi --ssh lagret:/volume1/...`)
+- Operating against a host whose tools/data you need but whose disk is too slow to mount via SSHFS
+- Investigating runner state, container configs, etc., on a remote host as if local
+- Multi-step remote work where opening a fresh ssh connection per step would burn your CGNAT flow budget
+
+### Anti-patterns
+
+- **Using `pi --ssh` for one-off shell work.** Just `ssh` directly. The extension shines when there are dozens of tool calls per session.
+- **Filing palace drawers expecting them on the remote.** They go to the local palace. If you want palace artifacts on the remote host, ssh into the remote and run pi *there* against its local palace.
+- **Forgetting `--ssh` in followup sessions.** Status bar is the canary — if you don't see `SSH ⚡` you're operating locally despite intending remote. Easy mistake on a fresh terminal.
+
+### Reaching the devbox host from inside the container (`dssh` / `dscp`)
+
+Distinct from `pi --ssh` above. When the **pi-devbox container** runs under OrbStack / Docker Desktop on macOS, it can SSH back to its own host. The entrypoint's `setup-lan-access.sh` regenerates `~/.ssh-local/config` on **every container start** (the in-container `~/.ssh` is mounted read-only, so a sidecar config + `known_hosts` + `ControlPath` under `~/.ssh-local/` is used instead).
+
+```bash
+# Interactive shells get aliases (from ~/.bash_aliases):
+dssh host 'cmd'        # = ssh -F ~/.ssh-local/config host
+dscp file host:/path   # = scp -F ~/.ssh-local/config ...
+```
+
+**The agent's `bash` tool is non-interactive — those aliases are NOT loaded.** Use the explicit form:
+
+```bash
+ssh  -F ~/.ssh-local/config host 'cmd'
+scp  -F ~/.ssh-local/config <src> host:<dst>
+```
+
+- Host aliases `host` and `mac` both resolve to `host.docker.internal` (user varies per host machine — check `~/.ssh-local/config` for the active `User` value, key `~/.ssh-local/devbox_jump_ed25519`, `ControlMaster auto` / `ControlPersist 4h`).
+- The config chains `Include ~/.config/devbox-shell/ssh-lan.conf` then `Include ~/.ssh/config`, so LAN targets are reachable too (add `ProxyJump host` to those entries).
+- **Use it for:** enabling/inspecting the host's pi config (`~/.pi/agent/settings.json`), running `evaluate-extension-usage.py` against the host's `~/.pi/agent/sessions/` for a combined host+container metric, or copying host transcripts into the container. The host's pi runs natively there; its palace, sessions, and extensions are separate from the container's.
+
+---
+
+## Cross-Skill Notes
+
+- **mempalace** is for cross-session persistent memory (diary, knowledge graph, drawer storage). OM is for **within-session** context survival across compaction. They complement each other: write a diary entry at session end *and* let OM compact your work-in-progress mid-session.
+- **systematic-debugging** and **test-driven-development** skills pair well with deep-tier forks: a deep fork can carry out a focused debugging investigation or write a failing test suite without polluting your main context.
+- **ci-release-watcher** ships a `scripts/ssh-control-master-setup.sh` helper that configures system-wide SSH ControlMaster in `~/.ssh/config`. That's a separate mechanism from the `ssh-controlmaster` pi extension — they compose, they don't overlap. Use the script for persistent host-wide multiplexing, the extension for per-pi-session remote operation.