Compare commits


3 Commits

Author SHA1 Message Date
Joakim Persson 13e67599c4 release: v1.2.1 — fallback skills + mempalace directive
Bake pi-extensions + mempalace skills into the image (available without a
mounted skillset) and add the mempalace session-start proactive-load directive
so frequently-recreated containers actually pick the skill up. Closes the
fork/recall + mempalace under-utilisation gap.

CHANGELOG: [Unreleased] -> v1.2.1.
2026-06-23 16:02:57 +02:00
Joakim Persson 7551947466 feat(skills): add mempalace proactive-load directive for containers
Baking the mempalace fallback skill fixed *availability*, but mempalace had
no proactive-load directive anywhere (pi-toolkit's global AGENTS.md only
points to pi-extensions), so a new container would still surface it only via
description-matching — the same under-utilisation the pi-extensions directive
was created to fix.

Add a session-start pointer to the pi-devbox managed AGENTS.md block
(pi-global-AGENTS.append.md): gated to pi-devbox containers and conditional on
the MemPalace MCP tools being present. Memory continuity matters most in a
frequently-recreated container — the palace is its only cross-recreate memory.

- pi-global-AGENTS.append.md: '## Session start: load the mempalace skill'.
- smoke-test: assert the pointer merges into the global AGENTS.md at build.
- docs: VENDORED.md, README, CHANGELOG [Unreleased].

Now both skills are complete in pi-devbox: directive + skill file.
pi-extensions = directive (pi-toolkit) + baked skill; mempalace = directive
(this block) + baked skill.
2026-06-23 15:54:13 +02:00
Joakim Persson a7d6a7d235 feat(skills): bake pi-extensions + mempalace fallback skills
The pi-toolkit global AGENTS.md tells every pi session to read
~/.agents/skills/pi-extensions/SKILL.md at start (the fork/recall
under-utilisation fix), but that skill lived only in the private skillset
repo — so the pointer dangled in any container started without skillset
mounted. Bake fallbacks so the pointer always resolves.

- pi-extensions (Option 1 + Option 2, layered):
  * Canonical skill promoted to the public pi-extensions package repo under
    skill/ (separate commit there); co-located with the code it documents.
  * rootfs/ carries a committed snapshot (the floor).
  * Dockerfile.variant copies /opt/pi-extensions/skill/ over the snapshot
    after the pinned clone, so a normal build ships the fresh package copy
    (recorded via PI_EXTENSIONS_REF) and an old-ref/mirror build still ships
    the snapshot. Helper evaluate-extension-usage.py travels with it.
- mempalace (Option 2 only): snapshot in rootfs/. Its consumer skill has no
  public package home (mempalace-toolkit ships a different skill,
  opencode-mempalace-bridge), so no build-time refresh.
- entrypoint links both (only-when-absent; mounted skillset still wins).
- smoke-test: build-time presence + package-match check + runtime symlink
  assertions; readiness gate now waits on the last-linked skill.
- docs: skills/VENDORED.md (provenance + refresh), README, AGENTS.md,
  CHANGELOG [Unreleased].

Note: shipped in the NEXT release; v1.2.0 (run 409) predates this.
2026-06-23 15:32:04 +02:00
10 changed files with 884 additions and 4 deletions
+8 -3
@@ -17,7 +17,10 @@ re-brand of opencode-devbox's `pi-only` variant.
(`-studio` variant). Also appends the pi-devbox managed block from
`pi-global-AGENTS.append.md` onto pi-toolkit's `pi-global-AGENTS.md` (the
single global instruction slot pi loads) so containers proactively load the
-baked `pi-devbox-environment` skill. Idempotent via a marker grep.
+baked `pi-devbox-environment` skill. Idempotent via a marker grep. After the
+pinned clones it also refreshes the vendored `pi-extensions` fallback skill
+by copying `/opt/pi-extensions/skill/` over the committed `rootfs/` snapshot
+(Option 1 over Option 2 — see `skills/VENDORED.md`).
- `entrypoint.sh` — UID/GID alignment as root, then drops to `developer`.
- `entrypoint-user.sh` — per-container start: SSH ControlMaster socket
dir, LAN-access setup, MemPalace init, pi-toolkit + pi-extensions
@@ -27,8 +30,10 @@ re-brand of opencode-devbox's `pi-only` variant.
- `rootfs/` — files baked into the image (bash aliases, inputrc,
setup-lan-access.sh, `studio-expose` helper). Also
`usr/local/share/pi-devbox/skills/<name>/SKILL.md` — image-baked agent
-skills (e.g. `pi-devbox-environment`) symlinked into `~/.agents/skills/` by
-the entrypoint, available with or without a mounted skillset — plus
+skills (the repo-authored `pi-devbox-environment`, plus vendored fallback
+copies of `pi-extensions` and `mempalace` — see `skills/VENDORED.md`)
+symlinked into `~/.agents/skills/` by the entrypoint, available with or
+without a mounted skillset — plus
`usr/local/share/pi-devbox/pi-global-AGENTS.append.md` (the global-AGENTS
pointer concatenated in `Dockerfile.variant`).
- `scripts/smoke-test.sh` — sanity checks run by CI before pushing to Hub.
+43
@@ -11,6 +11,49 @@ Pre-v1.0.0 tags followed the pi npm version (`v{pi_version}[letter]`).
---
## v1.2.1 — 2026-06-22
Patch release: close the fork/recall + mempalace **under-utilisation gap** in
containers started without the private `skillset` repo — bake the
`pi-extensions` and `mempalace` skills into the image and add the missing
mempalace session-start directive. pi version is re-resolved from npm `latest`
at build.
### Added
- **Vendored fallback skills: `pi-extensions` + `mempalace`.** The pi-toolkit
global `AGENTS.md` directs every pi session to read
`~/.agents/skills/pi-extensions/SKILL.md` at start (the fix for fork/recall
under-utilisation). That pointer dangled in a container started **without**
the private `skillset` repo mounted. The image now bakes fallback copies of
both skills under `/usr/local/share/pi-devbox/skills/`, symlinked in by
`entrypoint-user.sh` (only when absent, so a mounted skillset still wins).
- **Proactive-load directive for `mempalace`.** Baking the skill only fixes
*availability*; nothing in pi-toolkit's global `AGENTS.md` told sessions to
load it, so it would still surface only via description-matching. The
pi-devbox managed block (`pi-global-AGENTS.append.md`) now adds a
session-start pointer (gated to pi-devbox containers, conditional on the
MemPalace MCP tools being present) so a new container actually picks the
skill up — memory continuity matters most in a frequently-recreated
container. (`pi-extensions`'s directive already ships in pi-toolkit, so only
its skill file needed baking.)
- **Layered freshness for the `pi-extensions` skill (Option 1 + Option 2).**
The canonical skill was promoted into the **public `pi-extensions` package
repo** under `skill/` (co-located with the extensions it documents). A
committed snapshot in `rootfs/` is the *floor*; `Dockerfile.variant` copies
`/opt/pi-extensions/skill/` (the pinned, manifest-recorded clone) over it at
build, so a normal build ships the fresh package copy and an old-ref/mirror
build still ships the snapshot. `mempalace` is snapshot-only (its consumer
skill has no public package home — the `mempalace-toolkit` repo ships a
*different* skill, `opencode-mempalace-bridge`). Provenance + refresh steps:
`rootfs/usr/local/share/pi-devbox/skills/VENDORED.md`.
- **Smoke-test coverage** for the fallback skills: build-time presence of both
`SKILL.md`s and the `pi-extensions` helper, a check that the baked
`pi-extensions` skill matches the package copy when the clone carries it, and
runtime assertions that both are symlinked into `~/.agents/skills/`.
---
## v1.2.0 — 2026-06-22
Minor release: **image-baked agent skills** — a new base mechanism that ships
+22
@@ -94,6 +94,28 @@ RUN set -e && \
echo "pi-fork at $(cd /opt/pi-fork && git rev-parse --short HEAD)" && \
echo "pi-observational-memory at $(cd /opt/pi-observational-memory && git rev-parse --short HEAD)"
# ── Image-baked skill refresh: pi-extensions (Option 1 over Option 2) ──
# rootfs ships a VENDORED snapshot of the pi-extensions skill at
# /usr/local/share/pi-devbox/skills/pi-extensions/ (the "floor" — guarantees the
# skill is always in the image). The pi-extensions PACKAGE repo now co-locates
# the canonical skill under skill/, so here — after the pinned clone — we copy
# that over the snapshot. Result: a normal build ships the fresh, package-owned
# copy (pinned + recorded in the manifest via PI_EXTENSIONS_REF); a build whose
# ref predates the skill, or a fork pointing at a mirror without it, still ships
# the committed snapshot. The skill calls ./evaluate-extension-usage.py, so it
# is copied alongside. Idempotent and cache-safe (depends only on the clone).
RUN if [ -f /opt/pi-extensions/skill/SKILL.md ]; then \
cp /opt/pi-extensions/skill/SKILL.md \
/usr/local/share/pi-devbox/skills/pi-extensions/SKILL.md && \
if [ -f /opt/pi-extensions/skill/evaluate-extension-usage.py ]; then \
cp /opt/pi-extensions/skill/evaluate-extension-usage.py \
/usr/local/share/pi-devbox/skills/pi-extensions/evaluate-extension-usage.py ; \
fi && \
echo "refreshed pi-extensions skill from package @ $(cd /opt/pi-extensions && git rev-parse --short HEAD)" ; \
else \
echo "pi-extensions package has no skill/ at this ref — keeping vendored snapshot" ; \
fi
# ── pi-devbox awareness: append our pointer to the global AGENTS.md ──
# pi loads a SINGLE global instruction file (~/.pi/agent/AGENTS.md), which
# pi-toolkit's install.sh re-symlinks to /opt/pi-toolkit/pi-global-AGENTS.md on
+19 -1
@@ -459,6 +459,22 @@ directory, and they compose:
tmux 0-indexing, uv-first Python, and pi-studio reachability, all as
*mechanisms* (deployment-specific hostnames/domains/nameservers are
discovered at runtime, never hardcoded).
- **Vendored fallback skills.** The pi-toolkit global `AGENTS.md` tells every
pi session to read `~/.agents/skills/pi-extensions/SKILL.md` at start (to fix
fork/recall under-utilisation). That pointer would dangle in a container
started *without* the private `skillset` repo, so the image also bakes
fallback copies of **`pi-extensions`** and **`mempalace`**. They are
symlinked only when absent, so a mounted skillset always overrides them. The
`pi-extensions` skill is *layered*: a committed snapshot in `rootfs/` is the
floor, and `Dockerfile.variant` copies the canonical, package-owned copy from
the pinned `pi-extensions` clone (`/opt/pi-extensions/skill/`) over it at
build, so a normal build ships the fresh copy and an old-ref/mirror build
still ships the snapshot. `mempalace` is snapshot-only (its consumer skill
has no public package home), and because pi-toolkit's `AGENTS.md` has no
directive for it, the pi-devbox managed block adds a session-start
*proactive-load* pointer for it (gated to pi-devbox containers, conditional
on the MemPalace MCP tools) so a new container actually loads it. See
`rootfs/usr/local/share/pi-devbox/skills/VENDORED.md`.
- **Skillset repo (optional).** If a `skillset` repo is mounted (at
`$HOME/skillset` or `/workspace/skillset`, or via `SKILLSET_CONTAINER_PATH`),
`deploy-skills.sh` symlinks its skills in too. Image-baked skills are
@@ -472,7 +488,9 @@ pointer fires only inside a pi-devbox container (it checks for
To add another image-baked skill: drop a `SKILL.md` under
`rootfs/usr/local/share/pi-devbox/skills/<name>/`; the `COPY` in
-`Dockerfile.base` and the entrypoint symlink loop pick it up automatically.
+`Dockerfile.base` and the entrypoint symlink loop pick it up automatically. To
+refresh a vendored fallback, see
+`rootfs/usr/local/share/pi-devbox/skills/VENDORED.md`.
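
The "symlinked only when absent" rule described above can be sketched in a few lines. This is an illustration only, not the real `entrypoint-user.sh` logic (which is shell and is not shown in this diff); the function name `link_baked_skills` and its parameters are hypothetical, and only the directory layout comes from the text.

```python
import os
from pathlib import Path


def link_baked_skills(baked_root: Path, skills_dir: Path) -> list[str]:
    """Symlink each baked skill into skills_dir unless that name already exists.

    Hypothetical helper for illustration; returns the names it linked.
    """
    skills_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for skill in sorted(baked_root.iterdir()):
        # A baked skill is a directory that carries a SKILL.md;
        # plain files such as VENDORED.md are skipped.
        if not (skill / "SKILL.md").is_file():
            continue
        target = skills_dir / skill.name
        # lexists is also true for dangling symlinks, so anything a mounted
        # skillset or the user already placed there is left untouched.
        if os.path.lexists(target):
            continue
        target.symlink_to(skill, target_is_directory=True)
        linked.append(skill.name)
    return linked
```

Running it twice links nothing the second time, which is the idempotent behaviour a per-container start step needs.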
## SSH and ControlMaster
@@ -18,3 +18,14 @@ be discovered at runtime, never assumed. And interactive shell aliases
(`dssh`, `dscp`, `cat`→`bat`) do **not** exist in your non-interactive bash
tool, so spell out the underlying command (e.g.
`ssh -F "$HOME/.ssh-local/config" mac …`).
## Session start: load the mempalace skill
If MemPalace MCP tools (e.g. `mempalace_search`, `mempalace_diary_write`) are in
your tool list, **read `~/.agents/skills/mempalace/SKILL.md` before doing
non-trivial work** and follow its protocol: search the palace before answering
about past work, and write a diary entry before the session ends. This is
especially load-bearing here — a pi-devbox container is frequently recreated, so
the palace is your only memory across recreates. Without the habit it is just
storage, not memory. (The skill is the consumer side; feeding the palace is the
separate `opencode-mempalace-bridge` skill, if present.)
@@ -0,0 +1,47 @@
# Vendored fallback skills
Most directories here are **image-baked skills** that `entrypoint-user.sh`
symlinks into `~/.agents/skills/` on container start (only when a skill of the
same name is not already present, so a mounted `skillset` repo or a user
override always wins).
| skill | owner | how it gets here |
|-------|-------|------------------|
| `pi-devbox-environment` | pi-devbox (this repo) | authored here; the canonical copy |
| `pi-extensions` | the `pi-extensions` package repo (`skill/`) | **vendored fallback** + refreshed at build |
| `mempalace` | the `skillset` repo | **vendored fallback** (snapshot only) |
## Why fallbacks exist
The pi-toolkit global `AGENTS.md` tells every pi session to read
`~/.agents/skills/pi-extensions/SKILL.md` at start (to fix fork/recall
under-utilisation). That pointer dangles in a container started **without** the
private `skillset` repo mounted. Baking the skill closes that *availability*
gap. `mempalace` is baked for the same reason (memory continuity); since
nothing in pi-toolkit's `AGENTS.md` points to it, the pi-devbox managed block
(`pi-global-AGENTS.append.md`) also adds the matching *proactive-load*
directive ("load the mempalace skill at session start") so a new container
actually picks it up rather than relying on description-matching.
`pi-extensions`'s directive already ships in pi-toolkit's `AGENTS.md`, so only
its skill file needed baking.
## Freshness model (layered — see Dockerfile.variant)
- **`pi-extensions`** — Option 1 + Option 2. The committed copy here is the
*floor*; at build time `Dockerfile.variant` copies `/opt/pi-extensions/skill/`
(the pinned, package-owned source) over it, so a normal build ships the fresh
package copy and a stale-ref / mirror build still ships the snapshot. Keep
`evaluate-extension-usage.py` alongside `SKILL.md` — the skill calls it via
`./`.
- **`mempalace`** — Option 2 only. The `mempalace` *consumer* skill lives only
in the private `skillset` repo (the `mempalace-toolkit` repo ships a
*different* skill, `opencode-mempalace-bridge`), so there is no public
package source to copy from. This snapshot is refreshed manually per release.
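
The layered model for `pi-extensions` amounts to "copy the package copy over the snapshot if the clone has one, otherwise keep the snapshot". A minimal sketch of that decision, mirroring the `Dockerfile.variant` step in Python for readability; the function name `refresh_from_package` is hypothetical and the real build step is the shell `RUN` in the Dockerfile.

```python
import shutil
from pathlib import Path

# Files the pi-extensions skill needs side by side (named in the text above).
SKILL_FILES = ("SKILL.md", "evaluate-extension-usage.py")


def refresh_from_package(package_skill: Path, baked_skill: Path) -> bool:
    """Copy the package-owned skill over the snapshot when the clone has it.

    Returns True if refreshed, False if the committed snapshot was kept.
    """
    if not (package_skill / "SKILL.md").is_file():
        # Old ref or mirror without skill/: the snapshot stays as the floor.
        return False
    baked_skill.mkdir(parents=True, exist_ok=True)
    for name in SKILL_FILES:
        src = package_skill / name
        if src.is_file():  # the helper script is optional, as in the build step
            shutil.copy2(src, baked_skill / name)
    return True
```

Because the snapshot is only ever overwritten, never deleted, a build can never end up with no skill at all.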
## Refreshing the snapshots
cp <skillset>/skills/pi-extensions/SKILL.md pi-extensions/SKILL.md
cp <skillset>/skills/pi-extensions/evaluate-extension-usage.py pi-extensions/
cp <skillset>/skills/mempalace/SKILL.md mempalace/SKILL.md
Snapshot provenance at last refresh: skillset `8e8db64`, pi-extensions pkg `a7f3044`.
@@ -0,0 +1,301 @@
---
name: mempalace
description: MemPalace agent memory protocol. Use on every session to maintain continuity across conversations — search before answering about past work, write diary entries before session ends, and mine new projects into the palace. Load this skill at session start.
---
# MemPalace Agent Memory Protocol
## Overview
MemPalace gives you persistent memory across sessions via an MCP server. It stores project knowledge (mined from files), conversation summaries (diary entries), and entity relationships (knowledge graph). Without this protocol, you have tools but no habits — and memory without habits is just storage.
**Core principle:** Storage is not memory. Storage + protocol = memory.
## When to Load This Skill
- At the **start of every session** (proactively, before the user asks)
- When the user mentions **past conversations, decisions, or work**
- When working on a **new project or repository** for the first time
- When the user asks about **people, projects, or relationships**
## Session Lifecycle
### Phase 1: Wake Up (session start)
Run these immediately when a session begins, before responding to the user:
1. **Load palace overview:**
```
mempalace_status
```
This returns wing/room counts, the AAAK spec, and the memory protocol reminder.
2. **Read your recent diary:**
```
mempalace_diary_read(agent_name="<your_agent_name>", last_n=5)
```
Scan for context about recent sessions — what was worked on, what matters, what's pending.
3. **Check the knowledge graph** for the user or active project if relevant:
```
mempalace_kg_query(entity="<project_or_person>")
```
Do NOT announce this to the user. Just do it silently to orient yourself.
### Phase 2: Active Session (during work)
#### Search Before You Speak
Before answering questions about past work, decisions, people, or projects:
```
mempalace_search(query="<keywords>", wing="<project>")
```
**Never guess about facts that might be in the palace.** Wrong is worse than slow. Say "let me check" and query.
#### Mine New Projects
When working on a new codebase for the first time:
1. Check if it's already mined:
```
mempalace_list_wings
```
2. **Decide what to mine — docs first, code never (by default).**
The palace is for *context and intent*, not code recall. Code is better read from the working tree via `Read`/`Grep`/`glob` — always authoritative, never stale. Embedding source code produces thousands of low-signal drawers (e.g. `def __init__(self, ...)` across every class) that pollute search for years.
**Mine by default:**
- `*.md`, `*.rst`, `*.txt` — docs, READMEs, CHANGELOGs, architecture notes
- `AGENTS.md`, `CLAUDE.md`, `CONTRIBUTING.md`, design/decision docs — highest signal per byte
- `*.sh`, `Dockerfile`, `Makefile`, entrypoints — small, intent-bearing
- `*.yml`, `*.yaml`, `*.toml`, selective `*.json` (`docker-compose`, `pyproject`, `mkdocs.yml`, CI workflows) — skip lockfiles
**Do NOT mine by default:**
- `*.py`, `*.ts`, `*.tsx`, `*.js`, `*.go`, `*.rs`, `*.java`, `*.cpp`, `*.c`, `*.rb` — raw source code
- Test files, fixtures, generated code
- `node_modules/`, `.venv/`, `__pycache__/`, `.mypy_cache/`, `.pytest_cache/`, `.ruff_cache/` (the miner respects `.gitignore` but double-check)
Exception: if a code file *is* the documentation (e.g. a heavily-commented reference script, or a protocol definition), file it manually via `mempalace_add_drawer`.
3. **Before mining**, inspect the repo to estimate drawer count:
```bash
# Quick audit — what will actually get mined?
find <dir> -type f \
-not -path '*/.git/*' -not -path '*/node_modules/*' \
-not -path '*/.venv/*' -not -path '*/__pycache__/*' \
\( -name '*.md' -o -name '*.sh' -o -name '*.yml' -o -name '*.yaml' \
-o -name '*.toml' -o -name 'Dockerfile*' -o -name 'Makefile' \) | wc -l
```
A docs-heavy repo should produce ~5-10 drawers per file. If a mine produces >15 drawers/file on average, code leaked in, so investigate.
4. Run the mine:
```bash
mempalace init --yes <directory>
mempalace mine <directory> --agent <your_agent_name>
```
The miner currently lacks a `--docs-only` or `--exclude-ext` flag (as of v3.3.3). Until it does, either:
- (a) Add a `mempalace.yaml` at the repo root with explicit include globs, OR
- (b) Mine everything, then surgically remove code-sourced drawers via SQL on `~/.mempalace/palace/chroma.sqlite3` (delete by `embedding_metadata.source_file LIKE '%.py'`), followed by `mempalace repair --yes`.
5. If the CLI miner misses a file you *do* want (e.g., `.zsh`, an undocumented extension), file it manually:
```
mempalace_add_drawer(wing="<project>", room="<aspect>", content="<verbatim content>", source_file="<path>")
```
6. After mining, reconnect to pick up the new embeddings:
```
mempalace_reconnect
```
If search errors occur after mining ("Error finding id"), repair the index:
```bash
mempalace repair --yes
```
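
The audit in step 3 and its drawers-per-file threshold can also be expressed as a small script. This is a sketch under stated assumptions: the pattern list copies the `find` command above, the 15 drawers/file threshold comes from the text, and the drawer count is assumed to be supplied by the caller (this snippet does not query the palace). The function names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Same name patterns and excluded directories as the find command above.
DOC_PATTERNS = ("*.md", "*.sh", "*.yml", "*.yaml", "*.toml", "Dockerfile*", "Makefile")
SKIP_DIRS = {".git", "node_modules", ".venv", "__pycache__"}


def count_minable_files(root: Path) -> int:
    """Count files under root that the docs-first policy would mine."""
    count = 0
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Only directory components decide exclusion, not the file name.
        parents = path.relative_to(root).parts[:-1]
        if SKIP_DIRS.intersection(parents):
            continue
        if any(fnmatchcase(path.name, pattern) for pattern in DOC_PATTERNS):
            count += 1
    return count


def code_leaked(drawer_count: int, file_count: int, threshold: float = 15.0) -> bool:
    """True when the average drawers per file exceeds the threshold."""
    if file_count <= 0:
        raise ValueError("file_count must be positive")
    return drawer_count / file_count > threshold
```

For example, 160 drawers from 10 audited files averages 16 per file, which is over the threshold and worth investigating.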
#### Track Facts in the Knowledge Graph
When you learn new facts about people, projects, or relationships:
```
mempalace_kg_add(subject="ProjectX", predicate="uses", object="PostgreSQL")
mempalace_kg_add(subject="Alice", predicate="owns", object="ProjectX", valid_from="2026-01-15")
```
When facts change (ended, no longer true):
```
mempalace_kg_invalidate(subject="Alice", predicate="works_at", object="OldCorp", ended="2026-03-01")
```
#### Cross-Reference with Tunnels
When content in one project relates to another, create a tunnel:
```
mempalace_create_tunnel(
source_wing="project_api", source_room="endpoints",
target_wing="project_db", target_room="schema",
label="API endpoints map to these DB tables"
)
```
#### Feeding opencode session history (opencode + mempalace-toolkit only)
MemPalace has no upstream integration with [opencode](https://github.com/anomalyco/opencode) as of v3.3.3 — `hooks_cli.py` only supports `claude-code` and `codex` harnesses. Opencode persists every turn in a local SQLite DB at `~/.local/share/opencode/opencode.db`, but nothing moves that data into the palace automatically.
On a machine with opencode + the [`mempalace-toolkit`](https://gitea.jordbo.se/joakimp/mempalace-toolkit) installed, session history is fed into `wing_conversations` via `mempalace-session` — either manually, or on a weekly systemd user timer / cron schedule shipped in `mempalace-toolkit/contrib/`. If this is missing, opencode conversations exist only in the local SQLite DB and are invisible to `mempalace_search`.
**How to tell if it's set up:**
```
mempalace_list_wings
```
If `wing_conversations` exists and has a drawer count comparable to the user's opencode session count, session feeding is working. If it's empty or suspiciously small, suggest:
1. Check if the toolkit is installed: `which mempalace-session`.
2. If installed, suggest running `mempalace-session --dry-run` to preview and `mempalace-session` to file.
3. If not installed, point the user at `gitea.jordbo.se/joakimp/mempalace-toolkit` for setup.
**Don't try to paper over the gap by dumping turn-level content into the palace manually via `mempalace_add_drawer`** — that reinvents what `mempalace-session` does with normalization and dedup. Use the tool.
Full routine (triggers, cadence, automation) is in the [`opencode-mempalace-bridge`](https://gitea.jordbo.se/joakimp/mempalace-toolkit) skill and the toolkit's `ARCHITECTURE.md` §5. The two skills pair: this one (`mempalace`) covers using the palace; that one (`opencode-mempalace-bridge`) covers feeding it from opencode.
### Phase 3: Wind Down (session end)
**Always write a diary entry before the session ends.** This is the most important habit.
```
mempalace_diary_write(
agent_name="<your_agent_name>",
entry="<AAAK compressed summary>",
topic="session-summary"
)
```
#### Why still write diaries when sessions may be mined automatically?
On machines running opencode + `mempalace-toolkit`, every session is mined into `wing_conversations` on a weekly (or user-defined) schedule. A common and incorrect conclusion: *"since every turn is captured automatically, writing a diary entry is redundant."* It isn't.
Session mining captures **what was said** (every turn, verbatim). A diary captures **what the session meant** — editorial judgment by the agent who lived it:
- Lessons learned, patterns noticed, pending items rolled forward
- Meta-observations that were never said aloud during the session
- Aggregate counts (commits shipped, bugs fixed, hours spent)
- A compressed, recency-scannable summary for the *next* agent's wake-up
Mining raw turns cannot surface these because the words don't exist verbatim — they're the agent's reflection at wind-down. Think of the split as *release notes* (diary) vs. *git log with diffs* (session mine): a repo keeps both because they answer different questions. So does the palace.
**Practical rule:** automated mining does not replace Phase 3. Both systems cover each other's failure modes — a skipped diary is recovered from the raw turns; a missed mine is recovered from the diary summary. For the full treatment (comparison table, retrieval patterns, token economics), see [`mempalace-toolkit/ARCHITECTURE.md` §5 → "Diary vs session mine: why keep both?"](https://gitea.jordbo.se/joakimp/mempalace-toolkit/src/branch/main/ARCHITECTURE.md#diary-vs-session-mine-why-keep-both).
#### AAAK Diary Format
Write diary entries in compressed AAAK format for efficiency. Structure:
```
SESSION:<date>|<what.you.worked.on>|
TASKS:
1.<task.description>→<outcome>|
2.<task.description>→<outcome>|
DISCOVERED:<unexpected.findings>|
ENTITIES:<people.or.projects.encountered>|
<importance: one to five stars>
```
Example:
```
SESSION:2026-04-28|api.refactor+db.migration|
TASKS:
1.refactored.auth.endpoints→split.into.3.modules|
2.added.user.roles.migration→postgres.enum.type|
DISCOVERED:legacy.session.table.unused.since.v2|
ENTITIES:ProjectX;Alice(reviewer)|
***
```
Rules:
- Use dots instead of spaces within phrases
- Use pipes as field separators
- Use arrows for cause/effect or transitions
- Stars indicate session importance (one to five)
- Keep it tight — a future agent should get the gist in seconds
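
To make the format concrete, here is a minimal sketch of a formatter that produces an entry in the shape shown above. It is not part of MemPalace; `aaak_entry` and `dots` are hypothetical names. Joining topics with `+` and entities with `;` is inferred from the example rather than stated in the rules, so treat both as assumptions.

```python
def dots(phrase: str) -> str:
    """Replace runs of whitespace with dots, per the first rule above."""
    return ".".join(phrase.split())


def aaak_entry(date, topics, tasks, discovered, entities, stars):
    """Build an AAAK-style diary entry.

    tasks is a list of (description, outcome) pairs; stars is 1 to 5.
    """
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between one and five")
    worked_on = "+".join(dots(topic) for topic in topics)  # '+' is an assumption
    lines = [f"SESSION:{date}|{worked_on}|", "TASKS:"]
    for number, (task, outcome) in enumerate(tasks, start=1):
        lines.append(f"{number}.{dots(task)}→{dots(outcome)}|")
    lines.append(f"DISCOVERED:{dots(discovered)}|")
    lines.append("ENTITIES:" + ";".join(entities) + "|")  # ';' is an assumption
    lines.append("*" * stars)
    return "\n".join(lines)
```

Called with the date, topics, tasks, and entities from the example, it reproduces that entry line for line; the resulting string is what would be passed as `entry` to `mempalace_diary_write`.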
#### What to Capture
Prioritize recording:
- **Decisions made** and their rationale
- **Discoveries** — things that surprised you or that a future session needs to know
- **Unfinished work** — what's pending, what was deferred
- **User preferences** observed during the session
- **Entities encountered** — people, projects, tools, services
### Phase 4: Fact Updates
If facts changed during the session, update the knowledge graph before writing the diary:
```
mempalace_kg_invalidate(subject="...", predicate="...", object="...", ended="<today>")
mempalace_kg_add(subject="...", predicate="...", object="...", valid_from="<today>")
```
## Palace Structure
### Wings
Wings are top-level categories, typically one per project or domain:
- Named after the project directory (e.g., `cli_utils`, `opencode_devbox`)
- Agent diaries live in `wing_<agent_name>` (e.g., `wing_orchestrator`, `wing_pi`)
#### Multi-harness palace
A single palace can be fed by multiple coding-agent harnesses. On this machine the palace is shared between **opencode** and **pi** (Mario Zechner's pi-coding-agent). Implications:
- **`wing_conversations` mixes sources.** Both harnesses' session feeders write into the same wing. To tell them apart, look at the `source_file` metadata on each drawer:
- `pi_<uuid>.jsonl` → pi session
- `<slug>_ses_<id>.jsonl` → opencode session
- The first chunk of each session also carries a `| source: opencode` or `| source: pi` marker in the synthetic header line.
- **Other wings may belong to other harnesses.** For example `wing_pi` is pi's diary, not opencode's. Don't assume every diary entry was written by you — check `agent_name` on the entry.
- **Session feeders run on different schedules.** Pi sessions are fed Tue 03:00, opencode sessions Mon 03:00. Recent sessions from either harness can lag the palace by up to a week, so absence-of-evidence in `wing_conversations` is not evidence-of-absence for recent work.
- **Reading another harness's diary is useful.** When orienting after a gap, `mempalace_diary_read agent_name=pi` (or whichever sibling agent has been active) often gives a fresher picture than waiting for the conversations feeder to catch up.
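
The `source_file` naming convention above is enough to tell the two feeders apart programmatically. A minimal sketch, assuming the `<uuid>` part is a canonical 8-4-4-4-12 hex UUID (the text does not specify the exact form) and using a hypothetical function name:

```python
import re
from pathlib import PurePosixPath

# pi_<uuid>.jsonl; the canonical UUID layout is an assumption.
PI_RE = re.compile(
    r"^pi_[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\.jsonl$"
)
# <slug>_ses_<id>.jsonl
OPENCODE_RE = re.compile(r"^.+_ses_.+\.jsonl$")


def session_harness(source_file: str) -> str:
    """Classify a drawer's source_file as 'pi', 'opencode', or 'unknown'."""
    name = PurePosixPath(source_file).name
    if PI_RE.match(name):
        return "pi"
    if OPENCODE_RE.match(name):
        return "opencode"
    return "unknown"
```

Matching the strict pi pattern first keeps an opencode slug that happens to start with `pi_` (for example `pi_devbox_ses_...`) from being misclassified.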
### Rooms
Rooms are aspects within a wing:
- `fzf`, `scripts`, `configuration`, `general` — whatever the miner detects
- Diary entries go into rooms by topic tag
### Drawers
Drawers hold verbatim content — never summarized, always searchable.
### Tunnels
Cross-wing connections linking related content across projects.
### Knowledge Graph
Entity-relationship triples with temporal validity. Query with `mempalace_kg_query`, browse with `mempalace_kg_timeline`.
## Troubleshooting
| Problem | Fix |
|---|---|
| "No palace found" | Run `mempalace init <dir>` then `mempalace mine <dir>` |
| "Error finding id" after mining | Run `mempalace repair --yes` then `mempalace_reconnect` |
| Search returns irrelevant results | Use `max_distance=1.0` for stricter matching; add `wing` filter |
| Miner skips file types | File manually with `mempalace_add_drawer` or use `--no-gitignore` |
| Stale results after external changes | Call `mempalace_reconnect` |
## Anti-Patterns
- **Don't guess when you can search.** If a question touches past work, search first.
- **Don't skip the diary.** A session without a diary entry is a session forgotten.
- **Don't summarize drawer content.** File verbatim — the embedding model needs the original words.
- **Don't mine .git directories or node_modules.** The CLI miner respects .gitignore by default.
- **Don't create duplicate drawers.** Use `mempalace_check_duplicate` before adding manually.
- **Don't treat the palace as a task list.** It's for knowledge and context, not todos.
@@ -0,0 +1,298 @@
---
name: pi-extensions
description: >-
Use the pi extensions (pi-fork, pi-observational-memory, ssh-controlmaster) effectively in the pi coding agent harness. Load this skill only when running inside pi (detection - `fork` and `recall` are present in your tool list, or `pi --ssh` was used to start the session). pi-fork dispatches focused subtasks to forked agents at fast/balanced/deep effort tiers; pi-observational-memory compacts long sessions into recallable observations + reflections; ssh-controlmaster rewires pi's read/write/edit/bash tools to execute on a remote host over a multiplexed SSH connection. This skill covers tier selection, task design, boundary discipline, when to use recall, and remote-pi mechanics.
---
# Pi Extensions: pi-fork, pi-observational-memory, ssh-controlmaster
## When to Load This Skill
Load only when **both** of these are true:
1. You are running inside the **pi coding agent harness** (not Claude Code, not opencode, not any other harness).
2. The `fork` and/or `recall` tools appear in your available tool list, **or** the session was started with `pi --ssh ...`.
If you do not see those tools, this skill does not apply — skip it. Other harnesses do not have these extensions and the patterns below will not work there.
This skill is most useful at the start of any non-trivial session where you may need to dispatch parallel subtasks, where the conversation is likely to compact (sessions running > ~80k tokens), or where pi is operating against a remote host.
## Pi extension landscape (where the wiring lives)
Pi has **two distinct extension locations** and it's easy to look in the wrong one:
| Location | Mechanism | Examples |
|---|---|---|
| `~/.pi/agent/extensions/*.ts` (or `.ts.off`) | **Local extensions** — TypeScript files, usually symlinks into `/opt/pi-extensions/extensions/` or similar. Toggled via `/ext` slash command. | `ssh-controlmaster`, `git-checkpoint`, `notify`, `todo`, `mempalace`, `mcp-loader`, `ext-toggle`, `confirm-destructive` |
| `~/.pi/agent/git/<host>/<owner>/<repo>/` | **Package extensions** — git-cloned npm packages registered via the `packages` array in `~/.pi/agent/settings.json`. | `pi-fork` (`github.com/elpapi42/pi-fork`), `pi-observational-memory` (`github.com/elpapi42/pi-observational-memory`, **default branch `master`** — a `main` branch does not exist, so `pi install git:...` resolves against `master`) |
When the user asks how to use "the X extension", **check both locations**: `find ~/.pi/agent -maxdepth 4 -name "*X*"` covers both. The `/ext` slash command shows the local-extensions list with enable/disable state. There is also a distinct skill-bundled-script category (e.g. `ci-release-watcher`'s `ssh-control-master-setup.sh`) which is **not** a pi extension at all; it's a helper script inside a skill. Don't conflate the three.
## Why These Extensions Belong Together
pi-fork and pi-observational-memory are symbiotic. **pi-fork burns context** (each fork dispatches a focused subtask whose detailed exploration would otherwise pollute your main thread). **pi-observational-memory preserves context** (when the main thread eventually compacts, observations + reflections survive the fold and can be recalled by ID). Aggressive forking only works long-term if the surviving summary is high-fidelity, and OM only earns its keep when it's preserving genuinely valuable distilled work.
ssh-controlmaster is orthogonal but composes cleanly: when pi is operating remotely, fork still spawns local sub-agents (each fork *itself* doesn't ssh), but their `bash`/`read`/`write`/`edit` calls do — see Part 3 caveats.
---
## Part 1: pi-fork
### Effort tier mapping
Configured in `~/.pi/agent/settings.json` under `pi-fork.effortProfiles`. The conventional mapping is:
| Tier | Model | Use for |
|---|---|---|
| `fast` | haiku | mechanical edits, narrow lookups, file-listing, single-fact verification, simple syntactic checks |
| `balanced` | sonnet (default) | normal exploration, implementation, testing, code review, option analysis |
| `deep` | opus | architecture decisions, security analysis, concurrency reasoning, ambiguous debugging, high-risk reviews, runbook drafting where subtle mistakes are costly |
**Rule of thumb:** start at `balanced` unless you have a specific reason to go up or down. Going too cheap on a deep task wastes a fork; going too expensive on a mechanical task is just slow.
### When to fork vs. do it yourself
Fork when **any** of:
- The task requires reading many files whose contents you don't need to keep in your main context afterwards (the fork returns a dense summary; raw file contents stay in the fork's context and are discarded).
- You want to run multiple analyses in **parallel** (especially: comparing N options, where independent reasoning is itself a signal — see "parallel forks" below).
- The task is well-scoped enough to specify completely up front and well-bounded enough that returning a dense report is more useful than continuing the dialogue.
- You are about to do something that would burn a lot of tokens on tool calls (long file reads, many bash invocations) whose output you will mostly discard.
Don't fork when:
- The work fits in your current context budget without crowding out what comes next.
- The task is exploratory and you'll need to iterate based on what you find (forking turns iteration into round-trips with full task-spec rewrites).
- You need to make decisions during the work that depend on context only the main thread has.
### Task design: the four things a fork brief must contain
1. **Verified context up front.** Do not say "go look at the codebase and figure out X". Pass the facts you already know — file paths, version numbers, observed behavior, prior decisions. The fork should be reasoning *from* context, not *finding* context. Discovery work costs the fork tokens that don't come back to you.
2. **A specific deliverable.** "Analyze X" is too vague. "Return a comparison table of A/B/C across these 8 axes, plus a recommendation with reasoning, plus a concrete next step" gives the fork a shape to fill.
3. **Decision authority.** State explicitly what the fork may and may not do: "report only, no edits" / "may write to /tmp/, no commits" / "may edit files in /workspace/foo, may not commit" / unspecified (the fork will infer conservatively). **State this even when it seems obvious.** See "Boundary discipline" below.
4. **What "unsure" looks like.** Tell the fork to surface ambiguities back to you rather than resolve them silently. "Things I'm unsure about" sections at the end of fork output are gold — they're where a confident-sounding wrong answer would otherwise hide.
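The four parts above can be captured in a small structure so that a brief missing one of them is rejected before dispatch. This is a minimal sketch, not part of pi-fork: `ForkBrief` and its field names are hypothetical, and the rendered text would simply be passed as the `task` argument to `fork`.

```python
from dataclasses import dataclass


@dataclass
class ForkBrief:
    """Hypothetical container for the four parts of a fork brief."""
    context: list[str]   # verified facts the main thread already knows
    deliverable: str     # the exact shape of the expected report
    authority: str       # e.g. "report only, no edits"
    unsure_prompt: str = (
        "End with a 'Things I'm unsure about' section; "
        "surface ambiguities instead of resolving them silently."
    )

    def render(self) -> str:
        # Refuse to render a brief that is missing a required part.
        if not self.context:
            raise ValueError("brief needs verified context")
        if not self.deliverable.strip():
            raise ValueError("brief needs a specific deliverable")
        if not self.authority.strip():
            raise ValueError("brief needs explicit decision authority")
        facts = "\n".join(f"- {fact}" for fact in self.context)
        return (
            f"Verified context:\n{facts}\n\n"
            f"Deliverable: {self.deliverable}\n\n"
            f"Decision authority: {self.authority}\n\n"
            f"{self.unsure_prompt}"
        )
```

Keeping decision authority as a required field makes the "state it even when it seems obvious" rule hard to skip by accident.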
### Parallel forks for option-comparison
When facing a "which approach should we take" question with 2-4 candidate approaches, dispatching the candidates as parallel forks is high-leverage:
- They reason **independently**. No fork sees the others' work.
- **Convergence is signal.** If three forks at different effort tiers reach the same recommendation citing different evidence, that's a strong validation that doesn't depend on any one model's bias.
- **Divergence is also signal.** If one disagrees, read its reasoning carefully — it may have spotted something the others missed, or it may have a tier-specific weakness worth knowing.
Sample shape for an option-comparison call:
- Fork 1 (deep) — detailed runbook for option A, with timing/risk/rollback
- Fork 2 (balanced) — comparison table A vs B vs C across N axes, with a recommendation
- Fork 3 (fast) — focused sub-question (e.g., "which container image / library version / CLI flag")
This costs more than a single fork but the cross-validation is often worth it for decisions you'll execute on prod systems.
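One way to make the convergence and divergence checks concrete is to tally what each fork recommended. The sketch below assumes you have already read each report and reduced it to a single option label; `summarize_recommendations` and the fork labels are hypothetical, not a pi-fork API.

```python
from collections import Counter


def summarize_recommendations(recommendations: dict[str, str]) -> dict:
    """Tally fork recommendations.

    `recommendations` maps a fork label (for example its effort tier)
    to the option that fork recommended.
    """
    if not recommendations:
        raise ValueError("no fork recommendations to compare")
    tally = Counter(recommendations.values())
    top_option, top_votes = tally.most_common(1)[0]
    dissenters = sorted(
        label for label, option in recommendations.items()
        if option != top_option
    )
    return {
        "converged": top_votes == len(recommendations),
        "majority": top_option,
        "dissenters": dissenters,  # read these forks' reasoning first
    }
```

A non-empty `dissenters` list is a prompt to read that fork's reasoning, not a vote to be overruled.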
### Boundary discipline (observed behavior)
Forks **mostly** honor explicit decision-authority instructions, but not infallibly. Observed pattern from real sessions:
- **Pure analysis tasks** (no write authority, "report only") — high compliance. Forks reliably return analysis without editing files or committing.
- **Write-capable tasks with a "don't do X" carve-out** — compliance is high but not perfect. Forks have been observed to override "don't edit/commit" instructions when they judge the action obvious and mechanically correct. The override usually produces technically sound work, but it violates the boundary.
**Practical rules:**
- State decision authority explicitly, every time, even when "report only" feels redundant.
- For high-stakes write authority, verify the fork's actions afterwards (`git status`, `git log -1`, file diffs) rather than assuming compliance.
- If a boundary violation is unacceptable (e.g., compliance review, sandboxed exploration, "don't touch prod"), do not give the fork write tools at all — keep it strictly in analysis mode.
- The fact that the fork was "right anyway" is not the same as the fork having followed instructions.
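The post-fork check can be scripted by snapshotting the repository before dispatch and comparing afterwards. A minimal sketch, assuming the fork worked in a git repository with at least one commit; the helper names are illustrative, and only the standard `git rev-parse HEAD` and `git status --porcelain` commands are used.

```python
import subprocess


def git_snapshot(repo: str) -> tuple[str, str]:
    """Return (HEAD commit hash, `git status --porcelain` output) for `repo`."""
    def run(*args: str) -> str:
        return subprocess.run(
            ["git", "-C", repo, *args],
            check=True, capture_output=True, text=True,
        ).stdout.strip()
    return run("rev-parse", "HEAD"), run("status", "--porcelain")


def boundary_violations(before: tuple[str, str],
                        after: tuple[str, str]) -> list[str]:
    """Compare two snapshots and describe what changed between them."""
    problems = []
    if before[0] != after[0]:
        problems.append(f"HEAD moved: {before[0][:12]} -> {after[0][:12]}")
    if before[1] != after[1]:
        problems.append("working tree changed")
    return problems
```

Take `git_snapshot` before the fork and again after it returns. A file that was already modified beforehand can change again without altering the porcelain output, so file diffs remain the stronger check for a dirty tree.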
### Anti-patterns
- **Forking trivial work.** A fork has overhead. If the task takes < 30 seconds in your main thread, just do it.
- **Vague briefs.** "Look into the database thing" returns vague output. The fork is not telepathic.
- **Forking iterative work.** Forks are one-shot. If you need to iterate, you'll re-spec the task each time — usually worse than doing it yourself.
- **Recursive forking** (forks spawning forks). Disabled by default and should stay disabled unless you have a specific batch-fanout use case.
- **Treating fork output as ground truth without verification.** Especially for cited code/commit hashes/URLs — forks can hallucinate these like any LLM. Spot-check decisive evidence.
---
## Part 2: pi-observational-memory
### How it actually works
Observational memory (OM v3, "session-ledger" architecture) runs an **observer agent** in the background as your conversation grows. When token thresholds are crossed (defaults: observe at 10k, reflect at 20k, compact at 81k), the observer distills the recent transcript into:
- **Observations** — timestamped events, each with a 12-character hex ID like `[3682ebfad7af]`. Compact one-liners describing what happened in the conversation.
- **Reflections** — durable, long-lived facts about the user, project, decisions, and constraints. Some reflections include observation IDs as evidence pointers.
When compaction fires, the raw transcript is folded away and replaced with a structured summary block containing the observations + reflections. **You — the next turn of the same agent — receive that summary block as your starting context.** That's the recovery mechanism.
**Storage is in-transcript, not on disk.** Do not grep for `observations.jsonl` or similar files; you will not find them. The artifact lives in the model's input context window.
Configuration lives in `~/.pi/agent/settings.json` under `observational-memory`. Tune `observeAfterTokens`, `reflectAfterTokens`, `compactAfterTokens`, and `observationsPoolMaxTokens` if observations feel sparse or noisy. The default 81k compaction threshold is well-calibrated for typical multi-task sessions.
### The `recall` tool
`recall(<12-char-hex-id>)` resolves a specific observation or reflection ID back to the original source context — the exact bash output, file contents, tool call results, commit message, or transcript fragment that the observation was distilled from.
**Use recall when:**
- You are about to make a decision that depends materially on a compacted observation or reflection whose details are unclear.
- You need exact wording, paths, commands, errors, commits, or user constraints behind a remembered claim.
- A broad reflection is relevant but you need its supporting observations to act safely.
- The user asks "why do you believe X" or "what supports that memory".
**Do not use recall for:**
- Semantic search (it's keyed by ID, not topic — you must already have a specific 12-char hex ID).
- Browsing the transcript out of curiosity.
- Preemptive lookup of every ID in your context "just in case".
Recall costs tokens. Use it when exact source context will materially change your next action.
> **Calibration note (from a real ~1-month trial, 2026-05/06):** across 20 logged container sessions, `recall` was invoked **0 times** while obsmem passively carried 529 observations across 6 compactions. Zero recall is a *warning sign*, not a badge of efficiency — it means decisions after a compaction were made on the distilled one-liner alone, without ever re-checking the source. The injected summary is **lossy by design**. Default habit to adopt: when you are about to **edit code, ship a change, or assert a fact** that rests on a `[high]`/`[critical]` observation or a reflection you did not produce *this* turn, `recall` its ID **first**. One recall before a load-bearing action is cheap; redoing finished work or contradicting a prior correction is not.
### Reading the compaction summary
When you see a block like `The conversation history before this point was compacted into the following summary:` at the start of a session or turn, that's OM output. Standard structure:
- **Reflections** at the top: stable facts. Some have IDs in brackets.
- **Observations** below, chronological: timestamped events with IDs in brackets and importance markers (`[high]`, `[critical]`, etc.).
When entries conflict, **the most recent observation reflects the latest known state.** Work that prior observations describe as completed should not be redone unless the user explicitly asks to revisit it.
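Before a load-bearing action, the summary can be scanned for high-importance observation IDs worth passing to `recall`. The sketch below assumes observation lines start with a bracketed 12-character hex ID and carry a tier marker, which is the layout the regexes in the bundled `evaluate-extension-usage.py` expect; the sample lines in the usage note are made up, and `recall_candidates` is a hypothetical helper.

```python
import re

# One observation per line: "[<12 hex chars>] <rest of line>".
OBS_RE = re.compile(r"^\[([0-9a-f]{12})\] (.*)$", re.M)
TIER_RE = re.compile(r"\[(low|medium|high|critical)\]")


def recall_candidates(summary: str, tiers=("high", "critical")) -> list[str]:
    """Return IDs of observation lines whose first tier marker is in `tiers`."""
    ids = []
    for obs_id, rest in OBS_RE.findall(summary):
        match = TIER_RE.search(rest)
        if match and match.group(1) in tiers:
            ids.append(obs_id)
    return ids
```

For example, given the two invented lines `[3682ebfad7af] [high] user said never force-push` and `[0a1b2c3d4e5f] [low] listed files`, only the first ID is returned. The result is a shortlist, not an instruction to recall everything on it.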
### Anti-patterns
- **Treating compacted memory as definitive without recall** when stakes are high. Compaction is lossy; the observation may have lost a constraint that was on the line above it in the original transcript.
- **Recalling every ID preemptively.** Wasteful. Recall on demand.
- **Assuming the disk holds OM artifacts.** It doesn't. Don't waste time looking.
- **Ignoring the summary block** when starting a session. It's there because the prior session was real work — read it before answering questions about past work.
---
## Quick Reference
```
fork(task=..., effort=fast|balanced|deep)
- state decision authority explicitly
- pass verified context up front
- specify deliverable shape
- ask for "unsure about" section
recall(id=<12-char-hex>)
- only when stakes justify the cost
- id must already be visible in your context
- not a search tool
```
```
~/.pi/agent/settings.json
pi-fork.effortProfiles — model + thinking-depth per tier
pi-fork.defaultEffort — usually "balanced"
observational-memory.* — token thresholds, model, agentMaxTurns
observational-memory.debugLog: true — opt-in NDJSON telemetry at
~/.pi/agent/observational-memory/debug/<session>.ndjson (off by default)
```
### Installing on a fresh machine (host)
These are git-sourced pi packages (pi-fork is **not** on npm). Add to the
`packages` array in `~/.pi/agent/settings.json`, or:
```
pi install git:github.com/elpapi42/pi-fork
pi install git:github.com/elpapi42/pi-observational-memory # default branch: master (no main)
# obsmem is also published: pi install npm:pi-observational-memory
```
Restart pi after install. Enable `observational-memory.debugLog` if you want
the next window instrumented.
### Evaluating usage
`evaluate-extension-usage.py` (bundled next to this skill) mines pi session
transcripts for fork/recall counts and obsmem compaction stats. Run it per
machine (transcripts live at `~/.pi/agent/sessions/`) for a combined
host+container picture:
```
./evaluate-extension-usage.py # ~/.pi/agent/sessions
./evaluate-extension-usage.py /path/a /path/b # multiple roots
```
---
## Part 3: ssh-controlmaster
### What it does
When pi is launched with `--ssh`, this extension **rewires pi's `read`, `write`, `edit`, and `bash` tools to execute on the remote machine**, multiplexed over a single SSH ControlMaster socket. Pi is still running locally — the LLM, the UI, the MCP servers, the fork dispatcher all live on your local box — but anything those tools touch on the filesystem is the *remote's* filesystem.
This is fundamentally different from running pi locally and using `bash` to ssh inside it: with `--ssh`, the tool layer itself is remoted, so the LLM thinks it's working in the remote's `cwd` (the system prompt is rewritten to say so).
### Usage
```bash
# Key-based auth (preferred), remote cwd defaults to remote $HOME
pi --ssh lagret
# Pin to a specific remote directory
pi --ssh lagret:/volume1/docker/portainer/compose/119
# Password auth (input is NOT masked when typing)
pi --ssh user@host --ssh-ask-pass
```
The `lagret` form requires a `Host lagret` block in `~/.ssh/config` or a resolvable hostname. The status bar shows `SSH ⚡ own master <host>:<cwd>` or `SSH ⚡ system master <host>:<cwd>` once connected.
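The three invocation forms differ only in which parts of the target are present. As an illustration (not the extension's actual parser, which is not shown here), a target of the shape `[user@]host[:/remote/dir]` could be split like this; `SshTarget` and `parse_ssh_target` are hypothetical names, and IPv6 literals are deliberately not handled.

```python
from typing import NamedTuple, Optional


class SshTarget(NamedTuple):
    user: Optional[str]   # None means "let ssh config decide"
    host: str
    cwd: Optional[str]    # None means "remote $HOME"


def parse_ssh_target(spec: str) -> SshTarget:
    """Split `[user@]host[:/remote/dir]` into its parts."""
    location, sep, cwd = spec.partition(":")
    user, _, host = location.rpartition("@")
    if not host:
        raise ValueError(f"no host in {spec!r}")
    return SshTarget(user or None, host, cwd if sep and cwd else None)
```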
### How it cooperates with system SSH config
It reads `ssh -G <host>` to learn the effective config, then:
| `~/.ssh/config` for the host | Behavior |
|---|---|
| `ControlMaster auto` or `yes` with a `ControlPath` | Reuses the system master socket. Does **not** tear it down on pi exit ("it was the system's to manage before pi arrived"). |
| No ControlMaster configured (or explicitly `no`) | Creates its own master at `/tmp/pi-cm-<pid>.sock` with `ControlPersist=yes`. Tears it down on pi `session_shutdown`. |
This means it composes cleanly with the system-wide `ssh-control-master-setup.sh` helper from the `ci-release-watcher` skill: if that script has already configured `~/.ssh/config` for the host, `pi --ssh` rides on the existing master rather than opening a parallel connection.
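The decision in the table can be approximated from `ssh -G` output. This sketch is an assumption about the logic rather than the extension's code: it treats `controlmaster` values `auto`, `yes`, or `true` as enabled (OpenSSH may print `true`/`false` where the config says `yes`/`no`) and requires a non-empty `controlpath`. The return values reuse the status-bar wording.

```python
def master_mode(ssh_g_output: str) -> str:
    """Classify `ssh -G <host>` output as 'system master' or 'own master'."""
    config = {}
    for line in ssh_g_output.splitlines():
        key, _, value = line.strip().partition(" ")
        if key:
            # Keep the first value seen for each option.
            config.setdefault(key.lower(), value.strip())
    enabled = config.get("controlmaster", "no").lower() in {"auto", "yes", "true"}
    has_path = config.get("controlpath", "none").lower() not in {"", "none"}
    return "system master" if enabled and has_path else "own master"
```

To try it against a real host, feed it `subprocess.run(["ssh", "-G", host], capture_output=True, text=True).stdout`.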
### Caveats and edge cases
- **Local vs remote tool boundary.** Only `read`/`write`/`edit`/`bash` are remoted. **MCP servers are still local**: `mempalace` files drawers and diary entries against the local palace even when your shell work happens remotely. The same goes for `fork`, `recall`, `todo`, and any other custom tool. This is usually what you want (palace memory survives across remote sessions) but worth knowing.
- **fork over ssh.** Forks spawn locally and inherit the same `--ssh` mode by virtue of the parent's tool wiring; the fork's bash calls hit the same ControlMaster. Forks share the parent's SSH socket rather than opening a parallel one, so multiplexing still applies.
- **macOS Unix socket path limit.** The own-master socket lives at `/tmp/pi-cm-<pid>.sock` to stay under macOS's ~104-char limit. If you have a non-default `TMPDIR` long enough to blow this, ssh will fail to start the master.
- **Password auth password visibility.** From the source: *"input is NOT masked — the password is visible while typing."* The password is written to a chmod-700 SSH_ASKPASS script in `/tmp` and deleted after the master establishes; not persisted, but on-screen during entry.
- **Remote bash environment.** The remote shell is whatever `ssh user@host '<cmd>'` invokes — typically a non-login non-interactive bash. Don't expect `~/.bashrc` aliases or PATH manipulations from `~/.profile`. Pin tool paths or invoke via `bash -lc '...'` if you need login-shell behavior.
- **Path translation is naive.** The extension does `path.replace(localCwd, remoteCwd)` to translate paths in tool calls. If the LLM emits an absolute remote path that doesn't share the local-cwd prefix, the path is passed through unchanged — usually fine but pathological for paths that happen to contain the local-cwd substring.
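The path-translation caveat is easiest to see with concrete paths. In the sketch below, `naive_translate` is a Python stand-in for the `path.replace(localCwd, remoteCwd)` call quoted above (limiting it to the first occurrence is an assumption about the original's semantics), and `prefix_translate` is a hypothetical stricter variant shown only for contrast. All paths are invented.

```python
def naive_translate(path: str, local_cwd: str, remote_cwd: str) -> str:
    # Substring replace, first occurrence only (assumed semantics).
    return path.replace(local_cwd, remote_cwd, 1)


def prefix_translate(path: str, local_cwd: str, remote_cwd: str) -> str:
    # Hypothetical alternative: rewrite only when local_cwd is a leading
    # path component; otherwise pass the path through unchanged.
    local = local_cwd.rstrip("/")
    if path == local or path.startswith(local + "/"):
        return remote_cwd.rstrip("/") + path[len(local):]
    return path


LOCAL, REMOTE = "/home/dev/proj", "/volume1/app"

# The ordinary case works either way.
print(naive_translate("/home/dev/proj/a.txt", LOCAL, REMOTE))
# A remote path that merely contains the local cwd gets mangled.
print(naive_translate("/backup/home/dev/proj/a.txt", LOCAL, REMOTE))
print(prefix_translate("/backup/home/dev/proj/a.txt", LOCAL, REMOTE))
```

The second call rewrites the middle of an unrelated path, which is the pathological case; a sibling directory such as `/home/dev/project2` is mangled by the substring approach in the same way.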
### When to use it
- Editing configs on a NAS / homelab host without scp ping-pong (`pi --ssh lagret:/volume1/...`)
- Operating against a host whose tools/data you need but whose disk is too slow to mount via SSHFS
- Investigating runner state, container configs, etc., on a remote host as if local
- Multi-step remote work where opening a fresh ssh connection per step would burn your CGNAT flow budget
### Anti-patterns
- **Using `pi --ssh` for one-off shell work.** Just `ssh` directly. The extension shines when there are dozens of tool calls per session.
- **Filing palace drawers expecting them on the remote.** They go to the local palace. If you want palace artifacts on the remote host, ssh into the remote and run pi *there* against its local palace.
- **Forgetting `--ssh` in followup sessions.** Status bar is the canary — if you don't see `SSH ⚡` you're operating locally despite intending remote. Easy mistake on a fresh terminal.
### Reaching the devbox host from inside the container (`dssh` / `dscp`)
Distinct from `pi --ssh` above. When the **pi-devbox container** runs under OrbStack / Docker Desktop on macOS, it can SSH back to its own host. The entrypoint's `setup-lan-access.sh` regenerates `~/.ssh-local/config` on **every container start** (the in-container `~/.ssh` is mounted read-only, so a sidecar config + `known_hosts` + `ControlPath` under `~/.ssh-local/` is used instead).
```bash
# Interactive shells get aliases (from ~/.bash_aliases):
dssh host 'cmd' # = ssh -F ~/.ssh-local/config host
dscp file host:/path # = scp -F ~/.ssh-local/config ...
```
**The agent's `bash` tool is non-interactive — those aliases are NOT loaded.** Use the explicit form:
```bash
ssh -F ~/.ssh-local/config host 'cmd'
scp -F ~/.ssh-local/config <src> host:<dst>
```
- Host aliases `host` and `mac` both resolve to `host.docker.internal`. The user varies per host machine, so check `~/.ssh-local/config` for the active `User` value. The entry uses the key `~/.ssh-local/devbox_jump_ed25519` with `ControlMaster auto` and `ControlPersist 4h`.
- The config chains `Include ~/.config/devbox-shell/ssh-lan.conf` then `Include ~/.ssh/config`, so LAN targets are reachable too (add `ProxyJump host` to those entries).
- **Use it for:** enabling/inspecting the host's pi config (`~/.pi/agent/settings.json`), running `evaluate-extension-usage.py` against the host's `~/.pi/agent/sessions/` for a combined host+container metric, or copying host transcripts into the container. The host's pi runs natively there; its palace, sessions, and extensions are separate from the container's.
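Because the aliases are unavailable to the agent, a script that needs the host can build the explicit command itself. A minimal sketch, assuming the `host` alias and `~/.ssh-local/config` described above exist; `host_ssh_argv` and `run_on_host` are hypothetical helpers.

```python
import os
import subprocess

SSH_CONFIG = os.path.expanduser("~/.ssh-local/config")


def host_ssh_argv(command: str, host: str = "host") -> list[str]:
    """Build the explicit equivalent of `dssh <host> '<command>'`."""
    return ["ssh", "-F", SSH_CONFIG, host, command]


def run_on_host(command: str, host: str = "host") -> str:
    """Run `command` on the devbox host and return its stdout."""
    result = subprocess.run(
        host_ssh_argv(command, host),
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Passing the command as a single argv element avoids a local shell, but the remote shell still interprets it, so quote remote paths accordingly.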
---
## Cross-Skill Notes
- **mempalace** is for cross-session persistent memory (diary, knowledge graph, drawer storage). OM is for **within-session** context survival across compaction. They complement each other: write a diary entry at session end *and* let OM compact your work-in-progress mid-session.
- **systematic-debugging** and **test-driven-development** skills pair well with deep-tier forks: a deep fork can carry out a focused debugging investigation or write a failing test suite without polluting your main context.
- **ci-release-watcher** ships a `scripts/ssh-control-master-setup.sh` helper that configures system-wide SSH ControlMaster in `~/.ssh/config`. That's a separate mechanism from the `ssh-controlmaster` pi extension — they compose, they don't overlap. Use the script for persistent host-wide multiplexing, the extension for per-pi-session remote operation.
@@ -0,0 +1,117 @@
#!/usr/bin/env python3
"""Evaluate pi-fork / pi-observational-memory usage from pi session transcripts.
Mines pi's session .jsonl transcripts and reports:
- per-tool call counts (highlighting `fork` and `recall`)
- per-session fork/recall breakdown
- obsmem passive activity: compaction events, observations carried,
relevance-tier distribution, tokensBefore
Works on any machine. Point it at one or more session roots; by default it
scans ~/.pi/agent/sessions (the standard pi location, host or container).
Usage:
./evaluate-extension-usage.py # ~/.pi/agent/sessions
./evaluate-extension-usage.py /path/to/sessions ... # explicit roots
./evaluate-extension-usage.py --host HOST /path ... # label a root (for combined host+container runs)
For a true host+container picture, run once per machine (or copy each
machine's ~/.pi/agent/sessions here) and pass all roots together.
"""
import json, sys, os, glob, re, collections, argparse

TIER_RE = re.compile(r'\[(low|medium|high|critical)\]')
OBS_LINE_RE = re.compile(r'^\[[0-9a-f]{12}\] ', re.M)


def walk_tools(x, counter):
    """Recursively count every "toolName" value found in a parsed JSON value."""
    if isinstance(x, dict):
        tn = x.get("toolName")
        if tn:
            counter[tn] += 1
        for v in x.values():
            walk_tools(v, counter)
    elif isinstance(x, list):
        for v in x:
            walk_tools(v, counter)


def analyze(roots):
    """Collect tool counts and compaction stats from all transcripts under roots."""
    files = []
    for r in roots:
        if os.path.isfile(r) and r.endswith(".jsonl"):
            files.append(r)
        else:
            files += glob.glob(os.path.join(r, "**", "*.jsonl"), recursive=True)
    files = sorted(set(files))
    tool_total = collections.Counter()
    per_session = []
    compactions = []
    for f in files:
        tc = collections.Counter()
        with open(f, errors="ignore") as fh:
            for ln in fh:
                ln = ln.strip()
                if not ln:
                    continue
                try:
                    o = json.loads(ln)
                except Exception:
                    continue  # skip lines that are not valid JSON
                walk_tools(o, tc)
                # Guard: a JSONL line may parse to a list or scalar, not a dict.
                if isinstance(o, dict) and o.get("type") == "compaction":
                    s = o.get("summary", "") or ""
                    compactions.append({
                        "file": os.path.basename(f),
                        "tokensBefore": o.get("tokensBefore"),
                        "observations": len(OBS_LINE_RE.findall(s)),
                        "tiers": dict(collections.Counter(TIER_RE.findall(s))),
                    })
        tool_total.update(tc)
        per_session.append((os.path.basename(f)[:10], tc.get("fork", 0),
                            tc.get("recall", 0), sum(tc.values())))
    return files, tool_total, per_session, compactions


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("roots", nargs="*",
                    default=[os.path.expanduser("~/.pi/agent/sessions")])
    # The module docstring documents --host; accept it so that usage parses.
    ap.add_argument("--host", default=None,
                    help="label shown in the report header")
    args = ap.parse_args()

    files, tool_total, per_session, comp = analyze(args.roots)
    if not files:
        print("No .jsonl transcripts found under:", args.roots, file=sys.stderr)
        sys.exit(1)

    label = f" [{args.host}]" if args.host else ""
    print(f"=== {len(files)} transcripts under {args.roots}{label} ===\n")
    print("Tool call totals:")
    for t, c in tool_total.most_common():
        mark = " <== pi-fork" if t == "fork" else (" <== obsmem recall" if t == "recall" else "")
        print(f" {c:6d} {t}{mark}")

    fk = tool_total["fork"]
    rc = tool_total["recall"]
    fk_sess = sum(1 for p in per_session if p[1])
    rc_sess = sum(1 for p in per_session if p[2])
    print(f"\npi-fork: {fk} calls across {fk_sess} sessions")
    print(f"recall: {rc} calls across {rc_sess} sessions"
          + (" (!) zero recall over the window; see SKILL.md calibration note" if rc == 0 else ""))

    if comp:
        tot_obs = sum(c["observations"] for c in comp)
        tb = [c["tokensBefore"] for c in comp if c["tokensBefore"]]
        print(f"\nobsmem passive: {len(comp)} compactions, {tot_obs} observations carried"
              + (f", avg tokensBefore {sum(tb)//len(tb):,}" if tb else ""))
        agg = collections.Counter()
        for c in comp:
            agg.update(c["tiers"])
        if agg:
            print(" relevance tiers:", dict(agg))
    else:
        print("\nobsmem passive: no compaction events found "
              "(short sessions, or obsmem not active on these transcripts)")


if __name__ == "__main__":
    main()
@@ -86,6 +86,20 @@ run "global-AGENTS append snippet present" \
"test -f /usr/local/share/pi-devbox/pi-global-AGENTS.append.md"
run "pi-devbox block merged into pi-global-AGENTS.md" \
"grep -q 'pi-devbox:managed-block' /opt/pi-toolkit/pi-global-AGENTS.md"
run "mempalace session-start pointer merged into global AGENTS.md" \
"grep -q 'load the mempalace skill' /opt/pi-toolkit/pi-global-AGENTS.md"
# Vendored fallback skills (so a no-skillset container still resolves the
# AGENTS.md 'read the pi-extensions skill' pointer).
run "image-baked pi-extensions fallback skill" \
"test -f /usr/local/share/pi-devbox/skills/pi-extensions/SKILL.md"
run "pi-extensions skill ships its helper" \
"test -f /usr/local/share/pi-devbox/skills/pi-extensions/evaluate-extension-usage.py"
run "image-baked mempalace fallback skill" \
"test -f /usr/local/share/pi-devbox/skills/mempalace/SKILL.md"
# Layered freshness: when the pinned pi-extensions clone carries the skill, the
# baked copy must be the fresh package copy (Option 1), not the stale snapshot.
run "pi-extensions skill refreshed from package when present" \
"if [ -f /opt/pi-extensions/skill/SKILL.md ]; then cmp -s /opt/pi-extensions/skill/SKILL.md /usr/local/share/pi-devbox/skills/pi-extensions/SKILL.md; else true; fi"
# ── tmux 0-indexing (required for pi-studio variants) ─────────────────
echo ""
@@ -163,6 +177,8 @@ for i in $(seq 1 45); do
test -L /home/developer/.pi/agent/keybindings.json && \
test -L /home/developer/.pi/agent/extensions/mempalace.ts && \
test -L /home/developer/.agents/skills/pi-devbox-environment && \
test -L /home/developer/.agents/skills/pi-extensions && \
test -L /home/developer/.agents/skills/mempalace && \
count=$(ls -1 /home/developer/.pi/agent/extensions/*.ts 2>/dev/null | wc -l) && \
[ "$count" -ge 4 ]
' >/dev/null 2>&1; then
@@ -185,6 +201,8 @@ exec_test "extensions ≥ 4 (pi-extensions)" 'count=$(ls -1 $HOME/.pi/age
exec_test "mempalace.ts bridge" 'test -L $HOME/.pi/agent/extensions/mempalace.ts && echo ok'
exec_test "settings.json bootstrapped" 'test -f $HOME/.pi/agent/settings.json && echo ok'
exec_test "pi-devbox-environment skill linked" 'test -L $HOME/.agents/skills/pi-devbox-environment && test -f $HOME/.agents/skills/pi-devbox-environment/SKILL.md && echo ok'
exec_test "pi-extensions skill linked (fallback)" 'test -L $HOME/.agents/skills/pi-extensions && test -f $HOME/.agents/skills/pi-extensions/SKILL.md && echo ok'
exec_test "mempalace skill linked (fallback)" 'test -L $HOME/.agents/skills/mempalace && test -f $HOME/.agents/skills/mempalace/SKILL.md && echo ok'
# pi-fork + pi-observational-memory are registered by entrypoint-user.sh via
# `pi install /opt/<pkg>`, which runs slightly after the keybindings marker.