# pi-devbox

A self-contained Docker image for running [pi](https://pi.dev) — the pi
coding-agent — in an isolated, reproducible Linux environment with a
curated set of developer tooling, AI memory, and shell improvements.

pi-devbox is opinionated about what's inside but unopinionated about how
you use it: a single `docker compose up` gives you an interactive
container with pi, a stack of modern CLI tools, MemPalace for persistent
agent memory across sessions, and a UID-aligned `/workspace` mount so
files you edit inside the container appear with your normal ownership
on the host.

## What's inside

### The pi coding-agent

- `pi` — the pi-coding-agent CLI (`@earendil-works/pi-coding-agent`)
- `pi-toolkit` — keybindings, AWS env loader, settings template
- `pi-extensions` — TypeScript extensions for pi (preview, MCP bridges,
  mempalace integration, etc.)
- `pi-fork` — the `fork` tool for spawning sub-agents
- `pi-observational-memory` — the `recall` tool for session compaction

### MemPalace (AI memory)

- `mempalace` — local-first agent memory system (29 MCP tools)
- `mempalace-toolkit` — bash wrappers for session/docs mining
- ChromaDB embedding model pre-warmed at build time (`all-MiniLM-L6-v2`)

When the host's `~/.mempalace` is bind-mounted into the container, the
host and the container share one palace, so all your agents share one
brain.

### Modern CLI tooling

| Tool | Purpose |
|---|---|
| `nvim` | Neovim text editor |
| `tmux` | Terminal multiplexer (configured for 0-indexed windows and panes) |
| `ripgrep`, `fd` | Fast file content / filename search |
| `fzf` | Fuzzy finder |
| `bat` | Syntax-highlighted `cat` |
| `eza` | Modern `ls` |
| `zoxide` | Smart `cd` |
| `jq`, `yq` | JSON / YAML query and transformation |
| `tldr` (tealdeer) | Quick command examples |
| `git`, `git-lfs`, `git-crypt` | Git + extensions |
| `gitleaks` | Secret scanning pre-commit hook |
| `gosu` | Privilege de-escalation in entrypoint |
| `htop`, `tree`, `less` | Inspection utilities |

### Document and image tooling

- `pandoc` — universal Markdown↔HTML/Org/RST/etc. converter
- `graphviz` — `dot` rendering for diagram pipelines
- `imagemagick` — image conversion / resizing (invoked as `magick`)

### Language toolchains

- `python3` + `python3-venv` + `python3-pip` (system Python)
- `uv` + `uvx` — fast Python package manager (preferred over pip/venv)
- `nodejs` (v22) + `npm`
- `gcc`, `g++`, `make` — C/C++ build tools
- `rustup-init` — Rust toolchain installer (toolchains opt-in at runtime)
- Optional `INSTALL_GO=true` build arg for Go

For Python REPLs and notebooks beyond the system interpreter, see the
[uv-driven REPL recipes](#uv-driven-repl-recipes) section.

### Cloud + secrets

- AWS CLI v2 — for SSO + Bedrock auth
- `gitea-mcp` — MCP server for Gitea API
- `age`, `git-crypt` — encryption tooling

### SSH and networking

- OpenSSH client with **ControlMaster auto** preconfigured on a
  writable socket path (`/tmp/sshcm/`). Mitigates ssh banner-exchange
  failures behind CGNAT-restricted residential ISPs (~4-flow caps) by
  multiplexing many ssh calls over one TCP flow.
- A LAN-access helper that auto-configures ssh jump-via-host on
  VM-backed hosts (OrbStack / Docker Desktop on macOS) so the container
  can reach the host's directly-attached LAN peers.

## Quickstart

### Prerequisites

- Docker or OrbStack (recommended on macOS)
- Optional: AWS credentials configured on the host if you'll use the
  Bedrock LLM provider

### Pull and run

```bash
git clone https://gitea.jordbo.se/joakimp/pi-devbox
cd pi-devbox
cp .env.example .env       # edit if needed
docker compose up -d
docker compose exec -u developer devbox bash
```

You're now in the container as user `developer` with `pi` on PATH and
your host workspace mounted at `/workspace`.

To start pi:

```bash
pi
```

First-run pi-toolkit and pi-extensions install steps run automatically
on container start; symlinks are written to `~/.pi/agent/` on the
named volume (so they persist across container recreations).

### Stop / recreate / update

```bash
docker compose down              # stop, keep volumes
docker compose down -v           # stop, wipe named volumes (a bind-mounted palace is unaffected; a devbox-palace volume is wiped)
docker compose pull              # fetch latest image
docker compose up -d --force-recreate
```

## Image variants

Currently published:

| Tag | Includes | Size (approx.) |
|---|---|---|
| `joakimp/pi-devbox:latest` | base + pi + tooling | ~3.2 GB |
| `joakimp/pi-devbox:vX.Y.Z` | pinned-version equivalent | ~3.2 GB |
| `joakimp/pi-devbox:latest-studio` | `latest` + [pi-studio](https://github.com/omaclaren/pi-studio) (browser prompt editor, KaTeX/Mermaid preview, tmux-backed literate REPLs) | ~3.25 GB |
| `joakimp/pi-devbox:vX.Y.Z-studio` | pinned-version studio equivalent | ~3.25 GB |

Planned for an upcoming minor release:

- `joakimp/pi-devbox:latest-studio-tex` — `-studio` plus `texlive-xetex`
  for PDF export from Studio. Adds ~600 MB on top of `-studio`.

## Using pi-studio (`-studio` variant)

The `-studio` images bundle [pi-studio](https://github.com/omaclaren/pi-studio):
a two-pane browser workspace with a prompt/response editor, live
KaTeX/Mermaid preview, and tmux-backed literate REPLs (Shell / Python /
IPython / Julia / R / GHCi / Clojure). It is registered automatically on
container start (no `pi install` needed) and exposes the `/studio` slash
command plus the `studio_repl_send` / `studio_export_*` agent tools.

Inside a pi session in the container:

```
/studio --no-browser --port 8765      # pin a fixed port; STUDIO_PORT=8765 is the baked default
/studio --status                      # reprint the tokenized URL
```

### Reaching the UI from your browser (the container caveat)

pi-studio **hard-binds its server to `127.0.0.1` inside the container**
(`index.ts`: `.listen(port, "127.0.0.1")`) and serves a tokenized URL.
There is no `--host`/bind flag. This matters for a container: a plain
`docker run -p 8765:8765` publish forwards to the container's *external*
interface, **not** its loopback, so it will not reach Studio. Two paths
work:

**A. Host networking (simplest — OrbStack / single-host, no bridge).**
Run the container with host networking so the container's loopback is the
host's loopback:

```yaml
services:
  devbox:
    network_mode: host     # container 127.0.0.1 == host 127.0.0.1
```

Then `http://127.0.0.1:8765/?token=…` works in a browser on the Docker
host. This is the most secure option (Studio never leaves loopback). Note:
host networking changes `host.docker.internal` semantics, so weigh it
against the LAN-jump SSH feature if you use that.

**B. `studio-expose` bridge (portable — any networking mode).** Publish a
port and run the bundled `studio-expose` helper, which uses `socat` to
bridge the container's loopback to its external interface (binding the
egress IP on the same port, so the token URL Studio printed works
verbatim):

```yaml
services:
  devbox:
    ports:
      - "127.0.0.1:8765:8765"   # host-localhost only
    environment:
      - STUDIO_EXPOSE=1          # auto-start the bridge on container boot
```

With `STUDIO_EXPOSE=1`, the entrypoint starts the bridge for you; just run
`/studio --port 8765` in your pi session. To bridge manually instead
(leave `STUDIO_EXPOSE` unset), run `studio-expose` in a container shell:

```bash
studio-expose            # bridges $STUDIO_PORT (default 8765); --help for details
```
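
As a rough illustration of what such a bridge does, the sketch below prints a `socat` command that listens on the container's external address and forwards to Studio on loopback. It is not the bundled `studio-expose` script: `bridge_cmd` and `egress_ip` are hypothetical names, and using `hostname -i` to find the external address is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: approximates a loopback-to-external bridge with socat.
# bridge_cmd and egress_ip are hypothetical names, not part of studio-expose.

# Print the socat command that listens on <ip>:<port> and forwards each
# connection to 127.0.0.1:<port>, where Studio is bound.
bridge_cmd() {
  local ip="$1" port="${2:-8765}"
  printf 'socat TCP-LISTEN:%s,bind=%s,fork,reuseaddr TCP:127.0.0.1:%s\n' \
    "$port" "$ip" "$port"
}

# Best-effort guess at the container's external address (assumption:
# the first address reported by `hostname -i` is the one to bind).
egress_ip() {
  hostname -i 2>/dev/null | awk '{print $1}'
}

# Dry run: show the command instead of starting a listener.
bridge_cmd "$(egress_ip)" "${STUDIO_PORT:-8765}"
```

Binding the specific external address rather than `0.0.0.0` matters here: Studio already listens on `127.0.0.1:8765`, so a wildcard listener on the same port would typically fail with "address already in use".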

> **Security:** the bridge intentionally exposes Studio beyond loopback;
> its tokenized URL is the only auth. Keep the host-side publish on
> `127.0.0.1:` and use `ssh -L` for remote access. Default is **off**.

### Remote host (SSH / mosh)

When the Docker host is remote, keep Studio on localhost and forward the
port from your laptop:

```bash
ssh -L 8765:127.0.0.1:8765 user@docker-host      # then open the token URL locally
```

**mosh cannot forward ports** (no `-L`/`-R` equivalent). To use Studio
over a mosh session, run a *separate* `ssh -L 8765:127.0.0.1:8765 host`
tunnel alongside mosh (mosh for the shell, ssh for the port), or reach the
host's published port directly over a trusted network (LAN / Tailscale /
WireGuard).

> PDF export (`/studio-pdf`, `studio_export_pdf`) needs a LaTeX engine,
> which is **not** in `-studio` (only the planned `-studio-tex`). HTML
> export, KaTeX, Mermaid, and all REPL features work without it.

## docker-compose.yml — basic shape

```yaml
name: pi-devbox

services:
  devbox:
    image: joakimp/pi-devbox:latest
    container_name: pi-devbox
    stdin_open: true
    tty: true
    env_file:
      - .env
    environment:
      - TERM=xterm-256color
      - GITEA_ACCESS_TOKEN=${GITEA_ACCESS_TOKEN:-}
      - GITEA_HOST=${GITEA_HOST:-}
      - GITHUB_PERSONAL_ACCESS_TOKEN=${GITHUB_PERSONAL_ACCESS_TOKEN:-}
    volumes:
      # Workspace: your host source tree
      - ${WORKSPACE_PATH:-.}:/workspace
      # SSH keys: read-only from host
      - ${SSH_KEY_PATH:-~/.ssh}:/home/developer/.ssh:ro
      # Per-container persistent state
      - devbox-pi-config:/home/developer/.pi
      - devbox-ssh-local:/home/developer/.ssh-local
      - devbox-shell-history:/home/developer/.cache/bash
      - devbox-zoxide:/home/developer/.local/share/zoxide
      - devbox-nvim-data:/home/developer/.local/share/nvim
      - devbox-uv:/home/developer/.local/share/uv
      # Optional (uncomment to enable):
      # - ~/.aws:/home/developer/.aws                          # AWS creds
      # - devbox-palace:/home/developer/.mempalace             # persist palace
      # - devbox-chroma-cache:/home/developer/.cache/chroma    # embedding cache

volumes:
  devbox-pi-config:
  devbox-ssh-local:
  devbox-shell-history:
  devbox-zoxide:
  devbox-nvim-data:
  devbox-uv:
  # devbox-palace:
  # devbox-chroma-cache:
```

See `docker-compose.yml` and `.env.example` in the repo for the full
template (build-from-source args, LAN-jump and skillset mounts, MemPalace
persistence). To share one palace between host pi and the container,
bind-mount your host `~/.mempalace` to `/home/developer/.mempalace`.
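
One way to add that bind mount without editing the main file is a Compose override file, which `docker compose` merges automatically when it is named `docker-compose.override.yml`. The helper name `write_palace_override` below is hypothetical, and the sketch assumes the service is called `devbox` as in the template above.

```bash
#!/usr/bin/env bash
# Sketch: generate an override file that bind-mounts the host palace.
# write_palace_override is a hypothetical helper, not shipped in the repo.

write_palace_override() {
  local out="${1:-docker-compose.override.yml}"
  # Quoted heredoc delimiter: the YAML is written verbatim, with no
  # shell expansion of ~ (Compose expands it when reading the file).
  cat > "$out" <<'EOF'
services:
  devbox:
    volumes:
      # Share the host palace with the container
      - ~/.mempalace:/home/developer/.mempalace
EOF
}

# Demo: write to a scratch directory and show the result.
scratch="$(mktemp -d)"
write_palace_override "$scratch/docker-compose.override.yml"
cat "$scratch/docker-compose.override.yml"
```

Run `write_palace_override` with no argument from the repo root to create the file next to `docker-compose.yml`, then recreate the container so the new mount takes effect.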

## uv-driven REPL recipes

uv is installed in the base image and is the recommended way to run
Python interpreters and notebooks without bloating the image:

| Goal | One-liner |
|---|---|
| IPython REPL | `uv run --with ipython ipython` |
| IPython + scientific stack | `uv run --with ipython --with numpy --with matplotlib --with pandas ipython` |
| JupyterLab (browser, port-forward needed) | `uv run --with jupyterlab jupyter lab --no-browser --port 8888` |
| Marimo (modern alternative) | `uv run --with marimo marimo edit --port 8889` |

For long-lived environments, prefer a project venv:

```bash
cd /workspace/myproj
uv init && uv add ipython numpy matplotlib
# then:
uv run ipython
```

`pyproject.toml` + `uv.lock` then capture the dependency state and
travel with the project in git.

uv only manages Python. For other languages:

| Toolchain | How to add |
|---|---|
| R | `sudo apt-get install r-base-core` (~200 MB) |
| GHCi (Haskell) | `sudo apt-get install ghc` (~700 MB) |
| Clojure | `sudo apt-get install clojure` (~150 MB + JVM) |
| Julia | `juliaup` is planned for an upcoming release |

These are runtime opt-ins that live only in the container's writable
layer, so they are lost whenever the container is removed or recreated
(`docker compose down`, `up --force-recreate`, image updates).

## tldr — first-run cache

The `tldr` command (provided by tealdeer) shows a "Page cache not
found" message on first invocation. To populate the cache:

```bash
tldr --update
```

This fetches ~1500 command pages from the [tldr-pages](https://tldr.sh)
project and caches them in `~/.cache/tealdeer/`. After that, `tldr ls`,
`tldr docker`, etc. work instantly. Re-run `tldr --update` periodically
to refresh.
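
A shell-startup check can flag a missing cache. The sketch below is a minimal example, assuming the cache lives in `~/.cache/tealdeer/` as described above; `tldr_cache_missing` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: detect an absent or empty tealdeer cache directory.
# tldr_cache_missing is a hypothetical helper name.

# Succeeds (exit 0) when the directory does not exist or has no entries.
tldr_cache_missing() {
  local dir="${1:-$HOME/.cache/tealdeer}"
  [ ! -d "$dir" ] || [ -z "$(ls -A "$dir" 2>/dev/null)" ]
}

if tldr_cache_missing; then
  echo "tldr cache missing; run: tldr --update"
fi
```

Replacing the `echo` with `tldr --update` would populate the cache automatically, at the cost of a network fetch on the first shell of each fresh container.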

## Volumes and persistence

| Path inside container | Mount | Persistence |
|---|---|---|
| `/workspace` | host bind-mount (`WORKSPACE_PATH`) | lives on the host filesystem |
| `~/.ssh` | host bind-mount (read-only, `SSH_KEY_PATH`) | lives on the host filesystem |
| `~/.pi` | named volume `devbox-pi-config` | kept until `down -v` |
| `~/.ssh-local` | named volume `devbox-ssh-local` | kept until `down -v` |
| `~/.cache/bash` | named volume `devbox-shell-history` | kept until `down -v` |
| `~/.local/share/zoxide` | named volume `devbox-zoxide` | kept until `down -v` |
| `~/.local/share/nvim` | named volume `devbox-nvim-data` | kept until `down -v` |
| `~/.local/share/uv` | named volume `devbox-uv` | kept until `down -v` |
| `~/.mempalace` | host bind-mount or named volume `devbox-palace` (optional) | host filesystem, or kept until `down -v` |
| `~/.cache/chroma` | named volume `devbox-chroma-cache` (optional) | kept until `down -v` |

Anything not on a volume is on the writable layer and is lost on
container recreate.
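
To see which side of that line a given path falls on, you can ask `df` for its mount point. This is a minimal sketch with hypothetical helper names (`mount_point_of`, `on_writable_layer`); it assumes that a path whose mount point is `/` sits on the container's writable layer, and it does not distinguish persistent mounts from other mounts such as tmpfs.

```bash
#!/usr/bin/env bash
# Sketch: report the mount point backing a path.
# mount_point_of and on_writable_layer are hypothetical helper names.

# Print the mount point of the filesystem that holds <path>.
# `df -P` uses the portable layout, where column 6 is "Mounted on".
mount_point_of() {
  df -P "$1" 2>/dev/null | awk 'NR == 2 { print $6 }'
}

# Succeeds when <path> is served by the root filesystem, which inside
# the container is the image plus its writable layer.
on_writable_layer() {
  [ "$(mount_point_of "$1")" = "/" ]
}

for p in / "$HOME"; do
  if on_writable_layer "$p"; then
    echo "$p: root filesystem (lost on recreate inside the container)"
  else
    echo "$p: mounted at $(mount_point_of "$p")"
  fi
done
```

Inside the container, a path such as `~/.pi` should report its own mount point rather than `/` when the named volume is attached.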

## MemPalace integration

MemPalace is installed in the base image and pre-warmed with the
ChromaDB ONNX embedding model, so the first semantic search does not
have to download the model.

With the host's `~/.mempalace` bind-mounted into the container, the
palace data lives at `~/.mempalace/palace` on the host. This means:

- A pi running on the host and a pi running inside this container see
  the same palace.
- SQLite's WAL mode handles concurrent reads + single writer cleanly,
  so simultaneous use is safe in practice.

`mempalace-session` and `mempalace-docs` are on PATH for one-off
session/docs mining; the 29 MCP tools (search, kg-query, drawer-add,
diary-write, etc.) are wired into pi automatically by the pi-extensions
mempalace bridge.

## SSH and ControlMaster

The base image preconfigures `Host *` ssh defaults:

```
ControlMaster auto
ControlPath /tmp/sshcm/%r@%h:%p
ControlPersist 10m
```

The socket directory `/tmp/sshcm/` is created mode 700 on every
container start (per-container, tmpfs-friendly). Multiple ssh calls
to the same host within 10 minutes reuse the master TCP flow —
important on residential ISPs with CGNAT per-destination flow caps
(~4 flows on most European broadband; symptoms are
`kex_exchange_identification: Connection closed by remote host` on
the 5th+ concurrent ssh).

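User-level overrides in `~/.ssh/config` win because ssh reads the user
config before the system-wide one and keeps the first value it obtains
for each option. Drop-in files, in turn, take precedence over Debian's
stock `Host *` settings because `/etc/ssh/ssh_config` includes
`/etc/ssh/ssh_config.d/*.conf` before its own `Host *` block.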
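
To check whether a shared connection is in place, you can look for the control socket that the `ControlPath` template above would produce. The helper names `control_path` and `master_active` are hypothetical, and `gitea.example.com` is a placeholder host.

```bash
#!/usr/bin/env bash
# Sketch: locate the ControlMaster socket for a given target.
# control_path and master_active are hypothetical helper names.

# Expand the template /tmp/sshcm/%r@%h:%p by hand:
#   %r = remote user, %h = remote host name, %p = remote port
control_path() {
  local user="$1" host="$2" port="${3:-22}"
  printf '/tmp/sshcm/%s@%s:%s\n' "$user" "$host" "$port"
}

# Succeeds when a socket file exists at the expanded path.
master_active() {
  [ -S "$(control_path "$@")" ]
}

control_path git gitea.example.com 22

# With a real host, ssh can report on the master itself (not run here):
#   ssh -O check git@gitea.example.com   # is the master alive?
#   ssh -O exit  git@gitea.example.com   # close the shared connection
```

`ssh -O check` is the more reliable test, since a stale socket file can outlive its master process.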

## tmux and 0-indexed sessions

The image installs `/etc/tmux.conf` with:

```
set -g base-index 0
set -g pane-base-index 0
```

These are tmux's default values. The image sets them explicitly because
`pi-studio` (shipped in the `:latest-studio` variant) hard-codes its tmux
send target to `<session>:0.0`, that is, window 0 and pane 0. If you
override `base-index` to 1 in a personal `~/.tmux.conf`, pi-studio will
fail with "can't find window: 0".
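
A quick way to spot such an override is to scan the config files for non-zero index settings. This is a minimal sketch, assuming the options are set on a single line in the usual `set -g <option> <value>` form; `nonzero_base_index` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: list base-index / pane-base-index settings that are not 0.
# nonzero_base_index is a hypothetical helper name.

nonzero_base_index() {
  local conf="$1"
  [ -f "$conf" ] || return 0      # a missing file overrides nothing
  awk '
    /^[[:space:]]*#/ { next }     # skip comment lines
    {
      # Look for "<option> <value>" anywhere on the line, so that
      # set, set-option, and setw forms are all covered.
      for (i = 1; i < NF; i++)
        if (($i == "base-index" || $i == "pane-base-index") && $(i + 1) != "0")
          print $i, $(i + 1)
    }
  ' "$conf"
}

for conf in /etc/tmux.conf "$HOME/.tmux.conf"; do
  bad="$(nonzero_base_index "$conf")"
  [ -z "$bad" ] || echo "$conf overrides: $bad"
done
```

Against a running server, `tmux show-options -gv base-index` prints the value that is actually in effect.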

## AWS Bedrock auth

If you use Bedrock as pi's LLM provider:

1. Configure SSO on the host: `aws configure sso`
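2. Bind-mount `~/.aws:/home/developer/.aws`. Append `:ro` only if you
   always log in on the host: `aws sso login` inside the container
   writes its token cache under `~/.aws/sso/cache`.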
3. Set `AWS_PROFILE` and `AWS_REGION` in `.env`
4. Inside the container: `aws sso login` if needed; pi picks up the
   profile via the env vars.

The pi-toolkit AWS env loader (in `~/.pi/agent/`) prepares Bedrock
inference-profile model IDs (with `eu.` / `us.` prefixes) automatically.
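
Before starting pi, a short preflight can confirm that the variables from step 3 actually reached the container. This is a minimal sketch; `bedrock_env_missing` is a hypothetical helper name and is not part of pi-toolkit.

```bash
#!/usr/bin/env bash
# Sketch: report which of the Bedrock-related variables are unset.
# bedrock_env_missing is a hypothetical helper name.

# Print the names of unset or empty variables, space-separated.
bedrock_env_missing() {
  local missing=""
  [ -n "${AWS_PROFILE:-}" ] || missing="$missing AWS_PROFILE"
  [ -n "${AWS_REGION:-}" ]  || missing="$missing AWS_REGION"
  printf '%s\n' "${missing# }"   # drop the leading separator
}

missing="$(bedrock_env_missing)"
if [ -n "$missing" ]; then
  echo "unset in environment: $missing"
fi

# Once both are set, this confirms the SSO session is still valid
# (not run here; it fails when the session has expired):
#   aws sts get-caller-identity --profile "$AWS_PROFILE"
```

If the `aws sts` call fails, run `aws sso login` and try again.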

## Build pipeline

pi-devbox is built from this repo's CI in two phases:

1. **Base** (`Dockerfile.base`) — produces `joakimp/pi-devbox:base-<hash>`
   where `<hash>` is content-addressed over `Dockerfile.base`,
   `rootfs/`, and `entrypoint*.sh`. Rebuilt only when these change.
2. **Variant** (`Dockerfile.variant`) — `FROM ${BASE_IMAGE}` and adds
   the pi install (+ pi-studio when `INSTALL_STUDIO=true`). The `:latest`
   / `vX.Y.Z` and `:latest-studio` / `vX.Y.Z-studio` tags are produced
   from this layer. The studio variant builds via independent
   `smoke-studio` + `build-variant-studio` CI jobs that gate only the
   `-studio` tags.
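
The content-addressed base tag can be pictured as a digest over the listed inputs. The sketch below shows one way to compute such a digest; it is an assumption about the approach, not the repo's actual CI script, and `base_hash` and the 12-character truncation are hypothetical choices.

```bash
#!/usr/bin/env bash
# Sketch: derive a short content hash from the base-image inputs.
# base_hash is a hypothetical helper; the real CI may hash differently.

# base_hash <repo-dir>: digest of Dockerfile.base, rootfs/, entrypoint*.sh
base_hash() {
  local dir="$1"
  (
    cd "$dir" || exit 1
    # Sort the file list so the digest does not depend on traversal
    # order. Each sha256sum line includes the path, so renames count.
    find Dockerfile.base rootfs entrypoint*.sh -type f 2>/dev/null \
      | LC_ALL=C sort \
      | while IFS= read -r f; do sha256sum "$f"; done \
      | sha256sum \
      | cut -c1-12
  )
}

echo "joakimp/pi-devbox:base-$(base_hash .)"
```

Because the digest covers only those inputs, edits elsewhere in the repo leave the tag unchanged, so the existing base image can be reused.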

Tag naming:

| Tag | Stage |
|---|---|
| `base-<hash>` | base image — internal building block |
| `base-latest` | promoted alias of the most recent base |
| `latest`, `vX.Y.Z` | variant: base + pi |
| `latest-studio`, `vX.Y.Z-studio` | variant: base + pi + pi-studio |

CI resolves `PI_VERSION` to a concrete version string before building.
Otherwise the build arg would be the byte-identical string `latest`
across releases, the registry build cache would hit on the
`npm install -g @earendil-works/pi-coding-agent@latest` layer, and the
image would silently reuse the previous version's bytes.
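
The resolution step can be sketched as follows. This is an illustration rather than the actual CI code: `resolve_pi_version` is a hypothetical helper, and `1.2.3` is a placeholder version.

```bash
#!/usr/bin/env bash
# Sketch: turn "latest" into a concrete version before building.
# resolve_pi_version is a hypothetical helper name.

resolve_pi_version() {
  local requested="${1:-latest}"
  if [ "$requested" = "latest" ]; then
    # Ask the npm registry for the version behind the "latest" tag.
    npm view @earendil-works/pi-coding-agent version
  else
    # Already concrete: pass it through unchanged.
    printf '%s\n' "$requested"
  fi
}

resolve_pi_version 1.2.3    # placeholder version, no network access

# In a build (not run here):
#   docker build -f Dockerfile.variant \
#     --build-arg PI_VERSION="$(resolve_pi_version latest)" .
```

With a concrete value, the build-arg string changes whenever a new pi release is published, which invalidates the cached install layer.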

## Troubleshooting

### Image grew unexpectedly

`docker history joakimp/pi-devbox:latest` shows per-layer sizes. The
biggest layers are typically the apt block (~600 MB), pi npm install
(~330 MB), MemPalace + ChromaDB (~315 MB), AWS CLI (~270 MB), Node.js
(~200 MB).
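
To rank layers by size rather than read them in build order, the byte counts from `docker history --human=false` can be sorted numerically. This is a minimal sketch; `largest_layers` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: rank image layers by size.
# largest_layers is a hypothetical helper name.

# Read "<bytes><TAB><created-by>" lines on stdin and print the N
# largest (default 5), with sizes converted to decimal megabytes.
largest_layers() {
  sort -t "$(printf '\t')" -k1,1 -rn \
    | head -n "${1:-5}" \
    | awk -F '\t' '{ printf "%d MB\t%s\n", $1 / 1000000, $2 }'
}

# Demo with made-up sample rows (not real measurements):
printf '330000000\tnpm install\n600000000\tapt-get install\n0\tCMD\n' \
  | largest_layers 2

# Against a real image (not run here):
#   docker history --human=false --no-trunc \
#     --format '{{.Size}}\t{{.CreatedBy}}' joakimp/pi-devbox:latest \
#     | largest_layers 5
```

Comparing the ranked output between two tags is a quick way to see which layer accounts for the growth.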

### pi can't reach LAN peers on macOS

The LAN-access helper (`/usr/local/lib/pi-devbox/setup-lan-access.sh`)
runs automatically on container start and writes `~/.ssh-local/config`
with an ssh jump-via-host configuration. Set `DEVBOX_LAN_ACCESS=jump` and
`HOST_SSH_USER=<your-mac-user>` in `.env` if auto-detection fails.

### Smoke-testing a local build

```bash
./scripts/smoke-test.sh joakimp/pi-devbox:latest
```

## Versioning and release

pi-devbox follows a semver-like scheme:

- **Major** — architectural changes. `v1.0.0` is the first decoupled
  release (independent of opencode-devbox).
- **Minor** — new variants, significant base additions.
- **Patch** — pi version bumps, smaller fixes.

Smoke tests assert that `pi --version` inside the image matches the
release tag's pi component, so version drift between the image and the
tag is caught in CI.

## Acknowledgements

pi-devbox was originally a thin re-brand of the `pi-only` variant of
[opencode-devbox](https://gitea.jordbo.se/joakimp/opencode-devbox).
It was decoupled at `v1.0.0` so it could evolve at its own pace, with
self-contained docs and a focused, pi-centric image. Significant base
infrastructure (the SSH ControlMaster setup, MemPalace integration,
the entrypoint UID/GID dance) was adopted from there.

The pi coding-agent itself is [@earendil-works/pi-coding-agent](https://www.npmjs.com/package/@earendil-works/pi-coding-agent).

## License

MIT