pi-devbox/scripts/smoke-test.sh
pi c1154f1fa6
Publish Docker Image / resolve-versions (push) Successful in 5s
Publish Docker Image / base-decide (push) Successful in 12s
Publish Docker Image / build-base (push) Successful in 45m47s
Publish Docker Image / smoke (push) Successful in 8m18s
Publish Docker Image / build-variant (push) Successful in 22m41s
Publish Docker Image / update-description (push) Failing after 9s
Publish Docker Image / promote-base-latest (push) Successful in 14s
v1.0.0: decouple from opencode-devbox
Self-contained build chain — own Dockerfile.base + Dockerfile.variant
+ entrypoint scripts + rootfs + CI pipeline. Previously v0.79.0 and
earlier were thin re-brands of opencode-devbox's pi-only variant
(joakimp/pi-devbox:base-pi-only built by opencode-devbox CI).

Architectural changes:
- Replace 5-line Dockerfile shim with full base+variant pair.
- Adapt CI workflow from opencode-devbox/docker-publish-split.yml,
  simplified to a single variant. Includes content-addressed base hash,
  resolution of PI_VERSION to a concrete version (so the registry-backed
  build cache cannot reuse a stale pi install layer),
  crane-based base-latest promotion, and the c6f9d11 smoke-test gate.
- pi-devbox releases no longer require rebuilding opencode-devbox first.
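The workflow itself is not shown here, so how it computes the content-addressed base hash is an assumption. One minimal sketch is to hash the path and contents of every base input in a stable sort order, so the tag changes exactly when a base input changes. The `base_hash` helper and the input paths in the usage line are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the actual pi-devbox workflow logic:
# derive a short content-addressed tag from a set of input files/dirs.
set -euo pipefail

# base_hash <path>...
# Hashes every regular file under the given paths (sha256sum records both
# the path and the content hash), sorts the file list with a fixed locale
# so the order is stable, then hashes that listing and keeps 12 hex chars.
base_hash() {
  find "$@" -type f -print0 \
    | LC_ALL=C sort -z \
    | xargs -0 sha256sum \
    | sha256sum \
    | cut -c1-12
}
```

Usage, run from the repo root so the hashed paths are stable: `tag="base-$(base_hash Dockerfile.base rootfs)"`. A job such as base-decide could then skip build-base when that tag already exists in the registry.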

Base image additions:
- pandoc, graphviz, imagemagick, yq — broadly useful, ~260 MB total.
- tldr (tealdeer) — Rust port replaces Node tldr global, saves 135 MB.
- /etc/tmux.conf with base-index 0 + pane-base-index 0 — required for
  the planned :latest-studio variant; pi-studio hard-codes the :0.0 tmux target.
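The smoke test checks /etc/tmux.conf with a fixed-substring match, which would also accept a commented-out `# set -g base-index 0`. A stricter check can require the directive as a whole line. A minimal sketch; `has_exact_line` is a hypothetical helper, not part of the repo:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE contains LINE as a complete
# line (fixed string, -x whole-line match), so a commented-out copy of
# the directive does not count as present.
set -euo pipefail

has_exact_line() {
  local file="$1" line="$2"
  grep -qxF -- "$line" "$file"
}
```

Inside the container no helper is needed: `grep -qxF 'set -g base-index 0' /etc/tmux.conf` could be passed to the script's existing `run`. The trade-off is that an indented directive, or one with a trailing comment, would no longer match.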

Smoke test:
- New checks for pandoc, graphviz, imagemagick, yq, tldr, tmux config,
  /tmp/sshcm directory.
- Image-size measurement now sums docker history layers (the prior
  inspect --format='{{.Size}}' returned only the variant-unique layer
  with the new base/variant split, understating by 2+ GB).
- Threshold 2850 → 3500 MB to absorb base additions + arch margin.
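A simpler way to get the same layer total avoids parsing human-readable suffixes altogether. This sketch assumes `docker history --human=false --format '{{.Size}}'` prints each layer's size as a raw byte count (the CLI's `--human` flag defaults to true; verify the output on your Docker version). `sum_layer_mb` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: total image size in MB from raw per-layer byte counts.
set -euo pipefail

# sum_layer_mb: read one byte count per line on stdin and print the total
# in decimal megabytes (1 MB = 1,000,000 bytes), truncated to an integer.
# Empty input prints 0 because awk treats the unset total as 0.
sum_layer_mb() {
  awk '{ total += $1 } END { printf "%d\n", total / 1000000 }'
}

# Usage (requires Docker):
#   SIZE_MB=$(docker history --human=false --format '{{.Size}}' "$IMAGE" | sum_layer_mb)
```

Whether to gate on decimal or binary megabytes is a policy choice. Pick one and calibrate SIZE_THRESHOLD_MB against it.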

Image size:
- Local arm64 build: 3.20 GB, ~390 MB larger than the prior pi-only equivalent.
- Will tighten threshold once amd64 actuals settle in CI.

Pre-1.0 history preserved at tag pre-v1.0.0-decouple-backup.

Future work:
- v1.1.0: :latest-studio variant (adds pi-studio).
- v1.2.0: :latest-studio-tex variant (adds texlive-xetex for PDF).
- opencode-devbox v2.0.0 will retire INSTALL_PI / pi-only paths.
2026-06-10 01:14:07 +02:00

190 lines
8.7 KiB
Bash
Executable File

#!/usr/bin/env bash
# smoke-test.sh — sanity checks for the pi-devbox image
#
# Usage: ./scripts/smoke-test.sh <image>
#
# Verifies:
# - pi binary present and (if EXPECTED_PI_VERSION set) matches CI's resolved version
# - new v1.0.0 base additions (pandoc, graphviz, imagemagick, yq, tealdeer)
# - tmux 0-indexing baked in /etc/tmux.conf (required for pi-studio variants)
# - pi-toolkit cloned at /opt/pi-toolkit
# - pi-extensions cloned at /opt/pi-extensions
# - pi-fork + pi-observational-memory cloned with node_modules baked
# - entrypoint deploys pi-toolkit keybindings symlink
# - entrypoint deploys ≥4 extensions
# - mempalace bridge symlink present
# - settings.json bootstrapped
# - pi-fork + pi-observational-memory registered via `pi install`
# - image size within threshold
set -euo pipefail
IMAGE="${1:?usage: $0 <image>}"
PASS=0; FAIL=0
# pi-devbox v1.0.0 (decoupled from opencode-devbox) added pandoc, graphviz,
# imagemagick, yq, tealdeer, and a baked /etc/tmux.conf. Local arm64 build
# observed 3.20 GB. CI amd64 builds may differ slightly; threshold below
# carries +300 MB margin to absorb arch differences without false reds.
# Tighten in a follow-up release once amd64 actuals are observed in CI logs.
SIZE_THRESHOLD_MB=3500
run() {
  local label="$1"; local cmd="$2"
  if docker run --rm --entrypoint="" "$IMAGE" sh -c "$cmd" >/dev/null 2>&1; then
    printf " ✅ %s\n" "$label"; PASS=$((PASS+1))
  else
    printf " ❌ %s\n" "$label"; FAIL=$((FAIL+1))
  fi
}
# Stricter version of `run` that asserts an expected substring in the command's
# combined stdout/stderr.
# Catches the "image bytes silently identical to previous release" class of
# regression — Docker layer cache hit on `npm install -g <pkg>` because the
# bare command string is identical across builds, even when `latest` would
# resolve differently. Discovered 2026-05-23 — every pi-devbox release
# v0.74.0..v0.75.5 had been shipping the same image bytes.
run_expect() {
  local label="$1"; local cmd="$2"; local expect="$3"
  local out
  out=$(docker run --rm --entrypoint="" "$IMAGE" sh -c "$cmd" 2>&1) || true
  # Here-string rather than `echo | grep -q`: under pipefail, grep -q exiting
  # early can SIGPIPE the writer and turn a match into a reported failure.
  if grep -Fq -- "$expect" <<<"$out"; then
    printf " ✅ %s (got %s)\n" "$label" "$expect"; PASS=$((PASS+1))
  else
    printf " ❌ %s — expected substring %q, got: %s\n" "$label" "$expect" "$out"; FAIL=$((FAIL+1))
  fi
}
echo "=== pi-devbox smoke test: $IMAGE ==="
echo ""
# ── Binaries ─────────────────────────────────────────────────────────
echo "── Binaries ──"
if [ -n "${EXPECTED_PI_VERSION:-}" ]; then
  run_expect "pi version matches build arg" "pi --version" "$EXPECTED_PI_VERSION"
else
  run "pi" "pi --version"
fi
run "node" "node --version"
run "git" "git --version"
run "aws" "aws --version"
run "uv" "uv --version"
run "nvim" "nvim --version"
run "mempalace-mcp" "mempalace-mcp --help"
# v1.0.0 base additions — verify presence and basic functionality.
run "pandoc" "pandoc --version"
run "graphviz (dot)" "dot -V"
run "imagemagick" "magick --version"
run "yq" "yq --version"
run "tldr (tealdeer)" "tldr --version"
# ── tmux 0-indexing (required for pi-studio variants) ─────────────────
echo ""
echo "── tmux config ──"
run_expect "/etc/tmux.conf has base-index 0" \
  "cat /etc/tmux.conf" "set -g base-index 0"
run_expect "/etc/tmux.conf has pane-base-index 0" \
  "cat /etc/tmux.conf" "set -g pane-base-index 0"
# ── Repo clones ───────────────────────────────────────────────────────
echo ""
echo "── Repo clones ──"
run "pi-toolkit clone" "test -d /opt/pi-toolkit && git -C /opt/pi-toolkit rev-parse --short HEAD"
run "pi-extensions clone" "test -d /opt/pi-extensions && git -C /opt/pi-extensions rev-parse --short HEAD"
run "pi-fork clone + node_modules" \
  "test -f /opt/pi-fork/package.json && test -d /opt/pi-fork/node_modules"
run "pi-observational-memory clone + node_modules" \
  "test -f /opt/pi-observational-memory/package.json && test -d /opt/pi-observational-memory/node_modules"
# ── Runtime deployment (needs entrypoint to run) ──────────────────────
echo ""
echo "── Runtime deployment ──"
# Spin up a long-running container WITHOUT overriding the entrypoint, so
# the baked entrypoint chain (entrypoint.sh → entrypoint-user.sh) runs and
# deploys pi-toolkit + pi-extensions to ~/.pi/agent/. Override CMD to
# tail -f /dev/null so the container stays alive while we docker-exec.
CID=$(docker run -d --rm "$IMAGE" tail -f /dev/null)
cleanup() { docker rm -f "$CID" >/dev/null 2>&1 || true; }
trap cleanup EXIT
# Wait for entrypoint-user.sh to finish deploying pi-toolkit + extensions.
# Gate on BOTH the keybindings symlink (deployed by pi-toolkit) AND the
# mempalace.ts bridge (deployed last by entrypoint-user.sh) AND ≥4 *.ts
# extensions present. Parallel build load can otherwise sample the *.ts
# count mid-deploy and produce a flake. See opencode-devbox c6f9d11
# (2026-06-08) — same fix transplanted.
for i in $(seq 1 45); do
  if docker exec "$CID" sh -c '
    test -L /home/developer/.pi/agent/keybindings.json && \
    test -L /home/developer/.pi/agent/extensions/mempalace.ts && \
    count=$(ls -1 /home/developer/.pi/agent/extensions/*.ts 2>/dev/null | wc -l) && \
    [ "$count" -ge 4 ]
  ' >/dev/null 2>&1; then
    break
  fi
  sleep 1
done
exec_test() {
  local label="$1"; local cmd="$2"
  if docker exec -u developer "$CID" sh -c "$cmd" >/dev/null 2>&1; then
    printf " ✅ %s\n" "$label"; PASS=$((PASS+1))
  else
    printf " ❌ %s\n" "$label"; FAIL=$((FAIL+1))
  fi
}
exec_test "keybindings.json (pi-toolkit)" 'test -L $HOME/.pi/agent/keybindings.json && echo ok'
exec_test "extensions ≥ 4 (pi-extensions)" 'count=$(ls -1 $HOME/.pi/agent/extensions/*.ts 2>/dev/null | wc -l); [ $count -ge 4 ] && echo "$count extensions"'
exec_test "mempalace.ts bridge" 'test -L $HOME/.pi/agent/extensions/mempalace.ts && echo ok'
exec_test "settings.json bootstrapped" 'test -f $HOME/.pi/agent/settings.json && echo ok'
# pi-fork + pi-observational-memory are registered by entrypoint-user.sh via
# `pi install /opt/<pkg>`, which runs slightly after the keybindings marker.
for i in $(seq 1 15); do
  if docker exec "$CID" grep -q pi-observational-memory \
      /home/developer/.pi/agent/settings.json 2>/dev/null; then
    break
  fi
  sleep 1
done
exec_test "pi-fork registered (fork tool)" 'grep -q pi-fork $HOME/.pi/agent/settings.json && echo ok'
exec_test "pi-observational-memory registered (recall tool)" 'grep -q pi-observational-memory $HOME/.pi/agent/settings.json && echo ok'
# ── /tmp/sshcm directory created by entrypoint ────────────────────────
exec_test "/tmp/sshcm dir mode 700 (ssh ControlMaster)" \
  'test -d /tmp/sshcm && [ "$(stat -c %a /tmp/sshcm)" = "700" ] && echo ok'
# ── Image size ────────────────────────────────────────────────────────
echo ""
echo "── Image size ──"
# Sum all layers via `docker history`. Docker's `image inspect --format='{{.Size}}'`
# returns ONLY the variant-unique layer when the base is content-addressed and
# shared (the case in this repo's two-phase build), which understates the
# user-facing image size by 2+ GB. Summing the per-layer sizes from history
# gives the full uncompressed image size, which is the metric we actually want
# to gate on.
#
# `docker history` prints human-readable sizes with decimal (SI) suffixes
# (1 kB = 1000 B, 1 GB = 1000 MB), and a value just under a unit boundary can
# come out in exponent form (e.g. "1e+03MB"), so the regex accepts that too.
SIZE_MB=$(docker history --format '{{.Size}}' "$IMAGE" | python3 -c '
import sys, re
total = 0.0
for line in sys.stdin:
    s = line.strip()
    if s in ("0B", ""):
        continue
    m = re.match(r"^([0-9.]+(?:e[+-]?[0-9]+)?)(B|kB|MB|GB|TB)$", s)
    if not m:
        continue
    v = float(m.group(1)); u = m.group(2)
    mult = {"B": 1e-6, "kB": 1e-3, "MB": 1, "GB": 1e3, "TB": 1e6}[u]
    total += v * mult
print(int(total))
')
if [ -z "$SIZE_MB" ] || [ "$SIZE_MB" = "0" ]; then
  printf " ⚠️ image size: could not parse — skipping check\n"
elif [ "$SIZE_MB" -le "$SIZE_THRESHOLD_MB" ]; then
  printf " ✅ size: %d MB (threshold %d MB)\n" "$SIZE_MB" "$SIZE_THRESHOLD_MB"; PASS=$((PASS+1))
else
  printf " ❌ size: %d MB exceeds threshold %d MB\n" "$SIZE_MB" "$SIZE_THRESHOLD_MB"; FAIL=$((FAIL+1))
fi
# ── Summary ───────────────────────────────────────────────────────────
echo ""
echo "=== Results: ${PASS} passed, ${FAIL} failed ==="
[ "$FAIL" -eq 0 ]
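Both readiness waits in the script use the same poll-until-true-or-timeout loop (45 s for the deploy markers, 15 s for the `pi install` registration), and both fall through silently on timeout, leaving the later assertions to fail without saying why. A hypothetical `wait_for` helper could factor the loop out and make the timeout visible. This is a sketch, not part of smoke-test.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command once per second until it succeeds
# or the attempt budget is used up.
set -euo pipefail

# wait_for <timeout_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if it never does.
# Output of the polled command is discarded, as in the inline loops.
wait_for() {
  local timeout="$1"; shift
  local i
  for ((i = 0; i < timeout; i++)); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example mirroring the settings.json wait (requires a running $CID):
#   wait_for 15 docker exec "$CID" grep -q pi-observational-memory \
#       /home/developer/.pi/agent/settings.json \
#     || echo " ⚠️ pi install registration not observed within 15 s"
```

Taking the command as separate arguments ("$@") rather than a string avoids a second round of shell quoting around `docker exec`.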