pi-devbox/scripts/smoke-test.sh
joakimp 34cae2a1d2
Publish Docker Image / smoke (push) Successful in 2m18s
Publish Docker Image / publish (push) Successful in 12m59s
Publish Docker Image / update-description (push) Successful in 11s
Cut v0.75.5b — fix cache-hit silent same-bytes regression
ALL FOUR releases v0.74.0 -> v0.75.5 had been shipping the same image
bytes due to a Docker layer-cache hit on the bare 'npm install -g
@earendil-works/pi-coding-agent' command (when PI_VERSION=latest).
The command string is identical across builds, so the layer-hash is
identical, so registry buildcache (cache-from/cache-to) silently
reuses the layer from whatever pi version was current when the cache
was first populated.
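
The mechanism above can be illustrated with a toy model. This is only a sketch: it does not reproduce how BuildKit actually computes cache keys, and `layer_key` is a hypothetical helper. It assumes only that the key depends on the instruction text plus the build-arg value the instruction uses.

```shell
#!/usr/bin/env bash
# Toy model, NOT BuildKit's real algorithm: derive a "cache key" from the
# build-arg value plus the instruction text, using POSIX cksum as the hash.
layer_key() {
  local pi_version="$1"
  printf 'PI_VERSION=%s\nRUN npm install -g @earendil-works/pi-coding-agent\n' \
    "$pi_version" | cksum | cut -d' ' -f1
}

# With PI_VERSION=latest the inputs never change between releases, so the
# key never changes either and the cached layer keeps being reused:
#   layer_key latest    # same value on every build
# With a concrete version the key changes whenever pi bumps:
#   layer_key 0.75.4
#   layer_key 0.75.5
```

In this model the registry that `latest` would resolve against is never consulted when computing the key, which is why the stale layer goes unnoticed.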

Verification: docker manifest inspect joakimp/pi-devbox:vX.Y.Z showed
identical SHA256 digests on both linux/amd64 and linux/arm64 for
v0.74.0, v0.75.3, v0.75.4, v0.75.5. Users on :latest were getting
whatever pi version was baked into the v0.74.0 build.
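
A check like the one described could be scripted roughly as follows. This is a sketch, not the command sequence that was actually run. `find_duplicate_digests` is a hypothetical helper, and the commented usage assumes `jq` is installed and that each tag resolves to a multi-platform manifest list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "<tag> <digest>" lines on stdin and print every
# digest that is shared by more than one tag, followed by those tags.
find_duplicate_digests() {
  awk '
    { tags[$2] = (tags[$2] == "" ? $1 : tags[$2] " " $1); count[$2]++ }
    END { for (d in count) if (count[d] > 1) print d, tags[d] }
  '
}

# Possible usage (assumes jq; not executed here):
#   for tag in v0.74.0 v0.75.3 v0.75.4 v0.75.5; do
#     docker manifest inspect "joakimp/pi-devbox:$tag" |
#       jq -r --arg tag "$tag" \
#         '.manifests[] | select(.platform.architecture == "amd64") | "\($tag) \(.digest)"'
#   done | find_duplicate_digests
```

Any output line means at least two tags point at the same image bytes for that platform. Empty output means every tag has its own digest.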

DISCOVERED 2026-05-23 by user trying to update pi-devbox on MBP-M1
and seeing pi 0.74.0 reported despite pulling v0.75.5.

CHANGES

.gitea/workflows/docker-publish.yml — both smoke and publish jobs
get a new 'Resolve PI_VERSION from tag' step that strips the leading
'v' and any trailing letter suffix from github.ref_name. Result is
passed as a build-arg to docker/build-push-action so the npm install
layer's hash includes the concrete version, forcing cache miss when
pi bumps.
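
The tag-to-version mapping described above might look like the sketch below. The workflow file is not shown here, so the function name `resolve_pi_version` and the output key `pi_version` are assumptions, as is the use of `GITHUB_OUTPUT` in the usage comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a release tag such as "v0.75.5b" into the pi
# version it should install ("0.75.5").
resolve_pi_version() {
  # 1st expression: drop one leading "v".
  # 2nd expression: drop any trailing run of letters (the "b" in "v0.75.5b").
  printf '%s\n' "$1" | sed -E 's/^v//; s/[A-Za-z]+$//'
}

# Possible use inside a workflow step (assumed, not taken from the repo):
#   echo "pi_version=$(resolve_pi_version "$REF_NAME")" >> "$GITHUB_OUTPUT"
```

A tag made only of letters would resolve to an empty string, so a real step would probably want to fail when the result is empty.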

scripts/smoke-test.sh — new run_expect helper that asserts pi
--version contains the EXPECTED_PI_VERSION env var. Smoke job sets
this from the resolve step output. Would have caught this regression
on v0.75.3.
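
One caveat with a plain substring assertion is that an expected `0.75.5` also matches output containing `0.75.50`. If that ever matters, a stricter whole-token comparison could be sketched as follows. `version_matches` is a hypothetical helper and is not part of the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if $2 appears in $1 as a complete token.
# Tokens are runs of letters, digits, ".", "+" and "-"; everything else
# (spaces, newlines, parentheses, ...) is treated as a separator.
version_matches() {
  local out="$1" expect="$2"
  printf '%s\n' "$out" \
    | tr -cs '0-9A-Za-z.+-' '\n' \
    | grep -Fx -- "$expect" >/dev/null
}

# Usage:
#   version_matches "$(pi --version)" "$EXPECTED_PI_VERSION" && echo match
```

The trade-off is strictness: output such as `v0.75.5` no longer matches an expected `0.75.5`, so this only suits tools that print the bare version number.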

Dockerfile — comment block above ARG PI_VERSION=latest documenting
the cache-hit footgun. The 'if latest' branch in the install RUN is
preserved for local dev convenience but never fires in CI now.

AGENTS.md — new convention bullet explaining the cache-hit class of
bug and noting that the same latent bug exists in opencode-devbox's
with-pi variants (currently masked by OPENCODE_VERSION bumps; it will
surface when cutting a vN.N.Nb-style release that bumps only pi and
leaves the opencode version unchanged).

CHANGELOG.md — full entry under v0.75.5b describing the recovery,
the silent-failure mechanism, and the verification steps.

NO IMAGE-CONTENT CHANGES vs v0.75.5 INTENT. This build produces the
actual pi 0.75.5 image content that v0.75.5 was supposed to ship.

NEXT FOLLOWUP (parked, not in this commit)

opencode-devbox should get the same workflow change for its
build-variant-with-pi and build-variant-omos-with-pi jobs. Currently
masked because every release also bumps OPENCODE_VERSION which
invalidates the cache, but that masking would fail on a pi-only bump
release.
2026-05-23 22:10:08 +02:00

118 lines
5.0 KiB
Bash
Executable File

#!/usr/bin/env bash
# smoke-test.sh — basic sanity checks for the pi-devbox image
#
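# Usage: [EXPECTED_PI_VERSION=<version>] ./scripts/smoke-test.sh <image>
#
# Verifies:
# - pi binary present and returns a version (when EXPECTED_PI_VERSION is
#   set, the reported version must also contain it)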
# - pi-toolkit cloned at /opt/pi-toolkit
# - pi-extensions cloned at /opt/pi-extensions
# - entrypoint deploys pi-toolkit keybindings symlink
# - entrypoint deploys ≥4 extensions
# - mempalace bridge symlink present
# - settings.json bootstrapped
# - image size within threshold
set -euo pipefail
IMAGE="${1:?usage: $0 <image>}"
PASS=0; FAIL=0
SIZE_THRESHOLD_MB=2200
run() {
  local label="$1"; local cmd="$2"
  if docker run --rm --entrypoint="" "$IMAGE" sh -c "$cmd" >/dev/null 2>&1; then
    printf " ✅ %s\n" "$label"; PASS=$((PASS+1))
  else
    printf " ❌ %s\n" "$label"; FAIL=$((FAIL+1))
  fi
}
# Stricter version of `run` that also asserts an expected substring in stdout.
# Used for catching the "image bytes silently identical to previous release"
# class of regression (Docker layer cache hit on `npm install -g <pkg>` because
# the bare command string is identical across builds, even when `latest` would
# resolve differently). Discovered 2026-05-23 — every pi-devbox release v0.74.0
# through v0.75.5 had been shipping the same image bytes.
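run_expect() {
  local label="$1"; local cmd="$2"; local expect="$3"
  local out
  out=$(docker run --rm --entrypoint="" "$IMAGE" sh -c "$cmd" 2>&1) || true
  # Here-string instead of `echo | grep -q`: under `set -o pipefail` an early
  # grep exit can SIGPIPE the writer and turn a real match into a failure.
  # `--` keeps an $expect that starts with "-" from being parsed as an option.
  if grep -Fq -- "$expect" <<<"$out"; then
    printf " ✅ %s (found %s)\n" "$label" "$expect"; PASS=$((PASS+1))
  else
    printf " ❌ %s: expected substring %q, got: %s\n" "$label" "$expect" "$out"; FAIL=$((FAIL+1))
  fi
}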
echo "=== pi-devbox smoke test: $IMAGE ==="
echo ""
# ── Basic binary checks ───────────────────────────────────────────────
echo "── Binaries ──"
if [ -n "${EXPECTED_PI_VERSION:-}" ]; then
  run_expect "pi version matches build arg" "pi --version" "$EXPECTED_PI_VERSION"
else
  run "pi" "pi --version"
fi
run "node" "node --version"
run "git" "git --version"
run "aws" "aws --version"
run "uv" "uv --version"
run "nvim" "nvim --version"
run "mempalace-mcp" "mempalace-mcp --help"
# ── Repo clones ───────────────────────────────────────────────────────
echo ""
echo "── Repo clones ──"
run "pi-toolkit clone" "test -d /opt/pi-toolkit && git -C /opt/pi-toolkit rev-parse --short HEAD"
run "pi-extensions clone" "test -d /opt/pi-extensions && git -C /opt/pi-extensions rev-parse --short HEAD"
# ── Runtime deployment (needs entrypoint to run) ──────────────────────
echo ""
echo "── Runtime deployment ──"
# Spin up a long-running container WITHOUT overriding the entrypoint, so
# the baked entrypoint chain (entrypoint.sh → entrypoint-user.sh) runs and
# deploys pi-toolkit + pi-extensions to ~/.pi/agent/. Override CMD to
# tail -f /dev/null so the container stays alive while we docker-exec.
CID=$(docker run -d --rm "$IMAGE" tail -f /dev/null)
cleanup() { docker rm -f "$CID" >/dev/null 2>&1 || true; }
trap cleanup EXIT
# Wait up to ~30 s for entrypoint-user.sh to finish deploying pi-toolkit +
# extensions. A timeout is not fatal here; the exec_test checks below fail instead.
for _ in $(seq 1 30); do
  if docker exec "$CID" test -L /home/developer/.pi/agent/keybindings.json 2>/dev/null; then
    break
  fi
  sleep 1
done
exec_test() {
  local label="$1"; local cmd="$2"
  if docker exec -u developer "$CID" sh -c "$cmd" >/dev/null 2>&1; then
    printf " ✅ %s\n" "$label"; PASS=$((PASS+1))
  else
    printf " ❌ %s\n" "$label"; FAIL=$((FAIL+1))
  fi
}
exec_test "keybindings.json (pi-toolkit)" 'test -L $HOME/.pi/agent/keybindings.json && echo ok'
exec_test "extensions ≥ 4 (pi-extensions)" 'count=$(ls -1 $HOME/.pi/agent/extensions/*.ts 2>/dev/null | wc -l); [ $count -ge 4 ] && echo "$count extensions"'
exec_test "mempalace.ts bridge" 'test -L $HOME/.pi/agent/extensions/mempalace.ts && echo ok'
exec_test "settings.json bootstrapped" 'test -f $HOME/.pi/agent/settings.json && echo ok'
# ── Image size ────────────────────────────────────────────────────────
echo ""
echo "── Image size ──"
SIZE_MB=$(docker image inspect "$IMAGE" --format='{{.Size}}' | awk '{printf "%d", $1/1048576}')
if [ "$SIZE_MB" -le "$SIZE_THRESHOLD_MB" ]; then
  printf " ✅ size: %d MB (threshold %d MB)\n" "$SIZE_MB" "$SIZE_THRESHOLD_MB"; PASS=$((PASS+1))
else
  printf " ❌ size: %d MB exceeds threshold %d MB\n" "$SIZE_MB" "$SIZE_THRESHOLD_MB"; FAIL=$((FAIL+1))
fi
# ── Summary ───────────────────────────────────────────────────────────
echo ""
echo "=== Results: ${PASS} passed, ${FAIL} failed ==="
[ "$FAIL" -eq 0 ]