CI: preventative fix for PI_VERSION/OMOS_VERSION cache-hit silent regression

Mirrors the pi-devbox v0.75.5b fix (2026-05-23) onto the four-variant
pipeline here. The with-pi, omos, and omos-with-pi variants install
upstream npm packages whose *_VERSION build-args defaulted to 'latest'.
When the build-arg string is byte-identical across builds, the layer
hash is identical and the registry buildcache silently reuses the layer
from whatever upstream version was current when the cache was first
populated — same mechanism that shipped pi-devbox v0.74.0..v0.75.5 with
identical image bytes.

Currently masked here because OPENCODE_VERSION is a hard-coded ARG that
bumps every release; parent-chain cache invalidation flushes the
downstream pi/omos layers. Masking would fail on any vN.N.Nb opencode-
version-unchanged release that only bumps pi or omos. Filed last night
as parked followup; fixing preventatively now that #5 (AWS SSO inside
tor-ms22 container) cleared.

CHANGES

.gitea/workflows/docker-publish-split.yml — new resolve-versions job
running 'npm view @earendil-works/pi-coding-agent version' and
'npm view oh-my-opencode-slim version', exposing concrete strings as
job outputs. All six affected jobs (smoke-omos, smoke-with-pi,
smoke-omos-with-pi, build-variant-omos, build-variant-with-pi,
build-variant-omos-with-pi) now consume them as PI_VERSION /
OMOS_VERSION build-args. smoke-base / build-variant-base unaffected.

scripts/smoke-test.sh — new run_expect helper asserting an expected
substring in command output. The pi check uses EXPECTED_PI_VERSION;
the omos check uses EXPECTED_OMOS_VERSION against npm ls -g. Both env
vars are wired from resolve-versions outputs in the smoke jobs. Catches
this regression class on the next release, not four releases later.

Dockerfile.variant — comment blocks above OPENCODE_VERSION (source-
pinned, not subject to the bug), PI_VERSION (CI-resolved), and
OMOS_VERSION (CI-resolved) explaining the cache-hit footgun.

AGENTS.md — new convention bullet under 'Critical conventions' naming
the resolve-versions job + EXPECTED_*_VERSION wiring as the contract
to keep in lockstep when modifying variant build-args.

.gitea/README.md — Step 1 expanded to cover the parallel resolve-
versions job alongside base-decide; pipeline diagram updated.

CHANGELOG.md — Unreleased entry describing the fix, masking mechanism,
and audit footprint.

No image-content change expected on the next release vs what 'latest'
would have resolved to anyway. Purely makes the cache invalidate
correctly going forward.
This commit is contained in:
2026-05-24 15:38:36 +00:00
parent 4cce39d167
commit f7c34091b1
6 changed files with 151 additions and 17 deletions
+30 -7
View File
@@ -30,10 +30,10 @@ The split-base architecture is what the `docker-publish-split.yml` workflow exer
┌──────────────────┐
│ base-decide │ compute base-<hash>;
│ │ probe Docker Hub.
│ hash inputs: │
│ Dockerfile.base│
│ rootfs/ │
│ entrypoint*.sh │
│ hash inputs: │ (resolve-versions
│ Dockerfile.base│ runs in parallel:
│ rootfs/ │ npm view pi/omos
│ entrypoint*.sh │ → concrete versions)
└────────┬─────────┘
┌─────────────┴─────────────┐
@@ -73,10 +73,10 @@ The split-base architecture is what the `docker-publish-split.yml` workflow exer
└──────────────────────────┘
```
### Step 1: `base-decide`
### Step 1: `base-decide` (and `resolve-versions` in parallel)
Compute a SHA-256 hash over the inputs that determine the base image's
content:
**`base-decide`** computes a SHA-256 hash over the inputs that determine
the base image's content:
```sh
{
@@ -106,6 +106,29 @@ This is the core cache-reuse mechanism. Version-bump-only releases
that change anything in the base — apt packages, AWS CLI, Node version,
locale list, entrypoint scripts — pay the full base-build cost once.
**`resolve-versions`** runs alongside `base-decide` (no `needs:`
dependency between them) and resolves the floating npm packages whose
`*_VERSION` build-args default to `latest`:
```sh
PI_VERSION=$(npm view @earendil-works/pi-coding-agent version)
OMOS_VERSION=$(npm view oh-my-opencode-slim version)
```
The outputs (`pi_version`, `omos_version`) are consumed by every variant
smoke and build job that installs pi or omos. **Why this exists:** without
it, the `npm install -g` RUN layer in `Dockerfile.variant` hashes
identically across builds (same ARG default, same command string), so
the registry buildcache silently reuses the layer from whatever upstream
version was current when the cache was first populated. This is the
cache-hit silent-regression class of bug that shipped pi-devbox v0.74.0
through v0.75.5 with identical image bytes (fixed in pi-devbox v0.75.5b
2026-05-23). Currently masked here by `OPENCODE_VERSION` bumping every
release (parent-chain cache-key invalidation), but masking would fail on
a `vN.N.Nb` opencode-version-unchanged release that only bumps pi or
omos. Smoke jobs additionally assert `EXPECTED_PI_VERSION` /
`EXPECTED_OMOS_VERSION` against the resolved values.
### Step 2: `build-base` (conditional)
Only runs when `need_build=true`. Multi-arch (amd64 + arm64) build of