4 Commits

Author SHA1 Message Date
joakimp f7c34091b1 CI: preventative fix for PI_VERSION/OMOS_VERSION cache-hit silent regression
Mirrors the pi-devbox v0.75.5b fix (2026-05-23) onto the four-variant
pipeline here. The with-pi, omos, and omos-with-pi variants install
upstream npm packages whose *_VERSION build-args defaulted to 'latest'.
When the build-arg string is byte-identical across builds, the layer
hash is identical and the registry buildcache silently reuses the layer
from whatever upstream version was current when the cache was first
populated — same mechanism that shipped pi-devbox v0.74.0..v0.75.5 with
identical image bytes.

Currently masked here because OPENCODE_VERSION is a hard-coded ARG that
bumps every release; parent-chain cache invalidation flushes the
downstream pi/omos layers. Masking would fail on any vN.N.Nb opencode-
version-unchanged release that only bumps pi or omos. Filed last night
as parked followup; fixing preventatively now that #5 (AWS SSO inside
tor-ms22 container) cleared.

CHANGES

.gitea/workflows/docker-publish-split.yml — new resolve-versions job
running 'npm view @earendil-works/pi-coding-agent version' and
'npm view oh-my-opencode-slim version', exposing concrete strings as
job outputs. All six affected jobs (smoke-omos, smoke-with-pi,
smoke-omos-with-pi, build-variant-omos, build-variant-with-pi,
build-variant-omos-with-pi) now consume them as PI_VERSION /
OMOS_VERSION build-args. smoke-base / build-variant-base unaffected.

scripts/smoke-test.sh — new run_expect helper asserting an expected
substring in command output. The pi check uses EXPECTED_PI_VERSION;
the omos check uses EXPECTED_OMOS_VERSION against npm ls -g. Both env
vars are wired from resolve-versions outputs in the smoke jobs. Catches
this regression class on the next release, not four releases later.

Dockerfile.variant — comment blocks above OPENCODE_VERSION (source-
pinned, not subject to the bug), PI_VERSION (CI-resolved), and
OMOS_VERSION (CI-resolved) explaining the cache-hit footgun.

AGENTS.md — new convention bullet under 'Critical conventions' naming
the resolve-versions job + EXPECTED_*_VERSION wiring as the contract
to keep in lockstep when modifying variant build-args.

.gitea/README.md — Step 1 expanded to cover the parallel resolve-
versions job alongside base-decide; pipeline diagram updated.

CHANGELOG.md — Unreleased entry describing the fix, masking mechanism,
and audit footprint.

No image-content change expected on the next release vs what 'latest'
would have resolved to anyway. Purely makes the cache invalidate
correctly going forward.
2026-05-24 15:38:36 +00:00
joakimp b6e4d89a2c ci: filter __pycache__ and macOS metadata from base hash compute
Validate / docs-check (push) Successful in 14s
Validate / base-change-warning (push) Successful in 18s
Validate / validate-omos (push) Successful in 4m34s
Validate / validate-omos-with-pi (push) Successful in 4m57s
Validate / validate-with-pi (push) Successful in 6m9s
Validate / validate-base (push) Successful in 14m48s
Defensive against local-vs-CI hash divergence. `find rootfs -type f`
includes gitignored junk like rootfs/__pycache__/*.pyc and macOS
.DS_Store/._AppleDouble files, which CI's clean checkout never sees.

This bit us during v1.15.4 debugging when a stale generate-config.cpython-314.pyc
on the local rootfs/ produced base-3605aa6b6ab1 while CI computed
base-35ee5fe7861a. Took meaningful time to track down because git status
doesn't surface gitignored files.

Verified: same filter applied to current clean tree still produces
35ee5fe7861a (the published v1.15.4b base digest).
2026-05-20 22:45:27 +02:00
joakimp 07e07ec611 Bump opencode 1.14.44 -> 1.14.50; cut over to split-base pipeline
Validate / validate-omos-with-pi (push) Waiting to run
Validate / docs-check (push) Successful in 1m7s
Validate / validate-with-pi (push) Failing after 3m16s
Validate / validate-omos (push) Failing after 3m15s
Validate / validate-base (push) Failing after 6m31s
Publish Docker Image / base-decide (push) Failing after 11m59s
Publish Docker Image / build-base (push) Has been cancelled
Publish Docker Image / smoke-base (push) Has been cancelled
Publish Docker Image / smoke-omos (push) Has been cancelled
Publish Docker Image / smoke-with-pi (push) Has been cancelled
Publish Docker Image / smoke-omos-with-pi (push) Has been cancelled
Publish Docker Image / build-variant-base (push) Has been cancelled
Publish Docker Image / build-variant-omos (push) Has been cancelled
Publish Docker Image / build-variant-with-pi (push) Has been cancelled
Publish Docker Image / build-variant-omos-with-pi (push) Has been cancelled
Publish Docker Image / promote-base-latest (push) Has been cancelled
Publish Docker Image / update-description (push) Has been cancelled
- Bump OPENCODE_VERSION 1.14.44 -> 1.14.50 in Dockerfile.variant
- Cut over: docker-publish-split.yml now triggers on push: tags: v*
  (was workflow_dispatch only). RELEASE_TAG and PROMOTE_LATEST derived
  from github.ref_type/ref_name for tag-push; inputs still available
  for manual workflow_dispatch runs.
- Delete docker-publish.yml (retired, replaced by split-base pipeline)
- Delete Dockerfile (retired, replaced by Dockerfile.base + Dockerfile.variant)
- Update CHANGELOG: promote Unreleased -> v1.14.50
- Update AGENTS.md, .gitea/README.md, validate.yml: remove all references
  to the old single-Dockerfile pipeline and WIP migration plan
2026-05-14 19:39:45 +02:00
joakimp 6fde27c212 Document the build pipeline architecture in .gitea/README.md
Validate / docs-check (push) Successful in 16s
Validate / validate-base (push) Successful in 12m9s
Validate / validate-omos (push) Successful in 16m45s
Validate / validate-with-pi (push) Successful in 13m30s
Validate / validate-omos-with-pi (push) Successful in 15m15s
The split-base build architecture, the NPM_CONFIG_PREFIX gotcha, the
hash-driven base cache reuse mechanism, and the cutover plan from
docker-publish.yml to docker-publish-split.yml were previously
scattered across:
  - inline Dockerfile.base / Dockerfile.variant comments
  - CHANGELOG Unreleased entries
  - AGENTS.md mentions
  - docker-publish-split.yml header comment
  - my own session notes

Consolidate into .gitea/README.md as the canonical architectural doc.
Gitea (like GitHub) auto-renders this when navigating to .gitea/ in
the web UI, so anyone investigating 'why is CI shaped this way?'
finds it on the first click. Cross-referenced from AGENTS.md as the
first thing to read when touching CI.

Covers:
  - The two release pipelines and why both exist
  - Why split-base: cross-variant cache misses on layer-hash-divergence
  - The 6 phases of the split-base pipeline with an ASCII diagram
  - base-decide hash inputs and Docker Hub probe logic
  - NPM_CONFIG_PREFIX variant-override pattern (the volume-shadow trap)
  - Registry cache strategy (mode=max for cross-arch reuse)
  - Wall-clock estimates: version-bump vs base-touching releases
  - Validate workflow role
  - Runner expectations: catthehacker image, disk reclaim, concurrency,
    Gitea Actions @v4 artifact incompatibility
  - 4-step migration plan from docker-publish.yml to .split.yml
  - Cross-refs to related docs

Does not duplicate AGENTS.md content; links to it for domain facts and
release-day checklist.
2026-05-09 19:28:03 +02:00