be2a16834cac39b49b2ee4b3284a4c0e0748b7f2
11 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
be2a16834c |
Cut v1.15.12 — revert v4.0.0 pin (busted), bump pi to 0.76.0
Validate / docs-check (push) Successful in 8s
Validate / base-change-warning (push) Successful in 52s
Validate / validate-base (push) Failing after 3m34s
Publish Docker Image / base-decide (push) Successful in 10s
Publish Docker Image / resolve-versions (push) Successful in 4s
Validate / validate-with-pi (push) Failing after 4m0s
Validate / validate-omos (push) Failing after 6m50s
Validate / validate-omos-with-pi (push) Failing after 12m15s
Publish Docker Image / build-base (push) Failing after 30m40s
Publish Docker Image / smoke-base (push) Has been skipped
Publish Docker Image / smoke-with-pi (push) Has been skipped
Publish Docker Image / build-variant-base (push) Has been skipped
Publish Docker Image / build-variant-with-pi (push) Has been skipped
Publish Docker Image / smoke-omos (push) Has been skipped
Publish Docker Image / build-variant-omos-with-pi (push) Has been skipped
Publish Docker Image / build-variant-omos (push) Has been skipped
Publish Docker Image / smoke-omos-with-pi (push) Has been skipped
Publish Docker Image / promote-base-latest (push) Has been skipped
Publish Docker Image / update-description (push) Has been skipped
The v1.15.11b experiment confirmed setup-buildx-action@v4.1.0 is NOT the regressor: pinning all 9 references to @v4.0.0 reproduced the exact same '400 Bad request' from registry-1.docker.io on the first layer-blob PUT. CI run #336 failed twice (original + Gitea auto-rerun), both with HTML 400 bodies (CDN-tier rejection) at Offset:0. UUIDs and _state signatures differ across attempts; only the failure pattern is stable. Reverting all 9 pins back to @v4 — keeping a wrong pin holds us off action improvements with no benefit. Real suspects now narrow to: runner-image (catthehacker:act-latest, floating), runner-2 host network egress, buildx 0.34.x signed _state token format, or per-repo Hub-side state. Investigation deferred; this release ships via manual docker buildx build --push from a developer Orbstack to bypass the broken runner-network → Hub-CDN combo (we know that path works in ~25s for the same multi-arch build to the same Hub account). PI_VERSION=latest resolves to pi-coding-agent 0.76.0 (published 2026-05-27 20:03 UTC). OPENCODE_VERSION stays at 1.15.11 (no upstream bump since 1.15.11 was published 2026-05-27 03:59 UTC). Files: - .gitea/workflows/docker-publish-split.yml: 9 setup-buildx-action references reverted from @v4.0.0 to @v4 - CHANGELOG.md: v1.15.12 entry with regression triage status (ruled-out vs still-suspect) |
||
|
|
a16da2f041 |
Cut v1.15.11b — pin setup-buildx-action@v4.0.0
Validate / docs-check (push) Successful in 6s
Validate / base-change-warning (push) Successful in 6s
Validate / validate-with-pi (push) Failing after 4m1s
Publish Docker Image / base-decide (push) Successful in 8s
Publish Docker Image / resolve-versions (push) Successful in 5s
Validate / validate-omos-with-pi (push) Failing after 4m52s
Validate / validate-omos (push) Failing after 6m41s
Validate / validate-base (push) Failing after 8m55s
Publish Docker Image / promote-base-latest (push) Has been skipped
Publish Docker Image / update-description (push) Has been skipped
Publish Docker Image / build-base (push) Failing after 37m43s
Publish Docker Image / smoke-base (push) Has been skipped
Publish Docker Image / smoke-omos (push) Has been skipped
Publish Docker Image / smoke-with-pi (push) Has been skipped
Publish Docker Image / build-variant-omos (push) Has been skipped
Publish Docker Image / build-variant-with-pi (push) Has been skipped
Publish Docker Image / smoke-omos-with-pi (push) Has been skipped
Publish Docker Image / build-variant-base (push) Has been skipped
Publish Docker Image / build-variant-omos-with-pi (push) Has been skipped
The v1.15.11 publish failed three times in a row (runs #332/333/334)
with identical '400 Bad request' from registry-1.docker.io on the
multi-arch buildx layer-blob PUT. Triage on 2026-05-27 confirmed:
- Multi-arch buildx push from a developer host: succeeds in 25s
(same Hub account, same multi-arch path)
- Account / repo / Hub-CDN: all healthy
- Last known-good Gitea-runner Hub push: 2026-05-23 ~20:26 UTC
(pi-devbox v0.75.5b) — predates docker/setup-buildx-action v4.1.0
by <24h
- docker/setup-buildx-action@v4 floats to v4.1.0 (published
2026-05-22 16:00 UTC), bundling a newer buildx/buildkit whose
push protocol may trip Hub's CDN URI-length cap on the ~1.4 KB
_state query string in resumable-upload PUT URLs.
Pinning all nine setup-buildx-action references to @v4.0.0 to
test the hypothesis. setup-qemu-action@v3 left floating since
QEMU wasn't in the suspected blast radius. If v4.0.0 publishes
cleanly we keep the pin and file an upstream buildkit/buildx
issue.
No source changes — same OPENCODE_VERSION=1.15.11, same Dockerfile.base
and Dockerfile.variant. v1.15.11 (original tag) is preserved as a
historical marker of the first publish attempt; v1.15.11b becomes
the canonical release.
|
||
|
|
3cbcb44cf5 |
CI: fix resolve-versions to use curl+jq instead of npm view
catthehacker/ubuntu:act-latest ships Node/npm under /opt/acttoolcache/ with PATH updated only in /etc/environment. act_runner (nektos/act) does not source /etc/environment — it reads the Docker image's ENV instructions (inspectResult.Config.Env) which only contain DEBIAN_FRONTEND=noninteractive. So npm is NOT on PATH and 'npm view ...' would have CI-failed on first run. Fix: query the npm registry HTTP API directly with curl+jq, both of which are already used extensively by this workflow (curl for Hub auth/manifest inspect, jq for token parsing). The endpoint https://registry.npmjs.org/<pkg>/latest returns JSON with a 'version' field — equivalent to 'npm view <pkg> version' but with no toolchain dependency. Verified locally: both URLs resolve correctly to 0.75.5 (pi) and 1.1.1 (omos). Evidence: nektos/act pkg/container/docker_run.go reads imageEnv from inspectResult.Config.Env, not /etc/environment. DefaultPathVariable() in linux_container_environment_extensions.go returns a hardcoded path with no /opt/acttoolcache in it. |
||
|
|
f7c34091b1 |
CI: preventative fix for PI_VERSION/OMOS_VERSION cache-hit silent regression
Mirrors the pi-devbox v0.75.5b fix (2026-05-23) onto the four-variant pipeline here. The with-pi, omos, and omos-with-pi variants install upstream npm packages whose *_VERSION build-args defaulted to 'latest'. When the build-arg string is byte-identical across builds, the layer hash is identical and the registry buildcache silently reuses the layer from whatever upstream version was current when the cache was first populated — same mechanism that shipped pi-devbox v0.74.0..v0.75.5 with identical image bytes. Currently masked here because OPENCODE_VERSION is a hard-coded ARG that bumps every release; parent-chain cache invalidation flushes the downstream pi/omos layers. Masking would fail on any vN.N.Nb opencode- version-unchanged release that only bumps pi or omos. Filed last night as parked followup; fixing preventatively now that #5 (AWS SSO inside tor-ms22 container) cleared. CHANGES .gitea/workflows/docker-publish-split.yml — new resolve-versions job running 'npm view @earendil-works/pi-coding-agent version' and 'npm view oh-my-opencode-slim version', exposing concrete strings as job outputs. All six affected jobs (smoke-omos, smoke-with-pi, smoke-omos-with-pi, build-variant-omos, build-variant-with-pi, build-variant-omos-with-pi) now consume them as PI_VERSION / OMOS_VERSION build-args. smoke-base / build-variant-base unaffected. scripts/smoke-test.sh — new run_expect helper asserting an expected substring in command output. The pi check uses EXPECTED_PI_VERSION; the omos check uses EXPECTED_OMOS_VERSION against npm ls -g. Both env vars are wired from resolve-versions outputs in the smoke jobs. Catches this regression class on the next release, not four releases later. Dockerfile.variant — comment blocks above OPENCODE_VERSION (source- pinned, not subject to the bug), PI_VERSION (CI-resolved), and OMOS_VERSION (CI-resolved) explaining the cache-hit footgun. AGENTS.md — new convention bullet under 'Critical conventions' naming the resolve-versions job + EXPECTED_*_VERSION wiring as the contract to keep in lockstep when modifying variant build-args. .gitea/README.md — Step 1 expanded to cover the parallel resolve- versions job alongside base-decide; pipeline diagram updated. CHANGELOG.md — Unreleased entry describing the fix, masking mechanism, and audit footprint. No image-content change expected on the next release vs what 'latest' would have resolved to anyway. Purely makes the cache invalidate correctly going forward. |
||
|
|
b6e4d89a2c |
ci: filter __pycache__ and macOS metadata from base hash compute
Validate / docs-check (push) Successful in 14s
Validate / base-change-warning (push) Successful in 18s
Validate / validate-omos (push) Successful in 4m34s
Validate / validate-omos-with-pi (push) Successful in 4m57s
Validate / validate-with-pi (push) Successful in 6m9s
Validate / validate-base (push) Successful in 14m48s
Defensive against local-vs-CI hash divergence. `find rootfs -type f` includes gitignored junk like rootfs/__pycache__/*.pyc and macOS .DS_Store/._AppleDouble files, which CI's clean checkout never sees. This bit us during v1.15.4 debugging when a stale generate-config.cpython-314.pyc on the local rootfs/ produced base-3605aa6b6ab1 while CI computed base-35ee5fe7861a. Took meaningful time to track down because git status doesn't surface gitignored files. Verified: same filter applied to current clean tree still produces 35ee5fe7861a (the published v1.15.4b base digest). |
||
|
|
8f2c9f5112 |
v1.15.4b: omos-with-pi threshold bump + update-description partial-publish fix
Validate / docs-check (push) Successful in 7s
Validate / base-change-warning (push) Successful in 20s
Validate / validate-base (push) Successful in 3m36s
Publish Docker Image / base-decide (push) Successful in 13s
Publish Docker Image / build-base (push) Has been skipped
Validate / validate-with-pi (push) Successful in 4m14s
Validate / validate-omos (push) Successful in 7m1s
Publish Docker Image / smoke-base (push) Successful in 3m37s
Publish Docker Image / smoke-omos (push) Successful in 4m39s
Publish Docker Image / smoke-omos-with-pi (push) Successful in 5m7s
Publish Docker Image / smoke-with-pi (push) Successful in 6m24s
Validate / validate-omos-with-pi (push) Successful in 15m59s
Publish Docker Image / build-variant-base (push) Successful in 14m12s
Publish Docker Image / build-variant-omos (push) Successful in 19m29s
Publish Docker Image / build-variant-with-pi (push) Successful in 23m7s
Publish Docker Image / build-variant-omos-with-pi (push) Successful in 26m16s
Publish Docker Image / promote-base-latest (push) Has been skipped
Publish Docker Image / update-description (push) Successful in 8s
Recovery for v1.15.4's partial publish (omos-with-pi exceeded 3500 MB smoke threshold; other 3 variants published cleanly). Two changes: 1. omos-with-pi threshold 3500 -> 3700 MB. Compounded growth from opencode 1.15.0 -> 1.15.4 (4 patch versions) plus pi 0.74.0 -> 0.75.3 (minor + 3 patches) summed in the omos-with-pi variant, just over the existing limit. Same pattern as prior threshold bumps (v1.14.31c, v1.15.0b). Restores ~150 MB headroom for routine apt-upgrade drift. 2. update-description workflow bug fix. Pre-existing latent bug exposed by v1.15.4's partial publish: update-description.needs includes all 4 build-variant-* jobs, and gitea Actions' default behavior is 'skipped need => skip dependent' \u2014 even when the job's own if: condition is satisfied. So when build-variant-omos-with-pi was skipped (because its smoke failed), update-description cascaded into a skip too, and Hub description didn't refresh on v1.15.4 despite 3 variants publishing. Fix: wrap if: in always() + explicit success check on the base variant. Same fix applied to promote-base-latest preemptively (it has the same latent bug, currently masked by the cache-hit gate). No image-side changes \u2014 cache hit on base-35ee5fe7861a. |
||
|
|
18b9c9c549 |
CI: harden promote-base-latest (pinned crane + skip on cache-hit)
Validate / docs-check (push) Successful in 10s
Validate / base-change-warning (push) Successful in 16s
Validate / validate-with-pi (push) Successful in 4m10s
Validate / validate-omos (push) Successful in 4m34s
Validate / validate-base (push) Has been cancelled
Validate / validate-omos-with-pi (push) Has been cancelled
Two workflow-only changes for promote-base-latest, no image-side impact: T14 \u2014 replace imjasonh/setup-crane@v0.4 with direct pinned crane install. The action's bootstrap script calls api.github.com/.../releases/latest at every run to discover the crane version. That call periodically rate-limits and returns JSON without .tag_name, jq emits 'null', the action then downloads .../releases/download/null/... \u2192 404 \u2192 'gzip: unexpected end of file' \u2192 exit 2. We hit this on the v1.15.3 release (2026-05-16) where it was cosmetic only \u2014 base-latest was already correct from cache hit \u2014 but the red-X is annoying. Replaced with curl + tar pinned to crane v0.21.6 (latest at time of change). Same pattern as other GitHub-sourced binaries in the Dockerfile layer (gosu, fzf, eza etc.); operator bumps CRANE_VERSION deliberately when wanting updates. T15 \u2014 gate promote-base-latest on need_build == 'true'. When the base layer's content hash hasn't changed (cache hit on existing base-<hash> from a prior run), base-latest already points at the correct digest. The retag is a tautology, and any transient failure of it produces a red-X for an operation that didn't need to happen. Skipping the job entirely on cache-hit is correct and removes a whole class of cosmetic failure. Manual workflow_dispatch with promote_latest=true still bypasses the gate as an escape hatch (e.g., if base-latest got hand-deleted and needs regeneration without rebuilding the base). This will not trigger a CI publish run (main-branch commit, no tag). |
||
|
|
034830710c |
workflow: use github.ref_type directly in promote/update-description if-conditions
Validate / docs-check (push) Successful in 8s
Validate / base-change-warning (push) Successful in 10s
Validate / validate-with-pi (push) Successful in 4m23s
Validate / validate-omos-with-pi (push) Successful in 5m10s
Validate / validate-omos (push) Successful in 7m5s
Validate / validate-base (push) Successful in 10m5s
Gitea Actions evaluates 'env.PROMOTE_LATEST' as empty in YAML 'if:' contexts even though the same env var substitutes correctly in shell run: blocks. Result: on v1.15.0/v1.15.0b tag pushes, the build-variant-* jobs correctly pushed latest-* aliases (shell context), but promote-base-latest and update-description got skipped (YAML context), so the Hub README description wasn't refreshed. Switch to evaluating github.ref_type directly in the if-conditions — matches the production-trigger semantics and avoids the env-var indirection that gitea evaluates inconsistently. |
||
|
|
07e07ec611 |
Bump opencode 1.14.44 -> 1.14.50; cut over to split-base pipeline
Validate / validate-omos-with-pi (push) Waiting to run
Validate / docs-check (push) Successful in 1m7s
Validate / validate-with-pi (push) Failing after 3m16s
Validate / validate-omos (push) Failing after 3m15s
Validate / validate-base (push) Failing after 6m31s
Publish Docker Image / base-decide (push) Failing after 11m59s
Publish Docker Image / build-base (push) Has been cancelled
Publish Docker Image / smoke-base (push) Has been cancelled
Publish Docker Image / smoke-omos (push) Has been cancelled
Publish Docker Image / smoke-with-pi (push) Has been cancelled
Publish Docker Image / smoke-omos-with-pi (push) Has been cancelled
Publish Docker Image / build-variant-base (push) Has been cancelled
Publish Docker Image / build-variant-omos (push) Has been cancelled
Publish Docker Image / build-variant-with-pi (push) Has been cancelled
Publish Docker Image / build-variant-omos-with-pi (push) Has been cancelled
Publish Docker Image / promote-base-latest (push) Has been cancelled
Publish Docker Image / update-description (push) Has been cancelled
- Bump OPENCODE_VERSION 1.14.44 -> 1.14.50 in Dockerfile.variant - Cut over: docker-publish-split.yml now triggers on push: tags: v* (was workflow_dispatch only). RELEASE_TAG and PROMOTE_LATEST derived from github.ref_type/ref_name for tag-push; inputs still available for manual workflow_dispatch runs. - Delete docker-publish.yml (retired, replaced by split-base pipeline) - Delete Dockerfile (retired, replaced by Dockerfile.base + Dockerfile.variant) - Update CHANGELOG: promote Unreleased -> v1.14.50 - Update AGENTS.md, .gitea/README.md, validate.yml: remove all references to the old single-Dockerfile pipeline and WIP migration plan |
||
|
|
7dc836ab66 |
fix: replace echo -e heredoc with brace-block in build-variant tags steps
Validate / docs-check (push) Successful in 13s
Validate / validate-base (push) Successful in 12m21s
Validate / validate-omos (push) Successful in 18m38s
Validate / validate-with-pi (push) Successful in 13m23s
Validate / validate-omos-with-pi (push) Successful in 16m34s
echo -e doesn't interpret \n in /bin/sh (dash), which is the default
shell in catthehacker/ubuntu:act-latest. This caused steps.tags.outputs.tags
to be empty, resulting in 'tag is needed when pushing to registry' from buildx.
Also fixes a secondary bug: TAGS='${TAGS}\n...' stored a literal backslash-n
rather than a real newline, which would have broken multi-tag output when
promote_latest=true.
Fix: replace with a brace block using plain echo, which produces actual newlines
and works in both sh and bash.
|
||
|
|
4c27e6fd8a |
feat: split-base build pipeline (parallel, manual-trigger only)
Validate / docs-check (push) Successful in 15s
Validate / validate-base (push) Successful in 12m13s
Validate / validate-omos (push) Failing after 15m48s
Validate / validate-with-pi (push) Successful in 13m43s
Validate / validate-omos-with-pi (push) Has been cancelled
Two-Dockerfile split-base build alongside the existing single-Dockerfile
pipeline. Goal: cut CI wall clock from ~165-180min to ~30-40min on
typical version-bump-only releases by reusing a base image across the
four variants.
Files added:
- Dockerfile.base variant-independent layers (apt, locales, AWS
CLI, Node.js, mempalace, gitea-mcp, user setup,
chromadb prewarm, ENVs, entrypoints).
- Dockerfile.variant FROMs ${BASE_IMAGE} and adds opencode / pi /
omos / Go installs gated by INSTALL_* args.
Each npm install -g uses NPM_CONFIG_PREFIX=/usr
per-RUN to keep baked binaries off the volume-
shadowed ~/.pi/npm-global path inherited from
base.
- .gitea/workflows/docker-publish-split.yml
workflow_dispatch-only pipeline:
base-decide -> build-base (conditional) ->
smoke-* (4 parallel) -> build-variant-*
(4 parallel) -> promote-base-latest ->
update-description. Hash-driven base reuse:
if base-<sha> already exists on Docker Hub,
the build is skipped entirely. Inputs:
release_tag (test tag suffix, default
v0.0.0-split-test) and promote_latest
(default false; gates latest-* aliases and
Hub description update).
Files unchanged:
- Dockerfile, docker-publish.yml, validate.yml all left in place so
the production tag-push pipeline keeps working untouched.
Migration plan (in CHANGELOG Unreleased):
1. workflow_dispatch test run with promote_latest=false; verify the
four variant images smoke-pass and have plausible sizes.
2. Compare manifest digests against the same-version output from the
production pipeline (independent test run on the same commit).
3. Once verified across 1-2 release cycles, swap docker-publish-split.yml
to on: push: tags: v* and retire docker-publish.yml.
AGENTS.md and CHANGELOG.md updated with file roles and the migration
plan. Production pipeline behavior is bit-for-bit unchanged on this
branch.
|