440218fc4cfe07279ff8b8d03301875b0813f67d
37 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1e98b53113 |
feat: publish pi-only build into the pi-devbox repo, not opencode-devbox (Option B)
The pi-only variant was published as opencode-devbox:latest-pi-only —
an 'opencode-devbox' tag containing no opencode, which confused users.
- build-variant-pi-only now pushes joakimp/pi-devbox:base-pi-only[-vX.Y.Z]
instead of opencode-devbox:*-pi-only. New PI_IMAGE workflow env.
- Still built from the same Dockerfile.variant (single source of truth),
still smoke-tested by smoke-pi-only / validate-pi-only before publish.
- De-advertised pi-only from README, DOCKER_HUB (HUB_TEMPLATE), AGENTS,
.gitea/README. opencode-devbox now publishes 8 tags + base-latest.
- Documented in CHANGELOG (Unreleased) and the plan doc.
Note: old opencode-devbox:{latest,vX.Y.Z}-pi-only tags from v1.15.13b are
superseded and should be deleted from Docker Hub.
|
||
|
|
237588253f |
docs: fix stale variant/job counts missed in pi-only sweep
Validate / base-change-warning (push) Successful in 6s
Validate / docs-check (push) Successful in 17s
Validate / validate-base (push) Successful in 3m40s
Validate / validate-with-pi (push) Failing after 4m43s
Validate / validate-omos (push) Successful in 7m7s
Validate / validate-pi-only (push) Failing after 3m44s
Validate / validate-omos-with-pi (push) Failing after 18m16s
- AGENTS.md: 'eight load:true jobs' -> ten (add validate-pi-only, smoke-pi-only) - .gitea/README.md: 'four variants / eight tags' -> five / ten - docs/manual-host-publish.md: 'Variants x4 / 10 tags' -> x5 / 12 tags These are living operational facts; the remaining 'four/eight' hits are illustrative meta-instructions or dated historical CHANGELOG entries (correct as-is). |
||
|
|
fc034ceade |
feat: add pi-only variant (pi without opencode) as basis for pi-devbox
Validate / docs-check (push) Successful in 10s
Validate / base-change-warning (push) Successful in 23s
Validate / validate-omos (push) Successful in 4m36s
Validate / validate-omos-with-pi (push) Failing after 5m40s
Validate / validate-with-pi (push) Failing after 7m35s
Validate / validate-pi-only (push) Failing after 3m45s
Validate / validate-base (push) Failing after 16m12s
All opencode-devbox variants set INSTALL_OPENCODE=true, so pointing pi-devbox at with-pi dragged opencode along and made it ~a re-tag of latest-with-pi. Add a 5th variant pi-only (INSTALL_OPENCODE=false, INSTALL_PI=true): pi + companions (toolkit, extensions, fork, recall) + base tooling, no opencode (~145 MB lighter than with-pi). - Dockerfile.variant: document pi-only in the variant table. - CI docker-publish-split.yml: new smoke-pi-only + build-variant-pi-only jobs (tags :VERSION-pi-only / :latest-pi-only, multi-arch); wired into promote-base-latest and update-description needs. - validate.yml: new validate-pi-only main-branch gate job. - smoke-test.sh: accept --variant pi-only; threshold 2750 MB; opencode-absent path already handled. - Docs: HUB_TEMPLATE (regenerated DOCKER_HUB.md), README, AGENTS (variant/tag counts 4->5, 8->10 tags), .gitea/README, manual-host-publish.sh (5 variants), plan doc implementation note. This is the single source of truth for joakimp/pi-devbox, which now FROMs latest-pi-only. Versions unchanged (opencode 1.15.13, pi 0.78.0). |
||
|
|
f09a4f382a |
feat: host-agnostic LAN access (base) + fork/recall in pi variants
Validate / base-change-warning (push) Successful in 22s
Validate / docs-check (push) Successful in 44s
Validate / validate-base (push) Successful in 3m27s
Validate / validate-omos (push) Successful in 7m3s
Validate / validate-with-pi (push) Failing after 4m33s
Validate / validate-omos-with-pi (push) Failing after 8m29s
Item A — LAN access (base image): - New rootfs/usr/local/lib/opencode-devbox/setup-lan-access.sh, invoked non-fatally from entrypoint-user.sh. On VM-backed hosts (macOS OrbStack / Docker Desktop, detected via host.docker.internal) it generates a writable ~/.ssh-local/config that uses the host as an SSH jump to reach LAN peers; no-op on native Linux. Ships the mechanism (generic 'host' jump alias), not policy (targets stay in the user's bind-mounted ~/.ssh/config). - New env knobs: DEVBOX_LAN_ACCESS (auto|jump|off), HOST_SSH_USER, DEVBOX_HOST_ALIAS. dssh/dscp aliases in .bash_aliases (guarded). Item B — pi-fork (fork) + pi-observational-memory (recall) in pi variants: - Dockerfile.variant clones both elpapi42 repos to /opt and runs npm install there at build time (local-path 'pi install' does not npm-install, so deps must be present to load). New args PI_FORK_REPO/REF, PI_OBSMEM_REPO/REF. - entrypoint-user.sh registers them at runtime via 'pi install /opt/<pkg>' (instant, in-place, idempotent; tools bind on next pi start). - CI resolve-versions resolves each repo's master HEAD to a commit SHA and passes PI_FORK_REF/PI_OBSMEM_REF — same cache-hit guard as PI_VERSION. - smoke-test asserts /opt clones + node_modules + settings.json registration; size thresholds bumped (with-pi 2700->2900, omos-with-pi 3700->3900). Versions unchanged (opencode 1.15.13, pi 0.78.0 — both still latest). Docs: README LAN section + env table, .env.example, AGENTS.md, CHANGELOG. Plan recorded in docs/plan-lan-access-and-pi-extensions.md. |
||
|
|
1fe5b5df91 |
ci: workflow-level 3-attempt retry around buildx build --push
Validate / docs-check (push) Successful in 7s
Validate / base-change-warning (push) Successful in 6s
Validate / validate-with-pi (push) Successful in 4m11s
Validate / validate-omos (push) Successful in 4m31s
Validate / validate-base (push) Successful in 5m19s
Validate / validate-omos-with-pi (push) Successful in 11m38s
Belt-and-braces against transient registry-1.docker.io blips (rate limits, brief 5xx, CDN flap). Replaces all five push docker/build-push- action@v7 invocations (1 base + 4 variants) with shell: bash steps that run docker buildx build --push in a for-loop with backoff (15s, 30s). Smoke build steps (load: true, no push) are untouched. Does NOT mask deterministic failures: a true regression (e.g. the cache-export 400 we hit 2026-05-23..28) fails all 3 attempts identically and the job still fails by design. Orthogonal layer to both cache-export disablement and the ci-release-watcher skill's transient-rerun heuristic. - AGENTS.md: new Critical conventions bullet documenting the retry pattern, the consistency rule across push steps, and why it's duplicated rather than factored (Gitea Actions doesn't support reusable composite shell steps cleanly). - CHANGELOG.md: Unreleased section addendum, no image-side change. No image-side change. |
||
|
|
51ec4a88cf |
CI: drop registry cache-export from build-base (Hub 400 root cause)
Validate / base-change-warning (push) Successful in 6s
Validate / docs-check (push) Successful in 13s
Validate / validate-with-pi (push) Successful in 4m9s
Validate / validate-omos (push) Successful in 4m31s
Validate / validate-base (push) Successful in 5m40s
Validate / validate-omos-with-pi (push) Successful in 12m49s
Diagnosed during manual v1.15.12 publish: buildkit's mode=max cache export to registry-1.docker.io reproducibly returns HTTP 400 with HTML body on the resumable-upload PUT. Image push (layers + manifest) works fine in parallel; only --cache-to fails. Removing cache-from/cache-to lets the publish complete. This explains all four prior CI failures (runs 332/333/334/336) which shared the exact same failure shape. Action-pin hypothesis (setup-buildx-action v4.1.0) was correctly disproven by run 336 with v4.0.0 pinned. Trade-off: every Dockerfile.base change now pays the full ~3 min multi-arch build. Unchanged bases short-circuit at the content-addressed probe step in base-decide and never re-build, so day-to-day cost is zero. Re-enable when moby/buildkit upstream resolves the cache-export protocol mismatch with Hub CDN, or when we can switch to a non-registry cache backend. CHANGELOG.md: full root-cause writeup in Unreleased section, including status update on every prior suspect (all ruled out). |
||
|
|
be2a16834c |
Cut v1.15.12 — revert v4.0.0 pin (busted), bump pi to 0.76.0
Validate / docs-check (push) Successful in 8s
Validate / base-change-warning (push) Successful in 52s
Validate / validate-base (push) Failing after 3m34s
Publish Docker Image / base-decide (push) Successful in 10s
Publish Docker Image / resolve-versions (push) Successful in 4s
Validate / validate-with-pi (push) Failing after 4m0s
Validate / validate-omos (push) Failing after 6m50s
Validate / validate-omos-with-pi (push) Failing after 12m15s
Publish Docker Image / build-base (push) Failing after 30m40s
Publish Docker Image / smoke-base (push) Has been skipped
Publish Docker Image / smoke-with-pi (push) Has been skipped
Publish Docker Image / build-variant-base (push) Has been skipped
Publish Docker Image / build-variant-with-pi (push) Has been skipped
Publish Docker Image / smoke-omos (push) Has been skipped
Publish Docker Image / build-variant-omos-with-pi (push) Has been skipped
Publish Docker Image / build-variant-omos (push) Has been skipped
Publish Docker Image / smoke-omos-with-pi (push) Has been skipped
Publish Docker Image / promote-base-latest (push) Has been skipped
Publish Docker Image / update-description (push) Has been skipped
The v1.15.11b experiment confirmed setup-buildx-action@v4.1.0 is NOT the regressor: pinning all 9 references to @v4.0.0 reproduced the exact same '400 Bad request' from registry-1.docker.io on the first layer-blob PUT. CI run #336 failed twice (original + Gitea auto-rerun), both with HTML 400 bodies (CDN-tier rejection) at Offset:0. UUIDs and _state signatures differ across attempts; only the failure pattern is stable. Reverting all 9 pins back to @v4 — keeping a wrong pin holds us off action improvements with no benefit. Real suspects now narrow to: runner-image (catthehacker:act-latest, floating), runner-2 host network egress, buildx 0.34.x signed _state token format, or per-repo Hub-side state. Investigation deferred; this release ships via manual docker buildx build --push from a developer Orbstack to bypass the broken runner-network → Hub-CDN combo (we know that path works in ~25s for the same multi-arch build to the same Hub account). PI_VERSION=latest resolves to pi-coding-agent 0.76.0 (published 2026-05-27 20:03 UTC). OPENCODE_VERSION stays at 1.15.11 (no upstream bump since 1.15.11 was published 2026-05-27 03:59 UTC). Files: - .gitea/workflows/docker-publish-split.yml: 9 setup-buildx-action references reverted from @v4.0.0 to @v4 - CHANGELOG.md: v1.15.12 entry with regression triage status (ruled-out vs still-suspect) |
||
|
|
a16da2f041 |
Cut v1.15.11b — pin setup-buildx-action@v4.0.0
Validate / docs-check (push) Successful in 6s
Validate / base-change-warning (push) Successful in 6s
Validate / validate-with-pi (push) Failing after 4m1s
Publish Docker Image / base-decide (push) Successful in 8s
Publish Docker Image / resolve-versions (push) Successful in 5s
Validate / validate-omos-with-pi (push) Failing after 4m52s
Validate / validate-omos (push) Failing after 6m41s
Validate / validate-base (push) Failing after 8m55s
Publish Docker Image / promote-base-latest (push) Has been skipped
Publish Docker Image / update-description (push) Has been skipped
Publish Docker Image / build-base (push) Failing after 37m43s
Publish Docker Image / smoke-base (push) Has been skipped
Publish Docker Image / smoke-omos (push) Has been skipped
Publish Docker Image / smoke-with-pi (push) Has been skipped
Publish Docker Image / build-variant-omos (push) Has been skipped
Publish Docker Image / build-variant-with-pi (push) Has been skipped
Publish Docker Image / smoke-omos-with-pi (push) Has been skipped
Publish Docker Image / build-variant-base (push) Has been skipped
Publish Docker Image / build-variant-omos-with-pi (push) Has been skipped
The v1.15.11 publish failed three times in a row (runs #332/333/334)
with identical '400 Bad request' from registry-1.docker.io on the
multi-arch buildx layer-blob PUT. Triage on 2026-05-27 confirmed:
- Multi-arch buildx push from a developer host: succeeds in 25s
(same Hub account, same multi-arch path)
- Account / repo / Hub-CDN: all healthy
- Last known-good Gitea-runner Hub push: 2026-05-23 ~20:26 UTC
(pi-devbox v0.75.5b) — predates docker/setup-buildx-action v4.1.0
by <24h
- docker/setup-buildx-action@v4 floats to v4.1.0 (published
2026-05-22 16:00 UTC), bundling a newer buildx/buildkit whose
push protocol may trip Hub's CDN URI-length cap on the ~1.4 KB
_state query string in resumable-upload PUT URLs.
Pinning all nine setup-buildx-action references to @v4.0.0 to
test the hypothesis. setup-qemu-action@v3 left floating since
QEMU wasn't in the suspected blast radius. If v4.0.0 publishes
cleanly we keep the pin and file an upstream buildkit/buildx
issue.
No source changes — same OPENCODE_VERSION=1.15.11, same Dockerfile.base
and Dockerfile.variant. v1.15.11 (original tag) is preserved as a
historical marker of the first publish attempt; v1.15.11b becomes
the canonical release.
|
||
|
|
3cbcb44cf5 |
CI: fix resolve-versions to use curl+jq instead of npm view
catthehacker/ubuntu:act-latest ships Node/npm under /opt/acttoolcache/ with PATH updated only in /etc/environment. act_runner (nektos/act) does not source /etc/environment — it reads the Docker image's ENV instructions (inspectResult.Config.Env) which only contain DEBIAN_FRONTEND=noninteractive. So npm is NOT on PATH and 'npm view ...' would have CI-failed on first run. Fix: query the npm registry HTTP API directly with curl+jq, both of which are already used extensively by this workflow (curl for Hub auth/manifest inspect, jq for token parsing). The endpoint https://registry.npmjs.org/<pkg>/latest returns JSON with a 'version' field — equivalent to 'npm view <pkg> version' but with no toolchain dependency. Verified locally: both URLs resolve correctly to 0.75.5 (pi) and 1.1.1 (omos). Evidence: nektos/act pkg/container/docker_run.go reads imageEnv from inspectResult.Config.Env, not /etc/environment. DefaultPathVariable() in linux_container_environment_extensions.go returns a hardcoded path with no /opt/acttoolcache in it. |
||
|
|
f7c34091b1 |
CI: preventative fix for PI_VERSION/OMOS_VERSION cache-hit silent regression
Mirrors the pi-devbox v0.75.5b fix (2026-05-23) onto the four-variant pipeline here. The with-pi, omos, and omos-with-pi variants install upstream npm packages whose *_VERSION build-args defaulted to 'latest'. When the build-arg string is byte-identical across builds, the layer hash is identical and the registry buildcache silently reuses the layer from whatever upstream version was current when the cache was first populated — same mechanism that shipped pi-devbox v0.74.0..v0.75.5 with identical image bytes. Currently masked here because OPENCODE_VERSION is a hard-coded ARG that bumps every release; parent-chain cache invalidation flushes the downstream pi/omos layers. Masking would fail on any vN.N.Nb opencode- version-unchanged release that only bumps pi or omos. Filed last night as parked followup; fixing preventatively now that #5 (AWS SSO inside tor-ms22 container) cleared. CHANGES .gitea/workflows/docker-publish-split.yml — new resolve-versions job running 'npm view @earendil-works/pi-coding-agent version' and 'npm view oh-my-opencode-slim version', exposing concrete strings as job outputs. All six affected jobs (smoke-omos, smoke-with-pi, smoke-omos-with-pi, build-variant-omos, build-variant-with-pi, build-variant-omos-with-pi) now consume them as PI_VERSION / OMOS_VERSION build-args. smoke-base / build-variant-base unaffected. scripts/smoke-test.sh — new run_expect helper asserting an expected substring in command output. The pi check uses EXPECTED_PI_VERSION; the omos check uses EXPECTED_OMOS_VERSION against npm ls -g. Both env vars are wired from resolve-versions outputs in the smoke jobs. Catches this regression class on the next release, not four releases later. Dockerfile.variant — comment blocks above OPENCODE_VERSION (source- pinned, not subject to the bug), PI_VERSION (CI-resolved), and OMOS_VERSION (CI-resolved) explaining the cache-hit footgun. AGENTS.md — new convention bullet under 'Critical conventions' naming the resolve-versions job + EXPECTED_*_VERSION wiring as the contract to keep in lockstep when modifying variant build-args. .gitea/README.md — Step 1 expanded to cover the parallel resolve- versions job alongside base-decide; pipeline diagram updated. CHANGELOG.md — Unreleased entry describing the fix, masking mechanism, and audit footprint. No image-content change expected on the next release vs what 'latest' would have resolved to anyway. Purely makes the cache invalidate correctly going forward. |
||
|
|
b6e4d89a2c |
ci: filter __pycache__ and macOS metadata from base hash compute
Validate / docs-check (push) Successful in 14s
Validate / base-change-warning (push) Successful in 18s
Validate / validate-omos (push) Successful in 4m34s
Validate / validate-omos-with-pi (push) Successful in 4m57s
Validate / validate-with-pi (push) Successful in 6m9s
Validate / validate-base (push) Successful in 14m48s
Defensive against local-vs-CI hash divergence. `find rootfs -type f` includes gitignored junk like rootfs/__pycache__/*.pyc and macOS .DS_Store/._AppleDouble files, which CI's clean checkout never sees. This bit us during v1.15.4 debugging when a stale generate-config.cpython-314.pyc on the local rootfs/ produced base-3605aa6b6ab1 while CI computed base-35ee5fe7861a. Took meaningful time to track down because git status doesn't surface gitignored files. Verified: same filter applied to current clean tree still produces 35ee5fe7861a (the published v1.15.4b base digest). |
||
|
|
8f2c9f5112 |
v1.15.4b: omos-with-pi threshold bump + update-description partial-publish fix
Validate / docs-check (push) Successful in 7s
Validate / base-change-warning (push) Successful in 20s
Validate / validate-base (push) Successful in 3m36s
Publish Docker Image / base-decide (push) Successful in 13s
Publish Docker Image / build-base (push) Has been skipped
Validate / validate-with-pi (push) Successful in 4m14s
Validate / validate-omos (push) Successful in 7m1s
Publish Docker Image / smoke-base (push) Successful in 3m37s
Publish Docker Image / smoke-omos (push) Successful in 4m39s
Publish Docker Image / smoke-omos-with-pi (push) Successful in 5m7s
Publish Docker Image / smoke-with-pi (push) Successful in 6m24s
Validate / validate-omos-with-pi (push) Successful in 15m59s
Publish Docker Image / build-variant-base (push) Successful in 14m12s
Publish Docker Image / build-variant-omos (push) Successful in 19m29s
Publish Docker Image / build-variant-with-pi (push) Successful in 23m7s
Publish Docker Image / build-variant-omos-with-pi (push) Successful in 26m16s
Publish Docker Image / promote-base-latest (push) Has been skipped
Publish Docker Image / update-description (push) Successful in 8s
Recovery for v1.15.4's partial publish (omos-with-pi exceeded 3500 MB smoke threshold; other 3 variants published cleanly). Two changes: 1. omos-with-pi threshold 3500 -> 3700 MB. Compounded growth from opencode 1.15.0 -> 1.15.4 (4 patch versions) plus pi 0.74.0 -> 0.75.3 (minor + 3 patches) summed in the omos-with-pi variant, just over the existing limit. Same pattern as prior threshold bumps (v1.14.31c, v1.15.0b). Restores ~150 MB headroom for routine apt-upgrade drift. 2. update-description workflow bug fix. Pre-existing latent bug exposed by v1.15.4's partial publish: update-description.needs includes all 4 build-variant-* jobs, and gitea Actions' default behavior is 'skipped need => skip dependent' \u2014 even when the job's own if: condition is satisfied. So when build-variant-omos-with-pi was skipped (because its smoke failed), update-description cascaded into a skip too, and Hub description didn't refresh on v1.15.4 despite 3 variants publishing. Fix: wrap if: in always() + explicit success check on the base variant. Same fix applied to promote-base-latest preemptively (it has the same latent bug, currently masked by the cache-hit gate). No image-side changes \u2014 cache hit on base-35ee5fe7861a. |
||
|
|
18b9c9c549 |
CI: harden promote-base-latest (pinned crane + skip on cache-hit)
Validate / docs-check (push) Successful in 10s
Validate / base-change-warning (push) Successful in 16s
Validate / validate-with-pi (push) Successful in 4m10s
Validate / validate-omos (push) Successful in 4m34s
Validate / validate-base (push) Has been cancelled
Validate / validate-omos-with-pi (push) Has been cancelled
Two workflow-only changes for promote-base-latest, no image-side impact: T14 \u2014 replace imjasonh/setup-crane@v0.4 with direct pinned crane install. The action's bootstrap script calls api.github.com/.../releases/latest at every run to discover the crane version. That call periodically rate-limits and returns JSON without .tag_name, jq emits 'null', the action then downloads .../releases/download/null/... \u2192 404 \u2192 'gzip: unexpected end of file' \u2192 exit 2. We hit this on the v1.15.3 release (2026-05-16) where it was cosmetic only \u2014 base-latest was already correct from cache hit \u2014 but the red-X is annoying. Replaced with curl + tar pinned to crane v0.21.6 (latest at time of change). Same pattern as other GitHub-sourced binaries in the Dockerfile layer (gosu, fzf, eza etc.); operator bumps CRANE_VERSION deliberately when wanting updates. T15 \u2014 gate promote-base-latest on need_build == 'true'. When the base layer's content hash hasn't changed (cache hit on existing base-<hash> from a prior run), base-latest already points at the correct digest. The retag is a tautology, and any transient failure of it produces a red-X for an operation that didn't need to happen. Skipping the job entirely on cache-hit is correct and removes a whole class of cosmetic failure. Manual workflow_dispatch with promote_latest=true still bypasses the gate as an escape hatch (e.g., if base-latest got hand-deleted and needs regeneration without rebuilding the base). This will not trigger a CI publish run (main-branch commit, no tag). |
||
|
|
034830710c |
workflow: use github.ref_type directly in promote/update-description if-conditions
Validate / docs-check (push) Successful in 8s
Validate / base-change-warning (push) Successful in 10s
Validate / validate-with-pi (push) Successful in 4m23s
Validate / validate-omos-with-pi (push) Successful in 5m10s
Validate / validate-omos (push) Successful in 7m5s
Validate / validate-base (push) Successful in 10m5s
Gitea Actions evaluates 'env.PROMOTE_LATEST' as empty in YAML 'if:' contexts even though the same env var substitutes correctly in shell run: blocks. Result: on v1.15.0/v1.15.0b tag pushes, the build-variant-* jobs correctly pushed latest-* aliases (shell context), but promote-base-latest and update-description got skipped (YAML context), so the Hub README description wasn't refreshed. Switch to evaluating github.ref_type directly in the if-conditions — matches the production-trigger semantics and avoids the env-var indirection that gitea evaluates inconsistently. |
||
|
|
dba05da7d1 |
validate.yml: use Hub base-latest as variant parent + warn on base-input changes
Validate / docs-check (push) Successful in 9s
Validate / base-change-warning (push) Successful in 11s
Validate / validate-base (push) Failing after 21s
Validate / validate-omos (push) Failing after 1m49s
Validate / validate-with-pi (push) Failing after 1m46s
Validate / validate-omos-with-pi (push) Failing after 13m9s
The previous two-step approach (build Dockerfile.base \ then Dockerfile.variant FROM the local image) doesn't work: each docker/build-push-action@v7 invocation runs in its own buildx container context, and an image loaded into the host docker daemon by step N is not visible to step N+1's buildx invocation. Variant builds in validate.yml now FROM joakimp/opencode-devbox:base-latest on Docker Hub, matching the production smokes' parent. Trade-off: PRs/pushes that change Dockerfile.base, rootfs/, or entrypoint*.sh are not exercised here \u2014 only release tags rebuild the base via docker-publish-split.yml. The new base-change-warning job surfaces a runtime warning when a commit modifies any base-image input, telling the author to run a workflow_dispatch test if they want full validation before merging. |
||
|
|
a438c67f06 |
fix: update validate.yml for split-base Dockerfiles
Validate / validate-omos (push) Failing after 20s
Validate / docs-check (push) Successful in 22s
Validate / validate-with-pi (push) Failing after 3m8s
Validate / validate-omos-with-pi (push) Failing after 3m6s
Validate / validate-base (push) Failing after 14m57s
Replace single-Dockerfile build with two-step: build Dockerfile.base first (loads as opencode-devbox:validate-base), then build Dockerfile.variant with BASE_IMAGE pointing at the local base image. All four validate jobs updated. |
||
|
|
07e07ec611 |
Bump opencode 1.14.44 -> 1.14.50; cut over to split-base pipeline
Validate / validate-omos-with-pi (push) Waiting to run
Validate / docs-check (push) Successful in 1m7s
Validate / validate-with-pi (push) Failing after 3m16s
Validate / validate-omos (push) Failing after 3m15s
Validate / validate-base (push) Failing after 6m31s
Publish Docker Image / base-decide (push) Failing after 11m59s
Publish Docker Image / build-base (push) Has been cancelled
Publish Docker Image / smoke-base (push) Has been cancelled
Publish Docker Image / smoke-omos (push) Has been cancelled
Publish Docker Image / smoke-with-pi (push) Has been cancelled
Publish Docker Image / smoke-omos-with-pi (push) Has been cancelled
Publish Docker Image / build-variant-base (push) Has been cancelled
Publish Docker Image / build-variant-omos (push) Has been cancelled
Publish Docker Image / build-variant-with-pi (push) Has been cancelled
Publish Docker Image / build-variant-omos-with-pi (push) Has been cancelled
Publish Docker Image / promote-base-latest (push) Has been cancelled
Publish Docker Image / update-description (push) Has been cancelled
- Bump OPENCODE_VERSION 1.14.44 -> 1.14.50 in Dockerfile.variant - Cut over: docker-publish-split.yml now triggers on push: tags: v* (was workflow_dispatch only). RELEASE_TAG and PROMOTE_LATEST derived from github.ref_type/ref_name for tag-push; inputs still available for manual workflow_dispatch runs. - Delete docker-publish.yml (retired, replaced by split-base pipeline) - Delete Dockerfile (retired, replaced by Dockerfile.base + Dockerfile.variant) - Update CHANGELOG: promote Unreleased -> v1.14.50 - Update AGENTS.md, .gitea/README.md, validate.yml: remove all references to the old single-Dockerfile pipeline and WIP migration plan |
||
|
|
7dc836ab66 |
fix: replace echo -e heredoc with brace-block in build-variant tags steps
Validate / docs-check (push) Successful in 13s
Validate / validate-base (push) Successful in 12m21s
Validate / validate-omos (push) Successful in 18m38s
Validate / validate-with-pi (push) Successful in 13m23s
Validate / validate-omos-with-pi (push) Successful in 16m34s
echo -e doesn't interpret \n in /bin/sh (dash), which is the default
shell in catthehacker/ubuntu:act-latest. This caused steps.tags.outputs.tags
to be empty, resulting in 'tag is needed when pushing to registry' from buildx.
Also fixes a secondary bug: TAGS='${TAGS}\n...' stored a literal backslash-n
rather than a real newline, which would have broken multi-tag output when
promote_latest=true.
Fix: replace with a brace block using plain echo, which produces actual newlines
and works in both sh and bash.
|
||
|
|
6fde27c212 |
Document the build pipeline architecture in .gitea/README.md
Validate / docs-check (push) Successful in 16s
Validate / validate-base (push) Successful in 12m9s
Validate / validate-omos (push) Successful in 16m45s
Validate / validate-with-pi (push) Successful in 13m30s
Validate / validate-omos-with-pi (push) Successful in 15m15s
The split-base build architecture, the NPM_CONFIG_PREFIX gotcha, the
hash-driven base cache reuse mechanism, and the cutover plan from
docker-publish.yml to docker-publish-split.yml were previously
scattered across:
- inline Dockerfile.base / Dockerfile.variant comments
- CHANGELOG Unreleased entries
- AGENTS.md mentions
- docker-publish-split.yml header comment
- my own session notes
Consolidate into .gitea/README.md as the canonical architectural doc.
Gitea (like GitHub) auto-renders this when navigating to .gitea/ in
the web UI, so anyone investigating 'why is CI shaped this way?'
finds it on the first click. Cross-referenced from AGENTS.md as the
first thing to read when touching CI.
Covers:
- The two release pipelines and why both exist
- Why split-base: cross-variant cache misses on layer-hash-divergence
- The 6 phases of the split-base pipeline with an ASCII diagram
- base-decide hash inputs and Docker Hub probe logic
- NPM_CONFIG_PREFIX variant-override pattern (the volume-shadow trap)
- Registry cache strategy (mode=max for cross-arch reuse)
- Wall-clock estimates: version-bump vs base-touching releases
- Validate workflow role
- Runner expectations: catthehacker image, disk reclaim, concurrency,
Gitea Actions @v4 artifact incompatibility
- 4-step migration plan from docker-publish.yml to .split.yml
- Cross-refs to related docs
Does not duplicate AGENTS.md content; links to it for domain facts and
release-day checklist.
|
||
|
|
4c27e6fd8a |
feat: split-base build pipeline (parallel, manual-trigger only)
Validate / docs-check (push) Successful in 15s
Validate / validate-base (push) Successful in 12m13s
Validate / validate-omos (push) Failing after 15m48s
Validate / validate-with-pi (push) Successful in 13m43s
Validate / validate-omos-with-pi (push) Has been cancelled
Two-Dockerfile split-base build alongside the existing single-Dockerfile
pipeline. Goal: cut CI wall clock from ~165-180min to ~30-40min on
typical version-bump-only releases by reusing a base image across the
four variants.
Files added:
- Dockerfile.base variant-independent layers (apt, locales, AWS
CLI, Node.js, mempalace, gitea-mcp, user setup,
chromadb prewarm, ENVs, entrypoints).
- Dockerfile.variant FROMs ${BASE_IMAGE} and adds opencode / pi /
omos / Go installs gated by INSTALL_* args.
Each npm install -g uses NPM_CONFIG_PREFIX=/usr
per-RUN to keep baked binaries off the volume-
shadowed ~/.pi/npm-global path inherited from
base.
- .gitea/workflows/docker-publish-split.yml
workflow_dispatch-only pipeline:
base-decide -> build-base (conditional) ->
smoke-* (4 parallel) -> build-variant-*
(4 parallel) -> promote-base-latest ->
update-description. Hash-driven base reuse:
if base-<sha> already exists on Docker Hub,
the build is skipped entirely. Inputs:
release_tag (test tag suffix, default
v0.0.0-split-test) and promote_latest
(default false; gates latest-* aliases and
Hub description update).
Files unchanged:
- Dockerfile, docker-publish.yml, validate.yml all left in place so
the production tag-push pipeline keeps working untouched.
Migration plan (in CHANGELOG Unreleased):
1. workflow_dispatch test run with promote_latest=false; verify the
four variant images smoke-pass and have plausible sizes.
2. Compare manifest digests against the same-version output from the
production pipeline (independent test run on the same commit).
3. Once verified across 1-2 release cycles, swap docker-publish-split.yml
to on: push: tags: v* and retire docker-publish.yml.
AGENTS.md and CHANGELOG.md updated with file roles and the migration
plan. Production pipeline behavior is bit-for-bit unchanged on this
branch.
|
||
|
|
f46c4ed017 |
CI matrix: add with-pi and omos-with-pi build variants
Validate / docs-check (push) Successful in 39s
Validate / validate-base (push) Successful in 13m40s
Validate / validate-omos (push) Successful in 19m15s
Validate / validate-with-pi (push) Successful in 13m53s
Validate / validate-omos-with-pi (push) Successful in 18m26s
Publish Docker Image / smoke-base (push) Successful in 12m21s
Publish Docker Image / smoke-with-pi (push) Successful in 14m17s
Publish Docker Image / smoke-omos (push) Successful in 16m55s
Publish Docker Image / smoke-omos-with-pi (push) Successful in 16m22s
Publish Docker Image / build-base (push) Successful in 40m52s
Publish Docker Image / build-with-pi (push) Successful in 47m32s
Publish Docker Image / build-omos (push) Successful in 51m41s
Publish Docker Image / build-omos-with-pi (push) Successful in 56m44s
Publish Docker Image / update-description (push) Successful in 15s
.gitea/workflows/validate.yml:
Adds validate-with-pi (INSTALL_PI=true) and validate-omos-with-pi
(INSTALL_OMOS=true + INSTALL_PI=true). amd64 single-arch with smoke
test, no push.
.gitea/workflows/docker-publish.yml:
Adds smoke-with-pi → build-with-pi and smoke-omos-with-pi →
build-omos-with-pi job pairs. Each push-by-digest multi-arch
(amd64+arm64) to Docker Hub with two tags:
${VERSION}-with-pi + latest-with-pi
${VERSION}-omos-with-pi + latest-omos-with-pi
update-description.needs[] extended to wait on both new build jobs.
scripts/smoke-test.sh:
bun-presence check now treats omos and omos-with-pi as the bun
variants. Pi state assertions wait up to 30s for entrypoint-user.sh
to finish deploying pi-toolkit + extensions (omos-with-pi has more
setup work than the base+pi path; the previous sleep-1 was too short
and caused empty-error assertion failures on cold starts).
Local verification (arm64 via OrbStack):
base → 1871 MB, all checks PASS
omos → 2813 MB, all checks PASS
with-pi → 2277 MB, all checks PASS
omos-with-pi → 3030 MB, all checks PASS
CI now produces 8 Docker Hub tags per release:
vX.Y.Z[n], latest
vX.Y.Z[n]-omos, latest-omos
vX.Y.Z[n]-with-pi, latest-with-pi
vX.Y.Z[n]-omos-with-pi, latest-omos-with-pi
|
||
|
|
fc74a8f906 |
Collapse per-arch matrix back into single multi-arch push jobs
Validate / docs-check (push) Successful in 17s
Validate / validate-omos (push) Successful in 14m21s
Validate / validate-base (push) Successful in 14m50s
Publish Docker Image / smoke-base (push) Successful in 11m12s
Publish Docker Image / smoke-omos (push) Successful in 22m0s
Publish Docker Image / build-base (push) Successful in 42m25s
Publish Docker Image / build-omos (push) Failing after 1h16m24s
Publish Docker Image / update-description (push) Has been cancelled
v1.14.31c's matrix jobs failed on Upload digest with GHESNotSupportedError — Gitea Actions doesn't support actions/upload-artifact@v4+. Separately, build-omos arm64 hung silently for 12 min in Set-up job, likely catthehacker pull contention between concurrent matrix children. Rather than downgrade artifacts to @v3, collapse the matrix entirely. docker/build-push-action@v7 with platforms: linux/amd64,linux/arm64 publishes a proper multi-arch manifest in one job, so the artifact-passing and imagetools create merge dance only existed to support a matrix split we no longer need. The matrix was designed around load: true disk exhaustion (v1.14.30b), but push-by-digest streams straight to the registry with fundamentally different disk profile. Reclaim step gives enough headroom for the combined amd64+arm64 push case. Workflow: 7 jobs → 5. docker-publish.yml: 263 → ~110 lines of YAML. Also: - timeout-minutes: 90 on build jobs so hung builds fail explicitly - BUILDKIT_PROGRESS=plain at workflow level for line-by-line arm64 logs - AGENTS.md §CI quirks documents the Gitea-specific traps (upload-artifact@v3-only, dash-not-bash, build-push-action@v7 multi-arch convention, reclaim requirement) |
||
|
|
5a2d06340e |
Fix dash-incompatible slash substitution and bump omos size threshold
Validate / docs-check (push) Successful in 18s
Validate / validate-base (push) Successful in 15m44s
Validate / validate-omos (push) Successful in 15m21s
Publish Docker Image / smoke-base (push) Successful in 14m30s
Publish Docker Image / smoke-omos (push) Successful in 15m51s
Publish Docker Image / build-base (linux/amd64) (push) Failing after 10m58s
Publish Docker Image / build-omos (linux/amd64) (push) Failing after 15m9s
Publish Docker Image / build-omos (linux/arm64) (push) Failing after 11m57s
Publish Docker Image / build-base (linux/arm64) (push) Failing after 39m30s
Publish Docker Image / merge-base (push) Has been skipped
Publish Docker Image / merge-omos (push) Has been skipped
Publish Docker Image / update-description (push) Has been skipped
v1.14.31b made it through smoke-base and validate-base (reclaim worked),
but two narrow bugs blocked the rest:
1. 'Derive platform slug' in the per-arch matrix jobs used bash
${PLATFORM_PAIR//\//-} which dash (/bin/sh in the runner) can't
parse — 'Bad substitution'. Rewrote with 'tr / -'.
2. smoke-omos image size 3107 MB tripped the 3000 MB guardrail. All
functional checks pass; the mempalace-toolkit bake-in from v1.14.30b
added ~100 MB and the threshold was stale. Bumped to 3200 MB.
No image-level changes.
|
||
|
|
23894bc19f |
Reclaim runner disk before load: true smoke builds
Validate / docs-check (push) Successful in 22s
Validate / validate-base (push) Successful in 18m10s
Validate / validate-omos (push) Failing after 25m54s
Publish Docker Image / smoke-base (push) Successful in 11m50s
Publish Docker Image / build-base (linux/amd64) (push) Failing after 38s
Publish Docker Image / build-base (linux/arm64) (push) Failing after 21s
Publish Docker Image / merge-base (push) Has been skipped
Publish Docker Image / smoke-omos (push) Failing after 19m18s
Publish Docker Image / build-omos (linux/amd64) (push) Has been skipped
Publish Docker Image / build-omos (linux/arm64) (push) Has been skipped
Publish Docker Image / merge-omos (push) Has been skipped
Publish Docker Image / update-description (push) Has been skipped
v1.14.31 publish and validate both hit 'No space left on device' on single-arch amd64 smoke/validate builds. The image has crossed ~3 GB and the runner's ~40 GB overlay starts ~70% full, so 'load: true' peak disk (tarball + unpacked image + buildx cache) no longer fits. Add a 'Reclaim runner disk' step to validate-base, validate-omos, smoke-base, smoke-omos. Strips catthehacker-resident toolchains we never use (hosted-tool-cache, dotnet, android, powershell, swift, ghc, jvm, microsoft, chromium, boost), then runs 'docker system prune -af --volumes' + 'docker builder prune -af' against the runner's dockerd before setup-buildx-action. Expected reclaim is 6-12 GB depending on what's resident. Deliberately NOT in the per-arch matrix build jobs — push-by-digest doesn't need it and pruning in parallel jobs risks one job nuking another's in-flight buildx cache. Also add workflow-level concurrency on docker-publish.yml so concurrent tag pushes serialize cleanly. |
||
|
|
f0918ba915 |
Bump opencode to 1.14.31 and split multi-arch publish across runners
Validate / docs-check (push) Successful in 26s
Publish Docker Image / smoke-base (push) Failing after 11m1s
Publish Docker Image / build-base (linux/amd64) (push) Has been skipped
Publish Docker Image / build-base (linux/arm64) (push) Has been skipped
Publish Docker Image / merge-base (push) Has been skipped
Validate / validate-base (push) Failing after 13m48s
Validate / validate-omos (push) Failing after 15m23s
Publish Docker Image / smoke-omos (push) Failing after 16m20s
Publish Docker Image / build-omos (linux/amd64) (push) Has been skipped
Publish Docker Image / build-omos (linux/arm64) (push) Has been skipped
Publish Docker Image / merge-omos (push) Has been skipped
Publish Docker Image / update-description (push) Has been skipped
The v1.14.30b publish failed on both variants with 'No space left on device' — arm64 QEMU-emulated layers were stored alongside amd64 on the same ~40 GB runner, and the mempalace-toolkit bake-in from v1.14.30b tipped peak disk over the edge during the nodejs dpkg unpack and the git-lfs layer export. Refactor docker-publish.yml to the canonical push-by-digest + manifest-merge pattern: smoke test (amd64) runs on its own runner, each (variant x arch) push target runs on its own fresh runner with outputs=type=image,push-by-digest=true,push=true (no local image store), then a tiny merge job assembles the multi-arch manifest with docker buildx imagetools create from digest artifacts. Per-runner disk peak is roughly one-quarter of the old single-job peak. The four Docker Hub tags per release are unchanged. As a bonus, amd64 and arm64 now build in parallel. No image-level changes beyond the opencode bump. |
||
|
|
113c9f0bb0 |
Infrastructure pass: CI smoke tests, floating versions, chown sentinel, generate-config script
Main changes: - Extract opencode.json generation from entrypoint-user.sh into a standalone Python script (rootfs/usr/local/lib/opencode-devbox/ generate-config.py). Preserves the never-overwrite-existing-config guarantee. Cuts entrypoint-user.sh from 176 to 97 lines. - Install MemPalace via 'uv tool install' into an isolated venv at /opt/uv-tools/mempalace/ with a /usr/local/bin/mempalace-mcp-server wrapper, replacing the 'pip install --break-system-packages' escape hatch. The wrapper is what generate-config.py references in the auto-generated opencode.json. Also fix 'mempalace init' in entrypoint-user.sh to use --yes so first-start initialization isn't interactive (this used to hang or print prompts into the user's terminal). Gated by INSTALL_MEMPALACE build arg (default true) so users who don't need AI memory can shave ~300 MB. - Sentinel-file pattern in entrypoint.sh volume-ownership loop: write .devbox-owner after a successful chown -R, skip the recursive walk on subsequent starts when the sentinel matches FINAL_UID:FINAL_GID. Cuts multi-second startup costs to milliseconds on large volumes (nvim plugins, palace data). UID changes still trigger a full chown. - Float all GitHub/Gitea-hosted binary versions: gosu, fzf, git-lfs, neovim, bat, eza, zoxide, uv, gitea-mcp now default to 'latest' and resolve the newest upstream release at build time via the /releases/ latest redirect. Go (go.dev JSON feed) and oh-my-opencode-slim (npm @latest) likewise. Intentional pins still in place: OPENCODE_VERSION, NODE_VERSION=22, DEBIAN_VERSION=trixie-slim. Each *_VERSION ARG accepts an explicit value to lock a specific version when needed. - New scripts/smoke-test.sh verifies binary presence, opencode startup, entrypoint user drop, generate-config idempotency, bun's presence- per-variant, and image size against thresholds (2500 MB base, 3000 MB OMOS). Prints resolved component versions as its first step so CI logs always record what got baked into a given image. - New .gitea/workflows/validate.yml runs on push to main and PRs: single-arch amd64 build, smoke test, DOCKER_HUB.md sync check. Tag- triggered docker-publish.yml now smoke-tests each variant on amd64 before the full multi-arch push. - scripts/generate-dockerhub-md.py auto-generates DOCKER_HUB.md from README.md using explicit SECTION_RULES. --check mode fails CI when the committed file is out of sync. Enforces the 25 kB Docker Hub limit. Adding a new README section forces an explicit keep/drop/ replace decision. - Remove dead INSTALL_PYTHON build arg (was a no-op since mempalace added python3 unconditionally). |
||
|
|
de659fbc54 |
Switch to new Docker Hub /v2/auth/token API for description updates
The old /v2/users/login endpoint is deprecated and returns tokens with insufficient permissions. Use /v2/auth/token with Bearer auth instead. |
||
|
|
d651a084de | Fix Docker Hub short description: trim to 100-byte limit | ||
|
|
18b4df23e5 | Fix IPv6 connectivity failures: force IPv4 preference in CI builds | ||
|
|
017f7f1343 | Fix Docker Hub description update: use --rawfile and capture error response | ||
|
|
56f98da914 | Add error handling to Docker Hub description update step | ||
|
|
078c095116 | Parallelize base and omos image builds into separate CI jobs | ||
|
|
c0b887791f | Add Docker Hub description update step to CI workflow | ||
|
|
4314a3fb88 |
Fix CI: use vars for username, secrets for token
Publish Docker Image / build-and-push (push) Successful in 28m54s
|
||
|
|
23a2c3da8c |
Fix CI: remove duplicate Docker socket mount
Publish Docker Image / build-and-push (push) Failing after 11m17s
|
||
|
|
489fa5c60f |
Fix CI: use container image with Docker CLI and mount Docker socket
Publish Docker Image / build-and-push (push) Failing after 1m7s
|
||
|
|
713a1a6f97 |
Add Gitea Actions workflow for Docker Hub publishing and Docker Hub usage docs
Publish Docker Image / build-and-push (push) Failing after 27s
|