Collapse per-arch matrix back into single multi-arch push jobs
Validate / docs-check (push) Successful in 17s
Validate / validate-omos (push) Successful in 14m21s
Validate / validate-base (push) Successful in 14m50s
Publish Docker Image / smoke-base (push) Successful in 11m12s
Publish Docker Image / smoke-omos (push) Successful in 22m0s
Publish Docker Image / build-base (push) Successful in 42m25s
Publish Docker Image / build-omos (push) Failing after 1h16m24s
Publish Docker Image / update-description (push) Has been cancelled

v1.14.31c's matrix jobs failed on Upload digest with GHESNotSupportedError
— Gitea Actions doesn't support actions/upload-artifact@v4+.
Separately, build-omos arm64 hung silently for 12 min in Set-up job,
likely catthehacker pull contention between concurrent matrix children.

Rather than downgrade artifacts to @v3, collapse the matrix entirely.
docker/build-push-action@v7 with platforms: linux/amd64,linux/arm64
publishes a proper multi-arch manifest in one job, so the
artifact-passing and imagetools create merge dance only existed to
support a matrix split we no longer need.

The matrix was designed around load: true disk exhaustion (v1.14.30b),
but push-by-digest streams straight to the registry with fundamentally
different disk profile. Reclaim step gives enough headroom for the
combined amd64+arm64 push case.

Workflow: 7 jobs → 5. docker-publish.yml: 263 → ~110 lines of YAML.

Also:
- timeout-minutes: 90 on build jobs so hung builds fail explicitly
- BUILDKIT_PROGRESS=plain at workflow level for line-by-line arm64 logs
- AGENTS.md §CI quirks documents the Gitea-specific traps
  (upload-artifact@v3-only, dash-not-bash, build-push-action@v7
  multi-arch convention, reclaim requirement)
This commit is contained in:
Joakim Persson
2026-05-01 12:28:34 +00:00
parent 5a2d06340e
commit fc74a8f906
3 changed files with 124 additions and 159 deletions
+106 -159
View File
@@ -6,7 +6,7 @@ on:
- 'v*'
# Serialize concurrent runs of the same workflow on the same ref so the
# matrix build jobs can't race `docker system prune` in the smoke gates
# build jobs can't race `docker system prune` in the smoke gates
# (pruning from one job can nuke another job's in-flight buildx cache).
# cancel-in-progress: false — tag pushes are release events, we never
# want to silently drop one.
@@ -14,16 +14,43 @@ concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: false
# Plain progress output from BuildKit — critical for diagnosing stalls
# inside arm64-under-QEMU builds where the default collapsed progress UI
# hides which step is stuck.
env:
BUILDKIT_PROGRESS: plain
# Runner disk pressure notes:
# Gitea Actions runners use `catthehacker/ubuntu:act-latest` on a shared host
# with limited overlay space (~40 GB, often 70%+ used at start). Building both
# architectures of both variants on a single runner exhausted disk around the
# nodejs dpkg unpack / git-lfs layer export. To fix this:
# * smoke test (amd64 only, load into daemon) runs on its own runner
# * each push target (variant × arch) runs on its own runner, pushes by
# digest (no local image store), uploads digest as an artifact
# * a merge job composes the multi-arch manifest with `imagetools create`
# Per-runner disk pressure is now one-quarter of the old single-job peak.
# with limited overlay space (~40 GB, often 70%+ used at start). Two jobs
# per variant:
# * smoke gate (amd64 only, `load: true` into local dockerd for smoke
# testing) — peak disk = tarball + unpacked image + buildx cache. The
# `Reclaim runner disk` step below strips catthehacker-resident
# toolchains and prunes stale docker state before buildx starts.
# * build job (amd64 + arm64, `push-by-digest` streaming directly to
# Docker Hub, no local unpack). Peak disk on push-by-digest is
# BuildKit's content store only — much smaller than `load: true`.
# `docker/build-push-action@v7` with comma-separated platforms
# publishes a proper multi-arch manifest in one step.
#
# Why not matrix + digest artifacts?
# An earlier revision split each arch into its own matrix job and used
# `actions/upload-artifact` to pass digests to a merge job. On Gitea
# Actions, `actions/{upload,download}-artifact@v4+` fails with
# `GHESNotSupportedError` — v4 relies on a GitHub-specific Artifact
# API that Gitea doesn't implement. Rather than downgrade to @v3 (the
# last Gitea-compatible release) we collapsed back to single-job
# multi-arch push. The matrix only helps when the build literally
# cannot fit on one runner, which push-by-digest + reclaim no longer
# hits for this image.
#
# Gitea Actions gotchas baked into this file:
# * `actions/{upload,download}-artifact` must stay at @v3 on Gitea.
# * Step scripts run under /bin/sh (dash) — no bash-isms like
# ${VAR//a/b}. Use `tr` or explicit `shell: bash`.
# * `docker/build-push-action@v7` with `platforms: a,b` works for
# multi-arch push natively; no matrix/merge dance needed.
jobs:
# ── Smoke test (amd64 only, gates the push jobs) ────────────────────
@@ -137,18 +164,13 @@ jobs:
- name: Smoke test (amd64)
run: bash scripts/smoke-test.sh opencode-devbox:smoke-omos --variant omos
# ── Per-arch push (by digest, no local image) ──────────────────────
# ── Multi-arch push (single job per variant, comma-separated platforms)
build-base:
runs-on: ubuntu-latest
needs: smoke-base
timeout-minutes: 90
container:
image: catthehacker/ubuntu:act-latest
strategy:
fail-fast: false
matrix:
platform:
- linux/amd64
- linux/arm64
steps:
- name: Checkout
uses: actions/checkout@v4
@@ -156,16 +178,35 @@ jobs:
- name: Force IPv4 for Docker Hub
run: echo 'precedence ::ffff:0:0/96 100' >> /etc/gai.conf
- name: Derive platform slug
id: platform
# Lighter reclaim than the smoke-gate version: push-by-digest
# doesn't write to host dockerd, so `docker system prune` adds
# little. BuildKit cache from prior runs is the thing to clear.
- name: Reclaim runner disk
run: |
# POSIX-safe slash substitution — act's runner container ships
# /bin/sh as dash, which doesn't support bash's ${VAR//a/b}.
echo "pair=$(echo '${{ matrix.platform }}' | tr / -)" >> $GITHUB_OUTPUT
set -x
df -h / || true
rm -rf \
/opt/hostedtoolcache \
/opt/microsoft \
/opt/az \
/opt/ghc \
/usr/local/.ghcup \
/usr/share/dotnet \
/usr/share/swift \
/usr/local/lib/android \
/usr/local/share/powershell \
/usr/local/share/chromium \
/usr/local/share/boost \
/usr/lib/jvm 2>/dev/null || true
apt-get clean || true
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* || true
docker builder prune -af || true
df -h / || true
- name: Set up QEMU
if: matrix.platform != 'linux/amd64'
uses: docker/setup-qemu-action@v4
with:
platforms: arm64
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v4
@@ -178,39 +219,26 @@ jobs:
username: ${{ vars.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Build and push by digest
id: build
- name: Extract version from tag
id: version
run: echo "version=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT
- name: Build and push (multi-arch)
uses: docker/build-push-action@v7
with:
context: .
platforms: ${{ matrix.platform }}
outputs: type=image,name=${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox,push-by-digest=true,name-canonical=true,push=true
- name: Export digest
run: |
mkdir -p /tmp/digests
digest="${{ steps.build.outputs.digest }}"
touch "/tmp/digests/${digest#sha256:}"
- name: Upload digest
uses: actions/upload-artifact@v4
with:
name: digests-base-${{ steps.platform.outputs.pair }}
path: /tmp/digests/*
if-no-files-found: error
retention-days: 1
platforms: linux/amd64,linux/arm64
push: true
tags: |
${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:${{ steps.version.outputs.version }}
${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:latest
build-omos:
runs-on: ubuntu-latest
needs: smoke-omos
timeout-minutes: 90
container:
image: catthehacker/ubuntu:act-latest
strategy:
fail-fast: false
matrix:
platform:
- linux/amd64
- linux/arm64
steps:
- name: Checkout
uses: actions/checkout@v4
@@ -218,16 +246,32 @@ jobs:
- name: Force IPv4 for Docker Hub
run: echo 'precedence ::ffff:0:0/96 100' >> /etc/gai.conf
- name: Derive platform slug
id: platform
- name: Reclaim runner disk
run: |
# POSIX-safe slash substitution — act's runner container ships
# /bin/sh as dash, which doesn't support bash's ${VAR//a/b}.
echo "pair=$(echo '${{ matrix.platform }}' | tr / -)" >> $GITHUB_OUTPUT
set -x
df -h / || true
rm -rf \
/opt/hostedtoolcache \
/opt/microsoft \
/opt/az \
/opt/ghc \
/usr/local/.ghcup \
/usr/share/dotnet \
/usr/share/swift \
/usr/local/lib/android \
/usr/local/share/powershell \
/usr/local/share/chromium \
/usr/local/share/boost \
/usr/lib/jvm 2>/dev/null || true
apt-get clean || true
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* || true
docker builder prune -af || true
df -h / || true
- name: Set up QEMU
if: matrix.platform != 'linux/amd64'
uses: docker/setup-qemu-action@v4
with:
platforms: arm64
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v4
@@ -240,122 +284,25 @@ jobs:
username: ${{ vars.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Build and push by digest
id: build
- name: Extract version from tag
id: version
run: echo "version=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT
- name: Build and push (multi-arch)
uses: docker/build-push-action@v7
with:
context: .
platforms: ${{ matrix.platform }}
platforms: linux/amd64,linux/arm64
push: true
build-args: |
INSTALL_OMOS=true
outputs: type=image,name=${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox,push-by-digest=true,name-canonical=true,push=true
- name: Export digest
run: |
mkdir -p /tmp/digests
digest="${{ steps.build.outputs.digest }}"
touch "/tmp/digests/${digest#sha256:}"
- name: Upload digest
uses: actions/upload-artifact@v4
with:
name: digests-omos-${{ steps.platform.outputs.pair }}
path: /tmp/digests/*
if-no-files-found: error
retention-days: 1
# ── Merge per-arch digests into multi-arch tags ─────────────────────
merge-base:
runs-on: ubuntu-latest
needs: build-base
container:
image: catthehacker/ubuntu:act-latest
steps:
- name: Force IPv4 for Docker Hub
run: echo 'precedence ::ffff:0:0/96 100' >> /etc/gai.conf
- name: Download digests
uses: actions/download-artifact@v4
with:
path: /tmp/digests
pattern: digests-base-*
merge-multiple: true
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v4
with:
driver-opts: network=host
- name: Login to Docker Hub
uses: docker/login-action@v4
with:
username: ${{ vars.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Extract version from tag
id: version
run: echo "version=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT
- name: Create manifest list and push
working-directory: /tmp/digests
run: |
docker buildx imagetools create \
-t ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:${{ steps.version.outputs.version }} \
-t ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:latest \
$(printf '${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox@sha256:%s ' *)
- name: Inspect image
run: |
docker buildx imagetools inspect \
${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:${{ steps.version.outputs.version }}
merge-omos:
runs-on: ubuntu-latest
needs: build-omos
container:
image: catthehacker/ubuntu:act-latest
steps:
- name: Force IPv4 for Docker Hub
run: echo 'precedence ::ffff:0:0/96 100' >> /etc/gai.conf
- name: Download digests
uses: actions/download-artifact@v4
with:
path: /tmp/digests
pattern: digests-omos-*
merge-multiple: true
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v4
with:
driver-opts: network=host
- name: Login to Docker Hub
uses: docker/login-action@v4
with:
username: ${{ vars.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Extract version from tag
id: version
run: echo "version=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT
- name: Create manifest list and push
working-directory: /tmp/digests
run: |
docker buildx imagetools create \
-t ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:${{ steps.version.outputs.version }}-omos \
-t ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:latest-omos \
$(printf '${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox@sha256:%s ' *)
- name: Inspect image
run: |
docker buildx imagetools inspect \
tags: |
${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:${{ steps.version.outputs.version }}-omos
${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:latest-omos
update-description:
runs-on: ubuntu-latest
needs: [merge-base, merge-omos]
needs: [build-base, build-omos]
container:
image: catthehacker/ubuntu:act-latest
steps:
+5
View File
@@ -47,6 +47,11 @@ When bumping the opencode version, also bump `OPENCODE_VERSION` in `Dockerfile`
- `update-description` job runs only when both builds succeed (`needs: [build-base, build-omos]`).
- Tags must be pushed to trigger the publish workflow. The validate workflow runs on push to main and PRs.
- Smoke tests run on amd64 only (single-arch load into the local daemon). The multi-arch push happens after smoke passes.
- **Gitea Actions runner has ~40 GB disk, often 70%+ used at job start.** All four `load: true` jobs (`validate-base`, `validate-omos`, `smoke-base`, `smoke-omos`) include a `Reclaim runner disk` step that strips catthehacker-resident toolchains and prunes stale docker state before `setup-buildx-action`. Build jobs use a lighter version (push-by-digest doesn't need `docker system prune`). Don't remove these steps without testing on a fresh runner.
- **`docker/build-push-action@v7` with `platforms: linux/amd64,linux/arm64` handles multi-arch push natively in a single job** — produces a proper manifest list, no matrix or merge step needed. An earlier revision split into per-arch matrix jobs with digest artifacts, but that pattern requires `actions/{upload,download}-artifact@v4+` which Gitea Actions doesn't support (see below).
- **`actions/upload-artifact` and `actions/download-artifact` must stay at @v3 on Gitea.** v4+ uses a GitHub-Enterprise-specific Artifact API; runs fail with `GHESNotSupportedError`. If you need artifacts for a new reason (build logs, SBOMs, etc.), pin @v3 explicitly.
- **Step scripts run under `/bin/sh` (dash), not bash.** Avoid bash-isms like `${VAR//a/b}` parameter-pattern substitution; use POSIX alternatives (`tr`, `sed`) or declare `shell: bash` on the step.
- **`BUILDKIT_PROGRESS=plain`** is set at workflow level on `docker-publish.yml` so arm64-under-QEMU builds log each layer line-by-line. The default collapsed progress UI hides which step is stalled, which made diagnosing earlier hangs expensive.
## Testing changes
+13
View File
@@ -6,6 +6,19 @@ Tags follow `v{opencode_version}[letter]` — bare tag for the first build on a
---
## v1.14.31d — 2026-05-01
**CI: collapse per-arch matrix back into single multi-arch push jobs.**
- **Fix:** `v1.14.31c`'s per-arch matrix build jobs failed on `Upload digest` with `GHESNotSupportedError: @actions/artifact v2.0.0+, upload-artifact@v4+ and download-artifact@v4+ are not currently supported on GHES`. Gitea Actions only implements the v3-compatible artifact API; `@v4` uses a GitHub-Enterprise-specific backend. Separately, `build-omos linux/arm64` hung silently for 12 minutes in "Set-up job" and then failed with no log output — likely catthehacker image-pull contention between concurrent matrix children on the same runner host.
- Rather than downgrade to `actions/{upload,download}-artifact@v3`, collapsed the per-arch matrix entirely. `docker/build-push-action@v7` with `platforms: linux/amd64,linux/arm64` publishes a proper multi-arch manifest in a single job, so the whole artifact-passing and `imagetools create` merge dance existed only to support a matrix split we no longer need.
- The original matrix split was designed around `load: true` disk exhaustion (v1.14.30b). With `push-by-digest`/`push: true` streaming straight to the registry — no local unpack — the peak disk story is fundamentally different. Validated in v1.14.31b that the reclaim step gives sufficient headroom for a single-job amd64 build; oracle-reviewed call that this should extend to the combined amd64+arm64 push case.
- Workflow goes from 7 jobs to 5 (smoke-base, smoke-omos, build-base, build-omos, update-description). 263 → ~110 lines of YAML in `docker-publish.yml`.
- **Add:** `timeout-minutes: 90` on both build jobs so a hung arm64 build produces an explicit failure with logs rather than runner-default silent truncation.
- **Add:** `BUILDKIT_PROGRESS=plain` at workflow level so arm64-under-QEMU build output is line-by-line (the default collapsed progress UI was obscuring earlier stalls).
- **Add:** `AGENTS.md §CI quirks` documents the Gitea-specific traps encountered this week: `upload-artifact@v3`-only on Gitea, `/bin/sh` is dash, `build-push-action@v7` does multi-arch natively with comma-separated platforms, reclaim step is mandatory on `load: true` jobs.
- No image changes. Rebuild of v1.14.31 content only.
## v1.14.31c — 2026-05-01
**CI: fix bash-specific parameter expansion and bump omos size threshold.**