From fc74a8f90654c8d278cef02175f844014b7e2f37 Mon Sep 17 00:00:00 2001
From: Joakim Persson
Date: Fri, 1 May 2026 12:28:34 +0000
Subject: [PATCH] Collapse per-arch matrix back into single multi-arch push jobs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

v1.14.31c's matrix jobs failed on Upload digest with GHESNotSupportedError because Gitea Actions doesn't support actions/upload-artifact@v4+. Separately, build-omos arm64 hung silently for 12 min in Set-up job, likely due to catthehacker image-pull contention between concurrent matrix children.

Rather than downgrade artifacts to @v3, collapse the matrix entirely. docker/build-push-action@v7 with platforms: linux/amd64,linux/arm64 publishes a proper multi-arch manifest in one job, so the artifact-passing and imagetools create merge dance only existed to support a matrix split we no longer need.

The matrix was designed around load: true disk exhaustion (v1.14.30b), but a registry push (push-by-digest or push: true) streams straight to the registry and has a fundamentally different disk profile. The reclaim step gives enough headroom for the combined amd64+arm64 push case.

Workflow: 7 jobs → 5. docker-publish.yml: 263 → ~110 lines of YAML.

Also: - timeout-minutes: 90 on build jobs so hung builds fail explicitly - BUILDKIT_PROGRESS=plain at workflow level for line-by-line arm64 logs - AGENTS.md §CI quirks documents the Gitea-specific traps (upload-artifact@v3-only, dash-not-bash, build-push-action@v7 multi-arch convention, reclaim requirement) --- .gitea/workflows/docker-publish.yml | 265 +++++++++++----------------- AGENTS.md | 5 + CHANGELOG.md | 13 ++ 3 files changed, 124 insertions(+), 159 deletions(-) diff --git a/.gitea/workflows/docker-publish.yml b/.gitea/workflows/docker-publish.yml index 6f1aefe..27eddd0 100644 --- a/.gitea/workflows/docker-publish.yml +++ b/.gitea/workflows/docker-publish.yml @@ -6,7 +6,7 @@ on: - 'v*' # Serialize concurrent runs of the same workflow on the same ref so the -# matrix build jobs can't race `docker system prune` in the smoke gates +# build jobs can't race `docker system prune` in the smoke gates # (pruning from one job can nuke another job's in-flight buildx cache). # cancel-in-progress: false — tag pushes are release events, we never # want to silently drop one. @@ -14,16 +14,43 @@ concurrency: group: ${{ github.workflow }}-${{ github.ref }} cancel-in-progress: false +# Plain progress output from BuildKit — critical for diagnosing stalls +# inside arm64-under-QEMU builds where the default collapsed progress UI +# hides which step is stuck. +env: + BUILDKIT_PROGRESS: plain + # Runner disk pressure notes: # Gitea Actions runners use `catthehacker/ubuntu:act-latest` on a shared host -# with limited overlay space (~40 GB, often 70%+ used at start). Building both -# architectures of both variants on a single runner exhausted disk around the -# nodejs dpkg unpack / git-lfs layer export. 
To fix this: -# * smoke test (amd64 only, load into daemon) runs on its own runner -# * each push target (variant × arch) runs on its own runner, pushes by -# digest (no local image store), uploads digest as an artifact -# * a merge job composes the multi-arch manifest with `imagetools create` -# Per-runner disk pressure is now one-quarter of the old single-job peak. +# with limited overlay space (~40 GB, often 70%+ used at start). Two jobs +# per variant: +# * smoke gate (amd64 only, `load: true` into local dockerd for smoke +# testing) — peak disk = tarball + unpacked image + buildx cache. The +# `Reclaim runner disk` step below strips catthehacker-resident +# toolchains and prunes stale docker state before buildx starts. +# * build job (amd64 + arm64, `push-by-digest` streaming directly to +# Docker Hub, no local unpack). Peak disk on push-by-digest is +# BuildKit's content store only — much smaller than `load: true`. +# `docker/build-push-action@v7` with comma-separated platforms +# publishes a proper multi-arch manifest in one step. +# +# Why not matrix + digest artifacts? +# An earlier revision split each arch into its own matrix job and used +# `actions/upload-artifact` to pass digests to a merge job. On Gitea +# Actions, `actions/{upload,download}-artifact@v4+` fails with +# `GHESNotSupportedError` — v4 relies on a GitHub-specific Artifact +# API that Gitea doesn't implement. Rather than downgrade to @v3 (the +# last Gitea-compatible release) we collapsed back to single-job +# multi-arch push. The matrix only helps when the build literally +# cannot fit on one runner, which push-by-digest + reclaim no longer +# hits for this image. +# +# Gitea Actions gotchas baked into this file: +# * `actions/{upload,download}-artifact` must stay at @v3 on Gitea. +# * Step scripts run under /bin/sh (dash) — no bash-isms like +# ${VAR//a/b}. Use `tr` or explicit `shell: bash`. 
+# * `docker/build-push-action@v7` with `platforms: a,b` works for +# multi-arch push natively; no matrix/merge dance needed. jobs: # ── Smoke test (amd64 only, gates the push jobs) ──────────────────── @@ -137,18 +164,13 @@ jobs: - name: Smoke test (amd64) run: bash scripts/smoke-test.sh opencode-devbox:smoke-omos --variant omos - # ── Per-arch push (by digest, no local image) ─────────────────────── + # ── Multi-arch push (single job per variant, comma-separated platforms) ─ build-base: runs-on: ubuntu-latest needs: smoke-base + timeout-minutes: 90 container: image: catthehacker/ubuntu:act-latest - strategy: - fail-fast: false - matrix: - platform: - - linux/amd64 - - linux/arm64 steps: - name: Checkout uses: actions/checkout@v4 @@ -156,16 +178,35 @@ jobs: - name: Force IPv4 for Docker Hub run: echo 'precedence ::ffff:0:0/96 100' >> /etc/gai.conf - - name: Derive platform slug - id: platform + # Lighter reclaim than the smoke-gate version: push-by-digest + # doesn't write to host dockerd, so `docker system prune` adds + # little. BuildKit cache from prior runs is the thing to clear. + - name: Reclaim runner disk run: | - # POSIX-safe slash substitution — act's runner container ships - # /bin/sh as dash, which doesn't support bash's ${VAR//a/b}. 
- echo "pair=$(echo '${{ matrix.platform }}' | tr / -)" >> $GITHUB_OUTPUT + set -x + df -h / || true + rm -rf \ + /opt/hostedtoolcache \ + /opt/microsoft \ + /opt/az \ + /opt/ghc \ + /usr/local/.ghcup \ + /usr/share/dotnet \ + /usr/share/swift \ + /usr/local/lib/android \ + /usr/local/share/powershell \ + /usr/local/share/chromium \ + /usr/local/share/boost \ + /usr/lib/jvm 2>/dev/null || true + apt-get clean || true + rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* || true + docker builder prune -af || true + df -h / || true - name: Set up QEMU - if: matrix.platform != 'linux/amd64' uses: docker/setup-qemu-action@v4 + with: + platforms: arm64 - name: Set up Docker Buildx uses: docker/setup-buildx-action@v4 @@ -178,39 +219,26 @@ jobs: username: ${{ vars.DOCKERHUB_USERNAME }} password: ${{ secrets.DOCKERHUB_TOKEN }} - - name: Build and push by digest - id: build + - name: Extract version from tag + id: version + run: echo "version=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT + + - name: Build and push (multi-arch) uses: docker/build-push-action@v7 with: context: . 
- platforms: ${{ matrix.platform }} - outputs: type=image,name=${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox,push-by-digest=true,name-canonical=true,push=true - - - name: Export digest - run: | - mkdir -p /tmp/digests - digest="${{ steps.build.outputs.digest }}" - touch "/tmp/digests/${digest#sha256:}" - - - name: Upload digest - uses: actions/upload-artifact@v4 - with: - name: digests-base-${{ steps.platform.outputs.pair }} - path: /tmp/digests/* - if-no-files-found: error - retention-days: 1 + platforms: linux/amd64,linux/arm64 + push: true + tags: | + ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:${{ steps.version.outputs.version }} + ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:latest build-omos: runs-on: ubuntu-latest needs: smoke-omos + timeout-minutes: 90 container: image: catthehacker/ubuntu:act-latest - strategy: - fail-fast: false - matrix: - platform: - - linux/amd64 - - linux/arm64 steps: - name: Checkout uses: actions/checkout@v4 @@ -218,16 +246,32 @@ jobs: - name: Force IPv4 for Docker Hub run: echo 'precedence ::ffff:0:0/96 100' >> /etc/gai.conf - - name: Derive platform slug - id: platform + - name: Reclaim runner disk run: | - # POSIX-safe slash substitution — act's runner container ships - # /bin/sh as dash, which doesn't support bash's ${VAR//a/b}. 
- echo "pair=$(echo '${{ matrix.platform }}' | tr / -)" >> $GITHUB_OUTPUT + set -x + df -h / || true + rm -rf \ + /opt/hostedtoolcache \ + /opt/microsoft \ + /opt/az \ + /opt/ghc \ + /usr/local/.ghcup \ + /usr/share/dotnet \ + /usr/share/swift \ + /usr/local/lib/android \ + /usr/local/share/powershell \ + /usr/local/share/chromium \ + /usr/local/share/boost \ + /usr/lib/jvm 2>/dev/null || true + apt-get clean || true + rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* || true + docker builder prune -af || true + df -h / || true - name: Set up QEMU - if: matrix.platform != 'linux/amd64' uses: docker/setup-qemu-action@v4 + with: + platforms: arm64 - name: Set up Docker Buildx uses: docker/setup-buildx-action@v4 @@ -240,122 +284,25 @@ jobs: username: ${{ vars.DOCKERHUB_USERNAME }} password: ${{ secrets.DOCKERHUB_TOKEN }} - - name: Build and push by digest - id: build + - name: Extract version from tag + id: version + run: echo "version=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT + + - name: Build and push (multi-arch) uses: docker/build-push-action@v7 with: context: . 
- platforms: ${{ matrix.platform }} + platforms: linux/amd64,linux/arm64 + push: true build-args: | INSTALL_OMOS=true - outputs: type=image,name=${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox,push-by-digest=true,name-canonical=true,push=true - - - name: Export digest - run: | - mkdir -p /tmp/digests - digest="${{ steps.build.outputs.digest }}" - touch "/tmp/digests/${digest#sha256:}" - - - name: Upload digest - uses: actions/upload-artifact@v4 - with: - name: digests-omos-${{ steps.platform.outputs.pair }} - path: /tmp/digests/* - if-no-files-found: error - retention-days: 1 - - # ── Merge per-arch digests into multi-arch tags ───────────────────── - merge-base: - runs-on: ubuntu-latest - needs: build-base - container: - image: catthehacker/ubuntu:act-latest - steps: - - name: Force IPv4 for Docker Hub - run: echo 'precedence ::ffff:0:0/96 100' >> /etc/gai.conf - - - name: Download digests - uses: actions/download-artifact@v4 - with: - path: /tmp/digests - pattern: digests-base-* - merge-multiple: true - - - name: Set up Docker Buildx - uses: docker/setup-buildx-action@v4 - with: - driver-opts: network=host - - - name: Login to Docker Hub - uses: docker/login-action@v4 - with: - username: ${{ vars.DOCKERHUB_USERNAME }} - password: ${{ secrets.DOCKERHUB_TOKEN }} - - - name: Extract version from tag - id: version - run: echo "version=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT - - - name: Create manifest list and push - working-directory: /tmp/digests - run: | - docker buildx imagetools create \ - -t ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:${{ steps.version.outputs.version }} \ - -t ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:latest \ - $(printf '${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox@sha256:%s ' *) - - - name: Inspect image - run: | - docker buildx imagetools inspect \ - ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:${{ steps.version.outputs.version }} - - merge-omos: - runs-on: ubuntu-latest - needs: build-omos - container: - image: 
catthehacker/ubuntu:act-latest - steps: - - name: Force IPv4 for Docker Hub - run: echo 'precedence ::ffff:0:0/96 100' >> /etc/gai.conf - - - name: Download digests - uses: actions/download-artifact@v4 - with: - path: /tmp/digests - pattern: digests-omos-* - merge-multiple: true - - - name: Set up Docker Buildx - uses: docker/setup-buildx-action@v4 - with: - driver-opts: network=host - - - name: Login to Docker Hub - uses: docker/login-action@v4 - with: - username: ${{ vars.DOCKERHUB_USERNAME }} - password: ${{ secrets.DOCKERHUB_TOKEN }} - - - name: Extract version from tag - id: version - run: echo "version=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT - - - name: Create manifest list and push - working-directory: /tmp/digests - run: | - docker buildx imagetools create \ - -t ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:${{ steps.version.outputs.version }}-omos \ - -t ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:latest-omos \ - $(printf '${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox@sha256:%s ' *) - - - name: Inspect image - run: | - docker buildx imagetools inspect \ + tags: | ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:${{ steps.version.outputs.version }}-omos + ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:latest-omos update-description: runs-on: ubuntu-latest - needs: [merge-base, merge-omos] + needs: [build-base, build-omos] container: image: catthehacker/ubuntu:act-latest steps: diff --git a/AGENTS.md b/AGENTS.md index cd7ad55..d4659f0 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -47,6 +47,11 @@ When bumping the opencode version, also bump `OPENCODE_VERSION` in `Dockerfile` - `update-description` job runs only when both builds succeed (`needs: [build-base, build-omos]`). - Tags must be pushed to trigger the publish workflow. The validate workflow runs on push to main and PRs. - Smoke tests run on amd64 only (single-arch load into the local daemon). The multi-arch push happens after smoke passes. 
+- **Gitea Actions runner has ~40 GB disk, often 70%+ used at job start.** All four `load: true` jobs (`validate-base`, `validate-omos`, `smoke-base`, `smoke-omos`) include a `Reclaim runner disk` step that strips catthehacker-resident toolchains and prunes stale docker state before `setup-buildx-action`. Build jobs use a lighter version (a registry push doesn't write to the host dockerd, so `docker system prune` adds little; only stale BuildKit cache is pruned). Don't remove these steps without testing on a fresh runner.
+- **`docker/build-push-action@v7` with `platforms: linux/amd64,linux/arm64` handles multi-arch push natively in a single job.** It produces a proper manifest list, so no matrix or merge step is needed. An earlier revision split into per-arch matrix jobs with digest artifacts, but that pattern relied on `actions/{upload,download}-artifact@v4+`, which Gitea Actions doesn't support (see below).
+- **`actions/upload-artifact` and `actions/download-artifact` must stay at @v3 on Gitea.** v4+ relies on a GitHub-specific Artifact API that Gitea doesn't implement; runs fail with `GHESNotSupportedError`. If you need artifacts for a new reason (build logs, SBOMs, etc.), pin @v3 explicitly.
+- **Step scripts run under `/bin/sh` (dash), not bash.** Avoid bash-isms like `${VAR//a/b}` parameter-pattern substitution; use POSIX alternatives (`tr`, `sed`) or declare `shell: bash` on the step.
+- **`BUILDKIT_PROGRESS=plain`** is set at workflow level on `docker-publish.yml` so arm64-under-QEMU builds log each layer line-by-line. The default collapsed progress UI hides which step is stalled, which made diagnosing earlier hangs expensive.
 
 ## Testing changes
 
diff --git a/CHANGELOG.md b/CHANGELOG.md
index c051607..e6dcc3d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,19 @@ Tags follow `v{opencode_version}[letter]` — bare tag for the first build on a
 
 ---
 
+## v1.14.31d — 2026-05-01
+
+**CI: collapse per-arch matrix back into single multi-arch push jobs.**
+
+- **Fix:** `v1.14.31c`'s per-arch matrix build jobs failed on `Upload digest` with `GHESNotSupportedError: @actions/artifact v2.0.0+, upload-artifact@v4+ and download-artifact@v4+ are not currently supported on GHES`. Gitea Actions only implements the v3-compatible artifact API; `@v4` relies on a GitHub-specific backend. Separately, `build-omos linux/arm64` hung silently for 12 minutes in "Set-up job" and then failed with no log output; the likely cause is catthehacker image-pull contention between concurrent matrix children on the same runner host.
+  - Rather than downgrade to `actions/{upload,download}-artifact@v3`, collapsed the per-arch matrix entirely. `docker/build-push-action@v7` with `platforms: linux/amd64,linux/arm64` publishes a proper multi-arch manifest in a single job, so the whole artifact-passing and `imagetools create` merge dance existed only to support a matrix split we no longer need.
+  - The original matrix split was designed around `load: true` disk exhaustion (v1.14.30b). With `push-by-digest`/`push: true` streaming straight to the registry (no local unpack), the peak disk profile is fundamentally different. v1.14.31b validated that the reclaim step gives sufficient headroom for a single-job amd64 build; per an oracle-reviewed call, this should extend to the combined amd64+arm64 push case.
+  - Workflow goes from 7 jobs to 5 (smoke-base, smoke-omos, build-base, build-omos, update-description). 263 → ~110 lines of YAML in `docker-publish.yml`.
+- **Add:** `timeout-minutes: 90` on both build jobs so a hung arm64 build produces an explicit failure with logs rather than runner-default silent truncation.
+- **Add:** `BUILDKIT_PROGRESS=plain` at workflow level so arm64-under-QEMU build output is line-by-line (the default collapsed progress UI was obscuring earlier stalls).
+- **Add:** `AGENTS.md §CI quirks` documents the Gitea-specific traps encountered this week: `upload-artifact@v3`-only on Gitea, `/bin/sh` is dash, `build-push-action@v7` does multi-arch natively with comma-separated platforms, reclaim step is mandatory on `load: true` jobs.
+- No image changes. Rebuild of v1.14.31 content only.
+
 ## v1.14.31c — 2026-05-01
 
 **CI: fix bash-specific parameter expansion and bump omos size threshold.**