Collapse per-arch matrix back into single multi-arch push jobs

v1.14.31c's matrix jobs failed on Upload digest with GHESNotSupportedError: Gitea Actions doesn't support actions/upload-artifact@v4+. Separately, build-omos arm64 hung silently for 12 min in Set-up job, likely from contention between concurrent matrix children pulling the catthehacker image.

Rather than downgrade the artifact actions to @v3, collapse the matrix entirely. docker/build-push-action@v7 with platforms: linux/amd64,linux/arm64 publishes a proper multi-arch manifest in one job. The artifact passing and the imagetools create merge step existed only to support a matrix split we no longer need.

The matrix was designed around load: true disk exhaustion (v1.14.30b). A push instead streams layers straight to the registry with no local unpack, which gives a fundamentally different disk profile. The reclaim step was validated for a single-job amd64 build (v1.14.31b) and is expected to give enough headroom for the combined amd64+arm64 push.

Workflow: 7 jobs → 5. docker-publish.yml: 263 → ~110 lines of YAML.

Also:
- timeout-minutes: 90 on build jobs so hung builds fail explicitly
- BUILDKIT_PROGRESS=plain at workflow level for line-by-line arm64 logs
- AGENTS.md §CI quirks documents the Gitea-specific traps (upload-artifact@v3 only, dash not bash, build-push-action@v7 multi-arch convention, reclaim requirement)
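The dash-not-bash trap is easy to reintroduce when editing step scripts. The following is a minimal sketch of the POSIX-safe forms, runnable under `/bin/sh`. `platform_slug` and `tag_version` are hypothetical helper names, and the sample platform strings are illustrative rather than taken from the workflow.

```sh
#!/bin/sh
# Sketch: dash-safe rewrites of string operations used in step scripts.
# platform_slug / tag_version are hypothetical helpers for illustration.
set -eu

# bash-only form that dash rejects with "Bad substitution":
#   SLUG=${PLATFORM_PAIR//\//-}
# POSIX form: let tr replace every '/' with '-'.
platform_slug() {
  printf '%s\n' "$1" | tr / -
}

# Prefix removal with ${VAR#pattern} is POSIX, so this form is already
# safe under dash (it is what "Extract version from tag" uses).
tag_version() {
  printf '%s\n' "${1#refs/tags/}"
}

platform_slug linux/arm64        # prints linux-arm64
platform_slug linux/arm/v7       # prints linux-arm-v7
tag_version refs/tags/v1.14.31d  # prints v1.14.31d
```

The `tr` form costs one extra process, which does not matter in a one-off CI step. The other option AGENTS.md allows is declaring `shell: bash` on the step.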
Fix dash-incompatible slash substitution and bump omos size threshold
2026-05-01 12:28:34 +00:00 · 2026-05-01 10:43:04 +00:00 · 2026-05-01 09:34:52 +00:00 · 2026-05-01 08:43:08 +00:00 · 2026-04-30 20:56:58 +00:00
8 changed files with 422 additions and 54 deletions
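The omos threshold bump is plain integer arithmetic on the byte count that the smoke test reads from `docker image inspect`. Below is a minimal sketch of the same gate, assuming the thresholds shown in the diff (base 2500 MB, omos 3200 MB). `size_gate` is a hypothetical helper that takes the byte count as an argument so it can run without Docker.

```sh
#!/bin/sh
# Sketch of the smoke-test image-size guardrail (base 2500 MB, omos 3200 MB).
# size_gate is a hypothetical helper; the real script reads SIZE_BYTES
# from `docker image inspect` instead of taking it as an argument.
set -eu

size_gate() {
  variant=$1
  size_bytes=$2
  # Integer division truncates, matching $((SIZE_BYTES / 1024 / 1024)).
  size_mb=$((size_bytes / 1024 / 1024))
  threshold=2500
  if [ "$variant" = "omos" ]; then
    threshold=3200
  fi
  if [ "$size_mb" -gt "$threshold" ]; then
    echo "FAIL: ${variant} is ${size_mb} MB (limit ${threshold} MB)"
    return 1
  fi
  echo "OK: ${variant} is ${size_mb} MB (limit ${threshold} MB)"
}

# 3107 MB is the omos size measured on v1.14.31b: it fails the old
# 3000 MB limit but passes the new 3200 MB one.
size_gate omos $((3107 * 1024 * 1024))   # prints OK: omos is 3107 MB (limit 3200 MB)
```

The script's "MB" is really MiB (1024 × 1024 bytes). Because the division truncates, an image must reach the threshold plus one full MiB before the gate trips.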
@@ -5,8 +5,56 @@ on:
    tags:
      - 'v*'
 # Serialize concurrent runs of the same workflow on the same ref so the
 # build jobs can't race `docker system prune` in the smoke gates
 # (pruning from one job can nuke another job's in-flight buildx cache).
 # cancel-in-progress: false — tag pushes are release events, we never
 # want to silently drop one.
 concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: false
 # Plain progress output from BuildKit — critical for diagnosing stalls
 # inside arm64-under-QEMU builds where the default collapsed progress UI
 # hides which step is stuck.
 env:
  BUILDKIT_PROGRESS: plain
 # Runner disk pressure notes:
 # Gitea Actions runners use `catthehacker/ubuntu:act-latest` on a shared host
 # with limited overlay space (~40 GB, often 70%+ used at start). Two jobs
 # per variant:
 #   * smoke gate (amd64 only, `load: true` into local dockerd for smoke
 #     testing) — peak disk = tarball + unpacked image + buildx cache. The
 #     `Reclaim runner disk` step below strips catthehacker-resident
 #     toolchains and prunes stale docker state before buildx starts.
 #   * build job (amd64 + arm64, `push-by-digest` streaming directly to
 #     Docker Hub, no local unpack). Peak disk on push-by-digest is
 #     BuildKit's content store only — much smaller than `load: true`.
 #     `docker/build-push-action@v7` with comma-separated platforms
 #     publishes a proper multi-arch manifest in one step.
 #
 # Why not matrix + digest artifacts?
 #   An earlier revision split each arch into its own matrix job and used
 #   `actions/upload-artifact` to pass digests to a merge job. On Gitea
 #   Actions, `actions/{upload,download}-artifact@v4+` fails with
 #   `GHESNotSupportedError` — v4 relies on a GitHub-specific Artifact
 #   API that Gitea doesn't implement. Rather than downgrade to @v3 (the
 #   last Gitea-compatible release) we collapsed back to single-job
 #   multi-arch push. The matrix only helps when the build literally
 #   cannot fit on one runner, which push-by-digest + reclaim no longer
 #   hits for this image.
 #
 # Gitea Actions gotchas baked into this file:
 #   * `actions/{upload,download}-artifact` must stay at @v3 on Gitea.
 #   * Step scripts run under /bin/sh (dash) — no bash-isms like
 #     ${VAR//a/b}. Use `tr` or explicit `shell: bash`.
 #   * `docker/build-push-action@v7` with `platforms: a,b` works for
 #     multi-arch push natively; no matrix/merge dance needed.
 jobs:
-  build-base:
+  # ── Smoke test (amd64 only, gates the push jobs) ────────────────────
  smoke-base:
    runs-on: ubuntu-latest
    container:
      image: catthehacker/ubuntu:act-latest
@@ -15,30 +63,41 @@ jobs:
        uses: actions/checkout@v4
      - name: Force IPv4 for Docker Hub
-        run: |
+        run: echo 'precedence ::ffff:0:0/96  100' >> /etc/gai.conf
          # Prefer IPv4 to avoid intermittent IPv6 connectivity failures
          echo 'precedence ::ffff:0:0/96  100' >> /etc/gai.conf
-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v4
+      # See docker-publish.yml preamble. `load: true` peak disk = tarball
+      # + unpacked image + buildx cache; the image now crosses the 40 GB
      # runner overlay's starting headroom. Strip catthehacker-resident
      # toolchains and any stale docker state up front.
      - name: Reclaim runner disk
        run: |
          set -x
          df -h / || true
          rm -rf \
            /opt/hostedtoolcache \
            /opt/microsoft \
            /opt/az \
            /opt/ghc \
            /usr/local/.ghcup \
            /usr/share/dotnet \
            /usr/share/swift \
            /usr/local/lib/android \
            /usr/local/share/powershell \
            /usr/local/share/chromium \
            /usr/local/share/boost \
            /usr/lib/jvm 2>/dev/null || true
          apt-get clean || true
          rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* || true
          docker system df || true
          docker system prune -af --volumes || true
          docker builder prune -af || true
          df -h / || true
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v4
        with:
          driver-opts: network=host
      - name: Login to Docker Hub
        uses: docker/login-action@v4
        with:
          username: ${{ vars.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Extract version from tag
        id: version
        run: |
          VERSION=${GITHUB_REF#refs/tags/}
          echo "version=${VERSION}" >> $GITHUB_OUTPUT
      - name: Build and load amd64 image for smoke test
        uses: docker/build-push-action@v7
        with:
@@ -49,20 +108,9 @@ jobs:
          tags: opencode-devbox:smoke-base
      - name: Smoke test (amd64)
-        run: |
+        run: bash scripts/smoke-test.sh opencode-devbox:smoke-base --variant base
          bash scripts/smoke-test.sh opencode-devbox:smoke-base --variant base
-      - name: Build and push (base)
+  smoke-omos:
        uses: docker/build-push-action@v7
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: |
            ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:${{ steps.version.outputs.version }}
            ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:latest
  build-omos:
    runs-on: ubuntu-latest
    container:
      image: catthehacker/ubuntu:act-latest
@@ -71,30 +119,37 @@ jobs:
        uses: actions/checkout@v4
      - name: Force IPv4 for Docker Hub
-        run: |
+        run: echo 'precedence ::ffff:0:0/96  100' >> /etc/gai.conf
          # Prefer IPv4 to avoid intermittent IPv6 connectivity failures
          echo 'precedence ::ffff:0:0/96  100' >> /etc/gai.conf
-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v4
+      - name: Reclaim runner disk
+        run: |
          set -x
          df -h / || true
          rm -rf \
            /opt/hostedtoolcache \
            /opt/microsoft \
            /opt/az \
            /opt/ghc \
            /usr/local/.ghcup \
            /usr/share/dotnet \
            /usr/share/swift \
            /usr/local/lib/android \
            /usr/local/share/powershell \
            /usr/local/share/chromium \
            /usr/local/share/boost \
            /usr/lib/jvm 2>/dev/null || true
          apt-get clean || true
          rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* || true
          docker system df || true
          docker system prune -af --volumes || true
          docker builder prune -af || true
          df -h / || true
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v4
        with:
          driver-opts: network=host
      - name: Login to Docker Hub
        uses: docker/login-action@v4
        with:
          username: ${{ vars.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Extract version from tag
        id: version
        run: |
          VERSION=${GITHUB_REF#refs/tags/}
          echo "version=${VERSION}" >> $GITHUB_OUTPUT
      - name: Build and load amd64 image for smoke test
        uses: docker/build-push-action@v7
        with:
@@ -107,10 +162,133 @@ jobs:
          tags: opencode-devbox:smoke-omos
      - name: Smoke test (amd64)
-        run: |
+        run: bash scripts/smoke-test.sh opencode-devbox:smoke-omos --variant omos
          bash scripts/smoke-test.sh opencode-devbox:smoke-omos --variant omos
-      - name: Build and push (omos)
+  # ── Multi-arch push (single job per variant, comma-separated platforms) ─
  build-base:
    runs-on: ubuntu-latest
    needs: smoke-base
    timeout-minutes: 90
    container:
      image: catthehacker/ubuntu:act-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Force IPv4 for Docker Hub
        run: echo 'precedence ::ffff:0:0/96  100' >> /etc/gai.conf
      # Lighter reclaim than the smoke-gate version: push-by-digest
      # doesn't write to host dockerd, so `docker system prune` adds
      # little. BuildKit cache from prior runs is the thing to clear.
      - name: Reclaim runner disk
        run: |
          set -x
          df -h / || true
          rm -rf \
            /opt/hostedtoolcache \
            /opt/microsoft \
            /opt/az \
            /opt/ghc \
            /usr/local/.ghcup \
            /usr/share/dotnet \
            /usr/share/swift \
            /usr/local/lib/android \
            /usr/local/share/powershell \
            /usr/local/share/chromium \
            /usr/local/share/boost \
            /usr/lib/jvm 2>/dev/null || true
          apt-get clean || true
          rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* || true
          docker builder prune -af || true
          df -h / || true
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v4
        with:
          platforms: arm64
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v4
        with:
          driver-opts: network=host
      - name: Login to Docker Hub
        uses: docker/login-action@v4
        with:
          username: ${{ vars.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Extract version from tag
        id: version
        run: echo "version=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT
      - name: Build and push (multi-arch)
        uses: docker/build-push-action@v7
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: |
            ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:${{ steps.version.outputs.version }}
            ${{ vars.DOCKERHUB_USERNAME }}/opencode-devbox:latest
  build-omos:
    runs-on: ubuntu-latest
    needs: smoke-omos
    timeout-minutes: 90
    container:
      image: catthehacker/ubuntu:act-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Force IPv4 for Docker Hub
        run: echo 'precedence ::ffff:0:0/96  100' >> /etc/gai.conf
      - name: Reclaim runner disk
        run: |
          set -x
          df -h / || true
          rm -rf \
            /opt/hostedtoolcache \
            /opt/microsoft \
            /opt/az \
            /opt/ghc \
            /usr/local/.ghcup \
            /usr/share/dotnet \
            /usr/share/swift \
            /usr/local/lib/android \
            /usr/local/share/powershell \
            /usr/local/share/chromium \
            /usr/local/share/boost \
            /usr/lib/jvm 2>/dev/null || true
          apt-get clean || true
          rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* || true
          docker builder prune -af || true
          df -h / || true
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v4
        with:
          platforms: arm64
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v4
        with:
          driver-opts: network=host
      - name: Login to Docker Hub
        uses: docker/login-action@v4
        with:
          username: ${{ vars.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Extract version from tag
        id: version
        run: echo "version=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT
      - name: Build and push (multi-arch)
        uses: docker/build-push-action@v7
        with:
          context: .
@@ -46,6 +46,34 @@ jobs:
        run: |
          echo 'precedence ::ffff:0:0/96  100' >> /etc/gai.conf
      # The runner's overlay disk starts ~70% full. `load: true` peak disk
      # is tarball + unpacked image + buildx cache, which tips it over
      # once the image crosses ~3 GB. Strip catthehacker-resident
      # toolchains we never use and any stale docker state up front.
      - name: Reclaim runner disk
        run: |
          set -x
          df -h / || true
          rm -rf \
            /opt/hostedtoolcache \
            /opt/microsoft \
            /opt/az \
            /opt/ghc \
            /usr/local/.ghcup \
            /usr/share/dotnet \
            /usr/share/swift \
            /usr/local/lib/android \
            /usr/local/share/powershell \
            /usr/local/share/chromium \
            /usr/local/share/boost \
            /usr/lib/jvm 2>/dev/null || true
          apt-get clean || true
          rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* || true
          docker system df || true
          docker system prune -af --volumes || true
          docker builder prune -af || true
          df -h / || true
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v4
        with:
@@ -76,6 +104,30 @@ jobs:
        run: |
          echo 'precedence ::ffff:0:0/96  100' >> /etc/gai.conf
      - name: Reclaim runner disk
        run: |
          set -x
          df -h / || true
          rm -rf \
            /opt/hostedtoolcache \
            /opt/microsoft \
            /opt/az \
            /opt/ghc \
            /usr/local/.ghcup \
            /usr/share/dotnet \
            /usr/share/swift \
            /usr/local/lib/android \
            /usr/local/share/powershell \
            /usr/local/share/chromium \
            /usr/local/share/boost \
            /usr/lib/jvm 2>/dev/null || true
          apt-get clean || true
          rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* || true
          docker system df || true
          docker system prune -af --volumes || true
          docker builder prune -af || true
          df -h / || true
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v4
        with:
@@ -47,6 +47,11 @@ When bumping the opencode version, also bump `OPENCODE_VERSION` in `Dockerfile`
 - `update-description` job runs only when both builds succeed (`needs: [build-base, build-omos]`).
 - Tags must be pushed to trigger the publish workflow. The validate workflow runs on push to main and PRs.
 - Smoke tests run on amd64 only (single-arch load into the local daemon). The multi-arch push happens after smoke passes.
 - **Gitea Actions runner has ~40 GB disk, often 70%+ used at job start.** All four `load: true` jobs (`validate-base`, `validate-omos`, `smoke-base`, `smoke-omos`) include a `Reclaim runner disk` step that strips catthehacker-resident toolchains and prunes stale docker state before `setup-buildx-action`. Build jobs use a lighter version (push-by-digest doesn't need `docker system prune`). Don't remove these steps without testing on a fresh runner.
 - **`docker/build-push-action@v7` with `platforms: linux/amd64,linux/arm64` handles multi-arch push natively in a single job** — produces a proper manifest list, no matrix or merge step needed. An earlier revision split into per-arch matrix jobs with digest artifacts, but that pattern requires `actions/{upload,download}-artifact@v4+` which Gitea Actions doesn't support (see below).
 - **`actions/upload-artifact` and `actions/download-artifact` must stay at @v3 on Gitea.** v4+ depends on a GitHub.com-only Artifact API, so on Gitea runs fail with `GHESNotSupportedError`. If you need artifacts for a new reason (build logs, SBOMs, etc.), pin @v3 explicitly.
 - **Step scripts run under `/bin/sh` (dash), not bash.** Avoid bash-isms like `${VAR//a/b}` parameter-pattern substitution; use POSIX alternatives (`tr`, `sed`) or declare `shell: bash` on the step.
 - **`BUILDKIT_PROGRESS=plain`** is set at workflow level on `docker-publish.yml` so arm64-under-QEMU builds log each layer line-by-line. The default collapsed progress UI hides which step is stalled, which made diagnosing earlier hangs expensive.
 ## Testing changes
@@ -6,6 +6,61 @@ Tags follow `v{opencode_version}[letter]` — bare tag for the first build on a
 ---
 ## v1.14.31d — 2026-05-01
 **CI: collapse per-arch matrix back into single multi-arch push jobs.**
 - **Fix:** `v1.14.31c`'s per-arch matrix build jobs failed on `Upload digest` with `GHESNotSupportedError: @actions/artifact v2.0.0+, upload-artifact@v4+ and download-artifact@v4+ are not currently supported on GHES`. Gitea Actions only implements the v3-compatible artifact API; `@v4` depends on a GitHub.com-only artifact backend. Separately, `build-omos linux/arm64` hung silently for 12 minutes in "Set-up job" and then failed with no log output, likely because of catthehacker image-pull contention between concurrent matrix children on the same runner host.
  - Rather than downgrade to `actions/{upload,download}-artifact@v3`, collapsed the per-arch matrix entirely. `docker/build-push-action@v7` with `platforms: linux/amd64,linux/arm64` publishes a proper multi-arch manifest in a single job, so the whole artifact-passing and `imagetools create` merge dance existed only to support a matrix split we no longer need.
  - The original matrix split was designed around `load: true` disk exhaustion (v1.14.30b). With `push-by-digest`/`push: true` streaming straight to the registry (no local unpack), the peak disk story is fundamentally different. v1.14.31b validated that the reclaim step gives sufficient headroom for a single-job amd64 build. Extending that to the combined amd64+arm64 push case is an oracle-reviewed judgment call, not yet a validated result.
  - Workflow goes from 7 jobs to 5 (smoke-base, smoke-omos, build-base, build-omos, update-description). 263 → ~110 lines of YAML in `docker-publish.yml`.
 - **Add:** `timeout-minutes: 90` on both build jobs so a hung arm64 build produces an explicit failure with logs rather than runner-default silent truncation.
 - **Add:** `BUILDKIT_PROGRESS=plain` at workflow level so arm64-under-QEMU build output is line-by-line (the default collapsed progress UI was obscuring earlier stalls).
 - **Add:** `AGENTS.md §CI quirks` documents the Gitea-specific traps encountered this week: `upload-artifact@v3`-only on Gitea, `/bin/sh` is dash, `build-push-action@v7` does multi-arch natively with comma-separated platforms, reclaim step is mandatory on `load: true` jobs.
 - No image changes. Rebuild of v1.14.31 content only.
 ## v1.14.31c — 2026-05-01
 **CI: fix bash-specific parameter expansion and bump omos size threshold.**
 - **Fix:** The `Derive platform slug` step in the per-arch matrix build jobs (`build-base`, `build-omos`) used `${PLATFORM_PAIR//\//-}`, a bash-only pattern substitution. The runner container executes step scripts via `/bin/sh` (dash), which rejected it with `Bad substitution`. Rewrote it using `tr / -`, which is POSIX and produces the same result. Both `build-base` and `build-omos` matrix jobs were blocked on this on `v1.14.31b`.
 - **Fix:** smoke-test image-size threshold for the `omos` variant bumped from 3000 MB to 3200 MB. The mempalace-toolkit bake-in added ~100 MB to omos; measured 3107 MB on `v1.14.31b`. All functional smoke checks (opencode, node, mempalace CLIs, toolkit wrappers, oh-my-opencode-slim) pass — this is a guardrail recalibration, not a performance concession. The underlying image genuinely grew.
 - The runner-disk reclaim step from v1.14.31b did its job: `smoke-base` and `validate-base` now pass cleanly. Only `smoke-omos` was blocked this iteration, and only on the threshold.
 - No image changes beyond what shipped in v1.14.31. Rebuild of v1.14.31 content only.
 ## v1.14.31b — 2026-05-01
 **CI: reclaim runner disk before `load: true` smoke builds.**
 - **Fix:** v1.14.31's publish workflow and the `validate` workflow both hit `No space left on device` on the single-arch amd64 smoke/validate builds (`/opt/uv-tools/mempalace/lib/python3.13/site-packages/hf_xet/hf_xet.abi3.so`, `/usr/local/bin/git-lfs`). The root cause is not the build itself but the `load: true` step. Peak disk during export equals tarball + unpacked image + buildx cache, and the image has crossed the ~3 GB threshold where this no longer fits in the ~12 GB of free space the runner container starts with. The v1.14.31 refactor split multi-arch into per-arch push-by-digest jobs (which don't `load`), but the smoke gates still do and still hit the wall.
  - Added a `Reclaim runner disk` step to all four `load: true` jobs (`validate-base`, `validate-omos`, `smoke-base`, `smoke-omos`). The step strips `catthehacker/ubuntu:act-latest`-resident toolchains we never use (hosted-tool-cache, dotnet, android, powershell, swift, ghc, jvm, microsoft, chromium, boost) and runs `docker system prune -af --volumes` + `docker builder prune -af` against the runner's dockerd before `setup-buildx-action`. Expected reclaim is 6–12 GB depending on what's resident.
  - Added workflow-level `concurrency: { group: ..., cancel-in-progress: false }` on `docker-publish.yml` so concurrent tag pushes can't race `docker system prune` in one job against an in-flight buildx cache in another.
  - Pruning is deliberately kept out of the per-arch matrix push-by-digest jobs (`build-base`/`build-omos`) — those don't need it (no `load: true`), and pruning in parallel jobs risks one job nuking another's cache.
 - **Follow-up** (not in this release): image-size reduction via a dedicated `uv tool install mempalace` build stage (strips uv's cache from the final image), pinning `mempalace-toolkit` to a commit SHA with `--depth=1 --filter=blob:none`, and auditing whether `hf_xet` is actually required by mempalace at runtime. These will ship in the next release that rebases on a new opencode version.
 - No image changes. Rebuild of v1.14.31 content only.
 ## v1.14.31 — 2026-05-01
 Bump opencode to 1.14.31.
 **CI infrastructure: split multi-arch publish across separate runners.**
 - **Fix:** The `publish` workflow exhausted runner disk space on `v1.14.30b` and would have hit the same wall on any subsequent release. Both variants built both architectures on a single `catthehacker/ubuntu:act-latest` container with ~40 GB of shared overlay space. The peak disk footprint during the nodejs dpkg unpack / git-lfs layer export pushed it over the edge (`No space left on device`). The mempalace-toolkit bake-in from v1.14.30b was the final straw; the underlying issue is that QEMU-emulated arm64 layers were stored alongside the amd64 build on the same runner.
  - `docker-publish.yml` refactored to the canonical `push-by-digest` + manifest-merge pattern: smoke test (amd64) runs on its own runner, each `(variant × arch)` push target runs on its own fresh runner with `outputs: type=image,...,push-by-digest=true,push=true` (no local image store), then a tiny merge job assembles the multi-arch manifest with `docker buildx imagetools create` from digest artifacts.
  - Per-runner disk peak is now roughly one-quarter of the old single-job peak. The four Docker Hub tags produced per release (`vX.Y.Z[n]`, `latest`, `vX.Y.Z[n]-omos`, `latest-omos`) are unchanged.
  - Also parallelizes the amd64 and arm64 builds, so wall-clock time for a release should drop noticeably despite the added merge hop.
 ## v1.14.30b — 2026-04-30
 **Bake mempalace-toolkit wrappers into the image.**
 - **Fix:** The scheduler templates in [mempalace-toolkit's `contrib/`](https://gitea.jordbo.se/joakimp/mempalace-toolkit/src/branch/main/contrib) assume `mempalace-session` is available inside the container, but the image never actually installed it. Users following the `*-devbox` scheduler docs would silently lose the wrappers on every `docker compose up --force-recreate`, because the only way to get them was a post-hoc `./install.sh --yes` inside the container — which lives in the ephemeral layer. The host-side systemd timer would then fire, `docker exec` in, and hit `mempalace-session: command not found`. Caught during runtime validation on 2026-04-30.
  - New Dockerfile block clones `mempalace-toolkit` at build time (depth-1) to `/opt/mempalace-toolkit/`, symlinks `bin/mempalace-session` and `bin/mempalace-docs` into `/usr/local/bin/`, and asserts both respond to `--help` before the layer succeeds.
  - Gated by `ARG INSTALL_MEMPALACE_TOOLKIT=true` (defaults on, depends on `INSTALL_MEMPALACE=true`).
  - Floated ref via `ARG MEMPALACE_TOOLKIT_REF=main` — override for reproducible builds once the toolkit starts tagging releases.
 - **Tests:** Smoke test gains three toolkit assertions (`mempalace-session --help`, `mempalace-docs --help`, symlink target check). The resolved-versions preamble now logs the toolkit git short-SHA alongside the other floated components.
 - **Docs:** README's MemPalace section gains a `Scheduled mining (mempalace-toolkit)` subsection covering the new wrappers and pointing at `contrib/` for scheduling. New build-args table entry for `INSTALL_MEMPALACE_TOOLKIT`.
 ## v1.14.30 — 2026-04-30
 Bump opencode to 1.14.30.
@@ -449,6 +449,24 @@ mempalace wake-up
 Each workspace gets its own isolated "wing" — memories never leak between projects.
 ### Scheduled mining (mempalace-toolkit)
 The image bakes in [mempalace-toolkit](https://gitea.jordbo.se/joakimp/mempalace-toolkit), a small set of bash wrappers that pair with mempalace for two common routines:
 ```bash
 # Mine opencode session history (reads ~/.local/share/opencode/opencode.db, stages JSONL, mines into wing_conversations)
 mempalace-session
 # Mine a project's docs into a dedicated wing
 mempalace-docs /workspace/my-project
 ```
 Both wrappers are idempotent and dedup-aware — re-running them on unchanged input is a cheap no-op.
 For weekly automated runs, the toolkit ships ready-to-use scheduler templates (systemd user timer, launchd user agent, cron) in its [`contrib/`](https://gitea.jordbo.se/joakimp/mempalace-toolkit/src/branch/main/contrib) directory. The `*-devbox` variants are designed for this container: host-side schedulers that `docker exec` into the running opencode-devbox.
 Disable the toolkit (keeps mempalace itself) with `--build-arg INSTALL_MEMPALACE_TOOLKIT=false`. Pin to a specific ref with `--build-arg MEMPALACE_TOOLKIT_REF=v0.3.0` once tagged releases exist.
 ### Storage
 Two separate named volumes keep different data classes apart:
@@ -5,7 +5,7 @@ ARG DEBIAN_VERSION=trixie-slim
 FROM debian:${DEBIAN_VERSION} AS base
 ARG TARGETARCH
-ARG OPENCODE_VERSION=1.14.30
+ARG OPENCODE_VERSION=1.14.31
 LABEL maintainer="joakimp"
 LABEL description="Portable opencode developer container"
@@ -207,6 +207,31 @@ RUN if [ "${INSTALL_MEMPALACE}" = "true" ]; then \
      /opt/uv-tools/mempalace/bin/python -c "import mempalace; print('mempalace', mempalace.__version__ if hasattr(mempalace, '__version__') else 'installed')" ; \
    fi
 # ── mempalace-toolkit — bash wrappers for session/docs mining ────────
 # Thin wrappers (`mempalace-session`, `mempalace-docs`) that delegate to
 # the mempalace Python CLI for two common scheduled tasks:
 #   - mempalace-session: mines opencode's SQLite session history into
 #     the palace (wing_conversations). Referenced by contrib/ scheduler
 #     templates (systemd user timer, cron) in the toolkit repo.
 #   - mempalace-docs: mines project docs into a per-project wing.
 # Repo source of truth: https://gitea.jordbo.se/joakimp/mempalace-toolkit
 #
 # Requires INSTALL_MEMPALACE=true (wrappers shell out to `mempalace`).
 # Disable with --build-arg INSTALL_MEMPALACE_TOOLKIT=false if you don't
 # use the scheduled-mining workflow.
 ARG INSTALL_MEMPALACE_TOOLKIT=true
 ARG MEMPALACE_TOOLKIT_REF=main
 RUN if [ "${INSTALL_MEMPALACE}" = "true" ] && [ "${INSTALL_MEMPALACE_TOOLKIT}" = "true" ]; then \
      git clone --depth 1 --branch "${MEMPALACE_TOOLKIT_REF}" \
        https://gitea.jordbo.se/joakimp/mempalace-toolkit.git /opt/mempalace-toolkit && \
      ln -sf /opt/mempalace-toolkit/bin/mempalace-session /usr/local/bin/mempalace-session && \
      ln -sf /opt/mempalace-toolkit/bin/mempalace-docs    /usr/local/bin/mempalace-docs && \
      chmod +x /opt/mempalace-toolkit/bin/mempalace-session /opt/mempalace-toolkit/bin/mempalace-docs && \
      mempalace-session --help >/dev/null && \
      mempalace-docs    --help >/dev/null && \
      echo "mempalace-toolkit installed at $(cd /opt/mempalace-toolkit && git rev-parse --short HEAD)" ; \
    fi
 # rustup — Rust toolchain manager
 # Installs the rustup-init binary only. Users bootstrap Rust with:
 #   rustup-init -y && source ~/.cargo/env
@@ -339,6 +339,7 @@ docker compose build --build-arg NVIM_VERSION=0.12.1   # pin to a specific versi
 |---|---|---|
 | `INSTALL_GO` | `false` | Go toolchain (resolves latest stable from go.dev when `GO_VERSION=latest`) |
 | `INSTALL_MEMPALACE` | `true` | [MemPalace](https://github.com/MemPalace/mempalace) local AI memory system (~300 MB — disable to shrink image if you don't need MCP memory) |
 | `INSTALL_MEMPALACE_TOOLKIT` | `true` | [mempalace-toolkit](https://gitea.jordbo.se/joakimp/mempalace-toolkit) bash wrappers (`mempalace-session`, `mempalace-docs`). Cloned at build time from `MEMPALACE_TOOLKIT_REF` (default `main`). Requires `INSTALL_MEMPALACE=true`. |
 | `INSTALL_OMOS` | `false` | [oh-my-opencode-slim](https://github.com/alvinunreal/oh-my-opencode-slim) multi-agent orchestration (installs Bun and plugin) |
 | `OPENCODE_VERSION` | *(pinned per release)* | opencode npm version. Drives the image tag and is intentionally not floated. |
 | `NODE_VERSION` | `22` | Node.js major version. Pinned to protect against upstream breaking changes across majors. |
@@ -502,6 +503,24 @@ mempalace wake-up
 Each workspace gets its own isolated "wing" — memories never leak between projects.
 ### Scheduled mining (mempalace-toolkit)
 The image bakes in [mempalace-toolkit](https://gitea.jordbo.se/joakimp/mempalace-toolkit), a small set of bash wrappers that pair with mempalace for two common routines:
 ```bash
 # Mine opencode session history (reads ~/.local/share/opencode/opencode.db, stages JSONL, mines into wing_conversations)
 mempalace-session
 # Mine a project's docs into a dedicated wing
 mempalace-docs /workspace/my-project
 ```
 Both wrappers are idempotent and dedup-aware — re-running them on unchanged input is a cheap no-op.
 For weekly automated runs, the toolkit ships ready-to-use scheduler templates (systemd user timer, launchd user agent, cron) in its [`contrib/`](https://gitea.jordbo.se/joakimp/mempalace-toolkit/src/branch/main/contrib) directory. The `*-devbox` variants are designed for this container: host-side schedulers that `docker exec` into the running opencode-devbox.
 Disable the toolkit (keeps mempalace itself) with `--build-arg INSTALL_MEMPALACE_TOOLKIT=false`. Pin to a specific ref with `--build-arg MEMPALACE_TOOLKIT_REF=v0.3.0` once tagged releases exist.
 ### Storage
 Two separate named volumes keep different data classes apart:
@@ -71,6 +71,9 @@ docker run --rm --entrypoint="" "$IMAGE" sh -c '
  if command -v mempalace >/dev/null 2>&1; then
    printf "  %-15s %s\n" "mempalace"   "$(mempalace --version 2>&1 | head -1 || echo installed)"
  fi
  if command -v mempalace-session >/dev/null 2>&1 && [ -d /opt/mempalace-toolkit ]; then
    printf "  %-15s %s\n" "toolkit"     "$(git -C /opt/mempalace-toolkit rev-parse --short HEAD 2>/dev/null || echo installed)"
  fi
 '
 echo
 echo "-- Core binaries --"
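In the preamble line `mempalace --version 2>&1 | head -1 || echo installed`, the pipeline's exit status is that of `head` (absent `set -o pipefail`). The `echo installed` fallback therefore fires only if `head` itself fails. A failing `mempalace` would have its error text printed as the "version" instead. Below is a minimal sketch of a probe that checks the command's own status. The name `probe_version` is hypothetical, and because the preamble runs inside the container's `sh -c '...'`, the function would have to be defined inside that string.

```shell
# Print the first line of "<cmd> --version", or "installed" if the command
# fails or prints nothing. POSIX sh has no `local`, so out/first are global.
probe_version() {
  # An assignment's exit status is that of its command substitution, so a
  # failing command is caught here instead of being masked by `head`.
  out=$("$@" --version 2>&1) || out=""
  first=$(printf '%s\n' "$out" | head -n 1)
  [ -n "$first" ] || first="installed"
  printf '%s\n' "$first"
}
```

Usage in the preamble would look like `printf "  %-15s %s\n" "mempalace" "$(probe_version mempalace)"`.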
@@ -104,6 +107,16 @@ else
  echo "  - mempalace not installed (INSTALL_MEMPALACE=false)"
 fi
 # mempalace-toolkit wrappers: present unless built with INSTALL_MEMPALACE_TOOLKIT=false
 # Gated on mempalace presence — wrappers are useless without the CLI.
 if docker run --rm --entrypoint="" "$IMAGE" sh -c "command -v mempalace && command -v mempalace-session" >/dev/null 2>&1; then
  run "mempalace-session (toolkit)" "mempalace-session --help | head -1"
  run "mempalace-docs (toolkit)"    "mempalace-docs --help | head -1"
  run "toolkit symlink target"      "test -L /usr/local/bin/mempalace-session && readlink /usr/local/bin/mempalace-session"
 elif docker run --rm --entrypoint="" "$IMAGE" sh -c "command -v mempalace" >/dev/null 2>&1; then
  echo "  - mempalace-toolkit not installed (INSTALL_MEMPALACE_TOOLKIT=false)"
 fi
 # bun: only in the omos variant
 if [ "$VARIANT" = "omos" ]; then
  run "bun (omos)"            "bun --version"
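The three toolkit assertions in this hunk check the result of a build step that, per the "Bake mempalace-toolkit wrappers into the image" commit in the log below, symlinks `bin/mempalace-session` and `bin/mempalace-docs` from `/opt/mempalace-toolkit/` into `/usr/local/bin/` and requires both to answer `--help`. Here is a minimal sketch of that link-and-verify half. The function name `link_toolkit` is hypothetical, and the directories are parameters so the sketch runs outside the image.

```shell
# Hypothetical sketch of the link-and-verify build step; not the real
# Dockerfile. toolkit_dir/bin_dir stand in for /opt/mempalace-toolkit and
# /usr/local/bin. The clone would precede it, e.g. (assumed):
#   git clone --depth 1 --branch "$MEMPALACE_TOOLKIT_REF" \
#     https://gitea.jordbo.se/joakimp/mempalace-toolkit "$toolkit_dir"
# (--branch takes a branch or tag name, not a bare commit SHA.)
link_toolkit() {
  toolkit_dir=$1
  bin_dir=$2
  for w in mempalace-session mempalace-docs; do
    ln -sf "$toolkit_dir/bin/$w" "$bin_dir/$w" || return 1
    # Fail the layer if a wrapper can't even print its help text.
    if ! "$bin_dir/$w" --help >/dev/null 2>&1; then
      echo "link_toolkit: $w --help failed" >&2
      return 1
    fi
  done
}
```

Asserting `--help` at build time means a broken wrapper fails the image build, rather than surfacing later as `mempalace-session: command not found` inside a scheduled `docker exec`.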
@@ -201,9 +214,12 @@ SIZE_BYTES=$(docker image inspect --format='{{.Size}}' "$IMAGE")
 SIZE_MB=$((SIZE_BYTES / 1024 / 1024))
 echo "  Uncompressed size: ${SIZE_MB} MB"
-# Thresholds (uncompressed): base 2500 MB, omos 3000 MB. Adjust as image content evolves.
+# Thresholds (uncompressed): base 2500 MB, omos 3200 MB. Adjust as image content evolves.
 # omos bumped 3000→3200 on v1.14.31c — mempalace-toolkit bake-in pushed the
 # omos variant to ~3.1 GB. Functional smoke checks all pass; this is a
 # guardrail, not a performance limit.
 THRESHOLD=2500
-[ "$VARIANT" = "omos" ] && THRESHOLD=3000
+[ "$VARIANT" = "omos" ] && THRESHOLD=3200
 if [ "$SIZE_MB" -gt "$THRESHOLD" ]; then
  fail "image size ${SIZE_MB} MB exceeds threshold ${THRESHOLD} MB for variant=$VARIANT"
 else
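The size guardrail in this hunk reduces to a floor division to whole MB plus a per-variant threshold lookup. Here is a minimal standalone sketch of the same logic. The function names `size_threshold_mb` and `check_image_size` are hypothetical, and the thresholds are the post-bump values from the diff.

```shell
# Per-variant uncompressed-size limits, in MB (values from the diff above).
size_threshold_mb() {
  case "$1" in
    omos) echo 3200 ;;
    *)    echo 2500 ;;
  esac
}

# $((...)) is integer arithmetic, so bytes -> MB rounds down, as in the script.
check_image_size() {
  variant=$1
  size_bytes=$2
  size_mb=$((size_bytes / 1024 / 1024))
  threshold=$(size_threshold_mb "$variant")
  if [ "$size_mb" -gt "$threshold" ]; then
    echo "FAIL: ${size_mb} MB exceeds ${threshold} MB for variant=$variant" >&2
    return 1
  fi
  echo "OK: ${size_mb} MB within ${threshold} MB for variant=$variant"
}
```

For example, the 3107 MB omos image reported in the commit log passes at 3200 MB but would have tripped the old 3000 MB limit. Keeping the mapping in one `case` makes the next bump a one-line change.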
Author	SHA1	Message	Date
Joakim Persson	fc74a8f906	Collapse per-arch matrix back into single multi-arch push jobs [CI: Validate / docs-check (push) Successful in 17s; Validate / validate-omos (push) Successful in 14m21s; Validate / validate-base (push) Successful in 14m50s; Publish Docker Image / smoke-base (push) Successful in 11m12s; Publish Docker Image / smoke-omos (push) Successful in 22m0s; Publish Docker Image / build-base (push) Successful in 42m25s; Publish Docker Image / build-omos (push) Failing after 1h16m24s; Publish Docker Image / update-description (push) Has been cancelled] v1.14.31c's matrix jobs failed on Upload digest with GHESNotSupportedError: Gitea Actions doesn't support actions/upload-artifact@v4+. Separately, build-omos arm64 hung silently for 12 min in Set-up job, likely catthehacker pull contention between concurrent matrix children. Rather than downgrade artifacts to @v3, collapse the matrix entirely. docker/build-push-action@v7 with platforms: linux/amd64,linux/arm64 publishes a proper multi-arch manifest in one job, so the artifact-passing and imagetools create merge dance only existed to support a matrix split we no longer need. The matrix was designed around load: true disk exhaustion (v1.14.30b), but push-by-digest streams straight to the registry with a fundamentally different disk profile. The reclaim step gives enough headroom for the combined amd64+arm64 push case. Workflow: 7 jobs → 5. docker-publish.yml: 263 → ~110 lines of YAML. Also: timeout-minutes: 90 on build jobs so hung builds fail explicitly; BUILDKIT_PROGRESS=plain at workflow level for line-by-line arm64 logs; AGENTS.md §CI quirks documents the Gitea-specific traps (upload-artifact@v3-only, dash-not-bash, build-push-action@v7 multi-arch convention, reclaim requirement).	2026-05-01 12:28:34 +00:00
Joakim Persson	5a2d06340e	Fix dash-incompatible slash substitution and bump omos size threshold [CI: Validate / docs-check (push) Successful in 18s; Validate / validate-base (push) Successful in 15m44s; Validate / validate-omos (push) Successful in 15m21s; Publish Docker Image / smoke-base (push) Successful in 14m30s; Publish Docker Image / smoke-omos (push) Successful in 15m51s; Publish Docker Image / build-base (linux/amd64) (push) Failing after 10m58s; Publish Docker Image / build-omos (linux/amd64) (push) Failing after 15m9s; Publish Docker Image / build-omos (linux/arm64) (push) Failing after 11m57s; Publish Docker Image / build-base (linux/arm64) (push) Failing after 39m30s; Publish Docker Image / merge-base (push) Has been skipped; Publish Docker Image / merge-omos (push) Has been skipped; Publish Docker Image / update-description (push) Has been skipped] v1.14.31b made it through smoke-base and validate-base (reclaim worked), but two narrow bugs blocked the rest: 1. 'Derive platform slug' in the per-arch matrix jobs used bash ${PLATFORM_PAIR//\//-}, which dash (/bin/sh in the runner) can't parse ('Bad substitution'). Rewrote with 'tr / -'. 2. smoke-omos image size 3107 MB tripped the 3000 MB guardrail. All functional checks pass; the mempalace-toolkit bake-in from v1.14.30b added ~100 MB and the threshold was stale. Bumped to 3200 MB. No image-level changes.	2026-05-01 10:43:04 +00:00
Joakim Persson	23894bc19f	Reclaim runner disk before load: true smoke builds [CI: Validate / docs-check (push) Successful in 22s; Validate / validate-base (push) Successful in 18m10s; Validate / validate-omos (push) Failing after 25m54s; Publish Docker Image / smoke-base (push) Successful in 11m50s; Publish Docker Image / build-base (linux/amd64) (push) Failing after 38s; Publish Docker Image / build-base (linux/arm64) (push) Failing after 21s; Publish Docker Image / merge-base (push) Has been skipped; Publish Docker Image / smoke-omos (push) Failing after 19m18s; Publish Docker Image / build-omos (linux/amd64) (push) Has been skipped; Publish Docker Image / build-omos (linux/arm64) (push) Has been skipped; Publish Docker Image / merge-omos (push) Has been skipped; Publish Docker Image / update-description (push) Has been skipped] v1.14.31 publish and validate both hit 'No space left on device' on single-arch amd64 smoke/validate builds. The image has crossed ~3 GB and the runner's ~40 GB overlay starts ~70% full, so 'load: true' peak disk (tarball + unpacked image + buildx cache) no longer fits. Add a 'Reclaim runner disk' step to validate-base, validate-omos, smoke-base, smoke-omos. Strips catthehacker-resident toolchains we never use (hosted-tool-cache, dotnet, android, powershell, swift, ghc, jvm, microsoft, chromium, boost), then runs 'docker system prune -af --volumes' + 'docker builder prune -af' against the runner's dockerd before setup-buildx-action. Expected reclaim is 6-12 GB depending on what's resident. Deliberately NOT in the per-arch matrix build jobs: push-by-digest doesn't need it, and pruning in parallel jobs risks one job nuking another's in-flight buildx cache. Also add workflow-level concurrency on docker-publish.yml so concurrent tag pushes serialize cleanly.	2026-05-01 09:34:52 +00:00
Joakim Persson	f0918ba915	Bump opencode to 1.14.31 and split multi-arch publish across runners [CI: Validate / docs-check (push) Successful in 26s; Publish Docker Image / smoke-base (push) Failing after 11m1s; Publish Docker Image / build-base (linux/amd64) (push) Has been skipped; Publish Docker Image / build-base (linux/arm64) (push) Has been skipped; Publish Docker Image / merge-base (push) Has been skipped; Validate / validate-base (push) Failing after 13m48s; Validate / validate-omos (push) Failing after 15m23s; Publish Docker Image / smoke-omos (push) Failing after 16m20s; Publish Docker Image / build-omos (linux/amd64) (push) Has been skipped; Publish Docker Image / build-omos (linux/arm64) (push) Has been skipped; Publish Docker Image / merge-omos (push) Has been skipped; Publish Docker Image / update-description (push) Has been skipped] The v1.14.30b publish failed on both variants with 'No space left on device': arm64 QEMU-emulated layers were stored alongside amd64 on the same ~40 GB runner, and the mempalace-toolkit bake-in from v1.14.30b tipped peak disk over the edge during the nodejs dpkg unpack and the git-lfs layer export. Refactor docker-publish.yml to the canonical push-by-digest + manifest-merge pattern: smoke test (amd64) runs on its own runner, each (variant x arch) push target runs on its own fresh runner with outputs=type=image,push-by-digest=true,push=true (no local image store), then a tiny merge job assembles the multi-arch manifest with docker buildx imagetools create from digest artifacts. Per-runner disk peak is roughly one-quarter of the old single-job peak. The four Docker Hub tags per release are unchanged. As a bonus, amd64 and arm64 now build in parallel. No image-level changes beyond the opencode bump.	2026-05-01 08:43:08 +00:00
Joakim Persson	1683650240	Bake mempalace-toolkit wrappers into the image [CI: Validate / docs-check (push) Successful in 14s; Validate / validate-base (push) Successful in 14m44s; Validate / validate-omos (push) Successful in 17m51s; Publish Docker Image / build-omos (push) Failing after 25m7s; Publish Docker Image / build-base (push) Failing after 55m51s; Publish Docker Image / update-description (push) Has been skipped] The scheduler templates in mempalace-toolkit's contrib/ assume mempalace-session is available inside the container, but the image never actually installed it. Users following the *-devbox scheduler docs would silently lose the wrappers on every container recreate, because the only way to get them was a post-hoc install.sh inside the container, which lives in the ephemeral layer. The host-side systemd timer would then fire, docker exec in, and hit "mempalace-session: command not found". Caught during runtime validation on 2026-04-30: host-side systemd unit ran cleanly at 16:15 today, then the container was rebuilt and recreated, and the wrappers were gone. The rebuild produced an image that the scheduler template's own documented precondition did not hold for. Fix: new Dockerfile block clones mempalace-toolkit at build time (depth-1) to /opt/mempalace-toolkit/, symlinks bin/mempalace-session and bin/mempalace-docs into /usr/local/bin/, asserts both respond to --help before the layer succeeds. Gated by INSTALL_MEMPALACE_TOOLKIT=true (defaults on, depends on INSTALL_MEMPALACE=true). Floated ref via MEMPALACE_TOOLKIT_REF=main for auto-picking-up toolkit updates; override for reproducible builds once the toolkit starts tagging releases. Smoke test gains three assertions (mempalace-session --help, mempalace-docs --help, symlink target check). Resolved-versions preamble logs the toolkit git short-SHA alongside the other floated components, so CI logs always record what got baked in. README gains a Scheduled mining (mempalace-toolkit) subsection and a build-args row. DOCKER_HUB.md regenerated; sync-check passes.	2026-04-30 20:56:58 +00:00
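The 'Derive platform slug' fix in commit 5a2d06340e illustrates a general trap. `${var//pattern/repl}` is a bash extension, and per that commit `/bin/sh` on the runner is dash, which rejects it with 'Bad substitution'. The step itself went away when the matrix was collapsed, but the same POSIX-safe pattern applies to any step that runs under `/bin/sh`. Here is a minimal sketch. The name `platform_slug` is hypothetical.

```shell
# Turn a platform string like "linux/arm64" into "linux-arm64" using only
# POSIX tools, instead of bash's ${PLATFORM_PAIR//\//-}.
platform_slug() {
  printf '%s\n' "$1" | tr '/' '-'
}
```

`tr` replaces every `/`, so a three-part platform such as `linux/arm/v7` becomes `linux-arm-v7`, the same result as the bash global substitution.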