Compare commits

...

2 Commits

Author SHA1 Message Date
Joakim Persson 5a2d06340e Fix dash-incompatible slash substitution and bump omos size threshold
Validate / docs-check (push) Successful in 18s
Validate / validate-base (push) Successful in 15m44s
Validate / validate-omos (push) Successful in 15m21s
Publish Docker Image / smoke-base (push) Successful in 14m30s
Publish Docker Image / smoke-omos (push) Successful in 15m51s
Publish Docker Image / build-base (linux/amd64) (push) Failing after 10m58s
Publish Docker Image / build-omos (linux/amd64) (push) Failing after 15m9s
Publish Docker Image / build-omos (linux/arm64) (push) Failing after 11m57s
Publish Docker Image / build-base (linux/arm64) (push) Failing after 39m30s
Publish Docker Image / merge-base (push) Has been skipped
Publish Docker Image / merge-omos (push) Has been skipped
Publish Docker Image / update-description (push) Has been skipped
v1.14.31b made it through smoke-base and validate-base (reclaim worked),
but two narrow bugs blocked the rest:

1. 'Derive platform slug' in the per-arch matrix jobs used bash
   ${PLATFORM_PAIR//\//-} which dash (/bin/sh in the runner) can't
   parse — 'Bad substitution'. Rewrote with 'tr / -'.

2. smoke-omos image size 3107 MB tripped the 3000 MB guardrail. All
   functional checks pass; the mempalace-toolkit bake-in from v1.14.30b
   added ~100 MB and the threshold was stale. Bumped to 3200 MB.

No image-level changes.
2026-05-01 10:43:04 +00:00
Joakim Persson 23894bc19f Reclaim runner disk before load: true smoke builds
Validate / docs-check (push) Successful in 22s
Validate / validate-base (push) Successful in 18m10s
Validate / validate-omos (push) Failing after 25m54s
Publish Docker Image / smoke-base (push) Successful in 11m50s
Publish Docker Image / build-base (linux/amd64) (push) Failing after 38s
Publish Docker Image / build-base (linux/arm64) (push) Failing after 21s
Publish Docker Image / merge-base (push) Has been skipped
Publish Docker Image / smoke-omos (push) Failing after 19m18s
Publish Docker Image / build-omos (linux/amd64) (push) Has been skipped
Publish Docker Image / build-omos (linux/arm64) (push) Has been skipped
Publish Docker Image / merge-omos (push) Has been skipped
Publish Docker Image / update-description (push) Has been skipped
v1.14.31 publish and validate both hit 'No space left on device' on
single-arch amd64 smoke/validate builds. The image has crossed ~3 GB
and the runner's ~40 GB overlay starts ~70% full, so 'load: true'
peak disk (tarball + unpacked image + buildx cache) no longer fits.

Add a 'Reclaim runner disk' step to validate-base, validate-omos,
smoke-base, smoke-omos. Strips catthehacker-resident toolchains we
never use (hosted-tool-cache, dotnet, android, powershell, swift,
ghc, jvm, microsoft, chromium, boost), then runs 'docker system
prune -af --volumes' + 'docker builder prune -af' against the
runner's dockerd before setup-buildx-action. Expected reclaim is
6-12 GB depending on what's resident.

Deliberately NOT in the per-arch matrix build jobs — push-by-digest
doesn't need it and pruning in parallel jobs risks one job nuking
another's in-flight buildx cache. Also add workflow-level
concurrency on docker-publish.yml so concurrent tag pushes serialize
cleanly.
2026-05-01 09:34:52 +00:00
4 changed files with 144 additions and 6 deletions
+67 -4
View File
@@ -5,6 +5,15 @@ on:
tags:
- 'v*'
# Serialize concurrent runs of the same workflow on the same ref so the
# matrix build jobs can't race `docker system prune` in the smoke gates
# (pruning from one job can nuke another job's in-flight buildx cache).
# cancel-in-progress: false — tag pushes are release events, we never
# want to silently drop one.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: false
# Runner disk pressure notes:
# Gitea Actions runners use `catthehacker/ubuntu:act-latest` on a shared host
# with limited overlay space (~40 GB, often 70%+ used at start). Building both
@@ -29,6 +38,34 @@ jobs:
- name: Force IPv4 for Docker Hub
run: echo 'precedence ::ffff:0:0/96 100' >> /etc/gai.conf
# See docker-publish.yml preamble. `load: true` peak disk = tarball
# + unpacked image + buildx cache; the image now crosses the 40 GB
# runner overlay's starting headroom. Strip catthehacker-resident
# toolchains and any stale docker state up front.
- name: Reclaim runner disk
run: |
set -x
df -h / || true
rm -rf \
/opt/hostedtoolcache \
/opt/microsoft \
/opt/az \
/opt/ghc \
/usr/local/.ghcup \
/usr/share/dotnet \
/usr/share/swift \
/usr/local/lib/android \
/usr/local/share/powershell \
/usr/local/share/chromium \
/usr/local/share/boost \
/usr/lib/jvm 2>/dev/null || true
apt-get clean || true
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* || true
docker system df || true
docker system prune -af --volumes || true
docker builder prune -af || true
df -h / || true
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v4
with:
@@ -57,6 +94,30 @@ jobs:
- name: Force IPv4 for Docker Hub
run: echo 'precedence ::ffff:0:0/96 100' >> /etc/gai.conf
- name: Reclaim runner disk
run: |
set -x
df -h / || true
rm -rf \
/opt/hostedtoolcache \
/opt/microsoft \
/opt/az \
/opt/ghc \
/usr/local/.ghcup \
/usr/share/dotnet \
/usr/share/swift \
/usr/local/lib/android \
/usr/local/share/powershell \
/usr/local/share/chromium \
/usr/local/share/boost \
/usr/lib/jvm 2>/dev/null || true
apt-get clean || true
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* || true
docker system df || true
docker system prune -af --volumes || true
docker builder prune -af || true
df -h / || true
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v4
with:
@@ -98,8 +159,9 @@ jobs:
- name: Derive platform slug
id: platform
run: |
PLATFORM_PAIR="${{ matrix.platform }}"
echo "pair=${PLATFORM_PAIR//\//-}" >> $GITHUB_OUTPUT
# POSIX-safe slash substitution — act's runner container ships
# /bin/sh as dash, which doesn't support bash's ${VAR//a/b}.
echo "pair=$(echo '${{ matrix.platform }}' | tr / -)" >> $GITHUB_OUTPUT
- name: Set up QEMU
if: matrix.platform != 'linux/amd64'
@@ -159,8 +221,9 @@ jobs:
- name: Derive platform slug
id: platform
run: |
PLATFORM_PAIR="${{ matrix.platform }}"
echo "pair=${PLATFORM_PAIR//\//-}" >> $GITHUB_OUTPUT
# POSIX-safe slash substitution — act's runner container ships
# /bin/sh as dash, which doesn't support bash's ${VAR//a/b}.
echo "pair=$(echo '${{ matrix.platform }}' | tr / -)" >> $GITHUB_OUTPUT
- name: Set up QEMU
if: matrix.platform != 'linux/amd64'
+52
View File
@@ -46,6 +46,34 @@ jobs:
run: |
echo 'precedence ::ffff:0:0/96 100' >> /etc/gai.conf
# The runner's overlay disk starts ~70% full. `load: true` peak disk
# is tarball + unpacked image + buildx cache, which tips it over
# once the image crosses ~3 GB. Strip catthehacker-resident
# toolchains we never use and any stale docker state up front.
- name: Reclaim runner disk
run: |
set -x
df -h / || true
rm -rf \
/opt/hostedtoolcache \
/opt/microsoft \
/opt/az \
/opt/ghc \
/usr/local/.ghcup \
/usr/share/dotnet \
/usr/share/swift \
/usr/local/lib/android \
/usr/local/share/powershell \
/usr/local/share/chromium \
/usr/local/share/boost \
/usr/lib/jvm 2>/dev/null || true
apt-get clean || true
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* || true
docker system df || true
docker system prune -af --volumes || true
docker builder prune -af || true
df -h / || true
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v4
with:
@@ -76,6 +104,30 @@ jobs:
run: |
echo 'precedence ::ffff:0:0/96 100' >> /etc/gai.conf
- name: Reclaim runner disk
run: |
set -x
df -h / || true
rm -rf \
/opt/hostedtoolcache \
/opt/microsoft \
/opt/az \
/opt/ghc \
/usr/local/.ghcup \
/usr/share/dotnet \
/usr/share/swift \
/usr/local/lib/android \
/usr/local/share/powershell \
/usr/local/share/chromium \
/usr/local/share/boost \
/usr/lib/jvm 2>/dev/null || true
apt-get clean || true
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* || true
docker system df || true
docker system prune -af --volumes || true
docker builder prune -af || true
df -h / || true
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v4
with:
+20
View File
@@ -6,6 +6,26 @@ Tags follow `v{opencode_version}[letter]` — bare tag for the first build on a
---
## v1.14.31c — 2026-05-01
**CI: fix bash-specific parameter expansion and bump omos size threshold.**
- **Fix:** `Derive platform slug` step in the per-arch matrix build jobs (`build-base`, `build-omos`) used `${PLATFORM_PAIR//\//-}` which is a bash parameter-expansion. The runner container executes step scripts via `/bin/sh` (dash), which errored with `Bad substitution`. Rewrote using `tr / -` which is POSIX and behaves identically. Both `build-base` and `build-omos` matrix jobs were blocked on this on `v1.14.31b`.
- **Fix:** smoke-test image-size threshold for the `omos` variant bumped from 3000 MB to 3200 MB. The mempalace-toolkit bake-in added ~100 MB to omos; measured 3107 MB on `v1.14.31b`. All functional smoke checks (opencode, node, mempalace CLIs, toolkit wrappers, oh-my-opencode-slim) pass — this is a guardrail recalibration, not a performance concession. The underlying image genuinely grew.
- The runner-disk reclaim step from v1.14.31b did its job: `smoke-base` and `validate-base` now pass cleanly. Only `smoke-omos` was blocked this iteration, and only on the threshold.
- No image changes beyond what shipped in v1.14.31. Rebuild of v1.14.31 content only.
## v1.14.31b — 2026-05-01
**CI: reclaim runner disk before `load: true` smoke builds.**
- **Fix:** v1.14.31's publish workflow and the `validate` workflow both hit `No space left on device` on the single-arch amd64 smoke/validate builds (`/opt/uv-tools/mempalace/lib/python3.13/site-packages/hf_xet/hf_xet.abi3.so`, `/usr/local/bin/git-lfs`). Root cause is not the build itself but the `load: true` step: peak disk during export equals tarball + unpacked image + buildx cache, and the image has crossed the ~3 GB threshold where this no longer fits in the ~12 GB of free space the runner container starts with. The v1.14.30c refactor split multi-arch into per-arch push-by-digest jobs (which don't `load`), but the smoke gates still do and still hit the wall.
- Added a `Reclaim runner disk` step to all four `load: true` jobs (`validate-base`, `validate-omos`, `smoke-base`, `smoke-omos`). The step strips `catthehacker/ubuntu:act-latest`-resident toolchains we never use (hosted-tool-cache, dotnet, android, powershell, swift, ghc, jvm, microsoft, chromium, boost) and runs `docker system prune -af --volumes` + `docker builder prune -af` against the runner's dockerd before `setup-buildx-action`. Expected reclaim is 612 GB depending on what's resident.
- Added workflow-level `concurrency: { group: ..., cancel-in-progress: false }` on `docker-publish.yml` so concurrent tag pushes can't race `docker system prune` in one job against an in-flight buildx cache in another.
- Pruning is deliberately kept out of the per-arch matrix push-by-digest jobs (`build-base`/`build-omos`) — those don't need it (no `load: true`), and pruning in parallel jobs risks one job nuking another's cache.
- **Follow-up** (not in this release): image-size reduction via a dedicated `uv tool install mempalace` build stage (strips uv's cache from the final image), pinning `mempalace-toolkit` to a commit SHA with `--depth=1 --filter=blob:none`, and auditing whether `hf_xet` is actually required by mempalace at runtime. These will ship in the next release that rebases on a new opencode version.
- No image changes. Rebuild of v1.14.31 content only.
## v1.14.31 — 2026-05-01
Bump opencode to 1.14.31.
+5 -2
View File
@@ -214,9 +214,12 @@ SIZE_BYTES=$(docker image inspect --format='{{.Size}}' "$IMAGE")
SIZE_MB=$((SIZE_BYTES / 1024 / 1024))
echo " Uncompressed size: ${SIZE_MB} MB"
# Thresholds (uncompressed): base 2500 MB, omos 3000 MB. Adjust as image content evolves.
# Thresholds (uncompressed): base 2500 MB, omos 3200 MB. Adjust as image content evolves.
# omos bumped 3000→3200 on v1.14.31c — mempalace-toolkit bake-in pushed the
# omos variant to ~3.1 GB. Functional smoke checks all pass; this is a
# guardrail, not a performance limit.
THRESHOLD=2500
[ "$VARIANT" = "omos" ] && THRESHOLD=3000
[ "$VARIANT" = "omos" ] && THRESHOLD=3200
if [ "$SIZE_MB" -gt "$THRESHOLD" ]; then
fail "image size ${SIZE_MB} MB exceeds threshold ${THRESHOLD} MB for variant=$VARIANT"
else