Gitea Actions evaluates 'env.PROMOTE_LATEST' as empty in YAML 'if:' contexts even though the same env var substitutes correctly in shell run: blocks. Result: on v1.15.0/v1.15.0b tag pushes, the build-variant-* jobs correctly pushed latest-* aliases (shell context), but promote-base-latest and update-description got skipped (YAML context), so the Hub README description wasn't refreshed. Switch to evaluating github.ref_type directly in the if-conditions — matches the production-trigger semantics and avoids the env-var indirection that gitea evaluates inconsistently.
CI / Build Pipeline
This directory contains the gitea Actions workflows and the supporting documentation for opencode-devbox's CI. If you're investigating why the build pipeline is shaped the way it is, you're in the right place.
Workflows in this directory
| File | Trigger | Role |
|---|---|---|
| workflows/docker-publish-split.yml | push: tags: v* | Production release pipeline. Two-phase split-base build: shared base-<hash> published once (skipped on cache hit), then four parallel variant deltas. ~40–80 min wall clock depending on runner count and whether base needs rebuilding. |
| workflows/validate.yml | push: branches: main + PR | Lightweight gate. amd64-only smoke test of all four variants + DOCKER_HUB.md sync check. ~30 min. Fires on every push to main. |
Why the split-base pipeline exists
opencode-devbox publishes four image variants (base, omos, with-pi, omos-with-pi) × two architectures (amd64, arm64) = eight image tags per release. The pipeline currently runs on two self-hosted gitea Actions runners. arm64 builds are emulated under QEMU, which is the dominant cost (~3–5x slower than native).
The four variants share ~95% of their layers (Debian + apt + Node + AWS CLI + mempalace + dev tools + entrypoints). The original Dockerfile was a single multi-stage build with INSTALL_* build-args gating variant-specific RUNs. BuildKit's per-layer cache key is content-addressed, but as soon as a build-arg-gated RUN produces a different layer hash for variant A vs variant B, every subsequent layer also has a different parent → identical commands re-execute per variant. Result: minimal cross-variant cache reuse on a fresh build.
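The cascade can be illustrated with a toy model in which each layer's cache key is a hash of its parent's key plus its own command. This is a deliberate simplification, not BuildKit's actual key derivation, and the function names and layer commands below are made up for illustration.

```shell
#!/usr/bin/env bash
# Toy model: cache key of a layer = sha256(parent key + command).
# Not BuildKit's real algorithm; it only shows how one differing
# layer changes the key of every layer stacked on top of it.
layer_key() {  # usage: layer_key <parent-key> <command>
  printf '%s\n%s\n' "$1" "$2" | sha256sum | cut -c1-12
}

# Print the final key of a chain of commands (one command per argument).
chain_key() {
  local key="scratch" cmd
  for cmd in "$@"; do
    key=$(layer_key "$key" "$cmd")
  done
  echo "$key"
}

# Hypothetical layer lists for two variants: identical except for the
# build-arg-gated second step.
shared_prefix=$(chain_key "apt-get install ...")
variant_a=$(chain_key "apt-get install ..." "INSTALL_OMOS=false" "install dev tools")
variant_b=$(chain_key "apt-get install ..." "INSTALL_OMOS=true" "install dev tools")

echo "shared prefix key: $shared_prefix"
echo "variant A final:   $variant_a"
echo "variant B final:   $variant_b"  # differs from A although the last command is identical
```

In this model only the first layer is shared between the two variants; the identical "install dev tools" step gets a different key in each chain because its parent differs.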
Two improvements were considered:
- Reorder the original Dockerfile so all variant-gated RUNs land at the bottom: modest gain, ~10–20% wall-clock reduction. Not pursued.
- Split into Dockerfile.base + Dockerfile.variant with the base published as a long-lived shared image: significant gain, ~50–70% wall-clock reduction with hash-driven cache reuse. Pursued.
The split-base architecture is what the docker-publish-split.yml workflow exercises.
How the split-base pipeline works
                        ┌──────────────────┐
                        │ base-decide      │  compute base-<hash>;
                        │                  │  probe Docker Hub.
                        │ hash inputs:     │
                        │  Dockerfile.base │
                        │  rootfs/         │
                        │  entrypoint*.sh  │
                        └────────┬─────────┘
                                 │
                   ┌─────────────┴─────────────┐
                   │    need_build = true?     │
                   └─────────────┬─────────────┘
                             yes │ no
                                 ▼
                        ┌──────────────────┐
                        │ build-base       │  multi-arch build,
                        │                  │  push base-<hash>
                        └────────┬─────────┘  to Docker Hub.
                                 │
         ┌───────────────────────┼───────────────────────┐
         ▼                       ▼                       ▼
    ┌──────────┐            ┌──────────┐          ┌──────────────┐
    │smoke-base│            │smoke-omos│   ...    │smoke-omos-pi │  amd64 only,
    └────┬─────┘            └────┬─────┘          └──────┬───────┘  parallel.
         │                       │                       │
         ▼                       ▼                       ▼
    ┌──────────┐            ┌──────────┐          ┌──────────────┐
    │build-    │            │build-    │          │build-        │  multi-arch,
    │variant-  │            │variant-  │   ...    │variant-      │  parallel,
    │base      │            │omos      │          │omos-with-pi  │  tag push.
    └────┬─────┘            └────┬─────┘          └──────┬───────┘
         └───────────────────────┴───────────────────────┘
                                 │
                                 ▼
                        ┌──────────────────────────┐
                        │ promote-base-latest      │  crane copy
                        │                          │  base-<hash>
                        │                          │  → base-latest
                        └────────┬─────────────────┘
                                 │
                                 ▼
                        ┌──────────────────────────┐
                        │ update-description       │
                        └──────────────────────────┘
Step 1: base-decide
Compute a SHA-256 hash over the inputs that determine the base image's content:
{
cat Dockerfile.base
find rootfs -type f -print0 | sort -z | xargs -0 cat
cat entrypoint.sh entrypoint-user.sh
} | sha256sum | cut -c1-12
The 12-character truncated hash becomes base-<hash>. Probe Docker Hub
for this tag via docker manifest inspect:
- If it exists → set need_build=false. build-base is skipped entirely.
- If it doesn't → set need_build=true. build-base runs.
This is the core cache-reuse mechanism. Version-bump-only releases
(only Dockerfile.variant or build-args changed) hit the cache. Releases
that change anything in the base — apt packages, AWS CLI, Node version,
locale list, entrypoint scripts — pay the full base-build cost once.
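The decision logic of this step could be sketched roughly as follows. The hash pipeline and the repository name are taken from this document; the function names, the base_tag key, and the key=value output format are assumptions for illustration, not the workflow's actual code.

```shell
#!/usr/bin/env bash
# Sketch of the base-decide logic (function names are hypothetical).
REPO="joakimp/opencode-devbox"

# Hash the base inputs as described above. Run from the repository root.
compute_base_hash() {
  {
    cat Dockerfile.base
    find rootfs -type f -print0 | sort -z | xargs -0 cat
    cat entrypoint.sh entrypoint-user.sh
  } | sha256sum | cut -c1-12
}

# Succeeds if the given image reference already exists on the registry.
tag_exists() {
  docker manifest inspect "$1" >/dev/null 2>&1
}

# Print base_tag=... and need_build=... lines. A workflow step could
# append them to "$GITHUB_OUTPUT" so downstream jobs can read them.
base_decide() {
  local tag
  tag="base-$(compute_base_hash)"
  echo "base_tag=$tag"
  if tag_exists "$REPO:$tag"; then
    echo "need_build=false"
  else
    echo "need_build=true"
  fi
}
```

Calling base_decide from the repository root prints the two lines. Keeping the registry probe in its own function makes it easy to stub out when trying the logic without network access.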
Step 2: build-base (conditional)
Only runs when need_build=true. Multi-arch (amd64 + arm64) build of
Dockerfile.base, pushed to joakimp/opencode-devbox:base-<hash>.
Registry cache via --cache-from/--cache-to reduces incremental rebuilds
when only one or two layers changed.
The base image is not tagged base-latest here — that promotion
happens at the very end after all variants succeed (see step 5).
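For orientation, a CLI form of this build might look like the sketch below. The workflow itself uses docker/build-push-action, so this docker buildx invocation is an assumed equivalent rather than the actual step. The function only prints the command, so it can be tried without Docker.

```shell
#!/usr/bin/env bash
# Sketch only: prints the buildx invocation instead of running it.
# Repository and cache refs come from this document; the function
# name and the example hash are hypothetical.
REPO="joakimp/opencode-devbox"

build_base_cmd() {  # usage: build_base_cmd <base-hash>
  echo docker buildx build \
    --file Dockerfile.base \
    --platform linux/amd64,linux/arm64 \
    --tag "$REPO:base-$1" \
    --cache-from "type=registry,ref=$REPO:base-buildcache" \
    --cache-to "type=registry,ref=$REPO:base-buildcache,mode=max" \
    --push .
}

build_base_cmd 0123456789ab
```

Note that only base-<hash> is tagged here; base-latest does not appear anywhere in the command.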
Step 3: smoke-* (×4, parallel)
For each variant: build amd64-only against the base tag, load into
local docker, run scripts/smoke-test.sh.
Variant build-args:
| variant | INSTALL_OPENCODE | INSTALL_OMOS | INSTALL_PI |
|---|---|---|---|
| base | true | false | false |
| omos | true | true | false |
| with-pi | true | false | true |
| omos-with-pi | true | true | true |
The smoke script is invoked with --variant <name> to enable variant-specific assertions.
Smoke gates the publish: a smoke failure for variant X blocks build-variant-X.
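The build-arg table above can be expressed as a small lookup. This is a minimal sketch; the function name is hypothetical and the real workflow may pass these values differently.

```shell
#!/usr/bin/env bash
# Map a variant name to its INSTALL_* build-args, following the table above.
variant_build_args() {  # usage: variant_build_args <variant>
  local omos=false pi=false
  case "$1" in
    base)         ;;
    omos)         omos=true ;;
    with-pi)      pi=true ;;
    omos-with-pi) omos=true; pi=true ;;
    *) echo "unknown variant: $1" >&2; return 1 ;;
  esac
  echo "--build-arg INSTALL_OPENCODE=true --build-arg INSTALL_OMOS=$omos --build-arg INSTALL_PI=$pi"
}

# Print the arguments for all four variants.
for v in base omos with-pi omos-with-pi; do
  printf '%-13s %s\n' "$v" "$(variant_build_args "$v")"
done
```

Rejecting unknown names with a non-zero exit keeps a typo in a variant name from silently producing a base build.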
Step 4: build-variant-* (×4, parallel)
For each variant that passed smoke: multi-arch (amd64 + arm64) build of
Dockerfile.variant, pushed to Docker Hub with the user-facing release
tags:
| Build job | Tags pushed |
|---|---|
| build-variant-base | vX.Y.Z, latest |
| build-variant-omos | vX.Y.Z-omos, latest-omos |
| build-variant-with-pi | vX.Y.Z-with-pi, latest-with-pi |
| build-variant-omos-with-pi | vX.Y.Z-omos-with-pi, latest-omos-with-pi |
The latest* aliases are only updated when promote_latest=true (the
manual dispatch input) — for test runs, promote_latest=false keeps the
production aliases pointing at the previous good release.
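The tagging rule in the table can be sketched as a function of variant, version, and the promote flag. The function name and argument order are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Compute the tags a build-variant-* job would push, per the table above.
release_tags() {  # usage: release_tags <variant> <version> <promote_latest>
  local variant="$1" version="$2" promote="$3" suffix=""
  # The base variant carries no suffix; every other variant appends its name.
  [ "$variant" = "base" ] || suffix="-$variant"
  echo "${version}${suffix}"
  # The latest* alias is only emitted when promotion is requested.
  if [ "$promote" = "true" ]; then
    echo "latest${suffix}"
  fi
}

release_tags omos v1.15.0 true      # prints v1.15.0-omos, then latest-omos
release_tags with-pi v1.15.0 false  # prints only v1.15.0-with-pi
```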
Step 5: promote-base-latest
Once all four variants successfully publish, re-tag base-<hash> as
base-latest using crane copy. This is a manifest-level re-tag, not
a rebuild — it touches only Docker Hub's image index, takes seconds,
and is atomic.
The reason this happens after variants succeed (rather than alongside
build-base) is so a partial failure leaves base-latest pointing at
the previous known-good base. External consumers who pin to
base-latest (e.g. the planned pi-devbox repo) never see a broken base.
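A minimal sketch of the promotion, assuming crane is on PATH and already authenticated against Docker Hub. The function name and the DRY_RUN switch are hypothetical additions so the example can be tried without registry access.

```shell
#!/usr/bin/env bash
# Sketch of the promotion step. With DRY_RUN=1 (the default here) the
# command is printed rather than executed.
REPO="joakimp/opencode-devbox"

promote_base_latest() {  # usage: promote_base_latest <base-hash>
  local src="$REPO:base-$1" dst="$REPO:base-latest"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "crane copy $src $dst"
  else
    crane copy "$src" "$dst"
  fi
}

promote_base_latest 0123456789ab
```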
Step 6: update-description
Push the generated DOCKER_HUB.md to the Hub repo's full_description
field via the Hub REST API. Same step as the production pipeline.
NPM_CONFIG_PREFIX gotcha (variant override pattern)
The base sets
ENV NPM_CONFIG_PREFIX=/home/developer/.pi/npm-global
This is intentional — it makes pi install npm:<pkg> and npm install -g
land on the devbox-pi-config named volume at runtime, so user-installed
packages survive container recreate AND image rebuild.
But the variant build inherits this prefix at build time. If left as-is,
npm install -g opencode-ai@$VERSION in Dockerfile.variant would
install opencode into /home/developer/.pi/npm-global/..., which is then
shadowed by the volume mount at runtime → opencode disappears from
PATH on first start.
Fix: each npm install -g in Dockerfile.variant overrides the prefix
per-RUN:
RUN NPM_CONFIG_PREFIX=/usr npm install -g opencode-ai@${OPENCODE_VERSION}
Baked binaries land on /usr/bin/... (system prefix), survive the volume
mount. Runtime-installed user packages still land on
~/.pi/npm-global/.... Both visible on PATH.
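The override relies on ordinary shell semantics: a VAR=value prefix changes the environment of that one command only. The sketch below shows the mechanism with a stand-in command instead of npm, so it runs anywhere.

```shell
#!/usr/bin/env bash
# The base image's setting, reproduced here as an exported variable.
export NPM_CONFIG_PREFIX=/home/developer/.pi/npm-global

# Stand-in for "npm install -g": report the prefix the child process sees.
NPM_CONFIG_PREFIX=/usr sh -c 'echo "with override:    $NPM_CONFIG_PREFIX"'
sh -c 'echo "without override: $NPM_CONFIG_PREFIX"'
```

The first command sees /usr, the second still sees the base image's value, which is why only the RUN lines that carry the prefix install into the system location.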
Cache strategy
Two registry caches are configured:
cache-from: type=registry,ref=joakimp/opencode-devbox:base-buildcache
cache-to: type=registry,ref=joakimp/opencode-devbox:base-buildcache,mode=max
cache-from: type=registry,ref=joakimp/opencode-devbox:base-variant-buildcache
cache-to: type=registry,ref=joakimp/opencode-devbox:base-variant-buildcache,mode=max
mode=max exports cache for all layers, not just the final image's
layers. Important for multi-arch builds where the cross-arch layer reuse
matters more.
Wall-clock estimates
| Scenario | Production pipeline | Split-base pipeline |
|---|---|---|
| Version-bump-only release (only opencode/pi/omos version changed) | ~165–180 min | ~30–40 min (base cache hit) |
| Base-touching release (apt/Node/Debian/entrypoint change) | ~165–180 min | ~70–90 min (base rebuilds) |
The split-base pipeline pays its dues on base-touching releases (which are infrequent — a few times a year for Debian / Node major version bumps). Most releases are version-bumps and ride the cache.
Validate workflow
validate.yml is the lightweight gate that runs
on every push to main and on PRs. It:
- Runs scripts/generate-dockerhub-md.py --check to enforce that DOCKER_HUB.md is in sync with HUB_TEMPLATE.
- Builds each of the four variants amd64-only (no multi-arch, no push) and runs scripts/smoke-test.sh.
This catches regressions before they reach a tag push. Wall clock ~30 min.
Runner expectations
- Image: catthehacker/ubuntu:act-latest. Each job runs inside a fresh container of this image. Don't assume any pre-installed toolchains beyond what catthehacker ships.
- Disk pressure: the runner host has ~40 GB of usable overlay space, often 70%+ used at job start. Every job that does load: true (smoke) starts with a Reclaim runner disk step that strips catthehacker-resident toolchains (Android SDK, .NET, Swift, GHC, JVM, Boost, Chromium, PowerShell) and prunes stale docker state. Don't remove these steps without testing on a fresh runner.
- Concurrency: 2 runners. Jobs in the same workflow run can fan out to both; jobs in different workflow runs are serialized by gitea's queue. The concurrency: { group: ${{ workflow }}-${{ ref }}, cancel-in-progress: false } setting keeps tag pushes from racing each other but allows per-PR/per-branch parallelism.
- Workflow visibility in UI: gitea Actions only surfaces workflows from the default branch in the web UI's workflow list, even for workflow_dispatch triggers. Workflows on feature branches are invisible until merged to main.
- Artifact actions quirk: actions/{upload,download}-artifact@v4+ does not work on Gitea (it depends on a GitHub-only Artifact API). Stick to @v3 if matrix-fanout-with-artifacts is ever needed. We avoided this by using docker/build-push-action@v7 with comma-separated platforms: linux/amd64,linux/arm64, which natively does a multi-arch push in a single job, with no artifact dance.
Migration plan: split-base → production
- Validate the split-base dispatch. Trigger docker-publish-split.yml manually with release_tag=v0.0.0-split-test and promote_latest=false. Confirm all jobs go green, image sizes match the production baseline within ~10%, and no unexpected layer rebuilds appear in build-variant-* logs after the FROM line.
- Run a second dispatch to confirm cache-hit behavior: base-decide should set need_build=false, build-base should be skipped entirely, total wall clock should drop to ~25–40 min.
- Cut over: done as of v1.14.50. docker-publish-split.yml now triggers on push: tags: v*. docker-publish.yml and the original Dockerfile were deleted.
- Tag a release. First production release on the new pipeline.
Related docs
- AGENTS.md: domain facts, release-day checklist, documentation coupling rules. Read first when modifying CI behavior.
- CHANGELOG.md: build pipeline rewrite landed in v1.14.50.
- Dockerfile.base, Dockerfile.variant: the split-base Dockerfiles. Comments at the top of each explain their role.
- scripts/smoke-test.sh: invoked by all three workflows; this is the single source of truth for "what does a built image have to satisfy".
- scripts/generate-dockerhub-md.py: generates DOCKER_HUB.md from HUB_TEMPLATE. --check enforces sync in validate.yml.