opencode-devbox/.gitea
joakimp 07e07ec611
Bump opencode 1.14.44 -> 1.14.50; cut over to split-base pipeline
- Bump OPENCODE_VERSION 1.14.44 -> 1.14.50 in Dockerfile.variant
- Cut over: docker-publish-split.yml now triggers on push: tags: v*
  (was workflow_dispatch only). RELEASE_TAG and PROMOTE_LATEST derived
  from github.ref_type/ref_name for tag-push; inputs still available
  for manual workflow_dispatch runs.
- Delete docker-publish.yml (retired, replaced by split-base pipeline)
- Delete Dockerfile (retired, replaced by Dockerfile.base + Dockerfile.variant)
- Update CHANGELOG: promote Unreleased -> v1.14.50
- Update AGENTS.md, .gitea/README.md, validate.yml: remove all references
  to the old single-Dockerfile pipeline and WIP migration plan
2026-05-14 19:39:45 +02:00

CI / Build Pipeline

This directory contains the gitea Actions workflows and the supporting documentation for opencode-devbox's CI. If you're investigating why the build pipeline is shaped the way it is, you're in the right place.

Workflows in this directory

| File | Trigger | Role |
| --- | --- | --- |
| workflows/docker-publish-split.yml | push: tags: v* | Production release pipeline. Two-phase split-base build: shared base-<hash> published once (skipped on cache hit), then four parallel variant deltas. ~40–80 min wall clock depending on runner count and whether base needs rebuilding. |
| workflows/validate.yml | push: branches: main + PR | Lightweight gate. amd64-only smoke test of all four variants + DOCKER_HUB.md sync check. ~30 min. Fires on every push to main. |

Why the split-base pipeline exists

opencode-devbox publishes four image variants (base, omos, with-pi, omos-with-pi) × two architectures (amd64, arm64) = eight image tags per release. CI currently runs on two self-hosted gitea Actions runners. arm64 builds are emulated under QEMU, which is the dominant cost (~3–5x slower than native).

The four variants share ~95% of their layers (Debian + apt + Node + AWS CLI + mempalace + dev tools + entrypoints). The original Dockerfile was a single multi-stage build with INSTALL_* build-args gating variant-specific RUNs. BuildKit's per-layer cache key is content-addressed, but as soon as a build-arg-gated RUN produces a different layer hash for variant A vs variant B, every subsequent layer also has a different parent → identical commands re-execute per variant. Result: minimal cross-variant cache reuse on a fresh build.
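The cascade described above can be illustrated with a toy model. This is a sketch only, not BuildKit's real cache-key algorithm: it assumes each layer key is a hash of the parent key plus the instruction text, and the instruction strings are hypothetical placeholders.

```shell
# Toy model, NOT BuildKit's real algorithm: each layer key is
# sha256(parent key + instruction text), truncated for readability.
layer_chain() {
  parent=""
  for instruction in "$@"; do
    parent=$(printf '%s\n%s' "$parent" "$instruction" | sha256sum | cut -c1-12)
    printf '%s  %s\n' "$parent" "$instruction"
  done
}

echo "variant A (INSTALL_OMOS=false):"
layer_chain "FROM debian" "RUN setup INSTALL_OMOS=false" "RUN install-dev-tools"

echo "variant B (INSTALL_OMOS=true):"
layer_chain "FROM debian" "RUN setup INSTALL_OMOS=true" "RUN install-dev-tools"
```

Both variants print the same key for the first layer, but the keys diverge at the build-arg-gated step and stay different for the identical final instruction, which is why that step would re-execute per variant.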

Two improvements were considered:

  1. Reorder the original Dockerfile so all variant-gated RUNs land at the bottom: modest gain, ~10–20% wall-clock reduction. Not pursued.
  2. Split into Dockerfile.base + Dockerfile.variant with the base published as a long-lived shared image: significant gain, ~50–70% wall-clock reduction with hash-driven cache reuse. Pursued.

The split-base architecture is what the docker-publish-split.yml workflow exercises.

How the split-base pipeline works

                       ┌──────────────────┐
                       │  base-decide     │   compute base-<hash>;
                       │                  │   probe Docker Hub.
                       │  hash inputs:    │
                       │   Dockerfile.base│
                       │   rootfs/        │
                       │   entrypoint*.sh │
                       └────────┬─────────┘
                                │
                  ┌─────────────┴─────────────┐
                  │ need_build = true?        │
                  └─────────────┬─────────────┘
                       yes      │       no
                                ▼
                       ┌──────────────────┐
                       │  build-base      │   multi-arch build,
                       │                  │   push base-<hash>
                       └────────┬─────────┘   to Docker Hub.
                                │
        ┌───────────────────────┼───────────────────────┐
        ▼                       ▼                       ▼
   ┌──────────┐            ┌──────────┐         ┌──────────────┐
   │smoke-base│            │smoke-omos│   ...   │smoke-omos-pi │   amd64 only,
   └────┬─────┘            └────┬─────┘         └──────┬───────┘   parallel.
        │                       │                      │
        ▼                       ▼                      ▼
   ┌──────────┐            ┌──────────┐         ┌──────────────┐
   │build-    │            │build-    │         │build-        │   multi-arch,
   │variant-  │            │variant-  │   ...   │variant-      │   parallel,
   │base      │            │omos      │         │omos-with-pi  │   tag push.
   └────┬─────┘            └────┬─────┘         └──────┬───────┘
        └───────────────────────┴──────────────────────┘
                                │
                                ▼
                  ┌──────────────────────────┐
                  │  promote-base-latest     │   crane copy
                  │                          │   base-<hash>
                  │                          │   → base-latest
                  └────────┬─────────────────┘
                           │
                           ▼
                  ┌──────────────────────────┐
                  │  update-description      │
                  └──────────────────────────┘

Step 1: base-decide

Compute a SHA-256 hash over the inputs that determine the base image's content:

{
  cat Dockerfile.base
  find rootfs -type f -print0 | sort -z | xargs -0 cat
  cat entrypoint.sh entrypoint-user.sh
} | sha256sum | cut -c1-12

The 12-character truncated hash becomes base-<hash>. Probe Docker Hub for this tag via docker manifest inspect:

  • If it exists → set need_build=false. build-base is skipped entirely.
  • If it doesn't → set need_build=true. build-base runs.
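The probe logic above might look roughly like the following. This is a minimal sketch: the helper name decide_need_build is hypothetical, and the actual workflow step may be structured differently.

```shell
# Hypothetical helper (name is an assumption): prints need_build=true|false
# depending on whether base-<hash> already exists on Docker Hub.
decide_need_build() {
  image="joakimp/opencode-devbox:base-$1"
  # `docker manifest inspect` exits non-zero when the tag cannot be found.
  if docker manifest inspect "$image" >/dev/null 2>&1; then
    echo "need_build=false"
  else
    echo "need_build=true"
  fi
}
```

In a workflow step the result would typically be appended to the step output file, for example `decide_need_build "$hash" >> "$GITHUB_OUTPUT"`; that wiring is an assumption about how the job exports the value. Note that any probe failure (including a network or auth error) is treated as "tag missing" in this sketch, which errs on the side of rebuilding.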

This is the core cache-reuse mechanism. Version-bump-only releases (only Dockerfile.variant or build-args changed) hit the cache. Releases that change anything in the base — apt packages, AWS CLI, Node version, locale list, entrypoint scripts — pay the full base-build cost once.
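To see why a version-bump-only release keeps the same base tag, the hash pipeline from above can be run against a throwaway directory. The file contents here are made-up placeholders and base_hash is a hypothetical wrapper name; only the pipeline inside it comes from this document.

```shell
# Hypothetical wrapper around the hash pipeline shown earlier; run it from
# a directory containing Dockerfile.base, rootfs/ and the entrypoint scripts.
base_hash() {
  {
    cat Dockerfile.base
    find rootfs -type f -print0 | sort -z | xargs -0 cat
    cat entrypoint.sh entrypoint-user.sh
  } | sha256sum | cut -c1-12
}

demo=$(mktemp -d)
(
  cd "$demo" || exit 1
  mkdir rootfs
  echo 'FROM debian' > Dockerfile.base            # placeholder content
  echo 'example' > rootfs/example.conf
  echo '#!/bin/sh' > entrypoint.sh
  echo '#!/bin/sh' > entrypoint-user.sh
  echo 'ARG OPENCODE_VERSION=1.14.44' > Dockerfile.variant

  before=$(base_hash)
  echo 'ARG OPENCODE_VERSION=1.14.50' > Dockerfile.variant  # variant-only change
  after_variant=$(base_hash)
  echo 'RUN apt-get install -y jq' >> Dockerfile.base       # base change
  after_base=$(base_hash)

  if [ "$before" = "$after_variant" ]; then echo "variant-only bump: same base hash (cache hit)"; fi
  if [ "$before" != "$after_base" ]; then echo "base change: new base hash (rebuild)"; fi
)
rm -rf "$demo"
```

Dockerfile.variant is not among the hash inputs, so editing it leaves base-<hash> unchanged, while any edit to Dockerfile.base, rootfs/ or the entrypoints produces a new tag.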

Step 2: build-base (conditional)

Only runs when need_build=true. Multi-arch (amd64 + arm64) build of Dockerfile.base, pushed to joakimp/opencode-devbox:base-<hash>. Registry cache via --cache-from/--cache-to reduces incremental rebuilds when only one or two layers changed.

The base image is not tagged base-latest here — that promotion happens at the very end after all variants succeed (see step 5).

Step 3: smoke-* (×4, parallel)

For each variant: build amd64-only against the base tag, load into local docker, run scripts/smoke-test.sh. Variant build-args:

| variant | INSTALL_OPENCODE | INSTALL_OMOS | INSTALL_PI |
| --- | --- | --- | --- |
| base | true | false | false |
| omos | true | true | false |
| with-pi | true | false | true |
| omos-with-pi | true | true | true |

The smoke script runs with --variant <name> to enable variant-specific assertions. Smoke gates the publish: a smoke failure for variant X blocks build-variant-X.
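The variant-to-build-arg mapping can be expressed as a small lookup. This is only a sketch: the function name variant_build_args is hypothetical, and the real workflow may pass the arguments through the build action's inputs instead of a command line.

```shell
# Hypothetical helper: prints the --build-arg flags for one variant,
# following the build-arg mapping listed above.
variant_build_args() {
  case "$1" in
    base)         omos=false; pi=false ;;
    omos)         omos=true;  pi=false ;;
    with-pi)      omos=false; pi=true  ;;
    omos-with-pi) omos=true;  pi=true  ;;
    *) echo "unknown variant: $1" >&2; return 1 ;;
  esac
  printf '%s\n' "--build-arg INSTALL_OPENCODE=true --build-arg INSTALL_OMOS=$omos --build-arg INSTALL_PI=$pi"
}
```

For example, `variant_build_args with-pi` prints the flags with INSTALL_OMOS=false and INSTALL_PI=true, and an unknown variant name fails instead of silently building the wrong image.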

Step 4: build-variant-* (×4, parallel)

For each variant that passed smoke: multi-arch (amd64 + arm64) build of Dockerfile.variant, pushed to Docker Hub with the user-facing release tags:

| Build job | Tags pushed |
| --- | --- |
| build-variant-base | vX.Y.Z, latest |
| build-variant-omos | vX.Y.Z-omos, latest-omos |
| build-variant-with-pi | vX.Y.Z-with-pi, latest-with-pi |
| build-variant-omos-with-pi | vX.Y.Z-omos-with-pi, latest-omos-with-pi |

The latest* aliases are only updated when promote_latest=true. On a tag push the value is derived from the ref; on a manual workflow_dispatch run it is an input, so test runs can set promote_latest=false and keep the production aliases pointing at the previous good release.
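The tag scheme can be sketched as a function of the variant, the release tag, and promote_latest. The name release_tags is hypothetical, and how the workflow actually assembles its tag list is an assumption.

```shell
# Hypothetical helper: prints one image reference per line for a variant.
# Args: <variant> <release_tag> <promote_latest: true|false>
release_tags() {
  variant=$1
  release_tag=$2
  promote_latest=$3
  repo="joakimp/opencode-devbox"
  # The base variant has no suffix; every other variant appends "-<variant>".
  if [ "$variant" = "base" ]; then suffix=""; else suffix="-$variant"; fi
  echo "${repo}:${release_tag}${suffix}"
  if [ "$promote_latest" = "true" ]; then
    echo "${repo}:latest${suffix}"
  fi
}
```

With promote_latest=false only the versioned tag is printed, which matches the test-run behavior of leaving the latest* aliases untouched.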

Step 5: promote-base-latest

Once all four variants successfully publish, re-tag base-<hash> as base-latest using crane copy. This is a manifest-level re-tag, not a rebuild — it touches only Docker Hub's image index, takes seconds, and is atomic.

The reason this happens after variants succeed (rather than alongside build-base) is so a partial failure leaves base-latest pointing at the previous known-good base. External consumers who pin to base-latest (e.g. the planned pi-devbox repo) never see a broken base.
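A rough sketch of the promotion guard follows. It assumes the four variant job results are passed in as plain strings; in the real workflow this ordering is presumably enforced through job dependencies, and the function name is hypothetical.

```shell
# Hypothetical helper: re-tag base-<hash> as base-latest only if every
# variant result passed in is "success".
# Args: <base_hash> <result>...
promote_base_latest() {
  base_hash=$1
  shift
  for result in "$@"; do
    if [ "$result" != "success" ]; then
      echo "skipping promotion: a variant build reported '$result'" >&2
      return 1
    fi
  done
  # Registry-side copy of the existing manifest; nothing is rebuilt.
  crane copy "joakimp/opencode-devbox:base-$base_hash" "joakimp/opencode-devbox:base-latest"
}
```

If any variant result is not "success" the function returns early, so base-latest keeps pointing at the previous known-good base.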

Step 6: update-description

Push the generated DOCKER_HUB.md to the Hub repo's full_description field via the Hub REST API. This is the same step the retired docker-publish.yml pipeline used.

NPM_CONFIG_PREFIX gotcha (variant override pattern)

The base sets

ENV NPM_CONFIG_PREFIX=/home/developer/.pi/npm-global

This is intentional — it makes pi install npm:<pkg> and npm install -g land on the devbox-pi-config named volume at runtime, so user-installed packages survive container recreate AND image rebuild.

But the variant build inherits this prefix at build time. If left as-is, npm install -g opencode-ai@$VERSION in Dockerfile.variant would install opencode into /home/developer/.pi/npm-global/..., which is then shadowed by the volume mount at runtime → opencode disappears from PATH on first start.

Fix: each npm install -g in Dockerfile.variant overrides the prefix per-RUN:

RUN NPM_CONFIG_PREFIX=/usr npm install -g opencode-ai@${OPENCODE_VERSION}

Baked binaries land in /usr/bin/... (the system prefix) and survive the volume mount. Runtime-installed user packages still land in ~/.pi/npm-global/.... Both are visible on PATH.
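The per-RUN override relies on ordinary shell semantics: a VAR=value prefix applies to that one command only. The snippet below demonstrates this without npm, using a child sh process as a stand-in for npm install -g.

```shell
# Stand-in demo: the exported value plays the role of the base image's ENV,
# and `sh -c` plays the role of `npm install -g`.
export NPM_CONFIG_PREFIX=/home/developer/.pi/npm-global

echo "inherited:  $(sh -c 'printf "%s" "$NPM_CONFIG_PREFIX"')"
echo "overridden: $(NPM_CONFIG_PREFIX=/usr sh -c 'printf "%s" "$NPM_CONFIG_PREFIX"')"
echo "afterwards: $(sh -c 'printf "%s" "$NPM_CONFIG_PREFIX"')"
```

Only the middle command sees /usr; the surrounding environment keeps the volume-backed prefix, which is why runtime installs are unaffected by the build-time override.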

Cache strategy

Two registry caches are configured:

cache-from: type=registry,ref=joakimp/opencode-devbox:base-buildcache
cache-to:   type=registry,ref=joakimp/opencode-devbox:base-buildcache,mode=max

cache-from: type=registry,ref=joakimp/opencode-devbox:base-variant-buildcache
cache-to:   type=registry,ref=joakimp/opencode-devbox:base-variant-buildcache,mode=max

mode=max exports cache for all layers, not just the final image's layers. Important for multi-arch builds where the cross-arch layer reuse matters more.
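For readers more familiar with the CLI than with the build action's inputs, the base cache settings correspond roughly to the docker buildx invocation below. This is an assumed equivalent for illustration; the workflow itself uses docker/build-push-action, and the helper only prints the command rather than running it.

```shell
# Hypothetical helper: prints (does not run) a buildx command that mirrors
# the base cache-from/cache-to settings shown above.
# Args: <base_hash>
base_build_command() {
  base_hash=$1
  repo="joakimp/opencode-devbox"
  echo docker buildx build \
    --platform linux/amd64,linux/arm64 \
    --file Dockerfile.base \
    --cache-from "type=registry,ref=$repo:base-buildcache" \
    --cache-to "type=registry,ref=$repo:base-buildcache,mode=max" \
    --tag "$repo:base-$base_hash" \
    --push .
}
```

Printing the command first makes it easy to review the cache refs before anything is pushed.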

Wall-clock estimates

| Scenario | Retired single-Dockerfile pipeline | Split-base pipeline |
| --- | --- | --- |
| Version-bump-only release (only opencode/pi/omos version changed) | ~165–180 min | ~30–40 min (base cache hit) |
| Base-touching release (apt/Node/Debian/entrypoint change) | ~165–180 min | ~70–90 min (base rebuilds) |

The split-base pipeline pays its dues on base-touching releases (which are infrequent — a few times a year for Debian / Node major version bumps). Most releases are version-bumps and ride the cache.

Validate workflow

validate.yml is the lightweight gate that runs on every push to main and on PRs. It:

  1. Runs scripts/generate-dockerhub-md.py --check to enforce DOCKER_HUB.md is in sync with HUB_TEMPLATE.
  2. Builds each of the four variants amd64-only (no multi-arch, no push) and runs scripts/smoke-test.sh.

This catches regressions before they reach a tag push. Wall clock ~30 min.

Runner expectations

  • Image: catthehacker/ubuntu:act-latest. Each job runs inside a fresh container of this image. Don't assume any pre-installed toolchains beyond what catthehacker ships.
  • Disk pressure: the runner host has ~40 GB of usable overlay space, often 70%+ used at job start. Every job that does load: true (smoke) starts with a Reclaim runner disk step that strips catthehacker-resident toolchains (Android SDK, .NET, Swift, GHC, JVM, Boost, Chromium, PowerShell) and prunes stale docker state. Don't remove these steps without testing on a fresh runner.
  • Concurrency: 2 runners. Jobs in the same workflow run can fan out to both; jobs in different workflow runs are serialized by gitea's queue. The concurrency: { group: ${{ workflow }}-${{ ref }}, cancel-in-progress: false } setting keeps tag pushes from racing each other but allows per-PR/per-branch parallelism.
  • Workflow visibility in UI: gitea Actions only surfaces workflows from the default branch in the web UI's workflow list, even for workflow_dispatch triggers. Workflows on feature branches are invisible until merged to main.
  • Artifact actions quirk: actions/{upload,download}-artifact@v4+ does not work on Gitea (it depends on a GitHub-only Artifact API). Stick to @v3 if matrix-fanout-with-artifacts is ever needed. We avoided this by using docker/build-push-action@v7 with comma-separated platforms: linux/amd64,linux/arm64, which natively does a multi-arch push in a single job with no artifact dance.

Migration plan: split-base → production

  1. Validate the split-base dispatch. Trigger docker-publish-split.yml manually with release_tag=v0.0.0-split-test and promote_latest=false. Confirm all jobs go green, image sizes match the production baseline within ~10%, and no unexpected layer rebuilds appear in build-variant-* logs after the FROM line.
  2. Run a second dispatch to confirm cache-hit behavior: base-decide should set need_build=false, build-base should be skipped entirely, total wall clock should drop to ~25–40 min.
  3. Cut over: done as of v1.14.50. docker-publish-split.yml now triggers on push: tags: v*. docker-publish.yml and the original Dockerfile are deleted.
  4. Tag a release. First production release on the new pipeline.

See also

  • AGENTS.md: domain facts, release-day checklist, documentation coupling rules. Read first when modifying CI behavior.
  • CHANGELOG.md: build pipeline rewrite landed in v1.14.50.
  • Dockerfile.base, Dockerfile.variant: the split-base Dockerfiles. Comments at the top of each explain their role.
  • scripts/smoke-test.sh: invoked by both workflows; this is the single source of truth for "what does a built image have to satisfy".
  • scripts/generate-dockerhub-md.py: generates DOCKER_HUB.md from HUB_TEMPLATE. --check enforces sync in validate.yml.