6cc2670a93
Captures the escape-hatch procedure used to ship v1.15.12 on 2026-05-28 when buildkit cache-export mode=max started returning HTTP 400 from the Hub CDN, breaking five consecutive CI publishes (runs #332/333/334/336 + a rerun).

- docs/manual-host-publish.sh: the literal script that shipped v1.15.12 from a developer Mac via Orbstack, preserved as-is for future reference.
- docs/manual-host-publish.md: runbook explaining when to reach for it, the four constants to edit, three ways to source BASE_HASH (CI log / Hub probe / local recompute matching base-decide's exact recipe including __pycache__/.DS_Store junk filters), and adaptations for pi-devbox / letter-suffix rebuilds / partial-failure recovery.
- AGENTS.md: new Critical conventions bullet documenting the cache-from/cache-to disablement, failure shape, repo-specificity, why action pinning didn't help, the trade-off, and the re-enable condition. Cross-references CHANGELOG v1.15.12 Unreleased + the new runbook.

# Manual host-side publish: escape hatch when CI is broken

This runbook is the procedure for publishing an opencode-devbox release **directly from a developer host** when the Gitea Actions → Docker Hub path is broken. It was used in anger on 2026-05-28 to ship `v1.15.12` after five consecutive CI publish failures (runs #332/333/334/336 + a rerun), and it doubled as a parallel diagnostic that pinpointed the root cause (buildkit `cache-export mode=max` returning HTTP 400 from the Hub CDN).

The procedure is also a **diagnostic probe**. If the host-side publish succeeds where CI fails, the failure is somewhere in the runner → Hub path (cache-export, runner egress, runner image, action versions). If the host-side publish fails the same way, the failure is in your local buildx + Hub combination and you need a different escape (a different network, a different account, or an upstream bug report).

## When to reach for this

- The tag is pushed, CI keeps failing on `docker buildx build --push`, and the failure shape is stable across reruns.
- The failure body looks like a registry-tier rejection (HTTP 4xx, HTML response body, repeats on every retry), i.e. it is not transient.
- You've already disproved the obvious suspects (action pin, runner image, network) per the [`ci-release-watcher` skill](../../../.agents/skills/ci-release-watcher/SKILL.md) playbook.
- You need the release **shipped today** and don't want to wait for a CI fix to land + re-trigger.

If CI is broken because **a workflow change you just made is bad**, fix the workflow and re-tag with a letter suffix. This runbook is for when the workflow looks correct but the publish path itself is broken.

## Prerequisites on the host

- Docker (or Orbstack on macOS) with `docker buildx` available. Multi-arch publishing needs a `setup-qemu` equivalent: Orbstack ships QEMU emulators for both archs by default; on Linux, install `qemu-user-static` and run `docker run --privileged --rm tonistiigi/binfmt --install all` once per host.
- `docker login` credentials for `joakimp` on Docker Hub (PAT or password). Confirm with `docker info | grep Username`.
- A clone of `opencode-devbox` checked out at the **exact tag** you want to publish, with a clean `git status`. `git describe --tags --exact-match HEAD` should print the tag.
- Network connectivity to `registry-1.docker.io` from the host. Verify with `curl -sI https://registry-1.docker.io/v2/ | head -1`, which should report a `401` status (`HTTP/1.1 401 Unauthorized`, or `HTTP/2 401` over HTTP/2). That's the v2 API saying "auth required", which means you can reach it.

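
The prerequisite checks above can be strung together as a small preflight. This is a minimal sketch and not part of the shipped script: the function names `is_auth_challenge` and `preflight` are hypothetical, and the sketch assumes `docker`, `git`, and `curl` are on `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; not taken from docs/manual-host-publish.sh.

# Succeeds when an HTTP status line carries a 401, whatever the protocol
# version ("HTTP/1.1 401 Unauthorized" and "HTTP/2 401" both match).
is_auth_challenge() {
  case "$1" in
    HTTP/*" 401"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Runs the prerequisite checks in order and reports the first failure.
preflight() {
  local status_line
  docker buildx version >/dev/null 2>&1 \
    || { echo "docker buildx not available" >&2; return 1; }
  git describe --tags --exact-match HEAD >/dev/null 2>&1 \
    || { echo "HEAD is not exactly on a tag" >&2; return 1; }
  [ -z "$(git status --porcelain)" ] \
    || { echo "working tree is not clean" >&2; return 1; }
  status_line="$(curl -sI https://registry-1.docker.io/v2/ | head -1)"
  is_auth_challenge "$status_line" \
    || { echo "unexpected registry response: ${status_line}" >&2; return 1; }
  echo "preflight ok"
}
```

Calling `preflight` from the repository root before editing any constants gives a quick yes/no; the Docker Hub login check is left manual here because the output of `docker info` varies with the credential store in use.
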
## How to use this runbook

A working reference script lives next to this doc: **[`docs/manual-host-publish.sh`](manual-host-publish.sh)**. It is the literal script that shipped opencode-devbox v1.15.12 on 2026-05-28 from a developer Mac via Orbstack, with the BASE_HASH and version pins of that release. To publish a different release, **copy it to a new file, edit four constants at the top, and run it**:

```bash
cp docs/manual-host-publish.sh /tmp/manual-publish-vX.Y.Z.sh
# Edit at top of file:
#   RELEASE_TAG="vX.Y.Z"
#   BASE_HASH="<12-char hash from CI's base-decide step>"
#   PI_VERSION="<from npm registry, see step 2 below>"
#   OMOS_VERSION="<from npm registry, see step 2 below>"
bash /tmp/manual-publish-vX.Y.Z.sh
```

Keep the historical script in `docs/` as-is: it's an archive of the v1.15.12 publish, useful as a reference if a future debug needs to compare exact arg sets across releases. Don't edit it in place.

The sections below explain what the script does and what you need to know to edit those four constants safely.

## 1. Pin RELEASE_TAG

The git tag you're publishing. Must match a tag in the local clone:

```bash
git fetch && git checkout v1.15.13   # whatever you're publishing
git describe --tags --exact-match HEAD
```

The script asserts `HEAD == ${RELEASE_TAG}^{commit}` before doing anything destructive. If you've drifted, fix it with `git checkout` before running.

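
For readers who want to see what such an assertion can look like, here is a minimal sketch. It is an illustration only: the helper name `assert_head_is_tag` is hypothetical, and the check in the shipped script may be written differently.

```bash
# Hypothetical helper; the shipped script's actual check may differ.
# Succeeds only when HEAD resolves to the same commit as the given tag.
assert_head_is_tag() {
  local tag head_sha tag_sha
  tag="$1"
  head_sha="$(git rev-parse HEAD)" || return 1
  # "^{commit}" peels an annotated tag down to the commit it points at.
  tag_sha="$(git rev-parse --verify --quiet "${tag}^{commit}")" || {
    echo "tag ${tag} not found in this clone" >&2
    return 1
  }
  if [ "$head_sha" != "$tag_sha" ]; then
    echo "HEAD (${head_sha}) is not ${tag} (${tag_sha})" >&2
    return 1
  fi
}
```

Comparing commit SHAs rather than tag names keeps the check correct for both lightweight and annotated tags.
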
## 2. Pin PI_VERSION and OMOS_VERSION

Gitea CI's `resolve-versions` job queries the npm registry at workflow time and threads concrete versions through every variant build, mitigating the silent same-bytes-across-releases regression class documented in `AGENTS.md`. Do the same by hand:

```bash
curl -sf https://registry.npmjs.org/@earendil-works%2Fpi-coding-agent/latest | jq -r .version
curl -sf https://registry.npmjs.org/oh-my-opencode-slim/latest | jq -r .version
```

Paste the two version strings into the script's `PI_VERSION` / `OMOS_VERSION` constants. Don't leave the script defaulting to `latest`: the registry buildcache will silently reuse a stale layer if the build-arg is byte-equal to the one used in a previous build.

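
A small guard can catch an accidental `latest` (or an empty or `null` result from a failed lookup) before it reaches a build-arg. The sketch below is an assumption-laden illustration: `is_concrete_version` is a hypothetical name, and the pattern accepts plain `MAJOR.MINOR.PATCH` versions with an optional pre-release or build suffix.

```bash
# Hypothetical guard: accept only concrete versions, never a dist-tag.
is_concrete_version() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+([-+][0-9A-Za-z.+-]+)?$'
}

# Usage sketch (network access and jq assumed):
#   PI_VERSION="$(curl -sf https://registry.npmjs.org/@earendil-works%2Fpi-coding-agent/latest | jq -r .version)"
#   is_concrete_version "$PI_VERSION" \
#     || { echo "refusing to build with PI_VERSION=${PI_VERSION}" >&2; exit 1; }
```
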
## 3. Pin BASE_HASH

This is the 12-char hash that CI's `base-decide` job computes from `Dockerfile.base` + `rootfs/**` + `entrypoint*.sh`. There are three ways to get it, in order of preference:

**A. From a prior CI run on the same commit** (cheapest: if the Gitea Actions run that triggered on this tag got far enough to log `base-decide`'s output, just read it):

```
Gitea Actions → the run for vX.Y.Z → base-decide job → "Compute base tag" step → last line:
Computed base tag: base-XXXXXXXXXXXX
```

This is the canonical source. The whole reason for the manual escape is that *something later in CI broke*; `base-decide` itself is fast, deterministic, and almost always succeeds.

**B. From an existing image on the Hub**: if a recent release already published a `base-<hash>` tag and the inputs haven't changed, you can copy that hash. Confirm with `docker manifest inspect joakimp/opencode-devbox:base-latest` and read the digest; if it matches a `base-<hash>` you already see on the Hub, that hash is yours.

**C. Compute it locally**, replicating CI's exact recipe (the script in `.gitea/workflows/docker-publish-split.yml` `base-decide.compute`):

```bash
# Run from the repository root. The byte stream must match CI's exactly:
# Dockerfile.base, then every rootfs file in sorted order, then the entrypoints.
{
  cat Dockerfile.base
  find rootfs -type f \
    ! -path '*/__pycache__/*' \
    ! -name '*.pyc' \
    ! -name '.DS_Store' \
    ! -name '._*' \
    -print0 2>/dev/null | sort -z | xargs -0 cat 2>/dev/null
  cat entrypoint.sh entrypoint-user.sh
} | sha256sum | cut -c1-12
```

The junk-file filters (`__pycache__`, `*.pyc`, `.DS_Store`, `._*` AppleDouble) matter: these files are gitignored, but `find -type f` picks them up locally and they would make your hash diverge from the one computed on CI's clean checkout. Don't skip them.

If method C disagrees with method A, **trust A** and find out why your local tree differs. The hash in CI is what's on the Hub; that's what the variants must build `FROM`.

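
Whichever method you use, it is cheap to sanity-check the value before pasting it into the script. The helpers below are a sketch under stated assumptions: all three function names are hypothetical, the log-line format is the one quoted in method A, and probing with `docker buildx imagetools inspect` is one possible way to test for a tag on the Hub, not necessarily the one the shipped script uses.

```bash
# Hypothetical helpers; names are illustrative.

# Pulls the 12-char hash out of a "Computed base tag: base-<hash>" log line.
# Prints nothing when the line does not match.
extract_base_hash() {
  printf '%s\n' "$1" \
    | sed -n 's/.*Computed base tag: base-\([0-9a-f]\{12\}\)[[:space:]]*$/\1/p'
}

# Succeeds when the value is exactly 12 lowercase hex characters.
is_base_hash() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{12}$'
}

# Succeeds when the base tag already exists on the Hub (needs network + docker).
base_exists_on_hub() {
  docker buildx imagetools inspect "joakimp/opencode-devbox:base-$1" >/dev/null 2>&1
}
```

A value that fails `is_base_hash` is almost certainly a copy-paste slip (a stray `base-` prefix, a truncated hash) and is worth catching before a multi-arch build starts.
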
## What the script does (high level)

After the constants are set, the script runs a 5-step procedure. No editing is needed inside the body; the whole flow is parameterised by the four constants above plus `IMAGE` (which is fixed to `joakimp/opencode-devbox`).

1. **Preflight**: buildx present, tag exists, `HEAD == tag`, multi-arch builder created if missing.
2. **Base build (conditional)**: probe `${IMAGE}:base-${BASE_HASH}` on the Hub; if missing, build it multi-arch and push. **No `--cache-from` / `--cache-to`.** That's the whole point of this escape. If the base push itself fails the same way CI did, stop: the regression has spread to image push and you need a different host or account, not this runbook.
3. **Promote `base-latest`**: `docker buildx imagetools create` re-tags by manifest reference. No rebuild.
4. **Variants × 4**: sequential, not parallel (one host's egress can't safely sustain four concurrent multi-arch pushes). Each variant is `Dockerfile.variant` `FROM ${IMAGE}:base-${BASE_HASH}` plus the appropriate `INSTALL_OMOS` / `INSTALL_PI` build-args, tagged `${RELEASE_TAG}${suffix}` and `latest${suffix}`.
5. **Verify**: prints the digest of all 10 expected tags (8 variant + base-hash + base-latest). Spot-check that each `vX.Y.Z*` and its `latest*` alias share a digest.

Expected wall time on a recent Mac: ~25-40 min (base ~3 min if rebuilt, each variant ~3-7 min, mostly QEMU arm64 emulation).

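
To make the "10 expected tags" arithmetic in step 5 concrete, the sketch below enumerates them. It is illustrative only: `expected_tags` is a hypothetical function, and the suffix list is an assumption guessed from the CI job names (`validate-base`, `validate-omos`, `validate-with-pi`, `validate-omos-with-pi`), so check it against your copy of the script before relying on it.

```bash
# Hypothetical sketch of the tag set checked in step 5.
# SUFFIXES is an assumption; "_" stands in for the empty suffix of the plain variant.
SUFFIXES="_ -omos -with-pi -omos-with-pi"

# Prints one tag per line: 2 base tags + 2 tags for each of the 4 variants.
expected_tags() {
  local release_tag base_hash s
  release_tag="$1"
  base_hash="$2"
  echo "base-${base_hash}"
  echo "base-latest"
  for s in $SUFFIXES; do
    if [ "$s" = "_" ]; then s=""; fi
    echo "${release_tag}${s}"
    echo "latest${s}"
  done
}

# Usage sketch (needs network + docker): print the top-level digest of each tag.
#   for t in $(expected_tags "$RELEASE_TAG" "$BASE_HASH"); do
#     printf '%s ' "$t"
#     docker buildx imagetools inspect "joakimp/opencode-devbox:${t}" | grep -m1 '^Digest:'
#   done
```

In the printed list, each `vX.Y.Z*` line should carry the same digest as the `latest*` line that follows it.
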
## Optional: update DOCKER_HUB.md description

CI's `update-description` job posts the rendered Hub description via the Hub API. The manual script does **not** do this; the release works fine without it. If you want parity, copy the curl invocation from the `update-description` job in `.gitea/workflows/docker-publish-split.yml` and run it from the host with a Hub PAT loaded into `HUB_PAT`. This is cosmetic and can wait until CI is healthy and the next release pushes a fresh description automatically.

## After: capture diagnostic value

The whole point of running this manually is the diagnostic. Record three things before moving on:

1. **Did the host publish succeed?** If yes and CI was failing on the exact same code, you've localised the failure to the runner side (cache-export, network, runner image). If no, the failure is in your local buildx + Hub combination and CI is a victim, not a cause.
2. **What was different from CI?** Document at minimum: `docker buildx version`, the host's `docker buildx ls` output (driver name + version), whether you used `--cache-to` or not, and which network you were on.
3. **File the upstream issue.** If the diagnostic narrowed the failure to a specific buildkit/buildx behaviour, file at `moby/buildkit` or `docker/buildx` with: stable failure shape, the exact request URL fragment (`Offset:0` / `_state=...` / digest if visible), the timeline boundary when failures started, and what worked vs what failed in your repro. The 2026-05-28 cache-export-mode=max regression is a worked example.

Restore CI as the primary publish path as soon as the underlying regression is fixed or worked around at workflow level. This runbook should be exercised rarely.

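
The environment facts from item 2 are easy to lose once the release is out, so it can help to dump them to a file straight after the publish. A minimal sketch, assuming nothing beyond `docker` possibly being on `PATH`: the helper name `capture_diagnostics`, its arguments, and the file layout are all hypothetical.

```bash
# Hypothetical helper: writes the environment facts from item 2 to a file.
# Arguments: output path, whether --cache-to was used, a free-text network label.
capture_diagnostics() {
  local out used_cache_to network
  out="$1"
  used_cache_to="$2"
  network="$3"
  {
    echo "captured_at: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "used_cache_to: ${used_cache_to}"
    echo "network: ${network}"
    echo "--- docker buildx version"
    docker buildx version 2>&1 || echo "(docker buildx unavailable)"
    echo "--- docker buildx ls"
    docker buildx ls 2>&1 || echo "(docker buildx unavailable)"
  } > "$out"
}

# Usage sketch:
#   capture_diagnostics "/tmp/publish-diag-${RELEASE_TAG}.txt" no "office wired"
```

Attaching the resulting file to the upstream report gives maintainers the builder driver and version details without a round trip.
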
## Variants of this runbook

- **pi-devbox**: the same idea, but simpler. There is only one image (`joakimp/pi-devbox`), one tag pair (`vX.Y.Z` + `latest`), and no split base. Adapt the script: drop the `BASE_HASH` constant + steps 2-3 + the variant function; replace them with a single `docker buildx build --file Dockerfile --build-arg PI_VERSION=... --tag joakimp/pi-devbox:${RELEASE_TAG} --tag joakimp/pi-devbox:latest --push .`.
- **opencode-devbox letter-suffix rebuild** (e.g. `v1.15.12b`): same procedure end-to-end. The `BASE_HASH` will probably be unchanged from the prior release if no rootfs/entrypoint/Dockerfile.base changes shipped, so the base-build step skips itself automatically via the Hub probe.
- **Single-variant publish** for partial-failure recovery (e.g. CI succeeded for base + 3 variants but the 4th failed): comment out the three completed `build_variant` calls in your copy of the script. Keep `imagetools create` for `base-latest` only if CI didn't already promote it. Then re-run.