4de0bc9993
Gitea Actions runners accumulate buildkit cache, stale containers, and unused images. Without periodic cleanup the disk fills and builds stall during image push (observed: build-omos hung at 'pushing layers' for 1.5h on a 77%-full disk). Add a 'CI runner maintenance' section to deploy/README.md with two cleanup layers: a daily cron job (prunes anything >72h old) and Docker daemon builder GC (caps buildkit cache at 10 GB).
319 lines
14 KiB
Markdown
# Deploy: Host VM setup

Scripts for setting up a fresh Linux VM to host opencode-devbox.

## Files

- **`cloud-init.yml`**: cloud-init user-data template for automated VM provisioning on OpenStack, Proxmox, or any cloud with cloud-init support
- **`setup-host.sh`**: interactive post-install script for VMs that weren't provisioned with cloud-init
- **`setup-openstack-secgroup.sh`**: creates an OpenStack security group with the right rules (SSH, mosh, ICMP)
- **`sync-to-vm.sh`**: syncs local config directories (`~/.aws`, `~/.config/opencode`, etc.) to a remote VM based on which bind mounts are active in its `docker-compose.yml`

## Supported distributions

- **Debian 13 (Trixie)**: recommended (matches opencode-devbox base image)
- **Ubuntu 24.04 LTS**: also works

Other distributions will need manual adaptation.

## Quick start

### Option 1: Cloud-init (automated)

Customize `cloud-init.yml`: replace the SSH public key and optionally the hostname/timezone. Then use it during VM creation:

- **Proxmox**: attach as cloud-init user-data
- **OpenStack**: pass via `--user-data` flag (see full example below)
- **AWS/DigitalOcean/etc**: paste into the "user data" field

#### Full OpenStack example

Cloud-init only handles guest configuration; flavor, image, network, and security group must be specified explicitly at creation time.

> **Note:** Do not use `--key-name`. The SSH key is configured in `cloud-init.yml` under `ssh_authorized_keys` for the `devbox` user. The `--key-name` flag injects into the image's default user (e.g. `debian`), not the `devbox` user created by cloud-init.

```bash
# List available flavors to choose appropriate sizing
openstack flavor list

# Create the security group first (one-time, see below)
./setup-openstack-secgroup.sh

# Basic: boot from default storage
openstack server create \
  --flavor c4m8 \
  --image Debian-13-Trixie \
  --network my-network \
  --security-group opencode-devbox \
  --user-data cloud-init.yml \
  devbox-vm
```

If your cloud offers NVMe-backed (performance) volumes, boot from one for faster Docker and build I/O:

```bash
# Performance: boot from NVMe volume (40 GB, preserved on instance deletion)
openstack server create \
  --flavor c4m8 \
  --network my-network \
  --security-group opencode-devbox \
  --user-data cloud-init.yml \
  --block-device source_type=image,uuid=$(openstack image show Debian-13-Trixie -f value -c id),destination_type=volume,volume_size=40,delete_on_termination=false,boot_index=0,volume_type=performance \
  devbox-vm
```

> **Note:** The inline `volume_type` parameter requires API microversion 2.67+. If the server goes to ERROR state, check your volume quota (`openstack quota show`) and try creating the volume separately:
> ```bash
> openstack volume create --image Debian-13-Trixie --size 40 --type performance --bootable devbox-boot-volume
> openstack server create --flavor c4m8 --volume devbox-boot-volume --network my-network --security-group opencode-devbox --user-data cloud-init.yml devbox-vm
> ```

#### Floating IP

OpenStack doesn't support assigning a floating IP at instance creation time; it's a separate step after the VM is active:

```bash
# Allocate a new floating IP from the external network
openstack floating ip create <external-network>

# Assign it to the VM
openstack server add floating ip devbox-vm <floating-ip>
```

To find your external network name: `openstack network list --external`. If you already have an unassigned floating IP, skip the create step.

The VM boots with Docker installed, firewall configured (or skipped on OpenStack), and your SSH key authorized. Log in as the `devbox` user.

### Console password (optional)

The cloud-init template uses SSH key authentication only; no password is set by default. This is sufficient for normal use since the `devbox` user has passwordless `sudo`.

A password is only needed for:

- **Emergency console access**: logging in via OpenStack Horizon console (noVNC) or Proxmox VNC when SSH is unreachable
- **`su - devbox`**: switching to the devbox user from another account

To enable console access, uncomment the `chpasswd` block in `cloud-init.yml` before deploying:

```yaml
chpasswd:
  expire: false
  users:
    - name: devbox
      password: your-password-here
      type: text
```

For an already-running VM, set a password via SSH:

```bash
sudo passwd devbox
```

### Option 2: Post-install script (manual)

On a fresh Debian/Ubuntu VM:

```bash
curl -fsSL https://gitea.jordbo.se/joakimp/opencode-devbox/raw/branch/main/deploy/setup-host.sh | bash
```

Or clone and run:

```bash
git clone https://gitea.jordbo.se/joakimp/opencode-devbox
cd opencode-devbox/deploy
./setup-host.sh
```

## What gets installed

- Docker Engine (from Docker's official apt repo, not distro's `docker.io`)
- Docker Compose plugin (v2)
- `tmux`, `mosh`, `git`
- `ufw` firewall with SSH (22) and mosh (UDP 60000-61000) allowed. **Skipped on OpenStack** (detected automatically; use security groups instead)
- IPv4 DNS preference (works around Docker Hub IPv6 connectivity issues)

## OpenStack security groups

On OpenStack, firewalling is handled by security groups rather than ufw. The `setup-host.sh` script detects OpenStack automatically and skips ufw configuration.

To create the required security group:

```bash
./setup-openstack-secgroup.sh
```

This creates a security group named `opencode-devbox` with rules for SSH (TCP 22), mosh (UDP 60000-61000), and ICMP. Apply it to your instance:

```bash
# New instance
openstack server create --security-group opencode-devbox ...

# Existing instance
openstack server add security group <instance-name> opencode-devbox
```

## VM sizing recommendations

| Use case | vCPU | RAM | Disk |
|---|---|---|---|
| Minimum | 2 | 4 GB | 20 GB |
| Recommended | 4 | 8 GB | 40 GB |
| Heavy use (Rust/Python builds, multi-project) | 8 | 16 GB | 80 GB |

## After VM setup

If you uncomment any bind mounts in `docker-compose.yml` (e.g. `~/.aws`, `~/.config/opencode`), create the directories first. Docker creates missing bind mount paths as root-owned, which causes permission issues:

```bash
# Only create directories for mounts you uncomment
mkdir -p ~/.aws              # AWS Bedrock SSO
mkdir -p ~/.config/opencode  # persistent opencode config
mkdir -p ~/.config/nvim      # custom neovim config
mkdir -p ~/.agents/skills    # opencode agent skills
```

Named volumes (`devbox-data`, `devbox-uv`, etc.) are managed by Docker and need no pre-creation.

Then download the compose file and environment template, edit them, and start the container:

```bash
mkdir -p ~/opencode-devbox && cd ~/opencode-devbox
curl -sL https://gitea.jordbo.se/joakimp/opencode-devbox/raw/branch/main/docker-compose.yml -o docker-compose.yml
curl -sL https://gitea.jordbo.se/joakimp/opencode-devbox/raw/branch/main/.env.example -o .env
vim .env                  # configure provider and keys
vim docker-compose.yml    # uncomment optional volume mounts
docker compose up -d
docker compose exec -u developer devbox opencode
```

> **AWS Bedrock users:** Uncomment the `~/.aws` volume mount in `docker-compose.yml` before starting. You'll also need to copy your `~/.aws/config` from a machine where SSO is already configured, then authenticate inside the container with `aws sso login`.

### Syncing local config to the VM

After editing `docker-compose.yml` on the VM to uncomment the bind mounts you need, run `sync-to-vm.sh` from your local machine to copy the corresponding directories:

```bash
./deploy/sync-to-vm.sh devbox-affection
```

The script reads `docker-compose.yml` on the remote VM, detects which bind mounts are active, and syncs only those directories from your local machine. It also creates the remote directories if they don't exist.

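The detection step can be pictured roughly as follows. This is a minimal sketch, not the actual logic in `sync-to-vm.sh`: the function name `active_bind_mounts` is hypothetical, and it assumes bind mounts appear in `docker-compose.yml` as unquoted, uncommented list entries whose host path starts with `~/`.

```bash
# Hypothetical helper (not part of the repo): print the host side of every
# uncommented "~/..." bind mount found in a compose file.
active_bind_mounts() {
    # $1: path to a docker-compose.yml
    # Matches list entries such as "  - ~/.aws:/home/developer/.aws:ro" and
    # prints only the text before the first colon. Lines starting with "#"
    # do not match, so commented-out mounts are skipped.
    sed -n 's|^[[:space:]]*-[[:space:]]*\(~/[^:]*\):.*$|\1|p' "$1"
}
```

For example, `active_bind_mounts ~/opencode-devbox/docker-compose.yml` would print one host path per active bind mount, while named volumes such as `devbox-data` are ignored because they do not start with `~/`.
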
### Upgrading an existing VM to a new release

Each tagged release may add new named volumes or bind-mount lines to `docker-compose.yml`. Pulling a new image via `docker compose pull` grabs the new container behaviour, but compose files on the VM are user-owned and never touched by the image, so you have to reconcile them yourself when upgrading across versions.

**Symptom of a missed reconcile:** a new feature quietly doesn't work even though the image is correct. Example from v1.14.19c → v1.14.20: bash history persistence requires the `devbox-shell-history` named volume mounted at `/home/developer/.cache/bash`. The v1.14.20 image writes history to that path either way, but without the volume mount on the VM, writes land in the container's writable layer and vanish on every `--force-recreate`.

**Upgrade ritual:**

```bash
# On the VM, before recreating the container:
cd ~/opencode-devbox
cp docker-compose.yml docker-compose.yml.bak-$(date +%Y%m%d-%H%M%S)

# Compare against the repo version to see what's new:
# (from your local checkout)
scp devbox-affection:~/opencode-devbox/docker-compose.yml /tmp/vm-compose.yml
diff -u /tmp/vm-compose.yml ~/src/src_local/opencode-devbox/docker-compose.yml
```

For each new `volumes:` entry or mount line in the repo version that isn't in your VM's file, add it manually, preserving any local customizations you've made (image variant, read/write flags on bind mounts, etc.). Then:

```bash
docker compose config >/dev/null   # verify YAML still parses
docker compose up -d --force-recreate
```

If you maintain the VM's compose file with no local changes, `scp` the repo version over wholesale. If you have customizations (the common case), do the diff-and-merge by hand.

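When the unified diff is noisy, it can help to list only the uncommented lines that exist in the repo version but not in the VM copy. The following is a minimal sketch under stated assumptions: `missing_lines` and `normalise_compose` are hypothetical helpers, not part of the repo, and the comparison is purely textual after stripping indentation, so a line you have customized locally will also be reported.

```bash
# Hypothetical helper: strip indentation, drop comment and blank lines, sort.
normalise_compose() {
    sed -e 's/^[[:space:]]*//' -e '/^#/d' -e '/^$/d' "$1" | sort -u
}

# Hypothetical helper: print lines present in the repo file but not the VM file.
# Runs in a subshell so the temporary variables do not leak into the caller.
missing_lines() (
    # $1: VM copy of docker-compose.yml, $2: repo version
    vm_lines=$(mktemp)
    repo_lines=$(mktemp)
    normalise_compose "$1" > "$vm_lines"
    normalise_compose "$2" > "$repo_lines"
    # comm -13 suppresses lines unique to the first file and lines common to both.
    comm -13 "$vm_lines" "$repo_lines"
    rm -f "$vm_lines" "$repo_lines"
)
```

With the paths from the upgrade ritual, this would be called as `missing_lines /tmp/vm-compose.yml ~/src/src_local/opencode-devbox/docker-compose.yml`. Treat the output as a checklist of candidates to review, not as a patch to apply.
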
### Shell defaults inside the container

The image ships baked `.bash_aliases` and `.inputrc` in `/etc/skel-devbox/`. These are quality-of-life defaults (prefix history search on Up/Down arrows, persistent history across container recreates via the `devbox-shell-history` named volume, `[devbox]` prompt marker, sensible aliases). On first container start the entrypoint copies them to `/home/developer/` **only if the target file does not already exist**.

This means:

- Fresh containers get the defaults automatically.
- If you bind-mount your host's `~/.bash_aliases` / `~/.inputrc` (see the commented lines in `docker-compose.yml`), your host versions win.
- If you edit the files inside a running container and store them via a home-dir bind-mount or equivalent, subsequent upgrades never overwrite them.
- To restore the baked defaults any time: `cp /etc/skel-devbox/.bash_aliases ~/` (or delete the file and recreate the container).
- To diff your current config against what the image ships: `diff ~/.bash_aliases /etc/skel-devbox/.bash_aliases`.

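The copy-if-absent rule behind these bullets can be sketched as below. This illustrates the behaviour described above and is not the image's actual entrypoint code; the function name `seed_defaults` is hypothetical.

```bash
# Hypothetical sketch of the "copy only if the target is missing" rule.
seed_defaults() {
    # $1: directory holding the baked defaults (e.g. /etc/skel-devbox)
    # $2: target home directory (e.g. /home/developer)
    for name in .bash_aliases .inputrc; do
        # Skip files that already exist in the target, so bind-mounted or
        # user-edited versions are never overwritten.
        if [ -f "$1/$name" ] && [ ! -e "$2/$name" ]; then
            cp "$1/$name" "$2/$name"
        fi
    done
}
```

Under this rule, deleting `~/.bash_aliases` and running the seeding step again brings the baked default back, which matches the restore advice in the list above.
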
### CI runner maintenance: automatic Docker pruning

Gitea Actions runners accumulate Docker build cache, stale buildkit containers, and unused images over time. Without periodic cleanup, the runner's disk fills up and builds stall during the image-push phase (symptom: `#61 exporting to image` / `pushing layers` hangs indefinitely while buildkit repeatedly re-authenticates with Docker Hub).

Set up two layers of automatic cleanup on the runner host:

**1. Daily cron job**: prunes images, containers, and build cache older than 72 hours:

```bash
sudo tee /etc/cron.daily/docker-prune <<'EOF'
#!/bin/sh
docker system prune -af --filter "until=72h" > /var/log/docker-prune.log 2>&1
docker builder prune -af --filter "until=72h" >> /var/log/docker-prune.log 2>&1
EOF
sudo chmod +x /etc/cron.daily/docker-prune
```

**2. Docker daemon builder GC**: caps buildkit cache at 10 GB (Docker 23.0+):

Add to `/etc/docker/daemon.json` (create the file if absent; if it already exists, merge the `builder` key into the existing object rather than replacing the file):

```json
{
  "builder": {
    "gc": {
      "enabled": true,
      "defaultKeepStorage": "10GB"
    }
  }
}
```

Then `sudo systemctl restart docker`.

Both are safe to run on a machine that also hosts long-running containers (like opencode-devbox), because `docker system prune` only removes *unused* images and *stopped* containers, never running ones. Named volumes are also left alone, since the cron job does not pass `--volumes`.

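To see whether a runner is drifting toward the full-disk state described above before a build stalls, a small check like the following can be run by hand or from another cron job. It is a minimal sketch: the function names are hypothetical, the threshold of 70 in the usage note is an arbitrary illustration, and `/var/lib/docker` is assumed to be the Docker data root (the default, which may differ on your host).

```bash
# Hypothetical helper: print the used-space percentage (digits only) of the
# filesystem that holds the given path, using POSIX-format df output.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: print a warning when usage reaches the threshold.
warn_if_full() {
    # $1: path to check, $2: threshold in percent
    pct=$(disk_used_pct "$1")
    if [ "$pct" -ge "$2" ]; then
        echo "WARNING: $1 is ${pct}% full (threshold ${2}%)"
    fi
}
```

A call such as `warn_if_full /var/lib/docker 70` prints nothing while usage is below the threshold. `docker system df` complements it by showing how much of that space is taken by images, containers, and build cache.
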
### Troubleshooting: SSH hangs or "banner exchange" timeouts

If SSH to the VM intermittently fails with `Connection timed out during banner exchange` or pure TCP connect timeouts, especially after the first few successful connects in a short window, the cause is almost certainly your ISP's CGNAT (Carrier-Grade NAT), not the VM.

**Symptoms**

- First 3–4 SSH connects succeed, then subsequent ones fail hard for 20–30 minutes
- `ping` to the VM works perfectly throughout (ICMP isn't tracked the same way)
- `mosh` sessions stay stable once established (UDP, different flow table)
- Happens on residential ISPs (Tele2, Comhem, Telia, most European consumer broadband)
- VM-side logs show SSH is idle: the SYNs never reach it

**Cause**

Residential CGNAT boxes keep a per-subscriber TCP flow table with a small concurrent-flow cap (~4) per destination IP. Once exhausted, new SYNs to that destination are silently dropped until old flows age out (typically 20–30 min after TCP close).

**Fix**

Add SSH connection multiplexing on your client so all SSH sessions (interactive, `scp`, `rsync`, scripts) share a single TCP connection to the VM:

```ssh-config
# ~/.ssh/config
Host <vm-alias>
    HostName <vm-ip>
    User devbox
    IdentityFile ~/.ssh/id_ed25519
    ControlMaster auto
    ControlPath ~/.ssh/cm/%r@%h:%p
    ControlPersist 4h
    ServerAliveInterval 30
    ServerAliveCountMax 6
```

Then create the socket directory:

```bash
mkdir -p ~/.ssh/cm && chmod 700 ~/.ssh/cm
```

All SSH to the VM now multiplexes over a single flow slot, regardless of how many parallel sessions you open. `sync-to-vm.sh` already does this internally for its own rsync/scp calls.

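To confirm that multiplexing is actually in effect, you can ask OpenSSH whether a master connection exists for the host alias. A minimal sketch, assuming the alias is defined in `~/.ssh/config` as shown above; the wrapper name `mux_status` is hypothetical.

```bash
# Hypothetical wrapper around "ssh -O check", which asks the master process
# for this host (if any) whether it is running.
mux_status() {
    # $1: host alias from ~/.ssh/config
    if ssh -O check "$1" >/dev/null 2>&1; then
        echo "master connection active for $1"
    else
        echo "no master connection for $1"
    fi
}
```

Open one normal SSH session first, then run `mux_status <vm-alias>` from a second terminal. If a stale master ever needs to be torn down, `ssh -O exit <vm-alias>` closes it.
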
For a more robust long-term fix (especially if you access the VM from multiple hosts), run a WireGuard tunnel on the VM and route SSH through that; UDP bypasses the TCP flow table entirely.