# Deploy — Host VM setup

Scripts for setting up a fresh Linux VM to host opencode-devbox.

## Files

- **`cloud-init.yml`** — cloud-init user-data template for automated VM provisioning on OpenStack, Proxmox, or any cloud with cloud-init support
- **`setup-host.sh`** — interactive post-install script for VMs that weren't provisioned with cloud-init
- **`setup-openstack-secgroup.sh`** — creates an OpenStack security group with the right rules (SSH, mosh, ICMP)
- **`sync-to-vm.sh`** — syncs local config directories (`~/.aws`, `~/.config/opencode`, etc.) to a remote VM based on which bind mounts are active in its `docker-compose.yml`

## Supported distributions

- **Debian 13 (Trixie)** — recommended (matches opencode-devbox base image)
- **Ubuntu 24.04 LTS** — also works

Other distributions will need manual adaptation.

## Quick start

### Option 1: Cloud-init (automated)

Customize `cloud-init.yml` — replace the SSH public key and optionally the hostname/timezone. Then use it during VM creation:

- **Proxmox**: attach as cloud-init user-data
- **OpenStack**: pass via `--user-data` flag (see full example below)
- **AWS, DigitalOcean, etc.**: paste into the "user data" field

#### Full OpenStack example

Cloud-init only handles guest configuration — flavor, image, network, and security group must be specified explicitly at creation time.

> **Note:** Do not use `--key-name` — the SSH key is configured in `cloud-init.yml` under `ssh_authorized_keys` for the `devbox` user. The `--key-name` flag injects into the image's default user (e.g. `debian`), not the `devbox` user created by cloud-init.

```bash
# List available flavors to choose appropriate sizing
openstack flavor list

# Create the security group first (one-time, see below)
./setup-openstack-secgroup.sh

# Basic — boot from default storage
openstack server create \
  --flavor c4m8 \
  --image Debian-13-Trixie \
  --network my-network \
  --security-group opencode-devbox \
  --user-data cloud-init.yml \
  devbox-vm
```

If your cloud offers NVMe-backed (performance) volumes, boot from one for faster Docker and build I/O:

```bash
# Performance — boot from NVMe volume (40GB, preserved on instance deletion)
openstack server create \
  --flavor c4m8 \
  --network my-network \
  --security-group opencode-devbox \
  --user-data cloud-init.yml \
  --block-device source_type=image,uuid=$(openstack image show Debian-13-Trixie -f value -c id),destination_type=volume,volume_size=40,delete_on_termination=false,boot_index=0,volume_type=performance \
  devbox-vm
```

> **Note:** The inline `volume_type` parameter requires compute API microversion 2.67 or later (pass `--os-compute-api-version 2.67` if your client negotiates an older one). If the server goes to ERROR state, check your volume quota (`openstack quota show`) and try creating the volume separately:
> ```bash
> openstack volume create --image Debian-13-Trixie --size 40 --type performance --bootable devbox-boot-volume
> openstack server create --flavor c4m8 --volume devbox-boot-volume --network my-network --security-group opencode-devbox --user-data cloud-init.yml devbox-vm
> ```

#### Floating IP

OpenStack doesn't support assigning a floating IP at instance creation time — it's a separate step after the VM is active:

```bash
# Allocate a new floating IP from the external network
openstack floating ip create <external-network>

# Assign it to the VM
openstack server add floating ip devbox-vm <floating-ip>
```

To find your external network name: `openstack network list --external`. If you already have an unassigned floating IP, skip the create step.
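Because the floating IP can only be attached once the server is up, a provisioning script has to wait for the `ACTIVE` status first. The following is a minimal sketch of such a wait loop, assuming the `openstack` CLI is installed and configured; the function name `wait_for_active` and its retry defaults are illustrative and not part of this repo.

```bash
# Poll the server status until it is ACTIVE.
# Usage: wait_for_active <server> [max-tries] [seconds-between-tries]
wait_for_active() {
  server="$1"
  tries="${2:-30}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    status=$(openstack server show "$server" -f value -c status)
    case "$status" in
      ACTIVE) return 0 ;;
      ERROR)  echo "server $server is in ERROR state" >&2; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $server" >&2
  return 1
}
```

For example: `wait_for_active devbox-vm && openstack server add floating ip devbox-vm <floating-ip>`. Passing `--wait` to `openstack server create` should give a similar result at creation time.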

The VM boots with Docker installed, firewall configured (or skipped on OpenStack), and your SSH key authorized. Log in as the `devbox` user.

### Console password (optional)

The cloud-init template uses SSH key authentication only — no password is set by default. This is sufficient for normal use since the `devbox` user has passwordless `sudo`.

A password is only needed for:

- **Emergency console access** — logging in via OpenStack Horizon console (noVNC) or Proxmox VNC when SSH is unreachable
- **`su - devbox`** — switching to the devbox user from another account

To enable console access, uncomment the `chpasswd` block in `cloud-init.yml` before deploying:

```yaml
chpasswd:
  expire: false
  users:
    - name: devbox
      password: your-password-here
      type: text
```

For an already-running VM, set a password via SSH:

```bash
sudo passwd devbox
```

### Option 2: Post-install script (manual)

On a fresh Debian/Ubuntu VM:

```bash
curl -fsSL https://gitea.jordbo.se/joakimp/opencode-devbox/raw/branch/main/deploy/setup-host.sh | bash
```

Or clone and run:

```bash
git clone https://gitea.jordbo.se/joakimp/opencode-devbox
cd opencode-devbox/deploy
./setup-host.sh
```

## What gets installed

- Docker Engine (from Docker's official apt repo, not distro's `docker.io`)
- Docker Compose plugin (v2)
- `tmux`, `mosh`, `git`
- `ufw` firewall with SSH (22) and mosh (UDP 60000-61000) allowed — **skipped on OpenStack** (detected automatically; use security groups instead)
- IPv4 DNS preference (works around Docker Hub IPv6 connectivity issues)

## OpenStack security groups

On OpenStack, firewalling is handled by security groups rather than ufw. The `setup-host.sh` script detects OpenStack automatically and skips ufw configuration.
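How `setup-host.sh` performs the detection is not shown here. One common approach, given as a sketch under that caveat, is to look for "OpenStack" in the DMI strings the hypervisor exposes to the guest. The function name `is_openstack` and the choice of DMI files are assumptions, and the real script may use a different mechanism such as the metadata service.

```bash
# Return 0 if the DMI strings identify an OpenStack guest, 1 otherwise.
# The optional directory argument exists so the check can be pointed at test data.
is_openstack() {
  dmi_dir="${1:-/sys/class/dmi/id}"
  for f in sys_vendor product_name chassis_asset_tag; do
    if [ -r "$dmi_dir/$f" ] && grep -qi 'openstack' "$dmi_dir/$f"; then
      return 0
    fi
  done
  return 1
}
```

A host script could then branch on it, for example `is_openstack || sudo ufw enable`.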

To create the required security group:

```bash
./setup-openstack-secgroup.sh
```

This creates a security group named `opencode-devbox` with rules for SSH (TCP 22), mosh (UDP 60000-61000), and ICMP. Apply it to your instance:

```bash
# New instance
openstack server create --security-group opencode-devbox ...

# Existing instance
openstack server add security group <instance-name> opencode-devbox
```
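If you prefer to create the group by hand, the rules described above map onto the following `openstack` commands. This is a sketch of equivalent commands, not the contents of `setup-openstack-secgroup.sh`; the function name and description text are illustrative. The source address range is left at the client default, which normally admits any IPv4 address.

```bash
# Create a security group with ingress rules for SSH, mosh, and ICMP.
# Usage: create_devbox_secgroup [group-name]
create_devbox_secgroup() {
  group="${1:-opencode-devbox}"
  openstack security group create --description "SSH, mosh, ICMP for opencode-devbox" "$group" &&
  openstack security group rule create --ingress --protocol tcp --dst-port 22 "$group" &&
  openstack security group rule create --ingress --protocol udp --dst-port 60000:61000 "$group" &&
  openstack security group rule create --ingress --protocol icmp "$group"
}
```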

## VM sizing recommendations

| Use case | vCPU | RAM | Disk |
|---|---|---|---|
| Minimum | 2 | 4 GB | 20 GB |
| Recommended | 4 | 8 GB | 40 GB |
| Heavy use (Rust/Python builds, multi-project) | 8 | 16 GB | 80 GB |

## After VM setup

If you uncomment any bind mounts in `docker-compose.yml` (e.g. `~/.aws`, `~/.config/opencode`), create the directories first — Docker creates missing bind mount paths as root-owned, which causes permission issues:

```bash
# Only create directories for mounts you uncomment
mkdir -p ~/.aws                  # AWS Bedrock SSO
mkdir -p ~/.config/opencode      # persistent opencode config
mkdir -p ~/.config/nvim          # custom neovim config
mkdir -p ~/.agents/skills        # opencode agent skills
```
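If the container was already started before the directories existed, some of them may have been created by Docker with the wrong owner. The sketch below lists such directories; it assumes GNU `stat` (as on Debian and Ubuntu), and the function name `foreign_owned_dirs` is illustrative.

```bash
# Print each given directory that exists but is not owned by the current user.
foreign_owned_dirs() {
  me=$(id -u)
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    owner=$(stat -c '%u' "$dir")
    if [ "$owner" != "$me" ]; then
      printf '%s (owned by uid %s)\n' "$dir" "$owner"
    fi
  done
  return 0
}
```

For example: `foreign_owned_dirs ~/.aws ~/.config/opencode ~/.config/nvim ~/.agents/skills`. Anything it reports can be repaired with `sudo chown -R "$(id -u):$(id -g)" <dir>`.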

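Named volumes (`devbox-data`, `devbox-uv`, etc.) are managed by Docker and need no pre-creation.

Then download the compose file and the example environment file, configure both, and start the container: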

```bash
mkdir -p ~/opencode-devbox && cd ~/opencode-devbox
# -f makes curl fail on HTTP errors instead of saving an error page as the file
curl -fsSL https://gitea.jordbo.se/joakimp/opencode-devbox/raw/branch/main/docker-compose.yml -o docker-compose.yml
curl -fsSL https://gitea.jordbo.se/joakimp/opencode-devbox/raw/branch/main/.env.example -o .env
vim .env                                           # configure provider and keys
vim docker-compose.yml                             # uncomment optional volume mounts
docker compose up -d
docker compose exec -u developer devbox opencode
```

> **AWS Bedrock users:** Uncomment the `~/.aws` volume mount in `docker-compose.yml` before starting. You'll also need to copy your `~/.aws/config` from a machine where SSO is already configured, then authenticate inside the container with `aws sso login`.

### Syncing local config to the VM

After editing `docker-compose.yml` on the VM to uncomment the bind mounts you need, run `sync-to-vm.sh` from your local machine to copy the corresponding directories:

```bash
./deploy/sync-to-vm.sh devbox-affection   # devbox-affection is an example SSH host name for the VM
```

The script reads `docker-compose.yml` on the remote VM, detects which bind mounts are active, and syncs only those directories from your local machine. It also creates the remote directories if they don't exist.
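The detection step can be pictured as a filter over the compose file that keeps only uncommented bind mounts. The sketch below is not the code from `sync-to-vm.sh`; it assumes short-syntax mounts written as `- ~/path:/container/path[:mode]`, and the function name `active_bind_mounts` is hypothetical.

```bash
# Print the host side of every uncommented "~/..." bind mount in a compose file.
# Commented lines do not match because the pattern requires "-" as the first
# non-blank character; named volumes do not match because they lack "~/".
active_bind_mounts() {
  compose_file="$1"
  sed -n -E 's#^[[:space:]]*-[[:space:]]*"?(~/[^:"]+):.*$#\1#p' "$compose_file"
}
```

Each printed path is a candidate for copying to the same location on the VM, for example with `rsync -a`.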

### Upgrading an existing VM to a new release

Each tagged release may add new named volumes or bind-mount lines to `docker-compose.yml`. Pulling a new image via `docker compose pull` grabs the new container behaviour, but compose files on the VM are user-owned and never touched by the image — you have to reconcile them yourself when upgrading across versions.

**Symptom of a missed reconcile:** a new feature quietly doesn't work even though the image is correct. Example from v1.14.19c → v1.14.20: bash history persistence requires the `devbox-shell-history` named volume mounted at `/home/developer/.cache/bash`. The v1.14.20 image writes history to that path either way, but without the volume mount on the VM, writes land in the container's writable layer and vanish on every `--force-recreate`.

**Upgrade ritual:**

```bash
# On the VM, before recreating the container:
cd ~/opencode-devbox
cp docker-compose.yml docker-compose.yml.bak-$(date +%Y%m%d-%H%M%S)

# Then, on your local machine, fetch the VM's file and diff it against
# your checkout of the repo (adjust the host name and checkout path):
scp devbox-affection:~/opencode-devbox/docker-compose.yml /tmp/vm-compose.yml
diff -u /tmp/vm-compose.yml ~/src/src_local/opencode-devbox/docker-compose.yml
```

For each new `volumes:` entry or mount line in the repo version that isn't in your VM's file, add it manually — preserving any local customizations you've made (image variant, read/write flags on bind mounts, etc.). Then:

```bash
docker compose config >/dev/null   # verify YAML still parses
docker compose up -d --force-recreate
```

If you maintain the VM's compose file with no local changes, `scp` the repo version over wholesale. If you have customizations (the common case), do the diff-and-merge by hand.
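When the full diff is noisy, it can help to list only the active mount lines that the repo version has and the VM's file lacks. The sketch below does that under a few assumptions: mounts use the short `- source:/target` syntax, and the function names `mount_lines` and `missing_mounts` are illustrative. It does not cover the top-level `volumes:` declarations, which still need a manual check.

```bash
# Print the sorted, de-duplicated set of uncommented mount lines in a compose file.
mount_lines() {
  sed -E -e 's/^[[:space:]]+//' -e 's/[[:space:]]+$//' "$1" |
    grep -E '^- [A-Za-z~./][^ :=]*:/' |
    LC_ALL=C sort -u
}

# Print mount lines that appear in the repo file but not in the VM file.
# Usage: missing_mounts <vm-compose-file> <repo-compose-file>
missing_mounts() {
  vm_file="$1"
  repo_file="$2"
  vm_tmp=$(mktemp)
  repo_tmp=$(mktemp)
  mount_lines "$vm_file" > "$vm_tmp"
  mount_lines "$repo_file" > "$repo_tmp"
  # comm -13 keeps only the lines unique to the second input
  LC_ALL=C comm -13 "$vm_tmp" "$repo_tmp"
  rm -f "$vm_tmp" "$repo_tmp"
}
```

With the files from the upgrade steps, that would be `missing_mounts /tmp/vm-compose.yml <path-to-checkout>/docker-compose.yml`.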

### Shell defaults inside the container

The image ships baked `.bash_aliases` and `.inputrc` in `/etc/skel-devbox/` — quality-of-life defaults (prefix history search on Up/Down arrows, persistent history across container recreates via the `devbox-shell-history` named volume, `[devbox]` prompt marker, sensible aliases). On first container start the entrypoint copies them to `/home/developer/` **only if the target file does not already exist**.

This means:

- Fresh containers get the defaults automatically.
- If you bind-mount your host's `~/.bash_aliases` / `~/.inputrc` (see the commented lines in `docker-compose.yml`), your host versions win.
- If you edit the files inside a running container and store them via a home-dir bind-mount or equivalent, subsequent upgrades never overwrite them.
- To restore the baked defaults any time: `cp /etc/skel-devbox/.bash_aliases ~/` (or delete the file and recreate the container).
- To diff your current config against what the image ships: `diff ~/.bash_aliases /etc/skel-devbox/.bash_aliases`.
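The copy-if-absent behaviour described above can be sketched as follows. This illustrates the rule and is not the image's actual entrypoint; the function name `install_skel_defaults` is hypothetical, and the two directory arguments exist so the logic can be tried outside the container.

```bash
# Copy the baked shell defaults into the home directory, skipping any file
# that already exists there (bind-mounted or user-edited files are kept).
install_skel_defaults() {
  skel_dir="${1:-/etc/skel-devbox}"
  home_dir="${2:-/home/developer}"
  for name in .bash_aliases .inputrc; do
    src="$skel_dir/$name"
    dst="$home_dir/$name"
    if [ -f "$src" ] && [ ! -e "$dst" ]; then
      cp "$src" "$dst"
    fi
  done
}
```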

### CI runner maintenance: automatic Docker pruning

Gitea Actions runners accumulate Docker build cache, stale buildkit containers, and unused images over time. Without periodic cleanup, the runner's disk fills up and builds stall during the image-push phase (symptom: `#61 exporting to image` / `pushing layers` hangs indefinitely while buildkit repeatedly re-authenticates with Docker Hub).

Set up two layers of automatic cleanup on the runner host:

**1. Daily cron job** — prunes images, containers, and build cache older than 72 hours:

```bash
sudo tee /etc/cron.daily/docker-prune <<'EOF'
#!/bin/sh
docker system prune -af --filter "until=72h" > /var/log/docker-prune.log 2>&1
docker builder prune -af --filter "until=72h" >> /var/log/docker-prune.log 2>&1
EOF
sudo chmod +x /etc/cron.daily/docker-prune
```

**2. Docker daemon builder GC** — caps buildkit cache at 10 GB (Docker 23.0+):

Add to `/etc/docker/daemon.json` (create if absent):

```json
{
  "builder": {
    "gc": {
      "enabled": true,
      "defaultKeepStorage": "10GB"
    }
  }
}
```

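Then restart the daemon with `sudo systemctl restart docker`. The restart briefly stops running containers unless `live-restore` is enabled in `daemon.json`, so schedule it accordingly.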

Both are safe to run on a machine that also hosts long-running containers (like opencode-devbox) — `docker system prune` only removes *unused* images and *stopped* containers, never running ones.
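To notice a filling disk before builds start to hang, a small check can run alongside the cleanup jobs. The sketch below assumes Docker's default data root `/var/lib/docker`; the function names and the 80% threshold are illustrative.

```bash
# Print the used-space percentage (without the % sign) of the filesystem
# that holds the given path.
disk_use_percent() {
  df -P "${1:-/var/lib/docker}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 if usage is at or above the limit, 1 if below, 2 if it cannot be read.
# Usage: disk_is_filling [path] [limit-percent]
disk_is_filling() {
  path="${1:-/var/lib/docker}"
  limit="${2:-80}"
  used=$(disk_use_percent "$path")
  [ -n "$used" ] || return 2
  [ "$used" -ge "$limit" ]
}
```

For example: `disk_is_filling /var/lib/docker 80 && docker system df` prints Docker's own usage breakdown once the threshold is reached.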

### Troubleshooting: SSH hangs or "banner exchange" timeouts

If SSH to the VM intermittently fails with `Connection timed out during banner exchange` or pure TCP connect timeouts — especially after the first few successful connects in a short window — the cause is almost certainly your ISP's CGNAT (Carrier-Grade NAT), not the VM.

**Symptoms**

- First 3–4 SSH connects succeed, then subsequent ones fail hard for 20–30 minutes
- `ping` to the VM works perfectly throughout (ICMP isn't tracked the same way)
- `mosh` sessions stay stable once established (UDP, different flow table)
- Happens on residential ISPs (Tele2, Comhem, Telia, most European consumer broadband)
- VM-side logs show SSH is idle — the SYNs never reach it

**Cause**

Residential CGNAT boxes keep a per-subscriber TCP flow table with a small concurrent-flow cap (~4) per destination IP. Once exhausted, new SYNs to that destination are silently dropped until old flows age out (typically 20–30 min after TCP close).

**Fix**

Add SSH connection multiplexing on your client so all SSH sessions (interactive, `scp`, `rsync`, scripts) share a single TCP connection to the VM:

```ssh-config
# ~/.ssh/config
Host <vm-alias>
    HostName <vm-ip>
    User devbox
    IdentityFile ~/.ssh/id_ed25519
    ControlMaster auto
    ControlPath ~/.ssh/cm/%r@%h:%p
    ControlPersist 4h
    ServerAliveInterval 30
    ServerAliveCountMax 6
```

Then create the socket directory:

```bash
mkdir -p ~/.ssh/cm && chmod 700 ~/.ssh/cm
```

All SSH to the VM now multiplexes over a single flow slot, regardless of how many parallel sessions you open. `sync-to-vm.sh` already does this internally for its own rsync/scp calls.
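To confirm that multiplexing is in effect, OpenSSH can be asked whether a master connection exists for a host. A small wrapper, with the hypothetical name `mux_status`, might look like this:

```bash
# Report whether a multiplexing master connection is running for a host alias.
# "ssh -O check" exits 0 when a master exists and non-zero otherwise.
mux_status() {
  if ssh -O check "$1" 2>/dev/null; then
    echo "master running for $1"
  else
    echo "no master for $1"
  fi
}
```

Run `mux_status <vm-alias>` after opening a session. To close the shared connection early, for instance after changing the `Host` block, use `ssh -O exit <vm-alias>`.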

For a more robust long-term fix (especially if you access the VM from multiple hosts), run a WireGuard tunnel on the VM and route SSH through that — UDP bypasses the TCP flow table entirely.