Document CI runner Docker pruning setup in deploy/README.md
Gitea Actions runners accumulate buildkit cache, stale containers, and unused images. Without periodic cleanup the disk fills and builds stall during image push (observed: build-omos hung at 'pushing layers' for 1.5h on a 77%-full disk). Add a 'CI runner maintenance' section to deploy/README.md with two cleanup layers: a daily cron job (prunes anything >72h old) and Docker daemon builder GC (caps buildkit cache at 10 GB).
This commit is contained in:
@@ -238,6 +238,42 @@ This means:
|
||||
- To restore the baked defaults any time: `cp /etc/skel-devbox/.bash_aliases ~/` (or delete the file and recreate the container).
|
||||
- To diff your current config against what the image ships: `diff ~/.bash_aliases /etc/skel-devbox/.bash_aliases`.
|
||||
|
||||
### CI runner maintenance: automatic Docker pruning
|
||||
|
||||
Gitea Actions runners accumulate Docker build cache, stale buildkit containers, and unused images over time. Without periodic cleanup, the runner's disk fills up and builds stall during the image-push phase (symptom: `#61 exporting to image` / `pushing layers` hangs indefinitely while buildkit repeatedly re-authenticates with Docker Hub).
|
||||
|
||||
Set up two layers of automatic cleanup on the runner host:
|
||||
|
||||
**1. Daily cron job** — prunes images, containers, and build cache older than 72 hours:
|
||||
|
||||
```bash
|
||||
sudo tee /etc/cron.daily/docker-prune <<'EOF'
|
||||
#!/bin/sh
|
||||
docker system prune -af --filter "until=72h" > /var/log/docker-prune.log 2>&1
|
||||
docker builder prune -af --filter "until=72h" >> /var/log/docker-prune.log 2>&1
|
||||
EOF
|
||||
sudo chmod +x /etc/cron.daily/docker-prune
|
||||
```
|
||||
|
||||
**2. Docker daemon builder GC** — caps buildkit cache at 10 GB (Docker 23.0+):
|
||||
|
||||
Add to `/etc/docker/daemon.json` (create if absent):
|
||||
|
||||
```json
|
||||
{
|
||||
"builder": {
|
||||
"gc": {
|
||||
"enabled": true,
|
||||
"defaultKeepStorage": "10GB"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Then `sudo systemctl restart docker`.
|
||||
|
||||
Both are safe to run on a machine that also hosts long-running containers (like opencode-devbox) — `docker system prune` only removes *unused* images and *stopped* containers, never running ones.
|
||||
|
||||
### Troubleshooting: SSH hangs or "banner exchange" timeouts
|
||||
|
||||
If SSH to the VM intermittently fails with `Connection timed out during banner exchange` or pure TCP connect timeouts — especially after the first few successful connects in a short window — the cause is almost certainly your ISP's CGNAT (Carrier-Grade NAT), not the VM.
|
||||
|
||||
Reference in New Issue
Block a user