Document SSH banner-timeout workaround for residential CGNAT users

Add a Troubleshooting subsection to deploy/README.md describing the ISP-CGNAT per-destination flow-table exhaustion that manifests as 'Connection timed out during banner exchange' or pure TCP connect timeouts after the first 3-4 SSH connects. The fix is SSH ControlMaster/ControlPersist on the client side, which multiplexes all SSH sessions over one TCP flow and stays within the CGNAT cap. sync-to-vm.sh already uses this pattern internally; this note makes it discoverable for users hitting the issue in interactive or scripted SSH use outside the deploy/ scripts.
2026-04-21 09:04:59 +02:00
parent 3d632ef02f
commit cb4971b4a6
1 changed files with 43 additions and 0 deletions
@@ -197,3 +197,46 @@ After editing `docker-compose.yml` on the VM to uncomment the bind mounts you ne
 ```

 The script reads `docker-compose.yml` on the remote VM, detects which bind mounts are active, and syncs only those directories from your local machine. It also creates the remote directories if they don't exist.
+
+### Troubleshooting: SSH hangs or "banner exchange" timeouts
+
+If SSH to the VM intermittently fails with `Connection timed out during banner exchange` or pure TCP connect timeouts — especially after the first few successful connects in a short window — the cause is almost certainly your ISP's CGNAT (Carrier-Grade NAT), not the VM.
+
+**Symptoms**
+
+- First 3–4 SSH connects succeed, then subsequent ones fail hard for 20–30 minutes
+- `ping` to the VM works perfectly throughout (ICMP isn't tracked the same way)
+- `mosh` sessions stay stable once established (UDP, different flow table)
+- Happens on residential ISPs (Tele2, Comhem, Telia, most European consumer broadband)
+- VM-side logs show SSH is idle — the SYNs never reach it
+
+**Cause**
+
+Residential CGNAT boxes keep a per-subscriber TCP flow table with a small concurrent-flow cap (~4) per destination IP. Once that cap is reached, new SYNs to the same destination are silently dropped until old flows age out of the table (typically 20–30 min after TCP close).
+
+**Fix**
+
+Add SSH connection multiplexing on your client so all SSH sessions (interactive, `scp`, `rsync`, scripts) share a single TCP connection to the VM:
+
+```ssh-config
+# ~/.ssh/config
+Host <vm-alias>
+    HostName <vm-ip>
+    User devbox
+    IdentityFile ~/.ssh/id_ed25519
+    ControlMaster auto
+    ControlPath ~/.ssh/cm/%r@%h:%p
+    ControlPersist 4h
+    ServerAliveInterval 30
+    ServerAliveCountMax 6
+```
+
+Then create the socket directory:
+
+```bash
+mkdir -p ~/.ssh/cm && chmod 700 ~/.ssh/cm
+```
+
+All SSH traffic to the VM now shares a single flow slot, regardless of how many parallel sessions you open. `sync-to-vm.sh` already does this internally for its own rsync/scp calls.
+
+For a more robust long-term fix (especially if you access the VM from multiple hosts), run a WireGuard tunnel on the VM and route SSH through it. WireGuard runs over UDP, so SSH traffic inside the tunnel bypasses the TCP flow table entirely.
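
A minimal sketch for confirming that the multiplexing settings from the Fix section are actually picked up by the client. `check_mux` is a hypothetical helper name, not part of the deploy/ scripts. It reads the output of `ssh -G <vm-alias>`, which prints the effective client configuration without connecting, and reports whether `ControlMaster` and `ControlPath` are set.

```shell
# check_mux (hypothetical helper): reads `ssh -G <host>` output on stdin.
# `ssh -G` prints one option per line, with the option name in lowercase.
check_mux() {
    _cm=""
    _cp=""
    while read -r key value; do
        case "$key" in
            controlmaster) _cm=$value ;;
            controlpath)   _cp=$value ;;
        esac
    done

    # Multiplexing needs ControlMaster enabled and a usable ControlPath.
    case "$_cm" in
        auto|autoask|true)
            if [ -n "$_cp" ] && [ "$_cp" != "none" ]; then
                echo "multiplexing: enabled ($_cp)"
                return 0
            fi
            ;;
    esac

    echo "multiplexing: disabled"
    return 1
}

# Usage (replace <vm-alias> with the Host entry from ~/.ssh/config):
#   ssh -G <vm-alias> | check_mux
```

Once a session is open, `ssh -O check <vm-alias>` asks the master process whether it is still running, and `ssh -O exit <vm-alias>` shuts it down, which is useful after editing the config.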