Document SSH banner-timeout workaround for residential CGNAT users

Add a Troubleshooting subsection to deploy/README.md describing the ISP-CGNAT per-destination flow-table exhaustion that manifests as 'Connection timed out during banner exchange' or pure TCP connect timeouts after the first 3-4 SSH connects. The fix is SSH ControlMaster/ControlPersist on the client side, which multiplexes all SSH sessions over one TCP flow and stays within the CGNAT cap. sync-to-vm.sh already uses this pattern internally; this note makes it discoverable for users hitting the issue in interactive or scripted SSH use outside the deploy/ scripts.
2026-04-21 09:04:59 +02:00
parent 3d632ef02f
commit cb4971b4a6
1 changed files with 43 additions and 0 deletions
@@ -197,3 +197,46 @@ After editing `docker-compose.yml` on the VM to uncomment the bind mounts you ne
 ```
 The script reads `docker-compose.yml` on the remote VM, detects which bind mounts are active, and syncs only those directories from your local machine. It also creates the remote directories if they don't exist.
 ### Troubleshooting: SSH hangs or "banner exchange" timeouts
 If SSH to the VM intermittently fails with `Connection timed out during banner exchange` or with plain TCP connect timeouts, especially after the first few successful connects in a short window, the most likely cause is your ISP's CGNAT (Carrier-Grade NAT), not the VM.
 **Symptoms**
 - First 3–4 SSH connects succeed, then subsequent ones fail hard for 20–30 minutes
 - `ping` to the VM works perfectly throughout (ICMP isn't tracked the same way)
 - `mosh` sessions stay stable once established (UDP, different flow table)
 - Seen on residential ISPs that put subscribers behind CGNAT (Tele2, Comhem, Telia, most European consumer broadband)
 - VM-side logs show SSH is idle — the SYNs never reach it
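 The symptoms above can be checked from the client with a small helper. This is a rough sketch, not part of the deploy/ scripts: `check_vm` is a hypothetical function name, and it assumes `ping` and a netcat that supports `-z` and `-w` (OpenBSD `nc` or `ncat`) are installed. If ICMP reports `ok` while TCP port 22 reports `failed`, that matches the CGNAT pattern described here.
 ```bash
 # Hypothetical helper: compare ICMP reachability with a TCP connect to port 22.
 # Usage: check_vm <vm-ip>
 check_vm() {
     host=$1
     # ICMP echo is not subject to the TCP flow cap, so it usually keeps working.
     if ping -c 3 "$host" >/dev/null 2>&1; then
         echo "icmp: ok"
     else
         echo "icmp: failed"
     fi
     # nc -z only attempts the TCP handshake; -w 5 gives up after 5 seconds.
     if nc -z -w 5 "$host" 22 >/dev/null 2>&1; then
         echo "tcp/22: ok"
     else
         echo "tcp/22: failed"
     fi
 }
 ```
 Each call opens one new TCP flow to the VM, so run it sparingly while the flow table is already exhausted.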
 **Cause**
 Residential CGNAT boxes keep a per-subscriber TCP flow table with a small concurrent-flow cap (~4) per destination IP. Once exhausted, new SYNs to that destination are silently dropped until old flows age out (typically 20–30 min after TCP close).
 **Fix**
 Add SSH connection multiplexing on your client so all SSH sessions (interactive, `scp`, `rsync`, scripts) share a single TCP connection to the VM:
 ```ssh-config
 # ~/.ssh/config
 Host <vm-alias>
    HostName <vm-ip>
    User devbox
    IdentityFile ~/.ssh/id_ed25519
    ControlMaster auto
    ControlPath ~/.ssh/cm/%r@%h:%p
    ControlPersist 4h
    ServerAliveInterval 30
    ServerAliveCountMax 6
 ```
 Then create the socket directory:
 ```bash
 mkdir -p ~/.ssh/cm && chmod 700 ~/.ssh/cm
 ```
 All SSH traffic to the VM now multiplexes over one TCP connection and occupies a single flow slot, regardless of how many parallel sessions you open. `sync-to-vm.sh` already does this internally for its own rsync/scp calls.
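 To confirm that multiplexing is active, you can ask OpenSSH whether a master connection exists for the alias. The sketch below wraps `ssh -O check` in a hypothetical `cm_status` function; the function name is illustrative, and the argument is the `Host` alias from your `~/.ssh/config`.
 ```bash
 # Hypothetical helper: report whether a ControlMaster is running for an alias.
 # Usage: cm_status <vm-alias>
 cm_status() {
     # "ssh -O check" exits 0 when a master is alive for this host, and
     # non-zero when there is no master or no ControlPath is configured.
     if ssh -O check "$1" >/dev/null 2>&1; then
         echo "master: running"
     else
         echo "master: not running"
     fi
 }
 ```
 Run it after opening one SSH session to the VM. If the master ever gets stuck, `ssh -O exit` followed by the alias closes it so the next connect starts a fresh one.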
 For a more robust long-term fix (especially if you access the VM from multiple hosts), run a WireGuard tunnel on the VM and route SSH through that — UDP bypasses the TCP flow table entirely.