Document SSH banner-timeout workaround for residential CGNAT users

Add a Troubleshooting subsection to deploy/README.md describing the
ISP-CGNAT per-destination flow-table exhaustion that manifests as
'Connection timed out during banner exchange' or pure TCP connect
timeouts after the first 3-4 SSH connects.

The fix is SSH ControlMaster/ControlPersist on the client side, which
multiplexes all SSH sessions over one TCP flow and stays within the
CGNAT cap. sync-to-vm.sh already uses this pattern internally; this
note makes it discoverable for users hitting the issue in interactive
or scripted SSH use outside the deploy/ scripts.
This commit is contained in:
2026-04-21 09:04:59 +02:00
parent 3d632ef02f
commit cb4971b4a6
+43
View File
@@ -197,3 +197,46 @@ After editing `docker-compose.yml` on the VM to uncomment the bind mounts you ne
``` ```
The script reads `docker-compose.yml` on the remote VM, detects which bind mounts are active, and syncs only those directories from your local machine. It also creates the remote directories if they don't exist. The script reads `docker-compose.yml` on the remote VM, detects which bind mounts are active, and syncs only those directories from your local machine. It also creates the remote directories if they don't exist.
### Troubleshooting: SSH hangs or "banner exchange" timeouts
If SSH to the VM intermittently fails with `Connection timed out during banner exchange` or pure TCP connect timeouts — especially after the first few successful connects in a short window — the cause is almost certainly your ISP's CGNAT (Carrier-Grade NAT), not the VM.
**Symptoms**
- First 34 SSH connects succeed, then subsequent ones fail hard for 2030 minutes
- `ping` to the VM works perfectly throughout (ICMP isn't tracked the same way)
- `mosh` sessions stay stable once established (UDP, different flow table)
- Happens on residential ISPs (Tele2, Comhem, Telia, most European consumer broadband)
- VM-side logs show SSH is idle — the SYNs never reach it
**Cause**
Residential CGNAT boxes keep a per-subscriber TCP flow table with a small concurrent-flow cap (~4) per destination IP. Once exhausted, new SYNs to that destination are silently dropped until old flows age out (typically 2030 min after TCP close).
**Fix**
Add SSH connection multiplexing on your client so all SSH sessions (interactive, `scp`, `rsync`, scripts) share a single TCP connection to the VM:
```ssh-config
# ~/.ssh/config
Host <vm-alias>
HostName <vm-ip>
User devbox
IdentityFile ~/.ssh/id_ed25519
ControlMaster auto
ControlPath ~/.ssh/cm/%r@%h:%p
ControlPersist 4h
ServerAliveInterval 30
ServerAliveCountMax 6
```
Then create the socket directory:
```bash
mkdir -p ~/.ssh/cm && chmod 700 ~/.ssh/cm
```
All SSH to the VM now multiplexes over a single flow slot, regardless of how many parallel sessions you open. `sync-to-vm.sh` already does this internally for its own rsync/scp calls.
For a more robust long-term fix (especially if you access the VM from multiple hosts), run a WireGuard tunnel on the VM and route SSH through that — UDP bypasses the TCP flow table entirely.