srv_prepare: bump conntrack + SYN backlog for VPN workload #41
Merged
Default Ubuntu kernel ships nf_conntrack_max=8192: adequate for a desktop, dangerously small for a VPN aggregation point handling hundreds of concurrent VLESS+Reality, XHTTP/H2, and probe flows.

Symptom on EU production: dmesg fills with "nf_conntrack: table full, dropping packet", and new flows from any source IP not already warm in the table fail their TLS handshake. The problem compounds with the default tcp_timeout_established=432000 (5 days), which keeps slots occupied long after the client has disconnected.

Live cumulative counters on EU before the fix:

- insert_failed: 29371
- drop: 55667
- early_drop: 16

Settings added to srv_prepare_bbr_settings (sysctl applied via /etc/sysctl.conf):

- net.netfilter.nf_conntrack_max: 131072
- net.netfilter.nf_conntrack_buckets: 131072
- net.netfilter.nf_conntrack_tcp_timeout_established: 3600
- net.ipv4.tcp_max_syn_backlog: 4096

Memory cost: ~50 MB on a 1vCPU/1GB box (131072 entries × ~376 B). Acceptable.

The same defaults belong on every host in groups['cloud'] and groups['ru']; srv_prepare runs on all of them. Hosts will pick up the new values on the next role apply; no service restart is needed beyond the sysctl reload that the role already triggers.

Live patch already applied to vm_my_srv via direct `sysctl -w` plus /etc/sysctl.d/99-vpn-tuning.conf. The latter becomes redundant once this role runs and writes /etc/sysctl.conf with the same values; keeping it temporarily until the next deploy normalises state.

Note: this fix did NOT resolve the separate issue of TLS handshakes failing from vm_my_ru2 (3 of 4 v2 Reality SNIs). That bug has a different root cause and will be investigated separately (likely involving tcpdump on EU :443 during a fresh ru2-source handshake).

Signed-off-by: findias <findias@gmail.com>
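The cumulative counters quoted above are kept per CPU by the kernel. A minimal Python sketch of how they could be totalled, assuming the `/proc/net/stat/nf_conntrack` layout (a header row of field names, then one row of hexadecimal values per CPU). The names `sum_conntrack_stats` and `read_conntrack_stats` are hypothetical, and the exact column set varies by kernel version, which is why the parser keys off the header row rather than fixed positions.

```python
from pathlib import Path
from typing import Dict

# Assumed location of the per-CPU conntrack statistics (present when nf_conntrack is loaded).
STAT_PATH = Path("/proc/net/stat/nf_conntrack")


def sum_conntrack_stats(text: str) -> Dict[str, int]:
    """Sum per-CPU conntrack counters from /proc/net/stat/nf_conntrack text.

    The first non-empty line names the columns; each following line holds one
    CPU's values as hexadecimal numbers. The "entries" column repeats the global
    table size on every row, so summing it would overcount; it is skipped.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    totals = {name: 0 for name in header if name != "entries"}
    for row in lines[1:]:
        for name, raw in zip(header, row.split()):
            if name != "entries":
                totals[name] += int(raw, 16)
    return totals


def read_conntrack_stats(path: Path = STAT_PATH) -> Dict[str, int]:
    """Read and total the live counters (requires a Linux host with nf_conntrack)."""
    return sum_conntrack_stats(path.read_text())
```

On a host, `read_conntrack_stats()["insert_failed"]` would give a total comparable to the `insert_failed: 29371` figure above; if conntrack-tools is installed, `conntrack -S` shows the same counters broken out per CPU.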
Why
The default Ubuntu kernel ships `nf_conntrack_max=8192`: adequate for a desktop, dangerously small for a VPN aggregation point. EU production was silently dropping packets (cumulative counters before the fix: `insert_failed: 29371`, `drop: 55667`, `early_drop: 16`).

This compounds with `tcp_timeout_established=432000` (5 days), which holds VPN slots even after clients disconnect.

Symptom: TLS handshakes from any source IP not already warm in the conntrack table fail intermittently. Discovered while bringing vm_my_ru2 online: as a new source IP with no warm entries, its handshakes broke.
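Why a 5-day established timeout matters can be sketched with Little's law. If flows are abandoned without a FIN or RST (for example, a mobile client that drops off the network), each such entry stays in ESTABLISHED until the timeout expires, so the steady-state number of stale slots is roughly the abandonment rate times the timeout. This is a simplified model, assuming a constant abandonment rate and ignoring early drop and live flows; the function names are hypothetical.

```python
def stale_slots(abandon_rate_per_s: float, timeout_s: float) -> float:
    """Little's law: occupied slots ~= arrival rate x time each slot is held."""
    return abandon_rate_per_s * timeout_s


def max_abandon_rate(table_size: int, timeout_s: float) -> float:
    """Abandonment rate (flows/s) at which stale entries alone fill the table."""
    return table_size / timeout_s


if __name__ == "__main__":
    old = max_abandon_rate(8192, 432_000)    # default table, 5-day timeout
    new = max_abandon_rate(131_072, 3_600)   # this PR: larger table, 1-hour timeout
    print(f"old: {old:.4f} flows/s (one every {1 / old:.0f} s)")
    print(f"new: {new:.2f} flows/s")
```

Under this model the old settings saturate with one abandoned flow roughly every 53 seconds, while the new ones tolerate about 36 abandoned flows per second.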
Fix
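Adds four entries to `srv_prepare_bbr_settings` (the dict the role already loops through):

- `net.netfilter.nf_conntrack_max: 131072`
- `net.netfilter.nf_conntrack_buckets: 131072`
- `net.netfilter.nf_conntrack_tcp_timeout_established: 3600`
- `net.ipv4.tcp_max_syn_backlog: 4096`

Memory cost: ~50 MB on a 1vCPU/1GB box. Acceptable.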
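The ~50 MB figure can be checked from the per-entry size quoted in the commit message (~376 B). A minimal sketch; the 8-byte bucket-head size is an assumption (one pointer per hash bucket on a 64-bit kernel), and the entry total is a worst case, assuming entries are allocated on demand as the table fills while the bucket array is allocated up front.

```python
ENTRY_BYTES = 376   # approximate per-entry size quoted in this PR; varies by kernel/config
BUCKET_BYTES = 8    # assumption: one pointer-sized bucket head on a 64-bit kernel
MIB = 2 ** 20


def conntrack_memory_bytes(max_entries: int, buckets: int,
                           entry_bytes: int = ENTRY_BYTES,
                           bucket_bytes: int = BUCKET_BYTES) -> int:
    """Worst-case memory: a full table of entries plus the bucket array."""
    return max_entries * entry_bytes + buckets * bucket_bytes


if __name__ == "__main__":
    entries_only = 131_072 * ENTRY_BYTES
    total = conntrack_memory_bytes(131_072, 131_072)
    print(f"entries: {entries_only:,} B = {entries_only / MIB:.0f} MiB")
    print(f"with buckets: {total:,} B = {total / MIB:.0f} MiB")
```

That works out to about 49.3 MB (47 MiB) for a full table of entries, plus roughly 1 MiB for the bucket array under the assumed 8-byte head, consistent with the ~50 MB estimate.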
Status
- Live patch applied to `vm_my_srv` via `sysctl -w` and `/etc/sysctl.d/99-vpn-tuning.conf` already (no waiting on this PR for relief).
- The role persists the values to `/etc/sysctl.conf`, so a fresh box gets them too.
- Every host in `groups['cloud']` and `groups['ru']` will receive the new values on the next role apply.

Test plan
- YAML parses (`python3 -c "import yaml; yaml.safe_load(open(..))"`).
- `dmesg`: no new conntrack-table-full entries since the live patch (~30 min ago).

Closes a latent capacity bug that would have hit any rapidly growing user base, with or without multi-RU.
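The test plan could be extended with a check that the four values are actually live. A minimal sketch, assuming the standard `/proc/sys` mapping (dots in a sysctl key become path separators). `DESIRED`, `sysctl_path`, `check_sysctls` and `render_sysctl_conf` are hypothetical names, and the `net.netfilter.*` keys only exist while the `nf_conntrack` module is loaded, so a missing file is reported rather than raised.

```python
from pathlib import Path
from typing import Dict, Mapping, Optional, Tuple

# The four values this PR adds to srv_prepare_bbr_settings.
DESIRED: Dict[str, int] = {
    "net.netfilter.nf_conntrack_max": 131072,
    "net.netfilter.nf_conntrack_buckets": 131072,
    "net.netfilter.nf_conntrack_tcp_timeout_established": 3600,
    "net.ipv4.tcp_max_syn_backlog": 4096,
}


def sysctl_path(key: str, root: Path = Path("/proc/sys")) -> Path:
    """Map a dotted sysctl key to its /proc/sys file."""
    return root.joinpath(*key.split("."))


def check_sysctls(desired: Mapping[str, int],
                  root: Path = Path("/proc/sys")) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {key: (wanted, live)} for every key whose live value differs.

    live is None when the file is missing (e.g. nf_conntrack not loaded).
    """
    mismatches: Dict[str, Tuple[int, Optional[int]]] = {}
    for key, want in desired.items():
        try:
            have: Optional[int] = int(sysctl_path(key, root).read_text().strip())
        except FileNotFoundError:
            have = None
        if have != want:
            mismatches[key] = (want, have)
    return mismatches


def render_sysctl_conf(desired: Mapping[str, int]) -> str:
    """Render the values as key = value lines, the syntax sysctl.conf accepts."""
    return "".join(f"{key} = {value}\n" for key, value in desired.items())
```

On `vm_my_srv`, `check_sysctls(DESIRED)` returning an empty dict would confirm the live patch took effect; `render_sysctl_conf(DESIRED)` yields lines in the `key = value` form that `/etc/sysctl.conf` accepts.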