Userspace TCP stack in Rust with DPDK kernel bypass, built for high-frequency trading.
This is the companion code for the article "Userspace TCP in Rust with DPDK for High-Frequency Trading". It implements a minimal, allocation-free TCP handler designed for long-lived FIX sessions to exchange gateways, achieving sub-2μs wire-to-wire latency.
This is a case study in extremes, not a general-purpose TCP library. See the article for the full decision framework on when (and when not) to build a custom network stack.
Architecture:

```
NIC (ConnectX-5)
  │  DMA → hugepage mbufs
  ▼
┌─────────────────────────┐
│  DPDK Poll-Mode Driver  │  Zero-interrupt packet I/O
├─────────────────────────┤
│   Zero-Copy Parsing     │  ETH → IP → TCP headers parsed in-place (~15ns)
├─────────────────────────┤
│   TCP State Machine     │  Minimal: SYN/ACK/FIN/RST + ECN rate control
├─────────────────────────┤
│  FIX Protocol Parser    │  Zero-allocation tag-value parsing (~80ns)
├─────────────────────────┤
│    Strategy Engine      │  Your trading logic here
└─────────────────────────┘
```
Key components:

- DPDK FFI bindings (`src/dpdk/`) — Raw NIC access via poll-mode drivers
- Zero-copy packet parsing (`src/net/`) — Ethernet, IPv4, TCP, ARP headers parsed by casting byte slices (sketched below)
- Minimal TCP state machine (`src/net/tcp.rs`) — Handles only what FIX sessions need, with ECN-based rate control
- FIX protocol parser (`src/protocol/fix.rs`) — Zero-allocation field extraction with fixed-point price arithmetic (sketched below)
- SPSC ring buffer (`src/mem/ring.rs`) — Lock-free cross-core communication (sketched below)
- Object pool (`src/mem/pool.rs`) — Pre-allocated, allocation-free resource management
- Event loop (`src/engine/event_loop.rs`) — Single-threaded busy-polling reactor on an isolated CPU core
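To give a flavor of the zero-copy parsing style, here is a minimal sketch of an in-place TCP header view. The `TcpHeaderView` name and its methods are illustrative, not the actual API in `src/net/`; the point is that every accessor is a bounds check plus a few byte loads over the borrowed frame, with nothing copied or allocated:

```rust
/// Illustrative borrowed view over a TCP header (not the repo's API).
pub struct TcpHeaderView<'a> {
    bytes: &'a [u8], // points into the DMA'd frame; never copied
}

impl<'a> TcpHeaderView<'a> {
    /// Borrow the header in place; reject slices shorter than the
    /// 20-byte minimum TCP header.
    pub fn new(bytes: &'a [u8]) -> Option<Self> {
        (bytes.len() >= 20).then_some(Self { bytes })
    }

    pub fn src_port(&self) -> u16 {
        u16::from_be_bytes([self.bytes[0], self.bytes[1]])
    }

    pub fn seq(&self) -> u32 {
        u32::from_be_bytes(self.bytes[4..8].try_into().unwrap())
    }

    /// Header length in bytes: the data-offset field counts 32-bit words.
    pub fn header_len(&self) -> usize {
        ((self.bytes[12] >> 4) as usize) * 4
    }

    /// The payload is still a slice into the original frame; zero copies.
    pub fn payload(&self) -> &'a [u8] {
        &self.bytes[self.header_len().min(self.bytes.len())..]
    }
}
```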
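The FIX side uses the same discipline. Below is a sketch of zero-allocation tag=value scanning (SOH, byte 0x01, delimits fields); this shows the technique, not the parser in `src/protocol/fix.rs`:

```rust
/// Sketch: iterate the tag=value pairs of a FIX message without
/// allocating. Fields are SOH-delimited; tags are ASCII decimal.
fn fix_fields<'a>(msg: &'a [u8]) -> impl Iterator<Item = (u32, &'a [u8])> + 'a {
    msg.split(|&b| b == 0x01)
        .filter(|field| !field.is_empty())
        .filter_map(|field| {
            let eq = field.iter().position(|&b| b == b'=')?;
            // Fold the tag digits in place; bail on any non-digit byte.
            let tag = field[..eq].iter().try_fold(0u32, |acc, &b| {
                b.is_ascii_digit().then(|| acc * 10 + u32::from(b - b'0'))
            })?;
            Some((tag, &field[eq + 1..]))
        })
}
```

For example, `fix_fields(b"35=D\x0144=101.25\x01")` yields `(35, b"D")` then `(44, b"101.25")`; the values stay borrowed from the frame, which is also why prices go through fixed-point arithmetic rather than string-to-float parsing.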
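And the ring buffer reduces to two atomic indices over a fixed array. A sketch under simplifying assumptions (power-of-two capacity so index wraparound stays correct, `Option` slots instead of `MaybeUninit`), not the code in `src/mem/ring.rs`:

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Sketch of a lock-free SPSC ring: one producer thread calls `push`,
/// one consumer thread calls `pop`. N must be a power of two.
pub struct SpscRing<T, const N: usize> {
    slots: [UnsafeCell<Option<T>>; N],
    head: AtomicUsize, // consumer index, only advanced by `pop`
    tail: AtomicUsize, // producer index, only advanced by `push`
}

// Safe because each side mutates only its own index and the slots it owns.
unsafe impl<T: Send, const N: usize> Sync for SpscRing<T, N> {}

impl<T, const N: usize> SpscRing<T, N> {
    pub fn new() -> Self {
        Self {
            slots: std::array::from_fn(|_| UnsafeCell::new(None)),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side; hands the value back if the ring is full.
    pub fn push(&self, v: T) -> Result<(), T> {
        let tail = self.tail.load(Ordering::Relaxed);
        if tail.wrapping_sub(self.head.load(Ordering::Acquire)) == N {
            return Err(v); // full
        }
        unsafe { *self.slots[tail % N].get() = Some(v) };
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        Ok(())
    }

    /// Consumer side; `None` when the ring is empty.
    pub fn pop(&self) -> Option<T> {
        let head = self.head.load(Ordering::Relaxed);
        if head == self.tail.load(Ordering::Acquire) {
            return None; // empty
        }
        let v = unsafe { (*self.slots[head % N].get()).take() };
        self.head.store(head.wrapping_add(1), Ordering::Release);
        v
    }
}
```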
Build and test:

```
cargo build    # Compile check
cargo test     # Run all 70 unit tests
cargo clippy   # Lint
```

Requires DPDK 22.11+ installed and discoverable via pkg-config:

```
cargo build --release --features dpdk
```

System tuning:

- BIOS: Disable HyperThreading, C-States, P-States. Enable VT-d/IOMMU.
- Kernel boot params: `isolcpus=2-5 nohz_full=2-5 rcu_nocbs=2-5 hugepagesz=1G hugepages=4 iommu=pt intel_iommu=on nosmt transparent_hugepage=never`
- Hugepages: `mount -t hugetlbfs nodev /dev/hugepages -o pagesize=1G`
- NIC binding: `dpdk-devbind.py --bind=vfio-pci <PCI_ADDR>`
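The isolated cores are where the busy-polling reactor lives. In a DPDK build the EAL's `-l` flag pins lcore threads for you; for reference, here is a hypothetical sketch of doing the pinning by hand with the `libc` crate:

```rust
/// Hypothetical sketch: pin the calling thread to one isolated core
/// (core 2 matches the isolcpus=2-5 example above).
fn pin_to_core(core: usize) -> std::io::Result<()> {
    unsafe {
        let mut set: libc::cpu_set_t = std::mem::zeroed();
        libc::CPU_ZERO(&mut set);
        libc::CPU_SET(core, &mut set);
        // pid 0 means "the calling thread"
        if libc::sched_setaffinity(0, std::mem::size_of::<libc::cpu_set_t>(), &set) != 0 {
            return Err(std::io::Error::last_os_error());
        }
    }
    Ok(())
}
```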
Run:

```
sudo ./target/release/hft-tcp -- -l 2-3 -n 4 --proc-type=primary
```

Project layout:

```
src/
├── main.rs # Entry point, EAL init, core launch
├── dpdk/
│ ├── ffi.rs # Raw DPDK FFI bindings (stub when no DPDK)
│ ├── mbuf.rs # Zero-copy mbuf wrapper with Drop guarantee
│ └── port.rs # NIC port configuration
├── net/
│ ├── ethernet.rs # Ethernet frame parsing
│ ├── ip.rs # IPv4 header parsing
│ ├── tcp.rs # TCP state machine + TX engine
│ ├── arp.rs # ARP responder
│ └── checksum.rs # Software checksums (HW offload in production)
├── protocol/
│ └── fix.rs # FIX protocol parser + session state machine
├── engine/
│ └── event_loop.rs # Core busy-polling reactor
└── mem/
    ├── pool.rs          # Fixed-size object pool allocator
    └── ring.rs          # Lock-free SPSC ring buffer
```
What we deliberately omitted:

| Omitted TCP feature | Why it's safe to omit | Residual risk |
|---|---|---|
| Nagle's algorithm | Every message is latency-critical | None |
| Delayed ACKs | We ACK immediately on PSH | Minor bandwidth |
| Slow start | Dedicated link | Microburst at switch |
| Full congestion control | Latency overhead | PFC storms (mitigated by ECN) |
| Out-of-order reassembly | Direct cable | Silent corruption (1 incident in 18 months) |
| TIME_WAIT recycling | Persistent connections | Reconnection delay |
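The ECN mitigation in the congestion-control row is cheap: check the Congestion Experienced codepoint on each received IPv4 header and let the TCP layer's rate control back off when it appears. A minimal sketch of the check (bit layout per RFC 3168; not the repo's API):

```rust
/// ECN occupies the low two bits of the IPv4 DSCP/ECN byte (offset 1);
/// 0b11 is the Congestion Experienced (CE) codepoint per RFC 3168.
const ECN_CE: u8 = 0b11;

fn congestion_experienced(ipv4_header: &[u8]) -> bool {
    ipv4_header.len() > 1 && (ipv4_header[1] & 0b11) == ECN_CE
}
```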
Latency budget:

| Stage | Median | p99 | p99.9 |
|---|---|---|---|
| NIC DMA → mbuf | 35ns | 50ns | 80ns |
| ETH+IP+TCP parse | 15ns | 20ns | 30ns |
| TCP state machine | 18ns | 25ns | 40ns |
| FIX message parse | 82ns | 110ns | 150ns |
| Total wire-to-wire | 687ns | 1128ns | 1875ns |
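Numbers at this scale are usually sampled with the TSC rather than OS clocks. A sketch of the measurement pattern, assuming an invariant TSC on x86_64; the cycles-per-nanosecond constant is a placeholder that must be calibrated on the target host:

```rust
#[cfg(target_arch = "x86_64")]
fn cycles() -> u64 {
    // Production measurement would pair rdtsc with a serializing fence
    // (or use rdtscp); omitted here for brevity.
    unsafe { core::arch::x86_64::_rdtsc() }
}

#[cfg(target_arch = "x86_64")]
fn measure_ns<R>(f: impl FnOnce() -> R) -> (R, f64) {
    const CYCLES_PER_NS: f64 = 3.0; // placeholder: calibrate per host
    let start = cycles();
    let out = f();
    let end = cycles();
    (out, end.wrapping_sub(start) as f64 / CYCLES_PER_NS)
}
```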
License: MIT