Userspace TCP stack in Rust with DPDK kernel bypass for high-frequency trading — minimal, allocation-free TCP handler for FIX sessions with sub-2μs wire-to-wire latency

luishsr/hft-kernel-bypass

hft-tcp

Userspace TCP stack in Rust with DPDK kernel bypass, built for high-frequency trading.

This is the companion code for the article "Userspace TCP in Rust with DPDK for High-Frequency Trading". It implements a minimal, allocation-free TCP handler designed for long-lived FIX sessions to exchange gateways, achieving sub-2μs wire-to-wire latency.

This is a case study in extremes, not a general-purpose TCP library. See the article for the full decision framework on when (and when not) to build a custom network stack.

Architecture

NIC (ConnectX-5)
    │ DMA → hugepage mbufs
    ▼
┌─────────────────────────┐
│  DPDK Poll-Mode Driver  │  Zero-interrupt packet I/O
├─────────────────────────┤
│  Zero-Copy Parsing      │  ETH → IP → TCP headers parsed in-place (~15ns)
├─────────────────────────┤
│  TCP State Machine      │  Minimal: SYN/ACK/FIN/RST + ECN rate control
├─────────────────────────┤
│  FIX Protocol Parser    │  Zero-allocation tag-value parsing (~80ns)
├─────────────────────────┤
│  Strategy Engine        │  Your trading logic here
└─────────────────────────┘
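
To make the "parsed in-place" layer of the diagram concrete, here is a minimal sketch of walking ETH → IP → TCP headers over a borrowed byte slice without copying or allocating. It is not the code in src/net/: the names `TcpView` and `parse_tcp_frame` are hypothetical, and the sketch assumes untagged Ethernet II frames (no VLAN) carrying IPv4. It uses bounds-checked reads rather than the pointer casts a production path might prefer.

```rust
/// Borrowed view of one TCP segment. `payload` points into the original frame.
#[derive(Debug, PartialEq)]
pub struct TcpView<'a> {
    pub src_port: u16,
    pub dst_port: u16,
    pub seq: u32,
    pub ack: u32,
    pub flags: u8,
    pub payload: &'a [u8],
}

const ETH_HDR_LEN: usize = 14; // 6 dst MAC + 6 src MAC + 2 EtherType
const ETHERTYPE_IPV4: u16 = 0x0800;
const IPPROTO_TCP: u8 = 6;

/// Read a big-endian u16 at `off`, or None if the slice is too short.
fn be16(b: &[u8], off: usize) -> Option<u16> {
    let s = b.get(off..off + 2)?;
    Some(u16::from_be_bytes([s[0], s[1]]))
}

/// Read a big-endian u32 at `off`, or None if the slice is too short.
fn be32(b: &[u8], off: usize) -> Option<u32> {
    let s = b.get(off..off + 4)?;
    Some(u32::from_be_bytes([s[0], s[1], s[2], s[3]]))
}

/// Parse Ethernet + IPv4 + TCP headers in place. Returns None for anything
/// that is not a well-formed IPv4/TCP frame.
pub fn parse_tcp_frame(frame: &[u8]) -> Option<TcpView<'_>> {
    if be16(frame, 12)? != ETHERTYPE_IPV4 {
        return None;
    }
    let ip = frame.get(ETH_HDR_LEN..)?;
    let ver_ihl = *ip.first()?;
    if ver_ihl >> 4 != 4 {
        return None;
    }
    let ihl = usize::from(ver_ihl & 0x0f) * 4; // IPv4 header length in bytes
    if ihl < 20 || *ip.get(9)? != IPPROTO_TCP {
        return None;
    }
    // Total Length trims any Ethernet padding that follows the IP packet.
    let total_len = usize::from(be16(ip, 2)?);
    let tcp = ip.get(..total_len)?.get(ihl..)?;
    let data_off = usize::from(*tcp.get(12)? >> 4) * 4; // TCP header length
    if data_off < 20 {
        return None;
    }
    Some(TcpView {
        src_port: be16(tcp, 0)?,
        dst_port: be16(tcp, 2)?,
        seq: be32(tcp, 4)?,
        ack: be32(tcp, 8)?,
        flags: *tcp.get(13)?,
        payload: tcp.get(data_off..)?,
    })
}
```

The returned `payload` slice can be handed straight to the FIX layer, so the frame bytes are never copied between layers.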

Key components:

  • DPDK FFI bindings (src/dpdk/) — Raw NIC access via poll-mode drivers
  • Zero-copy packet parsing (src/net/) — Ethernet, IPv4, TCP, ARP headers parsed by casting byte slices
  • Minimal TCP state machine (src/net/tcp.rs) — Handles only what FIX sessions need, with ECN-based rate control
  • FIX protocol parser (src/protocol/fix.rs) — Zero-allocation field extraction with fixed-point price arithmetic
  • SPSC ring buffer (src/mem/ring.rs) — Lock-free cross-core communication
  • Object pool (src/mem/pool.rs) — Pre-allocated, allocation-free resource management
  • Event loop (src/engine/event_loop.rs) — Single-threaded busy-polling reactor on isolated CPU core
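
As an illustration of the FIX parser bullet above, the following sketch shows one way to extract tag-value fields without allocating and to convert a price into a fixed-point integer. It is not the code in src/protocol/fix.rs: the names `fields`, `FixFields`, and `parse_price` are hypothetical, and the scale of 8 decimal digits is an assumption chosen for the example.

```rust
const SOH: u8 = 0x01; // FIX field delimiter

/// Iterator over `tag=value<SOH>` fields; yields slices borrowed from the message.
pub struct FixFields<'a> {
    rest: &'a [u8],
}

pub fn fields(msg: &[u8]) -> FixFields<'_> {
    FixFields { rest: msg }
}

impl<'a> Iterator for FixFields<'a> {
    type Item = (u32, &'a [u8]);

    fn next(&mut self) -> Option<Self::Item> {
        let rest: &'a [u8] = self.rest;
        // Stops at the end of input or at a field with no SOH terminator.
        let end = rest.iter().position(|&b| b == SOH)?;
        let (field, tail) = rest.split_at(end);
        self.rest = &tail[1..]; // skip the SOH byte
        let eq = field.iter().position(|&b| b == b'=')?;
        if eq == 0 {
            return None; // empty tag
        }
        let mut tag = 0u32;
        for &b in &field[..eq] {
            if !b.is_ascii_digit() {
                return None;
            }
            tag = tag.checked_mul(10)?.checked_add(u32::from(b - b'0'))?;
        }
        Some((tag, &field[eq + 1..]))
    }
}

/// Assumed scale for this sketch: prices are stored as integer multiples of 10^-8.
pub const PRICE_SCALE_DIGITS: u32 = 8;

/// Parse an ASCII decimal such as `101.25` into a scaled i64 without using floats.
pub fn parse_price(raw: &[u8]) -> Option<i64> {
    let (neg, digits) = match raw.split_first()? {
        (&b'-', rest) => (true, rest),
        _ => (false, raw),
    };
    let mut value: i64 = 0;
    let mut frac_digits: Option<u32> = None; // Some(n) once '.' has been seen
    let mut seen_digit = false;
    for &b in digits {
        match b {
            b'0'..=b'9' => {
                if let Some(n) = frac_digits.as_mut() {
                    if *n == PRICE_SCALE_DIGITS {
                        return None; // more precision than the scale can hold
                    }
                    *n += 1;
                }
                value = value.checked_mul(10)?.checked_add(i64::from(b - b'0'))?;
                seen_digit = true;
            }
            b'.' if frac_digits.is_none() => frac_digits = Some(0),
            _ => return None,
        }
    }
    if !seen_digit {
        return None;
    }
    // Pad with zeros up to the fixed scale.
    for _ in frac_digits.unwrap_or(0)..PRICE_SCALE_DIGITS {
        value = value.checked_mul(10)?;
    }
    Some(if neg { -value } else { value })
}
```

With these helpers, looking up tag 44 (Price) in a message is `fields(msg).find(|&(tag, _)| tag == 44).and_then(|(_, v)| parse_price(v))`. Keeping prices as integers avoids both floating-point rounding and the cost of float parsing on the hot path.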

Building

Without DPDK (development and testing)

cargo build          # Compile check
cargo test           # Run all 70 unit tests
cargo clippy         # Lint

With DPDK (production)

Requires DPDK 22.11+ installed and discoverable via pkg-config:

cargo build --release --features dpdk

Prerequisites for production deployment

  1. BIOS: Disable HyperThreading, C-States, P-States. Enable VT-d/IOMMU.
  2. Kernel boot params:
    isolcpus=2-5 nohz_full=2-5 rcu_nocbs=2-5 hugepagesz=1G hugepages=4
    iommu=pt intel_iommu=on nosmt transparent_hugepage=never
    
  3. Hugepages: mount -t hugetlbfs nodev /dev/hugepages -o pagesize=1G
  4. NIC binding: dpdk-devbind.py --bind=vfio-pci <PCI_ADDR>

Running

sudo ./target/release/hft-tcp -- -l 2-3 -n 4 --proc-type=primary

Project Structure

src/
├── main.rs                    # Entry point, EAL init, core launch
├── dpdk/
│   ├── ffi.rs                 # Raw DPDK FFI bindings (stub when no DPDK)
│   ├── mbuf.rs                # Zero-copy mbuf wrapper with Drop guarantee
│   └── port.rs                # NIC port configuration
├── net/
│   ├── ethernet.rs            # Ethernet frame parsing
│   ├── ip.rs                  # IPv4 header parsing
│   ├── tcp.rs                 # TCP state machine + TX engine
│   ├── arp.rs                 # ARP responder
│   └── checksum.rs            # Software checksums (HW offload in production)
├── protocol/
│   └── fix.rs                 # FIX protocol parser + session state machine
├── engine/
│   └── event_loop.rs          # Core busy-polling reactor
└── mem/
    ├── pool.rs                # Fixed-size object pool allocator
    └── ring.rs                # Lock-free SPSC ring buffer
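
For readers unfamiliar with the structure behind mem/ring.rs, here is a minimal sketch of a lock-free single-producer/single-consumer ring. It is not the repository's implementation: `SpscRing`, `Producer`, and `Consumer` are hypothetical names, the capacity is assumed to be a power of two, and cache-line padding of the two indices (which a latency-sensitive version would likely add) is left out for brevity.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicUsize, Ordering};

pub struct SpscRing<T, const N: usize> {
    slots: [UnsafeCell<MaybeUninit<T>>; N],
    head: AtomicUsize, // next slot to read; written only by the consumer
    tail: AtomicUsize, // next slot to write; written only by the producer
}

// Safe to share because `split` hands out exactly one producer and one consumer.
unsafe impl<T: Send, const N: usize> Sync for SpscRing<T, N> {}

pub struct Producer<'a, T, const N: usize>(&'a SpscRing<T, N>);
pub struct Consumer<'a, T, const N: usize>(&'a SpscRing<T, N>);

impl<T, const N: usize> SpscRing<T, N> {
    pub fn new() -> Self {
        assert!(N.is_power_of_two(), "capacity must be a power of two");
        Self {
            slots: std::array::from_fn(|_| UnsafeCell::new(MaybeUninit::uninit())),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// The `&mut self` borrow guarantees only one producer/consumer pair exists.
    pub fn split(&mut self) -> (Producer<'_, T, N>, Consumer<'_, T, N>) {
        let this: &Self = self;
        (Producer(this), Consumer(this))
    }
}

impl<T, const N: usize> Producer<'_, T, N> {
    /// Returns the value back as Err if the ring is full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        let ring = self.0;
        let tail = ring.tail.load(Ordering::Relaxed);
        let head = ring.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == N {
            return Err(value);
        }
        // The slot is free: the consumer has already moved past it.
        unsafe { (*ring.slots[tail & (N - 1)].get()).write(value) };
        // Release publishes the slot contents before the new tail is visible.
        ring.tail.store(tail.wrapping_add(1), Ordering::Release);
        Ok(())
    }
}

impl<T, const N: usize> Consumer<'_, T, N> {
    pub fn pop(&mut self) -> Option<T> {
        let ring = self.0;
        let head = ring.head.load(Ordering::Relaxed);
        let tail = ring.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        // The slot was initialised by the producer before it advanced `tail`.
        let value = unsafe { (*ring.slots[head & (N - 1)].get()).assume_init_read() };
        ring.head.store(head.wrapping_add(1), Ordering::Release);
        Some(value)
    }
}

impl<T, const N: usize> Drop for SpscRing<T, N> {
    fn drop(&mut self) {
        // Drop any items that were pushed but never popped.
        let (mut head, tail) = (*self.head.get_mut(), *self.tail.get_mut());
        while head != tail {
            unsafe { self.slots[head & (N - 1)].get_mut().assume_init_drop() };
            head = head.wrapping_add(1);
        }
    }
}
```

Each index is written by exactly one side, so no compare-and-swap loop is needed: a push or pop costs two atomic loads and one atomic store.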

What's deliberately omitted

| Feature | Why | Risk |
|---|---|---|
| Nagle's algorithm | Every message is latency-critical | None |
| Delayed ACKs | We ACK immediately on PSH | Minor bandwidth |
| Slow start | Dedicated link | Microburst at switch |
| Full congestion control | Latency overhead | PFC storms (mitigated by ECN) |
| Out-of-order reassembly | Direct cable | Silent corruption (1 incident in 18 months) |
| TIME_WAIT recycling | Persistent connections | Reconnection delay |
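
To show how small a receive path becomes once these features are dropped, here is a hedged sketch of a client-side state machine that handles only SYN/ACK/FIN/RST and ACKs immediately on PSH. It is not the code in src/net/tcp.rs: every type and method name is hypothetical, ECN rate control and the TX path are not modelled, and the handling of an out-of-sequence segment (drop it and re-send an ACK for the expected byte) is an assumption made for the example.

```rust
pub const FIN: u8 = 0x01;
pub const SYN: u8 = 0x02;
pub const RST: u8 = 0x04;
pub const PSH: u8 = 0x08;
pub const ACK: u8 = 0x10;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    SynSent,
    Established,
    Closed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Ignore,
    SendAck,
}

/// The header fields the state machine needs from one received segment.
pub struct Segment {
    pub seq: u32,
    pub ack: u32,
    pub flags: u8,
    pub payload_len: u32,
}

pub struct Conn {
    pub state: State,
    pub snd_nxt: u32, // next sequence number we will send
    pub rcv_nxt: u32, // next sequence number we expect to receive
}

impl Conn {
    /// Active open: the caller has just sent a SYN with sequence number `iss`.
    pub fn connect(iss: u32) -> Self {
        Conn { state: State::SynSent, snd_nxt: iss.wrapping_add(1), rcv_nxt: 0 }
    }

    pub fn on_segment(&mut self, s: &Segment) -> Action {
        if s.flags & RST != 0 {
            self.state = State::Closed;
            return Action::Ignore;
        }
        match self.state {
            State::SynSent => {
                let syn_ack = s.flags & (SYN | ACK) == (SYN | ACK);
                if syn_ack && s.ack == self.snd_nxt {
                    self.rcv_nxt = s.seq.wrapping_add(1); // SYN consumes one sequence number
                    self.state = State::Established;
                    Action::SendAck
                } else {
                    Action::Ignore
                }
            }
            State::Established => {
                if s.seq != self.rcv_nxt {
                    // No reassembly queue: drop the segment and repeat our ACK (assumption).
                    return Action::SendAck;
                }
                self.rcv_nxt = self.rcv_nxt.wrapping_add(s.payload_len);
                if s.flags & FIN != 0 {
                    self.rcv_nxt = self.rcv_nxt.wrapping_add(1); // FIN consumes one too
                    self.state = State::Closed;
                    return Action::SendAck;
                }
                // No delayed-ACK timer: acknowledge as soon as the peer pushes data.
                if s.payload_len > 0 && s.flags & PSH != 0 {
                    Action::SendAck
                } else {
                    Action::Ignore
                }
            }
            State::Closed => Action::Ignore,
        }
    }
}
```

There are no timers, no reassembly queue, and no congestion window in this sketch, which is the trade the table describes: less work per segment in exchange for the listed risks.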

Performance (measured)

| Stage | Median | p99 | p99.9 |
|---|---|---|---|
| NIC DMA → mbuf | 35ns | 50ns | 80ns |
| ETH+IP+TCP parse | 15ns | 20ns | 30ns |
| TCP state machine | 18ns | 25ns | 40ns |
| FIX message parse | 82ns | 110ns | 150ns |
| Total wire-to-wire | 687ns | 1128ns | 1875ns |
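
For anyone reproducing figures in this shape, the sketch below shows one common way to reduce raw per-message latency samples to a median, p99, and p99.9 using the nearest-rank method. How the numbers above were actually collected is not described here; `percentile_ns`, `summarize`, and `Summary` are hypothetical names, and expressing the percentile in tenths of a percent is a choice made to keep the arithmetic in integers.

```rust
/// Nearest-rank percentile over samples sorted ascending (nanoseconds).
/// `permille` is the percentile in tenths of a percent:
/// 500 = median, 990 = p99, 999 = p99.9.
pub fn percentile_ns(sorted: &[u64], permille: u64) -> Option<u64> {
    if sorted.is_empty() || permille > 1000 {
        return None;
    }
    let n = sorted.len() as u64;
    // rank = ceil(permille / 1000 * n), clamped so that it is at least 1.
    let rank = ((permille * n + 999) / 1000).max(1);
    sorted.get((rank - 1) as usize).copied()
}

#[derive(Debug, PartialEq)]
pub struct Summary {
    pub median: u64,
    pub p99: u64,
    pub p999: u64,
}

/// Sorts the samples in place (no allocation) and extracts the three columns.
pub fn summarize(samples: &mut [u64]) -> Option<Summary> {
    samples.sort_unstable();
    Some(Summary {
        median: percentile_ns(samples, 500)?,
        p99: percentile_ns(samples, 990)?,
        p999: percentile_ns(samples, 999)?,
    })
}
```

Integer arithmetic sidesteps floating-point rounding at the tail, where an off-by-one in the rank would select a different sample.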

License

MIT
