HFT-Kernel-Sandbox: Low-Latency Zero-Copy Infrastructure (v3.1)

A high-performance Linux kernel experiment focused on minimizing user-kernel communication overhead. The project evolved from a basic shared-memory proof of concept into a Lock-Free SPSC Ring Buffer architecture that moves data between kernel and user space without copying ("Zero-Copy") and with low latency variance. Both properties are critical in High-Frequency Trading (HFT) systems, where every nanosecond counts.

The Goal

In HFT, standard Linux I/O operations (like read, write, or recv) introduce significant latency due to context switching and data copying. The objective of this project is to:

Eliminate the Context Switch Tax by moving from blocking read() to asynchronous shared-memory polling.
Optimize for Determinism using a cache-aligned SPSC (Single Producer Single Consumer) Ring Buffer.
Bypass OS Jitter through strict CPU isolation and affinity.

Environment Settings

The project was developed and tested in a strictly controlled sandbox environment:

Host OS: Ubuntu 22.04 (Build Server)
Target Kernel: Linux 5.4.0 (Custom Build)
Architecture: x86_64
Emulation: QEMU (SMP 2, isolcpus=1 enabled)
Toolchain: GCC (Static Linking), BusyBox (Initramfs)
Performance Tracking: Hardware-level rdtsc (Time Stamp Counter) with mfence.
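
The timing primitive named above can be sketched roughly as follows. This is a minimal illustration, assuming GCC or Clang on x86_64; the helper names hft_rdtsc_fenced and hft_cycles_between are hypothetical and are not taken from the repository.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc() and _mm_mfence() on GCC/Clang */

/* Hypothetical helper: issue an mfence, then read the Time Stamp Counter.
 * The mfence makes earlier loads and stores globally visible before the
 * counter is sampled. */
static inline uint64_t hft_rdtsc_fenced(void)
{
    _mm_mfence();
    return __rdtsc();
}

/* Cycles elapsed between two fenced readings. */
static inline uint64_t hft_cycles_between(uint64_t start, uint64_t end)
{
    return end - start;
}
```

A measurement then takes one reading before the event and one after it, and records the difference. Intel's documentation describes mfence followed by lfence as the sequence for making rdtsc wait on all earlier instructions, so adding an lfence may be worth trying if the readings look noisy.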

Implementation Details

1. Custom System Call (Entry 548)

Implemented a control plane in kernel/sys.c to manage the HFT infrastructure.

Mode 555: Allocates physically contiguous memory (the page order is computed with get_order) and registers the /dev/hft device.
Mode 123: Starts the Kernel Packet Simulator (1ms interval timer).
Mode 777: Direct syscall return to measure raw mode-transition overhead.
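
From user space, the control plane above could be driven with a thin wrapper like the one below. The entry number and mode values come from this README; the single-argument calling convention, the macro names, and the function name hft_ctl are assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <errno.h>

/* Numbers from the README; the names are hypothetical. */
#define HFT_SYSCALL_NR  548
#define HFT_MODE_INIT   555  /* allocate the buffer, register /dev/hft */
#define HFT_MODE_START  123  /* start the 1 ms packet simulator */
#define HFT_MODE_NOP    777  /* return immediately (transition overhead) */

/* Hypothetical wrapper, assuming the syscall takes the mode as its only
 * argument. Returns the syscall result, or -1 with errno set. On a kernel
 * without entry 548 the call is expected to fail with ENOSYS. */
static long hft_ctl(long mode)
{
    return syscall(HFT_SYSCALL_NR, mode);
}
```

Timing hft_ctl(HFT_MODE_NOP) in a loop would give the raw user-to-kernel round trip that Mode 777 exists to measure.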

2. SPSC Ring Buffer (The "HFT Path")

A communication channel that lives in memory shared between the kernel and user space:

Lock-Free Design: Uses smp_wmb (Kernel) and mfence (User) memory barriers for synchronization without mutexes.
Cache-Line Alignment: Pointers (head, tail) are aligned to separate 64-byte cache lines to prevent False Sharing, so a write by the Producer (Core 0) to its pointer does not invalidate the cache line holding the pointer of the Consumer (Core 1), and vice versa.
Busy Polling: The User-space engine spins on the head pointer for instantaneous reaction to kernel events.
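
The three properties above can be illustrated with a user-space sketch. It is not the contents of hft_queue.h: the struct layout, the names, and the capacity are hypothetical, and C11 acquire/release atomics stand in for the smp_wmb (kernel) and mfence (user) barriers so that both sides can be compiled as ordinary user-space code.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 1024u   /* assumed capacity; must be a power of two */
#define CACHE_LINE 64

/* Hypothetical layout: head and tail each start their own cache line. */
struct spsc_ring {
    alignas(CACHE_LINE) _Atomic uint32_t head;  /* written by the producer only */
    alignas(CACHE_LINE) _Atomic uint32_t tail;  /* written by the consumer only */
    alignas(CACHE_LINE) uint64_t slots[RING_SLOTS];
};

/* Producer side: returns false when the ring is full. */
static bool spsc_push(struct spsc_ring *r, uint64_t value)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return false;
    r->slots[head & (RING_SLOTS - 1)] = value;
    /* Release store: the slot write becomes visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool spsc_pop(struct spsc_ring *r, uint64_t *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[tail & (RING_SLOTS - 1)];
    /* Release store: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Busy polling then amounts to the consumer calling spsc_pop in a tight loop until it returns true. No syscall or lock is involved on either side.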

3. Blocking I/O (The "Standard Path")

Implemented a wait_queue in the kernel driver to provide a fair comparison against traditional Linux I/O.

Mechanism: User-space calls read(), the kernel puts the process to sleep, and the Timer interrupt executes wake_up() to reschedule the process.
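
The user-space half of this path can be sketched as a timed read(). How lat_test_v0.c actually takes its timestamps is not shown in this README, so the helper below (a hypothetical name, assuming GCC or Clang on x86_64) only illustrates timing one blocking call end to end.

```c
#include <stdint.h>
#include <unistd.h>
#include <x86intrin.h>  /* __rdtsc() and _mm_mfence() on GCC/Clang */

/* Hypothetical helper: time one blocking read() in TSC cycles.
 * Returns the byte count from read(); elapsed cycles are stored in *cycles. */
static ssize_t timed_blocking_read(int fd, void *buf, size_t len, uint64_t *cycles)
{
    _mm_mfence();
    uint64_t start = __rdtsc();
    ssize_t n = read(fd, buf, len);  /* may sleep until data is available */
    _mm_mfence();
    *cycles = __rdtsc() - start;
    return n;
}
```

With fd opened on /dev/hft, the measured interval would include the time spent asleep waiting for the timer, so a comparison with the ring-buffer path needs both tests to timestamp the same event.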

Benchmark Results

Measurements taken inside the QEMU sandbox, 1,000 samples per test. All values are in CPU cycles as read by rdtsc.

Method	P50 (Median)	P99 (Tail)	P99.9 (Extreme)	Architecture
Standard (Blocking)	570,492	965,693	1,383,176	Syscall + Context Switch
v2.0 (Shared Mem)	16,882	183,610	366,469	Pinned + Basic mmap
v3.1 (SPSC Ring)	8,190	11,634	63,280	Pinned + Cache Aligned
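
For reference, percentile columns like these can be derived from raw samples as sketched below. The README does not state which percentile definition lat_plot.py or the test programs use; the nearest-rank method and the function names here are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Ascending comparison for qsort on uint64_t values. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile. permille is 500 for P50, 990 for P99,
 * 999 for P99.9. Sorts samples in place. Requires n > 0 and
 * permille in 1..1000. */
static uint64_t percentile_permille(uint64_t *samples, size_t n, unsigned permille)
{
    qsort(samples, n, sizeof samples[0], cmp_u64);
    size_t rank = (n * permille + 999) / 1000;  /* ceil(n * permille / 1000) */
    return samples[rank - 1];
}
```

Under this definition and with 1,000 samples, P99.9 is the second-largest sample, so that column is a much noisier estimate than P50 or P99.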

Conclusion: Moving from Standard I/O to the v3.1 Ring Buffer reduced median latency by ~70x (570,492 to 8,190 cycles) and P99 tail latency by ~83x (965,693 to 11,634 cycles). These results indicate that eliminating context switches and OS scheduling from the data path is vital for HFT determinism.
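
The reduction factors follow directly from the table. A small check, with the cycle counts copied from the rows above and a hypothetical helper name:

```c
#include <stdint.h>

/* Cycle counts copied from the benchmark table. */
static const uint64_t STD_P50 = 570492, STD_P99 = 965693, STD_P999 = 1383176;
static const uint64_t RING_P50 = 8190, RING_P99 = 11634, RING_P999 = 63280;

/* Ratio of baseline latency to optimized latency. */
static double speedup(uint64_t baseline, uint64_t optimized)
{
    return (double)baseline / (double)optimized;
}
```

speedup(STD_P50, RING_P50) is about 69.7 and speedup(STD_P99, RING_P99) is about 83.0, which matches the rounded figures in the conclusion. The same calculation for P99.9 gives about 21.9.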

Execution Guide

1. Compilation

Kernel:

cd ~/linux
make -j$(nproc) bzImage

User-Space:

# Standard Path
gcc -static lat_test_v0.c -o ./rootfs/bin/lat_v0
# HFT Path (v3.1)
gcc -static -march=x86-64 -I. lat_test_v3.c -o ./rootfs/bin/lat_v3

2. Packaging & Launch

Build Initramfs:

cd ~/hft-sandbox/rootfs
find . -print0 | cpio --null -ov --format=newc | gzip -9 > ../initramfs.cpio.gz

Run QEMU (Crucial for Isolation):

qemu-system-x86_64 \
    -m 1G -smp 2 \
    -kernel ~/linux/arch/x86/boot/bzImage \
    -initrd ~/hft-sandbox/initramfs.cpio.gz \
    -nographic \
    -append "console=ttyS0 nokaslr iomem=relaxed isolcpus=1 nohz_full=1"

3. Testing (Inside QEMU)

# Initialize and Create Node
mknod /dev/hft c 10 60

# Run Standard Baseline
/bin/lat_v0

# Run Optimized HFT Path (Pinned to Core 1)
taskset 0x2 /bin/lat_v3
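
The README pins the process externally with taskset. The same effect can be obtained from inside a program, as in the sketch below. Whether lat_test_v3.c also does this is not stated here, and the helper name pin_to_cpu is hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: pin the calling thread to a single CPU.
 * pin_to_cpu(1) has the same effect as launching under `taskset 0x2`.
 * Returns 0 on success, -1 with errno set on failure. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Pinning only chooses the core. Keeping other tasks off that core still relies on the isolcpus=1 boot parameter passed to QEMU above.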

Performance Results

The final v3.1 implementation achieves a "Vertical Cliff" CDF: latency is tightly clustered, with P99 (11,634 cycles) only about 1.4x the median (8,190 cycles), despite running in a virtualized environment.

