A production-grade C++20 statistical arbitrage market-making system that combines Avellaneda-Stoikov optimal quoting, cointegration-based pair selection, and order flow toxicity detection into a coherent strategy with real-time risk management, realistic execution simulation, and walk-forward backtesting.
This engine unifies three quantitative pillars:
- Avellaneda-Stoikov (2008) optimal market making provides the quoting framework -- the reservation price adjusts for inventory risk, and the optimal spread compensates for adverse selection via an order intensity model calibrated from fill data.
- Cointegration-based signals (Engle-Granger + Johansen) identify mean-reverting pairs, with Kalman-filtered dynamic hedge ratios that track structural changes online. The z-score of the Kalman innovation drives entry/exit timing.
- Microstructure toxicity signals (VPIN, Kyle's lambda, multi-level OFI with Lee-Ready classification) modulate the spread width in real time -- widening quotes when informed trading is detected, reducing adverse selection by an estimated 30%.
The risk management layer enforces position limits, drawdown circuit breakers, fat-finger protection, and rate limiting on every quote, with zero-allocation pre-trade checks completing in under 100ns.
| Operation | Throughput | Latency (p50) |
|---|---|---|
| Add Order | 16.41 M ops/sec | 61 ns |
| Cancel Order | 11.08 M ops/sec | 90 ns |
| Match Order | 4.38 M ops/sec | 228 ns |
| Mixed Workload | 9.35 M ops/sec | 107 ns |
Hardware: AMD Ryzen 7 2700X, single-threaded, no kernel bypass.
graph TB
MD[Market Data<br/>LOBSTER L3] --> RE[ReplayEngine<br/>Deterministic replay]
RE --> OB[OrderBook<br/>16M ops/sec, O(1)]
RE --> SIG[Signal Layer]
subgraph SIG[Signal Generation]
SM[SpreadModel<br/>Kalman-filtered beta]
OFI[Multi-Level OFI<br/>Lee-Ready classification]
VP[VPIN<br/>Volume-sync toxicity]
KL[Kyle Lambda<br/>Price impact]
end
SIG --> AS[StatArbMM<br/>A-S optimal quotes]
AS --> RM[RiskManager<br/>Zero-alloc pre-trade]
RM --> ES[ExecutionSimulator<br/>Latency + queue + AS]
ES --> PNL[PnL Analytics<br/>Sharpe, Sortino, Calmar]
PNL --> WF[Walk-Forward<br/>Overfit ratio]
- Full closed-form: `r = s - q*gamma*sigma^2*tau`, `delta = gamma*sigma^2*tau + (2/gamma)*ln(1+gamma/k)`
- Terminal time degradation (tau decays to 0 at session end)
- Order intensity calibration from fill rate data
- Signal modulation: spread widens with VPIN and Kyle's lambda
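
The closed-form quotes listed above can be sketched in a few lines. This is a minimal illustration, not the code in `StatArbMM.hpp`: the `ASParams` and `Quotes` structs and the `computeQuotes` function are hypothetical names, `delta` is treated as the total bid-ask width split symmetrically around the reservation price, and the VPIN / Kyle's lambda modulation is left out because its functional form is not stated here.

```cpp
#include <cmath>

// Hypothetical parameter bundle (names are illustrative only).
struct ASParams {
    double gamma;  // risk aversion, assumed > 0
    double sigma;  // mid-price volatility per unit time
    double k;      // order-intensity decay, assumed > 0
};

struct Quotes {
    double reservation;  // r
    double spread;       // delta, total bid-ask width
    double bid;
    double ask;
};

// s: mid price, q: signed inventory, tau: time remaining in the session.
inline Quotes computeQuotes(const ASParams& p, double s, double q, double tau) {
    // gamma * sigma^2 * tau appears in both formulas.
    const double risk_term = p.gamma * p.sigma * p.sigma * tau;
    // r = s - q * gamma * sigma^2 * tau
    const double r = s - q * risk_term;
    // delta = gamma * sigma^2 * tau + (2 / gamma) * ln(1 + gamma / k)
    const double delta = risk_term + (2.0 / p.gamma) * std::log1p(p.gamma / p.k);
    return {r, delta, r - 0.5 * delta, r + 0.5 * delta};
}
```

With zero inventory the reservation price equals the mid; a long position (`q > 0`) shifts both quotes down, and as `tau` reaches 0 only the intensity term of the spread remains.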
- 2x2 state-space model for dynamic hedge ratio tracking
- Hand-rolled matrix ops (no Eigen on hot path)
- Z-score uses Kalman innovation variance (principled uncertainty)
- Delta parameter controls adaptation speed (Chan 2013)
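
A hand-rolled filter of the kind described above might look roughly like the following. This sketch assumes the common random-walk formulation in which the state is `(beta, alpha)`, the observation is `y = beta * x + alpha + noise`, and the state noise is `delta / (1 - delta)` times the identity. The class name `KalmanHedge`, the measurement-noise parameter `ve`, and the identity initial covariance are all assumptions, not details taken from `KalmanFilter.hpp`.

```cpp
#include <cmath>

// Hypothetical 2-state Kalman filter for a dynamic hedge ratio.
class KalmanHedge {
public:
    struct Step {
        double innovation;      // y - predicted y
        double innovation_var;  // Q, variance of the innovation
        double z;               // innovation / sqrt(Q)
    };

    double beta = 0.0;   // hedge ratio
    double alpha = 0.0;  // intercept

    KalmanHedge(double delta, double ve) : vw_(delta / (1.0 - delta)), ve_(ve) {}

    Step update(double x, double y) {
        // Predict: random-walk state, covariance inflated by vw on the diagonal.
        const double r00 = p00_ + vw_, r01 = p01_, r10 = p10_, r11 = p11_ + vw_;
        // Observation row is H = [x, 1].
        const double e = y - (beta * x + alpha);
        const double rh0 = r00 * x + r01;          // (R * H^T)[0]
        const double rh1 = r10 * x + r11;          // (R * H^T)[1]
        const double q = x * rh0 + rh1 + ve_;      // H * R * H^T + Ve
        const double k0 = rh0 / q, k1 = rh1 / q;   // Kalman gain
        beta += k0 * e;
        alpha += k1 * e;
        // Covariance update: P = R - K * (H * R).
        const double hr0 = x * r00 + r10, hr1 = x * r01 + r11;
        p00_ = r00 - k0 * hr0;  p01_ = r01 - k0 * hr1;
        p10_ = r10 - k1 * hr0;  p11_ = r11 - k1 * hr1;
        return {e, q, e / std::sqrt(q)};
    }

private:
    double vw_;  // state noise variance derived from delta
    double ve_;  // measurement noise variance
    double p00_ = 1.0, p01_ = 0.0, p10_ = 0.0, p11_ = 1.0;  // state covariance
};
```

The returned `z` is the innovation scaled by its own predicted standard deviation, which is what makes the z-score reflect the filter's current uncertainty rather than a fixed rolling window.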
- Engle-Granger with ADF lag selection via BIC (Schwert 1989)
- Johansen trace and max-eigenvalue tests (bivariate)
- OU parameter estimation via concentrated MLE (Brent's method)
- MacKinnon (1996) response surface p-values
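
For context on the OU estimation step, the likelihood is typically built from the exact Gaussian transition of the process. The notation below is generic and assumed for illustration (mean-reversion speed $\theta$, long-run mean $\mu$, volatility $\sigma$, sampling interval $\Delta$); it may differ from the symbols used in `docs/model_spec.md`.

```latex
dX_t = \theta\,(\mu - X_t)\,dt + \sigma\,dW_t

X_{t+\Delta} \mid X_t \;\sim\; \mathcal{N}\!\left(
    \mu + (X_t - \mu)\,e^{-\theta\Delta},\;
    \frac{\sigma^2}{2\theta}\left(1 - e^{-2\theta\Delta}\right)
\right)
```

For a fixed $\theta$, the maximizing $\mu$ and $\sigma^2$ have closed forms, so substituting them back leaves a one-dimensional objective in $\theta$, which is the kind of problem a bracketing method such as Brent's is suited to.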
- `std::array` indexed by symbolId (no hash maps on hot path)
- Pre-trade checks: position limits, loss limits, drawdown, fat-finger, rate limiting
- Atomic kill switch with auto-activation on drawdown breach
- Typed `RiskCheckResult` enum for audit logging
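
The shape of such a check can be illustrated with a small sketch. Only the `RiskCheckResult` name and the 64-symbol cap come from this README; the enumerator names, the `Limits` struct, and the `PreTradeRisk` class are hypothetical, and loss limits and rate limiting are omitted to keep the example short.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Enumerator names are assumptions made for this sketch.
enum class RiskCheckResult : std::uint8_t {
    Passed, KillSwitchActive, UnknownSymbol, FatFinger, PositionLimit
};

struct Limits {
    std::int64_t max_position;   // absolute per-symbol position cap
    std::int64_t max_order_qty;  // fat-finger threshold per order
    double max_drawdown;         // peak-to-trough equity loss that trips the kill switch
};

class PreTradeRisk {
public:
    static constexpr std::size_t kMaxSymbols = 64;

    explicit PreTradeRisk(Limits limits) : limits_(limits) {}

    // No allocation and no hashing: every lookup is a direct array index.
    RiskCheckResult check(std::uint32_t symbolId, std::int64_t signedQty) const noexcept {
        if (killed_.load(std::memory_order_acquire)) return RiskCheckResult::KillSwitchActive;
        if (symbolId >= kMaxSymbols) return RiskCheckResult::UnknownSymbol;
        const std::int64_t absQty = signedQty < 0 ? -signedQty : signedQty;
        if (absQty > limits_.max_order_qty) return RiskCheckResult::FatFinger;
        const std::int64_t projected = positions_[symbolId] + signedQty;
        if (projected > limits_.max_position || projected < -limits_.max_position)
            return RiskCheckResult::PositionLimit;
        return RiskCheckResult::Passed;
    }

    void onFill(std::uint32_t symbolId, std::int64_t signedQty) noexcept {
        if (symbolId < kMaxSymbols) positions_[symbolId] += signedQty;
    }

    // Tracks peak equity and trips the kill switch on a drawdown breach.
    void onEquity(double equity) noexcept {
        if (equity > peak_equity_) peak_equity_ = equity;
        if (peak_equity_ - equity > limits_.max_drawdown)
            killed_.store(true, std::memory_order_release);
    }

private:
    Limits limits_;
    std::array<std::int64_t, kMaxSymbols> positions_{};
    double peak_equity_ = 0.0;
    std::atomic<bool> killed_{false};
};
```

Returning a typed result rather than a `bool` lets the caller log exactly which check rejected a quote.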
- VPIN (`src/signals/VPIN.hpp`): Volume-synchronized toxicity detection
- Kyle's Lambda (`src/signals/KyleLambda.hpp`): Permanent price impact estimation
- Multi-Level OFI (`src/signals/OFI.hpp`): K-level order flow with Lee-Ready classification
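
To make the volume-synchronized idea concrete, here is a minimal VPIN sketch. It assumes trades arrive already signed as buys or sells by an upstream classifier, fills fixed-size volume buckets, and reports the mean absolute buy/sell imbalance over the most recent buckets as a fraction of bucket volume. The class name `VpinSketch` and its interface are hypothetical and not taken from `VPIN.hpp`; the vector is allocated once at construction.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

class VpinSketch {
public:
    // bucketVolume must be > 0; numBuckets is the rolling window length.
    VpinSketch(double bucketVolume, std::size_t numBuckets)
        : bucket_volume_(bucketVolume), imbalances_(numBuckets, 0.0) {}

    // A single trade may span several buckets, so split it as needed.
    void onTrade(double volume, bool isBuy) {
        while (volume > 0.0) {
            const double room = bucket_volume_ - (buy_ + sell_);
            if (volume >= room) {
                (isBuy ? buy_ : sell_) += room;
                volume -= room;
                closeBucket();
            } else {
                (isBuy ? buy_ : sell_) += volume;
                volume = 0.0;
            }
        }
    }

    bool ready() const { return filled_ == imbalances_.size(); }

    // Sum of |buy - sell| over completed buckets / (bucket count * bucket volume).
    double value() const {
        if (filled_ == 0) return 0.0;
        double sum = 0.0;
        for (double imb : imbalances_) sum += imb;  // unfilled slots are zero
        return sum / (static_cast<double>(filled_) * bucket_volume_);
    }

private:
    void closeBucket() {
        imbalances_[head_] = std::fabs(buy_ - sell_);  // overwrite the oldest slot
        head_ = (head_ + 1) % imbalances_.size();
        if (filled_ < imbalances_.size()) ++filled_;
        buy_ = sell_ = 0.0;
    }

    double bucket_volume_;
    std::vector<double> imbalances_;
    double buy_ = 0.0, sell_ = 0.0;
    std::size_t head_ = 0, filled_ = 0;
};
```

A value near 0 means buy and sell volume were balanced in recent buckets; a value near 1 means flow was almost entirely one-sided, which is the condition under which the spread would be widened.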
- Event-driven (not vectorized) to prevent lookahead bias
- Realistic fill model via `ExecutionSimulator` (latency, queue position, adverse selection)
- Almgren-Chriss transaction cost model (`src/execution/TransactionCosts.hpp`)
- Walk-forward optimization with overfit ratio reporting
| Decision | Why | Tradeoff |
|---|---|---|
| PMR monotonic buffer | 5ns alloc vs 100-500ns malloc | Memory not freed until reset |
| CRTP matching dispatch | Eliminates vtable + enables inlining | Compile-time strategy binding |
| Hand-rolled 2x2 Kalman | No Eigen dependency on hot path | Manual matrix code |
| Fixed arrays in RiskManager | O(1) with no hash, no allocation | Max 64 symbols |
| OU MLE via Brent's method | Less biased than AR(1) OLS | More complex implementation |
| Event-driven backtesting | Prevents lookahead bias | Slower than vectorized |
See docs/design_decisions.md for detailed rationale.
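
As an illustration of the first row of the table, the standard-library PMR facilities can be used roughly as follows. The `Arena` struct and `fillAndCount` function are hypothetical names for this sketch, and the 4 KiB buffer size is arbitrary; how the engine actually sizes and resets its buffers is not shown in this README.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <vector>

// A fixed buffer plus a monotonic resource that carves allocations out of it.
// With null_memory_resource() as upstream, exhausting the buffer throws
// std::bad_alloc instead of silently falling back to the heap.
struct Arena {
    std::array<std::byte, 4096> storage{};
    std::pmr::monotonic_buffer_resource pool{
        storage.data(), storage.size(), std::pmr::null_memory_resource()};
};

// Allocates a vector of n ints from the pool and returns its size.
// Destroying the vector does NOT return memory to the pool; only
// pool.release() makes the buffer reusable.
inline std::size_t fillAndCount(std::pmr::monotonic_buffer_resource& pool, std::size_t n) {
    std::pmr::vector<int> v(&pool);
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<int>(i));
    return v.size();
}
```

Allocation is a pointer bump, which is where the speed comes from; the tradeoff in the table shows up as the need to call `release()` at a known quiescent point, for example between sessions or batches.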
# Configure and build
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
# Run tests
./bin/orderbook_tests
./bin/strategy_tests
./bin/kalman_tests
./bin/cointegration_gtest
./bin/risk_tests
./bin/integration_tests
# Run benchmarks
./bin/orderbook_bench
./bin/strategy_benchmark
# Run demo
./bin/stat_arb_runner
- GoogleTest v1.14.0
- Google Benchmark v1.8.3
- Eigen 3.4.0 (analytics layer only)
src/
├── core/ # Order book engine (16M ops/sec)
│ ├── Order.hpp # 32-byte POD order struct
│ ├── OrderBook.hpp/cpp # O(1) price-indexed book
│ ├── PriceLevel.hpp # FIFO order queue
│ ├── Bitset.hpp # SIMD bitmask for level discovery
│ ├── MatchingStrategy.hpp # CRTP + virtual matching
│ ├── RingBuffer.hpp # Lock-free SPSC queue
│ └── Exchange.hpp # Shard-per-core architecture
├── signals/ # Alpha signal generation
│ ├── SpreadModel.hpp # Rolling z-score + Kalman integration
│ ├── KalmanFilter.hpp # Dynamic hedge ratio (2x2 state-space)
│ ├── OFI.hpp # Multi-level OFI + Lee-Ready classifier
│ ├── VPIN.hpp # Volume-synchronized toxicity
│ └── KyleLambda.hpp # Permanent price impact
├── strategy/
│ └── StatArbMM.hpp # Full Avellaneda-Stoikov with signal modulation
├── risk/
│ └── RiskManager.hpp # Zero-alloc pre-trade risk checks
├── execution/
│ ├── ExecutionSimulator.hpp # Latency + queue + adverse selection
│ └── TransactionCosts.hpp # Almgren-Chriss impact model
├── analytics/
│ ├── PnLAnalytics.hpp # Sharpe, Sortino, Calmar, fill rate
│ ├── CointegrationTests.hpp # EG + Johansen + OU MLE
│ └── OFIValidation.hpp # A/B testing framework
├── backtest/
│ ├── Simulator.hpp # Event-driven backtest engine
│ └── WalkForward.hpp # Walk-forward optimization
└── replay/
├── LobsterParser.hpp # LOBSTER L3 data parser
└── ReplayEngine.hpp # Deterministic replay
tests/cpp/
├── strategy_tests.cpp # A-S formula, time decay, intensity
├── kalman_tests.cpp # Convergence, time-varying beta
├── cointegration_gtest.cpp # ADF size verification (Monte Carlo)
├── risk_tests.cpp # Position limits, kill switch, drawdown
├── integration_tests.cpp # Full pipeline + walk-forward
└── strategy_benchmark.cpp # Hot-path latency (Google Benchmark)
docs/
├── model_spec.md # Full mathematical specification
├── design_decisions.md # Architecture tradeoffs with rationale
├── sensitivity.md # Parameter sensitivity analysis
└── failure_modes.md # What didn't work and why
| Document | Contents |
|---|---|
| Model Specification | HJB derivation, Kalman state-space, cointegration framework, microstructure signals |
| Design Decisions | PMR vs tcmalloc, CRTP vs virtual, hand-rolled vs Eigen, event-driven vs vectorized |
| Sensitivity Analysis | Sharpe vs gamma/k/z-threshold, parameter robustness under perturbation |
| Failure Modes | Cointegration breakdown, flash crash, adverse selection spiral, overfitting |
The test suite includes both software engineering tests (unit, integration) and statistical tests that verify quantitative correctness:
- ADF test size verification: Generate 500 random walk pairs, run Engle-Granger at 5% level, assert rejection rate is in [1%, 15%]. This catches bugs in the ADF implementation that would produce systematically wrong p-values.
- OU MLE parameter recovery: Generate OU process with known theta, verify MLE recovers it within confidence intervals.
- Kalman convergence: Verify filter converges to true beta on synthetic cointegrated data.
- Walk-forward overfit ratio: Out-of-sample Sharpe / in-sample Sharpe should exceed 0.3 for non-trivial strategies.
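
The overfit ratio in the last item can be computed along these lines. This is a sketch under simple assumptions: Sharpe is taken as the per-period mean over the sample standard deviation with no annualization or risk-free rate, and a non-positive in-sample Sharpe is mapped to a ratio of 0. The function names are hypothetical and may not match `WalkForward.hpp`.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Per-period Sharpe ratio: mean / sample standard deviation.
inline double sharpe(const std::vector<double>& returns) {
    const std::size_t n = returns.size();
    if (n < 2) return 0.0;
    double mean = 0.0;
    for (double r : returns) mean += r;
    mean /= static_cast<double>(n);
    double ss = 0.0;
    for (double r : returns) ss += (r - mean) * (r - mean);
    const double var = ss / static_cast<double>(n - 1);
    return var > 0.0 ? mean / std::sqrt(var) : 0.0;
}

// Out-of-sample Sharpe divided by in-sample Sharpe.
inline double overfitRatio(const std::vector<double>& inSample,
                           const std::vector<double>& outOfSample) {
    const double is = sharpe(inSample);
    if (is <= 0.0) return 0.0;  // ratio is not meaningful without a positive in-sample edge
    return sharpe(outOfSample) / is;
}
```

A ratio close to 1 suggests the in-sample performance carried over; a ratio well below the 0.3 threshold suggests the parameters were fitted to noise.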
MIT