High-Frequency FX Statistical Arbitrage using 5-second OHLCV data
This project implements a systematic FX statistical arbitrage engine designed to exploit cross-currency dependencies while maintaining strict market neutrality.
The strategy operates exclusively on 5-second OHLCV data, uses only information available at each bar (causal execution), and achieved a backtest Sharpe ratio of 3.02 with a maximum drawdown of -5.76%.
Foreign exchange pairs exhibit strong structural relationships due to:
- Triangular parity
- Shared macro drivers
- Liquidity transmission
- Order flow imbalances
Temporary breakdowns in these relationships create mean-reverting opportunities exploitable via statistical arbitrage.
This engine builds the following components:
- β-Neutral synthetic spread
- Cointegration-driven mean reversion model
- Volatility-adjusted position sizing system
| Metric | Result |
|---|---|
| Sharpe Ratio | 3.02 |
| Cumulative Log Return | 2.5 |
| Maximum Drawdown | -5.76% |
| Avg Net Exposure | 0.045 |
| Market Neutrality | Enforced |
✔ Robust across multiple FX regimes
✔ Strictly causal
✔ Transaction costs handled externally
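
For reference, metrics like the Sharpe ratio and maximum drawdown above can be computed from a series of per-bar log returns roughly as follows. This is a minimal sketch, not the notebook's code: the function names and the `bars_per_year` annualization factor are assumptions, the risk-free rate is taken as zero, and drawdown is measured on the cumulative log-return curve.

```python
import math
from statistics import mean, stdev

def sharpe_ratio(log_returns, bars_per_year):
    """Annualized Sharpe ratio of per-bar returns (risk-free rate assumed zero)."""
    sd = stdev(log_returns)
    if sd == 0:
        return 0.0
    return mean(log_returns) / sd * math.sqrt(bars_per_year)

def max_drawdown(log_returns):
    """Largest peak-to-trough drop of the cumulative log-return curve (<= 0)."""
    cumulative, peak, worst = 0.0, 0.0, 0.0
    for r in log_returns:
        cumulative += r
        peak = max(peak, cumulative)          # running high-water mark
        worst = min(worst, cumulative - peak)  # deepest drop below it so far
    return worst
```

The annualization factor depends on how many 5-second bars are counted as one trading year, so it should be stated alongside any reported Sharpe ratio.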
- Cointegration test between EUR/USD and GBP/USD
- Engle-Granger framework
- p-value: 0.0087
- Construct β-hedged synthetic spread
Spread definition:
$$
\text{Spread}_t = \text{EURUSD}_t - \beta \cdot \text{GBPUSD}_t
$$
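
As an illustration of the spread definition, the sketch below estimates β as the least-squares slope of EUR/USD on GBP/USD and then builds the spread. It uses only the standard library for self-containment; the function names are hypothetical, and the notebook may estimate β differently (for example, on a rolling window).

```python
def ols_beta(y, x):
    """Slope of the least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    return cov / var

def build_spread(eurusd, gbpusd, beta):
    """Spread_t = EURUSD_t - beta * GBPUSD_t for each bar."""
    return [e - beta * g for e, g in zip(eurusd, gbpusd)]
```

To stay causal, β should be fitted on a window of past bars only and then applied to later bars, never fitted on the same bars it is used to trade.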
Z-score based entry/exit rules:
- Long spread if Z < -Z_threshold
- Short spread if Z > +Z_threshold
- Exit when |Z| < exit_threshold
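
The rules above can be sketched as a small state machine over a rolling z-score. This is an illustrative version under stated assumptions: the window length, the default thresholds, and the function name are hypothetical, and the z-score at each bar is computed against the mean and standard deviation of the preceding `window` bars only, so the current bar never enters its own statistics.

```python
from collections import deque
from statistics import mean, stdev

def zscore_positions(spread, window, entry=2.0, exit_=0.5):
    """Return a spread position in {-1, 0, +1} per bar, using only past data."""
    history = deque(maxlen=window)  # the last `window` spread values before this bar
    position = 0
    positions = []
    for value in spread:
        if len(history) == window:
            sd = stdev(history)
            z = (value - mean(history)) / sd if sd > 0 else 0.0
            if z > entry:
                position = -1   # spread is rich: short it
            elif z < -entry:
                position = 1    # spread is cheap: long it
            elif abs(z) < exit_:
                position = 0    # spread has reverted: exit
            # otherwise hold the existing position
        positions.append(position)
        history.append(value)
    return positions
```

Keeping the exit threshold below the entry threshold creates a holding band, which avoids flipping in and out of a trade while the z-score hovers near the entry level.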
Implemented via:
- β-hedging
- Dollar neutrality
- Volatility-adjusted exposure
- Bounded position sizing [-1, 1]
Average net exposure across backtest: 0.045
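
One possible way to turn a spread position into bounded per-pair positions is sketched below. It is an assumption for illustration, not necessarily the notebook's sizing rule: the spread position is split into a EUR/USD leg and a β-weighted opposite GBP/USD leg, then scaled so that gross exposure never exceeds 1.

```python
def leg_positions(spread_position, beta):
    """Split a spread position in [-1, 1] into per-pair positions with gross exposure <= 1."""
    scale = 1.0 + abs(beta)  # normalizes |EURUSD leg| + |GBPUSD leg| to at most 1
    return {
        "EURUSD": spread_position / scale,
        "GBPUSD": -beta * spread_position / scale,
    }
```

Under this scheme the net exposure (the sum of the two legs) is exactly zero only when β equals 1, and stays small when β is close to 1.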
- Volatility-normalized position sizing
- Max position cap
- Spread stop logic
- Turnover-aware design
- No regime labeling
- No look-ahead bias
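
Volatility-normalized sizing with a position cap can be sketched as follows. The function name, the `target_vol` parameter, and the choice to go flat when volatility is unavailable are assumptions made for illustration.

```python
def vol_scaled_size(signal, realized_vol, target_vol, max_position=1.0):
    """Scale a signal in {-1, 0, +1} by target_vol / realized_vol, capped at max_position."""
    if realized_vol <= 0:
        return 0.0  # no usable volatility estimate: stay flat
    raw = signal * target_vol / realized_vol
    return max(-max_position, min(max_position, raw))
```

To avoid look-ahead bias, `realized_vol` should be estimated from bars that precede the one being traded.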
Built using:
- Polars
- LazyFrames
- Streaming execution
Optimized to:
- Handle high-frequency data efficiently
- Operate under 10GB RAM constraint
- Avoid unnecessary materialization
fx_data.zip: https://indianinstituteofscience-my.sharepoint.com/personal/asrijan_iisc_ac_in/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fasrijan%5Fiisc%5Fac%5Fin%2FDocuments%2Ffx%5Fdata%2Ezip&parent=%2Fpersonal%2Fasrijan%5Fiisc%5Fac%5Fin%2FDocuments&ga=1
| Column | Description |
|---|---|
| utc | Timestamp (UTC) |
| open | Open price |
| close | Close price |
| high | High price |
| low | Low price |
| volumn | Volume (the column name is intentionally spelled `volumn`) |
All data is sampled at fixed 5-second intervals.
The strategy produces: strategy_output.csv
| Column | Description |
|---|---|
| utc | Timestamp (UTC) |
| pair | Currency pair identifier (e.g., EURUSD) |
| position | Target position bounded between -1 and 1 |
| pnl | Incremental PnL for the bar |
- Positions are applied at bar close
- PnL is computed using next bar close
- Multiple market regimes are tested
- Transaction cost modeling handled internally
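
The bar-close convention above can be illustrated with a short sketch. It assumes PnL is measured as position times the change in close price and is recorded on the bar where it is realized; the actual evaluation may use returns instead of price differences, and it also applies transaction costs, which are omitted here.

```python
def bar_pnl(closes, positions):
    """Per-bar PnL: the position set at close t earns close[t+1] - close[t]."""
    pnl = [0.0] * len(closes)
    for t in range(len(closes) - 1):
        # The position decided at bar t is exposed to the move into bar t+1.
        pnl[t + 1] = positions[t] * (closes[t + 1] - closes[t])
    return pnl
```

Because the position at bar `t` is only ever multiplied by a price change that happens after `t`, a signal that peeks at the next close cannot be rewarded by construction.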
- No external datasets
- No future data usage
- No manual regime labeling
- Causal signals only
- Market neutrality maintained
- Valid bounded positions
- No NaNs in output
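
Some of these constraints can be checked mechanically on the output file. The sketch below is a hypothetical validator (the function names are not from the notebook) that covers the bounded-position and no-NaN requirements for the `position` and `pnl` columns.

```python
import csv
import math

def validate_output(rows):
    """Return a list of problems found in strategy output rows (empty means OK)."""
    problems = []
    for i, row in enumerate(rows):
        for col in ("position", "pnl"):
            try:
                value = float(row[col])
            except (KeyError, TypeError, ValueError):
                problems.append(f"row {i}: {col} missing or not numeric")
                continue
            if not math.isfinite(value):
                problems.append(f"row {i}: {col} is NaN or infinite")
            elif col == "position" and abs(value) > 1.0:
                problems.append(f"row {i}: position {value} outside [-1, 1]")
    return problems

def validate_csv(path):
    """Run validate_output over a CSV file with a header row."""
    with open(path, newline="") as f:
        return validate_output(csv.DictReader(f))
```

Calling `validate_csv("strategy_output.csv")` after a run returns an empty list when both checks pass.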
- Cointegration test
- Rolling beta estimation
- Spread construction
- Z-score normalization
- Volatility scaling
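```python
# Entry and exit rules on the spread z-score
if zscore > entry_threshold:
    short_spread()
elif zscore < -entry_threshold:
    long_spread()
elif abs(zscore) < exit_threshold:
    flatten_positions()
# otherwise keep the current position
```

🛠 Technology Stack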
- Python (Pandas, Polars, NumPy)
- Statsmodels
- Jupyter Notebook (StatArb.ipynb)

Repository layout:
- StatArb.ipynb
- strategy_output.csv (generate and save it locally; the file is large, so it is not uploaded here)
- README.md
👨‍💻 Author
Abhigyan Tiwari B.Tech CSE (2nd Year) | Quant Finance Enthusiast | Competitive Programming | ML/DL
