driouchab/vol_forecasting

vol_forecasting

5-day-ahead realized volatility forecasting for SPY (2010–2024), with a variance risk premium (VRP) trading strategy.


Overview

This project benchmarks a spectrum of volatility models — from classic HAR regressions to a Neural SDE with fractional Brownian motion noise — on the task of forecasting 5-day annualized realized volatility for the S&P 500 ETF (SPY).

All models are evaluated under strict no-look-ahead conditions: features are lagged by at least one day, walk-forward validation uses only past data for each training window, and the test set is always held out chronologically.
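As a rough illustration of the walk-forward setup described above, the sketch below generates chronological train/test index blocks. The function name, the expanding (rather than rolling) window, and the `gap` parameter are assumptions made for this example and are not taken from the repository. The gap is included because a 5-day-ahead target computed near the end of a training window would otherwise overlap the start of the test block.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_obs: int, min_train: int, test_size: int, gap: int = 0
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    Hypothetical helper: every train index is strictly smaller than every
    test index, and `gap` observations are dropped between the two blocks.
    """
    start = min_train
    while start < n_obs:
        stop = min(start + test_size, n_obs)
        yield range(0, start - gap), range(start, stop)
        start = stop


splits = list(walk_forward_splits(n_obs=12, min_train=6, test_size=3, gap=2))
for train_idx, test_idx in splits:
    print(list(train_idx), "->", list(test_idx))
```

Each model would be refitted on `train_idx` and scored on `test_idx` only, so no test observation ever enters a fit.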


Models

| Model | Description |
|---|---|
| Linear (rolling vols) | OLS on 5d and 20d lagged RV |
| Linear + VIX | OLS adding VIX as an implied-vol signal |
| HAR | Heterogeneous Autoregressive model (Corsi 2009): daily + weekly + monthly RV |
| HAR + VIX | HAR augmented with VIX |
| Advanced linear | OLS on VIX, 20d RV, VRP, VVIX |
| LogHAR | HAR in log-vol space with Jensen-correction back to level |
| ElasticNet (LogHAR) | ElasticNetCV on 8 log-space features + lagged residual AR term |
| GARCH(1,1) | Classical conditional-variance model (via arch) |
| Gaussian Process | Matérn(ν=1.5) kernel; date-based split puts COVID in test set for meaningful uncertainty estimates |
| Random Forest | Walk-forward RF on regime-normalised target (shallow trees, high min_samples_leaf) |
| Neural SDE (fBm) | Milstein-discretised SDE with fractional BM noise (H ≈ 0.134), conditioned on VIX/VRP features |
| VolNet | Deep MLP (4 layers, SiLU, LayerNorm) in log-vol space |
| ShallowVolNet | 2-layer MLP in full log-space — neural analogue of LogHAR |
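To make the HAR and LogHAR rows concrete, here is a minimal sketch on synthetic data. It is not the code in `models/baseline.py`. The helper names, the definition of the target as the mean of the next 5 daily values, the 80/20 split, and the lognormal form of the Jensen correction (multiplying by `exp(0.5 * sigma2)`) are all assumptions made for illustration.

```python
import numpy as np


def har_design(rv: np.ndarray, horizon: int = 5):
    """Hypothetical helper: HAR regressors and a strictly forward target.

    The row for day t uses rv[t-21 .. t]; the target averages
    rv[t+1 .. t+horizon], so features never overlap the target.
    """
    rows, targets = [], []
    for t in range(21, len(rv) - horizon):
        daily = rv[t]
        weekly = rv[t - 4 : t + 1].mean()    # last 5 days
        monthly = rv[t - 21 : t + 1].mean()  # last 22 days
        rows.append([daily, weekly, monthly])
        targets.append(rv[t + 1 : t + 1 + horizon].mean())
    return np.array(rows), np.array(targets)


def add_const(x: np.ndarray) -> np.ndarray:
    """Prepend an intercept column."""
    return np.column_stack([np.ones(len(x)), x])


# Synthetic persistent volatility series, for illustration only.
rng = np.random.default_rng(0)
log_rv = np.zeros(600)
for t in range(1, 600):
    log_rv[t] = 0.95 * log_rv[t - 1] + 0.1 * rng.standard_normal()
rv = 0.15 * np.exp(log_rv)

X, y = har_design(rv)
split = int(0.8 * len(y))
train = slice(0, split - 5)  # drop rows whose target reaches into the test block
test = slice(split, None)

# HAR in levels: plain OLS.
beta, *_ = np.linalg.lstsq(add_const(X[train]), y[train], rcond=None)
pred_har = add_const(X[test]) @ beta

# LogHAR: OLS in log space, then a Jensen correction back to levels,
# assuming roughly Gaussian log-space residuals.
Xl, yl = np.log(X), np.log(y)
beta_log, *_ = np.linalg.lstsq(add_const(Xl[train]), yl[train], rcond=None)
resid = yl[train] - add_const(Xl[train]) @ beta_log
sigma2 = resid.var(ddof=4)
pred_naive = np.exp(add_const(Xl[test]) @ beta_log)
pred_loghar = pred_naive * np.exp(0.5 * sigma2)

print("HAR out-of-sample MSE:   ", np.mean((pred_har - y[test]) ** 2))
print("LogHAR out-of-sample MSE:", np.mean((pred_loghar - y[test]) ** 2))
```

Without the correction, exponentiating a log-space forecast gives something closer to a conditional median than a mean, which biases level forecasts downward.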

Key findings

  • VIX is the single strongest predictor across all horizons (its information coefficient, IC, peaks at the 5–20d horizon), consistent with the VRP literature.
  • Rough volatility: the Hurst exponent of log-RV, estimated via R/S analysis, is H ≈ 0.13. This is well below 0.5 and consistent with the anti-persistent, rough volatility process described by Gatheral et al. (2018).
  • HAR + VIX is a very strong linear baseline and hard to beat on pure R².
  • ElasticNet (LogHAR) improves on plain LogHAR by regularising the 8-feature log-space regression and correcting for residual autocorrelation; adding the AR term alone, without regularisation, does not reliably outperform plain LogHAR.
  • GARCH(1,1) is beaten by the Advanced linear model on MSE; the Diebold-Mariano test confirms this at the 5% level.
  • Neural SDE: the fBm-driven SDE with H < 0.5 noise captures the rough clustering of vol better than one driven by standard Brownian motion; it is competitive with VolNet on R² while offering more interpretable dynamics.
  • VRP strategy: timing vol-selling by the VRP signal (enter when VRP is above its expanding 75th percentile) produces a meaningfully higher Sharpe ratio than the always-short benchmark.
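The entry rule in the last finding can be sketched as follows. This is an illustrative reading of "VRP above its expanding 75th percentile", not the code in `evaluation/strategy.py`. The function name, the `min_history` warm-up, and the choice to compute the threshold from days strictly before `t` are assumptions.

```python
import numpy as np


def vrp_entry_signal(
    vrp: np.ndarray, q: float = 75.0, min_history: int = 20
) -> np.ndarray:
    """Hypothetical helper: True where VRP exceeds its expanding q-th percentile.

    The threshold for day t is computed from vrp[:t] only, so it never
    depends on same-day or future observations.
    """
    signal = np.zeros(len(vrp), dtype=bool)
    for t in range(min_history, len(vrp)):
        threshold = np.percentile(vrp[:t], q)
        signal[t] = vrp[t] > threshold
    return signal


# A steadily rising VRP always sits above its own past 75th percentile,
# while a steadily falling one never does.
rising = vrp_entry_signal(np.arange(30.0))
falling = vrp_entry_signal(np.arange(30.0)[::-1])
print(rising.sum(), falling.sum())
```

The always-short benchmark corresponds to a signal that is True on every day, so the two strategies differ only in which days carry a position.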

Project structure

vol_forecasting/
├── config.py                  # All hyperparameters in one place
├── main.py                    # End-to-end pipeline
├── requirements.txt
│
├── data/
│   └── loader.py              # yfinance download (SPY, VIX, VVIX)
│
├── features/
│   └── engineering.py         # Feature construction + feature-set registries
│
├── models/
│   ├── baseline.py            # Linear, HAR, LogHAR, ElasticNet
│   ├── garch_model.py         # GARCH(1,1) via arch
│   ├── probabilistic.py       # Gaussian Process + Random Forest
│   ├── neural_sde.py          # Neural SDE with fBm noise
│   └── vol_net.py             # VolNet + ShallowVolNet
│
└── evaluation/
    ├── metrics.py             # Walk-forward R², DM test, IC by horizon, Hurst
    └── strategy.py            # VRP volatility-selling strategy

Quickstart

pip install -r requirements.txt

# Full run including neural models (~10 min on CPU)
python main.py

# Fast run, skip neural models (~30 sec)
python main.py --no-neural

Features

All features are lagged by ≥1 day to prevent look-ahead bias.

| Feature | Description |
|---|---|
| rolling_vol_20_lag | 20-day realized vol, lagged 1d |
| rolling_vol_5_lag | 5-day realized vol, lagged 1d |
| abs_return_lag | \|r_{t-1}\| × √252 |
| vix_lag | VIX / 100, lagged 1d |
| vvix_lag | VVIX / 100, lagged 1d |
| vrp | VIX − 20d RV (variance risk premium proxy) |
| RV_daily/weekly/monthly | HAR components (1d, 5d, 22d lagged RV) |
| vol_5_ratio | 5d RV / 20d RV (regime indicator) |
| log_RV_* | Log-transformed HAR components |
| log_vix, log_vvix | Log implied vol |
| momentum_1w | 5-day cumulative log return, lagged |
| log_volume_diff | First difference of log(volume), lagged |
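A minimal sketch of how a few of these lagged features might be built with pandas is shown below. It is not the code in `features/engineering.py`. The function name, the use of close-to-close log returns with a rolling standard deviation as the realized-vol estimate, and the use of lagged inputs for `vrp` are assumptions made for this example, and the data is synthetic.

```python
import numpy as np
import pandas as pd


def build_features(close: pd.Series, vix: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: a subset of the lagged features in the table."""
    ann = np.sqrt(252)
    log_ret = np.log(close).diff()
    rv5 = log_ret.rolling(5).std() * ann    # assumed realized-vol estimator
    rv20 = log_ret.rolling(20).std() * ann

    feats = pd.DataFrame(index=close.index)
    # shift(1) means the row for day t only contains information up to t-1.
    feats["rolling_vol_5_lag"] = rv5.shift(1)
    feats["rolling_vol_20_lag"] = rv20.shift(1)
    feats["abs_return_lag"] = log_ret.abs().shift(1) * ann
    feats["vix_lag"] = (vix / 100.0).shift(1)
    feats["vrp"] = feats["vix_lag"] - feats["rolling_vol_20_lag"]
    feats["vol_5_ratio"] = feats["rolling_vol_5_lag"] / feats["rolling_vol_20_lag"]
    return feats.dropna()


# Synthetic prices and VIX levels, for illustration only.
rng = np.random.default_rng(1)
dates = pd.bdate_range("2020-01-01", periods=60)
close = pd.Series(100 * np.exp(np.cumsum(0.01 * rng.standard_normal(60))), index=dates)
vix = pd.Series(20 + rng.standard_normal(60), index=dates)

feats = build_features(close, vix)
print(feats.tail(3))
```

A quick look-ahead check is to perturb the last close and confirm that the last feature row does not change, since every column is shifted by one day.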

References

  • Corsi, F. (2009). A Simple Approximate Long-Memory Model of Realized Volatility. Journal of Financial Econometrics.
  • Gatheral, J., Jaisson, T., & Rosenbaum, M. (2018). Volatility is Rough. Quantitative Finance.
  • Bollerslev, T., Tauchen, G., & Zhou, H. (2009). Expected Stock Returns and Variance Risk Premia. Review of Financial Studies.
  • Diebold, F. X., & Mariano, R. S. (1995). Comparing Predictive Accuracy. Journal of Business & Economic Statistics.
