`eqr` is a Python package for tail regression based on extreme value theory (EVT). It estimates:
- Extreme conditional quantiles $Q_{Y \mid X=x}(\tau)$ for $\tau \to 1$
- Conditional exceedance probabilities $P(Y > y^* \mid X=x)$

The package implements a two-stage peaks-over-threshold workflow:
- Fit an intermediate conditional quantile model $q_0(x) = Q_{Y \mid X=x}(\tau_0)$
- Fit a conditional generalized Pareto distribution (GPD) model on exceedances $z = y - q_0(x)$
- Extrapolate tail quantities from the fitted GPD

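The extrapolation step can be illustrated with the standard peaks-over-threshold identities: for $\tau > \tau_0$, the extreme quantile is $q_0(x)$ plus the GPD quantile at level $1 - (1-\tau)/(1-\tau_0)$, and the exceedance probability is $(1-\tau_0)$ times the GPD survival function at $y^* - q_0(x)$. The sketch below assumes the GPD scale `sigma` and shape `xi` have already been estimated at a given `x`. The function names are illustrative and are not part of the `eqr` API.

```python
import numpy as np
from scipy.stats import genpareto


def gpd_tail_quantile(q0, sigma, xi, tau0, tau):
    """Extrapolate Q(tau) for tau > tau0 from a GPD fitted above q0."""
    # Probability level of the target quantile within the exceedance distribution.
    p = 1.0 - (1.0 - tau) / (1.0 - tau0)
    return q0 + genpareto.ppf(p, c=xi, scale=sigma)


def gpd_exceedance_probability(q0, sigma, xi, tau0, y_star):
    """P(Y > y_star | X = x) for y_star >= q0, using the fitted GPD tail."""
    return (1.0 - tau0) * genpareto.sf(y_star - q0, c=xi, scale=sigma)


# Illustrative (made-up) parameter values at a single covariate point.
q0, sigma, xi, tau0 = 3.0, 1.0, 0.2, 0.95
q999 = gpd_tail_quantile(q0, sigma, xi, tau0, tau=0.999)
# Round trip: the probability of exceeding the 0.999 quantile is 1 - 0.999.
p_exceed = gpd_exceedance_probability(q0, sigma, xi, tau0, y_star=q999)
```

Both functions accept NumPy arrays, so per-observation values of `q0`, `sigma`, and `xi` broadcast naturally.
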

Features:
- Pure-Python implementation with NumPy, pandas, SciPy, and scikit-learn
- Multiple tail-model backends behind a shared workflow
- Cross-validated likelihood and deviance tuning
- Python API and command-line interface
- YAML/JSON experiment runner for reproducible batch runs

Available tail-model backends:
- ERF: extremal random forest with local weighted likelihood
- EGAM: spline-additive EV-GAM-style tail model
- GBEX: gradient boosting for GPD deviance minimization
- EQRN: neural tail model implemented with PyTorch (optional dependency)

Install the base package from the repository root:

    pip install -e .

Install with the optional neural model:

    pip install -e ".[torch]"

Install development dependencies:

    pip install -e ".[dev,torch]"

Python API example:

    import numpy as np
    from eqr import (
        GradientBoostedExceedances,
        fit_two_stage_tail_model,
        grid_search_tail_model_cv,
        predict_exceedance_probabilities,
        predict_extreme_quantiles,
    )

    # Synthetic data: the response depends on the first feature only.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((3000, 10))
    y = 2.0 * X[:, 0] + 0.5 * rng.standard_normal(3000)

    # Intermediate quantile level that defines the threshold q0(x).
    tau0 = 0.95

    # Hyperparameter grid for the GBEX tail model.
    param_grid = {
        "n_estimators": [100, 200],
        "learning_rate": [0.05, 0.1],
        "max_depth": [2, 3],
        "min_samples_leaf": [30, 50],
        "constant_xi": [True, False],
        "add_q0_feature": [True],
    }

    # Settings for the intermediate quantile model.
    intermediate_params = {
        "n_estimators": 400,
        "min_samples_leaf": 25,
        "max_features": "sqrt",
        "random_state": 0,
        "n_jobs": -1,
    }

    # Cross-validated search over the tail-model grid, scored by nll_mean.
    grid = grid_search_tail_model_cv(
        model_class=GradientBoostedExceedances,
        param_grid=param_grid,
        X=X,
        y=y,
        tau0=tau0,
        intermediate_params=intermediate_params,
        n_splits=5,
        score="nll_mean",
        seed=0,
        verbose=1,
    )

    # Fit both stages on the full data with the selected tail parameters.
    two_stage = fit_two_stage_tail_model(
        X=X,
        y=y,
        tau0=tau0,
        intermediate_params=intermediate_params,
        tail_model_class=GradientBoostedExceedances,
        tail_params=grid.best_params,
    )

    # Extrapolate tail quantities at new covariate values.
    X_test = rng.standard_normal((200, 10))
    quantiles = predict_extreme_quantiles(two_stage, X_test, taus=[0.99, 0.999])
    prob_gt_10 = predict_exceedance_probabilities(two_stage, X_test, y_star=10.0)

Installing the package exposes the `eqr` CLI.

    eqr grid-search \
      --model gbex \
      --data data/mydata.csv \
      --target y \
      --tau0 0.95 \
      --n-splits 5 \
      --outdir results/grid_search

Artifacts written to `results/grid_search` include:
- `gbex_cv_results.csv`
- `gbex_best_params.json`
- `gbex_fold_ids.npy`
- `gbex_meta.json`

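The artifacts listed above can be read back for inspection with standard tools. The helper below is a minimal sketch and is not part of `eqr`. It assumes only the file names shown above and makes no assumption about the columns or keys inside each file. The name `load_grid_search_artifacts` is hypothetical.

```python
import json
from pathlib import Path

import numpy as np
import pandas as pd


def load_grid_search_artifacts(outdir, model="gbex"):
    """Load the grid-search output files for one model into a dictionary."""
    outdir = Path(outdir)
    with open(outdir / f"{model}_best_params.json") as fh:
        best_params = json.load(fh)
    with open(outdir / f"{model}_meta.json") as fh:
        meta = json.load(fh)
    return {
        "best_params": best_params,
        "meta": meta,
        "cv_results": pd.read_csv(outdir / f"{model}_cv_results.csv"),
        "fold_ids": np.load(outdir / f"{model}_fold_ids.npy"),
    }
```

For example, `load_grid_search_artifacts("results/grid_search")["best_params"]` returns the tuned parameters as a plain dictionary.
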
Generate predictions with the tuned parameters:

    eqr predict \
      --model gbex \
      --train data/mydata.csv \
      --tail-params results/grid_search/gbex_best_params.json \
      --tau0 0.95 \
      --taus 0.99,0.999 \
      --y-star 10.0 \
      --outdir results/predictions

This command writes prediction and metadata files to `results/predictions`.

    eqr run --config examples/experiment_config.yaml

The experiment runner creates a timestamped output directory such as:

    results/experiments/<dataset_name>/<timestamp>_<optional_name>/
      config_original.yaml
      config_resolved.json
      env.json
      grid_search/<model>/
      summary/ranking.csv
      models/<model>/
      predictions/<model>/

The experiment runner accepts YAML or JSON configuration files. See
examples/experiment_config.yaml for a template.

- Likelihood-based scores are computed on exceedances only, where `y > q0`
- During tuning, training-set `q0` values are computed out-of-bag within each fold
- The ERF tail model depends on the intermediate forest because it uses forest proximity weights
- Exceedance-probability queries require `y_star >= q0(x)` on the prediction set
- The neural model requires the optional `torch` extra

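The first note can be made concrete with a short sketch of a mean GPD negative log-likelihood restricted to exceedances. This illustrates the idea under the assumption that a score such as `nll_mean` averages the per-observation GPD negative log-likelihood over points with `y > q0`. It is not taken from the package's implementation, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import genpareto


def mean_gpd_nll(y, q0, sigma, xi):
    """Mean GPD negative log-likelihood over observations with y > q0."""
    # Broadcast scalars or arrays to a common shape.
    y, q0, sigma, xi = np.broadcast_arrays(
        np.asarray(y, dtype=float),
        np.asarray(q0, dtype=float),
        np.asarray(sigma, dtype=float),
        np.asarray(xi, dtype=float),
    )
    exceeds = y > q0
    if not exceeds.any():
        raise ValueError("no observations exceed the threshold q0")
    # Exceedances z = y - q0 are the only points that enter the likelihood.
    z = y[exceeds] - q0[exceeds]
    logpdf = genpareto.logpdf(z, c=xi[exceeds], scale=sigma[exceeds])
    return float(-np.mean(logpdf))
```

Observations at or below the threshold carry no information about the GPD parameters, which is why they are dropped before averaging.
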

Run the test suite:

    python -m pytest

Run lint checks:

    ruff check eqr tests

Format the codebase:

    black eqr tests

License: MIT