dlon450/Rare-Event-Estimation

Extreme Quantile Regression (EQR)

eqr is a Python package for tail regression based on extreme value theory (EVT). It estimates:

  • Extreme conditional quantiles $Q_{Y \mid X=x}(\tau)$ for $\tau \to 1$
  • Conditional exceedance probabilities $P(Y > y^* \mid X=x)$

The package implements a two-stage peaks-over-threshold (POT) workflow, in which two fitting stages are followed by extrapolation into the tail:

  1. Fit an intermediate conditional quantile model $q_0(x) = Q_{Y \mid X=x}(\tau_0)$
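  2. Fit a conditional generalized Pareto distribution (GPD) model on the exceedances $z = y - q_0(x)$ of observations with $y > q_0(x)$
  3. Extrapolate tail quantities (quantiles at levels $\tau > \tau_0$ and exceedance probabilities) from the fitted GPD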
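For reference, the standard POT extrapolation formulas are given below, written with a conditional GPD scale $\sigma(x) > 0$ and shape $\xi(x)$ for the stage-2 model. The symbols $\sigma$ and $\xi$ are introduced here for illustration; the package's internal parameterization may differ. Above the intermediate quantile the tail is approximated by $P(Y > q_0(x) + z \mid X=x) \approx (1-\tau_0)\,(1 + \xi(x) z/\sigma(x))^{-1/\xi(x)}$ for $z \ge 0$. Evaluating this at $z = y^* - q_0(x)$ gives the exceedance probability, and solving it for the level $1 - \tau$ gives the quantile, for $\tau > \tau_0$ and $y^* \ge q_0(x)$:

$$
Q_{Y \mid X=x}(\tau) \approx q_0(x) + \frac{\sigma(x)}{\xi(x)}\left[\left(\frac{1-\tau_0}{1-\tau}\right)^{\xi(x)} - 1\right],
\qquad
P(Y > y^* \mid X=x) \approx (1-\tau_0)\left(1 + \xi(x)\,\frac{y^* - q_0(x)}{\sigma(x)}\right)^{-1/\xi(x)}
$$

The probability is taken as 0 when $1 + \xi(x)(y^* - q_0(x))/\sigma(x) \le 0$. In the limit $\xi(x) \to 0$ the two formulas become $q_0(x) + \sigma(x)\log\frac{1-\tau_0}{1-\tau}$ and $(1-\tau_0)\exp\{-(y^* - q_0(x))/\sigma(x)\}$.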

Features

  • Pure-Python implementation with NumPy, pandas, SciPy, and scikit-learn
  • Multiple tail-model backends behind a shared workflow
  • Hyperparameter tuning by cross-validated likelihood or deviance
  • Python API and command-line interface
  • YAML/JSON experiment runner for reproducible batch runs

Included Models

  • ERF: extremal random forest with local weighted likelihood
  • EGAM: spline-additive EV-GAM-style tail model
  • GBEX: gradient boosting for GPD deviance minimization
  • EQRN: neural tail model implemented with PyTorch (optional dependency)
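All four backends plug into the same two-stage workflow. To make that workflow concrete, here is a deliberately simplified sketch that uses only SciPy and scikit-learn, not eqr: scikit-learn's `GradientBoostingRegressor` with quantile loss stands in for the intermediate model, and the tail is a single unconditional GPD (constant $\sigma$ and $\xi$) rather than one of the conditional models listed above. The helper names `fit_simple_two_stage`, `extrapolate_quantile`, and `exceedance_probability` are hypothetical.

```python
import numpy as np
from scipy.stats import genpareto
from sklearn.ensemble import GradientBoostingRegressor


def fit_simple_two_stage(X, y, tau0=0.95, random_state=0):
    """Hypothetical helper: conditional q0(x), then one constant GPD on exceedances."""
    # Stage 1: intermediate conditional quantile q0(x) at level tau0.
    q_model = GradientBoostingRegressor(
        loss="quantile", alpha=tau0, random_state=random_state
    )
    q_model.fit(X, y)
    # In-sample q0 keeps the sketch short; eqr computes q0 out-of-bag during tuning.
    q0 = q_model.predict(X)

    # Stage 2: fit a GPD to exceedances z = y - q0(x), keeping only y > q0(x).
    z = (y - q0)[y > q0]
    xi, _, sigma = genpareto.fit(z, floc=0.0)  # returns (shape, loc, scale)
    return q_model, xi, sigma


def extrapolate_quantile(q_model, xi, sigma, X_new, tau, tau0=0.95):
    """Q(tau | x) = q0(x) + GPD quantile at level (tau - tau0) / (1 - tau0), tau >= tau0."""
    q0 = q_model.predict(X_new)
    p = (tau - tau0) / (1.0 - tau0)
    return q0 + genpareto.ppf(p, xi, loc=0.0, scale=sigma)


def exceedance_probability(q_model, xi, sigma, X_new, y_star, tau0=0.95):
    """P(Y > y_star | x) = (1 - tau0) * GPD survival function at y_star - q0(x)."""
    q0 = q_model.predict(X_new)
    if np.any(y_star < q0):
        raise ValueError("y_star must be >= q0(x) for every row")
    return (1.0 - tau0) * genpareto.sf(y_star - q0, xi, loc=0.0, scale=sigma)


# Usage on the same simulated data as the Python API example.
rng = np.random.default_rng(0)
X = rng.standard_normal((3000, 10))
y = 2.0 * X[:, 0] + 0.5 * rng.standard_normal(3000)
q_model, xi, sigma = fit_simple_two_stage(X, y)

X_test = rng.standard_normal((5, 10))
q999 = extrapolate_quantile(q_model, xi, sigma, X_test, tau=0.999)
p20 = exceedance_probability(q_model, xi, sigma, X_test, y_star=20.0)
```

Fixing the GPD location at 0 reflects that the exceedances start at the threshold. A constant tail is the weakest possible stage 2; the conditional backends above let $\sigma$ and $\xi$ vary with $x$.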

Installation

Install the base package from the repository root:

pip install -e .

Install with the optional neural model:

pip install -e ".[torch]"

Install development dependencies along with the optional neural model:

pip install -e ".[dev,torch]"

Python API

import numpy as np
from eqr import (
    GradientBoostedExceedances,
    fit_two_stage_tail_model,
    grid_search_tail_model_cv,
    predict_exceedance_probabilities,
    predict_extreme_quantiles,
)

# Simulated data: only the first of the 10 features drives the response
rng = np.random.default_rng(0)
X = rng.standard_normal((3000, 10))
y = 2.0 * X[:, 0] + 0.5 * rng.standard_normal(3000)

# Intermediate quantile level; the tail model is fit on exceedances of q0(x)
tau0 = 0.95

# Candidate hyperparameters for the GBEX tail model
param_grid = {
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "min_samples_leaf": [30, 50],
    "constant_xi": [True, False],
    "add_q0_feature": [True],
}
# Parameters for the intermediate quantile forest (stage 1)
intermediate_params = {
    "n_estimators": 400,
    "min_samples_leaf": 25,
    "max_features": "sqrt",
    "random_state": 0,
    "n_jobs": -1,
}

# 5-fold cross-validated grid search, ranked by mean negative log-likelihood on exceedances
grid = grid_search_tail_model_cv(
    model_class=GradientBoostedExceedances,
    param_grid=param_grid,
    X=X,
    y=y,
    tau0=tau0,
    intermediate_params=intermediate_params,
    n_splits=5,
    score="nll_mean",
    seed=0,
    verbose=1,
)

# Refit both stages on the full data with the selected tail parameters
two_stage = fit_two_stage_tail_model(
    X=X,
    y=y,
    tau0=tau0,
    intermediate_params=intermediate_params,
    tail_model_class=GradientBoostedExceedances,
    tail_params=grid.best_params,
)

# Predict extreme quantiles and P(Y > 10 | X = x) on new covariates
X_test = rng.standard_normal((200, 10))
quantiles = predict_extreme_quantiles(two_stage, X_test, taus=[0.99, 0.999])
prob_gt_10 = predict_exceedance_probabilities(two_stage, X_test, y_star=10.0)
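Because the example data are simulated, the true tail quantities are known in closed form: given $X = x$, $Y \sim \mathcal{N}(2x_0, 0.5^2)$, where $x_0$ is the first feature. A sketch like the following (plain NumPy and SciPy; the helper names are hypothetical and not part of eqr) can serve as ground truth when checking the output of `predict_extreme_quantiles` and `predict_exceedance_probabilities`. To compare directly, pass the example's own `X_test` instead of the fresh draw below. Gaussian noise is light-tailed (Gumbel domain of attraction, limiting GPD shape $\xi = 0$), which is a useful reference point for fitted shape parameters.

```python
import numpy as np
from scipy.stats import norm


def true_conditional_quantiles(X, taus, slope=2.0, noise_sd=0.5):
    """Q_{Y|X=x}(tau) for Y = slope * x0 + noise_sd * eps, eps ~ N(0, 1); one column per tau."""
    mean = slope * np.asarray(X)[:, 0]
    return np.column_stack([mean + noise_sd * norm.ppf(t) for t in taus])


def true_exceedance_probabilities(X, y_star, slope=2.0, noise_sd=0.5):
    """P(Y > y_star | X=x) under the same Gaussian model."""
    mean = slope * np.asarray(X)[:, 0]
    return norm.sf((y_star - mean) / noise_sd)


# Ground truth for a test set drawn the same way as X_test in the example.
rng = np.random.default_rng(1)
X_test = rng.standard_normal((200, 10))
q_true = true_conditional_quantiles(X_test, taus=[0.99, 0.999])  # shape (200, 2)
p_true = true_exceedance_probabilities(X_test, y_star=10.0)      # shape (200,)
```

If the eqr predictions are arranged with one column per level in `taus` (an assumption about the return shape), `np.abs(quantiles - q_true).mean()` gives a mean absolute error against the truth.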

Command-Line Interface

Installing the package exposes the eqr CLI.

Grid search

eqr grid-search \
  --model gbex \
  --data data/mydata.csv \
  --target y \
  --tau0 0.95 \
  --n-splits 5 \
  --outdir results/grid_search

Artifacts written to results/grid_search include:

  • gbex_cv_results.csv
  • gbex_best_params.json
  • gbex_fold_ids.npy
  • gbex_meta.json

Fit and predict

eqr predict \
  --model gbex \
  --train data/mydata.csv \
  --tail-params results/grid_search/gbex_best_params.json \
  --tau0 0.95 \
  --taus 0.99,0.999 \
  --y-star 10.0 \
  --outdir results/predictions

This command writes prediction and metadata files to results/predictions.

Run from configuration

eqr run --config examples/experiment_config.yaml

The experiment runner materializes a timestamped output directory such as:

results/experiments/<dataset_name>/<timestamp>_<optional_name>/
  config_original.yaml
  config_resolved.json
  env.json
  grid_search/<model>/
  summary/ranking.csv
  models/<model>/
  predictions/<model>/

Configuration

The experiment runner accepts YAML or JSON configuration files. See examples/experiment_config.yaml for a template.

Notes

  • Likelihood-based scores are computed on exceedances only, i.e. observations with y > q0(x)
  • During tuning, training-set q0 values are computed out-of-bag within each fold
  • The ERF tail model depends on the intermediate forest because it uses forest proximity weights
  • Exceedance-probability queries require y_star >= q0(x) for every row of the prediction set, because the GPD only describes the distribution above q0(x)
  • The neural model requires the optional torch extra
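Since likelihood-based scores use only exceedances, a held-out score can be sketched as the mean GPD negative log-likelihood over observations with $y > q_0(x)$, where each observation may carry its own fitted $\sigma(x)$ and $\xi(x)$. The per-observation term is $\log\sigma + (1 + 1/\xi)\log(1 + \xi z/\sigma)$ with $z = y - q_0(x)$, reducing to $\log\sigma + z/\sigma$ as $\xi \to 0$. The function name is hypothetical, and whether it matches eqr's `nll_mean` exactly (for example, in how scores are averaged across folds) is an assumption.

```python
import numpy as np


def gpd_nll_on_exceedances(y, q0, sigma, xi, eps=1e-8):
    """Mean GPD negative log-likelihood over observations with y > q0.

    Hypothetical helper. sigma and xi may be scalars or per-observation arrays.
    """
    y = np.asarray(y, dtype=float)
    q0 = np.asarray(q0, dtype=float)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), y.shape)
    xi = np.broadcast_to(np.asarray(xi, dtype=float), y.shape)

    mask = y > q0  # only exceedances contribute to the score
    if not mask.any():
        raise ValueError("no exceedances: every y <= q0")
    z, s, k = (y - q0)[mask], sigma[mask], xi[mask]

    t = k * z / s
    near_zero = np.abs(k) < eps
    safe_k = np.where(near_zero, 1.0, k)  # avoid dividing by ~0 in the general branch
    with np.errstate(divide="ignore", invalid="ignore"):
        general = np.log(s) + (1.0 + 1.0 / safe_k) * np.log1p(t)
    exponential = np.log(s) + z / s  # xi -> 0 limit
    nll = np.where(near_zero, exponential, general)
    # Outside the GPD support (1 + xi * z / sigma <= 0) the density is 0: infinite NLL.
    nll = np.where(1.0 + t > 0.0, nll, np.inf)
    return float(nll.mean())


# Two of these four observations exceed q0 = 1 (z = 1.0 and z = 2.5).
score = gpd_nll_on_exceedances(
    y=[0.5, 2.0, 3.5, 1.0], q0=[1.0, 1.0, 1.0, 1.0], sigma=1.5, xi=0.2
)
```

Returning infinity outside the support, rather than clipping, makes a candidate whose fitted upper endpoint lies below an observed exceedance rank last in a grid search.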

Development

Run the test suite:

python -m pytest

Run lint checks:

ruff check eqr tests

Format the codebase:

black eqr tests

License

MIT