# insurance-dependent-fs
A dependent frequency-severity neural two-part model for insurance pricing.
Standard insurance pricing models fit frequency and severity independently, then multiply them together to get a pure premium. That multiplication step assumes E[N·Y] = E[N]·E[Y] — which is only true if claim count and average claim size are statistically independent.
They rarely are. In UK motor, the negative correlation is well-documented: high-frequency policyholders (young drivers, urban, low-NCD) tend to have lower average severity. The UK Civil Liability Act 2018 whiplash reforms, in force since 2021, amplified this: frequent small claims are now subject to capped portal payouts, while large structural claims are not. An independence assumption therefore leads to measurable cross-subsidy in your risk factors.
The correction is not huge for an average risk — roughly 2-4% at typical motor frequencies — but it compounds with risk factor interactions, and the tail effect is larger than the mean effect.
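A quick back-of-the-envelope check, using the Poisson moment-generating-function correction factor exp(λ(eᵞ − 1)) quoted later in this README (the numbers are illustrative arithmetic, not library output):

```python
import math

gamma = -0.15                    # typical UK motor dependence strength
for lam in (0.10, 0.20, 0.30):   # plausible annual claim frequencies
    # Mean pure-premium correction relative to the naive product E[N]*E[Y]
    factor = math.exp(lam * (math.exp(gamma) - 1.0))
    print(f"lambda={lam:.2f}: factor {factor:.3f} "
          f"({(1 - factor) * 100:.1f}% reduction)")
```

At these frequencies the factor works out to roughly a 1-4% reduction, consistent with the 2-4% figure above.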
insurance-dependent-fs trains a single neural network with a shared encoder trunk and two output heads (Poisson frequency, Gamma severity). Gradients from both the Poisson loss and the Gamma loss flow through the shared trunk simultaneously, so the trunk learns features that are jointly informative for both tasks, which is exactly where the frequency-severity dependence information lives in the data.
On top of this implicit latent dependence, you can optionally add the explicit Garrido-Genest-Schulz conditional covariate (log μ += γ·N). This gives you a directly interpretable γ parameter and a semi-analytical pure premium correction via the Poisson moment generating function.
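To make the mechanism concrete, here is a minimal PyTorch sketch of the general idea. The class and function names, layer sizes, and fixed Gamma shape are invented for illustration; this is not DependentFSModel's actual internals:

```python
import torch
import torch.nn as nn

class TwoPartNet(nn.Module):
    """Shared trunk + two heads: a conceptual sketch, not the library's code.

    n_claims, avg_sev and exposure are expected as float tensors.
    """

    def __init__(self, n_features: int, latent_dim: int = 32):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_features, 64), nn.ELU(),
            nn.Linear(64, latent_dim), nn.ELU(),
        )
        self.freq_head = nn.Linear(latent_dim, 1)   # outputs log lambda
        self.sev_head = nn.Linear(latent_dim, 1)    # outputs log mu
        self.gamma = nn.Parameter(torch.zeros(1))   # explicit GGS gamma

    def forward(self, x, n_claims, exposure):
        h = self.trunk(x)                           # shared latent features
        log_lam = self.freq_head(h).squeeze(-1) + exposure.log()  # log(lambda*t)
        # Explicit conditional covariate: severity mean shifts with claim count
        log_mu = self.sev_head(h).squeeze(-1) + self.gamma * n_claims
        return log_lam, log_mu

def joint_loss(log_lam, log_mu, n_claims, avg_sev, shape=2.0):
    # Poisson NLL up to constants: lambda*t - N*log(lambda*t)
    poisson_nll = (log_lam.exp() - n_claims * log_lam).mean()
    # Gamma NLL with fixed shape k and mean mu: k*log(mu) + k*y/mu + const.
    # Only policies with at least one claim contribute a severity term.
    has_claim = n_claims > 0
    mu = log_mu[has_claim].exp()
    y = avg_sev[has_claim]
    gamma_nll = (shape * log_mu[has_claim] + shape * y / mu).mean()
    return poisson_nll + gamma_nll   # one backward pass updates the trunk from both
```

A single optimiser step on `joint_loss` pushes gradients from both heads through `self.trunk`, which is the implicit dependence mechanism described above.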
insurance-frequency-severity models dependence via a Sarmanov copula: a parametric bivariate distribution fitted with EM or profile likelihood. It's interpretable, analytically tractable, and fast to fit. Use it when you want an auditable bivariate density and a single omega parameter for a regulator.
This library uses multi-task neural learning. Use it when you have a large dataset (100k+ policies), suspect nonlinear feature interactions, and want a single model that learns both tasks jointly from gradient descent.
The two libraries model the same economic phenomenon using fundamentally different statistical frameworks.
```
pip install insurance-dependent-fs
```

For diagnostic plots:

```
pip install "insurance-dependent-fs[plot]"
```

```python
from insurance_dependent_fs import DependentFSModel, make_dependent_claims
from insurance_dependent_fs.benchmarks import feature_cols
# Generate synthetic data with known γ=-0.15 (typical motor pattern)
df_train, df_test = make_dependent_claims(n_policies=50_000, gamma=-0.15)
fc = feature_cols(df_train)
model = DependentFSModel(use_explicit_gamma=True)
model.fit(
df_train[fc].values,
df_train["n_claims"].values,
df_train["avg_severity"].values,
df_train["exposure"].values,
)
print(f"Recovered γ = {model.gamma_:.4f} (true: -0.15)")
pp = model.predict_pure_premium(df_test[fc].values, df_test["exposure"].values)
```

The architecture, schematically:

```
x ∈ R^p → SharedTrunk → h ∈ R^d_latent
                        │
          ┌─────────────┴─────────────┐
          │                           │
    FrequencyHead               SeverityHead
    log λ + log t               log μ [+ γ·N]
          │                           │
     Poisson NLL                 Gamma NLL
          └────── joint backprop ─────┘
                (shared trunk)
```
The shared trunk uses BatchNorm + ELU hidden layers with configurable width and depth. The default is two hidden layers of [128, 64] feeding a 32-dimensional latent space.
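In torch terms, the default corresponds to roughly the following stack (a sketch assuming a Linear → BatchNorm → ELU → Dropout pattern per hidden layer; not the library's actual module code):

```python
import torch.nn as nn

n_features = 20  # illustrative input width

default_trunk = nn.Sequential(
    nn.Linear(n_features, 128), nn.BatchNorm1d(128), nn.ELU(), nn.Dropout(0.1),
    nn.Linear(128, 64), nn.BatchNorm1d(64), nn.ELU(), nn.Dropout(0.1),
    nn.Linear(64, 32),  # h: the shared 32-dimensional latent space
)
```

Everything here is configurable through SharedTrunkConfig: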
```python
from insurance_dependent_fs import DependentFSModel, SharedTrunkConfig
from insurance_dependent_fs.training import TrainingConfig
model = DependentFSModel(
trunk_config=SharedTrunkConfig(
hidden_dims=[128, 64],
latent_dim=32,
dropout=0.1,
activation="elu",
use_batch_norm=True,
),
training_config=TrainingConfig(
max_epochs=100,
batch_size=512,
lr=1e-3,
auto_balance=True, # equalise Poisson and Gamma loss magnitudes
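        # Without balancing, whichever NLL is numerically larger can dominate
        # the shared trunk's gradients and starve the other head.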
patience=15, # early stopping
),
use_explicit_gamma=True, # learn γ·N conditional covariate
val_fraction=0.1, # held-out fraction for early stopping
)
```

```python
from insurance_dependent_fs import DependentFSDiagnostics
diag = DependentFSDiagnostics(model, X_test, n_claims_test, avg_sev_test, exposure_test)
# Lorenz curve and Gini for frequency and pure premium
gini = diag.gini_summary()
# Calibration in deciles
cal = diag.calibration(target="pure_premium")
# Latent correlation structure
lc = diag.latent_correlation()
# Head-to-head vs independence assumption
comparison = diag.vs_independent()
print(f"MSE reduction vs independence: {comparison['mse_reduction_pct']:.1f}%")
# Plots (requires matplotlib)
fig, ax = diag.plot_lorenz(target="frequency")
fig, ax = diag.plot_calibration(target="pure_premium")
```

Two methods are available for computing the pure premium:
- Monte Carlo (always available): samples N ~ Poisson(λ) and Y ~ Gamma for each realisation. General, and captures all dependence sources.
- Semi-analytical (when use_explicit_gamma=True): uses the Poisson MGF closed form from Garrido-Genest-Schulz (2016):

  E[Z | x] = exp(SevHead(h) + γ) · exp(λ(eᵞ − 1)) · λ

  Faster at large portfolio sizes, but assumes γ·N is the only dependence mechanism (it ignores residual latent dependence learned by the trunk).
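The closed form rests on the Poisson identity E[N·eᵞᴺ] = λ·eᵞ·exp(λ(eᵞ − 1)), which is easy to sanity-check by simulation (a standalone numpy check, not library code):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, gamma = 0.2, -0.15

N = rng.poisson(lam, size=2_000_000)
mc = np.mean(N * np.exp(gamma * N))      # E[N * exp(gamma*N)] by simulation
closed = lam * np.exp(gamma) * np.exp(lam * (np.exp(gamma) - 1.0))
print(mc, closed)                        # agree to ~3 decimal places
```

Both estimators are exposed through the same entry point: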
```python
pp_mc = model.predict_pure_premium(X, exposure, method="mc", n_mc=5000)
pp_an = model.predict_pure_premium(X, exposure, method="analytical")
```

- Garrido, Genest & Schulz (2016). Generalized linear models for dependent frequency and severity of insurance claims. Insurance: Mathematics and Economics 70: 205-215.
- A Neural Frequency-Severity Model and Its Application to Insurance Claims (the NeurFS paper). arXiv:2106.10770v2.
- Shi & Shi (2024). A Sparse Deep Two-part Model for Nonlife Insurance Claims with Dependent Frequency and Severity. Variance 17(1).
See notebooks/dependent_fs_demo.py for a full workflow on synthetic data,
including model fitting, diagnostics, and comparison against independence.
No formal benchmark yet. The library's value proposition is measured relative to the independence assumption, not against other neural architectures.
On synthetic data with known γ = -0.15 (typical UK motor pattern) and n = 50,000 policies, the joint model reduces mean squared error on pure premium by roughly 2–5% versus the independence baseline. The improvement is larger for high-frequency risks (urban, young drivers), where the negative frequency-severity correlation is strongest. The recovered γ̂ is on average within ±0.01 of the true value at n = 50,000.
Training time: with the default architecture (128-64-32 trunk, 100 epochs max, batch 512), expect 5–15 minutes on a Databricks ML cluster. The explicit γ·N conditional covariate (use_explicit_gamma=True) adds negligible overhead. For exploratory work, reduce to 50 epochs (patience=10) and subsample to 20,000 rows; you lose some accuracy, but the model still recovers the sign and approximate magnitude of γ.
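For reference, that exploratory recipe spelled out with the API shown above (df_train and fc come from the quick-start snippet; the remaining TrainingConfig fields are assumed to have sensible defaults):

```python
from insurance_dependent_fs import DependentFSModel
from insurance_dependent_fs.training import TrainingConfig

# Fast, rougher fit: fewer epochs, tighter early stopping, 20k-row subsample.
df_small = df_train.sample(n=20_000, random_state=0)

fast_model = DependentFSModel(
    training_config=TrainingConfig(max_epochs=50, patience=10),
    use_explicit_gamma=True,
)
fast_model.fit(
    df_small[fc].values,
    df_small["n_claims"].values,
    df_small["avg_severity"].values,
    df_small["exposure"].values,
)
print(f"Exploratory γ estimate: {fast_model.gamma_:.3f}")  # expect correct sign, rough magnitude
```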