Version 0.7.3 of Gpredomics introduced the concept of a confidence interval for the decision threshold. When this feature is enabled, models can reject a sample when the predicted score falls inside a confidence interval computed around their binary decision threshold:
Model #27 Ratio:Raw [k=51] [gen:1] [fit:0.799] AUC 0.923/0.902 | accuracy 0.911/0.864 | sensitivity 0.875/0.909 | specificity 0.957/0.818 | rejection rate 0.122/0.267 | G-mean 0.886/0.862
Class healthy: score < 0.120
Class cirrhosis: score > 0.270
Rejection zone (95% CI): score ∈ [0.120; 0.270]
score = (msp_0049 + msp_0313 + msp_0380 + msp_0385 + msp_0581 + msp_0602c + msp_0768 + msp_0881 + msp_0884 + msp_0977 + msp_1127 + msp_1453 + msp_1543 + msp_1789 + msp_1793 + msp_1799) / (msp_0017 + msp_0034 + msp_0047 + msp_0048 + msp_0062 + msp_0063 + msp_0088 + msp_0089 + msp_0144 + msp_0151 + msp_0194 + msp_0196 + msp_0198 + msp_0205 + msp_0236 + msp_0263 + msp_0306 + msp_0381 + msp_0450 + msp_0464 + msp_0468 + msp_0558 + msp_0572 + msp_0757 + msp_0763 + msp_0852 + msp_0898 + msp_0917 + msp_1021 + msp_1093 + msp_1139 + msp_1185 + msp_1275 + msp_1323 + msp_1881 + 1e-5)
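In this example, the rejection zone spans scores from 0.120 to 0.270: samples scoring below 0.120 are classified as healthy, samples scoring above 0.270 as cirrhosis, and samples in between are rejected.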
Concretely, for each epoch, the threshold of each model is estimated on the full dataset and on a number of
resampled datasets (the number of bootstrap iterations is given by threshold_ci_n_bootstrap). This yields an empirical
distribution of the threshold, which shows how variable the threshold is; from that variability we derive the model's
uncertainty on individual samples. The width of the interval is controlled by threshold_ci_alpha: lowering alpha raises
the confidence level (1 − α) and therefore tends to widen the confidence interval, which in turn enlarges the
rejection zone.
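The effect of alpha on the interval width can be illustrated with a minimal Python sketch. It is not taken from the Gpredomics code base: the helper name `percentile_interval` is hypothetical, the "bootstrap thresholds" are synthetic Gaussian draws standing in for real replicates, and the index rule is the percentile rule described later on this page.

```python
import math
import random

def percentile_interval(replicates, alpha):
    """Percentile interval over bootstrap replicates (hypothetical helper).

    Index rule: lower = ceil((alpha/2)*(B-1)), upper = floor((1-alpha/2)*(B-1)).
    """
    ordered = sorted(replicates)
    b = len(ordered)
    lower_idx = math.ceil((alpha / 2) * (b - 1))
    upper_idx = math.floor((1 - alpha / 2) * (b - 1))
    return ordered[lower_idx], ordered[upper_idx]

# Synthetic stand-in for B = 1000 bootstrap thresholds (not real model output).
rng = random.Random(42)
thresholds = [rng.gauss(0.195, 0.04) for _ in range(1000)]

lo95, hi95 = percentile_interval(thresholds, alpha=0.05)
lo80, hi80 = percentile_interval(thresholds, alpha=0.20)
print(f"alpha=0.05 -> width {hi95 - lo95:.3f}")
print(f"alpha=0.20 -> width {hi80 - lo80:.3f}")
```

With the same replicates, the alpha=0.05 interval always contains the alpha=0.20 interval, so a smaller alpha can only keep or enlarge the rejection zone.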
Because the threshold must be re-estimated on threshold_ci_n_bootstrap resampled datasets, computing the threshold confidence interval has a cost of
Once the confidence interval is estimated, its lower and upper bounds define a rejection zone. Any sample whose score
falls inside that zone is considered rejected and therefore excluded from performance metric computations. The
parameter threshold_ci_penalty is used directly as a selection pressure in the genetic algorithm: models with a low
rejection rate (more decisive models) are favored by evolution compared to models that reject many samples.
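The rejection logic described above can be sketched in a few lines of Python. This is an illustration only, not the Gpredomics implementation: the function names, the toy scores and labels, the base fit of 0.80, and the penalty value 0.5 are all hypothetical. The bounds 0.120 and 0.270 come from the example model above, and the penalty formula is fit − threshold_ci_penalty × rejection_rate.

```python
def classify(score, lower, upper):
    """Return 0 below the zone, 1 above it, None inside [lower, upper]."""
    if score < lower:
        return 0
    if score > upper:
        return 1
    return None  # rejected: the model abstains

def evaluate(scores, labels, lower, upper, penalty, fit):
    """Accuracy on non-rejected samples, rejection rate, and penalized fit."""
    preds = [classify(s, lower, upper) for s in scores]
    kept = [(p, y) for p, y in zip(preds, labels) if p is not None]
    rejection_rate = 1 - len(kept) / len(preds)
    # Rejected samples are excluded from the metric computation.
    accuracy = sum(p == y for p, y in kept) / len(kept) if kept else float("nan")
    penalized_fit = fit - penalty * rejection_rate
    return accuracy, rejection_rate, penalized_fit

# Toy data (hypothetical): 1 = class above the zone, 0 = class below it.
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.45, 0.08, 0.33]
labels = [0, 0, 0, 1, 1, 1, 1, 0]
acc, rej, pfit = evaluate(scores, labels, lower=0.120, upper=0.270,
                          penalty=0.5, fit=0.80)
print(f"accuracy={acc:.3f} rejection_rate={rej:.3f} penalized_fit={pfit:.3f}")
```

Two of the eight toy samples fall inside the zone, so a quarter of the data is rejected and the fit drops from 0.80 to 0.675 under the assumed penalty.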
Recommended scenarios:
- High-stakes clinical decisions where false positives/negatives have severe consequences (e.g., cancer screening, antimicrobial resistance prediction)
- Regulatory compliance where model uncertainty must be quantified and reported
- Imbalanced datasets where threshold estimation is inherently uncertain
- Small sample sizes (N < 200) where threshold variability is higher
Not recommended when:
- Fast prototyping is prioritized over uncertainty quantification (set the number of bootstrap replicates B, i.e. threshold_ci_n_bootstrap, to 0 to disable)
- Very large datasets (N > 10,000) where threshold is stable and CI computation becomes expensive
- Streaming/real-time predictions where abstention is not operationally feasible
The implementation uses a stratified resampling strategy and a percentile-based confidence interval construction. Key points:
- Stratified sampling: bootstrap / subsampling is stratified by class (positive / negative) so class proportions are preserved in each replicate.
- Sampling modes:
  - If threshold_ci_frac_bootstrap == 1.0: classic bootstrap (with replacement). Each bootstrap draw has the same nominal size as the corresponding class in the original data.
  - If 0 < threshold_ci_frac_bootstrap < 1.0: subsampling without replacement; in each replicate a fraction threshold_ci_frac_bootstrap of the samples from each class is drawn.
- Subsampling correction (Geyer rescaling): When threshold_ci_frac_bootstrap < 1.0, thresholds estimated on subsamples of size m < n vary more than the full-sample threshold, so their raw spread does not reflect full-sample variability. To correct this bias, the code applies Geyer rescaling:
  - For each replicate b, compute the deviation: δ_b = θ̂_b - θ̂ (where θ̂ is the threshold on the full data)
  - Rescale by subsample size: δ*_b = √m · δ_b (where m = subsample size)
  - Compute quantiles on {δ*_b} to get q_lower and q_upper
  - De-pivot with full sample size: CI = [θ̂ - q_upper/√n, θ̂ - q_lower/√n] (where n = full sample size)

  This ensures the CI has the correct coverage even when using subsampling for computational efficiency.
- Confidence interval building: the percentile method is used. For B bootstrap replicates and significance level α, the code selects the quantile indices
  - lower_idx = ceil((α / 2) * (B − 1))
  - upper_idx = floor((1 − α / 2) * (B − 1))

  and uses the sorted bootstrap statistics at those indices (after Geyer rescaling if applicable) as CI bounds.
- Rejection rate: defined as the fraction of samples assigned the "undecided" label (i.e., whose score lies between the lower and upper bounds). This rejection rate is stored with the model and used to compute the GA penalty as fit := fit − threshold_ci_penalty * rejection_rate.
- Reproducibility: Bootstrap indices and labels are precomputed once per epoch using the RNG seed, then reused for all individuals in the population. This ensures deterministic results across runs with the same seed.
- Performance: the code contains a faster path that precomputes bootstrap indices and labels once per dataset and reuses them for all individuals in a population, for both reproducibility and speed.
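The key points above can be put together in a minimal Python sketch. It is an illustration under stated assumptions, not the Gpredomics implementation: all function names are hypothetical, the threshold estimator (a cutoff maximizing Youden's J) is an assumed stand-in because this page does not specify how the threshold is chosen, m is taken to be the total subsample size, and the data are synthetic.

```python
import math
import random

def estimate_threshold(scores, labels):
    """Assumed estimator: cutoff maximizing Youden's J (sens + spec - 1)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    values = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    def youden(t):
        sens = sum(s > t for s in pos) / len(pos)
        spec = sum(s <= t for s in neg) / len(neg)
        return sens + spec - 1
    return max(candidates, key=youden)

def stratified_indices(labels, frac, rng):
    """Draw per class: bootstrap if frac == 1.0, else subsample without replacement."""
    drawn = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        if frac == 1.0:
            drawn += rng.choices(idx, k=len(idx))  # with replacement
        else:
            drawn += rng.sample(idx, max(1, round(frac * len(idx))))
    return drawn

def threshold_ci(scores, labels, n_bootstrap=200, alpha=0.05, frac=1.0, seed=0):
    rng = random.Random(seed)
    # Indices are drawn once up front, so they could be reused for every model.
    replicates = [stratified_indices(labels, frac, rng) for _ in range(n_bootstrap)]
    theta = estimate_threshold(scores, labels)
    n = len(scores)
    stats = []
    for idx in replicates:
        theta_b = estimate_threshold([scores[i] for i in idx],
                                     [labels[i] for i in idx])
        if frac == 1.0:
            stats.append(theta_b)  # plain percentile method
        else:
            stats.append(math.sqrt(len(idx)) * (theta_b - theta))  # sqrt(m) * delta_b
    stats.sort()
    lower_idx = math.ceil((alpha / 2) * (n_bootstrap - 1))
    upper_idx = math.floor((1 - alpha / 2) * (n_bootstrap - 1))
    q_lower, q_upper = stats[lower_idx], stats[upper_idx]
    if frac == 1.0:
        return theta, q_lower, q_upper
    # De-pivot with the full sample size n.
    return theta, theta - q_upper / math.sqrt(n), theta - q_lower / math.sqrt(n)

# Synthetic scores: 60 negatives centered at 0.1, 40 positives centered at 0.3.
data_rng = random.Random(1)
labels = [0] * 60 + [1] * 40
scores = [data_rng.gauss(0.3 if y else 0.1, 0.08) for y in labels]

theta, lo, hi = threshold_ci(scores, labels, frac=1.0)
theta_s, lo_s, hi_s = threshold_ci(scores, labels, frac=0.5)
print(f"bootstrap:   threshold={theta:.3f} zone=[{lo:.3f}; {hi:.3f}]")
print(f"subsampling: threshold={theta_s:.3f} zone=[{lo_s:.3f}; {hi_s:.3f}]")
```

Seeding the generator and drawing all replicate indices before any model is evaluated mirrors the reproducibility point above: the same seed yields the same resamples, hence the same interval.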
- Bootstrap methodology: Efron, B. & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC.
- Subsampling correction: Politis, D. N., Romano, J. P., & Wolf, M. (1999). Subsampling. Springer.
Last updated: v0.7.6