Problem
More MCMC iterations lead to worse test performance — classic overfitting:
| n_iter | lambda | Train AUC | Test AUC |
|--------|--------|-----------|----------|
| 200    | 0.001  | 0.925     | 0.738    |
| 1000   | 0.001  | 0.960     | 0.662    |
| 1000   | 0.01   | 0.960     | 0.687    |
Root cause
SBS eliminates features based solely on training posterior probability — no holdout or CV during feature elimination. As n_iter increases, the posterior concentrates on features that fit the training data, not features that generalize.
Proposed fixes (by complexity)
1. Inner holdout during SBS (medium)
Split train data into train/valid before SBS. At each step, run MCMC on train, evaluate eliminated candidates on valid. Drop the feature whose removal least hurts validation performance.
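A minimal sketch of this inner-holdout loop. `score_on_valid` is a hypothetical stand-in for "fit MCMC on the training split with this subset, then compute validation AUC"; the real fitting and scoring code is not shown here.

```python
def sbs_with_holdout(features, score_on_valid, n_min=2):
    """Backward selection against a held-out split: at each step, drop
    the feature whose removal hurts validation score the least.
    `score_on_valid(subset)` is a hypothetical stand-in for: run MCMC on
    the training split using `subset`, then score the validation split."""
    current = list(features)
    best_score = score_on_valid(current)
    while len(current) > n_min:
        # Score every one-feature-removed candidate on the validation split.
        candidates = [(score_on_valid([f for f in current if f != g]), g)
                      for g in current]
        cand_score, victim = max(candidates)
        if cand_score < best_score:
            break  # every possible removal hurts validation -> stop early
        current.remove(victim)
        best_score = cand_score
    return current
```

The `break` when all candidate removals lower the validation score means this variant also stops before reaching `n_min` if generalization starts degrading.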
2. Bayesian model averaging (medium)
Instead of committing to one feature subset via SBS, average predictions over multiple subsets weighted by their posterior model probability. This naturally regularizes by not over-committing.
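A sketch of the averaging step, assuming each candidate subset exposes a log marginal likelihood and a predict function (a hypothetical interface; the real model objects may differ). Weights are posterior model probabilities under a uniform model prior.

```python
import math

def bma_predict(x, models):
    """Bayesian model averaging over feature subsets.
    `models` is a list of (log_marginal_likelihood, predict_fn) pairs,
    e.g. one per subset visited during SBS (hypothetical interface).
    Weights are normalized posterior model probabilities under a
    uniform prior over models."""
    log_ml = [lm for lm, _ in models]
    m = max(log_ml)                               # stabilize the exponentials
    weights = [math.exp(l - m) for l in log_ml]
    z = sum(weights)
    return sum((w / z) * predict(x)
               for w, (_, predict) in zip(weights, models))
```

Because no single subset gets weight 1, a subset that narrowly "wins" on training posterior cannot dominate the prediction, which is the regularizing effect described above.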
3. Spike-and-slab prior with stronger shrinkage (low)
The current L2 prior (lambda) weakly regularizes coefficients. A spike-and-slab prior explicitly models P(feature included) vs P(feature excluded) and can be tuned to be more aggressive.
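For concreteness, a continuous spike-and-slab log prior in the George–McCulloch style: a mixture of a narrow "spike" Gaussian (coefficient effectively excluded) and a wide "slab" Gaussian (included). The values of `pi`, `spike_sd`, and `slab_sd` below are illustrative assumptions, not tuned settings.

```python
import math

def spike_slab_logprior(beta, pi=0.1, spike_sd=0.01, slab_sd=1.0):
    """Log density of a two-Gaussian spike-and-slab mixture prior.
    `pi` is the prior inclusion probability: lower pi = stronger
    shrinkage toward exclusion. Parameter values are illustrative."""
    def log_norm(x, sd):
        return -0.5 * (x / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
    spike = math.log1p(-pi) + log_norm(beta, spike_sd)  # excluded component
    slab = math.log(pi) + log_norm(beta, slab_sd)       # included component
    m = max(spike, slab)                                # log-sum-exp
    return m + math.log(math.exp(spike - m) + math.exp(slab - m))
```

Replacing the L2 log prior with this inside the MCMC target would let inclusion be tuned directly via `pi` rather than indirectly via lambda.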
4. Early stopping on SBS (simple)
Monitor a held-out validation metric during SBS and stop eliminating features when validation performance starts to decrease, even if `nmin` hasn't been reached.
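The stopping rule itself can be isolated from the elimination loop. A sketch with a patience window (the `patience` knob is an illustrative assumption, not a parameter from the existing code):

```python
def stop_index(valid_scores, patience=2):
    """Early stopping for SBS: given the validation metric recorded
    after each elimination step, return the step index to roll back to
    -- the best score seen so far, once `patience` consecutive steps
    fail to improve on it."""
    best, best_i, bad = float("-inf"), 0, 0
    for i, score in enumerate(valid_scores):
        if score > best:
            best, best_i, bad = score, i, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_i
```

Rolling back to the returned index, rather than stopping at the current step, avoids committing to the over-pruned subsets produced during the patience window.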
5. Cross-validated SBS (high, most principled)
Run K-fold CV at each SBS step — each fold runs MCMC independently, elimination decision based on average posterior across folds.
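One SBS step under this scheme might look like the sketch below. `fold_scores(subset, fold)` is a hypothetical stand-in for "run MCMC on that fold's training split with `subset` and score its validation split"; the real per-fold fitting is not shown.

```python
def cv_eliminate(features, fold_scores, k=3):
    """One cross-validated SBS step: for each candidate feature, average
    the per-fold score of the subset with that feature removed, then
    drop the feature whose removal yields the best average.
    `fold_scores(subset, fold)` is a hypothetical stand-in for fitting
    MCMC on fold `fold`'s training split and scoring its held-out split."""
    def avg_score(subset):
        return sum(fold_scores(subset, f) for f in range(k)) / k
    removal_score = {g: avg_score([f for f in features if f != g])
                     for g in features}
    drop = max(removal_score, key=removal_score.get)
    return [f for f in features if f != drop], drop
```

Averaging across independently fitted folds is what makes this the most principled (and the most expensive) option: each elimination costs K MCMC runs per remaining feature.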
References
- O'Hara, R.B. & Sillanpää, M.J. (2009). A review of Bayesian variable selection methods. Bayesian Analysis.
- Mitchell, T.J. & Beauchamp, J.J. (1988). Bayesian variable selection in linear regression. JASA.
- Ishwaran, H. & Rao, J.S. (2005). Spike and slab variable selection. Annals of Statistics.