Feature
When fitting random forests (or other bagged models), out-of-bag (OOB) model information (predictions, error rates, etc.) should be available.
- First of all, I'm not convinced that OOB is a bad option. In this recent paper they say:
"In line with results reported in the literature [5], the use of stratified subsampling with sampling fractions that are proportional to response class sizes of the training data yielded almost unbiased error rates in most settings with metric predictors. It therefore presents an easy way of reducing the bias in the OOB error. It does not increase the cost of constructing the RF, since unstratified sampling (bootstrap or subsampling) is simply replaced by stratified subsampling."
This indicates that OOB errors do a good job of estimating error rates (with the added benefit that they require no additional model fitting), as long as stratified subsampling is used instead of unstratified sampling.
- Even if nested resampling is superior (and I'll buy that there is an argument to be made), I find that cross-validation and OOB are stepping stones to understanding nested resampling. Do you argue that nested resampling is better than CV? If so, why have CV in the package? Again, OOB comes for free, and sometimes nested resampling isn't even that much better. I think more people will use nested resampling if they understand OOB, and they are more likely to understand OOB if it is included in the tidymodels package.
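
To make the "no additional model fitting" point concrete, here is a minimal sketch of how an OOB error falls out of bagging with stratified subsampling. It is plain Python rather than tidymodels code, it uses a toy nearest-centroid base learner in place of a decision tree, and every function name and default (`make_toy_data`, `stratified_subsample`, `oob_error`, the 0.632 fraction) is hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def make_toy_data(seed=1):
    """Hypothetical imbalanced two-class data: 80 'a' points, 20 'b' points."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n, center in (("a", 80, 0.0), ("b", 20, 3.0)):
        for _ in range(n):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


def stratified_subsample(labels, fraction, rng):
    """Draw indices without replacement, taking the same fraction of each class."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    chosen = []
    for idx in by_class.values():
        k = max(1, round(fraction * len(idx)))
        chosen.extend(rng.sample(idx, k))
    return chosen


def fit_centroids(X, y, idx):
    """Toy base learner (stand-in for a tree): per-class feature means."""
    sums, counts = {}, Counter()
    for i in idx:
        s = sums.setdefault(y[i], [0.0] * len(X[i]))
        for j, value in enumerate(X[i]):
            s[j] += value
        counts[y[i]] += 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_centroid(centroids, x):
    """Predict the class whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)),
    )


def oob_error(X, y, n_models=200, fraction=0.632, seed=0):
    """Return (OOB error rate, number of observations that received an OOB vote)."""
    rng = random.Random(seed)
    votes = [Counter() for _ in X]
    for _ in range(n_models):
        in_bag = set(stratified_subsample(y, fraction, rng))
        model = fit_centroids(X, y, in_bag)
        # Each model only votes on the observations it never saw.
        for i in range(len(X)):
            if i not in in_bag:
                votes[i][predict_centroid(model, X[i])] += 1
    scored = [(v.most_common(1)[0][0], y[i]) for i, v in enumerate(votes) if v]
    errors = sum(pred != truth for pred, truth in scored)
    return errors / len(scored), len(scored)


X, y = make_toy_data()
err, n_scored = oob_error(X, y)
print(f"OOB error: {err:.3f} over {n_scored} observations")
```

The only models fitted are the ones the ensemble needs anyway; the error estimate comes from bookkeeping on which observations each model left out.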
Thanks for all that you do!! The tidymodels package is amazing, and I really appreciate all the hard work that has gone into creating it.