Grf qr engine#360
Conversation
There was a problem hiding this comment.
Comparing make_quantile_reg and make_grf_quantiles, it looks like they're both using the same model/engine/mode settings. I remember some discussion about maybe switching so that quantile_regression was a separate mode? I guess if that's happening it's happening some later version?
Also, I'm assuming all parameters not here are just set to their defaults, so sample.fraction=0.5; is there some reason for this particular trio of parameters, and not including say sample.fraction, honesty, honesty.fraction, honesty.prune.leaves, etc?
Overall, mostly just some clarification stuff and some bike-shedding w.r.t which parameters we include. I think it's good to go.
| set.seed(12345) | ||
| library(grf) | ||
| tib <- tibble( | ||
| y = rnorm(100), x = rnorm(100), z = rnorm(100), |
There was a problem hiding this comment.
| y = rnorm(100), x = rnorm(100), z = rnorm(100), | |
| + y = rnorm(100), x = rnorm(100), z = y-x+ .01 * rnorm(100)+1, |
if we make it so that there's something other than noise, the fit models are the same regardless of the state of the seed.
So I tried this, and got different results still, unfortunately.
| colnames(p_manual) <- c("0.1", "0.5", "0.9") | ||
| p_manual <- tibble::as_tibble(p_manual) | ||
| # not equal despite the seed, etc | ||
| # expect_equal(p_pars, p_manual) |
There was a problem hiding this comment.
My guess is that this is because fseed specifies the c++ seed, whereas set_engine(...seed=fseed) is specifying an R seed, which isn't getting passed down. We may just want to set the tolerances large on expect_equal or something along those lines
There was a problem hiding this comment.
I ran the manual twice and got something roughly at the same size. Given that this is just a wrapper I'm not sure we need to do too extensive of testing
There was a problem hiding this comment.
My understanding is that these two should result in the same thing. So, I'm a bit miffed it doesn't.
library(epipredict)
library(grf)
tib <- tibble::tibble(
y = rnorm(100), x = rnorm(100), z = rnorm(100),
f = factor(sample(letters[1:3], 100, replace = TRUE))
)
fseed <- 12345
spec_seed <- rand_forest(mode = "regression", mtry = 2L, min_n = 10) %>%
set_engine("grf_quantiles", seed = fseed)
manual <- quantile_forest(
as.matrix(tib[, 2:3]), tib$y,
quantiles = c(0.1, 0.5, 0.9), seed = fseed,
mtry = 2L, min.node.size = 10
)
translate(spec_seed)
#> Random Forest Model Specification (regression)
#>
#> Main Arguments:
#> mtry = 2
#> min_n = 10
#>
#> Engine-Specific Arguments:
#> seed = fseed
#>
#> Computational engine: grf_quantiles
#>
#> Model fit template:
#> grf::quantile_forest(X = missing_arg(), Y = missing_arg(), mtry = min_cols(~2L,
#> x), min.node.size = min_rows(~10, x), seed = fseed, quantiles = c(0.1,
#> 0.5, 0.9), num.threads = 1L)Created on 2024-08-13 with reprex v2.1.1
The function is being called with the same args, and passing the seed. I'm not concerned, but I wish the were the same.
|
As for the overall comment, yes, the plan is for |
Checklist
Please:
dajmcdon.
DESCRIPTIONandNEWS.md.Always increment the patch version number (the third number), unless you are
making a release PR from dev to main, in which case increment the minor
version number (the second number).
(backwards-incompatible changes to the documented interface) are noted.
Collect the changes under the next release number (e.g. if you are on
0.7.2, then write your changes under the 0.8 heading).