Skip to content

softbart_probit error message for constant outcome can be improved #10

@EoghanONeill

Description

@EoghanONeill

If the training outcome is a factor with two levels, and only takes one value, then an error message occurs. This makes sense, but the error message can be improved. Maybe a check can be added after line 95 of softbart_probit.R .

The full error message is

Error in if (nulldev == 0) stop("y is constant; gaussian glmnet fails at standardization step") : 
  missing value where TRUE/FALSE needed

However, when I ran the function within a parallelized loop the only message I observed was

missing value where TRUE/FALSE needed

Therefore a check with a more informative error message might be preferable.

Here is a reproducible example.

library(SoftBart)
num_burn <- 10 ## Should be ~ 5000
num_save <- 10 ## Should be ~ 5000
set.seed(1234)
f_fried <- function(x) 10 * sin(pi * x[,1] * x[,2]) + 20 * (x[,3] - 0.5)^2 +
  10 * x[,4] + 5 * x[,5]
gen_data <- function(n_train, n_test, P, sigma) {
  X <- matrix(runif(n_train * P), nrow = n_train)
  mu <- (f_fried(X) - 14) / 5
  X_test <- matrix(runif(n_test * P), nrow = n_test)
  mu_test <- (f_fried(X_test) - 14) / 5
  # Y <- factor(rbinom(n_train, 1, pnorm(mu)), levels = c(0,1))
  Y <- factor(rep(1,n_train), levels = c(0,1))
  Y_test <- factor(rbinom(n_test, 1, pnorm(mu_test)), levels = c(0,1))
  return(list(X = X, Y = Y, mu = mu, X_test = X_test, Y_test = Y_test,
              mu_test = mu_test))
}
## Simiulate dataset
sim_data <- gen_data(250, 250, 100, 1)
df <- data.frame(X = sim_data$X, Y = sim_data$Y)
df_test <- data.frame(X = sim_data$X_test, Y = sim_data$Y_test)
## Fit the model
opts <- Opts(num_burn = num_burn, num_save = num_save)
fitted_probit <- softbart_probit(Y ~ ., df, df_test, opts = opts)

Here is the example of the less informative message for parallel code. Perhaps this is more of an issue or the parallel package, or perhaps it can occur when SoftBart is run in other functions


library(parallel)

parfunction <- function(i){
  library(SoftBart)
  num_burn <- 10 ## Should be ~ 5000
  num_save <- 10 ## Should be ~ 5000
  set.seed(1234)
  f_fried <- function(x) 10 * sin(pi * x[,1] * x[,2]) + 20 * (x[,3] - 0.5)^2 +
    10 * x[,4] + 5 * x[,5]
  gen_data <- function(n_train, n_test, P, sigma) {
    X <- matrix(runif(n_train * P), nrow = n_train)
    mu <- (f_fried(X) - 14) / 5
    X_test <- matrix(runif(n_test * P), nrow = n_test)
    mu_test <- (f_fried(X_test) - 14) / 5
    # Y <- factor(rbinom(n_train, 1, pnorm(mu)), levels = c(0,1))
    Y <- factor(rep(1,n_train), levels = c(0,1))
    Y_test <- factor(rbinom(n_test, 1, pnorm(mu_test)), levels = c(0,1))
    return(list(X = X, Y = Y, mu = mu, X_test = X_test, Y_test = Y_test,
                mu_test = mu_test))
  }
  sim_data <- gen_data(250, 250, 100, 1)
  df <- data.frame(X = sim_data$X, Y = sim_data$Y)
  df_test <- data.frame(X = sim_data$X_test, Y = sim_data$Y_test)
  ## Fit the model
  opts <- Opts(num_burn = num_burn, num_save = num_save)
  fitted_probit <- softbart_probit(Y ~ ., df, df_test, opts = opts)
  fitted_probit
}




cl <- makeCluster(1)
clusterSetRNGStream(cl = cl, iseed = 123)

clusterExport(cl,c('parfunction'),
envir = environment()
)

res_list <- parallel::parLapply(cl = cl, 1:1, fun = parfunction)

stopCluster(cl)


'''

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions