Skip to content

Conversation

@ColeMiller1
Copy link
Contributor

@ColeMiller1 ColeMiller1 commented Jul 18, 2024

Hi @contefranz,

After installing data.table from github master,

data.table::update_dev_pkg()

And then running R CMD check on msmtools, I get the following new failure, (which is not present if you use data.table from CRAN)

--- re-building 'msmtools.Rmd' using rmarkdown_notangle
Quitting from lines 300-318 [multistate_model] (msmtools.Rmd)
Error: processing vignette 'msmtools.Rmd' failed with diagnostics:
object 'augmented_int' not found
--- failed re-building 'msmtools.Rmd'

Before uploading new versions to CRAN, data.table needs to ensure that updates do not break CRAN checks in dependent packages like msmtools (see also Rdatatable/data.table#6033). So can you please submit an updated version of msmtools to CRAN, that fixes this check issue? In particular, I would suggest to avoid using substitute() on the LHS of your := calls, and instead use the column variable directly. For example,

final[ status == state[[ 3 ]], substitute( t_augmented ) := get( t_death ) ]

would become

final[ status == state[[ 3 ]], ( t_augmented ) := get( t_death ) ]

In future versions of data.table, we will start to deprecate the use of additional checks for substitute(col) on the LHS of assignment calls so many of these calls would no longer work as previous versions allowed.

@MichaelChirico
Copy link

Hello, {data.table} plans to release to CRAN in August, so please be advised that {msmtools} will need a new version in the near future. Thanks in advance!

@contefranz contefranz merged commit 4bf4010 into contefranz:master Aug 29, 2024
contefranz pushed a commit that referenced this pull request Aug 29, 2024
@contefranz
Copy link
Owner

@ColeMiller1 I tried updating the code following your suggestions (thx!). I worked on augment() only so any other function hasn't been touched yet. You can find the updated code in the dedicated branch substitute_fix here.

Still, I am struggling finding what causes the following issue when building the vignette. The result below follows an R CMD check from RStudio with the flag --as-cran activated.

─  installing the package to build vignettes
E  creating vignettes (2.7s)
   --- re-building ‘msmtools.Rmd’ using rmarkdown_notangle
   
   Quitting from lines 81-88 [running_augment] (msmtools.Rmd)
   Error: processing vignette 'msmtools.Rmd' failed with diagnostics:
   Supplied 30 items to be assigned to 43 items of column 'status'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
   --- failed re-building ‘msmtools.Rmd’
   
   SUMMARY: processing the following file failed:
     ‘msmtools.Rmd’
   
   Error: Vignette re-building failed.
   Execution halted
Error in `(function (command = NULL, args = character(), error_on_status = TRUE, …`:
! System command 'R' failed

Any idea on what is going on here?
Below is my R session.

R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin23.4.0
Running under: macOS Sonoma 14.6.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /opt/homebrew/Cellar/r/4.4.1/lib/R/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Rome
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.16.0 testthat_3.2.1.1  msmtools_2.0.1   

loaded via a namespace (and not attached):
 [1] Matrix_1.7-0      gtable_0.3.5      expm_1.0-0        dplyr_1.1.4       compiler_4.4.1    brio_1.1.5        tidyselect_1.2.1  msm_1.7.1         splines_4.4.1    
[10] scales_1.3.0      yaml_2.3.9        fastmap_1.2.0     lattice_0.22-6    ggplot2_3.5.1     R6_2.5.1          generics_0.1.3    patchwork_1.2.0   knitr_1.48       
[19] tibble_3.2.1      munsell_0.5.1     pillar_1.9.0      rlang_1.1.4       utf8_1.2.4        xfun_0.46         pkgload_1.4.0     cli_3.6.3         magrittr_2.0.3   
[28] digest_0.6.36     grid_4.4.1        rstudioapi_0.16.0 mvtnorm_1.2-5     lifecycle_1.0.4   vctrs_0.6.5       evaluate_0.24.0   glue_1.7.0        survival_3.6-4   
[37] fansi_1.0.6       colorspace_2.1-0  rmarkdown_2.27    tools_4.4.1       pkgconfig_2.0.3   htmltools_0.5.8.1

@ColeMiller1
Copy link
Contributor Author

Hi @contefranz -

It looks like you are in the right direction! That is a very large diff but based on the error, this seems most probable to be the source of the error.

msmtools/R/augment.R

Lines 646 to 647 in fbdfbcb

flag = unlist( flag_temp, recursive = FALSE )
final[ , status := flag ]

Lines 612-614 also have assignment to a status column. Were you able to test the build against --as-cran as you developed? With the large diff, it's hard to pinpoint without deeper review. Smaller commits would make it easier to review and track any issues that came with the refactor.

@ColeMiller1
Copy link
Contributor Author

I filed a follow-up PR (#9) to focus on the substitute calls. I am not sure I would have been faster at debugging it. Let me know if you have any other questions / follow-up in the other PR.

@contefranz
Copy link
Owner

Hi @contefranz -

It looks like you are in the right direction! That is a very large diff but based on the error, this seems most probable to be the source of the error.

msmtools/R/augment.R

Lines 646 to 647 in fbdfbcb

flag = unlist( flag_temp, recursive = FALSE )
final[ , status := flag ]

Lines 612-614 also have assignment to a status column. Were you able to test the build against --as-cran as you developed? With the large diff, it's hard to pinpoint without deeper review. Smaller commits would make it easier to review and track any issues that came with the refactor.

Thanks for looking into this. To be honest, I am not following what the problem is anymore. I fixed all of the NSE issues caused by either substitute(), get() and eval() (see the augment() function in the new branch).

Why is the creation of status a problem? Its creation is correctly declared at the beginning of the function with the line:

if ( getRversion() >= "2.15.1" ) {
  utils::globalVariables( c( "status", "status_num", "n_status",
                             "status_exp", "status_exp_num", "n_status_exp",
                             ".", "V2" ) )
}

So I now have two questions:

  1. After fixing everything as said above, I still get the very same error on the vignette building process. Btw, the error is about recycling so I really don't get the issue here as the internal data.table objects that are created have very well controlled dimensions. This makes me think that what I changed did not have anything to do with this problem. Thoughts on this?
  2. Why did you file a new PR on main and not on the updated branch?

Furthermore, is the direction I took correct? Meaning, are those fix what you wanted me to do in the first place?

I am sorry for all these questions, but I am really willing to fix any issue just not getting what the issue is.

Thanks!

@ColeMiller1
Copy link
Contributor Author

I am not following what the problem is anymore.

Right - I am not sure what the exact problem in your branch is, either. You did a lot of work, but my guess is that there is a bug which results in the number of rows being assigned to be incorrect. That's why you are seeing the error about Supplied 30 items to be assigned to 43 items of column 'status'.

Why is the creation of status a problem?

The error is not related to the object status or how it's defined. The error is related to the assignment to the column status in the data.table. We can reproduce a similar error by demonstrating on iris:

library(data.table)

dt = as.data.table(iris)
dt[, my_new_col := 1:30]

## Error in `[.data.table`(dt, , `:=`(my_new_col, 1:30)) :
##    Supplied 30 items to be assigned to 150 items of column 'my_new_col'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

After fixing everything as said above, I still get the very same error on the vignette building process.

You did a lot of work, I agree. Unfortunately, you introduced a different problem that I am unsure what change caused the error. I did not get vignette errors on your main branch and I did not get vignette errors after I made changes mainly targeting eval(substitute()).

Why did you file a new PR on main and not on the updated branch?

Because I was not sure what was causing the new error with the vignette. It was easier to start from something that was working correctly. I know you spent time on your branch so it's OK if you're able to figure out the problem and ignore my other PR.

Furthermore, is the direction I took correct? Meaning, are those fix what you wanted me to do in the first place?

I like the direction you took. In addition to targeting substitute(), you also were cleaning up get() to use less NSE. As for what you needed to fix, the only item that would need to be fixed would be calls like dt[, substitute(your_col) := new_val] where you have substitute(your_col) on the LHS, as indicated in my original post. The get() on the RHS or within by = arguments will still be supported although I agree that you can probably refactor to remove the get() calls.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants