Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .github/workflows/R-CMD-check.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,10 @@ jobs:

env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
# Run skip_on_cran() tests in CI (incl. the varPro grow tests): CI is
# not a sanitizer build, so the upstream UBSAN path is harmless here;
# only CRAN's own check machines (NOT_CRAN unset) skip them.
NOT_CRAN: "true"
R_KEEP_PKG_SOURCE: yes
# Skip vdiffr visual-regression tests in CI until reference SVGs are
# committed. To regenerate: run testthat::snapshot_accept() locally,
Expand Down
4 changes: 4 additions & 0 deletions .github/workflows/check-release.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,10 @@ jobs:
runs-on: ubuntu-latest
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
# Run skip_on_cran() tests in CI (incl. the varPro grow tests): CI is
# not a sanitizer build, so the upstream UBSAN path is harmless here;
# only CRAN's own check machines (NOT_CRAN unset) skip them.
NOT_CRAN: "true"
R_KEEP_PKG_SOURCE: yes
VDIFFR_RUN_TESTS: "false"
steps:
Expand Down
4 changes: 4 additions & 0 deletions .github/workflows/check-standard.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,10 @@ jobs:

env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
# Run skip_on_cran() tests in CI (incl. the varPro grow tests): CI is
# not a sanitizer build, so the upstream UBSAN path is harmless here;
# only CRAN's own check machines (NOT_CRAN unset) skip them.
NOT_CRAN: "true"
R_KEEP_PKG_SOURCE: yes
VDIFFR_RUN_TESTS: "false"

Expand Down
4 changes: 4 additions & 0 deletions .github/workflows/test-coverage.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,10 @@ jobs:
runs-on: ubuntu-latest
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
# Run skip_on_cran() tests in CI (incl. the varPro grow tests): CI is
# not a sanitizer build, so the upstream UBSAN path is harmless here;
# only CRAN's own check machines (NOT_CRAN unset) skip them.
NOT_CRAN: "true"
VDIFFR_RUN_TESTS: "false"

steps:
Expand Down
6 changes: 3 additions & 3 deletions CRAN-SUBMISSION
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
Version: 3.1.0
Date: 2026-06-11 15:26:24 UTC
SHA: a7d805290e69ae517d04846bf13fae6a01062fce
Version: 3.1.2
Date: 2026-06-13 13:53:53 UTC
SHA: 83d8021b0014e72613d35d02caf19bf9e4bca6eb
4 changes: 2 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
Package: ggRandomForests
Type: Package
Title: Visually Exploring Random Forests
Version: 3.1.0.9000
Date: 2026-06-11
Version: 3.1.2.9000
Date: 2026-06-13
Authors@R: person("John", "Ehrlinger",
role = c("aut", "cre"),
email = "john.ehrlinger@gmail.com")
Expand Down
43 changes: 41 additions & 2 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,10 @@
Package: ggRandomForests
Version: 3.1.0.9000
Version: 3.1.2.9000

ggRandomForests v4.0.0 (development)
====================================
* Development version 3.1.0.9000, opened after the v3.1.0 CRAN release.
* Development version 3.1.2.9000, opened after the v3.1.2 CRAN release
(forward-merged the v3.1.1 and v3.1.2 CRAN fixes onto the dev line).
* `gg_auct()` / `plot.gg_auct()`: tidy wrapper and plot for time-varying
AUC from `randomForestRHF::auct.rhf()` (RHF Phase 2). Returns a long
frame `time / auc / se / lower / upper / marker` with an `iauc`
Expand All @@ -17,6 +18,44 @@ ggRandomForests v4.0.0 (development)
`requireNamespace("randomForestRHF")`. No change for users who do not
install it.

ggRandomForests v3.1.2
======================
* CRAN fix: skip only the single test grow that trips the upstream
`randomForestSRC` gcc-UBSAN report at `entry.c:184` — the *unsupervised*
isolation forest in `gg_isopro` (`varPro::isopro(method = "unsupv")`). Only
an unsupervised grow has a 0-length `yvar.wt`, the vector `rfsrcGrow`
decrements to an out-of-bounds pointer; supervised grows are unaffected.
We verified this under `-fsanitize=undefined`: of every varPro/rfsrc grow
in the test suite, only `isopro(method = "unsupv")` fires `entry.c:184`.
`make_iso_fit()` therefore calls `skip_on_cran()` only for
`method = "unsupv"`. ggRandomForests is pure R and unchanged.
* The broader `skip_on_cran()` guards added in v3.1.1 (the `varpro`,
`uvarpro`, `ivarpro`, `beta.varpro`, and `isopro(method = "rnd")` test
fixtures) are removed: those grows are supervised (or synthetic-supervised)
and gcc-UBSAN-clean, so they run on CRAN again, restoring that test
coverage. The upstream issue is fixed in `randomForestSRC` and pending a
CRAN release.

ggRandomForests v3.1.1
======================
* CRAN fix: the varPro tests now call `skip_on_cran()` so they do not run
on CRAN's check machines, including the gcc-UBSAN additional check. They
were triggering an upstream `randomForestSRC` sanitizer issue (a 0-length
array access in `rfsrcGrow`, `entry.c:184`) that surfaces when any
`varPro` grow (`varpro()`, `beta.varpro()`, `uvarpro()`, `isopro()`,
`ivarpro()`) builds a forest. ggRandomForests is pure R and its code is
unchanged; the varPro tests still run in our CI (the workflows set
`NOT_CRAN=true`) and locally; they are skipped only on CRAN's check
machines, including the gcc-UBSAN check. The upstream issue has been
reported to the randomForestSRC maintainers.
* The `varpro` vignette now loads every varPro fit from a precomputed
file (`vignettes/varpro_precomputed.rds`, built by
`vignettes/precompute_varpro.R`), so the vignette performs no live
varPro grow during `R CMD check`. This removes the same upstream
sanitizer path from the vignette build and trims check time. Each chunk
falls back to a live fit if the precomputed object is absent, so the
vignette remains reproducible from source.

ggRandomForests v3.1.0
======================
* Fix: `gg_vimp()` for single-outcome rfsrc forests now correctly flags
Expand Down
142 changes: 59 additions & 83 deletions cran-comments.md
Original file line number Diff line number Diff line change
@@ -1,88 +1,64 @@
## v3.1.0 — varPro integration + bug fix (major release)

### Resubmission

This resubmission addresses the one issue raised on the previous 3.1.0
submission: the overall check time exceeded 10 minutes on
r-devel-windows-x86_64, driven by the vignette rebuild (~331 s) and the
tests (~209 s). We have reduced both, per the three levers suggested
(small toy data, fewer iterations, precomputed results for the lengthiest
parts):

- The regression and survival vignettes use lighter forests (`ntree`
200 / 150, imputation `ntree` 100) and coarser partial-dependence grids.
- The varpro vignette's three `gg_partial_varpro()` calls and the Boston
`beta.varpro()` fit — together ~34 s, the bulk of that vignette's
rebuild — are now precomputed offline (`vignettes/precompute_varpro.R`)
and loaded from a 167 KB `vignettes/varpro_precomputed.rds`, with an
automatic live-computation fallback if the file is absent.
- The `gg_udependent()` tests previously recomputed the same per-fit
entropy matrix (~1.5 s) once per test; they now memoise it.

Locally the full vignette rebuild drops from ~80 s to ~38 s and the test
suite from ~44 s to ~36 s (R 4.6.0, aarch64-apple-darwin23), with no
change in test coverage or vignette content.

### Release context

v2.7.3 is the version currently published on CRAN. A v3.0.0 submission
(2026-05-28) cleared the Windows and Debian pretests (0/0/0) but did not
complete the review cycle; this 3.1.0 release supersedes it, carrying the
full v3.0.0 feature layer plus the v3.1.0 bug fix and documentation work.
The version is in 3.x territory because it adds a substantial new feature
layer and soft-deprecates one user-facing function.

### Changes in v3.1.0

- **Bug fix.** `gg_vimp()` on single-outcome `rfsrc` forests now correctly
flags variables with non-positive VIMP in the `positive` column (used for
plot colouring). The single-outcome fit names the column `VIMP`
(uppercase) while the flag check read `$vimp` (lowercase), so `positive`
stayed `TRUE` for every variable. Added a regression test.
- **Documentation.** Deepened the varPro-family and rfsrc importance /
partial / survival help pages, made the `gg_vimp()` (permutation
importance) vs `gg_varpro()` (release-rule importance) distinction
explicit and cross-linked, and fixed a stale `@return` in `gg_roc()`. No
user-facing behaviour change beyond the bug fix above.

### Major changes carried from v3.0.0

- **New varPro wrapper family.** Tidy extractors and `plot()` methods for
the `varPro` package: `gg_partial_varpro()` (partial dependence),
`gg_varpro()` (variable importance), `gg_udependent()` (cross-variable
dependency graph), `gg_isopro()` (isolation-forest anomaly scores),
`gg_beta_varpro()` (per-rule beta importance), and `gg_ivarpro()`
(individual / local importance), each with `print` / `summary` /
`autoplot` companions and a dedicated "varpro" vignette.
- **Soft-deprecation.** `gg_partialpro()` now warns and forwards to
`gg_partial_varpro()`; it will be removed in a future release.
- **randomForest engine fixes.** `gg_variable.randomForest()`
classification, `gg_roc()` / `calc_roc()` for `randomForest` (true
probability-based, macro-averaged ROC), per-class one-vs-rest ROC
(`per_class = TRUE`), and `plot.gg_variable()` now returns a single
`ggplot` / `patchwork` object rather than a bare list.
- **Importance-plot ordering.** All importance plots now place the
most-important variable at the top, matching the `gg_vimp` convention.

### Dependency change (reverse-dependency safe)

`randomForestSRC` and `randomForest` move from `Depends:` to `Imports:`,
and `varPro` is added to `Imports:`. `library(ggRandomForests)` no longer
attaches the forest packages to the search path. There are **0 reverse
dependencies** on CRAN, so no downstream packages are affected.

## Test environments
## v3.1.2 — CRAN fix (gcc-UBSAN additional check, follow-up to 3.1.1)

This patch completes the gcc-UBSAN "additional issue" fix. v3.1.1 guarded
the varPro forest-growing tests with `skip_on_cran()`, but one fixture was
missed and the auto-check re-flagged the issue (win-builder incoming
pre-test, 2026-06-12; correct before 2026-06-29).

### The issue

The sanitizer reports, once, during the package tests:

```
entry.c:184:55: runtime error: pointer index expression with base
0x000000000001 overflowed to 0xfffffffffffffff9
#0 rfsrcGrow .../randomForestSRC/src/entry.c:184
```

This is a 0-length weight-vector access in the compiled code of the
`randomForestSRC` package (`rfsrcGrow`, `entry.c:184`): when `yvar.wt` has
length 0 — which happens for the **unsupervised** family — the unconditional
`RF_yWeight--` forms an out-of-bounds pointer (undefined behaviour, though
never dereferenced). ggRandomForests is a pure-R package
(`NeedsCompilation: no`); it contains no C/C++/Fortran of its own, so the
overflow is not in our code. It is surfaced because our test suite grows an
unsupervised forest via `varPro::isopro()`.

### The fix

No package R code changes (ggRandomForests remains `NeedsCompilation: no`).
We built CRAN's `randomForestSRC` 3.6.2 with `-fsanitize=undefined` and ran
every varPro/rfsrc grow in our test suite under it. Only one grow fires
`entry.c:184`: the *unsupervised* isolation forest
(`varPro::isopro(method = "unsupv")`). The bug is a 0-length `yvar.wt`
(present only for the unsupervised family) that `rfsrcGrow` decrements to an
out-of-bounds pointer; supervised grows have a non-empty `yvar.wt` and are
unaffected.

`make_iso_fit()` in the `gg_isopro` tests therefore calls
`testthat::skip_on_cran()` only when `method = "unsupv"`, so the one
offending grow never runs on CRAN's check machines (including the gcc-UBSAN
additional check). All other varPro tests run on CRAN. We confirmed the full
test suite produces zero gcc-UBSAN errors under the sanitizer configuration
CRAN uses (GCC `-fsanitize=undefined`, which — unlike clang — does not
include `float-cast-overflow`).

The upstream issue has been reported to the randomForestSRC maintainers.

### Test environments

* **Local:** R 4.6.0 on macOS (aarch64-apple-darwin23).
`R CMD check --as-cran` returns 0 errors, 0 warnings, 0 notes.
`R CMD check --as-cran` (with the manual) returns 0 errors, 0 warnings,
1 NOTE (days since last update, see below); the single unsupervised isopro
test (`method = "unsupv"`) skips under the CRAN check environment as
intended, all other varPro tests run.
* **GitHub Actions matrix:** ubuntu-latest (R-devel / R-release /
R-oldrel-1), windows-latest (R-release), macos-latest (R-release),
all green on the head commit.
* **Reverse-dependency check:** `tools::package_dependencies(reverse =
TRUE)` returns 0.
* **URLs:** `urlchecker::url_check()` clean.
R-oldrel-1), windows-latest (R-release), macos-latest (R-release).
* **Reverse-dependency check:** 0 reverse dependencies on CRAN.

## NOTE disposition
### NOTE disposition

No notes in local `R CMD check --as-cran`. Examples that fit survival
forests are wrapped in `\donttest` and run in a few seconds each.
One NOTE on the incoming feasibility check, "Days since last update: N".
This is expected: this patch is the follow-up to the gcc-UBSAN fix the CRAN
team requested (correct before 2026-06-29), so the short interval is
unavoidable. No other notes.
90 changes: 90 additions & 0 deletions dev/randomForestSRC-ubsan-report.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
# UBSAN: 0-length-array pointer arithmetic in `rfsrcGrow` (entry.c:184)

**Package:** randomForestSRC (observed in 3.6.2, latent in earlier versions
incl. 3.3.5)
**Reporter:** John Ehrlinger (ggRandomForests maintainer)
**Severity:** Undefined behaviour flagged by gcc-UBSAN; numerically benign in
practice, but CRAN treats it as an "Additional issue."

## What CRAN reports

gcc-UBSAN, surfaced via the ggRandomForests test suite (which grows varPro
forests):

```
entry.c:184:55: runtime error: pointer index expression with base
0x000000000001 overflowed to 0xfffffffffffffff9
#0 rfsrcGrow .../randomForestSRC/src/entry.c:184
```

## Root cause

`rfsrcGrow()` builds 1-offset working pointers from incoming R weight vectors:

```c
// src/entry.c (3.6.2), lines 183-185
RF_xWeightStat = REAL(xWeightStat); RF_xWeightStat--; // 183
RF_yWeight = REAL(yWeight); RF_yWeight--; // 184 <-- UBSAN
RF_xWeight = REAL(xWeight); RF_xWeight--; // 185
```

When the corresponding R vector has **length 0**, `REAL()` returns R's
empty-vector sentinel address `0x1`. The unconditional `ptr--` then forms
`0x1 - sizeof(double) = 0x1 - 8 = 0xfffffffffffffff9`. *Forming* this
out-of-bounds pointer is undefined behaviour (UBSAN, column 55 = the `--`),
even though the package never dereferences `RF_yWeight[i]` when the weight is
empty — so the result is numerically correct and the bug is otherwise silent.

`yWeight` (R-side `yvar.wt`) is length 0 whenever the y-weight vector is
empty. The R glue produces exactly that for the **unsupervised** family:

```r
# R/rfsrc.R
if (family == "unspv") {
yvar.wt <- NULL # -> as.double(NULL) -> numeric(0) -> length-0 yWeight
} else {
yvar.wt <- get.weight(yvar.wt, length(yvar.types))
}
```

So every unsupervised forest already forms this pointer; varPro reaches it
because its rule-grow machinery fits forests through that path.

### Minimal reproduction (no varPro needed)

```r
library(randomForestSRC)
d <- data.frame(y = rnorm(60), x1 = rnorm(60), x2 = rnorm(60))
rfsrc(Unsupervised() ~ ., d, ntree = 10) # yvar.wt is numeric(0) -> entry.c:184
```

Confirmed by tracing `randomForestSRC:::get.weight`: the unsupervised path
returns a length-0 `yvar.wt` (`n=0, outlen=0`); regression returns length 1.

Note the sibling pointers `RF_subjWeight`, `RF_eventType`, etc. are guarded
with `if (VECTOR_ELT(...) != R_NilValue)`, but these three are not — and
`as.double(numeric(0))` is never `R_NilValue`, so nothing catches the empty
case.

## Suggested fix

Only decrement when the vector is non-empty (mirrors the existing
NULL-guarded idiom; the empty pointer is never indexed downstream):

```c
RF_xWeightStat = REAL(xWeightStat); if (LENGTH(xWeightStat) > 0) RF_xWeightStat--;
RF_yWeight = REAL(yWeight); if (LENGTH(yWeight) > 0) RF_yWeight--;
RF_xWeight = REAL(xWeight); if (LENGTH(xWeight) > 0) RF_xWeight--;
```

(In this repo's amalgamated `src/randomForestSRC.c` the same three lines are
at ~36965-36967.) The identical guard should be applied anywhere else
`rfsrcGrow`/`rfsrcPredict` does `REAL(w); w--` / `INTEGER(w); w--` on a vector
that can arrive length 0.

## Downstream impact

ggRandomForests 3.1.1 works around this by not exercising the path under
`R CMD check` (varPro tests `skip_on_cran()`; the varpro vignette loads
precomputed fits). A guard in randomForestSRC would remove the UB at the
source for all callers.
6 changes: 6 additions & 0 deletions tests/testthat/helper-varpro-fixtures.R
Original file line number Diff line number Diff line change
@@ -1,6 +1,12 @@
# Session-memoised varpro + beta.varpro fixtures for the gg_beta_varpro tests.
# beta.varpro() is the expensive call (per-rule glmnet); compute once per R
# session and reuse. In-memory only — no disk cache.
#
# These all grow *supervised* varpro/ivarpro/beta.varpro forests (a real Y), so
# yvar.wt is non-empty and they do NOT trip randomForestSRC's gcc-UBSAN report
# at entry.c:184 (that fires only for the unsupervised family — verified under
# -fsanitize=undefined). They intentionally run on CRAN; do not skip_on_cran().
# Only the unsupervised isopro grow is skipped (see test_gg_isopro.R).

.varpro_cache <- new.env(parent = emptyenv())

Expand Down
9 changes: 8 additions & 1 deletion tests/testthat/test_gg_isopro.R
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,14 @@
# Phase 4 of the v2.8.0 varPro integration.

make_iso_fit <- function(seed = 1L, method = "rnd", ntree = 25, sampsize = 16) {
skip_if_not_installed("varPro")
# Only the *unsupervised* isopro grow (method = "unsupv") trips
# randomForestSRC's gcc-UBSAN report at entry.c:184 (length-0 yvar.wt
# decremented to an out-of-bounds pointer; upstream bug, ggRandomForests is
# pure R). The default method = "rnd" builds a synthetic-supervised forest and
# is UBSAN-clean — verified under -fsanitize=undefined. So skip_on_cran() only
# the unsupervised path; the rnd tests run on CRAN. Both run in CI and locally.
if (identical(method, "unsupv")) testthat::skip_on_cran()
testthat::skip_if_not_installed("varPro")
set.seed(seed)
varPro::isopro(
data = iris[, 1:4],
Expand Down
3 changes: 3 additions & 0 deletions tests/testthat/test_gg_udependent.R
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,9 @@
## ── Helpers ──────────────────────────────────────────────────────────────────

make_uvp <- function(ntree = 25L) {
# uvarpro() grows a *synthetic-supervised* forest (yxyz123 ~ ., random Y), so
# yvar.wt is non-empty and it does NOT trip the entry.c:184 gcc-UBSAN report
# (only the truly-unsupervised isopro grow does). Runs on CRAN.
set.seed(42L)
varPro::uvarpro(iris[, -5L], ntree = ntree)
}
Expand Down
Loading
Loading