ercbk
diff --git a/_book/qmd/algorithms-ml.html b/_book/qmd/algorithms-ml.html
Lines changed: 14 additions & 7 deletions
diff --git a/_book/qmd/big-data.html b/_book/qmd/big-data.html
Lines changed: 21 additions & 17 deletions
@@ -2829,12 +2829,9 @@ <h3 data-number="1.1.2" class="anchored" data-anchor-id="sec-alg-ml-trees-rf"><s
 <ul>
 <li>Selects discriminative features via a multi-class class separability score (CSS), splits by nearest class centroid, and aggregates tree votes to produce predictions and class probabilities.</li>
 </ul></li>
-<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/sirus/index.html" style="color: #990000">sirus</a><span style="color: #990000">}</span>: <u>S</u>table and <u>I</u>nterpretable <u>Ru</u>le <u>S</u>et
+<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/corrRF/index.html" style="color: #990000">corrRF</a><span style="color: #990000">}</span> (<a href="https://arxiv.org/abs/2503.12634">Paper</a>) - A clustered random forest algorithm for data made up of independent clusters that exhibit within-cluster dependence
 <ul>
-<li>Combines the simplicity of decision trees with a predictivity close to random forests</li>
-<li>Instead of aggregating predictions, SIRUS aggregates the forest structure: the most frequent nodes of the forest are selected to form a stable rule ensemble model</li>
-<li>Me: The interpretability of a Decision Tree with similar predictive accuracy of a RF. Seems like it would be good to fit both and use this model for additional interpretability.</li>
-<li>There’s also a Spatial SIRUS (<a href="https://github.com/LucaPate/Spatial_SIRUS">github</a>, <a href="https://arxiv.org/abs/2408.05537">paper</a>) which uses a spatial <span style="color: #990000">{RandomForestsGLS}</span> model in a SIRUS algorithm</li>
+<li>May be applicable to repeated-measures data</li>
 </ul></li>
 <li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/RandomForestsGLS/index.html" style="color: #990000">RandomForestsGLS</a><span style="color: #990000">}</span> - Generalized Least Squares RF
 <ul>
@@ -2851,10 +2848,20 @@ <h3 data-number="1.1.2" class="anchored" data-anchor-id="sec-alg-ml-trees-rf"><s
 <li>New Mahalanobis splitting rule for correlated real-valued outcomes in multivariate regression settings</li>
 </ul></li>
 <li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/ShrinkageTrees/index.html" style="color: #990000">ShrinkageTrees</a><span style="color: #990000">}</span> (<a href="https://arxiv.org/abs/2507.22004">Paper</a>) - Bayesian regression tree models with shrinkage priors on step height</li>
+<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/sirus/index.html" style="color: #990000">sirus</a><span style="color: #990000">}</span>: <u>S</u>table and <u>I</u>nterpretable <u>Ru</u>le <u>S</u>et
+<ul>
+<li>Combines the simplicity of decision trees with a predictivity close to random forests</li>
+<li>Instead of aggregating predictions, SIRUS aggregates the forest structure: the most frequent nodes of the forest are selected to form a stable rule ensemble model</li>
+<li>Me: The interpretability of a decision tree with predictive accuracy similar to that of an RF. It seems worthwhile to fit both and use this model for additional interpretability.</li>
+<li>There’s also a Spatial SIRUS (<a href="https://github.com/LucaPate/Spatial_SIRUS">github</a>, <a href="https://arxiv.org/abs/2408.05537">paper</a>) which uses a spatial <span style="color: #990000">{RandomForestsGLS}</span> model in a SIRUS algorithm</li>
+</ul></li>
 <li><span style="color: #990000">{</span><a href="https://stochtree.ai/R_docs/pkgdown/" style="color: #990000">stochtree</a><span style="color: #990000">}</span> - Stochastic tree ensembles (i.e.&nbsp;BART, XBART) for supervised learning and causal inference.</li>
-<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/corrRF/index.html" style="color: #990000">corrRF</a><span style="color: #990000">}</span> (<a href="https://arxiv.org/abs/2503.12634">Paper</a>) - A clustered random forest algorithm for fitting random forests for data of independent clusters, that exhibit within cluster dependence
+<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/unityForest/index.html" style="color: #990000">unityForest</a><span style="color: #990000">}</span> - Improving Interaction Modeling and Interpretability in Random Forests
 <ul>
-<li>Possibly can be used on repeated measures data</li>
+<li>Currently, only classification is supported</li>
+<li>A random forest variant designed to better take covariates with purely interaction-based effects into account, including interactions for which none of the involved covariates exhibits a marginal effect.</li>
+<li>Facilitates the identification and interpretation of (marginal or interactive) effects</li>
+<li>Includes unity variable importance and covariate-representative tree roots (CRTRs) that provide interpretable visualizations of these conditions</li>
 </ul></li>
 </ul></li>
 </ul>
 
@@ -2331,13 +2331,6 @@ <h2 class="unnumbered anchored" data-anchor-id="sec-bgdat-misc">Misc</h2>
 <section id="sec-bgdat-hghperf" class="level2 unnumbered">
 <h2 class="unnumbered anchored" data-anchor-id="sec-bgdat-hghperf">High Performance</h2>
 <ul>
-<li><p><span style="color: #990000">{rpolars}</span>: Arrow product; uses SIMD which is a low-level vectorization that can be used to speed up simple operations like addition, subtraction, division, and multiplication</p>
-<ul>
-<li>Also see <a href="../qmd/r-polars.html#sec-r-polars" style="color: green">R, Polars</a> and <a href="../qmd/python-polars.html#sec-py-polars" style="color: green">Python, Polars</a></li>
-<li>Capable of using GPUs for up to a 10x execution time decrease.</li>
-<li>Polars Cloud can perform distributed computing</li>
-<li>Extensions: <span style="color: #990000">{</span><a href="https://www.tidypolars.etiennebacher.com/" style="color: #990000">tidypolars</a><span style="color: #990000">}</span>, <span style="color: goldenrod">{</span><a href="https://tidypolars.readthedocs.io/en/latest/" style="color: goldenrod">tidypolars</a><span style="color: goldenrod">}</span></li>
-</ul></li>
 <li><p><span style="color: #990000">{</span><a href="https://sebkrantz.github.io/collapse/" style="color: #990000">collapse</a><span style="color: #990000">}</span> (<a href="https://arxiv.org/abs/2403.05038">Vignette</a>): Fast grouped &amp; weighted statistical computations, time series and panel data transformations, list-processing, data manipulation functions, summary statistics and various utilities such as support for variable labels. Class-agnostic framework designed to work with vectors, matrices, data frames, lists and related classes i.e.&nbsp;<em>xts</em>, <em>data.table</em>, <em>tibble</em>, <em>pdata.frame</em>, <em>sf</em>.</p>
 <ul>
 <li><p>Optimize a script</p>
@@ -2364,11 +2357,31 @@ <h2 class="unnumbered anchored" data-anchor-id="sec-bgdat-hghperf">High Performa
 <span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>  <span class="fu">fungroup</span>()</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div></li>
 <li><p><span style="color: #990000">{</span><a href="https://github.com/NicChr/fastplyr" style="color: #990000">fastplyr</a><span style="color: #990000">}</span> - Has a <span style="color: #990000">{dplyr}</span> API and a <span style="color: #990000">{collapse}</span> backend</p></li>
 </ul></li>
-<li><p><span style="color: #990000">{r2c}</span>: Fast grouped statistical computation; currently limited to a few functions, sometimes faster than <span style="color: #990000">{collapse}</span></p></li>
+<li><p><span style="color: goldenrod">{</span><a href="https://docs.nvidia.com/cupynumeric/latest/index.html" style="color: goldenrod">cuNumeric</a><span style="color: goldenrod">}</span> (<a href="https://towardsdatascience.com/numpy-api-on-a-gpu/">intro</a>) - NVIDIA drop-in replacement for NumPy that is built on the Legate framework</p>
+<ul>
+<li>Allows you to use multi-core CPUs, single or multi-GPU nodes, and even multi-node clusters without changing your Python code.</li>
+<li>It translates high-level array operations into a graph of fine-grained tasks and hands that graph to the C++ Legion runtime, which schedules the tasks, partitions the data, and moves tiles between CPUs, GPUs and network links for you.</li>
+</ul></li>
+<li><p><span style="color: #990000">{</span><a href="https://github.com/bbtheo/cuplyr" style="color: #990000">cuplyr</a><span style="color: #990000">}</span> - A dplyr backend for GPU acceleration via RAPIDS cuDF</p>
+<ul>
+<li>Can provide significant speedups on larger datasets (typically &gt;10M rows) without requiring major code changes.</li>
+</ul></li>
 <li><p><span style="color: #990000">{data.table}</span>: Enhanced data frame class with concise data manipulation framework offering powerful aggregation, extremely flexible split-apply-combine computing, reshaping, joins, rolling statistics, set operations on tables, fast csv read/write, and various utilities such as transposition of data.</p>
 <ul>
 <li>See <a href="../qmd/r-data-table.html#sec-r-dt" style="color: green">R, data.table</a></li>
 </ul></li>
+<li><p><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/kit/index.html" style="color: #990000">kit</a><span style="color: #990000">}</span> - Fast (implemented in C) vectorized and nested switches, some parallel (row-wise) statistics, and some utilities such as efficient partial sorting and unique values.</p></li>
+<li><p><span style="color: #990000">{matrixStats}</span>: Efficient row- and column-wise (weighted) statistics on matrices and vectors, including computations on subsets of rows and columns.</p></li>
+<li><p><span style="color: goldenrod">{</span><a href="https://towardsdatascience.com/this-decorator-will-make-python-30-times-faster-715ca5a66d5f" style="color: goldenrod">numba</a><span style="color: goldenrod">}</span> - JIT compiler that translates a subset of Python and NumPy code into fast machine code.</p></li>
+<li><p><span style="color: #990000">{polars}</span>: Built on the Apache Arrow columnar memory format; uses SIMD, a low-level form of vectorization that speeds up simple operations such as addition, subtraction, division, and multiplication</p>
+<ul>
+<li>Also see <a href="../qmd/r-polars.html#sec-r-polars" style="color: green">R, Polars</a> and <a href="../qmd/python-polars.html#sec-py-polars" style="color: green">Python, Polars</a></li>
+<li>Capable of using GPUs for up to a 10x execution time decrease.</li>
+<li>Polars Cloud can perform distributed computing</li>
+<li>Extensions: <span style="color: #990000">{</span><a href="https://www.tidypolars.etiennebacher.com/" style="color: #990000">tidypolars</a><span style="color: #990000">}</span>, <span style="color: goldenrod">{</span><a href="https://tidypolars.readthedocs.io/en/latest/" style="color: goldenrod">tidypolars</a><span style="color: goldenrod">}</span></li>
+</ul></li>
+<li><p><span style="color: #990000">{</span><a href="https://github.com/t-kalinowski/quickr" style="color: #990000">quickr</a><span style="color: #990000">}</span>: R to Fortran transpiler</p></li>
+<li><p><span style="color: #990000">{r2c}</span>: Fast grouped statistical computation; currently limited to a few functions, sometimes faster than <span style="color: #990000">{collapse}</span></p></li>
 <li><p><span style="color: #990000">{</span><a href="https://cran.r-project.org/web//packages/Rfast/index.html" style="color: #990000">Rfast</a><span style="color: #990000">}</span>, <span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/Rfast2/index.html" style="color: #990000">Rfast2</a><span style="color: #990000">}</span>: A collection of fast functions for data analysis.</p>
 <ul>
 <li><p>Rfast - Fast functions include column- and row-wise means, medians, variances, minimums, maximums, many t, F and G-square tests, and many regressions (normal, logistic, Poisson)</p></li>
@@ -2385,15 +2398,6 @@ <h2 class="unnumbered anchored" data-anchor-id="sec-bgdat-hghperf">High Performa
 <span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>    <span class="at">max =</span> minmax[<span class="dv">2</span>, ]</span>
 <span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div></li>
 </ul></li>
-<li><p><span style="color: #990000">{matrixStats}</span>: Efficient row-and column-wise (weighted) statistics on matrices and vectors, including computations on subsets of rows and columns.</p></li>
-<li><p><span style="color: goldenrod">{</span><a href="https://towardsdatascience.com/this-decorator-will-make-python-30-times-faster-715ca5a66d5f" style="color: goldenrod">numba</a><span style="color: goldenrod">}</span> - JIT compiler that translates a subset of Python and NumPy code into fast machine code.</p></li>
-<li><p><span style="color: goldenrod">{</span><a href="https://docs.nvidia.com/cupynumeric/latest/index.html" style="color: goldenrod">cuNumeric</a><span style="color: goldenrod">}</span> (<a href="https://towardsdatascience.com/numpy-api-on-a-gpu/">intro</a>)- Nvidia drop-in replacement for numpy that is built on the Legate framework</p>
-<ul>
-<li>Allow you to use multi-core CPUs, single or multi-GPU nodes, and even multi-node clusters without changing your Python code.</li>
-<li>It translates high-level array operations into a graph of fine-grained tasks and hands that graph to the C++ Legion runtime, which schedules the tasks, partitions the data, and moves tiles between CPUs, GPUs and network links for you.</li>
-</ul></li>
-<li><p><span style="color: #990000">{kit}</span>: Fast vectorized and nested switches, some parallel (row-wise) statistics, and some utilities such as efficient partial sorting and unique values.</p></li>
-<li><p><span style="color: #990000">{</span><a href="https://github.com/t-kalinowski/quickr" style="color: #990000">quickr</a><span style="color: #990000">}</span>: R to Fortran transpiler</p></li>
 </ul>
 </section>
 <section id="sec-bgdat-lgmem" class="level2 unnumbered">