<li>Selects discriminative features via a multi-class class separability score (CSS), splits by nearest class centroid, and aggregates tree votes to produce predictions and class probabilities.</li>
</ul></li>
-
<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/sirus/index.html" style="color: #990000">sirus</a><span style="color: #990000">}</span>: <u>S</u>table and <u>I</u>nterpretable <u>Ru</u>le <u>S</u>et
+
<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/corrRF/index.html" style="color: #990000">corrRF</a><span style="color: #990000">}</span> (<a href="https://arxiv.org/abs/2503.12634">Paper</a>) - A clustered random forest algorithm for fitting random forests to data made up of independent clusters that exhibit within-cluster dependence
<ul>
-
<li>Combines the simplicity of decision trees with a predictivity close to random forests</li>
-
<li>Instead of aggregating predictions, SIRUS aggregates the forest structure: the most frequent nodes of the forest are selected to form a stable rule ensemble model</li>
-
<li>Me: The interpretability of a decision tree with predictive accuracy similar to that of an RF. Seems like it would be good to fit both and use this model for additional interpretability.</li>
-
<li>There’s also a Spatial SIRUS (<a href="https://github.com/LucaPate/Spatial_SIRUS">github</a>, <a href="https://arxiv.org/abs/2408.05537">paper</a>), which uses a spatial <span style="color: #990000">{RandomForestsGLS}</span> model in a SIRUS algorithm</li>
+
<li>Can possibly be used on repeated-measures data</li>
</ul></li>
<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/RandomForestsGLS/index.html" style="color: #990000">RandomForestsGLS</a><span style="color: #990000">}</span> - Generalized Least Squares RF
<li>New Mahalanobis splitting rule for correlated real-valued outcomes in multivariate regression settings</li>
</ul></li>
<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/ShrinkageTrees/index.html" style="color: #990000">ShrinkageTrees</a><span style="color: #990000">}</span> (<a href="https://arxiv.org/abs/2507.22004">Paper</a>) - Bayesian regression tree models with shrinkage priors on step heights</li>
+
<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/sirus/index.html" style="color: #990000">sirus</a><span style="color: #990000">}</span>: <u>S</u>table and <u>I</u>nterpretable <u>Ru</u>le <u>S</u>et
+
<ul>
+
<li>Combines the simplicity of decision trees with a predictivity close to random forests</li>
+
<li>Instead of aggregating predictions, SIRUS aggregates the forest structure: the most frequent nodes of the forest are selected to form a stable rule ensemble model</li>
+
<li>Me: The interpretability of a decision tree with predictive accuracy similar to that of an RF. Seems like it would be good to fit both and use this model for additional interpretability.</li>
+
<li>There’s also a Spatial SIRUS (<a href="https://github.com/LucaPate/Spatial_SIRUS">github</a>, <a href="https://arxiv.org/abs/2408.05537">paper</a>), which uses a spatial <span style="color: #990000">{RandomForestsGLS}</span> model in a SIRUS algorithm</li>
+
</ul></li>
<li><span style="color: #990000">{</span><a href="https://stochtree.ai/R_docs/pkgdown/" style="color: #990000">stochtree</a><span style="color: #990000">}</span> - Stochastic tree ensembles (e.g. BART, XBART) for supervised learning and causal inference.</li>
-
<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/corrRF/index.html" style="color: #990000">corrRF</a><span style="color: #990000">}</span> (<a href="https://arxiv.org/abs/2503.12634">Paper</a>) - A clustered random forest algorithm for fitting random forests to data made up of independent clusters that exhibit within-cluster dependence
+
<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/unityForest/index.html" style="color: #990000">unityForest</a><span style="color: #990000">}</span> - Improving Interaction Modeling and Interpretability in Random Forests
<ul>
-
<li>Can possibly be used on repeated-measures data</li>
+
<li>Currently, only classification is supported</li>
+
<li>A random forest variant designed to better take covariates with purely interaction-based effects into account, including interactions for which none of the involved covariates exhibits a marginal effect.</li>
+
<li>Facilitates the identification and interpretation of (marginal or interactive) effects</li>
+
<li>Includes unity variable importance and covariate-representative tree roots (CRTRs) that provide interpretable visualizations of these conditions</li>
<li><p><span style="color: #990000">{rpolars}</span>: Built on the Apache Arrow columnar memory format and uses SIMD, a low-level form of vectorization that speeds up simple operations like addition, subtraction, multiplication, and division</p>
-
<ul>
-
<li>Also see <a href="../qmd/r-polars.html#sec-r-polars" style="color: green">R, Polars</a> and <a href="../qmd/python-polars.html#sec-py-polars" style="color: green">Python, Polars</a></li>
-
<li>Can use GPUs to cut execution time by up to 10x.</li>
-
<li>Polars Cloud can perform distributed computing</li>
<li><p><span style="color: #990000">{</span><a href="https://sebkrantz.github.io/collapse/" style="color: #990000">collapse</a><span style="color: #990000">}</span> (<a href="https://arxiv.org/abs/2403.05038">Vignette</a>): Fast grouped & weighted statistical computations, time series and panel data transformations, list-processing, data manipulation functions, summary statistics, and various utilities such as support for variable labels. Class-agnostic framework designed to work with vectors, matrices, data frames, lists, and related classes, e.g. <em>xts</em>, <em>data.table</em>, <em>tibble</em>, <em>pdata.frame</em>, <em>sf</em>.</p>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a><span class="fu">fungroup</span>()</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div></li>
<li><p><span style="color: #990000">{</span><a href="https://github.com/NicChr/fastplyr" style="color: #990000">fastplyr</a><span style="color: #990000">}</span> - Has a <span style="color: #990000">{dplyr}</span> API and a <span style="color: #990000">{collapse}</span> backend</p></li>
</ul></li>
-
<li><p><span style="color: #990000">{r2c}</span>: Fast grouped statistical computation; currently limited to a few functions, sometimes faster than <span style="color: #990000">{collapse}</span></p></li>
+
<li><p><span style="color: goldenrod">{</span><a href="https://docs.nvidia.com/cupynumeric/latest/index.html" style="color: goldenrod">cuNumeric</a><span style="color: goldenrod">}</span> (<a href="https://towardsdatascience.com/numpy-api-on-a-gpu/">intro</a>) - NVIDIA drop-in replacement for NumPy, built on the Legate framework</p>
+
<ul>
+
<li>Allows you to use multi-core CPUs, single- or multi-GPU nodes, and even multi-node clusters without changing your Python code.</li>
+
<li>It translates high-level array operations into a graph of fine-grained tasks and hands that graph to the C++ Legion runtime, which schedules the tasks, partitions the data, and moves tiles between CPUs, GPUs, and network links for you.</li>
+
</ul></li>
+
<li><p><span style="color: #990000">{</span><a href="https://github.com/bbtheo/cuplyr" style="color: #990000">cuplyr</a><span style="color: #990000">}</span> - A dplyr backend for GPU acceleration via RAPIDS cuDF</p>
+
<ul>
+
<li>Can provide significant speedups on larger datasets (typically >10M rows) without requiring major code changes.</li>
+
</ul></li>
<li><p><span style="color: #990000">{data.table}</span>: Enhanced data frame class with a concise data manipulation framework offering powerful aggregation, extremely flexible split-apply-combine computing, reshaping, joins, rolling statistics, set operations on tables, fast CSV read/write, and various utilities such as transposition of data.</p>
<li><p><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/kit/index.html" style="color: #990000">kit</a><span style="color: #990000">}</span> - Fast (implemented in C) vectorized and nested switches, some parallel (row-wise) statistics, and some utilities such as efficient partial sorting and unique values.</p></li>
+
<li><p><span style="color: #990000">{matrixStats}</span>: Efficient row- and column-wise (weighted) statistics on matrices and vectors, including computations on subsets of rows and columns.</p></li>
+
<li><p><span style="color: goldenrod">{</span><a href="https://towardsdatascience.com/this-decorator-will-make-python-30-times-faster-715ca5a66d5f" style="color: goldenrod">numba</a><span style="color: goldenrod">}</span> - JIT compiler that translates a subset of Python and NumPy code into fast machine code.</p></li>
+
<li><p><span style="color: #990000">{polars}</span>: Built on the Apache Arrow columnar memory format and uses SIMD, a low-level form of vectorization that speeds up simple operations like addition, subtraction, multiplication, and division</p>
+
<ul>
+
<li>Also see <a href="../qmd/r-polars.html#sec-r-polars" style="color: green">R, Polars</a> and <a href="../qmd/python-polars.html#sec-py-polars" style="color: green">Python, Polars</a></li>
+
<li>Can use GPUs to cut execution time by up to 10x.</li>
+
<li>Polars Cloud can perform distributed computing</li>
<li><p><span style="color: #990000">{</span><a href="https://github.com/t-kalinowski/quickr" style="color: #990000">quickr</a><span style="color: #990000">}</span>: R-to-Fortran transpiler</p></li>
+
<li><p><span style="color: #990000">{r2c}</span>: Fast grouped statistical computation; currently limited to a few functions, sometimes faster than <span style="color: #990000">{collapse}</span></p></li>
<li><p><span style="color: #990000">{</span><a href="https://cran.r-project.org/web//packages/Rfast/index.html" style="color: #990000">Rfast</a><span style="color: #990000">}</span>, <span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/Rfast2/index.html" style="color: #990000">Rfast2</a><span style="color: #990000">}</span>: A collection of fast functions for data analysis.</p>
<ul>
<li><p>Rfast - Column- and row-wise means, medians, variances, minimums, and maximums; many t, F, and G-square tests; and many regressions (normal, logistic, Poisson) are some of the many fast functions</p></li>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div></li>
</ul></li>
-
<li><p><span style="color: #990000">{matrixStats}</span>: Efficient row- and column-wise (weighted) statistics on matrices and vectors, including computations on subsets of rows and columns.</p></li>
-
<li><p><span style="color: goldenrod">{</span><a href="https://towardsdatascience.com/this-decorator-will-make-python-30-times-faster-715ca5a66d5f" style="color: goldenrod">numba</a><span style="color: goldenrod">}</span> - JIT compiler that translates a subset of Python and NumPy code into fast machine code.</p></li>
-
<li><p><span style="color: goldenrod">{</span><a href="https://docs.nvidia.com/cupynumeric/latest/index.html" style="color: goldenrod">cuNumeric</a><span style="color: goldenrod">}</span> (<a href="https://towardsdatascience.com/numpy-api-on-a-gpu/">intro</a>) - NVIDIA drop-in replacement for NumPy, built on the Legate framework</p>
-
<ul>
-
<li>Allows you to use multi-core CPUs, single- or multi-GPU nodes, and even multi-node clusters without changing your Python code.</li>
-
<li>It translates high-level array operations into a graph of fine-grained tasks and hands that graph to the C++ Legion runtime, which schedules the tasks, partitions the data, and moves tiles between CPUs, GPUs, and network links for you.</li>
-
</ul></li>
-
<li><p><span style="color: #990000">{kit}</span>: Fast vectorized and nested switches, some parallel (row-wise) statistics, and some utilities such as efficient partial sorting and unique values.</p></li>
-
<li><p><span style="color: #990000">{</span><a href="https://github.com/t-kalinowski/quickr" style="color: #990000">quickr</a><span style="color: #990000">}</span>: R-to-Fortran transpiler</p></li>
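The {sirus} entry says that SIRUS does not aggregate predictions. Instead it keeps the nodes that occur most often across the forest. The Python sketch below illustrates only that frequency-based selection step, under simplifying assumptions:

- Split thresholds are snapped to a grid of empirical quantiles, so that near-identical splits in different trees count as the same node.
- A node is identified by its root-to-node path.
- Nodes that appear in more than a fraction `p0` of the trees are kept.

The tree representation and the names `quantile_grid`, `snap`, `select_frequent_nodes` and `p0` are hypothetical and are not the {sirus} package's API. The actual algorithm includes further steps that are omitted here, such as limiting path depth and combining the selected rules into a final model.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation (not the {sirus} API):
Rule = Tuple[int, float, str]  # (feature index, split threshold, "<" or ">=")
Path = Tuple[Rule, ...]        # root-to-node path; identifies one node of a tree
Tree = List[Path]              # every non-root node of one tree


def quantile_grid(values: Sequence[float], q: int = 10) -> List[float]:
    """Return q - 1 empirical quantile cut points (simple nearest-rank rule)."""
    s = sorted(values)
    n = len(s)
    return [s[min(n - 1, (k * n) // q)] for k in range(1, q)]


def snap(threshold: float, grid: Sequence[float]) -> float:
    """Move a split threshold to the nearest cut point so nodes can repeat across trees."""
    return min(grid, key=lambda g: abs(g - threshold))


def select_frequent_nodes(
    forest: List[Tree], grids: Dict[int, List[float]], p0: float
) -> List[Tuple[Path, float]]:
    """Keep nodes that occur in more than a fraction p0 of the trees."""
    counts: Counter = Counter()
    for tree in forest:
        # Snap every threshold on every path, then count each node once per tree.
        nodes = {
            tuple((f, snap(t, grids[f]), op) for f, t, op in path) for path in tree
        }
        counts.update(nodes)
    n_trees = len(forest)
    kept = [(node, c / n_trees) for node, c in counts.items() if c / n_trees > p0]
    # Most frequent first; ties broken by the node itself for a stable order.
    return sorted(kept, key=lambda nf: (-nf[1], nf[0]))


if __name__ == "__main__":
    grids = {
        0: quantile_grid([float(v) for v in range(100)]),  # cut points 10, 20, ..., 90
        1: quantile_grid([float(v) for v in range(10)]),   # cut points 1, 2, ..., 9
    }
    forest = [
        [((0, 49.7, "<"),)],
        [((0, 50.2, "<"),), ((0, 50.2, "<"), (1, 3.0, ">="))],
        [((0, 10.1, "<"),)],
        [((0, 50.0, "<"),), ((0, 9.8, "<"),)],
    ]
    for node, freq in select_frequent_nodes(forest, grids, p0=0.3):
        print(node, freq)
```

Running the script prints `((0, 50.0, '<'),) 0.75` and then `((0, 10.0, '<'),) 0.5`:

- The thresholds 49.7, 50.2 and 50.0 all snap to the cut point 50, so that node is found in three of the four trees.
- The depth-2 node appears in only one tree and falls below `p0`.

Without the snapping step, almost no node would repeat across trees, which is why a quantile grid is assumed here.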