Commit b73db6b

db-pg >> pg_tracing; geo-mod >> bin derivations;
1 parent 7bc4f41 commit b73db6b

19 files changed

Lines changed: 1483 additions & 1234 deletions

_book/qmd/algorithms-ml.html

Lines changed: 14 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -2829,12 +2829,9 @@ <h3 data-number="1.1.2" class="anchored" data-anchor-id="sec-alg-ml-trees-rf"><s
28292829
<ul>
28302830
<li>Selects discriminative features via a multi-class class separability score (CSS), splits by nearest class centroid, and aggregates tree votes to produce predictions and class probabilities.</li>
28312831
</ul></li>
2832-
<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/sirus/index.html" style="color: #990000">sirus</a><span style="color: #990000">}</span>: <u>S</u>table and <u>I</u>nterpretable <u>Ru</u>le <u>S</u>et
2832+
<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/corrRF/index.html" style="color: #990000">corrRF</a><span style="color: #990000">}</span> (<a href="https://arxiv.org/abs/2503.12634">Paper</a>) - A clustered random forest algorithm for data made up of independent clusters that exhibit within-cluster dependence
28332833
<ul>
2834-
<li>Combines the simplicity of decision trees with a predictivity close to random forests</li>
2835-
<li>Instead of aggregating predictions, SIRUS aggregates the forest structure: the most frequent nodes of the forest are selected to form a stable rule ensemble model</li>
2836-
<li>Me: The interpretability of a decision tree with predictive accuracy similar to that of an RF. Seems like it would be good to fit both and use this model for additional interpretability.</li>
2837-
<li>There’s also a Spatial SIRUS (<a href="https://github.com/LucaPate/Spatial_SIRUS">github</a>, <a href="https://arxiv.org/abs/2408.05537">paper</a>) which uses a spatial <span style="color: #990000">{RandomForestsGLS}</span> model in a SIRUS algorithm</li>
2834+
<li>Possibly can be used on repeated measures data</li>
28382835
</ul></li>
28392836
<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/RandomForestsGLS/index.html" style="color: #990000">RandomForestsGLS</a><span style="color: #990000">}</span> - Generalized Least Squares RF
28402837
<ul>
@@ -2851,10 +2848,20 @@ <h3 data-number="1.1.2" class="anchored" data-anchor-id="sec-alg-ml-trees-rf"><s
28512848
<li>New Mahalanobis splitting rule for correlated real-valued outcomes in multivariate regression settings</li>
28522849
</ul></li>
28532850
<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/ShrinkageTrees/index.html" style="color: #990000">ShrinkageTrees</a><span style="color: #990000">}</span> (<a href="https://arxiv.org/abs/2507.22004">Paper</a>) - Bayesian regression tree models with shrinkage priors on step height</li>
2851+
<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/sirus/index.html" style="color: #990000">sirus</a><span style="color: #990000">}</span>: <u>S</u>table and <u>I</u>nterpretable <u>Ru</u>le <u>S</u>et
2852+
<ul>
2853+
<li>Combines the simplicity of decision trees with a predictivity close to random forests</li>
2854+
<li>Instead of aggregating predictions, SIRUS aggregates the forest structure: the most frequent nodes of the forest are selected to form a stable rule ensemble model</li>
2855+
<li>Me: The interpretability of a decision tree with predictive accuracy similar to that of an RF. Seems like it would be good to fit both and use this model for additional interpretability.</li>
2856+
<li>There’s also a Spatial SIRUS (<a href="https://github.com/LucaPate/Spatial_SIRUS">github</a>, <a href="https://arxiv.org/abs/2408.05537">paper</a>) which uses a spatial <span style="color: #990000">{RandomForestsGLS}</span> model in a SIRUS algorithm</li>
2857+
</ul></li>
28542858
<li><span style="color: #990000">{</span><a href="https://stochtree.ai/R_docs/pkgdown/" style="color: #990000">stochtree</a><span style="color: #990000">}</span> - Stochastic tree ensembles (i.e.&nbsp;BART, XBART) for supervised learning and causal inference.</li>
2855-
<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/corrRF/index.html" style="color: #990000">corrRF</a><span style="color: #990000">}</span> (<a href="https://arxiv.org/abs/2503.12634">Paper</a>) - A clustered random forest algorithm for data made up of independent clusters that exhibit within-cluster dependence
2859+
<li><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/unityForest/index.html" style="color: #990000">unityForest</a><span style="color: #990000">}</span> - Improving Interaction Modeling and Interpretability in Random Forests
28562860
<ul>
2857-
<li>Possibly can be used on repeated measures data</li>
2861+
<li>Currently, only classification is supported</li>
2862+
<li>A random forest variant designed to better take covariates with purely interaction-based effects into account, including interactions for which none of the involved covariates exhibits a marginal effect.</li>
2863+
<li>Facilitates the identification and interpretation of (marginal or interactive) effects</li>
2864+
<li>Includes unity variable importance and covariate-representative tree roots (CRTRs) that provide interpretable visualizations of these conditions</li>
28582865
</ul></li>
28592866
</ul></li>
28602867
</ul>
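
The SIRUS entry in the diff above says the method aggregates forest structure rather than predictions, keeping the most frequent nodes as a rule set. A minimal sketch of that selection step follows, written in plain Python rather than with the {sirus} package. The forest representation (each tree as a list of `(feature, threshold)` splits), the function name `frequent_rules`, and the frequency threshold `p0` are all assumptions made for illustration and are not taken from the package.

```python
from collections import Counter


def frequent_rules(forest, p0):
    """Return (rule, frequency) pairs for rules found in more than a share p0 of trees.

    `forest` is a hypothetical stand-in for a fitted forest: a list of trees,
    each tree being a list of hashable split rules such as (feature, threshold).
    """
    counts = Counter()
    for tree in forest:
        # Count a rule once per tree, even if the tree uses it several times.
        counts.update(set(tree))

    n_trees = len(forest)
    selected = [
        (rule, count / n_trees)
        for rule, count in counts.items()
        if count / n_trees > p0
    ]
    # Most frequent rules first; ties broken by rule for a stable ordering.
    return sorted(selected, key=lambda item: (-item[1], item[0]))


# Toy forest with made-up splits, for illustration only.
toy_forest = [
    [("age", 40), ("income", 50)],
    [("age", 40), ("bmi", 30)],
    [("age", 40), ("income", 50)],
    [("bmi", 30), ("income", 55)],
]

rules = frequent_rules(toy_forest, p0=0.4)
for rule, freq in rules:
    print(rule, freq)
```

Because the selection depends only on how often a split recurs across trees, small perturbations of the data should change the selected set less than they change any single tree, which is the stability property the entry refers to. A real implementation would also need to discretize thresholds so that near-identical splits count as the same rule; this sketch assumes that has already been done.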

_book/qmd/big-data.html

Lines changed: 21 additions & 17 deletions
Original file line number | Diff line number | Diff line change
@@ -2331,13 +2331,6 @@ <h2 class="unnumbered anchored" data-anchor-id="sec-bgdat-misc">Misc</h2>
23312331
<section id="sec-bgdat-hghperf" class="level2 unnumbered">
23322332
<h2 class="unnumbered anchored" data-anchor-id="sec-bgdat-hghperf">High Performance</h2>
23332333
<ul>
2334-
<li><p><span style="color: #990000">{rpolars}</span>: Arrow product; uses SIMD which is a low-level vectorization that can be used to speed up simple operations like addition, subtraction, division, and multiplication</p>
2335-
<ul>
2336-
<li>Also see <a href="../qmd/r-polars.html#sec-r-polars" style="color: green">R, Polars</a> and <a href="../qmd/python-polars.html#sec-py-polars" style="color: green">Python, Polars</a></li>
2337-
<li>Capable of using GPUs for up to a 10x execution time decrease.</li>
2338-
<li>Polars Cloud can perform distributed computing</li>
2339-
<li>Extensions: <span style="color: #990000">{</span><a href="https://www.tidypolars.etiennebacher.com/" style="color: #990000">tidypolars</a><span style="color: #990000">}</span>, <span style="color: goldenrod">{</span><a href="https://tidypolars.readthedocs.io/en/latest/" style="color: goldenrod">tidypolars</a><span style="color: goldenrod">}</span></li>
2340-
</ul></li>
23412334
<li><p><span style="color: #990000">{</span><a href="https://sebkrantz.github.io/collapse/" style="color: #990000">collapse</a><span style="color: #990000">}</span> (<a href="https://arxiv.org/abs/2403.05038">Vignette</a>): Fast grouped &amp; weighted statistical computations, time series and panel data transformations, list-processing, data manipulation functions, summary statistics and various utilities such as support for variable labels. Class-agnostic framework designed to work with vectors, matrices, data frames, lists and related classes i.e.&nbsp;<em>xts</em>, <em>data.table</em>, <em>tibble</em>, <em>pdata.frame</em>, <em>sf</em>.</p>
23422335
<ul>
23432336
<li><p>Optimize a script</p>
@@ -2364,11 +2357,31 @@ <h2 class="unnumbered anchored" data-anchor-id="sec-bgdat-hghperf">High Performa
23642357
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">fungroup</span>()</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div></li>
23652358
<li><p><span style="color: #990000">{</span><a href="https://github.com/NicChr/fastplyr" style="color: #990000">fastplyr</a><span style="color: #990000">}</span> - Has a <span style="color: #990000">{dplyr}</span> API and a <span style="color: #990000">{collapse}</span> backend</p></li>
23662359
</ul></li>
2367-
<li><p><span style="color: #990000">{r2c}</span>: Fast grouped statistical computation; currently limited to a few functions, sometimes faster than <span style="color: #990000">{collapse}</span></p></li>
2360+
<li><p><span style="color: goldenrod">{</span><a href="https://docs.nvidia.com/cupynumeric/latest/index.html" style="color: goldenrod">cuNumeric</a><span style="color: goldenrod">}</span> (<a href="https://towardsdatascience.com/numpy-api-on-a-gpu/">intro</a>) - NVIDIA drop-in replacement for NumPy that is built on the Legate framework</p>
2361+
<ul>
2362+
<li>Allows you to use multi-core CPUs, single or multi-GPU nodes, and even multi-node clusters without changing your Python code.</li>
2363+
<li>It translates high-level array operations into a graph of fine-grained tasks and hands that graph to the C++ Legion runtime, which schedules the tasks, partitions the data, and moves tiles between CPUs, GPUs and network links for you.</li>
2364+
</ul></li>
2365+
<li><p><span style="color: #990000">{</span><a href="https://github.com/bbtheo/cuplyr" style="color: #990000">cuplyr</a><span style="color: #990000">}</span> - A dplyr backend for GPU acceleration via RAPIDS cuDF</p>
2366+
<ul>
2367+
<li>Can provide significant speedups on larger datasets (typically &gt;10M rows) without requiring major code changes.</li>
2368+
</ul></li>
23682369
<li><p><span style="color: #990000">{data.table}</span>: Enhanced data frame class with concise data manipulation framework offering powerful aggregation, extremely flexible split-apply-combine computing, reshaping, joins, rolling statistics, set operations on tables, fast csv read/write, and various utilities such as transposition of data.</p>
23692370
<ul>
23702371
<li>See <a href="../qmd/r-data-table.html#sec-r-dt" style="color: green">R, data.table</a></li>
23712372
</ul></li>
2373+
<li><p><span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/kit/index.html" style="color: #990000">kit</a><span style="color: #990000">}</span> - Fast (implemented in C) vectorized and nested switches, some parallel (row-wise) statistics, and some utilities such as efficient partial sorting and unique values.</p></li>
2374+
<li><p><span style="color: #990000">{matrixStats}</span>: Efficient row- and column-wise (weighted) statistics on matrices and vectors, including computations on subsets of rows and columns.</p></li>
2375+
<li><p><span style="color: goldenrod">{</span><a href="https://towardsdatascience.com/this-decorator-will-make-python-30-times-faster-715ca5a66d5f" style="color: goldenrod">numba</a><span style="color: goldenrod">}</span> - JIT compiler that translates a subset of Python and NumPy code into fast machine code.</p></li>
2376+
<li><p><span style="color: #990000">{polars}</span>: Arrow product; uses SIMD which is a low-level vectorization that can be used to speed up simple operations like addition, subtraction, division, and multiplication</p>
2377+
<ul>
2378+
<li>Also see <a href="../qmd/r-polars.html#sec-r-polars" style="color: green">R, Polars</a> and <a href="../qmd/python-polars.html#sec-py-polars" style="color: green">Python, Polars</a></li>
2379+
<li>Capable of using GPUs for up to a 10x execution time decrease.</li>
2380+
<li>Polars Cloud can perform distributed computing</li>
2381+
<li>Extensions: <span style="color: #990000">{</span><a href="https://www.tidypolars.etiennebacher.com/" style="color: #990000">tidypolars</a><span style="color: #990000">}</span>, <span style="color: goldenrod">{</span><a href="https://tidypolars.readthedocs.io/en/latest/" style="color: goldenrod">tidypolars</a><span style="color: goldenrod">}</span></li>
2382+
</ul></li>
2383+
<li><p><span style="color: #990000">{</span><a href="https://github.com/t-kalinowski/quickr" style="color: #990000">quickr</a><span style="color: #990000">}</span>: R to Fortran transpiler</p></li>
2384+
<li><p><span style="color: #990000">{r2c}</span>: Fast grouped statistical computation; currently limited to a few functions, sometimes faster than <span style="color: #990000">{collapse}</span></p></li>
23722385
<li><p><span style="color: #990000">{</span><a href="https://cran.r-project.org/web//packages/Rfast/index.html" style="color: #990000">Rfast</a><span style="color: #990000">}</span>, <span style="color: #990000">{</span><a href="https://cran.r-project.org/web/packages/Rfast2/index.html" style="color: #990000">Rfast2</a><span style="color: #990000">}</span>: A collection of fast functions for data analysis.</p>
23732386
<ul>
23742387
<li><p>Rfast - Column- and row-wise means, medians, variances, minimums, maximums, many t, F and G-square tests, and many regressions (normal, logistic, Poisson) are some of the many fast functions</p></li>
@@ -2385,15 +2398,6 @@ <h2 class="unnumbered anchored" data-anchor-id="sec-bgdat-hghperf">High Performa
23852398
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a> <span class="at">max =</span> minmax[<span class="dv">2</span>, ]</span>
23862399
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div></li>
23872400
</ul></li>
2388-
<li><p><span style="color: #990000">{matrixStats}</span>: Efficient row- and column-wise (weighted) statistics on matrices and vectors, including computations on subsets of rows and columns.</p></li>
2389-
<li><p><span style="color: goldenrod">{</span><a href="https://towardsdatascience.com/this-decorator-will-make-python-30-times-faster-715ca5a66d5f" style="color: goldenrod">numba</a><span style="color: goldenrod">}</span> - JIT compiler that translates a subset of Python and NumPy code into fast machine code.</p></li>
2390-
<li><p><span style="color: goldenrod">{</span><a href="https://docs.nvidia.com/cupynumeric/latest/index.html" style="color: goldenrod">cuNumeric</a><span style="color: goldenrod">}</span> (<a href="https://towardsdatascience.com/numpy-api-on-a-gpu/">intro</a>) - NVIDIA drop-in replacement for NumPy that is built on the Legate framework</p>
2391-
<ul>
2392-
<li>Allows you to use multi-core CPUs, single or multi-GPU nodes, and even multi-node clusters without changing your Python code.</li>
2393-
<li>It translates high-level array operations into a graph of fine-grained tasks and hands that graph to the C++ Legion runtime, which schedules the tasks, partitions the data, and moves tiles between CPUs, GPUs and network links for you.</li>
2394-
</ul></li>
2395-
<li><p><span style="color: #990000">{kit}</span>: Fast vectorized and nested switches, some parallel (row-wise) statistics, and some utilities such as efficient partial sorting and unique values.</p></li>
2396-
<li><p><span style="color: #990000">{</span><a href="https://github.com/t-kalinowski/quickr" style="color: #990000">quickr</a><span style="color: #990000">}</span>: R to Fortran transpiler</p></li>
23972401
</ul>
23982402
</section>
23992403
<section id="sec-bgdat-lgmem" class="level2 unnumbered">
