
Conversation

bnprks (Owner) commented Apr 17, 2025

This adds a single-pass algorithm to compute x * t(x) for col-major x, along with related helper functions.
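As a rough illustration of the single-pass idea (a minimal dense sketch with hypothetical names; the PR's actual implementation is sparse, chunked, and multi-threaded): in col-major storage each column of x is contiguous, and each column contributes one rank-1 update to the result, so x * t(x) can be accumulated in one pass over the data.

// Minimal dense sketch of single-pass x * t(x) for col-major x.
// Illustrative only; not the PR's sparse implementation.
#include <cstddef>
#include <vector>

std::vector<double> tcrossprod_single_pass(const double *x, size_t rows, size_t cols) {
    std::vector<double> out(rows * rows, 0.0); // rows x rows result, col-major
    for (size_t c = 0; c < cols; c++) {
        const double *col = x + c * rows; // contiguous in col-major storage
        // Each column contributes the rank-1 update col * t(col)
        for (size_t j = 0; j < rows; j++) {
            for (size_t i = 0; i < rows; i++) {
                out[j * rows + i] += col[i] * col[j];
            }
        }
    }
    return out;
}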

bnprks added 2 commits April 16, 2025 22:05
immanuelazn (Collaborator) left a comment

Looks great overall. There are tiny code artifacts and places that could be better documented, but the logic appears sound. Good stuff, Ben.

}
}

/**
Collaborator:

Feels like the class declaration should be in the header.

/**
* Load CSparseMat chunks from a MatrixLoader
*/
class SparseChunkLoader {
Collaborator:

Since this might be re-usable infrastructure, it might be a good idea to move SparseChunkLoader into a more general location.

Collaborator:

As discussed, this is left out until more operations utilize this.

return ret;
}

// Set up multiplication worker threads:
Collaborator:

The core logic for the actual Gram matrix calculation seems to get pretty lost inside the multi-threading logic. Is there a way we can create a helper function to separate the thread_vec creation from the worker logic?
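One possible shape for that split (a sketch only; ThreadState, worker_multiply, and spawn_workers are hypothetical names, not taken from this PR):

#include <functional>
#include <thread>
#include <vector>

struct ThreadState { /* per-thread accumulator, chunk queue, etc. */ };

// The Gram-matrix math lives here, free of threading concerns.
static void worker_multiply(ThreadState &state) {
    // ... accumulate partial x * t(x) results for this worker's chunks ...
}

// The threading plumbing lives here, free of math.
static std::vector<std::thread> spawn_workers(std::vector<ThreadState> &states) {
    std::vector<std::thread> thread_vec;
    thread_vec.reserve(states.size());
    for (auto &state : states) {
        thread_vec.emplace_back(worker_multiply, std::ref(state));
    }
    return thread_vec;
}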

void swap_mat(CSparseMat &other);
};

// Allow a leader thread to coordinate with worker threads
Collaborator:

This also feels like re-usable logic if we decide to have similar leader/worker thread systems in our overall concurrency model.
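For reference, the general shape such a re-usable primitive could take (a minimal cyclic-barrier sketch, assuming leader and workers synchronize at phase boundaries; not the PR's actual class):

#include <condition_variable>
#include <cstdint>
#include <mutex>

// All participants (leader + workers) call arrive_and_wait() at each phase
// boundary; the leader can stage the next chunk of work between phases.
class Barrier {
  public:
    explicit Barrier(int count) : count_(count) {}
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        uint64_t gen = generation_;
        if (++waiting_ == count_) {
            // Last arrival releases everyone and starts the next generation
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return generation_ != gen; });
        }
    }
  private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const int count_;
    int waiting_ = 0;
    uint64_t generation_ = 0;
};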

expect_equal(crossprod(t(m)), crossprod_dense(t(m_bp), threads=threads))
expect_equal(cor(t(m)), cor_dense(t(m_bp), threads=threads))
expect_equal(cov(t(m)), cov_dense(t(m_bp), threads=threads))
expect_equal
Collaborator:

The last test here seems incomplete. However, I think you hit all the relevant test cases anyway.

#'
#' The input matrix must be row-major for `crossprod_dense()`, `cor_dense()`, and `cov_dense()`.
#' For `tcrossprod_dense()`, the input must be col-major. Stated another way: when these functions are used to calculate
#' feature correlations, the cell axis should always be the major axis. The functions will raise an error if
Collaborator:

Feels like the last sentence combines two sentences. Maybe: "The functions will raise an error if the input is in the wrong storage order. The input matrix requires being passed through transpose_storage_order() to switch to the required storage order."

Collaborator:

On another note, we can probably just link to "efficiency tips".

#'
#' **Input storage order**
#'
#' The input matrix must be row-major for `crossprod_dense()`, `cor_dense()`, and `cov_dense()`.
Collaborator:

Should be col-major

#'
#' @param x (IterableMatrix) Input matrix. In general, disk-backed matrices should have cell-major storage ordering. (See details,
#' or `transpose_storage_order()`)
#' @param buffer_bytes (integer) In-memory chunk size to use during computations. Performance is best when this is slightly below
Collaborator:

Maybe add a note that the buffer_bytes minimum scales linearly with the number of features.

) {
if (buffer_bytes < 48 * mat->rows()) {
throw std::runtime_error(
"dense_transpose_multiply: buffer_bytes must be at least 24 * mat.rows()"
Collaborator:

Suggested change:
- "dense_transpose_multiply: buffer_bytes must be at least 24 * mat.rows()"
+ "dense_transpose_multiply: buffer_bytes must be at least 48 * mat.rows()"


uint32_t first_col = 0;
while (true) {
if (finished_column_ && !loader_->nextCol()) {
Collaborator:

Some additional documentation on what the returned bool indicates would help, i.e. why would this return true instead of false when there's no additional information and the matrix is finished?
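For example, a doc comment along these lines (the name and exact semantics below are assumptions for illustration, not taken from the PR):

// Load the next chunk of the matrix into the internal buffer.
// Returns true if a chunk was produced (including a final, partially
// filled one), and false only once the underlying loader is exhausted,
// i.e. the matrix is finished and no further data remains.
bool next_chunk();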
