[r][cpp] Add dense cross-product and related ops #243
base: main
Conversation
This adds a single-pass algorithm to compute x * t(x) for col-major x, along with related helper functions.
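For reference, a minimal sketch of the kind of single-pass accumulation this describes, assuming a plain col-major `double` buffer (illustrative only; not the code in this PR):

```cpp
// Sketch: x * t(x) for col-major x in one pass over the data.
// Each column of x contributes one rank-1 update to the (nrow x nrow) result.
#include <cstdint>
#include <vector>

// x is col-major with dimensions nrow x ncol; the result is col-major nrow x nrow.
std::vector<double> dense_tcrossprod_sketch(const std::vector<double> &x,
                                            uint32_t nrow, uint32_t ncol) {
    std::vector<double> out(static_cast<size_t>(nrow) * nrow, 0.0);
    for (uint32_t j = 0; j < ncol; j++) {
        const double *col = x.data() + static_cast<size_t>(j) * nrow;
        // Accumulate the outer product col * col^T. Only one triangle is
        // strictly needed by symmetry; the full matrix is filled for clarity.
        for (uint32_t c = 0; c < nrow; c++) {
            for (uint32_t r = 0; r < nrow; r++) {
                out[static_cast<size_t>(c) * nrow + r] += col[r] * col[c];
            }
        }
    }
    return out;
}
```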
immanuelazn left a comment
Looks great overall. There are a few tiny code artifacts and places that could be better documented, but the logic appears sound. Good stuff Ben
}
}

/**
Feels like the class declaration should be in the header
/**
 * Load CSparseMat chunks from a MatrixLoader
 */
class SparseChunkLoader {
Since this might be re-usable infra, it could be a good idea to move SparseChunkLoader into a more general location
As discussed, this is left out until more operations make use of it.
return ret;
}

// Set up multiplication worker threads:
The core logic for the actual gram matrix calculation seems to get pretty lost inside the multi-threading logic. Is there a way we can create a helper function to separate the thread_vec creation from the worker logic?
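Something along these lines, maybe (the names `accumulate_gram_chunk` and `run_multiply_workers` are hypothetical, just to illustrate the split):

```cpp
// Hypothetical refactor sketch: keep the per-chunk gram-matrix accumulation in
// its own helper so the thread_vec setup only deals with scheduling.
#include <cstdint>
#include <thread>
#include <vector>

// Accumulate columns [col_start, col_end) of col-major x into out (nrow x nrow).
void accumulate_gram_chunk(const double *x, uint32_t nrow, uint32_t col_start,
                           uint32_t col_end, double *out) {
    for (uint32_t j = col_start; j < col_end; j++) {
        const double *col = x + static_cast<size_t>(j) * nrow;
        for (uint32_t c = 0; c < nrow; c++)
            for (uint32_t r = 0; r < nrow; r++)
                out[static_cast<size_t>(c) * nrow + r] += col[r] * col[c];
    }
}

// Thread setup stays separate: each worker accumulates into a private buffer,
// and the partial results are summed once all workers have joined.
void run_multiply_workers(const double *x, uint32_t nrow, uint32_t ncol,
                          uint32_t threads, double *out) {
    std::vector<std::vector<double>> partials(
        threads, std::vector<double>(static_cast<size_t>(nrow) * nrow, 0.0));
    std::vector<std::thread> thread_vec;
    for (uint32_t t = 0; t < threads; t++) {
        // Simple static split of the columns across workers
        uint32_t begin = static_cast<uint32_t>(uint64_t(ncol) * t / threads);
        uint32_t end = static_cast<uint32_t>(uint64_t(ncol) * (t + 1) / threads);
        thread_vec.emplace_back(accumulate_gram_chunk, x, nrow, begin, end,
                                partials[t].data());
    }
    for (auto &th : thread_vec) th.join();
    for (auto &p : partials)
        for (size_t i = 0; i < p.size(); i++) out[i] += p[i];
}
```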
void swap_mat(CSparseMat &other);
};

// Allow a leader thread to coordinate with worker threads
This also feels like re-usable logic if we decide to have similar leader/worker thread systems in our overall concurrency model.
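For context, here's a minimal sketch of the kind of leader/worker rendezvous that could live in a shared utility (hypothetical class, not the one in this PR):

```cpp
// Hypothetical sketch: a small rendezvous object where a leader publishes
// rounds of work and waits for all workers to report completion.
#include <condition_variable>
#include <cstdint>
#include <mutex>

class LeaderWorkerBarrier {
  public:
    explicit LeaderWorkerBarrier(uint32_t workers) : workers_(workers) {}

    // Leader: publish a new round of work and wake the workers
    void start_round() {
        std::lock_guard<std::mutex> lock(mtx_);
        pending_ = workers_;
        round_ += 1;
        cv_.notify_all();
    }

    // Worker: block until a round newer than the last one it saw is published
    void wait_for_round(uint64_t &last_seen) {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [&] { return round_ > last_seen; });
        last_seen = round_;
    }

    // Worker: report completion of its share of the round
    void finish_task() {
        std::lock_guard<std::mutex> lock(mtx_);
        if (--pending_ == 0) cv_.notify_all();
    }

    // Leader: block until every worker has reported in
    void wait_all_finished() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [&] { return pending_ == 0; });
    }

  private:
    std::mutex mtx_;
    std::condition_variable cv_;
    uint32_t workers_;
    uint32_t pending_ = 0;
    uint64_t round_ = 0;
};
```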
expect_equal(crossprod(t(m)), crossprod_dense(t(m_bp), threads=threads))
expect_equal(cor(t(m)), cor_dense(t(m_bp), threads=threads))
expect_equal(cov(t(m)), cov_dense(t(m_bp), threads=threads))
expect_equal
The last test here seems incomplete. However, I think you've hit all the relevant test cases here anyway.
#'
#' The input matrix must be row-major for `crossprod_dense()`, `cor_dense()`, and `cov_dense()`.
#' For `tcrossprod_dense()`, the input must be col-major. Stated another way: when these functions are used to calculate
#' feature correlations, the cell axis should always be the major axis. The functions will raise an error if
Feels like the last sentence combines two sentences. Maybe: "The functions will raise an error if the input is in the wrong storage order. The input matrix requires being passed through transpose_storage_order() to switch to the required storage order".
On another note, we can probably just link to "efficiency tips"
#'
#' **Input storage order**
#'
#' The input matrix must be row-major for `crossprod_dense()`, `cor_dense()`, and `cov_dense()`.
Should be col-major
#'
#' @param x (IterableMatrix) Input matrix. In general, disk-backed matrices should have cell-major storage ordering. (See details,
#'   or `transpose_storage_order()`)
#' @param buffer_bytes (integer) In-memory chunk size to use during computations. Performance is best when this is slightly below
Maybe add a note that the buffer_bytes minimum scales linearly with how many features exist.
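For example (assuming the 48-bytes-per-row lower bound enforced in the check below, and that rows correspond to features here):

```cpp
// Illustration only: the smallest accepted buffer_bytes grows linearly with
// the number of features.
#include <cstdint>

uint64_t min_buffer_bytes(uint64_t n_features) {
    return 48 * n_features; // e.g. 20,000 features -> 960,000 bytes (~0.96 MB)
}
```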
) {
  if (buffer_bytes < 48 * mat->rows()) {
    throw std::runtime_error(
      "dense_transpose_multiply: buffer_bytes must be at least 24 * mat.rows()"
| "dense_transpose_multiply: buffer_bytes must be at least 24 * mat.rows()" | |
| "dense_transpose_multiply: buffer_bytes must be at least 48 * mat.rows()" |
uint32_t first_col = 0;
while (true) {
  if (finished_column_ && !loader_->nextCol()) {
Could we get some additional documentation on what the return bool indicates? i.e., why would this return true instead of false when there's no additional information and the matrix is finished?