
Conversation

bnprks (Owner) commented Apr 17, 2025

This adds a single-pass algorithm to compute x * t(x) for col-major x, along with related helper functions.
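As a rough illustration of the single-pass idea (a minimal dense sketch with hypothetical names; the PR's actual implementation is sparse, chunked, and multi-threaded): in col-major storage each column of x is contiguous, and each column contributes one rank-1 update to the result, so x * t(x) can be accumulated in one pass over the data.

// Minimal dense sketch of single-pass x * t(x) for col-major x.
// Illustrative only; not the PR's sparse implementation.
#include <cstddef>
#include <vector>

std::vector<double> tcrossprod_single_pass(const double *x, size_t rows, size_t cols) {
    std::vector<double> out(rows * rows, 0.0); // rows x rows result, col-major
    for (size_t c = 0; c < cols; c++) {
        const double *col = x + c * rows; // contiguous in col-major storage
        // Each column contributes the rank-1 update col * t(col)
        for (size_t j = 0; j < rows; j++) {
            for (size_t i = 0; i < rows; i++) {
                out[j * rows + i] += col[i] * col[j];
            }
        }
    }
    return out;
}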

bnprks added 2 commits April 16, 2025 22:05
immanuelazn (Collaborator) left a comment

Looks great overall. There are tiny code artifacts and places that could be better documented, but the logic appears sound. Good stuff, Ben.

}
}

/**
Collaborator:

Feels like the class declaration should be in the header.

/**
* Load CSparseMat chunks from a MatrixLoader
*/
class SparseChunkLoader {
Collaborator:

Since this might be re-usable infrastructure, it might be a good idea to move SparseChunkLoader into a more general location.

Collaborator:

As discussed, this is left out until more operations utilize this.

return ret;
}

// Set up multiplication worker threads:
Collaborator:

The core logic for the actual Gram matrix calculation seems to get pretty lost inside the multi-threading logic. Is there a way we can create a helper function to separate the thread_vec creation from the worker logic?
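One possible shape for that split (a sketch only; ThreadState, worker_multiply, and spawn_workers are hypothetical names, not taken from this PR):

#include <functional>
#include <thread>
#include <vector>

struct ThreadState { /* per-thread accumulator, chunk queue, etc. */ };

// The Gram-matrix math lives here, free of threading concerns.
static void worker_multiply(ThreadState &state) {
    // ... accumulate partial x * t(x) results for this worker's chunks ...
}

// The threading plumbing lives here, free of math.
static std::vector<std::thread> spawn_workers(std::vector<ThreadState> &states) {
    std::vector<std::thread> thread_vec;
    thread_vec.reserve(states.size());
    for (auto &state : states) {
        thread_vec.emplace_back(worker_multiply, std::ref(state));
    }
    return thread_vec;
}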

void swap_mat(CSparseMat &other);
};

// Allow a leader thread to coordinate with worker threads
Collaborator:

This also feels like re-usable logic if we decide to have similar leader/worker thread systems in our overall concurrency model.
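For reference, the general shape such a re-usable primitive could take (a minimal cyclic-barrier sketch, assuming leader and workers synchronize at phase boundaries; not the PR's actual class):

#include <condition_variable>
#include <cstdint>
#include <mutex>

// All participants (leader + workers) call arrive_and_wait() at each phase
// boundary; the leader can stage the next chunk of work between phases.
class Barrier {
  public:
    explicit Barrier(int count) : count_(count) {}
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        uint64_t gen = generation_;
        if (++waiting_ == count_) {
            // Last arrival releases everyone and starts the next generation
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return generation_ != gen; });
        }
    }
  private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const int count_;
    int waiting_ = 0;
    uint64_t generation_ = 0;
};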

expect_equal(crossprod(t(m)), crossprod_dense(t(m_bp), threads=threads))
expect_equal(cor(t(m)), cor_dense(t(m_bp), threads=threads))
expect_equal(cov(t(m)), cov_dense(t(m_bp), threads=threads))
expect_equal
Collaborator:

The last test here seems incomplete. However, I think you hit all the relevant test cases anyway.

#'
#' The input matrix must be row-major for `crossprod_dense()`, `cor_dense()`, and `cov_dense()`.
#' For `tcrossprod_dense()`, the input must be col-major. Stated another way: when these functions are used to calculate
#' feature correlations, the cell axis should always be the major axis. The functions will raise an error if
Collaborator:

Feels like the last sentence combines two sentences. Maybe: "The functions will raise an error if the input is in the wrong storage order. The input matrix requires being passed through transpose_storage_order() to switch to the required storage order."

Collaborator:

On another note, we can probably just link to "efficiency tips".

#'
#' **Input storage order**
#'
#' The input matrix must be row-major for `crossprod_dense()`, `cor_dense()`, and `cov_dense()`.
Collaborator:

Should be col-major

#'
#' @param x (IterableMatrix) Input matrix. In general, disk-backed matrices should have cell-major storage ordering. (See details,
#' or `transpose_storage_order()`)
#' @param buffer_bytes (integer) In-memory chunk size to use during computations. Performance is best when this is slightly below
Collaborator:

Maybe add a note that the buffer_bytes minimum scales linearly with the number of features.

) {
if (buffer_bytes < 48 * mat->rows()) {
throw std::runtime_error(
"dense_transpose_multiply: buffer_bytes must be at least 24 * mat.rows()"
Collaborator:

Suggested change:
- "dense_transpose_multiply: buffer_bytes must be at least 24 * mat.rows()"
+ "dense_transpose_multiply: buffer_bytes must be at least 48 * mat.rows()"


uint32_t first_col = 0;
while (true) {
if (finished_column_ && !loader_->nextCol()) {
Collaborator:

Some additional documentation on what the returned bool indicates would help, i.e. why would this return true instead of false when there's no additional information and the matrix is finished?
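For example, a doc comment along these lines (the name and exact semantics below are assumptions for illustration, not taken from the PR):

// Load the next chunk of the matrix into the internal buffer.
// Returns true if a chunk was produced (including a final, partially
// filled one), and false only once the underlying loader is exhausted,
// i.e. the matrix is finished and no further data remains.
bool next_chunk();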
