
Conversation

@AZ9tumas commented Oct 7, 2025

Broadcasting Implementation for Matrix Multiplication

The cTensor library has been enhanced with broadcasting support for matrix multiplication, bringing it closer to the functionality offered by major tensor libraries like PyTorch and NumPy. This makes the library significantly more flexible by allowing matrix operations between tensors of different but compatible shapes.
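For context, the rule being implemented can be stated in a few lines of C. Below is a minimal, self-contained sketch (illustrative names only, not cTensor's actual API): the trailing two axes are the matrix dimensions, and each leading batch axis must either match or be 1, exactly as in NumPy and PyTorch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative sketch, not cTensor's actual API. The trailing two axes are
 * the matrix dimensions; every leading batch axis broadcasts NumPy-style:
 * sizes must match or one of them must be 1. */
static bool broadcast_matmul_shape(const int* a, int a_ndim,
                                   const int* b, int b_ndim,
                                   int* out, int* out_ndim) {
    if (a_ndim < 2 || b_ndim < 2) return false;
    if (a[a_ndim - 1] != b[b_ndim - 2]) return false;  /* inner dims must agree */
    int ndim = a_ndim > b_ndim ? a_ndim : b_ndim;
    for (int i = 0; i < ndim - 2; i++) {
        /* Read batch axes right-to-left, treating missing axes as size 1. */
        int ai = a_ndim - 3 - i >= 0 ? a[a_ndim - 3 - i] : 1;
        int bi = b_ndim - 3 - i >= 0 ? b[b_ndim - 3 - i] : 1;
        if (ai != bi && ai != 1 && bi != 1) return false;
        out[ndim - 3 - i] = ai > bi ? ai : bi;
    }
    out[ndim - 2] = a[a_ndim - 2];  /* rows of A */
    out[ndim - 1] = b[b_ndim - 1];  /* cols of B */
    *out_ndim = ndim;
    return true;
}

int main(void) {
    /* One case from this PR: {1,2,2,3} @ {2,1,3,2} -> {2,2,2,2}. */
    int a[] = {1, 2, 2, 3}, b[] = {2, 1, 3, 2}, out[4], ndim;
    if (broadcast_matmul_shape(a, 4, b, 4, out, &ndim))
        printf("{%d,%d,%d,%d}\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```

Compiled and run, this prints {2,2,2,2}, matching the first test case listed below.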

Comprehensive Test Coverage Added

The test suite now includes thorough validation for various broadcasting scenarios:

  • 4D × 4D Broadcasting: Tests matrix multiplication between 4-dimensional tensors with different shapes that are broadcastable, such as {1,2,2,3} @ {2,1,3,2} -> {2,2,2,2}

  • 3D × 4D Broadcasting: Validates operations between 3D and 4D tensors, like {1,2,3} @ {2,1,3,2} -> {2,1,2,2}, ensuring proper dimension expansion and contraction

  • 3D × 3D Broadcasting: Covers scenarios such as {2,1,3} @ {1,3,2} -> {2,1,2} where broadcasting occurs across batch dimensions

The test cases verify that:

  • Simple 2D matrix operations continue to work correctly
  • Batch matrix operations maintain their expected behavior
  • Complex multi-dimensional broadcasting follows standard tensor algebra rules
  • Edge cases with identity matrices and special content (zeros, ones) produce correct results (a minimal example of this kind of check follows the list)
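As referenced in the last item above, a minimal identity-matrix check of that kind (illustrative only; this is not the project's actual test harness):

```c
/* Minimal sanity check: I @ A == A for a 2x2 matrix. Illustrative only. */
#include <assert.h>

int main(void) {
    float I[4] = {1, 0, 0, 1};  /* 2x2 identity */
    float A[4] = {3, 5, 7, 9};
    float C[4];
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) {
            C[i*2 + j] = 0;
            for (int k = 0; k < 2; k++) C[i*2 + j] += I[i*2 + k] * A[k*2 + j];
        }
    for (int n = 0; n < 4; n++) assert(C[n] == A[n]);  /* I @ A == A */
    return 0;
}
```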

This implementation keeps existing functionality intact while adding support for tensors with mismatched but compatible dimensions, making cTensor more versatile for applications that need flexible tensor operations.

@Advaitgaur004 (Contributor)

@AZ9tumas, please squash your commits

@PrimedErwin (Collaborator)

@AZ9tumas, thanks for your first PR to cTensor! Could you change the test file structure? (See the GitHub workflow.) We built an automated test there for each operator, checking all the valid operators in operators.c. Right now the workflow has a problem: can you merge the broadcast function into the matmul function itself, so that we don't need to put an extra function in operators.c, which causes the workflow to fail?
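For readers following along, a hedged sketch of what "merging the broadcast into matmul" can look like: the index mapping lives in a static helper inside operators.c, so no extra external symbol is exported for the per-operator workflow to trip over. All names below are assumptions, not cTensor's verified source.

```c
#include <stdio.h>

/* Hypothetical sketch, not cTensor's verified source: a `static` helper keeps
 * the broadcast logic private to operators.c, so only the matmul operator
 * itself is visible to the per-operator test workflow. Maps a flat output
 * batch index back to the matching input batch index, treating size-1 input
 * axes as "repeat" (effective stride 0). */
static int broadcast_batch_index(int flat, const int* out_batch,
                                 const int* in_batch, int nbatch) {
    int idx = 0, stride = 1, in_stride = 1;
    for (int i = nbatch - 1; i >= 0; i--) {
        int coord = (flat / stride) % out_batch[i];
        if (in_batch[i] != 1) idx += coord * in_stride;  /* size-1 axes contribute 0 */
        stride *= out_batch[i];
        in_stride *= in_batch[i];
    }
    return idx;
}

int main(void) {
    /* Output batches {2,2}; input A has batches {1,2}, so axis 0 repeats. */
    int out_batch[] = {2, 2}, a_batch[] = {1, 2};
    for (int f = 0; f < 4; f++)
        printf("out batch %d -> A batch %d\n", f,
               broadcast_batch_index(f, out_batch, a_batch, 2));
    return 0;
}
```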

@AZ9tumas (Author) commented Oct 7, 2025

Sure, I can integrate the broadcast function into the matmul function, but what about slices? There is a helper function that takes a tensor and returns the 2D matrix for a given batch and group. Is there anywhere else I can place this function so that operators.c can still access it?
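For reference, a helper of the kind described here can be as small as a pointer offset; the sketch below is a hedged assumption (the thread does not show the real signature), for a row-major layout whose trailing two axes are rows x cols.

```c
#include <stdio.h>
#include <stddef.h>

/* Hedged sketch only: the real helper's signature is not shown in this
 * thread. Returns a view (no copy) of the 2D matrix at a given batch
 * index, assuming row-major data whose trailing axes are (rows, cols). */
static float* batch_slice_2d(float* data, int batch, int rows, int cols) {
    return data + (size_t)batch * rows * cols;
}

int main(void) {
    float t[2 * 2 * 3] = {0};               /* a {2,2,3} tensor, all zeros */
    float* m = batch_slice_2d(t, 1, 2, 3);  /* the 2x3 matrix of batch 1 */
    printf("offset = %td\n", m - t);        /* prints 6 */
    return 0;
}
```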

@PrimedErwin (Collaborator)

You can put it in utils.c, the same as Tensor_mul does. See: broadcast for Tensor_mul.

@AZ9tumas marked this pull request as draft October 8, 2025 03:26
@AZ9tumas marked this pull request as ready for review October 9, 2025 07:58
@PrimedErwin (Collaborator)

@AZ9tumas The function Tensor_batch_slice is defined in utils.c; where is it being used?

@AZ9tumas (Author) commented Oct 9, 2025

I kept the function there as you suggested earlier, but decided not to use it in the end because of efficiency.
However, I believe this function might come in handy in the future as a helper. If it's not required, I can remove it. 😁

@PrimedErwin (Collaborator)

You should remove the helper function if it is not used by anyone. Tensor_mul uses cten_elemwise_broadcast as a helper function, which lives in utils.c.
In addition, embedded devices perform matmul pretty slowly (it's FPU-related), so I'm glad you said:

> decided not to use it in the end because of efficiency.
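To make the efficiency point concrete, here is a hedged sketch of the copy-free alternative the author alludes to: rather than materializing each 2D slice, the kernel indexes straight into the flat buffers, giving the broadcast input an effective batch stride of 0 (shapes are fixed for illustration; this is not the project's actual kernel).

```c
/* Hedged sketch of a copy-free broadcast matmul. Shapes are fixed for
 * illustration: A is {2,2,3}, B is {1,3,2}, C is {2,2,2}. */
#include <stdio.h>

int main(void) {
    float A[2 * 2 * 3] = {1,2,3, 4,5,6, 7,8,9, 10,11,12};
    float B[1 * 3 * 2] = {1,0, 0,1, 1,1};
    float C[2 * 2 * 2];
    for (int b = 0; b < 2; b++)          /* batch axis; B's size-1 axis repeats */
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++) {
                float acc = 0.0f;
                for (int k = 0; k < 3; k++)
                    acc += A[b*6 + i*3 + k] * B[k*2 + j];  /* B batch stride = 0 */
                C[b*4 + i*2 + j] = acc;
            }
    printf("C[0][0][0] = %g\n", C[0]);   /* 1*1 + 2*0 + 3*1 = 4 */
    return 0;
}
```

Skipping the intermediate copies avoids both the allocation and the extra memory traffic, which matters most on FPU-constrained targets.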

@AZ9tumas (Author) commented Oct 9, 2025

Alright, I will add a final commit removing the function entirely. I hope this gets merged after that. Thank you for your assistance and guidance.

@PrimedErwin merged commit 421cdac into pocketpy:test Oct 10, 2025
5 checks passed