@ejmeitz ejmeitz commented Oct 2, 2025

Goals:

  • Automatically fuse broadcast operations into a single CUDA kernel
  • Allow users to fuse custom functions, so that calling such a function invokes a single fused kernel
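
For context, here is a minimal sketch of the broadcast fusion that base Julia already performs and that these goals build on. It uses plain CPU arrays and only `Base.Broadcast`; it does not show this PR's implementation or any cuNumeric API, and `axpy_sq` is a hypothetical user function introduced purely for illustration.

```julia
# Plain Julia arrays stand in for GPU arrays; no package API is used here.
x = Float32[1, 2, 3]
y = Float32[4, 5, 6]

# Hypothetical user-defined scalar function (an assumption for illustration).
axpy_sq(a, b) = 2f0 * a + b^2

# A dotted expression is lowered to one lazy Broadcasted tree rather than
# one temporary array per operation. Building that tree by hand:
lazy = Base.Broadcast.broadcasted(+,
    Base.Broadcast.broadcasted(*, 2f0, x),
    Base.Broadcast.broadcasted(^, y, 2))

# Materializing the tree evaluates every operation in a single pass.
fused = Base.Broadcast.materialize(lazy)

# The dotted form and the broadcast custom function give the same result.
dotted = 2f0 .* x .+ y .^ 2
custom = axpy_sq.(x, y)
```

A GPU backend can, in principle, intercept the `Broadcasted` tree before it is materialized and compile it into one kernel, which is the behavior the first goal describes.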

ejmeitz and others added 11 commits October 2, 2025 14:01
* init broadcasting changes

* broadcasting almost works

* error on user defined functions

* support scalar broadcasting

* asd

* better type promotion

* fix floaty unary ops

* use optimized square and reciprocal calls given 2 and -1 literals

* start tests

* add allowscalar and allowdouble

* ban promotion to all wider types

* working on tests

* force NDArray type param for dim to be Int64

* tests up to GEMM pass

* up to unary_ops pass

* unary reductions pass tests

* binops pass except things with NaN

* all tests pass

* add shorthands for copying to and from Julia arrays

* tests for scalars

* fix tests I broke

* stuff

* fix some things

* all tests pass

* remove if-else

* separate out promotion logic

* add lcm, gcd, negation tests

---------

Co-authored-by: krasow <krasow@u.northwestern.edu>