
Contributor

@oleksost oleksost commented Sep 27, 2024

At a high level, we compute the base activations first. The kernel then performs the scattering and the dds multiplication, so that each token is processed only by its selected (top-K) sparse experts.
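To make the data flow concrete, here is a minimal pure-Python sketch of the idea, not the kernel in this PR. It assumes that each expert's contribution is scaled by its router score and added to the base activation, which the description above does not spell out. All names (`sparse_moe_forward`, `W_base`, `experts`, `router_scores`) are hypothetical, and the sparse expert weights are stored as plain `{(row, col): value}` dicts rather than the block-sparse layout used by stk.

```python
def matvec(x, W):
    # Dense product of a token vector x (d_in) with a weight matrix W (d_in x d_out).
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def top_k(scores, k):
    # Indices of the k largest router scores for one token.
    return sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]


def sparse_moe_forward(tokens, W_base, experts, router_scores, k):
    # Step 1: base activations for every token (dense path).
    outputs = [matvec(x, W_base) for x in tokens]

    # Step 2: scatter, i.e. group token indices by the experts they selected.
    assignments = {e: [] for e in range(len(experts))}
    for t, scores in enumerate(router_scores):
        for e in top_k(scores, k):
            assignments[e].append(t)

    # Step 3: each expert touches only its own tokens (dense input x sparse weight).
    # ASSUMPTION: the expert output is gated by the router score and added to the base.
    for e, token_ids in assignments.items():
        for t in token_ids:
            for (i, j), w in experts[e].items():
                outputs[t][j] += router_scores[t][e] * tokens[t][i] * w
    return outputs, assignments


# Toy example: 2 tokens, 3 sparse experts, top-2 routing.
tokens = [[1.0, 2.0], [3.0, 4.0]]
W_base = [[1.0, 0.0], [0.0, 1.0]]
experts = [{(0, 0): 1.0}, {(1, 1): 2.0}, {(0, 1): 1.0}]
router_scores = [[0.5, 0.25, 0.0], [0.0, 0.25, 0.75]]
outputs, assignments = sparse_moe_forward(tokens, W_base, experts, router_scores, k=2)
```

In this toy run, expert 0 only sees token 0 and expert 2 only sees token 1, so no work is spent on experts a token did not select.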

TODOs:

  • look into runtime etc.
  • make it work with other block sizes
  • look into scattered MoE implementation

With sparsity/stk/measure_time.py:
[image]

@oleksost oleksost requested a review from sordonia November 6, 2024 15:10

3 participants