Sparse MoE #116

oleksost · 2024-09-27T13:48:33Z

At a high level, we first compute the base activations. Then, inside the kernel, we perform the scattering and the dds multiplication, so that each token is processed only by its selected (top-K) sparse experts.
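A dense NumPy reference of this flow can help when checking a kernel's output. It is a sketch under stated assumptions, not the PR's implementation: `sparse_moe_reference` is a hypothetical name, expert weights are stored densely with most entries zero rather than in a block-sparse format, and the softmax over the top-K router logits is an assumed gating scheme.

```python
import numpy as np


def sparse_moe_reference(x, w_base, expert_weights, router_logits, top_k):
    """Hypothetical dense reference for a sparse MoE layer.

    x:              (num_tokens, d_in) input activations
    w_base:         (d_in, d_out) dense base weight
    expert_weights: (num_experts, d_in, d_out) sparse expert weights, stored
                    densely here with explicit zeros (assumption)
    router_logits:  (num_tokens, num_experts) routing scores
    top_k:          number of experts selected per token
    """
    num_tokens = x.shape[0]
    # 1) Base activations are computed for every token first.
    out = x @ w_base

    # 2) Pick the top-K experts per token; softmax over their logits is an
    #    assumed gating scheme, not necessarily the one used in the PR.
    top_idx = np.argsort(-router_logits, axis=1)[:, :top_k]          # (T, K)
    top_logits = np.take_along_axis(router_logits, top_idx, axis=1)  # (T, K)
    gates = np.exp(top_logits - top_logits.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)

    # 3) Scatter: group (token, slot) pairs by expert so each expert only
    #    multiplies the tokens routed to it.
    flat_experts = top_idx.reshape(-1)
    flat_tokens = np.repeat(np.arange(num_tokens), top_k)
    flat_gates = gates.reshape(-1)
    for e in np.unique(flat_experts):
        sel = flat_experts == e
        tokens = flat_tokens[sel]
        # Dense activations times a sparse expert weight (dense storage here).
        contrib = x[tokens] @ expert_weights[e]
        # Scatter-add the gated expert outputs back onto their token rows.
        np.add.at(out, tokens, flat_gates[sel][:, None] * contrib)
    return out
```

Grouping tokens by expert means each expert weight is touched once per batch, which is the property the scattered kernel exploits; this reference only checks numerics, not speed.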

TODOs:

- look into runtime, etc.
- make it work with other block sizes
- look into the scattered MoE implementation
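Block-sparse kernels work on fixed-size tiles, so supporting other block sizes means mapping a sparsity mask onto a different tile grid. The sketch below uses a hypothetical helper, `block_mask` (NumPy, not taken from the PR), that marks which tiles hold at least one non-zero entry for a given block size and rejects shapes the block size does not divide.

```python
import numpy as np


def block_mask(mask, block_size):
    """Hypothetical helper: return a boolean (rows // b, cols // b) array that
    is True for every b x b tile of `mask` containing a non-zero entry."""
    rows, cols = mask.shape
    if rows % block_size or cols % block_size:
        raise ValueError(
            f"shape {mask.shape} is not divisible by block_size={block_size}"
        )
    # Split each axis into (block index, offset within block), then reduce
    # over the two offset axes.
    tiles = mask.reshape(rows // block_size, block_size, cols // block_size, block_size)
    return tiles.any(axis=(1, 3))
```

With an unstructured mask, larger blocks usually mark more tiles as non-zero, so more explicit zeros end up stored and multiplied.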

Runtime measured with `sparsity/stk/measure_time.py`:

oleksost added 4 commits September 20, 2024 18:30

- merging kernel (3668251)
- profile_mask_merging (766453b)
- profile adapter merging (2dd98ab)
- sparse merging and sdd kernel (dad6f19)
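The `sdd` and `dds` names appear to follow stk's operator naming: the three letters give the layout of the output, the left operand, and the right operand (s = sparse, d = dense). The NumPy stand-ins below are illustrative only; they do not reproduce stk's signatures or its block-sparse storage, and the boolean `topology` argument is an assumption standing in for a sparse layout.

```python
import numpy as np


def sdd(a, b, topology):
    """dense @ dense -> sparse: compute only the output entries that are
    present in the boolean `topology` mask; all others stay zero."""
    rows, cols = np.nonzero(topology)
    out = np.zeros(topology.shape, dtype=np.result_type(a, b))
    # One dot product per kept output entry: a[row, :] . b[:, col]
    out[rows, cols] = np.einsum("nk,kn->n", a[rows], b[:, cols])
    return out


def dds(a, b_sparse):
    """dense @ sparse -> dense; here the sparse operand is stored densely
    with explicit zeros, so this is an ordinary matmul."""
    return a @ b_sparse
```

In a real block-sparse kernel the saving comes from skipping the work for entries outside the topology; the dense stand-ins only pin down what each operation is expected to return.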

oleksost requested review from pclucas14 and sordonia September 27, 2024 13:49

oleksost added 17 commits September 27, 2024 10:16

- require stk (4206e84)
- scattared implementation (334df5e)
- nvm (f95fd5d)
- nvm (85859c9)
- formating (c8d5361)
- refactor (0772fd5)
- format (5061165)
- rename (dd6b71a)
- test (3ee237d)
- test (1c83f10)
- black (1b68a3a)
- tests (306cda6)
- cleaned (98aabce)
- test (8fe4540)
- tests (8763136)
- do not use sets when parametrizing tests (ab4802a)
- nvm (6bba8af)

sordonia approved these changes Oct 24, 2024


oleksost added 3 commits November 6, 2024 09:33

- Merge branch 'main' into sparse_masks (f9e2423)
- cleanup (4487c0b)
- black formatter (57792f8)

oleksost requested a review from sordonia November 6, 2024 15:10

- isort formatter (0160598)

3 participants
