Add block sparse linear and locally connected layers by clane9 · Pull Request #6 · clane9/columnformers

clane9 · 2024-05-01T16:53:58Z

No description provided.

- Use PyTorch native block-sparse tensors following [this blog](https://pytorch.org/blog/speeding-up-vits/) rather than Triton blocksparse, which does not seem very stable. See e.g. [this recent PR](triton-lang/triton#4156), where `triton.ops` was deprecated.
- Represent the weights in `BlockSparseLinear` as a block-sparse tensor rather than a dense tensor. This will save a lot of GPU memory.
- Change the `BlockSparseLocallyConnected` interface to more closely match `nn.Conv2d`, except with support for padding and stride removed. For now we should be able to restrict to `padding="same"`, `stride=1` (?).
- Rewrite the function that constructs the local connectivity matrix so that it builds a sparse matrix directly rather than a dense one, using vectorized operations rather than for loops. This should save a lot of memory and run faster.
- Add support for multiple input and output channels and for depthwise convolution. The channels axis can be either first or last. For depthwise convolution, channels-first should be more efficient (more block sparsity).

TODO:

- Finish testing on CUDA. Native blocksparse matmul is not implemented on CPU.
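To make the memory argument concrete, here is a minimal sketch (assuming a recent PyTorch 2.x; the toy 4 x 6 weight and 2 x 2 block size are arbitrary choices for illustration) of how a dense weight maps onto PyTorch's native block sparse row (BSR) layout. It only builds and inspects the tensor, which works on CPU; the block-sparse matmul itself is the part the TODO says is CUDA-only.

```python
import torch

# Toy 4 x 6 weight made of 2 x 2 blocks; only two blocks are nonzero.
blocksize = (2, 2)
dense = torch.zeros(4, 6)
dense[0:2, 0:2] = 1.0  # block row 0, block column 0
dense[2:4, 4:6] = 2.0  # block row 1, block column 2

# Convert to PyTorch's native block sparse row (BSR) layout.
bsr = dense.to_sparse_bsr(blocksize)

# Only nonzero blocks are stored: values() has shape (nnz_blocks, 2, 2),
# and crow_indices() marks where each block row starts in col_indices().
print(bsr.layout)
print(bsr.crow_indices(), bsr.col_indices(), bsr.values().shape)

# Converting back to dense recovers the original matrix exactly.
assert torch.equal(bsr.to_dense(), dense)
```

Here the stored values hold 8 numbers instead of 24; the savings grow with the fraction of all-zero blocks.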

clane9 · 2024-07-02T18:00:32Z

I updated the implementation:

- Use PyTorch native block-sparse tensors following [this blog](https://pytorch.org/blog/speeding-up-vits/) rather than Triton blocksparse, which does not seem stable yet.
- Represent the weights in `BlockSparseLinear` as a block-sparse tensor rather than a dense tensor. This will save a lot of GPU memory.
- Change the `BlockSparseLocallyConnected` interface to more closely match `nn.Conv2d`, except with support for padding and stride removed. For now we should be able to restrict to `padding="same"`, `stride=1` (@alismil what do you think?).
- Rewrite the function that constructs the local connectivity matrix so that it builds a sparse matrix directly rather than a dense one, using vectorized operations rather than for loops. This should save a lot of memory and run faster.
- Add support for multiple input and output channels and for depthwise convolution. The channels axis can be either first or last. For depthwise convolution, channels-first should be more efficient (more block sparsity).
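The vectorized, directly-sparse construction described above can be sketched as follows. This is a hypothetical helper (`sparse_local_connectivity` is an illustrative name, not the PR's function) that assumes a single channel, an odd kernel, stride 1 and "same" output size, and it returns a sparse COO 0/1 mask rather than the block-sparse layout the layer would ultimately use.

```python
import torch


def sparse_local_connectivity(height: int, width: int, kernel_size: int) -> torch.Tensor:
    """Hypothetical sketch: (H*W, H*W) sparse COO mask with a 1 wherever output
    pixel i receives input pixel j from a kernel_size x kernel_size window
    centered on i. Single channel, stride 1, "same" output size; neighbors
    that fall outside the image are dropped instead of padded."""
    assert kernel_size % 2 == 1, "this sketch assumes an odd kernel size"
    half = kernel_size // 2
    # (k^2, 2) row/column offsets of the kernel window
    offsets = torch.cartesian_prod(
        torch.arange(-half, half + 1), torch.arange(-half, half + 1)
    )
    # (H*W, 2) output pixel coordinates in row-major order (flat index = r * W + c)
    coords = torch.cartesian_prod(torch.arange(height), torch.arange(width))
    # (H*W, k^2, 2) input coordinates for every output pixel, via broadcasting
    neighbors = coords[:, None, :] + offsets[None, :, :]
    valid = (
        (neighbors[..., 0] >= 0)
        & (neighbors[..., 0] < height)
        & (neighbors[..., 1] >= 0)
        & (neighbors[..., 1] < width)
    )
    # Flat output (row) and input (column) indices of the nonzero entries
    rows = torch.arange(height * width)[:, None].expand(-1, offsets.shape[0])[valid]
    cols = (neighbors[..., 0] * width + neighbors[..., 1])[valid]
    indices = torch.stack([rows, cols])
    values = torch.ones(indices.shape[1])
    n = height * width
    return torch.sparse_coo_tensor(indices, values, (n, n)).coalesce()
```

No dense (H*W, H*W) matrix is ever materialized; memory scales with the number of connections, roughly H*W*k^2.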

alismil · 2024-07-02T22:24:01Z

> For now we should be able to restrict to `padding="same"`, `stride=1` (@alismil what do you think?).

If we want to use this for All TNNs, we have to use different strides for each layer to make the dimensions work out.

I could not use a `sparse_bsr` layout weight parameter: it fails to move to CUDA correctly when calling `model.cuda()`. This is probably a bug that should be reported. As a workaround, I unpack the `crow_indices` and `col_indices` into buffers and store the sparse BSR weight values as a standard strided parameter. Then I construct the sparse BSR weight tensor on the fly during forward.

TODO: backward does not work. It raises:

```
RuntimeError: addmm: computation on CUDA is not implemented for Strided + Strided @ SparseBsr
```
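A minimal sketch of this workaround, assuming the block pattern is fixed and given as a dense 0/1 mask. The class name `BlockSparseLinearSketch`, its constructor arguments and the initialization are hypothetical, not taken from the PR. Because the index tensors are registered as buffers, `model.cuda()` and `.to()` move them together with the strided `values` parameter.

```python
import torch
from torch import nn


class BlockSparseLinearSketch(nn.Module):
    """Hypothetical sketch: BSR index tensors live in buffers, nonzero block
    values live in an ordinary strided parameter, and the sparse_bsr weight
    is rebuilt on every call to weight()."""

    def __init__(self, dense_mask: torch.Tensor, blocksize: tuple):
        super().__init__()
        self.out_features, self.in_features = dense_mask.shape
        # Any block containing a nonzero mask entry becomes a stored block.
        pattern = dense_mask.to(torch.float32).to_sparse_bsr(blocksize)
        # Buffers follow model.cuda() / .to() but are not trained.
        self.register_buffer("crow_indices", pattern.crow_indices().clone())
        self.register_buffer("col_indices", pattern.col_indices().clone())
        # (nnz_blocks, block_rows, block_cols) block values as a strided parameter.
        values = torch.empty_like(pattern.values())
        nn.init.normal_(values, std=0.02)
        self.values = nn.Parameter(values)

    def weight(self) -> torch.Tensor:
        # Reassemble the sparse BSR weight from its parts on the fly.
        return torch.sparse_bsr_tensor(
            self.crow_indices,
            self.col_indices,
            self.values,
            size=(self.out_features, self.in_features),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Relies on a CUDA BSR matmul kernel; not expected to run on CPU, and
        # per the TODO above the backward pass raised a RuntimeError.
        return nn.functional.linear(x, self.weight())
```

Only the weight reconstruction is meant to be exercised on CPU; `forward` is shown to indicate where the on-the-fly tensor would be used.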

clane9 · 2024-07-03T00:50:28Z

> If we want to use this for All TNNs, to make the dimensions work out we have to use different strides for each layer

Okay, good point. Then maybe we should add back the support you had for strides and padding to `_sparse_local_connectivity`. Hopefully it doesn't complicate things too much.
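For reference when adding strides and padding back: the `nn.Conv2d` documentation gives the output size along each spatial axis as a function of input size, kernel size, stride, padding and dilation. The small helpers below (hypothetical names) follow that formula and could be used to check that the connectivity matrix's output grid has the expected size for each layer.

```python
def conv_output_size(
    in_size: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1
) -> int:
    """Output length along one spatial axis, following the nn.Conv2d docs:
    floor((in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)."""
    return (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def window_starts(in_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> list:
    """Input index of the first pixel of each output window (negative values fall in the padding)."""
    n_out = conv_output_size(in_size, kernel_size, stride, padding)
    return [i * stride - padding for i in range(n_out)]


print(conv_output_size(32, 3, stride=1, padding=1))  # 32
print(conv_output_size(32, 3, stride=2, padding=1))  # 16
print(window_starts(5, 3, stride=2, padding=1))  # [-1, 1, 3]
```

With stride 1 and `padding=(kernel_size - 1) // 2`, an odd kernel preserves the grid size (the `padding="same"` case); with stride 2 the grid is roughly halved, which is the kind of per-layer change needed to make the dimensions work out.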

alismil · 2024-07-16T16:26:57Z

```diff
+
+    # conv kernel index offsets. note that the kernel width is required to be odd.
+    # (k^2, 2)
+    kernel_half_width = (kernel_size - 1) // 2
```


This is a bit restrictive. Could we adapt it to also support even kernel sizes, with something like this?

```python
if kernel_size % 2 == 0:
    kernel_half_width = kernel_size // 2
    kernel_indices = torch.cartesian_prod(
        torch.arange(-kernel_half_width, kernel_half_width),
        torch.arange(-kernel_half_width, kernel_half_width),
    )
else:
    kernel_half_width = (kernel_size - 1) // 2
    kernel_indices = torch.cartesian_prod(
        torch.arange(-kernel_half_width, kernel_half_width + 1),
        torch.arange(-kernel_half_width, kernel_half_width + 1),
    )
```

clane9 · 2024-07-17

Yes, I think you're right; something like this would probably be better, although it feels like it should be possible to write it more concisely.

More generally, it would probably be best to have exactly the same interface and behavior as Conv2d. What I have now takes a few shortcuts.
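On the point about shorter code, one possibility (hypothetical helper names) is a single `torch.arange` that covers both odd and even sizes. If the goal is to match `nn.Conv2d` exactly, note that `padding="same"` with an even kernel appears to pad `(k - 1) // 2` on the top/left and the remainder on the bottom/right, which shifts the window one pixel relative to the even branch suggested above; the second helper encodes that assumption.

```python
import torch


def kernel_offsets(kernel_size: int) -> torch.Tensor:
    """(k^2, 2) row/column offsets of a k x k kernel relative to the output pixel.

    Odd k gives a centered window; even k gives offsets -k/2 .. k/2 - 1,
    the same as the even branch in the suggestion above.
    """
    offsets = torch.arange(kernel_size) - kernel_size // 2
    return torch.cartesian_prod(offsets, offsets)


def kernel_offsets_conv2d_same(kernel_size: int) -> torch.Tensor:
    """Assumption: mirrors nn.Conv2d(padding="same"), which pads (k - 1) // 2 on
    the top/left, so even kernels extend one extra pixel down/right."""
    offsets = torch.arange(kernel_size) - (kernel_size - 1) // 2
    return torch.cartesian_prod(offsets, offsets)
```

For odd kernels the two helpers agree; they only differ for even `kernel_size`.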

`d0ec304` Initial blocksparse linear outline

clane9 force-pushed the `blocksparse` branch from `b7bb25b` to `d0ec304` on July 1, 2024 17:24.

alismil and others added 6 commits on July 1, 2024 14:05:

- `bc7e748` BlockSparseLinear complete
- `4489743` added MixtureLayerNorm
- `d320747` matched MixtureLinear to main
- `f38640d` completed BlockSparseLocallyConnected
- `6356947` updated comments in _create_connectivity_matrix

alismil reviewed Jul 2, 2024 (comment thread on `columnformers/models/layers.py`, marked outdated).

`b5b9a02` Add in_shape="nd" option for no rearranging

alismil reviewed Jul 16, 2024

clane9 wants to merge 9 commits into `main` from `blocksparse`.


2 participants
