This repository was archived by the owner on Jul 22, 2025. It is now read-only.

Not seeing the inference speed-up on CUDA using the sparse trainer notebook #27

@HamidShojanazeri

Description

Hi @madlag,
I have tried the notebook, which is very similar to the one you shared in issue #5, but I am not seeing any speed-up at the end when I move the models to CUDA, although I do see about a 1.3x speed-up on CPU. I am running this on an EC2 g4dn.2xlarge instance, which has a T4 card.

This is my training code and this is the inference code. I wonder if I am missing something here.

The parameter count shows the reduction, but the inference latency is ~9 ms for both the pruned and the non-pruned model.

prunebert_model.num_parameters() / bert_model_original.num_parameters() = 0.6118184376136527
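One thing worth ruling out when the two latencies come out identical on GPU: CUDA kernel launches are asynchronous, so a plain wall-clock timer can measure launch overhead rather than actual execution time unless the device is synchronized before stopping the clock. Below is a minimal timing helper illustrating this; the `benchmark` function, its parameter names, and the `synchronize` hook are my own sketch, not code from the notebook:

```python
import time

def benchmark(fn, warmup=10, iters=100, synchronize=None):
    """Return mean latency of fn() in milliseconds."""
    # Warm-up runs exclude one-time costs (e.g. CUDA context init, caching).
    for _ in range(warmup):
        fn()
    if synchronize is not None:
        synchronize()  # e.g. torch.cuda.synchronize(): drain queued kernels
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if synchronize is not None:
        synchronize()  # wait for the last kernel before stopping the clock
    return (time.perf_counter() - start) / iters * 1000.0
```

With PyTorch on GPU this would be called as something like `benchmark(lambda: model(**inputs), synchronize=torch.cuda.synchronize)`. Note also that at small batch sizes a T4 can be launch- or memory-bandwidth-bound rather than compute-bound, in which case a pruned and a dense model may legitimately show similar latency even when timed correctly.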

Thanks for your help and the great work.
