This repository was archived by the owner on Jul 22, 2025. It is now read-only.

Not seeing the inference speed-up on CUDA using the sparse trainer notebook #27

@HamidShojanazeri

Description

Hi @madlag,
I have tried the notebook, which is very similar to the one you shared in issue #5, but I am not seeing any speed-up at the end when I move the models to CUDA, although I do see about a 1.3x speed-up on CPU. I am running this on an EC2 g4dn.2xlarge instance, which has a T4 card.

This is my training code and this is the inference code. I wonder if I am missing something here.

The parameter count shows the reduction, but the inference latency is ~9 ms for both the pruned and the non-pruned model.

prunebert_model.num_parameters() / bert_model_original.num_parameters() = 0.6118184376136527
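One thing worth ruling out when the two latencies come out identical on GPU: CUDA kernel launches are asynchronous, so a plain wall-clock timer can measure launch overhead rather than actual execution time unless the device is synchronized before stopping the clock. Below is a minimal timing helper illustrating this; the `benchmark` function, its parameter names, and the `synchronize` hook are my own sketch, not code from the notebook:

```python
import time

def benchmark(fn, warmup=10, iters=100, synchronize=None):
    """Return mean latency of fn() in milliseconds."""
    # Warm-up runs exclude one-time costs (e.g. CUDA context init, caching).
    for _ in range(warmup):
        fn()
    if synchronize is not None:
        synchronize()  # e.g. torch.cuda.synchronize(): drain queued kernels
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if synchronize is not None:
        synchronize()  # wait for the last kernel before stopping the clock
    return (time.perf_counter() - start) / iters * 1000.0
```

With PyTorch on GPU this would be called as something like `benchmark(lambda: model(**inputs), synchronize=torch.cuda.synchronize)`. Note also that at small batch sizes a T4 can be launch- or memory-bandwidth-bound rather than compute-bound, in which case a pruned and a dense model may legitimately show similar latency even when timed correctly.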

Thanks for your help and the great work.
