This code originates from the official implementation of the paper, rethinking-network-pruning. We follow that repo's training and pruning settings so that our experimental setup matches the authors' as closely as possible, but we replace the dataloader with our imbalanced-dataset generator to test whether the paper's conclusions still hold under different imbalanced scenarios.
We use CIFAR-10 as the base dataset for creating imbalanced datasets. To test the models at different degrees of imbalance, we sample the dataset along two dimensions: "Class Distribution" and "Class Ratio".
We define 3 different distributions as follows:
- Dist A: one label far outnumbers all the others, e.g. [100, 1, 1, 1, 1, 1, 1, 1, 1, 1]
- Dist B: 4 labels far outnumber the other 6 labels, e.g. [100, 100, 100, 100, 1, 1, 1, 1, 1, 1]
- Dist C: one label is far rarer than all the others, e.g. [100, 100, 100, 100, 100, 100, 100, 100, 100, 1]
To simulate a variety of imbalance degrees, we use 4 class ratios: [75, 100, 200, 500]. Examples:
- 75: [75, 1, 1, 1, 1, 1, 1, 1, 1, 1]
- 100: [100, 1, 1, 1, 1, 1, 1, 1, 1, 1]
- 200: [200, 1, 1, 1, 1, 1, 1, 1, 1, 1]
- 500: [500, 1, 1, 1, 1, 1, 1, 1, 1, 1]
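The two dimensions above combine into a per-class sample count for each experiment. A minimal sketch of how such a generator could work is shown below; the function names (`make_class_counts`, `imbalanced_indices`) and the `base` parameter are illustrative, not the repo's actual API.

```python
import numpy as np

def make_class_counts(dist, ratio, base=1):
    """Turn a 0/1 distribution pattern into per-class sample counts.

    `dist` marks majority classes with 1 and minority classes with 0;
    `ratio` is the majority:minority class ratio; `base` is the number
    of samples kept per minority class. (All names are illustrative.)
    """
    return [base * ratio if d else base for d in dist]

def imbalanced_indices(labels, counts, seed=0):
    """Subsample dataset indices so that class c keeps counts[c] examples."""
    rng = np.random.default_rng(seed)
    picked = []
    for c, n in enumerate(counts):
        idx = np.flatnonzero(labels == c)          # all examples of class c
        picked.append(rng.choice(idx, size=n, replace=False))
    return np.concatenate(picked)
```

For example, Dist A at ratio 100 corresponds to `make_class_counts([1] + [0] * 9, 100)`, i.e. `[100, 1, 1, 1, 1, 1, 1, 1, 1, 1]`; the resulting index array can then be fed to a standard `SubsetRandomSampler`-style dataloader.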
This directory contains all the CIFAR experiments in the paper, where there are four pruning methods in total:
- L1-norm based channel pruning
- Network Slimming
- Soft filter pruning
- Non-structured weight-level pruning
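To illustrate the first method, L1-norm based channel pruning ranks each convolutional filter by the L1 norm of its weights and removes the lowest-ranked filters. A minimal numpy sketch of that ranking step is below (the function names and `keep_ratio` parameter are illustrative, not the repo's actual interface):

```python
import numpy as np

def l1_filter_scores(weight):
    """L1 norm of each output filter of a conv weight of shape (O, I, kH, kW)."""
    return np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)

def keep_mask(weight, keep_ratio=0.5):
    """Boolean mask keeping the keep_ratio fraction of filters with largest L1 norm."""
    scores = l1_filter_scores(weight)
    n_keep = max(1, int(round(keep_ratio * len(scores))))
    kept = np.argsort(scores)[::-1][:n_keep]   # indices of the largest-norm filters
    mask = np.zeros(len(scores), dtype=bool)
    mask[kept] = True
    return mask
```

In practice the mask is used to slice out the surviving output channels of the layer (and the matching input channels of the next layer) before finetuning or scratch training.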
For each method, check out example commands for baseline training, finetuning, scratch-E training, and scratch-B training in the corresponding directories.
In our experiments we use only L1-norm based channel pruning as the pruning method, and compare the accuracy of the four training schemes above to evaluate the result.