A comparative analysis of EfficientNet-B0 trained with standard cross-entropy loss versus knowledge distillation from EfficientNet-B2.
```
final_DL/
├── config.py               # Configuration and hyperparameters
├── data_loader.py          # Dataset loading and preprocessing
├── models.py               # Model definitions and distillation loss
├── train_baseline.py       # Baseline training script
├── train_distillation.py   # Knowledge distillation training script
├── evaluate.py             # Evaluation and comparison utilities
├── utils.py                # Helper functions
├── requirements.txt        # Python dependencies
├── data/                   # Downloaded datasets (auto-created)
├── checkpoints/            # Saved model weights (auto-created)
└── results/                # Training logs and plots (auto-created)
```
Install the dependencies:

```bash
pip install -r requirements.txt
```

Verify that PyTorch can see the GPU:

```python
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU only'}")
```

Train the baseline:

```bash
python train_baseline.py
```

This trains EfficientNet-B0 with standard cross-entropy loss.
Train with knowledge distillation:

```bash
python train_distillation.py
```

This trains EfficientNet-B0 using knowledge distillation from EfficientNet-B2.
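During distillation the teacher only supplies targets, so it is kept frozen and in eval mode while the student trains. A rough sketch of that setup, assuming torchvision models and an illustrative checkpoint path (the repo's `models.py` / `train_distillation.py` may organize this differently):

```python
import torch
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative teacher setup: frozen, eval mode, no gradients tracked.
teacher = models.efficientnet_b2(weights=None)
teacher.classifier[1] = torch.nn.Linear(teacher.classifier[1].in_features, 200)
# In practice the teacher weights would be loaded from a trained checkpoint, e.g.:
# teacher.load_state_dict(torch.load("checkpoints/teacher_b2.pth", map_location=device))
teacher = teacher.to(device).eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Inside the training loop, teacher logits are computed without grad:
# with torch.no_grad():
#     teacher_logits = teacher(images)
```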
Evaluate and compare both models:

```bash
python evaluate.py
```

This generates:
- Performance comparison plots
- Confusion matrices
- Detailed comparison report
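Conceptually, the comparison reduces to collecting predictions from each checkpoint and summarizing them. A minimal sketch with scikit-learn and matplotlib (the arrays and output paths here are placeholders, not the script's actual names):

```python
import os
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, confusion_matrix

os.makedirs("results", exist_ok=True)

# Placeholder arrays: ground-truth labels and predictions from each model.
y_true = np.random.randint(0, 200, size=1000)
y_pred_baseline = np.random.randint(0, 200, size=1000)
y_pred_distilled = np.random.randint(0, 200, size=1000)

acc_baseline = accuracy_score(y_true, y_pred_baseline)
acc_distilled = accuracy_score(y_true, y_pred_distilled)

# Bar chart comparing top-1 accuracy (illustrative output path).
plt.bar(["Baseline (CE)", "Distilled (KD)"], [acc_baseline, acc_distilled])
plt.ylabel("Top-1 accuracy")
plt.savefig("results/comparison.png")

# 200x200 confusion matrix of the distilled model on TinyImageNet.
cm = confusion_matrix(y_true, y_pred_distilled)
np.save("results/confusion_distilled.npy", cm)
```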
Edit `config.py` to customize the parameters below (a sketch of the expected file follows the table):
| Parameter | Default | Description |
|---|---|---|
| `DATASET_NAME` | `"TinyImageNet"` | Dataset to use (place under `data/tiny-imagenet-200`) |
| `TEACHER_MODEL` | `"efficientnet_b2"` | Teacher model architecture |
| `STUDENT_MODEL` | `"efficientnet_b0"` | Student model architecture |
| `TEMPERATURE` | `4.0` | Distillation temperature |
| `ALPHA` | `0.7` | Weight for soft targets |
| `BATCH_SIZE` | `32` | Batch size (RTX 5070 headroom; lower if OOM) |
| `NUM_EPOCHS` | `50` | Training epochs |
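These map to plain module-level constants. A minimal sketch of what `config.py` might contain under that assumption (the actual file may define additional settings):

```python
# config.py -- illustrative sketch; the actual file may define additional settings
DATASET_NAME = "TinyImageNet"      # expects data under data/tiny-imagenet-200
TEACHER_MODEL = "efficientnet_b2"
STUDENT_MODEL = "efficientnet_b0"

TEMPERATURE = 4.0                  # distillation temperature T
ALPHA = 0.7                        # weight on the soft (KD) term of the loss

BATCH_SIZE = 32                    # lower (e.g. 8) if you run out of memory
NUM_EPOCHS = 50
```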
| Model | Parameters | Size (MB) | GPU Memory (batch=32 @ 64×64) |
|---|---|---|---|
| EfficientNet-B0 (Student) | 5.3M | ~20 MB | ~2-3 GB |
| EfficientNet-B2 (Teacher) | 9.1M | ~35 MB | ~4-5 GB |
| Combined (Training) | - | - | ~6-8 GB (fits RTX 5070) |
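The parameter counts can be reproduced approximately from torchvision's reference implementations (assuming the models come from torchvision; with timm the counts are essentially the same):

```python
import torch
from torchvision import models

def count_params(model: torch.nn.Module) -> float:
    """Return the number of trainable parameters in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

b0 = models.efficientnet_b0(weights=None)
b2 = models.efficientnet_b2(weights=None)
print(f"EfficientNet-B0: {count_params(b0):.1f}M parameters")  # ~5.3M
print(f"EfficientNet-B2: {count_params(b2):.1f}M parameters")  # ~9.1M
```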
The distillation loss combines:
- Soft loss: KL divergence between teacher and student soft predictions
- Hard loss: Cross-entropy with ground truth labels
The combined loss is:

$$
\mathcal{L} = \alpha \, T^2 \, \mathrm{KL}\!\left(\sigma(z_t / T) \,\big\|\, \sigma(z_s / T)\right) + (1 - \alpha) \, \mathrm{CE}\!\left(y, \sigma(z_s)\right)
$$

where:
- $z_s$, $z_t$ = student and teacher logits
- $T$ = temperature (higher = softer probabilities)
- $\alpha$ = weight for the soft targets
- $\sigma$ = softmax function
- $y$ = ground-truth labels
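In code this loss is only a few lines of PyTorch. A hedged sketch (the function signature is illustrative; `models.py` may structure it differently):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.7) -> torch.Tensor:
    # Soft loss: KL divergence between temperature-softened distributions.
    # kl_div expects log-probabilities as input and probabilities as target;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard loss: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)

    return alpha * soft + (1.0 - alpha) * hard
```

In the training loop, `teacher_logits` would be computed under `torch.no_grad()` so that only the student receives gradients.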
After training on TinyImageNet (64×64):
- The distilled model should outperform the baseline by a few percentage points (the exact delta varies by run)
- Both models have identical inference cost (same architecture)
- The distilled model benefits from the teacher's soft targets across all 200 classes
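Since both checkpoints are the same B0 architecture, a latency check should confirm identical inference cost. A rough benchmark sketch (not the repository's evaluation code; batch size and iteration counts are arbitrary):

```python
import time
import torch
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.efficientnet_b0(weights=None).to(device).eval()
x = torch.randn(32, 3, 64, 64, device=device)  # TinyImageNet-sized batch

with torch.no_grad():
    for _ in range(5):          # warm-up iterations
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(20):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
print(f"Mean forward pass: {(time.perf_counter() - start) / 20 * 1000:.1f} ms")
```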
If you run out of GPU memory:
- Reduce `BATCH_SIZE` in `config.py` (try 8)
- The current config is optimized for 6 GB VRAM with the B2 teacher loaded

If training is too slow:
- Reduce `NUM_EPOCHS`
- Use a smaller dataset (Flowers102)
- Ensure CUDA is actually being used (see the check below)
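To verify CUDA is in use during training, it can help to check where the model parameters and the current batch live. A small diagnostic sketch (`model` and `batch` are placeholders for whatever the training script names them):

```python
import torch

def assert_on_cuda(model: torch.nn.Module, batch: torch.Tensor) -> None:
    """Raise if the model or the current batch is not on a CUDA device."""
    param_device = next(model.parameters()).device
    assert param_device.type == "cuda", f"Model is on {param_device}, not CUDA"
    assert batch.is_cuda, f"Batch is on {batch.device}, not CUDA"
    print(f"OK: model and batch are on {param_device}")
```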