- Root notebooks drive the work:
task1_cats_dogs.ipynb,task2_chihuahua_muffin.ipynb,task3_fashion_mnist.ipynb.prompt.mdis the assignment brief;CLAUDE.mdrecords model guidance;.vscode/holds editor prefs. - Keep datasets and artifacts out of git (e.g.,
data/cats_dogs/,data/muffin_chihuahua/,data/fashion_mnist/); store processed tensors next to the task notebook when needed. - Structure notebooks consistently: imports → config/constants → data prep → model definitions → training → evaluation/reporting.
- Create/activate an environment:
conda create -n dl python=3.10 && conda activate dl. - Install core deps:
pip install torch torchvision scikit-learn matplotlib kaggle jupyter. - Launch notebooks:
jupyter lab(orjupyter notebook) from the repo root. - Download data (requires
~/.kaggle/kaggle.json):
kaggle datasets download -d samuelcortinhas/cats-and-dogs-image-classification
kaggle datasets download -d samuelcortinhas/muffin-vs-chihuahua-image-classification - Optional sanity check for GPU:
python - <<'PY'\nimport torch; print(torch.__version__, torch.cuda.is_available())\nPY
- Python 3.10+, PEP8, 4-space indentation;
snake_casefor variables/functions,PascalCasefor classes. - Show model structure with
self.layers = nn.ModuleList([...])and add short markdown cells explaining architecture choices over LeNet-5. - Control randomness: set seeds and log split keys; enable
torch.backends.cudnn.benchmark = Trueonly when input shapes are fixed. - Naming: dataloaders
train_loader/val_loader/test_loader, metricsacc,f1_micro,f1_macro, checkpointsbest_model.pth.
- No automated suite; validate by executing notebooks. Ensure evaluation cells report mean and std over required repeats/folds (Task 1: 5× holdout, Task 2: StratifiedKFold 5×, Task 3: 10× holdout with 2:49:49 split).
- When changing data or models, re-run with a second seed to confirm stability; log the seed and data split recipe.
- Do not commit large outputs; keep checkpoints local and capture final metrics in markdown tables.
- Commit messages: concise, present-tense (e.g.,
Add efficientnet-lite for muffin task); keep changes scoped. - PR checklist: summary of notebook edits, data source or download command used, metrics table (accuracy + F1 micro/macro), environment notes (torch version, GPU/CPU), and training caveats (epochs, augmentations, seeds). Add plots only when they clarify results.
- Do not add datasets, checkpoints (
*.pth/*.pt), or notebook checkpoints (.ipynb_checkpoints/) to git; add ignore entries if new artifacts appear.
- Keep Kaggle credentials in
~/.kaggle/kaggle.jsonwith 600 permissions; never commit tokens or dataset zips. - When sharing notebooks, clear personal paths/usernames and strip bulky outputs unless they capture key results.