{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e53e9fe7",
   "metadata": {},
   "source": [
    "# Model Training Notebook\n",
    "\n",
    "This notebook provides a simple interface for training different models on the BBBC021 dataset.\n",
    "\n",
    "## Available Models:\n",
    "1. **Vanilla SimCLR** - Standard contrastive learning with data augmentations (optionally uses weak labels to keep images of the positive pair's compound out of the negative pairs)\n",
    "2. **Weak Supervision SimCLR** - Uses compound labels to create positive pairs\n",
    "3. **WS-DINO** - Teacher-student distillation approach\n",
    "\n",
    "## Quick Start:\n",
    "1. Set your training parameters in the configuration section (see our training module for a more detailed look at which parameters to set for each training approach)\n",
    "2. Choose your model type\n",
    "3. Run the training cell"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d98f856a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import gc\n",
    "import os\n",
    "import sys\n",
    "\n",
    "import torch\n",
    "\n",
    "# Add the project root to the path so we can import our modules\n",
    "sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(''))))\n",
    "\n",
    "# Import our training functions\n",
    "from training.simclr_vanilla_train import train_simclr_vanilla\n",
    "from training.simclr_ws_train import train_simclr\n",
    "from training.wsdino_resnet_train import train_wsdino\n",
    "\n",
    "print(f\"PyTorch version: {torch.__version__}\")\n",
    "print(f\"CUDA available: {torch.cuda.is_available()}\")\n",
    "print(\"Available devices:\")\n",
    "if torch.cuda.is_available():\n",
    "    print(f\"CUDA: {torch.cuda.get_device_name(0)}\")\n",
    "    print(f\"CUDA memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB\")\n",
    "    print(f\"Number of GPUs: {torch.cuda.device_count()}\")\n",
    "    # Clean up any existing GPU memory\n",
    "    gc.collect()\n",
    "    torch.cuda.empty_cache()\n",
    "else:\n",
    "    print(\"CPU only\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "207ac59e",
   "metadata": {},
   "source": [
    "## Configuration\n",
    "\n",
    "Set your training parameters here. You can modify these values based on your computational resources and requirements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b16b515b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TRAINING CONFIGURATION\n",
    "\n",
    "# Data path - Update this to point to your BBBC021 dataset\n",
    "DATA_ROOT = \"/scratch/cv-course2025/group8\"\n",
    "\n",
    "# Model selection - Choose one of: 'vanilla_simclr', 'ws_simclr', 'wsdino'\n",
    "MODEL_TYPE = \"vanilla_simclr\"\n",
    "\n",
    "# Training parameters\n",
    "EPOCHS = 50             # Number of training epochs (reduce for testing)\n",
    "BATCH_SIZE = 128        # Batch size (reduce if you get out-of-memory errors)\n",
    "LEARNING_RATE = 0.0003  # Learning rate\n",
    "TEMPERATURE = 0.1       # Temperature for the contrastive loss\n",
    "PROJECTION_DIM = 128    # Projection head output dimension\n",
    "\n",
    "# Saving options\n",
    "SAVE_EVERY = 10  # Save the model every N epochs\n",
    "SAVE_DIR = \"/scratch/cv-course2025/group8/model_weights\"  # Directory to save models\n",
    "\n",
    "# Advanced options (usually no need to change)\n",
    "COMPOUND_AWARE = True  # For vanilla SimCLR: use the compound-aware loss\n",
    "MOMENTUM = 0.996       # For WS-DINO: teacher momentum\n",
    "\n",
    "print(\"Training Configuration:\")\n",
    "print(f\"  Model Type: {MODEL_TYPE}\")\n",
    "print(f\"  Data Root: {DATA_ROOT}\")\n",
    "print(f\"  Epochs: {EPOCHS}\")\n",
    "print(f\"  Batch Size: {BATCH_SIZE}\")\n",
    "print(f\"  Learning Rate: {LEARNING_RATE}\")\n",
    "print(f\"  Save Directory: {SAVE_DIR}\")\n",
    "\n",
    "# Create the save directory if it doesn't exist\n",
    "os.makedirs(SAVE_DIR, exist_ok=True)\n",
    "print(f\"  Save directory ready: {os.path.exists(SAVE_DIR)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33238a66",
   "metadata": {},
   "source": [
    "## Model Information\n",
    "\n",
    "Here's a brief overview of each model type:\n",
    "\n",
    "### 1. Vanilla SimCLR\n",
    "- **Method**: Standard contrastive learning with data augmentations\n",
    "- **Positive pairs**: Two augmented versions of the same image\n",
    "- You can use weak labels to prevent images of the same compound from being used as negative pairs; just set `compound_aware=True`\n",
    "\n",
    "### 2. Weak Supervision SimCLR (WS-SimCLR)\n",
    "- **Method**: Uses compound labels to create positive pairs\n",
    "- **Positive pairs**: Two different images from the same compound\n",
    "\n",
    "### 3. WS-DINO\n",
    "- **Method**: Teacher-student distillation with weak supervision\n",
    "- **Positive pairs**: Uses compound labels for supervision"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc51b7d4",
   "metadata": {},
   "source": [
    "## Training\n",
    "\n",
    "Run the cell below to start training with your configured parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70a6df99",
   "metadata": {},
   "outputs": [],
   "source": [
    "# =============================================================================\n",
    "# TRAINING EXECUTION\n",
    "# =============================================================================\n",
    "\n",
    "def train_model(model_type, **kwargs):\n",
    "    \"\"\"\n",
    "    Train a model based on the specified type and parameters.\n",
    "    \"\"\"\n",
    "    # Clear GPU memory before training\n",
    "    if torch.cuda.is_available():\n",
    "        gc.collect()\n",
    "        torch.cuda.empty_cache()\n",
    "\n",
    "    print(f\"Starting training for {model_type}\")\n",
    "    print(\"=\" * 50)\n",
    "\n",
    "    try:\n",
    "        if model_type == \"vanilla_simclr\":\n",
    "            print(\"Training Vanilla SimCLR...\")\n",
    "            model = train_simclr_vanilla(\n",
    "                root_path=kwargs['root_path'],\n",
    "                epochs=kwargs['epochs'],\n",
    "                batch_size=kwargs['batch_size'],\n",
    "                learning_rate=kwargs['learning_rate'],\n",
    "                temperature=kwargs['temperature'],\n",
    "                projection_dim=kwargs['projection_dim'],\n",
    "                save_every=kwargs['save_every'],\n",
    "                save_dir=kwargs['save_dir'],\n",
    "                compound_aware=kwargs.get('compound_aware', True)\n",
    "            )\n",
    "\n",
    "        elif model_type == \"ws_simclr\":\n",
    "            print(\"Training Weak Supervision SimCLR...\")\n",
    "            model = train_simclr(\n",
    "                root_path=kwargs['root_path'],\n",
    "                epochs=kwargs['epochs'],\n",
    "                batch_size=kwargs['batch_size'],\n",
    "                learning_rate=kwargs['learning_rate'],\n",
    "                temperature=kwargs['temperature'],\n",
    "                projection_dim=kwargs['projection_dim'],\n",
    "                save_every=kwargs['save_every']\n",
    "            )\n",
    "\n",
    "        elif model_type == \"wsdino\":\n",
    "            print(\"Training WS-DINO...\")\n",
    "            model = train_wsdino(\n",
    "                root_path=kwargs['root_path'],\n",
    "                epochs=kwargs['epochs'],\n",
    "                batch_size=kwargs['batch_size'],\n",
    "                lr=kwargs['learning_rate'],\n",
    "                momentum=kwargs.get('momentum', 0.996),\n",
    "                temperature=kwargs['temperature'],\n",
    "                save_every=kwargs['save_every']\n",
    "            )\n",
    "\n",
    "        else:\n",
    "            raise ValueError(f\"Unknown model type: {model_type}\")\n",
    "\n",
    "        print(\"=\" * 50)\n",
    "        print(\"Training completed successfully!\")\n",
    "        print(f\"Models saved in: {kwargs['save_dir']}\")\n",
    "\n",
    "        return model\n",
    "\n",
    "    except Exception as e:\n",
    "        print(f\"Training failed with error: {e}\")\n",
    "        print(\"Please check your configuration and try again.\")\n",
    "        raise  # re-raise with the original traceback\n",
    "\n",
    "# Prepare training parameters\n",
    "training_params = {\n",
    "    'root_path': DATA_ROOT,\n",
    "    'epochs': EPOCHS,\n",
    "    'batch_size': BATCH_SIZE,\n",
    "    'learning_rate': LEARNING_RATE,\n",
    "    'temperature': TEMPERATURE,\n",
    "    'projection_dim': PROJECTION_DIM,\n",
    "    'save_every': SAVE_EVERY,\n",
    "    'save_dir': SAVE_DIR,\n",
    "    'compound_aware': COMPOUND_AWARE,\n",
    "    'momentum': MOMENTUM\n",
    "}\n",
    "\n",
    "print(\"Training parameters:\")\n",
    "for key, value in training_params.items():\n",
    "    print(f\"  {key}: {value}\")\n",
    "\n",
    "# Start training\n",
    "print(f\"\\nStarting training with model type: {MODEL_TYPE}\")\n",
    "trained_model = train_model(MODEL_TYPE, **training_params)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2867e6fa",
   "metadata": {},
   "source": [
    "## Save your model\n",
    "\n",
    "Depending on your training approach, you will find your model under `/scratch/cv-course2025/group8/model_weights/<training_approach>`. You can then use the extractor and evaluator to see how your model performed. If you think you created a worthy model, we recommend giving it a unique, somewhat descriptive name and renaming the folders containing your model/features."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}