This project trains a U-Net model to remove handwritten text from images (e.g., exam papers).
- `data/train`: Input images (with handwriting)
- `data/train_label`: Target images (clean, without handwriting)
- `data/test`: Test images to process
- `model.py`: U-Net architecture definition
- `dataset.py`: Data loading and augmentation
- `train.py`: Training script
- `predict.py`: Inference script
This project uses uv for dependency management.
Install uv (if not installed):

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
Create the virtual environment and install dependencies:

```bash
uv sync
```
Or manually:

```bash
uv venv
source .venv/bin/activate
uv pip install -r pyproject.toml
```
To train the model, run:

```bash
uv run train.py --epochs 50 --batch_size 4
```

Arguments:
- `--train_img_dir`: Path to training images (default: `data/train`)
- `--train_label_dir`: Path to training labels (default: `data/train_label`)
- `--epochs`: Number of training epochs (default: 200)
- `--batch_size`: Batch size (default: 4)
- `--lr`: Learning rate (default: 1e-4)
- `--img_size`: Image size; must be divisible by 16 so the U-Net's downsampling stages divide evenly (default: 1024)
- `--checkpoint_dir`: Directory to save models (default: `checkpoints`)
- `--log_dir`: Directory for TensorBoard logs (default: `runs`)
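For example, to train on a dataset stored elsewhere at a smaller resolution (the paths below are illustrative):

```bash
uv run train.py --train_img_dir my_data/train --train_label_dir my_data/labels --img_size 512
```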
Loss Function Options:
- `--loss`: Loss function type (default: `combined`)
  - `charbonnier`: Single Charbonnier loss (robust to outliers)
  - `mse`: Mean Squared Error loss
  - `l1`: L1 / MAE loss
  - `combined`: Advanced multi-component loss (recommended)
- `--loss-preset`: Preset weights for the combined loss (default: `balanced`)
  - `conservative`: Stable training with lower weights (perc=0.05, ssim=0.3, edge=0.3)
  - `balanced`: Balanced configuration (perc=0.1, ssim=0.5, edge=0.5)
  - `aggressive`: Sharp results with higher weights (perc=0.2, ssim=0.8, edge=0.8)
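As a rough sketch of how a preset is applied, the combined loss is a weighted sum of the component terms. The component losses are passed in as placeholder callables here; the actual implementations live in this repo's training code:

```python
# Illustrative only: how the preset weights combine the loss terms.
PRESETS = {
    "conservative": {"perc": 0.05, "ssim": 0.3, "edge": 0.3},
    "balanced":     {"perc": 0.10, "ssim": 0.5, "edge": 0.5},
    "aggressive":   {"perc": 0.20, "ssim": 0.8, "edge": 0.8},
}

def combined_loss(pred, target, charbonnier, perceptual, ssim, edge,
                  preset="balanced"):
    """charbonnier/perceptual/ssim/edge are callables (pred, target) -> scalar."""
    w = PRESETS[preset]
    return (charbonnier(pred, target)               # base pixel-wise term
            + w["perc"] * perceptual(pred, target)  # VGG16 feature distance
            + w["ssim"] * ssim(pred, target)        # structural similarity
            + w["edge"] * edge(pred, target))       # Sobel edge preservation
```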
Example with custom loss:
```bash
uv run train.py --epochs 100 --batch_size 8 --loss combined --loss-preset aggressive
```

Monitoring: You can track training progress with TensorBoard:
```bash
uv run tensorboard --logdir runs
```

To remove handwriting from new images:
```bash
uv run predict.py --input_dir data/test --output_dir results --model_path checkpoints/best_model.pth
```

Arguments:
- `--input_dir`: Directory containing images to process
- `--output_dir`: Directory to save processed images
- `--model_path`: Path to the trained model checkpoint
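If you need to embed inference in your own script, a minimal sketch might look like this. Note that the `UNet` class name, its no-argument constructor, and the plain `state_dict` checkpoint layout are assumptions; `predict.py` remains the supported entry point:

```python
# Minimal inference sketch, not the project's predict.py.
# Assumes model.py exposes a no-argument UNet class and that the
# checkpoint stores a plain state_dict; adjust to the actual code.
import torch
from PIL import Image
from torchvision.transforms import functional as TF

from model import UNet  # assumed class name

model = UNet()
model.load_state_dict(torch.load("checkpoints/best_model.pth", map_location="cpu"))
model.eval()

img = Image.open("data/test/sample.png").convert("RGB")
x = TF.resize(TF.to_tensor(img), [1024, 1024]).unsqueeze(0)
with torch.no_grad():
    y = model(x).clamp(0, 1).squeeze(0)
TF.to_pil_image(y).save("results/sample.png")
```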
- Architecture: U-Net with Bilinear Upsampling, ResBlock, and CBAM Attention
- Input: RGB image (resized to 1024x1024 by default)
- Output: RGB image (cleaned)
- Loss Functions:
  - Combined Loss (default): Multi-component loss optimizing multiple aspects
    - Charbonnier Loss: Base pixel-wise reconstruction
    - Perceptual Loss (VGG16): Preserves high-level features and textures
    - SSIM Loss: Maintains structural similarity
    - Edge Loss (Sobel): Preserves sharp edges and details
  - Single Losses: Charbonnier, MSE, or L1 for simpler training
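For intuition, here is a small self-contained sketch of two of these terms, the Charbonnier base term and the Sobel edge term. It follows the standard formulations and is not necessarily this repo's exact implementation:

```python
import torch
import torch.nn.functional as F

def charbonnier_loss(pred, target, eps=1e-3):
    # sqrt((x - y)^2 + eps^2): a smooth, outlier-robust variant of L1.
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

def sobel_edges(x):
    # Depthwise Sobel filtering: per-channel horizontal/vertical gradients.
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]], device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    c = x.shape[1]
    gx = F.conv2d(x, kx.expand(c, 1, 3, 3), padding=1, groups=c)
    gy = F.conv2d(x, ky.expand(c, 1, 3, 3), padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def edge_loss(pred, target):
    # L1 distance between edge maps penalizes blurred or smeared strokes.
    return (sobel_edges(pred) - sobel_edges(target)).abs().mean()
```

The perceptual (VGG16) and SSIM terms follow the same pattern but depend on pretrained VGG weights and an SSIM implementation, so they are omitted here.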