By Jose L. Gómez, Gabriel Villalonga and Antonio M. López
Abstract: Semantic image segmentation is a core task for autonomous driving, commonly performed by deep models. Since training these models incurs a costly human-based image labeling effort, using synthetic images with automatically generated labels together with unlabeled real-world images is a promising alternative. This implies addressing an unsupervised domain adaptation (UDA) problem. In this paper, we propose a new co-training procedure for synth-to-real UDA of semantic segmentation models. It performs iterations where the (unlabeled) real-world training images are labeled by intermediate deep models trained with both the (labeled) synthetic images and the real-world ones labeled in previous iterations. More specifically, a self-training stage provides two domain-adapted models, and a model collaboration loop allows the mutual improvement of these two models. The final semantic segmentation labels (pseudo-labels) for the real-world images are provided by these two models. The overall procedure treats the deep models as black boxes and drives their collaboration at the level of pseudo-labeled target images, i.e., neither modified loss functions nor explicit feature alignment are required. We test our proposal on standard synthetic and real-world datasets for onboard semantic segmentation. Our procedure shows improvements ranging from approximately 13 to 31 mIoU points over baselines.
Welcome to the repository of our paper "Co-Training for Unsupervised Domain Adaptation of Semantic Segmentation Models". The code uses a modified fork of the Detectron2 framework. Inside the tools folder you can find the co-training, self-training and other implementations used in the paper to achieve the reported results.
- Linux with Python ≥ 3.6
- PyTorch ≥ 1.6 and torchvision that matches the PyTorch installation.
- OpenCV
- Numpy
- PIL
- Scikit-learn
- Scikit-image
- Cityscapes scripts, included in Detectron2 (see installation instructions)
- Hardware: NVIDIA GPU ≥12GB to reproduce the paper results.
- Installation: Clone this GitHub repository and install it:
git clone https://github.com/JoseLGomez/Co-training_SemSeg_UDA.git
python -m pip install -e Co-training_SemSeg_UDA
cd Co-training_SemSeg_UDA
Note: To rebuild detectron2 that’s built from a local clone, use rm -rf build/ **/*.so to clean the old build first. You often need to rebuild detectron2 after reinstalling PyTorch.
- Datasets:
- Download GTA-5
- Download Synscapes.
- Download SYNTHIA.
- Download Cityscapes.
- Download BDD100K.
- Download Mapillary Vistas.
- List RGB images and GT:
- Create a .txt file with the images and their respective paths:
images.txt
/path/to/image/1.png
/path/to/image/2.png
/path/to/image/3.png
...
- Create a .txt file with the ground truth and their respective paths:
gt.txt
/path/to/ground/truth/1.png
/path/to/ground/truth/2.png
/path/to/ground/truth/3.png
...
Note: This can be done in Linux using the ls command:
ls -1 $PWD/path/to/image/*.png > images.txt
ls -1 $PWD/path/to/ground/truth/*.png > gt.txt
- Ensure that the entries of both .txt files match line by line: the image on the first line of images.txt must have its ground truth on the first line of gt.txt, and so on.
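To catch ordering mistakes early, a small sanity check can verify that both lists line up. This is not part of the repository; it is a sketch that compares filename stems, an assumption you may need to adapt if your dataset names images and labels differently (e.g. Cityscapes appends different suffixes to images and ground truth):

```python
from pathlib import Path

def check_alignment(images_txt, gt_txt):
    """Return (index, image, gt) triplets whose filename stems differ,
    i.e. lines where images.txt and gt.txt are likely misaligned."""
    imgs = Path(images_txt).read_text().splitlines()
    gts = Path(gt_txt).read_text().splitlines()
    assert len(imgs) == len(gts), "images.txt and gt.txt differ in length"
    return [(i, img, gt) for i, (img, gt) in enumerate(zip(imgs, gts))
            if Path(img).stem != Path(gt).stem]
```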
- Offline LAB translation:
The colour correction in the LAB space explained in our paper is done offline by running the script
tools/apply_LAB_to_source_data.py
- You need to set the variables source and target inside the code:
- Set source=/path/to/synthetic/images.txt, where images.txt lists a synthetic dataset from step 2 (e.g. Synscapes, GTAV)
- Set target=/path/to/real/images.txt, where images.txt lists a real dataset from step 2 (e.g. Cityscapes, Mapillary, BDD100K)
- Set n_workers according to the number of threads available on your machine (8 recommended).
- The script automatically generates the translated RGB images inside the source dataset folder, in a new folder named
rgb_translated
- Update the image lists from step 2 accordingly
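At its core, this LAB correction amounts to matching the per-channel statistics of the synthetic images to those of the real domain. Below is a minimal per-channel mean/std matching sketch of that idea; the actual colour-space conversion and file handling live in tools/apply_LAB_to_source_data.py, and the function name here is illustrative:

```python
import numpy as np

def match_channel_stats(src, tgt_mean, tgt_std):
    """Shift/scale each channel of `src` (H, W, C array, e.g. a LAB image)
    so its per-channel mean/std match the target-domain statistics."""
    src = src.astype(np.float64)
    out = np.empty_like(src)
    for c in range(src.shape[2]):
        mu, sigma = src[..., c].mean(), src[..., c].std()
        scale = tgt_std[c] / sigma if sigma > 0 else 1.0  # guard flat channels
        out[..., c] = (src[..., c] - mu) * scale + tgt_mean[c]
    return out
```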
The training step is divided into four parts:
- Baselines [with LAB translation]
The initial semantic segmentation models are trained using
tools/train_net_progress.py with a config file that contains the hyper-parameters, configs/X/baseline.yaml, where X denotes the source dataset
CUDA_VISIBLE_DEVICES=0 python tools/train_net_progress.py \
--num-gpus 1 \
--config-file /path/to/config/file.yaml \
--write-outputs \
OUTPUT_DIR /path/to/save/experiment \
DATASETS.TRAIN_IMG_TXT /path/to/train/data/images.txt \
DATASETS.TRAIN_GT_TXT /path/to/train/data/gt.txt \
DATASETS.TEST_IMG_TXT /path/to/evaluation/data/images.txt \
DATASETS.TEST_GT_TXT /path/to/evaluation/data/gt.txt
Note: uppercase variables (e.g. OUTPUT_DIR, DATASETS.TRAIN_IMG_TXT) override the values inside the config file
during execution, without modifying the config file. You can set these values inside the config file if desired.
Note 2: if you do not want intermediate inferences, remove the argument --write-outputs
- Self-training step
The self-training step specified in the paper is done by running the script
tools/sem_seg_selftraining.py with its respective configuration file configs/X/self_training.yaml
CUDA_VISIBLE_DEVICES=0 python tools/sem_seg_selftraining.py \
--num-gpus 1 \
--config-file /path/to/config/file.yaml \
--unlabeled_dataset_A /path/to/real/data/images.txt \
--weights_branchA /path/to/baseline/weights.pth \
--unlabeled_dataset_A_name dataset_A \
--max_unlabeled_samples 500 \
--scratch_training \
--num-epochs 10 \
--seed 100 \
--recompute_all_pseudolabels \
--mask_file /tools/ego_vehicle_mask.png \
OUTPUT_DIR /path/to/save/experiment
Note: --mask_file is optional and allows you to apply a mask during pseudo-label generation to set some parts directly to void. The mask needs to be binary, with values [0, 1] or [0, 255]. In our case, we apply a mask on Cityscapes because the ego vehicle is always partially visible in the same position, and setting this region to void directly removes noise.
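Conceptually, the masking is just an overwrite of the pseudo-label map. A minimal sketch (not the script's actual code), assuming the Cityscapes convention of 255 as the void/ignore label and that non-zero mask values mark the region to void; invert the comparison if your mask uses the opposite convention:

```python
import numpy as np

VOID_LABEL = 255  # Cityscapes "ignore" trainId; adjust if your label map differs

def apply_void_mask(pseudo_label, mask):
    """Set pseudo-label pixels to void wherever the binary mask is non-zero.
    `mask` may use values {0, 1} or {0, 255}; anything non-zero is masked."""
    out = pseudo_label.copy()
    out[mask > 0] = VOID_LABEL
    return out
```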
- Co-training step
The co-training step specified in the paper is done by running the script
tools/sem_seg_cotrainingV3.py with its respective configuration file configs/X/co_training.yaml
CUDA_VISIBLE_DEVICES=0 python tools/sem_seg_cotrainingV3.py \
--num-gpus 1 \
--config-file /path/to/config/file.yaml \
--unlabeled_dataset_A /path/to/real/data/images.txt \
--same_domain \
--weights_branchA /path/to/self-training/0/weights.pth \
--weights_branchB /path/to/self-training/9/weights.pth \
--unlabeled_dataset_A_name dataset_A \
--unlabeled_dataset_B_name dataset_B \
--max_unlabeled_samples 500 \
--num-epochs 5 \
--min_pixels 5000 \
--scratch_training \
--seed 100 \
--recompute_all_pseudolabels \
--ensembles \
--mask_file /tools/ego_vehicle_mask.png \
OUTPUT_DIR /path/to/save/experiment
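The collaboration loop behind this step operates purely at the level of pseudo-labeled images: each branch hands its peer a subset of images it is confident about, so both models improve without touching loss functions or features. The toy sketch below illustrates that exchange; the function, the per-image confidence dicts and the selection policy are illustrative assumptions, not the script's actual API:

```python
def cotraining_exchange(scores_a, scores_b, k):
    """Given per-image confidence scores from two branches (dicts mapping
    image path -> confidence), return the image lists each branch receives,
    i.e. the k images its *peer* is most confident about.  This mirrors the
    idea of driving the collaboration at the pseudo-label level, treating
    both models as black boxes."""
    top = lambda scores: [p for p, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:k]]
    # Branch A trains on what B is confident about, and vice versa.
    return top(scores_b), top(scores_a)
```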
- Final training step
- First, you need to generate the final pseudo-labels of the target dataset using the following script:
CUDA_VISIBLE_DEVICES=0 python tools/sem_seg_cotrainingV3.py \
--num-gpus 1 \
--config-file /path/to/config/file.yaml \
--same_domain \
--weights_branchA /path/to/self-training/0/weights.pth \
--weights_branchB /path/to/self-training/9/weights.pth \
--thres_A /co-training/exp/path/model_A/4/thresholds.npy \
--thres_B /co-training/exp/path/model_B/4/thresholds.npy \
--unlabeled_dataset_A_name dataset_A \
--unlabeled_dataset_B_name dataset_B \
--max_unlabeled_samples 500 \
--num-epochs 5 \
--seed 100 \
--ensembles \
--mpt_ensemble \
--no_training \
OUTPUT_DIR /path/to/save/pseudo-labels \
DATASETS.TEST_IMG_TXT /path/to/target/training/set/images.txt
Note: in /path/to/save/pseudo-labels you can find the predictions and a colour version for visualization. image_list.txt contains the paths to the pseudo-label files, ready to be used directly for training in the next step.
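The --ensembles/--mpt_ensemble flags combine both branches' predictions and filter them with the per-class confidence thresholds stored in the .npy files above. A rough numpy sketch of that filtering, assuming (C, H, W) softmax probability maps and a void label of 255 (the exact ensembling inside the script may differ):

```python
import numpy as np

VOID = 255  # assumed ignore label

def ensemble_pseudo_labels(probs_a, probs_b, class_thresholds):
    """Average two (C, H, W) softmax maps, take the argmax as the
    pseudo-label, and void out pixels whose confidence falls below
    the winning class's threshold."""
    probs = (probs_a + probs_b) / 2.0
    labels = probs.argmax(axis=0).astype(np.int64)
    conf = probs.max(axis=0)
    thr = np.asarray(class_thresholds)[labels]  # per-pixel threshold lookup
    labels[conf < thr] = VOID
    return labels
```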
- Next, use the script from step 1 with the config file
configs/X/final_step.yaml, which trains simultaneously on source data and pseudo-labels within each batch:
CUDA_VISIBLE_DEVICES=0 python tools/train_net_progress.py \
--num-gpus 1 \
--config-file /path/to/config/file.yaml \
--write-outputs \
OUTPUT_DIR /path/to/save/experiment
You can evaluate any model using the following script, together with a config file that specifies the target dataset to evaluate:
CUDA_VISIBLE_DEVICES=0 python tools/train_net_progress.py \
--num-gpus 1 \
--config-file /path/to/config/file.yaml \
--write-outputs \
--eval-only \
MODEL.WEIGHTS /path/to/model/weights.pth \
OUTPUT_DIR /path/to/save/experiment
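The metric reported below is mean intersection-over-union (mIoU) over the Cityscapes classes, ignoring void pixels. For reference, a minimal numpy version of that computation (a sketch, not the Detectron2 evaluator used by the scripts):

```python
import numpy as np

def mean_iou(pred, gt, num_classes, void=255):
    """Per-class IoU averaged over classes present in prediction or GT,
    after discarding pixels labeled void in the ground truth."""
    valid = gt != void
    pred, gt = pred[valid], gt[valid]
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both
            ious.append(inter / union)
    return float(np.mean(ious))
```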
| Step | Source | Target | mIoU | Config file | Weights |
|---|---|---|---|---|---|
| Baseline | GTAV | Cityscapes | 37.86 | configs/gtaV/baseline.yaml | [model] |
| Baseline + CB | GTAV | Cityscapes | 42.76 | configs/gtaV/baseline+CB.yaml | [model] |
| Self-training | GTAV | Cityscapes | 53.49 | configs/gtaV/self_training.yaml | [1] [10] |
| Co-training | GTAV | Cityscapes | 59.57 | configs/gtaV/final_step.yaml | [A] [B] [Final] |
| Baseline | Synscapes | Cityscapes | 45.98 | configs/synscapes/baseline.yaml | [model] |
| Self-training | Synscapes | Cityscapes | 55.34 | configs/synscapes/self_training.yaml | [10] |
| Co-training | Synscapes | Cityscapes | 58.38 | configs/synscapes/final_step.yaml | [A] [B] [Final] |
| Baseline | SYNTHIA | Cityscapes | 39.48 | configs/SYNTHIA/baseline.yaml | [model] |
| Self-training | SYNTHIA | Cityscapes | 48.74 | configs/SYNTHIA/self_training.yaml | [10] |
| Co-training | SYNTHIA | Cityscapes | 56.09 | configs/SYNTHIA/final_step.yaml | [A] [B] [Final] |
| Baseline | GTAV+Synscapes | Cityscapes | 59.32 | configs/gtaV+synscapes/baseline.yaml | [model] |
| Self-training | GTAV+Synscapes | Cityscapes | 67.47 | configs/gtaV+synscapes/self_training.yaml | [1] [10] |
| Co-training | GTAV+Synscapes | Cityscapes | 70.23 | configs/gtaV+synscapes/final_step.yaml | [A] [B] [Final] |
Detectron2 is released under the Apache 2.0 license.
If you use this code, please, cite the following paper:
@Article{Gomez:2023,
AUTHOR = {Gómez, Jose L. and Villalonga, Gabriel and López, Antonio M.},
TITLE = {Co-Training for Unsupervised Domain Adaptation of Semantic Segmentation Models},
JOURNAL = {Sensors},
VOLUME = {23},
YEAR = {2023},
NUMBER = {2},
ARTICLE-NUMBER = {621}
}
This research has been supported by the Spanish Grant Ref. PID2020-115734RB-C21 funded by MCIN/AEI/10.13039/50110001103
Antonio M. López acknowledges the financial support to his general research activities given by ICREA under the ICREA Academia Program. Jose L. Gómez acknowledges the financial support to perform his Ph.D. given by the grant FPU16/04131. The authors acknowledge the support of the Generalitat de Catalunya CERCA Program and its ACCIO agency to CVC’s general activities.
This project is based on the following open-source projects. We thank their authors for making the source code publicly available.
