Revisiting Cross-Modal Knowledge Distillation: A Disentanglement Approach for RGBD Semantic Segmentation (CroDiNo-KD)
Multi-modal RGB and Depth (RGBD) data significantly enhance environmental perception by providing 3D spatial context. However, accessing all sensor modalities during inference may be infeasible due to sensor failures or resource constraints.
To overcome this, we introduce CroDiNo-KD (Cross-Modal Disentanglement: a New Outlook on Knowledge Distillation). Unlike traditional Cross-Modal Knowledge Distillation (CMKD) frameworks that rely on a computationally expensive teacher/student paradigm, CroDiNo-KD jointly trains single-modality RGB and Depth models through mutual interaction and collaboration.
By leveraging disentangled representation learning, contrastive learning, and decoupled data augmentation, our approach structures the models' internal manifolds into modality-invariant and modality-specific features.
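To make the disentanglement idea concrete, here is a minimal NumPy sketch. It assumes each encoder's embedding is split into an invariant half and a specific half; the function names and the exact loss terms are illustrative only, not the repo's actual API or objective.

```python
import numpy as np

def split_embedding(z, d_inv):
    """Split an embedding into modality-invariant and modality-specific parts."""
    return z[:d_inv], z[d_inv:]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def disentanglement_losses(z_rgb, z_depth, d_inv):
    """Toy contrastive objective: pull the invariant parts of the two
    modalities together, and push each invariant part away from its
    modality-specific counterpart."""
    inv_rgb, spec_rgb = split_embedding(z_rgb, d_inv)
    inv_d, spec_d = split_embedding(z_depth, d_inv)
    align = 1.0 - cosine(inv_rgb, inv_d)  # invariant parts should agree
    ortho = abs(cosine(inv_rgb, spec_rgb)) + abs(cosine(inv_d, spec_d))
    return align, ortho

# Tiny demo: the invariant halves of the two embeddings are identical,
# so the alignment term is ~0 while the orthogonality term is non-zero.
z_rgb = np.array([1.0, 0.0, 0.0, 1.0])
z_depth = np.array([1.0, 0.0, 1.0, 0.0])
align, ortho = disentanglement_losses(z_rgb, z_depth, d_inv=2)
```

In the actual framework these terms would act on batches of encoder features and be balanced with the segmentation loss; the sketch only shows the geometric intuition.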
- Teacher-Free Paradigm: Eliminates the need for a multi-modal teacher, significantly reducing training time and parameter count.
- Disentangled Representations: Separates feature embeddings into modality-invariant and modality-specific information.
- Decoupled Augmentation: Allows independent, per-modality data augmentation strategies.
- State-of-the-Art Performance: Consistently outperforms existing CMKD methods on diverse benchmarks (indoor, aerial and drone imagery).
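Because the two single-modality networks never share a fused input, each modality can be augmented independently. The sketch below (plain Python, illustrative names, not the repo's pipeline) shows why this matters: a photometric transform that is harmless for RGB would corrupt metric depth values, so depth gets a value-preserving geometric perturbation instead.

```python
import random

def augment_rgb(img):
    """Photometric jitter (brightness scaling): meaningful for RGB pixels,
    but it would corrupt metric depth values."""
    factor = random.uniform(0.8, 1.2)
    return [[min(255, int(p * factor)) for p in row] for row in img]

def augment_depth(depth, max_shift=2):
    """A small horizontal shift as a stand-in geometric augmentation
    that leaves the depth values themselves untouched."""
    shift = random.randint(1, max_shift)
    return [row[shift:] + row[:shift] for row in depth]

# Toy 2x2 inputs; each modality receives its own, independent augmentation.
rgb_images = [[[100, 200], [50, 25]]]
depth_maps = [[[1.0, 2.0], [3.0, 4.0]]]
rgb_batch = [augment_rgb(img) for img in rgb_images]
depth_batch = [augment_depth(d) for d in depth_maps]
```

In a teacher/student CMKD setup the fused teacher input forces both modalities through the same spatial transform; decoupling removes that constraint.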
*Overview of the CroDiNo-KD architecture, featuring two encoder-decoder models and an auxiliary decoder, optimized via disentanglement and contrastive learning.*
We evaluate CroDiNo-KD on three diverse RGBD datasets: NYU Depth v2 (indoor), Potsdam (aerial), and Mid-Air (drone flight). Our method achieves state-of-the-art performance in cross-modal knowledge distillation (mIoU scores):
| Dataset | Modality | Single-Modality Baseline | Best Competitor | CroDiNo-KD (Ours) |
|---|---|---|---|---|
| NYU Depth v2 | RGB | 42.64 | 43.86 (KDv2) | 44.85 |
| NYU Depth v2 | Depth | 36.01 | 37.28 (ProtoKD) | 37.60 |
| Potsdam | RGB | 75.73 | 76.09 (Masked Dist.) | 76.13 |
| Potsdam | Depth | 42.47 | 42.43 (Masked Dist.) | 42.78 |
| Mid-Air | RGB | 47.84 | 48.32 (KD-Net) | 48.37 |
| Mid-Air | Depth | 47.07 | 47.40 (Masked Dist.) | 47.91 |
Note: CroDiNo-KD not only achieves higher accuracy but also trains about 43% faster than standard CMKD methods (20h vs. 36h+ on Mid-Air).
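The scores above use the standard mean Intersection-over-Union metric. As a reference for how such numbers are computed in general, here is a minimal NumPy sketch of per-class IoU averaging; it is illustrative and not the repo's evaluation code.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union, averaged over classes that appear
    in either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

# Tiny demo on flattened label maps with 2 classes:
# class 0 -> IoU 1/2, class 1 -> IoU 2/3, mean ~0.583.
pred = np.array([0, 0, 1, 1])
target = np.array([0, 1, 1, 1])
score = mean_iou(pred, target, num_classes=2)
```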
If you find this code or our paper useful in your research, please consider citing our work:
```bibtex
@inproceedings{crodinokd,
  title={Revisiting Cross-Modal Knowledge Distillation: A Disentanglement Approach for RGBD Semantic Segmentation},
  author={Ferrod, Roger and Dantas, C{\'a}ssio F. and Di Caro, Luigi and Ienco, Dino},
  booktitle={European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD)},
  year={2025}
}
```