rogerferrod/CroDiNo-KD
Revisiting Cross-Modal Knowledge Distillation: A Disentanglement Approach for RGBD Semantic Segmentation (CroDiNo-KD)

Roger Ferrod, Cássio F. Dantas, Luigi Di Caro, and Dino Ienco



📌 Overview

Multi-modal RGB and Depth (RGBD) data significantly enhance environmental perception by providing 3D spatial context. However, accessing all sensor modalities during inference may be infeasible due to sensor failures or resource constraints.

To overcome this, we introduce CroDiNo-KD (Cross-Modal Disentanglement: a New Outlook on Knowledge Distillation). Unlike traditional Cross-Modal Knowledge Distillation (CMKD) frameworks that rely on a computationally expensive teacher/student paradigm, CroDiNo-KD jointly trains single-modality RGB and Depth models through mutual interaction and collaboration.

By leveraging disentangled representation learning, contrastive learning, and decoupled data augmentation, our approach structures each model's internal manifold into modality-invariant and modality-specific features.
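The core idea can be sketched with a toy NumPy example (the half-and-half split convention, function names, and the InfoNCE-style loss below are illustrative assumptions, not the repository's actual implementation): each modality's embedding is partitioned into an invariant and a specific part, and a contrastive loss pulls the invariant parts of paired RGB/depth samples together while pushing apart non-matching pairs in the batch.

```python
import numpy as np

def split_embedding(z):
    """Split a feature vector into (modality-invariant, modality-specific) halves.
    The first-half/second-half convention is purely illustrative."""
    d = z.shape[-1] // 2
    return z[..., :d], z[..., d:]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def contrastive_alignment_loss(z_rgb, z_depth, tau=0.1):
    """Toy InfoNCE-style loss: the invariant parts of paired RGB/depth
    embeddings are positives (diagonal); other pairs in the batch are negatives."""
    inv_rgb = np.stack([split_embedding(z)[0] for z in z_rgb])
    inv_dep = np.stack([split_embedding(z)[0] for z in z_depth])
    n = len(inv_rgb)
    # temperature-scaled similarity matrix between all RGB/depth invariant parts
    sims = np.array([[cosine(inv_rgb[i], inv_dep[j]) for j in range(n)]
                     for i in range(n)]) / tau
    # cross-entropy with the matching pair (diagonal) as the positive
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

Correctly paired batches should score a lower loss than mismatched ones, which is the signal that aligns the two single-modality models during joint training.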

✨ Key Highlights

  • Teacher-Free Paradigm: Eliminates the need for a multi-modal teacher, significantly reducing training time and parameter count.
  • Disentangled Representations: Separates feature embeddings into modality-invariant and modality-specific information.
  • Decoupled Augmentation: Allows independent, per-modality data augmentation strategies.
  • State-of-the-Art Performance: Consistently outperforms existing CMKD methods on diverse benchmarks (indoor, aerial and drone imagery).
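The decoupled augmentation highlight can be illustrated with a minimal sketch (the specific transforms and names are hypothetical, chosen only to show the idea): because each single-modality model is trained on its own input stream, each modality can draw its own augmentation parameters independently, rather than sharing one transform as a fused multi-modal teacher would.

```python
import numpy as np

def augment_rgb(img, rng):
    """Photometric jitter: sensible for RGB images, meaningless for depth maps."""
    return np.clip(img * rng.uniform(0.8, 1.2), 0.0, 1.0)

def augment_depth(depth, rng):
    """Additive sensor-style noise: appropriate for depth, not for RGB."""
    return depth + rng.normal(0.0, 0.01, size=depth.shape)

def decoupled_batch(rgb, depth, seed=0):
    """Each modality uses its own RNG stream, so augmentation strategies
    and their sampled parameters are fully independent per modality."""
    rng_rgb = np.random.default_rng(seed)
    rng_depth = np.random.default_rng(seed + 1)
    return augment_rgb(rgb, rng_rgb), augment_depth(depth, rng_depth)
```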

🏗️ Architecture

[Figure] Overview of the CroDiNo-KD architecture, featuring two encoder-decoder models and an auxiliary decoder, optimized via disentanglement and contrastive learning.


📊 Main Results

We evaluate CroDiNo-KD across three diverse RGBD datasets: NYU Depth v2 (indoor), Potsdam (aerial), and Mid-Air (drone flight). Our method achieves state-of-the-art Cross-Modal Knowledge Distillation performance (mIoU scores):

| Dataset | Modality | Single-Modality Baseline | Best Competitor | CroDiNo-KD (Ours) |
|---|---|---|---|---|
| NYU Depth v2 | RGB | 42.64 | 43.86 (KDv2) | 44.85 |
| NYU Depth v2 | Depth | 36.01 | 37.28 (ProtoKD) | 37.60 |
| Potsdam | RGB | 75.73 | 76.09 (Masked Dist.) | 76.13 |
| Potsdam | Depth | 42.47 | 42.43 (Masked Dist.) | 42.78 |
| Mid-Air | RGB | 47.84 | 48.32 (KD-Net) | 48.37 |
| Mid-Air | Depth | 47.07 | 47.40 (Masked Dist.) | 47.91 |

Note: CroDiNo-KD not only achieves higher accuracy but also trains 43% faster than standard CMKD methods (20h vs. 36h+ on Mid-Air).


📖 Citation

If you find this code or our paper useful in your research, please consider citing our work:

@inproceedings{crodinokd,
  title={Revisiting Cross-Modal Knowledge Distillation: A Disentanglement Approach for RGBD Semantic Segmentation},
  author={Ferrod, Roger and Dantas, C{\'a}ssio F. and Di Caro, Luigi and Ienco, Dino},
  booktitle={European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD)},
  year={2025}
}
