SFI-Visual-Intelligence/EchoVLMLandmarks

Spatio-Temporal Landmark Detection via Selective Fine-Tuning of Echocardiography Foundation Models (NLDL 2026)

This repository accompanies the paper:

Preetraj Bhoodoo, Sarina Thomas, Elisabeth Wetzer, Anne Solberg, Guy Ben-Yosef.
Spatio-Temporal Landmark Detection via Selective Fine-Tuning of Echocardiography Foundation Models.
Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL), PMLR 307, 2026.


Summary

We investigate whether modern video-based echocardiography foundation models can be adapted to precise spatio-temporal landmark detection, specifically left-ventricle (LV) contour landmarks at end-diastole (ED) and end-systole (ES), without extensive fine-tuning. We evaluate two strong encoders, EchoPrime and PanEcho, on EchoNet-Dynamic and compare:

  • Encoder regimes: frozen vs. selective unfreezing vs. full fine-tuning
  • Decoder heads: MLP vs. graph-based (GCN) decoding
  • Baselines: ResNet-18 (2D/3D), ViT-Base, MViTv2-Small

A key finding is that selectively unfreezing only the last few blocks can recover most of the performance of full fine-tuning, especially when paired with a GCN head and augmentation.
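The selective-unfreezing regime can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: `ToyEncoder`, its `blocks` attribute, and the `unfreeze_last_blocks` helper are hypothetical names, and the real EchoPrime and PanEcho encoders may organize their layers differently.

```python
import torch
from torch import nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a video encoder built from sequential blocks."""

    def __init__(self, dim: int = 32, num_blocks: int = 6):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(num_blocks)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = block(x)
        return x


def unfreeze_last_blocks(encoder: ToyEncoder, num_trainable: int) -> None:
    """Freeze every parameter, then re-enable gradients for the last blocks only."""
    for param in encoder.parameters():
        param.requires_grad = False
    if num_trainable > 0:
        for block in encoder.blocks[-num_trainable:]:
            for param in block.parameters():
                param.requires_grad = True


encoder = ToyEncoder()
unfreeze_last_blocks(encoder, num_trainable=2)

# Only parameters that still require gradients are handed to the optimizer.
trainable = [p for p in encoder.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

Setting `num_trainable=0` corresponds to the frozen regime, and setting it to the total number of blocks corresponds to full fine-tuning, so one helper covers all three encoder regimes in this toy setting.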


Main architecture

Below is the high-level pipeline (encoder + landmark decoder head) used in the paper.

Figure: Overview of the experimental setup. A 16-frame clip sampled from ED to ES is passed through the foundation-model (FM) encoder, and an MLP or GCN head predicts the ED/ES landmarks.
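One plausible way to pick the 16 frames of a clip is to space them evenly between the ED and ES frame indices. The sketch below assumes uniform spacing with both endpoints included; the README does not state the actual sampling rule, and `sample_clip_indices` is a hypothetical helper.

```python
def sample_clip_indices(ed_frame: int, es_frame: int, num_frames: int = 16) -> list[int]:
    """Return num_frames indices evenly spaced from ed_frame to es_frame (inclusive).

    Indices repeat when the ED-ES interval is shorter than num_frames.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    step = (es_frame - ed_frame) / (num_frames - 1)
    return [round(ed_frame + i * step) for i in range(num_frames)]


indices = sample_clip_indices(ed_frame=10, es_frame=40)
```

With these example values the clip starts on the ED frame and ends on the ES frame, so both annotated frames are always part of the sampled clip.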


Data

Experiments use EchoNet-Dynamic (apical-4-chamber echocardiography videos with ED/ES LV contour annotations).

1. Download EchoNet-Dynamic

Request access and download from the official source:
https://echonet.github.io/dynamic/

After downloading, your dataset directory should contain:

/path/to/echonet-dynamic/
    FileList.csv            # per-video metadata (split, EF, ESV, EDV)
    VolumeTracings.csv      # LV contour trace annotations per frame
    Videos/
        0X1A0A263B22CCD966.avi
        ...
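Before preprocessing, it can help to confirm that the download matches the layout above. The following is a small sketch using only the standard library; `missing_entries` is a hypothetical helper and is not part of the repository.

```python
from pathlib import Path


def missing_entries(root: str) -> list[str]:
    """Return the expected top-level entries that are absent under root."""
    root_path = Path(root)
    missing = []
    for name in ("FileList.csv", "VolumeTracings.csv"):
        if not (root_path / name).is_file():
            missing.append(name)
    if not (root_path / "Videos").is_dir():
        missing.append("Videos/")
    return missing
```

Calling `missing_entries("/path/to/echonet-dynamic")` returns an empty list when both CSV files and the Videos/ folder are present.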

2. Preprocess the dataset

Run the preprocessing script from the repo root. This extracts ED/ES frames, resizes them to 112×112, converts the contour tracings to 40 keypoints, and saves per-cycle .npy/.npz files used by the data loader:

python data/preprocess_echonet.py \
    --input_dir  /path/to/echonet-dynamic \
    --output_dir /path/to/echonet-dynamic/preprocessed \
    --save_files data/files/filenames

Output structure after preprocessing:

preprocessed/
  40/
    frames/                     # individual ED/ES frames as PNG (112x112x3)
      <ID>_<frame>.png
    annotations/                # per-frame keypoints and masks
      <ID>_<frame>.npz          # keys: 'kpts' (40,2), 'mask' (112,112), 'ef'
    cycle/
      frames/                   # full video clips as numpy arrays
        <ID>.npy                # shape: (3, num_frames, H, W), uint8
      annotations/              # per-cycle annotations
        <ID>.npz                # keys: 'kpts' (2,40,2), 'fnum', 'ef', 'vol1', 'vol2'
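To inspect one preprocessed cycle, the clip and its annotations can be loaded with NumPy and checked against the shapes listed above. This is a sketch only: `load_cycle` is a hypothetical helper, and it assumes the paths and keys are exactly as shown in the output structure.

```python
from pathlib import Path

import numpy as np


def load_cycle(preprocessed_dir: str, video_id: str):
    """Load one clip and its annotations, checking the documented shapes."""
    cycle_dir = Path(preprocessed_dir) / "40" / "cycle"
    clip = np.load(cycle_dir / "frames" / f"{video_id}.npy")
    # Read the arrays inside the context manager, since .npz files load lazily.
    with np.load(cycle_dir / "annotations" / f"{video_id}.npz") as ann:
        kpts = ann["kpts"]
        ef = ann["ef"]

    # Clip layout from the README: (3, num_frames, H, W), uint8.
    assert clip.ndim == 4 and clip.shape[0] == 3 and clip.dtype == np.uint8
    # Keypoints: 2 annotated frames x 40 landmarks x 2 coordinates.
    assert kpts.shape == (2, 40, 2)
    return clip, kpts, ef
```

For example, `load_cycle("/path/to/echonet-dynamic/preprocessed", "0X1A0A263B22CCD966")` would return the clip array, the keypoints, and the ejection fraction for that video, provided it was preprocessed.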

The script also writes filename list .txt files to data/files/filenames/cyclic/, which are used by the data loaders to define train/val/test splits.

3. (Optional) Create data subsets

To reproduce experiments with reduced training data (0.5%, 1%, 2%, 10%, 25%, 50%), generate subset filename lists:

python data/dataset_split.py --base-dir data/files/filenames/cyclic --seed 42

This creates files such as echonet_cycle_train_10_filenames.txt alongside the full split files. They are used when the config sets dataset: EchoNet_10 (and likewise for the other percentages).
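The idea behind a seeded subset can be illustrated as below. This is not the logic of data/dataset_split.py, which is not shown here; `sample_subset` and the placeholder video IDs are hypothetical, and the sketch simply assumes a uniform random draw from the training list.

```python
import random


def sample_subset(filenames: list[str], percent: float, seed: int = 42) -> list[str]:
    """Draw a reproducible random subset covering `percent` of the given files."""
    rng = random.Random(seed)
    # Keep at least one file so that very small percentages stay usable.
    count = max(1, round(len(filenames) * percent / 100))
    return sorted(rng.sample(filenames, count))


train_files = [f"video_{i:04d}" for i in range(200)]  # placeholder IDs
subset = sample_subset(train_files, percent=10)
```

Because the generator is seeded, repeated calls with the same list, percentage, and seed return the same subset, which is what makes reduced-data experiments comparable across runs.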


Usage

Set dataset_folder in your config YAML to the root of your EchoNet-Dynamic download (e.g. /path/to/echonet-dynamic). The data loader expects the preprocessed/ subfolder to exist at that path.

Train:

python tools/train_landmarks.py --cfg configs/resnet18_echonet.yaml

Eval:

python tools/eval_landmarks.py --model_checkpoint /path/to/experiment_folder

License

  • The paper is open-access under CC BY 4.0.
  • Code licensing will be specified upon release.

Citation

If you use this work, please cite:

@inproceedings{bhoodoo2026echovlmlandmarks,
  title     = {Spatio-Temporal Landmark Detection via Selective Fine-Tuning of Echocardiography Foundation Models},
  author    = {Bhoodoo, Preetraj and Thomas, Sarina and Wetzer, Elisabeth and Solberg, Anne and Ben-Yosef, Guy},
  booktitle = {Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL)},
  series    = {Proceedings of Machine Learning Research (PMLR)},
  volume    = {307},
  year      = {2026}
}

About

Repository for NLDL 2026 paper.
