Spatio-Temporal Landmark Detection via Selective Fine-Tuning of Echocardiography Foundation Models (NLDL 2026)
This repository accompanies the paper:
Preetraj Bhoodoo, Sarina Thomas, Elisabeth Wetzer, Anne Solberg, Guy Ben-Yosef.
Spatio-Temporal Landmark Detection via Selective Fine-Tuning of Echocardiography Foundation Models.
Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL), PMLR 307, 2026.
We investigate whether modern video-based echocardiography foundation models can be adapted to precise spatio-temporal landmark detection, namely left-ventricular (LV) contour landmarks at end-diastole (ED) and end-systole (ES), without extensive fine-tuning. We evaluate two strong encoders (EchoPrime and PanEcho) on EchoNet-Dynamic and compare:
- Encoder regimes: frozen vs. selective unfreezing vs. full fine-tuning
- Decoder heads: MLP vs. graph-based (GCN) decoding
- Baselines: ResNet-18 (2D/3D), ViT-Base, MViTv2-Small
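A GCN decoder treats the 40 contour keypoints as nodes of a graph and mixes each node's features with those of its neighbours. The graph used in the paper is not described here, so the minimal sketch below assumes the contour is a closed ring (keypoint i linked to i-1 and i+1) and uses the standard Kipf–Welling normalisation D^{-1/2}(A + I)D^{-1/2}. `ring_adjacency`, `normalized_adjacency` and `gcn_layer` are hypothetical helpers, not functions from this repository.

```python
import numpy as np


def ring_adjacency(num_nodes: int) -> np.ndarray:
    """Adjacency of a closed contour: keypoint i is linked to i-1 and i+1 (mod N)."""
    adj = np.zeros((num_nodes, num_nodes))
    for i in range(num_nodes):
        j = (i + 1) % num_nodes
        adj[i, j] = adj[j, i] = 1.0
    return adj


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalisation D^{-1/2} (A + I) D^{-1/2}, with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(x: np.ndarray, a_norm: np.ndarray, weight: np.ndarray,
              relu: bool = True) -> np.ndarray:
    """One graph convolution: mix each node with its neighbours, then project.

    x: (N, F_in) node features, weight: (F_in, F_out). Use relu=False for the
    final layer that regresses (x, y) coordinates.
    """
    out = a_norm @ x @ weight
    return np.maximum(out, 0.0) if relu else out


# Usage (feats: (40, F) per-keypoint features, assumed to be derived from the encoder embedding):
#   a_norm = normalized_adjacency(ring_adjacency(40))
#   hidden = gcn_layer(feats, a_norm, w1)
#   coords = gcn_layer(hidden, a_norm, w2, relu=False)   # (40, 2)
```

Compared with an MLP head, which regresses all 80 coordinates from one pooled vector, the propagation step ties each keypoint's prediction to its contour neighbours.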
A key finding is that selectively unfreezing only the last few blocks can recover most of the performance of full fine-tuning, especially when paired with a GCN head and augmentation.
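In practice, selective unfreezing means marking only the parameters of the last k encoder blocks (plus the decoder head) as trainable. The sketch below works on parameter names alone and assumes a ViT-style naming scheme in which block parameters contain `blocks.<i>.`; the actual module names inside EchoPrime and PanEcho may differ, and `select_trainable` is a hypothetical helper.

```python
import re
from typing import Iterable, Set

# Matches "blocks.<index>." at the start of a name or after a dot (assumed naming scheme).
BLOCK_RE = re.compile(r"(?:^|\.)blocks\.(\d+)\.")


def select_trainable(param_names: Iterable[str], num_blocks: int, last_k: int,
                     head_prefix: str = "head.") -> Set[str]:
    """Names of parameters to unfreeze: the last `last_k` encoder blocks and the head.

    Everything else (patch embedding, earlier blocks, final norm) stays frozen.
    """
    first_trainable = num_blocks - last_k
    trainable = set()
    for name in param_names:
        if name.startswith(head_prefix):
            trainable.add(name)          # the decoder head is always trained
            continue
        match = BLOCK_RE.search(name)
        if match and int(match.group(1)) >= first_trainable:
            trainable.add(name)          # belongs to one of the last k blocks
    return trainable
```

In a PyTorch training script the returned set would be applied with `for name, p in model.named_parameters(): p.requires_grad = name in trainable`; `last_k=0` corresponds to the frozen regime and `last_k=num_blocks` to (nearly) full fine-tuning.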
Below is the high-level pipeline (encoder + landmark decoder head) used in the paper.
Figure: Overview of the experiment setup (16-frame clip sampled from ED→ES → foundation-model (FM) encoder → MLP/GCN head → ED/ES landmarks).
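The exact rule for sampling the 16-frame clip is not specified here. A simple assumption, sketched below, is uniform spacing between the ED and ES frame indices (inclusive), rounded to integers, so that the first and last frames of the clip are exactly the annotated frames. `sample_clip_indices` is a hypothetical helper.

```python
from typing import List


def sample_clip_indices(ed: int, es: int, num_frames: int = 16) -> List[int]:
    """Uniformly spaced frame indices from ED to ES, both endpoints included.

    If the ED-ES span has fewer than num_frames frames, some indices repeat.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    step = (es - ed) / (num_frames - 1)
    return [int(round(ed + i * step)) for i in range(num_frames)]
```

For example, `sample_clip_indices(10, 40)` returns `[10, 12, ..., 40]` (16 indices, step 2). Keeping both endpoints is what allows ED and ES landmarks to be read off the first and last frames of the clip.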
Experiments use EchoNet-Dynamic (apical-4-chamber echocardiography videos with ED/ES LV contour annotations).
Request access and download from the official source:
https://echonet.github.io/dynamic/
After downloading, your dataset directory should contain:
/path/to/echonet-dynamic/
    FileList.csv          # per-video metadata (split, EF, ESV, EDV)
    VolumeTracings.csv    # LV contour trace annotations per frame
    Videos/
        0X1A0A263B22CCD966.avi
        ...
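VolumeTracings.csv stores each traced contour as several rows per (video, frame). Assuming the public release's header `FileName,X1,Y1,X2,Y2,Frame` (check the header of your copy), the minimal sketch below groups the rows so that each video maps to its traced frames and their segments. It does not decide which traced frame is ED and which is ES; the preprocessing script handles that. `group_tracings` is a hypothetical helper.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2) for one tracing row


def group_tracings(lines: Iterable[str]) -> Dict[str, Dict[int, List[Segment]]]:
    """Map FileName -> Frame -> list of traced segments, in file order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(lines):
        segment = (float(row["X1"]), float(row["Y1"]), float(row["X2"]), float(row["Y2"]))
        grouped[row["FileName"]][int(row["Frame"])].append(segment)
    # Convert to plain dicts so later lookups cannot silently create empty entries.
    return {name: dict(frames) for name, frames in grouped.items()}


# Usage:
#   with open("/path/to/echonet-dynamic/VolumeTracings.csv", newline="") as f:
#       tracings = group_tracings(f)
```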
Run the preprocessing script from the repo root. It extracts the ED/ES frames, resizes them to 112×112, converts the contour tracings to 40 keypoints, and saves the per-frame (.png/.npz) and per-cycle (.npy/.npz) files used by the data loaders:
python data/preprocess_echonet.py \
--input_dir /path/to/echonet-dynamic \
--output_dir /path/to/echonet-dynamic/preprocessed \
--save_files data/files/filenames
Output structure after preprocessing:
preprocessed/
40/
frames/ # individual ED/ES frames as PNG (112x112x3)
<ID>_<frame>.png
annotations/ # per-frame keypoints and masks
<ID>_<frame>.npz # keys: 'kpts' (40,2), 'mask' (112,112), 'ef'
cycle/
frames/ # full video clips as numpy arrays
<ID>.npy # shape: (3, num_frames, H, W), uint8
annotations/ # per-cycle annotations
<ID>.npz # keys: 'kpts' (2,40,2), 'fnum', 'ef', 'vol1', 'vol2'
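The preprocessing step converts each contour tracing into 40 keypoints, but how `preprocess_echonet.py` does this is not documented here. One common approach, sketched below under that assumption, is to treat the traced contour as an ordered open polyline (for example running from one mitral annulus point around the apex to the other) and resample it at 40 points equally spaced by arc length. `resample_contour` is a hypothetical helper; building the ordered polyline from the X1/Y1/X2/Y2 segments is left out.

```python
import numpy as np


def resample_contour(points: np.ndarray, num_points: int = 40) -> np.ndarray:
    """Resample an ordered polyline (M, 2) to `num_points` points equally spaced by arc length.

    The first and last input points are kept as the first and last outputs.
    """
    points = np.asarray(points, dtype=np.float64)
    seg_len = np.linalg.norm(np.diff(points, axis=0), axis=1)
    # Drop consecutive duplicate vertices so the arc-length axis is strictly increasing.
    keep = np.concatenate([[True], seg_len > 0])
    points = points[keep]
    seg_len = seg_len[seg_len > 0]
    cum_len = np.concatenate([[0.0], np.cumsum(seg_len)])  # arc length at each vertex
    targets = np.linspace(0.0, cum_len[-1], num_points)     # equally spaced arc positions
    x = np.interp(targets, cum_len, points[:, 0])
    y = np.interp(targets, cum_len, points[:, 1])
    return np.stack([x, y], axis=1)                         # shape (num_points, 2)
```

Equal arc-length spacing gives every contour the same number of keypoints regardless of how densely it was traced, which matches the fixed (40, 2) and (2, 40, 2) keypoint shapes listed above.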
The script also writes filename lists (.txt) to data/files/filenames/cyclic/; the data loaders use these to define the train/val/test splits.
To reproduce experiments with reduced training data (0.5%, 1%, 2%, 10%, 25%, 50%), generate subset filename lists:
python data/dataset_split.py --base-dir data/files/filenames/cyclic --seed 42
This creates files such as echonet_cycle_train_10_filenames.txt alongside the full split files. To train on a subset, set dataset: EchoNet_10 (or the name matching another percentage) in the config.
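A reduced-data list can be produced by sampling a fixed fraction of the full training list with a seeded random generator, so that the same seed always yields the same subset. The sketch below is an assumption about how such subsets can be built, not a copy of `data/dataset_split.py` (which may, for example, round differently or nest smaller subsets inside larger ones); `make_subset` is a hypothetical helper.

```python
import random
from typing import List, Sequence


def make_subset(filenames: Sequence[str], percent: float, seed: int = 42) -> List[str]:
    """Reproducibly sample `percent`% of `filenames` (at least one), preserving input order."""
    k = min(len(filenames), max(1, round(len(filenames) * percent / 100)))
    rng = random.Random(seed)  # local RNG, independent of the global random state
    chosen = set(rng.sample(range(len(filenames)), k))
    return [name for i, name in enumerate(filenames) if i in chosen]
```

Writing the result to a .txt file is left out, because the exact file naming for fractional percentages such as 0.5% is not documented here.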
Set dataset_folder in your config YAML to the root of your EchoNet-Dynamic download (e.g. /path/to/echonet-dynamic). The data loader expects the preprocessed/ subfolder to exist at that path.
To train a model:
python tools/train_landmarks.py --cfg configs/resnet18_echonet.yaml
To evaluate a trained model:
python tools/eval_landmarks.py --model_checkpoint /path/to/experiment_folder
- The paper is open-access under CC BY 4.0.
- Code licensing will be specified upon release.
If you use this work, please cite:
@inproceedings{bhoodoo2026echovlmlandmarks,
title = {Spatio-Temporal Landmark Detection via Selective Fine-Tuning of Echocardiography Foundation Models},
author = {Bhoodoo, Preetraj and Thomas, Sarina and Wetzer, Elisabeth and Solberg, Anne and Ben-Yosef, Guy},
booktitle = {Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL)},
series = {Proceedings of Machine Learning Research (PMLR)},
volume = {307},
year = {2026}
}