This repository contains the code to reproduce the results from the paper:
This project requires Python 3.11. All dependencies are listed in requirements/requirements.txt.
We recommend using Conda to create and manage a clean environment:
```bash
conda create -n ciso_env python=3.11
conda activate ciso_env
pip install -r requirements/requirements.txt
```

You can also use a standard Python virtual environment:
```bash
python3.11 -m venv ciso_env
source ciso_env/bin/activate  # On Windows use `ciso_env\Scripts\activate`
pip install -r requirements/requirements.txt
```

The results can be reproduced on any device (GPU, eGPU, or CPU), though the computation time will vary with the hardware's parallel processing capabilities. If you encounter memory issues, especially on lower-end devices, consider reducing the batch size in the training configuration. Depending on your setup, you may also need to install or update the appropriate NVIDIA drivers to work with PyTorch.
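If you are using a GPU, a quick sanity check that your drivers and PyTorch build line up is:

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```

If this prints `False`, training will still run but fall back to CPU.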
All datasets used in the paper are publicly available here on Hugging Face.
Data preparation code is located in the data_preprocessing/ folder:
- SatButterfly: `ebutterfly_data_preparation.ipynb`
- sPlotOpen: `prepare_sPlotOpen_data.ipynb`
- SatBird × sPlotOpen (co-located data): `prepare_satbirdxsplots.ipynb`
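For example, assuming Jupyter is installed in your environment, a preparation notebook can be opened with:

```bash
jupyter notebook data_preprocessing/ebutterfly_data_preparation.ipynb
```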
The configs directory contains subfolders for each dataset setup:
- `configs/satbird`
- `configs/satbirdxsatbutterfly`
- `configs/satbirdxsplot`
- `configs/splot`
Each subfolder includes YAML config files for models reported in the paper. These configs are used for both training and evaluation.
| File | Model Description |
|---|---|
| `config_ciso.yaml` | CISO model |
| `config_linear.yaml` | Linear model |
| `config_maxent.yaml` | Linear model with MaxEnt features |
| `config_mlp.yaml` | MLP |
| `config_mlp_plusplus.yaml` | MLP++ |
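For orientation, the parameters referenced throughout this README live in these YAML files. The sketch below is a rough illustration, not the actual schema; the `training.batch_size` key in particular is an assumption, while the remaining keys are the parameters described in the Training and Evaluation sections:

```yaml
# Illustrative sketch only -- see the real files under configs/ for the exact schema.
mode: "train"                   # "train" for training, "test" for evaluation
load_ckpt_path: ""              # checkpoints folder or exact checkpoint path (used for testing)
predict_family_of_species: -1   # -1 during training; 0 or 1 selects a species group at evaluation
partial_labels:
  eval_known_ratio: 0           # 1 conditions evaluation on the other species group's labels
training:                       # hypothetical section; lower batch_size if you hit memory limits
  batch_size: 32
```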
All trained model checkpoints are available here. For each dataset, the corresponding folder includes 3 sub-folders, one per run (seed).
| Folder | Description |
|---|---|
| `1_sPlotOpen` | Within-dataset experiments for sPlotOpen |
| `2_SatBird` | Within-dataset experiments for SatBird |
| `3_SatBirdxSatButterfly` | Across-dataset experiments for SatBird and SatButterfly |
| `4_SatBirdxsPlotOpen` | Across-dataset experiments for SatBird and sPlotOpen |
[Optional] You can log experiments using Comet ML, a platform for visualizing metrics as your models are training. To enable logging, make sure to export your COMET_API_KEY and COMET_WORKSPACE environment variables:
```bash
export COMET_API_KEY=your_comet_api_key
export COMET_WORKSPACE=your_workspace
```

To train a model, set `config.mode = "train"` and run:
```bash
python main.py --config=configs/<dataset>/<model_config>.yaml
```

Examples of configuration files for different datasets and models can be found in the `configs/` directory.
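For instance, to train the CISO model on SatBird:

```bash
python main.py --config=configs/satbird/config_ciso.yaml
```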
Use the `predict_family_of_species` parameter to control the subset of species evaluated. This parameter defaults to `-1` during training (i.e., not used).
🐦 SatBird:
- `predict_family_of_species = 0` → evaluate non-songbirds
- `predict_family_of_species = 1` → evaluate songbirds

🌿 sPlotOpen:
- `predict_family_of_species = 0` → evaluate non-trees
- `predict_family_of_species = 1` → evaluate trees

🐦🦋 SatBird & SatButterfly:
- `predict_family_of_species = 0` → evaluate birds
- `predict_family_of_species = 1` → evaluate butterflies

🐦🌿 SatBird & sPlotOpen:
- `predict_family_of_species = 0` → evaluate plants
- `predict_family_of_species = 1` → evaluate birds
For models that support partial labels (i.e., CISO and MLP++), evaluation can be conditioned on the observations of species from another group:
- `partial_labels.eval_known_ratio = 0` → evaluate with no partial labels (all other species groups are unknown)
- `partial_labels.eval_known_ratio = 1` → evaluate with partial labels (labels from the other species group are provided)
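In the YAML configs this presumably corresponds to a nested block; a minimal sketch, assuming the nesting implied by the `partial_labels/eval_known_ratio` notation used below:

```yaml
partial_labels:
  eval_known_ratio: 1   # condition evaluation on the other species group's known labels
```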
You can reproduce the key results and figures from the paper using the scripts and notebooks provided below:
To reproduce the results in Tables 1 & 2, as well as the additional metrics in Table 5, first download the model_checkpoints folder from here. For a given dataset and model, run the command below with the corresponding config file, specifying the desired file name for saving the results.
For the config file specified, you need to control the following parameters:
| Parameter | Description |
|---|---|
| `load_ckpt_path` | Path to the checkpoints folder or exact checkpoint path. |
| `predict_family_of_species` | Controls which family of species to evaluate, as described above in Evaluation. |
| `partial_labels/eval_known_ratio` | Set to 1 to condition on known labels for the other group of species. |
Set `config.mode = "test"` and run:

```bash
python main.py --config=configs/<dataset>/<model_config>.yaml --results_file_name=<results_file_name.csv>
```

To reproduce results for CISO on SatBird, use `configs/satbird/config_ciso.yaml` and set `load_ckpt_path` to `model_checkpoints/2_SatBird/satbird_ciso` to evaluate the 3 different runs, under the following settings (see the example command after the list):
- Evaluate songbirds, unconditioned: `predict_family_of_species = 1`.
- Evaluate non-songbirds, unconditioned: `predict_family_of_species = 0`.
- Evaluate songbirds, conditioned on non-songbirds: `predict_family_of_species = 1` and `eval_known_ratio = 1`.
- Evaluate non-songbirds, conditioned on songbirds: `predict_family_of_species = 0` and `eval_known_ratio = 1`.
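For example, after setting the parameters above in `configs/satbird/config_ciso.yaml`, a single evaluation run might look like the following (the results file name is an illustrative choice, not a required value):

```bash
python main.py --config=configs/satbird/config_ciso.yaml --results_file_name=ciso_satbird_songbirds.csv
```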
- Figure 3: `figures/generate_figure_3.ipynb`
- Figure 4: `figures/generate_figure_4.ipynb`
- Figure 5: `figures/generate_figure_5.ipynb`
Code for additional baselines such as Random Forest and sjSDM is available under `src/models`.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.
