GSGE-CycPeptMP-Benchmarking

This repository contains the code, data, and configurations for benchmarking our GSGE (Group-SELFIES Graph Embeddings) and sequence-based models against the CycPeptMP dataset from Li et al. (2024). The project leverages our DeepCROW (Deep Classification & Regression Optimization Workflow) Benchmark Pipeline for hyperparameter optimization (HPO), model training, and evaluation, with comparisons across various models including GCNs, standard Transformers, LSTMs, and xLSTMs.

Key Results

Test Set Performance

Figure 1: Test set MAE leaderboard across all models. Orange bars denote Li et al. (2024) CycPeptMP baselines.

Figure 2: Test set R² leaderboard across all models.

Statistical Significance

Figure 3: Critical difference diagrams for pairwise model comparisons across CV folds.

Figure 4: Model comparison sets (MCS) — sets of models not significantly worse than the best.

Cross-Endpoint Transfer

Figure 5: Cross-endpoint MAE comparison across models and endpoints.

Figure 6: Direct MAE comparison of our best models vs. Li et al. (2024) per endpoint.

GSGE

GSGE (Group-SELFIES Graph Embeddings) extends molecular fragment tokenization by enriching each fragment node with a learned graph embedding of that fragment. It is functional-group aware while preserving each fragment's structural information through graph-based autoencoding.

GSGE enables:

- Compact molecular graph representations with molecular fragments as nodes
- Embedding of learned fragment chemistry into a continuous latent space
- Representations designed and tested for the more complex molecular structures of cyclic peptides
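To make the idea of a fragment-level compound graph concrete, here is a toy sketch. It is not the GSGE package API: the `FragmentGraph` class, the fragment names, and the embedding dictionary are hypothetical stand-ins (in GSGE the fragment embeddings are learned via graph autoencoding). It shows what a GCN consumes, a per-node feature list and an edge list, plus one round of unweighted neighbour averaging as a simplified stand-in for a GCN layer.

```python
from dataclasses import dataclass


@dataclass
class FragmentGraph:
    """Toy compound graph: one node per fragment, one edge per fragment-fragment bond.

    Hypothetical structure for illustration; not the GSGE package API.
    """

    fragments: list  # fragment names, e.g. keys into a fragment vocabulary
    bonds: list  # (i, j) index pairs of bonded fragments

    def edge_index(self):
        """Return a COO edge list [src, dst] containing both bond directions."""
        src, dst = [], []
        for i, j in self.bonds:
            src += [i, j]
            dst += [j, i]
        return [src, dst]

    def node_features(self, embeddings):
        """Look up one embedding vector per fragment (embeddings: name -> list of floats)."""
        return [list(embeddings[name]) for name in self.fragments]

    def propagate(self, features):
        """One round of mean aggregation over each node and its neighbours.

        A simplified, weight-free stand-in for a GCN layer.
        """
        neighbours = [[i] for i in range(len(features))]  # include self-loops
        for i, j in self.bonds:
            neighbours[i].append(j)
            neighbours[j].append(i)
        dim = len(features[0])
        return [
            [sum(features[k][d] for k in nb) / len(nb) for d in range(dim)]
            for nb in neighbours
        ]
```

A head-to-tail cyclic peptide appears as a cycle of fragment nodes: the last fragment is bonded back to the first, e.g. `bonds=[(0, 1), (1, 2), (2, 0)]` for three fragments.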

Figure 7: GSGE compound graph used in the GCNs in this study.

See https://github.com/JasperDurinck/GSGE-dev for more information.

Repository Structure

Data

data/: Contains peptide_used.csv, the dataset used for benchmarking, sourced from Li et al. (2024) CycPeptMP.
split_idx/: Holds .npy files with train, validation, and test indices for the cross-validation (CV) splits (0, 1, 2) and the holdout test set (Test_index.npy). Indices refer to the row index of peptide_used.csv, not to peptide IDs (refer to CycPeptMP for ID-based indexing). Example: Valid_index_cv0.npy. A test notebook in this directory (test.ipynb) validates the indexing.
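A minimal sketch of loading one fold and checking that the index sets behave as described: every index is a valid row of peptide_used.csv and the train, validation, and test sets do not overlap. `Valid_index_cv{n}.npy` and `Test_index.npy` follow the names above; `Train_index_cv{n}.npy` is an assumed name by analogy, and the helper functions are hypothetical.

```python
from itertools import combinations
from pathlib import Path

import numpy as np


def split_paths(split_dir, cv):
    """Return the .npy paths for one CV fold plus the shared holdout test set."""
    d = Path(split_dir)
    return {
        "train": d / f"Train_index_cv{cv}.npy",  # hypothetical name, by analogy
        "valid": d / f"Valid_index_cv{cv}.npy",
        "test": d / "Test_index.npy",
    }


def load_splits(split_dir, cv):
    """Load row indices into peptide_used.csv (not peptide IDs)."""
    return {name: np.load(path) for name, path in split_paths(split_dir, cv).items()}


def validate_splits(splits, n_rows):
    """Raise ValueError if an index is out of range or two splits share an index."""
    for name, idx in splits.items():
        idx = np.asarray(idx)
        if idx.size and (idx.min() < 0 or idx.max() >= n_rows):
            raise ValueError(f"{name}: index out of range for {n_rows} rows")
    for (a, ia), (b, ib) in combinations(splits.items(), 2):
        overlap = np.intersect1d(ia, ib)
        if overlap.size:
            raise ValueError(f"{a} and {b} share {overlap.size} indices")
    return True
```

With pandas and the default 0..n-1 index, `df.iloc[splits["valid"]]` then selects the validation rows of `pd.read_csv("data/peptide_used.csv")`.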

Vocabularies

vocabs/:
- test_gsge_save_with_descriptors.pkl: Fragment vocabulary for GSGE, used to construct compound graphs in our GSGE package.
- SMILES_BPE_vocab/: Contains custom_BPE_SMILES_v1_vocab_config.json and custom_BPE_SMILES_v1_vocab.json for SMILES-based Byte Pair Encoding (BPE) tokenization, used in the DeepCROW package for ablation studies on Transformers, LSTMs, and xLSTMs.
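The exact merge rules and JSON layout of the custom SMILES BPE vocabulary are not described here. As a generic illustration of how byte pair encoding builds SMILES sub-word tokens (not the DeepCROW implementation), the sketch starts from single characters, repeatedly merges the most frequent adjacent pair, and then applies the learned merges in order to tokenize a new string. A production tokenizer would likely pre-split multi-character atoms such as `Cl` and `Br` first.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` in `seq` with its concatenation."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of SMILES strings."""
    seqs = [list(s) for s in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        # Most frequent pair; ties broken lexicographically so results are deterministic.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges


def tokenize(smiles, merges):
    """Apply learned merges, in the order they were learned, to one SMILES string."""
    seq = list(smiles)
    for pair in merges:
        seq = merge_pair(seq, pair)
    return seq
```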

Model Optimization

model_optimization/: Contains subdirectories for each model (e.g., gcn_7desc_mlp), each with:
- config_hpo.yaml: Configuration for HPO, executable via the DeepCROW Benchmark Pipeline CLI (dc_benchmark_pipeline path/config_hpo.yaml).
- config_holdout_evaluation.yaml: Configuration for evaluating the best HPO model (stored in model_config.yaml) on the holdout test set across multiple seeds.
- hpo_results/: Nested directories for each seed (e.g., cv012_hpo_seed_42), containing:
  - CSV files such as fit_model_test_results_cv1.csv and fit_model_valid_results_metrics_cv2.csv with unrounded predictions (columns pred, known, ids; censored data: <-8, >-4).
  - weights/: Model weights for each fit.
- custom_code/: Custom code for models, data processing, descriptor calculations, and functions dynamically imported by the DeepCROW pipeline.
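To queue every model's HPO in one go, the CLI call shown above can be wrapped in a short loop. A minimal sketch, assuming the `dc_benchmark_pipeline` executable is on PATH and that each model folder sits directly under `model_optimization/`; `hpo_commands` and `run_all` are hypothetical helpers, and by default they only print the commands.

```python
import shlex
import subprocess
from pathlib import Path


def hpo_commands(root="model_optimization"):
    """Return one `dc_benchmark_pipeline <config>` call per model folder under root."""
    configs = sorted(Path(root).glob("*/config_hpo.yaml"))
    return [["dc_benchmark_pipeline", str(cfg)] for cfg in configs]


def run_all(root="model_optimization", dry_run=True):
    """Print each command; execute them sequentially only when dry_run is False."""
    for cmd in hpo_commands(root):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the loop on the first failing HPO run
            subprocess.run(cmd, check=True)
```

The same pattern applies to `config_holdout_evaluation.yaml` by changing the glob.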

Experiments & Analysis

All analysis notebooks and outputs live under experiments/.

`experiments/model_comparison/`

Main analysis hub comparing all graph-based and sequence models.

test_overview.ipynb: Generates the holdout test set leaderboards, scatter panels, residual panels, error distributions, and uncertainty-vs-error plots. Outputs saved to figures/test_overview/.
val_overview.ipynb: Same analysis for the cross-validation validation set. Outputs saved to figures/val_overview/.
model_cv_test_comparison.csv / model_cv_valid_comparison.csv: Aggregated metrics (mean ± std over seeds) for all models on test and validation sets.
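The leaderboard metrics can be recomputed from the per-run prediction CSVs under hpo_results/. A minimal sketch, assuming those CSVs have numeric `pred` and `known` columns (known being the measured value) and ignoring any special handling censored entries may need; `mean_std` uses the sample standard deviation, which may differ from what the aggregated CSVs use.

```python
import csv
from statistics import mean, stdev


def read_predictions(lines):
    """Parse (known, pred) columns from an iterable of CSV lines, e.g. an open file."""
    known, pred = [], []
    for row in csv.DictReader(lines):
        known.append(float(row["known"]))
        pred.append(float(row["pred"]))
    return known, pred


def mae(known, pred):
    """Mean absolute error."""
    return sum(abs(k - p) for k, p in zip(known, pred)) / len(known)


def r2(known, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_known = sum(known) / len(known)
    ss_res = sum((k - p) ** 2 for k, p in zip(known, pred))
    ss_tot = sum((k - mean_known) ** 2 for k in known)
    return 1.0 - ss_res / ss_tot


def mean_std(values, digits=3):
    """Format per-seed metric values as 'mean ± std' (sample std, n - 1)."""
    return f"{mean(values):.{digits}f} ± {stdev(values):.{digits}f}"
```

Typical use: `with open(path, newline="") as fh: known, pred = read_predictions(fh)`, then one `mae`/`r2` value per seed passed to `mean_std`.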

`statistical_significance/`

statistical_significance_analysis.ipynb: Runs parametric and non-parametric significance tests across CV folds and generates:
- Boxplots (parametric and non-parametric)
- Confidence interval grids (ranked and unranked)
- Critical difference diagrams
- Model comparison sets (MCS) plots
- Normality diagnostics
figures/: All output figures (PDF, PNG, SVG).
model_comparison.py / model_labels.py: Shared utilities for model filtering and display labels.

`hpo_mae_loss_trajectory/`

Notebooks plotting HPO training loss trajectories for selected models (gcn_7desc_mlp, transformer_bpe, tlstm_bpe).

`statistical_comparison/`

Data preparation notebooks (make_df.ipynb) that compile per-run metrics into dataframes for downstream significance testing, including a Li et al. sub-directory for baseline comparisons.
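A sketch of the bookkeeping such a notebook needs: recovering seed, CV fold, and split from the hpo_results/ naming described under Model Optimization (e.g. `cv012_hpo_seed_42/fit_model_test_results_cv1.csv`). The regular expressions assume those naming patterns hold for every run; `parse_run` and `collect_rows` are hypothetical helpers, not the contents of make_df.ipynb.

```python
import re
from pathlib import Path

SEED_RE = re.compile(r"seed_(\d+)$")
FILE_RE = re.compile(r"fit_model_(test|valid)_results.*_cv(\d+)\.csv$")


def parse_run(seed_dir_name, file_name):
    """Extract seed, split and CV fold from hpo_results/ names; None if no match."""
    seed_m = SEED_RE.search(seed_dir_name)
    file_m = FILE_RE.search(file_name)
    if not (seed_m and file_m):
        return None
    return {"seed": int(seed_m.group(1)), "split": file_m.group(1), "cv": int(file_m.group(2))}


def collect_rows(hpo_results_dir, model_name):
    """Walk <hpo_results>/<seed_dir>/<file>.csv and return one metadata row per CSV."""
    rows = []
    for path in sorted(Path(hpo_results_dir).glob("*/*.csv")):
        meta = parse_run(path.parent.name, path.name)
        if meta is not None:
            rows.append({"model": model_name, "path": str(path), **meta})
    return rows
```

Each row can then be extended with the metrics read from its CSV before building a dataframe.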

`experiments/sequence_models/model_comparison/`

Same test_overview.ipynb and val_overview.ipynb structure as above, scoped to sequence-based models (Transformers BPE/SELFIES/SAT, LSTMs, xLSTMs).

`experiments/cross_endpoint_transfer/`

Analysis of model generalization across all CycPeptMP endpoints (cell permeability, PAMPA, Caco-2, etc.).

cross_endpoint_metrics_by_run.csv: Per-run metrics for every endpoint and model.
cross_endpoint_metrics_summary_*.csv: Summary statistics (mean ± std) across seeds and CV folds.
cross_endpoint_metrics_li_et_al*.csv / .tex: Li et al. baseline metrics and combined comparison tables (also exported as LaTeX).
figures/: Output figures including:
- Endpoint leaderboards ranked by MAE and Pearson r
- Heatmaps of MAE across models × endpoints
- Direct Li et al. vs. ours comparison plots
- MAE + R² combined comparison panels
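Per-endpoint leaderboards reduce to two quantities per model, MAE and Pearson's r between predictions and measurements, followed by a sort. A plain-Python sketch; the input layout `{model: {"mae": ..., "r": ...}}` and the tie-break on r are assumptions, not the schema of the cross_endpoint CSVs.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def leaderboard(results):
    """Order model names by ascending MAE, breaking ties by higher Pearson r."""
    return sorted(results, key=lambda m: (results[m]["mae"], -results[m]["r"]))
```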

Individual model experiment directories

Each model has its own subdirectory under experiments/ (e.g., gcn_7desc_mlp, gae_gcn, fps_mlp, 7desc_mlp) containing HPO configs, HPO results, and model weights, mirroring the model_optimization/ layout.

Inference

inference/: Contains a DeepCROW pipeline example for running inference with trained models.

Environment Setup

This project uses Python 3.12. Create and activate a conda environment:
```
conda create -n gsge_env python=3.12
conda activate gsge_env
```

setup.sh: Specifies required packages.

```
# installs all required Python packages (except xLSTM)
bash setup_env.sh
```

To check the setup, run the tests:
```
GSGE_CLI run_test
```
xLSTM: Installed from PyPI (pip install xlstm==2.0.2). See environment_dev.yml for the pinned version. The official xLSTM repository is available at https://github.com/NX-AI/xlstm
environment_dev.yml: Provides a complete list of package versions used during experiments.
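A quick way to confirm the interpreter and the pinned xLSTM version from inside Python, using only the standard library. The defaults mirror the versions stated above; `check_env` is a hypothetical helper, and further pins from environment_dev.yml would have to be added to `pinned` by hand.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def check_env(expected_python=(3, 12), pinned=None):
    """Return a list of mismatches between the running environment and expectations."""
    pinned = {"xlstm": "2.0.2"} if pinned is None else pinned
    problems = []
    running = tuple(sys.version_info[:2])
    if running != tuple(expected_python):
        problems.append(f"Python {running[0]}.{running[1]} != expected {expected_python[0]}.{expected_python[1]}")
    for pkg, want in pinned.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            problems.append(f"{pkg} not installed (expected {want})")
            continue
        if have != want:
            problems.append(f"{pkg} {have} != pinned {want}")
    return problems


if __name__ == "__main__":
    for line in check_env() or ["environment matches the expected versions"]:
        print(line)
```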

Reproducibility

All models except xLSTM are trained deterministically and can be reproduced identically in the same environment. xLSTM may have minor variations due to implementation specifics.
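The README does not show how determinism is enforced. A common recipe, assuming a PyTorch backend (not confirmed here), is to seed every random number generator and switch on PyTorch's deterministic mode; the imports are guarded so the sketch also runs without NumPy or PyTorch. `seed_everything` is a hypothetical helper, and `CUBLAS_WORKSPACE_CONFIG` only takes effect if it is set before CUDA is initialised.

```python
import os
import random

try:
    import numpy as np
except ImportError:  # NumPy is optional in this sketch
    np = None

try:
    import torch
except ImportError:  # PyTorch is an assumed backend, guarded so the sketch still runs
    torch = None


def seed_everything(seed):
    """Seed Python, NumPy and PyTorch RNGs and request deterministic kernels."""
    # Required by cuBLAS for deterministic GPU matmuls; must be set before CUDA initialises.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)  # also seeds all CUDA devices
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
        torch.use_deterministic_algorithms(True)
```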

Usage

Dependencies can be found in setup.sh or see environment_dev.yml for specifics.
Use the local training pipeline in model_optimization/custom_code/ for model training and hyperparameter optimization.
Use inference/ for running predictions with trained models.
Refer to experiments/model_comparison/ for test/validation leaderboards and significance analysis.
Refer to experiments/cross_endpoint_transfer/ for cross-endpoint generalization analysis.

Notes

The split_idx/test.ipynb notebook verifies the correctness of CV and test splits.
GSGE package details are available in our GSGE repository (https://github.com/JasperDurinck/GSGE-dev).
xLSTM models use the official xlstm package (v2.0.2) from PyPI.
All figures are exported in PDF, PNG, and SVG formats.
