Reproducing experiments on the NPLIB dataset

Dear Hassoun,

I hope this message finds you well. I am very interested in your work and appreciate that the data and code have been made publicly available. I am trying to reproduce the evaluation results reported in Table 2 of your paper on the NPLIB1 dataset and would greatly appreciate your guidance.

Specifically, we aim to evaluate JESTR on the three splits of the NPLIB1 dataset (split_0, split_1, and split_2, provided by MIST: https://github.com/samgoldman97/mist). For split_0 (the split used in your paper), the released pretrained weights yield performance close to the results reported in the paper. However, when we train from scratch using the data and code provided in the repository (specifically, the train.py script from the main branch), the evaluation results differ significantly from those reported.

Could you clarify whether the provided code is complete, or whether we may have overlooked an important step when using the main branch code? For reference, we have attached our params.yaml.

Thank you very much for your time and assistance.

Best regards,
Fang

params.yaml
#exp: "nist23_full_inst_rand"  #"among_nce_full_inst_rand" #"among_nce_all_inst" #"among_nce_all_prec_inst" #"canopus" #"nist23" #"CASMI"
exp: "canopus"
fp_path: "ecfp_3_4096"
load_dicts: True
use_sampling: False
element_list: ['H', 'C',  'O', 'N', 'P', 'S', 'Cl', 'F', 'Br', 'I'] # for canopus
# element_list: ['H', 'C',  'O', 'N', 'P', 'S', 'Cl', 'F', 'Br', 'I', 'Si', 'B', 'As', 'Se']
atom_feature: 'full'
bond_feature: 'full'
load_dicts: True
ignore_test_contr: True
batch_size_train_contr: 32
batch_size_train_contr_cand: 32
batch_size_train_final: 64
batch_size_val_final: 128
num_epoch_contr: 1000
num_epoch_final: 100
contr_temp: 0.05
aug_cands: False
aug_cands_wt: 0.1
cand_aug_random: False
gnn_channels: [64,128,256]
attn_heads: [12,12,12]
gnn_type: "gcn"
num_gnn_layers: 3
gnn_hidden_dim: 512
gnn_out_feat: 196
global_pooling: "max"
gnn_dropout: 0.2
contr_lr: 0.5e-3
final_lr: 0.05e-2 #for canopus
#final_lr: 0.05e-4
final_embedding_dim: 512
fc_dropout: 0.4
spec_embedding_dim: 1024
debug: False
logfile: 'run.log'
mz_log_low: -2
mz_log_high: 3
mz_spacing: 'log'
mz_precision: 32
resolution: 1
max_mz: 1000
mz_transformation: 'log10over3'
sinus_embed_dim: 64
aggregator: 'sum' #max, sum, mean, maxpool
wt_contr: 0.5
wt_fp: 0.5
fp_len: 4096
frz_contr: True
contr_trg: True
augment: False
fp_loss: 'bce' # bce, cos
data_dir: 'data/'
early_stopping_patience: 10
early_stopping_patience_contr: 80
tfm_dim: 512
tfm_dropout: 0.1
tfm_nhead: 4
num_tfm_layers: 3
dim_feedforward: 256
spec_enc: 'MLP_BIN' #'MLP_BIN', 'MLP_SIN', 'TFM'
inter: True #whether predicting interaction
# pretrained_mol_enc_model: 'data/weights/pretrained_mol_enc_model_1707829192911_best.pt'
# pretrained_spec_enc_model: 'data/weights/pretrained_spec_enc_model_1707829192911_best.pt'
# pretrained_inter_model: 'data/weights/pretrained_inter_model_1707829192911_best.pt'
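
One detail in the attached file may be worth ruling out: load_dicts is listed twice. Both entries are True here, so it is probably harmless, but YAML loaders commonly keep only the last value for a repeated key without warning. The sketch below is a minimal check for repeated top-level keys in a flat config like this one. The function name duplicate_top_level_keys is hypothetical and not part of the JESTR repository, and the check assumes simple unindented key: value lines rather than full YAML.

```python
import re
from collections import Counter

# Hypothetical helper for illustration; not part of the JESTR repository.
# Matches an unindented "key:" at the start of a line. Comment lines do not
# match because '#' is not a valid first character for a key here.
KEY_RE = re.compile(r"^([A-Za-z_][\w-]*)\s*:")


def duplicate_top_level_keys(text):
    """Return the sorted top-level keys that occur more than once."""
    counts = Counter()
    for line in text.splitlines():
        match = KEY_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(key for key, n in counts.items() if n > 1)


# A shortened excerpt of the config above, used only as sample input.
sample = """\
exp: "canopus"
load_dicts: True
use_sampling: False
#final_lr: 0.05e-4
load_dicts: True
"""

print(duplicate_top_level_keys(sample))
```

To check the real file, the same function could be called on the contents of params.yaml, for example via open("params.yaml").read().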

