This repository contains:
- 2 input files:
  - Database Redox Pot Fe2S2 proteins.xlsx
  - tableAmm.txt, a utility file with the amino acid parametrization and the list of other
    cofactors counted by features_calculator.py
- 2 utility modules:
  - utils.py, with functions used by both features_calculator.py and em_predict2.py
  - ML_models.py, a dictionary of models, each with the hyperparameter grid that is tuned
    by GridSearch optimization
- the pdb-files folder, containing the PDB files of all proteins in the dataset, including
  in silico generated mutants
- Folder A, containing the scripts that train a separate model for each specific combination
  of the radius values r1 and r2:
  - features_calculator.py, the script used to compute the molecular descriptor values. These
    descriptors are saved in a dataset_features_r1_r2.xlsx file, which serves as input for
    model training.
  - em_predict.py, the main script used to launch model training and to test model performance
- Folder B, containing the code that trains a single model on all the features calculated
  for every r1 and r2 simultaneously:
  - features_calculator.py
  - total.py, which merges all features into a single file, total.xlsx, avoiding repetitions
  - em_predict.py
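ML_models.py pairs each model with a hyperparameter grid that is tuned by grid search. The sketch below illustrates the idea using only the standard library. It searches a hypothetical dictionary exhaustively and keeps the combination with the lowest mean squared error. The model name, the toy linear model and its parameter values are illustrative assumptions, not the repository's contents. A library routine such as scikit-learn's GridSearchCV also scores each combination with cross-validation; to stay short, this sketch scores on a single data set.

```python
from itertools import product

# Hypothetical stand-in for the ML_models.py dictionary: each entry maps a
# model name to (model factory, hyperparameter grid). Names and values are
# illustrative only, not the repository's actual models.
MODELS = {
    "toy_linear": (
        lambda slope, bias: (lambda x: slope * x + bias),
        {"slope": [0.5, 1.0, 2.0], "bias": [0.0, 1.0]},
    ),
}


def expand_grid(grid):
    """Yield every combination of hyperparameter values in `grid` as a dict."""
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and targets."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(factory, grid, xs, ys):
    """Try every combination exhaustively and keep the lowest-error one."""
    best_params, best_score = None, float("inf")
    for params in expand_grid(grid):
        score = mean_squared_error(factory(**params), xs, ys)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]  # toy data generated from y = 2x + 1
    for name, (factory, grid) in MODELS.items():
        params, score = grid_search(factory, grid, xs, ys)
        print(name, params, score)
```

The toy data were generated from y = 2x + 1, so the search should select slope 2.0 and bias 1.0 with zero error.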
The remaining models were built with the scripts in Folder A, after removing the selected features
from the features_calculator.py output files.
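To build the reduced models, features are removed by dropping columns from a feature table before training. The following is a minimal sketch of that step. It assumes the table from a dataset_features_r1_r2.xlsx file has already been loaded as a list of row dictionaries. The function name drop_features and the column names and values are hypothetical placeholders.

```python
def drop_features(rows, selected):
    """Return copies of `rows` without the columns named in `selected`.

    Raises KeyError if a selected name is not a column, so a typo in the
    feature list is caught instead of silently keeping the full table.
    """
    selected = set(selected)
    columns = set(rows[0]) if rows else set()
    missing = selected - columns
    if missing:
        raise KeyError(f"unknown features: {sorted(missing)}")
    return [{k: v for k, v in row.items() if k not in selected} for row in rows]


if __name__ == "__main__":
    # Hypothetical placeholder rows standing in for a loaded feature file.
    rows = [
        {"protein": "P1", "feature_1": 1.0, "feature_2": 2.0, "target": 0.5},
        {"protein": "P2", "feature_1": 3.0, "feature_2": 4.0, "target": 0.7},
    ]
    print(drop_features(rows, ["feature_2"]))
```

With pandas, the equivalent operation on a DataFrame read from the .xlsx file is `df.drop(columns=selected)`. By default, it also raises a KeyError for unknown column names.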
Warning: all code was run on Linux. To run features_calculator.py on Windows, the PDBParser library
must be modified (line 192: resname = line[17:20].replace(' ',''))
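The quoted line extracts the residue name from a fixed-column PDB record, where the residue name occupies columns 18-20 (the Python slice [17:20]). Removing the spaces turns a right-justified name such as the iron ion " FE" into "FE". The following is a minimal standalone sketch of the same expression. The function name residue_name and the truncated example records are illustrative only; they are not taken from the repository or from PDBParser.

```python
def residue_name(pdb_line: str) -> str:
    """Return the residue name from an ATOM/HETATM record.

    Columns 18-20 (1-based) of the PDB format hold the residue name;
    removing spaces also handles names shorter than three characters.
    """
    return pdb_line[17:20].replace(" ", "")


if __name__ == "__main__":
    # Illustrative records, truncated after the residue sequence number.
    for line in ("ATOM      1  N   CYS A  41", "HETATM  301 FE    FE A 201"):
        print(repr(residue_name(line)))
```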