The purpose of this repository is to collect useful scripts which mainly use RDKit. Contributions are welcome!
Some scripts may require further dependencies.
Comments and recommendations for contributors:
The read_input.py script contains the function read_input. It reads molecules from SMI, SDF, SDF.GZ and PKL files (PKL files store pickled molecules as tuples of mol and mol_title) and from STDIN (SMI and SDF formats only), and yields tuples of (mol, mol_title). Because it is a generator, it can process large collections of molecules without loading them all into memory. I advise using this function if you do not need other data from the input files.
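To illustrate the generator idea, here is a minimal sketch. It is not the repository's read_input. The function name read_smi is hypothetical, it handles only the SMI format, and it yields SMILES strings rather than RDKit Mol objects. With RDKit installed, each SMILES could be passed to Chem.MolFromSmiles to get a Mol.

```python
import io
from typing import Iterator, Optional, TextIO, Tuple


def read_smi(stream: TextIO) -> Iterator[Tuple[str, Optional[str]]]:
    """Lazily yield (smiles, title) pairs from an SMI stream.

    Hypothetical, simplified stand-in for read_input: only the SMI format is
    handled and SMILES strings are yielded instead of RDKit Mol objects.
    """
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # An SMI line is "SMILES<whitespace>title"; the title is optional here
        parts = line.split(maxsplit=1)
        title = parts[1] if len(parts) > 1 else None
        yield parts[0], title


# Usage: any file-like object works, e.g. open("input.smi") or sys.stdin
demo = io.StringIO("CCO ethanol\nc1ccccc1 benzene\n")
for smi, title in read_smi(demo):
    print(title, smi)  # records are produced one at a time, not all at once
```

Because the function yields records instead of building a list, memory use stays flat regardless of file size.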
There is a _template.py file that can be used as a template for new scripts. Please do not change the names of the input, output, ncpu and verbose arguments. This keeps command-line arguments consistent across scripts.
Add help messages to your scripts.
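A sketch of how the conventional arguments and their help messages might be declared with argparse follows. Only the long names input, output, ncpu and verbose come from this README. The short flags (-i, -o, -c, -v), metavars and defaults are assumptions and may differ from those in _template.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch: long option names follow the README convention;
    # short flags, metavars and defaults are assumptions
    parser = argparse.ArgumentParser(
        description="Describe what the script does here.")
    parser.add_argument("-i", "--input", metavar="FILENAME", default=None,
                        help="input file (SMI or SDF). If omitted, STDIN is read.")
    parser.add_argument("-o", "--output", metavar="FILENAME", default=None,
                        help="output file. If omitted, results go to STDOUT.")
    parser.add_argument("-c", "--ncpu", metavar="INTEGER", type=int, default=1,
                        help="number of CPUs to use. Default: 1.")
    parser.add_argument("-v", "--verbose", action="store_true", default=False,
                        help="print progress information to STDERR.")
    return parser


# Usage: parse an explicit argument list (a real script would call parse_args())
args = build_parser().parse_args(["-i", "input.sdf", "-c", "4"])
```

Running such a script with -h prints every help string, which is why filling them in matters.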
Ideally, scripts should be able to read from STDIN and write to STDOUT so that they can be combined with pipes. I implemented this in gen_stereo_rdkit.py and gen_conf_rdkit.py.
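One common way to support pipes is to fall back to STDIN/STDOUT when no file name is given. The sketch below assumes this pattern. The names open_or_std and process are hypothetical, and process is only a placeholder filter, not the logic of any script in this repository.

```python
import contextlib
import io
import sys
from typing import Iterator, Optional, TextIO


@contextlib.contextmanager
def open_or_std(path: Optional[str], mode: str = "r") -> Iterator[TextIO]:
    """Open `path`, or fall back to STDIN/STDOUT when no path is given."""
    if path:
        with open(path, mode) as fh:
            yield fh
    else:
        # Hand out the standard stream without closing it afterwards
        yield sys.stdin if "r" in mode else sys.stdout


def process(inp: TextIO, out: TextIO) -> None:
    """Hypothetical filter: stream non-empty lines through unchanged."""
    for line in inp:
        if line.strip():
            out.write(line)


# In a script, args.input/args.output would come from argparse:
# with open_or_std(args.input) as inp, open_or_std(args.output, "w") as out:
#     process(inp, out)
buf = io.StringIO()
process(io.StringIO("CCO a\n\nCCN b\n"), buf)
```

When every script follows this pattern, they can be chained as `python step1.py < input.smi | python step2.py > result.smi` (step1.py and step2.py are placeholder names).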
All scripts may contain errors, so use them at your own risk. If you find a mistake, please create an issue and we will fix it. We also constantly revise old scripts and fix errors, because every mistake found is only the penultimate one.
Individual scripts
SDF manipulation:

| Script | Description |
|---|---|
| add_prefix | Add a prefix to molecule names in an SDF file. |
| extractsdf | Extract molecule names and field values from an input SDF file. |
| extract_mol_by_name | Extract molecules by name (partial name matching) into a new SDF file. |
| insert_sdf | Add data from a text file as additional fields to an input SDF file. |
| remove_dupl_by_field | Remove entries from an SDF file that have a duplicated molecule title or field value. |
| rename_mols | Identify identical entries (conformers) and rename them consistently. |
| sdf_field2title | Insert field values (or SMILES, or sequential titles) into the molecule title. |
| sdf_title2field | Insert the molecule title into a given SDF field. |
| strip_blank_lines | Remove empty lines in multi-line field values of an input SDF file. |
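Several of these scripts work on the same SDF structure. Each record ends with a `$$$$` line, the first line of the MOL block is the molecule title, and data fields follow as a `> <NAME>` header, value lines and a terminating blank line. The pure-Python sketch below, in the spirit of extractsdf and add_prefix, assumes exactly that layout. It is not the repository's implementation, and the function names are hypothetical.

```python
import re
from typing import Dict, Iterator, List, Optional, Tuple

# Data-item header such as "> <MW>" or ">  <MW>  (1)"; captures the field name
FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")


def iter_sdf_records(text: str) -> Iterator[List[str]]:
    """Yield each SDF record as a list of lines, without the $$$$ delimiter."""
    record: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield record
            record = []
        else:
            record.append(line)
    if any(ln.strip() for ln in record):
        yield record  # tolerate a missing final $$$$


def parse_record(lines: List[str]) -> Tuple[str, Dict[str, str]]:
    """Return (title, fields); the title is the first line of the MOL block."""
    title = lines[0] if lines else ""
    fields: Dict[str, str] = {}
    name: Optional[str] = None
    values: List[str] = []
    for line in lines[1:]:
        match = FIELD_HEADER.match(line)
        if match:
            name, values = match.group(1), []
        elif name is not None:
            if line.strip():
                values.append(line)
            else:
                fields[name] = "\n".join(values)  # a blank line ends the item
                name = None
    if name is not None:  # last data item was not followed by a blank line
        fields[name] = "\n".join(values)
    return title, fields


def add_prefix(text: str, prefix: str) -> str:
    """Prepend `prefix` to the title line of every record."""
    out = []
    for lines in iter_sdf_records(text):
        if lines:
            lines = [prefix + lines[0]] + lines[1:]
        out.append("\n".join(lines) + "\n$$$$\n")
    return "".join(out)
```

RDKit's own SDF readers handle this parsing for you. A text-level approach like this one is mainly useful when records should be passed through byte-for-byte without re-encoding the molecules.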
Format and file (inter)conversion:

| Script | Description |
|---|---|
| cansmi | Return canonical SMILES of input molecules. |
| frags2mols | Save disconnected components as individual molecules, adding a suffix to their names. |
| molchemaxon2pdb | Convert molecules to separate PDB files using RDKit and ChemAxon. |
| mols2pdb | Convert molecules (SMI/SDF) to PDB, adding hydrogens and generating conformers. |
| pkl2sdf | Convert PKL to SDF (e.g. conformers generated by gen_conf_rdkit). |
| sdf2mols | Split an SDF file into multiple MOL files. |
| sdf2pkl | Convert SDF to multi-conformer PKL (requires sequential mol titles). |
| smi2sdf | Convert SMILES to SDF, including extra fields if present. |
| split_pdb | Split a PDB file by chains and save each chain to a separate PDB file. |
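Splitting by chain, as split_pdb does, relies on the chain identifier stored in column 22 of ATOM, HETATM and TER records in the PDB format. The sketch below groups records on that column in plain Python. It is not the repository's implementation, and the function name is hypothetical. RDKit also provides Chem.SplitMolByPDBChainId for the same task on a parsed molecule.

```python
from typing import Dict, Iterable, List


def split_pdb_by_chain(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group ATOM/HETATM/TER records by chain identifier (PDB column 22)."""
    chains: Dict[str, List[str]] = {}
    for line in lines:
        # Short records (e.g. a bare "TER") carry no chain ID and are skipped
        if line.startswith(("ATOM", "HETATM", "TER")) and len(line) > 21:
            chain_id = line[21]  # column 22 in the 1-based PDB convention
            chains.setdefault(chain_id, []).append(line.rstrip("\n"))
    return chains  # dicts keep insertion order, so chains appear as in the file


def write_chains(chains: Dict[str, List[str]], prefix: str) -> List[str]:
    """Write one PDB file per chain and return the file names."""
    paths = []
    for chain_id, records in chains.items():
        # Blank chain IDs occur in some files; naming them "_" is an assumption
        name = chain_id if chain_id.strip() else "_"
        path = f"{prefix}_{name}.pdb"
        with open(path, "w") as fh:
            fh.write("\n".join(records) + "\nEND\n")
        paths.append(path)
    return paths
```

Usage: `write_chains(split_pdb_by_chain(open("complex.pdb")), "complex")` would write complex_A.pdb, complex_B.pdb and so on. The file name complex.pdb is only a placeholder.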