- Overview
- COSMO-RS
- Classical Molecular Dynamics
- Machine Learning
- DFT Computed Properties
- Scripts
- XYZ Files
- Updates
- Licensing
- Contact
The CO₂ER database is a publicly available repository of computed molecular properties and simulation-ready inputs for the study of organic solvents in electrochemical CO₂ reduction (CO₂ER). The data spans quantum chemical (DFT), thermodynamic (COSMO-RS), and classical molecular dynamics (MD) calculations and covers solvents from 11 chemical classes.
The dataset includes DFT-optimized geometries (in XYZ format) and a range of DFT-computed properties such as binding energies, partial atomic charges, polarizabilities, ionization potentials, electron affinities, and HOMO–LUMO energies. It also contains MD-derived properties like radial distribution functions, coordination numbers, and diffusion coefficients, as well as COSMO-RS-predicted properties such as CO₂ solubility, viscosity, and more. This resource is designed to support high-throughput screening, molecular simulation, and machine learning model development for CO₂ER electrolyte discovery.
Contains 24 solvent properties predicted using COSMO-RS, including melting point, flash point, viscosity, van der Waals volume, polarity, and more.
Estimated solubility of CO₂ in pure organic solvents, as predicted by COSMO-RS.
DFT-derived COSMO files used as input for COSMO-RS simulations. Prepared from Gaussian DFT calculations run via MISPR workflows.
Raw CSV files with coordination numbers between CO₂ (carbon atom) and surrounding electrolyte solvent.
CSV files reporting the diffusion coefficients of each species (TBA⁺, BF₄⁻, solvent, CO₂) in electrolyte systems with 0.1 M [TBA⁺][BF₄⁻] and 0.1 M CO₂. Includes standard deviations and R² values.
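The fitting procedure behind these diffusion coefficients is not described here. The reported R² values are consistent with a linear fit of mean squared displacement (MSD) against time, so the sketch below assumes the Einstein relation, MSD(t) ≈ 6Dt in three dimensions. The function name `fit_diffusion`, the input units (ps and Å²), and the sample numbers are illustrative assumptions, not taken from the repository.

```python
from typing import Sequence, Tuple


def fit_diffusion(times_ps: Sequence[float], msd_A2: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of MSD vs. time; returns (D in cm^2/s, R^2).

    Assumes 3D diffusion (MSD = 6*D*t + c), times in ps and MSD in A^2.
    """
    n = len(times_ps)
    if n < 2 or n != len(msd_A2):
        raise ValueError("need two or more points and equal-length inputs")

    mean_t = sum(times_ps) / n
    mean_m = sum(msd_A2) / n
    sxx = sum((t - mean_t) ** 2 for t in times_ps)
    sxy = sum((t - mean_t) * (m - mean_m) for t, m in zip(times_ps, msd_A2))

    slope = sxy / sxx                      # A^2 / ps
    intercept = mean_m - slope * mean_t

    # Coefficient of determination of the linear fit
    ss_res = sum((m - (slope * t + intercept)) ** 2 for t, m in zip(times_ps, msd_A2))
    ss_tot = sum((m - mean_m) ** 2 for m in msd_A2)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0

    # 1 A^2/ps = 1e-16 cm^2 / 1e-12 s = 1e-4 cm^2/s
    d_cm2_s = slope / 6.0 * 1e-4
    return d_cm2_s, r2


# Synthetic, perfectly linear MSD used only to exercise the function
d, r2 = fit_diffusion([0.0, 10.0, 20.0, 30.0], [0.0, 6.0, 12.0, 18.0])
print(f"D = {d:.3e} cm^2/s, R^2 = {r2:.3f}")
```

In practice the fit is usually restricted to the diffusive (linear) regime of the MSD curve, and the choice of that window affects both D and R².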
JSON files containing the solvent force field parameters (OPLS-AA) used to run the MD simulations.
LAMMPS-compatible data files describing initial atomic positions, atom types, and topology for electrolyte systems.
LAMMPS template files used to run the following MD steps: energy minimization, NPT equilibration, melting and quenching, and NVT production.
CSV files containing RDF data between CO₂ atoms (C and O) and the center of mass of TBA⁺, BF₄⁻, and solvent molecules. Useful for analyzing solvation shell structure.
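As a rough illustration of how an RDF relates to a coordination number, the sketch below integrates n(r_cut) = 4πρ ∫ g(r) r² dr with the trapezoidal rule. It is not the repository's analysis code. The function name `coordination_number` and its arguments are hypothetical, and the column layout of the CSV files is not assumed: the function takes plain lists of r and g(r) values plus the number density ρ of the coordinating species.

```python
import math
from typing import Sequence


def coordination_number(
    r: Sequence[float],
    g: Sequence[float],
    number_density: float,
    r_cut: float,
) -> float:
    """Integrate 4*pi*rho*g(r)*r^2 from r[0] up to r_cut (trapezoidal rule).

    r and r_cut share one length unit (e.g. A); number_density is in
    inverse cubic units of that length (e.g. A^-3).
    """
    if len(r) != len(g):
        raise ValueError("r and g must have the same length")

    total = 0.0
    for i in range(1, len(r)):
        if r[i] > r_cut:
            break  # stop at the last grid point inside the cutoff
        f_prev = g[i - 1] * r[i - 1] ** 2
        f_curr = g[i] * r[i] ** 2
        total += 0.5 * (f_prev + f_curr) * (r[i] - r[i - 1])
    return 4.0 * math.pi * number_density * total


# Sanity check on an ideal gas (g(r) = 1): the result approaches rho * (4/3) * pi * r_cut^3
r_grid = [i / 1000 for i in range(1001)]
n_ideal = coordination_number(r_grid, [1.0] * len(r_grid), number_density=1.0, r_cut=1.0)
print(n_ideal)
```

The cutoff is typically placed at the first minimum of g(r), which marks the edge of the first solvation shell.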
Metadata file summarizing MD setup details: number of molecules, atom types, initial seeds, molecular weights, and other relevant system parameters.
This section includes pre-trained ML models for predicting solvent properties relevant to CO₂ER: CO₂ solubility, viscosity, ionization potential, and electron affinity.
Folder containing chemical_class.csv, which has a detailed description of the chemical class for each molecule.
Folder containing data_computed_properties.csv with computed properties used in ML workflows.
- RDKit molecular descriptors
- Mordred chemical descriptors
- dft_descriptors.csv: DFT-based descriptors
- cosmors_descriptors.csv: COSMO-RS-derived descriptors
Filtered data containing ~400–500 columns from the main df_merged.csv.
Filtered data for ionization potential containing ~100 columns from df_filtered.csv.
Filtered data for electron affinity containing ~100 columns from df_filtered.csv.
Saved EA model.
Saved EA model with DFT descriptors.
Saved EA model in JSON format.
Script to merge data of different molecules and their respective descriptors.
Script for EA model using the CatBoost algorithm.
Script for IP model using the Random Forest algorithm.
Script for the viscosity model using the XGBoost algorithm.
Script for the solubility model using the XGBoost algorithm.
Script for initial preprocessing to create df_filtered.csv.
Script for further preprocessing of EA data to create filtered_ea_clean.csv.
Script for further preprocessing of IP data to create df_filtered_ip.csv.
EA model serialized for XGBoost.
DFT-calculated binding energies for solvent–CO₂ complexes. Stored in raw JSON format, including all metadata.
Raw JSON files containing DFT calculations of the solvent electron affinity, including all associated metadata.
Raw JSON files containing DFT calculations of the solvent ionization potentials, including all associated metadata.
Consolidated table of 13 molecular properties from DFT, including HOMO/LUMO energies, SCF energy, dipole moment, polarizability, partial atomic charges, and more.
- run_md.py: script for running automated MD simulations of electrolyte systems composed of 0.1 M [TBA⁺][BF₄⁻] salt and 0.1 M CO₂ in various solvents, using MISPR.
- run_ip_ea.py: script for running automated IP and EA calculations for the solvents, using MISPR.
- run_be.py: script for running automated binding energy calculations for the solvent systems, using MISPR.
XYZ format geometry files of DFT-optimized solvent molecules.
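For readers who want to load these geometries without extra dependencies, here is a minimal sketch of a reader for the standard single-frame XYZ layout: atom count on the first line, a comment on the second, then one `element x y z` row per atom. The function name `parse_xyz` and the CO₂ coordinates in the example are illustrative and not taken from the database.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]


def parse_xyz(text: str) -> Tuple[str, List[Atom]]:
    """Parse a single-frame XYZ string into (comment, [(element, x, y, z), ...])."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])          # line 1: number of atoms
    comment = lines[1].strip() if len(lines) > 1 else ""  # line 2: free-form comment

    atoms: List[Atom] = []
    for line in lines[2:2 + n_atoms]:
        parts = line.split()
        x, y, z = (float(v) for v in parts[1:4])
        atoms.append((parts[0], x, y, z))

    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return comment, atoms


# Illustrative linear CO2 geometry (coordinates in angstrom, not from the database)
EXAMPLE = """3
illustrative CO2 geometry
C  0.000  0.000  0.000
O  0.000  0.000  1.160
O  0.000  0.000 -1.160
"""

comment, atoms = parse_xyz(EXAMPLE)
print(comment, len(atoms), atoms[0])
```

To read a file from this folder, pass its contents to the function, for example `parse_xyz(open(path).read())`, where `path` points at one of the XYZ files.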
See updates.md for changelog and dataset version history.
This repository is released under the MIT License and made publicly available under the Creative Commons Attribution 4.0 (CC BY 4.0) license. You may copy, distribute, and adapt the content with appropriate credit.
For questions or suggestions, please contact:
Kuldeepsinh Raj
Ph.D. Student, Materials Science and Engineering
Stony Brook University
kuldeepsinh.raj@stonybrook.edu