This project is a work-in-progress, notebook-first implementation of a classic workflow for building a machine-learning force field / potential energy surface for a small molecular system: alanine dipeptide in implicit solvent.
The notebook uses an existing tutorial as a reference. However, a significant part of it has been changed, and I added many comments to make it easier to follow. Where needed, I also explain the physics and math behind the concepts.
The main goal is to demonstrate an end-to-end pipeline:
- Construct the system (alanine dipeptide, implicit solvent setup).
- Run biased MD using a classical force field (ff14SBonlysc) to generate diverse configurations (sampling enhanced with metadynamics).
- Label configurations with a more expensive reference method (semi-empirical PM6) by computing energies and atomic forces.
- Fit an ML surrogate to the PM6 energy/force surface using:
  - Gaussian Process Regression (GPR)
  - Neural Networks (NN)
- Run MD on the ML potential.
- Compute free energy surfaces (Ramachandran plot) from ML-driven sampling.
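
The last step of the pipeline can be sketched with a histogram-based estimate, F(φ, ψ) = -kT ln P(φ, ψ). This is a minimal illustration, not the notebook's actual code: the function name `free_energy_surface`, the bin count, the temperature, and the synthetic dihedral samples are all assumptions made for the example. It also assumes the samples come from unbiased sampling; samples generated with a metadynamics bias would need reweighting first.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def free_energy_surface(phi, psi, temperature=300.0, bins=36):
    """Estimate F(phi, psi) = -kT ln P(phi, psi) from dihedrals in degrees.

    Hypothetical helper for illustration; assumes unbiased samples.
    """
    edges = np.linspace(-180.0, 180.0, bins + 1)
    hist, _, _ = np.histogram2d(phi, psi, bins=[edges, edges], density=True)
    kT = KB_KCAL * temperature
    with np.errstate(divide="ignore"):
        fes = -kT * np.log(hist)  # unvisited bins become +inf
    fes -= fes[np.isfinite(fes)].min()  # shift the global minimum to zero
    return fes, edges


# Synthetic samples standing in for ML-driven MD output (not real data).
rng = np.random.default_rng(0)
phi = np.clip(rng.normal(-80.0, 15.0, 5000), -180.0, 180.0)
psi = np.clip(rng.normal(150.0, 15.0, 5000), -180.0, 180.0)

fes, edges = free_energy_surface(phi, psi)
i, j = np.unravel_index(np.argmin(fes), fes.shape)
print(f"Minimum near phi in [{edges[i]:.0f}, {edges[i + 1]:.0f}] deg, "
      f"psi in [{edges[j]:.0f}, {edges[j + 1]:.0f}] deg")
```

The returned array is indexed as `fes[phi_bin, psi_bin]` in kcal/mol, so it should be transposed before plotting with φ on the horizontal axis.
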
- Intro + workflow outline
- Background material (e.g., brief GPR explanation)
- Tooling choices:
  - AmberTools (ff14SBonlysc + PM6)
  - ASE for MD + trajectory analysis
  - nglview for interactive visualization
  - PyTorch for autodiff + ML models
- Environment setup notes (conda-based, mentions environment.yml)
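
As a companion to the GPR background material, here is a minimal sketch of Gaussian Process Regression with an RBF kernel on a one-dimensional toy "energy" curve. It is only an illustration under stated assumptions: the helper names `rbf_kernel` and `gpr_predict`, the hyperparameters, and the toy function are invented for the example, and it fits energies only, whereas the notebook's model also involves forces and may be set up differently.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1D arrays of inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gpr_predict(x_train, y_train, x_test, noise=1e-6, **kernel_args):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train, **kernel_args)
    K += noise * np.eye(len(x_train))  # jitter / observation noise
    K_s = rbf_kernel(x_test, x_train, **kernel_args)
    K_ss = rbf_kernel(x_test, x_test, **kernel_args)

    # Solve K alpha = y through a Cholesky factorization for stability.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha

    v = np.linalg.solve(L, K_s.T)
    cov = K_ss - v.T @ v
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std


# Toy energy along a torsion-like coordinate (illustrative, not PM6 data).
x_train = np.linspace(-np.pi, np.pi, 12)
y_train = np.cos(x_train) + 0.5 * np.cos(2 * x_train)
x_test = np.linspace(-np.pi, np.pi, 50)

mean, std = gpr_predict(x_train, y_train, x_test, length_scale=0.8)
print("largest predictive std on the test grid:", std.max())
```

The predictive standard deviation is what makes GPR attractive for this kind of workflow: it grows in regions far from the training configurations, which gives a rough signal for where more reference labels might be needed.
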
Note: Paths and some parts of the pipeline are still being cleaned up / made reproducible.
As the project matures, a structure like this keeps it clean: