Tutorial for DS04 - MRS Spring 2022

See symposium session link

About

This repo includes the tutorial materials for the MRS Spring 2022 - DS04 tutorial on MLOps. The tutorial description is copied below:

The ability to leverage machine learning (ML) is rapidly becoming a part of the materials scientist’s toolkit, whether for aiding ab initio computational work, screening datasets of candidate materials, or interfacing with in-lab experimenters and equipment. This tutorial aims to address the problems that frequently arise after a materials scientist builds their first few successful ML models: How can multiple versions of models, and their predictions, be effectively tracked? Are there ways to automatically test models before running large batch predictions?

This tutorial will use case studies from recent work in combinatorial science and materials screening to present an interactive Python tutorial for participants. Familiarity with basic machine learning principles and with Git version control is recommended for participation in this tutorial.

Learning Outcomes

- Using correlation IDs to track models and predictions, a best practice from large-scale ML in the (web-based) technology industry.
- Using automated workflow tools, such as the free GitHub Actions, to automate testing and sanity checks for ML models before time and resources are committed to large predictions.
- A short survey of ML systems design principles, including:
  - logging ML parameters with experiment-tracking libraries (e.g., MLflow),
  - packaging Python dependencies to improve reproducibility,
  - orchestrating ML pipelines to allow partial restarts (e.g., re-running predictions without retraining) and easier debugging.
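
To make the first outcome concrete, here is a minimal sketch of correlation IDs using only the Python standard library. It is an illustration, not the code from the tutorial notebooks: the names `model_fingerprint`, `predict_with_tracking`, and `toy_model` are hypothetical, and hashing the hyperparameters is just one assumed way to derive a model identifier.

```python
import hashlib
import json
import uuid


def model_fingerprint(params: dict) -> str:
    """Hash the model's hyperparameters into a short, stable identifier."""
    # Sorting keys makes the hash independent of dictionary ordering.
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def predict_with_tracking(model_fn, params: dict, inputs: list) -> list:
    """Run predictions and tag every record with IDs that link it to its run."""
    run_id = uuid.uuid4().hex             # unique for each prediction run
    model_id = model_fingerprint(params)  # identical params give identical IDs
    return [
        {"run_id": run_id, "model_id": model_id, "input": x, "prediction": model_fn(x)}
        for x in inputs
    ]


# Hypothetical stand-in for a trained model (assumption, for illustration only).
def toy_model(x: float) -> float:
    return 2.0 * x + 1.0


records = predict_with_tracking(toy_model, {"slope": 2.0, "intercept": 1.0}, [0.0, 1.5])
```

Every record carries the same `run_id` and `model_id`, so a prediction found later in a spreadsheet or database can be traced back to the run and model configuration that produced it.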

Tutorial Contents

In lieu of slides, this tutorial uses interactive Jupyter notebooks. These notebooks will be stepped through during the tutorial with opportunities for Q&A throughout. The tutorial considers three mock (but realistic!) scenarios:

- In an ML-driven materials screening experiment, you've made updates to an ML model after you've already started working with some of your screened materials. How will you track, manage, and reconcile different screening "runs" produced by different ML models?
- Some scientists in your computational & ML group are experimenting with new ML models that might outperform your previous models. How can you bake in some sanity checks before committing to "real" predictions on your dataset of materials and material properties, which you plan to send to your experimental-science collaborators?
- You've worked out a mature ML workflow, where you build new models regularly and leverage predictions on large materials datasets. Now your research team has grown to include more grad students, postdocs, and research scientists. How can your systems scale alongside your team?
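
As a rough illustration of the second scenario, the sketch below runs a candidate model against a few reference cases before any large batch prediction. It is not taken from the tutorial notebooks: `sanity_check`, `candidate_model`, `broken_model`, the reference values, the tolerance, and the valid range are all hypothetical choices made for this example.

```python
import math


def sanity_check(model_fn, reference_cases, tolerance=0.1, valid_range=(0.0, 20.0)):
    """Return a list of failure messages; an empty list means all checks passed."""
    failures = []
    low, high = valid_range
    for x, expected in reference_cases:
        y = model_fn(x)
        if not math.isfinite(y):
            failures.append(f"non-finite prediction for input {x}")
        elif not low <= y <= high:
            failures.append(f"prediction {y} for input {x} is outside [{low}, {high}]")
        elif abs(y - expected) > tolerance:
            failures.append(f"prediction {y} for input {x} differs from reference {expected}")
    return failures


# Hypothetical candidate model that behaves sensibly (assumption).
def candidate_model(x):
    return 1.1 * x + 0.5


# Hypothetical model with a bug that yields NaN for every input (assumption).
def broken_model(x):
    return float("nan")


# Made-up (input, expected property) pairs standing in for trusted measurements.
reference_cases = [(1.0, 1.6), (2.0, 2.7)]

candidate_failures = sanity_check(candidate_model, reference_cases)
broken_failures = sanity_check(broken_model, reference_cases)
```

In an automated workflow, a test runner could call a check like this on every commit and fail the job whenever the returned list is non-empty, so that a broken model never reaches the batch-prediction step.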

Repo Contents

This repo contains three folders, one for each of the three tutorial scenarios above. Each folder has a self-contained Python notebook, a requirements.txt file (which lists the Python libraries you need), and a data/ folder containing the data files specific to that scenario.

You can run the notebooks in either of the following ways:

- Go to https://colab.research.google.com/ and open the notebooks using the GitHub URL: https://github.com/eddotman/mrs-s22-ds04-tutorial/
- Clone this repo and run the notebooks on your local machine (or any machine you have access to).

Folders and files:

- 01_tracking_predictions
- 02_testing_models
- 03_scaling_teams
- .gitignore
- LICENSE
- README.md
