Threat model for machine learning projects #3

@ttimbers

Description

Here I want to brainstorm a list of all the potential threats (i.e., places where things can go wrong) in a machine learning project. Our checklist need not address all of them, but our literature review should describe them all and identify which ones the checklist covers. Here's my starting list:

  • Mismatch between the machine learning model choice and the data used for training and evaluation (e.g., linear regression for binomial data)
  • Data quality issues (e.g., missing data, duplicate data, data anomalies, etc.)
  • Errors in code (e.g., a bug that shifts data labels by one, or misnaming a file written to disk)
  • Data leakage between the training and test sets, leading to overfitting (e.g., test data being used to fit a pre-processing object)
  • Model stability issues (e.g., a different train-validation split leads to a large change in the model)
  • Model behaviour/learning issues (e.g., the model learns a shortcut that leads to erroneous predictions in certain cases)
  • Bias/fairness issues (e.g., the model makes different predictions for particular subgroups of observations)
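As a concrete illustration of the data-leakage threat above, here is a minimal sketch (assuming scikit-learn and synthetic toy data; the variable names are made up for illustration) contrasting a leaky workflow, where a pre-processing object is fit on the full dataset before splitting, with a leak-free workflow where the scaler lives inside a `Pipeline` and is fit only on the training split:

```python
# Sketch of the data-leakage threat: fitting a pre-processing object on the
# full dataset lets test-set statistics influence the transformation, whereas
# a Pipeline fits the scaler on the training split only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Leaky: the scaler sees the test rows before the split, so the test set
# contributes to the means/variances used to scale the training data.
X_scaled_leaky = StandardScaler().fit_transform(X)

# Leak-free: split first, then fit scaler + model on the training split only.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)  # the scaler inside the pipeline is fit on X_train only
score = pipe.score(X_test, y_test)
```

The pipeline pattern matters most under cross-validation, where refitting the pre-processing step inside each fold is the only way to keep every validation fold unseen.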
