Here I want to brainstorm a list of all the potential threats (i.e., places where things can go wrong) in a machine learning project. Our checklist need not address all of them, but our literature review should describe them all and identify which ones the checklist covers. Here's my starting list:
- Mismatch between the machine learning model choice and the data used for training and evaluation (e.g., linear regression for binomial data)
- Data quality issues (e.g., missing data, duplicate data, data anomalies, etc.; see the audit sketch after this list)
- Errors in code (e.g., a bug that shifts data labels by one, or a file misnamed when written to disk)
- Data leakage between training and test set, leading to overfitting (e.g., test data being used to create a pre-processing object; see the leakage sketch after this list)
- Model stability issues (e.g., a different train-validation split leads to a large change in the model; see the stability probe after this list)
- Model behaviour/learning issues (e.g., model learns a shortcut that leads to erroneous predictions in certain cases)
- Bias/fairness issues (e.g., model makes different predictions for particular subgroups of observations)
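
To make the data quality threat concrete, here is a minimal sketch of a quick audit with pandas. The toy `df` and its columns are purely hypothetical, not from any real project:

```python
# Hypothetical quick audit for the data-quality threats above,
# assuming the data fits in a pandas DataFrame.
import pandas as pd

df = pd.DataFrame({"age": [34, 34, None, 152],
                   "income": [50_000, 50_000, 42_000, 41_000]})

print(df.isna().sum())        # missing values per column
print(df.duplicated().sum())  # exact duplicate rows
print(df.describe())          # summary stats expose anomalies like age == 152
```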
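For the leakage threat, here is a minimal sketch (using scikit-learn and a synthetic dataset, both stand-ins for whatever a real project uses) contrasting a leaky workflow with a pipeline that fits the pre-processing object on training data only:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=0)

# Leaky: the scaler is fit on ALL rows, so test-set statistics
# influence the pre-processing applied to the training data.
X_scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, random_state=0)

# Safer: split first, then put the scaler inside a pipeline so it is
# fit on the training fold only.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```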
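And for the stability threat, one cheap probe (again a hypothetical sketch, not a prescribed method) is to refit the same model across several train-validation splits, varying only the split seed, and look at the spread of validation scores:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)

scores = []
for seed in range(10):  # vary only the split, nothing else
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=seed)
    scores.append(
        LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_val, y_val)
    )

# A large standard deviation here flags a split-sensitive model.
print(f"mean={np.mean(scores):.3f}, std={np.std(scores):.3f}")
```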