Threat model for machine learning projects #3

@ttimbers

Description

Here I want to brainstorm a list of all the potential threats (i.e., places where things can go wrong) in a machine learning project. Our checklist need not address all of them, but our literature review should describe them all and identify which ones the checklist covers. Here's my starting list:

  • Mismatch between the machine learning model choice and the data used for training and evaluation (e.g., linear regression for binomial data)
  • Data quality issues (e.g., missing data, duplicate data, data anomalies, etc.)
  • Errors in code (e.g., a bug that shifts data labels by one, or misnaming a file written to disk)
  • Data leakage between the training and test sets, leading to overfitting (e.g., test data being used to fit a pre-processing object)
  • Model stability issues (e.g., a different train-validation split leads to a large change in the model)
  • Model behaviour/learning issues (e.g., the model learns a shortcut that leads to erroneous predictions in certain cases)
  • Bias/fairness issues (e.g., the model makes different predictions for particular subgroups of observations)
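As a concrete illustration of the data-leakage threat above, here is a minimal sketch (assuming scikit-learn and synthetic toy data; the variable names are made up for illustration) contrasting a leaky workflow, where a pre-processing object is fit on the full dataset before splitting, with a leak-free workflow where the scaler lives inside a `Pipeline` and is fit only on the training split:

```python
# Sketch of the data-leakage threat: fitting a pre-processing object on the
# full dataset lets test-set statistics influence the transformation, whereas
# a Pipeline fits the scaler on the training split only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Leaky: the scaler sees the test rows before the split, so the test set
# contributes to the means/variances used to scale the training data.
X_scaled_leaky = StandardScaler().fit_transform(X)

# Leak-free: split first, then fit scaler + model on the training split only.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)  # the scaler inside the pipeline is fit on X_train only
score = pipe.score(X_test, y_test)
```

The pipeline pattern matters most under cross-validation, where refitting the pre-processing step inside each fold is the only way to keep every validation fold unseen.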
