
QC pipeline #19

@shraddhapai

Description


We need a pipeline of preprocessing steps that assesses data quality and cleans the data before the predictor is run. No such mechanism is currently in place. The pipeline would:

  • identify structure in the missingness of the data
  • identify and flag outlier samples
  • run some unsupervised analyses on the samples, e.g. PCA and hierarchical clustering
  • for continuous-valued data, compare several similarity metrics to find the one that best separates the classes (e.g. RNAcorr.R, written by SP for PanCancer)
  • run hierarchical clustering and PCA of the classes, following the same idea
  • run univariate tests to prune the matrix of variables that goes into netDx
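For the first two steps, a minimal sketch of the kind of check meant, written in plain Python for illustration (the project's own scripts, such as RNAcorr.R, are in R). It assumes the data is a samples-by-features list of rows with missing values stored as `None`; the function names, the per-sample-mean summary and the robust z-score cutoff of 3.5 are all hypothetical choices, not an agreed design. It only reports missingness fractions; finding *structure* in missingness (e.g. whether it tracks class) would need more than this.

```python
import statistics


def missingness_summary(matrix, sample_ids, feature_ids):
    """Fraction of missing values (None) per sample and per feature.

    Hypothetical helper: matrix is a list of rows, one row per sample.
    """
    n_samples, n_features = len(sample_ids), len(feature_ids)
    per_sample = {}
    missing_per_feature = {fid: 0 for fid in feature_ids}
    for sid, row in zip(sample_ids, matrix):
        n_missing = 0
        for fid, value in zip(feature_ids, row):
            if value is None:
                n_missing += 1
                missing_per_feature[fid] += 1
        per_sample[sid] = n_missing / n_features
    per_feature = {fid: c / n_samples for fid, c in missing_per_feature.items()}
    return per_sample, per_feature


def flag_outlier_samples(matrix, sample_ids, threshold=3.5):
    """Flag samples whose mean observed value has a large robust z-score.

    Robust z = |x - median| / (1.4826 * MAD). Samples with no observed
    values are flagged too, since they cannot be assessed at all.
    """
    flagged, sample_means = [], {}
    for sid, row in zip(sample_ids, matrix):
        observed = [v for v in row if v is not None]
        if observed:
            sample_means[sid] = statistics.fmean(observed)
        else:
            flagged.append(sid)
    if not sample_means:
        return flagged
    centre = statistics.median(sample_means.values())
    mad = statistics.median(abs(m - centre) for m in sample_means.values())
    if mad == 0:
        return flagged  # no spread across samples: robust z-score undefined
    for sid, m in sample_means.items():
        if abs(m - centre) / (1.4826 * mad) > threshold:
            flagged.append(sid)
    return flagged
```

A per-sample mean is a crude summary; with many features, flagging on distance in PCA space would likely be more informative.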
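For the unsupervised step, hierarchical clustering of samples could look roughly like the naive agglomerative sketch below. Average linkage on Euclidean distance, the function name and the "cut at k clusters" interface are assumptions made for illustration; this is quadratic-to-cubic in the number of samples and is only meant to show the idea.

```python
import itertools
import math


def average_linkage_clusters(samples, k):
    """Naive agglomerative clustering (average linkage, Euclidean) down to k clusters.

    Hypothetical helper: samples is a list of equal-length numeric rows.
    Returns a list of clusters, each a list of sample indices.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(samples))]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(math.dist(samples[i], samples[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # combinations() yields a < b, so deleting index b keeps a valid.
        a, b = min(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

A sample that stays in a singleton cluster until late in the merging is also a natural candidate for the outlier list.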
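For comparing similarity metrics on continuous-valued data, one possible scoring scheme is leave-one-out nearest-neighbour agreement: for each sample, check whether its most similar other sample shares its class. This criterion, the three metrics and all names below are assumptions for illustration; it is not a reproduction of RNAcorr.R.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / denom if denom else 0.0


def neg_euclidean(x, y):
    """Negated Euclidean distance, so that larger means more similar."""
    return -math.dist(x, y)


# Hypothetical candidate set; other metrics can be added here.
METRICS = {"pearson": pearson, "cosine": cosine, "neg_euclidean": neg_euclidean}


def nn_agreement(samples, labels, similarity):
    """Fraction of samples whose most similar other sample has the same label."""
    hits = 0
    for i, x in enumerate(samples):
        best_j = max(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: similarity(x, samples[j]),
        )
        hits += labels[best_j] == labels[i]
    return hits / len(samples)


def rank_similarity_metrics(samples, labels):
    """Score every metric and return (name, score) pairs, best first."""
    scores = {name: nn_agreement(samples, labels, fn) for name, fn in METRICS.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The nearest-neighbour score is rank-based, so metrics on different scales (correlations versus distances) can be compared directly.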
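For the univariate pruning step, a sketch for a two-class problem: compute Welch's t-statistic per variable and keep the `keep` variables with the largest |t|. Ranking by |t| instead of p-values is a simplification (the Python standard library has no t distribution); a real version would probably use p-values with a multiple-testing correction. The function names and the `keep` parameter are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two independent groups (each needs >= 2 values)."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        # Zero variance in both groups: identical means give no signal,
        # different means separate the groups perfectly.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def prune_features(matrix, labels, feature_ids, keep):
    """Keep the `keep` features with the largest |t| between the two classes.

    matrix is a samples-by-features list of rows; returns (kept_ids, scores).
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch only handles exactly two classes")
    scores = {}
    for j, fid in enumerate(feature_ids):
        group_a = [row[j] for row, lab in zip(matrix, labels) if lab == classes[0]]
        group_b = [row[j] for row, lab in zip(matrix, labels) if lab == classes[1]]
        scores[fid] = abs(welch_t(group_a, group_b))
    ranked = sorted(feature_ids, key=lambda fid: scores[fid], reverse=True)
    return ranked[:keep], scores
```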
