This Python script uses the cptac package to retrieve CPTAC proteogenomics data, then uses scikit-learn to train a linear model that attempts to predict standardized protein abundance values for a sample using transcript data.
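As a rough illustration of the modelling step described above, the following is a minimal sketch and not the script's actual code. It assumes synthetic NumPy arrays in place of the CPTAC data (the cptac retrieval step is omitted), and it assumes an ordinary least-squares model from scikit-learn. The variable names, array sizes, and weights are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: rows are samples, columns are transcript levels.
# In the real script these values would come from the cptac package.
rng = np.random.default_rng(0)
n_samples, n_transcripts = 100, 5
transcripts = rng.normal(size=(n_samples, n_transcripts))

# Hypothetical protein abundance: a noisy linear function of the transcripts.
true_weights = np.array([1.5, 0.0, -0.5, 0.0, 0.8])
protein = transcripts @ true_weights + rng.normal(scale=0.1, size=n_samples)

# Hold out a test set before any scaling so test samples do not leak into training.
X_train, X_test, y_train, y_test = train_test_split(
    transcripts, protein, test_size=0.2, random_state=0
)

# Standardize protein abundance (zero mean, unit variance) using training statistics only.
scaler = StandardScaler().fit(y_train.reshape(-1, 1))
y_train_z = scaler.transform(y_train.reshape(-1, 1)).ravel()
y_test_z = scaler.transform(y_test.reshape(-1, 1)).ravel()

# Fit a linear model that predicts standardized protein abundance from transcript levels.
model = LinearRegression().fit(X_train, y_train_z)
r2 = model.score(X_test, y_test_z)
print(f"Held-out R^2: {r2:.3f}")
```

The held-out R^2 gives a simple measure of how much of the variance in protein abundance the transcript data explains under these assumptions.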
The script is uploaded as part of a manuscript under consideration, for research reproducibility purposes. It is not intended for general use and has not been tested against other data sets or use cases.
Edward Lau, 2021