Code for evaluating the robustness of BiDRA and applying it to the analysis of discrepancies.
Datasets used: Gray (public), gCSI (public), CTRPv2 (public) and IRIC (in-house).
The data from the three public datasets are accessed through PharmacoGx, using the output_[nameOfDataset]_curves.ipynb notebooks. Additional relevant files are also downloaded. An anonymized version of the IRIC dataset is made available. Make sure that the first column of the files in curves_info/ is labelled exp_id.
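The exp_id requirement above can be checked before running the rest of the pipeline. The following is a minimal sketch, assuming the curve information files are comma-separated with a header row; the helper name `first_column_is_exp_id` is hypothetical and not part of the repository.

```julia
# Hypothetical helper (not part of the repository): returns true when the
# first column name of a comma-separated file is exactly "exp_id".
# Assumption: the file has a header row and uses ',' as the delimiter.
function first_column_is_exp_id(path::AbstractString)
    header = open(readline, path)                 # read only the header line
    # Drop surrounding quotes, spaces or a byte-order mark, if any.
    first_name = strip(first(split(header, ',')), ['"', ' ', '\ufeff'])
    return first_name == "exp_id"
end
```

Calling the helper on each curve information file and stopping at the first `false` gives an early, explicit failure instead of a later lookup error.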
As a first step, the CSV data of the public datasets are converted to H5 files with public_datasets/csvToH5.jl. Experiments are also filtered for extreme values, as described in the manuscript.
Datasets used: Gray, gCSI and CTRPv2
Posteriors for each experiment are generated with compound_characterization/bidra.jl. The bash script compound_characterization/partitionBiDRA.sh splits a given dataset into N batches and runs BiDRA on them simultaneously. Posteriors are stored in the dataset H5 files (public_datasets/bidra/). The batches' diagnostics files can be merged for a single dataset with compound_characterization/mergeDiagnostics.jl. The diagnostics and the batch timing statistics are stored in _generated_data/.
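The idea of splitting a dataset into N batches can be illustrated in Julia. This is only a sketch of one possible partitioning scheme (contiguous batches of near-equal size); the actual logic of partitionBiDRA.sh is not reproduced here, and the function name `partition_batches` is hypothetical.

```julia
# Hypothetical helper (not the repository's script): split a list of
# experiment ids into at most `n` contiguous batches of near-equal size.
# Assumption: `exp_ids` uses standard 1-based indexing.
function partition_batches(exp_ids::AbstractVector, n::Integer)
    n > 0 || throw(ArgumentError("n must be positive"))
    base, extra = divrem(length(exp_ids), n)
    batches = Vector{Vector{eltype(exp_ids)}}()
    start = 1
    for b in 1:n
        # The first `extra` batches take one additional experiment.
        len = base + (b <= extra ? 1 : 0)
        len == 0 && break                         # fewer experiments than batches
        push!(batches, collect(exp_ids[start:start + len - 1]))
        start += len
    end
    return batches
end
```

Each returned batch could then be handed to a separate process, so that the batches are inferred in parallel and their diagnostics merged afterwards.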
Once all three datasets have been imported and converted to H5, LM estimates for each experiment can be calculated with compound_characterization/curveFit.jl. Results are saved in a single file, public_datasets/all_julia_curveFit.csv.
- Pairings of duplicated experiments are obtained with correlation_metrics/generatesPairings.jl. This code also produces a list of pairings for experiments with more than two replicates. Results are saved in public_datasets/.
- Response consistency across biological replicates is assessed by calculating correlation metrics for viability responses. Values are obtained with correlation_metrics/viabilityCorrelation.jl and saved in _generated_data/viabCorrelations.csv.
- Efficiency metrics correlation is calculated with correlation_metrics/posteriorCorrelation.jl and correlation_metrics/LMcorrelation.jl. The results of the former are saved in _generated_data/posteriorCorrelations.csv, _generated_data/medianCorrelations.csv and _generated_data/qqCorrelations.csv; the results of the latter are saved in _generated_data/mlCorrelations.csv.
- Correlations between randomly paired experiments are calculated with correlation_metrics/runRandomPairings_BiDRA.jl and correlation_metrics/runRandomPairings_ML.jl. The results are respectively saved in _generated_data/bidraRandomCorrelation.csv and _generated_data/mlRandomCorrelations.csv.
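The pairing and correlation steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: experiments are represented as hypothetical (exp_id, cell, drug) tuples, replicates are taken to be experiments sharing the same cell line and drug, the metric is a single number per experiment, and Pearson correlation is used as a stand-in for whichever correlation metrics the scripts compute. All function names are hypothetical.

```julia
using Statistics, Random

# Group experiments by (cell, drug) and return every unordered pair of
# experiment ids found within a group (assumed definition of "replicates").
function replicate_pairs(experiments)
    groups = Dict{Tuple{String,String},Vector{String}}()
    for (exp_id, cell, drug) in experiments
        push!(get!(() -> String[], groups, (cell, drug)), exp_id)
    end
    out = Tuple{String,String}[]
    for ids in values(groups)
        for i in 1:length(ids)-1, j in i+1:length(ids)
            push!(out, (ids[i], ids[j]))
        end
    end
    return sort!(out)                             # deterministic order
end

# Pearson correlation of a per-experiment metric across replicate pairs.
function paired_correlation(pairings, metric::AbstractDict)
    x = [metric[a] for (a, _) in pairings]
    y = [metric[b] for (_, b) in pairings]
    return cor(x, y)
end

# Baseline: shuffle the second member of the pairs before correlating,
# which breaks the replicate relationship.
function random_pair_correlation(pairings, metric::AbstractDict;
                                 rng = Random.default_rng())
    x = [metric[a] for (a, _) in pairings]
    y = shuffle(rng, [metric[b] for (_, b) in pairings])
    return cor(x, y)
end
```

Comparing `paired_correlation` with the distribution of `random_pair_correlation` over many shuffles indicates how much of the observed agreement is specific to true replicates.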
Figures are written to _generated_figures/ and illustrate the results of the compound characterization and the correlation analysis.
The analysis results presented as supplementary material can be generated with the scripts contained in other_analysis/. These analyses include the multi-replicate correlations (multiRep_posteriorcorrelation.jl and multiRep_MLcorrelation.jl), the across-dataset correlations (acrossDataset_singletonCorr.jl), the AAC analysis (aac_analysis.jl) and the generation of the various plots (generateFigure_supp.jl and aac_plot.jl).
The analysis of the IRIC dataset is stand-alone and is designed to be run from within the iric_dataset directory. Posteriors are generated with bidra_inference.jl and saved in results. The complete SAR analysis can be run with sar_analysis.jl, and the resulting figures are saved in figures.
project
│   README.md
│   Manifest.toml
│   Project.toml
│
└───_generated_data
│   └───tmp
│
└───_generated_figures
│   └───discrepancies_replicates
│   └───methods_comparison
│   └───models
│   └───robustness
│   └───viab_corr
│   └───supp_fig
│
└───public_datasets
│   │   csvToH5.jl
│   │   output_ctrpv2_curves.ipynb
│   │   output_gCSI_curves.ipynb
│   │   output_gray_curves.ipynb
│   │
│   └───bidra
│   └───cellAnnotations
│   └───curves_info
│   └───drugAnnotations
│
└───iric_dataset
│   │   IRIC_anonymized.csv
│   │   bidra_inference.jl
│   │   sar_analysis.jl
│   │   utils_iric.jl
│   │
│   └───results
│   └───data