-
Notifications
You must be signed in to change notification settings - Fork 0
Feature/DIMS refactor GenerateViolinPlots #82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
…tor_GenerateViolinPlots
rernst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all lots of work done, good job! I feel like there is room for improvement, some general thoughts:
-
Many parameters have names like metab_interest_sorted. In the context of a function, it’s not relevant whether the input is “of interest” or “sorted.” Use neutral, descriptive names that reflect the data type or role.
-
Several functions are named after their use case rather than their functionality. name functions based on what they do, not where they are used.
-
When breaking function calls across lines, maintain a consistent style. Preferred format:
Rfunction1(
function_2(param_a),
param_b,
param_c,
)-
There is no error catching for missing files or invalid paths. Currently, the code will crash, making debugging difficult.
-
There seems to be a lot of ad-hoc data transformations. It feels like the DIMS application is missing a standardized data format for saving and reusing data between steps.
| # Remove columns, move HMDB_code & HMDB_name column to the front, change intensity columns to numeric | ||
| intensities_zscore_df <- intensities_zscore_df %>% | ||
| select(-c(plots, HMDB_name_all, HMDB_ID_all, sec_HMDB_ID, HMDB_key, sec_HMBD_ID_rlvnc, name, | ||
| relevance, descr, origin, fluids, tissue, disease, pathway, nr_ctrls)) %>% | ||
| relocate(c(HMDB_code, HMDB_name)) %>% | ||
| rename(mean_controls = avg_ctrls, sd_controls = sd_ctrls) %>% | ||
| mutate(across(!c(HMDB_name, HMDB_code), as.numeric)) | ||
|
|
||
| # Get the controls and patient IDs, select the intensity columns | ||
| controls <- colnames(intensities_zscore_df)[grepl("^C", colnames(intensities_zscore_df)) & | ||
| !grepl("_Zscore$", colnames(intensities_zscore_df))] | ||
| control_intensities_cols_index <- which(colnames(intensities_zscore_df) %in% controls) | ||
| nr_of_controls <- length(controls) | ||
|
|
||
| patients <- colnames(intensities_zscore_df)[grepl("^P", colnames(intensities_zscore_df)) & | ||
| !grepl("_Zscore$", colnames(intensities_zscore_df))] | ||
| patient_intensities_cols_index <- which(colnames(intensities_zscore_df) %in% patients) | ||
| nr_of_patients <- length(patients) | ||
|
|
||
| intensity_cols_index <- c(control_intensities_cols_index, patient_intensities_cols_index) | ||
| intensity_cols <- colnames(intensities_zscore_df)[intensity_cols_index] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be one (or more) 'prepare_data' functions.
| intensity_cols_index <- c(control_intensities_cols_index, patient_intensities_cols_index) | ||
| intensity_cols <- colnames(intensities_zscore_df)[intensity_cols_index] | ||
|
|
||
| #### Calculate ratios of intensities for metabolites #### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parts of this block can be 'calculate' functions.
| zscore_patients_df <- intensities_zscore_ratios_df %>% select(HMDB_code, HMDB_name, any_of(paste0(patients, "_Zscore"))) | ||
| zscore_controls_df <- intensities_zscore_ratios_df %>% select(HMDB_code, HMDB_name, any_of(paste0(controls, "_Zscore"))) | ||
|
|
||
| #### Make violin plots ##### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this a make create violoin plot pdf function
| save_prob_scores_to_excel(diem_probability_score, output_dir, run_name) | ||
|
|
||
|
|
||
| #### Generate dIEM plots ######### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be a function.
| # metabs_iems <- lapply(top_iems, function(iem) { | ||
| # iem_probablity <- patient_top_iems_probs %>% filter(Disease == iem) %>% pull(!!sym(patient_id)) | ||
| # metabs_iems_names <- c(metabs_iems_names, paste0(iem, ", probability score ", iem_probablity)) | ||
| # metab_iem <- expected_biomarkers_df %>% filter(Disease == iem) %>% select(HMDB_code, HMDB_name) | ||
| # return(metab_iem) | ||
| # }) | ||
| # names(metabs_iems) <- metabs_iems_names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove old? code.
The refactor of GenerateViolinPlots: