Skip to content

Conversation

@ALuesink
Copy link
Contributor

@ALuesink ALuesink commented Aug 15, 2025

The refactor of GenerateViolinPlots:

  • code to functions
  • added unit tests

@ALuesink ALuesink marked this pull request as ready for review August 21, 2025 12:52
@ALuesink ALuesink changed the base branch from main to develop November 3, 2025 08:24
@rernst rernst self-requested a review November 18, 2025 14:54
Copy link
Member

@rernst rernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all lots of work done, good job! I feel like there is room for improvement, some general thoughts:

  1. Many parameters have names like metab_interest_sorted. In the context of a function, it’s not relevant whether the input is “of interest” or “sorted.” Use neutral, descriptive names that reflect the data type or role.

  2. Several functions are named after their use case rather than their functionality. name functions based on what they do, not where they are used.

  3. When breaking function calls across lines, maintain a consistent style. Preferred format:

Rfunction1(
   function_2(param_a),  
   param_b,  
   param_c,
)
  1. There is no error catching for missing files or invalid paths. Currently, the code will crash, making debugging difficult.

  2. There seems to be a lot of ad-hoc data transformations. It feels like the DIMS application is missing a standardized data format for saving and reusing data between steps.

Comment on lines +39 to +59
# Remove columns, move HMDB_code & HMDB_name column to the front, change intensity columns to numeric
intensities_zscore_df <- intensities_zscore_df %>%
select(-c(plots, HMDB_name_all, HMDB_ID_all, sec_HMDB_ID, HMDB_key, sec_HMBD_ID_rlvnc, name,
relevance, descr, origin, fluids, tissue, disease, pathway, nr_ctrls)) %>%
relocate(c(HMDB_code, HMDB_name)) %>%
rename(mean_controls = avg_ctrls, sd_controls = sd_ctrls) %>%
mutate(across(!c(HMDB_name, HMDB_code), as.numeric))

# Get the controls and patient IDs, select the intensity columns
controls <- colnames(intensities_zscore_df)[grepl("^C", colnames(intensities_zscore_df)) &
!grepl("_Zscore$", colnames(intensities_zscore_df))]
control_intensities_cols_index <- which(colnames(intensities_zscore_df) %in% controls)
nr_of_controls <- length(controls)

patients <- colnames(intensities_zscore_df)[grepl("^P", colnames(intensities_zscore_df)) &
!grepl("_Zscore$", colnames(intensities_zscore_df))]
patient_intensities_cols_index <- which(colnames(intensities_zscore_df) %in% patients)
nr_of_patients <- length(patients)

intensity_cols_index <- c(control_intensities_cols_index, patient_intensities_cols_index)
intensity_cols <- colnames(intensities_zscore_df)[intensity_cols_index]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be one (or more) 'prepare_data' functions.

intensity_cols_index <- c(control_intensities_cols_index, patient_intensities_cols_index)
intensity_cols <- colnames(intensities_zscore_df)[intensity_cols_index]

#### Calculate ratios of intensities for metabolites ####
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parts of this block can be 'calculate' functions.

zscore_patients_df <- intensities_zscore_ratios_df %>% select(HMDB_code, HMDB_name, any_of(paste0(patients, "_Zscore")))
zscore_controls_df <- intensities_zscore_ratios_df %>% select(HMDB_code, HMDB_name, any_of(paste0(controls, "_Zscore")))

#### Make violin plots #####
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this a make create violoin plot pdf function

save_prob_scores_to_excel(diem_probability_score, output_dir, run_name)


#### Generate dIEM plots #########
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be a function.

Comment on lines +187 to +193
# metabs_iems <- lapply(top_iems, function(iem) {
# iem_probablity <- patient_top_iems_probs %>% filter(Disease == iem) %>% pull(!!sym(patient_id))
# metabs_iems_names <- c(metabs_iems_names, paste0(iem, ", probability score ", iem_probablity))
# metab_iem <- expected_biomarkers_df %>% filter(Disease == iem) %>% select(HMDB_code, HMDB_name)
# return(metab_iem)
# })
# names(metabs_iems) <- metabs_iems_names
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove old? code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants