Contributor

@mraves2 mraves2 commented Jan 16, 2026

This feature adds extra QC information from the DIMS pipeline to the final email, so that the user can assess the quality of the run at a glance.
Several steps of the pipeline, in particular AverageTechReplicates and GenerateQCOutput, generate additional txt files, which are included as content in DIMS.nf.

Contributor

@BasMonkey BasMonkey left a comment

I've made some remarks about performance and code duplication. Please see the comments left on the code.

is_codes <- rownames(is_list)

# check if there is data present for all the samples that the pipeline started with,
# if not write sample name to a log file.
Contributor

It seems this comment was lost in this change. I think it's quite useful.

# pos
for (line_index in seq_len(nrow(is_pos_selection_subset))) {
  is_selected <- is_pos_selection_subset$HMDB_name[line_index]
  thresh_selected <- all_is_thresholds$plasma$pos[which(all_is_thresholds$names$pos == is_selected)]
Contributor

The which() call performs a linear search for every row. A more efficient approach is to use the match() function.

Contributor Author

In this case the list of values is very small, so I'm not too worried about performance. which() is used in many scripts of the pipeline, so the v3.5 refactor will investigate which instances of which() can be replaced by match(). The key difference between the two is that which() returns the indices of all matching elements, whereas match() returns only the first. For each occurrence of which() in the code, we'll have to decide whether it can be replaced by match().
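
To make the comparison in this thread concrete, here is a minimal sketch of how which() and match() behave on a small lookup vector. The names `threshold_names`, `threshold_values` and the `IS_*` labels are hypothetical stand-ins, not the pipeline's actual data.

```r
# Hypothetical threshold table; names and values are illustrative only.
threshold_names <- c("IS_A", "IS_B", "IS_C", "IS_B")
threshold_values <- c(1000, 2000, 3000, 2500)

# which() returns the index of every matching element.
all_hits <- which(threshold_names == "IS_B")

# match() returns only the index of the first matching element.
first_hit <- match("IS_B", threshold_names)

# For an absent name, which() gives integer(0) and match() gives NA.
no_hit_which <- which(threshold_names == "IS_X")
no_hit_match <- match("IS_X", threshold_names)

# match() is vectorised over its first argument, so several names
# can be looked up in a single call instead of once per loop iteration.
selected <- c("IS_C", "IS_A", "IS_X")
looked_up <- threshold_values[match(selected, threshold_names)]
```

Besides first-versus-all, the handling of missing names differs, so a replacement would likely need an explicit NA check wherever a name might be absent from the threshold table.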

is_selected <- is_pos_selection_subset$HMDB_name[line_index]
thresh_selected <- all_is_thresholds$plasma$pos[which(all_is_thresholds$names$pos == is_selected)]
if (is_pos_selection_subset$Intensity[line_index] < thresh_selected) {
  is_below_threshold <- rbind(is_below_threshold, is_pos_selection_subset[line_index, ])
Contributor

Avoid rbind() in a loop: it reallocates and copies the data frame on every iteration, which is inefficient and can use a large amount of RAM for larger datasets. Consider collecting the indices or rows first and binding once at the end.

Contributor Author

Duly noted. This is a very small data frame, so I'll leave it as is, but in the refactor for v3.5 where all scripts are evaluated, I will take this point into consideration.
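
For the planned refactor, one possible shape of the loop-free version suggested in this thread is sketched below. It is an illustration under assumptions, not the pipeline's code: `is_subset`, `threshold_names` and `threshold_values` are hypothetical stand-ins for `is_pos_selection_subset` and the entries of `all_is_thresholds`.

```r
# Hypothetical stand-ins for the selection subset and the threshold table.
is_subset <- data.frame(
  HMDB_name = c("IS_A", "IS_B", "IS_C"),
  Intensity = c(500, 2500, 100),
  stringsAsFactors = FALSE
)
threshold_names <- c("IS_A", "IS_B", "IS_C")
threshold_values <- c(1000, 2000, 3000)

# Look up the threshold for every row in one vectorised call.
row_thresholds <- threshold_values[match(is_subset$HMDB_name, threshold_names)]

# Logical mask of rows below their threshold; rows without a known
# threshold (NA) are treated as not below (an assumption of this sketch).
below <- !is.na(row_thresholds) & is_subset$Intensity < row_thresholds

# Subset once instead of growing a data frame with rbind() inside a loop.
is_below_threshold <- is_subset[below, , drop = FALSE]
```

With the example values above, the rows for `IS_A` and `IS_C` are selected. How rows with a missing threshold should be handled is a design decision for the refactor; this sketch simply skips them.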
