Feature/DIMS_QCinfo_in_mail #93
base: develop
BasMonkey
left a comment
I've made some remarks about performance and code duplication. Please see the comments left on the code.
is_codes <- rownames(is_list)

# check if there is data present for all the samples that the pipeline started with,
# if not write sample name to a log file.
It seems this comment was lost in this change. I think it's quite useful.
DIMS/GenerateQCOutput.R
Outdated
# pos
for (line_index in seq_len(nrow(is_pos_selection_subset))) {
  is_selected <- is_pos_selection_subset$HMDB_name[line_index]
  thresh_selected <- all_is_thresholds$plasma$pos[which(all_is_thresholds$names$pos == is_selected)]
The which() call performs a linear search for every row. A more efficient approach is to use match().
In this case, the list of values is really small, so I'm not too worried about performance. which() is used often in different scripts of the pipeline, so the v3.5 refactor will investigate which instances of which() can be replaced by match(). The only fundamental difference between which() and match() is that the former returns all matching positions, whereas the latter returns only the first. For each occurrence of which() in the code, we'll have to decide whether it can be replaced by match().
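To make the difference discussed above concrete, here is a minimal sketch in R. The vectors `is_names`, `thresholds` and `selected` are made-up stand-ins, not objects from the pipeline. Besides the all-versus-first behaviour, note that the two functions also differ when nothing matches: which() gives an empty vector, match() gives NA.

```r
# Hypothetical threshold table; names and values are invented for illustration.
is_names <- c("IS_A", "IS_B", "IS_C", "IS_B")
thresholds <- c(100, 200, 300, 250)

# Names to look up; "IS_X" is deliberately absent from is_names.
selected <- c("IS_B", "IS_C", "IS_X")

# which() returns every matching position, or integer(0) when nothing matches.
all_hits <- which(is_names == "IS_B")
no_hits <- which(is_names == "IS_X")

# match() is vectorised over its first argument: it returns the first
# position for each value, and NA for values that are not found.
idx <- match(selected, is_names)
thresh_selected <- thresholds[idx]
```

Whether a given which() can be swapped for match() therefore depends on whether duplicates or missing names can occur at that spot in the code.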
DIMS/GenerateQCOutput.R
Outdated
  is_selected <- is_pos_selection_subset$HMDB_name[line_index]
  thresh_selected <- all_is_thresholds$plasma$pos[which(all_is_thresholds$names$pos == is_selected)]
  if (is_pos_selection_subset$Intensity[line_index] < thresh_selected) {
    is_below_threshold <- rbind(is_below_threshold, is_pos_selection_subset[line_index, ])
Avoid rbind() in a loop: it repeatedly reallocates and copies the data frame, which is inefficient and may use a large amount of RAM for larger datasets. Consider collecting indices or rows first and binding once at the end.
Duly noted. This is a very small data frame, so I'll leave it as is, but in the refactor for v3.5 where all scripts are evaluated, I will take this point into consideration.
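For the planned refactor, one possible shape of the loop-free version is sketched below. It is only an illustration under assumptions: the data frame contents and the `threshold_names` / `threshold_values` vectors are hypothetical stand-ins for `all_is_thresholds$names$pos` and `all_is_thresholds$plasma$pos`, and the sketch builds `is_below_threshold` from scratch rather than appending to an existing one as the PR code does.

```r
# Hypothetical stand-ins for the objects used in GenerateQCOutput.R.
is_pos_selection_subset <- data.frame(
  HMDB_name = c("IS_A", "IS_B", "IS_C"),
  Intensity = c(50, 500, 120),
  stringsAsFactors = FALSE
)
threshold_names <- c("IS_A", "IS_B", "IS_C")
threshold_values <- c(100, 200, 300)

# Look up one threshold per row with a single vectorised match() call.
row_to_threshold <- match(is_pos_selection_subset$HMDB_name, threshold_names)
thresh_per_row <- threshold_values[row_to_threshold]

# Assumption: rows without a known threshold (NA) are not flagged.
below <- !is.na(thresh_per_row) &
  is_pos_selection_subset$Intensity < thresh_per_row

# Subset once instead of growing the data frame with rbind() per row.
is_below_threshold <- is_pos_selection_subset[below, ]
```

If rows from several selections (for example positive and negative mode) need to be combined, each subset could be computed this way and then bound together with a single rbind() call at the end.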
This feature adds extra QC information from the DIMS pipeline to the final email, so that the user can assess the quality of the run at a glance.
Several steps of the pipeline, notably AverageTechReplicates and GenerateQCOutput, generate extra txt files, which are included as content in DIMS.nf.