Dealing with memory and speed issues in permutation analysis for large datasets #2

areyesq89 · 2018-07-05T15:31:26Z

I ran into the problem reported here and dug into the error. It comes from the gcapcPeaks function: for some datasets the vector of permuted values can be extremely large (more than 50 billion values), and the quantile and density functions simply fail. I solved this by adding an option to draw a uniform random sample of the permuted values instead of using all of them. A new parameter, permsamp=, sets the sample size as a fraction of the number of permuted values.
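The subsampling idea can be sketched as follows. This is a minimal Python illustration, not the gcapc code (which is R). It assumes the permuted values fit in an indexable sequence, uses a hypothetical helper subsampled_quantile whose frac argument stands in for permsamp, and reimplements R's default (type 7) quantile so the numbers are comparable to what quantile would return.

```python
import math
import random


def type7_quantile(values, prob):
    """Linear-interpolation quantile, matching R's default quantile(type = 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * prob
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def subsampled_quantile(perm_values, prob, frac, seed=None):
    """Estimate a quantile from a uniform random subsample of perm_values.

    frac is a hypothetical stand-in for a permsamp-style fraction: the share
    of permuted values kept for the estimate.
    """
    if not 0 < frac <= 1:
        raise ValueError("frac must be in (0, 1]")
    if not 0 <= prob <= 1:
        raise ValueError("prob must be in [0, 1]")
    rng = random.Random(seed)
    k = max(1, round(len(perm_values) * frac))
    # Sampling without replacement: every value has the same chance of selection.
    sample = rng.sample(perm_values, k)
    return type7_quantile(sample, prob)


if __name__ == "__main__":
    rng = random.Random(1)
    perms = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
    full = type7_quantile(perms, 0.95)
    approx = subsampled_quantile(perms, 0.95, frac=0.05, seed=2)
    print(f"full: {full:.3f}  subsample (5%): {approx:.3f}")
```

Roughly speaking, the precision of a sampled quantile depends on the absolute size of the sample rather than on the total number of permuted values, which is why a small fraction can be enough when the full vector runs into the billions.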

I also rewrote some lines of code to make them faster. For the example dataset these changes shorten the runtime by only about 5 seconds, but for larger datasets they make a more substantial difference.

This version passes R CMD check without problems. Let me know if these suggestions make sense!

Alejandro


areyesq89 added 4 commits July 2, 2018 20:04

34e3dee Improved speed of pvalue calculation and sampling for quantiles in gcapcPeaks

e359507 Speed up code and subset permutation values also when calculating densities

6b96abc removed system.time command

53a63d1 bumped version, improved wording of the sampperm parameter
