-
Notifications
You must be signed in to change notification settings - Fork 32
Description
We have encountered a crash in the FragmentMixDistribution.modelNFR method (in Occupancy.py) for two samples in a recent experiment, when running nucleoatac version 0.3.4. (The program runs successfully on all other samples in the same experiment.)
For both problem samples the command line is of the form:
nucleoatac run --bed <broadpeaks> --bam <bamfile> --fasta <fastafile> --cores 8 --out <samplename>
where broadpeaks is output from MACS.
The crash produces the following (redacted) traceback:
Traceback (most recent call last):
File "XXXX/nucleoatac034/bin/nucleoatac", line 33, in <module>
nucleoatac_main(args)
File "XXXX/nucleoatac034/lib/python2.7/site-packages/nucleoatac/cli.py", line 56, in nucleoatac_main
run_occ(occ_args)
File "XXXX/nucleoatac034/lib/python2.7/site-packages/nucleoatac/run_occ.py", line 96, in run_occ
fragment_dist.modelNFR()
File "XXXX/nucleoatac034/lib/python2.7/site-packages/nucleoatac/Occupancy.py", line 65, in modelNFR
nuc[nuc<=0]=min(min(nfr)*0.1,min(nuc[nuc>0])*0.001)
ValueError: min() arg is an empty sequence
which I've identified as being due to the fact that in each case nuc[nuc>0] is empty, as all values in the nuc array are negative. As a result min(nuc[nuc>0] produces the ValueError exception above.
My understanding is that nuc is the "subtracted difference between the extrapolated nucleosome-free model and the observed fragment distribution" mentioned in the "Occupancy determination" section of the NucleoATAC paper. The specific line of code where the failure occurs appears to be a subsequent "correction" to the subtracted distribution which replaces any zero or negative values with a small positive value.
The patch below is a workaround to avoid the error, by trapping for the ValueError exception and setting the zero and negative values to min(nfr)*0.1 (the other expression that appears in the min statement above):
--- a/nucleoatac/Occupancy.py
+++ b/nucleoatac/Occupancy.py
@@ -62,7 +62,10 @@ class FragmentMixDistribution:
nuc = np.concatenate((np.zeros(boundaries[1]-self.lower),
self.fragmentsizes.get(boundaries[1],self.upper) -
self.nfr_fit.get(boundaries[1],self.upper)))
- nuc[nuc<=0]=min(min(nfr)*0.1,min(nuc[nuc>0])*0.001)
+ try:
+ nuc[nuc<=0]=min(min(nfr)*0.1,min(nuc[nuc>0])*0.001)
+ except ValueError:
+ nuc[nuc<=0]=min(nfr)*0.1
self.nuc_fit = FragmentSizes(self.lower, self.upper, vals = nuc)
def plotFits(self,filename=None):
"""plot the Fits"""
With this patch the program is then able to continue to completion for both samples.
We would have two questions:
- Is this workaround a sensible fix for the problem? (i.e. can we still trust output from a version of
nucleoatacwhich uses this patch?) - Are there any specific conditions or properties of the input data which might trigger the problem with the subtracted differences all being less than zero? (e.g. could it be caused by the inclusion of a specific "problem" region, or by a particular distribution of fragment length sizes etc.)
We're aware that the program is no longer actively maintained. However we would welcome any comments on the above - thanks for your help!
(Also adding @IanCodes to this issue)