-
Notifications
You must be signed in to change notification settings - Fork 59
Add ADC pulses to Store #359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Beef up Hit finding README (yay more documentation)
Add entire ADC trace to the pulse class
Expand ADCpulse class to include entire trace
Extend ADCpulse class to include traces
bug fix and additional documentation for the newly added pulse features
more bug fixes to ADCpulse (should work now!)
bug fix for storing traces
trying to make it as backwards compatible as possible
|
Also there will be a follow up PR for a toolchain to easily read the newly added traces. |
jminock
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing surface level incorrect with changes to the Tool or DataModel. Would like further review of version conditional in DataModel if this is appropriate fix for handling different iterations of files @marc1uk or @brichards64
| ar & raw_area_; | ||
| ar & raw_amplitude_; | ||
| ar & calibrated_amplitude_; | ||
| if (version > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does version work? How does it iterate? Is this the correct way to ensure the version differences? These are questions more for Ben and Marcus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pretty much the same as ROOT's schema evolution - the default class version is 0. If you modify the class definition, you need to manually alter the class version (what value you give it is up to you) and the (de)serialisation code to check that and behave appropriately. I don't believe there's any automatic way to ensure it does the 'correct' thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No functionality is being taken away, and the changes would only impact the ADCHitFinder to my knowledge; so, I don't think there's anything wrong with making changes to the DataModel in principle. Will need further review from Ben and/or Marcus regarding the version conditional.
| } | ||
| } | ||
|
|
||
| // extract the x and y points of the pulse (subtract off baseline and "zero" the pulse to the pulse start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this same block is added in four times in this PR. Can we make it a function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of duplicated code in this tool (different pulse-finding approaches are very similar; it's not the most elegant tool). It agree it would be worth while to re-organize this tool into more concise functions. Could we for now add it as a TODO? Rather than making this particular feature its own function, since much of the pulse-finding approaches use the same infrastructure it could instead be folded into a single function that could be called (more than just the trace extraction).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm inclined to think that the fact that this code is added as part of a new feature, even if this was to be refactored, it would be fair for it to be a separate function. Besides, even if the code could do with refactoring, putting unrelated improvements off as TODO is one thing, but making the situation worse is something esle. It shouldn't be a difficult change.
|
|
||
| // Store the freshly made pulse in the vector of found pulses | ||
| pulses.emplace_back(channel_key, | ||
| ( wmin * NS_PER_SAMPLE )-timing_offset, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm curious what this change is about? I can't spot a variable rename from NS_PER_SAMPLE to NS_PER_ADC_SAMPLE....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tool calls both (and both are defined the same way in constants):
ToolAnalysis/DataModel/Constants.h
Line 5 in 9187a75
| constexpr unsigned int NS_PER_SAMPLE = 2; // ns |
ToolAnalysis/DataModel/ANNIEconstants.h
Line 17 in 9187a75
| constexpr unsigned int NS_PER_ADC_SAMPLE = 2; // ns |
I was trying to make it all consistent. It functionally doesn't change anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this pushes the question upstream - why are there two constants that appear to both represent the same thing. "The tool calls both" is not a good thing (and doesn't really explain why this instance in particular should be changed).
Can we do a find-and-replace and remove the redundancy?
|
ok, so andrew's code is the correct way to update a class definition when adding in new variables, but of course causality dictates that old code (which doesn't check the version) assumes the old format and will encounter unexpected data during deserialisation. I don't think there's a way around this - but as Steven says, it's pretty unlikely that you'll be running old analysis code on new data. If you did want to do so, it shouldn't be too difficult to patch in the new variable to the old code. An alternative would be to store the pulse waveform snippet separately, rather than as part of the ADCPulse class, but i expect that'll introduce more complexity than it's worth to maintain backwards compatibilty for edge case studies. |
| unsigned short raw_amplitude, double calibrated_amplitude, | ||
| double charge); | ||
| unsigned short raw_amplitude, double calibrated_amplitude, double charge, | ||
| const std::vector<double>& trace_x = std::vector<double>(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's very odd for the constructor to be taking references as arguments, and in general references should not be initialised to temporaries (I'm a little surprised the compiler let you do it). I suppose ultimately it may work, but I would make this non-references to avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree it is odd / not standard. I haven't noticed any issues though and it seems to behave fine with the event building. We can flag it moving forward if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's odd/non-standard, then it's probably something to not do. 😅
Assuming the point here is to avoid a second copy as part of constructing this object from optional existing data, i think the correct way to do it would be to accept a pointer that defaults to null, and to make the single copy within the constructor if the passed pointer is not null.
| inline double amplitude() const { return calibrated_amplitude_; } | ||
|
|
||
| // @brief Returns the x [ns] and y [ADC] points of the "found" pulse (baseline-subtracted and relative to pulse start point) | ||
| inline const std::vector<double>& GetTraceXPoints() const { return trace_x_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
functions should not return references to member variables.
If the ADCPulse object is destroyed (goes out of scope or gets deleted), a caller may be holding onto a dangling reference to a member of an object that no longer exists (i.e., a segfault waiting to happen). Just return by value and if the caller is smart, they can capture that into a reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing this out. I agree that returning by value is safer. I have not observed any problems in my workflow but its likely because any instances I'm fetching it, its lifetime is short.
In practice I was having some troubles getting something to work, and this ended up working without slowing down the event building or giving any observed issues (both creating the ProcessedData and reading from it). One potential problem I have with changing it is the potential performance impact. This object is created and accessed millions of times per run (15k triggers per part file, 128 PMTs, many hits per event). A real concern is the potential slowdown if returning by value and copying the vector millions of times when churning through the (already) slow event building. I unfortunately don't have a ton of time to stress test this (but have stress tested this change ~thousands of times.
If I had to guess these traces will not be used often as analyzers are largely focused on the hits information. It was mostly added just in case anyone wants it down the line. I think for potential performance concerns (and limited time) we could keep it unless you feel strongly otherwise.
|
ok in light of trying to get things merged I'll leave these as TODOS |
Describe your changes
Since the start of Phase II there was a decision to ditch the PMT waveforms for the ProcessedData and only save the extracted hits (ID, charge, timing, etc...). This was done for a good reason as typically analyzers only need the Hits information and the ProcessedData size was cut down from ~80MB (with the full PMT waveforms) to ~6MB (Hits only) per part file.
Frank brought up a good point that we should not be "throwing away" the pulse information in case there is ever a need to examine the noise or do any sort of analysis on the pulses that are saved in our data.
That information isn't "throw away" technically (as it still can be found in the RawData), but in order to actually look at the raw waveforms and traces, you had to essentially re-run the event building tools via the old
DataDecoder(or the newEventBuilderRaw) toolchain without the hit finding tools, then run another toolchain to pull the waveforms out in a readable format via something like thePrintADCDatatool... this is a pain and many of the tools involved are somewhat deprecated.Instead of saving the entirety of the waveforms, we can instead just save the identified pulse from the hit finding (from pulse start to pulse end) in the ADCPulse class along with the start, stop, amplitude, charge, etc... so that anyone can access it from the ProcessedData if needed.
This PR expands the ADCPulse class to include the raw traces for each hit, allowing it to be accessed by other tools via
RecoADCHits. ThePhaseIIADCHitfindertool populates this with the x (ns) and y (adc) points of the identified pulse, saving them to vectors.The only caution (and question) I have is that if you run an older version of ToolAnalysis (without the changes in this PR) over any newly ProcessedData with the ADCPulse expansion, i.e. if you run the event building tools with these changes and try to load the ProcessedData with the traces using an older version of ToolAnalysis, the
LoadANNIEEventtool will fail to load it from the store. Andrew Sutton made a similar modification to the ADCPulse class as part of his closed PR #324 (he added the stop time of the pulse to the class - I omitted it from PR #358 as I knew i was going to eventually add the whole pulse), and his remedy was to add to ADCPulse.h something like:I have added this to my modifications of the class, though it doesn't seem to remedy that problem. These changes do preserve compatibility when running toolchains over older ProcessedData without the ADCPulse expansion. So any older data remains readable by the newer version of ToolAnalysis.
If anyone (Ben or Marcus in particular) have a comment on what could be happening with this I would be happy to alter my PR. Not sure the issue identified above is really a back-breaker - anyone analyzing data with an out-of-date ToolAnalysis will only be hindered if they are trying to look at the newest data from the new EventBuilderV2 tools... and in that case there are so many new modifications from Yue's latest PRs that they should be used the latest ToolAnalysis anyways.
I also added more information to the
PhaseIIADCHitFinderREADME as it was a little opaque (and we always need more documentation).Checklist before submitting your PR
newusage, there is a reason the data must be on the heapnewthere is adelete, unless I explicitly know why (e.g. ROOT or a BoostStore takes ownership)Modifications are made to the DataModel so additional oversight by Level 0 is needed.
Additional information
The impact on file sizes by adding the traces is not large. Here is a test from an AmBe run on two part files:
Looks to add ~2MB per part file.