Add ADC pulses to Store #359

S81D · 2025-09-23T21:04:27Z

Describe your changes

Since the start of Phase II, the decision has been to drop the PMT waveforms from the ProcessedData and only save the extracted hits (ID, charge, timing, etc.). This was done for a good reason: analyzers typically only need the hit information, and the ProcessedData size was cut down from ~80 MB (with the full PMT waveforms) to ~6 MB (hits only) per part file.

Frank brought up a good point that we should not be "throwing away" the pulse information in case there is ever a need to examine the noise or do any sort of analysis on the pulses that are saved in our data.

That information isn't technically "thrown away" (it can still be found in the RawData), but to actually look at the raw waveforms and traces you essentially had to re-run the event building tools via the old DataDecoder (or the new EventBuilderRaw) toolchain without the hit finding tools, then run another toolchain to pull the waveforms out in a readable format via something like the PrintADCData tool. This is a pain, and many of the tools involved are somewhat deprecated.

Rather than saving the entirety of the waveforms, we can just save the identified pulse from the hit finding (from pulse start to pulse end) in the ADCPulse class along with the start, stop, amplitude, charge, etc., so that anyone can access it from the ProcessedData if needed.

This PR expands the ADCPulse class to include the raw traces for each hit, allowing it to be accessed by other tools via RecoADCHits. The PhaseIIADCHitfinder tool populates this with the x (ns) and y (adc) points of the identified pulse, saving them to vectors.

The only caution (and question) I have is that if you run an older version of ToolAnalysis (without the changes in this PR) over any new ProcessedData with the ADCPulse expansion, i.e. if you run the event building tools with these changes and then try to load the resulting ProcessedData (with the traces) using an older version of ToolAnalysis, the LoadANNIEEvent tool will fail to load it from the store. Andrew Sutton made a similar modification to the ADCPulse class as part of his closed PR #324 (he added the stop time of the pulse to the class; I omitted it from PR #358 as I knew I was going to eventually add the whole pulse), and his remedy was to add to ADCPulse.h something like:

template <class Archive> void serialize(Archive& ar,
      const unsigned int version)
    {
      Hit::serialize(ar, version);

      ar & start_time_;
      ar & peak_time_;
      ...
      ...
      if (version > 0) {
        ar & stop_time;
      }

...
...

};

// Need to increment the class version since we added time as a new variable
// the version number ensures backward compatibility when serializing 
BOOST_CLASS_VERSION(ADCPulse, 1)

I have added this to my modifications of the class, though it doesn't seem to remedy that problem. These changes do preserve compatibility when running toolchains over older ProcessedData without the ADCPulse expansion. So any older data remains readable by the newer version of ToolAnalysis.

If anyone (Ben or Marcus in particular) has a comment on what could be happening with this, I would be happy to alter my PR. I'm not sure the issue identified above is really a back-breaker: anyone analyzing data with an out-of-date ToolAnalysis will only be hindered if they are trying to look at the newest data from the new EventBuilderV2 tools, and in that case there are so many new modifications from Yue's latest PRs that they should be using the latest ToolAnalysis anyway.

I also added more information to the PhaseIIADCHitFinder README as it was a little opaque (and we always need more documentation).

Checklist before submitting your PR

This PR implements a single change (one new/modified Tool, or a set of changes to implement one new/modified feature)
This PR alters the minimum number of files to affect this change
[N/A] If this PR includes a new Tool, a README and minimal demonstration ToolChain is provided
[N/A] If a new Tool/ToolChain requires model or configuration files, their paths are not hard-coded, and means of generating those files is described in the readme, with examples provided on /pnfs/annie/persistent
[N/A] For every new usage, there is a reason the data must be on the heap
[N/A] For every new there is a delete, unless I explicitly know why (e.g. ROOT or a BoostStore takes ownership)

Modifications are made to the DataModel so additional oversight by Level 0 is needed.

Additional information

The impact on file sizes by adding the traces is not large. Here is a test from an AmBe run on two part files:

without trace | with trace
------------- | ----------
22 MB         | 24 MB
24 MB         | 26 MB

Looks to add ~2MB per part file.

Beef up Hit finding README (yay more documentation)

Add entire ADC trace to the pulse class

Expand ADCpulse class to include entire trace

Extend ADCpulse class to include traces

bug fix and additional documentation for the newly added pulse features

more bug fixes to ADCpulse (should work now!)

bug fix for storing traces

trying to make it as backwards compatible as possible

S81D · 2025-09-23T21:08:35Z

Also there will be a follow up PR for a toolchain to easily read the newly added traces.

jminock

Nothing looks incorrect at the surface level in the changes to the Tool or DataModel. I would like further review of the version conditional in the DataModel, to confirm whether it is an appropriate fix for handling different iterations of files, from @marc1uk or @brichards64.

jminock · 2025-09-24T15:46:13Z

DataModel/ADCPulse.h

      ar & raw_area_;
      ar & raw_amplitude_;
      ar & calibrated_amplitude_;
+      if (version > 0) {


How does version work? How does it iterate? Is this the correct way to ensure the version differences? These are questions more for Ben and Marcus.

pretty much the same as ROOT's schema evolution - the default class version is 0. If you modify the class definition, you need to manually alter the class version (what value you give it is up to you) and the (de)serialisation code to check that and behave appropriately. I don't believe there's any automatic way to ensure it does the 'correct' thing.

jminock · 2025-09-24T15:48:30Z

DataModel/ADCPulse.h

No functionality is being taken away, and the changes would only impact the ADCHitFinder to my knowledge; so, I don't think there's anything wrong with making changes to the DataModel in principle. Will need further review from Ben and/or Marcus regarding the version conditional.

marc1uk · 2025-10-09T23:21:21Z

UserTools/PhaseIIADCHitFinder/PhaseIIADCHitFinder.cpp

        }
      }

+	// extract the x and y points of the pulse (subtract off baseline and "zero" the pulse to the pulse start)


this same block is added in four times in this PR. Can we make it a function?

There's a lot of duplicated code in this tool (the different pulse-finding approaches are very similar; it's not the most elegant tool). I agree it would be worthwhile to re-organize this tool into more concise functions. Could we add it as a TODO for now? Rather than making this particular feature its own function, since the pulse-finding approaches share much of the same infrastructure, it could instead be folded into a single function covering more than just the trace extraction.

i'm inclined to think that, since this code is added as part of a new feature, it would be fair for it to be a separate function even if the tool were later refactored. Besides, even if the code could do with refactoring, putting unrelated improvements off as a TODO is one thing, but making the situation worse is something else. It shouldn't be a difficult change.
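
For illustration, the duplicated extraction block could be factored out roughly as follows. This is a minimal sketch, not the code in the PR: the names `PulseTrace` and `ExtractPulseTrace` are hypothetical, and it assumes the waveform is a vector of raw ADC samples, the pulse bounds are inclusive sample indices, and the baseline is already known. `NS_PER_ADC_SAMPLE` is redefined locally with the value quoted from ANNIEconstants.h so that the snippet stands alone.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the value quoted from ANNIEconstants.h in this thread.
constexpr unsigned int NS_PER_ADC_SAMPLE = 2; // ns

// Hypothetical container for the extracted points; not part of the PR.
struct PulseTrace {
  std::vector<double> x_ns;  // time relative to pulse start [ns]
  std::vector<double> y_adc; // baseline-subtracted amplitude [ADC]
};

// Hypothetical helper: copy samples [start, end] (inclusive, an assumption)
// out of a waveform, subtracting the baseline and zeroing the time axis at
// the pulse start.
inline PulseTrace ExtractPulseTrace(const std::vector<unsigned short>& samples,
                                    std::size_t start, std::size_t end,
                                    double baseline)
{
  assert(start <= end && end < samples.size());
  PulseTrace trace;
  trace.x_ns.reserve(end - start + 1);
  trace.y_adc.reserve(end - start + 1);
  for (std::size_t s = start; s <= end; ++s) {
    trace.x_ns.push_back(static_cast<double>((s - start) * NS_PER_ADC_SAMPLE));
    trace.y_adc.push_back(static_cast<double>(samples[s]) - baseline);
  }
  return trace;
}
```

Each of the four pulse-finding branches could then call the helper with its own start, end and baseline instead of repeating the loop.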

marc1uk · 2025-10-09T23:23:16Z

UserTools/PhaseIIADCHitFinder/PhaseIIADCHitFinder.cpp

+
    // Store the freshly made pulse in the vector of found pulses
    pulses.emplace_back(channel_key,
-      ( wmin * NS_PER_SAMPLE )-timing_offset,


i'm curious what this change is about? I can't spot a variable rename from NS_PER_SAMPLE to NS_PER_ADC_SAMPLE....

The tool calls both (and both are defined the same way in constants):

ToolAnalysis/DataModel/Constants.h

Line 5 in 9187a75

constexpr unsigned int NS_PER_SAMPLE = 2; // ns

and

ToolAnalysis/DataModel/ANNIEconstants.h

Line 17 in 9187a75

constexpr unsigned int NS_PER_ADC_SAMPLE = 2; // ns

I was trying to make it all consistent. It functionally doesn't change anything.

i think this pushes the question upstream - why are there two constants that appear to both represent the same thing? "The tool calls both" is not a good thing (and doesn't really explain why this instance in particular should be changed).
Can we do a find-and-replace and remove the redundancy?
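
If a full find-and-replace is not done in one go, one possible interim step is to keep a single canonical definition and turn the other name into an alias. This is only a sketch: which name survives is a project decision, and it assumes the header that keeps the old name can see the canonical definition.

```cpp
// Canonical definition (value as quoted from ANNIEconstants.h).
constexpr unsigned int NS_PER_ADC_SAMPLE = 2; // ns

// Hypothetical interim alias: the old name forwards to the canonical one,
// so the two values cannot silently diverge while call sites are migrated.
constexpr unsigned int NS_PER_SAMPLE = NS_PER_ADC_SAMPLE; // ns
```

Once no call site uses the alias any more, it can simply be deleted.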

marc1uk · 2025-10-09T23:48:58Z

ok, so andrew's code is the correct way to update a class definition when adding in new variables, but of course causality dictates that old code (which doesn't check the version) assumes the old format and will encounter unexpected data during deserialisation. I don't think there's a way around this - but as Steven says, it's pretty unlikely that you'll be running old analysis code on new data. If you did want to do so, it shouldn't be too difficult to patch in the new variable to the old code.

An alternative would be to store the pulse waveform snippet separately, rather than as part of the ADCPulse class, but i expect that'll introduce more complexity than it's worth to maintain backwards compatibility for edge case studies.
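
The asymmetry described above can be illustrated with a toy example. This sketch deliberately does not use Boost: `ToyPulse`, `Save`, `LoadNew` and `LoadOldAndCountLeftovers` are hypothetical names, and the whitespace-separated text format is only a stand-in for a real archive. It shows a version-aware reader handling both layouts, while a reader that assumes the old layout leaves unread data behind when given a new-format record.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Toy pulse with one original field and one field added in class version 1.
struct ToyPulse {
  double start_time = 0.0;
  std::vector<double> trace_y; // added in version 1
};

// Writer: emits the class version first, then the fields for that version.
inline void Save(std::ostream& os, const ToyPulse& p, unsigned int version)
{
  os << version << ' ' << p.start_time << ' ';
  if (version > 0) {
    os << p.trace_y.size() << ' ';
    for (double y : p.trace_y) { os << y << ' '; }
  }
}

// New reader: checks the stored version, so it handles both layouts.
inline bool LoadNew(std::istream& is, ToyPulse& p)
{
  unsigned int version = 0;
  if (!(is >> version >> p.start_time)) { return false; }
  p.trace_y.clear();
  if (version > 0) {
    std::size_t n = 0;
    if (!(is >> n)) { return false; }
    p.trace_y.resize(n);
    for (double& y : p.trace_y) {
      if (!(is >> y)) { return false; }
    }
  }
  return true;
}

// Old reader: predates the version check and assumes the version-0 layout.
// Returns how many tokens are left unread after it thinks it has finished.
inline std::size_t LoadOldAndCountLeftovers(std::istream& is, ToyPulse& p)
{
  unsigned int version = 0;
  is >> version >> p.start_time; // version is read but never acted on
  std::size_t leftovers = 0;
  std::string token;
  while (is >> token) { ++leftovers; }
  return leftovers;
}
```

In a stream holding many objects back to back, those leftover tokens would be misread as the start of the next object, which is the "unexpected data" problem in miniature.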

marc1uk · 2025-10-10T11:26:42Z

DataModel/ADCPulse.h

-      unsigned short raw_amplitude, double calibrated_amplitude,
-      double charge);
+      unsigned short raw_amplitude, double calibrated_amplitude, double charge, 
+      const std::vector<double>& trace_x = std::vector<double>(),


It's very odd for the constructor to be taking references as arguments, and in general references should not be initialised to temporaries (I'm a little surprised the compiler let you do it). I suppose ultimately it may work, but I would make this non-references to avoid confusion.

Yeah I agree it is odd / not standard. I haven't noticed any issues though and it seems to behave fine with the event building. We can flag it moving forward if you want.

If it's odd/non-standard, then it's probably something to not do. 😅
Assuming the point here is to avoid a second copy as part of constructing this object from optional existing data, i think the correct way to do it would be to accept a pointer that defaults to null, and to make the single copy within the constructor if the passed pointer is not null.
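
For reference, the two alternatives raised in this thread could look roughly like the sketch below. The classes `PulseByValue` and `PulseByPointer` are hypothetical stand-ins reduced to a single trace member, not the real ADCPulse. As a side note, binding a `const` reference parameter to a temporary default argument is well-defined C++ (the temporary lives until the end of the full expression containing the call), so the form used in the PR is legal; the question here is one of clarity.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for ADCPulse: pass-by-value ("sink") parameter.
class PulseByValue {
 public:
  // The argument is copied (or moved) into trace_x, then moved into the
  // member, so an lvalue argument costs one copy and an rvalue costs none.
  explicit PulseByValue(std::vector<double> trace_x = {})
    : trace_x_(std::move(trace_x)) {}

  std::size_t size() const { return trace_x_.size(); }

 private:
  std::vector<double> trace_x_;
};

// Hypothetical stand-in for ADCPulse: optional pointer parameter.
class PulseByPointer {
 public:
  // Null means "no trace supplied"; otherwise copy exactly once.
  explicit PulseByPointer(const std::vector<double>* trace_x = nullptr)
  {
    if (trace_x != nullptr) { trace_x_ = *trace_x; }
  }

  std::size_t size() const { return trace_x_.size(); }

 private:
  std::vector<double> trace_x_;
};
```

Both variants leave the caller's vector untouched when it is passed as an lvalue, and both default to an empty trace when the argument is omitted.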

marc1uk · 2025-10-10T11:28:47Z

DataModel/ADCPulse.h

    inline double amplitude() const { return calibrated_amplitude_; }

+    // @brief Returns the x [ns] and y [ADC] points of the "found" pulse (baseline-subtracted and relative to pulse start point)
+    inline const std::vector<double>& GetTraceXPoints() const { return trace_x_; }


functions should not return references to member variables.
If the ADCPulse object is destroyed (goes out of scope or gets deleted), a caller may be holding onto a dangling reference to a member of an object that no longer exists (i.e., a segfault waiting to happen). Just return by value and if the caller is smart, they can capture that into a reference.

Thank you for pointing this out. I agree that returning by value is safer. I have not observed any problems in my workflow, but that's likely because in every instance where I fetch it, its lifetime is short.

In practice I was having some trouble getting something to work, and this ended up working without slowing down the event building or causing any observed issues (both when creating the ProcessedData and when reading from it). One concern I have with changing it is the potential performance impact. This object is created and accessed millions of times per run (15k triggers per part file, 128 PMTs, many hits per event), so returning by value and copying the vector millions of times could slow down the (already) slow event building. I unfortunately don't have a ton of time to stress test this (but have stress tested this change ~thousands of times).

If I had to guess these traces will not be used often as analyzers are largely focused on the hits information. It was mostly added just in case anyone wants it down the line. I think for potential performance concerns (and limited time) we could keep it unless you feel strongly otherwise.
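
The trade-off discussed here can be sketched with a reduced stand-in class. `PulseSketch` and both getter names below are hypothetical; the real ADCPulse has many more members. The by-value getter costs one vector copy per call but hands the caller an independent object, while the by-reference getter costs nothing but is only valid while the pulse itself is alive.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for ADCPulse, reduced to the trace member.
class PulseSketch {
 public:
  explicit PulseSketch(std::vector<double> trace_x)
    : trace_x_(std::move(trace_x)) {}

  // Returns a copy: the result stays valid after the pulse is destroyed.
  std::vector<double> GetTraceXPointsCopy() const { return trace_x_; }

  // Returns a reference: no copy, but only valid while *this is alive.
  const std::vector<double>& GetTraceXPointsRef() const { return trace_x_; }

 private:
  std::vector<double> trace_x_;
};

// Sums a trace fetched by value after the owning pulse has been destroyed.
inline double SumAfterOwnerDestroyed()
{
  std::unique_ptr<PulseSketch> pulse(
      new PulseSketch(std::vector<double>{0.0, 2.0, 4.0}));
  // Binding a const reference to the returned temporary extends the
  // temporary's lifetime to that of xs.
  const std::vector<double>& xs = pulse->GetTraceXPointsCopy();
  pulse.reset(); // the owner is gone; xs still refers to the caller's copy
  double sum = 0.0;
  for (double x : xs) { sum += x; }
  return sum;
}
```

Doing the same with `GetTraceXPointsRef()` would leave `xs` dangling after `pulse.reset()`, which is the hazard raised in the review. Whether the extra copy matters at event-building rates is something only a measurement on real data could settle.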

marc1uk · 2025-10-22T00:31:24Z

ok in light of trying to get things merged I'll leave these as TODOS

S81D added 8 commits September 23, 2025 14:54

Update README.md

2f97474

Beef up Hit finding README (yay more documentation)

Update PhaseIIADCHitFinder.cpp

812801d

Add entire ADC trace to the pulse class

Update ADCPulse.cpp

2eabe6c

Expand ADCpulse class to include entire trace

Update ADCPulse.h

d0dd88a

Extend ADCpulse class to include traces

Update ADCPulse.h

50d88fb

bug fix and additional documentation for the newly added pulse features

Update ADCPulse.cpp

3795074

more bug fixes to ADCpulse (should work now!)

Update PhaseIIADCHitFinder.cpp

e14e79c

bug fix for storing traces

Update ADCPulse.h

a64f0e4

trying to make it as backwards compatible as possible

S81D assigned marc1uk and jminock and unassigned marc1uk Sep 23, 2025

jminock reviewed Sep 24, 2025

jminock added help wanted question labels Sep 24, 2025

S81D mentioned this pull request Oct 7, 2025

PrintADCTraces tool #363

Merged

6 tasks

marc1uk reviewed Oct 9, 2025

marc1uk added waiting for submitter and removed help wanted question labels Oct 9, 2025

marc1uk reviewed Oct 10, 2025

jminock added Ready for Level 0 Review and merge and removed waiting for submitter labels Oct 20, 2025

marc1uk added the waiting for submitter label Oct 21, 2025

marc1uk merged commit aea75aa into ANNIEsoft:Application Oct 22, 2025
1 check passed
