Skip to content

Conversation

@atcsutton
Copy link
Contributor

Describe your changes

Here's a second try at this PR after some cleanup, extension, and optimization.

The primary change is adding in the PMTWaveformSim tool. Details on the implementation can be found on docDB: https://annie-docdb.fnal.gov/cgi-bin/sso/ShowDocument?docid=5821. One major change that has been implemented since that presentation is a time optimization. Originally, I was simulating a full 70 us readout window for every hit. That means ~35000 samples, which is very time intensive. Now, I am only simulating up to the latest hit time for each PMT. Adding the waveform simulation cause about 2X more run time compared to just using the true MCHits. However, the ClusterFinder is still much more intensive; needing about 20s to run over a single test file, compared to the 4s that PMTWaveformSim requires.

I have also included some minor changes to downstream tools that allows us to incorporate the simulated waveforms. PhaseIIADCHitFinder needed some specific changes and a new running mode flag. Additionally, I added the ability to skip an execute loop in PhaseIIADCHitFinder, ClusterFinder, and BackTracker. The skip variable is set in the ANNIEEvent by PMTWaveformSim in the case that there are no good MCHits to simulate. Skipping the execute of downstream tools prevents unnecessary error prints from occurring.

The final change is unrelated to the waveform simulation, but is an issue I found during this work. As reported on the last slide of the above docDB, the baseline variance calculation implemented in PhaseIIADCCalibrator::make_calibrated_waveforms_ze3ra_multi is incorrect. Originally, the noise sigma was determined by taking the variance of multiple baseline means. In the case of a single baseline sample this procedure will return a variance of 0. The proper thing to do is to average out the variances of each baseline mean estimation. Additionally, there is a feature which uses the "first baseline" if all of the determined baselines are too noisy. In that case, the original implementation would inadvertently set the baseline to 0.

Breakdown of the 21 files changed:
8 are part of the example ToolChain config
3 are the actual tool (header, source, and readme)
2 are the extension of ADCPulses to record their stop time.
2 are extending BackTracker to handle MC waveforms
3 are changes to PhaseIIADCHitFinder and ClusterFinder to handle MC waveforms
2 are generic changes to Unity and Factory
1 is the PhaseIIADCCalibrator bug fix

Checklist before submitting your PR

  • [✔︎] This PR implements a single change (one new/modified Tool, or a set of changes to implement one new/modified feature)
  • [✔︎] This PR alters the minimum number of files to affect this change
  • [✔︎] If this PR includes a new Tool, a README and minimal demonstration ToolChain is provided
  • [✔︎] If a new Tool/ToolChain requires model or configuration files, their paths are not hard-coded, and means of generating those files is described in the readme, with examples provided on /pnfs/annie/persistent
  • [✔︎] For every new usage, there is a reason the data must be on the heap
  • [✔︎] For every new there is a delete, unless I explicitly know why (e.g. ROOT or a BoostStore takes ownership)

Additional Material

Attach any validation or demonstration files here. You may also link to relavant docdb articles.

Andrew Sutton and others added 12 commits November 21, 2024 13:12
…. Also change TGraph pointer to an object.
…/ToolAnalysis into feature/asutton_MCPMTWaveforms
…ly need them to interface with BackTracker. Second, adding the stop_time to the ADCPulse (have version control and default value for backward compatibility). Third, integrating BackTracker. Fourth, allowing for an upstream tool to signal to PhaseIIADCHitFinder, ClusterFinder, and BackTracker that a given Execute step will be skipped. This occurs if there are no MCHits to process or if no good waveforms are produced, and prevents red herring errors from being thrown.
@S81D
Copy link
Collaborator

S81D commented Feb 3, 2025

Here are some downstream tools that may require modifications in order to handle the new waveform MC hits:

I don't have experience using many of these tools so it will need to be tested and verified. But at minimum if we push this PR we should modify the README's to let users know the tool may break / not work properly if using the MC Waveforms, as was discussed in one of the Software calls.

…of the MCParticle Idx in all cases rather than the particle ID. This makes access to the underlying particle a bit more straightforward, assuming the vector order does not change. Expansions include the addition of the earliest, mean, and median MCHit times. When Clusters are made with Hits (ie. PMTWaveformSim), there is a matcing step to determine the relevant MCHits, and those are used for these calculations.
@S81D S81D self-assigned this Aug 21, 2025
@S81D
Copy link
Collaborator

S81D commented Aug 21, 2025

Hey @atcsutton, there is an outstanding PR #344 in which I modified the data hit finding to reflect what was done in the Gains calibration. (Hopefully) we can discuss that on Monday (Aug 25th) get that merged soon after.

This change pulls in multiple different tools, including the PhaseIIADCHitFinder tool - when PR 344 gets merged could you rebase your branch onto the latest main so that everything places nicely? Sorry in advance for the extra work.

I have some other comments I will provide below on the other tools.

@S81D
Copy link
Collaborator

S81D commented Aug 21, 2025

I've tested this toolchain at length and everything works as intended when using these tools within the toolchain you provided. The DataModel (ADCPulse) is changed to save more of the pulse information - it would be nice to confirm this does not break any of the downstream EventBuilder tools.

Per the PR guide, given the change to the DataModel @marc1uk could you take a look at this?

@S81D
Copy link
Collaborator

S81D commented Aug 21, 2025

Another concern is the mixing of data-like hits (and the data hit finding tool) with an MC toolchain. While I think this a good approach as we will be using the same tool for both simulation and data, it will require additional modifications to downstream tools in order to read both the MC truth information and the data hits (in tools like ClusterFinder it is easy to just specify you are using Hits instead of MCHits, but in others such as in the PhaseIITreeMaker tool it uses that input to read both the MCHits and truth particle information).

I have two pull requests (#329 and #330) that partially address those issues. I will also submit a PR later with changes that ensure all the tools play nicely (I essentially just add a config variable PMTWaveformSim that will grab the MC truth information with the data hits). The important thing is this doesn't seem to break anyone's simulation tools if they aren't using the waveform simulator.

DetectorGeoFile ./configfiles/LoadGeometry/DetectorGeometrySpecs.csv
LAPPDGeoFile ./configfiles/LoadGeometry/LAPPDGeometry.csv
TankPMTGeoFile ./configfiles/LoadGeometry/FullTankPMTGeometry.csv
TankPMTGainFile ./configfiles/LoadGeometry/ChannelSPEGains_BeamRun20192020.csv
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an outdated Gains file. Please use ./configfiles/LoadGeometry/ChannelSPEGains2023.csv (https://github.com/ANNIEsoft/ToolAnalysis/blob/Application/configfiles/LoadGeometry/ChannelSPEGains2023.csv)

@S81D
Copy link
Collaborator

S81D commented Aug 21, 2025

Another point I want to raise is this:

  // An upstream tool may opt to skip this execution stage
  // For example the PMTWaveformSim tool will skip events with no MCHits or if
  // no waveforms are produced.
  bool skip = false;
  bool got_skip_status = m_data->Stores["ANNIEEvent"]->Get("SkipExecute", skip);
  if (got_skip_status && skip) {
    Log("ClusterFinder: An upstream tool told me to skip this event.",v_warning,verbose);
    return true;
  } 

You added this condition to the ClusterFinder and PhaseIIADCHitFinder - I understand why its there given its computationally beneficial to skip the event and not generate the waveform and is there mainly to prevent unnecessary error prints from occurring, as you stated. However, in using this toolchain it seems like any of the other downstream tools using the output of this tool also need this condition added (for example the PhaseIITreeMaker). In addition, for certain analyses (like the neutron efficiency or cross sections) we need the total number of entries and the event information even if there are no MC hits. I noticed, for example, changing the TreeMaker to accommodate this event skipping means you lose that event in the TriggerTree as the event is skipped entirely. I'm not sure if this is the best approach. Could we instead add a "blank" entry or something similar? I realize this is easier said than done but It makes counting the true number of triggers (or counting events that did not yield any MC hits) difficult.

@S81D S81D added the on hold label Aug 21, 2025
@S81D
Copy link
Collaborator

S81D commented Aug 21, 2025

I will go through the PMTWaveformSim and BackTracker more thoroughly - so far most of it looks fine. Only thing that's really popping out is the use of new and ROOT TGraphs that I need to double check for memory leakage:

PMTWaveformSim
here: fRandom = new TRandom3();
here: TGraph grTemp = TGraph();

BackTracker
line 466

@atcsutton
Copy link
Contributor Author

Another point I want to raise is this:

  // An upstream tool may opt to skip this execution stage
  // For example the PMTWaveformSim tool will skip events with no MCHits or if
  // no waveforms are produced.
  bool skip = false;
  bool got_skip_status = m_data->Stores["ANNIEEvent"]->Get("SkipExecute", skip);
  if (got_skip_status && skip) {
    Log("ClusterFinder: An upstream tool told me to skip this event.",v_warning,verbose);
    return true;
  } 

You added this condition to the ClusterFinder and PhaseIIADCHitFinder - I understand why its there given its computationally beneficial to skip the event and not generate the waveform and is there mainly to prevent unnecessary error prints from occurring, as you stated. However, in using this toolchain it seems like any of the other downstream tools using the output of this tool also need this condition added (for example the PhaseIITreeMaker). In addition, for certain analyses (like the neutron efficiency or cross sections) we need the total number of entries and the event information even if there are no MC hits. I noticed, for example, changing the TreeMaker to accommodate this event skipping means you lose that event in the TriggerTree as the event is skipped entirely. I'm not sure if this is the best approach. Could we instead add a "blank" entry or something similar? I realize this is easier said than done but It makes counting the true number of triggers (or counting events that did not yield any MC hits) difficult.

You're right, this needs to be revisited. This was a bit of a hacky workaround to keep events/executes aligned. I'd be fine to remove these flags and use a blank entry. I thought that I tried that solution but it didn't quite work, but that was months ago now and I can't recall the exact issue.

@S81D
Copy link
Collaborator

S81D commented Aug 25, 2025

It was decided in the software meeting (Aug 25th, 2025) that we will close this PR; I have some outstanding modifications on the PMTWaveformSim + downstream tools that I currently use that I will merge as a separate PR.

Andrew agreed to re-open a PR with the BackTracker changes - there are a few bug fixes pointed out by Franklin that need to be added so it would be more appropriate as a standalone PR.

As per the comments above on the event skipping - while this should be changed it works fine for now and we do not have a current working solution to fill the waveforms with blank entries or an empty vector. We will work on coming up with a better solution but for now I will open a PR with the event skipping, then we can brainstorm/test ways to change this feature as a future PR.

@S81D S81D closed this Aug 25, 2025
@S81D S81D removed the on hold label Aug 25, 2025
@S81D S81D mentioned this pull request Sep 23, 2025
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants