Package for the calibration of CL values vs yield for the ATLAS dark matter STA project
A secondary goal of this repository is to learn a little more about how git works.
There are a number of python scripts in this package. Some are executable, and these are the ones documented below. The non-executable python files contain classes that are used by the executable scripts. You'll need to refer to the code itself (and associated comments) to see which classes are used where.
First, check out the repo
$ git clone https://github.com/mikeflowerdew/DMSTA_calibration
$ cd DMSTA_calibration/

On top of the git repository, you also need to have a directory called Data_Yields (located within DMSTA_calibration). The contents are large, so personally I make this a soft link to a data-file area. It needs to contain:
- D3PDs.txt, with one simulated dataset per line.
- SummaryNtuple_STA_all_version4.root (or change the name in SkimYieldFile.py if yours is different).
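As an illustration, the dataset list can be read with a few lines of Python. The exact format of D3PDs.txt is an assumption here (one dataset name per line, as described above); skipping blank and `#`-comment lines is an extra robustness guess, not something the real scripts are known to do.

```python
# Sketch: read the dataset list, one simulated dataset per line.
# Blank lines and '#' comments are skipped (an assumption, for robustness).
def read_dataset_list(path="Data_Yields/D3PDs.txt"):
    datasets = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                datasets.append(line)
    return datasets
```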
Many scripts make plots using the ATLAS style, so you also need this correctly installed. You can test whether it works like this:
$ python
> import ROOT
> ROOT.gROOT.LoadMacro("AtlasStyle.C")
> ROOT.SetAtlasStyle()
> ROOT.gROOT.LoadMacro("AtlasUtils.C")

If that all works OK, then you're good to go.
First, the summary ntuple must be split and drastically reduced in size (otherwise later processing steps will be very slow). The command to do this is:
$ ./SkimYieldFile.py

This creates three skimmed files in Data_Yields/:
- SummaryNtuple_STA_sim.root, with just the 500 simulated models (deduced from D3PDs.txt).
- SummaryNtuple_STA_evgen.root, with the ~460k models with evgen (deduced from the ntuple itself).
- SummaryNtuple_STA_noevgen.root, containing all models not in SummaryNtuple_STA_evgen.root.
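The three-way split can be pictured as simple set arithmetic. This is a sketch of the logic inferred from the description above; the real SkimYieldFile.py loops over a ROOT ntuple, and the function name here is hypothetical.

```python
# Sketch of the three-way split (logic inferred from the text above;
# the real script works on a ROOT ntuple, not Python sets).
def split_models(all_models, simulated_models, evgen_models):
    """Partition model IDs into the sim / evgen / noevgen categories."""
    sim = all_models & simulated_models      # the ~500 simulated models
    evgen = all_models & evgen_models        # the ~460k models with evgen
    noevgen = all_models - evgen_models      # everything else
    return sim, evgen, noevgen
```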
This script has no command-line options: in principle it is a "do once and forget" script. It's possible to make some plots comparing the three files using ./SimBias.py; however, there are some unsolved problems with the histogram binning, which make the interpretation rather difficult.
For obscure reasons (I do not have HistFitter on my office desktop), this step is performed in a separate directory:
$ cd HistFitter/
$ ./HistFitterLoop.py

This uses the information in HistFitter/PaperSRData.dat (all numbers taken/inferred from the published papers) to produce the CLs calibration curves. There are no command-line options. The fit setup is really just the HistFitter tutorial, with a slightly complicated procedure to scan the full CLs range efficiently (given that CLs is a result of the fit, not an input).
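Because CLs comes out of the fit rather than going in, mapping out the full CLs range means searching over the signal yield until the fit returns the desired CLs. The following toy sketch illustrates one such search (a bisection); the `cls_of_yield` function is a made-up placeholder standing in for an expensive HistFitter fit, and none of this is the actual HistFitterLoop.py procedure.

```python
import math

# Placeholder for an expensive HistFitter fit: a made-up, monotonically
# falling CLs as a function of the injected signal yield.
def cls_of_yield(s):
    return math.exp(-0.1 * s)

def yield_for_target_cls(target, lo=0.0, hi=1000.0, tol=1e-4):
    """Bisect for the signal yield whose CLs equals `target`."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls_of_yield(mid) > target:
            lo = mid  # CLs still too large: need more signal
        else:
            hi = mid
    return 0.5 * (lo + hi)
```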
Processing this step takes something like 6 hours. The output is stored in HistFitter/CLsFunctions_logCLs.root, while the last fit for each SR is in HistFitter/SR*/. Don't forget to
$ cd ../

before continuing.
Now, the CLs values in the Data_* directories are calibrated against the truth yields in SummaryNtuple_STA_sim.root. This is very fast.
$ ./CorrelationPlotter.py

This creates a directory called plots_officialMC/ with lots of plots that can be copied straight over to the support note.
The behaviour of CorrelationPlotter.py can be altered using command-line options (use -h to see them all). The most important ones are:
- To use the original evgen instead of the official MC for the truth yields, with output in plots_privateMC/:

  $ ./CorrelationPlotter.py --truthlevel
- To compare the real combined CLs values with those from various combinations of the per-SR CLs values, using input from Data_*_combination and with output in productcheck/:

  $ ./CorrelationPlotter.py --productcheck
This applies the calibration performed in the previous step to the SummaryNtuple_STA_evgen.root ntuple produced in step 1. To run the code, just do:
$ ./CombineCLs.py --all -e

The output of this script goes into a new directory called results/. You'll notice it creates a subdirectory, whose name changes depending on which command-line options you provide. The files created in this directory are:
- Inputs for the STAs: STAresults.csv and DoNotProcess.txt. The STAs need both files - the latter contains models which have insufficient truth info to process correctly.
- Plots for the support note: CLsplot.pdf, LogCLsplot.pdf, NSRplot.pdf, and CLresults.root.
- A LaTeX summary of the main results: SRcountTable.tex, which can be directly copied to SupportNote/Tables/ in the SVN area.
- Some summary information in pickled format: SRcount.pickle and ExclusionCount.pickle.
The pickle files are a cache of the main results, so that simple changes to the other outputs can be made without rerunning the event loop (i.e. in seconds rather than minutes). To skip the event loop, simply remove the --all argument when you run the script. If you want to change any of the other options, you typically have to reinstate --all.
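The caching pattern described above can be sketched in a few lines. This is an illustration only, with a hypothetical helper name; the real CombineCLs.py keys its behaviour off the --all flag rather than a `rerun` argument.

```python
import os
import pickle

# Sketch of the cache pattern: pickle the expensive result once, then
# reload it on later runs instead of repeating the event loop.
def load_or_compute(cache_path, compute, rerun=False):
    if not rerun and os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)  # fast path: seconds, not minutes
    result = compute()             # slow path: the "event loop"
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result
```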
Some command-line options control exactly how the CLs is computed:
- -n 10 can be used for testing the event loop, where you only want to run over a few models. In this case, "test" is added to the results directory name.
- --truthlevel will use the input from plots_privateMC/ instead of plots_officialMC/ (assuming you already created it!). The same name substitution is made in the results directory name.
- --strategy twosmallest will multiply the two smallest CLs values together, instead of just using the smallest (which is the default). This also changes the output directory name in an obvious way. Other strategies could be added by making changes to Combiner.__AnalyseModel in CombineCLs.py.
- --truncate replaces all CLs values less than 1e-6 with a value of 1e-6 (after the "strategy" has been applied). The word "Truncate" is appended to the strategy name in the output directory name.
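The combination options above can be sketched as follows. The logic is inferred purely from the text (smallest CLs by default, product of the two smallest for twosmallest, truncation at 1e-6 applied afterwards); the function name is hypothetical and this is not the code of Combiner.__AnalyseModel.

```python
# Sketch of the CLs combination strategies described above
# (logic assumed from the text, not copied from CombineCLs.py).
def combine_cls(cls_values, strategy="smallest", truncate=False):
    ordered = sorted(cls_values)
    if strategy == "smallest":
        result = ordered[0]                 # the default
    elif strategy == "twosmallest":
        result = ordered[0] * ordered[1]    # product of two smallest
    else:
        raise ValueError("unknown strategy: %s" % strategy)
    if truncate:
        result = max(result, 1e-6)          # applied after the strategy
    return result
```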