To run jobs multi-threaded or on a computing cluster, epic-analysis can be configured to run one
job per simulation ROOT file. The scripts in this directory support this procedure.
Run any script with no arguments to print its usage documentation.
NOTE: Running epic-analysis using these hpc tools is not as well tested as running
single-threaded; please check everything carefully, especially Q2 weights. Report any
issues; you are welcome to contribute your own scripts to support your preferred computing cluster.
It is highly recommended to test jobs with small samples before launching a full-scale analysis
on all available data.
## Automated Slurm Pipeline

The hpc toolkit includes a built-in pipeline for streamlining analysis across many campaigns, Q2 ranges, energy configurations, and even detector setups. The pipeline aims to automate the following steps entirely on Jefferson Lab's Slurm system:
- Creation of the main s3 `.config` file (typically stored in `datarec/`)
- Calculation of the number of events stored within each s3 file's `TTree` (used for calculating event-by-event weights); these counts are also cached in `hpc/nevents_databases` for faster computation of future pipelines
- Splitting of the main s3 `.config` file into batches (for parallel computing)
- Execution of the analysis macro for each batched `.config` file
- Merging of the output analysis `.root` files into a single `analysis.root` file
The script that handles the pipeline is `hpc/run-local-slurm-pipeline.rb`. The user should edit this script with the desired configuration: the campaigns, the energies of interest within those campaigns, the detector configuration, the number of files from s3 to analyze (per Q2 range), and the number of ROOT files analyzed per slurm job. By default, several of these parameters will trip the error handler until the user sets them appropriately.
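As a rough illustration, the user-editable block might look like the following; apart from `NROOT_FILES_PER_JOB`, which is a parameter named elsewhere in these scripts, the variable names and values here are hypothetical placeholders:

```ruby
# Hypothetical configuration excerpt; edit hpc/run-local-slurm-pipeline.rb
# itself for the actual variable names and defaults.
CAMPAIGNS           = ['23.12.0']        # simulation campaigns to analyze
ENERGIES            = ['18x275']         # beam-energy configurations per campaign
DETECTOR            = 'epic_craterlake'  # detector configuration
NUM_S3_FILES        = 10                 # s3 files to analyze per Q2 range
NROOT_FILES_PER_JOB = 2                  # ROOT files analyzed by each slurm job
```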
Note that the calculation of the number of events in each s3 `TTree`, albeit time-consuming, is essential for parallel computing: the event-by-event Q2 weights depend on how many total events were simulated for each Q2 range. Since the main s3 `.config` file is batched into smaller chunks, this information would be lost unless the event counts are calculated before running the analysis. These counts are then used to set manual weights in the batched `.config` files.
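To sketch the idea (this is not the pipeline's actual code), counting and caching the entries could look like the following; the tree name `events` and the JSON cache format are assumptions, and the real layout under `hpc/nevents_databases` may differ:

```ruby
#!/usr/bin/env ruby
# Minimal sketch of the event-counting step; assumes `root` is in PATH.
require 'json'

cache_file = 'hpc/nevents_databases/example.json'   # hypothetical cache path
cache = File.exist?(cache_file) ? JSON.parse(File.read(cache_file)) : {}

ARGV.each do |root_file|
  next if cache.key?(root_file)   # reuse counts cached by earlier pipelines
  # ask ROOT for the number of TTree entries in this file
  count = `root -l -b -q -e 'auto f=TFile::Open("#{root_file}"); std::cout << ((TTree*)f->Get("events"))->GetEntries() << std::endl;'`
  cache[root_file] = count.split.last.to_i
end

File.write(cache_file, JSON.pretty_generate(cache))
```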
To run the pipeline:

```
hpc/run-local-slurm-pipeline.rb
```
Optionally, pass the `--overwrite` flag to skip the prompt asking whether to delete pre-existing project files.
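For example:

```
hpc/run-local-slurm-pipeline.rb --overwrite
```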
There are several known issues pertaining to memory usage. If `NROOT_FILES_PER_JOB` is too large, the per-job memory allocation listed in `run-local-slurm.rb` may be too small for the analysis macro to create its ROOT `TTree`s. Additionally, the merging of all ROOT `TFile`s into one may run out of memory; this is limited by the memory allocation listed in the pipeline job created by `run-local-slurm-pipeline.rb`. It is currently set to 4000 MB, which should be sufficient in most cases.
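If the merge step nonetheless runs out of memory, the allocation can be raised where the pipeline job is defined; in Slurm this is typically a directive of the form below (the exact location in the generated job script depends on these scripts):

```
#SBATCH --mem=8000   # e.g. raise from 4000 MB if the merge step is killed
```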
## Prepare Config Files

### `hpc/prepare.rb`
A typical config file will list several ROOT files; run `hpc/prepare.rb` to split a config
file into one config file per ROOT file; these config files can each be fed to an analysis
macro. Total yields per Q2 bin are automatically obtained and stored in all config files, to make
sure the resulting Q2 weights are correct for the combined set of files.
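A minimal sketch of the splitting idea (not the script's actual code), assuming ROOT-file entries in a config file can be recognized by a `.root` substring and that any remaining directives are copied into every output:

```ruby
# Illustrative only: split one config file into one config per ROOT file.
# The real hpc/prepare.rb also computes and stores Q2 yields (see above).
lines      = File.readlines('datarec/example.config')   # hypothetical input
root_lines = lines.grep(/\.root/)
directives = lines - root_lines

root_lines.each_with_index do |root_line, i|
  File.write("batch_#{i}.config", (directives + [root_line]).join)
end
```

The multi-file variant described next differs only in grouping the ROOT files, e.g. with `each_slice(n)`, rather than writing one per config.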
### `hpc/prepare-multi-roots.rb`
Alternatively, one can use this script to split the starting config file into multiple config
files, where the user specifies (as a third argument) the number of ROOT files per config file.

## Run Jobs
This step depends on where you want to run jobs. In general, output ROOT files will be written
to a user-specified subdirectory of `out/`.
### `hpc/run-local-condor.rb`
If you have a local condor service, use this script to prepare a condor configuration script.
The user must then run `condor_submit` from outside of eic-shell (individual jobs will be run in
eic-shell). Log files will be written to `hpc/log/`.
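A sketch of the workflow; the generated submit-file name here is hypothetical (the script reports the actual path it writes):

```
# inside eic-shell: generate the condor configuration script
hpc/run-local-condor.rb

# outside eic-shell: submit the jobs (file name is illustrative)
condor_submit hpc/condor.config
```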
### `hpc/run-local-slurm.rb`
- TODO: need Slurm config generator
- TODO: can we avoid using S3?
## Merge Output Files

### `hpc/merge.rb`
After successfully running jobs, combine the resulting output ROOT files; this is basically `hadd`,
but with some handlers for our custom classes (`Histos`, `BinSet`, etc.). The resulting combined
file can then be used in downstream post-processing macros or user analysis.
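A hypothetical invocation (run the script with no arguments for its actual usage):

```
hpc/merge.rb out/myjob   # arguments are illustrative
```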