ALICE Experiment and Testbed Simulation: A Novel Algorithmic Particle Physics Learning Experiment
GNNs for HEP Particle Tracking
TODO:
- GNN in this folder (David)
- XGBoost to do radial stuff
- XGBoost + KMeansClustering to identify the 18-sided structures present in the data
- Generalize `treeToCSV` to any ROOT structure, and use arguments
- ROOT files, stored in `roots/*.root` and scraped from CERN's open data EOS, are converted into CSVs using `root treeToCSV.C` after downloading `root` from https://root.cern/install/. This stores each ROOT tree as a separate CSV in `csvs/Clusters/` and `csvs/RecTracks`.
- Convert CSVs to NPZs using `processCSVtoNPZ.py`, which stores `../clusters.npz` and `../tracks.npz`.
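The CSV-to-NPZ step could look roughly like the sketch below. It is not the contents of `processCSVtoNPZ.py`. The helper name `csv_dir_to_npz`, the one-array-per-column layout, and the assumption that every CSV has a header row followed by purely numeric columns are all illustrative. Note that `np.genfromtxt` sanitizes header names, so a column such as `fLabel[3]` would be stored under a modified key.

```python
import glob
import os

import numpy as np


def csv_dir_to_npz(csv_dir, npz_path):
    """Concatenate every CSV in csv_dir and save one array per column.

    Hypothetical helper: assumes each CSV has a header row, only numeric
    columns, and the same columns as every other file in the directory.
    """
    paths = sorted(glob.glob(os.path.join(csv_dir, "*.csv")))
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {csv_dir}")

    # names=True turns the header row into structured-array field names.
    # atleast_1d covers files that contain a single data row.
    tables = [
        np.atleast_1d(np.genfromtxt(path, delimiter=",", names=True))
        for path in paths
    ]
    names = tables[0].dtype.names
    columns = {
        name: np.concatenate([table[name] for table in tables])
        for name in names
    }

    # Each column becomes one named array inside the .npz archive.
    np.savez(npz_path, **columns)
    return columns
```

Under these assumptions, `csv_dir_to_npz("csvs/Clusters", "../clusters.npz")` would write the cluster archive, and `np.load("../clusters.npz")["fSubdetId"]` would read a single column back.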
The steps below assume that the particle clusters are stored in `../clusters.npz`.
Run `visualizeData.py --events 0 1 2 3 4` to visualize the `fSubdetId` field of the clusters in the selected events. The script plots the clusters with plotly and colors them by this field.
Specifications for `visualizeData` are as follows:
- Events are indexed 0 through 317.
- `visualizeData` automatically generates only 300,000 points at a time, because plotting more than 500,000 points with plotly can crash your computer.
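The event selection and point cap described above can be sketched as follows. This is not the code in `visualizeData.py`. The names `parse_events` and `cap_points` are hypothetical, and uniform random subsampling is an assumed way of staying under the cap; the script may select points differently.

```python
import argparse
import random

MAX_POINTS = 300_000  # cap stated above; plotly can crash past ~500,000 points
NUM_EVENTS = 318      # events are indexed 0 through 317


def parse_events(argv=None):
    """Parse `--events 0 1 2 ...` and reject out-of-range indices."""
    parser = argparse.ArgumentParser(description="Select events to plot.")
    parser.add_argument(
        "--events",
        type=int,
        nargs="+",
        required=True,
        choices=range(NUM_EVENTS),
        metavar="EVENT",
        help="event indices, 0 through 317",
    )
    return parser.parse_args(argv).events


def cap_points(points, max_points=MAX_POINTS, seed=0):
    """Return at most max_points items.

    Assumption: when there are too many points, keep a uniform random
    sample. A fixed seed keeps the plot reproducible between runs.
    """
    points = list(points)
    if len(points) <= max_points:
        return points
    return random.Random(seed).sample(points, max_points)
```

For example, `parse_events(["--events", "0", "1", "2", "3", "4"])` returns `[0, 1, 2, 3, 4]`, while an index of 318 or higher makes `argparse` exit with an error.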
`boostData.py` experiments with the parameters used to model several fields of the dataset loaded from `../clusters.npz`, namely `fDetId`, `fSubdetId`, and `fLabel[3]`. `fSubdetId` is the most interesting and shows preliminary track processing.
- An ensemble tree method using `XGBoost` reaches 99.6% accuracy, to within a single integer class, for `fSubdetId`, which ranges from 0 up to 7000 and isn't trivially spatially correlated.
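One possible reading of "accuracy to within a single integer" is the fraction of predictions that land at most one class away from the true `fSubdetId`. The sketch below computes that metric in plain Python. The function name `accuracy_within` and this interpretation are assumptions, not necessarily how the 99.6% figure was obtained.

```python
def accuracy_within(y_true, y_pred, tolerance=1):
    """Fraction of predictions within `tolerance` integer classes of the truth.

    Assumed interpretation of "accuracy to within a single integer":
    a prediction counts as correct when |predicted - true| <= tolerance.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty set of predictions")

    hits = sum(
        1
        for truth, pred in zip(y_true, y_pred)
        if abs(int(pred) - int(truth)) <= tolerance
    )
    return hits / len(y_true)


# Three of the four predictions are off by at most one class.
print(accuracy_within([10, 20, 30, 7000], [10, 21, 33, 6999]))  # 0.75
```

Setting `tolerance=0` gives the ordinary exact-match accuracy, which makes it easy to see how much of the score comes from off-by-one predictions.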