GSoC 2026: E2E & End-to-End Deep Learning

Candidate: Anvitha Bhat

Organisation: ML4SCI / E2E

Project: Foundation Models for E2E Event Reconstruction

Overview

This repository presents the E2E Physics Foundation Model for Task 2j at the High-Luminosity LHC. By implementing a Joint-Embedding Predictive Architecture (JEPA) with Linear FastAttention, it solves the $O(N^2)$ bottleneck of traditional Transformers at scale.

Component	Technical Result	Status	Reference
Task 2j: Foundation Model	75.53% Accuracy (0.35 Loss)	Complete	Task README
Task 2g: E2E Inference	Linear O(N) Scaling (Opset 18)	Complete	Task README

Key Navigation

Task 2j (JEPA & FastAttention) — Pre-training, Latent Discovery, and Scaling.
Task 2g (CMSSW Inference) — ONNX Export and Benchmark Guide.
Methodology Deep Dive — Technical Implementation details.

Visual Evidences

E2E Architecture Flowchart

flowchart TD
    A["Raw E2E Data"] --> B["Preprocessing (100k Events)"]
    B --> C["E2E Foundation Model (JEPA)"]
    
    subgraph JEPA["Joint-Embedding Architecture"]
        direction TB
        C1["Input Sequence"] --> C2["Random Masking"]
        C2 --> C3["Context Encoder (FastAttention)"]
        C2 --> C4["Target Encoder (Momentum)"]
        C3 --> C5["Predictor"]
        C4 --> C6["Latent Target"]
        C5 <-->|"JEPA Loss"| C6
    end
    
    C3 --> D["CLS Embedding"]
    D --> E["Classification Head"] --> F{"Result"}
    F --> G["75.53% Accuracy"]
    
    D -.-> H["t-SNE Latent Visualization"]
    C3 -.-> I["ONNX Export (Opset 18)"]
    I -.-> J["E2E Inference (< 5ms)"]

    style JEPA fill:#f9f9f9,stroke:#333,stroke-dasharray: 5 5
    style G fill:#d4edda,stroke:#28a745,color:#155724
    style J fill:#cce5ff,stroke:#004085,color:#004085

Technical Results & Proofs

Latent Manifold (t-SNE)	$O(N)$ Scaling (FastAttention)

Physics Saliency	Loss Decay (0.8 → 0.3)

Representation Learning: JEPA discovers distinct manifold separation between Quark and Gluon jets without explicit labels during pre-training.
Linear Scaling: Replacing $O(N^2)$ attention with FastAttention allows the model to handle High-Luminosity LHC pileup scales (2048+ particles) without memory overflow.
Physics Intuition: Saliency maps confirm the model focuses on core kinematic features ($p_T, \eta, \phi$).

Ablation Summary

The full pipeline (JEPA and FastAttention) achieves a +15% AUC gain over vanilla Transformer baselines on high-multiplicity events.

Technical Architecture

1. Joint-Embedding Predictive Architecture (JEPA)

JEPA predicts latent representations, in contrast to conventional Autoencoders that reconstruct raw pixels. As a result, the model focuses on the physical laws governing energy distributions and is resistant to detector noise.

2. FastAttention O(N) Efficiency

The <5ms HLT latency budget is achieved while preserving global context by linearizing the attention mechanism. As a sparsity ready framework, this architecture can thus perform sophisticated dictionary learning right out of the box.

Repo Structure

Each task lives in its own parent folder with a README.md, models/, and evidence.

E2E_2026/
├── Task_2j_Foundation_Model/    # JEPA pre-training + O(N) attention
│   ├── models/                  #   FastAttention & JepaMAE
│   ├── data/                    #   preprocess_cms.py (100k events, 80-10-10)
│   ├── training/                #   train_cms.py, val_cms.py, run_ablation.py
│   ├── results/
│   │   ├── e2e_flowchart.jpg      #   architecture flowchart
│   │   ├── verify_e2e_results.py  #   primary: run for 75.53%
│   │   ├── weights/               #   pre-trained weights
│   │   └── plots/               #   latent_tsne.png, loss_decay_plot.png, etc.
│   └── README.md              #   Task 2j detail page
│
├── Task_2g_CMSSW_Inference/    # CMSSW-ready ONNX inference
│   ├── onnx_models/             #   part_hybrid_vit.onnx, momentum_regressor.onnx
│   ├── benchmarks/              #   run_onnx_inference.py, benchmark_model.py
│   ├── CMSSW_Guide.md           #   E2E-ready ONNX inference guide
│   └── README.md              #   Task 2g detail page
│
├── reco/                        #   CMSSW inference configuration (inference_cfg.py)
├── utils/                      #   Shared visualization & export scripts
├── proj_data/                  #   Processed .npz splits (train/val/test)
├── QuarkGluon/                 #   Raw E2E parquet files (~22 GB)
└── README.md

Quick Verification

All commands are to be run from the repo root:

# task 2j: reproduce 75.53% accuracy
python Task_2j_Foundation_Model/data/preprocess_cms.py
python Task_2j_Foundation_Model/results/verify_e2e_results.py   # 75.53%

# task 2j: regenerate plots
python utils/visualize_latent_space.py                           # t-SNE
python utils/plot_scaling_comparison.py                          # O(N) scaling

# task 2g: E2E latency benchmark
pip install onnxruntime
python Task_2g_CMSSW_Inference/benchmarks/run_onnx_inference.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GSoC 2026: E2E & End-to-End Deep Learning

Overview

Key Navigation

Visual Evidences

E2E Architecture Flowchart

Technical Results & Proofs

Ablation Summary

Technical Architecture

1. Joint-Embedding Predictive Architecture (JEPA)

2. FastAttention O(N) Efficiency

Repo Structure

Quick Verification

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
Task_2g_CMSSW_Inference		Task_2g_CMSSW_Inference
Task_2j_Foundation_Model		Task_2j_Foundation_Model
reco		reco
utils		utils
.gitignore		.gitignore
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

GSoC 2026: E2E & End-to-End Deep Learning

Overview

Key Navigation

Visual Evidences

E2E Architecture Flowchart

Technical Results & Proofs

Ablation Summary

Technical Architecture

1. Joint-Embedding Predictive Architecture (JEPA)

2. FastAttention O(N) Efficiency

Repo Structure

Quick Verification

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages