Candidate: Anvitha Bhat A
My Medium blog post for this project: Click here
To ensure quick access without extensive setup, the key project resources are linked below:
| Resource Type | Description | Link |
|---|---|---|
| Interactive Demo | Verify QED/QCD predictions live | Open in Colab |
| Pre-trained Weights | Final JEPA backbone weights (0.098 MSE) | Download (.pth) |
| Inference Script | Lightweight CLI tool for batch verification | View Code |
- **Tokenization Transparency (Feature 1):** A raw QED amplitude string (e.g., `mul(pow(alpha,2), Tr(gamma_mu, slash(p1), gamma_nu, slash(p2)), div(1, s))`) is passed through the prefix normalizer and converted into an integer token sequence in real time, with each intermediate representation printed for full auditability.
- **O(N) FastAttention Backbone (Feature 2):** The normalized token tensor is forwarded through the LM-JEPA context encoder, producing a contextual embedding of shape `(1, 16, 256)` via linear-complexity attention, confirming stable latent dimensionality with zero quadratic memory overhead.
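For illustration, the prefix-tokenization step can be sketched in a few lines of Python. The `tokenize_prefix` and `encode` helpers and the on-the-fly vocabulary are hypothetical stand-ins for the actual Task 1.2 normalizer, not the project's code:

```python
import re

# Hypothetical sketch: split a prefix-notation amplitude string into
# symbols, then map each symbol to an integer id.
def tokenize_prefix(expr: str) -> list[str]:
    # Strip parentheses, commas, and whitespace; prefix notation is
    # unambiguous without the delimiters.
    parts = re.split(r"[(),\s]+", expr)
    return [p for p in parts if p]

def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    # Grow the vocabulary on the fly; a real pipeline would freeze it.
    return [vocab.setdefault(t, len(vocab)) for t in tokens]

expr = "mul(pow(alpha,2), div(1, s))"
tokens = tokenize_prefix(expr)
vocab: dict[str, int] = {}
ids = encode(tokens, vocab)
print(tokens)  # ['mul', 'pow', 'alpha', '2', 'div', '1', 's']
print(ids)     # [0, 1, 2, 3, 4, 5, 6]
```

Printing both intermediate representations mirrors the auditability idea described above: every stage of the string-to-tensor conversion stays inspectable.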
These results can be replicated using the interactive notebook linked in the Quick Links section above.
Full environment setup, dependency installation, and verification steps are documented in SETUP.md.
The flowchart below highlights the sequential data flow from preprocessing to the foundation model, depicting a pipeline that uses advanced representation learning and symbolic mathematics to predict squared amplitudes.
```mermaid
graph TD
    A[Raw QED/QCD Symbolic Expressions] --> B[Task 1.2: Prefix Tokenization & Index Normalization]
    B -->|Preprocessed Data Handover| C[Task 2.5: LM-JEPA Transformer Backbone]
    C -->|Linear FastAttention| D[Predicted Squared Amplitudes]
    D -.->|Complexity Scaling| E((Convergence))
```
The progressive deliverables and the corresponding performance metrics are cataloged below. Please click the directory links to navigate to the specific task folders.
| Task ID | Component | Metric (MSE Loss) | Visual Validation | Task Documentation | Notebook / Weights |
|---|---|---|---|---|---|
| 1.2 | Data Pre-processing | N/A (Lossless) | Reconstruction Proof | Task 1.2 README | Solution PDF |
| 2.5 | LM-JEPA Pre-training | 0.125 | JEPA Loss Curve | Task 2.5 README | Local Weights |
| 2.5 | QCD Fine-tuning | 0.098 | Parity Plot | Task 2.5 README | Local Weights |
The table below demonstrates the robustness of the model and the efficiency of the FastAttention mechanism when predicting squared amplitudes for expressions of varying lengths:
| Operand Count | Validation MSE Loss | Inference Infrastructure Status |
|---|---|---|
| 2 Operands | 0.091 | Stable |
| 4 Operands | 0.098 | Stable |
| 6+ Operands | 0.112 | Stable |
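A minimal numpy sketch shows why linear attention keeps memory flat as operand count grows: the N×N attention matrix is never materialized. The feature map `phi` and all shapes here are illustrative assumptions, not the repository's FastAttention implementation:

```python
import numpy as np

# Positive feature map (ELU + 1), a common choice for linear attention.
def phi(x):
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    # O(N * d^2): accumulate K^T V once and reuse it for every query,
    # instead of forming the O(N^2) attention matrix.
    qf, kf = phi(q), phi(k)          # (N, d) feature-mapped queries/keys
    kv = kf.T @ v                    # (d, d) summary of keys and values
    z = qf @ kf.sum(axis=0)          # (N,) per-query normalizer
    return (qf @ kv) / z[:, None]

rng = np.random.default_rng(0)
n, d = 16, 8                         # toy sequence length and head dim
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = linear_attention(q, k, v)
print(out.shape)  # (16, 8)
```

By associativity, `(phi(Q) phi(K)^T) V = phi(Q) (phi(K)^T V)`, so the output matches full kernel attention while the per-token cost stays linear in sequence length.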
- **$O(N)$ Complexity Architecture:** The integration of FastAttention allows the model to efficiently process deep symbolic expressions containing more than four operands. This structural enhancement circumvents the computational bottleneck of standard $O(N^2)$ Transformers, preventing out-of-memory errors and maintaining rapid inference times on extended mathematical sequences.
- **JEPA Latent Space:** The Joint-Embedding Predictive Architecture (JEPA) facilitates robust representation learning. By operating in the latent space, the model establishes a foundational prior for Feynman diagrams entirely unsupervised, capturing the underlying physics before conventional supervised fine-tuning is initiated.
Variance control during training was a key discovery made during the model optimization stage. While predicting complex QCD amplitudes, Index Normalization proved to be the crucial intervention needed to stabilize the validation loss. Regularizing the index distributions before tokenization kept the magnitude of the attention gradients bounded, avoiding divergence and guaranteeing smooth convergence.
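As a rough illustration of the idea (the `mu_<n>` index pattern and the `normalize_indices` helper are hypothetical, not the project's actual implementation), dummy indices can be canonically renamed in order of first appearance, so that equivalent expressions map to identical token sequences and the index vocabulary stays small:

```python
import re

def normalize_indices(expr: str, prefix: str = "idx") -> str:
    mapping: dict[str, str] = {}

    def rename(match: re.Match) -> str:
        name = match.group(0)
        if name not in mapping:
            # First appearance fixes the canonical name.
            mapping[name] = f"{prefix}{len(mapping)}"
        return mapping[name]

    # Assume dummy indices follow the pattern mu_<number> (illustrative).
    return re.sub(r"mu_\d+", rename, expr)

print(normalize_indices("Tr(gamma_mu_7, slash(p1), gamma_mu_7, gamma_mu_3)"))
# Tr(gamma_idx0, slash(p1), gamma_idx0, gamma_idx1)
```

Collapsing arbitrary index labels onto a small canonical set is what keeps the index distribution regular across samples, which is the property credited above with stabilizing the attention gradients.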


