Megascale RNA 3D Structure Prediction · 0.509 private TM-score · Stanford RNA 3D Folding Part 2 (Kaggle)
Multi-model RNA 3D structure prediction pipeline with a custom physics relaxation engine, designed for megascale targets (up to 4,640 nt). Achieves a private/public score inversion (0.387 public → 0.509 private) through physics-guided generalization rather than benchmark overfitting.
Full technical whitepaper with derivations · Reviewer notes · Kaggle competition
Reproducibility note:
`SobolevRNA.ipynb` is Kaggle-native: model weights are mounted as Kaggle datasets under `/kaggle/input/`. Running locally requires substituting paths to locally downloaded weights for RNA-FM, Boltz-1, Protenix, and RibonanzaNet-2. See ATTRIBUTION.md for source links to each model.
    Input Sequence (L nucleotides)
            │
            ├─── L ≤ 1022 nt ──► RNA-FM (single pass, 640-dim embeddings)
            │
            └─── L > 1022 nt ──► HWS (Hierarchical Windowed Sensor)
                                 sliding window + taper blend
                                       │
                           ┌───────────▼───────────┐
                           │  Global Contact Map   │
                           │  C_ij ∈ {0,1}^{N×N}   │
                           └───────────┬───────────┘
                                       │
               ┌────────────── Routing by L ──────────────────┐
               │                                              │
          L ≤ 512 nt                                     L > 512 nt
               │                                              │
     ┌─────────▼──────────┐                    ┌──────────────▼────────────┐
     │ Ensemble: Boltz-1  │                    │  SHR Megascale Path       │
     │ + Protenix (N=5)   │                    │  (Stochastic Hamiltonian  │
     └─────────┬──────────┘                    │   Relaxation, JAX x64)    │
               │                               └──────────────┬────────────┘
               └──────────────────┬───────────────────────────┘
                                  │
                      ┌───────────▼───────────┐
                      │  SHR Physics Polish   │
                      │  (E_bond + E_rep +    │
                      │   E_DL + E_Rg)        │
                      └───────────┬───────────┘
                                  │
                      ┌───────────▼───────────┐
                      │  Hungarian Chain Map  │
                      │  + Kabsch Alignment   │
                      └───────────┬───────────┘
                                  │
                      ┌───────────▼───────────┐
                      │  submission.csv       │
                      │  (C1' coordinates)    │
                      └───────────────────────┘
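The two length-based decisions in the diagram can be written down in a few lines. This is only an illustrative sketch: the function names and the returned labels are hypothetical, and the only values taken from the diagram are the 1,022 nt and 512 nt thresholds.

```python
# Hypothetical helper names and labels; only the two thresholds come from the diagram.
RNA_FM_MAX_LEN = 1022    # RNA-FM single-pass limit (nt)
ENSEMBLE_MAX_LEN = 512   # upper bound for the Boltz-1 + Protenix ensemble (nt)


def embedding_path(seq_len: int) -> str:
    """Pick the embedding route for a sequence of seq_len nucleotides."""
    return "rna_fm_single_pass" if seq_len <= RNA_FM_MAX_LEN else "hws_windowed"


def structure_path(seq_len: int) -> str:
    """Pick the 3D generation route; both routes feed the SHR physics polish."""
    if seq_len <= ENSEMBLE_MAX_LEN:
        return "boltz1_protenix_ensemble"
    return "shr_megascale"


if __name__ == "__main__":
    for n in (300, 800, 4640):
        print(n, embedding_path(n), structure_path(n))
```

Note that the two thresholds are independent: a sequence of, say, 800 nt still fits in a single RNA-FM pass but is already past the ensemble limit.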
RNA-FM has a hard architectural truncation at 1,022 nt. For megascale targets (e.g. 9MME = 4,640 nt), a single-pass embedding is impossible. HWS extracts embeddings via overlapping windows and blends them with a linear taper to eliminate hard boundary artifacts.
Parameters:
| Symbol | Value | Description |
|---|---|---|
| | 1022 nt | RNA-FM max window |
| | 768 nt | Stride between windows |
| | 128 nt | Taper length at boundaries |
| | 640 | Embedding dimension |
Weighted accumulation:
For each window
The blended embedding at position
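A minimal NumPy sketch of windowed extraction with a linear taper, using the window, stride, taper, and dimension values from the parameter table. Several details are assumptions made for illustration rather than the notebook's actual behaviour: the last window is right-aligned to the end of the sequence, the ramp is kept strictly positive so the normaliser never vanishes, and `embed_fn` is a hypothetical stand-in for a per-window RNA-FM call.

```python
import numpy as np

WINDOW, STRIDE, TAPER, DIM = 1022, 768, 128, 640  # values from the parameter table


def window_starts(seq_len, window=WINDOW, stride=STRIDE):
    """Start offsets of overlapping windows that together cover [0, seq_len)."""
    if seq_len <= window:
        return [0]
    starts = list(range(0, seq_len - window, stride))
    starts.append(seq_len - window)  # assumption: last window is right-aligned
    return starts


def taper_weights(length, taper=TAPER):
    """Trapezoid weights: linear ramps of `taper` positions at both ends, 1 between."""
    w = np.ones(length)
    t = min(taper, length // 2)
    ramp = np.arange(1, t + 1) / (t + 1)  # strictly positive ramp
    w[:t] = ramp
    w[length - t:] = ramp[::-1]
    return w


def hws_blend(embed_fn, seq_len, dim=DIM):
    """Weighted average of per-window embeddings, returned as (seq_len, dim)."""
    acc = np.zeros((seq_len, dim))
    norm = np.zeros(seq_len)
    for s in window_starts(seq_len):
        e = min(s + WINDOW, seq_len)
        w = taper_weights(e - s)
        acc[s:e] += w[:, None] * embed_fn(s, e)  # embed_fn: hypothetical embedder
        norm[s:e] += w
    return acc / norm[:, None]


if __name__ == "__main__":
    # A constant fake embedder: after normalisation the blend must stay constant.
    fake = lambda s, e: np.ones((e - s, DIM))
    blended = hws_blend(fake, 4640)
    print(blended.shape, window_starts(4640))
```

Dividing by the accumulated weights means the taper shape only controls how neighbouring windows are mixed inside an overlap; positions covered by a single window pass through unchanged.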
The blended embeddings are converted to a global pairwise contact map via cosine similarity:
with
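As a rough illustration of the cosine-similarity step, the sketch below normalises each embedding row, takes all pairwise dot products, and thresholds the result into a binary map, matching the {0,1} entries shown in the diagram. The `threshold` and `min_sep` values are placeholders chosen for the example, not the pipeline's settings.

```python
import numpy as np


def contact_map(emb, threshold=0.5, min_sep=4):
    """Binary contact map from row-wise cosine similarity of embeddings.

    threshold and min_sep are illustrative assumptions, not pipeline values.
    """
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.maximum(norms, 1e-12)      # avoid division by zero
    sim = unit @ unit.T                        # cosine similarity, shape (N, N)
    idx = np.arange(emb.shape[0])
    far = np.abs(idx[:, None] - idx[None, :]) >= min_sep  # drop near-diagonal pairs
    return ((sim > threshold) & far).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(50, 640))
    c = contact_map(emb)
    print(c.shape, int(c.sum()))
```

For a 4,640 nt target the dense similarity matrix has about 21.5 million entries, which is still manageable in memory; much longer inputs would need a blocked computation.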
The physics engine operates on C1′ backbone coordinates
The Flory-scaling target enforces physical compaction:
with
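One common way to express a Flory-style compaction term is a quadratic penalty on the gap between the current radius of gyration and a power-law target in the chain length. The sketch below takes that form as an assumption; the prefactor, exponent, and force constant are illustrative defaults rather than values from the pipeline.

```python
import numpy as np


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))


def flory_target(n, prefactor=5.5, exponent=1.0 / 3.0):
    """Target radius prefactor * n**exponent (both values assumed for this sketch)."""
    return prefactor * n ** exponent


def rg_energy(coords, k_rg=1.0):
    """Quadratic penalty on the deviation of Rg from the power-law target."""
    return k_rg * (radius_of_gyration(coords) - flory_target(len(coords))) ** 2


if __name__ == "__main__":
    # A fully extended 100-bead chain is far from compact, so the penalty is large.
    line = np.zeros((100, 3))
    line[:, 0] = 6.0 * np.arange(100)
    print(radius_of_gyration(line), flory_target(100), rg_energy(line))
```

With an exponent of 1/3 the target grows like a compact globule, so the penalty pulls extended chains inward while leaving already compact ones nearly untouched.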
Gradient descent on
with
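To make the descent step concrete, here is a small NumPy sketch that relaxes a chain under a harmonic bond term only, using an analytic gradient. It is a simplified stand-in: the pipeline is described as running in JAX with 64-bit precision and as combining several energy terms, and the target distance `d0`, force constant `k`, step count, and learning rate below are all assumed for illustration.

```python
import numpy as np


def bond_energy_and_grad(x, d0=6.0, k=1.0):
    """Harmonic energy on consecutive distances and its analytic gradient."""
    b = x[1:] - x[:-1]                        # bond vectors, shape (N-1, 3)
    r = np.linalg.norm(b, axis=1)             # bond lengths
    energy = k * np.sum((r - d0) ** 2)
    f = (2.0 * k * (r - d0) / np.maximum(r, 1e-12))[:, None] * b
    grad = np.zeros_like(x)
    grad[1:] += f                             # dE/dx_{i+1}
    grad[:-1] -= f                            # dE/dx_i
    return energy, grad


def relax(x, steps=500, lr=0.02):
    """Plain gradient descent; returns final coordinates and the energy trace."""
    x = x.copy()
    trace = []
    for _ in range(steps):
        energy, grad = bond_energy_and_grad(x)
        trace.append(energy)
        x -= lr * grad
    return x, trace


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = np.zeros((50, 3))
    x0[:, 0] = 6.0 * np.arange(50)            # ideal spacing along x
    x0 += rng.normal(scale=1.0, size=x0.shape)  # perturb every bead
    _, trace = relax(x0)
    print(trace[0], trace[-1])
```

Additional terms such as repulsion or the radius-of-gyration penalty would contribute their own gradients to the same update; with automatic differentiation the hand-written gradient above becomes unnecessary.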
Parameters adapt linearly with sequence length across three tiers:
| Parameter | | | |
|---|---|---|---|
| | 10.0 | | |
| | 3.0 × | | |
| | 1,000 | 1,500 | 8,000 |
| | 0.02 | 0.01 | 0.005 |
| | 5 | 3 | 1 |
9MME at
Consecutive 3D chunks are aligned via the Kabsch algorithm (Kabsch 1976). Given mobile anchor
The seam between aligned chunks is blended with a linear taper over the overlap region to prevent harmonic-force spikes in the subsequent SHR polish.
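The alignment and seam blending described above can be sketched with the textbook SVD form of the Kabsch algorithm followed by a linear cross-fade. The function names and the anchor arrays are hypothetical; how the pipeline selects anchor points and overlap lengths is not shown here.

```python
import numpy as np


def kabsch(mobile, target):
    """Rotation R and translation t minimising ||(mobile @ R.T + t) - target||."""
    mu_m, mu_t = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mu_m).T @ (target - mu_t)   # 3x3 covariance of centred points
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # guard against improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return rot, mu_t - rot @ mu_m


def blend_seam(prev_overlap, next_overlap):
    """Linear cross-fade over the overlap, from the previous chunk to the next."""
    w = np.linspace(0.0, 1.0, len(prev_overlap))[:, None]
    return (1.0 - w) * prev_overlap + w * next_overlap


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchors = rng.normal(size=(30, 3))        # hypothetical mobile anchor points
    c, s = np.cos(0.5), np.sin(0.5)
    rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    moved = anchors @ rz.T + np.array([1.0, -2.0, 3.0])
    rot, t = kabsch(anchors, moved)
    aligned = anchors @ rot.T + t
    print(np.abs(aligned - moved).max())
```

The determinant check matters for noisy anchors: without it the SVD solution can be a reflection, which would mirror the chunk instead of rotating it.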
| Set | TM-score |
|---|---|
| Public leaderboard | 0.38650 |
| Private leaderboard | 0.50934 |
The private > public inversion reflects that SHR physics generalizes to novel RNA families (the private set composition) better than pure deep learning baselines.
| Model | Source | Role |
|---|---|---|
| RNA-FM | Chen et al. 2022 | Sequence embeddings (HWS input) |
| RibonanzaNet-2 | Shujun717/Kaggle | Contact map (E_DL fallback) |
| Boltz-1 | odat1248/Kaggle | 3D seed generation (L ≤ 800 nt) |
| Protenix v1 | qiweiyin/Kaggle | 3D generation (L ≤ 512 nt) |
| USalign | Zhang Lab | TM-score evaluation |
See ATTRIBUTION.md for full credits and licenses.
If you build on HWS or SHR, please cite:
    @misc{kinder2026rna,
      author = {Kinder, Hunter},
      title  = {SobolevRNA: Megascale RNA 3D Structure Prediction with Hierarchical
                Windowed Sensing and Stochastic Hamiltonian Relaxation},
      year   = {2026},
      url    = {https://github.com/aurascoper/SobolevRNA}
    }