A selective state space model (Mamba architecture) for photon energy estimation from raw MKID IQ timestream data. Replaces the traditional coordinate transform + optimal filter pipeline with a learned model that operates directly on I(t), Q(t). Designed to fit inside an FPGA-class resource budget on the MKIDGen3 RFSoC readout platform.
Supports both MLX (Apple Silicon) and PyTorch (NVIDIA GPU / CPU) via a unified backend abstraction. The active backend is auto-detected at import time or overridden with VENOM_BACKEND=mlx|torch.
A single 3,314-parameter model, trained per-detector from the same recipe, evaluated on published calibration data:
| Detector | N wavelengths | Range | VENOM mean KDE R | OF mean KDE R | Δ |
|---|---|---|---|---|---|
| InHf bilayer (Zobrist 2022) | 10 | 254–1310 nm | 26.6 | 24.8 | +7% |
| PtSi (Zobrist 2019) | 5 | 808–1310 nm | 19.8 | 12.2 | +62% |
InHf OF baseline uses Zobrist's per-wavelength-template optimal filter (10 hand-tuned templates from mkidcalculator.filter_pulses(template_mask=True), shipped as InHfData/metadata.json). PtSi OF baseline uses the published shared 920 nm template. VENOM uses a single model with no per-wavelength tuning.
See paper/venom.pdf for the full writeup.
VENOM uses a two-tier design motivated by the eventual FPGA deployment target:
Tier 1 — SSM Backbone (SSMBackbone): Processes raw IQ samples through a single Mamba block (LayerNorm, causal depthwise Conv1D of width 4, selective SSM with input-dependent B, C, Δ, SiLU gate). Diagonal S4D-real state matrix stored as log_neg_A for stability. Runs sample-by-sample in recurrent mode at ADC rate.
Tier 2 — Energy Head (EnergyHead): Takes the full sequence of backbone features, applies learned attention pooling over time, and regresses photon energy with a 3-layer GELU MLP. Outputs (μ̂, log σ̂²) for Gaussian NLL training. Runs at photon rate (orders of magnitude slower than ADC rate).
- Parallel (
__call__): Segmented Hillis-Steele associative scan over the full sequence, used during training. Accumulated in float64 to limit rounding error at long sequence lengths. - Recurrent (
step): O(1) per-sample with carried(h, conv_state), matches the eventual FPGA datapath.
Numerical agreement between the two paths is characterized across the full validation set by scripts/parallel_recurrent_distribution.py. Worst-case |e_par − e_rec| is 1.4 meV on PtSi (sequence length 126) and 1.7 meV on InHf (sequence length 426) — under 5% of per-pulse σ on both datasets. Per-wavelength R values shift by ≤ 0.12 R units between the two modes.
backend.py— unified MLX / PyTorch abstraction. Exportsmx,nn,optim. Backend selected byVENOM_BACKENDor auto-detected.synthetic_pca.py— PCA-based synthetic pulse generator. Streaming Gram eigendecomposition → K=50 principal components, per-wavelength mean via CubicSpline and covariance via shape-preserving PCHIP interpolation, Cholesky sampling plus orthogonal CSD reconstruction residuals. Produces 120k synthetic pulses/epoch.synthetic.py— legacy template-interpolation synthetic generator (kept for reference; superseded bysynthetic_pca.py).ssvkernel.py— vectorized locally-adaptive KDE from Shimazaki & Shinomoto (2010); used bycompute_resolving_powerfor the KDE FWHM.
| Parameter | Value |
|---|---|
d_model |
16 |
d_state |
8 |
n_layers |
1 |
d_conv |
4 |
expand |
1 |
d_head_hidden |
32 |
| Total params | 3,314 |
Python 3.11+.
# Core dependencies
pip install numpy scipy tqdm safetensors
# Pick one compute backend:
pip install mlx # Apple Silicon
pip install torch # NVIDIA GPU / CPU
# Reports and figures
pip install matplotlib reportlabBackend auto-selects MLX only when it can run model code; headless sessions fall back to PyTorch. Force one with VENOM_BACKEND=torch or VENOM_BACKEND=mlx.
The winner checkpoints and training metadata used for all numbers in the paper live in opt/:
| File | Contents |
|---|---|
opt/venom_inhf_20260415_143354.safetensors |
InHf winner weights (3,314 params) |
opt/venom_inhf_20260415_143354_meta.json |
Training history + validation indices |
opt/venom_ptsi_20260415_071143.safetensors |
PtSi winner weights |
opt/venom_ptsi_20260415_071143_meta.json |
Training history + validation indices |
opt/parallel_recurrent_distribution.json |
Per-pulse ` |
opt/RESULTS_2026-04-15.md |
Session log documenting how the winners were reached |
opt/RESULTS_{inhf,ptsi}_winner_20260415.pdf |
Final training-report PDFs for each dataset |
Loading a winner checkpoint:
from mkid_ssm import MKIDEnergySSM
import mlx.core as mx # or: from backend import mx
model = MKIDEnergySSM(d_model=16, d_state=8, n_layers=1,
d_conv=4, expand=1, d_head_hidden=32)
mx.eval(model.parameters())
model.load_weights("opt/venom_inhf_20260415_143354.safetensors")New .safetensors checkpoints are written in the true safetensors format. Legacy VENOM checkpoints that used a .safetensors suffix for PyTorch zip archives still load through model.load_weights().
# InHf (10 wavelengths, 254-1310 nm)
python mkid_ssm.py --downsample 4 --n-components 50 \
--loss gaussian_nll --epochs 25 --n-synthetic 120000 \
--n-layers 1 --d-model 16 --d-state 8 --expand 1 \
--peak-sigma 2.0 --report-every 25
# PtSi (5 wavelengths, 808-1310 nm)
python mkid_ssm.py --ptsi --downsample 4 --n-components 50 \
--loss gaussian_nll --epochs 25 --n-synthetic 120000 \
--n-layers 1 --d-model 16 --d-state 8 --expand 1 \
--peak-sigma 2.0 --report-every 25The two commands differ only in --ptsi. Both use the same model, training recipe, and PCA synthetic generator. Pre-trained weights for both datasets live in opt/venom_{inhf,ptsi}_20260415_*.safetensors.
scripts/run_remote_train.py is a Ray-Client launcher that runs the standard PyTorch training path on a remote CUDA host without modifying any training code. The driver only needs ray==2.55.1 matched to the cluster's Python minor version (3.12 for our reference setup running an NGC PyTorch container with PyTorch 2.11, CUDA 13, RTX 5090); all heavy imports (torch, mkid_ssm) happen on the worker.
# One-time: driver env matched to the cluster's Python minor version
conda create -n pyray python=3.12 -y
/opt/anaconda3/envs/pyray/bin/pip install "ray[client]==2.55.1"
# Site-specific defaults: copy .env.example to .env and edit, or
# export the variables in your shell, or pass --ray-address /
# --data-root explicitly on the CLI.
cp .env.example .env # then edit .env with your cluster URL and data path
# Submit a run; flags after `--` are forwarded verbatim to mkid_ssm.main()
/opt/anaconda3/envs/pyray/bin/python scripts/run_remote_train.py \
--run-name ptsi_winner \
-- --ptsi --downsample 4 --n-components 50 --loss gaussian_nll \
--epochs 25 --n-synthetic 120000 --n-layers 1 --d-model 16 \
--d-state 8 --expand 1 --peak-sigma 2.0 --report-every 25The launcher uploads the repo source as runtime_env.working_dir (data, weights, and zips are excluded), submits a single @ray.remote(num_gpus=1) task that symlinks <data-root>/{InHf,PtSi}Data into the working dir, sets VENOM_BACKEND=torch, and pulls weights/, training_reports_*/, and VENOM_*.pdf back into runs/<run-name>/ after the run completes.
| Flag | Meaning |
|---|---|
--ptsi |
Use PtSi dataset instead of InHf |
--downsample N |
Anti-aliased FIR decimation factor (default 4) |
--n-tau T |
Pulse window = T × τ_qp past trigger (default 5) |
--peak-sigma S |
MAD peak-height clip, σ (default 2.0) |
--loss {mse,gaussian_nll} |
Default gaussian_nll |
--n-synthetic N |
Synthetic pulses per epoch (0 = disable, default winner = 120k) |
--n-components K |
PCA components for synthetic generator (default 50) |
--epochs N |
Training epochs |
--report-every N |
Emit multi-page PDF training report every N epochs |
--full |
Use full trace, no pulse windowing |
--jitter |
Per-wavelength label jitter |
--plot-features |
Plot all backbone feature channels after training |
--synthetic |
Use fully synthetic (no real) calibration for quick tests |
load_mkid_data() loads per-wavelength .npz files from a data directory. Each file must contain i_trace, q_trace, mask, peak_heights. Energy labels are computed from
When a metadata.json with an opt_filt_R dictionary is present, those per-wavelength OF resolving powers are used in preference to recomputing from peak_heights. This is how the published per-wavelength-template OF values from Zobrist 2022 are restored for InHf (shipped with InHfData/metadata.json). PtSi has no metadata.json and uses a post-clip KDE of the stored shared-template peak heights, matching Zobrist 2019.
from mkid_ssm import load_mkid_data
iq, energies, wl_ids, wl_labels, opt_R, peak_heights = load_mkid_data(
data_dir='InHfData',
downsample=4,
n_tau=5.0,
peak_height_sigma=2.0,
)| Dataset | Dir | Wavelengths | Raw rate | Source |
|---|---|---|---|---|
| InHf bilayer | InHfData/ |
254–1310 nm (10) | 0.8 MHz | Zobrist 2022 |
| PtSi | PtSiData/ |
808–1310 nm (5) | 2.0 MHz | Zobrist 2019 |
The raw IQ calibration data (InHfData.zip ≈ 1.1 GB, PtSiData.zip ≈ 580 MB) is too large to ship through GitHub. It is available on request — contact the Mazin Lab and we will provide a download link. Extract each archive in the repo root so InHfData/ and PtSiData/ sit next to mkid_ssm.py.
-
compute_resolving_power(values, energy)— KDE-based$R = E / \mathrm{FWHM}$ using the locally adaptive ssvkernel (Shimazaki & Shinomoto 2010). Headline metric; matchesmkidcalculator's convention. -
summarize_resolving_power(pred, true, wl_ids)— Gaussian-approximation$R = E / (2.355 \cdot \mathrm{RMSE})$ . Stricter (penalizes per-wavelength bias). Used as a training monitor. Edge wavelengths excluded from the aggregate.
- Optimizer: AdamW, weight decay 1e-4
- LR schedule: cosine annealing with 5-epoch linear warmup
- Loss: Gaussian NLL (default) or MSE. Gaussian NLL outputs
(μ, log σ²); sigma absorbs heavy tails so training stays stable on outlier-heavy calibration data. - Validation: 15% stratified-by-wavelength hold-out (fixed seed)
- Gradient accumulation: configurable micro-batch size to cap peak memory
- Gradient checkpointing: backbone layers
Auxiliary scripts live in scripts/:
| Script | Purpose |
|---|---|
train_single.py |
Train one config with periodic PDF reports |
run_remote_train.py |
Ray-Client launcher: run the PyTorch training path on a remote CUDA host |
generate_paper_figures.py |
Regenerate all paper figures from winner weights |
parallel_recurrent_distribution.py |
Characterize parallel-vs-recurrent mode agreement on the full validation set |
convert_inhf_data.py |
Convert mkidcalculator pickles to the common npz format |
make_report.py |
PDF data summary (raw-trace overview) |
explore_data.py |
Quick data-exploration plots |
sweep_hyperparams_v2.py |
Grid hyperparameter sweep with per-config reports |
pytest tests/
# In headless/CI sessions:
VENOM_BACKEND=torch pytest tests/Covers: model shapes, recurrent/parallel equivalence, KDE accuracy against Gaussian theory, edge cases, stratified splitting, micro-batch weights, IQ normalization.
- Gu & Dao, "Mamba: Linear-Time Sequence Modeling with Selective State Spaces," COLM 2024
- Shimazaki & Shinomoto, "Kernel Bandwidth Optimization in Spike Rate Estimation," J. Comput. Neurosci. 29, 171 (2010)
- Zobrist et al., "Wide-band parametric amplifier readout and resolution of optical microwave kinetic inductance detectors," Appl. Phys. Lett. 115, 042601 (2019)
- Zobrist et al., "Membraneless Phonon Trapping and Resolution Enhancement in Optical Microwave Kinetic Inductance Detectors," Phys. Rev. Lett. 129, 017701 (2022)
- Smith et al., "MKIDGen3: Energy-resolving single-photon-counting MKID readout on an RFSoC," Rev. Sci. Instrum. 95, 114705 (2024)
- Fritsch & Carlson, "Monotone Piecewise Cubic Interpolation," SIAM J. Numer. Anal. 17, 238 (1980)
If you use VENOM in published work, please cite this repository and the associated paper (forthcoming). A CITATION.cff file is included; GitHub will render a "Cite this repository" shortcut in the sidebar.
BSD 3-Clause. See LICENSE for the full text. Briefly: you may use, modify, and redistribute this software with or without modification, provided the copyright notice is retained and the Mazin Lab / UCSB name is not used to endorse derivative products without written permission.