Skip to content

MazinLab/VENOM

Repository files navigation

VENOM: Very Efficient Neural Optimal-filter for MKIDs

A selective state space model (Mamba architecture) for photon energy estimation from raw MKID IQ timestream data. Replaces the traditional coordinate transform + optimal filter pipeline with a learned model that operates directly on I(t), Q(t). Designed to fit inside an FPGA-class resource budget on the MKIDGen3 RFSoC readout platform.

Supports both MLX (Apple Silicon) and PyTorch (NVIDIA GPU / CPU) via a unified backend abstraction. The active backend is auto-detected at import time or overridden with VENOM_BACKEND=mlx|torch.

Headline results

A single 3,314-parameter model, trained per-detector from the same recipe, evaluated on published calibration data:

Detector N wavelengths Range VENOM mean KDE R OF mean KDE R Δ
InHf bilayer (Zobrist 2022) 10 254–1310 nm 26.6 24.8 +7%
PtSi (Zobrist 2019) 5 808–1310 nm 19.8 12.2 +62%

InHf OF baseline uses Zobrist's per-wavelength-template optimal filter (10 hand-tuned templates from mkidcalculator.filter_pulses(template_mask=True), shipped as InHfData/metadata.json). PtSi OF baseline uses the published shared 920 nm template. VENOM uses a single model with no per-wavelength tuning.

See paper/venom.pdf for the full writeup.

Architecture

VENOM uses a two-tier design motivated by the eventual FPGA deployment target:

Tier 1 — SSM Backbone (SSMBackbone): Processes raw IQ samples through a single Mamba block (LayerNorm, causal depthwise Conv1D of width 4, selective SSM with input-dependent B, C, Δ, SiLU gate). Diagonal S4D-real state matrix stored as log_neg_A for stability. Runs sample-by-sample in recurrent mode at ADC rate.

Tier 2 — Energy Head (EnergyHead): Takes the full sequence of backbone features, applies learned attention pooling over time, and regresses photon energy with a 3-layer GELU MLP. Outputs (μ̂, log σ̂²) for Gaussian NLL training. Runs at photon rate (orders of magnitude slower than ADC rate).

Dual execution modes

  • Parallel (__call__): Segmented Hillis-Steele associative scan over the full sequence, used during training. Accumulated in float64 to limit rounding error at long sequence lengths.
  • Recurrent (step): O(1) per-sample with carried (h, conv_state), matches the eventual FPGA datapath.

Numerical agreement between the two paths is characterized across the full validation set by scripts/parallel_recurrent_distribution.py. Worst-case |e_par − e_rec| is 1.4 meV on PtSi (sequence length 126) and 1.7 meV on InHf (sequence length 426) — under 5% of per-pulse σ on both datasets. Per-wavelength R values shift by ≤ 0.12 R units between the two modes.

Supporting modules

  • backend.py — unified MLX / PyTorch abstraction. Exports mx, nn, optim. Backend selected by VENOM_BACKEND or auto-detected.
  • synthetic_pca.py — PCA-based synthetic pulse generator. Streaming Gram eigendecomposition → K=50 principal components, per-wavelength mean via CubicSpline and covariance via shape-preserving PCHIP interpolation, Cholesky sampling plus orthogonal CSD reconstruction residuals. Produces 120k synthetic pulses/epoch.
  • synthetic.py — legacy template-interpolation synthetic generator (kept for reference; superseded by synthetic_pca.py).
  • ssvkernel.py — vectorized locally-adaptive KDE from Shimazaki & Shinomoto (2010); used by compute_resolving_power for the KDE FWHM.

Winner configuration (2026-04-15)

Parameter Value
d_model 16
d_state 8
n_layers 1
d_conv 4
expand 1
d_head_hidden 32
Total params 3,314

Installation

Python 3.11+.

# Core dependencies
pip install numpy scipy tqdm safetensors

# Pick one compute backend:
pip install mlx          # Apple Silicon
pip install torch        # NVIDIA GPU / CPU

# Reports and figures
pip install matplotlib reportlab

Backend auto-selects MLX only when it can run model code; headless sessions fall back to PyTorch. Force one with VENOM_BACKEND=torch or VENOM_BACKEND=mlx.

Pre-trained winner weights

The winner checkpoints and training metadata used for all numbers in the paper live in opt/:

File Contents
opt/venom_inhf_20260415_143354.safetensors InHf winner weights (3,314 params)
opt/venom_inhf_20260415_143354_meta.json Training history + validation indices
opt/venom_ptsi_20260415_071143.safetensors PtSi winner weights
opt/venom_ptsi_20260415_071143_meta.json Training history + validation indices
opt/parallel_recurrent_distribution.json Per-pulse `
opt/RESULTS_2026-04-15.md Session log documenting how the winners were reached
opt/RESULTS_{inhf,ptsi}_winner_20260415.pdf Final training-report PDFs for each dataset

Loading a winner checkpoint:

from mkid_ssm import MKIDEnergySSM
import mlx.core as mx   # or: from backend import mx

model = MKIDEnergySSM(d_model=16, d_state=8, n_layers=1,
                      d_conv=4, expand=1, d_head_hidden=32)
mx.eval(model.parameters())
model.load_weights("opt/venom_inhf_20260415_143354.safetensors")

New .safetensors checkpoints are written in the true safetensors format. Legacy VENOM checkpoints that used a .safetensors suffix for PyTorch zip archives still load through model.load_weights().

Reproducing the winner runs

# InHf (10 wavelengths, 254-1310 nm)
python mkid_ssm.py --downsample 4 --n-components 50 \
    --loss gaussian_nll --epochs 25 --n-synthetic 120000 \
    --n-layers 1 --d-model 16 --d-state 8 --expand 1 \
    --peak-sigma 2.0 --report-every 25

# PtSi (5 wavelengths, 808-1310 nm)
python mkid_ssm.py --ptsi --downsample 4 --n-components 50 \
    --loss gaussian_nll --epochs 25 --n-synthetic 120000 \
    --n-layers 1 --d-model 16 --d-state 8 --expand 1 \
    --peak-sigma 2.0 --report-every 25

The two commands differ only in --ptsi. Both use the same model, training recipe, and PCA synthetic generator. Pre-trained weights for both datasets live in opt/venom_{inhf,ptsi}_20260415_*.safetensors.

Remote training via Ray

scripts/run_remote_train.py is a Ray-Client launcher that runs the standard PyTorch training path on a remote CUDA host without modifying any training code. The driver only needs ray==2.55.1 matched to the cluster's Python minor version (3.12 for our reference setup running an NGC PyTorch container with PyTorch 2.11, CUDA 13, RTX 5090); all heavy imports (torch, mkid_ssm) happen on the worker.

# One-time: driver env matched to the cluster's Python minor version
conda create -n pyray python=3.12 -y
/opt/anaconda3/envs/pyray/bin/pip install "ray[client]==2.55.1"

# Site-specific defaults: copy .env.example to .env and edit, or
# export the variables in your shell, or pass --ray-address /
# --data-root explicitly on the CLI.
cp .env.example .env   # then edit .env with your cluster URL and data path

# Submit a run; flags after `--` are forwarded verbatim to mkid_ssm.main()
/opt/anaconda3/envs/pyray/bin/python scripts/run_remote_train.py \
    --run-name ptsi_winner \
    -- --ptsi --downsample 4 --n-components 50 --loss gaussian_nll \
       --epochs 25 --n-synthetic 120000 --n-layers 1 --d-model 16 \
       --d-state 8 --expand 1 --peak-sigma 2.0 --report-every 25

The launcher uploads the repo source as runtime_env.working_dir (data, weights, and zips are excluded), submits a single @ray.remote(num_gpus=1) task that symlinks <data-root>/{InHf,PtSi}Data into the working dir, sets VENOM_BACKEND=torch, and pulls weights/, training_reports_*/, and VENOM_*.pdf back into runs/<run-name>/ after the run completes.

Common CLI flags

Flag Meaning
--ptsi Use PtSi dataset instead of InHf
--downsample N Anti-aliased FIR decimation factor (default 4)
--n-tau T Pulse window = T × τ_qp past trigger (default 5)
--peak-sigma S MAD peak-height clip, σ (default 2.0)
--loss {mse,gaussian_nll} Default gaussian_nll
--n-synthetic N Synthetic pulses per epoch (0 = disable, default winner = 120k)
--n-components K PCA components for synthetic generator (default 50)
--epochs N Training epochs
--report-every N Emit multi-page PDF training report every N epochs
--full Use full trace, no pulse windowing
--jitter Per-wavelength label jitter
--plot-features Plot all backbone feature channels after training
--synthetic Use fully synthetic (no real) calibration for quick tests

Data loading

load_mkid_data() loads per-wavelength .npz files from a data directory. Each file must contain i_trace, q_trace, mask, peak_heights. Energy labels are computed from $E = hc/\lambda$.

When a metadata.json with an opt_filt_R dictionary is present, those per-wavelength OF resolving powers are used in preference to recomputing from peak_heights. This is how the published per-wavelength-template OF values from Zobrist 2022 are restored for InHf (shipped with InHfData/metadata.json). PtSi has no metadata.json and uses a post-clip KDE of the stored shared-template peak heights, matching Zobrist 2019.

from mkid_ssm import load_mkid_data

iq, energies, wl_ids, wl_labels, opt_R, peak_heights = load_mkid_data(
    data_dir='InHfData',
    downsample=4,
    n_tau=5.0,
    peak_height_sigma=2.0,
)

Supported datasets

Dataset Dir Wavelengths Raw rate Source
InHf bilayer InHfData/ 254–1310 nm (10) 0.8 MHz Zobrist 2022
PtSi PtSiData/ 808–1310 nm (5) 2.0 MHz Zobrist 2019

Obtaining the data

The raw IQ calibration data (InHfData.zip ≈ 1.1 GB, PtSiData.zip ≈ 580 MB) is too large to ship through GitHub. It is available on request — contact the Mazin Lab and we will provide a download link. Extract each archive in the repo root so InHfData/ and PtSiData/ sit next to mkid_ssm.py.

Energy resolution metrics

  • compute_resolving_power(values, energy) — KDE-based $R = E / \mathrm{FWHM}$ using the locally adaptive ssvkernel (Shimazaki & Shinomoto 2010). Headline metric; matches mkidcalculator's convention.
  • summarize_resolving_power(pred, true, wl_ids) — Gaussian-approximation $R = E / (2.355 \cdot \mathrm{RMSE})$. Stricter (penalizes per-wavelength bias). Used as a training monitor. Edge wavelengths excluded from the aggregate.

Training details

  • Optimizer: AdamW, weight decay 1e-4
  • LR schedule: cosine annealing with 5-epoch linear warmup
  • Loss: Gaussian NLL (default) or MSE. Gaussian NLL outputs (μ, log σ²); sigma absorbs heavy tails so training stays stable on outlier-heavy calibration data.
  • Validation: 15% stratified-by-wavelength hold-out (fixed seed)
  • Gradient accumulation: configurable micro-batch size to cap peak memory
  • Gradient checkpointing: backbone layers

Scripts

Auxiliary scripts live in scripts/:

Script Purpose
train_single.py Train one config with periodic PDF reports
run_remote_train.py Ray-Client launcher: run the PyTorch training path on a remote CUDA host
generate_paper_figures.py Regenerate all paper figures from winner weights
parallel_recurrent_distribution.py Characterize parallel-vs-recurrent mode agreement on the full validation set
convert_inhf_data.py Convert mkidcalculator pickles to the common npz format
make_report.py PDF data summary (raw-trace overview)
explore_data.py Quick data-exploration plots
sweep_hyperparams_v2.py Grid hyperparameter sweep with per-config reports

Tests

pytest tests/
# In headless/CI sessions:
VENOM_BACKEND=torch pytest tests/

Covers: model shapes, recurrent/parallel equivalence, KDE accuracy against Gaussian theory, edge cases, stratified splitting, micro-batch weights, IQ normalization.

References

  • Gu & Dao, "Mamba: Linear-Time Sequence Modeling with Selective State Spaces," COLM 2024
  • Shimazaki & Shinomoto, "Kernel Bandwidth Optimization in Spike Rate Estimation," J. Comput. Neurosci. 29, 171 (2010)
  • Zobrist et al., "Wide-band parametric amplifier readout and resolution of optical microwave kinetic inductance detectors," Appl. Phys. Lett. 115, 042601 (2019)
  • Zobrist et al., "Membraneless Phonon Trapping and Resolution Enhancement in Optical Microwave Kinetic Inductance Detectors," Phys. Rev. Lett. 129, 017701 (2022)
  • Smith et al., "MKIDGen3: Energy-resolving single-photon-counting MKID readout on an RFSoC," Rev. Sci. Instrum. 95, 114705 (2024)
  • Fritsch & Carlson, "Monotone Piecewise Cubic Interpolation," SIAM J. Numer. Anal. 17, 238 (1980)

Citation

If you use VENOM in published work, please cite this repository and the associated paper (forthcoming). A CITATION.cff file is included; GitHub will render a "Cite this repository" shortcut in the sidebar.

License

BSD 3-Clause. See LICENSE for the full text. Briefly: you may use, modify, and redistribute this software with or without modification, provided the copyright notice is retained and the Mazin Lab / UCSB name is not used to endorse derivative products without written permission.

About

VENOM: Very Efficient Neural Optimal-filter for MKIDs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors