# Releases · ductho-le/WaveDL
## v1.8.0

### Added
- Models: 8 new architectures (69 → 71 public, after removing 6):
  - WaveNet (small/base/large): Gated dilated convolutional network adapted for 1D waveform regression. Signature tanh×sigmoid gated activation, skip connections summed across all dilation layers, same-padding (non-causal). ~1M/4M/15M params. 1D-only. See the sketch after this list.
  - S4D (small/base/large): Diagonal Structured State Space Model (S4D-Lin kernel). Computed as an FFT convolution — O(L log L), fully vectorized, `torch.compile`-safe, MPS-compatible. HiPPO-LegS initialization. ~0.8M/3.2M/11M params. 1D-only.
  - EfficientNet-B4: Medium tier, ~19M params. Torchvision pretrained. 2D-only.
  - EfficientNet-B7: Large tier, ~66M params. Torchvision pretrained. 2D-only.
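For readers unfamiliar with the gating described above, here is a minimal PyTorch sketch of one gated dilated block with residual and summed skip paths. `GatedDilatedBlock` and the layer sizes are illustrative only, not WaveDL's actual module:

```python
import torch
import torch.nn as nn

class GatedDilatedBlock(nn.Module):
    """One dilated layer: tanh x sigmoid gating with residual and skip paths."""

    def __init__(self, channels: int, dilation: int, kernel_size: int = 3):
        super().__init__()
        pad = (kernel_size - 1) // 2 * dilation  # same-padding (non-causal)
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size,
                                     padding=pad, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size,
                                   padding=pad, dilation=dilation)
        self.residual = nn.Conv1d(channels, channels, 1)
        self.skip = nn.Conv1d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        z = torch.tanh(self.filter_conv(x)) * torch.sigmoid(self.gate_conv(x))
        return x + self.residual(z), self.skip(z)  # (residual out, skip out)

# Skip outputs from every dilation layer are summed before the head:
blocks = nn.ModuleList(GatedDilatedBlock(32, d) for d in (1, 2, 4, 8))
x = torch.randn(2, 32, 1024)  # (batch, channels, length)
skip_sum = 0
for block in blocks:
    x, skip = block(x)
    skip_sum = skip_sum + skip
```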
### Removed

- Models: 6 variants pruned to clean up redundant parameter-count tiers:
  - `efficientnet_b1` (7.8M) — sandwiched between B0 (5.3M) and B2 (9.1M)
  - `efficientvit_m0` (2.2M), `efficientvit_m2` (3.8M) — the M-series clusters at 2–4M; keep M1 only
  - `efficientvit_b0` (2.1M) — duplicates the M-series range
  - `efficientvit_b3` (46M), `efficientvit_l1` (49M) — near-identical; keep L2 (60M) only
- Resulting EfficientNet tier: B0 (5.3M) → B2 (9.1M) → B4 (19M) → B7 (66M)
- Resulting EfficientViT tier: M1 (2.6M) → B1 (7.5M) → B2 (21.8M) → L2 (60.5M)
### Fixed

- DenseNet: Replaced `MaxPool` with `AvgPool` in the stem — workaround for a Triton compiler bug that causes incorrect gradients on large tensors when using `--compile`
- MaxViT: Replaced the hardcoded `_DIVISOR = 28` with a `_NATIVE_SIZES` map (224 → 224, 384 → 384, 512 → 512); `img_size` is now passed to `timm.create_model()` so attention windows are pre-configured for the actual input resolution
- HPC plotting: Added `matplotlib.use("Agg")` before the `pyplot` import in `train.py`, `test.py`, and `utils/metrics.py` — prevents crashes on headless compute nodes that have no `$DISPLAY` (see the sketch below)
## v1.7.1

### Added
- Cross-validation: MPS (Apple Silicon GPU) device support — auto-detects CUDA → MPS → CPU; see the sketch after this list
- HPO: SQLite study persistence (`--storage`) — interrupted searches resume automatically
- ConvNeXt: Stochastic depth (DropPath) via `timm.layers.DropPath` with linearly increasing rates (was a no-op `nn.Identity()`)
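A minimal sketch of the CUDA → MPS → CPU fallback order; `autodetect_device` is a hypothetical name, not necessarily the project's helper:

```python
import torch

def autodetect_device() -> torch.device:
    """Prefer CUDA, then Apple-Silicon MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = autodetect_device()
model = torch.nn.Linear(8, 1).to(device)
```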
### Changed

- CI: All workflows now test against a Python `3.11`/`3.12`/`3.13` matrix
- Tests: Regression tests rewritten to call production code (`hpo.main()`, `export_to_onnx()`, `_save_best_checkpoint()`) instead of replaying logic inline
### Fixed

- Critical: Auto-resume duplicated epochs — history is now truncated to `start_epoch` on resume
- Critical: Cross-validation OOM (SIGKILL, signal 9) — `CVDataset` uses zero-copy `torch.from_numpy()` instead of `torch.tensor()`; see the sketch after this list
- Critical: Release workflow race — split the matrix job so the GitHub Release is created once, not once per Python version
- Critical: Inference crash on empty datasets — `run_inference()` guards empty predictions; `main()` early-returns before scaler validation
- Models: Pretrained timm wrappers (CaFormer, EfficientViT, FastViT, MaxViT) now probe feature dims in `eval()` mode to preserve BatchNorm running stats
- Cross-validation: `StratifiedKFold` gracefully falls back to `KFold` when bins have too few samples
- Training: Warning suppression scoped to known-noisy libraries (`sklearn`, `timm`, `torchvision`, `scipy`) instead of a blanket `FutureWarning`/`DeprecationWarning` filter
- Metrics: Relative-error plots and CDF percentile markers now exclude `NaN` from near-zero targets (these previously mapped to 0%, understating error)
- Metrics: 5 plot functions (`plot_correlation_heatmap`, `plot_relative_error`, `plot_error_cdf`, `plot_prediction_vs_index`, `plot_error_boxplot`) now call `_ensure_style_configured()` for consistent styling
- Cross-validation: Fold-level `gc.collect()` + `torch.cuda.empty_cache()`; scheduler/optimizer args are no longer silently dropped; `pin_memory` is conditional on CUDA
- DDP: ReduceLROnPlateau broadcasts per-group LRs (preserves multi-group ratios)
- HPO: Crashed subprocess trials return `inf`; `--inprocess` forces `n_jobs=1`; empty `nvidia-smi` output no longer yields a phantom GPU
- Launcher: W&B default changed to `"online"` (fixes spurious offline-sync messages on local machines)
- Training: Mixed-precision log shows the actual `accelerator.mixed_precision` value
- Inference: Output directory is created before ONNX export
### Removed

- Training: Dead `_run_train_epoch()`/`_run_validation()` helpers (~130 lines)
## v1.7.0

### Added
- HPO: `--medium` search preset (balanced between `--quick` and full)
- HPO: `--inprocess` mode for in-process trial execution with pruning support (faster, but no GPU memory isolation)
- Training: `train_single_trial()` function for programmatic HPO integration with pruning callbacks
- Tests: 368 new lines in `test_integration.py` covering CLI E2E subprocess tests, HPO objective execution, ONNX denormalization accuracy, and Mamba long-sequence stability
- Utils: `setup_hpc_cache_dirs()` exported as public API for HPC environments
### Changed
- Mamba: Chunked parallel scan for sequences > 512 tokens (numerical stability), warning for sequences > 2048
- ViT: `MultiHeadAttention` now uses `F.scaled_dot_product_attention` (PyTorch 2.0+ fused attention); see the sketch after this list
- CNN: Added proper weight initialization (Kaiming for conv, Xavier for linear)
- Refactoring: Consolidated `_setup_cache_dir()` into `wavedl.utils.setup_hpc_cache_dirs()`
- Refactoring: Consolidated `LayerNormNd` into `_pretrained_utils.py` (removed a duplicate from `convnext.py`)
- Refactoring: Added `DropPath` and `freeze_backbone()` utilities to `_pretrained_utils.py`
- Refactoring: Extracted `_run_train_epoch()` and `_run_validation()` helpers in `train.py`
- HPO: Subprocess mode now uses `NopPruner`; in-process mode uses `MedianPruner`
- HPO: Conditional args (`huber_delta`, `momentum`) always set with defaults, not `None`
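A sketch of the fused-attention call and its equivalence to the manual chain it replaces; the shapes are illustrative:

```python
import torch
import torch.nn.functional as F

# q, k, v: (batch, heads, seq_len, head_dim)
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# One fused kernel replaces the manual softmax(q @ k^T / sqrt(d)) @ v chain;
# PyTorch picks a FlashAttention/memory-efficient backend when available.
out = F.scaled_dot_product_attention(q, k, v)  # (2, 8, 128, 64)

# Numerically matches the unfused reference computation:
scale = q.shape[-1] ** -0.5
manual = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1) @ v
torch.testing.assert_close(out, manual, atol=1e-5, rtol=1e-4)
```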
### Fixed
- Mamba: Numerical overflow on long sequences (> 512) via chunked scan
- ConvNeXt V2: Renamed misleading class names to match architecture
- Metrics: Type hint `any` → `Any` in the `load_checkpoint()` return type
- Training: Removed redundant `MPLCONFIGDIR` setup (already handled by `setup_hpc_cache_dirs()`)
- ResNet3D: Input channel adaptation uses the shared `_adapt_input_channels()` utility
- UniRepLKNet: Input channel adaptation uses the shared utility
- Template: Fixed a docstring placeholder in `_template.py`
## v1.6.3

### Fixed
- Data: Explicit `--input_key` now raises `KeyError` if not found (previously it silently fell back to auto-detection, risking loading the wrong data)
- DDP: Non-main ranks now time out after 1 hour (configurable via `WAVEDL_CACHE_TIMEOUT`) instead of waiting indefinitely for cache files
- DDP: Cache wait uses `time.monotonic()` for robustness against system clock changes; see the sketch after this list
- Inference: Clear `ImportError` with install instructions when a `.safetensors` checkpoint exists but the library is not installed
- Training: Scaler is now always copied to the checkpoint (previously skipped if the destination existed, causing a stale scaler on retrain)
- Documentation: `CONTRIBUTING.md` setup now includes `[dev]` extras for pre-commit and ruff
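A sketch of the timed cache wait, assuming a simple polling loop; `wait_for_cache` is a hypothetical name, though `WAVEDL_CACHE_TIMEOUT` and the 1-hour default come from the changelog entry above:

```python
import os
import time
from pathlib import Path

def wait_for_cache(path: Path, timeout: float | None = None) -> None:
    """Poll for a cache file; time out instead of waiting forever."""
    timeout = timeout or float(os.environ.get("WAVEDL_CACHE_TIMEOUT", 3600))
    deadline = time.monotonic() + timeout  # immune to wall-clock adjustments
    while not path.exists():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Cache file {path} not ready after {timeout}s")
        time.sleep(5)
```

`time.monotonic()` is used instead of `time.time()` because it cannot jump backwards or forwards when NTP or an admin adjusts the system clock mid-wait.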
### Added
- Tests: 6 new tests covering explicit key validation, safetensors error handling, and scaler portability
## v1.6.2

### Added
- CLI: Unified `wavedl-train` command that works on both local machines and HPC clusters:
  - Auto-detects the environment (SLURM, PBS, LSF, SGE, Cobalt); see the sketch after this list
  - HPC: Uses local caching (CWD), offline WandB
  - Local: Uses standard cache locations (`~/.cache`)
  - Fast `--list_models` flag (no accelerate overhead)
- `wavedl-hpc` kept as a backwards-compatible alias
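Scheduler detection of this kind typically keys off the job-ID environment variable each batch system exports. A hedged sketch — the variable map reflects common scheduler conventions and is an assumption, not WaveDL's exact list:

```python
import os

# Characteristic job-ID variables per batch system (assumed mapping):
_SCHEDULER_VARS = {
    "SLURM_JOB_ID": "slurm",
    "PBS_JOBID": "pbs",
    "LSB_JOBID": "lsf",
    "JOB_ID": "sge",
    "COBALT_JOBID": "cobalt",
}

def detect_scheduler() -> str | None:
    """Return the detected batch scheduler, or None on a plain local machine."""
    for var, name in _SCHEDULER_VARS.items():
        if var in os.environ:
            return name
    return None

is_hpc = detect_scheduler() is not None  # drives cache/WandB mode selection
```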
### Changed

- CLI: Renamed `hpc.py` → `launcher.py` (clearer purpose for a universal launcher)
- Documentation: All README examples now use `wavedl-train` instead of `accelerate launch`
## v1.6.1

### Added
- Models: 12 new architectures (57 → 69 total):
  - UniRepLKNet (tiny/small/base): Large-kernel ConvNet with 31×31 kernels for long-range wave correlations. Dimension-agnostic (1D/2D/3D). Custom implementation, no pretrained weights.
  - EfficientViT (m0–m2, b0–b3, l1–l2): Memory-efficient ViT with cascaded group attention. 9 variants from 2.1M to 60.5M params. ImageNet pretrained via timm. 2D only.
### Changed

- Refactoring: Consolidated the `SpatialShape` type alias into `base.py` (was duplicated in 8 files)
- Refactoring: Consolidated GroupNorm helpers (`_get_num_groups`, `_find_group_count`, `_compute_num_groups`) into a single `compute_num_groups()` in `base.py`
- Refactoring: Renamed `_timm_utils.py` → `_pretrained_utils.py` (now handles both torchvision and timm models)
- Refactoring: Extracted pretrained model channel adaptation into shared utilities (see the sketch after this list):
  - `adapt_first_conv_for_single_channel()`: For torchvision models with known paths
  - `find_and_adapt_input_convs()`: For timm models with dynamic layer discovery
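A sketch of the known-path torchvision case: sum the pretrained RGB kernel over its channel axis, so a grayscale input produces roughly the same response as a gray RGB triplet would. The helper name and body are illustrative, not the library function's exact signature:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def adapt_conv_to_single_channel(conv: nn.Conv2d) -> nn.Conv2d:
    """Rebuild a pretrained RGB conv for 1-channel input by summing the
    kernel over the RGB axis (keeps the pretrained features usable)."""
    new_conv = nn.Conv2d(1, conv.out_channels, conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    with torch.no_grad():
        new_conv.weight.copy_(conv.weight.sum(dim=1, keepdim=True))
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv

model = resnet18(weights="IMAGENET1K_V1")
model.conv1 = adapt_conv_to_single_channel(model.conv1)  # known path
```

Storing a true 1-channel kernel rather than expanding inputs to 3 channels is also where the "3× memory savings vs expand" noted in v1.5.7 comes from.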
### Fixed

- MaxViT: Auto-resize input to a compatible size (divisible by 28) for arbitrary input dimensions; see the sketch after this list
- Mamba/Vim: Replaced the O(L) sequential for-loop with a vectorized parallel scan (~100× faster; fixes an infinite hang with `--compile`)
- Dependencies: Added `onnxscript` (required by `torch.onnx.export` in PyTorch 2.1+)
- HPC Cache: Pre-download script now uses exact weight versions, preventing redundant downloads
## v1.6.0

### Added
- Models: 19 new architectures (38 → 57 total): ConvNeXt V2, Mamba, Vision Mamba, MaxViT, FastViT, CAFormer, PoolFormer
- Tests: Expanded architecture tests with freeze_backbone and single-channel input validation
### Changed

- CLI: Renamed `--no-pretrained` to `--no_pretrained` for consistency with other flags
### Fixed
- ConvNeXt: Added LayerScale (init=1e-6) and fixed LayerNorm to prevent gradient explosion
- Data: `_TransposedH5Dataset` now has an `ndim` property (fixes a MAT v7.3 memmap crash)
- Data: Explicit `--output_key` now raises `KeyError` if not found (no silent fallback)
- Training: Mixed precision (`--precision bf16`/`fp16`) now wraps the forward pass in `autocast()`; see the sketch after this list
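A minimal sketch of the autocast wrapping, assuming a CUDA device; with `bf16`, no gradient scaler is needed:

```python
import torch

model = torch.nn.Linear(128, 10).cuda()
opt = torch.optim.AdamW(model.parameters())
x = torch.randn(32, 128, device="cuda")
y = torch.randn(32, 10, device="cuda")

# Only the forward pass and loss run under autocast; without this context,
# a --precision flag alone would leave the compute in full fp32.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
```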
## v1.5.7

### Added
- Plotting: Refactored with helper functions for cleaner code
- Tests: 178 new unit tests (725 → 903 total)
- Training: `--deterministic` and `--cache_validate` flags
### Changed
- Plotting: Publication-quality styling with LaTeX fonts
- Documentation: Updated README with new SPIE paper link
- Pretrained Models: All use modified conv for 1-channel (3× memory savings vs expand)
### Fixed

- CLI: `--pretrained` now uses `BooleanOptionalAction` (was a no-op); see the sketch after this list
- Constraints: `x[i,j]` auto-squeezes the channel for single-channel data
- Inference: Channels-last format now raises an error with fix guidance
- Inference: `load_checkpoint` uses `pretrained=False` (offline-safe)
- Inference: `--input_key`/`--output_key` strict validation (exact match required)
- TCN: `GroupNorm` divisibility for custom channel counts
- CLI: `--import` error handling for missing/invalid files
- Pretrained: `freeze_backbone` now freezes the adapted stem conv
- Pretrained: Swin `features[0][0]` access guarded for torchvision compatibility
- CI: LaTeX rendering made optional with an `_is_latex_available()` check
- Examples: Added missing checkpoint files, fixed a notebook cell
- Tests: Fixed RUF059 lint warnings in `test_data_cv.py`
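A standalone sketch of the stdlib behavior the fix relies on: `argparse.BooleanOptionalAction` auto-generates the paired negative flag, so both polarities actually take effect (a plain `store_true` with `default=True` would make `--pretrained` a no-op):

```python
import argparse

parser = argparse.ArgumentParser()
# BooleanOptionalAction registers both --pretrained and --no-pretrained.
parser.add_argument("--pretrained", action=argparse.BooleanOptionalAction,
                    default=True)

print(parser.parse_args([]).pretrained)                   # True (default)
print(parser.parse_args(["--no-pretrained"]).pretrained)  # False
```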
## v1.5.6

### Added
- ViT: `pad_if_needed` parameter in `PatchEmbed` and `ViTBase` for NDE/QUS applications where edge effects matter (pads input to a patch-aligned size instead of dropping edge pixels)
- Training: `--no_pretrained` flag to train from scratch without ImageNet weights
- MATLAB: `WaveDL_ONNX_Inference.m` script for ONNX model inference in MATLAB with automatic data format handling
### Changed

- API: `NPZSource.load_mmap()` now returns `LazyDataHandle` (consistent with `HDF5Source` and `MATSource`)
- Warnings: Narrowed warning suppression in `train.py` to preserve legitimate torch/numpy warnings about NaN and dtype issues
- Data: Cache validation now uses a SHA256 content hash instead of mtime (portable across folders, robust against Dropbox/cloud sync); see the sketch after this list
- Examples: Renamed `elastic_cnn_example/` to `elasticity_prediction/` with a MobileNetV3 model (was CNN)
### Fixed

- API: Removed special-case handling in `train.py` and `data.py` for inconsistent `load_mmap()` return types
- DDP: ReduceLROnPlateau patience was divided by the GPU count (the accelerator wrapper caused multi-process stepping)
- MATLAB ONNX: Fixed a critical image transpose issue — data must be transposed to convert from MATLAB column-major to Python row-major ordering
- MATLAB ONNX: Added a network initialization step after `importNetworkFromONNX` (required for networks with unknown input formats)
## v1.5.5

### Fixed
- Inference: Single-sample MAT files with multiple targets now correctly load as `(1, T)` instead of `(T, 1)`
- HPO: Removed the read-only site-packages cwd (prevents permission errors when pip-installed)
- Data: Cache invalidation now raises `RuntimeError` if stale files cannot be removed (prevents silent stale-data reuse)
- Data: NPZ file descriptors are now properly closed after loading (prevents leaks in long-running workflows)
- Metrics: `plot_qq` handles zero-variance errors gracefully (no more NaN/division-by-zero)
- Tests: Integration test for multi-epoch training is no longer flaky (removed the random-data loss-decrease assertion)