Releases: ductho-le/WaveDL

v1.8.0

23 Feb 19:13

Added

  • Models: 8 new architectures (69 → 71 total, net of the 6 removals below):
    • WaveNet (small/base/large): Gated dilated convolutional network adapted for 1D waveform regression. Signature tanh×sigmoid gated activation, skip connections summed across all dilation layers, same-padding (non-causal). ~1M/4M/15M params. 1D-only. A gated-block sketch follows this list.
    • S4D (small/base/large): Diagonal Structured State Space Model (S4D-Lin kernel). Computed as FFT convolution — O(L log L), fully vectorized, torch.compile-safe, MPS-compatible. HiPPO-LegS initialization. ~0.8M/3.2M/11M params. 1D-only. An FFT-convolution sketch follows this list.
    • EfficientNet-B4: Medium tier, ~19M params. Torchvision pretrained. 2D-only.
    • EfficientNet-B7: Large tier, ~66M params. Torchvision pretrained. 2D-only.
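
A minimal sketch of the gated block the WaveNet variants stack (module and parameter names here are illustrative, not WaveDL's actual code):

```python
import torch
import torch.nn as nn

class GatedResBlock(nn.Module):
    """One gated dilated conv block (names are illustrative, not WaveDL's).

    Same-padding (non-causal) convs, tanh x sigmoid gate, and 1x1 residual
    and skip projections, per the description above.
    """

    def __init__(self, channels: int, dilation: int, kernel_size: int = 3):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2  # same-padding, non-causal
        self.filt = nn.Conv1d(channels, channels, kernel_size, padding=pad, dilation=dilation)
        self.gate = nn.Conv1d(channels, channels, kernel_size, padding=pad, dilation=dilation)
        self.res = nn.Conv1d(channels, channels, 1)
        self.skip = nn.Conv1d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        z = torch.tanh(self.filt(x)) * torch.sigmoid(self.gate(x))  # gated activation
        return x + self.res(z), self.skip(z)

# Skip outputs are summed across all dilation layers before the regression head:
x = torch.randn(2, 32, 1024)  # (batch, channels, length)
skips = torch.zeros(2, 32, 1024)
for block in [GatedResBlock(32, d) for d in (1, 2, 4, 8)]:
    x, s = block(x)
    skips = skips + s
```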
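And a sketch of the FFT-convolution path the S4D models use in place of a sequential recurrence (materializing the kernel from the S4D-Lin parameters is omitted here):

```python
import torch

def fft_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Convolve input u (B, H, L) with an SSM kernel k (H, L) via FFT.

    Zero-padding to 2L avoids circular wrap-around; this is the O(L log L)
    path, with no sequential recurrence to block torch.compile or MPS.
    """
    L = u.shape[-1]
    k_f = torch.fft.rfft(k, n=2 * L)   # (H, L + 1), complex
    u_f = torch.fft.rfft(u, n=2 * L)   # (B, H, L + 1), complex
    return torch.fft.irfft(u_f * k_f, n=2 * L)[..., :L]

u = torch.randn(4, 16, 512)
k = torch.randn(16, 512)  # S4D materializes this kernel from its (A, B, C) parameters
y = fft_conv(u, k)        # (4, 16, 512)
```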

Removed

  • Models: 6 variants pruned to clean up redundant parameter-count tiers:
    • efficientnet_b1 (7.8M) — sandwiched between B0 (5.3M) and B2 (9.1M)
    • efficientvit_m0 (2.2M), efficientvit_m2 (3.8M) — M-series clusters 2–4M; keep M1 only
    • efficientvit_b0 (2.1M) — duplicates M-series range
    • efficientvit_b3 (46M), efficientvit_l1 (49M) — near-identical; keep L2 (60M) only
  • Resulting EfficientNet tier: B0 (5.3M) → B2 (9.1M) → B4 (19M) → B7 (66M)
  • Resulting EfficientViT tier: M1 (2.6M) → B1 (7.5M) → B2 (21.8M) → L2 (60.5M)

Fixed

  • DenseNet: Replaced MaxPool with AvgPool in stem — workaround for a Triton compiler bug that causes incorrect gradients on large tensors when using --compile
  • MaxViT: Replaced hardcoded _DIVISOR = 28 with a _NATIVE_SIZES map (224 → 224, 384 → 384, 512 → 512); img_size is now passed to timm.create_model() so attention windows are pre-configured for the actual input resolution; see the sketch after this list
  • HPC plotting: Added matplotlib.use("Agg") before pyplot import in train.py, test.py, and utils/metrics.py — prevents crash on headless compute nodes that have no $DISPLAY (snippet after this list)
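
Roughly what the MaxViT fix amounts to (the snapping rule below is a reconstruction from this note, not the exact code):

```python
import timm

# Hypothetical reconstruction: snap the requested resolution to the nearest
# size the pretrained weights support, then let timm pre-configure the
# attention windows for it via img_size.
_NATIVE_SIZES = (224, 384, 512)

def nearest_native(size: int) -> int:
    return min(_NATIVE_SIZES, key=lambda s: abs(s - size))

img_size = nearest_native(300)  # -> 224
model = timm.create_model("maxvit_tiny_tf_224", pretrained=False, img_size=img_size)
```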
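The headless-plotting fix is the standard matplotlib pattern; the Agg backend must be selected before pyplot is first imported:

```python
import matplotlib

matplotlib.use("Agg")            # must run before pyplot is first imported
import matplotlib.pyplot as plt  # now safe on nodes with no $DISPLAY
```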

v1.7.1

14 Feb 22:53

Added

  • Cross-validation: MPS (Apple Silicon GPU) device support — auto-detects CUDA → MPS → CPU (sketch after this list)
  • HPO: SQLite study persistence (--storage) — interrupted searches resume automatically
  • ConvNeXt: Stochastic depth (DropPath) via timm.layers.DropPath with linearly-increasing rates (was no-op nn.Identity())
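
The device auto-detection follows the standard PyTorch fallback chain; a sketch (helper name is illustrative):

```python
import torch

def pick_device() -> torch.device:
    """CUDA -> MPS -> CPU fallback (standard pattern; helper name illustrative)."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```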

Changed

  • CI: All workflows now test against Python 3.11/3.12/3.13 matrix
  • Tests: Regression tests rewritten to call production code (hpo.main(), export_to_onnx(), _save_best_checkpoint()) instead of replaying logic inline

Fixed

  • Critical: Auto-resume duplicated epochs — history now truncated to start_epoch on resume
  • Critical: Cross-validation OOM (SIGKILL:9) — CVDataset uses zero-copy torch.from_numpy() instead of torch.tensor(); see the illustration after this list
  • Critical: Release workflow race — split matrix job so GitHub Release is created once, not per Python version
  • Critical: Inference crash on empty datasets — run_inference() guards empty predictions; main() early-returns before scaler validation
  • Models: Pretrained timm wrappers (CAFormer, EfficientViT, FastViT, MaxViT) now probe feature dims in eval() mode to preserve BatchNorm running stats
  • Cross-validation: StratifiedKFold gracefully falls back to KFold when bins have too few samples
  • Training: Warning suppression scoped to known-noisy libraries (sklearn, timm, torchvision, scipy) instead of blanket FutureWarning/DeprecationWarning filter (sketch after this list)
  • Metrics: Relative-error plots and CDF percentile markers now exclude NaN from near-zero targets (was mapping to 0%, understating error)
  • Metrics: 5 plot functions (plot_correlation_heatmap, plot_relative_error, plot_error_cdf, plot_prediction_vs_index, plot_error_boxplot) now call _ensure_style_configured() for consistent styling
  • Cross-validation: Fold-level gc.collect() + torch.cuda.empty_cache(); scheduler/optimizer args no longer silently dropped; pin_memory conditional on CUDA
  • DDP: ReduceLROnPlateau broadcasts per-group LRs (preserves multi-group ratios)
  • HPO: Crashed subprocess trials return inf; --inprocess forces n_jobs=1; empty nvidia-smi no longer yields phantom GPU
  • Launcher: W&B default changed to "online" (fixes spurious offline sync messages on local machines)
  • Training: Mixed-precision log shows actual accelerator.mixed_precision value
  • Inference: Output directory created before ONNX export
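
To illustrate the CVDataset fix: torch.from_numpy() shares the NumPy buffer, while torch.tensor() allocates a full copy, which is what caused the OOM:

```python
import numpy as np
import torch

arr = np.zeros((10_000, 4_096), dtype=np.float32)

t_copy = torch.tensor(arr)      # allocates a second full-size buffer
t_view = torch.from_numpy(arr)  # zero-copy: shares memory with arr
assert t_view.data_ptr() == arr.ctypes.data
```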
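And roughly what the scoped warning filter looks like (the exact filter calls in train.py may differ):

```python
import warnings

# Scoped: only warnings raised from the named libraries are silenced.
for noisy in ("sklearn", "timm", "torchvision", "scipy"):
    warnings.filterwarnings("ignore", category=FutureWarning, module=noisy)
    warnings.filterwarnings("ignore", category=DeprecationWarning, module=noisy)
```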

Removed

  • Training: Dead _run_train_epoch() / _run_validation() helpers (~130 lines)

v1.7.0

05 Feb 23:31

Added

  • HPO: --medium search preset (balanced between --quick and full)
  • HPO: --inprocess mode for in-process trial execution with pruning support (faster, but no GPU memory isolation)
  • Training: train_single_trial() function for programmatic HPO integration with pruning callbacks; see the sketch after this list
  • Tests: 368 new lines in test_integration.py covering CLI E2E subprocess tests, HPO objective execution, ONNX denormalization accuracy, and Mamba long-sequence stability
  • Utils: setup_hpc_cache_dirs() exported as public API for HPC environments
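
A sketch of programmatic HPO with pruning via train_single_trial(); only the function name comes from these notes, and the import path, signature, and return value are assumptions:

```python
import optuna

from wavedl.train import train_single_trial  # import path is a guess

def objective(trial: optuna.Trial) -> float:
    config = {"lr": trial.suggest_float("lr", 1e-5, 1e-2, log=True)}

    def on_epoch_end(epoch: int, val_loss: float) -> None:
        trial.report(val_loss, step=epoch)
        if trial.should_prune():
            raise optuna.TrialPruned()

    # Assumed signature: accepts a config dict and a pruning callback,
    # returns the best validation loss.
    return train_single_trial(config, pruning_callback=on_epoch_end)

study = optuna.create_study(direction="minimize", pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=50)
```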

Changed

  • Mamba: Chunked parallel scan for sequences > 512 tokens (numerical stability), warning for sequences > 2048
  • ViT: MultiHeadAttention now uses F.scaled_dot_product_attention (PyTorch 2.0+ fused attention); example after this list
  • CNN: Added proper weight initialization (Kaiming for conv, Xavier for linear)
  • Refactoring: Consolidated _setup_cache_dir() into wavedl.utils.setup_hpc_cache_dirs()
  • Refactoring: Consolidated LayerNormNd into _pretrained_utils.py (removed duplicate from convnext.py)
  • Refactoring: Added DropPath and freeze_backbone() utilities to _pretrained_utils.py
  • Refactoring: Extracted _run_train_epoch() and _run_validation() helpers in train.py
  • HPO: Subprocess mode now uses NopPruner; in-process mode uses MedianPruner
  • HPO: Conditional args (huber_delta, momentum) always set with defaults, not None
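
For reference, the fused-attention call that replaces a manual attention computation:

```python
import torch
import torch.nn.functional as F

B, H, L, D = 2, 8, 256, 64
q, k, v = (torch.randn(B, H, L, D) for _ in range(3))

# Fused kernel (FlashAttention / memory-efficient where available) replacing
# the manual softmax(q @ k.transpose(-2, -1) / sqrt(D)) @ v computation.
out = F.scaled_dot_product_attention(q, k, v)
```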

Fixed

  • Mamba: Numerical overflow on long sequences (> 512) via chunked scan
  • ConvNeXt V2: Renamed misleading class names to match architecture
  • Metrics: Type hint any → Any in load_checkpoint() return type
  • Training: Removed redundant MPLCONFIGDIR setup (already handled by setup_hpc_cache_dirs())
  • ResNet3D: Input channel adaptation uses shared _adapt_input_channels() utility
  • UniRepLKNet: Input channel adaptation uses shared utility
  • Template: Fixed docstring placeholder in _template.py

v1.6.3

05 Feb 20:46

Fixed

  • Data: Explicit --input_key now raises KeyError if not found (previously silently fell back to auto-detection, risking wrong data load)
  • DDP: Non-main ranks now timeout after 1 hour (configurable via WAVEDL_CACHE_TIMEOUT) instead of waiting indefinitely for cache files
  • DDP: Cache wait uses time.monotonic() for robustness against system clock changes (sketch after this list)
  • Inference: Clear ImportError with install instructions when .safetensors checkpoint exists but library not installed
  • Training: Scaler now always copied to checkpoint (previously skipped if destination existed, causing stale scaler on retrain)
  • Documentation: CONTRIBUTING.md setup now includes [dev] extras for pre-commit and ruff
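
A sketch of the timeout pattern described above (helper name illustrative; assumes WAVEDL_CACHE_TIMEOUT is in seconds):

```python
import os
import time

def wait_for_cache(path: str) -> None:
    """Wait for the main rank to write a cache file, with a hard timeout.

    Sketch only (helper name illustrative). time.monotonic() is immune to
    system clock adjustments, unlike time.time().
    """
    timeout = float(os.environ.get("WAVEDL_CACHE_TIMEOUT", 3600))  # seconds
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Cache file {path!r} not ready after {timeout:.0f}s")
        time.sleep(5)
```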

Added

  • Tests: 6 new tests covering explicit key validation, safetensors error handling, and scaler portability

v1.6.2

31 Jan 00:20

Added

  • CLI: Unified wavedl-train command that works on both local machines and HPC clusters
    • Auto-detects environment (SLURM, PBS, LSF, SGE, Cobalt); see the sketch after this list
    • HPC: Uses local caching (CWD), offline WandB
    • Local: Uses standard cache locations (~/.cache)
    • Fast --list_models flag (no accelerate overhead)
    • wavedl-hpc kept as backwards-compatible alias
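
A sketch of scheduler detection via environment variables (these variable names are common defaults, not confirmed from the launcher's code):

```python
import os

# Common scheduler environment variables; the launcher's actual probing
# logic may differ.
_SCHEDULER_VARS = {
    "SLURM": "SLURM_JOB_ID",
    "PBS": "PBS_JOBID",
    "LSF": "LSB_JOBID",
    "SGE": "SGE_TASK_ID",
    "Cobalt": "COBALT_JOBID",
}

def detect_hpc() -> str | None:
    for name, var in _SCHEDULER_VARS.items():
        if var in os.environ:
            return name
    return None  # local machine: standard cache locations, online W&B
```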

Changed

  • CLI: Renamed hpc.py → launcher.py (clearer purpose for the universal launcher)
  • Documentation: All README examples now use wavedl-train instead of accelerate launch

v1.6.1

30 Jan 08:05

Added

  • Models: 12 new architectures (57 → 69 total):
    • UniRepLKNet (tiny/small/base): Large-kernel ConvNet with 31×31 kernels for long-range wave correlations. Dimension-agnostic (1D/2D/3D). Custom implementation, no pretrained weights.
    • EfficientViT (m0-m2, b0-b3, l1-l2): Memory-efficient ViT with cascaded group attention. 9 variants from 2.1M to 60.5M params. ImageNet pretrained via timm. 2D only.

Changed

  • Refactoring: Consolidated SpatialShape type alias into base.py (was duplicated in 8 files)
  • Refactoring: Consolidated GroupNorm helpers (_get_num_groups, _find_group_count, _compute_num_groups) into single compute_num_groups() in base.py
  • Refactoring: Renamed _timm_utils.py → _pretrained_utils.py (now handles both torchvision and timm models)
  • Refactoring: Extracted pretrained model channel adaptation into shared utilities:
    • adapt_first_conv_for_single_channel(): For torchvision models with known paths
    • find_and_adapt_input_convs(): For timm models with dynamic layer discovery

Fixed

  • MaxViT: Auto-resize input to compatible size (divisible by 28) for arbitrary input dimensions
  • Mamba/Vim: Replaced O(L) sequential for-loop with vectorized parallel scan (~100x faster, fixes infinite hang with --compile); see the sketch after this list
  • Dependencies: Added onnxscript (required by torch.onnx.export in PyTorch 2.1+)
  • HPC Cache: Pre-download script now uses exact weight versions, preventing redundant downloads
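
The cumprod/cumsum identity below is one standard way to vectorize the scan h_t = a_t·h_{t−1} + b_t; it is not necessarily WaveDL's exact implementation, and its division by cumprod(a) is exactly the overflow risk that v1.7.0's chunked scan later addresses:

```python
import torch

def sequential_scan(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Reference O(L) loop: h_t = a_t * h_{t-1} + b_t, with h_{-1} = 0."""
    h, out = torch.zeros_like(b[..., 0]), []
    for t in range(b.shape[-1]):
        h = a[..., t] * h + b[..., t]
        out.append(h)
    return torch.stack(out, dim=-1)

def vectorized_scan(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Closed form h_t = A_t * sum_{s<=t} b_s / A_s, where A_t = prod_{s<=t} a_s.

    Fully parallel, but dividing by cumprod(a) can overflow or underflow on
    long sequences, which is what the chunked scan in v1.7.0 later addresses.
    """
    A = torch.cumprod(a, dim=-1)
    return A * torch.cumsum(b / A, dim=-1)

a = torch.rand(2, 4, 64) * 0.1 + 0.9   # decay factors in [0.9, 1.0)
b = torch.randn(2, 4, 64)
assert torch.allclose(sequential_scan(a, b), vectorized_scan(a, b), atol=1e-3)
```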

v1.6.0

29 Jan 08:57

Added

  • Models: 19 new architectures (38 → 57 total): ConvNeXt V2, Mamba, Vision Mamba, MaxViT, FastViT, CAFormer, PoolFormer
  • Tests: Expanded architecture tests with freeze_backbone and single-channel input validation

Changed

  • CLI: Renamed --no-pretrained to --no_pretrained for consistency with other flags

Fixed

  • ConvNeXt: Added LayerScale (init=1e-6) and fixed LayerNorm to prevent gradient explosion
  • Data: _TransposedH5Dataset now has ndim property (fixes MAT v7.3 memmap crash)
  • Data: Explicit --output_key now raises KeyError if not found (no silent fallback)
  • Training: Mixed precision (--precision bf16/fp16) now wraps forward pass in autocast() (snippet below)
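
The autocast pattern now wrapping the forward pass (bf16 shown):

```python
import torch

device_type = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(128, 3).to(device_type)
x = torch.randn(8, 128, device=device_type)

# Forward pass under autocast, matching --precision bf16:
with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
    y = model(x)
```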

v1.5.7

24 Jan 22:07

Added

  • Plotting: Refactored with helper functions for cleaner code
  • Tests: 178 new unit tests (725 → 903 total)
  • Training: --deterministic and --cache_validate flags

Changed

  • Plotting: Publication-quality styling with LaTeX fonts
  • Documentation: Updated README with new SPIE paper link
  • Pretrained Models: All use a modified first conv for 1-channel input (3× memory savings vs expanding input to 3 channels)

Fixed

  • CLI: --pretrained now uses BooleanOptionalAction (was no-op)
  • Constraints: x[i,j] auto-squeezes channel for single-channel data
  • Inference: Channels-last format now raises error with fix guidance
  • Inference: load_checkpoint uses pretrained=False (offline-safe)
  • Inference: --input_key/--output_key strict validation (exact match required)
  • TCN: GroupNorm divisibility for custom channel counts
  • CLI: --import error handling for missing/invalid files
  • Pretrained: freeze_backbone now freezes adapted stem conv
  • Pretrained: Swin features[0][0] access guarded for torchvision compatibility
  • CI: LaTeX rendering optional with _is_latex_available() check
  • Examples: Added missing checkpoint files, fixed notebook cell
  • Tests: Fixed RUF059 lint warnings in test_data_cv.py

v1.5.6

15 Jan 19:34

Choose a tag to compare

Added

  • ViT: pad_if_needed parameter in PatchEmbed and ViTBase for NDE/QUS applications where edge effects matter (pads input to patch-aligned size instead of dropping edge pixels); see the sketch after this list
  • Training: --no_pretrained flag to train from scratch without ImageNet weights
  • MATLAB: WaveDL_ONNX_Inference.m script for ONNX model inference in MATLAB with automatic data format handling
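
A sketch of the pad_if_needed idea (function name illustrative):

```python
import torch
import torch.nn.functional as F

def pad_to_patch_multiple(x: torch.Tensor, patch: int) -> torch.Tensor:
    """Pad a (B, C, H, W) input up to the next patch-aligned size, so
    PatchEmbed keeps edge pixels instead of dropping them. Function name
    is illustrative."""
    H, W = x.shape[-2:]
    return F.pad(x, (0, (-W) % patch, 0, (-H) % patch))  # pad right, bottom

x = torch.randn(1, 1, 250, 250)
print(pad_to_patch_multiple(x, 16).shape)  # torch.Size([1, 1, 256, 256])
```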

Changed

  • API: NPZSource.load_mmap() now returns LazyDataHandle (consistent with HDF5Source and MATSource)
  • Warnings: Narrowed warning suppression in train.py to preserve legitimate torch/numpy warnings about NaN and dtype issues
  • Data: Cache validation now uses SHA256 content hash instead of mtime (portable across folders, robust against Dropbox/cloud sync); see the sketch after this list
  • Examples: Renamed elastic_cnn_example/ to elasticity_prediction/ with MobileNetV3 model (was CNN)
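
The content-hash approach, sketched (function name illustrative):

```python
import hashlib

def content_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA256 over file contents in 1 MiB chunks: stable across moves and
    cloud-sync mtime churn, unlike an mtime check. Function name illustrative."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```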

Fixed

  • API: Removed special-case handling in train.py and data.py for inconsistent load_mmap() return types
  • DDP: ReduceLROnPlateau patience was divided by GPU count (accelerator wrapper caused multi-process stepping)
  • MATLAB ONNX: Fixed critical image transpose issue; data must be transposed to convert from MATLAB column-major to Python row-major ordering
  • MATLAB ONNX: Added network initialization step after importNetworkFromONNX (required for networks with unknown input formats)

v1.5.5

13 Jan 22:23

Fixed

  • Inference: Single-sample MAT files with multiple targets now correctly load as (1, T) instead of (T, 1)
  • HPO: Removed read-only site-packages cwd (prevents permission errors when pip-installed)
  • Data: Cache invalidation now raises RuntimeError if stale files cannot be removed (prevents silent stale data reuse)
  • Data: NPZ file descriptors now properly closed after loading (prevents leaks in long-running workflows)
  • Metrics: plot_qq handles zero-variance errors gracefully (no more NaN/division-by-zero)
  • Tests: Integration test for multi-epoch training no longer flaky (removed random-data loss decrease assertion)