This module handles the loading, processing, and visualization of Impulse Response (IR) data used to train the VAE.
The IRDataset class is responsible for loading .wav files and converting them into spectral features suitable for the VAE.
- Loading: Reads audio files using `librosa` at 44.1 kHz (mono).
- Trimming/Padding: Ensures all audio clips are exactly `length_samples` (default: 65536) samples long.
- STFT: Computes the Short-Time Fourier Transform (STFT) with `n_fft=1024`.
- Magnitude Spectrum: Calculates the magnitude `|STFT|` and takes the mean over time to get a single spectral fingerprint per IR.
- Log Scaling: Converts amplitude to dB scale.
- Normalization: Normalizes the dB spectrum to the range `[0, 1]` (assuming a floor of -80 dB).
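
The steps above can be sketched roughly as follows. This is not the `IRDataset` implementation: it uses plain NumPy instead of `librosa` so that it runs without audio files, and the function name `ir_to_fingerprint`, the hop length of 256, the Hann window, and the choice of the spectral peak as the 0 dB reference are all assumptions made for illustration.

```python
import numpy as np


def ir_to_fingerprint(audio, length_samples=65536, n_fft=1024,
                      hop_length=256, floor_db=-80.0):
    """Turn a mono IR into a normalized spectral fingerprint (hypothetical helper)."""
    audio = np.asarray(audio, dtype=np.float32)

    # Trim or zero-pad so every clip is exactly length_samples long.
    if audio.shape[0] >= length_samples:
        audio = audio[:length_samples]
    else:
        audio = np.pad(audio, (0, length_samples - audio.shape[0]))

    # Slice the signal into overlapping frames and apply a Hann window
    # (window and hop length are assumptions, not taken from the module).
    n_frames = 1 + (length_samples - n_fft) // hop_length
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = audio[idx] * np.hanning(n_fft)[None, :]

    # |STFT|, then the mean over time: one value per frequency bin.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    mean_mag = magnitude.mean(axis=0)

    # Amplitude to dB, relative to the spectral peak (assumed reference).
    ref = max(float(mean_mag.max()), 1e-10)
    db = 20.0 * np.log10(np.maximum(mean_mag, 1e-10) / ref)

    # Clip at the floor and rescale [floor_db, 0] to [0, 1].
    db = np.maximum(db, floor_db)
    return ((db - floor_db) / -floor_db).astype(np.float32)
```

With `n_fft=1024` the result has `1024 // 2 + 1 = 513` bins, which matches the per-item feature size of a training batch.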
    from aether.data.dataset import create_train_iterator

    # Create an infinite iterator for training
    train_iter = create_train_iterator(data_dir="data/EchoThief", batch_size=32)
    batch = next(train_iter)  # Shape: (32, 513)

To expand your dataset offline, use the included augmentation script:

    uv run python -m aether.data.augment --input_dir data/EchoThief --output_dir data/EchoThief/Augmented

This generates pitch-shifted copies (±12, ±7, ±5 semitones) of your IRs, significantly increasing dataset diversity.
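
To illustrate what a pitch-shifted copy is, here is a minimal sketch that shifts by resampling with linear interpolation. This is a swapped-in technique chosen because it needs only NumPy; the augmentation script may well use a different method (for example a duration-preserving pitch shift), and the names `SEMITONE_SHIFTS`, `resample_pitch_shift`, and `augment` are hypothetical.

```python
import numpy as np

# The six shifts listed above, in semitones.
SEMITONE_SHIFTS = (-12, -7, -5, 5, 7, 12)


def resample_pitch_shift(audio, n_steps):
    """Shift pitch by n_steps semitones via resampling (also changes duration)."""
    audio = np.asarray(audio, dtype=np.float64)
    rate = 2.0 ** (n_steps / 12.0)  # +12 semitones doubles every frequency
    n_out = max(1, int(round(audio.shape[0] / rate)))
    # Read the original signal at positions spaced `rate` samples apart.
    positions = np.arange(n_out) * rate
    return np.interp(positions, np.arange(audio.shape[0]), audio)


def augment(audio):
    """Return one shifted copy per entry in SEMITONE_SHIFTS, keyed by shift."""
    return {n: resample_pitch_shift(audio, n) for n in SEMITONE_SHIFTS}
```

Because resampling changes the clip length (an octave up halves it, an octave down doubles it), copies produced this way would still rely on the trimming and padding step at load time.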
The visualize_irs function generates an interactive HTML report to explore the dataset.
- Randomly selects a subset of IRs from the data directory.
- Generates Waveform and Spectrogram plots for each IR.
- Embeds the original Audio for listening.
- Exports to `visualizations/index.html`.

    uv run python -m aether.data.visualize --data_dir data/EchoThief --count 5

Resulting report structure:

- `visualizations/index.html`
- `visualizations/assets/*.wav`
- `visualizations/assets/*.png`
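
For orientation, the selection and export steps could look something like the sketch below. It is not the `visualize_irs` implementation: the function name `build_report` and its parameters are hypothetical, it uses only the standard library, and it skips plot rendering entirely, assuming a PNG named after each IR is written to the assets folder by a separate step.

```python
import html
import random
from pathlib import Path


def build_report(data_dir, out_dir="visualizations", count=5, seed=None):
    """Pick random IRs, copy their audio, and write index.html (hypothetical helper)."""
    wavs = sorted(Path(data_dir).rglob("*.wav"))
    chosen = random.Random(seed).sample(wavs, min(count, len(wavs)))

    out = Path(out_dir)
    assets = out / "assets"
    assets.mkdir(parents=True, exist_ok=True)

    sections = []
    for wav in chosen:
        # Copy the original audio next to the report so it can be played back.
        # Note: files with the same name in different folders would collide here.
        (assets / wav.name).write_bytes(wav.read_bytes())
        name = html.escape(wav.stem)
        sections.append(
            f"<section><h2>{name}</h2>"
            # The PNG is assumed to be rendered elsewhere.
            f'<img src="assets/{name}.png" alt="Waveform and spectrogram of {name}">'
            f'<audio controls src="assets/{html.escape(wav.name)}"></audio>'
            "</section>"
        )

    index = out / "index.html"
    index.write_text(
        "<!DOCTYPE html><html><body>" + "".join(sections) + "</body></html>",
        encoding="utf-8",
    )
    return index
```

Keeping the audio and images as separate files under `assets/` rather than inlining them keeps `index.html` small and lets the browser load each clip on demand.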