Data Module (aether.data)

This module handles the loading, processing, and visualization of Impulse Response (IR) data used to train the VAE.

Dataset (dataset.py)

The IRDataset class is responsible for loading .wav files and converting them into spectral features suitable for the VAE.

Processing Pipeline

  1. Loading: Reads audio files with librosa, resampled to 44.1 kHz and mixed down to mono.
  2. Trimming/Padding: Trims or pads every clip to exactly length_samples samples (default: 65536).
  3. STFT: Computes the Short-Time Fourier Transform (STFT) with n_fft=1024, which yields 513 frequency bins.
  4. Magnitude Spectrum: Calculates the magnitude |STFT| and takes the mean over time to get a single spectral fingerprint per IR.
  5. Log Scaling: Converts the magnitude to a dB scale.
  6. Normalization: Normalizes the dB spectrum to the range [0, 1] (assuming a floor of -80 dB).
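
The steps above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the code in dataset.py: the function names are hypothetical, the hop length (n_fft // 4), the Hann window, the lack of frame centering, and the choice of the peak bin as the dB reference are all guesses, since the pipeline description does not specify them. Loading is left out; with librosa it would typically be `librosa.load(path, sr=44100, mono=True)`.

```python
import numpy as np

LENGTH_SAMPLES = 65_536   # documented default
N_FFT = 1024              # documented STFT size
HOP = N_FFT // 4          # assumption: the hop length is not documented
FLOOR_DB = -80.0          # documented dB floor


def fix_length(audio, length=LENGTH_SAMPLES):
    """Trim or zero-pad a mono signal to exactly `length` samples."""
    if len(audio) >= length:
        return audio[:length]
    return np.pad(audio, (0, length - len(audio)))


def spectral_fingerprint(audio):
    """Steps 2-6: mono waveform -> normalized mean log-magnitude spectrum."""
    audio = fix_length(np.asarray(audio, dtype=np.float32))
    # Slice the signal into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(audio) - N_FFT) // HOP
    idx = np.arange(N_FFT)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = audio[idx] * np.hanning(N_FFT)
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, 513)
    mean_mag = magnitude.mean(axis=0)                # (513,)
    # Amplitude -> dB relative to the peak bin (assumed reference).
    ref = max(float(mean_mag.max()), 1e-10)
    db = 20.0 * np.log10(np.maximum(mean_mag, 1e-10) / ref)
    db = np.maximum(db, FLOOR_DB)                    # clip at the -80 dB floor
    return (db - FLOOR_DB) / -FLOOR_DB               # map [-80, 0] dB to [0, 1]
```

With n_fft=1024 the real FFT has 1024 / 2 + 1 = 513 bins, which matches the per-example feature size of the training batches.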

Usage

from aether.data.dataset import create_train_iterator

# Create an infinite iterator for training
train_iter = create_train_iterator(data_dir="data/EchoThief", batch_size=32)
batch = next(train_iter) # Shape: (32, 513)
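
For readers who want a mental model of an infinite training iterator, here is a minimal sketch. It is not the implementation of create_train_iterator: the function `infinite_batches`, its `seed` parameter, and the choice to drop the incomplete final batch of each epoch are assumptions made for illustration.

```python
import numpy as np


def infinite_batches(features, batch_size, seed=0):
    """Yield shuffled (batch_size, n_bins) batches forever (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(features)
    if n < batch_size:
        raise ValueError("need at least batch_size examples")
    while True:
        # Reshuffle at the start of every pass over the data.
        order = rng.permutation(n)
        # Skip the ragged tail so every batch has the same shape.
        for start in range(0, n - batch_size + 1, batch_size):
            yield features[order[start:start + batch_size]]
```

Because the generator never raises StopIteration, a training loop can call next() for a fixed number of steps without tracking epochs.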

Data Augmentation

To expand your dataset offline, use the included augmentation script:

uv run python -m aether.data.augment --input_dir data/EchoThief --output_dir data/EchoThief/Augmented

This generates pitch-shifted copies of each IR at six shifts (±5, ±7, and ±12 semitones), which increases the spectral variety of the training set.
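
To make the idea concrete, the sketch below shifts pitch by plain resampling. This is a deliberately simple stand-in and an assumption, not what the augmentation script necessarily does: resampling also changes the clip length, whereas a library routine such as librosa.effects.pitch_shift preserves duration. The names `resample_shift` and `augment` are hypothetical.

```python
import numpy as np

SEMITONE_SHIFTS = (-12, -7, -5, 5, 7, 12)  # the shifts listed above


def resample_shift(audio, semitones):
    """Shift pitch by resampling; the clip gets shorter when pitch goes up."""
    ratio = 2.0 ** (semitones / 12.0)
    n_out = max(1, int(round(len(audio) / ratio)))
    # Read the input `ratio` times faster, using linear interpolation.
    positions = np.arange(n_out) * ratio
    return np.interp(positions, np.arange(len(audio)), audio)


def augment(audio):
    """Return one shifted copy per entry in SEMITONE_SHIFTS, keyed by shift."""
    return {s: resample_shift(audio, s) for s in SEMITONE_SHIFTS}
```

A shift of +12 semitones corresponds to a frequency ratio of 2, so with this method the output is half as long as the input.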

Visualization (visualize.py)

The visualize_irs function generates an interactive HTML report to explore the dataset.

Features

  • Randomly selects a subset of IRs from the data directory.
  • Generates Waveform and Spectrogram plots for each IR.
  • Embeds the original Audio for listening.
  • Exports to visualizations/index.html.

CLI Usage

uv run python -m aether.data.visualize --data_dir data/EchoThief --count 5

Resulting report structure:

  • visualizations/index.html
  • visualizations/assets/*.wav
  • visualizations/assets/*.png
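
The layout above could be produced along the following lines. This is a sketch using only the standard library, not the code in visualize.py: `build_report` and its parameters are hypothetical, and the PNG plots are assumed to be rendered separately (for example with matplotlib) into the assets folder.

```python
import html
import random
from pathlib import Path


def build_report(data_dir, out_dir="visualizations", count=5, seed=0):
    """Write out_dir/index.html with one section per randomly chosen IR."""
    wavs = sorted(Path(data_dir).glob("*.wav"))
    chosen = random.Random(seed).sample(wavs, min(count, len(wavs)))
    out = Path(out_dir)
    (out / "assets").mkdir(parents=True, exist_ok=True)
    sections = []
    for wav in chosen:
        # Copy the audio next to the report so index.html is self-contained.
        (out / "assets" / wav.name).write_bytes(wav.read_bytes())
        name = html.escape(wav.name)
        stem = html.escape(wav.stem)
        sections.append(
            f"<section><h2>{name}</h2>"
            f'<img src="assets/{stem}.png" alt="plots for {name}">'
            f'<audio controls src="assets/{name}"></audio></section>'
        )
    index = out / "index.html"
    index.write_text(
        "<!doctype html><html><body>" + "".join(sections) + "</body></html>"
    )
    return index
```

Using relative asset paths means the visualizations folder can be moved or zipped without breaking the embedded audio and images.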
