Official repository for the paper "A Two-Step Approach for Speech Enhancement in Low-SNR Scenarios Using Cyclostationary Beamforming and DNNs."
This project combines cyclostationary Minimum Power Distortionless Response (cMPDR) beamforming with deep learning for speech enhancement.
This repository provides the deep neural network (DNN) architectures used in the paper, designed to work in conjunction with cMPDR beamforming for robust speech enhancement in challenging low-SNR environments.
📄 Paper (arXiv): A two-step approach for speech enhancement in low-SNR scenarios using cyclostationary beamforming and DNNs
- OS: Linux (Ubuntu 22.04 or similar)
- Python: 3.11.13 (Python 3.10 or 3.11 recommended)
- pip: 24.0 (⚠️ versions ≥ 24.1 may have issues with pytorch-lightning metadata)
- CUDA: 12.8
- GPU: NVIDIA GPU with CUDA support (tested on GTX 1080 Ti)
- TensorFlow: 2.14.0
- NVIDIA CUDA Toolkit: 11.8 (cu11) and 12.8 (cu12)
- If using pip ≥ 24.1, downgrade to pip 24.0:

```shell
pip install "pip<24.1"
```

- Ensure NVIDIA drivers (570.x or compatible) are installed
- The project uses both CUDA 11 and CUDA 12 libraries for compatibility
For full dependencies, see requirements.txt.
It is strongly recommended to use a virtual environment.
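One common way to set this up (a sketch; the environment name `.venv` is arbitrary):

```shell
# Create and activate an isolated environment in the project root
python3 -m venv .venv
source .venv/bin/activate
python -m pip --version  # should report a pip located inside .venv
```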
- Install dependencies:

```shell
pip install -r requirements.txt
```

- Initialize all submodules (required for the cMPDR code):

```shell
git submodule update --init --recursive
```

- Install the cMPDR submodule in editable mode:

```shell
cd cmpdr
pip install -e .
cd ..
```

- Install this package in editable mode:

```shell
pip install -e . --config-settings editable_mode=compat
```

Navigate to the networks directory:

```shell
cd src/networks/
```

Run the CRNN architecture:

```shell
python CRNN.py
```

Run the ULCNet architecture:

```shell
python ULCNet.py
```

Generate the noise dataset:

```shell
python noise_generation/generate_noise_dataset.py
```

This will create the background noise files needed for synthesis.
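The actual noise types and file layout are defined by `generate_noise_dataset.py`; purely as an illustration of what generating such a dataset involves, here is a minimal, hypothetical sketch that writes Gaussian white-noise WAV files (the helper name, output folder, and parameters are assumptions, not the repository's code):

```python
import os
import random
import struct
import wave

def write_noise_wav(path, duration_s=5.0, fs=16000, amplitude=0.1):
    """Write a mono 16-bit PCM WAV file containing Gaussian white noise."""
    n_samples = int(duration_s * fs)
    samples = [random.gauss(0.0, amplitude) for _ in range(n_samples)]
    # Clip to [-1, 1] and convert to 16-bit signed integers
    pcm = (int(max(-1.0, min(1.0, s)) * 32767) for s in samples)
    frames = struct.pack("<%dh" % n_samples, *pcm)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(fs)
        wf.writeframes(frames)

os.makedirs("noise_out", exist_ok=True)
for i in range(3):
    write_noise_wav(os.path.join("noise_out", f"noise_fileid_{i}.wav"))
```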
The real-world datasets used in this work are generated using the tools provided in the DNS Challenge 2020 repository, following the data generation methodology described in the paper.
To synthesize noisy speech using the DNS Challenge tools:
- Add the DNS Challenge repository as a submodule (if not already done)
- Edit the configuration in `noisyspeech_synthesizer.cfg`. Use all default values except:
  - `audio_length`: 5
  - `total_hours`: 100
  - `snr_lower`: -20
  - `snr_upper`: 0
  - Set the correct paths to the clean and noise datasets
- Edit `noisyspeech_synthesizer_singleprocess.py` (lines 198-200) to improve the naming of noisy files:

```python
noisyfilename = 'noisy_fileid_' + str(file_num) + '_' + clean_files_joined + '_' + \
                noise_files_joined + '_snr' + str(snr) + '_tl' + str(target_level) + '.wav'
```

- Run the synthesizer from the root directory of the DNS Challenge repository:
```shell
python noisyspeech_synthesizer_singleprocess.py
```

Split the dataset into train, validation, and test sets:

```shell
python src/utils/split_dataset.py "path/to/dev_dataset/" "train_pct" "val_pct" "test_pct"
```

Example:

```shell
python src/utils/split_dataset.py dev_datasets/ 80 15 5
```

This creates three folders, `train`, `val`, and `test`, that contain symlinks to the original files.
If you move or delete the original files, the symlinks will be broken!
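The exact behavior of `split_dataset.py` is defined in the repository; the sketch below is a hypothetical re-implementation of the idea (a percentage split into `train`/`val`/`test` folders of symlinks) for illustration only:

```python
import os
import random

def split_dataset(src_dir, train_pct, val_pct, test_pct, seed=0):
    """Split the files in src_dir into train/val/test folders of symlinks."""
    assert train_pct + val_pct + test_pct == 100
    files = sorted(f for f in os.listdir(src_dir)
                   if os.path.isfile(os.path.join(src_dir, f)))
    random.Random(seed).shuffle(files)
    n = len(files)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    splits = {"train": files[:n_train],
              "val": files[n_train:n_train + n_val],
              "test": files[n_train + n_val:]}
    for name, members in splits.items():
        out = os.path.join(src_dir, name)
        os.makedirs(out, exist_ok=True)
        for f in members:
            # Symlink back to the original file: moving or deleting the
            # originals breaks these links.
            link = os.path.join(out, f)
            if not os.path.islink(link):
                os.symlink(os.path.abspath(os.path.join(src_dir, f)), link)
    return {k: len(v) for k, v in splits.items()}
```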
Before training or evaluating models, you can apply different preprocessing techniques to enhance the noisy speech.
First, set your dataset path as a shell variable for convenience:
```shell
# Store dataset name in a shell variable
data=dev_datasets_simu
```

Apply Wiener filtering without prior knowledge of the noise:

```shell
python wiener_inference_cli.py -i ../../$data/noisy -o ../../$data/wiener -v
```

Apply Wiener filtering with oracle knowledge (requires noise-only reference files):

```shell
python wiener_inference_cli_new.py -i ../../$data/noisy -o ../../$data/wiener_oracle -n ../../$data/noise -v
```

Apply cyclostationary MPDR beamforming:
```shell
cmvdr -i ./noisy/ -o ./cmpdr/ -p -w 30 -b 200 --verbose
```

where the options are:

```text
usage: cmvdr [-h] -i INPUT_PATH [-o OUTPUT_PATH] [-n NOISE_PATH] [-v] [-p] [-w WORKERS] [-b BATCH_SIZE]

Run cMVDR inference on a single file or a folder of audio files.

options:
  -h, --help            show this help message and exit
  -i INPUT_PATH, --input_path INPUT_PATH
                        Path to the input audio file or folder.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path to the output folder. If not provided, output will be saved in the same folder as input.
  -n NOISE_PATH, --noise_path NOISE_PATH
                        Path to the noise audio file or folder (optional, to estimate noise frequency). To match input files, append _fileid_123.wav to the noise and the noisy files.
  -v, --verbose         If set, print detailed logs to the console.
  -p, --parallel        If set, process files in parallel using multiple workers.
  -w WORKERS, --workers WORKERS
                        Number of parallel workers (default: number of CPU cores).
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        Number of files to process per batch in parallel mode (default: 100).
```
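The `_fileid_N.wav` suffix convention mentioned in the help text can be matched programmatically. A small sketch (a hypothetical helper, not part of the repository) that pairs noisy and noise files sharing the same file id:

```python
import re

# Matches the trailing "_fileid_<number>.wav" in a filename
FILEID_RE = re.compile(r"_fileid_(\d+)\.wav$")

def pair_by_fileid(noisy_files, noise_files):
    """Pair noisy and noise files that share the same _fileid_N.wav suffix."""
    def index(files):
        out = {}
        for f in files:
            m = FILEID_RE.search(f)
            if m:
                out[m.group(1)] = f
        return out
    noisy, noise = index(noisy_files), index(noise_files)
    # Keep only ids present in both folders
    return {fid: (noisy[fid], noise[fid]) for fid in noisy.keys() & noise.keys()}
```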
To evaluate the performance of your models:
```shell
cd src/eval
python evaluate_folder.py --help
```

This will display the following help message:

```text
usage: evaluate_folder.py [-h] [-d FOLDER_DENOISED [FOLDER_DENOISED ...]]
                          [-r FOLDER_REFERENCE] [--sort-by-snr]

Evaluate audio files in a folder.

options:
  -h, --help            show this help message and exit
  -d FOLDER_DENOISED [FOLDER_DENOISED ...], --folder_denoised FOLDER_DENOISED [FOLDER_DENOISED ...]
                        List of paths to folders containing denoised audio files.
  -r FOLDER_REFERENCE, --folder_reference FOLDER_REFERENCE
                        Path to the folder containing clean reference audio files (optional).
  --sort-by-snr         Sort results by SNR brackets (optional). Default is False.
```
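Since the synthesized filenames encode the mixing SNR (the `_snr{snr}_tl{target_level}.wav` pattern set earlier), grouping results into SNR brackets can be done by parsing the name. A hypothetical sketch (the 5 dB bracket width is an assumption, not necessarily what `evaluate_folder.py` uses):

```python
import re

# Matches the "_snr<integer>_" segment of a synthesized filename
SNR_RE = re.compile(r"_snr(-?\d+)_")

def snr_bracket(filename, width=5):
    """Return the SNR bracket, e.g. '[-20, -15)', parsed from a filename."""
    m = SNR_RE.search(filename)
    if m is None:
        return None
    snr = int(m.group(1))
    lo = (snr // width) * width   # floor division handles negative SNRs correctly
    return f"[{lo}, {lo + width})"
```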
If you use this repository or its contents in your research, please cite the associated paper:
```bibtex
@misc{bologni_twostep_2026,
  title={A two-step approach for speech enhancement in low-SNR scenarios using cyclostationary beamforming and DNNs},
  author={Giovanni Bologni and Nicolás Arrieta Larraza and Richard Heusdens and Richard C. Hendriks},
  year={2026},
  eprint={2602.12986},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2602.12986},
}
```