
🎙️ cMPDR × DNN

Official repository for the paper "A Two-Step Approach for Speech Enhancement in Low-SNR Scenarios Using Cyclostationary Beamforming and DNNs."

This project combines cyclostationary Minimum Power Distortionless Response (cMPDR) beamforming with deep learning to enhance speech in noisy recordings.

This repository provides the deep neural network (DNN) architectures used in the paper, designed to work in conjunction with cyclostationary MPDR (cMPDR) beamforming for robust speech enhancement in challenging low-SNR environments.

📄 Paper (arXiv): A two-step approach for speech enhancement in low-SNR scenarios using cyclostationary beamforming and DNNs


✅ Tested Working Configuration

System Requirements

  • OS: Linux (Ubuntu 22.04 or similar)
  • Python: 3.11.13 (Python 3.10 or 3.11 recommended)
  • pip: 24.0 (⚠️ versions ≥24.1 may have issues with pytorch-lightning metadata)
  • CUDA: 12.8
  • GPU: NVIDIA GPU with CUDA support (tested on GTX 1080 Ti)

Key Dependencies

  • TensorFlow: 2.14.0
  • NVIDIA CUDA Toolkit: 11.8 (cu11) and 12.8 (cu12)

Installation Notes

  • If using pip ≥24.1, downgrade to pip 24.0: pip install "pip<24.1"
  • Ensure NVIDIA drivers (570.x or compatible) are installed
  • The project uses both CUDA 11 and CUDA 12 libraries for compatibility
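
The pip constraint above can be checked before installing anything. The following is a minimal sketch, not part of this repository: the helper names `parse_version`, `pip_is_supported`, and `installed_pip_version` are hypothetical, and the only fact taken from the notes above is that pip versions from 24.1 onwards may cause problems.

```python
import re
from importlib import metadata


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '24.0' or '24.1.2' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def pip_is_supported(version: str, first_bad: tuple[int, ...] = (24, 1)) -> bool:
    """Return True if `version` is older than the first problematic pip release."""
    return parse_version(version) < first_bad


def installed_pip_version() -> str:
    """Version string of the pip installed in the current environment."""
    return metadata.version("pip")
```

Calling `pip_is_supported(installed_pip_version())` inside the virtual environment tells you whether the downgrade command above is needed.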

For full dependencies, see requirements.txt.


📦 Installation

It is strongly recommended to use a virtual environment.

  1. Install dependencies:
   pip install -r requirements.txt
  2. Initialize all submodules (required for cMPDR code):
   git submodule update --init --recursive
  3. Install the cMPDR submodule in editable mode:
   cd cmpdr
   pip install -e .
   cd ..

⚠️ PyCharm users: Use the following command instead (see PyCharm issue):

   pip install -e . --config-settings editable_mode=compat

🏗️ Building the Models

Navigate to the networks directory:

cd src/networks/

CRNN

Run the CRNN architecture:

python CRNN.py

ULCNet

Run the ULCNet architecture:

python ULCNet.py

🎧 Dataset Generation

1. 🔊 Noise Data

Generate the noise dataset:

python noise_generation/generate_noise_dataset.py

This will create the background noise files needed for synthesis.

2. 🌍 Real-World Data

The real-world datasets used in this work are generated using the tools provided in the DNS Challenge 2020 repository, following the data generation methodology described in the paper.

🔀 Create Noisy Speech

To synthesize noisy speech using the DNS Challenge tools:

  1. Add the DNS Challenge repository as a submodule (if not already done)

  2. Edit the configuration in noisyspeech_synthesizer.cfg. Use all default values except:

    • audio_length: 5
    • total_hours: 100
    • snr_lower: -20
    • snr_upper: 0
    • Set the correct paths to the clean and noise datasets
  3. Edit noisyspeech_synthesizer_singleprocess.py (lines 198-200) to improve the naming of noisy files:

noisyfilename = 'noisy_fileid_' + str(file_num) + '_' + clean_files_joined + '_' + \
                   noise_files_joined + '_snr' + str(snr) + '_tl' + str(target_level) + '.wav'
  4. Run the synthesizer from the root directory of the DNS Challenge repository:
   python noisyspeech_synthesizer_singleprocess.py
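
The configuration changes from step 2 can also be applied programmatically. This is a hedged sketch using Python's standard `configparser`. It assumes the settings live in a section named `noisy_speech` (check your copy of noisyspeech_synthesizer.cfg), and the function name `apply_overrides` is hypothetical. Dataset paths are left out because they depend on your machine.

```python
import configparser
import io

# Values listed in step 2; every other setting keeps its default.
OVERRIDES = {
    "audio_length": "5",
    "total_hours": "100",
    "snr_lower": "-20",
    "snr_upper": "0",
}


def apply_overrides(cfg_text: str, section: str = "noisy_speech") -> str:
    """Return the config text with the step-2 values applied to `section`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(cfg_text)
    for key, value in OVERRIDES.items():
        # Raises configparser.NoSectionError if the assumed section is missing.
        parser.set(section, key, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Note that `configparser` drops comment lines and writes `key = value` pairs when saving, so keep a backup of the original file if you want to preserve its comments.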

📂 Split Dataset into Train/Val/Test Sets

python src/utils/split_dataset.py "path/to/dev_dataset/" "train_pct" "val_pct" "test_pct"

Example:

python src/utils/split_dataset.py dev_datasets/ 80 15 5

⚠️ Warning: This command creates folders named train, val, and test that contain symlinks to the original files. If you move or delete the original files, the symlinks will be broken!
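
To make the symlink behaviour concrete, here is a minimal sketch of a percentage-based split that links files instead of copying them. It is not the code in src/utils/split_dataset.py: the function name, the shuffling seed, and the rounding rule (the test set takes the remainder) are all assumptions made for illustration.

```python
import os
import random
from pathlib import Path


def split_with_symlinks(src_dir, out_dir, train_pct, val_pct, test_pct, seed=0):
    """Create train/val/test folders in out_dir holding symlinks to files in src_dir."""
    if train_pct + val_pct + test_pct != 100:
        raise ValueError("percentages must sum to 100")
    files = sorted(p for p in Path(src_dir).iterdir() if p.is_file())
    random.Random(seed).shuffle(files)  # reproducible shuffle
    n_train = len(files) * train_pct // 100
    n_val = len(files) * val_pct // 100
    groups = {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],  # remainder, so no file is dropped
    }
    for name, members in groups.items():
        target = Path(out_dir) / name
        target.mkdir(parents=True, exist_ok=True)
        for original in members:
            link = target / original.name
            if not (link.is_symlink() or link.exists()):
                os.symlink(original.resolve(), link)
    return {name: len(members) for name, members in groups.items()}
```

Because each entry is only a link to the resolved original path, deleting or moving a file in the source folder leaves a dangling link in train, val, or test.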


🔧 Preprocessing

Before training or evaluating models, you can apply different preprocessing techniques to enhance the noisy speech.

Setting Up

First, store the name of your dataset folder in a shell variable for convenience:

# Store dataset name in a shell variable
data=dev_datasets_simu

Wiener Filtering (Blind)

Apply Wiener filtering without prior knowledge of the noise:

python wiener_inference_cli.py -i ../../$data/noisy -o ../../$data/wiener -v

Wiener Filtering (Oracle)

Apply Wiener filtering with oracle knowledge (requires noise-only reference files):

python wiener_inference_cli_new.py -i ../../$data/noisy -o ../../$data/wiener_oracle -n ../../$data/noise -v
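
The difference between the two variants comes down to where the noise power estimate comes from. The sketch below shows a textbook per-frequency Wiener gain, computed from a noisy power spectrum and a noise power spectrum. It is not the implementation in the scripts above: the function name `wiener_gain` and the `gain_floor` parameter are hypothetical. In an oracle setting `noise_psd` would be measured on the noise-only reference, while a blind method has to estimate it from the noisy signal itself.

```python
def wiener_gain(noisy_psd, noise_psd, gain_floor=0.1):
    """Per-bin Wiener gain G = max(1 - noise_psd / noisy_psd, gain_floor).

    Both inputs are sequences of non-negative power values, one per frequency bin.
    """
    gains = []
    for p_noisy, p_noise in zip(noisy_psd, noise_psd):
        if p_noisy <= 0.0:
            # No energy in this bin: fall back to the floor to avoid dividing by zero.
            gains.append(gain_floor)
            continue
        # 1 - noise/noisy equals (estimated speech power) / (noisy power).
        gain = 1.0 - p_noise / p_noisy
        gains.append(max(gain, gain_floor))
    return gains
```

For example, a bin with noisy power 4.0 and noise power 1.0 gets a gain of 0.75, while a bin where the noise estimate exceeds the noisy power is clamped to the floor.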

cMPDR Beamforming

Apply cyclostationary MPDR beamforming:

cmvdr -i ./noisy/ -o ./cmpdr/ -p -w 30 -b 200 --verbose

The cmvdr command accepts the following options:

usage: cmvdr [-h] -i INPUT_PATH [-o OUTPUT_PATH] [-n NOISE_PATH] [-v] [-p] [-w WORKERS] [-b BATCH_SIZE]

Run cMVDR inference on a single file or a folder of audio files.

options:
  -h, --help            show this help message and exit
  -i INPUT_PATH, --input_path INPUT_PATH
                        Path to the input audio file or folder.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path to the output folder. If not provided, output will be saved in the same folder as input.
  -n NOISE_PATH, --noise_path NOISE_PATH
                        Path to the noise audio file or folder (optional, to estimate noise frequency). To match input files, append _fileid_123.wav to the noise and the noisy files.
  -v, --verbose         If set, print detailed logs to the console.
  -p, --parallel        If set, process files in parallel using multiple workers.
  -w WORKERS, --workers WORKERS
                        Number of parallel workers (default: number of CPU cores).
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        Number of files to process per batch in parallel mode (default: 100).
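
The help text for `-n` says noise and noisy files are matched through a `_fileid_123.wav` suffix. As an illustration of that convention only, the sketch below pairs two lists of paths by the numeric id in that suffix. The function names and the exact regular expression are assumptions, and the cmvdr tool may match files differently.

```python
import re
from pathlib import Path

# Assumed convention: the file name ends in "_fileid_<number>.wav".
FILEID_PATTERN = re.compile(r"_fileid_(\d+)\.wav$")


def file_id(path):
    """Return the numeric file id from the name suffix, or None if absent."""
    match = FILEID_PATTERN.search(Path(path).name)
    return int(match.group(1)) if match else None


def pair_by_file_id(noisy_paths, noise_paths):
    """Return (pairs, unmatched): noisy/noise pairs sharing an id, and leftover noisy files."""
    noise_by_id = {}
    for path in noise_paths:
        fid = file_id(path)
        if fid is not None:
            noise_by_id[fid] = path
    pairs, unmatched = [], []
    for path in noisy_paths:
        fid = file_id(path)
        if fid in noise_by_id:
            pairs.append((path, noise_by_id[fid]))
        else:
            unmatched.append(path)
    return pairs, unmatched
```

A file whose name does not end in the suffix, including one where the id appears in the middle of the name, ends up in `unmatched` in this sketch.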

📊 Evaluation

To evaluate the performance of your models:

cd src/eval
python evaluate_folder.py --help

This will display the following help message:

usage: evaluate_folder.py [-h] [-d FOLDER_DENOISED [FOLDER_DENOISED ...]] 
                          [-r FOLDER_REFERENCE] [--sort-by-snr]

Evaluate audio files in a folder.

options:
  -h, --help            show this help message and exit
  -d FOLDER_DENOISED [FOLDER_DENOISED ...], --folder_denoised FOLDER_DENOISED [FOLDER_DENOISED ...]
                        List of paths to folders containing denoised audio files.
  -r FOLDER_REFERENCE, --folder_reference FOLDER_REFERENCE
                        Path to the folder containing clean reference audio files (optional).
  --sort-by-snr         Sort results by SNR brackets (optional). Default is False.
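
Grouping by SNR relies on the SNR being recoverable for each file. With the file naming from the dataset generation step (`..._snr<value>_tl<value>.wav`), one possible way to do this is sketched below. This is an assumption about how such grouping could work, not a description of evaluate_folder.py: the function names and the 5 dB bracket width are hypothetical.

```python
import math
import re
from collections import defaultdict

# Matches the "_snr<value>_tl" part of names such as "..._snr-15_tl-25.wav".
SNR_PATTERN = re.compile(r"_snr(-?\d+(?:\.\d+)?)_tl")


def snr_from_name(filename):
    """Return the SNR in dB encoded in the file name, or None if not found."""
    match = SNR_PATTERN.search(filename)
    return float(match.group(1)) if match else None


def group_by_snr_bracket(filenames, width=5.0):
    """Map (lower, upper) dB brackets to file names, ordered by lower bound."""
    brackets = defaultdict(list)
    for name in filenames:
        snr = snr_from_name(name)
        if snr is None:
            continue  # skip files without an SNR tag
        lower = math.floor(snr / width) * width
        brackets[(lower, lower + width)].append(name)
    return dict(sorted(brackets.items()))
```

Each bracket is half-open, so a file at exactly -15 dB falls into the (-15, -10) bracket rather than (-20, -15).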

📝 Citation

If you use this repository or its contents in your research, please cite the associated paper:

@misc{bologni_twostep_2026,
      title={A two-step approach for speech enhancement in low-SNR scenarios using cyclostationary beamforming and DNNs}, 
      author={Giovanni Bologni and Nicolás Arrieta Larraza and Richard Heusdens and Richard C. Hendriks},
      year={2026},
      eprint={2602.12986},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2602.12986}, 
}
