
Dual-signal Transformation LSTM Network for 8000Hz Speech Enhancement

Tensorflow 2.x implementation of the stacked dual-signal transformation LSTM network (DTLN) for real-time noise suppression.


The original repository can be found here.

The DTLN model was submitted to the Deep Noise Suppression Challenge (DNS-Challenge), and the paper was presented at Interspeech 2020.

For more information see the paper. The results of the DNS-Challenge are published here.


Author: Nils L. Westhausen (Communication Acoustics, Carl von Ossietzky University, Oldenburg, Germany)

This code is licensed under the terms of the MIT license.


Citing:

If you are using the DTLN model, please cite:

@inproceedings{Westhausen2020,
  author={Nils L. Westhausen and Bernd T. Meyer},
  title={{Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2477--2481},
  doi={10.21437/Interspeech.2020-2631},
  url={https://dx.doi.org/10.21437/Interspeech.2020-2631}
}

Contents of the README:

  • Contents of the repository
  • Python dependencies
  • Training data preparation
  • Run a training of the DTLN model
  • Measuring the execution time of the DTLN model with the SavedModel format
  • Real time processing with the SavedModel format
  • Real time processing with tf-lite


Contents of the repository:

  • DTLN_model.py
    This file contains the model, the data generator, and the training routine.
  • run_training.py
    Script to run the training. Before you can start the training with $ python run_training.py you have to set the paths to your training and validation data inside the script. The training script uses a default setup.
  • run_evaluation.py
    Script to process a folder with optional subfolders containing .wav files with a trained DTLN model. With the pretrained model delivered with this repository, a folder can be processed as follows:
    $ python run_evaluation.py -i /path/to/input -o /path/for/processed -m ./pretrained_model/model.h5
    The evaluation script will create a new folder with the same structure as the input folder, and the processed files will have the same names as the input files.
  • measure_execution_time.py
    Script for measuring the execution time with the saved DTLN model in ./pretrained_model/dtln_saved_model/. For further information see this section.
  • real_time_processing.py
    Script that explains how real-time processing with the SavedModel format works. For more information see this section.
  • ./pretrained_model/ (a weight-loading sketch follows this list)
    • DTLN_model_8khz_42epoch.h5: model weights as described in the thesis, with (frame_length, frame_shift) = (512, 128)
    • DTLN_8khz_model_1.tflite together with DTLN_8khz_model_2.tflite: the same as DTLN_model_8khz_42epoch.h5, but as a TF-lite model with external state handling
    • my_custom_model_1.tflite together with my_custom_model_2.tflite: a modified version with (frame_length, frame_shift) = (256, 64)
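
To load the .h5 weights programmatically, a minimal sketch could look like this (the class and method names follow the upstream DTLN_model.py and are an assumption for this fork; verify them in DTLN_model.py):

from DTLN_model import DTLN_model

# Hedged sketch: build the network with its default setup and load the
# pretrained 8 kHz weights shipped with this repository.
modelClass = DTLN_model()
modelClass.build_DTLN_model()
modelClass.model.load_weights('./pretrained_model/DTLN_model_8khz_42epoch.h5')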

To contents


Python dependencies:

The following packages will be required for this repository:

  • TensorFlow (2.x) - some files may require different versions, see the comments in each file for more details
  • librosa
  • wavinfo

All additional packages (numpy, soundfile, etc.) should be installed on the fly when using conda or pip. I recommend using conda environments or pyenv virtualenv for the Python environment. For training, a GPU with at least 5 GB of memory is required. I recommend at least TensorFlow 2.1 with Nvidia driver 418 and CUDA 10.1. If you use conda, CUDA will be installed on the fly and you just need the driver. For evaluation only, the CPU version of TensorFlow is enough.
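
A possible environment setup could look like this (a sketch; the Python and package versions are assumptions, adjust them to your system):

$ conda create -n dtln python=3.7
$ conda activate dtln
$ pip install tensorflow-gpu==2.1 librosa wavinfo soundfile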

The tf-lite runtime must be downloaded from here.

To contents


Training data preparation:

  1. Data folders can be retrieved from here
  2. After decompression, all folders should be merged as follows (a sanity-check sketch follows the tree):
     .
     ├── ...
     ├── training_set
     │   ├── train
     │   │   ├── clean
     │   │   └── noisy
     │   ├── val
     │   │   ├── clean
     │   │   └── noisy
     └── ...
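
Assuming corresponding clean and noisy files share the same file names (an assumption about the data generator; verify it for your data), a quick sanity check of the merged folders could look like this:

from pathlib import Path

# Hedged sketch: check that every noisy file has a clean counterpart.
root = Path('./training_set')
for split in ('train', 'val'):
    clean = {p.name for p in (root / split / 'clean').glob('*.wav')}
    noisy = {p.name for p in (root / split / 'noisy').glob('*.wav')}
    unmatched = noisy - clean
    print(f'{split}: {len(noisy)} noisy files, {len(unmatched)} without a clean counterpart')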
To contents

Run a training of the DTLN model:

  1. Make sure all dependencies are installed in your python environment.

  2. Change the paths to your training and validation dataset in run_training.py (a sketch of the path variables follows this list).

  3. Run $ python run_training.py.
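
For orientation, the paths inside run_training.py could be set like this (a sketch; the variable names follow the upstream script and are an assumption for this fork, verify them in your copy):

# In run_training.py: point these at the merged dataset from above.
path_to_train_mix = '/path/to/training_set/train/noisy/'
path_to_train_speech = '/path/to/training_set/train/clean/'
path_to_val_mix = '/path/to/training_set/val/noisy/'
path_to_val_speech = '/path/to/training_set/val/clean/'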

To contents


Measuring the execution time of the DTLN model with the SavedModel format:

In total there are three ways to measure the execution time for one block of the model: running a sequence in Keras and dividing by the number of blocks in the sequence, building a stateful model in Keras and running it block by block, or saving the stateful model in TensorFlow's SavedModel format and calling that one block by block. In the following I will explain how to run the model in the SavedModel format, because it is the most portable version and can also be called from TensorFlow Serving.

A Keras model can be saved to the SavedModel format:

import tensorflow as tf

# ... build your Keras model here ...
tf.saved_model.save(your_keras_model, 'name_save_path')

For real-time block-by-block processing it is important to make the LSTM layers stateful, so they can remember their states from the previous block.
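
As a minimal illustration (not the exact architecture of this repository), a stateful LSTM layer in Keras could look like this:

import tensorflow as tf

# Sketch: a stateful LSTM needs a fixed batch size (here 1) so it can
# carry its hidden states from one call to the next.
inp = tf.keras.Input(shape=(1, 128), batch_size=1)
out = tf.keras.layers.LSTM(128, stateful=True, return_sequences=True)(inp)
stateful_model = tf.keras.Model(inp, out)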

The model can be imported with

model = tf.saved_model.load('name_save_path')

For inference, we first map the signature names to functions with

infer = model.signatures['serving_default']

and then, for inferring on a block x, call

y = infer(tf.constant(x))['conv1d_1']

This command gives you the result of the node 'conv1d_1', which is our output node for real-time processing. For more information on using the SavedModel format and obtaining the output node, see this guide.
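
Putting the pieces together, a block-by-block timing loop could look like this (a sketch: the block sizes follow the (512, 128) pretrained model of this fork, the input signal is a placeholder, and the overlap-add of the outputs is omitted):

import time
import numpy as np
import tensorflow as tf

model = tf.saved_model.load('name_save_path')
infer = model.signatures['serving_default']

block_len, block_shift = 512, 128
audio = np.zeros(8000, dtype=np.float32)  # placeholder: one second at 8 kHz
buffer = np.zeros(block_len, dtype=np.float32)
times = []
for idx in range(audio.shape[0] // block_shift):
    # shift the buffer and append the newest samples
    buffer = np.roll(buffer, -block_shift)
    buffer[-block_shift:] = audio[idx * block_shift:(idx + 1) * block_shift]
    start = time.time()
    y = infer(tf.constant(buffer.reshape(1, 1, -1)))['conv1d_1']
    times.append(time.time() - start)
print('mean execution time per block:', 1000 * np.mean(times), 'ms')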

To make things easier, this repository provides a stateful DTLN SavedModel. For measuring the execution time, call:

$ python measure_execution_time.py

To contents


Real time processing with the SavedModel format:

For an explanation, look at real_time_processing.py.

Here are some considerations for integrating this model into your project:

  • The sampling rate of the models in this fork is fixed at 8 kHz (the upstream DTLN uses 16 kHz). They will not work smoothly with other sampling rates.
  • The block length and block shift are fixed by the (frame_length, frame_shift) the model was trained with. For changing these values, the model must be retrained.
  • The delay created by the model is one block length, so the input-output delay equals the block length.
  • For real-time capability on your system, the execution time per block must be below the length of the block shift (see the timing sketch after this list).
  • I can not give you support on the hardware side, regarding soundcards, drivers and so on. Be aware that a lot of artifacts can come from this side.
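
As a quick check, the block timing follows directly from (frame_length, frame_shift) and the sampling rate; a small sketch for the two model variants in this repository:

# Block timing at the 8 kHz sampling rate of this fork.
fs = 8000
for frame_length, frame_shift in [(512, 128), (256, 64)]:
    print(f'({frame_length}, {frame_shift}): '
          f'block length {1000 * frame_length / fs:.0f} ms, '
          f'block shift {1000 * frame_shift / fs:.0f} ms')
# prints: block length 64 ms / shift 16 ms, and 32 ms / 8 ms respectively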

To contents


Real time processing with tf-lite:

With TF 2.3 it is finally possible to convert LSTMs to tf-lite. It is still not perfect, because the states must be handled separately for a stateful model and tf-lite does not support complex numbers. That means the model is split into two submodels when converting it to tf-lite, and the calculation of the FFT and iFFT is performed outside the model. I provided an example script explaining how real-time processing with the tf-lite model works (real_time_processing_tf_lite.py). In this script the tf-lite runtime is used. The runtime can be downloaded here. Quantization works now. A sketch of one processing block follows below.
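
For orientation, one processing block with the split models could look like this (a sketch based on the description above; the order and shapes of the input/output tensors are assumptions, verify them with get_input_details() and get_output_details()):

import numpy as np
import tflite_runtime.interpreter as tflite

block_len = 512
m1 = tflite.Interpreter(model_path='./pretrained_model/DTLN_8khz_model_1.tflite')
m2 = tflite.Interpreter(model_path='./pretrained_model/DTLN_8khz_model_2.tflite')
m1.allocate_tensors()
m2.allocate_tensors()
d1_in, d1_out = m1.get_input_details(), m1.get_output_details()
d2_in, d2_out = m2.get_input_details(), m2.get_output_details()
states_1 = np.zeros(d1_in[1]['shape'], dtype=np.float32)  # assumed state tensor
states_2 = np.zeros(d2_in[1]['shape'], dtype=np.float32)  # assumed state tensor

in_block = np.zeros(block_len, dtype=np.float32)  # one block of audio
spec = np.fft.rfft(in_block)                      # FFT outside the model
mag = np.abs(spec).reshape(1, 1, -1).astype(np.float32)
m1.set_tensor(d1_in[0]['index'], mag)
m1.set_tensor(d1_in[1]['index'], states_1)
m1.invoke()
mask = m1.get_tensor(d1_out[0]['index'])
states_1 = m1.get_tensor(d1_out[1]['index'])
est_block = np.fft.irfft(spec * mask.squeeze())   # iFFT outside the model
m2.set_tensor(d2_in[0]['index'], est_block.reshape(1, 1, -1).astype(np.float32))
m2.set_tensor(d2_in[1]['index'], states_2)
m2.invoke()
out_block = m2.get_tensor(d2_out[0]['index'])
states_2 = m2.get_tensor(d2_out[1]['index'])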

To contents
