Speaker Voice Separation using Neural Nets

Installation

git clone https://github.com/Muhammad-Ahmad-Ghani/svoice_demo.git
cd svoice_demo
conda create -n svoice python=3.7 -y
conda activate svoice
# Option 1: GPU build (CUDA 11.3)
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch -y
# Option 2: CPU-only build
pip install torch==1.12.0+cpu torchvision==0.13.0+cpu torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
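After installing either build, a quick sanity check can confirm which one you got. The sketch below is a hypothetical helper (`check_torch_install` is not part of this repository) that only relies on `torch.__version__` and `torch.cuda.is_available()`. Run it inside the svoice environment: the version should start with 1.12.0, and `cuda` should be True only for the CUDA 11.3 build on a machine with a visible GPU.

```python
import importlib
import importlib.util


def check_torch_install():
    """Report whether PyTorch is importable and, if so, its version and CUDA status."""
    # find_spec returns None when the package is missing, so no ImportError is raised.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None, "cuda": False}
    torch = importlib.import_module("torch")
    return {
        "installed": True,
        "version": str(torch.__version__),
        # False for the CPU-only build, or for a CUDA build without a usable GPU.
        "cuda": bool(torch.cuda.is_available()),
    }
```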

Pretrained-Model	Dataset	Epochs	Train Loss	Valid Loss
checkpoint.th	Librimix-7 (16k-mix_clean)	31	0.04	0.64

This is an intermediate checkpoint, intended for demo purposes only.

Create the directory outputs/exp_ and save the checkpoint there:

svoice_demo
├── outputs
│   └── exp_
│       └── checkpoint.th
...
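The same step can be scripted. This is a minimal standard-library sketch; `place_checkpoint` is a hypothetical helper name, and the source path is simply wherever you downloaded checkpoint.th.

```python
import shutil
from pathlib import Path


def place_checkpoint(checkpoint_file, repo_root="."):
    """Copy a downloaded checkpoint to <repo_root>/outputs/exp_/checkpoint.th."""
    target_dir = Path(repo_root) / "outputs" / "exp_"
    # parents=True also creates outputs/; exist_ok=True makes reruns harmless.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "checkpoint.th"
    # copy2 keeps the file's timestamps along with its contents.
    shutil.copy2(checkpoint_file, target)
    return target
```

For example, from the svoice_demo root: `place_checkpoint("/path/to/downloaded/checkpoint.th")`.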

Run Gradio Demo

conda activate svoice
python demo.py

Training

Create the mix_clean dataset with a 16 kHz sample rate using the LibriMix repository.

Dataset Structure

svoice_demo
├── Libri{NUM_OF_SPEAKERS}Mix_Dataset -> Libri7Mix_Dataset
│   └── wav{SAMPLE_RATE_VALUE}k -> wav16k
│       └── min
│           ├── dev
│           │   └── ...
│           ├── test
│           │   └── ...
│           └── train-360
│               └── ...
...
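Before generating metadata, you may want to confirm that your folders match the layout above. The sketch below uses hypothetical helpers (not part of the repository) that derive the three split directories from the speaker count and sample rate, following the naming pattern in the tree; it does not inspect what is inside each split.

```python
from pathlib import Path

SPLITS = ("dev", "test", "train-360")


def expected_split_dirs(repo_root=".", num_speakers=7, sample_rate_khz=16):
    """Build the split directories implied by the Libri{N}Mix_Dataset layout."""
    base = (Path(repo_root)
            / f"Libri{num_speakers}Mix_Dataset"
            / f"wav{sample_rate_khz}k"
            / "min")
    return [base / split for split in SPLITS]


def missing_split_dirs(repo_root=".", num_speakers=7, sample_rate_khz=16):
    """Return the expected split directories that do not exist yet."""
    return [p for p in expected_split_dirs(repo_root, num_speakers, sample_rate_khz)
            if not p.is_dir()]
```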

Create `metadata` files

You can generate them with the predefined scripts:

# for 7 speakers
bash create_metadata_librimix7.sh
# for 10 speakers
bash create_metadata_librimix10.sh
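To see roughly what a metadata file contains, the sketch below builds one per audio folder. It assumes the format used by the upstream svoice data loader, a JSON list of `[path, number_of_samples]` pairs (for example a `mix.json` for the mixture folder and one `s<ID>.json` per speaker folder); check create_metadata_librimix7.sh for the exact folders and output paths this repository uses. `build_metadata` and `write_metadata` are hypothetical helpers, the folder scan is non-recursive, and the WAV files are assumed to be uncompressed PCM, since that is all Python's `wave` module reads.

```python
import json
import wave
from pathlib import Path


def build_metadata(wav_dir):
    """List [path, number_of_samples] for every .wav file in wav_dir, sorted by path."""
    entries = []
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        # getnframes() counts samples per channel, independent of channel count.
        with wave.open(str(wav_path), "rb") as w:
            entries.append([str(wav_path), w.getnframes()])
    return entries


def write_metadata(wav_dir, out_json):
    """Write the metadata list for wav_dir to out_json and return it."""
    entries = build_metadata(wav_dir)
    Path(out_json).parent.mkdir(parents=True, exist_ok=True)
    with open(out_json, "w") as f:
        json.dump(entries, f, indent=4)
    return entries
```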

Edit conf/config.yaml to match your setup. In particular, set `C: NUM_OF_SPEAKERS` at line 66 to the number of speakers (for example `C: 7` for Libri7Mix).
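Because line numbers can shift as the config changes, a small script can locate the `C` key by pattern instead. This is a hedged sketch, assuming the file contains exactly one line of the form `C: <integer>` (possibly indented); `set_num_speakers` and `update_config_file` are hypothetical helpers. It edits the text with a regular expression rather than parsing YAML, so comments and formatting elsewhere in the file are preserved.

```python
import re
from pathlib import Path

# Matches a line such as "  C: 2" (spaces or tabs before the key); trailing comments are kept.
_C_LINE = re.compile(r"^([ \t]*C:[ \t]*)\d+", re.MULTILINE)


def set_num_speakers(config_text, num_speakers):
    """Return config_text with the integer value of the C key replaced."""
    new_text, count = _C_LINE.subn(rf"\g<1>{num_speakers}", config_text)
    if count != 1:
        raise ValueError(f"expected exactly one 'C:' line, found {count}")
    return new_text


def update_config_file(path, num_speakers):
    """Rewrite the C value in the YAML file at path, in place."""
    p = Path(path)
    p.write_text(set_num_speakers(p.read_text(), num_speakers))
```

For example: `update_config_file("conf/config.yaml", 7)`.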

python train.py

This automatically reads all configuration values from conf/config.yaml. For more details on training, refer to the original svoice repository.

Distributed Training

python train.py ddp=1

Evaluating

python -m svoice.evaluate <path to the model> <path to the folder containing mix.json and one target file s<ID>.json per separated speaker>
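A missing JSON file is an easy mistake here, so a wrapper can check the folder before launching evaluation. This is a sketch under two assumptions: speaker IDs run from 1 to the number of speakers (s1.json, s2.json, and so on), and the command is started with the current Python interpreter. `evaluate_command` is a hypothetical helper; only the `python -m svoice.evaluate <model> <folder>` form comes from this README.

```python
import sys
from pathlib import Path


def evaluate_command(model_path, data_dir, num_speakers):
    """Build the svoice.evaluate argument list after checking the JSON files exist."""
    data_dir = Path(data_dir)
    # One mixture file plus one target file per speaker: s1.json ... sN.json (assumed IDs).
    required = ["mix.json"] + [f"s{i}.json" for i in range(1, num_speakers + 1)]
    missing = [name for name in required if not (data_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {data_dir}: {', '.join(missing)}")
    # sys.executable runs the module with the interpreter of the active environment.
    return [sys.executable, "-m", "svoice.evaluate", str(model_path), str(data_dir)]
```

The returned list can be passed to `subprocess.run(..., check=True)`.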

Citation

The svoice code is borrowed from the original svoice repository. All rights to the code are reserved by META Research.

@inproceedings{nachmani2020voice,
  title={Voice Separation with an Unknown Number of Multiple Speakers},
  author={Nachmani, Eliya and Adi, Yossi and Wolf, Lior},
  booktitle={Proceedings of the 37th International Conference on Machine Learning},
  year={2020}
}

@misc{cosentino2020librimix,
  title={LibriMix: An Open-Source Dataset for Generalizable Speech Separation},
  author={Cosentino, Joris and Pariente, Manuel and Cornell, Samuele and Deleforge, Antoine and Vincent, Emmanuel},
  year={2020},
  eprint={2005.11262},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}

License

This repository is released under the CC-BY-NC-SA 4.0 license, as found in the LICENSE file.

The files svoice/models/sisnr_loss.py and svoice/data/preprocess.py were adapted from the kaituoxu/Conv-TasNet repository, an unofficial implementation of the paper Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation, released under the MIT License. Additionally, several input manipulation functions were borrowed and modified from the yluo42/TAC repository, released under the CC BY-NC-SA 3.0 License.
