This repository contains the framework for training speaker recognition models described in the papers 'In defence of metric learning for speaker recognition' and 'Pushing the limits of raw waveform speaker recognition'.
pip install -r requirements.txt
The following script can be used to download and prepare the VoxCeleb dataset for training.
python ./dataprep.py --save_path data --download --user USERNAME --password PASSWORD
python ./dataprep.py --save_path data --extract
python ./dataprep.py --save_path data --convert
In order to use data augmentation, also run:
python ./dataprep.py --save_path data --augment
In addition to the Python dependencies, wget and ffmpeg must be installed on the system.
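Before running the data preparation steps, it can help to confirm that both system tools are reachable. The following is a minimal sketch, not part of this repository: the helper name `missing_tools` is hypothetical, and it only checks that the executables are on PATH, not that their versions are suitable.

```python
import shutil


def missing_tools(tools=("wget", "ffmpeg")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system tools: " + ", ".join(missing))
    else:
        print("wget and ffmpeg are both available.")
```

If the script reports a missing tool, install it with your system package manager before running dataprep.py.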
- ResNetSE34L with AM-Softmax:
python ./trainSpeakerNet.py --config ./configs/ResNetSE34L_AM.yaml
- RawNet3 with AAM-Softmax:
python ./trainSpeakerNet.py --config ./configs/RawNet3_AAM.yaml
- ResNetSE34L with Angular prototypical:
python ./trainSpeakerNet.py --config ./configs/ResNetSE34L_AP.yaml
You can pass individual arguments that are defined in trainSpeakerNet.py by --{ARG_NAME} {VALUE}.
Note that values in the configuration file override arguments passed on the command line.
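The precedence rule can be illustrated with a small sketch. This is an assumption-laden illustration rather than the code in trainSpeakerNet.py: the parser, the argument defaults, and the helper `apply_config` are hypothetical, and a plain dict stands in for the parsed YAML file.

```python
import argparse


def build_parser():
    # Illustrative parser only; the defaults here are hypothetical.
    parser = argparse.ArgumentParser(description="Precedence demo")
    parser.add_argument("--config", type=str, default=None)
    parser.add_argument("--batch_size", type=int, default=200)
    parser.add_argument("--lr", type=float, default=0.001)
    return parser


def apply_config(args, config):
    """Overwrite parsed arguments with values from `config`.

    Keys that do not match a known argument are returned, not applied.
    """
    ignored = []
    for key, value in config.items():
        if hasattr(args, key):
            setattr(args, key, value)  # config value wins over the CLI value
        else:
            ignored.append(key)
    return ignored


parser = build_parser()
# Simulate: --batch_size 100 --lr 0.01 on the command line.
args = parser.parse_args(["--batch_size", "100", "--lr", "0.01"])
# Simulate the contents of a configuration file (stand-in for parsed YAML).
config = {"batch_size": 400, "unknown_key": 1}
ignored = apply_config(args, config)
print(args.batch_size, args.lr, ignored)  # 400 0.01 ['unknown_key']
```

In this sketch, `batch_size` ends up as 400 even though 100 was given on the command line, while `lr` keeps its command-line value because the config does not mention it.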
A pretrained model, described in [1], can be downloaded from here.
You can check that the following script returns: EER 2.1792. You will be given an option to save the scores.
python ./trainSpeakerNet.py --eval --model ResNetSE34L --log_input True --trainfunc angleproto --save_path exps/test --eval_frames 400 --initial_model baseline_lite_ap.model
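For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below is a simple threshold sweep in pure Python, written for illustration only; the function name `equal_error_rate` is hypothetical, the toy scores are made up, and the evaluation code in this repository may compute the value differently (for example, from an interpolated ROC curve). It returns a percentage, which is presumably how the figures quoted here are reported.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER (in percent) by sweeping over observed scores.

    scores: similarity scores, higher means more likely the same speaker.
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    targets = [s for s, l in zip(scores, labels) if l == 1]
    nontargets = [s for s, l in zip(scores, labels) if l == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer = None, None
    # Accept a trial when score >= threshold; inf rejects everything.
    for threshold in sorted(set(scores)) + [float("inf")]:
        far = sum(s >= threshold for s in nontargets) / len(nontargets)
        frr = sum(s < threshold for s in targets) / len(targets)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return 100.0 * eer


# Toy trial list: one target scores below one non-target.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(equal_error_rate(scores, labels))  # 25.0
```

With a threshold of 0.6, one of four non-targets is accepted and one of four targets is rejected, so both error rates are 25%.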
A larger model trained with online data augmentation, described in [2], can be downloaded from here.
The following script should return: EER 1.0180.
python ./trainSpeakerNet.py --eval --model ResNetSE34V2 --log_input True --encoder_type ASP --n_mels 64 --trainfunc softmaxproto --save_path exps/test --eval_frames 400 --initial_model baseline_v2_smproto.model
Pretrained RawNet3, described in [3], can be downloaded via git submodule update --init --recursive.
The following script should return: EER 0.8932.
python ./trainSpeakerNet.py --eval --config ./configs/RawNet3_AAM.yaml --initial_model models/weights/RawNet3/model.pt
Softmax (softmax)
AM-Softmax (amsoftmax)
AAM-Softmax (aamsoftmax)
GE2E (ge2e)
Prototypical (proto)
Triplet (triplet)
Angular Prototypical (angleproto)
ResNetSE34L (SAP, ASP)
ResNetSE34V2 (SAP, ASP)
VGGVox40 (SAP, TAP, MAX)
--augment True enables online data augmentation, described in [2].
You can add new models and loss functions to the models and loss directories, respectively. See the existing definitions for examples.
- Use the --mixedprec flag to enable mixed precision training. This is recommended for Tesla V10