NeXt-TDNN for Speaker Verification [ICASSP 2024]

This repository is the official implementation of "NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification", accepted at ICASSP 2024. Paper Link (arXiv) / Paper Link (IEEE)

News

🔥 December 2023: We have uploaded pre-trained NeXt-TDNN models to the experiments folder!

🔥 February 2024: The NeXt-TDNN model was updated with cyclical learning rate scheduling, which improved the EER on VoxCeleb1-O/E/H from 0.79/1.04/1.82% to 0.72/0.94/1.68%. The update changed the LR schedule, the gradient clipping value, and the batch size. See configs/NeXt_TDNN_C256_B3_K65_7_cyclical_lr_step.py for details.
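For readers unfamiliar with cyclical learning rates, here is a minimal sketch of the generic "triangular" policy, in which the LR ramps linearly up and down between a lower and an upper bound. It is an illustration only: triangular_lr is a hypothetical helper, and base_lr, max_lr, and step_size are made-up values. The schedule, clipping value, and batch size actually used in the update are defined in the config file named above.

```python
import math


def triangular_lr(step: int, base_lr: float, max_lr: float, step_size: int) -> float:
    """Triangular cyclical learning rate (illustrative sketch).

    The LR rises linearly from base_lr to max_lr over `step_size` steps,
    falls back to base_lr over the next `step_size` steps, and repeats.
    """
    cycle = math.floor(1 + step / (2 * step_size))
    # x is 1 at the start/end of each cycle and 0 at its peak
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# Hypothetical values, not the ones used by NeXt-TDNN
for step in (0, 500, 1000, 1500, 2000):
    print(step, triangular_lr(step, base_lr=1e-5, max_lr=1e-3, step_size=1000))
```

In PyTorch, torch.optim.lr_scheduler.CyclicLR with mode="triangular" implements the same policy; whether the repository uses that class is not stated here.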

0. Getting Started

Prerequisites

This code requires the following:

lightning == 2.1.2

Installation

CUDA and PyTorch installation

# CUDA
conda install -c "nvidia/label/cuda-11.8.0" cuda

# PyTorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
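After installation, a quick sanity check can confirm that the pinned Lightning version and PyTorch are visible to Python. The sketch below is a hypothetical helper (check_environment is not part of the repository); it relies on the standard-library importlib.metadata module and an optional torch import for the CUDA check.

```python
from importlib.metadata import PackageNotFoundError, version

# Expected versions from this README; torch has no pin because it is
# installed through conda above rather than at a fixed version.
REQUIREMENTS = {"lightning": "2.1.2", "torch": None}


def check_environment(requirements: dict) -> dict:
    """Return {package: (installed_version_or_None, ok)} for each requirement."""
    report = {}
    for name, expected in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        ok = installed is not None and (expected is None or installed == expected)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment(REQUIREMENTS).items():
        status = "OK" if ok else "MISMATCH/MISSING"
        print(f"{name}: installed={installed} -> {status}")
    try:
        import torch  # optional: only used to report CUDA availability
        print("CUDA available:", torch.cuda.is_available(), "| torch CUDA build:", torch.version.cuda)
    except ImportError:
        print("torch is not installed")
```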

Data preparation

VoxCeleb Dataset
- To download the VoxCeleb datasets for training and testing, run the commands described in the Data preparation section of the voxceleb_trainer repository.
- Download the VoxCeleb1-O, VoxCeleb1-E, and VoxCeleb1-H test lists and place them in the data directory.
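The VoxCeleb1-O/E/H protocols are trial lists in which each line holds a label (1 = same speaker, 0 = different speakers) followed by the enrollment and test utterance paths. The sketch below shows one way to read that format; parse_trial_lines is a hypothetical helper, the repository's own loader may differ, and the sample paths are made up in the VoxCeleb naming style.

```python
from typing import Iterable, List, NamedTuple


class Trial(NamedTuple):
    target: bool  # True if both utterances come from the same speaker
    enroll: str   # relative path of the enrollment utterance
    test: str     # relative path of the test utterance


def parse_trial_lines(lines: Iterable[str]) -> List[Trial]:
    """Parse trial lines of the form '<label> <enroll_path> <test_path>'."""
    trials = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 3 or parts[0] not in ("0", "1"):
            raise ValueError(f"unexpected trial line: {line!r}")
        trials.append(Trial(parts[0] == "1", parts[1], parts[2]))
    return trials


# Made-up sample lines, for illustration only
sample = [
    "1 id00001/videoA/00001.wav id00001/videoB/00002.wav",
    "0 id00001/videoA/00001.wav id00002/videoC/00003.wav",
]
trials = parse_trial_lines(sample)
print(len(trials), sum(t.target for t in trials))
```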

1. Model Training

To train an ASV (automatic speaker verification) model, run main.py in train mode. Select the desired training configuration with the --config argument.

To train NeXt-TDNN (C=256, B=3):

python main.py --mode train --config configs/NeXt_TDNN_C256_B3_K65_7
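The configuration names encode model hyperparameters. The sketch below splits a name such as NeXt_TDNN_C256_B3_K65_7 into fields, assuming C is the channel width and B the number of blocks (matching "C=256, B=3" above) and that the integers after K are kernel sizes. The role of the K values is an assumption here, so check the config file itself; parse_config_name is a hypothetical helper, not part of the repository.

```python
import os
import re


def parse_config_name(config_path: str) -> dict:
    """Split a config name like 'configs/NeXt_TDNN_C256_B3_K65_7' into fields.

    Assumptions (not confirmed by the README): the K values are kernel sizes.
    """
    name = os.path.basename(config_path)
    if name.endswith(".py"):
        name = name[:-3]
    fields = {"name": name}
    c = re.search(r"_C(\d+)", name)              # channel width, e.g. C256
    b = re.search(r"_B(\d+)", name)              # number of blocks, e.g. B3
    k = re.search(r"_K(\d+(?:_\d+)*)", name)     # assumed kernel sizes, e.g. K65_7
    if c:
        fields["channels"] = int(c.group(1))
    if b:
        fields["blocks"] = int(b.group(1))
    if k:
        fields["kernel_sizes"] = [int(v) for v in k.group(1).split("_")]
    return fields


print(parse_config_name("configs/NeXt_TDNN_C256_B3_K65_7"))
```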

2. Model Test

To evaluate on VoxCeleb1, run main.py in test mode (VoxCeleb1-O only) or test_all mode (VoxCeleb1-O, -E, and -H). As in training, select the desired configuration with the --config argument.

# VoxCeleb1-O
python main.py --mode test --config configs/NeXt_TDNN_C256_B3_K65_7

# ⚡ VoxCeleb1-O, VoxCeleb1-E, VoxCeleb1-H
python main.py --mode test_all --config configs/NeXt_TDNN_C256_B3_K65_7
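Results are reported as equal error rate (EER): the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). The dependency-free sketch below approximates EER by sweeping the decision threshold over the sorted trial scores. compute_eer is a hypothetical name, and this is not necessarily how the repository computes the metric.

```python
from typing import Sequence


def compute_eer(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the equal error rate by sweeping decision thresholds.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    A trial is accepted when its score is >= the threshold.
    """
    pairs = sorted(zip(scores, labels))
    n_target = sum(1 for _, y in pairs if y == 1)
    n_nontarget = len(pairs) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    # Threshold starts at the lowest score, so every trial is accepted.
    false_accepts = n_nontarget  # non-targets with score >= threshold
    false_rejects = 0            # targets with score < threshold
    best_gap, eer = float("inf"), 1.0
    i = 0
    while True:
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
        if i == len(pairs):
            break
        # Raise the threshold just past every trial sharing the current score.
        current = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                false_rejects += 1
            else:
                false_accepts -= 1
            i += 1
    return eer


print(f"EER = {100 * compute_eer([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1]):.2f}%")
```

Multiplying by 100 gives the percentage form used in the News section above.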

3. Reference

4. Citation

If you find our work useful, please cite:

@INPROCEEDINGS{10447037,
  author={Heo, Hyun-Jun and Shin, Ui-Hyeop and Lee, Ran and Cheon, YoungJu and Park, Hyung-Min},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification}, 
  year={2024},
  volume={},
  number={},
  pages={11186-11190},
  keywords={Convolution;Speech recognition;Transformers;Acoustics;Task analysis;Speech processing;speaker recognition;speaker verification;TDNN;ConvNeXt;multi-scale},
  doi={10.1109/ICASSP48485.2024.10447037}}
