dmlguq456/NeXt_TDNN_ASV

NeXt-TDNN for Speaker Verification [ICASSP 2024]

This repository is the official implementation of "NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification", accepted at ICASSP 2024. Paper Link (Arxiv) / Paper Link (IEEE)

News

🔥 December, 2023: We have uploaded the pre-trained models of our NeXt-TDNN in the experiments folder!

🔥 February, 2024: The NeXt-TDNN model was updated with cyclic learning rate scheduling. This update reduced the EER from 0.79/1.04/1.82% to 0.72/0.94/1.68% on VoxCeleb1-O/E/H. Changes were made to the LR scheduling, gradient clipping value, and batch size. Please check configs/NeXt_TDNN_C256_B3_K65_7_cyclical_lr_step.py for details.
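
For readers unfamiliar with cyclic learning rate scheduling, the following is a minimal sketch of a triangular cyclic schedule in plain Python. It is a generic illustration, not the schedule defined in the config file above: the function name `triangular_lr` and all numeric values are assumptions chosen for the example.

```python
def triangular_lr(step, base_lr, max_lr, step_size):
    """Triangular cyclic learning rate (illustrative, hypothetical helper).

    The rate rises linearly from base_lr to max_lr over `step_size` steps,
    falls back to base_lr over the next `step_size` steps, then repeats.
    """
    cycle_pos = step % (2 * step_size)      # position inside the current cycle
    distance = abs(cycle_pos - step_size)   # distance from the peak of the cycle
    scale = 1.0 - distance / step_size      # 0 at the cycle ends, 1 at the peak
    return base_lr + (max_lr - base_lr) * scale


# Example values only; the repository's real settings live in its config file.
schedule = [triangular_lr(s, base_lr=1e-4, max_lr=1e-3, step_size=10) for s in range(41)]
```

PyTorch provides a comparable scheduler as `torch.optim.lr_scheduler.CyclicLR`; whether this repository uses that class or its own implementation should be checked in the config and source.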

0. Getting Started

Prerequisites

This code requires the following:

  • lightning == 2.1.2
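
As a quick way to confirm the pinned version above, here is a small sketch using only the Python standard library. The helper name `has_exact_version` is hypothetical and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version


def has_exact_version(package, expected):
    """Return True only if `package` is installed at exactly `expected`."""
    try:
        return version(package) == expected
    except PackageNotFoundError:
        # The package is not installed in the current environment.
        return False


if __name__ == "__main__":
    print("lightning == 2.1.2:", has_exact_version("lightning", "2.1.2"))
```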

Installation

  • CUDA and PyTorch installation
# CUDA
conda install -c "nvidia/label/cuda-11.8.0" cuda

# PyTorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
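
After installing, it may be worth checking that PyTorch can see the GPU. The sketch below assumes nothing beyond a standard PyTorch install; the function name `describe_torch_install` is hypothetical, and it returns a small report instead of failing when PyTorch is missing.

```python
def describe_torch_install():
    """Summarize the local PyTorch/CUDA setup without raising if torch is absent."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,        # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),  # True if a usable GPU is visible
    }


if __name__ == "__main__":
    print(describe_torch_install())
```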

Data preparation

1. Model Training

To train an ASV model, run the main script in train mode. Select the desired training configuration with the --config argument.

  • To train NeXt-TDNN (C=256, B=3):
python main.py --mode train --config configs/NeXt_TDNN_C256_B3_K65_7
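
When scripting several runs, the command can be assembled programmatically. The sketch below uses only the `--mode` and `--config` arguments and the mode names shown in this README; the helper `build_command` is hypothetical and not part of the repository.

```python
import shlex
import sys

VALID_MODES = ("train", "test", "test_all")  # modes documented in this README


def build_command(mode, config, python=sys.executable):
    """Build the argument list for one main.py invocation."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return [python, "main.py", "--mode", mode, "--config", config]


cmd = build_command("train", "configs/NeXt_TDNN_C256_B3_K65_7", python="python")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.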

2. Model Test

To test on VoxCeleb1, run the script below. As in training, select the desired test configuration.

# VoxCeleb1-O
python main.py --mode test --config configs/NeXt_TDNN_C256_B3_K65_7

# ⚡ VoxCeleb1-O, VoxCeleb1-E, VoxCeleb1-H
python main.py --mode test_all --config configs/NeXt_TDNN_C256_B3_K65_7
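
The results above are reported as equal error rate (EER), the operating point where the false acceptance rate equals the false rejection rate. The following is a minimal pure-Python sketch of that metric, assuming each trial has a similarity score and a 0/1 label (1 = same speaker). It is a generic illustration rather than the evaluation code used by this repository, and the function name `compute_eer` is hypothetical.

```python
def compute_eer(scores, labels):
    """Approximate the EER from similarity scores and 0/1 labels (1 = same speaker)."""
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    best_gap, eer = float("inf"), None
    # Try every observed score as a threshold (accept when score >= threshold).
    for threshold in sorted(set(scores)) + [float("inf")]:
        false_accepts = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
        false_rejects = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

This version is quadratic in the number of trials, so it is suited to small examples only; large trial lists such as VoxCeleb1-E/H call for a sorted, single-pass implementation.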

3. Reference

4. Citation

If you find our work useful, please cite:

@INPROCEEDINGS{10447037,
  author={Heo, Hyun-Jun and Shin, Ui-Hyeop and Lee, Ran and Cheon, YoungJu and Park, Hyung-Min},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification}, 
  year={2024},
  volume={},
  number={},
  pages={11186-11190},
  keywords={Convolution;Speech recognition;Transformers;Acoustics;Task analysis;Speech processing;speaker recognition;speaker verification;TDNN;ConvNeXt;multi-scale},
  doi={10.1109/ICASSP48485.2024.10447037}}
