Skip to content

EIHW/EmoNet

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

EmoNet

EmoNet is a Python toolkit for multi-corpus speech emotion recognition and other audio classification tasks.

(c) 2021 Maurice Gerczuk, Shahin Amiriparian, Björn Schuller: Universität Augsburg

Please direct any questions or requests to Maurice Gerczuk (maurice.gerczuk at uni-a.de) or Shahin Amiriparian (shahin.amiriparian at uni-a.de).

Citing

If you use EmoNet or any code from EmoNet in your research work, you are kindly asked to acknowledge the use of EmoNet in your publications.

M. Gerczuk, S. Amiriparian, S. Ottl, and B. Schuller, “EmoNet: A transfer learning framework for multi-corpus speech emotionrecognition,” 2021. https://arxiv.org/abs/2103.08310

@misc{gerczuk2021emonet,
      title={EmoNet: A Transfer Learning Framework for Multi-Corpus Speech Emotion Recognition}, 
      author={Maurice Gerczuk and Shahin Amiriparian and Sandra Ottl and Björn Schuller},
      year={2021},
      eprint={2103.08310},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

Installation

All dependencies can be installed via pip from the requirements.txt:

pip install -r requirements.txt

It is advisable to do this from within a newly created virtual environment.

Usage

The basic commandline is accessible from the repository's basedirectory by calling:

python -m emo-net.cli --help

This prints a help message specifying the list of subcommands. For each subcommand, more help is available via:

python -m emo-net.cli [subcommand] --help

Data Preparation

The toolkit can be used for arbitrary audio classification tasks. To prepare your dataset, resample all audio content to 16kHz wav files (e.g. with ffmpeg). Afterwards, you need label files in .csv format that specify the categorical target for each sample in the training, development and test partitions, i.e., three files "train.csv", "devel.csv" and "test.csv". The files must include the path to each audio file in the first column - relative to a common basedirectory - and a categorical label in the second column. A header line "file,label" should be included.

Command line options

The CLI has a nested structure, i.e., it uses two layers of subcommands. The first subcommand specifies the type of neural network architecture that is used. Here, "cnn" gives access to the ResNet architecture which also includes residual adapters, based on the training setting. Two other options, "rnn" and "fusion" are also included but untested and in early stages of development. The rest of this guide will therefore focus on the "cnn" subcommand. After specifying the model type, two distinct subcommands are accessible: "single-task" and "multi-task", which refer to the type of training procedure. For single task, training is performed on one database at a time specified by its basedirectory and the label files for train, validation and developments:

python -m emo-net.cli -v cnn single-task -t [taskName] --data-path /path/to/task/wavs -tr train.csv -v devel.csv -te test.csv

One additional parameter is needed that defines the type of training performed. Here, the choice can be made between tuning a fresh model from scratch (-m scratch), fully finetuning an existing model (-m finetune), training only the classifier head (-m last-layer) and the residual adapter approach (-m adapters). For the last three methods, a pre-trained model has to be loaded by specifying the path to its weights via -im /path/to/weights.h5. While all other parameters have sensible default values, the full list is given below:

Option Type Description
-dp, --data-path DIRECTORY Directory of data files. [required]
-t, --task TEXT Name of the task that is trained. [required]
-tr, --train-csv FILE Path to training csv file. [required]
-v, --val-csv FILE Path to validation csv file. [required]
-te, --test-csv FILE Path to test csv file. [required]
-bs, --batch-size INTEGER Define batch size.
-nm, --num-mels INTEGER Number of mel bands in spectrogram.
-e, --epochs INTEGER Define max number of training epochs.
-p, --patience INTEGER Define patience before early stopping / reducing learning rate in epochs.
-im, --initial-model FILE Initial model for resuming training.
-bw, --balanced-weights FLAG Automatically set balanced class weights.
-lr, --learning-rate FLOAT Initial earning rate for optimizer.
-do, --dropout FLOAT Dropout for the two positions (after first and second convolution of each block).
-ebp, --experiment-base-path PATH Basepath where logs and checkpoints should be stored.
-o, --optimizer [sgd|rmsprop|adam|adadelta] Optimizer used for training.
-N, --number-of-resnet-blocks INTEGER Number of convolutional blocks in the ResNet layers.
-nf, --number-of-filters INTEGER Number of filters in first convolutional block.
-wf, --widen-factor INTEGER Widen factor of wide ResNet
-c, --classifier [avgpool|FCNAttention] The classification top of the network architeture. Choose between simple pooling + dense layer (needs fixed window size) and fully convolutional attention.
-w, --window FLOAT Window size in seconds.
-l, --loss [crossentropy|focal|ordinal] Classification loss. Ordinal loss ues sorted class labels.
-m, --mode [scratch|adapters|last-layer|finetune] Type of training to be performed.
-sfl, --share-feature-layer FLAG Share the feature layer (weighted attention of deep features) between tasks.
-iwd, --individual-weight-decay FLAG Set weight decay in adapters according to size of training dataset. Smaller datasets will have larger weight decay to keep closer to the pre-trained network.
--help FLAG Show this message and exit.

The "multi-task" command line slightly differs from the one described above. The most notable difference is in how the data is passed. Instead of passing individual .csv files for each partition, a directory - "--multi-task-setup" - which contains a folder with "train.csv", "val.csv" and "test.csv" files for each database has to be specified. Additionally, "-t" now is used to specify a list of databases (subfolders of the multi task setup) that should be used for training. As multi-domain training is done in a round-robin fashion, there is no predefined notion of a training epoch. Therefore, an additional option ("--steps-per-epoch") is used to define the size of an artificial training epoch. These additional parameters are also given in the table below.

Option Type Description
-dp, --data-path DIRECTORY Directory of wav files. [required]
-mts, --multi-task-setup DIRECTORY Directory with the setup csvs ("train.csv", "val.csv", "test.csv") for each task in a separate folder. [required]
-t, --tasks TEXT Names of the tasks that are trained. [required]
-spe, --steps-per-epoch INTEGER Number of training steps for each artificial epoch.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages