
Sound event detection with depthwise separable and dilated convolutions


Welcome to the repository of the DnD-SED method.

This is the repository for the method presented in the paper "Sound Event Detection with Depthwise Separable and Dilated Convolutions", by K. Drossos, S. I. Mimilakis, S. Gharib, Y. Li, and T. Virtanen.

Our code is based on the PyTorch framework, and we use the publicly available TUT-SED Synthetic 2016 dataset.

Our paper has been submitted for review to the IEEE World Congress on Computational Intelligence/International Joint Conference on Neural Networks (WCCI/IJCNN).

You can find an online version of our paper at arXiv: url to be announced

If you use our method, please cite our paper.


Table of Contents

  1. Method introduction
  2. System set-up
  3. Conducting the experiments

Method introduction

Methods for sound event detection (SED) are usually based on a composition of three functions: a feature extractor, an identifier of long temporal context, and a classifier. State-of-the-art SED methods use typical 2D convolutional neural networks (CNNs) as the feature extractor and an RNN for identifying long temporal context (a simple affine transform with a non-linearity is used as the classifier). This set-up can yield a considerable number of parameters, up to a couple of million (e.g. 4M). Additionally, the use of an RNN impedes the training process and the parallelization of the method.
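
As a point of reference, below is a minimal PyTorch sketch of such a baseline composition (CNN feature extractor, RNN for temporal context, affine classifier). It is not the model from the paper or from this repository; the channel counts, pooling factors, RNN size, and number of classes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class CRNNBaseline(nn.Module):
    """Illustrative CRNN-style baseline: 2D CNN -> RNN -> affine classifier."""

    def __init__(self, n_mels: int = 40, n_classes: int = 16):
        super().__init__()
        # Feature extractor: typical 2D convolutions over (time, mel) maps.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 5)),  # pool only the frequency axis
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 4)),
        )
        # Identifier of long temporal context: recurrent layer over time.
        rnn_in = 64 * (n_mels // 5 // 4)
        self.rnn = nn.GRU(rnn_in, 128, batch_first=True, bidirectional=True)
        # Classifier: affine transform with a non-linearity, per time frame.
        self.classifier = nn.Linear(2 * 128, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, mel) input features
        h = self.cnn(x.unsqueeze(1))              # (batch, ch, time, mel')
        h = h.permute(0, 2, 1, 3).flatten(2)      # (batch, time, ch * mel')
        h, _ = self.rnn(h)                        # (batch, time, 256)
        return torch.sigmoid(self.classifier(h))  # per-frame class activities
```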

With our DnD-SED method, we propose replacing the typical 2D CNNs used as the feature extractor with depthwise separable convolutions, and replacing the RNN with dilated convolutions. We compare our method with the widely used CRNN method on the publicly available TUT-SED Synthetic 2016 dataset. We conduct a series of 10 experiments and report the mean time needed for one training epoch, the mean F1 score, the mean error rate, and the number of parameters.
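
The two replacements can be sketched in PyTorch as follows. This is an illustrative sketch only, not the blocks used in this repository; the channel counts, kernel sizes, and dilation rates are assumptions.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv2d(nn.Module):
    """Depthwise convolution followed by a 1x1 pointwise convolution."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        # Depthwise: one filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(
            in_ch, in_ch, kernel_size, padding=kernel_size // 2, groups=in_ch
        )
        # Pointwise: 1x1 convolution that mixes channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))


class DilatedContextBlock(nn.Module):
    """Dilated 2D convolution used in place of an RNN for temporal context."""

    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        # Dilation along the time axis widens the receptive field without
        # adding parameters.
        self.conv = nn.Conv2d(
            channels, channels, kernel_size=3,
            padding=(dilation, 1), dilation=(dilation, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.conv(x))
```

As an example of the effect on model size: with 64 input and 64 output channels and a 3x3 kernel, the separable block above has roughly 4.8k parameters, versus roughly 36.9k for a standard 2D convolution of the same shape.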

We achieve a considerable decrease in computational complexity together with an increase in SED performance. Specifically, we reduce the number of parameters by 85% and the mean time needed for one training epoch by 72%, while increasing the mean F1 score by 4.6% and reducing the mean error rate by 3.8%.
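
For reference, the two complexity measures reported above (number of parameters and time per training epoch) can be obtained with helpers like the following. These are hypothetical utilities for illustration, not part of this repository.

```python
import time

import torch


def count_parameters(model: torch.nn.Module) -> int:
    """Number of trainable parameters of a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def time_one_epoch(model, loader, optimizer, criterion, device="cpu") -> float:
    """Wall-clock seconds for a single pass over the training data."""
    model.train()
    start = time.perf_counter()
    for features, targets in loader:
        features, targets = features.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(features), targets)
        loss.backward()
        optimizer.step()
    return time.perf_counter() - start
```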

You can find more information in our paper!


System set-up

To run and use our method (or simply repeat the experiments), you need to set up the code and use the specific dataset. We provide the full code for the method, but you will have to obtain the audio files and extract the features yourself.

Code set-up

To set up the code, you will need a Python environment with PyTorch installed, since our code is based on the PyTorch framework.

Data set-up
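
The original data set-up instructions are not reproduced here. As a rough illustration only, assuming log mel-band energies are used as input features (a common choice for SED), feature extraction could look like the sketch below; the feature settings, paths, and file names are assumptions, not the repository's actual pipeline.

```python
import numpy as np
import librosa


def extract_log_mel(audio_path: str, sr: int = 44100, n_mels: int = 40,
                    n_fft: int = 2048, hop_length: int = 1024) -> np.ndarray:
    """Return a (time, n_mels) array of log mel-band energies."""
    audio, sr = librosa.load(audio_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=audio, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    return np.log(mel + np.finfo(np.float32).eps).T


# Example usage (hypothetical file names):
# features = extract_log_mel("tut_sed_synthetic_2016/audio/mix_0001.wav")
# np.save("features/mix_0001.npy", features)
```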


Conducting the experiments