FastMFCC

A comparison of two feature extraction pipelines for Keyword Spotting, referred to as MFCC_slow and MFCC_fast, both tested on a Raspberry Pi 4.

The whole preprocessing pipeline for this Keyword Spotting task consists of a Short Time Fourier Transform (STFT) followed by the extraction of Mel-frequency cepstral coefficients (MFCCs).

The objective is to define a preprocessing routine, MFCC_fast, that returns a tensor of a given shape while minimizing execution time and maximizing the signal-to-noise ratio (SNR).

The developed routine reduces the execution time by 32%, from 25 ms to 17 ms, with an SNR of 22.37 dB. A more detailed analysis is available at https://github.com/ScorcaF/FastMFCC/blob/21f000e0aa346a0eded66ec63c7705f6f4002c18/Group14_Homework1.pdf
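The README does not state how the SNR is computed. The sketch below assumes a common definition in which the output of MFCC_slow is the reference signal and its difference from the output of MFCC_fast is the noise. The function names `snr_db` and `relative_reduction` are illustrative and do not come from the repository.

```python
import math


def snr_db(reference, approximation, eps=1e-6):
    """SNR in dB between two flattened feature tensors of equal length.

    Assumed definition: 20 * log10(||reference|| / (||reference - approximation|| + eps)).
    """
    signal_norm = math.sqrt(sum(r * r for r in reference))
    noise_norm = math.sqrt(
        sum((r - a) ** 2 for r, a in zip(reference, approximation))
    )
    return 20.0 * math.log10(signal_norm / (noise_norm + eps))


def relative_reduction(t_slow_ms, t_fast_ms):
    """Fraction of execution time saved by the fast routine."""
    return (t_slow_ms - t_fast_ms) / t_slow_ms


# Timing figures reported above: 25 ms down to 17 ms.
print(f"time reduction: {relative_reduction(25.0, 17.0):.0%}")
```

With the reported timings, `relative_reduction(25.0, 17.0)` evaluates to 0.32, which matches the 32% figure quoted above.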

Background

Always-on speech recognition is not energy efficient, as it requires transmitting a continuous audio stream to the cloud, where the data is processed. To mitigate this concern, devices first detect short keywords such as “Hey Siri” or “Ok Google” that wake up the device and trigger the full-scale speech recognition. This task, called Keyword Spotting, is much simpler and can therefore be performed on board the sensing nodes with lightweight Convolutional Neural Networks.

Before feeding the data to a Convolutional Neural Network, a set of preprocessing steps is required. The most common strategy is to move from the time domain to the frequency domain using the Short Time Fourier Transform (STFT). This transformation converts a one-dimensional time-series signal into a two-dimensional image, which allows keyword spotting to be solved as an image classification problem.
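To illustrate how the STFT turns a one-dimensional signal into a two-dimensional array, here is a minimal pure-Python sketch. It is not the repository's implementation. The window type (Hann), the naive DFT, and the names `hann` and `stft_magnitude` are assumptions chosen for readability rather than speed.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_length, frame_step):
    """Return a list of frames, each holding frame_length // 2 + 1 magnitudes.

    Rows index time (one per frame), columns index frequency bins.
    """
    window = hann(frame_length)
    n_bins = frame_length // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_length + 1, frame_step):
        # Cut one frame out of the signal and taper it with the window.
        frame = [signal[start + i] * window[i] for i in range(frame_length)]
        # Naive DFT over the non-negative frequencies only.
        spectrum = []
        for k in range(n_bins):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_length)
                for n in range(frame_length)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# A sine wave that completes exactly 4 cycles per 32-sample frame.
tone = [math.sin(2.0 * math.pi * 4 * n / 32) for n in range(128)]
spectrogram = stft_magnitude(tone, frame_length=32, frame_step=16)
print(len(spectrogram), "frames x", len(spectrogram[0]), "bins")
```

A shorter frame step produces more frames (a wider image) and costs more computation; a shorter frame length costs less per frame and lowers the frequency resolution.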

Another common feature extraction step relies on the hypothesis that representing sounds as they are perceived by the human ear improves classification accuracy. This can be achieved by extracting the Mel-frequency cepstral coefficients (MFCCs) from the input signal. The Mel-frequency cepstrum is a representation of the STFT of a sound that tries to mimic how the membrane in the human ear senses sound vibrations. The MFCCs are the coefficients that compose the Mel-frequency cepstrum.
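A textbook way to obtain MFCCs from one STFT magnitude frame is to apply a bank of triangular filters spaced evenly on the mel scale, take the logarithm, and then apply a discrete cosine transform. The sketch below follows that recipe in pure Python. The mel formula, the unnormalized DCT-II, the parameter values, and all function names are assumptions for illustration; they are not taken from the repository.

```python
import math


def hz_to_mel(f):
    """Convert Hz to mel (a widely used formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_bins, sample_rate, f_min, f_max):
    """Triangular filters with mel-spaced centers, as an n_mels x n_bins matrix."""
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    edges_hz = [
        mel_to_hz(mel_lo + i * (mel_hi - mel_lo) / (n_mels + 1))
        for i in range(n_mels + 2)
    ]
    # Center frequency of each STFT bin, from 0 Hz up to the Nyquist frequency.
    bin_hz = [k * (sample_rate / 2.0) / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, center, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (center - lo)
            falling = (hi - f) / (hi - center)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def mfcc(spectrum, bank, n_coeffs, eps=1e-6):
    """MFCCs of one magnitude frame: mel filtering, log, unnormalized DCT-II."""
    log_mel = [
        math.log(sum(w * s for w, s in zip(row, spectrum)) + eps) for row in bank
    ]
    n = len(log_mel)
    return [
        sum(log_mel[j] * math.cos(math.pi * c * (j + 0.5) / n) for j in range(n))
        for c in range(n_coeffs)
    ]


# Hypothetical settings: 129 STFT bins at 16 kHz, 10 mel filters, 5 coefficients.
bank = mel_filterbank(n_mels=10, n_bins=129, sample_rate=16000, f_min=20.0, f_max=4000.0)
coeffs = mfcc([1.0] * 129, bank, n_coeffs=5)
print(len(coeffs), "coefficients per frame")
```

Since the filterbank depends only on the configuration, it can be computed once and reused for every frame, which is one of the simplest ways to keep per-frame cost low.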
