Classification of digits based on their Audio Inputs.

Ankit152/DigitAudio

Digit Audio Classification 🔢

A deep learning model that classifies spoken digits based on their audio input.

Audio 🔊

Audio is a term used to describe any sound or noise in the range the human ear is capable of hearing, with its frequency measured in hertz. On a computer, the audio signal is generated by a sound card and heard through speakers or headphones.

Any digital information containing speech or music that is stored on and played through a computer is known as an audio file or sound file. One of the most common audio file formats in use today is MP3.

An audio signal is a representation of sound, typically using either a changing level of electrical voltage for analog signals, or a series of binary numbers for digital signals.
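To make the "series of binary numbers" idea concrete, here is a minimal sketch that samples a pure sine tone into 16-bit integers using only the standard library. The function name `sample_sine`, the 8000 Hz sample rate, and the 440 Hz tone are illustrative assumptions, not values taken from this project.

```python
import math


def sample_sine(freq_hz, duration_s, sample_rate=8000, amplitude=0.5):
    """Return a list of signed 16-bit integer samples of a sine tone.

    sample_rate and amplitude are hypothetical defaults chosen for illustration.
    """
    n_samples = round(duration_s * sample_rate)
    peak = 32767  # largest value a signed 16-bit sample can hold
    return [
        round(amplitude * peak * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(n_samples)
    ]


# 10 ms of a 440 Hz tone: each list entry is one digital sample.
samples = sample_sine(freq_hz=440, duration_s=0.01)
print(len(samples))
print(samples[:5])
```

Each integer is the signal's amplitude at one instant. A real recording of a spoken digit is the same kind of sequence, just with a far less regular shape.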

Waveplots 〰

To plot all these waveplots, I came across an exciting Python library, librosa. librosa is a Python package for music and audio analysis. It provides the building blocks necessary to create music information retrieval systems.

These are the waveplots of the digits as we pronounce them. Please note that we are pronouncing one, or any other digit, the way it is normally spoken, not like oooooaannee or something like that. You can also try it once. It's funny, though. 😆😆😆😆
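A plot like the ones in this section could be produced roughly as follows. This is a sketch, not the exact code used here: the helper `plot_digit_waveform` and the `DIGIT_NAMES` list are hypothetical names, and it assumes a recent librosa release where the plotting function is `librosa.display.waveshow` (older releases called it `waveplot`).

```python
DIGIT_NAMES = [
    "Zero", "One", "Two", "Three", "Four",
    "Five", "Six", "Seven", "Eight", "Nine",
]


def plot_digit_waveform(wav_path, digit, ax=None):
    """Draw the waveform of one recording, titled with the digit's name."""
    # Imported inside the function so the module loads without the plotting stack.
    import librosa
    import librosa.display
    import matplotlib.pyplot as plt

    # sr=None keeps the file's native sample rate instead of resampling.
    y, sr = librosa.load(wav_path, sr=None)
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 3))
    librosa.display.waveshow(y, sr=sr, ax=ax)
    ax.set_title(DIGIT_NAMES[digit])
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Amplitude")
    return ax
```

Calling `plot_digit_waveform(path, 7)` with the path of a recording of "seven", followed by `matplotlib.pyplot.show()`, would display one such plot.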

Zero 0️⃣

One 1️⃣

Two 2️⃣

Three 3️⃣

Four 4️⃣

Five 5️⃣

Six 6️⃣

Seven 7️⃣

Eight 8️⃣

Nine 9️⃣

Countplot 📊

This is the countplot of the audio data. The X-axis represents the digits 0-9 and the Y-axis represents the number of recordings of each digit.

We can note that the dataset is balanced, i.e. every digit has the same number of recordings.
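The balance claim can also be checked numerically instead of by eye. The sketch below counts labels with the standard library. The helpers `class_counts` and `is_balanced` and the tiny `labels` list are hypothetical stand-ins for the real label column. A chart like the one described is commonly drawn with seaborn's `countplot`, although the plotting code is not shown here.

```python
from collections import Counter


def class_counts(labels):
    """Return a dict mapping each digit 0-9 to how often it appears."""
    counts = Counter(labels)
    return {digit: counts.get(digit, 0) for digit in range(10)}


def is_balanced(counts):
    """True when every class has exactly the same number of samples."""
    return len(set(counts.values())) == 1


# Hypothetical labels: three recordings of every digit.
labels = [digit for digit in range(10) for _ in range(3)]
counts = class_counts(labels)
print(counts)
print(is_balanced(counts))
```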

Model Architecture 🌟

  • Layer 0: Input layer of 40 dimensions.
  • Layer 1: Dense layer of 128 neurons followed by ReLU Activation and Dropout rate of 0.25.
  • Layer 2: Dense layer of 256 neurons followed by ReLU Activation and Dropout rate of 0.25.
  • Layer 3: Dense layer of 128 neurons followed by ReLU Activation and Dropout rate of 0.25.
  • Layer 4: The softmax layer for the classification of 10 different digits.
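The layer list above could be written in Keras roughly as follows. This is a sketch under assumptions: the text does not name the framework, optimizer, or loss, so `tensorflow.keras`, `adam`, and `sparse_categorical_crossentropy` are guesses, and `build_model` and `expected_param_count` are hypothetical helper names. The parameter-count helper is plain arithmetic and gives a quick sanity check on the layer sizes.

```python
LAYER_SIZES = [40, 128, 256, 128, 10]  # input, three hidden layers, output
DROPOUT_RATE = 0.25


def expected_param_count(sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def build_model():
    """Build the dense network described above (framework choice is assumed)."""
    # Imported lazily so the helper above works without TensorFlow installed.
    from tensorflow import keras

    layers = [keras.Input(shape=(LAYER_SIZES[0],))]
    for units in LAYER_SIZES[1:-1]:
        layers.append(keras.layers.Dense(units, activation="relu"))
        layers.append(keras.layers.Dropout(DROPOUT_RATE))
    layers.append(keras.layers.Dense(LAYER_SIZES[-1], activation="softmax"))

    model = keras.Sequential(layers)
    # Optimizer and loss are assumptions; integer labels 0-9 are assumed.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


print(expected_param_count(LAYER_SIZES))  # 72458
```

Dropout layers add no trainable parameters, so `build_model().summary()` should report the same total if the layers match this description.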

Below is the pictorial representation of the model architecture.

Model Training ⚔️

The model was trained over 200 epochs. Here are the accuracy and loss curves.
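A training run of this kind, and the curves it produces, might look like the sketch below. Only the 200 epochs come from the text. The `validation_split` of 0.2, the helper names `train` and `plot_curves`, and the use of a Keras-style `fit` method are assumptions.

```python
def train(model, features, labels, epochs=200, validation_split=0.2):
    """Fit a Keras-style model and return its per-epoch metrics as a dict."""
    history = model.fit(
        features,
        labels,
        epochs=epochs,
        validation_split=validation_split,  # assumed hold-out fraction
        verbose=0,
    )
    return history.history


def plot_curves(history, metric):
    """Plot a training metric, and its validation counterpart if present."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    _, ax = plt.subplots()
    ax.plot(history[metric], label="train")
    val_key = f"val_{metric}"
    if val_key in history:
        ax.plot(history[val_key], label="validation")
    ax.set_xlabel("Epoch")
    ax.set_ylabel(metric.capitalize())
    ax.set_title(f"{metric.capitalize()} vs Epochs")
    ax.legend()
    return ax
```

With a compiled model, `plot_curves(train(model, X, y), "accuracy")` and the same call with `"loss"` would give the two kinds of curves shown in this section.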

Accuracy vs Epochs 📈

Loss vs Epochs 📉

From the above figures, we can say that the model is not overfitting.
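The visual judgment can be backed by a simple number: the gap between validation and training loss over the last few epochs. The sketch below is one possible heuristic, not the check used here. The function names, the 10-epoch window, and the 0.1 tolerance are all hypothetical.

```python
def overfitting_gap(train_loss, val_loss, window=10):
    """Mean validation loss minus mean training loss over the last epochs."""
    k = min(window, len(train_loss), len(val_loss))
    if k == 0:
        raise ValueError("need at least one epoch of losses")
    return sum(val_loss[-k:]) / k - sum(train_loss[-k:]) / k


def looks_overfit(train_loss, val_loss, window=10, tolerance=0.1):
    """Flag a run when validation loss sits well above training loss."""
    return overfitting_gap(train_loss, val_loss, window) > tolerance


# Hypothetical curves: validation loss tracks training loss closely.
healthy = looks_overfit([1.0, 0.5, 0.2, 0.1], [1.0, 0.6, 0.25, 0.12], window=2)
# Hypothetical curves: validation loss climbs while training loss keeps falling.
diverging = looks_overfit([1.0, 0.5, 0.2, 0.1], [1.0, 0.6, 0.7, 0.9], window=2)
print(healthy, diverging)
```

A small gap, with both curves still falling or flat, is consistent with the conclusion above. A validation loss that rises while the training loss keeps dropping would point the other way.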

Training Record 📓