- About the Project
- Getting Started
- Results and Demo
- Future Work
- Contributors
- Acknowledgements and Resources
- License
- The naturally induced (non-environmental) noise in the recording needs to be removed, which is done by denoising the signal. Refer to this documentation and also this Blog on AI noise reduction.
- The Librosa library is used for audio manipulation.
- scipy is used for reading and writing the audio signals.
- Matplotlib is used to visualize the signal and the intermediate data.
- NumPy handles the mathematical operations, and the wave module is used for operating on the WAV file.
Noise Reduction
├───docs                          ## Documents and Images
│   └───Input Audio file
├───Project Details
│   ├───Research papers
│   ├───Linear Algebra
│   ├───Neural networks & Deep Learning
│   ├───Project Documentation
│   ├───AI Noise Reduction Blog
│   ├───AI Noise Reduction Report
│   └───Code Implementation
│       ├───AI Noise Reduction.py
│       ├───audio.wav
│       └───Resources
- Tested on Windows
git clone https://github.com/Dhriti03/Noise-Reduction.git
cd Noise-Reduction
In your notebook install the required libraries:
pip install librosa
pip install scipy
pip install matplotlib
(The wave module ships with the Python standard library, so it does not need a separate pip install.)
*This is the original Audio File*
*After addition of the noise*
*The final audio signal after removing noise*
*Flowchart for the project*
- By manipulating the code according to your requirements, you can use it to process most audio signals.

## Theory
- An FFT is calculated over the noise audio clip
- Statistics are calculated over the FFT of the noise (in frequency)
- A threshold is calculated based upon the statistics of the noise (and the desired sensitivity of the algorithm)
- A mask is determined by comparing the signal FFT to the threshold
- The mask is smoothed with a filter over frequency and time
- The mask is applied to the FFT of the signal, and the result is inverted back to a time-domain signal (see the sketch below)
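As a compact sketch of these steps (a hypothetical helper, not the actual implementation that is walked through below), the mask computation looks roughly like this:

```python
import numpy as np
import scipy.signal

def spectral_gate_sketch(sig_fft_db, noise_fft_db, n_std=1.5):
    # per-frequency statistics of the noise spectrogram (dB)
    noise_mean = noise_fft_db.mean(axis=1)
    noise_std = noise_fft_db.std(axis=1)
    # threshold for each frequency bin
    thresh = noise_mean + n_std * noise_std
    # boolean mask: True where the signal is below the noise threshold
    mask = sig_fft_db < thresh[:, None]
    # smooth the mask over frequency and time with a small averaging kernel
    kernel = np.ones((3, 5)) / 15.0
    return scipy.signal.fftconvolve(mask.astype(float), kernel, mode="same")
```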
import IPython
from scipy.io import wavfile
import scipy.signal
import numpy as np
import matplotlib.pyplot as plt
import librosa
import wave
%matplotlib inline
- Here we import the libraries. IPython is used to create a comprehensive environment for interactive and exploratory computing.
- From the scipy.io library we use the wavfile module, which reads and writes the WAV data.
- NumPy contains multi-dimensional array and matrix data structures. It can be used to perform a number of mathematical operations on arrays, such as trigonometric, statistical, and algebraic routines, which makes it a very useful library.
- The matplotlib.pyplot library helps to understand large amounts of data through different visualisations.
- Librosa is used when we work with audio data, as in music generation (using LSTMs) and Automatic Speech Recognition. It provides the building blocks necessary to create music information retrieval systems.
- %matplotlib inline enables inline plotting, where the plots/graphs are displayed just below the cell in which the plotting commands are written. It provides interactivity with the backend in frontends like the Jupyter notebook.
wav_loc = r'/home/Noise_Reduction/Downloads/wave/file.wav'
rate, data = wavfile.read(wav_loc,mmap=False)
Here we take the WAV file path and read the file with the wavfile module from the scipy.io library. Its parameters are filename (a string or open file handle for the input WAV file) and mmap (bool, optional: whether to read the data as memory-mapped, default False).
def fftnoise(f):
f = np.array(f, dtype="complex")
Np = (len(f) - 1) // 2
phases = np.random.rand(Np) * 2 * np.pi
phases = np.cos(phases) + 1j * np.sin(phases)
f[1 : Np + 1] *= phases
f[-1 : -1 - Np : -1] = np.conj(f[1 : Np + 1])
return np.fft.ifft(f).real
- Here we first define the fftnoise function. In brief, a fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). Fourier analysis converts a signal from its original domain (often time or space) to a representation in the frequency domain and vice versa. The DFT is obtained by decomposing a sequence of values into components of different frequencies.
- The function takes an array of complex dtype, multiplies the positive-frequency bins by random phases, mirrors their conjugates onto the negative frequencies, and finally returns the real part of the inverse FFT. The frequencies between the minimum and maximum frequency are set to 1 and the rest are neglected (that selection happens in band_limited_noise below).
- Giving the file location
- Reading the wav file
- Proper (symmetrical) audio spans -32767 to +32767; a value of 32768 means the audio clipped at that point
- The WAV file stores 16-bit integers with range [-32768, 32767], so dividing by 32768 (2^15) maps the two's-complement values into the range [-1, 1], as in the sketch below
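For example (an illustrative step with a hypothetical variable name, not shown in the notebook code), the samples read above can be scaled like this:

```python
import numpy as np

# data comes from wavfile.read above: int16 samples span [-32768, 32767],
# so dividing by 2**15 gives floats in the symmetric range [-1.0, 1.0)
data_norm = data.astype(np.float32) / 32768.0
```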
def band_limited_noise(min_freq, max_freq, samples=1024, samplerate=1):
freqs = np.abs(np.fft.fftfreq(samples, 1 / samplerate))
f = np.zeros(samples)
f[np.logical_and(freqs >= min_freq, freqs <= max_freq)] = 1
return fftnoise(f)
- A function or time series whose Fourier transform is restricted to a finite range of frequencies or wavelengths.
- The frequency bins are obtained with np.fft.fftfreq, and only those between the minimum and maximum frequency limits are set to 1.
noise_len = 2 # seconds
noise = band_limited_noise(min_freq=4000, max_freq = 12000, samples=len(data), samplerate=rate)*10
noise_clip = noise[:rate*noise_len]
audio_clip_band_limited = data+noise
- The Band-Limited White Noise block specifies a two-sided spectrum, where the units are Hz.
- Noise between the minimum frequency of 4000 Hz and the maximum of 12000 Hz is generated with as many samples as the data and at the same sample rate.
- The noise clip keeps only the first noise_len seconds, i.e. the first rate * noise_len samples.
- The band-limited noise is then added to the given data.
- In effect, adding noise expands the size of the training dataset.
- random noise is added to the input variables making them different every time it is exposed to the model.
- Adding noise to input samples is a simple form of data augmentation.
- Adding noise means that the network is less able to memorize training samples because they are changing all of the time, resulting in smaller network weights and a more robust network that has lower generalization error, as illustrated below.
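A minimal illustration of this idea (the function name and noise level are hypothetical, not part of the project code):

```python
import numpy as np

def augment_with_noise(batch, noise_std=0.01):
    # add small Gaussian noise so each pass sees a slightly different input
    return batch + np.random.normal(0.0, noise_std, size=batch.shape)
```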
import time
from datetime import timedelta as td
- The time module provides various time-related functions; for related functionality, see also the datetime and calendar modules.
- datetime.timedelta is a duration expressing the difference between two date, time, or datetime instances to microsecond resolution.
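These two imports are typically used to report how long the denoising takes; a minimal sketch (the exact usage is not shown in this walkthrough):

```python
import time
from datetime import timedelta as td

start = time.time()
# ... run the noise-reduction step here ...
print("Run time:", td(seconds=time.time() - start))
```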
def _stft(y, n_fft, hop_length, win_length):
return librosa.stft(y=y, n_fft=n_fft, hop_length=hop_length, win_length=win_length)
- Short Time Fourier Transform can be used to quantify change of a nonstationary signal’s frequency and phase content over time.
- Hop length should refer to the number of samples in between successive frames. For signal analysis, Hop length should be less than the frame size, so that frames overlap.
- Parameters of librosa.stft:
- y : np.ndarray [shape=(n,)], real-valued input signal.
- n_fft : int > 0 [scalar]. Length of the windowed signal after padding with zeros. The number of rows in the STFT matrix D is (1 + n_fft/2). The default value, n_fft=2048 samples, corresponds to a physical duration of 93 milliseconds at a sample rate of 22050 Hz, i.e. the default sample rate in librosa. This value is well adapted for music signals. However, in speech processing, the recommended value is 512, corresponding to 23 milliseconds at a sample rate of 22050 Hz. In any case, we recommend setting n_fft to a power of two for optimizing the speed of the fast Fourier transform (FFT) algorithm.
- hop_length : int > 0 [scalar]. Number of audio samples between adjacent STFT columns. Smaller values increase the number of columns in D without affecting the frequency resolution of the STFT. If unspecified, defaults to win_length // 4 (see below).
- win_length : int <= n_fft [scalar]. Each frame of audio is windowed by a window of length win_length and then padded with zeros to match n_fft. Smaller values improve the temporal resolution of the STFT (i.e. the ability to discriminate impulses that are closely spaced in time) at the expense of frequency resolution (i.e. the ability to discriminate pure tones that are closely spaced in frequency). This effect is known as the time-frequency localization trade-off and needs to be adjusted according to the properties of the input signal y. If unspecified, defaults to win_length = n_fft.
def _istft(y, hop_length, win_length):
    return librosa.istft(y, hop_length=hop_length, win_length=win_length)
- Inverse short-time Fourier transform (ISTFT). Converts a complex-valued spectrogram stft_matrix to a time series y by minimizing the mean squared error between stft_matrix and the STFT of y.
- In general, window function, hop length and other parameters should be same as in stft, which mostly leads to perfect reconstruction of a signal from unmodified stft_matrix.
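A small self-contained check of this stft/istft pair (dummy data, just to illustrate the shapes and the round trip):

```python
import numpy as np
import librosa

# one second of dummy audio at 22050 Hz
y = np.random.randn(22050).astype(np.float32)

# forward STFT: shape is (1 + n_fft // 2, n_frames) -> (1025, 44) here
D = librosa.stft(y=y, n_fft=2048, hop_length=512, win_length=2048)
print(D.shape)

# inverse STFT with the same hop/window parameters reconstructs y almost exactly
y_rec = librosa.istft(D, hop_length=512, win_length=2048)
print(np.max(np.abs(y[:len(y_rec)] - y_rec)))  # very small, near float32 precision
```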
def _amp_to_db(x):
return librosa.core.amplitude_to_db(x, ref=1.0, amin=1e-20, top_db=80.0)
- Converts an amplitude spectrogram to a dB-scaled spectrogram. This is equivalent to power_to_db(S**2), but is provided for convenience.
def _db_to_amp(x):
    return librosa.core.db_to_amplitude(x, ref=1.0)
- Convert a dB-scaled spectrogram to an amplitude spectrogram.
- This effectively inverts amplitude_to_db:
- db_to_amplitude(S_db) ~= 10.0 ** (0.5 * (S_db / 10 + log10(ref)))
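A quick numerical illustration of that inversion (not from the notebook):

```python
import numpy as np
import librosa

x = np.array([1.0, 0.5, 0.1])
x_db = librosa.amplitude_to_db(x, ref=1.0)       # [0.0, -6.02, -20.0] dB
x_back = librosa.db_to_amplitude(x_db, ref=1.0)
print(np.allclose(x, x_back))                    # True
```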
def plot_spectrogram(signal, title):
fig, ax = plt.subplots(figsize=(20, 4))
cax = ax.matshow(
signal,
origin="lower",
aspect="auto",
cmap=plt.cm.seismic,
vmin=-1 * np.max(np.abs(signal)),
vmax=np.max(np.abs(signal)),
)
- Plotting the spectrogram with the signal as the input.
- The Axes class contains most of the figure elements: Axis, Tick, Line2D, Text, Polygon, etc., and sets the coordinate system.
- Matplotlib provides multiple colour maps, accessible via this function, to find a good representation of your data set in 3D colour space.
fig.colorbar(cax)
ax.set_title(title)
- The best way to see what is happening is to add a colorbar (fig.colorbar(cax)) after creating the plot: values outside the [vmin, vmax] range all sit at the ends of the bar.
- In general, values below vmin will be coloured with the lowest colour, and values above vmax will get the highest colour.
- If you set vmax smaller than vmin, internally they will be swapped, although, depending on the exact version of matplotlib and the precise functions called, matplotlib might give an error or warning. So it is best to always set vmin lower than vmax.
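A tiny illustration of that clipping behaviour (hypothetical data, unrelated to the audio):

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.linspace(-3, 3, 100).reshape(10, 10)
fig, ax = plt.subplots()
cax = ax.matshow(data, cmap=plt.cm.seismic, vmin=-1.0, vmax=1.0)
fig.colorbar(cax)   # values below -1 take the lowest colour, values above +1 the highest
plt.show()
```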
def plot_statistics_and_filter(
mean_freq_noise, std_freq_noise, noise_thresh, smoothing_filter
):
fig, ax = plt.subplots(ncols=2, figsize=(20, 4))
plt_mean, = ax[0].plot(mean_freq_noise, label="Mean power of noise")
plt_std, = ax[0].plot(std_freq_noise, label="Std. power of noise")
plt_thresh, = ax[0].plot(noise_thresh, label="Noise threshold (by frequency)")
ax[0].set_title("Threshold for mask")
ax[0].legend()
cax = ax[1].matshow(smoothing_filter, origin="lower")
fig.colorbar(cax)
ax[1].set_title("Filter for smoothing Mask")
- Plots basic statistics of noise reduction.
- Signal-to-noise ratio (SNR or S/N) is a measure used in science and engineering that compares the level of a desired signal to the level of background noise.
- SNR is defined as the ratio of signal power to the noise power, often expressed in decibels.
- A ratio higher than 1:1 (greater than 0 dB) indicates more signal than noise.
- Setting up the threshold frequency for noise masking.
- Masking threshold refers to a process where one sound is rendered inaudible because of the presence of another sound.
- So the masking threshold is the sound pressure level of a sound needed to make the sound audible in the presence of another noise called a "masker"
- Thus the threshold is established.
- Blur noise signals with various low pass filters
- Apply custom-made filters to images (2D convolution)
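For instance (illustrative only), a short moving-average kernel blurs high-frequency noise out of a 1-D signal; the same convolution idea is applied in 2-D to the mask later in removeNoise:

```python
import numpy as np
import scipy.signal

t = np.linspace(0, 1, 500)
noisy = np.sin(2 * np.pi * 5 * t) + 0.3 * np.random.randn(500)

kernel = np.ones(11) / 11.0                         # simple low-pass (box) filter
smoothed = scipy.signal.fftconvolve(noisy, kernel, mode="same")
```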
def removeNoise( # removes noise from audio_clip using the spectral profile of noise_clip
audio_clip, # the signal to be denoised
noise_clip, # a clip containing only noise, used to build the noise statistics
n_grad_freq=2, # how many frequency channels to smooth over with the mask.
n_grad_time=4, # how many time channels to smooth over with the mask.
n_fft=2048, # length of the windowed signal after padding with zeros (the FFT window size)
win_length=2048, # each frame of audio is windowed by `window()` of length `win_length`, then padded with zeros to match `n_fft`
hop_length=512, # number of audio samples between adjacent STFT columns
n_std_thresh=1.5, # how many standard deviations louder than the mean dB of the noise (at each frequency level) to be considered signal
prop_decrease=1.0, #To what extent should you decrease noise (1 = all, 0 = none)
verbose=False, # whether to print progress and timing information while the algorithm runs
visual=False, #Whether to plot the steps of the algorithm
):
noise_stft = _stft(noise_clip, n_fft, hop_length, win_length)
noise_stft_db = _amp_to_db(np.abs(noise_stft))
- STFT over noise
- convert to dB
mean_freq_noise = np.mean(noise_stft_db, axis=1)
std_freq_noise = np.std(noise_stft_db, axis=1)
noise_thresh = mean_freq_noise + std_freq_noise * n_std_thresh
- Calculate statistics over noise
- For the noise threshold we take, at each frequency, the mean of the noise plus n_std_thresh times its standard deviation.
sig_stft = _stft(audio_clip, n_fft, hop_length, win_length)
sig_stft_db = _amp_to_db(np.abs(sig_stft))
- STFT over signal
mask_gain_dB = np.min(_amp_to_db(np.abs(sig_stft)))
- Calculate the dB value that masked bins will be reduced to
smoothing_filter = np.outer(
np.concatenate(
[
np.linspace(0, 1, n_grad_freq + 1, endpoint=False),
np.linspace(1, 0, n_grad_freq + 2),
]
)[1:-1],
np.concatenate(
[
np.linspace(0, 1, n_grad_time + 1, endpoint=False),
np.linspace(1, 0, n_grad_time + 2),
]
)[1:-1],
)
smoothing_filter = smoothing_filter / np.sum(smoothing_filter)
- Create a smoothing filter for the mask in time and frequency
db_thresh = np.repeat(
np.reshape(noise_thresh, [1, len(mean_freq_noise)]),
np.shape(sig_stft_db)[1],
axis=0,
).T
- calculate the threshold for each frequency/time bin
sig_mask = sig_stft_db < db_thresh
- mask for the signal
sig_mask = scipy.signal.fftconvolve(sig_mask, smoothing_filter, mode="same")
sig_mask = sig_mask * prop_decrease
- Convolution of the mask with the smoothing filter, scaled by prop_decrease
# mask the signal
sig_stft_db_masked = (
sig_stft_db * (1 - sig_mask)
+ np.ones(np.shape(mask_gain_dB)) * mask_gain_dB * sig_mask
) # mask real
sig_imag_masked = np.imag(sig_stft) * (1 - sig_mask)
sig_stft_amp = (_db_to_amp(sig_stft_db_masked) * np.sign(sig_stft)) + (
1j * sig_imag_masked
)
- Mask the signal
# recover the signal
recovered_signal = _istft(sig_stft_amp, hop_length, win_length)
recovered_spec = _amp_to_db(
np.abs(_stft(recovered_signal, n_fft, hop_length, win_length))
)
return recovered_signal
- recover the signal
- The mask is applied where the signal falls below the noise threshold
- convolve the mask with a smoothing filter
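Putting it together, a usage sketch (the output filename is hypothetical; audio_clip_band_limited, noise_clip and rate come from earlier in this walkthrough):

```python
output = removeNoise(audio_clip=audio_clip_band_limited, noise_clip=noise_clip)

# save the recovered signal; float32 data is written as a float WAV by scipy
wavfile.write("denoised.wav", rate, output.astype(np.float32))
```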
- Applying the noise reduction algorithm to an already downloaded WAV file.
- Applying the FFT over a live recording of the audio signal.
- A deeper implementation of AI for noise cancellation.
- Applying the noise reduction algorithm to various formats of audio files.
- Capturing the live audio signal with a microphone and an ESP32, producing a WAV file for further computation and signal processing.