US20080189104A1 - Adaptive noise suppression for digital speech signals - Google Patents
- Publication number
- US20080189104A1 (application US12/009,601)
- Authority
- US
- United States
- Prior art keywords
- gain
- noise
- power
- speech
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
Definitions
- the disclosure relates generally to audio signal processing, and in particular to suppressing additive noise in a speech signal in a communication system.
- an additive background noise signal is introduced into the speech signal.
- the corrupted speech signal, or noisy speech signal, often poses difficulties for the receiving party, such as degraded quality or reduced intelligibility. For instance, during a mobile phone conversation in a moving car or on a busy street, the background noise is often high enough to make the conversation far less efficient than in a quiet room. It is hence often desirable to remove the corrupting noise either before the noisy signal is transmitted at the sender or before the received noisy signal is played out at the receiver.
- Embodiments of the present disclosure relate to a system and method that rates the voice activity with a continuous score, and adaptively estimates the noise power in psychoacoustic bands and accordingly adjusts the noisy signal spectrum based on probabilistic heuristics to suppress the noise in a speech signal.
- an apparatus for adaptively suppressing noise in an input signal frequency spectrum derived from overlapping input frames includes a psychoacoustic power computation module configured to compute a noisy signal power in psychoacoustic bands, a voice activity scoring module configured to compute a probabilistic score for a presence of a speech, and a noise estimation module configured to estimate a noise power in the psychoacoustic bands based on information of past frames, the probabilistic score, and the computed noisy signal power.
- the system also includes a gain computation module configured to compute a gain for each frequency, based on a probabilistic heuristic, the probabilistic score and the information on the past frames, and a gain post-processing module configured to perform a gain time smoothing, a gain frequency smoothing, and a gain regulation for the computed gain.
- a method for adaptively suppressing a noise in an input signal frequency spectrum derived from overlapping input frames includes computing a noisy signal power in psychoacoustic bands, computing a probabilistic score for a presence of a speech, and estimating a noise power in the psychoacoustic bands based on information of past frames, the probabilistic score, and the computed noisy signal power.
- the method also includes computing a gain for each frequency, based on a probabilistic heuristic, the probabilistic score and the information on the past frames, post-processing the computed gain by performing a gain time smoothing, a gain frequency smoothing, and a gain regulation on the computed gain, and adjusting the input signal frequency spectrum by attenuating a noise in the input signal frequency spectrum based on the post-processed gain.
- a computer program embodied on a computer readable medium and operable to be executed by a processor.
- the computer program includes computer readable program code for converting overlapping input frames into an input signal frequency spectrum, computing a noisy signal power in psychoacoustic bands and computing a probabilistic score for a presence of a speech.
- the computer program also includes computer readable program code for estimating a noise power in the psychoacoustic bands based on information of past frames, the probabilistic score, and the computed noisy signal power, and computing a gain for each frequency, based on a probabilistic heuristic, the probabilistic score and the information on the past frames.
- the computer program further includes computer readable program code for post-processing the computed gain by performing a gain time smoothing, a gain frequency smoothing, and a gain regulation on the computed gain and adjusting the input signal frequency spectrum by attenuating a noise in the input signal frequency spectrum based on the post-processed gain.
- FIG. 1 shows two possible applications for one embodiment of the present disclosure in a telecommunication system
- FIG. 2 shows a high-level block diagram of functional modules related to noise suppression according to one embodiment of the present disclosure
- FIG. 3 shows a block diagram of a processing engine for noise suppression according to one embodiment of the present disclosure
- FIG. 4 shows a block diagram for a gain post-processing module according to one embodiment of the present disclosure
- FIG. 5 shows an exemplary curve of a voice activity score component as a function of a voice band count
- FIG. 6 shows an exemplary frame score distribution and associated frame characteristics according to one embodiment of the present disclosure
- FIG. 7 shows an exemplary curve for the noise time smoothing factor for different constants according to one embodiment of the present disclosure
- FIG. 8 shows an exemplary curve of a scale factor as a function of an estimated noise according to one embodiment of the present disclosure
- FIG. 9 illustrates exemplary curves of gain vs. a ratio of noise power to threshold according to one embodiment of the present disclosure
- FIG. 10 illustrates an exemplary gain regulation curve according to one embodiment of the present disclosure.
- FIG. 11 depicts a block diagram of a generic controller 1100 for a wireless terminal according to one embodiment of the present disclosure.
- spectral subtraction works by estimating the power of additive noise and subtracting it from the noisy signal power to obtain an estimated spectrum of the clean speech, based on the assumption that the corrupting noise is uncorrelated with speech, which is generally true in practice. Special treatment is needed to avoid negative power after subtraction.
- phase information is generally taken the same as the noisy signal, as it is found to be less important for perception than power.
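As a rough illustration of the spectral subtraction idea described above, the following Python sketch subtracts an estimated noise power from the noisy power in each frequency bin, floors the result so that the power never becomes negative, and reuses the noisy phase. The function name `spectral_subtract` and the `floor` parameter are assumptions made for this example; they do not come from the disclosure.

```python
import numpy as np

def spectral_subtract(noisy_spectrum, noise_power, floor=0.01):
    """Subtract estimated noise power per bin and keep the noisy phase."""
    noisy_power = np.abs(noisy_spectrum) ** 2
    # Flooring (an assumed choice) avoids negative power after subtraction.
    clean_power = np.maximum(noisy_power - noise_power, floor * noisy_power)
    # The phase is taken unchanged from the noisy signal.
    phase = np.angle(noisy_spectrum)
    return np.sqrt(clean_power) * np.exp(1j * phase)

spectrum = np.array([3 + 4j, 1 + 0j, 0 + 2j])
noise = np.array([9.0, 4.0, 1.0])
cleaned = spectral_subtract(spectrum, noise)
```

In the second bin the noise estimate exceeds the noisy power, so the floor is what keeps the output magnitude real and positive.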
- Spectral weighting obtains a weight for each frequency that corresponds to an optimum filter minimizing the mean-square error of the processed signal against the desired signal (clean speech), which is a form of Wiener filter implemented in the frequency domain. It involves estimating the noise power and computing the spectrum of the noisy signal, after which a weighting gain is calculated. These two methods can be considered special cases of generalized Wiener filtering, and one issue is that they rely on accurate estimation of the noise power.
- the model based approach is based on an underlying speech model and has also been investigated in the past.
- the parameters of the model are first estimated and then the speech is generated using the estimated parameters.
- One issue associated with this approach is its high level of complexity, together with the fact that accurate estimation of the model parameters for a noisy signal is itself difficult. In practice, better accuracy requires a higher model order, which in turn increases the complexity significantly, in some cases exponentially.
- FIG. 1 shows a communication system 100 .
- the communication system 100 includes a sender 110 and a receiver 130 .
- the sender 110 can include one or more software modules and one or more hardware modules.
- examples of the sender 110 include a wireless terminal or a wireline phone terminal.
- the first block diagram shows the sender 110 of the communication system 100 , where noise suppression is carried out before the speech is encoded.
- the sender 110 includes a microphone input unit 111, an analog-to-digital converter (ADC) 113, a noise suppression module 200, a speech encoding unit 117, and a modulation and transmission unit 119.
- the microphone input unit 111 can receive speech from a speaker and generate analog signals.
- the ADC 113 converts the analog speech signals to corresponding digital signals.
- the noise suppression module 200 is configured to suppress noise in the speech signals before the speech signals are transmitted to the receiver 130 . More details of the noise suppression module 200 are shown in FIG. 2 and FIG. 3 and described therein.
- the input to the noise suppression module 200 is in Pulse Code Modulation (PCM) format obtained by the ADC 113. The typical sampling frequency, denoted as Fs, is 8 kHz, though 16 kHz or other frequencies are sometimes used.
- the speech signals in digital format are encoded at the speech encoding unit 117.
- the encoded speech data are modulated and transmitted to the receiver by the modulation and transmission unit 119.
- the receiver 130 can include one or more software modules and one or more hardware modules.
- examples of the receiver 130 include a wireless terminal or a wireline phone terminal.
- the receiver 130 can include a reception and demodulation unit 139, a speech decoding unit 137, a noise suppression module 200, a digital-to-analog converter (DAC) 133, and a speaker output unit 131.
- the noise suppression module 200 on the receiver 130 is identical to the one on the sender 110 .
- noise suppression is carried out after the signal is decoded by the decoding unit 137 , also operating in PCM format.
- the operations at the receiver 130 are the mirror image of those at the sender 110 .
- the reception and demodulation unit 139 receives and demodulates the speech data, and then the speech decoding unit 137 decodes the speech data into the PCM format.
- the noise suppression module 200 is configured to suppress the noise in the speech data.
- the DAC 133 converts the speech data back to the analog format to be played back by the speaker output unit 131 .
- one embodiment of the present disclosure should work equally well in either scenario. In practice, it is preferable to carry out noise suppression at the sender 110, because the receiver often has no information as to whether the received signal had its noise suppressed at the sender 110, and simply reapplying noise suppression may compromise the speech quality. Following the well-established principles of Wiener filtering, a method according to one embodiment of the present disclosure works in the frequency domain to suppress the noise. To make the processing more closely related to human perception and to keep the cost low in terms of memory and computation, processing is done in psychoacoustically motivated bands, for example the Bark bands shown in Table 1 below.
- the frequency range covering the first two or three formants is identified as more important and is referred to as the speech band.
- the psychoacoustic bands are divided into three groups: Low Range (LR) for bands below the speech band, Middle Range (MR) for those in the speech band, and High Range (HR) for those above the speech band.
- Table 1
- processing is carried out differently for bands in different groups according to one embodiment of the present disclosure.
- FIG. 2 shows a high-level functional block diagram of a noise suppression module 200 , according to one embodiment of the present disclosure.
- the noise suppression module 200 can include one or more software modules and one or more hardware modules.
- the noise suppression module 200 is implemented in the generic controller 1100 as illustrated in FIG. 11 .
- the noise suppression module 200 includes an input windowing module 211 , a frequency analysis module 213 , a processing engine 300 , a frequency synthesis module 217 , and an output overlapping and adding module 219 .
- the processing engine 300 includes a voice activity scoring module 313, a perceptual analysis and processing module 331, and a noise estimation module 315.
- the method works in block-processing mode; that is, the input stream is segmented into overlapping frames, each frame is processed separately, and the output is obtained by overlap-adding the processed frames.
- the input windowing module 211 segments the input signal into overlapping frames. The overlapping ratio is typically chosen to be one half; that is, the first half of the current frame is in fact the second half of the previous frame. Each frame is multiplied by a window to ensure a smooth transition from frame to frame and to suppress high frequencies introduced by segmentation.
- the frequency analysis module 213 then transforms the windowed frame to the frequency domain using a frequency analysis method.
- FFT Fast Fourier Transform
- the processing engine 300 is configured to analyze and identify the noise in the input signal spectrum and then suppress the noise.
- these component modules of the processing engine 300 for noise suppression are depicted in more detail in FIG. 3 and FIG. 4 and described therein.
- the frequency synthesis module 217 and the output overlap-and-add module 219 are configured to transform the processed signal spectrum back to the time domain after the noise suppression operations on the input signal spectrum.
- the frequency synthesis module 217 may use the inverse of the frequency analysis transformation to convert the processed signal spectrum from the frequency domain back to the time domain. If an FFT was used for frequency analysis, then an inverse FFT is applied.
- the processed time domain signal of the current frame is aligned with the corresponding part of the previously processed frame and they are summed to produce the output.
- the overlapping region of current frame with the next frame is saved for synthesis of next output frame.
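The block-processing framework described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a periodic Hann analysis window (the disclosure does not name a window), half-overlapping frames, NumPy's real FFT for frequency analysis, and a hypothetical `gain_fn` callback standing in for the processing engine 300. With no gain applied, the overlap-added output reproduces the input in the fully overlapped region.

```python
import numpy as np

def process_overlap_add(signal, frame_len, gain_fn=None):
    """Window half-overlapping frames, process each in the frequency
    domain, and overlap-add the results back into a time signal."""
    hop = frame_len // 2
    n = np.arange(frame_len)
    # Periodic Hann window (assumed): shifted copies at 50% overlap sum to 1.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)
    output = np.zeros(len(signal))
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame)
        if gain_fn is not None:
            # gain_fn is a hypothetical hook returning one gain per bin.
            spectrum = spectrum * gain_fn(spectrum)
        output[start:start + frame_len] += np.fft.irfft(spectrum, n=frame_len)
    return output

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = process_overlap_add(x, frame_len=256)
```

Only the first and last half frame are covered by a single window, so those edges are attenuated; a streaming implementation would keep the saved overlap region between calls instead.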
- FIG. 3 shows a block diagram of a processing engine 300 for noise suppression according to one embodiment of the present disclosure.
- the processing engine 300 can include one or more software modules and one or more hardware modules.
- the processing engine 300 is implemented in the generic controller 1100 for a wireless terminal, as illustrated in FIG. 11 .
- the processing engine 300 includes a Bark band power computation module 311, a voice activity scoring module 313, a noise power estimation module 315, and a gain computation module 317.
- the processing engine 300 also includes a gain post-processing module 400 , a signal spectrum adjustment module 321 , and a mode switching decision module 323 .
- the processing engine 300 also includes a signal power array updating module 314 and an information store 316 .
- the information of a certain number of past frames may be stored in the information store 316 to support modules such as the Voice Activity Scoring (VAS) module 313, the noise power estimation module 315, the gain computation module 317, and the gain post-processing module 400.
- the voice activity scoring (VAS) module 313 is configured to compute a continuous score to rate the possibility of the presence of speech.
- noise power is estimated for adjusting the noisy signal spectrum.
- the VAS module 313 is particularly useful in making the estimation of noise power fuzzy so as to eliminate the risk of wrong classification by a traditional voice activity detector (VAD) that outputs binary decisions.
- the VAS module 313 computes a score in a continuous range such that a low score indicates that the input frame is highly likely to be a noise-only frame and a high score indicates that the input frame is highly likely to be dominated by speech. This scoring scheme is found advantageous over the binary decision scheme of a conventional Voice Activity Detector (VAD) due to the quasi-stationary and non-stationary nature of speech signals.
- the noise power estimation module 315 follows the principle of temporal tracking, making use of the observation that noise power normally changes slowly. According to one embodiment of the present disclosure, by taking advantage of the score output by the VAS module 313, the noise estimation module 315 can respond quickly to non-stationarity in the input, and can also cope with signals that cannot be classified with very high likelihood as either noise-only or speech-dominated.
- the gain computation module 317 may compute a gain for each frequency according to a heuristic, based on the estimated noise power.
- the heuristic may be expressed as follows. As the ratio of the noisy signal frequency component power to the estimated noise frequency component power grows, the possibility of that frequency component of the noisy signal being noise decreases, and when the ratio is large enough the frequency component can eventually be taken as containing speech only.
- the gain post-processing module 400 post-processes the computed gain for each frequency, using the estimated noise power and according to probabilistic heuristics.
- the gain post-processing module 400 makes sure the processed signal sounds natural.
- FIG. 4 shows details of the gain post-processing module 400.
- the signal spectrum adjustment module 321 adjusts the noisy signal spectrum by multiplying the final gains with the magnitudes of the noisy signal spectrum to attenuate noise. This in effect suppresses the noise to achieve improved quality and intelligibility of speech.
- the mode switching decision module 323 checks mode switching criteria for each frame to decide a mode for next frame. To cope with changing environments, the noise suppression engine may operate in and automatically switch between two modes: NORMAL for adequate noise and NOISY for extremely high noise.
- the following sections describe these operations of the processing engine 300 for noise suppression in more detail. These operations are performed by the Bark band power computation module 311 , the VAS module 313 , the signal power array updating module 314 , the noise power estimation module 315 , the gain computation module 317 , the gain post-processing module 400 , the signal spectrum adjustment module 321 and the mode switching decision module 323 .
- the Bark band power computation module 311 computes the signal power in psychoacoustic bands. Equation 1 below represents the power in the psychoacoustic bands, where $X_{i,k}$ denotes the ith frequency sample of the kth frame after frequency analysis, j is the band index, k is the frame index, and $B_j$ is the set of frequency indices of the jth band according to Table 1 above.
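A possible way to compute band power from a frame spectrum is sketched below in Python. It assumes that the band power is the sum of squared magnitudes of the frequency samples whose indices fall in the band, which is consistent with the symbol definitions above but is not confirmed by an equation in this text. The `band_edges` values are purely illustrative and are not the Bark band boundaries of Table 1.

```python
import numpy as np

def band_power(spectrum, band_edges):
    """Sum |X_i|^2 over the frequency indices of each band.

    band_edges[j]:band_edges[j + 1] is the index range of band j.
    """
    power = np.abs(spectrum) ** 2
    return np.array([power[lo:hi].sum()
                     for lo, hi in zip(band_edges[:-1], band_edges[1:])])

spectrum = np.array([1 + 1j, 2 + 0j, 0 + 3j, 1 - 1j, 2 + 2j, 0 + 0j])
band_edges = [0, 2, 3, 6]  # hypothetical band boundaries, not Table 1
powers = band_power(spectrum, band_edges)
```

Working on roughly 20 band powers instead of every frequency sample is what keeps the later estimation steps cheap in memory and computation.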
- the voice activity scoring module 313 assigns a score, denoted as FRAME_SCORE_k, to the current frame k to indicate the possibility of existence of speech. It is continuous and non-negative, with a larger value indicating a higher possibility of containing speech.
- FRAME_SCORE_k is computed based on a combination of two metrics: Score_1, which takes into account the shape of the signal's power spectrum, and Score_2, which takes into account the total power. Specifically, Score_1 is a function of the number of MR bands of the current frame having greater power than the corresponding MR bands of the previously estimated noise scaled by a factor. A pseudo code is shown below to illustrate how the signal power and noise power are compared to obtain the input to the function for computing Score_1.
- $X^b_{j,k}$: signal power of psychoacoustic band j of the current frame k (see Equation 0)
- $D^b_{j,k-1}$: estimated noise power of psychoacoustic band j of the previous frame k-1 (see Equation 4)
- a constant scaling factor, preferably in the range of 1.5 to 4.
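The comparison that produces the count cnt can be sketched in Python as follows. This is an illustrative reading of the description, not the original pseudo code: the function name `count_voice_bands`, the `scale` parameter name, and the example band indices are assumptions. It counts the MR bands whose signal power exceeds the scaled noise power estimated for the previous frame.

```python
import numpy as np

def count_voice_bands(signal_band_power, prev_noise_band_power, mr_bands, scale=2.0):
    """Count MR bands whose power exceeds scale times the previous noise estimate."""
    x = np.asarray(signal_band_power)[mr_bands]
    d = np.asarray(prev_noise_band_power)[mr_bands]
    # scale is the constant scaling factor (1.5 to 4 per the text).
    return int(np.sum(x > scale * d))

signal_power = [5.0, 10.0, 1.0, 8.0, 3.0]
noise_power = [1.0, 6.0, 1.0, 2.0, 1.0]
mr_bands = [1, 2, 3]  # hypothetical indices of the Middle Range bands
cnt = count_voice_bands(signal_power, noise_power, mr_bands)
```

The resulting cnt would then be fed to the Score_1 function, whose exact shape is given only graphically in FIG. 5.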
- FIG. 5 shows the curve of the function into which the computed value cnt is fed to obtain Score_1.
- threshold_cnt controls the turning point above which the curve, hence Score_ 1 , increases more quickly as cnt increases.
- Score_2 is related to the ratio of the total power of the current frame to that of the previously estimated noise:
- $\text{Score\_2} = \beta \cdot \dfrac{\sum_j X^b_{j,k}}{\sum_j D^b_{j,k-1}}$ (Equation 1)
- where $\beta$ is a constant that takes a value in the range of 0.25 to 0.5.
- the final score is a weighted sum of these two:
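As a small numerical sketch of the scoring step, the Python code below computes Score_2 as a constant times the ratio of total signal power to total previously estimated noise power, and then forms a weighted sum with Score_1. The name `beta` for the constant and the weights `w1` and `w2` are assumptions for illustration; the text does not give the actual weights.

```python
def score_2(signal_band_power, prev_noise_band_power, beta=0.5):
    """Constant times the ratio of total frame power to total noise power."""
    return beta * sum(signal_band_power) / sum(prev_noise_band_power)

def frame_score(score_1, score_2_value, w1=0.5, w2=0.5):
    """Weighted sum of the two metrics; w1 and w2 are hypothetical weights."""
    return w1 * score_1 + w2 * score_2_value

s2 = score_2([4.0, 6.0], [1.0, 1.0], beta=0.5)
total = frame_score(score_1=3.0, score_2_value=s2)
```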
- the noise power estimation module 315 estimates the noise power in psychoacoustic bands that are more closely related to human perception than individual frequencies. The estimation works in one of two modes that are adapted to different signal characteristics: one mode for noise-like signal, and the other for speech-like signal.
- the threshold NOISE_SPEECH_TH can be tuned with test signals.
- the estimation is based on the principle of temporal tracking; that is, noise power in each band changes slowly in time and is closely related to the recent frames having small power.
- the signal power of N recent frames is sorted in ascending order, and a portion of the array from the beginning is averaged as the estimated noise power in this band of the current frame.
- the total number of recent frames, N, for which the signal power is stored, may correspond to a time interval of about 200 to 400 milliseconds.
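The temporal tracking step described above can be sketched in Python as follows. For each band, the stored power of the N most recent frames is sorted in ascending order and the lowest portion is averaged. The `fraction` parameter controlling how much of the sorted array is averaged is an assumption; the text does not state the portion used.

```python
import numpy as np

def estimate_noise_from_history(recent_band_power, fraction=0.25):
    """Average the smallest stored powers per band.

    recent_band_power has shape (N frames, number of bands).
    """
    history = np.sort(np.asarray(recent_band_power, dtype=float), axis=0)
    # Keep at least one frame so the mean is always defined.
    count = max(1, int(fraction * history.shape[0]))
    return history[:count].mean(axis=0)

history = np.array([[4.0, 9.0], [1.0, 7.0], [3.0, 5.0], [2.0, 8.0]])
noise = estimate_noise_from_history(history, fraction=0.5)
```

At a typical frame rate, a history covering about 200 to 400 milliseconds is short enough that a full sort per band stays inexpensive.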
- estimated noise power for band j is
- the adaptive smoothing factor eliminates abrupt changes; it is derived from a predefined constant NOISE_SMOOTH_FACTOR, which is greater than 0.5, and from the normalized deviation of the total power of the current frame from the mean total power of a few recent frames.
- G is the set of frame indices for the P most recent frames.
- the signal power is taken as the estimated noise power:
- the smoothing factor gradually changes from 1 - NOISE_SMOOTH_FACTOR to NOISE_SMOOTH_FACTOR as FRAME_SCORE_k increases from a lower score threshold NOISE_TH_L to a higher score threshold NOISE_TH_H, as depicted in FIG. 7.
- the final noise power is updated following Equation 4.
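The score-dependent smoothing factor can be illustrated with the Python sketch below. Two things are assumed here and are not taken from the text: the transition between the two thresholds is linear (FIG. 7 may show a different shape), and the final update is a first-order recursive average of the previous noise estimate and the new estimate. The function names are hypothetical.

```python
def noise_smoothing_factor(frame_score, th_low, th_high, noise_smooth_factor=0.9):
    """Ramp from 1 - NOISE_SMOOTH_FACTOR up to NOISE_SMOOTH_FACTOR."""
    low, high = 1.0 - noise_smooth_factor, noise_smooth_factor
    if frame_score <= th_low:
        return low
    if frame_score >= th_high:
        return high
    # Linear interpolation between the thresholds (an assumed shape).
    t = (frame_score - th_low) / (th_high - th_low)
    return low + t * (high - low)

def update_noise(prev_noise, new_estimate, factor):
    """Assumed recursive update: a larger factor trusts the previous estimate more."""
    return factor * prev_noise + (1.0 - factor) * new_estimate

factor = noise_smoothing_factor(1.5, th_low=1.0, th_high=2.0, noise_smooth_factor=0.75)
updated = update_noise(prev_noise=2.0, new_estimate=4.0, factor=factor)
```

Under these assumptions a speech-like frame barely moves the noise estimate, while a noise-like frame lets the estimate follow the input quickly.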
- the gain computation module 317 computes a gain for each frequency component i according to a probabilistically driven heuristic.
- a threshold $\mathrm{THRES}_j$ is first computed based on the estimated noise power $D^b_{j,k}$:
- $r = \dfrac{\sum_j X^b_{j,k}}{\sum_j D^b_{j,k-1}}$.
- an example curve for computing SCALE_FACTOR_k from r is illustrated in FIG. 8.
- the gain G i,k is computed according to a probabilistically driven curve that can be either linear or non-linear.
- FIG. 9 shows some example curves that can be used.
- a turnover point is identified, below which the gain is attenuated and above which it is amplified.
- Different degrees of attenuation and amplification correspond to different probabilistic heuristics in the treatment of noise. Further improvement can be achieved by assigning the same gain to all frequencies in a psychoacoustic band if the band is in the LR or when the current frame is found to be noise-only. This also simplifies computation.
- $G_{i,k}$ is computed as
- $G_{i,k} = \begin{cases} f\left(\dfrac{X^b_{j,k}/C_j}{\mathrm{THRES}_j}\right), & \text{if } j \in \mathrm{LR} \text{ or } \mathrm{FRAME\_SCORE}_k < \mathrm{NOISE\_TH\_L} \\ f\left(\dfrac{|X_{i,k}|^2}{\mathrm{THRES}_j}\right), & \text{otherwise} \end{cases}$ (Equation 8)
- $B_j$ is the set of frequency indices of the jth band according to Table 1
- $C_j$ is the total number of frequency components in band j
- $f(\cdot)$ is a function designed according to probabilistic heuristics as mentioned above.
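A sketch of the gain computation of Equation 8 in Python is given below. The curve `gain_curve` is a hypothetical stand-in for f( ): a capped linear function with its turnover point at a ratio of 1. The band layout, thresholds, and the strict comparison against NOISE_TH_L are likewise assumptions for the example.

```python
import numpy as np

def gain_curve(ratio):
    """Hypothetical f(): linear in the ratio, capped at 2."""
    return np.minimum(ratio, 2.0)

def compute_gains(spectrum, band_edges, thresholds, lr_bands,
                  frame_score, noise_th_l, f=gain_curve):
    power = np.abs(spectrum) ** 2
    gains = np.empty(len(spectrum))
    for j, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        if j in lr_bands or frame_score < noise_th_l:
            # One shared gain per band from the mean band power X^b / C_j.
            gains[lo:hi] = f(power[lo:hi].sum() / (hi - lo) / thresholds[j])
        else:
            # Individual gain per frequency component.
            gains[lo:hi] = f(power[lo:hi] / thresholds[j])
    return gains

spectrum = np.array([1.0, 1.0, 2.0, 0.5])
gains = compute_gains(spectrum, band_edges=[0, 2, 4], thresholds=[2.0, 1.0],
                      lr_bands={0}, frame_score=5.0, noise_th_l=1.0)
```

Gains above unity can appear at this stage with such a curve; the later regulation step is what bounds them.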
- FIG. 4 shows the component modules of the gain post-processing module 400 .
- the gain post-processing module 400 further processes the computed gains to ensure the quality of the processed signal and may include a gain time smoothing module 411, a gain frequency smoothing module 413, and a gain regulation module 415.
- the gain time smoothing module 411 can smooth the gains in the time domain. It is known that a filter that changes too fast in the time domain results in unnaturalness in the processed signal and in some cases may introduce musical noise. Hence, the gains are carefully smoothed along the time axis.
- the gain time smoothing module 411 takes into account the temporal characteristics of the signal by detecting whether the current frame is a release; if so, the time smoothing factor is adjusted according to $G_{i,k-1}$, based on the heuristic that the higher $G_{i,k-1}$ is, the more likely frequency i corresponds to a decaying voice, and hence the factor is given a higher value to better preserve voice. If the frame is not a release, the factor is assigned its lowest value.
- the time smoothing formula is expressed as shown by Equation 9 below.
- the time smoothing factor is frequency-dependent, preferably in the range of 0.3 to 0.7.
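One plausible form of the time smoothing is a first-order recursive average per frequency, sketched below in Python. This form is an assumption, since Equation 9 is not reproduced in this text; the function name and argument names are also hypothetical. A larger factor keeps more of the previous frame's gain.

```python
import numpy as np

def smooth_gains_in_time(current_gains, previous_gains, factors):
    """Blend each gain with the previous frame's gain, per frequency."""
    current_gains = np.asarray(current_gains, dtype=float)
    previous_gains = np.asarray(previous_gains, dtype=float)
    # factors holds one smoothing factor per frequency (0.3 to 0.7 per the text).
    factors = np.asarray(factors, dtype=float)
    return factors * previous_gains + (1.0 - factors) * current_gains

smoothed = smooth_gains_in_time([1.0, 0.2], [0.5, 0.8], [0.3, 0.7])
```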
- the gain frequency smoothing module 413 can mitigate artifacts introduced into the computed gains.
- the computed gains are all positive real numbers, and they correspond to a zero-phase filter which is symmetric in the time domain. If the filter impulse response has significant energy near its beginning (and tail by symmetry), when convolving with the windowed input signal, some artifacts may be introduced into the output. This can be mitigated by multiplying the filter impulse response with a smoothing window. In the frequency domain, this can be accomplished by filtering the gains $\{G'_{i,k}\}$ with a linear-phase low-pass filter. A finite impulse response (FIR) filter of order as low as four is normally adequate.
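The frequency smoothing can be illustrated with a short symmetric FIR filter applied across the gain vector, as in the Python sketch below. The five tap values (an order-four filter) are assumptions chosen to be symmetric and to sum to one; edge padding is also an assumed way of handling the band limits so that the output has one gain per input frequency.

```python
import numpy as np

def smooth_gains_in_frequency(gains, taps=(0.1, 0.2, 0.4, 0.2, 0.1)):
    """Low-pass the gains across frequency with a symmetric (linear-phase) FIR."""
    taps = np.asarray(taps, dtype=float)
    half = len(taps) // 2
    # Repeat the edge gains so the filtered output keeps the input length.
    padded = np.pad(np.asarray(gains, dtype=float), half, mode="edge")
    return np.convolve(padded, taps, mode="valid")

gains = np.array([1.0, 1.0, 0.0, 1.0, 1.0])
smoothed = smooth_gains_in_frequency(gains)
```

Centering the symmetric taps on each frequency removes the filter delay, so the smoothed gains stay aligned with the frequency components they apply to.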
- the gain regulation module 415 ensures that no gain falls below a threshold $G_{min}$ (i.e., $G'_{i,k} \ge G_{min}$).
- the threshold $G_{min}$ determines the maximum suppression of noise and also serves as an injection of comfort noise. Furthermore, no gain should exceed unity (i.e., $G'_{i,k} \le 1$).
- the gain regulation curve 1000 is depicted in FIG. 10 according to one embodiment of the present disclosure.
- the noisy signal spectrum adjustment module 321 can adjust the noisy signal spectrum by multiplying the post-processed gain $G'_{i,k}$ with the respective frequency component $X_{i,k}$ to produce a filtered spectrum $\{Y_{i,k}\}$ as shown by Equation 10 below.
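Gain regulation and spectrum adjustment can be sketched together in Python. A hard clip to the interval from G_min to 1 is assumed here; the regulation curve of FIG. 10 may be smoother. The value of `g_min` and the function name are illustrative only.

```python
import numpy as np

def regulate_and_apply(gains, spectrum, g_min=0.1):
    """Bound the gains to [g_min, 1] and scale each frequency component."""
    # Hard clipping is an assumed form of the regulation curve.
    regulated = np.clip(gains, g_min, 1.0)
    # Multiplying by a real gain changes the magnitude and keeps the phase.
    return regulated * spectrum

out = regulate_and_apply(np.array([0.01, 0.5, 1.7]),
                         np.array([2 + 2j, 4 + 0j, 0 + 1j]))
```

Because the lower bound is above zero, some residual noise always passes through, which is the comfort noise effect mentioned above.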
- the mode switching decision module 323 is configured to determine a mode of operation based on the empirical observation and then switch into the mode.
- a significant portion of non-noise frames (FRAME_SCORE_k > NOISE_TH_H, see FIG. 6) are in fact speech-dominated frames (FRAME_SCORE_k > SPEECH_TH).
- the mode is switched from NORMAL to NOISY when this portion falls below a threshold.
- if this portion is too large, the mode is switched from NOISY to NORMAL.
- the exact proportion can be tuned with the actual test signals that comprise streams of normal noise and streams of high noise.
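The mode switching rule can be sketched in Python as below. The window of recent frame scores, the two ratio thresholds `low_ratio` and `high_ratio`, and the function name are all assumptions; the text states only that the proportions are tuned with test signals.

```python
def next_mode(mode, frame_scores, noise_th_h, speech_th,
              low_ratio=0.3, high_ratio=0.6):
    """Decide the mode for the next frame from recent frame scores."""
    non_noise = [s for s in frame_scores if s > noise_th_h]
    if not non_noise:
        return mode  # nothing to judge; keep the current mode
    # Portion of non-noise frames that are speech-dominated.
    portion = sum(s > speech_th for s in non_noise) / len(non_noise)
    if mode == "NORMAL" and portion < low_ratio:
        return "NOISY"
    if mode == "NOISY" and portion > high_ratio:
        return "NORMAL"
    return mode

mode_a = next_mode("NORMAL", [3, 3, 3, 9], noise_th_h=2, speech_th=5)
mode_b = next_mode("NOISY", [9, 9, 3], noise_th_h=2, speech_th=5)
```

Using two separate thresholds gives the switch some hysteresis, which avoids toggling between modes when the portion hovers near a single boundary.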
- one embodiment of the present disclosure provides a system and method for adaptively suppressing noise in a speech signal with little memory and computation.
- the method and system can adaptively suppress additive noise in a speech signal for improved quality and intelligibility.
- Input signal is segmented into overlapping frames and each frame is processed in the frequency domain.
- Voice activity of an input frame is rated with a score in a continuous range to adapt other processing modules.
- Noise power is estimated in psychoacoustically motivated bands, making the processing closely related to human perception.
- a gain for each frequency is computed according to probabilistic heuristics, smoothed in the time axis and frequency axis, and regulated before adjusting the noisy signal spectrum, to ensure the naturalness of the processed speech.
- the method can operate in and automatically switch between two modes: one for adequate noise and the other for extremely high noise. This method is very efficient in terms of memory and computation as some processing is done in a psychoacoustic scale which has only about 20 bands.
- FIG. 11 depicts a block diagram of a generic controller 1100 for a wireless terminal.
- the generic controller 1100 depicted includes a processor 1102 connected to a level two cache/bridge 1104 , which is connected in turn to a local system bus 1106 .
- Local system bus 1106 may be, for example, a peripheral component interconnect (PCI) architecture bus.
- Also connected to local system bus in the depicted example are a main memory 1108 and a graphics adapter 1110 .
- the graphics adapter 1110 may be connected to display 1111 .
- LAN local area network
- WiFi Wireless Fidelity
- I/O input/output
- Disk controller 1120 can be connected to a storage 1126 , which can be any suitable machine usable or machine readable storage medium, including but not limited to nonvolatile, hard-coded type mediums such as read only memories (ROMs) or erasable, electrically programmable-read only memories (EEPROMs), magnetic tape storage, and user-recordable type mediums such as floppy disks, hard disk drives and compact disk read only memories (CD-ROMs) or digital versatile disks (DVDs), and other known optical, electrical, or magnetic storage devices.
- Also connected to I/O bus 1116 in the example shown is audio adapter 1124, to which speakers (not shown) may be connected for playing sounds.
- Keyboard/mouse adapter 1118 provides a connection for a pointing device (not shown), such as a mouse, a trackball, and a trackpointer, etc.
- FIG. 11 may vary for particular embodiments.
- other peripheral devices such as an optical disk drive and the like, also may be used in addition or in place of the hardware depicted.
- the depicted example is provided for the purpose of explanation only and is not meant to imply architectural limitations with respect to the present disclosure.
- the generic controller 1100 in accordance with an embodiment of the present disclosure includes an operating system employing a graphical user interface.
- the operating system permits multiple display windows to be presented in the graphical user interface simultaneously, with each display window providing an interface to a different application or to a different instance of the same application.
- a cursor in the graphical user interface may be manipulated by a user through the pointing device. The position of the cursor may be changed and/or an event, such as clicking a mouse button, generated to actuate a desired response.
- One of various commercial operating systems, such as a version of Microsoft Windows™, a product of Microsoft Corporation located in Redmond, Wash., may be employed if suitably modified.
- the operating system is modified or created in accordance with the present disclosure as described.
- LAN/WAN/Wireless adapter 1112 can be connected to a network 1130 (not a part of generic controller 1100 ), which can be any public or private data processing system network or combination of networks, as known to those of skill in the art, including the Internet.
- the generic controller 1100 can communicate over network 1130 with server system 1140 , which is also not part of generic controller 1100 , but can be implemented, for example, as a separate generic controller 1100 .
- Couple and its derivatives refer to any direct or indirect communication between two or more elements, whether or not those elements are in physical contact with one another.
- the term “or” is inclusive, meaning and/or.
- the phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephone Function (AREA)
Abstract
Description
- The present application is related to U.S. Provisional Patent No. 60/881,028, filed Jan. 18, 2007, entitled “ADAPTIVE NOISE SUPPRESSION FOR DIGITAL SPEECH SIGNALS”. U.S. Provisional Patent No. 60/881,028 is assigned to the assignee of the present application and is hereby incorporated by reference into the present disclosure as if fully set forth herein. The present application hereby claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent No. 60/881,028.
- The disclosure relates generally to audio signal processing, and in particular to suppressing additive noise in a speech signal in a communication system.
- In many communication applications, an additive background noise signal is introduced into the speech signal. The corrupted speech signal, or noisy speech signal, often poses difficulties for the receiving party, such as degraded quality or reduced intelligibility. For instance, when having a conversation over the mobile phone in a driving car or on a busy street, the background noise is often high enough to make the conversation far less efficient than in a quiet room. It is hence often desired to remove the corrupting noise either before the noisy signal is transmitted at the sender or before the received noisy signal is played out at the receiver.
- Embodiments of the present disclosure relate to a system and method that rates the voice activity with a continuous score, and adaptively estimates the noise power in psychoacoustic bands and accordingly adjusts the noisy signal spectrum based on probabilistic heuristics to suppress the noise in a speech signal.
- In one embodiment, an apparatus for adaptively suppressing noise in an input signal frequency spectrum derived from overlapping input frames is provided. The system includes a psychoacoustic power computation module configured to compute a noisy signal power in psychoacoustic bands, a voice activity scoring module configured to compute a probabilistic score for a presence of a speech, and a noise estimation module configured to estimate a noise power in the psychoacoustic bands based on information of past frames, the probabilistic score, and the computed noisy signal power. The system also includes a gain computation module configured to compute a gain for each frequency, based on a probabilistic heuristic, the probabilistic score and the information on the past frames, and a gain post-processing module configured to perform a gain time smoothing, a gain frequency smoothing, and a gain regulation for the computed gain.
- In another embodiment, a method for adaptively suppressing a noise in an input signal frequency spectrum derived from overlapping input frames is provided. The method includes computing a noisy signal power in psychoacoustic bands, computing a probabilistic score for a presence of a speech, and estimating a noise power in the psychoacoustic bands based on information of past frames, the probabilistic score, and the computed noisy signal power. The method also includes computing a gain for each frequency, based on a probabilistic heuristic, the probabilistic score and the information on the past frames, post-processing the computed gain by performing a gain time smoothing, a gain frequency smoothing, and a gain regulation on the computed gain, and adjusting the input signal frequency spectrum by attenuating a noise in the input signal frequency spectrum based on the post-processed gain.
- In yet another embodiment, a computer program embodied on a computer readable medium and operable to be executed by a processor is provided. The computer program includes computer readable program code for converting overlapping input frames into an input signal frequency spectrum, computing a noisy signal power in psychoacoustic bands and computing a probabilistic score for a presence of a speech. The computer program also includes computer readable program code for estimating a noise power in the psychoacoustic bands based on information of past frames, the probabilistic score, and the computed noisy signal power, and computing a gain for each frequency, based on a probabilistic heuristic, the probabilistic score and the information on the past frames. The computer program further includes computer readable program code for post-processing the computed gain by performing a gain time smoothing, a gain frequency smoothing, and a gain regulation on the computed gain and adjusting the input signal frequency spectrum by attenuating a noise in the input signal frequency spectrum based on the post-processed gain.
- Other technical features may be readily apparent to one skilled in the art from the following figures, descriptions and claims.
- For a more complete understanding of this disclosure and its features, reference is now made to the following description, taken in conjunction with the accompanying drawings, in which:
-
FIG. 1 shows two possible applications for one embodiment of the present disclosure in a telecommunication system; -
FIG. 2 shows a high-level block diagram of functional modules related to noise suppression according to one embodiment of the present disclosure; -
FIG. 3 shows a block diagram of a processing engine for noise suppression according to one embodiment of the present disclosure; -
FIG. 4 shows a block diagram for a gain post-processing module according to one embodiment of the present disclosure; -
FIG. 5 shows an exemplary curve of a voice activity score component as a function of a voice band count; -
FIG. 6 shows an exemplary frame score distribution and associated frame characteristics according to one embodiment of the present disclosure; -
FIG. 7 shows an exemplary curve for the noise time smoothing factor for different constants according to one embodiment of the present disclosure; -
FIG. 8 shows an exemplary curve of a scale factor as a function of an estimated noise according to one embodiment of the present disclosure; -
FIG. 9 illustrates exemplary curves of gain vs. a ratio of noise power to threshold according to one embodiment of the present disclosure; -
FIG. 10 illustrates an exemplary gain regulation curve according to one embodiment of the present disclosure; and -
FIG. 11 depicts a block diagram of a generic controller 1100 for a wireless terminal according to one embodiment of the present disclosure.
- The problem of removing or suppressing noise corrupting a speech signal in a communication system has been studied for a long time. Reported approaches can be broadly classified into several categories: spectral subtraction, spectral weighting and model based. Spectral subtraction works by estimating the power of additive noise and subtracting it from the noisy signal power to obtain an estimated spectrum of the clean speech, based on the assumption that the corrupting noise is uncorrelated with speech, which is generally true in practice. Special treatment is needed to avoid negative power after subtraction. In spectral subtraction, the phase is generally taken to be the same as that of the noisy signal, as it is found to be less important for perception than power.
- Spectral weighting obtains a weight for each frequency that corresponds to an optimum filter minimizing the mean-square error of the processed signal against the desired signal (clean speech), which is a form of Wiener filter implemented in the frequency domain. It involves estimating the noise power and computing the spectrum of the noisy signal, after which a weighting gain is calculated. These two methods can be considered special cases of generalized Wiener filtering, and one issue is that they rely on accurate estimation of the noise power.
- The model based approach is based on an underlying speech model and has also been investigated in the past. In such an approach, the parameters of the model are first estimated and then the speech is generated using the estimated parameters. One issue associated with this approach is its high level of complexity, since accurate estimation of the model parameters for a noisy signal is itself difficult. Practically, for better accuracy, a higher model order is necessary, which in turn increases the complexity significantly, in some cases exponentially.
- There is therefore a need for an improved system and method to adequately suppress the corrupting noise in a noisy speech signal to improve its quality and intelligibility with low computational cost. In particular, there is a need for a system and method to be applied in situations where there is only one single recording device, in contrast to when there is a separate recording device for the background noise. The implication of one recording device is that the input signal is mono.
-
FIG. 1 shows a communication system 100. The communication system 100 includes a sender 110 and a receiver 130. The sender 110 can include one or more software modules and one or more hardware modules. Examples of the sender 110 include a wireless terminal or a wireline phone terminal. The first block diagram shows the sender 110 of the communication system 100, where noise suppression is carried out before the speech is encoded. The sender 110 includes a microphone input unit 111, an analog-to-digital converter (ADC) 113, a noise suppression unit 200, a speech encoding unit 117, and a modulation and transmission unit 119.
- The microphone input unit 111 can receive speech from a speaker and generate analog signals. The ADC 113 converts the analog speech signals to corresponding digital signals. The noise suppression module 200 is configured to suppress noise in the speech signals before the speech signals are transmitted to the receiver 130. More details of the noise suppression module 200 are shown in FIG. 2 and FIG. 3 and described therein. The input to the noise suppression module 200 is in Pulse Code Modulation (PCM) format obtained by the ADC 113. A typical sampling frequency, denoted as Fs, is 8 kHz, though 16 kHz or other frequencies are sometimes used. After the noise suppression operation at the noise suppression module 200, the speech signals in digital format are encoded at the speech encoding module 117. Then the encoded speech data are modulated and transmitted to the receiver by the modulation and transmission module 119.
- The receiver 130 can include one or more software modules and one or more hardware modules. Examples of the receiver 130 include a wireless terminal or a wireline phone terminal. The receiver 130 can include a reception and demodulation unit 139, a speech decoding unit 137, a noise suppression module 200, a digital-to-analog converter (DAC) 133, and a speaker output unit 131. The noise suppression module 200 on the receiver 130 is identical to the one on the sender 110. In one embodiment, noise suppression is carried out after the signal is decoded by the decoding unit 137, also operating in PCM format. The operations at the receiver 130 are the mirror image of those at the sender 110. The reception and demodulation unit 139 receives and demodulates the speech data, and then the speech decoding module 137 decodes the speech data into the PCM format. The noise suppression module 200 is configured to suppress the noise in the speech data. The DAC 133 converts the speech data back to the analog format to be played back by the speaker output unit 131.
- With the assumption that there is only one microphone for recording the input signal, these two use cases are the same, regardless of any effects caused by the speech codec used. Hence, one embodiment of the present disclosure should work equally well in either scenario. Practically, it is preferred to carry out noise suppression at the sender 110, because the receiver often has no information as to whether the received signal had its noise suppressed at the sender 110, and simply reapplying noise suppression may compromise the speech quality. Thus, following the well-established principles of Wiener filtering, a method according to one embodiment of the present disclosure works in the frequency domain to suppress the noise. To make the processing more closely related to human perception and to keep cost low in terms of memory and computation, processing is done in the psychoacoustically motivated bands, for example the Bark bands as shown in Table 1 below. -
TABLE 1 Bark Bands

BAND GROUP      BARK BAND NUMBER   FREQUENCY RANGE (HZ)
Low Range       1                  0~100
                2                  100~200
                3                  200~300
Middle Range    4                  300~400
                5                  400~510
                6                  510~630
                7                  630~770
                8                  770~920
                9                  920~1080
                10                 1080~1270
                11                 1270~1480
                12                 1480~1720
                13                 1720~2000
                14                 2000~2320
High Range      15                 2320~2700
                16                 2700~3150
                17                 3150~3700
                18                 3700~4400 (3700~4000 for Fs = 8 KHz)
                19                 4400~5300
                20                 5300~6400
                21                 6400~7700
                22                 7700~8000

- As known, intelligibility of speech is derived largely from the pattern of voice formants distribution, and the relative positioning of the first two formants is normally sufficient to distinguish a human sound from others. Hence the frequency range covering the first two or three formants is identified as more important, also referred to as speech band. Accordingly the psychoacoustic bands are divided into three groups: Low Range (LR) for bands below the speech band, Middle Range (MR) for those in the speech band, and High Range (HR) for those above the speech band. An example of such a classification is shown in Table 1. Processing is discriminatively carried out for bands in different groups according to one embodiment of the present disclosure.
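To make the band layout concrete, the following Python sketch maps FFT bins to the Bark bands of Table 1 for Fs = 8 kHz and accumulates the squared magnitudes of one frame per band. The lower-inclusive band boundaries, the default frame size of 256, and all function names are assumptions made for illustration.

```python
import bisect

# Band edges in Hz taken from Table 1, truncated at 4000 Hz for Fs = 8 kHz.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4000]


def band_of_bin(i, frame_size=256, fs=8000):
    """Return the 1-based Bark band number of FFT bin i.

    Assumption: a bin belongs to the band whose lower edge it equals or
    exceeds; the Nyquist bin is folded into the last band.
    """
    freq = i * fs / frame_size
    j = bisect.bisect_right(BARK_EDGES_HZ, freq)
    return min(j, len(BARK_EDGES_HZ) - 1)


def band_group(j):
    """Band group of band j following Table 1: LR, MR or HR."""
    if j <= 3:
        return "LR"
    if j <= 14:
        return "MR"
    return "HR"


def band_powers(spectrum, frame_size=256, fs=8000):
    """Sum the squared magnitudes of bins 0..frame_size/2 into Bark bands."""
    powers = [0.0] * (len(BARK_EDGES_HZ) - 1)
    for i, x in enumerate(spectrum):
        powers[band_of_bin(i, frame_size, fs) - 1] += abs(x) ** 2
    return powers
```

For a 256-point frame at 8 kHz the bin spacing is 31.25 Hz, so the 129 non-redundant bins collapse into 18 band powers, which is where the memory and computation savings come from.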
-
FIG. 2 shows a high-level functional block diagram of a noise suppression module 200, according to one embodiment of the present disclosure. The noise suppression module 200 can include one or more software modules and one or more hardware modules. In one embodiment, the noise suppression module 200 is implemented in the generic controller 1100 as illustrated in FIG. 11. The noise suppression module 200 includes an input windowing module 211, a frequency analysis module 213, a processing engine 300, a frequency synthesis module 217, and an output overlapping and adding module 219. The processing engine 300 includes a voice activity scoring module 313, a perceptual analysis and processing module 331, and a noise estimation module 315. In one embodiment, the method works in block-processing mode; that is, the input stream is segmented into overlapping frames, each frame is processed separately, and the output is obtained by overlap-adding the processed frames.
- The input windowing module 211, in one embodiment, segments the input signal into overlapping frames. The overlapping ratio is typically chosen to be half; that is, the first half of the current frame is in fact the second half of the previous frame. Each frame is multiplied by a window to ensure a smooth transition from frame to frame and to suppress high frequencies introduced by segmentation.
- The frequency analysis module 213 then transforms the windowed frame to the frequency domain using a frequency analysis method. The Fast Fourier Transform (FFT) is a common choice of frequency analysis method. For a sampling frequency of 8 kHz, a frame size of 256 samples is often a good trade-off between frequency resolution and time resolution.
- The processing engine 300 is configured to analyze and identify the noise in the input signal spectrum and then suppress the noise. The processing engine 300 includes a voice activity scoring module 313, a perceptual analysis and processing module 331, and a noise estimation module 315. These component modules of the processing engine 300 for noise suppression are depicted in more detail in FIG. 3 and FIG. 4 and described therein.
- The frequency synthesis module 217 and the output overlap-and-add module 219 are configured to transform the processed signal spectrum back to the time domain after the noise suppression operations on the input signal spectrum. The frequency synthesis module 217 may use the inverse of the frequency analysis transform to convert the processed signal spectrum from the frequency domain back to the time domain. If the FFT was used for frequency analysis, then the inverse FFT is applied. The processed time-domain signal of the current frame is aligned with the corresponding part of the previously processed frame, and they are summed to produce the output. The overlapping region of the current frame with the next frame is saved for synthesis of the next output frame. -
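The block-processing flow (windowing, per-frame processing, overlap-and-add) can be sketched in Python as below. The periodic Hann window is an assumption, since the disclosure does not name a window, and the `process` callback is a hypothetical stand-in for the FFT, gain adjustment and inverse FFT steps.

```python
import math


def hann_periodic(n):
    """Periodic Hann window; with half overlap, shifted copies sum to one."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_add(signal, frame_size, process=lambda frame: frame):
    """Segment into half-overlapping windowed frames, process, and sum.

    `process` defaults to the identity so the framing itself can be checked
    in isolation from any spectral processing.
    """
    hop = frame_size // 2
    window = hann_periodic(frame_size)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_size)]
        processed = process(frame)
        for i in range(frame_size):
            out[start + i] += processed[i]
    return out
```

With the identity callback, every sample covered by two frames is reconstructed exactly, which is a quick way to check that the window and hop size are compatible before adding the noise suppression steps.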
FIG. 3 shows a block diagram of a processing engine 300 for noise suppression according to one embodiment of the present disclosure. FIG. 3 shows the processing engine 300 of FIG. 2 in more detail. The processing engine 300 can include one or more software modules and one or more hardware modules. In one embodiment, the processing engine 300 is implemented in the generic controller 1100 for a wireless terminal, as illustrated in FIG. 11. The processing engine 300 includes a Bark band power computation module 311, a voice activity scoring module 313, a noise power estimation module 315, and a gain computation module 317. The processing engine 300 also includes a gain post-processing module 400, a signal spectrum adjustment module 321, and a mode switching decision module 323. The processing engine 300 also includes a signal power array updating module 314 and an information store 316. The information of a certain number of past frames may be stored in the information store 316 to facilitate modules such as the voice activity scoring (VAS) module 313, the noise power estimation module 315, the gain computation module 317 and the gain post-processing module 319.
- The voice activity scoring (VAS) module 313 is configured to compute a continuous score to rate the possibility of the presence of speech. In a Wiener filtering approach, noise power is estimated for adjusting the noisy signal spectrum. To facilitate efficient estimation of noise power in a quasi-/non-stationary speech signal, it is desired to take advantage of voice activity information. The VAS module 313 is particularly useful in making the estimation of noise power fuzzy so as to eliminate the risk of wrong classification by a traditional voice activity detector (VAD) that outputs binary decisions.
- The VAS module 313 computes a score in a continuous range such that a low score indicates that the input frame is highly likely to be a noise-only frame and a high score indicates that the input frame is highly likely to be dominated by speech. This scoring scheme is found advantageous over the binary decision scheme of a conventional Voice Activity Detector (VAD) due to the quasi- and non-stationary nature of speech signals.
- The noise power estimation module 315 follows the principle of temporal tracking, making use of the observation that noise power normally changes slowly. According to one embodiment of the present disclosure, taking advantage of the score output by the VAS, the noise estimation module 315 can respond quickly to non-stationarity in the input, in addition to being able to cope with signals that are neither noise-only nor speech-dominated with a very high likelihood.
- Then the gain computation module 317 may compute a gain for each frequency according to a heuristic, based on the estimated noise power. The heuristic may be expressed as follows. As the ratio of the noisy signal frequency component power to the estimated noise frequency component power grows, the possibility of that frequency component of the noisy signal being noise decreases, and when the ratio is large enough the frequency component can eventually be taken as containing speech only.
- Then the gain post-processing module 400 post-processes the computed gain for each frequency, with the estimated noise power, and according to probabilistic heuristics. The gain post-processing module 400 makes sure the processed signal sounds natural. FIG. 4 shows details of the gain post-processing module 400.
- Then the signal spectrum adjustment module 321 adjusts the noisy signal spectrum by multiplying the final gains with the magnitudes of the noisy signal spectrum to attenuate noise. This in effect suppresses the noise to achieve improved quality and intelligibility of speech. Then the mode switching decision module 323 checks mode switching criteria for each frame to decide a mode for the next frame. To cope with changing environments, the noise suppression engine may operate in and automatically switch between two modes: NORMAL for adequate noise and NOISY for extremely high noise.
- The following sections describe these operations of the processing engine 300 for noise suppression in more detail. These operations are performed by the Bark band power computation module 311, the VAS module 313, the signal power array updating module 314, the noise power estimation module 315, the gain computation module 317, the gain post-processing module 400, the signal spectrum adjustment module 321 and the mode switching decision module 323. - The Bark band
power computation module 311 computes the signal band power in psychoacoustic bands. Equation 1 below represents the power in the psychoacoustic bands, where $X_{i,k}$ denotes the ith frequency sample of the kth frame after frequency analysis, j is the band index, k is the frame index, and $B_j$ is the set of frequency indices of the jth band according to Table 1 above. -
$X^b_{j,k} = \sum_{i \in B_j} |X_{i,k}|^2$ (Equation 1)
- The voice
activity scoring module 313 assigns a score, denoted as FRAME_SCOREk, to the current frame k to indicate the possibility of existence of speech. It is continuous and non-negative, with a larger value indicating higher possibility of containing speech. FRAME_SCOREk is computed based on a combination of two metrics: Score_1 taking into account the shape of the signal's power spectrum, and Score_2 the total power. Specifically, Score_1 is a function of the number of MR bands of the current frame having greater power than corresponding MR bands of the previously estimated noise scaled by a factor. A pseudo code is shown below to illustrate how the signal power and noise power are compared to obtain the input to the function for computing Score_1. -
X^b_{j,k} : Signal power of psychoacoustic band j of current frame k (see Equation 1)
D^b_{j,k−1} : Estimated noise power of psychoacoustic band j of previous frame k−1 (see Equation 4)
τ : A constant scaling factor, preferably in the range of 1.5 to 4.

cnt = 0;
for each band j in the MR
    if X^b_{j,k} > τ * D^b_{j,k−1},
        cnt = cnt + 1;
    end
end
-
FIG. 5 shows the curve of the function to which the computed value cnt is fed to obtain Score_1. In FIG. 5, threshold_cnt controls the turning point above which the curve, and hence Score_1, increases more quickly as cnt increases. - Score_2 is related to the ratio of the total power of the current frame to that of the previously estimated noise.
-
- Where θ is a constant and takes a value in the range of 0.25 to 0.5. The final score is a weighted sum of these two:
-
$\mathrm{FRAME\_SCORE}_k = w_1 \cdot \mathrm{Score\_1} + w_2 \cdot \mathrm{Score\_2}$ (Equation 2)
- where w1 and w2 are weights assigned to these two scores, respectively, and w1+w2=1. Typically, w1=0.5 and w2=0.5 are adequate. With the above derivations for FRAME_SCORE, its range can be divided into a few sections, each section corresponding to certain characteristics.
FIG. 6 shows a few sections and their corresponding characteristics. Both the function curve of Score_1 (as shown in FIG. 5) and the constant θ for Score_2 depend on the mode of operation, to better cope with the different characteristics of different environments. Generally, a higher score tends to be assigned when operating in NOISY mode than in NORMAL mode, as speech characteristics are more difficult to identify with high-level noise. - The noise
power estimation module 315 estimates the noise power in psychoacoustic bands that are more closely related to human perception than individual frequencies. The estimation works in one of two modes that are adapted to different signal characteristics: one mode for noise-like signal, and the other for speech-like signal. - A frame is classified as noise-like if FRAME_SCOREk<=NOISE_SPEECH_TH, and as speech-like otherwise. The threshold NOISE_SPEECH_TH can be tuned with test signals.
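The scoring steps described above can be sketched in Python as follows. Only the band count, the weighted sum of Equation 2, and the noise-like test are shown; the curve that maps the count to Score_1 (FIG. 5) and the expression for Score_2 are not reproduced in this text, so both scores are treated as given inputs. The function names and the default `tau=2.0` are assumptions.

```python
def count_voice_bands(signal_power, prev_noise_power, mr_bands, tau=2.0):
    """Count MR bands whose power exceeds tau times the previous noise estimate.

    `signal_power` and `prev_noise_power` are indexed by band; `mr_bands`
    lists the indices of the Middle Range bands. The source suggests a tau
    between 1.5 and 4.
    """
    return sum(1 for j in mr_bands
               if signal_power[j] > tau * prev_noise_power[j])


def frame_score(score_1, score_2, w1=0.5, w2=0.5):
    """Weighted sum of the two score components, with w1 + w2 = 1."""
    return w1 * score_1 + w2 * score_2


def is_noise_like(score, noise_speech_th):
    """A frame is noise-like when its score does not exceed the threshold."""
    return score <= noise_speech_th
```

Because the score is continuous, later stages can scale their behaviour with it rather than relying only on this binary noise-like decision.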
- For a speech-like frame, the estimation is based on the principle of temporal tracking; that is, noise power in each band changes slowly in time and is closely related to the recent frames having small power. Specifically, for each band, the signal power of N recent frames is sorted in ascending order, and a portion of the array from the beginning is averaged as the estimated noise power in this band of the current frame. The total number of recent frames, N, for which the signal power is stored, may correspond to a time interval of about 200 to 400 milliseconds. Mathematically, estimated noise power for band j is
-
$W^b_{j,k} = \frac{1}{M_j} \sum_{m \in F_j} X^b_{j,m}$ (Equation 3)
- where Fj is the set of recent frame indices selected for band j, and Mj is the total number of elements in Fj. In general, Mj is different for different bands and Mj<N. For simplicity, Mj can be dependent on band group. The final estimated noise power for band j of the current frame k, denoted as Dj,k b, is smoothed with that of the previous frame k−1, denoted as Dj,k-1 b, by
-
$D^b_{j,k} = \alpha D^b_{j,k-1} + (1-\alpha) W^b_{j,k}$ (Equation 4)
- where α is an adaptive smoothing factor to eliminate abrupt change, and is derived from a predefined constant NOISE_SMOOTH_FACTOR, which is greater than 0.5, and the normalized deviation of total power of current frame from the mean total power of a few recent frames. Specifically,
-
- and G is the set of frame indices for P most recent frames.
- For a noise-like frame, it is desirable to take advantage of the high proportion of noise in the noisy signal for estimating noise, so as to quickly respond to change in the signal, for example, the disappearance of voice. Hence, the signal power is taken as the estimated noise power:
-
$W^b_{j,k} = X^b_{j,k}$ (Equation 6)
- In addition, to avoid dramatic difference in estimated noise power due to the binary noise-like/speech-like decision when FRAME_SCOREk is close to NOISE_SPEECH_TH, the smoothing factor α gradually changes from 1-NOISE_SMOOTH_FACTOR to NOISE_SMOOTH_FACTOR as FRAME_SCOREk increases from a lower score threshold NOISE_TH_L to a higher score threshold NOISE_TH_H, as depicted in FIG. 7. The final noise power is updated following Equation 4. It can be seen that when FRAME_SCOREk is close to NOISE_SPEECH_TH, either slightly above or below it, the weight given to is close to NOISE_SMOOTH_FACTOR, resulting in a similar estimated noise power regardless of the binary noise-like/speech-like decision.
- Due to the principle of temporal tracking for estimating noise power, when storing the noisy signal power, the previous noise power is substituted for the actual noisy signal power, scaled with a factor for correction, if FRAME_SCOREk>SPEECH_TH, because a speech-dominated frame does not give good estimation of noise power.
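A compact Python sketch of the per-band noise update is shown below. It follows the description above: for a speech-like frame the lowest stored band powers are averaged, for a noise-like frame the current band power is used, and the result is smoothed with the previous estimate as in Equation 4. The smoothing factor `alpha` is passed in because its exact expression is not reproduced in this text; the function and parameter names are hypothetical.

```python
def tracked_noise(recent_powers, m):
    """Average the m smallest of the stored recent powers of one band."""
    lowest = sorted(recent_powers)[:m]
    return sum(lowest) / len(lowest)


def update_noise(prev_noise, band_power, recent_powers, m, alpha, noise_like):
    """Return the new noise estimate of one band for the current frame.

    prev_noise    -- estimate of the previous frame
    band_power    -- signal power of the band in the current frame
    recent_powers -- stored band powers of the N most recent frames
    m             -- how many of the smallest stored values to average
    alpha         -- smoothing factor (assumed to be computed elsewhere)
    noise_like    -- True when the frame score is at or below the threshold
    """
    if noise_like:
        w = band_power  # noise-like frame: take the signal power itself
    else:
        w = tracked_noise(recent_powers, m)  # speech-like: temporal tracking
    return alpha * prev_noise + (1.0 - alpha) * w
```

Sorting a window of a few dozen values per band is cheap here because the tracking runs on about 20 bands rather than on every frequency bin.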
- The gain computation module 317 computes a gain for each frequency component i according to probabilistically driven heuristics.
- To compute the gains of psychoacoustic band j, a threshold THRESj is first computed from the estimated noise power Dj,k b:
-
$$\mathrm{THRES}_j = \mathrm{SCALE\_FACTOR}_k \cdot \beta_j \cdot D^{b}_{j,k} / C_j \qquad \text{(Equation 7)}$$
- where Cj is the total number of frequency components in band j, βj is a frequency-dependent constant, and SCALE_FACTORk is a variable that depends on the current frame's FRAME_SCOREk and the previous frame's FRAME_SCOREk-1. If either the current frame or the previous frame is speech-dominated, i.e., FRAME_SCOREk>SPEECH_TH or FRAME_SCOREk-1>SPEECH_TH, then SCALE_FACTORk=1; otherwise SCALE_FACTORk is proportional to the ratio of the total power of the current frame to that of the previous frame's estimated noise, i.e.,
-
- An example curve for computing SCALE_FACTORk from r is illustrated in FIG. 8.
- A frequency component i with power equal to or larger than the threshold, i.e., |Xi,k|²≥THRESj, is considered to have very strong speech content, so that noise is masked by speech according to psychoacoustic principles, and a unity gain is assigned, i.e., Gi,k=1.
- For a frequency component i with power less than the threshold, i.e., |Xi,k|²<THRESj, the gain Gi,k is computed according to a probabilistically driven curve that can be either linear or non-linear. FIG. 9 shows some example curves that can be used. For non-linear curves, a turnover point is identified, below which the gain is attenuated and above which it is amplified. Different degrees of attenuation/amplification correspond to different probabilistic heuristics in the treatment of noise. Further improvement can be achieved by assigning the same gain to all frequencies in a psychoacoustic band if the band is in the LR or when the current frame is found to be noise-only. This also simplifies computation. In summary, Gi,k is computed as
-
- where i∈Bj, Bj is the set of frequency indices of the jth band according to Table 1, Cj is the total number of frequency components in band j, and f( ) is a function designed according to the probabilistic heuristics mentioned above.
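- The threshold of Equation 7 and the unity-gain rule can be illustrated with a short sketch. The gain formula and the curves of FIG. 9 are not reproduced in this text, so the linear curve used below the threshold is purely a hypothetical stand-in for f( ), chosen so that the gain is continuous at the threshold; the function and parameter names are likewise illustrative.

```python
def band_threshold(noise_power, num_components, beta, scale_factor=1.0):
    """Equation 7: THRES_j = SCALE_FACTOR_k * beta_j * D[j,k] / C_j."""
    return scale_factor * beta * noise_power / num_components


def component_gain(power, threshold):
    """Return the gain for one frequency component.

    Components at or above the threshold get unity gain. Below it, a
    linear curve power / threshold is ASSUMED in place of f( ).
    """
    if power >= threshold:
        return 1.0
    return power / threshold


def band_gains(powers, noise_power, beta, scale_factor=1.0):
    """Compute gains for all components |X[i,k]|^2 of one band."""
    threshold = band_threshold(noise_power, len(powers), beta, scale_factor)
    return [component_gain(p, threshold) for p in powers]
```

- With four components in a band, an estimated band noise power of 8.0 and βj=2, the threshold is 4.0, so a component of power 1.0 would receive a gain of 0.25 under the assumed linear curve.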
- FIG. 4 shows the component modules of the gain post-processing module 400. The gain post-processing module 400 further processes the computed gains to ensure the quality of the processed signal and may include a gain time smoothing module 411, a gain frequency smoothing module 413, and a gain regulation module 415.
- The gain time smoothing module 411 can smooth the gains in the time domain. A filter that changes too fast in the time domain is known to make the processed signal sound unnatural and in some cases may introduce musical noise. Hence, the gains are carefully smoothed along the time axis. The gain time smoothing module 411 takes the temporal characteristics of the signal into account by detecting whether the current frame is a release. If so, the time smoothing factor is adjusted according to Gi,k-1, based on the heuristic that the higher Gi,k-1 is, the more likely frequency i corresponds to a decaying voice; the factor is therefore given a higher value to better preserve voice. If the frame is not a release, the factor is assigned its lowest value.
- The time smoothing formula is given by Equation 9 below.
-
$$G'_{i,k} = \gamma_i\, G_{i,k-1} + (1-\gamma_i)\, G_{i,k} \qquad \text{(Equation 9)}$$
- where γi is a frequency-dependent time smoothing factor, preferably in the range of 0.3 to 0.7.
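- A minimal sketch of the time smoothing in Equation 9 is given below. The release detector is not specified here, so it is passed in as a flag, and the linear mapping from the previous gain to the smoothing factor is an assumption made for illustration; only the 0.3 to 0.7 range comes from the text.

```python
GAMMA_MIN, GAMMA_MAX = 0.3, 0.7  # preferred range stated in the text


def time_smoothing_factor(prev_gain, is_release):
    """Pick gamma_i for one frequency.

    Outside a release the lowest value is used. During a release, a
    higher previous gain yields a higher factor; the linear mapping is
    an assumption.
    """
    if not is_release:
        return GAMMA_MIN
    g = min(max(prev_gain, 0.0), 1.0)
    return GAMMA_MIN + (GAMMA_MAX - GAMMA_MIN) * g


def smooth_gains_in_time(prev_gains, gains, is_release):
    """Equation 9: G'[i,k] = gamma_i * G[i,k-1] + (1 - gamma_i) * G[i,k]."""
    smoothed = []
    for g_prev, g in zip(prev_gains, gains):
        gamma = time_smoothing_factor(g_prev, is_release)
        smoothed.append(gamma * g_prev + (1.0 - gamma) * g)
    return smoothed
```

- For a frequency whose previous gain was 1.0 and whose new gain is 0.0, the smoothed gain is 0.3 outside a release and 0.7 during a release under these assumptions, so a decaying voice is attenuated more slowly.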
- The gain frequency smoothing module 413 smooths the gains over frequency and can mitigate artifacts introduced by the computed gains. The computed gains are all positive real numbers, and they correspond to a zero-phase filter that is symmetric in the time domain. If the filter impulse response has significant energy near its beginning (and, by symmetry, its tail), convolving it with the windowed input signal may introduce artifacts into the output. This can be mitigated by multiplying the filter impulse response by a smoothing window. In the frequency domain, this is accomplished by filtering the gains {G′i,k} with a linear-phase low-pass filter. A finite impulse response (FIR) filter of order as low as four is normally adequate.
- The gain regulation module 415 can maintain the gains within a range between a minimum value and a maximum value to avoid loss of information. Since the bands in MR are considered the most important for perception, they should not be suppressed more than bands in LR and HR. Let GAIN_MAX be the maximum gain in MR, i.e., GAIN_MAX=MAX(G′i,k) where the frequency i is in MR. Then gains in LR and HR should not exceed GAIN_MAX.
- To avoid completely losing information, gains are maintained above a threshold Gmin, i.e., G′i,k≥Gmin. The threshold Gmin determines the maximum suppression of noise, and it also serves as an injection of comfort noise. Furthermore, no gain should exceed unity, i.e., G′i,k≤1. The gain regulation curve 1000 is depicted in FIG. 10 according to one embodiment of the present disclosure.
- The noisy signal spectrum adjustment module 321 can adjust the noisy signal spectrum by multiplying each frequency component Xi,k by its post-processed gain G′i,k to produce a filtered spectrum {Yi,k}, as shown by Equation 10 below.
$$Y_{i,k} = G'_{i,k}\, X_{i,k} \qquad \text{(Equation 10)}$$
- The mode switching decision module 323 is configured to determine a mode of operation based on an empirical observation and then switch into that mode. In an environment with adequate noise, a significant portion of non-noise frames (FRAME_SCOREk>NOISE_TH_H, see FIG. 6) are in fact speech-dominated frames (FRAME_SCOREk>SPEECH_TH). Hence, the mode is switched from NORMAL to NOISY when this portion falls below a threshold. Conversely, when this portion is too large, the mode is switched from NOISY to NORMAL. The exact proportion can be tuned with actual test signals that comprise streams of normal noise and streams of high noise.
- Accordingly, one embodiment of the present disclosure provides a system and method for adaptively suppressing noise in a speech signal with little memory and computation. The method and system can adaptively suppress additive noise in a speech signal for improved quality and intelligibility. The input signal is segmented into overlapping frames, and each frame is processed in the frequency domain. The voice activity of an input frame is rated with a score in a continuous range to adapt other processing modules. Noise power is estimated in psychoacoustically motivated bands, making the processing closely related to human perception. With the voice activity score and the estimated noise power, a gain for each frequency is computed according to probabilistic heuristics, smoothed along the time axis and the frequency axis, and regulated before the noisy signal spectrum is adjusted, to ensure the naturalness of the processed speech. To cope with changing environments, the method can operate in and automatically switch between two modes: one for adequate noise and the other for extremely high noise. The method is very efficient in terms of memory and computation because some processing is done on a psychoacoustic scale that has only about 20 bands.
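- The remaining gain post-processing steps and the spectrum adjustment of Equation 10 can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the gain floor G_MIN, the five-tap symmetric filter, the handling of edge bins, and the way the MR indices are passed in are all hypothetical choices.

```python
G_MIN = 0.1                           # hypothetical gain floor
FIR_TAPS = [0.1, 0.2, 0.4, 0.2, 0.1]  # hypothetical symmetric order-4 low-pass


def smooth_gains_in_frequency(gains):
    """Filter the gains across frequency with a symmetric FIR filter.

    Symmetric taps give linear phase. Edge bins reuse the end values,
    which is an assumption.
    """
    half = len(FIR_TAPS) // 2
    n = len(gains)
    out = []
    for i in range(n):
        acc = 0.0
        for t, tap in enumerate(FIR_TAPS):
            idx = min(max(i + t - half, 0), n - 1)
            acc += tap * gains[idx]
        out.append(acc)
    return out


def regulate_gains(gains, mid_range):
    """Cap LR/HR gains at the MR maximum, then clamp all to [G_MIN, 1]."""
    gain_max = max(gains[i] for i in mid_range)
    regulated = []
    for i, g in enumerate(gains):
        if i not in mid_range:
            g = min(g, gain_max)
        regulated.append(min(max(g, G_MIN), 1.0))
    return regulated


def apply_gains(spectrum, gains):
    """Equation 10: Y[i,k] = G'[i,k] * X[i,k] for complex bins X."""
    return [g * x for g, x in zip(gains, spectrum)]
```

- Here `mid_range` is any collection of bin indices belonging to MR, for example `range(1, 3)`. A frame would pass its time-smoothed gains through `smooth_gains_in_frequency`, then `regulate_gains`, and finally `apply_gains`.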
- FIG. 11 depicts a block diagram of a generic controller 1100 for a wireless terminal, in which an embodiment of the processing engine 300 can be implemented. The generic controller 1100 depicted includes a processor 1102 connected to a level two cache/bridge 1104, which is connected in turn to a local system bus 1106. The local system bus 1106 may be, for example, a peripheral component interconnect (PCI) architecture bus. Also connected to the local system bus in the depicted example are a main memory 1108 and a graphics adapter 1110. The graphics adapter 1110 may be connected to a display 1111.
- Other peripherals, such as a local area network (LAN)/wide area network (WAN)/wireless (e.g., WiFi) adapter 1112, may also be connected to the local system bus 1106. An expansion bus interface 1114 connects the local system bus 1106 to an input/output (I/O) bus 1116. The I/O bus 1116 is connected to a keyboard/mouse adapter 1118, a disk controller 1120, and an I/O adapter 1122. The disk controller 1120 can be connected to a storage 1126, which can be any suitable machine usable or machine readable storage medium, including but not limited to nonvolatile, hard-coded type mediums such as read only memories (ROMs) or electrically erasable programmable read only memories (EEPROMs), magnetic tape storage, and user-recordable type mediums such as floppy disks, hard disk drives and compact disk read only memories (CD-ROMs) or digital versatile disks (DVDs), and other known optical, electrical, or magnetic storage devices.
- Also connected to the I/O bus 1116 in the example shown is an audio adapter 1124, to which speakers (not shown) may be connected for playing sounds. The keyboard/mouse adapter 1118 provides a connection for a pointing device (not shown), such as a mouse, a trackball, or a trackpointer.
- Those of ordinary skill in the art will appreciate that the hardware depicted in FIG. 11 may vary for particular embodiments. For example, other peripheral devices, such as an optical disk drive and the like, may also be used in addition to or in place of the hardware depicted. The depicted example is provided for the purpose of explanation only and is not meant to imply architectural limitations with respect to the present disclosure.
- The generic controller 1100 in accordance with an embodiment of the present disclosure includes an operating system employing a graphical user interface. The operating system permits multiple display windows to be presented in the graphical user interface simultaneously, with each display window providing an interface to a different application or to a different instance of the same application. A cursor in the graphical user interface may be manipulated by a user through the pointing device. The position of the cursor may be changed and/or an event, such as clicking a mouse button, generated to actuate a desired response.
- One of various commercial operating systems, such as a version of Microsoft Windows™, a product of Microsoft Corporation located in Redmond, Wash., may be employed if suitably modified. The operating system is modified or created in accordance with the present disclosure as described.
- The LAN/WAN/Wireless adapter 1112 can be connected to a network 1130 (not a part of the generic controller 1100), which can be any public or private data processing system network or combination of networks, as known to those of skill in the art, including the Internet. The generic controller 1100 can communicate over the network 1130 with a server system 1140, which is also not part of the generic controller 1100 but can be implemented, for example, as a separate generic controller 1100.
- It may be advantageous to set forth definitions of certain words and phrases used in this patent document. The term “couple” and its derivatives refer to any direct or indirect communication between two or more elements, whether or not those elements are in physical contact with one another. The terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation. The term “or” is inclusive, meaning and/or. The phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like.
- While this disclosure has described certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not define or constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure, as defined by the following claims.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/009,601 US8275611B2 (en) | 2007-01-18 | 2008-01-18 | Adaptive noise suppression for digital speech signals |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US88102807P | 2007-01-18 | 2007-01-18 | |
US12/009,601 US8275611B2 (en) | 2007-01-18 | 2008-01-18 | Adaptive noise suppression for digital speech signals |
Publications (2)
Publication Number | Publication Date |
---|---|
US20080189104A1 (en) | 2008-08-07 |
US8275611B2 (en) | 2012-09-25 |
Family
ID=39676917
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/009,601 Active 2031-07-10 US8275611B2 (en) | 2007-01-18 | 2008-01-18 | Adaptive noise suppression for digital speech signals |
Country Status (1)
Country | Link |
---|---|
US (1) | US8275611B2 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5229234B2 (en) * | 2007-12-18 | 2013-07-03 | 富士通株式会社 | Non-speech segment detection method and non-speech segment detection apparatus |
MY178597A (en) * | 2008-07-11 | 2020-10-16 | Fraunhofer Ges Forschung | Audio encoder, audio decoder, methods for encoding and decoding an audio signal, and a computer program |
GB2547459B (en) * | 2016-02-19 | 2019-01-09 | Imagination Tech Ltd | Dynamic gain controller |
CN108962275B (en) * | 2018-08-01 | 2021-06-15 | 电信科学技术研究院有限公司 | Music noise suppression method and device |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6487535B1 (en) * | 1995-12-01 | 2002-11-26 | Digital Theater Systems, Inc. | Multi-channel audio encoder |
US5757937A (en) * | 1996-01-31 | 1998-05-26 | Nippon Telegraph And Telephone Corporation | Acoustic noise suppressor |
US6415253B1 (en) * | 1998-02-20 | 2002-07-02 | Meta-C Corporation | Method and apparatus for enhancing noise-corrupted speech |
US6088668A (en) * | 1998-06-22 | 2000-07-11 | D.S.P.C. Technologies Ltd. | Noise suppressor having weighted gain smoothing |
US6317709B1 (en) * | 1998-06-22 | 2001-11-13 | D.S.P.C. Technologies Ltd. | Noise suppressor having weighted gain smoothing |
US20020012429A1 (en) * | 2000-06-24 | 2002-01-31 | Alcatel | Interference-signal-dependent adaptive echo suppression |
US20030055627A1 (en) * | 2001-05-11 | 2003-03-20 | Balan Radu Victor | Multi-channel speech enhancement system and method based on psychoacoustic masking effects |
US20040101038A1 (en) * | 2002-11-26 | 2004-05-27 | Walter Etter | Systems and methods for far-end noise reduction and near-end noise compensation in a mixed time-frequency domain compander to improve signal quality in communications systems |
Cited By (59)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7746958B2 (en) * | 2004-06-15 | 2010-06-29 | Infineon Technologies Ag | Receiver for a wire-free communication system |
US20080219472A1 (en) * | 2007-03-07 | 2008-09-11 | Harprit Singh Chhatwal | Noise suppressor |
US7912567B2 (en) * | 2007-03-07 | 2011-03-22 | Audiocodes Ltd. | Noise suppressor |
US8364479B2 (en) * | 2007-08-31 | 2013-01-29 | Nuance Communications, Inc. | System for speech signal enhancement in a noisy environment through corrective adjustment of spectral noise power density estimations |
US20090063143A1 (en) * | 2007-08-31 | 2009-03-05 | Gerhard Uwe Schmidt | System for speech signal enhancement in a noisy environment through corrective adjustment of spectral noise power density estimations |
US8190440B2 (en) * | 2008-02-29 | 2012-05-29 | Broadcom Corporation | Sub-band codec with native voice activity detection |
US20090222264A1 (en) * | 2008-02-29 | 2009-09-03 | Broadcom Corporation | Sub-band codec with native voice activity detection |
US20110029310A1 (en) * | 2008-03-31 | 2011-02-03 | Transono Inc. | Procedure for processing noisy speech signals, and apparatus and computer program therefor |
US20110029305A1 (en) * | 2008-03-31 | 2011-02-03 | Transono Inc | Method for processing noisy speech signal, apparatus for same and computer-readable recording medium |
US8744846B2 (en) * | 2008-03-31 | 2014-06-03 | Transono Inc. | Procedure for processing noisy speech signals, and apparatus and computer program therefor |
US8744845B2 (en) * | 2008-03-31 | 2014-06-03 | Transono Inc. | Method for processing noisy speech signal, apparatus for same and computer-readable recording medium |
US9159335B2 (en) | 2008-10-10 | 2015-10-13 | Samsung Electronics Co., Ltd. | Apparatus and method for noise estimation, and noise reduction apparatus employing the same |
CN102779524A (en) * | 2008-10-10 | 2012-11-14 | 三星电子株式会社 | Apparatus and method for noise estimation, and noise reduction apparatus employing the same |
US20100092000A1 (en) * | 2008-10-10 | 2010-04-15 | Kim Kyu-Hong | Apparatus and method for noise estimation, and noise reduction apparatus employing the same |
KR101088558B1 (en) | 2008-10-24 | 2011-12-05 | 야마하 가부시키가이샤 | Noise suppression device and noise suppression method |
KR101088627B1 (en) | 2008-10-24 | 2011-11-30 | 야마하 가부시키가이샤 | Noise suppression device and noise suppression method |
EP2383731A1 (en) * | 2008-12-31 | 2011-11-02 | Huawei Technologies Co., Ltd. | Signal processing method and apparatus |
EP2383731B1 (en) * | 2008-12-31 | 2014-08-13 | Huawei Technologies Co., Ltd. | Audio signal processing method and apparatus |
US8468025B2 (en) | 2008-12-31 | 2013-06-18 | Huawei Technologies Co., Ltd. | Method and apparatus for processing signal |
US20110144988A1 (en) * | 2009-12-11 | 2011-06-16 | Jongsuk Choi | Embedded auditory system and method for processing voice signal |
CN105679325A (en) * | 2010-11-09 | 2016-06-15 | 索尼公司 | Decoding apparatus, decoding method, and audio processing device |
US20120260736A1 (en) * | 2011-04-12 | 2012-10-18 | Shenzhen Mindray Bio-Medical Electronics Co., Ltd. | Methods, modules, and systems for gain control in b-mode ultrasonic imaging |
US8795179B2 (en) * | 2011-04-12 | 2014-08-05 | Shenzhen Mindray Bio-Medical Electronics Co., Ltd. | Methods, modules, and systems for gain control in B-mode ultrasonic imaging |
US12112768B2 (en) | 2012-03-23 | 2024-10-08 | Dolby Laboratories Licensing Corporation | Post-processing gains for signal enhancement |
US10902865B2 (en) | 2012-03-23 | 2021-01-26 | Dolby Laboratories Licensing Corporation | Post-processing gains for signal enhancement |
US10311891B2 (en) | 2012-03-23 | 2019-06-04 | Dolby Laboratories Licensing Corporation | Post-processing gains for signal enhancement |
US11694711B2 (en) | 2012-03-23 | 2023-07-04 | Dolby Laboratories Licensing Corporation | Post-processing gains for signal enhancement |
US9584087B2 (en) | 2012-03-23 | 2017-02-28 | Dolby Laboratories Licensing Corporation | Post-processing gains for signal enhancement |
US11308976B2 (en) | 2012-03-23 | 2022-04-19 | Dolby Laboratories Licensing Corporation | Post-processing gains for signal enhancement |
US20150356978A1 (en) * | 2012-09-21 | 2015-12-10 | Dolby International Ab | Audio coding with gain profile extraction and transmission for speech enhancement at the decoder |
US9495970B2 (en) * | 2012-09-21 | 2016-11-15 | Dolby Laboratories Licensing Corporation | Audio coding with gain profile extraction and transmission for speech enhancement at the decoder |
US20140180682A1 (en) * | 2012-12-21 | 2014-06-26 | Sony Corporation | Noise detection device, noise detection method, and program |
US10319394B2 (en) * | 2013-01-08 | 2019-06-11 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for improving speech intelligibility in background noise by amplification and compression |
US20140270249A1 (en) * | 2013-03-12 | 2014-09-18 | Motorola Mobility Llc | Method and Apparatus for Estimating Variability of Background Noise for Noise Suppression |
US10896685B2 (en) | 2013-03-12 | 2021-01-19 | Google Technology Holdings LLC | Method and apparatus for estimating variability of background noise for noise suppression |
US11557308B2 (en) | 2013-03-12 | 2023-01-17 | Google Llc | Method and apparatus for estimating variability of background noise for noise suppression |
US11735175B2 (en) | 2013-03-12 | 2023-08-22 | Google Llc | Apparatus and method for power efficient signal conditioning for a voice recognition system |
US10607614B2 (en) | 2013-06-21 | 2020-03-31 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method realizing a fading of an MDCT spectrum to white noise prior to FDNS application |
US9916833B2 (en) * | 2013-06-21 | 2018-03-13 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for improved signal fade out for switched audio coding systems during error concealment |
US12125491B2 (en) | 2013-06-21 | 2024-10-22 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method realizing improved concepts for TCX LTP |
US9997163B2 (en) | 2013-06-21 | 2018-06-12 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method realizing improved concepts for TCX LTP |
US9978378B2 (en) | 2013-06-21 | 2018-05-22 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for improved signal fade out in different domains during error concealment |
US9978377B2 (en) | 2013-06-21 | 2018-05-22 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for generating an adaptive spectral shape of comfort noise |
US10672404B2 (en) | 2013-06-21 | 2020-06-02 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for generating an adaptive spectral shape of comfort noise |
US10679632B2 (en) | 2013-06-21 | 2020-06-09 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for improved signal fade out for switched audio coding systems during error concealment |
US10854208B2 (en) | 2013-06-21 | 2020-12-01 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method realizing improved concepts for TCX LTP |
US10867613B2 (en) | 2013-06-21 | 2020-12-15 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for improved signal fade out in different domains during error concealment |
US9978376B2 (en) | 2013-06-21 | 2018-05-22 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method realizing a fading of an MDCT spectrum to white noise prior to FDNS application |
US20160104488A1 (en) * | 2013-06-21 | 2016-04-14 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for improved signal fade out for switched audio coding systems during error concealment |
US11869514B2 (en) | 2013-06-21 | 2024-01-09 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for improved signal fade out for switched audio coding systems during error concealment |
US11462221B2 (en) | 2013-06-21 | 2022-10-04 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for generating an adaptive spectral shape of comfort noise |
US11501783B2 (en) | 2013-06-21 | 2022-11-15 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method realizing a fading of an MDCT spectrum to white noise prior to FDNS application |
US11776551B2 (en) | 2013-06-21 | 2023-10-03 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for improved signal fade out in different domains during error concealment |
US9484043B1 (en) * | 2014-03-05 | 2016-11-01 | QoSound, Inc. | Noise suppressor |
US9659578B2 (en) | 2014-11-27 | 2017-05-23 | Tata Consultancy Services Ltd. | Computer implemented system and method for identifying significant speech frames within speech signals |
US20160191007A1 (en) * | 2014-12-31 | 2016-06-30 | Stmicroelectronics Asia Pacific Pte Ltd | Adaptive loudness levelling method for digital audio signals in frequency domain |
US9647624B2 (en) * | 2014-12-31 | 2017-05-09 | Stmicroelectronics Asia Pacific Pte Ltd. | Adaptive loudness levelling method for digital audio signals in frequency domain |
US10186276B2 (en) * | 2015-09-25 | 2019-01-22 | Qualcomm Incorporated | Adaptive noise suppression for super wideband music |
US10163313B2 (en) * | 2016-03-14 | 2018-12-25 | Tata Consultancy Services Limited | System and method for sound based surveillance |
Also Published As
Publication number | Publication date |
---|---|
US8275611B2 (en) | 2012-09-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: STMICROELECTRONICS ASIA PACIFIC PTE., LTD., SINGAP Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ZONG, WENBO;WU, YUAN;GEORGE, SAPNA;SIGNING DATES FROM 20080116 TO 20080123;REEL/FRAME:020814/0053 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 12 |
|
AS | Assignment |
Owner name: STMICROELECTRONICS INTERNATIONAL N.V., SWITZERLAND Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:STMICROELECTRONICS ASIA PACIFIC PTE LTD;REEL/FRAME:068434/0215 Effective date: 20240628 |