Stouten et al., 2004 - Google Patents
Joint removal of additive and convolutional noise with model-based feature enhancementStouten et al., 2004
- Document ID
- 15036839118449181416
- Author
- Stouten V
- Van Hamme H
- Wambacq P
- Publication year
- Publication venue
- 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing
External Links
Snippet
In this paper we describe how we successfully extended the model-based feature enhancement (MBFE) algorithm to jointly remove additive and convolutional noise from corrupted speech. Although a model of the clean speech can incorporate prior knowledge …
- 230000000996 additive 0 title abstract description 16
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
- G10L2015/0636—Threshold criteria for the updating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gannot et al. | Iterative and sequential Kalman filter-based speech enhancement algorithms | |
Cui et al. | Noise robust speech recognition using feature compensation based on polynomial regression of utterance SNR | |
Han et al. | Deep neural network based spectral feature mapping for robust speech recognition. | |
Prasad et al. | Improved cepstral mean and variance normalization using Bayesian framework | |
Gales | Predictive model-based compensation schemes for robust speech recognition | |
Stouten et al. | Model-based feature enhancement with uncertainty decoding for noise robust ASR | |
Chowdhury et al. | Bayesian on-line spectral change point detection: a soft computing approach for on-line ASR | |
JP2003303000A (en) | Method and apparatus for feature domain joint channel and additive noise compensation | |
KR101065188B1 (en) | Apparatus and method for speaker adaptation by evolutional learning, and speech recognition system using thereof | |
Wolfel et al. | Minimum variance distortionless response spectral estimation | |
Cui et al. | A study of variable-parameter Gaussian mixture hidden Markov modeling for noisy speech recognition | |
Fujimoto et al. | Noise suppression with unsupervised joint speaker adaptation and noise mixture model estimation | |
KR100897555B1 (en) | Apparatus and method of extracting speech feature vectors and speech recognition system and method employing the same | |
Attias et al. | A new method for speech denoising and robust speech recognition using probabilistic models for clean speech and for noise. | |
Kalamani et al. | Continuous Tamil Speech Recognition technique under non stationary noisy environments | |
Stouten et al. | Joint removal of additive and convolutional noise with model-based feature enhancement | |
Yapanel et al. | Perceptual MVDR-based cepstral coefficients (PMCCs) for robust speech recognition | |
Windmann et al. | Approaches to iterative speech feature enhancement and recognition | |
JP2001067094A (en) | Voice recognizing device and its method | |
Zhao et al. | Recursive estimation of time-varying environments for robust speech recognition | |
Buera et al. | Speech enhancement using a pitch predictive model | |
Fujimoto et al. | Non-stationary noise estimation method based on bias-residual component decomposition for robust speech recognition | |
Fujimoto et al. | A Robust Estimation Method of Noise Mixture Model for Noise Suppression. | |
Nayak et al. | Instantaneous frequency filter-bank features for low resource speech recognition using deep recurrent architectures | |
Ghaemmaghami et al. | Noise reduction algorithm for robust speech recognition using MLP neural network |