Himawan et al., 2015 - Google Patents
Channel selection in the short-time modulation domain for distant speech recognitionHimawan et al., 2015
View PDF- Document ID
- 4713330665088679164
- Author
- Himawan I
- Motlicek P
- Sridharan S
- Dean D
- Tjondronegoro D
- Publication year
- Publication venue
- Proceedings of the 16th Annual Conference of the International Speech Communication Association, Interspeech 2015
External Links
Snippet
Automatic speech recognition from multiple distant micro-phones poses significant challenges because of noise and reverberations. The quality of speech acquisition may vary between microphones because of movements of speakers and channel distortions. This …
- 230000000051 modifying 0 title abstract description 40
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kinoshita et al. | A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research | |
Cauchi et al. | Combination of MVDR beamforming and single-channel spectral processing for enhancing noisy and reverberant speech | |
Souden et al. | A multichannel MMSE-based framework for speech source separation and noise reduction | |
Falk et al. | Modulation spectral features for robust far-field speaker identification | |
US9240190B2 (en) | Formant based speech reconstruction from noisy signals | |
DEREVERBERATION et al. | REVERB Workshop 2014 | |
Sadjadi et al. | Blind spectral weighting for robust speaker identification under reverberation mismatch | |
JP7383122B2 (en) | Method and apparatus for normalizing features extracted from audio data for signal recognition or modification | |
Loellmann et al. | Single-channel maximum-likelihood T60 estimation exploiting subband information | |
Jin et al. | Multi-channel noise reduction for hands-free voice communication on mobile phones | |
Zhang et al. | Distant-talking speaker identification by generalized spectral subtraction-based dereverberation and its efficient computation | |
EP3847645B1 (en) | Determining a room response of a desired source in a reverberant environment | |
Zorilă et al. | Speaker reinforcement using target source extraction for robust automatic speech recognition | |
Himawan et al. | Channel selection in the short-time modulation domain for distant speech recognition | |
Chen et al. | A dual-stream deep attractor network with multi-domain learning for speech dereverberation and separation | |
Flores et al. | Cepstral distance based channel selection for distant speech recognition | |
Sadjadi et al. | Blind reverberation mitigation for robust speaker identification | |
Guzewich et al. | Improving Speaker Verification for Reverberant Conditions with Deep Neural Network Dereverberation Processing. | |
Mirsamadi et al. | Multichannel feature enhancement in distributed microphone arrays for robust distant speech recognition in smart rooms | |
Kindt et al. | Improved separation of closely-spaced speakers by exploiting auxiliary direction of arrival information within a u-net architecture | |
Haeb‐Umbach et al. | Reverberant speech recognition | |
Himawan et al. | Channel Selection in the Short-time Modulation Domain for Distant Speech Recognition; Comparison with the Envelope-Variance Measure | |
Tsilfidis et al. | Speech dereverberation based on a recorded handclap | |
Toh et al. | Investigation of robust features for speech recognition in hostile environments | |
Pandey et al. | Decoupled Spatial and Temporal Processing for Resource Efficient Multichannel Speech Enhancement |