KR20140077780A - Apparatus for adapting language model scale using signal-to-noise ratio - Google Patents
- Publication number
- KR20140077780A
- Authority
- KR
- South Korea
- Prior art keywords
- signal
- language model
- noise ratio
- model scale
- present
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Probability & Statistics with Applications (AREA)
- Machine Translation (AREA)
Abstract
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech recognition system, and more particularly, to a language model scale adaptation apparatus for enhancing speech recognition performance in a speech recognition system.
Speech recognition technology has become relatively common and is used in various applications. However, now that isolated-word speech recognition has been commercialized, users increasingly demand speech recognition products with more advanced functions.
That is, there is a need for keyword spotting, which recognizes a target word even when other words precede or follow it, and for continuous speech recognition, which recognizes natural sentences.
However, continuous speech recognition has not yet reached the level that users expect.
In other words, beyond the performance of the acoustic model, the question is how good a language model can be applied.
In most cases, the language model is built from text data, which is collected as a text corpus covering a variety of sources.
For example, a general-purpose application such as dictation uses newspaper articles, novels, and other material available on the Internet. However, the performance of a language model built from such data is limited.
In particular, if the language model does not fit a particular application well, the performance expected by the user is difficult to obtain.
The ideal approach is to obtain text data suited to the application field, but this is difficult in practice.
Efforts to overcome these problems have been made in many ways. Language model adaptation can be seen as one of these efforts.
However, acoustic models and language models produce probabilities in different ranges because of differences in their modeling methods, and the language model scale serves to correct this difference.
In general, the optimal language model scale is obtained through experimentation: for a given evaluation corpus and system, the value that gives the best trade-off between speed and performance is used.
In general, when the signal-to-noise ratio is high, the discriminating power of the acoustic model is good, but when the signal-to-noise ratio is low, the discriminating power of the acoustic model deteriorates.
However, the probability values and discriminating power of the language model remain the same regardless of the quality of the input signal, so a fixed language model scale cannot reflect this difference.
The present invention has been proposed to solve the problems described in the background art. To maintain stable recognition performance even in a noisy environment, the language model scale is adjusted according to the noise level of the input signal.
Therefore, the present invention provides a language model scale adaptation apparatus using a signal-to-noise ratio that adjusts the language model scale so that the probability value of the acoustic model is weighted more when the signal-to-noise ratio is good, and the probability value of the language model is weighted more otherwise. Because the discriminating power of the language model is used more in a noisy environment, recognition performance in such an environment is improved.
The language model scale adaptation apparatus using a signal-to-noise ratio for speech recognition adjusts the language model scale by assigning different weights to the probability value of the acoustic model based on the signal-to-noise ratio.
Another embodiment of the present invention provides a language model scale adaptation method comprising: a speech signal input step of inputting a speech signal; an end point detection step of detecting an end point of the input speech signal; a signal-to-noise ratio measurement step of measuring the signal-to-noise ratio (SNR) of the speech signal once the end point is detected; a language model scale adaptation step of adapting the language model scale according to the measured signal-to-noise ratio, weighting the probability value of the acoustic model more if the signal-to-noise ratio is good and the probability value of the language model more otherwise; a search space generation step of generating a search space for the speech signal using the adapted language model scale; and a decoding step of decoding the speech signal in the search space to generate a final speech recognition result.
According to the present invention, recognition performance in a noisy environment is improved by giving more weight to the discriminating power of the language model for a speech signal having a low signal-to-noise ratio.
That is, the language model scale is adjusted so that the probability value of the acoustic model is weighted more when the signal-to-noise ratio is good and the probability value of the language model is weighted more when it is not. The discriminating power of the language model is thus used more in a noisy environment, which improves recognition performance in that environment.
FIG. 1 is a block diagram of a language model scale adaptation apparatus using a signal-to-noise ratio according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a language model scale adaptation process using a signal-to-noise ratio according to an exemplary embodiment of the present invention.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It is to be understood, however, that the invention is not to be limited to the specific embodiments, but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.
Like reference numerals are used for similar elements in describing each drawing.
The terms first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another.
For example, without departing from the scope of the present invention, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component. The term "and / or" includes any combination of a plurality of related listed items or any of a plurality of related listed items.
Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Terms such as those defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in the present application.
Hereinafter, a language model scale adaptation apparatus using a signal-to-noise ratio according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.
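FIG. 1 is a block diagram of a language model scale adaptation apparatus using a signal-to-noise ratio according to an embodiment of the present invention. Referring to FIG. 1, the language model scale adaptation apparatus comprises an end point detector (100), a signal-to-noise ratio measuring unit (110), a language model scale adaptation unit (120), a search space generating unit (130), and a further element (140).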
Generally, a probability-based speech recognition system obtains the word sequence W having the maximum a posteriori probability with respect to an input speech signal X, as shown in Equation (1).
$$\hat{W} = \arg\max_{W} P(X \mid W)\, P(W)^{\alpha} \tag{1}$$
Here, P(X|W) is the acoustic model, P(W) is the language model, and α is called the language model scale. The acoustic model is the probability that each word or phoneme generates a specific speech signal, and the language model is the probability of occurrence of successive words.
The acoustic model and the language model have different ranges of probabilities due to differences in modeling methods, and the language model scale plays a role of correcting the differences.
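The role of the language model scale can be illustrated in the log domain, where a score of the form P(X|W)·P(W)^α becomes log P(X|W) + α·log P(W). The following is a minimal sketch under that assumption; the function names and the toy log-probabilities are hypothetical and chosen only to show how the scale changes which hypothesis wins.

```python
def combined_score(acoustic_logprob, lm_logprob, lm_scale):
    """Log-domain score: log P(X|W) + lm_scale * log P(W)."""
    return acoustic_logprob + lm_scale * lm_logprob


def best_hypothesis(hypotheses, lm_scale):
    """Return the word sequence with the highest combined score.

    `hypotheses` is a list of (words, acoustic_logprob, lm_logprob) tuples.
    """
    best = max(hypotheses, key=lambda h: combined_score(h[1], h[2], lm_scale))
    return best[0]


# Hypothetical toy values: the second hypothesis fits the audio slightly
# better, but the language model considers it much less likely.
hypotheses = [
    ("recognize speech", -120.0, -4.0),
    ("wreck a nice beach", -118.0, -9.0),
]

acoustic_only = best_hypothesis(hypotheses, lm_scale=0.0)
with_language_model = best_hypothesis(hypotheses, lm_scale=1.0)
```

With a scale of 0 only the acoustic score counts and the second hypothesis is chosen; with a scale of 1 the language model term outweighs the small acoustic difference and the first hypothesis is chosen.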
In an embodiment of the present invention, a language model scale adaptation scheme based on the signal-to-noise ratio is used. As shown in Equation (2), the language model scale is a function of the time t and the signal-to-noise ratio.
Here, SNR(t) is the signal-to-noise ratio in time frame t, α is the optimal language model scale obtained through experiments, and β is a weighting factor obtained through experiments. The sigmoid function is given by the following equation.
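As a minimal sketch of an SNR-dependent language model scale, the code below assumes one hypothetical form, α(t) = α · (1 + β · σ(−(SNR(t) − s_mid) / w)), where σ is the standard logistic sigmoid. This form, the midpoint `snr_mid_db`, and the width `width_db` are assumptions made for illustration and are not taken from the patent's equations; the sketch only reproduces the described behavior, in which the scale stays near α for clean speech and grows as the signal becomes noisier.

```python
import math


def sigmoid(x):
    """Standard logistic function 1 / (1 + exp(-x)), computed without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def adapted_lm_scale(snr_db, alpha, beta, snr_mid_db=15.0, width_db=5.0):
    """Hypothetical SNR-dependent language model scale.

    alpha: baseline language model scale (tuned experimentally).
    beta: weighting factor (tuned experimentally).
    snr_mid_db, width_db: assumed midpoint and width of the transition.
    """
    # Close to 1 for low SNR (noisy input), close to 0 for high SNR.
    noisiness = sigmoid(-(snr_db - snr_mid_db) / width_db)
    return alpha * (1.0 + beta * noisiness)


scale_clean = adapted_lm_scale(30.0, alpha=10.0, beta=0.5)
scale_mid = adapted_lm_scale(15.0, alpha=10.0, beta=0.5)
scale_noisy = adapted_lm_scale(0.0, alpha=10.0, beta=0.5)
```

Under these assumptions the scale always lies between α and α·(1 + β), so β bounds how much extra weight the language model can receive in noise.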
FIG. 2 is a flowchart illustrating a language model scale adaptation process using a signal-to-noise ratio according to an embodiment of the present invention.
Referring to FIG. 2, the language model scale adaptation process includes a speech signal input step (S200) of inputting a speech signal; an end point detection step (S210) of detecting an end point of the input speech signal; a signal-to-noise ratio measuring step (S220) of measuring the signal-to-noise ratio (SNR) of the speech signal once the end point is detected; a language model scale adaptation step (S230) of adapting the language model scale according to the measured signal-to-noise ratio, weighting the probability value of the acoustic model more if the signal-to-noise ratio is good and the probability value of the language model more otherwise; a search space generation step of generating a search space for the speech signal using the adapted language model scale; and a decoding step (S250) of decoding the speech signal in the search space to generate a final speech recognition result.
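The end point detection and signal-to-noise ratio measurement steps might be sketched as follows. This is an illustrative assumption, not the method of the patent: it uses a simple energy threshold to find the end points and treats the frames outside them as noise. All function names, the frame length, the threshold, and the synthetic signal are hypothetical.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_endpoints(energies, threshold):
    """Return (first, last) indices of frames above the threshold, or None."""
    active = [i for i, e in enumerate(energies) if e > threshold]
    return (active[0], active[-1]) if active else None


def estimate_snr_db(energies, endpoints):
    """Approximate SNR in dB from frames inside and outside the end points."""
    start, end = endpoints
    speech = energies[start:end + 1]
    noise = energies[:start] + energies[end + 1:]
    if not noise:
        raise ValueError("no frames outside the end points to estimate noise from")
    # Small floors guard against taking the logarithm of zero.
    speech_power = max(sum(speech) / len(speech), 1e-12)
    noise_power = max(sum(noise) / len(noise), 1e-12)
    return 10.0 * math.log10(speech_power / noise_power)


# Synthetic input: quiet noise, then a loud segment, then quiet noise again.
samples = [0.01, -0.01] * 8 + [1.0, -1.0] * 8 + [0.01, -0.01] * 8
energies = frame_energies(samples, frame_len=4)
endpoints = detect_endpoints(energies, threshold=0.01)
snr_db = estimate_snr_db(energies, endpoints)
```

For this synthetic signal the loud segment has 10,000 times the power of the surrounding frames, so the estimate comes out at about 40 dB. The measured value would then be passed to the language model scale adaptation step before the search space is generated.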
In particular, the language model scale adaptation method using a signal-to-noise ratio according to an embodiment of the present invention may be implemented in the form of program command code that can be executed through various computer means and recorded in a computer-readable storage medium.
The computer-readable storage medium may include program instructions, data files, data structures, and the like, alone or in combination.
The program instructions recorded on the medium may be those specially designed and constructed for the present invention or may be available to those skilled in the art of computer software.
Examples of computer-readable storage media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory.
The medium may be a transmission medium such as an optical or metal line, a wave guide, or the like, including a carrier wave for transmitting a signal designating a program command, a data structure, or the like.
Examples of program instructions include machine language code such as those produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.
In addition, one embodiment of the present invention may be implemented in hardware, software, or a combination thereof. In a hardware implementation, it may be implemented with a digital signal processor (DSP), a programmable logic device (PLD), a field programmable gate array (FPGA), a processor, a controller, a microprocessor, other electronic units designed to perform the above-described functions, or a combination thereof.
In a software implementation, it may be implemented as a module that performs the functions described above. The software may be stored in a memory unit and executed by a processor. The memory unit or processor may employ various means well known to those skilled in the art.
100: End point detector
110: signal-to-noise ratio measuring unit
120: language model scale adaptation unit
130: Search space generating unit
140:
Claims (1)
A language model scale adaptation apparatus using a signal-to-noise ratio for speech recognition, wherein the language model scale is adjusted by assigning a different weight to the probability value of the acoustic model based on the signal-to-noise ratio.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020120146911A KR102020782B1 (en) | 2012-12-14 | 2012-12-14 | Apparatus for adapting language model scale using signal-to-noise ratio |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20140077780A | 2014-06-24 |
KR102020782B1 | 2019-09-11 |
Family
ID=51129629
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020120146911A KR102020782B1 (en) | 2012-12-14 | 2012-12-14 | Apparatus for adapting language model scale using signal-to-noise ratio |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR102020782B1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080010057A1 (en) * | 2006-07-05 | 2008-01-10 | General Motors Corporation | Applying speech recognition adaptation in an automated speech recognition system of a telematics-equipped vehicle |
KR20100138520A (en) * | 2009-06-25 | 2010-12-31 | 한국전자통신연구원 | Speech recognition apparatus and its method |
KR20120066530A (en) | 2010-12-14 | 2012-06-22 | 한국전자통신연구원 | Method of estimating language model weight and apparatus for the same |
- 2012-12-14: KR application KR1020120146911A, patent KR102020782B1 (active, IP Right Grant)
Also Published As
Publication number | Publication date |
---|---|
KR102020782B1 (en) | 2019-09-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
E902 | Notification of reason for refusal | ||
E701 | Decision to grant or registration of patent right | ||
GRNT | Written decision to grant |