KR20190046124A - Method and apparatus for real-time automatic interpretation based on context information - Google Patents
Method and apparatus for real-time automatic interpretation based on context information
- Publication number
- KR20190046124A (Application KR1020170139323A)
- Authority
- KR
- South Korea
- Prior art keywords
- context information
- encoding
- unit
- node
- real
- Prior art date
Classifications
- G06F17/289
Landscapes
- Machine Translation (AREA)
Abstract
Description
The present disclosure relates to automatic interpretation technology, and more particularly, to a method and apparatus for providing automatic interpretation based on contextual information in real time.
Real-time automatic interpretation technology refers to a technique of receiving speech data in the original language of a speaker and automatically translating the speech data into a target language of the listener in real time.
In the prior art of automatic interpretation technology, automatic interpretation systems based on statistical machine translation (SMT) have had difficulty providing interpretation in real time because the translation process can be performed only after the utterance ends. Deep-learning-based automatic interpretation methods can provide real-time interpretation using a sequence or sequence-to-sequence deep learning model, but when the learning data is insufficient or differs greatly from the input data, performance deteriorates significantly.
A technical object of the present disclosure is to provide a real-time automatic interpretation method and apparatus with improved interpreting accuracy by using the input data of a speaker together with context information.
A technical object of the present disclosure is to provide a real-time automatic interpretation method and apparatus with improved interpreting accuracy by using the topic, keywords, previous utterances, etc. of a speaker as context information and applying a convolution network method.
The technical objects to be achieved by the present disclosure are not limited to those mentioned above, and other technical objects not mentioned will be clearly understood by those skilled in the art from the following description.
A real-time automatic interpretation method using context information according to an aspect of the present disclosure includes: encoding a current speech content; encoding context information related to the current speech content; correcting the encoded result of the current speech content based on the encoded result of the context information; and decoding the current speech content based on the corrected result of the current speech content.
The features briefly summarized above for this disclosure are only exemplary aspects of the detailed description of the disclosure which follow, and are not intended to limit the scope of the disclosure.
According to the present disclosure, a real-time automatic interpretation method and apparatus with improved interpreting accuracy can be provided by using the input data of the speaker together with context information.
According to the present disclosure, it is possible to provide a real-time automatic interpretation method and apparatus that improves interpreting accuracy by using a subject, keyword, previous speech content, etc. of a speaker as context information and applying a convolution network method.
The effects obtainable from the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.
FIG. 1 is a diagram for explaining a deep-learning-based automatic translation method to which the present disclosure can be applied.
FIG. 2 is a diagram for explaining deep-learning-based automatic translation using context information according to the present disclosure.
FIG. 3 is a diagram for explaining an example of deep-learning-based automatic translation using context information according to the present disclosure.
FIG. 4 is a diagram for explaining a context information encoding unit and a context information combination unit in an example using multiple types of context information according to the present disclosure.
FIG. 5 is a diagram for explaining an example of a context information combination unit according to the present disclosure.
FIG. 6 is a flowchart for explaining a real-time automatic translation method using context information according to the present disclosure.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily carry them out. However, the present disclosure may be embodied in many different forms and is not limited to the embodiments described herein.
In the following description of the embodiments of the present invention, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present disclosure rather unclear. Parts not related to the description of the present disclosure in the drawings are omitted, and like parts are denoted by similar reference numerals.
In the present disclosure, when an element is referred to as being "connected", "coupled", or "linked" to another element, this may include not only a direct connection relationship but also an indirect connection relationship. Also, when an element is described as "comprising" or "having" another element, this means that the element may further include other elements rather than excluding them, unless otherwise stated.
In the present disclosure, the terms first, second, etc. are used only for the purpose of distinguishing one element from another, and do not limit the order or importance of elements unless specifically stated otherwise. Thus, within the scope of this disclosure, a first component in one embodiment may be referred to as a second component in another embodiment, and similarly a second component in one embodiment may be referred to as a first component in another embodiment.
In the present disclosure, the components that are distinguished from each other are intended to clearly illustrate each feature and do not necessarily mean that components are separate. That is, a plurality of components may be integrated into one hardware or software unit, or a single component may be distributed into a plurality of hardware or software units. Thus, unless otherwise noted, such integrated or distributed embodiments are also included within the scope of this disclosure.
In the present disclosure, the components described in the various embodiments are not necessarily essential components, and some may be optional components. Thus, embodiments consisting of a subset of the components described in one embodiment are also included within the scope of the present disclosure. Also, embodiments that include other elements in addition to the elements described in the various embodiments are also included in the scope of the present disclosure.
In the following, various examples according to the present disclosure will be described.
The present disclosure includes various examples of methods and apparatus for using context information in real-time automatic interpretation. More specifically, a method of recognizing voice information input in real time and using context information related to the recognized voice information when performing artificial-intelligence-based real-time automatic interpretation on the recognized voice information will be described.
In the present disclosure, real-time automatic interpretation using context information includes a deep-learning-based real-time automatic interpretation method using sentences that have already been spoken or similar sentences that have already been uttered.
More specifically, the context information may include a topic of a speaker, a previous utterance of a speaker, a previous utterance translation of a speaker, a key word of a speaker, and the like. Such various context information can be encoded and used for the translation of the current utterance contents. In addition, by applying a convolution network learning technique to real-time automatic interpretation using such various context information, the performance of real-time automatic interpretation can be maximized.
Real-time automatic interpretation technology refers to a technique of receiving speech data in the original language of a speaker and automatically translating the speech data into the target language of the listener in real time. That is, a real-time automatic interpretation system can be used in situations where voice data in the original language is input in real time (for example, a conversation between people using different languages, or listening to a lecture given in another language) to output voice data in the target language so that the listener can easily understand the speaker.
A conventional automatic translation and interpretation system for processing voice data input in real time receives a speech utterance in one language, converts the speech recognition result into text, and uses the text as input to the automatic translation system. Such a system automatically translates the input text from one language to another (e.g., from Korean to English) using a rule-based or statistical machine translation (SMT) approach, and the translation result can be synthesized into speech and delivered to the user. This kind of automatic translation and interpretation system is complicated and lacks real-time capability, since translation cannot start until one sentence or the speaker's utterance is finished. In addition, rule-based or statistical SMT methods are not suitable for real-time automatic interpretation because their performance is not high.
Further, the deep-learning-based interpretation method of conventional real-time automatic interpretation systems can be used for both speech recognition and automatic translation, and has had a great effect on the entire industry because of its excellent performance. Deep learning can, in theory, achieve the desired performance if sufficient learning data is provided. However, it is difficult to obtain the desired performance when the learning data is inappropriate or insufficient. For example, when learning is performed only with female voice data, the performance of recognizing male voices deteriorates, and voice data not covered by the learning data cannot be interpreted and delivered, or its performance is remarkably degraded.
In order to solve these problems of the deep-learning-based automatic interpretation method, according to the various examples of the present disclosure, the performance of real-time automatic interpretation can be improved by using context information.
First, the operation principle of deep-learning-based translation will be described, and then the features of the present disclosure will be described.
FIG. 1 is a diagram for explaining a deep-learning-based automatic translation method to which the present disclosure can be applied.
The deep-learning-based automatic translation method can be summarized as generating a learning model using learning data, and then converting a source-language input sentence in one language into a target-language sentence in another language by using the generated learning model. In this process, encoding and decoding are largely performed in the learning stage.
As an example of a deep-learning-based automatic translation system, FIG. 1 shows a sequence-to-sequence deep-learning-based automatic translation structure 100.
In the example of FIG. 1, the source language is Korean and the target language is English. For example, a Korean input sentence meaning "I go to school." is translated into the English sentence "I go to school." In the example of FIG. 1, w represents the resultant vector value obtained by encoding the input sentence. Specifically, using the deep-learning-based automatic translation system shown in FIG. 1, the input sentence "I go to school." is encoded to generate an output value w in vector form. In the decoding step, the value of the next node can be generated using the w vector and the value of the previous node.
More specifically, in the encoding unit 110, the elements of the input sentence "I go to school." can be sequentially input to the encoding nodes 111, 112, 113, and 114 and encoded.
The encoding nodes 111, 112, 113, and 114 may be implemented as RNN or LSTM nodes, and the final encoding result of the input sentence is the vector value w.
Finally, based on the encoded vector value w for the input sentence "I go to school.", the decoding unit 120 can sequentially generate the target-language sentence "I go to school." through the decoding nodes 121, 122, 123, 124, 125, and 126.
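As an illustration of the sequence-to-sequence flow just described (not part of the original disclosure), the following minimal sketch shows an encoder that produces the sentence vector w and a decoder that generates each target-language token from w and the previous node value. It assumes PyTorch, and all sizes, vocabularies, and names (Encoder, Decoder, SRC_VOCAB, etc.) are illustrative.

```python
# Minimal sequence-to-sequence sketch corresponding to the FIG. 1 structure.
# All sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 8000, 8000, 256, 512

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(SRC_VOCAB, EMB)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)

    def forward(self, src_ids):                  # src_ids: (batch, src_len) token ids
        _, (h, _) = self.rnn(self.emb(src_ids))
        return h[-1]                             # encoded sentence vector w: (batch, HID)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(TGT_VOCAB, EMB)
        self.cell = nn.LSTMCell(EMB, HID)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, w, bos_id, max_len=20):
        h, c = w, torch.zeros_like(w)            # decoding starts from the encoded vector w
        tok = torch.full((w.size(0),), bos_id, dtype=torch.long)
        outputs = []
        for _ in range(max_len):                 # each step uses the state and the previous token
            h, c = self.cell(self.emb(tok), (h, c))
            tok = self.out(h).argmax(dim=-1)     # greedy choice of the next target-language token
            outputs.append(tok)
        return torch.stack(outputs, dim=1)       # (batch, max_len) generated token ids
```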
In real-time automatic translation based on deep learning, both the source-language input sentence and the correct target-language sentence (that is, the accurate translation result of the source-language sentence) are given in the learning process, and the learning process is repeated while adjusting each node of the deep learning network. Adjusting the nodes of the deep learning network may include adjusting the parameters and bias values of each RNN or LSTM node used for encoding and decoding.
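The parameter adjustment described above can be sketched as a standard training step that reuses the Encoder/Decoder sketch above; the teacher-forced decoding and cross-entropy loss shown here are common assumptions, not details taken from the disclosure.

```python
# One parameter/bias update given a source sentence and its correct target sentence.
import torch
import torch.nn.functional as F

def train_step(encoder, decoder, optimizer, src_ids, tgt_ids, bos_id):
    optimizer.zero_grad()
    w = encoder(src_ids)                                   # encode the source sentence into w
    h, c = w, torch.zeros_like(w)
    prev = torch.cat([torch.full_like(tgt_ids[:, :1], bos_id), tgt_ids[:, :-1]], dim=1)
    loss = 0.0
    for t in range(tgt_ids.size(1)):                       # step through the decoder nodes
        h, c = decoder.cell(decoder.emb(prev[:, t]), (h, c))
        loss = loss + F.cross_entropy(decoder.out(h), tgt_ids[:, t])
    loss.backward()                                        # adjust parameters and bias values
    optimizer.step()
    return loss.item()
```

The optimizer here could be, for example, torch.optim.Adam over the combined encoder and decoder parameters; that choice is an assumption as well.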
As described with reference to FIG. 1, the way a source-language input sentence is encoded has a great influence on the decoding process and is directly related to translation performance.
In this disclosure, a method of generating more appropriate target-language words or tokens in the decoding step by using context information in the encoding step, in addition to the input sentence generated in real time from speech, is described.
FIG. 2 is a diagram for explaining an automatic translation based on deep learning using context information according to the present disclosure.
The structure of FIG. 2 includes elements corresponding to the encoding unit 110 and the decoding unit 120 of FIG. 1, and additionally includes a context information encoding unit 210 and a context information combination unit 220.
The context information encoding unit 210 can encode context information related to the current speech content and generate an encoded context information result value.
The context information combination unit 220 can correct the encoding result value of the current speech content based on the encoded context information, and the corrected result can be used in the decoding process.
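A minimal sketch of the two components added in FIG. 2 follows; the concatenation-plus-projection used for the correction step is an illustrative choice, since the disclosure does not fix a specific combination formula at this point.

```python
# Illustrative context encoder and combiner corresponding to units 210 and 220.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Encodes context information (e.g., previous-utterance token ids) into a vector C."""
    def __init__(self, vocab=8000, emb=256, hid=512):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.rnn = nn.LSTM(emb, hid, batch_first=True)

    def forward(self, ctx_ids):                  # ctx_ids: (batch, ctx_len)
        _, (h, _) = self.rnn(self.emb(ctx_ids))
        return h[-1]                             # encoded context information C: (batch, hid)

class ContextCombiner(nn.Module):
    """Corrects the sentence encoding w with C, yielding Wc for the decoder."""
    def __init__(self, hid=512):
        super().__init__()
        self.proj = nn.Linear(2 * hid, hid)

    def forward(self, w, C):
        return torch.tanh(self.proj(torch.cat([w, C], dim=-1)))   # corrected encoding Wc
```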
FIG. 3 is a view for explaining an example of an automatic translation based on the deep learning using the context information according to the present disclosure.
The structure of FIG. 3 includes components corresponding to those of FIG. 2, illustrating a concrete example in which the input sentence is "I go to school." and the context information is "Even if school is far."
In the example of FIG. 3, the context information encoding unit 310 can encode the context information "Even if school is far." through the context information encoding nodes 311, 312, and 313.
The input sentence "I go to school." can be encoded into the vector value w in the same manner as in the example of FIG. 1.
The context information encoding nodes 311, 312, and 313 can generate C, the finally encoded result value of the context information.
The context information combination unit 320 can receive the encoding result value w of the input sentence together with the encoded context information C.
For example, when the input sentence "I go to school." is translated without context information (i.e., based only on the encoding result w) as in the example of FIG. 1, the result "I go to school." is obtained. However, considering the context information "Even if school is far.", it may be more appropriate to express the will of the speaker contained in the input sentence "I go to school." That is, through the context information, the speaker's intention that "(nevertheless) I go to school." can be grasped. Thus, if the context information carries a meaning of concession, a more appropriate translation is obtained when an element representing the will of the speaker (for example, the English word "still") is included in the translation of the input sentence. To this end, the context information combination unit 320 can use the encoded context information C to correct the encoding result value w of the input sentence.
Accordingly, the context information combination unit 320 can generate the corrected encoding result value Wc of the current speech content and provide it to the decoding unit.
Finally, based on the encoding result value Wc, obtained by correcting the encoded vector value w of the input sentence "I go to school." with the encoding vector value C of the context information "Even if school is far.", the decoding unit can generate a translation result reflecting the speaker's intention (for example, "I still go to school.").
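Reusing the sketches above, the FIG. 3 flow could be wired together roughly as follows; the token ids stand in for "I go to school." and "Even if school is far." and are purely hypothetical.

```python
# Illustrative wiring of the FIG. 3 flow with the sketch modules defined earlier.
import torch

enc, ctx_enc, comb, dec = Encoder(), ContextEncoder(), ContextCombiner(), Decoder()

src_ids = torch.tensor([[11, 12, 13, 14]])      # hypothetical ids for "I go to school."
ctx_ids = torch.tensor([[21, 22, 23, 24, 25]])  # hypothetical ids for "Even if school is far."

w = enc(src_ids)                # encoded current utterance
C = ctx_enc(ctx_ids)            # encoded context information
Wc = comb(w, C)                 # corrected encoding reflecting the concessive context
out_ids = dec(Wc, bos_id=1)     # decoded token ids of the target-language output
```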
FIG. 4 is a diagram for explaining a context information encoding unit and a context information combination unit in an example using multiple types of context information according to the present disclosure.
In the example of FIG. 4, the context information encoding unit 410 may include a plurality of type-specific context information encoding units 411, 412, 413, and 414, each of which encodes a different type of context information.
The context information combination unit 420 may receive the result values C1, C2, C3, and C4 encoded by the type-specific context information encoding units and combine them.
The scope of the present disclosure is not limited by this example or the number of context information types; only some of the types of context information encoding units shown in FIG. 4 may be included, and other types of context information encoding units may further be included. In the example of FIG. 4, one type-specific context information encoding unit is included for each type of context information, but the scope of the present disclosure is not limited to this, and a plurality of type-specific context information encoding units may be included for the same type of context information.
For example, the first type context information encoding unit 411 may output C1, which is a result value obtained by encoding the context information of the first type corresponding to the topic of the current speech content, through the node 411_1.
The second type context information encoding unit 412 may output C2, which is a result value obtained by encoding the context information of the second type corresponding to the previous speech content of the speaker, through the nodes 412_1, 412_2, and 412_3.
The third type context information encoding unit 413 may output C3, which is a result value obtained by encoding the context information of the third type corresponding to the translation result of the previous speech content, through the nodes 413_1, 413_2, and 413_3.
The fourth type context information encoding unit 414 may output C4, which is a result value obtained by encoding the context information of the fourth type corresponding to the keywords of the current speech content. More specifically, the core keywords of the speaker can be encoded and utilized. Since most utterances are related to a particular topic, appropriate keywords for that topic can be used as context information. These keywords often appear in the speaker's utterances from the beginning to the end. Therefore, if the frequency and importance of the words in the speaker's utterances are calculated in real time and the resulting keywords are added to the context information, the translation of the current speech content can be greatly assisted. In addition, such keywords, used together with their corresponding words as context information, can contribute to translation performance by reducing word selection errors in the translation process. For example, elements kw1, kw2, and kw3 corresponding to the keywords of the current utterance may be input to the nodes 414_1, 414_2, and 414_3, respectively, and processed to generate the finally encoded context information C4.
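The frequency-based keyword selection and the per-type encoders described above might be sketched as follows; the Counter-based scoring is a simple stand-in for the "frequency and importance" computation, and TypedContextEncoders reuses the illustrative ContextEncoder from the FIG. 2 sketch.

```python
# Illustrative keyword selection and type-specific context encoding (C1..C4).
from collections import Counter
import torch.nn as nn

def top_keywords(spoken_tokens, k=3, stopwords=frozenset()):
    """Pick the k most frequent non-stopword tokens from the utterance so far."""
    counts = Counter(t for t in spoken_tokens if t not in stopwords)
    return [w for w, _ in counts.most_common(k)]

class TypedContextEncoders(nn.Module):
    """One encoder per context type (topic, previous utterance, its translation, keywords)."""
    def __init__(self, n_types=4):
        super().__init__()
        self.encoders = nn.ModuleList([ContextEncoder() for _ in range(n_types)])

    def forward(self, typed_ctx_ids):            # list of 4 id tensors, one per context type
        return [enc(ids) for enc, ids in zip(self.encoders, typed_ctx_ids)]   # [C1, C2, C3, C4]
```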
Thus, by using one or more of various types of context information, it is possible to improve the translation performance by reducing the ambiguity in understanding and interpreting the context of the conversation.
FIG. 5 is a diagram for explaining an example of a context information combination unit according to the present disclosure.
In the example of FIG. 5, the context information combination unit 520 may include a context information encoding result value input unit 521, a convolutional learning unit 522, a current speech content encoding result value input unit 523, and a correction unit 524.
For example, the context information encoding result value input unit 521 may include type-specific input units 521_1, 521_2, 521_3, and 521_4, which receive the encoded context information C1, C2, C3, and C4, respectively.
The context information encoding result values received in this way can be provided to the convolutional learning unit 522.
When one or more pieces of context information are provided in this manner, the effect of improving translation performance can be maximized by learning the usefulness of the encoded context information, without directly determining in advance which context information influences the translation of the current speech content. To this end, the context information combination unit 520 may apply a convolution network learning technique to the encoded context information.
Specifically, the convolutional learning unit 522 may include a plurality of combination nodes 522_1, 522_2, 522_3, 522_4, and 522_5, each of which combines some or all of the encoded context information values C1, C2, C3, and C4 in its own manner.
For example, C1, C2, and C3 may be combined in the first combination node 522_1; C1, C2, C3, and C4 may be combined in the second combination node 522_2 and in the third combination node 522_3; C2, C3, and C4 may be combined in the fourth combination node 522_4; and C1, C3, and C4 may be combined in the fifth combination node 522_5. In this case, the combinations of the types of encoded context information input to the second combination node 522_2 and the third combination node 522_3 are the same, but the combination methods (for example, a simple sum, a sum of squares, or a sum of values equal to or greater than a predetermined threshold) may be different, and the parameters or bias values applied to the combination may also be different.
The scope of the present disclosure is not limited by the number of combination nodes, the combinations of context information types input to the combination nodes, and the like, and includes examples of combining various context information in various ways. Also, some encoded context information may be passed to the correction unit 524 without passing through a combination node.
The combination nodes 522_1, 522_2, 522_3, 522_4, and 522_5 included in the convolutional learning unit 522 can generate the finally encoded context information C, and their parameters and bias values can be adjusted through the learning process.
The correction unit 524 can correct the current speech content encoding result value w, which is received through the current speech content encoding result value input unit 523, based on the finally encoded context information C, and thereby generate the corrected encoding result value Wc to be provided to the decoding unit.
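A hedged sketch of the FIG. 5 combination unit is shown below: five combination nodes each mix a different subset of C1..C4 with their own parameters, and a correction unit merges the pooled context with w to produce Wc. The subset choices follow the example above, while the mean pooling and tanh nonlinearities are assumptions.

```python
# Illustrative convolution-style combination of typed context encodings, then correction.
import torch
import torch.nn as nn

class ConvCombiner(nn.Module):
    """Combination nodes over subsets of C1..C4, followed by a correction unit producing Wc."""
    def __init__(self, hid=512,
                 subsets=((0, 1, 2), (0, 1, 2, 3), (0, 1, 2, 3), (1, 2, 3), (0, 2, 3))):
        super().__init__()
        self.subsets = subsets                   # which C_i each combination node sees
        self.nodes = nn.ModuleList([nn.Linear(len(s) * hid, hid) for s in subsets])
        self.correct = nn.Linear(2 * hid, hid)   # correction unit: (w, C) -> Wc

    def forward(self, w, C_list):                # C_list = [C1, C2, C3, C4]
        node_outs = [torch.tanh(f(torch.cat([C_list[i] for i in s], dim=-1)))
                     for f, s in zip(self.nodes, self.subsets)]
        C = torch.stack(node_outs, dim=0).mean(dim=0)   # finally encoded context information C
        return torch.tanh(self.correct(torch.cat([w, C], dim=-1)))   # corrected encoding Wc
```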
FIG. 6 is a flowchart for explaining a real-time automatic translation method using context information according to the present disclosure.
The method of FIG. 6 may be performed by a context-information-based real-time automatic interpretation apparatus (hereinafter referred to as the apparatus) according to the present disclosure.
In step S610, the apparatus can perform encoding on the current speech content. For example, the apparatus can construct an input sentence by speech recognition of the current speech content in the source language, perform encoding on the elements of the input sentence through one or more nodes based on the deep learning model, and generate the encoding result value of the current speech content.
In step S620, the device may encode context information related to the current utterance. For example, an encoding result value of one or more types of context information, such as the subject of the current speech content, the previous speech content, the translation result of the previous speech content, the keyword of the current speech content, etc., can be generated. In addition, the encoded resultant values of the plurality of context information may be combined in a convolution manner to generate the finally encoded context information (C).
In step S630, the apparatus can correct the encoded current speech content, which is the result of step S610, based on the encoded context information that is the result of step S620. That is, using the context information related to the current speech content, it is possible to generate the current speech content encoding result value Wc corrected to a form in which an optimal translation result suitable for the situation can be expected.
In step S640, the apparatus can decode the current utterance content based on the corrected current utterance content encoding result Wc. For example, the decoding result value of the first node in the target language is generated using the current spoken content encoding result value Wc of the corrected original language based on the context information, and the corrected current spoken content encoding result value Wc ) And the decoding result of the previous node to sequentially generate the result value of the next node.
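Steps S610 to S640 can be tied together end to end as in the short sketch below, again reusing the illustrative modules defined earlier.

```python
# Illustrative end-to-end pipeline for steps S610 to S640.
def interpret(enc, typed_ctx_enc, combiner, dec, src_ids, typed_ctx_ids, bos_id=1):
    w = enc(src_ids)                        # S610: encode the current speech content
    C_list = typed_ctx_enc(typed_ctx_ids)   # S620: encode the related context information
    Wc = combiner(w, C_list)                # S630: correct w using the encoded context
    return dec(Wc, bos_id)                  # S640: decode the target-language output
```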
Although the exemplary methods of this disclosure are represented as a series of operations for clarity of explanation, this is not intended to limit the order in which the steps are performed, and if necessary, the steps may be performed simultaneously or in a different order. To implement the method according to the present disclosure, additional steps may be included together with the illustrated steps, some of the illustrated steps may be excluded, or additional steps may be included while some steps are excluded.
The various embodiments of the disclosure are not intended to be all-inclusive and are intended to be illustrative of the typical aspects of the disclosure, and the features described in the various embodiments may be applied independently or in a combination of two or more.
In addition, various embodiments of the present disclosure may be implemented by hardware, firmware, software, or a combination thereof. In the case of hardware implementation, one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, and the like may be used.
The scope of the present disclosure includes software or machine-executable instructions (e.g., an operating system, applications, firmware, programs, etc.) that cause operations according to the methods of the various embodiments to be executed on a device or computer, and a non-transitory computer-readable medium in which such software or instructions are stored and which is executable on the device or computer.
100 deep-learning-based automatic translation structure
110 encoding unit
111, 112, 113, 114 encoding nodes
120 decoding unit
121, 122, 123, 124, 125, 126 decoding nodes
210, 310, 410 context information encoding unit
220, 320, 420, 520 context information combination unit
311, 312, 313 Context information encoding node
411, 412, 413, 414 type context information encoding unit
411_1, 412_1, 412_2, 412_3, 413_1, 413_2, 413_3, 414_1, 414_2, 414_3 type context information encoding nodes
521 Context information encoding result value input unit
521_1, 521_2, 521_3, 521_4 type context information encoding result value input unit
522 Convolutional Learning Unit
522_1, 522_2, 522_3, 522_4, 522_5 combination nodes
523 current speech content encoding result value input unit
524 correction unit
Claims (1)
A real-time automatic interpretation method using context information, the method comprising:
Encoding the current speech content;
Encoding the context information related to the current speech content;
Correcting the encoded result of the current speech content based on the encoded result of the context information; And
And decoding the current speech content based on the corrected encoding result of the current speech content.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020170139323A KR20190046124A (en) | 2017-10-25 | 2017-10-25 | Method and apparatus for real-time automatic interpretation based on context information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020170139323A KR20190046124A (en) | 2017-10-25 | 2017-10-25 | Method and apparatus for real-time automatic interpretation based on context information |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20190046124A true KR20190046124A (en) | 2019-05-07 |
Family
ID=66656438
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020170139323A KR20190046124A (en) | 2017-10-25 | 2017-10-25 | Method and apparatus for real-time automatic interpretation based on context information |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20190046124A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021235586A1 (en) * | 2020-05-21 | 2021-11-25 | 삼성전자 주식회사 | Electronic device for translating text sequence and operation method thereof |
KR20220003930A (en) * | 2020-07-02 | 2022-01-11 | 주식회사 엔씨소프트 | Learning method and cognition method for omission restoration and apparatus for executing the method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7514920B2 (en) | Speech recognition error correction method, related device and readable storage medium | |
KR102589637B1 (en) | Method and apparatus for performing machine translation | |
US9697201B2 (en) | Adapting machine translation data using damaging channel model | |
WO2018010455A1 (en) | Neural network-based translation method and apparatus | |
CN106486121B (en) | Voice optimization method and device applied to intelligent robot | |
US11093110B1 (en) | Messaging feedback mechanism | |
WO2015096564A1 (en) | On-line voice translation method and device | |
US20180089172A1 (en) | Communication system supporting blended-language messages | |
CN110895932A (en) | Multi-language voice recognition method based on language type and voice content collaborative classification | |
CN111292740B (en) | Speech recognition system and method thereof | |
JP2022548718A (en) | Decryption network construction method, speech recognition method, device, equipment and storage medium | |
CN111539199B (en) | Text error correction method, device, terminal and storage medium | |
US11907665B2 (en) | Method and system for processing user inputs using natural language processing | |
CN110147554B (en) | Simultaneous interpretation method and device and computer equipment | |
WO2022142823A1 (en) | Human-machine conversation method and apparatus, computer device, and readable storage medium | |
Niehues et al. | Dynamic Transcription for Low-Latency Speech Translation. | |
KR20190046124A (en) | Method and apparatus for real-time automatic interpretation based on context information | |
JP2021503104A (en) | Automatic speech recognition device and method | |
Fujita et al. | Toward streaming ASR with non-autoregressive insertion-based model | |
US11984125B2 (en) | Speech recognition using on-the-fly-constrained language model per utterance | |
CN115346520A (en) | Method, apparatus, electronic device and medium for speech recognition | |
CN113035200B (en) | Voice recognition error correction method, device and equipment based on human-computer interaction scene | |
KR20140079543A (en) | Auto Interpreting and Translating apparatus | |
Miranda et al. | Improving ASR by integrating lecture audio and slides | |
KR101543024B1 (en) | Method and Apparatus for Translating Word based on Pronunciation |