Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
As used in this disclosure, "module," "device," "system," and the like can refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. In particular, an element may be, but is not limited to, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. An application or script running on a server, or the server itself, may also be an element. One or more elements may reside within a process and/or thread of execution; an element may be localized on one computer and/or distributed between two or more computers, and may execute from various computer-readable media. Elements may also communicate by way of local and/or remote processes based on a signal having one or more data packets, e.g., a signal carrying data from one element interacting with another element in a local system, in a distributed system, and/or across a network such as the Internet with other systems.
Finally, it should also be noted that, herein, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements is not limited to those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As shown in fig. 1, an embodiment of the present invention provides a method for training a word-embedded language model, where the method includes:
S11, determining the attributes of all the words in a corpus to generate a word list, wherein the attributes include the part-of-speech classification of each word, the probability distribution of all part-of-speech classifications, and the probability distribution of the words under each part-of-speech classification.
In the word list of this embodiment, all words from the corpus are stored according to their part-of-speech classifications, which may include nouns, adjectives, verbs, adverbs, and the like, and the proportion of the words in each part-of-speech classification relative to all words from the corpus is determined by counting, that is, the probability distribution of all part-of-speech classifications is determined.
Then, for each part-of-speech classification, the proportion of each word relative to all words under that classification is further counted, that is, the probability distribution of the words under each part-of-speech classification is determined.
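As a purely illustrative sketch of step S11 (the function name build_word_list, the counting scheme, and the assumed POS tagger pos_of are not part of the original disclosure):

```python
from collections import Counter, defaultdict

def build_word_list(corpus_words, pos_of):
    """Step S11: record each word's POS classification and the two probability distributions.

    corpus_words: iterable of word tokens from the corpus
    pos_of      : callable mapping a word to its part-of-speech classification
    """
    word_counts = Counter(corpus_words)
    pos_counts = Counter()
    words_by_pos = defaultdict(Counter)
    word_pos = {}
    for word, count in word_counts.items():
        pos = pos_of(word)                   # part-of-speech classification of the word
        word_pos[word] = pos
        pos_counts[pos] += count
        words_by_pos[pos][word] += count
    total = sum(pos_counts.values())
    # Probability distribution of all part-of-speech classifications.
    p_pos = {pos: c / total for pos, c in pos_counts.items()}
    # Probability distribution of the words under each part-of-speech classification.
    p_word_given_pos = {pos: {w: c / sum(wc.values()) for w, c in wc.items()}
                        for pos, wc in words_by_pos.items()}
    return word_pos, p_pos, p_word_given_pos
```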
S12, generating word vectors of all the words in the word list. The word vectors capture the morphological information of the words.
S13, generating part-of-speech classification vectors corresponding to the part-of-speech classifications of all the words in the word list. The part-of-speech classification vector captures the syntactic-level (i.e., semantic) information of the words belonging to the corresponding classification, and all words belonging to the same part-of-speech classification share the same part-of-speech classification vector.
S14, training with the word vectors of the words in the word list and the part-of-speech classification vectors of the words in the word list as inputs, and with the probability distribution of the part-of-speech classification to which each word belongs and the probability distribution of the word under that classification as outputs, to obtain the word embedding language model.
The two vectors are directly concatenated as the input, and a factorized softmax (normalized exponential function) is used at the output layer to calculate the probability of each word: the probability distribution over the part-of-speech classifications is calculated first, then the probability distribution of the words under each part of speech, and the two probability distributions are finally multiplied to obtain the required probability distribution.
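The factorized calculation described above might look as follows in a minimal sketch; the function names and the dense matrix layout are assumptions for illustration, not the original implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def factorized_word_prob(h, E_c, E_o, class2words):
    """P(word) = P(class | h) * P(word | class, h), as described above.

    h          : hidden state of the language model, shape (H,)
    E_c        : part-of-speech class output embeddings, shape (num_classes, H)
    E_o        : word output embeddings, shape (vocab_size, H)
    class2words: dict mapping class id -> list of word ids belonging to that class
    """
    p_class = softmax(E_c @ h)               # distribution over part-of-speech classes
    p_word = np.zeros(E_o.shape[0])
    for c, words in class2words.items():
        within = softmax(E_o[words] @ h)     # distribution over the words of class c
        p_word[words] = p_class[c] * within  # multiply the two probability distributions
    return p_word
```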
In the embodiment of the present invention, when the language model is trained, the words in the corpus are not used directly for training; instead, the attributes of all words are first determined, including the part-of-speech classification of each word, the probability distribution of all part-of-speech classifications, and the probability distribution of the words under each part-of-speech classification. Both morphological information and syntactic-level information of the words are considered during language model training. In particular, the introduction of syntactic-level information exploits the semantic commonality of words: through parameter sharing, an out-of-vocabulary (OOV) word can reuse the parameters of the words in the word list that belong to the same part-of-speech classification. As a result, even when an OOV word is encountered in practical application, the trained language model can recognize it accurately from the morphological information of the OOV word and the syntactic-level information of its part-of-speech classification.
In addition, because extra information (semantic classification and morphological decomposition) is introduced, unseen new words can be modeled directly and more accurately. In the conventional approach, by contrast, modeling a new word requires collecting new data and retraining the model, which is very time-consuming. Therefore, in actual use, the model greatly reduces the time needed to add new words. The method can further be integrated into a speech recognition system to achieve fast word list updating while improving the recognition accuracy of low-frequency words.
As shown in fig. 2, in some embodiments, the generating a word vector for all words in the word list includes:
S21, judging whether a word obtained from the word list is a low-frequency word;
S22, if yes, decomposing the word obtained from the word list into characters, and encoding the characters obtained by decomposition to determine the corresponding word vector;
and S23, if not, extracting the vector of the word obtained from the word list as the word vector.
In the embodiment of the present invention, different processing methods are adopted for high-frequency words and low-frequency words. For high-frequency words, each word has its own independent word vector (for example, a one-hot vector may be used). For low-frequency words, the word is first decomposed morphologically (for Chinese, decomposed into characters), and a sequence coding method is then used to convert the characters into a fixed-length vector; common coding methods include character-level fixed-size ordinally forgetting encoding (FOFE), direct addition of character vectors, encoding with a recurrent or convolutional neural network, and the like. In other words, in the embodiment of the present invention, a word is not treated as a single one-hot vector; it is classified at the syntactic level (by its POS, part-of-speech tag), i.e., given a one-hot representation at the syntactic level, and encoded from the morphological level with FOFE.
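A minimal sketch of the routing in steps S21-S23, assuming a table of high-frequency word vectors and a character encoder (for example, the FOFE sketch given later in this description); all names are illustrative.

```python
def word_vector(word, high_freq_embeddings, encode_chars):
    """Return the input representation of a word from the word list (steps S21-S23)."""
    if word in high_freq_embeddings:
        # High-frequency word: use its own independent word vector.
        return high_freq_embeddings[word]
    # Low-frequency word: decompose into characters and encode them.
    chars = list(word)          # for Chinese, a word decomposes into characters
    return encode_chars(chars)  # e.g., character-level FOFE followed by an FNN
```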
The distinction between high-frequency words and low-frequency words is made because not every word's meaning can be well represented by its characters; the embodiment of the present invention thus avoids the impact such cases would have on the performance of the language model.
In addition, a conventional language model needs a set of parameters for every word. In the method of the embodiment of the present invention, since all low-frequency words are decomposed into characters (morphological decomposition), parameters are needed only for the high-frequency words and the characters, so the required number of parameters is greatly reduced (usually by about 80%). The advantage of this small parameter count is that the word embedding language model obtained by the embodiment of the present invention can be deployed on smaller devices (such as mobile phones).
Morphological decomposition of words could also have been performed with phonemes, but because some homophones have very different meanings, phoneme-based decomposition is not very effective; this problem is overcome by the method of the embodiment of the present invention.
In some embodiments, the training with the word vectors of the words in the vocabulary and the part-of-speech classification vectors of the words in the vocabulary as inputs and the probability distribution of the part-of-speech classifications to which the words in the vocabulary belong and the probability distribution of the words in the vocabulary under the part-of-speech classifications as outputs to obtain the word-embedded language model includes:
inputting the word vectors of the words in the word list and the part-of-speech classification vectors of the words in the word list into a long short-term memory (LSTM) network;
inputting the output of the LSTM network into a part-of-speech classifier to obtain the probability distribution of the part-of-speech classifications to which the words in the word list belong;
and inputting the output of the LSTM network into a word classifier to obtain the probability distribution of the words in the word list under their part-of-speech classifications.
The trained word embedding language model thus comprises an LSTM network, a part-of-speech classifier, and a word classifier; an illustrative sketch of such an architecture is given below.
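For illustration only, one way such a model could be assembled in PyTorch is sketched below; the class name StructuredEmbeddingLM, the dimension choices, and the omission of the class-wise output masking are simplifying assumptions, not the original implementation.

```python
import torch
import torch.nn as nn

class StructuredEmbeddingLM(nn.Module):
    """LSTM language model with a POS classifier and a word classifier (step S14)."""
    def __init__(self, num_words, num_pos, word_dim=300, pos_dim=300, hidden=600):
        super().__init__()
        self.word_emb = nn.Embedding(num_words, word_dim)   # word vectors
        self.pos_emb = nn.Embedding(num_pos, pos_dim)       # shared POS classification vectors
        self.lstm = nn.LSTM(word_dim + pos_dim, hidden, batch_first=True)
        self.pos_classifier = nn.Linear(hidden, num_pos)    # P(class | history)
        self.word_classifier = nn.Linear(hidden, num_words) # P(word | class, history)

    def forward(self, word_ids, pos_ids):
        # Concatenate the word vector and its POS classification vector (input layer).
        x = torch.cat([self.word_emb(word_ids), self.pos_emb(pos_ids)], dim=-1)
        h, _ = self.lstm(x)
        pos_logits = self.pos_classifier(h)    # distribution over POS classifications
        word_logits = self.word_classifier(h)  # distribution over words (masked per class at use time)
        return pos_logits, word_logits
```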
As shown in fig. 3, an embodiment of the present invention further provides a word recognition method, where the method uses the word embedding language model in the embodiment of the present invention, and the method includes:
S31, generating a word vector of the word to be recognized;
S32, determining the part-of-speech classification vector of the part-of-speech classification of the word to be recognized;
S33, inputting the word vector and the part-of-speech classification vector of the word to be recognized into the word embedding language model to obtain the probability distribution of the part-of-speech classification to which the word to be recognized belongs and the probability distribution of the word under that classification (an illustrative usage sketch follows these steps).
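Purely as an illustration, steps S31-S33 could be exercised for an in-vocabulary, high-frequency word against the hypothetical StructuredEmbeddingLM sketch above; the id lookups and tensor shapes are assumptions.

```python
import torch

def recognize(model, word_id, pos_id):
    """Run steps S31-S33 for one word with a trained model such as StructuredEmbeddingLM."""
    model.eval()
    with torch.no_grad():
        word_ids = torch.tensor([[word_id]])        # S31: word vector via embedding lookup
        pos_ids = torch.tensor([[pos_id]])          # S32: part-of-speech classification vector
        pos_logits, word_logits = model(word_ids, pos_ids)   # S33: run the language model
        p_pos = torch.softmax(pos_logits, dim=-1)   # distribution over POS classifications
        p_word = torch.softmax(word_logits, dim=-1)
    return p_pos, p_word
```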
The language model adopted in the embodiment of the present invention is not trained directly on the words in the corpus; instead, the attributes of all words are first determined, including the part-of-speech classification of each word, the probability distribution of all part-of-speech classifications, and the probability distribution of the words under each part-of-speech classification. Both morphological information and syntactic-level information of the words are considered during language model training. In particular, the introduction of syntactic-level information exploits the semantic commonality of words: through parameter sharing, an OOV word can reuse the parameters of the words in the word list that belong to the same part-of-speech classification. As a result, even when an OOV word is encountered in practical application, the trained language model can recognize it accurately from the morphological information of the OOV word and the syntactic-level information of its part-of-speech classification.
In addition, because extra information (semantic classification and morphological decomposition) is introduced, unseen new words can be modeled directly and more accurately. In the conventional approach, by contrast, modeling a new word requires collecting new data and retraining the model, which is very time-consuming. Therefore, in actual use, the model greatly reduces the time needed to add new words. The method can further be integrated into a speech recognition system to achieve fast word list updating while improving the recognition accuracy of low-frequency words.
As shown in fig. 4, in some embodiments, when the word to be recognized belongs to a vocabulary for training the word embedding language model, the generating a word vector of the word to be recognized includes:
S41, judging whether the word to be recognized is a low-frequency word;
S42, if yes, decomposing the word to be recognized into characters, and encoding the characters obtained by decomposition to determine the corresponding word vector;
and S43, if not, extracting the vector of the word to be recognized as the word vector.
In the embodiment of the present invention, different processing methods are adopted for high-frequency words and low-frequency words. For high-frequency words, each word has its own independent word vector (for example, a one-hot vector may be used). For low-frequency words, the word is first decomposed morphologically (for Chinese, decomposed into characters), and a sequence coding method is then used to convert the characters into a fixed-length vector; common coding methods include character-level fixed-size ordinally forgetting encoding (FOFE), direct addition of character vectors, encoding with a recurrent or convolutional neural network, and the like. In other words, in the embodiment of the present invention, a word is not treated as a single one-hot vector; it is classified at the syntactic level (by its POS, part-of-speech tag), i.e., given a one-hot representation at the syntactic level, and encoded from the morphological level with FOFE.
The distinction between high-frequency words and low-frequency words is made because not every word's meaning can be well represented by its characters; the embodiment of the present invention thus avoids the impact such cases would have on the performance of the language model.
In addition, a conventional language model needs a set of parameters for every word. In the method of the embodiment of the present invention, since all low-frequency words are decomposed into characters (morphological decomposition), parameters are needed only for the high-frequency words and the characters, so the required number of parameters is greatly reduced (usually by about 80%). The advantage of this small parameter count is that the word embedding language model obtained by the embodiment of the present invention can be deployed on smaller devices (such as mobile phones).
As shown in fig. 5, in some embodiments, when the word to be recognized does not belong to a vocabulary used for training the word embedding language model, the generating the word vector of the word to be recognized includes:
S51, determining the attributes of the word to be recognized to update the word list;
and S52, decomposing the word to be recognized into characters, and encoding the characters obtained by decomposition to determine the corresponding word vector.
This embodiment realizes rapid addition of new words to the word embedding language model. The method can further be integrated into a speech recognition system to achieve fast word list updating while improving the recognition accuracy of low-frequency words.
The following further describes the embodiment of the present invention by comparing the processing of OOV words by the conventional LSTM (Long Short-Term Memory) language model with the technical solution of the embodiment of the present invention.
Introduction of LSTM language model:
Deep learning methods have been widely applied to language modeling with great success. A long short-term memory (LSTM) network is a recurrent neural network (RNN) architecture that is particularly well suited to sequences. Let V be the vocabulary. At each time step t, the input word w_t is represented by a one-hot vector e_t, and its word embedding x_t is obtained as
x_t = E_i e_t    (1)
where E_i ∈ R^{m×|V|} is the input word embedding matrix and m is the dimension of the input word embedding. One step of the LSTM takes x_t, h_{t-1}, c_{t-1} as input and produces h_t, c_t; the details of the computation are omitted here. The probability distribution of the next word is computed at the output layer by an affine transformation of the hidden layer followed by the softmax function:
P(w_{t+1} = j | w_{1:t}) = exp(E_o^{j⊤} h_t + b_j) / Σ_{k ∈ V} exp(E_o^{k⊤} h_t + b_k)    (2)
where E_o^j is the j-th column of E_o ∈ R^{m×|V|}, also called the output embedding, and b_j is the bias term. We have found that the bias terms of the output layer play an important role and are highly correlated with word frequency.
Since most of the computational cost depends on the output layer, a factorized softmax output layer is proposed to increase the speed of the language model. This approach is based on the assumption that words can be mapped to classes. Let S be a class set. Unlike equation (2), the probability distribution of the next word of the factorized output layer is calculated as follows:
P(w_{t+1} = j | w_{1:t}) = P(s_{t+1} = s_j | h_t) P(w_{t+1} = j | s_j, h_t)    (3)
where s_j denotes the class of the word w_{t+1} and V_{s_j} is the set of all words of class s_j. Here, the probability calculation of a word is divided into two stages: we first estimate the probability distribution over the classes and then compute the probability of the particular word within the predicted class. In fact, a word may belong to multiple classes, but in this context each word is mapped to exactly one class, i.e., all classes are mutually exclusive. Common choices of classes are frequency-based classes or classes derived by data-driven methods.
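Formulas (4) and (5), which are referenced later in connection with the output class embedding matrix E_c and the output embedding matrix E_o, do not survive legibly in this text. As a reading aid only, a plausible reconstruction of the two factors of (3) is sketched below; it is an assumption, not verbatim from the disclosure.

```latex
% Assumed reconstruction of formulas (4) and (5), not verbatim from the disclosure.
\begin{align}
P(s_{t+1} = s_j \mid h_t)    &= \frac{\exp\big(E_c^{j\top} h_t + b^c_j\big)}
                                     {\sum_{s_k \in S} \exp\big(E_c^{k\top} h_t + b^c_k\big)} \tag{4} \\
P(w_{t+1} = j \mid s_j, h_t) &= \frac{\exp\big(E_o^{j\top} h_t + b_j\big)}
                                     {\sum_{k \in V_{s_j}} \exp\big(E_o^{k\top} h_t + b_k\big)} \tag{5}
\end{align}
```

In this reading, the class factor is normalized over all classes in S, while the word factor is renormalized only over the words V_{s_j} of the selected class.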
OOV word processing
As previously mentioned, two methods have been used in the classic LSTM language model to handle OOV word problems:
1. A special <UNK> class is used to replace all OOV words, and evaluation uses another measure, an adjusted perplexity, in which the probability of the <UNK> class is shared evenly among the OOV words:
P(w_{t+1} = j | w_{1:t}) = P(w_{t+1} = <UNK> | w_{1:t}) / |V_OOV|,  j ∈ V_OOV    (6)
where V_OOV is the set of all OOV words. We refer to this method as "unk" in the experiments.
2. The model is retrained with the updated vocabulary. Since OOV words have no or few positive examples in the training set, their probabilities will be assigned a small value after training. This approach may be similar to the smoothing approach used in n-gram language models. We refer to this method as "retraining" in the experiment.
Both of these conventional methods have disadvantages. The unk LSTM language model misestimates the probability of OOV words because the frequency of OOV words in the training data does not match that in the test data; furthermore, this approach ignores the linguistic information of the OOV words. The main problem with the retrained LSTM language model is that retraining is time-consuming.
In the traditional LSTM language model, the word embedding of each word is independent, which creates two problems. First, new words cannot obtain embeddings from the trained words. Second, the embeddings of rare words are poorly estimated because of the lack of training data. The motivation for structured word embedding is to use parameter sharing to solve both problems. Unlike a data-driven approach, the parameter-sharing approach must be based on explicit rules. By using syntactic and morphological rules, the shared parameters of OOV words can easily be found, and their structured word embeddings can be built in our model.
Morphological and syntactic structured embedding:
at a syntactic level, each word is assigned to a part-of-speech (POS) class. All words in the same POS (part-of-speech) class share the same POS class embedding, called syntactic embedding. A part of speech is a word with similar grammatical features. Therefore, we assume that syntactic embedding represents the basic syntactic function of a word.
For each word, we labeled its POS tag with several example sentences and selected the most common tag as a part of speech (POS tag is also available from dictionary). Example sentences for words in the vocabulary (IV) are selected from the training set. For OOV words, illustrative sentences may be composed of or selected from other data sources, such as network data. Unlike data-driven approaches, POS tag-based syntactic embedding can be easily generated for OOV words using rules.
Character (or sub-word) representations are widely used in many nlp (natural Language processing) tasks as an additional feature to improve the performance of low frequency words, particularly in morphologically rich languages. But for high frequency words, the improvement is limited. Morphological embedding is established herein to further capture the semantics of low frequency words. This is based on the assumption that the sparsity of data for low frequency words is less severe at the character level. For high frequency words, word embedding is preserved. Therefore, hybrid embedding, i.e., morphological embedding of low frequency words and word embedding of high frequency words, should be in the same dimension.
In previous literature, word embedding was combined with sub-word level features to obtain enhanced embedding of all words. In contrast, the morphological embedding of low frequency words proposed herein relies only on character-level features. Thus, it has the ability to model OOV words.
The proposed morphological embedding uses character-level fixed-size ordinally forgetting encoding (FOFE) to capture character information. In our model, every low-frequency word is represented by its character sequence e_{1:T}, where e_t is the one-hot representation of the character at time step t. FOFE encodes the entire sequence with a simple recursion (with z_0 = 0):
z_t = α z_{t-1} + e_t,  1 ≤ t ≤ T    (7)
where 0 < α < 1 is a constant forgetting factor that controls the influence of history on the final time step. In addition, a feed-forward neural network (FNN) is used to convert the character-level FOFE code into the final morphological embedding.
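A minimal sketch of the FOFE recursion in formula (7); the value α = 0.7 mirrors the experimental setting described later, while the function name and the toy character set are illustrative only.

```python
import numpy as np

def fofe_encode(chars, char_to_id, alpha=0.7):
    """Character-level FOFE code of a word: z_t = alpha * z_{t-1} + e_t (formula (7))."""
    z = np.zeros(len(char_to_id))
    for ch in chars:
        e = np.zeros(len(char_to_id))
        e[char_to_id[ch]] = 1.0          # one-hot representation of the character
        z = alpha * z + e                # recursive forgetting encoding
    return z                             # fixed-size vector, fed to an FNN afterwards

# Example: encode a two-character Chinese word with a toy character set.
char_to_id = {"上": 0, "海": 1, "人": 2}
code = fofe_encode("上海", char_to_id)   # -> [0.7, 1.0, 0.0]
```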
Combining structured embedding with the LSTM language model:
fig. 6 is a schematic structural diagram of a word embedding language model according to an embodiment of the present invention.
At the input layer, the structured embedding of an input word is obtained by concatenating its syntactic embedding with its word embedding (for a high-frequency word) or its morphological embedding (for a low-frequency word).
At the output layer, the factorized softmax structure is used: the output class embedding matrix E_c in formula (4) is replaced by the syntactic embeddings, and the output embedding matrix E_o in formula (5) is replaced by the word and morphological embeddings.
Once training is complete, the syntactic and morphological embeddings of OOV words are readily available. To calculate the probability of an OOV word, the output-layer parameters in formula (5), E_o and b, need to be reconstructed. All embeddings and bias terms of IV words in E_o and b are retained, and the embedding of each OOV word in E_o is filled in with its morphological embedding. In experiments, we find that the bias term is highly correlated with word frequency: the higher the word frequency, the larger the bias value. Here, the bias term of an OOV word is therefore set to a small empirical constant.
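For illustration only, the output-layer reconstruction described above might look as follows; morph_embed stands for the FOFE-plus-FNN morphological encoder, and the constant bias value is an assumption in the spirit of the text.

```python
import numpy as np

def extend_output_layer(E_o, b, oov_words, morph_embed, oov_bias=0.0):
    """Add OOV rows to the output embedding E_o and bias b without retraining.

    E_o        : output embeddings of in-vocabulary words, shape (|V_iv|, m)
    b          : corresponding bias terms, shape (|V_iv|,)
    oov_words  : list of OOV word strings
    morph_embed: callable mapping a word to its morphological embedding of shape (m,)
    """
    oov_rows = np.stack([morph_embed(w) for w in oov_words])   # fill with morphological embeddings
    oov_bias_terms = np.full(len(oov_words), oov_bias)         # small empirical constant bias
    return np.vstack([E_o, oov_rows]), np.concatenate([b, oov_bias_terms])
```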
By utilizing structured word embedding, OOV words can be incorporated into the LSTM language model without requiring retraining. As we have mentioned earlier, the data sparsity of OOV words can also be mitigated by sharing parameters in the proposed model during training.
The structured word embedding language model provided by the embodiment of the invention also realizes parameter compression. In the LSTM language model, word embedding of low frequency words occupies a significant portion of the model parameters but is not well-trained. By replacing low frequency words with character representations, the number of parameters can be greatly reduced.
In the LSTM language model, the number of word embedding parameters is 2 × |V| × H, whereas in the structured-embedding LSTM language model the total is (|V_h| + |V_char| + |S|) × H, where V_h denotes the set of high-frequency words, V_char denotes the character set, and S denotes the set of POS tags. Experiments show that the number of parameters can be reduced by nearly 90% when |V| = 60000, |V_h| = 8000, |V_char| = 5000, and |S| = 32.
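As a rough arithmetic check of the stated reduction, under the formula as reconstructed above and the quoted sizes (the exact accounting in the original may differ):

```latex
\begin{align*}
\text{baseline: }   & 2\,|V|\,H = 2 \times 60000 \times H = 120000\,H \\
\text{structured: } & (|V_h| + |V_\text{char}| + |S|)\,H = (8000 + 5000 + 32)\,H = 13032\,H \\
\text{reduction: }  & 1 - \tfrac{13032\,H}{120000\,H} \approx 0.89
\end{align*}
```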
In order to verify that the method and system of the present invention achieve the expected effect, the inventors performed tests on a short message service (SMS) data set to evaluate the word embedding language model (hereinafter referred to as the structured word embedding LSTM language model).
TABLE 1 data set information
Table 1 gives the details of the data sets. Two vocabularies of different sizes are used for each data set: the complete vocabulary V_f covers all words that appear in the corpus, while the small vocabulary V_s is a subset of V_f. Here, an in-vocabulary (IV) word is defined as a word in V_s, and an out-of-vocabulary (OOV) word is a word in V_f but not in V_s. The sms-30m data set was also used as the training set, together with a Mandarin spontaneous dialog test set (about 25 hours, 3K utterances), for the ASR (automatic speech recognition) rescoring task. Two baseline setups are compared:
1. The LSTM language model is trained with the small vocabulary V_s, and all OOV words are treated as a single <UNK> symbol; this model is called "unk".
2. The LSTM language model is retrained with the complete vocabulary V_f; this model is referred to as "retraining".
For the LSTM language model with structured word embedding, the small vocabulary V_s is used in the training phase, and the model vocabulary is updated to V_f in the testing phase.
To match the size of the proposed model, the input embedding size of the LSTM baseline is set to 600 and the output embedding size to 300. In the LSTM language model with structured embedding, the syntactic embedding size is set to 300. A 1-layer 5000-300 FNN is used for the FOFE encoding, where 5000 is the size of the character set V_char. The forgetting factor α of FOFE is set to 0.7, and the bias term of new words is set to 0; these two empirical parameters are tuned on the development set. The most frequent 8192 words are selected as high-frequency words, and all other words are treated as low-frequency words in our model.
Perplexity evaluation
The results of the perplexity evaluation are shown in Table 2. In particular, for the "unk" LSTM, the PPL of OOV words is calculated by equation (6). The results show that the proposed structured embedding (SE) method performs similarly to the unk LSTM, whereas the retrained LSTM performs worse. For further investigation, we computed the PPL of in-vocabulary (IV) and out-of-vocabulary (OOV) words separately for each model; the results are shown in Table 3. The unk LSTM performs best on IV words at the expense of OOV words, whose PPL is very high. Relative to the unk LSTM, the retrained LSTM greatly improves the PPL of OOV words but degrades on IV words. Our approach further improves the PPL of OOV words while performing similarly on IV words.
TABLE 2 Perplexity comparison of different OOV handling methods
TABLE 3 Perplexity breakdown for in-vocabulary and out-of-vocabulary words
Fast vocabulary update in ASR
In an automatic speech recognition (ASR) system, a back-off n-gram language model is used to generate a lattice, from which an n-best list is generated. The n-best list can then be rescored with a neural network language model for better performance. Typically, the n-gram and neural network language models share the same vocabulary, so both need to be retrained when the vocabulary is updated. Compared with the neural network language model, the training time of the n-gram language model is negligible.
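A minimal sketch of the n-best rescoring described above; the score interpolation weight and the per-hypothesis data layout are assumptions, not details from the disclosure.

```python
def rescore_nbest(nbest, nn_lm_logprob, lm_weight=0.5):
    """Re-rank an n-best list with a neural network language model.

    nbest        : list of (sentence, acoustic_plus_ngram_score) pairs
    nn_lm_logprob: callable returning the NN LM log-probability of a sentence
    lm_weight    : interpolation weight for the NN LM score (assumed)
    """
    rescored = [(sent, base + lm_weight * nn_lm_logprob(sent)) for sent, base in nbest]
    return max(rescored, key=lambda x: x[1])[0]   # best hypothesis after rescoring
```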
The experiment is divided into two stages. In the first stage, the LSTM language model and the LSTM language model with structured embedding (SE) are trained with the small vocabulary V_s, denoted "unk LSTM" and "LSTM with SE," respectively. An n-gram language model trained with V_s is used to generate the n-best list, and the unk LSTM is then used to rescore the n-best list. In the second stage, the vocabulary V_s is expanded to the larger vocabulary V_f. As the vocabulary changes, the unk LSTM and the n-gram model must be retrained, whereas the vocabulary of the LSTM with SE is reconstructed without retraining. The retrained LSTM and the LSTM with SE are then used to rescore the n-best list generated by the new n-gram model.
TABLE 4 Character error rate (CER) comparison and breakdown for in-vocabulary and out-of-vocabulary sentences
The results of the experiment are shown in Table 4. Benefiting from the vocabulary expansion, the retrained LSTM achieves an absolute 0.38% CER improvement over all sentences, and the proposed LSTM model with structured embedding (LSTM with SE) achieves the best performance. To investigate which sentences benefit most from the proposed model, we divided the rescored sentences into two categories, in-vocabulary sentences (IVS) and out-of-vocabulary sentences (OOVS), depending on whether all of their words are present in V_s. As shown in Table 4, the unk LSTM trained with V_s has a higher CER on out-of-vocabulary sentences because the n-gram model built from V_s cannot produce these OOV words. By expanding the vocabulary, the retrained LSTM yields a significant CER improvement on out-of-vocabulary sentences. Compared with the retrained LSTM, the proposed model achieves a lower CER on both IV and OOV sentences. Moreover, the CER improvement on OOV sentences (1.13% absolute) is significantly larger than that on IV sentences (0.13% absolute), meaning that the LSTM with SE models OOV words better. Note that with the proposed structured word embedding LSTM language model, the model retraining time required by the conventional method is saved while better performance is achieved.
It should be noted that for simplicity of explanation, the foregoing method embodiments are described as a series of acts or combination of acts, but those skilled in the art will appreciate that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
As shown in FIG. 7, an embodiment of the present invention further provides a word embedding language model training system 700, including:
a word list generation program module 710, configured to determine attributes of all words in the corpus to generate a word list, where the attributes include part-of-speech classifications of all words, probability distributions of all part-of-speech classifications, and probability distributions of all words under the part-of-speech classifications to which the words belong;
a word vector generator module 720, configured to generate word vectors of all words in the word list;
a part-of-speech classification vector generation program module 730 for generating part-of-speech classification vectors corresponding to the part-of-speech classifications of all the words in the vocabulary;
and the model training program module 740 is configured to train by taking word vectors of words in the word list and part-of-speech classification vectors of the words in the word list as inputs and taking probability distribution of part-of-speech classifications to which the words in the word list belong and probability distribution of the words in the word list under the part-of-speech classifications to which the words belong as outputs, so as to obtain the word embedding language model.
As shown in fig. 8, in some embodiments, the word vector generator module 720 includes:
a frequency word determination program unit 721 configured to determine whether a word obtained from the vocabulary is a low frequency word;
a first word vector generating program unit 722, configured to, when it is determined that a word obtained from the word list is a low-frequency word, decompose the word into characters and encode the characters obtained by decomposition to determine the corresponding word vector;
a second word vector generating program unit 723, configured to, when it is determined that a word obtained from the word list is a high-frequency word, extract a vector of the word obtained from the word list as a word vector.
As shown in fig. 9, an embodiment of the present invention further provides a word recognition system 900, including:
the word embedding language model 910 described in the above embodiments of the present invention;
a word vector generating program module 920, configured to generate a word vector of a word to be recognized;
a part-of-speech classification vector generation program module 930, configured to determine a part-of-speech classification vector of the part-of-speech classification of the word to be recognized;
a word recognition program module 940, configured to input the word vector and the part-of-speech classification vector of the word to be recognized into the word embedding language model, so as to obtain a probability distribution of the part-of-speech classification to which the word to be recognized belongs and a probability distribution of the word to be recognized under the part-of-speech classification to which the word to be recognized belongs.
As shown in fig. 10, in some embodiments, when the word to be recognized belongs to a vocabulary for training the word embedding language model, the word vector generator module 920 includes:
a frequency word judging program unit 921 for judging whether the word to be recognized is a low frequency word;
a first word vector generation program unit 922, configured to, when the word to be recognized is determined to be a low-frequency word, decompose the word to be recognized into characters and encode the characters obtained by decomposition to determine the corresponding word vector;
and a second word vector generation program unit 923, configured to, when the word to be recognized is determined to be a high-frequency word, extract the vector of the word to be recognized as the word vector.
As shown in fig. 11, in some embodiments, when the word to be recognized does not belong to the vocabulary used for training the word embedding language model, the word vector generator module 920 includes:
a vocabulary updating program unit 921' for determining the attribute of the word to be recognized to update the vocabulary;
and the word vector generation program unit 922' is configured to decompose the word to be recognized into characters, and encode the characters obtained by decomposition to determine the corresponding word vector.
In some embodiments, the present invention provides a non-transitory computer-readable storage medium, in which one or more programs including executable instructions are stored, and the executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any one of the word embedding language model training method and/or the word recognition method of the present invention.
In some embodiments, the present invention further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform any of the above word embedding language model training methods and/or word recognition methods.
In some embodiments, an embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of a word embedding language model training method and/or a word recognition method.
In some embodiments, the present invention further provides a storage medium having a computer program stored thereon, wherein the program is executed by a processor to perform the steps of a word embedding language model training method and/or a word recognition method.
The word embedding language model training system and/or the word recognition system according to the embodiments of the present invention may be used to execute the word embedding language model training method and/or the word recognition method according to the embodiments of the present invention, and accordingly achieve the technical effects achieved by those method embodiments, which are not described herein again.
In the embodiment of the present invention, the relevant functional module may be implemented by a hardware processor (hardware processor).
Fig. 12 is a schematic hardware structure diagram of an electronic device for a word embedding language model training method and/or a word recognition method according to another embodiment of the present application, where as shown in fig. 12, the device includes:
one or more processors 1210 and a memory 1220, with one processor 1210 being an example in fig. 12.
The apparatus for performing the method for implementing the word embedding language model training method and/or the word recognition method may further include: an input device 1230 and an output device 1240.
The processor 1210, memory 1220, input device 1230, and output device 1240 may be connected by a bus or other means, such as by a bus connection in fig. 12.
The memory 1220 is a non-volatile computer-readable storage medium, and can be used for storing non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the method for implementing the word embedding language model training and/or the word recognition method in the embodiments of the present application. The processor 1210 executes various functional applications of the server and data processing by executing nonvolatile software programs, instructions and modules stored in the memory 1220, that is, implementing the above method embodiments to implement the word embedding language model training method and/or the word recognition method.
The memory 1220 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the stored data area may store data created from use of the device for implementing word embedding language model training and/or the word recognition device, and the like. Further, the memory 1220 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 1220 optionally includes memory located remotely from processor 1210, and such remote memory may be connected to the word embedding language model training device and/or the word recognition device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 1230 may receive input numeric or character information and generate signals related to user settings and function controls of the word embedding language model training device and/or the word recognition device. The output device 1240 may include a display device such as a display screen.
The one or more modules are stored in the memory 1220 and, when executed by the one or more processors 1210, perform the word embedding language model training method and/or the word recognition method of any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) mobile communication devices, which are characterized by mobile communication capabilities and are primarily targeted at providing voice and data communications. Such terminals include smart phones (e.g., iphones), multimedia phones, functional phones, and low-end phones, among others.
(2) The ultra-mobile personal computer equipment belongs to the category of personal computers, has calculation and processing functions and generally has the characteristic of mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as ipads.
(3) Portable entertainment devices such devices may display and play multimedia content. Such devices include audio and video players (e.g., ipods), handheld game consoles, electronic books, as well as smart toys and portable car navigation devices.
(4) The server is similar to a general computer architecture, but has higher requirements on processing capability, stability, reliability, safety, expandability, manageability and the like because of the need of providing highly reliable services.
(5) And other electronic devices with data interaction functions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions substantially or contributing to the related art may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.