CN110288081A - A recursive network model and learning method based on the FW mechanism and LSTM - Google Patents
A recursive network model and learning method based on the FW mechanism and LSTM
- Publication number
- CN110288081A (application number CN201910476156.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- module
- unit
- recursive networks
- training
- Prior art date
- Legal status: Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to a recursive network model and learning method based on the FW (Fast Weights) mechanism and LSTM, belonging to the field of recurrent neural networks and natural language processing. It comprises the recursive network model based on the FW mechanism and LSTM and the learning method it supports. The model comprises a data import module, a data generation module, a load-and-iteration module, a parameter setting module, a definition module, and recursive network training, evaluation and test modules. The learning method comprises: (1) importing the data; (2) splitting the imported data into training data, evaluation data and test data; (3) obtaining pre-set configuration parameters according to the imported data; (4) completing the initialization of the weight parameters; (5) feeding the training, evaluation and test data into the LSTM units and computing the output vectors; (6) computing the loss function, optimizing the network parameters and outputting the perplexity. The network model and learning method further improve the accuracy and convergence speed of LSTM model processing.
Description
Technical field
The present invention relates to a recursive network model and learning method based on the FW mechanism and LSTM, and belongs to the field of recurrent neural networks and natural language processing.
Background art
Natural language processing models generally use a recurrent neural network (RNN) structure. An RNN contains variables on two time scales: the hidden-layer state and the weights. The hidden-layer state is updated once at every time step, whereas the weights are updated only after all the information in a sequence has entered the network. The weights, which represent the connection relationships between layers, therefore correspond to the "long-term memory" of the network. In real networks, however, the rate at which the inter-layer relationships change is often related to the length of the input sequence: an update may be needed after 3 or 5 time steps, or after 30 or 50.
The language model based on LSTM units is one of the widely used improvements of the RNN. By training on text data, it predicts the next word that will appear from the input text. The network parameters are initialized with zero vectors and updated after each word is read. When processing the input data, the model optimizes the network parameters by back-propagation: the input data, i.e. paragraphs composed of several sentences, is divided into input blocks of fixed length, each containing a fixed number of words, and back-propagation is executed to update the network parameters after each input block has been processed.
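For illustration, a minimal sketch (not taken from the patent; the batch size and block length below are assumptions) of how a word-ID sequence can be cut into fixed-length input blocks for this kind of truncated back-propagation:

```python
import numpy as np

def make_input_blocks(word_ids, batch_size, num_steps):
    """Split a flat sequence of word IDs into fixed-length input blocks.

    Back-propagation is run after each [batch_size, num_steps] block,
    as described above (truncated back-propagation through time).
    """
    word_ids = np.asarray(word_ids)
    # Drop the tail so the data reshapes evenly into batch_size rows.
    n_blocks = len(word_ids) // (batch_size * num_steps)
    data = word_ids[: n_blocks * batch_size * num_steps]
    data = data.reshape(batch_size, n_blocks * num_steps)
    # Yield consecutive blocks of num_steps columns.
    for i in range(n_blocks):
        yield data[:, i * num_steps:(i + 1) * num_steps]

# Example: a toy corpus of 1000 word IDs split into 20x10 blocks.
blocks = list(make_input_blocks(np.arange(1000), batch_size=20, num_steps=10))
print(len(blocks), blocks[0].shape)   # 5 (20, 10)
```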
Jimmy Ba et al. proposed the Fast Weights (FW) mechanism, which introduces a new variable whose update period lies between the two time scales of the hidden-layer state and the weights, and which stores rapidly updated hidden-layer states; it has been shown to be very effective for learning in sequence-to-sequence models. Following this idea, a new variable is introduced while the existing hidden-layer state and the standard weights are retained; its update period is longer than that of the hidden layer but shorter than that of the standard weights, and it is therefore called the fast weight.
In terms of neural network training, complex and time-consuming processing is generally required to obtain good learning performance, which demands a high time and computation cost. To reduce this cost, researchers often resort to batch processing. Batch normalization is one typical such technique, but its effect on recurrent neural networks is not obvious. G. Hinton et al. therefore proposed layer normalization (LN), which is implemented by computing the mean and standard deviation of all hidden units in one hidden-layer state of a training sample of the recurrent neural network. In the fast weights mechanism, LN is used to solve the overflow problem of the hidden-layer update values as training proceeds.
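A minimal sketch of layer normalization as described above, computed over all hidden units of one hidden-layer state; the small epsilon constant and the omission of learned gain and bias parameters are simplifying assumptions:

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    """Layer normalization of a hidden-layer state.

    h: array of shape [batch_size, hidden_units].
    The mean and standard deviation are taken over the hidden units of
    each sample, so the statistics do not depend on the batch.
    """
    mean = h.mean(axis=-1, keepdims=True)
    std = h.std(axis=-1, keepdims=True)
    return (h - mean) / (std + eps)

h = np.random.randn(20, 50) * 10 + 3        # a large, shifted hidden state
print(layer_norm(h).mean(axis=-1)[:3])      # ~0 per sample
print(layer_norm(h).std(axis=-1)[:3])       # ~1 per sample
```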
The evaluation metrics used to measure language model performance are the perplexity and the loss. The perplexity represents the average number of candidate words the language model considers when predicting the next word from the preceding words of a sentence, after learning the text data. For example, if a sequence is formed at random from the five letters A, B, C, D and E without any regularity, then when the next letter is predicted there are 5 equally probable options and the perplexity is 5. Thus, if the perplexity of a language model is K, then on average K words have the same probability of being a reasonable prediction for the next word. Taking the PTB model as an example, the perplexity used to evaluate its performance is computed by formula (1):

perplexity = exp(-(1/N) * Σ_{i=1..N} ln p_target_i)   (1)

where p_target_i is the probability of the i-th target word, N is the total number of target words, and ln is the natural logarithm.
The other evaluation metric, the loss, is defined as the average negative logarithm of the target-word probabilities, as in formula (2):

loss = -(1/N) * Σ_{i=1..N} ln p_target_i   (2)

The relationship between the perplexity and the loss is given by formula (3):

perplexity = e^loss   (3)

When the language model learns the logical relationships between the words of a sentence, its learning ability becomes stronger: when predicting the next word from the preceding words, the number of candidate words becomes smaller and the corresponding perplexity becomes lower. The perplexity therefore reflects the learning performance of the network well: the lower the perplexity, the stronger the network's ability to predict the next word in a sentence and the better its effect.
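The relationship between the loss and the perplexity in formulas (1)-(3) can be checked with a short sketch; the word probabilities below are made-up illustrative values:

```python
import numpy as np

# Probabilities the model assigned to each target word (illustrative values).
p_target = np.array([0.25, 0.10, 0.05, 0.40, 0.02])

loss = -np.mean(np.log(p_target))   # formula (2): average negative log-probability
perplexity = np.exp(loss)           # formula (3): perplexity = e^loss
print(loss, perplexity)

# A uniform 1-of-5 model gives perplexity 5, matching the A..E example above.
print(np.exp(-np.mean(np.log(np.full(5, 0.2)))))   # 5.0
```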
Summary of the invention
The purpose of the present invention is to further improve the state of the art of existing LSTM-based recurrent neural networks, whose perplexity performance still needs improvement when processing natural language with strong temporal associations; to this end, a recursive network model and learning method based on the FW mechanism and LSTM are proposed.
The recursive network model and learning method based on the FW mechanism and LSTM comprise the recursive network model based on the FW mechanism and LSTM and the learning method it relies on.
The recursive network model based on the FW mechanism and LSTM comprises a data import module, a data generation module, a load-and-iteration module, a parameter setting module, a definition module, a recursive network training module, a recursive network evaluation module and a recursive network test module.
The data generation module in turn contains a data split unit; the load-and-iteration module contains a data loading unit and an iteration unit.
The data split unit contains a training data generation unit, an evaluation data generation unit and a test data generation unit.
The recursive network training module contains a dropout unit, an updating unit and a result storage unit; the recursive network evaluation module and the recursive network test module contain only an updating unit and a result storage unit.
The updating unit contains a long short-term memory (LSTM) unit and a fast weight unit.
The connection relationships of the modules of the recursive network model based on the FW mechanism and LSTM are as follows:
the data import module is connected with the data generation module; the data generation module is connected with the load-and-iteration module; the parameter setting module is connected with the load-and-iteration module and the definition module; the recursive network training module is connected with the load-and-iteration module, the recursive network evaluation module and the definition module; the recursive network evaluation module is connected with the load-and-iteration module, the recursive network training module, the recursive network test module and the definition module; the recursive network test module is connected with the load-and-iteration module, the recursive network evaluation module and the definition module.
The connection relationships of the units in the data generation module are as follows: the training data, evaluation data and test data in the data split unit are connected with the training label generation unit, the evaluation label generation unit and the test label generation unit respectively.
The connection relationships of the units in the load-and-iteration module are as follows: the data loading unit is connected with the iteration unit.
The signal generation and output relationships of the modules of the recursive network model based on the FW mechanism and LSTM are as follows:
the output of the data import module feeds the data generation module; after being processed by the data generation module, the data feeds the load-and-iteration module; the parameter setting module provides the input parameters and FW model parameters for the load-and-iteration module and the definition module; the load-and-iteration module provides the training data and training labels, the evaluation data and evaluation labels, and the test data and test labels to the recursive network training module, recursive network evaluation module and recursive network test module respectively; the definition module inputs the FW model parameters to the recursive network training module, recursive network evaluation module and recursive network test module respectively; the recursive network training module feeds the trained network parameters into the recursive network evaluation module; the recursive network evaluation module feeds the evaluated network parameters into the recursive network test module.
The connection relationships of the units in the recursive network training, evaluation and test modules are as follows:
the dropout unit receives the data and is connected with the LSTM unit; the LSTM unit is connected with the data input and with the fast weight unit; the result storage unit is connected with the fast weight unit and the result.
The learning method relied on by the recursive network model based on the FW mechanism and LSTM comprises the following steps:
Step 1: the data to be trained and tested is imported through the data import module, specifically: the text data is obtained by reading the text path;
Step 2: the data generation module splits the data imported through the data import module with the data split unit, obtaining training data, evaluation data and test data respectively;
wherein the splitting is specifically: the text data imported in Step 1 is split into sentences of j characters each;
wherein the value of j ranges from 5 to 50;
Step 3: the training data generation unit randomly selects a proportion x% of the data split by the data split unit to generate the training set; the evaluation data generation unit randomly selects a proportion y% to generate the evaluation set; the test data generation unit randomly selects a proportion z% to generate the test set;
wherein x% + y% + z% = 1;
Step 4: the training label generation unit shifts each item of data in the training set generated by the training data generation unit back by one position to obtain the training labels; the evaluation label generation unit shifts each item of data in the evaluation set generated by the evaluation data generation unit back by one position to obtain the evaluation labels; the test label generation unit shifts each item of data in the test set back by one position to obtain the test labels.
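A minimal sketch of Steps 2-4 under illustrative assumptions (the split ratios, the sentence length j and the exact form of the one-position shift are not fixed by the text above):

```python
import random

def split_and_label(text, j=20, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split text into j-character sentences, partition them into
    training / evaluation / test sets, and build labels by shifting
    each sequence back by one position (next-character prediction)."""
    sentences = [text[i:i + j] for i in range(0, len(text) - j, j)]
    random.Random(seed).shuffle(sentences)

    n = len(sentences)
    n_train, n_eval = int(ratios[0] * n), int(ratios[1] * n)
    splits = {
        "train": sentences[:n_train],
        "eval": sentences[n_train:n_train + n_eval],
        "test": sentences[n_train + n_eval:],
    }
    # Label = the same sequence shifted back by one element.
    return {name: [(s[:-1], s[1:]) for s in data] for name, data in splits.items()}

data = split_and_label("the quick brown fox jumps over the lazy dog " * 50)
x, y = data["train"][0]
print(repr(x), "->", repr(y))
```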
Step 5: the parameter setting module obtains the configuration parameters according to the scale of the text imported by the data import module, and the obtained configuration parameters are input into the parameter setting module;
wherein the configuration parameters include the initial scale, learning rate, maximum gradient norm, number of layers, number of steps, hidden-layer size, maximum epoch number, maximum-maximum epoch value, dropout rate, attenuation rate, batch size and vocab size;
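A configuration of the kind listed in Step 5 might look like the following dictionary; the key names and all values are illustrative assumptions rather than settings prescribed by the patent:

```python
# Illustrative configuration; the keys mirror the parameters of Step 5.
config = {
    "init_scale": 0.1,       # initial scale of the uniform weight initialization
    "learning_rate": 0.5,    # learning rate
    "max_grad_norm": 5,      # maximum gradient norm (gradient clipping)
    "num_layers": 1,         # number of layers
    "num_steps": 10,         # number of steps (words treated as one sentence)
    "hidden_size": 50,       # hidden-layer size
    "max_epoch": 5,          # maximum epoch number
    "max_max_epoch": 13,     # maximum-maximum epoch value
    "keep_prob": 0.8,        # 1 - dropout rate
    "decay": 0.9,            # attenuation rate
    "batch_size": 20,        # batch size
    "vocab_size": 10000,     # vocab size
}
```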
Step 6: the data loading unit in the load-and-iteration module loads the data of the training set, evaluation set and test set according to the configuration parameters obtained in the parameter setting module, and the initial data serial number i is set to 1;
Step 7: according to the configuration parameters in the parameter setting module, the definition module uses a pseudo-random function to generate random values within the configured range as the weight matrix parameters, completing the initialization of the weight parameters;
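A minimal sketch of the weight initialization of Step 7, drawing uniform pseudo-random values within the configured range; the matrix shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=42)        # pseudo-random generator
init_scale, hidden_size, vocab_size = 0.1, 50, 10000

# Random values in [-init_scale, init_scale] as the weight matrix parameters.
Wx = rng.uniform(-init_scale, init_scale, size=(hidden_size, vocab_size))   # input-layer weights
Wh = rng.uniform(-init_scale, init_scale, size=(hidden_size, hidden_size))  # standard weights
print(Wx.shape, Wh.shape, float(Wx.min()), float(Wx.max()))
```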
Step 8: the iteration module in the load-and-iteration module judges whether the data in the current data set has all been sent and operates according to the judgement, specifically:
if the data in the current data set has not all been sent, the i-th group of data is sent, it is judged whether the data is for training, evaluation or testing, and the method goes to Step 9; otherwise the iteration stops;
Step 9: it is judged whether the current data is training data; if so, the input data is sampled according to the dropout rate and the sampled data goes to Step 10; otherwise the method goes to Step 10 directly;
Step 10: the data input in Step 9 is fed into the long short-term memory unit and the fast weight unit in the updating unit to compute the output vector, while the network is optimized by gradient descent, specifically:
Step 10.1: the updating unit computes the initial hidden-layer state from the input-layer weights Wx and the standard weights Wh, using formula (4) to compute the initial hidden state at the current time t:

h_0(t) = f(LN(Wx * x_t + Wh * h_{t-1}))   (4)

wherein the input-layer weights are denoted Wx and the standard weights Wh; h_0 is the initial hidden-layer state; LN is the layer-normalization function; f is the activation function; x_t is the input-layer data at the current time t; h_{t-1} is the data corresponding to the hidden-layer state (referred to simply as the hidden-layer state) at the previous time t-1;
preferably, the activation function f is one of the SeLU function, the Leaky ReLU function and the Swish function;
the standard weights Wh are the weights with which the hidden layer propagates to the next time step in the RNN network; the input-layer weights Wx are the weights with which the input layer propagates to the hidden layer;
Step 10.2: the fast weight unit computes the fast weight, specifically by formula (5):

W_A(t) = λ * W_A(t-1) + η * h_{t-1} * h_{t-1}^T   (5)

wherein W_A(t) is the fast weight at time t, a weight that acts only within each time step of the hidden layer; the total number of updates within one time step is denoted s+1; λ is the attenuation rate and η is the learning rate; h_{t-1} is the hidden-layer state at time t-1; h_{t-1}^T is the transpose of h_{t-1}, i.e. of the hidden-layer state at time t-1;
wherein s in the total number of updates s+1 of one time step is the number of steps;
wherein the attenuation rate ranges from 0.9 to 0.995 and the learning rate from 0.3 to 0.8;
Step 10.3: the fast weight unit computes the hidden-layer state and updates the hidden-layer state s times;
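A minimal NumPy sketch of Steps 10.1-10.3: formula (4) for the initial hidden state, formula (5) for the fast weight, then s inner updates of the hidden state. The exact form of the inner update, combining the preliminary term with the fast-weight term inside the layer-normalized activation, follows the fast-weights formulation of Ba et al. and is an assumption, not a verbatim reproduction of the patent:

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    return (h - h.mean(-1, keepdims=True)) / (h.std(-1, keepdims=True) + eps)

def selu(x):  # one of the activation choices named in Step 10.1
    alpha, scale = 1.6732632423543772, 1.0507009873554805
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1))

def fw_step(x_t, h_prev, W_A, Wx, Wh, lam=0.95, eta=0.5, s=7):
    """One time step of the fast-weight recurrence (single sample, 1-D vectors)."""
    # Formula (5): W_A(t) = lambda * W_A(t-1) + eta * h_{t-1} h_{t-1}^T
    W_A = lam * W_A + eta * np.outer(h_prev, h_prev)
    # Formula (4): initial hidden state h_0(t)
    preliminary = Wx @ x_t + Wh @ h_prev
    h = selu(layer_norm(preliminary))
    # Step 10.3: s inner updates of the hidden state using the fast weight.
    for _ in range(s):
        h = selu(layer_norm(preliminary + W_A @ h))
    return h, W_A

hidden, n_in = 50, 50
rng = np.random.default_rng(0)
h, W_A = np.zeros(hidden), np.zeros((hidden, hidden))
h, W_A = fw_step(rng.standard_normal(n_in), h, W_A,
                 rng.uniform(-0.1, 0.1, (hidden, n_in)),
                 rng.uniform(-0.1, 0.1, (hidden, hidden)))
print(h.shape, W_A.shape)
```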
Step 10.4: the slow weight unit computes the normalized output;
wherein the normalized output of the network is realized by either the Softmax or the sigmoid function;
Step 10.5: the result storage unit computes the loss and the perplexity from the normalized output computed in Step 10.4;
Step 10.6: the slow weight unit judges whether the last epoch has been reached; if not, the updating unit updates the hidden-layer state and the training or test parameters, the current i is incremented by 1, and the method goes to Step 8.
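A short sketch of Steps 10.4-10.5, producing the softmax-normalized output over the vocabulary and the loss and perplexity stored by the result storage unit; the output projection matrix Wy and all sizes are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max(-1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

rng = np.random.default_rng(0)
vocab_size, hidden = 10000, 50
Wy = rng.uniform(-0.1, 0.1, (hidden, vocab_size))    # output projection (assumed)

h = rng.standard_normal((20, hidden))                # hidden states of one batch
targets = rng.integers(0, vocab_size, size=20)       # target word IDs

probs = softmax(h @ Wy)                              # Step 10.4: normalized output
loss = -np.mean(np.log(probs[np.arange(20), targets]))  # formula (2)
print(loss, np.exp(loss))                            # loss and perplexity, formula (3)
```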
Beneficial effects
Compared with the prior art, the recursive network model and learning method based on the FW mechanism and LSTM of the present invention have the following advantages:
1. The recursive network model introduces the fast weight and LSTM mechanisms; by optimizing the attenuation coefficient and learning rate parameters, the learning accuracy of the network model, which stores short-term memory information, is substantially improved.
2. Compared with the existing LSTM model and with the RNN model with fast weights, the training method of the model uses the LSTM combined with the SeLU activation function and layer normalization, which greatly improves the convergence speed of training, evaluation and testing.
Brief description of the drawings
Fig. 1 is a schematic diagram of the composition of the recursive network model based on the FW mechanism and LSTM of the present invention and the connections between its modules;
Fig. 2 is a schematic diagram of the composition and connections of the data generation module in the recursive network model based on the FW mechanism and LSTM of the present invention;
Fig. 3 is a schematic diagram of the composition of the load-and-iteration module in the recursive network model based on the FW mechanism and LSTM of the present invention, and of its connections with the data generation module, parameter setting module, definition module, recursive network training module, recursive network evaluation module and recursive network test module;
Fig. 4 is a schematic diagram of the relationship and composition of the recursive network training module, the recursive network evaluation module and the recursive network test module in the recursive network model based on the FW mechanism and LSTM of the present invention;
Fig. 5 is a schematic diagram of the composition of the LSTM unit and the fast weight unit in the recursive network model based on the FW mechanism and LSTM of the present invention;
Fig. 6 compares the learning effect of the method relied on by the recursive network model of the present invention for different batch sizes when processing a strongly associated short-sentence text data set;
Fig. 7 compares the log(perplexity) of different models of the method relied on by the recursive network model of the present invention when processing a strongly associated short-sentence text data set.
Detailed description of the embodiments
The recursive network model and learning method based on the FW mechanism and LSTM of the present invention are further explained and described in detail below with reference to the drawings and embodiments.
Embodiment 1
This embodiment describes the composition and workflow of the recursive network model based on the FW mechanism and LSTM of the present invention.
In this implementation, the experiment uses a representative short-sentence corpus from the NLTK text corpora widely used in natural language processing: the European Parliament proceedings corpus europarl_raw.
The europarl_raw text data comes from parliamentary dialogue; most sentences are of short to medium length, about ten words, with relatively simple, mostly subject-verb-object structure. In this embodiment, the data set is processed by the modules of Fig. 1.
Fig. 1 shows the composition of the recursive network model based on the FW mechanism and LSTM and the connections between its modules. As can be seen from Fig. 1, the data imported by the data import module is fed into the data generation module; the data generation module generates the training data, evaluation data and test data together with their labels and inputs them into the load-and-iteration module; the load-and-iteration module and the definition module receive the parameters of the parameter setting module and are connected to the recursive network training module, evaluation module and test module respectively, which carry out the training, evaluation and testing.
First, the data import module imports the text data by reading the text path; after import, the data is output to the data generation module. The data generation module further splits the raw data into training data, evaluation data and test data, and the training label generation unit, evaluation label generation unit and test label generation unit then generate the labels of each data set; the structure is shown in the connection diagram of the data generation module in Fig. 2.
The configuration parameters pre-set in the parameter setting module are listed in Table 1 below:
Table 1. Settings of each configuration parameter
The parameter setting module obtains suitable configuration parameters, as shown in Table 1, according to the scale of the text imported by the data import module, and feeds them to the definition module and the load-and-iteration module. According to the configuration parameters in the parameter setting module, the model definition module uses a pseudo-random function to generate random values within the configured range as the weight matrix parameters and completes the initialization of the weight parameters.
According to the configuration parameters obtained from the parameter setting module, the data loading unit in the load-and-iteration module loads the data of the training set, evaluation set and test set; the iteration module judges whether the data in the current data set has all been sent and operates according to the judgement. If the current data set is training data, it is output to the recursive network training module; if evaluation data, to the recursive network evaluation module; if test data, to the recursive network test module. Fig. 3 shows the operation of the load-and-iteration module and its connections with the data generation module, parameter setting module, definition module, recursive network training module, recursive network evaluation module and recursive network test module.
Fig. 4 shows the relationship and composition of the recursive network training module, the recursive network evaluation module and the recursive network test module of the recursive network model based on the FW mechanism and LSTM. As can be seen from Fig. 4, the recursive network evaluation module and the recursive network test module differ from the recursive network training module in that they do not include a dropout unit and contain only an updating unit and a result storage unit, and the updating unit contains the LSTM unit and the fast weight unit. The recursive network training module feeds the trained network parameters into the recursive network evaluation module; the recursive network evaluation module feeds the evaluated network parameters into the recursive network test module.
Fig. 5 shows the composition of the LSTM unit and the fast weight unit in this model. In Fig. 5, X_t is the input-layer data at time t; C(t-1) and C'(t) are respectively the input and the output of the LSTM memory cell C at time t; C'(t) is then updated by the fast weight to produce C(t), which serves as the input of the LSTM memory cell C at the next time step; h_{t-1} and h_t are the outputs of the LSTM cell at times t-1 and t respectively. In Fig. 5, σ is the sigmoid activation function and tanh is the hyperbolic tangent activation function.
In Fig. 5, C'(t) = h_0(t) and C(t) = h_s(t) correspond respectively to the memory-cell state at time t before the fast-weight update and the memory-cell state at time t after the update.
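Under the reading of Fig. 5 above, where C'(t) is the LSTM memory state before the fast-weight refinement and C(t) = h_s(t) the state after it, a combined cell might look like the sketch below. The gate equations are the standard LSTM ones, and placing the fast-weight inner loop on the memory state is an assumption drawn from the figure description rather than the patent's exact implementation:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def layer_norm(h, eps=1e-5):
    return (h - h.mean(-1, keepdims=True)) / (h.std(-1, keepdims=True) + eps)

def lstm_fw_cell(x_t, h_prev, c_prev, W_A, params, lam=0.95, eta=0.5, s=7):
    """One LSTM step whose memory cell C'(t) is refined s times by the fast weight."""
    Wf, Wi, Wo, Wc = params                      # gate weights over [h_prev, x_t]
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(Wf @ z)                          # forget gate (sigma in Fig. 5)
    i = sigmoid(Wi @ z)                          # input gate
    o = sigmoid(Wo @ z)                          # output gate
    c_tilde = np.tanh(Wc @ z)                    # candidate memory
    c = f * c_prev + i * c_tilde                 # C'(t): memory before the FW update
    # Fast-weight refinement of the memory state: C(t) = h_s(t).
    W_A = lam * W_A + eta * np.outer(c_prev, c_prev)
    for _ in range(s):
        c = np.tanh(layer_norm(f * c_prev + i * c_tilde + W_A @ c))
    h = o * np.tanh(c)                           # h_t, the LSTM cell output
    return h, c, W_A

hidden, n_in = 50, 50
rng = np.random.default_rng(1)
params = [rng.uniform(-0.1, 0.1, (hidden, hidden + n_in)) for _ in range(4)]
h, c, W_A = lstm_fw_cell(rng.standard_normal(n_in), np.zeros(hidden),
                         np.zeros(hidden), np.zeros((hidden, hidden)), params)
print(h.shape, c.shape, W_A.shape)
```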
Embodiment 2
This embodiment elaborates on the learning effect of the method relied on by the recursive network model of the present invention when processing a text data set of strongly associated short sentences.
We now turn to the processing of text data in which the association between sentences is strong and the sentences are short; since the sentences are short, the emphasis is on the relationships between input words within a short time span. We experiment with a typical short-sentence corpus from the NLTK text corpora widely used in natural language processing: the European Parliament proceedings corpus europarl_raw.
When using the europarl_raw corpus, num_steps is uniformly set to 10, meaning the network treats every ten input words as one complete sentence.
First, a suitable number of updates s needs to be determined.
After the fast weight has been updated at the current time, the hidden state is updated in a loop s times. Compared with the sample data of a toy-game scenario, the associations between preceding and following words in text data are more complex, so we need to speed up the update frequency and increase the value of s to exploit the greater ability of the fast weight to process short-term memory. We adjust the number of hidden-state updates within one time step, fix the number of hidden units at 50 and batch_size at 20, vary s = 5, 6, 7, 8 and record the training effect of the model, as shown in Table 2 below:
Table 2. Perplexity at the 5th, 10th and 13th training epochs for different numbers of updates s
Update times s | Perplexity (epoch 5) | Perplexity (epoch 10) | Perplexity (epoch 13) |
---|---|---|---|
5 | 189.380 | 108.083 | 105.231 |
6 | 145.939 | 73.875 | 71.331 |
7 | 138.889 | 68.323 | 65.946 |
8 | 139.400 | 70.049 | 67.642 |
As shown in Table 2, when the number of updates s = 7, the perplexity of the fast-weight model is 138.889 at the 5th training epoch, falls to 68.323 at the 10th epoch, and converges to 65.946 at the 13th epoch.
Next, a suitable batch size is determined.
A suitable batch size is crucial to the learning performance of a network: if the batch size is too large, gradient descent tends to find a local minimum rather than the global minimum when searching for the optimal solution; if the batch size is too small, convergence is slow and the learning effect of the model is poor. Therefore, to improve the performance of the new model with fast weights, we fix the number of hidden units at 50, set the number of updates s to the previously verified optimal value 7, set the number of words forming a sentence num_steps = 10, vary the batch size over 10, 20, 30 and 50, and record the training effect of the model, as shown in Table 3 below:
Table 3. Perplexity of the model at the 10th epoch for different batch sizes
As can be seen from Table 3, the converged perplexity of the model is lowest when batch_size = 20: the perplexity is 45.139 at the 10th training epoch and falls to 43.344 at the 13th epoch, whereas with batch sizes of 10 and 30 the perplexity of the model at the 13th training epoch is about 51. To show the difference in perplexity under different batch sizes more intuitively, we take the base-10 logarithm log(perplexity) of the perplexity and compare the fast-weight model under different batch sizes, as shown in Fig. 6.
In Fig. 6, the abscissa is the training epoch number and the ordinate is the base-10 logarithm of the perplexity, log(perplexity). It can be seen that the perplexity of the model is lowest and the learning effect best when the batch size is 20; this setting is therefore used in the following comparison of language models.
With the number of hidden units fixed at 50, the number of words forming a sentence num_steps = 10 and the SeLU function as the activation function, we compare the training effect of four models: the LSTM model, the RNN model, the model combining fast weights with the LSTM network, and the model combining fast weights with the RNN. The training perplexities are shown in Table 4:
Table 4. Training perplexity of different models on the europarl_raw corpus
Model name | Perplexity (epoch 5) | Perplexity (epoch 10) | Perplexity (epoch 15) | Perplexity (epoch 20) |
---|---|---|---|---|
LSTM | 267.602 | 178.175 | 174.935 | 174.824 |
LSTM+FW | 90.945 | 45.139 | 43.280 | 43.208 |
RNN | 1037.719 | 421.531 | 412.841 | 412.510 |
RNN+FW | 533.806 | 378.564 | 369.842 | 369.474 |
As can be seen from Table 4, the perplexity of the LSTM model with fast weights is 90.945 at the 5th training epoch, is further reduced to 45.139 at the 10th epoch, and reaches 43.280 at the 15th epoch, where the model converges. At the same point in training, the perplexity of the LSTM model converges to 174.824, about 131 higher than that of the LSTM model with fast weights; next comes the RNN network with fast weights, whose perplexity converges to 369.474; the worst is the RNN model, whose perplexity converges to 412.510.
To show the perplexity differences between the models more intuitively, the base-10 logarithm of the perplexity is taken and the log(perplexity) of the different models is compared, as shown in Fig. 7.
As can be seen from Fig. 7, the converged perplexity of the LSTM model with fast weights is the lowest and its learning effect is the best, with a very large gap from the LSTM model without fast weights; this shows that introducing fast weights into the LSTM network clearly improves the training effect. The converged perplexity of the RNN model is the highest; adding fast weights slightly improves the training of the RNN model, but the effect is not obvious.
The above are preferred embodiments of the present invention, and the present invention should not be limited to the content disclosed in the embodiments and the drawings. All equivalents and modifications completed without departing from the spirit disclosed by the present invention fall within the scope of protection of the present invention.
Claims (9)
1. A recursive network model based on the FW mechanism and LSTM, characterized in that it comprises a data import module, a data generation module, a load-and-iteration module, a parameter setting module, a definition module, a recursive network training module, a recursive network evaluation module and a recursive network test module;
wherein the data generation module in turn contains a data split unit; the load-and-iteration module contains a data loading unit and an iteration unit;
the data split unit contains a training data generation unit, an evaluation data generation unit and a test data generation unit;
the recursive network training module contains a dropout unit, an updating unit and a result storage unit; the recursive network evaluation module and the recursive network test module contain only an updating unit and a result storage unit;
wherein the updating unit contains a long short-term memory (LSTM) unit and a fast weight unit;
the connection relationships of the modules of the recursive network model based on the FW mechanism and LSTM are as follows:
the data import module is connected with the data generation module; the data generation module is connected with the load-and-iteration module; the parameter setting module is connected with the load-and-iteration module and the definition module; the recursive network training module is connected with the load-and-iteration module, the recursive network evaluation module and the definition module; the recursive network evaluation module is connected with the load-and-iteration module, the recursive network training module, the recursive network test module and the definition module; the recursive network test module is connected with the load-and-iteration module, the recursive network evaluation module and the definition module;
the connection relationships of the units in the data generation module are as follows: the training data, evaluation data and test data in the data split unit are connected with the training label generation unit, the evaluation label generation unit and the test label generation unit respectively;
the connection relationships of the units in the load-and-iteration module are as follows: the data loading unit is connected with the iteration unit;
the signal generation and output relationships of the modules of the recursive network model based on the FW mechanism and LSTM are as follows:
the output of the data import module feeds the data generation module; after being processed by the data generation module, the data feeds the load-and-iteration module; the parameter setting module provides the input parameters and FW model parameters for the load-and-iteration module and the definition module; the load-and-iteration module provides the training data and training labels, the evaluation data and evaluation labels, and the test data and test labels to the recursive network training module, recursive network evaluation module and recursive network test module respectively; the definition module inputs the FW model parameters to the recursive network training module, recursive network evaluation module and recursive network test module respectively; the recursive network training module feeds the trained network parameters into the recursive network evaluation module; the recursive network evaluation module feeds the evaluated network parameters into the recursive network test module;
the connection relationships of the units in the recursive network training module, evaluation module and test module are as follows:
the dropout unit receives the data and is connected with the LSTM unit; the LSTM unit is connected with the data input and with the fast weight unit; the result storage unit is connected with the fast weight unit and the result.
2. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 1, characterized in that it comprises the following steps:
Step 1: the data to be trained and tested is imported through the data import module;
Step 2: the data generation module splits the data imported through the data import module with the data split unit, obtaining training data, evaluation data and test data respectively;
Step 3: the training data generation unit randomly selects a proportion x% of the data split by the data split unit to generate the training set; the evaluation data generation unit randomly selects a proportion y% to generate the evaluation set; the test data generation unit randomly selects a proportion z% to generate the test set;
Step 4: the training label generation unit shifts each item of data in the training set generated by the training data generation unit back by one position to obtain the training labels; the evaluation label generation unit shifts each item of data in the evaluation set generated by the evaluation data generation unit back by one position to obtain the evaluation labels; the test label generation unit shifts each item of data in the test set back by one position to obtain the test labels;
Step 5: the parameter setting module obtains the configuration parameters according to the scale of the text imported by the data import module, and the obtained configuration parameters are input into the parameter setting module;
wherein the configuration parameters include the initial scale, learning rate, maximum gradient norm, number of layers, number of steps, hidden-layer size, maximum epoch number, maximum-maximum epoch value, dropout rate, attenuation rate, batch size and vocab size;
Step 6: the data loading unit in the load-and-iteration module loads the data of the training set, evaluation set and test set according to the configuration parameters obtained in the parameter setting module, and the initial data serial number i is set to 1;
Step 7: according to the configuration parameters in the parameter setting module, the definition module uses a pseudo-random function to generate random values within the configured range as the weight matrix parameters, completing the initialization of the weight parameters;
Step 8: the iteration module in the load-and-iteration module judges whether the data in the current data set has all been sent and operates according to the judgement, specifically:
if the data in the current data set has not all been sent, the i-th group of data is sent, it is judged whether the data is for training, evaluation or testing, and the method goes to Step 9; otherwise the iteration stops;
Step 9: it is judged whether the current data is training data; if so, the input data is sampled according to the dropout rate and the sampled data goes to Step 10; otherwise the method goes to Step 10 directly;
Step 10: the data input in Step 9 is fed into the long short-term memory unit and the fast weight unit in the updating unit to compute the output vector, while the network is optimized by gradient descent, specifically:
Step 10.1: the updating unit computes the initial hidden-layer state from the input-layer weights Wx and the standard weights Wh, using formula (4) to compute the initial hidden state at the current time t:
h_0(t) = f(LN(Wx * x_t + Wh * h_{t-1}))   (4)
wherein the input-layer weights are denoted Wx and the standard weights Wh; h_0 is the initial hidden-layer state; LN is the layer-normalization function; f is the activation function; x_t is the input-layer data at the current time t; h_{t-1} is the data corresponding to the hidden-layer state (referred to simply as the hidden-layer state) at the previous time t-1;
the standard weights Wh are the weights with which the hidden layer propagates to the next time step in the RNN network; the input-layer weights Wx are the weights with which the input layer propagates to the hidden layer;
Step 10.2: the fast weight unit computes the fast weight, specifically by formula (5):
W_A(t) = λ * W_A(t-1) + η * h_{t-1} * h_{t-1}^T   (5)
wherein W_A(t) is the fast weight at time t, a weight that acts only within each time step of the hidden layer; the total number of updates within one time step is denoted s+1; λ is the attenuation rate and η is the learning rate; h_{t-1} is the hidden-layer state at time t-1; h_{t-1}^T is the transpose of h_{t-1}, i.e. of the hidden-layer state at time t-1;
wherein s in the total number of updates s+1 of one time step is the number of steps;
Step 10.3: the fast weight unit computes the hidden-layer state and updates the hidden-layer state s times;
Step 10.4: the slow weight unit computes the normalized output;
wherein the normalized output of the network is realized by either the Softmax or the sigmoid function;
Step 10.5: the result storage unit computes the loss and the perplexity from the normalized output computed in Step 10.4;
Step 10.6: the slow weight unit judges whether the last epoch has been reached; if not, the updating unit updates the hidden-layer state and the training or test parameters, the current i is incremented by 1, and the method goes to Step 8.
3. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 2, characterized in that in Step 1 the text data is obtained by reading the text path.
4. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 2, characterized in that in Step 2 the splitting is specifically: the text data imported in Step 1 is split into sentences of j characters each.
5. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 4, characterized in that the value of j ranges from 5 to 50.
6. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 2, characterized in that in Step 3, x% + y% + z% = 1.
7. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 2, characterized in that in Step 10.1 the activation function f is one of the SeLU function, the Leaky ReLU function and the Swish function.
8. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 2, characterized in that in Step 10.2 the attenuation rate ranges from 0.9 to 0.995.
9. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 2, characterized in that in Step 10.2 the learning rate ranges from 0.3 to 0.8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910476156.XA CN110288081A (en) | 2019-06-03 | 2019-06-03 | A kind of Recursive Networks model and learning method based on FW mechanism and LSTM |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910476156.XA CN110288081A (en) | 2019-06-03 | 2019-06-03 | A kind of Recursive Networks model and learning method based on FW mechanism and LSTM |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110288081A true CN110288081A (en) | 2019-09-27 |
Family
ID=68003232
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910476156.XA Pending CN110288081A (en) | 2019-06-03 | 2019-06-03 | A kind of Recursive Networks model and learning method based on FW mechanism and LSTM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110288081A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190087709A1 (en) * | 2016-04-29 | 2019-03-21 | Cambricon Technologies Corporation Limited | Apparatus and method for executing recurrent neural network and lstm computations |
WO2018151125A1 (en) * | 2017-02-15 | 2018-08-23 | 日本電信電話株式会社 | Word vectorization model learning device, word vectorization device, speech synthesis device, method for said devices, and program |
US20190114544A1 (en) * | 2017-10-16 | 2019-04-18 | Illumina, Inc. | Semi-Supervised Learning for Training an Ensemble of Deep Convolutional Neural Networks |
CN109214452A (en) * | 2018-08-29 | 2019-01-15 | 杭州电子科技大学 | Based on the HRRP target identification method for paying attention to depth bidirectional circulating neural network |
CN109508377A (en) * | 2018-11-26 | 2019-03-22 | 南京云思创智信息科技有限公司 | Text feature, device, chat robots and storage medium based on Fusion Model |
Non-Patent Citations (1)
Title |
---|
T. ANDERSON KELLER ET AL.: "FAST WEIGHT LONG SHORT-TERM MEMORY", 《ICLR 2018》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190927 |