CN110288081A - A recursive network model and learning method based on the FW mechanism and LSTM - Google Patents
A recursive network model and learning method based on the FW mechanism and LSTM
- Publication number
- CN110288081A (application number CN201910476156.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- module
- unit
- recursive networks
- training
- Prior art date
- Legal status: Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to a recursive network model and learning method based on the FW (Fast Weights) mechanism and LSTM, belonging to the field of recurrent neural networks and natural language processing. It comprises the recursive network model based on the FW mechanism and LSTM and the learning method it supports. The model comprises a data import module, a data generation module, a load-and-iteration module, a parameter setting module, a definition module, and recursive network training, evaluation and test modules. The learning method comprises: (1) importing the data; (2) splitting the imported data into training data, evaluation data and test data; (3) obtaining pre-set configuration parameters according to the imported data; (4) completing the initialization of the weight parameters; (5) feeding the training, evaluation and test data into the LSTM units and computing the output vectors; (6) computing the loss function, optimizing the network parameters and outputting the perplexity. The network model and learning method further improve the accuracy and convergence speed of LSTM model processing.
Description
Technical field
The present invention relates to a recursive network model and learning method based on the FW mechanism and LSTM, and belongs to the field of recurrent neural networks and natural language processing.
Background art
Natural language processing models generally use a recurrent neural network (RNN) structure. An RNN contains variables on two time scales: the hidden-layer state and the weights. The hidden-layer state is updated once at every time step, whereas the weights are updated only after all the information in a sequence has entered the network. The weights, which represent the connection relationships between layers, therefore correspond to the "long-term memory" of the network. In real networks, however, the rate at which the inter-layer relationships change is often related to the length of the input sequence: an update may be needed after 3 or 5 time steps, or after 30 or 50.
The language model based on LSTM units is one of the widely used improvements of the RNN. By training on text data, it predicts the next word that will appear from the input text. The network parameters are initialized with zero vectors and updated after each word is read. When processing the input data, the model optimizes the network parameters by back-propagation: the input data, i.e. paragraphs composed of several sentences, is divided into input blocks of fixed length, each containing a fixed number of words, and back-propagation is executed to update the network parameters after each input block has been processed.
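For illustration, a minimal sketch (not taken from the patent; the batch size and block length below are assumptions) of how a word-ID sequence can be cut into fixed-length input blocks for this kind of truncated back-propagation:

```python
import numpy as np

def make_input_blocks(word_ids, batch_size, num_steps):
    """Split a flat sequence of word IDs into fixed-length input blocks.

    Back-propagation is run after each [batch_size, num_steps] block,
    as described above (truncated back-propagation through time).
    """
    word_ids = np.asarray(word_ids)
    # Drop the tail so the data reshapes evenly into batch_size rows.
    n_blocks = len(word_ids) // (batch_size * num_steps)
    data = word_ids[: n_blocks * batch_size * num_steps]
    data = data.reshape(batch_size, n_blocks * num_steps)
    # Yield consecutive blocks of num_steps columns.
    for i in range(n_blocks):
        yield data[:, i * num_steps:(i + 1) * num_steps]

# Example: a toy corpus of 1000 word IDs split into 20x10 blocks.
blocks = list(make_input_blocks(np.arange(1000), batch_size=20, num_steps=10))
print(len(blocks), blocks[0].shape)   # 5 (20, 10)
```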
Jimmy Ba et al. proposed the Fast Weights (FW) mechanism, which introduces a new variable whose update period lies between the two time scales of the hidden-layer state and the weights, and which stores rapidly updated hidden-layer states; it has been shown to be very effective for learning in sequence-to-sequence models. Following this idea, a new variable is introduced while the existing hidden-layer state and the standard weights are retained; its update period is longer than that of the hidden layer but shorter than that of the standard weights, and it is therefore called the fast weight.
In terms of neural network training, complex and time-consuming processing is generally required to obtain good learning performance, which demands a high time and computation cost. To reduce this cost, researchers often resort to batch processing. Batch normalization is one typical such technique, but its effect on recurrent neural networks is not obvious. G. Hinton et al. therefore proposed layer normalization (LN), which is implemented by computing the mean and standard deviation of all hidden units in one hidden-layer state of a training sample of the recurrent neural network. In the fast weights mechanism, LN is used to solve the overflow problem of the hidden-layer update values as training proceeds.
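A minimal sketch of layer normalization as described above, computed over all hidden units of one hidden-layer state; the small epsilon constant and the omission of learned gain and bias parameters are simplifying assumptions:

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    """Layer normalization of a hidden-layer state.

    h: array of shape [batch_size, hidden_units].
    The mean and standard deviation are taken over the hidden units of
    each sample, so the statistics do not depend on the batch.
    """
    mean = h.mean(axis=-1, keepdims=True)
    std = h.std(axis=-1, keepdims=True)
    return (h - mean) / (std + eps)

h = np.random.randn(20, 50) * 10 + 3        # a large, shifted hidden state
print(layer_norm(h).mean(axis=-1)[:3])      # ~0 per sample
print(layer_norm(h).std(axis=-1)[:3])       # ~1 per sample
```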
The evaluation metrics used to measure language model performance are the perplexity and the loss. The perplexity represents the average number of candidate words the language model considers when predicting the next word from the preceding words of a sentence, after learning the text data. For example, if a sequence is formed at random from the five letters A, B, C, D and E without any regularity, then when the next letter is predicted there are 5 equally probable options and the perplexity is 5. Thus, if the perplexity of a language model is K, then on average K words have the same probability of being a reasonable prediction for the next word. Taking the PTB model as an example, the perplexity used to evaluate its performance is computed by formula (1):

perplexity = exp(-(1/N) * Σ_{i=1..N} ln p_target_i)   (1)

where p_target_i is the probability of the i-th target word, N is the total number of target words, and ln is the natural logarithm.
The other evaluation metric, the loss, is defined as the average negative logarithm of the target-word probabilities, as in formula (2):

loss = -(1/N) * Σ_{i=1..N} ln p_target_i   (2)

The relationship between the perplexity and the loss is given by formula (3):

perplexity = e^loss   (3)

When the language model learns the logical relationships between the words of a sentence, its learning ability becomes stronger: when predicting the next word from the preceding words, the number of candidate words becomes smaller and the corresponding perplexity becomes lower. The perplexity therefore reflects the learning performance of the network well: the lower the perplexity, the stronger the network's ability to predict the next word in a sentence and the better its effect.
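The relationship between the loss and the perplexity in formulas (1)-(3) can be checked with a short sketch; the word probabilities below are made-up illustrative values:

```python
import numpy as np

# Probabilities the model assigned to each target word (illustrative values).
p_target = np.array([0.25, 0.10, 0.05, 0.40, 0.02])

loss = -np.mean(np.log(p_target))   # formula (2): average negative log-probability
perplexity = np.exp(loss)           # formula (3): perplexity = e^loss
print(loss, perplexity)

# A uniform 1-of-5 model gives perplexity 5, matching the A..E example above.
print(np.exp(-np.mean(np.log(np.full(5, 0.2)))))   # 5.0
```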
Summary of the invention
The purpose of the present invention is to further improve the state of the art of existing LSTM-based recurrent neural networks, whose perplexity performance still needs improvement when processing natural language with strong temporal associations; to this end, a recursive network model and learning method based on the FW mechanism and LSTM are proposed.
The recursive network model and learning method based on the FW mechanism and LSTM comprise the recursive network model based on the FW mechanism and LSTM and the learning method it relies on.
The recursive network model based on the FW mechanism and LSTM comprises a data import module, a data generation module, a load-and-iteration module, a parameter setting module, a definition module, a recursive network training module, a recursive network evaluation module and a recursive network test module.
The data generation module in turn contains a data split unit; the load-and-iteration module contains a data loading unit and an iteration unit.
The data split unit contains a training data generation unit, an evaluation data generation unit and a test data generation unit.
The recursive network training module contains a dropout unit, an updating unit and a result storage unit; the recursive network evaluation module and the recursive network test module contain only an updating unit and a result storage unit.
The updating unit contains a long short-term memory (LSTM) unit and a fast weight unit.
The connection relationships of the modules of the recursive network model based on the FW mechanism and LSTM are as follows:
the data import module is connected with the data generation module; the data generation module is connected with the load-and-iteration module; the parameter setting module is connected with the load-and-iteration module and the definition module; the recursive network training module is connected with the load-and-iteration module, the recursive network evaluation module and the definition module; the recursive network evaluation module is connected with the load-and-iteration module, the recursive network training module, the recursive network test module and the definition module; the recursive network test module is connected with the load-and-iteration module, the recursive network evaluation module and the definition module.
The connection relationships of the units in the data generation module are as follows: the training data, evaluation data and test data in the data split unit are connected with the training label generation unit, the evaluation label generation unit and the test label generation unit respectively.
The connection relationships of the units in the load-and-iteration module are as follows: the data loading unit is connected with the iteration unit.
The signal generation and output relationships of the modules of the recursive network model based on the FW mechanism and LSTM are as follows:
the output of the data import module feeds the data generation module; after being processed by the data generation module, the data feeds the load-and-iteration module; the parameter setting module provides the input parameters and FW model parameters for the load-and-iteration module and the definition module; the load-and-iteration module provides the training data and training labels, the evaluation data and evaluation labels, and the test data and test labels to the recursive network training module, recursive network evaluation module and recursive network test module respectively; the definition module inputs the FW model parameters to the recursive network training module, recursive network evaluation module and recursive network test module respectively; the recursive network training module feeds the trained network parameters into the recursive network evaluation module; the recursive network evaluation module feeds the evaluated network parameters into the recursive network test module.
The connection relationships of the units in the recursive network training, evaluation and test modules are as follows:
the dropout unit receives the data and is connected with the LSTM unit; the LSTM unit is connected with the data input and with the fast weight unit; the result storage unit is connected with the fast weight unit and the result.
The learning method relied on by the recursive network model based on the FW mechanism and LSTM comprises the following steps:
Step 1: the data to be trained and tested is imported through the data import module, specifically: the text data is obtained by reading the text path;
Step 2: the data generation module splits the data imported through the data import module with the data split unit, obtaining training data, evaluation data and test data respectively;
wherein the splitting is specifically: the text data imported in Step 1 is split into sentences of j characters each;
wherein the value of j ranges from 5 to 50;
Step 3: the training data generation unit randomly selects a proportion x% of the data split by the data split unit to generate the training set; the evaluation data generation unit randomly selects a proportion y% to generate the evaluation set; the test data generation unit randomly selects a proportion z% to generate the test set;
wherein x% + y% + z% = 1;
Step 4: the training label generation unit shifts each item of data in the training set generated by the training data generation unit back by one position to obtain the training labels; the evaluation label generation unit shifts each item of data in the evaluation set generated by the evaluation data generation unit back by one position to obtain the evaluation labels; the test label generation unit shifts each item of data in the test set back by one position to obtain the test labels.
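A minimal sketch of Steps 2-4 under illustrative assumptions (the split ratios, the sentence length j and the exact form of the one-position shift are not fixed by the text above):

```python
import random

def split_and_label(text, j=20, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split text into j-character sentences, partition them into
    training / evaluation / test sets, and build labels by shifting
    each sequence back by one position (next-character prediction)."""
    sentences = [text[i:i + j] for i in range(0, len(text) - j, j)]
    random.Random(seed).shuffle(sentences)

    n = len(sentences)
    n_train, n_eval = int(ratios[0] * n), int(ratios[1] * n)
    splits = {
        "train": sentences[:n_train],
        "eval": sentences[n_train:n_train + n_eval],
        "test": sentences[n_train + n_eval:],
    }
    # Label = the same sequence shifted back by one element.
    return {name: [(s[:-1], s[1:]) for s in data] for name, data in splits.items()}

data = split_and_label("the quick brown fox jumps over the lazy dog " * 50)
x, y = data["train"][0]
print(repr(x), "->", repr(y))
```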
Step 5: the parameter setting module obtains the configuration parameters according to the scale of the text imported by the data import module, and the obtained configuration parameters are input into the parameter setting module;
wherein the configuration parameters include the initial scale, learning rate, maximum gradient norm, number of layers, number of steps, hidden-layer size, maximum epoch number, maximum-maximum epoch value, dropout rate, attenuation rate, batch size and vocab size;
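A configuration of the kind listed in Step 5 might look like the following dictionary; the key names and all values are illustrative assumptions rather than settings prescribed by the patent:

```python
# Illustrative configuration; the keys mirror the parameters of Step 5.
config = {
    "init_scale": 0.1,       # initial scale of the uniform weight initialization
    "learning_rate": 0.5,    # learning rate
    "max_grad_norm": 5,      # maximum gradient norm (gradient clipping)
    "num_layers": 1,         # number of layers
    "num_steps": 10,         # number of steps (words treated as one sentence)
    "hidden_size": 50,       # hidden-layer size
    "max_epoch": 5,          # maximum epoch number
    "max_max_epoch": 13,     # maximum-maximum epoch value
    "keep_prob": 0.8,        # 1 - dropout rate
    "decay": 0.9,            # attenuation rate
    "batch_size": 20,        # batch size
    "vocab_size": 10000,     # vocab size
}
```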
Step 6: the data loading unit in the load-and-iteration module loads the data of the training set, evaluation set and test set according to the configuration parameters obtained in the parameter setting module, and the initial data serial number i is set to 1;
Step 7: according to the configuration parameters in the parameter setting module, the definition module uses a pseudo-random function to generate random values within the configured range as the weight matrix parameters, completing the initialization of the weight parameters;
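A minimal sketch of the weight initialization of Step 7, drawing uniform pseudo-random values within the configured range; the matrix shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=42)        # pseudo-random generator
init_scale, hidden_size, vocab_size = 0.1, 50, 10000

# Random values in [-init_scale, init_scale] as the weight matrix parameters.
Wx = rng.uniform(-init_scale, init_scale, size=(hidden_size, vocab_size))   # input-layer weights
Wh = rng.uniform(-init_scale, init_scale, size=(hidden_size, hidden_size))  # standard weights
print(Wx.shape, Wh.shape, float(Wx.min()), float(Wx.max()))
```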
Step 8: the iteration module in the load-and-iteration module judges whether the data in the current data set has all been sent and operates according to the judgement, specifically:
if the data in the current data set has not all been sent, the i-th group of data is sent, it is judged whether the data is for training, evaluation or testing, and the method goes to Step 9; otherwise the iteration stops;
Step 9: it is judged whether the current data is training data; if so, the input data is sampled according to the dropout rate and the sampled data goes to Step 10; otherwise the method goes to Step 10 directly;
Step 10: the data input in Step 9 is fed into the long short-term memory unit and the fast weight unit in the updating unit to compute the output vector, while the network is optimized by gradient descent, specifically:
Step 10.1: the updating unit computes the initial hidden-layer state from the input-layer weights Wx and the standard weights Wh, using formula (4) to compute the initial hidden state at the current time t:

h_0(t) = f(LN(Wx * x_t + Wh * h_{t-1}))   (4)

wherein the input-layer weights are denoted Wx and the standard weights Wh; h_0 is the initial hidden-layer state; LN is the layer-normalization function; f is the activation function; x_t is the input-layer data at the current time t; h_{t-1} is the data corresponding to the hidden-layer state (referred to simply as the hidden-layer state) at the previous time t-1;
preferably, the activation function f is one of the SeLU function, the Leaky ReLU function and the Swish function;
the standard weights Wh are the weights with which the hidden layer propagates to the next time step in the RNN network; the input-layer weights Wx are the weights with which the input layer propagates to the hidden layer;
Step 10.2: the fast weight unit computes the fast weight, specifically by formula (5):

W_A(t) = λ * W_A(t-1) + η * h_{t-1} * h_{t-1}^T   (5)

wherein W_A(t) is the fast weight at time t, a weight that acts only within each time step of the hidden layer; the total number of updates within one time step is denoted s+1; λ is the attenuation rate and η is the learning rate; h_{t-1} is the hidden-layer state at time t-1; h_{t-1}^T is the transpose of h_{t-1}, i.e. of the hidden-layer state at time t-1;
wherein s in the total number of updates s+1 of one time step is the number of steps;
wherein the attenuation rate ranges from 0.9 to 0.995 and the learning rate from 0.3 to 0.8;
Step 10.3: the fast weight unit computes the hidden-layer state and updates the hidden-layer state s times;
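A minimal NumPy sketch of Steps 10.1-10.3: formula (4) for the initial hidden state, formula (5) for the fast weight, then s inner updates of the hidden state. The exact form of the inner update, combining the preliminary term with the fast-weight term inside the layer-normalized activation, follows the fast-weights formulation of Ba et al. and is an assumption, not a verbatim reproduction of the patent:

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    return (h - h.mean(-1, keepdims=True)) / (h.std(-1, keepdims=True) + eps)

def selu(x):  # one of the activation choices named in Step 10.1
    alpha, scale = 1.6732632423543772, 1.0507009873554805
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1))

def fw_step(x_t, h_prev, W_A, Wx, Wh, lam=0.95, eta=0.5, s=7):
    """One time step of the fast-weight recurrence (single sample, 1-D vectors)."""
    # Formula (5): W_A(t) = lambda * W_A(t-1) + eta * h_{t-1} h_{t-1}^T
    W_A = lam * W_A + eta * np.outer(h_prev, h_prev)
    # Formula (4): initial hidden state h_0(t)
    preliminary = Wx @ x_t + Wh @ h_prev
    h = selu(layer_norm(preliminary))
    # Step 10.3: s inner updates of the hidden state using the fast weight.
    for _ in range(s):
        h = selu(layer_norm(preliminary + W_A @ h))
    return h, W_A

hidden, n_in = 50, 50
rng = np.random.default_rng(0)
h, W_A = np.zeros(hidden), np.zeros((hidden, hidden))
h, W_A = fw_step(rng.standard_normal(n_in), h, W_A,
                 rng.uniform(-0.1, 0.1, (hidden, n_in)),
                 rng.uniform(-0.1, 0.1, (hidden, hidden)))
print(h.shape, W_A.shape)
```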
Step 10.4: the slow weight unit computes the normalized output;
wherein the normalized output of the network is realized by either the Softmax or the sigmoid function;
Step 10.5: the result storage unit computes the loss and the perplexity from the normalized output computed in Step 10.4;
Step 10.6: the slow weight unit judges whether the last epoch has been reached; if not, the updating unit updates the hidden-layer state and the training or test parameters, the current i is incremented by 1, and the method goes to Step 8.
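A short sketch of Steps 10.4-10.5, producing the softmax-normalized output over the vocabulary and the loss and perplexity stored by the result storage unit; the output projection matrix Wy and all sizes are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max(-1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

rng = np.random.default_rng(0)
vocab_size, hidden = 10000, 50
Wy = rng.uniform(-0.1, 0.1, (hidden, vocab_size))    # output projection (assumed)

h = rng.standard_normal((20, hidden))                # hidden states of one batch
targets = rng.integers(0, vocab_size, size=20)       # target word IDs

probs = softmax(h @ Wy)                              # Step 10.4: normalized output
loss = -np.mean(np.log(probs[np.arange(20), targets]))  # formula (2)
print(loss, np.exp(loss))                            # loss and perplexity, formula (3)
```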
Beneficial effects
Compared with the prior art, the recursive network model and learning method based on the FW mechanism and LSTM of the present invention have the following advantages:
1. The recursive network model introduces the fast weight and LSTM mechanisms; by optimizing the attenuation coefficient and learning rate parameters, the learning accuracy of the network model, which stores short-term memory information, is substantially improved.
2. Compared with the existing LSTM model and with the RNN model with fast weights, the training method of the model uses the LSTM combined with the SeLU activation function and layer normalization, which greatly improves the convergence speed of training, evaluation and testing.
Brief description of the drawings
Fig. 1 is a schematic diagram of the composition of the recursive network model based on the FW mechanism and LSTM of the present invention and the connections between its modules;
Fig. 2 is a schematic diagram of the composition and connections of the data generation module in the recursive network model based on the FW mechanism and LSTM of the present invention;
Fig. 3 is a schematic diagram of the composition of the load-and-iteration module in the recursive network model based on the FW mechanism and LSTM of the present invention, and of its connections with the data generation module, parameter setting module, definition module, recursive network training module, recursive network evaluation module and recursive network test module;
Fig. 4 is a schematic diagram of the relationship and composition of the recursive network training module, the recursive network evaluation module and the recursive network test module in the recursive network model based on the FW mechanism and LSTM of the present invention;
Fig. 5 is a schematic diagram of the composition of the LSTM unit and the fast weight unit in the recursive network model based on the FW mechanism and LSTM of the present invention;
Fig. 6 compares the learning effect of the method relied on by the recursive network model of the present invention for different batch sizes when processing a strongly associated short-sentence text data set;
Fig. 7 compares the log(perplexity) of different models of the method relied on by the recursive network model of the present invention when processing a strongly associated short-sentence text data set.
Detailed description of the embodiments
The recursive network model and learning method based on the FW mechanism and LSTM of the present invention are further explained and described in detail below with reference to the drawings and embodiments.
Embodiment 1
This embodiment describes the composition and workflow of the recursive network model based on the FW mechanism and LSTM of the present invention.
In this implementation, the experiment uses a representative short-sentence corpus from the NLTK text corpora widely used in natural language processing: the European Parliament proceedings corpus europarl_raw.
The europarl_raw text data comes from parliamentary dialogue; most sentences are of short to medium length, about ten words, with relatively simple, mostly subject-verb-object structure. In this embodiment, the data set is processed by the modules of Fig. 1.
Fig. 1 shows the composition of the recursive network model based on the FW mechanism and LSTM and the connections between its modules. As can be seen from Fig. 1, the data imported by the data import module is fed into the data generation module; the data generation module generates the training data, evaluation data and test data together with their labels and inputs them into the load-and-iteration module; the load-and-iteration module and the definition module receive the parameters of the parameter setting module and are connected to the recursive network training module, evaluation module and test module respectively, which carry out the training, evaluation and testing.
First, the data import module imports the text data by reading the text path; after import, the data is output to the data generation module. The data generation module further splits the raw data into training data, evaluation data and test data, and the training label generation unit, evaluation label generation unit and test label generation unit then generate the labels of each data set; the structure is shown in the connection diagram of the data generation module in Fig. 2.
The configuration parameters pre-set in the parameter setting module are listed in Table 1 below:
Table 1. Settings of each configuration parameter
The parameter setting module obtains suitable configuration parameters, as shown in Table 1, according to the scale of the text imported by the data import module, and feeds them to the definition module and the load-and-iteration module. According to the configuration parameters in the parameter setting module, the model definition module uses a pseudo-random function to generate random values within the configured range as the weight matrix parameters and completes the initialization of the weight parameters.
According to the configuration parameters obtained from the parameter setting module, the data loading unit in the load-and-iteration module loads the data of the training set, evaluation set and test set; the iteration module judges whether the data in the current data set has all been sent and operates according to the judgement. If the current data set is training data, it is output to the recursive network training module; if evaluation data, to the recursive network evaluation module; if test data, to the recursive network test module. Fig. 3 shows the operation of the load-and-iteration module and its connections with the data generation module, parameter setting module, definition module, recursive network training module, recursive network evaluation module and recursive network test module.
Fig. 4 shows the relationship and composition of the recursive network training module, the recursive network evaluation module and the recursive network test module of the recursive network model based on the FW mechanism and LSTM. As can be seen from Fig. 4, the recursive network evaluation module and the recursive network test module differ from the recursive network training module in that they do not include a dropout unit and contain only an updating unit and a result storage unit, and the updating unit contains the LSTM unit and the fast weight unit. The recursive network training module feeds the trained network parameters into the recursive network evaluation module; the recursive network evaluation module feeds the evaluated network parameters into the recursive network test module.
Fig. 5 shows the composition of the LSTM unit and the fast weight unit in this model. In Fig. 5, X_t is the input-layer data at time t; C(t-1) and C'(t) are respectively the input and the output of the LSTM memory cell C at time t; C'(t) is then updated by the fast weight to produce C(t), which serves as the input of the LSTM memory cell C at the next time step; h_{t-1} and h_t are the outputs of the LSTM cell at times t-1 and t respectively. In Fig. 5, σ is the sigmoid activation function and tanh is the hyperbolic tangent activation function.
In Fig. 5, C'(t) = h_0(t) and C(t) = h_s(t) correspond respectively to the memory-cell state at time t before the fast-weight update and the memory-cell state at time t after the update.
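Under the reading of Fig. 5 above, where C'(t) is the LSTM memory state before the fast-weight refinement and C(t) = h_s(t) the state after it, a combined cell might look like the sketch below. The gate equations are the standard LSTM ones, and placing the fast-weight inner loop on the memory state is an assumption drawn from the figure description rather than the patent's exact implementation:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def layer_norm(h, eps=1e-5):
    return (h - h.mean(-1, keepdims=True)) / (h.std(-1, keepdims=True) + eps)

def lstm_fw_cell(x_t, h_prev, c_prev, W_A, params, lam=0.95, eta=0.5, s=7):
    """One LSTM step whose memory cell C'(t) is refined s times by the fast weight."""
    Wf, Wi, Wo, Wc = params                      # gate weights over [h_prev, x_t]
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(Wf @ z)                          # forget gate (sigma in Fig. 5)
    i = sigmoid(Wi @ z)                          # input gate
    o = sigmoid(Wo @ z)                          # output gate
    c_tilde = np.tanh(Wc @ z)                    # candidate memory
    c = f * c_prev + i * c_tilde                 # C'(t): memory before the FW update
    # Fast-weight refinement of the memory state: C(t) = h_s(t).
    W_A = lam * W_A + eta * np.outer(c_prev, c_prev)
    for _ in range(s):
        c = np.tanh(layer_norm(f * c_prev + i * c_tilde + W_A @ c))
    h = o * np.tanh(c)                           # h_t, the LSTM cell output
    return h, c, W_A

hidden, n_in = 50, 50
rng = np.random.default_rng(1)
params = [rng.uniform(-0.1, 0.1, (hidden, hidden + n_in)) for _ in range(4)]
h, c, W_A = lstm_fw_cell(rng.standard_normal(n_in), np.zeros(hidden),
                         np.zeros(hidden), np.zeros((hidden, hidden)), params)
print(h.shape, c.shape, W_A.shape)
```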
Embodiment 2
This embodiment elaborates on the learning effect of the method relied on by the recursive network model of the present invention when processing a text data set of strongly associated short sentences.
We now turn to the processing of text data in which the association between sentences is strong and the sentences are short; since the sentences are short, the emphasis is on the relationships between input words within a short time span. We experiment with a typical short-sentence corpus from the NLTK text corpora widely used in natural language processing: the European Parliament proceedings corpus europarl_raw.
When using the europarl_raw corpus, num_steps is uniformly set to 10, meaning the network treats every ten input words as one complete sentence.
First, a suitable number of updates s needs to be determined.
After the fast weight has been updated at the current time, the hidden state is updated in a loop s times. Compared with the sample data of a toy-game scenario, the associations between preceding and following words in text data are more complex, so we need to speed up the update frequency and increase the value of s to exploit the greater ability of the fast weight to process short-term memory. We adjust the number of hidden-state updates within one time step, fix the number of hidden units at 50 and batch_size at 20, vary s = 5, 6, 7, 8 and record the training effect of the model, as shown in Table 2 below:
Table 2. Perplexity at the 5th, 10th and 13th training epochs for different numbers of updates s
Update times s | Perplexity (epoch 5) | Perplexity (epoch 10) | Perplexity (epoch 13) |
---|---|---|---|
5 | 189.380 | 108.083 | 105.231 |
6 | 145.939 | 73.875 | 71.331 |
7 | 138.889 | 68.323 | 65.946 |
8 | 139.400 | 70.049 | 67.642 |
As shown in Table 2, when the number of updates s = 7, the perplexity of the fast-weight model is 138.889 at the 5th training epoch, falls to 68.323 at the 10th epoch, and converges to 65.946 at the 13th epoch.
Next, a suitable batch size is determined.
A suitable batch size is crucial to the learning performance of a network: if the batch size is too large, gradient descent tends to find a local minimum rather than the global minimum when searching for the optimal solution; if the batch size is too small, convergence is slow and the learning effect of the model is poor. Therefore, to improve the performance of the new model with fast weights, we fix the number of hidden units at 50, set the number of updates s to the previously verified optimal value 7, set the number of words forming a sentence num_steps = 10, vary the batch size over 10, 20, 30 and 50, and record the training effect of the model, as shown in Table 3 below:
Table 3. Perplexity of the model at the 10th epoch for different batch sizes
As can be seen from Table 3, the converged perplexity of the model is lowest when batch_size = 20: the perplexity is 45.139 at the 10th training epoch and falls to 43.344 at the 13th epoch, whereas with batch sizes of 10 and 30 the perplexity of the model at the 13th training epoch is about 51. To show the difference in perplexity under different batch sizes more intuitively, we take the base-10 logarithm log(perplexity) of the perplexity and compare the fast-weight model under different batch sizes, as shown in Fig. 6.
In Fig. 6, the abscissa is the training epoch number and the ordinate is the base-10 logarithm of the perplexity, log(perplexity). It can be seen that the perplexity of the model is lowest and the learning effect best when the batch size is 20; this setting is therefore used in the following comparison of language models.
With the number of hidden units fixed at 50, the number of words forming a sentence num_steps = 10 and the SeLU function as the activation function, we compare the training effect of four models: the LSTM model, the RNN model, the model combining fast weights with the LSTM network, and the model combining fast weights with the RNN. The training perplexities are shown in Table 4:
Table 4. Training perplexity of different models on the europarl_raw corpus
Model name | Perplexity (epoch 5) | Perplexity (epoch 10) | Perplexity (epoch 15) | Perplexity (epoch 20) |
---|---|---|---|---|
LSTM | 267.602 | 178.175 | 174.935 | 174.824 |
LSTM+FW | 90.945 | 45.139 | 43.280 | 43.208 |
RNN | 1037.719 | 421.531 | 412.841 | 412.510 |
RNN+FW | 533.806 | 378.564 | 369.842 | 369.474 |
As can be seen from Table 4, the perplexity of the LSTM model with fast weights is 90.945 at the 5th training epoch, is further reduced to 45.139 at the 10th epoch, and reaches 43.280 at the 15th epoch, where the model converges. At the same point in training, the perplexity of the LSTM model converges to 174.824, about 131 higher than that of the LSTM model with fast weights; next comes the RNN network with fast weights, whose perplexity converges to 369.474; the worst is the RNN model, whose perplexity converges to 412.510.
To show the perplexity differences between the models more intuitively, the base-10 logarithm of the perplexity is taken and the log(perplexity) of the different models is compared, as shown in Fig. 7.
As can be seen from Fig. 7, the converged perplexity of the LSTM model with fast weights is the lowest and its learning effect is the best, with a very large gap from the LSTM model without fast weights; this shows that introducing fast weights into the LSTM network clearly improves the training effect. The converged perplexity of the RNN model is the highest; adding fast weights slightly improves the training of the RNN model, but the effect is not obvious.
The above are preferred embodiments of the present invention, and the present invention should not be limited to the content disclosed in the embodiments and the drawings. All equivalents and modifications completed without departing from the spirit disclosed by the present invention fall within the scope of protection of the present invention.
Claims (9)
1. A recursive network model based on the FW mechanism and LSTM, characterized in that it comprises a data import module, a data generation module, a load-and-iteration module, a parameter setting module, a definition module, a recursive network training module, a recursive network evaluation module and a recursive network test module;
wherein the data generation module in turn contains a data split unit; the load-and-iteration module contains a data loading unit and an iteration unit;
the data split unit contains a training data generation unit, an evaluation data generation unit and a test data generation unit;
the recursive network training module contains a dropout unit, an updating unit and a result storage unit; the recursive network evaluation module and the recursive network test module contain only an updating unit and a result storage unit;
wherein the updating unit contains a long short-term memory (LSTM) unit and a fast weight unit;
the connection relationships of the modules of the recursive network model based on the FW mechanism and LSTM are as follows:
the data import module is connected with the data generation module; the data generation module is connected with the load-and-iteration module; the parameter setting module is connected with the load-and-iteration module and the definition module; the recursive network training module is connected with the load-and-iteration module, the recursive network evaluation module and the definition module; the recursive network evaluation module is connected with the load-and-iteration module, the recursive network training module, the recursive network test module and the definition module; the recursive network test module is connected with the load-and-iteration module, the recursive network evaluation module and the definition module;
the connection relationships of the units in the data generation module are as follows: the training data, evaluation data and test data in the data split unit are connected with the training label generation unit, the evaluation label generation unit and the test label generation unit respectively;
the connection relationships of the units in the load-and-iteration module are as follows: the data loading unit is connected with the iteration unit;
the signal generation and output relationships of the modules of the recursive network model based on the FW mechanism and LSTM are as follows:
the output of the data import module feeds the data generation module; after being processed by the data generation module, the data feeds the load-and-iteration module; the parameter setting module provides the input parameters and FW model parameters for the load-and-iteration module and the definition module; the load-and-iteration module provides the training data and training labels, the evaluation data and evaluation labels, and the test data and test labels to the recursive network training module, recursive network evaluation module and recursive network test module respectively; the definition module inputs the FW model parameters to the recursive network training module, recursive network evaluation module and recursive network test module respectively; the recursive network training module feeds the trained network parameters into the recursive network evaluation module; the recursive network evaluation module feeds the evaluated network parameters into the recursive network test module;
the connection relationships of the units in the recursive network training module, evaluation module and test module are as follows:
the dropout unit receives the data and is connected with the LSTM unit; the LSTM unit is connected with the data input and with the fast weight unit; the result storage unit is connected with the fast weight unit and the result.
2. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 1, characterized in that it comprises the following steps:
Step 1: the data to be trained and tested is imported through the data import module;
Step 2: the data generation module splits the data imported through the data import module with the data split unit, obtaining training data, evaluation data and test data respectively;
Step 3: the training data generation unit randomly selects a proportion x% of the data split by the data split unit to generate the training set; the evaluation data generation unit randomly selects a proportion y% to generate the evaluation set; the test data generation unit randomly selects a proportion z% to generate the test set;
Step 4: the training label generation unit shifts each item of data in the training set generated by the training data generation unit back by one position to obtain the training labels; the evaluation label generation unit shifts each item of data in the evaluation set generated by the evaluation data generation unit back by one position to obtain the evaluation labels; the test label generation unit shifts each item of data in the test set back by one position to obtain the test labels;
Step 5: the parameter setting module obtains the configuration parameters according to the scale of the text imported by the data import module, and the obtained configuration parameters are input into the parameter setting module;
wherein the configuration parameters include the initial scale, learning rate, maximum gradient norm, number of layers, number of steps, hidden-layer size, maximum epoch number, maximum-maximum epoch value, dropout rate, attenuation rate, batch size and vocab size;
Step 6: the data loading unit in the load-and-iteration module loads the data of the training set, evaluation set and test set according to the configuration parameters obtained in the parameter setting module, and the initial data serial number i is set to 1;
Step 7: according to the configuration parameters in the parameter setting module, the definition module uses a pseudo-random function to generate random values within the configured range as the weight matrix parameters, completing the initialization of the weight parameters;
Step 8: the iteration module in the load-and-iteration module judges whether the data in the current data set has all been sent and operates according to the judgement, specifically:
if the data in the current data set has not all been sent, the i-th group of data is sent, it is judged whether the data is for training, evaluation or testing, and the method goes to Step 9; otherwise the iteration stops;
Step 9: it is judged whether the current data is training data; if so, the input data is sampled according to the dropout rate and the sampled data goes to Step 10; otherwise the method goes to Step 10 directly;
Step 10: the data input in Step 9 is fed into the long short-term memory unit and the fast weight unit in the updating unit to compute the output vector, while the network is optimized by gradient descent, specifically:
Step 10.1: the updating unit computes the initial hidden-layer state from the input-layer weights Wx and the standard weights Wh, using formula (4) to compute the initial hidden state at the current time t:
h_0(t) = f(LN(Wx * x_t + Wh * h_{t-1}))   (4)
wherein the input-layer weights are denoted Wx and the standard weights Wh; h_0 is the initial hidden-layer state; LN is the layer-normalization function; f is the activation function; x_t is the input-layer data at the current time t; h_{t-1} is the data corresponding to the hidden-layer state (referred to simply as the hidden-layer state) at the previous time t-1;
the standard weights Wh are the weights with which the hidden layer propagates to the next time step in the RNN network; the input-layer weights Wx are the weights with which the input layer propagates to the hidden layer;
Step 10.2: the fast weight unit computes the fast weight, specifically by formula (5):
W_A(t) = λ * W_A(t-1) + η * h_{t-1} * h_{t-1}^T   (5)
wherein W_A(t) is the fast weight at time t, a weight that acts only within each time step of the hidden layer; the total number of updates within one time step is denoted s+1; λ is the attenuation rate and η is the learning rate; h_{t-1} is the hidden-layer state at time t-1; h_{t-1}^T is the transpose of h_{t-1}, i.e. of the hidden-layer state at time t-1;
wherein s in the total number of updates s+1 of one time step is the number of steps;
Step 10.3: the fast weight unit computes the hidden-layer state and updates the hidden-layer state s times;
Step 10.4: the slow weight unit computes the normalized output;
wherein the normalized output of the network is realized by either the Softmax or the sigmoid function;
Step 10.5: the result storage unit computes the loss and the perplexity from the normalized output computed in Step 10.4;
Step 10.6: the slow weight unit judges whether the last epoch has been reached; if not, the updating unit updates the hidden-layer state and the training or test parameters, the current i is incremented by 1, and the method goes to Step 8.
3. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 2, characterized in that in Step 1 the text data is obtained by reading the text path.
4. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 2, characterized in that in Step 2 the splitting is specifically: the text data imported in Step 1 is split into sentences of j characters each.
5. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 4, characterized in that the value of j ranges from 5 to 50.
6. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 2, characterized in that in Step 3, x% + y% + z% = 1.
7. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 2, characterized in that in Step 10.1 the activation function f is one of the SeLU function, the Leaky ReLU function and the Swish function.
8. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 2, characterized in that in Step 10.2 the attenuation rate ranges from 0.9 to 0.995.
9. The learning method relied on by the recursive network model based on the FW mechanism and LSTM according to claim 2, characterized in that in Step 10.2 the learning rate ranges from 0.3 to 0.8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910476156.XA CN110288081A (en) | 2019-06-03 | 2019-06-03 | A kind of Recursive Networks model and learning method based on FW mechanism and LSTM |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910476156.XA CN110288081A (en) | 2019-06-03 | 2019-06-03 | A kind of Recursive Networks model and learning method based on FW mechanism and LSTM |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110288081A true CN110288081A (en) | 2019-09-27 |
Family
ID=68003232
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910476156.XA Pending CN110288081A (en) | 2019-06-03 | 2019-06-03 | A kind of Recursive Networks model and learning method based on FW mechanism and LSTM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110288081A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190087709A1 (en) * | 2016-04-29 | 2019-03-21 | Cambricon Technologies Corporation Limited | Apparatus and method for executing recurrent neural network and lstm computations |
WO2018151125A1 (en) * | 2017-02-15 | 2018-08-23 | 日本電信電話株式会社 | Word vectorization model learning device, word vectorization device, speech synthesis device, method for said devices, and program |
US20190114544A1 (en) * | 2017-10-16 | 2019-04-18 | Illumina, Inc. | Semi-Supervised Learning for Training an Ensemble of Deep Convolutional Neural Networks |
CN109214452A (en) * | 2018-08-29 | 2019-01-15 | 杭州电子科技大学 | Based on the HRRP target identification method for paying attention to depth bidirectional circulating neural network |
CN109508377A (en) * | 2018-11-26 | 2019-03-22 | 南京云思创智信息科技有限公司 | Text feature, device, chat robots and storage medium based on Fusion Model |
Non-Patent Citations (1)
Title |
---|
T. ANDERSON KELLER ET AL.: "FAST WEIGHT LONG SHORT-TERM MEMORY", 《ICLR 2018》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190927 |