CN106886516A - Method and device for automatically identifying sentence relations and entities - Google Patents
Method and device for automatically identifying sentence relations and entities
- Publication number
- CN106886516A
- Authority
- CN
- China
- Prior art keywords
- entity
- input sentence
- relation
- deep learning
- sentence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention belongs to the technical field of intelligent recognition and provides a method and device for automatically identifying the relation and the entity expressed in a sentence. The method includes: projecting the user's input sentence into a space of fixed dimension to obtain the sentence vector of the input sentence in that space; feeding the sentence vector into a pre-trained deep learning classifier to obtain the relation category of the input sentence; and, if a relation category is identified, recognizing the entity in the input sentence. The method and device use deep learning to judge the user's input at the semantic level, so that relations are recognized accurately. Entity recognition is modeled as a sequence labeling problem, and the optimal labeling is solved with a conditional random field, so that entities are recognized accurately. Combining deep learning with the conditional random field achieves automatic extraction of relations and entities.
Description
Technical field
The present invention relates to the technical field of intelligent recognition, and in particular to a method and device for automatically identifying sentence relations and entities.
Background
In an interactive system, it is often necessary to identify whether the user is expressing information from some specific domain, such as preferences or nicknames. If the user is expressing such information, the system must also be able to extract exactly which object the information refers to. In general, this information can be represented by a relation and an entity. The relation indicates what kind of information the user is expressing, for example whether it is a preference or a nickname. The entity is the specific object the relation refers to. For example, when the user says "I like eating spicy hot pot", the relation is "like" and the entity is "spicy hot pot". In a dialogue system, automatically identifying such domain-specific relations and entities is a highly challenging problem.
The two most common approaches to recognizing relations and entities are the keyword-based approach and the regular-expression-based approach.
The keyword-based approach recognizes relations by keywords. Taking preferences as an example, if the user's input sentence contains the word "like", it is treated as expressing a like; if it contains "dislike", it is treated as expressing a dislike. The entity of the relation is then extracted with syntactic dependency parsing or semantic role labeling (SRL). For example, "I like Zhou Jielun" contains "like", so the keyword-based approach concludes that the sentence expresses "like". Dependency parsing shows that "Zhou Jielun" depends on the head word "like", so the object of the liking, that is, the recognized entity, is "Zhou Jielun". The drawback of the keyword-based approach is a large number of misjudgments: a sentence that contains a keyword does not necessarily express the corresponding relation. Continuing the preference example, the input "I can't say yet whether I like Zhou Jielun" contains the keyword "like", yet it expresses uncertainty. Treating it as a "like" relation merely because it contains "like" is inevitably wrong. The example shows that a keyword alone cannot determine the relation, because the information a keyword carries is limited. When judging the relation requires more information than the keyword itself carries, as with "can't say whether I like", which carries more information than the single word "like", the keyword-based approach is helpless.
To address this problem, regular expressions are commonly used to add more constraints before judging the relation and extracting the entity. For example, the "like" relation can be recognized with the regular expression "I like (.*)": a sentence counts as expressing "like" only if it contains "I like", and "(.*)" captures everything that follows "I like", which is taken to be the object of the liking, that is, the entity. For "I like Zhou Jielun", the recognized relation is "like" and the entity is "Zhou Jielun".
The regular-expression-based approach shares the main drawback of the keyword-based approach: it produces many misjudgments, labeling sentences that do not express the relation as expressing it. A second drawback is that its entity extraction is fragile and often extracts the wrong entity. For example, "I like Zhou Jielun, yeah right" matches the pattern "I like (.*)" above, but its meaning is the opposite: the user is expressing dislike. Under the regular expression above, the system identifies a "like" relation whose object is "Zhou Jielun, yeah right". In this case both the relation and the entity are recognized incorrectly.
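The two failure modes described above can be reproduced in a few lines. This is a minimal sketch, not the prior-art systems themselves: the function names are hypothetical, and the English sentences are illustrative renderings of the examples in the text.

```python
import re

# Pattern from the text: everything after "I like" is taken as the entity.
LIKE_PATTERN = re.compile(r"I like (.*)")


def keyword_relation(sentence):
    """Keyword rule: any sentence containing 'like' is labeled as 'like'."""
    return "like" if "like" in sentence else None


def regex_relation(sentence):
    """Regex rule: returns (relation, entity), or None when nothing matches."""
    match = LIKE_PATTERN.search(sentence)
    if match is None:
        return None
    return ("like", match.group(1))


# Correct case: relation and entity are both right.
print(regex_relation("I like Zhou Jielun"))
# Keyword false positive: the sentence expresses uncertainty, not a like.
print(keyword_relation("I can't say yet whether I like Zhou Jielun"))
# Regex false positive: the relation is wrong and so is the entity.
print(regex_relation("I like Zhou Jielun, yeah right"))
```

Adding further patterns to catch the sarcastic case is exactly the kind of rule growth that the following paragraphs identify as hard to maintain.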
A further drawback of both the keyword-based and the regular-expression-based approach is that they are hard to maintain. Because natural language expression is so diverse, a large number of keywords and regular expressions is needed to cover the various cases, and the system becomes very complex as their number grows. Newly added keywords and regular expressions may conflict with existing ones. Worse, such conflicts are usually well hidden, and it is generally difficult to tell in advance whether one exists. In many cases the conflict between rules is discovered only after something has gone wrong and the problem has been traced back to its root.
Extracting entities with SRL or dependency parsing is also far from perfect. Because Chinese expressions are complex, the accuracy of SRL and dependency parsing is itself not high. When rules for entity recognition are layered on top of such inaccurate analyses, the precision of entity recognition suffers as well, and entities are extracted inaccurately.
In summary, the prior art has the following defects:
1. Inaccurate relation judgment. Relying only on keywords or regular expressions ignores the semantics of the sentence itself and leads to misjudged relations.
2. Inaccurate entity extraction. Entities extracted with regular expressions, SRL, syntactic analysis, or dependency parsing are limited by the precision of those methods, which leads to extraction errors.
3. Poor maintainability. As rules accumulate, system complexity rises, and it is hard to tell in advance whether a new rule is compatible with the existing ones.
Summary of the invention
In view of the defects of the prior art, the present invention provides a method and device for automatically identifying sentence relations and entities. Deep learning is used to judge the user's input at the semantic level, so that relations are recognized accurately. Entity recognition is modeled as a sequence labeling problem, and the optimal labeling is solved with a conditional random field, so that entities are recognized accurately. Combining deep learning with the conditional random field achieves automatic extraction of relations and entities.
In a first aspect, the present invention provides a method for automatically identifying sentence relations and entities, including: projecting the user's input sentence into a space of fixed dimension to obtain the sentence vector of the input sentence in that space; feeding the sentence vector into a pre-trained deep learning classifier to obtain the relation category of the input sentence; and, if a relation category is identified, recognizing the entity in the input sentence.
The method provided by the present invention uses deep learning to judge the user's input sentence at the semantic level. It can recognize relations accurately, which in turn helps improve the accuracy of entity recognition.
Preferably, projecting the user's input sentence into a space of fixed dimension to obtain the sentence vector includes: segmenting the user's input sentence into words; converting each word into its word vector by looking up word2vec word vectors; and obtaining, from the word vectors of the words, the sentence vector of the input sentence in a space of fixed dimension.
Preferably, feeding the sentence vector into the pre-trained deep learning classifier to obtain the relation category includes: feeding the sentence vector into a CNN layer for convolution to obtain local features of the input sentence; feeding the local features into an LSTM layer to obtain an encoding of the relations between preceding and following words in the input sentence; feeding this encoding into a ReLU layer for a nonlinear transformation; and passing the result of the nonlinear transformation to the output layer to obtain the relation category of the input sentence.
Preferably, the deep learning classifier includes multiple CNN layers.
Preferably, the deep learning classifier includes multiple LSTM layers.
Preferably, the output layer of the deep learning classifier uses a Softmax function or a Sigmoid function.
Preferably, recognizing the entity in the input sentence includes: feeding the input sentence into a CRF model to obtain the optimal sequence labeling of the input sentence, and obtaining the entity in the input sentence from the optimal sequence labeling.
Preferably, training the deep learning classifier includes: feeding the sentence vector of a training sample into the pre-built deep learning classifier and obtaining the predicted relation category LP of the sample by a forward pass; computing a loss value with the loss function F(LP, L), where L is the annotated relation category of the sample and the loss value measures the difference between LP and L; backpropagating gradients with stochastic gradient descent according to the loss value to update the parameters of the deep learning classifier; and training the classifier iteratively until the loss between the predicted relation category and the annotated relation category falls below a preset threshold, or the number of iterations exceeds a preset limit.
Preferably, the loss function is cross entropy or mean squared error.
In a second aspect, the present invention provides a device for automatically identifying sentence relations and entities, including: a preprocessing module for projecting the user's input sentence into a space of fixed dimension to obtain the sentence vector of the input sentence in that space; a relation recognition module for feeding the sentence vector into a pre-trained deep learning classifier to obtain the relation category of the input sentence; and an entity recognition module for recognizing the entity in the input sentence if a relation category is identified.
The device provided by the present invention uses deep learning to judge the user's input sentence at the semantic level. It can recognize relations accurately, which in turn helps improve the accuracy of entity recognition.
Brief description of the drawings
Fig. 1 is a flowchart of a method for automatically identifying sentence relations and entities according to an embodiment of the present invention;
Fig. 2 is a structural block diagram of a device for automatically identifying sentence relations and entities according to an embodiment of the present invention;
Fig. 3 shows the deep learning architecture used by the deep learning classifier according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the technical solution of the present invention are described in detail below with reference to the drawings. The following embodiments serve only to illustrate the technical solution more clearly. They are examples and do not limit the scope of protection of the present invention.
Unless otherwise stated, technical and scientific terms used in this application have the ordinary meaning understood by those skilled in the art to which the present invention belongs.
As shown in Fig. 1, a method for automatically identifying sentence relations and entities according to an embodiment of the present invention includes:
Step S1: project the user's input sentence into a space of fixed dimension to obtain the sentence vector of the input sentence in that space.
Step S2: feed the sentence vector into a pre-trained deep learning classifier to obtain the relation category of the input sentence.
Step S3: if a relation category is identified, recognize the entity in the input sentence.
Here an entity must first of all be a noun. It refers to an independently existing object, such as a person's name or the name of a thing, and excludes pronouns such as "I", "you" and "he". For example, in the input sentence "I like Zhou Jielun" the entity is "Zhou Jielun".
The method provided by this embodiment uses deep learning to judge the user's input sentence at the semantic level. It can recognize relations accurately, which in turn helps improve the accuracy of entity recognition.
A preferred implementation of step S1 includes:
Step S11: segment the user's input sentence into words.
Step S12: convert each word into its word vector by looking up word2vec word vectors.
Step S13: obtain, from the word vectors of the words, the sentence vector of the input sentence in a space of fixed dimension.
Steps S11 to S13 are implemented as follows:
The input sentence is segmented into words. If the number of words exceeds N, the excess words are discarded. N is a preset maximum number of words in an input sentence, for example N = 25. Because users type in a chat setting, N does not need to be large: statistics show that most chat inputs are within 10 characters.
Each word is converted into its word vector by looking up word2vec word vectors. Assume each word vector has dimension M, for example M = 300. The word2vec word vectors are trained offline in advance, so converting a word into its vector only requires calling the relevant public interface to look it up.
The word vectors are then concatenated. If there are fewer than N words, zeros are appended until a vector of N × M dimensions is formed. For example, with N = 25 and M = 300, if the user input has only 23 words, then in addition to concatenating the 23 word vectors of 300 dimensions, 2 more M-dimensional zero vectors must be appended, that is, 2 × 300 = 600 zeros. Appending M-dimensional zero vectors in this way is called padding.
Through the steps above, the input sentence is projected into a space of fixed dimension, here an N × M dimensional space. With N = 25 and M = 300, this is a space of 25 × 300 dimensions. The representation of the input sentence in the N × M dimensional space is the sentence vector of the input sentence.
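Steps S12 and S13 (truncate, look up, concatenate, pad) can be sketched as follows. This is an illustration under stated assumptions: `toy_lookup` is a hypothetical stand-in for a pre-trained word2vec table, the function name is made up, and mapping unknown words to a zero vector is an assumption, since the text does not say how out-of-vocabulary words are handled.

```python
N = 25   # maximum number of words per sentence (example value from the text)
M = 300  # dimension of each word vector (example value from the text)


def sentence_vector(tokens, lookup, n=N, m=M):
    """Truncate to n tokens, look up each word vector, zero-pad to n * m values."""
    zero = [0.0] * m
    # Words beyond the first n are discarded; unknown words map to zeros (assumption).
    vectors = [lookup.get(token, zero) for token in tokens[:n]]
    # Padding: append zero vectors until there are exactly n word vectors.
    vectors += [zero] * (n - len(vectors))
    # Concatenate the n word vectors into one flat vector of n * m values.
    return [value for vector in vectors for value in vector]


# Hypothetical toy table standing in for pre-trained word2vec vectors.
toy_lookup = {"I": [0.1] * M, "like": [0.2] * M, "music": [0.3] * M}

vec = sentence_vector(["I", "like", "music"], toy_lookup)
print(len(vec))  # 7500, i.e. 25 * 300
```

Every sentence, short or long, maps to a vector of the same length, which is what allows a fixed-size classifier input.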
The deep learning architecture used by the deep learning classifier in step S2 is shown in Fig. 3. The bottom layer is a convolutional neural network (CNN), which performs convolution on the sentence vector obtained from the input sentence to extract local features of the input sentence. Preferably two CNN layers are stacked, which yields more abstract local features. The local features are fed into a long short-term memory (LSTM) recurrent network. After two LSTM layers, the dependencies between preceding and following words in the sentence are encoded. The resulting encoding is passed to an activation layer of rectified linear units (ReLU) for a nonlinear transformation, and the result is passed to the output layer, which produces the relation category of the input sentence. The output layer can use a Softmax function or a Sigmoid function. With Softmax, the classifier produces a multi-valued output. A preference classifier, for example, can be modeled as a multi-class classifier with the categories like, dislike, and other. With Sigmoid, the classifier produces a binary output. A nickname classifier, for example, can be modeled as a binary classifier with the categories nickname and other.
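The difference between the two output-layer options can be illustrated in isolation. The sketch below covers only the output layer, not the CNN and LSTM layers. The input scores are made-up numbers standing in for the activations that reach the output layer, and the function names and the 0.5 decision threshold are assumptions.

```python
import math

PREFERENCE_LABELS = ["like", "dislike", "other"]


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - shift) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def sigmoid(score):
    """Squash a single score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def classify_preference(scores):
    """Multi-class output: pick the category with the highest Softmax probability."""
    probabilities = softmax(scores)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return PREFERENCE_LABELS[best]


def classify_nickname(score, threshold=0.5):
    """Binary output: 'nickname' if the Sigmoid probability reaches the threshold."""
    return "nickname" if sigmoid(score) >= threshold else "other"


print(classify_preference([0.2, 2.5, -1.0]))  # dislike
print(classify_nickname(1.3))                 # nickname
```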
Based on this architecture, supervised training on annotated data from the specific domain enables the deep learning classifier to recognize the relation category expressed in a sentence accurately and efficiently. Training the deep learning classifier includes:
Step S21: feed the sentence vector of a training sample into the pre-built deep learning classifier and obtain the predicted relation category LP of the sample by a forward pass.
Step S22: compute a loss value with the loss function F(LP, L), where LP is the predicted relation category and L is the annotated relation category of the sample. The loss value measures the difference between the predicted and the annotated relation category. F can be cross entropy or mean squared error (MSE).
Step S23: according to the loss value, perform the backward pass (also called backpropagation of gradients) with stochastic gradient descent (SGD) to update the parameters of the deep learning classifier, so that the relation category predicted by the updated classifier moves closer to the annotated relation category of the sample.
Step S24: train the deep learning classifier iteratively until the loss between the predicted relation category and the annotated relation category falls below a preset threshold, or the number of iterations exceeds a preset limit.
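The control flow of steps S21 to S24 can be sketched with a much smaller model. The example below deliberately swaps the CNN and LSTM classifier for a single Sigmoid unit so that it stays self-contained. The toy data, learning rate, loss threshold, and iteration limit are all assumed values, and the function names are hypothetical.

```python
import math
import random


def predict(weights, features):
    """Forward pass (S21): Sigmoid of the weighted sum of the features."""
    z = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(lp, label):
    """Loss F(LP, L) (S22) for a single binary sample."""
    eps = 1e-12  # avoids log(0)
    return -(label * math.log(lp + eps) + (1 - label) * math.log(1 - lp + eps))


def train(samples, lr=0.5, loss_threshold=0.05, max_iterations=1000, seed=0):
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0][0])
    mean_loss = float("inf")
    for iteration in range(1, max_iterations + 1):
        order = samples[:]
        rng.shuffle(order)  # stochastic: visit the samples in random order
        for features, label in order:
            lp = predict(weights, features)
            # S23: gradient of the cross-entropy loss for a Sigmoid output unit.
            gradient = [(lp - label) * f for f in features]
            weights = [w - lr * g for w, g in zip(weights, gradient)]
        mean_loss = sum(cross_entropy(predict(weights, f), l)
                        for f, l in samples) / len(samples)
        # S24: stop when the loss is small enough; otherwise the loop ends
        # once the iteration limit is reached.
        if mean_loss < loss_threshold:
            break
    return weights, mean_loss, iteration


# Toy data: one input feature plus a constant bias term; label 1 or 0.
SAMPLES = [([2.0, 1.0], 1), ([1.5, 1.0], 1), ([-1.0, 1.0], 0), ([-2.0, 1.0], 0)]
weights, final_loss, iterations = train(SAMPLES)
print(final_loss, iterations)
```

The two stopping conditions in the loop correspond directly to the loss threshold and the iteration limit named in step S24.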
The architecture used by the deep learning classifier models the sequential relations between the words of a sentence well. For this reason it is quite sensitive to negation. It can distinguish "I like Zhou Jielun" from "I like Zhou Jielun, yeah right", and it can also recognize plain negation such as "I don't like Zhou Jielun" and double negation such as "It's not that I don't like Zhou Jielun".
Entity recognition can be modeled as a sequence labeling problem. Specifically, each character of the sentence is labeled with one of the tags B, M, E, S, O. B (Begin) marks the first character of an entity, M (Middle) marks a character in the middle of an entity, E (End) marks the last character of an entity, and S (Single) marks an entity that consists of a single character. Characters that are not part of an entity are labeled O (Other). For example, 我/喜/欢/周/杰/伦 ("I like Zhou Jielun") can be labeled 我O/喜O/欢O/周B/杰M/伦E. Joining the characters tagged B, M, E yields 周杰伦 (Zhou Jielun), so the liked entity is "Zhou Jielun". Likewise, 我/喜/欢/歌 ("I like songs") can be labeled 我O/喜O/欢O/歌S, where S marks a single-character entity, so the liked entity is 歌 ("song").
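Reading entities off a BMESO labeling is mechanical and can be sketched as follows. The function name is hypothetical, and silently dropping ill-formed tag runs (for example an M with no preceding B) is an assumption, since the text only describes well-formed labelings. The example sentences are 我喜欢周杰伦 ("I like Zhou Jielun") and 我喜欢歌 ("I like songs").

```python
def extract_entities(chars, tags):
    """Collect the entities encoded by a BMESO tag sequence."""
    entities = []
    current = []  # characters of the entity currently being assembled
    for char, tag in zip(chars, tags):
        if tag == "S":                 # single-character entity
            entities.append(char)
            current = []
        elif tag == "B":               # start a new multi-character entity
            current = [char]
        elif tag == "M" and current:   # continue the current entity
            current.append(char)
        elif tag == "E" and current:   # close the current entity
            current.append(char)
            entities.append("".join(current))
            current = []
        else:                          # "O", or an ill-formed run: discard
            current = []
    return entities


print(extract_entities(list("我喜欢周杰伦"), ["O", "O", "O", "B", "M", "E"]))  # ['周杰伦']
print(extract_entities(list("我喜欢歌"), ["O", "O", "O", "S"]))               # ['歌']
```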
The entity recognition problem can be solved by finding the optimal labeling with a conditional random field (CRF), which allows the entity in the sentence to be extracted accurately. A preferred implementation of step S3 is therefore as follows: feed the input sentence into a CRF model to obtain the optimal sequence labeling of the input sentence, and obtain the entity in the input sentence from the optimal sequence labeling.
The optimal sequence labeling of the input sentence is obtained from the CRF model as follows:
The sequence labeling problem can be solved with a conditional random field. Formally, for a given input sentence x (a sequence of characters) and a label sequence y over that sequence, the conditional random field models the conditional probability:
$$p(y \mid x, w) = \frac{\exp\left(w^{T} F(x, y)\right)}{\sum_{y'} \exp\left(w^{T} F(x, y')\right)}$$
Here exp(·) is the exponential function with base e, the natural constant; w is a trainable weight vector and w^T is its transpose; y' ranges over all possible labelings of the sequence x; and F(x, y) is the feature vector of the label sequence y on x. The conditional probability p(y | x, w) expresses how likely it is, for a given weight vector w, that the character sequence x is labeled with the label sequence y.
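For a very short sentence the conditional probability can be computed by brute force, enumerating every labeling y' for the normalizer. The sketch below does that under loudly stated assumptions: the two features in `feature_vector`, the character lexicon `NAME_CHARS`, and the weights are invented for illustration and are not the features of the patented method. The example input 爱周杰伦 means "love Zhou Jielun".

```python
import itertools
import math

TAGS = ["B", "M", "E", "S", "O"]
# Tag pairs that may follow one another in a well-formed BMESO sequence.
LEGAL = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
         ("E", "B"), ("E", "S"), ("E", "O"),
         ("S", "B"), ("S", "S"), ("S", "O"),
         ("O", "B"), ("O", "S"), ("O", "O")}
NAME_CHARS = set("周杰伦")  # hypothetical lexicon of name characters


def feature_vector(x, y):
    """Toy F(x, y): two hand-picked counts, for illustration only."""
    # Positions where "character is in the lexicon" agrees with "tag is not O".
    agree = sum(1 for ch, tag in zip(x, y) if (ch in NAME_CHARS) == (tag != "O"))
    # Number of adjacent tag pairs that are legal BMESO transitions.
    legal = sum(1 for pair in zip(y, y[1:]) if pair in LEGAL)
    return [agree, legal]


def score(x, y, w):
    """Inner product of the weight vector w with F(x, y)."""
    return sum(wi * fi for wi, fi in zip(w, feature_vector(x, y)))


def conditional_probability(x, y, w):
    """p(y | x, w): exp(score) normalized over every possible labeling of x."""
    numerator = math.exp(score(x, y, w))
    denominator = sum(math.exp(score(x, candidate, w))
                      for candidate in itertools.product(TAGS, repeat=len(x)))
    return numerator / denominator


x = list("爱周杰伦")
w = [1.0, 0.5]
p_good = conditional_probability(x, ["O", "B", "M", "E"], w)
p_bad = conditional_probability(x, ["O", "O", "O", "O"], w)
print(p_good > p_bad)  # True
```

The normalizer has 5^len(x) terms, so enumeration is only feasible for toy inputs.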
Given n pairs of training data {(x_i, y_i)}, an objective function defined over these pairs is solved for w.
The optimal w can be found by stochastic gradient descent (SGD).
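The text names the training pairs and the optimizer but not the objective itself. Assuming the usual choice for conditional random fields, the conditional log-likelihood of the training data, the objective and the per-sample gradient used by SGD can be written as below. The regularization term with coefficient \lambda is an added assumption and can be dropped by setting \lambda = 0.

```latex
w^{*} = \arg\max_{w} \; \sum_{i=1}^{n} \log p(y_i \mid x_i, w) \;-\; \frac{\lambda}{2} \lVert w \rVert^{2}

\nabla_{w} \log p(y_i \mid x_i, w) = F(x_i, y_i) - \sum_{y'} p(y' \mid x_i, w)\, F(x_i, y')
```

The gradient is the feature vector observed on the annotated labeling minus the feature vector expected under the current model, so training stops moving w once the two agree.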
Once the optimal w has been found, the value p(y' | x, w) can be computed for every possible labeling y'. The optimal labeling y is the label sequence that maximizes p(y | x, w). To improve computational performance, the optimal label sequence can be found with the Viterbi algorithm.
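A minimal Viterbi sketch follows. It assumes the score of a labeling decomposes into per-position emission scores and scores for adjacent tag pairs, which is the linear-chain form that makes Viterbi applicable. The score tables are made-up numbers, the tag set is reduced to three tags for brevity, and the function name is hypothetical.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag path and its score.

    emissions[t][tag] is the score of tag at position t;
    transitions[(prev, tag)] is the score of tag following prev (default 0).
    """
    best = [{tag: emissions[0][tag] for tag in tags}]  # best score ending in tag
    back = []                                          # back-pointers per position
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for tag in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, tag), 0.0))
            scores[tag] = (best[-1][prev] + transitions.get((prev, tag), 0.0)
                           + emissions[t][tag])
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


TAGS = ["B", "E", "O"]
EMISSIONS = [
    {"B": 0.0, "E": 0.0, "O": 2.0},
    {"B": 1.0, "E": 0.5, "O": 0.8},
    {"B": 0.0, "E": 1.0, "O": 0.9},
]
# Reward B followed by E; penalize transitions that BMESO forbids.
TRANSITIONS = {("B", "E"): 1.5, ("O", "E"): -5.0, ("B", "B"): -5.0,
               ("E", "E"): -5.0, ("B", "O"): -5.0}

path, path_score = viterbi(EMISSIONS, TRANSITIONS, TAGS)
print(path, path_score)  # ['O', 'B', 'E'] 5.5
```

The work grows linearly with sentence length and quadratically with the number of tags, instead of exponentially as in exhaustive enumeration.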
Once the optimal label sequence has been found, the entity in the sentence is extracted from its B, M, E tags or its S tag.
Based on the same inventive concept as the method above, an embodiment of the present invention also provides a device for automatically identifying sentence relations and entities, including: a preprocessing module 101 for projecting the user's input sentence into a space of fixed dimension to obtain the sentence vector of the input sentence in that space; a relation recognition module 102 for feeding the sentence vector into a pre-trained deep learning classifier to obtain the relation category of the input sentence; and an entity recognition module 103 for recognizing the entity in the input sentence if a relation category is identified.
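How the three modules hand data to one another can be sketched as a small pipeline. Everything here is hypothetical: the `recognize` function, the three toy stand-in functions, and the convention that the classifier returns `None` when no relation category is identified. Real modules would use word vectors, the trained classifier, and the CRF model.

```python
def recognize(sentence, preprocess, classify_relation, recognize_entities):
    """Run the three modules in order; skip entity recognition if no relation is found."""
    sentence_vector = preprocess(sentence)          # preprocessing module 101
    relation = classify_relation(sentence_vector)   # relation recognition module 102
    if relation is None:
        return None, []
    return relation, recognize_entities(sentence)   # entity recognition module 103


# Hypothetical stand-ins so the sketch runs end to end.
def toy_preprocess(sentence):
    return [float(len(word)) for word in sentence.split()]


def toy_classifier(vector):
    return "like" if len(vector) >= 3 else None


def toy_entities(sentence):
    return [" ".join(sentence.split()[-2:])]


result = recognize("I like Zhou Jielun", toy_preprocess, toy_classifier, toy_entities)
skipped = recognize("Hello", toy_preprocess, toy_classifier, toy_entities)
print(result)   # ('like', ['Zhou Jielun'])
print(skipped)  # (None, [])
```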
The method and device provided by the embodiments of the present invention use deep learning to judge the user's input sentence at the semantic level, so that relations are recognized accurately. Entity recognition is modeled as a sequence labeling problem, and the optimal labeling is solved with a conditional random field, so that entities are recognized accurately. Combining deep learning with the conditional random field achieves automatic extraction of relations and entities. Because machine learning judges relations and entities at the semantic level, the influence of the diversity of natural language expression is overcome. For example, "I like Zhou Jielun's songs", "Zhou Jielun's songs are my favorite", and "I love Zhou Jielun's songs to death" can all be recognized as expressing the "like" relation, with "Zhou Jielun's songs" as the liked object. In addition, the method and device provided by the embodiments of the present invention are easier to maintain than traditional methods. To increase coverage, it is only necessary to add new data and train a new model.
Finally, it should be noted that the embodiments above only illustrate the technical solution of the present invention and do not limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in those embodiments can still be modified, and some or all of their technical features can be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and they shall all be covered by the claims and the specification of the present invention.
Claims (10)
1. A method for automatically identifying sentence relations and entities, characterized by including:
projecting the user's input sentence into a space of fixed dimension to obtain the sentence vector of the input sentence in the space of fixed dimension;
feeding the sentence vector into a pre-trained deep learning classifier to obtain the relation category of the input sentence;
if a relation category is identified, recognizing the entity in the input sentence.
2. The method according to claim 1, characterized in that projecting the user's input sentence into a space of fixed dimension to obtain the sentence vector of the input sentence in the space of fixed dimension includes:
segmenting the user's input sentence into words;
converting each word into its word vector by looking up word2vec word vectors;
obtaining, from the word vectors of the words, the sentence vector of the input sentence in a space of fixed dimension.
3. The method according to claim 2, characterized in that feeding the sentence vector into the pre-trained deep learning classifier to obtain the relation category of the input sentence includes:
feeding the sentence vector into a CNN layer for convolution to obtain local features of the input sentence;
feeding the local features into an LSTM layer to obtain an encoding of the relations between preceding and following words in the input sentence;
feeding the encoding into a ReLU layer for a nonlinear transformation;
passing the result of the nonlinear transformation to the output layer to obtain the relation category of the input sentence.
4. The method according to claim 3, characterized in that the deep learning classifier includes multiple CNN layers.
5. The method according to claim 3, characterized in that the deep learning classifier includes multiple LSTM layers.
6. The method according to claim 3, characterized in that the output layer of the deep learning classifier uses a Softmax function or a Sigmoid function.
7. The method according to claim 1, characterized in that recognizing the entity in the input sentence includes:
feeding the input sentence into a CRF model to obtain the optimal sequence labeling of the input sentence, and obtaining the entity in the input sentence from the optimal sequence labeling.
8. The method according to claim 1, characterized in that the training steps of the deep learning classifier comprise:
Inputting the sentence vector of a training sample into a pre-built deep learning classifier, and obtaining the predicted relation category LP of the training sample by a feedforward pass;
Obtaining a loss value from a loss function F(LP, L), where L is the relation category actually labeled for the sample and the loss value measures the degree of difference between LP and L;
According to the loss value, back-propagating gradients using stochastic gradient descent and updating the parameters of the deep learning classifier;
Training the deep learning classifier iteratively until the loss value between the predicted relation category output by the deep learning classifier and the relation category actually labeled for the sample is less than a preset threshold, or the number of iterations exceeds a preset iteration threshold.
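The training procedure of claim 8 can be sketched as a loop with the two stopping conditions it names. To stay short, the sketch replaces the CNN/LSTM classifier with a single softmax layer, so the gradient step is written out directly instead of being back-propagated through several layers. The learning rate, thresholds, and toy data are hypothetical.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (sentence vector, labeled relation category L)


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights: List[List[float]], x: List[float]) -> List[float]:
    """Feedforward pass: returns the predicted category distribution LP."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def train(samples: List[Sample], n_classes: int, lr: float = 0.5,
          loss_threshold: float = 0.05, max_iters: int = 1000, seed: int = 0):
    rng = random.Random(seed)
    dim = len(samples[0][0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    avg_loss, iteration = float("inf"), 0
    for iteration in range(1, max_iters + 1):
        order = samples[:]
        rng.shuffle(order)                  # stochastic: random sample order
        total = 0.0
        for x, label in order:
            lp = forward(weights, x)
            total += -math.log(lp[label])   # F(LP, L): cross entropy
            for k in range(n_classes):      # gradient of the loss per category score
                grad = lp[k] - (1.0 if k == label else 0.0)
                for d in range(dim):
                    weights[k][d] -= lr * grad * x[d]
        avg_loss = total / len(order)
        if avg_loss < loss_threshold:       # stop: loss below the preset threshold
            break
    return weights, avg_loss, iteration     # otherwise the loop ends at max_iters


data = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.1, 0.9], 1)]
weights, loss, iters = train(data, n_classes=2)
print(iters, round(loss, 4))
```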
9. The method according to claim 8, characterized in that the loss function is cross entropy or mean squared error.
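Both candidate loss functions F(LP, L) from claim 9 can be written in a few lines. The sketch assumes LP is a probability distribution over relation categories and L is an integer category index compared against a one-hot target. These representations are assumptions; the claim does not fix them.

```python
import math
from typing import List


def one_hot(label: int, n_classes: int) -> List[float]:
    return [1.0 if k == label else 0.0 for k in range(n_classes)]


def cross_entropy(predicted: List[float], label: int) -> float:
    """Negative log of the probability assigned to the labeled category."""
    return -math.log(max(predicted[label], 1e-12))  # clamp to avoid log(0)


def mean_squared_error(predicted: List[float], label: int) -> float:
    """Average squared difference between LP and the one-hot target."""
    target = one_hot(label, len(predicted))
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


lp = [0.7, 0.2, 0.1]  # hypothetical predicted distribution LP
print(cross_entropy(lp, 0), mean_squared_error(lp, 0))
```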
10. A device for automatically identifying statement relations and entities, characterized by comprising:
A preprocessing module, configured to project the user's input sentence into a space of fixed dimension and obtain the sentence vector of the input sentence in that space;
A relation recognition module, configured to input the sentence vector into a pre-trained deep learning classifier and obtain the relation category of the input sentence;
An entity recognition module, configured to recognize the entities in the input sentence if a relation category is identified.
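The way the three modules of claim 10 fit together, including the condition that entity recognition runs only when a relation category was identified, can be sketched as below. The class name, the callable-based wiring, and the stub modules are hypothetical; real modules would wrap the sentence-vector, classifier, and CRF components described in the earlier claims.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class RelationEntityRecognizer:
    preprocess: Callable[[str], List[float]]                   # preprocessing module
    classify_relation: Callable[[List[float]], Optional[str]]  # relation recognition module
    recognize_entities: Callable[[str], List[str]]             # entity recognition module

    def run(self, sentence: str) -> Tuple[Optional[str], List[str]]:
        vector = self.preprocess(sentence)
        relation = self.classify_relation(vector)
        if relation is None:
            return None, []  # no relation category: skip entity recognition
        return relation, self.recognize_entities(sentence)


# Stub modules for illustration only.
recognizer = RelationEntityRecognizer(
    preprocess=lambda s: [float(len(s.split()))],
    classify_relation=lambda v: "director_of" if v[0] >= 3 else None,
    recognize_entities=lambda s: [w for w in s.split()[1:] if w[0].isupper()],
)

print(recognizer.run("Who directed Titanic"))
print(recognizer.run("Hello there"))
```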
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710108288.8A CN106886516A (en) | 2017-02-27 | 2017-02-27 | The method and device of automatic identification statement relationship and entity |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106886516A (en) | 2017-06-23 |
Family
ID=59180680
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710108288.8A Pending CN106886516A (en) | 2017-02-27 | 2017-02-27 | The method and device of automatic identification statement relationship and entity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106886516A (en) |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107316654A (en) * | 2017-07-24 | 2017-11-03 | 湖南大学 | Emotion identification method based on DIS NV features |
CN107451433A (en) * | 2017-06-27 | 2017-12-08 | 中国科学院信息工程研究所 | A kind of information source identification method and apparatus based on content of text |
CN107526799A (en) * | 2017-08-18 | 2017-12-29 | 武汉红茶数据技术有限公司 | A kind of knowledge mapping construction method based on deep learning |
CN107622050A (en) * | 2017-09-14 | 2018-01-23 | 武汉烽火普天信息技术有限公司 | Text sequence labeling system and method based on Bi LSTM and CRF |
CN107797989A (en) * | 2017-10-16 | 2018-03-13 | 平安科技(深圳)有限公司 | Enterprise name recognition methods, electronic equipment and computer-readable recording medium |
CN107797993A (en) * | 2017-11-13 | 2018-03-13 | 成都蓝景信息技术有限公司 | A kind of event extraction method based on sequence labelling |
CN108038209A (en) * | 2017-12-18 | 2018-05-15 | 深圳前海微众银行股份有限公司 | Answer system of selection, device and computer-readable recording medium |
CN108228568A (en) * | 2018-01-24 | 2018-06-29 | 上海互教教育科技有限公司 | A kind of mathematical problem semantic understanding method |
CN108416058A (en) * | 2018-03-22 | 2018-08-17 | 北京理工大学 | A kind of Relation extraction method based on the enhancing of Bi-LSTM input informations |
CN108920448A (en) * | 2018-05-17 | 2018-11-30 | 南京大学 | A method of the comparison based on shot and long term memory network extracts |
CN109033068A (en) * | 2018-06-14 | 2018-12-18 | 北京慧闻科技发展有限公司 | It is used to read the method, apparatus understood and electronic equipment based on attention mechanism |
CN109062897A (en) * | 2018-07-26 | 2018-12-21 | 苏州大学 | Sentence alignment method based on deep neural network |
CN109062910A (en) * | 2018-07-26 | 2018-12-21 | 苏州大学 | Sentence alignment method based on deep neural network |
CN109460434A (en) * | 2018-10-25 | 2019-03-12 | 北京知道创宇信息技术有限公司 | Data extract method for establishing model and device |
CN109815456A (en) * | 2019-02-13 | 2019-05-28 | 北京航空航天大学 | A method of it is compressed based on term vector memory space of the character to coding |
WO2019174422A1 (en) * | 2018-03-16 | 2019-09-19 | 北京国双科技有限公司 | Method for analyzing entity association relationship, and related apparatus |
CN110826320A (en) * | 2019-11-28 | 2020-02-21 | 上海观安信息技术股份有限公司 | Sensitive data discovery method and system based on text recognition |
CN111046180A (en) * | 2019-12-05 | 2020-04-21 | 竹间智能科技(上海)有限公司 | Label identification method based on text data |
CN111209751A (en) * | 2020-02-14 | 2020-05-29 | 全球能源互联网研究院有限公司 | Chinese word segmentation method, device and storage medium |
CN111339250A (en) * | 2020-02-20 | 2020-06-26 | 北京百度网讯科技有限公司 | Mining method of new category label, electronic equipment and computer readable medium |
CN111914547A (en) * | 2020-07-17 | 2020-11-10 | 深圳宜搜天下科技股份有限公司 | Improved semantic intention recognition method and LSTM framework system |
CN112270179A (en) * | 2020-10-15 | 2021-01-26 | 和美(深圳)信息技术股份有限公司 | Entity identification method and device and electronic equipment |
WO2021073254A1 (en) * | 2019-10-18 | 2021-04-22 | 平安科技(深圳)有限公司 | Knowledge graph-based entity linking method and apparatus, device, and storage medium |
CN113011170A (en) * | 2021-02-25 | 2021-06-22 | 万翼科技有限公司 | Contract processing method, electronic equipment and related products |
CN113468309A (en) * | 2021-06-30 | 2021-10-01 | 竹间智能科技(上海)有限公司 | Answer extraction method in text and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101510221A (en) * | 2009-02-17 | 2009-08-19 | 北京大学 | Enquiry statement analytical method and system for information retrieval |
CN103309926A (en) * | 2013-03-12 | 2013-09-18 | 中国科学院声学研究所 | Chinese and English-named entity identification method and system based on conditional random field (CRF) |
CN105628951A (en) * | 2015-12-31 | 2016-06-01 | 北京小孔科技有限公司 | Method and device for measuring object speed |
CN106096568A (en) * | 2016-06-21 | 2016-11-09 | 同济大学 | A kind of pedestrian's recognition methods again based on CNN and convolution LSTM network |
CN106446526A (en) * | 2016-08-31 | 2017-02-22 | 北京千安哲信息技术有限公司 | Electronic medical record entity relation extraction method and apparatus |
Non-Patent Citations (1)
Title |
---|
李弼程 et al.: "《网络舆情分析理论技术与应对策略》" (Theory, Techniques and Response Strategies for Online Public Opinion Analysis), 31 March 2015 * |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107451433A (en) * | 2017-06-27 | 2017-12-08 | 中国科学院信息工程研究所 | A kind of information source identification method and apparatus based on content of text |
CN107451433B (en) * | 2017-06-27 | 2020-05-22 | 中国科学院信息工程研究所 | Information source identification method and device based on text content |
CN107316654A (en) * | 2017-07-24 | 2017-11-03 | 湖南大学 | Emotion identification method based on DIS NV features |
CN107526799A (en) * | 2017-08-18 | 2017-12-29 | 武汉红茶数据技术有限公司 | A kind of knowledge mapping construction method based on deep learning |
CN107622050A (en) * | 2017-09-14 | 2018-01-23 | 武汉烽火普天信息技术有限公司 | Text sequence labeling system and method based on Bi LSTM and CRF |
CN107622050B (en) * | 2017-09-14 | 2021-02-26 | 武汉烽火普天信息技术有限公司 | Bi-LSTM and CRF-based text sequence labeling system and method |
CN107797989A (en) * | 2017-10-16 | 2018-03-13 | 平安科技(深圳)有限公司 | Enterprise name recognition methods, electronic equipment and computer-readable recording medium |
CN107797993A (en) * | 2017-11-13 | 2018-03-13 | 成都蓝景信息技术有限公司 | A kind of event extraction method based on sequence labelling |
CN108038209A (en) * | 2017-12-18 | 2018-05-15 | 深圳前海微众银行股份有限公司 | Answer system of selection, device and computer-readable recording medium |
CN108228568A (en) * | 2018-01-24 | 2018-06-29 | 上海互教教育科技有限公司 | A kind of mathematical problem semantic understanding method |
CN108228568B (en) * | 2018-01-24 | 2021-06-04 | 上海互教教育科技有限公司 | Mathematical problem semantic understanding method |
WO2019174422A1 (en) * | 2018-03-16 | 2019-09-19 | 北京国双科技有限公司 | Method for analyzing entity association relationship, and related apparatus |
CN110276066A (en) * | 2018-03-16 | 2019-09-24 | 北京国双科技有限公司 | The analysis method and relevant apparatus of entity associated relationship |
CN108416058A (en) * | 2018-03-22 | 2018-08-17 | 北京理工大学 | A kind of Relation extraction method based on the enhancing of Bi-LSTM input informations |
CN108416058B (en) * | 2018-03-22 | 2020-10-09 | 北京理工大学 | Bi-LSTM input information enhancement-based relation extraction method |
CN108920448A (en) * | 2018-05-17 | 2018-11-30 | 南京大学 | A method of the comparison based on shot and long term memory network extracts |
CN108920448B (en) * | 2018-05-17 | 2021-09-14 | 南京大学 | Comparison relation extraction method based on long-term and short-term memory network |
CN109033068B (en) * | 2018-06-14 | 2022-07-12 | 北京慧闻科技(集团)有限公司 | Method and device for reading and understanding based on attention mechanism and electronic equipment |
CN109033068A (en) * | 2018-06-14 | 2018-12-18 | 北京慧闻科技发展有限公司 | It is used to read the method, apparatus understood and electronic equipment based on attention mechanism |
CN109062897A (en) * | 2018-07-26 | 2018-12-21 | 苏州大学 | Sentence alignment method based on deep neural network |
CN109062910A (en) * | 2018-07-26 | 2018-12-21 | 苏州大学 | Sentence alignment method based on deep neural network |
CN109460434A (en) * | 2018-10-25 | 2019-03-12 | 北京知道创宇信息技术有限公司 | Data extract method for establishing model and device |
CN109815456A (en) * | 2019-02-13 | 2019-05-28 | 北京航空航天大学 | A method of it is compressed based on term vector memory space of the character to coding |
WO2021073254A1 (en) * | 2019-10-18 | 2021-04-22 | 平安科技(深圳)有限公司 | Knowledge graph-based entity linking method and apparatus, device, and storage medium |
CN110826320B (en) * | 2019-11-28 | 2023-10-13 | 上海观安信息技术股份有限公司 | Sensitive data discovery method and system based on text recognition |
CN110826320A (en) * | 2019-11-28 | 2020-02-21 | 上海观安信息技术股份有限公司 | Sensitive data discovery method and system based on text recognition |
CN111046180A (en) * | 2019-12-05 | 2020-04-21 | 竹间智能科技(上海)有限公司 | Label identification method based on text data |
CN111209751A (en) * | 2020-02-14 | 2020-05-29 | 全球能源互联网研究院有限公司 | Chinese word segmentation method, device and storage medium |
CN111209751B (en) * | 2020-02-14 | 2023-07-28 | 全球能源互联网研究院有限公司 | Chinese word segmentation method, device and storage medium |
CN111339250A (en) * | 2020-02-20 | 2020-06-26 | 北京百度网讯科技有限公司 | Mining method of new category label, electronic equipment and computer readable medium |
CN111339250B (en) * | 2020-02-20 | 2023-08-18 | 北京百度网讯科技有限公司 | Mining method for new category labels, electronic equipment and computer readable medium |
US11755654B2 (en) | 2020-02-20 | 2023-09-12 | Beijing Baidu Netcom Science Technology Co., Ltd. | Category tag mining method, electronic device and non-transitory computer-readable storage medium |
CN111914547A (en) * | 2020-07-17 | 2020-11-10 | 深圳宜搜天下科技股份有限公司 | Improved semantic intention recognition method and LSTM framework system |
CN112270179B (en) * | 2020-10-15 | 2021-11-09 | 和美(深圳)信息技术股份有限公司 | Entity identification method and device and electronic equipment |
CN112270179A (en) * | 2020-10-15 | 2021-01-26 | 和美(深圳)信息技术股份有限公司 | Entity identification method and device and electronic equipment |
CN113011170A (en) * | 2021-02-25 | 2021-06-22 | 万翼科技有限公司 | Contract processing method, electronic equipment and related products |
CN113011170B (en) * | 2021-02-25 | 2022-10-14 | 万翼科技有限公司 | Contract processing method, electronic equipment and related products |
CN113468309A (en) * | 2021-06-30 | 2021-10-01 | 竹间智能科技(上海)有限公司 | Answer extraction method in text and electronic equipment |
CN113468309B (en) * | 2021-06-30 | 2023-12-22 | 竹间智能科技(上海)有限公司 | Answer extraction method in text and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106886516A (en) | The method and device of automatic identification statement relationship and entity | |
CN107133224B (en) | Language generation method based on subject word | |
CN111241294B (en) | Relationship extraction method of graph convolution network based on dependency analysis and keywords | |
CN111931506B (en) | Entity relationship extraction method based on graph information enhancement | |
CN106599032B (en) | Text event extraction method combining sparse coding and structure sensing machine | |
CN107153642A (en) | A kind of analysis method based on neural network recognization text comments Sentiment orientation | |
CN110427616B (en) | Text emotion analysis method based on deep learning | |
CN104598611B (en) | The method and system being ranked up to search entry | |
CN110222178A (en) | Text sentiment classification method, device, electronic equipment and readable storage medium storing program for executing | |
CN106503192A (en) | Name entity recognition method and device based on artificial intelligence | |
CN107480143A (en) | Dialogue topic dividing method and system based on context dependence | |
CN107180023A (en) | A kind of file classification method and system | |
CN108268539A (en) | Video matching system based on text analyzing | |
CN107316654A (en) | Emotion identification method based on DIS NV features | |
CN107798624A (en) | A kind of technical label in software Ask-Answer Community recommends method | |
CN112101040A (en) | Ancient poetry semantic retrieval method based on knowledge graph | |
CN112989033B (en) | Microblog emotion classification method based on emotion category description | |
CN108255813A (en) | A kind of text matching technique based on term frequency-inverse document and CRF | |
CN106257455A (en) | A kind of Bootstrapping algorithm based on dependence template extraction viewpoint evaluation object | |
CN107506377A (en) | This generation system is painted in interaction based on commending system | |
CN112559734A (en) | Presentation generation method and device, electronic equipment and computer readable storage medium | |
CN109933787B (en) | Text key information extraction method, device and medium | |
CN109543176A (en) | A kind of abundant short text semantic method and device based on figure vector characterization | |
CN114036246A (en) | Commodity map vectorization method and device, electronic equipment and storage medium | |
CN107894975A (en) | A kind of segmenting method based on Bi LSTM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170623 |