CN109918680A - Entity recognition method, device and computer equipment - Google Patents


Info

Publication number
CN109918680A
Authority
CN
China
Prior art keywords
corpus
identified
slot position
slot
annotated sequence
Legal status
Granted
Application number
CN201910243952.9A
Other languages
Chinese (zh)
Other versions
CN109918680B (en)
Inventor
杨奇
杨君
吴丹
Current Assignee
Tencent Technology Shanghai Co Ltd
Original Assignee
Tencent Technology Shanghai Co Ltd
Application filed by Tencent Technology Shanghai Co Ltd
Priority to CN201910243952.9A
Publication of CN109918680A
Application granted
Publication of CN109918680B
Status: Active
Anticipated expiration


Landscapes

  • Machine Translation (AREA)

Abstract

This application provides an entity recognition method, device and computer equipment. In the recall stage for a corpus to be identified, dictionary matching is used to recall the possible candidate slot combinations of the corpus, and the corpus is labeled with each of them to obtain the corresponding slot annotated sequences. A deep learning network then produces a coding vector for the corpus to be identified and a coding vector for each slot annotated sequence; the slot annotated sequence whose coding vector is most similar to that of the corpus is selected as the best slot annotated sequence, from which the named entities of the corpus are obtained. The application thus combines the strengths of rule templates and deep learning algorithms to recognize named entities in many types of corpus quickly, simply and accurately, and solves the cold-start problem. The recognition result is usually in a computer-readable form, so an electronic device can respond to it directly.

Description

Entity recognition method, device and computer equipment
Technical field
This application relates to the field of artificial intelligence, and in particular to an entity recognition method, device and computer equipment.
Background art
In recent years, with the development of artificial intelligence, interactive systems have been widely applied on platforms in many fields. A human-machine dialogue system is a computer system that can hold a conversation with a person. After receiving a question from a user, it needs to recognize the named entities in the question so that it can provide the answer or perform the operation the user needs, which simplifies human-computer interaction.
Currently, named entity recognition (Named Entity Recognition, NER) applications have proposed using deep learning to recognize the named entities in a corpus: NER is treated as sequence labeling, a labeling model is learned from a large-scale corpus to tag each position of a sentence, and the named entities in the target corpus are obtained from those tags. However, this approach needs a large amount of high-quality annotated corpus for model training, which makes it hard to put into practice. It can only recognize some simple entities, its accuracy is low for proper nouns and highly ambiguous nouns, and it may fail entirely on entities that are updated frequently.
In response, practitioners have proposed recognizing named entities with rule templates, i.e., matching pre-built templates against the target corpus to identify its named entities. Although this approach can accurately identify all complex entities, when the product's functional domain is complex a huge number of templates must be built, which is an enormous amount of work; later maintenance and handover are also difficult, and the templates are hard to reuse across scenarios.
Accurately and conveniently recognizing the various named entities that arise across many scenarios has therefore become an important research direction in this field.
Summary of the invention
In view of this, embodiments of this application provide an entity recognition method, device and computer equipment that combine the strengths of rule templates and deep learning algorithms to recognize named entities in many kinds of corpus quickly, simply and accurately. The approach suits not only simple entity recognition but also complex entity recognition, it solves the cold-start problem, and the recognition result is usually in a computer-readable form to which an electronic device can respond directly.
To achieve the above object, embodiments of this application provide the following technical solutions:
An entity recognition method, the method comprising:
obtaining a corpus to be identified;
obtaining candidate slot combinations of the corpus to be identified by slot value matching;
labeling the corpus to be identified with the candidate slot combinations to obtain corresponding slot annotated sequences;
encoding, based on a deep learning network, the character sequence of the corpus to be identified and the slot annotated sequences respectively, to obtain a first coding vector of the corpus to be identified and second coding vectors of the slot annotated sequences;
determining a target annotated sequence among the slot annotated sequences based on the similarity between the first coding vector and the second coding vectors;
obtaining the named entities of the corpus to be identified from the target annotated sequence.
An entity recognition device, the device comprising:
a first obtaining module, configured to obtain a corpus to be identified;
a second obtaining module, configured to obtain candidate slot combinations of the corpus to be identified by slot value matching;
a labeling module, configured to label the corpus to be identified with the candidate slot combinations to obtain corresponding slot annotated sequences;
a third obtaining module, configured to encode, based on a deep learning network, the character sequence of the corpus to be identified and the slot annotated sequences respectively, to obtain a first coding vector of the corpus to be identified and second coding vectors of the slot annotated sequences;
an identification module, configured to determine a target annotated sequence among the slot annotated sequences based on the similarity between the first coding vector and the second coding vectors, and to obtain the named entities of the corpus to be identified from the target annotated sequence.
A computer device, comprising:
a communication interface;
a memory, configured to store a program implementing the entity recognition method described above;
a processor, configured to load and execute the program stored in the memory to implement the steps of the entity recognition method described above.
A storage medium storing a program which, when loaded and executed by a processor, implements the steps of the above entity recognition method.
It can be seen that this application converts slot extraction for the corpus to be identified into a recall-then-rank problem. In the recall stage, dictionary matching recalls the possible candidate slot combinations of the corpus, and the corpus is labeled with each of them to obtain the corresponding slot annotated sequences. The character sequence of the corpus and the slot annotated sequences are then fed into a deep learning network to obtain the coding vector of the corpus and the coding vector of each slot annotated sequence; the slot annotated sequence whose coding vector is most similar to that of the corpus is taken as the best slot annotated sequence, from which the named entities of the corpus are obtained. By combining the strengths of rule templates and deep learning algorithms, the application recognizes named entities in many kinds of corpus quickly, simply and accurately. It suits not only simple entities but also complex, long-tail and newly emerging entities, it solves the cold-start problem, and the recognition result is usually in a computer-readable form to which an electronic device can respond directly.
Description of the drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing them are briefly introduced below. Evidently, the drawings described below show only embodiments of this application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of an entity recognition system provided by an embodiment of this application;
Fig. 2 is a flow diagram of an entity recognition method provided by an embodiment of this application;
Fig. 3 is a flow diagram of another entity recognition method provided by an embodiment of this application;
Fig. 4 is a flow diagram of yet another entity recognition method provided by an embodiment of this application;
Fig. 5 is a schematic structural diagram of an entity recognition device provided by an embodiment of this application;
Fig. 6 is a schematic structural diagram of another entity recognition device provided by an embodiment of this application;
Fig. 7 is a schematic structural diagram of yet another entity recognition device provided by an embodiment of this application;
Fig. 8 is a schematic hardware structure diagram of a computer device provided by an embodiment of this application.
Detailed description of embodiments
As analyzed in the background section, prior-art methods that perform named entity recognition with BiLSTM-CRF (Bi-directional Long Short-Term Memory with Conditional Random Fields) or other deep learning algorithms are general-purpose, simple to implement in code, and easy to maintain and hand over. They handle complex syntactic structures and people's differing speaking habits well and can extract some simple entities. Methods that perform NER with rule templates, on the other hand, suit fields where entities are complex but sentence patterns are simple, such as stocks, sports and media.
From the above analysis, for fields in which both entities and sentence patterns are complex, such as the music domain of a voice assistant, this application needs a scheme that can recognize all complex entities while reducing the complexity of the "template stacking" approach, and that is also maintainable and methodologically reusable; that is, a new scheme that combines the strengths of the two entity recognition methods above.
Based on this idea, and considering that templates are good at identifying textual vocabulary while deep learning is good at handling complex sentence patterns, the inventors introduce the Match-Ranking (recall-then-rank) theory from recommender systems and convert the problem of recognizing named entities in a corpus, i.e., the slot extraction problem, into a Match-Ranking problem: first recall, then rank. Specifically, dictionary matching and simple rules first recall the possible slot combinations of the corpus to be identified, i.e., multiple slot combinations are recalled. Each slot combination is then used to label the original corpus, and once multiple label sequences are obtained, a deep learning model ranks the candidate slot combinations (candidate label sequences) to obtain the optimal slot combination, and thereby the named entities contained in the corpus to be identified.
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of this application.
Referring to Fig. 1, a schematic structural diagram of a system implementing the entity recognition method provided by this application, the system may include a server 11 and an electronic device 12. It should be understood that the server 11 and the electronic device 12 shown in Fig. 1 are only illustrative and do not limit how either is implemented.
In practice, the server 11 and the electronic device 12 may be connected over a wired or wireless network, for example via a mobile network such as GSM, GPRS or LTE, or via Bluetooth, Wi-Fi, infrared and similar means. This application does not limit the specific communication connection between the server 11 and the electronic device 12.
The server 11 may be a service device that provides services to users. It may be a standalone application server or a service cluster made up of multiple servers, and in practice may be a cloud server, a cloud host, a virtual data center and so on. This application does not limit the structure or implementation of the server.
In a human-machine dialogue scenario, the server 11 may be mainly responsible for speech recognition, language understanding, language generation, speech synthesis and other functions in the dialogue, cooperating with the electronic device 12 to carry out the dialogue. Accordingly, the entity recognition method provided by this application may be executed by the server 11; for the specific implementation, refer to the method embodiments below.
The electronic device 12 may be a user-facing terminal capable of voice interaction with the user, such as a mobile phone, laptop, iPad or smart speaker. It may also be a self-service voice-interaction kiosk, such as those in hospitals, banks and stations, or an intelligent robot that supports voice interaction, such as a chatbot, a robot vacuum or a food-ordering service robot. This application does not limit the product type or physical form of the electronic device, only requiring that it have voice interaction capability, which can be provided, for example, by installing a voice interaction application such as a voice assistant.
With reference to the system structure shown in Fig. 1, and referring to Fig. 2, which is a flow diagram of an entity recognition method provided by an embodiment of this application, the method can be applied to human-machine dialogue in various scenarios and may be executed by a server. As shown in Fig. 2, the method may include, but is not limited to, the following steps:
Step S101: obtain the corpus to be identified;
In this embodiment, the corpus to be identified may be data entered by the user after launching an application on the electronic device that has voice interaction capability; it may be voice data or text data. Note that if the corpus to be identified is text data, speech recognition is not needed in subsequent processing, and speech synthesis may also be skipped.
After the electronic device collects the corpus to be identified entered by the user, it can send it to the server for processing. Meanwhile, the application the user launched on the electronic device waits for the server's feedback, which may be voice or text data or a control instruction, depending on the specific content of the corpus to be identified. This application does not limit the content or type of the corpus to be identified.
Step S102: preprocess the corpus to be identified;
This application does not limit the preprocessing operations. For example, if the corpus to be identified is a long sentence, this embodiment may split it into several short sentences; if the corpus contains the electronic device's wake word, this embodiment may remove the wake word to avoid conflicting with the wake-word function. This embodiment does not detail the implementation of step S102.
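A minimal sketch of the two preprocessing operations mentioned above, assuming a hypothetical wake-word list and a hypothetical length threshold for what counts as a "long" sentence; the patent fixes neither, and splitting on clause punctuation is only one possible way to cut a long sentence into short ones.

```python
import re

# Hypothetical wake words; the patent does not name any.
WAKE_WORDS = ("hey assistant",)

# Clause delimiters used to cut long inputs (Chinese and ASCII punctuation).
CLAUSE_DELIMS = re.compile(r"[，。！？；,.!?;]")


def preprocess(utterance, wake_words=WAKE_WORDS, max_len=30):
    """Remove wake words, then split an over-long utterance into short clauses.

    max_len is an assumed threshold; inputs at or below it are kept whole.
    """
    text = utterance
    for word in wake_words:
        text = text.replace(word, " ")
    text = " ".join(text.split())  # collapse whitespace left by the removal
    if len(text) <= max_len:
        return [text] if text else []
    parts = CLAUSE_DELIMS.split(text)
    return [p.strip() for p in parts if p.strip()]


print(preprocess("hey assistant play some music"))
```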
Step S103: obtain, by slot value matching, multiple slot combinations that match the preprocessed corpus to be identified;
In this field, a semantic slot may be a specific concept, such as a named entity, that the NLU (Natural Language Understanding) module extracts from a corpus. Slot filling may be the process of completing information so that the user's intent is converted into an explicit user instruction; it can be treated as a sequence labeling problem, i.e., assigning a label to each word in a given corpus. In practice, a slot may be implemented through slot positions, and a slot value may be a possible value of the slot. For example, the departure-place slot may be filled through slot positions such as obtaining it from context or obtaining it directly, and the specific departure place is the slot value. A slot position can thus be one way of filling a slot; this application refers to a slot that contains several such filling ways as a slot group, and each slot position can be of one of the slot types such as equivalent slots and interface slots.
For example, in a ticket-booking scenario, the semantic slots may include "departure place", "departure time", "destination" and so on. Slot values for "departure place" may include place names such as "Beijing", "Shanghai" and "Shenzhen"; slot values for "departure time" may include times such as "7 a.m.", "10 a.m." and "3 p.m."; and slot values for "destination" may include place names such as "Hangzhou", "Beijing" and "Hainan".
Based on the above, this embodiment can obtain a slot value dictionary for each slot and store them together as a slot dictionary; the slot dictionary thus contains the slot values of the different slots. This application does not limit the content or acquisition of the slot value dictionaries or the slot dictionary.
The corpus to be identified can then be matched against each slot value in the slot dictionary, and every slot whose value is matched is saved as a possible slot. Usually each matching pass over the corpus hits several slots, so one matching result is a slot combination, and matching against the slot dictionary multiple times typically yields multiple slot combinations.
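The matching step can be sketched as follows, assuming a slot dictionary of the form `{slot: [slot values]}` and treating a "slot combination" as any set of dictionary hits whose character spans do not overlap. The helper names and the toy dictionary are hypothetical, and the exhaustive enumeration only illustrates what the recalled combinations look like; it is not an efficient recall procedure.

```python
from itertools import combinations


def find_matches(text, slot_dict):
    """Return every (start, end, slot, value) where a dictionary value occurs in text."""
    matches = []
    for slot, values in slot_dict.items():
        for value in values:
            start = text.find(value)
            while start != -1:
                matches.append((start, start + len(value), slot, value))
                start = text.find(value, start + 1)
    return sorted(matches)


def slot_combinations(matches):
    """Enumerate every non-empty set of mutually non-overlapping matches.

    Exhaustive enumeration is exponential in len(matches); it is only meant
    to illustrate what a "slot combination" is.
    """
    combos = []
    for k in range(1, len(matches) + 1):
        for subset in combinations(matches, k):  # subsets keep the sorted order
            if all(a[1] <= b[0] for a, b in zip(subset, subset[1:])):
                combos.append(list(subset))
    return combos


# Hypothetical slot dictionary: "hello adele" is ambiguous (one song, or song + singer).
slot_dict = {"song": ["hello", "hello adele"], "singer": ["adele"]}
matches = find_matches("play hello adele", slot_dict)
combos = slot_combinations(matches)
print(len(combos), combos)
```

The ambiguous input yields several competing combinations, which is exactly why a later ranking stage is needed.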
Optionally, because in practice the same slot value may be phrased in many ways (for example, the value "play" can be expressed as "put one on", "give me one", "broadcast", "click" and so on), this application can map all such expressions to "play". To do so, the application can collect corpora that users entered in the human-machine interaction system together with the corresponding command content (i.e., slot value) recognized by the computer equipment, thereby obtaining the various corpora that correspond to the same command content. These corpora can be regarded as synonyms and mapped to that command content. In this way the application obtains synonym mappings for the various slot values and generates a synonym dictionary from them. The synonym dictionary characterizes the mapping between slot values and their synonyms and contains the synonyms of each slot value.
Therefore, slot value matching of the corpus to be identified can use the slot value dictionaries together with the synonym dictionary: the corpus is matched against both to obtain its slot combinations. The specific matching procedure is not limited.
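One way to use both dictionaries together, sketched under assumptions: each slot's values are expanded with their synonyms into a surface-form-to-canonical-value map, so a match on a synonym is reported with its canonical slot value. The dictionary contents and the function names below are hypothetical, and plain substring search stands in for whatever matching procedure is actually used.

```python
def expand_slot_dict(slot_dict, synonyms):
    """Map every surface form (canonical value or synonym) to its canonical slot value.

    slot_dict: {slot: [canonical values]}
    synonyms:  {canonical value: [synonymous expressions]}
    """
    expanded = {}
    for slot, values in slot_dict.items():
        surface_to_value = {}
        for value in values:
            surface_to_value[value] = value
            for variant in synonyms.get(value, []):
                surface_to_value[variant] = value
        expanded[slot] = surface_to_value
    return expanded


def lookup(text, expanded):
    """Return (slot, surface form, canonical value) for every surface form found in text."""
    hits = []
    for slot, surface_to_value in expanded.items():
        for surface, value in surface_to_value.items():
            if surface in text:
                hits.append((slot, surface, value))
    return hits


slot_dict = {"action": ["play"]}                # hypothetical slot value dictionary
synonyms = {"play": ["put on", "give me"]}       # hypothetical synonym dictionary
expanded = expand_slot_dict(slot_dict, synonyms)
print(lookup("put on some jazz", expanded))
```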
In summary, this embodiment can use the dictionary mechanism of rule templates to identify the correct entities in the corpus to be identified accurately and completely, obtaining a candidate set of all possible entities, i.e., the entity combinations that might express the meaning of the corpus.
Step S104: filter the obtained slot combinations according to preset logical constraints to obtain the candidate slot combinations;
In practice, the slot value dictionaries of some slots contain a great many words, and some entities are common words, so slot matching as described above often yields a large number of slot combinations, which hurts both the efficiency and the accuracy of named entity recognition. The application can therefore set filter conditions based on logic such as the number of slot values, their frequency of occurrence and the relationships between slots; these conditions are called preset logical constraints.
The slot combinations matched for the corpus to be identified can then be filtered by these conditions, removing the poorly matching ones without reducing recall and keeping the most probable ones, referred to as candidate slot combinations. This application does not limit the content of the preset logical constraints; they can be determined from industry experience or test results and may differ for different types of corpus to be identified.
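A sketch of such filtering, assuming three hypothetical constraints that illustrate the kinds of logic listed above: a maximum number of slot values per combination, slot types that may appear at most once, and pairs of slots that cannot co-occur. Frequency-based rules are not modeled here, and none of these thresholds or slot names come from the patent.

```python
def passes_constraints(combo, max_slots=3, single_use_slots=("song", "singer"),
                       forbidden_pairs=(("song", "radio_station"),)):
    """Check one slot combination against a few hypothetical logical constraints.

    combo is a list of (start, end, slot, value) matches.
    """
    slots = [m[2] for m in combo]
    if len(slots) > max_slots:  # too many slot values in one combination
        return False
    if any(slots.count(s) > 1 for s in single_use_slots):  # a slot used twice
        return False
    present = set(slots)
    if any(a in present and b in present for a, b in forbidden_pairs):  # incompatible slots
        return False
    return True


def filter_combinations(combos, **constraints):
    """Keep only the combinations that satisfy every constraint."""
    return [c for c in combos if passes_constraints(c, **constraints)]


c1 = [(0, 5, "song", "hello")]
c2 = [(0, 5, "song", "hello"), (6, 11, "song", "adele")]
c3 = [(0, 5, "song", "hello"), (6, 11, "radio_station", "adele")]
print(filter_combinations([c1, c2, c3]))
```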
Step S105: label the corpus to be identified with the candidate slot combinations to obtain slot annotated sequences;
In this embodiment, each slot combination obtained above may include information such as the slot values, their positions in the corpus to be identified, and the slot types. After filtering yields several candidate slot combinations, the slot types in each candidate combination can be used to label the corresponding slot values in the corpus, producing a slot annotated sequence made up of slot types.
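A minimal sketch of turning one candidate combination into a slot annotated sequence at character level, assuming matches are stored as `(start, end, slot, value)` and that characters outside every slot value get a placeholder label "O"; the placeholder is an assumed convention, not something the patent specifies. The resulting sequence has one element per character of the corpus, which matches the later observation that the two sequences usually have the same length.

```python
def annotate(text, combo, outside="O"):
    """Turn one candidate slot combination into a character-level slot annotated sequence.

    combo holds (start, end, slot, value) matches; characters outside every
    match get the placeholder label `outside` (an assumed convention).
    """
    labels = [outside] * len(text)
    for start, end, slot, _value in combo:
        for i in range(start, end):
            labels[i] = slot
    return labels


print(annotate("play hello", [(5, 10, "song", "hello")]))
```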
A deep learning algorithm can then pick, from the resulting set of slot annotated sequences, the one most similar to the corpus to be identified as the best annotation, from which the named entities of the corpus are obtained.
Step S106: input the character sequence of the corpus to be identified and the slot annotated sequences into the deep learning network for encoding, obtaining the first coding vector of the corpus and the second coding vectors of the slot annotated sequences;
In the Ranking stage, this application uses a deep learning network to convert the character sequence of the original corpus, i.e., the corpus to be identified, and the set of slot annotated sequences into coding vectors, and then computes the similarity between the two kinds of coding vectors in a fusion layer. This yields the best slot annotated sequence for the corpus, and from it the named entities of the corpus.
The character sequence of the corpus to be identified may consist of its individual Chinese characters and English letters: this embodiment can treat each Chinese character of the corpus as one sequence element and, if English words are present, each English letter as one sequence element, obtaining the character sequence of the corpus according to this rule.
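The rule above can be sketched as follows, together with the mapping from characters to the integer indices a network consumes. Dropping whitespace, keeping any other character (digits, punctuation) as its own element, the toy vocabulary, and reserving ids 0 and 1 for padding and unknown characters are all assumptions made for illustration.

```python
def char_sequence(text):
    """Split text into one element per Chinese character or English letter.

    Dropping whitespace and keeping every other character as its own element
    are assumptions; the patent only covers Chinese characters and English letters.
    """
    return [ch for ch in text if not ch.isspace()]


def to_indices(chars, vocab, unk=1):
    """Map characters to integer ids; 0 is reserved for padding and 1 for unknowns here."""
    return [vocab.get(ch, unk) for ch in chars]


chars = char_sequence("放一首hello")
vocab = {"放": 2, "一": 3, "首": 4, "h": 5, "e": 6, "l": 7, "o": 8}  # hypothetical vocabulary
print(chars, to_indices(chars, vocab))
```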
Optionally, the deep learning network may be a neural network. Specifically, the network that encodes the character sequence of the corpus to be identified may be an attention-based BiLSTM, and the network that encodes the slot annotated sequences may be a BiLSTM, but the networks are not limited to those given here.
In practical application of this embodiment, refer to Fig. 3, a flow diagram of how the neural networks process the input sequences. Because the character sequence of the corpus to be identified and a slot annotated sequence are essentially different descriptions of the same meaning, the deep learning architecture is not shared: the character sequence and the slot annotated sequence are not fed into one neural network together but are processed separately by two neural networks.
As shown in Fig. 3, W1 to W5 denote the elements of the character sequence of the corpus to be identified (i.e., the original corpus), namely Chinese characters or English letters, and slot1 to slot5 denote the elements of the slot annotated sequence, i.e., the slot types assigned when labeling the corpus; each slot type can be treated as one character. It should be understood that the number of W elements is not limited to 5 but is determined by the actual corpus; this embodiment uses 5 only for illustration. Likewise, the number of elements in the slot annotated sequence obtained by labeling the corpus, i.e., the number of slots, is not limited to the 5 shown in Fig. 3 and is usually the same as the number of W elements in the corpus.
Optionally, in practice, the number of elements in the neural network's input sequence, i.e., a preset input sequence length, can be set by counting the character sequence lengths of a large corpus. When the character sequence and slot annotated sequence of the corpus to be identified are actually obtained, any sequence with fewer elements than the preset length can be padded with zeros at the end. This application does not limit the value of the preset input sequence length.
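A sketch of choosing the preset length from corpus statistics and zero-padding at the end. Picking the smallest length that covers a given fraction of the observed sequences is an assumed rule (the patent only says the length is set by counting a large corpus), and truncating sequences longer than the preset length is likewise an assumption.

```python
import math


def preset_length(lengths, coverage=0.95):
    """Pick the smallest length that covers `coverage` of the observed sequences (an assumed rule)."""
    ordered = sorted(lengths)
    index = max(0, math.ceil(coverage * len(ordered)) - 1)
    return ordered[index]


def pad_or_truncate(ids, max_len, pad_id=0):
    """Right-pad a sequence with zeros up to max_len.

    Truncating longer sequences is an assumption; the patent only describes
    padding sequences that are shorter than the preset length.
    """
    if len(ids) >= max_len:
        return list(ids[:max_len])
    return list(ids) + [pad_id] * (max_len - len(ids))


max_len = preset_length([3, 5, 8, 10], coverage=0.5)
print(max_len, pad_or_truncate([5, 3], max_len))
```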
In this embodiment, W1 to W5 may be the index inputs of the characters of the corpus to be identified. After processing by the BiLSTM network (i.e., a slot extraction model trained in advance), a hidden state is obtained for each time step. An attention mechanism then adjusts the weights of the hidden states at different time steps; each adjusted weight is multiplied with the output h_t (t = 1, 2, 3, 4, 5) of the corresponding hidden layer, and the products are summed linearly to obtain the coding vector of the corpus to be identified, denoted here as the first coding vector C1.
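The weighting-and-summing step can be sketched in plain Python as attention pooling: score each hidden state, normalize the scores with a softmax, and take the weighted sum of the h_t as C1. Scoring with a dot product against a learned context vector u is an assumed choice (the patent does not give the scoring function), and the toy vectors stand in for real BiLSTM outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Weight each hidden state h_t by an attention score and sum them into one vector.

    hidden_states: list of T vectors h_t (e.g. BiLSTM outputs)
    context: a learned query vector u; score_t = u . h_t is an assumed scoring rule
    """
    scores = [sum(u * h for u, h in zip(context, h_t)) for h_t in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h_t[d] for w, h_t in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights


c1, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], context=[0.0, 0.0])
print(c1, weights)
```

With a zero context vector every time step gets the same weight, so the pooled vector is simply the mean of the hidden states.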
Similarly, following the BiLSTM processing of the corpus to be identified described above, this embodiment can process the slot annotated sequence with a BiLSTM network (i.e., the slot extraction model trained in advance). As shown in Fig. 3, the slot annotated sequence slot1 to slot5 is fed into the BiLSTM network to obtain its coding vector C2, denoted here as the second coding vector. The specific implementation can be determined from how this slot extraction model works and is not detailed in this embodiment.
Here, BiLSTM denotes a bidirectional LSTM, which considers both past features (extracted by the forward pass) and future features (extracted by the backward pass). The backward pass is equivalent to feeding the original sequence (such as W1 to W5 or slot1 to slot5 in Fig. 3) into an LSTM in reverse order, so a bidirectional LSTM is equivalent to two LSTMs, one reading the input sequence forward and one reading it backward, whose outputs are combined as the final result. This application does not detail how the BiLSTM network processes the input sequence; this can be done according to the principle of BiLSTM networks.
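The forward/backward structure can be illustrated with any recurrent step function: run it over the sequence, run a second one over the reversed sequence, re-align the backward states to the original order, and pair them per position. In this sketch a running sum is a hypothetical stand-in for an LSTM cell, chosen only because its states make the "past versus future context" split easy to read.

```python
def run_recurrent(sequence, step, h0):
    """Run a recurrent step function over a sequence and collect every hidden state."""
    states, h = [], h0
    for x in sequence:
        h = step(h, x)
        states.append(h)
    return states


def bidirectional(sequence, forward_step, backward_step, h0=0.0):
    """Pair each position's forward state (past context) with its backward state (future context)."""
    forward = run_recurrent(sequence, forward_step, h0)
    backward = run_recurrent(sequence[::-1], backward_step, h0)[::-1]  # re-align to original order
    return list(zip(forward, backward))


def running_sum(h, x):
    """Toy recurrent step standing in for an LSTM cell (illustration only)."""
    return h + x


print(bidirectional([1, 2, 3], running_sum, running_sum))
```

At the first position the forward state has seen only the first element, while the backward state has already seen the whole sequence.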
It is understood that the neural networks that separately process the corpus to be identified and the slot annotated sequences can be trained on a large amount of sample data with a neural network algorithm. For example, a large amount of annotated corpus is obtained from this application platform or another one; for each annotated corpus, 10 slot annotated sequences (the number is not limited to 10) are recalled as sample data using the Match-stage processing described above; and the recalled slot annotated sequences and the corresponding annotated corpus are used to train a BiLSTM, yielding the slot extraction model, i.e., the neural network above. This application does not detail the training procedure.
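A sketch of how such samples might be assembled, assuming each training example pairs the annotated corpus with one recalled slot annotated sequence and that the recalled sequence identical to the gold annotation is the positive target (1) while the rest are negatives (0). This labeling scheme and the helper name are assumptions; the patent only says the recalled sequences and the annotated corpus are used for training.

```python
def build_training_pairs(text, gold_labels, recalled, k=10):
    """Pair one annotated corpus with up to k recalled slot annotated sequences.

    A recalled sequence identical to the gold annotation gets target 1, all
    others target 0 (an assumed labelling scheme for the ranking model).
    """
    return [(text, labels, int(labels == gold_labels)) for labels in recalled[:k]]


gold = ["O"] * 5 + ["song"] * 5
wrong = ["O"] * 5 + ["singer"] * 5
pairs = build_training_pairs("play hello", gold, [wrong, gold])
print([target for _, _, target in pairs])
```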
In addition, the networks used in this application to encode the character sequence of the corpus to be identified and the slot annotation sequence are not limited to the neural networks given above; other deep learning networks may also be used, which are not described one by one here.
From the above description of how the deep learning network processes its input sequences, three design choices follow. In the embedding layer, the character sequence of the corpus to be identified (also called the character vectors) is chosen as the input sequence, and no external word-segmentation sequence is introduced, which avoids introducing noise and improves the accuracy of the output. In the deep learning layer, a more suitable neural network is used separately for the character sequence of the corpus to be identified and for the slot annotation sequence, rather than sharing one network architecture, which further improves accuracy. In the final fusion layer, cosine similarity is computed, which introduces no additional parameters. By contrast, scoring with a transposed dot product through a parameter matrix makes the result strongly dependent on that matrix's parameters and is prone to overfitting, which degrades the generalization ability of the network model. In other words, the similarity calculation used in this application avoids overfitting and thereby improves the accuracy of obtaining the best-matching slot combination solution for the corpus to be identified.
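For reference, the two fusion options contrasted above can be written as follows, where $C_1$ is the first coding vector, $C_2$ a second coding vector, and $M$ a trainable matrix. Reading "transposed dot product with a parameter matrix" as the bilinear form $C_1^\top M C_2$ is an interpretation, not a formula stated in this application.

```latex
\begin{aligned}
s_{\text{bilinear}}(C_1, C_2) &= C_1^{\top} M \, C_2,
  && M \in \mathbb{R}^{d_1 \times d_2}\ \text{(trainable)} \\
s_{\cos}(C_1, C_2) &= \frac{C_1^{\top} C_2}{\lVert C_1 \rVert_2 \, \lVert C_2 \rVert_2} \in [-1, 1],
  && d_1 = d_2,\ \text{no trainable parameters}
\end{aligned}
```

The bilinear form adds $d_1 d_2$ weights that must be fitted to the training data, whereas the cosine form adds none and is insensitive to vector length, which is the property the paragraph above relies on.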
Optionally, because the character sequence of the corpus to be identified and the slot annotation sequence that are input to the Ranking part have the same length, and the slot annotation sequence is an annotation of the corpus to be identified, this application may also use a deep learning network architecture different from the one provided in this embodiment, for example, directly applying convolution or fusion to the character sequence and the slot annotation sequence. The specific implementation is not described further in this application.
Step S107: compute the similarity between the obtained first coding vector and each second coding vector.
Step S108: select the slot annotation sequence corresponding to the second coding vector with the maximum similarity as the target annotation sequence, and use the target annotation sequence to extract the named entities of the corpus to be identified.
Optionally, this embodiment may compute the similarity between two coding vectors with a similarity algorithm, specifically cosine similarity; the calculation itself is not detailed. Because cosine similarity has no trainable parameters, the fusion layer contains no parameters, which avoids overfitting and improves the generalization ability of the model.
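Steps S107 and S108 with cosine similarity can be sketched as follows. This is a minimal pure-Python illustration rather than the implementation of this application; the function names are hypothetical, and returning 0.0 when either vector is all zeros is a convention chosen here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Parameter-free cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)


def select_target(first_vec: Sequence[float],
                  second_vecs: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Step S107: score every second coding vector; step S108: pick the best index."""
    scores = [cosine_similarity(first_vec, v) for v in second_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)  # first index wins ties
    return best, scores
```

The returned index identifies the target annotation sequence among the candidates passed in; the candidate slot combination at that index is then used for entity extraction.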
As described above, each slot annotation sequence is produced by annotating the corpus to be identified with one candidate slot combination. The candidate slot combination corresponding to the target annotation sequence can therefore be taken as the target slot combination, and the slot values it contains are used to obtain the named entities of the corpus to be identified; that is, the slot values and/or the synonyms contained in the corpus to be identified are taken as the named entities. This embodiment does not limit the specific implementation.
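Reading the named entities off the target slot combination could look like the sketch below. It assumes each slot is a `(start, end, value, slot_type)` tuple with inclusive character positions, as in the worked example later in this description. The function name and the dictionary output are hypothetical; returning both the normalized slot value and the matched surface text reflects that either may serve as the entity.

```python
from typing import Dict, List, Sequence, Tuple

Slot = Tuple[int, int, str, str]  # (start, end, value, slot_type), inclusive positions


def extract_entities(utterance: str, combination: Sequence[Slot]) -> List[Dict[str, str]]:
    """Turn the target slot combination into named entities, ordered by position."""
    entities = []
    for start, end, value, slot_type in sorted(combination):
        entities.append({
            "type": slot_type,
            "value": value,                          # value after synonym normalization
            "surface": utterance[start:end + 1],     # text actually matched in the input
        })
    return entities
```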
In summary, referring to Fig. 4, this embodiment converts the slot extraction problem for the corpus to be identified into a recall-and-rank (Match-Ranking) problem. In the recall stage, dictionary matching is used to recall the possible candidate slot combinations of the corpus to be identified, and the corpus is annotated with each combination to obtain the corresponding slot annotation sequences. The character sequence of the corpus to be identified and the slot annotation sequences are then input into the deep learning network to obtain the coding vector of the corpus and the coding vector of each slot annotation sequence. The slot annotation sequence whose coding vector is most similar to that of the corpus is selected as the best slot annotation sequence, from which the named entities of the corpus to be identified are obtained.
It can be seen that this application combines the advantages of rule templates and deep learning algorithms to recognize named entities in various corpora quickly, simply, and accurately. It applies not only to simple entity recognition but also to complex, long-tail, and emerging entities, and it solves the cold-start problem. The recognition result is usually expressed in a computer-readable form, so an electronic device can respond to it directly.
Based on the above analysis, the entity recognition method of this application is illustrated below with a concrete application scenario, taking the corpus to be identified "Give us a cheerful song" as an example. To accurately extract the named entities in this corpus, the following operations can be performed:
This embodiment may first preprocess "Give us a cheerful song": if it contains a wake word, the wake word is removed. The corpus to be identified is then matched one by one against the slot dictionary and the synonym dictionary, and the possible candidate slot combinations found are saved.
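This application does not disclose the matching algorithm itself, so the following is only a hypothetical sketch of dictionary and synonym matching. The dictionaries are invented, matching runs over whitespace-separated English tokens whereas the original matches Chinese characters, and forming combinations as every maximal set of non-overlapping matches is an assumption about how candidates are assembled.

```python
from itertools import combinations

# Hypothetical dictionaries; the real slot and synonym dictionaries are not given.
SLOT_DICT = {  # slot value -> slot types it may fill (one value, several slots)
    "cheerful": ["style"],
    "cheerful song": ["songname"],
    "give us": ["songname", "albumname"],
}
SYNONYMS = {  # surface phrase -> (normalized value, slot type)
    "put on": ("play", "operation"),
    "song": ("music", "object"),
}


def match_spans(tokens):
    """Scan every token span and record (start, end, value, slot_type) matches."""
    matches = []
    for i in range(len(tokens)):
        for j in range(i, len(tokens)):
            phrase = " ".join(tokens[i:j + 1])
            for slot_type in SLOT_DICT.get(phrase, []):
                matches.append((i, j, phrase, slot_type))
            if phrase in SYNONYMS:
                value, slot_type = SYNONYMS[phrase]
                matches.append((i, j, value, slot_type))
    return matches


def overlaps(a, b):
    """True when two inclusive spans share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def enumerate_combinations(matches):
    """Return every maximal set of pairwise non-overlapping matches (exponential; fine for short input)."""
    result = []
    for r in range(len(matches), 0, -1):
        for combo in combinations(matches, r):
            if any(overlaps(a, b) for a, b in combinations(combo, 2)):
                continue
            # Maximal: every match left out collides with something already chosen.
            if all(m in combo or any(overlaps(m, c) for c in combo) for m in matches):
                result.append(list(combo))
    return result
```

For "give us a cheerful song", the ambiguous phrase "give us" yields one match per slot type, so the enumeration produces four candidate combinations, illustrating how one slot value mapping to several slots multiplies the candidates.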
Note that the same slot value may correspond to different slots, and the values of a single slot are diverse; for example, "give us" may be a song title or an album name. Consequently, matching the corpus to be identified against the slot dictionary and synonym dictionary often yields many slot combinations, some of which match the true slot combination of the corpus poorly. This embodiment can therefore further filter the recalled slot combinations to obtain several candidate slot combinations.
In the synonym dictionary of this embodiment, expressions such as "put on one", "sing", and "let's have one" can be mapped to "play". Thus, after the above processing, the following four candidate slot combinations can be obtained for the corpus to be identified "Give us a cheerful song":
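The filtering step can be pictured as applying a list of rule predicates to each recalled combination. The rules of this application are not disclosed; the two below ("each slot type at most once" and "at least n slots") are hypothetical examples, as are all function names.

```python
from typing import Callable, List, Sequence, Tuple

Slot = Tuple[int, int, str, str]  # (start, end, value, slot_type), inclusive positions
Rule = Callable[[Sequence[Slot]], bool]


def unique_slot_types(combo: Sequence[Slot]) -> bool:
    """Hypothetical rule: each slot type may be filled at most once."""
    types = [slot[3] for slot in combo]
    return len(types) == len(set(types))


def min_slots(n: int) -> Rule:
    """Hypothetical rule factory: require at least n slots in a combination."""
    return lambda combo: len(combo) >= n


def filter_combinations(combos: List[List[Slot]], rules: List[Rule]) -> List[List[Slot]]:
    """Keep only the combinations that satisfy every rule."""
    return [combo for combo in combos if all(rule(combo) for rule in rules)]
```

Keeping rules as plain predicates makes it easy to loosen one that proves too strict, which matters because overly strict constraints are later named as one cause of low recall.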
{(3, 4, 'play', 'operation'), (0, 2, 'give us', 'songname'), (8, 8, 'music', 'object')};
{(3, 4, 'play', 'operation'), (5, 8, 'cheerful song', 'songname')};
{(3, 4, 'play', 'operation'), (8, 8, 'music', 'object'), (5, 6, 'cheerful', 'style')};
{(5, 6, 'cheerful', 'style'), (8, 8, 'music', 'object'), (1, 5, 'us', 'albumname')};
In each candidate slot combination above, the contents of each pair of parentheses are, in order: the start and end positions in the corpus to be identified (character positions in the original Chinese utterance), the slot value after synonym conversion, and the slot type. Thus a candidate slot combination obtained by this application may consist of the slot type, the slot value, and its position in the corpus to be identified, but is not limited to these.
Next, the corpus to be identified is annotated with each of the four candidate slot combinations obtained above to obtain the corresponding slot annotation sequences, as shown in Table 1 below:
Table 1
In the table above, the first row is the corpus to be identified, and each subsequent row is the annotation sequence of the corpus under one candidate slot combination. This embodiment uses these sequences as the input of the subsequent Ranking part to determine which of the four candidate slot combinations is most likely correct.
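How a candidate slot combination becomes a per-character slot annotation sequence can be sketched as below. The labeling scheme is an assumption, since the contents of Table 1 are not reproduced here: each position covered by a slot receives that slot's type, uncovered positions receive "O", and an optional BIO variant (a common alternative) prefixes B-/I- markers. `annotate` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Slot = Tuple[int, int, str, str]  # (start, end, value, slot_type), inclusive positions


def annotate(length: int, combination: Sequence[Slot],
             bio: bool = False, outside: str = "O") -> List[str]:
    """Label each character position with the slot type that covers it."""
    labels = [outside] * length
    for start, end, _value, slot_type in combination:
        for pos in range(start, end + 1):
            if bio:
                labels[pos] = ("B-" if pos == start else "I-") + slot_type
            else:
                labels[pos] = slot_type
    return labels
```

Because the output has exactly one label per character, the annotation sequence has the same length as the character sequence, which is the equal-length property the Ranking part relies on.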
Regarding the processing of the Match part described above: in practice on live network data (a voice assistant), taking the music domain as an example, there are usually 12 core slots, such as singer name, album name, style category, sound type, music type, scene, and theme type, which are not all listed here. With the matching and filtering of the corpus to be identified described in steps S102 to S105, the recall rate of slot combinations can reach about 89.6%, and on average 11 candidate slot combinations are recalled per utterance. It will be appreciated that the recall rate and the number of candidate slot combinations recalled per utterance are not limited to the examples given here and can be calculated according to the actual situation.
Research shows that the low recall rate of slot combinations is mainly caused by the quality of the annotated corpus, errors in the upstream automatic speech recognition (ASR) results, ambiguity in the corpus itself, and overly strict preset logical constraints. To improve the recall rate of slot combinations, optimization can be carried out in these respects; the specific implementation is not described further in this embodiment.
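This application does not define how the 89.6% recall rate is computed. A plausible reading, used in the hypothetical sketch below, is the share of utterances whose gold slot combination appears, ignoring slot order, among the recalled candidates, reported alongside the average number of candidates per utterance.

```python
from typing import List, Sequence, Tuple

Slot = Tuple[int, int, str, str]  # (start, end, value, slot_type)


def combination_recall(gold_combos: Sequence[List[Slot]],
                       candidate_lists: Sequence[List[List[Slot]]]) -> float:
    """Share of utterances whose gold combination is among the recalled candidates."""
    hits = 0
    for gold, candidates in zip(gold_combos, candidate_lists):
        gold_set = frozenset(gold)  # slot order inside a combination is irrelevant
        if any(frozenset(c) == gold_set for c in candidates):
            hits += 1
    return hits / len(gold_combos)


def average_candidates(candidate_lists: Sequence[List[List[Slot]]]) -> float:
    """Mean number of candidate combinations recalled per utterance."""
    return sum(len(c) for c in candidate_lists) / len(candidate_lists)
```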
Based on the above description of how the deep learning network processes the corpus to be identified and the slot annotation sequences (i.e., annotated sentences): for the corpus to be identified "Give us a cheerful song", each Chinese character is treated as one character and denoted in turn W1, W2, ..., W10; similarly, the elements of a slot annotation sequence are denoted slot1, slot2, ..., slot10. If, according to actual needs, the number of elements in the input sequence of the deep learning network is preset to 14, the character sequence and slot annotation sequence obtained above are extended by padding the last four positions with 0, forming sequences that meet the network input requirement.
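The fixed-length input described above can be sketched as follows. Padding with 0 up to a preset length of 14 follows the text; mapping characters or labels to integer ids with 0 reserved for padding and 1 for unknown items, and truncating sequences longer than the preset length, are assumptions. Both helper names are hypothetical.

```python
from typing import Dict, List, Sequence


def to_ids(tokens: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Map characters or slot labels to ids; 0 is reserved for padding, 1 for unknown."""
    return [vocab.get(t, 1) for t in tokens]


def pad_sequence(seq: Sequence[int], max_len: int = 14, pad: int = 0) -> List[int]:
    """Right-pad (or truncate) an id sequence to the fixed network input length."""
    seq = list(seq)[:max_len]
    return seq + [pad] * (max_len - len(seq))
```

A 10-element sequence W1 to W10 (or slot1 to slot10) therefore gains four trailing zeros, matching the example above.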
Then, using the encoding shown in Fig. 3, the coding vector of the corpus to be identified and the coding vector of each slot annotation sequence are obtained, and the similarity between the coding vector of the corpus and that of each slot annotation sequence is computed to determine the slot combination that best matches the corpus to be identified.
In this embodiment, as shown in Fig. 4, after the similarity calculation in the fusion layer, the output Y may be a numerical value such as 0 or 1. A value of 0 may indicate that the candidate slot combination corresponding to the slot annotation sequence differs from the true slot combination of the corpus to be identified, i.e., the candidate matches the corpus poorly; a value of 1 may indicate that the candidate slot combination is identical to the true slot combination, i.e., it matches the corpus well. For the four candidate slot combinations in the example above, the third candidate yields Y = 1 and the others yield Y = 0, although the output is not limited to this.
On this basis, this embodiment can directly determine the best-matching slot combination solution of the corpus to be identified from the value of Y output by the deep learning network, i.e., the candidate slot combination with Y = 1, and obtain the named entities of the corpus accordingly. In the example above, using the third candidate slot combination, the named entities obtained for the corpus to be identified can be "play", "music", and "cheerful".
In this application, the deep learning network architecture above can be trained in advance. The training corpus may be more than one million (100w+) annotated utterances from in-house applications. Following the processing described for the Match part, about 10 slot annotated sentences (i.e., slot annotation sequences) are recalled per training utterance on average; the model is then evaluated on a test set to obtain a deep learning network model that meets a given convergence requirement, which can be called directly in practical applications. Tests show that the rate at which the Ranking-part network model selects the fully correct slot combination on the test set can reach 98.4%, and the accuracy and recall of the output results can exceed 98%; the specific values are not limiting.
It can be seen that, compared with relying purely on rule templates, which requires stacking a massive number of templates, or purely on deep learning, whose test results are barely usable, the entity recognition method proposed in this application combines rule templates with deep learning. It uses templates to effectively improve recall and the resolution of complex entities while reducing the number of recalled candidates, and it uses deep learning's ability to generalize over general syntax rules. While reducing complexity, it ensures algorithm performance and achieves good slot extraction results in complex domains.
Fig. 5 is a schematic structural diagram of an entity recognition device provided by this application. The device includes:
A first acquisition module 21, configured to acquire the corpus to be identified;
A second acquisition module 22, configured to obtain candidate slot combinations of the corpus to be identified by slot value matching;
Optionally, as shown in Fig. 6, the second acquisition module 22 may include:
A first acquisition unit 221, configured to obtain multiple slot combinations of the corpus to be identified by slot value matching;
A filtering unit 222, configured to filter the multiple slot combinations according to preset logical constraints to obtain the candidate slot combinations.
The first acquisition unit may include:
A dictionary acquisition unit, configured to acquire a slot value dictionary and a synonym dictionary;
A matching unit, configured to match the corpus to be identified one by one against the slot value dictionary and the synonym dictionary to obtain the multiple slot combinations.
A labeling module 23, configured to annotate the corpus to be identified with the candidate slot combinations to obtain the corresponding slot annotation sequences;
A third acquisition module 24, configured to encode, based on a deep learning network, the character sequence of the corpus to be identified and the slot annotation sequences respectively, to obtain a first coding vector of the corpus to be identified and a second coding vector of each slot annotation sequence;
Optionally, as shown in Fig. 7, the third acquisition module 24 may include:
A sequence acquisition unit 241, configured to obtain the character sequence of the corpus to be identified;
A first encoding unit 242, configured to input the character sequence into a first neural network model for encoding to obtain the first coding vector;
A second encoding unit 243, configured to input the slot annotation sequences into a second neural network model for encoding to obtain the second coding vectors.
The first neural network model and the second neural network model are bidirectional long short-term memory networks (BiLSTMs) of different types, although they are not limited to this.
An identification module 25, configured to determine the target annotation sequence among the slot annotation sequences based on the similarity between the first coding vector and the second coding vectors, and to use the target annotation sequence to obtain the named entities of the corpus to be identified.
Optionally, the identification module 25 may include:
A similarity calculation unit, configured to calculate the similarity between the first coding vector and each second coding vector;
A selection unit, configured to select the slot annotation sequence corresponding to the second coding vector with the maximum similarity as the target annotation sequence.
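The module structure of Figs. 5 to 7 can be pictured as a simple pipeline. In the hypothetical sketch below each module is a pluggable callable; the class and field names are illustrative rather than taken from this application, and in practice the fields would be filled with the dictionary matcher, annotator, BiLSTM encoders, and cosine similarity described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Slot = Tuple[int, int, str, str]  # (start, end, value, slot_type)


@dataclass
class EntityRecognitionDevice:
    """Hypothetical wiring of modules 22 to 25 as pluggable callables."""

    recall_combinations: Callable[[str], List[List[Slot]]]           # second acquisition module 22
    annotate: Callable[[str, List[Slot]], List[str]]                 # labeling module 23
    encode_text: Callable[[str], List[float]]                        # module 24, first network
    encode_labels: Callable[[List[str]], List[float]]                # module 24, second network
    similarity: Callable[[Sequence[float], Sequence[float]], float]  # module 25, similarity unit

    def recognize(self, utterance: str) -> List[str]:
        """Process one utterance, as delivered by first acquisition module 21."""
        combos = self.recall_combinations(utterance)
        if not combos:
            return []  # nothing recalled, so no entities
        first = self.encode_text(utterance)
        scores = [self.similarity(first, self.encode_labels(self.annotate(utterance, c)))
                  for c in combos]
        best = max(range(len(scores)), key=scores.__getitem__)  # selection unit
        return [value for _start, _end, value, _type in combos[best]]
```

Injecting the modules as callables keeps each one replaceable, which mirrors the statement that the encoders are not limited to BiLSTM networks.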
An embodiment of the present application further provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the entity recognition method above. For the specific implementation, refer to the description of the method embodiments above; it is not repeated here.
As shown in Fig. 8, an embodiment of the present application further provides a schematic hardware structure of a computer device. The computer device may be the server described above and may include a communication interface 31, a memory 32, and a processor 33.
In this embodiment, the communication interface 31, the memory 32, and the processor 33 may communicate with one another through a communication bus, and there may be at least one each of the communication interface 31, the memory 32, the processor 33, and the communication bus.
Optionally, the communication interface 31 may be an interface of a communication module, such as an interface of a GSM module.
The processor 33 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
The memory 32 may include high-speed RAM and may further include non-volatile memory, for example at least one disk storage.
The memory 32 stores a program, and the processor 33 calls the program stored in the memory 32 to implement the steps of the entity recognition method above.
In practical applications, after obtaining the named entities of the corpus to be identified, the server may send them to the corresponding client so that the client performs subsequent operations based on them; alternatively, the server may perform a data search based on the obtained named entities and feed the retrieved data back to the client for output. This application does not limit the application scenarios after the server obtains the named entities of the corpus to be identified.
It should be noted that the embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Since the device and server disclosed in the embodiments correspond to the disclosed method, their description is relatively brief; for relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of both. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An entity recognition method, characterized by comprising:
obtaining a corpus to be identified;
obtaining candidate slot combinations of the corpus to be identified by slot value matching;
annotating the corpus to be identified with the candidate slot combinations to obtain corresponding slot annotation sequences;
encoding, based on a deep learning network, the character sequence of the corpus to be identified and the slot annotation sequences respectively, to obtain a first coding vector of the corpus to be identified and second coding vectors of the slot annotation sequences;
determining a target annotation sequence among the slot annotation sequences based on the similarity between the first coding vector and the second coding vectors;
obtaining the named entities of the corpus to be identified by using the target annotation sequence.
2. The method according to claim 1, characterized in that obtaining candidate slot combinations of the corpus to be identified by slot value matching comprises:
obtaining multiple slot combinations of the corpus to be identified by slot value matching;
filtering the multiple slot combinations according to preset logical constraints to obtain the candidate slot combinations.
3. The method according to claim 2, characterized in that determining the target annotation sequence among the slot annotation sequences based on the similarity between the first coding vector and the second coding vectors comprises:
calculating the similarity between the first coding vector and each second coding vector;
selecting the slot annotation sequence corresponding to the second coding vector with the maximum similarity as the target annotation sequence.
4. The method according to claim 2, characterized in that obtaining multiple slot combinations of the corpus to be identified by slot value matching comprises:
obtaining a slot value dictionary and a synonym dictionary;
matching the corpus to be identified one by one against the slot value dictionary and the synonym dictionary to obtain the multiple slot combinations.
5. The method according to any one of claims 1 to 4, characterized in that encoding, based on a deep learning network, the character sequence of the corpus to be identified and the slot annotation sequences respectively, to obtain the first coding vector of the corpus to be identified and the second coding vectors of the slot annotation sequences comprises:
obtaining the character sequence of the corpus to be identified;
inputting the character sequence into a first neural network model for encoding to obtain the first coding vector;
inputting the slot annotation sequences into a second neural network model for encoding to obtain the second coding vectors.
6. The method according to claim 5, characterized in that the first neural network model and the second neural network model are bidirectional long short-term memory networks of different types.
7. An entity recognition device, characterized in that the device comprises:
a first acquisition module, configured to obtain a corpus to be identified;
a second acquisition module, configured to obtain candidate slot combinations of the corpus to be identified by slot value matching;
a labeling module, configured to annotate the corpus to be identified with the candidate slot combinations to obtain corresponding slot annotation sequences;
a third acquisition module, configured to encode, based on a deep learning network, the character sequence of the corpus to be identified and the slot annotation sequences respectively, to obtain a first coding vector of the corpus to be identified and second coding vectors of the slot annotation sequences;
an identification module, configured to determine a target annotation sequence among the slot annotation sequences based on the similarity between the first coding vector and the second coding vectors, and to obtain the named entities of the corpus to be identified by using the target annotation sequence.
8. The device according to claim 7, characterized in that the second acquisition module comprises:
a first acquisition unit, configured to obtain multiple slot combinations of the corpus to be identified by slot value matching;
a filtering unit, configured to filter the multiple slot combinations according to preset logical constraints to obtain the candidate slot combinations.
9. The device according to claim 8, characterized in that the first acquisition unit comprises:
a dictionary acquisition unit, configured to obtain a slot value dictionary and a synonym dictionary;
a matching unit, configured to match the corpus to be identified one by one against the slot value dictionary and the synonym dictionary to obtain the multiple slot combinations.
10. A computer device, characterized by comprising:
a communication interface;
a memory, configured to store a program implementing the entity recognition method according to any one of claims 1 to 6;
a processor, configured to load and execute the program stored in the memory to implement the steps of the entity recognition method according to any one of claims 1 to 6.
CN201910243952.9A 2019-03-28 2019-03-28 Entity identification method and device and computer equipment Active CN109918680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910243952.9A CN109918680B (en) 2019-03-28 2019-03-28 Entity identification method and device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910243952.9A CN109918680B (en) 2019-03-28 2019-03-28 Entity identification method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN109918680A true CN109918680A (en) 2019-06-21
CN109918680B CN109918680B (en) 2023-04-07

Family

ID=66967499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910243952.9A Active CN109918680B (en) 2019-03-28 2019-03-28 Entity identification method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN109918680B (en)

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413756A (en) * 2019-07-29 2019-11-05 北京小米智能科技有限公司 The method, device and equipment of natural language processing
CN110442332A (en) * 2019-07-04 2019-11-12 深圳壹账通智能科技有限公司 Natural language is converted into the method, apparatus and computer equipment of program language
CN110533324A (en) * 2019-08-29 2019-12-03 深圳市慧择时代科技有限公司 Insure the method and device of customer service distributed automatically
CN110705267A (en) * 2019-09-29 2020-01-17 百度在线网络技术(北京)有限公司 Semantic parsing method, semantic parsing device and storage medium
CN110727782A (en) * 2019-10-22 2020-01-24 苏州思必驰信息科技有限公司 Question and answer corpus generation method and system
CN110825875A (en) * 2019-11-01 2020-02-21 科大讯飞股份有限公司 Text entity type identification method and device, electronic equipment and storage medium
CN111079418A (en) * 2019-11-06 2020-04-28 科大讯飞股份有限公司 Named body recognition method and device, electronic equipment and storage medium
CN111339770A (en) * 2020-02-18 2020-06-26 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information
CN111738016A (en) * 2020-06-28 2020-10-02 中国平安财产保险股份有限公司 Multi-intention recognition method and related equipment
CN111782278A (en) * 2019-12-26 2020-10-16 北京沃东天骏信息技术有限公司 Functional program awakening method and device
CN112003960A (en) * 2020-08-06 2020-11-27 杭州安恒信息技术股份有限公司 Network interface management method and device of industrial control equipment and electronic device
CN112183075A (en) * 2020-10-10 2021-01-05 网易(杭州)网络有限公司 Method and device for extracting slots in conversation, electronic equipment and storage medium
CN112215010A (en) * 2019-07-10 2021-01-12 北京猎户星空科技有限公司 Semantic recognition method and equipment
CN112215005A (en) * 2020-10-12 2021-01-12 小红书科技有限公司 Entity identification method and device
CN112364649A (en) * 2020-09-08 2021-02-12 平安医疗健康管理股份有限公司 Named entity identification method and device, computer equipment and storage medium
CN112395414A (en) * 2019-08-16 2021-02-23 北京地平线机器人技术研发有限公司 Text classification method and training method, device, medium and equipment of classification model
CN112528666A (en) * 2019-08-30 2021-03-19 北京猎户星空科技有限公司 Semantic recognition method and device and electronic equipment
CN112528001A (en) * 2020-12-23 2021-03-19 北京百度网讯科技有限公司 Information query method and device and electronic equipment
WO2021051872A1 (en) * 2019-09-18 2021-03-25 平安科技(深圳)有限公司 Entity identification method, device, apparatus, and computer readable storage medium
CN112580298A (en) * 2019-09-29 2021-03-30 大众问问(北京)信息科技有限公司 Method, device and equipment for acquiring marked data
CN112599214A (en) * 2021-03-04 2021-04-02 中译语通科技股份有限公司 Method and system for extracting ICD (interface control document) codes from electronic medical records
CN112668336A (en) * 2020-05-14 2021-04-16 北京金山数字娱乐科技有限公司 Task model-based word processing method
CN112800769A (en) * 2021-02-20 2021-05-14 深圳追一科技有限公司 Named entity recognition method and device, computer equipment and storage medium
CN112926326A (en) * 2021-02-20 2021-06-08 深圳追一科技有限公司 Named entity recognition method and device, computer equipment and storage medium
CN113051919A (en) * 2019-12-26 2021-06-29 中国电信股份有限公司 Method and device for identifying named entity
CN113128225A (en) * 2019-12-31 2021-07-16 阿里巴巴集团控股有限公司 Named entity identification method and device, electronic equipment and computer storage medium
CN113221644A (en) * 2021-04-06 2021-08-06 珠海远光移动互联科技有限公司 Slot position word recognition method and device, storage medium and electronic equipment
CN113408292A (en) * 2020-11-03 2021-09-17 腾讯科技(深圳)有限公司 Semantic recognition method and device, electronic equipment and computer-readable storage medium
CN114023319A (en) * 2021-11-02 2022-02-08 北京声智科技有限公司 Slot identification method and device, electronic equipment and readable storage medium
CN114625850A (en) * 2022-03-09 2022-06-14 上海弘玑信息技术有限公司 Method and device for generating field extraction engine, electronic equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003044502A (en) * 2001-07-30 2003-02-14 Nippon Telegr & Teleph Corp <Ntt> Information generation system for supporting ontology, method, program, recording medium
CN101154379A (en) * 2006-09-27 2008-04-02 夏普株式会社 Method and device for locating keywords in voice and voice recognition system
KR20100001141A (en) * 2008-06-26 2010-01-06 주식회사 예스피치 Grammar-creating method and the system
CN104834849A (en) * 2015-04-14 2015-08-12 时代亿宝(北京)科技有限公司 Dual-factor identity authentication method and system based on voiceprint recognition and face recognition
CN106782560A (en) * 2017-03-06 2017-05-31 海信集团有限公司 Determine the method and device of target identification text
CN107797992A (en) * 2017-11-10 2018-03-13 北京百分点信息科技有限公司 Name entity recognition method and device
CN108304439A (en) * 2017-10-30 2018-07-20 腾讯科技(深圳)有限公司 A kind of semantic model optimization method, device and smart machine, storage medium
CN108694225A (en) * 2017-03-31 2018-10-23 阿里巴巴集团控股有限公司 A kind of image search method, the generation method of feature vector, device and electronic equipment
CN109284361A (en) * 2018-09-29 2019-01-29 深圳追一科技有限公司 A kind of entity abstracting method and system based on deep learning
CN109492215A (en) * 2018-09-18 2019-03-19 平安科技(深圳)有限公司 News property recognition methods, device, computer equipment and storage medium

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021000512A1 (en) * 2019-07-04 2021-01-07 深圳壹账通智能科技有限公司 Method and apparatus for converting natural language into programing language, and computer device
CN110442332A (en) * 2019-07-04 2019-11-12 深圳壹账通智能科技有限公司 Natural language is converted into the method, apparatus and computer equipment of program language
CN112215010B (en) * 2019-07-10 2024-10-22 北京猎户星空科技有限公司 Semantic recognition method and device
CN112215010A (en) * 2019-07-10 2021-01-12 北京猎户星空科技有限公司 Semantic recognition method and equipment
CN110413756A (en) * 2019-07-29 2019-11-05 北京小米智能科技有限公司 The method, device and equipment of natural language processing
US11501078B2 (en) 2019-07-29 2022-11-15 Beijing Xiaomi Intelligent Technology Co., Ltd. Method and device for performing reinforcement learning on natural language processing model and storage medium
CN112395414B (en) * 2019-08-16 2024-06-04 北京地平线机器人技术研发有限公司 Text classification method, training method of classification model, training device of classification model, medium and training equipment
CN112395414A (en) * 2019-08-16 2021-02-23 北京地平线机器人技术研发有限公司 Text classification method and training method, device, medium and equipment of classification model
CN110533324A (en) * 2019-08-29 2019-12-03 深圳市慧择时代科技有限公司 Insure the method and device of customer service distributed automatically
CN110533324B (en) * 2019-08-29 2022-04-22 深圳市慧择时代科技有限公司 Method and device for automatically distributing insurance customer service
CN112528666A (en) * 2019-08-30 2021-03-19 北京猎户星空科技有限公司 Semantic recognition method and device and electronic equipment
WO2021051872A1 (en) * 2019-09-18 2021-03-25 平安科技(深圳)有限公司 Entity identification method, device, apparatus, and computer readable storage medium
CN110705267B (en) * 2019-09-29 2023-03-21 阿波罗智联(北京)科技有限公司 Semantic parsing method, semantic parsing device and storage medium
CN110705267A (en) * 2019-09-29 2020-01-17 百度在线网络技术(北京)有限公司 Semantic parsing method, semantic parsing device and storage medium
CN112580298B (en) * 2019-09-29 2024-05-07 大众问问(北京)信息科技有限公司 Method, device and equipment for acquiring annotation data
CN112580298A (en) * 2019-09-29 2021-03-30 大众问问(北京)信息科技有限公司 Method, device and equipment for acquiring marked data
CN110727782A (en) * 2019-10-22 2020-01-24 苏州思必驰信息科技有限公司 Question and answer corpus generation method and system
CN110825875A (en) * 2019-11-01 2020-02-21 科大讯飞股份有限公司 Text entity type identification method and device, electronic equipment and storage medium
CN110825875B (en) * 2019-11-01 2022-12-06 科大讯飞股份有限公司 Text entity type identification method and device, electronic equipment and storage medium
CN111079418A (en) * 2019-11-06 2020-04-28 科大讯飞股份有限公司 Named entity recognition method and device, electronic equipment and storage medium
CN111079418B (en) * 2019-11-06 2023-12-05 科大讯飞股份有限公司 Named entity recognition method, device, electronic equipment and storage medium
CN111782278A (en) * 2019-12-26 2020-10-16 北京沃东天骏信息技术有限公司 Functional program awakening method and device
CN113051919A (en) * 2019-12-26 2021-06-29 中国电信股份有限公司 Method and device for identifying named entity
CN113051919B (en) * 2019-12-26 2023-04-04 中国电信股份有限公司 Method and device for identifying named entity
CN113128225A (en) * 2019-12-31 2021-07-16 阿里巴巴集团控股有限公司 Named entity identification method and device, electronic equipment and computer storage medium
CN111339770B (en) * 2020-02-18 2023-07-21 百度在线网络技术(北京)有限公司 Method and device for outputting information
CN111339770A (en) * 2020-02-18 2020-06-26 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information
CN112668336A (en) * 2020-05-14 2021-04-16 北京金山数字娱乐科技有限公司 Task model-based word processing method
CN112668336B (en) * 2020-05-14 2024-01-09 北京金山数字娱乐科技有限公司 Word processing method based on task model
CN111738016A (en) * 2020-06-28 2020-10-02 中国平安财产保险股份有限公司 Multi-intention recognition method and related equipment
CN111738016B (en) * 2020-06-28 2023-09-05 中国平安财产保险股份有限公司 Multi-intention recognition method and related equipment
CN112003960B (en) * 2020-08-06 2022-11-15 杭州安恒信息技术股份有限公司 Network interface management method and device of industrial control equipment and electronic device
CN112003960A (en) * 2020-08-06 2020-11-27 杭州安恒信息技术股份有限公司 Network interface management method and device of industrial control equipment and electronic device
CN112364649B (en) * 2020-09-08 2022-07-19 深圳平安医疗健康科技服务有限公司 Named entity identification method and device, computer equipment and storage medium
CN112364649A (en) * 2020-09-08 2021-02-12 平安医疗健康管理股份有限公司 Named entity identification method and device, computer equipment and storage medium
CN112183075A (en) * 2020-10-10 2021-01-05 网易(杭州)网络有限公司 Method and device for extracting slots in conversation, electronic equipment and storage medium
CN112183075B (en) * 2020-10-10 2023-06-30 网易(杭州)网络有限公司 Method and device for extracting slots in conversation, electronic equipment and storage medium
CN112215005A (en) * 2020-10-12 2021-01-12 小红书科技有限公司 Entity identification method and device
CN113408292A (en) * 2020-11-03 2021-09-17 腾讯科技(深圳)有限公司 Semantic recognition method and device, electronic equipment and computer-readable storage medium
CN112528001B (en) * 2020-12-23 2023-07-25 北京百度网讯科技有限公司 Information query method and device and electronic equipment
CN112528001A (en) * 2020-12-23 2021-03-19 北京百度网讯科技有限公司 Information query method and device and electronic equipment
CN112926326A (en) * 2021-02-20 2021-06-08 深圳追一科技有限公司 Named entity recognition method and device, computer equipment and storage medium
CN112926326B (en) * 2021-02-20 2024-01-19 深圳追一科技有限公司 Named entity recognition method, named entity recognition device, named entity recognition computer equipment and named entity recognition storage medium
CN112800769A (en) * 2021-02-20 2021-05-14 深圳追一科技有限公司 Named entity recognition method and device, computer equipment and storage medium
CN112599214A (en) * 2021-03-04 2021-04-02 中译语通科技股份有限公司 Method and system for extracting ICD (International Classification of Diseases) codes from electronic medical records
CN113221644A (en) * 2021-04-06 2021-08-06 珠海远光移动互联科技有限公司 Slot position word recognition method and device, storage medium and electronic equipment
CN114023319A (en) * 2021-11-02 2022-02-08 北京声智科技有限公司 Slot identification method and device, electronic equipment and readable storage medium
CN114023319B (en) * 2021-11-02 2024-09-17 北京声智科技有限公司 Slot position identification method and device, electronic equipment and readable storage medium
CN114625850A (en) * 2022-03-09 2022-06-14 上海弘玑信息技术有限公司 Method and device for generating field extraction engine, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109918680B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN109918680A (en) Entity recognition method, device and computer equipment
CN109359293B (en) Mongolian name entity recognition method neural network based and its identifying system
CN112417880B (en) Automatic case information extraction method for court electronic files
CN101566998B (en) Chinese question-answering system based on neural network
CN111444726A (en) Method and device for extracting Chinese semantic information of long-time and short-time memory network based on bidirectional lattice structure
CN108446286A (en) A kind of generation method, device and the server of the answer of natural language question sentence
CN110110041A (en) Wrong word correcting method, device, computer installation and storage medium
CN107818164A (en) A kind of intelligent answer method and its system
CN109582767A (en) Conversational system processing method, device, equipment and readable storage medium storing program for executing
CN107729309A (en) A kind of method and device of the Chinese semantic analysis based on deep learning
CN110287482B (en) Semi-automatic participle corpus labeling training device
CN111325029A (en) Text similarity calculation method based on deep learning integration model
CN110222184A (en) A kind of emotion information recognition methods of text and relevant apparatus
CN108763529A (en) A kind of intelligent search method, device and computer readable storage medium
CN112364623A (en) Bi-LSTM-CRF-based three-in-one word notation Chinese lexical analysis method
CN108846138A (en) A kind of the problem of fusion answer information disaggregated model construction method, device and medium
CN112527986A (en) Multi-round dialog text generation method, device, equipment and storage medium
CN108304424A (en) Text key word extracting method and text key word extraction element
CN112115242A (en) Intelligent customer service question-answering system based on naive Bayes classification algorithm
CN111209363A (en) Corpus data processing method, apparatus, server and storage medium
Zhuang et al. An ensemble approach to conversation generation
CN116166688A (en) Business data retrieval method, system and processing equipment based on natural language interaction
CN111553157A (en) Entity replacement-based dialog intention identification method
CN110188201A (en) A kind of information matching method and equipment
CN113010676B (en) Text knowledge extraction method, device and natural language inference system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TG01 Patent term adjustment