CN111695357B - Text labeling method and related product - Google Patents

Publication number
CN111695357B
Authority
CN
China
Prior art keywords
text data
piece
evaluation
data set
labeling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010465811.4A
Other languages
Chinese (zh)
Other versions
CN111695357A (en)
Inventor
李文斌
喻宁
冯晶凌
柳阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010465811.4A (patent CN111695357B)
Priority to PCT/CN2020/099493 (patent WO2021114634A1)
Publication of CN111695357A
Application granted
Publication of CN111695357B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to the technical field of emotion recognition in artificial intelligence, and specifically discloses a text labeling method and a related product. The method comprises: acquiring a first text data set from a first third-party platform, wherein each piece of first text data in the first text data set contains an emoji expression; labeling each piece of first text data according to its emoji expression to obtain a first labeling result of each piece of first text data, wherein the first labeling result is a positive evaluation or a negative evaluation; obtaining a first training sample set according to the first labeling result of each piece of first text data; training a first neural network using the first training sample set; acquiring a second text data set from a second third-party platform; and labeling the second text data set using the first neural network to obtain a second labeling result of each piece of second text data in the second text data set, wherein the second labeling result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.

Description

Text labeling method and related product
Technical Field
The application relates to the technical field of emotion recognition in artificial intelligence, in particular to a text labeling method and related products.
Background
With the development of artificial intelligence, neural networks are being applied ever more widely. For example, in the field of video surveillance, a neural network may be used to identify a person in a surveillance video; in the medical field, a neural network may be used to identify a tumor in a nuclear magnetic resonance image; and in the field of text recognition, a neural network may be used to classify the emotion of a text.
Although neural networks perform well at recognition tasks, training them first requires a sufficiently large training data set of sufficiently high quality, and producing such a training data set is very costly. High-quality raw data must first be acquired from a database and then labeled. For example, when training a text emotion classification network, a large number of texts with complete semantics and clear emotion need to be acquired and then manually labeled. Because the number of texts is extremely large, manual labeling consumes a great deal of time and labor, and labeling efficiency is low.
Disclosure of Invention
The embodiments of the application provide a text labeling method and a related product, which broaden the application scenarios of text labeling and improve its efficiency.
In a first aspect, an embodiment of the present application provides a text labeling method, applied to an electronic device, including:
the electronic device acquires a first text data set from a first third-party platform, wherein each piece of first text data in the first text data set contains an emoji expression;
the electronic device labels each piece of first text data according to the emoji expression of each piece of first text data in the first text data set to obtain a first labeling result of each piece of first text data, wherein the first labeling result is a positive evaluation or a negative evaluation;
the electronic device obtains a first training sample set according to the first labeling result of each piece of first text data;
the electronic device trains a first neural network using the first training sample set;
the electronic device acquires a second text data set from a second third-party platform;
the electronic device labels the second text data set using the first neural network to obtain a second labeling result of each piece of second text data in the second text data set, wherein the second labeling result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
In a second aspect, an embodiment of the present application provides an electronic device, including:
an acquisition unit, configured to acquire a first text data set from a first third-party platform, wherein each piece of first text data in the first text data set contains an emoji expression;
a labeling unit, configured to label each piece of first text data according to the emoji expression of each piece of first text data in the first text data set to obtain a first labeling result of each piece of first text data, wherein the first labeling result is a positive evaluation or a negative evaluation;
a training unit, configured to obtain a first training sample set according to the first labeling result of each piece of first text data, and to train a first neural network using the first training sample set;
the acquisition unit is further configured to acquire a second text data set from a second third-party platform;
the labeling unit is further configured to label the second text data set using the first neural network to obtain a second labeling result of each piece of second text data in the second text data set, wherein the second labeling result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program that causes a computer to perform the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method according to the first aspect.
The embodiment of the application has the following beneficial effects:
It can be seen that in the embodiments of the application, text data is labeled by means of the emoji expressions it contains, and no semantic analysis of the text data is needed. Labeling is therefore not limited by the language of the text data, which broadens the application scenarios of text labeling. In addition, text data can be labeled automatically through emoji expressions without manual labeling, which saves manpower and material resources.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a labeling method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of another labeling method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of another labeling method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 5 is a functional unit composition block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims and drawings are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art will understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The electronic device in the application may include a smartphone (such as an Android phone, an iOS phone, or a Windows Phone device), a tablet computer, a palmtop computer, a notebook computer, a mobile internet device (MID), a wearable device, and the like. These devices are merely examples and the list is not exhaustive. In practical applications, the electronic device may further include intelligent vehicle terminals, computer devices, and so on.
Referring to fig. 1, fig. 1 is a flowchart of a text labeling method according to an embodiment of the present application, where the method is applied to an electronic device, and the method includes the following steps:
101: The electronic device obtains a first text data set from a first third-party platform.
The first third-party platform may be a social application such as a microblog, Twitter, or Facebook, or an e-commerce platform such as Amazon or Taobao. That is, the first third-party platform is a third-party platform that carries a large amount of positively evaluated text data and a large amount of negatively evaluated text data. The electronic device randomly obtains a plurality of pieces of first text data from the first third-party platform through an application programming interface (API) provided by that platform, and thereby obtains the first text data set. In other words, the electronic device complies with the robots protocol of the first third-party platform and obtains the first text data set through the platform's API.
In some possible implementations, because the first text data is obtained through the API of the first third-party platform without manual review, some of it may not meet the requirements, for example because it contains no emoji expression or because its text content is too short. Therefore, after the plurality of pieces of first text data are obtained, they are cleaned to remove the first text data that contains no emoji expression or whose text content is too short, and the cleaned first text data forms the first text data set.
Thus, each piece of first text data in the first text data set contains an emoji expression.
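The cleaning step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the emoji detector is a rough regular expression over common Unicode emoji blocks, and `MIN_TEXT_LENGTH` and the function name `clean_first_text_data` are assumptions introduced here, since the source does not say how short "too short" is.

```python
import re

# Assumption: a rough emoji detector covering common Unicode emoji blocks.
# It is not exhaustive; a real system would likely rely on a maintained emoji list.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

MIN_TEXT_LENGTH = 5  # hypothetical minimum number of non-emoji characters


def clean_first_text_data(raw_texts, min_length=MIN_TEXT_LENGTH):
    """Keep only texts that contain an emoji and enough non-emoji content."""
    cleaned = []
    for text in raw_texts:
        if not EMOJI_PATTERN.search(text):
            continue  # no emoji: cannot be labeled automatically
        content = EMOJI_PATTERN.sub("", text).strip()
        if len(content) < min_length:
            continue  # text content too short
        cleaned.append(text)
    return cleaned


samples = [
    "Great phone, love it \U0001F600",  # kept: emoji plus enough text
    "ok \U0001F600",                    # dropped: text too short
    "Battery died after a week",        # dropped: no emoji
]
first_text_data_set = clean_first_text_data(samples)
```

Only the first sample survives, so every remaining piece of text carries an emoji that the labeling step can use.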
102: The electronic device labels each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, and obtains a first labeling result of each piece of first text data.
The first labeling result is a positive evaluation or a negative evaluation.
Illustratively, because the first text data has been cleaned, each piece of first text data in the first text data set contains an emoji expression, and emoji expressions themselves carry an emotion evaluation: some emoji expressions represent a positive evaluation, and others represent a negative evaluation. The first emotion evaluation of each piece of first text data can therefore be determined from its emoji expression, and each piece of first text data is then labeled according to its first emotion evaluation, that is, an emotion label is added to it. In other words, if the emoji expression of a piece of first text data belongs to the set of positive-evaluation emoji expressions, the piece is labeled as a positive evaluation; if it belongs to the set of negative-evaluation emoji expressions, the piece is labeled as a negative evaluation.
For the first text data, the first labeling result is either a positive evaluation or a negative evaluation. Emotions corresponding to a positive evaluation include happiness, approval, appreciation, and the like; emotions corresponding to a negative evaluation include anger, pessimism, disapproval, and the like.
It should be noted that for some emoji expressions the corresponding emotion evaluation cannot be determined with certainty. For example, the same emoji expression may be used to indicate happiness, i.e. a positive emotion, or cynicism, i.e. a negative emotion. First text data containing such ambiguous emoji expressions is therefore not labeled; only first text data containing an emoji expression that corresponds to a positive evaluation or to a negative evaluation is labeled.
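The emoji-based labeling of step 102 can be sketched as a lookup against two emoji sets. The sets below are hypothetical, since the source does not list which emoji it treats as positive or negative. Leaving texts with mixed or unlisted emoji unlabeled is likewise an assumption of this sketch.

```python
# Hypothetical emoji sets; the source does not enumerate the emoji it uses.
POSITIVE_EMOJI = {"\U0001F600", "\U0001F60D", "\U0001F44D"}  # grinning face, heart eyes, thumbs up
NEGATIVE_EMOJI = {"\U0001F620", "\U0001F62D", "\U0001F44E"}  # angry face, loudly crying face, thumbs down


def label_by_emoji(text):
    """Return "positive", "negative", or None when the emoji signal is absent or mixed."""
    chars = set(text)
    has_pos = bool(chars & POSITIVE_EMOJI)
    has_neg = bool(chars & NEGATIVE_EMOJI)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None  # ambiguous or unlisted emoji: leave the text unlabeled


def build_first_training_samples(texts):
    """Pair each labelable text with its emoji-derived label."""
    samples = []
    for text in texts:
        label = label_by_emoji(text)
        if label is not None:
            samples.append((text, label))
    return samples
```

Because the lookup never inspects the words themselves, the same function works regardless of the language the text is written in.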
Further, to improve the accuracy of labeling by emoji expression, the text content of each piece of first text data can be extracted and semantically analyzed to obtain semantic information of each piece of first text data; a second emotion evaluation of each piece of first text data is determined according to its semantic information; and first text data whose first and second emotion evaluations are consistent is retained in the first text data set, while first text data whose first and second emotion evaluations are inconsistent is deleted. Labeling twice, through semantic analysis and through emoji expressions, reduces the labeling errors caused by relying on emoji expressions alone and improves the accuracy of labeling the first text data set.
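The consistency check can be sketched as a filter that keeps a sample only when two independent ratings agree. The keyword lexicon below is merely a stand-in for semantic analysis, which the source does not specify. The word lists and function names are assumptions.

```python
# Stand-in for semantic analysis: a tiny keyword lexicon (an assumption of this sketch).
POSITIVE_WORDS = {"great", "love", "good", "excellent"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "broken"}


def semantic_rating(text_content):
    """Second emotion evaluation, derived from the words alone."""
    words = set(text_content.lower().split())
    pos = len(words & POSITIVE_WORDS)
    neg = len(words & NEGATIVE_WORDS)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None  # no clear semantic signal


def keep_consistent(samples):
    """samples: list of (text_content, first_rating), where first_rating came from the emoji."""
    return [(text, rating) for text, rating in samples if semantic_rating(text) == rating]


emoji_labeled = [
    ("I love this great phone", "positive"),  # ratings agree: kept
    ("terrible and broken", "positive"),      # ratings disagree: deleted
    ("bad service", "negative"),              # ratings agree: kept
]
consistent = keep_consistent(emoji_labeled)
```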
103: The electronic device obtains a first training sample set according to the first labeling result of each piece of first text data.
The labeled first text data is used as labeled training samples to form the first training sample set.
104: The electronic device trains the first neural network using the first training sample set.
Specifically, the initial parameters of the first neural network are first constructed, and the training samples in the first training sample set are input into the first neural network to obtain a prediction result for each training sample. A loss function is then constructed from the prediction result and the labeling result of the training sample, and its gradient is determined. Finally, the parameter values are updated from the initial parameters by backpropagation with gradient descent. These steps are repeated until the first neural network converges, at which point training of the first neural network is complete.
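The training loop can be illustrated with the simplest possible network: a single sigmoid unit over bag-of-words features, trained by stochastic gradient descent on the cross-entropy loss. The source does not specify the architecture of the first neural network, so this model, the vocabulary, and the hyperparameters `lr` and `epochs` are all assumptions chosen to keep the sketch short.

```python
import math


def featurize(text, vocab):
    """Bag-of-words vector: 1.0 if the vocabulary word occurs in the text."""
    words = set(text.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, vocab, lr=0.5, epochs=200):
    """samples: list of (text, label) with label 1 = positive, 0 = negative."""
    weights = [0.0] * len(vocab)  # initial parameters
    bias = 0.0
    for _ in range(epochs):
        for text, label in samples:
            x = featurize(text, vocab)
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            # Gradient of the cross-entropy loss with respect to the pre-activation.
            grad = pred - label
            # Gradient descent update of every parameter.
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


def predict_positive(text, vocab, weights, bias):
    """Probability that the text is a positive evaluation."""
    x = featurize(text, vocab)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


vocab = ["love", "great", "hate", "broken"]
training_samples = [("love it", 1), ("great phone", 1), ("hate it", 0), ("broken screen", 0)]
weights, bias = train(training_samples, vocab)
```

A fixed number of epochs stands in for the convergence check here; a fuller implementation would monitor the loss and stop when it no longer decreases.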
105: The electronic device obtains a second text data set from a second third-party platform.
The second third-party platform may be a news platform that publishes science and technology news, wiki entries, or summary texts. That is, the second third-party platform is a third-party platform that contains a large amount of neutrally evaluated text data.
Likewise, the electronic device complies with the robots protocol of the second third-party platform and obtains a plurality of pieces of second text data from it through the platform's API, thereby obtaining the second text data set.
After the plurality of pieces of second text data are acquired, they may also be cleaned to remove invalid second text data whose text content is too short.
106: The electronic device labels the second text data set using the first neural network to obtain a second labeling result of each piece of second text data in the second text data set.
The second labeling result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
Specifically, the electronic device uses the first neural network to classify each piece of second text data in the second text data set, obtaining a first probability that the piece of second text data is a positive evaluation and a second probability that it is a negative evaluation. Second text data whose first probability is greater than a first threshold (i.e., there is sufficient confidence that its emotion evaluation is positive) is labeled as a positive evaluation; second text data whose second probability is greater than the first threshold (i.e., there is sufficient confidence that its emotion evaluation is negative) is labeled as a negative evaluation; and second text data whose first probability is less than the first threshold and greater than a second threshold (i.e., there is not sufficient confidence that its emotion evaluation is either positive or negative) is labeled as a neutral evaluation.
Wherein the first threshold may be 0.7, 0.75, 0.8, or other values. The second threshold may be 0.4, 0.45, 0.5, or other value.
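The threshold rule can be sketched as a small decision function. It reads the neutral case as "first probability between the second and first thresholds", which is an interpretation of the text above. Under that reading some probability pairs match none of the three rules, and returning `None` for them is an assumption of this sketch, as is the function name.

```python
FIRST_THRESHOLD = 0.7   # one of the example values given in the text
SECOND_THRESHOLD = 0.4  # one of the example values given in the text


def label_second_text(p_positive, p_negative,
                      first_threshold=FIRST_THRESHOLD,
                      second_threshold=SECOND_THRESHOLD):
    """Map the two class probabilities to a second labeling result."""
    if p_positive > first_threshold:
        return "positive"
    if p_negative > first_threshold:
        return "negative"
    if second_threshold < p_positive < first_threshold:
        return "neutral"
    return None  # not covered by the rules as stated; left unlabeled here
```

With the defaults, probabilities of (0.9, 0.1) give a positive label, (0.2, 0.8) a negative label, and (0.5, 0.5) a neutral label, while (0.35, 0.65) falls into the uncovered gap.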
It can be seen that in the embodiments of the application, text data is labeled by means of the emoji expressions it contains, and no semantic analysis of the text data is needed. Labeling is therefore not limited by the language of the text data, which broadens the application scenarios of the labeling method. In addition, text data can be labeled automatically through emoji expressions without manual labeling, which saves manpower and material resources.
In some possible embodiments, the method further comprises:
The electronic device obtains a second training sample set according to the second labeling result of each piece of second text data in the second text data set, that is, the second text data set together with its labeling results forms a labeled second training sample set. A second neural network is then trained using the second training sample set. Any comment data to be published is obtained and classified using the second neural network to obtain a classification result of the comment data to be published, and whether to publish the comment data is determined according to the classification result.
Where the comment data to be published is a comment to be posted on a news website, the comment data is published when the classification result is a positive evaluation or a neutral evaluation, and is not published when the classification result is a negative evaluation. Compared with the existing practice of manually reviewing comment data before publication, the comment data to be published can be reviewed automatically by the second neural network, which saves human resources.
Where the comment data to be published is a comment on an e-commerce platform, when the classification result is a positive evaluation or a negative evaluation, the comment data is checked against the user's purchase records to determine its authenticity, and the comment data is not published if it is determined to be a malicious comment. In the application, the comment data to be published can be reviewed automatically by the second neural network to determine its authenticity, which further saves human resources.
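The two publication rules can be sketched as follows. The source does not say how authenticity is established from purchase records or what happens to neutral e-commerce comments. This sketch assumes that a missing purchase record marks a comment as malicious and that neutral comments are published. The record structure and function names are hypothetical.

```python
def should_publish_news_comment(classification):
    """News-site rule: publish positive or neutral comments, withhold negative ones."""
    return classification in ("positive", "neutral")


def should_publish_ecommerce_comment(classification, user_id, product_id, purchase_records):
    """E-commerce rule: positive or negative comments are checked against purchase records.

    purchase_records: set of (user_id, product_id) pairs (hypothetical structure).
    Assumption: a comment without a matching purchase is treated as malicious.
    """
    if classification in ("positive", "negative"):
        return (user_id, product_id) in purchase_records
    return True  # assumption: neutral comments need no purchase check


records = {("u1", "p9")}
genuine = should_publish_ecommerce_comment("negative", "u1", "p9", records)
suspicious = should_publish_ecommerce_comment("positive", "u2", "p9", records)
```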
In some possible embodiments, most of the second text data obtained from the second third-party platform is neutrally evaluated text data, while most of the first text data obtained from the first third-party platform is positively or negatively evaluated text data. Therefore, to increase the number of positively and negatively evaluated training samples in the second training sample set, the second training sample set may be merged with the first training sample set to obtain a new second training sample set with sufficient training samples of each class. The second neural network is trained using the new second training sample set, so that the trained second neural network is more accurate.
In some possible embodiments, after determining the first emotion rating for each piece of first text data from the emoji expression for each piece of first text data in the first set of text data, the method further comprises:
The text content of each piece of first text data is extracted and converted into a second emoji expression; a second emotion evaluation corresponding to each piece of first text data is determined according to the second emoji expression; and it is determined whether the first emotion evaluation and the second emotion evaluation of each piece of first text data are consistent. If they are consistent, the piece of first text data is labeled according to its first emotion evaluation. Verifying the emotion evaluation of each piece of first text data by converting its text into an emoji expression improves the accuracy of the subsequent labeling of the first text data.
In some possible embodiments, the method further comprises:
Comment data of any user is obtained, wherein the comment data is comment data of the user on a target product, and the target product comprises financial products; classifying the comment data of the user by using the second neural network to obtain a classification result of the comment data of the user; screening target users according to the classification result of the comment data of the users, namely, taking the users with the classification result of positive evaluation as target users; recommending the target product to the target user.
It can be seen that in this embodiment, the second neural network is used to screen out the users interested in the target product (financial product), so as to ensure the accuracy of user screening and improve the success rate of recommendation.
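The user-screening step can be sketched as a filter over classified comments. The `classify` argument stands for the trained second neural network. The keyword-based stand-in used below is purely illustrative, and all names are assumptions.

```python
def select_target_users(user_comments, classify):
    """user_comments: dict mapping user_id to that user's comment on the target product.
    classify: callable returning "positive", "negative", or "neutral"."""
    return [uid for uid, comment in user_comments.items() if classify(comment) == "positive"]


def toy_classifier(comment):
    """Illustrative stand-in for the second neural network."""
    text = comment.lower()
    if "good" in text or "like" in text:
        return "positive"
    if "bad" in text:
        return "negative"
    return "neutral"


comments = {
    "u1": "I like this fund",
    "u2": "bad returns",
    "u3": "it exists",
}
target_users = select_target_users(comments, toy_classifier)
```

Only users whose comments are classified as positive are selected, and the target product would then be recommended to them.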
Referring to FIG. 2, FIG. 2 is a schematic flow chart of another text labeling method according to an embodiment of the present application. Content of this embodiment that is the same as in the embodiment shown in FIG. 1 is not repeated here. The method is applied to an electronic device and includes the following steps:
201: The electronic device obtains a first text data set from a first third-party platform.
202: The electronic device cleans the first text data set by deleting the first text data that does not contain an emoji expression, and takes the resulting new first text data set as the first text data set.
203: The electronic device determines a first emotion evaluation of each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, wherein the first emotion evaluation is a positive evaluation or a negative evaluation.
204: The electronic device extracts the text content of each piece of first text data and performs semantic analysis on it to obtain semantic information of each piece of first text data.
205: The electronic device determines a second emotion evaluation of each piece of first text data according to the semantic information of each piece of first text data.
206: The electronic device retains the first text data in the first text data set whose first and second emotion evaluations are consistent, and deletes the first text data whose first and second emotion evaluations are inconsistent.
207: The electronic device labels the remaining first text data according to the first emotion evaluation of the remaining first text data to obtain a first training sample set.
The remaining first text data is the first text data left in the first text data set after the first text data with inconsistent first and second emotion evaluations has been deleted.
208: The electronic device trains the first neural network using the first training sample set.
209: The electronic device obtains a second text data set from a second third-party platform.
210: The electronic device labels the second text data set using the first neural network to obtain a second labeling result of each piece of second text data in the second text data set, wherein the second labeling result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
It can be seen that in the embodiments of the application, text data is labeled by means of the emoji expressions it contains, so labeling is not limited by the language of the text data, which broadens the application scenarios of the labeling method. In addition, text data can be labeled automatically through emoji expressions, and a training sample set containing emotion classification labels can be obtained without manual labeling, which saves manpower and material resources. Moreover, the first text data set is cleaned before it is labeled so that only high-quality first text data is retained, which improves labeling accuracy.
Referring to fig. 3, fig. 3 is a schematic flow chart of another text labeling method according to an embodiment of the present application, and the content of the embodiment is the same as that of the embodiment shown in fig. 1 and fig. 2, and will not be repeated here. The method is applied to the electronic equipment, and comprises the following steps:
301: The electronic device obtains a first text data set from a first platform.
302: The electronic device cleans each piece of first text data in the first text data set, deletes the first text data that does not contain an emoji to obtain a new first text data set, and takes the new first text data set as the first text data set.
303: The electronic device determines a first emotion evaluation of each piece of first text data according to the emoji in each piece of first text data in the first text data set, wherein the first emotion evaluation is a positive evaluation or a negative evaluation.
304: The electronic device extracts the text content of each piece of first text data and performs semantic analysis on the text content to obtain semantic information of each piece of first text data.
305: The electronic device determines a second emotion evaluation of each piece of first text data according to the semantic information of each piece of first text data.
306: The electronic device retains the first text data in the first text data set whose first emotion evaluation and second emotion evaluation are consistent, and deletes the first text data whose first emotion evaluation and second emotion evaluation are inconsistent.
307: The electronic device labels the remaining first text data according to the first emotion evaluation of the remaining first text data to obtain a first training sample set.
The remaining first text data is the first text data left in the first text data set after the first text data with inconsistent first and second emotion evaluations has been deleted.
308: The electronic device trains a first neural network using the first training sample set.
309: The electronic device obtains a second text data set from a second platform.
310: The electronic device labels the second text data set using the first neural network to obtain a second labeling result for each piece of second text data in the second text data set, wherein the second labeling result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
311: The electronic device obtains a second training sample set according to the second labeling result of each piece of second text data, and trains a second neural network using the second training sample set.
312: The electronic device acquires any piece of comment data to be published, classifies the comment data using the second neural network to obtain a classification result of the comment data, and determines whether to publish the comment data according to the classification result.
It can be seen that, in this embodiment of the application, text data is labeled according to the emoji it contains, so the labels are derived from the emoji rather than from semantic analysis of the text data; labeling is therefore not limited by the language of the text data, which broadens the application scenarios of the labeling method. In addition, because text data can be labeled automatically from its emoji, a training sample set containing emotion classification labels can be obtained without manual labeling, which saves manpower and material resources. The first text data set is also cleaned before it is labeled so that only high-quality first text data is retained, which improves labeling accuracy. Finally, the trained second neural network classifies comment data to be published, so that comment data that does not meet the requirements is blocked automatically without manual review, which saves human resources.
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in Fig. 4, the electronic device 400 includes a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the programs including instructions for performing the following steps:
obtaining a first text data set from a first third-party platform, wherein each piece of first text data in the first text data set includes an emoji;
labeling each piece of first text data according to the emoji in each piece of first text data in the first text data set to obtain a first labeling result for each piece of first text data, wherein the first labeling result is a positive evaluation or a negative evaluation;
obtaining a first training sample set according to the first labeling result of each piece of first text data;
training a first neural network using the first training sample set;
obtaining a second text data set from a second third-party platform;
labeling the second text data set using the first neural network to obtain a second labeling result for each piece of second text data in the second text data set, wherein the second labeling result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
In some possible implementations, in terms of labeling each piece of first text data according to the emoji in each piece of first text data in the first text data set, the program is specifically configured to execute instructions for:
determining a first emotion evaluation of each piece of first text data according to the emoji in each piece of first text data in the first text data set, wherein the first emotion evaluation is a positive evaluation or a negative evaluation;
labeling each piece of first text data according to the first emotion evaluation of each piece of first text data.
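As an illustration of the two instructions above, the following is a minimal Python sketch of deriving a first emotion evaluation from emoji and using it as the label. The emoji sets, the majority-vote rule, the handling of ties, and the function names are all assumptions made for the example; the source does not specify which emoji map to which evaluation or how mixed emoji are resolved.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical emoji polarity sets; the source does not list which emoji
# count as positive or negative.
POSITIVE_EMOJI = {"😀", "😍", "👍"}
NEGATIVE_EMOJI = {"😡", "😢", "👎"}


def first_emotion_evaluation(text: str) -> Optional[str]:
    """Return "positive" or "negative" based on the emoji in text, else None."""
    counts = Counter()
    for ch in text:
        if ch in POSITIVE_EMOJI:
            counts["positive"] += 1
        elif ch in NEGATIVE_EMOJI:
            counts["negative"] += 1
    if counts["positive"] == counts["negative"]:
        # No known emoji, or a tie: leave the text unlabeled (assumption).
        return None
    return "positive" if counts["positive"] > counts["negative"] else "negative"


def label_first_text_data(texts: List[str]) -> List[Tuple[str, str]]:
    """Pair each text with its emoji-derived label, skipping unlabeled texts."""
    labeled = []
    for text in texts:
        evaluation = first_emotion_evaluation(text)
        if evaluation is not None:
            labeled.append((text, evaluation))
    return labeled
```

The list returned by `label_first_text_data` plays the role of the first labeling results from which a first training sample set could be built.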
In some possible embodiments, after determining the first emotion evaluation of each piece of first text data according to the emoji in each piece of first text data in the first text data set, the program is further configured to execute instructions for:
extracting the text content of each piece of first text data;
performing semantic analysis on the text content of each piece of first text data to obtain semantic information of each piece of first text data;
determining a second emotion evaluation of each piece of first text data according to the semantic information of each piece of first text data;
retaining the first text data in the first text data set whose first emotion evaluation and second emotion evaluation are consistent, and deleting the first text data whose first emotion evaluation and second emotion evaluation are inconsistent.
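The consistency check described above can be sketched as a small filter. This is an illustrative outline only: the three callables (`emoji_evaluation`, `extract_text_content`, `semantic_evaluation`) are hypothetical stand-ins supplied by the caller, since the source does not say how the semantic analysis is implemented.

```python
from typing import Callable, Iterable, List, Tuple


def keep_consistent(
    texts: Iterable[str],
    emoji_evaluation: Callable[[str], str],
    extract_text_content: Callable[[str], str],
    semantic_evaluation: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Keep texts whose emoji-based and semantic evaluations agree."""
    kept = []
    for text in texts:
        first = emoji_evaluation(text)  # first emotion evaluation
        content = extract_text_content(text)  # text content without emoji
        second = semantic_evaluation(content)  # second emotion evaluation
        if first == second:
            # Retained texts carry the first emotion evaluation as their label.
            kept.append((text, first))
    return kept
```

Passing the evaluators in as parameters keeps the filter independent of any particular emoji lexicon or semantic model.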
In some possible embodiments, before labeling the first text data set, the program is further configured to execute instructions for:
cleaning each piece of first text data in the first text data set and deleting the first text data that does not contain an emoji to obtain a new first text data set;
taking the new first text data set as the first text data set.
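The cleaning step above amounts to dropping every text in which no emoji can be found. A rough Python sketch follows; the regular expression covers only a few common Unicode blocks and is an assumption for illustration, not a complete definition of emoji and not something given in the source.

```python
import re
from typing import List

# Approximate emoji matcher over a few common Unicode blocks (assumption).
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "\u2600-\u27BF"  # miscellaneous symbols and dingbats
    "]"
)


def clean_first_text_data_set(texts: List[str]) -> List[str]:
    """Return a new list containing only the texts with at least one emoji."""
    return [text for text in texts if EMOJI_PATTERN.search(text)]
```

The returned list corresponds to the new first text data set that replaces the original one.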
In some possible embodiments, in terms of labeling the second text data set using the first neural network to obtain a second labeling result for each piece of second text data in the second text data set, the program is specifically configured to execute instructions for:
classifying each piece of second text data in the second text data set using the first neural network to obtain, for each piece of second text data, a first probability of a positive evaluation and a second probability of a negative evaluation;
determining that the second labeling result of second text data whose first probability is greater than a first threshold is a positive evaluation;
determining that the second labeling result of second text data whose second probability is greater than the first threshold is a negative evaluation;
determining that the second labeling result of second text data whose first probability is smaller than the first threshold and greater than a second threshold is a neutral evaluation.
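The threshold rule above can be written as a short function. The sketch below follows the three conditions in the order they are listed; the threshold values 0.8 and 0.2 are hypothetical, and returning `None` for inputs that match none of the three conditions is an assumption, since the source does not address that case.

```python
from typing import Optional


def second_labeling_result(
    p_positive: float,
    p_negative: float,
    first_threshold: float = 0.8,  # hypothetical value
    second_threshold: float = 0.2,  # hypothetical value
) -> Optional[str]:
    """Map the first network's probabilities to a three-way label."""
    if p_positive > first_threshold:
        return "positive"
    if p_negative > first_threshold:
        return "negative"
    if second_threshold < p_positive < first_threshold:
        return "neutral"
    # None of the stated conditions applies; leave unlabeled (assumption).
    return None
```

With these example thresholds, a text scored 0.5 positive and 0.5 negative is labeled neutral, which is how a binary first network can yield a three-class second labeling result.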
In some possible embodiments, the program is further configured to execute instructions for:
obtaining a second training sample set according to the second labeling result of each piece of second text data in the second text data set;
training a second neural network using the second training sample set;
acquiring any piece of comment data to be published;
performing emotion classification on the comment data to be published using the second neural network to obtain a classification result of the comment data to be published;
determining whether to publish the comment data to be published according to the classification result.
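The final decision step can be sketched as follows. The `classify` callable stands in for the trained second neural network, and the choice to withhold only comments classified as negative is an assumption for the example; the source states only that publication is decided according to the classification result.

```python
from typing import Callable, FrozenSet

# Assumption: only negative comments are withheld. The source does not say
# which classification results prevent publication.
BLOCKED_RESULTS: FrozenSet[str] = frozenset({"negative"})


def should_publish(
    comment: str,
    classify: Callable[[str], str],
    blocked: FrozenSet[str] = BLOCKED_RESULTS,
) -> bool:
    """Return True if the classified comment may be published."""
    return classify(comment) not in blocked
```

Keeping the blocked set as a parameter lets a platform apply a different policy without changing the classifier.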
In some possible embodiments, after obtaining the second training sample set according to the second labeling result of each piece of second text data in the second text data set, the program is further configured to execute instructions for:
combining the second training sample set with the first training sample set to obtain a new second training sample set;
in terms of training the second neural network using the second training sample set, the program is specifically configured to execute instructions for:
training the second neural network using the new second training sample set.
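Combining the two sample sets can be as simple as concatenation. The sketch below assumes each sample is a `(text, label)` tuple and additionally drops exact duplicates while preserving order; both the sample representation and the deduplication are assumptions, not details from the source.

```python
from typing import List, Tuple

Sample = Tuple[str, str]  # (text, label); representation is an assumption


def merge_training_sets(
    second_samples: List[Sample], first_samples: List[Sample]
) -> List[Sample]:
    """Combine both sample sets into a new second training sample set."""
    merged = list(second_samples) + list(first_samples)
    # dict.fromkeys keeps the first occurrence of each sample, in order.
    return list(dict.fromkeys(merged))
```

Because the first set carries only positive and negative labels while the second set also carries neutral labels, the merged set still covers all three classes.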
Referring to Fig. 5, Fig. 5 is a functional unit block diagram of an electronic device according to an embodiment of the present application. The electronic device 500 includes an acquisition unit 510, a labeling unit 520, and a training unit 530, wherein:
the acquisition unit 510 is configured to obtain a first text data set from a first third-party platform, wherein each piece of first text data in the first text data set includes an emoji;
the labeling unit 520 is configured to label each piece of first text data according to the emoji in each piece of first text data in the first text data set to obtain a first labeling result for each piece of first text data, wherein the first labeling result is a positive evaluation or a negative evaluation;
the training unit 530 is configured to obtain a first training sample set according to the first labeling result of each piece of first text data, and to train a first neural network using the first training sample set;
the acquisition unit 510 is further configured to obtain a second text data set from a second third-party platform;
the labeling unit 520 is further configured to label the second text data set using the first neural network to obtain a second labeling result for each piece of second text data in the second text data set, wherein the second labeling result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
In some possible embodiments, in terms of labeling each piece of first text data according to the emoji in each piece of first text data in the first text data set, the labeling unit 520 is specifically configured to:
determine a first emotion evaluation of each piece of first text data according to the emoji in each piece of first text data in the first text data set, wherein the first emotion evaluation is a positive evaluation or a negative evaluation;
label each piece of first text data according to the first emotion evaluation of each piece of first text data.
In some possible embodiments, the electronic device 500 further includes a cleaning unit 540. After the first emotion evaluation of each piece of first text data is determined according to the emoji in each piece of first text data in the first text data set, the cleaning unit 540 is configured to:
extract the text content of each piece of first text data;
perform semantic analysis on the text content of each piece of first text data to obtain semantic information of each piece of first text data;
determine a second emotion evaluation of each piece of first text data according to the semantic information of each piece of first text data;
retain the first text data in the first text data set whose first emotion evaluation and second emotion evaluation are consistent, and delete the first text data whose first emotion evaluation and second emotion evaluation are inconsistent.
In some possible implementations, the electronic device 500 further includes a cleaning unit 540. Before the first text data set is labeled, the cleaning unit 540 is configured to:
clean each piece of first text data in the first text data set and delete the first text data that does not contain an emoji to obtain a new first text data set;
take the new first text data set as the first text data set.
In some possible implementations, in terms of labeling the second text data set using the first neural network to obtain a second labeling result for each piece of second text data in the second text data set, the labeling unit 520 is specifically configured to:
classify each piece of second text data in the second text data set using the first neural network to obtain, for each piece of second text data, a first probability of a positive evaluation and a second probability of a negative evaluation;
determine that the second labeling result of second text data whose first probability is greater than a first threshold is a positive evaluation;
determine that the second labeling result of second text data whose second probability is greater than the first threshold is a negative evaluation;
determine that the second labeling result of second text data whose first probability is smaller than the first threshold and greater than a second threshold is a neutral evaluation.
In some possible embodiments, the electronic device 500 further includes a determining unit 550;
the training unit 530 is further configured to obtain a second training sample set according to the second labeling result of each piece of second text data in the second text data set;
the training unit 530 is further configured to train a second neural network using the second training sample set;
the determining unit 550 is configured to acquire any piece of comment data to be published, perform emotion classification on the comment data to be published using the second neural network to obtain a classification result of the comment data to be published, and determine whether to publish the comment data to be published according to the classification result.
In some possible embodiments, after obtaining the second training sample set according to the second labeling result of each piece of second text data in the second text data set, the training unit 530 is further configured to:
combine the second training sample set with the first training sample set to obtain a new second training sample set;
in terms of training the second neural network using the second training sample set, the training unit 530 is specifically configured to:
train the second neural network using the new second training sample set.
Embodiments of the present application also provide a computer storage medium storing a computer program that is executed by a processor to implement some or all of the steps of any one of the text labeling methods described in the method embodiments above.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any one of the text labeling methods described in the method embodiments above.
It should be noted that, for simplicity of description, the foregoing method embodiments are each described as a series of actions; however, those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or concurrently. Further, those skilled in the art should also understand that the embodiments described in the specification are optional embodiments, and that the actions and modules involved are not necessarily required by the present application.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or units, and may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units described above may be implemented either in hardware or in software program modules.
If the integrated units are implemented in the form of software program modules and sold or used as a stand-alone product, they may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence or in part, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned memory includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.
Those of ordinary skill in the art will appreciate that all or some of the steps in the methods of the above embodiments may be implemented by a program that instructs the associated hardware, and the program may be stored in a computer-readable memory, which may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiments of the application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the application, and the description of the above embodiments is intended only to help in understanding the method of the application and its core ideas. Meanwhile, those of ordinary skill in the art may, in accordance with the ideas of the application, make changes to the specific implementations and the scope of application. In view of the above, the content of this description should not be construed as limiting the application.

Claims (6)

1. A method for labeling text, applied to an electronic device, comprising:
the electronic device acquires a first text data set from a first third-party platform, wherein each piece of first text data in the first text data set includes an emoji;
the electronic device cleans each piece of first text data in the first text data set and deletes the first text data that does not contain an emoji to obtain a new first text data set, and takes the new first text data set as the first text data set;
the electronic device labels each piece of first text data according to the emoji in each piece of first text data in the first text data set to obtain a first labeling result for each piece of first text data, wherein the first labeling result is a positive evaluation or a negative evaluation, which comprises:
determining a first emotion evaluation of each piece of first text data according to the emoji in each piece of first text data in the first text data set, wherein the first emotion evaluation is a positive evaluation or a negative evaluation; extracting the text content of each piece of first text data; performing semantic analysis on the text content of each piece of first text data to obtain semantic information of each piece of first text data; determining a second emotion evaluation of each piece of first text data according to the semantic information of each piece of first text data; retaining the first text data in the first text data set whose first emotion evaluation and second emotion evaluation are consistent, and deleting the first text data whose first emotion evaluation and second emotion evaluation are inconsistent; and labeling each piece of first text data according to the first emotion evaluation of each piece of first text data;
the electronic device obtains a first training sample set according to the first labeling result of each piece of first text data;
the electronic device trains a first neural network using the first training sample set;
the electronic device obtains a second text data set from a second third-party platform;
the electronic device labels the second text data set using the first neural network to obtain a second labeling result for each piece of second text data in the second text data set, wherein the second labeling result is one of a positive evaluation, a negative evaluation, or a neutral evaluation;
the electronic device obtains a second training sample set according to the second labeling result of each piece of second text data in the second text data set; the electronic device trains a second neural network using the second training sample set; the electronic device acquires any piece of comment data to be published; the electronic device performs emotion classification on the comment data to be published using the second neural network to obtain a classification result of the comment data to be published; and the electronic device determines whether to publish the comment data to be published according to the classification result.
2. The method of claim 1, wherein labeling the second text data set using the first neural network results in a second labeling result for each piece of second text data in the second text data set, comprising:
classifying each piece of second text data in the second text data set using the first neural network to obtain, for each piece of second text data, a first probability of a positive evaluation and a second probability of a negative evaluation;
determining that the second labeling result of second text data whose first probability is greater than a first threshold is a positive evaluation;
determining that the second labeling result of second text data whose second probability is greater than the first threshold is a negative evaluation;
and determining that the second labeling result of second text data whose first probability is smaller than the first threshold and greater than a second threshold is a neutral evaluation.
3. The method of claim 1, wherein, after the electronic device obtains the second training sample set according to the second labeling result of each piece of second text data in the second text data set, the method further comprises:
combining the second training sample set with the first training sample set to obtain a new second training sample set;
the electronic device training a second neural network using the second training sample set, comprising:
the electronic device trains a second neural network using the new second training sample set.
4. An electronic device for performing the method of any of claims 1-3, the electronic device comprising:
an acquisition unit, configured to acquire a first text data set from a first third-party platform, wherein each piece of first text data in the first text data set includes an emoji;
a labeling unit, configured to label each piece of first text data according to the emoji in each piece of first text data in the first text data set to obtain a first labeling result for each piece of first text data, wherein the first labeling result is a positive evaluation or a negative evaluation;
a training unit, configured to obtain a first training sample set according to the first labeling result of each piece of first text data, and to train a first neural network using the first training sample set;
wherein the acquisition unit is further configured to acquire a second text data set from a second third-party platform;
and the labeling unit is further configured to label the second text data set using the first neural network to obtain a second labeling result for each piece of second text data in the second text data set, wherein the second labeling result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
5. An electronic device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of any of claims 1-3.
6. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, which is executed by a processor to implement the method of any of claims 1-3.
CN202010465811.4A 2020-05-28 2020-05-28 Text labeling method and related product Active CN111695357B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010465811.4A CN111695357B (en) 2020-05-28 2020-05-28 Text labeling method and related product
PCT/CN2020/099493 WO2021114634A1 (en) 2020-05-28 2020-06-30 Text annotation method, device, and storage medium

Publications (2)

Publication Number Publication Date
CN111695357A CN111695357A (en) 2020-09-22
CN111695357B true CN111695357B (en) 2024-11-01

Family

ID=72478683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010465811.4A Active CN111695357B (en) 2020-05-28 2020-05-28 Text labeling method and related product

Country Status (2)

Country Link
CN (1) CN111695357B (en)
WO (1) WO2021114634A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117172248B (en) * 2023-11-03 2024-01-30 翼方健数(北京)信息科技有限公司 Text data labeling method, system and medium
CN117689998B (en) * 2024-01-31 2024-05-03 数据空间研究院 Nonparametric adaptive emotion recognition model, method, system and storage medium
CN117725909B (en) * 2024-02-18 2024-05-14 四川日报网络传媒发展有限公司 Multi-dimensional comment auditing method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034203A (en) * 2018-06-29 2018-12-18 北京百度网讯科技有限公司 Training, expression recommended method, device, equipment and the medium of expression recommended models
CN109325112A (en) * 2018-06-27 2019-02-12 北京大学 A kind of across language sentiment analysis method and apparatus based on emoji

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201322037D0 (en) * 2013-12-12 2014-01-29 Touchtype Ltd System and method for inputting images/labels into electronic devices
US20170364797A1 (en) * 2016-06-16 2017-12-21 Sysomos L.P. Computing Systems and Methods for Determining Sentiment Using Emojis in Electronic Data
CN111339306B (en) * 2018-12-18 2023-05-12 腾讯科技(深圳)有限公司 Classification model training method, classification method and device, equipment and medium
CN110188615B (en) * 2019-04-30 2021-08-06 中国科学院计算技术研究所 Facial expression recognition method, device, medium and system
CN110704581B (en) * 2019-09-11 2024-03-08 创新先进技术有限公司 Text emotion analysis method and device executed by computer

Also Published As

Publication number Publication date
WO2021114634A1 (en) 2021-06-17
CN111695357A (en) 2020-09-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant