CN113705238A - Method and model for aspect-level emotion analysis based on BERT and an aspect feature localization model
- Publication number: CN113705238A (application CN202110670846.6A)
- Authority: CN (China)
- Legal status: Granted
Classifications
- G06F40/30 — Semantic analysis
- G06F40/279 — Recognition of textual entities
- G06F40/289 — Phrasal analysis, e.g. finite state techniques or chunking
- G06N3/02 — Neural networks
- G06N3/08 — Learning methods
Abstract
The invention relates to an aspect-level emotion analysis method and model based on BERT and an aspect feature localization model. The method comprises the following steps: first, a BERT model is used to obtain high-quality context information representations and aspect information representations, preserving the integrity of the text information; then an attention encoder based on a multi-head attention mechanism is constructed to learn the interaction between the aspect representation and the context representation, integrating the relationship between the aspect words and the context so as to distinguish the contributions of different sentences and aspect words to the classification result; next, an aspect feature localization model is constructed to capture aspect information during sentence modeling and to integrate the complete aspect information into the interactive semantics, reducing the influence of interference words irrelevant to the aspect words and improving the integrity of the aspect word information; finally, the target-related context and the important target information are fused, and an emotion predictor is used to predict the probabilities of the different emotion polarities on the basis of the fused information. The method better models the implicit relationships within the context, makes better use of the aspect word information, and reduces the interference of information unrelated to the aspect words, thereby achieving higher accuracy and macro-F1.
Description
Technical Field
The invention belongs to the technical field of aspect-level emotion analysis, and particularly relates to an aspect-level emotion analysis method and model (ALM-BERT) based on BERT and an aspect feature localization model.
Background
Electronic commerce is a rapidly developing industry whose importance to the global economy grows by the day. In particular, with the rapid development of social media and the continuous spread of social networking platforms, more and more users express emotionally charged comments on various online platforms. These reviews reflect the emotions of users and consumers and provide sellers and governments with a wealth of valuable feedback on the quality of goods or services. For example, before purchasing an item, a user may browse many reviews of it on an e-commerce platform to decide whether it is worth buying. Likewise, governments and enterprises can collect large volumes of public comments directly from the Internet, analyze users' opinions and satisfaction, and better meet their needs. Emotion analysis has therefore attracted a great deal of attention from both academia and industry as a fundamental and critical task in natural language processing.
However, common emotion analysis tasks (e.g., sentence-level emotion analysis) can only determine the user's emotion polarity (e.g., positive, negative, or neutral) toward a product or event from the sentence as a whole, and cannot determine the emotion polarity of a particular aspect mentioned in the sentence. In contrast, aspect-level emotion analysis is a finer-grained classification task that can identify the emotion polarity of each aspect in a sentence. For example, FIG. 9 provides some examples of sentence-level and aspect-based emotion analysis (a consumer review with three aspect words). From the review text "It does not have any accompanying software installed other than Windows Media, but for the price I am very satisfied with its condition and the overall product", we can see that the emotion polarity of the aspect word "software" is negative, that of "Windows Media" is neutral, and that of "price" is positive.
In prior studies, researchers have proposed various methods for the aspect-level emotion analysis task. Most are based on supervised machine learning algorithms and achieve a certain degree of success. However, these statistical methods require carefully designed manual features over large-scale datasets, incurring significant labor and time costs. Given that neural network models can automatically learn low-dimensional representations of aspects and contexts from comment text without relying on manual feature engineering, neural networks have received increasing attention for aspect-level emotion analysis in recent years.
Unfortunately, most existing methods directly use a recurrent neural network (RNN) or a convolutional neural network (CNN) to independently model and express the semantic information of the aspect words and their contexts, ignoring the fact that these networks lack sensitivity to the position of critical components. In practice, researchers have shown that the emotion polarity of aspect words is highly correlated with both aspect word information and word order information, which means that the emotion polarity of an aspect word is more strongly influenced by context words that are closer to it. In addition, it is difficult for such neural networks to capture long-term dependencies between aspect words and context, resulting in the loss of valuable information.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides an aspect-level emotion analysis method based on BERT and an aspect feature localization model, which makes better use of aspect word information and reduces the interference of information irrelevant to the aspect words, thereby achieving higher accuracy and macro-F1, together with a system based on this method.
In order to solve the technical problems, the invention adopts the following technical scheme:
The invention provides an aspect-level emotion analysis method based on BERT and an aspect feature localization model, comprising the following steps:
S1. Using a BERT model to obtain high-quality context information representations and aspect information representations, so as to preserve the integrity of the text information;
S2. Constructing an attention encoder based on a multi-head attention mechanism to learn the interaction between the aspect representation and the context representation, integrating the relationship between the aspect words and the context, and thereby distinguishing the contributions of different sentences and aspect words to the classification result;
S3. Constructing an aspect feature localization model to capture aspect information during sentence modeling, and integrating the complete aspect information into the interactive semantics, so as to reduce the influence of interference words irrelevant to the aspect words and improve the integrity of the aspect word information;
S4. Fusing the target-related context and the important target information, and predicting the probabilities of the different emotion polarities with the emotion predictor on the basis of the fused information.
Further, "obtaining high-quality context information representations and aspect information representations using a BERT model" means using a pre-trained BERT model as the text vectorization mechanism to generate high-quality text feature vector representations, where BERT is a pre-trained language representation model and the text vectorization mechanism maps each word to a high-dimensional vector space. Specifically, the BERT model generates text representations with a deep multi-layer bidirectional transformer encoder; it divides a given word sequence into segments by adding special tokens at the beginning and end of the input sequence, generates token embeddings, segment embeddings, and position embeddings for the segments, and finally converts the comment text and the aspect words separately to obtain the context information representation and the aspect information representation.
This conversion step, in which the given word sequence is divided into segments by the special tokens and the comment text and aspect words are converted separately into the context information representation and the aspect information representation, specifically comprises:
the BERT model adds special word segmentation marks [ CLS ] at the beginning and the end of an input sequence respectively]And [ SEP ]]Dividing a given word sequence into different segments, generating mark embedding, segment embedding and position embedding for different segments, enabling the embedded representation of the input sequence to contain all the information of the three embedding, and finally respectively converting the annotation text and the aspect words into 'CLS' in a BERT model]+ annotate text + [ SEP]"and" [ CLS]+ target + [ SEP]"get context representation EcAnd aspect represents Ea:
Ec={we[CLS],we1,we2,...,we[SEP]};
Ea={ae[CLS],ae1,ae2,...,ae[SEP]};
Wherein we[CLS],ae[CLS]Indicates a Classification marker [ CLS]Vector of (2), we[SEP]And ae[SEP]Representation delimiter [ SEP]The vector of (2).
Further, "constructing an attention encoder based on a multi-head attention mechanism to learn the interaction between the aspect representation and the context representation and integrate the relationship between the aspect words and the context" means extracting the important features for aspect-level emotion analysis based on the multi-head attention mechanism, i.e., extracting the important information of the context and the target. Specifically: first, a transformer encoder is introduced, which is a feature extractor based on a multi-head attention mechanism and a position feed-forward network that can learn different important information in different feature representation subspaces and directly capture long-term correlations in a sequence; then, interactive semantics are extracted by the transformer encoder from the aspect information representation and the context information representation generated by the BERT model, determining the context words most important for characterizing the emotion of the aspect words; meanwhile, the long-term dependency information and the context-aware information of the context are used as input to the position feed-forward network to generate hidden states, and after a mean pooling operation the final interactive hidden state of the context and the final interactive hidden state of the context and aspect words are obtained.
This extraction of interactive semantics by the transformer encoder specifically comprises:
S201. In the transformer encoder, the multiple self-attention mechanisms that form the multi-head attention mechanism map the aspect information representation and the context information representation generated by the BERT model to a query sequence (Q) and a series of key (K)-value (V) pairs that capture different important information in parallel subspaces;
S202. The attention score of each captured piece of important information is computed with the attention score function f_s(Q,K,V) = σ(f_e(Q,K))V, where σ(·) denotes the normalized exponential function and f_e(Q,K) is an energy function that learns the correlation features between K and Q;
S203. The context representation and the aspect representation are input into the multi-head attention score function f_mh(Q,K,V) = [a_1; a_2; ...; a_i; ...; a_n-head]W_d to obtain, respectively, the long-term dependency information of the context c_cc = f_mh(E_c, E_c) and the context-aware information t_ca = f_mh(E_c, E_a), capturing the long-term dependencies of the context and determining which context words are most important for characterizing the emotion of the aspect words; here a_i denotes the attention score of the i-th captured piece of important information, [a_1; a_2; ...; a_i; ...; a_n-head] denotes their concatenation, and W_d is an attention weight matrix;
S204. The transformer encoder takes c_cc and t_ca as input to a position feed-forward network to generate the hidden states h_c and h_a; the position feed-forward network PFN(h) is a variant of the multi-layer perceptron, and h_c and h_a are defined as follows:
h_c = PFN(c_cc)
h_a = PFN(t_ca)
PFN(h) = ζ(hW_1 + b_1)W_2 + b_2
where ζ(hW_1 + b_1) is a rectified linear unit, b_1 and b_2 are bias values, and W_1 and W_2 are learnable weight parameters;
S205. After a mean pooling operation on the hidden states h_c and h_a, the final interactive hidden state of the context h_cm and the final interactive hidden state of the context and aspect words h_am are obtained.
Further, the aspect feature localization model works as follows (Algorithm 1):
Specifically, according to the position and length of the aspect words, the feature localization algorithm extracts the most important aspect-related information af from the context representation E_c; max pooling is then applied to af to obtain the most important feature AF, a dropout operation is performed on AF, and the important feature h_af of the aspect words within the context representation E_c is obtained.
Further, "fusing the target-related context and the important target information, and predicting the probabilities of the different emotion polarities with the emotion predictor on the basis of the fused information" specifically comprises:
S401. Concatenating h_cm, h_am, and h_af by vector splicing to obtain the overall feature r:
r = [h_cm; h_am; h_af];
S402. Preprocessing r with a linear function, namely:
x = W_u r + b_u, where W_u is a weight matrix and b_u is a bias value;
S403. Computing with the softmax function the probability Pr(a = p) that the emotion polarity of aspect word a in the sentence is p:
Pr(a = p) = exp(x_p) / Σ_{i=1}^{C} exp(x_i)
where p denotes a candidate emotion polarity and C is the number of emotion polarity categories.
Further, the aspect-level emotion analysis method based on BERT and the aspect feature localization model further comprises: training with cross entropy and L2 regularization as the loss function, defined as:
L(θ) = −Σ_{j∈D} Σ_{i=1}^{C} ŷ_j^i · log(y_j^i) + λ‖θ‖²
where D denotes all the training data, j and i index the training samples and emotion classes respectively, λ denotes the factor for L2 regularization, θ denotes the parameter set of the model, y denotes the predicted emotion polarity, and ŷ denotes the correct emotion polarity.
The invention also provides an aspect-level emotion analysis model, comprising:
a text vectorization mechanism, which uses a BERT model to obtain high-quality context information representations and aspect information representations so as to preserve the integrity of the text information;
a feature extraction model for aspect-level emotion analysis, which learns the interaction between the aspect representation and the context representation, integrates the relationship between the aspect words and the context to distinguish the contributions of different sentences and aspect words to the classification result, captures aspect information during sentence modeling, and integrates the complete aspect information into the interactive semantics so as to reduce the influence of interference words irrelevant to the aspect words and improve the integrity of the aspect word information; and
an emotion predictor, which fuses the target-related context and the important target information and predicts the probabilities of the different emotion polarities on the basis of the fused information.
Further, the BERT model is a pre-trained language representation model that generates text representations with a deep multi-layer bidirectional transformer encoder; it divides a given word sequence into segments by adding special tokens at the beginning and end of the input sequence, generates token embeddings, segment embeddings, and position embeddings for the segments, and finally converts the comment text and the aspect words separately to obtain the context information representation and the aspect information representation.
The feature extraction model for aspect-level emotion analysis comprises an important feature extraction model and an aspect feature localization model. The important feature extraction model is an attention encoder based on a multi-head attention mechanism, used to learn the interaction between the aspect representation and the context representation and to integrate the relationship between the aspect words and the context so as to distinguish the contributions of different sentences and aspect words to the classification result. The aspect feature localization model captures aspect information during sentence modeling and integrates the complete aspect information into the interactive semantics, so as to reduce the influence of interference words irrelevant to the aspect words and improve the integrity of the aspect word information.
The emotion predictor concatenates, by vector splicing, the final interactive hidden state of the context, the final interactive hidden state of the context and aspect words, and the important feature of the aspect words to obtain the overall feature, then preprocesses the overall feature with a linear function, and finally computes with the softmax function the probability that the emotion polarity of an aspect word in the sentence is each candidate emotion polarity.
The invention has the following beneficial effects:
With the above technical scheme, the transformer encoder better models the implicit relationships within the context, and the aspect feature localization model makes better use of the aspect word information and reduces the interference of information irrelevant to the aspect words, thereby achieving higher accuracy and macro-F1 (on sentences of different lengths, the average accuracy and macro-F1 are 3.1% and 6.56% higher, respectively, than those of AEN). The results also verify the feasibility and effectiveness of the BERT model and of aspect information in the aspect-level emotion analysis task.
Drawings
FIG. 1 is a flow chart of an embodiment of the aspect-level emotion analysis method based on BERT and the aspect feature localization model according to the present invention;
FIG. 2 is a schematic structural diagram of an embodiment of the aspect-level emotion analysis system based on BERT and the aspect feature localization model according to the present invention;
FIG. 3 is a graph of experimental results of drop-rate parameter optimization in an evaluation experiment of the aspect-level emotion analysis method based on BERT and the aspect feature localization model according to the present invention;
FIG. 4 is a graph of experimental results of learning-rate parameter optimization in an evaluation experiment of the aspect-level emotion analysis method based on BERT and the aspect feature localization model according to the present invention;
FIG. 5 is a graph of experimental results of L2 regularization parameter optimization in an evaluation experiment of the aspect-level emotion analysis method based on BERT and the aspect feature localization model according to the present invention;
FIG. 6 is a graph of ROUGE scores (ROUGE-1) on source texts of different lengths in a validation experiment comparing the method with TD-LSTM;
FIG. 7 is a graph of ROUGE scores (ROUGE-2) on source texts of different lengths in a validation experiment comparing the method with TD-LSTM;
FIG. 8 is a graph of ROUGE scores (ROUGE-L) on source texts of different lengths in a validation experiment comparing the method with TD-LSTM;
FIG. 9 is a prior-art example of aspect-level emotion analysis.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in FIG. 1, an aspect-level emotion analysis method based on BERT and an aspect feature localization model according to an embodiment of the present invention comprises the following steps:
S1. Using a BERT model to obtain high-quality context information representations and aspect information representations, so as to preserve the integrity of the text information. Specifically, a pre-trained BERT model is used as the text vectorization mechanism to generate high-quality text feature vector representations; BERT is a pre-trained language representation model, and the text vectorization mechanism maps each word to a high-dimensional vector space. The BERT model generates text representations with a deep multi-layer bidirectional transformer encoder; it divides a given word sequence into segments by adding special tokens at the beginning and end of the input sequence, generates token embeddings, segment embeddings, and position embeddings for the segments, and finally converts the comment text and the aspect words separately to obtain the context information representation and the aspect information representation.
S2. Constructing an attention encoder based on a multi-head attention mechanism to learn the interaction between the aspect representation and the context representation, integrating the relationship between the aspect words and the context, and thereby distinguishing the contributions of different sentences and aspect words to the classification result. Specifically, the important features for aspect-level emotion analysis are extracted based on the multi-head attention mechanism, i.e., the important information of the context and the target: first, a transformer encoder is introduced, which is a feature extractor based on a multi-head attention mechanism and a position feed-forward network that can learn different important information in different feature representation subspaces and directly capture long-term correlations in a sequence; then, interactive semantics are extracted by the transformer encoder from the aspect information representation and the context information representation generated by the BERT model, determining the context words most important for characterizing the emotion of the aspect words; meanwhile, the long-term dependency information and the context-aware information of the context are used as input to the position feed-forward network to generate hidden states, and after a mean pooling operation the final interactive hidden state of the context and the final interactive hidden state of the context and aspect words are obtained.
S3. Constructing an aspect feature localization model to capture aspect information during sentence modeling, and integrating the complete aspect information into the interactive semantics, so as to reduce the influence of interference words irrelevant to the aspect words and improve the integrity of the aspect word information. The aspect feature localization module is built on a max pooling function: the extracted aspect words and their hidden context features are divided into several regions, and the maximum value in each region is selected to represent that region, thereby locating the core features. In operation, according to the position and length of the aspect words, the feature localization algorithm extracts the most important aspect-related information af from the context representation E_c; max pooling is applied to af to obtain the most important feature AF, a dropout operation is performed on AF, and the important feature h_af of the aspect words within the context representation E_c is obtained.
S4. Fusing the target-related context and the important target information, and predicting the probabilities of the different emotion polarities with the emotion predictor on the basis of the fused information. Specifically: the final interactive hidden state of the context, the final interactive hidden state of the context and aspect words, and the important feature of the aspect words are concatenated by vector splicing to obtain the overall feature, the overall feature is preprocessed with a linear function, and finally the probability that the emotion polarity of an aspect word in the sentence is each candidate emotion polarity is computed with the softmax function.
As shown in FIG. 2, the present invention further provides an aspect-level emotion analysis model, which comprises a text vectorization mechanism 100, a feature extraction model 200 for aspect-level emotion analysis, and an emotion predictor 300.
The text vectorization mechanism 100 is a multi-angle text vectorization mechanism that uses a BERT model to obtain high-quality context information representations and aspect information representations so as to preserve the integrity of the text information. The BERT model is a pre-trained language representation model that generates text representations with a deep multi-layer bidirectional transformer encoder; it divides a given word sequence into segments by adding special tokens at the beginning and end of the input sequence, generates token embeddings, segment embeddings, and position embeddings for the segments, and finally converts the comment text and the aspect words separately to obtain the context information representation and the aspect information representation.
The feature extraction model 200 for aspect-level emotion analysis learns the interaction between the aspect representation and the context representation, integrates the relationship between the aspect words and the context to distinguish the contributions of different sentences and aspect words to the classification result, captures aspect information during sentence modeling, and integrates the complete aspect information into the interactive semantics. It comprises an important feature extraction model and an aspect feature localization model. The important feature extraction model is an attention encoder based on a multi-head attention mechanism, used to learn the interaction between the aspect representation and the context representation and to integrate the relationship between the aspect words and the context so as to distinguish the contributions of different sentences and aspect words to the classification result. The aspect feature localization model captures aspect information during sentence modeling and integrates the complete aspect information into the interactive semantics; this reduces the influence of interference words irrelevant to the aspect words and improves the integrity of the aspect word information.
The emotion predictor 300 fuses the target-related context and the important target information and predicts the probabilities of the different emotion polarities on the basis of the fused information. Specifically, the final interactive hidden state of the context, the final interactive hidden state of the context and aspect words, and the important feature of the aspect words are concatenated by vector splicing to obtain the overall feature, the overall feature is preprocessed with a linear function, and finally the probability that the emotion polarity of an aspect word in the sentence is each candidate emotion polarity is computed with the softmax function.
In general, aspect-level emotion analysis takes a sentence and some predefined aspect words as input data and outputs the emotion polarity of each aspect word in the sentence. Here we use some practical review examples to illustrate the aspect-level emotion analysis task.
As shown in Table 1, each example sentence contains two aspect words, and each aspect word takes one of four emotion polarities: positive, neutral, negative, and conflict. Aspect-level emotion analysis is then defined as follows:
table 1 some examples of aspect level sentiment analysis
Definition 1: Formally, given a comment sentence S = {w_1, w_2, ..., w_n}, where n is the total number of words in S, and an aspect word list A = {a_1, ..., a_i, ..., a_m} of length m, where a_i denotes the i-th aspect word in A and A is a subsequence of S; P = {p_1, ..., p_j, ..., p_C} denotes the candidate emotion polarities, where C denotes the number of emotion polarity categories and p_j denotes the j-th emotion polarity.
Problem: The goal of the aspect-level emotion analysis model is to predict the most likely emotion polarity for a particular aspect word, which can be expressed as:
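The prediction equation itself is rendered only as an image in the source; a reconstruction consistent with the definitions above (the exact notation is assumed) is:

```latex
\hat{p}(a_i) = \arg\max_{p_j \in P} \phi(a_i, p_j, S) \qquad (1)
```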
where φ denotes a function quantifying the degree of match between the aspect word a_i and the emotion polarity p_j in sentence S. Finally, the model outputs the emotion polarity with the highest matching degree as the classification result. Table 2 summarizes the symbols used in the model and their descriptions.
TABLE 2 symbols used and their description
The aspect-level emotion analysis method based on BERT and the aspect feature localization model of the present invention proceeds as follows: first, a pre-trained BERT model generates high-quality sequence word vectors, providing effective support for the subsequent steps; then, in the feature extraction method for aspect-level emotion analysis, an important feature extraction module based on a multi-head attention mechanism extracts the important information of the context and the target; next, an aspect feature localization model is provided, which comprehensively considers the important features of the target words to obtain target-related features; finally, the target-related context and the important target information are fused, and the emotion predictor predicts the probabilities of the different emotion polarities on the basis of the fused information. The specific method and principles are as follows:
1. multi-angle text vectorization mechanism
The text vectorization mechanism essentially maps each word to a high-dimensional vector space. Two context-based word embedding models, Word2vec and GloVe, are widely applied to text vectorization and achieve strong performance in aspect-level emotion analysis tasks. However, research has shown that these two word embedding models cannot capture enough of the information in the text, which leads to insufficient classification accuracy and reduced performance. A high-quality word embedding model therefore has an important influence on improving the accuracy of the classification result.
The key to aspect-level emotion analysis is effective natural language understanding, which normally depends heavily on large-scale, high-quality labeled text. Fortunately, the BERT model is a language pre-training model that can effectively exploit unlabeled text: by randomly masking part of the vocabulary, it uses a deep multi-layer bidirectional transformer encoder to learn a general language model from massive unlabeled text, which is then fine-tuned with a small amount of labeled data, so that high-quality text feature vector representations can be generated. Inspired by this, in the ALM-BERT method proposed by the present invention, for a given word sequence the special tokens [CLS] and [SEP] are added at the beginning and end of the input sequence respectively, dividing the sequence into segments; the word embedding input thus contains the token embeddings, segment embeddings, and position embeddings generated for the segments. Specifically, the comment text and the aspect words are converted into "[CLS] + comment text + [SEP]" and "[CLS] + target + [SEP]" respectively, giving the context representation E_c and the aspect representation E_a:
E_c = {w_e[CLS], w_e1, w_e2, ..., w_e[SEP]}   (2)
E_a = {a_e[CLS], a_e1, a_e2, ..., a_e[SEP]}   (3)
where w_e[CLS] and a_e[CLS] denote the vectors of the classification token [CLS], and w_e[SEP] and a_e[SEP] denote the vectors of the separator token [SEP].
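As an illustration of this dual-input scheme, the sketch below builds the two BERT inputs and representations with the HuggingFace transformers library; the library choice, the bert-base-uncased checkpoint, and the example sentence are assumptions for illustration, not part of the patent.

```python
# Minimal sketch of the multi-angle text vectorization step (assumptions:
# HuggingFace transformers, bert-base-uncased checkpoint).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

comment = "the food was great but the service was slow"   # comment text
aspect = "service"                                         # aspect word (target)

# The tokenizer inserts [CLS]/[SEP] itself, producing the token, segment, and
# position inputs for "[CLS] + comment text + [SEP]" and "[CLS] + target + [SEP]".
ctx_inputs = tokenizer(comment, return_tensors="pt")
asp_inputs = tokenizer(aspect, return_tensors="pt")

with torch.no_grad():
    E_c = bert(**ctx_inputs).last_hidden_state   # context representation E_c, eq. (2)
    E_a = bert(**asp_inputs).last_hidden_state   # aspect representation E_a, eq. (3)

print(E_c.shape, E_a.shape)   # e.g., torch.Size([1, 12, 768]) torch.Size([1, 3, 768])
```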
2. Feature extraction method for aspect-level emotion analysis
In order to extract the hidden features of the aspect words and their context, with particular attention to the auxiliary information contained in the aspect words, a transformer encoder is introduced and an aspect word feature localization module is proposed. The basic idea is to model the context and the target words interactively so as to fully integrate the information of the aspect words and the context. In addition, obtaining the feature information of the aspect words within the context can improve the accuracy of emotion classification.
2.1 important feature extraction model
A transformer encoder is a feature extractor based on a multi-head attention mechanism and a position feed-forward network. It can learn different important information in different feature representation subspaces. Moreover, the transformer encoder can directly capture long-term correlations in a sequence, is easier to parallelize than recurrent and convolutional neural networks, and greatly reduces training time. The invention extracts interactive semantics from the aspect information representation and the context information representation generated by the BERT model through the transformer encoder, determines the context words most important for characterizing the emotion of the aspect words, uses the long-term dependency information and the context-aware information of the context as input to the position feed-forward network to generate hidden states, and obtains, after a mean pooling operation, the final interactive hidden state of the context and the final interactive hidden state of the context and aspect words.
Intuitively, a multi-head attention mechanism is composed of multiple self-attention mechanisms, which map the input to a query sequence (Q) and a series of key (K)-value (V) pairs that capture different important information in parallel subspaces. The attention score function f_s(·) in the self-attention mechanism is computed as follows:
f_s(Q,K,V) = σ(f_e(Q,K))V   (4)
where σ(·) denotes the normalized exponential function and f_e(·) is an energy function that learns the correlation features between K and Q, calculated with the following formula:
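Equation (5) appears only as an image in the source; the scaled dot-product form commonly paired with transformer encoders is one plausible reconstruction (the exact form used by the patent, including the scaling by the key dimension d_k, is an assumption):

```latex
f_e(Q, K) = \frac{Q K^{\top}}{\sqrt{d_k}} \qquad (5)
```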
The attention score function f_mh(·) of the multi-head attention mechanism concatenates the attention scores of the self-attention mechanisms:
f_mh(Q,K,V) = [a_1; a_2; ...; a_i; ...; a_n-head]W_d   (6)
where a_i denotes the attention score of the i-th captured piece of important information, [a_1; a_2; ...; a_i; ...; a_n-head] denotes the concatenation vector, and W_d is the attention weight matrix.
As shown in equations (8)-(9) below, the context representation and the aspect representation are input into the multi-head attention mechanism to capture the long-term dependencies of the context and determine which context words are most important for characterizing the emotion of the aspect words:
c_cc = f_mh(E_c, E_c)   (8)
t_ca = f_mh(E_c, E_a)   (9)
where c_cc and t_ca are the long-term dependency information and the context-aware information of the context, respectively.
Then, the transformer encoder takes c_cc and t_ca as input to a position feed-forward network to generate the hidden states h_c and h_a. In particular, the position feed-forward network PFN(h) is a variant of the multi-layer perceptron. Formally, PFN, h_c, and h_a are defined as follows:
h_c = PFN(c_cc)   (10)
h_a = PFN(t_ca)   (11)
PFN(h) = ζ(hW_1 + b_1)W_2 + b_2   (12)
where ζ(hW_1 + b_1) is a rectified linear unit, b_1 and b_2 are bias values, and W_1 and W_2 are learnable weight parameters.
After a mean pooling operation on h_c and h_a, the final interactive hidden state of the context h_cm and the final interactive hidden state of the context and aspect words h_am are obtained.
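A minimal PyTorch sketch of this important feature extraction step is given below. The use of nn.MultiheadAttention in place of equations (4)-(6), the query/key assignment for f_mh(E_c, E_a), and the layer sizes are assumptions for illustration:

```python
import torch
import torch.nn as nn

class AttentionEncoder(nn.Module):
    """Sketch of the important feature extraction model (assumed layout)."""
    def __init__(self, dim: int = 768, n_head: int = 12):
        super().__init__()
        self.mha_cc = nn.MultiheadAttention(dim, n_head, batch_first=True)
        self.mha_ca = nn.MultiheadAttention(dim, n_head, batch_first=True)
        # Position feed-forward network PFN(h) = ReLU(h W1 + b1) W2 + b2, eq. (12)
        self.pfn_c = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.pfn_a = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, E_c: torch.Tensor, E_a: torch.Tensor):
        c_cc, _ = self.mha_cc(E_c, E_c, E_c)   # c_cc = f_mh(E_c, E_c), eq. (8)
        t_ca, _ = self.mha_ca(E_c, E_a, E_a)   # t_ca = f_mh(E_c, E_a), eq. (9)
        h_c = self.pfn_c(c_cc)                 # h_c = PFN(c_cc), eq. (10)
        h_a = self.pfn_a(t_ca)                 # h_a = PFN(t_ca), eq. (11)
        # Mean pooling over the sequence dimension yields h_cm and h_am
        return h_c.mean(dim=1), h_a.mean(dim=1)

encoder = AttentionEncoder()
h_cm, h_am = encoder(torch.randn(1, 20, 768), torch.randn(1, 3, 768))
print(h_cm.shape, h_am.shape)   # torch.Size([1, 768]) torch.Size([1, 768])
```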
2.2 aspect feature localization model
The transformer encoder captures the long-term dependencies of the context and generates the semantic information of the interaction between the aspect words and the context. To highlight the importance of different aspect words, the invention builds an aspect word feature localization model. Its main idea is to select aspect-word-related information from the context feature representation and to better integrate the aspect information by capturing the feature representation vectors that contain it, thereby improving the accuracy of aspect-level emotion classification. The working process of the aspect feature localization model is shown as Algorithm 1:
Specifically, according to the position and length of the aspect words, the feature localization algorithm extracts the most important aspect-related information af from the context representation E_c; the most important feature AF is then obtained from af with max pooling, as follows:
AF = Maxpooling(af, dim = 0)   (13)
After that, a dropout operation is performed on the most important feature AF, yielding the important feature h_af of the aspect words within the context representation E_c.
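A minimal sketch of this localization step follows, assuming the aspect span is given as a token start index and length into E_c (the function name and dropout rate are illustrative):

```python
import torch
import torch.nn.functional as F

def aspect_feature_localization(E_c: torch.Tensor, start: int, length: int,
                                p_drop: float = 0.5) -> torch.Tensor:
    """Sketch of Algorithm 1: slice the aspect span out of the context
    representation E_c, max-pool it, and apply dropout."""
    af = E_c[start:start + length]        # aspect-related rows of E_c
    AF = torch.max(af, dim=0).values      # AF = Maxpooling(af, dim=0), eq. (13)
    h_af = F.dropout(AF, p=p_drop, training=True)
    return h_af                           # important aspect feature h_af

E_c = torch.randn(20, 768)                # context representation (n x d)
h_af = aspect_feature_localization(E_c, start=5, length=2)
print(h_af.shape)                         # torch.Size([768])
```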
3. Emotion predictor
First, h_cm, h_am, and h_af are concatenated by vector splicing to obtain the overall feature r:
r = [h_cm; h_am; h_af]   (14)
Then, a linear function is used to preprocess r, namely:
x = W_u r + b_u   (15)
where W_u is a weight matrix and b_u is a bias value.
Finally, the probability Pr(a = p) that the emotion polarity of aspect word a in the sentence is p is computed with the softmax function:
Pr(a = p) = exp(x_p) / Σ_{i=1}^{C} exp(x_i)   (16)
where p denotes a candidate emotion polarity and C is the number of emotion polarity categories.
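The following PyTorch sketch mirrors equations (14)-(16); the hidden size and class count are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EmotionPredictor(nn.Module):
    """Sketch of the emotion predictor: concatenate, project, softmax."""
    def __init__(self, dim: int = 768, n_classes: int = 3):
        super().__init__()
        self.linear = nn.Linear(3 * dim, n_classes)   # x = W_u r + b_u, eq. (15)

    def forward(self, h_cm, h_am, h_af):
        r = torch.cat([h_cm, h_am, h_af], dim=-1)     # r = [h_cm; h_am; h_af], eq. (14)
        x = self.linear(r)
        return torch.softmax(x, dim=-1)               # Pr(a = p), eq. (16)

predictor = EmotionPredictor()
probs = predictor(torch.randn(1, 768), torch.randn(1, 768), torch.randn(1, 768))
print(probs.sum().item())   # probabilities over the C emotion polarities sum to 1
```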
In summary, the aspect-level emotion analysis method based on BERT and the aspect feature localization model of the present invention is an end-to-end process. Furthermore, to optimize the parameters of the method so as to minimize the loss between the predicted emotion polarity y and the correct emotion polarity ŷ, the method further comprises: training with cross entropy and L2 regularization as the loss function, defined as:
L(θ) = −Σ_{j∈D} Σ_{i=1}^{C} ŷ_j^i · log(y_j^i) + λ‖θ‖²   (17)
where D denotes all the training data, j and i index the training samples and emotion classes respectively, λ denotes the factor for L2 regularization, θ denotes the parameter set of the model, y denotes the predicted emotion polarity, and ŷ denotes the correct emotion polarity.
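A minimal training-step sketch under these definitions follows; supplying the λ‖θ‖² term through the optimizer's weight_decay argument and the stand-in model are implementation assumptions (the learning rate and L2 factor match the values reported in the evaluation below):

```python
import torch
import torch.nn as nn

model = nn.Linear(3 * 768, 3)      # stand-in for the full ALM-BERT model
criterion = nn.CrossEntropyLoss()  # cross-entropy part of eq. (17)
# lambda * ||theta||^2 is supplied via weight_decay (an implementation assumption)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=0.01)

r = torch.randn(16, 3 * 768)           # a batch of fused features r
y_true = torch.randint(0, 3, (16,))    # correct emotion polarities

optimizer.zero_grad()
loss = criterion(model(r), y_true)
loss.backward()
optimizer.step()
print(loss.item())
```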
4. Evaluation test
In order to evaluate the rationality and effectiveness of the aspect-level emotion analysis method and model based on BERT and the aspect feature localization model, the following evaluation experiments were carried out.
4.1 data set and evaluation index
We constructed the evaluation experiments on three published English review datasets, detailed in Table 3. The Restaurant and Laptop datasets are provided by SemEval (reference: Pontiki M., Galanis D., Pavlopoulos J., et al. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. Proceedings of the International Workshop on Semantic Evaluation, 2014.); each contains aspect words and the corresponding emotion polarities, labeled as positive, negative, neutral, and conflict. The Twitter dataset consists of user comments on Twitter collected by Dong et al. (reference: Dong L., Wei F., Tan C., et al. Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2014.); its emotion polarities are labeled as positive, negative, and neutral. These three datasets are currently popular comment datasets and are widely used in aspect-level emotion analysis tasks.
TABLE 3 statistical information of data sets
In addition, in order to objectively evaluate the performance of the aspect-level emotion analysis method and model based on BERT and the aspect feature localization model, the evaluation indices commonly used in aspect-level emotion analysis tasks are adopted, namely macro-F1 and accuracy (Acc). Accuracy is defined as:
Acc = SC / N   (18)
where SC denotes the number of correctly classified samples and N denotes the total number of samples. In general, the higher the accuracy, the better the performance of the model.
In addition, macro-F1 is used to reflect the performance of the model more faithfully; it averages, over all classes, the per-class F1 score, which is the harmonic mean of precision and recall. For each emotion polarity i, T is the number of samples correctly classified as polarity i, FP is the number of samples misclassified as polarity i, FN is the number of samples of polarity i misclassified as other polarities, C is the number of emotion polarity categories, P_i is the precision of polarity i, and R_i is the recall of polarity i. In our experiments, to evaluate the performance of the model more comprehensively, we classify the emotion polarities as 3C = {positive, neutral, negative} and 4C = {positive, neutral, negative, conflict}.
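The macro-F1 formulas appear only as images in the source; the standard definitions consistent with the surrounding text are:

```latex
P_i = \frac{T_i}{T_i + FP_i}, \qquad
R_i = \frac{T_i}{T_i + FN_i}, \qquad
\text{macro-}F1 = \frac{1}{C} \sum_{i=1}^{C} \frac{2\, P_i R_i}{P_i + R_i}
```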
4.2 parameter optimization
During model training, we use the BERT model to generate vector representations of the context and aspect words. Specifically, we use the standard BERT_BASE configuration of the BERT model to complete the training, in which the number of transformer blocks, the number of hidden neurons, and the number of self-attention heads are 12, 768, and 12, respectively. Furthermore, to analyze the optimal hyper-parameter settings, we provide several important hyper-parameter examples.
First, the drop rate (dropout) is the probability of dropping some neurons during neural network training, used to address overfitting and enhance the generalization ability of the model. We initialize dropout to 0.3 and then search for the best value at intervals of 0.1. As the experimental results in FIG. 3 show, when dropout is 0.5, the accuracy and F1 of the aspect-level emotion analysis method and model based on BERT and the aspect feature localization model are best on all three datasets.
Second, the learning rate determines whether and when the objective function converges to a local minimum. In our experiments, we use the Adam optimization algorithm to update the parameters of the model and explore the optimal learning rate in the range [10^-5, 0.1]. As shown in FIG. 4, the performance of the method and model is best when the learning rate is 2 × 10^-5.
Finally, the L2 regularization parameter is a hyper-parameter that can prevent the model from overfitting. As shown in FIG. 5, the performance of the method and model is best when the L2 regularization parameter is set to 0.01. Meanwhile, the weights of the model are initialized with the Glorot parameter initialization method, the batch size is set to 16, and training runs for 10 epochs in total.
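Collected in one place, the reported settings amount to the following configuration sketch (the dictionary layout and key names are illustrative, not from the patent):

```python
# Best hyper-parameters reported for ALM-BERT (key names are illustrative).
ALM_BERT_CONFIG = {
    "bert_variant": "BERT_BASE",  # 12 transformer blocks, 768 hidden units, 12 heads
    "dropout": 0.5,               # best value, searched from 0.3 in steps of 0.1
    "learning_rate": 2e-5,        # Adam optimizer, searched within [1e-5, 0.1]
    "l2_regularization": 0.01,
    "weight_init": "glorot",
    "batch_size": 16,
    "epochs": 10,
}
```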
4.3 comparison Algorithm
In order to verify the effectiveness of the aspect-level emotion analysis method and model based on BERT and the aspect feature localization model, they are compared with many popular aspect-level emotion analysis models, as follows:
TD-LSTM is a classical classification model, which integrates related information of the aspect words and their contexts into the LSTM-based classification model, improving the classification accuracy.
ATAE-LSTM is a classification model that inputs the embedded representation of the aspect words as an embedded representation of the sentence into the model, and then applies an attention mechanism to compute weights to achieve high-precision emotion classification.
MemNet is a data-driven classification model that uses multiple attention-based models to capture the importance of each context word to complete emotion classification.
IAN is an interactive attention network that models the aspect words and their contexts, respectively, and generates an associative representation of the target and context.
RAM builds a framework based on the multi-attention mechanism to capture distant features in the text, enhancing the representation ability of the model.
TNet generates hidden representations of context and aspect words using bi-directional LSTM. The CNN layer is used instead of the attention mechanism to extract important features from the hidden representation.
Cabasc utilizes two attention-enhancing mechanisms, focusing on the aspect words and the context separately, and comprehensively considering the context and the correlation between the aspect words.
AOA constructs a dual attention module that links emotion words to aspect words; the module automatically generates mutual attention weights from aspect to text and from text to aspect.
MGAN is a multi-granularity attention model that captures the interaction information between aspect words and context from coarse to fine.
AEN-BERT is a model based on attention mechanism and BERT, showing good performance in the aspect-level sentiment analysis task.
BERT-base is a pre-trained BERT based aspect-level sentiment analysis model with complete connectivity layers and softmax layers for classification tasks.
To measure the performance of the models more accurately, we extended the AOA, IAN, and MemNet models by replacing their embedding layers with the BERT model, yielding the AOA-BERT, IAN-BERT, and MemNet-BERT models. The rest of each model's structure is unchanged.
4.4 evaluation test analysis
Table 4 below shows the emotion classification results when the number of emotion polarity categories C is 3. We can easily observe from the table that the accuracy and macro-F1 of the BERT-based methods (aspect-level emotion analysis methods based on BERT pre-training) are significantly higher than those of the models based on GloVe and word2vec. In particular, on the Restaurant dataset, the accuracy and macro-F1 of the aspect-level emotion analysis method and model based on BERT and the aspect feature localization model are 12.77% and 30.97% higher, respectively, than those of the classical IAN model. This shows that BERT better expresses the semantic and grammatical features of the text, and the proposed method and model achieve the best classification performance on the three datasets. Specifically, on the Restaurant dataset, the accuracy and macro-F1 of the method are 4.2% and 8.81% higher, respectively, than those of the AEN method. In addition, on the Laptop dataset, the classification accuracy and macro-F1 of the method are 3.29% and 3.15% higher, respectively, than those of the BERT-base model, which shows that the aspect feature localization module of the invention plays a positive role in aspect-level emotion analysis.
TABLE 4 Experimental evaluation results for various comparative methods
From the perspective of capturing long-term dependency relationships in comment texts, a series of verification experiments are constructed on texts with different lengths.
As shown in FIGS. 6-8, the aspect-level emotion analysis method and model based on BERT and the aspect feature localization model generally achieve higher accuracy and macro-F1 than TD-LSTM, which means that the transformer encoder we build can model the implicit relationships within the context better than LSTM-based encoders. Furthermore, as shown in FIG. 7, we also note that the average accuracy and macro-F1 of the ALM-BERT model on sentences of different lengths are 3.1% and 6.56% higher, respectively, than those of AEN, because the proposed method and model make better use of the information of the aspect words than AEN and reduce the interference of information unrelated to the aspect words.
In conclusion, the experiments show that the BERT and aspect feature positioning model-based aspect-level emotion analysis method and model can obtain higher accuracy and macro F1, and further verify the feasibility and effectiveness of the BERT model and aspect information in aspect-level emotion analysis tasks.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.
Claims (10)
1. An aspect level sentiment analysis method based on BERT and an aspect feature localization model is characterized by comprising the following steps:
s1, obtaining high-quality context information representation and aspect information representation by using a BERT model so as to keep the integrity of text information;
S2, constructing an attention encoder based on a multi-head attention mechanism to learn the interaction between the aspect and context representations, integrating the relationship between the aspect words and the context, and further distinguishing the contributions of different sentences and aspect words to the classification result;
s3, constructing an aspect feature positioning model to capture aspect information during sentence modeling, and integrating complete information of aspects into interactive semantics so as to reduce the influence of interference words irrelevant to the aspect words and improve the integrity of the aspect word information;
and S4, fusing context related to the target and important target information, and predicting the probability of different emotion polarities by using the emotion prediction factor on the basis of the fused information.
2. The method according to claim 1, wherein "obtaining high-quality context information representation and aspect information representation using the BERT model" means generating high-quality text feature vector representations with a pre-trained BERT model serving as the text vectorization mechanism, where BERT is a pre-trained language representation model and the text vectorization mechanism maps each word into a high-dimensional vector space, specifically: the BERT model generates text representations with a deep bidirectional Transformer encoder; it divides a given word sequence into segments by adding special segmentation markers at the beginning and end of the input sequence respectively, generates token embeddings, segment embeddings, and position embeddings for the segments, and finally converts the comment text and the aspect words separately to obtain the context information representation and the aspect information representation.
3. The method according to claim 2, wherein "constructing an attention encoder based on a multi-head attention mechanism to learn the interaction between the aspect and context representations, and integrate the relationship between the aspect words and the context" means extracting the important features for aspect-level sentiment analysis, namely the important information of the context and the target, based on the multi-head attention mechanism, specifically: first, a Transformer encoder is introduced, which is a feature extractor based on a multi-head attention mechanism and a position feed-forward network that can learn different important information in different feature representation subspaces and directly capture long-term dependencies in a sequence; then, interactive semantics are extracted by the Transformer encoder from the aspect information representation and the context information representation generated by the BERT model, the contexts most important for the sentiment characterization of the aspect words are determined, the long-term dependency information and the context-aware information of the context are used as the input of the position feed-forward network to generate hidden states respectively, and the final interactive hidden state of the context interaction and the final interactive hidden state of the context and the aspect words are obtained after a mean pooling operation.
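For illustration only (not part of the claimed subject matter), the multi-head attention scoring described above might be sketched in PyTorch as follows; the scaled dot-product energy function f_e and all layer names are assumptions, since the claim does not fix their exact form:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Minimal multi-head attention in the spirit of this claim; the
    scaled dot-product energy function f_e is an assumption."""

    def __init__(self, d_model: int, n_head: int):
        super().__init__()
        assert d_model % n_head == 0
        self.n_head, self.d_k = n_head, d_model // n_head
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_d = nn.Linear(d_model, d_model)  # attention weight matrix W_d

    def forward(self, query: torch.Tensor, key: torch.Tensor) -> torch.Tensor:
        b = query.size(0)
        # Project into n_head parallel subspaces: (batch, n_head, len, d_k).
        q = self.w_q(query).view(b, -1, self.n_head, self.d_k).transpose(1, 2)
        k = self.w_k(key).view(b, -1, self.n_head, self.d_k).transpose(1, 2)
        v = self.w_v(key).view(b, -1, self.n_head, self.d_k).transpose(1, 2)
        # f_s(Q, K, V) = softmax(f_e(Q, K)) V, one attention score a_i per head.
        energy = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        a = F.softmax(energy, dim=-1) @ v
        # Concatenate the heads [a_1; ...; a_n-head] and apply W_d.
        a = a.transpose(1, 2).contiguous().view(b, -1, self.n_head * self.d_k)
        return self.w_d(a)
```

With such a helper, c_cc = mha(E_c, E_c) and t_ca = mha(E_c, E_a) would correspond to the two inputs described in claim 5 below.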
4. The method according to claim 3, wherein "dividing a given word sequence into different segments by adding special segmentation markers at the beginning and end of the input sequence respectively, generating token embedding, segment embedding and position embedding for the different segments, and finally converting the comment text and the aspect words respectively to obtain the context information representation and the aspect information representation" specifically comprises:
the BERT model adds special word segmentation marks [ CLS ] at the beginning and the end of an input sequence respectively]And [ SEP ]]Dividing a given word sequence into different segments, generating mark embedding, segment embedding and position embedding for different segments, enabling the embedded representation of the input sequence to contain all the information of the three embedding, and finally respectively converting the annotation text and the aspect words into 'CLS' in a BERT model]+ annotate text + [ SEP]"and" [ CLS]+ target + [ SEP]"get context representation EcAnd aspect represents Ea:
Ec={we[CLS],we1,we2,...,we[SEP]};
Ea={ae[CLS],ae1,ae2,...,ae[SEP]};
Wherein we[CLS],ae[CLS]Indicates a Classification marker [ CLS]Vector of (2), we[SEP]And ae[SEP]Representation delimiter [ SEP]The vector of (2).
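As an illustrative sketch only, this conversion step can be reproduced with the HuggingFace transformers library; the checkpoint name "bert-base-uncased" and the example sentence are assumptions, not taken from the patent:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

text = "The food was great but the service was slow"  # comment text
aspect = "service"                                     # target / aspect word

# The tokenizer inserts the [CLS] and [SEP] markers itself, producing
# "[CLS] + comment text + [SEP]" and "[CLS] + target + [SEP]"; token,
# segment and position embeddings are built inside the model.
ctx = tokenizer(text, return_tensors="pt")
asp = tokenizer(aspect, return_tensors="pt")

with torch.no_grad():
    E_c = bert(**ctx).last_hidden_state  # context representation E_c
    E_a = bert(**asp).last_hidden_state  # aspect representation E_a

print(E_c.shape, E_a.shape)  # e.g. (1, len_c, 768) and (1, len_a, 768)
```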
5. The method according to claim 4, wherein "extracting interactive semantics from the aspect information representation and the context information representation generated by the BERT model through the Transformer encoder, determining the contexts most important for the sentiment characterization of the aspect words, generating hidden states by using the long-term dependency information and context-aware information of the context as the input of the position feed-forward network, and obtaining the final interactive hidden state of the context interaction and the final interactive hidden state of the context and the aspect words after the mean pooling operation" specifically comprises:
S201, a query sequence Q and a series of key (K)-value (V) pairs for capturing different important information in parallel subspaces are mapped from the aspect information representation and the context information representation generated by the BERT model, through the multiple self-attention mechanisms that constitute the multi-head attention mechanism in the Transformer encoder;
S202, an attention score is calculated for each piece of captured important information through the attention score function f_s(Q, K, V) = σ(f_e(Q, K))·V, where σ(·) denotes the normalized exponential (softmax) function and f_e(Q, K) is an energy function for learning the correlation features between K and Q;
S203, the context representation and the aspect representation are input into the multi-head attention function f_mh(Q, K, V) = [a_1; a_2; ...; a_i; ...; a_n-head]·W_d to obtain, respectively, the long-term dependency information c_cc of the context and the context-aware information t_ca, so as to capture the long-term dependencies of the context and determine which contexts are most important for the sentiment characterization of the aspect words; where a_i denotes the attention score of the i-th piece of captured important information, [a_1; a_2; ...; a_i; ...; a_n-head] denotes the concatenated vector of all heads, W_d is the attention weight matrix, c_cc = f_mh(E_c, E_c), and t_ca = f_mh(E_c, E_a);
S204, the Transformer encoder takes c_cc and t_ca as the input of the position feed-forward network to generate the hidden states h_c and h_a; the position feed-forward network PFN is a variant of the multi-layer perceptron, and h_c and h_a are defined as follows:

h_c = PFN(c_cc);

h_a = PFN(t_ca);

PFN(h) = ζ(h·W_1 + b_1)·W_2 + b_2;

where ζ(h·W_1 + b_1) is a rectified linear unit, b_1 and b_2 are bias values, and W_1 and W_2 are learnable weight parameters;
S205, after a mean pooling operation on the hidden states h_c and h_a, the final interactive hidden state h_cm of the context interaction and the final interactive hidden state h_am of the context and the aspect words are obtained.
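A minimal sketch of steps S204-S205, assuming PyTorch; the hidden width 3072 and all names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PFN(nn.Module):
    """Position feed-forward network: PFN(h) = ReLU(h W1 + b1) W2 + b2."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_hidden)  # W_1, b_1
        self.fc2 = nn.Linear(d_hidden, d_model)  # W_2, b_2

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.fc2(torch.relu(self.fc1(h)))  # ζ is the rectified linear unit

# S204-S205 with the sketch above:
# pfn = PFN(d_model=768, d_hidden=3072)
# h_cm = pfn(c_cc).mean(dim=1)  # final interactive hidden state of the context
# h_am = pfn(t_ca).mean(dim=1)  # final interactive hidden state of context and aspect
```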
6. The method of claim 5, wherein the aspect feature localization model works as the following algorithm 1:
Algorithm 1: aspect feature localization algorithm
Specifically, the aspect feature localization algorithm extracts the most important information af related to the aspect word from the context representation E_c according to the position and length of the aspect word; max pooling is then applied to af to obtain its most important feature AF, a dropout operation is performed on AF, and the important feature h_af of the aspect word within the context representation E_c is obtained.
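A hedged sketch of Algorithm 1 as just described, assuming PyTorch tensors; the function name and dropout rate are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def aspect_feature_localization(E_c: torch.Tensor, start: int, length: int,
                                p_drop: float = 0.1,
                                training: bool = True) -> torch.Tensor:
    """Slice the aspect span out of the context representation according to
    its position and length, max-pool it, then apply dropout.

    E_c: (batch, seq_len, d_model) context representation."""
    af = E_c[:, start:start + length, :]               # aspect-related information af
    AF, _ = af.max(dim=1)                              # max pooling -> most important feature AF
    h_af = F.dropout(AF, p=p_drop, training=training)  # dropout on AF
    return h_af                                        # important aspect feature h_af
```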
7. The method according to claim 6, wherein "fusing the context related to the target with the important target information and predicting the probabilities of different sentiment polarities using the sentiment predictor on the basis of the fused information" specifically comprises:
S401, h_cm, h_am and h_af are concatenated by vector splicing to obtain the overall feature r:

r = [h_cm; h_am; h_af];

S402, data preprocessing is performed on r with a linear function, namely:

x = W_u·r + b_u, where W_u is a weight matrix and b_u is a bias value;

S403, the probability Pr(a = p) that the sentiment polarity of the aspect word a in the sentence is p is calculated with the softmax function:
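Steps S401-S403 might be sketched as follows in PyTorch (illustrative only; the class name and the number of polarity classes are assumptions):

```python
import torch
import torch.nn as nn

class SentimentPredictor(nn.Module):
    """Concatenate h_cm, h_am and h_af, apply x = W_u r + b_u,
    then softmax over the C candidate polarities."""

    def __init__(self, d_model: int, n_classes: int = 3):
        super().__init__()
        self.linear = nn.Linear(3 * d_model, n_classes)  # W_u, b_u

    def forward(self, h_cm, h_am, h_af):
        r = torch.cat([h_cm, h_am, h_af], dim=-1)  # r = [h_cm; h_am; h_af]
        x = self.linear(r)                         # x = W_u r + b_u
        return torch.softmax(x, dim=-1)            # Pr(a = p) for each polarity p
```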
8. The method of any of claims 1-7, further comprising: training with cross entropy and L2 regularization as the loss function, defined as:
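The formula itself is not reproduced here; the following sketch therefore assumes the standard form of cross entropy with an L2 penalty over the model parameters, and the coefficient lam is an assumption:

```python
import torch
import torch.nn.functional as F

def loss_fn(logits, labels, params, lam: float = 1e-5):
    """Cross entropy plus L2 regularization over the model parameters.
    Note: F.cross_entropy expects pre-softmax logits."""
    ce = F.cross_entropy(logits, labels)
    l2 = sum((p ** 2).sum() for p in params)
    return ce + lam * l2
```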
9. An aspect-level sentiment analysis model, comprising:
the text vectorization mechanism obtains high-quality context information representation and aspect information representation by using a BERT model so as to keep the integrity of text information;
the feature extraction model for aspect-level sentiment analysis, which learns the interaction between the aspect representation and the context representation, integrates the relationship between the aspect words and the context to distinguish the contributions of different sentences and aspect words to the classification result, captures aspect information during sentence modeling, and integrates the complete information of the aspects into the interactive semantics, so as to reduce the influence of interference words irrelevant to the aspect words and improve the integrity of the aspect word information;
and the sentiment predictor, which fuses the context related to the target with the important target information and predicts the probabilities of different sentiment polarities on the basis of the fused information.
10. The aspect level emotion analysis model of claim 9,
the BERT model is a pre-trained language representation model that generates text representations with a deep bidirectional Transformer encoder; it divides a given word sequence into segments by adding special segmentation markers at the beginning and end of the input sequence, generates token embeddings, segment embeddings, and position embeddings for the segments, and finally converts the comment text and the aspect words separately to obtain the context information representation and the aspect information representation;
the feature extraction model for aspect-level sentiment analysis comprises an important feature extraction model and an aspect feature localization model; the important feature extraction model is an attention encoder based on a multi-head attention mechanism, which learns the interaction between the aspect representation and the context representation and integrates the relationship between the aspect words and the context, so as to distinguish the contributions of different sentences and aspect words to the classification result; the aspect feature localization model captures aspect information during sentence modeling and integrates the complete information of the aspects into the interactive semantics, so as to reduce the influence of interference words irrelevant to the aspect words and improve the integrity of the aspect word information;
the sentiment predictor concatenates the final interactive hidden state of the context interaction, the final interactive hidden state of the context and the aspect words, and the important features of the aspect words by vector splicing to obtain the overall feature, then performs data preprocessing on the overall feature with a linear function, and finally calculates with the softmax function the probability that the sentiment polarity of each aspect word in the sentence is a given candidate polarity.
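For orientation only, the hypothetical helpers sketched after the method claims above could compose into a single forward pass as follows; every name here comes from those sketches, not from the patent itself:

```python
import torch

def predict_polarity(bert, tokenizer, mha, pfn, predictor,
                     text: str, aspect: str, start: int, length: int):
    # BERT vectorization of comment text and aspect words (claims 2 and 4).
    ctx = tokenizer(text, return_tensors="pt")
    asp = tokenizer(aspect, return_tensors="pt")
    E_c = bert(**ctx).last_hidden_state
    E_a = bert(**asp).last_hidden_state
    # Interactive semantics and hidden states (claim 5, S203-S205).
    c_cc, t_ca = mha(E_c, E_c), mha(E_c, E_a)
    h_cm, h_am = pfn(c_cc).mean(dim=1), pfn(t_ca).mean(dim=1)
    # Aspect feature localization (claim 6) and sentiment prediction (claim 7).
    h_af = aspect_feature_localization(E_c, start, length, training=False)
    return predictor(h_cm, h_am, h_af)
```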