CN111259133A - Personalized recommendation method integrating multiple information - Google Patents
- Publication number
- CN111259133A (application CN202010054209.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a multi-information fused personalized recommendation method which comprises the steps of obtaining similarity between a user and a project by adopting a word2vec algorithm and an FM algorithm, obtaining a predicted click probability between the user and the project by adopting a RippleNet algorithm, obtaining a predicted score by adopting a dynamic fusion algorithm, and providing a personalized recommendation list for the user based on the predicted score. According to the invention, the knowledge graph and the comment content are used as multi-source data, different algorithms are used for processing the data, and a dynamic fusion method is adopted for effective combination, so that more accurate personalized recommendation service is provided for users, a better recommendation effect can be realized, and the problem of reduced recommendation accuracy caused by sparse data can be effectively solved.
Description
Technical Field
The invention belongs to the technical field of recommendation systems, and particularly relates to a personalized recommendation method fusing multiple information.
Background
With the rapid development of technologies such as artificial intelligence, cloud computing, big data, and the mobile internet, the volume of information data has grown explosively. While enjoying the convenience this data brings, users must also cope with the "information overload" caused by its sheer volume. A recommendation system is one effective way to address information overload: it discovers a user's points of interest from the attributes of users and items, and recommends the items the user is likely to be interested in as a personalized list.
Currently, recommendation systems based on collaborative filtering have achieved some success by considering users' historical interactions with items and making recommendations based on the latent features they share. However, such systems typically suffer from the sparsity of historical user-item interaction data and the accompanying cold-start problem. To address these limitations, researchers have incorporated auxiliary information such as user/item attributes, social networks, images, and context into collaborative filtering based recommendation systems.
Among the various kinds of auxiliary information, the knowledge graph (KG) has attracted wide attention from researchers because it describes facts efficiently and provides interpretable associations between items. A knowledge graph is a directed heterogeneous graph in which nodes correspond to entities and edges correspond to relations. Researchers have proposed a number of knowledge graphs, such as NELL and DBpedia, as well as commercial knowledge graphs such as Google Knowledge Graph and Microsoft Satori. These knowledge graphs have been applied successfully in many areas, such as knowledge graph completion, question answering, word embedding, and text classification.
Deep learning is a research hotspot of the current internet and artificial intelligence. The deep learning mainly generates high-level semantic abstraction from low-level attribute features, automatically digs out distributed feature representation of data, solves the problem that features need to be designed manually in the traditional machine learning, and makes great progress in the fields of image recognition, machine translation and the like. The deep learning based recommendation system has recently attracted much attention, and uses data related to users and commodity items as input, obtains hidden representations of the users and the items with corresponding attribute characteristics through a deep learning model, and recommends the items for the users based on the hidden representations.
Knowledge graphs are widely used in various fields, and researchers have tried to use them to improve the performance of recommendation systems. Existing knowledge-graph-based recommendation systems fall into two categories:
(1) Embedding-based methods. Methods of this type preprocess the KG with a knowledge graph embedding (KGE) algorithm and incorporate the learned entity embeddings into the recommendation framework. Embedding-based methods use the KG to assist the recommendation system flexibly, but the KGE algorithms they adopt are better suited to link prediction than to recommendation.
(2) Path-based methods. Methods of this type explore the connection patterns between entities in the KG as additional auxiliary information for the recommendation system. Path-based methods use the KG in a more intuitive way, but they depend heavily on manually designed meta-paths, so generality cannot be guaranteed and different meta-paths must be designed for different application scenarios. Furthermore, in certain scenarios (e.g., news recommendation) the entities and relations do not fall within a single domain, so meta-paths cannot be designed by hand.
Earlier work applied graph embedding techniques to the recommendation field: the movies and user information in MovieLens were embedded into the same vector space, the distance between users and movies in that space was then computed, and a recommendation list was generated. Wang et al. embedded a medical knowledge graph, a disease-patient bipartite graph, and a disease-drug bipartite graph into low-dimensional vector spaces to recommend safer drug therapy for patients. Combining the knowledge graph with the bipartite graphs by weighted averaging yields patient and drug vectors that carry finer-grained attribute information, from which a top-k list of drugs is generated for a given patient.
Ostuni et al. fused the implicit semantic feedback information in KG paths and proposed SPrank, a path-based algorithm built on implicit semantic feedback. It mines the data set using path-based features to capture complex relationships between items. The main idea of SPrank is to explore paths in the semantic graph in order to find items related to those the user is interested in. It analyzes the paths to extract path-based features, and generates recommendations with a learning algorithm that combines random forests and gradient-boosted regression trees.
Disclosure of Invention
In order to more effectively fuse various data information, solve the problem of data sparseness and improve the accuracy of a recommendation system, the invention provides a personalized recommendation method fusing multiple information.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
A personalized recommendation method fusing multiple information comprises the following steps:
S1, acquiring a user-item comment data set, obtaining feature word vectors of the user and of the item respectively by adopting a word2vec algorithm, and obtaining the similarity between the user and the item by adopting an FM algorithm;
S2, constructing an interaction matrix of the user and the item according to the user's historical click information, and obtaining the predicted click probability between the user and the item by adopting a RippleNet algorithm in combination with a knowledge graph;
S3, dynamically fusing, by a dynamic fusion algorithm, the similarity between the user and the item obtained in step S1 and the predicted click probability between the user and the item obtained in step S2 to obtain a predicted score, and providing a personalized recommendation list for the user based on the predicted score.
Further, the step S1 specifically includes the following sub-steps:
S1-1, obtaining all user-item comment information in a database, synthesizing, by adopting a word2vec algorithm, the comments of a user on all items into text data representing the user information, and integrating the comments an item has received from all users into the text data of the item;
S1-2, vectorizing the text data of the user information and the text data of the item obtained in step S1-1 respectively by adopting a word2vec algorithm to obtain the feature word vectors of the user and the item;
S1-3, combining the feature word vectors of the user and the item obtained in step S1-2 pairwise by adopting an FM algorithm, and adding cross-term features to obtain the similarity between the user and the item.
Further, in step S1-3, the model of the FM algorithm is represented as:
wherein m_0 denotes the global bias term; m is the weight vector applied to the feature vector z_{uv} of user u and item v; M is the weight matrix of second-order interactions; M_{j,c} is the value in row j and column c of M; and i_j, i_c are the i-dimensional latent vectors associated with dimensions j and c of z_{uv}.
Further, the step S1 takes the square loss as the objective function of the parameter optimization, and is expressed as:
where O denotes the set of observed user-item rating pairs, y_{u,v} denotes the interaction history of user u with item v, Θ denotes all parameters, and λ_Θ denotes the L2 regularization parameter.
Further, the step S2 specifically includes the following sub-steps:
S2-1, setting the user set and the item set to U = {u_1, u_2, ..., u_m} and V = {v_1, v_2, ..., v_n}, respectively, and constructing an interaction matrix of the user and the item, which is represented as:
Y = {y_{uv} | u ∈ U, v ∈ V}
wherein y_{uv} denotes the interaction history of user u and item v, m denotes the number of users, and n denotes the number of items;
S2-2, according to the interaction matrix of the user and the item and the knowledge graph containing relation-entity triples, defining the set of k-hop relevant entities of user u as:
wherein (h, r, t) denotes a relation-entity triple contained in the knowledge graph, h denotes the head entity, r denotes the relation, t denotes the tail entity, and H denotes the maximum hop distance from the seed items;
defining the k-hop ripple set of user u on the knowledge graph G as:
S2-3, creating a d-dimensional embedding vector v for each item v, and computing the relevance coefficient between v and each triple (h_i, r_i, t_i) of the 1-hop ripple set of user u as:
wherein R_i denotes the embedding of the relation r_i, and h_i denotes the embedding of the head entity h_i;
S2-4, according to the relevance coefficients, computing the weighted sum of the tail entities t_i of the 1-hop ripple set of user u, and obtaining in the same way the multi-order responses of user u with respect to item v:
according to the multi-order responses of user u with respect to item v, the embedding vector of user u with respect to item v is defined as:
wherein α_i is a positive mixing parameter;
S2-5, obtaining the predicted click probability between the user and the item according to the embedding vector of user u with respect to item v, expressed as:
wherein z_{KG} denotes the recommendation result based on knowledge graph data.
Further, the loss function of the RippleNet algorithm in step S2 is expressed as:
Γ = ∑_{(u,v)∈Y} −( y_{uv} log σ(u^T v) + (1 − y_{uv}) log(1 − σ(u^T v)) ).
Further, in step S3, the similarity between the user and the item and the predicted click probability between the user and the item are dynamically fused by a dynamic fusion algorithm to obtain a predicted score, which is expressed as:
The invention has the following beneficial effects: according to the invention, the knowledge graph and the comment content are used as multi-source data, different algorithms are used for processing the data, and a dynamic fusion method is adopted for effective combination, so that more accurate personalized recommendation service is provided for users, a better recommendation effect can be realized, and the problem of reduced recommendation accuracy caused by sparse data can be effectively solved.
Drawings
FIG. 1 is a flow chart of a personalized recommendation method fusing multiple information according to the present invention;
FIG. 2 is a schematic view of a ripple set in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the REME model structure in the embodiment of the present invention;
FIG. 4 is a graph showing a comparison of recall ratios of different models of a data set AZ according to an embodiment of the present invention;
FIG. 5 is a graph illustrating the recall ratio comparison between different models of the data set SC according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, an embodiment of the present invention provides a personalized recommendation method fusing multiple pieces of information, including the following steps S1 to S3:
S1, acquiring a user-item comment data set, obtaining feature word vectors of the user and of the item respectively by adopting a word2vec algorithm, and obtaining the similarity between the user and the item by adopting an FM algorithm;
In this embodiment, most current social media websites and e-commerce systems allow users to post text comments. These texts contain rich information from which a user's potential points of interest can be discovered, so the method applies user comment text to the recommendation system to improve its accuracy.
The invention applies the deep-learning-based word2vec model to the recommendation system. word2vec is a word embedding model based on Skip-gram or CBOW (Continuous Bag-of-Words). Without part-of-speech tagging, word2vec can learn vector representations of words from a raw corpus, which makes it possible to compare the semantic and syntactic similarity of words.
The method processes the text with word2vec: the comments a user has made on all merchants are merged into text data representing the user, and likewise the comments a merchant has received from all users are merged into text data representing the merchant. The latent text features of users and items are then extracted and matched, and recommendations are made accordingly.
The step S1 specifically includes the following sub-steps:
S1-1, obtaining all user-item comment information in a database, synthesizing, by adopting a word2vec algorithm, the comments of a user on all items into text data representing the user information, and integrating the comments an item has received from all users into the text data of the item;
S1-2, vectorizing the text data of the user information and the text data of the item obtained in step S1-1 respectively by adopting a word2vec algorithm to obtain the feature word vectors of the user and the item;
word2vec can be regarded as a neural network that trains each word of a natural language into a word vector through a three-layer neural network. It thereby avoids two problems of the traditional bag-of-words (BOW) model, namely its inability to represent the contextual semantics of text and the curse of dimensionality, and gives semantically similar words similar vector representations.
The word2vec algorithm here adopts the CBOW prediction model trained with hierarchical softmax (HS). CBOW predicts the posterior probability of a center word from its known context words, and its model structure is as follows:
1) Input layer: the word vectors of the 2c words in Context(w).
2) Projection layer: sums the 2c word vectors of the input layer.
3) Output layer: outputs the center word w.
The training objective for CBOW is:
max Φ = ∑_{w∈C} log p(w | Context(w))
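As a minimal sketch of the CBOW structure above, the following Python sums the context vectors in the projection layer and scores the center word. It assumes a full softmax output instead of the hierarchical softmax named in the text, purely to keep the example short; the function and argument names are hypothetical.

```python
import math

def cbow_log_prob(context_vecs, output_vecs, target_index):
    """Return log p(w | Context(w)) for the word at target_index.

    Assumption: a full softmax over the vocabulary stands in for
    hierarchical softmax, which the text uses for training.
    """
    dim = len(context_vecs[0])
    # Projection layer: component-wise sum of the 2c context word vectors.
    projected = [sum(vec[d] for vec in context_vecs) for d in range(dim)]
    # Output layer: one score per vocabulary word.
    scores = [sum(p * o for p, o in zip(projected, out)) for out in output_vecs]
    # Numerically stable log-softmax.
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return scores[target_index] - log_norm
```

Summing `cbow_log_prob` over all (word, context) pairs of a corpus gives the objective Φ that training maximizes.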
S1-3, combining the feature word vectors of the user and the item obtained in step S1-2 pairwise by adopting an FM algorithm, and adding cross-term features to obtain the similarity between the user and the item.
The invention first sets the input of the FM algorithm by constructing the feature vectors of the user and the item with word2vec:
t_u = word2vec(T_u)
t_v = word2vec(T_v)
wherein T_u and T_v denote the comments of user u and item v, respectively, and t_u and t_v are the corresponding user and item feature vectors.
The feature word vectors of the user and the item are combined element by element, expressed as:
z_{uv} = t_u ⊙ t_v
where ⊙ denotes the element-wise product of two vectors, and z_{uv} is the vector of correlation coefficients between u and v.
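Because z_uv is described as a vector, the sketch below reads ⊙ as an element-wise product; that reading, and the function name, are assumptions.

```python
def elementwise_product(t_u, t_v):
    """Combine a user vector and an item vector component by component."""
    if len(t_u) != len(t_v):
        raise ValueError("user and item vectors must have the same dimension")
    return [a * b for a, b in zip(t_u, t_v)]
```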
The invention adopts the FM algorithm to combine the feature word vectors of the user and the item pairwise and adds cross-term features, which improves the accuracy of the model.
The model of the FM algorithm is represented as:
wherein m_0 denotes the global bias term; m is the weight vector applied to the feature vector z_{uv} of user u and item v; M is the weight matrix of second-order interactions; M_{j,c} is the value in row j and column c of M; and i_j, i_c are the i-dimensional latent vectors associated with dimensions j and c of z_{uv}.
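The symbols defined above match the standard second-order factorization machine. The sketch below assumes that standard form: a bias, a linear term, and pairwise terms over j < c with M_{j,c} taken as the inner product of the latent vectors i_j and i_c. The exact summation bounds of the patent's formula are not recoverable from this text, so they are an assumption here, as are the function and argument names.

```python
def fm_score(z, m0, m, latent):
    """Second-order factorization machine score for one feature vector z.

    z      : feature vector z_uv
    m0     : global bias
    m      : linear weights, same length as z
    latent : one latent vector per dimension of z (i_j in the text)
    Assumption: M[j][c] = <latent[j], latent[c]> and pairs are taken with j < c.
    """
    linear = sum(w * x for w, x in zip(m, z))
    pairwise = 0.0
    for j in range(len(z)):
        for c in range(j + 1, len(z)):
            m_jc = sum(a * b for a, b in zip(latent[j], latent[c]))
            pairwise += m_jc * z[j] * z[c]
    return m0 + linear + pairwise
```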
Finally, the squared loss is taken as the objective function for parameter optimization, expressed as:
wherein O denotes the set of observed user-item rating pairs, y_{u,v} denotes the interaction history of user u with item v, Θ denotes all parameters, and λ_Θ denotes the L2 regularization parameter; the second term, λ_Θ‖Θ‖², prevents the model from overfitting.
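A minimal sketch of such an objective, assuming it is the sum of squared errors over the observed pairs in O plus the penalty λ_Θ‖Θ‖² described above; whether the patent's formula averages or sums is not stated, and the names are hypothetical.

```python
def squared_loss(predictions, targets, params, lam):
    """Squared error over observed pairs plus an L2 penalty on the parameters."""
    data_term = sum((p - y) ** 2 for p, y in zip(predictions, targets))
    reg_term = lam * sum(w * w for w in params)
    return data_term + reg_term
```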
S2, constructing an interaction matrix of the user and the item according to the user's historical click information, and obtaining the predicted click probability between the user and the item by adopting a RippleNet algorithm in combination with a knowledge graph;
In this embodiment, the existing RippleNet algorithm uses only the user's historical click records and a knowledge graph built from structured knowledge, and ignores the user and item comment data, which is rich in information. The invention therefore extracts the latent features of users and merchants with word2vec, processes them with the factorization machine (FM) algorithm, and computes the user's click probability value; the value obtained by RippleNet and the value obtained by word2vec + FM are then combined through a dynamic parameter to give the final click-through rate prediction.
The step S2 specifically includes the following sub-steps:
S2-1, setting the user set and the item set to U = {u_1, u_2, ..., u_m} and V = {v_1, v_2, ..., v_n}, respectively, and constructing an interaction matrix of the user and the item, which is represented as:
Y = {y_{uv} | u ∈ U, v ∈ V}
wherein y_{uv} denotes the interaction history of user u and item v, m denotes the number of users, and n denotes the number of items; y_{uv} = 1 indicates that there is a historical interaction between user u and item v, that is, user u has clicked on and viewed item v.
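As an illustration, the interaction matrix can be built from a list of click records; the function name and the list-of-lists layout are assumptions made for this sketch.

```python
def build_interaction_matrix(users, items, clicks):
    """Return an m x n 0/1 matrix with a 1 where user u has clicked item v."""
    clicked = set(clicks)
    return [[1 if (u, v) in clicked else 0 for v in items] for u in users]
```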
S2-2, according to the interaction matrix of the user and the item and the knowledge graph containing relation-entity triples, defining the set of k-hop relevant entities of user u as:
wherein (h, r, t) denotes a relation-entity triple contained in the knowledge graph, h denotes the head entity, r denotes the relation, and t denotes the tail entity; h ∈ E, r ∈ R, and t ∈ E, where E and R denote the entity set and the relation set of the knowledge graph G, respectively; and H denotes the maximum hop distance from the seed items, set experimentally.
The objective of the RippleNet algorithm is to obtain the click prediction scores of a user u and an undetermined item v under the condition of the existing interaction matrix Y and knowledge graph G. Namely, the user u and the item v are used as input, and the probability that the user u can click the item v is output.
Defining the k-hop ripple set of user u on the knowledge graph G as:
wherein ε_u^0 = {v | y_{uv} = 1} is the set of items that user u has clicked, i.e., the seed set of user u in G; the superscript 0 marks the seed nodes.
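The two definitions can be sketched together, assuming the usual RippleNet construction: the k-hop ripple set contains the triples whose head lies in the (k−1)-hop relevant entity set, and the tails of those triples form the k-hop relevant entity set. The function name and the tuple representation of triples are hypothetical.

```python
def ripple_sets(seed_items, triples, max_hops):
    """Return the ripple sets for hops 1..max_hops as lists of (h, r, t) triples."""
    relevant = set(seed_items)  # hop-0 relevant entities: the user's seed set
    hops = []
    for _ in range(max_hops):
        # k-hop ripple set: triples whose head is a (k-1)-hop relevant entity.
        hop = [(h, r, t) for (h, r, t) in triples if h in relevant]
        hops.append(hop)
        # k-hop relevant entities: the tails reached in this hop.
        relevant = {t for (h, r, t) in hop}
    return hops
```

With max_hops = 2 this matches the setting H = 2 used in the experiments.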
The term "ripple" has two meanings:
1) Each historical click of the user is regarded as a drop of water that raises ripples on the surface of the knowledge graph; the propagation of these ripples represents the paths along which the user's potential interest spreads.
2) The user's potential interest weakens as k increases; that is, the farther a ripple propagates, the less similar the entities it reaches are to the initial items.
The ripple sets are shown in FIG. 2: triangles represent the seed set that the user initially clicked on, squares represent the first-hop ripple set (Hop1) directly connected to the seed set, filled circles represent the second-hop ripple set (Hop2), and so on.
S2-3, creating a d-dimensional embedding vector v for each item v; depending on the application, the item embedding can incorporate feature information such as a one-hot ID, attributes, or a bag of words. Given the item embedding v and the 1-hop ripple set S_u^1 of user u, the relevance coefficient between v and each triple (h_i, r_i, t_i) of S_u^1 is:
wherein R_i denotes the embedding of the relation r_i and is a d × d matrix; h_i denotes the embedding of the head entity h_i and is a d-dimensional vector; and the relevance coefficient p_i measures the similarity between item v and head entity h_i in the space of relation R_i.
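A minimal sketch of the relevance coefficients, assuming the softmax-normalized form p_i = softmax(v^T R_i h_i) used in the published RippleNet formulation; the patent's own formula image is not available in this text, so that form is an assumption, and the names are hypothetical.

```python
import math

def relevance_probs(v, hop_triples):
    """Relevance of item embedding v to each triple of a ripple set.

    hop_triples: list of (h_vec, R_matrix, t_vec); h_vec and t_vec have
    d entries and R_matrix is d x d. Assumes softmax normalization.
    """
    scores = []
    for h, R, _t in hop_triples:
        # R h: map the head entity into the relation space.
        Rh = [sum(R[a][b] * h[b] for b in range(len(h))) for a in range(len(v))]
        # v^T (R h): similarity of the item and the head under this relation.
        scores.append(sum(v[a] * Rh[a] for a in range(len(v))))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```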
S2-4, after the relevance coefficients p_i are obtained, the weighted sum of the tail entities t_i of S_u^1 is computed to obtain the vector O_u^1:
The vector O_u^1 is the first-order response of user u's click history with respect to item v. This amounts to representing user u through the entities related to the user's history rather than through an independent feature vector. The second-order and higher-order responses of user u with respect to v are obtained in the same way.
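The weighted sum can be sketched as follows, assuming O_u^1 = ∑ p_i t_i over the triples of the 1-hop ripple set; the function name is hypothetical.

```python
def hop_response(probs, tail_vecs):
    """Weighted sum of tail-entity embeddings with relevance weights probs."""
    dim = len(tail_vecs[0])
    return [sum(p * t[d] for p, t in zip(probs, tail_vecs)) for d in range(dim)]
```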
According to the multi-order responses of user u with respect to item v, the embedding vector of user u with respect to item v is defined as:
wherein the α_i are positive trainable mixing parameters, with α_i > 0 and ∑ α_i = 1;
S2-5, obtaining the predicted click probability between the user and the item according to the embedding vector of user u with respect to item v, expressed as:
wherein z_{KG} denotes the recommendation result based on knowledge graph data.
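A sketch of these two steps, assuming the user embedding is the α-weighted sum of the per-hop responses and that z_KG = σ(u^T v), which agrees with the σ(u^T v) term of the loss function; the exact formulas are not available in this text, and the names are hypothetical.

```python
import math

def click_probability(responses, alphas, v):
    """Predicted click probability from per-hop responses and mixing weights.

    responses: list of H response vectors (one per hop)
    alphas   : H positive weights assumed to sum to 1
    v        : item embedding
    """
    dim = len(v)
    # User embedding: mix the hop responses with the weights alphas.
    u = [sum(a * o[d] for a, o in zip(alphas, responses)) for d in range(dim)]
    # Sigmoid of the inner product u^T v.
    logit = sum(x * y for x, y in zip(u, v))
    return 1.0 / (1.0 + math.exp(-logit))
```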
The loss function of the RippleNet algorithm is derived from the above equation as:
Γ = ∑_{(u,v)∈Y} −( y_{uv} log σ(u^T v) + (1 − y_{uv}) log(1 − σ(u^T v)) ).
wherein y_{uv} = 1 indicates that there is a historical interaction between user u and item v, that is, user u has clicked on and viewed item v. The loss function is used to train and adjust the parameters.
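The loss is the binary cross-entropy between the labels y_uv and the predictions σ(u^T v). A minimal sketch, with hypothetical names and the logit u^T v passed in directly:

```python
import math

def ripplenet_loss(samples):
    """Cross-entropy summed over (label, logit) pairs, where logit = u^T v."""
    total = 0.0
    for y, logit in samples:
        p = 1.0 / (1.0 + math.exp(-logit))
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total
```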
S3, dynamically fusing, by a dynamic fusion algorithm, the similarity between the user and the item obtained in step S1 and the predicted click probability between the user and the item obtained in step S2 to obtain a predicted score, and providing a personalized recommendation list for the user based on the predicted score.
So that the two kinds of latent features complement each other and yield a better prediction, a linear interpolation parameter α is introduced and a dynamic fusion recommendation model, REME (a model combining RippleNet and word2vec), is proposed, as shown in FIG. 3. The similarity between the user and the item obtained in step S1 and the predicted click probability between the user and the item obtained in step S2 are dynamically fused to obtain the predicted score, expressed as:
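A sketch of linear interpolation between the two scores follows. Which of the two terms α weights is not recoverable from this text, so the form α·z_FM + (1 − α)·z_KG is an assumption, as are the function names; α is a fixed number here, whereas the text describes it as a dynamic parameter.

```python
def fuse_scores(z_fm, z_kg, alpha):
    """Interpolate the word2vec + FM score and the RippleNet score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * z_fm + (1.0 - alpha) * z_kg

def top_k(scores, k):
    """Return the k items with the highest fused score, best first."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [item for item, _score in ranked[:k]]
```

Calling `top_k` on the fused scores of all candidate items for one user gives that user's recommendation list.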
The invention optimizes the parameters of the above formula by stochastic gradient descent and back propagation. The specific process is as follows:
first, collecting the ripple sets of each user and the set of all comments of the user, and converting the comment set into the corresponding user feature vector with the word2vec algorithm;
within a preset number of iterations T, updating the parameters {α_i, i = 1, 2, ..., H} with stochastic gradient descent and back propagation;
computing the feature vector of each item in the same way as the user feature vector;
after all user and item feature vectors have been computed, traversing the user-item pairs of the test set and computing the user-item correlation coefficient vector z_{uv};
updating the parameters Θ of the FM algorithm with stochastic gradient descent and back propagation;
finally, outputting the parameters {α_i, i = 1, 2, ..., H} and Θ.
To show that the REME algorithm retains good time performance while improving accuracy, the invention analyzes its time complexity.
First, creating the user feature vectors: computing the user ripple sets has time complexity O(a × m), where a is the number of users and is a constant; the word2vec algorithm has time complexity O(log(n)). Combining these steps, creating the user feature vectors has time complexity O(a(m + log(n))), and since n is much larger than m, this is approximately O(log(n)). Creating the item feature vectors is analogous and also has time complexity O(log(n)). Computing the cross vector of the user and item features has time complexity O(log^2(n)), so overall the time complexity of the REME algorithm is O(log^2(n)).
The invention compares its performance with that of different algorithms on specific examples.
The public Yelp dataset was used in the experiments to analyze recommendation performance. The invention extracts the restaurant data of two regions in the Yelp dataset, Arizona (AZ) and South Carolina (SC), comprising the users' comment data and the merchants' attribute data. The comment data mainly contains the users' comments, ratings, and similar information; each user comment was treated as one check-in in the experiment. The merchant attribute data mainly contains each merchant's ID, name, location (region, city, longitude and latitude, etc.), restaurant category, and tags. The experiment used Microsoft Satori to build a knowledge graph for the Yelp merchants.
The statistical information of the data sets of the two screened areas is shown in table 1.
Table 1. Statistical information of the data sets
From Table 1, it can be seen that the number of users in AZ is about twice that in SC, while the number of merchants is about five times that in SC. This leads to differences in data sparsity and, in turn, to some differences in the final experimental results.
In RippleNet, the number of ripple hops H is set to 2; the experimental results show that a larger number of hops does not improve performance but instead adds computational overhead.
The parameters of the complete experiment are set as follows: the embedding dimension d of the merchants and the knowledge graph is 16, the learning rate η is 0.02, and the regularization parameters are $\lambda_1 = 10^{-7}$ and $\lambda_2 = 0.01$, with H = 2.
For word2vec, the dimension of the resulting embedding vector is also set to d. The hyperparameters were determined by evaluating AUC on the validation data.
To obtain reliable experimental results, each data set was split into training, validation, and test sets at a ratio of 6:2:2. Each experiment was repeated 5 times, and the average was taken as the final result.
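The split and averaging protocol described above can be sketched as follows. This is a minimal illustration, not the patent's own code: the function names `split_622` and `mean_over_runs` are hypothetical, and reshuffling with a different seed on each of the 5 runs is an assumption, since the text does not say how the repetitions differ.

```python
import random


def split_622(records, seed=0):
    """Shuffle records and split them 6:2:2 into train/validation/test lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 6 // 10          # integer arithmetic avoids float rounding
    n_valid = n * 2 // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]   # remainder goes to the test set
    return train, valid, test


def mean_over_runs(run_experiment, records, runs=5):
    """Average a scalar metric over several runs (assumed: one seed per run)."""
    scores = []
    for seed in range(runs):
        train, valid, test = split_622(records, seed=seed)
        scores.append(run_experiment(train, valid, test))
    return sum(scores) / len(scores)
```

Here `run_experiment` stands for any callable that trains on the first two splits and returns a metric such as AUC on the test split.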
The invention adopts the following two evaluation indexes to evaluate the performance of the algorithm:
1) For click-through rate (CTR) prediction, accuracy (ACC) and AUC are used to evaluate the performance of CTR prediction.
2) For the top-k recommendation, recall@k is used as the evaluation index, defined as:
$$\text{recall@}k = \frac{hit}{recall}$$
wherein recall@k represents the recall rate of the top-k recommendation list, i.e., the probability that a user's clicks fall within the recommendation list; hit represents the number of times that users in the test set click on a restaurant in their recommendation list, and recall represents the total number of check-ins in the test set.
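The recall@k index can be computed directly from its verbal definition (hits in the top-k lists divided by the total number of test check-ins). The sketch below is illustrative only; the function name and the dictionary-based input layout are assumptions, not part of the patent.

```python
def recall_at_k(recommended, test_checkins, k):
    """Compute recall@k.

    recommended:   dict mapping user -> ranked list of item ids
    test_checkins: dict mapping user -> set of item ids visited in the test set
    """
    hit = 0      # test check-ins that appear in the user's top-k list
    total = 0    # all test check-ins
    for user, visited in test_checkins.items():
        top_k = recommended.get(user, [])[:k]
        hit += len(visited.intersection(top_k))
        total += len(visited)
    return hit / total if total else 0.0
```

With per-user sets of visited restaurants, each restaurant is counted once per user; if repeated check-ins should count separately, the sets would need to be replaced by multisets.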
In the present invention, the following three classical recommendation algorithms are mainly compared:
1) CKE: CKE combines collaborative filtering with structural knowledge, text knowledge, and image knowledge in a unified framework for recommendation.
2) DKN: DKN treats entity embeddings and word embeddings as multiple channels and combines them in a CNN for CTR prediction. In the experiments, the merchant tags were used as the text input for DKN.
3) PMF: PMF mainly uses the check-in information of users, decomposes the "user-interest point" check-in matrix into a user latent factor matrix and an interest point latent factor matrix, predicts the user's score for an interest point from these latent factor matrices, and then generates a recommendation list for the user.
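The matrix decomposition behind the PMF baseline can be illustrated with L2-regularized matrix factorization trained by stochastic gradient descent, which corresponds to MAP estimation in PMF. This is a hedged sketch: the function names, hyperparameter values, and the toy data layout are assumptions for illustration and are not taken from the experiments.

```python
import random


def train_mf(observed, n_users, n_items, dim=4, lr=0.05, reg=0.01,
             epochs=300, seed=0):
    """Factorize observed (user, item, value) triples into latent factors P, Q."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, v, y in observed:
            err = y - sum(a * b for a, b in zip(P[u], Q[v]))
            for f in range(dim):
                pu, qv = P[u][f], Q[v][f]
                # Gradient step on squared error plus L2 penalty.
                P[u][f] += lr * (err * qv - reg * pu)
                Q[v][f] += lr * (err * pu - reg * qv)
    return P, Q


def predict(P, Q, u, v):
    """Predicted score of user u for item v: inner product of latent factors."""
    return sum(a * b for a, b in zip(P[u], Q[v]))
```

A recommendation list for a user would then be obtained by sorting the unvisited items by `predict` in descending order.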
The results of top-k recommendations and CTR predictions for different algorithms are shown in FIGS. 4 and 5, and in Table 2.
Table 2. AUC and accuracy results in click-through rate prediction
The experimental results show that:
(1) Results on SC were consistently better than those on AZ, because the data sparsity differs between the two regions and the average traffic per merchant in AZ is lower than that in SC.
(2) CKE uses only structural knowledge here and is therefore less effective than RippleNet. RippleNet performs better than the other baseline models, but it considers only the knowledge graph and does not make effective use of data such as comment text, so its recommendation effect is not as good as that of REME. DKN uses only the tag information here and does not take other useful information into account.
(3) The recommendation effect of PMF is the worst on both data sets, because the users' check-in data is sparse and the PMF algorithm does not fuse any other content data.
(4) On both data sets, REME achieved the best recommendation results: compared with the baselines, its AUC was 7.8%-19.3% higher on AZ and 4.9%-20% higher on SC, and it also achieved the best results in the recall@k test.
Compared with the conventional typical model, the REME model provided by the invention obviously improves the recommendation effect under the condition of effectively fusing various data, and can obtain good recommendation effect under the condition of sparse data, so that the REME model can effectively solve the negative influence of sparse data on the recommendation result.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.
Claims (7)
1. A personalized recommendation method fusing multiple information is characterized by comprising the following steps:
S1, acquiring a user-item comment data set, obtaining feature word vectors of the user and the item respectively by adopting a word2vec algorithm, and obtaining the similarity between the user and the item by adopting an FM algorithm;
S2, constructing an interaction matrix of users and items according to the user's historical item click information, and obtaining the predicted click probability of the user for the item by adopting the RippleNet algorithm in combination with a knowledge graph;
and S3, dynamically fusing the similarity between the user and the item obtained in step S1 and the predicted click probability of the user for the item obtained in step S2 by using a dynamic fusion algorithm to obtain a predicted score, and providing a personalized recommendation list for the user based on the predicted score.
2. The method for personalized recommendation fusing multiple information according to claim 1, wherein the step S1 specifically comprises the following sub-steps:
S1-1, obtaining all user-item comment information in a database, synthesizing the comments of a user on all items into text data representing the user information by adopting a word2vec algorithm, and integrating the comments received by an item from all users into the text data of the item;
S1-2, respectively vectorizing the text data of the user information and the text data of the item obtained in step S1-1 by adopting a word2vec algorithm to obtain feature word vectors of the user and the item;
and S1-3, combining the feature word vectors of the users and the items obtained in step S1-2 pairwise by adopting an FM algorithm, and adding cross-term features to obtain the similarity between the users and the items.
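Sub-steps S1-1 and S1-2 can be sketched as follows. The sketch is illustrative and rests on assumptions: `word_vectors` is a hypothetical dictionary of pre-trained word2vec vectors, and averaging word vectors to obtain a document-level vector is one common choice that the claim does not specify.

```python
from collections import defaultdict


def build_documents(reviews):
    """Merge reviews into one text per user and one text per item.

    reviews: iterable of (user_id, item_id, text) tuples.
    """
    user_docs = defaultdict(list)
    item_docs = defaultdict(list)
    for user_id, item_id, text in reviews:
        user_docs[user_id].append(text)
        item_docs[item_id].append(text)
    return ({u: " ".join(t) for u, t in user_docs.items()},
            {v: " ".join(t) for v, t in item_docs.items()})


def average_vector(document, word_vectors, dim):
    """Average the vectors of known words (assumed pooling; zeros if none known)."""
    total = [0.0] * dim
    count = 0
    for word in document.lower().split():
        vec = word_vectors.get(word)
        if vec is not None:
            total = [a + b for a, b in zip(total, vec)]
            count += 1
    return [x / count for x in total] if count else total
```

The user vector and item vector produced this way would then be concatenated into the joint feature vector that the FM step consumes.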
3. The method for personalized recommendation fusing multiple information according to claim 2, wherein in step S1-3, the model of FM algorithm is represented as:
wherein $m_0$ represents the global bias term, $m$ is the weight vector for the feature vector $z_{uv}$ of user u and item v, $M$ is the weight matrix of second-order interactions, $M_{j,c}$ is the value in row j and column c of $M$, and $i_j$, $i_c$ are the i-dimensional hidden vectors associated with the j-th and c-th dimensions of $z_{uv}$.
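For orientation, a second-order factorization machine in its standard form can be sketched as below. It assumes, as in the standard FM formulation, that each second-order weight is the inner product of two hidden vectors, $M_{j,c} = \langle i_j, i_c \rangle$, and that the score is $m_0 + m^{T} z_{uv} + \sum_{j<c} M_{j,c}\, z_{uv,j}\, z_{uv,c}$. The function name and argument layout are hypothetical, and the patent's exact expression may differ.

```python
def fm_score(z, m0, m, hidden):
    """Second-order factorization machine score for a feature vector z.

    z:      feature vector of the user-item pair
    m0:     global bias
    m:      first-order weights, same length as z
    hidden: one hidden vector per feature; M[j][c] = <hidden[j], hidden[c]>
    """
    linear = sum(mj * zj for mj, zj in zip(m, z))
    pairwise = 0.0
    n = len(z)
    for j in range(n):
        for c in range(j + 1, n):
            m_jc = sum(a * b for a, b in zip(hidden[j], hidden[c]))
            pairwise += m_jc * z[j] * z[c]
    return m0 + linear + pairwise
```

The double loop is written for clarity; production implementations usually use the equivalent linear-time reformulation of the pairwise term.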
4. The method for personalized recommendation fusing multiple informations according to claim 3, wherein the step S1 adopts a square loss as an objective function of parameter optimization, and is expressed as:
where $O$ represents the set of observed user-item score pairs, $y_{u,v}$ represents the interaction history of user u with item v, $\Theta$ represents all parameters, and $\lambda_\Theta$ represents the L2 regularization parameter.
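A squared-loss objective with L2 regularization, in the form commonly used with FM models, can be sketched as follows. The exact weighting of the two terms in the patent's formula is not recoverable from the text, so the form $\sum_{(u,v)\in O} (\hat{y}_{u,v} - y_{u,v})^2 + \lambda_\Theta \lVert\Theta\rVert^2$ used here is an assumption.

```python
def squared_loss(predictions, targets, params, lam):
    """Sum of squared errors over observed pairs plus an L2 penalty.

    predictions, targets: aligned sequences over the observed set O
    params:               flat sequence of all model parameters (Theta)
    lam:                  L2 regularization strength (lambda_Theta)
    """
    data_term = sum((p - y) ** 2 for p, y in zip(predictions, targets))
    reg_term = lam * sum(w * w for w in params)
    return data_term + reg_term
```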
5. The method for personalized recommendation fusing multiple information according to claim 4, wherein the step S2 specifically comprises the following sub-steps:
S2-1, setting the user set and the item set to $U = \{u_1, u_2, \ldots, u_m\}$ and $V = \{v_1, v_2, \ldots, v_n\}$, respectively, and constructing the interaction matrix of users and items, represented as:
$Y = \{y_{uv} \mid u \in U, v \in V\}$
wherein $y_{uv}$ represents the interaction history of user u and item v, m represents the number of users, and n represents the number of items;
S2-2, according to the interaction matrix of users and items and the knowledge graph containing relation-entity triples, defining the k-hop set of associated entities of user u as:
wherein (h, r, t) represents a relation-entity triple contained in the knowledge graph, h represents the head entity, r represents the relation, t represents the tail entity, and H represents the farthest hop associated with the origin items;
defining the k-hop ripple set of user u on the knowledge graph G as follows:
S2-3, creating a d-dimensional embedding vector $v$ for each item v; for each triple $(h_i, r_i, t_i)$ in the 1-hop ripple set of user u, the correlation coefficient with $v$ is:
wherein $R_i$ represents the embedding of relation $r_i$, and $h_i$ represents the embedding of head entity $h_i$;
S2-4, according to the correlation coefficients, calculating the weighted sum of the tail entities $t_i$ of the 1-hop ripple set of user u to obtain the first-order reverberation of user u to item v as follows:
according to the multi-level reverberation of the user u on the item v, the embedded vector of the user u on the item v is defined as follows:
wherein $\alpha_i$ is a positive mixing parameter;
S2-5, obtaining the predicted click probability of the user for the item according to the embedded vector of user u with respect to item v, expressed as follows:
wherein $z_{KG}$ represents the recommendation result based on knowledge-graph data.
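Sub-steps S2-1 to S2-5 can be sketched end to end as below. The sketch follows the published RippleNet formulation where the claim's formulas are not legible: entries $y_{uv}$ are taken as 1 for an observed interaction and 0 otherwise, the correlation coefficient is taken as a softmax over $v^{T} R_i h_i$ with $R_i$ a d×d matrix, the user embedding is the $\alpha$-weighted sum of the per-hop responses, and the click probability is $\sigma(u^{T} v)$. All function names are hypothetical.

```python
import math
from collections import defaultdict


def build_interactions(users, items, clicks):
    """S2-1: y[(u, v)] = 1 if user u interacted with item v, else 0."""
    clicked = set(clicks)
    return {(u, v): int((u, v) in clicked) for u in users for v in items}


def ripple_sets(kg_triples, seed_items, max_hops):
    """S2-2: return [S^1, ..., S^H]; S^k holds triples whose head is in hop k-1."""
    by_head = defaultdict(list)
    for h, r, t in kg_triples:
        by_head[h].append((h, r, t))
    frontier, hops = set(seed_items), []
    for _ in range(max_hops):
        triples = [tr for h in sorted(frontier) for tr in by_head.get(h, [])]
        hops.append(triples)
        frontier = {t for _, _, t in triples}   # tails become next hop's heads
    return hops


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hop_response(v, embedded_triples):
    """S2-3/S2-4: softmax-weighted sum of tail embeddings for one hop.

    embedded_triples: non-empty list of (h_vec, R_matrix, t_vec).
    """
    logits = [dot(v, [dot(row, h) for row in R]) for h, R, _ in embedded_triples]
    peak = max(logits)                           # subtract max for stability
    weights = [math.exp(x - peak) for x in logits]
    norm = sum(weights)
    return [sum(w / norm * t[k] for w, (_, _, t) in zip(weights, embedded_triples))
            for k in range(len(v))]


def user_embedding(responses, alphas):
    """Combine per-hop responses with positive mixing parameters alpha_i."""
    return [sum(a * o[k] for a, o in zip(alphas, responses))
            for k in range(len(responses[0]))]


def click_probability(u, v):
    """S2-5: sigmoid of the inner product of user and item embeddings."""
    return 1.0 / (1.0 + math.exp(-dot(u, v)))
```

In a trained model the embeddings and the relation matrices are learned parameters; here they are passed in explicitly so that each step can be inspected on its own.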
6. The method for personalized recommendation fusing multiple information according to claim 5, wherein the loss function of the RippleNet algorithm in the step S2 is expressed as:
$$\Gamma = \sum_{(u,v)\in Y} -\left( y_{uv}\log\sigma(u^{T}v) + (1-y_{uv})\log\left(1-\sigma(u^{T}v)\right) \right).$$
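The loss above is the binary cross-entropy between the interaction labels $y_{uv}$ and the predicted probabilities $\sigma(u^{T}v)$. A minimal sketch, with hypothetical function names and plain Python lists standing in for the embeddings:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy_loss(pairs):
    """Sum binary cross-entropy over (u_vec, v_vec, y) triples, y in {0, 1}."""
    total = 0.0
    for u, v, y in pairs:
        p = sigmoid(sum(a * b for a, b in zip(u, v)))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total
```

For very large inner products the probability saturates at 0 or 1 in floating point, so practical implementations clip `p` or compute the loss from the logits directly.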
7. The method for personalized recommendation fusing multiple information according to claim 6, wherein in step S3, the similarity between the user and the item and the predicted click probability of the user for the item are dynamically fused by using a dynamic fusion algorithm to obtain a predicted score, which is expressed as:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010054209.1A CN111259133B (en) | 2020-01-17 | 2020-01-17 | Personalized recommendation method integrating multiple information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010054209.1A CN111259133B (en) | 2020-01-17 | 2020-01-17 | Personalized recommendation method integrating multiple information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111259133A true CN111259133A (en) | 2020-06-09 |
CN111259133B CN111259133B (en) | 2021-02-19 |
Family
ID=70952218
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010054209.1A Active CN111259133B (en) | 2020-01-17 | 2020-01-17 | Personalized recommendation method integrating multiple information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111259133B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103995823A (en) * | 2014-03-25 | 2014-08-20 | 南京邮电大学 | Information recommending method based on social network |
US20170098236A1 (en) * | 2015-10-02 | 2017-04-06 | Yahoo! Inc. | Exploration of real-time advertising decisions |
WO2018226888A8 (en) * | 2017-06-06 | 2019-02-21 | Diffeo, Inc. | Knowledge operating system |
CN107330461A (en) * | 2017-06-27 | 2017-11-07 | 安徽师范大学 | Collaborative filtering recommending method based on emotion with trust |
CN107562795A (en) * | 2017-08-01 | 2018-01-09 | 广州市香港科大霍英东研究院 | Recommendation method and device based on Heterogeneous Information network |
CN109871858A (en) * | 2017-12-05 | 2019-06-11 | 北京京东尚科信息技术有限公司 | Prediction model foundation, object recommendation method and system, equipment and storage medium |
CN109241424A (en) * | 2018-08-29 | 2019-01-18 | 陕西师范大学 | A kind of recommended method |
CN109388731A (en) * | 2018-08-31 | 2019-02-26 | 昆明理工大学 | A kind of music recommended method based on deep neural network |
CN110245285A (en) * | 2019-04-30 | 2019-09-17 | 中国科学院信息工程研究所 | A kind of personalized recommendation method based on Heterogeneous Information network |
Non-Patent Citations (2)
Title |
---|
HONGWEI WANG: "RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems", 《URL:https://arxiv.org/pdf/1803.03467.pdf》 * |
熊海涛: "《面向复杂数据推荐分析研究》", 31 January 2015 * |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111782813A (en) * | 2020-07-07 | 2020-10-16 | 支付宝(杭州)信息技术有限公司 | User community evaluation method, device and equipment |
CN111782813B (en) * | 2020-07-07 | 2023-10-31 | 支付宝(杭州)信息技术有限公司 | User community evaluation method, device and equipment |
CN111859125A (en) * | 2020-07-09 | 2020-10-30 | 威海天鑫现代服务技术研究院有限公司 | Semantic network construction and service recommendation method oriented to intellectual property technical resource field |
CN111784081B (en) * | 2020-07-30 | 2022-03-01 | 南昌航空大学 | Social network link prediction method adopting knowledge graph embedding and time convolution network |
CN111784081A (en) * | 2020-07-30 | 2020-10-16 | 南昌航空大学 | Social network link prediction method adopting knowledge graph embedding and time convolution network |
CN111932308A (en) * | 2020-08-13 | 2020-11-13 | 中国工商银行股份有限公司 | Data recommendation method, device and equipment |
CN112163929A (en) * | 2020-09-27 | 2021-01-01 | 中国平安财产保险股份有限公司 | Service recommendation method and device, computer equipment and storage medium |
CN112163929B (en) * | 2020-09-27 | 2024-04-05 | 中国平安财产保险股份有限公司 | Service recommendation method, device, computer equipment and storage medium |
CN112487200A (en) * | 2020-11-25 | 2021-03-12 | 吉林大学 | Improved deep recommendation method containing multi-side information and multi-task learning |
CN112633504A (en) * | 2020-12-23 | 2021-04-09 | 北京工业大学 | Wisdom cloud knowledge service system and method for fruit tree diseases and insect pests based on knowledge graph |
CN112733040B (en) * | 2021-01-27 | 2021-07-30 | 中国科学院地理科学与资源研究所 | Travel itinerary recommendation method |
CN112733040A (en) * | 2021-01-27 | 2021-04-30 | 中国科学院地理科学与资源研究所 | Travel itinerary recommendation method |
CN113032618A (en) * | 2021-03-26 | 2021-06-25 | 齐鲁工业大学 | Music recommendation method and system based on knowledge graph |
CN113190593A (en) * | 2021-05-12 | 2021-07-30 | 《中国学术期刊(光盘版)》电子杂志社有限公司 | Search recommendation method based on digital human knowledge graph |
CN113392325A (en) * | 2021-06-21 | 2021-09-14 | 电子科技大学 | Deep learning-based information recommendation method |
WO2023197910A1 (en) * | 2022-04-12 | 2023-10-19 | 华为技术有限公司 | User behavior prediction method and related device thereof |
CN114925294A (en) * | 2022-06-04 | 2022-08-19 | 上海交通大学 | Position prediction system and method based on graph-enhanced time-space model |
CN115270005B (en) * | 2022-09-30 | 2022-12-23 | 腾讯科技(深圳)有限公司 | Information recommendation method, device, equipment and storage medium |
CN115270005A (en) * | 2022-09-30 | 2022-11-01 | 腾讯科技(深圳)有限公司 | Information recommendation method, device, equipment and storage medium |
CN115982646A (en) * | 2023-03-20 | 2023-04-18 | 西安弘捷电子技术有限公司 | Multi-source test data management method and system based on cloud platform |
CN116701772A (en) * | 2023-08-03 | 2023-09-05 | 广东美的暖通设备有限公司 | Data recommendation method and device, computer readable storage medium and electronic equipment |
CN116701772B (en) * | 2023-08-03 | 2024-03-19 | 广东美的暖通设备有限公司 | Data recommendation method and device, computer readable storage medium and electronic equipment |
CN117786234A (en) * | 2024-02-28 | 2024-03-29 | 云南师范大学 | Multimode resource recommendation method based on two-stage comparison learning |
CN117786234B (en) * | 2024-02-28 | 2024-04-26 | 云南师范大学 | Multimode resource recommendation method based on two-stage comparison learning |
CN118245849A (en) * | 2024-05-21 | 2024-06-25 | 北京德和顺天科技有限公司 | Automobile fault detection method based on big data |
Also Published As
Publication number | Publication date |
---|---|
CN111259133B (en) | 2021-02-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111259133B (en) | Personalized recommendation method integrating multiple information | |
US11914674B2 (en) | System and method for extremely efficient image and pattern recognition and artificial intelligence platform | |
US11195057B2 (en) | System and method for extremely efficient image and pattern recognition and artificial intelligence platform | |
US11074495B2 (en) | System and method for extremely efficient image and pattern recognition and artificial intelligence platform | |
Taneja et al. | Modeling user preferences using neural networks and tensor factorization model | |
US20140079297A1 (en) | Application of Z-Webs and Z-factors to Analytics, Search Engine, Learning, Recognition, Natural Language, and Other Utilities | |
Zhang et al. | Cross-domain recommendation with semantic correlation in tagging systems | |
CN114065048A (en) | Article recommendation method based on multi-different-pattern neural network | |
Yang et al. | POI neural-rec model via graph embedding representation | |
CN112257841A (en) | Data processing method, device and equipment in graph neural network and storage medium | |
CN112328832B (en) | Movie recommendation method integrating labels and knowledge graph | |
Ma et al. | Exploring multiple spatio-temporal information for point-of-interest recommendation | |
Park et al. | An effective 3D text recurrent voting generator for metaverse | |
Shokeen et al. | An application-oriented review of deep learning in recommender systems | |
Gao et al. | ST-RNet: A time-aware point-of-interest recommendation method based on neural network | |
Abdollahi | Accurate and justifiable: new algorithms for explainable recommendations. | |
Gan et al. | CDMF: a deep learning model based on convolutional and dense-layer matrix factorization for context-aware recommendation | |
Sun | Music Individualization Recommendation System Based on Big Data Analysis | |
Liao et al. | An integrated model based on deep multimodal and rank learning for point-of-interest recommendation | |
Xing et al. | DynHEN: A heterogeneous network model for dynamic bipartite graph representation learning | |
Drif et al. | A sentiment enhanced deep collaborative filtering recommender system | |
CN116610874A (en) | Cross-domain recommendation method based on knowledge graph and graph neural network | |
Sangeetha et al. | Predicting personalized recommendations using GNN | |
Sangeetha et al. | An Enhanced Neural Graph based Collaborative Filtering with Item Knowledge Graph | |
Ao et al. | Deep Collaborative Filtering Recommendation Algorithm Based on Sentiment Analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||