CN117493704A - User credibility calculation method and device, electronic equipment and medium - Google Patents

User credibility calculation method and device, electronic equipment and medium

Info

Publication number
CN117493704A
Authority
CN
China
Prior art keywords
information
text information
target
network
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311675070.2A
Other languages
Chinese (zh)
Inventor
栾吉海
魏依鹤
李娟�
宋志刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202311675070.2A
Publication of CN117493704A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a device, electronic equipment and a medium for calculating user credibility, wherein the method comprises the following steps: acquiring network text information of a target user, wherein the network text information comprises at least one of page text information, video text information and original comment information; performing validity filtering on the network text information to obtain valid text information; and determining an emotion analysis result of the target user based on the valid text information, wherein the emotion analysis result is used for calculating the credibility of the target user. By performing validity filtering on the network text information, the method provides an information basis for the subsequent calculation of the user credibility and thereby improves the accuracy of the user credibility.

Description

User credibility calculation method and device, electronic equipment and medium
Technical Field
The present invention relates to the field of network technologies, and in particular, to a method and apparatus for calculating user credibility, an electronic device, and a medium.
Background
The rapid development of the internet industry has led to the explosion of internet information, which provides convenience for the financial industry to evaluate the credibility of users for risk management.
The information sources used by existing user credibility calculation methods are complex, and emotion analysis is subject to a certain amount of interference, which reduces the accuracy of the calculated user credibility.
Disclosure of Invention
The invention provides a method, a device, electronic equipment and a medium for calculating user credibility, so as to improve the accuracy of the user credibility.
According to an aspect of the present invention, there is provided a method for calculating user credibility, the method comprising:
acquiring network text information of a target user, wherein the network text information comprises at least one of page text information, video text information and original comment information;
performing validity filtering on the network text information to obtain valid text information;
and determining an emotion analysis result of the target user based on the valid text information, wherein the emotion analysis result is used for calculating the credibility of the target user.
According to another aspect of the present invention, there is provided a device for calculating user credibility, the device comprising:
an acquisition module, configured to acquire network text information of a target user, wherein the network text information comprises at least one of page text information, video text information and original comment information;
a filtering module, configured to perform validity filtering on the network text information to obtain valid text information;
and a determining module, configured to determine an emotion analysis result of the target user based on the valid text information, wherein the emotion analysis result is used for calculating the credibility of the target user.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, and the computer program, when executed, enables the at least one processor to perform the method for calculating user credibility according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions which, when executed, cause a processor to implement the method for calculating user credibility according to any one of the embodiments of the present invention.
The embodiment of the invention provides a method, a device, electronic equipment and a medium for calculating user credibility, wherein the method comprises the following steps: acquiring network text information of a target user, wherein the network text information comprises at least one of page text information, video text information and original comment information; performing validity filtering on the network text information to obtain valid text information; and determining an emotion analysis result of the target user based on the valid text information, wherein the emotion analysis result is used for calculating the credibility of the target user. With this technical scheme, validity filtering of the network text information provides an information basis for the subsequent calculation of the user credibility and thereby improves the accuracy of the user credibility.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for calculating user credibility according to a first embodiment of the present invention;
FIG. 2 is a flowchart for obtaining network text information according to the first embodiment of the present invention;
FIG. 3 is a flowchart of a focused web crawler process according to the first embodiment of the present invention;
FIG. 4 is a flowchart of text extraction according to the first embodiment of the present invention;
FIG. 5 is a flowchart of text clustering according to the first embodiment of the present invention;
FIG. 6 is a flowchart for obtaining valid text information according to the first embodiment of the present invention;
FIG. 7 is a flowchart of a method for calculating user credibility according to a second embodiment of the present invention;
FIG. 8 is a flowchart for obtaining valid text information according to the second embodiment of the present invention;
FIG. 9 is an overall architecture diagram of the method for calculating user credibility according to the second embodiment of the present invention;
FIG. 10 is a flowchart of extracting webpage information according to the second embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a device for calculating user credibility according to a third embodiment of the present invention;
FIG. 12 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
So that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
FIG. 1 is a flowchart of a method for calculating user credibility according to a first embodiment of the present invention. The method may be performed by a device for calculating user credibility, which may be implemented in hardware and/or software and may be configured in an electronic device. As shown in FIG. 1, the method includes:
S110, acquiring network text information of a target user, wherein the network text information comprises at least one item of page text information, video text information and original comment information.
The network text information may be regarded as text information about the target user that is available on the public network. Its source is not limited; for example, it may include at least one of page text information, video text information and original comment information. This embodiment may acquire the different kinds of network text information of the target user through different acquisition modes.
In one embodiment, the obtaining the web text information of the target user includes:
downloading a target webpage by using a search-engine-based topic crawler;
extracting the body text content of the target webpage based on a text density index and a symbol density index;
clustering the texts in the text content by using an improved k-means clustering algorithm to obtain target cluster data;
and performing topic judgment on the target cluster data to obtain page text information whose similarity meets a preset similarity threshold.
In one embodiment, the network text information of the target user may be obtained by using a search-engine-based topic crawler.
FIG. 2 is a flowchart for obtaining network text information according to the first embodiment of the present invention. As shown in FIG. 2, a selected search engine may serve as the data acquisition source: a keyword of the target user is entered as the search query, the first several pages of search results are obtained, and the target webpages linked from those results are downloaded. The text in each downloaded webpage is then extracted by a specific algorithm, and the extracted texts are clustered by a clustering algorithm so that the texts the user needs can be stored. Further, for keywords of a specific topic, a topic crawler covering the whole network may be used: the target webpages of the seed URLs are downloaded first, the text in the downloaded webpages is extracted by a specific algorithm, topic judgment is performed on the text, and texts with high similarity are stored. As webpages are continuously updated, the existing topic lexicon can no longer capture all relevant texts accurately, so the topic lexicon needs to be expanded.
FIG. 3 is a flowchart of a focused web crawler process according to the first embodiment of the present invention. As shown in FIG. 3, a focused web crawler may be selected as the crawling rule in this embodiment. First, a keyword of the target user is entered into a search engine to obtain URL links; each page is crawled and new URLs are obtained from it. Webpages unrelated to the target user's company are filtered out of the new links, and URLs that have already been crawled are kept in a list for deduplication and filtering. The filtered links are then placed in a URL queue, the crawling order is determined, and the next URL is read. This continues until a stop condition is met, at which point crawling is complete.
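The crawl loop described for FIG. 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: `focused_crawl`, `fetch_links`, `is_relevant` and `max_pages` are hypothetical names, page downloading is replaced by a caller-supplied function so the sketch runs without network access, and the stop condition is assumed to be a page budget.

```python
from collections import deque
from typing import Callable, Iterable, List


def focused_crawl(
    seed_urls: Iterable[str],
    fetch_links: Callable[[str], List[str]],
    is_relevant: Callable[[str], bool],
    max_pages: int = 100,
) -> List[str]:
    """Crawl from the seed URLs, skipping links already seen or judged irrelevant."""
    queue = deque(seed_urls)   # URL queue awaiting crawling
    seen = set(queue)          # every URL ever queued, used for deduplication
    crawled: List[str] = []
    while queue and len(crawled) < max_pages:   # assumed stop condition
        url = queue.popleft()
        crawled.append(url)
        for link in fetch_links(url):           # new URLs found on the page
            if link in seen or not is_relevant(link):
                continue
            seen.add(link)
            queue.append(link)
    return crawled


# Toy link graph standing in for real pages (hypothetical data).
GRAPH = {
    "seed": ["a", "b", "ad"],
    "a": ["b", "c"],
    "b": ["seed"],
    "c": [],
}
order = focused_crawl(["seed"], lambda u: GRAPH.get(u, []), lambda u: u != "ad")
print(order)
```

Here the relevance filter drops the link named "ad" and the seen-set prevents "b" and "seed" from being queued twice.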
After the target webpage is obtained, text extraction is needed. In this embodiment, the text extraction technique retains only the body text of the webpage and removes the noise content; otherwise, the noise would affect the text clustering in the next step and the use of the data in other subsequent work. FIG. 4 is a flowchart of text extraction according to the first embodiment of the present invention. As shown in FIG. 4, the object nodes may first be parsed: for example, the body of the target webpage is obtained, each node in the body is traversed, and the node is subjected to hash processing.
For node i, all text under the node is obtained to form a character string texti, whose length is the character count Ti of node i. If the node includes a < class > tag containing attribute keywords such as ['content', 'article', 'news_txt', 'post_text'], the character count of node i may be weighted (for example, multiplied by 2). The linked character count LTi is obtained by collecting all text under all < a > tags under the node. The total number of tags under node i is its tag count TGi, and the number of < a > tags under node i is its linked tag count LTGi. From these quantities, the text density of node i may be calculated.
The symbol density of the node can then be calculated: the symbol density SbDi of node i is obtained by traversing the character string texti and counting the punctuation marks Sbi it contains. Next, a comprehensive score TSbDi is calculated for each node, the nodes are sorted by score, and a result set is returned. Finally, the text content of the node with the highest score is extracted as the body text.
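The scoring step can be illustrated with a small sketch. The text above names the quantities Ti, LTi, TGi, LTGi and Sbi but does not reproduce the formulas, so the three formulas below are assumptions modelled on a commonly used text-and-symbol-density formulation; `NodeStats`, `text_density`, `symbol_density` and `score` are hypothetical names, and the node statistics are made-up values.

```python
import math
from dataclasses import dataclass


@dataclass
class NodeStats:
    name: str
    chars: int        # Ti: characters of text under the node
    link_chars: int   # LTi: characters inside <a> tags under the node
    tags: int         # TGi: number of tags under the node
    link_tags: int    # LTGi: number of <a> tags under the node
    symbols: int      # Sbi: punctuation marks in the node text


def text_density(n: NodeStats) -> float:
    # Assumed formula: non-link characters per non-link tag.
    denom = max(n.tags - n.link_tags, 1)
    return (n.chars - n.link_chars) / denom


def symbol_density(n: NodeStats) -> float:
    # Assumed formula: non-link characters per punctuation mark (+1 avoids division by zero).
    return (n.chars - n.link_chars) / (n.symbols + 1)


def score(n: NodeStats) -> float:
    # Assumed combination: text density weighted by the log of symbol density.
    return text_density(n) * math.log(symbol_density(n) + 1)


nav = NodeStats("nav", chars=120, link_chars=110, tags=12, link_tags=10, symbols=0)
article = NodeStats("article", chars=1800, link_chars=40, tags=14, link_tags=2, symbols=60)
best = max([nav, article], key=score)   # highest-scoring node is taken as the body text
print(best.name)
```

A link-heavy navigation block has few non-link characters and therefore scores far below a long article block.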
An intermediate step in acquiring page text information is to use clustering to exclude categories of data that are not needed and that arise from keyword ambiguity. Specifically, this embodiment may cluster with an improved k-means clustering algorithm. FIG. 5 is a flowchart of text clustering according to the first embodiment of the present invention. As shown in FIG. 5, the original text (i.e., the body content) may first be preprocessed: for example, it may be segmented into words with jieba, stop words, punctuation marks and numbers may be removed, and all words in the preprocessed text may then be extracted to form text vectors.
Next, the text is represented. The text representation model of this embodiment may be a vector space model (VSM), with the features of each document calculated by a TF-IDF model. TF refers to term frequency, i.e., TF(i, j) is the frequency with which word i occurs in the j-th document. IDF refers to inverse document frequency: the smaller the number Ni of documents containing word i, the larger IDF(i).
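As an illustration of the TF-IDF weighting just described, the sketch below computes one weight vector per tokenized document. It assumes the common form IDF(i) = log(N / Ni), which matches the stated behaviour (fewer containing documents gives a larger IDF) but is not spelled out in the source; `tf_idf` and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {word: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))       # Ni: number of documents containing word i
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # TF(i, j) * IDF(i), with IDF assumed to be log(N / Ni)
            w: (c / total) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        })
    return vectors


docs = [["bank", "loan", "risk"], ["bank", "credit"], ["bank", "loan"]]
vectors = tf_idf(docs)
print(vectors[0])
```

A word that appears in every document, such as "bank" here, receives weight zero, while a word confined to one document receives the largest weight.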
Text clustering is then performed as follows: (a) select k points {c1, c2, ..., ck} as the initial centers; (b) calculate the similarity between each text di and each center cj, assign di to the cluster whose center has the greatest similarity, obtain the clusters {C1, C2, ..., Ck}, and record the calculated similarities; (c) calculate the average similarity inside each cluster Ci, denoted meansSim; (d) for each cluster, select the text set {d1, d2, ..., dm} whose intra-cluster similarity is greater than (1 + μ) × meansSim; (e) calculate the mean point of {d1, d2, ..., dm} as the new center of the cluster; (f) repeat steps (b) to (e) until the cluster centers no longer change. The appropriate cluster of data can then be selected and stored in the database.
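The clustering loop above can be sketched in a few lines. This is a minimal sketch under stated assumptions: cosine similarity is used as the similarity measure, the initial centers are supplied by the caller, and when no text exceeds (1 + μ) × meansSim the whole cluster is used to update the center (a fallback not described in the source). The names `cosine`, `mean_point` and `improved_kmeans` are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_point(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def improved_kmeans(texts, centers, mu=0.0, max_iter=100):
    labels = []
    for _ in range(max_iter):
        # (b) assign each text to the most similar center and record the similarity
        clusters = [[] for _ in centers]
        labels = []
        for vec in texts:
            sims = [cosine(vec, c) for c in centers]
            best = max(range(len(centers)), key=sims.__getitem__)
            clusters[best].append((vec, sims[best]))
            labels.append(best)
        new_centers = []
        for center, members in zip(centers, clusters):
            if not members:                 # empty cluster: keep the old center
                new_centers.append(center)
                continue
            # (c) average similarity inside the cluster
            mean_sim = sum(s for _, s in members) / len(members)
            # (d) keep only texts whose similarity exceeds (1 + mu) * meansSim
            core = [v for v, s in members if s > (1 + mu) * mean_sim]
            if not core:                    # assumption: fall back to all members
                core = [v for v, _ in members]
            # (e) the mean point of the selected texts becomes the new center
            new_centers.append(mean_point(core))
        if new_centers == centers:          # (f) stop once the centers no longer change
            break
        centers = new_centers
    return labels, centers


texts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, centers = improved_kmeans(texts, [[1.0, 0.2], [0.2, 1.0]])
print(labels)
```

Updating each center only from the texts closest to it makes the center less sensitive to loosely attached members than the plain k-means mean over the whole cluster.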
The last step in obtaining the page text information is to perform topic identification on the output text, i.e., to judge whether the page is related to the topic. For example, topic identification may combine machine learning with a vector space model, with the training data for machine learning selected accurately by means of the vector space model. The feature weights are calculated with the TF-IDF (term frequency-inverse document frequency) formula, where N is the number of all documents and Ni is the number of documents containing term ti, and the similarity between two documents can be expressed as the cosine of the angle between their corresponding vectors. Documents whose similarity is greater than a certain threshold are used as the positive example set, and documents whose similarity is smaller than a certain threshold are used as the negative example set. An LSTM model is trained on these sets, and subsequently crawled texts are screened by the trained model.
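The selection of positive and negative example sets by cosine similarity can be sketched as follows. The thresholds 0.6 and 0.2, the function names and the sample vectors are illustrative assumptions; the source only states that one threshold bounds the positive set and another bounds the negative set. Training the LSTM on the resulting sets is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def split_by_similarity(topic_vec, doc_vecs, upper=0.6, lower=0.2):
    """Return (positive, negative) index lists; both thresholds are illustrative."""
    positive, negative = [], []
    for idx, vec in enumerate(doc_vecs):
        sim = cosine_similarity(topic_vec, vec)
        if sim > upper:
            positive.append(idx)
        elif sim < lower:
            negative.append(idx)
    return positive, negative


topic = {"bank": 1.0, "loan": 1.0}
doc_vecs = [{"bank": 2.0, "loan": 1.0}, {"weather": 1.0}, {"bank": 1.0, "weather": 2.0}]
positive, negative = split_by_similarity(topic, doc_vecs)
print(positive, negative)
```

Documents that fall between the two thresholds, like the third one here, are left out of both sets so that the training data stays unambiguous.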
In one embodiment, the obtaining the web text information of the target user further includes:
and performing focused crawling and incremental crawling on a target website of the target user by using a target website crawler to obtain first comment information of the target website, wherein the topic crawler and the target website crawler adopt an improved Best-First search strategy.
In one embodiment, the target website crawler may be used to crawl the target website of the target user so as to obtain the first comment information of the target website. For special webpages, this embodiment may use an incremental crawler to monitor this type of website and detect its data updates. The core of the incremental crawler is deduplication, which can be applied at three points: before a request is sent to a URL, judging whether the URL has already been crawled; when content is parsed, judging whether that content has already been parsed; and when data is written to the database, judging whether the data is already stored. For example, the crawled URLs may be kept in Redis and the list traversed; a request is sent only to a qualifying URL that is not yet in Redis, and the page is parsed to generate a unique identifier for its content, for instance a data fingerprint produced by a digest algorithm such as MD5 or SHA. Redis is then checked for the generated unique identifier, and if it is absent, the data is stored in the database, yielding the first comment information of the target website.
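The fingerprint-based deduplication can be sketched as below. To keep the example self-contained, an in-memory set stands in for Redis (an assumption; a real deployment would query the Redis server), SHA-256 is chosen from the digest family mentioned above, and `SeenStore`, `fingerprint` and `store_new_comments` are hypothetical names.

```python
import hashlib


class SeenStore:
    """In-memory stand-in for the Redis store of unique identifiers (an assumption)."""

    def __init__(self):
        self._items = set()

    def add_if_new(self, key: str) -> bool:
        """Record key and return True only the first time it is seen."""
        if key in self._items:
            return False
        self._items.add(key)
        return True


def fingerprint(content: str) -> str:
    """Digest of the parsed content, used as its unique identifier."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def store_new_comments(comments, seen, database):
    """Append only comments whose fingerprint has not been seen before."""
    added = 0
    for text in comments:
        if seen.add_if_new(fingerprint(text)):
            database.append(text)
            added += 1
    return added


seen, database = SeenStore(), []
first_run = store_new_comments(["good service", "slow app"], seen, database)
second_run = store_new_comments(["slow app", "new fees"], seen, database)
print(first_run, second_run, database)
```

On the second crawl only the comment that was not present before is written, which is the incremental behaviour the paragraph describes.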
Further, the topic crawler and the target website crawler used in this embodiment may adopt an improved Best-First search strategy; that is, the link crawling strategy may be a topic search strategy based on link content evaluation, in which the value of a link consists of three parts. The first part is the address content of the link itself: the current link address is compared structurally with the link address of the parent page to obtain the in-site address value of the link, which reflects the value of the current link to a certain extent. The second part is the value the link inherits from the parent page: if the parent page containing the link is highly relevant to the topic, the link is considered more likely to be relevant to the topic. The third part is the anchor text associated with the link, since anchor text is generally a concise summary of the page content. Whereas the prior art considers only the second part, the value inherited from the parent page, the improved Best-First search strategy of this embodiment considers all three parts together, which improves the accuracy of the network text information.
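One way to picture the three-part link value is the sketch below. The source does not specify how each part is scored or how the parts are combined, so the path-prefix heuristic, the keyword-overlap anchor score and the weights (0.2, 0.3, 0.5) are all assumptions, as are the function names and the example URLs.

```python
import heapq
from urllib.parse import urlparse


def address_score(link: str, parent: str) -> float:
    """Assumed heuristic: share of the parent's host and path prefix that the link keeps."""
    l, p = urlparse(link), urlparse(parent)
    if l.netloc != p.netloc:
        return 0.0
    l_parts = [s for s in l.path.split("/") if s]
    p_parts = [s for s in p.path.split("/") if s]
    shared = 0
    for a, b in zip(l_parts, p_parts):
        if a != b:
            break
        shared += 1
    return (shared + 1) / (len(p_parts) + 1)   # the +1 credits the shared host


def anchor_score(anchor_text: str, topic_words) -> float:
    """Share of topic words that appear in the anchor text."""
    words = set(anchor_text.lower().split())
    return len(words & set(topic_words)) / len(topic_words) if topic_words else 0.0


def link_value(link, parent, parent_relevance, anchor_text, topic_words,
               weights=(0.2, 0.3, 0.5)):
    """Weighted sum of the three parts; the weights are illustrative only."""
    w_addr, w_parent, w_anchor = weights
    return (w_addr * address_score(link, parent)
            + w_parent * parent_relevance
            + w_anchor * anchor_score(anchor_text, topic_words))


topic = ["bank", "credit"]
parent = "https://example.com/news/finance"
candidates = [
    ("https://example.com/news/finance/credit-report", "bank credit report"),
    ("https://example.com/about", "about us"),
    ("https://other.example.org/ads", "cheap flights"),
]
frontier = []
for url, anchor in candidates:
    value = link_value(url, parent, 0.8, anchor, topic)
    heapq.heappush(frontier, (-value, url))   # max-heap via negated value
best_url = heapq.heappop(frontier)[1]
print(best_url)
```

Because all three candidates share the same parent, the inherited value alone cannot rank them; the address and anchor-text parts decide which link is crawled first.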
In one embodiment, the obtaining the web text information of the target user includes:
acquiring video information of a target user by adopting a video crawler;
and extracting video text information and second comment information of the video information.
The video text information may be text extracted from the video information, and the second comment information is comment area information in the video information.
In one embodiment, a video crawler may be used to capture short-video data from mobile terminals and to extract the text and comment-area information from the captured videos. For example, an emulator may be downloaded and used to capture the packets of the short-video application, and a program may implement page turning and swiping to retrieve the comments batch by batch. To extract the text, the video may be cut into individual frames, and the text in each frame is then extracted to obtain the video text information.
S120, performing validity filtering on the network text information to obtain valid text information.
Because the acquired network text information may contain untrue information, this embodiment performs validity filtering on the network text information to filter out untrue information, information of a leading nature, and the like. The specific filtering means is not limited, as long as valid text information can be obtained.
FIG. 6 is a flowchart for obtaining valid text information according to the first embodiment of the present invention. As shown in FIG. 6, valid comment data may be obtained by inputting the raw data (i.e., the network text information) into an information processing unit. Specifically, the page information in the network text information is analyzed and split into clauses; the original comment information and clause information acquired from all channels are aggregated and cleaned; and reliability analysis is performed on the cleaned information, which may include timeliness analysis, usefulness-vote processing, comment content mining and similar operations, finally yielding valid comment data.
For comment timeliness, the difference between the release time of the current comment and that of the earliest comment may be used as the timeliness index for measuring credibility: the greater the interval in days, the closer the comment is to the current reading time and the stronger its timeliness. Comment content mining may use a Chinese word segmentation system to mine information from the comments through Chinese word segmentation, part-of-speech tagging, keyword statistics and similar functions, including statistics on feature words and emotion words. The effective length of a comment is calculated and judged from its feature words, emotion words and total length. The specific process of the reliability analysis is not described further here.
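The timeliness index lends itself to a short worked example. The day difference follows the description above; the effective-length measure is only an assumed ratio, since the source says it is calculated from feature words, emotion words and total length without giving a formula. The function names and dates are hypothetical.

```python
from datetime import date


def timeliness_days(comment_date: date, earliest_date: date) -> int:
    """Days between a comment and the earliest comment; larger means more recent."""
    return (comment_date - earliest_date).days


def effective_length_ratio(feature_words: int, emotion_words: int, total_length: int) -> float:
    """Assumed measure: share of the comment made up of feature and emotion words."""
    if total_length <= 0:
        return 0.0
    return (feature_words + emotion_words) / total_length


dates = [date(2023, 1, 1), date(2023, 3, 1), date(2023, 11, 30)]
earliest = min(dates)
scores = [timeliness_days(d, earliest) for d in dates]
ratio = effective_length_ratio(feature_words=3, emotion_words=2, total_length=20)
print(scores, ratio)
```

The earliest comment scores 0 and later comments score progressively higher, so the index can be used directly to rank comments by recency.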
S130, determining an emotion analysis result of the target user based on the valid text information, wherein the emotion analysis result is used for calculating the credibility of the target user.
After the valid text information is obtained through the above steps, the emotion analysis result of the target user can be determined. For example, the valid text information may be input into a preset analysis model that directly outputs the emotion analysis result of the target user; alternatively, an emotion analysis model corresponding to the target user may be obtained through training, and the emotion analysis result of the target user may then be obtained by applying that model to the valid text information.
In one embodiment, the determining the emotion analysis result of the target user based on the valid text information includes:
performing model training on a long short-term memory (LSTM) model based on first valid information to obtain an emotion analysis model corresponding to the target user, wherein the first valid information is obtained by preprocessing the training text information in the valid text information;
and inputting second valid information into the emotion analysis model to obtain the emotion analysis result of the target user, wherein the second valid information is obtained by preprocessing the test text information in the valid text information.
In this embodiment, the valid text information may be split into two parts, training text information and test text information, and each part is preprocessed to obtain the first valid information and the second valid information respectively. The preprocessing may include text preprocessing, vectorized representation and the like. For example, punctuation marks may first be corrected, abnormal data removed and repeated data deleted; Chinese word segmentation and stop-word removal are then performed, where stop-word removal mainly removes meaningless auxiliary words and prepositions, and this embodiment may use the Harbin Institute of Technology (HIT) stop word list; a general emotion dictionary is then constructed. For the vectorized representation, the text may be vectorized with Word2vec, for example.
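A minimal sketch of the cleaning and splitting steps follows. It assumes the texts have already been segmented into tokens, uses a three-word stand-in for the stop word list, and treats empty texts as the abnormal data to remove; the 80/20 split ratio, the function names and the sample comments are hypothetical. Word2vec vectorization and the emotion dictionary are not shown.

```python
import random

STOP_WORDS = {"的", "了", "在"}   # tiny illustrative list, not the full stop word list


def preprocess(samples, stop_words):
    """Drop empty and repeated texts, then remove stop words from each token list."""
    seen, cleaned = set(), []
    for tokens in samples:
        key = tuple(tokens)
        if not tokens or key in seen:      # abnormal (empty) or repeated data
            continue
        seen.add(key)
        cleaned.append([t for t in tokens if t not in stop_words])
    return cleaned


def split_train_test(samples, test_ratio=0.2, seed=0):
    """Shuffle reproducibly and split into training and test text information."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio)) if shuffled else 0
    return shuffled[n_test:], shuffled[:n_test]


samples = [
    ["服务", "的", "态度", "好"],
    ["服务", "的", "态度", "好"],   # duplicate
    [],                             # empty
    ["利率", "了", "高"],
    ["网点", "在", "排队"],
    ["贷款", "慢"],
    ["客服", "好"],
]
cleaned = preprocess(samples, STOP_WORDS)
train, test = split_train_test(cleaned)
print(len(train), len(test))
```

Fixing the shuffle seed keeps the training and test portions stable between runs, which makes later model comparisons repeatable.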
The model is then trained with a method based on LSTM (long short-term memory): the long short-term memory model is trained on the first valid information to obtain the emotion analysis model corresponding to the target user, and the second valid information is fed to the trained model to obtain the emotion analysis result of the target user. The emotion analysis result can be used to calculate the credibility of the target user, to show the negative and positive views of the target user from the public's perspective, to classify emotion by score, and to display word cloud images.
According to the method for calculating user credibility provided by this embodiment of the invention, network text information of a target user is acquired, wherein the network text information comprises at least one of page text information, video text information and original comment information; validity filtering is performed on the network text information to obtain valid text information; and an emotion analysis result of the target user is determined based on the valid text information, wherein the emotion analysis result is used for calculating the credibility of the target user. With this method, validity filtering of the network text information provides an information basis for the subsequent calculation of the user credibility, so that the accuracy of the user credibility is improved.
Example two
Fig. 7 is a flowchart of a method for calculating user credibility according to a second embodiment of the present invention; the second embodiment is optimized on the basis of the above embodiments. In this embodiment, the web text information includes page text information, video text information and original comment information, and the step of performing validity filtering on the web text information to obtain valid text information is further specified as: extracting target keywords of first network information, wherein the first network information comprises page text information, video text information and third comment information, and the third comment information is comment information with the support rate lower than a preset threshold value in the original comment information; and determining the target keywords and fourth comment information as valid text information of the network text information, wherein the fourth comment information is comment information with the support rate higher than the preset threshold value in the original comment information.
For details not yet described in detail in this embodiment, refer to embodiment one.
As shown in fig. 7, the method includes:
S210, acquiring network text information of a target user, wherein the network text information comprises page text information, video text information and original comment information.
S220, extracting target keywords of first network information, wherein the first network information comprises page text information, video text information and third comment information, and the third comment information is comment information with the support rate lower than a preset threshold value in the original comment information.
In this embodiment, the first network information may include information of page text information, video text information and third comment information, where the third comment information may be comment information with a support rate lower than a preset threshold in the original comment information, and the preset threshold may be obtained by an experience value or by calculation. The target keyword may be understood as a high-frequency keyword extracted from the first network information.
In one embodiment, the extracting the target keyword of the first network information includes:
sentence dividing processing is carried out on the first network information to obtain at least one target sentence;
generating a target word sense structure of the first network information based on a preset network model, wherein the preset network model is obtained by training at least one target sentence;
and extracting and ranking the keywords of the target word sense structure to obtain target keywords of the first network information.
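The three steps above can be illustrated with a deliberately simplified sketch. Note the substitution: instead of a trained network model producing a word sense structure, the example ranks words by the number of sentences they occur in. The function names and the punctuation set are assumptions, and splitting on "." would also split decimals, which a real implementation would need to handle.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split on common Chinese and English sentence-ending punctuation."""
    parts = re.split(r"[。！？!?.]+", text)
    return [p.strip() for p in parts if p.strip()]


def rank_keywords(sentences, top_n=3):
    """Rank words by sentence frequency; ties are broken alphabetically."""
    counts = Counter()
    for sentence in sentences:
        # Count each word at most once per sentence.
        counts.update(set(re.findall(r"\w+", sentence.lower())))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]
```

For example, `rank_keywords(split_sentences("Loan approved. Loan delayed! Fee high?"), top_n=2)` places "loan" first because it appears in two of the three sentences.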
In one embodiment, the original comment information includes like (praise) information, whereas like information cannot be acquired for pages and videos. For comments whose support information cannot be acquired, this embodiment may use a comment usefulness analysis method based on low-frequency keywords: low-frequency keywords are extracted from the original comment information so that comments with low occurrence frequency but high usefulness are obtained as a sample set. In addition, for data whose like information can be acquired, comments with a support degree greater than the average value may also be used as the sample set.
Fig. 8 is a flowchart for obtaining valid text information according to the second embodiment of the present invention. As shown in fig. 8, sentences from comments whose support is higher than the average value may first be selected into a candidate set, and comments whose support is lower than the average value may be processed together with the other information.
The page text information, the video text information and the third comment information may be divided into sentences and used to train a neural network model, yielding candidate features and the word sense structures of keywords generated by clustering. The word sense structures are then ranked and keywords are extracted, after which the low-frequency keywords are ranked according to the relevance between the comments and the target user. The ranking basis adopted in this embodiment may be the context information, within its sentence, of each word in a low-frequency keyword. The keyword vector calculation rule may be as follows: [formula not reproduced in the source text], where Vi represents the vector of the keyword, pi represents the keyword currently being ranked, wi represents the words constituting the keyword, and vwi represents the context information of wi in the comment set. The score of Vi may be: [formula not reproduced in the source text], where Vt is the word frequency vector generated from the manually selected document clusters after document clustering, and Vb represents the background vector generated from the word frequencies over all document sets. The score of each keyword with respect to its vector Vi is calculated, which yields the ranked sequence of low-frequency keywords; finally, the top-ranked low-frequency keywords are stored in a database as a data set.
S230, determining the target keyword and fourth comment information as effective text information of the network text information, wherein the fourth comment information is comment information with the support rate higher than a preset threshold value in the original comment information.
The fourth comment information may be comment information with a support rate higher than a preset threshold value in the original comment information.
In summary, the step may determine the extracted target keyword and the fourth comment information as valid text information of the web text information, so as to determine a subsequent emotion analysis result.
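Steps S220 and S230 can be sketched as a partition of the original comments followed by a merge. This is a minimal sketch under stated assumptions: comments are represented as hypothetical `(text, support_rate)` pairs, the keyword extractor is passed in as a callable, and a comment whose support rate equals the threshold is placed in the low-support group, a case the text does not specify.

```python
def split_by_support(comments, threshold):
    """Partition (text, support_rate) pairs into low- and high-support texts."""
    low, high = [], []
    for text, support in comments:
        # Assumption: a support rate equal to the threshold counts as low.
        (high if support > threshold else low).append(text)
    return low, high


def build_valid_text(page_text, video_text, comments, threshold, extract_keywords):
    """Combine keywords of the first network information with high-support comments."""
    third, fourth = split_by_support(comments, threshold)
    first_network_info = [page_text, video_text] + third
    return extract_keywords(first_network_info) + fourth
```

Passing the extractor as a parameter keeps the partition logic independent of how target keywords are actually obtained.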
S240, determining an emotion analysis result of the target user based on the effective text information, wherein the emotion analysis result is used for calculating the credibility of the target user.
According to the method for calculating the user credibility, provided by the embodiment of the invention, the network text information of the target user is obtained, wherein the network text information comprises page text information, video text information and original comment information; extracting target keywords of first network information, wherein the first network information comprises page text information, video text information and third comment information, and the third comment information is comment information with the support rate lower than a preset threshold value in original comment information; determining the target keyword and fourth comment information as effective text information of the network text information, wherein the fourth comment information is comment information with the support rate higher than a preset threshold value in original comment information; and determining an emotion analysis result of the target user based on the valid text information, wherein the emotion analysis result is used for calculating the credibility of the target user. By means of the method, the target keywords of comment information with the support rate lower than the preset threshold value in the page text information, the video text information and the original comment information are extracted, the extracted target keywords and comment information with the higher support rate are determined to be effective text information of the network text information, the effectiveness of the effective text information is improved, and the accuracy of the user credibility is further improved.
Fig. 9 is an overall architecture diagram of a method for calculating user credibility according to the second embodiment of the present invention. As shown in fig. 9, in the web page information obtaining unit, the required text information may first be obtained from a target company database through a search-engine-based topic crawler and a crawler for specific target websites; the short video information extraction unit acquires videos through the short video crawler, and acquires text information through the comment information acquirer and the video text acquirer; the text information acquired in these two processes may then be passed to the information processing unit, where it is handled by a page information processor and a comment information processor respectively, for example by denoising and key content extraction, to obtain cleaned data; the cleaned data is input into the emotion analysis unit, and the credibility of the target user is finally obtained.
Fig. 10 is a flowchart of extracting web page information. As shown in fig. 10, on the one hand, a search-engine-based topic crawler can download target pages by means of focused web crawling, after which text extraction is performed and the extracted text is clustered to obtain page text information; on the other hand, a crawler can be applied to a specific target website: a user login is first simulated, a focused web crawler then monitors the website, and when new data appears, an incremental crawler acquires the user comments of that fixed website as part of the original data.
From the above description, it can be seen that the method for calculating user credibility provided by the embodiment of the invention obtains a large amount of comment information through the search-engine-based topic crawler and the short-video-platform crawler, and performs incremental crawling for key websites. The original information is filtered by comment usefulness to remove untrue and useless comments; low-frequency keywords are analyzed for page information and for comments with low support so as to obtain meaningful comment information that occurs only rarely; preliminary mining and comment timeliness analysis are carried out; and the comments most likely to be meaningful are finally obtained as samples. Emotion analysis is then performed on the comment samples, including operations such as text preprocessing, vectorized representation and model construction, finally realizing emotion analysis of a target company from the public's perspective. The credibility corresponding to the target user is thus calculated from the public's perspective, big data is used to comprehensively evaluate an enterprise's ability to fulfil its commitments and its degree of credibility, and the enterprise is helped to guard against business risks.
On the one hand, multiple crawling means are used, namely crawling of short video platform data, search-engine-based focused web crawling, and incremental web crawling of specific websites, so that the web text information of the target client is obtained as comprehensively as possible and a data basis is provided for the subsequent credibility calculation. On the other hand, useful-comment filtering is performed before emotion analysis: comments with high support are retained, and useful comments with low support are also recovered by means of low-frequency keywords, so that comment samples are obtained as comprehensively and accurately as possible.
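The incremental part of this flow, keeping only comments not seen in earlier crawls, might look like the sketch below. Fetching, login simulation and site monitoring are outside its scope; the class name `IncrementalCollector` and the use of a content hash as the identity of a comment are assumptions for illustration.

```python
import hashlib


class IncrementalCollector:
    """Keeps only comments that have not been seen in earlier crawl rounds."""

    def __init__(self):
        self._seen = set()

    def collect_new(self, fetched_comments):
        new_items = []
        for comment in fetched_comments:
            # Assumption: identical text means the same comment.
            digest = hashlib.sha256(comment.encode("utf-8")).hexdigest()
            if digest not in self._seen:
                self._seen.add(digest)
                new_items.append(comment)
        return new_items
```

Each monitoring round would call `collect_new` with the comments currently visible on the site and append only the returned items to the original data.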
Example III
Fig. 11 is a schematic structural diagram of a user credibility calculating device according to a third embodiment of the present invention. As shown in fig. 11, the apparatus includes:
an obtaining module 310, configured to obtain web text information of a target user, where the web text information includes at least one of page text information, video text information, and original comment information;
the filtering module 320 is configured to perform validity filtering on the web text information to obtain valid text information;
and the determining module 330 is configured to determine an emotion analysis result of the target user based on the valid text information, where the emotion analysis result is used to calculate the credibility of the target user.
According to the computing device for the user credibility, provided by the embodiment of the invention, the network text information of the target user is obtained through the obtaining module, wherein the network text information comprises at least one of page text information, video text information and original comment information; the network text information is subjected to validity filtration through a filtration module to obtain valid text information; and determining an emotion analysis result of the target user based on the valid text information through a determining module, wherein the emotion analysis result is used for calculating the credibility of the target user. By utilizing the device, the information foundation is provided for the subsequent calculation of the user credibility by effectively filtering the network text information, and the accuracy of the user credibility is further improved.
Optionally, the obtaining module 310 is specifically configured to:
downloading a target webpage by adopting a theme crawler based on a search engine;
extracting text content of the target webpage based on the text density index and the symbol density index;
clustering texts in the text content by adopting an improved k-means clustering algorithm to obtain target cluster data;
and performing subject judgment on the target cluster data to obtain page text information with similarity meeting a preset similarity threshold.
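The text does not define the text density index or the symbol density index, so the following sketch adopts simple assumed definitions: text density is the number of visible characters per HTML tag in a block, and the symbol measure is the number of punctuation marks in the visible text. Blocks that look like navigation (many tags, little text, no punctuation) are discarded. All names and thresholds are hypothetical.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")
SYMBOL_RE = re.compile(r"[,.;:!?，。；：！？]")


def text_density(block):
    """Visible characters per HTML tag in the block (assumed definition)."""
    tags = TAG_RE.findall(block)
    text = TAG_RE.sub("", block).strip()
    return len(text) / max(len(tags), 1)


def symbol_count(block):
    """Number of punctuation marks in the visible text (assumed definition)."""
    return len(SYMBOL_RE.findall(TAG_RE.sub("", block)))


def extract_body(blocks, min_density=10.0, min_symbols=1):
    """Keep the visible text of blocks that look like running prose."""
    kept = []
    for block in blocks:
        if text_density(block) >= min_density and symbol_count(block) >= min_symbols:
            kept.append(TAG_RE.sub("", block).strip())
    return kept
```

The extracted text blocks would then be clustered and checked against the topic, steps that are not shown here.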
Optionally, the obtaining module 310 is specifically configured to:
and carrying out focused web crawlers and incremental crawlers on a target website of the target user by using a target website crawler to obtain First comment information of the target website, wherein the theme crawler and the target website crawler adopt an improved Best-First search strategy.
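A plain Best-First search strategy, without the unspecified improvements mentioned above, can be sketched with a priority queue: the crawler always expands the unvisited page with the highest estimated relevance. The in-memory `links` graph and `relevance` scores are hypothetical stand-ins for real page fetching and topic scoring.

```python
import heapq


def best_first_crawl(start, links, relevance, max_pages=10):
    """Visit pages in order of estimated relevance.

    links maps a URL to the URLs it points to; relevance maps a URL to a score.
    """
    # heapq is a min-heap, so scores are negated to pop the best page first.
    frontier = [(-relevance.get(start, 0.0), start)]
    visited = set()
    order = []
    while frontier and len(order) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        for nxt in links.get(url, []):
            if nxt not in visited:
                heapq.heappush(frontier, (-relevance.get(nxt, 0.0), nxt))
    return order
```

With `links = {"seed": ["a", "b"], "a": ["c"]}` and scores that favor "b" over "a", the crawler visits "b" before "a" even though "a" was discovered first.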
Optionally, the obtaining module 310 is specifically configured to:
acquiring video information of a target user by adopting a video crawler;
and extracting video text information and second comment information of the video information.
Optionally, the web text information includes page text information, video text information and original comment information, and the filtering module 320 includes:
the extraction unit is used for extracting target keywords of first network information, wherein the first network information comprises page text information, video text information and third comment information, and the third comment information is comment information with the support rate lower than a preset threshold value in the original comment information;
the determining unit is used for determining the target keyword and fourth comment information as effective text information of the network text information, wherein the fourth comment information is comment information with the support rate higher than a preset threshold value in the original comment information.
Optionally, the extracting unit is specifically configured to:
sentence dividing processing is carried out on the first network information to obtain at least one target sentence;
generating a target word sense structure of the first network information based on a preset network model, wherein the preset network model is obtained by training at least one target sentence;
and extracting and sequencing the keywords of the target word sense structure to obtain target keywords of the first network information.
Optionally, the determining module 330 is specifically configured to:
model training is carried out on the long short-term memory (LSTM) model based on first effective information to obtain an emotion analysis model corresponding to the target user, wherein the first effective information is information obtained by preprocessing training text information in the effective text information;
and inputting second effective information into the emotion analysis model to obtain an emotion analysis result of the target user, wherein the second effective information is information obtained by preprocessing test text information in the effective text information.
The user credibility calculating device provided by the embodiment of the invention can execute the user credibility calculating method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the executing method.
Example IV
Fig. 12 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 12, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the method for calculating user credibility.
In some embodiments, the method of calculating user trustworthiness may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the above-described method of calculating user trustworthiness may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the method of calculating the user's trustworthiness in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for calculating user credibility, the method comprising:
acquiring network text information of a target user, wherein the network text information comprises at least one of page text information, video text information and original comment information;
performing validity filtering on the network text information to obtain valid text information;
and determining an emotion analysis result of the target user based on the valid text information, wherein the emotion analysis result is used for calculating the credibility of the target user.
2. The method of claim 1, wherein the obtaining web text information of the target user comprises:
downloading a target webpage by adopting a theme crawler based on a search engine;
extracting text content of the target webpage based on the text density index and the symbol density index;
clustering texts in the text content by adopting an improved k-means clustering algorithm to obtain target cluster data;
and performing subject judgment on the target cluster data to obtain page text information with similarity meeting a preset similarity threshold.
3. The method of claim 2, wherein the obtaining web text information of the target user further comprises:
and carrying out focused web crawlers and incremental crawlers on a target website of the target user by using a target website crawler to obtain First comment information of the target website, wherein the theme crawler and the target website crawler adopt an improved Best-First search strategy.
4. The method of claim 1, wherein the obtaining web text information of the target user comprises:
acquiring video information of a target user by adopting a video crawler;
and extracting video text information and second comment information of the video information.
5. The method of claim 1, wherein the web text information includes page text information, video text information, and original comment information, and the performing validity filtering on the web text information to obtain valid text information includes:
extracting target keywords of first network information, wherein the first network information comprises page text information, video text information and third comment information, and the third comment information is comment information with the support rate lower than a preset threshold value in original comment information;
and determining the target keyword and fourth comment information as effective text information of the network text information, wherein the fourth comment information is comment information with the support rate higher than a preset threshold value in the original comment information.
6. The method of claim 5, wherein extracting the target keyword of the first network information comprises:
sentence dividing processing is carried out on the first network information to obtain at least one target sentence;
generating a target word sense structure of the first network information based on a preset network model, wherein the preset network model is obtained by training at least one target sentence;
and extracting and ranking the keywords of the target word sense structure to obtain target keywords of the first network information.
7. The method of claim 1, wherein said determining emotion analysis results for the target user based on the valid text information comprises:
model training is carried out on the long short-term memory (LSTM) model based on first effective information to obtain an emotion analysis model corresponding to the target user, wherein the first effective information is information obtained by preprocessing training text information in the effective text information;
and inputting second effective information into the emotion analysis model to obtain an emotion analysis result of the target user, wherein the second effective information is information obtained by preprocessing test text information in the effective text information.
8. A device for calculating user credibility, comprising:
an acquisition module, configured to acquire network text information of a target user, wherein the network text information comprises at least one of page text information, video text information and original comment information;
a filtering module, configured to perform validity filtering on the network text information to obtain valid text information;
and a determining module, configured to determine an emotion analysis result of the target user based on the valid text information, wherein the emotion analysis result is used for calculating the credibility of the target user.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of calculating user trustworthiness of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the method of calculating user trustworthiness of any one of claims 1-7.
CN202311675070.2A 2023-12-07 2023-12-07 User credibility calculation method and device, electronic equipment and medium Pending CN117493704A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311675070.2A CN117493704A (en) 2023-12-07 2023-12-07 User credibility calculation method and device, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN117493704A true CN117493704A (en) 2024-02-02



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination