CN104281694A - Analysis system of emotional tendency of text - Google Patents

Info

Publication number
CN104281694A
Authority
CN
China
Prior art keywords
text
analysis system
module
trend analysis
texts
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410537881.0A
Other languages
Chinese (zh)
Inventor
贾岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ANHUI HUAZHEN INFORMATION SCIENCE & TECHNOLOGY Co Ltd
Original Assignee
ANHUI HUAZHEN INFORMATION SCIENCE & TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-10-13
Filing date
2014-10-13
Publication date
2015-01-14
Application filed by ANHUI HUAZHEN INFORMATION SCIENCE & TECHNOLOGY Co Ltd
Priority to CN201410537881.0A
Publication of CN104281694A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a system for analyzing the emotional tendency of text. The system comprises a sample training module, an entity extraction module, a feature extraction module and an emotional tendency recognition module. The sample training module receives the texts to be analyzed and trains on the samples to obtain a discrimination template. The entity extraction module extracts the entities in the texts to be discriminated and filters out texts that contain no entity. The feature extraction module extracts the tendency-related features of the texts. The emotional tendency recognition module discriminates the tendency of the texts using the maximum entropy method. With this arrangement, forums and blogs in the enterprise user's field are collected, the text of the web pages is extracted, the emotional tendency of each text and the entity it targets are obtained through the analysis system, and a chart showing changes in the image of the enterprise and its competitors is generated automatically, so that the emotional tendency of the text is judged accurately through text classification.

Description

A text emotional tendency analysis system
Technical field
The present invention relates to the field of grid computing technology, and in particular to a text emotional tendency analysis system.
Background
To analyze enterprise and product image for CIS purposes, sentiment classification is commonly used, and texts are divided into several classes according to their degree of tendency. The tendency of a text is determined not only by words such as polarity words and degree words, but also by the relative positions of those words and by their relation to entity words. Text classification, however, can only consider word features, so current methods that use text classification to judge the emotional tendency of text all have relatively low accuracy.
Summary of the invention
To solve the technical problem described in the background, the present invention proposes a text emotional tendency analysis system that improves the accuracy of judging the emotional tendency of text by text classification.
The text emotional tendency analysis system proposed by the present invention comprises:
A sample training module, for receiving the text to be analyzed and training on samples to obtain a discrimination template;
An entity extraction module, for extracting the entities in the text to be discriminated and filtering out text that contains no entity;
A feature extraction module, for extracting the tendency-related features in the text;
An emotional tendency recognition module, for discriminating the tendency of the text using the maximum entropy method.
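For classification, the maximum entropy method named for the recognition module is equivalent to multinomial logistic regression. The following is a minimal sketch under that reading and not the patent's implementation: the function names, feature names, learning rate and epoch count are all illustrative assumptions, and features are assumed to arrive as a dictionary of numeric values.

```python
import math
from collections import defaultdict


def predict_proba(weights, feats):
    """Return {class: probability} using a softmax over linear scores."""
    scores = {
        c: sum(w.get(name, 0.0) * value for name, value in feats.items())
        for c, w in weights.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def train_maxent(samples, labels, epochs=200, lr=0.5):
    """Fit a maximum entropy (multinomial logistic) model by stochastic gradient ascent.

    samples: list of {feature_name: value} dicts (hypothetical feature names)
    labels:  list of class labels, one per sample
    """
    classes = sorted(set(labels))
    weights = {c: defaultdict(float) for c in classes}
    for _ in range(epochs):
        for feats, gold in zip(samples, labels):
            probs = predict_proba(weights, feats)
            for c in classes:
                # Log-likelihood gradient: observed indicator minus expected probability.
                error = (1.0 if c == gold else 0.0) - probs[c]
                for name, value in feats.items():
                    weights[c][name] += lr * error * value
    return weights


def predict(weights, feats):
    """Return the most probable class for one feature dictionary."""
    probs = predict_proba(weights, feats)
    return max(probs, key=probs.get)
```

In this sketch the learned weights play the role of the "discrimination template" produced by the sample training module; that mapping is an interpretation rather than something the source states.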
Preferably, the tendency-related features comprise: polarity words, dimension words, qualifiers and negation words.
Preferably, an entity dictionary, a polarity dictionary, a dimension dictionary, a qualifier dictionary and other related dictionaries are built before tendency analysis is performed on the text.
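A dictionary-driven feature extractor along these lines might look like the sketch below. It is only an illustration: the dictionary contents, the two-token context window and the feature names are assumptions, the text is assumed to be already segmented into tokens, and English words stand in for the Chinese vocabulary the system targets. It uses the polarity, degree and negation words mentioned in the description, and it looks at the words just before each polarity word because the background stresses that relative position matters.

```python
# Hypothetical dictionaries; a real system would load much larger ones.
ENTITIES = {"acme", "acme-phone"}
POLARITY = {"good": 1, "excellent": 1, "bad": -1, "terrible": -1}
DEGREE = {"very": 2.0, "slightly": 0.5}
NEGATION = {"not", "never"}


def extract_features(tokens, window=2):
    """Turn a token list into tendency-related features.

    Each polarity word is adjusted by degree and negation words found in the
    `window` tokens immediately before it.
    """
    feats = {"positive": 0.0, "negative": 0.0, "negated": 0.0, "entity": 0.0}
    for i, tok in enumerate(tokens):
        if tok in ENTITIES:
            feats["entity"] += 1.0
        if tok not in POLARITY:
            continue
        polarity = POLARITY[tok]
        strength = 1.0
        for ctx in tokens[max(0, i - window):i]:
            if ctx in DEGREE:
                strength *= DEGREE[ctx]  # intensify or weaken the polarity word
            if ctx in NEGATION:
                polarity = -polarity     # a preceding negation flips the polarity
                feats["negated"] += 1.0
        feats["positive" if polarity > 0 else "negative"] += strength
    return feats
```

For the tokens of "acme is not very good", this sketch records one entity mention, one negation, and a negative strength of 2.0, which a plain bag-of-words feature set would not capture.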
Preferably, the entity extraction module is specifically configured for:
Preprocessing;
Calculating term weights;
Training a model on the preprocessed training set to construct a classifier;
Testing the performance of the resulting classifier on the test-set documents with a given test method, and continuously feeding back and learning to improve the classifier's performance, until it ...
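The train-then-test steps above can be sketched as follows. The classifier here is a deliberately simple word-overlap vote used as a stand-in so the example stays self-contained; the source does not say which classifier or test method is used, and every function name below is hypothetical.

```python
from collections import Counter, defaultdict


def train(train_set):
    """Learn per-class word counts from (tokens, label) pairs."""
    counts = defaultdict(Counter)
    for tokens, label in train_set:
        counts[label].update(tokens)
    return counts


def classify(model, tokens):
    """Pick the class whose training vocabulary overlaps the document most."""
    scores = {label: sum(c[t] for t in tokens) for label, c in model.items()}
    return max(scores, key=scores.get)


def accuracy(model, test_set):
    """Fraction of held-out (tokens, label) pairs classified correctly."""
    correct = sum(classify(model, tokens) == label for tokens, label in test_set)
    return correct / len(test_set)
```

The feedback step would then amount to adjusting the features or the model and re-running `accuracy` on the test set; the stopping criterion is not given in the source, so none is coded here.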
Preferably, the preprocessing is specifically: representing the document set in a form that is easy for a computer to process, according to the classification model adopted.
Preferably, the calculation of term weights is specifically: expressing the importance of each term in a document using a suitable weight calculation method.
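The source leaves the weight calculation method open. TF-IDF is one common choice and is used here purely as an assumption, to show what a term weight could look like once documents have been preprocessed into token lists.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute TF-IDF weights.

    docs: list of token lists. Returns one {term: weight} dict per document,
    where weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        term_counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return weighted
```

With this weighting, a term that appears in every document gets weight 0, while a term concentrated in a few documents gets a higher weight.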
Preferably, the feature extraction module is specifically configured for:
Extracting the feature words in the text by keyword extraction or feature extraction;
Vectorizing the documents with the vector space model;
Calculating the similarity between documents, and selecting an appropriate algorithm to perform clustering.
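These three steps can be illustrated with a short sketch. Raw term counts as the vector representation, cosine similarity as the similarity measure, and single-pass "leader" clustering with a fixed threshold are all assumptions chosen for brevity; the source names only the vector space model and leaves the similarity measure and clustering algorithm unspecified.

```python
import math
from collections import Counter


def vectorize(tokens):
    """Represent a document as a sparse term-count vector."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.5):
    """Single-pass clustering.

    Each document joins the first cluster whose leader (its first member) is
    at least `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    vectors = [vectorize(d) for d in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Single-pass clustering depends on document order and on the threshold, so it is best read as a placeholder for whatever algorithm suits the data.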
In the present invention, forums and blogs in the field of the user enterprise are collected and the text in the web pages is extracted. Text emotional tendency analysis then yields the emotional tendency of each text and the entity it targets (the enterprise, its products, competitors, etc.), and a chart of image changes for the enterprise and its competitors is generated automatically, thereby improving the accuracy of judging the emotional tendency of text by text classification.
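The data behind such an image-change chart could be produced by aggregating the per-text results by entity and time period, as in the sketch below. The record layout, the label names and the net-score formula are assumptions made for illustration; the source does not describe how the chart is computed.

```python
from collections import defaultdict


def image_trend(records):
    """Aggregate labelled texts into a per-entity, per-period net score.

    records: iterable of (period, entity, label), with label in
             {"positive", "negative", "neutral"} (hypothetical label set).
    Returns {entity: {period: score}}, score = (positives - negatives) / total.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0, 0]))  # [pos, neg, total]
    for period, entity, label in records:
        cell = counts[entity][period]
        if label == "positive":
            cell[0] += 1
        elif label == "negative":
            cell[1] += 1
        cell[2] += 1
    return {
        entity: {
            period: (pos - neg) / total
            for period, (pos, neg, total) in sorted(periods.items())
        }
        for entity, periods in counts.items()
    }
```

Plotting each entity's scores against the period axis would give one line per enterprise or competitor.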
Brief description of the drawings
Fig. 1 shows a text emotional tendency analysis system proposed by an embodiment of the present invention;
Fig. 2 is a functional diagram of the text classification module in Fig. 1;
Fig. 3 is a functional diagram of the text clustering module in Fig. 1.
Embodiment
As shown in Fig. 1, an embodiment of the present invention proposes a text emotional tendency analysis system comprising: a sample training module 10, for receiving the text to be analyzed and training on samples to obtain a discrimination template; an entity extraction module 20, for extracting the entities in the text to be discriminated and filtering out text that contains no entity; a feature extraction module 30, for extracting the tendency-related features in the text (polarity words, dimension words, qualifiers, negation words, etc.); and an emotional tendency recognition module 40, for discriminating the tendency of the text using the maximum entropy method. In addition, an entity dictionary, a polarity dictionary, a dimension dictionary, a qualifier dictionary and other related dictionaries are built before tendency analysis is performed on the text.
The function of the entity extraction module 20 is shown in Fig. 2 and comprises the following. First, preprocessing: the document set is represented in a form that is easy for a computer to process, according to the classification model adopted. Second, the calculation of term weights: the importance of each term in a document is expressed using a suitable weight calculation method. Third, a model is trained on the preprocessed training set (documents whose classes are already known) to construct a classifier. Finally, the performance of the resulting classifier is tested on the test-set documents with a given test method, and feedback and learning are applied continuously to improve the classifier's performance, until it ...
The function of the feature extraction module 30 is shown in Fig. 3 and comprises: extracting the feature words in the text by keyword extraction or feature extraction, then vectorizing the documents with the vector space model, and finally calculating the similarity between documents and selecting an appropriate algorithm to perform clustering.
The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any equivalent replacement or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims (7)

1. A text emotional tendency analysis system, characterized in that it comprises:
A sample training module, for receiving the text to be analyzed and training on samples to obtain a discrimination template;
An entity extraction module, for extracting the entities in the text to be discriminated and filtering out text that contains no entity;
A feature extraction module, for extracting the tendency-related features in the text;
An emotional tendency recognition module, for discriminating the tendency of the text using the maximum entropy method.
2. The text emotional tendency analysis system according to claim 1, characterized in that the tendency-related features comprise: polarity words, dimension words, qualifiers and negation words.
3. The text emotional tendency analysis system according to claim 1, characterized in that an entity dictionary, a polarity dictionary, a dimension dictionary, a qualifier dictionary and other related dictionaries are built before tendency analysis is performed on the text.
4. The text emotional tendency analysis system according to claim 1, characterized in that the entity extraction module is specifically configured for:
Preprocessing;
Calculating term weights;
Training a model on the preprocessed training set to construct a classifier;
Testing the performance of the resulting classifier on the test-set documents with a given test method, and continuously feeding back and learning to improve the classifier's performance, until it ...
5. The text emotional tendency analysis system according to claim 4, characterized in that the preprocessing is specifically: representing the document set in a form that is easy for a computer to process, according to the classification model adopted.
6. The text emotional tendency analysis system according to claim 4, characterized in that the calculation of term weights is specifically: expressing the importance of each term in a document using a suitable weight calculation method.
7. The text emotional tendency analysis system according to claim 1, characterized in that the feature extraction module is specifically configured for:
Extracting the feature words in the text by keyword extraction or feature extraction;
Vectorizing the documents with the vector space model;
Calculating the similarity between documents, and selecting an appropriate algorithm to perform clustering.
CN201410537881.0A 2014-10-13 2014-10-13 Analysis system of emotional tendency of text Pending CN104281694A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410537881.0A CN104281694A (en) 2014-10-13 2014-10-13 Analysis system of emotional tendency of text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410537881.0A CN104281694A (en) 2014-10-13 2014-10-13 Analysis system of emotional tendency of text

Publications (1)

Publication Number Publication Date
CN104281694A true CN104281694A (en) 2015-01-14

Family

ID=52256567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410537881.0A Pending CN104281694A (en) 2014-10-13 2014-10-13 Analysis system of emotional tendency of text

Country Status (1)

Country Link
CN (1) CN104281694A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894102A (en) * 2010-07-16 2010-11-24 浙江工商大学 Method and device for analyzing emotion tendentiousness of subjective text
CN102663046A (en) * 2012-03-29 2012-09-12 中国科学院自动化研究所 Sentiment analysis method oriented to micro-blog short text
CN104182387A (en) * 2014-07-21 2014-12-03 安徽华贞信息科技有限公司 Text emotional tendency analysis system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邓时滔: "中文文本情感倾向性分类研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407177A (en) * 2016-08-26 2017-02-15 西南大学 Emergency online group behavior detection method based on clustering analysis
CN107102984A (en) * 2017-04-21 2017-08-29 中央民族大学 A kind of Tibetan language microblog emotional sentiment classification method and system
CN107194739A (en) * 2017-05-25 2017-09-22 上海耐相智能科技有限公司 A kind of intelligent recommendation system based on big data
CN107194739B (en) * 2017-05-25 2018-10-26 广州百奕信息科技有限公司 A kind of intelligent recommendation system based on big data
CN107526831A (en) * 2017-09-04 2017-12-29 华为技术有限公司 A kind of natural language processing method and apparatus
US11630957B2 (en) 2017-09-04 2023-04-18 Huawei Technologies Co., Ltd. Natural language processing method and apparatus
WO2019174423A1 (en) * 2018-03-16 2019-09-19 北京国双科技有限公司 Entity sentiment analysis method and related apparatus
CN109165298A (en) * 2018-08-15 2019-01-08 上海文军信息技术有限公司 A kind of text emotion analysis system of autonomous upgrading and anti-noise
CN109165298B (en) * 2018-08-15 2022-11-15 上海五节数据科技有限公司 Text emotion analysis system capable of achieving automatic upgrading and resisting noise
CN109783800A (en) * 2018-12-13 2019-05-21 北京百度网讯科技有限公司 Acquisition methods, device, equipment and the storage medium of emotion keyword
CN109783800B (en) * 2018-12-13 2024-04-12 北京百度网讯科技有限公司 Emotion keyword acquisition method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN104281694A (en) Analysis system of emotional tendency of text
Wen et al. Emotion classification in microblog texts using class sequential rules
CN104598535B (en) A kind of event extraction method based on maximum entropy
CN104331506A (en) Multiclass emotion analyzing method and system facing bilingual microblog text
CN103617157A (en) Text similarity calculation method based on semantics
CN104268160A (en) Evaluation object extraction method based on domain dictionary and semantic roles
CN103336766A (en) Short text garbage identification and modeling method and device
CN103744953A (en) Network hotspot mining method based on Chinese text emotion recognition
CN104361037B (en) Microblogging sorting technique and device
CN104317965A (en) Establishment method of emotion dictionary based on linguistic data
CN106682123A (en) Hot event acquiring method and device
US9652997B2 (en) Method and apparatus for building emotion basis lexeme information on an emotion lexicon comprising calculation of an emotion strength for each lexeme
CN108959329A (en) A kind of file classification method, device, medium and equipment
CN108090178A (en) A kind of text data analysis method, device, server and storage medium
CN104008187A (en) Semi-structured text matching method based on the minimum edit distance
CN103617245A (en) Bilingual sentiment classification method and device
CN104794209B (en) Chinese microblogging mood sorting technique based on Markov logical network and system
CN103927342A (en) Vertical search engine system on basis of big data
CN104268214B (en) A kind of user's gender identification method and system based on microblog users relation
Campbell et al. Content+ context networks for user classification in twitter
CN103744958A (en) Webpage classification algorithm based on distributed computation
CN102541935A (en) Novel Chinese Web document representing method based on characteristic vectors
CN114065749A (en) Text-oriented Guangdong language recognition model and training and recognition method of system
CN105243095A (en) Microblog text based emotion classification method and system
CN113626604A (en) Webpage text classification system based on maximum interval criterion

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150114
