
A Korean NLP Python Library for Economic Analysis

License

Notifications You must be signed in to change notification settings

an-seunghwan/eKoNLPy

 
 


eKo(nomic)NLPy

eKoNLPy is a Python library for Korean natural language processing (NLP), designed for economic analysis.

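Built on KoNLPy's Mecab tagger, it adds post-processing that classifies economic terms, financial institutions, company names, and similar expressions as single nouns.

Includes a Sentiment Analysis feature for judging the tone (hawkish/dovish) of monetary policy.

Includes an Economic Uncertainty Analysis feature for judging the uncertainty (uncertain/stable) of the economy.

Includes a Topic Analysis feature for classifying the topics of economic documents.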

Usage

Part of speech tagging

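As with KoNLPy, call Mecab.pos(phrase). The text is first processed by KoNLPy's Mecab morphological analyzer. Then, if a combination of consecutive tokens that matches a registered template is found in the user dictionary, that combination is segmented as a single compound noun.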

from ekonlpy.tag import Mecab
mecab = Mecab()
mecab.pos('금통위는 따라서 물가안정과 병행, 경기상황에 유의하는 금리정책을 펼쳐나가기로 했다고 밝혔다.')

> [('금통위', 'NNG'), ('는', 'JX'), ('따라서', 'MAJ'), ('물가', 'NNG'), ('안정', 'NNG'), ('과', 'JC'), ('병행', 'NNG'), (',', 'SC'), ('경기', 'NNG'), ('상황', 'NNG'), ('에', 'JKB'), ('유의', 'NNG'), ('하', 'XSV'), ('는', 'ETM'), ('금리정책', 'NNG'), ('을', 'JKO'), ('펼쳐', 'VV+EC'), ('나가', 'VX'), ('기', 'ETN'), ('로', 'JKB'), ('했', 'VV+EP'), ('다고', 'EC'), ('밝혔', 'VV+EP'), ('다', 'EF'), ('.', 'SF')]
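The post-processing step described above can be illustrated with a minimal sketch. It is not the library's implementation: the function name `merge_compounds`, the `max_len` parameter, and the single "run of NNG tokens" template are assumptions made for illustration. It shows how consecutive tokens might be merged when their joined form appears in a user dictionary.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (surface form, POS tag)


def merge_compounds(tokens: List[Token], user_dict: Set[str], max_len: int = 3) -> List[Token]:
    """Greedily merge runs of consecutive NNG tokens whose joined form is in user_dict.

    Hypothetical helper: the only "template" here is a run of NNG tokens.
    """
    merged: List[Token] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest window first so longer dictionary entries win.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tokens[i:i + n]
            if all(tag == "NNG" for _, tag in window):
                candidate = "".join(word for word, _ in window)
                if candidate in user_dict:
                    merged.append((candidate, "NNG"))
                    i += n
                    matched = True
                    break
        if not matched:
            merged.append(tokens[i])
            i += 1
    return merged


raw = [("금리", "NNG"), ("정책", "NNG"), ("을", "JKO")]
result = merge_compounds(raw, {"금리정책"})
```

With the dictionary containing '금리정책', the two noun tokens are merged into one, which matches the ('금리정책', 'NNG') entry in the output above. Tokens whose joined form is not in the dictionary, such as '물가' and '안정', stay separate.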

Lemmatisation and synonyms

To improve the accuracy of sentiment analysis, the library provides synonym handling and lemmatization.
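As a rough illustration of what synonym handling and lemmatization do to a token stream, the sketch below normalizes tagged tokens with two small lookup tables. The tables `SYNONYMS` and `LEMMAS`, their entries, and the function `normalize` are hypothetical and do not reflect the library's actual dictionaries or API.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (surface form, POS tag)

# Hypothetical lookup tables, for illustration only.
SYNONYMS: Dict[str, str] = {"금통위": "금융통화위원회"}
LEMMAS: Dict[str, str] = {"밝혔": "밝히", "했": "하"}


def normalize(tokens: List[Token]) -> List[Token]:
    """Map synonyms to one canonical form and inflected verbs to a lemma."""
    normalized = []
    for word, tag in tokens:
        word = SYNONYMS.get(word, word)
        # Only verb-like tags (VV, VX, ...) are looked up in the lemma table.
        if tag.startswith("V"):
            word = LEMMAS.get(word, word)
        normalized.append((word, tag))
    return normalized


normalized_tokens = normalize([("금통위", "NNG"), ("밝혔", "VV+EP"), ("다", "EF")])
```

Collapsing variants onto one form means a sentiment lexicon needs only one entry per concept, which is one way such normalization can raise matching accuracy.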

Add words to dictionary

The Mecab class in ekonlpy.tag can add words to the dictionary with add_dictionary, which accepts a str or a list of str.

from ekonlpy.tag import Mecab
mecab = Mecab()
mecab.add_dictionary('금통위', 'NNG')

Sentiment analysis

To use the Korean Monetary Policy dictionary, create an instance of the MPKO class in ekonlpy.sentiment:

from ekonlpy.sentiment import MPKO
mpko = MPKO(kind=1)
tokens = mpko.tokenize(text)
score = mpko.get_score(tokens)

kind parameter for the MPKO class: selects the lexicon file

0: a lexicon file generated using a naive Bayes classifier with 5-gram tokens as features and
    changes in call rates as positive/negative labels.

1: a lexicon file generated by the polarity induction and seed propagation method with 5-gram tokens.
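To make lexicon-based scoring concrete, here is a minimal sketch of how a polarity score could be computed from tokens and a polarity lexicon. It is an assumption for illustration, not the MPKO implementation: the function `lexicon_score`, the toy lexicon entries, and the returned keys are all hypothetical, and the real get_score may return a different structure.

```python
from typing import Dict, List


def lexicon_score(tokens: List[str], lexicon: Dict[str, int]) -> Dict[str, float]:
    """Count positive and negative lexicon hits and derive a polarity in [-1, 1]."""
    positive = sum(1 for token in tokens if lexicon.get(token, 0) > 0)
    negative = sum(1 for token in tokens if lexicon.get(token, 0) < 0)
    matched = positive + negative
    # With no lexicon hits the text is treated as neutral (polarity 0.0).
    polarity = (positive - negative) / matched if matched else 0.0
    return {"positive": positive, "negative": negative, "polarity": polarity}


# Toy lexicon: +1 marks a positive entry, -1 a negative entry.
toy_lexicon = {"인상": 1, "인하": -1}
toy_score = lexicon_score(["금리", "인상", "인상", "인하"], toy_lexicon)
```

Here two positive hits and one negative hit give a polarity of one third. Tokens that are absent from the lexicon, such as '금리', do not affect the score.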

To analyze monetary policy sentiment with a classifier, use the MPCK class in ekonlpy.sentiment.

from ekonlpy.sentiment import MPCK
mpck = MPCK()
tokens = mpck.tokenize(text)
ngrams = mpck.ngramize(tokens)
score = mpck.classify(tokens + ngrams, intensity_cutoff=1.5)

The intensity_cutoff parameter sets the intensity threshold used to classify sentences with low classification accuracy as neutral. (default: 1.3)
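The effect of a cutoff like this can be sketched as follows. The sketch assumes that "intensity" is the ratio of the larger class score to the smaller one; that definition, the function `label_with_cutoff`, and its arguments are assumptions for illustration, and the MPCK classifier may compute intensity differently.

```python
def label_with_cutoff(pos_score: float, neg_score: float, intensity_cutoff: float = 1.3) -> str:
    """Return 'positive', 'negative', or 'neutral' for a pair of class scores.

    Assumption: intensity is the larger score divided by the smaller one.
    """
    if pos_score <= 0 or neg_score <= 0:
        raise ValueError("scores must be positive")
    intensity = max(pos_score, neg_score) / min(pos_score, neg_score)
    # Weakly separated scores are treated as neutral.
    if intensity < intensity_cutoff:
        return "neutral"
    return "positive" if pos_score > neg_score else "negative"


weak = label_with_cutoff(0.55, 0.45)    # ratio is about 1.22, below the 1.3 default
strong = label_with_cutoff(0.70, 0.30)  # ratio is about 2.33
```

Under this assumption, raising the cutoff (for example to 1.5, as in the call above) moves more borderline sentences into the neutral class.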

KSA is a Korean sentiment analyzer for general Korean text. It uses Kkma, the morphological analyzer developed by the IDS Lab at Seoul National University, together with the same lab's sentiment dictionary. (Reference: https://kkma.snu.ac.kr/)

from ekonlpy.sentiment import KSA
ksa = KSA()
tokens = ksa.tokenize(text)
score = ksa.get_score(tokens)

Similarly, to use the Harvard IV-4 dictionary for general English sentiment analysis:

from ekonlpy.sentiment import HIV4
hiv = HIV4()
tokens = hiv.tokenize(text)
score = hiv.get_score(tokens)

Similarly, to use the Loughran and McDonald dictionary for financial domain sentiment analysis:

from ekonlpy.sentiment import LM
lm = LM()
tokens = lm.tokenize(text)
score = lm.get_score(tokens)

Economic uncertainty analysis

To use the Korean Economic Uncertainty dictionary, create an instance of the EUKO class in ekonlpy.sentiment:

from ekonlpy.sentiment import EUKO
euko = EUKO(kind=1)
tokens = euko.tokenize(text)
score = euko.get_score(tokens)

kind parameter for the EUKO class: selects the lexicon file

0: a lexicon file generated using a naive Bayes classifier with 5-gram tokens as features and levels of VKOSPI as positive/negative labels.
1: a lexicon file generated by the seed propagation method with 5-gram tokens.
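Both lexicons are described as being built from n-gram tokens. The sketch below shows one way n-grams of up to five tokens could be generated from a token list. The function `build_ngrams`, the `;` separator, and the choice to start at bigrams are assumptions for illustration and are not taken from the library's ngramize implementation.

```python
from typing import List


def build_ngrams(tokens: List[str], max_n: int = 5, sep: str = ";") -> List[str]:
    """Join every run of 2..max_n consecutive tokens into one n-gram string."""
    ngrams = []
    for n in range(2, max_n + 1):
        # Slide a window of width n across the token list.
        for start in range(len(tokens) - n + 1):
            ngrams.append(sep.join(tokens[start:start + n]))
    return ngrams


sample_ngrams = build_ngrams(["금리", "인상", "전망"], max_n=3)
```

Unigrams are left out here because the classifier example earlier passes tokens + ngrams together, which suggests the n-gram list is meant to supplement the single tokens.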

Topic analysis

To analyze monetary policy topics, create an instance of the MPTK class in ekonlpy.topic:

from ekonlpy.topic import MPTK
mptk = MPTK()
tokens = mptk.nouns(text)
bow = mptk.doc2bow(tokens)
dtm = mptk.get_document_topic(bow)

Parameters for the get_document_topic function

include_names: If True, return a list of tuples that include topic names.
                    ex) (topic_id, topic_name, topic_weight)
               If False (default), return a list of tuples without topic names.
                    ex) (topic_id, topic_weight)

min_weight: If min_weight is set, return only topics whose weight is greater than min_weight.
            Otherwise, return all available topics.
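The behaviour of these two parameters can be sketched on plain data. The helper `format_topics`, the sample weights, and the topic names below are hypothetical and only mirror the description above; they are not the library's code or its actual topic labels.

```python
from typing import Dict, List, Optional, Tuple


def format_topics(
    topic_weights: List[Tuple[int, float]],
    topic_names: Dict[int, str],
    include_names: bool = False,
    min_weight: Optional[float] = None,
) -> List[tuple]:
    """Filter (topic_id, weight) pairs by weight and optionally attach names."""
    formatted = []
    for topic_id, weight in topic_weights:
        # Keep only topics strictly above min_weight when it is set.
        if min_weight is not None and weight <= min_weight:
            continue
        if include_names:
            formatted.append((topic_id, topic_names[topic_id], weight))
        else:
            formatted.append((topic_id, weight))
    return formatted


weights = [(0, 0.62), (1, 0.05), (2, 0.33)]
names = {0: "Inflation", 1: "Labor", 2: "Growth"}  # hypothetical topic names
named = format_topics(weights, names, include_names=True, min_weight=0.1)
```

With min_weight=0.1, the low-weight topic 1 is dropped, and include_names=True adds the name as the middle element of each tuple.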

Install

$ git clone https://github.com/entelecheia/eKoNLPy.git

$ cd eKoNLPy

$ pip install .

$ pip install . --upgrade  # to upgrade an existing installation

Requires

  • KoNLPy >= 0.4.4
  • nltk >= 2.0
  • gensim >= 3.1.0
  • scipy >= 0.19.1
  • numpy >= 1.13

License

eKoNLPy is open source software released under the GPL v3 license.

  • Lee, Young Joon, eKoNLPy: A Korean NLP Python Library for Economic Analysis, 2018. https://github.com/entelecheia/eKoNLPy.

  • Lee, Young Joon, Soohyon Kim, and Ki Young Park. "Deciphering Monetary Policy Board Minutes with Text Mining: The Case of South Korea." Korean Economic Review 35 (2019): 471-511.

BibTeX entry:

@misc{lee2018ekonlpy,
    author= {Lee, Young Joon},
    year  = {2018},
    title = {{eKoNLPy: A Korean NLP Python Library for Economic Analysis}},
    note  = {\url{https://github.com/entelecheia/eKoNLPy}}
}
