Sentiment analysis library for the Russian language
Please note that Dostoevsky supports only Python 3.6 (Python 3.7+ will be supported once TensorFlow supports it, sorry).
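If you are unsure which interpreter is active, a quick check before installing can save a failed install. The snippet below is only a sketch: `SUPPORTED` and `is_supported` are hypothetical names that are not part of Dostoevsky, and the version they encode is taken from the note above.

```python
import sys

# Assumption taken from the note above: only Python 3.6 is supported.
SUPPORTED = (3, 6)


def is_supported(version_info=sys.version_info):
    """Return True when the interpreter's major.minor matches SUPPORTED."""
    return tuple(version_info[:2]) == SUPPORTED


if not is_supported():
    # Report both the expected and the detected major.minor versions.
    print('Dostoevsky targets Python %d.%d; found %d.%d'
          % (SUPPORTED + tuple(sys.version_info[:2])))
```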
$ pip install dostoevsky
This model was trained on the RuSentiment dataset and achieves an F1 score of up to ~0.70.
First of all, you'll need to download the pretrained word embeddings and the model:
$ dostoevsky download vk-embeddings cnn-social-network-model
Then, we can build our pipeline: text -> tokenizer -> word embeddings -> CNN
from dostoevsky.tokenization import UDBaselineTokenizer
from dostoevsky.word_vectors import SocialNetworkWordVectores
from dostoevsky.models import SocialNetworkModel
tokenizer = UDBaselineTokenizer()
tokens = tokenizer.split('всё очень плохо') # [('всё', 'ADJ'), ('очень', 'ADV'), ('плохо', 'ADV')]
word_vectors_container = SocialNetworkWordVectores()
vectors = word_vectors_container.get_word_vectors(tokens)
vectors.shape # (3, 300) - three words/vectors with dim=300
model = SocialNetworkModel(
    tokenizer=tokenizer,
    word_vectors_container=word_vectors_container,
    lemmatize=False,
)
model.predict(['наступили на ногу', 'всё суперски']) # array(['negative', 'positive'], dtype='<U8')
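Since `model.predict` returns one label per input text, as in the example above, a batch of texts can be summarized by counting labels. The following is a minimal sketch and not part of Dostoevsky: `label_counts` and `fake_predict` are hypothetical names, and `fake_predict` is a toy stand-in so the snippet runs without the downloaded model. In practice you would pass `model.predict` instead.

```python
from collections import Counter


def label_counts(predict, texts):
    """Count how many texts were assigned each sentiment label.

    `predict` is any callable that takes a list of strings and returns
    one label per string, in the same order.
    """
    return Counter(str(label) for label in predict(list(texts)))


def fake_predict(texts):
    # Hypothetical stand-in for model.predict: a toy keyword rule,
    # used only so the example runs without pretrained weights.
    return ['negative' if 'плохо' in text else 'positive' for text in texts]


counts = label_counts(fake_predict, ['всё очень плохо', 'всё суперски', 'плохо дело'])
print(counts.most_common())
```

Wrapping each label in `str` keeps the counter keys plain Python strings even when the predictions come back as a NumPy array.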