Source code for ISPRAS-2021 journal paper "Language Models Application in Sentiment Attitude Extraction Task" (in Russian)

nicolay-r/neural-networks-for-attitude-extraction

Neural Networks Applications in Sentiment Attitude Extraction

UPD January 10th, 2021: Most of these scripts have become part of the AREkit-0.22.0 demo and examples! [demo-readme]

This repository is an application of the AREkit framework's neural networks to the sentiment attitude extraction task [initial-paper], applied to document contexts:

Figure: Example of a context with attitudes mentioned in it; named entities «Russia» and «NATO» have the negative attitude towards each other with additional indication of other named entities.

It provides applications for:

Models List

Dependencies

  • Python-2.7
  • AREkit == 0.20.5

Installation

AREkit repository:

# Clone the repository into a folder next to the current project.
git clone -b 0.20.5-rc https://github.com/nicolay-r/AREkit ../arekit
# Install dependencies.
pip install -r ../arekit/requirements.txt

Prepare the data

We use the RusVectores news-2015 embedding:

mkdir -p data
curl https://rusvectores.org/static/models/rusvectores2/news_mystem_skipgram_1000_20_2015.bin.gz -o "data/news_rusvectores2.bin.gz"
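
If the download fails, curl may still write a small error page to the target path. A minimal sketch of a sanity check, assuming it is enough to confirm that the file starts with the gzip magic bytes; `looks_like_gzip` is a hypothetical helper and not part of this repository:

```python
import os


def looks_like_gzip(path):
    """Return True if `path` exists and starts with the gzip magic bytes.

    Hypothetical helper: it only inspects the first two bytes and does not
    verify that the archive holds a valid Word2Vec model.
    """
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"


if __name__ == "__main__":
    # Target path taken from the curl command above.
    print(looks_like_gzip("data/news_rusvectores2.bin.gz"))
```

A `False` result usually means the file is missing or that something other than the archive was saved.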

Application #1. Data Serialization

Use run_serialization.sh to prepare the data for a particular experiment:

python run_serialization.py \
    --cv-count 3 --frames-version v2_0 \
    --experiment rsr+ra --labels-count 3 --ra-ver v1_0 \
    --emb-filepath data/news_rusvectores2.bin.gz \
    --entity-fmt rus-simple --balance-samples True

Application #2. Training

Use run_train_classifier.sh to run an experiment:

CUDA_VISIBLE_DEVICES=0 python run_training.py --do-eval \
    --bags-per-minibatch 32 --dropout-keep-prob 0.80 --cv-count 3 \
    --labels-count 3 --experiment rsr+ra --model-input-type ctx --ra-ver v1_0 \
    --model-name cnn --test-every-k-epoch 5 --learning-rate 0.1 \
    --balanced-input True --train-acc-limit 0.99 --epochs 100
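
How the scheduling flags in this command might interact can be sketched as a plain loop. This is an interpretation based only on the flag names (`--epochs`, `--test-every-k-epoch`, `--train-acc-limit`, `--do-eval`); `run_training_loop`, `train_epoch` and `evaluate` are hypothetical names, and the actual script may behave differently:

```python
def run_training_loop(train_epoch, evaluate, epochs=100, test_every_k_epoch=5,
                      train_acc_limit=0.99, do_eval=True):
    """Hypothetical training schedule, not the repository's implementation.

    train_epoch(epoch) -> training accuracy in [0, 1] for that epoch.
    evaluate(epoch)    -> any evaluation result for that epoch.
    Returns (last_epoch, [(epoch, eval_result), ...]).
    """
    last_epoch = 0
    eval_results = []
    for epoch in range(1, epochs + 1):
        last_epoch = epoch
        train_acc = train_epoch(epoch)
        # Assumption: evaluation runs on every k-th epoch when enabled.
        if do_eval and epoch % test_every_k_epoch == 0:
            eval_results.append((epoch, evaluate(epoch)))
        # Assumption: training stops early once the accuracy limit is reached.
        if train_acc >= train_acc_limit:
            break
    return last_epoch, eval_results
```

Passing the two callbacks in keeps the sketch independent of any particular model or framework.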

Script Arguments Manual

Common flags:

  • --experiment -- the experiment type, one of:
    • rsr -- supervised learning and evaluation within the RuSentRel collection;
    • ra -- pretraining with the RuAttitudes collection;
    • rsr+ra -- combined training on RuSentRel and RuAttitudes, with evaluation.
  • --cv-count -- data folding mode:
    • 1 -- predefined separation of documents into TRAIN/TEST (RuSentRel);
    • k -- cross-validation with k folds (k=3 is supported);
  • --frames-version -- RuSentiFrames collection version:
    • v2_0 -- RuSentiFrames-2.0;
  • --ra-ver -- RuAttitudes version, when the collection is used (ra or rsr+ra experiments):
    • v1_2 -- the version from the RuAttitudes-1.0 paper;
    • v2_0_base;
    • v2_0_large;
    • v2_0_base_neut;
    • v2_0_large_neut.
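
The common flags can be mirrored in a small `argparse` parser, spelled with hyphens as in the example commands above. This is a sketch and not the repository's own parser: the function names, the default of 1 for `--cv-count`, and the rule that `--ra-ver` is mandatory for the `ra` and `rsr+ra` experiments are all assumptions.

```python
import argparse


def build_common_parser():
    """Hypothetical parser covering only the common flags listed above."""
    parser = argparse.ArgumentParser(description="Common experiment flags (sketch)")
    parser.add_argument("--experiment", choices=["rsr", "ra", "rsr+ra"],
                        required=True, help="experiment type")
    # 1 = predefined TRAIN/TEST separation, 3 = three-fold cross-validation.
    parser.add_argument("--cv-count", type=int, choices=[1, 3], default=1,
                        help="data folding mode")
    parser.add_argument("--frames-version", default=None,
                        help="RuSentiFrames collection version")
    parser.add_argument("--ra-ver", default=None,
                        help="RuAttitudes version (ra or rsr+ra experiments)")
    return parser


def parse_common_args(argv):
    """Parse argv and apply the assumed dependency between flags."""
    parser = build_common_parser()
    args = parser.parse_args(argv)
    # Assumption: a RuAttitudes version must be given whenever it is used.
    if args.experiment in ("ra", "rsr+ra") and args.ra_ver is None:
        parser.error("--ra-ver is required for the 'ra' and 'rsr+ra' experiments")
    return args
```

For example, `parse_common_args(["--experiment", "rsr"])` succeeds without `--ra-ver`, while the same call with `ra` exits with a usage error.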

Training-specific flags:

  • --model-name -- model to train (see [list]);
  • --do-eval -- activates evaluation during the training process;
  • --bags-per-minibatch -- number of bags per mini-batch;
  • --balanced-input -- flag indicating that a balanced collection is used for model training;
  • --emb-filepath -- path to the Word2Vec model;
  • --entity-fmt -- entity formatting type:
    • rus-simple -- uses Russian masks: объект (object), субъект (subject), сущность (entity);
    • sharp-simple -- uses BERT-related notation for meta tokens: #O (object), #S (subject), #E (entity);
  • --balance-samples -- activates sample balancing.
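
The effect of the two `--entity-fmt` options can be illustrated with a small masking routine. This is a sketch under assumptions rather than AREkit's formatter: `mask_entities`, its token-index interface, and the example sentence are made up for illustration, and the Russian masks are written in their uninflected dictionary form.

```python
# -*- coding: utf-8 -*-

# Mask tokens per formatting type, as described in the flag list above.
ENTITY_MASKS = {
    "rus-simple": {"object": u"объект", "subject": u"субъект", "other": u"сущность"},
    "sharp-simple": {"object": u"#O", "subject": u"#S", "other": u"#E"},
}


def mask_entities(tokens, subject_idx, object_idx, entity_idxs, entity_fmt):
    """Replace entity tokens with masks (hypothetical helper).

    tokens      -- list of tokens of one context
    subject_idx -- index of the attitude subject
    object_idx  -- index of the attitude object
    entity_idxs -- indices of all named entities in the context
    entity_fmt  -- "rus-simple" or "sharp-simple"
    """
    masks = ENTITY_MASKS[entity_fmt]
    masked = list(tokens)
    for i in entity_idxs:
        masked[i] = masks["other"]
    # Subject and object masks take priority over the generic entity mask.
    masked[subject_idx] = masks["subject"]
    masked[object_idx] = masks["object"]
    return masked


if __name__ == "__main__":
    tokens = ["Russia", "accused", "NATO", "of", "provoking", "Ukraine"]
    print(mask_entities(tokens, 0, 2, [0, 2, 5], "sharp-simple"))
    # → ['#S', 'accused', '#O', 'of', 'provoking', '#E']
```

Masking hides the concrete entity names, so the classifier sees only the roles of the attitude participants and of the other entities.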
