metaphor-detection

Data format

The scripts assume that the data is a tab-separated file with the following columns:

document_name: ID of the document that the current sentence belongs to (use dummy IDs if the data has no document structure). It is used to retrieve preceding sentences, which give the model additional context.
idx: position of the current sentence within its document.
sentence_words: words in the sentence (list of strings).
met_type: metaphor annotations for the sentence (optional), given as a list of dicts with 'type' and 'word_indices' keys, as in the snippet below.
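
To illustrate how document_name and idx can be combined to recover preceding context, here is a minimal sketch. It is not the repository's implementation: the function name `previous_sentences` and the row layout (dicts keyed by the column names above) are assumptions made for illustration, and `num_prev` is only meant to mirror what an option such as `--history_prev_sents` presumably controls.

```python
from collections import defaultdict


def previous_sentences(rows, num_prev):
    """For each row, collect the words of up to `num_prev` preceding sentences
    from the same document (hypothetical helper, not part of the repository)."""
    # Group row positions by the document they belong to
    by_doc = defaultdict(list)
    for pos, row in enumerate(rows):
        by_doc[row["document_name"]].append(pos)

    history = [None] * len(rows)
    for positions in by_doc.values():
        # Order the sentences of one document by their idx
        positions.sort(key=lambda p: rows[p]["idx"])
        for rank, pos in enumerate(positions):
            start = max(0, rank - num_prev)
            history[pos] = [rows[p]["sentence_words"] for p in positions[start:rank]]
    return history


# Made-up rows; the documents may be interleaved in the input
rows = [
    {"document_name": "doc_a", "idx": 0, "sentence_words": ["First", "."]},
    {"document_name": "doc_a", "idx": 1, "sentence_words": ["Second", "."]},
    {"document_name": "doc_b", "idx": 0, "sentence_words": ["Other", "."]},
    {"document_name": "doc_a", "idx": 2, "sentence_words": ["Third", "."]},
]
context = previous_sentences(rows, num_prev=1)
print(context[3])
```

With `num_prev=0` every history is empty, which corresponds to using no additional context. Sentences from other documents never leak into the history, which is why dummy document IDs are needed when the data has no document structure.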

See cjvt/komet, cjvt/gkomet, and matejklemen/vuamc for examples of datasets formatted in this way. Below is a snippet showing what the data should look like:

document_name	idx	idx_paragraph	idx_sentence	sentence_words	met_type	met_frame	met_type_mapped
komet1484.div.xml	60	31	0	['»', 'Jaz', 'grem', 'tudi', 'sam', 'domov', '!', '«', 'je', 'Maticu', 'zasijal', 'obraz', '.']	[{'type': 'MRWi', 'word_indices': [10]}]	[{'type': 'cause_visual_change', 'word_indices': [10]}]
komet1484.div.xml	61	31	1	['»', 'Imaš', 'kaj', 'drobiža', '?', '«']	[]	[]
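
Because the list-valued cells are stored as Python literals, a file in this format can be read with the standard library alone. The following is a minimal sketch under stated assumptions: `read_rows` is a hypothetical helper (not part of the repository), and the sample is a shortened, illustrative version of the snippet above that keeps only the four documented columns.

```python
import ast
import csv
import io

# Shortened, illustrative sample restricted to the four documented columns
SAMPLE_TSV = (
    "document_name\tidx\tsentence_words\tmet_type\n"
    "komet1484.div.xml\t60\t['Maticu', 'zasijal', 'obraz', '.']\t"
    "[{'type': 'MRWi', 'word_indices': [1]}]\n"
    "komet1484.div.xml\t61\t['Imaš', 'kaj', 'drobiža', '?']\t[]\n"
)


def read_rows(handle):
    """Parse a tab-separated handle into a list of dicts (hypothetical helper)."""
    # QUOTE_NONE: quote characters in the cells are part of the data
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for raw in reader:
        row = dict(raw)
        row["idx"] = int(row["idx"])
        # List-valued cells are Python literals, so literal_eval parses them safely
        row["sentence_words"] = ast.literal_eval(row["sentence_words"])
        # met_type is optional: a missing or empty cell means "no annotations"
        row["met_type"] = ast.literal_eval(row["met_type"]) if row.get("met_type") else []
        rows.append(row)
    return rows


rows = read_rows(io.StringIO(SAMPLE_TSV))
for row in rows:
    # Look up the words that each annotation points to
    marked = [row["sentence_words"][i] for ann in row["met_type"] for i in ann["word_indices"]]
    print(row["idx"], marked)
```

For a file on disk, pass `open(path, encoding="utf-8", newline="")` instead of the `io.StringIO` object.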

Training the models

Token-level:

$ python3 metaphor_detection_token.py \
--mode="train" \
--experiment_dir="komet-fold1-xlmrbase-2e-5-binary3-history0-optthresh" \
--train_path=data/komet-5fold/fold1/train.tsv \
--dev_path=data/komet-5fold/fold1/dev.tsv \
--test_path=data/komet-5fold/fold1/test.tsv \
--history_prev_sents=0 \
--pretrained_name_or_path="xlm-roberta-base" \
--learning_rate=2e-5 \
--batch_size=120 \
--num_epochs=10 \
--validate_every_n_examples=3000 \
--early_stopping_rounds=100 \
--validation_metric="f1_score_binary" \
--random_seed=17 \
--type_scheme="binary" \
--mrwi \
--mrwd \
--mrwimp \
--wandb_project_name="metaphor-komet-token-span-optimization" \
--optimize_bin_threshold

Sentence-level:

$ python3 metaphor_detection_sentence.py \
--mode="train" \
--experiment_dir="komet-sent-fold0-sloberta-2e-5-binary3-history0-optthresh" \
--train_path=data/komet-sent-5fold/fold0/train.tsv \
--dev_path=data/komet-sent-5fold/fold0/dev.tsv \
--test_path=data/komet-sent-5fold/fold0/test.tsv \
--history_prev_sents=0 \
--pretrained_name_or_path="EMBEDDIA/sloberta" \
--learning_rate=2e-5 \
--batch_size=200 \
--num_epochs=10 \
--validate_every_n_examples=3000 \
--early_stopping_rounds=100 \
--validation_metric="f1_score_binary" \
--random_seed=17 \
--mrwi \
--mrwd \
--mrwimp \
--wandb_project_name="metaphor-komet-sentence" \
--optimize_bin_threshold
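
Both commands pass `--optimize_bin_threshold` together with `--validation_metric="f1_score_binary"`. This README does not describe what the flag does; judging only by its name, it presumably tunes the decision threshold of the binary metaphor prediction. The sketch below shows one common way to do that: a grid search for the threshold that maximizes F1 on development-set probabilities. It illustrates the general technique under that assumption and is not the repository's code; `f1_binary`, `best_f1_threshold`, and the sample values are hypothetical.

```python
def f1_binary(gold, pred):
    """F1 score of the positive class for 0/1 labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(gold, probas, num_steps=99):
    """Try evenly spaced thresholds in (0, 1) and keep the first one with the best F1."""
    best_thresh, best_f1 = 0.5, -1.0
    for step in range(1, num_steps + 1):
        thresh = step / (num_steps + 1)
        pred = [int(p >= thresh) for p in probas]
        score = f1_binary(gold, pred)
        if score > best_f1:
            best_thresh, best_f1 = thresh, score
    return best_thresh, best_f1


# Made-up development-set labels and positive-class probabilities
dev_gold = [0, 0, 1, 1, 1, 0]
dev_probas = [0.10, 0.35, 0.40, 0.80, 0.45, 0.30]
thresh, f1 = best_f1_threshold(dev_gold, dev_probas)
print(thresh, f1)
```

In such a setup the threshold is chosen on the development set only and then applied unchanged to the test set, so that the test score is not tuned on test data.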


matejklemen/metaphor-recognition

Folders and files

Latest commit

History

Repository files navigation

metaphor-detection

Data format

Training the models

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages