Sense embeddings

Dependencise

Pytorch
Numpy
transformers

Installation

Note that the server MUST be running on Python>=3.5 with Pytorch>=1.10.

Additional Packages

To install additional packages used by this project run:

pip install -r requirements.txt

External Data

Download pre-trained BERT (large-cased)

$ cd external/bert  # from repo home
$ wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-24_H-1024_A-16.zip
$ unzip cased_L-24_H-1024_A-16.zip

Download pre-trained GloVe (Common Crawl, 840B tokens)

$ cd external/glove  # from repo home
$ wget http:https://nlp.stanford.edu/data/glove.840B.300d.zip
$ unzip glove.840B.300d.zip

Download the WSD Evaluation Framework

$ cd external/wsd_eval  # from repo home
$ wget http:https://lcl.uniroma1.it/wsdeval/data/WSD_Evaluation_Framework.zip
$ unzip WSD_Evaluation_Framework.zip

Train the model

After obtaining word embeddings and contextualised embeddings, we can train the model using this:

usage: train_linear_diagonal.py [-h]
                                [--glove_embedding_path GLOVE_EMBEDDING_PATH]
                                [--num_epochs NUM_EPOCHS] [--loss {standard}]
                                [--emb_dim EMB_DIM]
                                [--diagonalize DIAGONALIZE] [--device DEVICE]
                                [--bert BERT] [--wsd_fw_path WSD_FW_PATH]
                                [--dataset {semcor,semcor_omsti}]
                                [--batch_size BATCH_SIZE] [--lr LR]
                                [--merge_strategy {mean,first,sum}]

Word Sense Mapping

optional arguments:
  -h, --help            show this help message and exit
  --glove_embedding_path GLOVE_EMBEDDING_PATH
  --num_epochs NUM_EPOCHS
  --loss {standard}
  --emb_dim EMB_DIM
  --diagonalize DIAGONALIZE
  --device DEVICE
  --bert BERT
  --wsd_fw_path WSD_FW_PATH
                        Path to Semcor
  --dataset {semcor,semcor_omsti}
                        Name of dataset
  --batch_size BATCH_SIZE
                        Batch size
  --lr LR               Learning rate
  --merge_strategy {mean,first,sum}
                        WordPiece Reconstruction Strategy

WSD Evaluation

Usage description.

$ python eval.py -h
usage: eval.py [-h] [-glove_embedding_path GLOVE_EMBEDDING_PATH]
               [-sv_path SV_PATH] [-load_weight_path LOAD_WEIGHT_PATH]
               [-wsd_fw_path WSD_FW_PATH]
               [-test_set {senseval2,senseval3,semeval2007,semeval2013,semeval2015,ALL}]
               [-batch_size BATCH_SIZE] [-merge_strategy MERGE_STRATEGY]
               [-ignore_pos] [-thresh THRESH] [-k K] [-quiet] [-device DEVICE]

WSD Evaluation.

optional arguments:
  -h, --help            show this help message and exit
  -glove_embedding_path GLOVE_EMBEDDING_PATH
  -sv_path SV_PATH      Path to sense vectors (default: data/vectors/senseMatr
                        ix.semcor_diagonal_linear_large_300_200.npz)
  -load_weight_path LOAD_WEIGHT_PATH
  -wsd_fw_path WSD_FW_PATH
                        Path to WSD Evaluation Framework (default:
                        external/wsd_eval/WSD_Evaluation_Framework/)
  -test_set {senseval2,senseval3,semeval2007,semeval2013,semeval2015,ALL}
                        Name of test set (default: ALL)
  -batch_size BATCH_SIZE
                        Batch size (default: 64)
  -merge_strategy MERGE_STRATEGY
                        WordPiece Reconstruction Strategy (default: mean)
  -ignore_pos           Ignore POS features (default: True)
  -thresh THRESH        Similarity threshold (default: -1)
  -k K                  Number of Neighbors to accept (default: 2)
  -quiet                Less verbose (debug=False) (default: True)
  -device DEVICE

To replicate, use as follows:

$  python eval.py -sv_path data/vectors/senseMatrix.semcor_diagonal_linear_large_300_200.npz -test_set ALL

WiC evaluation

You'll need to download the WiC dataset and place it in 'external/wic/':

$ cd external/wic
$ wget https://pilehvar.github.io/wic/package/WiC_dataset.zip
$ unzip WiC_dataset.zip

Sense Comparison

Usage description.

usage: train_classifier.py [-h] [--emb_dim EMB_DIM]
                           [-glove_embedding_path GLOVE_EMBEDDING_PATH]
                           [-eval_set {train,dev,test}] [-sv_path SV_PATH]
                           [-load_weight_path LOAD_WEIGHT_PATH]
                           [-out_path OUT_PATH] [-device DEVICE]

Evaluation of WiC solution using LMMS for sense comparison.

optional arguments:
  -h, --help            show this help message and exit
  --emb_dim EMB_DIM
  -glove_embedding_path GLOVE_EMBEDDING_PATH
  -eval_set {train,dev,test}
                        Evaluation set
  -sv_path SV_PATH      Path to sense vectors
  -load_weight_path LOAD_WEIGHT_PATH
  -out_path OUT_PATH    Path to .pkl classifier generated
  -device DEVICE

To train binary classifier, use:

$ python train_classifier.py

Evaluation using Classifier

$ python eval_classifier_wic.py

Name		Name	Last commit message	Last commit date
Latest commit History 246 Commits
external		external
README.md		README.md
eval.py		eval.py
eval_classifier_wic.py		eval_classifier_wic.py
nns.py		nns.py
requirements.txt		requirements.txt
senseidx.txt		senseidx.txt
train.py		train.py
train_classifier.py		train_classifier.py
visualisation.py		visualisation.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sense embeddings

Table of Contents

Dependencise

Installation

Additional Packages

External Data

Train the model

WSD Evaluation

WiC evaluation

Sense Comparison

About

Releases

Packages

Languages

Jodiechou/senseEmbeddings

Folders and files

Latest commit

History

Repository files navigation

Sense embeddings

Table of Contents

Dependencise

Installation

Additional Packages

External Data

Train the model

WSD Evaluation

WiC evaluation

Sense Comparison

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages