Text similarity

Evaluating the semantic similarity between sentence pairs using GloVe embeddings and an LSTM in PyTorch.

Data

The dataset used for this project is the SICK dataset (Sentences Involving Compositional Knowledge, by Marelli et al.).

Process

1 - Words are vectorized using GloVe word embeddings.
2 - Both sentences are encoded into vectors by a PyTorch LSTM.
3 - The two sentence vectors are merged by multiplication and subtraction.
4 - The resulting vectors go through additional weight layers.
5 - A sigmoid function gives a similarity score between 0 and 1.
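
The five steps above can be sketched as a small PyTorch module. This is a minimal illustration under stated assumptions, not the notebook's actual model: the class name `SiameseLSTM`, the layer sizes, the use of the final LSTM hidden state as the sentence vector, the concatenation of the product and the difference, and the random stand-in for the GloVe matrix are all hypothetical. Padding of variable-length sentences is not handled here.

```python
import torch
import torch.nn as nn


class SiameseLSTM(nn.Module):
    """Hypothetical sketch of the five steps; not the notebook's actual model."""

    def __init__(self, embeddings, hidden_size=64, dense_size=32):
        super().__init__()
        # Step 1: frozen lookup table; in the project this would hold GloVe vectors.
        self.embedding = nn.Embedding.from_pretrained(embeddings, freeze=True)
        # Step 2: a single LSTM shared by both sentences.
        self.lstm = nn.LSTM(embeddings.size(1), hidden_size, batch_first=True)
        # Step 4: additional weight layers applied to the merged vector.
        self.dense = nn.Sequential(
            nn.Linear(2 * hidden_size, dense_size),
            nn.ReLU(),
            nn.Linear(dense_size, 1),
        )

    def encode(self, token_ids):
        # Assumption: the final hidden state serves as the sentence vector.
        _, (h_n, _) = self.lstm(self.embedding(token_ids))
        return h_n[-1]

    def forward(self, sent_a, sent_b):
        a, b = self.encode(sent_a), self.encode(sent_b)
        # Step 3: merge by element-wise multiplication and subtraction.
        merged = torch.cat([a * b, a - b], dim=1)
        # Step 5: the sigmoid maps each pair to a score between 0 and 1.
        return torch.sigmoid(self.dense(merged)).squeeze(1)


torch.manual_seed(0)
fake_glove = torch.randn(100, 50)        # stand-in for GloVe: 100 words, 50 dimensions
model = SiameseLSTM(fake_glove)
sent_a = torch.randint(0, 100, (4, 7))   # batch of 4 sentences, 7 token ids each
sent_b = torch.randint(0, 100, (4, 9))   # the paired sentences, 9 token ids each
scores = model(sent_a, sent_b)           # one similarity score per pair
```

Sharing one LSTM between both sentences keeps the two sentence vectors in the same space, which is what makes their element-wise product and difference meaningful.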

Note

This notebook was written in a fixed Kaggle environment and is not reproducible as-is on other platforms such as Colab. In particular, the torchtext module has been removed in recent PyTorch versions. Adapting the notebook to recent versions proved impossible, mainly because it relies on BucketIterator to process the sentence pairs simultaneously, and no equivalent feature was found in recent versions.
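
For readers who want to experiment with a port, the role BucketIterator played here, batching sentence pairs of similar length so that they need little padding, can be roughly approximated by hand. The following is a minimal sketch in plain Python, not an equivalent of the torchtext API; the function `bucket_batches` and the sample pairs are hypothetical.

```python
import random


def bucket_batches(pairs, batch_size, seed=0):
    """Return batches of (sentence_a, sentence_b) pairs with similar lengths."""
    rng = random.Random(seed)
    # Sort on the lengths of both sentences so neighbours need little padding.
    ordered = sorted(pairs, key=lambda p: (len(p[0]), len(p[1])))
    # Cut the sorted list into consecutive batches.
    batches = [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
    # Shuffle whole batches; each pair stays together inside its batch.
    rng.shuffle(batches)
    return batches


pairs = [
    (["a", "man", "is", "running"], ["a", "person", "runs"]),
    (["two", "dogs", "play"], ["dogs", "are", "playing", "outside"]),
    (["a", "child", "is", "eating", "an", "apple"], ["a", "kid", "eats", "fruit"]),
    (["a", "cat", "sleeps"], ["the", "cat", "is", "asleep"]),
]
batches = bucket_batches(pairs, batch_size=2)
```

Each batch would still need to be padded and converted to tensors before being fed to a model.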
