Comparison_Analysis_of_embeddings

Conducted intrinsic evaluation of word embeddings through analogy and similarity tests, comparing the custom Skip-gram model’s performance against Google’s pre-trained Word2Vec model to assess the quality of the generated word representations.

Part I: Text Classification

Approach 1: Rule-based classification

The rule-based classifier uses a small set of hand-crafted rules: in sentiment classification, some words are associated with positive sentiment and others with negative sentiment, and a document is labeled according to which rules its words trigger.

EVALUATION of the rule-based classifier with learnable weights: 70.975
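As an illustration, here is a minimal sketch of such a classifier. The lexicons and the scikit-learn setup for learning the rule weights are placeholders and assumptions, not the repository's actual rules:

```python
# A minimal sketch of a rule-based sentiment classifier. The lexicons below
# are illustrative placeholders, not the repository's actual rules.
POSITIVE = {"good", "great", "excellent", "love", "wonderful"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "boring"}

def rule_features(text):
    """Count lexicon hits; these counts are the classifier's 'rules'."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return [pos, neg]

def classify(text):
    """Fixed-weight variant: label by whichever lexicon fires more often."""
    pos, neg = rule_features(text)
    return "positive" if pos >= neg else "negative"

# "Learnable weights" variant: keep the hand-crafted features, but fit a
# weight for each rule on labeled data (using scikit-learn here as an
# assumption; the repository may learn the weights differently).
from sklearn.linear_model import LogisticRegression

def fit_weighted_rules(texts, labels):
    X = [rule_features(t) for t in texts]
    return LogisticRegression().fit(X, labels)
```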

Approach 2: Bag-of-Words (BoW)

The bag-of-words model represents a text as an unordered collection (or "bag") of its words. It is widely used in natural language processing and information retrieval (IR). The representation disregards word order, and with it any non-trivial notion of grammar, but preserves word multiplicity.

[Figure: bag-of-words representation]
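A minimal sketch of a BoW classifier, assuming a scikit-learn pipeline (CountVectorizer for the counts, logistic regression on top); the repository's own implementation may differ:

```python
# Illustrative BoW pipeline with scikit-learn (an assumption; the repository
# may build the counts by hand). CountVectorizer ignores word order but
# keeps multiplicity, exactly as described above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_bow(train_texts, train_labels):
    model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(train_texts, train_labels)
    return model

# Usage:
# model = train_bow(texts, labels)
# model.predict(["what a great movie"])
```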

EVALUATION of the BoW classifier: 80.675

Part II: Word2Vec

Word2vec is one of the most popular techniques for learning word embeddings. The idea behind word2vec is that the meaning of a word is determined by the contexts in which it occurs. A word embedding is a learned representation of text in which words with similar meanings have similar representations. The word2vec model has two architectures:

Continuous bag-of-words (CBOW):

[Figure: CBOW architecture]

The continuous bag-of-words (CBOW) model is a neural network that learns word embeddings by predicting a target word from its context. It takes a window of surrounding words as input and tries to predict the word in the center of the window. Trained on a large text corpus, the model learns these predictions from the co-occurrence patterns it observes, and the weights it learns become the word embeddings.
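For reference, a minimal sketch of CBOW training, assuming gensim (where sg=0 selects CBOW) and a toy placeholder corpus; the repository trains its own model rather than using this exact setup:

```python
# Training a CBOW model with gensim (sg=0 selects CBOW). The corpus and
# hyperparameters below are placeholders, not the repository's settings.
from gensim.models import Word2Vec

sentences = [["the", "quick", "brown", "fox"], ["a", "lazy", "dog"]]
cbow = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)
vector = cbow.wv["fox"]  # the learned 100-dimensional embedding for "fox"
```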

Skip-gram:

[Figure: skip-gram architecture]

Skip-gram is an unsupervised technique for finding the words most related to a given word: it predicts the context words for a given target word. It is the reverse of the CBOW architecture; here the target word is the input and the context words are the outputs.
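A matching sketch for skip-gram, again assuming gensim (sg=1 selects skip-gram) and the same kind of toy corpus:

```python
# Training a skip-gram model with gensim (sg=1): the target word predicts
# its context words instead of the other way around. Toy corpus again.
from gensim.models import Word2Vec

sentences = [["the", "quick", "brown", "fox"], ["a", "lazy", "dog"]]
skipgram = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
skipgram.wv.most_similar("fox", topn=5)  # the words most related to "fox"
```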

EVALUATION: Precision at 5 for the analogy test with the custom skip-gram model is 0.1111111111111111
EVALUATION: Precision at 5 for the analogy test with the Google skip-gram model is 0.0034
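A sketch of how precision at 5 on an analogy test (a : b :: c : d) can be computed, assuming gensim's KeyedVectors; the file path is a placeholder for the GoogleNews binary vectors, and `analogies` is a hypothetical list of 4-tuples of words:

```python
# Precision-at-5 on analogy questions (a : b :: c : d): the analogy holds
# if d appears among the 5 nearest neighbours of b - a + c.
from gensim.models import KeyedVectors

def precision_at_5(wv, analogies):
    hits = 0
    for a, b, c, d in analogies:
        top5 = [w for w, _ in
                wv.most_similar(positive=[b, c], negative=[a], topn=5)]
        hits += d in top5
    return hits / len(analogies)

# Usage (paths and data are assumptions):
# google = KeyedVectors.load_word2vec_format(
#     "GoogleNews-vectors-negative300.bin", binary=True)
# analogies = [("man", "king", "woman", "queen"), ...]
# precision_at_5(google, analogies)
```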
