CricketSemantics

CricketSemantics is an NLP project that includes:

A cricket commentary dataset and a scraping engine that collects the data from Cricbuzz
A semantic search engine built with SentenceTransformers and FAISS
Doc2Vec sentence embeddings and a KMeans clustering of those embeddings (n=5 clusters)

Data Scraping

Scraped the commentary data from Cricbuzz
Used Scrapy to scrape the data
Data here
Full blog post on how to do this here

Commentary Search Engine

Index file is here
Training code
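
To illustrate what the search engine does at query time, here is a minimal sketch of embedding-based retrieval. It is not the code from this repo: a bag-of-words `embed` function stands in for the SentenceTransformers encoder, and a plain NumPy matrix product stands in for the FAISS index, so the sketch runs without downloading a model or the index file. All function names and the sample commentary lines are hypothetical.

```python
import re

import numpy as np


def tokenize(text):
    """Lowercase a commentary line and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(texts):
    """Map every distinct token in the corpus to a column index."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def embed(texts, vocab):
    """Stand-in encoder: unit-length bag-of-words vectors (not a real sentence model)."""
    vectors = np.zeros((len(texts), len(vocab)), dtype=np.float32)
    for row, text in enumerate(texts):
        for token in tokenize(text):
            col = vocab.get(token)
            if col is not None:  # tokens unseen in the corpus are ignored
                vectors[row, col] += 1.0
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, 1e-12)


def search(corpus_vectors, query_vector, k=3):
    """Return the k best (row index, cosine score) pairs, best first."""
    scores = corpus_vectors @ query_vector  # cosine similarity on unit vectors
    top = np.argsort(-scores, kind="stable")[:k]
    return [(int(i), float(scores[i])) for i in top]


if __name__ == "__main__":
    # Hypothetical commentary lines, for illustration only.
    commentary = [
        "Kohli drives through the covers for four",
        "Bumrah bowls a yorker and the stumps are shattered",
        "Huge six over long on by Dhoni",
    ]
    vocab = build_vocab(commentary)
    corpus_vectors = embed(commentary, vocab)
    query_vector = embed(["yorker hits the stumps"], vocab)[0]
    for idx, score in search(corpus_vectors, query_vector, k=2):
        print(f"{score:.3f}  {commentary[idx]}")
```

With unit-length vectors the inner product equals cosine similarity, which is also what an exact inner-product FAISS index (`faiss.IndexFlatIP`) computes. In a SentenceTransformers and FAISS setup, `embed` would be replaced by the model's `encode` call and the matrix product by the index's `search` method.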

How to run locally.

Download this repo to your local system.

Then install the dependencies:

pip install -r requirements.txt

Then, from your command line or terminal, run:

python3 semanticSearchCricket.py

Examples

Doc2Vec model and KMeans Cluster

The code is in this Kaggle notebook
Created Doc2Vec embeddings on the cricket commentary dataset.
Performed Principal Component Analysis (PCA) on the embedding vectors to reduce them from 100 dimensions to 2.
Used KMeans clustering to group the reduced embeddings into 5 clusters.
Used matplotlib to create a scatter plot to visualize the clusters.
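
The steps above can be sketched roughly as follows. This is an illustrative outline, not the notebook's code. It assumes gensim 4.x, scikit-learn, and matplotlib are installed; the function names, `min_count`, `epochs`, and the random seed are assumptions, while the vector size of 100, the 2 PCA components, and the 5 clusters come from the description above. The demo at the bottom uses synthetic 100-dimensional vectors in place of real Doc2Vec output so it runs without the dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def doc2vec_vectors(token_lists, vector_size=100, epochs=20):
    """Train Doc2Vec on tokenized commentary and return one vector per line.

    Assumes gensim 4.x; min_count and epochs are illustrative choices.
    """
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    docs = [TaggedDocument(words=tokens, tags=[i]) for i, tokens in enumerate(token_lists)]
    model = Doc2Vec(docs, vector_size=vector_size, min_count=1, epochs=epochs)
    return np.vstack([model.dv[i] for i in range(len(docs))])


def reduce_and_cluster(vectors, n_clusters=5, seed=0):
    """Project vectors to 2 dimensions with PCA, then cluster them with KMeans."""
    points = PCA(n_components=2, random_state=seed).fit_transform(vectors)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(points)
    return points, labels


def plot_clusters(points, labels, path="clusters.png"):
    """Save a scatter plot of the 2D points, colored by cluster label."""
    import matplotlib

    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(points[:, 0], points[:, 1], c=labels, cmap="tab10", s=12)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    fig.savefig(path, dpi=150)
    plt.close(fig)


if __name__ == "__main__":
    # Synthetic stand-in for Doc2Vec output: 5 blobs of 40 vectors in 100 dimensions.
    rng = np.random.default_rng(0)
    centers = rng.normal(scale=5.0, size=(5, 100))
    vectors = np.vstack([c + rng.normal(scale=0.5, size=(40, 100)) for c in centers])
    points, labels = reduce_and_cluster(vectors)
    print(points.shape, np.bincount(labels))
```

To use real data, pass the tokenized commentary lines to `doc2vec_vectors`, feed the result to `reduce_and_cluster`, and call `plot_clusters(points, labels)` to save the scatter plot.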

