
Applying several strategies like tf-idf, posting index, vector similarity, etc. to build an Information Retrieval system to increase the efficacy of querying the dataset.


UtkarshBagaria/Information-Retrieval


Information-Retrieval

This project builds an information retrieval system over a dataset of 800,000 news files. Finding information among so many files is very tedious if one has to go through every word of every file; an efficient information retrieval system solves this. The data was first pre-processed and cleaned by removing stop words and punctuation, converting the text to lower case, and stemming.
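The cleaning steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's code: the stop-word list is a tiny hypothetical one, and `stem` is a crude suffix-stripping stand-in for a real stemmer such as Porter's (for example NLTK's `PorterStemmer`). The README does not say which stemmer or stop-word list was used.

```python
import string

# Hypothetical, deliberately tiny stop-word list; a real system would use a
# fuller list.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "was"}

# Crude suffix stripping, used here only as a stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, remove ASCII punctuation, drop stop words, then stem."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [stem(token) for token in text.split() if token not in STOP_WORDS]


print(preprocess("The markets are falling, and stocks jumped!"))
# ['market', 'fall', 'stock', 'jump']
```

The same function would be applied to queries, so that query terms match the indexed forms.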


A posting index was created on this data, with each word as a key that maps to a list. The first element of the list is the count of the word; the second is a dictionary whose keys are the files the word occurs in and whose values are the word's positions in each file.
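A minimal sketch of this posting index structure, assuming each file has already been pre-processed into a token list. The README only says the first element is the "count" of the word; here it is assumed to be the number of files containing the word (document frequency), which is simply the size of the inner dictionary. The function name `build_posting_index` is hypothetical.

```python
from collections import defaultdict


def build_posting_index(docs: dict[str, list[str]]) -> dict[str, list]:
    """Map each term to [count, {file_id: [positions, ...]}].

    `docs` maps a file id to its already pre-processed token list.
    """
    postings: dict[str, dict[str, list[int]]] = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for position, term in enumerate(tokens):
            postings[term].setdefault(doc_id, []).append(position)
    # Assumption: "count" = number of files containing the term.
    return {term: [len(files), files] for term, files in postings.items()}


docs = {
    "d1": ["stock", "market", "fall"],
    "d2": ["market", "rally", "market"],
}
index = build_posting_index(docs)
print(index["market"])  # [2, {'d1': [1], 'd2': [0, 2]}]
```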

Boolean retrieval was then performed on the data using this posting index.
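Boolean retrieval over such an index reduces to set operations on the file keys of each term's dictionary. The sketch below assumes a hypothetical toy index in the `[count, {file: positions}]` shape described above and a simple query syntax (`AND`, `OR`, `AND NOT`) evaluated strictly left to right, without operator precedence; the README does not describe the query syntax it actually supports.

```python
# Hypothetical toy index: term -> [count, {file: [positions]}] for
#   d1: stock market fall    d2: market rally market    d3: bond stock
INDEX = {
    "stock": [2, {"d1": [0], "d3": [1]}],
    "market": [2, {"d1": [1], "d2": [0, 2]}],
    "fall": [1, {"d1": [2]}],
    "rally": [1, {"d2": [1]}],
    "bond": [1, {"d3": [0]}],
}


def files_for(term: str) -> set[str]:
    """Files containing `term`; an unseen term matches nothing."""
    entry = INDEX.get(term)
    return set(entry[1]) if entry else set()


def evaluate(query: str) -> set[str]:
    """Evaluate 'term (AND | OR | AND NOT) term ...' strictly left to right.

    Query terms are assumed to be pre-processed like the documents.
    """
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = files_for(tokens[0])
    i = 1
    while i < len(tokens):
        op = tokens[i]
        if op == "and" and i + 2 < len(tokens) and tokens[i + 1] == "not":
            result -= files_for(tokens[i + 2])  # set difference
            i += 3
        elif op == "and":
            result &= files_for(tokens[i + 1])  # intersection
            i += 2
        elif op == "or":
            result |= files_for(tokens[i + 1])  # union
            i += 2
        else:
            raise ValueError(f"unexpected token {op!r}")
    return result


print(sorted(evaluate("market AND NOT fall")))  # ['d2']
```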


Positional retrieval was performed using the word positions stored in the posting index.
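Because the index stores word positions, a phrase query can be answered by keeping only files where each successive query term occurs exactly one position after the previous one. A minimal sketch, assuming a hypothetical toy index in the `[count, {file: positions}]` shape and query terms that have already been pre-processed:

```python
# Hypothetical toy index built from three pre-processed files:
#   d1: stock market fall    d2: market stock    d3: bond stock market
INDEX = {
    "stock": [3, {"d1": [0], "d2": [1], "d3": [1]}],
    "market": [3, {"d1": [1], "d2": [0], "d3": [2]}],
    "fall": [1, {"d1": [2]}],
    "bond": [1, {"d3": [0]}],
}


def phrase_query(index: dict, terms: list[str]) -> set[str]:
    """Files in which `terms` occur as a contiguous phrase, in order."""
    if not terms or any(term not in index for term in terms):
        return set()
    # Only files that contain every term can possibly match.
    candidates = set.intersection(*(set(index[t][1]) for t in terms))
    matches = set()
    for doc in candidates:
        # Phrase start positions that are still consistent with the terms seen so far.
        starts = set(index[terms[0]][1][doc])
        for offset, term in enumerate(terms[1:], start=1):
            positions = set(index[term][1][doc])
            starts = {p for p in starts if p + offset in positions}
        if starts:
            matches.add(doc)
    return matches


print(sorted(phrase_query(INDEX, ["stock", "market"])))  # ['d1', 'd3']
```

File d2 contains both words but in the opposite order, so the positional check correctly rejects it.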


Wildcard queries were also supported.
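The README does not say how wildcard queries were implemented. One standard approach, sketched here as an assumption, is a k-gram index over the vocabulary: the pattern's k-grams (with `$` marking word boundaries) select candidate terms, and a post-filter with Python's `fnmatch.fnmatchcase` removes false matches. Only `*` is treated as a wildcard, and the vocabulary is hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical vocabulary (in practice, the keys of the posting index).
VOCABULARY = ["market", "marked", "mark", "stock", "stocking", "bond"]


def kgrams(text: str, k: int = 2) -> set[str]:
    """All contiguous substrings of length k."""
    return {text[i : i + k] for i in range(len(text) - k + 1)}


def build_kgram_index(vocab: list[str], k: int = 2) -> dict[str, set[str]]:
    """Map each k-gram of '$term$' to the set of terms containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for term in vocab:
        for gram in kgrams(f"${term}$", k):
            index[gram].add(term)
    return index


def wildcard_query(pattern: str, kgram_index, vocab, k: int = 2) -> list[str]:
    """Vocabulary terms matching `pattern`, where '*' is the only wildcard."""
    # Split on '*' so that no k-gram spans a wildcard.
    grams: set[str] = set()
    for piece in f"${pattern}$".split("*"):
        grams |= kgrams(piece, k)
    if grams:
        candidates = set.intersection(*(kgram_index.get(g, set()) for g in grams))
    else:  # e.g. the pattern '*' alone constrains nothing
        candidates = set(vocab)
    # The k-gram filter can admit false positives, so verify each candidate.
    return sorted(t for t in candidates if fnmatchcase(t, pattern))


KGRAM_INDEX = build_kgram_index(VOCABULARY)
print(wildcard_query("mar*", KGRAM_INDEX, VOCABULARY))  # ['mark', 'marked', 'market']
```

The matching terms can then be looked up in the posting index and their file sets combined with OR.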


Using the posting index created earlier, a bi-word index was built and used for bi-word query retrieval.
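A bi-word index can be derived from the positional posting index by rebuilding each file's token order and recording every adjacent pair of words. A minimal sketch, assuming a hypothetical toy index in the `[count, {file: positions}]` shape; the function names are illustrative. Answering a longer phrase by intersecting its bi-words can return false positives, since the pairs need not be contiguous with each other in the file.

```python
from collections import defaultdict

# Hypothetical toy posting index: term -> [count, {file: [positions]}] for
#   d1: stock market fall    d2: market stock    d3: bond stock market
INDEX = {
    "stock": [3, {"d1": [0], "d2": [1], "d3": [1]}],
    "market": [3, {"d1": [1], "d2": [0], "d3": [2]}],
    "fall": [1, {"d1": [2]}],
    "bond": [1, {"d3": [0]}],
}


def build_biword_index(posting_index: dict) -> dict[str, set[str]]:
    """Map each adjacent word pair 'w1 w2' to the files containing it."""
    # Rebuild position -> term for every file from the posting index.
    layout: dict[str, dict[int, str]] = defaultdict(dict)
    for term, (_, files) in posting_index.items():
        for doc, positions in files.items():
            for p in positions:
                layout[doc][p] = term
    biwords: dict[str, set[str]] = defaultdict(set)
    for doc, by_position in layout.items():
        for p, term in by_position.items():
            following = by_position.get(p + 1)
            if following is not None:
                biwords[f"{term} {following}"].add(doc)
    return biwords


def biword_query(biwords, terms: list[str]) -> set[str]:
    """Files containing every adjacent pair of the query terms."""
    pairs = [f"{a} {b}" for a, b in zip(terms, terms[1:])]
    if not pairs:
        return set()
    return set.intersection(*(biwords.get(pair, set()) for pair in pairs))


BIWORDS = build_biword_index(INDEX)
print(sorted(biword_query(BIWORDS, ["stock", "market"])))  # ['d1', 'd3']
```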


Retrieval was also performed with a similarity index based on the vector space model.
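In the vector space model, the query and each document are represented as term-weight vectors and ranked by cosine similarity. The README does not specify the weighting, so this sketch assumes raw term frequency multiplied by idf = log10(N / df), on a hypothetical three-document collection.

```python
import math
from collections import Counter

# Hypothetical pre-processed collection.
DOCS = {
    "d1": ["stock", "market", "fall"],
    "d2": ["market", "rally", "market"],
    "d3": ["bond", "yield", "rise"],
}


def idf_table(docs: dict[str, list[str]]) -> dict[str, float]:
    """idf(t) = log10(N / df(t)), where df counts documents containing t."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    return {term: math.log10(n / count) for term, count in df.items()}


def tfidf_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Raw term frequency times idf; terms outside the collection are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def rank(query_tokens: list[str], docs: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Documents sorted by cosine similarity to the query, highest first."""
    idf = idf_table(docs)
    query_vec = tfidf_vector(query_tokens, idf)
    scores = {d: cosine(query_vec, tfidf_vector(toks, idf)) for d, toks in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for doc_id, score in rank(["market", "fall"], DOCS):
    print(doc_id, round(score, 3))
```

Normalizing by vector length keeps long documents from winning simply because they repeat terms more often.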


Retrieval with a likelihood model based on Bayes' theorem.
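By Bayes' theorem, P(d | q) = P(q | d) P(d) / P(q). With a uniform document prior P(d), and since P(q) is the same for every document, documents can be ranked by P(q | d) alone. The sketch below assumes this query-likelihood formulation with Jelinek-Mercer smoothing (weight `lam`, a hypothetical default of 0.5) so that a query term missing from one document does not zero out its score; the README does not state which likelihood model or smoothing was actually used.

```python
import math
from collections import Counter

# Hypothetical pre-processed collection.
DOCS = {
    "d1": ["stock", "market", "fall"],
    "d2": ["market", "rally", "market"],
    "d3": ["bond", "yield", "rise"],
}


def query_likelihood_rank(query: list[str], docs: dict[str, list[str]], lam: float = 0.5):
    """Rank documents by log P(q | d) under a smoothed unigram model.

    P(t | d) = lam * tf(t, d) / |d| + (1 - lam) * cf(t) / |C|, with 0 <= lam < 1.
    """
    collection = Counter(t for tokens in docs.values() for t in tokens)
    total = sum(collection.values())
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        log_p = 0.0
        for term in query:
            if collection[term] == 0:
                # Unseen everywhere: it would zero every document equally,
                # so skipping it leaves the ranking unchanged.
                continue
            p_doc = tf[term] / len(tokens) if tokens else 0.0
            p_coll = collection[term] / total
            log_p += math.log(lam * p_doc + (1 - lam) * p_coll)
        scores[doc_id] = log_p
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for doc_id, score in query_likelihood_rank(["market", "fall"], DOCS):
    print(doc_id, round(score, 3))
```

Summing logarithms rather than multiplying probabilities avoids numeric underflow on long queries.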


Documents were assigned tf-idf scores based on the input query.
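A common tf-idf scoring scheme sums, over the distinct query terms, (1 + log10 tf) × log10(N / df) for each document. This exact variant is an assumption, since the README does not give its formula, and the collection below is hypothetical.

```python
import math
from collections import Counter

# Hypothetical pre-processed collection.
DOCS = {
    "d1": ["stock", "market", "fall"],
    "d2": ["market", "rally", "market"],
    "d3": ["bond", "yield", "rise"],
}


def tfidf_scores(query: list[str], docs: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Score each document by the summed log-tf x idf of the query terms it contains."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = sum(
            (1 + math.log10(tf[t])) * math.log10(n / df[t])
            for t in set(query)
            if tf[t] > 0  # terms absent from the document contribute nothing
        )
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(tfidf_scores(["market", "fall"], DOCS))
```

Log-scaling tf dampens the effect of a term repeated many times in one document.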


A precision of 0.9735667696532784 (about 97.36%) was obtained.
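Precision is the fraction of retrieved documents that are actually relevant: |relevant ∩ retrieved| / |retrieved|. The sketch computes it for one query and, as one possible aggregation (an assumption, since how the reported figure was computed is not stated), averages it over several queries. The relevance judgments in the usage example are made up for illustration.

```python
def precision(retrieved, relevant) -> float:
    """|relevant ∩ retrieved| / |retrieved|; 0.0 by convention if nothing is retrieved."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


def mean_precision(runs) -> float:
    """Average per-query precision over (retrieved, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(precision(ret, rel) for ret, rel in runs) / len(runs)


# Made-up judgments for two hypothetical queries.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d2", "d3"}),
    (["d5", "d6"], {"d5", "d6", "d7"}),
]
print(precision(*runs[0]))   # 0.75
print(mean_precision(runs))  # 0.875
```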

Relevance feedback was used to rerank the results.
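The README does not name the feedback method. A standard choice, sketched here as an assumption, is the Rocchio algorithm: the query vector moves toward the centroid of documents the user marked relevant and away from the centroid of those marked non-relevant, and results are then reranked by cosine similarity. The weights alpha = 1.0, beta = 0.75 and gamma = 0.15 are common textbook defaults, and the document vectors are hypothetical.

```python
import math

Vector = dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is zero)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def rocchio(query: Vector, relevant: list[Vector], nonrelevant: list[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Move the query toward relevant and away from non-relevant centroids."""
    terms = set(query)
    for vec in relevant + nonrelevant:
        terms.update(vec)
    updated: Vector = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant:
            w += beta * sum(v.get(t, 0.0) for v in relevant) / len(relevant)
        if nonrelevant:
            w -= gamma * sum(v.get(t, 0.0) for v in nonrelevant) / len(nonrelevant)
        if w > 0:  # negative weights are conventionally clipped to zero
            updated[t] = w
    return updated


def rerank(query: Vector, doc_vectors: dict[str, Vector]) -> list[str]:
    """Document ids sorted by cosine similarity to the (updated) query."""
    return sorted(doc_vectors, key=lambda d: cosine(query, doc_vectors[d]), reverse=True)


DOCS = {
    "d1": {"market": 1.0, "fall": 1.0},
    "d2": {"market": 1.0, "rally": 1.0},
    "d3": {"bond": 1.0, "yield": 1.0},
}
query = {"market": 1.0}
# Suppose the user marks d2 as relevant and d1 as non-relevant.
new_query = rocchio(query, [DOCS["d2"]], [DOCS["d1"]])
print(rerank(new_query, DOCS))  # ['d2', 'd1', 'd3']
```

Before feedback d1 and d2 tie for the query `market`; the feedback adds weight to `rally` and drops `fall`, which breaks the tie in favour of d2.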

