Motivation

Information retrieval (IR) is finding material (usually documents) of an unstructured nature ... that satisfies an information need from within large collections (usually stored on computers).

Basic assumptions of IR

Collection: Fixed set of documents
Goal: Retrieve documents with information that's relevant to the user's information need and helps the user to complete a task

Details

This project was developed on Windows 10 Pro using Visual Studio 2019 and C#.

Description

Boolean Retrieval Model

The Boolean model of information retrieval (BIR) is a classical information retrieval (IR) model and, at the same time, the first and most widely adopted one. It is used by many IR systems to this day.

In the Boolean retrieval model, any query can be posed as a Boolean expression of terms, i.e., one in which terms are combined with the operators AND, OR, and NOT.

Basic Assumptions of the Boolean Model

An index term is either present (1) or absent (0) in a document.
Queries are Boolean combinations of index terms.
X AND Y: represents the documents that contain both X and Y
X OR Y: represents the documents that contain X, Y, or both
NOT X: represents the documents that do not contain X
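
The three operators above map directly onto set operations over document IDs. The following is a minimal sketch in Python, chosen for brevity; it is not the project's C# code, and the names `ALL_DOCS`, `postings`, `op_and`, `op_or`, and `op_not` as well as the toy data are assumptions made for illustration.

```python
# Toy data (hypothetical): each term maps to the set of document IDs containing it.
ALL_DOCS = {1, 2, 3, 4}
postings = {
    "X": {1, 2, 3},
    "Y": {2, 3, 4},
}

def op_and(a, b):
    """Documents that contain both terms (set intersection)."""
    return a & b

def op_or(a, b):
    """Documents that contain at least one of the terms (set union)."""
    return a | b

def op_not(a, universe=ALL_DOCS):
    """Documents in the collection that do not contain the term (set complement)."""
    return universe - a

x_and_y = op_and(postings["X"], postings["Y"])  # {2, 3}
x_or_y = op_or(postings["X"], postings["Y"])    # {1, 2, 3, 4}
not_x = op_not(postings["X"])                   # {4}
```

Note that NOT needs the whole collection as its universe, which is one reason the model assumes a fixed set of documents.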

An example information retrieval problem

Brutus AND Caesar AND NOT Calpurnia

Suppose you wanted to determine which plays of Shakespeare contain the words Brutus and Caesar and not Calpurnia. One way to do that is to start at the beginning and read through all the text, noting for each play whether it contains Brutus and Caesar and excluding it if it contains Calpurnia.

The simplest form of document retrieval is for a computer to do this sort of linear scan through documents. This process is commonly referred to as grepping through text.
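
A linear scan of this kind could look roughly like the sketch below. It is written in Python for brevity and is not the project's C# code; the function names `tokens` and `linear_scan` and the three toy documents are hypothetical.

```python
import re

def tokens(text):
    """Lowercase the text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))

def linear_scan(docs, must_have, must_not_have):
    """Read every document in full and keep those that satisfy the query."""
    hits = []
    for name, text in docs.items():
        words = tokens(text)
        has_required = all(t in words for t in must_have)
        has_excluded = any(t in words for t in must_not_have)
        if has_required and not has_excluded:
            hits.append(name)
    return hits

# Toy collection (made up for illustration).
docs = {
    "doc1": "Brutus spoke with Caesar.",
    "doc2": "Caesar and Calpurnia met Brutus.",
    "doc3": "Caesar alone.",
}

# Brutus AND Caesar AND NOT Calpurnia
result = linear_scan(docs, ["brutus", "caesar"], ["calpurnia"])  # ['doc1']
```

Every query rereads the whole collection, so the cost grows with the total amount of text.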

The way to avoid linearly scanning the text for each query is to index the documents in advance.
Suppose we record for each document whether it contains each word out of all the words that may be used.
The result is a binary term-document incidence matrix.
Terms are the indexed units. They are usually words.
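
To make the incidence matrix concrete, here is a minimal Python sketch that builds one and answers the example query with bitwise operations on the term rows. It is an illustration under assumptions rather than the project's C# implementation: `tokenize`, `build_incidence_matrix`, and the toy documents are hypothetical.

```python
import re

def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_incidence_matrix(docs):
    """Return (names, matrix); matrix[term] holds one 0/1 entry per document."""
    names = list(docs)
    vocab = sorted({t for text in docs.values() for t in tokenize(text)})
    matrix = {term: [0] * len(names) for term in vocab}
    for col, name in enumerate(names):
        for term in set(tokenize(docs[name])):
            matrix[term][col] = 1
    return names, matrix

# Toy collection (made up for illustration).
docs = {
    "doc1": "Brutus spoke with Caesar.",
    "doc2": "Caesar and Calpurnia met Brutus.",
    "doc3": "Caesar alone.",
}
names, matrix = build_incidence_matrix(docs)

# Brutus AND Caesar AND NOT Calpurnia, computed column by column.
row = [
    b & c & (1 - p)
    for b, c, p in zip(matrix["brutus"], matrix["caesar"], matrix["calpurnia"])
]
answer = [name for name, bit in zip(names, row) if bit]  # ['doc1']
```

The matrix is built once, and each query then touches only the rows of the terms it mentions.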

The included process

Documents to be indexed.
Token stream
Modified tokens (stemming).
Indexer (incidence matrix).
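
The four stages above can be sketched end to end as follows. This is a Python illustration under assumptions, not the project's C# code: `naive_stem` is a hypothetical suffix stripper standing in for whatever stemmer the project uses, and `tokenize`, `index`, and the sample documents are likewise made up.

```python
import re

def tokenize(text):
    """Stage 2: turn a document into a stream of lowercase tokens."""
    return re.findall(r"[a-z]+", text.lower())

def naive_stem(token):
    """Stage 3 (hypothetical stand-in for a real stemmer): strip a common suffix."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def index(docs):
    """Stage 4: build a term-document incidence matrix from stemmed tokens."""
    names = list(docs)
    matrix = {}
    for col, name in enumerate(names):
        for term in {naive_stem(t) for t in tokenize(docs[name])}:
            matrix.setdefault(term, [0] * len(names))[col] = 1
    return names, matrix

# Stage 1: the documents to be indexed (toy data).
docs = {"d1": "Indexing documents", "d2": "The indexed document"}
names, matrix = index(docs)
# "indexing" and "indexed" both reduce to "index", so that row is [1, 1].
```

Stemming lets a query term match its inflected forms, since they all share one row of the matrix.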

Download

You can find the source code here. Download it and run it in Visual Studio; it uses only Visual Studio's default DLLs.

License

MIT License
