WebSurfer

Introduction

A simple search engine developed for the Information Retrieval and Web Search course at Yazd University.

The project indexes web documents, accepts a query from the user through a web interface, processes the query, retrieves the matching documents, ranks them, and displays the results to the user.

This project does not include a crawler or bot. Documents must be crawled beforehand and placed in a directory; the project then indexes them from that directory.

Description

This project is written entirely in Python. It consists of two major parts:

Indexer engine
Query processor

The indexer engine is written in pure Python and includes these stages:

Read documents from storage
Add docs to a thread-safe queue
Pop docs from the queue and index them with indexer workers (each worker runs concurrently in its own process):
   1. Normalize text (convert escaped characters, remove unnecessary HTML tags, ...)
   2. Parse text (extract body and title)
   3. Tokenize
   4. Create dictionary and postings list
   5. Save postings list to MongoDB
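
The tokenize and postings stages above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `tokenize` and `build_postings`, the regex-based tokenizer, and the in-memory dictionary layout are all assumptions, and the real indexer saves postings to MongoDB rather than returning them.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lowercase the text and keep runs of word characters.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercase tokens (assumed tokenization rule)."""
    return TOKEN_RE.findall(text.lower())


def build_postings(docs):
    """Build a positional postings list from {doc_id: text}.

    Returns {term: {doc_id: [positions]}} (an assumed layout).
    """
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            postings[term].setdefault(doc_id, []).append(position)
    return dict(postings)


docs = {1: "Web search engines index the web", 2: "Index documents first"}
postings = build_postings(docs)
print(postings["web"])    # {1: [0, 5]}
print(postings["index"])  # {1: [3], 2: [0]}
```

Storing token positions alongside document IDs is what would let a later stage rank by term position as well as by term frequency.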

The query processor is built with Django and provides a web interface where users enter a query and view the search results. This module performs these steps:

Get query from user
Fetch postings list from MongoDB based on query
Rank documents using two methods:
- TF-IDF
- Positional
Sort documents based on ranking
Show results to the user
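
As a rough sketch of the TF-IDF ranking and sorting steps above, the following scores documents from an in-memory postings structure. The function name `rank_tf_idf`, the postings layout, and the specific weighting (log-scaled term frequency times log inverse document frequency) are assumptions chosen for illustration; the project may use a different TF-IDF variant and reads its postings from MongoDB.

```python
import math
from collections import defaultdict


def rank_tf_idf(query_terms, postings, total_docs):
    """Score documents by summing tf * idf over the query terms.

    postings: {term: {doc_id: [positions]}} (assumed layout)
    Returns a list of (doc_id, score) sorted by descending score.
    """
    scores = defaultdict(float)
    for term in query_terms:
        docs_with_term = postings.get(term, {})
        if not docs_with_term:
            continue  # term does not occur in the collection
        idf = math.log(total_docs / len(docs_with_term))
        for doc_id, positions in docs_with_term.items():
            tf = 1 + math.log(len(positions))  # log-scaled term frequency
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical postings for a three-document collection.
postings = {
    "web": {1: [0, 5], 3: [2]},
    "search": {1: [1]},
    "index": {1: [3], 2: [0], 3: [0]},
}
results = rank_tf_idf(["web", "search"], postings, total_docs=3)
print([doc_id for doc_id, _ in results])  # [1, 3]
```

Document 1 ranks first because it contains both query terms, and the rarer term ("search") contributes the larger weight.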
