
s09g/search_engine


HUAJI Search


Search Engine Modules

  • Web Crawler
  • Links Extraction
  • PageRank
  • TF-IDF
  • N-Gram
  • Top K
  • Inverted Index
  • Recommender System
  • Sentiment Analysis
  • Front Page
  • Spelling Correction
  • Language Identifier
  • Auto Completion
  • Snippet

  1. A fully functional search engine, including a Web Crawler, Spelling Correction, an Inverted Index, the PageRank algorithm, the TF-IDF algorithm, AutoComplete, a Recommender System and Sentiment Analysis
  2. Web Crawler: Implemented a multithreaded web crawler based on crawler4j
  3. PageRank: Extracted out-links from the webpages collected by the web crawler, built an adjacency matrix from each page's hyperlinks, and calculated PageRank from the link relations between pages
  4. TF-IDF: Parsed HTML pages, extracted the content text, and computed TF-IDF weights
  5. N-Gram: Generated an N-Gram language model and built real-time auto-completion on top of its statistics
  6. Recommender System: Built a video rating matrix from the dataset and calculated a video co-occurrence matrix, following the item-based Collaborative Filtering algorithm
  7. Sentiment Analysis: Extracted emotion features from text and implemented sentiment analysis based on an emotion dictionary
  8. Implemented the Top K algorithm and an Inverted Index to improve query efficiency
  9. Implemented Spelling Correction
  10. UI: Built the front pages with PHP, Bootstrap and jQuery
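To illustrate the PageRank step, here is a minimal power-iteration sketch in Java. It is not the project's code: the damping factor of 0.85, the fixed iteration count, the out-link list representation (used instead of a full adjacency matrix), and spreading the rank of pages with no out-links evenly over all pages are all assumptions.

```java
import java.util.*;

// Minimal PageRank sketch using power iteration over out-link lists.
// Assumptions (not from the original project): damping factor passed in
// (0.85 is typical), a fixed number of iterations, and dangling pages
// (no out-links) spreading their rank evenly across all pages.
class PageRankSketch {
    static double[] pageRank(List<int[]> outLinks, double damping, int iterations) {
        int n = outLinks.size();
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double danglingMass = 0.0;
            for (int page = 0; page < n; page++) {
                int[] targets = outLinks.get(page);
                if (targets.length == 0) {
                    danglingMass += rank[page]; // page with no out-links
                } else {
                    double share = rank[page] / targets.length;
                    for (int t : targets) next[t] += share; // pass rank along each link
                }
            }
            for (int page = 0; page < n; page++) {
                // teleport term + damped link contributions + redistributed dangling rank
                next[page] = (1 - damping) / n + damping * (next[page] + danglingMass / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // links: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
        List<int[]> links = List.of(new int[]{1, 2}, new int[]{2}, new int[]{0});
        System.out.println(Arrays.toString(pageRank(links, 0.85, 100)));
    }
}
```

Because every iteration redistributes exactly the total rank, the scores keep summing to 1, so they can be compared directly across pages.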
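For the TF-IDF step, a minimal Java sketch follows. It assumes the HTML has already been stripped to plain text, tokenizes by lowercasing and splitting on non-letters, and uses tf = count / document length and idf = ln(N / df). The original project's exact weighting scheme is not stated, so treat these formulas as one common choice rather than the author's.

```java
import java.util.*;

// Minimal TF-IDF sketch over already-extracted page text.
// Assumptions: lowercase split on non-letters, tf = count / doc length,
// idf = ln(N / df), where df = number of documents containing the term.
class TfIdfSketch {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    static List<Map<String, Double>> tfIdf(List<String> docs) {
        List<List<String>> tokenized = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            List<String> tokens = tokenize(doc);
            tokenized.add(tokens);
            // count each term at most once per document
            for (String term : new HashSet<>(tokens)) docFreq.merge(term, 1, Integer::sum);
        }
        int n = docs.size();
        List<Map<String, Double>> result = new ArrayList<>();
        for (List<String> tokens : tokenized) {
            Map<String, Integer> counts = new HashMap<>();
            for (String t : tokens) counts.merge(t, 1, Integer::sum);
            Map<String, Double> weights = new HashMap<>();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                double tf = (double) e.getValue() / tokens.size();
                double idf = Math.log((double) n / docFreq.get(e.getKey()));
                weights.put(e.getKey(), tf * idf);
            }
            result.add(weights);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tfIdf(List.of("the cat sat", "the dog sat", "the cat ran")));
    }
}
```

A term that appears in every document (such as "the" above) gets an idf of ln(1) = 0, which is what pushes common words out of the ranking.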
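The item-based Collaborative Filtering step can be sketched in Java as follows. This is an illustration under stated assumptions, not the project's implementation: ratings are held in memory as user → (videoId → rating), co-occurrence counts are left unnormalized, and a video's score for a user is the sum over the videos that user rated of co-occurrence × rating. Videos the user has already rated are not recommended.

```java
import java.util.*;

// Minimal item-based collaborative filtering sketch. Assumptions:
// ratings are given as user -> (itemId -> rating), co-occurrence counts
// are unnormalized, and already-rated items are not recommended.
class ItemCfSketch {
    // cooc[a][b] = number of users who rated both a and b
    static Map<String, Map<String, Integer>> coOccurrence(Map<String, Map<String, Double>> ratings) {
        Map<String, Map<String, Integer>> cooc = new HashMap<>();
        for (Map<String, Double> userRatings : ratings.values()) {
            for (String a : userRatings.keySet()) {
                for (String b : userRatings.keySet()) {
                    if (!a.equals(b)) {
                        cooc.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        return cooc;
    }

    // score(item) = sum over rated items j of cooc[j][item] * rating(j)
    static Map<String, Double> recommend(Map<String, Map<String, Integer>> cooc,
                                         Map<String, Double> userRatings) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Double> rated : userRatings.entrySet()) {
            Map<String, Integer> row = cooc.getOrDefault(rated.getKey(), Map.of());
            for (Map.Entry<String, Integer> e : row.entrySet()) {
                if (userRatings.containsKey(e.getKey())) continue; // skip already-seen items
                scores.merge(e.getKey(), e.getValue() * rated.getValue(), Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> ratings = Map.of(
            "u1", Map.of("v1", 5.0, "v2", 3.0),
            "u2", Map.of("v1", 4.0, "v3", 2.0),
            "u4", Map.of("v1", 2.0, "v2", 4.0));
        System.out.println(recommend(coOccurrence(ratings), Map.of("v1", 5.0)));
    }
}
```

Because the co-occurrence matrix is symmetric, reading the rows of the rated videos gives the same result as multiplying the full matrix by the user's rating vector, while touching only the relevant rows.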
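A dictionary-based sentiment classifier can be as simple as counting how many words of a text fall into each emotion category. The Java sketch below does this. The six-word lexicon is a hypothetical stand-in for the project's emotion dictionary, and labeling a text by its most frequent category is an assumed decision rule.

```java
import java.util.*;

// Minimal dictionary-based sentiment sketch. LEXICON is a tiny,
// hypothetical stand-in for a real emotion dictionary.
class SentimentSketch {
    static final Map<String, String> LEXICON = Map.of(
        "good", "positive", "great", "positive", "love", "positive",
        "bad", "negative", "terrible", "negative", "hate", "negative");

    // Feature extraction: how many words of the text fall into each emotion category.
    static Map<String, Integer> emotionCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("[^a-z]+")) {
            String emotion = LEXICON.get(word);
            if (emotion != null) counts.merge(emotion, 1, Integer::sum);
        }
        return counts;
    }

    // Label = the category with the highest count, or "neutral" if nothing matched.
    static String classify(String text) {
        return emotionCounts(text).entrySet().stream()
            .max(Map.Entry.<String, Integer>comparingByValue())
            .map(Map.Entry::getKey)
            .orElse("neutral");
    }

    public static void main(String[] args) {
        System.out.println(classify("I love this great movie"));
    }
}
```

Ties between categories are resolved arbitrarily here; a real dictionary would also need to handle negation ("not good"), which this sketch ignores.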
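The Inverted Index and Top K items can be illustrated together with a small Java sketch. The index maps each term to the documents containing it, so a query only scores documents that share at least one term with it, and a min-heap of size K keeps the best results in O(n log K) instead of sorting every match. Scoring a document by the summed counts of the query terms is an assumption made to keep the example short. A real engine would more likely use TF-IDF or PageRank scores.

```java
import java.util.*;

// Minimal inverted index + Top K sketch. Assumptions: documents have int IDs,
// a document's score for a query is the sum of the query terms' counts in it,
// and Top K uses a size-K min-heap.
class InvertedIndexSketch {
    // term -> (docId -> occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();

    void addDocument(int docId, String text) {
        for (String term : text.toLowerCase().split("[^a-z]+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(term, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    List<Integer> topK(String query, int k) {
        // Only documents in some query term's postings list are ever scored.
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : query.toLowerCase().split("[^a-z]+")) {
            Map<Integer, Integer> list = postings.getOrDefault(term, Map.of());
            for (Map.Entry<Integer, Integer> e : list.entrySet()) {
                scores.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        // Min-heap ordered by score: the weakest kept candidate sits at the head.
        PriorityQueue<Map.Entry<Integer, Integer>> heap =
            new PriorityQueue<>(Map.Entry.<Integer, Integer>comparingByValue());
        for (Map.Entry<Integer, Integer> e : scores.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) heap.poll(); // drop the lowest score
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) result.add(heap.poll().getKey());
        Collections.reverse(result); // highest score first
        return result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument(1, "Java search engine");
        index.addDocument(2, "search, search and more search");
        index.addDocument(3, "cooking recipes");
        System.out.println(index.topK("search engine", 2));
    }
}
```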

Demo

Enter a query. The results show up with the docID, title, URL, description and snippet for each page.

AutoCompletion: gives the user query suggestions while they type.
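N-Gram-based auto-completion can be sketched in Java by counting which word follows each phrase of 1 to N-1 words in a corpus, then suggesting the most frequent continuations of what the user has typed so far. The word-level n-grams, the `maxN` limit, ranking by raw count, and alphabetical tie-breaking are assumptions for illustration. The project's actual model and storage are not described.

```java
import java.util.*;

// Minimal n-gram auto-completion sketch. Assumptions: word-level n-grams of up
// to maxN words, suggestions ranked by raw count, ties broken alphabetically.
class NGramAutoComplete {
    // phrase (1 .. maxN-1 words) -> (next word -> count)
    private final Map<String, Map<String, Integer>> model = new HashMap<>();
    private final int maxN;

    NGramAutoComplete(int maxN) { this.maxN = maxN; }

    void train(String sentence) {
        String[] words = sentence.toLowerCase().trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            StringBuilder phrase = new StringBuilder(words[i]);
            // grow the phrase word by word, recording the word that follows it
            for (int j = i + 1; j < words.length && j - i < maxN; j++) {
                model.computeIfAbsent(phrase.toString(), k -> new HashMap<>())
                     .merge(words[j], 1, Integer::sum);
                phrase.append(' ').append(words[j]);
            }
        }
    }

    List<String> suggest(String prefix, int k) {
        Map<String, Integer> next = model.getOrDefault(prefix.toLowerCase().trim(), Map.of());
        List<String> candidates = new ArrayList<>(next.keySet());
        candidates.sort((a, b) -> {
            int byCount = Integer.compare(next.get(b), next.get(a)); // higher count first
            return byCount != 0 ? byCount : a.compareTo(b);          // then alphabetical
        });
        return candidates.subList(0, Math.min(k, candidates.size()));
    }

    public static void main(String[] args) {
        NGramAutoComplete ac = new NGramAutoComplete(3);
        ac.train("search engine design");
        ac.train("search engine ranking");
        ac.train("search results");
        System.out.println(ac.suggest("search engine", 2));
    }
}
```

Keeping all phrase lengths in one map lets the front end fall back to a shorter context when a longer phrase has never been seen.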

Spelling Correction: When I mistype California as californa, it asks "Are you looking for California?"

Clicking the spelling-correction hint redirects the search to the corrected word.
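The README does not say which correction algorithm is used, so the Java sketch below shows one common approach: pick the vocabulary word with the smallest Levenshtein (edit) distance to the typed word, within a maximum distance, breaking ties by word frequency. The vocabulary map (for example, term counts from the crawled pages) and the `maxDistance` parameter are assumptions.

```java
import java.util.*;

// Minimal spelling-correction sketch. Assumptions: a vocabulary with word
// frequencies is available; the suggestion is the known word with the smallest
// Levenshtein distance (at most maxDistance), ties broken by higher frequency.
class SpellingCorrector {
    // Classic dynamic-programming edit distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // Returns the suggested correction, or the (lowercased) input itself if it is
    // already a known word or nothing lies within maxDistance.
    static String correct(String word, Map<String, Integer> vocabulary, int maxDistance) {
        String w = word.toLowerCase();
        if (vocabulary.containsKey(w)) return w;
        String best = w;
        int bestDist = Integer.MAX_VALUE;
        int bestFreq = -1;
        for (Map.Entry<String, Integer> e : vocabulary.entrySet()) {
            int d = levenshtein(w, e.getKey());
            if (d <= maxDistance && (d < bestDist || (d == bestDist && e.getValue() > bestFreq))) {
                best = e.getKey();
                bestDist = d;
                bestFreq = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> vocab = Map.of("california", 50, "canada", 30, "calendar", 20);
        System.out.println(correct("californa", vocab, 2));
    }
}
```

Scanning the whole vocabulary is fine for a demo; a larger index would typically generate candidate edits or use a BK-tree instead.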
