The python-based prototype of Chinese query expansion system using CKIP eHownet dictionary and association mining algorithm

paulyang0125/QueryExpansionSystem

# Apriori-based query expansion system for Chinese IR

This is a Python-based prototype of a Chinese query expansion system. It uses the CKIP eHownet dictionary and the Apriori association mining algorithm to suggest additional query options for users searching on Google. Because this is an initial prototype, the interface, HTTP server, and database rely on simple CGI scripts and SQLite. It does not integrate a web framework such as Django, and it does not use multiprocessing or threading to improve computation performance. In addition, full access to eHownet through SQL is not free, so the number of terms in the current dictionary available for expanding the user query is limited.

See the slides for more detail.

Quick-start

  1. Install Jieba in advance:
easy_install jieba

or

pip install jieba
  2. Run simple_httpd.py to start the HTTP development server
  3. Open index.html in your preferred browser to enter the system
  4. Check logs\server_info.log for status information

Preview

Demo

Technical Overview

  • Uses the Google Web Search API to retrieve web snippets from the Google index server
  • Uses bag-of-words and TF-IDF for feature extraction
  • Uses the Apriori algorithm to mine association rules in a webpage, then uses eHownet and a simple weighting scheme to prioritize the rules
  • Introduction to utilizing CKIP eHownet
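
The mining and prioritization steps listed above can be sketched roughly as follows. This is a minimal illustration and not the code in this repository: the function names `frequent_itemsets` and `expansion_rules`, the `bonus` weight, and the sample snippets are all assumptions made for the example. Each snippet is treated as a set of already-segmented terms (a segmenter such as Jieba would produce these), TF-IDF filtering is omitted for brevity, and the `related_terms` set merely stands in for an eHownet lookup.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: return {itemset: support} for itemsets meeting min_support."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single terms.
    current = {}
    for item in {item for t in transactions for item in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def expansion_rules(frequent, query_term, min_confidence,
                    related_terms=frozenset(), bonus=0.1):
    """Rank rules {query_term} -> {term} by confidence plus a dictionary bonus.

    related_terms is a hypothetical stand-in for terms a dictionary such as
    eHownet marks as related to the query; bonus is an assumed weight.
    """
    base = frequent.get(frozenset([query_term]))
    if base is None:
        return []
    rules = []
    for itemset, s in frequent.items():
        if len(itemset) == 2 and query_term in itemset:
            (term,) = itemset - {query_term}
            confidence = s / base  # support(query, term) / support(query)
            if confidence >= min_confidence:
                score = confidence + (bonus if term in related_terms else 0.0)
                rules.append((term, confidence, score))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[2], r[0]))


# Made-up snippets for the query "蘋果" (apple), one set of terms per snippet.
snippets = [
    {"蘋果", "手機", "價格"},   # apple, phone, price
    {"蘋果", "手機", "評價"},   # apple, phone, review
    {"蘋果", "水果", "價格"},   # apple, fruit, price
    {"蘋果", "手機"},           # apple, phone
]
frequent = frequent_itemsets(snippets, min_support=0.5)
rules = expansion_rules(frequent, "蘋果", min_confidence=0.5,
                        related_terms={"價格"}, bonus=0.3)
for term, confidence, score in rules:
    print(term, round(confidence, 2), round(score, 2))
```

In this toy data the dictionary bonus moves "價格" ahead of "手機" even though its raw confidence is lower, which is the kind of re-prioritization a weighting scheme is meant to provide. The actual weights and dictionary lookups used by the prototype may differ.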

Use case diagram

Demo1

Flow chart

Demo2

Retrieval based on a two-dimensional system

Demo3

License

The MIT License (MIT)

Copyright (c) 2013 Yang Yao-Nien

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
