RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing the frequency of each word and its co-occurrence with other words in the text.
pip install rake-nltk
git clone https://github.com/csurfer/rake-nltk.git
python rake-nltk/setup.py install
If you see a stopwords error, you do not have the NLTK stopwords corpus downloaded. You can download it using the command below.
python -c "import nltk; nltk.download('stopwords')"
from rake_nltk import Rake
# Uses stopwords for English from NLTK, and all punctuation characters.
r = Rake()
r.extract_keywords_from_text(<text to process>)
r.get_ranked_phrases() # To get keyword phrases ranked highest to lowest.
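To give a feel for how stopwords and punctuation shape the result, here is a minimal sketch of the candidate-phrase step of RAKE in plain Python. It is not rake-nltk's actual code: the `candidate_phrases` helper and the tiny `STOPWORDS` set are hypothetical and exist only for illustration (the library itself relies on NLTK for stopwords and tokenization).

```python
import re

# Hypothetical, deliberately tiny stopword set (assumption for this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "for", "to", "by", "with"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at stopwords and punctuation."""
    phrases = []
    # Punctuation always ends a phrase, so process each chunk separately.
    for chunk in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in chunk.split():
            if word in stopwords:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


sample = "Keyword extraction is a task in natural language processing, and RAKE is fast."
for phrase in candidate_phrases(sample):
    print(" ".join(phrase))
```

Each run of consecutive non-stopwords becomes one candidate phrase, which is why the choice of stopwords and punctuation directly changes which phrases can be ranked.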
from rake_nltk import Metric, Rake
# To use it with a specific language supported by nltk.
r = Rake(language=<language>)
# To provide your own set of stop words and punctuation to ignore.
r = Rake(
    stopwords=<list of stopwords>,
    punctuations=<string of punctuation to ignore>
)
# To control the metric used for ranking. The paper uses d(w)/f(w) as the
# metric. You can use this API with the following metrics:
# 1. d(w)/f(w) (Default metric) Ratio of degree of word to its frequency.
# 2. d(w) Degree of word only.
# 3. f(w) Frequency of word only.
r = Rake(ranking_metric=Metric.DEGREE_TO_FREQUENCY_RATIO)
r = Rake(ranking_metric=Metric.WORD_DEGREE)
r = Rake(ranking_metric=Metric.WORD_FREQUENCY)
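The three metrics can be illustrated with a small, self-contained sketch. This is an approximation following the definitions in the paper, not the library's internals: `score_phrases` and its string `metric` argument are hypothetical names, and the sketch assumes a word's degree counts every word in each phrase it appears in, itself included.

```python
from collections import defaultdict


def score_phrases(phrases, metric="ratio"):
    """Rank phrases by the sum of their word scores (hypothetical helper)."""
    frequency = defaultdict(int)  # f(w): how often the word occurs
    degree = defaultdict(int)     # d(w): co-occurrences, the word itself included
    for phrase in phrases:
        for word in phrase:
            frequency[word] += 1
            degree[word] += len(phrase)

    def word_score(word):
        if metric == "degree":
            return degree[word]
        if metric == "frequency":
            return frequency[word]
        return degree[word] / frequency[word]  # default: d(w)/f(w)

    scored = [(sum(word_score(w) for w in p), " ".join(p)) for p in phrases]
    return sorted(scored, reverse=True)  # highest score first


phrases = [
    ("keyword", "extraction"),
    ("rapid", "automatic", "keyword", "extraction"),
    ("text",),
]
for metric in ("ratio", "degree", "frequency"):
    print(metric, score_phrases(phrases, metric))
```

With the ratio metric, words that mostly appear inside longer phrases score higher, whereas the frequency metric simply favors words that occur often.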
This is a Python implementation of the algorithm described in the paper "Automatic keyword extraction from individual documents" by Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley.
- It is extremely fun to implement algorithms by reading papers. It is the digital equivalent of DIY kits.
- There are some rather popular implementations out there, in Python (aneesha/RAKE) and Node (waseem18/node-rake), but neither seemed to use the power of NLTK. By making NLTK an integral part of the implementation, I get the flexibility and power to extend it in other creative ways, if I see fit later, without having to implement everything myself.
- I plan to use it in my other pet projects to come and wanted it to be modular and tunable; this way I have complete control.
Please use the issue tracker to report bugs or request features.
Pull requests are most welcome.
If you found the utility helpful you can buy me a cup of coffee using