jayanthjj/worth-a-read

Worth-a-read

An NLP-based extractive text summarizer that uses the NLTK Python package, with a Flask backend so that it is easily compatible with .py files.

What is Natural Language Processing (NLP)?

Natural Language Processing (NLP) is a field of artificial intelligence that helps computers understand, interpret and manipulate human language. It lets computers communicate with humans in their own language and scales other language-related tasks. For example, NLP makes it possible for computers to read text, hear speech, interpret it, measure sentiment and determine which parts are important.


Why NLP?

Natural Language Processing is present everywhere. News apps, for example, show a gist of each story as a preview, and they use NLP models to produce that short, crisp summary. Text is harder to process than images: with image-based models such as CNNs, the pixel data is already numeric, but words and sentences cannot simply be converted to their ASCII equivalents, because character codes carry no information about meaning.


What is Text Summarization?

Summarization means producing a short version of a large piece of information without changing its meaning or main takeaway. The result depends on how a model decides which parts of the text are important and which can be dropped. There are two types of summarization techniques:

Abstractive Text Summarization

Abstractive methods produce summaries that may contain words not present in the source document. They choose words based on semantic understanding, which helps bring out a new and concise statement of the idea. They interpret and examine the text using advanced natural language techniques in order to generate a new, shorter text that conveys the most critical information from the original. Abstractive text summarization is therefore closely related to the way humans understand meaning.

Extractive Text Summarization

Extractive methods summarize an article by selecting the existing words or sentences that best convey the idea of the main text. They assign a weight to words or parts of a sentence and keep the parts that carry the necessary information. Abstractive text summarization requires a high level of training, so extractive techniques are more commonly used. The extraction is made according to the defined metric without making any changes to the text.
In this project, extractive text summarization techniques are used to summarize the given texts.


How do we do Extractive Text Summarization?

We make use of NLTK, a toolkit for Python. The Natural Language Toolkit, more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing for English, written in the Python programming language.

I. Create a Hash table with frequencies of each word.

A dictionary is created with the frequency of occurrence of each word in the given text. One thing we should do is skip the stop words, which are commonly used words such as a, an, the and so on.
To skip them we can use NLTK's stop word list (nltk.corpus.stopwords), which contains these commonly used words.
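
A minimal sketch of such a frequency table is shown here. It is not the project's actual code: to stay runnable without any downloads, it assumes a tiny hand-picked `STOP_WORDS` set and a simple regex word splitter as stand-ins for NLTK's stop word list and tokenizer. The function name `build_frequency_table` is hypothetical.

```python
import re
from collections import Counter

# Assumption: a small hand-picked set standing in for NLTK's English
# stop word list (nltk.corpus.stopwords.words("english")).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on", "it"}


def build_frequency_table(text):
    """Count how often each non-stop word occurs in the text."""
    # Lowercase so "The" and "the" count as the same word.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)


freq = build_frequency_table("The cat sat on the mat. The cat is happy.")
print(freq["cat"])  # "cat" appears twice and is not a stop word
```

With NLTK installed and its stop word data downloaded, the hand-picked set could be swapped for the library's list without changing the rest of the function.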

II. Tokenize the given Text.

We split the text into sentences using NLTK's sentence tokenizer and get a list of sentence strings, which can then be scored according to the frequency counts.
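
To illustrate what this step produces, here is a rough stand-in for a sentence tokenizer. It assumes that a sentence ends at `.`, `!` or `?` followed by whitespace, which is much cruder than NLTK's `sent_tokenize` (it will break on abbreviations such as "Dr."). The helper name `split_sentences` is hypothetical.

```python
import re


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    # The lookbehind keeps the punctuation attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


sentences = split_sentences("NLP is everywhere. Does it work? Yes, it does!")
for sentence in sentences:
    print(sentence)
```

The result is a plain list of strings, one per sentence, which is the shape the scoring step expects.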

III. Score each sentence according to the frequency table count.

A sentence is scored according to the frequency counts of its non-stop words, by running two loops: one over the sentence list and one over the hash table.
This gives us the scores used to build the summary.
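
One way the scoring could look is sketched below. It is a variation on the description above, not the project's code: instead of looping over the whole hash table for each sentence, it loops over the words of each sentence and looks each one up in the table. The name `score_sentences` and the sample data are hypothetical.

```python
import re


def score_sentences(sentences, freq):
    """Return one score per sentence: the sum of its words' frequencies."""
    scores = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Stop words are absent from freq, so they contribute 0.
        scores.append(sum(freq.get(word, 0) for word in words))
    return scores


# Hypothetical frequency table with the stop words already removed.
example_freq = {"cat": 2, "sat": 1, "mat": 1, "happy": 1}
example_sentences = ["The cat sat on the mat.", "The cat is happy.", "It is late."]
example_scores = score_sentences(example_sentences, example_freq)
print(example_scores)
```

Note that a plain sum favors long sentences. Dividing by the number of words in the sentence is a common adjustment, though the text above does not say whether this project does so.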

IV. Use a max heap or a priority queue to get the highest-scoring sentences.

We use a max heap to pick the sentences with the highest scores and construct the final text from them.
This gives us the final summary of the given text.
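
The selection step might be sketched as follows. Python's `heapq` module implements a min-heap, so this sketch uses `heapq.nlargest`, which does the heap work internally, rather than a hand-built max heap. Two choices here are assumptions, not taken from the project: ties go to the earlier sentence, and the chosen sentences are put back in their original order. The name `top_sentences` is hypothetical.

```python
import heapq


def top_sentences(sentences, scores, n):
    """Join the n highest-scoring sentences, kept in document order."""
    # Pair each score with the negated index so ties favor earlier sentences.
    candidates = [(score, -index) for index, score in enumerate(scores)]
    best = heapq.nlargest(n, candidates)
    # Assumption: restore the original order so the summary reads naturally.
    keep = sorted(-neg_index for _, neg_index in best)
    return " ".join(sentences[i] for i in keep)


summary = top_sentences(
    ["The cat sat on the mat.", "The cat is happy.", "It is late."],
    [4, 3, 0],
    2,
)
print(summary)
```

Here the third sentence scores lowest and is dropped, leaving the first two as the summary.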

This is how NLP can be used for text summarization.

Sample


Actual text

Summarized text