Automatic Headline Generation from News Articles in Hindi Language
DeepNews is a high-level headline generation tool, written in Python and capable of running on top of Keras, TensorFlow or Theano. It was developed for media organizations and writers who need to quickly come up with a headline that is short and informative.
DeepNews is written in Python on top of Keras, TensorFlow and Theano.
Installing Python
- Anaconda - Comes with prebuilt libraries like Pandas, NumPy, SciPy, etc. (Recommended)
- Official Python website
Installing Keras
sudo pip install keras
- Windows-based systems can follow these steps on Stack Overflow
Installing TensorFlow
Amazon AWS (All libraries are installed in the AMI image)
- G2 or P2 (GPU) based instances
- Amazon Machine Image AMI
- GPU configuration is enabled (by default)
Neural networks are computation-heavy, so a GPU configuration is recommended.
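After installation, it can help to confirm which of the libraries named above are actually importable. The snippet below is a minimal sketch using only the Python standard library; the function name `installed_libraries` is a hypothetical helper, not part of DeepNews, and it only checks that a package can be found, not that it is configured for GPU use.

```python
import importlib.util

# Libraries this README names as requirements.
REQUIRED = ("keras", "tensorflow", "theano")


def installed_libraries(names=REQUIRED):
    """Map each library name to True if it can be found on the import path.

    Hypothetical helper for illustration; it does not import the library,
    so it says nothing about version or GPU support.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in installed_libraries().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

Any library reported as missing can be installed with `pip` in the same way as Keras above.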
Length of Article histogram
Length of Headline histogram
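A length histogram of this kind could be computed roughly as follows. This is a sketch under assumptions: text length is measured in whitespace-separated tokens, and `length_histogram` and `bin_width` are hypothetical names chosen for illustration, not taken from the DeepNews code.

```python
from collections import Counter


def length_histogram(texts, bin_width=50):
    """Count texts per length bin.

    Length is the number of whitespace-separated tokens (an assumption).
    Each key is the lower edge of a bin of size `bin_width`.
    """
    counts = Counter((len(text.split()) // bin_width) * bin_width for text in texts)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Placeholder texts; replace with the article or headline strings.
    sample = ["a b c", "a", "a b c d e f"]
    print(length_histogram(sample, bin_width=5))
```

The same function can be applied to the list of articles and to the list of headlines, with a smaller `bin_width` for headlines since they are much shorter.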
features | values |
---|---|
no of articles | 2,97,965 |
no of tokens | 85,940,081 (85.94M) |
no of unique tokens in articles | 3,88,449 |
no of unique tokens in headlines | 58,448 |
avg length of article | 272 |
avg length of headline | 7 |
size of dataset | 1.06GB |
avg. of ratio len(article)/len(headline) (on average, 43 article words for every 1 headline word) | 43 |
features | values |
---|---|
no of articles | 5,95,847 |
no of tokens | 20,92,32,922 (209M) |
no of unique tokens in articles | 10,26,083 |
no of unique tokens in headlines | 1,24,965 |
avg length of article | 316 |
avg length of headline | 11 |
size of dataset | 3.70GB |
avg. of ratio len(article)/len(headline) (on average, 34 article words for every 1 headline word) | 34 |
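The statistics in the tables above could be computed along these lines. This is a minimal sketch, not the script used to produce the reported numbers: it assumes whitespace tokenization, assumes "no of tokens" counts article and headline tokens together, and uses the hypothetical name `dataset_stats`.

```python
def dataset_stats(pairs):
    """Compute summary statistics for (article, headline) text pairs.

    Assumptions for illustration: tokens are whitespace-separated, and the
    total token count covers both articles and headlines.
    """
    pairs = list(pairs)
    articles = [article.split() for article, _ in pairs]
    headlines = [headline.split() for _, headline in pairs]
    n = len(pairs)

    # Per-pair length ratio; pairs with an empty headline are skipped.
    ratios = [len(a) / len(h) for a, h in zip(articles, headlines) if h]

    return {
        "no of articles": n,
        "no of tokens": sum(len(a) for a in articles) + sum(len(h) for h in headlines),
        "no of unique tokens in articles": len({t for a in articles for t in a}),
        "no of unique tokens in headlines": len({t for h in headlines for t in h}),
        "avg length of article": sum(len(a) for a in articles) / n if n else 0.0,
        "avg length of headline": sum(len(h) for h in headlines) / n if n else 0.0,
        "avg ratio len(article)/len(headline)": sum(ratios) / len(ratios) if ratios else 0.0,
    }


if __name__ == "__main__":
    # Placeholder data; replace with the real (article, headline) pairs.
    sample = [("a b c d", "x y"), ("a b e f g h", "x z w")]
    for feature, value in dataset_stats(sample).items():
        print(f"{feature} | {value}")
```

Note that the last row is the mean of per-article ratios, which generally differs from the ratio of the two averages. In the first table, for example, 272 / 7 is about 39, while the reported average ratio is 43.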
News Website | Number of Articles | URL |
---|---|---|
Aaj Tak | 92765 | https://www.aajtak.intoday.in |
ABP News | 13654 | https://www.abpnews.abplive.in |
Amar Ujala | 181 | https://www.amarujala.com |
BBC Hindi | 28861 | https://bbc.com/hindi |
Deshbandhu | 3174 | https://deshbandhu.co.in |
Economic Times | 993 | https://hindi.economictimes.indiatimes.com |
Jagran | 73290 | https://www.jagran.com |
Navbharat Times | 10329 | https://www.navbharattimes.indiatimes.com |
NDTV | 92942 | https://www.khabar.ndtv.com/news/ |
News18 | 38833 | https://www.news18.com |
Patrika | 68288 | https://www.patrika.com |
Punjab Kesari | 15494 | https://www.punjabkesari.in |
Rajasthan Patrika | 89038 | https://www.rajasthanpatrika.patrika.com |
Zee News | 10463 | https://www.zeenews.india.com/hindi |