dc-aichara/DS-ML-Public


  • A notebook that guides hyperparameter optimization using Bayesian model-based optimization.

Example hyperparameter optimization results table for a LightGBM regressor on the Boston Housing data.

Read the complete article on Medium.
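The notebook's own code is not reproduced here. As a library-free sketch of what "Bayesian model-based optimization" means, the block below implements a simplified one-dimensional Tree-structured Parzen Estimator (TPE), one common model-based optimizer: past trials are split into a "good" and a "bad" group, a kernel density is fitted to each, and the next trial is the candidate that maximizes the ratio of the two densities. The function names, the fixed bandwidth, and the toy objective (a stand-in for a validation loss as a function of one hyperparameter, such as a learning rate) are assumptions for illustration, not the notebook's method.

```python
import math
import random


def parzen_density(points, bandwidth):
    """Return a 1-D Parzen (Gaussian-kernel) density estimate built on `points`."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return pdf


def tpe_minimize(objective, low, high, n_init=10, n_iter=40, gamma=0.25,
                 n_candidates=24, seed=0):
    """Minimize `objective` on [low, high] with a simplified 1-D TPE loop.

    Returns the best (x, loss) pair found.
    """
    rng = random.Random(seed)
    bandwidth = (high - low) / 10  # fixed kernel width; full TPE adapts this

    # Warm-up: plain random search builds the initial history of (x, loss) pairs.
    history = []
    for _ in range(n_init):
        x = rng.uniform(low, high)
        history.append((x, objective(x)))

    for _ in range(n_iter):
        # Split past trials into a "good" group (lowest losses) and a "bad" group.
        history.sort(key=lambda pair: pair[1])
        n_good = max(1, int(gamma * len(history)))
        good = [x for x, _ in history[:n_good]]
        bad = [x for x, _ in history[n_good:]]
        l_pdf, g_pdf = parzen_density(good, bandwidth), parzen_density(bad, bandwidth)

        # Sample candidates around the good trials, clipped to the search range,
        # and keep the one with the highest ratio l(x) / g(x); under TPE the
        # expected improvement grows with this ratio.
        candidates = [
            min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
            for _ in range(n_candidates)
        ]
        x_next = max(candidates, key=lambda c: l_pdf(c) / max(g_pdf(c), 1e-12))
        history.append((x_next, objective(x_next)))

    return min(history, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy stand-in for a validation loss; its minimum is at 0.1 by construction.
    best_x, best_loss = tpe_minimize(lambda x: (x - 0.1) ** 2, 0.001, 0.3)
    print(f"best x = {best_x:.4f}, loss = {best_loss:.6f}")
```

Libraries such as hyperopt implement a fuller TPE with adaptive bandwidths and multi-dimensional, mixed-type search spaces, which is what tuning a LightGBM model realistically needs.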

Example of Telegram chats that contain the keyword 'bitcoin' or 'btc'.

  • Get text messages from Telegram groups and channels that contain the word 'bitcoin' or 'btc'.

Example:

>>> tele_btc_messages.head()
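The keyword filter itself can be sketched with the standard `re` module. This is a hypothetical helper, not the script's code: it assumes a whole-word, case-insensitive match, whereas the actual script may use plain substring matching (which would also keep text such as "btcusd").

```python
import re

# Hypothetical pattern: whole-word, case-insensitive match on "bitcoin" or "btc".
# The repository's script may match differently (for example, plain substrings).
BTC_PATTERN = re.compile(r"\b(?:bitcoin|btc)\b", re.IGNORECASE)


def filter_btc_messages(messages):
    """Keep only non-empty message texts that mention 'bitcoin' or 'btc'."""
    return [text for text in messages if text and BTC_PATTERN.search(text)]


if __name__ == "__main__":
    sample = [
        "BTC just broke its previous high",
        "Ethereum devs discuss the next upgrade",
        "Is Bitcoin a good hedge?",
        None,  # e.g. a photo or sticker message with no text
        "btcusd chart update",  # no whole-word match, so it is dropped
    ]
    print(filter_btc_messages(sample))
    # ['BTC just broke its previous high', 'Is Bitcoin a good hedge?']
```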

  • Use this script to get users who were online in the last 24 hours.

Example:

$ cd DS-ML-Public
$ python telegram_user_status.py 12345 fe3922d77g6wgwgwyu35g46c9 bitgrit
Number of active users in last 24 hours is 1530.
          User               status
0  Dayal Chand               online
1       Sameer             recently
2  Dikesh Shah  2019-07-02 01:13:19
3       Crypto  2019-07-02 00:47:50
4        Billy  2019-07-02 01:32:49
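The count above depends on how each status value is classified. A minimal sketch, assuming that `online` and `recently` both count as active (users with those values appear in the example table) and that timestamps are `YYYY-MM-DD HH:MM:SS` strings in the same timezone as the reference time; `is_active` and `count_active` are hypothetical names, not functions from the script.

```python
from datetime import datetime, timedelta


def is_active(status, now, window=timedelta(hours=24)):
    """Return True if a status value like those in the table counts as active.

    Assumes 'online' and 'recently' count as active and that timestamps are
    'YYYY-MM-DD HH:MM:SS' strings in the same timezone as `now`.
    """
    if status in ("online", "recently"):
        return True
    last_seen = datetime.strptime(status, "%Y-%m-%d %H:%M:%S")
    return now - last_seen <= window


def count_active(statuses, now):
    """Count how many status values fall inside the 24-hour window."""
    return sum(1 for status in statuses if is_active(status, now))


if __name__ == "__main__":
    now = datetime(2019, 7, 2, 12, 0, 0)
    statuses = ["online", "recently", "2019-07-02 01:13:19",
                "2019-07-02 00:47:50", "2019-07-02 01:32:49",
                "2019-06-28 09:00:00"]  # the last one is older than 24 hours
    print(count_active(statuses, now))  # 5
```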

  • A Jupyter Notebook tutorial for the Google Analytics Reporting API.
  • A dashboard demo app.
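The notebook is not reproduced here, but the core of a Reporting API v4 call is a `reports.batchGet` request body and a nested response that usually needs flattening before analysis. The sketch below shows both under stated assumptions: `build_report_request` and `report_to_rows` are hypothetical helpers, the view ID is a placeholder, and sending the body (for example with the Google API client's `analytics.reports().batchGet(body=...).execute()`) is left out so the code runs offline.

```python
def build_report_request(view_id, metrics, dimensions,
                         start_date="7daysAgo", end_date="today"):
    """Build a Reporting API v4 `reports.batchGet` request body (one report)."""
    return {
        "reportRequests": [{
            "viewId": view_id,
            "dateRanges": [{"startDate": start_date, "endDate": end_date}],
            "metrics": [{"expression": m} for m in metrics],
            "dimensions": [{"name": d} for d in dimensions],
        }]
    }


def report_to_rows(response):
    """Flatten the first report of a batchGet response into a list of dicts."""
    report = response["reports"][0]
    header = report["columnHeader"]
    dim_names = header.get("dimensions", [])
    metric_names = [m["name"] for m in header["metricHeader"]["metricHeaderEntries"]]
    rows = []
    for row in report.get("data", {}).get("rows", []):
        record = dict(zip(dim_names, row.get("dimensions", [])))
        # The API returns one value list per date range; only the first is used here.
        record.update(zip(metric_names, row["metrics"][0]["values"]))
        rows.append(record)
    return rows


if __name__ == "__main__":
    body = build_report_request("VIEW_ID_PLACEHOLDER", ["ga:sessions"], ["ga:country"])
    print(body)
```

Metric values arrive as strings, so convert them (for example with `int` or `float`) before plotting them in a dashboard.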

Usage:

$ git clone https://github.com/dc-aichara/DS-ML-Public.git
$ cd DS-ML-Public/WebScrapers
$ python3

>>> from crypto_news_scraper import NewsScrap
>>> news = NewsScrap()
>>> df_coindesk = news.coin_desk_news()
>>> df_coindesk.head()
  category                                            heading  ...                time    source
0     news  Dapp.com Closes $1 Million Investment Round Le...  ... 2019-09-06 22:00:00  CoinDesk
1     news  Telegram Finally Releases Code for Its $1.7 Bi...  ... 2019-09-06 21:46:00  CoinDesk
2     news  Massive $1 Billion Bitcoin Whale Transaction M...  ... 2019-09-06 19:00:00  CoinDesk
3     news  Ethereum Picks Early October for Testnet Activ...  ... 2019-09-06 18:00:00  CoinDesk
4     news  Dapp Data Site DappRadar Raises $2.33 Million ...  ... 2019-09-06 17:00:00  CoinDesk

[5 rows x 6 columns]
>>> df_cointelegraph = news.cointelegraph_news()
>>> df_cointelegraph.head()
  category                                            heading  ...                 time         source
0     News  Crypto and Blockchain Adoption Grows: 5 Import...  ...  2019-09-09 11:15:03  CoinTelegraph
  1     News  World's First Blockchain Smartphone to Becom...  ...  2019-09-09 08:15:03  CoinTelegraph
  2     News  Ethereum's Istanbul Hard Fork Implementation D...  ...  2019-09-09 08:15:03  CoinTelegraph
  3     News  Blockchain Startup DappRadar Raises $2.33M Fro...  ...  2019-09-09 08:15:03  CoinTelegraph
  4     News  Huobi's Research Arm to Partner with the Unive...  ...  2019-09-09 07:15:03  CoinTelegraph

[5 rows x 6 columns]
>>> df_all = news.get_all_news()
Getting news from CoinDesk!!
Getting news from Cointelegraph!!
Getting news from cryptonewsz!! This will take 1-2 mintues. 😉
>>> df_all.head()
  category                                            heading  ...                 time    source
0     news  Dapp.com Closes $1 Million Investment Round Le...  ...  2019-09-06 22:00:00  CoinDesk
1     news  Telegram Finally Releases Code for Its $1.7 Bi...  ...  2019-09-06 21:46:00  CoinDesk
2     news  Massive $1 Billion Bitcoin Whale Transaction M...  ...  2019-09-06 19:00:00  CoinDesk
3     news  Ethereum Picks Early October for Testnet Activ...  ...  2019-09-06 18:00:00  CoinDesk
4     news  Dapp Data Site DappRadar Raises $2.33 Million ...  ...  2019-09-06 17:00:00  CoinDesk

[5 rows x 6 columns]
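`get_all_news()` visits each source in turn, as its progress messages show. The aggregation step can be sketched in plain Python; `combine_sources` and the zero-argument scraper callables are hypothetical stand-ins for the class's per-source methods, and rows are plain dicts rather than DataFrames so the sketch runs without pandas. Skipping a failing source is a design choice assumed here, not confirmed behavior of `NewsScrap`.

```python
def combine_sources(scrapers):
    """Call each source scraper in turn and combine their rows.

    `scrapers` maps a source name to a zero-argument callable that returns a
    list of dicts (a stand-in for one DataFrame per source).
    """
    combined = []
    for source, scrape in scrapers.items():
        print(f"Getting news from {source}!!")
        try:
            rows = scrape()
        except Exception as exc:  # one broken site should not sink the rest
            print(f"Skipping {source}: {exc}")
            continue
        # Tag every row with its source so the combined table stays traceable.
        combined.extend({**row, "source": source} for row in rows)
    return combined


if __name__ == "__main__":
    rows = combine_sources({
        "SiteA": lambda: [{"heading": "first"}],
        "SiteB": lambda: [{"heading": "second"}, {"heading": "third"}],
    })
    print(rows)
```

With one DataFrame per source, `pandas.concat(frames, ignore_index=True)` performs the equivalent combination.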

Usage:

$ git clone https://github.com/dc-aichara/DS-ML-Public.git
$ cd DS-ML-Public/WebScrapers
$ python3

>>> from inshorts_news_scraper import InshortsNews
>>> news = InshortsNews('business')
>>> df_b = news.get_news()
>>> df_b.head()
                                            headings                                               news       short_by                time  category
0  BSNL plans to fire 30% contract staff unpaid s...  BSNL is reportedly planning to lay off about 3...  Anushka Dixit 2019-09-09 23:35:00  business
1  SAT overturns SEBI's 2 year-ban on PwC in ₹7,8...  The Securities Appellate Tribunal (SAT) on Mon...  Anushka Dixit 2019-09-09 21:29:00  business
2  Nissan CEO Hiroto Saikawa to step down on Sept...  Nissan CEO Hiroto Saikawa will step down on Se...         Dharna 2019-09-09 21:08:00  business
3  British Airways pilots begin 2-day strike over...  British Airways pilots began a two-day strike ...  Anushka Dixit 2019-09-09 20:18:00  business
4  SEBI making e-voting app for retail investors ...  Markets regulator SEBI is working on an e-voti...         Dharna 2019-09-09 18:04:00  business
>>> df_all = news.get_all_news()
>>> df_all.head()
                                            headings                                               news        short_by                time  category
0  Conflict between India, Pak less heated now th...  Speaking about tensions between India and Paki...  Arshiya Chopra 2019-09-10 08:50:00  national
1  Bengaluru woman loses ₹95,000 after calling fa...  A Bengaluru woman lost ₹95,000 after calling a...  Pragya Swastik 2019-09-10 08:25:00  national
2  IAS officer who resigned is traitor, should go...  BJP MP Anantkumar Hegde has called IAS officer...    Apaar Sharma 2019-09-09 23:28:00  national
3  Stop drama, stand up, CISF allegedly tells wom...  Virali Modi, a disability rights activist, has...    Anmol Sharma 2019-09-09 23:10:00  national
4  Tech firms may be allowed to sell users' publi...  India is reportedly mulling guidelines which w...          Dharna 2019-09-09 23:00:00  national
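Scrapers like this one pull headline text out of each news card in the page's HTML. A standard-library sketch of that extraction step follows; the `itemprop="headline"` attribute is an assumption about the markup chosen for illustration, and `HeadlineParser` and `extract_headlines` are hypothetical names. The repository's scraper likely uses its own parsing approach.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of every element carrying itemprop="headline".

    The attribute is an assumed piece of markup, used here for illustration.
    """

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._tag = None   # tag name of the headline element currently open
        self._depth = 0    # nesting count of that tag name inside the headline

    def handle_starttag(self, tag, attrs):
        if self._tag is None:
            if ("itemprop", "headline") in attrs:
                self._tag, self._depth = tag, 1
                self.headlines.append("")
        elif tag == self._tag:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self.headlines[-1] += data


def extract_headlines(html):
    """Return headline texts with whitespace collapsed."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [" ".join(h.split()) for h in parser.headlines]


if __name__ == "__main__":
    sample = """
    <div class="card">
      <span itemprop="headline">Headline one with <b>bold</b> text</span>
      <div itemprop="articleBody">Body text...</div>
    </div>
    <div class="card"><span itemprop="headline">Second headline</span></div>
    """
    print(extract_headlines(sample))
```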

Usage:

>>> from japanese_news_scraper import JapaneseNewsScrap
>>> jp_news = JapaneseNewsScrap(24*60*60)
>>> df_coinpost = jp_news.get_coin_post_news()
>>> df_coinpost.head()
                  time                                         heading  ...                           link    source
0  2019-10-08 15:30:48              米リップル社大学ブロックチェーン研究イニシアチブで年次大会を初開催  ...  https://coinpost.jp/?p=111090  CoinPost
1  2019-10-08 15:29:29                     米NBAのキングスファン向けの独自仮想通貨発行を発表  ...  https://coinpost.jp/?p=111088  CoinPost
2  2019-10-08 14:59:53             金融庁がブロックチェーン実験結果を公表金融機関の顧客KYC情報を共有  ...  https://coinpost.jp/?p=111189  CoinPost
3  2019-10-08 14:26:52    Chainlinkの新フレームワーク発表で仮想通貨LINKが高騰 協賛にIntelなど  ...  https://coinpost.jp/?p=111080  CoinPost
4  2019-10-08 14:04:36  イーサリアム企業連合ブロックチェーン仕様の新バージョン発表Devcon 5で検証実施  ...  https://coinpost.jp/?p=111170  CoinPost

[5 rows x 5 columns]
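The constructor argument `24*60*60` equals 86,400 seconds, or one day. Assuming it acts as a look-back window (an assumption, not documented behavior), the filtering step could look like this sketch; `within_window` is a hypothetical helper, and the time strings follow the format shown in the `time` column above.

```python
from datetime import datetime, timedelta


def within_window(items, window_seconds, now):
    """Keep items whose 'time' is no older than `window_seconds` before `now`."""
    cutoff = now - timedelta(seconds=window_seconds)
    return [item for item in items
            if datetime.strptime(item["time"], "%Y-%m-%d %H:%M:%S") >= cutoff]


if __name__ == "__main__":
    now = datetime(2019, 10, 8, 16, 0, 0)
    items = [
        {"time": "2019-10-08 15:30:48"},  # 29 minutes old: kept
        {"time": "2019-10-07 16:00:00"},  # exactly one day old: kept
        {"time": "2019-10-07 15:00:00"},  # 25 hours old: dropped
    ]
    print(within_window(items, 24 * 60 * 60, now))
```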

  • A Python script to extract text from the pages of a given website.

Usage:

>>> from website_pages_scraper import WebScraper
>>> webscraper = WebScraper(main_page_url='https://www.example.com/')
>>> df = webscraper.scrap_website(depth=2)
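A `depth` parameter usually bounds a breadth-first crawl. The sketch below assumes the main page sits at depth 0 and that only links on the same host are followed; both are assumptions about `scrap_website`, not confirmed behavior. `crawl` is a hypothetical function, and link extraction is injected as a `get_links` callable so the example runs offline.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(main_page_url, get_links, depth=2):
    """Breadth-first crawl of same-host pages up to `depth` link hops.

    `get_links(url)` is a hypothetical callable returning the href values
    found on a page; the main page sits at depth 0.
    """
    host = urlparse(main_page_url).netloc
    seen = {main_page_url}
    visited = []
    queue = deque([(main_page_url, 0)])
    while queue:
        url, level = queue.popleft()
        visited.append(url)
        if level == depth:
            continue  # do not follow links beyond the depth limit
        for href in get_links(url):
            # Resolve relative links and drop #fragments so duplicates collapse.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return visited


if __name__ == "__main__":
    site = {
        "https://www.example.com/": ["/about", "/blog", "https://other.org/x"],
        "https://www.example.com/about": ["/team#top", "/"],
        "https://www.example.com/blog": ["post-1"],
        "https://www.example.com/team": ["/deep"],
    }
    print(crawl("https://www.example.com/", lambda u: site.get(u, []), depth=2))
```

Breadth-first order guarantees that every page is first reached by its shortest link path, so the depth limit means "at most N clicks from the main page".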