This crawler was created to crawl Kaskus threads.
Thread info and user info are saved to a SQL database.
- Scrapy
- MySQLdb
- Edit db_base.py and change your database settings.
- Edit kaskus/settings.py and change your Scrapy spider settings.
- Edit kaskus/spiders/new_kaskus_spider.py and change the list of threads in this line:
start_urls = ['https://www.kaskus.co.id/thread/509881921dd719d70e000015']
Or list several threads like this:
start_urls = ['https://www.kaskus.co.id/thread/509881921dd719d70e000015', 'https://www.kaskus.co.id/thread/50c3d3324f6ea10528000001']
- Then start your crawler with this command:
scrapy crawl new_kaskus
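
If the thread list grows, typing full URLs by hand gets error-prone. Here is a minimal sketch, assuming you keep only the thread IDs and build the URLs from them. `BASE_URL` and `build_start_urls` are hypothetical names that do not exist in this repo; the result would be assigned to `start_urls` in kaskus/spiders/new_kaskus_spider.py.

```python
# Hypothetical helper, not part of this repository.
# Assumption: every thread URL has the form BASE_URL + thread ID,
# as in the example URLs above.
BASE_URL = "https://www.kaskus.co.id/thread/"


def build_start_urls(thread_ids):
    """Return one thread URL per ID, skipping blanks and duplicates, keeping order."""
    seen = set()
    urls = []
    for thread_id in thread_ids:
        thread_id = thread_id.strip()
        if not thread_id or thread_id in seen:
            continue
        seen.add(thread_id)
        urls.append(BASE_URL + thread_id)
    return urls


# The same two threads used in the examples above.
start_urls = build_start_urls([
    "509881921dd719d70e000015",
    "50c3d3324f6ea10528000001",
])
```

Only the ID list changes when you add or remove a thread, and an ID pasted twice is crawled only once.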
The script is still rough and does not follow Scrapy conventions; use at your own risk.
Mail me at clasense4[at]gmail[dot]com