This is a project that uses Scrapy to crawl data from Lianjia (链家网).
- Change directory to the root directory of this project.
- Use `scrapy crawl lianjia -o outputs.csv` to save the crawled data into a .csv file (.json and .xml output formats are also supported).
- Use `scrapy shell URL` to debug parsing for a specific URL.
- Use `scrapy crawl lianjia -s LOG_FILE=scrapy.log` to save the logging info into a file.
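Instead of passing `-o` and `-s` on every run, the same behaviour can be configured once in the project's `settings.py` — a sketch assuming Scrapy ≥ 2.1, where the `FEEDS` setting replaced the older `FEED_URI`/`FEED_FORMAT` pair:

```python
# settings.py — sketch, equivalent to:
#   scrapy crawl lianjia -o outputs.csv -s LOG_FILE=scrapy.log

FEEDS = {
    "outputs.csv": {"format": "csv"},  # "json" and "xml" also work here
}
LOG_FILE = "scrapy.log"  # write log output to this file instead of stderr
```

With this in place, a plain `scrapy crawl lianjia` produces the same CSV and log file.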
- Add analysis comparing different houses at the same point in time.
- Add analysis tracking the same house over time.
- Add a function for scheduling the script.
- Use proxies to crawl data more quickly.
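The scheduling item above could be prototyped with a plain loop around `subprocess`; this is only a sketch, and the dated output filename `outputs-YYYY-MM-DD.csv` is an assumption, not something the project defines:

```python
import subprocess
import time
from datetime import date


def build_command(day: date) -> list[str]:
    # Write each day's crawl to a dated CSV so runs don't overwrite each other
    # (the filename pattern is a hypothetical convention, not the project's).
    return ["scrapy", "crawl", "lianjia", "-o", f"outputs-{day.isoformat()}.csv"]


def run_daily(interval_seconds: int = 24 * 60 * 60) -> None:
    # Assumes the script is started from the project root so Scrapy
    # can locate the `lianjia` spider.
    while True:
        subprocess.run(build_command(date.today()), check=False)
        time.sleep(interval_seconds)
```

For production use, a cron entry or a dedicated scheduler would be more robust than a long-running loop.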