pangxiaobin/CrawlerHot: Today's Hot List. Scrapes the trending lists of several websites and displays them in a front-end page. Go implementation: https://github.com/pangxiaobin/goCrawlerHot

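Description

Hot list demo on my blog: https://www.panglb.top/hot/
The front end and back end are separate: the back end uses the lightweight framework web.py, the front end uses layui, and the data is saved as a local JSON file.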

├── crawler.py  # main crawler code
├── helper.py  # helper functions
├── html    # front-end page
│   ├── hot.html
│   └── layui  # front-end dependency
├── image
│   └── hot.png
├── LICENSE
├── README.md
├── requments.txt  # Python dependencies
├── result  # saved crawler data
│   └── result.json
├── run.py  # entry point for the scheduled crawler
├── server.py  # back-end service
├── settings.py
└── uwsgi.ini  # uwsgi server configuration
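
The layout above keeps crawl output in result/result.json, which the back end then serves. As a minimal sketch of that save-and-load round trip, assuming nothing about the project's real code: the function names `save_result` and `load_result` and the shape of the sample data are hypothetical and are not taken from crawler.py or server.py.

```python
import json
from pathlib import Path


def save_result(data, path):
    """Write crawl results to a JSON file, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps Chinese titles readable in the saved file.
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")


def load_result(path):
    """Read the results back, as a back-end handler might do before responding."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Hypothetical data shape: one list of {title, url} entries per site.
    sample = {"v2ex": [{"title": "example topic", "url": "https://example.com"}]}
    save_result(sample, "result/result.json")
    print(load_result("result/result.json"))
```

Writing the whole file on every crawl keeps the back end stateless: it only ever reads the most recent snapshot.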

So far, only the following hot lists are crawled:
- Zhihu Hot List
- V2EX
- GitHub
- Sina Weibo
- Tianya
- Baidu Tieba
- Douban
- NetEase Cloud Music

Environment
- python3.6

Running

Download

 git clone https://github.com/pangxiaobin/CrawlerHot.git
 cd CrawlerHot

Install dependencies

# Create a virtual environment (requires virtualenv and virtualenvwrapper)
mkvirtualenv hot
pip install -r requments.txt
# Note: pip install uwsgi fails on Windows. For a demo on Windows, comment out uwsgi in requments.txt first.

Running locally

Crawl the data

python run.py
# To see the result of a single crawl, comment out run()
# if __name__ == '__main__':
#    run_crawler()  # run the crawler once
#    run()  # run the crawler on a schedule
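
To illustrate the difference between a single pass and a scheduled run, here is a rough sketch of how such an entry point could be structured. Only the names `run_crawler` and `run` come from the comments above; their bodies, the 600-second interval, and the `max_runs` and `sleep` parameters are assumptions made for illustration, not the contents of run.py.

```python
import time


def run_crawler():
    """Placeholder for one crawl pass (hypothetical body)."""
    print("crawling...")


def run(job=run_crawler, interval_seconds=600, max_runs=None, sleep=time.sleep):
    """Call job repeatedly, sleeping interval_seconds between passes.

    max_runs=None loops forever; an integer stops after that many passes.
    Returns the number of passes performed.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as exc:  # keep the loop alive if one pass fails
            print(f"crawl failed: {exc}")
        runs += 1
        # Skip the final sleep when a run limit has been reached.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs


if __name__ == "__main__":
    run_crawler()  # single pass
    # run()        # scheduled passes
```

Catching exceptions inside the loop means one failing site or network error does not stop later passes.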

Start the local service

python server.py

View the front-end page

Open html/hot.html in a browser to see the result.

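Server deployment with uwsgi + nginx

Because the front end and back end are separate, the back end can be served on its own by uwsgi, and the front end by nginx.
Start an HTTP service with uwsgi

Edit http and chdir in uwsgi.ini
# The port your server exposes
http=0.0.0.0:8080
# Project directory: the absolute path of the project
chdir=yourpath/CrawlerHot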
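
A forgotten placeholder in chdir is an easy mistake to make here. The sketch below is one possible sanity check for the two settings above, not part of the project. It assumes the file has the usual `[uwsgi]` section header and that the server is Linux, so an absolute path starts with `/`. The helper name `check_uwsgi_ini` is hypothetical.

```python
import configparser


def check_uwsgi_ini(text):
    """Return a list of problems found in the http and chdir settings."""
    # interpolation=None avoids errors on '%' characters; strict=False
    # tolerates repeated keys, which uwsgi files may contain.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    section = parser["uwsgi"]

    problems = []
    chdir = section.get("chdir", "")
    if not chdir.startswith("/"):  # assumes a Linux server
        problems.append(f"chdir is not an absolute path: {chdir!r}")

    http = section.get("http", "")
    _host, _sep, port = http.rpartition(":")
    if not port.isdigit():
        problems.append(f"http has no numeric port: {http!r}")
    return problems


if __name__ == "__main__":
    sample = "[uwsgi]\nhttp=0.0.0.0:8080\nchdir=yourpath/CrawlerHot\n"
    for problem in check_uwsgi_ini(sample):
        print(problem)
```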

Start uwsgi

uwsgi --ini uwsgi.ini

Change the API address requested by the front end

    # /html/hot.html
    # Change 127.0.0.1 here to your server's IP
    https://127.0.0.1:8080/hot => https://server_ip:8080/hot
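
The same change can be scripted instead of edited by hand. The sketch below assumes the address appears in html/hot.html as the literal string `127.0.0.1`. The helper name `point_frontend_at` and the example address `203.0.113.10` (a documentation-only IP) are hypothetical.

```python
from pathlib import Path


def point_frontend_at(html_path, server_host, old_host="127.0.0.1"):
    """Replace old_host with server_host in the file; return the number of replacements."""
    path = Path(html_path)
    text = path.read_text(encoding="utf-8")
    count = text.count(old_host)
    if count:
        path.write_text(text.replace(old_host, server_host), encoding="utf-8")
    return count


if __name__ == "__main__":
    # Run from the project root; replace the address with your server's IP.
    changed = point_frontend_at("html/hot.html", "203.0.113.10")
    print(f"replaced {changed} occurrence(s)")
```

A return value of 0 signals that the page did not contain the expected address and was left untouched.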

Configure nginx to serve the front end

# Add a location block to /etc/nginx/conf.d/default.conf
server {
    listen       80;
    # Change this to your server's IP
    server_name  your_server_ip;

    location /hot {
        # Absolute path
        alias /yourpath/CrawlerHot/html;
        index hot.html;
    }
}

Run the scheduled crawler script

nohup python -u run.py &

Demo
