Scrapy framework
Scrapy framework structure
Mocha-Pudding committed Feb 25, 2019
1 parent 6b6d98f commit 520be38
Showing 17 changed files with 518 additions and 54 deletions.
128 changes: 114 additions & 14 deletions Scrapy_demo/.idea/workspace.xml

Some generated files are not rendered by default.

39 changes: 19 additions & 20 deletions Scrapy_demo/.ipynb_checkpoints/Scrapy快速入门-checkpoint.ipynb
Original file line number Diff line number Diff line change
@@ -36,31 +36,30 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
"source": [
"### Directory structure overview:\n",
"\n",
"The main files serve the following purposes: \n",
" \n",
"items.py: defines the models for the data the spider scrapes. \n",
"middlewares.py: holds the project's various middlewares. \n",
"pipelines.py: persists the item models to local disk. \n",
"settings.py: configuration for this spider (e.g. request headers, request interval, IP proxy pool). \n",
"scrapy.cfg: the project's configuration file. \n",
"spiders package: all spiders live in this package. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Scrapy notes summary:\n",
"<img src=\"Scrapy笔记.jpg\" width=\"80%\">"
]
}
],
"metadata": {
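The pipelines.py role described in the cell above (persisting item models to local disk) can be sketched without the framework: a Scrapy item pipeline is an ordinary class with `open_spider`, `process_item`, and `close_spider` hooks. This is a minimal illustrative sketch, not code from the commit; the `JsonLinesPipeline` name and the JSON-lines output format are assumptions.

```python
import json


class JsonLinesPipeline:
    """Minimal sketch of a Scrapy-style item pipeline.

    Scrapy calls open_spider() once at startup, process_item()
    once per scraped item, and close_spider() at shutdown.
    """

    def __init__(self, path="items.jl"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Open the output file once, when the spider starts.
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Serialize each item as one JSON line; the item must be
        # returned so any later pipelines still receive it.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item

    def close_spider(self, spider):
        self.file.close()
```

In a real project the pipeline would be enabled through the `ITEM_PIPELINES` setting in settings.py.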
@@ -0,0 +1,61 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scraping Qiushibaike jokes with the Scrapy framework:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating a spider from the command line:\n",
" \n",
"scrapy genspider qsbk \"qiushibaike.com\" \n",
" \n",
"This creates a spider named qsbk; the pages it is allowed to crawl are restricted to the qiushibaike.com domain. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
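The domain restriction mentioned above comes from the spider's `allowed_domains` list, which Scrapy's offsite filtering checks before following a request. The rule can be sketched framework-free; the `is_offsite` helper below is a hypothetical illustration of the idea, not Scrapy's actual API.

```python
from urllib.parse import urlparse


def is_offsite(url, allowed_domains):
    """Return True if url's host falls outside every allowed domain.

    A host matches an allowed domain when it equals the domain or is a
    subdomain of it (e.g. www.qiushibaike.com matches qiushibaike.com),
    mirroring the rule Scrapy's offsite filter applies.
    """
    host = urlparse(url).hostname or ""
    return not any(
        host == d or host.endswith("." + d) for d in allowed_domains
    )


allowed = ["qiushibaike.com"]
print(is_offsite("https://www.qiushibaike.com/text/", allowed))  # False
print(is_offsite("https://example.com/", allowed))               # True
```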
39 changes: 19 additions & 20 deletions Scrapy_demo/Scrapy快速入门.ipynb
@@ -36,31 +36,30 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
"source": [
"### Directory structure overview:\n",
"\n",
"The main files serve the following purposes: \n",
" \n",
"items.py: defines the models for the data the spider scrapes. \n",
"middlewares.py: holds the project's various middlewares. \n",
"pipelines.py: persists the item models to local disk. \n",
"settings.py: configuration for this spider (e.g. request headers, request interval, IP proxy pool). \n",
"scrapy.cfg: the project's configuration file. \n",
"spiders package: all spiders live in this package. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Scrapy notes summary:\n",
"<img src=\"Scrapy笔记.jpg\" width=\"80%\">"
]
}
],
"metadata": {
Binary file added Scrapy_demo/Scrapy笔记.jpg
Empty file.
Binary file not shown.
Binary file not shown.
14 changes: 14 additions & 0 deletions Scrapy_demo/qsbk/qsbk/items.py
@@ -0,0 +1,14 @@
# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class QsbkItem(scrapy.Item):
# define the fields for your item here like:
# name = scrapy.Field()
pass
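The generated items.py above leaves `QsbkItem` empty; in Scrapy, each field would later be declared with `scrapy.Field()`. As a framework-free sketch of the same item model, a plain dataclass captures the idea; the `author` and `content` field names are assumptions for a joke item, not fields from the commit.

```python
from dataclasses import asdict, dataclass


@dataclass
class QsbkJoke:
    """Stand-in for QsbkItem: one scraped joke.

    In the real items.py these would instead be declared as
    author = scrapy.Field() and content = scrapy.Field().
    """

    author: str
    content: str


joke = QsbkJoke(author="someone", content="a short joke")
print(asdict(joke))  # {'author': 'someone', 'content': 'a short joke'}
```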
