Version 0.3.0 Beta
naibo committed May 19, 2023
1 parent 6096fb4 commit 3529ec4
Showing 11 changed files with 48 additions and 20 deletions.
2 changes: 1 addition & 1 deletion ElectronJS/src/index.html
@@ -24,7 +24,7 @@ <h5 style="margin-top: 20px">选择语言/Select Language</h5>

<p><a @click="changeLang('en')" class="btn btn-outline-primary btn-lg"
style="margin-top: 15px; width: 300px;height:60px;padding-top:12px;">English</a></p>
<p><a href="/NaiboWang/EasySpider/Releases" target="_blank">Github</a>最新版本/Newest Version:{{newest_version}}</p>
<p><a href="https://github.com/NaiboWang/EasySpider/releases" target="_blank">Github</a>最新版本/Newest Version:{{newest_version}}</p>
<!-- <p>如发现新版本更新,可从以下Github仓库下载最新版本使用/If a new version is found, you can download the latest version from the following Github repository:</p>-->
<!-- <p></p>-->

5 changes: 4 additions & 1 deletion ElectronJS/src/taskGrid/FlowChart_CN.html
@@ -103,6 +103,8 @@ <h4 class="modal-title">等价XPath</h4>
<label>链接(每行一个链接,有多少行链接整个任务流程就会被执行多少次):</label>
<textarea onkeydown="inputDelete(event)" class="form-control" rows="2" v-model='nowNode["parameters"]["links"]'></textarea>
</div>
<label>页面加载最长等待时间(秒):</label>
<input onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['maxWaitTime']" type="number" required></input>
<label>执行完是否向下滚动:</label>
<select v-model='nowNode["parameters"]["scrollType"]' class="form-control">
<option value = 0>不滚动</option>
@@ -123,7 +125,8 @@ <h4 class="modal-title">等价XPath</h4>
<textarea onkeydown="inputDelete(event)" class="form-control" rows="2" v-model='nowNode["parameters"]["xpath"]'></textarea>
<p><button type="button" data-toggle="modal" data-target="#myModal_XPath" @click="changeXPaths(nowNode['parameters']['allXPaths'])" class="btn btn-primary" style="margin-top: 10px">点此查看其他等价的XPath</button></p>
</div>

<label>点击后页面加载最长等待时间(秒):</label>
<input onkeydown="inputDelete(event)" class="form-control" v-model.number="nowNode['parameters']['maxWaitTime']" type="number" required></input>
<label>执行完是否向下滚动:</label>
<select v-model='nowNode["parameters"]["scrollType"]' class="form-control">
<option value = 0>不滚动</option>
2 changes: 1 addition & 1 deletion ElectronJS/src/taskGrid/invokeTask.html
@@ -49,7 +49,7 @@ <h4 class="modal-title" id="myModalLabel">{{"Task Invocation Instruction~执行
<div class="modal-body">
<input onkeydown="inputDelete(event)" id="serviceId" type="hidden" name="serviceId" value="-1"></input>
<input onkeydown="inputDelete(event)" id="url" type="hidden" name="url" value="about:blank"></input>
<label>{{`Please open a terminal, go to EasySpider's folder, and then copy (Command/Ctrl + c) the following command to run the task (EasySpider cannot quit when executing command, unless --read_type is set to "local"):~请在EasySpider目录下打开命令行工具Terminal,然后复制(Command/Ctrl + c)和运行以下命令以执行任务(执行命令时不能退出EasySpider,除非将--read_type设置为local):` | lang}}</label>
<label>{{`Please open a terminal (For Windows, please use PowerShell instead of CMD), go to EasySpider's folder, and then copy (Command/Ctrl + c) the following command to run the task (EasySpider cannot quit when executing command, unless --read_type is set to "local"):~请在EasySpider目录下打开命令行工具Terminal (Windows请使用PowerShell而不是CMD),然后复制(Command/Ctrl + c)和运行以下命令以执行任务(执行命令时不能退出EasySpider,除非将--read_type设置为local):` | lang}}</label>
<label><a href="https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction" target="_blank">{{`Click Here~点击这里` | lang}}</a> {{`Here to see argument instruction.~这里查看参数配置说明。` | lang}}</label>
<textarea class="form-control" style="height:150px">cd {{easyspider_location}}
{{command}} --config_folder "{{config_folder}}" --headless 0 --read_type remote --config_file_name config.json --saved_file_name </textarea>
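The command template above can also be assembled programmatically, which sidesteps shell quoting differences between PowerShell and CMD. The following is only a sketch: `build_task_args` and the example launcher tokens are hypothetical names, the flags are the ones visible in the template, and omitting an empty `--saved_file_name` is an assumption about how the executor treats that flag.

```python
from typing import List, Sequence


def build_task_args(
    command: Sequence[str],
    config_folder: str,
    headless: int = 0,
    read_type: str = "remote",
    config_file_name: str = "config.json",
    saved_file_name: str = "",
) -> List[str]:
    """Build an argument list mirroring the flags shown in the command template.

    `command` stands in for the {{command}} placeholder; its real value is
    generated by EasySpider and is not known here.
    """
    args = list(command)
    args += [
        "--config_folder", config_folder,
        "--headless", str(headless),
        "--read_type", read_type,
        "--config_file_name", config_file_name,
    ]
    # Assumption: an empty saved_file_name can simply be left out.
    if saved_file_name:
        args += ["--saved_file_name", saved_file_name]
    return args


if __name__ == "__main__":
    # Hypothetical launcher tokens, for illustration only.
    print(build_task_args(["python", "easyspider_executestage.py", "--id", "10"], "./"))
```

A list like this can be handed to `subprocess.run` directly, so no shell-specific escaping of the config folder path is needed.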
2 changes: 2 additions & 0 deletions ElectronJS/src/taskGrid/logic_CN.js
@@ -129,11 +129,13 @@ function addParameters(t) {
if (t.option == 1) {
t["parameters"]["url"] = "about:blank";
t["parameters"]["links"] = "about:blank";
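t["parameters"]["maxWaitTime"] = 10; // maximum wait time (seconds)
t["parameters"]["scrollType"] = 0; // scroll type: 0 = no scrolling, 1 = scroll down one screen, 2 = scroll to the bottom
t["parameters"]["scrollCount"] = 0; // number of scrolls
} else if (t.option == 2) { // click element
t["parameters"]["scrollType"] = 0; // scroll type: 0 = no scrolling, 1 = scroll down one screen, 2 = scroll to the bottom
t["parameters"]["scrollCount"] = 0; // number of scrolls
t["parameters"]["maxWaitTime"] = 10; // maximum wait time (seconds)
t["parameters"]["paras"] = []; // default parameter list
t["parameters"]["beforeJS"] = ""; // JS to run before the action
t["parameters"]["beforeJSWaitTime"] = 0; // wait time for the pre-action JS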
1 change: 1 addition & 0 deletions ElectronJS/tasks/54.json
@@ -0,0 +1 @@
{"id":54,"name":"新web采集任务","url":"https://www.jd.com","links":"about:blank","create_time":"5/19/2023, 1:54:12 PM","containJudge":false,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.jd.com"},{"id":1,"name":"urlList_1","nodeId":2,"nodeName":"打开网页","value":"about:blank","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"about:blank"}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,2],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":0}},{"id":2,"index":2,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"url":"about:blank","links":"about:blank","maxWaitTime":16,"scrollType":0,"scrollCount":0}}]}
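To see where the new `maxWaitTime` field sits in a saved task, the sketch below reads a trimmed-down copy of the graph above and collects the value for each "open page" (`option` 1) and "click element" (`option` 2) node. The helper name `max_wait_times` is hypothetical, and the fallback of 10 for nodes without the field is an assumption that mirrors the default used elsewhere in this commit.

```python
import json

# Trimmed-down version of the task graph in tasks/54.json (most fields omitted).
TASK_JSON = """
{"id": 54, "graph": [
  {"id": 0, "option": 0, "parameters": {"wait": 0}},
  {"id": 1, "option": 1, "parameters": {"url": "https://www.jd.com", "maxWaitTime": 10}},
  {"id": 2, "option": 1, "parameters": {"url": "about:blank", "maxWaitTime": 16}}
]}
"""


def max_wait_times(task: dict, default: int = 10) -> dict:
    """Map node id -> maxWaitTime for open-page (1) and click-element (2) nodes."""
    result = {}
    for node in task["graph"]:
        if node.get("option") in (1, 2):
            # Task files saved before this commit carry no maxWaitTime key.
            result[node["id"]] = int(node["parameters"].get("maxWaitTime", default))
    return result


task = json.loads(TASK_JSON)
print(max_wait_times(task))  # {1: 10, 2: 16}
```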
1 change: 1 addition & 0 deletions ElectronJS/tasks/55.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions ElectronJS/tasks/56.json
@@ -0,0 +1 @@
{"id":56,"name":"新web采集任务","url":"https://www.jd.com","links":"https://www.jd.com","create_time":"5/19/2023, 2:34:32 PM","containJudge":false,"desc":"https://www.jd.com","inputParameters":[{"id":0,"name":"urlList_0","nodeId":1,"nodeName":"打开网页","value":"https://www.jd.com","desc":"要采集的网址列表,多行以\\n分开","type":"string","exampleValue":"https://www.jd.com"},{"id":1,"name":"loopTimes_循环_1","nodeId":4,"nodeName":"循环","desc":"循环循环执行的次数(0代表无限循环)","type":"int","exampleValue":0,"value":0}],"outputParameters":[],"graph":[{"index":0,"id":0,"parentId":0,"type":-1,"option":0,"title":"root","sequence":[1,3,4],"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0},"isInLoop":false},{"id":1,"index":1,"parentId":0,"type":0,"option":1,"title":"打开网页","sequence":[],"isInLoop":false,"position":0,"parameters":{"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"url":"https://www.jd.com","links":"https://www.jd.com","maxWaitTime":10,"scrollType":0,"scrollCount":0}},{"id":-1,"index":2,"parentId":0,"type":0,"option":2,"title":"点击元素","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":1,"tabIndex":0,"useLoop":false,"xpath":"","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"scrollType":0,"scrollCount":0,"maxWaitTime":10,"paras":[]}},{"id":2,"index":3,"parentId":0,"type":0,"option":2,"title":"点击元素","sequence":[],"isInLoop":false,"position":1,"parameters":{"history":4,"tabIndex":-1,"useLoop":false,"xpath":"//*[@id=\"search-link\"]/i[1]","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"scrollType":0,"scrollCount":0,"maxWaitTime":10,"paras":[],"allXPaths":["/html/body/div[4]/div[1]/div[2]/div[1]/a[1]/i[1]","//i[contains(., '')]"]}},{"id":3,"index":4,"parentId":0,"type":1,"option":8,"title":"循环","sequence":[5],"isInLoop":false,"position":2,"parameters":{"history":5,"tabIndex":-1,"useLoop":false,"xpath":"//*[contains(@class, 
\"button\")]/i[1]","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"scrollType":0,"scrollCount":0,"loopType":0,"pathList":"","textList":"","code":"","waitTime":0,"exitCount":0,"historyWait":2,"allXPaths":["/html/body/div[3]/div[2]/div[1]/button[1]/i[1]","//i[contains(., '')]"]}},{"id":4,"index":5,"parentId":3,"type":0,"option":2,"title":"点击元素","sequence":[],"isInLoop":true,"position":0,"parameters":{"history":5,"tabIndex":-1,"useLoop":true,"xpath":"//*[contains(@class, \"button\")]/i[1]","wait":0,"beforeJS":"","beforeJSWaitTime":0,"afterJS":"","afterJSWaitTime":0,"scrollType":0,"scrollCount":0,"maxWaitTime":10,"paras":[],"allXPaths":["/html/body/div[3]/div[2]/div[1]/button[1]/i[1]","//i[contains(., '')]"],"loopType":0}}]}
2 changes: 1 addition & 1 deletion ExecuteStage/.vscode/launch.json
@@ -12,7 +12,7 @@
"console": "integratedTerminal",
"justMyCode": true,
// "args": ["--id", "38", "--read_type", "local", "--headless", "1"]
"args": ["--id", "78", "--headless", "0"]
"args": ["--id", "10", "--headless", "0"]
}
]
}
29 changes: 19 additions & 10 deletions ExecuteStage/easyspider_executestage.py
@@ -418,7 +418,6 @@ def loopExcute(node, loopValue, clickPath="", index=0):
def openPage(para, loopValue):
rt = Time("打开网页")
time.sleep(2) # force a wait of at least 2 seconds after opening the page
time.sleep(random.uniform(1, 10)) # wait a random fractional time between a and b seconds
global links
global urlId
global history
@@ -442,12 +441,18 @@ def openPage(para, loopValue):
else:
url = links[urlId]
try:
maxWaitTime = int(para["maxWaitTime"])
except:
maxWaitTime = 10 # default maximum wait time is 10 seconds
try:
browser.set_page_load_timeout(maxWaitTime) # maximum page-load timeout
browser.set_script_timeout(maxWaitTime)
browser.get(url)
Log('Loading page: ' + url)
recordLog('Loading page: ' + url)
except TimeoutException:
Log('time out after 10 seconds when loading page: ' + url)
recordLog('time out after 10 seconds when loading page: ' + url)
Log('time out after set seconds when loading page: ' + url)
recordLog('time out after set seconds when loading page: ' + url)
browser.execute_script('window.stop()')
rt.end()
try:
@@ -464,8 +469,8 @@ def openPage(para, loopValue):
Log('URL Page: ' + url)
recordLog('URL Page: ' + url)
except TimeoutException:
Log('time out after 10 seconds when getting body text: ' + url)
recordLog('time out after 10 seconds when getting body text:: ' + url)
Log('time out after set seconds when getting body text: ' + url)
recordLog('time out after set seconds when getting body text:: ' + url)
browser.execute_script('window.stop()')
time.sleep(1)
Log("Need to wait 1 second to get body text")
@@ -519,11 +524,17 @@ def clickElement(para, loopElement=None, clickPath="", index=0):
global history
time.sleep(0.1) # wait 0.1 seconds before clicking
rt = Time("Click Element")
Log("Wait 1 second before clicking element")
Log("Wait 0.1 second before clicking element")
if para["useLoop"]: # when looping, the passed-in clickPath is the actual xpath
path = clickPath
else:
path = para["xpath"] # otherwise use the xpath defined on the element
try:
maxWaitTime = int(para["maxWaitTime"])
except:
maxWaitTime = 10
browser.set_page_load_timeout(maxWaitTime) # maximum page-load timeout
browser.set_script_timeout(maxWaitTime)
# run a snippet of JavaScript on the element before clicking
try:
if para["beforeJS"] != "":
@@ -541,8 +552,8 @@ def clickElement(para, loopElement=None, clickPath="", index=0):
browser.execute_script(script, str(index)) # click via JavaScript

except TimeoutException:
Log('time out after 10 seconds when loading clicked page')
recordLog('time out after 10 seconds when loading clicked page')
Log('time out after set seconds when loading clicked page')
recordLog('time out after set seconds when loading clicked page')
browser.execute_script('window.stop()')
rt.end()
except Exception as e:
@@ -983,8 +994,6 @@ def clean():

wait = WebDriverWait(browser, 10)
browser.get('about:blank')
browser.set_page_load_timeout(10) # maximum page-load timeout
browser.set_script_timeout(10)
id = c.id
print("id: ", id)
if c.saved_file_name != "":
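The core of this change is that `openPage` and `clickElement` now read a per-node `maxWaitTime` and fall back to 10 seconds, instead of relying on one global timeout set at startup. A minimal sketch of that pattern follows. The names `resolve_max_wait_time`, `apply_timeouts`, and `FakeBrowser` are hypothetical and not part of EasySpider; the stand-in class only mimics the two Selenium WebDriver methods called in the diff. Unlike the bare `except:` in the diff, the sketch catches only the exceptions expected from a missing or malformed value, which is a deliberate deviation.

```python
from typing import Any, Mapping

DEFAULT_MAX_WAIT_TIME = 10  # seconds; same fallback value as in the diff


def resolve_max_wait_time(para: Mapping[str, Any],
                          default: int = DEFAULT_MAX_WAIT_TIME) -> int:
    """Return para["maxWaitTime"] as an int, or `default` if missing or invalid."""
    try:
        return int(para["maxWaitTime"])
    except (KeyError, TypeError, ValueError):
        # Older task files have no maxWaitTime key; unparsable values also fall back.
        return default


class FakeBrowser:
    """Hypothetical stand-in for a Selenium WebDriver, for illustration only."""

    def __init__(self) -> None:
        self.page_load_timeout = None
        self.script_timeout = None

    def set_page_load_timeout(self, seconds: float) -> None:
        self.page_load_timeout = seconds

    def set_script_timeout(self, seconds: float) -> None:
        self.script_timeout = seconds


def apply_timeouts(browser: Any, para: Mapping[str, Any]) -> int:
    """Set both timeouts from the node parameters and return the value used."""
    max_wait_time = resolve_max_wait_time(para)
    browser.set_page_load_timeout(max_wait_time)
    browser.set_script_timeout(max_wait_time)
    return max_wait_time


if __name__ == "__main__":
    browser = FakeBrowser()
    print(apply_timeouts(browser, {"maxWaitTime": "16"}))  # value from the node
    print(apply_timeouts(browser, {}))                     # falls back to default
```

Returning the resolved value also makes it possible to include the actual number of seconds in the timeout log messages.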
21 changes: 16 additions & 5 deletions Readme.md
@@ -1,8 +1,8 @@
## 请您Star/Please Star

如果您觉得此工具不错,请轻轻点击此页面右上角**Star**按钮增加项目曝光度,谢谢!
如果您觉得此工具不错,请轻轻点击此页面右上角**Star**按钮增加项目曝光度,谢谢!软件完全免费,只求大家Star和宣传给其他需要的朋友,谢谢!

If you think this tool is good, please gently click the **Star** button in the upper right corner at this page to increase the project exposure, thank you!
If you think this tool is good, please gently click the **Star** button in the upper right corner at this page to increase the project exposure, thank you! The software is completely free, only ask everyone to Star and promote it to other friends in need, thank you!

# EasySpider: Visual Code-Free Web Crawler

@@ -37,14 +37,25 @@ Bilibili/B站视频教程:

[如何无代码可视化的爬取需要登录才能爬的网站 - 知乎网站案例](https://www.bilibili.com/video/BV1HV4y1r7v8)

[如何爬需要输入验证码的网站](https://www.bilibili.com/video/BV18c411K7FH)

[如何切换IP池和使用隧道IP - 打开详情页采集案](https://www.bilibili.com/video/BV1KT411t79n)
[【重要】自定义条件判断之使用循环项内的JS命令返回值 - 第二弹](https://www.bilibili.com/video/BV1mu411x7Nn/)

[流程图执行逻辑解析 - 58同城房源描述采集案例](https://www.bilibili.com/video/BV1YL411z7uW)

[MacOS系统设计和执行eBay网站爬虫任务教程](https://www.bilibili.com/video/BV1WL411h71r)

[如何执行自己写的JS代码和系统代码 (自定义操作)](https://www.bilibili.com/video/BV1qs4y1z7Hc/)

[如何自定义循环和判断条件 - 第一弹](https://www.bilibili.com/video/BV1Ys4y1z777/)

[如何对元素和网页截图及命令行执行指南](https://www.bilibili.com/video/BV1dV4y1z764/)

[OCR识别元素内容功能](https://www.bilibili.com/video/BV1xz4y1b72D/)

[如何爬需要输入验证码的网站](https://www.bilibili.com/video/BV18c411K7FH)

[如何切换IP池和使用隧道IP - 打开详情页采集案](https://www.bilibili.com/video/BV1KT411t79n)


Refer to [Youtube Playlist](https://youtube.com/playlist?list=PL0kEFEkWrT7mt9MUlEBV2DTo1QsaanUTp) to see the video tutorials of EasySpider.

## 声明/Declaration
2 changes: 1 addition & 1 deletion Releases/EasySpider_windows_amd64/config.json
@@ -1 +1 @@
{"webserver_address":"http://localhost","webserver_port":8074,"user_data_folder":"./user_data1","absolute_user_data_folder":"D:\\Documents\\Projects\\EasySpider\\Releases\\EasySpider_windows_amd64\\user_data1"}
{"webserver_address":"http://localhost","webserver_port":8074,"user_data_folder":"./user_data","absolute_user_data_folder":"D:\\Documents\\Projects\\EasySpider\\Releases\\EasySpider_windows_amd64\\user_data1"}
