GitHub - program-in-chinese/ChromeCrawlerWildSpider: Web crawler: a Chrome extension that loads multiple pages simultaneously in the Chrome browser and scrapes their content.

Chrome extension in the Chrome Web Store: https://chrome.google.com/webstore/detail/wild-spider/aanpchnfojihjddlocpgoekffmjkhbbe

#WATCH OUT: the more tabs you use, the more computer resources (CPU, memory) are consumed. Saving each page also takes a little disk space: the content is stored in IndexedDB, which is accessible from Chrome Extensions -> Wild Spider, Inspect views: background page.

The "spider" works in this way:

1. The current URL is used as the starting point and is loaded again in a new tab.
1. After this page has loaded, fetch all the links on the page.
1. Collect all the links on the page, including relative URLs.
1. Save the text content of the page, then open the extracted links in parallel across all the tabs in use (3 by default, set in eventPage).
1. Repeat steps 2-4.
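
The steps above can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not the code in eventPage.js: `resolveLinks`, `crawl`, `loadPage`, `maxTabs`, and `maxPages` are hypothetical names, and `loadPage` stands in for whatever opens a tab and reports back the page text and its raw `href` values. Relative URLs are resolved against the page they were found on using the standard `URL` constructor.

```javascript
// Hypothetical helper: turn raw href values into absolute http(s) URLs.
function resolveLinks(baseUrl, hrefs) {
  const resolved = new Set();
  for (const href of hrefs) {
    try {
      // new URL(href, base) resolves relative paths against the page URL.
      const url = new URL(href, baseUrl);
      if (url.protocol === 'http:' || url.protocol === 'https:') {
        url.hash = ''; // treat "page#a" and "page#b" as the same page
        resolved.add(url.href);
      }
    } catch (e) {
      // Skip hrefs that cannot be parsed as URLs.
    }
  }
  return [...resolved];
}

// Hypothetical crawl loop. `loadPage(url)` is an assumed callback that
// resolves to { text, hrefs } once the page has finished loading.
async function crawl(startUrl, loadPage, maxTabs = 3, maxPages = 100) {
  const seen = new Set([startUrl]); // URLs already queued or visited
  const queue = [startUrl];         // URLs waiting to be loaded
  const saved = new Map();          // url -> text content

  while (queue.length > 0 && saved.size < maxPages) {
    // Load up to maxTabs pages at once (one per "tab").
    const batch = queue.splice(0, Math.min(maxTabs, maxPages - saved.size));
    const pages = await Promise.all(batch.map((url) => loadPage(url)));

    pages.forEach(({ text, hrefs }, i) => {
      saved.set(batch[i], text); // save the text content of the page
      for (const link of resolveLinks(batch[i], hrefs)) {
        if (!seen.has(link)) {   // queue each newly discovered link once
          seen.add(link);
          queue.push(link);
        }
      }
    });
  }
  return saved;
}
```

In an extension, `loadPage` would presumably be backed by a tab plus a content script that sends the page text and links to the event page. The sketch takes it as a parameter so the loop can be exercised without a browser. The `maxPages` cap is an addition of this sketch to keep the crawl bounded.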

The control logic is written mainly in Chinese: eventPage.js

Repository contents (8 commits): screenshots, Dexie.js, README.md, content.js, eventPage.js, htmlparser2.js, icon.png, manifest.json
