Crawling-for-dummies

A Chrome extension that offers a user-friendly approach to web crawling.

Problem Statement

Existing crawlers are hard to access and require programming skills, which makes them difficult for non-developers to use.
Their user interfaces are also unintuitive and hard to grasp at a glance. We therefore want to improve the UI and UX so that users with no coding background can easily use a crawling program.

Mission Statement

Crawling-for-dummies is open-source software for web crawling, delivered as a Chrome extension that offers a user-friendly approach to the task.
It is designed to be easy to use with little prior knowledge.
Features List:
Ability to extract specific elements using the element inspection tool
Ability to pick up a selector when a mouse click event occurs on an element
Ability to select the desired elements from the entire DOM with a JavaScript selector (href, innerText, etc.)
Ability to list the selected elements and export them as an Excel, JSON, or CSV file
Selects elements from webpage view
Extracts data from selected elements (.csv, .json)
Supports multi-page crawling feature
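
As a rough illustration of how a click on an element could be turned into a selector, the sketch below walks up from the clicked element and builds a CSS selector path. It is not taken from the Crawling-for-dummies source: buildSelector is a hypothetical name, and the strategy (stop at the nearest id, otherwise fall back to :nth-of-type) is only one of several reasonable choices.

```javascript
// Hypothetical helper, not the extension's actual code: builds a CSS selector
// path for an element by walking up through its ancestors.
function buildSelector(element) {
  const parts = [];
  let node = element;
  while (node && node.tagName) {
    const tag = node.tagName.toLowerCase();
    if (node.id) {
      // An id is assumed to be unique on the page, so the path can stop here.
      parts.unshift(`#${node.id}`);
      break;
    }
    const parent = node.parentElement;
    if (!parent) {
      parts.unshift(tag);
      break;
    }
    // Disambiguate among siblings that share the same tag name.
    const sameTag = Array.from(parent.children).filter(
      (child) => child.tagName === node.tagName
    );
    const position = sameTag.indexOf(node) + 1;
    parts.unshift(sameTag.length > 1 ? `${tag}:nth-of-type(${position})` : tag);
    node = parent;
  }
  return parts.join(" > ");
}

// In a browser page (for example, from a content script), log the selector
// and two commonly extracted properties of whatever gets clicked.
if (typeof document !== "undefined") {
  document.addEventListener(
    "click",
    (event) => {
      const target = event.target;
      console.log(buildSelector(target), target.innerText, target.href);
    },
    true // capture phase, so this handler runs before the page's own handlers
  );
}
```

The resulting string can be passed to document.querySelectorAll to collect matching elements. Ids that contain special characters would need escaping (for instance with CSS.escape) before being used this way.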

What is web crawling?

Web crawling is the process of indexing information on a website using automated scripts or programs. These programs go by various names, including web crawlers, spiders, spider bots, and crawlers.

Why is web crawling IMPORTANT?

In 2013, IBM reported that approximately 90% of the world’s data had been generated in the preceding two years and that the rate of data generation was doubling every two years. Most of that data is unstructured. Web crawlers index vast amounts of this unstructured data so that search engines can find the information users are looking for. Indexed data also plays an important role in other fields, such as data science projects.

In our project…

Existing crawlers are hard to access and require programming skills, which makes them difficult for non-developers to use. Their user interfaces are also unintuitive and hard to grasp at a glance. We therefore want to improve the UI and UX so that users with no coding background can easily use a crawling program.

Target Development Language

Technology stack: Node.js, Chrome Extension Manifest V3
Supported environments: Windows 10, macOS
Dependency managers: Yarn, npm
Development tools: VS Code, Chrome
Version control system: Git
Containerization: Docker
Project management tools: Discord, Notion
Project documentation: Read the Docs
Disk space: enough to hold the crawled data

How to install

Download Crawling-for-Dummies from our GitHub page or website and unpack it.
In the Chrome browser, open the extension settings or paste chrome://extensions/ into the address bar
- You can find the extension settings under the three-dot menu to the right of the address bar > More tools > Extensions
Turn on Developer mode at the top right
Click the Load unpacked button at the top left
Select the src folder from the downloaded files
If Crawling-for-Dummies appears in the list of extensions, you are good to go!

How to use

Go to the page that you want to get data from
Click the puzzle-shaped Extensions button to the right of the address bar
Select Crawling-for-Dummies and the pop-up will appear
Turn on Selecting Mode and click the element that you want
Paste the web page URL into the URL field
- If you want data from multiple pages, find "page=" or "p=" in the URL and replace the number that follows with {startingpage:lastpage}. For example, {1:10} means pages 1 through 10
Click the CRAWL button and wait
The crawled data will start downloading as a .json file
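
To make the {startingpage:lastpage} syntax concrete, here is a minimal sketch of how such a placeholder could be expanded into one URL per page. It is an illustration only: expandPageRange is a hypothetical name, example.com is a placeholder address, and the extension's real parsing may differ.

```javascript
// Hypothetical helper illustrating the {startingpage:lastpage} syntax;
// the extension's real implementation may differ.
function expandPageRange(urlTemplate) {
  // Look for the first {start:last} placeholder, e.g. {1:10}.
  const match = urlTemplate.match(/\{(\d+):(\d+)\}/);
  if (!match) {
    // No placeholder: treat the input as a single page.
    return [urlTemplate];
  }
  const start = Number(match[1]);
  const last = Number(match[2]);
  if (start > last) {
    throw new RangeError(`Starting page ${start} is after last page ${last}`);
  }
  const urls = [];
  for (let page = start; page <= last; page++) {
    // Substitute the page number for the whole placeholder.
    urls.push(urlTemplate.replace(match[0], String(page)));
  }
  return urls;
}

// Example: a three-page range on a placeholder site.
console.log(expandPageRange("https://example.com/board?page={1:3}"));
```

Each URL in the returned list would then be crawled in turn with the same selected elements.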

Member Introduction

김남권(16011356)
배경준(15011048)
고민석(15011027)
황예원(19011705)

Development: 김남권(16011356), 배경준(15011048), 고민석(15011027), 황예원(19011705)
SCM (Git): 배경준(15011048)
Configuration (manifest, etc.): 황예원(19011705)
CI/CD (Docker, k8s): 고민석(15011027)
Documentation (Read the Docs, Markdown): 김남권(16011356)

Contact

Discord server

On the Crawling-for-dummies Discord server, you can chat with members of the Crawling-for-dummies community in real time. You'll meet Crawling-for-dummies users, contributors, and developer advocates. This is a great place to stop in for quick questions or to share your latest Crawling-for-dummies discoveries.

Mailing list

Join the Crawling-for-dummies mailing list to discuss the ongoing development of Crawling-for-dummies and to find out about new Crawling-for-dummies releases.

License

This work is published under the Apache-2.0 License.

backgroundjun/Crawling-for-dummies
