tienlonghungson/IT-Jobs-TopCV-Crawler

IT Job TopCV Crawler

Overview

  • This repo crawls IT job postings from TopCV.vn (IT Jobs category)
  • Data can be crawled from a single page or from a range of consecutive pages

Requirements:

  • requests
  • beautifulsoup4

How to run

Custom run

  • In a bash shell, run python3 crawler.py a b, where a and b are page indices
  • This command crawls the consecutive pages from page a to page b
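
To illustrate how the two arguments could be interpreted, here is a minimal sketch. It is not taken from crawler.py: the function names parse_page_range and pages_to_crawl are hypothetical, and it assumes that page b is included in the crawl and that a must not exceed b.

```python
from typing import List, Tuple


def parse_page_range(argv: List[str]) -> Tuple[int, int]:
    """Read the two page indices a and b from the command-line arguments.

    Hypothetical helper: the real crawler.py may validate its input differently.
    """
    if len(argv) != 2:
        raise ValueError("usage: python3 crawler.py a b")
    a, b = int(argv[0]), int(argv[1])
    if b < a:
        raise ValueError("expected a <= b")
    return a, b


def pages_to_crawl(a: int, b: int) -> List[int]:
    """List the page indices to visit, in order.

    Assumption: the range is inclusive, so page b itself is crawled.
    """
    return list(range(a, b + 1))
```

In a script, the arguments would typically come from sys.argv[1:], so python3 crawler.py 3 7 would visit pages 3, 4, 5, 6 and 7 under these assumptions.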

Default Run

  • Use run.sh to start crawling
  • This bash script runs 14 threads simultaneously
  • Each thread crawls up to 10 consecutive pages (1-9, 10-19, 20-29, ...) and saves the result to a file named recruit_a_b.json, so 14 files are produced in total
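
The page ranges and file names described above can be sketched in Python as follows. This is an illustration only, not the content of run.sh: the helper names default_ranges, output_filename and crawl_command are hypothetical, and the sketch assumes the listed pattern (1-9, 10-19, 20-29, ...) continues unchanged up to the 14th range.

```python
from typing import List, Tuple


def default_ranges(n_ranges: int = 14, width: int = 10) -> List[Tuple[int, int]]:
    """Build the (a, b) page ranges: (1, 9), (10, 19), (20, 29), ..."""
    ranges = []
    for i in range(n_ranges):
        start = max(1, i * width)   # the first range starts at page 1, not 0
        end = (i + 1) * width - 1   # so the first range covers only 9 pages
        ranges.append((start, end))
    return ranges


def output_filename(a: int, b: int) -> str:
    """File name pattern from the README: recruit_a_b.json."""
    return f"recruit_{a}_{b}.json"


def crawl_command(a: int, b: int) -> List[str]:
    """Command line for one range, matching the custom run syntax."""
    return ["python3", "crawler.py", str(a), str(b)]
```

Each command returned by crawl_command could be started with subprocess.Popen and then waited on to run all ranges at once. Note that this launches separate processes rather than threads.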

Result

The crawled data is stored in this repo

Author:
