Skip to content

aftandilmmd/turboaz_crawler

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 

Repository files navigation

turboaz_crawler

The script collects as many data as possible from turbo.az website which mostly has car advertisements on it. This script can be used by people who want to collect same data and play with it.

The script consists of two main function extract_item() and parse_inner():

  1. extract_item() This function takes items on pagination pages. Like items in first page, items in second page and so on.
  2. parse_inner() This function takes item url and parse inner page of that one item.

It would be esay if you use this two function separately. However, you still able to use these function togather and since both function return dictionary we can easly concat them by using "car.update(self.parse_inner(passed_url))" which you will see in extract_item() function.

I collected data into mongodb since there is some data that cannot be structured such as phone numbers. In some ads there were more than one phone number which is basicly problematic to store in structured database. You can still wor around and by changing some parts you can store it in sql structured database or even in excel file.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%