Here we study how easy it is to extract data from websites. Beginner level.
In the first step, we scrape https://pudding.cool/ and collect the title, author, and description of each article.
We use pandas for data manipulation, urllib3 to open URLs, and Beautiful Soup to extract data from the HTML.
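A minimal sketch of the first step follows. It assumes the article cards can be found with the CSS selectors defined at the top (`article`, `h3`, `.author`, `p`). These selectors and the helper names `fetch_html`, `text_or_none`, and `parse_articles` are hypothetical, so inspect the page in your browser's developer tools and replace them with the real ones. Downloading and parsing are kept in separate functions so the parser can be tried on a saved HTML string.

```python
import pandas as pd
import urllib3
from bs4 import BeautifulSoup

# Hypothetical selectors: inspect the live page and adjust them to match.
ARTICLE_SELECTOR = "article"       # one element per article card
TITLE_SELECTOR = "h3"              # article title inside a card
AUTHOR_SELECTOR = ".author"        # author name inside a card
DESCRIPTION_SELECTOR = "p"         # short description inside a card

# One connection pool reused for every request.
HTTP = urllib3.PoolManager()


def fetch_html(url: str) -> str:
    """Download a page with urllib3 and decode it (assumes UTF-8)."""
    response = HTTP.request("GET", url)
    return response.data.decode("utf-8")


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if absent."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_articles(html: str) -> pd.DataFrame:
    """Build a table with the title, author and description of each card."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select(ARTICLE_SELECTOR):
        rows.append({
            "title": text_or_none(card, TITLE_SELECTOR),
            "author": text_or_none(card, AUTHOR_SELECTOR),
            "description": text_or_none(card, DESCRIPTION_SELECTOR),
        })
    return pd.DataFrame(rows, columns=["title", "author", "description"])
```

To run it against the live site: `df = parse_articles(fetch_html("https://pudding.cool/"))`. If the page builds its article list with JavaScript, the raw HTML returned by urllib3 may not contain the cards at all, and the table will come back empty.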
The first step produces this table:
In the second step, we scrape https://www.work.ua/jobs-kyiv-data+analyst/ and collect the job title and hiring company for each vacancy. Because the search results are split across several pages, we have to download every page to get the complete result set.
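The page-by-page download can be sketched as follows. It assumes, without having verified it, that further result pages are reached with a `?page=N` query parameter and that each job card matches the hypothetical selectors below. The names `page_url`, `parse_jobs`, and `scrape_all_pages` are illustrative. The loop stops at the first page with no job cards. `max_pages` is a safety cap in case the site keeps serving results for out-of-range page numbers. The downloader is passed in as `fetch`, so the paging logic can be tried without network access.

```python
import pandas as pd
import urllib3
from bs4 import BeautifulSoup

BASE_URL = "https://www.work.ua/jobs-kyiv-data+analyst/"

# Hypothetical markup assumptions: verify them against the live site.
JOB_SELECTOR = "div.job-link"      # one element per job card
TITLE_SELECTOR = "h2 a"            # job title link inside a card
COMPANY_SELECTOR = ".company"      # hiring company inside a card

HTTP = urllib3.PoolManager()


def fetch_html(url: str) -> str:
    """Download one page with urllib3 (assumes a UTF-8 response body)."""
    return HTTP.request("GET", url).data.decode("utf-8")


def page_url(page: int) -> str:
    """Build the URL of one results page (assumed '?page=N' scheme)."""
    return BASE_URL if page == 1 else f"{BASE_URL}?page={page}"


def parse_jobs(html: str) -> list[dict]:
    """Return one {'title', 'company'} dict per job card on a page."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for card in soup.select(JOB_SELECTOR):
        title = card.select_one(TITLE_SELECTOR)
        company = card.select_one(COMPANY_SELECTOR)
        jobs.append({
            "title": title.get_text(strip=True) if title else None,
            "company": company.get_text(strip=True) if company else None,
        })
    return jobs


def scrape_all_pages(fetch, max_pages: int = 50) -> pd.DataFrame:
    """Fetch pages 1, 2, ... until a page has no jobs or max_pages is hit."""
    rows = []
    for page in range(1, max_pages + 1):
        jobs = parse_jobs(fetch(page_url(page)))
        if not jobs:  # an empty page means we ran past the last one
            break
        rows.extend(jobs)
    return pd.DataFrame(rows, columns=["title", "company"])
```

`scrape_all_pages(fetch_html).to_csv("jobs.csv", index=False)` writes the combined table to a CSV file. The file name is only an example.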
The second step produces a CSV file:
What if we want to gather information from inside the articles themselves?
To do this, we first collect the links to the articles and then visit each link to extract the information on that page.
The third step shows how to do this.
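The two-pass approach can be sketched as follows. First it collects article links from a listing page. Then it downloads each article and pulls out its title and time. We assume the listing is the pudding.cool front page from the first step and that the hypothetical selectors `article a[href]`, `h1`, and `time` match the real markup, so adjust them as needed. `urljoin` turns relative links into absolute URLs, and duplicate links are dropped while keeping their order. A short `time.sleep` between requests keeps the load on the server low.

```python
import time
from urllib.parse import urljoin

import pandas as pd
import urllib3
from bs4 import BeautifulSoup

# Hypothetical selectors: adjust them after inspecting the real pages.
LINK_SELECTOR = "article a[href]"   # links to articles on the listing page
TITLE_SELECTOR = "h1"               # title inside an article page
TIME_SELECTOR = "time"              # date/time element inside an article page

HTTP = urllib3.PoolManager()


def fetch_html(url: str) -> str:
    """Download one page with urllib3 (assumes a UTF-8 response body)."""
    return HTTP.request("GET", url).data.decode("utf-8")


def collect_links(listing_html: str, base_url: str) -> list[str]:
    """Pass 1: return absolute article URLs, de-duplicated, in page order."""
    soup = BeautifulSoup(listing_html, "html.parser")
    urls = [urljoin(base_url, a["href"]) for a in soup.select(LINK_SELECTOR)]
    return list(dict.fromkeys(urls))  # dict keeps first-seen order


def parse_article(html: str) -> dict:
    """Pass 2: extract the title and time text from one article page."""
    soup = BeautifulSoup(html, "html.parser")
    title_node = soup.select_one(TITLE_SELECTOR)
    time_node = soup.select_one(TIME_SELECTOR)
    return {
        "title": title_node.get_text(strip=True) if title_node else None,
        "time": time_node.get_text(strip=True) if time_node else None,
    }


def scrape_inner(fetch, base_url: str, delay: float = 1.0) -> pd.DataFrame:
    """Visit every article linked from base_url and collect title and time.

    `fetch` maps a URL to HTML text (for example fetch_html); `delay` is
    the pause in seconds before each article request.
    """
    rows = []
    for url in collect_links(fetch(base_url), base_url):
        time.sleep(delay)
        rows.append({"url": url, **parse_article(fetch(url))})
    return pd.DataFrame(rows, columns=["url", "title", "time"])
```

Usage: `scrape_inner(fetch_html, "https://pudding.cool/")` returns one row per linked article.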
As a result, we get the title and time from inside each article: