Simple Web Mining. Beginner Level

Here we study how easy it is to extract data from websites, at a beginner level.

First step

Here we scrape data from https://pudding.cool/: the title, author, and description of each article.

We use pandas for data manipulation, urllib3 to open URLs, and Beautiful Soup to extract data from the HTML.
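The first step could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. The CSS selectors (`article`, `h2`, `.author`, `p`) are hypothetical placeholders and need to be replaced after inspecting the live page's markup. The helper names `fetch_html` and `parse_articles` are also illustrative.

```python
import pandas as pd
import urllib3
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page with urllib3 and return its body as text."""
    http = urllib3.PoolManager()
    response = http.request("GET", url)
    return response.data.decode("utf-8")


def parse_articles(html: str) -> pd.DataFrame:
    """Extract title, author and description for every article card.

    The selectors below are hypothetical placeholders, not the real
    markup of pudding.cool; adjust them to the page you are scraping.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("article"):  # assumed: one <article> per story
        title = card.select_one("h2")
        author = card.select_one(".author")
        description = card.select_one("p")
        rows.append({
            # get_text(strip=True) drops surrounding whitespace
            "title": title.get_text(strip=True) if title else None,
            "author": author.get_text(strip=True) if author else None,
            "description": description.get_text(strip=True) if description else None,
        })
    return pd.DataFrame(rows, columns=["title", "author", "description"])
```

Usage, once the selectors match the real page: `df = parse_articles(fetch_html("https://pudding.cool/"))`. Downloading and parsing are kept in separate functions so the parser can be tried on a saved HTML string without hitting the network.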

As a result of the first step, we get this table:

Second step

Here we scrape https://www.work.ua/jobs-kyiv-data+analyst/ for each job title and hiring company. To get all the results for this search, we have to download data from several pages.
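Scraping several result pages boils down to building one URL per page, parsing each page, and concatenating the rows. The sketch below assumes that later pages are reached with a `?page=N` query parameter, so check the site's own pagination links before relying on it. The selectors `div.job-card`, `h2` and `.company` are hypothetical placeholders, not work.ua's real markup. `fetch` is any function that takes a URL and returns HTML text, such as one built on `urllib3.PoolManager`.

```python
from collections.abc import Callable

import pandas as pd
from bs4 import BeautifulSoup


def page_urls(base_url: str, n_pages: int) -> list[str]:
    """Build the URLs of the first n_pages result pages.

    Assumption: page 1 is the base URL and later pages add "?page=N".
    """
    urls = [base_url]
    for page in range(2, n_pages + 1):
        urls.append(f"{base_url}?page={page}")
    return urls


def parse_jobs(html: str) -> list[dict]:
    """Extract job title and company from one results page.

    The selectors are hypothetical placeholders.
    """
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for card in soup.select("div.job-card"):
        title = card.select_one("h2")
        company = card.select_one(".company")
        jobs.append({
            "job_title": title.get_text(strip=True) if title else None,
            "company": company.get_text(strip=True) if company else None,
        })
    return jobs


def scrape_all_pages(base_url: str, n_pages: int,
                     fetch: Callable[[str], str]) -> pd.DataFrame:
    """Fetch every results page and combine all rows into one DataFrame."""
    rows = []
    for url in page_urls(base_url, n_pages):
        rows.extend(parse_jobs(fetch(url)))
    return pd.DataFrame(rows, columns=["job_title", "company"])
```

Passing `fetch` in as a parameter keeps the pagination logic testable with canned HTML. It also makes it easy to add a delay between requests in one place.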

As a result of the second step, we get a CSV file:
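Saving the combined DataFrame to CSV is a single pandas call. In this sketch `save_jobs` is a hypothetical helper name. The `utf-8-sig` encoding is an assumed choice: it writes a byte-order mark so that spreadsheet programs such as Excel detect UTF-8 and show Cyrillic company names correctly.

```python
import pandas as pd


def save_jobs(df: pd.DataFrame, path: str) -> None:
    """Write the scraped rows to CSV without the pandas index column."""
    # utf-8-sig prepends a BOM so Excel recognises the file as UTF-8
    df.to_csv(path, index=False, encoding="utf-8-sig")
```

To read the file back, use `pd.read_csv(path, encoding="utf-8-sig")`, which strips the byte-order mark again.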

Third step

What if we want to gather information from the inner part of articles?

To do this, we first need to collect the links to these articles and then visit each one to gather the information inside.
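Collecting the article links could look like the sketch below. Relative `href` values are turned into absolute URLs with `urllib.parse.urljoin`, and duplicates are dropped while keeping page order. The default selector `article a[href]` and the function name `collect_links` are hypothetical and should be adapted to the real listing page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def collect_links(html: str, base_url: str,
                  selector: str = "article a[href]") -> list[str]:
    """Return absolute, de-duplicated article URLs in page order.

    The default selector is a hypothetical placeholder.
    """
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    links = []
    for a in soup.select(selector):
        url = urljoin(base_url, a["href"])  # resolve relative links
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

Restricting the selector to links inside article cards keeps navigation and footer links out of the list of pages to visit.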

See the 3rd step notebook to learn how to do it.

As a result, we get the title and time from inside each article:
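Visiting each collected link and pulling out the title and time might look like this. The sketch assumes the title is the page's first `<h1>` and the time is a `<time>` element. It prefers the machine-readable `datetime` attribute and falls back to the visible text. Both assumptions, and the names `parse_inner` and `scrape_inner_pages`, are illustrative. `fetch` is any function that maps a URL to its HTML text.

```python
from collections.abc import Callable, Iterable

import pandas as pd
from bs4 import BeautifulSoup


def parse_inner(html: str) -> dict:
    """Extract the title and publication time from one article page.

    Assumption: title is the first <h1>, time is a <time> element.
    """
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")
    time_tag = soup.find("time")
    if time_tag is None:
        time_value = None
    else:
        # prefer the datetime attribute, else the visible text
        time_value = time_tag.get("datetime") or time_tag.get_text(strip=True)
    return {
        "title": h1.get_text(strip=True) if h1 else None,
        "time": time_value,
    }


def scrape_inner_pages(links: Iterable[str],
                       fetch: Callable[[str], str]) -> pd.DataFrame:
    """Visit each link and collect one row (url, title, time) per page."""
    rows = [{"url": url, **parse_inner(fetch(url))} for url in links]
    return pd.DataFrame(rows, columns=["url", "title", "time"])
```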

Repository: nika999/Simple-Web-mining.-Beginner-level