Lixinae/WebCrawler

Important /!\ : This project is now integrated into the VahenWebsite project (https://github.com/Vahen/VahenWebsite) as a submodule. Development will continue on the website rather than here.

WebCrawler

A little web crawler in Python, using Beautiful Soup 4

How to install

  • Install pip (or any package manager):

    • Installation :

      • (Windows)

        • How to Install pip:
          • Download https://raw.github.com/pypa/pip/master/contrib/get-pip.py.
          • Remember to save it as "get-pip.py"
          • Go to the download folder, right-click get-pip.py, then open it with python.exe.
          • You can add Python to the system PATH (this lets you run pip and easy_install without specifying their full path):
            • 1 Click on Properties of My Computer
            • 2 Then choose Advanced System Settings
            • 3 Click on the Advanced tab
            • 4 Click on Environment Variables
            • 5 Under System Variables, select the Path variable.
            • 6 Click Edit, then append the following to the end of it:
              • ;c:\Python27;c:\Python27\Scripts
            • (Please don't copy this verbatim; go to your Python directory and copy the equivalent paths.)
          • NB: you only have to do this once.
      • (Linux)

    • Upgrading:

      • (Windows)
        • python -m pip install -U pip
      • (Linux)
        • pip install -U pip
  • Install BeautifulSoup : pip install beautifulsoup4

  • And you are ready to use it :)

  • Just launch the Python script and follow the instructions.

  • /!\ This script doesn't work if you need to log in to the website to download elements /!\

  • /!\ It only supports direct links in the website /!\

  • If you find any bugs, feel free to tell me about them :)
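
To confirm the installation steps above worked, a small check along these lines may help. It is only a sketch and not part of the project: the function names `is_installed` and `describe_environment` are made up for illustration, and it assumes Python 3.4 or later (for `importlib.util.find_spec`). `bs4` is the import name of the `beautifulsoup4` package.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the named top-level module can be found."""
    return importlib.util.find_spec(module_name) is not None


def describe_environment():
    """Build a one-line summary of the Python version and bs4 availability."""
    # "bs4" is the import name of the beautifulsoup4 package.
    if is_installed("bs4"):
        status = "found"
    else:
        status = "missing (run: pip install beautifulsoup4)"
    return "Python {}.{} | bs4: {}".format(
        sys.version_info[0], sys.version_info[1], status
    )


if __name__ == "__main__":
    print(describe_environment())
```

If the summary reports bs4 as missing, rerunning `pip install beautifulsoup4` with the same Python interpreter is the usual fix.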

You can try it on Repl.it.
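
The "direct links" limitation mentioned above can be illustrated with a short sketch. It assumes that "direct links" means `href` values written in the page's HTML, as opposed to links generated by scripts; that reading is a guess, not something the project states. This is not the crawler's actual code: it uses the standard library's `html.parser` in place of Beautiful Soup so it runs without extra installs, and the names `LinkCollector`, `collect_links`, and `SAMPLE_HTML`, as well as the example URLs, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags present in the static HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def collect_links(html, base_url):
    """Return absolute URLs for every <a href=...> found in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content, for illustration only.
SAMPLE_HTML = """
<a href="/files/report.pdf">Report</a>
<a href="https://example.com/img/photo.png">Photo</a>
<a onclick="download()">Generated by script</a>
"""

if __name__ == "__main__":
    for link in collect_links(SAMPLE_HTML, "https://example.com/index.html"):
        print(link)
```

The third anchor has no `href`, so it is skipped; a crawler that only reads static HTML cannot see a target that a script produces later. With Beautiful Soup installed, a comparable lookup would be `BeautifulSoup(html, "html.parser").find_all("a", href=True)`.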
