Important /!\ : This project is now integrated into the VahenWebsite project (https://github.com/Vahen/VahenWebsite) as a submodule. Development now happens in the website project rather than here.

WebCrawler

A small web crawler in Python, using Beautiful Soup 4

How to install

Install pip (or any package manager):
- Installation:
  - (Windows)
    - How to install pip:
      - Download https://raw.github.com/pypa/pip/master/contrib/get-pip.py.
      - Remember to save it as "get-pip.py".
      - Go to the download folder, right-click get-pip.py, then open it with python.exe.
      - Optionally, add Python to the system Path variable so that you can use pip and easy_install without specifying their full path:
        
        1. Open the Properties of My Computer.
        2. Choose Advanced System Settings.
        3. Click on the Advanced tab.
        4. Click on Environment Variables.
        5. Under System Variables, select the Path variable.
        6. Click Edit, then add the following at the end of it:

        ;c:\Python27;c:\Python27\Scripts

        (Please don't copy this as-is. Go to your own Python directory and copy the equivalent paths.)
      - NB: you only have to do this once.
  - (Linux)
    - Full instructions are here: https://pip.pypa.io/en/stable/installing/
    - Below is a shorter version for a quick install:
    - Download https://bootstrap.pypa.io/get-pip.py (or use the previous link).
    - Run "python get-pip.py".
- Upgrading:
  - (Windows)
    - python -m pip install -U pip
  - (Linux)
    - pip install -U pip
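
For the Windows Path step above, the two directories to add depend on where Python is installed. The sketch below is one way to print them from Python itself instead of looking them up by hand. It is not part of this project; the function name `path_entries` is made up for illustration, and it only uses the standard library.

```python
import os
import sys
import sysconfig


def path_entries():
    """Return the interpreter directory and its scripts directory."""
    # Directory that contains the running Python executable.
    python_dir = os.path.dirname(os.path.abspath(sys.executable))
    # Directory where pip and other command-line tools are installed.
    scripts_dir = sysconfig.get_path("scripts")
    return [python_dir, scripts_dir]


if __name__ == "__main__":
    # os.pathsep is ";" on Windows, which matches the Path variable format.
    print(os.pathsep.join(path_entries()))
```

Run it with the Python installation you intend to use, then copy the printed value to the end of the Path variable.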
Install Beautiful Soup: pip install beautifulsoup4
And you are ready to use it :)
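
If you want to confirm that the installation worked before launching the crawler, a quick check like the one below may help. This is only a sketch, not part of the project: `missing_packages` is a hypothetical helper, and it assumes the usual import name `bs4` for the beautifulsoup4 package.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "bs4" is the import name of the beautifulsoup4 distribution.
    missing = missing_packages(["pip", "bs4"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("pip and Beautiful Soup are available.")
```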
Just launch the Python script and follow the instructions.
/!\ This script doesn't work if you need to log in to the website to download elements /!\
/!\ It only supports direct links in the website /!\
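
To illustrate what "direct links" means here, the sketch below collects every plain `<a href="...">` link from a page's HTML with Beautiful Soup and turns it into an absolute URL. It is an assumption about the general approach, not the code of this script: the function name `direct_links` and the example URLs are invented for the illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def direct_links(html, base_url):
    """Return absolute URLs for every <a href="..."> found in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips anchors that have no href attribute at all.
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links against the page they were found on.
        links.append(urljoin(base_url, anchor["href"]))
    return links


if __name__ == "__main__":
    sample = (
        '<p><a href="/files/a.pdf">A</a> '
        '<a href="https://example.com/b.zip">B</a></p>'
    )
    for url in direct_links(sample, "https://example.com/index.html"):
        print(url)
```

Links that only appear after logging in, or that are generated by scripts in the browser, are not present in the HTML that a function like this receives, so it cannot find them.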
If you find any bugs, feel free to tell me about them :)

You can try it on repl.it :
