webscraper__TRITONproject

FIRST STEP Install the packages specified in a requirements.txt.

All you need to do is open your command prompt or terminal, navigate to the directory where the requirements.txt file is located, and type ==> pip install -r requirements.txt.

This command will install all the packages listed in the file for you. It's a convenient way to ensure that all the required packages are installed before you run your project.

This project is a web scraping tool that extracts data from the Triton website related to vacuum cleaners and pressure washers. The tool uses the Scrapy framework and Scrapy-Splash to enable JavaScript rendering. The data is extracted from the Triton website and stored in a CSV and Excel files using the ScrapyPyXlsx library.

The VacuumCleanersSpider is the main spider that crawls the Triton website and extracts data from multiple pages. It starts by sending a request to each of the URLs specified in the start_urls list. The start_requests method sends a request using Scrapy-Splash, which allows JavaScript rendering. The parse method extracts data from the first page of each URL, including product title, price, availability, and link to the product page. It then sends a request to each product page to extract additional details, such as product code.

Finally, the parse_product_page method uses an item loader to store the extracted data in a TritonprojectItem object, which is then saved in an Excel file using ScrapyPyXlsx.

The main driver section runs the spider by creating a CrawlerProcess and calling process.crawl(VacuumCleanersSpider) to start the spider. The process.start() method then runs the spider, which starts the scraping process.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
TritonProject		TritonProject
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

webscraper__TRITONproject

About

Releases

Packages

Languages

razvanchirilov/webscraper__TRITONproject

Folders and files

Latest commit

History

Repository files navigation

webscraper__TRITONproject

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages