Implementing Docker Hub Scraper using Python, Selenium and Docker Desktop

intellek/hubscraper

 
 

Hubscraper - Scraping Docker Hub for Official Images and Extensions

A Docker image that scrapes official and verified images from Docker Hub.

A handy Python script for scraping the dynamically rendered Docker Hub website. The script can also fetch a list of Docker Extensions from Docker Hub.

Getting Started

Pre-requisite

Building it locally

Clone the repository

git clone https://github.com/collabnix/hubscraper/
cd hubscraper

Install the required modules

pip3 install -r requirements.txt

Modify the script

Go to line 17 of scraper.py and make the necessary changes:

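# Change base_dir to your own path.
base_dir = '/Users/ajeetraina/Downloads' + os.sep

# MS Edge Driver
# driver = webdriver.Edge(service=Service(EdgeChromiumDriverManager().install()))

# Safari Driver

# Open the output file; newline='' is what the csv module expects for files it writes.
csv_file = open('results.csv', 'w', newline='')

# Create the CSV writer and write the header row.
writer = csv.writer(csv_file)
writer.writerow(['Image Name', 'Downloads', 'Stars'])

# Chrome Driver: point this at your local chromedriver binary.
# A space inside a Python string needs no backslash, so the shell-style "\ " is dropped.
# Recent Selenium 4 releases no longer accept executable_path; there, pass
# service=Service("<path to chromedriver>") from selenium.webdriver.chrome.service instead.
driver = webdriver.Chrome(executable_path="/Users/ajeetraina/Downloads/chromedriver 3")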
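The CSV setup above can also be written with a context manager, so the file is closed even if scraping fails partway through. The following is a minimal sketch rather than the script's actual code: `write_results` and `SAMPLE_ROWS` are hypothetical names, and the rows are placeholders standing in for whatever the scraper collects.

```python
import csv
import os
import tempfile

# Hypothetical placeholder rows; the real script collects these from Docker Hub.
SAMPLE_ROWS = [
    ("example-image-a", "1M+", "120"),
    ("example-image-b", "500K+", "45"),
]


def write_results(base_dir, rows, filename="results.csv"):
    """Write the header plus one row per image and return the file path."""
    # os.path.join adds the separator, so base_dir needs no trailing os.sep.
    path = os.path.join(base_dir, filename)
    # The with block closes the file even if a row raises an error.
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["Image Name", "Downloads", "Stars"])
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    # Write into a throwaway directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp_dir:
        print(write_results(tmp_dir, SAMPLE_ROWS))
```

Passing `base_dir` as an argument keeps the machine-specific path out of the function body, which may make the script easier to run both locally and in a container.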

Execute the script

python3 scraper.py
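Once the script finishes, a quick way to confirm that it produced usable output is to read results.csv back and check the header. This is an illustrative sketch only: `load_results` is a hypothetical helper that is not part of the repository, and it assumes the three-column header shown in the script excerpt.

```python
import csv
import os

# Header row written by the script excerpt in "Modify the script".
EXPECTED_HEADER = ["Image Name", "Downloads", "Stars"]


def load_results(path="results.csv"):
    """Return the data rows as dicts after checking the header row."""
    with open(path, newline="", encoding="utf-8") as csv_file:
        reader = csv.DictReader(csv_file)
        # fieldnames is None for an empty file, which also fails this check.
        if reader.fieldnames != EXPECTED_HEADER:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        return list(reader)


if __name__ == "__main__":
    if os.path.exists("results.csv"):
        rows = load_results()
        print(f"{len(rows)} images found in results.csv")
    else:
        print("results.csv not found; run scraper.py first")
```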

Building with Docker

git clone https://github.com/collabnix/hubscraper/
cd hubscraper
docker build -t ajeetraina/hubscraper .

Running the Hubscraper in a Docker container

docker run --platform=linux/amd64 -it -w /app -v "$(pwd)":/app ajeetraina/hubscraper bash
root@960e8b9fa2c2:/usr/workspace# python scraper.py 
[WDM] - Downloading: 100%|███████████████████████████████████████████████████████████████| 6.96M/6.96M [00:00<00:00, 8.90MB/s]
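A container normally has no display, so Chrome usually has to be started headless there. The sketch below shows one common way to configure that with Selenium 4. It is an assumption about how the script could be set up, not a copy of what scraper.py does: `chrome_arguments` and `make_driver` are hypothetical names, and `make_driver` assumes Selenium can locate a matching driver on its own.

```python
def chrome_arguments():
    """Chrome flags commonly used when no display is available, as in a container."""
    return [
        "--headless",               # run without a visible browser window
        "--no-sandbox",             # Chrome's sandbox typically fails when running as root
        "--disable-dev-shm-usage",  # use /tmp instead of /dev/shm, which is small by default in Docker
    ]


def make_driver():
    """Build a Chrome driver with the flags above (requires selenium to be installed)."""
    # Imported here so the flag list can be inspected without Selenium present.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    print(chrome_arguments())
```

Keeping the flags in a separate function makes it simple to drop `--headless` when running on a desktop, where watching the browser helps with debugging.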

Languages

  • Python 87.2%
  • Dockerfile 12.8%