A Docker image that scrapes official and verified images from Docker Hub.
A handy Python script for dynamically scraping the Docker Hub website. The script can fetch a list of Docker Extensions from Docker Hub.
- Install Python 3.9+
- Download Chrome Driver
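
The two prerequisites above can be checked quickly before running anything. The following is a minimal sketch, not part of the repository: the helper name `check_prerequisites` is hypothetical, and it assumes the driver binary is called `chromedriver` and is on your `PATH`. If you keep the driver elsewhere and point the script at it by full path, the second check will report "not found" even though your setup is fine.

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 9), driver_name="chromedriver"):
    """Return a dict describing whether the prerequisites look satisfied.

    Hypothetical helper for illustration; the names and defaults are assumptions.
    """
    return {
        # Tuple comparison: True when the running interpreter is min_version or newer.
        "python_ok": sys.version_info >= min_version,
        # shutil.which returns the full path to the executable, or None if it is not on PATH.
        "driver_path": shutil.which(driver_name),
    }


if __name__ == "__main__":
    status = check_prerequisites()
    print("Python 3.9+:", "yes" if status["python_ok"] else "no")
    print("chromedriver on PATH:", status["driver_path"] or "not found")
```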
git clone https://github.com/collabnix/hubscraper/
pip3 install -r requirements.txt
Go to line 17 of scraper.py and make the necessary changes:
# Change base_dir to your own path.
base_dir = '/Users/ajeetraina/Downloads' + os.sep
# MS Edge Driver
# driver = webdriver.Edge(service=Service(EdgeChromiumDriverManager().install()))
# Safari Driver
csv_file = open('results.csv', 'w')
# create the csv writer
writer = csv.writer(csv_file)
writer.writerow(['Image Name','Downloads','Stars'])
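# Point this at your own chromedriver binary. A space in the file name needs no backslash inside a Python string.
driver = webdriver.Chrome(executable_path="/Users/ajeetraina/Downloads/chromedriver 3")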
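
The snippet above opens results.csv and never closes it explicitly. A minimal sketch of the same header-and-rows write, using a context manager so the file is flushed and closed even if scraping fails partway, might look like the following. The function name `write_results` and the placeholder row are assumptions for illustration; they are not taken from the repository's script.

```python
import csv

# Same column names as the header row written in the snippet above.
HEADER = ["Image Name", "Downloads", "Stars"]


def write_results(rows, path="results.csv"):
    """Write the header plus one row per scraped image to a CSV file.

    Hypothetical helper for illustration only.
    """
    # newline="" is what the csv module documentation recommends for files
    # handed to csv.writer; it avoids blank lines between rows on Windows.
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(HEADER)
        writer.writerows(rows)


if __name__ == "__main__":
    # Placeholder row for illustration; not real Docker Hub data.
    write_results([["example-image", "1M+", "100"]])
```

Collecting the rows in a list and writing them in one call keeps the file handling in a single place, at the cost of holding all results in memory until the end.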
python3 scraper.py
git clone https://github.com/collabnix/hubscraper/
docker build -t ajeetraina/hubscraper .
docker run --platform=linux/amd64 -it -w /app -v "$(pwd)":/app ajeetraina/hubscraper bash
root@960e8b9fa2c2:/usr/workspace# python scraper.py
[WDM] - Downloading: 100%|███████████████████████████████████████████████████████████████| 6.96M/6.96M [00:00<00:00, 8.90MB/s]