A Python-based web automation tool. It can both control the browser and send/receive packets directly, combining the convenience of browser automation with the efficiency of requests. Powerful, with many user-friendly conveniences built in; the syntax is concise and elegant, and requires little code.
🤖/👨‍🦰 Detect bots/crawlers/spiders using the user agent string
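User-agent-based bot detection like the tool above usually boils down to matching the UA string against known bot markers. A minimal sketch of the idea (the pattern list here is illustrative and deliberately incomplete, not the tool's actual database):

```python
import re

# Common substrings found in bot/crawler user agents.
# Illustrative only -- real detectors ship far larger pattern lists.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp|curl|wget", re.IGNORECASE)

def is_bot(user_agent: str) -> bool:
    """Return True if the user-agent string matches a known bot marker."""
    return bool(BOT_PATTERN.search(user_agent))

print(is_bot("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # True
```

Real-world detectors also handle spoofed UAs and uncommon agents, which is why curated UA databases (like the JSON one listed below) exist.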
A bot to help people with their rental real-estate search. 🏠🤖
An R web crawler and scraper
Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
Proxy List Scrapper
Simple robots.txt template. Keeps unwanted robots out (disallow) and whitelists (allows) legitimate user-agents. Useful for all websites.
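A minimal robots.txt along the lines such a template describes might look like this (the user-agent names are illustrative, not taken from the actual template):

```
# Block a specific unwanted crawler entirely
User-agent: BadBot
Disallow: /

# Whitelist a legitimate crawler everywhere
User-agent: Googlebot
Allow: /

# Default rule for all other agents: keep out of private paths
User-agent: *
Disallow: /private/
```

Note that robots.txt is advisory: well-behaved crawlers honor it, but it is not an access-control mechanism.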
Vietnamese text data crawler scripts for various sites (including Youtube, Facebook, 4rum, news, ...)
hproxy - Asynchronous IP proxy pool that aims to make getting a proxy as convenient as possible. (Asynchronous crawler proxy pool)
Sneakpeek is a framework that helps to quickly and conveniently develop scrapers. It's the best choice for scrapers with complex, site-specific scraping logic that needs to run on a constant basis.
Tiny script to crawl information about a specific application in the Google Play Store, based on PHP.
Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to render data.
User agent database in JSON format of bots, crawlers, certain malware, automated software, scripts and uncommon ones.
An open source web crawling platform
htcap is a web application scanner able to crawl single-page applications (SPAs) recursively by intercepting AJAX calls and DOM changes.