Commit

added info on anti-bot
pigivinci committed Jul 6, 2023
1 parent d068ea0 commit c0d4a8d
Showing 6 changed files with 24 additions and 11 deletions.
6 changes: 5 additions & 1 deletion Pages/5.Antibot/Akamai.md
@@ -10,7 +10,11 @@ Use [Wappalyzer Chrome Extension](https://github.com/reanalytics-databoutique/we

### Recommended approach to Akamai Bot Manager

**BEST CHOICE**: a standard configuration of Akamai can be beaten with good proxy rotation; there's no need for a fully rendered browser
**BEST CHOICE**: a standard configuration of Akamai can be beaten with good proxy rotation; there's no need for a fully rendered browser. If no login is required, rotating datacenter proxies are usually enough.
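
The proxy rotation mentioned above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the approach used by the repository authors. The proxy URLs and the helper names (`proxy_rotation`, `build_opener_for`, `fetch`) are hypothetical placeholders, and a real setup would use the endpoints supplied by a proxy provider.

```python
import itertools
import urllib.request

# Hypothetical proxy endpoints (assumption): replace with your provider's.
PROXIES = [
    "http://proxy-1.example.com:8080",
    "http://proxy-2.example.com:8080",
    "http://proxy-3.example.com:8080",
]


def proxy_rotation(proxies):
    """Return an endless round-robin iterator over the given proxies."""
    return itertools.cycle(proxies)


def build_opener_for(proxy_url):
    """Build a urllib opener that sends HTTP and HTTPS traffic via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url, rotation, timeout=10):
    """Fetch url through the next proxy in the rotation and return the body."""
    opener = build_opener_for(next(rotation))
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Each call to `fetch` takes the next proxy from the rotation, so consecutive requests leave from different IP addresses. Retry logic and removal of banned proxies are left out to keep the sketch short.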

### How to bypass Akamai according to The Web Scraping Club
[Posts about Akamai](https://substack.thewebscraping.club/t/akamai)


### Reference and interesting links
[Official web page](https://www.akamai.com/products/bot-manager)
6 changes: 4 additions & 2 deletions Pages/5.Antibot/Cloudflare.md
@@ -11,7 +11,10 @@ Use [Wappalyzer Chrome Extension](https://github.com/reanalytics-databoutique/we
### Recommended approach to Cloudflare Bot Management
**BEST CHOICE**: Each website can be configured with different degrees of protection. The best approach is using [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) + a privacy-focused browser like Brave or an antidetect browser like Gologin.

A good solution, still to be tested on our side, is to find the IP address of the web server of the target website and then send requests to it directly. An updated version of the solution techniques with code can be found on [The Web Scraping Club](https://substack.thewebscraping.club "The Web Scraping Club").
A good solution, still to be tested on our side, is to find the IP address of the web server of the target website and then send requests to it directly.

### How to bypass Cloudflare according to The Web Scraping Club
[Posts about Cloudflare](https://substack.thewebscraping.club/t/cloudflare)

### Reference and interesting links
[Official web page](https://www.cloudflare.com/en-gb/products/bot-management/)
@@ -24,4 +27,3 @@ A good solution, still to be tested by our side, is to find the IP address of th

[High level description](https://www.zenrows.com/blog/bypass-cloudflare#what-is-cloudflare-bot-management)

[List of articles on The Web Scraping Club](https://substack.thewebscraping.club/t/cloudflare)
7 changes: 5 additions & 2 deletions Pages/5.Antibot/Datadome.md
@@ -9,9 +9,12 @@
Use [Wappalyzer Chrome Extension](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/0386528f99a1209a538f6d042e859cd9933011c8/Pages/Tools/Wappalyzer.md)

### Recommended approach to Datadome
**BEST CHOICE**: [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) + Brave browser is a good solution. An updated version of the solution techniques with code can be found on [The Web Scraping Club](https://substack.thewebscraping.club "The Web Scraping Club").
**BEST CHOICE**: [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) + Brave browser is a good solution.

### How to bypass Datadome according to The Web Scraping Club
[Posts about Datadome](https://substack.thewebscraping.club/t/datadome)

### Reference and interesting links
[Official web page](https://datadome.co/)
[Tests made with online tools](https://blog.vanila.io/how-strong-is-the-datadome-5e9ff211384e)
[List of articles on The Web Scraping Club](https://substack.thewebscraping.club/t/datadome)

6 changes: 4 additions & 2 deletions Pages/5.Antibot/Kasada.md
@@ -10,9 +10,11 @@ Unluckily [Wappalyzer Chrome Extension](https://github.com/reanalytics-databouti
The first request to the website returns a 429 error (visible only from the Network inspection in the browser's developer tools), then redirects to the same page, which loads properly. The response to this second request includes additional headers such as "x-kpsdk-ct".
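
The detection pattern described above can be expressed as a small check. This is a minimal sketch under stated assumptions: the function names are hypothetical, the input is a list of `(status_code, headers)` pairs you have already captured (for example from the Network tab), and matching any header starting with `x-kpsdk-` is an assumption generalizing from the single `x-kpsdk-ct` header named in the text.

```python
def has_kasada_header(headers):
    """Return True if any header name starts with 'x-kpsdk-' (case-insensitive).

    Matching on the prefix is an assumption; the text only names 'x-kpsdk-ct'.
    """
    return any(name.lower().startswith("x-kpsdk-") for name in headers)


def looks_like_kasada(responses):
    """Heuristic check for the Kasada pattern described in the text.

    responses: list of (status_code, headers) tuples in request order.
    Returns True when the first response is a 429 and a later response
    carries an x-kpsdk-* header.
    """
    if not responses:
        return False
    first_status, _ = responses[0]
    later_responses = responses[1:]
    return first_status == 429 and any(
        has_kasada_header(headers) for _, headers in later_responses
    )
```

The check is only a heuristic: a 429 on its own can also come from ordinary rate limiting, which is why the sketch requires both signals.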

### Recommended approach to Kasada
**BEST CHOICE**: at the moment, the best approach is [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) using Firefox with the right flags. An updated version of the solution techniques with code can be found on [The Web Scraping Club](https://substack.thewebscraping.club "The Web Scraping Club").
**BEST CHOICE**: at the moment, the best approach is [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) using Firefox with the right flags.

### How to bypass Kasada according to The Web Scraping Club
[Posts about Kasada](https://substack.thewebscraping.club/t/kasada)

### Reference and interesting links
[Official web page](https://www.kasada.io/)
[List of articles on The Web Scraping Club](https://substack.thewebscraping.club/t/kasada)

4 changes: 2 additions & 2 deletions Pages/5.Antibot/PerimeterX.md
@@ -14,10 +14,10 @@ During the execution of the scraper it happens, after some pages, that a challen

**BEST CHOICE**: [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) + Firefox

An updated version of the solution techniques with code can be found on [The Web Scraping Club](https://substack.thewebscraping.club "The Web Scraping Club").
### How to bypass PerimeterX according to The Web Scraping Club
[Posts about PerimeterX](https://substack.thewebscraping.club/t/perimeterx)

### Reference and interesting links
[Official web page](https://www.perimeterx.com/products/bot-defender)
[How Perimeterx works](https://www.trickster.dev/post/how-does-perimeterx-bot-defender-work/)
[List of articles on The Web Scraping Club](https://substack.thewebscraping.club/t/perimeterx)

6 changes: 4 additions & 2 deletions Pages/5.Antibot/Shape.md
@@ -9,9 +9,11 @@
[Wappalyzer Chrome Extension](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/0386528f99a1209a538f6d042e859cd9933011c8/Pages/Tools/Wappalyzer.md) doesn't seem to recognize it. We've noticed that certain websites protected by Shape stop working when opened in a browser in incognito mode with the developer tools tab open. After closing the developer tools tab, they work again.

### Recommended approach to Shape Bot Defence
**BEST CHOICE**: at the moment, the best approach is [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) + Firefox with the right options. An updated version of the solution techniques with code can be found on [The Web Scraping Club](https://substack.thewebscraping.club "The Web Scraping Club").
**BEST CHOICE**: at the moment, the best approach is [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) + Firefox with the right options.

### How to bypass Shape according to The Web Scraping Club
[Posts about Shape](https://substack.thewebscraping.club/t/shape)

### Reference and interesting links
[Shape Bot Defence](https://www.f5.com/cloud/products/bot-defense)
[List of articles on The Web Scraping Club](https://substack.thewebscraping.club/t/shape)
