update anti-bot techniques

TheWebScrapingClub · Jun 10, 2023 · b9a8299
1 parent fec5a13
Showing 5 changed files with 16 additions and 9 deletions.
diff --git a/Pages/Antibot/Cloudflare.md b/Pages/Antibot/Cloudflare.md
@@ -1,17 +1,17 @@
 # Cloudflare Bot Management
 
 ## What is Cloudflare Bot Management?
-[Akamai Bot Manager ](https://www.akamai.com/products/bot-manager "Akamai") detect bots using device fingerprinting bot signatures and ip checks.
+[Cloudflare Bot Management](https://www.cloudflare.com/products/bot-management/ "Cloudflare") detects bots using device fingerprinting, bot signatures, and IP checks.
 
 ## Our View on Cloudflare Bot Management
 
 ### How to Identify Cloudflare Bot Management
 Use [Wappalyzer Chrome Extension](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/0386528f99a1209a538f6d042e859cd9933011c8/Pages/Tools/Wappalyzer.md)
 
 ### Recommended approach to Cloudflare Bot Management
-**BEST CHOICE**: Depends from the configuration of the single website, but [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) + [Stealth](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright_stealth.md) are usually enough for scraping.
+**BEST CHOICE**: Each website can be configured with a different degree of protection. The approach that usually works best is [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) combined with a privacy-focused browser such as Brave or an anti-detect browser such as Gologin.
 
-A good solution, still to be tested by our side, is to find the IP address of the web server of the target website and then scrape from there.
+A promising solution, which we have not tested yet, is to find the IP address of the target website's web server and scrape it directly. Up-to-date techniques with code can be found on [The Web Scraping Club](https://substack.thewebscraping.club "The Web Scraping Club").
 
 ### Reference and interesting links
 [Official web page](https://www.cloudflare.com/en-gb/products/bot-management/)
@@ -22,4 +22,6 @@ A good solution, still to be tested by our side, is to find the IP address of th
 
 [Firefox appears to be flagged as suspicious from Cloudflare](https://brianlovin.com/hn/31459258)
 
-[High level description](https://www.zenrows.com/blog/bypass-cloudflare#what-is-cloudflare-bot-management)
+[High level description](https://www.zenrows.com/blog/bypass-cloudflare#what-is-cloudflare-bot-management)
+
+[List of articles on The Web Scraping Club](https://substack.thewebscraping.club/t/cloudflare)
diff --git a/Pages/Antibot/Datadome.md b/Pages/Antibot/Datadome.md
@@ -9,9 +9,9 @@
 Use [Wappalyzer Chrome Extension](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/0386528f99a1209a538f6d042e859cd9933011c8/Pages/Tools/Wappalyzer.md)
 
 ### Recommended approach to Datadome
-**BEST CHOICE**: [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) + [Stealth](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright_stealth.md) are usually enough for scraping.
-
+**BEST CHOICE**: [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) combined with the Brave browser is a good solution. Up-to-date techniques with code can be found on [The Web Scraping Club](https://substack.thewebscraping.club "The Web Scraping Club").
 ### Reference and interesting links
 [Official web page](https://datadome.co/)
 [Tests made with online tools](https://blog.vanila.io/how-strong-is-the-datadome-5e9ff211384e)
+[List of articles on The Web Scraping Club](https://substack.thewebscraping.club/t/datadome)
 
diff --git a/Pages/Antibot/Kasada.md b/Pages/Antibot/Kasada.md
@@ -10,8 +10,9 @@ Unluckily [Wappalyzer Chrome Extension](https://github.com/reanalytics-databouti
 The first request to the website returns a 429 error (visible only in the Network tab of the browser's developer tools), followed by a redirect to the same page, which then loads properly. The response to this second request contains additional headers such as "x-kpsdk-ct".
 
 ### Recommended approach to Kasada
-**BEST CHOICE**: at the moment, the best approach is a [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) + [Stealth](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright_stealth.md) but result can depend from the hardware where the scraper is executed.
+**BEST CHOICE**: at the moment, the best approach is [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) driving Firefox with the right flags. Up-to-date techniques with code can be found on [The Web Scraping Club](https://substack.thewebscraping.club "The Web Scraping Club").
 
 ### Reference and interesting links
 [Official web page](https://www.kasada.io/)
+[List of articles on The Web Scraping Club](https://substack.thewebscraping.club/t/kasada)
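
The Kasada identification note in this file (a 429 on the first request, then "x-kpsdk-ct" style headers on the retry) can be turned into a small check. This is only a sketch: the function name `kasada_signals`, the `x-kpsdk-` prefix match, and the rule that both signals must be present are our assumptions, not a documented Kasada contract. The input is a list of `(status_code, headers)` pairs you have already captured, for example from the browser's Network tab.

```python
from typing import Dict, Iterable, List, Mapping, Tuple

# Assumption: Kasada-related headers share this prefix (the doc only names "x-kpsdk-ct").
KASADA_HEADER_PREFIX = "x-kpsdk-"


def kasada_signals(responses: Iterable[Tuple[int, Mapping[str, str]]]) -> Dict[str, object]:
    """Summarise Kasada hints from captured (status_code, headers) pairs, in request order."""
    captured: List[Tuple[int, Mapping[str, str]]] = list(responses)

    # Signal 1: the very first request was answered with HTTP 429.
    first_is_429 = bool(captured) and captured[0][0] == 429

    # Signal 2: any response carries a header whose name starts with the prefix.
    kpsdk_headers = sorted(
        {
            name.lower()
            for _, headers in captured
            for name in headers
            if name.lower().startswith(KASADA_HEADER_PREFIX)
        }
    )

    return {
        "first_request_429": first_is_429,
        "kpsdk_headers": kpsdk_headers,
        # Heuristic only: require both signals before flagging the site.
        "likely_kasada": first_is_429 and bool(kpsdk_headers),
    }


if __name__ == "__main__":
    sample = [
        (429, {"Content-Type": "text/html"}),
        (200, {"X-KPSDK-CT": "example-token"}),
    ]
    print(kasada_signals(sample))
```

A site that returns neither signal is not necessarily unprotected, so treat a negative result as inconclusive.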
 
diff --git a/Pages/Antibot/PerimeterX.md b/Pages/Antibot/PerimeterX.md
@@ -12,9 +12,12 @@ Use [Wappalyzer Chrome Extension](https://github.com/reanalytics-databoutique/we
 ### Recommended approach to PerimeterX
 While the scraper is running, a challenge like the one in the picture may be triggered after a few pages, blocking execution. Avoiding the captcha requires a full browser, plus some random mouse movements and pauses before moving to another page.
 
-**BEST CHOICE**: [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) + [Stealth](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright_stealth.md)
+**BEST CHOICE**: [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) + Firefox
+
+Up-to-date techniques with code can be found on [The Web Scraping Club](https://substack.thewebscraping.club "The Web Scraping Club").
 
 ### Reference and interesting links
 [Official web page](https://www.perimeterx.com/products/bot-defender)
 [How Perimeterx works](https://www.trickster.dev/post/how-does-perimeterx-bot-defender-work/)
+[List of articles on The Web Scraping Club](https://substack.thewebscraping.club/t/perimeterx)
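
The PerimeterX advice in this file (random mouse movement and timers before moving to another page) could be sketched as below. The helper names `plan_human_like_actions` and `perform_actions`, the viewport size, and the pause range are hypothetical choices for illustration, not values the doc recommends. The `page` argument is assumed to be a Playwright sync-API page, of which only `page.mouse.move` and `page.wait_for_timeout` are used.

```python
import random
from typing import List, Tuple

Action = Tuple[int, int, int]  # (x, y, pause in milliseconds)


def plan_human_like_actions(
    rng: random.Random,
    viewport: Tuple[int, int] = (1280, 720),  # assumed viewport size
    moves: int = 5,                           # assumed number of mouse moves
    pause_ms: Tuple[int, int] = (800, 3000),  # assumed pause range
) -> List[Action]:
    """Pick random in-viewport mouse targets, each followed by a random pause."""
    width, height = viewport
    actions: List[Action] = []
    for _ in range(moves):
        x = rng.randint(0, width - 1)
        y = rng.randint(0, height - 1)
        pause = rng.randint(pause_ms[0], pause_ms[1])
        actions.append((x, y, pause))
    return actions


def perform_actions(page, actions: List[Action]) -> None:
    """Replay the plan on a page object before navigating to the next URL."""
    for x, y, pause in actions:
        # steps > 1 makes the pointer travel in several intermediate moves.
        page.mouse.move(x, y, steps=10)
        page.wait_for_timeout(pause)


if __name__ == "__main__":
    for action in plan_human_like_actions(random.Random(), moves=3):
        print(action)
```

Separating the plan from its execution keeps the random part easy to inspect and log. Whether this is enough to avoid the challenge depends on the target site's configuration.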
 
diff --git a/Pages/Antibot/Shape.md b/Pages/Antibot/Shape.md
@@ -9,8 +9,9 @@
 [Wappalyzer Chrome Extension](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/0386528f99a1209a538f6d042e859cd9933011c8/Pages/Tools/Wappalyzer.md) doesn't seem to recognize it. We've noticed that certain websites protected by Shape stop working when opened in a browser in incognito mode with the developer tools open. Once the developer tools are closed, they work again.
 
 ### Recommended approach to Shape Bot Defence
-**BEST CHOICE**: at the moment, the best approach is a [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) + [Stealth](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright_stealth.md) but cannot be enough. The scraper should mimic a plausible user interaction with the website, we'll share an example soon.
+**BEST CHOICE**: at the moment, the best approach is [Playwright](https://github.com/reanalytics-databoutique/webscraping-open-doc/blob/main/Pages/Tools/Playwright.md) + Firefox with the right options. Up-to-date techniques with code can be found on [The Web Scraping Club](https://substack.thewebscraping.club "The Web Scraping Club").
 
 ### Reference and interesting links
 [Shape Bot Defence](https://www.f5.com/cloud/products/bot-defense)
+[List of articles on The Web Scraping Club](https://substack.thewebscraping.club/t/shape)