maltz's comments

maltz · on Aug 12, 2021

You should probably take a look at this.

https://www.zenrows.com/blog/stealth-web-scraping-in-python-...

maltz · on Aug 12, 2021

YouTube has "var ytInitialData" and "var ytInitialPlayerResponse" hardcoded as JSON in the HTML. No need to run JS!
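A rough sketch of pulling such a variable out of raw HTML with only the standard library. The sample HTML and the `extract_js_var` helper are made up for illustration, and it assumes the value after `=` is plain JSON, which may not hold for every page or forever.

```python
import json
import re


def extract_js_var(html: str, name: str):
    """Return the JSON value assigned to `var <name> = ...` in html, or None."""
    match = re.search(r"var\s+" + re.escape(name) + r"\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores whatever follows it (the trailing ";" and the rest of the page).
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


# Hypothetical stand-in for a downloaded page; real pages are far larger.
SAMPLE_HTML = """
<html><body>
<script>var ytInitialData = {"contents": {"title": "demo; video", "views": 42}};</script>
</body></html>
"""

data = extract_js_var(SAMPLE_HTML, "ytInitialData")
print(data["contents"]["title"])
```

Using the JSON decoder instead of a regex for the object body avoids breaking on semicolons or braces inside string values.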

maltz · on Aug 11, 2021

If pages are constructed client-side, the content you are looking for is either hardcoded as JSON in the HTML or loaded via XHR request. Scrape that.
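For the "hardcoded as JSON" case, here is a minimal sketch that collects every JSON `<script>` block from a page using only the standard library. The `JsonScriptCollector` class and the sample HTML are hypothetical; which `type` values a given site uses is an assumption you would check in the page source. For the XHR case, the usual approach is to find the request in the browser's network tab and call that endpoint directly.

```python
import json
from html.parser import HTMLParser

JSON_TYPES = ("application/json", "application/ld+json")


class JsonScriptCollector(HTMLParser):
    """Collect the parsed contents of <script type="application/json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._buffer = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") in JSON_TYPES:
            self._in_json_script = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self._in_json_script = False
            try:
                self.payloads.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


# Hypothetical client-rendered page: empty app root plus embedded state.
SAMPLE_HTML = (
    '<div id="app"></div>'
    '<script id="state" type="application/json">'
    '{"items": [{"id": 1, "name": "first"}, {"id": 2, "name": "second"}]}'
    "</script>"
    "<script>console.log('not json');</script>"
)

collector = JsonScriptCollector()
collector.feed(SAMPLE_HTML)
collector.close()
print([item["name"] for item in collector.payloads[0]["items"]])
```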

maltz · on Aug 11, 2021

Playwright. It can be easily used with JS, Python, Go, Java, etc.

cl42 · on Aug 11, 2021

Thanks! Is that like using Selenium? (i.e., you have to manage and code the actions yourself)

anderRV · on Aug 11, 2021

Yes, quite similar. According to their definition, it is a "library to automate Chromium, Firefox and WebKit with a single API."
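To give a feel for "coding the actions yourself", here is a small sketch using Playwright's synchronous Python API. It assumes Playwright is installed (`pip install playwright`, then `playwright install chromium`); the `fetch_rendered_html` helper, its parameters, and the selector you pass in are illustrative choices, not anything prescribed by the library.

```python
def fetch_rendered_html(url: str, ready_selector: str, timeout_ms: int = 10_000) -> str:
    """Load `url` in headless Chromium and return the HTML once `ready_selector` appears."""
    # Imported lazily so this module can be loaded where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            # Block until the element we care about exists in the DOM.
            page.wait_for_selector(ready_selector, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

Something like `fetch_rendered_html("https://example.com", "h1")` would return the page source after the `h1` element shows up. As with Selenium, deciding what to wait for is still up to you.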

cl42 · on Aug 11, 2021

Thanks! If there are any third-party managed tools to do this, that would be awesome to know about (i.e., where they somehow run common JS functions/site interactions to test for additional content).

ethbr0 · on Aug 11, 2021

Unfortunately, it's a pathological edge case.

Imagine an async-loaded list that keeps loading more content as it comes in, until it displays everything available to the backend.

When would you know such a list is finished loading?

This sounds insane, but it's a pattern that's easy and common for an ambitious UXer to key in on, and I've seen it in production pages.

(In the event you are a UXer, please include some sort of status update! Even an overlaid spinner that disappears solves the problem.)
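Without a status indicator, about the best a scraper can do is a heuristic: poll the item count and stop once it has not changed for a while. A sketch of that idea follows; `wait_until_stable` and its parameters are hypothetical, and `get_count` stands for whatever returns the current number of list items (for example, a browser automation call). It does not solve the problem described above, since a slow backend can pause for longer than the quiet window.

```python
import time
from typing import Callable


def wait_until_stable(
    get_count: Callable[[], int],
    quiet_polls: int = 3,
    interval: float = 0.5,
    timeout: float = 30.0,
) -> int:
    """Poll get_count() until it returns the same value quiet_polls times in a row."""
    deadline = time.monotonic() + timeout
    last = get_count()
    unchanged = 0
    while unchanged < quiet_polls:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"count still changing after {timeout}s (last={last})")
        time.sleep(interval)
        current = get_count()
        if current == last:
            unchanged += 1
        else:
            # New items arrived: restart the quiet window.
            last = current
            unchanged = 0
    return last


# Fake data source standing in for a page: the list grows 1 -> 3 -> 5, then stops.
_batches = iter([1, 3, 5])
_state = {"last": 0}


def fake_count() -> int:
    _state["last"] = next(_batches, _state["last"])
    return _state["last"]


print(wait_until_stable(fake_count, interval=0.0))  # prints 5
```

The trade-off is between `quiet_polls * interval` (how long you are willing to wait for silence) and the risk of cutting off a list that was merely slow.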

maltz · on Aug 11, 2021

It's part of a series of blog posts that talks explicitly about crawling. There are indeed other links that do a better job of explaining advanced extraction techniques.

Extraction => https://www.zenrows.com/blog/mastering-web-scraping-in-pytho...

Avoid blocking => https://www.zenrows.com/blog/stealth-web-scraping-in-python-...

adamqureshi · on Aug 11, 2021

OK, but do you offer custom scraping services if I need to hire someone to build it?

maltz · on Aug 11, 2021

Yep, seems so. https://www.zenrows.com/pricing

adamqureshi · on Aug 12, 2021

thank you