How to scrape a dynamic website? #71
It seems no one has answered this yet, and I don't know whether the developers have seen it, but let me help you here. From the scraper file they created, they are using static scraping libraries like requests and BeautifulSoup. A dynamic website needs a browser engine to execute the JavaScript parts of the page. Python has libraries such as Selenium and Playwright that use a browser engine to render the JavaScript on dynamic sites and extract the resulting HTML, but it seems autoscraper doesn't use them. Maybe it will, or maybe not. As of November 23rd, 2022, I don't see any dynamic web scraping library used in the core file of this program. P.S.: Correct me if I'm wrong.
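To illustrate the point above, here is a minimal sketch of the render-then-parse approach using Playwright and BeautifulSoup. This is not autoscraper's code; the function names and the CSS selector are my own illustrative choices, and Playwright must be installed separately (`pip install playwright` followed by `playwright install chromium`).

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url: str) -> str:
    """Load `url` in headless Chromium and return the HTML *after* the
    page's JavaScript has run, unlike a plain requests.get()."""
    # Optional dependency, imported lazily: pip install playwright
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Wait until network activity settles so dynamic content is in the DOM.
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
        return html


def extract_titles(html: str) -> list[str]:
    """Parse the rendered HTML exactly as you would a static page."""
    soup = BeautifulSoup(html, "html.parser")
    # "h2" is just an example selector; adapt it to the target site.
    return [h.get_text(strip=True) for h in soup.select("h2")]


if __name__ == "__main__":
    # Placeholder URL for demonstration only.
    print(extract_titles(fetch_rendered_html("https://example.com")))
```

Once `fetch_rendered_html` has produced the post-JavaScript HTML, it could in principle be fed into any static parser, including autoscraper.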
You can supply a
I am trying to export a localhost website that is generated with this project:
https://github.com/HBehrens/puncover
The project serves a website on localhost; each time the user clicks a link, the project receives a GET request and generates the HTML. This means the HTML is generated on the fly whenever the user accesses a link through their browser. At the moment the project does not export the website to HTML or PDF. For this reason I want to know how I could recursively follow all the hyperlinks and then generate a static HTML version. Would this be possible with autoscraper?
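Since the pages here are server-rendered plain HTML (no JavaScript needed), a simple breadth-first crawl with requests and BeautifulSoup may already be enough, without autoscraper. The sketch below is a hedged illustration: the base URL, the output directory name, and the helper names are my assumptions, not anything from puncover itself.

```python
from collections import deque
from pathlib import Path
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def filename_for(url: str) -> str:
    """Derive a flat, filesystem-safe filename from a URL path."""
    name = urlparse(url).path.strip("/").replace("/", "_") or "index"
    return f"{name}.html"


def crawl(base_url: str, out_dir: str = "site_export") -> list[str]:
    """Fetch base_url, follow same-host links breadth-first, and save
    each HTML page to out_dir. Returns the list of files written."""
    host = urlparse(base_url).netloc
    seen: set[str] = set()
    queue = deque([base_url])
    saved: list[str] = []
    Path(out_dir).mkdir(exist_ok=True)

    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)

        resp = requests.get(url)
        if "text/html" not in resp.headers.get("Content-Type", ""):
            continue  # skip non-HTML resources (images, CSS, ...)

        path = Path(out_dir) / filename_for(url)
        path.write_text(resp.text, encoding="utf-8")
        saved.append(str(path))

        # Queue every same-host hyperlink found on this page.
        for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            if urlparse(link).netloc == host:
                queue.append(link)
    return saved


if __name__ == "__main__":
    # Assumed local address; use whatever host/port puncover actually binds to.
    crawl("http://localhost:5000/")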