proposal: x/net/html: ParseOption to set maxBuf #68101
Comments
Related: #63177 to set the entire Tokenizer
Proposal Details
Abstract
This proposal suggests introducing an option to set the `MaxBuf` parameter in the `html.Parse` function to control memory usage when parsing large HTML documents.
Background
Currently, `html.Parse` in `golang.org/x/net/html` calls `ParseWithOptions` internally, leading to a chain of function calls: `html.Parse -> ParseWithOptions -> p.parse() -> p.tokenizer.Next() -> readByte()`. Within `readByte()`, there is a logic block that checks the amount of buffered input against `maxBuf`. This logic is activated only if `maxBuf` is set. However, there is no way to set `MaxBuf` when using `html.Parse` or `ParseWithOptions`.
Problem
When parsing very large HTML documents, such as this page, memory usage can increase significantly due to the inability to set `MaxBuf`.
Solution
To address this, I propose introducing a function similar to `ParseOptionEnableScripting` to allow users to set `MaxBuf`.
Implementation
A sample implementation using reflection is provided below. This implementation, though functional, uses unsafe methods and reflection, which are not ideal for production code:
This implementation can be used as follows:
To properly address the issue, I propose the following function to be added to `golang.org/x/net/html`:
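As an illustration of the shape such an option could take, the sketch below models the functional-option pattern that `ParseOptionEnableScripting` follows. It is not the proposal's own code: `ParseOptionSetMaxBuf` and `newParser` are hypothetical names, and `tokenizer` and `parser` are simplified stand-ins for the package's unexported types so that the example compiles on its own.

```go
package main

import "fmt"

// tokenizer is a simplified stand-in (assumption) for the package tokenizer;
// only the buffer limit is modeled.
type tokenizer struct {
	maxBuf int // 0 means unlimited
}

// SetMaxBuf records the buffer limit on the stand-in tokenizer.
func (z *tokenizer) SetMaxBuf(n int) { z.maxBuf = n }

// parser is a simplified stand-in (assumption) for the unexported parser type.
type parser struct {
	tokenizer *tokenizer
	scripting bool
}

// ParseOption configures a parser before parsing starts.
type ParseOption func(p *parser)

// ParseOptionEnableScripting mirrors the shape of the existing option.
func ParseOptionEnableScripting(enable bool) ParseOption {
	return func(p *parser) { p.scripting = enable }
}

// ParseOptionSetMaxBuf is a hypothetical option name: it forwards the limit
// to the parser's tokenizer.
func ParseOptionSetMaxBuf(n int) ParseOption {
	return func(p *parser) { p.tokenizer.SetMaxBuf(n) }
}

// newParser is a hypothetical helper that applies options in order, the way
// an options-accepting parse function would before it starts parsing.
func newParser(opts ...ParseOption) *parser {
	p := &parser{tokenizer: &tokenizer{}, scripting: true}
	for _, opt := range opts {
		opt(p)
	}
	return p
}

func main() {
	p := newParser(ParseOptionEnableScripting(false), ParseOptionSetMaxBuf(1<<20))
	fmt.Println(p.scripting, p.tokenizer.maxBuf)
}
```

Because the option is just another `ParseOption` value, existing callers that pass no options would keep the current unlimited behavior in this model.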
Testing has shown that setting `maxBuf` to at least 1.04 times the body length ensures normal operation.
Feasibility
Adding a function similar to `ParseOptionEnableScripting` to allow users to set `MaxBuf` would provide a safe and efficient way to control memory usage when parsing large HTML documents, avoiding the use of unsafe methods and reflection.
Environment