You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I've come across an interesting bug that results due to a partial circular dependency between the Entities class and the Document.OutputSettings class. I did not see any existing issues that appeared to match this bug.
Here's the situation. I (as a consumer) want to use the Entities.escape(String) method that takes a single string parameter[1]. This method defaults the OutputSettings to a "DefaultOutput" variable within Entities that is initialized on class load[2]. Because this is a static variable, it is initialized when the Entities class is first invoked. This normally works fine, unless the Document.OutputSettings class is initialized before any Entities references (there could be any number of reasons why we might do this such as creating a static OutputSettings object with our preferred settings in a constant file).
If a new Document.OutputSettings is initialized, it will default the EscapeMode to base[3]. However, because the EscapeMode enumeration is in the Entities class, the JVM classloader now needs to initialize the Entities class including its static creation of a DefaultOutput Document.OutputSettings object. So at this point, the classloader needs to instantiate a second OutputSettings object (the first one is the one I'm instantiating in my consumer code, and this second one is for the DefaultOutput in Entities), but that means it needs to read that base EscapeMode enumeration again, but the code is already in the process of loading the Entities class from the first time through. When the classloader sees this, it sets the EscapeMode on the second instantiation (the DefaultOutput object) to null.
At this point, because that object is static, if I were to call Entities.escape("\u0013") (in other words, a control character), then it will fall down this path[4] where escapeMode is expected to be non-null. The NullPointerException is then thrown.
There are ways for me as a consumer to get around this; for example, I can specify an OutputSettings directly (using the overloaded escape), or always make sure that Entities is initialized prior to any OutputSettings references, but I don't think that should be my responsibility as a consumer, it essentially means I can't use the DefaultOutput version of Entities.escape() unless I keep a close eye on the order of invocation of third-party code, meaning that I have to be aware of the implementation details of the classes.
I have created an example repl.it to reproduce[5]. It is a simple program that only pulls in JSoup (to demonstrate that there's nothing from my consumer code causing this). There is an if condition that when no arguments are sent to the class, demonstrates the issue by initializing an OutputSettings variable and escaping a control character:
OutputSettings test = new OutputSettings();
Entities.escape("\u0013");
This throws the NPE. From the shell, the class can be run again with any argument (e.g. "java -classpath .:target/dependency/* Main 1"), which will cause it to go down the working path, which calls the same code but initializes Entities first by calling Entities.escape before:
Entities.escape("\u0013");
OutputSettings test = new OutputSettings();
Entities.escape("\u0013");
No NPE, and "Hello World" is written out to the console. I don't believe I'm using any of the classes inappropriately, which is why I believe this to be a legitimate bug. Let me know if there is any additional information I can provide or if there are any issues with the repl.it. Thank you.
This is a cleaner decoupling of OutputSettings and Entities than the previous impl which required a lazy initialisation of OutputSettings.
Also simplified how we get a fallback encoder.
Related to #1910, #2042
I've come across an interesting bug that results due to a partial circular dependency between the Entities class and the Document.OutputSettings class. I did not see any existing issues that appeared to match this bug.
Here's the situation. I (as a consumer) want to use the Entities.escape(String) method that takes a single string parameter[1]. This method defaults the OutputSettings to a "DefaultOutput" variable within Entities that is initialized on class load[2]. Because this is a static variable, it is initialized when the Entities class is first invoked. This normally works fine, unless the Document.OutputSettings class is initialized before any Entities references (there could be any number of reasons why we might do this such as creating a static OutputSettings object with our preferred settings in a constant file).
If a new Document.OutputSettings is initialized, it will default the EscapeMode to base[3]. However, because the EscapeMode enumeration is in the Entities class, the JVM classloader now needs to initialize the Entities class including its static creation of a DefaultOutput Document.OutputSettings object. So at this point, the classloader needs to instantiate a second OutputSettings object (the first one is the one I'm instantiating in my consumer code, and this second one is for the DefaultOutput in Entities), but that means it needs to read that base EscapeMode enumeration again, but the code is already in the process of loading the Entities class from the first time through. When the classloader sees this, it sets the EscapeMode on the second instantiation (the DefaultOutput object) to null.
At this point, because that object is static, if I were to call Entities.escape("\u0013") (in other words, a control character), then it will fall down this path[4] where escapeMode is expected to be non-null. The NullPointerException is then thrown.
There are ways for me as a consumer to get around this; for example, I can specify an OutputSettings directly (using the overloaded escape), or always make sure that Entities is initialized prior to any OutputSettings references, but I don't think that should be my responsibility as a consumer, it essentially means I can't use the DefaultOutput version of Entities.escape() unless I keep a close eye on the order of invocation of third-party code, meaning that I have to be aware of the implementation details of the classes.
I have created an example repl.it to reproduce[5]. It is a simple program that only pulls in JSoup (to demonstrate that there's nothing from my consumer code causing this). There is an if condition that when no arguments are sent to the class, demonstrates the issue by initializing an OutputSettings variable and escaping a control character:
OutputSettings test = new OutputSettings();
Entities.escape("\u0013");
This throws the NPE. From the shell, the class can be run again with any argument (e.g. "java -classpath .:target/dependency/* Main 1"), which will cause it to go down the working path, which calls the same code but initializes Entities first by calling Entities.escape before:
Entities.escape("\u0013");
OutputSettings test = new OutputSettings();
Entities.escape("\u0013");
No NPE, and "Hello World" is written out to the console. I don't believe I'm using any of the classes inappropriately, which is why I believe this to be a legitimate bug. Let me know if there is any additional information I can provide or if there are any issues with the repl.it. Thank you.
[1] - https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/nodes/Entities.java#L159
[2] - https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/nodes/Entities.java#L29
[3] - https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/nodes/Document.java#L448
[4] - https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/nodes/Entities.java#L240
[5] - https://replit.com/@trevorrials/JsoupEscapeModeBug#Main.java
The text was updated successfully, but these errors were encountered: