Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Entities.escape(String) can cause NullPointerException based on class initialization #1910

Closed
cernertrevor opened this issue Mar 1, 2023 · 3 comments
Assignees
Labels
bug Confirmed bug that we should fix fixed
Milestone

Comments

@cernertrevor
Copy link

cernertrevor commented Mar 1, 2023

I've come across an interesting bug that results due to a partial circular dependency between the Entities class and the Document.OutputSettings class. I did not see any existing issues that appeared to match this bug.

Here's the situation. I (as a consumer) want to use the Entities.escape(String) method that takes a single string parameter[1]. This method defaults the OutputSettings to a "DefaultOutput" variable within Entities that is initialized on class load[2]. Because this is a static variable, it is initialized when the Entities class is first invoked. This normally works fine, unless the Document.OutputSettings class is initialized before any Entities references (there could be any number of reasons why we might do this such as creating a static OutputSettings object with our preferred settings in a constant file).

If a new Document.OutputSettings is initialized, it will default the EscapeMode to base[3]. However, because the EscapeMode enumeration is in the Entities class, the JVM classloader now needs to initialize the Entities class including its static creation of a DefaultOutput Document.OutputSettings object. So at this point, the classloader needs to instantiate a second OutputSettings object (the first one is the one I'm instantiating in my consumer code, and this second one is for the DefaultOutput in Entities), but that means it needs to read that base EscapeMode enumeration again, but the code is already in the process of loading the Entities class from the first time through. When the classloader sees this, it sets the EscapeMode on the second instantiation (the DefaultOutput object) to null.

At this point, because that object is static, if I were to call Entities.escape("\u0013") (in other words, a control character), then it will fall down this path[4] where escapeMode is expected to be non-null. The NullPointerException is then thrown.

There are ways for me as a consumer to get around this; for example, I can specify an OutputSettings directly (using the overloaded escape), or always make sure that Entities is initialized prior to any OutputSettings references, but I don't think that should be my responsibility as a consumer, it essentially means I can't use the DefaultOutput version of Entities.escape() unless I keep a close eye on the order of invocation of third-party code, meaning that I have to be aware of the implementation details of the classes.

I have created an example repl.it to reproduce[5]. It is a simple program that only pulls in JSoup (to demonstrate that there's nothing from my consumer code causing this). There is an if condition that when no arguments are sent to the class, demonstrates the issue by initializing an OutputSettings variable and escaping a control character:

OutputSettings test = new OutputSettings();
Entities.escape("\u0013");

This throws the NPE. From the shell, the class can be run again with any argument (e.g. "java -classpath .:target/dependency/* Main 1"), which will cause it to go down the working path, which calls the same code but initializes Entities first by calling Entities.escape before:

Entities.escape("\u0013");
OutputSettings test = new OutputSettings();
Entities.escape("\u0013");

No NPE, and "Hello World" is written out to the console. I don't believe I'm using any of the classes inappropriately, which is why I believe this to be a legitimate bug. Let me know if there is any additional information I can provide or if there are any issues with the repl.it. Thank you.

[1] - https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/nodes/Entities.java#L159
[2] - https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/nodes/Entities.java#L29
[3] - https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/nodes/Document.java#L448
[4] - https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/nodes/Entities.java#L240
[5] - https://replit.com/@trevorrials/JsoupEscapeModeBug#Main.java

@jhy jhy closed this as completed in b6f652c Mar 2, 2023
@jhy jhy self-assigned this Mar 2, 2023
@jhy jhy added bug Confirmed bug that we should fix fixed labels Mar 2, 2023
@jhy jhy added this to the 1.16.1 milestone Mar 2, 2023
@jhy
Copy link
Owner

jhy commented Mar 2, 2023

Interesting find! Thanks for reporting. I've fixed this now.

@cernertrevor
Copy link
Author

Excellent. Thank you!

jhy added a commit that referenced this issue Aug 12, 2024
This is a cleaner decoupling of OutputSettings and Entities than the previous impl which required a lazy initialisation of OutputSettings.

Also simplified how we get a fallback encoder.

Related to #1910, #2042
@jhy
Copy link
Owner

jhy commented Aug 12, 2024

With b731fd7 I have further decoupled OutputSettings and Entities.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed bug that we should fix fixed
Projects
None yet
Development

No branches or pull requests

2 participants