Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a plugin that adds a handler to crash the JVM #109930

Merged
merged 5 commits into from
Jun 21, 2024

Conversation

thecoop
Copy link
Member

@thecoop thecoop commented Jun 19, 2024

A rest handler that purposefully causes a hard JVM crash for testing purposes

@thecoop thecoop added >test Issues or PRs that are addressing/adding tests :Core/Infra/Core Core issues without another label labels Jun 19, 2024
@thecoop thecoop requested a review from a team June 19, 2024 14:09
@thecoop thecoop requested a review from a team as a code owner June 19, 2024 14:09
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Jun 19, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

Copy link
Contributor

@ldematte ldematte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question: do you know if the JVM is treating its internal code differently from the "user" code (i.e. the bytecode that gets interpreted/JITed?)
This plugin may be useful anyway to test how we respond to a JVM crash, but the usefulness can be limited if it ends up that native uncaught exceptions are treated differently based on where/when they originated

@thecoop
Copy link
Member Author

thecoop commented Jun 19, 2024

Other bugs may cause Errors being thrown in userland, but that's the same as any other uncaught exception to Elasticsearch.

@thecoop thecoop requested a review from ldematte June 20, 2024 11:26
Copy link
Contributor

@ldematte ldematte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

private static void redirectStdout() {
if (originalOut == null) {
originalOut = System.out;
stdOutput = new ByteArrayOutputStream();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we read the output from disk? The test nodes have an es.out file that should contain the crash info?

Copy link
Member Author

@thecoop thecoop Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JVM crash is only output to standard out (JVM knows nothing about log4j), the test-cluster.log file doesn't have it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

es.out is not log4j, it's the stdout of the ES process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oof, I am mixing up testclusters and the new test framework. For things like gradlew run we have es.out, but it looks like the new cluster test framework starts the ES processes with inherited IO.

@mark-vieira is there a strong reason why the stdout/stderr is not captured via a file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We redirect stdout from the ES proccess to stdout in the test process so the crash should be captured in the test logs but perhaps not in a way that can be asserted on. If we need a general way to redirect logs to some other consumer we can certainly add an API for this use case. I think we've used the getNodeLog() API for these use cases in the past but that obviously doesn't work for JVM level stuff that doesn't go through log4j.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also it's very rare to have something that goes to stdout/stderr that does not go through log4j

@thecoop
Copy link
Member Author

thecoop commented Jun 20, 2024

@elasticmachine update branch

@thecoop thecoop merged commit 390f915 into elastic:main Jun 21, 2024
15 checks passed
@thecoop thecoop deleted the jvm-crash-handler branch June 21, 2024 07:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team >test Issues or PRs that are addressing/adding tests v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants