Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NPE in Netty 4 instrumentation #8939

Closed
stevesea opened this issue Jul 12, 2023 · 7 comments · Fixed by #9166
Closed

NPE in Netty 4 instrumentation #8939

stevesea opened this issue Jul 12, 2023 · 7 comments · Fixed by #9166
Labels
bug Something isn't working repro provided

Comments

@stevesea
Copy link

stevesea commented Jul 12, 2023

Describe the bug

NPE during Netty instrumentation of a Spring Boot 3 app (spring boot parent 3.1.1, spring-cloud 2022.0.3).

14:20:11.1700 |  WARN | i.n.u.c.DefaultPromise | An exception was thrown by io.opentelemetry.javaagent.shaded.instrumentation.netty.v4.common.internal.client.NettySslInstrumentationHandler$$Lambda$1803/0x0000000801d98a48.operationComplete()
java.lang.NullPointerException: Cannot invoke "io.opentelemetry.javaagent.shaded.io.opentelemetry.context.Context.with(io.opentelemetry.javaagent.shaded.io.opentelemetry.context.ImplicitContextKeyed)" because "parentContext" is null
	at io.opentelemetry.javaagent.shaded.instrumentation.netty.v4.common.internal.client.NettySslErrorOnlyInstrumenter.start(NettySslErrorOnlyInstrumenter.java:30)
	at io.opentelemetry.javaagent.shaded.instrumentation.netty.v4.common.internal.client.NettySslInstrumentationHandler.lambda$connect$0(NettySslInstrumentationHandler.java:102)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
	at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625)
	at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105)
	at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.fulfillConnectPromise(AbstractEpollChannel.java:653)
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:691)
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567)
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:489)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)

Steps to reproduce
Still working on reproduction steps with our internal team.

What version are you using?
v.1.27.0

Environment
Compiler: temurin 17 jdk
OS: temurin 17 jre

@stevesea stevesea added the bug Something isn't working label Jul 12, 2023
@laurit
Copy link
Contributor

laurit commented Jul 13, 2023

@stevesea do you always get this NPE or only occasionally? Can you reproduce this at will?

@stevesea
Copy link
Author

stevesea commented Jul 13, 2023

@laurit yes, we can reproduce it at will. the warning message appears three times during application startup, during initialization of tomcat

Still working with our internal team to get good reproduction steps and more details about what might make this particular application unique. The org has already added otel to a handful of different Spring Boot 3 apps, including one using Spring Cloud Gateway which should have exercised lots of those Netty/webflux code paths.

@stevesea
Copy link
Author

The application in question is a Pulsar consumer and producer. The Netty libraries are brought in via transitive dependency on org.apache.pulsar:pulsar-client-original.

This might be the first of our apps to migrate to otel that use Spring Boot + pulsar. Others have been Spring Boot + tomcat/jersey REST APIs. Or, Pulsar consumers/producers that are built on http4k/ktor rather than Spring Boot.

@stevesea
Copy link
Author

stevesea commented Jul 14, 2023

Further digging continues to point the finger at this being a symptom of Pulsar + Netty instrumentation. And, the NPE only occurs when the app connects to a 'pulsar+ssl:https://' url (that is, my typical test harness with a local non-TLS pulsar running via docker-compose doesn't have the NPE).

attached is lightly-redacted application output with otel.javaagent.debug=true (this run was using the latest agent v1.28.0)
run.log

@trask
Copy link
Member

trask commented Jul 26, 2023

Still working with our internal team to get good reproduction steps

let us know how this is going, probably we will need a repro to be able to understand what's going on here

@stevesea
Copy link
Author

I think the key was a pulsar client to a SSL endpoint. I'll work on creating a standalone repository that'll reproduce the problem

@stevesea
Copy link
Author

stevesea commented Aug 7, 2023

Here's a repository that can reproduce the issue https://github.com/stevesea/2023-07-otel-pulsar-ssl-npe

It's not very exciting/functional, but it'll reproduce the NPE in the Netty instrumentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working repro provided
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants