-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Fix opencensus protobuf bug #31632
[Core] Fix opencensus protobuf bug #31632
Conversation
Signed-off-by: SangBin Cho <[email protected]>
Signed-off-by: SangBin Cho <[email protected]>
Signed-off-by: SangBin Cho <[email protected]>
Signed-off-by: SangBin Cho <[email protected]>
Signed-off-by: SangBin Cho <[email protected]>
Signed-off-by: SangBin Cho <[email protected]>
Signed-off-by: SangBin Cho <[email protected]>
… into fix-opencensus-protobuf-bug
Signed-off-by: SangBin Cho <[email protected]>
@@ -21,7 +21,6 @@ import "src/ray/protobuf/common.proto"; | |||
import "src/ray/protobuf/dependency.proto"; | |||
import "src/ray/protobuf/gcs.proto"; | |||
import "src/ray/protobuf/logging.proto"; | |||
import "src/ray/protobuf/reporter.proto"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the critical fix
@scv119 I am 80% sure this will pass all builds, but let's see... |
Looks like some tests are still failing?
This is really really old....Looks like it's not maintained at all. Do we have any plans to migrate away from it if so? |
@rkooo567 is the long term solution to put related protobuf into a different namespace so it will never conflict? |
Signed-off-by: SangBin Cho <[email protected]>
Signed-off-by: SangBin Cho <[email protected]>
Signed-off-by: SangBin Cho <[email protected]>
Signed-off-by: SangBin Cho <[email protected]>
Signed-off-by: SangBin Cho <[email protected]>
@scv119 That's solution 4. I think the current solution is good enough in the long term (since we have a regression test now). If we ever have to import reporter.proto to core worker, solution 2 or 4 both will work. |
If anyone reorg proto or do something that breaks it, the test will fail, so we can detect them. If this ever should happen, we can choose option 4 (I will add TODO). The only edge case possible is an end user imports reporter_pb2 (private) to the driver and uses opencensus proto at the same time. Tbh, I don’t think there will be anyone who will do this... Note that this issue has existed for 3 years, and this was the first issue ever raised. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: SangBin Cho <[email protected]>
Signed-off-by: SangBin Cho <[email protected]> Here's how the conflict happens. We pull the opencensus-proto 0.3.0 for our bazel build. opencensus-proto 0.3.0 generated python code (metric_pb2 and resource_pb2), which is used by reporter.proto. We copy all python protobuf to python/ray/core/generated. Now reporter_pb2 is available from python/ray/core/generated. reporter_pb2 is not supposed to be imported from core workers (it is only for dashboard agent). However, this was imported (accidentally) from gcs_pubsub.py because pubsub.proto requires this module (which was unnecessary). As a result, reporter_pb2 was imported to the core worker, which subsequently imports metric_pb2 and resource_pb2. When we import the opencensus proto using open telemetry, we import resource_pb2 again, which causes the conflict. Theoretically, when the same package is imported (which is our case), protobuf is supposed to handle it. (I guess) This wasn't the case because our cpp version uses opencensus-proto 0.3.0, and the python only supports opencensus-proto 0.1.0 (omg https://pypi.org/project/opencensus-proto/#history). There are 4 potential solutions. [current PR] Make sure to not import reporter_pb2 from the core worker. It is the easiest fix and have the best compatibility story. Since we don't import reporter_pb2 (which is only for dashboard agent), there's no concern of conflict. Since we have unit tests now, we can avoid the future regressions. Use downloaded opencensus-proto for all python generated code (instead of cpp one we built). This requires us to introduce opencensus-proto 0.1.0 as a dependency, so I avoided this approach. Downgrade cpp opencensus-proto version to 0.1.0. This is not a very clean solution. What if opencensus releases 0.2.0? Copy & paste opencensus-proto and build our own proto with different names. This can potentially have compatibility issue with opencensus library, but it is unlikely (cuz the project doesn't seem to be maintained). This is the fallback option if the current approach doesn't work.
Why are these changes needed?
Here's how the conflict happens.
reporter.proto
.reporter_pb2 is not supposed to be imported from core workers (it is only for dashboard agent). However, this was imported (accidentally) from gcs_pubsub.py because pubsub.proto requires this module (which was unnecessary).
As a result, reporter_pb2 was imported to the core worker, which subsequently imports metric_pb2 and resource_pb2. When we import the opencensus proto using open telemetry, we import resource_pb2 again, which causes the conflict.
Theoretically, when the same package is imported (which is our case), protobuf is supposed to handle it. (I guess) This wasn't the case because our cpp version uses opencensus-proto 0.3.0, and the python only supports opencensus-proto 0.1.0 (omg https://pypi.org/project/opencensus-proto/#history).
There are 4 potential solutions.
Related issue number
Closes #31358
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.