-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-1789] [core] [runtime] [java-api] Allow adding of URLs to the usercode class loader #593
Conversation
I didn't spot anything. But, the travis build didn't pass ;) |
cbbb554
to
90cf0db
Compare
It builds now ;) |
👏 |
This does not work if the user uses classes that are not available on the local machine since you don't add the additional class path entries in JobWithJars.buildUserCodeClassLoader(). Correct? |
@aljoscha I'm sorry, I cannot follow? Can you elaborate? The idea is to add support for proper folders next to jars when opening an execution environment. Since folders cannot be handled the same way as jars during execution (i.e., serialize and ship them around by the blob manager), the assumption for folder paths is that they are accessible from all agents in the distributed runtime (JobManager + TaskManagers), e.g., via shared NFS folders. |
Yes, this is true, but the way it is implemented, the folders are not always added to the class loader. Maybe I'm wrong here, but JobWithJars.getUserCodeClassLoader and JobWithJars.buildUserCodeClassLoader don't add the URLs to the ClassLoader that they create. |
@aljoscha Yes, this is what I also mentioned in my PR description. If we are using a RemoteEnvironment, it is unlikely that the classpath that is available on the cluster is also valid on the local machine. The user has to add the libraries to his local project manually. |
OK, @StephanEwen, any thoughts on this? Should we allow that the local user code class loader in the client potentially doesn't have the same jars available as the workers? |
I think that keeping the classpaths in sync is really crucial. What is the problem in exposing this to the client as well? Can we make the following assumptions for the use case where we need the global class path:
Some other remark: This makes the code quite dirty on a lot of parts, since we have always Files and URLs in Client, executors, ... Can we not store everything as URLs (rather than having Strings for jar files and URLs extra). Files can be expressed as URLs anyways as well... |
Yes, this would make things a lot cleaner. @twalthr what do you think? |
It is not a problem in exposing this to the client as well, I was just unsure if it is necessary. I will add to the client as well to be in sync. |
Yes, user-facing remains strings which are file paths. |
90cf0db
to
ede7cb2
Compare
I have implemented your comments and rebased to the current master. The build succeeded except for a KafkaITCase, but I think that has nothing to do with my code. |
Oh .. failing |
Any comments for this PR? |
I was originally thinking to merge this after #681 , but I may have to merge this by itself to get it into the release. |
7b7a1ad
to
2230808
Compare
I have rebased the PR and added commits of @aalexandrov that introduce a CLI option |
Do you mean the 0.10-milestone-1 release? or the final 0.10 release? Fabian is managing the 0.10-milestone-1 release, as far as I know he has not started with the release activities, so we could still include it, given that we have consensus to merge the PR |
@rmetzger the final 10.0, this and the Scala 2.11 cross-building support (#885), which is currently handled by @chiwanpark, are the two pending issues that make the current Emma master incompatible with vanilla Flink. |
I rebased the PR two times. So before I will rebase it a third time, it would be great if we finally find a consensus. |
#858 is merged, so we can try to get consensus here |
@@ -47,6 +47,11 @@ | |||
"Class with the program entry point (\"main\" method or \"getPlan()\" method. Only needed if the " + | |||
"JAR file does not specify the class in its manifest."); | |||
|
|||
static final Option CLASSPATH_OPTION = new Option("C", "classpath", true, "Adds a URL to each user code classloader " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would hdfs:https:///
also be possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HDFS is not possible yet. the user would have to provide a special URL protocol handler via java -Djava.protocol.handler.pkgs=org.my.protocols
such that the URLClassloader understands the protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this documented somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tillrohrmann I added a comment that only protocols are supported which are supported by the URLClassLoader
. The parameter above is Java standard.
Conflicts solved again ;) |
Thanks a lot @twalthr for rebasing the pull request. There is an issue with the Scalashell test. Not sure why it only occurs in some builds. |
@mxm The issue is probably related to Scala 2.10, since the two passing builds have |
I think the NullPointerException was due to a wrong array to list conversion. I don't know why this is related to the Scala version. |
@twalthr It could be that the test is not executed for Scala 2.11. |
The build succeeded. How do we proceed? It would be very nice if we finally merge this. |
|
||
|
||
// add classpaths | ||
URL[] cpUrls = requiredClasspaths.toArray(new URL[requiredClasspaths.size()]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason you're creating an array here first instead of simply iterating over requiredClassPaths
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. It is completely unnecessary. I will correct that.
I went through the pull request again and it looks good. +1 for merging |
Do we need another committers +1? It touches very sensitive parts... |
I think @StephanEwen wanted to merge it as well. @rmetzger concerns should be out of the way with the new version. I would merge this by the end of the day. |
Looks good to merge from my side. The only uncertainty remaining is whether we need the static factory methods on the ExecutionEnvironment. Adding these global classpaths seems a very specific case, in which we can probably expect people to manually create a RemoteEnvironment. That way we don't over load the ExecutionEnvironment with too many factory methods (we have a tendency to get API bloat). |
* user-defined functions, user-defined input formats, or any libraries, those must be | ||
* provided in the JAR files. | ||
*/ | ||
public RemoteEnvironment(String host, int port, String... jarFiles) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we keep this constructor?
+1 for removing the factory methods from ExecutionEnvironment. Most users won't even think about using this feature and be rather confused by the additional methods. Let's also keep the old constructor of the RemoteEnvironment in addition to the new one. |
@twalthr Could you make the changes and update the PR? Thank you. |
33565ae
to
120df65
Compare
@mxm done ;) |
@twalthr Doesn't seem to compile :) |
120df65
to
9c18e3e
Compare
Merged. Congrats :) |
Thanks, @mxm ;) |
…s loader This closes apache#593.
…s loader This closes apache#593.
See [FLINK-1789].
I was a little bit unsure in
JobWithJars
because it is not guaranteed that the classpath is correct locally, maybe someone can have a look at it.