[SPARK-35672][CORE][YARN] Pass user classpath entries to executors using config instead of command line. #34120
Conversation
Test build #143658 has finished for PR 34120 at commit
Kubernetes integration test unable to build dist. exiting with code: 1
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #143660 has finished for PR 34120 at commit
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala (outdated review thread, resolved)
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala (outdated review thread, resolved)
Kubernetes integration test starting
Test build #143728 has finished for PR 34120 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Kubernetes integration test status failure
Test build #143726 has finished for PR 34120 at commit
Overall the changes look fine; at this point I think we should hold off on merging until 3.2 goes out, assuming you want to try to put this in 3.2.1.
SGTM, let's wait for the release to wrap up. I will also fix the style issue.
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #143770 has finished for PR 34120 at commit
One minor thing: moving the replaceEnvVars into an object.
Otherwise looks fine, but let me have one more round when this small refactor is done.
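To illustrate the requested refactor, here is a minimal sketch of what "moving `replaceEnvVars` into an object" could look like. This is not the actual helper from `Client.scala` (the real signature and regex differ); the `env` map parameter stands in for `sys.env` so the logic is directly testable, and the escape handling shown (a backslash before `$` keeps it literal) follows the behavior the PR discussion describes:

```scala
import scala.util.matching.Regex

// Hypothetical sketch, not the real Spark helper: placing the function in an
// object keeps it stateless and lets tests call it directly.
object ClasspathUtils {
  // Matches an optional escaping backslash, then $VAR or ${VAR}
  // (Unix-style variable names, following the example of Hadoop's Shell class).
  private val envVar: Regex = """(\\?)\$(\{(\w+)\}|(\w+))""".r

  def replaceEnvVars(text: String, env: Map[String, String]): String =
    envVar.replaceAllIn(text, m => {
      val name = Option(m.group(3)).getOrElse(m.group(4))
      val replacement =
        if (m.group(1) == "\\") m.matched.substring(1) // escaped: keep literal $VAR
        else env.getOrElse(name, "")
      Regex.quoteReplacement(replacement)
    })
}
```

For example, `ClasspathUtils.replaceEnvVars("$FOO/lib/a.jar", Map("FOO" -> "/opt"))` yields `/opt/lib/a.jar`, while a `\$FOO` entry is left as a literal `$FOO`.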
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala (outdated review thread, resolved)
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala (outdated review thread, resolved)
Force-pushed from 92e296b to ec146d1.
Just pushed up a new commit moving `replaceEnvVars` into an object.
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #144096 has finished for PR 34120 at commit
Force-pushed from ec146d1 to 023963c.
@tgravescs @attilapiros -- now that the Spark 3.2 release is all wrapped up, can you take another look? I just rebased on latest master.
@xkrogen I can only do the review this Friday/Saturday.
Thanks @tgravescs, re-triggered the tests. @attilapiros no problem -- we can target merging next week if there are no issues from your side.
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #144879 has finished for PR 34120 at commit
Just a few questions but in general it looks good.
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (outdated review thread, resolved)
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (review thread, resolved)
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala (outdated review thread, resolved)
…ing config instead of command line

### What changes were proposed in this pull request?
Refactor the logic for constructing the user classpath from `yarn.ApplicationMaster` into `yarn.Client` so that it can be leveraged on the executor side as well, instead of having the driver construct it and pass it to the executor via command-line arguments. A new method, `getUserClassPath`, is added to `CoarseGrainedExecutorBackend` which defaults to `Nil` (consistent with the existing behavior where non-YARN resource managers do not configure the user classpath). `YarnCoarseGrainedExecutorBackend` overrides this to construct the user classpath from the existing `APP_JAR` and `SECONDARY_JARS` configs.

### Why are the changes needed?
User-provided JARs are made available to executors using a custom classloader, so they do not appear on the standard Java classpath. Instead, they are passed as a list to the executor which then creates a classloader out of the URLs. Currently in the case of YARN, this list of JARs is crafted by the driver (in `ExecutorRunnable`), which then passes the information to the executors (`CoarseGrainedExecutorBackend`) by specifying each JAR on the executor command line as `--user-class-path /path/to/myjar.jar`. This can cause extremely long argument lists when there are many JARs, which can cause the OS argument length to be exceeded, typically manifesting as the error message:

> /bin/bash: Argument list too long

A [Google search](https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22&oq=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22) indicates that this is not a theoretical problem and afflicts real users, including ours. Passing this list using the configurations instead resolves this issue.

### Does this PR introduce _any_ user-facing change?
No, except for fixing the bug, allowing for larger JAR lists to be passed successfully. Configuration of JARs is identical to before.

### How was this patch tested?
New unit tests were added in `YarnClusterSuite`. Also, we have been running a similar fix internally for 4 months with great success.

Closes apache#32810 from xkrogen/xkrogen-SPARK-35672-classpath-scalable.

Authored-by: Erik Krogen <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>
…, add more test cases, follow Unix variable name conventions even on Windows based on the example of Hadoop's Shell class
…r to share a common variable for the name regex pattern.
…ateway replacement doesn't take place, and also make use of a non-empty environment variable
Force-pushed from e1b0668 to 97540f8.
Thanks a lot for the review @attilapiros; it resulted in some great enhancements to the test cases. I believe they should all be addressed now.
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #145071 has finished for PR 34120 at commit
LGTM
Thanks @xkrogen!
Thanks @attilapiros! Would you or @tgravescs be willing to help merge this?
Merged to master.
Many thanks @attilapiros and @tgravescs ! Also thanks to @peter-toth for initially reporting the issue with the original PR. |
…scala for Scala 2.13

### What changes were proposed in this pull request?
This PR mitigates an issue where MiMa fails for Scala 2.13 after SPARK-35672 (#34120).

```
$ dev/change-scala-version.sh 2.13
$ dev/mima
...
[error] spark-core: Failed binary compatibility check against org.apache.spark:spark-core_2.13:3.2.0! Found 8 potential problems (filtered 905)
[error]  * method userClassPath()scala.collection.mutable.ListBuffer in class org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments.userClassPath")
[error]  * method copy(java.lang.String,java.lang.String,java.lang.String,java.lang.String,Int,java.lang.String,scala.Option,scala.collection.mutable.ListBuffer,scala.Option,Int)org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments in class org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments.copy")
[error]  * synthetic method copy$default$10()Int in class org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments.copy$default$10")
[error]  * synthetic method copy$default$8()scala.collection.mutable.ListBuffer in class org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments has a different result type in current version, where it is scala.Option rather than scala.collection.mutable.ListBuffer
[error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments.copy$default$8")
[error]  * synthetic method copy$default$9()scala.Option in class org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments has a different result type in current version, where it is Int rather than scala.Option
[error]    filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments.copy$default$9")
[error]  * method this(java.lang.String,java.lang.String,java.lang.String,java.lang.String,Int,java.lang.String,scala.Option,scala.collection.mutable.ListBuffer,scala.Option,Int)Unit in class org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments.this")
[error]  * the type hierarchy of object org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments is different in current version. Missing types {scala.runtime.AbstractFunction10}
[error]    filter with: ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend$Arguments$")
[error]  * method apply(java.lang.String,java.lang.String,java.lang.String,java.lang.String,Int,java.lang.String,scala.Option,scala.collection.mutable.ListBuffer,scala.Option,Int)org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments in object org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments.apply")
...
```

It is curious that the class `Arguments` is `public` while its enclosing class `CoarseGrainedExecutorBackend` is package-private, and MiMa doesn't raise an error for Scala 2.12; in any case, adding exclusion rules is one workaround.

### Why are the changes needed?
To keep the build stable.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Confirmed MiMa passed.

```
$ dev/change-scala-version.sh 2.13
$ dev/mima
```

Closes #34649 from sarutak/followup-SPARK-35672-mima.

Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
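The workaround takes the form of exclusion rules in Spark's MiMa configuration. The rule text below is copied from the `filter with:` lines in the output above; the exact entries added to `MimaExcludes.scala` may differ, and this fragment is build configuration rather than standalone-runnable code:

```scala
// Build-config fragment: exclusion rules of the kind suggested by the
// "filter with:" hints in the MiMa output.
ProblemFilters.exclude[DirectMissingMethodProblem](
  "org.apache.spark.executor.CoarseGrainedExecutorBackend#Arguments.userClassPath"),
ProblemFilters.exclude[MissingTypesProblem](
  "org.apache.spark.executor.CoarseGrainedExecutorBackend$Arguments$")
```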
What changes were proposed in this pull request?
Refactor the logic for constructing the user classpath from `yarn.ApplicationMaster` into `yarn.Client` so that it can be leveraged on the executor side as well, instead of having the driver construct it and pass it to the executor via command-line arguments. A new method, `getUserClassPath`, is added to `CoarseGrainedExecutorBackend` which defaults to `Nil` (consistent with the existing behavior where non-YARN resource managers do not configure the user classpath). `YarnCoarseGrainedExecutorBackend` overrides this to construct the user classpath from the existing `APP_JAR` and `SECONDARY_JARS` configs. Within `yarn.Client`, environment variables in the configured paths are resolved before constructing the classpath.

Please note that this is a re-submission of #32810, which was reverted in #34082 due to the issues described in this comment. This PR additionally includes the changes described in #34084 to resolve that issue, though this PR has been enhanced to properly handle escape strings, unlike #34084.
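The dispatch described above can be sketched as follows. The trait and class names here are simplified stand-ins with assumed signatures; the real `CoarseGrainedExecutorBackend` API differs:

```scala
import java.io.File
import java.net.URL

// Simplified sketch (assumed signatures): the base backend contributes no
// user classpath, matching non-YARN resource managers; the YARN variant
// rebuilds the classpath on the executor side from the app JAR and
// secondary-JARs settings instead of receiving it via command-line args.
trait BackendSketch {
  def getUserClassPath: Seq[URL] = Nil
}

class YarnBackendSketch(appJar: Option[String], secondaryJars: Seq[String])
    extends BackendSketch {
  override def getUserClassPath: Seq[URL] =
    (appJar.toSeq ++ secondaryJars).map(p => new File(p).toURI.toURL)
}
```

The default-plus-override shape keeps the change invisible to other resource managers: only the YARN backend reads the configs and builds URLs.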
Why are the changes needed?
User-provided JARs are made available to executors using a custom classloader, so they do not appear on the standard Java classpath. Instead, they are passed as a list to the executor, which then creates a classloader out of the URLs. Currently, in the case of YARN, this list of JARs is crafted by the driver (in `ExecutorRunnable`), which then passes the information to the executors (`CoarseGrainedExecutorBackend`) by specifying each JAR on the executor command line as `--user-class-path /path/to/myjar.jar`. This can cause extremely long argument lists when there are many JARs, which can cause the OS argument-length limit to be exceeded, typically manifesting as the error message:

> /bin/bash: Argument list too long

A Google search indicates that this is not a theoretical problem and afflicts real users, including ours. Passing this list using the configurations instead resolves the issue.
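A quick back-of-the-envelope illustration of why the flag-per-JAR scheme breaks down (the JAR paths below are invented; typical Linux `ARG_MAX` is on the order of a couple of megabytes):

```scala
// Invented paths, for illustration only: each user JAR used to append a
// "--user-class-path <jar>" pair to the executor launch command, so the
// command grows linearly with the number of JARs and can exceed the OS
// argument-size limit ("/bin/bash: Argument list too long").
val jars = (1 to 5000).map(i => f"hdfs://nn/app/libs/dependency-$i%04d.jar")
val oldStyleArgs = jars.map(j => s"--user-class-path $j").mkString(" ")
println(s"command-line suffix for 5000 JARs: ${oldStyleArgs.length} bytes")
// With this PR the same list travels inside the serialized Spark config,
// which executors fetch over RPC, leaving the launch command short.
```

Five thousand ~40-byte paths already produce a suffix of well over 250 KB, and deeper directory layouts reach the limit with far fewer JARs.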
Does this PR introduce any user-facing change?
There is one small behavioral change, which is a bug fix. Previously, the `spark.yarn.config.gatewayPath` and `spark.yarn.config.replacementPath` options were only applied to executors, meaning they would not work for the driver when running in cluster mode. This appears to be a bug; the documentation for this functionality does not mention that it is limited to executors. This PR fixes that issue.

Additionally, this fixes the main bash argument-length issue, allowing larger JAR lists to be passed successfully. Configuration of JARs is identical to before, and substitution of environment variables in `spark.jars` or `spark.yarn.config.replacementPath` works as expected.

How was this patch tested?
New unit tests were added in `YarnClusterSuite`. Also, we have been running a similar fix internally for 4 months with great success.
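For readers unfamiliar with the gateway/replacement mechanism mentioned above: it performs a prefix substitution on classpath entries so that nodes with different install layouts resolve correctly. A hedged sketch, with hypothetical values (the real logic lives in `Client.scala` and handles more cases):

```scala
// Hypothetical values standing in for spark.yarn.config.gatewayPath and
// spark.yarn.config.replacementPath: any classpath entry under the gateway
// path is rewritten onto the replacement path. Per this PR, the rewrite now
// applies to the driver in cluster mode as well, not only to executors.
val gatewayPath = "/usr/lib/hadoop"
val replacementPath = "{{HADOOP_COMMON_HOME}}"

def rewrite(entry: String): String =
  if (entry.startsWith(gatewayPath)) replacementPath + entry.stripPrefix(gatewayPath)
  else entry
```

So `/usr/lib/hadoop/lib/a.jar` becomes `{{HADOOP_COMMON_HOME}}/lib/a.jar`, while entries outside the gateway path are left untouched.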