Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.12.0-incubating
Fix Version/s: None
Component/s: None
Description
This issue is based on a pull request opened on GitHub:
https://github.com/apache/incubator-predictionio/pull/441
Problem
pio batchpredict --input /tmp/pio/batchpredict-input.json --output /tmp/pio/batchpredict-output.json
[WARN] [ALSModel] Product factor is not cached. Prediction could be slow.
Exception in thread "main" org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true.
Root Cause
BatchPredict creates two separate SparkContexts:
https://github.com/apache/incubator-predictionio/blob/v0.12.0-incubating/core/src/main/scala/org/apache/predictionio/workflow/BatchPredict.scala#L160
https://github.com/apache/incubator-predictionio/blob/v0.12.0-incubating/core/src/main/scala/org/apache/predictionio/workflow/BatchPredict.scala#L183
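In outline, the failing flow looks roughly like this (a simplified sketch of the linked code with a hypothetical helper name, not the actual BatchPredict source):

import org.apache.spark.{SparkConf, SparkContext}

object BatchPredictFlowSketch {
  // Hypothetical stand-in for the engine's model-loading step.
  def loadPersistedModel(sc: SparkContext): AnyRef = new Object

  def main(args: Array[String]): Unit = {
    // First context, created while the engine instance and model are loaded (near L160).
    val modelSc = new SparkContext(new SparkConf().setAppName("model-load").setMaster("local[*]"))
    val model = loadPersistedModel(modelSc)

    // For a PersistentModel the first context is left running (see Engine.scala below),
    // so this second constructor call (near L183) throws the SparkException above.
    val batchSc = new SparkContext(new SparkConf().setAppName("batch-predict").setMaster("local[*]"))
  }
}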
When a PersistentModel/PersistentModelLoader is used, PredictionIO does not stop the first SparkContext:
https://github.com/apache/incubator-predictionio/blob/v0.12.0-incubating/core/src/main/scala/org/apache/predictionio/controller/Engine.scala#L241-L250
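For context, the controller API's persistence hooks take the live context directly (paraphrased from the org.apache.predictionio.controller package; signatures abridged):

import org.apache.predictionio.controller.Params
import org.apache.spark.SparkContext

trait PersistentModel[AP <: Params] {
  def save(id: String, params: AP, sc: SparkContext): Boolean
}

trait PersistentModelLoader[AP <: Params, M] {
  def apply(id: String, params: AP, sc: Option[SparkContext]): M
}

Because apply may need Some(sc) to rebuild RDD-backed model members, the engine keeps the first context alive after loading the model.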
For example, the Recommendation Engine Template uses this technique:
https://github.com/apache/incubator-predictionio-template-recommender/blob/develop/src/main/scala/ALSModel.scala
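The relevant part of that template, condensed (a hedged paraphrase of the linked ALSModel.scala; the member list and params are abridged):

import org.apache.predictionio.controller.{PersistentModel, PersistentModelLoader}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.rdd.RDD

class ALSModel(
    override val rank: Int,
    override val userFeatures: RDD[(Int, Array[Double])],
    override val productFeatures: RDD[(Int, Array[Double])])
  extends MatrixFactorizationModel(rank, userFeatures, productFeatures)
  with PersistentModel[ALSAlgorithmParams] {

  // Saving needs a live SparkContext because the factors are RDDs.
  def save(id: String, params: ALSAlgorithmParams, sc: SparkContext): Boolean = {
    sc.parallelize(Seq(rank)).saveAsObjectFile(s"/tmp/${id}/rank")
    userFeatures.saveAsObjectFile(s"/tmp/${id}/userFeatures")
    productFeatures.saveAsObjectFile(s"/tmp/${id}/productFeatures")
    true
  }
}

object ALSModel extends PersistentModelLoader[ALSAlgorithmParams, ALSModel] {
  // Loading likewise needs the (still running) SparkContext to rebuild the RDDs,
  // which is why the first context cannot simply be stopped after training.
  def apply(id: String, params: ALSAlgorithmParams, sc: Option[SparkContext]) = {
    new ALSModel(
      rank = sc.get.objectFile[Int](s"/tmp/${id}/rank").first,
      userFeatures = sc.get.objectFile(s"/tmp/${id}/userFeatures"),
      productFeatures = sc.get.objectFile(s"/tmp/${id}/productFeatures"))
  }
}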
Solutions?
Given how SparkContext usage varies between deploy and batch modes, how do we ensure a single viable SparkContext is available for running batch queries?
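One possible direction, sketched under the assumption that both phases can share a context: SparkContext.getOrCreate returns the running context instead of constructing a new one, so the batch phase would reuse the context created during model loading.

import org.apache.spark.{SparkConf, SparkContext}

object SingleContextBatchPredict {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("pio-batchpredict").setMaster("local[*]")

    // Reuses an already-running context if one exists, rather than
    // calling `new SparkContext(...)` a second time in the same JVM.
    val sc = SparkContext.getOrCreate(conf)

    // ... load the PersistentModel with Some(sc), run the batch queries on sc ...

    sc.stop() // single, explicit shutdown once the batch job finishes
  }
}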