
spk-sample

Sample Spark app demonstrating Python packaging and deployment on a Spark cluster (Databricks, HDInsight, local)

Run locally

./scripts/run_local.sh
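
The script itself is not reproduced here; a plausible local run builds the egg and hands it to spark-submit via --py-files, roughly:

# Hedged sketch of what ./scripts/run_local.sh likely does, not its exact contents
make build
spark-submit \
  --master "local[*]" \
  --py-files dist/spk_sample-1.0.0-py3.6.egg \
  spk_sample/wordcount.py sample.txt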

Build

make build
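
Assuming a standard setuptools setup.py, the build target presumably packages the project as the egg used in the deployment steps below, equivalent to something like:

python setup.py bdist_egg   # writes dist/spk_sample-1.0.0-py3.6.egg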

Run on Databricks

First, set up the Databricks CLI:
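
A minimal setup sketch, assuming the legacy databricks-cli package and token-based authentication:

pip install databricks-cli     # installs the databricks and dbfs commands
databricks configure --token   # prompts for the workspace host URL and a personal access token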

  • Copy the egg package, the main script (wordcount.py), and the sample input file to the Databricks filesystem (DBFS):

dbfs cp dist/spk_sample-1.0.0-py3.6.egg dbfs:/spk_sample/dist/spk_sample-1.0.0-py3.6.egg

dbfs cp spk_sample/wordcount.py dbfs:/spk_sample/wordcount.py

dbfs cp sample.txt dbfs:/spk_sample/sample.txt

  • Run a spark-submit job from the Jobs UI with the following parameters (an equivalent spark-submit command is sketched below):

["--py-files","dbfs:/spk_sample/dist/spk_sample-1.0.0-py3.6.egg","dbfs:/spk_sample/wordcount.py", "dbfs:/spk_sample/sample.txt"]

Limitations

  • With spark-submit, Databricks always creates a new cluster; the job cannot be run on an already running cluster.
