Use the Sentry integration for Scala Spark to track errors and crashes in your Spark application.
This integration is in alpha and has an unstable API.
Supports Spark 2.x.x and above.
Interested in PySpark? Check out our PySpark integration.
Add the package as a library dependency. For the most up-to-date version, see the changelog.
libraryDependencies += "io.sentry" %% "sentry-spark" % "0.0.1-alpha04"
Make sure to configure the Sentry SDK.
We recommend using a sentry.properties file and placing it in <SPARK_HOME>/conf, or anywhere in the Spark Driver's classpath.
When using cluster mode, we recommend the --files spark-submit option.
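If events do not show up, one thing worth checking is whether the driver can actually see sentry.properties on its classpath. The following is a minimal sketch using only the JVM standard library; the object name `SentryConfigCheck` and the helper `findOnClasspath` are hypothetical and not part of sentry-spark.

```scala
object SentryConfigCheck {
  // Hypothetical helper: returns the location of a classpath resource,
  // or None if the given class loader cannot see it.
  def findOnClasspath(name: String, loader: ClassLoader): Option[java.net.URL] =
    Option(loader.getResource(name))

  def main(args: Array[String]): Unit = {
    // Use the class loader that loaded this object; run this from the driver.
    val loader = getClass.getClassLoader
    findOnClasspath("sentry.properties", loader) match {
      case Some(url) => println(s"Found sentry.properties at $url")
      case None      => println("sentry.properties is not on the classpath")
    }
  }
}
```

Calling `SentryConfigCheck.findOnClasspath` early in the driver's `main` method gives a quick signal before the job starts.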
To use the integration, you need to make the jar accessible to your Spark Driver.
SCALA_VERSION="2.11" # or "2.12"
./bin/spark-submit \
--jars "sentry-spark_$SCALA_VERSION-0.0.1-alpha05.jar" \
--files "sentry.properties" \
example-spark-job.jar
The sentry-spark integration will automatically add tags and other metadata to your Sentry events. You can set it up like this:
import io.sentry.Sentry
import io.sentry.spark.SentrySpark
import org.apache.spark.sql.SparkSession
...
Sentry.init()
val spark = SparkSession
.builder
.appName("Simple Application")
.getOrCreate()
SentrySpark.applyContext(spark)
SentrySpark.applyContext can take a SparkSession, SparkContext or StreamingContext.
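As an illustration of the SparkContext variant, here is a minimal sketch that mirrors the SparkSession setup. It assumes the sentry-spark and Spark jars are on the classpath; the object name `ApplyContextExample` and the small job inside it are only for illustration.

```scala
import io.sentry.Sentry
import io.sentry.spark.SentrySpark
import org.apache.spark.{SparkConf, SparkContext}

object ApplyContextExample {
  def main(args: Array[String]): Unit = {
    // Initialize the Sentry SDK before creating the context.
    Sentry.init()

    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Attach Spark metadata to Sentry events, passing the SparkContext directly.
    SentrySpark.applyContext(sc)

    try {
      // Illustrative job: sum the numbers 1 to 10.
      val total = sc.parallelize(1 to 10).sum()
      println(s"Sum: $total")
    } finally {
      sc.stop()
    }
  }
}
```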
The sentry-spark integration exposes custom listeners that allow you to report events and errors to Sentry.
Supply the listeners as configuration properties so that they get instantiated as soon as possible.
The SentrySparkListener hooks onto the Spark scheduler, adds breadcrumbs and tags, and reports errors accordingly.
// Using SparkSession
val spark = SparkSession
.builder
.appName("Simple Application")
.config("spark.extraListeners", "io.sentry.spark.listener.SentrySparkListener")
.getOrCreate()
// Using SparkContext
val conf = new SparkConf()
.setAppName("Simple Application")
.setMaster("local[2]")
.set("spark.extraListeners", "io.sentry.spark.listener.SentrySparkListener")
val sc = new SparkContext(conf)
The SentryQueryExecutionListener listens for query events and reports failures as Sentry errors.
The configuration option spark.sql.queryExecutionListeners is only supported for Spark 2.3 and above.
val spark = SparkSession
.builder
.appName("Simple Spark SQL application")
.config("spark.sql.queryExecutionListeners", "io.sentry.spark.listener.SentryQueryExecutionListener")
.getOrCreate()
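Because spark.sql.queryExecutionListeners requires Spark 2.3 or newer, a job that must also run on older clusters may want to set the option conditionally. The sketch below shows one possible version check; `VersionGuard` and `supportsQueryExecutionListeners` are hypothetical names, not part of sentry-spark or Spark.

```scala
object VersionGuard {
  // Hypothetical helper: true when the Spark version string is 2.3 or newer.
  def supportsQueryExecutionListeners(sparkVersion: String): Boolean = {
    // Keep only the leading digits of the major and minor components,
    // so strings such as "3.0.0-SNAPSHOT" are handled as well.
    val parts = sparkVersion.split("\\.").take(2).map(_.takeWhile(_.isDigit))
    parts match {
      case Array(major, minor) if major.nonEmpty && minor.nonEmpty =>
        val (maj, min) = (major.toInt, minor.toInt)
        maj > 2 || (maj == 2 && min >= 3)
      case _ => false // Unparseable version: conservatively skip the option.
    }
  }

  // Extra configuration entries to apply for the given Spark version.
  def listenerConf(sparkVersion: String): Map[String, String] =
    if (supportsQueryExecutionListeners(sparkVersion))
      Map("spark.sql.queryExecutionListeners" ->
        "io.sentry.spark.listener.SentryQueryExecutionListener")
    else Map.empty
}
```

In a real job, the version string could come from `org.apache.spark.SPARK_VERSION`, and each entry of the returned map could be passed to the session builder's `.config(key, value)`.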
The SentryStreamingQueryListener listens for streaming queries and reports failures as Sentry errors.
val spark = SparkSession
.builder
.appName("Simple SQL Streaming Application")
.config("spark.sql.streaming.streamingQueryListeners", "io.sentry.spark.listener.SentryStreamingQueryListener")
.getOrCreate()
The SentryStreamingListener listens for ongoing streaming computations, adds breadcrumbs and tags, and reports errors accordingly.
import io.sentry.spark.listener.SentryStreamingListener
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
ssc.addStreamingListener(new SentryStreamingListener)
Package the assets with
sbt package
To run tests
sbt test
Test local publishing using
sbt +publishLocal
To publish to Bintray, first update your Bintray credentials using your Bintray username and API key (found on the settings page):
sbt bintrayChangeCredentials
Double check your configuration with:
sbt bintrayWhoami
For more info, see sbt-bintray.
By default, the sbt-pgp library uses gpg's default key to sign the files. This can be changed; see the sbt-pgp docs for details.
To sign and publish the library:
sbt +publishSigned
You can then upload to Maven through the Bintray interface by entering the proper credentials.
As this integration is under active development, reach out to us on our Discord if you are looking to get involved.