
scala_modules

Introduction

This project is our collection of Scala modules.

Requirements

  • IntelliJ IDEA
  • IntelliJ Scalafmt Plugin
  • Scala 2.11
  • Spark 2.4.0+

Spark is a provided dependency (it is compiled against but not bundled into our build artifacts), so you need a working local Spark installation with $SPARK_HOME set in your environment.
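A minimal build.sbt sketch of what a provided Spark dependency looks like. It assumes sbt with the spark-core and spark-sql modules; the Scala patch version (2.11.12) and the module list are assumptions, not taken from this repo's build:

```scala
// build.sbt (sketch): Spark is on the compile classpath but is not packaged.
scalaVersion := "2.11.12" // assumed 2.11.x patch release

val sparkVersion = "2.4.0"

libraryDependencies ++= Seq(
  // %% appends the Scala binary suffix, e.g. spark-core_2.11
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided"
)
```

Because these are `provided`, they stay out of any packaged jar; at runtime the Spark install under $SPARK_HOME (for example via spark-submit) supplies them.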

You should use IntelliJ IDEA (the free Community Edition is fine). We use the Scalafmt IntelliJ IDEA plugin, configured to reformat on file save, along with scalastyle.

Some editor config to put in place: Case Class Definition Style

We follow the Twitter Effective Scala style guide.

Saving this here for future reference: Spark + S3

Installing Scala and sbt on Mac OS X

Use homebrew:

brew install scala@2.11
brew install sbt
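After installing, one way to confirm which Scala library a program actually runs against is a tiny standalone object like the hypothetical `ScalaVersionCheck` below (the object and method names are illustrative, not part of this repo). It relies only on the standard `scala.util.Properties.versionNumberString`:

```scala
// Sketch: report the Scala library version and check it against a series.
object ScalaVersionCheck {
  /** The Scala library version number, e.g. "2.11.12". */
  def libraryVersion: String = scala.util.Properties.versionNumberString

  /** True when the version equals `expected` or starts with `expected + "."`.
    * The trailing dot keeps "2.1" from matching "2.11.12".
    */
  def matchesSeries(expected: String): Boolean =
    libraryVersion == expected || libraryVersion.startsWith(expected + ".")

  def main(args: Array[String]): Unit = {
    val ok = matchesSeries("2.11")
    println(s"Scala library version: $libraryVersion (2.11 series: $ok)")
  }
}
```

Note that sbt builds use the `scalaVersion` declared in the build rather than the Homebrew-installed scala, so this check matters mostly when running code directly with the `scala` command.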

References

Notes

Getting AWS S3 to play nicely with Spark is complicated because it involves dependencies on both aws-java-sdk and hadoop-aws, and these two libraries must be mutually compatible versions (and compatible with Spark), or else you get runtime failures such as missing-method errors:

https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Missing_method_in_com.amazonaws_class

We currently use aws-java-sdk 1.7.4 and hadoop-aws 2.7.1, as these are known to be compatible with each other and to work with Spark 2.4.0+.
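The pinned pair above might be declared in build.sbt roughly as follows. This is a sketch, not copied from this repo's build, and it assumes the monolithic `com.amazonaws:aws-java-sdk` artifact:

```scala
// build.sbt (sketch): keep these two versions in lockstep.
// Both are plain Java artifacts, so use a single % (no Scala suffix).
libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-aws"   % "2.7.1",
  "com.amazonaws"     % "aws-java-sdk" % "1.7.4"
)
```

Unlike Spark itself, these are not marked `provided` here, on the assumption that the local Spark install does not already ship hadoop-aws.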