Introduces SparkKeywordProcessor, a thin Scala wrapper around the FlashTextJava library by jasonsperske.
That project is a Java port of flashtext.py.
The motivation is to run FlashText on Spark to efficiently tag millions of unstructured documents for matches against a large corpus of keywords (also in the millions).
Clone the repo and, if you are on UNIX, run:
./gradlew build
or on Windows:
gradlew.bat build
This bootstraps the project with all its dependencies; the only prerequisite is that Java 8 is installed.
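Since the build depends on Java being installed, it can help to check which version is on the PATH before running the wrapper. The snippet below is a minimal sketch for UNIX shells, not part of the project: the helper name `java_major` is hypothetical, and the parsing assumes the usual `java -version` output, where the version appears in double quotes (for example "1.8.0_292" for Java 8).

```shell
# Hypothetical helper: reduce a Java version string to its major version.
# "1.8.0_292" -> 8, "11.0.2" -> 11, "17" -> 17
java_major() {
  v="$1"
  # Versions up to Java 8 are reported as 1.x, so drop the leading "1."
  case "$v" in
    1.*) v="${v#1.}" ;;
  esac
  # Keep everything before the first '.', '_' or '-'
  echo "${v%%[._-]*}"
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; take the first quoted string on the version line
  raw=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  echo "Detected Java major version: $(java_major "$raw")"
else
  echo "java not found on PATH; install Java 8 before running the build"
fi
```

If the reported major version is not 8, the build may still work, but that is not something this README confirms.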