This repository contains code and documentation for use with Google Cloud Dataproc.
- codelabs/opencv-haarcascade provides the source code for the OpenCV Dataproc Codelab, which demonstrates a Spark job that adds facial detection to a set of images.
- spark-tensorflow provides an example of using Spark as a preprocessing toolchain for TensorFlow jobs. Optionally, it demonstrates using the spark-tensorflow-connector to convert CSV files to TFRecords.
See each directory's README for more information.
You can find more Dataproc resources in these GitHub repositories:
- Dataproc initialization actions
- Dataproc Python examples
- Dataproc Java Bigtable sample
- Dataproc Spark-Bigtable samples
For more information, review the Dataproc documentation. You can also pose questions to the Stack Overflow community with the tag google-cloud-dataproc.
See our other Google Cloud Platform GitHub repos for sample applications and scaffolding for other frameworks and use cases.
- See CONTRIBUTING.md
- See LICENSE