This is a simple straigtforward spark applcation and an Airflow control layer monorepo.
It is intended to be run locally, and is not designed to be run in a production environment.
- Clone the repo
- Install the requirements
- Run the bootstrap installation script for airflow
- Launch the airflow webserver
- Launch the airflow scheduler
- Schedule the spark jobs on Airflow
- Run the tests locally:
python -m pytest
Test are run using pytest and are located in the tests
directory.
Test coverage is provided by pytest-cov and can be run using:
python -m pytest --cov=.
- TODO: Add end to end tests, and integration tests, run the whole application in ci and deploy to a staging environment.
- TODO: Add additional notes about how to deploy this on a live system