Take Home Assignment For Data Engineering Position at Toggl

Tech stack:
- Postgres as the database
- Python
- Alembic
- SQLAlchemy
- psycopg2
- Docker
You will need Docker and docker-compose installed to run it locally:

```
git clone https://github.com/mnurpeiissov/toggl-data-engineer.git
cd toggl-data-engineer
docker-compose up --build
```
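A minimal `docker-compose.yml` for this kind of setup might look like the sketch below. Service names, image tags, and credentials are illustrative assumptions, not the repository's actual file:

```yaml
# Sketch of a compose file for a Postgres + cron-driven ingestion setup.
# All names, ports, and credentials here are assumptions for illustration.
version: "3.8"
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: jobs
    ports:
      - "5432:5432"
  ingestion:
    build: .            # Dockerfile installs Python dependencies and cron
    depends_on:
      - db
    environment:
      DATABASE_URL: postgresql+psycopg2://app:app@db:5432/jobs
```

Keeping the database URL in an environment variable lets the same image run locally and on a cloud VM without code changes.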
This will:
- create a Postgres database
- create the `usa_jobs` table
- run the initial data ingestion
- run cron in the foreground
- run the data ingestion script on the cron schedule
The cron schedule can be adjusted in `cron-schedules/crontab`.
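For example, a crontab entry that runs the ingestion at the top of every hour could look like this (the script path is an assumption for illustration):

```
# minute hour day month weekday  command
0 * * * * python /app/ingest.py >> /var/log/ingest.log 2>&1
```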
There are various ways to deploy the pipeline in the cloud (I will consider Google Cloud Platform):
- Deploy on a VM. The most straightforward way is to deploy it on a cloud VM, which is very similar to running it locally.
- Cloud Run and Cloud Scheduler
  * Create a Dockerfile that defines the environment and dependencies for the application.
  * Build the Docker image using the `docker build` command.
  * Push the Docker image to a container registry, such as Google Container Registry.
  * Create a Cloud Run service.
  * In the Cloud Run service configuration, specify the Docker image to deploy.
  * Click Deploy.
  * Configure Cloud Scheduler with a schedule and the URL of the Cloud Run service.
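The Cloud Run steps above might be sketched with the gcloud CLI roughly as follows. The project, region, image, and job names are placeholders, and the exact flags should be checked against current gcloud documentation:

```
# Build and push the image to a registry (names are placeholders).
docker build -t gcr.io/my-project/toggl-ingest:latest .
docker push gcr.io/my-project/toggl-ingest:latest

# Deploy the image as a Cloud Run service.
gcloud run deploy toggl-ingest \
  --image gcr.io/my-project/toggl-ingest:latest \
  --region europe-west1

# Schedule periodic invocations of the service URL.
gcloud scheduler jobs create http ingest-hourly \
  --schedule "0 * * * *" \
  --uri https://toggl-ingest-xxxx.a.run.app/ \
  --http-method GET
```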
- GKE
  * Create a GKE cluster.
  * Create a Kubernetes CronJob for the application (a CronJob, rather than a Deployment, matches the scheduled-ingestion pattern).
  * In the CronJob configuration, specify the Docker image to deploy.
  * Apply the manifest with `kubectl apply`.
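A minimal Kubernetes CronJob manifest for the GKE option could look like the sketch below; the image name and schedule are assumptions:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: toggl-ingest
spec:
  schedule: "0 * * * *"          # hourly, mirroring the crontab schedule
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: ingest
              image: gcr.io/my-project/toggl-ingest:latest  # placeholder image
          restartPolicy: OnFailure
```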