- Given raw OLTP data on user behaviour, transform it so that it is easy to run analytics on.
- You can use any type of infrastructure to perform the task.
- The goal is to learn how to build a data pipeline.
- Use a cron job to run the task at a scheduled time.
- Write a Python script to read and transform the data.
- Use Metabase to let users analyse the data.
- Use DuckDB as the data warehouse; it is an embedded OLAP database.
- The transformation code should be maintainable.
- It should be easy to schedule one or more cron jobs.
- A local directory acts as the data lake, where data provided by the customer is dumped before it is transformed.
- DuckDB stores the data after transformation.
- Use the schedule library to run the data pipeline at a predefined time.
- Deploy Metabase on Docker.
- Connect DuckDB to Metabase.
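
The read, transform, load, and schedule steps listed above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual script: the `data_lake` directory, the `warehouse.duckdb` file, the `user_events` table, the CSV columns (`user_id`, `ts`, `event_type`), and the 02:00 run time are all hypothetical. It relies on the third-party `duckdb` and `schedule` packages, which are imported inside the functions that need them so the transformation step stays usable on its own.

```python
import csv
import time
from datetime import datetime, timezone
from pathlib import Path

# Assumed names, for illustration only.
LAKE_DIR = Path("data_lake")      # local directory acting as the data lake
WAREHOUSE = "warehouse.duckdb"    # DuckDB database file used as the warehouse


def transform(row: dict) -> tuple:
    """Turn one raw event row into (user_id, event_type, event_date)."""
    # Assumes `ts` is a Unix timestamp in seconds.
    ts = datetime.fromtimestamp(int(row["ts"]), tz=timezone.utc)
    return (int(row["user_id"]), row["event_type"].strip().lower(), ts.date())


def read_events(lake_dir: Path) -> list[tuple]:
    """Read every CSV file in the data lake directory and transform each row."""
    rows = []
    for path in sorted(lake_dir.glob("*.csv")):
        with path.open(newline="") as f:
            rows.extend(transform(r) for r in csv.DictReader(f))
    return rows


def run_pipeline(lake_dir: Path = LAKE_DIR, warehouse: str = WAREHOUSE) -> int:
    """Load the transformed rows into DuckDB and return how many were loaded."""
    import duckdb  # third-party: pip install duckdb

    rows = read_events(lake_dir)
    con = duckdb.connect(warehouse)
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS user_events "
            "(user_id BIGINT, event_type VARCHAR, event_date DATE)"
        )
        if rows:
            con.executemany("INSERT INTO user_events VALUES (?, ?, ?)", rows)
    finally:
        con.close()
    return len(rows)


def run_on_schedule(at: str = "02:00") -> None:
    """Run the pipeline once a day at the given local time; blocks forever."""
    import schedule  # third-party: pip install schedule

    schedule.every().day.at(at).do(run_pipeline)
    while True:
        schedule.run_pending()
        time.sleep(60)
```

Calling `run_on_schedule()` keeps the process alive and triggers the pipeline daily. With a cron job instead, the script would call `run_pipeline()` once and exit. The sketch does not track which files have already been loaded, so repeated runs over the same files would insert duplicate rows.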