Group 8 project for DSCI 525 (Web and Cloud Computing), part of the Master of Data Science program at UBC.
The goal of the project is to build and deploy ensemble machine learning models that predict daily rainfall in Australia. In line with the course objectives, we also examine the limitations of processing this data on our own computers and the advantages of doing the same work in the cloud.
The project is organized around the following milestones:

- Milestone 1 (Week 1): Get the data from the web and become familiar with advanced file formats
- Milestone 2 (Week 3): Set up an S3 bucket, an EC2 instance, and The Littlest JupyterHub (TLJH)
- Milestone 3 (Week 4): Set up an EMR cluster with Spark and rewrite the ML model from the previous milestone in Spark
- Milestone 4 (Week 5): Deploy the ML model using Flask
The project report is available as a notebook here.
This project uses a very large daily rainfall dataset (over 6 GB) hosted on figshare.
Its features are the outputs of different climate models, and the target is the actual observed rainfall. The observations span 1889 to 2014.
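A minimal sketch of the Milestone 1 download step, assuming the public figshare API v2 endpoint for a specific article version and that each entry in the returned `files` list carries `name` and `download_url` fields. The helper names (`article_url`, `list_files`, `fetch_metadata`, `download_all`) are hypothetical and not the group's actual code; the article id (14096681) and version (3) are taken from the figshare URL in the reference at the end of this README.

```python
import json
import os
import urllib.request

# Assumed base URL of the public figshare REST API (v2).
FIGSHARE_API = "https://api.figshare.com/v2"


def article_url(article_id: int, version: int) -> str:
    """Build the metadata URL for one version of a figshare article (hypothetical helper)."""
    return f"{FIGSHARE_API}/articles/{article_id}/versions/{version}"


def list_files(metadata: dict) -> list:
    """Return (file name, download URL) pairs from article metadata (hypothetical helper)."""
    return [(f["name"], f["download_url"]) for f in metadata.get("files", [])]


def fetch_metadata(article_id: int, version: int) -> dict:
    """Fetch and parse the article metadata as JSON (requires network access)."""
    with urllib.request.urlopen(article_url(article_id, version)) as resp:
        return json.load(resp)


def download_all(article_id: int, version: int, dest: str = ".") -> None:
    """Download every file listed in the article into the `dest` directory."""
    os.makedirs(dest, exist_ok=True)
    for name, url in list_files(fetch_metadata(article_id, version)):
        # Each file is streamed to disk; large archives can take a long time.
        urllib.request.urlretrieve(url, os.path.join(dest, name))
```

For example, `download_all(14096681, 3, "figshare")` would download every file of the article into a local `figshare/` directory. Given the dataset size, this is exactly the kind of step where the limits of a local machine (bandwidth, disk, memory) start to show.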
Contributor | GitHub handle |
---|---|
Rachel Wong | @rachelywong |
Santiago Rugeles | @ansarusc |
Rui Wang | @wang-rui |
Daniel Ortiz | @danielon-5 |
Beuzen, T. (2021). Daily rainfall over NSW, Australia. figshare. https://figshare.com/articles/dataset/Daily_rainfall_over_NSW_Australia/14096681/3