
AndreiVoicu04/Weather-Forecast

 
 


Weather-Forecast

Setup

Fork the repository, then clone it to your local machine. Inside the root folder, run poetry install. That should install all the dependencies.

We will be using pandas, a powerful Python library that lets us create data structures for holding and analysing datasets. We will also be using NumPy and seaborn. Take a bit of time to read through their documentation!

Tasks

We will attempt to forecast the weather conditions in Seattle. We can find the dataset in 'seattle-weather.csv'.

Task 1 - Dataset

Looking through a CSV is very boring, but at the same time we should also know what kind of data we have at our disposal.

Ah, I know! I think pandas has some functions to help out with that. Check out pandas.DataFrame.info.

Now that I think of it, we should check if there are any null values or duplicate rows in our dataset.

And what are our min_temp and max_temp? Or what is the most common weather condition?

Use the dataset_info function to display whatever information you find useful about the dataset!
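As a starting point, here is a minimal sketch of what dataset_info could look like. The signature (a DataFrame in, a dict out) and the column names temp_min and weather are assumptions, so adjust them to whatever the stub in the repository and df.info() actually show.

```python
import pandas as pd


def dataset_info(df: pd.DataFrame) -> dict:
    """Print a quick overview of the dataset and return the key numbers.

    Assumes the columns temp_min, temp_max and weather exist.
    """
    df.info()  # column names, dtypes and non-null counts

    summary = {
        "null_values": int(df.isnull().sum().sum()),    # missing cells overall
        "duplicate_rows": int(df.duplicated().sum()),   # fully repeated rows
        "lowest_temp": df["temp_min"].min(),
        "highest_temp": df["temp_max"].max(),
        "most_common_weather": df["weather"].mode().iloc[0],
    }
    for name, value in summary.items():
        print(f"{name}: {value}")
    return summary
```

You would call it with something like dataset_info(pd.read_csv("seattle-weather.csv")).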

Task 2 - Exploratory analysis

I mean, tables are not that much more interesting than CSVs. But you know what is?

✨ PLOTS ✨

Let's make lots and lots of plots and add them to the investigation file with a proper description.

Let's see what the most common max temperatures are by using histplot to plot temp_max (X-axis) against count (Y-axis).

That doesn't tell us much, so we could try using a FacetGrid in combination with a lineplot. We should create two new columns in our dataframe by extracting the year and the month. Don't forget to change the date column to datetime type! It should look similar to this:

lineplot.png

What about something similar for precipitation, but instead of a lineplot, use a scatterplot?

scatterplot.png
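The same grid idea works for precipitation. This sketch assumes the column is named precipitation and that the date column is called date; the helper name is made up for illustration.

```python
import pandas as pd
import seaborn as sns


def plot_precipitation_by_month(df: pd.DataFrame) -> sns.FacetGrid:
    """One panel per year, with every daily precipitation value as a point."""
    data = df.copy()
    data["date"] = pd.to_datetime(data["date"])
    data["year"] = data["date"].dt.year
    data["month"] = data["date"].dt.month

    grid = sns.FacetGrid(data, col="year", col_wrap=2, height=3, aspect=1.5)
    grid.map_dataframe(sns.scatterplot, x="month", y="precipitation")
    grid.set_axis_labels("month", "precipitation")
    return grid
```

Unlike the lineplot, the scatterplot does no averaging, so outliers such as very wet days stay visible.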

Lastly, we should know the distribution of weather conditions in our dataset. Create a countplot (how many of each there are) and a pie chart (what percentage of the dataset each one makes up). You can use either matplotlib or seaborn for this.
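Both charts can be drawn side by side. The following is only a sketch that assumes the condition column is named weather; the helper name and figure size are arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_weather_distribution(df: pd.DataFrame):
    """Countplot and pie chart of the weather column, side by side."""
    counts = df["weather"].value_counts()  # sorted, most common first

    fig, (ax_count, ax_pie) = plt.subplots(1, 2, figsize=(12, 5))
    sns.countplot(data=df, x="weather", order=list(counts.index), ax=ax_count)
    ax_count.set_title("Number of days per weather condition")

    # autopct prints each slice's share of the dataset as a percentage
    ax_pie.pie(counts.values, labels=list(counts.index), autopct="%1.1f%%")
    ax_pie.set_title("Share of each weather condition")
    return counts, fig
```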

Task 3 - Weather prediction

Now it's time for the interesting part: can we properly predict the weather? To do that, we need to split our data into training and testing subsets and train a model to make the prediction.

The gist of it is pretty simple:

  • Have data
  • Split the data into training and testing sets with the train_test_split method - usually we use 80% of the data for training and 20% of it for testing
  • Choose a model! Any model! There are soooo many models! Choose the relevant features, fit the model and make the prediction.
  • Now what? Well...we test it! Some common prediction metrics are the R² score and the mean squared error
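The steps above can be sketched as follows. LinearRegression is just one possible model, and the helper name, the feature list and the target are assumptions you are expected to swap out while experimenting.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(df: pd.DataFrame, features: list, target: str,
                       test_size: float = 0.2, seed: int = 42):
    """Split the data 80/20, fit a linear model and score it on the test set."""
    X = df[features]
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    model = LinearRegression()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

    metrics = {
        "r2": r2_score(y_test, predictions),
        "mse": mean_squared_error(y_test, predictions),
    }
    return model, metrics
```

A call could look like train_and_evaluate(df, ["temp_min", "precipitation", "wind"], "temp_max"), assuming those columns exist. Note that train_test_split shuffles the rows by default; for time-ordered data it may be worth comparing against shuffle=False.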

Let's juggle with a couple of concepts and variations:

scatterplot_prediction.png

Don't shy away from juggling with the parameters, like the train-test split or any others. Feel free to experiment with other models as well. Please document all the work done in the investigation file and add all the plots there too. You can treat this exercise as a small research assignment!

Stretch goals

Okay, but can we actually see the future? Play with lagged features and predict the weather for the next week and the next month. Find a nice way to visualise this!
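Lagged features are simply earlier values of a column copied onto later rows, which pandas can do with shift. A minimal sketch, assuming the rows are sorted by date with one row per day; the helper name and the column naming scheme are made up for illustration.

```python
import pandas as pd


def add_lag_features(df: pd.DataFrame, column: str, lags: list) -> pd.DataFrame:
    """Add one column per lag holding the value of column from that many rows earlier."""
    out = df.copy()
    lag_columns = []
    for lag in lags:
        name = f"{column}_lag_{lag}"
        out[name] = out[column].shift(lag)
        lag_columns.append(name)
    # The first rows have no history yet, so their lag values are NaN
    return out.dropna(subset=lag_columns).reset_index(drop=True)
```

For example, add_lag_features(df, "temp_max", [1, 2, 7]) gives every row the maximum temperature from one, two and seven days earlier, which can then be used as model features. To predict a full week ahead directly, only lags of seven days or more are available at prediction time.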

What about something...bigger...smarter...

What about...deep learning.

If there is enough time, have a crack at solving this problem using a Convolutional Neural Network. How hard could it be?
