The data sets used in this exercise contain world population figures from 1960 to 2017 and each country's income classification, for 264 observations. First, the variables in both data sets were examined to understand their contents. The two data sets were then merged on the common variable Country Code, and the structure and attributes of the merged data set were checked. The data types of a few variables were converted to represent the data more accurately. To make the data set tidier, variables not needed for the exercise were dropped and a few variables were relabeled with clearer names. The data set was then transformed from wide format to long format. Missing values and special values were identified and treated. Outliers were investigated using the z-score method. A histogram of the numerical variable “Total_population” showed that its distribution is not normal but strongly right-skewed. A natural logarithm (ln, base e) transformation was applied to bring this variable closer to a normal distribution for more convenient analysis.
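The steps above can be sketched in a few lines of code. The analysis itself lives in the R Markdown file listed below; the following is only a minimal Python illustration using the standard library, not the project's actual code. The tiny input tables, the function names, the `Income_group` and `Year` column names, the choice to drop rows with missing values, and the |z| > 3 outlier threshold are all assumptions made for the example.

```python
import math
import statistics

# Hypothetical miniature inputs; the real World Bank files have many more rows and columns.
population_wide = [
    {"Country Code": "AAA", "1960": 1_000_000, "1961": 1_050_000},
    {"Country Code": "BBB", "1960": 50_000_000, "1961": None},  # missing value
    {"Country Code": "CCC", "1960": 200_000, "1961": 210_000},
]
income_by_code = {
    "AAA": "Low income",
    "BBB": "High income",
    "CCC": "Upper middle income",
}


def merge_on_country_code(population_rows, income_lookup):
    """Left-join the income classification onto each population row."""
    return [
        {**row, "Income_group": income_lookup.get(row["Country Code"])}
        for row in population_rows
    ]


def wide_to_long(rows, id_columns=("Country Code", "Income_group")):
    """Turn one column per year into one row per (country, year) pair."""
    long_rows = []
    for row in rows:
        ids = {name: row[name] for name in id_columns}
        for column, value in row.items():
            if column not in id_columns:
                long_rows.append(
                    {**ids, "Year": int(column), "Total_population": value}
                )
    return long_rows


def drop_missing(rows, column="Total_population"):
    """Assumed treatment of missing values: discard the affected rows."""
    return [row for row in rows if row[column] is not None]


def z_scores(values):
    """Standardize values using the mean and the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(value - mean) / sd for value in values]


def outlier_indexes(values, threshold=3.0):
    """Return positions whose absolute z-score exceeds the (assumed) threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


def ln_transform(values):
    """Apply the natural logarithm to reduce right skew in positive values."""
    return [math.log(value) for value in values]


merged = merge_on_country_code(population_wide, income_by_code)
long_rows = drop_missing(wide_to_long(merged))
populations = [row["Total_population"] for row in long_rows]
log_populations = ln_transform(populations)
```

With only a handful of rows, no value can reach |z| > 3, so `outlier_indexes` is meaningful only on larger samples such as the full long-format data set. The log transform requires strictly positive values, which holds for population counts.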
https://data.worldbank.org/indicator/SP.POP.TOTL
MATH2349_1850_Assignment_3_submission.rmd
Contains the data sets used for the project
MATH2349_1850 Assignment 3_s3400652.pdf