
My journey through Udacity & Bertelsmann Data Science Scholarship


TianaQ/udacity-bertelsmann-ds-challenge


Here are some of the code and projects I made during my journey through the Udacity & Bertelsmann Data Science Scholarship 2018/2019.

Challenge Phase

r-and-python-in-rmarkdown - an example of combining R and Python in one R Markdown document, made for a forum discussion.

See output in html

ubdsc-group-projects - files and scripts created for group project activities. Some exploratory analysis was conducted on Boston fires data. The analysis of marketing freelancers offering their services online was submitted as a group project. My part was data cleaning with Python, data exploration with Python/R and data visualisations.

See Freelancers For Marketing presentation


Data Foundations Nanodegree

DFND Syllabus | DFND Certificate

After the challenge phase I was accepted to the full scholarship but, due to the sorting process, was placed in the Data Foundations Nanodegree (recently rebranded as the Business Analytics Nanodegree). It took me 5 days to finish it, so after graduation I was granted an upgrade to the Data Analyst Nanodegree. Here are the topics covered and the projects I made for DFND.

dfnd-descriptive-stats - the project required using spreadsheets to practice descriptive statistics and analyse Udacity student survey data. I cleaned the survey data in spreadsheets and performed exploratory analysis. I examined the characteristics of the respondents, their course preferences and the time they spent completing their projects.

See Survey analysis report


dfnd-sql - the project required applying SQL to explore data in the Chinook sample database. I used SQL and R together in R Markdown to pull data from the database with queries and to use the query results for further analysis and visualisations (the output included in the final presentation can be seen here). I explored the distribution of music albums by price and genre, the popularity of different genres in the USA and the most favoured composers.
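A minimal sketch of the kind of query this sort of exploration involves. It is written in Python with `sqlite3` rather than R Markdown and is not the project's actual code. The `Genre` and `Track` tables and their columns are assumed to follow the Chinook schema, and the rows inserted here are invented toy data.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a tiny, invented Chinook-like subset."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, Name TEXT, GenreId INTEGER);
        INSERT INTO Genre VALUES (1, 'Rock'), (2, 'Jazz');
        INSERT INTO Track VALUES (1, 'Track A', 1), (2, 'Track B', 1), (3, 'Track C', 2);
        """
    )
    return conn


def tracks_per_genre(conn):
    """Return (genre, track count) rows, most common genre first."""
    query = """
        SELECT g.Name AS genre, COUNT(t.TrackId) AS n_tracks
        FROM Genre AS g
        JOIN Track AS t ON t.GenreId = g.GenreId
        GROUP BY g.Name
        ORDER BY n_tracks DESC, genre ASC
    """
    return conn.execute(query).fetchall()


conn = build_toy_db()
for genre, n_tracks in tracks_per_genre(conn):
    print(genre, n_tracks)
```

The same `SELECT` could be run against a real copy of the database by replacing `build_toy_db()` with a connection to the database file.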

See Music SQL database analysis report


DFND Tableau project can be found here. I used the data of US Census 2015 to visualise regional differences in the United States in terms of population, income and other aspects and presented them as a Tableau story.


Data Analyst Nanodegree

DAND Syllabus | DAND Certificate

dand-sql - the project required using SQL to extract data for chosen cities, as .csv files, from the database of average temperatures in the student workspace, and describing the trends. I recreated the database locally for the selected data to conduct EDA in R and produce the report using R Markdown.
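One common way to describe a trend in yearly average temperatures is to smooth them with a moving average. Whether the report uses exactly this approach is not stated here, so the sketch below is only an illustration in Python; the function name, window size and temperature values are hypothetical.

```python
def moving_average(values, window):
    """Return the simple moving average of `values` over `window` points.

    The result has len(values) - window + 1 entries, one for each
    position where a full window is available.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Invented yearly average temperatures (degrees Celsius) for one city.
yearly_avg_temp = [8.1, 8.4, 7.9, 8.6, 8.8, 9.0, 8.7]
print(moving_average(yearly_avg_temp, window=3))
```

A larger window gives a smoother line at the cost of hiding short-term variation.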

See Weather trends report


dand-intro-to-python - the project required investigating US Bikeshare data using Python basics: built-in data structures, writing functions, using libraries, etc. I explored the data in terms of time patterns, location differences and client status.

See US Bikeshare data analysis


dand-data-investigation - the project required applying the methods of exploratory data analysis to chosen data from Gapminder.org, using the Python libraries pandas, NumPy and matplotlib. I chose maternal mortality data because of my previous experience in demographic studies. I collected data on maternal mortality and several related topics from Gapminder, then wrangled and analysed them in Python. I examined and visualised global trends in maternal mortality in 1980-2013 and identified regional and economic patterns as well as a number of possible influencing factors.

See Maternal mortality data investigation project


dand-practical-stats - the program included two projects that required applying statistical methods to hypothesis testing. The project structure was pre-set in both cases.
For the first project I performed A/B tests to compare conversion between the old and new versions of a web page. I also applied logistic regression to estimate the conversion rate depending on the page version and user location.
For the second project I conducted statistical tests to draw conclusions from an experiment that explored the perceptual phenomenon called the Stroop effect. It describes the delay in reaction when reading words whose ink color doesn't match their meaning (e.g. "red" written in blue). The tests showed the difference in reading time for such words to be statistically significant.
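A minimal sketch of one standard way to compare conversion rates between two page versions: a one-sided two-proportion z-test, written with the Python standard library only. It is not the project's actual code, and the function name and the conversion counts are invented for illustration.

```python
import math


def two_proportion_z_test(conv_old, n_old, conv_new, n_new):
    """One-sided z-test of H1: the new page converts better than the old one.

    Returns (z, p_value), where p_value is P(Z > z) under the null
    hypothesis that both pages share the same conversion rate.
    """
    p_old = conv_old / n_old
    p_new = conv_new / n_new
    # Pooled conversion rate under the null hypothesis.
    p_pool = (conv_old + conv_new) / (n_old + n_new)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_old + 1 / n_new))
    z = (p_new - p_old) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Invented counts: conversions and visitors for each page version.
z, p_value = two_proportion_z_test(conv_old=120, n_old=1000, conv_new=135, n_new=1000)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.3f}")
```

A small p-value would suggest the new page converts better; with equal rates the statistic is 0 and the one-sided p-value is 0.5.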

See A/B testing project and Stroop effect analysis


dand-EDA-in-R - the project required performing exploratory data analysis on a chosen data set, using R and R Markdown. I cleaned the data in R and explored Prosper's loans data in different dimensions with univariate, bivariate and multivariate plots created with the ggplot2 library. I identified three stages in the company's performance, each with specific features, examined differences between loans based on loan terms, and explored borrowers' characteristics and the number of investors for different loan amounts.

See Prosper loans exploration report


dand-data-wrangling - the project required applying the methods of data gathering, assessing and cleaning to data from the @dog_rates Twitter account, using Python, and performing exploratory data analysis on the cleaned data. For this project I gathered data programmatically from various sources, including the Twitter API, assessed the data to prepare a list of cleaning steps and performed the cleaning. After that I explored WeRateDogs tweets for popularity in terms of likes and retweets, dog ratings and dog stages. If you'd like to have a short break and some fun right now, check the project report for the most popular tweet of WeRateDogs. Don't forget to come back!

See WeRateDogs data wrangling project


DAND Tableau project can be found here. It is based on the exploratory analysis of Prosper's loans data, which was the project for the EDA in R course (see above). The Tableau project write-up, which explains the design choices and also contains an EDA summary, is available here.
