
Preparing Statistician to be Successful Big Data Scientist

Outline/Description: With the recent big data revolution, enterprises ranging from FORTUNE 500 companies to startups across the US are hungry for data scientists who can draw valuable business insight from all the data collected. Statisticians are great data scientist candidates, but relatively few data scientists have a statistics education background. In this short course, we will walk through the needed data science knowledge and skills (such as deep learning and big data platforms) through hands-on exercises to prepare statisticians to be successful data scientists. Data science is a combination of science and art with data as the foundation. We will also cover the art part, guiding participants through the typical data science project flow, general pitfalls in data science projects, and the soft skills needed to communicate effectively with business stakeholders. The Databricks Community Edition cloud platform and RStudio will be used to cover programming and platforms (such as Spark, Hadoop, GPU, SQL, and R) and typical machine learning algorithms (including examples for unsupervised learning and deep learning). The prerequisites are an MS-level education in statistics and entry-level knowledge of R. This is an enhanced version of a similar highly rated full-day training course (CE 11C) offered at JSM 2017 in Baltimore, with updated material to reflect students' suggestions and new trends in data science.

Introduction to data science. In this section, we will introduce the history and trends in data science. We will list typical requirements for a successful data scientist, run an evaluation so participants can find their skill gaps, and give recommendations to bridge those gaps. After this section, participants will have a good understanding of what data scientists do and will know their own skill and knowledge gaps.
Deep Learning Lecture. In this section, we will briefly introduce the history of deep learning and the essential concepts needed for deep learning applications. Then we will introduce the feed forward neural network (FFNN) and the Convolutional Neural Network (CNN) using the MNIST handwritten digits examples. The R package keras will be used to show how to build FFNN and CNN models.
Data Preprocessing Using the R Pipe. For R users not familiar with the pipe style of writing code, this brief section introduces the R pipe (`%>%`), which is used in most of the hands-on sessions.
Databricks account setup. In this section, we will walk through the steps to apply for and set up a free Databricks Community Edition account. All the hands-on sessions will run in this account.
Deep Learning Hands On. In this section, we will walk through all the steps to (1) create a cloud computing node, (2) create a notebook using R, (3) import a notebook of deep learning applications with FFNN and CNN, (4) follow a step-by-step illustration of the FFNN model, and (5) follow a step-by-step illustration of the CNN model.
Data Preprocessing & Wrangling. In this section, we will walk through the major steps in data preprocessing and wrangling.
Big Data Cloud Platform. In this section, we will introduce the big data cloud platform and the steps to use R to interact directly with Spark dataframes for big data applications.
Data Preprocessing & Wrangling Hands On. In this section, we will use Databricks community edition to walk through the steps.
Big Data Cloud Platform Hands On. In this section, we will use Databricks community edition to walk through the steps.
Soft Skills and Data Science Project Cycle. In this section, we will introduce the soft skills that are essential for data science projects in enterprise environments. We will talk about basic project management skills with agile concepts and how to communicate effectively with business partners to define and solve data science problems. We will illustrate how to lead with confidence given the strong technical background that statisticians have.
Build static personal website using SSG+Netlify. In this section, we will introduce how to quickly build a personal professional website for future career advancement opportunities.
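
For readers new to the pipe style mentioned in the R pipe section above, the following is a minimal sketch and is not taken from the course notebooks. It assumes the magrittr package is installed and uses the built-in mtcars dataset purely for illustration. It computes the same value twice, once with nested function calls and once with `%>%`, to show how the pipe passes the left-hand result as the first argument of the next call.

```r
library(magrittr)  # provides the %>% pipe operator

# Nested form: must be read from the inside out.
nested <- round(mean(subset(mtcars, cyl == 4)$mpg), 1)

# Piped form: each step reads left to right, top to bottom.
piped <- mtcars %>%
  subset(cyl == 4) %>%   # keep the 4-cylinder cars
  with(mpg) %>%          # pull out the mpg column as a vector
  mean() %>%             # average miles per gallon
  round(1)               # round to one decimal place

# Both forms give the same result.
print(identical(nested, piped))
```

The piped form tends to be easier to extend: adding another preprocessing step means adding one more line rather than wrapping the whole expression in another call.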

Tentative Schedule

Topic	Time
Introduction	15 min
Deep Learning Lecture	15 min + 45 min + 45 min
Break	20 min
R Pipe `%>%`	10 min
Databricks account setup	30 min
Deep Learning Hands On	60 min
Lunch break
Data Preprocessing & Wrangling	45 min
Data Preprocessing & Wrangling Hands On	30 min
Big Data Cloud Platform Lecture	45 min
Break	20 min
Big Data Cloud Platform Hands On	60 min
Soft Skills and Project Cycle	20 min
Build static personal website using SSG+Netlify	20 min

Useful links:

Open a free Databricks Community Edition account: https://accounts.cloud.databricks.com/registration.html#signup/community
Notebook that contains all the steps in the deep learning section: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2961012104553482/1430974153588517/1806228006848429/latest.html
Notebook that contains all the steps in the big data platform section: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2961012104553482/3725396058299890/1806228006848429/latest.html
Data preprocessing code: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2961012104553482/3241206203474646/1806228006848429/latest.html
Data wrangling code: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2961012104553482/3241206203474687/1806228006848429/latest.html
