Initial commit
eeholmes authored Oct 19, 2023
0 parents commit 6450bb1
Showing 38 changed files with 1,924 additions and 0 deletions.
30 changes: 30 additions & 0 deletions .devcontainer/Dockerfile
@@ -0,0 +1,30 @@
# Pre-built Dev Container Image for R. More info: https://github.com/rocker-org/devcontainer-images/pkgs/container/devcontainer%2Ftidyverse
# Available R versions: 4, 4.1, 4.0 (the default below is overridden by the VARIANT build arg in devcontainer.json)
ARG VARIANT="4.2"
FROM ghcr.io/rocker-org/devcontainer/tidyverse:${VARIANT}

RUN install2.r --error --skipinstalled -n -1 \
statip \
patchwork \
paletteer \
here \
doParallel \
janitor \
vip \
ranger \
palmerpenguins \
skimr \
nnet \
kernlab \
plotly \
factoextra \
cluster \
tidymodels \
markdown \
ottr \
&& rm -rf /tmp/downloaded_packages \
&& R -q -e 'remotes::install_github("https://github.com/dcomtois/summarytools/tree/0-8-9")'

# Install Python packages
COPY requirements.txt /tmp/pip-tmp/
RUN python3 -m pip --disable-pip-version-check --no-cache-dir install -r /tmp/pip-tmp/requirements.txt
55 changes: 55 additions & 0 deletions .devcontainer/devcontainer.json
@@ -0,0 +1,55 @@
{
"name": "R Data Science Environment",
"build": {
"dockerfile": "Dockerfile",
// Update VARIANT to pick a specific R version: 4, 4.1, 4.0
// More info: https://github.com/rocker-org/devcontainer-images/pkgs/container/devcontainer%2Ftidyverse
"args": { "VARIANT": "4" }
},

// Install Dev Container Features. More info: https://containers.dev/features
"features": {
"ghcr.io/rocker-org/devcontainer-features/quarto-cli:1": {},
// Install JupyterLab and IRkernel.
// More info: https://github.com/rocker-org/devcontainer-templates/tree/main/src/r-ver
"ghcr.io/rocker-org/devcontainer-features/r-rig:1": {
"version": "none",
"installJupyterlab": true
}
},

"customizations": {
"vscode": {
"extensions": [
// Add Jupyter and Python vscode extensions
"ms-toolsai.jupyter",
"ms-toolsai.jupyter-renderers",
"ms-python.python",
"ms-python.vscode-pylance",
"vsls-contrib.codetour",
"GitHub.copilot"
]
}
},

// Forward Jupyter and RStudio ports
"forwardPorts": [8787, 8888],
"portsAttributes": {
"8787": {
"label": "Rstudio",
"requireLocalPort": true,
"onAutoForward": "ignore"
},
"8888": {
"label": "Jupyter",
"requireLocalPort": true,
"onAutoForward": "ignore"
}
},

// Use 'postAttachCommand' to run a command each time a tool attaches to the running container.
"postAttachCommand": "sudo rstudio-server start"

// Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root
// "remoteUser": "root"
}
14 changes: 14 additions & 0 deletions .devcontainer/requirements.txt
@@ -0,0 +1,14 @@
pybryt
pylint
datascience
otter-grader
numpy
pandas
scipy
folium>=0.9.1
matplotlib
ipywidgets>=7.0.0
bqplot
nbinteract>=0.0.12
okpy
scikit-learn
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
.Rproj.user
.Rhistory
.RData
.Ruserdata
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2022 David Smith

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
59 changes: 59 additions & 0 deletions README.md
@@ -0,0 +1,59 @@
# Zero-setup R workshops with GitHub Codespaces

This is the repository supporting the presentation "Zero-setup R workshops with GitHub Codespaces".

* Presenter: [David Smith](https://www.linkedin.com/in/dmsmith/), Cloud Advocate at Microsoft
* Presented at: [rstudio::conf, July 28 2022](https://rstudioconf2022.sched.com/event/11iag/zero-setup-r-workshops-with-github-codespaces)
* Presentation slides: [PDF](ZeroSetupWorkshopsRStudioConf2022.pdf)
* Presentation video: [YouTube](https://www.youtube.com/watch?v=2uXLikk30Ew) | [RStudio](https://www.rstudio.com/conference/2022/talks/zero-setup-r-workshops-github/)

You can recreate the demos in the talk using the steps outlined below.

## Dev Containers in GitHub Codespaces

If you have access to GitHub Codespaces, click the green "<> Code" button at the top right of this repository page, and then select "Create codespace on main". (GitHub Codespaces is available with [GitHub Enterprise](https://github.com/enterprise) and [GitHub Education](https://education.github.com/).)

Now, browse to the file [explore-analyze-data-with-R/solution/challenge-Data_Exploration.ipynb](explore-analyze-data-with-R/solution/challenge-Data_Exploration.ipynb). Work through the Jupyter Notebook.

To open RStudio Server, click the Forwarded Ports "Radio" icon at the bottom of the VS Code Online window.

![Forwarded Ports](img/forwarded_ports.png)

In the Ports tab, click the Open in Browser "World" icon that appears when you hover in the "Local Address" column for the Rstudio row.

![Ports](img/ports.png)

This will launch RStudio Server in a new window. Log in with the username and password `rstudio/rstudio`.

* NOTE: Sometimes, the RStudio window may fail to open with a timeout error. If this happens, try again, or restart the Codespace.

In RStudio, use the File menu to open the `/workspaces` folder, and then browse to open the file `devcontainers-rstudio` / `explore-analyze-data-with-R` / `solution` / `all-systems-check` / `test.Rmd`. Use the "Knit" submenu to "Knit to HTML" and view the rendered "R Notebook" Markdown document.

* Note: You may be prompted to install an updated version of the `markdown` package. Select "Yes".

# Resources and Links

* [GitHub Codespaces](https://github.com/features/codespaces) - Available with GitHub Enterprise and GitHub Education
* [Microsoft Workshop Library](https://github.com/microsoft/workshop-library) - The source of the workshop "Explore and analyze data with R" included in this presentation
* [Rocker](https://www.rocker-project.org/) - Containers for R
* [Dev Containers](https://containers.dev/) - Overview and specification
* [Dev Containers in Visual Studio Code](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) - Remote-Containers extension
* [Visual Studio Code](https://code.visualstudio.com/) - Free editor available for Windows, Mac and Linux
* Related talk: [Easy R Tutorials with Dev Containers](https://github.com/revodavid/devcontainers-r). This talk provides information on running Dev Containers in a local environment with Visual Studio Code.

# Thanks to

* えいつぴ (@[eitsupi](https://twitter.com/eitsupi)): For [helpful info on using RStudio in a Rocker container](https://www.rocker-project.org/images/versioned/rstudio)
* Eric Nantz ([R-Podcast](https://r-podcast.org/)): For the episode "[Fully containerized R dev environment with Docker, RStudio, and VS-Code](https://www.youtube.com/watch?v=4wRiPG9LM3o)"

## Image Credits

Images used in presentation slides:
* [File:A frustrated and depressed man holds his head in his hand.jpg - Wikimedia Commons](https://commons.wikimedia.org/wiki/File:A_frustrated_and_depressed_man_holds_his_head_in_his_hand.jpg)
* [File:Confused Felipe.jpg - Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Confused_Felipe.jpg)
* [File:Woman looking depressed.jpg - Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Woman_looking_depressed.jpg)
* [File:Angry woman.jpg - Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Angry_woman.jpg)
* "Bit" artwork by Ashley Willis

# Feedback

If you have any comments or suggestions about this presentation, please leave an issue in this repository.
Binary file added ZeroSetupWorkshopsRStudioConf2022.pdf
Binary file not shown.
13 changes: 13 additions & 0 deletions devcontainers-rstudio.Rproj
@@ -0,0 +1,13 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX
136 changes: 136 additions & 0 deletions explore-analyze-data-with-R/README.md
@@ -0,0 +1,136 @@
# Explore and analyze data with R

## Module Source
[Explore and analyze data with R](https://docs.microsoft.com/en-us/learn/modules/explore-analyze-data-with-r/?WT.mc_id=academic-59300-cacaste)

## Goals

Hello and welcome to this learning adventure! In this folder, you will find a Data Exploration Notebook. This is an autograding guided assessment notebook that will help you test your understanding of using R to explore and analyze data! We hope you will find that R is, at its heart, a beautiful and elegant language for Data Science.

| **Goal** | Description |
| ----------------------------- | -----------------------------------------------|
| **What will you learn** | How to use R to explore and analyze data |
| **What you'll need** | [Visual Studio Code](https://code.visualstudio.com?WT.mc_id=academic-59300-cacaste), [Docker Desktop](https://www.docker.com/products/docker-desktop), [Remote Developer Extension](https://aka.ms/vscode-remote/download/extension) and [Git](https://git-scm.com/downloads) |
| **Duration** | 2 hours |
| **Slides** | [Powerpoint](./slides.pptx) |

## Video

[![workshop walk-through](./images/promo.png)](https://youtu.be/VrVHaxarniY "workshop walk-through")
> 🎥 Click this image to watch Carlotta walk you through the workshop material and to gain some tips about delivering this workshop.

## Pre-Learning

This workshop allows learners to use the skills learnt in the module [Explore and analyze data with R](https://docs.microsoft.com/en-us/learn/modules/explore-analyze-data-with-r/?WT.mc_id=academic-59300-cacaste) to perform data analysis and visualization. As such, learners are encouraged to go through the module beforehand so as to be conversant with some of the concepts covered in this workshop.

## Prerequisites

To get you up and running and writing R code in no time, we have containerized this workshop so that you have a ready-to-use R coding environment out of the box.

### Setting up the development container

A **development container** is a running [Docker](https://www.docker.com) container with a well-defined tool/runtime stack and its prerequisites. You can try out development containers with **[GitHub Codespaces](https://github.com/features/codespaces)**, **[Binder](https://mybinder.org/)** or **[Visual Studio Code Remote - Containers](https://aka.ms/vscode-remote/containers)**.

#### GitHub Codespaces
Follow these steps to open this workshop in a Codespace:
1. Click the Code drop-down menu and select the **Open with Codespaces** option.
2. Select **+ New codespace** at the bottom of the pane.

For more info, check out the [GitHub documentation](https://docs.github.com/en/free-pro-team@latest/github/developing-online-with-codespaces/creating-a-codespace#creating-a-codespace).

#### Binder
This workshop is also available on Binder. To open the notebook in a Binder environment, just click the button below.

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/carlotta94c/workshop-library/workshop-binding?labpath=%2Ffull%2Fexplore-analyze-data-with-R%2Fsolution%2Fchallenge-Data_Exploration.ipynb)

#### Learn Sandbox
You can also work through this challenge in the Learn Sandbox environment provided by [Unit 9](https://docs.microsoft.com/en-us/learn/modules/explore-analyze-data-with-r/9-challenge-data-exploration) of the MS Learn module "Explore and analyze data with R". Just sign in with your Microsoft or GitHub account and click on **Activate sandbox** to start.

#### VS Code Remote - Containers
Follow these steps to open this workshop in a container using the VS Code Remote - Containers extension:

1. If this is your first time using a development container, please ensure your system meets the pre-reqs (i.e. have Docker installed) in the [getting started steps](https://aka.ms/vscode-remote/containers/getting-started).

2. Press <kbd>F1</kbd> and select the **Add Development Container Configuration Files...** command for **Remote-Containers** or **Codespaces**.

> **Note:** If needed, you can drag-and-drop the `.devcontainer` folder from this sub-folder in a locally cloned copy of this repository into the VS Code file explorer instead of using the command.

3. Select this definition. You may also need to select **Show All Definitions...** for it to appear.

4. Finally, press <kbd>F1</kbd> and run **Remote-Containers: Reopen Folder in Container** to start using the definition.

This definition includes some test code that will help you verify it is working as expected on your system. Open the `all-systems-check` folder where you can choose to run the `.R`, `.Rmd` or `.ipynb` scripts. You should see "Hello, remote world!" in an R terminal window (for `.R` and `.Rmd`) or within a Jupyter Notebook (for `.ipynb`) after the respective script executes.

At some point, you may want to make changes to your container, such as installing a new package. You'll need to rebuild your container for your changes to take effect.
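
Before editing the Dockerfile and rebuilding, it can help to check which packages the running container already provides. The following is a minimal sketch in base R; `missing_packages()` is a hypothetical helper written for this illustration, and the package names are only examples.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded
# from the current R library.
missing_packages <- function(pkgs) {
  available <- vapply(
    pkgs,
    function(pkg) requireNamespace(pkg, quietly = TRUE),
    logical(1)
  )
  unname(pkgs[!available])
}

# "stats" ships with R; the second name is deliberately made up.
to_install <- missing_packages(c("stats", "notARealPackage12345"))
print(to_install)
```

Any names reported this way are candidates for the `install2.r` list in the Dockerfile, followed by a container rebuild.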

## What you will learn

Let's say the Department of Transportation is considering the addition of a new airport. As the incredible data scientist you are, you have been requested to explore existing data. The results of your analysis might form the basis of a report or a machine learning model.

In this challenge, you'll explore a real-world dataset containing flights data from the US Department of Transportation.

## Milestone 1: Clean the data

We rarely find data in the right form for analysis. As such, once you’ve imported your data, a good place to start your analysis is by answering the question: "Is the data accurate and appropriate for your desired analysis?" Cleaning data to handle errors, missing values, and other issues pays off in the long run and allows for easier and more accurate Exploratory Data Analysis.

In this section you will:

- Identify any null or missing data, and impute appropriate replacement values.

- Identify and eliminate any outliers in the DepDelay and ArrDelay columns.
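
The two steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the workshop's solution: the toy `flights` values are made up, imputing missing departure delays with 0 is an assumption, and the 1st/90th percentile cut-offs and the `trim_outliers()` helper are hypothetical choices.

```r
# Toy stand-in for the flights data; values are made up for illustration.
flights <- data.frame(
  DepDelay = c(5, NA, -3, 12, 400, 0, 7, NA, 15, -2),
  ArrDelay = c(2, 10, -8, 9, 390, -1, 4, 3, 20, -5)
)

# Count missing values per column.
missing_counts <- colSums(is.na(flights))
print(missing_counts)

# Impute missing departure delays with 0
# (assumption: a missing delay means the flight left on time).
flights$DepDelay[is.na(flights$DepDelay)] <- 0

# Hypothetical helper: keep only rows where every listed column lies
# between its lower and upper percentile.
trim_outliers <- function(df, cols, lower = 0.01, upper = 0.90) {
  keep <- rep(TRUE, nrow(df))
  for (col in cols) {
    bounds <- quantile(df[[col]], probs = c(lower, upper), na.rm = TRUE)
    keep <- keep & df[[col]] >= bounds[1] & df[[col]] <= bounds[2]
  }
  df[keep, , drop = FALSE]
}

flights_clean <- trim_outliers(flights, c("DepDelay", "ArrDelay"))
print(flights_clean)
```

Percentile trimming is only one way to define an outlier; which thresholds are appropriate depends on the distribution you observe in the real data.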

## Milestone 2: Exploratory Data Analysis (EDA)

Exploratory data analysis (EDA) is used by data scientists to analyze and investigate data sets and summarize their main characteristics (e.g., distribution), often by visualizing and transforming data.


In this section you will:

- View summary statistics for the numeric fields in the dataset.

- Determine the distribution of the DepDelay and ArrDelay columns.
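
As a rough sketch of both tasks, the snippet below computes summary statistics for the numeric columns and bins one delay column into a histogram using base R only. The toy values are invented for illustration, and the choice of statistics and number of bins is an assumption.

```r
# Toy stand-in for the cleaned flights data; values are made up.
flights <- data.frame(
  DepDelay = c(5, 0, -3, 12, 40, 0, 7, 0, 15, -2),
  ArrDelay = c(2, 10, -8, 9, 35, -1, 4, 3, 20, -5)
)

# Summary statistics for every numeric column.
numeric_cols <- flights[sapply(flights, is.numeric)]
stats <- sapply(numeric_cols, function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), min = min(x), max = max(x))
})
print(round(stats, 2))

# Distribution of departure delays: compute histogram bins without drawing.
dep_hist <- hist(flights$DepDelay, breaks = 5, plot = FALSE)
print(data.frame(bin_start = head(dep_hist$breaks, -1), count = dep_hist$counts))
```

In an interactive session, dropping `plot = FALSE` draws the histogram instead of only returning the bin counts.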


### More EDA

The goal of EDA is to develop a better understanding of your data. More often than not, EDA will involve formulating some probing questions about your data, searching for answers by visualizing and transforming data and finally using the understanding gained to refine questions, drop the questions entirely and/or generate new questions.

In this section, you will:

- Use statistics, aggregate functions, and visualizations to answer the following questions:

    - What are the average (mean) departure and arrival delays?

    - How do the carriers compare in terms of arrival delay performance?

    - Is there a noticeable difference in arrival delays for different days of the week?

    - Which departure airport has the highest average departure delay?

    - Do late departures tend to result in longer arrival delays than on-time departures?

    - Which route (from origin airport to destination airport) has the most late arrivals?

    - Which route has the highest average arrival delay?
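
A few of the questions above can be approached with grouped aggregation. The sketch below uses base R on invented data: only `DepDelay` and `ArrDelay` are column names taken from this README, while `Carrier`, `Origin`, `Dest`, and the 15-minute threshold for a "late" departure are assumptions made for illustration.

```r
# Toy data; all values and the Carrier/Origin/Dest column names are assumptions.
flights <- data.frame(
  Carrier  = c("AA", "AA", "DL", "DL", "UA", "UA"),
  Origin   = c("SEA", "SEA", "ATL", "ATL", "ORD", "ORD"),
  Dest     = c("ORD", "ATL", "SEA", "ORD", "SEA", "ATL"),
  DepDelay = c(10, 0, 25, 5, -2, 30),
  ArrDelay = c(12, -3, 20, 8, 0, 40)
)

# How do the carriers compare? Mean arrival delay per carrier.
by_carrier <- aggregate(ArrDelay ~ Carrier, data = flights, FUN = mean)
print(by_carrier)

# Which route has the highest average arrival delay?
flights$Route <- paste(flights$Origin, flights$Dest, sep = " > ")
by_route <- aggregate(ArrDelay ~ Route, data = flights, FUN = mean)
worst_route <- by_route$Route[which.max(by_route$ArrDelay)]
print(worst_route)

# Do late departures lead to longer arrival delays?
# Assumption: "late" means a departure delay of more than 15 minutes.
flights$DepLate <- flights$DepDelay > 15
late_vs_on_time <- tapply(flights$ArrDelay, flights$DepLate, mean)
print(late_vs_on_time)
```

The same pattern (group, summarize, compare) extends to day of week or departure airport by swapping the grouping column.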

## Quiz

Test your knowledge with [a short quiz](https://docs.microsoft.com/en-us/learn/modules/explore-analyze-data-with-r/8-knowledge-check)!

## Next steps

Congratulations on finishing this challenge 🏅!

There are other workshops around using R for Data Science. In this workshop, we learnt how to clean data, visualize data and transform data to derive insights and knowledge. The next set of workshops will show you how to [create regression models](../intro-regression-R-tidymodels), [create classification models](../intro-classification-R-tidymodels) and create clustering models (coming soon!). Be sure to check them out!

## Practice

In this workshop, you used already provided questions to guide your EDA. Sometimes this is not the case. Try generating questions of your own and answering them using the data visualization and transformation skills you have acquired in this module. What new insights do you reveal?


## Feedback

Be sure to give [feedback about this workshop](https://forms.office.com/r/MdhJWMZthR)! Happy Learning!

[Code of Conduct](../../CODE_OF_CONDUCT.md)

Binary file added explore-analyze-data-with-R/images/promo.png
Binary file added explore-analyze-data-with-R/slides.pptx
Binary file not shown.
2 changes: 2 additions & 0 deletions explore-analyze-data-with-R/solution/.gitattributes
@@ -0,0 +1,2 @@
# Auto detect text files and perform LF normalization
* text=auto
Binary file not shown.
3 changes: 3 additions & 0 deletions explore-analyze-data-with-R/solution/.vs/ProjectSettings.json
@@ -0,0 +1,3 @@
{
"CurrentProjectSetting": null
}
11 changes: 11 additions & 0 deletions explore-analyze-data-with-R/solution/.vs/VSWorkspaceState.json
@@ -0,0 +1,11 @@
{
"ExpandedNodes": [
"",
"\\.devcontainer",
"\\.devcontainer\\library-scripts",
"\\all-systems-check",
"\\tests"
],
"SelectedNode": "\\challenge-Data_Exploration.ipynb",
"PreviewInSolutionExplorer": false
}
Binary file not shown.
@@ -0,0 +1,7 @@
# Function that prints a welcome message for the given name
say_hello <- function(name) {
  message(paste0(
    "Hello, ", name, " :) In this module, we learn how to Explore ",
    "and Analyze Data with R."
  ))
}

say_hello("remote world")