Getting-Data-Course-Project

Course project for "Getting and Cleaning Data" on Coursera, September 2014.

The run_analysis.R script accomplishes the goals set forth in the course project description. Specifically, the script:

1. Loads and combines the data sets.
2. Extracts measurements on the mean and standard deviation for each measurement.
3. Adds descriptive activity names to the data set.
4. Appropriately labels the data set with descriptive variable names.
5. Creates a new tidy data set with the average of each variable for each activity and subject.

Comments throughout the script indicate which goal is accomplished by each snippet of code. The exception is Part 4, which is accomplished at several stages throughout the script.

The script begins by loading the "dplyr" package.

The script expects the required .txt files to be in a subdirectory of the working directory named "./UCI HAR Dataset". The structure of this subdirectory should match the data files as downloaded from the original source.
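
Before running the script, it can help to confirm that the files are where it expects them. The following is a minimal sketch, not part of run_analysis.R; the file names are assumed from the standard layout of the downloaded archive, and `dataDir`, `expectedFiles`, and `missingFiles` are names chosen here for illustration.

```r
# Assumed location of the unzipped archive, relative to the working directory.
dataDir <- file.path(".", "UCI HAR Dataset")

# Files assumed from the standard archive layout: the feature list plus the
# subject, X, and y files for both the test and train sets.
expectedFiles <- c(
  file.path(dataDir, "features.txt"),
  file.path(dataDir, "test",  c("subject_test.txt",  "X_test.txt",  "y_test.txt")),
  file.path(dataDir, "train", c("subject_train.txt", "X_train.txt", "y_train.txt"))
)

# Report anything that is missing instead of failing later inside read.table().
missingFiles <- expectedFiles[!file.exists(expectedFiles)]
if (length(missingFiles) > 0) {
  message("Missing files:\n", paste(missingFiles, collapse = "\n"))
}
```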

Next, the script creates a table based on the "features.txt" file to collect the variable names. It then creates a vector (named "labelsVector") containing the variable names.
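
As a rough illustration of this step, the sketch below reads a small stand-in for "features.txt" and pulls the name column into a vector. The temporary file and its three rows are made up for the example, and the exact read.table() arguments used in the script may differ.

```r
# Toy stand-in for features.txt: one index column and one name column.
featuresFile <- tempfile(fileext = ".txt")
writeLines(
  c("1 tBodyAcc-mean()-X", "2 tBodyAcc-std()-X", "3 tBodyAcc-mad()-X"),
  featuresFile
)

# Read the table; the second column (V2) holds the variable names.
features <- read.table(featuresFile, stringsAsFactors = FALSE)

# Vector of variable names, later used to rename the measurement columns.
labelsVector <- features[[2]]
```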

The next two snippets of code load and combine the test and train data sets. Each snippet loads the subject, X, and Y files and then combines all three into a data.frame. Lastly, each snippet renames the variables in the newly combined data.frame. The last line of code from this section combines the test and train data sets into a single data set, called "Data".
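
The load-and-combine pattern described above might look roughly like the following. This is a sketch on in-memory toy data rather than the real files; the helper `combineSet()`, the column order, and the toy values are assumptions made for the example.

```r
# Two toy variable names standing in for the full labelsVector.
labelsVector <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X")

# Hypothetical helper: bind subject, activity (y), and measurements (X)
# column-wise, then rename the columns of the combined data.frame.
combineSet <- function(subject, y, x, labels) {
  combined <- cbind(subject, y, x)
  names(combined) <- c("Subject_ID", "Activity", labels)
  combined
}

# Toy stand-ins for the tables read from the test and train directories.
testData <- combineSet(
  subject = data.frame(V1 = c(2, 4)),
  y       = data.frame(V1 = c(5, 1)),
  x       = data.frame(V1 = c(0.1, 0.2), V2 = c(0.3, 0.4)),
  labels  = labelsVector
)
trainData <- combineSet(
  subject = data.frame(V1 = c(1, 3)),
  y       = data.frame(V1 = c(6, 2)),
  x       = data.frame(V1 = c(0.5, 0.6), V2 = c(0.7, 0.8)),
  labels  = labelsVector
)

# Stack the two sets row-wise into a single data set.
Data <- rbind(testData, trainData)
```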

Part 2 of the script extracts measurements on the mean and standard deviation for each measurement.

A select() function is used to select all variables containing the phrase "mean()", which are stored to "meanVariables".
A select() function is used to select all variables containing the phrase "std()", which are stored to "stdVariables".
A select() function is used to select the Subject_ID and Activity columns, which are stored to "IDandActivity".
"Data" is then rebuilt with chained cbind() calls so that the data set includes only the required variables.
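
The selection logic can be sketched as follows. The script itself uses dplyr's select(); to keep this example free of package dependencies it substitutes base R's grepl() with fixed = TRUE, which performs the same kind of literal substring match. The toy columns are made up for illustration. Note that matching the literal phrase "mean()" leaves out names such as "meanFreq()".

```r
# Toy data set with original-style column names (check.names = FALSE keeps
# the parentheses and dashes intact).
Data <- data.frame(
  Subject_ID = c(1, 2),
  Activity = c(5, 1),
  `tBodyAcc-mean()-X` = c(0.1, 0.2),
  `tBodyAcc-std()-X` = c(0.3, 0.4),
  `tBodyAcc-mad()-X` = c(0.5, 0.6),
  `fBodyAcc-meanFreq()-X` = c(0.7, 0.8),
  check.names = FALSE
)

# Base R stand-in for select(): keep columns whose names contain the
# literal text "mean()" or "std()".
meanVariables <- Data[, grepl("mean()", names(Data), fixed = TRUE), drop = FALSE]
stdVariables  <- Data[, grepl("std()",  names(Data), fixed = TRUE), drop = FALSE]
IDandActivity <- Data[, c("Subject_ID", "Activity")]

# Rebuild Data from the three pieces with chained cbind() calls.
Data <- cbind(cbind(IDandActivity, meanVariables), stdVariables)
```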

Part 3 of the script uses a series of sub() commands to substitute the activity names for the coded numbers in the original data set.
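
A minimal sketch of the substitution step is shown below. The code-to-name mapping is taken from the dataset's activity_labels.txt; the loop and the anchored patterns are choices made for this example, and the script may simply call sub() once per activity.

```r
# Toy activity codes as they appear in the y files.
Activity <- c(5, 1, 6, 3)

# Activity names in code order (1 to 6), as listed in activity_labels.txt.
activityNames <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                   "SITTING", "STANDING", "LAYING")

# Replace each code with its name. sub() coerces the numeric vector to
# character; the ^...$ anchors make each pattern match the whole value only.
for (code in seq_along(activityNames)) {
  Activity <- sub(paste0("^", code, "$"), activityNames[code], Activity)
}
```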

Part 5 of the script creates the new tidy data set using an aggregate() command. This section also corrects the variable names of the newly created "tidyDataSet" data.frame.
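
The averaging step can be sketched with aggregate() on toy data, as below. The values are made up, and the exact call in the script is not shown in this README. When the `by` list is unnamed, aggregate() labels the grouping columns Group.1 and Group.2, which may be why the script corrects the variable names afterward; naming the list entries, as done here, avoids that.

```r
# Toy data: subject 1 has two WALKING rows and one SITTING row.
Data <- data.frame(
  Subject_ID = c(1, 1, 1, 2),
  Activity = c("WALKING", "WALKING", "SITTING", "WALKING"),
  `tBodyAcc-mean()-X` = c(0.2, 0.4, 0.5, 0.1),
  check.names = FALSE
)

# Average every measurement column for each subject/activity pair.
tidyDataSet <- aggregate(
  Data[, 3:ncol(Data), drop = FALSE],
  by = list(Subject_ID = Data$Subject_ID, Activity = Data$Activity),
  FUN = mean
)
```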

Lastly, the script uses write.table() to write the new tidy data set to the working directory. The output is named "Variable-Avg-by-Activity-and-Subject.txt".
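
For illustration, the output step might look like the sketch below. It writes to a temporary directory rather than the working directory so that running it leaves no files behind, and `row.names = FALSE` is an assumption rather than something this README states. The toy `tidyDataSet` and the `outFile` and `readBack` names are introduced only for the example.

```r
# Toy tidy data set standing in for the real result.
tidyDataSet <- data.frame(
  Subject_ID = c(1, 2),
  Activity = c("WALKING", "SITTING"),
  `tBodyAcc-mean()-X` = c(0.3, 0.5),
  check.names = FALSE
)

# Write to a temporary location; the script writes to the working directory.
outFile <- file.path(tempdir(), "Variable-Avg-by-Activity-and-Subject.txt")
write.table(tidyDataSet, file = outFile, row.names = FALSE)

# Reading the file back: header = TRUE restores the column names, and
# check.names = FALSE keeps the parentheses and dashes in them.
readBack <- read.table(outFile, header = TRUE, check.names = FALSE)
```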
