The main purpose of this project is to showcase the knowledge and skills acquired in the Getting and Cleaning Data course, offered by Johns Hopkins University on Coursera as part of the Data Science Specialization.
The main contents of this repository, as required in the project instructions, are:
- The tidy dataset.
- The R script that performs the analysis, `run_analysis.R`.
- The codebook that describes the variables, the data, and the data transformations performed.

To use this repository:
- Clone this repository: `git clone https://github.com/jclopeztavera/human-activity.git`.
- Open the R project file.
- Source the `run_analysis.R` file.
- Drop me a line if you find any areas for improvement.

The project involves the following steps:
- Merge the training and the test sets to create one data set.
- Extract only the measurements on the mean and standard deviation for each measurement.
- Use descriptive activity names to name the activities in the data set.
- Appropriately label the data set with descriptive variable names.
- From the data set in the previous step, create a second, independent tidy data set with the average of each variable for each activity and each subject.
- Write the code book.
- Submit the assignment.
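
The analysis steps in the list above can be sketched in R as follows. This is a hypothetical illustration, not the code in `run_analysis.R`: the names `tidy_har` and `read_har`, the underscore-based renaming of variables, the exclusion of `meanFreq()` features, and the standard `UCI HAR Dataset` folder layout are all assumptions.

```r
# Minimal, hypothetical sketch of the assignment steps; NOT the
# repository's run_analysis.R. Function names are illustrative.
#
# tidy_har() expects data frames shaped like the UCI HAR files (assumed):
#   x_*         numeric measurements, one column per feature
#   y_*         one column of activity codes
#   subject_*   one column of subject ids
#   features    feature index and feature name (features.txt)
#   activities  activity code and activity label (activity_labels.txt)
tidy_har <- function(x_train, y_train, subject_train,
                     x_test, y_test, subject_test,
                     features, activities) {
  # Step 1: merge the training and test sets into one data set
  x <- rbind(x_train, x_test)
  activity_code <- c(y_train[[1]], y_test[[1]])
  subject <- c(subject_train[[1]], subject_test[[1]])

  # Step 2: keep only the mean() and std() features (meanFreq() is excluded)
  feature_names <- as.character(features[[2]])
  keep <- grepl("-(mean|std)\\(\\)", feature_names)
  x <- x[, keep, drop = FALSE]

  # Step 3: replace activity codes with descriptive activity labels
  activity <- factor(activity_code,
                     levels = activities[[1]],
                     labels = as.character(activities[[2]]))

  # Step 4: descriptive variable names, e.g. "tBodyAcc-mean()-X" -> "tBodyAcc_mean_X"
  clean_names <- gsub("()", "", feature_names[keep], fixed = TRUE)
  names(x) <- gsub("-", "_", clean_names, fixed = TRUE)
  merged <- data.frame(subject = subject, activity = activity, x)

  # Step 5: average of each variable for each subject and each activity
  tidy <- aggregate(merged[, -(1:2), drop = FALSE],
                    by = list(subject = merged$subject,
                              activity = merged$activity),
                    FUN = mean)
  tidy <- tidy[order(tidy$subject, tidy$activity), ]
  rownames(tidy) <- NULL
  tidy
}

# Hypothetical loader, assuming the standard "UCI HAR Dataset" folder layout
# (train/ and test/ subfolders plus features.txt and activity_labels.txt).
read_har <- function(dir = "UCI HAR Dataset") {
  read <- function(...) read.table(file.path(dir, ...))
  tidy_har(read("train", "X_train.txt"), read("train", "y_train.txt"),
           read("train", "subject_train.txt"),
           read("test", "X_test.txt"), read("test", "y_test.txt"),
           read("test", "subject_test.txt"),
           read("features.txt"), read("activity_labels.txt"))
}
```

Keeping `tidy_har` separate from the file reading makes the transformation testable on small in-memory data frames. With the unzipped dataset in the working directory, `write.table(read_har(), "tidy_data.txt", row.names = FALSE)` would write the averaged data set; the output file name is illustrative.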

The peer-review criteria for the project are:
- The submitted data set is tidy.
- The GitHub repo contains the required scripts.
- The GitHub repo contains a code book that modifies and updates the available code books with the data to indicate all the variables and summaries calculated, along with units and any other relevant information.
- The README that explains the analysis files is clear and understandable.
- The work submitted for this project is the work of the student who submitted it.

Next steps:
- Read the paper behind the data set.
- Explore the data set properly: look at all the variables and understand them.
- Describe the data set in detail.
- Hands-on ML: train, test, and compare classification algorithms (in addition to the paper, see this IPython Notebook by Mark Regan).

Software environment:
- R version 3.3.2 (2016-10-31), "Sincere Pumpkin Patch".
- Platform: x86_64-apple-darwin13.4.0 (64-bit).
- RStudio Desktop 1.0.136, which makes R easier to use.

Acknowledgements:
- Jeff Leek, Roger D. Peng, and Brian Caffo from the Johns Hopkins Bloomberg School of Public Health.

References:
- R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
- Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, and Jorge L. Reyes-Ortiz (2013). A Public Domain Dataset for Human Activity Recognition Using Smartphones. 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2013), Bruges, Belgium, 24-26 April 2013.