Project of Getting and Cleaning Data of Johns Hopkins Univ. on Coursera
This repo contains a single script, run_analysis.R. The script merges two data sets and creates a new tidy one. It works as follows:
1. The script first loads the three train data files with `read.table()`. It also loads features.txt to use as the column names of the "x_train" data set.
2. The column names of the "x_train" data set are replaced with the names loaded from features.txt. The first two columns are named "Activity_Label" and "Subject".
3. Use `cbind()` to combine the three data sets into a complete train data set.
4. Repeat steps 1-3 for the test data files to build a complete test data set.
5. Use `rbind()` to combine the train and test data sets. This gives one data set (`data1`) that meets the first requirement of the project.
6. Use `grep()` to retrieve the `data1` columns whose names contain "mean()" or "std()". The mean and standard deviation of each measurement are kept in `data2`, along with the subjects and activity labels.
7. Load activity_labels.txt as `actnames` and merge it with `data2` by activity label. This step translates activity labels into activity names and yields a new clean data set, `data4`.
8. Use `aggregate()` to calculate the average of each variable for each activity and each subject. The result is assigned to `result`. This is the tidy data set required by the project.
9. The script writes the result data set to a text file named `DataSet.txt`, which is included in this repo as `Result.txt`.
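
The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of run_analysis.R: it reads tiny made-up tables through `read.table(text = ...)` instead of the real files, the helper `read_set()` is hypothetical, and the column order in `cbind()`, the exact `grep()` pattern, and the temporary output path are assumptions.

```r
# Toy stand-ins for features.txt and activity_labels.txt (made-up rows).
features <- read.table(
  text = "1 tBodyAcc-mean()-X\n2 tBodyAcc-std()-X\n3 tBodyAcc-meanFreq()-X",
  stringsAsFactors = FALSE
)
actnames <- read.table(
  text = "1 WALKING\n2 SITTING",
  col.names = c("Activity_Label", "Activity"),
  stringsAsFactors = FALSE
)

# Hypothetical helper covering steps 1-3 for one set (train or test).
read_set <- function(x_txt, y_txt, subject_txt) {
  x <- read.table(text = x_txt)              # measurements
  y <- read.table(text = y_txt)              # activity labels
  subject <- read.table(text = subject_txt)  # subject ids
  names(x) <- features$V2                    # feature names as column names
  names(y) <- "Activity_Label"
  names(subject) <- "Subject"
  cbind(y, subject, x)                       # assumed order: label, subject, data
}

train <- read_set("0.1 1.0 5\n0.3 2.0 6\n0.5 3.0 7", "1\n1\n2", "1\n1\n1")
test  <- read_set("0.2 4.0 8\n0.4 6.0 9", "1\n1", "2\n2")

# Step 5: stack train and test rows.
data1 <- rbind(train, test)

# Step 6: keep only columns whose names contain the literal "mean()" or "std()".
keep  <- grep("mean\\(\\)|std\\(\\)", names(data1))
data2 <- data1[, c(1, 2, keep)]

# Step 7: translate activity labels into activity names.
data4 <- merge(actnames, data2, by = "Activity_Label")

# Step 8: average every measurement per activity and subject.
measures <- setdiff(names(data4), c("Activity_Label", "Activity", "Subject"))
result <- aggregate(
  data4[measures],
  by = list(Activity = data4$Activity, Subject = data4$Subject),
  FUN = mean
)

# Step 9: write the tidy result (to a temporary folder in this sketch).
out_file <- file.path(tempdir(), "DataSet.txt")
write.table(result, out_file, row.names = FALSE)
```

Escaping the parentheses in the pattern matters here: an unescaped `"mean()"` is treated as a regular expression and would also match columns such as `meanFreq()`.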
- Unzip the data file from the project; you should get a folder named UCI HAR Dataset. Put it in your R default working directory together with run_analysis.R.
- This R script was developed in R v3.0.2 on Mac OS X 10.9.2. It should work regardless of your operating system.
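
Before running the script, it may help to confirm that the setup described above is in place. The check below is only a sketch; `ready_to_run()` is a hypothetical helper and is not part of run_analysis.R.

```r
# Hypothetical helper: TRUE when both the data folder and the script are in `dir`.
ready_to_run <- function(dir = getwd()) {
  file.exists(file.path(dir, "UCI HAR Dataset")) &&
    file.exists(file.path(dir, "run_analysis.R"))
}

if (ready_to_run()) {
  source("run_analysis.R")
} else {
  message("Place 'UCI HAR Dataset' and run_analysis.R in ", getwd(), " first.")
}
```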