Machine Learning Course Project

Mackenzie Wildman
2/26/17

Data

The training and testing data sets are imported using the option na.strings=c('#DIV/0','','NA'), so that #DIV/0, empty, and NA entries are all read as NA values. Next, all columns containing a significant number of NA values are removed.
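The project does this step in R with read.csv's na.strings option. The plain-Python sketch below only illustrates the same idea: map the listed strings to missing values, then drop columns that are mostly missing. The function names, the 0.5 cutoff (a hypothetical stand-in for "a significant number of NA values"), and the tiny sample are all made up for illustration.

```python
import csv
import io

# Strings treated as missing, mirroring na.strings=c('#DIV/0','','NA') in the R import.
NA_STRINGS = {"#DIV/0", "", "NA"}


def read_with_na(text):
    """Parse CSV text into a list of dicts, mapping NA strings to None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (None if value in NA_STRINGS else value) for key, value in row.items()}
        for row in reader
    ]


def drop_mostly_na_columns(rows, max_na_fraction=0.5):
    """Drop every column whose fraction of missing values exceeds max_na_fraction.

    The 0.5 default is a hypothetical cutoff, not the one used in the project.
    """
    if not rows:
        return rows
    n = len(rows)
    keep = [
        col for col in rows[0]
        if sum(row[col] is None for row in rows) / n <= max_na_fraction
    ]
    return [{col: row[col] for col in keep} for row in rows]


# Made-up sample: the kurtosis column is entirely missing after NA conversion.
sample = "roll_belt,kurtosis_roll_belt,classe\n1.41,#DIV/0,A\n1.42,,A\n1.48,NA,B\n"
clean = drop_mostly_na_columns(read_with_na(sample))
print(list(clean[0]))  # ['roll_belt', 'classe']
```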

The data come from: Velloso, E.; Bulling, A.; Gellersen, H.; Ugulino, W.; Fuks, H. Qualitative Activity Recognition of Weight Lifting Exercises. Proceedings of the 4th International Conference in Cooperation with SIGCHI (Augmented Human '13). Stuttgart, Germany: ACM SIGCHI, 2013.

Read more: http://groupware.les.inf.puc-rio.br/har#wle_paper_section#ixzz4Zpw7sGI3

Preprocessing

Based on a boxplot of the classe variable, the outcome variable is not significantly skewed and its standard deviation is low (1.5), so no standardization is applied to classe. Principal Component Analysis shows high correlation among the variables; for example, accel_belt_y and accel_belt_z cluster together, and this clustering is explained by roll_belt. Therefore, Principal Component Analysis is included as a preprocessing step in the model.
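To illustrate why strongly correlated predictors make PCA worthwhile, the sketch below (plain Python; the project itself uses caret in R) measures how much of the total standardized variance the first principal component captures. It works on the correlation matrix, which is equivalent to centering and scaling first, and uses power iteration only to stay dependency-free. The function names and toy data are hypothetical.

```python
import math


def correlation_matrix(data):
    """Pearson correlation matrix of a list of equal-length numeric rows.

    Assumes no column is constant (otherwise the division below fails).
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]


def leading_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric positive semidefinite matrix via power iteration."""
    p = len(matrix)
    vec = [1.0] * p
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the converged unit vector.
    return sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p))


def first_pc_variance_share(data):
    """Fraction of total standardized variance captured by the first principal component."""
    corr = correlation_matrix(data)
    return leading_eigenvalue(corr) / len(corr)  # the trace of a correlation matrix is p


# Toy data: three nearly collinear columns vs. two uncorrelated columns.
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
correlated = [[x, 2 * x + d, 0.5 * x - d] for x, d in zip(range(1, 7), noise)]
orthogonal = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
print(first_pc_variance_share(correlated))  # close to 1
print(round(first_pc_variance_share(orthogonal), 6))  # 0.5
```

When one component carries nearly all the variance, as in the correlated case, the original columns are largely redundant, which is the situation PCA preprocessing is meant to exploit.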

Cross validation

Using the simple holdout method, the training data set is split into two parts: 80% for training and 20% for testing. On the training portion, the accuracy of a tree-based model was compared with that of a multi-class linear model. The tree model was built with the "rpart" method in the caret package, and the linear model with the multinom() function in the nnet package. Omitting various columns was considered for both models. After noticing that the data cluster strongly by the user_name variable, the two methods were combined: the data are first classified by user_name, and a linear model is then applied to each subset of observations. Applying this combined model to the 20% testing split gives 87.23% accuracy, so the out-of-sample error is estimated to be 1 - 0.8723 = 12.77%.
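The holdout procedure and the error arithmetic can be sketched as follows. This is plain Python for illustration only; the project uses caret in R, and the function names and the fixed seed are hypothetical. The split here is a simple random shuffle, not necessarily the partitioning the project used.

```python
import random


def holdout_split(rows, train_fraction=0.8, seed=0):
    """Shuffle rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


train, test = holdout_split(range(100))
print(len(train), len(test))  # 80 20

# Out-of-sample error estimate from the held-out accuracy reported above.
held_out_accuracy = 0.8723
print(f"{1 - held_out_accuracy:.2%}")  # 12.77%
```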

Final model

First, a decision tree splits the observations according to the user_name predictor. Next, a multinomial log-linear model specific to the corresponding user_name is applied. All variables are used as predictors except X, td_timestamp, raw_timestamp_part_1, raw_timestamp_part_2, and new_window. The model is applied to pre-processed data using the preProc="pca" option. Accuracy on the testing data set, pml-testing.csv, is 90%.
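The routing structure of the final model (split by user_name, then apply that user's own classifier) can be sketched as below. This is plain Python, not the project's R code: the per-user classifier is a hypothetical nearest-centroid stand-in for nnet's multinom(), the PCA step is omitted, and the sketch assumes every remaining column is numeric. Class and function names are illustrative only.

```python
import math
from collections import defaultdict

# Columns the final model leaves out, as listed above (user_name is used for routing).
EXCLUDED = {"X", "td_timestamp", "raw_timestamp_part_1", "raw_timestamp_part_2", "new_window"}


class NearestCentroid:
    """Hypothetical stand-in for multinom(): predicts the class with the closest mean vector."""

    def fit(self, xs, ys):
        sums, counts = {}, defaultdict(int)
        for x, y in zip(xs, ys):
            if y not in sums:
                sums[y] = [0.0] * len(x)
            sums[y] = [s + v for s, v in zip(sums[y], x)]
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda y: math.dist(x, self.centroids[y]))


class PerUserModel:
    """Route each observation by user_name, then apply that user's own classifier."""

    def __init__(self, outcome="classe", group="user_name"):
        self.outcome, self.group = outcome, group

    def _features(self, row):
        # Sorted keys keep the feature order identical for training and prediction rows.
        skip = EXCLUDED | {self.outcome, self.group}
        return [float(row[k]) for k in sorted(row) if k not in skip]

    def fit(self, rows):
        by_user = defaultdict(list)
        for row in rows:
            by_user[row[self.group]].append(row)
        self.models = {
            user: NearestCentroid().fit(
                [self._features(r) for r in subset], [r[self.outcome] for r in subset]
            )
            for user, subset in by_user.items()
        }
        return self

    def predict(self, rows):
        return [self.models[r[self.group]].predict_one(self._features(r)) for r in rows]


# Made-up rows: the same roll_belt value means different classes for different users.
rows = [
    {"user_name": "a", "X": "1", "roll_belt": "1.0", "classe": "A"},
    {"user_name": "a", "X": "2", "roll_belt": "9.0", "classe": "B"},
    {"user_name": "b", "X": "3", "roll_belt": "100.0", "classe": "A"},
    {"user_name": "b", "X": "4", "roll_belt": "200.0", "classe": "B"},
]
model = PerUserModel().fit(rows)
new_rows = [{"user_name": "a", "X": "5", "roll_belt": "2.0"},
            {"user_name": "b", "X": "6", "roll_belt": "190.0"}]
print(model.predict(new_rows))  # ['A', 'B']
```

In the toy data a single global model would struggle, because a roll_belt of 100 is class A for one user but far above every class B value of the other; fitting one model per user sidesteps that, which mirrors the motivation given for grouping by user_name.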
