Getting-and-Cleaning-Data

Course project

You should create one R script called run_analysis.R that does the following.

The script first downloads and unzips the data files, then saves the current working directory and switches to the extracted folder.

download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip", "dataset.zip")
unzip("dataset.zip")
oldwd <- getwd()
setwd(file.path(oldwd, "UCI HAR Dataset"))
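As written, the download is repeated on every run. A minimal sketch of one way to avoid that, assuming a hypothetical helper named `fetch_once` that is not part of the original script, is to download only when the zip file is missing:

```r
# Hypothetical helper (not in the original script): download `url` to `dest`
# only if `dest` does not already exist, then return the path invisibly.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" requests a binary transfer, which matters for zip files on Windows
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}
```

It could be called with the dataset URL and "dataset.zip" in place of the direct download.file call, followed by unzip("dataset.zip") as before.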

1. Merges the training and the test sets to create one data set.

The test and training measurement tables are read and then stacked row-wise with rbind, test rows first.

test <- read.table(file.path(".", "test", "X_test.txt"))
train <- read.table(file.path(".", "train", "X_train.txt"))
data <- rbind(test, train)
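The effect of rbind can be seen on a toy example. The sketch below uses made-up values in place of X_test.txt and X_train.txt, and the name `merged` is only illustrative. Because the test rows come first, the activity and subject files later in the script have to be stacked in the same order.

```r
# Toy stand-ins for X_test.txt and X_train.txt (values are made up)
test  <- read.table(text = "0.1 0.2 0.3\n0.4 0.5 0.6")
train <- read.table(text = "0.7 0.8 0.9")

# rbind stacks rows; both tables must have the same columns
merged <- rbind(test, train)

dim(merged)   # 3 rows, 3 columns
```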

2. Extracts only the measurements on the mean and standard deviation for each measurement.

4. Appropriately labels the data set with descriptive variable names.

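Steps 2 and 4 are handled together. The variable names are read from features.txt and assigned to the columns first, so a single pattern match on the names is enough to select the desired variables. The grepl function returns a logical vector marking the names that contain "mean" or "std", and that vector is used to subset the columns. Note that this pattern also matches the meanFreq() variables.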

labels <- read.table("features.txt")
labels <- labels[,2]
names(data) <- labels
data <- data[, grepl("mean|std", labels)]
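To see what the filter keeps, here is a small sketch on a handful of names written in the style of features.txt. The toy data frame and the stricter pattern are assumptions added for illustration; the original script uses only "mean|std".

```r
# A few feature names in the style of features.txt
labels <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
            "tBodyAcc-max()-X", "fBodyAcc-meanFreq()-X")

# One row of toy values, named with the labels
toy <- as.data.frame(matrix(1:4, nrow = 1))
names(toy) <- labels

# grepl returns one TRUE/FALSE per label
keep <- grepl("mean|std", labels)

# A stricter alternative (not what the original script uses):
# require a literal "mean()" or "std()", which drops meanFreq()
strict <- grepl("mean\\(\\)|std\\(\\)", labels)

names(toy[, keep])     # mean, std and meanFreq columns
names(toy[, strict])   # mean and std columns only
```

Which pattern is appropriate depends on whether the mean frequency variables count as "measurements on the mean" for the assignment.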

3. Uses descriptive activity names to name the activities in the data set.

The files that identify which activity each row belongs to are read and combined into a single table, in the same order as the measurements (test, then train). The dplyr library is loaded so the mutate function can be used. For each row, the activity code is used to look up the activity name in activity_labels.txt. These names are then added to the dataset as its first column.

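# Read the activity codes in the same order as the measurements (test, then train)
testActivities <- read.table(file.path(".", "test", "y_test.txt"))
trainActivities <- read.table(file.path(".", "train", "y_train.txt"))
activities <- rbind(testActivities, trainActivities)
library(dplyr)
activities <- as_tibble(activities)
names(activities) <- "code"
# Column 2 of activity_labels.txt holds the activity names, ordered by code
activityLabels <- read.table("activity_labels.txt")
activityLabels <- activityLabels[,2]
# Look up each row's activity name by position and prepend it to the data
activities <- mutate(activities, Activity = activityLabels[code])
data <- cbind(activities[,2], data)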
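The lookup relies on indexing a vector by position. A minimal sketch with a toy three-name label vector and made-up codes (both are illustrative, not the real file contents):

```r
activityLabels <- c("WALKING", "SITTING", "LAYING")  # toy names for codes 1, 2, 3
code <- c(3, 1, 1, 2)                                # toy activity codes, one per row

# Indexing by position: code 3 picks the third label, code 1 the first, and so on
Activity <- activityLabels[code]
Activity
```

This works only because the codes in the label file run from 1 upward in order, so the code doubles as the position in the vector.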

5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

The files that identify the subjects are read and added to the dataset as a Subject column. The resulting table is then grouped by subject and activity.

testSubjects <- read.table(file.path(".", "test", "subject_test.txt"))
trainSubjects <- read.table(file.path(".", "train", "subject_train.txt"))
subjects <- rbind(testSubjects, trainSubjects)
names(subjects) <- "Subject"
data2 <- cbind(subjects, data)
data2 <- as_tibble(data2)
data2 <- group_by(data2, Subject, Activity)

Finally, the working directory is restored and both datasets are written to files. The summarise_all function applies the mean function to every variable within the previously defined groups.

setwd(oldwd)
write.table(data, "tidydataset.txt", row.names = FALSE)
# funs() is deprecated in recent dplyr versions; the function can be passed directly
write.table(summarise_all(data2, mean), "tidydataset2.txt", row.names = FALSE)
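The grouping and averaging step can be illustrated on a toy table. The sketch below assumes dplyr is installed and uses made-up values and a single measurement column `x`; the real dataset has many such columns, all of which are averaged the same way.

```r
library(dplyr)

# Toy data: two subjects, two activities, one measurement column (values made up)
toy <- tibble(
  Subject  = c(1, 1, 1, 2, 2),
  Activity = c("WALKING", "WALKING", "SITTING", "WALKING", "WALKING"),
  x        = c(1, 3, 5, 2, 4)
)

# One output row per Subject/Activity pair; every other column is averaged
averages <- toy %>%
  group_by(Subject, Activity) %>%
  summarise_all(mean)

averages
```

Each Subject/Activity combination that occurs in the data produces exactly one row, which is what makes the second dataset tidy.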
