Reproducible Research: Peer Assessment 1

Loading and preprocessing the data

The following code reads in the data, formats the date column, creates a weekday column and presents summary statistics:

data <- read.csv(file = "activity.csv", stringsAsFactors = FALSE)

data$date <- as.Date(data$date, format = "%Y-%m-%d")

data$week <- weekdays(data$date)

str(data)
## 'data.frame':	17568 obs. of  4 variables:
##  $ steps   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ date    : Date, format: "2012-10-01" "2012-10-01" ...
##  $ interval: int  0 5 10 15 20 25 30 35 40 45 ...
##  $ week    : chr  "Monday" "Monday" "Monday" "Monday" ...
library(pastecs)
## Loading required package: boot
options(scipen = 100)
options(digits = 2)
summ <- stat.desc(data[, -c(2, 4)])
summ
##                  steps    interval
## nbr.val       15264.00    17568.00
## nbr.null      11014.00       61.00
## nbr.na         2304.00        0.00
## min               0.00        0.00
## max             806.00     2355.00
## range           806.00     2355.00
## sum          570608.00 20686320.00
## median            0.00     1177.50
## mean             37.38     1177.50
## SE.mean           0.91        5.22
## CI.mean.0.95      1.78       10.24
## var           12543.00   479491.88
## std.dev         112.00      692.45
## coef.var          3.00        0.59
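
As a sanity check on the summary above, the interval column can be rebuilt from scratch. The sketch below assumes that the interval labels encode clock time as hhmm (0, 5, ..., 55, 100, ..., 2355) and that the data cover 61 consecutive days starting on 2012-10-01. Both are inferences from the `str` and `stat.desc` output rather than facts stated by the dataset, and the names `intervals`, `days` and `grid` are illustrative.

```r
# Interval labels as hhmm: 12 five-minute slots for each of 24 hours.
hours <- 0:23
minutes <- seq(0, 55, by = 5)
intervals <- as.vector(outer(minutes, hours * 100, "+"))  # 288 labels per day

# Assumption: 61 consecutive days starting at the first date shown by str(data).
days <- seq(as.Date("2012-10-01"), by = "day", length.out = 61)

# One row per (day, interval) pair.
grid <- data.frame(
  date     = rep(days, each = length(intervals)),
  interval = rep(intervals, times = length(days))
)

nrow(grid)               # 17568, the number of observations
sum(grid$interval)       # 20686320, the "sum" row for interval
mean(grid$interval)      # 1177.5, the "mean" row for interval
sum(grid$interval == 0)  # 61, the "nbr.null" row for interval
```

The reconstructed grid reproduces the interval statistics reported by `stat.desc`, which is consistent with a complete day-by-interval layout in which only the steps column has gaps.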

What is mean total number of steps taken per day?

We first compute the total steps for each day, aggregating the data for the different intervals within each day.

totalstepsday <- aggregate(data$steps, by = list(data$date), function(x) sum(x, 
    na.rm = TRUE))

Then we present the boxplot and the histogram of the total steps for each day.

nf <- layout(mat = matrix(c(1, 2), 2, 1, byrow = TRUE), heights = c(1, 1.5))
par(mar = c(3, 3, 0.2, 0.2))
boxplot(totalstepsday$x, horizontal = TRUE, outline = TRUE, ylim = c(0, 26000), 
    col = "lightblue", type = 3)
hist(totalstepsday$x, nclass = 20, xlab = "", ylab = "Frequency", col = "lightblue", 
    main = "", xlim = c(0, 26000))

plot of chunk histtotalsteps

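The mean of the daily totals is 9354.23 and the median is 10395. Because sum() with na.rm enabled returns 0 when every value is missing, any day without recorded steps enters these statistics with a total of 0.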
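
The effect of missing values on the daily totals can be seen on a small example. The data frame `toy` below is hypothetical (one fully recorded day and one day with no recorded values), and the `keep_na` variant is an alternative shown for comparison, not the approach used in this report.

```r
# Hypothetical toy data: day 1 is fully recorded, day 2 is entirely missing.
toy <- data.frame(
  date  = as.Date(c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02")),
  steps = c(100L, 50L, NA, NA)
)

# Same pattern as the report: the all-missing day gets a total of 0.
with_zero <- aggregate(toy$steps, by = list(toy$date),
                       function(x) sum(x, na.rm = TRUE))
with_zero$x        # 150 0
mean(with_zero$x)  # 75

# Alternative: keep NA for days with no recorded values, then skip them.
keep_na <- aggregate(toy$steps, by = list(toy$date),
                     function(x) if (all(is.na(x))) NA_integer_ else sum(x, na.rm = TRUE))
mean(keep_na$x, na.rm = TRUE)  # 150
```

Counting empty days as 0 pulls the mean down, so the choice between the two variants should be stated alongside any reported mean.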

What is the average daily activity pattern?

We compute the average number of steps for each interval, across days.

totalinterval <- aggregate(data$steps, by = list(data$interval), function(x) mean(x, 
    na.rm = TRUE))

The plot presents a time series of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis).

plot(y = totalinterval$x, x = totalinterval$Group.1, xlab = "Interval", ylab = "Average number of steps", 
    type = "l")

plot of chunk timeseries

activeinterval <- totalinterval[which(totalinterval$x == max(totalinterval$x)), 
    1]

Averaged across all days in the dataset, the 5-minute interval with the maximum number of steps is interval 835.
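
Interval labels such as 835 are easier to read as clock times. The sketch below assumes the labels encode hhmm, which is suggested by the values 0, 5, ..., 2355 but not stated in the data. `interval_to_clock` is a hypothetical helper, and `toy_avg` is made-up data mimicking the shape of `totalinterval`.

```r
# Hypothetical helper: format an hhmm interval label as "HH:MM".
interval_to_clock <- function(interval) {
  interval <- as.integer(interval)
  sprintf("%02d:%02d", interval %/% 100L, interval %% 100L)
}

interval_to_clock(c(0, 835, 2355))  # "00:00" "08:35" "23:55"

# Made-up averages with the same column names that aggregate() produces.
toy_avg <- data.frame(Group.1 = c(830L, 835L, 840L), x = c(12.5, 40.0, 31.0))

# which.max() returns the position of the first maximum.
peak <- toy_avg$Group.1[which.max(toy_avg$x)]
interval_to_clock(peak)  # "08:35"
```

Under this reading, interval 835 is the five minutes starting at 08:35.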

Imputing missing values

miss <- sum(is.na(data$steps))

The dataset has 17568 observations, of which 2304 have a missing value for steps.

Each missing value is replaced with the median number of steps for the same interval across all days.

imputat <- aggregate(data$steps, by = list(data$interval), function(x) median(x, 
    na.rm = TRUE))

datacomplete <- data[!is.na(data$steps), ]
datamissing <- data[is.na(data$steps), ]

dataimputed <- datamissing
dataimputed$steps <- imputat[match(datamissing$interval, imputat[, 1]), 2]

datacompleteimputed <- rbind(datacomplete, dataimputed)
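
The split-and-rbind approach above places the imputed rows after the complete ones. If the original row order matters, the same median imputation can be done in place. This is a minimal sketch on hypothetical data (`toy`), not the code used for the results in this report.

```r
# Hypothetical toy data: three days, two intervals, two missing values.
toy <- data.frame(
  interval = c(0L, 5L, 0L, 5L, 0L, 5L),
  steps    = c(NA, 10, 4, NA, 6, 30)
)

# Median per interval, ignoring missing values:
# interval 0 -> median(4, 6) = 5, interval 5 -> median(10, 30) = 20.
med <- tapply(toy$steps, toy$interval, median, na.rm = TRUE)

# Fill only the missing positions, looking up each median by interval label.
filled <- toy
idx <- is.na(filled$steps)
filled$steps[idx] <- as.numeric(med[as.character(filled$interval[idx])])

filled$steps  # 5 10 4 20 6 30
```

Daily sums and interval means do not depend on row order, so both approaches give the same aggregates.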

We then compute the total steps for each day, aggregating the data for the different intervals within each day, now considering the dataset including the imputed values.

totalstepsday <- aggregate(datacompleteimputed$steps, by = list(datacompleteimputed$date), 
    function(x) sum(x))

We again present the boxplot and the histogram of the total steps for each day, now based on the imputed dataset.

nf <- layout(mat = matrix(c(1, 2), 2, 1, byrow = TRUE), heights = c(1, 1.5))
par(mar = c(3, 3, 0.2, 0.2))
boxplot(totalstepsday$x, horizontal = TRUE, outline = TRUE, ylim = c(0, 26000), 
    col = "lightblue", type = 3)
hist(totalstepsday$x, nclass = 20, xlab = "", ylab = "Frequency", col = "lightblue", 
    main = "", xlim = c(0, 26000))

plot of chunk histtotalstepsimput

The mean is 9503.87 and the median is 10395. The mean is now higher than in the first analysis, where missing values were dropped from the daily sums, while the median is unchanged.

Are there differences in activity patterns between weekdays and weekends?

We create a variable indicating "Weekend" or "Weekday":

datacompleteimputed$weekend <- ifelse(datacompleteimputed$week %in% c("Saturday", 
    "Sunday"), "Weekend", "Weekday")

We compute the average number of steps for each interval, across weekend days.

totalintervalweekend <- aggregate(datacompleteimputed$steps[datacompleteimputed$weekend == 
    "Weekend"], by = list(datacompleteimputed$interval[datacompleteimputed$weekend == 
    "Weekend"]), function(x) mean(x))

We compute the average number of steps for each interval, across weekdays.

totalintervalweekday <- aggregate(datacompleteimputed$steps[datacompleteimputed$weekend == 
    "Weekday"], by = list(datacompleteimputed$interval[datacompleteimputed$weekend == 
    "Weekday"]), function(x) mean(x))

The plots present the time series of the 5-minute interval (x-axis) and the average number of steps taken (y-axis), averaged separately across weekend days and across weekdays.

tmp <- max(c(totalintervalweekend$x, totalintervalweekday$x)) + 1
nf <- layout(mat = matrix(c(1, 2), 2, 1, byrow = TRUE))
par(mar = c(3, 3, 3, 3))

plot(y = totalintervalweekend$x, x = totalintervalweekend$Group.1, xlab = "Interval", 
    ylab = "Average number of steps", type = "l", main = "Weekends", ylim = c(0, 
        tmp))

plot(y = totalintervalweekday$x, x = totalintervalweekday$Group.1, xlab = "Interval", 
    ylab = "Average number of steps", type = "l", main = "Weekdays", ylim = c(0, 
        tmp))

plot of chunk timeseries2
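
The weekday names returned by `weekdays()` depend on the session locale, so matching on "Saturday" and "Sunday" only works in an English locale. The sketch below shows one locale-independent way to build the same flag and to compute both sets of interval means in a single call. The data frame `toy` and the names `wday` and `by_type` are hypothetical.

```r
# Hypothetical toy data: a Friday, a Saturday and a Sunday, two intervals each.
toy <- data.frame(
  date     = as.Date(rep(c("2012-10-05", "2012-10-06", "2012-10-07"), each = 2)),
  interval = rep(c(0L, 5L), times = 3),
  steps    = c(2, 10, 8, 40, 4, 20)
)

# POSIXlt stores the day of the week as a number: 0 = Sunday, ..., 6 = Saturday.
wday <- as.POSIXlt(toy$date)$wday
toy$weekend <- ifelse(wday %in% c(0, 6), "Weekend", "Weekday")

# One aggregate over both grouping variables replaces the two subset calls.
by_type <- aggregate(steps ~ interval + weekend, data = toy, FUN = mean)
by_type
##   interval weekend steps
## 1        0 Weekday     2
## 2        5 Weekday    10
## 3        0 Weekend     6
## 4        5 Weekend    30
```

The resulting long-format table can be split by the `weekend` column to draw the two panels.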