forked from rdpeng/RepData_PeerAssessment1
PA1_template.Rmd
---
title: "Reproducible Research: Peer Assessment 1"
output:
  html_document:
    keep_md: true
---
## Loading and preprocessing the data
```{r}
library(stringi)
library(lubridate)
library(ggplot2)
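# Extract the data file on first run
if (!file.exists("activity.csv")) unzip("activity.zip")
activity <- read.csv("activity.csv", colClasses = c("integer", "Date", "character"))
# Zero-pad the interval identifier to four digits ("5" becomes "0005") and parse it as hours and minutes
activity$time <- parse_date_time(stri_pad_left(activity$interval, 4, "0"), "hm")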
str(activity)
```
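
The `interval` identifier encodes clock time as an integer of the form HHMM with leading zeros dropped, which is why it is zero-padded before parsing. As a minimal sketch of the same idea in base R, the chunk below converts a few hypothetical identifiers (chosen for illustration, not read from `activity.csv`) to clock labels and to minutes since midnight. The `example_` names are assumptions introduced here.

```{r}
# Hypothetical interval identifiers, not taken from activity.csv
example_interval <- c(0, 5, 55, 100, 1355, 2355)

# Zero-pad to four digits, e.g. 5 becomes "0005"
example_padded <- sprintf("%04d", as.integer(example_interval))

# First two digits are the hour, last two the minute
example_clock <- paste0(substr(example_padded, 1, 2), ":",
                        substr(example_padded, 3, 4))
example_clock

# Minutes since midnight give an evenly spaced axis
example_minutes <- (example_interval %/% 100) * 60 + example_interval %% 100
example_minutes
```

Treating the raw identifier as a plain number would leave a jump between 55 and 100 at every hour boundary, so converting to a time (or to minutes) keeps the 5-minute spacing uniform in plots.
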
## What is the mean total number of steps taken per day?
```{r}
total_steps_per_day <- aggregate(steps ~ date, activity, sum)
hist(total_steps_per_day$steps)
```
### Calculating the mean and median
```{r}
mean(total_steps_per_day$steps)
median(total_steps_per_day$steps)
```
## What is the average daily activity pattern?
```{r}
mean_steps_per_interval <- aggregate(steps ~ time, activity, mean)
```
### Calculating which 5-minute interval has the maximum of the mean steps
```{r}
max_steps <- mean_steps_per_interval[which.max(mean_steps_per_interval$steps), ]
max_label <- paste(round(max_steps$steps, 1),
                   "steps maximum at", format(max_steps$time, "%H:%M"))
max_label
```
### Plotting the mean steps for each 5-minute interval
```{r}
with(mean_steps_per_interval, {
  plot(time, steps, type = "l")
  # Label the peak at its (time, steps) coordinates
  text(max_steps$time, max_steps$steps, max_label, pos = 4)
})
```
## Imputing missing values
### Calculating the total number of rows with missing values
```{r}
steps <- activity$steps
sum(is.na(steps))
```
### Filling in missing values with the mean for the 5-minute interval
```{r}
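# Replace each missing value with the mean for the same 5-minute interval
for (i in which(is.na(steps))) {
  steps[i] <- subset(mean_steps_per_interval, time == activity$time[i])$steps
}
# Recompute the daily totals from the filled-in steps
filled_in_total_steps_per_day <- aggregate(steps ~ date,
    cbind(steps, subset(activity, select = 'date')), sum)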
hist(filled_in_total_steps_per_day$steps)
```
### Mean and median with the missing values filled in
```{r}
mean(filled_in_total_steps_per_day$steps)
median(filled_in_total_steps_per_day$steps)
```
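Filling in missing values with the 5-minute interval means left the mean total steps per day unchanged, but raised the median so that it equals the mean. This is what one would expect if the missing values cover whole days: each such day then receives a total equal to the mean daily total, which leaves the mean where it was and places several identical values in the middle of the distribution.
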
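
A small worked example may make the unchanged mean easier to see. The sketch below uses made-up data (three days, two intervals, the first day entirely missing), not the activity data set, and every `toy` name is hypothetical. It assumes missing values cover whole days.

```{r}
# Made-up data: day 1 is entirely missing
toy <- data.frame(
  date     = rep(as.Date("2012-10-01") + 0:2, each = 2),
  interval = rep(c(0, 5), times = 3),
  steps    = c(NA, NA, 10, 30, 20, 60)
)

# Mean steps per interval over the observed days
toy_interval_mean <- tapply(toy$steps, toy$interval, mean, na.rm = TRUE)

# Replace each NA with the mean for its interval
toy_filled <- toy
toy_missing <- is.na(toy$steps)
toy_filled$steps[toy_missing] <-
  toy_interval_mean[as.character(toy$interval[toy_missing])]

# Daily totals before (the missing day is dropped) and after imputation
toy_before <- aggregate(steps ~ date, toy, sum)$steps
toy_after <- aggregate(steps ~ date, toy_filled, sum)$steps
c(mean_before = mean(toy_before), mean_after = mean(toy_after))
```

The imputed day's total is the sum of the interval means, which is the same as the mean of the observed daily totals, so adding it cannot move the mean.
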
## Are there differences in activity patterns between weekdays and weekends?
### Assigning a factor to each observation identifying weekdays and weekends
```{r}
# Use the day number with Monday = 1 so the result does not depend on the system locale
is_weekend <- wday(activity$date, week_start = 1) > 5
activity$weekpart <- factor(ifelse(is_weekend, "weekend", "weekday"))
```
### Plotting the average daily activity for weekdays and weekends
```{r}
# qplot() is deprecated in recent ggplot2 releases, so build the plot with ggplot()
ggplot(aggregate(steps ~ time + weekpart, activity, mean), aes(time, steps)) +
  geom_line() +
  facet_grid(weekpart ~ .)
```
The weekday and weekend activity patterns have a similar overall shape, but more steps were taken in the morning on weekdays, whereas more steps were taken later in the day on weekends.