#### 1. Basic R Use ####
## use R as a calculator
3 + 4 # addition
2 * 6 # multiplication
4 / 2 # division
2^3 # powers
## create a vector
c(1, 5, 4)
## assign results to an object
## note that after it is assigned, it shows up in the RStudio "Environment" pane
x <- c(1, 3, 5)
## view an existing object
x
## round data (helpful for reporting)
## the first argument is the number to round
## the second argument is how many digits to use for rounding
round(1.214294254, digits = 2)
#### 2. Descriptive Statistics ####
## calculate the mean
## note that we reuse the previously assigned variable, x
mean(x)
## calculate the median
median(x)
## calculate the standard deviation
sd(x)
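## a minimal sketch (not part of the original worksheet) of what sd() computes:
## the square root of the sum of squared deviations from the mean,
## divided by n - 1
## "x_demo" is a hypothetical copy of x so this block runs on its own
x_demo <- c(1, 3, 5)
deviations <- x_demo - mean(x_demo) # distance of each value from the mean
sqrt(sum(deviations^2) / (length(x_demo) - 1)) # same result as sd(x_demo)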
## minimum
min(x)
## maximum
max(x)
## if there are missing values, R returns NA instead of a result
## to see this, let's first create an object with some missing data
## we will call it "y"
## NA stands for Not Available, i.e., missing
## If we want the vector to have the numbers 1, 3, missing, then 7,
## what would we fill in the ??? with here?
y <- ???
## calculate mean on y
mean(y)
## Why didn't that work? We need to tell R
## to remove missing values first
## (na for not available; rm for remove)
## by adding an argument, na.rm = TRUE
mean(y, na.rm = TRUE)
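## a short sketch of how to check for missing values before summarising
## "z_demo" and its values are made up for illustration
z_demo <- c(2, NA, 6, NA, 10)
is.na(z_demo) # TRUE wherever a value is missing
sum(is.na(z_demo)) # how many values are missing
mean(z_demo, na.rm = TRUE) # mean of the non-missing values 2, 6, 10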
#### 2b. You Try It ####
## find the mean of these numbers: 5, 3, 2, 9, 1
???
## find the standard deviation of the variable "y"
???
#### 3. Using a Dataset ####
## R has a built in dataset called "mtcars"
## this dataset has 11 variables on 32 different cars
## view the dataset
View(mtcars)
## one of the variables in the dataset is how many
## miles per gallon of petrol each car gets
## this variable is called "mpg"
## to access the variable from within the mtcars dataset
## we use the "$" operator
## the code below accesses and prints all the observations
## from the mpg variable
mtcars$mpg
## note what happens if the case is wrong
## because a variable called "Mpg" does not exist
## R returns NULL, indicating no data
mtcars$Mpg
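## one way to guard against misspelt names is to check them first
## (a sketch, not part of the original worksheet; names() and %in% are base R)
names(mtcars) # lists the exact variable names
"mpg" %in% names(mtcars) # TRUE: the variable exists
"Mpg" %in% names(mtcars) # FALSE: wrong case
is.null(mtcars$Mpg) # TRUE: confirms that $ returned NULL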
## it is also possible to round an entire set of numbers
round(mtcars$mpg, digits = 0)
#### 3b. You Try It ####
## calculate descriptive statistics (mean, standard deviation) of the
## variable mpg
???
## How do you get a summary of a whole dataset? Use Google or the ? help function
???
#### 4. Loading Data ####
## to start with, we will load a package for data management.
## Loading a package/library is like opening an app
## and you need to repeat this process each time you start up R
## note that if this does not work, try to install it first
## by uncommenting the install packages code
# install.packages("data.table", dependencies = TRUE)
library(data.table)
## import CSV - first insert your working directory instead of mine
## Note: if you use an Rproject, your directory may already be set
## if not, an easy way to get the path to the directory (folder)
## is to navigate to it in the RStudio "Files" tab,
## then click on the "Gear" icon and choose "Set As Working Directory"
## this will run some setwd() code in your console,
## which you can copy and paste here
setwd("C:/Users/michelle/Documents/git_repos/MonashHonoursStatistics")
d <- read.csv("IntroR_sample.csv")
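## if read.csv() fails, the file is usually not in the working directory
## a sketch of how to check, using a temporary file so it runs anywhere
## the file and its values are made up; only the column names
## (Depressed, zStress) come from this worksheet
demo_path <- tempfile(fileext = ".csv")
write.csv(data.frame(Depressed = c(0, 1, 1), zStress = c(-0.5, 1.2, 0.3)),
          demo_path, row.names = FALSE)
getwd() # shows the current working directory
file.exists(demo_path) # TRUE if R can find the file at that path
d_demo <- read.csv(demo_path)
head(d_demo) # first rows of the imported data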
# (When you have more time, also install package "tidyverse"
# - it takes a while)
## YOU TRY IT: get a summary of the data
???
#### 5. Logical Operators ####
## "==" : logical test if Depressed is equal to 1
d$Depressed == 1
## ">" : logical test whether zStress is greater than 0
d$zStress > 0
## "|" : logical "or"; test whether either condition is TRUE
## depressed or high stress
d$Depressed == 1 | d$zStress > 1
## "&" : logical "and"; test whether both conditions are TRUE
## depressed and low stress
d$Depressed == 1 & d$zStress < 1
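## a small sketch of how | and & combine tests element by element
## the vectors "dep_demo" and "stress_demo" are made up for illustration
dep_demo <- c(1, 0, 1, 0)
stress_demo <- c(1.5, 2.0, -0.3, 0.2)
dep_demo == 1 | stress_demo > 1 # TRUE TRUE TRUE FALSE
dep_demo == 1 & stress_demo < 1 # FALSE FALSE TRUE FALSE
## TRUE counts as 1, so sum() counts how many cases meet a condition
sum(dep_demo == 1 & stress_demo < 1) # 1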
## we can use square brackets, [], to subset a variable or dataset
## we can subset by number or by logical value.
## here are all the values for zStress
d$zStress
## here are just the first and third values
d$zStress[c(1, 3)]
## YOU TRY IT: What is the 10th value?
???
## here are just the values of zStress where Depressed == 1
d$zStress[d$Depressed == 1]
## here are just the values of zStress where zStress > 1
d$zStress[d$zStress > 1]
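## the same logical tests can subset whole rows of a dataset
## a sketch with a made-up data frame "sub_demo"; the syntax is data[rows, columns]
sub_demo <- data.frame(Depressed = c(1, 0, 1, 0),
                       zStress = c(1.5, 2.0, -0.3, 0.2))
sub_demo[sub_demo$Depressed == 1, ] # all columns, only rows where Depressed == 1
sub_demo[sub_demo$zStress > 1, "Depressed"] # Depressed values where zStress > 1
which(sub_demo$zStress > 1) # row numbers that pass the test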