---
title: "MakingAndBreakingFunctions"
author: "Ian Dworkin"
date: "`r format(Sys.time(),'%d %b %Y')`"
output:
  html_document:
    toc: yes
    number_sections: yes
    keep_md: yes
editor_options:
  chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(digits = 2)
```
## Why write a function in R?
- We have now used a few of R's built-in functions (there are many).
- Anything you call with parentheses, such as `mean(x)`, is a function.
- We have also written custom functions, as shown in class and in your assignment.
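
To see that the parentheses really do signal a function call, here is a small sketch: functions are ordinary R objects that can be inspected, and even arithmetic operators are functions that can be called with `()` when the operator is wrapped in backticks.

```{r}
# Functions are ordinary objects that we can inspect
is.function(mean)   # TRUE
class(sum)          # "function"

# Even operators are functions: these two lines give the same result (5)
2 + 3
`+`(2, 3)
```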
## Why write your own functions
If you write good functions and scripts, people will want to use them and adapt them for their own purposes. However, they (and even you, six months later) won't always know what a function is supposed to do or not do. So it is useful to build robust functions that produce informative warnings and errors, and to consider writing unit tests.

For a review, see [here](https://r4ds.had.co.nz/functions.html#when-should-you-write-a-function) and [here](https://r4ds.had.co.nz/functions.html).
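
As a minimal sketch of what a unit test can look like in base R, assuming a hypothetical helper `center()` written only for this illustration: `stopifnot()` throws an error as soon as any expectation is not `TRUE`. Packages such as testthat (with `test_that()` and `expect_equal()`) provide a fuller framework for the same idea.

```{r}
# Hypothetical helper, defined only so we have something to test
center <- function(x) {
  x - mean(x)
}

# Each expression must be TRUE, otherwise stopifnot() raises an error
stopifnot(
  isTRUE(all.equal(center(c(1, 2, 3)), c(-1, 0, 1))),  # known answer
  length(center(1:10)) == 10,                          # output length matches input
  abs(mean(center(c(2.5, 7, 11)))) < 1e-12             # centred values average to ~0
)
```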
## When should you write a function
Have you ever copied and pasted a block of code more than a couple of times? If so, write a function! Indeed, this rule of thumb is called the [rule of three](https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)). In addition to saving time for repeated use, it reduces the risk of introducing errors into your code.
Thankfully it is straightforward to write our own functions in R. You should definitely get in the habit of doing so whenever you are going to need to do something more than once. You can collect all of these functions in a script and use `source()` to read them in whenever you need them. No copying and pasting functions in lots of different places!
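
A minimal sketch of the `source()` workflow, assuming a throwaway temporary file in place of the script you would normally keep with your project (e.g. a hypothetical `my_functions.R`); the function name `StdErr_src` is likewise just for illustration.

```{r}
# Write a small function definition into a temporary .R script
fun_file <- tempfile(fileext = ".R")
writeLines(c(
  "StdErr_src <- function(dat_in) {",
  "  sd(dat_in) / sqrt(length(dat_in))",
  "}"
), fun_file)

# source() runs the script, so StdErr_src() now exists in our session
source(fun_file)
StdErr_src(c(2, 4, 4, 4, 5, 5, 7, 9))
```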
## Review: functions have the following format
```{r, eval = FALSE}
aFunction <- function(input1, input2, another_argument) {
  # expressions to calculate, using the inputs
  # the value of the last expression evaluated is returned
}
```
This is abstract, so let's look at a real example.
We want to compute the standard error of the mean (SEM), which we estimate as the sample standard deviation divided by the square root of the sample size. How might we do it?
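
Written as a formula, with $s$ the sample standard deviation (what `sd()` returns, using $n - 1$ in its denominator) and $n$ the sample size:

$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$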
## standard error of the mean
```{r}
a <- c(1,2,3,5,7,3,2,5,2) # our data
sd_a <- sd(a)
sample_a <- length(a)
sd_a/sqrt(sample_a)
```
Or we could do it in one line. **Notice the call to a function within a function.**
```{r}
sd(a)/sqrt(length(a))
```
But what happens if we have another set of data, `b`, that we want to examine?
```{r}
set.seed(42) # make the random draws reproducible
b <- rnorm(100, mean = 10, sd = 1)
sd(b)/sqrt(length(b))
```
Or `c`? Or `d`?
Repeating this by hand is not only monotonous but also very error-prone!
## SEM as a function
Try to write a function for the SEM.
## SEM function
```{r}
StdErr <- function(dat_in) {
  sd(dat_in) / sqrt(length(dat_in))
}
```
Now we can simply plug the vector `a` that we created above into our function.
```{r}
StdErr(a)
```
We can just as easily use it for other data, like the object `b`:
```{r}
StdErr(b)
```
## taking a look at our function
Just type the name of the function to see what the function is.
```{r}
StdErr
```
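
Printing shows the whole function. As a sketch, the base functions `formals()`, `body()` and `args()` pull out its individual pieces. `StdErr` is re-defined here (identically) only so the chunk runs on its own.

```{r}
# Same definition as above, repeated so this chunk is self-contained
StdErr <- function(dat_in) {
  sd(dat_in) / sqrt(length(dat_in))
}

formals(StdErr)  # the arguments (here just dat_in)
body(StdErr)     # the code that runs when the function is called
args(StdErr)     # a quick view of the function's signature
```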
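And if you want to edit it interactively: `edit()` opens an editor and returns the edited copy rather than changing `StdErr` in place, so assign the result (or use `fix(StdErr)`, which does the assignment for you). Because it needs an interactive session, this chunk is not evaluated when knitting.
```{r, eval = FALSE}
StdErr <- edit(StdErr)
```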
## looking at base R functions does not always work so easily
```{r}
mean
```
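This is because `mean()` is a *generic* function: it dispatches to a method such as `mean.default()`, which in turn calls compiled C code via `.Internal()`. Many base functions are similar, either hidden inside a package namespace or written in another programming language (mostly C) that R calls. You can [read more here](https://stackoverflow.com/questions/19226816/how-can-i-view-the-source-code-for-a-function).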
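
One way to dig further, sketched with base R tools: `body()` shows that `mean()` only dispatches, `methods()` lists the available methods, and printing `mean.default` shows the method that eventually hands the work to compiled code through `.Internal()`. The exact output may differ between R versions.

```{r}
body(mean)      # just UseMethod("mean"): dispatch on the class of x
methods(mean)   # methods available for mean() in this session
mean.default    # the default method; note the .Internal(mean(x)) call
```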
## What if you want to return multiple things from your function
A function returns the value of the last expression it evaluates, but it is often clearer to be explicit about it with the `return()` function.

So this will not behave as you might expect. Why?
```{r}
StdErr_V2 <- function(vector_vals) {
  se <- sd(vector_vals)/sqrt(length(vector_vals))
}
```
```{r}
StdErr_V2(a)
```
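
A sketch of what is going on, using a hypothetical `double_it()` function: an assignment returns its value *invisibly*, so when it is the last expression in a function nothing is printed, even though the value is still returned. The base function `withVisible()` makes this explicit.

```{r}
# Hypothetical function whose last expression is an assignment
double_it <- function(x) {
  y <- x * 2
}

double_it(5)              # prints nothing
res_dbl <- double_it(5)
res_dbl                   # ...but the value was returned: 10
withVisible(double_it(5)) # $value is 10, $visible is FALSE
```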
## using return
```{r}
StdErr_V3 <- function(vector_vals) {
  se <- sd(vector_vals)/sqrt(length(vector_vals))
  return(se)
}
```
```{r}
StdErr_V3(a)
```
## you can give the output names
```{r}
StdErr_V4 <- function(vector_vals) {
  se <- sd(vector_vals)/sqrt(length(vector_vals))
  return(c(StandardError = se))
}
```
```{r}
StdErr_V4(a)
```
You may have noticed that, even though there is only one output, I wrapped it in `c()`. This is a bit of an R idiom: `c()` lets us attach a name to the value, and it comes in handy when we want multiple outputs:
```{r}
StdErr_V5 <- function(vector_vals) {
  se <- sd(vector_vals)/sqrt(length(vector_vals))
  return(c(StandardError = se,
           StandardDeviation = sd(vector_vals)))
}
```
```{r}
StdErr_V5(a)
```
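
Because the output is a *named* vector, individual values can be pulled out by name. A sketch with hypothetical example data (`demo_vals`), building the same kind of vector that `StdErr_V5()` returns:

```{r}
demo_vals <- c(4, 8, 15, 16, 23, 42)  # hypothetical example data

out_vec <- c(StandardError = sd(demo_vals) / sqrt(length(demo_vals)),
             StandardDeviation = sd(demo_vals))

names(out_vec)                  # "StandardError" "StandardDeviation"
out_vec["StandardError"]        # single brackets keep the name
out_vec[["StandardDeviation"]]  # double brackets drop it
```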
## See if you can modify this so that it outputs a list instead of a vector, and also returns the mean (as well as the SE and SD)
```{r}
StdErr_V6 <- function(vector_vals) {
  se <- sd(vector_vals)/sqrt(length(vector_vals))
  return(list(StandardError = se,
              StandardDeviation = sd(vector_vals),
              Mean_val = mean(vector_vals)))
}
```
```{r}
StdErr_V6(a)
```
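
Elements of the returned list can be extracted with `$`. One reason to prefer a list is that its elements can have different types (and lengths), whereas `c()` coerces everything to a common type. A sketch with hypothetical data and a hypothetical extra `Label` element:

```{r}
demo_vals <- c(4, 8, 15, 16, 23, 42)  # hypothetical example data

out_list <- list(StandardError = sd(demo_vals) / sqrt(length(demo_vals)),
                 StandardDeviation = sd(demo_vals),
                 Mean_val = mean(demo_vals),
                 Label = "demo sample")  # a list can mix numbers and text
out_list$Mean_val  # 18
str(out_list)

# With c() the number is coerced to character
c(Mean_val = mean(demo_vals), Label = "demo sample")
```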
## The function you can work with in groups
This function computes the [coefficient of variation](https://en.wikipedia.org/wiki/Coefficient_of_variation) (CV), a *positive*, unitless measure of relative variation. It is simply the ratio of the standard deviation to the mean:
$$CV = \frac{\sigma}{\mu} = \frac{\text{sd}}{\text{mean}}$$
## Coefficient of variation
```{r}
CoefVar_v1 <- function(x) {
cv <- sd(x)/mean(x)
}
```
## Time to break it.
In groups of 4, each of you will (one at a time) give your version of the function to one other person in the group, who will write some code to "break" it in some way. Don't tell your neighbours what the arguments are; just let them try to use it and break it.
### Things to throw into it
- negative values that give a negative CV
- `NA` values
- input that is not integer or numeric (e.g. character)
- a vector of length zero
- Something else?
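
A sketch of how `CoefVar_v1()` responds to some of these inputs (it is repeated here so the chunk runs on its own). Because its last line is an assignment, the result is returned invisibly, hence the `print()` calls. The character case is wrapped in `try()`, since the exact error or warning may vary across R versions.

```{r}
# Same as CoefVar_v1 above
CoefVar_v1 <- function(x) {
  cv <- sd(x)/mean(x)
}

print(CoefVar_v1(c(-1, -2, -3)))    # negative mean gives a negative CV (-0.5)
print(CoefVar_v1(c(1, NA, 3)))      # a single NA propagates to the result
print(CoefVar_v1(numeric(0)))       # zero-length input: NA or NaN, not an informative error
try(print(CoefVar_v1(c("a", "b")))) # character input
```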
### Fixes
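- `stop()` inside an `if ()`
- `break` (note: this only exits a loop, not a function)
- `stopifnot()` (really useful)
- checking object class
- `stop()` vs. `warning()` vs. `message()`
- `exists()` or `file.exists()`
- `try()`
- using `if` to handle different input types
- the `?conditions` help page (`tryCatch()`, `withCallingHandlers()`) for handling unusual conditions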
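
Putting several of these fixes together, here is one possible hardened version. The name `CoefVar_v2`, the `na.rm` argument and the choice of which problems are errors versus warnings are all assumptions for illustration, not the only reasonable design.

```{r}
# One possible hardened version of CoefVar_v1 (hypothetical name and design)
CoefVar_v2 <- function(x, na.rm = FALSE) {
  # checking object class: stop immediately if x is not numeric
  stopifnot(is.numeric(x))

  # NA handling: drop them only when asked, otherwise stop with an explanation
  if (anyNA(x)) {
    if (na.rm) {
      x <- x[!is.na(x)]
    } else {
      stop("x contains NA values; use na.rm = TRUE to remove them")
    }
  }

  # sd() needs at least two values (this also catches length-zero input)
  if (length(x) < 2) {
    stop("x must contain at least two non-missing values")
  }

  # negative values are allowed, but the user is warned
  if (any(x < 0)) {
    warning("x contains negative values; the CV is intended for positive data")
  }

  m <- mean(x)
  if (m == 0) {
    stop("the mean of x is zero, so the CV is undefined")
  }
  return(sd(x) / m)
}

CoefVar_v2(c(1, 2, 3))                 # 0.5
CoefVar_v2(c(1, NA, 3), na.rm = TRUE)  # NA dropped before computing
try(CoefVar_v2(c(1, NA, 3)))           # error with an explanation
try(CoefVar_v2(c("a", "b")))           # error: is.numeric(x) is not TRUE
```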
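### Warnings, messages and errors
- `message()` (a diagnostic message; execution continues)
- `warning()` (something may be wrong; execution continues)
- `stop()` (an error; execution halts)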
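
A sketch contrasting the three, using a hypothetical `check_positive()` function. A message is purely informational, a warning flags a possible problem but lets the code keep running, and `stop()` raises an error that halts execution. The caller can silence messages with `suppressMessages()` or intercept any of them with `tryCatch()`.

```{r}
# Hypothetical function that signals all three kinds of condition
check_positive <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")            # error: halts here
  message("Checking ", length(x), " value(s)")             # message: informational
  if (any(x <= 0)) warning("some values are not positive") # warning: keeps going
  invisible(all(x > 0))
}

check_positive(c(1, 2))              # message only
check_positive(c(-1, 2))             # message, then a warning; still returns
try(check_positive("a"))             # error: nothing after stop() runs
suppressMessages(check_positive(3))  # the caller can silence messages

# tryCatch() lets the caller decide how to respond to a condition
tryCatch(check_positive(-1),
         warning = function(w) paste("Caught a warning:", conditionMessage(w)))
```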