---
layout: page
title: FAQ
permalink: /qanda/
---
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- CSS Style -->
<style>
.accordion {
background-color: #eee;
color: #444;
cursor: pointer;
padding: 18px;
width: 100%;
border: none;
text-align: left;
outline: none;
font-size: 15px;
transition: 0.4s;
}
.active,
.accordion:hover {
background-color: #ccc;
}
.panel {
padding: 0 18px;
background-color: white;
max-height: 0;
overflow: hidden;
transition: max-height 0.2s ease-out;
}
.accordion:after {
content: '\02795';
/* Unicode character for "plus" sign (+) */
font-size: 13px;
color: #777;
float: right;
margin-left: 5px;
}
.active:after {
content: "\2796";
/* Unicode character for "minus" sign (-) */
}
code {
color: black;
}
.code-container {
background-color: #f2f3f4;
border: solid;
border-radius: 0.25em;
border-width: 0.5px;
border-color: lightgray;
padding: 0.5em;
margin-bottom: 1em;
}
a {
text-decoration: none;
color: blue;
}
ul {
list-style-type: none;
padding-left: 0pt;
}
img {
display: block;
margin-left: auto;
margin-right: auto;
max-width: 75%;
max-height: 75%;
margin-top: 20px;
padding-bottom: 20px;
}
.highlight-red {
padding: 2px 4px;
font-size: 90%;
color: #c7254e;
background-color: #f9f2f4;
border-radius: 4px;
}
.indent {
display: inline-block;
text-indent: 2em;
}
.comment {
color: gray;
}
.credit-footer {
text-align: center;
color: gray;
}
@media only screen and (max-width: 600px) {
.code-container {
margin-bottom: 2em;
}
}
</style>
</head>
<!-- Slider -->
<section id="global-header">
<div class="container">
<div class="row">
<div class="col-md-12">
<div class="block">
<h1>Questions & Answers</h1>
<p><b><big>Here we have collated some of the questions we encounter most often during our workshops, plus their answers. We will continue expanding this page, so feel free to suggest additions to the content!</big></b></p>
</div>
</div>
</div>
</div>
</section>
<div class="container">
<!-- Basic skills -->
<div class="row">
<h2>Basic skills</h2>
<p>Click on the buttons to open the collapsible content.</p>
<button class="accordion">Setting up your workspace</button>
<div class="panel">
<p>First of all, what is a working directory? It is the folder in which R looks for data files and saves any plots or scripts by default. To find out where your current working directory is, and to change it, see the code below.<br/></p>
<div class="code-container">
<code><span class="comment"># Identify your current directory</span><br/>getwd()<br/><br/><span class="comment"># Set your working directory</span><br/>setwd("insert folder path")</code>
</div>
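<p>In RStudio, you can also set it from the menu: <em>Session > Set Working Directory > Choose Directory.</em> For <code class="highlight-red">setwd()</code>, put your file path inside the brackets, in quotation marks, for example <code class="highlight-red">setwd("C:/Documents/Directory")</code>. Note that R expects forward slashes (<code class="highlight-red">/</code>) in file paths, even on Windows: if you copy a path from Windows Explorer, either replace each backslash with a forward slash or double it (<code class="highlight-red">\\</code>).</p>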
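<p>To check that R is looking in the right place, the sketch below lists the files R can currently see and changes directory only if the target folder exists. The folder is a hypothetical example, so replace it with your own. <code class="highlight-red">file.path()</code> joins the parts of a path with forward slashes, which R accepts on every operating system, including Windows.</p>
<div class="code-container">
<code><span class="comment"># List the files and folders in the current working directory</span><br/>list.files()<br/><br/><span class="comment"># Build a folder path from its parts (hypothetical example folder)</span><br/>project.dir <- file.path("C:", "Documents", "Directory")<br/>project.dir <span class="comment"># "C:/Documents/Directory"</span><br/><br/><span class="comment"># Only change the working directory if that folder exists</span><br/>if (dir.exists(project.dir)) {<br/><span class="indent">setwd(project.dir)</span><br/>} else {<br/><span class="indent">message("Folder not found: ", project.dir)</span><br/>}</code>
</div>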
</div>
<button class="accordion">Loading data and packages</button>
<div class="panel">
<p><b>Saving and loading your script again</b><br/>You should always type your code into a script file rather than directly into the console: the script is a reproducible record of your analysis, whereas code typed only in the console is easily lost. Save your script often to avoid any problems. To save it, click the save icon at the top of your script (or use <em>File > Save</em>); R scripts are saved as <code class="highlight-red">.R</code> files. Choose a short, descriptive file name and avoid spaces and capital letters: spaces make file paths awkward to type, and because R is case-sensitive, inconsistent capitalisation easily leads to "file not found" errors. Save the file in your working directory so that it is easy to find next time. To open your script again, go to <em>File > Open File</em> and choose your script; it will open in a new script tab in RStudio.<br/><br/><b>Saving CSV files</b><br/>A CSV (comma-separated values) file stores a table as plain text: each row of the table is a line in the file, and the values in different columns are separated by commas. If your data are entered in Excel, you can save them as a CSV file by clicking <em>Save As</em> and choosing CSV as the file format. Because CSV files are plain text, R can read them directly with the built-in <code class="highlight-red">read.csv()</code> function, without any extra packages.</p>
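<p>CSV files can also be written from R itself. The sketch below saves the built-in <code class="highlight-red">iris</code> dataset to a temporary file with <code class="highlight-red">write.csv()</code> and reads it back with <code class="highlight-red">read.csv()</code>; in your own work you would use a file name in your working directory instead of <code class="highlight-red">tempfile()</code>. Setting <code class="highlight-red">row.names = FALSE</code> stops R from writing an extra column of row numbers, which would otherwise come back as a column called <code class="highlight-red">X</code>.</p>
<div class="code-container">
<code><span class="comment"># Create a temporary file path ending in .csv</span><br/>csv.path <- tempfile(fileext = ".csv")<br/><br/><span class="comment"># Save the data frame as a CSV file, without row numbers</span><br/>write.csv(iris, csv.path, row.names = FALSE)<br/><br/><span class="comment"># Read the file back in and check its structure</span><br/>iris.csv <- read.csv(csv.path)<br/>str(iris.csv)</code>
</div>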
<b>Loading packages</b><br/>
<p>Thousands of add-on packages are available for R, allowing you to do many different things, ranging from mapping to machine learning to web scraping. The best way to find out which packages may be helpful to you is to search online and/or browse the <a href="https://cran.r-project.org/web/packages/" target="_blank">CRAN</a> website. Once you have found your package, you need to install it on your machine once, and then load it with <code class="highlight-red">library()</code> in every new R session in which you use it, as shown in the code below.</p>
<div class="code-container">
<code><span class="comment"># Load CSV file</span><br/>objectname <- read.csv("filepath/file.csv")<br/><br/><span class="comment"># Installing dplyr package</span><br/>install.packages("dplyr")<br/><br/><span class="comment"># Load package</span><br/>library(dplyr)</code>
</div>
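<p>Because <code class="highlight-red">install.packages()</code> only needs to run once per machine, a common pattern, sketched below for <code class="highlight-red">dplyr</code>, is to install a package only if it is not already available. This lets a shared script run on someone else's computer without reinstalling packages every time.</p>
<div class="code-container">
<code><span class="comment"># requireNamespace() returns FALSE (instead of an error) if the package is missing</span><br/>if (!requireNamespace("dplyr", quietly = TRUE)) {<br/><span class="indent">install.packages("dplyr")</span><br/>}<br/><br/><span class="comment"># Load the package for this session</span><br/>library(dplyr)</code>
</div>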
</div>
<button class="accordion">Other tips and resources</button>
<div class="panel">
<p><b>Writing clean code</b><br/>R code should be easy to read, share and verify. Keep your object naming conventions consistent throughout your script, and explain what your code does with comments, which in R start with the <code class="highlight-red">#</code> symbol. For extensive guidelines, please consult <a href="https://google.github.io/styleguide/Rguide.xml" target="_blank">Google's R style guide</a>.<br/><br/><b>Helpful tutorials</b></p>
<ul>
<li><a href="https://ourcodingclub.github.io/2016/11/13/intro-to-r.html" target="_blank">Introduction to R</a></li>
<li><a href="https://ourcodingclub.github.io/2016/11/15/troubleshooting.html" target="_blank">Troubleshooting R</a></li>
<li><a href="https://ourcodingclub.github.io/2017/04/25/etiquette.html" target="_blank">Coding Etiquette</a></li>
</ul>
<p><b>Useful commands for RStudio</b><br/>To clear your global environment (all the objects, functions etc. you have created), run <code class="highlight-red">rm(list = ls())</code> in your console. Note that this removes objects but does not unload packages; for a completely fresh start, restart R via <em>Session > Restart R</em>. To clear the RStudio console, run <code class="highlight-red">cat("\014")</code> or press <em>Ctrl + L</em>.</p>
</div>
</div>
<!-- Data manipulation -->
<div class="row">
<h2>Data manipulation</h2>
<p>Click on the buttons to open the collapsible content.</p>
<button class="accordion">Structuring datasets</button>
<div class="panel">
<p>When working with data, it is very important to keep it in the correct format to allow for easy and effective analysis, data visualisation and ultimately, to find an answer to your research question! To become more effective at preparing and cleaning your data, it is important to familiarise yourself with the principles of "tidy data," which provide a standard way to organise data values within a dataset that allows for easy manipulation. The three main principles are listed below.<br/></p>
<ol>
<li>Each variable forms a column.</li>
<li>Each observation forms a row.</li>
<li>Each type of observational unit forms a table.</li>
</ol>
<p>Please see Hadley Wickham's <a href="http://vita.had.co.nz/papers/tidy-data.pdf" target="_blank">academic paper</a> or a condensed <a href="https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html" target="_blank">article</a> on tidy data and how to implement it. Below, we use functions from the <code class="highlight-red">tidyr</code> and <code class="highlight-red">dplyr</code> packages to first turn part of the built-in (already tidy) <code class="highlight-red">iris</code> dataset into messy, wide data, and then convert it back to tidy data.</p>
<div class="code-container">
<code><span class="comment"># Loading required packages</span><br/>library(tidyr)<br/>library(dplyr)<br/>
<br/><span class="comment"># Loading dataframe</span><br/>
iris <- as.data.frame(iris)<br/>
</code>
</div>
<img src="{{ site.baseurl }}/img/iris1.png" style="height: 130px;">
<div class="code-container">
<code><span class="comment"># Selecting two columns and filtering for one species</span><br/>iris.setosa <- iris %>%<br/><span class="indent">select(Species, Petal.Width) %>% <span class="comment"># Selecting only two columns</span></span><br/><span class="indent">filter(Species == "setosa") <span class="comment"># Filtering rows for one species</span></span>
</code>
</div>
<img src="{{ site.baseurl }}/img/iris2.png" style="height: 130px;">
<div class="code-container">
<code><span class="comment"># Converting to a wide dataframe (messy data)</span><br/>iris.wide <- iris.setosa %>%<br/><span class="indent">mutate(sample = row_number()) %>% <span class="comment"># Adding row number identifier</span></span><br/><span class="indent">spread(sample, Petal.Width) <span class="comment"># Spreading data to wide format</span></span>
</code>
</div>
<img src="{{ site.baseurl }}/img/iris3.png" style="width: 500px;">
<div class="code-container">
<code><span class="comment"># Converting messy iris dataframe to tidy (long) dataframe</span><br/>iris.long <- iris.wide %>%<br/><span class="indent">gather(key = Setosa.Sample, value = Petal.Width, -Species) <span class="comment"># Gather every column except Species into long format</span></span>
</code>
</div>
<img src="{{ site.baseurl }}/img/iris4.png" style="height:130px;">
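<p>Since version 1.0.0 of <code class="highlight-red">tidyr</code>, <code class="highlight-red">spread()</code> and <code class="highlight-red">gather()</code> have been superseded by <code class="highlight-red">pivot_wider()</code> and <code class="highlight-red">pivot_longer()</code>. The older functions still work, but the newer ones are recommended for new code. The sketch below repeats the same round trip with them; the object names ending in <code class="highlight-red">2</code> are only used here so that the results do not overwrite the objects above.</p>
<div class="code-container">
<code><span class="comment"># Loading required packages</span><br/>library(tidyr)<br/>library(dplyr)<br/><br/><span class="comment"># Wide (messy) format: one row, one column per setosa sample</span><br/>iris.wide2 <- iris %>%<br/><span class="indent">select(Species, Petal.Width) %>%</span><br/><span class="indent">filter(Species == "setosa") %>%</span><br/><span class="indent">mutate(sample = row_number()) %>%</span><br/><span class="indent">pivot_wider(names_from = sample, values_from = Petal.Width)</span><br/><br/><span class="comment"># Tidy (long) format: one row per sample, keeping Species as an identifier</span><br/>iris.long2 <- iris.wide2 %>%<br/><span class="indent">pivot_longer(cols = -Species, names_to = "Setosa.Sample", values_to = "Petal.Width")</span></code>
</div>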
</div>
<button class="accordion">Selecting & filtering</button>
<div class="panel">
<p>There are a variety of ways to select columns and rows: you can specify them by name or by index (position). Note that R starts counting at 1, unlike languages such as Python, which start counting at 0. You can also use functions from the <code><a href="https://dplyr.tidyverse.org/articles/dplyr.html" target="_blank" class="highlight-red">dplyr</a></code> package.<br/></p>
<div class="code-container">
<code><span class="comment"># Selecting a single column by name or by index (both give the same result)</span><br/>iris$Sepal.Length<br/>iris[, 1]<br/><br/><span class="comment"># Selecting specific rows of one column (a single column is a vector, so it needs only one index)</span><br/>iris$Sepal.Length[1:2]<br/><br/><span class="comment"># Selecting multiple rows and columns: dataframe[rows, columns]</span><br/>iris[1:2, 3:5]<br/><br/><span class="comment"># Selecting columns by name using the select function from dplyr</span><br/>library(dplyr)<br/>select(iris, Sepal.Length, Species)</code>
</div>
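<p>Two other base R tricks are often handy when indexing: a negative index drops rows or columns instead of keeping them, and a logical condition inside the square brackets keeps only the rows where it is <code class="highlight-red">TRUE</code>. The sketch below uses the built-in <code class="highlight-red">iris</code> dataset, whose fifth column is <code class="highlight-red">Species</code>.</p>
<div class="code-container">
<code><span class="comment"># The first row (R starts counting at 1, not 0)</span><br/>iris[1, ]<br/><br/><span class="comment"># A negative index removes instead of keeps: drop the 5th column (Species)</span><br/>iris.numeric <- iris[, -5]<br/><br/><span class="comment"># Select columns with a vector of names</span><br/>iris[, c("Sepal.Length", "Species")]<br/><br/><span class="comment"># Keep only the rows where a condition is TRUE (a base R alternative to filter())</span><br/>setosa.rows <- iris[iris$Species == "setosa", ]</code>
</div>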
<p>You may also want to filter the dataset you are working with according to certain conditions. As an example, let's take the built-in iris dataset, which records the length and width of the sepals and petals of 150 iris flowers from three species (<em>Iris setosa</em>, <em>Iris versicolor</em> and <em>Iris virginica</em>). If you want to see how flower attributes vary within only one of the species, you can easily do so with the <code><a href="https://dplyr.tidyverse.org/reference/filter.html" target="_blank" class="highlight-red">filter</a></code> function from the <code><a href="https://dplyr.tidyverse.org/articles/dplyr.html" target="_blank" class="highlight-red">dplyr</a></code> package. This function returns only the rows that meet the conditions you define. See the code below for examples.<br/></p>
<div class="code-container">
<code><span class="comment"># Loading packages</span><br/>library(dplyr)<br/><br/><span class="comment"># Loading data and defining as an object</span><br/>iris <- as.data.frame(iris)<br/><br/><span class="comment"># Filtering for a single condition</span><br/>filter(iris, Species == "virginica")<br/><br/><span class="comment"># Filtering for multiple conditions (a row must meet all of them)</span><br/>filter(iris, Species %in% c("virginica","setosa"), Sepal.Length > 5)<br/><br/><span class="comment"># Other useful filter conditions</span></code>
<ul>
<li><code>==</code> (equal to)</li>
<li><code>&lt;</code> (less than)</li>
<li><code>&lt;=</code> (less than or equal to)</li>
<li><code>&gt;</code> (greater than)</li>
<li><code>&gt;=</code> (greater than or equal to)</li>
<li><code>&amp;</code> (and)</li>
<li><code>|</code> (or)</li>
<li><code>!</code> (not, e.g. <code>!=</code> means "not equal to")</li>
<li><code>%in%</code> (matches any of the values in a vector)</li>
</ul>
</div>
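<p>The sketch below shows some of these operators in action with <code class="highlight-red">filter()</code> on the built-in <code class="highlight-red">iris</code> dataset. Separating conditions with a comma, as in the multiple-conditions example above, is the same as joining them with <code class="highlight-red">&</code>: a row is kept only if every condition is <code class="highlight-red">TRUE</code>.</p>
<div class="code-container">
<code>library(dplyr)<br/><br/><span class="comment"># & (and): both conditions must be TRUE</span><br/>long.setosa <- filter(iris, Species == "setosa" & Sepal.Length > 5)<br/><br/><span class="comment"># | (or): at least one condition must be TRUE</span><br/>setosa.or.wide <- filter(iris, Species == "setosa" | Petal.Width > 2)<br/><br/><span class="comment"># ! (not): keep every row that is NOT setosa</span><br/>not.setosa <- filter(iris, !(Species == "setosa"))<br/><br/><span class="comment"># != is a shortcut for the same "not equal to" condition</span><br/>not.setosa2 <- filter(iris, Species != "setosa")</code>
</div>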
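<p>To apply several operations to your dataset in sequence without creating a new object at every step, you can use the pipe <code class="highlight-red">%>%</code> from the <code class="highlight-red">magrittr</code> package, which also becomes available when you load <code class="highlight-red">dplyr</code>. The pipe passes the result of each step on as the first argument of the next function. This is demonstrated in the code below.</p>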
<div class="code-container">
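<code>newDataset <- iris %>%<br/><span class="indent">select("Sepal.Length", "Species") %>%</span><br/><span class="indent">filter(Species == "setosa") %>%</span><br/><span class="indent">group_by(Species) %>%</span><br/><span class="indent">summarise(meanSepalLength = mean(Sepal.Length))</span></code>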
</div>
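<p>To see what the pipe saves you, the sketch below computes the same summary twice: first without pipes, storing every step in its own intermediate object, and then with the native pipe <code class="highlight-red">|></code>, which has been built into base R since version 4.1.0 and behaves like <code class="highlight-red">%>%</code> for simple chains like this one.</p>
<div class="code-container">
<code>library(dplyr)<br/><br/><span class="comment"># Without pipes: each step needs its own intermediate object</span><br/>step1 <- select(iris, Sepal.Length, Species)<br/>step2 <- filter(step1, Species == "setosa")<br/>step3 <- group_by(step2, Species)<br/>noPipe <- summarise(step3, meanSepalLength = mean(Sepal.Length))<br/><br/><span class="comment"># With the native pipe (R 4.1.0 or later): same steps, no intermediate objects</span><br/>nativePipe <- iris |><br/><span class="indent">select(Sepal.Length, Species) |></span><br/><span class="indent">filter(Species == "setosa") |></span><br/><span class="indent">group_by(Species) |></span><br/><span class="indent">summarise(meanSepalLength = mean(Sepal.Length))</span></code>
</div>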
</div>
<button class="accordion">Other tips and resources</button>
<div class="panel">
<p><b>Helpful tutorials</b></p>
<ul>
<li><a href="https://ourcodingclub.github.io/2017/01/16/piping.html#tidy" target="_blank">Data Manipulation</a></li>
<li><a href="https://ourcodingclub.github.io/2017/03/20/seecc.html" target="_blank">Working with big data</a></li>
</ul>
</div>
</div>
<!-- Data visualisation -->
<div class="row">
<h2>Data visualisation</h2>
<p>Click on the buttons to open the collapsible content.</p>
<button class="accordion">Plotting basics</button>
<div class="panel">
<p><b>How to plot your data</b><br/>There are many ways to do this; here we focus on the <code class="highlight-red">ggplot2</code> data visualisation package, which offers numerous ways to visualise large and small datasets alike! The <a href="http://r4ds.had.co.nz/data-visualisation.html" target="_blank">data visualisation</a> chapter of the freely available <a href="http://r4ds.had.co.nz/" target="_blank">R for Data Science</a> book is a great resource for finding out about the many ways you can visualise your data. First, make sure you have installed and loaded the <code class="highlight-red">ggplot2</code> package; then you can build your first plot!<br/><br/><b>Building a ggplot</b><br/>Every ggplot starts with a call to the <code class="highlight-red">ggplot</code> function, whose arguments are the <code class="highlight-red">data</code> you are plotting and <code class="highlight-red">aes</code> (short for aesthetics), which maps columns of your data to the x and y axes and to any other visual properties you want to include, such as <code class="highlight-red">fill, group, shape</code> etc. You then add layers with <code class="highlight-red">+</code>, for example <code class="highlight-red">geom_point()</code> to draw points and <code class="highlight-red">theme_classic()</code> to change the overall look. See the code below for some basic examples and consult the <a href="http://r4ds.had.co.nz/data-visualisation.html" target="_blank">data visualisation</a> chapter for detailed information.</p>
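<p>Because a ggplot is an R object, you can also build it up step by step, which makes the role of each argument explicit. In the sketch below, <code class="highlight-red">data</code> and <code class="highlight-red">mapping</code> are the names of the first two arguments of <code class="highlight-red">ggplot()</code>, each layer is added with <code class="highlight-red">+</code>, and the plot is only drawn when the object is printed.</p>
<div class="code-container">
<code>library(ggplot2)<br/><br/><span class="comment"># Data and aesthetic mappings: which columns go on which axis</span><br/>p <- ggplot(data = iris, mapping = aes(x = Sepal.Width, y = Sepal.Length))<br/><br/><span class="comment"># Add a geometry layer (points) and a theme with +</span><br/>p <- p + geom_point() + theme_classic()<br/><br/><span class="comment"># Nothing is drawn until the object is printed</span><br/>print(p)</code>
</div>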
<div class="code-container">
<code><span class="comment"># Basic scatter plot</span><br/>ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +<br/><span class="indent">geom_point() +</span><br/><span class="indent">theme_classic()</span></code>
</div>
<img src="{{ site.baseurl }}/img/faqPlot1.png" style="height: 300px;">
<div class="code-container">
<code><span class="comment"># Grouped scatter plot</span><br/>ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +<br/><span class="indent">geom_point() +</span><br/><span class="indent">theme_classic()</span></code>
</div>
<img src="{{ site.baseurl }}/img/faqPlot2.png" style="height: 300px;">
<div class="code-container">
<code><span class="comment"># Facetted scatter plot</span><br/>ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +<br/><span class="indent">geom_point() + </span><br/><span class="indent">theme_classic() +</span><br/><span class="indent">facet_wrap(~Species)</span></code>
</div>
<img src="{{ site.baseurl }}/img/faqPlot3.png" style="height: 300px;">
<div class="code-container">
<code><span class="comment"># Facetted scatter plot with loess regression</span><br/>ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +<br/><span class="indent">geom_smooth() +</span><br/><span class="indent">geom_point() +</span><br/><span class="indent">theme_classic() +</span><br/><span class="indent">facet_wrap(~Species)</span></code>
</div>
<img src="{{ site.baseurl }}/img/faqPlot4.png" style="height: 300px;">
</div>
<button class="accordion">Reordering plots</button>
<div class="panel">
<p>By default, ggplot orders the categories on a discrete axis by their factor levels, which for a character variable means alphabetical order. To order your plots in a specific way (descending, ascending or custom), you need to explicitly specify the order of levels in the dataset. In this answer, we will explore the <code class="highlight-red">mpg</code> dataset that comes with <code class="highlight-red">ggplot2</code> and show you how to reorder bar plots.<br/><br/><b>Reordering levels in R</b><br/>First, load your dataset and have a look at its structure using <code class="highlight-red">head()</code>. In our case, we are interested in the relationship between car <code class="highlight-red">class</code> and the number of cylinders, <code class="highlight-red">cyl</code>. The data contain multiple observations for each class, so before making a bar plot we calculate the mean number of cylinders per car class using the <code class="highlight-red">aggregate()</code> function.</p>
<div class="code-container">
<code><span class="comment"># View dataset</span><br/>head(mpg)<br/><br/><span class="comment"># Aggregate data per class (mean per class)</span><br/> mpg.class <- aggregate(cyl ~ class, mpg, mean)<br/><br/><span class="comment"># Generate unordered barplot</span><br/>ggplot(mpg.class, aes(class, cyl)) +<br/><span class="indent">geom_bar(stat="identity") +</span><br/><span class="indent">theme_classic()</span></code>
</div>
<img src="{{ site.baseurl }}/img/faqPlot5.png" style="height: 300px;">
<p>Now we can check the order of levels in our <code class="highlight-red">class</code> column and choose a custom order. We can then plot the bars in that order by passing it to the <code class="highlight-red">limits</code> argument of <code class="highlight-red">scale_x_discrete()</code>. Note that any class not listed in <code class="highlight-red">limits</code> is left out of the plot, and ggplot will typically warn that rows were removed.</p>
<div class="code-container">
<code><span class="comment"># Checking level order (alphabetical by default)</span><br/>levels(factor(mpg.class$class))<br/><br/><span class="comment"># Create object with custom levels</span><br/>classes <- c("midsize", "suv", "minivan")<br/><br/><span class="comment"># Plot barchart with custom levels</span><br/>ggplot(mpg.class, aes(class, cyl)) +<br/><span class="indent">geom_bar(stat = "identity") +</span><br/><span class="indent">theme_classic() +</span><br/><span class="indent">scale_x_discrete(limits = classes)</span></code>
</div>
<img src="{{ site.baseurl }}/img/faqPlot6.png" style="height: 300px;">
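<p>Because <code class="highlight-red">limits</code> also decides which categories are shown, the custom plot above only contains the three listed classes. If you want to choose the order but keep every class, one option, sketched below, is to set the factor levels yourself with <code class="highlight-red">factor(..., levels = ...)</code> before plotting. Here the three chosen classes come first and the remaining ones follow alphabetically; <code class="highlight-red">geom_col()</code> is ggplot2's shortcut for <code class="highlight-red">geom_bar(stat = "identity")</code>.</p>
<div class="code-container">
<code>library(ggplot2)<br/><br/><span class="comment"># Mean number of cylinders per car class</span><br/>mpg.class <- aggregate(cyl ~ class, mpg, mean)<br/><br/><span class="comment"># Chosen classes first, then all remaining classes in alphabetical order</span><br/>first.classes <- c("midsize", "suv", "minivan")<br/>all.levels <- c(first.classes, setdiff(sort(unique(mpg.class$class)), first.classes))<br/><br/><span class="comment"># Store the custom order as factor levels in a new object</span><br/>mpg.custom <- mpg.class<br/>mpg.custom$class <- factor(mpg.custom$class, levels = all.levels)<br/>levels(mpg.custom$class)<br/><br/><span class="comment"># Bars now follow the factor level order</span><br/>ggplot(mpg.custom, aes(class, cyl)) +<br/><span class="indent">geom_col() +</span><br/><span class="indent">theme_classic()</span></code>
</div>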
<p>We can also order our bar plot in ascending or descending order. To do this, we reorder the levels with the <code class="highlight-red">reorder</code> function, which sorts the levels of <code class="highlight-red">class</code> by their values of <code class="highlight-red">cyl</code>, inside the <code class="highlight-red">transform</code> function. This creates a separate object, which we then plot.</p>
<div class="code-container">
<code><span class="comment"># Reorder levels in ascending order</span><br/>mpg.asc <- transform(mpg.class, class = reorder(class, cyl))<br/><br/><span class="comment"># Plot levels in ascending order</span><br/>ggplot(mpg.asc, aes(class, cyl)) +<br/><span class="indent">geom_bar(stat = "identity") +</span><br/><span class="indent">theme_classic()</span></code>
</div>
<img src="{{ site.baseurl }}/img/faqPlot7.png" style="height: 300px;">
<div class="code-container">
<code><span class="comment"># Reorder levels in descending order</span><br/>mpg.dsc <- transform(mpg.class, class = reorder(class, -cyl))<br/><br/><span class="comment"># Plot levels in descending order</span><br/>ggplot(mpg.dsc, aes(class, cyl)) +<br/><span class="indent">geom_bar(stat = "identity") +</span><br/><span class="indent">theme_classic()</span></code>
</div>
<img src="{{ site.baseurl }}/img/faqPlot8.png" style="height: 300px;">
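<p>If you do not need to keep the reordered data, a common shortcut is to call <code class="highlight-red">reorder()</code> directly inside <code class="highlight-red">aes()</code>, so that no separate object is created. In the sketch below, the minus sign gives descending order (drop it for ascending order), and <code class="highlight-red">labs()</code> sets a readable axis label, since the axis would otherwise be titled with the whole <code class="highlight-red">reorder()</code> expression.</p>
<div class="code-container">
<code>library(ggplot2)<br/><br/><span class="comment"># Mean number of cylinders per car class</span><br/>mpg.class <- aggregate(cyl ~ class, mpg, mean)<br/><br/><span class="comment"># Reorder the classes inside aes(): -cyl sorts them in descending order</span><br/>desc.plot <- ggplot(mpg.class, aes(x = reorder(class, -cyl), y = cyl)) +<br/><span class="indent">geom_col() +</span><br/><span class="indent">theme_classic() +</span><br/><span class="indent">labs(x = "class")</span><br/>desc.plot</code>
</div>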
</div>
<button class="accordion">Other tips and resources</button>
<div class="panel">
<p><b>Helpful online resources</b><br/></p>
<ul>
<li><a href="http://r4ds.had.co.nz/data-visualisation.html" target="_blank">Visualisation chapter (R for Data Science)</a></li>
<li><a href="https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf" target="_blank">Cheatsheet (ggplot2)</a></li>
</ul>
<p><b>Helpful tutorials</b><br/></p>
<ul>
<li><a href="https://ourcodingclub.github.io/2017/01/29/datavis.html" target="_blank">Data visualisation 1</a></li>
<li><a href="https://ourcodingclub.github.io/2017/03/29/data-vis-2.html" target="_blank">Data visualisation 2</a></li>
<li><a href="https://ourcodingclub.github.io/2016/12/11/maps_tutorial.html" target="_blank">Spatial data visualisation</a></li>
<li><a href="https://ourcodingclub.github.io/2017/03/07/shiny.html" target="_blank">Interactive web-apps</a></li>
</ul>
</div>
</div>
<!-- Modelling basics -->
<div class="row">
<h2>Modelling basics</h2>
<p>Click on the buttons to open the collapsible content.</p>
<button class="accordion">Designing a model</button>
<div class="panel">
<p>Models are a helpful way to understand your data and to determine if there is a statistically significant relationship between your two (or more!) chosen variables. Before you start to create your model, predict what relationship your two variables may have. In this example, we inspect whether leaf area has a statistically significant relationship with stem area. What do you hypothesise? A simple linear model can be set up as follows: <br/></p>
<div class="code-container">
<code><span class="comment"># Load your data</span><br/>graph1 <- read.csv("graph1.csv")<br/><br/><span class="comment"># Check that your data are formatted as expected</span><br/>head(graph1)<br/><br/><span class="comment"># Fit a linear model: leaf area (response) as a function of stem area (explanatory variable)</span><br/>area.model <- lm(leaf_area ~ stem_area, data = graph1)<br/><br/><span class="comment"># Model output, including whether the relationship is significant</span><br/>summary(area.model)</code>
</div>
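<p>If you want to try this without the <code class="highlight-red">graph1.csv</code> file, the sketch below fits the same kind of model to the built-in <code class="highlight-red">cars</code> dataset, which records the speed of 50 cars and the distance they took to stop. The formula reads as "response ~ explanatory variable": here, stopping distance as a function of speed.</p>
<div class="code-container">
<code><span class="comment"># Inspect the built-in cars dataset</span><br/>head(cars)<br/><br/><span class="comment"># Fit a simple linear model: stopping distance (dist) as a function of speed</span><br/>cars.model <- lm(dist ~ speed, data = cars)<br/><br/><span class="comment"># Print the model output</span><br/>summary(cars.model)</code>
</div>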
</div>
<button class="accordion">Model outputs</button>
<div class="panel">
<p>Once you have created your model, <code class="highlight-red">summary()</code> prints a lot of information, which can be very hard to pick through! Here, we break down each part for you.</p>
<img src="{{ site.baseurl }}/img/faqpicture9.png" style="height: 300px;">
<div class="code-container">
<p><b>Call</b><br/> This shows the model you fitted. Make sure it’s correct!</p>
<p><b>Residuals</b><br/> This gives a summary of the distribution of the residuals (the differences between the observed and fitted values). In general, the median should be close to 0, and the 1st and 3rd quartiles (1Q/3Q) should be similar in magnitude. If this is not the case, you should double-check whether you’ve met your model’s assumptions.</p>
<p><b>Coefficients</b><br/> This table describes the fitted regression equation. The estimate for the intercept is the predicted value of the response when the explanatory variable is zero, and the estimate for the explanatory variable is the slope: the expected change in the response for a one-unit increase in the explanatory variable. If the slope differs significantly from zero, there is a relationship between the response and explanatory variables. A positive estimate means a positive relationship (for example, as stem area increases, so does leaf area), and a negative estimate means a negative one. The t value (the estimate divided by its standard error) and its p-value, <em>Pr(>|t|)</em>, indicate whether each coefficient differs significantly from zero.</p>
<p><b>Summary</b><br/> The R-squared value indicates the proportion of the variation in the response that is explained by the model; in our example, how much of the variation in leaf area is explained by stem area. The adjusted R-squared also accounts for the number of explanatory variables relative to the sample size, which makes it a fairer measure, especially when comparing models with different numbers of variables. Here, our model explains 33.5% of the variation. The p-value at the bottom (from the F-statistic) shows the overall significance of the model; however, it is important to also look at each coefficient to assess significance. Generally, a model is seen as significant if the p-value is less than 0.05. Note that this threshold is arbitrary: a p-value of 0.051 doesn’t necessarily mean your model is invalid!</p>
</div>
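<p>Each part of this output can also be extracted as an object, which is useful when you want to report the numbers without copying them by hand. The sketch below does this for a model fitted to the built-in <code class="highlight-red">cars</code> dataset (stopping distance as a function of speed); swap in your own model object, such as <code class="highlight-red">area.model</code>.</p>
<div class="code-container">
<code><span class="comment"># Fit the model and store its summary</span><br/>cars.model <- lm(dist ~ speed, data = cars)<br/>cars.summary <- summary(cars.model)<br/><br/><span class="comment"># Coefficients table: Estimate, Std. Error, t value and Pr(>|t|)</span><br/>cars.summary$coefficients<br/><br/><span class="comment"># Slope estimate for speed</span><br/>coef(cars.model)["speed"]<br/><br/><span class="comment"># R-squared and adjusted R-squared</span><br/>cars.summary$r.squared<br/>cars.summary$adj.r.squared<br/><br/><span class="comment"># Overall p-value of the model, from the F-statistic</span><br/>f <- cars.summary$fstatistic<br/>pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)</code>
</div>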
</div>
<button class="accordion">Model assumptions</button>
<div class="panel">
<p><b>Using code</b><br/>When designing your model, it’s important to check that you’ve met any assumptions your model has before you proceed with your analysis. However, it can be hard to understand what the assumptions mean, as well as to remember how to check them. Look below for an example of how to check if you’ve met the assumptions for a linear model.</p>
<p>1. The residuals (the differences between the observed and predicted values of the dependent variable) are normally distributed. The Shapiro-Wilk test checks this: its null hypothesis is that the residuals are normally distributed, so a p-value over 0.05 means there is no significant evidence that they deviate from a normal distribution, and this assumption can be considered met.</p>
<div class="code-container">
<code>model.resid <- resid(area.model)<br/>shapiro.test(model.resid)<br/></code>
</div>
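<p>2. The residuals are homoscedastic, meaning they have equal variance across the range of fitted values. The null hypothesis of the tests below is that the variance is constant, so a p-value over 0.05 means this assumption can be considered met. Bartlett's test (<code class="highlight-red">bartlett.test()</code>) compares variances between groups, so it only applies when your explanatory variable is categorical; for a continuous explanatory variable such as <code class="highlight-red">stem_area</code>, one option is the Breusch-Pagan test from the <code class="highlight-red">lmtest</code> package.</p>
<div class="code-container">
<code><span class="comment"># Breusch-Pagan test for a continuous explanatory variable (requires the lmtest package)</span><br/>library(lmtest)<br/>bptest(area.model)<br/><br/><span class="comment"># Bartlett's test, only for a categorical explanatory variable (group_variable is a placeholder name)</span><br/>bartlett.test(leaf_area ~ group_variable, data = graph1)<br/></code>
</div>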
<br/><br/><b>Using plots</b><br/><p>Another way to check if your data meet your model’s assumptions is to use the command <code class="highlight-red">plot(your_model)</code>. This brings up four plots: (1) residual versus fitted plot, (2) Q-Q plot, (3) scale-location and (4) residuals versus leverage.</p>
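<p>By default, R shows these four plots one after another. The sketch below uses <code class="highlight-red">par(mfrow = c(2, 2))</code> to arrange them in a grid of two rows and two columns so you can view them together, using a model fitted to the built-in <code class="highlight-red">cars</code> dataset; replace it with your own model, such as <code class="highlight-red">area.model</code>.</p>
<div class="code-container">
<code>cars.model <- lm(dist ~ speed, data = cars)<br/><br/><span class="comment"># Split the plotting window into 2 rows and 2 columns, saving the old settings</span><br/>old.par <- par(mfrow = c(2, 2))<br/><br/><span class="comment"># Draw the four diagnostic plots in one window</span><br/>plot(cars.model)<br/><br/><span class="comment"># Restore the previous plotting layout</span><br/>par(old.par)</code>
</div>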
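<p>The residuals versus fitted plot helps you assess whether the variance of the residuals is constant, i.e. whether your data are homoscedastic: the points should be spread evenly around the horizontal line, without forming a funnel shape. It also helps you assess whether there is a linear relationship between your variables; a clear curve in the red line suggests there is not. R labels the points with the largest residuals with their row numbers or names.</p>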
<img src="{{ site.baseurl }}/img/faqpicture10.png" style="height: 300px;">
<p>The normal Q-Q plot assesses whether your residuals are normally distributed. If the points lie close to the dashed line, the residuals are likely normally distributed. Here, the tails drift slightly away from the line. This is common in small datasets and is nothing to be concerned about, especially if the Shapiro-Wilk test does not show a significant departure from normality!</p>
<img src="{{ site.baseurl }}/img/faqpicture11.png" style="height: 300px;">
<p>The scale-location plot helps identify heteroscedasticity (unequal variance), which is what we don’t want! It is a bit easier to read than the first plot: if the red line is clearly not horizontal, the residuals are not homoscedastic. However, how horizontal it has to be is a matter of judgement; a slightly sloping line is okay!</p>
<img src="{{ site.baseurl }}/img/faqpicture12.png" style="height: 300px;">
<p>The residuals versus leverage plot helps you find influential data points. Leverage measures how unusual an observation’s value of the explanatory variable is: points far from the mean of the explanatory variable have a larger leverage and can pull the fitted line towards them. You can see on the plot that Cook’s distance is also shown; this measures how much the model fit would change if that point were deleted. Be wary of isolated points with a Cook’s distance of over 0.5 (beyond the dashed contour lines). This plot has a few points that fit that description; they should be investigated, and depending on what you find, they may need to be removed or the data may need to be transformed!</p>
<img src="{{ site.baseurl }}/img/faqpicture13.png" style="height: 300px;">
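<p>The same quantities can be calculated directly, which makes it easier to see which rows to investigate. The sketch below uses <code class="highlight-red">hatvalues()</code> for leverage and <code class="highlight-red">cooks.distance()</code> for Cook’s distance on a model fitted to the built-in <code class="highlight-red">cars</code> dataset, and flags rows above the 0.5 threshold; that threshold is a rule of thumb, not a strict cut-off.</p>
<div class="code-container">
<code>cars.model <- lm(dist ~ speed, data = cars)<br/><br/><span class="comment"># Leverage of each observation</span><br/>lev <- hatvalues(cars.model)<br/><br/><span class="comment"># Cook's distance: how much the fitted model changes if that row is removed</span><br/>cook <- cooks.distance(cars.model)<br/><br/><span class="comment"># Rows above the 0.5 rule of thumb (this may be none)</span><br/>influential <- which(cook > 0.5)<br/>cars[influential, ]<br/><br/><span class="comment"># The five most influential rows, largest first</span><br/>head(sort(cook, decreasing = TRUE), 5)</code>
</div>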
<br/><br/><p><b>What if your data don’t fit your model’s assumptions?</b><br/>There are many ways to approach the problem of unmet assumptions. First, consider whether you’ve designed the correct model. Think about the ecological reasoning behind your decisions and make sure it is the most logical. Review some other design options to ensure you’ve created the best model. Second, it’s important to consider how far your model is from meeting its assumptions: a test p-value just below or just above 0.05 makes little practical difference! If you are confident in your model, stick with it. If your model clearly fails to meet its assumptions, it may be time to transform your data. This can be done through a log or square root transformation of one or both of your variables. For example:</p>
<div class="code-container">
<code>area.model <- lm(log(leaf_area) ~ stem_area, data = graph1)</code>
</div>
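<p>One way to organise the trial and error is to fit the untransformed and transformed models side by side and compare the same diagnostic for each. The sketch below does this with the Shapiro-Wilk test on models fitted to the built-in <code class="highlight-red">cars</code> dataset; it is only an illustration, and you should still inspect the diagnostic plots for each version. Note that <code class="highlight-red">log()</code> only works for values above zero, and <code class="highlight-red">sqrt()</code> only for values of zero or more.</p>
<div class="code-container">
<code><span class="comment"># Untransformed, log-transformed and square-root-transformed response</span><br/>model.raw <- lm(dist ~ speed, data = cars)<br/>model.log <- lm(log(dist) ~ speed, data = cars)<br/>model.sqrt <- lm(sqrt(dist) ~ speed, data = cars)<br/><br/><span class="comment"># Shapiro-Wilk p-value for the residuals of each model</span><br/>shapiro.p <- sapply(list(raw = model.raw, log = model.log, sqrt = model.sqrt),<br/><span class="indent">function(m) shapiro.test(resid(m))$p.value)</span><br/>shapiro.p</code>
</div>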
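<p>Finding a transformation that works can take a bit of trial and error, so re-check your model’s assumptions after each attempt. Keep in mind that a transformation changes how you interpret the results: in the model above, for example, the coefficients describe changes in log(leaf area) rather than in leaf area itself.</p>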
<p>In all cases, it’s important that you can back up any changes to your model with logical reasoning, so that your conclusions are well-informed.</p>
</div>
<button class="accordion">Other tips and resources</button>
<div class="panel">
<p><b>Helpful tutorials</b></p>
<ul>
<li><a href="https://ourcodingclub.github.io/2018/04/06/model-design.html#models" target="_blank">Intro to model design</a></li>
<li><a href="https://ourcodingclub.github.io/2017/02/28/modelling.html" target="_blank">From distributions to linear models</a></li>
</ul>
</div>
</div>
</div>
<div class="credit-footer">
<p>Do you have feedback or a question for us? <a href="https://docs.google.com/forms/d/e/1FAIpQLSeRxJnDZXRmVao8r8MrpjEKkOl62EC1pYAkKcjepXO3ZR8GNQ/viewform?usp=sf_link" target="_blank">Please fill out our survey!</a></p>
<p><img src="{{ site.baseurl }}/img/spa.jpg"></p>
<p>This page was developed by Izzy Rich and Sam Kellerhals and was supported through the Student Partnership Agreement at the University of Edinburgh.</p>
</div>
<!-- Accordion Script -->
<script>
var acc = document.getElementsByClassName("accordion");
var i;
for (i = 0; i < acc.length; i++) {
acc[i].addEventListener("click", function() {
this.classList.toggle("active");
var panel = this.nextElementSibling;
if (panel.style.maxHeight) {
panel.style.maxHeight = null;
} else {
panel.style.maxHeight = panel.scrollHeight + "px";
}
});
}
</script>
</div>