-
Notifications
You must be signed in to change notification settings - Fork 3
Expand file tree
/
Copy path2-7_Basic_plotting.qmd
More file actions
255 lines (178 loc) · 9.17 KB
/
2-7_Basic_plotting.qmd
File metadata and controls
255 lines (178 loc) · 9.17 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
# Basic plotting {.unnumbered}
R is a very powerful tool for producing custom graphics ranging from basic scatterplots to 3D plots. In this introduction to plotting we will learn how to create basic graphs like scatterplots, line graphs and bar charts, as well as histograms and boxplots, all of which are available in the base package. Different methods for saving plots will be introduced. We will use the data ToothGrowth throughout the following examples:
```{r}
data(ToothGrowth)
str(ToothGrowth)
```
## Scatterplots and Line Graphs
### Scatterplots
The plot function is the most basic function for plotting continuous data. If we use plot on one variable, the values of the variable will be plotted against their index, i.e., the order of the data within the object they're stored:
```{r}
plot(ToothGrowth$len)
```
With two variables plot puts the first variable on the x-axis and the second variable on the y-axis (y versus x):
```{r}
plot(ToothGrowth$dose, ToothGrowth$len)
```
Many parameters are available for customizing your plot. See ?par for an extensive list. We will just look at a couple here, like changing the axis labels:
```{r}
plot(ToothGrowth$dose, ToothGrowth$len, xlab = "Dose", ylab = "Tooth Length")
```
We can also add a title:
```{r}
plot(ToothGrowth$dose, ToothGrowth$len, xlab = "Dose", ylab = "Tooth Length", main = "Guinea Pig Tooth Growth by Dosage")
```
and even change the colors and characters of specific points:
```{r}
plot(ToothGrowth$dose,
ToothGrowth$len,
xlab = "Dose",
ylab = "Tooth Length",
main = "Guinea Pig Tooth Growth by Dosage",
col = ToothGrowth$supp,
pch = as.numeric(ToothGrowth$supp))
```
The legend function adds a legend so we can easily identify which points represent which groups:
```{r}
plot(ToothGrowth$dose, ToothGrowth$len, xlab = "Dose", ylab = "Tooth Length", main = "Guinea Pig Tooth Growth by Dosage", col = ToothGrowth$supp, pch = as.numeric(ToothGrowth$supp))
legend(1.5, 15, c("OJ", "VC"), col = 1:2, pch = 1:2)
```
The location of the legend can also be specified by stating a region of the plot e.g. "bottomright" to place it in the very bottom righthand corner of the plot.
The argument cex stands for "character expansion" and will enlarge the size of points:
```{r}
plot(ToothGrowth$dose, ToothGrowth$len, xlab = "Dose",
ylab = "Tooth Length", main = "Guinea Pig Tooth Growth by Dosage",
col = ToothGrowth$supp, pch = as.numeric(ToothGrowth$supp),
cex = 2)
legend(1.5, 15, c("OJ", "VC"), col = 1:2, pch = as.numeric(ToothGrowth$supp), cex=2)
```
### Line Graphs
To create a plot using lines instead of points, we can actually still use plot, but we need to specify the type of plot we want, e.g.:
```{r}
plot(ToothGrowth$len, type = "l") # note type is the letter l for "line"
```
```{r}
plot(ToothGrowth$len, type = "b") # note type is the letter b for "both"
```
We can add additional lines by calling lines:
```{r}
plot(ToothGrowth$len, type = "l")
lines(ToothGrowth$len + 2)
```
The line type, width, and color can be adjusted as follows:
```{r}
plot(ToothGrowth$len, type = "l")
lines(ToothGrowth$len + 2, lty = 3, lwd = 3, col = "darkgreen")
```
## Bar Charts and Histograms
### Bar Charts
Viewing categorical data in a scatterplot often doesn't make sense. For example, if we want to see the number of guinea pigs who received each dosage, the following doesn't provide this information in the most intuitive format:
```{r}
plot(ToothGrowth$dose)
```
```{r}
table(ToothGrowth$dose)
```
```{r}
barplot(table(ToothGrowth$dose))
```
We can plot proportions instead of frequencies, as is often helpful (though not in this case):
```{r}
props <- table(ToothGrowth$dose)/nrow(ToothGrowth)
props
```
```{r}
barplot(props)
```
Note that the y-axis doesn't quite extend to the height of the bars. The range of the y-axis is easily change in plots by adjusting the parameter ylim (similarly xlim for the x-axis):
```{r}
barplot(props, ylim = c(0, .4))
```
Yet again, there are many parameters available for customizing the plot. Let's try changing the width and colors of the bars:
```{r}
barplot(props, ylim = c(0, .4), width = .2, col = c("blue", "white", "red"))
```
And we can plot the bars horizontally, if preferable:
```{r}
barplot(props, xlim = c(0, .4),
width = .2, col = c("blue", "white", "red"),
horiz = TRUE)
```
Note that we adjusted the range of the x-axis now instead of the y-axis.
### Histograms
A bar chart is slightly different from a histogram. We use bar charts to plot frequencies (or proportions) of values present in categories. Continuous data don't have categories, so to make a similar plot we have to create "categories". They are often called bins or cells and are created by delining ranges in which the frequencies are calculated. Thankfully, R will do this, and plot the histogram, when we call the function `hist`:
```{r}
hist(ToothGrowth$len)
```
Note that in a histogram the sides of the cells touch, whereas they do not in a bar chart. We can also plot the proportions, now called "density", by changing a single parameter:
```{r}
hist(ToothGrowth$len, freq = FALSE)
```
In the next graph we plot the histogram with a different bin width by specifying where to make the breakpoints between the bars:
```{r}
hist(ToothGrowth$len, freq = FALSE, breaks = seq(0, 35, 2))
```
Note how the granularity of the plot changes with the width of the bin. We can also change the breaks by delining the number of cells to use:
```{r}
hist(ToothGrowth$len, freq = FALSE, breaks = 25)
```
The same parameters that change the axes and labels are applicable here:
```{r}
hist(ToothGrowth$len, freq = FALSE, breaks = seq(0, 35, 2),
main = "Histogram of Tooth Growth", xlab = "Tooth Length",
ylim = c(0, .07), col = "orange")
```
## Boxplots
A boxplot is another graph we use to view the distribution of a continuous variable. It displays the specified quantiles of the data in the following way:
```{r}
boxplot(ToothGrowth$len)
```
A single boxplot is nice for visualizing the distribution of one variable, but plotting several side by side allows for a simultaneous comparison of distributions. Here we will use the formula notation. The format `y ~ x`, where `y` is a numeric vector and `x` is a factor, tells the boxplot function that we want to separate the continuous `y` values into as many boxplots as there are levels of the factor `x`.
```{r}
boxplot(ToothGrowth$len~ToothGrowth$supp)
```
Just like in the bar chart, we can plot the boxes horizontally and change the colors, labels and axes:
```{r}
boxplot(ToothGrowth$len~ToothGrowth$supp,
col = c("darkgreen", "purple"),
ylab = "Supplement Type", xlab = "Tooth Length",
ylim = c(0, 40), horizontal = TRUE)
```
The settings of the parameters of a plot, including the margins, can be specified by calling `par()` before creating a plot. One nice setting we'll introduce here is how to put multiple plots into one graphics device:
```{r}
par(mfrow = c(1, 2))
boxplot(ToothGrowth$len~ToothGrowth$supp, main = "Boxplot",
xlab = "Supplement Type")
hist(ToothGrowth$len, main = "Histogram", xlab = "Tooth Length")
```
The settings will stay fixed until we close the device (e.g. by calling `dev.off()`) or reset `par` (e.g. `par(mfrow = c(1, 1)))`. It is highly recommended to get familiar with the help page for par to control graphing parameters to create custom graphs. Another helpful function is axis, which allows the specification of custom axis labels (e.g. where to put them and at what angle). See ?axis for more information.
## Saving Plots
Up until now we have been creating graphs in a single graphics window that gets overwritten every time we create new plots. If we want to save the images created, we can use "Export" in RStudio to specify what type of file we want to write, its size, location, and name.
Alternatively we can use functions such as png, pdf, and jpeg. Since we can save our code and not our mouse clicks, the use of these functions avoids any confusion about how an image was written, allows for quick simple changes, and provides a convenient way to reproduce multiple, similar plots. For example:
```{r, eval=F}
# note: run getwd() to see the working directory -
# that is the directory to which files will be written
pdf("plot1.pdf", width = 6, height = 9)
boxplot(ToothGrowth$len~ToothGrowth$supp,
col = c("darkgreen", "purple"),
xlab = "Supplement Type", ylab = "Tooth Length",
ylim = c(0, 40), horizontal = TRUE)
dev.off()
pdf("plot2.pdf", width = 4, height = 8)
boxplot(ToothGrowth$len~ToothGrowth$supp,
col = c("darkgreen", "purple"),
xlab = "Supplement Type", ylab = "Tooth Length",
ylim = c(0, 40), horizontal = TRUE)
dev.off()
```
Note that you have to call dev.off() to finish writing the image to the file. If more than one device is open, the return of that function will display a number larger than 1.
The units for the width and height arguments of pdf are in inches (7 by 7), but the default for e.g. png is pixels (there is an argument units to change it to in, cm, or mm):
```{r, eval=F}
png("plot1.png", units = "in", res = 300,
width = 6, height = 9)
boxplot(ToothGrowth$len~ToothGrowth$supp,
col = c("darkgreen", "purple"),
xlab = "Supplement Type", ylab = "Tooth Length",
ylim = c(0, 40), horizontal = TRUE)
dev.off()
```