-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathAnnotations.qmd
More file actions
503 lines (381 loc) · 20.5 KB
/
Annotations.qmd
File metadata and controls
503 lines (381 loc) · 20.5 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
---
title: "Annotated ggplot2 charts"
author: "Pablo Leon"
date: "`r Sys.Date()`"
format:
html:
toc: true
html-math-method: katex
code-fold: true
editor: source
editor: visual
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=TRUE,warning = FALSE, message = FALSE,
dpi = 180, fig.width = 12, fig.height = 5)
library(tidyverse)
library(readxl)
library(here)
library(janitor)
library(gt)
```
## About
This is a small tutorial on how to annotate custom ggplot charts. GGPLOT2 includes shadowed boxes, arrows and text to highlight specific data points in charts, helping to drive the story telling, communicating meaningful information using plots and visualization elements in R.
## 1. Download A&E data
The plot we will build and annotate will use NHS England A&E England Attendances data. We choose to download England Time Series data available from NHS England Statistics section main website.
### 1.1 NHS England A&E Attendances and Emergency Admissions data
We download from NHS England the existing publicly available A&E England Attendances and Emergency Admissions from thier main website: <https://www.england.nhs.uk/statistics/statistical-work-areas/ae-waiting-times-and-activity/>
We use GET() function from {httr} library to download A&E data from NHS England URL
The Weekly and Monthly A&E Attendances and Emergency Admissions collection collects the total number of attendances in the specified period for all A&E types, including Urgent Treatment Centres, Minor Injury Units and Walk-in Centres, and of these, the number discharged, admitted or transferred within four hours of arrival.
### 1.2 Download tidy A&E data
We can make use of the {janitor} package to download A&E data and to obtain column names that ease further daat manipulation
```{r Download Clean AE data}
library(httr)
# CLEAN DATA SET
# Cleaned data set with all variable as Integers
url1<-'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2023/10/Monthly-AE-Time-Series-September-2023.xls'
GET(url1, write_disk(tf <- tempfile(fileext = ".xls")))
MONTHLY_AE <- read_excel(tf, 2L,
col_names = TRUE,
col_types = c("date","numeric","numeric","numeric",
"numeric","numeric","numeric",
"numeric","numeric","numeric",
"numeric","numeric","numeric"),skip =13) %>%
clean_names() %>%
select(period,
type_1 = type_1_departments_major_a_e_2,
type_2 = type_2_departments_single_specialty_3,
type_3 = type_3_departments_other_a_e_minor_injury_unit_4) %>%
# Apply right format to get AE attendances as Numeric values
mutate(
type_1_num = as.integer(type_1),
type_2_num = as.integer(type_2),
type_3_num = as.integer(type_3)) %>%
# Retain just formatted colmns
select(period,type_1_num,type_2_num,type_3_num)
str(MONTHLY_AE)
MONTHLY_AE
```
Check start and end period of this A&E Time series data:
```{r Start end period, echo=FALSE,}
Min_period <- min(MONTHLY_AE$period)
Min_period
Max_period <- max(MONTHLY_AE$period)
Max_period
```
This data set from NHS England, includes includes England Time Series data for Type 1, Type 2 and Type 3 Attendances covering the period that starts on `r paste0(Min_period)` and ends on `r paste0(Max_period)` as we can see in the table below:
## 2. Present tables using GT package
The gt package is all about making it simple to produce nice-looking display tables. We can use it to improve how we present the previous table:
First we will filter the original data set to include just 2023 data
```{r Filter original AE data}
MONTHLY_AE_sub <- MONTHLY_AE %>%
filter(period >= '2023-01-01')
MONTHLY_AE_sub
```
```{r basic gt table example}
library(gt)
# Turn previous table into a gt table
MONTHLY_AE_tbl <- MONTHLY_AE %>%
filter(period >= '2023-01-01')
ae_gt_tbl <- gt(MONTHLY_AE_tbl) %>%
tab_header(
title = md("**A&E Attendances in England 2023**"),
subtitle = md("By type (*Type I*,*Type II*,*Type III)*")
) %>%
tab_source_note(
source_note = "Source: NHS England A&E Attendances and Emergency Admissions data"
) |>
tab_source_note(
source_note = md("England Time Series monthly data")
) %>%
fmt_number(
columns = c(type_1_num,type_2_num,type_3_num),
sep_mark = ",",
decimals = 0
)
ae_gt_tbl
```
## 3. Subset Type I attendances
We will focus on Type I attendances for this plot
```{r subset_type_I}
library(gt)
# Turn previous table into a gt table
TypeI <- MONTHLY_AE %>%
select(period,type_1_num)
TypeI
```
## 4. Building ggplot2 charts
We start by building a bare minimum ggplot2 line chart. This basic line chart includes `ggplot2()` function with aes() argument including x and y variables, and the geom.
There are four main parts of a basic ggplot2 visualization: the ggplot() function, the data parameter, the aes() function, and the geom. Get an accurate description from each of these elements from: <https://www.sharpsightlabs.com/blog/ggplot2-tutorial/>
**The ggplot function** The ggplot() function is the core function of ggplot2. It initiates plotting.
```{r minimum line chart}
Basic_plot <- ggplot(data = TypeI, aes(x = period, y = type_1_num)) +
geom_line()
Basic_plot
```
#### 4.1 Add title and subtitle
The first element we want to include in our chart is the tile and subtitle
```{r adding title and subtitle labels}
Basic_plot <- ggplot(data = TypeI, aes(x = period, y = type_1_num)) +
geom_line() +
labs(title = "Type I A&E Attendances in England",
subtitle = "2010-2023 Period")
Basic_plot
```
#### 4.2 Change axis labels
We can use labs to change also x and y axis labels
```{r change axis labels and subtitle}
Basic_plot <- ggplot(data = TypeI, aes(x = period, y = type_1_num)) +
geom_line() +
labs(title = "Type I A&E Attendances in England",
subtitle = "2010-2023 Period",
x = "Date",
y = "Total Attendances")
Basic_plot
```
#### 4.3 We might need to change Date format
After importing the data into R, date variables are defined as POSIX date, we can change them to standard Date R format.
```{r}
str(TypeI)
TypeI_date_format <- TypeI %>% mutate(Datef = as.Date(period))
TypeI_date_format
```
#### 4.4 Define date breaks for x axis
We have several years in our data set
```{r}
library(lubridate)
Years_in_data <- TypeI_date_format %>%
select (Datef) %>%
mutate(Year = year(Datef)) %>%
select(Year) %>%
distinct()
Years_in_data
```
```{r}
Total_years <- count(distinct(Years_in_data))
Total_years
```
There are `r paste0(Total_years)` years of data in the original data set.
In the chart below, we can see the X axis only displays two years out of the `r paste0(Total_years)` available in the data set. The way to fix it is by using scale_x_data() function defining labels as years '%Y' with date_breaks defined as '1 year' so X axis will display every single year in the data set.
```{r Dafined x axis breaks}
Basic_plot_format <- ggplot(data = TypeI_date_format, aes(x = Datef, y = type_1_num)) +
geom_line() +
labs(title = "Type I A&E Attendances in England",
subtitle = "2010-2023 Period",
x = "Date",
y = "Total Attendances") +
scale_x_date(date_labels="%Y",date_breaks ="1 year")
Basic_plot_format
```
#### 4.5 Apply theme to the plot
We can apply defined theme() setup to the chart. There are several to choose from:(theme_bw(), theme_dark(), them_grey(), theme_light(), theme_minimal(), theme_classic() ) . I will choose theme_minimal() as this will provide a lower inks ratio outcome.
We apply then theme_minimal() to the existing ggplot chart
```{r Applying theme_minimal theme to chart}
Basic_plot_format <- ggplot(data = TypeI_date_format, aes(x = Datef, y = type_1_num)) +
geom_line() +
labs(title = "Type I A&E Attendances in England",
subtitle = "2010-2023 Period",
x = "Date",
y = "Total Attendances") +
scale_x_date(date_labels="%Y",date_breaks ="1 year") +
theme_minimal()
Basic_plot_format
```
#### 4.6 Finally we apply colour to the line chart
We apply inside geom_line() function two parameters colour to add colour to the existing line and also we include size parameter equal to 1 to increase the size of the line used in the chart:
```{r add line chart}
Basic_plot_format <- ggplot(data = TypeI_date_format, aes(x = Datef, y = type_1_num)) +
geom_line(color="#69b3a2", size =1) +
labs(title = "Type I A&E Attendances in England",
subtitle = "2010-2023 Period",
x = "Date",
y = "Total Attendances") +
scale_x_date(date_labels="%Y",date_breaks ="1 year") +
theme_minimal()
Basic_plot_format
```
Then we can save this plot as an initial basic chart
```{r save plot}
ggsave("plots/01_AE_Attendances_basic_plot.png", width = 6, height = 4)
```
Another variation would be to include a line chart `geom_line()` and a dot chart `geom_point()` combined
```{r add dot chart}
Basic_plot_format <- ggplot(data = TypeI_date_format, aes(x = Datef, y = type_1_num)) +
geom_line(color="#69b3a2", size =1) +
geom_point(shape=21, color="black", fill="#69b3a2", size=1.5) +
labs(title = "Type I A&E Attendances in England",
subtitle = "2010-2023 Period",
x = "Date",
y = "Total Attendances") +
scale_x_date(date_labels="%Y",date_breaks ="1 year") +
theme_minimal()
Basic_plot_format
```
```{r save geom_point plot}
ggsave("plots/02_AE_Attendances_line_point_plot.png", width = 6, height = 4)
```
## 5. Add annotations to plot
The next step will include several annotations to the chart to help the storytelling part when using this chart.
### 5.1 Using annotate() function
From {ggplot2} package we use the annotate() function to create an annotation layer in the chart. This function is highly versatile as it allows us to include both text and geoms such as lines and arrows.
We then only require one extra function geom_segment() also from ggplot2 to draw straight vertical or horizontal reference lines in the chart area.
#### 5.1.1 Adding rectangle annotation
We can start by adding a rectangle to the previous chart
```{r add rectangle to chart}
Basic_plot_format <- ggplot(data = TypeI_date_format, aes(x = Datef, y = type_1_num)) +
geom_line(color="#69b3a2", size =1) +
geom_point(shape=21, color="black", fill="#69b3a2", size=1.5) +
labs(title = "Type I A&E Attendances in England",
subtitle = "2010-2023 Period",
x = "Date",
y = "Total Attendances") +
scale_x_date(date_labels="%Y",date_breaks ="1 year") +
theme_minimal() +
annotate('rect',
xmin = as.Date("2018-11-01"), xmax = as.Date("2019-11-01"),
ymin = 700000, ymax = 1100000,
alpha = 0.5,
fill = 'grey40',
col = 'black'
)
Basic_plot_format
```
#### 5.1.2 Adding text annotation
We can start by adding a rectangle to the previous chart Adding a text annotation:
COVID pandemic
- COVID-19 was confirmed to be present in the UK by the end of January 2020
- A legally-enforced Stay at Home Order, or lockdown, was introduced on 23 March 2021, banning all non-essential travel and contact with other people, and shut schools, businesses, venues and gathering places.
- A third wave of daily infections began in July 2021 due to the arrival and rapid spread of the highly transmissible SARS-CoV-2 Delta variant.\[62\]
So we can establish the COVID-19 pandemic period roughly from 01 January 2020 to 01 August 2021
```{r Text annotation}
Basic_plot_format <- ggplot(data = TypeI_date_format, aes(x = Datef, y = type_1_num)) +
geom_line(color="#69b3a2", size =1) +
geom_point(shape=21, color="black", fill="#69b3a2", size=1.5) +
labs(title = "Type I A&E Attendances in England",
subtitle = "2010-2023 Period",
x = "Date",
y = "Total Attendances") +
scale_x_date(date_labels="%Y",date_breaks ="1 year") +
theme_minimal() +
annotate('rect',xmin = as.Date("2020-01-01"), xmax = as.Date("2021-08-01"),ymin = 500000, ymax = 1100000,
alpha = 0.1 , fill = 'blue',col = 'black') +
annotate('text',x = as.Date("2020-09-01"),y = 1120000,label = "COVID-19")
Basic_plot_format
```
#### 5.1.3 Adding arrows to highlight single values
We can also add a curved line with an arrow at the end pointing to specific values. We need to include the arrow function to the curve annotation created earlier: `arrow = arrow(length = unit(0.4,'cm')))`
Note that the annotate() function is a good alternative that can reduces the code length for simple cases.
## 6. Annotated ggplot chart including Arrows and rectangles
This would be the annotated chart including shadowed rectangle are and an arrow with text to highlight specific data points in the plot area.
```{r Curved line annotation}
Curved_line_annotation <- ggplot(data = TypeI_date_format, aes(x = Datef, y = type_1_num)) +
geom_line(color="#69b3a2", size =1) +
geom_point(shape=21, color="black", fill="#69b3a2", size=1.5) +
labs(title = "Type I A&E Attendances in England",
subtitle = "2010-2023 Period",
x = "Date",
y = "Total Attendances") +
scale_x_date(date_labels="%Y",date_breaks ="1 year") +
theme_minimal() +
annotate('rect',xmin = as.Date("2020-01-01"), xmax = as.Date("2021-08-01"),ymin = 500000, ymax = 1100000,
alpha = 0.1 , fill = 'blue',col = 'black') +
annotate('text',x = as.Date("2020-09-01"),y = 1120000,label = "COVID-19") +
annotate('curve',
x = as.Date("2018-10-01"),
xend = as.Date("2019-12-01"),
y = 620000,
yend = 520000,
linewidth = 1,
curvature = 0.5,
# Adding an Arrow to my curvature
arrow = arrow(length = unit(0.4,'cm'))) +
annotate('text',x = as.Date("2018-06-27"),y = 700000,label = "Lowest \n Attendances level")
Curved_line_annotation
```
```{r save annotations plot}
ggsave("plots/07_AE_Attendances_annotations_plot.png", width = 6, height = 4)
```
## 7. Including vertical reference lines
In this new section we will include three vertical lines for each of the **lock downs**, the functions below can also be used to draw horizontal reference lines.
Set of geoms used in charts to annotate different sections:
- **rectangle:** when annotate() argument takes 'react' value. This displays a rectable in the GGPLOT chart defined by x and Y coordinates, and also color and transparency argument (alpha).
- Reference:<https://r-graphics.org/recipe-annotate-rect>
- **text:** when annotate() argument takes 'text' value. This argument displays a text in the chart. Text content defined by label argument, and color, text position in the chart defined by x and y coordinates.
- Reference:<https://r-graphics.org/recipe-annotate-text>
- **curve:**when annotate() argument takes 'curve' value. This displays a curved segment.
- Reference:<https://stackoverflow.com/questions/53858124/r-how-to-annotate-curves>
- **arrow:**when annotate() argument takes 'arrow' value. This displays a pointy end segment.
- Reference:<https://r-graphics.org/recipe-annotate-segment>
Draw lines across the chart:
- **geom_vline:** This geom allows you to annotate the plot with vertical lines. It has a xintercept argument and also a color and size arguments.
- Reference: <https://ggplot2.tidyverse.org/reference/geom_abline.html>
- **geom_segment:** geom_segment will only draw between specific end points. It helps to make a data frame with the relevant information for drawing the lines.
- Reference: <https://ggplot2.tidyverse.org/reference/geom_segment.html>
```{r lockdown dates}
Curved_line_annotation <- ggplot(data = TypeI_date_format, aes(x = Datef, y = type_1_num)) +
geom_line(color="#69b3a2", size =1) +
geom_point(shape=21, color="black", fill="#69b3a2", size=1.5) +
labs(title = "Type I A&E Attendances in England",
subtitle = "2010-2023 Period",
x = "Date",
y = "Total Attendances") +
scale_x_date(date_labels="%Y",date_breaks ="1 year") +
theme_minimal() +
annotate('rect',xmin = as.Date("2020-01-01"), xmax = as.Date("2021-08-01"),ymin = 500000, ymax = 1100000,
alpha = 0.1 , fill = 'blue',col = 'black') +
annotate('text',x = as.Date("2020-09-01"),y = 1130000,label = "COVID-19", color="red",size = 5) +
annotate('curve',
x = as.Date("2018-10-01"),
xend = as.Date("2019-12-01"),
y = 620000,
yend = 520000,
linewidth = 1,
curvature = 0.5,
# Adding an Arrow to my curvature
arrow = arrow(length = unit(0.2,'cm'))) +
annotate('text',x = as.Date("2018-06-27"),y = 700000,label = "Lowest \n Attendances level") +
#ADD LOCKDOWN DATES REFERENCE LINES
# Replacing vline by geom_segment (it allows controlling for reference lines lenght)
# First lockdown 26 March 2020
geom_segment(aes(x = as.Date("2020-03-26"), y = 500000, yend = 1100000,
xend = as.Date("2020-03-26"), colour = "segment"),
linewidth=1.5,color="orange") +
# Second Lockdown 5 November 2020
geom_segment(aes(x = as.Date("2020-11-05"), y = 500000, yend = 1100000,
xend = as.Date("2020-11-05"), colour = "segment"),
linewidth=1.5,color="orange") +
# Third lockdown 6 January 2021
geom_segment(aes(x = as.Date("2021-01-06"), y = 500000, yend = 1100000,
xend = as.Date("2021-01-06"), colour = "segment"),
linewidth=1.5,color="orange") +
# Text annotations for each lockdown
annotate('text',x = as.Date("2019-07-20"),y = 1200000,label = "First lockdown") +
annotate('text',x = as.Date("2019-05-26"),y = 1164000,label = "01/08/2020") +
annotate('text',x = as.Date("2022-10-20"),y = 1100000,label = "Second lockdown") +
annotate('text',x = as.Date("2023-04-20"),y = 1070000,label = "05/11/2020") +
annotate('text',x = as.Date("2022-10-20"),y = 980000,label = "Third lockdown") +
annotate('text',x = as.Date("2023-04-20"),y = 950000,label = "06/01/2021") +
# Adding arrows to lockdowns labels
# First lockdown arrow
annotate('curve',x = as.Date("2019-06-01"),xend = as.Date("2020-02-20"),
y = 1140000,yend = 1090000,
linewidth = 1,curvature = 0.6,arrow = arrow(length = unit(0.2,'cm'))) +
# Second lockdown arrow
annotate('curve',x = as.Date("2022-01-01"),xend = as.Date("2020-11-01"),y = 1080000,yend = 970000,
linewidth = 1,curvature = 0.6,arrow = arrow(length = unit(0.2,'cm'))) +
# Third lockdown arrow
annotate('curve',x = as.Date("2022-01-01"),xend = as.Date("2021-02-01"),y = 990000,yend = 910000,
linewidth = 1,curvature = 0.6,arrow = arrow(length = unit(0.2,'cm')))
Curved_line_annotation
```
### 7.1 List of lockdown relevant dates:
*First national lockdown (March to June 2020)*
England was in national lockdown between late March and June 2020. Lockdown measures legally came into force for the first time on **26 March 2020**.
First nationwide lockdown was legally effective from 1:00pm on **26th March 2020**.
Initially, all "non-essential" high street businesses were closed and people were ordered to stay at home, permitted to leave for essential purposes only, such as buying food or for medical reasons. Starting in May 2020, the laws were slowly relaxed. People were permitted to leave home for outdoor recreation (beyond exercise) from 13 May. On 1 June, the restriction on leaving home was replaced with a requirement to be home overnight, and people were permitted to meet outside in groups of up to six people.
*Second national lockdown (November 2020)*
On **5 November 2020**, national restrictions were reintroduced in England. During the second national lockdown, non-essential high street businesses were closed, and people were prohibited from meeting those not in their "support bubble" inside. People could leave home to meet one person from outside their support bubble outdoors.
*Third national lockdown (January to March 2021)*
Following concerns that the four-tier system was not containing the spread of the Alpha variant, national restrictions were reintroduced for a third time on **6 January 2021**.
The rules during the third lockdown were more like those in the first lockdown. People were once again told to stay at home. However, people could still form support bubbles (if eligible) and some gatherings were exempted from the gatherings ban (for example, religious services and some small weddings were permitted).
Reference for National lockdown dates: <https://commonslibrary.parliament.uk/research-briefings/cbp-9068/>