-
Notifications
You must be signed in to change notification settings - Fork 7
Expand file tree
/
Copy pathggplot-workshop-TEMPLATE.Rmd
More file actions
139 lines (78 loc) · 6.18 KB
/
ggplot-workshop-TEMPLATE.Rmd
File metadata and controls
139 lines (78 loc) · 6.18 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
---
title: "2. Introduction to {ggplot2}"
author: "Britta Schumacher"
date: "Last updated: `r format(Sys.time(), '%d %B, %Y')`"
output:
html_document:
theme: yeti
toc: yes
toc_float: true
---
*FIRST*: Many thanks to Alison Horst for her gorgeous [aRt](https://github.com/allisonhorst/stats-illustrations).
# Data Visualization with `ggplot2`
### What is `ggplot2`?
<center>

</center>
`ggplot2` is the bread and butter of data visualization; it takes clean, organized, manipulated data, and (with your direction) builds beautiful, & more importantly, *communicative*, plots. `ggplot2` is based on the *grammar of graphics*, which asserts that we can build every graph from the same few components: 1) a clean and tidy **data set**; 2) a set of **geoms** that map out data points; and 3) a **coordinate system** (in the broadest sense; i.e., how will the data be mapped onto a graphical surface).
To actually display data values, we map out our variables to aesthetic things like **size**, **color**, and **x** and **y** locations; we also tell `ggplot2` the type of visualization we are interested in building (e.g., bar graph, box plot, line graph, density plots, etc.). `ggplot2` opens up [data science](https://r4ds.had.co.nz/introduction.html) to broader audiences and helps all of us communicate our science. Download [this](https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf) cheat sheet and save it somewhere accessible--it's an incredible tool to refer back to!
`ggplot2` works with `data.frames`, the data type we built in our previous `tidyverse` workshop. The data we feed into `ggplot2` consists of rows and columns that "live peacefully" together in a tidy `data.frame`. This means, often, we need to do some data cleaning up front to get our data into the **tidy** format we need for visualization.
# Now let's practice on some REAL*ish* data!
### Load the `tidyverse` and any additional required packages:
```{r}
```
### Explore:
We should first familiarize ourselves with the data.
```{r, eval = FALSE}
```
### Wrangle:
This dataset is **pretty big**--we'll want to wrangle it so that it only includes the information that we're interested in. We will:
a. filter for the African continent
b. select relevant columns of data
c. rename columns
d. create new columns.
Let's create an efficient workflow by combining **ALL** of these data wrangling steps into one; i.e., let's review harnessing the great POWERS of `tidyverse`!
```{r}
```
With this simplified and cleaned data set, we're ready to explore! Let's first isolate data we want to visualize by:
a. selecting Malawi, Tanzania, Mozambique, Zimbabwe, and Zambia;
b. grouping observations by country;
b. finding the average life expectancy, and per capita GDP for each country combination
```{r}
```
### Column graph
Now that we have our data summarised and in tidy format, we're ready to make a plot! We want to:
a. create a column graph showing the summarised by country
b. make it pretty
**Note:** Only the first 3 lines of the following code are necessary to make the plot. Everything else simply modifies the appearance and make it a bit more presentable. There are *tons* of ways to customize plots -- we explore only a few options below.
```{r, fig.align = 'center', fig.width = 15, fig.height = 10}
```
### Line graph
Let's visualize this data in a different way. Let's:
a. create a line graph that shows how GDP per capita and life expectancy relate by country
b. make it pretty
```{r fig.align = 'center', fig.width = 15, fig.height = 10}
```
### Histogram
Let's visualize this data in a different way; Let's:
a. create a histogram that shows the range of life expectancy by country
b. make it pretty
```{r}
```
### Plots, plots, plots, plots, plots, plots, plots!
There are MANY different ways to visualize the same data:
<center>

</center>
... In the end we just have to choose the method that makes the most sense for our audience, our research questions, and our data!
Additionally, aRtists everywhere are using R to make ridiculously cool pieces of [generative](https://en.wikipedia.org/wiki/Generative_art) and patterned art. Check out some talented RLadies [here](https://ijeamaka-anyene.netlify.app/) and [here](https://djnavarro.net/projects/)! Plus, [#TidyTuesday](https://github.com/rfordatascience/tidytuesday) submissions can be [absolutely](https://tanyadoesscience.com/project/tidytuesday/), [wildly](https://martindevaux.com/2021/01/tidytuesday-art-collections/), [brilliantly](https://codingwithavery.com/posts/2020-08-12-tidy-tuesday-avatar/), good.
# Bonus
What's *especially* neat about `ggplot2` is that it interfaces seamlessly with `sf`, a package for spatial data wrangling, analysis, and viz. `sf` treats its spatial objects as `data.frames` meaning.... we can use all of the magic & power of the `tidyverse` to play with the data! Plus, these data look just like any other data.frame, except attached to them is a column `geometry` that houses the coordinate information for the observation!
This is how we build the maps in our OSDS reports!
There are many, many ways to make maps beautiful maps in `ggplot` using `sf`. Anyone who says R isn't built for spatial viz just hasn't tried hard enough--check out [this](https://timogrossenbacher.ch/2019/04/bivariate-maps-with-ggplot2-and-sf/) if you don't believe me (just about the most gorgeous, creative, and well-built map I've ever seen).
-----------
# Resources
* Emily Burchfield's data viz materials, [here](https://www.emilyburchfield.org/courses/data_viz/intro_to_ggplot_tutorial).
* [Secrets of a happy graphing life](https://stat545.com/secrets.html).
* [The ggplot2 cheatsheet!](https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf)
* Chapters 5 - 12 in [this online textbook](https://clauswilke.com/dataviz/visualizing-amounts.html) provide a great overview of when to use different visualization strategies.