
Commit 71e9ccc

Add HDDD file
1 parent 1780626 commit 71e9ccc

2 files changed

Lines changed: 380 additions & 2 deletions

File tree

vignettes/biocasia2025.Rmd

Lines changed: 2 additions & 2 deletions
@@ -125,13 +125,13 @@ which provides essential methods for conducting microbiome analysis.
125125

126126
#### Start your engines!
127127

128-
- Prepare your local R session
128+
- Open your virtual machine.
129129

130130
#### Import data
131131

132132
- [Import](https://microbiome.github.io/outreach/import.html#/importers-and-converters){preview-link="true"}
133133

134-
Below, we we import a dataset containing 60 samples from healthy controls and
134+
Below, we import a dataset containing 60 samples from healthy controls and
135135
patients with colorectal cancer (CRC). First, we import the data files.
136136

137137
```{r}

vignettes/hddd_bioinfo.Rmd

Lines changed: 378 additions & 0 deletions
@@ -0,0 +1,378 @@
1+
---
2+
title: Bioinformatics and a changing society, 4th December 2025
3+
vignette: >
4+
%\VignetteIndexEntry{Bioinformatics and a changing society}
5+
%\VignetteEngine{knitr::rmarkdown}
6+
%\VignetteEncoding{UTF-8}
7+
output:
8+
html_document:
9+
message: false
10+
warning: false
11+
bibliography: ../inst/bibliography.bib
12+
---
13+
14+
```{r}
15+
#| label: setup
16+
#| include: false
17+
18+
knitr::opts_chunk$set(
19+
collapse = TRUE,
20+
comment = "#>",
21+
warning = FALSE,
22+
message = FALSE
23+
)
24+
```
25+
26+
Authors:
27+
Tuomas Borman^[University of Turku, tvborm@utu.fi],
28+
Matti Ruuskanen
29+
<br/>
30+
Last modified: 2 December, 2025.
31+
32+
<img src="figures/bioc_sticker.png" width="150"/> <img src="figures/mia_logo.png" width="150"/>
33+
34+
## Overview
35+
36+
### Description
37+
38+
Because of the complex nature of microbiome data, robust and reproducible
39+
computational approaches are essential. This workshop introduces the latest
40+
advances in microbiome analysis within Bioconductor, focusing on the
41+
`r BiocStyle::Biocpkg("mia")` (Microbiome Analysis) framework. Participants will
42+
gain hands-on experience with data handling, visualization, and analysis through
43+
a practical case study. The workshop will also introduce the
44+
[Orchestrating Microbiome Analysis (OMA) online book](https://microbiome.github.io/OMA/docs/devel/),
45+
a freely available resource that promotes best practices and supports adoption
46+
of the ecosystem. Together, these resources enable scalable, transparent, and
47+
community-driven microbiome data science.
48+
49+
### Pre-requisites
50+
51+
To get the most out of the training session, you should meet the following
52+
pre-requisites.
53+
54+
- You have a basic understanding of R. You have written simple R scripts or used Quarto/RMarkdown documents.
55+
- You have a basic understanding of what the microbiome is.
56+
57+
If time allows, we recommend exploring the
58+
[Orchestrating Microbiome Analysis (OMA) online book](https://microbiome.github.io/OMA/docs/devel/) beforehand.
59+
60+
### Participation
61+
62+
Participants are encouraged to ask questions throughout the workshop. The
63+
session will follow
64+
[a tutorial](https://microbiome.github.io/OMATutorials/),
65+
with participants running the tutorial alongside the instructor.
66+
67+
### _R_ / _Bioconductor_ packages used
68+
69+
In this training session, we will cover common methods and packages for
70+
microbiome data science in the `r BiocStyle::Biocpkg("SummarizedExperiment")`
71+
ecosystem. We will focus specifically on `r BiocStyle::Biocpkg("mia")`,
72+
which provides essential methods for conducting microbiome analysis.
73+
74+
### Time outline
75+
76+
| Activity | Time |
77+
|--------------------------------------|------------|
78+
| Practicalities and background | 20m |
79+
| Trainer-guided live coding | 40m |
80+
| Break | 10m |
81+
| Trainer-guided live coding continues | 40m |
82+
| Questions, discussion and recap | 10m |
83+
| **Total** | **2h** |
84+
85+
### Learning goals and objectives
86+
87+
#### Questions
88+
89+
- What are _mia_ and _OMA_?
90+
- How is microbiome data science conducted in the `r BiocStyle::Biocpkg("SummarizedExperiment")` ecosystem?
91+
- What benefits does this new ecosystem have compared to previous approaches?
92+
93+
#### Objectives
94+
95+
- **Analyze and apply methods**: Apply the `r BiocStyle::Biocpkg("SummarizedExperiment")` ecosystem to process and analyze microbiome data.
96+
- **Create visualizations**: Generate and interpret visualizations.
97+
- **Explore documentation**: Use the [OMA](https://microbiome.github.io/OMA/docs/devel) book to explore additional tools and methods.
98+
99+
## Training session
100+
101+
### Background
102+
103+
- [Bioconductor](https://microbiome.github.io/outreach/bioconductor.html){preview-link="true"}
104+
- [Data containers](https://microbiome.github.io/outreach/data_containers.html){preview-link="true"}
105+
106+
### Trainer-guided live coding
107+
108+
#### Start your engines!
109+
110+
Joining the Noppe virtual machine:
111+
112+
1. Go to [Noppe](https://noppe.2.rahtiapp.fi/)
113+
2. Log in with Haka (University account) or CSC id.
114+
3. Click "Join workspace" and ask the instructor for the join code.
115+
4. My workspaces -> HDDD Bioinfo 25 -> Click "power button".
116+
117+
#### Import data
118+
119+
Below, we import a dataset containing 60 samples from healthy controls and
120+
patients with colorectal cancer (CRC). First, we import the data files.
121+
122+
```{r}
123+
#| label: import_files
124+
library(ape)
125+
126+
dir_name <- file.path("data", "GuptaA_2019")
127+
128+
# Abundance table
129+
path <- file.path(dir_name, "taxonomy_abundance.csv")
130+
assay <- read.csv(path, row.names = 1L)
131+
132+
# Taxonomy table
133+
path <- file.path(dir_name, "taxonomy_table.csv")
134+
taxonomy_table <- read.csv(path, row.names = 1L)
135+
136+
# Sample metadata
137+
path <- file.path(dir_name, "sample_metadata.csv")
138+
sample_metadata <- read.csv(path, row.names = 1L)
139+
140+
# Phylogeny
141+
path <- file.path(dir_name, "phylogeny.tree")
142+
phylogeny <- read.tree(path)
143+
```
144+
145+
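Then we create a `r BiocStyle::Biocpkg("TreeSummarizedExperiment")` object.
146+
**Note:** the data must be in specific formats: the abundance table as a matrix, and the taxonomy table and sample metadata as `DataFrame` objects.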
147+
148+
```{r}
149+
#| label: create_tse
150+
library(mia)
151+
# Abundance table
152+
assay <- assay |> as.matrix()
153+
assay_list <- SimpleList(counts = assay)
154+
155+
# Taxonomy table and sample metadata
156+
taxonomy_table <- taxonomy_table |> DataFrame()
157+
sample_metadata <- sample_metadata |> DataFrame()
158+
159+
# Construct TreeSE
160+
tse <- TreeSummarizedExperiment(
161+
assays = assay_list,
162+
rowData = taxonomy_table,
163+
colData = sample_metadata,
164+
rowTree = phylogeny
165+
)
166+
```
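
Before constructing the container, it can help to verify that the tables line up. The following is a minimal base R sketch of such sanity checks, not part of the workshop material: the `toy_*` objects and their values are hypothetical stand-ins for the imported tables, and the checks assume that features are rows and samples are columns of the abundance table, as in the code above.

```r
# Hypothetical toy tables standing in for the imported data
toy_assay <- matrix(
    c(10, 0, 3, 0, 5, 8),
    nrow = 3,
    dimnames = list(c("taxon1", "taxon2", "taxon3"), c("sampleA", "sampleB"))
)
toy_taxonomy <- data.frame(
    Phylum = c("Firmicutes", "Bacteroidetes", "Firmicutes"),
    row.names = c("taxon1", "taxon2", "taxon3")
)
toy_samples <- data.frame(
    disease = c("healthy", "CRC"),
    row.names = c("sampleA", "sampleB")
)

# Abundances should be a numeric matrix whose row names match the taxonomy
# table and whose column names match the sample metadata
format_checks <- c(
    assay_is_numeric_matrix = is.matrix(toy_assay) && is.numeric(toy_assay),
    features_match = identical(rownames(toy_assay), rownames(toy_taxonomy)),
    samples_match = identical(colnames(toy_assay), rownames(toy_samples))
)
print(format_checks)
```

If any of these checks fails, the names or the order of rows and columns likely need fixing before the tables are combined.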
167+
168+
#### Data container
169+
170+
`r BiocStyle::Biocpkg("TreeSummarizedExperiment")` extends the
171+
`r BiocStyle::Biocpkg("SummarizedExperiment")` class by adding support for
172+
microbiome-specific data types. These include, for instance, the `rowTree` slot,
173+
which can store a phylogeny or any other hierarchical representation of
174+
the data. All slots derived from the `r BiocStyle::Biocpkg("SummarizedExperiment")`
175+
class are also available in `r BiocStyle::Biocpkg("TreeSummarizedExperiment")`,
176+
providing full backward compatibility.
177+
178+
::: columns
179+
::: column
180+
![](figures/TreeSE.png){width=100%}
181+
:::
182+
183+
::: column
184+
185+
```{r}
186+
#| label: print_treese
187+
tse
188+
```
189+
190+
:::
191+
:::
192+
193+
Slots can be accessed with dedicated accessor functions. For instance,
194+
`colData` (sample metadata) can be accessed with the `colData()` function.
195+
196+
```{r}
197+
#| label: show_coldata
198+
# Show only first five rows and columns
199+
colData(tse)[1:5, 1:5]
200+
```
201+
202+
The key benefit of a data container is that it handles the sample and feature
203+
bookkeeping for us. For example, we can subset the data container without
204+
worrying about matching samples between the abundance table and the sample metadata.
205+
206+
```{r}
207+
#| label: subset
208+
tse[1:10, c(1, 2)]
209+
```
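
To see what this bookkeeping saves us, consider the same subsetting done by hand on separate tables. This is a base R sketch with hypothetical toy objects (`toy_counts`, `toy_meta`); it only illustrates the manual alternative and is not how the container works internally.

```r
# Hypothetical abundance table (taxa x samples) and sample metadata
toy_counts <- matrix(
    1:12,
    nrow = 3,
    dimnames = list(paste0("taxon", 1:3), paste0("sample", 1:4))
)
toy_meta <- data.frame(
    disease = c("healthy", "CRC", "healthy", "CRC"),
    row.names = paste0("sample", 1:4)
)

# Without a container, the same index must be applied to both tables,
# and the result must be checked by hand
keep <- toy_meta$disease == "CRC"
sub_counts <- toy_counts[, keep, drop = FALSE]
sub_meta <- toy_meta[keep, , drop = FALSE]

# The tables stay in sync only because we subset both consistently
identical(colnames(sub_counts), rownames(sub_meta))
```

Forgetting to subset one of the two tables, or reordering only one of them, would silently break the link between abundances and metadata.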
210+
211+
#### Data processing
212+
213+
Microbiome data has unique characteristics, so working with it poses unique
214+
challenges and calls for dedicated approaches. The `r BiocStyle::Biocpkg("mia")`
215+
package provides methods for performing common operations on microbiome data
216+
within the `r BiocStyle::Biocpkg("SummarizedExperiment")` ecosystem.
217+
218+
##### Transformation
219+
220+
Microbiome data is typically zero-inflated, meaning that many features go
221+
unobserved in any given sample. Let's first visualize the distribution of counts.
222+
223+
```{r}
224+
#| label: show_histogram
225+
library(miaViz)
226+
227+
plotHistogram(tse, assay.type = "counts")
228+
```
229+
230+
As we can see, the distribution is highly right-skewed. To bring the data closer
231+
to a normal distribution, one can apply the robust centered log-ratio (rclr) transformation.
232+
233+
```{r}
234+
#| label: transformation
235+
tse <- transformAssay(
236+
tse,
237+
assay.type = "counts",
238+
method = "rclr"
239+
)
240+
```
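
The idea behind the robust centered log-ratio can be sketched in a few lines of base R: within each sample, take the logarithm of the observed (non-zero) counts and subtract their mean. The function `rclr_sample()` and the toy matrix below are hypothetical and only illustrate the core calculation. In particular, this sketch leaves zeros as `NA` and does not reproduce how `transformAssay()` handles them.

```r
# Core rclr calculation for one sample (a vector of counts)
rclr_sample <- function(x) {
    observed <- x > 0
    log_x <- rep(NA_real_, length(x))
    # Log-transform only the observed (non-zero) counts
    log_x[observed] <- log(x[observed])
    # Center by the mean of the observed log-counts
    log_x - mean(log_x, na.rm = TRUE)
}

# Hypothetical counts: taxa in rows, samples in columns
toy_counts <- matrix(
    c(10, 0, 100, 5, 5, 0),
    nrow = 3,
    dimnames = list(paste0("taxon", 1:3), c("sampleA", "sampleB"))
)

toy_rclr <- apply(toy_counts, 2, rclr_sample)
dimnames(toy_rclr) <- dimnames(toy_counts)
toy_rclr
```

By construction, the observed values in each sample average to zero after the transformation, which is why the histogram of transformed values is centered at zero.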
241+
242+
And when we visualize the distribution...
243+
244+
```{r}
245+
#| label: visualize_clr
246+
plotHistogram(tse, assay.type = "rclr")
247+
```
248+
249+
... we see that the data is centered at zero and exhibits a distribution that is
250+
closer to normal than before.
251+
252+
We can access the transformed data with the following command:
253+
254+
```{r}
255+
#| label: access_clr
256+
assay(tse, "rclr")[1:5, 1:5]
257+
```
258+
259+
#### Alpha diversity
260+
261+
- [Alpha diversity](https://microbiome.github.io/outreach/alpha_diversity.html)
262+
263+
Alpha diversity indices can be calculated with `addAlpha()`.
264+
265+
```{r}
266+
#| label: calculate_alpha
267+
tse <- addAlpha(tse, assay.type = "counts")
268+
```
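
To give a feel for what a diversity index measures, here is a minimal base R sketch of the Shannon index, a widely used alpha diversity measure. The helper `shannon()` and the toy counts are hypothetical; the sketch does not show how `addAlpha()` computes or names its results.

```r
# Shannon index: -sum(p * log(p)) over taxa with non-zero relative abundance
shannon <- function(x) {
    p <- x[x > 0] / sum(x)
    -sum(p * log(p))
}

# Hypothetical counts: one perfectly even sample and one dominated by a single taxon
toy_counts <- cbind(
    even = c(25, 25, 25, 25),
    dominated = c(100, 0, 0, 0)
)

toy_shannon <- apply(toy_counts, 2, shannon)
toy_shannon
```

The even community reaches the maximum possible value for four taxa, `log(4)`, whereas the community with a single taxon has a diversity of zero.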
269+
270+
The results are stored in `colData`. By default, the function returns a set of
271+
indices that capture different aspects of diversity. Below, we visualize
272+
Faith's diversity, which assesses the phylogenetic diversity of samples.
273+
274+
```{r}
275+
#| label: visualize_alpha
276+
plotBoxplot(tse, col.var = "faith_diversity", x = "disease")
277+
```
278+
279+
From the figure, we can observe that CRC patients have more diverse microbiomes.
280+
This may suggest that their gut is colonized by microbes that are not typically
281+
present in a healthy gut.
282+
283+
#### Beta diversity
284+
285+
- [Ordination](https://microbiome.github.io/outreach/ordination.html)
286+
287+
A common beta diversity method is Principal Coordinate Analysis (PCoA), also
288+
known as classical Multidimensional Scaling (MDS). It is an unsupervised technique that can
289+
be used to find patterns in the data.
290+
291+
```{r}
292+
#| label: calculate_mds
293+
tse <- addMDS(
294+
tse,
295+
assay.type = "counts",
296+
method = "unifrac"
297+
)
298+
```
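
The mechanics of PCoA can be sketched with base R alone: compute a dissimilarity matrix between samples and pass it to `stats::cmdscale()`. The toy counts below are hypothetical, and Euclidean distance is used here only as a simple stand-in for the UniFrac dissimilarity in the code above, which additionally requires a phylogeny.

```r
# Hypothetical counts: taxa in rows, samples in columns.
# Samples 1-2 and samples 3-4 have clearly different profiles.
toy_counts <- matrix(
    c(30,  5,  0, 12,
      28,  7,  1, 10,
       2, 40, 15,  0,
       3, 38, 18,  1),
    nrow = 4,
    dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:4))
)

# dist() works on rows, so transpose to get sample-to-sample distances
toy_dist <- dist(t(toy_counts), method = "euclidean")

# Classical MDS (PCoA) into two dimensions
toy_pcoa <- cmdscale(toy_dist, k = 2)
toy_pcoa
```

Samples with similar profiles end up close together on the first coordinate, which is the kind of pattern a scatter plot of the ordination makes visible.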
299+
300+
PCoA results are commonly visualized with a scatter plot. Here we color points
301+
based on disease.
302+
303+
```{r}
304+
#| label: visualize_mds
305+
library(scater)
306+
307+
plotReducedDim(tse, dimred = "MDS", colour_by = "disease")
308+
```
309+
310+
We can see a clear pattern: the microbiome profiles of CRC patients seem to differ from
311+
those of healthy controls.
312+
313+
Next, we can use distance-based Redundancy Analysis (dbRDA). It is similar
314+
to PCoA, but it specifically aims to assess how much of the variance is
315+
accounted for by sample covariates.
316+
317+
```{r}
318+
#| label: rda
319+
tse <- addRDA(
320+
tse,
321+
assay.type = "rclr",
322+
method = "euclidean",
323+
formula = x ~ disease + gender
324+
)
325+
```
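
With Euclidean distance, the core of the analysis reduces to ordinary redundancy analysis: regress the centered abundances on the covariates and ask how much of the total variance the fitted values capture. The base R sketch below illustrates that idea on hypothetical toy data with a single covariate. It is a simplification and does not reproduce what `addRDA()` does internally.

```r
# Hypothetical transformed abundances: samples in rows, features in columns
toy_y <- rbind(
    c( 1.0, -0.5,  0.2),
    c( 0.8, -0.3,  0.1),
    c(-0.9,  0.6, -0.2),
    c(-1.1,  0.4,  0.0),
    c( 0.9, -0.6,  0.3),
    c(-0.7,  0.5, -0.1)
)
disease <- factor(c("CRC", "CRC", "healthy", "healthy", "CRC", "healthy"))

# Center each feature, then regress all features on the covariate at once
y_centered <- scale(toy_y, center = TRUE, scale = FALSE)
fitted_y <- fitted(lm(y_centered ~ disease))

# Proportion of total variance accounted for by the covariate
explained <- sum(fitted_y^2) / sum(y_centered^2)
explained

# Constrained axes come from a decomposition of the fitted values;
# a two-level factor yields a single constrained axis
constrained_axes <- svd(fitted_y)
constrained_axes$d
```

A proportion close to one means the covariate accounts for most of the variation in the toy data; in real data the share is usually far smaller.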
326+
327+
Similarly, we can visualize the results with a biplot, a specific type of scatter
328+
plot.
329+
330+
```{r}
331+
#| label: plot_rda
332+
plotRDA(tse, dimred = "RDA", colour.by = "disease")
333+
```
334+
335+
Feature loadings from the dbRDA analysis offer a first detailed look at the
336+
features that are associated with CRC.
337+
338+
```{r}
339+
#| label: rda_loadings
340+
#| fig-width: 10
341+
#| fig-height: 3
342+
plotLoadings(tse, dimred = "RDA", ncomponents = 2L, layout = "lollipop")
343+
```
344+
345+
For instance, _Prevotella copri_ is positively associated with the
346+
first coordinate (or x-axis in our biplot). Because CRC is also positively
347+
associated with the first coordinate, this suggests an association between a higher
348+
abundance of _Prevotella copri_ and CRC.
349+
350+
#### Online book
351+
352+
![[microbiome.github.io/OMA](https://microbiome.github.io/OMA/docs/devel/)](figures/OMA_ss.png){width=500px}
353+
354+
## Questions, discussion and recap
355+
356+
1. Microbiome data science in the `r BiocStyle::Biocpkg("SummarizedExperiment")` ecosystem
357+
2. Scalable and computationally efficient
358+
3. Integration of multi-table and multi-omics datasets
359+
360+
## Thank you for your time!
361+
362+
**Join us!**
363+
364+
- Online book: [microbiome.github.io/OMA](https://microbiome.github.io/OMA/docs/devel/)
365+
- Discussion forums: [github.com/microbiome/OMA/discussions](https://github.com/microbiome/OMA/discussions) and Bioconductor Zulip
366+
367+
<img src="figures/bioc_sticker.png" width="150"/> <img src="figures/mia_logo.png" width="150"/>
368+
369+
## Session information
370+
371+
```{r}
372+
#| label: session_info
373+
374+
sessionInfo()
375+
```
376+
377+
## References
378+
