Add HDDD file

TuomasBorman · TuomasBorman · commit 71e9ccccb6a1 · 2025-12-02T09:49:09.000+02:00
diff --git a/vignettes/biocasia2025.Rmd b/vignettes/biocasia2025.Rmd
@@ -125,13 +125,13 @@ which provides essential methods for conducting microbiome analysis.
 
 #### Start your engines!
 
-- Prepare your local R session
+- Open your virtual machine.
 
 #### Import data
 
 - [Import](https://microbiome.github.io/outreach/import.html#/importers-and-converters){preview-link="true"}
 
-Below, we we import a dataset containing 60 samples from healthy controls and
+Below, we import a dataset containing 60 samples from healthy controls and
 patients with colorectal cancer (CRC). First, we import the data files.
 
 ```{r}
diff --git a/vignettes/hddd_bioinfo.Rmd b/vignettes/hddd_bioinfo.Rmd
@@ -0,0 +1,378 @@
+---
+title: Bioinformatics and a changing society, 4th December 2025
+vignette: >
+    %\VignetteIndexEntry{Bioinformatics and a changing society}
+    %\VignetteEngine{knitr::rmarkdown}
+    %\VignetteEncoding{UTF-8}
+output:
+  html_document:
+    message: false
+    warning: false
+bibliography: ../inst/bibliography.bib
+---
+
+```{r}
+#| label: setup
+#| include: false
+
+knitr::opts_chunk$set(
+    collapse = TRUE,
+    comment = "#>",
+    warning = FALSE,
+    message = FALSE
+)
+```
+
+Authors:
+    Tuomas Borman^[University of Turku, tvborm@utu.fi],
+    Matti Ruuskanen
+    <br/>
+Last modified: 2 December, 2025.
+
+<img src="figures/bioc_sticker.png" width="150"/> <img src="figures/mia_logo.png" width="150"/>
+
+## Overview
+
+### Description
+
+Because of the complex nature of microbiome data, robust and reproducible
+computational approaches are essential. This workshop introduces the latest
+advances in microbiome analysis within Bioconductor, focusing on the
+`r BiocStyle::Biocpkg("mia")` (Microbiome Analysis) framework. Participants will
+gain hands-on experience with data handling, visualization, and analysis through
+a practical case study. The workshop will also introduce the
+[Orchestrating Microbiome Analysis (OMA) online book](https://microbiome.github.io/OMA/docs/devel/),
+a freely available resource that promotes best practices and supports adoption
+of the ecosystem. Together, these resources enable scalable, transparent, and
+community-driven microbiome data science.
+
+### Pre-requisites
+
+To get the most out of the training session, you should meet the following
+pre-requisites.
+
+- You have a basic understanding of R. You have written simple R scripts or used Quarto/RMarkdown documents.
+- You have a basic understanding of what the microbiome is.
+
+If time allows, we recommend exploring the
+[Orchestrating Microbiome Analysis (OMA) online book](https://microbiome.github.io/OMA/docs/devel/) beforehand.
+
+### Participation
+
+Participants are encouraged to ask questions throughout the workshop. The
+session will follow
+[a tutorial](https://microbiome.github.io/OMATutorials/),
+with participants running the tutorial alongside the instructor.
+
+### _R_ / _Bioconductor_ packages used
+
+In this training session, we will cover common methods and packages for
+microbiome data science in the `r BiocStyle::Biocpkg("SummarizedExperiment")`
+ecosystem. We will focus specifically on `r BiocStyle::Biocpkg("mia")`,
+which provides essential methods for conducting microbiome analysis.
+
+### Time outline
+
+| Activity                             | Time       |
+|--------------------------------------|------------|
+| Practicalities and background        | 20m        |
+| Trainer-guided live coding           | 40m        |
+| Break                                | 10m        |
+| Trainer-guided live coding continues | 40m        |
+| Questions, discussion and recap      | 10m        |
+| **Total**                            | **2h**     |
+
+### Learning goals and objectives
+
+#### Questions
+
+- What are _mia_ and _OMA_?
+- How is microbiome data science conducted in the `r BiocStyle::Biocpkg("SummarizedExperiment")` ecosystem?
+- What benefits does this new ecosystem have compared to previous approaches?
+
+#### Objectives
+
+- **Analyze and apply methods**: Use the `r BiocStyle::Biocpkg("SummarizedExperiment")` ecosystem to process and analyze microbiome data.
+- **Create visualizations**: Generate and interpret visualizations.
+- **Explore documentation**: Use the [OMA](https://microbiome.github.io/OMA/docs/devel) online book to explore additional tools and methods.
+
+## Training session
+
+### Background
+
+- [Bioconductor](https://microbiome.github.io/outreach/bioconductor.html){preview-link="true"}
+- [Data containers](https://microbiome.github.io/outreach/data_containers.html){preview-link="true"}
+
+### Trainer-guided live coding
+
+#### Start your engines!
+
+Joining the Noppe virtual machine:
+
+1. Go to [Noppe](https://noppe.2.rahtiapp.fi/).
+2. Log in with Haka (University account) or CSC id.
+3. Click "Join workspace" and ask the instructor for the join code.
+4. Go to My workspaces -> HDDD Bioinfo 25 and click the "power button".
+
+#### Import data
+
+Below, we import a dataset containing 60 samples from healthy controls and
+patients with colorectal cancer (CRC). First, we import the data files.
+
+```{r}
+#| label: import_files
+library(ape)
+
+dir_name <- file.path("data", "GuptaA_2019")
+
+# Abundance table
+path <- file.path(dir_name, "taxonomy_abundance.csv")
+assay <- read.csv(path, row.names = 1L)
+
+# Taxonomy table
+path <- file.path(dir_name, "taxonomy_table.csv")
+taxonomy_table <- read.csv(path, row.names = 1L)
+
+# Sample metadata
+path <- file.path(dir_name, "sample_metadata.csv")
+sample_metadata <- read.csv(path, row.names = 1L)
+
+# Phylogeny
+path <- file.path(dir_name, "phylogeny.tree")
+phylogeny <- read.tree(path)
+```
+
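+Then we create a `r BiocStyle::Biocpkg("TreeSummarizedExperiment")` object.
+**Note:** the inputs must be in specific formats. Below, the abundance table is
+converted to a matrix, and the taxonomy table and sample metadata to `DataFrame`
+objects.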
+
+```{r}
+#| label: create_tse
+library(mia)
+# Abundance table
+assay <- assay |> as.matrix()
+assay_list <- SimpleList(counts = assay)
+
+# Taxonomy table and sample metadata
+taxonomy_table <- taxonomy_table |> DataFrame()
+sample_metadata <- sample_metadata |> DataFrame()
+
+# Construct TreeSE
+tse <- TreeSummarizedExperiment(
+    assays = assay_list, 
+    rowData = taxonomy_table,
+    colData = sample_metadata,
+    rowTree = phylogeny
+)
+```
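+
+Before constructing the container, it can help to confirm that the tables line
+up. The sketch below is a minimal illustration on hypothetical toy tables (the
+`toy_*` objects and their values are made up and are not part of the workshop
+data). It assumes that features are rows and samples are columns in the
+abundance table, as in the chunk above.
+
+```{r}
+#| label: sketch_check_names
+# Hypothetical toy tables standing in for the imported files
+toy_assay <- matrix(
+    c(10, 0, 3, 5, 2, 0),
+    nrow = 3L,
+    dimnames = list(paste0("taxon", 1:3), c("S1", "S2"))
+)
+toy_taxonomy <- data.frame(
+    Phylum = c("A", "B", "B"),
+    row.names = paste0("taxon", 1:3)
+)
+toy_metadata <- data.frame(
+    disease = c("healthy", "CRC"),
+    row.names = c("S1", "S2")
+)
+
+# Features should match between the abundance table and the taxonomy table
+features_match <- identical(rownames(toy_assay), rownames(toy_taxonomy))
+# Samples should match between the abundance table and the sample metadata
+samples_match <- identical(colnames(toy_assay), rownames(toy_metadata))
+
+c(features = features_match, samples = samples_match)
+```
+
+The same two `identical()` checks could be applied to `assay`,
+`taxonomy_table` and `sample_metadata` before building the container.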
+
+#### Data container
+
+`r BiocStyle::Biocpkg("TreeSummarizedExperiment")` extends the
+`r BiocStyle::Biocpkg("SummarizedExperiment")` class by adding support for
+microbiome-specific data types. These include, for instance, the `rowTree` slot,
+which can store a phylogeny or any other hierarchical representation of
+the data. All slots derived from the `r BiocStyle::Biocpkg("SummarizedExperiment")`
+class are also available in `r BiocStyle::Biocpkg("TreeSummarizedExperiment")`,
+providing full backward compatibility.
+
+::: columns
+::: column
+![](figures/TreeSE.png){width=100%}
+:::
+
+::: column
+
+```{r}
+#| label: print_treese
+tse
+```
+
+:::
+:::
+
+Slots can be accessed with dedicated accessor functions. For instance,
+`colData` (sample metadata) can be accessed with the `colData()` function.
+
+```{r}
+#| label: show_coldata
+# Show only first five rows and columns
+colData(tse)[1:5, 1:5]
+```
+
+A key feature of data containers is that they handle the sample and feature
+bookkeeping for us. For example, we can subset the data container without
+worrying about matching samples between the abundance table and sample metadata.
+
+```{r}
+#| label: subset
+tse[1:10, c(1, 2)]
+```
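+
+The bookkeeping can be seen on a small example. The following sketch builds a
+hypothetical toy `SummarizedExperiment` (made-up values, not the workshop data)
+and subsets it by a sample metadata column. The abundance table and the sample
+metadata stay in sync without any manual matching.
+
+```{r}
+#| label: sketch_bookkeeping
+library(SummarizedExperiment)
+
+# Hypothetical toy data: 3 features x 4 samples
+toy_counts <- matrix(
+    1:12,
+    nrow = 3L,
+    dimnames = list(paste0("taxon", 1:3), paste0("S", 1:4))
+)
+toy_coldata <- DataFrame(
+    disease = c("healthy", "CRC", "healthy", "CRC"),
+    row.names = paste0("S", 1:4)
+)
+toy_se <- SummarizedExperiment(
+    assays = list(counts = toy_counts),
+    colData = toy_coldata
+)
+
+# Keep only CRC samples; assay columns and colData rows are subset together
+toy_crc <- toy_se[, toy_se$disease == "CRC"]
+
+colnames(assay(toy_crc, "counts"))
+rownames(colData(toy_crc))
+```
+
+Both calls list the same samples, which is the point: one subsetting operation
+updates every slot of the container.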
+
+#### Data processing
+
+Microbiome data has unique characteristics, which pose specific challenges and
+call for dedicated approaches. The `r BiocStyle::Biocpkg("mia")`
+package provides methods for performing common operations on microbiome data
+within the `r BiocStyle::Biocpkg("SummarizedExperiment")` ecosystem.
+
+##### Transformation
+
+Microbiome data is typically zero-inflated, meaning that many features are
+unobserved in a given sample. Let's first visualize the distribution of counts.
+
+```{r}
+#| label: show_histogram
+library(miaViz)
+
+plotHistogram(tse, assay.type = "counts")
+```
+
+As we can see, the distribution is highly right-skewed. To make the data more
+normally distributed, one can apply the robust centered log-ratio (rclr) transformation.
+
+```{r}
+#| label: transformation
+tse <- transformAssay(
+    tse,
+    assay.type = "counts",
+    method = "rclr"
+)
+```
+
+And when we visualize the distribution...
+
+```{r}
+#| label: visualize_clr
+plotHistogram(tse, assay.type = "rclr")
+```
+
+... we see that the data is centered at zero and exhibits a distribution that is
+more similar to normal than before.
+
+We can access the transformed data with the following command:
+
+```{r}
+#| label: access_clr
+assay(tse, "rclr")[1:5, 1:5]
+```
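+
+To make the idea behind the transformation concrete, the sketch below
+implements a simplified robust CLR in base R on hypothetical toy counts. Per
+sample, observed (non-zero) counts are log-transformed and centered on the mean
+of those logs. `rclr_sketch()` is an illustrative helper, not a function from
+`mia`, and leaving zeros as `NA` is an assumption of this sketch; the
+implementation behind `transformAssay()` may handle zeros differently.
+
+```{r}
+#| label: sketch_rclr
+# Simplified robust CLR: features in rows, samples in columns
+rclr_sketch <- function(x) {
+    apply(x, 2L, function(counts) {
+        # Take logs of observed counts only; zeros become NA
+        log_counts <- ifelse(counts > 0, log(counts), NA_real_)
+        # Center on the mean log of the observed counts in the sample
+        log_counts - mean(log_counts, na.rm = TRUE)
+    })
+}
+
+# Hypothetical toy abundance table: 3 features x 3 samples
+toy_abund <- matrix(
+    c(10, 0, 100, 1, 5, 0, 50, 500, 2),
+    nrow = 3L,
+    dimnames = list(paste0("taxon", 1:3), paste0("S", 1:3))
+)
+toy_rclr <- rclr_sketch(toy_abund)
+
+# Each sample is centered at zero over its observed features
+colMeans(toy_rclr, na.rm = TRUE)
+```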
+
+#### Alpha diversity
+
+- [Alpha diversity](https://microbiome.github.io/outreach/alpha_diversity.html)
+
+Alpha diversity indices can be calculated with `addAlpha()`.
+
+```{r}
+#| label: calculate_alpha
+tse <- addAlpha(tse, assay.type = "counts")
+```
+
+The results are stored in `colData`. By default, the function returns a set of
+indices that consider different aspects of diversity. Below, we visualize
+Faith's diversity, which assesses the phylogenetic diversity of samples.
+
+```{r}
+#| label: visualize_alpha
+plotBoxplot(tse, col.var = "faith_diversity", x = "disease")
+```
+
+From the figure, we can observe that CRC patients have more diverse microbiomes.
+This may suggest that their gut is colonized by microbes that are not typically
+present in a healthy gut.
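+
+As a rough illustration of how indices capture different aspects of diversity,
+the sketch below computes observed richness and the Shannon index in base R for
+two hypothetical toy samples. These two indices are chosen for illustration
+only; `shannon_sketch()` is not part of `mia`, and the exact set of indices
+returned by `addAlpha()` should be checked from its documentation.
+
+```{r}
+#| label: sketch_alpha
+# Shannon index from a vector of counts (illustrative helper)
+shannon_sketch <- function(counts) {
+    # Relative abundances of the observed features
+    p <- counts[counts > 0] / sum(counts)
+    -sum(p * log(p))
+}
+
+# Hypothetical toy samples with the same features but different evenness
+toy_samples <- cbind(
+    even = c(25, 25, 25, 25),
+    uneven = c(97, 1, 1, 1)
+)
+
+# Richness counts the observed features; Shannon also accounts for evenness
+richness <- colSums(toy_samples > 0)
+shannon <- apply(toy_samples, 2L, shannon_sketch)
+
+rbind(richness, shannon)
+```
+
+Both toy samples have the same richness, but the evenly distributed sample has
+a higher Shannon index.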
+
+#### Beta diversity
+
+- [Ordination](https://microbiome.github.io/outreach/ordination.html)
+
+A common beta diversity method is Principal Coordinate Analysis (PCoA), also
+known as Multi-dimensional Scaling (MDS). It is an unsupervised technique that
+can be utilized to find patterns in the data.
+
+```{r}
+#| label: calculate_mds
+tse <- addMDS(
+    tse,
+    assay.type = "counts",
+    method = "unifrac"
+)
+```
+
+PCoA results are commonly visualized with a scatter plot. Here we color points
+based on disease.
+
+```{r}
+#| label: visualize_mds
+library(scater)
+
+plotReducedDim(tse, dimred = "MDS", colour_by = "disease")
+```
+
+We can see a clear pattern: the microbiome profiles of CRC patients seem to
+differ from those of healthy controls.
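+
+To make the PCoA step less of a black box, the sketch below runs classical MDS
+with base R's `cmdscale()` on hypothetical toy counts. It uses Euclidean
+distances from `dist()` instead of UniFrac, which would additionally need a
+phylogeny, so it illustrates only the ordination step and not the analysis
+above.
+
+```{r}
+#| label: sketch_pcoa
+# Hypothetical toy abundance table: 3 features x 5 samples
+toy_mat <- matrix(
+    c(10, 1, 0, 12, 0, 1, 1, 9, 8, 0, 11, 7, 5, 5, 20),
+    nrow = 3L,
+    dimnames = list(paste0("taxon", 1:3), paste0("S", 1:5))
+)
+
+# dist() compares rows, so transpose to get sample-to-sample distances
+sample_dist <- dist(t(toy_mat), method = "euclidean")
+
+# Classical MDS (PCoA): project samples onto the first two coordinates
+toy_pcoa <- cmdscale(sample_dist, k = 2L)
+
+toy_pcoa
+```
+
+Each row of `toy_pcoa` holds the coordinates of one sample, which is what a
+PCoA scatter plot displays.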
+
+Next, we can utilize distance-based Redundancy Analysis (dbRDA). It is similar
+to PCoA, but it specifically aims to assess how much of the variance is
+explained by sample covariates.
+
+```{r}
+#| label: rda
+tse <- addRDA(
+    tse,
+    assay.type = "rclr",
+    method = "euclidean",
+    formula = x ~ disease + gender
+)
+```
+
+Similarly, we can visualize the results with a biplot, a specific type of
+scatter plot.
+
+```{r}
+#| label: plot_rda
+plotRDA(tse, dimred = "RDA", colour.by = "disease")
+```
+
+Feature loadings from the dbRDA analysis offer a first detailed look at the
+features that are associated with CRC.
+
+```{r}
+#| label: rda_loadings
+#| fig-width: 10
+#| fig-height: 3
+plotLoadings(tse, dimred = "RDA", ncomponents = 2L, layout = "lollipop")
+```
+
+For instance, _Prevotella copri_ is positively associated with the
+first coordinate (or x-axis in our biplot). Because CRC is also positively
+associated with the first coordinate, this suggests an association between
+higher abundance of _Prevotella copri_ and CRC.
+
+#### Online book
+
+![[microbiome.github.io/OMA](https://microbiome.github.io/OMA/docs/devel/)](figures/OMA_ss.png){width=500px}
+
+## Questions, discussion and recap
+
+1. Microbiome data science in the `r BiocStyle::Biocpkg("SummarizedExperiment")` ecosystem
+2. Scalable and computationally efficient
+3. Integration of multi-table and multi-omics datasets
+
+## Thank you for your time!
+
+**Join us!**
+
+- Online book: [microbiome.github.io/OMA](https://microbiome.github.io/OMA/docs/devel/)
+- Discussion forums: [github.com/microbiome/OMA/discussions](https://github.com/microbiome/OMA/discussions) and Bioconductor Zulip
+
+<img src="figures/bioc_sticker.png" width="150"/> <img src="figures/mia_logo.png" width="150"/>
+
+## Session information
+
+```{r}
+#| label: session_info
+
+sessionInfo()
+```
+
+## References
+