UtilsGGSV

UtilsGGSV provides ggplot2-based utilities that solve two common pain points in exploratory data analysis:

Cluster / group characterisation: the plot_group_* family creates publication-ready plots that help you understand what makes each cluster (or any labelled group) distinctive:
- plot_group_heatmap(): ECDF-percentile heat map showing each group’s relative position for every variable.
- plot_group_density(): per-variable density plots with per-group overlays (density curves and/or median lines).
- plot_group_scatter(): biaxial scatter with optional PCA / t-SNE / UMAP projection and cluster centroids.
- plot_group_mst(): minimum-spanning-tree layout coloured by the same ECDF scale as the heat map.

Correlation visualisation: ggcorr() creates paired scatter plots with Spearman, Pearson, Kendall, or concordance correlation coefficients overlaid as a formatted table, with support for log, asinh, or any other scales transformation.

Additional helpers round out the toolkit:

- axis_limits(): force equal axis limits or expand axis coordinates without manually computing values.
- add_text_column(): place a column of text annotations at a consistent relative position regardless of the underlying axis transformation.
- get_trans(): retrieve any scales transformation by name, including higher-root and asinh transformations not available in base scales.

Installation

You can install UtilsGGSV from GitHub with:

if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
remotes::install_github("SATVILab/UtilsGGSV")

Examples

library(UtilsGGSV)
library(ggplot2)
library(magrittr) # provides the %>% pipe used in the examples below
theme_set(cowplot::theme_cowplot())

Correlation Plots with `ggcorr`

The function ggcorr draws paired scatter plots of a response between groups and overlays the correlation coefficients as a table:

set.seed(3)
response_vec_a <- rnorm(5)
response_tbl <- data.frame(
  group = rep(letters[1:3], each = 5),
  response = c(
    response_vec_a,
    response_vec_a * 1.2 + rnorm(5, sd = 0.2),
    response_vec_a * 2 + rnorm(5, sd = 2)
  ),
  pid = rep(paste0("id_", 1:5), 3)
)

ggcorr(
  data = response_tbl %>% dplyr::filter(group %in% c("a", "b")),
  grp = "group",
  y = "response",
  id = "pid"
)

We can display multiple correlation coefficients:

ggcorr(
  data = response_tbl %>% dplyr::filter(group %in% c("a", "b")),
  grp = "group",
  y = "response",
  id = "pid",
  corr_method = c("spearman", "pearson")
)

We can compare more than two groups:

ggcorr(
  data = response_tbl,
  grp = "group",
  y = "response",
  id = "pid",
  corr_method = "kendall"
)

We can also compare more than two groups using multiple correlation coefficients:

ggcorr(
  data = response_tbl,
  grp = "group",
  y = "response",
  id = "pid",
  corr_method = c("spearman", "pearson")
)

Options suited to the concordance correlation coefficient are available, namely a reference line (abline = TRUE) and equal axis limits (limits_equal = TRUE):

ggcorr(
  data = response_tbl %>% dplyr::filter(group %in% c("a", "b")),
  grp = "group",
  y = "response",
  id = "pid",
  corr_method = "concordance",
  abline = TRUE,
  limits_equal = TRUE
)
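
For readers unfamiliar with the concordance correlation coefficient, the following is a minimal base R sketch of Lin's formula. It is not taken from the package: `ccc` is a hypothetical helper, and it assumes sample (n - 1) variances, whereas some references use 1/n moments. It illustrates why equal axis limits suit this coefficient: concordance measures agreement with the line y = x, not just linear association.

```r
# Lin's concordance correlation coefficient for paired vectors x and y.
# Assumption: sample (n - 1) variances and covariance are used.
# `ccc` is a hypothetical helper, not a UtilsGGSV function.
ccc <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) > 1)
  2 * stats::cov(x, y) /
    (stats::var(x) + stats::var(y) + (mean(x) - mean(y))^2)
}

set.seed(3)
a <- rnorm(5)
b <- a + 1 # perfectly correlated with a, but shifted off the identity line

stats::cor(a, b) # Pearson correlation ignores the shift
ccc(a, b)        # concordance is penalised by the shift
ccc(a, a)        # identical measurements agree perfectly
```

A constant shift leaves the Pearson coefficient at 1 but lowers the concordance coefficient, because the squared mean difference enters the denominator.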

The text in the table can be repositioned and resized:

ggcorr(
  data = response_tbl %>% dplyr::filter(group %in% c("a", "b")),
  grp = "group",
  y = "response",
  id = "pid",
  corr_method = c("spearman", "pearson", "concordance"),
  abline = TRUE,
  limits_equal = TRUE,
  coord = c(0.4, 0.17),
  font_size = 3,
  skip = 0.04,
  pval_signif = 2,
  est_signif = 2,
  ci_signif = 2
)

Finally, the text placement is kept consistent when the axes are visually transformed:

ggcorr(
  data = response_tbl %>% dplyr::mutate(response = abs(response + 1)^4),
  grp = "group",
  y = "response",
  id = "pid",
  corr_method = "spearman",
  abline = TRUE,
  limits_equal = TRUE,
  trans = "log10",
  skip = 0.06
)

Axis Limits with `axis_limits`

axis_limits can force the x- and y-axes to share the same limits, expand the axis limits, or both. Its primary use is equalising the limits automatically: it inspects the ggplot object, so you do not need to calculate limits by hand and pass them to ggplot2::expand_limits.

data("cars", package = "datasets")

p0 <- ggplot(cars, aes(speed, dist)) +
  cowplot::background_grid(major = "xy") +
  geom_point() +
  theme(plot.title = element_text(hjust = 0.5)) +
  labs(title = "Axes unadjusted") +
  labs(x = "Speed", y = "Distance")
p1 <- axis_limits(
  p = p0,
  limits_equal = TRUE
) +
  labs(title = "Axes limits equal")
p2 <- axis_limits(
  p = p0,
  limits_expand = list(
    x = c(0, 50),
    y = c(-10, 200)
  )
) +
  labs(title = "Axes limits expanded")
cowplot::plot_grid(p0, p1, p2)
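
For comparison, this is roughly the manual work that axis_limits saves. The sketch below uses only ggplot2 and is not the package's implementation; the names `rng` and `p_manual` are illustrative.

```r
library(ggplot2)

data("cars", package = "datasets")

# Compute one range that covers both plotted variables by hand.
rng <- range(c(cars$speed, cars$dist))

# Apply the shared range to both axes.
p_manual <- ggplot(cars, aes(speed, dist)) +
  geom_point() +
  expand_limits(x = rng, y = rng)
p_manual
```

The manual approach has to be repeated whenever the data or the plotted variables change, which is the bookkeeping that inspecting the ggplot object avoids.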

Text Annotations with `add_text_column`

Add a column of text easily to a plot, regardless of underlying transformation, using add_text_column.

data_mod <- data.frame(x = rnorm(10, mean = 1)^2)
data_mod$y <- data_mod$x * 3 + rnorm(10, sd = 0.5)
fit <- lm(y ~ x, data = data_mod)
coef_tbl <- coefficients(summary(fit))
results_vec <- c(
  paste0(
    "Intercept: ",
    signif(coef_tbl[1, "Estimate"][[1]], 2),
    " (",
    signif(coef_tbl[1, 1][[1]] - 2 * coef_tbl[1, 2][[1]], 3),
    ", ",
    signif(coef_tbl[1, 1][[1]] + 2 * coef_tbl[1, 2][[1]], 3),
    "; p = ",
    signif(coef_tbl[1, 4][[1]], 3),
    ")"
  ),
  paste0(
    "Slope: ",
    signif(coef_tbl[2, "Estimate"][[1]], 2),
    " (",
    signif(coef_tbl[2, 1][[1]] - 2 * coef_tbl[2, 2][[1]], 3),
    ", ",
    signif(coef_tbl[2, 1][[1]] + 2 * coef_tbl[2, 2][[1]], 3),
    "; p = ",
    signif(coef_tbl[2, 4][[1]], 3),
    ")"
  )
)
p <- ggplot(
  data = data_mod,
  aes(x = x, y = y)
) +
  geom_point() +
  cowplot::background_grid(major = "xy") 
add_text_column(
  p = p,
  x = data_mod$x,
  y = data_mod$y,
  text = results_vec,
  coord = c(0.05, 0.95),
  skip = 0.07
)

add_text_column places the text at the same relative position when an axis is transformed, provided the same transformation is passed via trans:

p <- p +
  scale_y_continuous(
    trans = UtilsGGSV::get_trans("asinh")
  )
add_text_column(
  p = p,
  x = data_mod$x,
  y = data_mod$y,
  text = results_vec,
  trans = UtilsGGSV::get_trans("asinh"),
  coord = c(0.05, 0.95),
  skip = 0.07
)
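
One way to think about transformation-independent placement is that the relative coordinate is interpolated on the transformed scale and then mapped back to data units. The base R sketch below illustrates that idea only; `rel_to_data` is a hypothetical helper and may differ from what add_text_column does internally.

```r
# Convert a relative position (0 = lower limit, 1 = upper limit) into a data
# coordinate by interpolating on the transformed scale.
# `rel_to_data` is a hypothetical helper, not part of UtilsGGSV.
rel_to_data <- function(limits, rel, transform = identity, inverse = identity) {
  lim_t <- transform(limits)
  inverse(lim_t[1] + rel * (lim_t[2] - lim_t[1]))
}

rel_to_data(c(0, 10), 0.5)                           # linear axis: 5
rel_to_data(c(1, 100), 0.5, log10, function(z) 10^z) # log10 axis: 10
rel_to_data(c(0, 100), 0.95, asinh, sinh)            # asinh axis
```

On the log10 axis the visual midpoint of 1 and 100 is 10, not 50.5, which is why the transformation has to be known when computing where the text goes.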

Cluster-Specific Plots

The plot_cluster_* family of functions helps visualise the characteristics of clusters identified by an unsupervised learning method.

Heat Maps with `plot_cluster_heatmap`

The function plot_cluster_heatmap creates a heat map in which each tile shows, for one cluster and one variable, the percentile of the cluster’s median value. The percentile is evaluated against the ECDF (empirical cumulative distribution function) of that variable across all observations not in the cluster. Clusters and variables are ordered by hierarchical clustering.

set.seed(1)
cluster_data <- data.frame(
  cluster = rep(paste0("C", 1:3), each = 20),
  var1 = c(rnorm(20, 2), rnorm(20, 0), rnorm(20, -2)),
  var2 = c(rnorm(20, -1), rnorm(20, 1), rnorm(20, 0))
)
plot_cluster_heatmap(cluster_data, cluster = "cluster")
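
The tile values described above can be approximated by hand in base R. The sketch below follows the description in this section rather than the package source; `ecdf_percentile` and `pct_mat` are hypothetical names.

```r
set.seed(1)
cluster_data <- data.frame(
  cluster = rep(paste0("C", 1:3), each = 20),
  var1 = c(rnorm(20, 2), rnorm(20, 0), rnorm(20, -2)),
  var2 = c(rnorm(20, -1), rnorm(20, 1), rnorm(20, 0))
)

# Percentile of a cluster's median relative to the ECDF of all observations
# outside that cluster. `ecdf_percentile` is a hypothetical helper.
ecdf_percentile <- function(values, in_cluster) {
  stats::ecdf(values[!in_cluster])(stats::median(values[in_cluster]))
}

clusters <- unique(cluster_data$cluster)
vars <- c("var1", "var2")

# Rows are clusters, columns are variables.
pct_mat <- sapply(vars, function(v) {
  sapply(clusters, function(cl) {
    ecdf_percentile(cluster_data[[v]], cluster_data$cluster == cl)
  })
})
pct_mat
```

Values near 1 mean the cluster's median sits above almost all observations from the other clusters, and values near 0 mean it sits below almost all of them.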

Density Plots with `plot_cluster_density`

The function plot_cluster_density visualises, for each variable, how each cluster’s observations are distributed relative to the overall population. The density argument controls what is shown: "overall" (default, overall density plus cluster median lines), "cluster" (one density curve per cluster), or "both" (overall density plus per-cluster density curves). When showing per-cluster densities, the scale argument controls scaling: by default ("max_overall") each cluster density is rescaled so its maximum equals the overall density maximum.

set.seed(1)
cluster_data <- data.frame(
  cluster = rep(paste0("C", 1:3), each = 20),
  var1 = c(rnorm(20, 2), rnorm(20, 0), rnorm(20, -2)),
  var2 = c(rnorm(20, -1), rnorm(20, 1), rnorm(20, 0))
)
# Default: overall density with cluster median lines
plot_cluster_density(cluster_data, cluster = "cluster")
#> $var1

#> 
#> $var2

# Both overall and per-cluster densities (scaled to overall maximum)
plot_cluster_density(cluster_data, cluster = "cluster", density = "both")
#> $var1

#> 
#> $var2
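
The "max_overall" scaling described above can be illustrated with base R's density(). This is a sketch of the rescaling idea for a single cluster and variable, not the package's code; all object names are illustrative.

```r
set.seed(1)
values <- c(rnorm(20, 2), rnorm(20, 0), rnorm(20, -2))
cluster <- rep(paste0("C", 1:3), each = 20)

# Kernel density estimates for all observations and for cluster C1 only.
dens_overall <- stats::density(values)
dens_c1 <- stats::density(values[cluster == "C1"])

# Rescale the cluster curve so its peak matches the overall peak.
scale_factor <- max(dens_overall$y) / max(dens_c1$y)
dens_c1_scaled <- dens_c1$y * scale_factor

c(overall_max = max(dens_overall$y), cluster_max = max(dens_c1_scaled))
```

After rescaling, the cluster curve no longer integrates to one, so it conveys shape and location rather than probability mass.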

Scatter Plot with `plot_cluster_scatter`

The function plot_cluster_scatter creates a biaxial scatter plot with observations coloured by cluster and median centroids overlaid. When more than two variables are supplied it defaults to a PCA projection.

set.seed(123)
example_data <- data.frame(
  cluster = rep(c("A", "B", "C"), each = 20),
  var1 = c(rnorm(20, 2), rnorm(20, 0), rnorm(20, -2)),
  var2 = c(rnorm(20, -1), rnorm(20, 1), rnorm(20, 0)),
  var3 = c(rnorm(20, 1), rnorm(20, -1), rnorm(20, 0))
)

# Default: PCA projection (> 2 numeric variables)
plot_cluster_scatter(example_data, cluster = "cluster")
#> dim_red automatically set to 'pca' because more than two numeric variables are available.

Raw variables can also be used directly:

plot_cluster_scatter(
  example_data,
  cluster = "cluster",
  dim_red = "none",
  vars = c("var1", "var2")
)
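
To make the PCA projection and median centroids concrete, here is a base R sketch of the underlying computation. It assumes the variables are centred and scaled before PCA, which this README does not state, and the object names are illustrative.

```r
set.seed(123)
example_data <- data.frame(
  cluster = rep(c("A", "B", "C"), each = 20),
  var1 = c(rnorm(20, 2), rnorm(20, 0), rnorm(20, -2)),
  var2 = c(rnorm(20, -1), rnorm(20, 1), rnorm(20, 0)),
  var3 = c(rnorm(20, 1), rnorm(20, -1), rnorm(20, 0))
)

num_vars <- c("var1", "var2", "var3")

# Assumption: centre and scale each variable before PCA.
pca <- stats::prcomp(example_data[, num_vars], center = TRUE, scale. = TRUE)

# Keep the first two principal components as plotting coordinates.
scores <- data.frame(cluster = example_data$cluster, pca$x[, 1:2])

# Median centroid of each cluster in the projected space.
centroids <- stats::aggregate(
  cbind(PC1, PC2) ~ cluster,
  data = scores,
  FUN = median
)
centroids
```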

Minimum-Spanning Tree with `plot_cluster_mst`

The function plot_cluster_mst computes the minimum-spanning tree (MST) over clusters, using Euclidean distance between cluster median profiles. Clusters are laid out in two dimensions via classical multidimensional scaling (MDS). For each variable, a separate plot is produced in which each node is filled according to the ECDF-standardised percentile of that cluster’s median — the same colour scale used by plot_cluster_heatmap. By default a named list of plots is returned; supplying n_col or n_row returns a combined cowplot::plot_grid figure.

set.seed(1)
cluster_data <- data.frame(
  cluster = rep(paste0("C", 1:3), each = 20),
  var1 = c(rnorm(20, 2), rnorm(20, 0), rnorm(20, -2)),
  var2 = c(rnorm(20, -1), rnorm(20, 1), rnorm(20, 0))
)
# Default: returns a named list of plots, one per variable
plot_list <- plot_cluster_mst(cluster_data, cluster = "cluster")
plot_list[["var1"]]

Combine into a grid with variable-name labels:

plot_cluster_mst(cluster_data, cluster = "cluster", n_col = 2)
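
The steps described in this section (cluster medians, Euclidean distances, MDS layout, spanning tree) can be sketched in base R. The tree is built here with Prim's algorithm, which is a choice made for this illustration; the README does not say which algorithm the package uses. `prim_mst` is a hypothetical helper.

```r
set.seed(1)
cluster_data <- data.frame(
  cluster = rep(paste0("C", 1:3), each = 20),
  var1 = c(rnorm(20, 2), rnorm(20, 0), rnorm(20, -2)),
  var2 = c(rnorm(20, -1), rnorm(20, 1), rnorm(20, 0))
)

# Median profile of each cluster (one row per cluster).
med <- stats::aggregate(cbind(var1, var2) ~ cluster, data = cluster_data, FUN = median)
profiles <- as.matrix(med[, c("var1", "var2")])
rownames(profiles) <- med$cluster

# Euclidean distances between profiles and a 2D classical MDS layout.
dist_obj <- stats::dist(profiles)
d <- as.matrix(dist_obj)
layout <- stats::cmdscale(dist_obj, k = 2)

# Prim's algorithm: start from the first node and repeatedly add the
# cheapest edge that connects the tree to a node outside it.
prim_mst <- function(d) {
  n <- nrow(d)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  edges <- matrix(NA_integer_, nrow = n - 1, ncol = 2)
  for (i in seq_len(n - 1)) {
    cand <- d[in_tree, !in_tree, drop = FALSE]
    idx <- which(cand == min(cand), arr.ind = TRUE)[1, ]
    edges[i, ] <- c(which(in_tree)[idx[1]], which(!in_tree)[idx[2]])
    in_tree[edges[i, 2]] <- TRUE
  }
  edges
}

mst_edges <- prim_mst(d)
data.frame(
  from = rownames(d)[mst_edges[, 1]],
  to = rownames(d)[mst_edges[, 2]]
)
```

A spanning tree over k clusters always has k - 1 edges, so three clusters yield two edges.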

Transformations with `get_trans`

The utility function get_trans returns transformation objects (as implemented by the scales package) when given their names as character strings. It also adds various higher roots (such as cube and fourth roots) and the asinh transformation.

get_trans("log10")
#> Transformer: log-10 [1e-100, Inf]
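
For context, custom transformations of this kind can be defined with scales::trans_new. The sketch below shows how an asinh and a signed cube-root transformation might be built; it assumes the scales package is installed and is not necessarily how get_trans defines them. The object names are illustrative.

```r
# Assumption: the scales package is installed (it is a ggplot2 dependency).
asinh_trans <- scales::trans_new(
  name = "asinh",
  transform = asinh,
  inverse = sinh
)

# Signed cube root, so that negative values are handled as well.
cube_root_trans <- scales::trans_new(
  name = "cube_root",
  transform = function(x) sign(x) * abs(x)^(1 / 3),
  inverse = function(x) x^3
)

asinh_trans$transform(c(0, 1, 100))
cube_root_trans$inverse(cube_root_trans$transform(-8))
```

Objects built this way can be passed to the trans argument of ggplot2 scale functions such as scale_y_continuous, in the same way as the object returned by get_trans in the earlier example.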
