OmicsProcessing

Pre-analysis processing for metabolomics and proteomics: missingness filtering, outlier handling, imputation, transformation, matched case-control handling, batch/plate correction, and SERRF-based normalisation across batches or strata. Please visit our website for more information and vignettes.

Choose your workflow

Semi-automated pipeline (`process_data()`)

End-to-end wrapper that can filter on missingness, impute, transform, remove outliers (PCA + LOF), handle matched case-control designs, correct for plate/batch effects, and centre/scale.
Takes three data frames (feature data, feature metadata, sample metadata) and returns processed data plus exclusion IDs and PCA/LOF plots.
Full walk-through: Semi-automated pipeline.

Modular workflow (build your own)

Compose individual steps to suit your study design. Typical sequence:
- Filter by missingness with `filter_by_missingness()` (vignette)
- Detect outlier samples with `remove_outliers()` (vignette)
- Impute with RF, LCMD, or both via `hybrid_imputation()` (vignette)
- Normalise with SERRF using `normalise_SERRF()` (vignette)
- Cluster features by RT or correlations using `cluster_features_by_retention_time()` (vignette)
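
To make the first step concrete, here is a minimal base-R sketch of threshold-based missingness filtering on toy data. It is not the package's implementation: the toy column names, the choice to compute feature missingness on non-QC samples only, and the columns-then-rows order are all assumptions made for illustration. Only the `"@"` feature-name marker, the `^sQC` pattern, and the 0.5 thresholds are borrowed from the Quick start.

```r
# Toy data: two metadata columns plus three feature columns whose names contain "@".
df <- data.frame(
  sample_id   = paste0("S", 1:6),
  sample_type = c("sample", "sample", "sample", "sample", "sQC1", "sQC2"),
  `100.1@1.2` = c(1.0, NA, 3.0, 4.0, 2.0, 2.1),
  `200.2@3.4` = c(NA, NA, NA, NA, 5.0, 5.2),
  `300.3@5.6` = c(7.0, NA, 9.0, 1.0, 6.0, 6.1),
  check.names = FALSE
)

feature_cols <- grepl("@", names(df), fixed = TRUE)  # which columns are features
is_qc <- grepl("^sQC", df$sample_type)               # which rows are QC samples

# Proportion missing per feature, computed on non-QC samples only (assumption).
col_missing <- colMeans(is.na(df[!is_qc, feature_cols, drop = FALSE]))
keep_cols <- names(col_missing)[col_missing <= 0.5]

# Proportion missing per sample across the retained features.
# QC rows are always kept in this sketch (assumption).
row_missing <- rowMeans(is.na(df[, keep_cols, drop = FALSE]))
keep_rows <- is_qc | row_missing <= 0.5

df_kept <- df[keep_rows, c(names(df)[!feature_cols], keep_cols), drop = FALSE]
```

In this toy example one feature is dropped because it is entirely missing in the biological samples, and one sample is then dropped because it has no values in the remaining features. The vignette describes how `filter_by_missingness()` itself treats QC rows and filter order.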

Quick start

# install.packages("remotes")
remotes::install_github("IARCBiostat/OmicsProcessing")
library(OmicsProcessing)

Run the semi-automated pipeline with three input tables:

processed <- process_data(
  data = data_features,
  data_meta_features = data_meta_features,
  data_meta_samples = data_meta_samples,
  col_samples = "ID_sample",
  exclusion_extreme_feature = TRUE,
  exclusion_extreme_sample = TRUE,
  imputation = TRUE,
  transformation = TRUE,
  outlier = TRUE,
  plate_correction = TRUE
)
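
Before calling the pipeline it can help to confirm that the three tables agree with each other. The sketch below builds toy inputs and runs a few consistency checks in base R. Only the `ID_sample` column name comes from the call above; the layout (samples in rows, features in columns), the `ID_feature` and `batch` columns, and the feature names are hypothetical, so consult the pipeline vignette for the structure `process_data()` actually expects.

```r
# Toy feature table: one row per sample, one column per feature (assumed layout).
data_features <- data.frame(
  ID_sample = c("S1", "S2", "S3"),
  feat_A    = c(10.2, NA, 8.7),
  feat_B    = c(3.1, 2.9, 3.3)
)

# Toy feature metadata: one row per feature. `ID_feature` is a hypothetical name.
data_meta_features <- data.frame(
  ID_feature = c("feat_A", "feat_B")
)

# Toy sample metadata: one row per sample. `batch` is a hypothetical column.
data_meta_samples <- data.frame(
  ID_sample = c("S1", "S2", "S3"),
  batch     = c("plate1", "plate1", "plate2")
)

# Every sample should appear in both the feature table and the sample metadata.
samples_match <- setequal(data_features$ID_sample, data_meta_samples$ID_sample)

# Every feature column should have a row in the feature metadata.
feature_names  <- setdiff(names(data_features), "ID_sample")
features_match <- setequal(feature_names, data_meta_features$ID_feature)

# Sample IDs should not be duplicated.
ids_unique <- !anyDuplicated(data_features$ID_sample)

stopifnot(samples_match, features_match, ids_unique)
```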

Or stitch together a modular workflow:

# Load data
df <- readr::read_csv("path/to/data")

# Filter by missingness
df_filtered <- filter_by_missingness(
  df,
  row_thresh = 0.5,
  col_thresh = 0.5,
  target_cols = "@",
  is_qc = grepl("^sQC", df$sample_type),
  filter_order = "iterative"
)

# Detect outlier samples (PCA + LOF)
outliers <- remove_outliers(
  df_filtered,
  target_cols = "@",
  is_qc = grepl("^sQC", df_filtered$sample_type),
  method = "pca-lof-overall",
  impute_method = "half-min-value",
  restore_missing_values = TRUE,
  return_ggplots = FALSE
)
df_clean <- outliers$df_filtered

# Log-transform features (namespaced calls, so no pipe operator needs to be attached)
df_clean <- dplyr::mutate(
  df_clean,
  dplyr::across(tidyselect::contains("@"), log1p)
)

# Impute missing values (RF + LCMD)
df_imputed <- hybrid_imputation(
  df_clean,
  target_cols = "@",
  method = "RF-LCMD",
  oobe_threshold = 0.1
)$hybrid_rf_lcmd

# SERRF normalisation
df_normalised <- normalise_SERRF(
  df_imputed,
  target_cols = "@",
  is_qc = grepl("^sQC", df_imputed$sample_type),
  strata_col = "batch"
)

# Cluster features by RT using correlations
clusters <- cluster_features_by_retention_time(
  df = df_normalised,
  target_cols = "@",
  rt_height = 0.07,
  method = "correlations",
  cut_height = 0.26,
  corr_thresh = 0.75
)
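
The modular calls above repeatedly pass `target_cols = "@"` and an `is_qc` vector, and the outlier step names a `"half-min-value"` imputation with `restore_missing_values = TRUE`. The base-R sketch below shows one plausible reading of those arguments on toy data: select feature columns whose names contain `"@"`, flag QC rows, fill gaps temporarily with half of each feature's minimum so PCA can run, then put the missing values back. The interpretation is inferred from the argument names, the toy column names are made up, and the LOF scoring step is left out, so treat this as an illustration and not as the package's code.

```r
# Toy data: feature columns are named with an "@" (for example mass@retention time).
df <- data.frame(
  sample_type = c("sample", "sample", "sQC1", "sample"),
  `150.1@2.0` = c(4, NA, 2, 8),
  `250.2@4.5` = c(NA, 6, 10, 12),
  check.names = FALSE
)

is_qc <- grepl("^sQC", df$sample_type)                           # QC rows
feature_cols <- grep("@", names(df), fixed = TRUE, value = TRUE) # feature column names

# Remember where values were missing so they can be restored later.
na_mask <- is.na(df[feature_cols])

# Replace missing values with half of the feature's smallest observed value.
half_min_impute <- function(x) {
  x[is.na(x)] <- min(x, na.rm = TRUE) / 2
  x
}
df_tmp <- df
df_tmp[feature_cols] <- lapply(df_tmp[feature_cols], half_min_impute)

# PCA needs complete data, which is why the temporary imputation is applied first.
pca <- prcomp(as.matrix(df_tmp[feature_cols]), center = TRUE, scale. = TRUE)

# Restore the original missing values once the PCA scores have been computed.
features_restored <- as.matrix(df_tmp[feature_cols])
features_restored[na_mask] <- NA
```

The PCA scores in `pca$x` are what an outlier score such as LOF would then be computed on; proper imputation of the restored gaps is left to the later imputation step.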

Developers & Contributors

We welcome contributions to OmicsProcessing. Our priorities are clean code and good documentation.

Please follow these guidelines: Developers & Contributors

Resources

Data filtering vignette: Filtering missingness
Outlier removal vignette: PCA + LOF outlier detection
Hybrid imputation vignette: Random Forest + LCMD
Function reference index: All functions
Semi-automated pipeline details: Semi-automated pipeline vignette
Log-transform features: Log transformation (log1p)
SERRF batch correction: Batch correction using SERRF
Feature clustering: Retention-time clustering
Developers & contributors: Developer guide
