57 changes: 57 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
name: CI

on:
  push:
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest

    defaults:
      run:
        shell: bash -l {0} # login shell so conda init works

    steps:
      - name: Check out repo
        uses: actions/checkout@v4

      - name: Set up Conda (Miniforge3)
        uses: conda-incubator/setup-miniconda@v3
        with:
          miniforge-variant: Miniforge3
          python-version: "3.12"
          auto-update-conda: false
          activate-environment: smf_snakemake
          use-mamba: true

      - name: Configure conda with strict channel priority
        run: |
          conda config --set channel_priority strict

      - name: Install Snakemake + basic deps
        run: |
          mamba install -y -c conda-forge -c bioconda "snakemake>=9" pandas

      - name: Show conda envs
        run: conda info --envs

      - name: Run unit tests via Snakemake
        working-directory: tests
        run: |
          snakemake unit_tests \
            --cores 1 \
            --use-conda

      - name: Run methyltransferase pipeline test
        working-directory: tests
        run: |
          snakemake test_methyltransferase_pipeline \
            --cores 2 \
            --use-conda

      - name: Run deaminase pipeline test
        working-directory: tests
        run: |
          snakemake test_deaminase_pipeline \
            --cores 2 \
            --use-conda
2 changes: 0 additions & 2 deletions config/samples.tsv

This file was deleted.

Binary file added example_files/deaminase1_S1_R1_001.fastq.gz
Binary file not shown.
Binary file added example_files/deaminase1_S1_R2_001.fastq.gz
Binary file not shown.
Binary file added example_files/deaminase2_S2_R1_001.fastq.gz
Binary file not shown.
Binary file added example_files/deaminase2_S2_R2_001.fastq.gz
Binary file not shown.
7 changes: 6 additions & 1 deletion config/config.yaml → example_files/deaminase_config.yaml
Original file line number Diff line number Diff line change
@@ -1,4 +1,9 @@
# This file should contain everything to configure the workflow on a global scale.
# In case of sample based data, it should be complemented by a samples.tsv file that contains
# one row per sample. It can be parsed easily via pandas.
samples: "amplicon-smf/config/samples.tsv"

samples: "example_files/deaminase_samplesheet.tsv"
alignment_score_fraction: 0.8
alignment_length_fraction: 0.8
read1_length: 200
read2_length: 200
3 changes: 3 additions & 0 deletions example_files/deaminase_samplesheet.tsv
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
sample_name fastq_R1 fastq_R2 amplicon_fa experiment filter_contigs no_endog_meth ignore_bounds deaminase
deaminase1 example_files/deaminase1_S1_R1_001.fastq.gz example_files/deaminase1_S1_R2_001.fastq.gz example_files/opJS45.amplicon.fa deaminase_test FALSE TRUE FALSE TRUE
deaminase2 example_files/deaminase2_S2_R1_001.fastq.gz example_files/deaminase2_S2_R2_001.fastq.gz example_files/opJS45.amplicon.fa deaminase_test FALSE TRUE FALSE TRUE
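
The config comment notes that the samplesheet is meant to be parsed as a table with one row per sample. As a rough illustration of what a pre-flight check on this file could look like, here is a minimal sketch. It uses the standard-library `csv` module instead of pandas to stay dependency-free, and it assumes the file is tab-separated (as the `.tsv` extension and the `sep="\t"` call in `tests/Snakefile` suggest). The helper name `parse_samplesheet` and the split into required and flag columns are hypothetical, not part of the pipeline.

```python
import csv
import io

# Column names copied from the header row of deaminase_samplesheet.tsv.
REQUIRED_COLUMNS = ["sample_name", "fastq_R1", "fastq_R2", "amplicon_fa", "experiment"]
# Columns holding TRUE/FALSE strings (assumption: these are the only legal values).
FLAG_COLUMNS = ["filter_contigs", "no_endog_meth", "ignore_bounds", "deaminase"]


def parse_samplesheet(handle):
    """Hypothetical helper: read a tab-separated samplesheet into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"Samplesheet is missing columns: {missing}")
    rows = []
    for row in reader:
        for flag in FLAG_COLUMNS:
            if flag in row:
                value = (row[flag] or "").strip().upper()
                if value not in ("TRUE", "FALSE"):
                    raise ValueError(
                        f"{row['sample_name']}: {flag} must be TRUE or FALSE, got {row[flag]!r}"
                    )
                # Convert the string flag to a real boolean.
                row[flag] = value == "TRUE"
        rows.append(row)
    return rows


# The two rows below mirror the samplesheet added in this diff.
HEADER = REQUIRED_COLUMNS + FLAG_COLUMNS
ROWS = [
    ["deaminase1",
     "example_files/deaminase1_S1_R1_001.fastq.gz",
     "example_files/deaminase1_S1_R2_001.fastq.gz",
     "example_files/opJS45.amplicon.fa",
     "deaminase_test", "FALSE", "TRUE", "FALSE", "TRUE"],
    ["deaminase2",
     "example_files/deaminase2_S2_R1_001.fastq.gz",
     "example_files/deaminase2_S2_R2_001.fastq.gz",
     "example_files/opJS45.amplicon.fa",
     "deaminase_test", "FALSE", "TRUE", "FALSE", "TRUE"],
]
EXAMPLE_TSV = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"

rows = parse_samplesheet(io.StringIO(EXAMPLE_TSV))
print([(r["sample_name"], r["deaminase"]) for r in rows])
```

A check like this would fail early on a misspelled column or a stray flag value, before any alignment work starts.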
2 changes: 0 additions & 2 deletions example_files/example_samplesheet.txt

This file was deleted.

Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
# This file should contain everything to configure the workflow on a global scale.
# In case of sample based data, it should be complemented by a samples.tsv file that contains
# one row per sample. It can be parsed easily via pandas.
samples: "amplicon-smf/example_files/example_samplesheet.txt"

samples: "example_files/methyltransferase_samplesheet.tsv"
alignment_score_fraction: 0.8
alignment_length_fraction: 0.8
read1_length: 235
Expand Down
2 changes: 2 additions & 0 deletions example_files/methyltransferase_samplesheet.tsv
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
sample_name fastq_R1 fastq_R2 amplicon_fa experiment filter_contigs include_cpg ignore_bounds
methyltransferase example_files/methyltransferase_fastq_R1_001.fastq.gz example_files/methyltransferase_fastq_R2_001.fastq.gz example_files/opJS45.amplicon.long.fa methyltransferase_test TRUE FALSE FALSE
66 changes: 33 additions & 33 deletions example_files/opJS45.amplicon.fa

Large diffs are not rendered by default.

62 changes: 62 additions & 0 deletions example_files/opJS45.amplicon.long.fa

Large diffs are not rendered by default.

121 changes: 117 additions & 4 deletions tests/Snakefile
Original file line number Diff line number Diff line change
@@ -1,8 +1,121 @@
# Run tests with `snakemake --cores 1 --use-conda tests` in the tests directory
"""
Snakemake tests for key functions and smoke tests
on small methyltransferase and deaminase data.

rule tests:
message: "Running unit tests..."
conda: "../workflow/rules/envs/smf_py3_v7.yaml"
In the tests directory:
- run unit tests with
`snakemake unit_tests -c 1 --use-conda`
- run pipeline tests with
`snakemake test_methyltransferase_pipeline -c 1 --forceall`
`snakemake test_deaminase_pipeline -c 1 --forceall`

Previous pipeline test results will be deleted if any test is run!
"""
import os
import pandas as pd
from snakemake.common.configfile import load_configfile

# Figure out paths to some folders
TESTS_DIR = workflow.basedir
REPO_ROOT = os.path.abspath(os.path.join(TESTS_DIR, ".."))
EXAMPLES_DIR = os.path.join(REPO_ROOT, "example_files")
WORKFLOW_DIR = os.path.join(REPO_ROOT, "workflow")
RESULTS_DIR = os.path.join(REPO_ROOT, "results")

# For both methylation and deamination, figure out where output files will
# be and pick a file to check if the pipeline ran all the way through
METH_CONFIG = os.path.join(EXAMPLES_DIR, "methyltransferase_config.yaml")
_meth_cfg = load_configfile(METH_CONFIG)
_meth_samplesheet = _meth_cfg["samples"]
if not os.path.isabs(_meth_samplesheet):
    _meth_samplesheet = os.path.join(REPO_ROOT, _meth_samplesheet)
_meth_df = pd.read_csv(_meth_samplesheet, sep="\t")
_meth_experiments = sorted(_meth_df["experiment"].unique())
if len(_meth_experiments) != 1:
    raise ValueError(f"Expected exactly one experiment in methyltransferase samplesheet, got {_meth_experiments}")
METH_SAMPLE = _meth_df.loc[0, "sample_name"]
METH_EXPERIMENT = _meth_experiments[0]
# Derive the file name from the sample name, matching the deaminase block below
METH_TEST_OUTPUT = os.path.join(RESULTS_DIR,
                                METH_EXPERIMENT,
                                METH_SAMPLE,
                                "stats",
                                f"{METH_SAMPLE}.nuc_len_qc.stats.txt")

DEAM_CONFIG = os.path.join(EXAMPLES_DIR, "deaminase_config.yaml")
_deam_cfg = load_configfile(DEAM_CONFIG)
_deam_samplesheet = _deam_cfg["samples"]
if not os.path.isabs(_deam_samplesheet):
    _deam_samplesheet = os.path.join(REPO_ROOT, _deam_samplesheet)
_deam_df = pd.read_csv(_deam_samplesheet, sep="\t")
_deam_experiments = sorted(_deam_df["experiment"].unique())
if len(_deam_experiments) != 1:
    raise ValueError(f"Expected exactly one experiment in deamination samplesheet, got {_deam_experiments}")
DEAM_SAMPLE = _deam_df.loc[0, "sample_name"]
DEAM_EXPERIMENT = _deam_experiments[0]
DEAM_TEST_OUTPUT = os.path.join(RESULTS_DIR,
                                DEAM_EXPERIMENT,
                                DEAM_SAMPLE,
                                "stats",
                                f"{DEAM_SAMPLE}.nuc_len_qc.stats.txt")
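
The methyltransferase and deaminase blocks above repeat the same three steps: resolve the samplesheet path against the repo root, require exactly one experiment, and build the expected stats path. One possible way to factor that out is sketched below. The function names (`resolve_samplesheet`, `single_experiment`, `stats_path`) are hypothetical and do not exist in the repository; the sketch uses only `os.path` so it can be read independently of Snakemake and pandas.

```python
import os


def resolve_samplesheet(path, repo_root):
    """Return the path unchanged if absolute, else join it onto repo_root."""
    return path if os.path.isabs(path) else os.path.join(repo_root, path)


def single_experiment(experiments, label):
    """Return the only experiment name, or raise if there is not exactly one."""
    unique = sorted(set(experiments))
    if len(unique) != 1:
        raise ValueError(
            f"Expected exactly one experiment in {label} samplesheet, got {unique}"
        )
    return unique[0]


def stats_path(results_dir, experiment, sample):
    """Build <results>/<experiment>/<sample>/stats/<sample>.nuc_len_qc.stats.txt."""
    return os.path.join(
        results_dir, experiment, sample, "stats", f"{sample}.nuc_len_qc.stats.txt"
    )


# Usage with values taken from the deaminase samplesheet in this diff.
experiment = single_experiment(["deaminase_test", "deaminase_test"], "deaminase")
expected = stats_path("results", experiment, "deaminase1")
print(expected)
```

With helpers like these, each test block would reduce to loading its config and calling the three functions, which keeps the two test setups from drifting apart.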


rule test_methyltransferase_pipeline:
    message:
        "Running methyltransferase pipeline with test data..."
    output:
        stats = METH_TEST_OUTPUT
    params:
        repo_root = REPO_ROOT,
        config_file = METH_CONFIG,
    threads:
        workflow.cores
    run:
        # Run pipeline
        shell(r"""
        cd {params.repo_root}
        snakemake \
            -s workflow/Snakefile \
            -w 15 \
            --configfile {params.config_file} \
            --cores {threads} \
            --use-conda \
            --forceall
        """)
        # Check that output exists
        assert os.path.exists(output.stats), f"Missing stats file: {output.stats}"
        # TODO: put file content checks here

rule test_deaminase_pipeline:
    message:
        "Running deaminase pipeline with test data..."
    output:
        stats = DEAM_TEST_OUTPUT
    params:
        repo_root = REPO_ROOT,
        config_file = DEAM_CONFIG,
    threads:
        workflow.cores
    run:
        # Run pipeline
        shell(r"""
        cd {params.repo_root}
        snakemake \
            -s workflow/Snakefile \
            -w 15 \
            --configfile {params.config_file} \
            --cores {threads} \
            --use-conda \
            --forceall
        """)
        # Check that output exists
        assert os.path.exists(output.stats), f"Missing stats file: {output.stats}"
        # TODO: put file content checks here

rule unit_tests:
    message:
        "Running unit tests..."
    conda:
        os.path.join(REPO_ROOT, "workflow", "rules", "envs", "python3_v7.yaml")
    shell:
        r"""
        PYTHONPATH=../workflow/scripts \
Expand Down
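
Both pipeline test rules above end with `# TODO: put file content checks here`. The format of `*.nuc_len_qc.stats.txt` is not shown in this diff, so the sketch below makes only the weakest assumption: a successful run writes at least one non-blank line. The helper name `check_stats_file` and its `min_lines` parameter are hypothetical; a real check would likely assert on specific fields once the format is pinned down.

```python
import os
import tempfile


def check_stats_file(path, min_lines=1):
    """Hypothetical check: the file exists and has at least min_lines non-blank lines."""
    if not os.path.exists(path):
        raise AssertionError(f"Missing stats file: {path}")
    with open(path) as handle:
        lines = [line for line in handle if line.strip()]
    if len(lines) < min_lines:
        raise AssertionError(
            f"Stats file {path} has {len(lines)} non-blank lines, "
            f"expected at least {min_lines}"
        )
    return len(lines)


# Demonstration on a throwaway file; the content is a placeholder, not real pipeline output.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.nuc_len_qc.stats.txt")
    with open(demo, "w") as handle:
        handle.write("placeholder line\n")
    n_lines = check_stats_file(demo)
```

Inside a `run:` block this could be called as `check_stats_file(output.stats)` in place of the bare existence assertion, so that an empty output file also fails the test.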
5 changes: 1 addition & 4 deletions workflow/Snakefile
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,4 @@ include: "rules/other.smk"

rule all:
input:
all_input



all_input
Original file line number Diff line number Diff line change
Expand Up @@ -7,17 +7,15 @@ channels:
dependencies:
- biopython
- bwa
- fastcluster
- htslib
- matplotlib
- methyldackel
- numpy
- pandas
- pysam
- python
- python=3.13
- samtools
- scikit-learn
- scipy
- seaborn
- toolshed
- fastqc
- fastqc