Update reports / figures by imallona · Pull Request #4 · imallona/amet

imallona · 2026-05-12T06:18:17Z

- Explicit Rmd chaining
- Full runs
- Plate-based stratification, with combos selecting the best-covered cells
- Add a flag to limit the distance between CpGs (Mark, 13th May 2026)

Copilot

Pull request overview

This PR updates the Snakemake workflow and associated Rmd reports to make report generation depend on explicit RDS/CSV intermediates (instead of implicit HTML ordering), and renames some intermediate artifacts to be report-specific.

Changes:

- Split dataset “analytical Rmd rendering” into multiple explicit rules and declare key RDS/CSV intermediates as Snakemake outputs/inputs (Ecker, Argelaguet, CRC).
- Update figure Rmds to consume the renamed intermediates and add parameter-path fallbacks for CRC figures/SCE steps.
- Improve CRC feature filename parsing to handle subcategory tokens containing underscores.
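
The underscore-robust parsing mentioned in the last item can be illustrated with a minimal Python sketch. The real change lives in the R code of `workflow/Rmd/crc.Rmd` and is not shown in this PR page, so everything here is an assumption: the function name `parse_feature_filename`, the filename layout (a fixed number of leading and trailing fields around the subcategory), and the example filename are all hypothetical. The idea is simply to anchor on the fields whose count is known and rejoin whatever is left in the middle.

```python
def parse_feature_filename(name, n_leading=2, n_trailing=1, sep="_"):
    """Split a filename into leading fields, subcat, and trailing fields.

    Hypothetical layout: <lead_1>_<lead_2>_<subcat>_<trail>.<ext>, where
    only the subcat token may itself contain the separator.
    """
    stem = name.split(".", 1)[0]          # drop every extension
    tokens = stem.split(sep)
    if len(tokens) < n_leading + n_trailing + 1:
        raise ValueError(f"too few tokens in {name!r}")
    cut = len(tokens) - n_trailing        # index where trailing fields start
    leading = tokens[:n_leading]
    trailing = tokens[cut:]
    subcat = sep.join(tokens[n_leading:cut])  # rejoin the middle tokens
    return leading, subcat, trailing


# Hypothetical filename, not taken from the repository.
parsed = parse_feature_filename("CRC01_PT_cpg_island_shore_windows.bed.gz")
```

Splitting naively on every underscore would break `cpg_island_shore` into three fields; counting from both ends keeps it as one token.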

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.


| File | Description |
| --- | --- |
| workflow/rules/ecker.smk | Replace wildcard analytical render rule with explicit `render_ecker*` rules and wire figure dependencies via declared intermediates. |
| workflow/rules/crc.smk | Replace wildcard analytical render rule with explicit CRC render chain rules; figure rule now depends on declared intermediates. |
| workflow/rules/argelaguet.smk | Replace wildcard analytical render rule with explicit `render_argelaguet*` rules and wire figure dependencies via intermediates. |
| workflow/Rmd/fig_ecker.Rmd | Update per-cell summary filename reference and corresponding “missing” message. |
| workflow/Rmd/fig_crc.Rmd | Add fallback resolution for `de_list.rds` when `params$de` is empty. |
| workflow/Rmd/fig_crc_diffentropy.Rmd | Add fallback resolution for `de_list.rds` and corrected SCE path when params are empty. |
| workflow/Rmd/fig_argelaguet.Rmd | Update per-cell summary filename reference and corresponding “missing” message. |
| workflow/Rmd/ecker_windows.Rmd | Rename exported per-cell summary CSV to be windows-specific. |
| workflow/Rmd/ecker_embeddings.Rmd | Rename exported per-cell summary CSV to be embeddings-specific. |
| workflow/Rmd/crc.Rmd | Make filename parsing robust to underscores in `subcat` token. |
| workflow/Rmd/crc_windows.Rmd | Save SCE + DE list into `params$out_dir` and make the save chunks non-cached. |
| workflow/Rmd/crc_windows_sce.Rmd | Add fallback resolution for upstream SCE/DE inputs when params are empty; name the corrected-SCE save chunk. |
| workflow/Rmd/crc_embeddings.Rmd | Add fallback resolution for corrected SCE path; write key outputs under `params$out_dir` with non-cached save chunks. |
| workflow/Rmd/argelaguet_windows.Rmd | Rename exported per-cell summary CSV to be windows-specific. |
| workflow/Rmd/argelaguet_embeddings.Rmd | Rename exported per-cell summary CSV to be embeddings-specific. |
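
The fallback resolution described for the CRC Rmds (use the supplied parameter when present, otherwise look for the file under the output directory) can be sketched as follows. The Rmds themselves are R and their code is not shown here, so this Python version only illustrates the logic; `resolve_input` and the example directory are hypothetical names, while `de_list.rds` is the filename mentioned in the summary.

```python
import os.path as op


def resolve_input(param_value, out_dir, default_name):
    """Return param_value if it is set, else out_dir/default_name.

    Both None and the empty string count as "empty", mirroring the
    "when params are empty" wording of the review summary.
    """
    if param_value:
        return param_value
    return op.join(out_dir, default_name)


# Hypothetical output directory; only de_list.rds comes from the PR.
de_path = resolve_input("", "results/crc", "de_list.rds")
explicit = resolve_input("custom/de_list.rds", "results/crc", "de_list.rds")
```

With this pattern an explicitly passed parameter always wins, so declared Snakemake inputs still control the path when the report is rendered through the workflow.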


+        embeddings_debug = op.join(CRC_RUN, "crc_embeddings_debug.rds"),
+        win_varexp = op.join(CRC_RUN, "crc_win_varexp.csv"),
+        per_cell_summary = op.join(CRC_RUN, "crc_per_cell_summary.csv"),
+        de_list = op.join(CRC_RUN, "de_list.rds"),
+        corrected_sce = op.join(CRC_RUN, "sce_windows_colon_corrected.rds"),


Copilot

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 5 comments.

+      message("[render_logging] sinks already active; skipping re-sink")
+    } else {
+      log_con <- file(log_path, open = "at")
+      sink(log_con)


-  var_df$adjS_sd[!is.finite(var_df$adjS_sd)] <- 0
+  var_df$jsd_sd[!is.finite(var_df$jsd_sd)] <- 0
+  var_df$i_total_sd[!is.finite(var_df$i_total_sd)] <- 0



+        bedtools coverage -a {input.windows} -b {input.annotation} \
+          | cut -f7 >> {output.frac} 2> {log}


+        cat "$tmp" {input.windows} \
+          | paste - {input.fracs} \
+          | gzip -c > {output.tsv} 2> {log}


+        tmp_header=$(mktemp)
+        tmp_body=$(mktemp)
+        echo -e "chrom\tstart\tend\tfeature_id" > $tmp_header
+        cat $tmp_header {input.windows} > $tmp_body
+        paste $tmp_body {input.fracs} | gzip -c > {output.tsv} 2> {log}
+        rm -f $tmp_header $tmp_body


Copilot

Pull request overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 5 comments.

+    df_crc01 <- bind_rows(df_list) %>%
+      left_join(as.data.frame(colData(crc01))[, c("cell", "location", "patient")],


 configfile: op.join(workflow.basedir, "config", "sim.yaml")
-configfile: op.join(workflow.basedir, "config", "datasets.yaml")
+## Dataset config (proto or full) is picked by the Makefile via
+## --configfile workflow/config/datasets_{proto,full}.yaml. Running snakemake
+## directly without --configfile leaves the per-dataset keys unset.
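
Because running without `--configfile` leaves the per-dataset keys unset, a small guard near the top of the Snakefile can fail early with a readable message. The following is a minimal sketch, not the guard actually added in this PR (its code is not shown here): `require_config_keys` and the key names `crc`, `ecker`, and `argelaguet` are assumptions chosen for illustration. In a Snakefile, `config` is a plain dictionary, so the check is ordinary Python.

```python
def require_config_keys(config, required):
    """Raise a clear error if any required top-level config key is missing."""
    missing = [key for key in required if key not in config]
    if missing:
        raise ValueError(
            "missing config keys: " + ", ".join(missing)
            + "; pass --configfile workflow/config/datasets_{proto,full}.yaml"
        )


# Hypothetical key names; the real dataset keys may differ.
DATASET_KEYS = ["crc", "ecker", "argelaguet"]

# Simulates a run where only sim.yaml was loaded.
try:
    require_config_keys({"sim": {}}, DATASET_KEYS)
    guard_error = None
except ValueError as err:
    guard_error = str(err)
```

Failing at parse time is usually preferable to a `KeyError` deep inside a rule, since the message can point directly at the missing `--configfile` argument.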


+  ## proto_lineages: uncomment to restrict prototype runs to a subset.
+  ## Currently unused (no smk rule reads it).


+## Runs every (patient, location) for CRC, every (sub_region, sub_type) in
+## the configured region for Ecker, every (stage, lineage) for Argelaguet.


+## `amet:` block conflict between sim.yaml and datasets.yaml
+
+`workflow/Snakefile` loads `sim.yaml` then `datasets.yaml`. Both files define an `amet:` block, so the second one (datasets.yaml, `min_cells_per_group: 2`) silently overrides the first (sim.yaml, `min_cells_per_group: 10`). Result: simulation rules run with the dataset floor of 2 instead of the intended simulation floor of 10. Fix options: move the simulation-only amet defaults under a `sim.amet:` namespace and update the smk rules to read the namespaced keys, or pass per-rule `min_cells` literals from the Snakefile so the sim and dataset paths cannot collide on the same key.
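
The collision described in this note can be reproduced with a small Python sketch. `merge_config` is a simplified stand-in written for illustration, on the assumption (consistent with the note) that config files are merged recursively and the later file wins on colliding keys; it is not Snakemake's own code. The `min_cells_per_group` values are the ones quoted above, and the `sim` namespace follows the first fix option.

```python
def merge_config(base, override):
    """Recursively merge override into a copy of base; override wins."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


sim_yaml = {"amet": {"min_cells_per_group": 10}}
datasets_yaml = {"amet": {"min_cells_per_group": 2}}

# Both files share the `amet` block, so the later value replaces the earlier.
collided = merge_config(sim_yaml, datasets_yaml)

# Fix option 1: keep the simulation defaults under their own namespace.
sim_namespaced = {"sim": {"amet": {"min_cells_per_group": 10}}}
separated = merge_config(sim_namespaced, datasets_yaml)
```

After the collision, `collided["amet"]["min_cells_per_group"]` is 2, which is the silent override the note warns about. With the namespaced layout, the simulation floor stays at `separated["sim"]["amet"]["min_cells_per_group"]` and the dataset floor at `separated["amet"]["min_cells_per_group"]`, provided the simulation rules are updated to read the namespaced key.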


Switch to explicit report rendering order

d67c2dc

imallona requested a review from Copilot May 12, 2026 06:18

Copilot started reviewing on behalf of imallona May 12, 2026 06:18

Copilot AI reviewed May 12, 2026

imallona added 4 commits May 12, 2026 08:37

Address code review, add log sinks, go explicit in dep chains

4a0464d

Add logging safeguards

6e1270e

Fix logger paths

81a1a0c

Add feature annotations and other biology

a26c991

imallona requested a review from Copilot May 12, 2026 09:59

Copilot started reviewing on behalf of imallona May 12, 2026 09:59

Copilot AI reviewed May 12, 2026

imallona added 5 commits May 12, 2026 12:11

Address code review

ba1966e

Draft full run

b86ed11

Separate full and proto modes

bb67131

Guard for configfiles

e5bfb2e

Switch Ecker's granularities to slabs, update proto vs full runs

89ccb3a

imallona requested a review from Copilot May 12, 2026 12:56

Copilot started reviewing on behalf of imallona May 12, 2026 12:57

Copilot AI reviewed May 12, 2026

imallona added 3 commits May 12, 2026 15:13

Address code review, restrict ci/cd

86eff85

Add --max-pair-distance, as per Mark's feedback

4d6c43d

Fix bugs in proto run

767bac0

imallona merged commit f432c2c into main May 13, 2026
1 check passed

imallona deleted the dev branch May 13, 2026 11:58

imallona merged 13 commits into main from dev


2 participants

