Update reports / figures #4

Merged
imallona merged 13 commits into main from dev
May 13, 2026

Conversation

@imallona (Owner) commented May 12, 2026

  • explicit Rmd chaining
  • full runs
  • plate-based stratification, with combos selecting the best-covered cells
  • add a flag to limit the distance between CpGs (Mark 13th May 2026)

Copilot AI left a comment

Pull request overview

This PR updates the Snakemake workflow and associated Rmd reports to make report generation depend on explicit RDS/CSV intermediates (instead of implicit HTML ordering), and renames some intermediate artifacts to be report-specific.

Changes:

  • Split dataset “analytical Rmd rendering” into multiple explicit rules and declare key RDS/CSV intermediates as Snakemake outputs/inputs (Ecker, Argelaguet, CRC).
  • Update figure Rmds to consume the renamed intermediates and add parameter-path fallbacks for CRC figures/SCE steps.
  • Improve CRC feature filename parsing to handle subcategory tokens containing underscores.
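
The effect of declaring intermediates can be sketched in plain Python: once each rule lists the files it reads and writes, the run order follows from the files rather than from the order in which HTML reports happen to be rendered. This is only an illustration, not the workflow's code. The rule names and the `sce_windows.rds` and `fig_crc.html` file names are hypothetical; the other file names are taken from the `crc.smk` comment thread.

```python
from graphlib import TopologicalSorter

# Hypothetical rule table: rule names and the exact input/output assignment
# are assumptions for illustration, not copied from workflow/rules/crc.smk.
RULES = {
    "render_crc_windows": {
        "inputs": [],
        "outputs": ["sce_windows.rds", "de_list.rds"],
    },
    "render_crc_windows_sce": {
        "inputs": ["sce_windows.rds", "de_list.rds"],
        "outputs": ["sce_windows_colon_corrected.rds"],
    },
    "render_crc_embeddings": {
        "inputs": ["sce_windows_colon_corrected.rds"],
        "outputs": ["crc_per_cell_summary.csv"],
    },
    "render_fig_crc": {
        "inputs": ["de_list.rds", "crc_per_cell_summary.csv"],
        "outputs": ["fig_crc.html"],
    },
}


def rule_order(rules):
    """Order rules so that every declared input is produced before it is read."""
    # Map each declared output file to the rule that writes it.
    producer = {out: name for name, rule in rules.items() for out in rule["outputs"]}
    # A rule depends on whichever rules produce its inputs.
    graph = {
        name: {producer[path] for path in rule["inputs"] if path in producer}
        for name, rule in rules.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in rule_order(RULES):
        print(name)
```

Because the figure rule names `de_list.rds` and the per-cell summary as inputs, it cannot be scheduled before the rules that write them, which is the guarantee the explicit chain is meant to provide.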

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| workflow/rules/ecker.smk | Replace wildcard analytical render rule with explicit render_ecker* rules and wire figure dependencies via declared intermediates. |
| workflow/rules/crc.smk | Replace wildcard analytical render rule with explicit CRC render chain rules; figure rule now depends on declared intermediates. |
| workflow/rules/argelaguet.smk | Replace wildcard analytical render rule with explicit render_argelaguet* rules and wire figure dependencies via intermediates. |
| workflow/Rmd/fig_ecker.Rmd | Update per-cell summary filename reference and corresponding “missing” message. |
| workflow/Rmd/fig_crc.Rmd | Add fallback resolution for de_list.rds when params$de is empty. |
| workflow/Rmd/fig_crc_diffentropy.Rmd | Add fallback resolution for de_list.rds and corrected SCE path when params are empty. |
| workflow/Rmd/fig_argelaguet.Rmd | Update per-cell summary filename reference and corresponding “missing” message. |
| workflow/Rmd/ecker_windows.Rmd | Rename exported per-cell summary CSV to be windows-specific. |
| workflow/Rmd/ecker_embeddings.Rmd | Rename exported per-cell summary CSV to be embeddings-specific. |
| workflow/Rmd/crc.Rmd | Make filename parsing robust to underscores in subcat token. |
| workflow/Rmd/crc_windows.Rmd | Save SCE + DE list into params$out_dir and make the save chunks non-cached. |
| workflow/Rmd/crc_windows_sce.Rmd | Add fallback resolution for upstream SCE/DE inputs when params are empty; name the corrected-SCE save chunk. |
| workflow/Rmd/crc_embeddings.Rmd | Add fallback resolution for corrected SCE path; write key outputs under params$out_dir with non-cached save chunks. |
| workflow/Rmd/argelaguet_windows.Rmd | Rename exported per-cell summary CSV to be windows-specific. |
| workflow/Rmd/argelaguet_embeddings.Rmd | Rename exported per-cell summary CSV to be embeddings-specific. |
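
For the `crc.Rmd` filename parsing change, the general idea of tolerating underscores inside one token can be sketched as below. The actual change is in R and its filename layout is not shown in this review, so the layout `<category>_<subcat>_<suffix>.bed`, the `.bed` extension, and the function name are all assumptions made for illustration. The sketch anchors on the first and last underscore and treats everything in between as the subcategory.

```python
def parse_feature_name(filename, ext=".bed"):
    """Split '<category>_<subcat>_<suffix><ext>' where subcat may contain '_'.

    The layout and extension are hypothetical; only the outer two tokens are
    assumed to be underscore-free.
    """
    stem = filename.removesuffix(ext)
    # First underscore ends the category token.
    category, sep_first, rest = stem.partition("_")
    # Last underscore starts the suffix token; the middle is the subcategory.
    subcat, sep_last, suffix = rest.rpartition("_")
    if not (sep_first and sep_last and category and subcat and suffix):
        raise ValueError(f"unexpected feature filename: {filename!r}")
    return {"category": category, "subcat": subcat, "suffix": suffix}


if __name__ == "__main__":
    print(parse_feature_name("genes_protein_coding_windows.bed"))
```

A naive split on every underscore would break `protein_coding` into two tokens and shift the fields that follow; anchoring on the outer separators avoids that.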

Comment thread workflow/rules/crc.smk
Comment on lines +531 to +535
embeddings_debug = op.join(CRC_RUN, "crc_embeddings_debug.rds"),
win_varexp = op.join(CRC_RUN, "crc_win_varexp.csv"),
per_cell_summary = op.join(CRC_RUN, "crc_per_cell_summary.csv"),
de_list = op.join(CRC_RUN, "de_list.rds"),
corrected_sce = op.join(CRC_RUN, "sce_windows_colon_corrected.rds"),

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 5 comments.

Comment thread workflow/scripts/render_logging.R Outdated
message("[render_logging] sinks already active; skipping re-sink")
} else {
log_con <- file(log_path, open = "at")
sink(log_con)
var_df$adjS_sd[!is.finite(var_df$adjS_sd)] <- 0
var_df$jsd_sd[!is.finite(var_df$jsd_sd)] <- 0
var_df$i_total_sd[!is.finite(var_df$i_total_sd)] <- 0

Comment thread workflow/rules/ecker.smk Outdated
Comment on lines +298 to +299
bedtools coverage -a {input.windows} -b {input.annotation} \
| cut -f7 >> {output.frac} 2> {log}
Comment thread workflow/rules/ecker.smk Outdated
Comment on lines +324 to +326
cat "$tmp" {input.windows} \
| paste - {input.fracs} \
| gzip -c > {output.tsv} 2> {log}
Comment thread workflow/rules/crc.smk
Comment on lines +318 to +323
tmp_header=$(mktemp)
tmp_body=$(mktemp)
echo -e "chrom\tstart\tend\tfeature_id" > $tmp_header
cat $tmp_header {input.windows} > $tmp_body
paste $tmp_body {input.fracs} | gzip -c > {output.tsv} 2> {log}
rm -f $tmp_header $tmp_body

Copilot AI left a comment

Pull request overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 5 comments.

Comment thread workflow/Rmd/crc_windows_sce.Rmd Outdated
Comment on lines +366 to +367
df_crc01 <- bind_rows(df_list) %>%
left_join(as.data.frame(colData(crc01))[, c("cell", "location", "patient")],
Comment thread workflow/Snakefile
Comment on lines 19 to +22
configfile: op.join(workflow.basedir, "config", "sim.yaml")
configfile: op.join(workflow.basedir, "config", "datasets.yaml")
## Dataset config (proto or full) is picked by the Makefile via
## --configfile workflow/config/datasets_{proto,full}.yaml. Running snakemake
## directly without --configfile leaves the per-dataset keys unset.
Comment thread workflow/config/datasets.yaml Outdated
Comment on lines +48 to +49
## proto_lineages: uncomment to restrict prototype runs to a subset.
## Currently unused (no smk rule reads it).
Comment thread workflow/config/datasets_full.yaml Outdated
Comment on lines +3 to +4
## Runs every (patient, location) for CRC, every (sub_region, sub_type) in
## the configured region for Ecker, every (stage, lineage) for Argelaguet.
Comment thread TODO.md Outdated
Comment on lines +12 to +14
## `amet:` block conflict between sim.yaml and datasets.yaml

`workflow/Snakefile` loads `sim.yaml` then `datasets.yaml`. Both files define an `amet:` block, so the second one (datasets.yaml, `min_cells_per_group: 2`) silently overrides the first (sim.yaml, `min_cells_per_group: 10`). Result: simulation rules run with the dataset floor of 2 instead of the intended simulation floor of 10. Fix options: move the simulation-only amet defaults under a `sim.amet:` namespace and update the smk rules to read the namespaced keys, or pass per-rule `min_cells` literals from the Snakefile so the sim and dataset paths cannot collide on the same key.
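
The collision described in this TODO entry can be reproduced with two small dictionaries standing in for the YAML files. This is a sketch under the assumption that later config files are merged over earlier ones key by key; `merge_configs` is a hypothetical helper, not a Snakemake function, and the dictionaries contain only the one key mentioned above.

```python
import copy


def merge_configs(*configs):
    """Merge config dicts left to right; later values win on identical keys."""
    merged = {}
    for cfg in configs:
        _deep_update(merged, copy.deepcopy(cfg))
    return merged


def _deep_update(base, override):
    # Recurse into nested mappings, otherwise overwrite the existing value.
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            _deep_update(base[key], value)
        else:
            base[key] = value


# Stand-ins for sim.yaml and datasets.yaml (values taken from the TODO entry).
sim_cfg = {"amet": {"min_cells_per_group": 10}}
datasets_cfg = {"amet": {"min_cells_per_group": 2}}

# Same top-level key: the dataset floor silently replaces the simulation floor.
collided = merge_configs(sim_cfg, datasets_cfg)

# First fix option: move the simulation defaults under a `sim` namespace.
sim_cfg_namespaced = {"sim": {"amet": {"min_cells_per_group": 10}}}
separated = merge_configs(sim_cfg_namespaced, datasets_cfg)

if __name__ == "__main__":
    print(collided["amet"]["min_cells_per_group"])
    print(separated["sim"]["amet"]["min_cells_per_group"])
    print(separated["amet"]["min_cells_per_group"])
```

With the namespaced layout, simulation rules would read `config["sim"]["amet"]` and dataset rules `config["amet"]`, so load order no longer decides which floor applies.
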
@imallona merged commit f432c2c into main on May 13, 2026
1 check passed
@imallona deleted the dev branch on May 13, 2026 11:58