Pull request overview
This PR updates the Snakemake workflow and associated Rmd reports to make report generation depend on explicit RDS/CSV intermediates (instead of implicit HTML ordering), and renames some intermediate artifacts to be report-specific.
Changes:
- Split dataset “analytical Rmd rendering” into multiple explicit rules and declare key RDS/CSV intermediates as Snakemake outputs/inputs (Ecker, Argelaguet, CRC).
- Update figure Rmds to consume the renamed intermediates and add parameter-path fallbacks for CRC figures/SCE steps.
- Improve CRC feature filename parsing to handle subcategory tokens containing underscores.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| workflow/rules/ecker.smk | Replace wildcard analytical render rule with explicit render_ecker* rules and wire figure dependencies via declared intermediates. |
| workflow/rules/crc.smk | Replace wildcard analytical render rule with explicit CRC render chain rules; figure rule now depends on declared intermediates. |
| workflow/rules/argelaguet.smk | Replace wildcard analytical render rule with explicit render_argelaguet* rules and wire figure dependencies via intermediates. |
| workflow/Rmd/fig_ecker.Rmd | Update per-cell summary filename reference and corresponding “missing” message. |
| workflow/Rmd/fig_crc.Rmd | Add fallback resolution for de_list.rds when params$de is empty. |
| workflow/Rmd/fig_crc_diffentropy.Rmd | Add fallback resolution for de_list.rds and corrected SCE path when params are empty. |
| workflow/Rmd/fig_argelaguet.Rmd | Update per-cell summary filename reference and corresponding “missing” message. |
| workflow/Rmd/ecker_windows.Rmd | Rename exported per-cell summary CSV to be windows-specific. |
| workflow/Rmd/ecker_embeddings.Rmd | Rename exported per-cell summary CSV to be embeddings-specific. |
| workflow/Rmd/crc.Rmd | Make filename parsing robust to underscores in subcat token. |
| workflow/Rmd/crc_windows.Rmd | Save SCE + DE list into params$out_dir and make the save chunks non-cached. |
| workflow/Rmd/crc_windows_sce.Rmd | Add fallback resolution for upstream SCE/DE inputs when params are empty; name the corrected-SCE save chunk. |
| workflow/Rmd/crc_embeddings.Rmd | Add fallback resolution for corrected SCE path; write key outputs under params$out_dir with non-cached save chunks. |
| workflow/Rmd/argelaguet_windows.Rmd | Rename exported per-cell summary CSV to be windows-specific. |
| workflow/Rmd/argelaguet_embeddings.Rmd | Rename exported per-cell summary CSV to be embeddings-specific. |
Comment on lines +531 to +535
| embeddings_debug = op.join(CRC_RUN, "crc_embeddings_debug.rds"), | ||
| win_varexp = op.join(CRC_RUN, "crc_win_varexp.csv"), | ||
| per_cell_summary = op.join(CRC_RUN, "crc_per_cell_summary.csv"), | ||
| de_list = op.join(CRC_RUN, "de_list.rds"), | ||
| corrected_sce = op.join(CRC_RUN, "sce_windows_colon_corrected.rds"), |
| message("[render_logging] sinks already active; skipping re-sink") | ||
| } else { | ||
| log_con <- file(log_path, open = "at") | ||
| sink(log_con) |
| var_df$adjS_sd[!is.finite(var_df$adjS_sd)] <- 0 | ||
| var_df$jsd_sd[!is.finite(var_df$jsd_sd)] <- 0 | ||
| var_df$i_total_sd[!is.finite(var_df$i_total_sd)] <- 0 | ||
Comment on lines +298 to +299
| bedtools coverage -a {input.windows} -b {input.annotation} \ | ||
| | cut -f7 >> {output.frac} 2> {log} |
Comment on lines +324 to +326
| cat "$tmp" {input.windows} \ | ||
| | paste - {input.fracs} \ | ||
| | gzip -c > {output.tsv} 2> {log} |
Comment on lines +318 to +323
| tmp_header=$(mktemp) | ||
| tmp_body=$(mktemp) | ||
| echo -e "chrom\tstart\tend\tfeature_id" > $tmp_header | ||
| cat $tmp_header {input.windows} > $tmp_body | ||
| paste $tmp_body {input.fracs} | gzip -c > {output.tsv} 2> {log} | ||
| rm -f $tmp_header $tmp_body |
Comment on lines +366 to +367
| df_crc01 <- bind_rows(df_list) %>% | ||
| left_join(as.data.frame(colData(crc01))[, c("cell", "location", "patient")], |
Comment on lines 19 to +22
| configfile: op.join(workflow.basedir, "config", "sim.yaml") | ||
| configfile: op.join(workflow.basedir, "config", "datasets.yaml") | ||
| ## Dataset config (proto or full) is picked by the Makefile via | ||
| ## --configfile workflow/config/datasets_{proto,full}.yaml. Running snakemake | ||
| ## directly without --configfile leaves the per-dataset keys unset. |
Comment on lines +48 to +49
| ## proto_lineages: uncomment to restrict prototype runs to a subset. | ||
| ## Currently unused (no smk rule reads it). |
Comment on lines +3 to +4
| ## Runs every (patient, location) for CRC, every (sub_region, sub_type) in | ||
| ## the configured region for Ecker, every (stage, lineage) for Argelaguet. |
Comment on lines +12 to +14
| ## `amet:` block conflict between sim.yaml and datasets.yaml | ||
|
|
||
| `workflow/Snakefile` loads `sim.yaml` then `datasets.yaml`. Both files define an `amet:` block, so the second one (datasets.yaml, `min_cells_per_group: 2`) silently overrides the first (sim.yaml, `min_cells_per_group: 10`). Result: simulation rules run with the dataset floor of 2 instead of the intended simulation floor of 10. Fix options: move the simulation-only amet defaults under a `sim.amet:` namespace and update the smk rules to read the namespaced keys, or pass per-rule `min_cells` literals from the Snakefile so the sim and dataset paths cannot collide on the same key. |
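The override described above can be illustrated with a minimal Python sketch. It assumes that successive `configfile:` directives are merged key by key, with the later file winning on any shared key. The two dicts stand in for the parsed YAML files; only the `min_cells_per_group` values (10 and 2) come from the comment above, and `deep_update` is a hypothetical helper written for this illustration, not a Snakemake function.

```python
from copy import deepcopy


def deep_update(base, override):
    """Hypothetical helper: recursively merge `override` into a copy of `base`.

    On a shared key, the value from `override` wins; nested dicts are merged.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Stand-ins for the two config files, loaded in the order the Snakefile uses.
sim_yaml = {"amet": {"min_cells_per_group": 10}}
datasets_yaml = {"amet": {"min_cells_per_group": 2}}

# Current layout: both files define `amet:`, so the later file wins.
config = deep_update(sim_yaml, datasets_yaml)
collided = config["amet"]["min_cells_per_group"]

# First fix option: simulation defaults live under a `sim.amet:` namespace.
sim_yaml_namespaced = {"sim": {"amet": {"min_cells_per_group": 10}}}
config_ns = deep_update(sim_yaml_namespaced, datasets_yaml)
sim_floor = config_ns["sim"]["amet"]["min_cells_per_group"]
dataset_floor = config_ns["amet"]["min_cells_per_group"]

print(collided, sim_floor, dataset_floor)
```

With the namespaced layout, the simulation rules would read `config["sim"]["amet"]` and the dataset rules `config["amet"]`, so the two floors no longer share a key.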