Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the external-tool benchmarking workflow and report to make comparisons across quantifiers/granularities more explicit, and to better support Chromium scTE and SmartSeq2 bulk-style tools.
Changes:
- Add optional barcode→cell_id translation for scTE harmonization (Chromium) via
--barcode-map, and plumb the simulator-produced mapping through the benchmark module. - Extend the external benchmark Snakemake module to support per-cell BAM splitting (SmartSeq2), add SQuIRE as a first-class tool chain (Fetch/Clean/Count/combine), and aggregate per-sample benchmarks into per-tool cost.
- Expand the RMarkdown benchmark report with “native vs rolled-up” coverage notes, a “top quantifiers” table, precision/F1 panels, and a memory (RSS) panel.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| workflow/scripts/harmonize_external_counts.py | Adds barcode-map parsing and applies it to scTE h5ad obs names before sample filtering; improves empty-output diagnostics. |
| workflow/scripts/external_benchmark_report.Rmd | Adds granularity-coverage explanation/table and new summary + metric panels (precision/F1, RSS). |
| workflow/modules/simulations.snmk | Exposes Chromium barcode_to_cell_id.tsv as a simulator output for downstream use. |
| workflow/modules/external_tools_benchmark.snmk | Adds SmartSeq2 per-cell BAM splitting, SQuIRE workflow chain, benchmark aggregation, and barcode-map wiring for scTE Chromium harmonization. |
| workflow/configs/external_benchmark_smartseq2.yaml | Updates default external tool set and relies on automatic sample expansion from simulation.n_cells. |
| workflow/configs/external_benchmark_chromium.yaml | Relies on automatic sample expansion from simulation.n_cells. |
| test/unit/test_harmonize_external_counts.py | Adds unit tests for the new barcode-map parser. |
| README.md | Updates benchmark description and expected outputs at a high level. |
| docs/methods.md | Updates methods text to reflect native vs rolled-up granularities and new external benchmark rule chain. |
| docs/external_tool_benchmark.md | Documents default sample scope and Chromium barcode→cell_id translation, plus updated SQuIRE pipeline notes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| rule split_smartseq2_per_cell_bams: | ||
| """Split the multi-cell SmartSeq2 BAM by @RG into per-cell BAMs so | ||
| bulk-style tools (TEcount, SQuIRE) can run one sample at a time.""" | ||
| conda: op.join(workflow.basedir, 'envs', 'squire.yaml') |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
workflow/envs/squire.yaml:15
- Installing SQuIRE from an unpinned git default branch makes the benchmark non-reproducible and may change behavior over time. Prefer pinning to a tag/commit SHA (or an archived release tarball) so reruns produce the same tool version.
- pip:
## SQuIRE 0.9.9.92 is not on PyPI (only an unrelated 0.0.1); install the
## upstream source, whose master version is 0.9.9.92.
- git+https://github.com/wyang17/SQuIRE.git
Comment on lines
+7
to
9
| - python=2.7 | ||
| - star=2.5.3a | ||
| - samtools=1.9 |
Comment on lines
19
to
+23
| - pip: | ||
| - scTE==1.0.0 | ||
| ## scTE pins anndata loosely and the resolver lands on 0.6.x, which | ||
| ## imports the removed pandas.core.index. Pin a modern anndata. | ||
| - anndata==0.10.9 | ||
| - git+https://github.com/JiekaiLab/scTE.git |
Comment on lines
151
to
154
| | TEtranscripts (TEcount) | bulk | gene_id (subfamily) | `workflow/envs/tetranscripts.yaml` (TEtranscripts 2.2.3) | | ||
| | scTE | single cell | family_id | `workflow/envs/scte.yaml` (scTE 1.0.0) | | ||
| | SQuIRE | bulk (optional) | locus | `workflow/envs/squire.yaml` (SQuIRE 0.9.9.92) | | ||
| | SQuIRE | bulk | locus | `workflow/envs/squire.yaml` (SQuIRE 0.9.9.92, Python 3.6, STAR 2.5.3a; isolated env) | | ||
|
|
Comment on lines
13
to
16
| | TEtranscripts (TEcount) | bulk | subfamily (gene_id) | Most cited bulk tool. Per-cell SmartSeq2 BAMs are treated as per-sample bulk replicates of one group. | `workflow/envs/tetranscripts.yaml` | | ||
| | scTE | single cell | family | The natural sc counterpart. Consumes a STARsolo BAM with CB and UB tags and emits a cell x family matrix. | `workflow/envs/scte.yaml` | | ||
| | SQuIRE | bulk | locus | Optional: pinned to Python 3.6 and STAR 2.5.3a; the upstream is unmaintained. Drop it from the config when the conda env fails to build. | `workflow/envs/squire.yaml` | | ||
| | SQuIRE | bulk | locus | Bulk locus-level counter. Pinned to Python 3.6 and STAR 2.5.3a; isolated in its own conda env so the version pin does not bleed into other rules. Reuses the existing STARsolo BAMs via a SQuIRE-shaped `map_folder` to keep the comparison apples-to-apples; `squire Map` is skipped. | `workflow/envs/squire.yaml` | | ||
|
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.