Skip to content

Update methods comparison#9

Open
imallona wants to merge 10 commits into
masterfrom
dev
Open

Update methods comparison#9
imallona wants to merge 10 commits into
masterfrom
dev

Conversation

@imallona
Copy link
Copy Markdown
Owner

No description provided.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the external-tool benchmarking workflow and report to make comparisons across quantifiers/granularities more explicit, and to better support Chromium scTE and SmartSeq2 bulk-style tools.

Changes:

  • Add optional barcode→cell_id translation for scTE harmonization (Chromium) via --barcode-map, and plumb the simulator-produced mapping through the benchmark module.
  • Extend the external benchmark Snakemake module to support per-cell BAM splitting (SmartSeq2), add SQuIRE as a first-class tool chain (Fetch/Clean/Count/combine), and aggregate per-sample benchmarks into per-tool cost.
  • Expand the RMarkdown benchmark report with “native vs rolled-up” coverage notes, a “top quantifiers” table, precision/F1 panels, and a memory (RSS) panel.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
workflow/scripts/harmonize_external_counts.py Adds barcode-map parsing and applies it to scTE h5ad obs names before sample filtering; improves empty-output diagnostics.
workflow/scripts/external_benchmark_report.Rmd Adds granularity-coverage explanation/table and new summary + metric panels (precision/F1, RSS).
workflow/modules/simulations.snmk Exposes Chromium barcode_to_cell_id.tsv as a simulator output for downstream use.
workflow/modules/external_tools_benchmark.snmk Adds SmartSeq2 per-cell BAM splitting, SQuIRE workflow chain, benchmark aggregation, and barcode-map wiring for scTE Chromium harmonization.
workflow/configs/external_benchmark_smartseq2.yaml Updates default external tool set and relies on automatic sample expansion from simulation.n_cells.
workflow/configs/external_benchmark_chromium.yaml Relies on automatic sample expansion from simulation.n_cells.
test/unit/test_harmonize_external_counts.py Adds unit tests for the new barcode-map parser.
README.md Updates benchmark description and expected outputs at a high level.
docs/methods.md Updates methods text to reflect native vs rolled-up granularities and new external benchmark rule chain.
docs/external_tool_benchmark.md Documents default sample scope and Chromium barcode→cell_id translation, plus updated SQuIRE pipeline notes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

rule split_smartseq2_per_cell_bams:
"""Split the multi-cell SmartSeq2 BAM by @RG into per-cell BAMs so
bulk-style tools (TEcount, SQuIRE) can run one sample at a time."""
conda: op.join(workflow.basedir, 'envs', 'squire.yaml')
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

workflow/envs/squire.yaml:15

  • Installing SQuIRE from an unpinned git default branch makes the benchmark non-reproducible and may change behavior over time. Prefer pinning to a tag/commit SHA (or an archived release tarball) so reruns produce the same tool version.
  - pip:
      ## SQuIRE 0.9.9.92 is not on PyPI (only an unrelated 0.0.1); install the
      ## upstream source, whose master version is 0.9.9.92.
      - git+https://github.com/wyang17/SQuIRE.git

Comment thread workflow/envs/squire.yaml
Comment on lines +7 to 9
- python=2.7
- star=2.5.3a
- samtools=1.9
Comment thread workflow/envs/scte.yaml
Comment on lines 19 to +23
- pip:
- scTE==1.0.0
## scTE pins anndata loosely and the resolver lands on 0.6.x, which
## imports the removed pandas.core.index. Pin a modern anndata.
- anndata==0.10.9
- git+https://github.com/JiekaiLab/scTE.git
Comment thread docs/methods.md
Comment on lines 151 to 154
| TEtranscripts (TEcount) | bulk | gene_id (subfamily) | `workflow/envs/tetranscripts.yaml` (TEtranscripts 2.2.3) |
| scTE | single cell | family_id | `workflow/envs/scte.yaml` (scTE 1.0.0) |
| SQuIRE | bulk (optional) | locus | `workflow/envs/squire.yaml` (SQuIRE 0.9.9.92) |
| SQuIRE | bulk | locus | `workflow/envs/squire.yaml` (SQuIRE 0.9.9.92, Python 3.6, STAR 2.5.3a; isolated env) |

Comment on lines 13 to 16
| TEtranscripts (TEcount) | bulk | subfamily (gene_id) | Most cited bulk tool. Per-cell SmartSeq2 BAMs are treated as per-sample bulk replicates of one group. | `workflow/envs/tetranscripts.yaml` |
| scTE | single cell | family | The natural sc counterpart. Consumes a STARsolo BAM with CB and UB tags and emits a cell x family matrix. | `workflow/envs/scte.yaml` |
| SQuIRE | bulk | locus | Optional: pinned to Python 3.6 and STAR 2.5.3a; the upstream is unmaintained. Drop it from the config when the conda env fails to build. | `workflow/envs/squire.yaml` |
| SQuIRE | bulk | locus | Bulk locus-level counter. Pinned to Python 3.6 and STAR 2.5.3a; isolated in its own conda env so the version pin does not bleed into other rules. Reuses the existing STARsolo BAMs via a SQuIRE-shaped `map_folder` to keep the comparison apples-to-apples; `squire Map` is skipped. | `workflow/envs/squire.yaml` |

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants