Update methods comparison by imallona · Pull Request #9 · imallona/reclaim

imallona · 2026-05-21T13:48:06Z

No description provided.

Copilot

Pull request overview

This PR updates the external-tool benchmarking workflow and report to make comparisons across quantifiers/granularities more explicit, and to better support Chromium scTE and SmartSeq2 bulk-style tools.

Changes:

- Add optional barcode→cell_id translation for scTE harmonization (Chromium) via `--barcode-map`, and plumb the simulator-produced mapping through the benchmark module.
- Extend the external benchmark Snakemake module to support per-cell BAM splitting (SmartSeq2), add SQuIRE as a first-class tool chain (Fetch/Clean/Count/combine), and aggregate per-sample benchmarks into per-tool cost.
- Expand the RMarkdown benchmark report with “native vs rolled-up” coverage notes, a “top quantifiers” table, precision/F1 panels, and a memory (RSS) panel.
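A minimal sketch of the barcode→cell_id translation, assuming `barcode_to_cell_id.tsv` is a tab-separated file with a `barcode` and a `cell_id` column. The column layout and the helper names `parse_barcode_map` and `translate_names` are hypothetical; the PR's own parser in `harmonize_external_counts.py` may differ. It works on a plain list of names rather than an h5ad `obs_names` index, and leaves unmapped barcodes untouched so that a later sample filter can drop them.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def parse_barcode_map(handle: TextIO) -> Dict[str, str]:
    """Read a two-column TSV (barcode, cell_id) into a dict.

    Assumes an optional header row whose first field is 'barcode'.
    """
    mapping: Dict[str, str] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#") or row[0] == "barcode":
            continue  # skip blank lines, comments and the assumed header
        if len(row) < 2:
            raise ValueError(f"malformed barcode-map row: {row!r}")
        barcode, cell_id = row[0].strip(), row[1].strip()
        # Reject a barcode that the file maps to two different cell ids.
        if mapping.get(barcode, cell_id) != cell_id:
            raise ValueError(f"barcode {barcode!r} maps to two cell ids")
        mapping[barcode] = cell_id
    return mapping


def translate_names(names: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Rename barcodes to cell ids; unmapped names pass through unchanged."""
    return [mapping.get(name, name) for name in names]


# Usage with a tiny in-memory map (barcodes and cell ids are made up).
tsv = io.StringIO("barcode\tcell_id\nAAACCTGA\tcell_1\nAAACCTGC\tcell_2\n")
barcode_map = parse_barcode_map(tsv)
renamed = translate_names(["AAACCTGC", "AAACCTGA", "TTTTTTTT"], barcode_map)
print(renamed)  # ['cell_2', 'cell_1', 'TTTTTTTT']
```

Passing unmapped names through, rather than raising, keeps the translation step separate from the decision about which cells belong to the benchmark.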

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.


| File | Description |
| --- | --- |
| workflow/scripts/harmonize_external_counts.py | Adds barcode-map parsing and applies it to scTE h5ad obs names before sample filtering; improves empty-output diagnostics. |
| workflow/scripts/external_benchmark_report.Rmd | Adds granularity-coverage explanation/table and new summary + metric panels (precision/F1, RSS). |
| workflow/modules/simulations.snmk | Exposes Chromium `barcode_to_cell_id.tsv` as a simulator output for downstream use. |
| workflow/modules/external_tools_benchmark.snmk | Adds SmartSeq2 per-cell BAM splitting, SQuIRE workflow chain, benchmark aggregation, and barcode-map wiring for scTE Chromium harmonization. |
| workflow/configs/external_benchmark_smartseq2.yaml | Updates default external tool set and relies on automatic sample expansion from `simulation.n_cells`. |
| workflow/configs/external_benchmark_chromium.yaml | Relies on automatic sample expansion from `simulation.n_cells`. |
| test/unit/test_harmonize_external_counts.py | Adds unit tests for the new barcode-map parser. |
| README.md | Updates benchmark description and expected outputs at a high level. |
| docs/methods.md | Updates methods text to reflect native vs rolled-up granularities and new external benchmark rule chain. |
| docs/external_tool_benchmark.md | Documents default sample scope and Chromium barcode→cell_id translation, plus updated SQuIRE pipeline notes. |
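The per-tool cost aggregation mentioned in the overview can be pictured as collapsing one Snakemake `benchmark:` TSV per sample into one record per tool. A minimal sketch, assuming only the `s` (wall-clock seconds) and `max_rss` (MB) columns that Snakemake writes; summing time and taking the maximum RSS across samples is an assumed definition of "cost", and `aggregate_tool_cost` is a hypothetical helper, not the module's code.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, TextIO, Tuple


def _num(value: str) -> float:
    """Parse a benchmark cell; non-numeric cells such as 'NA' count as 0."""
    try:
        return float(value)
    except ValueError:
        return 0.0


def aggregate_tool_cost(
    benchmarks: Iterable[Tuple[str, TextIO]],
) -> Dict[str, Dict[str, float]]:
    """Collapse per-sample benchmark TSVs into one cost record per tool.

    Wall time ('s') is summed over samples and peak memory ('max_rss', MB)
    is the maximum over samples. Assumes one row per file (no repeat()).
    """
    cost: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"total_s": 0.0, "peak_rss_mb": 0.0, "n_samples": 0}
    )
    for tool, handle in benchmarks:
        record = cost[tool]
        record["n_samples"] += 1
        for row in csv.DictReader(handle, delimiter="\t"):
            record["total_s"] += _num(row["s"])
            record["peak_rss_mb"] = max(record["peak_rss_mb"], _num(row["max_rss"]))
    return dict(cost)


# Usage with two made-up per-cell benchmark files for one tool.
header = "s\th:m:s\tmax_rss\tmax_vms\tmax_uss\tmax_pss\tio_in\tio_out\tmean_load\tcpu_time\n"
cell_1 = io.StringIO(header + "10.5\t0:00:10\t200.1\t300\t150\t160\t1\t2\t50\t5\n")
cell_2 = io.StringIO(header + "4.5\t0:00:04\t350.0\t400\t300\t310\t1\t2\t60\t3\n")
per_tool = aggregate_tool_cost([("TEcount", cell_1), ("TEcount", cell_2)])
print(per_tool)  # {'TEcount': {'total_s': 15.0, 'peak_rss_mb': 350.0, 'n_samples': 2}}
```

Summing time reflects the total compute a bulk-style tool spends when run once per cell, while the RSS maximum is what a scheduler would have to reserve for the largest single job.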


+rule split_smartseq2_per_cell_bams:
+    """Split the multi-cell SmartSeq2 BAM by @RG into per-cell BAMs so
+    bulk-style tools (TEcount, SQuIRE) can run one sample at a time."""
+    conda: op.join(workflow.basedir, 'envs', 'squire.yaml')
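The rule above splits one multi-cell SmartSeq2 BAM by `@RG` into per-cell BAMs. To illustrate only the splitting logic, here is a pure-Python sketch on SAM text; in practice the split would run on BAM, for example with `samtools split` (which splits a file by read group) or pysam. `split_sam_by_read_group` is a hypothetical helper, not the rule's code. Each output keeps the shared header plus only its own `@RG` line, so every per-cell file stays a valid standalone SAM.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def split_sam_by_read_group(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group SAM text into one standalone SAM per read group (RG:Z tag).

    Every output keeps the shared header lines (@HD, @SQ, @PG, @CO) plus only
    its own @RG line. Read groups with no alignments produce no output.
    """
    shared_header: List[str] = []
    rg_lines: Dict[str, str] = {}
    records: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("@RG"):
            fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
            rg_lines[fields["ID"]] = line
        elif line.startswith("@"):
            shared_header.append(line)
        elif line:
            optional = line.split("\t")[11:]  # tags follow the 11 mandatory fields
            rg = next((t[len("RG:Z:"):] for t in optional if t.startswith("RG:Z:")), None)
            if rg is None:
                raise ValueError(f"alignment without an RG:Z tag: {line[:40]!r}")
            records[rg].append(line)
    return {
        rg: shared_header + [rg_lines.get(rg, f"@RG\tID:{rg}")] + body
        for rg, body in records.items()
    }


# Usage on a toy two-cell SAM (read and cell names are made up).
sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "@RG\tID:cell_1\tSM:cell_1",
    "@RG\tID:cell_2\tSM:cell_2",
    "r1\t0\tchr1\t10\t255\t5M\t*\t0\t0\tACGTA\tIIIII\tRG:Z:cell_1",
    "r2\t0\tchr1\t20\t255\t5M\t*\t0\t0\tACGTA\tIIIII\tNH:i:1\tRG:Z:cell_2",
]
per_cell = split_sam_by_read_group(sam)
for cell, body in per_cell.items():
    print(cell, len(body))  # cell_1 4, then cell_2 4
```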


Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

workflow/envs/squire.yaml:15

Installing SQuIRE from an unpinned git default branch makes the benchmark non-reproducible and may change behavior over time. Prefer pinning to a tag/commit SHA (or an archived release tarball) so reruns produce the same tool version.

  - pip:
      ## SQuIRE 0.9.9.92 is not on PyPI (only an unrelated 0.0.1); install the
      ## upstream source, whose master version is 0.9.9.92.
      - git+https://github.com/wyang17/SQuIRE.git
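The same reproducibility concern applies to the scTE `git+` install further down, which also tracks the default branch. A small check, run for example in CI over the conda env files, could flag such requirements. A minimal sketch: `git_requirement_pin` is hypothetical, and it relies on pip's `git+<url>@<ref>` form, treating only a full 40-character commit SHA as a hard pin because tags and branch names can move.

```python
import re
from urllib.parse import urlsplit

_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def git_requirement_pin(req: str) -> str:
    """Classify a pip 'git+' requirement as 'sha', 'ref' or 'unpinned'."""
    req = req.strip()
    if not req.startswith("git+"):
        raise ValueError(f"not a git+ requirement: {req!r}")
    # urlsplit keeps any 'user@host' in netloc and drops '#egg=...',
    # so an '@' left in the path separates the repository from the ref.
    path = urlsplit(req[len("git+"):]).path
    if "@" not in path:
        return "unpinned"
    ref = path.rsplit("@", 1)[1]
    return "sha" if _FULL_SHA.fullmatch(ref) else "ref"


# Usage on the two git installs in this PR.
for req in ("git+https://github.com/wyang17/SQuIRE.git",
            "git+https://github.com/JiekaiLab/scTE.git"):
    print(req, git_requirement_pin(req))  # both report 'unpinned'
```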

+  - python=2.7
  - star=2.5.3a
  - samtools=1.9


  - pip:
-      - scTE==1.0.0
+      ## scTE pins anndata loosely and the resolver lands on 0.6.x, which
+      ## imports the removed pandas.core.index. Pin a modern anndata.
+      - anndata==0.10.9
+      - git+https://github.com/JiekaiLab/scTE.git
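Because the old-anndata failure only surfaces when scTE imports it, a fail-fast guard at the start of the scTE step would make a resolver regression obvious. A minimal sketch using `importlib.metadata`; `anndata_ok` and the `(0, 10)` floor are assumptions, since the PR only shows that 0.10.9 works and that 0.6.x breaks.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Assumption: the 0.10.x line (0.10.9 is the pin in this PR) and newer are
# acceptable; older lines such as 0.6.x import the removed pandas.core.index.
MIN_ANNDATA: Tuple[int, int] = (0, 10)


def anndata_ok(installed: Optional[str] = None) -> bool:
    """True if anndata (installed, or the given version string) is >= MIN_ANNDATA."""
    if installed is None:
        try:
            installed = version("anndata")
        except PackageNotFoundError:
            return False
    match = re.match(r"(\d+)\.(\d+)", installed)
    if match is None:
        raise ValueError(f"unparseable version: {installed!r}")
    return (int(match.group(1)), int(match.group(2))) >= MIN_ANNDATA


print(anndata_ok("0.10.9"), anndata_ok("0.6.22"))  # True False
```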


 | TEtranscripts (TEcount) | bulk | gene_id (subfamily) | `workflow/envs/tetranscripts.yaml` (TEtranscripts 2.2.3) |
 | scTE | single cell | family_id | `workflow/envs/scte.yaml` (scTE 1.0.0) |
-| SQuIRE | bulk (optional) | locus | `workflow/envs/squire.yaml` (SQuIRE 0.9.9.92) |
+| SQuIRE | bulk | locus | `workflow/envs/squire.yaml` (SQuIRE 0.9.9.92, Python 3.6, STAR 2.5.3a; isolated env) |
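One way to read the native vs rolled-up distinction in the table above: a tool can be scored at its native granularity and, after summing counts, at any coarser one, but never at a finer one. A minimal sketch; the `LEVELS` ordering (locus, then subfamily, then family, as in RepeatMasker-style annotations) and `comparable_levels` are assumptions, not the report's code.

```python
from typing import Dict, List

# Ordered from finest to coarsest; counts can only be summed upwards.
LEVELS: List[str] = ["locus", "subfamily", "family"]

# Native output granularity per tool, as listed in the table above.
NATIVE: Dict[str, str] = {
    "TEcount": "subfamily",
    "scTE": "family",
    "SQuIRE": "locus",
}


def comparable_levels(tool: str) -> List[str]:
    """Levels at which a tool can be scored: its native level and every coarser one."""
    start = LEVELS.index(NATIVE[tool])
    return LEVELS[start:]


for tool in NATIVE:
    print(tool, comparable_levels(tool))
```

Under this reading, family is the only level at which all three tools can be compared directly.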



 | TEtranscripts (TEcount) | bulk | subfamily (gene_id) | Most cited bulk tool. Per-cell SmartSeq2 BAMs are treated as per-sample bulk replicates of one group. | `workflow/envs/tetranscripts.yaml` |
 | scTE | single cell | family | The natural sc counterpart. Consumes a STARsolo BAM with CB and UB tags and emits a cell x family matrix. | `workflow/envs/scte.yaml` |
-| SQuIRE | bulk | locus | Optional: pinned to Python 3.6 and STAR 2.5.3a; the upstream is unmaintained. Drop it from the config when the conda env fails to build. | `workflow/envs/squire.yaml` |
+| SQuIRE | bulk | locus | Bulk locus-level counter. Pinned to Python 3.6 and STAR 2.5.3a; isolated in its own conda env so the version pin does not bleed into other rules. Reuses the existing STARsolo BAMs via a SQuIRE-shaped `map_folder` to keep the comparison apples-to-apples; `squire Map` is skipped. | `workflow/envs/squire.yaml` |
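Rolling a locus-level matrix such as SQuIRE's up to subfamily and family amounts to a sum over an annotation mapping. A minimal sketch, assuming locus→subfamily and subfamily→family mappings taken from the same annotation used for simulation; `roll_up` and all ids are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def roll_up(counts: Dict[str, float], to_group: Dict[str, str]) -> Dict[str, float]:
    """Sum counts into a coarser level; raises KeyError for unannotated features."""
    totals: Dict[str, float] = defaultdict(float)
    for feature, count in counts.items():
        totals[to_group[feature]] += count
    return dict(totals)


# Made-up locus ids and annotation, for illustration only.
locus_counts = {"L1HS_chr1_100": 3, "L1HS_chr2_500": 2, "AluY_chr3_10": 4}
locus_to_subfamily = {"L1HS_chr1_100": "L1HS", "L1HS_chr2_500": "L1HS", "AluY_chr3_10": "AluY"}
subfamily_to_family = {"L1HS": "L1", "AluY": "Alu"}

subfamily_counts = roll_up(locus_counts, locus_to_subfamily)
family_counts = roll_up(subfamily_counts, subfamily_to_family)
print(subfamily_counts)  # {'L1HS': 5.0, 'AluY': 4.0}
print(family_counts)     # {'L1': 5.0, 'Alu': 4.0}
```

Raising on unannotated loci, rather than silently dropping them, keeps count totals consistent across levels.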



imallona added 2 commits May 21, 2026 15:28

Update external benchmark, add squire

f10c7cc

Add bam-split for smartseq2 and other fixes, untested

b84bb69

imallona requested a review from Copilot May 21, 2026 13:48

Copilot started reviewing on behalf of imallona May 21, 2026 13:49

Copilot AI reviewed May 21, 2026

Comment thread workflow/modules/external_tools_benchmark.snmk


imallona added 8 commits May 22, 2026 08:20

Update scte version

edf56b5

Update scte call, add ulimit

e2b2a89

Rename scte index

1a37a77

Pin modern annData for scTE

4e98a2d

Update scte call so output does not contain dirname

2bb2ec1

Reduce thresholds for scte

58a46c1

Update squire and scte

73f884d

Patch smartseq2 bam headers so squire ingests them

0789109

imallona requested a review from Copilot May 22, 2026 14:12

Copilot started reviewing on behalf of imallona May 22, 2026 14:12

Copilot AI reviewed May 22, 2026

imallona wants to merge 10 commits into master from dev
