Allow cellranger renaming to ignore irregular filename patterns #11646
Draft
tgelafr-pfzr wants to merge 3 commits into
Pull request overview
This PR aims to add an opt-out for the strict R1/R2 filename similarity check used during FASTQ renaming in the cellranger modules, to better support public/non-standard FASTQ naming (e.g., SRA-derived files).
Changes:
- Adds an `ignore_filename_pattern` boolean control intended to bypass the R1/R2 “same name except R1/R2” validation.
- Updates `cellranger_count.py`/`cellranger_multi.py` to conditionally skip the validation check.
- Adds new nf-test cases (and snapshot updates) covering non-standard filename behavior.
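The validation being made optional compares the R2 filename with the R1 filename after swapping "R1" for "R2". The sketch below reproduces that comparison in isolation. `FILENAME_PATTERN` and `names_match` are hypothetical: the module's real `filename_pattern` is not shown in this PR page, so a simple "everything before R1 / everything after R1" regex stands in for it.

```python
import re

# Hypothetical stand-in for the module's `filename_pattern` (not shown in the PR).
# Group 1 captures everything before the last "R1", group 2 everything after it.
FILENAME_PATTERN = r"^(.*)R1(.*)$"


def names_match(r1_name: str, r2_name: str, pattern: str = FILENAME_PATTERN) -> bool:
    """Return True if replacing R1 with R2 in the first name yields the second name."""
    return re.sub(pattern, r"\1R2\2", r1_name) == r2_name


# bcl2fastq-style names pass the check.
print(names_match("pbmc_S1_L001_R1_001.fastq.gz", "pbmc_S1_L001_R2_001.fastq.gz"))  # True

# SRA-style names use _1/_2 suffixes. The only "R1" here sits inside the accession
# ("SRR1..."), so the substitution yields "SRR2234567_1.fastq.gz" and the check fails.
print(names_match("SRR1234567_1.fastq.gz", "SRR1234567_2.fastq.gz"))  # False
```

The SRA case shows why an opt-out is useful: the names are perfectly consistent, just not in the R1/R2 form this check assumes.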
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| modules/nf-core/cellranger/multi/main.nf | Adds ignore_filename_pattern as a new process input. |
| modules/nf-core/cellranger/multi/templates/cellranger_multi.py | Gates the R1/R2 filename validation on ignore_filename_pattern. |
| modules/nf-core/cellranger/multi/tests/main.nf.test | Adds tests asserting failure/success behavior for non-standard filenames. |
| modules/nf-core/cellranger/multi/tests/main.nf.test.snap | Updates snapshots and adds a new snapshot entry for the new test. |
| modules/nf-core/cellranger/count/main.nf | Adds skip_renaming and ignore_filename_pattern as new process inputs. |
| modules/nf-core/cellranger/count/templates/cellranger_count.py | Adds renaming skip logic and gates filename validation on ignore_filename_pattern. |
| modules/nf-core/cellranger/count/tests/main.nf.test | Updates existing tests for new inputs and adds non-standard filename tests. |
| modules/nf-core/cellranger/count/tests/main.nf.test.snap | Updates snapshots and adds a new snapshot entry for the new test. |
Comments suppressed due to low confidence (1)
modules/nf-core/cellranger/multi/templates/cellranger_multi.py:62
- The assertion message shown on filename mismatch doesn’t mention the new `ignore_filename_pattern` escape hatch. Since this PR adds an opt-out, consider updating the message to tell users they can enable the flag when working with non-standard public FASTQ naming (and that it relaxes a safety check).
```python
raise AssertionError(
    dedent(
        f"""\
        We expect R1 and R2 of the same sample to have the same filename except for R1/R2.
        This has been checked by replacing "R1" with "R2" in the first filename and comparing it to the second filename.
        If you believe this check shouldn't have failed on your filenames, please report an issue on GitHub!
```
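One way to act on this suggestion is sketched below. `mismatch_message` is a hypothetical helper, and the wording of the added hint about `ignore_filename_pattern` is an assumption rather than text from the module; the rest of the message and the "Files involved" list follow the existing error text.

```python
from textwrap import dedent


def mismatch_message(r1: str, r2: str) -> str:
    """Build the R1/R2 mismatch error text, mentioning the opt-out flag.

    Sketch only: the hint about ignore_filename_pattern is assumed wording.
    """
    return dedent(
        f"""\
        We expect R1 and R2 of the same sample to have the same filename except for R1/R2.
        This has been checked by replacing "R1" with "R2" in the first filename and comparing it to the second filename.
        If your FASTQ files use a non-standard naming scheme (e.g. files downloaded from SRA),
        you can set `ignore_filename_pattern` to true to skip this check. Note that this
        disables a safety check that R1 and R2 belong to the same sample.
        If you believe this check shouldn't have failed on your filenames, please report an issue on GitHub!

        Files involved:
            - {r1}
            - {r2}
        """
    )


print(mismatch_message("sample_1.fastq.gz", "sample_2.fastq.gz"))
```

`dedent` runs after the f-string is formatted, so the common eight-space margin is stripped while the file list keeps its relative indentation.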
Comment on lines 7 to +11

```nextflow
input:
tuple val(meta), path(reads, stageAs: "fastq_???/*")
path reference
val skip_renaming
val ignore_filename_pattern
```
Comment on lines 26 to 30

```nextflow
path frna_sampleinfo , stageAs: "references/frna/*"
path ocm_barcodes    , stageAs: "references/ocm/barcodes/*"
val skip_renaming
val ignore_filename_pattern
```
```python
    resolved_name_r2 = r2.name
else:
    # double escapes are required because nextflow processes this python 'template'
    if (re.sub(filename_pattern, r"\\1R2\\2", r1.name) != r2.name) and ("${ignore_filename_pattern}" == "false"):
```
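Because Nextflow substitutes `${ignore_filename_pattern}` into the template as text before Python runs, the new condition compares two string literals; a Groovy `false` renders as `false`. The sketch below uses Python's `string.Template`, which happens to share the `${name}` placeholder syntax, purely as an analogy for that substitution (unlike Nextflow, it does not process backslashes). `rendered_condition` and `as_bool` are hypothetical helpers, not part of the module.

```python
from string import Template

# Analogy only: string.Template uses the same ${name} placeholders as the
# Nextflow template, so it shows the text Python receives after rendering.
CONDITION = '"${ignore_filename_pattern}" == "false"'


def rendered_condition(value: str) -> str:
    """Return the condition as Python would see it after substitution."""
    return Template(CONDITION).substitute(ignore_filename_pattern=value)


def as_bool(value: str) -> bool:
    """Hypothetical stricter parser: reject anything that is not a clear boolean."""
    normalized = value.strip().lower()
    if normalized in {"true", "1", "yes"}:
        return True
    if normalized in {"false", "0", "no"}:
        return False
    raise ValueError(f"Expected a boolean for ignore_filename_pattern, got {value!r}")


print(rendered_condition("false"))  # "false" == "false"
# A differently cased value does not equal "false", so the check would be skipped.
print("False" == "false")  # False
print(as_bool("False"))  # False
```

With the literal comparison, any rendering other than exactly `false` (for example a string-typed `"False"`) silently disables the check; a parser like the hypothetical `as_bool` fails loudly instead. When the input is a real Groovy boolean, both approaches behave the same.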
Comment on lines +49 to +56

```python
f"""\
We expect R1 and R2 of the same sample to have the same filename except for R1/R2.
This has been checked by replacing "R1" with "R2" in the first filename and comparing it to the second filename.
If you believe this check shouldn't have failed on your filenames, please report an issue on GitHub!

Files involved:
- {r1}
- {r2}
```
```diff
 else:
     # double escapes are required because nextflow processes this python 'template'
-    if re.sub(filename_pattern, r"\\1R2\\2", r1.name) != r2.name:
+    if (re.sub(filename_pattern, r"\\1R2\\2", r1.name) != r2.name) and ("${ignore_filename_pattern}" == "false"):
```
```groovy
input[18] = ch_frna_sampleinfo
input[19] = ch_ocm_barcodes
input[20] = false // default to false to guarantee renaming during test
input[21] = false // test false to test failure of renaming in non-standard filename scenario
```
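The two new test scenarios amount to "flag off: a non-standard pair is rejected" and "flag on: the same pair is accepted". A minimal Python sketch of that gate is below; `validate_pair` and `FILENAME_PATTERN` are hypothetical (the module's real pattern is not shown here), and the flag is modeled as a Python `bool` rather than the rendered template string.

```python
import re

# Hypothetical stand-in for the module's `filename_pattern`.
FILENAME_PATTERN = r"^(.*)R1(.*)$"


def validate_pair(r1_name: str, r2_name: str, ignore_filename_pattern: bool) -> None:
    """Raise on an R1/R2 filename mismatch unless the opt-out flag is set."""
    mismatch = re.sub(FILENAME_PATTERN, r"\1R2\2", r1_name) != r2_name
    if mismatch and not ignore_filename_pattern:
        raise AssertionError(f"R1/R2 filename mismatch: {r1_name} vs {r2_name}")


r1, r2 = "sample_1.fastq.gz", "sample_2.fastq.gz"

# Flag false (as with input[21] = false): the non-standard pair is rejected.
try:
    validate_pair(r1, r2, ignore_filename_pattern=False)
except AssertionError as err:
    print("failed as expected:", err)

# Flag true: the same pair is accepted.
validate_pair(r1, r2, ignore_filename_pattern=True)
print("accepted")
```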
Comment on lines +43 to +44

```json
"nf-test": "0.9.3",
"nextflow": "25.04.8"
```
Comment on lines +175 to +176

```json
"nf-test": "0.9.3",
"nextflow": "25.04.8"
```
PR checklist
Add a new boolean flag (default `false`) that, when set, skips the check that paired FASTQ filenames are identical except for R1/R2. This allows flexibility for public data, which often has non-standard naming (e.g., files from SRA).
This also creates parity in behavior between cellranger/count and cellranger/multi by allowing the `skip_renaming` parameter to be passed to cellranger/count and adding the corresponding logic.

Closes #11638
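For context on what `skip_renaming` controls, here is a minimal sketch of the choice between keeping the staged names and renaming to the bcl2fastq-style convention Cell Ranger expects (`[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz`). The function `resolved_names` and its signature are hypothetical; the exact renaming code in the module is not shown in this PR page.

```python
def resolved_names(
    sample_id: str, lane: int, r1_name: str, r2_name: str, skip_renaming: bool
) -> tuple[str, str]:
    """Pick the R1/R2 filenames handed to Cell Ranger (hypothetical sketch).

    With skip_renaming, the original names are kept as-is; otherwise the pair
    is renamed to {sample}_S1_L{lane:03d}_R{1,2}_001.fastq.gz.
    """
    if skip_renaming:
        return r1_name, r2_name
    prefix = f"{sample_id}_S1_L{lane:03d}"
    return f"{prefix}_R1_001.fastq.gz", f"{prefix}_R2_001.fastq.gz"


print(resolved_names("pbmc", 1, "SRR1234567_1.fastq.gz", "SRR1234567_2.fastq.gz", skip_renaming=False))
# ('pbmc_S1_L001_R1_001.fastq.gz', 'pbmc_S1_L001_R2_001.fastq.gz')
```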
- `nf-core modules test <MODULE> --profile docker`
- `nf-core modules test <MODULE> --profile singularity`
- `nf-core modules test <MODULE> --profile conda`