Take in multiple inputs with different assemblies; pull chr info from bam instead of fasta #44

minkinaa · 2025-10-29T18:49:36Z

This PR addresses two main issues:

(1) Previously, a single assembly was defined in the config file, so when multiple samples were submitted in the input file, all of them had to be associated with that assembly. This PR makes it possible to structure the input file with 4 columns: sample, bam, ref, ref_name (see example: config/config_4col.tbl). In that case the yaml file can omit ref and ref_name entirely or leave them blank (example: config/config_4col.yaml). If a single reference can be used with all samples, the original format, where ref and ref_name are set only in config.yaml, will still work.

(2) Previously, the list of chrs was pulled from the input fasta. This can be problematic if not all of them are present in the input bam. This PR pulls the chr names directly from the bam instead.
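For readers who want to see the fallback in (1) spelled out, here is a minimal sketch, not the code in this PR. It assumes a tab-separated table and uses the standard library rather than the workflow's own loader; `load_manifest`, the sample names, and the file paths are all hypothetical.

```python
import csv
import io

# Hypothetical manifests; sample names and paths are made up for illustration.
FOUR_COL = "sample\tbam\tref\tref_name\ns1\ts1.bam\thg38.fa\thg38\ns2\ts2.bam\tchm13.fa\tT2T-CHM13\n"
TWO_COL = "sample\tbam\ns1\ts1.bam\n"


def load_manifest(text, config):
    """Return {sample: row}, filling ref/ref_name from config when the table omits them."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        for key in ("ref", "ref_name"):
            # Prefer the per-sample column; fall back to the global config value.
            value = row.get(key) or config.get(key)
            if not value:
                raise ValueError(f"no {key} given for sample {row['sample']}")
            row[key] = value
        rows[row["sample"]] = row
    return rows


# 4-column table: the config may leave ref and ref_name out entirely.
per_sample = load_manifest(FOUR_COL, {})
# Original format: one reference for every sample, taken from the config.
shared = load_manifest(TWO_COL, {"ref": "hg38.fa", "ref_name": "hg38"})
```

With this shape, a sample only fails when neither the table nor the config supplies a reference.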


mrvollger

I have two comments that apply in lots of places.

1. Delete old unused code.
2. We should use only the bam header, instead of mixing fai_df and the bam header.

mrvollger · 2025-10-29T19:11:46Z

workflow/Snakefile

-FAI = get_fai()
-REF_NAME = config["ref_name"]
-EXCLUDES = get_excludes()
+#REF = get_ref()


delete completely instead of commenting

mrvollger · 2025-10-29T19:19:06Z

workflow/rules/common.smk

        raise ValueError(f"FIRE: reference file {ref} does not exist")
    return os.path.abspath(ref)

+def get_ref_old(wc):


I think the functions you have added with _old or _orig are not used. If so, they should be removed.

mrvollger · 2025-10-29T19:21:31Z

workflow/rules/common.smk

    return fai

+def get_fai(wc):
+    ref = MANIFEST.loc[wc.sm, "ref"]


To get ref you should call get_ref. Also, it is generally good to use f-strings in situations like this:

    ref = get_ref(wc)
    fai = f"{ref}.fai"
    ...
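A minimal, self-contained sketch of that suggestion follows. The dictionary `MANIFEST` and the `SimpleNamespace` wildcards are hypothetical stand-ins for the workflow's manifest table and Snakemake's wildcards object, and the existence check and `os.path.abspath` call from the real `get_ref` are left out for brevity.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the workflow's MANIFEST table: sample -> reference path.
MANIFEST = {"s1": "ref/hg38.fa", "s2": "ref/chm13.fa"}


def get_ref(wc):
    """Look up the reference FASTA for the sample named in the wildcards."""
    return MANIFEST[wc.sm]


def get_fai(wc):
    """Derive the index path from get_ref rather than reading the manifest again."""
    ref = get_ref(wc)
    return f"{ref}.fai"


# Snakemake would pass its own wildcards object; SimpleNamespace mimics wc.sm here.
fai_path = get_fai(SimpleNamespace(sm="s2"))
```

Routing every lookup through `get_ref` keeps any validation of the reference path in one place.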

mrvollger · 2025-10-29T19:24:27Z

workflow/rules/common.smk

+
+    bam_chr_list=[]
+    input_bam_path=get_input_bam(wc)
+    input_bam = pysam.AlignmentFile(input_bam_path, "rc", threads=MAX_THREADS)


You don't need extra threads here; I would drop that. Also, the "rc" part can be inferred, and not all inputs will be cram, so we don't want it. Basically, delete the last two args.

mrvollger · 2025-10-29T19:26:55Z

workflow/rules/common.smk

+    input_bam = pysam.AlignmentFile(input_bam_path, "rc", threads=MAX_THREADS)
+    bam_header_dict = input_bam.header.to_dict()
+
+    for line in bam_header_dict['SQ']:


The length of each chromosome is also available in the bam header. So we should use the bam header only instead of mixing fai_df and this method.
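As a rough illustration of using the header alone: the dictionary returned by pysam's `header.to_dict()` lists each sequence under `SQ`, with `SN` for the name and `LN` for the length. The sketch below works on a hand-written dictionary of that shape so it runs without a BAM file; `chrom_lengths` and the example header are hypothetical, not code from this PR.

```python
# Hand-written example of the shape produced by
# pysam.AlignmentFile(path).header.to_dict(); the values are illustrative only.
EXAMPLE_HEADER = {
    "HD": {"VN": "1.6", "SO": "coordinate"},
    "SQ": [
        {"SN": "chr1", "LN": 1000},
        {"SN": "chrM", "LN": 16569},
    ],
}


def chrom_lengths(header_dict):
    """Map each sequence name (SN) to its length (LN), in header order."""
    return {sq["SN"]: sq["LN"] for sq in header_dict.get("SQ", [])}


lengths = chrom_lengths(EXAMPLE_HEADER)
chroms = list(lengths)  # names only, in the order they appear in the header
```

In the workflow, the input would come from opening the BAM or CRAM without an explicit mode or thread count and calling `header.to_dict()` on it, so no separate .fai lookup is needed for names or lengths.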

mrvollger · 2025-10-29T19:27:46Z

workflow/rules/track-hub.smk

        suffix=get_hap_col_suffix,
        nzooms=NZOOMS,
-        chrom=get_chroms()[0],
+        #chroms=get_chroms,


delete comments

Mitchell R. Vollger and others added 3 commits June 18, 2025 11:50

drop filtering from mosdepth, allow it to only be done by the input u…

ad043e3

…ser controlled input filtering

Allow multiple inputs with different reference files and get chr name…

30018c0

…s from input bam instead of fasta

added example config files for running a group of samples with differ…

34dfa1e

…ent assembly inputs

mrvollger requested changes Oct 29, 2025

minkinaa added 3 commits October 29, 2025 13:00

change pixi version to 0.46.0

7e5b6bf

removed dependence on .fai for chr names where bam already present an…

7582e64

…d removed commented unused code

added error message to get_fai

d881fcb
