Mapping consensus sequences from Merle's paper to genome sequence

Continuing from #1654 

After reviewing the information Michael Stadler sent to Val, it sounds like our problem is that simply pattern-matching the consensus sequences onto the genome is too restrictive to account for all the binding sites (which makes me wonder what their consensus sequences represent in the first place, but I'm not an expert).
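For reference, this is roughly what "simple pattern-matching" of a consensus sequence looks like: a minimal Python sketch, assuming the consensus is written with IUPAC ambiguity codes and that both strands should be searched. The function names and the toy motif and sequence are made up for illustration and are not taken from the paper or from our pipeline.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of an IUPAC sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus):
    """Compile a consensus into a regex; the lookahead reports overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?=({body}))")


def find_consensus_hits(sequence, consensus):
    """Return sorted (start, end, strand) tuples, 0-based and half-open."""
    sequence = sequence.upper()
    hits = []
    for strand, motif in (("+", consensus), ("-", reverse_complement(consensus))):
        for match in consensus_to_regex(motif).finditer(sequence):
            hits.append((match.start(), match.start() + len(consensus), strand))
    return sorted(hits)


# Toy example (hypothetical motif and sequence).
print(find_consensus_hits("ACCAGCTTTGCTGGAA", "CCAGC"))
```

Every position either matches the pattern or it doesn't, so a site that deviates from the consensus at a single base is missed entirely. That all-or-nothing behaviour is probably why this approach is too restrictive.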

They got their hit counts, which differ from ours, using an R package that I must admit I don't fully understand (the sequence-scanning functionality is documented [here](https://www.bioconductor.org/packages/release/bioc/vignettes/universalmotif/inst/doc/SequenceSearches.pdf)). From what I can tell, it doesn't seem to give the positions at which it detects the hits, only the number of hits for each motif.
The methods section of Merle's paper only says: "The universalmotif package version 1.16.0 was used to calculate motif similarities (compare_motifs function with parameters method = "PCC", tryRC=TRUE, min.overlap=4, min.mean.ic=0.25, normalise.scores=TRUE) and to scan sequences for motif hits (scan_sequences function with parameters threshold=1e-4, threshold.type="pvalue", RC=TRUE)."

One alternative they suggested for us is to use the [positional frequency matrices](https://github.com/fmicompbio/Spombe_TFome/blob/main/data/motifs.meme) rather than the consensus sequences. 
I found an online tool that does this, https://epd.expasy.org/pwmtools/pwmscan.php, and it lets you download the results as a BED file. I tried it with the Ace2 motif and it found 313 hits, roughly three times our previous 109. So maybe we could use this tool to generate BED files for all the motifs and load them into JBrowse...
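To make the matrix-based approach concrete, here is a rough Python sketch of what such a scan does in principle: convert a frequency matrix to log-odds scores, slide it along both strands, and write the hits as BED lines. It is not how PWMScan or universalmotif are implemented. The uniform background, the pseudocount, the plain score cut-off (instead of a p-value threshold like the one used in the paper), and the toy matrix, motif name and chromosome name are all assumptions for illustration.

```python
import math

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def pfm_to_log_odds(pfm, background=0.25, pseudocount=0.01):
    """Convert rows of [A, C, G, T] frequencies into log2-odds scores.

    Assumes a uniform background; the pseudocount avoids log(0).
    """
    pwm = []
    for row in pfm:
        total = sum(row) + 4 * pseudocount
        pwm.append([math.log2(((p + pseudocount) / total) / background) for p in row])
    return pwm


def reverse_complement_pwm(pwm):
    """Reverse the positions and swap A<->T and C<->G within each row."""
    return [row[::-1] for row in reversed(pwm)]


def scan(sequence, pwm, min_score):
    """Return sorted (start, end, strand, score) hits, 0-based and half-open."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for strand, matrix in (("+", pwm), ("-", reverse_complement_pwm(pwm))):
        for start in range(len(sequence) - width + 1):
            window = sequence[start:start + width]
            if any(base not in BASE_INDEX for base in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(matrix[i][BASE_INDEX[base]] for i, base in enumerate(window))
            if score >= min_score:
                hits.append((start, start + width, strand, score))
    return sorted(hits)


def to_bed(chrom, name, hits):
    """Format hits as tab-separated BED lines (chrom, start, end, name, score, strand)."""
    return [
        f"{chrom}\t{start}\t{end}\t{name}\t{score:.2f}\t{strand}"
        for start, end, strand, score in hits
    ]


# Toy matrix that strongly prefers "ACG" (hypothetical, not a motif from the paper).
toy_pfm = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
toy_hits = scan("TTACGTT", pfm_to_log_odds(toy_pfm), min_score=5.0)
print("\n".join(to_bed("I", "toy_motif", toy_hits)))
```

Unlike the consensus match, each window gets a graded score, so lowering `min_score` admits progressively weaker sites. That would explain why a matrix scan can report more hits than exact pattern-matching. Note that the score column here holds the raw log-odds value, which stricter BED parsers (expecting an integer from 0 to 1000) may reject.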

Mapping consensus sequences from Merle's paper to genome sequence #2468
