Mapping consensus sequences from Merle's paper to genome sequence #2468

@PCarme

Continuing from #1654

After reviewing the information Michael Stadler sent to Val, it sounds like our problem is that simply pattern-matching the consensus sequences against the genome is too restrictive to account for all the binding sites (which makes me wonder what their consensus sequences represent in the first place, but I'm not an expert).
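For reference, here is roughly what I mean by "pattern-matching the consensus sequences": a minimal Python sketch, not our actual pipeline. It expands IUPAC codes into a regular expression and searches both strands. The function names and the example motif are made up for illustration.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def consensus_hits(consensus, genome):
    """Return sorted (start, end, strand) tuples, 0-based and half-open."""
    genome = genome.upper()
    hits = set()
    for strand, motif in (("+", consensus.upper()),
                          ("-", reverse_complement(consensus))):
        # A lookahead is used so that overlapping matches are reported too.
        body = "".join(IUPAC[base] for base in motif)
        pattern = re.compile("(?=(" + body + "))")
        for match in pattern.finditer(genome):
            hits.add((match.start(), match.start() + len(motif), strand))
    return sorted(hits)


# Hypothetical motif and sequence, not taken from the paper.
print(consensus_hits("CCAGC", "TTCCAGCAAGCTGGTT"))
```

A match here is all-or-nothing: a site that differs from the consensus at a single position is never reported, which would explain why this kind of search finds fewer sites than a matrix-based scan.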

They got their hit counts, which differ from ours, using an R package whose workings I must admit I don't fully understand (the sequence scanning functionality is documented here). From what I can tell, it doesn't seem to report the positions at which it detects the hits, only the number of hits for each motif.
The methods section of Merle's paper only says: "The universalmotif package version 1.16.0 was used to calculate motif similarities (compare_motifs function with parameters method = "PCC", tryRC=TRUE, min.overlap=4, min.mean.ic=0.25, normalise.scores=TRUE) and to scan sequences for motif hits (scan_sequences function with parameters threshold=1e-4, threshold.type="pvalue", RC=TRUE)."
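My understanding of a p-value threshold like `threshold=1e-4` is that it gets translated into a score cutoff: the lowest motif score that a random sequence position would reach with probability at most 1e-4. The Python sketch below illustrates that idea by brute force under a uniform background. It is not how universalmotif computes it (I haven't checked its internals), `score_cutoff` is a made-up name, and enumerating every k-mer is only practical for short motifs.

```python
import itertools
from collections import Counter


def score_cutoff(pwm, pvalue):
    """Lowest score s such that P(score >= s) <= pvalue.

    pwm is a list of {base: score} dicts, one per motif position.
    Assumes a uniform background, i.e. every k-mer is equally likely.
    Returns None when even the best-scoring k-mer is too frequent.
    """
    k = len(pwm)
    # Tally how many k-mers reach each distinct score (rounded to merge ties).
    tally = Counter(
        round(sum(col[base] for col, base in zip(pwm, kmer)), 9)
        for kmer in itertools.product("ACGT", repeat=k)
    )
    total = 4 ** k
    cutoff, seen = None, 0
    for score in sorted(tally, reverse=True):
        seen += tally[score]
        if seen / total > pvalue:
            break
        cutoff = score
    return cutoff


# Toy 2-position matrix: only "A" scores, so AA is the single best 2-mer.
toy = [{"A": 1, "C": 0, "G": 0, "T": 0}] * 2
print(score_cutoff(toy, 0.07))
```

One consequence: with a uniform background, a motif shorter than 7 positions has fewer than 10,000 possible k-mers, so no score can reach a p-value of 1e-4 and the function returns `None`.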

One alternative they suggested is to use the positional frequency matrices rather than the consensus sequences.
I found an online tool that does this, https://epd.expasy.org/pwmtools/pwmscan.php, and it allows the results to be downloaded as a BED file. I tried it with the Ace2 motif and it found 313 hits, almost three times our previous 109. So maybe we could use this tool to generate BED files for all the motifs and load them into JBrowse...
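If we ever want to do this locally instead of through the web tool, the matrix-based approach could be sketched in Python as below. This is only an illustration under my own assumptions (uniform background, a pseudocount of 1, a hand-picked score cutoff) and not what PWMScan does internally. The counts matrix, chromosome name and motif name are hypothetical.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores (uniform background)."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2((col.get(b, 0) + pseudocount) / total / 0.25)
            for b in "ACGT"
        })
    return pwm


def scan_to_bed(pwm, chrom, seq, min_score, name="motif"):
    """Return BED6 lines (0-based, half-open) for windows scoring >= min_score."""
    seq = seq.upper()
    k = len(pwm)
    lines = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        if set(window) - set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        minus = window.translate(COMPLEMENT)[::-1]
        for strand, site in (("+", window), ("-", minus)):
            score = sum(col[b] for col, b in zip(pwm, site))
            if score >= min_score:
                # Strict BED expects an integer score in 0-1000;
                # this sketch writes the raw log-odds score instead.
                lines.append(
                    f"{chrom}\t{start}\t{start + k}\t{name}\t{score:.2f}\t{strand}"
                )
    return lines


# Hypothetical 3-position counts matrix favouring "ACG".
counts = [{"A": 10}, {"C": 10}, {"G": 10}]
for line in scan_to_bed(log_odds_pwm(counts), "I", "TTACGTT", min_score=4.0):
    print(line)
```

Unlike the exact consensus match, a site with one mismatch still gets a score here, so whether it is reported depends on the cutoff. That is probably why the hit counts go up.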
