Currently, the APARENT dataloader gets the PolyA sites from the transcript GTF annotation:

    def get_roi_from_transcript(transcript_start: int, transcript_end: int, is_on_negative_strand: bool) -> (int, int):
        """
        Get region-of-interest for APARENT in relation to the 3'UTR of a transcript

        :param transcript_start: 0-based start position of the transcript
        :param transcript_end: 1-based end position of the transcript
        :param is_on_negative_strand: is the gene on the negative strand?
        :return: Tuple of (start, end) position for the region of interest
        """
        # CSE should be roughly around position 70 of the 205bp sequence.
        # Since CSE is likely 30bp upstream of the cut site, we shift the cut site
        # by 100bp upstream and 105bp downstream
        if is_on_negative_strand:
            end = transcript_start + 100
            # convert 0-based to 1-based
            end += 1

            start = end - 205
        else:
            start = transcript_end - 100
            # convert 1-based to 0-based
            start -= 1

            end = start + 205

        return start, end

@johli Is this a viable implementation of the strategy you explained here?
Do you maybe have some example data (VCF file + scores) against which we could compare the Kipoi predictions?
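
Until such reference data is available, the coordinate arithmetic itself can at least be checked for strand symmetry. The snippet below is only a sketch: it repeats the function from above and adds a hypothetical helper, `terminal_base_position` (not part of the dataloader), which reports where the transcript's 3'-most base lands inside the 205bp window when the window is read in transcript orientation. It assumes the returned `start` is 0-based and `end` is 1-based, as the docstring suggests.

```python
from typing import Tuple

WINDOW_LENGTH = 205  # input length expected by APARENT, per the comments above


def get_roi_from_transcript(transcript_start: int, transcript_end: int,
                            is_on_negative_strand: bool) -> Tuple[int, int]:
    """Same arithmetic as the dataloader function quoted in this issue."""
    if is_on_negative_strand:
        end = transcript_start + 100
        end += 1  # convert 0-based to 1-based
        start = end - WINDOW_LENGTH
    else:
        start = transcript_end - 100
        start -= 1  # convert 1-based to 0-based
        end = start + WINDOW_LENGTH
    return start, end


def terminal_base_position(transcript_start: int, transcript_end: int,
                           is_on_negative_strand: bool) -> int:
    """Hypothetical helper: 1-based position of the transcript's 3'-most base
    within the window, counted in transcript (5'->3') orientation.

    Assumes `start` is 0-based and `end` is 1-based (half-open interval).
    """
    start, end = get_roi_from_transcript(
        transcript_start, transcript_end, is_on_negative_strand)
    if is_on_negative_strand:
        # The 3'-most base is transcript_start (0-based), i.e. transcript_start + 1
        # in 1-based coordinates; the window is read from `end` downwards.
        return end - (transcript_start + 1) + 1
    # The 3'-most base is transcript_end (1-based); the first window base
    # is start + 1 in 1-based coordinates.
    return transcript_end - (start + 1) + 1


if __name__ == "__main__":
    # Made-up coordinates, purely for illustration.
    for negative in (False, True):
        start, end = get_roi_from_transcript(1000, 2000, negative)
        pos = terminal_base_position(1000, 2000, negative)
        print(negative, start, end, end - start, pos)
```

With these made-up coordinates, both strands give a 205bp window with the transcript's last base at position 101, i.e. 100 bases upstream and 104 bases downstream of it. Whether position 101 is the offset APARENT actually expects for the cut site is exactly the open question above.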
xref: #342
xref: johli/aparent#8
Snippet source: models/APARENT/veff/dataloader.py, lines 87 to 111 in ae8cf12