Review APARENT predictions

Currently, the APARENT dataloader gets the PolyA sites from the transcript GTF annotation:
https://github.com/kipoi/models/blob/ae8cf12cf013d07fd2d6c8ac28a6b82069adac80/APARENT/veff/dataloader.py#L87-L111

@johli Is this a viable implementation of the [strategy you explained here](https://github.com/johli/aparent/issues/1)?
Do you have some example data (a VCF file plus scores) that we could compare the Kipoi predictions against?


xref: https://github.com/kipoi/models/issues/342
xref: https://github.com/johli/aparent/issues/8

	def get_roi_from_transcript(transcript_start: int, transcript_end: int, is_on_negative_strand: bool) -> (int, int):
	    """
	    Get region-of-interest for APARENT in relation to the 3'UTR of a transcript
	    :param transcript_start: 0-based start position of the transcript
	    :param transcript_end: 1-based end position of the transcript
	    :param is_on_negative_strand: is the gene on the negative strand?
	    :return: Tuple of (start, end) position for the region of interest
	    """
	    # CSE should be roughly around position 70 of the 205bp sequence.
	    # Since CSE is likely 30bp upstream of the cut site, we shift the cut site
	    # by 100bp upstream and 105bp downstream
	    if is_on_negative_strand:
	        end = transcript_start + 100
	        # convert 0-based to 1-based
	        end += 1

	        start = end - 205
	    else:
	        start = transcript_end - 100
	        # convert 1-based to 0-based
	        start -= 1

	        end = start + 205

	    return start, end
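
To make the review easier, here is a minimal sketch that checks where the annotated 3' end falls inside the window returned above. It repeats the arithmetic of `get_roi_from_transcript` so that it runs on its own. The helper `three_prime_offset` and the example coordinates are hypothetical. It also assumes that the returned `(start, end)` pair is a 0-based, half-open interval, which the "convert 1-based to 0-based" comment suggests but the snippet does not state.

```python
WINDOW_LEN = 205  # sequence length quoted in the snippet above


def get_roi_from_transcript(transcript_start, transcript_end, is_on_negative_strand):
    # Same arithmetic as the dataloader snippet above, condensed.
    if is_on_negative_strand:
        end = transcript_start + 100 + 1
        start = end - WINDOW_LEN
    else:
        start = transcript_end - 100 - 1
        end = start + WINDOW_LEN
    return start, end


def three_prime_offset(transcript_start, transcript_end, is_on_negative_strand):
    """Hypothetical helper: offset of the transcript's last base inside the
    window, counted 5'->3' on the transcript strand.

    ASSUMPTION: (start, end) is a 0-based, half-open interval.
    """
    start, end = get_roi_from_transcript(
        transcript_start, transcript_end, is_on_negative_strand
    )
    if is_on_negative_strand:
        # The last transcribed base sits at 0-based index transcript_start;
        # after reverse-complementing, offsets are counted from the window end.
        return (end - 1) - transcript_start
    # transcript_end is 1-based, so the last base is at index transcript_end - 1.
    return (transcript_end - 1) - start


# Made-up coordinates, for illustration only.
for negative in (False, True):
    roi = get_roi_from_transcript(1000, 2000, negative)
    offset = three_prime_offset(1000, 2000, negative)
    print("-" if negative else "+", roi, roi[1] - roi[0], offset)
```

Under that assumption the window is 205 bp on both strands and the annotated 3' end lands at offset 100, leaving 104 bases after it. Whether that matches the intended placement of the CSE near position 70 is exactly the question for review.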

Review APARENT predictions #346
