Describe the bug
A c. transcript variant which is in 5' or 3' untranslated regions (before cds start or after cds end) currently get mapped to protein coordinates using arithmetic, but those coordinates are not valid positions.
The example below uses clinvar variant 875307:
g.: NC_000001.11:226875386:C:T
c.: NM_000447.3:c.-184C>T
p.: N/A
Steps to reproduce
def main():
# https://www.ncbi.nlm.nih.gov/clinvar/variation/875307/
clinvar_spdi = "NC_000001.11:226875386:C:T"
clinvar_nm_hgvs = "NM_000447.3:c.-184C>T"
seqrepo_dir = "/Users/kferrite/dev/data/seqrepo/2024-12-20"
sr = SeqRepo(root_dir=seqrepo_dir)
cst = CoolSeqTool(sr=sr)
seq_accession = "NC_000001.11"
seq_pos_start = 226875386
seq_pos_end = 226875387
result = asyncio.run(
cst.mane_transcript.grch38_to_mane_c_p(
alt_ac=seq_accession,
start_pos=seq_pos_start,
end_pos=seq_pos_end,
coordinate_type=CoordinateType.INTER_RESIDUE,
)
)
cdna = result.cdna
protein = result.protein
print(f"cdna: {cdna}")
print(f"protein: {protein}")
# Check if variant is in an untranslated region
cds_length = cdna.coding_end_site - cdna.coding_start_site
if cdna.pos[1] <= 0:
print(
f"\nShould not map to protein: variant at cDNA pos {cdna.pos} is in the "
f"5' UTR (before CDS start at position 0). "
f"Absolute transcript pos: {cdna.pos[0] + cdna.coding_start_site}-"
f"{cdna.pos[1] + cdna.coding_start_site}, "
f"CDS spans: {cdna.coding_start_site}-{cdna.coding_end_site}"
)
elif cdna.pos[0] >= cds_length:
print(
f"\nShould not map to protein: variant at cDNA pos {cdna.pos} is in the "
f"3' UTR (after CDS end at position {cds_length}). "
f"Absolute transcript pos: {cdna.pos[0] + cdna.coding_start_site}-"
f"{cdna.pos[1] + cdna.coding_start_site}, "
f"CDS spans: {cdna.coding_start_site}-{cdna.coding_end_site}"
)
else:
print(f"\nVariant is in CDS (pos {cdna.pos}, CDS length {cds_length})")
print(f"cool-seq-tool returned protein pos {protein.pos}")
main()
Output:
cdna: gene='PSEN2' refseq='NM_000447.3' ensembl='ENST00000366783.8' pos=(-184, -183) strand=<Strand.POSITIVE: 1> status=<TranscriptPriority.MANE_SELECT: 'mane_select'> coding_start_site=383 coding_end_site=1730 alt_ac='NC_000001.11'
protein: gene='PSEN2' refseq='NP_000438.2' ensembl='ENSP00000355747.3' pos=(-62, -61) strand=<Strand.POSITIVE: 1> status=<TranscriptPriority.MANE_SELECT: 'mane_select'>
Should not map to protein: variant at cDNA pos (-184, -183) is in the 5' UTR (before CDS start at position 0). Absolute transcript pos: 199-200, CDS spans: 383-1730
cool-seq-tool returned protein pos (-62, -61)
Acceptance Criteria
If the variant end < cds start or variant start > cds end it's easy enough to have the protein coordinates be None.
But I don't know what the behavior should be if the variant transcript region spans the cds boundary such that part of it codes for a protein and part does not.
Possible reason(s)/Suggested Fix
Either:
(1) Add cds start/end args to _get_mane_p and return None if mane_c_pos_range is outside those bounds.
(2) Prior to calling _get_mane_p (from grch38_to_mane_c_p and get_mane_transcript), check the cds bounds and do not call _get_mane_p if outside them.
Environment & Version
pypi: cool-seq-tool==0.15.4 (but same code in branch: main / tag: 0.16.0)
Describe the bug
A c. transcript variant which is in 5' or 3' untranslated regions (before cds start or after cds end) currently get mapped to protein coordinates using arithmetic, but those coordinates are not valid positions.
The example below uses clinvar variant 875307:
g.:
NC_000001.11:226875386:C:Tc.:
NM_000447.3:c.-184C>Tp.:
N/ASteps to reproduce
Output:
Acceptance Criteria
If the
variant end < cds startorvariant start > cds endit's easy enough to have the protein coordinates beNone.But I don't know what the behavior should be if the variant transcript region spans the cds boundary such that part of it codes for a protein and part does not.
Possible reason(s)/Suggested Fix
Either:
(1) Add cds start/end args to
_get_mane_pand returnNoneif mane_c_pos_range is outside those bounds.(2) Prior to calling
_get_mane_p(fromgrch38_to_mane_c_pandget_mane_transcript), check the cds bounds and do not call_get_mane_pif outside them.Environment & Version
pypi:
cool-seq-tool==0.15.4(but same code in branch:main/ tag:0.16.0)