How to determine position_in_codon in upstream and downstream regions.
To keep the phase of position_in_codon continuous across the upstream (u) / 5′ UTR (-) boundary and the 3′ UTR (*) / downstream (d) boundary, we proposed a method for calculating position_in_codon: extend the CDS reading frame upstream of the start codon and assign a codon position (1, 2, or 3) to every base in region u or - relative to the CDS frame. Under this convention, if the first base of region - is at the second position of its codon, the base immediately upstream is assigned position 1 of the same projected codon. However, this can cause a discrepancy in the u and d regions.
Example
Suppose we have a coding transcript with the following exons and CDS coordinates:
_exons = [(5, 8), (14, 20), (30, 35), (40, 44), (50, 52), (70, 72)]
_cds = (32, 43)
Following the calculation method above, crossmapper would convert coordinates 2 to 5 into the following protein positions:
{'position': 1, 'position_in_codon': 2, 'region': 'u', 'offset': 0} # coordinate 2
{'position': 1, 'position_in_codon': 3, 'region': 'u', 'offset': 0} # coordinate 3
{'position': 1, 'position_in_codon': 1, 'region': 'u', 'offset': 0} # coordinate 4
{'position': 4, 'position_in_codon': 2, 'region': '-', 'offset': 0} # coordinate 5
In HGVS format, p.u1.1 (coordinate 4) looks adjacent to p.u1.2 (coordinate 2), but the two are not adjacent in coordinate space: coordinate 3 (p.u1.3) lies between them.
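The three u values above can be reproduced with a small sketch. It is not crossmapper code: the helper names are hypothetical, and it assumes 0-based coordinates, half-open exon ranges, and that position in region u is counted in steps of three bases from the transcript boundary (an assumption inferred from the example values, not confirmed behaviour).

```python
# Sketch only: helper names are hypothetical, not crossmapper API.
# Assumes 0-based coordinates and half-open (start, end) exon ranges.
EXONS = [(5, 8), (14, 20), (30, 35), (40, 44), (50, 52), (70, 72)]
CDS = (32, 43)


def utr5_length(exons, cds_start):
    """Count exonic bases that lie before the CDS start."""
    return sum(min(end, cds_start) - start for start, end in exons if start < cds_start)


def upstream_position(coordinate, exons=EXONS, cds=CDS):
    """Project the CDS reading frame onto a coordinate in region u."""
    transcript_start = exons[0][0]
    if coordinate >= transcript_start:
        raise ValueError("coordinate is not upstream of the first exon")
    distance = transcript_start - coordinate  # 1 = base just before the transcript
    bases_before_start_codon = utr5_length(exons, cds[0]) + distance
    return {
        # Assumed: counted from the transcript boundary, three bases per position.
        "position": (distance - 1) // 3 + 1,
        # Follows the CDS frame projected through the 5' UTR.
        "position_in_codon": 3 - (bases_before_start_codon - 1) % 3,
        "region": "u",
        "offset": 0,
    }


for coordinate in (2, 3, 4):
    print(coordinate, upstream_position(coordinate))
```

In this sketch position is counted from the transcript boundary while position_in_codon follows the projected frame, so the two fall out of step whenever the 5′ UTR length is not a multiple of three (here it is 11 bases).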
Goals for the protein positions from crossmapper
- Coordinates can be converted to positions and vice versa.
- Results are meaningful.
- There is no discrepancy between coordinate and position_in_codon.
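One way to make the last goal testable is to check that walking through a region in (position, position_in_codon) order also walks through adjacent coordinates. The function below is a hypothetical helper for illustration, not part of crossmapper. The CDS sample assumes the first codon starts at coordinate 32.

```python
def visits_adjacent_coordinates(mapped):
    """Return True if sorting by (position, position_in_codon) steps through
    coordinates one base at a time, always in the same direction.

    mapped: list of (coordinate, position_dict) pairs from a single region.
    """
    ordered = sorted(
        mapped, key=lambda item: (item[1]["position"], item[1]["position_in_codon"])
    )
    coordinates = [coordinate for coordinate, _ in ordered]
    steps = {b - a for a, b in zip(coordinates, coordinates[1:])}
    return steps <= {1} or steps <= {-1}


# Region u values from the example in this issue.
U_EXAMPLE = [
    (2, {"position": 1, "position_in_codon": 2}),
    (3, {"position": 1, "position_in_codon": 3}),
    (4, {"position": 1, "position_in_codon": 1}),
]

# Assumed values for the first CDS codon (CDS starts at coordinate 32).
CDS_EXAMPLE = [
    (32, {"position": 1, "position_in_codon": 1}),
    (33, {"position": 1, "position_in_codon": 2}),
    (34, {"position": 1, "position_in_codon": 3}),
]

print(visits_adjacent_coordinates(U_EXAMPLE))    # the u example fails the check
print(visits_adjacent_coordinates(CDS_EXAMPLE))  # the CDS sample passes
```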
Possible solutions
- Add a warning message if the region of a position is not in the CDS.
- Extend from the - region and record the distance in offset:
{'position': 4, 'position_in_codon': 2, 'region': '-', 'offset': -1} # coordinate 4
- Extend from the "" region and record the distance in offset:
{'position': 1, 'position_in_codon': 1, 'region': '', 'offset': -28} # coordinate 4
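The two offset-based options can be sketched as follows. The helper names are hypothetical, and the anchor coordinates are assumptions read off the example: coordinate 5 for the first base of region -, and coordinate 32 (the CDS start) for the first base of the "" region.

```python
def with_offset(coordinate, anchor_coordinate, anchor_position):
    """Express a coordinate as an anchor position plus a signed base offset."""
    return {**anchor_position, "offset": coordinate - anchor_coordinate}


def to_coordinate(position, anchor_coordinate):
    """Invert with_offset: recover the coordinate from the anchor and the offset."""
    return anchor_coordinate + position["offset"]


# Anchor on the first base of region - (coordinate 5 in the example).
utr_anchor = {"position": 4, "position_in_codon": 2, "region": "-", "offset": 0}
print(with_offset(4, 5, utr_anchor))

# Anchor on the first base of the CDS (assumed to be coordinate 32).
cds_anchor = {"position": 1, "position_in_codon": 1, "region": "", "offset": 0}
print(with_offset(4, 32, cds_anchor))
```

Because the offset is a plain coordinate difference, the conversion is trivially reversible, which fits the round-trip goal. The cost is that position_in_codon then describes the anchor base rather than the upstream base itself.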