Discrepancy at position_in_codon within upstream/downstream regions #3

@XLIU-hub

Description

How to determine position_in_codon in upstream and downstream regions.

To keep the position_in_codon phase continuous across the upstream (u) / 5′ UTR (-) boundary and the 3′ UTR (*) / downstream (d) boundary, we proposed the following calculation method: extend the CDS reading frame upstream of the start codon and assign codon positions (1, 2, or 3) to all bases in region u or - relative to the CDS frame. Under this convention, if the first base of region - corresponds to the second position in a codon, then the immediately upstream base is assigned codon position 1 of the same projected codon. However, this can cause a discrepancy in the u and d regions.
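The frame projection described above can be sketched as a small function. This is only an illustration of the convention, not crossmapper code: the name `projected_codon` and the argument `n` (distance upstream of the first CDS base, counted in transcript space) are assumptions made for this example.

```python
def projected_codon(n):
    """Project the CDS reading frame onto the base n positions upstream
    of the first CDS base (n = 1 is the base right before the start codon).

    Returns (codon, position_in_codon), with codons counted away from the CDS.
    Hypothetical helper for illustration only.
    """
    if n < 1:
        raise ValueError("n must be a positive distance upstream of the CDS")
    codon = (n - 1) // 3 + 1             # every 3 bases start a new projected codon
    position_in_codon = 3 - (n - 1) % 3  # phase runs 3, 2, 1, 3, 2, 1, ... going upstream
    return codon, position_in_codon


for n in (1, 2, 3, 4, 11, 12):
    print(n, projected_codon(n))
```

With this convention, `n = 11` maps to codon 4, position 2, and the next base upstream (`n = 12`) maps to position 1 of the same projected codon.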

Example
Suppose we have a transcript with the following exon and CDS coordinates:

_exons = [(5, 8), (14, 20), (30, 35), (40, 44), (50, 52), (70, 72)]
_cds = (32, 43)

Following the above calculation method, crossmapper would convert coordinates 2 to 5 into protein positions as follows:

{'position': 1, 'position_in_codon': 2, 'region': 'u', 'offset': 0} # coordinate 2
{'position': 1, 'position_in_codon': 3, 'region': 'u', 'offset': 0} # coordinate 3
{'position': 1, 'position_in_codon': 1, 'region': 'u', 'offset': 0} # coordinate 4
{'position': 4, 'position_in_codon': 2, 'region': '-', 'offset': 0} # coordinate 5

In HGVS format, p.u1.1 (coordinate 4) looks adjacent to p.u1.2 (coordinate 2), but the two are not adjacent in coordinate space: coordinate 3 (p.u1.3) lies between them.
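The mismatch can be checked mechanically from the four records above. The sketch below is not crossmapper code: the `label` helper and the `p.<region><position>.<position_in_codon>` format are assumptions that simply follow the notation used in this issue.

```python
from itertools import combinations

# Records copied from the example above, keyed by coordinate.
reported = {
    2: {"position": 1, "position_in_codon": 2, "region": "u", "offset": 0},
    3: {"position": 1, "position_in_codon": 3, "region": "u", "offset": 0},
    4: {"position": 1, "position_in_codon": 1, "region": "u", "offset": 0},
    5: {"position": 4, "position_in_codon": 2, "region": "-", "offset": 0},
}


def label(rec):
    """Hypothetical label in the style used in this issue, e.g. 'p.u1.2'."""
    return f"p.{rec['region']}{rec['position']}.{rec['position_in_codon']}"


def label_adjacent_pairs(records):
    """Find pairs that look adjacent by label (same region and position,
    position_in_codon differing by 1) and report their coordinate distance."""
    pairs = []
    for (c1, r1), (c2, r2) in combinations(sorted(records.items()), 2):
        same_codon = (r1["region"], r1["position"]) == (r2["region"], r2["position"])
        if same_codon and abs(r1["position_in_codon"] - r2["position_in_codon"]) == 1:
            pairs.append((label(r1), label(r2), abs(c1 - c2)))
    return pairs


for first, second, distance in label_adjacent_pairs(reported):
    print(first, second, "coordinate distance:", distance)
```

Any pair reported with a coordinate distance greater than 1 looks adjacent by label without being adjacent on the coordinate; here that is p.u1.2 and p.u1.1.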

Goals for the protein positions from crossmapper

  • Can convert from coordinate to position and vice versa.
  • Deliver meaningful results.
  • No discrepancy between coordinate and position_in_codon.

Possible solutions

  • Add a warning message if the region of a position is not in the CDS.
  • Extend from the - region and record the distance in offset:
{'position': 4, 'position_in_codon': 2, 'region': '-', 'offset': -1} # coordinate 4
  • Extend from the "" (CDS) region and record the distance in offset:
{'position': 1, 'position_in_codon': 1, 'region': '', 'offset': -28} # coordinate 4
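A minimal sketch of the two offset-based options, to show that both reproduce the values above and convert back to the coordinate without ambiguity. It assumes half-open exon and CDS ranges on the forward strand (the example values imply this: 3 + 6 + 2 = 11 exonic bases before the CDS gives position 4, position_in_codon 2 for coordinate 5). All function names are hypothetical and not part of crossmapper.

```python
EXONS = [(5, 8), (14, 20), (30, 35), (40, 44), (50, 52), (70, 72)]
CDS = (32, 43)


def utr5_length(exons, cds):
    """Number of exonic bases before the CDS start (half-open ranges assumed)."""
    return sum(min(end, cds[0]) - start for start, end in exons if start < cds[0])


def projected_codon(n):
    """Codon and position_in_codon for the base n positions upstream of the CDS."""
    return (n - 1) // 3 + 1, 3 - (n - 1) % 3


def upstream_from_utr(coordinate, exons, cds):
    """Anchor on the first base of region '-' and store the distance in offset."""
    position, position_in_codon = projected_codon(utr5_length(exons, cds))
    return {"position": position, "position_in_codon": position_in_codon,
            "region": "-", "offset": coordinate - exons[0][0]}


def upstream_from_cds(coordinate, exons, cds):
    """Anchor on the first CDS base and store the distance in offset."""
    return {"position": 1, "position_in_codon": 1,
            "region": "", "offset": coordinate - cds[0]}


def to_coordinate(rec, exons, cds):
    """Inverse of the two functions above (upstream anchors only)."""
    anchor = exons[0][0] if rec["region"] == "-" else cds[0]
    return anchor + rec["offset"]


print(upstream_from_utr(4, EXONS, CDS))
print(upstream_from_cds(4, EXONS, CDS))
```

In both variants every upstream coordinate gets a distinct offset against a fixed anchor, so the round trip is unambiguous; the trade-off is that position_in_codon no longer changes within the u region.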
