FGS+ doesn't use the reverse complement in negative-strand ORFs #19

@nielsdg

FragGeneScanPlus translates protein fragments in negative frames incorrectly: it does not convert the sequence to its reverse complement before translating.
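To make the expected behaviour concrete, here is a minimal sketch in plain Python (no Biopython) of a strand-aware translation. The names `translate_orf`, `reverse_complement` and `translate` are hypothetical helpers for illustration, not functions from FragGeneScanPlus. The codon table assumes the amino-acid assignments of NCBI table 11, which match the standard code.

```python
from itertools import product

# Codons in NCBI order (T, C, A, G); table 11 assigns the same amino acids
# as the standard code, so one lookup table is enough for this sketch.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Complement every base, then reverse the sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def translate_orf(dna, strand):
    """Translate an ORF given in forward-strand coordinates.

    For strand '-', the ORF is reverse-complemented first. Skipping this
    step is the behaviour reported in this issue.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if strand == "-":
        dna = reverse_complement(dna)
    return translate(dna)


fragment = "TTATTTCAT"  # reverse complement of ATGAAATAA
print(translate_orf(fragment, "+"))  # LFH (what you get without the reverse complement)
print(translate_orf(fragment, "-"))  # MK* (the gene encoded on the negative strand)
```

The toy fragment shows the same symptom as the attached FASTA file: translating a negative-strand ORF as if it were on the forward strand yields an unrelated peptide.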

Attached to this issue is a small FASTA file (test_fasta.txt), which FragGeneScanPlus translates incorrectly to:

ELNLNILSFNTNWVRT*VSTPGSTF*LTCLNVKNCFVQWTFHSLIFDKSFR**GIRMRANIFDSEVIFIDKKNTNFFPFNFDGFRLIFANVTLLTN*DPTHTLRPH

The correct translation is:

MRTQGMSWISVCQQGDVSEDEPKAVEVEGKKIGVFFVDENYFAIENVCPHAYALLTEGFIEDQTVECPLHEAIFDIQTGELKSGPGCRNLCTYPVRVEGQDIQIQL

Full test example:

Translating the following sequence with length 551:

TGTTCTGCTTCCTTGTACATGTGAGGACTAGAGTTGAATTTGAATATCCTGTCCTTCAACACGAACTGGGTACGTACATAGGTTTCTACACCCGGGTCCACTTTTTAGCTCACCTGTCTGAATGTCAAAAATTGCTTCGTGCAATGGACATTCCACAGTTTGATCTTCGATAAATCCTTCCGTTAATAAGGCATACGCATGAGGGCAAACATTTTCGATAGCGAAGTAATTTTCATCGACAAAAAAAACACCAATTTTTTTCCCTTCAACTTCGACGGCTTTCGGCTCATCTTCGCTAACGTCACCCTGCTGACAAACTGAGATCCAACTCATACCTTGCGTCCTCATTTTGTTTTATATACAAAACATAATTTGATTTTCAAAACACAAGCTAAGCATAATCCTCTTGATTAATTTTTGTCAAAGTAAAAATAAACATTAAAATCAATTGATTAATAAATTTTAAATAATTTGTTACGTTTCAAGTCAGAAACAATGTTTTAAATATAAAAATTGTTTTATGTAATCTTTATAATTACAATAGTTCTAAA


Performing 6-frame translation:
+1: CSASLYM*GLELNLNILSFNTNWVRT*VSTPGSTF*LTCLNVKNCFVQWTFHSLIFDKSFR**GIRMRANIFDSEVIFIDKKNTNFFPFNFDGFRLIFANVTLLTN*DPTHTLRPHFVLYTKHNLIFKTQAKHNPLD*FLSK*K*TLKSID**ILNNLLRFKSETMF*I*KLFYVIFIITIVL
+2: VLLPCTCED*S*I*ISCPSTRTGYVHRFLHPGPLFSSPV*MSKIASCNGHSTV*SSINPSVNKAYA*GQTFSIAK*FSSTKKTPIFFPSTSTAFGSSSLTSPC*QTEIQLIPCVLILFYIQNII*FSKHKLSIILLINFCQSKNKH*NQLINKF*IICYVSSQKQCFKYKNCFM*SL*LQ*F*
+3: FCFLVHVRTRVEFEYPVLQHELGTYIGFYTRVHFLAHLSECQKLLRAMDIPQFDLR*ILPLIRHTHEGKHFR*RSNFHRQKKHQFFSLQLRRLSAHLR*RHPADKLRSNSYLASSFCFIYKT*FDFQNTS*A*SS*LIFVKVKINIKIN*LINFK*FVTFQVRNNVLNIKIVLCNLYNYNSSK
-1: FRTIVIIKIT*NNFYI*NIVSDLKRNKLFKIY*SIDFNVYFYFDKN*SRGLCLACVLKIKLCFVYKTK*GRKV*VGSQFVSRVTLAKMSRKPSKLKGKKLVFFLSMKITSLSKMFALMRMPY*RKDLSKIKLWNVHCTKQFLTFRQVS*KVDPGVETYVRTQFVLKDRIFKFNSSPHMYKEAE
-2: LELL*L*RLHKTIFIFKTLFLT*NVTNYLKFINQLILMFIFTLTKINQEDYA*LVF*KSNYVLYIKQNEDARYELDLSLSAG*R*RR*AESRRS*REKNWCFFCR*KLLRYRKCLPSCVCLINGRIYRRSNCGMSIARSNF*HSDR*AKKWTRV*KPMYVPSSC*RTGYSNSTLVLTCTRKQN
-3: *NYCNYKDYIKQFLYLKHCF*LET*QII*NLLIN*F*CLFLL*QKLIKRIMLSLCFENQIMFCI*NKMRTQGMSWISVCQQGDVSEDEPKAVEVEGKKIGVFFVDENYFAIENVCPHAYALLTEGFIEDQTVECPLHEAIFDIQTGELKSGPGCRNLCTYPVRVEGQDIQIQL*SSHVQGSRT

Solution FragGeneScanPlus:
ELNLNILSFNTNWVRT*VSTPGSTF*LTCLNVKNCFVQWTFHSLIFDKSFR**GIRMRANIFDSEVIFIDKKNTNFFPFNFDGFRLIFANVTLLTN*DPTHTLRPH

Correct solution (using reverse complement of ORF):
MRTQGMSWISVCQQGDVSEDEPKAVEVEGKKIGVFFVDENYFAIENVCPHAYALLTEGFIEDQTVECPLHEAIFDIQTGELKSGPGCRNLCTYPVRVEGQDIQIQL

Code that generates the output above (requires test_fasta.txt in the working directory):

#!/usr/bin/env python3

from Bio import SeqIO

seq = SeqIO.read('test_fasta.txt', 'fasta').seq
print('Translating the following sequence with length {}:'.format(len(seq)))
print('\n{}\n\n'.format(seq))

print('Performing 6-frame translation:\n')
for s, strand in ((seq, 1), (seq.reverse_complement(), -1)):
    for frame in range(3):
        print('{:+2d}: {}'.format((frame + 1) * strand, s[frame:].translate(table=11)))
print()


# Known ORF on the negative strand (visible in the 6-frame translation on -3)
orf_start = 28
orf_stop = 348
orf = seq[(orf_start + 2):orf_stop]

print('Solution FragGeneScanPlus:\n{}\n'.format(orf.translate(table=11)))

orf = orf.reverse_complement()
print('Correct solution (using reverse complement of ORF):\n{}\n'.format(orf.translate(table=11)))
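The manual check against the 6-frame translation can also be automated. In the output above, the FragGeneScanPlus result appears in frame +1 and the expected protein in frame -3. The sketch below uses plain Python and a hypothetical helper, `frames_containing`, to report which frames contain a given peptide. Frame labels follow the same convention as the script above: offsets on the reverse strand are counted from the start of the reverse complement.

```python
from itertools import product

# Same codon lookup idea as NCBI table 11 (amino-acid assignments only).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def six_frames(dna):
    """Return {frame_label: protein} for labels +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    revcomp = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for sign, strand_seq in ((1, dna), (-1, revcomp)):
        for offset in range(3):
            frames[sign * (offset + 1)] = translate(strand_seq[offset:])
    return frames


def frames_containing(dna, protein):
    """List every frame whose translation contains the given peptide."""
    return [
        label for label, translated in six_frames(dna).items()
        if protein in translated
    ]


fragment = "TTATTTCAT"  # toy input: reverse complement of ATGAAATAA
print(frames_containing(fragment, "LFH"))  # [1]
print(frames_containing(fragment, "MK"))   # [-1]
```

A predicted protein reported on the negative strand should be found in a negative frame. If it only shows up in a positive frame, the reverse complement step was skipped.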
