FragGeneScanPlus mistranslates protein fragments predicted in negative frames: it translates the forward-strand sequence directly instead of first converting it to its reverse complement.
Attached to this issue is a small FASTA file, which is wrongly translated to:
ELNLNILSFNTNWVRT*VSTPGSTF*LTCLNVKNCFVQWTFHSLIFDKSFR**GIRMRANIFDSEVIFIDKKNTNFFPFNFDGFRLIFANVTLLTN*DPTHTLRPH
The correct translation is:
MRTQGMSWISVCQQGDVSEDEPKAVEVEGKKIGVFFVDENYFAIENVCPHAYALLTEGFIEDQTVECPLHEAIFDIQTGELKSGPGCRNLCTYPVRVEGQDIQIQL
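The difference between the two translations can be reproduced on a toy sequence without any dependencies. The following is a minimal sketch, not FragGeneScanPlus code: the helper names `reverse_complement` and `translate` and the 12-base example sequence are made up for illustration, and the codon table only covers the unambiguous bases A, C, G and T (the amino-acid assignments are the same for the standard code and table 11).

```python
from itertools import product

# Codon table in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon from position 0, ignoring trailing bases."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical gene on the negative strand, given in forward-strand coordinates.
forward_region = "TTAAACTTTCAT"

# Translating the forward-strand bases directly gives a meaningless peptide.
print(translate(forward_region))                      # LNFH

# Taking the reverse complement first recovers the encoded protein.
print(translate(reverse_complement(forward_region)))  # MKV*
```

The same pattern applies to the attached sequence: the region has to be reverse-complemented before it is split into codons.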
Full test example:
Translating the following sequence with length 551:
TGTTCTGCTTCCTTGTACATGTGAGGACTAGAGTTGAATTTGAATATCCTGTCCTTCAACACGAACTGGGTACGTACATAGGTTTCTACACCCGGGTCCACTTTTTAGCTCACCTGTCTGAATGTCAAAAATTGCTTCGTGCAATGGACATTCCACAGTTTGATCTTCGATAAATCCTTCCGTTAATAAGGCATACGCATGAGGGCAAACATTTTCGATAGCGAAGTAATTTTCATCGACAAAAAAAACACCAATTTTTTTCCCTTCAACTTCGACGGCTTTCGGCTCATCTTCGCTAACGTCACCCTGCTGACAAACTGAGATCCAACTCATACCTTGCGTCCTCATTTTGTTTTATATACAAAACATAATTTGATTTTCAAAACACAAGCTAAGCATAATCCTCTTGATTAATTTTTGTCAAAGTAAAAATAAACATTAAAATCAATTGATTAATAAATTTTAAATAATTTGTTACGTTTCAAGTCAGAAACAATGTTTTAAATATAAAAATTGTTTTATGTAATCTTTATAATTACAATAGTTCTAAA
Performing 6-frame translation:
+1: CSASLYM*GLELNLNILSFNTNWVRT*VSTPGSTF*LTCLNVKNCFVQWTFHSLIFDKSFR**GIRMRANIFDSEVIFIDKKNTNFFPFNFDGFRLIFANVTLLTN*DPTHTLRPHFVLYTKHNLIFKTQAKHNPLD*FLSK*K*TLKSID**ILNNLLRFKSETMF*I*KLFYVIFIITIVL
+2: VLLPCTCED*S*I*ISCPSTRTGYVHRFLHPGPLFSSPV*MSKIASCNGHSTV*SSINPSVNKAYA*GQTFSIAK*FSSTKKTPIFFPSTSTAFGSSSLTSPC*QTEIQLIPCVLILFYIQNII*FSKHKLSIILLINFCQSKNKH*NQLINKF*IICYVSSQKQCFKYKNCFM*SL*LQ*F*
+3: FCFLVHVRTRVEFEYPVLQHELGTYIGFYTRVHFLAHLSECQKLLRAMDIPQFDLR*ILPLIRHTHEGKHFR*RSNFHRQKKHQFFSLQLRRLSAHLR*RHPADKLRSNSYLASSFCFIYKT*FDFQNTS*A*SS*LIFVKVKINIKIN*LINFK*FVTFQVRNNVLNIKIVLCNLYNYNSSK
-1: FRTIVIIKIT*NNFYI*NIVSDLKRNKLFKIY*SIDFNVYFYFDKN*SRGLCLACVLKIKLCFVYKTK*GRKV*VGSQFVSRVTLAKMSRKPSKLKGKKLVFFLSMKITSLSKMFALMRMPY*RKDLSKIKLWNVHCTKQFLTFRQVS*KVDPGVETYVRTQFVLKDRIFKFNSSPHMYKEAE
-2: LELL*L*RLHKTIFIFKTLFLT*NVTNYLKFINQLILMFIFTLTKINQEDYA*LVF*KSNYVLYIKQNEDARYELDLSLSAG*R*RR*AESRRS*REKNWCFFCR*KLLRYRKCLPSCVCLINGRIYRRSNCGMSIARSNF*HSDR*AKKWTRV*KPMYVPSSC*RTGYSNSTLVLTCTRKQN
-3: *NYCNYKDYIKQFLYLKHCF*LET*QII*NLLIN*F*CLFLL*QKLIKRIMLSLCFENQIMFCI*NKMRTQGMSWISVCQQGDVSEDEPKAVEVEGKKIGVFFVDENYFAIENVCPHAYALLTEGFIEDQTVECPLHEAIFDIQTGELKSGPGCRNLCTYPVRVEGQDIQIQL*SSHVQGSRT
Solution FragGeneScanPlus:
ELNLNILSFNTNWVRT*VSTPGSTF*LTCLNVKNCFVQWTFHSLIFDKSFR**GIRMRANIFDSEVIFIDKKNTNFFPFNFDGFRLIFANVTLLTN*DPTHTLRPH
Correct solution (using reverse complement of ORF):
MRTQGMSWISVCQQGDVSEDEPKAVEVEGKKIGVFFVDENYFAIENVCPHAYALLTEGFIEDQTVECPLHEAIFDIQTGELKSGPGCRNLCTYPVRVEGQDIQIQL
Code that generates the output above (requires Biopython, with test_fasta.txt in the working directory):
#!/usr/bin/env python3
from Bio import SeqIO

seq = SeqIO.read('test_fasta.txt', 'fasta').seq

print('Translating the following sequence with length {}:'.format(len(seq)))
print('\n{}\n\n'.format(seq))

# Translate all three frames of the forward strand and of its reverse complement
print('Performing 6-frame translation:\n')
for s, strand in ((seq, 1), (seq.reverse_complement(), -1)):
    for frame in range(3):
        print('{:+2d}: {}'.format((frame + 1) * strand, s[frame:].translate(table=11)))
print()

# Known ORF on the negative strand (visible in the 6-frame translation on -3)
orf_start = 28
orf_stop = 348
orf = seq[(orf_start + 2):orf_stop]

# Translating the forward-strand slice directly reproduces the FragGeneScanPlus output
print('Solution FragGeneScanPlus:\n{}\n'.format(orf.translate(table=11)))

# Reverse-complementing the slice first gives the expected protein
orf = orf.reverse_complement()
print('Correct solution (using reverse complement of ORF):\n{}\n'.format(orf.translate(table=11)))