
Possible inconsistencies in annotation files? #6

@filonico


Dear authors,
Thank you for providing this repository with the data from your work, and for making the analyses as reproducible as possible. It's really helpful. I was using the genomic data of the four placozoans for some comparative genomics, and I was extracting the full-length mRNAs of several T. adhaerens genes with AGAT. However, when I tried to extract the corresponding CDS (still with AGAT), the result did not match the sequence provided in Tadh_long.cds.fasta. The CDS start and stop coordinates in the GTF seem to be offset from that sequence: the extracted CDS is shifted 64 nucleotides downstream. For clarity, see below:

  • the command I used and the sequence it returned:
$ agat_sp_extract_sequences.pl -g gene_of_interest_below.gtf -f Tadh_gDNA.fasta -t cds

>Tadh_TriadT26009 gene=Tadh_TriadG26009 seq_id=scaffold_5 type=cds
TTGTCTCAGATTGTCTATGGCGTATCATTTTAAGCTGTTTCATGACCCTATTTATGTTGATGAGCTGCATTGGTAATGGCGCCGTCTTACTCGTTTTACGCTACCATCATGATGATATCAAGTCGGCATCTAACTATTTTATCACTAATTTAGCCTTAACTGATTTTTTACTGGGCGTACTATGCATGCCCTGTATTTTGATTTCCTGCTTAAATGGGCAATGGGTTTTTGGTCAGACCTTATGCAGTTTAACAGGGTTTGCTAACTCATTTTTTTGTATTAATTCCATGATTACTTTAGCCGCTGTTAGTGTGGAAAAATACTGTGCTATTGCTTCACCATTGACATATCATCATTATATGAGCAAAAGTAAAGTCACATGTGTAATTTCAATTATATGGATCCATTCAGCTATTAATGCTAGTCTACCCTTTTTGGGCTGGGGAGAATATGTCTACCTTCCTTTCGAAACAATTTGCACAGTTGCTTGGTGGAGCTTTCCAAATTATGTTGGTTTTATAGTTGGTATTAATTTTGGACTACCTACCGTGATCATGAGTTGTACTTATTTCCTCATACTAAAAATTGCTCGTAAACATTCAAGGCGGATAGGTGTATCTACGTCAACTGTAGCAATTTCAACTTATCTAAGCCCAACTGGTACATATAATAACCTTAGTCCAGTTTTTATAGTCTGCTGGCTACCGCATCTTATTAGTATGATATATTTAACCATTTATGAAATAAGCCCGTTACCCTGTAGTTTTCATCAAATTACAACATGGCTAGCAATGGCTAACTCGGCTTTTAACCCAATCATATATGGAGCTATGGATACATCTATAAGAAAAGGTCTTAAAACCTTACTCGGATCCTGGGTAAAATATTGTAAATTATACTAAATTCGAATGAAATTGGTGCAGTTTTGTTGTTTATATTTATCGTATTTTTATTTCTTGCATA
  • the corresponding sequence as provided in Tadh_long.cds.fasta:
>Tadh_TriadT26009
ATGGCTGATACCTACATTAACAATTTCACGAATAAATCACTAGAGCTATGCAATGGGAGCCTAGTTGTCTCAGATTGTCTATGGCGTATCATTTTAAGCTGTTTCATGACCCTATTTATGTTGATGAGCTGCATTGGTAATGGCGCCGTCTTACTCGTTTTACGCTACCATCATGATGATATCAAGTCGGCATCTAACTATTTTATCACTAATTTAGCCTTAACTGATTTTTTACTGGGCGTACTATGCATGCCCTGTATTTTGATTTCCTGCTTAAATGGGCAATGGGTTTTTGGTCAGACCTTATGCAGTTTAACAGGGTTTGCTAACTCATTTTTTTGTATTAATTCCATGATTACTTTAGCCGCTGTTAGTGTGGAAAAATACTGTGCTATTGCTTCACCATTGACATATCATCATTATATGAGCAAAAGTAAAGTCACATGTGTAATTTCAATTATATGGATCCATTCAGCTATTAATGCTAGTCTACCCTTTTTGGGCTGGGGAGAATATGTCTACCTTCCTTTCGAAACAATTTGCACAGTTGCTTGGTGGAGCTTTCCAAATTATGTTGGTTTTATAGTTGGTATTAATTTTGGACTACCTACCGTGATCATGAGTTGTACTTATTTCCTCATACTAAAAATTGCTCGTAAACATTCAAGGCGGATAGGTGTATCTACCCGAAGAATACATTATAAAACACATATTAAAGCAACATTGATGTTATTAATTGTCATCGGTAGTTTTATAGTCTGCTGGCTACCGCATCTTATTAGTATGATATATTTAACCATTTATGAAATAAGCCCGTTACCCTGTAGTTTTCATCAAATTACAACATGGCTAGCAATGGCTAACTCGGCTTTTAACCCAATCATATATGGAGCTATGGATACATCTATAAGAAAAGGTCTTAAAACCTTACTCGGATCCTGGGTAAAATATTGTAAATTATAC
  • the nucleotide alignment between the two:
>Tadh_TriadT26009
ATGGCTGATACCTACATTAACAATTTCACGAATAAATCACTAGAGCTATGCAATGGGAGCCTAGTTGTCTCAGATTGTCTATGGCGTATCATTTTAAGCTGTTTCATGACCCTATTTATGTTGATGAGCTGCATTGGTAATGGCGCCGTCTTACTCGTTTTACGCTACCATCATGATGATATCAAGTCGGCATCTAACTATTTTATCACTAATTTAGCCTTAACTGATTTTTTACTGGGCGTACTATGCATGCCCTGTATTTTGATTTCCTGCTTAAATGGGCAATGGGTTTTTGGTCAGACCTTATGCAGTTTAACAGGGTTTGCTAACTCATTTTTTTGTATTAATTCCATGATTACTTTAGCCGCTGTTAGTGTGGAAAAATACTGTGCTATTGCTTCACCATTGACATATCATCATTATATGAGCAAAAGTAAAGTCACATGTGTAATTTCAATTATATGGATCCATTCAGCTATTAATGCTAGTCTACCCTTTTTGGGCTGGGGAGAATATGTCTACCTTCCTTTCGAAACAATTTGCACAGTTGCTTGGTGGAGCTTTCCAAATTATGTTGGTTTTATAGTTGGTATTAATTTTGGACTACCTACCGTGATCATGAGTTGTACTTATTTCCTCATACTAAAAATTGCTCGTAAACATTCAAGGCGGATAGGTGTATCTACCCGAAGAATA-CATTATAAAACACATATTAAAGCAACATTGATGTTATTAATTGTCAT----CGGTAGTTTTATAGTCTGCTGGCTACCGCATCTTATTAGTATGATATATTTAACCATTTATGAAATAAGCCCGTTACCCTGTAGTTTTCATCAAATTACAACATGGCTAGCAATGGCTAACTCGGCTTTTAACCCAATCATATATGGAGCTATGGATACATCTATAAGAAAAGGTCTTAAAACCTTACTCGGATCCTGGGTAAAATATTGTAAATTATAC----------------------------------------------------------------
>Tadh_TriadT26009 gene=Tadh_TriadG26009 seq_id=scaffold_5 type=cds
----------------------------------------------------------------TTGTCTCAGATTGTCTATGGCGTATCATTTTAAGCTGTTTCATGACCCTATTTATGTTGATGAGCTGCATTGGTAATGGCGCCGTCTTACTCGTTTTACGCTACCATCATGATGATATCAAGTCGGCATCTAACTATTTTATCACTAATTTAGCCTTAACTGATTTTTTACTGGGCGTACTATGCATGCCCTGTATTTTGATTTCCTGCTTAAATGGGCAATGGGTTTTTGGTCAGACCTTATGCAGTTTAACAGGGTTTGCTAACTCATTTTTTTGTATTAATTCCATGATTACTTTAGCCGCTGTTAGTGTGGAAAAATACTGTGCTATTGCTTCACCATTGACATATCATCATTATATGAGCAAAAGTAAAGTCACATGTGTAATTTCAATTATATGGATCCATTCAGCTATTAATGCTAGTCTACCCTTTTTGGGCTGGGGAGAATATGTCTACCTTCCTTTCGAAACAATTTGCACAGTTGCTTGGTGGAGCTTTCCAAATTATGTTGGTTTTATAGTTGGTATTAATTTTGGACTACCTACCGTGATCATGAGTTGTACTTATTTCCTCATACTAAAAATTGCTCGTAAACATTCAAGGCGGATAGGTGTATCTACGTCAACTGTAGCAATTTCA---ACTTATCTAAGCCCAACTGGTACATATAATAACCTTAGTCCAGT--TTTTATAGTCTGCTGGCTACCGCATCTTATTAGTATGATATATTTAACCATTTATGAAATAAGCCCGTTACCCTGTAGTTTTCATCAAATTACAACATGGCTAGCAATGGCTAACTCGGCTTTTAACCCAATCATATATGGAGCTATGGATACATCTATAAGAAAAGGTCTTAAAACCTTACTCGGATCCTGGGTAAAATATTGTAAATTATACTAAATTCGAATGAAATTGGTGCAGTTTTGTTGTTTATATTTATCGTATTTTTATTTCTTGCATA
  • and the corresponding gene model as per Tadh_long.annot.gtf:
scaffold_5      JGI     transcript      3427178 3428332 .       +       .       transcript_id "Tadh_TriadT26009"; gene_id "Tadh_TriadG26009";
scaffold_5      JGI     exon    3427178 3427863 .       +       .       transcript_id "Tadh_TriadT26009"; gene_id "Tadh_TriadG26009";
scaffold_5      JGI     exon    3428053 3428332 .       +       .       transcript_id "Tadh_TriadT26009"; gene_id "Tadh_TriadG26009";
scaffold_5      JGI     CDS     3427178 3427863 .       +       0       transcript_id "Tadh_TriadT26009"; gene_id "Tadh_TriadG26009";
scaffold_5      JGI     CDS     3428053 3428329 .       +       1       transcript_id "Tadh_TriadT26009"; gene_id "Tadh_TriadG26009";
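To rule out an AGAT-specific problem, the CDS can also be rebuilt directly from the GTF coordinates. The sketch below is a minimal, dependency-free illustration, not the authors' pipeline. It assumes a 9-column GTF (split on whitespace here because the tabs were lost in the paste above; use `split("\t", 8)` on a real file), 1-based inclusive coordinates, and a genome already loaded into a dict mapping scaffold names to sequences. All function names are hypothetical. It also checks whether each declared CDS phase agrees with the length of the preceding CDS segment.

```python
from collections import defaultdict

# Complement table for reverse-complementing minus-strand CDS (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_cds_features(gtf_lines):
    """Group GTF CDS rows by transcript_id as (seqid, start, end, strand, phase)."""
    cds = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 9 GTF columns; maxsplit keeps the attribute column in one piece.
        seqid, _source, feature, start, end, _score, strand, phase, attrs = line.split(None, 8)
        if feature != "CDS":
            continue
        tid = attrs.split('transcript_id "', 1)[1].split('"', 1)[0]
        cds[tid].append((seqid, int(start), int(end), strand, int(phase)))
    return cds


def extract_cds(genome, features):
    """Concatenate CDS segments; reverse-complement the result on the minus strand."""
    strand = features[0][3]
    parts = sorted(features, key=lambda f: f[1])  # genomic order
    # GTF is 1-based inclusive; Python slices are 0-based half-open.
    seq = "".join(genome[seqid][start - 1:end] for seqid, start, end, _, _ in parts)
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


def check_phases(features):
    """Return (expected, declared) phase pairs for every CDS after the first,
    walking the segments in transcript (5' to 3') order."""
    ordered = sorted(features, key=lambda f: f[1], reverse=features[0][3] == "-")
    pairs = []
    for prev, cur in zip(ordered, ordered[1:]):
        length = prev[2] - prev[1] + 1
        # Bases left over after prev's last complete codon set cur's phase.
        expected = (3 - (length - prev[4]) % 3) % 3
        pairs.append((expected, cur[4]))
    return pairs


if __name__ == "__main__":
    gtf = [
        'scaffold_5 JGI CDS 3427178 3427863 . + 0 transcript_id "Tadh_TriadT26009"; gene_id "Tadh_TriadG26009";',
        'scaffold_5 JGI CDS 3428053 3428329 . + 1 transcript_id "Tadh_TriadT26009"; gene_id "Tadh_TriadG26009";',
    ]
    feats = parse_cds_features(gtf)["Tadh_TriadT26009"]
    print(sum(end - start + 1 for _, start, end, _, _ in feats))  # 963
    print(check_phases(feats))  # [(1, 1)]
```

With the two CDS rows from Tadh_long.annot.gtf, the total CDS length is 686 + 277 = 963 nt, and the declared phase of the second CDS (1) matches the one implied by the 686-nt first segment, so the phase column itself appears internally consistent.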

I have also noticed this issue with other T. adhaerens genes, but apparently not with T. adhaerens H2. I haven't checked the other two species.
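Since the shift also shows up in other T. adhaerens genes, one way to gauge how widespread it is would be to compare every transcript in the AGAT output with Tadh_long.cds.fasta and record where the start of each extracted CDS falls inside the provided one. The sketch below assumes both FASTA files use the transcript ID as the first word of each header (as in the examples above); the function names and the 20-nt probe length are arbitrary, hypothetical choices for illustration.

```python
def read_fasta(lines):
    """Parse FASTA lines into {first word of header: uppercased sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks).upper()
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks).upper()
    return seqs


def cds_offsets(reference, extracted, probe_len=20):
    """Position of the first probe_len nt of each extracted CDS inside the reference CDS.

    0 means both start at the same base, a positive value means the extracted
    CDS starts that many nt downstream, and -1 means the probe was not found
    (e.g. the extracted CDS starts upstream or the sequences differ)."""
    offsets = {}
    for tid, ref_seq in reference.items():
        ext_seq = extracted.get(tid)
        if ext_seq is not None:
            offsets[tid] = ref_seq.find(ext_seq[:probe_len])
    return offsets
```

For example, `cds_offsets(read_fasta(open("Tadh_long.cds.fasta")), read_fasta(open("agat_cds.fasta")))`, where agat_cds.fasta is a hypothetical name for the AGAT output. For Tadh_TriadT26009 above, the AGAT sequence's first 20 nt sit at position 64 of the provided CDS, so any transcript with a non-zero offset would be a candidate for the same problem.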

Am I doing something wrong, or is there something I am not seeing? Thank you for your help!
Filippo
