Hello,
I have a modest proposal for improving the output file format:
minus strand entries
tRNAscan-SE minus-strand predictions in the output file have "tRNA Begin" > "tRNA End". The same goes for intron positions (if the tRNA is spliced, obviously). This is not an issue for the tRNAs themselves (the BED and FASTA files use the correct 1:142656825-142656896 format/interval description), but the intron coordinates have to be flipped.
Wouldn't it be easier to use the same BED-like start-end-strand numbering scheme in this output as well?
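A minimal sketch of the flipping that currently has to be done by hand, written as a shell function around `awk`. It assumes the input is the cleaned, tab-separated table produced by the `tail`/`tr` pipeline, with columns in the order sequence, tRNA #, begin, end, type, anticodon, intron begin, intron end, and then the rest. The function name `flip_minus_strand` is hypothetical.

```bash
# flip_minus_strand (hypothetical name): reorder begin/end so that start <= end
# and add an explicit strand column right after the tRNA end coordinate.
# Assumed input columns (tab-separated):
#   1 chrom, 2 tRNA #, 3 begin, 4 end, 5 type, 6 anticodon,
#   7 intron begin, 8 intron end, 9.. remaining columns
flip_minus_strand() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        strand = "+"
        # minus-strand hits are reported with begin > end
        if ($3 + 0 > $4 + 0) {
            strand = "-"
            t = $3; $3 = $4; $4 = t
            # flip the intron as well; rows without an intron
            # (assumed to be reported as 0 and 0) are left untouched
            if ($7 + 0 > $8 + 0) { t = $7; $7 = $8; $8 = t }
        }
        # rebuild the row with the strand inserted as column 5
        out = $1 OFS $2 OFS $3 OFS $4 OFS strand
        for (i = 5; i <= NF; i++) out = out OFS $i
        print out
    }' "$@"
}
```

Usage: `tail -n +4 trnascan_out.txt | tr -d ' ' | flip_minus_strand > trnascan_out.tsv`. The coordinates are only reordered, not shifted; a true BED file is 0-based and half-open, so its start column would additionally need start - 1.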
extra spaces
To convert the output into a TSV that is still human-readable but easy to parse, I do:
tail -n +4 trnascan_out.txt | tr -d ' ' > trnascan_out.tsv
Since the file has a complicated header, I understand the need for the spaces. Which brings me to the next point.
header / TSV
TSV with named columns seems to be the de facto standard. With comment lines (starting with #) at the top, it could be even easier to understand than the current format, and certainly easier to parse. For example:
"chrom", "trna_num", "trna_start", "trna_end", "trna_type", "anticodon",
"intr_start", "intr_end", "inf_score", "iso_CM", "iso_score", "note"
To fix the minus-strand issue, a "strand" column should also be inserted somewhere.
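To make the proposal concrete, here is a minimal sketch of a converter from the current output to the suggested layout: two `#` comment lines, a header row with the column names above plus a `strand` column (placed after `trna_end` purely as an illustration), then one row per tRNA with begin/end reordered. It assumes the raw file has the 3-line header that `tail -n +4` skips, tab-separated fields padded with spaces, and the column order of the names listed above. The function name and the wording of the comment lines are hypothetical.

```bash
# trnascan_to_named_tsv (hypothetical name): raw tRNAscan-SE output -> named-column TSV
trnascan_to_named_tsv() {
    awk 'BEGIN {
             FS = OFS = "\t"
             # comment lines describe the file; parsers can skip lines starting with #
             print "# converted from tRNAscan-SE output"
             print "# begin/end reordered so that start <= end; orientation is in the strand column"
             print "chrom", "trna_num", "trna_start", "trna_end", "strand",
                   "trna_type", "anticodon", "intr_start", "intr_end",
                   "inf_score", "iso_CM", "iso_score", "note"
         }
         NR > 3 {
             # strip all spaces (as tr -d does), then fields are re-split on tabs
             gsub(/ /, "")
             strand = "+"
             if ($3 + 0 > $4 + 0) {
                 strand = "-"
                 t = $3; $3 = $4; $4 = t
                 if ($7 + 0 > $8 + 0) { t = $7; $7 = $8; $8 = t }
             }
             out = $1 OFS $2 OFS $3 OFS $4 OFS strand
             for (i = 5; i <= NF; i++) out = out OFS $i
             print out
         }' "$@"
}
```

Usage: `trnascan_to_named_tsv trnascan_out.txt > trnascan_out.tsv`. Any tool that can skip `#` lines can then read the named columns directly, without a hand-written header parser.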
These are just my $0.02.
Thank you for developing and maintaining tRNAscan-SE.
Darek Kedra