dezordi/auto-ncbi


Auto-NCBI

Scripts to automatically retrieve and process information from NCBI.

ncbi_seq_retrieve

Retrieve NCBI sequences, plus host and organism taxonomy information, based on a list of tax_ids.

Usage

  • To retrieve GenBank information for nucleotide sequences:

python ncbi_seq_retrieve.py -in file_with_access_ids.txt -db nucleotide -ot gb

To retrieve the output in XML format instead, add the parameter -tf xml.

  • To retrieve CDS translated to amino acids from nucleotide sequences:

python ncbi_seq_retrieve.py -in file_with_access_ids.txt -db nucleotide -ot fasta_cds_aa

To retrieve untranslated CDS, replace fasta_cds_aa with fasta_cds_na.

  • To retrieve nucleotide or amino acid sequences:

python ncbi_seq_retrieve.py -in file_with_access_ids.txt -db (nucleotide or protein) -ot fasta

To retrieve the output in XML format instead, add the parameter -tf xml.

  • To retrieve taxonomy information for NCBI access IDs:

python ncbi_seq_retrieve.py -in file_with_access_ids.txt -db (nucleotide or protein) -ot gb -tx True

  • To retrieve taxonomy information for the host of NCBI access IDs (ideal for viruses):

python ncbi_seq_retrieve.py -in file_with_access_ids.txt -db (nucleotide or protein) -ot gb -tx True -th True

Some considerations

A file with IDs from nucleotide sequences can't be used with the protein database, and vice versa. Calling the help function prints a table showing which text formats are allowed per output type, and which output types are allowed per database.
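The script's internals are not described in this README. As a rough illustration only, the options above resemble the parameters of NCBI's E-utilities efetch endpoint. The sketch below assumes that -db, -ot and -tf correspond to efetch's db, rettype and retmode; that mapping and the helper name build_efetch_url are assumptions, not something the repository confirms.

```python
from urllib.parse import urlencode

# Public E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(ids, db="nucleotide", rettype="gb", retmode="text"):
    """Build an efetch URL for a batch of IDs (hypothetical helper).

    Assumed mapping to the script's options: db ~ -db, rettype ~ -ot,
    retmode ~ -tf. The IDs must belong to the database named in `db`.
    """
    query = {
        "db": db,
        "id": ",".join(ids),  # efetch accepts a comma-separated ID list
        "rettype": rettype,
        "retmode": retmode,
    }
    return EFETCH_URL + "?" + urlencode(query)


if __name__ == "__main__":
    # Nothing is fetched here; the URL is only printed for inspection.
    print(build_efetch_url(["NC_045512.2"], rettype="fasta_cds_aa"))
```

Keeping the URL construction separate from the download makes it easy to check which database and output type a request would use before sending it.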

split_by_tax

Sample a FASTA file based on taxonomy and virus name. Each sequence header should follow the pattern <ncbi-access>|<tax>|<sequence name>[<virus name>]. For example, YP_010037467.1|Alphacoronavirus|polyprotein 1ab [Alphacoronavirus sp.] can be used to sample by genus.
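To make the expected header layout concrete, here is a minimal sketch of how such a header could be split into its four fields. The names Header, HEADER_RE and parse_header are hypothetical and are not taken from split_by_tax.py, which may parse headers differently.

```python
import re
from typing import NamedTuple


class Header(NamedTuple):
    accession: str
    tax: str
    sequence_name: str
    virus_name: str


# <ncbi-access>|<tax>|<sequence name>[<virus name>], with an optional leading ">".
HEADER_RE = re.compile(
    r"^>?(?P<accession>[^|]+)\|(?P<tax>[^|]+)\|"
    r"(?P<sequence_name>.*?)\s*\[(?P<virus_name>[^\]]*)\]\s*$"
)


def parse_header(line):
    """Split one FASTA header into its fields, or raise ValueError."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"header does not follow the expected pattern: {line!r}")
    return Header(**match.groupdict())


if __name__ == "__main__":
    example = "YP_010037467.1|Alphacoronavirus|polyprotein 1ab [Alphacoronavirus sp.]"
    print(parse_header(example))
```

Rejecting headers that do not match, rather than guessing, helps catch input files whose headers were not prepared in this layout.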

Usage

python split_by_tax.py input.fasta output_directory seed

With test data:

python split_by_tax.py test_data/ncbi_virus.fa test_out 123
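The seed argument suggests that the sampling is random but reproducible. The exact sampling rule is not documented here, so the sketch below makes an assumption: headers are grouped by (tax, virus name) and one header is drawn per group with a seeded generator. The function name sample_one_per_group and the example headers are hypothetical.

```python
import random
from collections import defaultdict


def sample_one_per_group(headers, seed):
    """Pick one header per (tax, virus name) group, reproducibly.

    Assumes every header follows <ncbi-access>|<tax>|<sequence name>[<virus name>].
    """
    rng = random.Random(seed)  # private generator: the same seed gives the same picks
    groups = defaultdict(list)
    for header in headers:
        _accession, tax, rest = header.lstrip(">").split("|", 2)
        virus = rest[rest.rfind("[") + 1 : rest.rfind("]")]
        groups[(tax, virus)].append(header)
    # Sort the group keys so the result does not depend on input order of groups.
    return [rng.choice(groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    # Made-up headers, for illustration only.
    demo = [
        "ID_1|GenusA|protein x [Virus one]",
        "ID_2|GenusA|protein y [Virus one]",
        "ID_3|GenusB|protein z [Virus two]",
    ]
    print(sample_one_per_group(demo, seed=123))
```

Using a dedicated random.Random instance instead of the module-level functions keeps the result tied to the seed even if other code also draws random numbers.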
