JasperBoom/caltha

Caltha

A python package for processing UMI tagged mixed amplicon metabarcoding data.

Installation

The current version of Caltha requires Python 3.8.

To install Caltha, simply run the pip install command:

pip install caltha

NOTE: Caltha requires one more dependency that cannot be installed with the Caltha PyPI package: vsearch (2.15.0).
Conda can be used to install it.
After installing either Anaconda or Miniconda, the following conda install command should install the dependency:

conda install -c bioconda vsearch=2.15.0
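
Because vsearch is installed separately from the Caltha package, it can help to confirm that it is reachable before starting a run. The following is a minimal sketch, not part of Caltha: the helper names find_vsearch and vsearch_version_banner are hypothetical, and it assumes the executable is named vsearch and is on the PATH.

```python
import shutil
import subprocess


def find_vsearch(executable="vsearch"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)


def vsearch_version_banner(executable="vsearch"):
    """Return the first line printed by `<executable> --version`.

    Returns None when the executable cannot be found.
    """
    path = find_vsearch(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Combine both streams, since the banner may be written to either one.
    output = (result.stdout + result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    banner = vsearch_version_banner()
    if banner is None:
        print("vsearch not found on PATH; install it with conda first.")
    else:
        print(banner)
```

If the banner reports a different version than the one Caltha expects, reinstalling vsearch with the conda command above is the simplest fix.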

How to run

Caltha can be run directly from the command line.

usage: caltha [-h] [-v] [-i FLINPUT] [-t FLTABULAR] [-z FLPREVALIDATION]
              [-b FLBLAST] [-f [{fasta,fastq}]] [-l [{umi5,umi3,umidouble}]]
              [-a [{primer,adapter,zero}]] [-u INTUMILENGTH] [-y FLTIDENTITY]
              [-c INTABUNDANCE] [-w STRFORWARD] [-r STRREVERSE]
              [-d STRDIRECTORY] [-@ INTTHREADS]

A python package for processing UMI tagged mixed amplicon metabarcoding data.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -i FLINPUT, --input FLINPUT
                        The input fasta/fastq file(s). This can either be a
                        zip archive or a single fasta/fastq file.
  -t FLTABULAR, --tabular FLTABULAR
                        The output tabular zip file.
  -z FLPREVALIDATION, --zip FLPREVALIDATION
                        The pre validation zip file.
  -b FLBLAST, --blast FLBLAST
                        The output blast zip file.
  -f [{fasta,fastq}], --format [{fasta,fastq}]
                        The format of the input file. (default: fasta)
  -l [{umi5,umi3,umidouble}], --location [{umi5,umi3,umidouble}]
                        Search for UMIs at the 5'-end, 3'-end or at the 5'-end
                        and 3'-end. (default: umi5)
  -a [{primer,adapter,zero}], --anchor [{primer,adapter,zero}]
                        Which anchor type to use. (default: primer)
  -u INTUMILENGTH, --length INTUMILENGTH
                        The length of the UMI sequence. (default: 5)
  -y FLTIDENTITY, --identity FLTIDENTITY
                        The identity percentage with which to perform
                        the validation. (default: 0.97)
  -c INTABUNDANCE, --abundance INTABUNDANCE
                        The minimum abundance of a sequence in order for
                        it to be included during validation. (default: 1)
  -w STRFORWARD, --forward STRFORWARD
                        The 5'-end anchor nucleotides.
  -r STRREVERSE, --reverse STRREVERSE
                        The 3'-end anchor nucleotides.
  -d STRDIRECTORY, --directory STRDIRECTORY
                        The location of the temporary working
                        directory (not created by Caltha). (default: .)
  -@ INTTHREADS, --threads INTTHREADS
                        The number of threads to run Caltha with. (default: 8)

This python package requires one extra dependency which can be easily
installed with conda (conda install -c bioconda vsearch=2.15.0).

Further documentation can be found here.
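
As an illustration of how the options above fit together, the sketch below assembles a caltha command line from Python. It only uses flags listed in the help text; the function name build_caltha_command, the file names and the anchor sequence are hypothetical placeholders, and which options Caltha treats as mandatory is an assumption here.

```python
import shlex


def build_caltha_command(
    input_path,
    tabular_zip,
    prevalidation_zip,
    blast_zip,
    fmt="fasta",
    location="umi5",
    anchor="primer",
    umi_length=5,
    identity=0.97,
    abundance=1,
    forward=None,
    reverse=None,
    directory=".",
    threads=8,
):
    """Build the argument list for a caltha run (defaults mirror the help text)."""
    command = [
        "caltha",
        "--input", str(input_path),
        "--tabular", str(tabular_zip),
        "--zip", str(prevalidation_zip),
        "--blast", str(blast_zip),
        "--format", fmt,
        "--location", location,
        "--anchor", anchor,
        "--length", str(umi_length),
        "--identity", str(identity),
        "--abundance", str(abundance),
        "--directory", str(directory),
        "--threads", str(threads),
    ]
    # Anchor nucleotides are only added when they are provided.
    if forward is not None:
        command += ["--forward", forward]
    if reverse is not None:
        command += ["--reverse", reverse]
    return command


if __name__ == "__main__":
    # Placeholder file names and anchor sequence, for illustration only.
    cmd = build_caltha_command(
        "reads.fastq", "tabular.zip", "prevalidation.zip", "blast.zip",
        fmt="fastq", forward="ACGT",
    )
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once Caltha and vsearch are installed; printing it first makes it easy to review the exact command.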

Package links

Source(s)

  • Langa L, Willing C, Meyer C, Zijlstra J, Naylor M, Dollenstein Z, Lees C,
    Black: The uncompromising Python code formatter.
    Black
  • Cock P, Antao T, Chang J, Chapman B, Cox C, Dalke A,
    Biopython: freely available Python tools for computational molecular biology and bioinformatics.
    Bioinformatics. 2009; 25(11): 1422-1423. doi: 10.1093/bioinformatics/btp163
    Biopython
  • Ziadé T, Cordasco I,
    Flake8: Your tool for style guide enforcement.
    Flake8
  • Turk J,
    Jellyfish: a python library for doing approximate and phonetic matching of strings.
    Jellyfish
  • Harris CR, Millman KJ, van der Walt SJ,
    Array programming with NumPy.
    Nature. 2020; 585: 357-362. doi: 10.1038/s41586-020-2649-2
    NumPy
  • Reback J, McKinney W, Mendel JB, van den Bossche J, Augspurger T, Cloud P,
    Pandas: powerful Python data analysis toolkit.
    Zenodo. 2020. doi: 10.5281/zenodo.4067057
    Pandas
  • Sottile A, Struys K, Kuehl C, Finkle M,
    Pre-commit: A framework for managing and maintaining multi-language pre-commit hooks.
    Pre-commit
  • Du L,
    Pyfastx: a robust Python module for fast random access to sequences from plain and gzipped FASTA/Q files.
    Pyfastx
  • Krekel H, Oliveira B, Hahler D, Pfannschmidt R, Benita R,
    Pytest: a framework making it easy to write small tests, yet scales to support complex functional testing for applications and libraries.
    Pytest
  • Coombs JR, Ziadé T, Eby PJ, Fulton J, Bicking I, Ippolito B,
    Setuptools: a library designed to facilitate packaging Python projects.
    Setuptools
  • Hatch T, van den Berg J, Luo X, Oberländer J,
    sre_yield: efficiently generate all values that can match a given regular expression.
    Google.
    sre_yield
  • Rognes T, Flouri T, Nichols B, Quince C, Mahe F,
    VSEARCH: a versatile open source tool for metagenomics.
    PeerJ. 2016; 4. doi: 10.7717/peerj.2584
    vsearch
  • Python Software Foundation,
    Python 3.8+. 2019.
    Python
  • Python Packaging Authority, Python Software Foundation,
    The Python Package Index. 2003.
    PyPI

Author(s)

Citation

Copyright (C) 2018 Jasper Boom

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License version 3 as
published by the Free Software Foundation.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
