Taxonomy‑aware protein naming by fast homology search with DIAMOND. Brownaming assigns concise names to predicted proteins by prioritizing homologs from the closest available taxa and only expanding outward in the taxonomy when needed.
Goal: give each query protein the most specific, biologically meaningful name supported by homology.
- Iterative taxonomy expansion: start at the target species; if no hit is found, move up one rank (genus → family → order ...)
- Prioritize candidates by (1) taxonomic distance (constant within an iteration, so it only varies across iterations), then (2) DIAMOND bitscore, then (3) % identity
- If nothing found up to configured rank → "Uncharacterized protein"
- Fully local: UniProt Swiss‑Prot + TrEMBL DIAMOND database with taxonomy metadata
- Optional exclusion of unwanted taxa
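The expansion-and-ranking loop above can be sketched roughly as follows. This is illustrative only: `best_name`, `search_fn`, and the hit-tuple layout are assumptions, not Brownaming's actual internals.

```python
def best_name(query_id, lineage, search_fn, last_taxid=None):
    """Walk up the lineage (species -> genus -> family -> ...) until hits appear.

    lineage: TaxIDs ordered from the target species upward.
    search_fn(query_id, taxid): hits as (name, bitscore, pct_identity) tuples,
    restricted to the subtree of `taxid` minus already-searched taxa.
    """
    for taxid in lineage:
        hits = search_fn(query_id, taxid)
        if hits:
            # Within one iteration the taxonomic distance is fixed, so rank
            # candidates by bitscore first, then by % identity.
            name, _bitscore, _identity = max(hits, key=lambda h: (h[1], h[2]))
            return name
        if taxid == last_taxid:  # configured stopping rank (cf. --last-tax)
            break
    return "Uncharacterized protein"
```

The fallback string is only returned once the configured stopping rank (or the root of the lineage) has been searched without a hit.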
- Linux environment for database build (tested on Ubuntu; search also works under WSL)
- DIAMOND ≥ 2.1.x compiled with taxonomy support
- curl, awk, gzip, coreutils (for the provided build script)
- Disk space: ≈ 120–150 GB (TrEMBL + Swiss‑Prot FASTA + DIAMOND index + temporary files)
- RAM: ≥ 32 GB recommended for faster makedb sorting (less will still work, slower)
- Python ≥ 3.9
- Conda or Mamba (recommended for dependency management)
See environment.yml for the complete list of dependencies. Main packages include:
- BioPython
- NumPy, Pandas
- scikit-learn
- openpyxl
- matplotlib
- requests
Before using Brownaming, you must create the local DIAMOND database:
- Clone the repository:

```bash
git clone <repo_url>
cd Brownaming
```

- Create or edit config.json:

```json
{
  "local_db_path": "/path/to/brownaming_db"
}
```

- Run the database build script:

```bash
./create_local_db.sh
```

What it does:
- Downloads UniProt Swiss‑Prot + TrEMBL (current release)
- Extracts TaxIDs from FASTA headers (OX=)
- Generates taxonmap.tsv and taxonomy JSON caches (parent/rank/children)
- Builds two DIAMOND databases:
- full (Swiss‑Prot + TrEMBL)
- swissprot (Swiss‑Prot only)
Duration: ~8 h
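The TaxID extraction step relies on the `OX=` field that UniProt puts in every FASTA header. A minimal sketch of the idea (the actual build script uses awk; `taxonmap_rows` is a name invented here):

```python
import re

OX_RE = re.compile(r"\bOX=(\d+)")

def taxonmap_rows(fasta_lines):
    """Yield (accession, taxid) pairs from UniProt FASTA headers.

    Headers look like:
    >sp|P12345|NAME_HUMAN Some protein OS=Homo sapiens OX=9606 GN=...
    """
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        header = line[1:]
        accession = header.split("|")[1] if "|" in header else header.split()[0]
        m = OX_RE.search(header)
        if m:
            yield accession, int(m.group(1))
```

Each yielded pair becomes one row of taxonmap.tsv, which DIAMOND consumes when the taxonomy-aware databases are built.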
- Create the conda environment:

```bash
conda env create -f environment.yml
conda activate brownaming
```

```bash
# Basic run
python main.py -p /path/to/query.fasta -s 83333 --threads 16

# SwissProt only
python main.py -p /path/to/query.fasta -s 83333 --swissprot-only

# Specify database path explicitly (overrides config.json)
python main.py -p /path/to/query.fasta -s 83333 --local-db /custom/path/to/db

# Specify custom final output directory
python main.py -p /path/to/query.fasta -s 83333 --working-dir /custom/output/path

# Resume run
python main.py --resume 2026-02-24-14-30-83333
```

Brownaming always executes in: runs/YYYY-MM-DD-HH-MM-TAXID/
If --working-dir is provided, the run directory is moved to that destination only after successful completion.
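That move-on-success behaviour can be sketched as below; `finalize_run` is a hypothetical helper, not Brownaming's actual function.

```python
import shutil
from pathlib import Path
from typing import Optional

def finalize_run(run_dir: Path, working_dir: Optional[Path], succeeded: bool) -> Path:
    """Move a completed run directory to --working-dir, but only on success.

    Failed or interrupted runs stay under runs/ so --resume can pick them up.
    """
    if succeeded and working_dir is not None:
        working_dir.mkdir(parents=True, exist_ok=True)
        dest = working_dir / run_dir.name
        shutil.move(str(run_dir), str(dest))
        return dest
    return run_dir
```

Keeping failed runs in place is what makes resuming by run ID possible.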
After running several analyses, update the time prediction model:
```bash
# Activate environment (if not already active)
conda activate brownaming

# Generate updated dataset from all run logs
cd time_prediction_model/
python create_data.py

# Retrain the model
python train_model.py
```

Docker provides an isolated, reproducible environment.
```bash
docker build -t brownaming .
```

The wrapper script brownaming-compose automatically:
- Reads the database path from config.json
- Detects and mounts the directory containing your input file
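The mount logic amounts to mapping the input file's host directory to a path inside the container. A sketch of the idea (the real wrapper is a shell script; `mount_for_input` and the `/data` mount point are assumptions made here):

```python
import os
from pathlib import Path, PurePosixPath

def mount_for_input(input_path, container_root="/data"):
    """Derive the docker bind mount for an input file.

    Returns (host_dir, container_path): bind-mount the file's directory at
    `container_root` and address the file by that path inside the container.
    """
    p = Path(os.path.abspath(input_path))
    return str(p.parent), str(PurePosixPath(container_root) / p.name)
```

This is also why absolute input paths are required: the wrapper needs an unambiguous host directory to mount.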
Make the script executable:

```bash
chmod +x brownaming-compose
```

Use absolute paths for your input files:
```bash
# Basic run
./brownaming-compose run --rm brownaming \
    python main.py -p /absolute/path/to/query.fasta -s 83333 --threads 16

# SwissProt only
./brownaming-compose run --rm brownaming \
    python main.py -p /absolute/path/to/query.fasta -s 83333 --swissprot-only

# Specify database path explicitly (overrides config.json)
./brownaming-compose run --rm brownaming \
    python main.py -p /absolute/path/to/query.fasta -s 83333 --local-db /path/to/db

# Specify custom final output directory
./brownaming-compose run --rm brownaming \
    python main.py -p /absolute/path/to/query.fasta -s 83333 --working-dir /custom/output/path

# Resume run
./brownaming-compose run --rm brownaming \
    python main.py --resume 2026-02-24-14-30-83333
```

Brownaming always executes in: runs/YYYY-MM-DD-HH-MM-TAXID/
If --working-dir is provided, the run directory is moved to that destination only after successful completion.
After running several analyses, update the time prediction model:
```bash
# Generate updated dataset from all run logs
./brownaming-compose run --rm brownaming \
    python time_prediction_model/create_data.py

# Retrain the model
./brownaming-compose run --rm brownaming \
    python time_prediction_model/train_model.py
```

The updated model is immediately available: no rebuild needed. The more analyses you run, the more accurate the time estimates become.
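To illustrate what retraining such a model can look like: the sketch below fits a linear regression from run-log features to runtime. The feature choice (protein count, total residues), the numbers, and the model class are all assumptions; the real create_data.py/train_model.py may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data extracted from run logs:
# features = (number of proteins, total residues), target = runtime in minutes.
X = np.array([[100, 40_000],
              [500, 210_000],
              [1000, 390_000],
              [2000, 820_000]], dtype=float)
y = np.array([6.0, 31.0, 58.0, 122.0])

model = LinearRegression().fit(X, y)

# Estimate runtime for a new proteome before launching it.
estimate = float(model.predict(np.array([[1500, 600_000]]))[0])
```

Every completed run adds a row to the training set, which is why estimates improve as more analyses accumulate.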
Required:
- -p / --proteins : Query protein FASTA
- -s / --species : NCBI TaxID of target species (root of initial search)
Optional:
- --local-db : Path to local database directory (overrides LOCAL_DB_PATH env var and config.json)
- --working-dir : Final output directory (optional). Computation still runs in script_dir/runs/<run_id> and is moved there at the end if successful.
- --threads : DIAMOND threads (default: all)
- --last-tax : Stop expanding after this specific TaxID is reached.
- --ex-tax : TaxID to exclude. For multiple exclusions, use this flag multiple times; each instance excludes the specified taxon and its subtree.
- --swissprot-only: Run DIAMOND searches only on the SwissProt database.
- --run-id <custom_id> : Custom run ID (optional, default: YYYY-MM-DD-HH-MM-TAXID). Useful for integration with external systems.
- --resume <run_id> : Resume a previous run using its run ID (format: YYYY-MM-DD-HH-MM-TAXID)
When using --resume, only the run_id is required. Brownaming reloads saved parameters from runs/<run_id>/state_args.json
Example:
```bash
python main.py --resume <run_id>
```

Brownaming determines the database location in the following order:
1. --local-db command-line argument
2. LOCAL_DB_PATH environment variable
3. local_db_path in config.json
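That precedence can be sketched as a simple fall-through; `resolve_db_path` is a name invented for this example, not Brownaming's actual function.

```python
import json
import os
from pathlib import Path

def resolve_db_path(cli_arg=None, config_file="config.json"):
    """Resolve the database directory: CLI flag, then env var, then config.json."""
    if cli_arg:                                  # 1. --local-db
        return Path(cli_arg)
    env = os.environ.get("LOCAL_DB_PATH")
    if env:                                      # 2. environment variable
        return Path(env)
    cfg = json.loads(Path(config_file).read_text())
    return Path(cfg["local_db_path"])            # 3. config.json
```

Passing --local-db therefore always wins, regardless of what the environment or config file says.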
- query_file_name_brownamed.fasta : FASTA file with updated headers containing the assigned names.
- query_file_name_diamond_results.xlsx : Excel table listing, for each query protein, the match used for naming, including homology scores (identity, evalue, bitscore, ...), and the rank and name of the last common ancestor.
- query_file_name_brownaming_stats.png : Statistics figure showing the progression through taxonomic ranks.
- YYYY-MM-DD-HH-MM-TAXID.log : Complete log file of the run (in the run directory).
To update the local database, re-run the build script:

```bash
./create_local_db.sh --refresh
```