Releases · gbouras13/phold

Minor bugfix when running Phold with --cpu #123
Add support for HuggingFace database download from https://huggingface.co/datasets/gbouras13/phold-db (way faster than Zenodo)
Fix #124


v1.2.2 (gbouras13, 31 Jan 12:32, commit 55e31eb)

Minor bugfix to make Phold compatible with transformers v5 and pandas v3


v1.2.1 (gbouras13, 18 Jan 23:14, commit 2427da3)

Minor bugfix to ensure the ordering of ProstT5 confidence outputs matches the input .faa (outputs were length-sorted, a side effect of the batching introduced in v1.2.0)
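This class of bug is a common pitfall when batching by sequence length. A minimal Python sketch of the general pattern, not Phold's actual code: the function `predict_in_length_sorted_batches` and the `fake_predict` stand-in for ProstT5 are hypothetical. Proteins are sorted by length so each batch holds similar-length sequences, and each result is then written back to its original index so the output follows the input .faa order.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def predict_in_length_sorted_batches(
    proteins: Sequence[Tuple[str, str]],
    predict_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 4,
) -> List[Tuple[str, List[float]]]:
    """Run predict_batch on length-sorted batches; return results in input order."""
    # Sort indices by sequence length so each batch holds similar-length proteins
    order = sorted(range(len(proteins)), key=lambda i: len(proteins[i][1]))
    results: Dict[int, List[float]] = {}
    for start in range(0, len(order), batch_size):
        batch_idx = order[start:start + batch_size]
        outputs = predict_batch([proteins[i][1] for i in batch_idx])
        # Map each output back to the original index of its protein
        for i, out in zip(batch_idx, outputs):
            results[i] = out
    # Emit in the original input order, not the length-sorted order
    return [(proteins[i][0], results[i]) for i in range(len(proteins))]


def fake_predict(seqs: List[str]) -> List[List[float]]:
    # Hypothetical stand-in for ProstT5: one dummy confidence value per residue
    return [[float(len(s))] * len(s) for s in seqs]


proteins = [("p1", "MKVLAAG"), ("p2", "MA"), ("p3", "MKV")]
out = predict_in_length_sorted_batches(proteins, fake_predict, batch_size=2)
print([name for name, _ in out])  # ['p1', 'p2', 'p3']
```

Keeping the original index alongside each batch member is what lets the sorted processing order differ from the output order.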


v1.2.0 (gbouras13, 08 Jan 05:19, commit 5f704db)

Improved ProstT5 3Di prediction throughput for phold run, phold predict and phold proteins-predict thanks to smarter batching implementations
Adds the phold autotune subcommand to detect an appropriate --batch_size for your hardware
You can also use --autotune with phold run, phold predict and phold proteins-predict to automatically detect and use the optimal --batch_size (only recommended for large datasets with thousands of proteins)
Manuscript now published:

Bouras G., Grigson S.R., Mirdita M., Heinzinger M., Papudeshi B.,
Mallawaarachchi V., Green R., Kim S.R., Mihalia V., Psaltis A.J.,
Wormald P-J., Vreugde S., Steinegger M., Edwards R.A.

Protein Structure Informed Bacteriophage Genome Annotation with Phold
Nucleic Acids Research, Volume 54, Issue 1, 13 January 2026
https://doi.org/10.1093/nar/gkaf1448


v1.1.0 (gbouras13, 06 Nov 23:49, commit 3528bfc)

Integration with suvtk to make it easier to submit Pharokka and Phold annotated genomes to GenBank - thanks to @LanderDC for suvtk and the integration. See https://github.com/gbouras13/phold?tab=readme-ov-file#genbank-submission for more details
Adds --restart parameter to complete large phold compare jobs #79

Contributors

LanderDC


v1.0.0 (gbouras13, 06 Aug 06:42, commit 62db1e5)

Major Phold release to go with the preprint. For more details, see the preprint and updated documentation.

You will need to re-install the updated Phold search database with phold install to be compatible with v1.0.0

Major Changes

The Phold search database has been modified, filtered and curated to contain 1,363,704 protein structures with functional labels (see https://zenodo.org/records/16741548). In particular, since the previous release of Phold, the enVhogs were re-clustered and re-labelled by the authors of that work. This release contains the updated enVhog structures.
We additionally make available a larger database containing 3,166,602 structures (i.e. the Phold search database plus an extra 1.8M efam and enVhog proteins without PHROG assignment or functional label), which can be downloaded using phold install --extended_db. Compared with the default Phold search database, it yields marginally fewer functional annotations and takes longer to search, so it is not recommended for functional annotation. However, it finds more hits overall (including to proteins of unknown function), so it may be of interest for viral identification tasks.
PHROG functional labels have been updated for 2,798 PHROGs using manual curation informed by structural similarity searches. See the preprint for more details. The updated annotations are available in the phold database under phold_annots.tsv
The Phold search database is no longer pre-clustered, as clustering was shown not to significantly change sensitivity or runtime for the updated database.
Phold supports Foldseek-GPU acceleration for NVIDIA GPUs using --foldseek_gpu. Note that you should still run Phold with multiple CPU threads (e.g. -t 8, or however many threads you have available), as GPU acceleration only accelerates and improves Foldseek's prefilter stage.
Phold supports custom user-specified Foldseek databases with --custom_db.
Phold adds high, medium and low confidence annotation heuristics to guide users (especially those from wet-lab backgrounds or without much familiarity with protein structural alignment metrics) on which annotations can be trusted with a very high degree of confidence, and which should be prioritised for manual curation. See the documentation for more.
Phold now masks all residues with a ProstT5 confidence below 25 by default (this can be changed with --mask_threshold), as masking was shown to improve annotation performance compared to no masking.
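To make the masking rule concrete, here is a minimal Python sketch, not Phold's implementation. It assumes per-residue ProstT5 confidences on a 0-100 scale and that masked positions are replaced by a mask character; the function name `mask_low_confidence` and the choice of "X" as the mask character are hypothetical. The default threshold of 25 mirrors the default described above for --mask_threshold, and "below" is read as strictly less than.

```python
from typing import Sequence


def mask_low_confidence(
    three_di: str,
    confidence: Sequence[float],
    threshold: float = 25.0,  # mirrors the documented --mask_threshold default
    mask_char: str = "X",     # hypothetical mask character
) -> str:
    """Replace 3Di characters whose confidence is strictly below threshold."""
    if len(three_di) != len(confidence):
        raise ValueError("3Di string and confidence list must have the same length")
    return "".join(
        mask_char if conf < threshold else state
        for state, conf in zip(three_di, confidence)
    )


print(mask_low_confidence("DVLVPD", [90.0, 12.5, 40.0, 24.9, 25.0, 80.0]))
# DXLXPD
```

A residue exactly at the threshold (25.0 above) is kept, so only confidences strictly below 25 are masked.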
If you only want to annotate hypothetical proteins from Pharokka to save runtime and resource usage, you can use --hyps
You can run Phold with fine-tuned ProstT5 models using --finetune (phage fine-tuned ProstT5 encoder and phage fine-tuned CNN) or --vanilla (phage fine-tuned ProstT5 encoder and vanilla PDB-based CNN). Annotation performance with these does not differ dramatically from the default ProstT5 (see the preprint), but they may be of interest to some users of Phold.


v0.2.0 (gbouras13, 13 Jul 06:20, commit 14e916c)

You will need to re-install the updated phold database for v0.2.0 using phold install
You will also need to upgrade Foldseek to v9.427df8a

v0.2.0 is a very large update adding:

Improved sensitivity and faster runtime for the Foldseek search. This is achieved by clustering the Phold database at --min-seq-id 0.3 -c 0.8 and creating a cluster database before running Foldseek, which significantly improves runtime
- Overall, just over 1.1M structures are clustered into around 372k clusters
--cluster-search 1 parameter is added to foldseek search to search against the cluster representatives first and then within each cluster, which increases sensitivity and reduces resource usage compared to phold v0.1.4
Changed default --max_seqs from 1000 to 10000 to improve sensitivity at little resource usage cost
Phold database is expanded adding:
- Extremely conservative high confidence efam proteins with hits to PHROGs.
- 95% dereplicated diversity-generating retroelements (DGRs) from Roux et al.
- 7153 netflax toxin-antitoxin system proteins from Ernits et al.
Adds --ultra_sensitive flag which turns off Foldseek prefiltering for maximum sensitivity. Recommended for small datasets/single phages only.
- This passes the --exhaustive-search parameter to foldseek search
Adds the ability to save ProstT5 embeddings with --save_per_residue_embeddings and --save_per_protein_embeddings
Adds support for .cif structures (e.g. from the AlphaFold3 server) in addition to the .pdb file format, with CLI changes to reflect this
Removes some experimental parameters from v0.1.4 (--split etc.)

Breaking CLI parameter changes

--pdb has changed to --structures
--pdb_dir has changed to --structure_dir
--filter_pdbs has changed to --filter_structures


v0.1.4 (gbouras13, 26 Mar 00:29, commit ac2716e)

Fixes #31, an issue with older Pharokka GenBank input (prior to v1.5.0) that lacked the 'transl_table' field, thanks @btemperton
- All Pharokka GenBank input prior to v1.5.0 will be transl_table 11 (these versions predate the addition of pyrodigal-gv)
Fixes a GenBank parsing bug that occurred if the ID/locus tag of the CDS features in the input GenBank file was longer than 54 characters

Contributors

btemperton
