cfoldseeker finds homologous gene clusters via protein structural similarity. It searches for structural homologs of your query protein structures using foldseek (both local and remote target databases are supported) and identifies the genomically colocalised hits among them by fetching the genomic location of each protein's coding sequence, either from various remote cross-referencing APIs or from a locally prepared database.
cfoldseeker has been designed as the structural similarity-driven sister tool of cblaster, which it tightly integrates for generating outputs. As such, cfoldseeker can naturally produce cblaster-style output and clinker visualisations.
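To make the idea of "genomically colocalised hits" more concrete, here is a minimal Python sketch of one way such a step could work. It is an illustration only, not cfoldseeker's actual implementation: the `Hit` record, the `colocalised_clusters` function and the `max_gap` and `min_queries` parameters are all hypothetical names and thresholds chosen for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record: one structural hit with its genomic location."""
    query: str    # query structure that produced the hit
    protein: str  # identifier of the target protein
    contig: str   # scaffold carrying the coding sequence
    start: int    # start coordinate of the coding sequence
    end: int      # end coordinate of the coding sequence


def colocalised_clusters(hits, max_gap=20_000, min_queries=2):
    """Group hits lying within max_gap bp of each other on the same contig.

    Only groups containing hits for at least min_queries distinct
    query structures are returned. Both thresholds are assumptions.
    """
    by_contig = defaultdict(list)
    for hit in hits:
        by_contig[hit.contig].append(hit)

    clusters = []
    for contig_hits in by_contig.values():
        contig_hits.sort(key=lambda h: h.start)
        current = [contig_hits[0]]
        for hit in contig_hits[1:]:
            # Extend the current group while the next hit starts close enough
            # to the furthest end coordinate seen so far.
            if hit.start - max(h.end for h in current) <= max_gap:
                current.append(hit)
            else:
                clusters.append(current)
                current = [hit]
        clusters.append(current)

    return [c for c in clusters if len({h.query for h in c}) >= min_queries]


hits = [
    Hit("qA", "p1", "ctg1", 1_000, 2_000),
    Hit("qB", "p2", "ctg1", 3_000, 4_000),
    Hit("qA", "p3", "ctg1", 90_000, 91_000),  # too far from p1 and p2
    Hit("qB", "p4", "ctg2", 500, 1_500),      # alone on its contig
]
for cluster in colocalised_clusters(hits):
    print([h.protein for h in cluster])  # prints ['p1', 'p2']
```

In this toy input only `p1` and `p2` sit close together and cover two different queries, so they form the single reported cluster.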
Tip
Although cfoldseeker can be used as a stand-alone tool, it is the structural similarity-based discovery engine of the ✨ csuite ✨, our new integrated toolbox featuring streamlined workflows for both sequence- and protein structure-based gene cluster mining. Try it out!
- A remote search mode for searches against the AlphaFoldDB, leveraging the Foldseek webserver and various cross-referencing APIs for fetching genomic locations (kegg_pull, UniProt ID mapping, ENA Browser API).
- A local search mode for searches against a local protein structure DB prepared with foldseek.
- A local-clustered search mode for searches against a local foldseek DB of representative proteins derived from a sequence set preclustered with MMseqs2. If the representative protein of a sequence cluster is identified as a homolog, all other members are added to the hit set.
- A helper tool to construct local genomic context databases: cfoldseeker-cds
- Tight integration with cblaster, facilitating similar output and interactive clinker visualisations
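The hit expansion used by the local-clustered mode can be pictured with a short Python sketch. This is an illustration under stated assumptions rather than cfoldseeker's own code: it assumes the cluster memberships are available as a two-column, tab-separated table of representative and member identifiers (the layout MMseqs2 can export as a TSV), and the function names `load_clusters` and `expand_hits` are hypothetical.

```python
from collections import defaultdict


def load_clusters(tsv_lines):
    """Map each representative to its members from 'representative<TAB>member' lines."""
    members = defaultdict(set)
    for line in tsv_lines:
        line = line.strip()
        if not line:
            continue
        representative, member = line.split("\t")[:2]
        members[representative].add(member)
    return members


def expand_hits(representative_hits, members):
    """Add every cluster member of each hit representative to the hit set."""
    expanded = set()
    for representative in representative_hits:
        expanded.add(representative)
        expanded.update(members.get(representative, ()))
    return expanded


# Toy cluster table: repA represents three proteins, repB represents two.
cluster_table = [
    "repA\trepA",
    "repA\tprot2",
    "repA\tprot3",
    "repB\trepB",
    "repB\tprot5",
]
clusters = load_clusters(cluster_table)

# Only repA was found as a structural homolog, so only its cluster is expanded.
print(sorted(expand_hits({"repA"}, clusters)))  # prints ['prot2', 'prot3', 'repA']
```

Searching only the representatives keeps the structure database small, while the expansion step restores the remaining cluster members to the hit set afterwards.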
For installation instructions, usage, a tutorial and more, head over to the cfoldseeker docs!
If you found cfoldseeker useful, please cite our manuscript:
De Vrieze, L., Masschelein, J. (2026) In preparation
cfoldseeker relies heavily on the following tools, so please give these proper credit as well.
Gilchrist, C.L.M., Booth, T.J., van Wersch, B., van Grieken, L., Medema, M.H., & Chooi, Y-H. (2021). cblaster: a remote search tool for rapid identification and visualisation of homologous gene clusters. Bioinformatics Advances, https://doi.org/10.1093/bioadv/vbab016
van Kempen, M., Kim, S.S., Tumescheit, C., Mirdita, M., Lee, J., Gilchrist, C.L.M., Söding, J., Steinegger, M. (2024). Fast and accurate protein structure search with Foldseek. Nature Biotechnology, 42, https://doi.org/10.1038/s41587-023-01773-0
Huckvale, E., Moseley, H.N.B. (2023). kegg_pull: a software package for the RESTful access and pulling from the Kyoto Encyclopedia of Gene and Genomes. BMC Bioinformatics, 24(78), https://doi.org/10.1186/s12859-023-05208-0
cfoldseeker is freely available under an MIT license.
Use of the third-party software, libraries or code referred to in the References section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.
