Rkmeans is a small interactive R/Shiny application for structural clustering of molecular files (.cif or .pdb) based on RMSD and k-means clustering.
This app was born from the need to cluster AlphaFold 3 output structures, especially for R users. At the time, no simple and direct online tool allowed structural clustering of .cif files.
This tool is best suited for clustering a small number of predicted structures (e.g., 5–100 models), with the number of clusters (k) chosen in proportion to the number of input structures.
For example:
- If you input 10 structures, trying k = 2–4 is sensible.
- For 30 structures, you might test k = 3–6.
- For 50+ structures, k = 10 may be appropriate.
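One way to compare candidate values of k is to run k-means over a small range and watch how the total within-cluster sum of squares drops. The sketch below uses base R only. The `coords` matrix is synthetic stand-in data (three well-separated groups of ten points), not output from the app, and `scan_k` is a hypothetical helper name.

```r
# Synthetic stand-in for low-dimensional coordinates of 30 structures
# (3 groups of 10 points each); not real structural data.
set.seed(1)
centers <- rbind(c(0, 0), c(10, 0), c(0, 10))
coords <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(10, centers[i, 1], 0.5), rnorm(10, centers[i, 2], 0.5))
}))

# Hypothetical helper: total within-cluster sum of squares for each k.
scan_k <- function(x, ks, nstart = 25) {
  wss <- vapply(ks, function(k) {
    kmeans(x, centers = k, nstart = nstart)$tot.withinss
  }, numeric(1))
  data.frame(k = ks, tot_withinss = wss)
}

k_scan <- scan_k(coords, ks = 2:6)
print(k_scan)
```

A sharp drop followed by a plateau suggests a reasonable k; with real models the curve is usually less clear-cut, so treat it as a guide rather than a rule.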
- Load multiple .pdb or .cif files from a folder.
- Automatically convert AlphaFold .cif files to .pdb (via Python and Biopython).
- Calculate pairwise RMSD using alpha carbons (Cα).
- Run multidimensional scaling (MDS) and k-means clustering.
- Visualize 3D structures with r3dmol directly in the app.
- Show cluster details (size, representative structure).
- Save cluster representatives.
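The core steps in the list above (Cα RMSD, MDS, k-means) can be sketched in base R. This is an illustration under stated assumptions, not the app's code: the app uses bio3d to read and superpose structures, whereas here `kabsch_rmsd` is a hypothetical helper that superposes two coordinate sets with the Kabsch algorithm, and the "models" are synthetic 20-atom coordinate sets.

```r
# RMSD between two n x 3 coordinate matrices after optimal superposition
# (Kabsch algorithm). Hypothetical helper, not part of the app.
kabsch_rmsd <- function(P, Q) {
  P <- sweep(P, 2, colMeans(P))            # centre both sets on the origin
  Q <- sweep(Q, 2, colMeans(Q))
  s <- svd(t(P) %*% Q)                     # SVD of the 3 x 3 covariance matrix
  d <- sign(det(s$v %*% t(s$u)))           # guard against improper rotations
  R <- s$v %*% diag(c(1, 1, d)) %*% t(s$u) # rotation taking P onto Q
  sqrt(mean(rowSums((P %*% t(R) - Q)^2)))
}

# Synthetic stand-ins for Calpha coordinates: two base shapes, 5 noisy copies each.
set.seed(42)
base_a <- matrix(rnorm(60, sd = 5), ncol = 3)
base_b <- matrix(rnorm(60, sd = 5), ncol = 3)
noisy <- function(m) m + matrix(rnorm(length(m), sd = 0.3), ncol = 3)
models <- c(replicate(5, noisy(base_a), simplify = FALSE),
            replicate(5, noisy(base_b), simplify = FALSE))
names(models) <- sprintf("model_%02d", seq_along(models))

# Pairwise RMSD matrix (symmetric, zero diagonal).
n <- length(models)
rmsd_mat <- matrix(0, n, n, dimnames = list(names(models), names(models)))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    rmsd_mat[i, j] <- rmsd_mat[j, i] <- kabsch_rmsd(models[[i]], models[[j]])
  }
}

# Classical MDS into 2 dimensions, then k-means on the embedded points.
mds <- cmdscale(as.dist(rmsd_mat), k = 2)
km <- kmeans(mds, centers = 2, nstart = 25)
print(table(km$cluster))
```

Running k-means on MDS coordinates rather than on the RMSD matrix directly is needed because k-means works on points in a vector space, while RMSD only provides pairwise distances.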
Install these packages if you don’t have them already (tools ships with base R, so it does not need to be installed):
install.packages(c(
"shiny", "shinyFiles", "bio3d", "r3dmol",
"plotly", "dplyr", "readr", "tibble", "ggplot2"
))
If you want to use the 3D structure viewer with .cif files (especially those downloaded from AlphaFold), you'll need Python and Biopython installed. This is only required for visualization, not for clustering.
pip install biopython
Make sure Python is installed and accessible from R via:
Sys.which("python")
If you're only interested in the clustering results and don't need structure visualization, you can skip this step.
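The Python check above can be extended to confirm that Biopython is importable from the interpreter R finds. The following is a minimal sketch: `check_python` is a hypothetical helper, and also trying `python3` is an assumption added here because some systems do not provide a plain `python` executable.

```r
# Hypothetical helper: report whether Python and Biopython can be found from R.
check_python <- function(candidates = c("python", "python3")) {
  paths <- Sys.which(candidates)
  paths <- paths[nzchar(paths)]            # Sys.which() returns "" when not found
  if (length(paths) == 0) {
    return(list(python = NA_character_, biopython = FALSE))
  }
  py <- unname(paths[1])
  # Exit status 0 means `import Bio` succeeded in that interpreter.
  status <- suppressWarnings(
    system2(py, c("-c", shQuote("import Bio")), stdout = FALSE, stderr = FALSE)
  )
  list(python = py, biopython = identical(as.integer(status), 0L))
}

res <- check_python()
str(res)
```

If `python` is `NA` or `biopython` is `FALSE`, clustering should still be possible according to the note above; only the .cif viewer depends on it.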
- Clone this repository or download the app files.
- Place your .cif or .pdb files in a folder (e.g., structures/).
- Launch the app in R:
  shiny::runApp("path_to_app_folder")
- In the app:
  - Select a folder containing .cif or .pdb files.
  - The app will align the structures, calculate the RMSD matrix, and perform k-means clustering.
  - You can choose the number of clusters (k) manually.
  - Once clustering is complete, you'll see:
    - A summary table showing cluster sizes, within-cluster RMSD, and medoid filenames.
    - A 3D scatter plot of the structures in reduced dimensions (via MDS).
    - Optional 3D viewer for inspecting any selected structure.
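To make the summary table concrete, here is a minimal base R sketch of how cluster size, within-cluster RMSD, and medoid could be derived from an RMSD matrix and a cluster assignment. The matrix values, file names, and `summarise_clusters` helper are invented for illustration. Taking "within-cluster RMSD" as the mean over distinct pairs is an assumption, since the app's exact definition is not stated here.

```r
# Toy symmetric RMSD matrix for six files (values invented for illustration).
files <- sprintf("model_%d.pdb", 1:6)
rmsd_mat <- matrix(c(
  0.0, 0.5, 0.6, 4.0, 4.2, 4.1,
  0.5, 0.0, 0.4, 4.1, 4.3, 4.0,
  0.6, 0.4, 0.0, 3.9, 4.0, 4.2,
  4.0, 4.1, 3.9, 0.0, 0.7, 0.9,
  4.2, 4.3, 4.0, 0.7, 0.0, 0.8,
  4.1, 4.0, 4.2, 0.9, 0.8, 0.0
), nrow = 6, byrow = TRUE, dimnames = list(files, files))
cluster <- c(1, 1, 1, 2, 2, 2)  # assumed cluster assignment

# Hypothetical helper: one row per cluster.
summarise_clusters <- function(d, cluster) {
  rows <- lapply(sort(unique(cluster)), function(cl) {
    idx <- which(cluster == cl)
    sub <- d[idx, idx, drop = FALSE]
    data.frame(
      cluster = cl,
      size = length(idx),
      # Mean over distinct pairs; NA for singleton clusters.
      mean_rmsd = if (length(idx) > 1) mean(sub[upper.tri(sub)]) else NA_real_,
      # Medoid: the member with the smallest summed RMSD to the others.
      medoid = rownames(sub)[which.min(rowSums(sub))],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

summary_tbl <- summarise_clusters(rmsd_mat, cluster)
print(summary_tbl)
```

The medoid is an actual input file rather than an averaged structure, which is why it can be saved directly as the cluster representative.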
This is not a polished piece of software — just a tool I made for myself to cluster structural models. Sharing it here in case it helps others using R for protein analysis.
MIT License.
Ivan Sanchis, PhD
Laboratorio de Péptidos Bioactivos
Facultad de Bioquímica y Ciencias Biológicas
Universidad Nacional del Litoral
Santa Fe, Argentina
📧 sanchisivan@gmail.com / sanchisivan@fbcb.unl.edu.ar