SSAlign is an ultra-fast and highly sensitive protein search tool designed to retrieve the most similar proteins from large databases. It leverages protein language models to represent sequence and structure information, and supports multi-GPU and multi-process execution. SSAlign performs batched searches for many query proteins using a two-stage alignment pipeline (prefilter + alignment refinement). In our benchmarks (vs. Foldseek and TM-align), SSAlign achieves strong sensitivity with much higher throughput.
- Two-stage search: fast prefilter + refinement alignment.
- Scales to very large databases with multi-GPU / multi-process acceleration.
- Benchmarks included: SwissProt, SCOPe40, AFDB50.
```bash
git clone https://github.com/ISYSLAB-HUST/SSAlign.git
cd SSAlign
conda env create -f env.yml
conda activate SSAlign
```

- Or submit jobs directly on the web server:
http://bioinfo.isyslab.info/ssalign/search/
Download or generate the required intermediate files:

- Protein structures
  - SwissProt structures → `pdbData/pdb/SwissProt`
  - SCOPe40 structures → `pdbData/pdb/SCOPe40`
- Models
  - Download `SaProt_650M_AF2.pt` from https://huggingface.co/westlake-repl/SaProt_650M_AF2 and place it under `models/`
- Foldseek databases (optional)
  - Download the Foldseek databases for SCOPe40 / SwissProt / AFDB50 and place them under `models/foldseekDB`
  - References: http://bioinfo.isyslab.info/ssalign/download/section/ssalign/ and https://github.com/steineggerlab/foldseek
- SSAlign databases (recommended for local runs)
  - Place the downloaded SSAlignDB under `models/SSAlignDB/SwissProt` (the folder name should match what the repo scripts expect)
- Generate databases locally (alternative)
  - SwissProt / SCOPe40: `SwissProt/processDB.py`, `SCOPe40/processDB.py`
  - AFDB50: `AFDB50/build_indexDB.py`, `AFDB50/build_faiss.py`, `AFDB50/AFDB50fasta_whiteing.py`
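The database-building scripts are not described in detail here. As a rough illustration of what an `IndexFlatIP` database does, the sketch below assumes (hypothetically) that per-protein embeddings are reduced to `dim` dimensions with a truncated SVD, which the `SVD{dim}` result folders suggest but which is not confirmed, and then L2-normalized, so that inner-product search ranks targets by cosine similarity. It uses NumPy only; the function names are illustrative and not part of the repo.

```python
import numpy as np


def fit_svd_projection(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Return a (D, dim) matrix projecting D-dim embeddings onto `dim` dims.

    Hypothetical: assumes a plain truncated SVD (no mean-centering) of the
    database embedding matrix; the repo's scripts may reduce dimensions differently.
    """
    # Rows of vt are right-singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(embeddings, full_matrices=False)
    return vt[:dim].T


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so inner product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def flat_ip_search(db: np.ndarray, queries: np.ndarray, k: int):
    """Exhaustive top-k inner-product search, i.e. what faiss.IndexFlatIP computes."""
    scores = queries @ db.T                    # (n_queries, n_db) similarity matrix
    idx = np.argsort(-scores, axis=1)[:, :k]   # indices of the k best hits per query
    return np.take_along_axis(scores, idx, axis=1), idx


# Usage with random stand-in embeddings (SaProt 650M embeddings would be 1280-dim).
rng = np.random.default_rng(0)
db_raw = rng.normal(size=(200, 64))
proj = fit_svd_projection(db_raw, 16)
db = l2_normalize(db_raw @ proj)
scores, idx = flat_ip_search(db, db[:3], k=5)  # query with the first three entries
```

With real data, `faiss.IndexFlatIP(dim)` followed by `index.add(db)` and `index.search(queries, k)` performs the same exhaustive search far more efficiently.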
- To run a search, use `run_SSAlign.py` and select the database with the `--db` parameter. Alternatively, run the database-specific scripts:
  - SwissProt: `SiwssPort/SiwssPort_SSAlign.py`
  - SCOPe40: `SCOPe40/SCOPe40_SSAlign.py`
  - AFDB50: `AFDB50/AFDB50_SSAlign.py`

| Option | Description |
|---|---|
| `--db` | Database to search. Currently supported: `afdb50`. |
| `--querypdbs` | Query structure file(s) (PDB) to search. |
| `--prefilter_target` | Number of candidates retained by the SSAlign-prefilter stage (default: 2000). Larger values are slower; the usable maximum may depend on the index type. |
| `--prefilter_threshold` | Score cutoff in the prefilter stage that decides which candidates are returned directly and which are re-ranked by SAligner (default: 0.3). |
| `--max_target` | Maximum number of final results returned (default: 1000). Should not exceed `--prefilter_target`. |
| `--mode` | Operation mode: `0` = prefilter only; `1` = full two-stage pipeline. |
| `--prefilter_mode` | Run the FAISS-based prefilter stage on CPU or GPU (sharded across multiple GPUs) (default: `cpu`; choices: `cpu`, `gpu`). |
| `--out_dir` | Output directory for search results. |
| `--nproc` | Number of parallel threads for the SAligner stage (default: 64). |
| `--cuda_device` | CUDA device used to run the SaProt model (e.g., `cuda:0`). |
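To make the interplay of `--mode`, `--prefilter_target`, `--prefilter_threshold` and `--max_target` concrete, here is a hypothetical plain-Python sketch. It assumes that prefilter candidates scoring at or above the threshold are returned directly and that the remaining candidates are re-ranked by an aligner score. `two_stage_search` and `rescore` are illustrative names; the real pipeline may also drop low-scoring candidates or merge the two groups differently.

```python
from typing import Callable, List, Sequence, Tuple


def two_stage_search(
    prefilter_hits: Sequence[Tuple[str, float]],
    rescore: Callable[[str], float],
    mode: int = 1,
    prefilter_target: int = 2000,
    prefilter_threshold: float = 0.3,
    max_target: int = 1000,
) -> List[str]:
    """Hypothetical sketch of how the options could interact.

    prefilter_hits: (target_id, prefilter score) pairs from the embedding search.
    rescore: stand-in for SAligner; maps a target_id to an alignment score.
    """
    # Stage 1: keep the `prefilter_target` best candidates by prefilter score.
    candidates = sorted(prefilter_hits, key=lambda hit: hit[1], reverse=True)
    candidates = candidates[:prefilter_target]
    if mode == 0:
        # Prefilter only: report candidates in prefilter order.
        return [target for target, _ in candidates[:max_target]]
    # Stage 2: confident hits pass straight through; the rest are re-ranked.
    direct = [t for t, s in candidates if s >= prefilter_threshold]
    uncertain = [t for t, s in candidates if s < prefilter_threshold]
    reranked = sorted(uncertain, key=rescore, reverse=True)
    return (direct + reranked)[:max_target]


hits = [("a", 0.9), ("b", 0.1), ("c", 0.5), ("d", 0.2)]
aligner_scores = {"b": 10.0, "d": 30.0}
print(two_stage_search(hits, aligner_scores.__getitem__, prefilter_target=4, max_target=3))
# -> ['a', 'c', 'd']
```

Raising the threshold moves more candidates into the re-ranking branch, which matches the trade-off described for `prefilter_threshold`: more aligner work in exchange for fewer unverified hits.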
All tests were run on a server with:
- CPU: Intel Xeon Gold 6133 × 2 (40 cores / 80 threads), 2.50–3.00 GHz
- GPU: NVIDIA RTX A6000 48GB × 3
- Memory: 256GB
SAligner uses Numba to accelerate a Needleman–Wunsch alignment over 3Di sequences. The compiled version significantly speeds up alignment.
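```python
# Ahead-of-time compile SAligner into a native extension module with Numba.
from numba.pycc import CC
from pair_align import saligner  # Needleman-Wunsch implementation to compile

cc = CC('saligner')  # name of the generated extension module
cc.verbose = True
# Export `saligner` with the signature (str, str) -> int64.
cc.export('saligner', 'i8(string, string)')(saligner)

if __name__ == '__main__':
    cc.compile()
```

The compiled output is `SAligner/saligner.cpython-310-x86_64-linux-gnu.so` (the suffix depends on the Python version and platform).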
```python
from saligner import saligner  # the compiled extension module

seq1 = "VGTSLSVLIRAELGHPGALI"
seq2 = "GDDQIYNVIVTAHAFVMIFFMVMPIMI"
saligner_score = saligner(seq1, seq2)  # integer alignment score (i8 return type)
```

Runtime comparison:

| Non-accelerated SAligner | Biopython | SAligner |
|---|---|---|
| 0.4862 s | 0.0066 s | 0.00536 s |
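For readers unfamiliar with the algorithm SAligner accelerates, the following is a minimal pure-Python Needleman–Wunsch score computation. The match, mismatch and linear gap values are placeholders chosen for illustration; SAligner's actual 3Di scoring scheme is not described here, so `nw_score` is not a drop-in replacement for `saligner`.

```python
def nw_score(seq1: str, seq2: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty.

    Only the score is computed, using two rolling rows of the DP matrix.
    The match/mismatch/gap values are illustrative placeholders, not SAligner's.
    """
    n, m = len(seq1), len(seq2)
    prev = [j * gap for j in range(m + 1)]  # row 0: seq2 prefix aligned to gaps
    for i in range(1, n + 1):
        curr = [i * gap] + [0] * m          # column 0: seq1 prefix aligned to gaps
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,   # align seq1[i-1] with seq2[j-1]
                          prev[j] + gap,       # gap in seq2
                          curr[j - 1] + gap)   # gap in seq1
        prev = curr
    return prev[m]


print(nw_score("VGTSLSVLIRAELGHPGALI", "GDDQIYNVIVTAHAFVMIFFMVMPIMI"))
```

Plain Python loops like these are slow, which is why compiling the dynamic-programming kernel with Numba pays off.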
Faiss can shard a large index across multiple GPUs so that their combined memory holds the index:

```python
import faiss

# Example index path taken from the AFDB50 benchmark command; adjust to your setup.
faiss_index_file = "../model/SSAlignDB/AFDB50/afdb50_512_IndexFlatIP_faiss.faiss"
index = faiss.read_index(faiss_index_file)

# One GPU resource object per GPU (two GPUs in this example).
gpu_resources = [faiss.StandardGpuResources() for _ in range(2)]

co = faiss.GpuMultipleClonerOptions()
co.shard = True  # split the index across GPUs instead of replicating it on each
index = faiss.index_cpu_to_gpu_multiple_py(gpu_resources, index, co)
```

The `prefilter_threshold` determines how many results the SSAlign-prefilter stage can return directly, without further filtering by SAligner. For the IndexFlatIP index, we tested this in detail across different embedding dimensions. The figure below shows the relationship between accuracy (TM-score ≥ 0.5) and recall under different thresholds: a threshold that is too low may reduce accuracy, while a threshold that is too high may cause excessive time consumption in the SAligner stage. The next figure shows the impact of different `prefilter_threshold` values on accuracy and recall at dimension 1280 with `prefilter_target` = 2000.

Recommended `prefilter_threshold` for each embedding dimension (IndexFlatIP):

| dim | prefilter_threshold |
|---|---|
| 1280 | 0.2 |
| 512 | 0.3 |
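The accuracy/recall trade-off behind these recommendations can be sketched as follows. The sketch assumes one prefilter cosine score and one TM-score per query-target pair, treats pairs with TM-score ≥ 0.5 as positives, and defines accuracy as the fraction of pairs above the threshold that are positives and recall as the fraction of all positives that are above the threshold. These definitions are our reading of the description above, not code taken from `cosine_threshold.py`.

```python
import numpy as np


def threshold_sweep(cos_scores, tm_scores, thresholds, tm_cutoff=0.5):
    """Accuracy and recall of the prefilter for each candidate threshold.

    accuracy: fraction of pairs with score >= t whose TM-score >= tm_cutoff.
    recall:   fraction of all pairs with TM-score >= tm_cutoff that have score >= t.
    """
    cos_scores = np.asarray(cos_scores, dtype=float)
    positive = np.asarray(tm_scores, dtype=float) >= tm_cutoff
    n_pos = int(positive.sum())
    rows = []
    for t in thresholds:
        selected = cos_scores >= t
        n_sel = int(selected.sum())
        tp = int((selected & positive).sum())
        accuracy = tp / n_sel if n_sel else float("nan")
        recall = tp / n_pos if n_pos else float("nan")
        rows.append((float(t), accuracy, recall))
    return rows


for t, acc, rec in threshold_sweep([0.9, 0.6, 0.4, 0.2], [0.8, 0.4, 0.6, 0.3], [0.3, 0.5]):
    print(f"threshold={t:.1f} accuracy={acc:.2f} recall={rec:.2f}")
```

Lowering the threshold admits more pairs, which can only keep or raise recall but may lower accuracy, mirroring the behaviour described above.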
Benchmark intermediates can be downloaded from:
http://bioinfo.isyslab.info/ssalign/download/section/ssalign/
SwissProt benchmark:

- TM-align results: `utils.execTMalign.exec_tmalign_SwissProt` → output: `../benchmarkData/SwissProt/tmalign`
- Foldseek results: `utils.execFoldseek.exec_foldseek_easy_search_para_SwissProt` → output: `../benchmarkData/SwissProt/foldseek`
- SSAlign / SSAlign-prefilter: `SwissProt.benchmark_SSAlign_result.main` → output: `../benchmarkData/SwissProt/SSAlign/SVD{dim}/ssalign` and `../benchmarkData/SwissProt/SSAlign/SVD{dim}/ssalign_prefilter`
- Overlap comparison: `SwissProt/benchmark_overlap.py` → output: `../benchmarkData/SwissProt/benchmark`
- Cumulative score prep (NPZ): `SwissProt/benchmark_cumsum_score.py` → output: `../benchmarkData/SwissProt/cumsumNpz`
- Plot figures: `SwissProt/benchmark_plot.py` → output: `../benchmarkData/SwissProt/benchmark`
- Recommended `prefilter_threshold` figure: `SwissProt/cosine_threshold.py`
- SS-Score trainer: `SwissProt/LinearModel.py`
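The cumulative-score step can be pictured with the hypothetical sketch below: for each query, the TM-scores of its hits are accumulated in the tool's rank order, the curves are averaged over queries, and the result is stored with `np.savez`. The input layout, treating missing TM-scores as 0, and the array key `cumsum_tm` are assumptions; `benchmark_cumsum_score.py` may organize its NPZ files differently.

```python
import io

import numpy as np


def cumulative_tm_curve(per_query_tm, max_rank):
    """Mean cumulative TM-score as a function of hit rank.

    per_query_tm: one list per query with the TM-scores of its hits in the
    order the search tool ranked them (None where no TM-score is available).
    """
    curves = np.zeros((len(per_query_tm), max_rank))
    for row, scores in enumerate(per_query_tm):
        s = np.array([0.0 if v is None else v for v in scores[:max_rank]], dtype=float)
        c = np.cumsum(s)
        curves[row, :len(c)] = c
        if len(c):
            curves[row, len(c):] = c[-1]  # curve stays flat after the last hit
    return curves.mean(axis=0)


curve = cumulative_tm_curve([[0.9, 0.8, 0.1], [0.5, None]], max_rank=3)

# Store the curve in NPZ form (an in-memory buffer here instead of a file).
buf = io.BytesIO()
np.savez(buf, cumsum_tm=curve)
buf.seek(0)
loaded = np.load(buf)["cumsum_tm"]
```

Padding each curve with its final value keeps queries with fewer hits from pulling the average down artificially at deeper ranks.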
SCOPe40 benchmark:

- TM-align: `utils.execTMalign.exec_tmalign_SCOPe40` → output: `../benchmarkData/SCOPe40/tmalign`
- Foldseek: `utils.execFoldseek.exec_foldseek_easy_search_para_SCOPe40` → output: `../benchmarkData/SCOPe40/foldseek`
- SSAlign / SSAlign-prefilter: `SCOPe40.benchmark_SSAlign_result.main` → output: `../benchmarkData/SCOPe40/SSAlign/SVD{dim}/ssalign` and `../benchmarkData/SCOPe40/SSAlign/SVD{dim}/ssalign_prefilter`
- Add SCOPe family/superfamily/fold/error annotations: `SCOPe40/benckmark_add_lookup.py` → output: `../benchmarkData/SCOPe40/tmalign/new05`, `../benchmarkData/SCOPe40/foldseek/new05` and `../benchmarkData/SCOPe40/SSAlign/SVD{dim}/new05`
- Prepare NPZ: `SCOPe40/benchmark_cumsum_PR_FP.py` → output: `../benchmarkData/SCOPe40/cumsumNpz`
- Plot figures: `SCOPe40/benchmark_plot.py` → output: `../benchmarkData/SCOPe40/benchmark`
- Recommended `prefilter_threshold`: `SCOPe40/cosine_threshold.py`
- SS-Score trainer: `SCOPe40/LinearModel.py`
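The family/superfamily/fold/error annotations suggest a Foldseek-style SCOPe benchmark, in which hits are counted only until the first false positive. Assuming that protocol (not confirmed from the scripts), per-query sensitivity at one level can be computed as below; the label strings and the `n_possible` argument are illustrative.

```python
def sensitivity_to_first_fp(ranked_labels, level, n_possible):
    """Fraction of possible `level` hits ranked before the first 'error' hit.

    ranked_labels: per-hit labels in rank order, each one of
    'family', 'superfamily', 'fold' or 'error' (a false positive).
    n_possible: how many database entries could be hits at this level;
    returns 0.0 when there are none.
    """
    tp = 0
    for label in ranked_labels:
        if label == "error":  # stop counting at the first false positive
            break
        if label == level:
            tp += 1
    return tp / n_possible if n_possible else 0.0


labels = ["family", "family", "superfamily", "error", "family"]
print(sensitivity_to_first_fp(labels, "family", n_possible=4))
```

The family hit ranked after the first error is ignored, so only two of the four possible family members count toward sensitivity.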
AFDB50 time benchmark:

- SAligner time benchmark: `AFDB50/SAligner_timebenchmark.py`
- Foldseek time benchmark: `AFDB50/foldseek_time_benckmark.sh` → logs: `AFDB50/logs/foldseek_processing_times.log`; raw results: `../benchmarkData/AFDB50/foldseek/timebenchmark`
- SSAlign time benchmark: `AFDB50/AFDB50_SSAlign_timebenchmark.py` → raw results: `../benchmarkData/AFDB50/SSAlign/SVD{dim}/timebenchmark`. Example command:

```bash
python AFDB50_SSAlign_timebechmark.py \
  --query_file_list_file .filenames_without_extension.txt \
  --faiss_index ../model/SSAlignDB/AFDB50/afdb50_512_IndexFlatIP_faiss.faiss \
  --dim 512 \
  --mode 1 \
  --prefilter_target 2000 \
  --prefilter_mode cpu \
  --prefilter_threshold 0.3 \
  --max_target 1000 \
  --out_dir ../benchmarkData/AFDB50/SSAlign/SVD512/timebenchmark \
  --cuda_device cuda:1 \
  --batch_size 20 \
  --nproc 64
```

| Tool | Execution time on CPUs (s) | Execution time on GPUs (s) |
|---|---|---|
| foldseek easy-search | 325081 | N/A |
| SSAlign (preload) | 633.53 | * |
| SSAlign-prefilter | 1621.5 | * |
| SSAlign-SAligner | 1070.84 | * |
| SSAlign | 2715.98 | * |
AFDB50 search-quality benchmark:

- Generate SSAlign-prefilter results: run with `--mode 0 --prefilter_target 2000 --max_target 2000`
- Download structure files for both tools: `AFDB50/AFDB50_SSAlign_timebechmark.py`
- Add TM-align scores: `AFDB50/add_TMalign.py` → output: `../benchmarkData/AFDB50/`
- Prepare NPZ for plotting: `AFDB50/afdb50_benchmark_cumsum_score.py` → output: `../benchmarkData/AFDB50/cumsumNpz`
- Plot cumulative curves: `AFDB50/afdb50_benchmark_plot.py`
- (Optional) Compare statistics tables: `AFDB50/compare_7tools_stats_100.py` → CSV; `AFDB50/test_100_queries.py` → summary tables

| tool | mean_total_rows | mean_tm_non_na_rows | mean_avg_tmscore | mean_sum_tmscore | mean_avg_RMSD | mean_sum_RMSD | SUM(sum_tmscore) | SUM(sum_RMSD) | overall_avg_tmscore | overall_avg_RMSD |
|---|---|---|---|---|---|---|---|---|---|---|
| foldseek | 1160.77 | 1072.70 | 0.70 | 751.13 | 2.58 | 2750.98 | 75112.60 | 275097.63 | 0.70 | 2.56 |
| ssalign | 1000.00 | 929.45 | 0.77 | 719.12 | 2.46 | 2283.56 | 71911.83 | 228355.67 | 0.77 | 2.46 |
| ssalign_prefilter_2000 | 2000.00 | 1852.97 | 0.71 | 1305.91 | 2.59 | 4802.82 | 130591.18 | 480282.11 | 0.70 | 2.59 |

Hits reported by one tool but not by the other:

| tool | mean_total_rows | mean_tm_non_na_rows | mean_avg_tmscore | mean_sum_tmscore | mean_avg_RMSD | mean_sum_RMSD | SUM(sum_tmscore) | SUM(sum_RMSD) | overall_avg_tmscore | overall_avg_RMSD |
|---|---|---|---|---|---|---|---|---|---|---|
| foldseek_except_ssalign | 753.25 | 694.17 | 0.65 | 446.53 | 2.69 | 1922.34 | 44653.13 | 192234.39 | 0.64 | 2.77 |
| ssalign_except_foldseek | 592.48 | 550.92 | 0.76 | 410.55 | 2.52 | 1454.92 | 41055.43 | 145492.43 | 0.75 | 2.64 |
| foldseek_except_ssalign_prefilter_2000 | 527.38 | 485.68 | 0.63 | 295.55 | 2.76 | 1431.70 | 29555.28 | 143170.23 | 0.61 | 2.95 |
| ssalign_prefilter_2000_except_foldseek | 1366.61 | 1265.95 | 0.68 | 850.34 | 2.68 | 3483.55 | 85033.86 | 348354.71 | 0.67 | 2.75 |
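The column names suggest two kinds of averages and "A except B" hit sets. The numbers are consistent with 100 queries and a pooled `overall_avg_tmscore`: for foldseek, 75112.60 / (100 × 1072.70) ≈ 0.70. They are also consistent with "except" meaning per-query set difference, since 1160.77 - 753.25 = 1000.00 - 592.48 = 407.52 shared hits per query on average. The sketch below computes these quantities from hypothetical dictionaries mapping each query to `{target: TM-score}`; it illustrates the definitions and is not the logic of `compare_7tools_stats_100.py`.

```python
from statistics import mean


def except_hits(hits_a, hits_b):
    """Per query, the hits (target -> TM-score) reported by tool A but not by tool B."""
    return {
        q: {t: tm for t, tm in targets.items() if t not in hits_b.get(q, {})}
        for q, targets in hits_a.items()
    }


def mean_avg_tmscore(hits):
    """Mean over queries of each query's average TM-score (None values skipped)."""
    per_query = []
    for targets in hits.values():
        vals = [tm for tm in targets.values() if tm is not None]
        if vals:
            per_query.append(sum(vals) / len(vals))
    return mean(per_query) if per_query else float("nan")


def overall_avg_tmscore(hits):
    """Pooled average: total TM-score divided by the total number of scored hits."""
    vals = [tm for targets in hits.values() for tm in targets.values() if tm is not None]
    return sum(vals) / len(vals) if vals else float("nan")


foldseek = {"q1": {"t1": 0.8, "t2": 0.4}, "q2": {"t3": None, "t4": 0.6}}
ssalign = {"q1": {"t1": 0.8, "t5": 0.9}, "q2": {"t4": 0.6}}
only_foldseek = except_hits(foldseek, ssalign)  # {'q1': {'t2': 0.4}, 'q2': {'t3': None}}
```

The two averages differ whenever queries return different numbers of scored hits: a per-query mean weights every query equally, while the pooled mean weights every hit equally.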
- Download SSAlignDB and benchmark intermediates at:
http://bioinfo.isyslab.info/ssalign/download/section/ssalign/
AMPs example: the PDB files are in `pdbData/specialpdb`, and the corresponding search results can be found in the benchmark data.

