TCRen predicts which epitopes a T-cell receptor recognises from a single TCR–peptide–MHC structure (experimental or modelled). It extracts the TCR–peptide contact map and scores every candidate peptide with a residue-level statistical potential derived from contact preferences in TCR:pMHC crystal structures — answering not "what fancy complex can a model draw?" but "is this binding physically plausible?".
This is a documented, tested, CLI-driven Python library. TCR chains are annotated with the sibling
package arda; MHC chains are mapped and the groove partitioned against a curated reference;
structures are oriented into one canonical frame; and the original contact maps, potential, and
scores are reproduced numerically (validated against committed oracles to floating-point precision).
While the original tcren focused on TCR:peptide contacts, the new version adds scoring of TCR:MHC and peptide:MHC interactions, which is required to get a full picture of TCR:pMHC binding mechanics and to estimate ddG values.
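The contact-based scoring idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the contact list and the potential values are invented, and `score_peptide` / `rank_peptides` are hypothetical helper names, not the tcren API (the real potential ships as data/TCRen_potential.csv).

```python
from typing import Dict, List, Tuple

# Hypothetical contact map entry: (TCR residue one-letter code, 0-based peptide position).
Contact = Tuple[str, int]

# Toy pairwise potential keyed by (TCR residue, peptide residue); lower is more favourable.
# These numbers are made up for illustration and are NOT the TCRen potential.
TOY_POTENTIAL: Dict[Tuple[str, str], float] = {
    ("Y", "L"): -0.8,
    ("Y", "K"): 0.3,
    ("D", "K"): -1.1,
    ("D", "L"): 0.6,
}


def score_peptide(contacts: List[Contact], peptide: str,
                  potential: Dict[Tuple[str, str], float],
                  default: float = 0.0) -> float:
    """Sum the pairwise energies over every TCR-peptide contact."""
    return sum(potential.get((tcr_aa, peptide[pos]), default)
               for tcr_aa, pos in contacts)


def rank_peptides(contacts: List[Contact], peptides: List[str],
                  potential: Dict[Tuple[str, str], float]) -> List[Tuple[str, float]]:
    """Score each candidate on the same contact map; most favourable first."""
    scored = [(p, score_peptide(contacts, p, potential)) for p in peptides]
    return sorted(scored, key=lambda item: item[1])


# The contact map is fixed by the structure; only the peptide sequence is swapped.
contacts = [("Y", 0), ("D", 1)]
ranked = rank_peptides(contacts, ["KL", "LK"], TOY_POTENTIAL)
```

Because the contact map comes from one structure and only the peptide residues change, ranking many candidates reduces to repeated table lookups, which is why this kind of scoring is cheap per peptide.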
pip install tcren # from PyPI; binary wheels ship the C++ extension; pulls in arda-mapper

For development (editable install, conda env with the build toolchain, and the reference data
fetched into data/):
bash setup.sh # creates the `tcren` conda env, installs arda + tcren, fetches data/
conda activate tcren

tcren ships a small pybind11/C++ extension (tcren._align) for the MHC-pseudosequence
fitting-alignment hot path, built on install by scikit-build-core (a Biopython fallback runs if
it is not built). TCR annotation is provided by arda, a runtime dependency published to PyPI as
arda-mapper (it imports as arda); pip and setup.sh pull it in automatically. setup.sh also runs
tcren fetch-data to populate data/ with the reference structure sets (Native2026, Canonical2026)
used by orient/superimpose (set TCREN_NO_FETCH=1 to skip).
# Full pipeline: annotate -> superimpose -> resmarkup / canonical Cα / contacts -> per-interface
# energies (TCRen for TCR↔peptide, MJ for TCR↔MHC and peptide↔MHC) + total
tcren pipeline -s complex.pdb -o scores.csv
# End-to-end candidate-epitope scoring from a structure
tcren score -s complex.pdb -c candidates.txt -o ranked.csv
# Substitute a peptide and refine its pose (knowledge-based MC scored by the DOPE atom-level
# statistical potential — independent of the TCRen/MJ scoring potentials, restrained to the input).
# Not physics relaxation — use Rosetta FlexPepDock for that.
tcren refine -s complex.pdb -o refined/ --substitute KQWLVWLFL
# Structures: any of .pdb / .cif / .pdb.gz / .cif.gz, a directory, or a .tar.gz batch
tcren contacts -s batch.tar.gz -o contacts.csv --interface tcr_peptide
# Per-residue markup: TCR (CDR/FR) + MHC groove (helix/floor) + peptide in one table.
# --regions all|tcr|mhc|peptide filters; --pseudo also marks NetMHCpan groove residues (MPS).
tcren annotate -s complex.cif.gz -o markup.csv --regions mhc --pseudo
# Superimpose structure(s) onto the canonical frame, by MHC, against the canonical database
# (data/Canonical2026, fetched at install). Detects MHC class + species and averages the
# superposition over every database structure of that class/species. Chains -> A=Vα B=Vβ
# C=peptide D=MHCα E=MHCβ/β2m. -s takes a file / directory / .tar.gz / glob; -o is a directory,
# or a single structure file (one input) whose extension must match --mmCIF/--compress; -t threads.
tcren superimpose -s complex.pdb -o oriented.pdb # single file
tcren superimpose -s 'data/*.pdb' -o oriented/ -t 8 # glob -> directory, threaded
# Build a canonical database from native complexes (how Canonical2026 is produced). Annotation
# is one batched mmseqs call; -t threads only the structural alignment + write.
tcren orient -s data/Native2026 -o data/Canonical2026 -t 8
# Structure outputs are plain .pdb by default; add --mmCIF for .cif and --compress for .gz.
tcren superimpose -s complex.pdb -o oriented/ --mmCIF --compress # -> oriented/<id>.cif.gz
# Fetch recent TCR-pMHC structures from RCSB -> data/pdb_recent (mmCIF .cif.gz, 5-chain validated)
tcren fetch-recent --discover --after 2024-01-01
# Build the MHC reference once (IMGT/HLA + mouse H-2; cached, not committed)
tcren build-mhc-ref
tcren info
tcren --install-completion # shell tab-completion (bash/zsh/fish)

tcren orient and tcren superimpose need the reference sets in data/ (Native2026,
Canonical2026); setup.sh fetches them at install via tcren fetch-data (re-run it any time).
from tcren import run_pipeline, parse_structure, import_structure, ContactMap, score_peptides
from tcren.annotation import classify_chains
from tcren.potential import tcren
# One call: annotate -> superimpose -> contacts -> per-interface energies + total
res = run_pipeline("complex.pdb") # res.scores, res.markup, res.contacts, res.oriented
# …or the individual steps:
s = parse_structure("complex.pdb.gz") # also .cif/.cif.gz; import_structure trims the C-gene
classify_chains(s, organism="human") # TRA/TRB via arda, peptide, MHC
cm = ContactMap.from_structure(s) # 5 Å contacts + interface partitioning
ranked = score_peptides(cm, ["KQWLVWLFL", "RLLHPHHPL"], tcren())

from tcren.structure import iter_structures
for pdb_id, structure in iter_structures("batch.tar.gz"): # file | directory | .tar.gz
    classify_chains(structure, organism="human")
    ...

from tcren.mhc import annotate_mhc
from tcren.orient import canonicalize_structure, superimpose, docking_angles
from tcren.contacts import multi_contacts, ContactDefinition
annotate_mhc(s)
oriented, info = canonicalize_structure(s) # frame: z=MHC→TCR, y=peptide, x=thin; chains A–E
oriented, info = superimpose(s) # orient onto data/Canonical2026 by MHC (class+species ensemble)
layers = multi_contacts(s, ContactDefinition(d1=5, d2=8, d3=12)) # heavy-atom / Cβ / Cα
d = docking_angles(s) # crossing (~20–70° αβ) + incident angle

from tcren.project2d import (project_structure, residue_markup_table, contacts_table,
                             region_pair_summary)
from tcren.viz import render_complementarity_map, view_pocket_cdr
proj = project_structure(s) # canonical groove plane
svg = render_complementarity_map(residue_markup_table(s, proj),
contacts=contacts_table(s, threshold=5.0))
region_pair_summary(s, kind="closest") # contacts per region pair + bond types (cb/ca too)
view_pocket_cdr(s).show() # interactive 3D pocket + CDR overlay (py3Dmol)

Structures live in the Hugging Face dataset isalgo/tcren_structures, all gzipped:
| folder | contents |
|---|---|
| Native2022 | the 2022 paper set (oracle) |
| Native2026 | the comprehensive 2026 TCR:pMHC set the current potential is derived from |
| Canonical2026 | Native2026 re-oriented into the canonical frame (tcren orient) |
tcren reads .pdb/.cif/.pdb.gz/.cif.gz and .tar.gz batches; an installed library lazily
fetches the canonical reference structures from the Hub when orienting a new complex. The root
data/ holds Native2026 (+ Canonical2026, gitignored, fetched on demand), PDB_date.tsv,
orient_metadata.json, and TCRen_potential.csv — the current potential derived from the
Native2026 set (use it with tcren score -p data/TCRen_potential.csv).
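As an illustration of how the accepted input forms (plain or gzipped file, directory, .tar.gz batch) can be handled uniformly, here is a standard-library sketch. It is not the tcren reader: `iter_structure_texts` and its helpers are hypothetical names, and it only yields raw text rather than parsed structures.

```python
import gzip
import tarfile
from pathlib import Path
from typing import Iterator, Tuple

STRUCTURE_SUFFIXES = (".pdb", ".cif", ".pdb.gz", ".cif.gz")


def _structure_id(name: str) -> str:
    """Strip the directory and the longest matching structure suffix."""
    base = Path(name).name
    for suffix in sorted(STRUCTURE_SUFFIXES, key=len, reverse=True):
        if base.endswith(suffix):
            return base[: -len(suffix)]
    return base


def _decode(name: str, raw: bytes) -> str:
    """Gunzip when the name ends in .gz, then decode to text."""
    if name.endswith(".gz"):
        raw = gzip.decompress(raw)
    return raw.decode("utf-8", errors="replace")


def iter_structure_texts(source: str) -> Iterator[Tuple[str, str]]:
    """Yield (id, text) pairs from a file, a directory, or a .tar.gz batch."""
    path = Path(source)
    if path.is_dir():
        for child in sorted(path.iterdir()):
            if child.name.endswith(STRUCTURE_SUFFIXES):
                yield _structure_id(child.name), _decode(child.name, child.read_bytes())
    elif path.name.endswith((".tar.gz", ".tgz")):
        with tarfile.open(path, "r:gz") as archive:
            for member in archive:
                if member.isfile() and member.name.endswith(STRUCTURE_SUFFIXES):
                    raw = archive.extractfile(member).read()
                    yield _structure_id(member.name), _decode(member.name, raw)
    else:
        yield _structure_id(path.name), _decode(path.name, path.read_bytes())
```

Writing the reader as a generator keeps only one structure in memory at a time, which matters for large batches.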
Runnable examples under notebooks/ (rendered in the docs):

- complementarity_map_2d: 2D interface maps, multiple structural + map views of 1ao7
- contact_thresholds_and_bondtypes: region-pair contact counts (closest/Cβ/Cα) + bond types
- canonical_frame_figures: canonical-frame QC across the Native2026 set
- pymol_canonical_figures: ray-traced PyMOL panels (overlay, groove, interface) by class/species
- mhc_pseudosequence_mps: NetMHCpan MHC pseudosequence (MPS) residues vs. peptide contacts
- example_gil_a02_rs_motif: GILGFVFTL/HLA-A*02 and the public CDR3β Arg–Ser motif
- natcompsci2022/: full reproduction of the Nat Comput Sci 2022 analyses

Per-stage timings on a TCR-pMHC complex (1ao7), Apple M3, single thread (RUN_BENCHMARK=1 pytest -k benchmark -s to reproduce):
| stage | time | notes |
|---|---|---|
| parse a gzipped structure | ~19 ms | .pdb.gz / .cif.gz |
| contact map (5 Å, cKDTree) | ~9 ms | per structure |
| score 1000 candidate peptides | ~8 ms | ~8 µs/peptide (vectorised) |
| annotate (TCR + MHC), batched | ~213 ms/structure | one mmseqs2 call for the whole set; vs ~1.5 s/structure unbatched |
| peak RSS, single-structure pipeline | ~195 MB | |
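For readers unfamiliar with the contact-map stage in the table above, the sketch below shows what a 5 Å distance-cutoff contact search computes. It is a brute-force, standard-library stand-in (the table indicates the library uses a cKDTree for this step); the atom records, residue labels, and the `residue_contacts` name are hypothetical.

```python
from math import dist
from typing import List, Set, Tuple

# Hypothetical atom record: (residue label, xyz coordinates in Å).
Atom = Tuple[str, Tuple[float, float, float]]


def residue_contacts(side_a: List[Atom], side_b: List[Atom],
                     cutoff: float = 5.0) -> Set[Tuple[str, str]]:
    """Return residue pairs with at least one atom pair within `cutoff` Å."""
    contacts: Set[Tuple[str, str]] = set()
    for res_a, xyz_a in side_a:
        for res_b, xyz_b in side_b:
            if dist(xyz_a, xyz_b) <= cutoff:
                contacts.add((res_a, res_b))
    return contacts


# Toy coordinates, invented for illustration.
tcr_atoms = [("B:Y95", (0.0, 0.0, 0.0)), ("B:D96", (10.0, 0.0, 0.0))]
peptide_atoms = [
    ("C:L1", (3.0, 3.0, 0.0)),   # about 4.2 Å from Y95
    ("C:K2", (10.0, 0.0, 4.5)),  # 4.5 Å from D96
    ("C:W3", (30.0, 0.0, 0.0)),  # far from both
]
contacts = residue_contacts(tcr_atoms, peptide_atoms)
```

The double loop is quadratic in the number of atoms; a spatial index such as a k-d tree returns the same pairs while only examining nearby atoms.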
Annotation is the only network/compute-heavy step and is always batched (one mmseqs2 search over
all chains; mmseqs2 parallelises internally — never per-structure, never Python-threaded). Threads are
used only for the embarrassingly-parallel, mmseqs-free stages (structural alignment, write, rendering):
tcren orient -t N.
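The threading pattern described above (threads only for independent per-structure work) can be sketched with the standard library. This is an assumption-laden illustration, not the tcren implementation: `run_threaded` and `fake_alignment` are hypothetical names, and the stage function is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_threaded(stage: Callable[[str], str], structure_ids: Iterable[str],
                 threads: int = 4) -> Dict[str, str]:
    """Apply an independent per-structure stage across a thread pool."""
    ids = list(structure_ids)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so results line up with ids.
        results = list(pool.map(stage, ids))
    return dict(zip(ids, results))


def fake_alignment(structure_id: str) -> str:
    """Placeholder for a per-structure step such as alignment or writing."""
    return f"{structure_id}.oriented"


outputs = run_threaded(fake_alignment, ["1ao7", "2bnr", "3qdg"], threads=2)
```

Python threads pay off mainly when the stage releases the GIL (file I/O or native-code numerics); a pure-Python stage would see little speed-up from this pattern.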
pytest -m "not slow" # unit + fast regression (the CI gate)
pytest # add the arda/mmseqs-backed regression tests
RUN_BENCHMARK=1 pytest -k benchmark -s

The coordinate-level extensions, namely backbone-preserving peptide substitution and the
potential-guided Monte-Carlo refinement kernel (energy function, the restraint-necessity argument,
sampler, and citations), are written up in the technical appendix appendix/tcren.tex
(built with make -C appendix → appendix/tcren.pdf).
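To give a feel for restrained Monte-Carlo refinement in general, the following is a generic Metropolis sketch on a toy coordinate vector. It is not the kernel described in the appendix: the score function, the harmonic restraint form, and every parameter value are assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List, Tuple


def restrained_energy(x: List[float], x0: List[float],
                      score: Callable[[List[float]], float], k: float = 1.0) -> float:
    """Toy score plus a harmonic restraint that keeps the pose near the input x0."""
    restraint = k * sum((a - b) ** 2 for a, b in zip(x, x0))
    return score(x) + restraint


def metropolis_refine(x0: List[float], score: Callable[[List[float]], float],
                      steps: int = 2000, step_size: float = 0.2,
                      temperature: float = 0.5, k: float = 1.0,
                      seed: int = 0) -> Tuple[List[float], float]:
    """Metropolis sampling; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    current = list(x0)
    e_current = restrained_energy(current, x0, score, k)
    best, e_best = list(current), e_current
    for _ in range(steps):
        # Propose a small Gaussian perturbation of every coordinate.
        trial = [c + rng.gauss(0.0, step_size) for c in current]
        e_trial = restrained_energy(trial, x0, score, k)
        delta = e_trial - e_current
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


def toy_score(x: List[float]) -> float:
    """Placeholder potential with its unrestrained minimum at (1, 1)."""
    return sum((c - 1.0) ** 2 for c in x)


start = [0.0, 0.0]
refined, energy = metropolis_refine(start, toy_score)
```

With these toy settings the restraint pulls the optimum from (1, 1) toward the starting pose, so the combined minimum sits at (0.5, 0.5): the restraint trades score improvement against drift from the input.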
TCRen is free for academic and non-commercial use. If you use it, please cite our latest Nature Computational Science 2024 paper:
Karnaukhov VK, Shcherbinin DS, Chugunov AO, Chudakov DM, Efremov RG, Zvyagin IV, Shugay M. Structure-based prediction of T cell receptor recognition of unseen epitopes using TCRen. Nat Comput Sci. 2024 Jul;4(7):510-521. doi: 10.1038/s43588-024-00653-0. Epub 2024 Jul 10. PMID: 38987378.