tcren — structure-based prediction of TCR–epitope recognition

tcren

tcren — structure-based prediction of TCR–epitope recognition

TCRen predicts which epitopes a T-cell receptor recognises from a single TCR–peptide–MHC structure (experimental or modelled). It extracts the TCR–peptide contact map and scores every candidate peptide with a residue-level statistical potential derived from contact preferences in TCR:pMHC crystal structures — answering not "what fancy complex can a model draw?" but "is this binding physically plausible?".

This is a documented, tested, CLI-driven Python library. TCR chains are annotated with the sibling arda; MHC chains are mapped and the groove partitioned against a curated reference; structures are oriented into one canonical frame; and the original contact maps, potential, and scores are reproduced numerically (validated against committed oracles to floating-point precision).

While the original tcren focused on TCR:peptide contacts, the new version brings in features to score TCR:MHC and peptide:MHC interactions, required to get full picture of TCR:pMHC binding mechanics and estimate ddG values.

Install

pip install tcren          # from PyPI — binary wheels ship the C++ extension; pulls in arda-mapper

For development (editable install, conda env with the build toolchain, and the reference data fetched into data/):

bash setup.sh              # creates the `tcren` conda env, installs arda + tcren, fetches data/
conda activate tcren

tcren ships a small pybind11/C++ extension (tcren._align) for the MHC-pseudosequence fitting-alignment hot path, built on install by scikit-build-core (a Biopython fallback runs if it is not built). TCR annotation is provided by arda, a runtime dependency published to PyPI as arda-mapper (it imports as arda); pip/setup.sh pull it automatically. setup.sh also runs tcren fetch-data to populate data/ with the reference structure sets (Native2026, Canonical2026) used by orient/superimpose (set TCREN_NO_FETCH=1 to skip).

Command line

# Full pipeline: annotate -> superimpose -> resmarkup / canonical Cα / contacts -> per-interface
# energies (TCRen for TCR↔peptide, MJ for TCR↔MHC and peptide↔MHC) + total
tcren pipeline -s complex.pdb -o scores.csv

# End-to-end candidate-epitope scoring from a structure
tcren score -s complex.pdb -c candidates.txt -o ranked.csv

# Substitute a peptide and refine its pose (knowledge-based MC scored by the DOPE atom-level
# statistical potential — independent of the TCRen/MJ scoring potentials, restrained to the input).
# Not physics relaxation — use Rosetta FlexPepDock for that.
tcren refine -s complex.pdb -o refined/ --substitute KQWLVWLFL

# Structures: any of .pdb / .cif / .pdb.gz / .cif.gz, a directory, or a .tar.gz batch
tcren contacts -s batch.tar.gz -o contacts.csv --interface tcr_peptide

# Per-residue markup: TCR (CDR/FR) + MHC groove (helix/floor) + peptide in one table.
# --regions all|tcr|mhc|peptide filters; --pseudo also marks NetMHCpan groove residues (MPS).
tcren annotate -s complex.cif.gz -o markup.csv --regions mhc --pseudo

# Superimpose structure(s) onto the canonical frame, by MHC, against the canonical database
# (data/Canonical2026, fetched at install). Detects MHC class + species and averages the
# superposition over every database structure of that class/species. Chains -> A=Vα B=Vβ
# C=peptide D=MHCα E=MHCβ/β2m. -s takes a file / directory / .tar.gz / glob; -o is a directory,
# or a single structure file (one input) whose extension must match --mmCIF/--compress; -t threads.
tcren superimpose -s complex.pdb -o oriented.pdb           # single file
tcren superimpose -s 'data/*.pdb' -o oriented/ -t 8        # glob -> directory, threaded

# Build a canonical database from native complexes (how Canonical2026 is produced). Annotation
# is one batched mmseqs call; -t threads only the structural alignment + write.
tcren orient -s data/Native2026 -o data/Canonical2026 -t 8

# Structure outputs are plain .pdb by default; add --mmCIF for .cif and --compress for .gz.
tcren superimpose -s complex.pdb -o oriented/ --mmCIF --compress   # -> oriented/<id>.cif.gz

# Fetch recent TCR-pMHC structures from RCSB -> data/pdb_recent (mmCIF .cif.gz, 5-chain validated)
tcren fetch-recent --discover --after 2024-01-01

# Build the MHC reference once (IMGT/HLA + mouse H-2; cached, not committed)
tcren build-mhc-ref

tcren info
tcren --install-completion        # shell tab-completion (bash/zsh/fish)

tcren orient and tcren superimpose need the reference sets in data/ (Native2026, Canonical2026); setup.sh fetches them at install via tcren fetch-data (re-run it any time).

Library

from tcren import run_pipeline, parse_structure, import_structure, ContactMap, score_peptides
from tcren.annotation import classify_chains
from tcren.potential import tcren

# One call: annotate -> superimpose -> contacts -> per-interface energies + total
res = run_pipeline("complex.pdb")              # res.scores, res.markup, res.contacts, res.oriented

# …or the individual steps:
s = parse_structure("complex.pdb.gz")          # also .cif/.cif.gz; import_structure trims the C-gene
classify_chains(s, organism="human")           # TRA/TRB via arda, peptide, MHC
cm = ContactMap.from_structure(s)              # 5 Å contacts + interface partitioning
ranked = score_peptides(cm, ["KQWLVWLFL", "RLLHPHHPL"], tcren())

Batch inputs, gzip, archives

from tcren.structure import iter_structures
for pdb_id, structure in iter_structures("batch.tar.gz"):   # file | directory | .tar.gz
    classify_chains(structure, organism="human")
    ...

Canonical orientation, contacts, docking geometry

from tcren.mhc import annotate_mhc
from tcren.orient import canonicalize_structure, superimpose, docking_angles
from tcren.contacts import multi_contacts, ContactDefinition

annotate_mhc(s)
oriented, info = canonicalize_structure(s)     # frame: z=MHC→TCR, y=peptide, x=thin; chains A–E
oriented, info = superimpose(s)                # orient onto data/Canonical2026 by MHC (class+species ensemble)
layers = multi_contacts(s, ContactDefinition(d1=5, d2=8, d3=12))   # heavy-atom / Cβ / Cα
d = docking_angles(s)                          # crossing (~20–70° αβ) + incident angle

2D complementarity maps & region-pair contacts

from tcren.project2d import (project_structure, residue_markup_table, contacts_table,
                             region_pair_summary)
from tcren.viz import render_complementarity_map, view_pocket_cdr

proj = project_structure(s)                                   # canonical groove plane
svg  = render_complementarity_map(residue_markup_table(s, proj),
                                  contacts=contacts_table(s, threshold=5.0))
region_pair_summary(s, kind="closest")        # contacts per region pair + bond types (cb/ca too)
view_pocket_cdr(s).show()                      # interactive 3D pocket + CDR overlay (py3Dmol)

Data

Structures live in the Hugging Face dataset isalgo/tcren_structures, all gzipped:

folder	contents
`Native2022`	the 2022 paper set (oracle)
`Native2026`	the comprehensive 2026 TCR:pMHC set the current potential is derived from
`Canonical2026`	`Native2026` re-oriented into the canonical frame (`tcren orient`)

tcren reads .pdb/.cif/.pdb.gz/.cif.gz and .tar.gz batches; an installed library lazily fetches the canonical reference structures from the Hub when orienting a new complex. The root data/ holds Native2026 (+ Canonical2026, gitignored, fetched on demand), PDB_date.tsv, orient_metadata.json, and TCRen_potential.csv — the current potential derived from the Native2026 set (use it with tcren score -p data/TCRen_potential.csv).

Notebooks

Runnable examples under notebooks/ (rendered in the docs):

complementarity_map_2d — 2D interface maps, multiple structural + map views of 1ao7
contact_thresholds_and_bondtypes — region-pair contact counts (closest/Cβ/Cα) + bond types
canonical_frame_figures — canonical-frame QC across the Native2026 set
pymol_canonical_figures — ray-traced PyMOL panels (overlay, groove, interface) by class/species
mhc_pseudosequence_mps — NetMHCpan MHC pseudosequence (MPS) residues vs. peptide contacts
example_gil_a02_rs_motif — GILGFVFTL/HLA-A*02 and the public CDR3β Arg–Ser motif
natcompsci2022/ — full reproduction of the Nat Comput Sci 2022 analyses

Performance

Per-stage timings on a TCR-pMHC complex (1ao7), Apple M3, single thread (RUN_BENCHMARK=1 pytest -k benchmark -s to reproduce):

stage	time	notes
parse a gzipped structure	~19 ms	`.pdb.gz` / `.cif.gz`
contact map (5 Å, cKDTree)	~9 ms	per structure
score 1000 candidate peptides	~8 ms	~8 µs/peptide (vectorised)
annotate (TCR + MHC), batched	~213 ms/structure	one mmseqs2 call for the whole set; vs ~1.5 s/structure unbatched
peak RSS, single-structure pipeline	~195 MB

Annotation is the only network/compute-heavy step and is always batched (one mmseqs2 search over all chains; mmseqs2 parallelises internally — never per-structure, never Python-threaded). Threads are used only for the embarrassingly-parallel, mmseqs-free stages (structural alignment, write, rendering): tcren orient -t N.

Tests

pytest -m "not slow"          # unit + fast regression (the CI gate)
pytest                        # add the arda/mmseqs-backed regression tests
RUN_BENCHMARK=1 pytest -k benchmark -s

Methods appendix

The coordinate-level extensions — backbone-preserving peptide substitution and the potential-guided Monte-Carlo refinement kernel (energy function, the restraint-necessity argument, sampler, and citations) — are written up in the technical appendix appendix/tcren.tex (built with make -C appendix → appendix/tcren.pdf).

Citing

TCRen is free for academic and non-commercial use. If you use it, please cite our latest Nature Computational Science 2024 paper:

Karnaukhov VK, Shcherbinin DS, Chugunov AO, Chudakov DM, Efremov RG, Zvyagin IV, Shugay M. Structure-based prediction of T cell receptor recognition of unseen epitopes using TCRen. Nat Comput Sci. 2024 Jul;4(7):510-521. doi: 10.1038/s43588-024-00653-0. Epub 2024 Jul 10. PMID: 38987378.

Name		Name	Last commit message	Last commit date
Latest commit History 92 Commits
.github/workflows		.github/workflows
appendix		appendix
assets		assets
data		data
docs		docs
notebooks		notebooks
scripts		scripts
skills/tcren		skills/tcren
src		src
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
BENCHMARKS.md		BENCHMARKS.md
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
STATUS.md		STATUS.md
environment.yml		environment.yml
pyproject.toml		pyproject.toml
setup.sh		setup.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

tcren — structure-based prediction of TCR–epitope recognition

Install

Command line

Library

Batch inputs, gzip, archives

Canonical orientation, contacts, docking geometry

2D complementarity maps & region-pair contacts

Data

Notebooks

Performance

Tests

Methods appendix

Citing

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

tcren — structure-based prediction of TCR–epitope recognition

Install

Command line

Library

Batch inputs, gzip, archives

Canonical orientation, contacts, docking geometry

2D complementarity maps & region-pair contacts

Data

Notebooks

Performance

Tests

Methods appendix

Citing

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages