Python interface to MHC binding, presentation, immunogenicity, and antigen processing predictors.
```sh
pip install mhctools
```

For MHCflurry support, also run:

```sh
mhcflurry-downloads fetch
```

```python
from mhctools import NetMHCpan41

predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])

# predict() returns a list of PeptideResult, one per peptide
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])
for r in results:
    if r.affinity:
        print(f"{r.peptide} -> {r.affinity.allele} IC50={r.affinity.value:.1f}nM")
```

`predict()` returns a list of `PeptideResult`, one per peptide. Each
result carries the peptide string and provides accessors for each
prediction kind (affinity, presentation, stability, etc.). Accessors
return `None` when a predictor doesn't produce that kind.
```python
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])
r = results[0]
r.peptide                    # "SIINFEKL"
r.affinity.value             # IC50 in nM
r.affinity.percentile_rank   # 0-100, lower = better
r.affinity.allele            # best allele for this kind
r.presentation               # None if predictor doesn't produce it
```

Under the hood, each `PeptideResult` wraps a tuple of `Prediction` objects
(frozen dataclasses, one per allele-kind combination). Results at every
level convert to pandas DataFrames with consistent column names.
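As a rough mental model of that structure, the sketch below uses two stand-in classes, `SketchPrediction` and `SketchPeptideResult` (both hypothetical names, not the mhctools classes), to show how a per-kind accessor can pick the best entry from a tuple of frozen dataclasses and return `None` when a kind is absent. The real implementation may well differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SketchPrediction:
    """Hypothetical stand-in for one allele-kind prediction."""
    kind: str
    allele: str
    score: float  # higher = better


@dataclass(frozen=True)
class SketchPeptideResult:
    """Hypothetical stand-in: a peptide plus all of its predictions."""
    peptide: str
    preds: Tuple[SketchPrediction, ...]

    def best(self, kind: str) -> Optional[SketchPrediction]:
        # Keep only predictions of the requested kind.
        matching = [p for p in self.preds if p.kind == kind]
        # None when the kind is absent, otherwise the highest score.
        return max(matching, key=lambda p: p.score) if matching else None

    @property
    def affinity(self) -> Optional[SketchPrediction]:
        return self.best("pMHC_affinity")

    @property
    def stability(self) -> Optional[SketchPrediction]:
        return self.best("pMHC_stability")


# Made-up scores, for illustration only.
result = SketchPeptideResult(
    peptide="SIINFEKL",
    preds=(
        SketchPrediction("pMHC_affinity", "HLA-A*02:01", 0.40),
        SketchPrediction("pMHC_affinity", "HLA-B*07:02", 0.15),
    ),
)
best_affinity = result.affinity        # the HLA-A*02:01 entry
missing_stability = result.stability   # None: no stability predictions
```

Because the dataclasses are frozen, a result can be shared or cached without any risk of a caller mutating it.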
```python
from mhctools import NetMHCpan41

predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])
r = results[0]

r.peptide    # "SIINFEKL"
r.offset     # position in source protein (if scanned)
r.kinds      # {"pMHC_affinity", "pMHC_presentation"}
r.alleles    # {"HLA-A*02:01", "HLA-B*07:02"}

# best prediction by kind; None when the kind is absent
r.affinity       # Prediction or None
r.presentation   # Prediction or None
r.stability      # None (predictor doesn't produce it)

if r.affinity:
    r.affinity.value             # IC50 in nM
    r.affinity.percentile_rank   # 0-100, lower = better
    r.affinity.score             # ~0-1, higher = better
    r.affinity.allele            # best allele for this kind

# by rank instead of score
r.best_affinity_by_rank   # Prediction with lowest percentile rank, or None

# all predictions
r.preds   # tuple of all Prediction objects
r.filter(kind="pMHC_affinity")
r.filter(allele="HLA-A*02:01")
```

NetMHCpan 4.1 automatically emits both `pMHC_affinity` and `pMHC_presentation`
predictions per peptide-allele pair.
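With two alleles and two kinds, a single peptide therefore carries four predictions, and "best by score" and "best by rank" can disagree. The sketch below illustrates that with a hypothetical `Pred` record and made-up numbers; it does not call mhctools, and the helper names are assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical record type; not the mhctools Prediction class.
Pred = namedtuple("Pred", "kind allele score percentile_rank")

# Made-up values: one peptide, two alleles, two kinds.
preds = [
    Pred("pMHC_affinity", "HLA-A*02:01", 0.62, 1.5),
    Pred("pMHC_affinity", "HLA-B*07:02", 0.58, 0.4),
    Pred("pMHC_presentation", "HLA-A*02:01", 0.91, 0.2),
    Pred("pMHC_presentation", "HLA-B*07:02", 0.30, 6.0),
]


def best_by_score(preds, kind):
    """Highest score for the given kind, or None if the kind is absent."""
    matching = [p for p in preds if p.kind == kind]
    return max(matching, key=lambda p: p.score, default=None)


def best_by_rank(preds, kind):
    """Lowest percentile rank for the given kind, or None if unavailable."""
    matching = [
        p for p in preds
        if p.kind == kind and p.percentile_rank is not None
    ]
    return min(matching, key=lambda p: p.percentile_rank, default=None)


by_score = best_by_score(preds, "pMHC_affinity")   # HLA-A*02:01 entry
by_rank = best_by_rank(preds, "pMHC_affinity")     # HLA-B*07:02 entry
no_stability = best_by_score(preds, "pMHC_stability")  # None
```

Percentile ranks are calibrated per allele, so ranking by them is often the fairer comparison when the alleles differ in how many peptides they bind.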
`predict_proteins()` takes a dictionary of protein sequences and returns
`{sequence_name: list[PeptideResult]}`:

```python
proteins = predictor.predict_proteins(
    {"TP53": "MEEPQSDPSVEPPLSQETFS...", "KRAS": "MTEYKLVVVGAGGVGKS..."},
    peptide_lengths=[9, 10],
)
for r in proteins["TP53"]:
    if r.affinity and r.affinity.value < 500:
        print(f"  offset={r.offset} {r.peptide} IC50={r.affinity.value:.0f}")
```

Each prediction method has a `_dataframe` variant that flattens its results
into a pandas DataFrame with consistent columns:
```python
df = predictor.predict_dataframe(["SIINFEKL"], sample_name="pat001")
df = predictor.predict_proteins_dataframe({"TP53": "MEEPQ..."}, sample_name="pat001")
```

Columns: `sample_name`, `peptide`, `n_flank`, `c_flank`,
`source_sequence_name`, `offset`, `predictor_name`, `predictor_version`,
`allele`, `kind`, `score`, `value`, `percentile_rank`.
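To make the `source_sequence_name`, `offset`, and `peptide` columns concrete, here is a minimal sketch of how a protein can be scanned into fixed-length peptides. The function `scan_peptides` is hypothetical, and the 0-based offset is an assumption rather than something this README states.

```python
def scan_peptides(proteins, peptide_lengths):
    """Yield one row per peptide window in each protein.

    Offsets are 0-based here (an assumption for this sketch).
    """
    for name, sequence in proteins.items():
        for length in peptide_lengths:
            # Slide a window of the given length along the sequence.
            for offset in range(len(sequence) - length + 1):
                yield {
                    "source_sequence_name": name,
                    "offset": offset,
                    "peptide": sequence[offset:offset + length],
                }


rows = list(scan_peptides({"toy": "AAAQQQSIINFEKL"}, peptide_lengths=[8, 9]))
```

A list of dicts like `rows` is one of the inputs the `pandas.DataFrame` constructor accepts, which is a convenient way to get a flat table with one row per peptide.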
`MultiSample` runs a predictor across multiple samples, each with its own
HLA genotype:

```python
from mhctools import MultiSample, NetMHCpan41

ms = MultiSample(
    samples={
        "pat001": ["HLA-A*02:01", "HLA-B*07:02"],
        "pat002": ["HLA-A*01:01", "HLA-B*08:01"],
    },
    predictor_class=NetMHCpan41,
)

# {sample_name: list[PeptideResult]}
results = ms.predict(["SIINFEKL", "GILGFVFTL"])

# {sample_name: {seq_name: list[PeptideResult]}}
protein_results = ms.predict_proteins({"TP53": "MEEPQ..."})

# flat DataFrames with sample_name column
df = ms.predict_dataframe(["SIINFEKL"])
df = ms.predict_proteins_dataframe({"TP53": "MEEPQ..."})
```

Each `Prediction` has a `kind` string describing what it measures:
| Kind | Meaning |
|---|---|
| `pMHC_affinity` | Peptide-MHC binding affinity |
| `pMHC_presentation` | Likelihood of surface presentation (EL/processing) |
| `pMHC_stability` | Peptide-MHC complex stability |
| `immunogenicity` | T-cell immunogenicity |
| `antigen_processing` | Combined processing score |
| `proteasome_cleavage` | Proteasomal cleavage score |
| `tap_transport` | TAP transport score (reserved, not yet used) |
| `erap_trimming` | ERAP trimming score (reserved, not yet used) |

Every prediction is a frozen, self-contained `Prediction` dataclass:
```python
from mhctools import Prediction

pred = Prediction(
    kind="pMHC_affinity",
    score=0.85,              # ~0-1, higher = better
    peptide="SIINFEKL",
    allele="HLA-A*02:01",
    value=120.5,             # IC50 in nM
    percentile_rank=0.8,
    source_sequence_name="TP53",
    offset=42,
    predictor_name="netMHCpan",
    predictor_version="4.1",
)
```

`score` is always higher-is-better. `value` is in native units (nM for
affinity, hours for stability). `percentile_rank` is optional; when present
it ranges from 0 to 100, and lower means stronger.
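To illustrate how a lower-is-better `value` can map onto a higher-is-better `score`, the sketch below uses the log transform that NetMHC-family tools conventionally apply to IC50 values, with a 50,000 nM ceiling. Whether each mhctools predictor derives `score` exactly this way is an assumption; the function names are hypothetical.

```python
import math

MAX_IC50_NM = 50000.0  # conventional ceiling in NetMHC-style tools


def ic50_to_score(ic50_nm, max_ic50=MAX_IC50_NM):
    """Map an IC50 in nM onto 0-1, where higher means stronger binding."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    score = 1.0 - math.log(ic50_nm) / math.log(max_ic50)
    # Clamp so values outside 1..max_ic50 nM stay within 0-1.
    return min(1.0, max(0.0, score))


def score_to_ic50(score, max_ic50=MAX_IC50_NM):
    """Inverse of ic50_to_score for scores within 0-1."""
    return max_ic50 ** (1.0 - score)


strong = ic50_to_score(50.0)      # tighter binder, higher score
weak = ic50_to_score(5000.0)      # weaker binder, lower score
threshold = ic50_to_score(500.0)  # roughly 0.43
```

Under this transform the common 500 nM binder cutoff corresponds to a score of about 0.43, so thresholds can be expressed on either scale.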
| Predictor | Kinds produced | Requires |
|---|---|---|
| `NetMHCpan` / `NetMHCpan41` / `NetMHCpan42` | affinity + presentation | NetMHCpan |
| `NetMHCpan4` | affinity or presentation | NetMHCpan 4.0 |
| `NetMHCpan3` / `NetMHCpan28` | affinity | older NetMHCpan |
| `NetMHC` / `NetMHC3` / `NetMHC4` | affinity | NetMHC |
| `NetMHCIIpan` / `NetMHCIIpan43` | affinity or presentation | NetMHCIIpan |
| `NetMHCcons` | affinity | NetMHCcons |
| `NetMHCstabpan` | stability | NetMHCstabpan |
| `MHCflurry` | affinity + presentation | `pip install mhcflurry` + `mhcflurry-downloads fetch` |
| `MHCflurry_Affinity` | affinity | `pip install mhcflurry` + `mhcflurry-downloads fetch` |
| `BigMHC` | presentation or immunogenicity | BigMHC clone (set `BIGMHC_DIR`) |
| `MixMHCpred` | presentation | MixMHCpred |
| `IedbNetMHCpan` / `IedbSMM` / `IedbNetMHCIIpan` | affinity | IEDB web API |
| `RandomBindingPredictor` | affinity | (built-in) |


| Predictor | Kinds produced | Requires |
|---|---|---|
| `Pepsickle` | proteasome cleavage | `pip install pepsickle` (paper) |
| `NetChop` | proteasome cleavage | NetChop |

Processing predictors use configurable scoring to aggregate per-position
cleavage probabilities into peptide-level scores. See `ProcessingPredictor`
and `ProteasomePredictor` for details.
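One way to picture that aggregation, assuming a deliberately simple scheme that is not necessarily one of the scoring options mhctools offers: reward a cut right after the peptide's C-terminal residue and penalise cuts inside the peptide. The function name and the indexing convention below are hypothetical.

```python
import math


def peptide_processing_score(cleavage_probs, start, length):
    """Combine per-position cleavage probabilities into one peptide score.

    cleavage_probs[i] is taken to be the probability of a cut immediately
    after residue i of the source protein (an assumed convention). The
    peptide covers residues start .. start + length - 1.
    """
    end = start + length - 1               # index of the C-terminal residue
    c_term = cleavage_probs[end]           # a cut here releases the peptide
    internal = cleavage_probs[start:end]   # cuts here would destroy it
    survive = math.prod(1.0 - p for p in internal)
    return c_term * survive


# Made-up probabilities for a 14-residue protein: a strong cut site after
# the last residue, weak sites everywhere else.
probs = [0.1] * 14
probs[13] = 0.9

well_processed = peptide_processing_score(probs, start=6, length=8)
poorly_processed = peptide_processing_score(probs, start=5, length=8)
```

The first peptide ends at the strong cut site and scores higher; the second ends one residue earlier, at a weak site, and scores lower. A fuller scheme might also weigh cleavage at the N-terminal flank.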
```sh
mhctools --sequence SIINFEKL SIINFEKLQ --mhc-predictor netmhc --mhc-alleles A0201

mhctools --sequence AAAQQQSIINFEKL --extract-subsequences --mhc-peptide-lengths 8-10 --mhc-predictor mhcflurry --mhc-alleles A0201
```

The old `predict_peptides()` and `predict_subsequences()` methods still work
and return `BindingPredictionCollection` objects:
```python
from mhctools import NetMHCpan

predictor = NetMHCpan(alleles=["A*02:01"])
collection = predictor.predict_subsequences(
    {"1L2Y": "NLYIQWLKDGGPSSGRPPPS"},
    peptide_lengths=[9],
)
df = collection.to_dataframe()

# affinity is an IC50 in nM, so lower means a stronger binder
for bp in collection:
    if bp.affinity < 100:
        print("Strong binder: %s" % bp)
```

To convert legacy results to the new types:

```python
preds = collection.to_preds()             # list of Prediction
pp_list = collection.to_peptide_preds()   # list of PeptideResult
```