
openvax/mhctools



mhctools

Python interface to MHC binding, presentation, immunogenicity, and antigen processing predictors.

Installation

pip install mhctools

For MHCflurry support, make sure the mhcflurry package is installed and download its models:

pip install mhcflurry
mhcflurry-downloads fetch

Quick start

from mhctools import NetMHCpan41

predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])

# predict() returns a list of PeptideResult — one per peptide
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])

for r in results:
    if r.affinity:
        print(f"{r.peptide} -> {r.affinity.allele} IC50={r.affinity.value:.1f}nM")

Data model

predict() returns a list of PeptideResult — one per peptide. Each result carries the peptide string and provides accessors for each prediction kind (affinity, presentation, stability, etc.). Accessors return None when a predictor doesn't produce that kind.

results = predictor.predict(["SIINFEKL", "GILGFVFTL"])
r = results[0]

r.peptide                    # "SIINFEKL"
r.affinity.value             # IC50 in nM
r.affinity.percentile_rank   # 0-100, lower = better
r.affinity.allele            # best allele for this kind
r.presentation               # None if predictor doesn't produce it

Under the hood, each PeptideResult wraps a tuple of Prediction objects — frozen dataclasses, one per allele-kind combination. Everything converts to DataFrames with consistent column names.
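To make that wrapper concrete, here is a minimal, hypothetical model of it in plain Python. SketchPrediction and SketchPeptideResult are invented names carrying only a few of the real fields, and the sketch assumes an accessor such as affinity returns the highest-scoring prediction of its kind (score is higher-is-better), or None when no prediction of that kind exists.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SketchPrediction:
    """Hypothetical, trimmed-down stand-in for mhctools' Prediction."""
    kind: str
    score: float  # ~0-1, higher = better
    peptide: str
    allele: str
    value: Optional[float] = None
    percentile_rank: Optional[float] = None


@dataclass(frozen=True)
class SketchPeptideResult:
    """Hypothetical stand-in for PeptideResult: one peptide, many predictions."""
    peptide: str
    preds: Tuple[SketchPrediction, ...]

    def best(self, kind: str) -> Optional[SketchPrediction]:
        # Highest-scoring prediction of this kind, or None if the kind is absent
        matches = [p for p in self.preds if p.kind == kind]
        return max(matches, key=lambda p: p.score) if matches else None

    @property
    def affinity(self) -> Optional[SketchPrediction]:
        return self.best("pMHC_affinity")

    @property
    def presentation(self) -> Optional[SketchPrediction]:
        return self.best("pMHC_presentation")


r = SketchPeptideResult(
    peptide="SIINFEKL",
    preds=(
        # One prediction per allele-kind combination (made-up numbers)
        SketchPrediction("pMHC_affinity", 0.85, "SIINFEKL", "HLA-A*02:01", 120.5, 0.8),
        SketchPrediction("pMHC_affinity", 0.40, "SIINFEKL", "HLA-B*07:02", 4500.0, 12.0),
    ),
)
print(r.affinity.allele)  # HLA-A*02:01
print(r.presentation)     # None
```

Because the dataclasses are frozen, assigning to a field (for example r.peptide = "X") raises dataclasses.FrozenInstanceError, so shared predictions can't be mutated by accident.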

Python API

Predicting peptides

from mhctools import NetMHCpan41

predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])

r = results[0]
r.peptide                      # "SIINFEKL"
r.offset                       # position in source protein (if scanned)
r.kinds                        # {"pMHC_affinity", "pMHC_presentation"}
r.alleles                      # {"HLA-A*02:01", "HLA-B*07:02"}

# best prediction by kind — None when the kind is absent
r.affinity                     # Prediction or None
r.presentation                 # Prediction or None
r.stability                    # None (predictor doesn't produce it)

if r.affinity:
    r.affinity.value            # IC50 in nM
    r.affinity.percentile_rank  # 0-100, lower = better
    r.affinity.score            # ~0-1, higher = better
    r.affinity.allele           # best allele for this kind

# by rank instead of score
r.best_affinity_by_rank        # Prediction with lowest percentile rank, or None

# all predictions
r.preds                        # tuple of all Prediction objects
r.filter(kind="pMHC_affinity")
r.filter(allele="HLA-A*02:01")

NetMHCpan 4.1 automatically emits both pMHC_affinity and pMHC_presentation predictions per peptide-allele pair.
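r.affinity and r.best_affinity_by_rank can point at different alleles, since percentile ranks are calibrated per allele and the allele with the best raw score need not have the best rank. The sketch below imitates filter() and both kinds of selection over plain dataclasses. Pred, filter_preds, best_by_score, and best_by_rank are hypothetical names, and skipping predictions that lack a rank is an assumption about how missing ranks are handled.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Pred:
    """Hypothetical minimal prediction record."""
    kind: str
    allele: str
    score: float                      # higher = better
    percentile_rank: Optional[float]  # lower = better, may be missing


def filter_preds(preds: Iterable[Pred], kind: Optional[str] = None,
                 allele: Optional[str] = None) -> List[Pred]:
    # Keep predictions that match every criterion that was supplied
    return [
        p for p in preds
        if (kind is None or p.kind == kind)
        and (allele is None or p.allele == allele)
    ]


def best_by_score(preds: Iterable[Pred], kind: str) -> Optional[Pred]:
    matches = filter_preds(preds, kind=kind)
    return max(matches, key=lambda p: p.score) if matches else None


def best_by_rank(preds: Iterable[Pred], kind: str) -> Optional[Pred]:
    # Predictions without a percentile rank cannot compete on rank
    ranked = [p for p in filter_preds(preds, kind=kind) if p.percentile_rank is not None]
    return min(ranked, key=lambda p: p.percentile_rank) if ranked else None


preds = (
    Pred("pMHC_affinity", "HLA-A*02:01", 0.70, 1.5),
    Pred("pMHC_affinity", "HLA-B*07:02", 0.60, 0.4),
    Pred("pMHC_presentation", "HLA-A*02:01", 0.90, 0.2),
)
print(best_by_score(preds, "pMHC_affinity").allele)    # HLA-A*02:01
print(best_by_rank(preds, "pMHC_affinity").allele)     # HLA-B*07:02
print(len(filter_preds(preds, allele="HLA-A*02:01")))  # 2
```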

Scanning proteins

predict_proteins() takes a dictionary mapping sequence names to protein sequences and returns {sequence_name: list[PeptideResult]}:

proteins = predictor.predict_proteins(
    {"TP53": "MEEPQSDPSVEPPLSQETFS...", "KRAS": "MTEYKLVVVGAGGVGKS..."},
    peptide_lengths=[9, 10],
)

for r in proteins["TP53"]:
    if r.affinity and r.affinity.value < 500:
        print(f"  offset={r.offset} {r.peptide} IC50={r.affinity.value:.0f}")
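Scanning amounts to sliding a window of each requested length along every sequence. The hypothetical scan_subsequences helper below shows which peptides and offsets that produces, assuming 0-based offsets; it illustrates the idea and is not the code mhctools runs.

```python
from typing import Dict, Iterable, List, Tuple


def scan_subsequences(
    proteins: Dict[str, str], peptide_lengths: Iterable[int]
) -> Dict[str, List[Tuple[int, str]]]:
    """Return {sequence_name: [(offset, peptide), ...]} for every window."""
    lengths = sorted(set(peptide_lengths))
    result = {}
    for name, seq in proteins.items():
        windows = []
        for k in lengths:
            # A sequence of length n has n - k + 1 windows of length k
            for offset in range(len(seq) - k + 1):
                windows.append((offset, seq[offset:offset + k]))
        result[name] = windows
    return result


print(scan_subsequences({"toy": "SIINFEKLQ"}, peptide_lengths=[8, 9]))
# {'toy': [(0, 'SIINFEKL'), (1, 'IINFEKLQ'), (0, 'SIINFEKLQ')]}
```

A protein of length n therefore yields n - k + 1 peptides for each length k, so the number of predictions grows linearly with protein length and with the number of lengths requested.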

DataFrames

Each prediction method has a _dataframe variant that flattens its results into a pandas DataFrame with consistent columns:

df = predictor.predict_dataframe(["SIINFEKL"], sample_name="pat001")
df = predictor.predict_proteins_dataframe({"TP53": "MEEPQ..."}, sample_name="pat001")

Columns: sample_name, peptide, n_flank, c_flank, source_sequence_name, offset, predictor_name, predictor_version, allele, kind, score, value, percentile_rank.
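The column list suggests one row per Prediction, with sample_name attached to each row. A rough sketch of that flattening using only the standard library follows; FlatPred and to_rows are hypothetical, and the real _dataframe methods may fill or order fields differently.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional

COLUMNS = [
    "sample_name", "peptide", "n_flank", "c_flank", "source_sequence_name",
    "offset", "predictor_name", "predictor_version", "allele", "kind",
    "score", "value", "percentile_rank",
]


@dataclass(frozen=True)
class FlatPred:
    """Hypothetical prediction record carrying the per-prediction columns."""
    peptide: str
    allele: str
    kind: str
    score: float
    value: Optional[float] = None
    percentile_rank: Optional[float] = None
    n_flank: Optional[str] = None
    c_flank: Optional[str] = None
    source_sequence_name: Optional[str] = None
    offset: Optional[int] = None
    predictor_name: Optional[str] = None
    predictor_version: Optional[str] = None


def to_rows(preds: List[FlatPred], sample_name: str) -> List[dict]:
    rows = []
    for p in preds:
        fields = asdict(p)
        fields["sample_name"] = sample_name
        # Emit keys in the documented column order
        rows.append({col: fields[col] for col in COLUMNS})
    return rows


rows = to_rows(
    [FlatPred("SIINFEKL", "HLA-A*02:01", "pMHC_affinity", 0.85, 120.5, 0.8)],
    sample_name="pat001",
)
print(list(rows[0]) == COLUMNS)  # True
```

With pandas installed, pandas.DataFrame(rows, columns=COLUMNS) turns these rows into a frame with the same column order.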

Multi-sample predictions

MultiSample runs a predictor across multiple samples, each with its own HLA genotype:

from mhctools import MultiSample, NetMHCpan41

ms = MultiSample(
    samples={
        "pat001": ["HLA-A*02:01", "HLA-B*07:02"],
        "pat002": ["HLA-A*01:01", "HLA-B*08:01"],
    },
    predictor_class=NetMHCpan41,
)

# {sample_name: list[PeptideResult]}
results = ms.predict(["SIINFEKL", "GILGFVFTL"])

# {sample_name: {seq_name: list[PeptideResult]}}
protein_results = ms.predict_proteins({"TP53": "MEEPQ..."})

# flat DataFrames with sample_name column
df = ms.predict_dataframe(["SIINFEKL"])
df = ms.predict_proteins_dataframe({"TP53": "MEEPQ..."})
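Conceptually, this amounts to building one predictor per sample from that sample's alleles and collecting the results under the sample name. The sketch below assumes that model; ToyPredictor and predict_per_sample are hypothetical, and MultiSample itself may batch or cache work differently.

```python
from typing import Dict, List, Sequence, Tuple


class ToyPredictor:
    """Hypothetical stand-in for a predictor class such as NetMHCpan41."""

    def __init__(self, alleles: Sequence[str]):
        self.alleles = list(alleles)

    def predict(self, peptides: Sequence[str]) -> List[Tuple[str, Tuple[str, ...]]]:
        # Fake output: pair each peptide with the alleles it would be scored against
        return [(p, tuple(self.alleles)) for p in peptides]


def predict_per_sample(samples: Dict[str, Sequence[str]], predictor_class, peptides):
    # One predictor instance per sample, built from that sample's genotype
    return {
        name: predictor_class(alleles=alleles).predict(peptides)
        for name, alleles in samples.items()
    }


out = predict_per_sample(
    {"pat001": ["HLA-A*02:01", "HLA-B*07:02"], "pat002": ["HLA-A*01:01", "HLA-B*08:01"]},
    ToyPredictor,
    ["SIINFEKL"],
)
print(out["pat002"])  # [('SIINFEKL', ('HLA-A*01:01', 'HLA-B*08:01'))]
```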

Measurement kinds

Each Prediction has a kind string describing what it measures:

| Kind | Meaning |
|---|---|
| pMHC_affinity | Peptide-MHC binding affinity |
| pMHC_presentation | Likelihood of surface presentation (EL/processing) |
| pMHC_stability | Peptide-MHC complex stability |
| immunogenicity | T-cell immunogenicity |
| antigen_processing | Combined processing score |
| proteasome_cleavage | Proteasomal cleavage score |
| tap_transport | TAP transport score (reserved, not yet used) |
| erap_trimming | ERAP trimming score (reserved, not yet used) |

The Prediction object

Every prediction is a frozen, self-contained Prediction dataclass:

from mhctools import Prediction

pred = Prediction(
    kind="pMHC_affinity",
    score=0.85,           # ~0-1, higher = better
    peptide="SIINFEKL",
    allele="HLA-A*02:01",
    value=120.5,          # IC50 in nM
    percentile_rank=0.8,
    source_sequence_name="TP53",
    offset=42,
    predictor_name="netMHCpan",
    predictor_version="4.1",
)

score is always higher-is-better. value is in native units (nM for affinity, hours for stability). percentile_rank is optional; when present, it ranges from 0 to 100, and lower means stronger.
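Because score and percentile_rank point in opposite directions, and the direction of value depends on the kind (a lower IC50 means tighter binding, a longer half-life means a more stable complex), sorting is simplest on score or rank. A small sketch with a hypothetical Scored record; pushing missing ranks to the end is a choice made here, not documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Scored:
    """Hypothetical record holding the two comparable fields."""
    peptide: str
    score: float                      # higher = better
    percentile_rank: Optional[float]  # lower = better, optional


def sort_by_score(preds: List[Scored]) -> List[Scored]:
    return sorted(preds, key=lambda p: p.score, reverse=True)


def sort_by_rank(preds: List[Scored]) -> List[Scored]:
    # (False, rank) sorts before (True, 0.0), so missing ranks go last
    return sorted(
        preds,
        key=lambda p: (p.percentile_rank is None,
                       p.percentile_rank if p.percentile_rank is not None else 0.0),
    )


preds = [Scored("AAA", 0.2, 5.0), Scored("BBB", 0.9, None), Scored("CCC", 0.5, 0.3)]
print([p.peptide for p in sort_by_score(preds)])  # ['BBB', 'CCC', 'AAA']
print([p.peptide for p in sort_by_rank(preds)])   # ['CCC', 'AAA', 'BBB']
```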

Supported predictors

MHC binding & presentation

| Predictor | Kinds produced | Requires |
|---|---|---|
| NetMHCpan / NetMHCpan41 / NetMHCpan42 | affinity + presentation | NetMHCpan |
| NetMHCpan4 | affinity or presentation | NetMHCpan 4.0 |
| NetMHCpan3 / NetMHCpan28 | affinity | older NetMHCpan |
| NetMHC / NetMHC3 / NetMHC4 | affinity | NetMHC |
| NetMHCIIpan / NetMHCIIpan43 | affinity or presentation | NetMHCIIpan |
| NetMHCcons | affinity | NetMHCcons |
| NetMHCstabpan | stability | NetMHCstabpan |
| MHCflurry | affinity + presentation | pip install mhcflurry + mhcflurry-downloads fetch |
| MHCflurry_Affinity | affinity | pip install mhcflurry + mhcflurry-downloads fetch |
| BigMHC | presentation or immunogenicity | BigMHC clone (set BIGMHC_DIR) |
| MixMHCpred | presentation | MixMHCpred |
| IedbNetMHCpan / IedbSMM / IedbNetMHCIIpan | affinity | IEDB web API |
| RandomBindingPredictor | affinity | (built-in) |

Antigen processing

| Predictor | Kinds produced | Requires |
|---|---|---|
| Pepsickle | proteasome cleavage | pip install pepsickle (paper) |
| NetChop | proteasome cleavage | NetChop |

Processing predictors use configurable scoring to aggregate per-position cleavage probabilities into peptide-level scores. See ProcessingPredictor and ProteasomePredictor for details.
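As one illustration of such aggregation, a peptide can be scored by the probability of a proteasomal cut at its C-terminus times the probability of no cut inside it, leaving the N-terminus aside since it is often trimmed later by ERAP. This particular formula is an assumption chosen for the sketch, not necessarily what ProteasomePredictor computes; peptide_processing_score is a hypothetical name.

```python
import math
from typing import Sequence


def peptide_processing_score(cleavage_probs: Sequence[float], start: int, length: int) -> float:
    """Hypothetical aggregation: P(cut at C-terminus) * P(no cut inside the peptide).

    cleavage_probs[i] is the probability of a cut after residue i of the protein.
    """
    end = start + length - 1  # index of the peptide's C-terminal residue
    if start < 0 or length < 1 or end >= len(cleavage_probs):
        raise ValueError("peptide window falls outside the protein")
    c_term = cleavage_probs[end]
    # Internal cut sites lie after residues start .. end - 1
    no_internal_cut = math.prod(1.0 - p for p in cleavage_probs[start:end])
    return c_term * no_internal_cut


probs = [0.1, 0.2, 0.0, 0.5, 0.9]  # toy per-position cut probabilities
print(peptide_processing_score(probs, start=1, length=3))  # 0.4
```

Multiplying independent per-site probabilities is the simplest choice; alternatives such as taking only the C-terminal probability, or averaging internal sites, trade biological realism against sensitivity to noisy per-position predictions.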

Command-line examples

Prediction for user-supplied peptide sequences

mhctools --sequence SIINFEKL SIINFEKLQ --mhc-predictor netmhc --mhc-alleles A0201

Automatically extract peptides as subsequences of the specified lengths

mhctools --sequence AAAQQQSIINFEKL --extract-subsequences --mhc-peptide-lengths 8-10 --mhc-predictor mhcflurry --mhc-alleles A0201
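Assuming the 8-10 range is inclusive, the second command asks for 8-, 9-, and 10-mers, and a sequence of length n yields n - k + 1 peptides of length k. The hypothetical parse_lengths helper below (not part of the mhctools CLI) spells out that arithmetic for AAAQQQSIINFEKL.

```python
from typing import List


def parse_lengths(spec: str) -> List[int]:
    """Parse '9', '8-10', or '8,9,11' into sorted lengths; ranges are inclusive."""
    lengths = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            lengths.update(range(lo, hi + 1))
        else:
            lengths.add(int(part))
    return sorted(lengths)


sequence = "AAAQQQSIINFEKL"
lengths = parse_lengths("8-10")
counts = {k: len(sequence) - k + 1 for k in lengths}
print(lengths)               # [8, 9, 10]
print(counts)                # {8: 7, 9: 6, 10: 5}
print(sum(counts.values()))  # 18
```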

Legacy API

The old predict_peptides() and predict_subsequences() methods still work and return BindingPredictionCollection objects:

from mhctools import NetMHCpan

predictor = NetMHCpan(alleles=["A*02:01"])
collection = predictor.predict_subsequences(
    {"1L2Y": "NLYIQWLKDGGPSSGRPPPS"},
    peptide_lengths=[9],
)
df = collection.to_dataframe()

for bp in collection:
    if bp.affinity < 100:
        print("Strong binder: %s" % bp)

To convert legacy results to the new types:

preds = collection.to_preds()           # list of Prediction
pp_list = collection.to_peptide_preds() # list of PeptideResult
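For an affinity predictor, the conversion presumably re-labels each legacy binding prediction as a pMHC_affinity Prediction, with the IC50 becoming value. The sketch below shows that field mapping with hypothetical LegacyBinding and NewPred classes (trimmed field sets, names invented here); score is left out because this README does not describe how legacy results map onto it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LegacyBinding:
    """Hypothetical subset of a legacy binding-prediction record."""
    source_sequence_name: Optional[str]
    offset: int
    peptide: str
    allele: str
    affinity: float                  # IC50 in nM
    percentile_rank: Optional[float]
    prediction_method_name: str


@dataclass(frozen=True)
class NewPred:
    """Hypothetical subset of the new Prediction fields."""
    kind: str
    peptide: str
    allele: str
    value: Optional[float]
    percentile_rank: Optional[float]
    source_sequence_name: Optional[str]
    offset: Optional[int]
    predictor_name: str


def legacy_to_pred(bp: LegacyBinding) -> NewPred:
    # The legacy affinity (IC50, nM) becomes the value of a pMHC_affinity prediction
    return NewPred(
        kind="pMHC_affinity",
        peptide=bp.peptide,
        allele=bp.allele,
        value=bp.affinity,
        percentile_rank=bp.percentile_rank,
        source_sequence_name=bp.source_sequence_name,
        offset=bp.offset,
        predictor_name=bp.prediction_method_name,
    )


# Made-up numbers for the first 9-mer of the 1L2Y sequence above
legacy = LegacyBinding("1L2Y", 0, "NLYIQWLKD", "HLA-A*02:01", 85.0, 0.5, "netMHCpan")
pred = legacy_to_pred(legacy)
print(pred.kind, pred.value)  # pMHC_affinity 85.0
```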

About

Python interface to running command-line and web-based MHC binding predictors
