Releases: antigenomics/seqtree
v0.2.0 — Miyazawa-Jernigan structural matrix
Highlights
The `structural` matrix is now a Miyazawa-Jernigan (MJ) interaction-strength matrix. Each residue's interaction strength q(a) = mean_b e(a,b) is read off the MJ contact potential, and sim(a,b) = 10·(1 − |q̂(a) − q̂(b)|). Substitutions between residues of similar interaction strength are cheap, so the matrix separates strong hydrophobic interactors (F W C L Y M I V) from weak polar/charged ones (S Q D E K). This is the strong/weak-interactor axis of TCR-recognition models (Košmrlj et al., PNAS 2008, doi:10.1073/pnas.0808081105; MJ energies from Miyazawa & Jernigan, J Mol Biol 1996). It lets `seqtrie` align dissimilar-but-chemically-equivalent loops.
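As a rough illustration of the formula above, the sketch below builds sim(a,b) from a small symmetric contact-energy table. It is not the code in `bench/gen_matrices.py`: the function name `interaction_similarity` is hypothetical, the 3×3 energies are made-up numbers rather than real MJ values, and min-max scaling of q to [0, 1] is only an assumption about what q̂ denotes.

```python
def interaction_similarity(energy, scale=10.0):
    """Build sim(a, b) = scale * (1 - |q_hat(a) - q_hat(b)|).

    `energy` is a symmetric square table of contact energies e(a, b).
    """
    n = len(energy)
    # q(a) = mean_b e(a, b): the average contact energy of residue a.
    q = [sum(row) / n for row in energy]
    # ASSUMPTION: q_hat is q min-max scaled to [0, 1]; the release notes
    # do not spell out the normalisation.
    lo, hi = min(q), max(q)
    q_hat = [(x - lo) / (hi - lo) for x in q]
    return [
        [scale * (1.0 - abs(q_hat[a] - q_hat[b])) for b in range(n)]
        for a in range(n)
    ]


# Toy energies for three hypothetical residues (NOT real MJ values).
toy_energy = [
    [-7.0, -5.0, -3.0],
    [-5.0, -4.0, -2.0],
    [-3.0, -2.0, -1.0],
]
sim = interaction_similarity(toy_energy)
```

Under this normalisation the diagonal always equals the scale factor (10), and the strongest and weakest interactors score 0 against each other, so pairs with similar interaction strength are the cheapest to substitute.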
Other
- `bench/gen_matrices.py` is now self-contained (embeds the MJ contact matrix; drops the `texshade.sty`/`kpsewhich` build dependency).
- Docs, header comments, and test anchors updated.
Full diff: v0.1.0...v0.2.0
seqtree 0.1.0
First non-beta release.
- Add `SubstitutionMatrix.penalty(a, b)` Python API: char-based Gram-distance substitution penalty (0 on identity, larger for dissimilar pairs), the same per-position cost the `seqtm`/`seqtrie` engines apply. Enables downstream MI-weighted BLOSUM scoring (mhcmatch).
Builds wheels (cp310-cp313; Linux/macOS/Windows) and publishes to PyPI via CI.
seqtree v0.0.3
seqtree v0.0.3 — pMHC epitope search, position-aware scoring, KmerIndex, MHC-allele guessing, and a reproducible benchmark pipeline.
Highlights since v0.0.2:
- pMHC epitope homology layer (anchor-masked k-mers, mimics, allele assignment) and C++ KmerIndex seed-and-gather.
- Position-aware scoring (PositionalMatrix) and local mode.
- Control-set E-values with Elhanati selection factor; MHC-I/II ROC-PR guessing benchmark + non-binder filter.
- PAM50 + custom (Gram-distance) substitution matrices; seqtm collision metric.
- Reproducible benchmark pipeline: deterministic table producers (shell+python) → plot scripts → committed oracle, with CI oracle-diff and time/memory regression checks.
Wheels (cp310–cp313; Linux x86-64, macOS arm64, Windows x86-64) and the sdist are published to PyPI by the Publish workflow.
seqtree v0.0.2
Second release of seqtree — fast fuzzy search over biological sequences (C++ core, Python bindings).
Highlights
- Scoring: principled Gram→squared-distance penalty `pen[a][b] = s_aa + s_bb − 2·s_ab` for BLOSUM62/PAM50.
- Index serialization: `Index.save()`/`Index.load()` (flat, fast reload).
- Control-set E-values (`seqtree.evalues`, `seqtree.load_control`): TCRNET-style neighbour counting against a real background, with a rigorous derivation in `appendix/evalue.tex` (Poisson/Chen–Stein bound, clonotype collapsing, multiple-testing control, epitope detection complexity, Karlin–Altschul reduction).
- Benchmarks: comprehensive E-value matrix (reference × control size × query × scope) and an NLV-vs-GIL epitope detection-complexity study, rendered to SVG via gnuplot.
- Robustness tests (long/empty/homopolymer sequences); exact-hit exclusion option.
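The Gram→squared-distance penalty in the Scoring item can be illustrated with a few lines of plain Python. This is a minimal sketch, not seqtree's implementation: `gram_penalty` is a hypothetical helper, and the scores are a handful of entries for A, I and L from the standard BLOSUM62 table, included only for illustration.

```python
def gram_penalty(scores):
    """Return pen[(a, b)] = s_aa + s_bb - 2*s_ab for every residue pair.

    `scores` maps residue pairs to similarity scores; each off-diagonal
    pair only needs to be listed once.
    """
    # Mirror the table so lookups work in either order.
    s = dict(scores)
    s.update({(b, a): v for (a, b), v in scores.items()})
    residues = sorted({a for a, _ in s})
    return {
        (a, b): s[(a, a)] + s[(b, b)] - 2 * s[(a, b)]
        for a in residues
        for b in residues
    }


# A few BLOSUM62 entries (illustrative subset, not the full matrix).
blosum62_subset = {
    ("A", "A"): 4, ("I", "I"): 4, ("L", "L"): 4,
    ("A", "I"): -1, ("A", "L"): -1, ("I", "L"): 2,
}
pen = gram_penalty(blosum62_subset)
```

Treating the similarity matrix as a Gram matrix of inner products makes the penalty a squared distance: it is 0 on identity by construction, small for conservative swaps such as I/L (4 + 4 − 2·2 = 4), and larger for dissimilar pairs such as A/L (4 + 4 + 2·1 = 10).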
Install
pip install seqtree
Wheels: CPython 3.10–3.13 on Linux, macOS, Windows.
seqtree v0.0.1
First public release of seqtree: fast fuzzy search over biological sequences (C++ core + Python bindings).
- Two engines over one trie: `seqtm` (branch-and-bound, exact per-type edit caps) and `seqtrie` (banded DP, matrix-weighted score budgets).
- Parallel batch / batch-of-batches search, on-demand alignment, BLOSUM62 + custom matrices.
- GPL-3.0-or-later.
Wheels (cp310–cp313, Linux/macOS/Windows) and the sdist are built and published to PyPI by the Publish To PyPI workflow.