bindsight

Expression → Binder. The first open-source pipeline that takes RNA-seq counts and outputs ranked de novo protein binder candidates, with full provenance back to the patient cohort.

👉 Try it live

Primary (Hugging Face Space, 16 GB CPU): huggingface.co/spaces/Mikhaeelatefrizk/bindsight Mirror (Streamlit Community Cloud, 1 GB CPU): bindsight.streamlit.app

Zero install — runs in your browser. Click the Demo tab and watch the full pipeline rediscover HER2 + EGFR from synthetic RNA-seq counts in ~60 seconds (cached for ~0.1 s on every revisit).

Both hosts are free-tier and will sleep after several days without traffic; a GitHub Actions cron pings both URLs every 6 hours so the next visitor lands on a warm container. If you hit either link after a long quiet stretch, give the wake-up screen 30–60 s and reload once.

🚀 v0.1.0 — discovery half end-to-end on CPU; design + validation wired for free Colab; web UI deployed on Streamlit Cloud.

New here? → What is bindsight? (5-min read) · How to use it · Use cases · Designing on Colab · Hugging Face Space backup · Keeping the demo warm (free playbook)

Three ways to try it

1. Web app — Hugging Face Space (zero install) · Streamlit mirror

Anyone visiting either URL above gets:

The Home page with what bindsight is
A Demo button that runs the full pipeline live and renders a report
A Run on my data page (upload counts.tsv + design.tsv → get results)
A Browse a run page to inspect any output directory

The Hugging Face Space is the primary mirror (16 GB CPU). The Streamlit Cloud deploy at bindsight.streamlit.app is the same app on smaller free-tier infrastructure (1 GB CPU). Both hosts sleep after several days of inactivity; a 6-hourly GitHub Actions ping keeps them warm, but the very first visit after a long quiet period can still take ~30–120 s to wake.

2. Local web app (one command)

pip install -e ".[discover,report]"
bindsight ui
# → opens http://localhost:8501 with the same multi-page interface

3. CLI (60 seconds)

bindsight demo

Runs the full discovery half on a shipped 10-gene tumor-vs-normal cohort, produces a real HTML report you can open in a browser. The pipeline rediscovers HER2 (ERBB2) and EGFR as the top antibody-tractable surface antigens — entirely from RNA-seq counts. ~30 seconds, no internet, no GPU.

$ bindsight demo
╭───────────── Demo run ──────────────╮
│ The pipeline should rediscover      │
│ ERBB2 (HER2) and EGFR as top        │
│ antibody-tractable surface antigens.│
╰─────────────────────────────────────╯
INFO  DEGs: 10 total, 5 significant
INFO  surfaceome filter: 5 → 2
INFO  wrote runs/demo/report.html
╭───────── bindsight demo ─────────────╮
│ Demo complete!                      │
│ Report HTML: runs/demo/report.html  │
╰─────────────────────────────────────╯

Why this exists

Two ecosystems in computational biology operate side-by-side and barely talk to each other:

Genomics (DESeq2, edgeR, Seurat, scanpy, TCGA, recount3) stops at "here are the interesting genes."
Protein design (RFdiffusion, ProteinMPNN, BindCraft, BoltzGen, AlphaFold, Boltz-2) starts from "given a target..."

The bridge between them — "this gene is up in disease, low in healthy tissue, surface-exposed, has a known targetable site, here is a docked binder seed and a designed binder ranked by predicted affinity, with the receipts back to the patient cohort" — is missing. People build it ad-hoc, per project, never reproducibly. bindsight ships that bridge as one tool.

What it does

  RNA-seq counts (bulk or sc)                       Designed protein binders
              │                                              ▲
              │                                              │
              ▼                                              │
   Differential expression  ──►  Surface-exposed  ──►  De novo backbone
   (pydeseq2 or DESeq2)         (SURFY)              (RFdiffusion / BindCraft / BoltzGen)
                                     │                       │
                                     ▼                       ▼
                              Targetable sites          Sequence design
                              (SURFACE-Bind)            (ProteinMPNN)
                                     │                       │
                                     ▼                       ▼
                              AlphaFoldDB structure     Affinity + structure
                                                        validation
                                                        (Boltz-2 / Chai-1r)
                                                              │
                                                              ▼
                                                  Multi-objective ranking
                                                              │
                                                              ▼
                                       Quarto report + RO-Crate (Zenodo)
                                       with full PROV-O provenance

Who it's for

Translational researchers who want a free, reproducible "data → designed binder" pipeline.
Clinical biologists who need an audit trail back from a binder to the patient cohort.
Method developers who want a held-out evaluation harness (rediscovery of known antigens) to benchmark new designers/validators.
Pharma early-discovery teams who want an open comparator they can extend with proprietary designers via the plugin interface.

What's distinctive

	Existing protein-design tools	bindsight
Input	Target structure	RNA-seq counts
Provenance	PDB + maybe a log	PROV-O JSON-LD + RO-Crate, audit trail to patient cohort
Hardware	HPC assumed	CPU laptop + offload to free Colab / Modal / Kaggle
Cost-awareness	None	`--dry-run` estimates GPU $ before running
Negative results	Discarded	Catalogued (`failure_taxonomy.parquet`)
Citability	Code dump	DOI per release, JSON-Schema-validated outputs, JOSS-style

For the full landscape comparison, see ARCHITECTURE.md.

What works today (v0.1.0)

Capability	Status	How to try
Web UI — multi-page Streamlit app (Home / Demo / Run on my data / Browse / About)	✅ ready	`bindsight ui` or Streamlit Cloud
`bindsight demo` — full discovery on shipped example + paper-style report	✅ ready	`bindsight demo`
`bindsight discover` — your own RNA-seq cohort → ranked targets	✅ ready	`bindsight discover my.yaml --out runs/x`
`bindsight rank` — multi-objective composite scoring of validated binders	✅ ready	`bindsight rank runs/x`
`bindsight report --format html` — paper-style HTML, embedded volcano + tables + provenance	✅ ready	`bindsight report runs/x`
`bindsight report --format streamlit` — interactive dashboard for one run	✅ ready	`bindsight report runs/x --format streamlit`
`bindsight run` — full pipeline orchestrator (discover → design → validate → rank → report → export)	✅ ready	`bindsight run my.yaml --out runs/x`
`bindsight export` — RO-Crate zip for Zenodo deposit	✅ ready	`bindsight export runs/x --out runs/x.crate.zip`
`bindsight design --dry-run` — GPU cost estimate for any backend	✅ ready	`bindsight design runs/x --backend modal --dry-run`
`bindsight doctor` — diagnose deps, caches, vendored data	✅ ready	`bindsight doctor`
`bindsight verify-licenses` — per-component license inventory	✅ ready	`bindsight verify-licenses`
GPU design notebook — RFdiffusion + ProteinMPNN + Boltz-2 wired in templated Colab notebook	✅ ready	`bindsight design runs/x --backend colab` opens a notebook with real install + inference cells
Manual Colab recipe — step-by-step for the GPU half	✅ ready	docs/colab-design-howto.md

Status & roadmap

✅ v0.1.0 (current) — discovery + rank + report + export + web UI + real Colab notebook patterns
🔬 v0.1.x — first-user GPU validation (someone with a real GPU runs the Colab notebook end-to-end and reports any install/inference issues; we patch fast)
⏳ v0.2.0 — live Modal/Colab job submission via API, BindCraft + BoltzGen plugins fully wired, scRNA-seq input
⏳ v1.0.0 — JOSS submission + validation paper (rediscovery of HER2/EGFR/MSLN/CLDN6 from blinded TCGA cohorts)

See ARCHITECTURE.md § Phased Roadmap for details.

Install

bindsight is not yet on PyPI. Install from source (Windows / macOS / Linux, Python 3.11+):

git clone <repo-url> bindsight
cd bindsight
python -m venv .venv

# Windows
.venv\Scripts\activate

# macOS / Linux
source .venv/bin/activate

pip install -e ".[dev,discover,report]"
bindsight --version
bindsight doctor                # confirm install is clean
bindsight demo                  # run the 60-second demo

For Conda users, envs/discover.yaml provides the same set of dependencies:

mamba env create -f envs/discover.yaml
mamba activate bindsight-discover
pip install -e ".[dev,report]"

Quickstart (target: v0.0.x)

# 1. Discover targets from a TCGA cohort (CPU only, ~10 minutes on a laptop)
bindsight discover examples/tcga_luad.yaml --out runs/luad_v01

# 2. Inspect the discovered targets
bindsight report runs/luad_v01 --format html
open runs/luad_v01/report.html

# 3. (v0.1+) Design binders for the top 5 targets via Colab GPU
bindsight design runs/luad_v01 --backend colab --trajectories 50

# 4. (v0.1+) Validate with Boltz-2
bindsight validate runs/luad_v01 --backend colab --validator boltz2

# 5. (v0.1+) Rank, report, export as RO-Crate
bindsight rank runs/luad_v01
bindsight report runs/luad_v01 --format html --include-binders
bindsight export runs/luad_v01 --format ro-crate --out runs/luad_v01.crate.zip

Repository layout

bindsight/                 # Python package
├── io/                   # Parquet, FASTA, PDB, mmCIF, manifest readers
├── deg/                  # pydeseq2 wrapper (+ optional R bridge)
├── targets/              # Open Targets, HPA, GTEx, recount3 clients
├── surfaceome/           # SURFY filter + SURFACE-Bind client
├── structures/           # AlphaFoldDB + RCSB/PDBe fetch
├── epitopes/             # SURFACE-Bind site lookup; fpocket fallback (v0.2)
├── design/               # Designer plugin interface
├── runners/              # Colab / Modal / Kaggle / local-Docker adapters
├── validate/             # Boltz-2 default; Chai-1r, AF2-IG opt-in
├── rank/                 # Multi-objective scoring
├── provenance/           # PROV-O JSON-LD schema + RO-Crate emitter
├── report/               # Quarto template + Streamlit app
└── cli.py                # Click entrypoint

envs/                     # Conda environment files (one per stage)
examples/                 # Example pipeline configs (TCGA-LUAD, etc.)
tests/                    # Pytest smoke + integration tests + fixtures
docs/                     # mkdocs-material site source
.github/workflows/        # CI + Zenodo deposit on tag

ARCHITECTURE.md           # Architectural source of truth
LICENSING.md              # Per-dependency license inventory
CONTRIBUTING.md           # How to contribute
CHANGELOG.md              # Per-version changes
CITATION.cff              # Zenodo / GitHub citation metadata
Snakefile                 # Snakemake DAG
pyproject.toml            # Python packaging

Documentation

ARCHITECTURE.md — system design, module contracts, design rationale
LICENSING.md — per-dependency license inventory and commercial-use guidance
CONTRIBUTING.md — dev setup, testing, commit conventions
CHANGELOG.md — per-version changes
docs/ — long-form docs (built with mkdocs build)

Acknowledgments

bindsight is an opinionated wrapper. Real intellectual credit belongs to the upstream tool authors. See LICENSING.md for the full inventory; the work this builds on most directly:

SURFACE-Bind (Khakzad et al., PNAS 2025) — the targetable-sites catalog that makes the bridge tractable
pydeseq2 (Muzellec et al., Bioinformatics 2023) — Python DESeq2 implementation
RFdiffusion (Watson et al., Nature 2023) — backbone generation
ProteinMPNN (Dauparas et al., Science 2022) — sequence design
Boltz-2 (Wohlwend et al., 2025) — structure + affinity prediction
BindCraft (Pacesa et al., Nature 2025) — one-shot binder design
Snakemake (Mölder et al., F1000Research 2021) — workflow orchestration

Citation

If you use bindsight in your work, please cite it via the Zenodo DOI:

Atef Rizk, M. (2026). bindsight: a reproducible bridge from RNA-seq to de novo protein binder design (v0.1.0). Zenodo. https://doi.org/10.5281/zenodo.20121496

BibTeX:

@software{atefrizk_bindsight_2026,
  author       = {Atef Rizk, Mikhaeel},
  title        = {bindsight: a reproducible bridge from RNA-seq to de novo protein binder design},
  year         = {2026},
  publisher    = {Zenodo},
  version      = {v0.1.0},
  doi          = {10.5281/zenodo.20121496},
  url          = {https://doi.org/10.5281/zenodo.20121496},
  orcid        = {https://orcid.org/0009-0006-1069-9558}
}

GitHub also exposes a "Cite this repository" button on the right sidebar of the repo page that auto-generates citations in BibTeX, APA, and other formats from CITATION.cff. Please also cite the upstream tools you used (the per-run manifest emits a software.bib to make this easy).

About the author

bindsight is built and maintained by Mikhaeel Atef Rizk — PharmD graduate of the German University in Cairo (GUC), currently finishing the Egyptian post-PharmD applied-pharmacy term (Imtiyaz). Earlier in 2026 he had a research rotation at the German International University in Berlin (GIU Berlin) where he picked up R / RStudio.

ORCID: 0009-0006-1069-9558
GitHub: @mikhaeelatefrizk
Email: mikhaeelatefrizk@proton.me
Languages: Arabic (native), English (full professional), German (professional working ≈ B2), French, Russian

Sister projects on GitHub

bindsight sits at the deep end of an ongoing bioinformatics portfolio:

bioinformatics-portfolio — an end-to-end bioinformatics portfolio with three subprojects, each fully reproducible from raw data to figures:
- 01-rnaseq-fox-domestication — RNA-seq differential expression on GEO GSE76517, replicating the Kukekova et al. PNAS 2018 silver-fox domestication study
- 02-tcga-survival-kidney-cancer — TCGA-KIRC clinical survival analysis identifying EPAS1 / HIF-2α as a prognostic biomarker (target of FDA-approved belzutifan)
- 03-scrnaseq-pbmc-seurat — Seurat v5 single-cell RNA-seq workflow on the 10x PBMC 3k dataset, recovering 8 immune populations
affect-labeling-review — a pre-registered systematic review + meta-analysis of affect labeling (Lieberman et al. 2007 paradigm). Real random-effects meta-analysis (k=9), PRISMA 2020, RoB 2 / ROBINS-I, ~14,000-word manuscript, open data + open code, .zenodo.json for citable archival
awesome-protein-design-software — curated list of protein-design / structure-prediction software (RFdiffusion, ProteinMPNN, Boltz, AlphaFold, ESMFold, etc.)
Awesome-Bioinformatics — curated list of bioinformatics libraries and tools

License

MIT. See LICENSE and LICENSING.md for component-level details.

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
.github/workflows		.github/workflows
.huggingface		.huggingface
announcement		announcement
bindsight		bindsight
data/surface_bind		data/surface_bind
docs		docs
envs		envs
examples		examples
paper		paper
scripts		scripts
tests		tests
tools		tools
.editorconfig		.editorconfig
.gitattributes		.gitattributes
.gitignore		.gitignore
ARCHITECTURE.md		ARCHITECTURE.md
CHANGELOG.md		CHANGELOG.md
CITATION.cff		CITATION.cff
CONTRIBUTING.md		CONTRIBUTING.md
CUSTOM_DOMAIN.md		CUSTOM_DOMAIN.md
GO_LIVE.ps1		GO_LIVE.ps1
LICENSE		LICENSE
LICENSING.md		LICENSING.md
PUBLISH_PYPI.md		PUBLISH_PYPI.md
README.md		README.md
RENAME.md		RENAME.md
SECURITY.md		SECURITY.md
Snakefile		Snakefile
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
streamlit_app.py		streamlit_app.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

bindsight

👉 Try it live

Three ways to try it

1. Web app — Hugging Face Space (zero install) · Streamlit mirror

2. Local web app (one command)

3. CLI (60 seconds)

Why this exists

What it does

Who it's for

What's distinctive

What works today (v0.1.0)

Status & roadmap

Install

Quickstart (target: v0.0.x)

Repository layout

Documentation

Acknowledgments

Citation

About the author

Sister projects on GitHub

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

bindsight

👉 Try it live

Three ways to try it

1. Web app — Hugging Face Space (zero install) · Streamlit mirror

2. Local web app (one command)

3. CLI (60 seconds)

Why this exists

What it does

Who it's for

What's distinctive

What works today (v0.1.0)

Status & roadmap

Install

Quickstart (target: v0.0.x)

Repository layout

Documentation

Acknowledgments

Citation

About the author

Sister projects on GitHub

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages