OpenProteinAI · GiorgiaBorgmann · May 8, 2026 · May 6, 2026
diff --git a/source/_static/opmodels/dataset-assay/advanced-filters-panel.png b/source/_static/opmodels/dataset-assay/advanced-filters-panel.png
diff --git a/source/_static/opmodels/dataset-assay/aligned-vh-vl-cdrs.png b/source/_static/opmodels/dataset-assay/aligned-vh-vl-cdrs.png
diff --git a/source/_static/opmodels/dataset-assay/antibody-settings-panel.png b/source/_static/opmodels/dataset-assay/antibody-settings-panel.png
diff --git a/source/_static/opmodels/dataset-assay/cluster-bubble.png b/source/_static/opmodels/dataset-assay/cluster-bubble.png
diff --git a/source/_static/opmodels/dataset-assay/cluster-dropdown.png b/source/_static/opmodels/dataset-assay/cluster-dropdown.png
diff --git a/source/_static/opmodels/dataset-assay/cluster-modal.png b/source/_static/opmodels/dataset-assay/cluster-modal.png
diff --git a/source/_static/opmodels/dataset-assay/customize-columns-menu.png b/source/_static/opmodels/dataset-assay/customize-columns-menu.png
diff --git a/source/_static/opmodels/dataset-assay/dataset-assay-overview.png b/source/_static/opmodels/dataset-assay/dataset-assay-overview.png
diff --git a/source/_static/opmodels/dataset-assay/predict-dropdown.png b/source/_static/opmodels/dataset-assay/predict-dropdown.png
diff --git a/source/_static/opmodels/dataset-assay/predict-modal-1.png b/source/_static/opmodels/dataset-assay/predict-modal-1.png
diff --git a/source/_static/opmodels/dataset-assay/predict-modal-2.png b/source/_static/opmodels/dataset-assay/predict-modal-2.png
diff --git a/source/_static/opmodels/dataset-assay/predict-modal-3.png b/source/_static/opmodels/dataset-assay/predict-modal-3.png
diff --git a/source/_static/walkthroughs/antibody-hit-selection-ngs/dataset-assay-overview.png b/source/_static/walkthroughs/antibody-hit-selection-ngs/dataset-assay-overview.png
diff --git a/source/_static/walkthroughs/antibody-hit-selection-ngs/joint-plot-two-scores.png b/source/_static/walkthroughs/antibody-hit-selection-ngs/joint-plot-two-scores.png
diff --git a/..._static/walkthroughs/antibody-hit-selection-ngs/umap-clusters-vs-prediction.png b/..._static/walkthroughs/antibody-hit-selection-ngs/umap-clusters-vs-prediction.png
diff --git a/source/walkthroughs/antibody-hit-selection-ngs.rst b/source/walkthroughs/antibody-hit-selection-ngs.rst
@@ -0,0 +1,178 @@
+============================================
+Antibody hit selection from NGS data
+============================================
+
+
+This recommended end-to-end workflow guides you through selecting antibody hits 
+from NGS-derived libraries using the **Dataset Assay Details** page. Each step assumes 
+the previous step's output is in place.
+
+This walkthrough is task-oriented. For a detailed reference to the controls used below, such as Predict, Clustering, Advanced Filters, and the Antibody
+settings panel, see the comprehensive guide at :doc:`/web-app/opmodels/dataset-assay`.
+
+.. figure:: /_static/walkthroughs/antibody-hit-selection-ngs/dataset-assay-overview.png
+   :alt: Dataset Assay Details page overview, showing tabs, header chips, and action bar
+
+
+Prepare the dataset
+=====================
+
+Upload your NGS-derived antibody library as an assay dataset. The platform
+auto-annotates antibody datasets, including germline assignment, CDR3 extraction, and mutation counts. 
+Wait for the dataset status to reach *SUCCESS* before continuing to the next step.
+
+.. note::
+
+   Once the dataset reaches *SUCCESS*, a default UMAP job is queued
+   automatically and you should see it appear in the Jobs panel on the left. The **UMAP** tab will be empty until that job finishes.
+
+
+Configure the antibody view
+===========================
+
+On the **Dataset** tab, open the **Antibody** panel, then configure the following settings:
+
+1. **Set numbering scheme to IMGT**: Pick **IMGT** as the numbering scheme (matches most NGS annotation tools).
+2. **Select CDR regions**: Tick **Show CDR1 / CDR2 / CDR3** so regions are visually obvious in the
+   table.
+3. **Align sequences for comparisons**: Tick **Aligned** (and **Trim non-standard positions**) so VH and VL line up
+   across rows. This is required for visual comparison and for the *Liabilities*
+   column.
+4. **Add key annotation columns**: Click **Customize columns** and enable: *Heavy V-Gene*, *Light V-Gene*,
+   *Germline pair*, *Total Mutations*, *CDR3 length*, *Germline distance (%)*,
+   *Liabilities*.
+
+You now have a fully annotated table view of the library.
+
+
+Reduce redundancy with Clustering
+=================================
+
+NGS libraries are dominated by closely related clones. Cluster first so
+downstream steps operate on diverse families.
+
+1. **Initiate Clustering**: **Cluster → New clustering.**
+2. **Choose PoET-2**: Pick **PoET-2** (chain-aware; works for heavy:light pairs).
+3. **Use default parameters**: Leave **Reduction = Mean**, **Linkage = Ward**, **Metric = Euclidean**
+   unless you have a reason to change them.
+4. **Submit and optimize the cluster count**: Submit the job. When the chip turns active, open it and tune
+   **Number of clusters** while watching the UMAP. Choose a number that visually
+   separates populations.
+
+You now have a ``Cluster Number`` column.
+
+
+Pre-filter using NGS / antibody metadata
+========================================
+
+Open **Advanced Filters** from the Dataset tab and apply the following filters in sequence to refine your candidate pool:
+
+1. **Quality/abundance gate**: Filter on read count or replicate measurement
+   (e.g. ``count ≥ N``) to drop singletons.
+2. **Drop liability-heavy clones**: *(optional).* If the *Liabilities* column
+   flags many rows, sort by this column and exclude the worst candidates.
+3. **Germline focus**: *(optional but common).* Filter by **Heavy V-Gene** /
+   **Light V-Gene** or **Germline Pair** to focus on a developability-friendly
+   germline family.
+4. **Mutation window**: Apply a **Total Mutations** filter (e.g. ``≥ 3`` to skip
+   naive sequences, or ``≤ 15`` to skip over-mutated clones). Alternatively, use
+   **Germline distance (%)** if your sequences differ in length.
+5. **Enforce CDR3 length constraints**: Filter on **CDR3 Length** to enforce a length range
+   that suits your therapeutic format.
+6. **Diversity by cluster**: Add ``Group by = Cluster Number`` and
+   ``Top K per group = 1–5``, sorted by your strongest assay readout. This keeps up to K representatives
+   from each cluster that survives the earlier filters, so the final selection spans multiple families rather than one.
+
+Toggle **Show select column** if you want to see what got rejected instead of
+hiding it.
+
+
+Score with Predict
+======================
+
+With the candidate set narrowed, run a model to rank within it.
+
+1. **Predict → New prediction.**
+2. **Use your custom model**: If you have a trained user model for the property you care about (binding,
+   expression, developability), pick it under *User models*.
+3. Otherwise, on antibody datasets, the **Recommended for you** tab proposes
+   preset PoET-2 configurations using the dataset itself as the prompt
+   context — a good default when you have no labels yet.
+4. Submit. A **Predict** chip appears, and a score column is added.
+
+**Scale with parallel predictions**: You can run multiple predictions in parallel — for example, one for binding
+and one for developability. Each gets its own chip and its own column.
+
+
+Combine signals
+================
+
+Add a new filter card at the top of your existing filter stack to prioritize high-scoring candidates:
+
+- **Column =** ``<your prediction>`` **· Operator =** ``≥`` **· Value =**
+  ``<threshold>``
+
+Or sort: **Sort by** ``<prediction>`` **Desc**, **Top K = 96**.
+
+**Multi-criteria selection**: If you ran two predictions, stack two filter cards (one per score) to require
+both signals — e.g., high binding score AND high developability score.
+
+
+Inspect visually
+=================
+Use the built-in visualization tools to validate your filtered candidate set and explore relationships between key metrics:
+
+- **UMAP visualization**: Review the scatter plot with points colored by cluster assignment, and
+  toggle the prediction score as an alternative color axis. Confirm that the surviving candidates are spread across the
+  embedding space rather than concentrated in a single region, which indicates that diversity has been maintained.
+- **Joint plot**: Examine pairwise relationships, e.g., prediction score vs. CDR3
+  length, or score-A vs. score-B.
+- **Interactive filtering**: All visualizations respect the active filters. Filtered-out sequences appear dimmed
+  while selected candidates remain highlighted. Selecting points in the UMAP also selects the corresponding
+  rows in the table, and vice versa.
+
+.. figure:: /_static/walkthroughs/antibody-hit-selection-ngs/umap-clusters-vs-prediction.png
+   :alt: UMAP colored by Cluster Number, and the same UMAP colored by a prediction score
+
+.. figure:: /_static/walkthroughs/antibody-hit-selection-ngs/joint-plot-two-scores.png
+   :alt: Joint plot with two prediction scores as axes
+
+
+Export the hit list
+====================
+
+Return to the Dataset tab, which now lists your shortlisted and validated candidates.
+Options for next steps:
+
+- **Iterate with machine learning**: Train a new model on the selected rows (footer **Train model** action) for
+  an iterative round.
+- **Proceed to wet-lab validation**: Export the candidate table for ordering or experimental wet-lab follow-up.
+
+
+Quick decision guide
+====================
+
+.. list-table::
+   :header-rows: 1
+   :widths: 50 50
+
+   * - Goal
+     - Use
+   * - See CDRs / aligned heavy + light side-by-side
+     - **Antibody panel** → CDR checkboxes + Aligned + IMGT
+   * - Add germline / mutation columns to the table
+     - **Antibody panel** → Customize columns
+   * - Remove near-duplicate clones from NGS
+     - **Cluster** + filter ``Top K per Cluster Number``
+   * - Restrict to a germline family
+     - **Advanced Filters** on ``Heavy V-Gene`` / ``Germline Pair``
+   * - Filter out clones with developability liabilities
+     - **Antibody panel** → enable *Liabilities* column, then sort/exclude
+   * - Rank by predicted property
+     - **Predict** + sort by score column
+   * - Combine binding + developability
+     - Two **Predict** runs + two filter cards
+   * - See structure of the library
+     - **UMAP** tab, colored by ``Cluster Number`` or a prediction score
+   * - See pairwise tradeoffs
+     - **Joint plot** with two prediction scores as axes
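
Note on the "Pre-filter using NGS / antibody metadata" section: the filter stack it describes (abundance gate, mutation window, CDR3 length range, then top K per cluster) can be reproduced offline on an exported table as a sanity check. The following is a minimal sketch in plain Python, not the platform's implementation. The record fields (``count``, ``total_mutations``, ``cdr3_length``, ``cluster``, ``readout``), the sample rows, the default thresholds, and the function name ``select_candidates`` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical exported rows; field names and values are illustrative only.
ROWS = [
    {"id": "ab1", "count": 40, "total_mutations": 6, "cdr3_length": 12, "cluster": 0, "readout": 0.91},
    {"id": "ab2", "count": 35, "total_mutations": 7, "cdr3_length": 12, "cluster": 0, "readout": 0.85},
    {"id": "ab3", "count": 1, "total_mutations": 5, "cdr3_length": 11, "cluster": 1, "readout": 0.99},
    {"id": "ab4", "count": 12, "total_mutations": 4, "cdr3_length": 13, "cluster": 1, "readout": 0.60},
    {"id": "ab5", "count": 20, "total_mutations": 0, "cdr3_length": 10, "cluster": 2, "readout": 0.70},
    {"id": "ab6", "count": 9, "total_mutations": 22, "cdr3_length": 14, "cluster": 2, "readout": 0.80},
    {"id": "ab7", "count": 15, "total_mutations": 8, "cdr3_length": 25, "cluster": 3, "readout": 0.75},
]


def select_candidates(rows, min_count=2, mut_range=(3, 15), cdr3_range=(8, 16), top_k=1):
    """Apply the gates in sequence, then keep the top_k rows per cluster by readout."""
    # Gates 1-3: drop singletons, naive/over-mutated clones, and out-of-range CDR3s.
    survivors = [
        r for r in rows
        if r["count"] >= min_count
        and mut_range[0] <= r["total_mutations"] <= mut_range[1]
        and cdr3_range[0] <= r["cdr3_length"] <= cdr3_range[1]
    ]

    # Group the survivors by cluster number.
    by_cluster = defaultdict(list)
    for r in survivors:
        by_cluster[r["cluster"]].append(r)

    # Within each cluster, rank by the assay readout (descending) and keep top_k.
    picked = []
    for cluster in sorted(by_cluster):
        ranked = sorted(by_cluster[cluster], key=lambda r: r["readout"], reverse=True)
        picked.extend(ranked[:top_k])
    return picked


if __name__ == "__main__":
    for row in select_candidates(ROWS):
        print(row["id"], row["cluster"], row["readout"])
```

In the sample rows, clusters 2 and 3 lose every member to the earlier gates, so the per-cluster step returns no representative for them. When a family of interest disappears this way, the gate that removed it is the place to look.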
diff --git a/source/walkthroughs/index.rst b/source/walkthroughs/index.rst
@@ -14,7 +14,9 @@ Web App
   * - Walkthroughs
     - Tools covered
   * - `Lead optimization of monoclonal antibody to meet target product profile <./antibody-engineering.rst>`_
-    - Optimization and Prediction Models 
+    - Optimization and Prediction Models
+  * - `Antibody hit selection from NGS data <./antibody-hit-selection-ngs.rst>`_
+    - Optimization and Prediction Models
   * - `Finding mutational hotspots and designing one-shot variant libraries <./enzyme-engineering.rst>`_
     - PoET, Structure Prediction
   * - `Designing libraries of multimeric proteins <./multichain.rst>`_
@@ -75,6 +77,7 @@ Python API
   :maxdepth: 2
 
   Lead optimization of monoclonal antibody to meet target product profile <antibody-engineering>
+  Antibody hit selection from NGS data <antibody-hit-selection-ngs>
   Finding mutational hotspots and designing one-shot variant libraries <enzyme-engineering>
   Predicting the fitness of isomerases without experimental data <./predicting-fitness.ipynb>
   Understanding the impact of substitution and deletions on aliphatic amidase using different large language models <./AMIE_substitution_deletion_analysis_poet.ipynb>