
fix: resolve breaking issues in docking pipeline#2

Merged: AJPreto merged 1 commit into main from fix/docking-breaking-issues on May 4, 2026

Conversation

@AJPreto (Collaborator) commented Apr 30, 2026

Fixes three breaking issues discovered during GPU docking runs (Boltz2 + DiffDock):

  1. Makefile - Expand LD_LIBRARY_PATH with nvidia CUDA library paths (cu13, cuda_nvrtc, cudnn, cublas) to fix DiffDock crashing with NVRTC_ERROR_COMPILATION at runtime.
  2. scripts/run_guild.py - Add --no-decoys CLI flag so runs can proceed without a decoy file present.
  3. guild/bulk.py - Boltz2 template robustness: import PROTEINS_FOLDER, prefer single-chain clean PDB as template, and retry without template on empty manifest.

All three fixes validated end-to-end on a GPU node.

- Makefile: expand LD_LIBRARY_PATH with nvidia CUDA library paths to fix
  DiffDock NVRTC crash at runtime
- scripts/run_guild.py: add --no-decoys flag to allow running without a
  decoy file present
- guild/bulk.py: add PROTEINS_FOLDER import, prefer single-chain PDB as
  Boltz2 template, and retry without template on empty manifest (fixes
  Boltz2 template parsing IndexError)
Copilot AI review requested due to automatic review settings April 30, 2026 15:59

Copilot AI left a comment

Pull request overview

This PR addresses runtime breakages observed during GPU docking runs by improving container CUDA library visibility, adding a CLI option to skip decoys, and making Boltz2 templating more robust during bulk docking.

Changes:

  • Add a --no-decoys flag to allow running the pipeline when a decoy file is unavailable.
  • Improve Boltz2 template handling by preferring a single-chain cleaned PDB template and retrying without a template when Boltz produces an empty manifest.
  • Extend LD_LIBRARY_PATH in Docker run targets to include additional NVIDIA CUDA-related shared library paths.
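
As a rough illustration of what "preferring a single-chain template" could look like, the sketch below counts chain IDs in ATOM records and picks the first candidate file with exactly one chain. The function names (`chain_ids`, `pick_single_chain_template`) are hypothetical and are not taken from guild/bulk.py; the only format fact relied on is that the chain ID sits in column 22 of a PDB ATOM record.

```python
from pathlib import Path
from typing import Iterable, Optional, Set


def chain_ids(pdb_text: str) -> Set[str]:
    """Return the chain identifiers found in ATOM records of a PDB file."""
    chains = set()
    for line in pdb_text.splitlines():
        # In the fixed-column PDB format the chain ID is column 22 (index 21).
        if line.startswith("ATOM") and len(line) > 21:
            chains.add(line[21])
    return chains


def pick_single_chain_template(candidates: Iterable[str]) -> Optional[str]:
    """Return the first existing candidate with exactly one chain, else None.

    Hypothetical helper: returning None signals "run without a template".
    """
    for path in candidates:
        p = Path(path)
        if p.is_file() and len(chain_ids(p.read_text())) == 1:
            return str(path)
    return None
```

Returning `None` rather than raising keeps the caller free to fall back to a template-free run, which matches the retry behaviour described above.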

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| scripts/run_guild.py | Adds --no-decoys CLI flag and wires it into BulkRun(use_decoys=...). |
| guild/bulk.py | Uses a single-chain cleaned PDB as the Boltz template and retries Boltz without a template when the manifest is empty. |
| Makefile | Updates GPU docker run targets to include additional NVIDIA library paths in LD_LIBRARY_PATH. |
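
For readers unfamiliar with the pattern, a negative flag like `--no-decoys` is typically a `store_true` option whose value is inverted before being handed to the consumer. The sketch below assumes argparse and only mirrors the `use_decoys=...` keyword mentioned in the table; the parser layout and the helper name `use_decoys_from_args` are illustrative, not the contents of scripts/run_guild.py.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Illustrative docking CLI")
    # store_true defaults to False, so decoys stay enabled unless the
    # user explicitly passes --no-decoys.
    parser.add_argument(
        "--no-decoys",
        action="store_true",
        help="Run without requiring a decoy file.",
    )
    return parser


def use_decoys_from_args(argv: List[str]) -> bool:
    """Translate the CLI flag into the boolean a use_decoys= keyword expects."""
    args = build_parser().parse_args(argv)
    # argparse exposes --no-decoys as the attribute no_decoys.
    return not args.no_decoys
```

With this shape, the default invocation keeps the previous behaviour and only an explicit `--no-decoys` yields `use_decoys=False`.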

Comment thread guild/bulk.py
Comment on lines +597 to +598
    with open(manifest_path) as _mf:
        _manifest = _json.load(_mf)

Copilot AI Apr 30, 2026

manifest.json is loaded without any error handling; if the file is truncated/invalid (e.g., Boltz interrupted mid-write) this will raise JSONDecodeError and abort the entire batch. Consider wrapping the manifest load in try/except and treating parse errors like an empty manifest (log + retry without template, or skip with warning).

Suggested change

Before:

    with open(manifest_path) as _mf:
        _manifest = _json.load(_mf)

After:

    try:
        with open(manifest_path) as _mf:
            _manifest = _json.load(_mf)
    except (_json.JSONDecodeError, OSError) as exc:
        logger.warning(
            f"Boltz2 produced unreadable manifest for {run_id}: {exc}. "
            "Retrying without template..."
        )
        _manifest = {}
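
The same idea can be isolated into a small helper so that a missing, truncated, or empty manifest all collapse to "no records". This is a minimal sketch, not the code in guild/bulk.py: the helper name `load_manifest_records` is hypothetical, and the only thing taken from the PR is that the manifest is JSON with a `records` key.

```python
import json
import logging

logger = logging.getLogger(__name__)


def load_manifest_records(manifest_path):
    """Return the manifest's records, or [] if it is missing, unreadable, or empty."""
    try:
        with open(manifest_path) as fh:
            manifest = json.load(fh)
    except (json.JSONDecodeError, OSError) as exc:
        # A truncated file (for example, an interrupted write) lands here
        # instead of aborting the whole batch.
        logger.warning("Unreadable manifest %s: %s", manifest_path, exc)
        return []
    if not isinstance(manifest, dict):
        return []
    return manifest.get("records") or []
```

A caller could then treat an empty return value as the single trigger for the template-free retry, whatever the underlying cause.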

Comment thread guild/bulk.py
Comment on lines +588 to +603
    # Check if Boltz produced valid output (manifest with records).
    # Template PDB parsing can fail silently in Boltz2, resulting
    # in an empty manifest. If that happens, retry without the template.
    manifest_path = (
        f"{boltz_out_dir}/boltz_results_{run_id}_boltz/processed/manifest.json"
    )
    if os.path.exists(manifest_path):
        import json as _json

        with open(manifest_path) as _mf:
            _manifest = _json.load(_mf)
        if not _manifest.get("records"):
            logger.warning(
                f"Boltz2 produced empty manifest for {run_id} "
                "(likely template parsing failure). Retrying without template..."
            )

Copilot AI Apr 30, 2026

The new Boltz empty-manifest retry path is complex and impacts run stability, but there are no unit tests exercising it. Since this repo already has BulkRun tests, consider adding a test that mocks deploy_boltz/generate_boltz_yaml and verifies a retry occurs when manifest.json has no records.
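
One possible shape for such a test is sketched below. Because the real BulkRun internals are not shown in this PR, the retry logic is reduced to a hypothetical stand-in, `run_with_template_retry`, that takes the Boltz runner as a callable; in the actual repo the mock would instead replace deploy_boltz/generate_boltz_yaml. Only the manifest's `records` key and the "retry when the manifest exists but has no records" rule come from the diff.

```python
import json
import tempfile
import unittest
from pathlib import Path
from unittest import mock


def run_with_template_retry(run_boltz, manifest_path, template):
    """Run once with the template; rerun without it if the manifest has no records.

    Hypothetical stand-in for the retry path in guild/bulk.py.
    Returns True when a retry was performed.
    """
    run_boltz(template=template)
    path = Path(manifest_path)
    if template is None or not path.exists():
        return False
    if json.loads(path.read_text()).get("records"):
        return False
    run_boltz(template=None)
    return True


class RetryOnEmptyManifestTest(unittest.TestCase):
    def _run(self, manifest_content):
        with tempfile.TemporaryDirectory() as tmp:
            manifest = Path(tmp) / "manifest.json"
            manifest.write_text(json.dumps(manifest_content))
            run_boltz = mock.Mock()
            retried = run_with_template_retry(run_boltz, manifest, "clean.pdb")
        return retried, run_boltz

    def test_retries_when_manifest_has_no_records(self):
        retried, run_boltz = self._run({"records": []})
        self.assertTrue(retried)
        self.assertEqual(
            run_boltz.call_args_list,
            [mock.call(template="clean.pdb"), mock.call(template=None)],
        )

    def test_no_retry_when_records_present(self):
        retried, run_boltz = self._run({"records": [{"id": "x"}]})
        self.assertFalse(retried)
        run_boltz.assert_called_once_with(template="clean.pdb")
```

The cases can be run with `python -m unittest`; asserting on `call_args_list` checks both that a retry happened and that the second call dropped the template.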

Comment thread Makefile
Comment on lines +83 to 84
-e LD_LIBRARY_PATH=/opt/localcolabfold/.pixi/envs/default/lib:/usr/local/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cu13/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cuda_nvrtc/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cudnn/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cublas/lib \
guild:latest \

Copilot AI Apr 30, 2026

LD_LIBRARY_PATH is fully hardcoded here and duplicated across multiple targets, which can drift from the value baked into the image and is easy to forget to update in one place. Consider factoring this into a Makefile variable and/or appending to the container’s existing LD_LIBRARY_PATH instead of replacing it entirely.
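
To make the "append instead of replace" idea concrete, here is a small Python sketch (Python rather than Make, to match the rest of the repo's code) of a helper that a container entrypoint or launcher script could use. The directory names are copied from the Makefile line above; the helper name `extend_library_path` and the idea of calling it from an entrypoint are assumptions, not part of this PR.

```python
# Directories taken from the Makefile line under review.
NVIDIA_SITE = "/app/.venv/lib/python3.10/site-packages/nvidia"
EXTRA_CUDA_DIRS = [
    f"{NVIDIA_SITE}/{pkg}/lib"
    for pkg in ("cu13", "cuda_nvrtc", "cudnn", "cublas")
]


def extend_library_path(existing, extra_dirs):
    """Append extra_dirs to a colon-separated path, skipping entries already present."""
    parts = [p for p in existing.split(":") if p]
    for directory in extra_dirs:
        if directory not in parts:
            parts.append(directory)
    return ":".join(parts)
```

For example, `extend_library_path(os.environ.get("LD_LIBRARY_PATH", ""), EXTRA_CUDA_DIRS)` would keep whatever the image already set and add the CUDA directories once, so the list lives in a single place.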

Comment thread Makefile
Comment on lines +127 to 128
-e LD_LIBRARY_PATH=/opt/localcolabfold/.pixi/envs/default/lib:/usr/local/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cu13/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cuda_nvrtc/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cudnn/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cublas/lib \
guild:latest \

Copilot AI Apr 30, 2026

LD_LIBRARY_PATH is duplicated here (and differs from the image’s ENV LD_LIBRARY_PATH), which increases the chance of future drift between targets/images. Consider reusing a single Makefile variable (shared with run-boltz) and/or appending to the existing container LD_LIBRARY_PATH rather than replacing it.

@AJPreto AJPreto merged commit 63d4c7d into main May 4, 2026
6 checks passed
@AJPreto AJPreto deleted the fix/docking-breaking-issues branch May 4, 2026 13:44