fix: resolve breaking issues in docking pipeline by AJPreto · Pull Request #2 · enveda/guild

AJPreto · 2026-04-30T15:59:13Z

Fixes three breaking issues discovered during GPU docking runs (Boltz2 + DiffDock):

Makefile - Expand LD_LIBRARY_PATH with nvidia CUDA library paths (cu13, cuda_nvrtc, cudnn, cublas) to fix DiffDock crashing with NVRTC_ERROR_COMPILATION at runtime.
scripts/run_guild.py - Add --no-decoys CLI flag so runs can proceed without a decoy file present.
guild/bulk.py - Boltz2 template robustness: import PROTEINS_FOLDER, prefer single-chain clean PDB as template, and retry without template on empty manifest.

All three fixes validated end-to-end on a GPU node.
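The description does not show how the single-chain template is selected (for example, from files under `PROTEINS_FOLDER`). Here is a minimal sketch that assumes the candidates are plain PDB files and that "single-chain" means exactly one distinct chain identifier across ATOM/HETATM records. `count_chains` and `pick_template` are hypothetical names, not functions from `guild/bulk.py`. Falling back to the first readable multi-chain file is also an assumption.

```python
from pathlib import Path
from typing import Iterable, Optional


def count_chains(pdb_text: str) -> int:
    """Count distinct chain IDs in ATOM/HETATM records (PDB column 22, index 21)."""
    chains = set()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")) and len(line) > 21:
            chains.add(line[21])
    return len(chains)


def pick_template(candidates: Iterable[Path]) -> Optional[Path]:
    """Hypothetical: return the first single-chain PDB, else the first readable one, else None."""
    fallback = None
    for path in candidates:
        try:
            text = path.read_text()
        except OSError:
            continue  # missing or unreadable candidate: skip it
        if count_chains(text) == 1:
            return path
        if fallback is None:
            fallback = path
    return fallback
```

If `None` comes back, the caller could run Boltz2 without a template. That mirrors the empty-manifest retry described in the PR.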

- Makefile: expand LD_LIBRARY_PATH with NVIDIA CUDA library paths to fix the DiffDock NVRTC crash at runtime
- scripts/run_guild.py: add --no-decoys flag to allow running without a decoy file present
- guild/bulk.py: add PROTEINS_FOLDER import, prefer single-chain PDB as Boltz2 template, and retry without template on empty manifest (fixes Boltz2 template parsing IndexError)

Copilot

Pull request overview

This PR addresses runtime breakages observed during GPU docking runs by improving container CUDA library visibility, adding a CLI option to skip decoys, and making Boltz2 templating more robust during bulk docking.

Changes:

Add a --no-decoys flag to allow running the pipeline when a decoy file is unavailable.
Improve Boltz2 template handling by preferring a single-chain cleaned PDB template and retrying without a template when Boltz produces an empty manifest.
Extend LD_LIBRARY_PATH in Docker run targets to include additional NVIDIA CUDA-related shared library paths.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `scripts/run_guild.py` | Adds `--no-decoys` CLI flag and wires it into `BulkRun(use_decoys=...)`. |
| `guild/bulk.py` | Uses a single-chain cleaned PDB as the Boltz template and retries Boltz without a template when the manifest is empty. |
| `Makefile` | Updates GPU docker run targets to include additional NVIDIA library paths in `LD_LIBRARY_PATH`. |
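Here is a minimal sketch of the `--no-decoys` wiring with `argparse`. It assumes a `store_true` flag that is inverted into `use_decoys`. `build_parser` and `use_decoys_from_args` are hypothetical, and the real `scripts/run_guild.py` defines other options that are not shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser containing only the flag discussed in this PR.
    parser = argparse.ArgumentParser(description="Run the guild docking pipeline")
    parser.add_argument(
        "--no-decoys",
        action="store_true",
        help="Run without a decoy file present.",
    )
    return parser


def use_decoys_from_args(argv: list[str]) -> bool:
    args = build_parser().parse_args(argv)
    # argparse exposes "--no-decoys" as the attribute "no_decoys".
    return not args.no_decoys
```

The result would then be passed as `BulkRun(..., use_decoys=use_decoys_from_args(sys.argv[1:]))`. Only the `use_decoys` keyword comes from the PR; the other arguments are elided. On Python 3.9+, `argparse.BooleanOptionalAction` is an alternative that generates a paired `--decoys`/`--no-decoys` flag.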


Copilot · 2026-04-30T16:04:51Z

+                            with open(manifest_path) as _mf:
+                                _manifest = _json.load(_mf)


manifest.json is loaded without any error handling; if the file is truncated/invalid (e.g., Boltz interrupted mid-write) this will raise JSONDecodeError and abort the entire batch. Consider wrapping the manifest load in try/except and treating parse errors like an empty manifest (log + retry without template, or skip with warning).
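The reviewer's suggestion can also be sketched as a standalone helper. This sketch assumes the manifest is a JSON object with a `records` list, as the diff checks. `parse_records` and `manifest_records` are hypothetical names. One deliberate difference: this version treats a missing file like an empty one, which is broader than the `os.path.exists` guard in the PR.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_records(text: str, source: str = "<manifest>") -> list:
    """Return the 'records' list from manifest JSON text, or [] if unusable."""
    try:
        manifest = json.loads(text)
    except json.JSONDecodeError as exc:  # e.g. Boltz interrupted mid-write
        logger.warning("Malformed Boltz2 manifest %s: %s", source, exc)
        return []
    if not isinstance(manifest, dict):
        return []
    records = manifest.get("records")
    return records if isinstance(records, list) else []


def manifest_records(manifest_path: str) -> list:
    """Read and parse a manifest file; I/O or decoding errors also yield []."""
    try:
        with open(manifest_path) as fh:
            text = fh.read()
    except (OSError, UnicodeDecodeError) as exc:
        logger.warning("Unreadable Boltz2 manifest %s: %s", manifest_path, exc)
        return []
    return parse_records(text, manifest_path)
```

Splitting parsing from I/O keeps every failure mode on the same "empty records, retry without template" path and makes the parser testable without touching the filesystem.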

Suggested change

-with open(manifest_path) as _mf:
-    _manifest = _json.load(_mf)
+try:
+    with open(manifest_path) as _mf:
+        _manifest = _json.load(_mf)
+except (_json.JSONDecodeError, OSError) as exc:
+    logger.warning(
+        f"Boltz2 produced unreadable manifest for {run_id}: {exc}. "
+        "Retrying without template..."
+    )
+    _manifest = {}

Copilot · 2026-04-30T16:04:52Z

+                        # Check if Boltz produced valid output (manifest with records).
+                        # Template PDB parsing can fail silently in Boltz2, resulting
+                        # in an empty manifest.  If that happens, retry without the template.
+                        manifest_path = (
+                            f"{boltz_out_dir}/boltz_results_{run_id}_boltz/processed/manifest.json"
+                        )
+                        if os.path.exists(manifest_path):
+                            import json as _json
+
+                            with open(manifest_path) as _mf:
+                                _manifest = _json.load(_mf)
+                            if not _manifest.get("records"):
+                                logger.warning(
+                                    f"Boltz2 produced empty manifest for {run_id} "
+                                    "(likely template parsing failure). Retrying without template..."
+                                )


The new Boltz empty-manifest retry path is complex and impacts run stability, but there are no unit tests exercising it. Since this repo already has BulkRun tests, consider adding a test that mocks deploy_boltz/generate_boltz_yaml and verifies a retry occurs when manifest.json has no records.
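Here is a minimal sketch of such a test. It targets a hypothetical `run_with_template_fallback` function that mirrors the control flow in the diff: run with the template, check for records, and rerun without the template if there are none. A real test would patch `deploy_boltz`/`generate_boltz_yaml` inside `guild.bulk`. Their signatures are not shown in this PR, so the sketch injects callables and uses `unittest.mock.Mock` instead.

```python
import unittest
from typing import Callable, Optional
from unittest import mock


def run_with_template_fallback(
    deploy: Callable[[Optional[str]], None],
    read_records: Callable[[], list],
    template: Optional[str],
) -> bool:
    """Hypothetical stand-in for the retry logic in guild/bulk.py.

    Returns True if a retry without the template was performed.
    """
    deploy(template)
    if template is not None and not read_records():
        deploy(None)  # empty manifest: retry without the template
        return True
    return False


class TemplateFallbackTest(unittest.TestCase):
    def test_retries_without_template_on_empty_manifest(self):
        deploy = mock.Mock()
        retried = run_with_template_fallback(deploy, lambda: [], "clean_A.pdb")
        self.assertTrue(retried)
        self.assertEqual(
            deploy.call_args_list, [mock.call("clean_A.pdb"), mock.call(None)]
        )

    def test_no_retry_when_records_present(self):
        deploy = mock.Mock()
        retried = run_with_template_fallback(
            deploy, lambda: [{"id": "x"}], "clean_A.pdb"
        )
        self.assertFalse(retried)
        deploy.assert_called_once_with("clean_A.pdb")
```

The file name `clean_A.pdb` is made up. The test can be run with `python -m unittest <module>` or collected by pytest.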

Copilot · 2026-04-30T16:04:52Z

+		-e LD_LIBRARY_PATH=/opt/localcolabfold/.pixi/envs/default/lib:/usr/local/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cu13/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cuda_nvrtc/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cudnn/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cublas/lib \
 		guild:latest \


LD_LIBRARY_PATH is fully hardcoded here and duplicated across multiple targets, which can drift from the value baked into the image and is easy to forget to update in one place. Consider factoring this into a Makefile variable and/or appending to the container’s existing LD_LIBRARY_PATH instead of replacing it entirely.
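The following sketch illustrates the "single definition" idea. It is written in Python for illustration only; in the Makefile itself this would be one variable referenced by every target. The directory names are copied from the diff. `BASE_DIRS`, `SITE_PACKAGES`, `NVIDIA_LIBS`, and both functions are hypothetical names.

```python
# Directories copied from the Makefile diff; the variable names are hypothetical.
BASE_DIRS = ["/opt/localcolabfold/.pixi/envs/default/lib", "/usr/local/lib"]
SITE_PACKAGES = "/app/.venv/lib/python3.10/site-packages"
NVIDIA_LIBS = ["cu13", "cuda_nvrtc", "cudnn", "cublas"]


def ld_library_path() -> str:
    """Build the LD_LIBRARY_PATH value once, from a single list of NVIDIA packages."""
    nvidia_dirs = [f"{SITE_PACKAGES}/nvidia/{name}/lib" for name in NVIDIA_LIBS]
    return ":".join(BASE_DIRS + nvidia_dirs)


def docker_env_args() -> list[str]:
    # Every GPU docker run target would reuse this instead of a pasted copy.
    return ["-e", f"LD_LIBRARY_PATH={ld_library_path()}"]
```

With this structure, adding a library (or a Python version bump that changes `SITE_PACKAGES`) is a one-line change that cannot drift between targets.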

Copilot · 2026-04-30T16:04:52Z

+		-e LD_LIBRARY_PATH=/opt/localcolabfold/.pixi/envs/default/lib:/usr/local/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cu13/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cuda_nvrtc/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cudnn/lib:/app/.venv/lib/python3.10/site-packages/nvidia/cublas/lib \
 		guild:latest \


LD_LIBRARY_PATH is duplicated here (and differs from the image’s ENV LD_LIBRARY_PATH), which increases the chance of future drift between targets/images. Consider reusing a single Makefile variable (shared with run-boltz) and/or appending to the existing container LD_LIBRARY_PATH rather than replacing it.
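The "append instead of replace" alternative can be sketched as an order-preserving merge. The image's existing entries stay first, and only missing directories are added. `merge_ld_paths` is a hypothetical helper. The existing value would likely have to be read inside the container, for example by an entrypoint wrapper, because a value passed with `docker run -e` is taken literally rather than expanded against the image's `ENV`.

```python
from typing import Optional


def merge_ld_paths(existing: Optional[str], extra: list[str]) -> str:
    """Append extra dirs to an LD_LIBRARY_PATH value, dropping duplicates and empty entries."""
    merged: list[str] = []
    for entry in (existing or "").split(":") + extra:
        if entry and entry not in merged:
            merged.append(entry)
    return ":".join(merged)
```

Such a wrapper would need to set the merged value before it execs the actual command. The dynamic loader reads `LD_LIBRARY_PATH` at process start, so changing it inside an already running Python process does not affect that process's own library lookups. If the image's existing paths contain conflicting older CUDA libraries, prepending the new directories instead of appending them may be necessary.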

Copilot AI review requested due to automatic review settings April 30, 2026 15:59

Copilot started reviewing on behalf of AJPreto April 30, 2026 15:59

Copilot AI reviewed Apr 30, 2026


AJPreto merged commit 63d4c7d into main May 4, 2026
6 checks passed

AJPreto deleted the fix/docking-breaking-issues branch May 4, 2026 13:44

AJPreto merged 1 commit into main from fix/docking-breaking-issues
