[codex] fix #42 with query-only skip-MSA feature generation #611
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6eaa0922e8
    if FLAGS.pair_msa and any(getattr(interactor, "skip_msa", False) for interactor in interactors):
        raise ValueError(
            "--skip_msa generates query-only MSAs and cannot be combined with "
            "--pair_msa=True. Re-run structure prediction with --pair_msa=False."
        )
Restrict skip_msa pairing check to multimer jobs
pre_modelling_setup now raises as soon as any interactor has skip_msa=True, before determining whether the job is monomeric or multimeric. Since pair_msa defaults to True in this script, a single-chain prediction built with --skip_msa now fails even though no MSA pairing is performed for monomer runs, which blocks the query-only workflow unless users pass an unrelated override.
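One way to scope the check is sketched below, under stated assumptions. The guard fires only when pairing would actually run, that is, when more than one chain is being modelled. The function name `check_skip_msa_pairing` and the `Interactor` stand-in class are hypothetical, and treating "more than one interactor" as the multimer test is an assumption; the real script may determine this differently.

```python
class Interactor:
    """Hypothetical stand-in for a monomer feature object."""

    def __init__(self, description, skip_msa=False):
        self.description = description
        self.skip_msa = skip_msa


def check_skip_msa_pairing(pair_msa, interactors):
    """Raise only when MSA pairing would run on query-only features."""
    # Assumption: a job is multimeric when it has more than one interactor.
    is_multimer = len(interactors) > 1
    # getattr with a default tolerates objects that predate the marker.
    has_query_only = any(getattr(i, "skip_msa", False) for i in interactors)
    if pair_msa and is_multimer and has_query_only:
        raise ValueError(
            "--skip_msa generates query-only MSAs and cannot be combined with "
            "--pair_msa=True. Re-run structure prediction with --pair_msa=False."
        )
```

With this shape, a single-chain job built with --skip_msa passes even when pair_msa is left at its default, while a multi-chain job that mixes in query-only features is still rejected.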
    if FLAGS.skip_msa and not getattr(monomer, "skip_msa", False):
        logging.info(
            "Existing monomer features for %s were generated with bulk MSAs. "
            "Recomputing query-only features for --skip_msa.",
            source_name,
        )
        return None
Block reuse of skip_msa cache for full-MSA generation
The reuse guard only handles --skip_msa reusing a bulk-MSA pickle, but not the reverse mismatch. If a cached monomer was created with skip_msa=True and the user reruns without --skip_msa, this path still reuses the query-only pickle, silently skipping full MSA regeneration and propagating skip_msa=True into outputs (which can later trigger pairing errors in prediction).
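A symmetric guard could look like the sketch below, which reuses a cached object only when its MSA mode matches the requested one. The helper names `modes_match` and `reuse_or_recompute` are hypothetical; returning `None` to signal "recompute" mirrors the snippet under review, and the rest is an assumption about how the cache path is structured.

```python
def modes_match(cached_skip_msa, requested_skip_msa):
    """True when the cached features were built in the requested MSA mode."""
    return bool(cached_skip_msa) == bool(requested_skip_msa)


def reuse_or_recompute(cached_monomer, requested_skip_msa):
    """Return the cached object if it is reusable, else None to force a rebuild."""
    # Objects pickled before the marker existed are treated as bulk-MSA features.
    cached_skip_msa = getattr(cached_monomer, "skip_msa", False)
    if not modes_match(cached_skip_msa, requested_skip_msa):
        # Covers both directions: bulk cache with --skip_msa requested,
        # and query-only cache with a full-MSA run requested.
        return None
    return cached_monomer
```

Comparing the two flags for equality handles both mismatch directions in one place, so a query-only pickle is not silently reused for a full-MSA run.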
Summary
This PR adds a real --skip_msa mode to create_individual_features.py so AlphaPulldown can generate query-only single-sequence features instead of running bulk MSA searches.
Closes #42.
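To make "query-only" concrete: a single-sequence MSA contains the query as its only row. The sketch below assumes an A3M-style text alignment. `query_only_a3m` and `msa_depth` are hypothetical helpers for illustration, not functions from AlphaPulldown.

```python
def query_only_a3m(sequence, description="query"):
    """Build an A3M-style alignment whose only row is the query itself."""
    # Drop whitespace and normalise case so the row is a clean residue string.
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return f">{description}\n{seq}\n"


def msa_depth(a3m):
    """Count alignment rows by counting header lines."""
    return sum(1 for line in a3m.splitlines() if line.startswith(">"))
```

An alignment of depth 1 carries no homologs, so there is nothing to pair across chains. This is why query-only features and MSA pairing do not mix.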
Root Cause
AlphaPulldown always built feature pickles through the standard AF2 or MMseqs2 MSA paths. Even when users wanted a single-sequence workflow, the feature-generation step still ran bulk MSA searches and stored full MSA-derived tensors. There was also no persisted marker to prevent those query-only features from being used later with --pair_msa=True, which would be semantically invalid.
What Changed
- add --skip_msa to create_individual_features.py
- [...] single_sequence behavior and keep template-only search support
- persist a skip_msa marker on generated monomer objects
- reject run_structure_prediction.py --pair_msa=True when any interactor was generated with --skip_msa
- recompute existing bulk-MSA monomer features when --skip_msa is requested
- document --skip_msa in the README feature-generation flag section
User Impact
Users can now generate lightweight query-only feature pickles across AF2, MMseqs2, and AF3 workflows without running bulk MSA searches. Those pickles are guarded at prediction time so they cannot be paired accidentally; they must be used with --pair_msa=False.
Validation
PYTHONPATH=/tmp/ap_pyshim:$PYTHONPATH conda run -n AlphaPulldown python -m pytest -q test/unit/test_modelling_setup.py test/unit/test_script_entrypoints.py test/unit/test_objects.py test/integration/test_create_individual_features.py
137 passed
Note: the temporary PYTHONPATH shim was only needed on this workstation to work around the local AlphaFold/Biopython import mismatch during test collection. It is not part of the repo changes.