Register all modifications for msfragger reader #389
Add mass_mapped_mods and modification_mapping entries for common modifications to fix mass-based mod lookup errors like "Unknown modification: mass_shift=27.9949 at T" (Formyl@T).

Modifications added include:
- Formyl@T/S for formylation at T/S
- GG@C/S/T for ubiquitination at non-K sites
- Phospho@H/D for histidine/aspartate phosphorylation
- Various acylations (Butyryl, Crotonyl, Succinyl, Malonyl, etc.)
- iTRAQ/mTRAQ labeling variants
- DiLeu4plex labeling variants
- Deamidation, Methylthio, Cysteinyl, and other common PTMs

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Pull request overview
This PR standardizes the placeholder atom used in N-terminal modification SMILES from [Ts] to [Lv] (Livermorium), resolving a conflict where [Ts] was previously used for both C-terminal amino acid placeholders and N-terminal modification placeholders. Additionally, it significantly expands MSFragger modification mappings to support a broader range of post-translational modifications.
Key changes:
- Replaced `[Ts]` with `[Lv]` as the N-terminal modification placeholder throughout the codebase
- Added comprehensive modification mappings for MSFragger reader (100+ new modification entries)
- Fixed minor mass precision issue for Nethylmaleimide@C modification
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/test_smiles.py | Updated all test cases and documentation to reflect the new [Lv] placeholder |
| alphabase/smiles/smiles.py | Changed MOD_N_TERM_PLACEHOLDER_ATOM constant from "Ts" to "Lv" and updated related documentation |
| alphabase/smiles/adding_smiles.py | Updated N-terminal modification SMILES definitions to use [Lv] placeholder |
| alphabase/constants/const_files/psm_reader.yaml | Added extensive modification mappings for MSFragger and corrected Nethylmaleimide mass value |
| alphabase/constants/const_files/modification.tsv | Updated modification definitions to use [Lv] placeholder and added PSMtag modifications |
Add glucose-thioacetyl (Glc-TA) and related glycosylation modifications to mass_mapped_mods and modification_mapping:
- Glc-TA@K/Any_N-term (236.0354)
- Glc-TA-Succinamide@K/Any_N-term (351.0624)
- Gal-b14-Glc-TA@K/Any_N-term (398.0882)
- Gal-b14-Glc-TA-Succinamide@K/Any_N-term (513.1152)
- Sia-a23-Gal-b14-Glc-TA@K/Any_N-term (689.1836)
- Sia-a23-Gal-b14-Glc-TA-Succinamide@K/Any_N-term (820.2055)

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Add TMT6plex modifications at serine, threonine, and histidine to fix mass lookup error for mass_shift=229.1629 at S. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
jalew188 left a comment
LGTM overall. Mass-based mod representation is inherently not fully safe. One solution is to support both a decimal-based representation, e.g. S[+79.9xxx] for phoS, and an integer-based representation, e.g. S[+80] for phoS, so that it covers many kinds of scenarios. For MSFragger itself this is not easy to support: the mod mass list (a kind of search parameter) may also be provided by users, so it is hard to know whether the value will be 'C(125.0477)' or 'C(125.0476)'.
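The ambiguity described in this comment can be sketched in a few lines of Python. This is only an illustration, not the reader's actual implementation: the `KNOWN_MASSES` table, the `candidates` helper, and the choice of one unit in the last written decimal place as tolerance are all assumptions made for the example. The tolerance is chosen so that both a truncated and a rounded mass string are accepted.

```python
import re

# Hypothetical reference table: modification name -> monoisotopic mass shift (Da).
KNOWN_MASSES = {
    "Phospho": 79.966331,
    "Sulfo": 79.956815,
    "Nethylmaleimide": 125.047679,
}

# Matches tokens such as "S[+80]", "S[+79.9663]" or "C[-17.0265]".
TOKEN = re.compile(r"^([A-Z])\[([+-]\d+(?:\.(\d+))?)\]$")


def candidates(token: str) -> list[str]:
    """Return every known modification compatible with the written precision."""
    match = TOKEN.match(token)
    if match is None:
        return []
    site = match.group(1)
    value = float(match.group(2))
    n_decimals = len(match.group(3)) if match.group(3) else 0
    # Assumption: accept one unit in the last written place, which covers
    # both truncated and rounded mass strings.
    tolerance = 10 ** -n_decimals
    return [
        f"{name}@{site}"
        for name, mass in KNOWN_MASSES.items()
        if abs(mass - value) <= tolerance
    ]


print(candidates("S[+80]"))        # integer form: more than one candidate
print(candidates("S[+79.9663]"))   # decimal form: a single candidate
print(candidates("C[+125.0476]"))  # truncated mass string
print(candidates("C[+125.0477]"))  # rounded mass string
```

The integer form cannot distinguish Phospho from Sulfo, while the four-decimal form can. The sketch ignores whether a modification is actually valid at the given site.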
- Add Benzyl-TA-Succinamide@K/Any_N-term
- Add Benzyl-TA@K/Any_N-term
- Add Benzyl@K/Any_N-term
- Add DiLeu4plex115/117/118@S/T/Y variants
- Fix mass truncation (not rounding) for all mods

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
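To make the truncation versus rounding distinction concrete, here is a minimal sketch using Python's standard `decimal` module. The helper `format_mass` is hypothetical and only shows how the two conventions diverge for the same mass. It does not claim which convention MSFragger or the YAML keys use.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def format_mass(mass: str, rounding: str) -> str:
    """Format a mass shift to four decimal places with the given rounding mode."""
    return str(Decimal(mass).quantize(Decimal("0.0001"), rounding=rounding))


# Monoisotopic mass shift of Nethylmaleimide (Da), passed as a string to avoid
# binary floating-point artifacts.
mass = "125.047679"

truncated = format_mass(mass, ROUND_DOWN)    # digits beyond the 4th are dropped
rounded = format_mass(mass, ROUND_HALF_UP)   # rounded to the nearest 4th digit

print(truncated)  # 125.0476
print(rounded)    # 125.0477
```

A lookup keyed on the exact string therefore only succeeds if the key was generated with the same convention as the search engine output.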
Resolved conflict in psm_reader.yaml by keeping msfragger modification mappings. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Fix 15 N-terminal modifications in modification.tsv to use [Lv] placeholder instead of [Ts] (Formyl, Dimethyl variants, GG, Lactyl, YnLactyl)
- Remove TMT6plex@Y from msfragger mappings (not in modification database)
- Update test regex to handle negative mass modifications

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This is not very elegant, but it is the current way of operating. I suggest moving entirely to mass-based mods: always match the closest known mass within limits, and use the explicit mapping only for overrides or for PSM files that do not support mass-based mapping.
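A rough sketch of what this suggestion could look like, under stated assumptions: the names `KNOWN_MODS`, `EXPLICIT_OVERRIDES`, and `resolve_mod`, as well as the 0.001 Da tolerance, are hypothetical and not part of alphabase. Explicit overrides are checked first, then the closest known mass at the same site within the tolerance is returned.

```python
from typing import Optional

# Hypothetical reference table: modification name -> (mass shift in Da, site).
KNOWN_MODS = {
    "Formyl@T": (27.994915, "T"),
    "Phospho@S": (79.966331, "S"),
    "Nethylmaleimide@C": (125.047679, "C"),
    "TMT6plex@S": (229.162932, "S"),
}

# Hypothetical explicit overrides, keyed by the raw mass string and site
# exactly as they appear in the PSM file.
EXPLICIT_OVERRIDES = {("57.0215", "C"): "Carbamidomethyl@C"}


def resolve_mod(mass_text: str, site: str, tol: float = 0.001) -> Optional[str]:
    """Resolve a mass-based modification; return None if nothing is close enough."""
    # 1. Explicit overrides take precedence over mass matching.
    override = EXPLICIT_OVERRIDES.get((mass_text, site))
    if override is not None:
        return override

    # 2. Otherwise pick the closest known mass at this site within the tolerance.
    mass_shift = float(mass_text)
    best_name, best_err = None, tol
    for name, (mass, mod_site) in KNOWN_MODS.items():
        if mod_site != site:
            continue
        err = abs(mass - mass_shift)
        if err <= best_err:
            best_name, best_err = name, err
    return best_name


print(resolve_mod("27.9949", "T"))   # Formyl@T
print(resolve_mod("125.0476", "C"))  # Nethylmaleimide@C
print(resolve_mod("125.0477", "C"))  # Nethylmaleimide@C
print(resolve_mod("30.0", "T"))      # None
```

With closest-match lookup, the truncated and the rounded mass string both resolve to the same modification, so the mapping no longer needs one entry per string variant.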