@GeorgWa GeorgWa commented Jan 5, 2026

This is not very elegant, but it is how things currently work. I suggest moving entirely to mass-based mods: always match the closest mass within limits, and use the explicit mapping only for overrides or for PSM files that do not support mass-based mapping.
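The proposal above could look roughly like the following sketch. It is not the alphabase implementation: `MASS_TABLE`, `EXPLICIT_OVERRIDES`, `lookup_mod`, the raw-string format and the 0.01 Da tolerance are all hypothetical names and values chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mass table: site -> list of (monoisotopic mass shift, mod name).
MASS_TABLE: Dict[str, List[Tuple[float, str]]] = {
    "T": [(27.994915, "Formyl@T"), (79.966331, "Phospho@T")],
    "S": [(27.994915, "Formyl@S"), (79.966331, "Phospho@S")],
}

# Hypothetical explicit overrides, keyed by the raw string from a PSM file.
EXPLICIT_OVERRIDES: Dict[str, str] = {"T[formyl]": "Formyl@T"}


def lookup_mod(
    raw: str, site: str, mass_shift: Optional[float], tol: float = 0.01
) -> Optional[str]:
    """Resolve a modification: explicit override first, then closest mass."""
    if raw in EXPLICIT_OVERRIDES:
        return EXPLICIT_OVERRIDES[raw]
    if mass_shift is None:
        return None  # no mass available and no override defined
    candidates = MASS_TABLE.get(site, [])
    if not candidates:
        return None
    # Pick the candidate whose mass is closest to the observed shift.
    mass, name = min(candidates, key=lambda c: abs(c[0] - mass_shift))
    # Accept it only if it lies within the tolerance window.
    return name if abs(mass - mass_shift) <= tol else None
```

With this shape, a reader that reports `mass_shift=27.9949 at T` resolves to `Formyl@T` without a dedicated mapping entry, while formats that carry no mass can still be handled through the override table.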

GeorgWa and others added 2 commits January 5, 2026 14:28
Add mass_mapped_mods and modification_mapping entries for common
modifications to fix mass-based mod lookup errors like
"Unknown modification: mass_shift=27.9949 at T" (Formyl@T).

Modifications added include:
- Formyl@T/S for formylation at T/S
- GG@C/S/T for ubiquitination at non-K sites
- Phospho@H/D for histidine/aspartate phosphorylation
- Various acylations (Butyryl, Crotonyl, Succinyl, Malonyl, etc.)
- iTRAQ/mTRAQ labeling variants
- DiLeu4plex labeling variants
- Deamidation, Methylthio, Cysteinyl, and other common PTMs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR standardizes the placeholder atom used in N-terminal modification SMILES from [Ts] to [Lv] (Livermorium), resolving a conflict where [Ts] was previously used for both C-terminal amino acid placeholders and N-terminal modification placeholders. Additionally, it significantly expands MSFragger modification mappings to support a broader range of post-translational modifications.

Key changes:

  • Replaced [Ts] with [Lv] as the N-terminal modification placeholder throughout the codebase
  • Added comprehensive modification mappings for MSFragger reader (100+ new modification entries)
  • Fixed minor mass precision issue for Nethylmaleimide@C modification
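As a rough illustration of the placeholder change described above, the sketch below rewrites an N-terminal modification SMILES from `[Ts]` to `[Lv]` and flags entries that still carry the old atom. The helper names and the example SMILES string are hypothetical and are not taken from alphabase; only the two placeholder atoms come from the review summary.

```python
# Placeholder atoms as described in the review summary.
C_TERM_PLACEHOLDER = "[Ts]"      # still used for C-terminal amino acid placeholders
MOD_N_TERM_PLACEHOLDER = "[Lv]"  # new N-terminal modification placeholder


def migrate_n_term_mod_smiles(smiles: str) -> str:
    """Rewrite an N-terminal modification SMILES from the old [Ts] to [Lv].

    Only apply this to N-terminal modification SMILES; amino acid SMILES
    keep [Ts] for their C-terminal placeholder.
    """
    return smiles.replace("[Ts]", MOD_N_TERM_PLACEHOLDER)


def has_stale_placeholder(n_term_mod_smiles: str) -> bool:
    """Return True if an N-terminal modification SMILES still contains [Ts]."""
    return C_TERM_PLACEHOLDER in n_term_mod_smiles


# Made-up SMILES fragment, used purely to exercise the helpers.
example = "C(=O)[Ts]"
migrated = migrate_n_term_mod_smiles(example)
```

A check like `has_stale_placeholder` is the kind of guard that would catch definitions missed during such a migration.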

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/test_smiles.py | Updated all test cases and documentation to reflect the new [Lv] placeholder |
| alphabase/smiles/smiles.py | Changed MOD_N_TERM_PLACEHOLDER_ATOM constant from "Ts" to "Lv" and updated related documentation |
| alphabase/smiles/adding_smiles.py | Updated N-terminal modification SMILES definitions to use [Lv] placeholder |
| alphabase/constants/const_files/psm_reader.yaml | Added extensive modification mappings for MSFragger and corrected Nethylmaleimide mass value |
| alphabase/constants/const_files/modification.tsv | Updated modification definitions to use [Lv] placeholder and added PSMtag modifications |

GeorgWa and others added 3 commits January 5, 2026 14:44
Add glucose-thioacetyl (Glc-TA) and related glycosylation
modifications to mass_mapped_mods and modification_mapping:
- Glc-TA@K/Any_N-term (236.0354)
- Glc-TA-Succinamide@K/Any_N-term (351.0624)
- Gal-b14-Glc-TA@K/Any_N-term (398.0882)
- Gal-b14-Glc-TA-Succinamide@K/Any_N-term (513.1152)
- Sia-a23-Gal-b14-Glc-TA@K/Any_N-term (689.1836)
- Sia-a23-Gal-b14-Glc-TA-Succinamide@K/Any_N-term (820.2055)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add TMT6plex modifications at serine, threonine, and histidine
to fix mass lookup error for mass_shift=229.1629 at S.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@jalew188 jalew188 left a comment

LGTM overall. Mass-based mod representation is somewhat unsafe, though. One solution is to support both a decimal-based representation, e.g. S[+79.9xxx] for phospho-S, and an integer-based representation, e.g. S[+80] for phospho-S, so that many kinds of scenarios are covered. For MSFragger itself this is not easy to support, because the mod mass list (a kind of search parameter) may also be provided by users, so it is hard to know whether the value is 'C(125.0477)' or 'C(125.0476)'.
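A small sketch of the two representations mentioned in this comment, and of why the last decimal is fragile. The function names, the four-decimal default and the 0.001 Da tolerance are assumptions made for illustration, not part of alphabase or MSFragger.

```python
def decimal_repr(site: str, mass: float, ndigits: int = 4) -> str:
    """Decimal-based representation, e.g. S[+79.9663]."""
    return f"{site}[{mass:+.{ndigits}f}]"


def int_repr(site: str, mass: float) -> str:
    """Integer-based representation, e.g. S[+80]."""
    return f"{site}[{round(mass):+d}]"


def masses_agree(a: float, b: float, tol: float = 1e-3) -> bool:
    """Compare two reported masses with a tolerance instead of exact equality."""
    return abs(a - b) <= tol


# The two user-supplied spellings from the comment differ as decimal strings ...
strings_differ = decimal_repr("C", 125.0477) != decimal_repr("C", 125.0476)
# ... but collapse to the same integer form and agree within the tolerance.
ints_match = int_repr("C", 125.0477) == int_repr("C", 125.0476)
close_enough = masses_agree(125.0477, 125.0476)
```

Exact string keys break on the last digit, whereas the integer form or a tolerance comparison does not. The integer form, in turn, cannot separate modifications that share a nominal mass.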

Base automatically changed from add-GlyGly-N-term to main January 5, 2026 23:10
GeorgWa and others added 3 commits January 6, 2026 00:43
- Add Benzyl-TA-Succinamide@K/Any_N-term
- Add Benzyl-TA@K/Any_N-term
- Add Benzyl@K/Any_N-term
- Add DiLeu4plex115/117/118@S/T/Y variants
- Fix mass truncation (not rounding) for all mods

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
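The "truncation (not rounding)" fix in the commit above presumably means the mapping keys have to be cut off after the last kept decimal rather than rounded. The sketch below shows the difference using the standard `decimal` module; the helper names and the choice of four decimals are assumptions, not taken from the repository.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def truncate_mass(mass: str, ndigits: int = 4) -> str:
    """Cut the mass off after `ndigits` decimals (round toward zero)."""
    quantum = Decimal(10) ** -ndigits  # e.g. Decimal('0.0001')
    return str(Decimal(mass).quantize(quantum, rounding=ROUND_DOWN))


def round_mass(mass: str, ndigits: int = 4) -> str:
    """Round the mass half-up to `ndigits` decimals, for comparison."""
    quantum = Decimal(10) ** -ndigits
    return str(Decimal(mass).quantize(quantum, rounding=ROUND_HALF_UP))


# Carbamidomethyl (57.021464): the two conventions give different keys.
truncated = truncate_mass("57.021464")  # "57.0214"
rounded = round_mass("57.021464")       # "57.0215"
```

Passing the mass as a string avoids binary floating-point artifacts in the last digit, which matters when the result is used as an exact lookup key.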
Resolved conflict in psm_reader.yaml by keeping msfragger modification mappings.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix 15 N-terminal modifications in modification.tsv to use [Lv] placeholder
  instead of [Ts] (Formyl, Dimethyl variants, GG, Lactyl, YnLactyl)
- Remove TMT6plex@Y from msfragger mappings (not in modification database)
- Update test regex to handle negative mass modifications

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
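The regex update for negative masses mentioned in the commit above is not shown in this conversation. A minimal sketch of the idea, assuming a hypothetical `X(mass)` annotation format and a made-up peptide string, could be:

```python
import re
from typing import List, Tuple

# Hypothetical pattern: one residue letter followed by a signed mass in
# parentheses. The optional "-" is what admits negative mass shifts.
MOD_PATTERN = re.compile(r"([A-Z])\((-?\d+(?:\.\d+)?)\)")


def extract_mods(sequence: str) -> List[Tuple[str, float]]:
    """Return (site, mass_shift) pairs for every annotated residue."""
    return [(m.group(1), float(m.group(2))) for m in MOD_PATTERN.finditer(sequence)]


# Made-up annotated peptide with one positive and one negative mass shift.
mods = extract_mods("AC(57.0214)DEQ(-17.0265)K")
```

Without the optional minus sign, the negative shift on Q in this string would be skipped silently.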
@GeorgWa GeorgWa merged commit 4710a7d into main Jan 7, 2026
3 checks passed
@GeorgWa GeorgWa deleted the register-msfragger-mods branch January 7, 2026 23:22