Summary
Add functionality to fetch tRNA modification annotations from the MODOMICS database, align them to reference tRNA sequences, and integrate the results into the clover data model.
Context
read_mod_annotations() exists but is only a minimal TSV reader; there is currently no way to generate the annotation data it reads. We need a way to programmatically download known tRNA modifications from MODOMICS and map them onto our reference sequences.
Proposed Approach
New file: R/modomics.R
Exported function: fetch_modomics_mods(fasta, organism, cache_dir = NULL, min_identity = 0.7)
- Fetch the MODOMICS modification dictionary (maps Unicode characters → modification names) via /api/modifications?format=json
- Fetch tRNA sequences for the organism via /api/sequences?RNAtype=tRNA&organism={org}&format=json
- Strip modification codes (map each modified residue back to its unmodified base, preserving sequence length) to get alignable RNA sequences
- Align to the reference FASTA using Biostrings::pairwiseAlignment(type = "local")
- Transfer modification positions through the alignment
- Return a tibble with columns ref, pos, mod_full, mod1 (compatible with read_mod_annotations())
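A minimal base-R sketch of the stripping and position-extraction steps. It assumes the dictionary has already been reduced to a data frame with hypothetical columns code, parent and name, and that each modified residue is a single character. The three codes below are illustrative only and should be checked against the live /api/modifications response. Replacing (rather than deleting) modified characters keeps the stripped sequence the same length, so extracted positions remain valid after stripping.

```r
# Hypothetical, hand-written slice of the MODOMICS dictionary (illustrative
# codes only; the real mapping comes from /api/modifications?format=json).
mod_dict <- data.frame(
  code   = c("P", "D", "K"),
  parent = c("U", "U", "G"),
  name   = c("pseudouridine", "dihydrouridine", "1-methylguanosine"),
  stringsAsFactors = FALSE
)

# Replace every modification code with its parent base. Length is preserved,
# so position i in the stripped sequence is position i in the original.
.strip_modifications <- function(seq, dict) {
  chars <- strsplit(seq, "")[[1]]
  hit <- chars %in% dict$code
  chars[hit] <- dict$parent[match(chars[hit], dict$code)]
  paste(chars, collapse = "")
}

# Report 1-based positions of modified residues with their code and name.
.extract_mod_positions <- function(seq, dict) {
  chars <- strsplit(seq, "")[[1]]
  pos <- which(chars %in% dict$code)
  idx <- match(chars[pos], dict$code)
  data.frame(pos = pos, code = chars[pos], name = dict$name[idx],
             stringsAsFactors = FALSE)
}
```

For example, `.strip_modifications("GCPDAK", mod_dict)` returns "GCUUAG", and `.extract_mod_positions()` on the same input reports positions 3, 4 and 6.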
Internal helpers: .fetch_modomics_modifications(), .fetch_modomics_sequences(), .strip_modifications(), .extract_mod_positions(), .align_modomics_to_ref(), .match_modomics_to_refs(), .load_or_fetch() (cache helper)
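The position-transfer step can be sketched independently of Biostrings by walking the two gapped alignment strings column by column. This assumes the aligned query/reference strings and their 1-based start positions in the ungapped sequences have already been pulled out of the alignment object (for example via alignedPattern()/alignedSubject() and start(pattern(aln))/start(subject(aln))). transfer_mod_positions() is a hypothetical name, not one of the planned helpers. Modified positions outside the local alignment, or aligned to a gap in the reference, come back as NA and would be dropped.

```r
# Hypothetical helper: map modified positions in an ungapped MODOMICS
# sequence onto reference coordinates via a gapped pairwise alignment.
#   mod_pos          1-based positions in the ungapped MODOMICS (query) sequence
#   q_aln, r_aln     aligned query/reference strings of equal length, "-" = gap
#   q_start, r_start 1-based start of the (local) alignment in each sequence
transfer_mod_positions <- function(mod_pos, q_aln, r_aln,
                                   q_start = 1L, r_start = 1L) {
  q <- strsplit(q_aln, "")[[1]]
  r <- strsplit(r_aln, "")[[1]]
  stopifnot(length(q) == length(r))

  # Running coordinate of each column in each sequence; gap columns repeat
  # the previous coordinate and are excluded below.
  q_coord <- q_start - 1L + cumsum(q != "-")
  r_coord <- r_start - 1L + cumsum(r != "-")

  # Only columns with a residue on both sides give a 1:1 position mapping;
  # unmatched positions yield NA.
  paired <- q != "-" & r != "-"
  r_coord[paired][match(mod_pos, q_coord[paired])]
}
```

For example, `transfer_mod_positions(c(3, 4, 6), "GCUUAG", "GCU-AG", r_start = 3)` returns c(5, NA, 7): the modification at query position 4 faces a reference gap, so it cannot be placed.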
Other changes
- R/clover-se.R: Update read_mod_annotations() to also accept a tibble (not just a file path)
- DESCRIPTION: Add httr2 and jsonlite to Suggests
- R/globals.R: Add new global variables
- tests/testthat/test-modomics.R: Unit tests (mock data) + integration test (skip_if_offline())
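One way to let read_mod_annotations() take either input is to normalize it up front. The sketch below uses a hypothetical helper, .as_mod_annotations(). It relies on tibbles also being data frames, so a single is.data.frame() check covers both. It assumes the TSV has a header row naming the four columns returned by fetch_modomics_mods(), which may not match how the current reader parses files.

```r
# Hypothetical normalizer: accept a TSV path or a data frame/tibble and
# return a data frame with the columns produced by fetch_modomics_mods().
.as_mod_annotations <- function(x) {
  if (is.character(x) && length(x) == 1L) {
    # Assumes a header row. Read everything as character so one-letter codes
    # such as "T" or "F" are not silently converted to logicals.
    x <- utils::read.delim(x, colClasses = "character")
  } else if (!is.data.frame(x)) {  # tibbles are data frames, so they pass
    stop("`x` must be a path to a TSV file or a data frame/tibble.",
         call. = FALSE)
  }
  required <- c("ref", "pos", "mod_full", "mod1")
  missing_cols <- setdiff(required, names(x))
  if (length(missing_cols) > 0L) {
    stop("Missing required column(s): ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  x$pos <- as.integer(x$pos)  # positions are integers whatever the source
  x
}
```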
Design Decisions
- Local alignment tolerates differing 5'/3' ends (including presence or absence of the 3' CCA tail) between MODOMICS and reference sequences
- Pre-filter candidate MODOMICS sequences by amino acid before alignment to reduce the number of pairwise alignments
- httr2/jsonlite as Suggests (gated by rlang::check_installed())
- Optional cache_dir for offline use via saveRDS()/readRDS()
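A minimal sketch of the cache helper, assuming a (key, fetch_fn, cache_dir) signature that this plan does not fix. With cache_dir = NULL it always fetches; otherwise it reuses <cache_dir>/<key>.rds when present and writes that file after the first fetch. Keys are sanitized so organism names containing spaces yield valid file names.

```r
# Sketch of .load_or_fetch(); the signature is an assumption.
#   key       identifier for the cached object, e.g. "sequences Homo sapiens"
#   fetch_fn  zero-argument function that performs the network request
#   cache_dir directory for .rds files, or NULL to disable caching
.load_or_fetch <- function(key, fetch_fn, cache_dir = NULL) {
  # No cache requested: always fetch.
  if (is.null(cache_dir)) {
    return(fetch_fn())
  }
  # Make the key file-system safe: "sequences Homo sapiens" -> "sequences_Homo_sapiens".
  safe_key <- gsub("[^A-Za-z0-9_.-]", "_", key)
  path <- file.path(cache_dir, paste0(safe_key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- fetch_fn()
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(value, path)
  value
}
```

In this sketch a cached file never expires, so picking up a newer MODOMICS release would require deleting the corresponding .rds file.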
Organisms
Initially S. cerevisiae, E. coli, H. sapiens, and M. musculus; any organism available in MODOMICS should also work.
Verification
```r
library(clover)
fa <- clover_example("yeast/trna-ref.fa.gz")
mods <- fetch_modomics_mods(fa, "Saccharomyces cerevisiae")
mods
```