Fix megaresv3.00 db #51

Open
jtclaypool wants to merge 2 commits into Microbial-Ecology-Group:master from jtclaypool:fix-megaresv3.00-db

@jtclaypool

The MEGARes annotations contain misspellings (e.g. 'multi-compount') and a loop that should not be present: the group RPSL is listed under two different mechanisms. The loop, present both in the MEGARes db directly and within AMR++, is:

| meg | type | class | mechanism | group |
| --- | --- | --- | --- | --- |
| MEG_6144 | Drugs | Aminoglycosides | Aminoglycoside-resistant_30S_ribosomal_protein_S12 | RPSL |
| MEG_8677 | Drugs | Aminoglycosides | Aminoglycoside-resistant_16S_ribosomal_subunit_protein | RPSL |
| MEG_8678 | Drugs | Aminoglycosides | Aminoglycoside-resistant_16S_ribosomal_subunit_protein | RPSL |
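
One way to check for this kind of loop is to count how many distinct mechanisms each group appears under. The sketch below is only an illustration: it uses the three rows above typed inline rather than the real annotation file, the column names follow the header row above, and `ann`, `n_mech` and `loop_groups` are hypothetical names.

```r
# The three rows quoted above, entered inline for illustration only.
ann <- data.frame(
  meg = c("MEG_6144", "MEG_8677", "MEG_8678"),
  type = "Drugs",
  class = "Aminoglycosides",
  mechanism = c(
    "Aminoglycoside-resistant_30S_ribosomal_protein_S12",
    "Aminoglycoside-resistant_16S_ribosomal_subunit_protein",
    "Aminoglycoside-resistant_16S_ribosomal_subunit_protein"
  ),
  group = "RPSL",
  stringsAsFactors = FALSE
)

# Count the distinct mechanisms recorded for each group.
n_mech <- tapply(ann$mechanism, ann$group, function(m) length(unique(m)))

# Any group with more than one parent mechanism is a candidate loop.
loop_groups <- names(n_mech)[n_mech > 1]
print(loop_groups)
```

Running the same check on the full annotation table before and after the update would be a quick way to confirm that the loop is gone.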

Updating all headers to their current values using 'megares_to_external_header_mappings_v3.00.csv' (dropping entries marked 'REMOVED' and applying any 'UpdatedHeader') should fix this.

Additionally, doing this removes the spaces from group. IMO this better matches the statement in "What distinguishes MEGARes" that "All sequence metadata has been formatted to work well with the majority of bioinformatics software. Sequence headers contain no whitespace or non-compliant symbols", even though group is not the sequence header itself.

Two files were changed. The external header mappings file was updated directly from MEGARes. The annotation file was then regenerated from that file using R:

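```r
library(tidyr)
library(dplyr)

# Read the external header mappings file
db <- read.table('./data/amr/megares_to_external_header_mappings_v3.00.csv',
                 header = TRUE, sep = ',', quote = '"')

new_db <- db %>%
  # Prefer the updated header wherever one is provided
  mutate(MEGARes_header = case_when(
    UpdatedHeader != "" ~ UpdatedHeader,
    TRUE ~ MEGARes_header
  )) %>%
  # Drop entries marked as REMOVED
  filter(MEGARes_header != "REMOVED") %>%
  # Keep the full header, then split it into its annotation levels
  mutate(header = MEGARes_header) %>%
  separate_wider_delim(MEGARes_header, delim = "|",
                       names = c("MEG", "type", "class", "mechanism", "group", "snp"),
                       too_few = "align_start") %>%
  select(c("header", "type", "class", "mechanism", "group", "snp")) %>%
  # Headers without a sixth field get an empty snp value instead of NA
  mutate(snp = case_when(
    is.na(snp) ~ "",
    TRUE ~ snp
  ))
```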

The annotation file was saved with CRLF line endings, which appear to match the original file's line endings, but I could be wrong.
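
For reference, a minimal sketch of how a table could be written with CRLF line endings from R. This is not necessarily how the file in this PR was produced: the one-row `new_db` below is a stand-in built from the table earlier in this description, `out_path` is a hypothetical temporary path, and `quote = FALSE` is an assumption about the target format.

```r
# Stand-in for the updated annotation table (one illustrative row).
fields <- c("Drugs", "Aminoglycosides",
            "Aminoglycoside-resistant_30S_ribosomal_protein_S12", "RPSL")
new_db <- data.frame(
  header = paste(c("MEG_6144", fields), collapse = "|"),
  type = fields[1], class = fields[2], mechanism = fields[3], group = fields[4],
  snp = "",
  stringsAsFactors = FALSE
)

out_path <- tempfile(fileext = ".csv")  # hypothetical output path

# A binary-mode connection writes "\r\n" exactly as given on every platform;
# in text mode on Windows, "\n" would be expanded a second time.
con <- file(out_path, open = "wb")
write.table(new_db, con, sep = ",", eol = "\r\n",
            row.names = FALSE, quote = FALSE)
close(con)

# Check the result: count carriage returns (0x0d) and line feeds (0x0a).
raw_bytes <- readBin(out_path, what = "raw", n = file.size(out_path))
n_cr <- sum(raw_bytes == as.raw(0x0d))
n_lf <- sum(raw_bytes == as.raw(0x0a))
print(c(cr = n_cr, lf = n_lf))
```

Equal counts of carriage returns and line feeds indicate that every line ends in CRLF.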

@EnriqueDoster

Collaborator

Thanks for bringing this to our attention! We're looking to make an update to MEGARes soon and will incorporate your changes as well! We left the spaces in the "annotations" file mainly so that the labels could be used directly when making figures, but I totally agree with you. Better to leave the spaces out, and people can just swap the "_" for spaces when plotting.

Thanks!
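
For anyone who wants readable plot labels after this change, swapping the underscores back to spaces is a one-liner. A small sketch using values from the table in the PR description; `mechanisms` and `plot_labels` are hypothetical names:

```r
# Mechanism names as they appear in the annotation file (underscores, no spaces).
mechanisms <- c(
  "Aminoglycoside-resistant_30S_ribosomal_protein_S12",
  "Aminoglycoside-resistant_16S_ribosomal_subunit_protein"
)

# Replace every underscore with a space; fixed = TRUE treats "_" literally.
plot_labels <- gsub("_", " ", mechanisms, fixed = TRUE)
print(plot_labels)
```

Hyphens are left untouched, so names such as "Aminoglycoside-resistant" stay intact.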
