Fix megaresv3.00 db #51
Open
jtclaypool wants to merge 2 commits into
Collaborator
Thanks for bringing this to our attention! We're looking to make an update to MEGARes soon and will incorporate your changes as well! We left the spaces in the "annotations" file more so that the labels could be used directly when making figures, but I totally agree with you. Better to leave the spaces out and people can just swap out the "_" with spaces for plotting. Thanks!
The MEGARes annotations file contains misspellings (e.g. 'multi-compount') and a loop that should not be present. The loop, which appears both in the MEGARes db itself and within AMR++, is:
Using 'megares_to_external_header_mappings_v3.00.csv' to bring all headers up to date (dropping entries marked 'REMOVED' and applying any 'UpdatedHeader') should fix this.
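For reviewers who want to see the shape of that header update, here is a minimal Python sketch. It is not the R code used for this PR. The column names `MEGARes_Header` and `UpdatedHeader`, the sample rows, and the assumption that annotation fields are the pipe-delimited parts of the header (`ID|type|class|mechanism|group`) are all illustrative guesses and should be checked against the real files.

```python
import csv
import io


def current_headers(mapping_rows, old_col="MEGARes_Header", new_col="UpdatedHeader"):
    """Yield the current header for each mapping row, skipping removed entries.

    Column names are hypothetical; adjust them to the real mappings file.
    """
    for row in mapping_rows:
        updated = (row.get(new_col) or "").strip()
        if updated == "REMOVED":
            continue  # entry no longer exists in the database
        # Use the updated header when one is given, else keep the old one.
        yield updated or row[old_col].strip()


def annotation_row(header):
    """Split a pipe-delimited header into annotation fields (assumed layout)."""
    parts = header.split("|")
    type_, class_, mechanism, group = parts[1:5]
    return {
        "header": header,
        "type": type_,
        "class": class_,
        "mechanism": mechanism,
        "group": group,
    }


# Made-up sample data standing in for the mappings CSV.
mapping_csv = """MEGARes_Header,UpdatedHeader
MEG_1|Drugs|ClassA|MechA|GRPA,
MEG_2|Drugs|ClassA|MechA|GRP B,MEG_2|Drugs|ClassA|MechA|GRP_B
MEG_3|Drugs|ClassB|MechB|GRPC,REMOVED
"""

rows = csv.DictReader(io.StringIO(mapping_csv))
annotations = [annotation_row(h) for h in current_headers(rows)]

for row in annotations:
    print(row["header"], "->", row["group"])
```

In this sketch the removed entry is dropped and the group containing a space is replaced by its updated, underscore-separated form.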
Additionally, doing this removes the spaces from group, which IMO better matches the statement "All sequence metadata has been formatted to work well with the majority of bioinformatics software. Sequence headers contain no whitespace or non-compliant symbols" in "What distinguishes MEGARes", even though group is not the sequence header.
Two files were changed. The external header mappings file was updated directly from MEGARes. The annotation file was regenerated from that mappings file using R:
The annotation file was saved with CRLF line endings, which seemed to be the original file's line ending, but I could be wrong.
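One way to double-check the line-ending question is to count CRLF versus bare LF terminators in the raw bytes of each version of the file. The snippet below is a small sketch using made-up sample bytes; for a real check, the bytes would come from reading the annotation file in binary mode (for example `open(path, "rb").read()`, where `path` is whatever the file is called locally).

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF and bare-LF line terminators in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LF bytes that are not part of a CRLF pair
    return {"crlf": crlf, "lf": lf}


# Made-up sample standing in for the annotation file contents.
sample = b"header,type\r\nMEG_1,Drugs\r\n"
print(count_line_endings(sample))
```

A file that reports only `crlf` counts was written with Windows-style endings throughout; a mix of both suggests the endings were changed partway through editing.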