Context
There's a push to replace the Genome Detective hardcoded files with a Nextclade dataset, as suggested in this comment and this comment. This change aims to improve automation and consistency in genotyping processes.
Creating a Nextclade dataset could require:
- investigating clade-defining mutations or
- creating a guide tree based on the Genome Detective results (ORF1_type and ORF2_type) and augur traits.
Current situation:
A Nextclade dataset would greatly facilitate automated genotyping.
Potential next steps:
Build a guide tree based on CDC reference sequences and identified clades
* [ ] Scraped and cleaned into CDC_references (3).xls (estimated labor: 3+ hours)
  * due to recombination, genotype follows VP1 (capsid)
* [ ] Pull capsid sequences and annotate header into norovirus_cdc_reference.fasta.txt (estimated labor: 15mins)
* [ ] Optional: Pull reference sequences from Chhabra et al., 2019 Table 1 and dual nomenclature from Table 2
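For the header annotation step, a minimal sketch of one way to do it with the Python standard library. It assumes the cleaned CDC table is exported as tab-separated text with `accession` and `genotype` columns; those column names, the `accession|genotype` header layout, and the function names are all hypothetical, and the records in the example are placeholders rather than real sequences.

```python
import csv
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def annotate_headers(fasta_handle, table_handle):
    """Return FASTA text with headers rewritten as accession|genotype."""
    # Assumed table layout: tab-separated, 'accession' and 'genotype' columns.
    genotypes = {
        row["accession"]: row["genotype"]
        for row in csv.DictReader(table_handle, delimiter="\t")
    }
    records = []
    for header, seq in read_fasta(fasta_handle):
        accession = header.split()[0]
        # Sequences missing from the table are flagged rather than dropped.
        genotype = genotypes.get(accession, "unassigned")
        records.append(f">{accession}|{genotype}\n{seq}")
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    # Placeholder records, not real accessions or sequences.
    fasta = io.StringIO(">ACC0001 example record\nATGAAG\nATGGCG\n>ACC0002\nATGTCT\n")
    table = io.StringIO("accession\tgenotype\nACC0001\tGII.4\n")
    print(annotate_headers(fasta, table), end="")
```

Keeping the genotype in the header makes it easy to split it back out into a metadata column later, for example as input to augur traits.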
Compare results against other tools
* ORF1_type and ORF2_type
If results are consistent across tools, identify clade-defining mutations
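The tool comparison could be summarized with something like the sketch below. It assumes each tool's output has already been reduced to a mapping from sample name to an (ORF1_type, ORF2_type) pair; the function name, the input shape, and the sample calls are illustrative only.

```python
def compare_calls(calls_a, calls_b):
    """Compare two {sample: (ORF1_type, ORF2_type)} mappings.

    Returns (concordance, discordant): the fraction of shared samples with
    identical calls, and a dict of the samples where the tools disagree.
    """
    shared = sorted(set(calls_a) & set(calls_b))
    discordant = {
        s: (calls_a[s], calls_b[s]) for s in shared if calls_a[s] != calls_b[s]
    }
    # With no shared samples the concordance is undefined, so return NaN.
    concordance = 1 - len(discordant) / len(shared) if shared else float("nan")
    return concordance, discordant


if __name__ == "__main__":
    # Illustrative calls only, not real results from either tool.
    tool_a = {"s1": ("GII.P16", "GII.4"), "s2": ("GII.P31", "GII.4")}
    tool_b = {"s1": ("GII.P16", "GII.4"), "s2": ("GII.P16", "GII.4")}
    concordance, discordant = compare_calls(tool_a, tool_b)
    print(f"concordance: {concordance:.2f}")
    for sample, (a, b) in discordant.items():
        print(sample, a, b)
```

Listing the discordant samples alongside the summary number helps decide whether disagreements cluster in particular genotypes before committing to clade-defining mutations.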
Develop Nextclade dataset files (reference sequence, pathogen.json, etc.)
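As a starting point for the dataset files, here is a rough sketch that writes a skeleton pathogen.json. The field names reflect my understanding of the Nextclade v3 dataset layout and should be checked against the Nextclade documentation before use; the file names, the helper name, and all attribute values are placeholders.

```python
import json


def build_pathogen_json(name, reference_name, reference_accession):
    """Return a skeleton pathogen.json as a dict (fields are assumptions)."""
    return {
        "schemaVersion": "3.0.0",
        # Paths are relative to the dataset directory.
        "files": {
            "reference": "reference.fasta",
            "pathogenJson": "pathogen.json",
            "genomeAnnotation": "genome_annotation.gff3",
            "treeJson": "tree.json",
            "examples": "sequences.fasta",
        },
        "attributes": {
            "name": name,
            "reference name": reference_name,
            "reference accession": reference_accession,
        },
    }


if __name__ == "__main__":
    # Placeholder values; replace with the chosen reference.
    config = build_pathogen_json("Norovirus (draft)", "PLACEHOLDER", "PLACEHOLDER")
    print(json.dumps(config, indent=2))
```

QC rules and alignment parameters are left out on purpose, since they depend on decisions this issue has not settled yet.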
Implement the Nextclade workflow
Implement rules that test the Nextclade dataset using example data
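A test rule could call a small check script along these lines. This sketch assumes the Nextclade TSV output has `seqName` and `clade` columns and that the expected genotype per example sequence is known; the function name and the example rows are hypothetical.

```python
import csv
import io


def check_clades(nextclade_tsv, expected):
    """Return {seqName: (expected, observed)} for every mismatching call."""
    rows = csv.DictReader(nextclade_tsv, delimiter="\t")
    observed = {row["seqName"]: row["clade"] for row in rows}
    # Sequences absent from the output are reported with observed=None.
    return {
        name: (want, observed.get(name))
        for name, want in expected.items()
        if observed.get(name) != want
    }


if __name__ == "__main__":
    # Made-up output rows for illustration.
    tsv = io.StringIO("seqName\tclade\nACC0001\tGII.4\nACC0002\tGII.17\n")
    mismatches = check_clades(tsv, {"ACC0001": "GII.4", "ACC0002": "GII.6"})
    for name, (want, got) in mismatches.items():
        print(f"{name}: expected {want}, got {got}")
    # A workflow rule would fail when any mismatch is reported.
    raise SystemExit(1 if mismatches else 0)
```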