Description
We need to refine how we tag and track the curation status of publications in our system. Currently, literature topics are used as the primary indicator, but this approach is outdated and doesn't accurately reflect the actual curation status of publications.
Current Issues
- Literature topics are of limited use to users and have been made largely redundant by modern AI summarization tools
- The current system doesn't properly track whether all relevant data has been extracted from publications
- There's no clear definition of what constitutes a "curated" publication
Proposed Solution
Define a publication as "curated" only when the following three key types of information have been extracted (or verified as not present):
- Functional annotation data
  - Gene names
  - Gene product names
  - Descriptions
- GO terms
  - All relevant Gene Ontology annotations
- Strain/phenotype annotations
  - Strain information
  - Associated phenotypes
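The definition above can be sketched as a small record with one status per category. This is a minimal illustration, not a proposed schema: the names `CategoryStatus`, `PublicationCuration`, and `is_curated` are hypothetical, and the three-state status (pending, extracted, verified not present) is an assumption about how "extracted (or verified as not present)" might be modeled.

```python
from dataclasses import dataclass
from enum import Enum


class CategoryStatus(Enum):
    """Hypothetical per-category state for one publication."""
    PENDING = "pending"          # not yet reviewed by a curator
    EXTRACTED = "extracted"      # data of this type was extracted
    NOT_PRESENT = "not_present"  # curator verified the paper has no such data


@dataclass
class PublicationCuration:
    """Curation status of a single publication (illustrative only)."""
    publication_id: str
    functional_annotation: CategoryStatus = CategoryStatus.PENDING
    go_terms: CategoryStatus = CategoryStatus.PENDING
    strain_phenotype: CategoryStatus = CategoryStatus.PENDING

    def is_curated(self) -> bool:
        # Curated only when no category is still pending, i.e. each one
        # is either extracted or verified as not present.
        categories = (
            self.functional_annotation,
            self.go_terms,
            self.strain_phenotype,
        )
        return all(status is not CategoryStatus.PENDING for status in categories)


# Example: two of three categories resolved, so not yet curated.
record = PublicationCuration(
    publication_id="example-pub-1",  # placeholder identifier
    functional_annotation=CategoryStatus.EXTRACTED,
    go_terms=CategoryStatus.NOT_PRESENT,
)
partially_done = record.is_curated()

record.strain_phenotype = CategoryStatus.EXTRACTED
fully_done = record.is_curated()
```

Keeping "verified as not present" as its own state, rather than folding it into a single checkbox, would let metrics distinguish papers that yielded data from papers that were reviewed and found to contain none.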
Implementation Requirements
- Design and implement a data structure to track the curation status of each publication
- Create an interface to update and view the curation status with checkboxes or similar indicators for each of the three categories
- Modify existing publication displays to show curation status
- Add API endpoints to query publications by curation status
- Develop a migration strategy for existing publications
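As a rough sketch of how the checkbox tracking, status query, and migration requirements might fit together, the example below uses plain dictionaries with one boolean per category. Everything here is an assumption for illustration: the function names (`migrate_publication`, `filter_by_status`, `curation_coverage`), the field names, and in particular the choice to start every existing publication with all three boxes unchecked rather than inferring status from its literature topics.

```python
from typing import Dict, Iterable, List

# Hypothetical field names for the three categories; a checked box (True)
# means "extracted or verified as not present".
CATEGORIES = ("functional_annotation", "go_terms", "strain_phenotype")

Record = Dict[str, object]


def migrate_publication(publication_id: str) -> Record:
    """Create an initial record for an existing publication.

    Assumption: migration starts with every category unchecked, so a
    curator must confirm each one explicitly.
    """
    record: Record = {"publication_id": publication_id}
    for category in CATEGORIES:
        record[category] = False
    return record


def is_curated(record: Record) -> bool:
    """A publication counts as curated when all three boxes are checked."""
    return all(bool(record[category]) for category in CATEGORIES)


def filter_by_status(records: Iterable[Record], curated: bool) -> List[Record]:
    """Return the records whose curated state matches the requested one."""
    return [r for r in records if is_curated(r) == curated]


def curation_coverage(records: List[Record]) -> float:
    """Fraction of publications that are fully curated (0.0 if none exist)."""
    if not records:
        return 0.0
    return len(filter_by_status(records, curated=True)) / len(records)


# Example: migrate three placeholder publications, then fully curate one.
records = [migrate_publication(pid) for pid in ("pub-a", "pub-b", "pub-c")]
for category in CATEGORIES:
    records[0][category] = True
records[1]["go_terms"] = True  # partially curated

curated_ids = [r["publication_id"] for r in filter_by_status(records, curated=True)]
pending_ids = [r["publication_id"] for r in filter_by_status(records, curated=False)]
coverage = curation_coverage(records)
```

An API endpoint for querying by curation status could wrap a function like `filter_by_status`, and `curation_coverage` hints at the kind of progress metric such a structure would make possible.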
Additional Consideration
Consider implementing a gene-level curation status similar to UniProt's approach:
- Track an annotation score for each gene
- Track a data existence score for each gene
- Examples from UniProt: Human protein and Dicty ortholog
Benefits
- More accurate tracking of publication curation status
- Better prioritization of curation efforts
- Improved user experience by clearly indicating the completeness of information
- Ability to generate metrics about curation progress and coverage
Note
Literature topics need not be eliminated entirely as they still serve to identify specific publication types (reviews, phylogenetic analyses, etc.), but they should not be the determining factor for curation status.