Implement comprehensive curation status tracking for publications #420

@cybersiddhu

Description

We need to refine how we tag and track the curation status of publications in our system. Currently, literature topics are used as the primary indicator, but this approach is outdated and doesn't accurately reflect the actual curation status of publications.

Current Issues

  • Literature topics are of limited use to users and have been largely superseded by modern AI summarization tools
  • The current system doesn't properly track whether all relevant data has been extracted from publications
  • There's no clear definition of what constitutes a "curated" publication

Proposed Solution

Define a publication as "curated" only when the following three key types of information have been extracted (or verified as not present):

  1. Functional annotation data

    • Gene names
    • Gene product names
    • Descriptions
  2. GO terms

    • All relevant Gene Ontology annotations
  3. Strain/phenotype annotations

    • Strain information
    • Associated phenotypes
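One way to make this definition concrete is a per-publication record with one tri-state field per category, so that "verified as not present" is distinguishable from "not yet reviewed". The following is a minimal sketch under that assumption; the names `CategoryStatus`, `CurationStatus`, and the field names are hypothetical and not part of any existing schema.

```python
from dataclasses import dataclass
from enum import Enum


class CategoryStatus(Enum):
    """State of one curation category for one publication (hypothetical names)."""
    PENDING = "pending"          # not yet reviewed by a curator
    EXTRACTED = "extracted"      # data was found and extracted
    NOT_PRESENT = "not_present"  # verified that the paper contains no such data


@dataclass
class CurationStatus:
    """Tracks the three categories named in the proposal for a single publication."""
    publication_id: str
    functional_annotation: CategoryStatus = CategoryStatus.PENDING
    go_terms: CategoryStatus = CategoryStatus.PENDING
    strain_phenotype: CategoryStatus = CategoryStatus.PENDING

    def categories(self) -> dict[str, CategoryStatus]:
        # Collect the three category fields under stable keys.
        return {
            "functional_annotation": self.functional_annotation,
            "go_terms": self.go_terms,
            "strain_phenotype": self.strain_phenotype,
        }

    def is_curated(self) -> bool:
        # Curated only when every category is either extracted or verified absent.
        return all(
            status is not CategoryStatus.PENDING
            for status in self.categories().values()
        )


if __name__ == "__main__":
    # "pub-1" is a placeholder identifier, not a real publication ID.
    status = CurationStatus("pub-1")
    status.functional_annotation = CategoryStatus.EXTRACTED
    status.go_terms = CategoryStatus.NOT_PRESENT
    print(status.is_curated())  # strain_phenotype is still pending
```

Keeping `NOT_PRESENT` separate from `EXTRACTED` preserves the reason a category is closed, which may matter later for coverage metrics.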

Implementation Requirements

  1. Design and implement a data structure to track the curation status of each publication
  2. Create an interface to update and view the curation status with checkboxes or similar indicators for each of the three categories
  3. Modify existing publication displays to show curation status
  4. Add API endpoints to query publications by curation status
  5. Develop a migration strategy for existing publications
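Requirements 4 and 5 could be prototyped roughly as follows. This sketch assumes status records are plain dictionaries, that a migration starts every existing publication as pending in all three categories, and that the query simply filters in memory; the function names, field names, and status strings are hypothetical, and a real endpoint would query the backing store instead.

```python
# Hypothetical category keys and status strings; not an existing schema.
CATEGORIES = ("functional_annotation", "go_terms", "strain_phenotype")
RESOLVED = {"extracted", "not_present"}


def initial_record(publication_id: str) -> dict[str, str]:
    """Migration default: an existing publication starts with every category pending."""
    record = {"publication_id": publication_id}
    for category in CATEGORIES:
        record[category] = "pending"
    return record


def is_curated(record: dict[str, str]) -> bool:
    """A publication is curated when all three categories are resolved."""
    return all(record[category] in RESOLVED for category in CATEGORIES)


def query_by_curation_status(records: list[dict[str, str]], curated: bool) -> list[str]:
    """Return the IDs of publications whose curated flag matches the filter."""
    return [r["publication_id"] for r in records if is_curated(r) == curated]


if __name__ == "__main__":
    # Placeholder IDs used for illustration only.
    records = [initial_record(pid) for pid in ("pub-1", "pub-2")]
    # Simulate a curator completing all three categories for pub-2.
    records[1].update(
        functional_annotation="extracted",
        go_terms="extracted",
        strain_phenotype="not_present",
    )
    print(query_by_curation_status(records, curated=True))
    print(query_by_curation_status(records, curated=False))
```

Starting migrated publications as pending avoids inferring curation status from legacy literature topics, at the cost of requiring curators to revisit each one.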

Additional Consideration

Consider implementing a gene-level curation status similar to UniProt's approach:

  • Track an annotation score for each gene
  • Track a data existence score for each gene
  • Examples from UniProt: Human protein and Dicty ortholog

Benefits

  • More accurate tracking of publication curation status
  • Better prioritization of curation efforts
  • Improved user experience by clearly indicating the completeness of information
  • Ability to generate metrics about curation progress and coverage

Note

Literature topics need not be eliminated entirely as they still serve to identify specific publication types (reviews, phylogenetic analyses, etc.), but they should not be the determining factor for curation status.
