Loader Implementation: Add Named Gene Detection Field #90

@tmushayahama

Description

From issue #89.

This is a temporary implementation until the boolean is provided in the downstream data.

Implement a named_gene boolean field that distinguishes genes with a resolved symbol from genes that only have a UniProtKB identifier.

The field is false when the gene ID, with the "UniProtKB:" prefix removed, matches gene_symbol, and true otherwise:

```python
# False when the bare accession equals the symbol, i.e. no real symbol was resolved
annos_df['named_gene'] = annos_df['gene'].str.removeprefix('UniProtKB:') != annos_df['gene_symbol']
```
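For record-level code (for example, post-processing the JSON output instead of the DataFrame), the same rule can be written as a small helper. This is a sketch, not the loader's actual code: the names is_named_gene and UNIPROT_PREFIX are hypothetical, and treating a missing or empty symbol as unnamed is an assumption. The pandas one-liner would mark such rows true, because comparing NaN with != evaluates to True.

```python
from typing import Optional

UNIPROT_PREFIX = "UniProtKB:"  # hypothetical constant for the ID prefix


def is_named_gene(gene: str, gene_symbol: Optional[str]) -> bool:
    """Return False when the symbol is just the bare UniProtKB accession."""
    # Strip only a leading prefix (str.replace would remove it anywhere).
    accession = gene.removeprefix(UNIPROT_PREFIX)
    # Assumption: a missing or empty symbol means the gene is unnamed.
    if not gene_symbol:
        return False
    return accession != gene_symbol


print(is_named_gene("UniProtKB:Q8TD07", "RAET1E"))          # True
print(is_named_gene("UniProtKB:A0A6Q8PHA8", "A0A6Q8PHA8"))  # False
```

str.removeprefix needs Python 3.9 or later.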

Examples

**Named gene (resolved symbol):**
```json
{
  "gene": "UniProtKB:Q8TD07",
  "gene_symbol": "RAET1E",
  "gene_name": "Retinoic acid early transcript 1E",
  "named_gene": true
}
```

**Unnamed gene (unresolved):**
```json
{
  "gene": "UniProtKB:A0A6Q8PHA8",
  "gene_symbol": "A0A6Q8PHA8",
  "gene_name": "Uncharacterized protein",
  "named_gene": false
}
```

Benefits

  • Easy filtering of genes with and without resolved symbols
  • Better visibility into data quality
  • Informs what will be displayed in the UI in the future
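A minimal sketch of the filtering benefit, assuming records shaped like the JSON examples above (gene_name omitted for brevity). The helper split_by_named_gene is hypothetical; on the DataFrame itself, the equivalent is plain boolean indexing, annos_df[annos_df['named_gene']] and annos_df[~annos_df['named_gene']].

```python
def split_by_named_gene(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition gene records into (named, unnamed) using the named_gene flag."""
    named = [r for r in records if r["named_gene"]]
    unnamed = [r for r in records if not r["named_gene"]]
    return named, unnamed


records = [
    {"gene": "UniProtKB:Q8TD07", "gene_symbol": "RAET1E", "named_gene": True},
    {"gene": "UniProtKB:A0A6Q8PHA8", "gene_symbol": "A0A6Q8PHA8", "named_gene": False},
]
named, unnamed = split_by_named_gene(records)

# A simple data-quality summary: how many genes still lack a resolved symbol.
print(f"{len(unnamed)} of {len(records)} genes are unnamed")  # 1 of 2 genes are unnamed
```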

Testing

  • Verify the named_gene field is added to every gene record
  • Confirm the logic correctly flags records whose gene_symbol matches the bare accession
  • Check that the output JSON includes the new field
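The three checks above could be expressed roughly as follows. This is a hypothetical check_named_gene helper that runs on the list of output records, not an existing test in the repository; it encodes the rule exactly as stated in this issue, with no special handling of missing symbols.

```python
import json

UNIPROT_PREFIX = "UniProtKB:"  # hypothetical constant for the ID prefix


def check_named_gene(records: list[dict]) -> None:
    """Raise AssertionError if any record violates the named_gene checks."""
    for rec in records:
        gene = rec["gene"]
        # 1. The field exists on every record and is a real boolean.
        assert "named_gene" in rec, f"missing named_gene: {gene}"
        assert isinstance(rec["named_gene"], bool), f"non-bool named_gene: {gene}"
        # 2. The flag agrees with the accession-vs-symbol rule.
        accession = gene.removeprefix(UNIPROT_PREFIX)
        expected = accession != rec["gene_symbol"]
        assert rec["named_gene"] == expected, f"wrong named_gene: {gene}"
    # 3. The field survives serialization to JSON as true/false.
    for rec in json.loads(json.dumps(records)):
        assert isinstance(rec.get("named_gene"), bool), "named_gene lost in JSON"
```

Calling it on the records just before they are written out means an AssertionError names the offending gene.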
