From issue #89.
This is a temporary implementation until the boolean is implemented in the downstream data.
Implement a `named_gene` boolean field to distinguish genes with resolved symbols from those that only have UniProtKB identifiers.
The `named_gene` field is `false` when the gene ID (without the "UniProtKB:" prefix) matches the `gene_symbol`, and `true` otherwise.
```python
annos_df['named_gene'] = annos_df['gene'].str.replace('UniProtKB:', '') != annos_df['gene_symbol']
```
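For reviewers who want to try the comparison in isolation, here is a minimal sketch. The two sample rows are taken from the examples in this description. The explicit `regex=False` and the `named` / `unnamed` variable names are assumptions added for illustration, not part of the change itself.

```python
import pandas as pd

# Sample rows mirroring the two examples in this description.
annos_df = pd.DataFrame(
    {
        "gene": ["UniProtKB:Q8TD07", "UniProtKB:A0A6Q8PHA8"],
        "gene_symbol": ["RAET1E", "A0A6Q8PHA8"],
    }
)

# Drop the literal "UniProtKB:" prefix, then compare with the symbol.
# regex=False (an assumption here) treats the prefix as plain text.
bare_id = annos_df["gene"].str.replace("UniProtKB:", "", regex=False)
annos_df["named_gene"] = bare_id != annos_df["gene_symbol"]

# Filtering on the new field (illustrative variable names).
named = annos_df[annos_df["named_gene"]]
unnamed = annos_df[~annos_df["named_gene"]]
```

With these two rows, `named` should hold only the RAET1E record and `unnamed` only the A0A6Q8PHA8 record.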
Examples
**Named gene (resolved symbol):**
```json
{
"gene": "UniProtKB:Q8TD07",
"gene_symbol": "RAET1E",
"gene_name": "Retinoic acid early transcript 1E",
"named_gene": true
}
```
**Unnamed gene (unresolved):**
```json
{
"gene": "UniProtKB:A0A6Q8PHA8",
"gene_symbol": "A0A6Q8PHA8",
"gene_name": "Uncharacterized protein",
"named_gene": false
}
```
Benefits
- Easy filtering of genes with/without resolved symbols
- Better data quality visibility
- Lets us decide later what to display in the UI
Testing
- `named_gene` field is added to all gene records
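
One possible way to check this item is sketched below. `add_named_gene` and `check_named_gene` are hypothetical helper names introduced only for this sketch; the comparison inside is the same prefix-stripping check this PR describes, and the sample data reuses the two example records.

```python
import pandas as pd


def add_named_gene(annos_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with the named_gene column added."""
    out = annos_df.copy()
    bare_id = out["gene"].str.replace("UniProtKB:", "", regex=False)
    out["named_gene"] = bare_id != out["gene_symbol"]
    return out


def check_named_gene(annos_df: pd.DataFrame) -> None:
    """Hypothetical check: every record gets a non-null boolean named_gene."""
    result = add_named_gene(annos_df)
    assert "named_gene" in result.columns
    assert len(result) == len(annos_df)
    assert pd.api.types.is_bool_dtype(result["named_gene"])
    assert result["named_gene"].notna().all()


sample = pd.DataFrame(
    {
        "gene": ["UniProtKB:Q8TD07", "UniProtKB:A0A6Q8PHA8"],
        "gene_symbol": ["RAET1E", "A0A6Q8PHA8"],
    }
)
check_named_gene(sample)
```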