Related to #89
Add a `sort_priority` field to rank genes by annotation quality, prioritizing well-annotated genes with known GO terms over those with unknown terms or unresolved symbols.
Problem
Currently, genes are not sorted by annotation quality, which means:
- Well-annotated genes are hard to find quickly
- The first pages are dominated by genes with unknown GO terms
Solution
Implement a priority system based on:
- Number of unknown GO terms (UNKNOWN:CC, UNKNOWN:BP, UNKNOWN:MF)
- Whether the gene has a resolved symbol
Priority Levels
| Priority | Condition | Description |
|---|---|---|
| 1 | Default | Genes with known GO terms |
| 10 | 1 unknown term | Contains 1 unknown GO term |
| 20 | 2 unknown terms | Contains 2 unknown GO terms |
| 30 | 3 unknown terms | Contains 3 unknown GO terms |
| 50 | Unnamed gene | `named_gene: false` |
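The table could be turned into a value computed at index time roughly as follows. This is only a sketch under assumptions. The function name `compute_sort_priority` is hypothetical. The `go_terms` field, assumed to hold a list of GO term strings, is also hypothetical. The table does not say what an unnamed gene with unknown terms should get, so the sketch assumes `named_gene: false` always yields 50.

```python
# Hypothetical helper mirroring the priority table; field names other than
# named_gene (e.g. "go_terms") are assumptions, not the actual schema.
UNKNOWN_GO_TERMS = {"UNKNOWN:CC", "UNKNOWN:BP", "UNKNOWN:MF"}


def compute_sort_priority(gene: dict) -> int:
    """Return 1 (best) up to 50 (worst) following the priority table."""
    # Assumption: an unresolved symbol outranks any GO-term count.
    # A missing named_gene flag is treated as a named gene.
    if gene.get("named_gene") is False:
        return 50
    # Count distinct UNKNOWN:* terms (at most 3: CC, BP, MF).
    unknown_count = len(set(gene.get("go_terms", [])) & UNKNOWN_GO_TERMS)
    if unknown_count == 0:
        return 1  # default: only known GO terms
    return 10 * unknown_count  # 1 -> 10, 2 -> 20, 3 -> 30
```

Using spaced values (1, 10, 20, 30, 50) rather than 1 to 5 leaves room to slot in new levels later without renumbering existing documents.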
Sorting Order
Genes are sorted by the following keys, in order:
1. `sort_priority` (ascending): best quality first
2. `coordinates_chr_num` (ascending): chromosome number
3. `gene_symbol` (ascending): alphabetical, for deterministic ordering
Elasticsearch Query:
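```python
sort=[
    {"sort_priority": {"order": "asc"}},                # best annotation quality first
    {"coordinates_chr_num.keyword": {"order": "asc"}},  # then chromosome
    {"gene_symbol.keyword": {"order": "asc"}},          # then symbol, breaking ties deterministically
]
```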
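For unit tests, the same three-key ordering can be reproduced in memory. The sketch below is hypothetical: the function name `sort_genes` is illustrative, and the documents are assumed to carry the three fields used in the query. A `.keyword` subfield is sorted as a string. If `coordinates_chr_num` is a keyword, chromosome `"10"` therefore sorts before `"2"`. The sketch compares strings so it matches that behavior. A numeric field would give natural chromosome order instead.

```python
# Hypothetical in-memory equivalent of the Elasticsearch sort above.
def sort_genes(genes: list) -> list:
    """Order by sort_priority, then chromosome, then gene symbol (all ascending)."""
    return sorted(
        genes,
        key=lambda g: (
            g["sort_priority"],
            str(g["coordinates_chr_num"]),  # compared as a string, like a keyword field
            g["gene_symbol"],
        ),
    )


genes = [
    {"sort_priority": 10, "coordinates_chr_num": "1", "gene_symbol": "geneD"},
    {"sort_priority": 1, "coordinates_chr_num": "2", "gene_symbol": "geneB"},
    {"sort_priority": 1, "coordinates_chr_num": "10", "gene_symbol": "geneC"},
    {"sort_priority": 1, "coordinates_chr_num": "2", "gene_symbol": "geneA"},
]
print([g["gene_symbol"] for g in sort_genes(genes)])
# prints ['geneC', 'geneA', 'geneB', 'geneD']
```

Note that `geneC` (chromosome `"10"`) comes before the chromosome `"2"` genes because of string comparison.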
Benefits
- Surfaces high-quality, well-annotated genes first
- Applies a consistent quality ranking across the entire dataset
- Keeps ordering deterministic via chromosome and gene symbol
Discussion
Should genes with lower priority (10, 20, 30, 50) be sorted:
- At the bottom of the entire list (current implementation): all priority 1 genes first (sorted by chromosome, then gene_symbol), then all priority 10+ genes (sorted by chromosome, then gene_symbol)
- At the bottom of each chromosome: within each chromosome, priority 1 genes first (sorted by gene_symbol), then that chromosome's priority 10+ genes (sorted by gene_symbol)
- At the bottom of each gene_symbol letter: within each letter group (A*, B*, C*, etc.), priority 1 genes first, then that letter's priority 10+ genes
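To make the three options concrete, each can be written as a tuple sort key over the same fields. This is a hypothetical in-memory sketch, and the function names are illustrative. In the Elasticsearch query, option 2 would amount to moving `coordinates_chr_num.keyword` ahead of `sort_priority` in the `sort` array. Option 3 would likely need either an extra field holding the symbol's first letter or a script-based sort. The sketch assumes option 3 breaks ties by the full symbol and ignores chromosome.

```python
# Hypothetical sort keys for the three Discussion options.
# sorted() compares the returned tuples element by element.

def key_bottom_of_list(g: dict) -> tuple:
    # Option 1 (current): priority, then chromosome, then symbol.
    return (g["sort_priority"], g["coordinates_chr_num"], g["gene_symbol"])


def key_bottom_of_chromosome(g: dict) -> tuple:
    # Option 2: chromosome first, priority within it, then symbol.
    return (g["coordinates_chr_num"], g["sort_priority"], g["gene_symbol"])


def key_bottom_of_letter(g: dict) -> tuple:
    # Option 3: first letter of the symbol, priority within it, then full symbol.
    symbol = g["gene_symbol"]
    return (symbol[:1].upper(), g["sort_priority"], symbol)


def symbols(genes: list, key) -> list:
    """Return gene symbols in the order produced by the given key."""
    return [g["gene_symbol"] for g in sorted(genes, key=key)]
```

Passing the same gene list through each key shows how far a low-priority gene moves: under option 1 it drops to the very end, while under options 2 and 3 it stays within its chromosome or letter group.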
Testing