Open
29 commits
8c87fe7 add RefSeq missing assembly pipeline (alinakbase, Feb 13, 2026)
43d0dec change the output format (alinakbase, Feb 13, 2026)
b819ccb Refactor missing RefSeq utility to use distributed Spark text output … (alinakbase, Feb 17, 2026)
d6f80b9 Add parquet files (alinakbase, Feb 18, 2026)
41b3d58 EDA (alinakbase, Feb 19, 2026)
54cb8ef change (alinakbase, Feb 19, 2026)
3b3697b fix error (alinakbase, Feb 19, 2026)
273060f compare different datasets (alinakbase, Feb 20, 2026)
7450175 list normalization prefix (alinakbase, Feb 24, 2026)
7d6fe7d add prefix normalization (alinakbase, Feb 28, 2026)
09694f1 Add two more files from jupyterhub (alinakbase, Mar 3, 2026)
67cce76 modify registry alignment (alinakbase, Mar 3, 2026)
fb07714 add datasets (alinakbase, Mar 4, 2026)
9ed3e60 update uniprot prefixes (alinakbase, Mar 5, 2026)
8a93eb7 update uniprot prefixes (alinakbase, Mar 5, 2026)
626d921 organize (alinakbase, Mar 6, 2026)
091d2fa organize the prefix files (alinakbase, Mar 6, 2026)
e7acc84 change format (alinakbase, Mar 6, 2026)
5238709 final organization (alinakbase, Mar 7, 2026)
8586be3 revise the prefix scripts (alinakbase, Mar 10, 2026)
682f323 Merge remote-tracking branch 'origin/develop' into prefix-remapper (alinakbase, Mar 10, 2026)
4da557c Tidy up investigation notebook (ialarmedalien, Mar 12, 2026)
93469ea Adding saved prefixes file and rerunning queries in uniprot_prefixes … (ialarmedalien, Mar 12, 2026)
ddeff0c prefix remapper investigation (alinakbase, Mar 12, 2026)
9854bc5 uniprot prefix formatting (alinakbase, Mar 12, 2026)
0710cf4 Running unrun cells in prefix remapper notebook (ialarmedalien, Mar 13, 2026)
c931b64 update the investigation (alinakbase, Mar 13, 2026)
810ebe4 merge two scripts (alinakbase, Mar 16, 2026)
52998da delete two ipynb (alinakbase, Mar 16, 2026)
@@ -0,0 +1,171 @@
ABCD
AGR
Agora
Allergome
AlphaFoldDB
AntiFam
Antibodypedia
ArachnoServer
Araport
BMRB
BRENDA
Bgee
BindingDB
BioCyc
BioGRID
BioGRID-ORCS
BioMuta
CARD
CAZy
CCDS
CD-CODE
CDD
CGD
CIViC
CORUM
CPTAC
CPTC
CTD
CarbonylDB
ChEMBL
ChiTaRS
ClinPGx
CollecTF
ComplexPortal
ConoServer
DEPOD
DIP
DMDM
DNASU
DisGeNET
DisProt
DrugBank
DrugCentral
EC
ELM
EMDB
ESTHER
EchoBASE
EnsemblBacteria
EnsemblFungi
EnsemblMetazoa
EnsemblPlants
EnsemblProtists
EvolutionaryTrace
ExpressionAtlas
FlyBase
FunCoup
FunFam
GO
Gene3D
GeneCards
GeneID
GeneReviews
GeneTree
GeneWiki
GenomeRNAi
GlyConnect
GlyCosmos
GlyGen
Gramene
GuidetoPHARMACOLOGY
HAMAP
HGNC
HOGENOM
HPA
IDEAL
IMGT_GENE-DB
InParanoid
IntAct
InterPro
JaponicusDB
KEGG
LegioList
Leproma
MEROPS
MGI
MIM
MINT
MaizeGDB
MalaCards
MassIVE
MetOSite
MoonDB
MoonProt
NCBITaxon
NCBIfam
NIAGADS
OGP
OMA
OpenTargets
Orphanet
OrthoDB
PAN-GO
PANTHER
PATRIC
PCDDB
PDB
PDBsum
PHI-base
PIR
PIRSF
PRIDE
PRINTS
PRO
PROSITE
PathwayCommons
PaxDb
PeptideAtlas
PeroxiBase
Pfam
Pharos
PhosphoSitePlus
PhylomeDB
PlantReactome
PomBase
ProMEX
Proteomes
ProteomicsDB
PseudoCAP
Pumba
REBASE
REPRODUCTION-2DPAGE
RGD
RNAct
Reactome
SABIO-RK
SASBDB
SFLD
SGD
SIGNOR
SMART
SMR
STRENDA-DB
STRING
SUPFAM
SignaLink
SwissLipids
SwissPalm
TAIR
TCDB
TopDownProteomics
TubercuList
UCSC
UniLectin
UniPathway
UniProt
VEuPathDB
VGNC
WBParaSite
WormBase
Xenbase
YCharOS
ZFIN
dictyBase
eggNOG
ensembl
euHCVdb
genbank
iPTMnet
jPOST
refseq
103 changes: 103 additions & 0 deletions notebooks/uniprot_prefix_investigation/data/prefixes.txt
@@ -0,0 +1,103 @@
Allergome
ArachnoServer
Araport
BioCyc
BioGRID
BioMuta
CCDS
CGD
CPTAC
CRC64
ChEMBL
ChiTaRS
CollecTF
ComplexPortal
ConoServer
DIP
DMDM
DNASU
DisProt
DrugBank
EMBL
EMBL-CDS
EMDB
ESTHER
EchoBASE
Ensembl
EnsemblGenome
EnsemblGenome_PRO
EnsemblGenome_TRS
Ensembl_PRO
Ensembl_TRS
FlyBase
GI
GeneCards
GeneID
GeneReviews
GeneTree
GeneWiki
Gene_Name
Gene_ORFName
Gene_OrderedLocusName
Gene_Synonym
GenomeRNAi
GlyConnect
GuidetoPHARMACOLOGY
HGNC
HOGENOM
IDEAL
JaponicusDB
KEGG
LegioList
Leproma
MEROPS
MGI
MIM
MINT
MaizeGDB
NCBI_TaxID
OMA
OpenTargets
Orphanet
OrthoDB
PATRIC
PDB
PHI-base
PeroxiBase
PharmGKB
PlantReactome
PomBase
ProteomicsDB
PseudoCAP
REBASE
RGD
Reactome
RefSeq
RefSeq_NT
SGD
STRING
SwissLipids
TAIR
TCDB
TreeFam
TubercuList
UCSC
UniParc
UniPathway
UniProtKB-ID
UniRef100
UniRef50
UniRef90
VEuPathDB
VGNC
WBParaSite
WBParaSite_TRS_PRO
WormBase
WormBase_PRO
WormBase_TRS
Xenbase
ZFIN
dictyBase
eggNOG
euHCVdb
neXtProt
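The two listings above spell some of the same sources with different casing (for example `refseq`/`RefSeq` and `ensembl`/`Ensembl`). The following is a minimal sketch of a case-insensitive comparison, assuming each file holds one prefix per line. The function names `load_prefixes` and `compare_prefixes` are hypothetical and are not part of this PR. Pairs that differ by more than case, such as `NCBITaxon` and `NCBI_TaxID`, are not matched by this sketch and would need an explicit mapping.

```python
from pathlib import Path


def load_prefixes(path):
    """Read a prefix list with one prefix per line, ignoring blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def compare_prefixes(left, right):
    """Compare two prefix lists, matching prefixes case-insensitively.

    Returns (shared, only_left, only_right). Shared prefixes keep the
    spelling used in `left`; the other two sets keep their own list's spelling.
    """
    # Key each prefix by its casefolded form so "RefSeq" and "refseq" match.
    # If one list holds two spellings of the same prefix, the later one is kept.
    left_by_key = {p.casefold(): p for p in left}
    right_by_key = {p.casefold(): p for p in right}

    # dict key views support set operations (&, -) directly.
    shared_keys = left_by_key.keys() & right_by_key.keys()
    shared = {left_by_key[k] for k in shared_keys}
    only_left = {left_by_key[k] for k in left_by_key.keys() - right_by_key.keys()}
    only_right = {right_by_key[k] for k in right_by_key.keys() - left_by_key.keys()}
    return shared, only_left, only_right


if __name__ == "__main__":
    # Small excerpts from the two listings above, not the full files.
    first = ["ensembl", "refseq", "GO", "PDB", "NCBITaxon"]
    second = ["Ensembl", "RefSeq", "Gene_Name", "PDB", "NCBI_TaxID"]
    shared, only_first, only_second = compare_prefixes(first, second)
    print("shared:", sorted(shared))
    print("only in first:", sorted(only_first))
    print("only in second:", sorted(only_second))
```

To run it against the real files, pass the output of `load_prefixes("notebooks/uniprot_prefix_investigation/data/prefixes.txt")` and of the first listing's file to `compare_prefixes`. Casefolding only normalizes letter case; any remapping between distinct names is left to a separate lookup table.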