Hello,
Thanks for this useful package!
I have some questions on what exactly is stored in the resulting KEGG.db, and how that relates to the options of clusterProfiler::enrichKEGG.
enrichKEGG has an option keyType, which accepts kegg, ncbi-geneid, ncbi-proteinid or uniprot.
Background/context
I would like a solution for doing KEGG enrichment analysis starting from gene SYMBOL. I want to be able to use the same solution for any arbitrary species.
From this reply, YuLab-SMU/clusterProfiler#108 (comment):
"KEGG id and ENTREZID are the same for only some of the species, but not always the same."
and this blog post, https://guangchuangyu.github.io/2016/05/convert-biological-id-with-kegg-api-using-clusterprofiler/:
"A rule of thumb for the ‘kegg’ ID is entrezgene ID for eukaryote species and Locus ID for prokaryotes."
I conclude that kegg IDs are not reliable enough (or not sufficiently well described) for my use. I would thus prefer to use ncbi-geneid.
However, when opening the sqlite database created through createKEGGdb, I only see a field gene_or_orf_id in table pathway2gene.
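For reference, the inspection described above can be sketched as follows. This is a minimal sketch using Python's standard sqlite3 module, not part of createKEGGdb or clusterProfiler. The helper names (table_columns, sample_ids, build_demo_db) are hypothetical, and so are the pathway_id column and the placeholder rows in the demo table. Only the table name pathway2gene and the field gene_or_orf_id come from the database I opened.

```python
import sqlite3


def table_columns(conn, table):
    """Return the column names of `table`, in declaration order."""
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [row[1] for row in rows]


def sample_ids(conn, table, column, limit=5):
    """Return up to `limit` distinct values of `column`, to eyeball the ID format."""
    query = f'SELECT DISTINCT "{column}" FROM "{table}" LIMIT ?'
    return [row[0] for row in conn.execute(query, (limit,))]


def build_demo_db():
    """Build a throwaway in-memory stand-in for the real database.

    Assumption: the pathway_id column and both rows are placeholders made up
    for this sketch; they are not real KEGG data.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pathway2gene (pathway_id TEXT, gene_or_orf_id TEXT)")
    conn.executemany(
        "INSERT INTO pathway2gene VALUES (?, ?)",
        [("demo00001", "placeholder_a"), ("demo00001", "placeholder_b")],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(table_columns(conn, "pathway2gene"))
    print(sample_ids(conn, "pathway2gene", "gene_or_orf_id"))
    conn.close()
```

To run this against the real database, replace build_demo_db() with sqlite3.connect() pointed at the sqlite file generated by createKEGGdb. Looking at a few sampled values should at least show whether the IDs look like numeric Entrez IDs or like locus tags.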
Questions:
- What is the gene_or_orf_id present in the KEGG.db database? Is it a kegg id?
- Can I use createKEGGdb to create a KEGG.db package, and then use it for clusterProfiler::enrichKEGG with keyType = ncbi-geneid (and use_internal_data = TRUE)?
Thank you in advance for your help,
All the best