Analysis of composition_kg_mapping_with_oak_chebi.tsv to identify hydrated compounds that are mapped to "ingredient:" codes instead of proper CHEBI IDs.
The main hydrated compound consistently mapped to an ingredient code is:
MnCl2 x 4 H2O → ingredient:2003
From the sections analyzed (approximately 4,200 lines out of ~40,000+ total lines), the following instances were identified:
- Line 521: Medium 1590 - MnCl2 x 4 H2O → ingredient:2003 (0.00181 g/L)
- Line 573: Medium 846 - MnCl2 x 4 H2O → ingredient:2003 (9.97009e-05 g/L)
- Line 1064: Medium 486 - MnCl2 x 4 H2O → ingredient:2003 (0.00248509 g/L)
- Line 1091: Medium 663 - MnCl2 x 4 H2O → ingredient:2003 (0.000496524 g/L)
- Line 1124: Medium 265 - MnCl2 x 4 H2O → ingredient:2003 (0.0018 g/L)
- Line 1148: Medium 40 - MnCl2 x 4 H2O → ingredient:2003 (0.000104 g/L)
- Line 1167: Medium 559 - MnCl2 x 4 H2O → ingredient:2003 (9.96016e-05 g/L)
- Line 2005: Medium 873 - MnCl2 x 4 H2O → ingredient:2003 (9.98004e-05 g/L)
- Line 2085: Medium 1299 - MnCl2 x 4 H2O → ingredient:2003 (9.98004e-05 g/L)
- Line 2120: Medium 992 - MnCl2 x 4 H2O → ingredient:2003 (0.00036 g/L)
- Line 2134: Medium 144b - MnCl2 x 4 H2O → ingredient:2003 (0.001 g/L)
- Line 2169: Medium 1647 - MnCl2 x 4 H2O → ingredient:2003 (0.0001 g/L)
- Line 3060: Medium 1101 - MnCl2 x 4 H2O → ingredient:2003 (2.99103e-05 g/L)
- Line 3149: Medium 1539 - MnCl2 x 4 H2O → ingredient:2003 (0.00098912 g/L)
- Line 3191: Medium 457d - MnCl2 x 4 H2O → ingredient:2003 (3e-05 g/L)
- Line 4013: Medium 1639 - MnCl2 x 4 H2O → ingredient:2003 (0.0001 g/L)
- Line 4055: Medium 1002 - MnCl2 x 4 H2O → ingredient:2003 (3e-05 g/L)
- Line 4080: Medium 778a - MnCl2 x 4 H2O → ingredient:2003 (9.97009e-05 g/L)
- Line 4105: Medium 1160 - MnCl2 x 4 H2O → ingredient:2003 (0.00036 g/L)
- Line 4113: Medium 856 - MnCl2 x 4 H2O → ingredient:2003 (0.001 g/L)
- Line 4196: Medium 1487 - MnCl2 x 4 H2O → ingredient:2003 (0.00098912 g/L)
- Line 4164: Medium 1282 contains MnCl2 x 2 H2O → CAS-RN:20603-88-7 (a different hydrate form, mapped to a CAS registry number rather than an ingredient code)
The file contains many examples of hydrated compounds that ARE properly mapped to CHEBI IDs, including:
- MgCl2 x 6 H2O → CHEBI:6636
- CaCl2 x 2 H2O → CHEBI:91243
- FeSO4 x 7 H2O → CHEBI:75836
- ZnSO4 x 7 H2O → CHEBI:35176
- Na2S x 9 H2O → CHEBI:76209
- L-Cysteine HCl x H2O → CHEBI:17561
- MgSO4 x 7 H2O → CHEBI:32599
- CoCl2 x 6 H2O → CHEBI:35696
- NiCl2 x 6 H2O → CHEBI:34887
- Na2MoO4 x 2 H2O → CHEBI:86473
- From analyzed sections (~10% of file): 21 instances of MnCl2 x 4 H2O mapped to ingredient:2003
- Affected media: 21 different growth media
- Single ingredient code: All instances map to the same ingredient:2003
- Concentration range: From 2.99103e-05 g/L to 0.00248509 g/L
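The summary figures can be re-derived from the 21 listed instances. The following is a minimal Python sketch with the values transcribed by hand from the list above; the names `hits` and `summarize` are illustrative and not part of any existing tooling.

```python
# (file line number, medium ID, concentration in g/L), transcribed from the
# MnCl2 x 4 H2O -> ingredient:2003 instances listed in this report.
hits = [
    (521, "1590", 0.00181),
    (573, "846", 9.97009e-05),
    (1064, "486", 0.00248509),
    (1091, "663", 0.000496524),
    (1124, "265", 0.0018),
    (1148, "40", 0.000104),
    (1167, "559", 9.96016e-05),
    (2005, "873", 9.98004e-05),
    (2085, "1299", 9.98004e-05),
    (2120, "992", 0.00036),
    (2134, "144b", 0.001),
    (2169, "1647", 0.0001),
    (3060, "1101", 2.99103e-05),
    (3149, "1539", 0.00098912),
    (3191, "457d", 3e-05),
    (4013, "1639", 0.0001),
    (4055, "1002", 3e-05),
    (4080, "778a", 9.97009e-05),
    (4105, "1160", 0.00036),
    (4113, "856", 0.001),
    (4196, "1487", 0.00098912),
]


def summarize(rows):
    """Return instance count, distinct media count, and concentration range."""
    concentrations = [conc for _, _, conc in rows]
    return {
        "instances": len(rows),
        "distinct_media": len({medium for _, medium, _ in rows}),
        "min_g_per_l": min(concentrations),
        "max_g_per_l": max(concentrations),
    }


summary = summarize(hits)
print(summary)
```

Keeping the medium IDs as strings preserves suffixed identifiers such as 144b and 457d.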
- Create CHEBI mapping: MnCl2 x 4 H2O (Manganese(II) chloride tetrahydrate) should be mapped to a proper CHEBI ID instead of ingredient:2003
- Verify completeness: Based on this sample, there are likely more instances throughout the full file (estimated 100+ total occurrences; if the rate of 21 hits in roughly 10% of the file holds throughout, the total would be closer to 200)
- Check related compounds: Review other manganese compounds and hydrated metal chlorides for similar mapping issues
- Quality control: Implement validation to ensure hydrated compounds get proper chemical ontology mappings rather than generic ingredient codes
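One way the completeness check and the validation step could be approached is a scan that flags every hydrate whose identifier is not a CHEBI ID. The sketch below is written under stated assumptions: the column names `original` and `mapped` are hypothetical placeholders for whatever the real header of composition_kg_mapping_with_oak_chebi.tsv uses, and the regular expression only covers the "x N H2O" notation seen in this report.

```python
import csv
import io
import re

# Matches the hydrate notation used in this report, e.g. "x 4 H2O" or "x H2O".
HYDRATE_RE = re.compile(r"\bx\s*\d*\s*H2O\b")


def find_unmapped_hydrates(tsv_lines, name_col="original", id_col="mapped"):
    """Yield (line_number, name, identifier) for hydrates not mapped to CHEBI.

    name_col and id_col are assumed column names; adjust them to the real header.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    for row in reader:
        name = row.get(name_col) or ""
        identifier = row.get(id_col) or ""
        if HYDRATE_RE.search(name) and not identifier.startswith("CHEBI:"):
            # reader.line_num counts physical lines read, header included.
            yield reader.line_num, name, identifier


# Small in-memory sample built from mappings quoted in this report.
sample = io.StringIO(
    "original\tmapped\n"
    "MnCl2 x 4 H2O\tingredient:2003\n"
    "MgCl2 x 6 H2O\tCHEBI:6636\n"
    "MnCl2 x 2 H2O\tCAS-RN:20603-88-7\n"
)
flagged = list(find_unmapped_hydrates(sample))
for line_number, name, identifier in flagged:
    print(f"line {line_number}: {name} -> {identifier}")
```

To run this against the real file, pass a handle opened with `open(path, newline="", encoding="utf-8")` in place of `sample`. The check treats any non-CHEBI identifier as a finding, so it reports CAS-RN mappings such as the MnCl2 x 2 H2O case alongside ingredient codes.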
This mapping issue affects the semantic consistency of the knowledge graph, as MnCl2 x 4 H2O should be linked to chemical ontology terms rather than generic ingredient identifiers for proper chemical reasoning and integration with other chemical databases.