Renders large parts of the analysis I have done on usage of out-of-date codelists unreliable.
Provides inaccurate information on appropriate codelist use to researchers.
In [47]: original_hierarchy_data
Out[47]:
{'root': '138875005',
'nodes': ['73211009',
'362965005',
'170745003',
'164971000119101',
'64572001',
'404684003',
'170763003',
'308537004',
'441742003',
'24481000000101',
'75934005',
'243860001',
'44054006',
'126877002',
'362969004',
'307824009',
'444110003',
'127325009',
'243851009',
'138875005',
'20957000',
'243815002'],
'child_map': {'307824009': ['243815002'],
'362965005': ['362969004'],
'404684003': ['307824009', '127325009', '64572001'],
'441742003': ['170763003'],
'126877002': ['73211009'],
'170763003': ['444110003'],
'64572001': ['362965005', '75934005'],
'138875005': ['404684003'],
'243851009': ['243860001'],
'170745003': ['24481000000101'],
'243815002': ['308537004'],
'362969004': ['73211009'],
'20957000': ['126877002'],
'308537004': ['243851009'],
'127325009': ['441742003'],
'444110003': ['164971000119101'],
'44054006': ['444110003'],
'75934005': ['20957000'],
'73211009': ['44054006'],
'243860001': ['170745003']},
'parent_map': {'243815002': ['307824009'],
'362969004': ['362965005'],
'127325009': ['404684003'],
'170763003': ['441742003'],
'73211009': ['362969004', '126877002'],
'444110003': ['44054006', '170763003'],
'75934005': ['64572001'],
'307824009': ['404684003'],
'404684003': ['138875005'],
'243860001': ['243851009'],
'24481000000101': ['170745003'],
'308537004': ['243815002'],
'64572001': ['404684003'],
'126877002': ['20957000'],
'243851009': ['308537004'],
'441742003': ['127325009'],
'164971000119101': ['444110003'],
'20957000': ['75934005'],
'44054006': ['73211009'],
'362965005': ['64572001'],
'170745003': ['243860001']}}
In [48]: new_hierarchy_data
Out[48]:
{'root': '138875005',
'nodes': ['64572001',
'308537004',
'307824009',
'75934005',
'170763003',
'20957000',
'24481000000101',
'126877002',
'444110003',
'362965005',
'362969004',
'73211009',
'138875005',
'170745003',
'243815002',
'44054006',
'243860001',
'164971000119101',
'243851009',
'127325009',
'404684003',
'441742003'],
'child_map': {'444110003': ['164971000119101'],
'126877002': ['73211009'],
'404684003': ['307824009', '127325009', '64572001'],
'307824009': ['243815002'],
'64572001': ['75934005', '362965005'],
'127325009': ['441742003'],
'243851009': ['243860001'],
'138875005': ['404684003'],
'308537004': ['243851009'],
'170763003': ['444110003'],
'75934005': ['20957000'],
'362965005': ['362969004'],
'362969004': ['73211009'],
'441742003': ['170763003'],
'73211009': ['44054006'],
'20957000': ['126877002'],
'243815002': ['308537004'],
'44054006': ['444110003'],
'170745003': ['24481000000101'],
'243860001': ['170745003']},
'parent_map': {'164971000119101': ['444110003'],
'73211009': ['362969004', '126877002'],
'127325009': ['404684003'],
'243815002': ['307824009'],
'75934005': ['64572001'],
'441742003': ['127325009'],
'243860001': ['243851009'],
'404684003': ['138875005'],
'243851009': ['308537004'],
'307824009': ['404684003'],
'444110003': ['44054006', '170763003'],
'20957000': ['75934005'],
'64572001': ['404684003'],
'362969004': ['362965005'],
'170763003': ['441742003'],
'44054006': ['73211009'],
'126877002': ['20957000'],
'308537004': ['243815002'],
'362965005': ['64572001'],
'24481000000101': ['170745003'],
'170745003': ['243860001']}}
In [50]: original_hierarchy_data["nodes"] == new_hierarchy_data["nodes"]
Out[50]: False
In [51]: set(original_hierarchy_data["nodes"]) == set(new_hierarchy_data["nodes"])
Out[51]: True
In [55]: set(original_hierarchy_data["child_map"].keys()).difference(set(new_hierarchy_data["child_map"].keys()))
Out[55]: set()
In [56]: set(original_hierarchy_data["parent_map"].keys()).difference(set(new_hierarchy_data["parent_map"].keys()))
Out[56]: set()
In [62]: [(k,v) for k,v in original_hierarchy_data["child_map"].items() if v != new_hierarchy_data["child_map"][k]]
Out[62]: [('64572001', ['362965005', '75934005'])]
In [64]: new_hierarchy_data["child_map"]["64572001"]
Out[64]: ['75934005', '362965005']
Equally, the keys for the parent and child maps are the same, and where there are differing values, the codes within those values are the same just in a different order.
Summary
update_codelist_version_compatibilityruns when a new release of a coding system is imported.This updates the
compatible_versionsproperty of relevantCodelistVersions.As part of its compatibility checking, it compares the
Hierarchyof a CodelistVersion within the previousCodingSystemReleaseto the incoming release using_check_version_by_hierarchy.This function compares parts of the hierarchies in an order-sensitive manner when an order-insensitive approach is more appropriate, leading to false assertions of incompatibility of a
CodelistVersionwith aCodingSystemRelease.There are now several codelist versions presented on OpenCodelists with inaccurate coding system release compatibility information, which misleads researchers as to their suitability for use.
What did you do and what did you expect to happen?
Impact of the bug
Renders large parts of the analysis I have done on usage of out-of-date codelists unreliable.
Provides inaccurate information on appropriate codelist use to researchers.
Worked Example
Get the most recent SNOMED CT release, and the previous one
Ascertain the versions to have their compatibility checked as per
update_codelist_version_compatibility, isolate above-linked version of interest.Call the version search and hierarchy compatibility inner functions
Manually compare state of original and new hierarchies as generated within
_check_version_by_hierarchyWe can see that the
nodesare the same between these hierarchies, just ordered differently. Since they are compared aslists rather thansetsthe comparison is order-sensitive.Equally, the keys for the parent and child maps are the same, and where there are differing values, the codes within those values are the same just in a different order.