
CT cache submission_lookup issue - submission values are not unique #1170

@ASL-rmarshall

Description


Describe the bug
At the moment, a submission_lookup dictionary is created within each cached CT file. This dictionary uses the CT submission value as a key, with values containing the codelist code and the term code (the term code is "N/A" for codelists), and is (intended to be) populated with all codelists and all terms.

However, submission values are not unique:

  • A codelist may have the same submission value as a term (e.g., "TTYPE" is the submission value for both codelist "C66739" and term "C49660").
  • Two (or more) terms may have the same submission value (e.g., "0" is the submission value for 53 terms in the SDTM CT 2024-09-27).

[Note that submission values for codelists are unique: no two codelists within the same version of CT for the same standard will have the same submission value]
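To see which submission values collide in a given CT package, one quick check is to group every codelist and term by submission value and keep only the groups with more than one entry. The sketch below assumes the CT has been flattened into hypothetical `(codelist_code, term_code, submission_value)` rows, with "N/A" as the term code for codelist rows. Only C66739 and C49660 come from this issue; "CL_A", "CL_B", "T1" and "T2" are made-up codes for illustration.

```python
from collections import defaultdict

# Hypothetical flattened CT rows: (codelist_code, term_code, submission_value).
# term_code is "N/A" for a codelist row, mirroring the cache convention.
# "CL_A", "CL_B", "T1" and "T2" are invented codes used only for illustration.
ROWS = [
    ("C66739", "N/A", "TTYPE"),   # codelist
    ("CL_A", "C49660", "TTYPE"),  # term sharing the codelist's value
    ("CL_A", "T1", "0"),          # two terms with the same value
    ("CL_B", "T2", "0"),
    ("CL_B", "N/A", "NY"),        # codelist with a unique value
]


def find_duplicate_submission_values(rows):
    """Group (codelist_code, term_code) pairs by submission value and
    return only the values that occur more than once."""
    groups = defaultdict(list)
    for codelist_code, term_code, value in rows:
        groups[value].append((codelist_code, term_code))
    return {value: pairs for value, pairs in groups.items() if len(pairs) > 1}


dupes = find_duplicate_submission_values(ROWS)
# {'TTYPE': [('C66739', 'N/A'), ('CL_A', 'C49660')],
#  '0': [('CL_A', 'T1'), ('CL_B', 'T2')]}
```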

It appears that submission_lookup ends up holding only the last occurrence of each submission value, regardless of whether that occurrence is a codelist or a term. This means that:

  • There may be no entry in submission_lookup for any codelist whose submission value matches the submission value of a (later) term.
  • There will be only one entry in submission_lookup for any term submission value that is used for multiple terms.
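A minimal reproduction of the overwrite, assuming the lookup is built by a plain loop that assigns one dictionary per submission value as each codelist and term is visited. The flattened rows and the key names "codelist" and "term" are assumptions for illustration, not necessarily what the cache uses.

```python
# Hypothetical flattened CT rows: (codelist_code, term_code, submission_value).
# "CL_A", "CL_B", "T1" and "T2" are invented codes used only for illustration.
ROWS = [
    ("C66739", "N/A", "TTYPE"),   # codelist
    ("CL_A", "C49660", "TTYPE"),  # later term with the same value
    ("CL_A", "T1", "0"),
    ("CL_B", "T2", "0"),
    ("CL_B", "N/A", "NY"),
]


def build_submission_lookup_current(rows):
    """Sketch of the reported behaviour: one entry per key, last write wins."""
    lookup = {}
    for codelist_code, term_code, value in rows:
        # Assigning (rather than appending) silently replaces any earlier entry.
        lookup[value] = {"codelist": codelist_code, "term": term_code}
    return lookup


lookup = build_submission_lookup_current(ROWS)
print(lookup["TTYPE"])  # {'codelist': 'CL_A', 'term': 'C49660'}
print(lookup["0"])      # {'codelist': 'CL_B', 'term': 'T2'}
```

Here the entry for codelist C66739 ("TTYPE") is replaced by the later term C49660, and only one of the two "0" terms survives.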

Expected behavior

I think the submission_lookup dictionary needs to be updated so that each value is a list of the codelist code and term code combinations for all occurrences of the submission value key. Any functionality that references submission_lookup should then be updated to parse the value list, either to find codelist entries or to process each of the term entries.
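One possible shape for this change, sketched under the same assumptions as above (flattened rows, hypothetical key names "codelist" and "term", hypothetical helper names): each key maps to a list of entries, and callers either pick the single codelist entry or iterate over the term entries.

```python
from collections import defaultdict

NA = "N/A"  # term code used for codelist entries

# Hypothetical flattened CT rows: (codelist_code, term_code, submission_value).
# "CL_A", "CL_B", "T1" and "T2" are invented codes used only for illustration.
ROWS = [
    ("C66739", NA, "TTYPE"),
    ("CL_A", "C49660", "TTYPE"),
    ("CL_A", "T1", "0"),
    ("CL_B", "T2", "0"),
    ("CL_B", NA, "NY"),
]


def build_submission_lookup(rows):
    """Map each submission value to a list of all its codelist/term entries."""
    lookup = defaultdict(list)
    for codelist_code, term_code, value in rows:
        lookup[value].append({"codelist": codelist_code, "term": term_code})
    # Return a plain dict so a lookup on a missing key cannot insert an empty list.
    return dict(lookup)


def find_codelist(lookup, value):
    """Return the codelist code for a submission value, or None.
    Codelist submission values are unique, so at most one entry matches."""
    for entry in lookup.get(value, []):
        if entry["term"] == NA:
            return entry["codelist"]
    return None


def find_terms(lookup, value, codelist_code=None):
    """Return all term entries for a value, optionally limited to one codelist."""
    return [
        entry
        for entry in lookup.get(value, [])
        if entry["term"] != NA
        and (codelist_code is None or entry["codelist"] == codelist_code)
    ]


lookup = build_submission_lookup(ROWS)
print(find_codelist(lookup, "TTYPE"))   # C66739
print(len(find_terms(lookup, "0")))     # 2
print(find_terms(lookup, "0", "CL_B"))  # [{'codelist': 'CL_B', 'term': 'T2'}]
```

The optional `codelist_code` filter reflects that a term submission value is only meaningful within its parent codelist, so callers that already know the codelist can narrow the list to the entry they need.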
