Adding input study descriptions as of 2025-12-9. These have 1 row per… #39

Open
cgmeyer wants to merge 3 commits into main from Gen3_input_BIH_20251209

cgmeyer (Collaborator) commented Dec 9, 2025

Per TDP3b's recent request, this input dataframe has one row per unique StudyDescription and lists the modalities seen for each. The project_id and platform values for each StudyDescription are also listed for reference.
This differs from the previous input file(s), which had one row per unique StudyDescription / Modality combination.
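
To make the change in row granularity concrete, here is a minimal sketch of collapsing the previous layout (one row per StudyDescription / Modality combination) into one row per StudyDescription. It is not the code used to produce the file in this PR: the sample rows, the `DEMO-*` identifiers, the helper name `collapse_by_study_description`, and the choice to sum `frequency` across the collapsed rows are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows in the previous layout. All values are invented for
# illustration and do not come from the file in this PR.
rows = [
    {"StudyDescription": "Chest 1 view", "Modality": "CR",
     "project_id": "DEMO-proj-a", "platform": "DEMO", "frequency": 5},
    {"StudyDescription": "Chest 1 view", "Modality": "DX",
     "project_id": "DEMO-proj-b", "platform": "DEMO", "frequency": 2},
    {"StudyDescription": "Head CT", "Modality": "CT",
     "project_id": "DEMO-proj-a", "platform": "DEMO", "frequency": 1},
]


def collapse_by_study_description(rows):
    """Return one entry per StudyDescription with the distinct values seen."""
    grouped = defaultdict(lambda: {
        "Modality": set(),
        "project_id": set(),
        "platform": set(),
        "frequency": 0,
    })
    for row in rows:
        entry = grouped[row["StudyDescription"]]
        for column in ("Modality", "project_id", "platform"):
            entry[column].add(row[column])
        # Assumption: the per-description frequency is the sum of the
        # per-combination frequencies.
        entry["frequency"] += row["frequency"]
    return dict(grouped)


collapsed = collapse_by_study_description(rows)
for description, entry in collapsed.items():
    print(description, sorted(entry["Modality"]), entry["frequency"])
```

Collecting values into sets keeps each modality, project, and platform once per StudyDescription, which matches the one-row-per-description shape described above.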

Christopher Meyer added 2 commits December 9, 2025 16:47
… StudyDescription with modalities per StudyDescription listed.
… StudyDescription with modalities per StudyDescription listed.

cgmeyer (Collaborator, Author) commented Dec 9, 2025

The second commit adds the platform list to each row.

Comment thread in/BIH_StudyDescriptions_Gen3.tsv Outdated
@@ -0,0 +1,5135 @@
StudyDescription Modality project_id platform frequency
XR Chest AP or PA {'CR', 'PR', 'DX', 'XR', 'CT', 'MR', 'CR,DX', nan} {'NIHCC-CXR8', 'TCIA-covid-19-ar', 'IDC-IDC_acrin_nsclc_fdg_pet', 'TCIA-cmb-lca', 'MIDRC-Open-R1', 'MIDRC-Open-A1_PETAL_BLUECORAL', 'IDC-IDC_cmb_crc', 'MIDRC-Open-A1_PETAL_REDCORAL', 'MIDRC-TCIA-COVID-19-AR', 'MIDRC-Open-A1_SCCM_VIRUS', 'TCIA-cmb-mml', 'TCIA-midrc-ricord-1c', 'MIDRC-TCIA-RICORD_1c', 'TCIA-cmb-crc', 'TCIA-cptac-ccrcc', 'IDC-IDC_midrc_ricord_1c', 'IDC-IDC_cmb_mml', 'IDC-IDC_cmb_lca', 'TCIA-pseudo-phi-dicom-data', 'IDC-IDC_cptac_ccrcc', 'MIDRC-Open-A1', 'TCIA-acrin-nsclc-fdg-pet', 'AIMI-CheXpertPlus', 'IDC-IDC_pseudo_phi_dicom_data', 'IDC-IDC_covid_19_ar'} {'IDC', 'TCIA', 'AIMI', 'NIHCC', 'MIDRC'} 428198
Contributor

@cgmeyer do we need project_id (or collection_id, in the nomenclature used by IDC and TCIA)? I don't recall discussing this, but perhaps I forgot?

Comment thread in/BIH_StudyDescriptions_Gen3.tsv Outdated
CHEST AP PORT {'CR', 'DX', nan} {'MIDRC-TCIA-COVID-19-NY-SBU', 'IDC-IDC_covid_19_ny_sbu', 'TCIA-covid-19-ny-sbu'} {'IDC', 'TCIA', 'MIDRC'} 12732
CHEST PORT 1 VIEW (RAD)-CS {'CR', 'CR,DX', 'DX'} {'MIDRC-Open-A1'} {'MIDRC'} 7341
MAMMO screening digital bilateral {'MG', nan} {'TCIA-breast-cancer-screening-dbt', 'IDC-IDC_breast_cancer_screening_dbt'} {'IDC', 'TCIA'} 6752
CHEST AP VIEWONLY {'CR', 'DX', nan} {'MIDRC-TCIA-COVID-19-NY-SBU', 'IDC-IDC_covid_19_ny_sbu', 'TCIA-covid-19-ny-sbu'} {'IDC', 'TCIA', 'MIDRC'} 5297
Contributor

What is the meaning of nan in Modality?

Collaborator Author

Some repositories do not specify a Modality for studies. I did not assign one, so they get "nan", which means there is no value.
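
As an illustration of where the bare `nan` comes from: a floating-point NaN placed in a Python set prints as `nan` next to the quoted strings, which is the form seen in the Modality column. The sketch below shows one way to detect and drop such missing values. The helper names `is_missing` and `known_modalities` are hypothetical, and treating `None` and empty strings as missing is an assumption, not something stated in this thread.

```python
import math


def is_missing(value):
    """Return True for None, a float NaN, or a blank string."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def known_modalities(values):
    """Return the non-missing modalities in sorted order."""
    return sorted(v for v in values if not is_missing(v))


# A set mixing real modality codes with a missing value.
modalities = {"CR", "DX", float("nan")}
print(known_modalities(modalities))  # ['CR', 'DX']
```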

Comment thread in/BIH_StudyDescriptions_Gen3.tsv Outdated
@@ -0,0 +1,5135 @@
StudyDescription Modality project_id platform frequency
XR Chest AP or PA {'CR', 'PR', 'DX', 'XR', 'CT', 'MR', 'CR,DX', nan} {'NIHCC-CXR8', 'TCIA-covid-19-ar', 'IDC-IDC_acrin_nsclc_fdg_pet', 'TCIA-cmb-lca', 'MIDRC-Open-R1', 'MIDRC-Open-A1_PETAL_BLUECORAL', 'IDC-IDC_cmb_crc', 'MIDRC-Open-A1_PETAL_REDCORAL', 'MIDRC-TCIA-COVID-19-AR', 'MIDRC-Open-A1_SCCM_VIRUS', 'TCIA-cmb-mml', 'TCIA-midrc-ricord-1c', 'MIDRC-TCIA-RICORD_1c', 'TCIA-cmb-crc', 'TCIA-cptac-ccrcc', 'IDC-IDC_midrc_ricord_1c', 'IDC-IDC_cmb_mml', 'IDC-IDC_cmb_lca', 'TCIA-pseudo-phi-dicom-data', 'IDC-IDC_cptac_ccrcc', 'MIDRC-Open-A1', 'TCIA-acrin-nsclc-fdg-pet', 'AIMI-CheXpertPlus', 'IDC-IDC_pseudo_phi_dicom_data', 'IDC-IDC_covid_19_ar'} {'IDC', 'TCIA', 'AIMI', 'NIHCC', 'MIDRC'} 428198
Contributor

You use a different set of conventions from what we have currently. When there are multiple modalities, they should appear without quotes, separated by commas. I would expect the same convention to be used for the platform. Is there a good reason to revisit the existing convention?
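
A minimal sketch of the convention requested above (values without quotes, separated by commas), assuming the cell starts out as a Python set like the ones shown in the diff. The helper name `format_values` is hypothetical. Splitting compound entries such as `'CR,DX'` into separate tokens, dropping missing values, and sorting the result are assumptions made here, not decisions recorded in this thread.

```python
import math


def format_values(values):
    """Join distinct values with commas, without quotes or braces."""
    tokens = set()
    for value in values:
        # Assumption: missing values (None or NaN) are omitted from the output.
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        # Assumption: an entry such as 'CR,DX' counts as two modalities.
        for token in str(value).split(","):
            token = token.strip()
            if token:
                tokens.add(token)
    return ",".join(sorted(tokens))


print(format_values({"CR", "CR,DX", "DX", float("nan")}))  # CR,DX
print(format_values({"IDC", "TCIA", "MIDRC"}))  # IDC,MIDRC,TCIA
```

Sorting makes the output deterministic, since Python sets have no guaranteed iteration order.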

Collaborator Author

I've updated the file

… StudyDescription with modalities per StudyDescription listed.