Adding input study descriptions as of 2025-12-9. These have 1 row per…#39
… StudyDescription with modalities per StudyDescription listed.
The second commit adds the platform list per row.
| @@ -0,0 +1,5135 @@ | |||
| StudyDescription Modality project_id platform frequency | |||
| XR Chest AP or PA {'CR', 'PR', 'DX', 'XR', 'CT', 'MR', 'CR,DX', nan} {'NIHCC-CXR8', 'TCIA-covid-19-ar', 'IDC-IDC_acrin_nsclc_fdg_pet', 'TCIA-cmb-lca', 'MIDRC-Open-R1', 'MIDRC-Open-A1_PETAL_BLUECORAL', 'IDC-IDC_cmb_crc', 'MIDRC-Open-A1_PETAL_REDCORAL', 'MIDRC-TCIA-COVID-19-AR', 'MIDRC-Open-A1_SCCM_VIRUS', 'TCIA-cmb-mml', 'TCIA-midrc-ricord-1c', 'MIDRC-TCIA-RICORD_1c', 'TCIA-cmb-crc', 'TCIA-cptac-ccrcc', 'IDC-IDC_midrc_ricord_1c', 'IDC-IDC_cmb_mml', 'IDC-IDC_cmb_lca', 'TCIA-pseudo-phi-dicom-data', 'IDC-IDC_cptac_ccrcc', 'MIDRC-Open-A1', 'TCIA-acrin-nsclc-fdg-pet', 'AIMI-CheXpertPlus', 'IDC-IDC_pseudo_phi_dicom_data', 'IDC-IDC_covid_19_ar'} {'IDC', 'TCIA', 'AIMI', 'NIHCC', 'MIDRC'} 428198 | |||
@cgmeyer do we need project_id (or collection_id, in the nomenclature used by IDC and TCIA)? I don't recall discussing this, but perhaps I forgot?
| CHEST AP PORT {'CR', 'DX', nan} {'MIDRC-TCIA-COVID-19-NY-SBU', 'IDC-IDC_covid_19_ny_sbu', 'TCIA-covid-19-ny-sbu'} {'IDC', 'TCIA', 'MIDRC'} 12732 | ||
| CHEST PORT 1 VIEW (RAD)-CS {'CR', 'CR,DX', 'DX'} {'MIDRC-Open-A1'} {'MIDRC'} 7341 | ||
| MAMMO screening digital bilateral {'MG', nan} {'TCIA-breast-cancer-screening-dbt', 'IDC-IDC_breast_cancer_screening_dbt'} {'IDC', 'TCIA'} 6752 | ||
| CHEST AP VIEWONLY {'CR', 'DX', nan} {'MIDRC-TCIA-COVID-19-NY-SBU', 'IDC-IDC_covid_19_ny_sbu', 'TCIA-covid-19-ny-sbu'} {'IDC', 'TCIA', 'MIDRC'} 5297 |
What is the meaning of nan in Modality?
Some repositories do not specify a Modality for studies. I did not assign one, so they get "nan", which means there is no value.
| @@ -0,0 +1,5135 @@ | |||
| StudyDescription Modality project_id platform frequency | |||
| XR Chest AP or PA {'CR', 'PR', 'DX', 'XR', 'CT', 'MR', 'CR,DX', nan} {'NIHCC-CXR8', 'TCIA-covid-19-ar', 'IDC-IDC_acrin_nsclc_fdg_pet', 'TCIA-cmb-lca', 'MIDRC-Open-R1', 'MIDRC-Open-A1_PETAL_BLUECORAL', 'IDC-IDC_cmb_crc', 'MIDRC-Open-A1_PETAL_REDCORAL', 'MIDRC-TCIA-COVID-19-AR', 'MIDRC-Open-A1_SCCM_VIRUS', 'TCIA-cmb-mml', 'TCIA-midrc-ricord-1c', 'MIDRC-TCIA-RICORD_1c', 'TCIA-cmb-crc', 'TCIA-cptac-ccrcc', 'IDC-IDC_midrc_ricord_1c', 'IDC-IDC_cmb_mml', 'IDC-IDC_cmb_lca', 'TCIA-pseudo-phi-dicom-data', 'IDC-IDC_cptac_ccrcc', 'MIDRC-Open-A1', 'TCIA-acrin-nsclc-fdg-pet', 'AIMI-CheXpertPlus', 'IDC-IDC_pseudo_phi_dicom_data', 'IDC-IDC_covid_19_ar'} {'IDC', 'TCIA', 'AIMI', 'NIHCC', 'MIDRC'} 428198 | |||
You use a different set of conventions from the ones we have currently. When there are multiple modalities, they should appear without quotes, separated by commas. I would expect the same convention to be used for the platform. Is there a good reason to revisit the existing convention?
I've updated the file
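For reference, the convention requested in this thread (multiple values without quotes, separated by commas) could be produced from set-style cells with something like the sketch below. This is only an illustration, not the code used to update the file: `is_missing` and `join_values` are hypothetical helper names, and dropping missing values, splitting combined entries such as `'CR,DX'`, and sorting the result are all assumptions.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN (an absent value)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def join_values(values):
    """Render a collection of labels as 'A,B,C': no quotes, no braces."""
    labels = set()
    for value in values:
        if is_missing(value):
            # Assumption: missing values are dropped rather than written as 'nan'.
            continue
        # Assumption: a combined entry such as 'CR,DX' is split into its parts.
        for part in str(value).split(","):
            part = part.strip()
            if part:
                labels.add(part)
    # Sorting keeps the output stable from run to run.
    return ",".join(sorted(labels))


modalities = {"CR", "DX", "CR,DX", float("nan")}
platforms = {"IDC", "TCIA", "MIDRC"}

print(join_values(modalities))  # CR,DX
print(join_values(platforms))   # IDC,MIDRC,TCIA
```

Applying the same helper to both the Modality and platform columns would keep the two on one convention.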
Per the recent request by TDP3b, this input dataframe has 1 row per unique StudyDescription, with modalities per StudyDescription listed. Also listed per StudyDescription are project_id and platform for reference.
This differs from previous input file(s), which had 1 row per unique combination of StudyDescription / Modality.
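The change in layout described above can be sketched as a group-by on StudyDescription. The example below is a minimal illustration under stated assumptions, not the script that generated the file: `collapse` is a hypothetical function, the counts in `old_rows` are toy numbers, a missing Modality is represented as `None` and skipped, and frequency is assumed to be the sum over the collapsed rows.

```python
from collections import defaultdict

# Toy rows in the earlier layout: one row per StudyDescription / Modality pair.
# The frequency values are made up for illustration only.
old_rows = [
    {"StudyDescription": "CHEST AP PORT", "Modality": "CR",
     "project_id": "MIDRC-TCIA-COVID-19-NY-SBU", "platform": "MIDRC", "frequency": 2},
    {"StudyDescription": "CHEST AP PORT", "Modality": "DX",
     "project_id": "TCIA-covid-19-ny-sbu", "platform": "TCIA", "frequency": 1},
    {"StudyDescription": "CHEST AP PORT", "Modality": None,
     "project_id": "IDC-IDC_covid_19_ny_sbu", "platform": "IDC", "frequency": 1},
    {"StudyDescription": "CHEST PORT 1 VIEW (RAD)-CS", "Modality": "CR",
     "project_id": "MIDRC-Open-A1", "platform": "MIDRC", "frequency": 5},
]


def new_entry():
    """Empty accumulator for one StudyDescription."""
    return {"Modality": set(), "project_id": set(), "platform": set(), "frequency": 0}


def collapse(rows):
    """Collapse to one entry per unique StudyDescription."""
    grouped = defaultdict(new_entry)
    for row in rows:
        entry = grouped[row["StudyDescription"]]
        for column in ("Modality", "project_id", "platform"):
            # Assumption: missing values (None here) are skipped.
            if row[column] is not None:
                entry[column].add(row[column])
        # Assumption: frequency is the sum over the collapsed rows.
        entry["frequency"] += row["frequency"]
    return dict(grouped)


result = collapse(old_rows)
for description, entry in result.items():
    print(description, sorted(entry["Modality"]), sorted(entry["platform"]), entry["frequency"])
```

With pandas, the same idea would typically be written with `groupby("StudyDescription")` followed by an aggregation per column.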