Closed
#1011 adds basic metadata extraction for BIDS datasets. What #1011 introduces is more of a "hack" than proper support for BIDS datasets. It is ad hoc in part because of the clear separation of "asset types" in https://github.com/dandi/dandi-cli/blob/HEAD/dandi/files.py, e.g. NWBAsset (with custom metadata extraction and validation) vs VideoAsset (nothing special ATM) vs GenericAsset (really nothing special ;) ). With the introduction of support for BIDS datasets it gets tricky:
- we need to upload pretty much every file (not just .nwb) if it is found to be a BIDS dataset
- we decide that it is a BIDS dataset if there is a `dataset_description.json` with `BIDSVersion` in it
- we might have "super-BIDS datasets" like https://dandiarchive.org/dandiset/000026/draft/files?location= where we have the following hierarchy within a dandiset
    derivatives/<some subdatasets, some of which are BIDS>/
    rawdata/ - BIDS dataset
  so a dandiset can contain multiple BIDS (sub)datasets
- There are multiple "files" from which metadata could be loaded. Below I outline 3 possible ways, but most likely we would fold them into two: 1 - file format specific (nwb and its .overwrite.json) + 2 - BIDS specific (using a BIDS library), with BIDS overloading what the prior one provided. But here are the details:
  - metadata-precedence-1: metadata will/can come from the filename in addition to being extracted from the data file. .nwb files are legit within BIDS datasets, so NWBAsset by itself does not describe the entirety of the case. And for an NWBAsset belonging to a BIDS dataset we would want filename-based metadata to overload what is in the file.
  - metadata-precedence-2: NWB folks are working on introducing overlays support (WiP, not yet finalized). So for `sub-1_slice-1.nwb` it would likely come from `sub-1_slice-1.overwrite.json` (if present).
  - metadata-precedence-3: metadata can come from a BIDS sidecar file, e.g. for `sub-1_slice-1.nwb` it could come from `sub-1_slice-1.json`
  - note: it will be up to the `validator` to complain whenever there is incongruence between different sources of metadata
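To make the detection rule and the "multiple BIDS (sub)datasets" case from the list above concrete, here is a minimal sketch using only the standard library. It is not what #1011 does: `is_bids_root`, `find_bids_roots` and `bids_root_for` are hypothetical names, and the "innermost root wins" rule for nested datasets is an assumption.

```python
from __future__ import annotations

import json
from pathlib import Path


def is_bids_root(directory: Path) -> bool:
    """True if `directory` has a dataset_description.json containing BIDSVersion."""
    desc = directory / "dataset_description.json"
    if not desc.is_file():
        return False
    try:
        with desc.open(encoding="utf-8") as f:
            content = json.load(f)
    except (OSError, ValueError):
        # Unreadable or malformed JSON: treat as "not a BIDS dataset" here.
        return False
    return isinstance(content, dict) and "BIDSVersion" in content


def find_bids_roots(dandiset_root: Path) -> list[Path]:
    """Collect every BIDS dataset root at or below `dandiset_root`."""
    return sorted(
        desc.parent
        for desc in dandiset_root.rglob("dataset_description.json")
        if is_bids_root(desc.parent)
    )


def bids_root_for(path: Path, roots: list[Path]) -> Path | None:
    """Return the innermost BIDS root containing `path` (assumption), or None."""
    candidates = [root for root in roots if root in path.parents]
    return max(candidates, key=lambda root: len(root.parts), default=None)
```

With something like this, "upload pretty much every file" could be decided per file: anything for which `bids_root_for` returns a root would be treated as part of a BIDS dataset, regardless of its extension.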
@jwodder -- how should files.py and anything else needed be refactored so that we support such multiple sources of metadata: file format based + BIDS?
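As a starting point for discussion, the layering could look roughly like the sketch below. It is independent of the actual classes in files.py: `merge_metadata` is a hypothetical helper, the flat dicts with shared keys are an assumption (real sources would first need mapping onto a common schema), and the order of the layers is just the "BIDS overloads file format" idea from above.

```python
from __future__ import annotations

from typing import Any


def merge_metadata(
    layers: list[tuple[str, dict[str, Any]]],
) -> tuple[dict[str, Any], list[str]]:
    """Merge metadata layers in order; later layers override earlier ones.

    Returns the merged metadata plus a list of human-readable conflicts,
    which a validator could report instead of silently accepting them.
    """
    merged: dict[str, Any] = {}
    origin: dict[str, str] = {}  # which layer supplied the current value
    conflicts: list[str] = []
    for name, metadata in layers:
        for key, value in metadata.items():
            if key in merged and merged[key] != value:
                conflicts.append(
                    f"{key}: {origin[key]} has {merged[key]!r}, "
                    f"{name} has {value!r}"
                )
            merged[key] = value
            origin[key] = name
    return merged, conflicts


# Hypothetical values for sub-1_slice-1.nwb, lowest precedence first.
layers = [
    ("nwb file", {"subject_id": "1", "species": "mouse"}),
    ("overwrite.json", {"species": "Mus musculus"}),
    ("bids", {"subject_id": "01"}),
]
merged, conflicts = merge_metadata(layers)
```

Here `merged` ends up with the BIDS `subject_id` and the overlay's `species`, and `conflicts` holds one entry per overridden value, so the override policy and the validation complaint stay separate concerns.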