RF "files.py" validation/metadata-loading to support BIDS #1044

@yarikoptic

#1011 is adding basic metadata extraction for BIDS datasets. What #1011 introduces is more of a "hack" than a proper addition of support for BIDS datasets. It is ad hoc in part due to the clear separation of "asset types" in https://github.com/dandi/dandi-cli/blob/HEAD/dandi/files.py, e.g. NWBAsset (with custom metadata extraction and validation) vs. VideoAsset (nothing special ATM) vs. GenericAsset (really nothing special ;) ). With the introduction of support for BIDS datasets it gets tricky:

  • we need to upload pretty much every file (not just .nwb) if the directory is found to be a BIDS dataset
    • we decide that it is a BIDS dataset if there is a dataset_description.json with BIDSVersion in it
  • we might have "super-BIDS datasets" like https://dandiarchive.org/dandiset/000026/draft/files?location= where we have the following hierarchy within a dandiset:
derivatives/<some subdatasets some of which are BIDS>/
rawdata/ - BIDS dataset

so a dandiset can contain multiple BIDS (sub)datasets
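
To make the detection rule concrete, here is a minimal sketch of finding every BIDS (sub)dataset root inside a dandiset directory, using only the rule stated above (a dataset_description.json that contains BIDSVersion). The function names `is_bids_dataset` and `find_bids_roots` are hypothetical and are not part of dandi-cli; this is an illustration, not a proposed implementation.

```python
import json
from pathlib import Path
from typing import List


def is_bids_dataset(path: Path) -> bool:
    """Return True if `path` holds a dataset_description.json with a BIDSVersion key."""
    desc = path / "dataset_description.json"
    if not desc.is_file():
        return False
    try:
        data = json.loads(desc.read_text())
    except (OSError, ValueError):
        # Unreadable or malformed JSON: treat the directory as not BIDS.
        return False
    return isinstance(data, dict) and "BIDSVersion" in data


def find_bids_roots(dandiset_root: Path) -> List[Path]:
    """Collect all directories under (and including) `dandiset_root` that are BIDS datasets."""
    return sorted(
        desc.parent
        for desc in dandiset_root.rglob("dataset_description.json")
        if is_bids_dataset(desc.parent)
    )
```

For a layout like the one above, `find_bids_roots` would return the rawdata/ directory plus whichever derivatives/ subdirectories carry a BIDSVersion, and each asset could then be associated with its nearest enclosing root.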

  • There are multiple "files" from which metadata could be loaded. Below I outline 3 possible sources, but most likely we would offload to two: 1 - file format specific (nwb and its .overwrite.json) + 2 - BIDS specific (using a BIDS library), with BIDS overriding what the prior one provided. But here are the details:
    • metadata-precedence-1: metadata will/can come from the filename in addition to being extracted from the data file. .nwb files are legit within BIDS datasets, so NWBAsset by itself does not describe the entirety of the case. And for an NWBAsset belonging to a BIDS dataset we would want filename-based metadata to override what is in the file.
    • metadata-precedence-2: NWB folks are working on introducing overlays support (WiP, not yet finalized). So for sub-1_slice-1.nwb it would likely come from sub-1_slice-1.overwrite.json (if present).
    • metadata-precedence-3: metadata can come from a BIDS sidecar file, e.g. for sub-1_slice-1.nwb it could come from sub-1_slice-1.json
  • note: it will be up to the validator to complain whenever there is incongruence between different sources of metadata
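
A rough sketch of how such layering could look: sources are applied in order, later ones override earlier ones, and every disagreement is recorded so a validator can report it. Everything here is an assumption for illustration only: `parse_bids_entities` and `merge_metadata` are hypothetical names, the filename parsing is a simplification of BIDS `key-value` entities (it is not a BIDS library), and the order of sources is left to the caller since the exact precedence is not settled above.

```python
from typing import Any, Dict, List, Tuple


def parse_bids_entities(filename: str) -> Dict[str, str]:
    """Simplified parse of key-value entities, e.g. 'sub-1_slice-1.nwb' -> sub=1, slice=1."""
    stem = filename.split(".")[0]  # drop all extensions (.nwb, .nii.gz, ...)
    entities: Dict[str, str] = {}
    for part in stem.split("_"):
        key, sep, value = part.partition("-")
        if sep and key and value:  # skip parts without '-', such as a trailing suffix
            entities[key] = value
    return entities


def merge_metadata(
    sources: List[Tuple[str, Dict[str, Any]]]
) -> Tuple[Dict[str, Any], List[str]]:
    """Merge (source_name, metadata) pairs; later sources win, conflicts are collected."""
    merged: Dict[str, Any] = {}
    conflicts: List[str] = []
    for name, metadata in sources:
        for key, value in metadata.items():
            if key in merged and merged[key] != value:
                # Record the incongruence instead of failing; a validator can report it.
                conflicts.append(
                    f"{key}: {value!r} from {name} overrides {merged[key]!r}"
                )
            merged[key] = value
    return merged, conflicts
```

With this shape, a file-format-specific extractor and a BIDS-specific one would each only need to produce a plain dict, and the asset class would call something like `merge_metadata([("nwb", ...), ("bids", ...)])` to get both the final record and the list of conflicts to hand to validation.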

@jwodder -- how should files.py and anything else needed be refactored so that we support such multiple sources of metadata: file format based + BIDS?
