
Add datasets to test scaling for ephys, ophys, and icephys #159

@oruebel

Description


Status

  • Share local files
    • Files available at: /pscratch/sd/o/oruebel/benchmark_hdf5_scaling
  • Create the files locally:
    • ecephys HDF5 files
    • ecephys Zarr files
    • ophys HDF5 files
    • ophys Zarr files
    • icephys HDF5 files
    • icephys Zarr files
  • Upload files to DANDI
    • ecephys HDF5 files
    • ecephys Zarr files
    • ophys HDF5 files
    • ophys Zarr files
    • icephys HDF5 files
    • icephys Zarr files
  • Update benchmark configurations

Data Description

ephys_scaling

The original file is at full scale; we then reduce the number of timesteps in the main "acquisition/ElectricalSeries.data" dataset to test scaling with an increasing number of chunks. All files use the same chunking with chunk_shape=(262144, 32).
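
As a rough illustration of the truncation step, here is a minimal sketch using h5py. Only the dataset path, chunk shape, and file names come from this issue; the slab-copy loop and compression handling are assumptions, and a real run (see scaling_data_ecephys_h5.py below) also has to copy the rest of the NWB file structure, not just this one dataset.

```python
import h5py

# Illustrative sketch only; the actual script may differ.
SRC = "sub-npI3_ses-20190421_behavior+ecephys_rechunk.nwb"
DST = "sub-npI3_ses-20190421_behavior+ecephys_rechunk_numchunks_334_12.nwb"
DATASET = "acquisition/ElectricalSeries/data"  # HDF5 path of the dataset above
CHUNK_SHAPE = (262144, 32)
N_TIME_CHUNKS = 334  # number of chunks to keep along the time axis

with h5py.File(SRC, "r") as src, h5py.File(DST, "w") as dst:
    data = src[DATASET]
    n_rows = N_TIME_CHUNKS * CHUNK_SHAPE[0]  # 334 * 262144 = 87,556,096 timesteps
    out = dst.create_dataset(
        DATASET,
        shape=(n_rows, data.shape[1]),
        dtype=data.dtype,
        chunks=CHUNK_SHAPE,  # identical chunking across all scaled files
        compression=data.compression,
    )
    # Copy chunk-aligned slabs so memory use stays bounded.
    for start in range(0, n_rows, CHUNK_SHAPE[0]):
        out[start:start + CHUNK_SHAPE[0], :] = data[start:start + CHUNK_SHAPE[0], :]
```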

Script files:

  • scaling_data_ecephys_h5.py : Python script to create a single ecephys file from the source file
  • scaling_data_ephys.sh : Slurm job for constructing the 3 additional HDF5 files
  • convert_ecephys_h5_to_zarr.py : Python script to convert the HDF5 files to Zarr (see the sketch after this list)
  • convert_ecephys_h5_to_zarr.sh : Slurm job to convert the HDF5 files to Zarr
  • slurm-43385949.out : Output from running the job that generates the HDF5 files
  • slurm-43515411.out : Output from running the Zarr conversion job
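
The HDF5-to-Zarr step presumably follows the standard export recipe from the hdmf-zarr documentation; a minimal sketch (the file name is illustrative, the API calls are the documented ones):

```python
from pynwb import NWBHDF5IO
from hdmf_zarr.nwb import NWBZarrIO

src = "sub-npI3_ses-20190421_behavior+ecephys_rechunk_numchunks_334_12.nwb"
dst = src.replace(".nwb", ".zarr")

with NWBHDF5IO(src, mode="r", load_namespaces=True) as read_io:
    with NWBZarrIO(dst, mode="w") as export_io:
        # link_data=False copies the data into the Zarr store instead of
        # linking back to the source HDF5 file.
        export_io.export(src_io=read_io, write_args=dict(link_data=False))
```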

New files:

  • sub-npI3_ses-20190421_behavior+ecephys_rechunk_numchunks_334_12.nwb
    • Shape: (87556096, 384)
    • #chunk: (334, 12)
    • filesize: 40GB
  • sub-npI3_ses-20190421_behavior+ecephys_rechunk_numchunks_668_12.nwb
    • Shape: (175112192, 384)
    • #chunk: (668, 12)
    • filesize: 70GB
  • sub-npI3_ses-20190421_behavior+ecephys_rechunk_numchunks_1002_12.nwb
    • Shape: (262668288, 384)
    • #chunk: (1002, 12)
    • filesize: 100GB

Original file:

  • sub-npI3_ses-20190421_behavior+ecephys_rechunk.nwb
    • Shape: (349975807, 384)
    • #chunk: (1336, 12)
    • filesize: 130GB

ophys_scaling

The original file is at full scale; we then reduce the number of timesteps in the main "acquisition/TwoPhotonSeries.data" dataset to test scaling with an increasing number of chunks. All files use the same chunking with chunk_shape=(20, 796, 512).
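
The chunk counts reported below follow directly from ceiling division of each dataset shape by the chunk shape; a quick sanity-check sketch (shapes taken from the file lists below):

```python
import math

chunk_shape = (20, 796, 512)
# Shapes of the three reduced files and the original file.
shapes = [(10420, 796, 512), (20840, 796, 512), (31260, 796, 512), (41673, 796, 512)]

for shape in shapes:
    print(shape, "->", tuple(math.ceil(s / c) for s, c in zip(shape, chunk_shape)))
# (10420, 796, 512) -> (521, 1, 1)
# (20840, 796, 512) -> (1042, 1, 1)
# (31260, 796, 512) -> (1563, 1, 1)
# (41673, 796, 512) -> (2084, 1, 1)
```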

Script files:

  • scaling_data_ophys_h5.py : Python script to create a single ophys file from the source file
  • scaling_data_ophys.sh : Slurm job for constructing the 3 additional HDF5 files
  • slurm-43385953.out : Output from running the job that generates the HDF5 files

New files:

  • sub-R6_ses-20200206T210000_behavior+ophys_rechunk_numchunks_521_1_1.nwb
    • Shape: (10420, 796, 512)
    • #chunk: (521, 1, 1)
    • filesize: 9.4GB
  • sub-R6_ses-20200206T210000_behavior+ophys_rechunk_numchunks_1042_1_1.nwb
    • Shape: (20840, 796, 512)
    • #chunk: (1042, 1, 1)
    • filesize: 18GB
  • sub-R6_ses-20200206T210000_behavior+ophys_rechunk_numchunks_1563_1_1.nwb
    • Shape: (31260, 796, 512)
    • #chunk: (1563, 1, 1)
    • filesize: 25GB

Original file:

  • sub-R6_ses-20200206T210000_behavior+ophys_rechunk.nwb
    • Shape: (41673, 796, 512)
    • #chunk: (2084, 1, 1)
    • filesize: 33GB

icephys_scaling

The original file is not used here; instead, we create 4 new files. The design increases the number of Stimulus/Response TimeSeries pairs. The original test file uses the deprecated SweepTable, while the new files use the IntracellularRecordingsTable instead. A key difference is that the IntracellularRecordingsTable stores references to the TimeSeries, so if references are not read lazily, that could also affect scaling here.
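
A minimal sketch of how such a pair-scaled file can be built with the pynwb icephys API; the names, data lengths, and acquisition parameters are illustrative assumptions, and the actual generation logic lives in scaling_data_icephys.py below:

```python
from datetime import datetime, timezone
import numpy as np
from pynwb import NWBFile, NWBHDF5IO
from pynwb.icephys import CurrentClampStimulusSeries, CurrentClampSeries

N_PAIRS = 50  # 50, 100, 150, or 200

nwbfile = NWBFile(
    session_description="icephys scaling test",
    identifier=f"icephys_scaling_{N_PAIRS}_pairs",
    session_start_time=datetime.now(timezone.utc),
)
device = nwbfile.create_device(name="amplifier")
electrode = nwbfile.create_icephys_electrode(
    name="elec0", description="test electrode", device=device
)

for i in range(N_PAIRS):
    stimulus = CurrentClampStimulusSeries(
        name=f"stimulus_{i:03d}", data=np.zeros(1000),
        starting_time=0.0, rate=10000.0, electrode=electrode, gain=1.0,
    )
    response = CurrentClampSeries(
        name=f"response_{i:03d}", data=np.zeros(1000),
        starting_time=0.0, rate=10000.0, electrode=electrode, gain=1.0,
    )
    # Each row of the IntracellularRecordingsTable stores object references
    # to the stimulus/response TimeSeries (the scaling concern noted above);
    # pynwb also adds the series to the file if they are not in it yet.
    nwbfile.add_intracellular_recording(
        electrode=electrode, stimulus=stimulus, response=response
    )

with NWBHDF5IO(f"icephys_scaling_{N_PAIRS}_pairs.nwb", "w") as io:
    io.write(nwbfile)
```

The .zarr variants can then be produced with the same hdmf-zarr export recipe sketched in the ephys_scaling section above.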

Script files:

  • scaling_data_icephys.py : Python script to create a single icephys scaling file

New files:

  • icephys_scaling_50_pairs.nwb and .zarr
    • Number of Groups and Datasets (not including Attributes): 662
    • #TimeSeries: 50 * 2 = 100
  • icephys_scaling_100_pairs.nwb and .zarr
    • Number of Groups and Datasets (not including Attributes): 1262
    • #TimeSeries: 100 * 2 = 200
  • icephys_scaling_150_pairs.nwb and .zarr
    • Number of Groups and Datasets (not including Attributes): 1862
    • #TimeSeries: 150 * 2 = 300
  • icephys_scaling_200_pairs.nwb and .zarr
    • Number of Groups and Datasets (not including Attributes): 2462
    • #TimeSeries: 200 * 2 = 400

Current icephys test file (not used here):

  • Filename: sub-1214579789_ses-1214621812_icephys.nwb
  • Number of Groups and Datasets (not including Attributes): 2462
  • #Stimulus/Response TimeSeries: 139 * 2 = 278
