Status

- Share local files: available at /pscratch/sd/o/oruebel/benchmark_hdf5_scaling
- Create the files locally:
  - ecephys HDF5 files
  - ecephys Zarr files
  - ophys HDF5 files
  - ophys Zarr files
  - icephys HDF5 files
  - icephys Zarr files
- Upload files to DANDI:
  - ecephys HDF5 files
  - ecephys Zarr files
  - ophys HDF5 files
  - ophys Zarr files
  - icephys HDF5 files
  - icephys Zarr files
- Update benchmark configurations
Data Description
ephys_scaling
The original file is full scale; we then reduce the number of timesteps in the main "acquisition/ElectricalSeries.data" dataset to test scaling with an increasing number of chunks. All files use the same chunking, with chunk_shape=(262144, 32).
Script files:
- scaling_data_ecephys_h5.py: Python script to compute a single ecephys file from the source file
- scaling_data_ephys.sh: Slurm job for constructing the 3 additional HDF5 files
- convert_ecephys_h5_to_zarr.py: Python script to convert the HDF5 files to Zarr
- convert_ecephys_h5_to_zarr.sh: Slurm job to convert the HDF5 files to Zarr
- slurm-43385949.out: Output from running the script to generate the files
- slurm-43515411.out: Output from running the conversion-to-Zarr job
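The contents of the Slurm jobs are not reproduced here; a minimal sketch of what a job like convert_ecephys_h5_to_zarr.sh could look like follows. The walltime, node constraint, and resource values are placeholders, not the settings actually used:

```shell
#!/bin/bash
#SBATCH --job-name=ecephys-h5-to-zarr
#SBATCH --nodes=1
#SBATCH --time=04:00:00          # placeholder walltime, not the real setting
#SBATCH --constraint=cpu         # assumption: NERSC Perlmutter CPU nodes
#SBATCH --output=slurm-%j.out    # produces logs named like slurm-43515411.out

# Run the conversion script for the rechunked ecephys HDF5 files
python convert_ecephys_h5_to_zarr.py
```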
New files:
sub-npI3_ses-20190421_behavior+ecephys_rechunk_numchunks_334_12.nwb
- Shape: (87556096, 384)
- #chunks: (334, 12)
- filesize: 40GB

sub-npI3_ses-20190421_behavior+ecephys_rechunk_numchunks_668_12.nwb
- Shape: (175112192, 384)
- #chunks: (668, 12)
- filesize: 70GB

sub-npI3_ses-20190421_behavior+ecephys_rechunk_numchunks_1002_12.nwb
- Shape: (262668288, 384)
- #chunks: (1002, 12)
- filesize: 100GB

Original file:

sub-npI3_ses-20190421_behavior+ecephys_rechunk.nwb
- Shape: (349975807, 384)
- #chunks: (1336, 12)
- filesize: 130GB
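The #chunks values above follow from the dataset shape and the fixed chunk_shape by ceiling division along each dimension; a quick pure-Python sketch to check the arithmetic (no HDF5 dependency):

```python
import math

def chunk_grid(shape, chunk_shape):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunk_shape))

CHUNK = (262144, 32)

# Reduced files: timestep counts are exact multiples of the chunk length
print(chunk_grid((87556096, 384), CHUNK))   # -> (334, 12)
print(chunk_grid((175112192, 384), CHUNK))  # -> (668, 12)
print(chunk_grid((262668288, 384), CHUNK))  # -> (1002, 12)

# Original file: the last chunk along time is partial (349975807 is not a
# multiple of 262144), so the count rounds up
print(chunk_grid((349975807, 384), CHUNK))  # -> (1336, 12)
```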
ophys_scaling
The original file is full scale; we then reduce the number of timesteps in the main "acquisition/TwoPhotonSeries.data" dataset to test scaling with an increasing number of chunks. All files use the same chunking, with chunk_shape=(20, 796, 512).
Script files:
- scaling_data_ophys_h5.py: Python script to compute a single ophys file from the source file
- scaling_data_ophys.sh: Slurm job for constructing the 3 additional HDF5 files
- slurm-43385953.out: Output from running the script to generate the files
New files:
sub-R6_ses-20200206T210000_behavior+ophys_rechunk_numchunks_521_1_1.nwb
- Shape: (10420, 796, 512)
- #chunks: (521, 1, 1)
- filesize: 9.4GB

sub-R6_ses-20200206T210000_behavior+ophys_rechunk_numchunks_1042_1_1.nwb
- Shape: (20840, 796, 512)
- #chunks: (1042, 1, 1)
- filesize: 18GB

sub-R6_ses-20200206T210000_behavior+ophys_rechunk_numchunks_1563_1_1.nwb
- Shape: (31260, 796, 512)
- #chunks: (1563, 1, 1)
- filesize: 25GB

Original file:

sub-R6_ses-20200206T210000_behavior+ophys_rechunk.nwb
- Shape: (41673, 796, 512)
- #chunks: (2084, 1, 1)
- filesize: 33GB
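Here each chunk spans a full (796, 512) frame, so only the time axis has more than one chunk. The reduced timestep counts above are whole multiples of the chunk length 20, while the original file's time axis ends in a partial chunk; a small sketch to verify:

```python
import math

CHUNK = (20, 796, 512)
FULL_SHAPE = (41673, 796, 512)

def chunk_grid(shape, chunk_shape):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunk_shape))

# Original file: 41673 = 2083 * 20 + 13, so the last time chunk is partial
print(chunk_grid(FULL_SHAPE, CHUNK))  # -> (2084, 1, 1)

# Reduced files keep whole chunks: timesteps = n_chunks * 20
for n_chunks in (521, 1042, 1563):
    print((n_chunks * CHUNK[0],) + FULL_SHAPE[1:])
# -> (10420, 796, 512), (20840, 796, 512), (31260, 796, 512)
```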
icephys_scaling
The original file is not used here; instead we create 4 new files. The design increases the number of Stimulus/Response TimeSeries pairs. The original test file uses the deprecated SweepTable, while the new files use the IntracellularRecordingsTable instead. A key difference is that the IntracellularRecordingsTable stores references to the TimeSeries, so if references are not read lazily, that could also affect scaling here.
Script files:
- scaling_data_icephys.py: Python script to create a single icephys file
New files:
icephys_scaling_50_pairs.nwb and .zarr
- Number of Groups and Datasets (not including Attributes): 662
- #TimeSeries: 50 * 2 = 100

icephys_scaling_100_pairs.nwb and .zarr
- Number of Groups and Datasets (not including Attributes): 1262
- #TimeSeries: 100 * 2 = 200

icephys_scaling_150_pairs.nwb and .zarr
- Number of Groups and Datasets (not including Attributes): 1862
- #TimeSeries: 150 * 2 = 300

icephys_scaling_200_pairs.nwb and .zarr
- Number of Groups and Datasets (not including Attributes): 2462
- #TimeSeries: 200 * 2 = 400
Current icephys test file (not used here):
- Filename: sub-1214579789_ses-1214621812_icephys.nwb
- Number of Groups and Datasets (not including Attributes): 2462
- #Stimulus/Response TimeSeries: 139 * 2 = 278