
Add datasets to test scaling for ephys, ophys, and icephys #159

@oruebel

Description


Status

  • Share local files
    • Files available at: /pscratch/sd/o/oruebel/benchmark_hdf5_scaling
  • Create the files locally:
    • ecephys HDF5 files
    • ecephys Zarr files
    • ophys HDF5 files
    • ophys Zarr files
    • icephys HDF5 files
    • icephys Zarr files
  • Upload files to DANDI
    • ecephys HDF5 files
    • ecephys Zarr files
    • ophys HDF5 files
    • ophys Zarr files
    • icephys HDF5 files
    • icephys Zarr files
  • Update benchmark configurations

Data Description

ephys_scaling

The original file is at full scale; we then reduce the number of timesteps in the main "acquisition/ElectricalSeries.data" dataset to test scaling with an increasing number of chunks. All files use the same chunking with chunk_shape=(262144, 32).
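
As a rough illustration of the truncation step, here is a minimal sketch using h5py. Only the dataset path, chunk shape, and file names come from this issue; the slab-copy loop and compression handling are assumptions, and a real run (see scaling_data_ecephys_h5.py below) also has to copy the rest of the NWB file structure, not just this one dataset.

```python
import h5py

# Illustrative sketch only; the actual script may differ.
SRC = "sub-npI3_ses-20190421_behavior+ecephys_rechunk.nwb"
DST = "sub-npI3_ses-20190421_behavior+ecephys_rechunk_numchunks_334_12.nwb"
DATASET = "acquisition/ElectricalSeries/data"  # HDF5 path of the dataset above
CHUNK_SHAPE = (262144, 32)
N_TIME_CHUNKS = 334  # number of chunks to keep along the time axis

with h5py.File(SRC, "r") as src, h5py.File(DST, "w") as dst:
    data = src[DATASET]
    n_rows = N_TIME_CHUNKS * CHUNK_SHAPE[0]  # 334 * 262144 = 87,556,096 timesteps
    out = dst.create_dataset(
        DATASET,
        shape=(n_rows, data.shape[1]),
        dtype=data.dtype,
        chunks=CHUNK_SHAPE,  # identical chunking across all scaled files
        compression=data.compression,
    )
    # Copy chunk-aligned slabs so memory use stays bounded.
    for start in range(0, n_rows, CHUNK_SHAPE[0]):
        out[start:start + CHUNK_SHAPE[0], :] = data[start:start + CHUNK_SHAPE[0], :]
```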

Script files:

  • scaling_data_ecephys_h5.py : Python script to create a single ecephys file from the source file
  • scaling_data_ephys.sh : Slurm job for constructing the 3 additional HDF5 files
  • convert_ecephys_h5_to_zarr.py : Python script to convert the HDF5 files to Zarr (see the sketch after this list)
  • convert_ecephys_h5_to_zarr.sh : Slurm job to convert the HDF5 files to Zarr
  • slurm-43385949.out : Output from running the job that generates the HDF5 files
  • slurm-43515411.out : Output from running the Zarr conversion job
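
The HDF5-to-Zarr step presumably follows the standard export recipe from the hdmf-zarr documentation; a minimal sketch (the file name is illustrative, the API calls are the documented ones):

```python
from pynwb import NWBHDF5IO
from hdmf_zarr.nwb import NWBZarrIO

src = "sub-npI3_ses-20190421_behavior+ecephys_rechunk_numchunks_334_12.nwb"
dst = src.replace(".nwb", ".zarr")

with NWBHDF5IO(src, mode="r", load_namespaces=True) as read_io:
    with NWBZarrIO(dst, mode="w") as export_io:
        # link_data=False copies the data into the Zarr store instead of
        # linking back to the source HDF5 file.
        export_io.export(src_io=read_io, write_args=dict(link_data=False))
```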

New files:

  • sub-npI3_ses-20190421_behavior+ecephys_rechunk_numchunks_334_12.nwb
    • Shape: (87556096, 384)
    • #chunk: (334, 12)
    • filesize: 40GB
  • sub-npI3_ses-20190421_behavior+ecephys_rechunk_numchunks_668_12.nwb
    • Shape: (175112192, 384)
    • #chunk: (668, 12)
    • filesize: 70GB
  • sub-npI3_ses-20190421_behavior+ecephys_rechunk_numchunks_1002_12.nwb
    • Shape: (262668288, 384)
    • #chunk: (1002, 12)
    • filesize: 100GB

Original file:

  • sub-npI3_ses-20190421_behavior+ecephys_rechunk.nwb
    • Shape: (349975807, 384)
    • #chunk: (1336, 12)
    • filesize: 130GB

ophys_scaling

The original file is at full scale; we then reduce the number of timesteps in the main "acquisition/TwoPhotonSeries.data" dataset to test scaling with an increasing number of chunks. All files use the same chunking with chunk_shape=(20, 796, 512).
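
The chunk counts reported below follow directly from ceiling division of each dataset shape by the chunk shape; a quick sanity-check sketch (shapes taken from the file lists below):

```python
import math

chunk_shape = (20, 796, 512)
# Shapes of the three reduced files and the original file.
shapes = [(10420, 796, 512), (20840, 796, 512), (31260, 796, 512), (41673, 796, 512)]

for shape in shapes:
    print(shape, "->", tuple(math.ceil(s / c) for s, c in zip(shape, chunk_shape)))
# (10420, 796, 512) -> (521, 1, 1)
# (20840, 796, 512) -> (1042, 1, 1)
# (31260, 796, 512) -> (1563, 1, 1)
# (41673, 796, 512) -> (2084, 1, 1)
```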

Script files:

  • scaling_data_ophys_h5.py : Python script to create a single ophys file from the source file
  • scaling_data_ophys.sh : Slurm job for constructing the 3 additional HDF5 files
  • slurm-43385953.out : Output from running the job that generates the HDF5 files

New files:

  • sub-R6_ses-20200206T210000_behavior+ophys_rechunk_numchunks_521_1_1.nwb
    • Shape: (10420, 796, 512)
    • #chunk: (521, 1, 1)
    • filesize: 9.4GB
  • sub-R6_ses-20200206T210000_behavior+ophys_rechunk_numchunks_1042_1_1.nwb
    • Shape: (20840, 796, 512)
    • #chunk: (1042, 1, 1)
    • filesize: 18GB
  • sub-R6_ses-20200206T210000_behavior+ophys_rechunk_numchunks_1563_1_1.nwb
    • Shape: (31260, 796, 512)
    • #chunk: (1563, 1, 1)
    • filesize: 25GB

Original file:

  • sub-R6_ses-20200206T210000_behavior+ophys_rechunk.nwb
    • Shape: (41673, 796, 512)
    • #chunk: (2084, 1, 1)
    • filesize: 33GB

icephys_scaling

The original file is not used here; instead, we create 4 new files. The design increases the number of Stimulus/Response TimeSeries pairs. The original test file uses the deprecated SweepTable, while the new files use the IntracellularRecordingsTable instead. A key difference is that the IntracellularRecordingsTable stores references to the TimeSeries, so if references are not read lazily, that could also affect scaling here.
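
A minimal sketch of how such a pair-scaled file can be built with the pynwb icephys API; the names, data lengths, and acquisition parameters are illustrative assumptions, and the actual generation logic lives in scaling_data_icephys.py below:

```python
from datetime import datetime, timezone
import numpy as np
from pynwb import NWBFile, NWBHDF5IO
from pynwb.icephys import CurrentClampStimulusSeries, CurrentClampSeries

N_PAIRS = 50  # 50, 100, 150, or 200

nwbfile = NWBFile(
    session_description="icephys scaling test",
    identifier=f"icephys_scaling_{N_PAIRS}_pairs",
    session_start_time=datetime.now(timezone.utc),
)
device = nwbfile.create_device(name="amplifier")
electrode = nwbfile.create_icephys_electrode(
    name="elec0", description="test electrode", device=device
)

for i in range(N_PAIRS):
    stimulus = CurrentClampStimulusSeries(
        name=f"stimulus_{i:03d}", data=np.zeros(1000),
        starting_time=0.0, rate=10000.0, electrode=electrode, gain=1.0,
    )
    response = CurrentClampSeries(
        name=f"response_{i:03d}", data=np.zeros(1000),
        starting_time=0.0, rate=10000.0, electrode=electrode, gain=1.0,
    )
    # Each row of the IntracellularRecordingsTable stores object references
    # to the stimulus/response TimeSeries (the scaling concern noted above);
    # pynwb also adds the series to the file if they are not in it yet.
    nwbfile.add_intracellular_recording(
        electrode=electrode, stimulus=stimulus, response=response
    )

with NWBHDF5IO(f"icephys_scaling_{N_PAIRS}_pairs.nwb", "w") as io:
    io.write(nwbfile)
```

The .zarr variants can then be produced with the same hdmf-zarr export recipe sketched in the ephys_scaling section above.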

Script files:

  • scaling_data_icephys.py : Python script to create a single icephys scaling file

New files:

  • icephys_scaling_50_pairs.nwb and .zarr
    • Number of Groups and Datasets (not including Attributes): 662
    • #TimeSeries: 50 * 2 = 100
  • icephys_scaling_100_pairs.nwb and .zarr
    • Number of Groups and Datasets (not including Attributes): 1262
    • #TimeSeries: 100 * 2 = 200
  • icephys_scaling_150_pairs.nwb and .zarr
    • Number of Groups and Datasets (not including Attributes): 1862
    • #TimeSeries: 150 * 2 = 300
  • icephys_scaling_200_pairs.nwb and .zarr
    • Number of Groups and Datasets (not including Attributes): 2462
    • #TimeSeries: 200 * 2 = 400

Current icephys test file (not used here):

  • Filename: sub-1214579789_ses-1214621812_icephys.nwb
  • Number of Groups and Datasets (not including Attributes): 2462
  • #Stimulus/Response TimeSeries: 139 * 2 = 278
