When you concat datasets repeatedly, each concat wraps the previous result, so you end up with a deeply nested structure.
Something like this:
concat_dataset/
├─ concat_dataset/
│ ├─ concat_dataset/
│ │ ├─ concat_dataset/
│ │ │ ├─ dataset_0
│ │ │ ├─ dataset_1
│ │ ├─ dataset_2
│ ├─ dataset_3
├─ dataset_4
This is inefficient. Instead, we could flatten nested concat datasets into:
concat_dataset/
├─ dataset_0
├─ dataset_1
├─ dataset_2
├─ dataset_3
├─ dataset_4
I don't have numbers on the actual performance impact, but every nesting level adds another layer of indirection on each access, so I expect it to become significant if a user is doing many splits and concats.
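
Here is a rough sketch of how the flattening could work, to make the idea concrete. The `ConcatDataset` class below is a hypothetical stand-in written just for this illustration, not the actual class from any library. It assumes each child dataset only needs `__len__` and `__getitem__`. The constructor splices in the children of any nested concat dataset instead of wrapping it:

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatDataset:
    """Hypothetical concat dataset for illustration; not a real library class."""

    def __init__(self, datasets):
        flat = []
        for ds in datasets:
            if isinstance(ds, ConcatDataset):
                # A ConcatDataset built by this constructor is already flat,
                # so splicing in its children one level deep is enough.
                flat.extend(ds.datasets)
            else:
                flat.append(ds)
        self.datasets = flat
        # Running totals of child lengths, used to map a global index to a child.
        self.cumulative_sizes = list(accumulate(len(ds) for ds in flat))

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Find the first child whose cumulative size exceeds idx.
        child = bisect_right(self.cumulative_sizes, idx)
        offset = self.cumulative_sizes[child - 1] if child > 0 else 0
        return self.datasets[child][idx - offset]


# Plain lists stand in for dataset_0 .. dataset_4.
parts = [[0, 1], [2, 3], [4], [5, 6], [7]]

# Build the same shape as the nested tree above, one concat at a time.
nested = ConcatDataset(parts[:2])
for part in parts[2:]:
    nested = ConcatDataset([nested, part])

flat_items = [nested[i] for i in range(len(nested))]
```

With this approach, `nested.datasets` holds the five leaf datasets directly, so an index lookup is a single binary search rather than one lookup per nesting level. Whether flattening belongs in the constructor or in a separate helper is a design choice I have not settled on.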