Target Use Case
Simplify the implementation of Arrow layers by requiring `RecordBatch` input, not `Table` input.
Why:
- This maps to the existing data structures supported by the deck.gl binary attributes API.
- It's more reliable for the end user: they know that a single Arrow layer will always create exactly one underlying deck.gl layer.
- It removes the need for internal rechunking code.
Multiple Arrow `Vector`s that have the same overall length can have different chunking structures. For example, despite columns A and B both having length 30, column A could have two chunks (`Data` in Arrow JS terminology) of 15 rows each, while column B could have three chunks of 10 rows each. If deck.gl's Arrow support allowed `Vector` input, deck.gl would have to manage rechunking the data across input objects.
In Lonboard, I don't currently hit this issue because I pre-process the data in Python. For JS-facing APIs, though, I think accepting only `RecordBatch` input, which enforces contiguous buffers, would significantly simplify the deck.gl implementation. This pushes the responsibility of rechunking onto the user, where necessary. There are multiple options for rechunking Arrow data, including pure-JS and Wasm-compiled implementations, and the end user can choose the best one for their use case.
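To illustrate what rechunking involves, here is a minimal sketch of two columns with the same total length but different chunk boundaries, and the copy step that produces one contiguous buffer per column. The `Chunked` type and `rechunk` function are illustrative names, not Arrow JS or deck.gl API; real Arrow columns also carry validity bitmaps and non-primitive layouts that a real rechunker would have to handle.

```typescript
// A chunked column modeled as an array of typed-array chunks
// (each chunk plays the role of a "Data" object in Arrow JS terms).
type Chunked = Float64Array[];

// Column A: two chunks of 15 rows; column B: three chunks of 10 rows.
// Both have overall length 30 but incompatible chunk boundaries.
const columnA: Chunked = [new Float64Array(15), new Float64Array(15)];
const columnB: Chunked = [
  new Float64Array(10),
  new Float64Array(10),
  new Float64Array(10),
];

// Copy a column's chunks into one contiguous buffer. Requiring
// RecordBatch input pushes this step onto the user instead of deck.gl.
function rechunk(chunks: Chunked): Float64Array {
  const total = chunks.reduce((n, chunk) => n + chunk.length, 0);
  const out = new Float64Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

const contiguousA = rechunk(columnA); // one 30-row buffer
const contiguousB = rechunk(columnB); // one 30-row buffer
```

The copy is cheap for primitive columns, but doing it inside deck.gl for every combination of differently-chunked accessors is exactly the complexity this proposal avoids.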
Proposal
Right now the Arrow layers accept a `Table` for the main `data` prop and Arrow `Vector` objects for any accessors. This would change these layers to accept a `RecordBatch` for the main `data` prop and Arrow contiguous arrays (called `Data` in the Arrow JS implementation) for any accessors.
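A sketch of the proposed contract, using hypothetical stand-in types rather than the real Arrow JS or deck.gl classes: the layer consumes one batch-shaped object whose columns are already contiguous, so one input batch maps to exactly one underlying layer and no rechunking happens inside the library. `BatchLike`, `LayerDesc`, and `makeLayer` are all illustrative names.

```typescript
// Stand-in for a RecordBatch: a row count plus contiguous column buffers
// (each buffer playing the role of an Arrow JS "Data" object).
interface BatchLike {
  numRows: number;
  columns: Record<string, Float64Array>;
}

// Stand-in for the description of a single underlying deck.gl layer.
interface LayerDesc {
  id: string;
  length: number;
  attributes: Record<string, Float64Array>;
}

// One batch in, one layer description out. Because every column is a
// single contiguous buffer, validation reduces to a length check.
function makeLayer(id: string, batch: BatchLike): LayerDesc {
  for (const [name, column] of Object.entries(batch.columns)) {
    if (column.length !== batch.numRows) {
      throw new Error(`column ${name} length does not match numRows`);
    }
  }
  return { id, length: batch.numRows, attributes: batch.columns };
}

const batch: BatchLike = {
  numRows: 3,
  columns: { x: new Float64Array([0, 1, 2]) },
};
const layer = makeLayer("scatter-0", batch);
```

A user holding a multi-batch table would loop over its batches and call the layer constructor once per batch, making the one-batch-to-one-layer relationship explicit in their own code.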
Details
No response