[Feat] Update Arrow layers to support both RecordBatch and Table input #145

@kylebarron

Target Use Case

Simplify the implementation of Arrow layers by requiring RecordBatch input, not Table input.

Why:

  • This maps to the existing data structures supported by the deck.gl binary attributes API.

  • It's more predictable for the end user: a single Arrow layer will always create exactly one underlying deck.gl layer.

  • It removes the need for internal rechunking code.

    Multiple Arrow Vectors that have the same overall length can have different chunking structures. For example, even though column A and column B both have length 30, column A could have two chunks (Data in Arrow JS terminology) of 15 rows each, while column B could have three chunks of 10 rows each. If deck.gl's Arrow support allowed Vector input, deck.gl would have to rechunk the data across input objects itself.

    In Lonboard, I don't currently hit this issue because I pre-process the data in Python, but for JS-facing APIs, I think it would significantly simplify the deck.gl implementation to accept only RecordBatch input, which enforces contiguous buffers. This pushes the responsibility for rechunking onto the user, if necessary. There are multiple options for rechunking Arrow data, including pure-JS and Wasm-compiled ones, and the end user can choose the best option for their use case.
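To make the rechunking problem concrete, here is a minimal sketch in plain TypeScript. It does not use Arrow JS or deck.gl; `alignedChunkLengths` is a hypothetical helper that only works on chunk lengths, and it computes the common chunk boundaries that two differently chunked columns would have to be split on before they could be rendered together.

```typescript
// Hypothetical helper for illustration only; not part of Arrow JS or deck.gl.
// Given the chunk lengths of two columns with the same total length, return
// the chunk lengths both columns would need to be re-split into so that
// their chunk boundaries line up.
function alignedChunkLengths(a: number[], b: number[]): number[] {
  const total = (xs: number[]): number => xs.reduce((sum, x) => sum + x, 0);
  if (total(a) !== total(b)) {
    throw new Error("columns must have the same overall length");
  }

  // Convert chunk lengths to cumulative end offsets, e.g. [15, 15] -> [15, 30].
  const endOffsets = (xs: number[]): number[] => {
    let acc = 0;
    return xs.map((x) => (acc += x));
  };

  // Merge both sets of end offsets in ascending order.
  const boundaries = endOffsets(a)
    .concat(endOffsets(b))
    .sort((x, y) => x - y);

  // Walk the merged boundaries, skipping duplicates and empty chunks.
  const lengths: number[] = [];
  let previous = 0;
  for (const end of boundaries) {
    if (end > previous) {
      lengths.push(end - previous);
      previous = end;
    }
  }
  return lengths;
}

// The example from the text: both columns have 30 rows.
const columnA = [15, 15]; // two chunks of 15 rows
const columnB = [10, 10, 10]; // three chunks of 10 rows
const aligned = alignedChunkLengths(columnA, columnB);
console.log(aligned); // [10, 5, 5, 10]
```

Even in this small case the two columns end up split into four pieces. Requiring RecordBatch input means deck.gl never has to do this bookkeeping, because every column in a batch already shares the same row range.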

Proposal

Right now the Arrow layers accept a Table for the main data prop and Arrow Vector objects for any accessors. This proposal would change these layers to accept a RecordBatch for the main data prop and contiguous Arrow arrays (called Data in the Arrow JS implementation) for any accessors.
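As a rough sketch of what this could mean for a user who still holds a multi-batch Table, the snippet below produces one layer description per batch. Everything here is an assumption for illustration: `BatchLike`, `LayerSpec`, and `layerSpecsForBatches` are hypothetical stand-ins, not deck.gl or Arrow JS types, and the real layer props may look different.

```typescript
// Hypothetical structural stand-in for an Arrow RecordBatch.
interface BatchLike {
  numRows: number;
}

// Hypothetical stand-in for the props of one Arrow layer.
interface LayerSpec {
  id: string;
  data: BatchLike;
}

// Build one layer spec per batch so that each layer receives a single
// contiguous batch. The id suffix keeps layer ids unique across batches.
function layerSpecsForBatches(baseId: string, batches: BatchLike[]): LayerSpec[] {
  return batches.map((batch, index) => ({
    id: `${baseId}-${index}`,
    data: batch,
  }));
}

// Usage: a table-like input with two batches yields two layer specs.
const specs = layerSpecsForBatches("points", [{ numRows: 15 }, { numRows: 15 }]);
console.log(specs.map((spec) => spec.id)); // ["points-0", "points-1"]
```

With this shape, the mapping from input to rendered layers stays explicit in user code: one batch in, one underlying deck.gl layer out.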

Details

No response
