feat: implement DFExtensionType for remaining canonical Arrow extension types #21458

Open
EeshanBembi wants to merge 2 commits into apache:main from EeshanBembi:main

@EeshanBembi
Contributor

Which issue does this PR close?

Closes #21144

Rationale for this change

PR #20312 added the extension type registry framework with UUID pretty-printing support. This PR implements DFExtensionType for the remaining six Arrow canonical extension types so they are recognized and correctly formatted by MemoryExtensionTypeRegistry::new_with_canonical_extension_types().

What changes are included in this PR?

Added DFExtensionType implementations for:

  • Bool8 (arrow.bool8): displays Int8 values as true/false instead of raw integers (zero → false, non-zero → true)
  • Json (arrow.json): registered with default string formatter; values are already valid UTF-8 JSON
  • Opaque (arrow.opaque): registered with default formatter; storage type is Null per the spec recommendation
  • FixedShapeTensor (arrow.fixed_shape_tensor): registered with default FixedSizeList formatter; storage_type() computed from value_type and list_size
  • VariableShapeTensor (arrow.variable_shape_tensor): registered with default Struct formatter; storage_type() computed from value_type and dimensions
  • TimestampWithOffset (arrow.timestamp_with_offset): registered with default Struct formatter

All six types are wired into MemoryExtensionTypeRegistry::new_with_canonical_extension_types().
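
The two computed behaviours in the list above can be sketched in plain Rust. This is a minimal illustration, not the code from this PR: `bool8_display` and `fixed_shape_tensor_list_size` are hypothetical names, and the list size is assumed to be the product of the tensor's shape dimensions, as the Arrow canonical extension specification describes for `arrow.fixed_shape_tensor`.

```rust
/// Hypothetical helper: map one Bool8 storage value (an Int8) to its
/// display string. Zero is false; any non-zero value is true.
fn bool8_display(value: i8) -> &'static str {
    if value == 0 {
        "false"
    } else {
        "true"
    }
}

/// Hypothetical helper: number of elements in the FixedSizeList storage
/// of a fixed-shape tensor, assumed to be the product of its shape.
/// Returns None if the product overflows usize.
fn fixed_shape_tensor_list_size(shape: &[usize]) -> Option<usize> {
    shape
        .iter()
        .try_fold(1usize, |acc, &dim| acc.checked_mul(dim))
}
```

Under these assumptions, a `[2, 3, 4]` tensor would be stored as a FixedSizeList of 24 values, and a Bool8 value of `-1` would display as `true`.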

New files:

  • datafusion/common/src/types/canonical_extensions/bool8.rs
  • datafusion/common/src/types/canonical_extensions/json.rs
  • datafusion/common/src/types/canonical_extensions/opaque.rs
  • datafusion/common/src/types/canonical_extensions/fixed_shape_tensor.rs
  • datafusion/common/src/types/canonical_extensions/variable_shape_tensor.rs
  • datafusion/common/src/types/canonical_extensions/timestamp_with_offset.rs

Are there any user-facing changes?

MemoryExtensionTypeRegistry::new_with_canonical_extension_types() now registers all 7 Arrow canonical extension types (previously only UUID). Sessions using this registry will now pretty-print Bool8 columns as true/false; the other five newly added types are recognized and displayed with the default formatter of their storage type.
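
As a rough summary of the resulting behaviour, the toy model below maps each canonical extension name to the kind of formatting described in this PR. It is a sketch only: `Formatting` and `toy_canonical_formatting` are hypothetical names and do not reflect the actual `MemoryExtensionTypeRegistry` API. The UUID entry is marked custom on the basis of the pretty-printing support attributed to PR #20312.

```rust
use std::collections::HashMap;

/// Hypothetical marker for how a type's values are rendered.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Formatting {
    /// A type-specific pretty-printer (UUID, Bool8).
    Custom,
    /// The default formatter of the underlying storage type.
    Default,
}

/// Toy stand-in for the registry contents after this PR, keyed by the
/// Arrow extension name. Not the DataFusion API.
fn toy_canonical_formatting() -> HashMap<&'static str, Formatting> {
    HashMap::from([
        ("arrow.uuid", Formatting::Custom),
        ("arrow.bool8", Formatting::Custom),
        ("arrow.json", Formatting::Default),
        ("arrow.opaque", Formatting::Default),
        ("arrow.fixed_shape_tensor", Formatting::Default),
        ("arrow.variable_shape_tensor", Formatting::Default),
        ("arrow.timestamp_with_offset", Formatting::Default),
    ])
}
```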

ebembi-crdb added 2 commits April 7, 2026 18:33
…on types

Closes apache#21144

Implements DFExtensionType for all remaining canonical Arrow extension
types so they are recognized and pretty-printed by the extension type
registry:

- Bool8: displays Int8 values as 'true'/'false' instead of raw integers
- Json: uses default string formatter (values are already valid JSON)
- Opaque: uses default formatter
- FixedShapeTensor: uses default formatter, storage_type computed from
  value_type and list_size
- VariableShapeTensor: uses default formatter, storage_type computed
  from value_type and dimensions
- TimestampWithOffset: uses default formatter

All six types are registered in
MemoryExtensionTypeRegistry::new_with_canonical_extension_types()
alongside the existing UUID registration.
@github-actions github-actions bot added the logical-expr (Logical plan and expressions), core (Core DataFusion crate), and common (Related to common crate) labels Apr 8, 2026
@tobixdev
Contributor

tobixdev commented Apr 8, 2026

Cool to see others working in this area!

Did you see the other PR on that topic? #21291

There, we had a small discussion about whether we should implement the trait on the arrow-rs types.

Labels

common Related to common crate core Core DataFusion crate logical-expr Logical plan and expressions

Development

Successfully merging this pull request may close these issues.

Implement DFExtensionType for Arrow's Canonical Extension Types

2 participants