feat: implement DFExtensionType for remaining canonical Arrow extension types #21458
Open
EeshanBembi wants to merge 2 commits into apache:main from
added 2 commits
April 7, 2026 18:33
…on types

Closes apache#21144

Implements DFExtensionType for all remaining canonical Arrow extension types so they are recognized and pretty-printed by the extension type registry:

- Bool8: displays Int8 values as 'true'/'false' instead of raw integers
- Json: uses default string formatter (values are already valid JSON)
- Opaque: uses default formatter
- FixedShapeTensor: uses default formatter, storage_type computed from value_type and list_size
- VariableShapeTensor: uses default formatter, storage_type computed from value_type and dimensions
- TimestampWithOffset: uses default formatter

All six types are registered in MemoryExtensionTypeRegistry::new_with_canonical_extension_types() alongside the existing UUID registration.
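The storage-type derivations mentioned for FixedShapeTensor and VariableShapeTensor follow the Arrow canonical extension type spec: a fixed-shape tensor is stored as FixedSizeList<value_type>[list_size], where list_size is the product of the tensor's dimensions, and a variable-shape tensor is stored as a struct with a data field (List<value_type>) and a shape field (FixedSizeList<Int32>[ndim]). The sketch below illustrates this with a hypothetical, simplified Storage enum; it is an assumption for illustration only, not arrow-rs's DataType and not the code in this PR.

```rust
// Hypothetical, simplified stand-in for Arrow storage types.
// This is NOT arrow-rs's `DataType`; it only mirrors the layouts
// described by the canonical extension type spec.
#[derive(Debug, Clone, PartialEq)]
enum Storage {
    Int32,
    Float32,
    FixedSizeList(Box<Storage>, usize),
    List(Box<Storage>),
    Struct(Vec<(&'static str, Storage)>),
}

/// arrow.fixed_shape_tensor: FixedSizeList<value_type>[list_size],
/// where list_size is the product of all dimensions in the shape.
fn fixed_shape_tensor_storage(value_type: Storage, shape: &[usize]) -> Storage {
    let list_size: usize = shape.iter().product();
    Storage::FixedSizeList(Box::new(value_type), list_size)
}

/// arrow.variable_shape_tensor: Struct { data: List<value_type>,
/// shape: FixedSizeList<Int32>[ndim] }, one tensor per row.
fn variable_shape_tensor_storage(value_type: Storage, ndim: usize) -> Storage {
    Storage::Struct(vec![
        ("data", Storage::List(Box::new(value_type))),
        ("shape", Storage::FixedSizeList(Box::new(Storage::Int32), ndim)),
    ])
}

fn main() {
    // A 2x3 Float32 tensor per row is stored as FixedSizeList<Float32>[6].
    let fixed = fixed_shape_tensor_storage(Storage::Float32, &[2, 3]);
    println!("{:?}", fixed);

    // Variable-shape 2-D Float32 tensors: data list plus a 2-element shape list.
    let var = variable_shape_tensor_storage(Storage::Float32, 2);
    println!("{:?}", var);
}
```

Because the fixed-shape layout only depends on the element type and the flattened element count, the shape itself lives in the extension metadata rather than in the storage type.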
Contributor

Cool to see others working in this area! Did you see the other PR on this topic, #21291? There, we had a small discussion about whether we should implement the trait on the arrow-rs types.
Which issue does this PR close?
Closes #21144
Rationale for this change
PR #20312 added the extension type registry framework with UUID pretty-printing support. This PR implements DFExtensionType for the remaining six Arrow canonical extension types so they are recognized and correctly formatted by MemoryExtensionTypeRegistry::new_with_canonical_extension_types().

What changes are included in this PR?
Added DFExtensionType implementations for:

- Bool8 (arrow.bool8): displays Int8 values as true/false instead of raw integers (zero → false, non-zero → true)
- Json (arrow.json): registered with default string formatter; values are already valid UTF-8 JSON
- Opaque (arrow.opaque): registered with default formatter; storage type is Null per the spec recommendation
- FixedShapeTensor (arrow.fixed_shape_tensor): registered with default FixedSizeList formatter; storage_type() computed from value_type and list_size
- VariableShapeTensor (arrow.variable_shape_tensor): registered with default Struct formatter; storage_type() computed from value_type and dimensions
- TimestampWithOffset (arrow.timestamp_with_offset): registered with default Struct formatter

All six types are wired into MemoryExtensionTypeRegistry::new_with_canonical_extension_types().

New files:
- datafusion/common/src/types/canonical_extensions/bool8.rs
- datafusion/common/src/types/canonical_extensions/json.rs
- datafusion/common/src/types/canonical_extensions/opaque.rs
- datafusion/common/src/types/canonical_extensions/fixed_shape_tensor.rs
- datafusion/common/src/types/canonical_extensions/variable_shape_tensor.rs
- datafusion/common/src/types/canonical_extensions/timestamp_with_offset.rs

Are there any user-facing changes?
MemoryExtensionTypeRegistry::new_with_canonical_extension_types() now registers all 7 Arrow canonical extension types (previously only UUID). Sessions using this registry will now pretty-print Bool8 columns as true/false and have all other canonical types properly recognized.
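A minimal std-only sketch of the Bool8 display rule, assuming each value arrives as an Option<i8> (None for null) and that nulls render as an empty string. format_bool8 is a hypothetical helper for illustration, not the DFExtensionType API added in this PR.

```rust
/// Renders one Bool8 value: the storage is Int8, zero prints as "false",
/// any non-zero value prints as "true". Nulls become an empty string
/// (an assumption; the real formatter's null handling may differ).
fn format_bool8(value: Option<i8>) -> String {
    match value {
        Some(0) => "false".to_string(),
        Some(_) => "true".to_string(),
        None => String::new(),
    }
}

fn main() {
    // An Int8 storage column, including a negative non-zero value and a null.
    let column: Vec<Option<i8>> = vec![Some(0), Some(1), Some(-5), None];
    let rendered: Vec<String> = column.into_iter().map(format_bool8).collect();
    // prints ["false", "true", "true", ""]
    println!("{:?}", rendered);
}
```

Matching on Some(0) first and then Some(_) encodes the spec's "any non-zero value is true" rule directly, so values like -5 or 127 are not mistakenly shown as raw integers.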