
Make UnaryPredicate predicate JSON serializable #2522

@Fokko


Feature Request / Improvement

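Make sure that UnaryPredicate can be serialized to JSON:

```python
from abc import ABC, abstractmethod
from typing import Any, Type

# Excerpt from pyiceberg.expressions; UnboundPredicate, BoundUnaryPredicate and Schema are defined in PyIceberg
class UnaryPredicate(UnboundPredicate[Any], ABC):
    def bind(self, schema: Schema, case_sensitive: bool = True) -> BoundUnaryPredicate[Any]:
        # Resolve the column reference against the schema, then wrap it in the bound counterpart class
        bound_term = self.term.bind(schema, case_sensitive)
        return self.as_bound(bound_term)

    def __repr__(self) -> str:
        """Return the string representation of the UnaryPredicate class."""
        return f"{str(self.__class__.__name__)}(term={repr(self.term)})"

    @property
    @abstractmethod
    def as_bound(self) -> Type[BoundUnaryPredicate[Any]]: ...
```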

This predicate has four implementations, IsNull, NotNull, IsNaN, and NotNaN, and each should serialize to:

```
{
    "type": "is-null",  // or "not-null", "is-nan", "not-nan"
    "term": str         // the column name
}
```
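For illustration only, here is a minimal stdlib sketch of the mapping above, assuming the term is just a plain column-name string. It deliberately uses `dataclasses` and `json` instead of Pydantic/IcebergBaseModel, and the names `to_json`, `from_json` and `_BY_TYPE` are hypothetical, not PyIceberg API. It shows the `"type"` string acting as a discriminator that selects the concrete class on the way back in.

```python
import json
from dataclasses import dataclass
from typing import ClassVar, Dict, Type


# Hypothetical stand-ins for PyIceberg's IsNull/NotNull/IsNaN/NotNaN (stdlib only, not Pydantic).
# The term is simplified to a plain column-name string.
@dataclass(frozen=True)
class UnaryPredicate:
    term: str
    type_name: ClassVar[str] = ""  # overridden by each concrete predicate

    def to_json(self) -> str:
        # Compact separators mirror the style of the PartitionSpec JSON in the tests
        return json.dumps({"type": self.type_name, "term": self.term}, separators=(",", ":"))


class IsNull(UnaryPredicate):
    type_name = "is-null"


class NotNull(UnaryPredicate):
    type_name = "not-null"


class IsNaN(UnaryPredicate):
    type_name = "is-nan"


class NotNaN(UnaryPredicate):
    type_name = "not-nan"


# Hypothetical lookup table from the JSON "type" discriminator to the concrete class
_BY_TYPE: Dict[str, Type[UnaryPredicate]] = {cls.type_name: cls for cls in (IsNull, NotNull, IsNaN, NotNaN)}


def from_json(payload: str) -> UnaryPredicate:
    data = json.loads(payload)
    cls = _BY_TYPE.get(data.get("type"))
    if cls is None:
        raise ValueError(f"Unknown unary predicate type: {data.get('type')!r}")
    return cls(term=data["term"])


print(IsNull("id").to_json())  # {"type":"is-null","term":"id"}
print(from_json('{"type":"not-nan","term":"x"}'))  # NotNaN(term='x')
```

Because dataclass equality also compares the concrete class, a round trip through `to_json`/`from_json` keeps `IsNull("a")` and `NotNull("a")` distinct. A Pydantic-based version would express the same discriminator through field aliases and validation instead of a hand-written table.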

We use Pydantic for JSON serialization; a class opts in by deriving from IcebergBaseModel, as PartitionSpec does:

```python
from pyiceberg.typedef import IcebergBaseModel


class PartitionSpec(IcebergBaseModel):
    ...  # fields omitted
```

Example tests for PartitionSpec serialization:

```python
from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.transforms import BucketTransform, TruncateTransform


def test_serialize_partition_spec() -> None:
    partitioned = PartitionSpec(
        PartitionField(source_id=1, field_id=1000, transform=TruncateTransform(width=19), name="str_truncate"),
        PartitionField(source_id=2, field_id=1001, transform=BucketTransform(num_buckets=25), name="int_bucket"),
        spec_id=3,
    )
    # Field names are emitted with their kebab-case aliases (spec-id, source-id, ...)
    assert (
        partitioned.model_dump_json()
        == """{"spec-id":3,"fields":[{"source-id":1,"field-id":1000,"transform":"truncate[19]","name":"str_truncate"},{"source-id":2,"field-id":1001,"transform":"bucket[25]","name":"int_bucket"}]}"""
    )


def test_deserialize_unpartition_spec() -> None:
    json_partition_spec = """{"spec-id":0,"fields":[]}"""
    spec = PartitionSpec.model_validate_json(json_partition_spec)
    assert spec == PartitionSpec(spec_id=0)
```
