From 7aa98a20cab6db5a95632e215fe6009d29f15382 Mon Sep 17 00:00:00 2001
From: Daniel Weeks
Date: Thu, 11 Jun 2026 08:36:43 -0700
Subject: [PATCH] Spec: Add support for default expression values

---
 format/spec.md | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/format/spec.md b/format/spec.md
index d5d78ef9713d..91ae49c363cd 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -62,6 +62,7 @@ The full set of changes are listed in [Appendix E](#version-3).
 Version 4 of the Iceberg spec restructures metadata for improved performance and new capabilities:

 * Support for [relative locations](#file-locations-in-metadata) in metadata fields
+* [Default value expressions](#default-values) for write defaults

 The full set of changes are listed in [Appendix E](#version-4).

@@ -330,7 +331,15 @@ The `initial-default` is set only when a field is added to an existing schema. T

 The `initial-default` and `write-default` produce SQL default value behavior, without rewriting data files. SQL default value behavior when a field is added handles all existing rows as though the rows were written with the new field's default value. Default value changes may only affect future records and all known fields are written into data files. Omitting a known field when writing a data file is never allowed. The write default for a field must be written if a field is not supplied to a write. If the write default for a required field is not set, the writer must fail.

-All columns of `unknown`, `variant`, `geometry`, and `geography` types must default to null. Non-null values for `initial-default` or `write-default` are invalid.
+Starting in v4, a field's `write-default` may be any [value expression](expressions-spec.md), not only a literal. This allows defaults such as `current_timestamp()` to be evaluated when a row is written. A value expression `write-default` is subject to the following requirements:
+
+* The expression must be a constant or a function application (apply); [field references](expressions-spec.md#field-reference) (bound or unbound) are not allowed in a default value expression
+* The expression must produce a value of the field's type, subject to [type promotion](#schema-evolution)
+* When a writer does not supply the field's value for a record written after the field was added, the `write-default` expression is evaluated to populate the field
+
+The `initial-default` must always be a constant (a single value); it is not allowed to be a function application. A literal is itself a value expression, so existing `write-default` values remain valid value expressions. In format version 3, both `initial-default` and `write-default` must be constants.
+
+All columns of `unknown`, `variant`, `geometry`, and `geography` types must default to null. Non-null `initial-default` or `write-default` values, including value expressions, are invalid.

 Default values for the fields of a struct are tracked as `initial-default` and `write-default` at the field level. Default values for fields that are nested structs must not contain default values for the struct's fields (sub-fields). Sub-field defaults are tracked in sub-field's metadata. As a result, the default stored for a nested struct may be either null or a non-null struct with no field values. The effective default value is produced by setting each fields' default in a new struct.

@@ -385,8 +394,6 @@ Grouping a subset of a struct’s fields into a nested struct is **not** allowed

 Struct evolution requires the following rules for default values:

-* The `initial-default` must be set when a field is added and cannot change
-* The `write-default` must be set when a field is added and may change
 * When a required field is added, both defaults must be set to a non-null value
 * When an optional field is added, the defaults may be null and should be explicitly set
 * When a field that is a struct type is added, its default may only be null or a non-null struct with no field values. Default values for fields must be stored in field metadata.
@@ -1676,14 +1683,14 @@ Types are serialized according to this table:
 |**`fixed(L)`**|`JSON string: "fixed[]"`|`"fixed[16]"`|
 |**`binary`**|`JSON string: "binary"`|`"binary"`|
 |**`decimal(P, S)`**|`JSON string: "decimal(,)"`|`"decimal(9,2)"`,
`"decimal(9, 2)"`|
-|**`struct`**|`JSON object: {`
  `"type": "struct",`
  `"fields": [ {`
    `"id": ,`
    `"name": ,`
    `"required": ,`
    `"type": ,`
    `"doc": ,`
    `"initial-default": ,`
    `"write-default": `
    `}, ...`
  `] }`|`{`
  `"type": "struct",`
  `"fields": [ {`
    `"id": 1,`
    `"name": "id",`
    `"required": true,`
    `"type": "uuid",`
    `"initial-default": "0db3e2a8-9d1d-42b9-aa7b-74ebe558dceb",`
    `"write-default": "ec5911be-b0a7-458c-8438-c9a3e53cffae"`
  `}, {`
    `"id": 2,`
    `"name": "data",`
    `"required": false,`
    `"type": {`
      `"type": "list",`
      `...`
    `}`
  `} ]`
`}`|
+|**`struct`**|`JSON object: {`
  `"type": "struct",`
  `"fields": [ {`
    `"id": ,`
    `"name": ,`
    `"required": ,`
    `"type": ,`
    `"doc": ,`
    `"initial-default": ,`
    `"write-default": `
    `}, ...`
  `] }`|`{`
  `"type": "struct",`
  `"fields": [ {`
    `"id": 1,`
    `"name": "id",`
    `"required": true,`
    `"type": "uuid",`
    `"initial-default": "0db3e2a8-9d1d-42b9-aa7b-74ebe558dceb",`
    `"write-default": "ec5911be-b0a7-458c-8438-c9a3e53cffae"`
  `}, {`
    `"id": 2,`
    `"name": "data",`
    `"required": false,`
    `"type": {`
      `"type": "list",`
      `...`
    `}`
  `} ]`
`}`|
 |**`list`**|`JSON object: {`
  `"type": "list",`
  `"element-id": ,`
  `"element-required": `
  `"element": `
`}`|`{`
  `"type": "list",`
  `"element-id": 3,`
  `"element-required": true,`
  `"element": "string"`
`}`|
 |**`map`**|`JSON object: {`
  `"type": "map",`
  `"key-id": ,`
  `"key": ,`
  `"value-id": ,`
  `"value-required": `
  `"value": `
`}`|`{`
  `"type": "map",`
  `"key-id": 4,`
  `"key": "string",`
  `"value-id": 5,`
  `"value-required": false,`
  `"value": "double"`
`}`|
 | **`variant`**| `JSON string: "variant"`|`"variant"`|
 | **`geometry(C)`** |`JSON string: "geometry()"`|`"geometry(srid:4326)"`|
 | **`geography(C, A)`** |`JSON string: "geography(,)"`|`"geography(srid:4326,spherical)"`|

-Note that default values are serialized using the JSON single-value serialization in [Appendix D](#appendix-d-single-value-serialization).
+Note that constant default values are serialized using the JSON single-value serialization in [Appendix D](#appendix-d-single-value-serialization). Starting in v4, a `write-default` that is a value expression is serialized using the value expression JSON serialization defined in the [expressions spec](expressions-spec.md#appendix-b-json-serialization). Because a literal is a valid value expression, constant defaults are serialized identically in all format versions.

 ### Partition Specs

@@ -1882,6 +1889,11 @@ The binary single-value serialization can be used to store the lower and upper b

 ### Version 4

+Default value expressions are added in v4. A field's `write-default` may be a [value expression](expressions-spec.md) (a constant or a function application). Field references are not allowed in a default value expression, and `initial-default` must always be a constant. See [Default values](#default-values).
+
+* An expression `write-default` is a write-time concern only. Because `write-default` is never used at read time and `initial-default` remains a constant, readers of a v4 table are unaffected by expression write-defaults.
+* Writers that do not support expression write-defaults must not write to a v4 table that defines one. As in prior versions, a writer that cannot supply a required field's value must fail.
+
 Relative path support is added in v4.

 Reading v3 or prior metadata for v4:
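The default-value rules in this patch can be illustrated with a small Python sketch. It has two parts:

* A validator for the constraints on `initial-default` and `write-default`.
* A writer-side step that evaluates a value-expression `write-default` when a record omits the field.

Everything here is hypothetical and for illustration only. The `Literal`, `FieldRef`, `Apply`, and `Field` classes, the `FUNCTIONS` registry, and the `validate_defaults`, `evaluate`, and `fill_row` functions are not PyIceberg or Iceberg Java APIs. Checking that an expression produces the field's type (including type promotion) is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Optional, Union


# Hypothetical expression model for illustration only (not the PyIceberg API).
@dataclass(frozen=True)
class Literal:
    value: Any  # a constant; Literal(None) is a null default


@dataclass(frozen=True)
class FieldRef:
    name: str  # a field reference, which the patch forbids in defaults


@dataclass(frozen=True)
class Apply:
    function: str  # name of a function in the hypothetical FUNCTIONS registry
    args: tuple = ()


Expr = Union[Literal, FieldRef, Apply]

# Hypothetical function registry; `current_timestamp` mirrors the spec's example.
FUNCTIONS: dict[str, Callable[..., Any]] = {
    "current_timestamp": lambda: datetime.now(timezone.utc),
}

NULL_ONLY_TYPES = {"unknown", "variant", "geometry", "geography"}


def contains_field_ref(expr: Expr) -> bool:
    """True if the expression, or any nested argument, references a field."""
    if isinstance(expr, FieldRef):
        return True
    if isinstance(expr, Apply):
        return any(contains_field_ref(arg) for arg in expr.args)
    return False


def validate_defaults(type_name: str, initial_default: Optional[Expr],
                      write_default: Optional[Expr], format_version: int) -> None:
    """Raise ValueError when a field's defaults break the rules in the patch.

    None means "not set". Result-type checking and type promotion are omitted.
    """
    if initial_default is not None and not isinstance(initial_default, Literal):
        raise ValueError("initial-default must be a constant")
    if write_default is not None:
        if format_version < 4 and not isinstance(write_default, Literal):
            raise ValueError("expression write-defaults require format v4")
        if contains_field_ref(write_default):
            raise ValueError("field references are not allowed in a default")
    if type_name in NULL_ONLY_TYPES:
        for default in (initial_default, write_default):
            if default is not None and default != Literal(None):
                raise ValueError(f"{type_name} columns must default to null")


def evaluate(expr: Expr) -> Any:
    """Evaluate a constant or function application at write time."""
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, Apply):
        return FUNCTIONS[expr.function](*(evaluate(arg) for arg in expr.args))
    raise ValueError("field references cannot be evaluated in a default")


@dataclass(frozen=True)
class Field:
    name: str
    type_name: str
    required: bool
    write_default: Optional[Expr] = None


def fill_row(fields: list[Field], supplied: dict[str, Any]) -> dict[str, Any]:
    """Return a row with every known field, using write-defaults for omitted ones."""
    row: dict[str, Any] = {}
    for field in fields:
        if field.name in supplied:
            row[field.name] = supplied[field.name]
        elif field.write_default is not None:
            row[field.name] = evaluate(field.write_default)
        elif field.required:
            raise ValueError(f"required field {field.name!r} has no value or write-default")
        else:
            # Assumption: an optional field with no write-default is written as null.
            row[field.name] = None
    return row
```

Take `fields = [Field("id", "long", True), Field("created_at", "timestamptz", True, Apply("current_timestamp"))]` as an example.

* `fill_row(fields, {"id": 1})` fills `created_at` with the current UTC time.
* `fill_row(fields, {})` raises, because the required `id` field has neither a supplied value nor a write-default.

Keeping `initial-default` restricted to `Literal` reflects the patch's point that readers never evaluate an expression.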