Spec: Add support for default expression values #16777
Draft: danielcweeks wants to merge 1 commit into apache:main from danielcweeks:default-expressions
+17 −5
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -62,6 +62,7 @@ The full set of changes are listed in [Appendix E](#version-3). | |
| Version 4 of the Iceberg spec restructures metadata for improved performance and new capabilities: | ||
| * Support for [relative locations](#file-locations-in-metadata) in metadata fields | ||
| * [Default value expressions](#default-values) for write defaults | ||
| The full set of changes are listed in [Appendix E](#version-4). | ||
@@ -330,7 +331,15 @@ The `initial-default` is set only when a field is added to an existing schema. T | |
| The `initial-default` and `write-default` produce SQL default value behavior, without rewriting data files. SQL default value behavior when a field is added handles all existing rows as though the rows were written with the new field's default value. Default value changes may only affect future records and all known fields are written into data files. Omitting a known field when writing a data file is never allowed. The write default for a field must be written if a field is not supplied to a write. If the write default for a required field is not set, the writer must fail. | ||
| All columns of `unknown`, `variant`, `geometry`, and `geography` types must default to null. Non-null values for `initial-default` or `write-default` are invalid. | ||
| Starting in v4, a field's `write-default` may be a [value expression](expressions-spec.md) rather than a literal. This allows defaults such as `current_timestamp()` to be evaluated when a row is written. A value expression `write-default` is subject to the following requirements: | ||
| * The expression must be a constant or a function application (apply); [field references](expressions-spec.md#field-reference) (bound or unbound) are not allowed in a default value expression | ||
| * The expression must produce a value of the field's type, subject to [type promotion](#schema-evolution) | ||
| * The `write-default` expression is evaluated to populate the field for any record written after the field was added when the writer does not supply the field's value | ||
| The `initial-default` must always be a constant (a single value); it is not allowed to be a function application. A literal is itself a value expression, so existing `write-default` values remain valid value expressions. In format version 3, both `initial-default` and `write-default` must be constants. | ||
| All columns of `unknown`, `variant`, `geometry`, and `geography` types must default to null. Non-null `initial-default` or `write-default` values, including value expressions, are invalid. | ||
| Default values for the fields of a struct are tracked as `initial-default` and `write-default` at the field level. Default values for fields that are nested structs must not contain default values for the struct's fields (sub-fields). Sub-field defaults are tracked in sub-field's metadata. As a result, the default stored for a nested struct may be either null or a non-null struct with no field values. The effective default value is produced by setting each fields' default in a new struct. | ||
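The expression requirements added in this hunk can be illustrated with a small check. The sketch below is not taken from the PR or from any Iceberg library: `Constant`, `Apply`, `FieldRef`, `check_write_default`, and the `upper` function in the usage note are hypothetical names, and the real expression model is whatever `expressions-spec.md` defines. It assumes that field references are rejected anywhere in the expression, including inside function arguments, and it does not verify that the result matches the field's type.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


# Hypothetical in-memory expression nodes (assumption, not the spec's model).
@dataclass(frozen=True)
class Constant:
    value: Any


@dataclass(frozen=True)
class FieldRef:
    name: str


@dataclass(frozen=True)
class Apply:
    function: str
    args: Tuple[Any, ...] = ()


# Types whose defaults must be null according to the text above.
NULL_ONLY_TYPES = ("unknown", "variant", "geometry", "geography")


def contains_field_ref(expr: Any) -> bool:
    """Return True if a field reference appears anywhere in the expression."""
    if isinstance(expr, FieldRef):
        return True
    if isinstance(expr, Apply):
        return any(contains_field_ref(arg) for arg in expr.args)
    return False


def check_write_default(expr: Any, field_type: str, format_version: int) -> List[str]:
    """Return a list of rule violations; an empty list means no rule was broken."""
    problems = []
    if contains_field_ref(expr):
        problems.append("field references are not allowed in a default value expression")
    if isinstance(expr, Apply) and format_version < 4:
        problems.append("function applications require format version 4")
    is_null = isinstance(expr, Constant) and expr.value is None
    if field_type.startswith(NULL_ONLY_TYPES) and not is_null:
        problems.append(f"fields of type {field_type} must default to null")
    return problems
```

Under these assumptions, `check_write_default(Apply("current_timestamp"), "timestamptz", 4)` returns an empty list, while the same expression with `format_version=3`, or `Apply("upper", (FieldRef("name"),))` in any version, returns at least one violation.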
@@ -385,8 +394,6 @@ Grouping a subset of a struct’s fields into a nested struct is **not** allowed | |
| Struct evolution requires the following rules for default values: | ||
| * The `initial-default` must be set when a field is added and cannot change | ||
Review comment (Member): Why did we remove this? Is this captured somewhere else?
| * The `write-default` must be set when a field is added and may change | ||
| * When a required field is added, both defaults must be set to a non-null value | ||
| * When an optional field is added, the defaults may be null and should be explicitly set | ||
| * When a field that is a struct type is added, its default may only be null or a non-null struct with no field values. Default values for fields must be stored in field metadata. | ||
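Two of the struct evolution rules shown in this hunk are easy to express as a check. The sketch below is an illustration only: `check_added_field` and its parameters are hypothetical, defaults are modeled as plain Python values with `None` standing for null and `{}` for a struct with no field values, and the rules about when defaults must be set or may change are deliberately left out because the page does not make clear which of those lines the PR keeps.

```python
from typing import Any, List


def check_added_field(
    required: bool,
    initial_default: Any,
    write_default: Any,
    is_struct: bool = False,
) -> List[str]:
    """Return violations of the add-field default rules; empty means none found."""
    problems = []
    # A required field needs both defaults to be non-null when it is added.
    if required and (initial_default is None or write_default is None):
        problems.append("required field needs non-null initial-default and write-default")
    # A struct field's own default may only be null or a struct with no field values;
    # sub-field defaults belong in the sub-fields' metadata.
    if is_struct:
        for label, default in (
            ("initial-default", initial_default),
            ("write-default", write_default),
        ):
            if default is not None and default != {}:
                problems.append(f"{label} of a struct field must be null or an empty struct")
    return problems
```

For example, `check_added_field(True, None, 0)` reports one violation, and `check_added_field(False, None, {}, is_struct=True)` reports none.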
@@ -1676,14 +1683,14 @@ Types are serialized according to this table: | |
| |**`fixed(L)`**|`JSON string: "fixed[<L>]"`|`"fixed[16]"`| | ||
| |**`binary`**|`JSON string: "binary"`|`"binary"`| | ||
| |**`decimal(P, S)`**|`JSON string: "decimal(<P>,<S>)"`|`"decimal(9,2)"`,<br />`"decimal(9, 2)"`| | ||
| |**`struct`**|`JSON object: {`<br /> `"type": "struct",`<br /> `"fields": [ {`<br /> `"id": <field id int>,`<br /> `"name": <name string>,`<br /> `"required": <boolean>,`<br /> `"type": <type JSON>,`<br /> `"doc": <comment string>,`<br /> `"initial-default": <JSON encoding of default value>,`<br /> `"write-default": <JSON encoding of default value>`<br /> `}, ...`<br /> `] }`|`{`<br /> `"type": "struct",`<br /> `"fields": [ {`<br /> `"id": 1,`<br /> `"name": "id",`<br /> `"required": true,`<br /> `"type": "uuid",`<br /> `"initial-default": "0db3e2a8-9d1d-42b9-aa7b-74ebe558dceb",`<br /> `"write-default": "ec5911be-b0a7-458c-8438-c9a3e53cffae"`<br /> `}, {`<br /> `"id": 2,`<br /> `"name": "data",`<br /> `"required": false,`<br /> `"type": {`<br /> `"type": "list",`<br /> `...`<br /> `}`<br /> `} ]`<br />`}`| | ||
| |**`struct`**|`JSON object: {`<br /> `"type": "struct",`<br /> `"fields": [ {`<br /> `"id": <field id int>,`<br /> `"name": <name string>,`<br /> `"required": <boolean>,`<br /> `"type": <type JSON>,`<br /> `"doc": <comment string>,`<br /> `"initial-default": <JSON encoding of default value>,`<br /> `"write-default": <JSON encoding of default value or, in v4+, value expression>`<br /> `}, ...`<br /> `] }`|`{`<br /> `"type": "struct",`<br /> `"fields": [ {`<br /> `"id": 1,`<br /> `"name": "id",`<br /> `"required": true,`<br /> `"type": "uuid",`<br /> `"initial-default": "0db3e2a8-9d1d-42b9-aa7b-74ebe558dceb",`<br /> `"write-default": "ec5911be-b0a7-458c-8438-c9a3e53cffae"`<br /> `}, {`<br /> `"id": 2,`<br /> `"name": "data",`<br /> `"required": false,`<br /> `"type": {`<br /> `"type": "list",`<br /> `...`<br /> `}`<br /> `} ]`<br />`}`| | ||
| |**`list`**|`JSON object: {`<br /> `"type": "list",`<br /> `"element-id": <id int>,`<br /> `"element-required": <bool>`<br /> `"element": <type JSON>`<br />`}`|`{`<br /> `"type": "list",`<br /> `"element-id": 3,`<br /> `"element-required": true,`<br /> `"element": "string"`<br />`}`| | ||
| |**`map`**|`JSON object: {`<br /> `"type": "map",`<br /> `"key-id": <key id int>,`<br /> `"key": <type JSON>,`<br /> `"value-id": <val id int>,`<br /> `"value-required": <bool>`<br /> `"value": <type JSON>`<br />`}`|`{`<br /> `"type": "map",`<br /> `"key-id": 4,`<br /> `"key": "string",`<br /> `"value-id": 5,`<br /> `"value-required": false,`<br /> `"value": "double"`<br />`}`| | ||
| | **`variant`**| `JSON string: "variant"`|`"variant"`| | ||
| | **`geometry(C)`** |`JSON string: "geometry(<C>)"`|`"geometry(srid:4326)"`| | ||
| | **`geography(C, A)`** |`JSON string: "geography(<C>,<E>)"`|`"geography(srid:4326,spherical)"`| | ||
| Note that default values are serialized using the JSON single-value serialization in [Appendix D](#appendix-d-single-value-serialization). | ||
| Note that constant default values are serialized using the JSON single-value serialization in [Appendix D](#appendix-d-single-value-serialization). Starting in v4, a `write-default` that is a value expression is serialized using the value expression JSON serialization defined in the [expressions spec](expressions-spec.md#appendix-b-json-serialization). Because a literal is a valid value expression, constant defaults are serialized identically in all format versions. | ||
| ### Partition Specs | ||
@@ -1882,6 +1889,11 @@ The binary single-value serialization can be used to store the lower and upper b | |
| ### Version 4 | ||
| Default value expressions are added in v4. A field's `write-default` may be a [value expression](expressions-spec.md) (a constant or a function). Field references are not allowed in a default value expression and `initial-default` must always be a constant. See [Default values](#default-values). | ||
| * An expression `write-default` is a write-time concern only. Because `write-default` is never used at read time and `initial-default` remains a constant, readers of a v4 table are unaffected by expression write-defaults. | ||
| * Writers that do not support expression write-defaults must not write a v4 table that defines one. As in prior versions, a writer that cannot supply a required field's value must fail. | ||
| Relative path support is added in v4. | ||
| Reading v3 or prior metadata for v4: | ||
What is the rationale? Is it to contain scope or is there some other reason?
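For readers following the diff, the write-path behavior described in the changed spec text (a supplied value is written as-is, otherwise the `write-default` is written, and a required field with no `write-default` makes the write fail) might look roughly like the sketch below. Everything here is an assumption made for illustration: `value_to_write` is a hypothetical helper, and a zero-argument callable stands in for a function application such as `current_timestamp()`.

```python
from typing import Any

# Sentinel that distinguishes "no value supplied" from an explicit null.
_NOT_SUPPLIED = object()


def value_to_write(
    name: str,
    required: bool,
    write_default: Any = None,
    supplied: Any = _NOT_SUPPLIED,
) -> Any:
    """Pick the value a writer stores for one field of one record."""
    if supplied is not _NOT_SUPPLIED:
        return supplied
    if write_default is None:
        if required:
            # A writer that cannot supply a required field's value must fail.
            raise ValueError(f"required field {name!r} has no write-default")
        return None
    # Expression defaults are evaluated at write time; constants are used as-is.
    return write_default() if callable(write_default) else write_default
```

Because the expression is evaluated only when a record is written, nothing in this sketch touches the read path, which matches the statement in the Version 4 notes that readers are unaffected by expression write-defaults.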