Spec: Clarify decimal type serialization #16798
Open
kevinjqliu wants to merge 1 commit into apache:main from kevinjqliu:codex/spec-decimal-format
+13
−7
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -268,7 +268,7 @@ Supported primitive types are defined in the table below. Primitive types added | |
| | | **`long`** | 64-bit signed integers | | | ||
| | | **`float`** | [32-bit IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) floating point | Can promote to double | | ||
| | | **`double`** | [64-bit IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) floating point | | | ||
| | | **`decimal(P,S)`** | Fixed-point decimal; precision P, scale S | Scale is fixed, precision must be 38 or less | | ||
| | | **`decimal(P, S)`** | Fixed-point decimal; precision P, scale S | Scale is fixed, precision must be 38 or less | | ||
| | | **`date`** | Calendar date without timezone or time | | | ||
| | | **`time`** | Time of day, microsecond precision, without date, timezone | | | ||
| | | **`timestamp`** | Timestamp, microsecond precision, without timezone | [1] | | ||
|
|
@@ -1482,7 +1482,7 @@ Maps with non-string keys must use an array representation with the `map` logica | |
| |**`long`**|`long`|| | ||
| |**`float`**|`float`|| | ||
| |**`double`**|`double`|| | ||
| |**`decimal(P,S)`**|`{ "type": "fixed",`<br /> `"size": minBytesRequired(P),`<br /> `"logicalType": "decimal",`<br /> `"precision": P,`<br /> `"scale": S }`|Stored as fixed using the minimum number of bytes for the given precision.| | ||
| |**`decimal(P, S)`**|`{ "type": "fixed",`<br /> `"size": minBytesRequired(P),`<br /> `"logicalType": "decimal",`<br /> `"precision": P,`<br /> `"scale": S }`|Stored as fixed using the minimum number of bytes for the given precision.| | ||
| |**`date`**|`{ "type": "int",`<br /> `"logicalType": "date" }`|Stores days from 1970-01-01.| | ||
| |**`time`**|`{ "type": "long",`<br /> `"logicalType": "time-micros" }`|Stores microseconds from midnight.| | ||
| |**`timestamp`** | `{ "type": "long",`<br /> `"logicalType": "timestamp-micros",`<br /> `"adjust-to-utc": false }` | Stores microseconds from 1970-01-01 00:00:00.000000. [1] | | ||
|
|
@@ -1537,7 +1537,7 @@ Lists must use the [3-level representation](https://github.com/apache/parquet-fo | |
| | **`long`** | `int64` | | | | ||
| | **`float`** | `float` | | | | ||
| | **`double`** | `double` | | | | ||
| | **`decimal(P,S)`** | `P <= 9`: `int32`,<br />`P <= 18`: `int64`,<br />`fixed` otherwise | `DECIMAL(P,S)` | Fixed must use the minimum number of bytes that can store `P`. | | ||
| | **`decimal(P, S)`** | `P <= 9`: `int32`,<br />`P <= 18`: `int64`,<br />`fixed` otherwise | `DECIMAL(P, S)` | Fixed must use the minimum number of bytes that can store `P`. | | ||
| | **`date`** | `int32` | `DATE` | Stores days from 1970-01-01. | | ||
| | **`time`** | `int64` | `TIME_MICROS` with `adjustToUtc=false` | Stores microseconds from midnight. | | ||
| | **`timestamp`** | `int64` | `TIMESTAMP_MICROS` with `adjustToUtc=false` | Stores microseconds from 1970-01-01 00:00:00.000000. | | ||
|
|
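The `minBytesRequired(P)` size and the Parquet physical-type rule in the rows above can be sketched as follows. This is a minimal illustration, not the reference implementation: the function names `min_bytes_required` and `parquet_physical_type` are hypothetical, and the sketch assumes the unscaled value is stored as a two's-complement signed integer, so `n` bytes can hold magnitudes up to `2**(8*n - 1) - 1`.

```python
def min_bytes_required(precision: int) -> int:
    """Smallest byte count whose signed range holds every P-digit unscaled value.

    Assumption: two's-complement storage, so n bytes hold up to 2**(8n-1) - 1.
    """
    max_unscaled = 10 ** precision - 1  # largest magnitude with P digits
    n = 1
    while max_unscaled > 2 ** (8 * n - 1) - 1:
        n += 1
    return n


def parquet_physical_type(precision: int):
    """Return (physical type, fixed length or None) following the table's rule."""
    if precision <= 9:
        return ("int32", None)
    if precision <= 18:
        return ("int64", None)
    # "fixed otherwise", using the minimum number of bytes that can store P
    return ("fixed", min_bytes_required(precision))


print(min_bytes_required(9), min_bytes_required(18), min_bytes_required(38))
print(parquet_physical_type(38))
```

The thresholds `P <= 9` and `P <= 18` line up with this calculation: 9 digits fit in 4 bytes and 18 digits fit in 8 bytes.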
@@ -1569,7 +1569,7 @@ When reading an `unknown` column, any corresponding column must be ignored and r | |
| | **`long`** | `long` | | | | ||
| | **`float`** | `float` | | | | ||
| | **`double`** | `double` | | | | ||
| | **`decimal(P,S)`** | `decimal` | | | | ||
| | **`decimal(P, S)`** | `decimal` | | | | ||
| | **`date`** | `date` | | | | ||
| | **`time`** | `long` | `iceberg.long-type`=`TIME` | Stores microseconds from midnight. | | ||
| | **`timestamp`** | `timestamp` | `iceberg.timestamp-unit`=`MICROS` | Stores microseconds from 2015-01-01 00:00:00.000000. [1], [2] | | ||
|
|
@@ -1611,7 +1611,7 @@ The 32-bit hash implementation is 32-bit Murmur3 hash, x86 variant, seeded with | |
| |--------------------|-------------------------------------------|--------------------------------------------| | ||
| | **`int`** | `hashLong(long(v))` [1] | `34` → `2017239379` | | ||
| | **`long`** | `hashBytes(littleEndianBytes(v))` | `34L` → `2017239379` | | ||
| | **`decimal(P,S)`** | `hashBytes(minBigEndian(unscaled(v)))`[2] | `14.20` → `-500754589` | | ||
| | **`decimal(P, S)`** | `hashBytes(minBigEndian(unscaled(v)))`[2] | `14.20` → `-500754589` | | ||
| | **`date`** | `hashInt(daysFromUnixEpoch(v))` | `2017-11-16` → `-653330422` | | ||
| | **`time`** | `hashLong(microsecsFromMidnight(v))` | `22:31:08` → `-662762989` | | ||
| | **`timestamp`** | `hashLong(microsecsFromUnixEpoch(v))` | `2017-11-16T22:31:08` → `-2047944441`<br />`2017-11-16T22:31:08.000001` → `-1207196810` | | ||
|
|
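The input to the decimal hash in the row above, `minBigEndian(unscaled(v))`, can be sketched in Python. This covers only the byte encoding; the Murmur3 step is deliberately left out. The function name `min_big_endian` is hypothetical, and the sketch assumes the bytes are the minimal two's-complement big-endian form of the unscaled integer.

```python
def min_big_endian(unscaled: int) -> bytes:
    """Minimal two's-complement big-endian bytes for an unscaled decimal value.

    Assumption: "minimal" means the fewest bytes that still carry the sign bit.
    """
    # Bits needed excluding the sign bit; for negatives, measure the complement.
    bits = unscaled.bit_length() if unscaled >= 0 else (~unscaled).bit_length()
    length = bits // 8 + 1  # one extra bit is always reserved for the sign
    return unscaled.to_bytes(length, "big", signed=True)


# decimal 14.20 with scale 2 has unscaled value 1420 (0x058C)
print(min_big_endian(1420).hex())
```

For `14.20`, the unscaled value 1420 encodes to the two bytes `05 8c`, which is what would be passed to `hashBytes`.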
@@ -1675,14 +1675,16 @@ Types are serialized according to this table: | |
| |**`uuid`**|`JSON string: "uuid"`|`"uuid"`| | ||
| |**`fixed(L)`**|`JSON string: "fixed[<L>]"`|`"fixed[16]"`| | ||
| |**`binary`**|`JSON string: "binary"`|`"binary"`| | ||
| |**`decimal(P, S)`**|`JSON string: "decimal(<P>,<S>)"`|`"decimal(9,2)"`,<br />`"decimal(9, 2)"`| | ||
| |**`decimal(P, S)`**|`JSON string: "decimal(<P>, <S>)"`|`"decimal(9, 2)"`| | ||
| |**`struct`**|`JSON object: {`<br /> `"type": "struct",`<br /> `"fields": [ {`<br /> `"id": <field id int>,`<br /> `"name": <name string>,`<br /> `"required": <boolean>,`<br /> `"type": <type JSON>,`<br /> `"doc": <comment string>,`<br /> `"initial-default": <JSON encoding of default value>,`<br /> `"write-default": <JSON encoding of default value>`<br /> `}, ...`<br /> `] }`|`{`<br /> `"type": "struct",`<br /> `"fields": [ {`<br /> `"id": 1,`<br /> `"name": "id",`<br /> `"required": true,`<br /> `"type": "uuid",`<br /> `"initial-default": "0db3e2a8-9d1d-42b9-aa7b-74ebe558dceb",`<br /> `"write-default": "ec5911be-b0a7-458c-8438-c9a3e53cffae"`<br /> `}, {`<br /> `"id": 2,`<br /> `"name": "data",`<br /> `"required": false,`<br /> `"type": {`<br /> `"type": "list",`<br /> `...`<br /> `}`<br /> `} ]`<br />`}`| | ||
| |**`list`**|`JSON object: {`<br /> `"type": "list",`<br /> `"element-id": <id int>,`<br /> `"element-required": <bool>`<br /> `"element": <type JSON>`<br />`}`|`{`<br /> `"type": "list",`<br /> `"element-id": 3,`<br /> `"element-required": true,`<br /> `"element": "string"`<br />`}`| | ||
| |**`map`**|`JSON object: {`<br /> `"type": "map",`<br /> `"key-id": <key id int>,`<br /> `"key": <type JSON>,`<br /> `"value-id": <val id int>,`<br /> `"value-required": <bool>`<br /> `"value": <type JSON>`<br />`}`|`{`<br /> `"type": "map",`<br /> `"key-id": 4,`<br /> `"key": "string",`<br /> `"value-id": 5,`<br /> `"value-required": false,`<br /> `"value": "double"`<br />`}`| | ||
| | **`variant`**| `JSON string: "variant"`|`"variant"`| | ||
| | **`geometry(C)`** |`JSON string: "geometry(<C>)"`|`"geometry(srid:4326)"`| | ||
| | **`geography(C, A)`** |`JSON string: "geography(<C>,<E>)"`|`"geography(srid:4326,spherical)"`| | ||
|
|
||
| The schema JSON type strings in this table are the canonical serialized forms. Readers should accept optional whitespace around parameters and separators in parameterized type strings. | ||
|
Contributor
SHOULD or MUST?
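For reference, the behavior the added sentence describes (write the canonical form, accept optional whitespace when reading) might look like the sketch below. The names `parse_decimal_type` and `format_decimal_type` are hypothetical. The sketch assumes whitespace is tolerated only inside the parentheses, that the scale in a type string is a non-negative integer, and that precision is at least 1; only the upper bound of 38 comes from the spec table.

```python
import re

# Optional whitespace around each parameter and around the comma separator.
_DECIMAL_TYPE = re.compile(r"decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_decimal_type(type_str: str):
    """Parse a decimal type string leniently; return (precision, scale)."""
    match = _DECIMAL_TYPE.fullmatch(type_str)
    if match is None:
        raise ValueError(f"not a decimal type string: {type_str!r}")
    precision, scale = int(match.group(1)), int(match.group(2))
    if not 1 <= precision <= 38:  # lower bound is an assumption of this sketch
        raise ValueError(f"precision out of range: {precision}")
    return precision, scale


def format_decimal_type(precision: int, scale: int) -> str:
    """Write the canonical form proposed in this PR: "decimal(<P>, <S>)"."""
    return f"decimal({precision}, {scale})"


print(parse_decimal_type("decimal(9,2)"), parse_decimal_type("decimal( 9 , 2 )"))
print(format_decimal_type(9, 2))
```

Both `"decimal(9,2)"` and `"decimal(9, 2)"` parse to the same type, while the writer always emits the spaced form.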
||
|
|
||
| Note that default values are serialized using the JSON single-value serialization in [Appendix D](#appendix-d-single-value-serialization). | ||
|
|
||
| ### Partition Specs | ||
|
|
@@ -1861,7 +1863,7 @@ The binary single-value serialization can be used to store the lower and upper b | |
| | **`long`** | **`JSON long`** | `34` | | | ||
| | **`float`** | **`JSON number`** | `1.0` | | | ||
| | **`double`** | **`JSON number`** | `1.0` | | | ||
| | **`decimal(P,S)`** | **`JSON string`** | `"14.20"`, `"2E+20"` | Stores the string representation of the decimal value, specifically, for values with a positive scale, the number of digits to the right of the decimal point is used to indicate scale, for values with a negative scale, the scientific notation is used and the exponent must equal the negated scale | | ||
| | **`decimal(P, S)`** | **`JSON string`** | `"14.20"`, `"2E+20"` | Stores the string representation of the decimal value, specifically, for values with a positive scale, the number of digits to the right of the decimal point is used to indicate scale, for values with a negative scale, the scientific notation is used and the exponent must equal the negated scale | | ||
| | **`date`** | **`JSON string`** | `"2017-11-16"` | Stores ISO-8601 standard date | | ||
| | **`time`** | **`JSON string`** | `"22:31:08.123456"` | Stores ISO-8601 standard time with microsecond precision | | ||
| | **`timestamp`** | **`JSON string`** | `"2017-11-16T22:31:08.123456"` | Stores ISO-8601 standard timestamp with microsecond precision; must not include a zone offset | | ||
|
|
||
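The JSON single-value rule for decimals in the last hunk can be sketched as below. This is an illustration under stated assumptions, not the reference implementation: the function name `decimal_to_json_string` is hypothetical, the value is passed as an unscaled integer plus a scale, and the output for a scale of exactly zero (a plain integer string) is an assumption, since the description only covers positive and negative scales.

```python
def decimal_to_json_string(unscaled: int, scale: int) -> str:
    """Render a decimal so that the string itself encodes the scale."""
    if scale < 0:
        # Negative scale: scientific notation, exponent equals the negated scale.
        return f"{unscaled}E+{-scale}"
    if scale == 0:
        # Assumption: no fractional digits and no exponent.
        return str(unscaled)
    # Positive scale: exactly `scale` digits to the right of the decimal point.
    sign = "-" if unscaled < 0 else ""
    digits = str(abs(unscaled)).rjust(scale + 1, "0")
    return f"{sign}{digits[:-scale]}.{digits[-scale:]}"


print(decimal_to_json_string(1420, 2))   # the "14.20" example
print(decimal_to_json_string(2, -20))    # the "2E+20" example
```

The two calls reproduce the examples from the table: `"14.20"` for scale 2 and `"2E+20"` for scale -20.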
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -231,6 +231,10 @@ header .md-search__input::placeholder { | |
| font-family: "Source Sans Pro", sans-serif; | ||
| } | ||
|
|
||
| .md-typeset table code { | ||
|
Contributor
Author
||
| white-space: nowrap; | ||
| } | ||
|
|
||
| .md-typeset h1 { | ||
| color: var(--color-default-primary); | ||
| font-size: 32px; | ||
|
|
||
Changed all `decimal(P,S)` to `decimal(P, S)`.