From f7164c57c9c4c452cd139d09ca255e1d2c2188ca Mon Sep 17 00:00:00 2001
From: Kevin Liu
Date: Sat, 13 Jun 2026 11:45:08 -0700
Subject: [PATCH] Docs: Clarify decimal type serialization

---
 format/spec.md                         | 16 +++++++++-------
 site/docs/assets/stylesheets/extra.css |  4 ++++
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/format/spec.md b/format/spec.md
index d5d78ef9713d..7281ed484bf4 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -268,7 +268,7 @@ Supported primitive types are defined in the table below. Primitive types added
 | | **`long`** | 64-bit signed integers | |
 | | **`float`** | [32-bit IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) floating point | Can promote to double |
 | | **`double`** | [64-bit IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) floating point | |
-| | **`decimal(P,S)`** | Fixed-point decimal; precision P, scale S | Scale is fixed, precision must be 38 or less |
+| | **`decimal(P, S)`** | Fixed-point decimal; precision P, scale S | Scale is fixed, precision must be 38 or less |
 | | **`date`** | Calendar date without timezone or time | |
 | | **`time`** | Time of day, microsecond precision, without date, timezone | |
 | | **`timestamp`** | Timestamp, microsecond precision, without timezone | [1] |
@@ -1482,7 +1482,7 @@ Maps with non-string keys must use an array representation with the `map` logica
 |**`long`**|`long`||
 |**`float`**|`float`||
 |**`double`**|`double`||
-|**`decimal(P,S)`**|`{ "type": "fixed",`
  `"size": minBytesRequired(P),`
  `"logicalType": "decimal",`
  `"precision": P,`
  `"scale": S }`|Stored as fixed using the minimum number of bytes for the given precision.|
+|**`decimal(P, S)`**|`{ "type": "fixed",`
  `"size": minBytesRequired(P),`
  `"logicalType": "decimal",`
  `"precision": P,`
  `"scale": S }`|Stored as fixed using the minimum number of bytes for the given precision.|
 |**`date`**|`{ "type": "int",`
  `"logicalType": "date" }`|Stores days from 1970-01-01.|
 |**`time`**|`{ "type": "long",`
  `"logicalType": "time-micros" }`|Stores microseconds from midnight.|
 |**`timestamp`** | `{ "type": "long",`
  `"logicalType": "timestamp-micros",`
  `"adjust-to-utc": false }` | Stores microseconds from 1970-01-01 00:00:00.000000. [1] |
@@ -1537,7 +1537,7 @@ Lists must use the [3-level representation](https://github.com/apache/parquet-fo
 | **`long`** | `int64` | | |
 | **`float`** | `float` | | |
 | **`double`** | `double` | | |
-| **`decimal(P,S)`** | `P <= 9`: `int32`,
`P <= 18`: `int64`,
`fixed` otherwise | `DECIMAL(P,S)` | Fixed must use the minimum number of bytes that can store `P`. |
+| **`decimal(P, S)`** | `P <= 9`: `int32`,
`P <= 18`: `int64`,
`fixed` otherwise | `DECIMAL(P, S)` | Fixed must use the minimum number of bytes that can store `P`. |
 | **`date`** | `int32` | `DATE` | Stores days from 1970-01-01. |
 | **`time`** | `int64` | `TIME_MICROS` with `adjustToUtc=false` | Stores microseconds from midnight. |
 | **`timestamp`** | `int64` | `TIMESTAMP_MICROS` with `adjustToUtc=false` | Stores microseconds from 1970-01-01 00:00:00.000000. |
@@ -1569,7 +1569,7 @@ When reading an `unknown` column, any corresponding column must be ignored and r
 | **`long`** | `long` | | |
 | **`float`** | `float` | | |
 | **`double`** | `double` | | |
-| **`decimal(P,S)`** | `decimal` | | |
+| **`decimal(P, S)`** | `decimal` | | |
 | **`date`** | `date` | | |
 | **`time`** | `long` | `iceberg.long-type`=`TIME` | Stores microseconds from midnight. |
 | **`timestamp`** | `timestamp` | `iceberg.timestamp-unit`=`MICROS` | Stores microseconds from 2015-01-01 00:00:00.000000. [1], [2] |
@@ -1611,7 +1611,7 @@ The 32-bit hash implementation is 32-bit Murmur3 hash, x86 variant, seeded with
 |--------------------|-------------------------------------------|--------------------------------------------|
 | **`int`** | `hashLong(long(v))` [1] | `34` → `2017239379` |
 | **`long`** | `hashBytes(littleEndianBytes(v))` | `34L` → `2017239379` |
-| **`decimal(P,S)`** | `hashBytes(minBigEndian(unscaled(v)))`[2] | `14.20` → `-500754589` |
+| **`decimal(P, S)`** | `hashBytes(minBigEndian(unscaled(v)))`[2] | `14.20` → `-500754589` |
 | **`date`** | `hashInt(daysFromUnixEpoch(v))` | `2017-11-16` → `-653330422` |
 | **`time`** | `hashLong(microsecsFromMidnight(v))` | `22:31:08` → `-662762989` |
 | **`timestamp`** | `hashLong(microsecsFromUnixEpoch(v))` | `2017-11-16T22:31:08` → `-2047944441`
`2017-11-16T22:31:08.000001` → `-1207196810` |
@@ -1675,7 +1675,7 @@ Types are serialized according to this table:
 |**`uuid`**|`JSON string: "uuid"`|`"uuid"`|
 |**`fixed(L)`**|`JSON string: "fixed[<L>]"`|`"fixed[16]"`|
 |**`binary`**|`JSON string: "binary"`|`"binary"`|
-|**`decimal(P, S)`**|`JSON string: "decimal(<P>,<S>)"`|`"decimal(9,2)"`,
`"decimal(9, 2)"`|
+|**`decimal(P, S)`**|`JSON string: "decimal(<P>, <S>)"`|`"decimal(9, 2)"`|
 |**`struct`**|`JSON object: {`
  `"type": "struct",`
  `"fields": [ {`
    `"id": ,`
    `"name": ,`
    `"required": ,`
    `"type": ,`
    `"doc": ,`
    `"initial-default": ,`
    `"write-default": `
    `}, ...`
  `] }`|`{`
  `"type": "struct",`
  `"fields": [ {`
    `"id": 1,`
    `"name": "id",`
    `"required": true,`
    `"type": "uuid",`
    `"initial-default": "0db3e2a8-9d1d-42b9-aa7b-74ebe558dceb",`
    `"write-default": "ec5911be-b0a7-458c-8438-c9a3e53cffae"`
  `}, {`
    `"id": 2,`
    `"name": "data",`
    `"required": false,`
    `"type": {`
      `"type": "list",`
      `...`
    `}`
  `} ]`
`}`|
 |**`list`**|`JSON object: {`
  `"type": "list",`
  `"element-id": ,`
  `"element-required": `
  `"element": `
`}`|`{`
  `"type": "list",`
  `"element-id": 3,`
  `"element-required": true,`
  `"element": "string"`
`}`|
 |**`map`**|`JSON object: {`
  `"type": "map",`
  `"key-id": ,`
  `"key": ,`
  `"value-id": ,`
  `"value-required": `
  `"value": `
`}`|`{`
  `"type": "map",`
  `"key-id": 4,`
  `"key": "string",`
  `"value-id": 5,`
  `"value-required": false,`
  `"value": "double"`
`}`|
@@ -1683,6 +1683,8 @@ Types are serialized according to this table:
 | **`geometry(C)`** |`JSON string: "geometry(<C>)"`|`"geometry(srid:4326)"`|
 | **`geography(C, A)`** |`JSON string: "geography(<C>,<A>)"`|`"geography(srid:4326,spherical)"`|

+The schema JSON type strings in this table are the canonical serialized forms. Readers should accept optional whitespace around parameters and separators in parameterized type strings.
+
 Note that default values are serialized using the JSON single-value serialization in [Appendix D](#appendix-d-single-value-serialization).

 ### Partition Specs
@@ -1861,7 +1863,7 @@ The binary single-value serialization can be used to store the lower and upper b
 | **`long`** | **`JSON long`** | `34` | |
 | **`float`** | **`JSON number`** | `1.0` | |
 | **`double`** | **`JSON number`** | `1.0` | |
-| **`decimal(P,S)`** | **`JSON string`** | `"14.20"`, `"2E+20"` | Stores the string representation of the decimal value, specifically, for values with a positive scale, the number of digits to the right of the decimal point is used to indicate scale, for values with a negative scale, the scientific notation is used and the exponent must equal the negated scale |
+| **`decimal(P, S)`** | **`JSON string`** | `"14.20"`, `"2E+20"` | Stores the string representation of the decimal value, specifically, for values with a positive scale, the number of digits to the right of the decimal point is used to indicate scale, for values with a negative scale, the scientific notation is used and the exponent must equal the negated scale |
 | **`date`** | **`JSON string`** | `"2017-11-16"` | Stores ISO-8601 standard date |
 | **`time`** | **`JSON string`** | `"22:31:08.123456"` | Stores ISO-8601 standard time with microsecond precision |
 | **`timestamp`** | **`JSON string`** | `"2017-11-16T22:31:08.123456"` | Stores ISO-8601 standard timestamp with microsecond precision; must not include a zone offset |
diff --git a/site/docs/assets/stylesheets/extra.css b/site/docs/assets/stylesheets/extra.css
index e8af4a6a7584..c2744e0d1d99 100644
--- a/site/docs/assets/stylesheets/extra.css
+++ b/site/docs/assets/stylesheets/extra.css
@@ -231,6 +231,10 @@ header .md-search__input::placeholder {
   font-family: "Source Sans Pro", sans-serif;
 }

+.md-typeset table code {
+  white-space: nowrap;
+}
+
 .md-typeset h1 {
   color: var(--color-default-primary);
   font-size: 32px;
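
Illustration of the reader/writer rule this patch adds to the schema type table: writers emit the canonical `"decimal(9, 2)"` form, while readers also accept `"decimal(9,2)"` and other optional whitespace around the parameters and the comma. The sketch below is not taken from any Iceberg implementation. The function names `parse_decimal_type` and `canonical_decimal_type` are hypothetical, and restricting both parameters to unsigned digits is an assumption made for brevity.

```python
import re

# Hypothetical pattern: tolerates optional whitespace around P, S and the comma.
# Assumption: P and S are written as unsigned decimal digits.
_DECIMAL_TYPE = re.compile(r"decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_decimal_type(type_str: str) -> tuple[int, int]:
    """Return (precision, scale) parsed from a schema JSON decimal type string."""
    match = _DECIMAL_TYPE.fullmatch(type_str)
    if match is None:
        raise ValueError(f"not a decimal type string: {type_str!r}")
    precision, scale = int(match.group(1)), int(match.group(2))
    # The primitive type table requires precision to be 38 or less.
    if precision > 38:
        raise ValueError(f"precision must be 38 or less, got {precision}")
    return precision, scale


def canonical_decimal_type(precision: int, scale: int) -> str:
    """Write the canonical form from the table, with one space after the comma."""
    return f"decimal({precision}, {scale})"
```

Keeping the lenient parser separate from the strict writer means older metadata written as `"decimal(9,2)"` stays readable while new metadata converges on a single spelling.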