[SPARK-57526][SQL] Add the timestamp_nanos function to create nanosecond-precision timestamps from numeric nanoseconds #56616
Open
MaxGekk wants to merge 3 commits into
…econd-precision timestamps from numeric nanoseconds

### What changes were proposed in this pull request?

This PR adds a new built-in function `timestamp_nanos(expr)` that interprets `expr` as the number of nanoseconds since `1970-01-01 00:00:00 UTC` and returns a nanosecond-precision `TIMESTAMP_LTZ(9)`.

Concretely:

- Adds a `NanosToTimestamp` expression in `datetimeExpressions.scala`. It declares a single `DECIMAL` input type with `ImplicitCastInputTypes`, so integral arguments are coerced to their natural decimal automatically while `DECIMAL` arguments are accepted as-is.
- Maps the nanosecond count `N` to the internal `(epochMicros, nanosWithinMicro)` pair with floor semantics (`epochMicros = floorDiv(N, 1000)`, `nanosWithinMicro = floorMod(N, 1000)`, always in `[0, 999]`), computed via `BigInteger` in both the interpreted (`eval`) and codegen (`doGenCode`) paths. `longValueExact` throws `ArithmeticException` when the value is outside the representable timestamp range.
- A `DECIMAL` input (rather than `BIGINT`) is required to reach the full `[0001, 9999]` calendar range: nanoseconds for year 9999 (~2.5e20) overflow a 64-bit `BIGINT`, the same reason the inverse `unix_nanos` returns `DECIMAL(21, 0)`.
- Registers `timestamp_nanos` in `FunctionRegistry` and adds the Scala `functions.timestamp_nanos`.
- Adds catalyst unit tests (interpreted + codegen, full-range and round-trip with `unix_nanos`, overflow), Scala/SQL end-to-end tests, and SQL golden-file coverage.

Scope notes: the PySpark API (classic and Spark Connect Python) and R are out of scope here and tracked as follow-ups; `timestamp_nanos` is recorded in the PySpark function-parity allowlist in the meantime. The Scala Spark Connect client picks up `timestamp_nanos` automatically because `functions.scala` lives in the shared `sql/api` module.

### Why are the changes needed?

Part of the [SPARK-56822](https://issues.apache.org/jira/browse/SPARK-56822) umbrella (timestamps with nanosecond precision). Spark has `timestamp_seconds` / `timestamp_millis` / `timestamp_micros` but no nanosecond counterpart, which is the natural inverse of `unix_nanos`.

### Does this PR introduce _any_ user-facing change?

Yes. A new `timestamp_nanos(expr)` function is available in SQL and the Scala API (including the Scala Spark Connect client). It returns `TIMESTAMP_LTZ(9)`. This is a change only within the unreleased nanosecond-timestamp preview.

Example:

```sql
SELECT timestamp_nanos(1230219000123456789);
-- 2008-12-25 07:30:00.123456789
```

### How was this patch tested?

- `build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite'`
- `build/sbt 'sql/testOnly org.apache.spark.sql.TimestampNanosFunctionsAnsiOnSuite org.apache.spark.sql.TimestampNanosFunctionsAnsiOffSuite'`
- `build/sbt 'sql/testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite org.apache.spark.sql.ExpressionsSchemaSuite'`
- `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "nanos"'`
- `./dev/scalastyle`

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor
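The floor-semantics mapping described in this commit message can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the PR's `NanosToTimestamp` code: the class name `NanosSplitSketch` and the method `split` are hypothetical, and the arithmetic uses only `java.math.BigInteger`. It also shows why a year-9999 count, which does not fit in a 64-bit long, still yields an `epochMicros` that does.

```java
import java.math.BigInteger;

/** Illustrative only: hypothetical names, not the PR's NanosToTimestamp code. */
final class NanosSplitSketch {
    private static final BigInteger NANOS_PER_MICRO = BigInteger.valueOf(1000);

    /**
     * Splits a nanosecond count N into {epochMicros, nanosWithinMicro} with
     * floor semantics. Throws ArithmeticException if epochMicros does not fit
     * in a 64-bit long.
     */
    static long[] split(BigInteger nanos) {
        // BigInteger.mod returns a value in [0, 999] for a positive modulus,
        // which matches floorMod(N, 1000) even for negative N.
        BigInteger nanosWithinMicro = nanos.mod(NANOS_PER_MICRO);
        // N - floorMod(N, 1000) is an exact multiple of 1000, so this division
        // is exact and equals floorDiv(N, 1000).
        BigInteger epochMicros =
            nanos.subtract(nanosWithinMicro).divide(NANOS_PER_MICRO);
        return new long[] {
            epochMicros.longValueExact(), nanosWithinMicro.longValueExact()
        };
    }
}
```

With this sketch, the count from the SQL example above, `1230219000123456789`, splits into `epochMicros = 1230219000123456` and `nanosWithinMicro = 789`, while `-1` splits into `(-1, 999)` rather than `(0, -1)`, which is the point of using floor semantics for pre-epoch values.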
… analysis

`NanosToTimestamp` declared `inputTypes = Seq(DecimalType)` with `ImplicitCastInputTypes`, which silently coerced FLOAT/DOUBLE/STRING to DECIMAL(14,7)/(30,15)/(38,18). Those targets hold far fewer integer digits than a realistic nanosecond count, so a finite FLOAT/DOUBLE argument overflowed the coerced decimal and yielded NULL (ANSI off) or an overflow error (ANSI on) instead of a timestamp -- contrary to the documented "accepted and floored" behavior.

Switch to `ExpectsInputTypes` with `Seq(TypeCollection(IntegralType, DecimalType))` so only integral and DECIMAL nanosecond counts are accepted; FLOAT/DOUBLE/STRING now fail at analysis with a clear DATATYPE_MISMATCH, matching the "count of time units" semantics of timestamp_micros/millis. The interpreted and codegen paths widen an integral argument to BigInteger directly and keep the DECIMAL floor path unchanged.

Add catalyst coverage for the integral path and the FLOAT/DOUBLE/STRING rejection, a SQL rejection case, and regenerate the golden files.

Co-authored-by: Isaac
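The digit-capacity argument in this commit message can be checked with a small model. The sketch below is an assumption-laden illustration, not Spark's `Decimal` implementation: the class `DecimalFitSketch` and the method `fits` are hypothetical, and they model `DECIMAL(precision, scale)` with `java.math.BigDecimal`. A `DECIMAL(p, s)` leaves `p - s` integer digits, so `DECIMAL(14,7)` holds 7 and `DECIMAL(30,15)` holds 15, while the example count `1230219000123456789` has 19.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Illustrative model of DECIMAL(precision, scale) capacity; not Spark's code. */
final class DecimalFitSketch {
    /** Returns true if value can be held as DECIMAL(precision, scale). */
    static boolean fits(BigDecimal value, int precision, int scale) {
        // Rescaling pads (or rounds) the fraction to exactly `scale` digits.
        BigDecimal rescaled = value.setScale(scale, RoundingMode.HALF_UP);
        // precision() counts every digit of the unscaled value, that is the
        // integer digits plus the `scale` fractional digits.
        return rescaled.precision() <= precision;
    }
}
```

Under this model the example count does not fit the FLOAT or DOUBLE coercion targets, but it does fit `DECIMAL(21, 0)`, the type `unix_nanos` returns.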
…ow and add negative tests

`NanosToTimestamp` let `BigInteger.longValueExact()` throw a raw `java.lang.ArithmeticException` when `epochMicros` overflows a 64-bit long. Surface it instead as a proper Spark error condition: add `QueryExecutionErrors.timestampNanosOverflowError`, which raises a `SparkArithmeticException` with the `DATETIME_OVERFLOW` condition (SQLSTATE 22008), and catch/rethrow in both the interpreted and codegen paths.

Strengthen the negative coverage: the catalyst FLOAT/DOUBLE/STRING rejection now asserts the `UNEXPECTED_INPUT_TYPE` `DataTypeMismatch` (not just `isFailure`), the overflow test asserts the `DATETIME_OVERFLOW` condition via `checkErrorInExpression`, and a SQL golden case exercises the runtime overflow end-to-end. Regenerate the golden files.

Co-authored-by: Isaac
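The catch/rethrow pattern this commit describes might look roughly like the following. This is a sketch under assumptions, not the PR's code: `DatetimeOverflowSketchException` is a hypothetical stand-in for Spark's `SparkArithmeticException`, `OverflowTranslationSketch.epochMicros` is a hypothetical helper, and the message text is invented for illustration. Only the condition name `DATETIME_OVERFLOW` and SQLSTATE `22008` come from the commit message.

```java
import java.math.BigInteger;

/** Hypothetical stand-in for SparkArithmeticException; illustrative only. */
final class DatetimeOverflowSketchException extends ArithmeticException {
    final String condition = "DATETIME_OVERFLOW";
    final String sqlState = "22008";

    DatetimeOverflowSketchException(String message) {
        super(message);
    }
}

/** Illustrative only: translates the raw overflow into a named condition. */
final class OverflowTranslationSketch {
    private static final BigInteger THOUSAND = BigInteger.valueOf(1000);

    static long epochMicros(BigInteger nanos) {
        // Floor division by 1000: subtract the non-negative remainder first,
        // so the division below is exact.
        BigInteger micros = nanos.subtract(nanos.mod(THOUSAND)).divide(THOUSAND);
        try {
            return micros.longValueExact();
        } catch (ArithmeticException e) {
            // Replace the raw JVM exception with one carrying an error
            // condition that callers and tests can assert on.
            throw new DatetimeOverflowSketchException(
                "nanosecond count out of timestamp range: " + nanos);
        }
    }
}
```

The design point is that tests can then assert on a stable condition name rather than on the text of a JVM exception message.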
### What changes were proposed in this pull request?

Adds a built-in `timestamp_nanos(expr)` function. It reads `expr` as a count of nanoseconds since `1970-01-01 00:00:00 UTC` and returns a nanosecond-precision `TIMESTAMP_LTZ(9)`, making it the natural inverse of `unix_nanos`.

The argument is an integral or `DECIMAL` count. `DECIMAL` is what lets it reach the whole `[0001, 9999]` calendar range, since year-9999 nanoseconds (~2.5e20) overflow a 64-bit `BIGINT`, the same reason `unix_nanos` returns `DECIMAL(21, 0)`. `FLOAT`/`DOUBLE`/`STRING` are rejected at analysis (a fractional or string nanosecond count isn't meaningful), and a count outside the representable range fails with the `DATETIME_OVERFLOW` error condition.

Implementation: a new `NanosToTimestamp` expression in `datetimeExpressions.scala` (interpreted + codegen), registered in `FunctionRegistry`, and exposed as `functions.timestamp_nanos` in the shared `sql/api` module so the Scala Spark Connect client picks it up automatically. PySpark and R are out of scope and tracked as follow-ups; `timestamp_nanos` is on the PySpark function-parity allowlist meanwhile.

Follow-up: the peer `timestamp_seconds`/`timestamp_millis`/`timestamp_micros` still throw a raw `ArithmeticException` on overflow; migrating them to `DATETIME_OVERFLOW` is tracked in SPARK-57577.

### Why are the changes needed?

Part of the SPARK-56822 umbrella (nanosecond-precision timestamps). Spark has `timestamp_seconds`/`timestamp_millis`/`timestamp_micros` but no nanosecond counterpart.

### Does this PR introduce _any_ user-facing change?

Yes: a new `timestamp_nanos(expr)` function in SQL and the Scala API (including the Scala Spark Connect client), returning `TIMESTAMP_LTZ(9)`. This is a change only within the unreleased nanosecond-timestamp preview.

### How was this patch tested?

- `build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite'`
- `build/sbt 'sql/testOnly org.apache.spark.sql.TimestampNanosFunctionsAnsiOnSuite org.apache.spark.sql.TimestampNanosFunctionsAnsiOffSuite'`
- `build/sbt 'sql/testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite org.apache.spark.sql.ExpressionsSchemaSuite'`
- `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "nanos"'`
- `./dev/scalastyle`

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor