Skip to content

Incorrect timestamp comparison with mixed time units #22756

@fengys1996

Description

@fengys1996

Describe the bug

When comparing two Timestamp values with different TimeUnits, DataFusion coerces both sides to the coarser time unit. This may result in a loss of precision and lead to incorrect comparison semantics.

And other timestamp operations, such as arithmetic operations and type-unification contexts like UNION, also choose the coarser time unit.

To Reproduce

  1. compare ts with different time unit.
SELECT
    CAST(TIMESTAMP '2024-01-01 00:00:00' AS TIMESTAMP(0)) =
    CAST(TIMESTAMP '2024-01-01 00:00:00.123' AS TIMESTAMP(3)) AS eq;

Expected:

+---------------------------------------------------------------+
| Utf8("2024-01-01 00:00:00") = Utf8("2024-01-01 00:00:00.123") |
+---------------------------------------------------------------+
| false                                                         |
+---------------------------------------------------------------+

Actual:

+---------------------------------------------------------------+
| Utf8("2024-01-01 00:00:00") = Utf8("2024-01-01 00:00:00.123") |
+---------------------------------------------------------------+
| true                                                          |
+---------------------------------------------------------------+
  1. ts subtraction
SELECT
    CAST(TIMESTAMP '2024-01-01 00:00:00.123' AS TIMESTAMP(3)) -
    CAST(TIMESTAMP '2024-01-01 00:00:00' AS TIMESTAMP(0)) AS diff;

Expected:

+----------+
| diff     |
+----------+
| PT0.123S |
+----------+

Actual:

+------+
| diff |
+------+
| P0D  |
+------+
  1. ts union
SELECT CAST(TIMESTAMP '2024-01-01 00:00:00' AS TIMESTAMP(0))
    UNION ALL
    SELECT CAST(TIMESTAMP '2024-01-01 00:00:00.123' AS TIMESTAMP(3));

Expected:

+-----------------------------+
| Utf8("2024-01-01 00:00:00") |
+-----------------------------+
| 2024-01-01T00:00:00.123     |
| 2024-01-01T00:00:00         |
+-----------------------------+

Actual:

+-----------------------------+
| Utf8("2024-01-01 00:00:00") |
+-----------------------------+
| 2024-01-01T00:00:00         |
| 2024-01-01T00:00:00         |
+-----------------------------+

Expected behavior

No response

Additional context

I also ran a few similar queries in ClickHouse. It appears that ClickHouse
prefers the finer precision.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No fields configured for Bug.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions